AI Engine lab experiment for paper Deep Reinforcement Learning for Join Order Enumeration
-
download pycharm free version on official website.
-
wsl2 are not fully support GUI, but we can follow this article to set up a usable version: gui set up
-
once you can luanch X-server, you are able to excute pycharm.sh to open the IDE in your ubuntu server
After pycharm being successfully set up, you can git clone this repo to your local.(use venv in pycharm or use anaconda, for the project environment) (ps: feel free to use Pycharm in Windows as alternative)
numpy
psycopg2 (psycopg2-binary alternative)
tensorflow ==1.15.0
tensorforce==0.4.3
matplotlib
Python=3.6
mo-future=3.147
moz-sql-parser==4.18
Pillow = 8.1.0
1> download postgres into ubuntu system follow this: part1
2> access to part2
a> sudo service postgresql status //to check status
b> sudo service postgresql start // start server
c> sudo service postgresql status //double check status is 'online' now
1> git clone this repo to your local ubuntu system
(you may need to set up ssh key in your local linux system to use your github account, one reference: ssh key generation)
2> access to this page to follow section 'Step-by-step instructions'
a> do 1>download *gz files
b> skip 2>, directly go to https://github.com/alberanid/imdbpy, follow 'Installation' section
It may be easier to git clone this repo instead of pip installing it.
c> before 3>, create a postgres account, follow https://dev.to/ohaleks/set-up-wsl2-postgresql-and-phoenix-liveview-on-windows-3ol5, use: sudo -u postgres createuser <your_username>
make sure to run the command in the directory that postgres has access to like /tmp
(ps: need to lift the access ability, can be done later, use postgres: sudo -u postgres createdb imdbload)
d> before 4>, go to https://dev.to/ohaleks/set-up-wsl2-postgresql-and-phoenix-liveview-on-windows-3ol5,
follow from: sudo -u postgres psql.
after give privileges, use \l to make sure imdbload is there.
e 4> python3 imdbpy2sql.py -d PATH_TO_GZ_FILES -u postgres://username:password@hostname/imdbload
ps:
1>here change PATH_TO_GZ_FILES to the absolute path of your downloaded gz files
2>change username and password to your own
3>change hostname to 'localhost'
4>imdbpy2sql.py is found under the bin directory
5>if this gives an error, try replacing postgres with postgresql like this: python3 imdbpy2sql.py -d PATH_TO_GZ_FILES -u postgresql://username:password@hostname/imdbload
The last step will take a loooong time, wait for processing... after this, download join order benchmark database part is finished.
1> Manually run two sql query 'schema.sql' and 'fkindexes.sql' (in the project, under jo-benchmark/)
a> use 'sudo -u postgres psql' to start postgres cli
b> in postgres, switch to imdbload: \c imdbload
c> use \dt to make sure you have datasets in it.
d> open schema.sql, copy and paste to excute the sql sentences in postgres(your linux, imdbload environment)
* All of those sql sentences in this file should already been done by previous steps, we only need add 'movie_info_idx':
'CREATE TABLE movie_info_idx (
id integer NOT NULL PRIMARY KEY,
movie_id integer NOT NULL,
info_type_id integer NOT NULL,
info text NOT NULL,
note text
);'
e> open fkindexes.sql, excute the sql sentences in postgres
2> run queries2db.py(under porject directory) to get table 'queries'
a> change configuration in /home/PycharmProjects/ReJOIN/config/database.py to your own account
b> at line 9 of the script, change '-U postgres' to your own user name, such as '-U terryguo'
c> Suggestion: run it using terminal, it may ask you type in password of the user at the end
d> in postgresql you may need to assgin superuser to your own account: ALTER USER your_user WITH SUPERUSER;
some hints for postgres:
to change user: set role <username>;
to delete a table: drop table table_name;
> note: if you can get stuck for a long time trying to drop a table, try to follow the answer from here: https://dba.stackexchange.com/questions/217593/drop-table-taking-too-long
e> if you get an error with importing moz_sql_parser, try to install version 4.7.21002 of the package: pip install moz_sql_parser==4.7.21002
All experiments are here:
Some example:
- Train target group 4 for 200 episodes
sudo python3 main.py -e 200 -g 1 -tg 4 -se 100 -s ./saved_model/group4-200/
(PS:To run the main.py for training if the error is related to installing .dll files, installing the Microsoft Visual C++ 2015 Redistributable Update 3 (x64 version) should fix it. link:https://www.microsoft.com/en-us/download/details.aspx?id=53587 )
Now the plots are in ./outputs folder (default) and the model in ./saved_model/
-
Restore saved model and test group 4
sudo python3 main.py -e 3 -g 1 -tg 4 -r ./saved_model/group4-200/ --testing -o ./outputs/testing/
-
Restore saved model and keep training on group 5 for 500 episodes
sudo python3 main.py -e 200 -g 1 -tg 5 -se 100 -r ./saved_model/group4-200/ -s ./saved_model/group5-500/
-
Execute a single query
python main.py --query 3a --episodes 150
follow https://github.com/gregrahn/tpch-kit
follow: https://ankane.org/tpc-h, before 'Generate queries'
https://github.com/haochending/benchmarks/tree/master/tpc-ds-2.4/openLookeng