Skip to content

GUOXINTONG/rejoin

 
 

Repository files navigation

rejoin

AI Engine lab experiment for paper Deep Reinforcement Learning for Join Order Enumeration

Pre-requieste

Pycharm set up

  • download pycharm free version on official website.

  • wsl2 are not fully support GUI, but we can follow this article to set up a usable version: gui set up

  • once you can luanch X-server, you are able to excute pycharm.sh to open the IDE in your ubuntu server

After pycharm being successfully set up, you can git clone this repo to your local.(use venv in pycharm or use anaconda, for the project environment) (ps: feel free to use Pycharm in Windows as alternative)

Library dependencis attention:

numpy

psycopg2 (psycopg2-binary alternative)

tensorflow ==1.15.0

tensorforce==0.4.3

matplotlib

Python=3.6

mo-future=3.147

moz-sql-parser==4.18

Pillow = 8.1.0

Postgres

1> download postgres into ubuntu system follow this: part1

2> access to part2

a> sudo service postgresql status   //to check status

b> sudo service postgresql start    // start server

c> sudo service postgresql status   //double check status is 'online' now

Download join order benchmark, datbase part

1> git clone this repo to your local ubuntu system

(you may need to set up ssh key in your local linux system to use your github account, one reference: ssh key generation)

2> access to this page to follow section 'Step-by-step instructions'

a> do 1>download *gz files

b> skip 2>, directly go to https://github.com/alberanid/imdbpy, follow 'Installation' section
    It may be easier to git clone this repo instead of pip installing it.    

c> before 3>, create a postgres account, follow https://dev.to/ohaleks/set-up-wsl2-postgresql-and-phoenix-liveview-on-windows-3ol5, use: sudo -u postgres createuser <your_username>  
make sure to run the command in the directory that postgres has access to like /tmp
(ps: need to lift the access ability, can be done later, use postgres: sudo -u postgres createdb imdbload)

d> before 4>, go to https://dev.to/ohaleks/set-up-wsl2-postgresql-and-phoenix-liveview-on-windows-3ol5, 
    follow from: sudo -u postgres psql.
    after give privileges, use \l to make sure imdbload is there.

e 4> python3 imdbpy2sql.py -d PATH_TO_GZ_FILES -u postgres://username:password@hostname/imdbload
    ps:
        1>here change PATH_TO_GZ_FILES to the absolute path of your downloaded gz files
        2>change username and password to your own 
        3>change hostname to 'localhost'
        4>imdbpy2sql.py is found under the bin directory
        5>if this gives an error, try replacing postgres with postgresql like this: python3 imdbpy2sql.py -d PATH_TO_GZ_FILES -u postgresql://username:password@hostname/imdbload

The last step will take a loooong time, wait for processing... after this, download join order benchmark database part is finished.

Create table for join order benchmark data set

1> Manually run two sql query 'schema.sql' and 'fkindexes.sql' (in the project, under jo-benchmark/)

a> use 'sudo -u postgres psql' to start postgres cli

b> in postgres, switch to imdbload: \c imdbload

c> use \dt to make sure you have datasets in it.

d> open schema.sql, copy and paste to excute the sql sentences in postgres(your linux, imdbload environment)
* All of those sql sentences in this file should already been done by previous steps, we only need add 'movie_info_idx':
'CREATE TABLE movie_info_idx (
id integer NOT NULL PRIMARY KEY,
movie_id integer NOT NULL,
info_type_id integer NOT NULL,
info text NOT NULL,
note text
);'

e> open fkindexes.sql, excute the sql sentences in postgres

2> run queries2db.py(under porject directory) to get table 'queries'

a> change configuration in /home/PycharmProjects/ReJOIN/config/database.py to your own account
b> at line 9 of the script, change '-U postgres' to your own user name, such as '-U terryguo'
c> Suggestion: run it using terminal, it may ask you type in password of the user at the end
d> in postgresql you may need to assgin superuser to your own account: ALTER USER your_user WITH SUPERUSER;
some hints for postgres:
  to change user: set role <username>;
  to delete a table: drop table table_name;
    > note: if you can get stuck for a long time trying to drop a table, try to follow the answer from here: https://dba.stackexchange.com/questions/217593/drop-table-taking-too-long 
e> if you get an error with importing moz_sql_parser, try to install version 4.7.21002 of the package: pip install moz_sql_parser==4.7.21002

Re-product the experiment section:

All experiments are here:

Some example:

  • Train target group 4 for 200 episodes sudo python3 main.py -e 200 -g 1 -tg 4 -se 100 -s ./saved_model/group4-200/ (PS:To run the main.py for training if the error is related to installing .dll files, installing the Microsoft Visual C++ 2015 Redistributable Update 3 (x64 version) should fix it. link:https://www.microsoft.com/en-us/download/details.aspx?id=53587 )

Now the plots are in ./outputs folder (default) and the model in ./saved_model/

  • Restore saved model and test group 4 sudo python3 main.py -e 3 -g 1 -tg 4 -r ./saved_model/group4-200/ --testing -o ./outputs/testing/

  • Restore saved model and keep training on group 5 for 500 episodes sudo python3 main.py -e 200 -g 1 -tg 5 -se 100 -r ./saved_model/group4-200/ -s ./saved_model/group5-500/

  • Execute a single query python main.py --query 3a --episodes 150

Test on TPCH data set

follow https://github.com/gregrahn/tpch-kit

follow: https://ankane.org/tpc-h, before 'Generate queries'

https://github.com/haochending/benchmarks/tree/master/tpc-ds-2.4/openLookeng

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%