Skip to content

Instructional materials (course files) for the BBT3104 course (Advanced Database Systems) and the MIT8107 course (Advanced Database Systems). Topic: Query Optimization. Based on the IMDb dataset using PostgreSQL and the Join Order Benchmark (JOB).

License

Notifications You must be signed in to change notification settings

course-files/BBT3104-Lab6of6-QueryOptimization

Repository files navigation

Query Optimization

These are course files (instructional materials) for both the BBT3104: Advanced Database Systems and the MIT8107: Advanced Database Systems courses.

Key Value
Course Code BBT3104 and MIT8107
Course Name Advanced Database Systems (same name for both courses)
URL https://elearning.strathmore.edu/course/view.php?id=3932 and https://elearning.strathmore.edu/course/view.php?id=3277 respectively
Semester Duration 14th August - 25th November 2024
Lecturer Allan Omondi
Contact aomondi_at_strathmore_dot_edu

Internet Movie Database (IMDb)

IMDb captures more than 2.5 million movie titles produced over 133 years by 234,997 different companies with over 4 million actors. It subsequently contains many join-crossing correlations thus making it challenging for cardinality estimators. This is unlike TPC-H and TPC-DS which are considered trivial for cardinality estimators.

imdb_ERD

Data Import

  1. Create the PostgreSQL Docker container using Docker-Compose.yaml

  2. Create the IMDb database in Postgres using schema.sql

  3. Download the .tgz file, decompress it, and import each .csv file into its respective table. You can use DBeaver (recommended) or any other similar database tool when importing. A slightly similar video tutorial is available here on YouTube.
    Alternatively, you can use cinemagoer, formerly IMDbPy, to automate this step.

Join Order Benchmark (JOB) queries

  1. Proceed to execute the Join Order Benchmark (JOB) queries available here on the IMDb database in PostgreSQL.

Q-Error Calculations

  1. Execute the q-error computation code to quantitatively determine the error between the actual cardinality and the estimated cardinality per node for each query here.

Acknowledgments

Initial code has been sourced from join-order-benchmark by Greg Rahn, Moritz Eyssen, maahl, and Max Halford.

Additional code has also been sourced from learnedcardinalities by Andreas Kipf

References

Kipf, A., Kipf, T., Radke, B., Leis, V., Boncz, P., & Kemper, A. (2019). Learned cardinalities: Estimating correlated joins with deep learning. 9th Biennial Conference on Innovative Data Systems Research (CIDR ‘19), CA, USA. https://arxiv.org/abs/1809.00677

Leis, V., Radke, B., Gubichev, A., Mirchev, A., Boncz, P., Kemper, A., & Neumann, T. (2018). Query optimization through the looking glass, and what we found running the Join Order Benchmark. The VLDB Journal, 27(5), 643–668. https://doi.org/10.1007/s00778-017-0480-7

About

Instructional materials (course files) for the BBT3104 course (Advanced Database Systems) and the MIT8107 course (Advanced Database Systems). Topic: Query Optimization. Based on the IMDb dataset using PostgreSQL and the Join Order Benchmark (JOB).

Topics

Resources

License

Stars

Watchers

Forks

Contributors 4

  •  
  •  
  •  
  •  

Languages