
armanawn/ssc-workshop-databases-2022


SSC Data Science and Analytics Workshop 2022

Intro to Databases in Industry: Data Cleaning, Querying, and Modeling at Scale

Speakers:

  • Rodolfo Lourenzutti, University of British Columbia
  • Arman Seyed-Ahmadi, University of British Columbia
  • Diego Ardila, Shopify

Workshop notebooks

Setup instructions

Recreating databases locally

You can install PostgreSQL on your own machine and load the database dump files in the databases/ folder to recreate the workshop databases locally for further practice. The instructions to do so are provided here.
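The script below is a rough sketch of the loading step. It builds one database per dump file. It assumes, hypothetically, that each file in databases/ is a plain-SQL dump named after its database (e.g. `mydb.sql`), and that the PostgreSQL client tools `createdb` and `psql` are on your PATH. Custom-format dumps would need `pg_restore` instead. The workshop's own instructions remain the authoritative reference.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def restore_commands(dump_dir: str = "databases") -> List[List[str]]:
    """Build createdb/psql commands for every *.sql dump in dump_dir.

    Assumption (hypothetical): each dump is a plain-SQL file whose stem
    is the database name, e.g. databases/mydb.sql -> database "mydb".
    """
    commands = []
    for dump in sorted(Path(dump_dir).glob("*.sql")):
        db_name = dump.stem
        # Create an empty database to load the dump into.
        commands.append(["createdb", db_name])
        # Replay the SQL dump into the new database, stopping on the first error.
        commands.append(
            ["psql", "-d", db_name, "-v", "ON_ERROR_STOP=1", "-f", str(dump)]
        )
    return commands


def restore_all(dump_dir: str = "databases", dry_run: bool = True) -> List[List[str]]:
    """Print every restore command, and execute it only when dry_run is False."""
    commands = restore_commands(dump_dir)
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    restore_all(dry_run=True)
```

Run it once with `dry_run=True` to review the printed commands, then pass `dry_run=False` to execute them. Keeping command construction separate from execution lets you inspect what will run before touching your local PostgreSQL server.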

Notebook environment

The Jupyter notebooks in this repository use a few packages to run SQL commands from within the notebooks' Python environment, and all of them are listed in environment.yml. To reproduce this environment and make it available in Jupyter Lab, first install the nb_conda_kernels package in your base environment (or whichever environment Jupyter Lab is installed in) by running the following command in your terminal:

conda install nb_conda_kernels

Then run the following command to recreate the environment:

conda env create -f environment.yml

A new environment called ssc2022 should appear in the list of kernels when you launch Jupyter Lab on your computer.
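Once the databases are loaded and the ssc2022 kernel is selected, the notebooks need a way to reach your local server. Many Python SQL tools, including SQLAlchemy and SQL magics built on it, accept a connection URL of the form `postgresql://user:password@host:port/dbname`. The helper below is a hypothetical sketch that assembles such a URL. The user name, password, host, and database name are placeholders, and 5432 is PostgreSQL's default port. Whether the workshop notebooks consume exactly this format is an assumption to check against the notebooks themselves.

```python
from urllib.parse import quote


def postgres_url(
    dbname: str,
    user: str = "postgres",
    password: str = "",
    host: str = "localhost",
    port: int = 5432,
) -> str:
    """Build a SQLAlchemy-style PostgreSQL URL (hypothetical helper).

    Credentials are percent-encoded so characters such as '@' or '/'
    in a password do not break the URL.
    """
    credentials = quote(user, safe="")
    if password:
        credentials += ":" + quote(password, safe="")
    return f"postgresql://{credentials}@{host}:{port}/{dbname}"


if __name__ == "__main__":
    # Placeholder values; replace them with your local setup.
    print(postgres_url("mydb", user="student", password="p@ss/word"))
```

In a notebook, you would then pass the resulting string to whichever SQL tool the environment provides (for example, `sqlalchemy.create_engine(url)`). Reading the password from an environment variable rather than hard-coding it keeps credentials out of shared notebooks.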

License

© 2022 Arman Seyed-Ahmadi, Rodolfo Lourenzutti, Diego Ardila

Software licensed under the MIT License, non-software content licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License. See the license file for more information.