# Welcome to the Covid Tracking Example!

## Workshop Steps

Now that you have opened up the MyBinder environment and are reading this, you are already on the right track! Inside this environment, you will also find:

* sample scripts: This folder contains the starter scripts that you will complete during the exercise. Look for the triple exclamation points (!!!): they mark the places where you are asked to write some code to get things to work!
* README.md: This is just the README file you saw on the GitHub page.
* requirements.txt: This is a list of the required libraries that were installed upon startup.
* setup.ipynb: The file you are reading right now! Think of this as your home page.

## Step 0: Set up SQL Magic

In [None]:
%load_ext sql

In [None]:
%config SqlMagic.autocommit=False

In [None]:
%sql SHOW TABLES FROM default

In [None]:
%sql select count(*) from mysql.default.covid_cases_europe_daily

In [None]:
%sql select count(*) from mysql.default.covid_deaths_europe_daily

In [None]:
%sql select count(*) from mysql.default.covid_cases_deaths_europe_daily

In [None]:
%sql DROP TABLE mysql.default.covid_cases_europe_daily

In [None]:
%sql DROP TABLE mysql.default.covid_deaths_europe_daily

In [None]:
%sql DROP TABLE mysql.default.covid_cases_deaths_europe_daily

## Step 1: Explore the List of Data Jobs That Have Been Created on the Cloud

In [None]:
! vdk list

## Step 2: Create a Data Job

Now that we have explored the list of created (on the cloud) data jobs, let's create our own.

Keep in mind that we would like the data job to live in its own sub-folder, so that our Streamlit script stays outside of it, in the main directory.

Based on the information above, try creating a data job titled as follows:
* covid-tracking, dash (-)
* your last name, dash (-)
* your favorite sports team, dash (-)
* your favorite drink.

Please do not use team names or numbers that are part of any of your passwords, as the data job names will be visible to all. For example, you can create a data job titled "covid-tracking-smith-man-united-cola".
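
If you would like to build the name programmatically, here is a minimal sketch. The helper `build_job_name` is hypothetical (it is not part of VDK), and it assumes that lowercase letters, digits, and dashes are safe characters for a job name, which matches the example above but is not stated as a rule here.

```python
import re


def build_job_name(*parts: str) -> str:
    """Join name parts with dashes after lowercasing and slugifying each one."""
    slugs = []
    for part in parts:
        # Replace every run of characters other than a-z and 0-9 with one dash.
        slug = re.sub(r"[^a-z0-9]+", "-", part.lower()).strip("-")
        if slug:  # skip parts that end up empty
            slugs.append(slug)
    return "-".join(slugs)


job_name = build_job_name("covid-tracking", "Smith", "Man United", "Cola")
print(job_name)  # covid-tracking-smith-man-united-cola
```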

You can choose any team name that you want, but please create the job in the home directory. This will create a sub-folder for the data job. The home directory is /home/jovyan.

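Here's an example, where `-n` sets the job name, `-t` the team, and `-p` the directory in which the job's sub-folder is created. It uses the name `tracking-covid-avramov-man-united-boza`; substitute your own job name: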

In [None]:
! vdk create -n tracking-covid-avramov-man-united-boza -t amld -p /home/jovyan

## Step 3: Work Out the Data Job Template

Now that you have created a data job, please go inside the subfolder and set up the structure of your data job. Here's the general idea.

We want the data job to have seven files: three SQL scripts, three Python scripts, and one configuration file.

* Let's have one SQL script that creates the Covid cases data table in our cloud DB.
* Let's have one SQL script that creates the Covid deaths data table in our cloud DB.
* Let's have one SQL script that creates the Covid cases and deaths clean table in our cloud DB.
* Let's have one Python script that creates an API call for the Covid cases data and ingests it into our cloud DB.
* Let's have another Python script that creates an API call for the Covid deaths data and ingests it into our cloud DB.
* Let's have one Python script that reads both sets of data from the cloud DB, joins them, cleans them, and saves them in a new table in the DB.
* Let's also have a config.ini file, which specifies how often the data job will be executed, using cron scheduling.
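
To get a feel for the cron schedule in config.ini, here is a small sketch that reads a cron expression from INI text and labels its five fields. The section and key names (`[owner] team`, `[job] schedule_cron`) and the schedule value are assumptions made for illustration; the config.ini in the sample scripts folder is the authoritative version.

```python
import configparser

# Illustrative config text; the section/key names and the schedule are assumptions.
SAMPLE_CONFIG = """
[owner]
team = amld

[job]
schedule_cron = 0 6 * * *
"""

CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def read_schedule(config_text: str) -> dict:
    """Parse INI text and map each cron field name to its value."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    values = parser.get("job", "schedule_cron").split()
    if len(values) != len(CRON_FIELDS):
        raise ValueError("expected a five-field cron expression")
    return dict(zip(CRON_FIELDS, values))


schedule = read_schedule(SAMPLE_CONFIG)
print(schedule)
```

With the sample value above, the expression means "at minute 0 of hour 6, every day", that is, once a day at 06:00.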

Each of these scripts is present in the sample scripts subfolder. However, we've added some coding challenges inside of them to make things fun!

Let's first delete the template scripts that came with the creation of the data job, since we won't need them. Please run the code cell below, replacing `tracking-covid-avramov-man-united-boza` with your data job's name.

In [None]:
! rm "tracking-covid-avramov-man-united-boza/10_sql_step.sql"
! rm "tracking-covid-avramov-man-united-boza/20_python_step.py"
! rm "tracking-covid-avramov-man-united-boza/README.md"
! rm "tracking-covid-avramov-man-united-boza/requirements.txt"
! rm "tracking-covid-avramov-man-united-boza/config.ini"

Let's move the sample scripts to the data job subfolder. Please run the code cell below, replacing `tracking-covid-avramov-man-united-boza` with your data job's name.

In [None]:
! mv "sample scripts/01_create_covid_cases_europe_daily.sql" ~/tracking-covid-avramov-man-united-boza
! mv "sample scripts/02_create_covid_deaths_europe_daily.sql" ~/tracking-covid-avramov-man-united-boza
! mv "sample scripts/03_create_clean_full_table.sql" ~/tracking-covid-avramov-man-united-boza
! mv "sample scripts/10_ingest_covid_cases_data.py" ~/tracking-covid-avramov-man-united-boza
! mv "sample scripts/20_ingest_covid_deaths_data.py" ~/tracking-covid-avramov-man-united-boza
! mv "sample scripts/30_clean_merge_transform.py" ~/tracking-covid-avramov-man-united-boza
! mv "sample scripts/config.ini" ~/tracking-covid-avramov-man-united-boza

Great! Now you're all set up with the data job:

* You have created a data job on the cloud.
* You have deleted the template files that you do not need.
* You have moved the sample scripts we provided to the data job sub-folder.

The next step is to begin working on each script in the data job! Let's do it!

## Step 4: Data Job - Define the Covid Cases Table (01_create_covid_cases_europe_daily.sql)

In [None]:
! vdk run tracking-covid-avramov-man-united-boza

## Step 5: Data Job - Define the Covid Deaths Table (02_create_covid_deaths_europe_daily.sql)

## Step 6: Data Job - Define the Covid Cases and Deaths Clean Table (03_create_clean_full_table.sql)

## Step 7: Data Job - Incrementally Ingest Covid Cases Data (10_ingest_covid_cases_data.py)
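
As a rough illustration of what "incremental" means here, the sketch below keeps only the rows that are newer than the latest date already stored, so that re-running the job does not ingest the same data twice. The function name, the column names, and the sample rows are all made up for this example; the real logic lives in 10_ingest_covid_cases_data.py, which hands the rows to VDK for ingestion.

```python
from datetime import date


def select_new_rows(fetched_rows, last_ingested_date):
    """Keep only rows dated strictly after the last ingested date."""
    if last_ingested_date is None:  # empty table: ingest everything
        return list(fetched_rows)
    return [row for row in fetched_rows if row["obs_date"] > last_ingested_date]


# Made-up rows standing in for an API response.
fetched = [
    {"obs_date": date(2021, 1, 1), "country": "A", "cases": 10},
    {"obs_date": date(2021, 1, 2), "country": "A", "cases": 12},
    {"obs_date": date(2021, 1, 3), "country": "A", "cases": 15},
]

new_rows = select_new_rows(fetched, last_ingested_date=date(2021, 1, 1))
print([row["obs_date"].isoformat() for row in new_rows])
# ['2021-01-02', '2021-01-03']
```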

## Step 8: Data Job - Incrementally Ingest Covid Deaths Data  (20_ingest_covid_deaths_data.py)

## Step 9: Data Job - Incrementally Build Covid Cases and Deaths Clean Data (30_clean_merge_transform.py)
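
The join-and-clean idea from Step 3 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the column names and sample rows are invented, the join is an inner join on date and country, and the single cleaning rule (dropping rows with a missing count) is a stand-in for whatever 30_clean_merge_transform.py actually does.

```python
def merge_and_clean(cases_rows, deaths_rows):
    """Inner-join cases and deaths on (date, country) and drop incomplete rows."""
    deaths_by_key = {(r["obs_date"], r["country"]): r["deaths"] for r in deaths_rows}
    merged = []
    for row in cases_rows:
        key = (row["obs_date"], row["country"])
        if key not in deaths_by_key:  # no matching deaths row: skip
            continue
        cases, deaths = row["cases"], deaths_by_key[key]
        if cases is None or deaths is None:  # cleaning rule: drop missing values
            continue
        merged.append(
            {"obs_date": key[0], "country": key[1], "cases": cases, "deaths": deaths}
        )
    return merged


# Made-up sample rows.
cases_rows = [
    {"obs_date": "2021-01-01", "country": "A", "cases": 10},
    {"obs_date": "2021-01-02", "country": "A", "cases": None},
    {"obs_date": "2021-01-03", "country": "A", "cases": 15},
]
deaths_rows = [
    {"obs_date": "2021-01-01", "country": "A", "deaths": 1},
    {"obs_date": "2021-01-02", "country": "A", "deaths": 2},
]

clean_rows = merge_and_clean(cases_rows, deaths_rows)
print(clean_rows)
# [{'obs_date': '2021-01-01', 'country': 'A', 'cases': 10, 'deaths': 1}]
```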

## Step 10: Building an Interactive Streamlit Dashboard (build_streamlit_dashboard.py)

In [None]:
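import os
# Fall back to a placeholder if JUPYTERHUB_USER is unset, instead of raising a TypeError.
user = os.environ.get("JUPYTERHUB_USER", "<your-user>")
print("Open streamlit (in a new tab) at this link:")
print(f"https://notebooks.gesis.org/binder/jupyter/user/{user}/proxy/8501/")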

In [None]:
! streamlit run build_streamlit_dashboard.py