# COVID Database setup and initialization in SQLite

The database is named `COVID_DB`. It is created and initialized from Python, with SQLite as the database management system and Python handling the scripting.

We'll use Python to create and initialize the database, and we'll load the contents of the 2 COVID <abbr title="Comma Separated Values">CSV</abbr> files into it as tables.

## Connecting to Google Drive (if necessary)
When this project is run from Google Drive, the `drive` object must be imported from the `google.colab` Python module. This step is only needed when this `.ipynb` file, and the SQLite database it uses, are stored in Google Drive.

A mount point must be given so that Google Drive can be navigated; here it is `/content/drive`. Mounting asks you to authenticate with your Google Drive account, and a username and password may be requested. Then choose "Allow" to let this `.ipynb` file create, add, delete, and modify files in whichever Google Drive account is in use.

In [None]:
from google.colab import drive
drive.mount("/content/drive")

Mounted at /content/drive
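
Because the mount is only needed in Colab, one option is to guard it so the same notebook also runs on a local machine. The following is a minimal sketch, not part of the original notebook; the helper name `mount_drive_if_colab` is hypothetical.

```python
def mount_drive_if_colab(mountpoint="/content/drive"):
    """Mount Google Drive when running in Colab; do nothing elsewhere.

    Returns True if the drive was mounted, False if google.colab is unavailable.
    """
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        return False
    drive.mount(mountpoint)
    return True


in_colab = mount_drive_if_colab()
```

The returned flag could then be used to pick between the Google Drive paths and the local ones.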


## The <abbr title="Comma Separated Values">CSV</abbr> files to use in this project
The COVID-19 data comes from [*Coronavirus (COVID-19) Deaths*](https://ourworldindata.org/covid-deaths) by <u>**Our World in Data**</u> and has already been separated into 2 CSV files: `CovidDeaths.csv` and `CovidVaccinations.csv`. Each of these 2 CSV files will become its own table in the `COVID_DB` database. The code below shows the path to each file and should be modified if the files are moved to a different location in Google Drive.

In [None]:
path_covid_death = "/content/drive/MyDrive/Documents/Data Analysis Projects/SQL Data Exploration/CovidDeaths.csv"
path_covid_vacc = "/content/drive/MyDrive/Documents/Data Analysis Projects/SQL Data Exploration/CovidVaccinations.csv"
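
Since both paths share one folder, they can also be built from a single base directory and checked before any file is read. This is a sketch under assumptions: `base_dir` and `missing_files` are illustrative names, and the folder is the same Google Drive location used above.

```python
from pathlib import Path

# Assumed base folder: the same Google Drive location used in the cell above.
base_dir = Path("/content/drive/MyDrive/Documents/Data Analysis Projects/SQL Data Exploration")

path_covid_death = base_dir / "CovidDeaths.csv"
path_covid_vacc = base_dir / "CovidVaccinations.csv"


def missing_files(paths):
    """Return the paths (as strings) that do not point to an existing file."""
    return [str(p) for p in paths if not Path(p).is_file()]


missing = missing_files([path_covid_death, path_covid_vacc])
if missing:
    print("Not found, update the paths before continuing:", missing)
```

If the files move, only `base_dir` needs to change. `pd.read_csv` accepts `Path` objects as well as strings.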

## The modules (libraries) needed for this project

The Pandas module converts a `csv` file into a DataFrame, which in turn can be converted into a table in `COVID_DB`. A second module, SQLAlchemy, lets us work with `COVID_DB` in an object-oriented, programmatic way. We also need a module for SQLite itself, `sqlite3`, to create, modify, or delete a simple database such as `COVID_DB`.

In [None]:
import numpy as np
import pandas as pd

In [None]:
import sqlite3
from sqlalchemy import create_engine

If I'm using my Google Drive account:

In [None]:
engine = create_engine("sqlite:////content/drive/MyDrive/Documents/Data Analysis Projects/SQL Data Exploration/COVID_DB.db")

If I'm using my own computer:

In [None]:
# Named `engine`, like the Google Drive cell, because the cells below refer to `engine`.
engine = create_engine(r"sqlite:///C:\Users\jcwol\Google Drive\Documents\Data Analysis Projects\SQL Data Exploration\COVID_DB.db")
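
Both URLs follow the same pattern: the prefix `sqlite:///` followed by the path to the database file. An absolute Unix-style path already starts with `/`, which is why the Google Drive URL has four slashes in a row. A minimal sketch of that rule, where `sqlite_url` is a hypothetical helper and not a SQLAlchemy function:

```python
def sqlite_url(db_path):
    """Build a SQLAlchemy-style SQLite URL from a database file path."""
    # "sqlite:///" is the fixed prefix; the path is appended unchanged.
    return "sqlite:///" + str(db_path)


drive_url = sqlite_url(
    "/content/drive/MyDrive/Documents/Data Analysis Projects/SQL Data Exploration/COVID_DB.db"
)
relative_url = sqlite_url("COVID_DB.db")

print(drive_url)     # absolute path: four slashes after "sqlite:"
print(relative_url)  # relative path: three slashes after "sqlite:"
```

Either string can be passed to `create_engine`.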

## Reading the `CSV` files and converting them into DataFrames
Now we can get our hands dirty: Pandas reads each `csv` file and stores it as a DataFrame.

Disclaimer: This data has no primary keys that we could use when creating the `COVID_DB` database. Data cleaning is also not done in this Python notebook. The goal here is only to create and populate a database.

In [None]:
df_covid_death = pd.read_csv(path_covid_death)
df_covid_vacc = pd.read_csv(path_covid_vacc)
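
To see what `pd.read_csv` produces before looking at the full output, here is a small sketch on an inline sample. The three rows reuse values from the deaths data shown in this notebook, restricted to a few of its columns; the names `sample_csv` and `df_sample` are illustrative.

```python
import io

import pandas as pd

# A tiny stand-in for CovidDeaths.csv with only six of its columns.
sample_csv = io.StringIO(
    "iso_code,location,date,population,total_cases,total_deaths\n"
    "AFG,Afghanistan,2020-02-24,39835428.0,5.0,\n"
    "AFG,Afghanistan,2020-02-25,39835428.0,5.0,\n"
    "ZWE,Zimbabwe,2022-01-25,15092171.0,228776.0,5316.0\n"
)
df_sample = pd.read_csv(sample_csv)

# Empty fields become NaN, so numeric columns with gaps are stored as floats.
print(df_sample.dtypes)
print(df_sample["total_deaths"].isna().sum())  # number of missing values
```

The `date` column is read as text by default; passing `parse_dates=["date"]` to `pd.read_csv` would convert it to datetimes.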

In [None]:
df_covid_death

Unnamed: 0,iso_code,continent,location,date,population,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,new_deaths_smoothed,total_cases_per_million,new_cases_per_million,new_cases_smoothed_per_million,total_deaths_per_million,new_deaths_per_million,new_deaths_smoothed_per_million,reproduction_rate,icu_patients,icu_patients_per_million,hosp_patients,hosp_patients_per_million,weekly_icu_admissions,weekly_icu_admissions_per_million,weekly_hosp_admissions,weekly_hosp_admissions_per_million
0,AFG,Asia,Afghanistan,2020-02-24,39835428.0,5.0,5.0,,,,,0.126,0.126,,,,,,,,,,,,,
1,AFG,Asia,Afghanistan,2020-02-25,39835428.0,5.0,0.0,,,,,0.126,0.000,,,,,,,,,,,,,
2,AFG,Asia,Afghanistan,2020-02-26,39835428.0,5.0,0.0,,,,,0.126,0.000,,,,,,,,,,,,,
3,AFG,Asia,Afghanistan,2020-02-27,39835428.0,5.0,0.0,,,,,0.126,0.000,,,,,,,,,,,,,
4,AFG,Asia,Afghanistan,2020-02-28,39835428.0,5.0,0.0,,,,,0.126,0.000,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
157471,ZWE,Africa,Zimbabwe,2022-01-22,15092171.0,228179.0,218.0,363.143,5292.0,4.0,7.714,15119.031,14.445,24.062,350.645,0.265,0.511,0.6,,,,,,,,
157472,ZWE,Africa,Zimbabwe,2022-01-23,15092171.0,228254.0,75.0,310.857,5294.0,2.0,6.714,15124.000,4.969,20.597,350.778,0.133,0.445,0.6,,,,,,,,
157473,ZWE,Africa,Zimbabwe,2022-01-24,15092171.0,228541.0,287.0,297.286,5305.0,11.0,6.714,15143.017,19.016,19.698,351.507,0.729,0.445,,,,,,,,,
157474,ZWE,Africa,Zimbabwe,2022-01-25,15092171.0,228776.0,235.0,330.857,5316.0,11.0,8.286,15158.588,15.571,21.922,352.236,0.729,0.549,,,,,,,,,


In [None]:
df_covid_vacc

Unnamed: 0,iso_code,continent,location,date,new_tests,total_tests,total_tests_per_thousand,new_tests_per_thousand,new_tests_smoothed,new_tests_smoothed_per_thousand,positive_rate,tests_per_case,tests_units,total_vaccinations,people_vaccinated,people_fully_vaccinated,total_boosters,new_vaccinations,new_vaccinations_smoothed,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,total_boosters_per_hundred,new_vaccinations_smoothed_per_million,new_people_vaccinated_smoothed,new_people_vaccinated_smoothed_per_hundred,stringency_index,Unnamed: 27,population_density,median_age,aged_65_older,aged_70_older,gdp_per_capita,extreme_poverty,cardiovasc_death_rate,diabetes_prevalence,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,excess_mortality_cumulative_absolute,excess_mortality_cumulative,excess_mortality,excess_mortality_cumulative_per_million
0,AFG,Asia,Afghanistan,2020-02-24,,,,,,,,,,,,,,,,,,,,,,,8.33,,54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83,0.511,,,,
1,AFG,Asia,Afghanistan,2020-02-25,,,,,,,,,,,,,,,,,,,,,,,8.33,,54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83,0.511,,,,
2,AFG,Asia,Afghanistan,2020-02-26,,,,,,,,,,,,,,,,,,,,,,,8.33,,54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83,0.511,,,,
3,AFG,Asia,Afghanistan,2020-02-27,,,,,,,,,,,,,,,,,,,,,,,8.33,,54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83,0.511,,,,
4,AFG,Asia,Afghanistan,2020-02-28,,,,,,,,,,,,,,,,,,,,,,,8.33,,54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83,0.511,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
157471,ZWE,Africa,Zimbabwe,2022-01-22,2626.0,1822879.0,120.783,0.174,4145.0,0.275,0.0876,11.4,tests performed,7506786.0,4239537.0,3267249.0,,9904.0,10567.0,49.74,28.09,21.65,,700.0,5058.0,0.034,,,42.729,19.6,2.822,1.882,1899.775,21.4,307.846,1.82,1.6,30.7,36.791,1.7,61.49,0.571,,,,
157472,ZWE,Africa,Zimbabwe,2022-01-23,1541.0,1824420.0,120.885,0.102,3912.0,0.259,0.0795,12.6,tests performed,7512903.0,4242647.0,3270256.0,,6117.0,10631.0,49.78,28.11,21.67,,704.0,5182.0,0.034,,,42.729,19.6,2.822,1.882,1899.775,21.4,307.846,1.82,1.6,30.7,36.791,1.7,61.49,0.571,,,,
157473,ZWE,Africa,Zimbabwe,2022-01-24,4913.0,1829333.0,121.211,0.326,4043.0,0.268,0.0735,13.6,tests performed,7517985.0,4245063.0,3272922.0,,5082.0,10273.0,49.81,28.13,21.69,,681.0,5009.0,0.033,,,42.729,19.6,2.822,1.882,1899.775,21.4,307.846,1.82,1.6,30.7,36.791,1.7,61.49,0.571,,,,
157474,ZWE,Africa,Zimbabwe,2022-01-25,,,,,,,,,,7525574.0,4248576.0,3276998.0,,7589.0,9579.0,49.86,28.15,21.71,,635.0,4638.0,0.031,,,42.729,19.6,2.822,1.882,1899.775,21.4,307.846,1.82,1.6,30.7,36.791,1.7,61.49,0.571,,,,


## Creating the Tables in `COVID_DB`

At this point we can use Pandas to create the tables `CovidDeaths` and `CovidVaccinations` from the `df_covid_death` and `df_covid_vacc` DataFrames. The `COVID_DB` database itself is created indirectly, the moment Pandas' `to_sql` method first writes to it.

In case the entire notebook needs to be rerun from top to bottom, `if_exists='replace'` recreates the tables in `COVID_DB`, so rerunning these code blocks produces no errors.

In [None]:
df_covid_death.to_sql("CovidDeaths", engine, if_exists='replace')

In [None]:
df_covid_vacc.to_sql("CovidVaccinations", engine, if_exists='replace')
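
The effect of `if_exists='replace'` can be checked on a throwaway in-memory database. This is a sketch under assumptions: it uses a plain `sqlite3` connection instead of the SQLAlchemy `engine`, and the table name `Demo` and its two rows are made up for illustration.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # temporary database, nothing is written to disk
df_demo = pd.DataFrame({"iso_code": ["AFG", "ZWE"], "total_cases": [5.0, 228776.0]})

# Writing twice with if_exists="replace" drops and recreates the table,
# so the second run neither fails nor duplicates rows.
df_demo.to_sql("Demo", conn, if_exists="replace")
df_demo.to_sql("Demo", conn, if_exists="replace")

row_count = conn.execute("SELECT COUNT(*) FROM Demo").fetchone()[0]
columns = [row[1] for row in conn.execute("PRAGMA table_info(Demo)")]
print(row_count)
print(columns)
conn.close()
```

By default `to_sql` stores the DataFrame index as an extra `index` column, which is why that column shows up when the tables are read back. Passing `index=False` would leave it out.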

## Checking that the database and tables exist

In [None]:
sql_df_death = pd.read_sql("CovidDeaths", engine)

In [None]:
sql_df_death

Unnamed: 0,index,iso_code,continent,location,date,population,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,new_deaths_smoothed,total_cases_per_million,new_cases_per_million,new_cases_smoothed_per_million,total_deaths_per_million,new_deaths_per_million,new_deaths_smoothed_per_million,reproduction_rate,icu_patients,icu_patients_per_million,hosp_patients,hosp_patients_per_million,weekly_icu_admissions,weekly_icu_admissions_per_million,weekly_hosp_admissions,weekly_hosp_admissions_per_million
0,0,AFG,Asia,Afghanistan,2020-02-24,39835428.0,5.0,5.0,,,,,0.126,0.126,,,,,,,,,,,,,
1,1,AFG,Asia,Afghanistan,2020-02-25,39835428.0,5.0,0.0,,,,,0.126,0.000,,,,,,,,,,,,,
2,2,AFG,Asia,Afghanistan,2020-02-26,39835428.0,5.0,0.0,,,,,0.126,0.000,,,,,,,,,,,,,
3,3,AFG,Asia,Afghanistan,2020-02-27,39835428.0,5.0,0.0,,,,,0.126,0.000,,,,,,,,,,,,,
4,4,AFG,Asia,Afghanistan,2020-02-28,39835428.0,5.0,0.0,,,,,0.126,0.000,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
157471,157471,ZWE,Africa,Zimbabwe,2022-01-22,15092171.0,228179.0,218.0,363.143,5292.0,4.0,7.714,15119.031,14.445,24.062,350.645,0.265,0.511,0.6,,,,,,,,
157472,157472,ZWE,Africa,Zimbabwe,2022-01-23,15092171.0,228254.0,75.0,310.857,5294.0,2.0,6.714,15124.000,4.969,20.597,350.778,0.133,0.445,0.6,,,,,,,,
157473,157473,ZWE,Africa,Zimbabwe,2022-01-24,15092171.0,228541.0,287.0,297.286,5305.0,11.0,6.714,15143.017,19.016,19.698,351.507,0.729,0.445,,,,,,,,,
157474,157474,ZWE,Africa,Zimbabwe,2022-01-25,15092171.0,228776.0,235.0,330.857,5316.0,11.0,8.286,15158.588,15.571,21.922,352.236,0.729,0.549,,,,,,,,,


In [None]:
sql_df_vacc = pd.read_sql("CovidVaccinations", engine)

In [None]:
sql_df_vacc

Unnamed: 0,index,iso_code,continent,location,date,new_tests,total_tests,total_tests_per_thousand,new_tests_per_thousand,new_tests_smoothed,new_tests_smoothed_per_thousand,positive_rate,tests_per_case,tests_units,total_vaccinations,people_vaccinated,people_fully_vaccinated,total_boosters,new_vaccinations,new_vaccinations_smoothed,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,total_boosters_per_hundred,new_vaccinations_smoothed_per_million,new_people_vaccinated_smoothed,new_people_vaccinated_smoothed_per_hundred,stringency_index,Unnamed: 27,population_density,median_age,aged_65_older,aged_70_older,gdp_per_capita,extreme_poverty,cardiovasc_death_rate,diabetes_prevalence,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,excess_mortality_cumulative_absolute,excess_mortality_cumulative,excess_mortality,excess_mortality_cumulative_per_million
0,0,AFG,Asia,Afghanistan,2020-02-24,,,,,,,,,,,,,,,,,,,,,,,8.33,,54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83,0.511,,,,
1,1,AFG,Asia,Afghanistan,2020-02-25,,,,,,,,,,,,,,,,,,,,,,,8.33,,54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83,0.511,,,,
2,2,AFG,Asia,Afghanistan,2020-02-26,,,,,,,,,,,,,,,,,,,,,,,8.33,,54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83,0.511,,,,
3,3,AFG,Asia,Afghanistan,2020-02-27,,,,,,,,,,,,,,,,,,,,,,,8.33,,54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83,0.511,,,,
4,4,AFG,Asia,Afghanistan,2020-02-28,,,,,,,,,,,,,,,,,,,,,,,8.33,,54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83,0.511,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
157471,157471,ZWE,Africa,Zimbabwe,2022-01-22,2626.0,1822879.0,120.783,0.174,4145.0,0.275,0.0876,11.4,tests performed,7506786.0,4239537.0,3267249.0,,9904.0,10567.0,49.74,28.09,21.65,,700.0,5058.0,0.034,,,42.729,19.6,2.822,1.882,1899.775,21.4,307.846,1.82,1.6,30.7,36.791,1.7,61.49,0.571,,,,
157472,157472,ZWE,Africa,Zimbabwe,2022-01-23,1541.0,1824420.0,120.885,0.102,3912.0,0.259,0.0795,12.6,tests performed,7512903.0,4242647.0,3270256.0,,6117.0,10631.0,49.78,28.11,21.67,,704.0,5182.0,0.034,,,42.729,19.6,2.822,1.882,1899.775,21.4,307.846,1.82,1.6,30.7,36.791,1.7,61.49,0.571,,,,
157473,157473,ZWE,Africa,Zimbabwe,2022-01-24,4913.0,1829333.0,121.211,0.326,4043.0,0.268,0.0735,13.6,tests performed,7517985.0,4245063.0,3272922.0,,5082.0,10273.0,49.81,28.13,21.69,,681.0,5009.0,0.033,,,42.729,19.6,2.822,1.882,1899.775,21.4,307.846,1.82,1.6,30.7,36.791,1.7,61.49,0.571,,,,
157474,157474,ZWE,Africa,Zimbabwe,2022-01-25,,,,,,,,,,7525574.0,4248576.0,3276998.0,,7589.0,9579.0,49.86,28.15,21.71,,635.0,4638.0,0.031,,,42.729,19.6,2.822,1.882,1899.775,21.4,307.846,1.82,1.6,30.7,36.791,1.7,61.49,0.571,,,,
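
Reading the tables back with Pandas confirms that they exist. Another way, sketched here with only the standard-library `sqlite3` module, is to ask SQLite for its table list from `sqlite_master`. The helper `list_tables` is hypothetical, and the demo uses an in-memory database with empty stand-in tables rather than the real `COVID_DB.db` file.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the connected SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
# Stand-in tables with the same names as the ones created in this notebook.
conn.execute("CREATE TABLE CovidDeaths (iso_code TEXT, date TEXT)")
conn.execute("CREATE TABLE CovidVaccinations (iso_code TEXT, date TEXT)")

tables = list_tables(conn)
print(tables)
conn.close()
```

Pointing `sqlite3.connect` at the path of `COVID_DB.db` instead of `:memory:` would run the same check against the real database.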


## Database creation, initialization, and setup have been completed.