## This Notebook:

1) links to the AWS database

2) cleans the AWS database

3) creates a local path for the output Cases_Cleaned/ML_cases_vcd.csv

4) creates a local path for the output Deaths_Cleaned/ML_deaths_vcd.csv

5) ML model for cases reads ML_cases_vcd.csv

6) ML model for cases saves output to user-defined location

7) ML model for deaths reads ML_deaths_vcd.csv

8) ML model for deaths saves output to user-defined location

9) creates PostgreSQL database for machine learning models


## Working:

w1)  create data input<br>
w2)  format data into Pandas DataFrame<br>
w3)  import DataFrame into PostgreSQL database (locally)<br>
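
Steps w1 to w3 can be sketched as follows. This is a minimal sketch, not the notebook's actual loader: it uses an in-memory SQLite connection from the standard library as a stand-in so that it runs without a database server, and the table name `model_results` and the sample row are hypothetical. For a local PostgreSQL database, `con` would instead be a SQLAlchemy engine created from that database's connection URL.

```python
import sqlite3

import pandas as pd

# w1) create data input (hypothetical sample row, not real results)
record = {"notebook": 1, "model": "cases", "acc_score": 0.5}

# w2) format data into a pandas DataFrame (one row per run)
df_demo = pd.DataFrame([record])

# w3) import the DataFrame into a database table.
# Assumption: SQLite stands in for PostgreSQL here.
con = sqlite3.connect(":memory:")
df_demo.to_sql("model_results", con, if_exists="append", index=False)

# read the table back to confirm the round trip
df_back = pd.read_sql("SELECT * FROM model_results", con)
con.close()

print(df_back)
```

Using `if_exists="append"` means each notebook run adds rows to the same table instead of replacing it.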


In [1]:
# import dependencies

import pandas as pd


In [2]:
#results database information

name_nb = "ML_pn_rev1"

In [72]:
#results database information
run_nb = {}
run_counter = 2
# run_counter +=1
run_nb['notebook'] = run_counter
run_nb

{'notebook': 2}

## AWS db cleaner


**RELEVANT DATAFRAMES:  df, df_cases, df_deaths**

FILE: vax_cases_deaths.csv

SOURCE:  AWS download from SQL database


In [4]:
#Import dependencies

import re


### **Step 1:**  

Read AWS file into Pandas

In [5]:
# read the file

file_path = "https://initial-datasets.s3.amazonaws.com/vax_cases_deaths.csv"
df = pd.read_csv(file_path)
df

Unnamed: 0,date,mmwr_week,year,location,distributed,administered,total_cases,total_deaths
0,2020-12-19,51,2020,AK,26325,1607,42659,267
1,2020-12-26,52,2020,AK,45250,11427,44394,271
2,2021-01-02,53,2021,AK,54975,18401,46530,287
3,2021-01-09,1,2021,AK,92875,28539,48571,297
4,2021-01-16,2,2021,AK,149650,54193,50264,300
...,...,...,...,...,...,...,...,...
2387,2021-10-02,39,2021,WY,674445,521105,91439,996
2388,2021-10-09,40,2021,WY,690305,536884,94580,1041
2389,2021-10-16,41,2021,WY,698865,547566,97479,1080
2390,2021-10-23,42,2021,WY,716325,557340,100174,1149


In [6]:
#results database information


if file_path == "https://initial-datasets.s3.amazonaws.com/vax_cases_deaths.csv":
    source_db = "AWS database csv file"
    file_id = file_path





In [7]:
df.columns

Index(['date', 'mmwr_week', 'year', 'location', 'distributed', 'administered',
       'total_cases', 'total_deaths'],
      dtype='object')

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2392 entries, 0 to 2391
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   date          2392 non-null   object
 1   mmwr_week     2392 non-null   int64 
 2   year          2392 non-null   int64 
 3   location      2392 non-null   object
 4   distributed   2392 non-null   int64 
 5   administered  2392 non-null   int64 
 6   total_cases   2392 non-null   int64 
 7   total_deaths  2392 non-null   int64 
dtypes: int64(6), object(2)
memory usage: 149.6+ KB


### **Step 2:**  

Convert location to string and drop the date column


In [9]:
# location to string

df["location"] = df["location"].astype(str)
df["location"].dtypes

dtype('O')

In [10]:
# Delete the column "date".

df.drop(columns = ["date"], inplace = True)



### **Step 4:** 

Make the label columns

In [11]:
# Add the label columns to df. 

df["2020_mean_cases"] = 0
df["2020_mean_deaths"] = 0


In [12]:
# mean cases: [2020 only, 2020 and 2021 combined]

mean_cases = [90135, 325018]

mean_cases_value = mean_cases[0]

# mean deaths: [2020 only, 2020 and 2021 combined]

mean_deaths = [2634, 6365]

mean_deaths_value = mean_deaths[0]

# Populate "2020_mean_cases" with 1 or 0 
# Populate "2020_mean_deaths" with 1 or 0 

# cases: label is 1 when total_cases is at or above the 2020-only mean

for index, row in df.iterrows():
    x = row["total_cases"]
    if x >= int(mean_cases_value):
        df.loc[index, "2020_mean_cases"]=1
    else:
        df.loc[index, "2020_mean_cases"]=0

print(df["2020_mean_cases"].value_counts())

# deaths

for index, row in df.iterrows():
    x = row["total_deaths"]
    if x >= int(mean_deaths_value):
        df.loc[index, "2020_mean_deaths"]=1
    else:
        df.loc[index, "2020_mean_deaths"]=0

print(df["2020_mean_deaths"].value_counts())


1    2048
0     344
Name: 2020_mean_cases, dtype: int64
1    1668
0     724
Name: 2020_mean_deaths, dtype: int64
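
The two row-by-row loops can also be written as vectorized comparisons. The sketch below uses a small hypothetical frame (`df_demo`) in place of `df`, with the 2020 thresholds from the cell above. It is an equivalent formulation under that assumption, not the code that produced the counts shown.

```python
import pandas as pd

# Thresholds taken from the cell above (2020-only means).
mean_cases_value = 90135
mean_deaths_value = 2634

# Hypothetical stand-in for df with three rows.
df_demo = pd.DataFrame({
    "total_cases": [42659, 90135, 325018],
    "total_deaths": [267, 2633, 6365],
})

# A boolean comparison cast to int gives 1 at or above the threshold, else 0.
df_demo["2020_mean_cases"] = (df_demo["total_cases"] >= mean_cases_value).astype(int)
df_demo["2020_mean_deaths"] = (df_demo["total_deaths"] >= mean_deaths_value).astype(int)

print(df_demo)
```

The comparison runs on whole columns at once, so it avoids the per-row `df.loc` assignment inside `iterrows()`.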


In [13]:
#delete columns "total_cases" and "total_deaths"

df.drop(columns = ["total_cases", "total_deaths"], inplace = True)


In [14]:
df.head()

Unnamed: 0,mmwr_week,year,location,distributed,administered,2020_mean_cases,2020_mean_deaths
0,51,2020,AK,26325,1607,0,0
1,52,2020,AK,45250,11427,0,0
2,53,2021,AK,54975,18401,0,0
3,1,2021,AK,92875,28539,0,0
4,2,2021,AK,149650,54193,0,0


In [15]:
#results database information


#make copies for statistical analysis only

#Xa = df_cases_2020.copy()
#Xb = df.copy()

# CASES:  describe Xa and make dataframe

#stats_Xa = Xa["tot_cases"].describe()
#stats_Xa_cases_df = pd.DataFrame(stats_Xa)
#stats_Xa_cases_ds = stats_Xa_cases_df["tot_cases"].squeeze()

# CASES:  describe Xa and make dataframe

#stats_Xb = Xb["tot_cases"].describe()
#stats_Xb_cases_df = pd.DataFrame(stats_Xb)
#stats_Xb_cases_ds = stats_Xb_cases_df["tot_cases"].squeeze()

# DEATHS:  describe Xa and make dataframe

#stats_Xa = Xa["tot_death"].describe()
#stats_Xa_death_df = pd.DataFrame(stats_Xa)
#stats_Xa_death_ds = stats_Xa_death_df["tot_death"].squeeze()

# DEATHS:  describe Xa and make dataframe

#stats_Xb = Xb["tot_death"].describe()
#stats_Xb_death_df = pd.DataFrame(stats_Xb)
#stats_Xb_death_ds = stats_Xb_death_df["tot_death"].squeeze()

#stats_Xa_cases_ds.to_dict()



In [16]:
# name of the statistics dataset used for the label column (name_statsfile)

#name_statsfile = "stats_Xa_cases_ds"

#the statistic used for the setting the label column (name_statistic)

#name_statistic = "mean"


### **Step 5:** 

Perform OneHotEncoding on object columns

In [17]:
# import dependencies

from sklearn.preprocessing import OneHotEncoder, LabelEncoder


In [18]:
obj_list = df.dtypes[df.dtypes == "object"].index.to_list()
obj_list

['location']

In [19]:
# Apply OneHotEncoder to objects

enc = OneHotEncoder(sparse = False)
encoded_df = pd.DataFrame(enc.fit_transform(df[obj_list]))
encoded_df.columns = enc.get_feature_names(obj_list)
encoded_df.head(200)

Unnamed: 0,location_AK,location_AL,location_AR,location_AZ,location_CA,location_CO,location_CT,location_DC,location_DE,location_FL,...,location_TN,location_TX,location_UT,location_VA,location_VI,location_VT,location_WA,location_WI,location_WV,location_WY
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
196,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
197,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
198,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
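
For a single categorical column, `pd.get_dummies` produces the same kind of indicator columns without scikit-learn. The sketch below is an alternative under stated assumptions, not the encoder used above: `df_demo` is a hypothetical three-row frame, and `dtype=float` is chosen to mirror the 1.0/0.0 values in the output above. It may also help if the installed scikit-learn no longer accepts `sparse=` or `get_feature_names`, which later releases renamed to `sparse_output=` and `get_feature_names_out`.

```python
import pandas as pd

# Hypothetical stand-in for df with one object column.
df_demo = pd.DataFrame({
    "location": ["AK", "AL", "AK"],
    "administered": [1607, 11427, 18401],
})

# One indicator column per category, named "<column>_<value>".
# The original "location" column is dropped automatically.
encoded_demo = pd.get_dummies(df_demo, columns=["location"], dtype=float)

print(encoded_demo.columns.tolist())
```

Because `get_dummies` returns the encoded frame directly, the separate merge and drop steps are not needed in this variant.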


In [20]:
df = df.merge(encoded_df, left_index = True, right_index = True)
df = df.drop(columns=obj_list)
df.head(200)

Unnamed: 0,mmwr_week,year,distributed,administered,2020_mean_cases,2020_mean_deaths,location_AK,location_AL,location_AR,location_AZ,...,location_TN,location_TX,location_UT,location_VA,location_VI,location_VT,location_WA,location_WI,location_WV,location_WY
0,51,2020,26325,1607,0,0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,52,2020,45250,11427,0,0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,53,2021,54975,18401,0,0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,1,2021,92875,28539,0,0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2,2021,149650,54193,0,0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,9,2021,13885120,10415023,1,1,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
196,10,2021,16376020,11881857,1,1,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
197,11,2021,18875980,14276125,1,1,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
198,12,2021,21865730,16944916,1,1,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [21]:
df.columns

Index(['mmwr_week', 'year', 'distributed', 'administered', '2020_mean_cases',
       '2020_mean_deaths', 'location_AK', 'location_AL', 'location_AR',
       'location_AZ', 'location_CA', 'location_CO', 'location_CT',
       'location_DC', 'location_DE', 'location_FL', 'location_GA',
       'location_HI', 'location_IA', 'location_ID', 'location_IL',
       'location_IN', 'location_KS', 'location_KY', 'location_LA',
       'location_MA', 'location_MD', 'location_ME', 'location_MI',
       'location_MN', 'location_MO', 'location_MS', 'location_MT',
       'location_NC', 'location_ND', 'location_NE', 'location_NH',
       'location_NJ', 'location_NM', 'location_NV', 'location_NY',
       'location_OH', 'location_OK', 'location_OR', 'location_PA',
       'location_RI', 'location_SC', 'location_SD', 'location_TN',
       'location_TX', 'location_UT', 'location_VA', 'location_VI',
       'location_VT', 'location_WA', 'location_WI', 'location_WV',
       'location_WY'],
      dtype='object')

### **Step 6:** 

Make dataframes for cases and deaths

In [22]:
# Make new dataframes for cases and deaths.

# first reorder columns

columns_cases = ['mmwr_week', 'year', 'distributed', 'administered', '2020_mean_cases',
       '2020_mean_deaths', 'location_AK', 'location_AL', 'location_AR',
       'location_AZ', 'location_CA', 'location_CO', 'location_CT',
       'location_DC', 'location_DE', 'location_FL', 'location_GA',
       'location_HI', 'location_IA', 'location_ID', 'location_IL',
       'location_IN', 'location_KS', 'location_KY', 'location_LA',
       'location_MA', 'location_MD', 'location_ME', 'location_MI',
       'location_MN', 'location_MO', 'location_MS', 'location_MT',
       'location_NC', 'location_ND', 'location_NE', 'location_NH',
       'location_NJ', 'location_NM', 'location_NV', 'location_NY',
       'location_OH', 'location_OK', 'location_OR', 'location_PA',
       'location_RI', 'location_SC', 'location_SD', 'location_TN',
       'location_TX', 'location_UT', 'location_VA', 'location_VI',
       'location_VT', 'location_WA', 'location_WI', 'location_WV',
       'location_WY']

columns_cases_new = [ 'year', 'mmwr_week', 'distributed', 'administered', 
        'location_AK', 'location_AL', 'location_AR',
       'location_AZ', 'location_CA', 'location_CO', 'location_CT',
       'location_DC', 'location_DE', 'location_FL', 'location_GA',
       'location_HI', 'location_IA', 'location_ID', 'location_IL',
       'location_IN', 'location_KS', 'location_KY', 'location_LA',
       'location_MA', 'location_MD', 'location_ME', 'location_MI',
       'location_MN', 'location_MO', 'location_MS', 'location_MT',
       'location_NC', 'location_ND', 'location_NE', 'location_NH',
       'location_NJ', 'location_NM', 'location_NV', 'location_NY',
       'location_OH', 'location_OK', 'location_OR', 'location_PA',
       'location_RI', 'location_SC', 'location_SD', 'location_TN',
       'location_TX', 'location_UT', 'location_VA', 'location_VI',
       'location_VT', 'location_WA', 'location_WI', 'location_WV',
       'location_WY', '2020_mean_cases', '2020_mean_deaths']

df = df.reindex(columns = columns_cases_new )

# next drop out death-related columns for df_cases

df_cases = df.copy()
df_cases.drop(columns = ['2020_mean_deaths'], inplace = True)

df_deaths = df.copy()
df_deaths.drop(columns = ['2020_mean_cases'], inplace = True)

In [23]:
df_cases.head()

Unnamed: 0,year,mmwr_week,distributed,administered,location_AK,location_AL,location_AR,location_AZ,location_CA,location_CO,...,location_TX,location_UT,location_VA,location_VI,location_VT,location_WA,location_WI,location_WV,location_WY,2020_mean_cases
0,2020,51,26325,1607,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
1,2020,52,45250,11427,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
2,2021,53,54975,18401,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
3,2021,1,92875,28539,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
4,2021,2,149650,54193,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0


In [24]:
df_deaths.head()

Unnamed: 0,year,mmwr_week,distributed,administered,location_AK,location_AL,location_AR,location_AZ,location_CA,location_CO,...,location_TX,location_UT,location_VA,location_VI,location_VT,location_WA,location_WI,location_WV,location_WY,2020_mean_deaths
0,2020,51,26325,1607,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
1,2020,52,45250,11427,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
2,2021,53,54975,18401,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
3,2021,1,92875,28539,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
4,2021,2,149650,54193,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
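
The hand-written column lists can be avoided by deriving the feature columns from the frame itself. A minimal sketch, assuming a hypothetical frame `df_demo` with the same two label columns; the helper name `split_by_label` is illustrative and not part of the notebook.

```python
import pandas as pd

LABELS = ["2020_mean_cases", "2020_mean_deaths"]

def split_by_label(df, labels=LABELS):
    """Return one frame per label: all feature columns followed by that label."""
    features = [c for c in df.columns if c not in labels]
    return {label: df[features + [label]].copy() for label in labels}

# Hypothetical stand-in for df, with labels mixed in among the features.
df_demo = pd.DataFrame({
    "mmwr_week": [51, 52],
    "2020_mean_cases": [0, 1],
    "year": [2020, 2020],
    "2020_mean_deaths": [0, 0],
})

frames = split_by_label(df_demo)
df_cases_demo = frames["2020_mean_cases"]
df_deaths_demo = frames["2020_mean_deaths"]

print(df_cases_demo.columns.tolist())
```

Each returned frame ends with its own label column, which matches the layout that the model cells expect when they drop the label to build `X`.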


### **Step 7:** 

Save dataframes as csv files for folders Cases_Cleaned and Deaths_Cleaned

In [25]:
import os

In [26]:
# Save df_cases as csv file.

os.makedirs("Cases_Cleaned/",exist_ok=True)
df_cases.to_csv('Cases_Cleaned/ML_cases_vcd.csv', index = False)



In [27]:
# Save df_deaths as csv file.

os.makedirs("Deaths_Cleaned/",exist_ok=True)
df_deaths.to_csv('Deaths_Cleaned/ML_deaths_vcd.csv', index = False)



In [28]:
#results database information

casesfile_id = f"ML_cases_vcd.csv_{run_counter}"
deathsfile_id = f"ML_deaths_vcd.csv_{run_counter}"

## MACHINE LEARNING

### FIRST MODEL

TITLE: cases

MODEL: RandomForest

FILE:  Cases_Cleaned/ML_cases_vcd.csv


In [29]:
#results database information

type_model_cases = "Random Forest"
name_model_cases = "cases"

In [30]:
# Initial imports.
import pandas as pd
from pathlib import Path
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

In [31]:
# Loading data
file_path = Path("Cases_Cleaned/ML_cases_vcd.csv")
df_cases = pd.read_csv(file_path)
df_cases.head()

Unnamed: 0,year,mmwr_week,distributed,administered,location_AK,location_AL,location_AR,location_AZ,location_CA,location_CO,...,location_TX,location_UT,location_VA,location_VI,location_VT,location_WA,location_WI,location_WV,location_WY,2020_mean_cases
0,2020,51,26325,1607,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
1,2020,52,45250,11427,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
2,2021,53,54975,18401,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
3,2021,1,92875,28539,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
4,2021,2,149650,54193,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0


In [32]:
# Define the features set.
X = df_cases.copy()
X = X.drop("2020_mean_cases", axis=1)
X.head()

Unnamed: 0,year,mmwr_week,distributed,administered,location_AK,location_AL,location_AR,location_AZ,location_CA,location_CO,...,location_TN,location_TX,location_UT,location_VA,location_VI,location_VT,location_WA,location_WI,location_WV,location_WY
0,2020,51,26325,1607,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2020,52,45250,11427,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2021,53,54975,18401,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,2021,1,92875,28539,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2021,2,149650,54193,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [33]:
# Define the target set.
y = df_cases["2020_mean_cases"].to_numpy()


In [34]:
# Splitting into Train and Test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=78)

In [35]:
# Creating a StandardScaler instance.
scaler = StandardScaler()
# Fitting the Standard Scaler with the training data.
X_scaler = scaler.fit(X_train)

# Scaling the data.
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)

In [36]:
# set input deck to be used for both cases and deaths

def input_deck(n):
    
# format is [n_estimators, random_state, criterion, max_depth, max_features, min_impurity_decrease, oob_score]

    rf_input = [
        
        [128, 78, 'gini', None, 'auto', 0.0, False],
        [128, 78, 'gini', None, 'auto', 0.0, True],
        [128, 78, 'entropy', None, 'auto', 0.0, False],
        [128, 78, 'entropy', None, 'auto', 0.0, True],
        [128, 78, 'gini', 10, 'sqrt', 0.0, False],
        [128, 78, 'gini', 10, 'sqrt', 0.0, True],
        [128, 78, 'entropy', 10, 'sqrt', 0.0, False],
        [128, 78, 'entropy', 10, 'sqrt', 0.0, True],
        [128, 78, 'gini', None, 'sqrt', 0.02, False],
        [128, 78, 'gini', None, 'sqrt', 0.02, True],
        [128, 78, 'entropy', None, 'sqrt', 0.02, False],
        [128, 78, 'entropy', None, 'sqrt', 0.02, True],
        [128, 78, 'entropy', 10, 'sqrt', 0.0, True],
        [128, 78, 'gini', None, 'sqrt', 0.5, False],
        [128, 78, 'gini', None, 'sqrt', 0.5, True],
        [128, 78, 'entropy', None, 'sqrt', 0.5, True],
        [128, 78, 'entropy', None, 'sqrt', 0.5, True]
        
    ]
    
    rf_input_params = rf_input[n]
    
    return rf_input_params

def input_params(n):
    # unpack one row of the deck in the order documented in input_deck
    (n_estimators, random_state, criterion, max_depth,
     max_features, min_impurity_decrease, oob_score) = input_deck(n)

    return n_estimators, random_state, criterion, max_depth, max_features, min_impurity_decrease, oob_score
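
One way to make the deck self-describing is to give each row named fields. The sketch below is an illustration, not the notebook's deck: `RFParams` and `build_deck` are hypothetical names, and the grid generated by `itertools.product` covers only the first four rows of the list above.

```python
from itertools import product
from typing import NamedTuple, Optional

class RFParams(NamedTuple):
    n_estimators: int
    random_state: int
    criterion: str
    max_depth: Optional[int]
    max_features: str
    min_impurity_decrease: float
    oob_score: bool

def build_deck():
    """Cross criterion with oob_score, holding the other settings fixed."""
    return [
        RFParams(128, 78, criterion, None, "auto", 0.0, oob_score)
        for criterion, oob_score in product(["gini", "entropy"], [False, True])
    ]

deck = build_deck()
params = deck[1]

# Named fields convert to keyword arguments, e.g.
# RandomForestClassifier(**params._asdict())
print(params)
```

With named fields, a row can be passed to the classifier as keyword arguments, so the position of a value in the list no longer matters.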

    


In [73]:
# set the input parameters

n_estimators, random_state, criterion, max_depth, max_features, min_impurity_decrease, oob_score = input_params(1)

print("input deck = " + f"{n_estimators}, {random_state}, {criterion}, {max_depth}, {max_features}, "
          f"{min_impurity_decrease}, {oob_score}")

input deck = 128, 78, gini, None, auto, 0.0, True


In [74]:
# Create a random forest classifier.
rf_model = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state,
                                  criterion=criterion,
                                  max_depth=max_depth, max_features =max_features,
                                  min_impurity_decrease = min_impurity_decrease,
                                 oob_score = oob_score) 


#results database information

run_dt = pd.to_datetime('now').strftime('%Y-%m-%d %H:%M:%S')

In [75]:
#results database information

#parameter names used in the arguments
# n_estimators=128
# random_state=78
# criterion = 'gini' or 'entropy'
# max_depth = None or 10
# max_features = 'auto' or 'sqrt'
# min_impurity_decrease = 0.0 or a fraction
# oob_score = False or True

rf_pars = rf_model.get_params()
rf_n_estimators = rf_pars['n_estimators']
rf_random_state = rf_pars['random_state']
rf_criterion = rf_pars['criterion']
rf_max_depth = rf_pars['max_depth']
rf_max_features = rf_pars['max_features']
rf_min_impurity_decrease = rf_pars['min_impurity_decrease']
rf_oob_score = rf_pars['oob_score']


par_name_1 = f"n_estimators={rf_n_estimators}"
par_name_2 = f"random_state={rf_random_state}"
par_name_3 = f"criterion={rf_criterion}"
par_name_4 = f"max_depth={rf_max_depth}"
par_name_5 = f"max_features={rf_max_features}"
par_name_6 = f"min_impurity_decrease={rf_min_impurity_decrease}"
par_name_7 = f"oob_score={rf_oob_score}"

par_name_6


'min_impurity_decrease=0.0'
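
The seven `par_name_*` strings can also be generated from the parameter dictionary, so each label always matches the value it describes. A minimal sketch, assuming a plain dict `rf_pars_demo` standing in for `rf_model.get_params()`; the values are the ones printed for this run's input deck.

```python
# Stand-in for rf_model.get_params(), restricted to the logged parameters.
rf_pars_demo = {
    "n_estimators": 128,
    "random_state": 78,
    "criterion": "gini",
    "max_depth": None,
    "max_features": "auto",
    "min_impurity_decrease": 0.0,
    "oob_score": True,
}

# Order in which the parameters are logged.
logged = ["n_estimators", "random_state", "criterion", "max_depth",
          "max_features", "min_impurity_decrease", "oob_score"]

# Build {"par_name_1": "n_estimators=128", ...} in the logged order.
par_names = {
    f"par_name_{i}": f"{key}={rf_pars_demo[key]}"
    for i, key in enumerate(logged, start=1)
}

print(par_names["par_name_6"])
```

Since the label and the value come from the same key, a mismatch between the two cannot occur.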

In [76]:
# Fitting the model
rf_model = rf_model.fit(X_train_scaled, y_train)

In [77]:
# Making predictions using the testing data.
predictions = rf_model.predict(X_test_scaled)

In [78]:
# Calculating the confusion matrix.
cm = confusion_matrix(y_test, predictions)

# Create a DataFrame from the confusion matrix.
cm_df = pd.DataFrame(
    cm, index=["Actual 0", "Actual 1"], columns=["Predicted 0", "Predicted 1"])

cm_df

Unnamed: 0,Predicted 0,Predicted 1
Actual 0,177,3
Actual 1,0,418


In [79]:
#results database information

CM_A0P0_cases= cm_df.loc["Actual 0", "Predicted 0"]
CM_A0P1_cases= cm_df.loc["Actual 0", "Predicted 1"]
CM_A1P0_cases= cm_df.loc["Actual 1", "Predicted 0"]
CM_A1P1_cases= cm_df.loc["Actual 1", "Predicted 1"]


In [80]:
# Calculating the accuracy score.
acc_score = accuracy_score(y_test, predictions)

In [81]:
#results database information

acc_score_cases = acc_score

In [82]:
# Displaying results
print("Confusion Matrix")
display(cm_df)
print(f"Accuracy Score : {acc_score}")
print("Classification Report")
rep = classification_report(y_test, predictions)
print(rep)

Confusion Matrix


Unnamed: 0,Predicted 0,Predicted 1
Actual 0,177,3
Actual 1,0,418


Accuracy Score : 0.9949832775919732
Classification Report
              precision    recall  f1-score   support

           0       1.00      0.98      0.99       180
           1       0.99      1.00      1.00       418

    accuracy                           0.99       598
   macro avg       1.00      0.99      0.99       598
weighted avg       1.00      0.99      0.99       598



In [83]:
#results database information
from sklearn import metrics

def get_classification_report(y_test, y_pred):
    # Source: https://stackoverflow.com/questions/39662398/scikit-learn-output-metrics-classification-report-into-csv-tab-delimited-format
    report = metrics.classification_report(y_test, y_pred, output_dict=True)
    df_classification_report = pd.DataFrame(report).transpose()
    df_classification_report = df_classification_report.sort_values(by=['f1-score'], ascending=False)
    return df_classification_report

CR_cases_df = get_classification_report(y_test, predictions)

CR_P0_cases = CR_cases_df.loc['0', 'precision']
CR_P1_cases = CR_cases_df.loc['1', 'precision']
CR_R0_cases = CR_cases_df.loc['0', 'recall']
CR_R1_cases = CR_cases_df.loc['1', 'recall']
CR_f1_0_cases = CR_cases_df.loc['0', 'f1-score']
CR_f1_1_cases = CR_cases_df.loc['1', 'f1-score']

CR_cases_df

Unnamed: 0,precision,recall,f1-score,support
1,0.992874,1.0,0.996424,418.0
accuracy,0.994983,0.994983,0.994983,0.994983
weighted avg,0.995019,0.994983,0.994971,598.0
macro avg,0.996437,0.991667,0.99401,598.0
0,1.0,0.983333,0.991597,180.0
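
As a cross-check, the reported scores follow directly from the four confusion-matrix counts. The sketch below recomputes them in plain Python from the values shown above; the variable names are illustrative.

```python
# Counts from the confusion matrix above (rows: actual, columns: predicted).
a0p0, a0p1 = 177, 3    # actual 0
a1p0, a1p1 = 0, 418    # actual 1

total = a0p0 + a0p1 + a1p0 + a1p1

# accuracy: correct predictions over all predictions
accuracy = (a0p0 + a1p1) / total

# precision for class 1: of all rows predicted 1, the share that are actually 1
precision_1 = a1p1 / (a1p1 + a0p1)

# recall per class: of all rows actually in the class, the share predicted correctly
recall_0 = a0p0 / (a0p0 + a0p1)
recall_1 = a1p1 / (a1p1 + a1p0)

# f1 for class 1: harmonic mean of precision and recall
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(round(accuracy, 6), round(precision_1, 6), round(recall_0, 6), round(f1_1, 6))
```

The rounded values agree with the `accuracy`, `precision`, `recall` and `f1-score` entries in the table above.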


In [84]:
# sort the features by their importance.
imp_list = sorted(zip(rf_model.feature_importances_, X.columns), reverse=True)
df_importance_cases = pd.DataFrame(imp_list)
df_importance_cases.rename(columns = {0 :'Importance_cases'}, inplace = True)
df_importance_cases.rename(columns = {1 :'Feature_cases'}, inplace = True)
df_importance_cases['notebook'] = run_nb['notebook']


# ********** new code

df_importance_cases['run_dt'] = run_dt
cols_imp = ['notebook', 'run_dt', 'Feature_cases', 'Importance_cases']

# ***********


# cols_imp = ['notebook', 'Feature_cases', 'Importance_cases']
df_importance_cases = df_importance_cases.reindex(columns = cols_imp)
df_importance_cases.head()

Unnamed: 0,notebook,run_dt,Feature_cases,Importance_cases
0,2,2021-11-10 21:37:50,distributed,0.287286
1,2,2021-11-10 21:37:50,administered,0.177247
2,2,2021-11-10 21:37:50,mmwr_week,0.100095
3,2,2021-11-10 21:37:50,location_UT,0.037901
4,2,2021-11-10 21:37:50,location_HI,0.024842


## MACHINE LEARNING

### SECOND MODEL

TITLE: deaths

MODEL: RandomForest

FILE:  Deaths_Cleaned/ML_deaths_vcd.csv

In [85]:
#results database information

model_id = 1

In [86]:
#results database information

type_model_deaths = "Random Forest"
name_model_deaths = "deaths"

In [87]:
# Loading data
file_path = Path("Deaths_Cleaned/ML_deaths_vcd.csv")
df_deaths = pd.read_csv(file_path)
df_deaths.head()

Unnamed: 0,year,mmwr_week,distributed,administered,location_AK,location_AL,location_AR,location_AZ,location_CA,location_CO,...,location_TX,location_UT,location_VA,location_VI,location_VT,location_WA,location_WI,location_WV,location_WY,2020_mean_deaths
0,2020,51,26325,1607,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
1,2020,52,45250,11427,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
2,2021,53,54975,18401,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
3,2021,1,92875,28539,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
4,2021,2,149650,54193,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0


In [88]:
# Define the features set.
X = df_deaths.copy()
X = X.drop("2020_mean_deaths", axis=1)
X.head()

Unnamed: 0,year,mmwr_week,distributed,administered,location_AK,location_AL,location_AR,location_AZ,location_CA,location_CO,...,location_TN,location_TX,location_UT,location_VA,location_VI,location_VT,location_WA,location_WI,location_WV,location_WY
0,2020,51,26325,1607,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2020,52,45250,11427,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2021,53,54975,18401,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,2021,1,92875,28539,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2021,2,149650,54193,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [89]:
# Define the target set.
y = df_deaths["2020_mean_deaths"].to_numpy()


In [90]:
# Splitting into Train and Test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=78)

In [91]:
# Creating a StandardScaler instance.
scaler = StandardScaler()
# Fitting the Standard Scaler with the training data.
X_scaler = scaler.fit(X_train)

# Scaling the data.
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)

In [92]:
# Create a random forest classifier.

rf_model = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state,
                                  criterion=criterion,
                                  max_depth=max_depth, max_features =max_features,
                                  min_impurity_decrease = min_impurity_decrease,
                                 oob_score = oob_score) 

In [93]:
# Fitting the model
rf_model = rf_model.fit(X_train_scaled, y_train)

In [94]:
# Making predictions using the testing data.
predictions = rf_model.predict(X_test_scaled)

In [95]:
# Calculating the confusion matrix.
cm = confusion_matrix(y_test, predictions)

# Create a DataFrame from the confusion matrix.
cm_df = pd.DataFrame(
    cm, index=["Actual 0", "Actual 1"], columns=["Predicted 0", "Predicted 1"])

cm_df

Unnamed: 0,Predicted 0,Predicted 1
Actual 0,177,3
Actual 1,0,418


In [96]:
#results database information

CM_A0P0_death= cm_df.loc["Actual 0", "Predicted 0"]
CM_A0P1_death= cm_df.loc["Actual 0", "Predicted 1"]
CM_A1P0_death= cm_df.loc["Actual 1", "Predicted 0"]
CM_A1P1_death= cm_df.loc["Actual 1", "Predicted 1"]

In [97]:
# Calculating the accuracy score.
acc_score = accuracy_score(y_test, predictions)

In [98]:
#results database information

acc_score_death = acc_score

In [99]:
# Displaying results
print("Confusion Matrix")
display(cm_df)
print(f"Accuracy Score : {acc_score}")
print("Classification Report")
print(classification_report(y_test, predictions))

Confusion Matrix


Unnamed: 0,Predicted 0,Predicted 1
Actual 0,177,3
Actual 1,0,418


Accuracy Score : 0.9949832775919732
Classification Report
              precision    recall  f1-score   support

           0       1.00      0.98      0.99       180
           1       0.99      1.00      1.00       418

    accuracy                           0.99       598
   macro avg       1.00      0.99      0.99       598
weighted avg       1.00      0.99      0.99       598



In [100]:
#results database information
from sklearn import metrics

def get_classification_report(y_test, y_pred):
    # Source: https://stackoverflow.com/questions/39662398/scikit-learn-output-metrics-classification-report-into-csv-tab-delimited-format
    report = metrics.classification_report(y_test, y_pred, output_dict=True)
    df_classification_report = pd.DataFrame(report).transpose()
    df_classification_report = df_classification_report.sort_values(by=['f1-score'], ascending=False)
    return df_classification_report

CR_death_df = get_classification_report(y_test, predictions)

CR_P0_death = CR_death_df.loc['0', 'precision']
CR_P1_death = CR_death_df.loc['1', 'precision']
CR_R0_death = CR_death_df.loc['0', 'recall']
CR_R1_death = CR_death_df.loc['1', 'recall']
CR_f1_0_death = CR_death_df.loc['0', 'f1-score']
CR_f1_1_death = CR_death_df.loc['1', 'f1-score']


CR_death_df

Unnamed: 0,precision,recall,f1-score,support
1,0.992874,1.0,0.996424,418.0
accuracy,0.994983,0.994983,0.994983,0.994983
weighted avg,0.995019,0.994983,0.994971,598.0
macro avg,0.996437,0.991667,0.99401,598.0
0,1.0,0.983333,0.991597,180.0


In [101]:
# sort the features by their importance.
imp_list = sorted(zip(rf_model.feature_importances_, X.columns), reverse=True)
df_importance_death = pd.DataFrame(imp_list)
df_importance_death.rename(columns = {0 :'Importance_death'}, inplace = True)
df_importance_death.rename(columns = {1 :'Feature_death'}, inplace = True)
df_importance_death['notebook'] = run_nb['notebook']


# ************** new code

df_importance_death['run_dt'] = run_dt
cols_imp = ['notebook', 'run_dt', 'Feature_death', 'Importance_death']

# ****************


# cols_imp = ['notebook', 'Feature_death', 'Importance_death']
df_importance_death = df_importance_death.reindex(columns = cols_imp)
df_importance_death

Unnamed: 0,notebook,run_dt,Feature_death,Importance_death
0,2,2021-11-10 21:37:50,distributed,0.287286
1,2,2021-11-10 21:37:50,administered,0.177247
2,2,2021-11-10 21:37:50,mmwr_week,0.100095
3,2,2021-11-10 21:37:50,location_UT,0.037901
4,2,2021-11-10 21:37:50,location_HI,0.024842
5,2,2021-11-10 21:37:50,location_ME,0.023657
6,2,2021-11-10 21:37:50,location_NH,0.023369
7,2,2021-11-10 21:37:50,location_VI,0.022846
8,2,2021-11-10 21:37:50,location_DE,0.020916
9,2,2021-11-10 21:37:50,location_AK,0.020422


## PostgreSQL Database

### Database to hold machine learning results

#### Version:  results_rev0

### CREATE 4 DATAFRAMES FOR IMPORTING INTO POSTGRESQL DATABASE

In [102]:
if run_counter == 1:

    
# df_model
   
    name_nb_dict = {"name_nb":name_nb}
    run_dt_dict = {"run_dt":run_dt}
    run_nb_dict = run_nb
    source_db_dict = {"source_db":source_db}
    file_id_dict = {"file_id":file_id}
    model_id_dict = {"model_id":model_id}
    type_model_cases_dict = {"type_model_cases":type_model_cases}
    type_model_deaths_dict = {"type_model_deaths":type_model_deaths}
    name_model_cases_dict = {"name_model_cases":name_model_cases}
    name_model_deaths_dict = {"name_model_deaths":name_model_deaths}
    par_name_1_dict = {"par_name_1":par_name_1}
    par_name_2_dict = {"par_name_2":par_name_2}
    par_name_3_dict = {"par_name_3":par_name_3}
    par_name_4_dict = {"par_name_4":par_name_4}
    par_name_5_dict = {"par_name_5":par_name_5}
    par_name_6_dict = {"par_name_6":par_name_6}
    par_name_7_dict = {"par_name_7":par_name_7}
    casesfile_id_dict = {"casesfile_id":casesfile_id}
    deathsfile_id_dict = {"deathsfile_id":deathsfile_id}
    mean_cases_dict = {"mean_cases_value":mean_cases_value}
    mean_deaths_dict = {"mean_deaths_value":mean_deaths_value}
    
    data = [run_nb_dict,name_nb_dict, run_dt_dict, source_db_dict, file_id_dict, model_id_dict,
            type_model_cases_dict, type_model_deaths_dict,  name_model_cases_dict, name_model_deaths_dict,
            par_name_1_dict, par_name_2_dict, par_name_3_dict, par_name_4_dict, par_name_5_dict,
            par_name_6_dict, par_name_7_dict, casesfile_id_dict, deathsfile_id_dict,
            mean_cases_dict,mean_deaths_dict]
    
    data_merged = {}
    for x in data:
        data_merged.update(x)
    data_list = [data_merged]
    
    df_model = pd.DataFrame(data_list)
    
    
# df_model_results

    results_dict ={

        'notebook': run_nb_dict['notebook'],
        'run_dt':run_dt_dict['run_dt'],
        'CM_A0P0_cases':CM_A0P0_cases,
        'CM_A0P1_cases':CM_A0P1_cases,
        'CM_A1P0_cases':CM_A1P0_cases,
        'CM_A1P1_cases':CM_A1P1_cases,
        'CM_A0P0_death':CM_A0P0_death,
        'CM_A0P1_death':CM_A0P1_death,
        'CM_A1P0_death':CM_A1P0_death,
        'CM_A1P1_death':CM_A1P1_death,
        'acc_score_cases':acc_score_cases,
        'acc_score_death':acc_score_death,
        'CR_P0_cases':CR_P0_cases,
        'CR_P1_cases':CR_P1_cases,
        'CR_R0_cases':CR_R0_cases,
        'CR_R1_cases':CR_R1_cases,
        'CR_f1_0_cases':CR_f1_0_cases,
        'CR_f1_1_cases':CR_f1_1_cases,
        'CR_P0_death':CR_P0_death,
        'CR_P1_death':CR_P1_death,
        'CR_R0_death':CR_R0_death,
        'CR_R1_death':CR_R1_death,
        'CR_f1_0_death':CR_f1_0_death,
        'CR_f1_1_death':CR_f1_1_death

    }

    results_list = [results_dict]
    df_model_results = pd.DataFrame(results_list)

# df_model_importances

    df_model_importances = pd.merge(df_importance_cases, df_importance_death, left_index =True, right_index=True)
    df_model_importances.drop(columns=["notebook_y", "Feature_death"], inplace = True)
    df_model_importances.rename(columns = {'notebook_x':'notebook','Feature_cases':"Feature"}, inplace = True)
    
    
# *********** new code

    df_model_importances.drop(columns = ["run_dt_y"], inplace = True)
    df_model_importances.rename(columns = {'run_dt_x':'run_dt'}, inplace = True)


# *********************

    

# initialize the new dataframes

    df_model_new = df_model.copy()
    df_model_results_new = df_model_results.copy()
    df_model_importances_new = df_model_importances.copy()

# saved copies for resetting the dataframes

    df_model_first_run = df_model.copy()
    df_model_results_first_run = df_model_results.copy()
    df_model_importances_first_run = df_model_importances.copy()
        
else:
    
# dataframes for run_counter > 1

# df_model

   
    name_nb_dict = {"name_nb":name_nb}
    run_dt_dict = {"run_dt":run_dt}
    run_nb_dict = run_nb
    source_db_dict = {"source_db":source_db}
    file_id_dict = {"file_id":file_id}
    model_id_dict = {"model_id":model_id}
    type_model_cases_dict = {"type_model_cases":type_model_cases}
    type_model_deaths_dict = {"type_model_deaths":type_model_deaths}
    name_model_cases_dict = {"name_model_cases":name_model_cases}
    name_model_deaths_dict = {"name_model_deaths":name_model_deaths}
    par_name_1_dict = {"par_name_1":par_name_1}
    par_name_2_dict = {"par_name_2":par_name_2}
    par_name_3_dict = {"par_name_3":par_name_3}
    par_name_4_dict = {"par_name_4":par_name_4}
    par_name_5_dict = {"par_name_5":par_name_5}
    par_name_6_dict = {"par_name_6":par_name_6}
    par_name_7_dict = {"par_name_7":par_name_7}
    casesfile_id_dict = {"casesfile_id":casesfile_id}
    deathsfile_id_dict = {"deathsfile_id":deathsfile_id}
    mean_cases_dict = {"mean_cases_value":mean_cases_value}
    mean_deaths_dict = {"mean_deaths_value":mean_deaths_value}
    
    data = [run_nb_dict,name_nb_dict, run_dt_dict, source_db_dict, file_id_dict, model_id_dict,
            type_model_cases_dict, type_model_deaths_dict,  name_model_cases_dict, name_model_deaths_dict,
            par_name_1_dict, par_name_2_dict, par_name_3_dict, par_name_4_dict, par_name_5_dict,
            par_name_6_dict, par_name_7_dict, casesfile_id_dict, deathsfile_id_dict,
            mean_cases_dict,mean_deaths_dict]
    
    data_merged = {}
    for x in data:
        data_merged.update(x)
    data_list = [data_merged]
    
    df_model = pd.DataFrame(data_list)
    

# df_model_results

    
    results_dict ={

        'notebook': run_nb_dict['notebook'],
        'run_dt':run_dt_dict['run_dt'],
        'CM_A0P0_cases':CM_A0P0_cases,
        'CM_A0P1_cases':CM_A0P1_cases,
        'CM_A1P0_cases':CM_A1P0_cases,
        'CM_A1P1_cases':CM_A1P1_cases,
        'CM_A0P0_death':CM_A0P0_death,
        'CM_A0P1_death':CM_A0P1_death,
        'CM_A1P0_death':CM_A1P0_death,
        'CM_A1P1_death':CM_A1P1_death,
        'acc_score_cases':acc_score_cases,
        'acc_score_death':acc_score_death,
        'CR_P0_cases':CR_P0_cases,
        'CR_P1_cases':CR_P1_cases,
        'CR_R0_cases':CR_R0_cases,
        'CR_R1_cases':CR_R1_cases,
        'CR_f1_0_cases':CR_f1_0_cases,
        'CR_f1_1_cases':CR_f1_1_cases,
        'CR_P0_death':CR_P0_death,
        'CR_P1_death':CR_P1_death,
        'CR_R0_death':CR_R0_death,
        'CR_R1_death':CR_R1_death,
        'CR_f1_0_death':CR_f1_0_death,
        'CR_f1_1_death':CR_f1_1_death

    }

    results_list = [results_dict]
    df_model_results = pd.DataFrame(results_list)


# df_model_importances

    df_model_importances = pd.merge(df_importance_cases, df_importance_death, left_index =True, right_index=True)
    df_model_importances.drop(columns=["notebook_y", "Feature_death"], inplace = True)
    df_model_importances.rename(columns = {'notebook_x':'notebook','Feature_cases':"Feature"}, inplace = True)
    
    
# *********** new code

    df_model_importances.drop(columns = ["run_dt_y"], inplace = True)
    df_model_importances.rename(columns = {'run_dt_x':'run_dt'}, inplace = True)


# *********************

    

# concat dataframes

    df_model_new = pd.concat([df_model_new, df_model], ignore_index = True)
    # df_set_stats_new = pd.concat([df_set_stats_new, df_set_stats], ignore_index = True)
    df_model_results_new = pd.concat([df_model_results_new, df_model_results],ignore_index = True)
    df_model_importances_new = pd.concat([df_model_importances_new, df_model_importances], ignore_index = True)


# the 3 dataframes to be put into PostgreSQL:
#
# df_model_new
# df_model_results_new
# df_model_importances_new
#
#
#
# df_set_stats_new is NOT used in this notebook
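
The two branches of the cell above build the same dataframes and differ only in the last step (initialize on the first run, concatenate on later runs). A minimal sketch of how that shared logic could be factored out is shown below. The helper names `build_run_frame` and `append_run` and the `model_id` values are hypothetical and used only for illustration; they are not part of this notebook.

```python
import pandas as pd


def build_run_frame(*parts):
    """Merge several small dicts into one single-row DataFrame."""
    merged = {}
    for part in parts:
        merged.update(part)
    return pd.DataFrame([merged])


def append_run(history, current):
    """Start the history on the first run, otherwise append the current run."""
    if history is None:
        return current.copy()
    return pd.concat([history, current], ignore_index=True)


# Hypothetical stand-ins for the notebook variables of two successive runs.
run_1 = build_run_frame({"notebook": 1}, {"name_nb": "ML_pn_rev1"}, {"model_id": "rf_a"})
run_2 = build_run_frame({"notebook": 2}, {"name_nb": "ML_pn_rev1"}, {"model_id": "rf_b"})

history = None
for run in (run_1, run_2):
    history = append_run(history, run)

print(history)
```

With this shape, the `if run_counter == 1 ... else` split reduces to whether the history object already exists, so the dictionaries only need to be written once.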



In [106]:
df_model_importances_new

Unnamed: 0,notebook,run_dt,Feature,Importance_cases,Importance_death
0,1,2021-11-10 21:32:16,distributed,0.182235,0.287286
1,1,2021-11-10 21:32:16,administered,0.117482,0.177247
2,1,2021-11-10 21:32:16,location_HI,0.101488,0.100095
3,1,2021-11-10 21:32:16,location_DC,0.085096,0.037901
4,1,2021-11-10 21:32:16,location_VT,0.080171,0.024842
...,...,...,...,...,...
107,2,2021-11-10 21:37:50,location_OH,0.000419,0.000419
108,2,2021-11-10 21:37:50,location_IL,0.000387,0.000387
109,2,2021-11-10 21:37:50,location_CA,0.000348,0.000348
110,2,2021-11-10 21:37:50,location_NY,0.000271,0.000271
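
A note on the merge used to build `df_model_importances`: `left_index=True, right_index=True` pairs rows by position. If `df_importance_cases` and `df_importance_death` are each sorted by their own importance (an assumption, since they are built outside this section), the `Importance_death` value in a row may belong to a different feature than the one named in `Feature`. The sketch below uses small hypothetical importance values to contrast the positional merge with a merge on the feature name.

```python
import pandas as pd

# Hypothetical importances; each frame is sorted by its own importance column.
df_importance_cases = pd.DataFrame({
    "notebook": [1, 1, 1],
    "run_dt": ["2021-11-10 21:32:16"] * 3,
    "Feature_cases": ["distributed", "administered", "mmwr_week"],
    "Importance_cases": [0.5, 0.3, 0.2],
})
df_importance_death = pd.DataFrame({
    "notebook": [1, 1, 1],
    "run_dt": ["2021-11-10 21:32:16"] * 3,
    "Feature_death": ["mmwr_week", "distributed", "administered"],
    "Importance_death": [0.6, 0.3, 0.1],
})

# Positional merge: row 0 pairs "distributed" with the death importance of "mmwr_week".
by_position = pd.merge(df_importance_cases, df_importance_death,
                       left_index=True, right_index=True)

# Merge on the feature name: each row carries both importances of the same feature.
by_feature = (
    pd.merge(
        df_importance_cases,
        df_importance_death.drop(columns=["notebook", "run_dt"]),
        left_on="Feature_cases",
        right_on="Feature_death",
        how="inner",
    )
    .drop(columns=["Feature_death"])
    .rename(columns={"Feature_cases": "Feature"})
)

print(by_position[["Feature_cases", "Feature_death", "Importance_death"]])
print(by_feature)
```

Dropping `notebook` and `run_dt` from the right-hand frame before merging also avoids the `_x`/`_y` suffixes, so the later `drop` and `rename` steps for those columns would not be needed.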


### Import the dataframes into PostgreSQL tables after completing all desired runs

In [793]:
from sqlalchemy import create_engine

In [794]:
from config import db_password

In [795]:
db_string = f"postgresql://postgres:{db_password}@127.0.0.1:5432/MLmodels"
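
If the password contains characters such as `@`, `/` or `:`, placing it directly in the connection string can make the URL ambiguous. One common precaution is to percent-encode it first with the standard library. The sketch below assumes a hypothetical password value; the host, port and database name are the ones used in this notebook.

```python
from urllib.parse import quote_plus

# Hypothetical password, used here in place of the value imported from config.py.
db_password = "p@ss/word"

# Percent-encode the password so that '@' and '/' cannot be mistaken for URL delimiters.
db_string = f"postgresql://postgres:{quote_plus(db_password)}@127.0.0.1:5432/MLmodels"
print(db_string)
```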

In [796]:
engine = create_engine(db_string)

In [802]:
# df_model_new.to_sql(name='mlinputs', con=engine)
#df_set_stats_new.to_sql(name='mlsetstats', con=engine)
#df_model_results_new.to_sql(name = 'rfresults', con=engine)
#df_model_importances_new.to_sql(name = 'rfimportances', con = engine)
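
The `to_sql` calls above are commented out so they only run once all desired runs are complete. As a sketch of what such a call does, the example below writes a small dataframe with hypothetical values to an in-memory SQLite database, used here only as a stand-in so the example runs without a PostgreSQL server. With the notebook's `engine`, the `con` argument would be `engine` instead. The options `if_exists="replace"` and `index=False` are illustrative choices, not the notebook's settings.

```python
import sqlite3

import pandas as pd

# Hypothetical results; the real dataframe is df_model_results_new from the cells above.
df_model_results_new = pd.DataFrame({
    "notebook": [1, 2],
    "acc_score_cases": [0.9, 0.8],
})

# In-memory SQLite stands in for the PostgreSQL engine in this sketch.
conn = sqlite3.connect(":memory:")

# if_exists="replace" overwrites an existing table instead of raising an error;
# index=False keeps the pandas row index out of the table.
df_model_results_new.to_sql(name="rfresults", con=conn, if_exists="replace", index=False)

# Read the table back to confirm what was stored.
round_trip = pd.read_sql("SELECT * FROM rfresults", conn)
conn.close()
print(round_trip)
```

By default `to_sql` uses `if_exists="fail"` and `index=True`, so re-running the commented cell against an existing table raises an error, and the row index is stored as an extra column.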

## STOP BEFORE RUNNING THE NEXT CELL

In [120]:
# RESET OF DATAFRAMES TO THE FIRST RUN.  ONLY RESET IF NEEDED.
#
# ARE YOU SURE YOU WANT TO RESET?
# ALL RUNS AFTER THE FIRST WILL BE GONE!



reset_dataframes = False
if reset_dataframes:
    # copy so the saved first-run dataframes stay untouched by later changes
    df_model_new = df_model_first_run.copy()
    df_model_results_new = df_model_results_first_run.copy()
    df_model_importances_new = df_model_importances_first_run.copy()
    


## Activities

1)  Define data sets
    a.  year 2020 (Xa)
    b.  both years 2020 and 2021 (Xb)
    
2)  Create statistics for dataset Xa and for dataset Xb

3)  Database information for df_model <br>
    a.  name of the notebook (name_nb)<br>
    b.  datetime of run (run_dt)<br>
    c.  identifier for run of the notebook (run_nb)<br>
    d.  source of database (source_db)<br>
    e.  identifier for database csv file cleaned (file_id)<br>
    f.  model identifier (model_id)<br>
    g.  type of model (type_model_cases, type_model_deaths)<br>
    h.  name of model (name_model_cases, name_model_deaths)<br>
    h.1 model parameter name (par_name_1)<br>
    h.2 model parameter name (par_name_2)<br>
    h.3 model parameter name (par_name_3)<br>
    h.4 model parameter name (par_name_4)<br>
    i.  identifier for ML_cases.csv used (casesfile_id)<br>
    j.  identifier for ML_deaths.csv used (deathsfile_id)<br>
    k.  statistics from Xa (stats_Xa_cases_ds, stats_Xa_death_ds)<br>
    l.  statistics from Xb (stats_Xb_cases_ds, stats_Xb_death_ds)<br>
    m.  name of the statistics dataset used for the label column (name_statsfile)<br>
    n.  the statistic used for setting the label column (name_statistic)<br>
    o.  model's pandas dataframe (df_model)<br>
    
    Database information for df_model_results
    a. cm_df cases and deaths
    b. accuracy score for cases and deaths
    c. classification report for cases and deaths
    d. feature importances for cases and deaths
    
    
