# <u> Introduction </u>

This notebook serves as a 'sandbox' for all health data in relation to the "City of Grand-Rapids" project.

It **will** include all the datasets, and can additionally include any of the following:

*   Exploratory Data Analysis
*   Visualizations
*   Data Transformations
*   Machine Learning Models

**Created by Jimmy Gray-Jones, using Google Colab**


# <u> Importing Libraries & Datasets </u>

## <u> Libraries </u>

In [None]:
#For numerical analysis and number generation
import numpy as np

#For data manipulation
import pandas as pd

#For data visualization
import matplotlib.pyplot as plt
import seaborn as sns

## Datasets

Datasets are imported into this notebook directly from the raw files on GitHub.

In [None]:
url = 'https://raw.githubusercontent.com/lafeirjo/City_Of_Grand_Rapids_Social_Impact/main/Datasets/Health%20Data/Raw%20Datasets/HE_-_Percent_of_Children_with_Elevated_Blood_Lead_Levels_20240127.csv'
HE__Percent_of_Children_with_Elevated_Blood_Lead_Levels_20240127 = pd.read_csv(url)

url = 'https://raw.githubusercontent.com/lafeirjo/City_Of_Grand_Rapids_Social_Impact/main/Datasets/Health%20Data/Raw%20Datasets/Health%20Care%20Diversity.csv'
Health_Care_Diversity = pd.read_csv(url)

url = 'https://raw.githubusercontent.com/lafeirjo/City_Of_Grand_Rapids_Social_Impact/main/Datasets/Health%20Data/Raw%20Datasets/Health_Care_Diversity_Age_Range.csv'
Health_Care_Diversity_Age_Range = pd.read_csv(url)

url = 'https://raw.githubusercontent.com/lafeirjo/City_Of_Grand_Rapids_Social_Impact/main/Datasets/Health%20Data/Raw%20Datasets/Patient%20to%20Clinician%20Ratios.csv'
Patient_to_Clinician_Ratios = pd.read_csv(url)

url = 'https://raw.githubusercontent.com/lafeirjo/City_Of_Grand_Rapids_Social_Impact/main/Datasets/Health%20Data/Raw%20Datasets/Patient%20to%20Dentist%20Ratios.csv'
Patient_to_Dentist_Ratios = pd.read_csv(url)

url = 'https://raw.githubusercontent.com/lafeirjo/City_Of_Grand_Rapids_Social_Impact/main/Datasets/Health%20Data/Raw%20Datasets/Patient%20to%20Other_Clinician%20Ratios.csv'
Patient_to_Other_Clinician_Ratios = pd.read_csv(url)

url = 'https://raw.githubusercontent.com/lafeirjo/City_Of_Grand_Rapids_Social_Impact/main/Datasets/Health%20Data/Raw%20Datasets/Patient_to_Mental_Health_Professional_Ratios.csv'
Patient_to_Mental_Health_Professional_Ratios = pd.read_csv(url)

url = 'https://raw.githubusercontent.com/lafeirjo/City_Of_Grand_Rapids_Social_Impact/main/Datasets/Health%20Data/Raw%20Datasets/SC_-_Number_of_Serious_Injuries_and_Fatalities_by_Mode_20240122.csv'
SC__Number_of_Serious_Injuries_and_Fatalities_by_Mode_20240122 = pd.read_csv(url)

url = 'https://raw.githubusercontent.com/lafeirjo/City_Of_Grand_Rapids_Social_Impact/main/Datasets/Health%20Data/Raw%20Datasets/Uninsured%20People.csv'
Uninsured_People = pd.read_csv(url)
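The repeated `url = ...` / `pd.read_csv(url)` pairs above could also be written as a loop over file names. The following is only a sketch, not the loading code this notebook uses: the names `BASE_URL`, `FILES`, `build_url`, and `load_datasets` are hypothetical, and only three of the files are listed for brevity.

```python
from urllib.parse import quote

import pandas as pd

# Folder that holds the raw health datasets (taken from the URLs above).
BASE_URL = (
    "https://raw.githubusercontent.com/lafeirjo/"
    "City_Of_Grand_Rapids_Social_Impact/main/"
    "Datasets/Health%20Data/Raw%20Datasets/"
)

# Hypothetical mapping: variable-style name -> file name in the repository.
FILES = {
    "Health_Care_Diversity": "Health Care Diversity.csv",
    "Patient_to_Dentist_Ratios": "Patient to Dentist Ratios.csv",
    "Uninsured_People": "Uninsured People.csv",
}


def build_url(filename: str) -> str:
    """Return the raw GitHub URL for one file, encoding spaces as %20."""
    return BASE_URL + quote(filename)


def load_datasets(files: dict) -> dict:
    """Download every listed CSV and return the DataFrames keyed by name."""
    return {name: pd.read_csv(build_url(fname)) for name, fname in files.items()}


# Building the URLs needs no network access; calling load_datasets(FILES) does.
for name, fname in FILES.items():
    print(name, "->", build_url(fname))
```

Keeping the DataFrames in a dictionary would also make it easier to loop over them later, for example when printing column names or shapes.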

## Displaying a single row of each pandas DataFrame

In [None]:
HE__Percent_of_Children_with_Elevated_Blood_Lead_Levels_20240127.head(1)

Unnamed: 0,Geo,Group,Date,Year,Population,Tested,Pct_Tested,EBLL,Pct_EBLL,CEBLL,...,Ven_5_9,Pct_Ven_5_9,Ven_10_14,Pct_Ven_10_14,Ven_15_19,Pct_Ven_15_19,Ven_20_39,Pct_Ven_20_39,Ven_GTE_40,Pct_Ven_GTE_40
0,48006,Children 1-2,2017,2017,,20.0,,,,,...,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
Health_Care_Diversity.head(1)

Unnamed: 0,ID Year,Year,ID Health Coverage,Health Coverage,ID Gender,Gender,Health Insurance by Gender and Age,Geography,ID Geography,Slug Geography,Share
0,2021,2021,0,Private,0,Male,60804,"Grand Rapids, MI",16000US2634000,grand-rapids-mi,49.087739


In [None]:
Health_Care_Diversity_Age_Range.head(1)

Unnamed: 0,ID Year,Year,ID Kaiser Coverage,Kaiser Coverage,ID Age,Age,Health Insurance Policies,Geography,ID Geography,Slug Geography,Share
0,2021,2021,0,Medicaid,0,Under 18 Years,21059,"Grand Rapids, MI",16000US2634000,grand-rapids-mi,47.320405


In [None]:
Patient_to_Clinician_Ratios.head(1)

Unnamed: 0,ID Year,Year,Patient to Primary Care Physician Ratio,Patient to Primary Care Physician Ratio Data Source Years,Geography,ID Geography,Slug Geography
0,2022,2022,4542,2019,"Allegan County, MI",05000US26005,allegan-county-mi


In [None]:
Patient_to_Dentist_Ratios.head(1)

Unnamed: 0,ID Year,Year,Patient to Dentist Ratio,Patient to Dentist Ratio Data Source Years,Geography,ID Geography,Slug Geography
0,2022,2022,2973,2020,"Allegan County, MI",05000US26005,allegan-county-mi


In [None]:
Patient_to_Other_Clinician_Ratios.head(1)

Unnamed: 0,ID Year,Year,Other Primary Care Providers,Other Primary Care Providers Data Source Years,Geography,ID Geography,Slug Geography
0,2022,2022,2766,2021,"Allegan County, MI",05000US26005,allegan-county-mi


In [None]:
SC__Number_of_Serious_Injuries_and_Fatalities_by_Mode_20240122.head(1)

Unnamed: 0,Calendar Year,Date,Mode,Crash Level,Location,Gender,# of Persons,% of all Fatalities,% of all Injuries
0,2015,12/31/2015 12:00:00 AM,All,K (Fatality),All,Male,6,,


In [None]:
Uninsured_People.head(1)

Unnamed: 0,ID Kaiser Coverage,Kaiser Coverage,ID Year,Year,Health Insurance Policies,Geography,ID Geography,Slug Geography,share
0,0,Medicaid,2021,2021,44503,"Grand Rapids, MI",16000US2634000,grand-rapids-mi,22.8517


## General Information about Datasets

In [None]:
#HE__Percent_of_Children_with_Elevated_Blood_Lead_Levels_20240127
#Health_Care_Diversity
#Health_Care_Diversity_Age_Range
#Patient_to_Clinician_Ratios
#Patient_to_Dentist_Ratios
#Patient_to_Other_Clinician_Ratios
#SC__Number_of_Serious_Injuries_and_Fatalities_by_Mode_20240122
#Uninsured_People

In [None]:
#Displaying all of the column names of each dataset
print(HE__Percent_of_Children_with_Elevated_Blood_Lead_Levels_20240127.columns)
print('')
print(Health_Care_Diversity.columns)
print('')
print(Health_Care_Diversity_Age_Range.columns)
print('')
print(Patient_to_Clinician_Ratios.columns)
print('')
print(Patient_to_Dentist_Ratios.columns)
print('')
print(Patient_to_Other_Clinician_Ratios.columns)
print('')
print(SC__Number_of_Serious_Injuries_and_Fatalities_by_Mode_20240122.columns)
print('')
print(Uninsured_People.columns)

Index(['Geo', 'Group', 'Date', 'Year', 'Population', 'Tested', 'Pct_Tested',
       'EBLL', 'Pct_EBLL', 'CEBLL', 'Pct_CEBLL', 'VEBLL', 'Pct_VEBLL',
       'Ven_5_9', 'Pct_Ven_5_9', 'Ven_10_14', 'Pct_Ven_10_14', 'Ven_15_19',
       'Pct_Ven_15_19', 'Ven_20_39', 'Pct_Ven_20_39', 'Ven_GTE_40',
       'Pct_Ven_GTE_40'],
      dtype='object')

Index(['ID Year', 'Year', 'ID Health Coverage', 'Health Coverage', 'ID Gender',
       'Gender', 'Health Insurance by Gender and Age', 'Geography',
       'ID Geography', 'Slug Geography', 'Share'],
      dtype='object')

Index(['ID Year', 'Year', 'ID Kaiser Coverage', 'Kaiser Coverage', 'ID Age',
       'Age', 'Health Insurance Policies', 'Geography', 'ID Geography',
       'Slug Geography', 'Share'],
      dtype='object')

Index(['ID Year', 'Year', 'Patient to Primary Care Physician Ratio',
       'Patient to Primary Care Physician Ratio Data Source Years',
       'Geography', 'ID Geography', 'Slug Geography'],
      dtype='object')

Index(['ID Year

In [None]:
print("Shape of HE__Percent_of_Children_with_Elevated_Blood_Lead_Levels_20240127:",HE__Percent_of_Children_with_Elevated_Blood_Lead_Levels_20240127.shape)
print('')
print("Shape of Health_Care_Diversity:",Health_Care_Diversity.shape)
print('')
print("Shape of Health_Care_Diversity_Age_Range:",Health_Care_Diversity_Age_Range.shape)
print('')
print("Shape of Patient_to_Clinician_Ratios:",Patient_to_Clinician_Ratios.shape)
print('')
print("Shape of Patient_to_Dentist_Ratios:",Patient_to_Dentist_Ratios.shape)
print('')
print("Shape of Patient_to_Other_Clinician_Ratios:",Patient_to_Other_Clinician_Ratios.shape)
print('')
print("Shape of SC__Number_of_Serious_Injuries_and_Fatalities_by_Mode_20240122",SC__Number_of_Serious_Injuries_and_Fatalities_by_Mode_20240122.shape)
print('')
print("Shape of Uninsured_People:",Uninsured_People.shape)

Shape of HE__Percent_of_Children_with_Elevated_Blood_Lead_Levels_20240127: (22416, 23)

Shape of Health_Care_Diversity: (36, 11)

Shape of Health_Care_Diversity_Age_Range: (216, 11)

Shape of Patient_to_Clinician_Ratios: (72, 7)

Shape of Patient_to_Dentist_Ratios: (72, 7)

Shape of Patient_to_Other_Clinician_Ratios: (72, 7)

Shape of SC__Number_of_Serious_Injuries_and_Fatalities_by_Mode_20240122 (312, 9)

Shape of Uninsured_People: (54, 9)
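The shape printout above can be collected into a single table instead of one `print` per dataset. This is a minimal sketch under assumptions: the `datasets` dictionary and the `summarize` helper are hypothetical, and the two toy DataFrames stand in for the real ones.

```python
import pandas as pd

# Toy stand-ins for the real DataFrames (made-up values).
datasets = {
    "small": pd.DataFrame({"a": [1, 2, 3]}),
    "wide": pd.DataFrame({"a": [1], "b": [2], "c": [3]}),
}


def summarize(frames):
    """Return one row per DataFrame with its row and column counts."""
    rows = [
        {"dataset": name, "rows": df.shape[0], "columns": df.shape[1]}
        for name, df in frames.items()
    ]
    return pd.DataFrame(rows).set_index("dataset")


summary = summarize(datasets)
print(summary)
```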


# <u>Data Engineering & Transformations </u>

In this section, I demonstrate all the changes I've made to the original datasets to make the data easier to work with. Whenever an original pandas DataFrame is changed, the result is stored in a new DataFrame with "_cleaned" appended to its name.

\\

If two or more DataFrames are combined, the result gets an entirely different name.

\\

Although this section is used for cleaning the data, I will commonly return to the uncleaned data for a more robust and truthful analysis. The "cleaned" data is in no way meant to represent the final datasets we will be using, but rather data that is easier to work with.

## Dropping NA values

For any dataset with NA values, I am dropping them.

These rows are not inherently useless; however, dropping them lets me focus on the fully populated rows of a dataset. For the blood lead level dataset, this reduces 22,416 rows to 2,766.
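How many rows survive depends on how strictly `dropna` is applied. The sketch below uses a made-up four-row table (the column names mirror the lead-level dataset, the values are hypothetical) to compare dropping on any NA, on one column only, and on a minimum count of non-NA values.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the lead-level table.
toy = pd.DataFrame({
    "Geo": ["A", "B", "C", "D"],
    "Tested": [20.0, 40.0, np.nan, 10.0],
    "Pct_EBLL": [0.1, np.nan, np.nan, 0.0],
    "Population": [np.nan, 100.0, 50.0, 80.0],
})

all_complete = toy.dropna()                  # keep rows with no NA at all
needs_pct = toy.dropna(subset=["Pct_EBLL"])  # only require Pct_EBLL to be present
mostly_full = toy.dropna(thresh=3)           # keep rows with at least 3 non-NA values

print(len(toy), len(all_complete), len(needs_pct), len(mostly_full))
```

Restricting the check with `subset` may be worth considering when only one or two columns matter for a given analysis, since it keeps more rows than a blanket `dropna()`.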

In [None]:
#Removing all rows containing NA values
HE__Percent_of_Children_with_Elevated_Blood_Lead_Levels_20240127_cleaned = HE__Percent_of_Children_with_Elevated_Blood_Lead_Levels_20240127.dropna()

#Dropping two columns that are mostly NA (only 36 of the 312 rows have values)
SC__Number_of_Serious_Injuries_and_Fatalities_by_Mode_20240122_cleaned = SC__Number_of_Serious_Injuries_and_Fatalities_by_Mode_20240122.drop(columns=['% of all Fatalities','% of all Injuries'])


In [None]:
HE__Percent_of_Children_with_Elevated_Blood_Lead_Levels_20240127_cleaned

Unnamed: 0,Geo,Group,Date,Year,Population,Tested,Pct_Tested,EBLL,Pct_EBLL,CEBLL,...,Ven_5_9,Pct_Ven_5_9,Ven_10_14,Pct_Ven_10_14,Ven_15_19,Pct_Ven_15_19,Ven_20_39,Pct_Ven_20_39,Ven_GTE_40,Pct_Ven_GTE_40
36,GRAND RAPIDS,Children < 6,2010,2010,18726.0,6558.0,0.350,746.0,0.114,541.0,...,139.0,0.021,42.0,0.006,12.0,0.002,12.0,0.002,0.0,0.0
38,GRAND RAPIDS,Children < 6,2012,2012,18427.0,6605.0,0.358,593.0,0.090,461.0,...,89.0,0.013,24.0,0.004,10.0,0.002,9.0,0.001,0.0,0.0
40,GRAND RAPIDS,Children < 6,2014,2014,19078.0,6377.0,0.334,400.0,0.063,281.0,...,78.0,0.012,21.0,0.003,12.0,0.002,8.0,0.001,0.0,0.0
42,GRAND RAPIDS,Children < 6,2016,2016,18297.0,6617.0,0.362,523.0,0.079,333.0,...,136.0,0.021,26.0,0.004,16.0,0.002,12.0,0.002,0.0,0.0
44,GRAND RAPIDS,Children < 6,2018,2018,17250.0,6049.0,0.351,271.0,0.045,100.0,...,127.0,0.021,23.0,0.004,9.0,0.001,12.0,0.002,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22368,49969,Children < 6,2012,2012,49.0,22.0,0.449,0.0,0.000,0.0,...,0.0,0.000,0.0,0.000,0.0,0.000,0.0,0.000,0.0,0.0
22370,49969,Children < 6,2014,2014,69.0,23.0,0.333,0.0,0.000,0.0,...,0.0,0.000,0.0,0.000,0.0,0.000,0.0,0.000,0.0,0.0
22371,49969,Children < 6,2015,2015,74.0,15.0,0.203,0.0,0.000,0.0,...,0.0,0.000,0.0,0.000,0.0,0.000,0.0,0.000,0.0,0.0
22372,49969,Children < 6,2016,2016,78.0,19.0,0.244,0.0,0.000,0.0,...,0.0,0.000,0.0,0.000,0.0,0.000,0.0,0.000,0.0,0.0


In [None]:
SC__Number_of_Serious_Injuries_and_Fatalities_by_Mode_20240122_cleaned

Unnamed: 0,Calendar Year,Date,Mode,Crash Level,Location,Gender,# of Persons
0,2015,12/31/2015 12:00:00 AM,All,K (Fatality),All,Male,6
1,2015,12/31/2015 12:00:00 AM,All,A (Serious Injury),All,Male,33
2,2015,12/31/2015 12:00:00 AM,All,K (Fatality),All,Female,0
3,2015,12/31/2015 12:00:00 AM,All,A (Serious Injury),All,Female,28
4,2015,12/31/2015 12:00:00 AM,All,K (Fatality),All,Gender Not Listed,0
...,...,...,...,...,...,...,...
307,2020,11/30/2020 12:00:00 AM,All,K (Fatality),Ward 3,All,2
308,2020,11/30/2020 12:00:00 AM,All,A (Serious Injury),Inside NoF,All,49
309,2020,11/30/2020 12:00:00 AM,All,K (Fatality),Inside NoF,All,5
310,2020,11/30/2020 12:00:00 AM,All,A (Serious Injury),Outside NoF,All,49


## Merging Data

Where applicable, I will merge datasets that have identical shapes and/or corresponding values that I can join on.

When merging data, I will give the result a completely new name rather than reusing the name of an original dataset.

In [None]:
Health_Care_Diversity.head(1)

Unnamed: 0,ID Year,Year,ID Health Coverage,Health Coverage,ID Gender,Gender,Health Insurance by Gender and Age,Geography,ID Geography,Slug Geography,Share
0,2021,2021,0,Private,0,Male,60804,"Grand Rapids, MI",16000US2634000,grand-rapids-mi,49.087739


In [None]:
Health_Care_Diversity_Age_Range.head(1)

Unnamed: 0,ID Year,Year,ID Kaiser Coverage,Kaiser Coverage,ID Age,Age,Health Insurance Policies,Geography,ID Geography,Slug Geography,Share
0,2021,2021,0,Medicaid,0,Under 18 Years,21059,"Grand Rapids, MI",16000US2634000,grand-rapids-mi,47.320405


In [None]:
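#Merging with no "on" argument joins on every shared column (including 'Share'),
#so no rows match and the result below is empty.
#Note: this overwrites the original Health_Care_Diversity dataframe
Health_Care_Diversity = Health_Care_Diversity.merge(Health_Care_Diversity_Age_Range,how='inner')
#Health_Care_Diversity = Health_Care_Diversity.drop(columns=['ID Gender','ID Health Coverage','Slug Geography','ID Geography','Geography'])
Health_Care_Diversity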

Unnamed: 0,ID Year,Year,ID Health Coverage,Health Coverage,ID Gender,Gender,Health Insurance by Gender and Age,Geography,ID Geography,Slug Geography,Share,ID Kaiser Coverage,Kaiser Coverage,ID Age,Age,Health Insurance Policies
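The empty result above is what pandas returns when `merge` is called without `on` and the two tables share a column whose values never coincide. The sketch below reproduces this with two made-up tables (all values and the "18-64 Years" label are hypothetical) and shows one possible alternative: naming the join key explicitly and using `suffixes` to keep both `Share` columns.

```python
import pandas as pd

# Hypothetical miniatures of two tables that share "Year" and "Share" columns.
by_gender = pd.DataFrame({
    "Year": [2021, 2021],
    "Gender": ["Male", "Female"],
    "Share": [49.1, 50.9],
})
by_age = pd.DataFrame({
    "Year": [2021, 2021],
    "Age": ["Under 18 Years", "18-64 Years"],
    "Share": [47.3, 52.7],
})

# With no `on`, merge joins on every shared column: here Year AND Share.
# No row has the same Share in both tables, so the result is empty.
implicit = by_gender.merge(by_age, how="inner")

# Naming the key keeps the rows; suffixes tell the two Share columns apart.
explicit = by_gender.merge(
    by_age, how="inner", on="Year", suffixes=("_gender", "_age")
)

print(implicit.shape, explicit.shape)
```

Note that joining on `Year` alone pairs every gender row with every age row of the same year, so whether such a join is meaningful for the real datasets is a separate question.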


In [None]:
#Merging Patient_to_Clinician_Ratios with Patient_to_Dentist_Ratios to create a new data frame (with no "on" argument, pandas joins on all shared columns: the year and geography columns)
patient_ratios = Patient_to_Clinician_Ratios.merge(Patient_to_Dentist_Ratios, how='inner')

#Merging the new data frame with Patient_to_Other_Clinician_Ratios
patient_ratios = patient_ratios.merge(Patient_to_Other_Clinician_Ratios, how='inner')

#OMITTING THIS FOR NOW
#V-----------------------------v
#Merging the new data frame with Patient_to_Mental_Health_Professional_Ratios
#patient_ratios = patient_ratios.merge(Patient_to_Mental_Health_Professional_Ratios, how='inner')
#^-----------------------------^

#Dropping redundant columns
patient_ratios = patient_ratios.drop(columns=['Patient to Dentist Ratio Data Source Years','Other Primary Care Providers Data Source Years','Patient to Primary Care Physician Ratio Data Source Years','ID Year'])

patient_ratios


Unnamed: 0,Year,Patient to Primary Care Physician Ratio,Geography,ID Geography,Slug Geography,Patient to Dentist Ratio,Other Primary Care Providers
0,2022,4542,"Allegan County, MI",05000US26005,allegan-county-mi,2973,2766
1,2022,2931,"Barry County, MI",05000US26015,barry-county-mi,3651,1320
2,2022,3806,"Ionia County, MI",05000US26067,ionia-county-mi,2391,1403
3,2022,1079,"Kent County, MI",05000US26081,kent-county-mi,1339,467
4,2022,2556,"Montcalm County, MI",05000US26117,montcalm-county-mi,2048,870
...,...,...,...,...,...,...,...
67,2014,1127,"Kent County, MI",05000US26081,kent-county-mi,1549,909
68,2014,2747,"Montcalm County, MI",05000US26117,montcalm-county-mi,2651,1467
69,2014,1416,"Muskegon County, MI",05000US26121,muskegon-county-mi,1949,1309
70,2014,2545,"Newaygo County, MI",05000US26123,newaygo-county-mi,2205,1845


# <u>Data Analysis & Exploration</u>

## Summary Statistics

Using the pandas ".describe()" method, I generate quick summary statistics for each dataset, which saves the time and repetition of calculating each of them by hand.
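The `count` row of `.describe()` only counts non-NA values, so it can be read alongside the share of missing values per column. The following sketch uses a made-up table (hypothetical values, column names borrowed from the lead-level dataset) to show both.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature with some missing values.
toy = pd.DataFrame({
    "Year": [2010, 2012, 2014, 2016],
    "Tested": [20.0, np.nan, 40.0, 10.0],
    "Pct_EBLL": [np.nan, np.nan, 0.1, 0.0],
})

stats = toy.describe()             # count, mean, std, min, quartiles, max
missing_share = toy.isna().mean()  # fraction of missing values per column

print(stats.loc["count"])
print(missing_share)
```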

In [None]:
HE__Percent_of_Children_with_Elevated_Blood_Lead_Levels_20240127.describe()

Unnamed: 0,Date,Year,Population,Tested,Pct_Tested,EBLL,Pct_EBLL,CEBLL,Pct_CEBLL,VEBLL,...,Ven_5_9,Pct_Ven_5_9,Ven_10_14,Pct_Ven_10_14,Ven_15_19,Pct_Ven_15_19,Ven_20_39,Pct_Ven_20_39,Ven_GTE_40,Pct_Ven_GTE_40
count,22416.0,22416.0,8693.0,18483.0,7762.0,13698.0,13566.0,13999.0,13867.0,16510.0,...,17272.0,17140.0,20056.0,19924.0,20981.0,20849.0,21162.0,21030.0,22129.0,21997.0
mean,2014.938348,2014.938348,745.441735,143.167884,0.208564,6.415535,0.012784,2.253161,0.003293,2.303452,...,1.62251,0.002439,0.194256,0.000239,0.023831,1.9e-05,0.013751,1.3e-05,0.0,0.0
std,3.144642,3.144642,1112.828331,284.184706,0.142341,25.605627,0.032313,14.321974,0.012884,12.832104,...,9.498717,0.01299,1.722209,0.002247,0.472508,0.000453,0.346525,0.000411,0.0,0.0
min,2010.0,2010.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,2012.0,2012.0,90.0,21.0,0.135,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,2015.0,2015.0,308.0,55.0,0.183,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,2018.0,2018.0,1061.0,160.0,0.249,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,2020.0,2020.0,19078.0,6980.0,4.5,746.0,0.5,541.0,0.198,245.0,...,204.0,0.5,42.0,0.064,16.0,0.028,16.0,0.031,0.0,0.0


In [None]:
HE__Percent_of_Children_with_Elevated_Blood_Lead_Levels_20240127_cleaned.describe()

Unnamed: 0,Date,Year,Population,Tested,Pct_Tested,EBLL,Pct_EBLL,CEBLL,Pct_CEBLL,VEBLL,...,Ven_5_9,Pct_Ven_5_9,Ven_10_14,Pct_Ven_10_14,Ven_15_19,Pct_Ven_15_19,Ven_20_39,Pct_Ven_20_39,Ven_GTE_40,Pct_Ven_GTE_40
count,2766.0,2766.0,2766.0,2766.0,2766.0,2766.0,2766.0,2766.0,2766.0,2766.0,...,2766.0,2766.0,2766.0,2766.0,2766.0,2766.0,2766.0,2766.0,2766.0,2766.0
mean,2015.167751,2015.167751,381.463485,70.668113,0.205351,1.829356,0.001931,1.022415,0.00135,0.806941,...,0.601952,0.000455,0.110268,6.7e-05,0.048084,2.9e-05,0.046638,2.9e-05,0.0,0.0
std,2.575455,2.575455,923.960452,295.643062,0.167599,24.541771,0.012954,16.200469,0.008688,10.363256,...,7.48227,0.005368,1.608226,0.001144,0.70611,0.000512,0.685416,0.000541,0.0,0.0
min,2010.0,2010.0,2.0,6.0,0.03,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,2013.0,2013.0,78.0,14.0,0.124,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,2015.0,2015.0,169.0,27.0,0.17,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,2017.0,2017.0,378.0,58.0,0.24,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,2019.0,2019.0,19078.0,6617.0,4.5,746.0,0.222,541.0,0.14,205.0,...,139.0,0.137,42.0,0.036,16.0,0.017,16.0,0.019,0.0,0.0


In [None]:
Health_Care_Diversity.describe()

Unnamed: 0,ID Year,Year,ID Health Coverage,ID Gender,Health Insurance by Gender and Age,Share,ID Kaiser Coverage,ID Age,Health Insurance Policies
count,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
mean,,,,,,,,,
std,,,,,,,,,
min,,,,,,,,,
25%,,,,,,,,,
50%,,,,,,,,,
75%,,,,,,,,,
max,,,,,,,,,


In [None]:
Health_Care_Diversity_Age_Range.describe()

Unnamed: 0,ID Year,Year,ID Kaiser Coverage,ID Age,Health Insurance Policies,Share
count,216.0,216.0,216.0,216.0,216.0,216.0
mean,2017.0,2017.0,2.5,1.5,8001.782407,25.0
std,2.587987,2.587987,1.711792,1.120631,10235.161602,20.346561
min,2013.0,2013.0,0.0,0.0,44.0,0.20241
25%,2015.0,2015.0,1.0,0.75,419.25,8.36995
50%,2017.0,2017.0,2.5,1.5,4237.5,21.668559
75%,2019.0,2019.0,4.0,2.25,10989.5,38.441964
max,2021.0,2021.0,5.0,3.0,39669.0,88.64473


In [None]:
Patient_to_Clinician_Ratios.describe()

Unnamed: 0,ID Year,Year,Patient to Primary Care Physician Ratio,Patient to Primary Care Physician Ratio Data Source Years
count,72.0,72.0,72.0,72.0
mean,2018.0,2018.0,2450.25,2015.0
std,2.600108,2.600108,960.406586,2.600108
min,2014.0,2014.0,1079.0,2011.0
25%,2016.0,2016.0,1604.5,2013.0
50%,2018.0,2018.0,2485.0,2015.0
75%,2020.0,2020.0,3037.25,2017.0
max,2022.0,2022.0,4542.0,2019.0


In [None]:
Patient_to_Dentist_Ratios.describe()

Unnamed: 0,ID Year,Year,Patient to Dentist Ratio,Patient to Dentist Ratio Data Source Years
count,72.0,72.0,72.0,72.0
mean,2018.0,2018.0,2338.305556,2016.0
std,2.600108,2.600108,806.772908,2.600108
min,2014.0,2014.0,1339.0,2012.0
25%,2016.0,2016.0,1735.0,2014.0
50%,2018.0,2018.0,2131.0,2016.0
75%,2020.0,2020.0,2729.25,2018.0
max,2022.0,2022.0,4346.0,2020.0


In [None]:
Patient_to_Other_Clinician_Ratios.describe()

Unnamed: 0,ID Year,Year,Other Primary Care Providers,Other Primary Care Providers Data Source Years
count,72.0,72.0,72.0,72.0
mean,2018.0,2018.0,1642.736111,2017.0
std,2.600108,2.600108,795.830932,2.600108
min,2014.0,2014.0,467.0,2013.0
25%,2016.0,2016.0,1030.75,2015.0
50%,2018.0,2018.0,1482.5,2017.0
75%,2020.0,2020.0,2125.75,2019.0
max,2022.0,2022.0,3863.0,2021.0


In [None]:
SC__Number_of_Serious_Injuries_and_Fatalities_by_Mode_20240122.describe()

Unnamed: 0,Calendar Year,# of Persons,% of all Fatalities,% of all Injuries
count,312.0,312.0,36.0,36.0
mean,2017.5,6.326923,0.0,0.083333
std,1.710569,11.594801,0.0,0.280306
min,2015.0,0.0,0.0,0.0
25%,2016.0,0.0,0.0,0.0
50%,2017.5,1.0,0.0,0.0
75%,2019.0,5.0,0.0,0.0
max,2020.0,64.0,0.0,1.0


In [None]:
Uninsured_People.describe()

Unnamed: 0,ID Kaiser Coverage,ID Year,Year,Health Insurance Policies,share
count,54.0,54.0,54.0,54.0,54.0
mean,2.5,2017.0,2017.0,32007.12963,16.666667
std,1.723861,2.606233,2.606233,28542.434294,14.841613
min,0.0,2013.0,2013.0,1252.0,0.641449
25%,1.0,2015.0,2015.0,16066.25,8.379751
50%,2.5,2017.0,2017.0,19667.5,10.289566
75%,4.0,2019.0,2019.0,45773.5,23.950343
max,5.0,2021.0,2021.0,95550.0,49.063657


# <u> Visualizations </u>

Below are some visualizations I've come up with for each of my datasets. This section follows no fixed format; it simply visualizes whatever makes the most sense to show from each dataset.

## Line Charts

In [None]:
mask = HE__Percent_of_Children_with_Elevated_Blood_Lead_Levels_20240127['Year'] == 2020
_2020_data = HE__Percent_of_Children_with_Elevated_Blood_Lead_Levels_20240127[mask]


x = np.arange(0,len(_2020_data.Date),1)

_2020_data

Unnamed: 0,Geo,Group,Date,Year,Population,Tested,Pct_Tested,EBLL,Pct_EBLL,CEBLL,...,Ven_5_9,Pct_Ven_5_9,Ven_10_14,Pct_Ven_10_14,Ven_15_19,Pct_Ven_15_19,Ven_20_39,Pct_Ven_20_39,Ven_GTE_40,Pct_Ven_GTE_40
3,48117,Children 1-2,2020,2020,,40.0,,,,,...,,,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0
4,48120,Children < 6,2020,2020,,250.0,,,,0.0,...,,,,,,,0.0,0.0,0.0,0.0
16,48809,Children < 6,2020,2020,,106.0,,,,,...,,,0.0,0.000,,,0.0,0.0,0.0,0.0
20,48854,Children 1-2,2020,2020,,63.0,,,,,...,0.0,0.000,,,0.0,0.0,0.0,0.0,0.0,0.0
35,GRAND RAPIDS,Children 1-2,2020,2020,,2910.0,,99.0,0.034,39.0,...,46.0,0.016,10.0,0.003,,,,,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22376,49969,Children < 6,2020,2020,,10.0,,0.0,0.000,0.0,...,0.0,0.000,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0
22382,49970,Children 1-2,2020,2020,,,,0.0,0.000,0.0,...,0.0,0.000,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0
22391,49970,Children < 6,2020,2020,,,,0.0,0.000,0.0,...,0.0,0.000,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0
22402,49971,Children 1-2,2020,2020,,,,0.0,0.000,0.0,...,0.0,0.000,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
HE__Percent_of_Children_with_Elevated_Blood_Lead_Levels_20240127.columns

Index(['Geo', 'Group', 'Date', 'Year', 'Population', 'Tested', 'Pct_Tested',
       'EBLL', 'Pct_EBLL', 'CEBLL', 'Pct_CEBLL', 'VEBLL', 'Pct_VEBLL',
       'Ven_5_9', 'Pct_Ven_5_9', 'Ven_10_14', 'Pct_Ven_10_14', 'Ven_15_19',
       'Pct_Ven_15_19', 'Ven_20_39', 'Pct_Ven_20_39', 'Ven_GTE_40',
       'Pct_Ven_GTE_40'],
      dtype='object')
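As one possible line chart for this section, the sketch below plots `Pct_EBLL` by year for the Grand Rapids "Children < 6" rows. The five data points are copied from the cleaned table displayed earlier; the variable name `grand_rapids`, the axis labels, and the title are my own assumptions, and the chart is an illustration rather than a finished figure.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Rows copied from the cleaned table shown earlier (GRAND RAPIDS, Children < 6).
grand_rapids = pd.DataFrame({
    "Year": [2010, 2012, 2014, 2016, 2018],
    "Pct_EBLL": [0.114, 0.090, 0.063, 0.079, 0.045],
})

fig, ax = plt.subplots(figsize=(6, 4))

# Pct_EBLL is stored as a fraction of tested children, so scale it to percent.
ax.plot(grand_rapids["Year"], grand_rapids["Pct_EBLL"] * 100, marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Tested children with elevated blood lead level (%)")
ax.set_title("Grand Rapids, children under 6")
ax.set_xticks(grand_rapids["Year"].tolist())
fig.tight_layout()
```

In Colab the figure renders at the end of the cell; elsewhere, `plt.show()` or `fig.savefig(...)` can be added.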

# <u> Predictive Models </u>

Any predictive models built for a given dataset will be displayed in this section.

In [None]:
HE__Percent_of_Children_with_Elevated_Blood_Lead_Levels_20240127_cleaned

Unnamed: 0,Geo,Group,Date,Year,Population,Tested,Pct_Tested,EBLL,Pct_EBLL,CEBLL,...,Ven_5_9,Pct_Ven_5_9,Ven_10_14,Pct_Ven_10_14,Ven_15_19,Pct_Ven_15_19,Ven_20_39,Pct_Ven_20_39,Ven_GTE_40,Pct_Ven_GTE_40
36,GRAND RAPIDS,Children < 6,2010,2010,18726.0,6558.0,0.350,746.0,0.114,541.0,...,139.0,0.021,42.0,0.006,12.0,0.002,12.0,0.002,0.0,0.0
38,GRAND RAPIDS,Children < 6,2012,2012,18427.0,6605.0,0.358,593.0,0.090,461.0,...,89.0,0.013,24.0,0.004,10.0,0.002,9.0,0.001,0.0,0.0
40,GRAND RAPIDS,Children < 6,2014,2014,19078.0,6377.0,0.334,400.0,0.063,281.0,...,78.0,0.012,21.0,0.003,12.0,0.002,8.0,0.001,0.0,0.0
42,GRAND RAPIDS,Children < 6,2016,2016,18297.0,6617.0,0.362,523.0,0.079,333.0,...,136.0,0.021,26.0,0.004,16.0,0.002,12.0,0.002,0.0,0.0
44,GRAND RAPIDS,Children < 6,2018,2018,17250.0,6049.0,0.351,271.0,0.045,100.0,...,127.0,0.021,23.0,0.004,9.0,0.001,12.0,0.002,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22368,49969,Children < 6,2012,2012,49.0,22.0,0.449,0.0,0.000,0.0,...,0.0,0.000,0.0,0.000,0.0,0.000,0.0,0.000,0.0,0.0
22370,49969,Children < 6,2014,2014,69.0,23.0,0.333,0.0,0.000,0.0,...,0.0,0.000,0.0,0.000,0.0,0.000,0.0,0.000,0.0,0.0
22371,49969,Children < 6,2015,2015,74.0,15.0,0.203,0.0,0.000,0.0,...,0.0,0.000,0.0,0.000,0.0,0.000,0.0,0.000,0.0,0.0
22372,49969,Children < 6,2016,2016,78.0,19.0,0.244,0.0,0.000,0.0,...,0.0,0.000,0.0,0.000,0.0,0.000,0.0,0.000,0.0,0.0


# <u>Exporting Data to New CSV Files </u>

## A note

If applicable, any sort of dataset that was transformed and/or manipulated in the above code will be exported to a new csv file.

If it is a change pertaining to the original file, then the new csv file will have the same name as the original - with "_cleaned" added at the end of the string.

If it is a dataset composed of a join between two or more datasets, then the new dataset will have a different name entirely.

In [None]:
#Because I'm using Google Colab, writing the files this way makes them appear in the "Files" section of my Colab notebook
patient_ratios.to_csv('Patient_Ratios.csv')
HE__Percent_of_Children_with_Elevated_Blood_Lead_Levels_20240127_cleaned.to_csv('HE__Percent_of_Children_with_Elevated_Blood_Lead_Levels_20240127_cleaned.csv')
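By default, `to_csv` also writes the DataFrame index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read again. The sketch below illustrates this with an in-memory buffer instead of a file; the two rows reuse Kent County dentist ratios from the merged table above, and the variable names are hypothetical.

```python
import io

import pandas as pd

# Two rows with a non-default index, as left behind by filtering or merging.
toy = pd.DataFrame(
    {"Year": [2014, 2022], "Patient to Dentist Ratio": [1549, 1339]},
    index=[67, 3],
)

# Default: the index is written as an unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)

# index=False: only the data columns are written.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)

# Reading the default export back shows the extra column.
reloaded = pd.read_csv(io.StringIO(with_index.getvalue()))
print(reloaded.columns.tolist())
```

Whether to pass `index=False` depends on whether the original row labels are still needed downstream.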