## Table of Contents

1. [Objective](#section1)<br> 
2. [Importing Python Libraries](#section2)<br> 
3. [Establishing Connection with MongoDB and Loading JSON File](#section3)<br> 
4. [Storing the MongoDB Collection Data in a Dataframe](#section4)<br>
5. [Data Preprocessing](#section5)<br>
    - 5.1 [Removing Meta-Data Columns](#section50101)<br>     
    - 5.2 [Type-casting Date and State Variable into Date and String Datatype](#section50102)<br>     

<a id=section1></a>
## 1. Objective
The end objective is to evaluate the impact of COVID-19 on hospitals across the United States from March 2020 to March 2021. The analysis covers parameters such as COVID admissions, critical staffing shortages, inpatient beds, ICU occupancy, and adult versus pediatric breakdowns across different states.

<a id=section2></a>
## 2. Importing Python Libraries

In [6]:
# Importing Required Python Libraries
import json
from pymongo import MongoClient
import pandas as pd
import numpy as np
import os

<a id=section3></a>
## 3. Establishing Connection with MongoDB and Loading JSON File

In [7]:
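# Establishing a connection with the MongoDB server
from pymongo.errors import ConnectionFailure

try:
    # MongoClient connects lazily, so a ping is sent to confirm the server is reachable
    client = MongoClient('localhost', 27017, serverSelectionTimeoutMS=5000)
    client.admin.command('ping')
    print("Connected!")
except ConnectionFailure:
    print("Unable to connect to MongoDB instance")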

Connected!


In [8]:
# Setting up the working directory
os.chdir('/Users/adityaraj/Desktop/Database and Analytics Programming Folder/CA Project/')
# Opening the COVID JSON file and reading its contents into file_data
try:
    with open('COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries.json') as f:
        file_data = json.load(f)
except FileNotFoundError:
    print("File not found at the location")
except json.JSONDecodeError:
    print("Error in processing file: the content is not valid JSON")
# The client is not closed here because the following cells still use it
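
The later cells read `meta` → `view` → `columns` and `data` from the loaded file. As a minimal sketch, assuming the export follows that two-part layout, the loader below wraps the read in a function that returns `None` on failure rather than leaving `file_data` undefined. The function name `load_json_export` and the miniature sample document are hypothetical and exist only for illustration.

```python
import json
import tempfile
from pathlib import Path


def load_json_export(path):
    """Return the parsed JSON document at `path`, or None if it cannot be read."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        print(f"File not found: {path}")
    except json.JSONDecodeError as exc:
        print(f"Invalid JSON in {path}: {exc}")
    return None


# Hypothetical miniature of the export: column descriptions under
# meta.view.columns and one list of values per row under data.
sample = {
    "meta": {"view": {"columns": [{"name": "state"}, {"name": "date"}]}},
    "data": [["IA", "2021-01-03T00:00:00"]],
}

with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.json"
    sample_path.write_text(json.dumps(sample), encoding="utf-8")
    loaded = load_json_export(sample_path)
    missing = load_json_export(Path(tmp) / "does_not_exist.json")
```

Returning `None` lets the calling cell check the result before inserting anything into MongoDB.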

In [9]:
# Selecting the database and collection (MongoDB creates both on the first insert)
# and inserting the loaded file as a single document

db = client['COVID']
collection_covid = db['COVID_HOSPITAL']
collection_covid.insert_one(file_data)
# The connection is left open because the next cell reads from the collection

In [10]:
# Extracting column names from the columns tag within the JSON file and storing them in a 'cols' list

getData = pd.DataFrame(list(collection_covid.find()))
# All documents are now in memory, so the connection can be closed
client.close()
cols = []

for data in getData["meta"]:
    for column in data['view']['columns']:
        columnName = column['name']
        if columnName not in cols:
            cols.append(columnName)

print(cols)

['sid', 'id', 'position', 'created_at', 'created_meta', 'updated_at', 'updated_meta', 'meta', 'state', 'date', 'critical_staffing_shortage_today_yes', 'critical_staffing_shortage_today_no', 'critical_staffing_shortage_today_not_reported', 'critical_staffing_shortage_anticipated_within_week_yes', 'critical_staffing_shortage_anticipated_within_week_no', 'critical_staffing_shortage_anticipated_within_week_not_reported', 'hospital_onset_covid', 'hospital_onset_covid_coverage', 'inpatient_beds', 'inpatient_beds_coverage', 'inpatient_beds_used', 'inpatient_beds_used_coverage', 'inpatient_beds_used_covid', 'inpatient_beds_used_covid_coverage', 'previous_day_admission_adult_covid_confirmed', 'previous_day_admission_adult_covid_confirmed_coverage', 'previous_day_admission_adult_covid_suspected', 'previous_day_admission_adult_covid_suspected_coverage', 'previous_day_admission_pediatric_covid_confirmed', 'previous_day_admission_pediatric_covid_confirmed_coverage', 'previous_day_admission_pediatri
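
The same extraction can be tried on a plain dictionary without a MongoDB round trip. The sketch below assumes a document shaped like the one read above, with column descriptions under `meta` → `view` → `columns`. The helper name `column_names` and the sample document are hypothetical.

```python
def column_names(document):
    """Collect unique column names from meta.view.columns in first-seen order."""
    names = [column["name"] for column in document["meta"]["view"]["columns"]]
    # dict.fromkeys drops duplicates while preserving insertion order
    return list(dict.fromkeys(names))


# Hypothetical sample with one repeated name to show the de-duplication
sample_document = {
    "meta": {
        "view": {
            "columns": [
                {"name": "sid"},
                {"name": "state"},
                {"name": "date"},
                {"name": "state"},
            ]
        }
    }
}

cols_sample = column_names(sample_document)
print(cols_sample)
```

Using `dict.fromkeys` avoids the repeated `not in` scan over a list, which matters little for a few dozen columns but keeps the intent in a single expression.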

<a id=section4></a>
## 4. Storing the MongoDB Collection Data in a Dataframe

In [11]:
#Store the data into a dataframe and use "cols" list as column names in the dataframe
COVID_HOSPITAL = pd.DataFrame(getData["data"][0], columns = cols)
COVID_HOSPITAL.head()

| | sid | id | position | created_at | created_meta | updated_at | updated_meta | meta | state | date | ... | inpatient_bed_covid_utilization_denominator | adult_icu_bed_covid_utilization | adult_icu_bed_covid_utilization_coverage | adult_icu_bed_covid_utilization_numerator | adult_icu_bed_covid_utilization_denominator | adult_icu_bed_utilization | adult_icu_bed_utilization_coverage | adult_icu_bed_utilization_numerator | adult_icu_bed_utilization_denominator | geocoded_state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | `row-2ws6~7xbg-x7td` | 00000000-0000-0000-96B1-2F449842D794 | 0 | 1616853107 | | 1616853107 | | { } | IA | 2021-01-03T00:00:00 | ... | 8024 | 0.1783536585365853 | 124 | 117 | 656 | 0.68 | 126 | 459 | 675 | POINT (-93.500061 42.074659) |
| 1 | `row-9hxe~53ka~ex4u` | 00000000-0000-0000-9F29-CA9CC62C6CAA | 0 | 1616853107 | | 1616853107 | | { } | ID | 2021-01-03T00:00:00 | ... | 3644 | 0.2317880794701986 | 51 | 70 | 302 | 0.6245954692556634 | 52 | 193 | 309 | POINT (-114.659366 44.389073) |
| 2 | `row-8irx.ii9u_yemm` | 00000000-0000-0000-E2CD-4B54EBD757E5 | 0 | 1616853107 | | 1616853107 | | { } | IL | 2021-01-03T00:00:00 | ... | 32079 | 0.2276353276353276 | 202 | 799 | 3510 | 0.6284596030192899 | 207 | 2248 | 3577 | POINT (-89.148632 40.124144) |
| 3 | `row-dvbe_iika-mmc2` | 00000000-0000-0000-DDE2-94A11162072C | 0 | 1616853107 | | 1616853107 | | { } | IN | 2021-01-03T00:00:00 | ... | 18456 | 0.2989206945096199 | 164 | 637 | 2131 | 0.7408256880733946 | 166 | 1615 | 2180 | POINT (-86.2818 39.919991) |
| 4 | `row-38b6_nv3m~rijp` | 00000000-0000-0000-7AD5-E5DB09B972BF | 0 | 1616853107 | | 1616853107 | | { } | KS | 2021-01-03T00:00:00 | ... | 8738 | 0.3196622436670687 | 147 | 265 | 829 | 0.7403055229142186 | 150 | 630 | 851 | POINT (-98.38018 38.484729) |
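
Building the dataframe from `data` and `cols` only works when every row carries exactly one value per column name. The sketch below illustrates that pairing on a small hypothetical sample. The row values and the names `sample_cols`, `sample_rows`, and `sample_df` are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical column names and rows, shaped like the lists used above
sample_cols = ["sid", "state", "date", "inpatient_beds"]
sample_rows = [
    ["row-a", "IA", "2021-01-03T00:00:00", "8024"],
    ["row-b", "ID", "2021-01-03T00:00:00", "3644"],
]

# Each row must supply one value per column, otherwise pandas raises an error
lengths_match = all(len(row) == len(sample_cols) for row in sample_rows)

sample_df = pd.DataFrame(sample_rows, columns=sample_cols)
print(sample_df.head())
```

Checking the row lengths first gives a clearer failure point than the shape error pandas raises when the counts differ.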


<a id=section5></a>
## 5. Data Preprocessing

<a id=section50101></a>
## 5.1. Removing Meta-Data Columns

In [12]:
# The first 8 columns hold metadata, so they are dropped from the COVID_HOSPITAL dataframe

COVID_HOSPITAL.drop(COVID_HOSPITAL.columns[0:8], axis = 1, inplace = True)
COVID_HOSPITAL.head()
# COVID_HOSPITAL.info()

| | state | date | critical_staffing_shortage_today_yes | critical_staffing_shortage_today_no | critical_staffing_shortage_today_not_reported | critical_staffing_shortage_anticipated_within_week_yes | critical_staffing_shortage_anticipated_within_week_no | critical_staffing_shortage_anticipated_within_week_not_reported | hospital_onset_covid | hospital_onset_covid_coverage | ... | inpatient_bed_covid_utilization_denominator | adult_icu_bed_covid_utilization | adult_icu_bed_covid_utilization_coverage | adult_icu_bed_covid_utilization_numerator | adult_icu_bed_covid_utilization_denominator | adult_icu_bed_utilization | adult_icu_bed_utilization_coverage | adult_icu_bed_utilization_numerator | adult_icu_bed_utilization_denominator | geocoded_state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | IA | 2021-01-03T00:00:00 | 6 | 63 | 57 | 6 | 62 | 58 | 2 | 124 | ... | 8024 | 0.1783536585365853 | 124 | 117 | 656 | 0.68 | 126 | 459 | 675 | POINT (-93.500061 42.074659) |
| 1 | ID | 2021-01-03T00:00:00 | 4 | 47 | 1 | 7 | 44 | 1 | 5 | 51 | ... | 3644 | 0.2317880794701986 | 51 | 70 | 302 | 0.6245954692556634 | 52 | 193 | 309 | POINT (-114.659366 44.389073) |
| 2 | IL | 2021-01-03T00:00:00 | 19 | 171 | 17 | 17 | 173 | 17 | 73 | 202 | ... | 32079 | 0.2276353276353276 | 202 | 799 | 3510 | 0.6284596030192899 | 207 | 2248 | 3577 | POINT (-89.148632 40.124144) |
| 3 | IN | 2021-01-03T00:00:00 | 22 | 142 | 2 | 29 | 135 | 2 | 9 | 164 | ... | 18456 | 0.2989206945096199 | 164 | 637 | 2131 | 0.7408256880733946 | 166 | 1615 | 2180 | POINT (-86.2818 39.919991) |
| 4 | KS | 2021-01-03T00:00:00 | 10 | 136 | 4 | 13 | 133 | 4 | 14 | 147 | ... | 8738 | 0.3196622436670687 | 147 | 265 | 829 | 0.7403055229142186 | 150 | 630 | 851 | POINT (-98.38018 38.484729) |
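
Dropping by position removes a further eight columns each time the cell is re-run. As a hedged alternative, the sketch below drops the metadata columns by name, using the eight names printed in the `cols` list earlier. The constant `META_COLUMNS` and the one-row sample frame are hypothetical.

```python
import pandas as pd

# The eight metadata column names as printed in the cols list
META_COLUMNS = [
    "sid", "id", "position", "created_at",
    "created_meta", "updated_at", "updated_meta", "meta",
]

# Hypothetical one-row frame with the metadata columns plus two data columns
sample_df = pd.DataFrame(
    [["row-a", "uuid-a", 0, 1616853107, None, 1616853107, None, "{ }",
      "IA", "2021-01-03T00:00:00"]],
    columns=META_COLUMNS + ["state", "date"],
)

# errors="ignore" skips names that are already gone, so a re-run is harmless
trimmed = sample_df.drop(columns=META_COLUMNS, errors="ignore")
trimmed_again = trimmed.drop(columns=META_COLUMNS, errors="ignore")
print(list(trimmed.columns))
```

Dropping by name also documents which columns are treated as metadata, at the cost of a longer cell.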


<a id=section50102></a>
## 5.2. Type-casting Date and State Variable into Date and String Datatype

In [13]:
# Changing the datatype of the date column to datetime and the state column to string
COVID_HOSPITAL['date'] = pd.to_datetime(COVID_HOSPITAL['date'])
COVID_HOSPITAL['state'] = COVID_HOSPITAL['state'].astype(str)
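
After the cast it is worth confirming the resulting dtype. A datetime column then allows the March 2020 to March 2021 window from the objective to be selected directly. The sketch below shows both steps on a small hypothetical frame. The sample rows and the names `sample_df`, `in_period`, and `period_df` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical rows: one inside and one outside the study period
sample_df = pd.DataFrame(
    {
        "state": ["IA", "ID"],
        "date": ["2021-01-03T00:00:00", "2020-02-15T00:00:00"],
    }
)

sample_df["date"] = pd.to_datetime(sample_df["date"])
sample_df["state"] = sample_df["state"].astype(str)

# Confirm that the date column now holds datetime values
is_datetime = pd.api.types.is_datetime64_any_dtype(sample_df["date"])

# Keep rows from 1 March 2020 through 31 March 2021, both ends inclusive
in_period = sample_df["date"].between(
    pd.Timestamp("2020-03-01"), pd.Timestamp("2021-03-31")
)
period_df = sample_df[in_period]
print(period_df)
```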