# COVID-19 Case Counts for San Diego County
**[Work in progress]**

This notebook loads COVID-19 case counts for San Diego County by ZIP code for ingestion into a knowledge graph.

Data source: [County of San Diego, Health and Human Services Agency, Public Health Services, Epidemiology and Immunization Services Branch](https://www.sandiegocounty.gov/content/sdc/hhsa/programs/phs/community_epidemiology/dc/2019-nCoV/status.html)

[County of San Diego - Coronavirus Disease 2019 (COVID-19) Dashboard](https://www.arcgis.com/apps/opsdashboard/index.html#/96feda77f12f46638b984fcb1d17bd24)

Authors: Ilya Zaslavsky (zaslavsk@sdsc.edu), Peter Rose (pwrose@ucsd.edu)

In [1]:
import os
import pandas as pd
from pathlib import Path
from py2neo import Graph
from arcgis.features import FeatureLayerCollection

In [2]:
pd.options.display.max_rows = None  # display all rows
pd.options.display.max_columns = None  # display all columns

In [3]:
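# Neo4j import directory; LOAD CSV reads file:/// URLs relative to it.
# Fails with a KeyError (instead of a confusing TypeError) if NEO4J_IMPORT is not set.
NEO4J_IMPORT = Path(os.environ['NEO4J_IMPORT'])
print(NEO4J_IMPORT)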

/Users/peter/Library/Application Support/com.Neo4j.Relate/data/dbmss/dbms-8bf637fc-0d20-4d9f-9c6f-f7e72e92a4da/import


### Get data from ArcGIS web service

In [4]:
sd_dashboard_service = 'https://services1.arcgis.com/1vIhDJwtG5eNmiqX/ArcGIS/rest/services/Covid19_San_Diego_County_PUBLIC_VIEW/FeatureServer'

In [5]:
db_item = FeatureLayerCollection(sd_dashboard_service)
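`FeatureLayerCollection` wraps the service's REST endpoint, and each entry in `db_item.layers` corresponds to a numbered layer under the `FeatureServer` URL. As a minimal sketch, assuming the standard ArcGIS REST `query` operation with its `where`, `outFields` and `f` parameters, the request for a single layer can be built with only the standard library. The helper name `layer_query_url` is hypothetical, and the sketch ignores the server's record limit, which `from_layer` takes care of.

```python
from urllib.parse import urlencode

SERVICE_URL = ('https://services1.arcgis.com/1vIhDJwtG5eNmiqX/ArcGIS/rest/services/'
               'Covid19_San_Diego_County_PUBLIC_VIEW/FeatureServer')


def layer_query_url(service_url: str, layer_id: int, where: str = '1=1') -> str:
    """Build a query URL for one layer of an ArcGIS FeatureServer (hypothetical helper)."""
    params = {
        'where': where,    # SQL-style filter; '1=1' selects every feature
        'outFields': '*',  # return all attribute fields
        'f': 'json',       # response format
    }
    return f"{service_url.rstrip('/')}/{layer_id}/query?{urlencode(params)}"


print(layer_query_url(SERVICE_URL, 0))
```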

### Clean up cumulative case counts

In [6]:
cases = pd.DataFrame.spatial.from_layer(db_item.layers[0])

In [7]:
cases.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40577 entries, 0 to 40576
Data columns (total 8 columns):
 #   Column                   Non-Null Count  Dtype         
---  ------                   --------------  -----         
 0   Case_Count               40460 non-null  float64       
 1   OBJECTID                 40577 non-null  int64         
 2   SDEP_SANGIS_ZIPCODE_ZIP  40577 non-null  int64         
 3   SHAPE                    40577 non-null  geometry      
 4   UpdateDate               40577 non-null  datetime64[ns]
 5   ZipText                  40577 non-null  object        
 6   Zip_Code                 37300 non-null  object        
 7   rate_100k                26107 non-null  float64       
dtypes: datetime64[ns](1), float64(2), geometry(1), int64(2), object(2)
memory usage: 2.5+ MB


In [8]:
cases.dropna(subset=['Case_Count'], inplace=True)  # drop rows without a case count

In [9]:
cases.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 40460 entries, 0 to 40576
Data columns (total 8 columns):
 #   Column                   Non-Null Count  Dtype         
---  ------                   --------------  -----         
 0   Case_Count               40460 non-null  float64       
 1   OBJECTID                 40460 non-null  int64         
 2   SDEP_SANGIS_ZIPCODE_ZIP  40460 non-null  int64         
 3   SHAPE                    40460 non-null  geometry      
 4   UpdateDate               40460 non-null  datetime64[ns]
 5   ZipText                  40460 non-null  object        
 6   Zip_Code                 37270 non-null  object        
 7   rate_100k                26107 non-null  float64       
dtypes: datetime64[ns](1), float64(2), geometry(1), int64(2), object(2)
memory usage: 2.8+ MB

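The two `info()` outputs show that 117 rows (40577 - 40460) had no `Case_Count`. Because `subset` limits the check to that column, missing values elsewhere survive: `rate_100k` still has 26107 non-null entries, while `Zip_Code` lost 30 non-null entries only because some of the dropped rows carried a ZIP code. A toy illustration with made-up values:

```python
import numpy as np
import pandas as pd

# Toy frame (made-up values): one row lacks a case count,
# another lacks a rate but does have a count.
toy = pd.DataFrame({
    'Case_Count': [8.0, np.nan, 13.0],
    'ZipText': ['91902', '91910', '91911'],
    'rate_100k': [np.nan, 10.5, 20.1],
})

# Only rows whose Case_Count is missing are dropped; NaNs in other columns are kept.
kept = toy.dropna(subset=['Case_Count'])
print(kept['ZipText'].tolist())
print(kept['rate_100k'].isna().sum())
```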

In [10]:
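cases['cases'] = cases['Case_Count'].astype(int)  # float64 -> int; safe because NaN counts were dropped
cases['date'] = cases['UpdateDate'].dt.normalize()  # keep the calendar date, set the time to midnight
cases.rename(columns={'ZipText': 'zipCode'}, inplace=True)  # ZIP code kept as text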
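`astype(int)` is only safe here because the missing counts were dropped first: casting a float column that contains NaN to `int` raises a `ValueError`. `dt.normalize()` sets the time of day to midnight while keeping the `datetime64` type. A small sketch with made-up timestamps:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Case_Count': [8.0, 17.0],
    'UpdateDate': pd.to_datetime(['2020-03-31 08:00:00', '2020-04-01 17:30:00']),
    'ZipText': ['91902', '91910'],
})

toy['cases'] = toy['Case_Count'].astype(int)    # 8.0 -> 8
toy['date'] = toy['UpdateDate'].dt.normalize()  # time of day set to 00:00:00
toy = toy.rename(columns={'ZipText': 'zipCode'})[['zipCode', 'cases', 'date']]
print(toy)

# With a NaN present, the same cast fails, which is why dropna() runs first.
try:
    pd.Series([1.0, np.nan]).astype(int)
    cast_failed = False
except ValueError as err:
    cast_failed = True
    print('cast failed:', err)
```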

In [11]:
cases = cases[['zipCode', 'cases', 'date']]

In [12]:
cases.to_csv(NEO4J_IMPORT / "02c-SDHHSACases.csv", index=False)
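Writing the CSV into the Neo4j import directory makes it readable by Cypher's `LOAD CSV` through a `file:///` URL. The statement below is a sketch only: the node labels (`ZipCode`, `Cases`), the relationship type `HAS_CASES` and the property names are hypothetical and need not match the graph schema used for the knowledge graph. `LOAD CSV WITH HEADERS` reads every field as a string, hence the `toInteger` and `date` conversions, which assume the `date` column is written as `YYYY-MM-DD` as in the preview below.

```python
CASES_CSV = '02c-SDHHSACases.csv'


def cases_load_query(csv_name: str) -> str:
    """Return a LOAD CSV statement for the exported case counts (hypothetical schema)."""
    return (
        f"LOAD CSV WITH HEADERS FROM 'file:///{csv_name}' AS row\n"
        "MERGE (z:ZipCode {id: row.zipCode})\n"
        "MERGE (c:Cases {id: row.zipCode + '-' + row.date})\n"
        "SET c.cases = toInteger(row.cases), c.date = date(row.date)\n"
        "MERGE (z)-[:HAS_CASES]->(c)"
    )


query = cases_load_query(CASES_CSV)
print(query)

# To execute it against a running database (URI and credentials are placeholders):
# from py2neo import Graph
# graph = Graph('bolt://localhost:7687', auth=('neo4j', 'password'))
# graph.run(query)
```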

In [13]:
cases.head()

|   | zipCode | cases | date |
|---|---|---|---|
| 0 | 91902 | 8 | 2020-03-31 |
| 1 | 91910 | 17 | 2020-03-31 |
| 2 | 91911 | 13 | 2020-03-31 |
| 3 | 91913 | 14 | 2020-03-31 |
| 4 | 91914 | 2 | 2020-03-31 |
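Because the counts are cumulative, each ZIP code's series would normally not decrease over time; a decrease, or a repeated (ZIP code, date) pair, may point to a data revision or a duplicate record worth inspecting before ingestion. A minimal check, shown on made-up values; the helper name `non_monotonic_zips` is hypothetical:

```python
import pandas as pd


def non_monotonic_zips(df: pd.DataFrame) -> list:
    """Return ZIP codes whose cumulative 'cases' decrease at some point (hypothetical helper)."""
    ordered = df.sort_values(['zipCode', 'date'])
    # Day-over-day change within each ZIP code; each group's first row is NaN,
    # and NaN < 0 evaluates to False, so first rows are never flagged.
    drops = ordered.groupby('zipCode')['cases'].diff() < 0
    return sorted(ordered.loc[drops, 'zipCode'].unique().tolist())


toy = pd.DataFrame({
    'zipCode': ['91902', '91902', '91910', '91910'],
    'cases': [8, 10, 17, 15],
    'date': pd.to_datetime(['2020-03-31', '2020-04-01', '2020-03-31', '2020-04-01']),
})

print(non_monotonic_zips(toy))
print(toy.duplicated(subset=['zipCode', 'date']).any())
```

On the real data the same two calls would be `non_monotonic_zips(cases)` and `cases.duplicated(subset=['zipCode', 'date']).any()`.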


### TODO: Additional data layers available

In [14]:
ConfirmHospitalICuDeaths_df = pd.DataFrame.spatial.from_layer(db_item.layers[1])

In [15]:
ConfirmHospitalICuDeaths_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 409 entries, 0 to 408
Data columns (total 24 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   OBJECTID                409 non-null    int64         
 1   Date                    409 non-null    datetime64[ns]
 2   Tests                   405 non-null    float64       
 3   Positives               409 non-null    int64         
 4   Hospitalized            406 non-null    float64       
 5   ICU                     400 non-null    float64       
 6   Deaths                  399 non-null    float64       
 7   NewCases                409 non-null    int64         
 8   Age_9                   391 non-null    float64       
 9   Age10_19                391 non-null    float64       
 10  Age40_49                391 non-null    float64       
 11  Age50_59                391 non-null    float64       
 12  Age60_69                391 non-null    float64   

In [16]:
ConfirmHospitalICuDeaths_df.head()

|   | OBJECTID | Date | Tests | Positives | Hospitalized | ICU | Deaths | NewCases | Age_9 | Age10_19 | Age40_49 | Age50_59 | Age60_69 | Age70_79 | Age80_Plus | AgeUnknow | Age20_29 | GenderFemale | GenderMale | GendeUnk | NewTests | Age30_39 | Rolling_Perc_Pos_Cases | SHAPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2020-03-11 08:00:00 | 123.0 | 5 |  |  |  | 1 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | {"x": -13358335.3445, "y": 3894443.7920999974,... |
| 1 | 2 | 2020-03-12 08:00:00 | 147.0 | 10 |  |  |  | 5 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | {"x": -13358335.3445, "y": 3894443.7920999974,... |
| 2 | 3 | 2020-03-13 08:00:00 | 273.0 | 19 |  |  |  | 9 |  |  |  |  |  |  |  |  |  |  |  |  | 52.0 |  |  | {"x": -13358335.3445, "y": 3894443.7920999974,... |
| 3 | 4 | 2020-03-14 08:00:00 | 288.0 | 25 | 12.0 |  |  | 6 |  |  |  |  |  |  |  |  |  |  |  |  | 14.0 |  |  | {"x": -13361313.6588, "y": 3896230.780699998, ... |
| 4 | 5 | 2020-03-15 08:00:00 | 313.0 | 37 | 10.0 |  |  | 12 |  |  |  |  |  |  |  |  |  |  |  |  | 25.0 |  |  | {"x": -13358931.007399999, "y": 3896230.780699... |
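In the rows shown, `Positives` is a running total and `NewCases` matches its day-over-day difference (10 - 5 = 5, 19 - 10 = 9, 25 - 19 = 6, 37 - 25 = 12). A sketch that checks this relationship using the values from the five rows above; whether it holds for the full layer has not been verified here:

```python
import pandas as pd

# Values copied from the first five rows of ConfirmHospitalICuDeaths_df.
daily = pd.DataFrame({
    'Date': pd.to_datetime(['2020-03-11', '2020-03-12', '2020-03-13',
                            '2020-03-14', '2020-03-15']),
    'Positives': [5, 10, 19, 25, 37],
    'NewCases': [1, 5, 9, 6, 12],
})

# Day-over-day change of the cumulative total; the first row has no predecessor (NaN).
derived = daily['Positives'].diff()
matches = derived.iloc[1:].astype(int).eq(daily['NewCases'].iloc[1:])
print(matches.all())
```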


In [17]:
AgeGenderPoints_df = pd.DataFrame.spatial.from_layer(db_item.layers[2])

In [18]:
AgeGenderPoints_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 221 entries, 0 to 220
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   OBJECTID   221 non-null    int64         
 1   AgeGender  195 non-null    object        
 2   Count_     208 non-null    float64       
 3   Date       208 non-null    datetime64[ns]
 4   SHAPE      221 non-null    geometry      
dtypes: datetime64[ns](1), float64(1), geometry(1), int64(1), object(1)
memory usage: 8.8+ KB


In [19]:
AgeGenderPoints_df.head()

|   | OBJECTID | AgeGender | Count_ | Date | SHAPE |
|---|---|---|---|---|---|
| 0 | 1 | 0-9 years | 0.0 | 2020-03-21 | {"x": -13358931.007399999, "y": 3894443.792099... |
| 1 | 2 | 10-19 years | 2.0 | 2020-03-21 | {"x": -13358931.007399999, "y": 3894443.792099... |
| 2 | 3 | 20-29 years | 27.0 | 2020-03-21 | {"x": -13358931.007399999, "y": 3894443.792099... |
| 3 | 4 | 30-39 years | 37.0 | 2020-03-21 | {"x": -13358931.007399999, "y": 3894443.792099... |
| 4 | 5 | 40-49 years | 29.0 | 2020-03-21 | {"x": -13358931.007399999, "y": 3894443.792099... |
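This layer is in long format, with one row per `AgeGender` label and `Date`. For joining with the daily table it may be more convenient to have one column per label. A sketch using the five rows shown above; `pivot` requires each (`Date`, `AgeGender`) pair to be unique, which is assumed here rather than checked against the full layer:

```python
import pandas as pd

# The five rows shown above (OBJECTID and SHAPE omitted).
long_df = pd.DataFrame({
    'AgeGender': ['0-9 years', '10-19 years', '20-29 years', '30-39 years', '40-49 years'],
    'Count_': [0.0, 2.0, 27.0, 37.0, 29.0],
    'Date': pd.to_datetime(['2020-03-21'] * 5),
})

# Skip rows with a missing label or date (the info() output shows some),
# then reshape so that each AgeGender label becomes its own column.
wide = (long_df.dropna(subset=['AgeGender', 'Date'])
               .pivot(index='Date', columns='AgeGender', values='Count_'))
print(wide)
```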


In [20]:
CompiledCopyDashUpdate_df = pd.DataFrame.spatial.from_layer(db_item.layers[3])

In [21]:
CompiledCopyDashUpdate_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18577 entries, 0 to 18576
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   OBJECTID        18577 non-null  int64         
 1   SHAPE           18574 non-null  geometry      
 2   active          0 non-null      object        
 3   confirmedcases  17708 non-null  float64       
 4   deaths          0 non-null      object        
 5   lastupdate      18577 non-null  datetime64[ns]
 6   loctype         18577 non-null  object        
 7   name            18577 non-null  object        
 8   recovered       0 non-null      object        
dtypes: datetime64[ns](1), float64(1), geometry(1), int64(1), object(5)
memory usage: 1.3+ MB

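The `info()` output shows that `active`, `deaths` and `recovered` contain no values at all (0 non-null), so they carry no information for the knowledge graph. A sketch of dropping such columns, on a toy frame shaped like this layer with values taken from the preview below:

```python
import pandas as pd

toy = pd.DataFrame({
    'OBJECTID': [1, 2],
    'active': [None, None],
    'confirmedcases': [15.0, 17.0],
    'deaths': [None, None],
    'loctype': ['Incorporated City', 'Incorporated City'],
    'name': ['CARLSBAD', 'CHULA VISTA'],
    'recovered': [None, None],
})

# how='all' removes a column only when every value in it is missing.
trimmed = toy.dropna(axis=1, how='all')
print(trimmed.columns.tolist())
```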

In [22]:
CompiledCopyDashUpdate_df.head()

|   | OBJECTID | SHAPE | active | confirmedcases | deaths | lastupdate | loctype | name | recovered |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | {"x": -13055943.7678, "y": 3911860.2880999967,... |  | 15.0 |  | 2020-03-25 08:00:00 | Incorporated City | CARLSBAD |  |
| 1 | 2 | {"x": -13025980.1151, "y": 3845995.7070999965,... |  | 17.0 |  | 2020-03-25 08:00:00 | Incorporated City | CHULA VISTA |  |
| 2 | 3 | {"x": -13040305.7757, "y": 3849683.3521, "spat... |  | 0.0 |  | 2020-03-25 08:00:00 | Incorporated City | CORONADO |  |
| 3 | 4 | {"x": -13053598.1182, "y": 3890456.4196999967,... |  | 5.0 |  | 2020-03-25 08:00:00 | Incorporated City | DEL MAR |  |
| 4 | 5 | {"x": -13019977.8232, "y": 3869010.3783000037,... |  | 19.0 |  | 2020-03-25 08:00:00 | Incorporated City | EL CAJON |  |
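The `SHAPE` coordinates are projected values in metres, not longitude/latitude. Their magnitude is consistent with Web Mercator (EPSG:3857), but that is an assumption here, since the spatial reference is cut off in the output. Under that assumption, the spherical Mercator inverse converts a point back to degrees; for the CARLSBAD row it gives roughly 117.3° W, 33.1° N, which lies in the Carlsbad area.

```python
import math

# WGS 84 semi-major axis in metres, the sphere radius used by Web Mercator.
EARTH_RADIUS_M = 6378137.0


def web_mercator_to_lonlat(x: float, y: float) -> tuple:
    """Inverse spherical Mercator: (x, y) in metres -> (lon, lat) in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon, lat


# SHAPE coordinates of the CARLSBAD row shown above.
lon, lat = web_mercator_to_lonlat(-13055943.7678, 3911860.2880999967)
print(f'{lon:.2f}, {lat:.2f}')
```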
