# Example queries for Case Counts on COVID-19 Knowledge Graph
[Work in progress]

This notebook demonstrates how to run Cypher queries to retrieve and aggregate COVID-19 case counts.

COVID-19 case numbers are provided by:

Country and US County level data: [JHU](https://github.com/covid-19-net/covid-19-community/blob/master/reference_data/DataProvider.csv#L6)

San Diego data by zip code: [SDHHSA](https://github.com/covid-19-net/covid-19-community/blob/master/reference_data/DataProvider.csv#L7)

In [1]:
import datetime
import pandas as pd
from py2neo import Graph

In [2]:
pd.options.display.max_rows = None  # display all rows
pd.options.display.max_columns = None  # display all columns

#### Connect to COVID-19-Net Knowledge Graph

In [3]:
graph = Graph("bolt://132.249.238.185:7687", user="reader", password="demo")

In [4]:
# "yesterday" is currently defined as two days before the current UTC date,
# because during parts of the day the data for the previous UTC day
# are not yet available.
today = datetime.datetime.now(datetime.timezone.utc).date()  # utcnow() is deprecated since Python 3.12
yesterday = today - datetime.timedelta(days=2)
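
The lag logic above can be wrapped in a small helper so the offset is easy to change. This is a minimal sketch, not part of the original notebook: the function name `reporting_date` and its `lag_days` and `now` parameters are hypothetical and only serve the illustration.

```python
import datetime

def reporting_date(lag_days=2, now=None):
    """Return the UTC date `lag_days` before `now`.

    `now` must be a timezone-aware datetime; it defaults to the current UTC time.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    # Convert to UTC first so the date does not depend on the caller's time zone.
    utc_date = now.astimezone(datetime.timezone.utc).date()
    return utc_date - datetime.timedelta(days=lag_days)

# Fixed timestamp so the result is reproducible.
fixed_now = datetime.datetime(2020, 8, 31, 1, 30, tzinfo=datetime.timezone.utc)
print(reporting_date(now=fixed_now))  # 2020-08-29
```

Passing `now` explicitly keeps the helper deterministic, which is convenient when re-running the notebook against a known date.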

## COVID Case Data

### Case counts by Country

In [5]:
query = """
MATCH (c:Cases{date: date($day), source: 'JHU'})-[:REPORTED_IN]->(n:Country)
RETURN n.name, c.cases, c.deaths, c.date as dateUTC
ORDER BY n.name
"""
graph.run(query, day=yesterday).to_data_frame()

Unnamed: 0,n.name,c.cases,c.deaths,dateUTC
0,Afghanistan,38143,1402,2020-08-29
1,Albania,9279,275,2020-08-29
2,Algeria,43781,1491,2020-08-29
3,Andorra,1124,53,2020-08-29
4,Angola,2551,107,2020-08-29
5,Anguilla,3,0,2020-08-29
6,Antigua and Barbuda,94,3,2020-08-29
7,Argentina,401239,8353,2020-08-29
8,Armenia,43626,872,2020-08-29
9,Aruba,1975,10,2020-08-29


### Case counts by US States aggregated from US Counties
Note: some entries in the Johns Hopkins dataset cannot be mapped to a US county. They are listed as "None".

In [6]:
query = """
MATCH (c:Cases{date: $date, source: 'JHU'})-[:REPORTED_IN]->(a:Admin2)-[:IN]->(a1:Admin1)
RETURN a1.name as state, a1.code, a1.location, sum(c.deaths) as deaths, sum(c.cases) as confirmed, c.date as date
ORDER BY a1.code
"""
graph.run(query, date=yesterday).to_data_frame()

Unnamed: 0,state,a1.code,a1.location,deaths,confirmed,date
0,Alaska,AK,"(-150.00028, 64.00028)",37,5180,2020-08-29
1,Alabama,AL,"(-86.75026, 32.75041)",2059,115284,2020-08-29
2,Arkansas,AR,"(-92.50044, 34.75037)",772,59477,2020-08-29
3,Arizona,AZ,"(-111.50098, 34.5003)",5006,201287,2020-08-29
4,California,CA,"(-119.75126, 37.25022)",12894,702038,2020-08-29
5,Colorado,CO,"(-105.50083, 39.00027)",1942,57020,2020-08-29
6,Connecticut,CT,"(-72.66648, 41.66704)",4465,52391,2020-08-29
7,District of Columbia,DC,"(-77.00025, 38.91706)",605,13925,2020-08-29
8,Delaware,DE,"(-75.49992, 39.00039)",604,17071,2020-08-29
9,Florida,FL,"(-82.5001, 28.75054)",11105,618019,2020-08-29


### Current cases in San Diego County

In [7]:
admin2 = 'San Diego County'

query = """
MATCH (c:Cases{date: date($day), source: 'JHU'})-[:REPORTED_IN]->(a:Admin2{name: $admin2})
RETURN a.name as name, c.cases as confirmed, c.deaths as deaths, c.date as date
"""
graph.run(query, admin2=admin2, day=yesterday).to_data_frame()

Unnamed: 0,name,confirmed,deaths,date
0,San Diego County,38047,679,2020-08-29


### COVID-19 Cases Time Series in San Diego County

In [8]:
query = """
MATCH (c:Cases{source: 'JHU'})-[:REPORTED_IN]->(a:Admin2{name: $admin2})
RETURN a.name as name, c.cases as confirmed, c.deaths as deaths, c.date as date
ORDER BY c.date DESC
"""
graph.run(query, admin2=admin2).to_data_frame()

Unnamed: 0,name,confirmed,deaths,date
0,San Diego County,38300,682,2020-08-30
1,San Diego County,38047,679,2020-08-29
2,San Diego County,37784,676,2020-08-28
3,San Diego County,37499,673,2020-08-27
4,San Diego County,37222,668,2020-08-26
5,San Diego County,36994,665,2020-08-25
6,San Diego County,36727,660,2020-08-24
7,San Diego County,36540,660,2020-08-23
8,San Diego County,36203,660,2020-08-22
9,San Diego County,35912,652,2020-08-21
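
The counts in this time series only ever increase, which suggests they are cumulative totals. Under that assumption, the number of new cases per day is the difference between consecutive days. The sketch below uses plain Python and the first five rows of the output above; the helper name `daily_increments` is hypothetical.

```python
import datetime

# (date, cumulative confirmed cases), copied from the first rows of the output above
rows = [
    ("2020-08-30", 38300),
    ("2020-08-29", 38047),
    ("2020-08-28", 37784),
    ("2020-08-27", 37499),
    ("2020-08-26", 37222),
]

def daily_increments(rows):
    """Turn cumulative counts into per-day increments, oldest day first."""
    ordered = sorted((datetime.date.fromisoformat(d), n) for d, n in rows)
    # Pair each day with the previous one and subtract the cumulative totals.
    return [(day, cur - prev) for (_, prev), (day, cur) in zip(ordered, ordered[1:])]

for day, new_cases in daily_increments(rows):
    print(day, new_cases)
# 2020-08-27 277
# 2020-08-28 285
# 2020-08-29 263
# 2020-08-30 253
```

On the DataFrame returned by the query, sorting by `date` in ascending order and calling `diff()` on the `confirmed` column should give the same increments.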


### COVID-19 Cases in San Diego County by ZIP code
The latest data may show up with a delay. If no output is shown, set the date to an earlier day using this format: `date('2020-06-21')`.

In [9]:
query = """
MATCH (c:Cases{date: date($day), source: 'SDHHSA'})-[:REPORTED_IN]->(p:PostalCode)
RETURN p.name as zip, p.placeName, p.location, c.cases as confirmed, c.date as date
ORDER BY zip
"""
graph.run(query, day=yesterday).to_data_frame()

Unnamed: 0,zip,p.placeName,p.location,confirmed,date
0,91901,Alpine,"(-116.7543, 32.8282)",95,2020-08-29
1,91902,Bonita,"(-117.0221, 32.6671)",220,2020-08-29
2,91905,Boulevard,"(-116.32, 32.6719)",7,2020-08-29
3,91906,Campo,"(-116.4905, 32.6605)",19,2020-08-29
4,91910,Chula Vista,"(-117.0676, 32.6371)",1595,2020-08-29
5,91911,Chula Vista,"(-117.0565, 32.6084)",1914,2020-08-29
6,91913,Chula Vista,"(-116.9852, 32.6513)",686,2020-08-29
7,91914,Chula Vista,"(-116.9652, 32.6587)",233,2020-08-29
8,91915,Chula Vista,"(-116.9408, 32.6315)",409,2020-08-29
9,91916,Descanso,"(-116.6027, 32.873)",10,2020-08-29
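
Instead of adjusting the date by hand when a day has no data yet, the query can be retried for earlier days until it returns rows. This is a sketch under stated assumptions: `latest_available`, `run_for_day`, and `max_lookback` are hypothetical names, and the query is replaced by a stand-in dictionary so the example runs without a database connection.

```python
import datetime

def latest_available(run_for_day, start_day, max_lookback=7):
    """Return (day, rows) for the most recent day with results.

    Starts at `start_day` and steps back one day at a time, at most
    `max_lookback` times. Returns (None, []) if nothing is found.
    """
    for offset in range(max_lookback + 1):
        day = start_day - datetime.timedelta(days=offset)
        rows = run_for_day(day)
        if rows:
            return day, rows
    return None, []

# Stand-in for the database: only 2020-08-29 has data (illustrative values).
fake_results = {datetime.date(2020, 8, 29): [{"zip": "91901", "confirmed": 95}]}

day, rows = latest_available(
    lambda d: fake_results.get(d, []),
    datetime.date(2020, 8, 31),
)
print(day, len(rows))  # 2020-08-29 1

# In the notebook, the callable could wrap the query above, for example:
# run_for_day = lambda d: graph.run(query, day=d).data()
```

Capping the lookback avoids an endless loop if the data source stops updating.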


### COVID-19 Cases Time Series for Carlsbad, California
Cases are aggregated from the ZIP-code-level data. Note that some ZIP code areas may cross city boundaries.

In [10]:
query = """
MATCH (c:Cases{source: 'SDHHSA'})-[:REPORTED_IN]->(p:PostalCode{placeName:'Carlsbad'})-[:IN*]->(a:Admin1{name: 'California'})
RETURN c.date as date, sum(c.cases) as confirmed
ORDER BY date DESC
"""
graph.run(query).to_data_frame()

Unnamed: 0,date,confirmed
0,2020-08-29,593
1,2020-08-28,591
2,2020-08-27,585
3,2020-08-26,584
4,2020-08-25,575
5,2020-08-24,574
6,2020-08-23,573
7,2020-08-22,568
8,2020-08-21,561
9,2020-08-20,559


### COVID-19 cases aggregated by US Regions
Here we aggregate US county-level data over 2 hops:

`Admin2 -> USDivision -> USRegion`

using the variable-length relationship `[:IN*]`.

In [11]:
query = """
MATCH (c:Cases{date: date($day), source: 'JHU'})-[:REPORTED_IN]->(:Admin2)-[:IN*]->(u:USRegion)
RETURN sum(c.cases) AS count, u.name AS USRegion
ORDER BY count DESC
"""
graph.run(query, day=yesterday).to_data_frame()

Unnamed: 0,count,USRegion
0,2696928,South Region
1,1249632,West Region
2,965376,Northeast Region
3,936911,Midwest Region
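
To illustrate what the variable-length pattern `[:IN*]` does, the sketch below mimics it in plain Python: starting from each county, it follows `IN` links for one or more hops and adds the county's count to every ancestor labeled `USRegion`. All node names and counts are made up for illustration, and the toy hierarchy assumes each node has a single parent, which a real graph does not guarantee.

```python
from collections import defaultdict

# Toy hierarchy: child -> parent, standing in for IN relationships (hypothetical nodes).
parent = {
    "county_a": "division_1",
    "county_b": "division_1",
    "county_c": "division_2",
    "division_1": "region_x",
    "division_2": "region_x",
}
labels = {"region_x": "USRegion"}
cases = {"county_a": 10, "county_b": 5, "county_c": 7}  # made-up counts

def ancestors(node):
    """Yield every node reachable by one or more IN hops, like [:IN*]."""
    while node in parent:
        node = parent[node]
        yield node

totals = defaultdict(int)
for county, count in cases.items():
    for ancestor in ancestors(county):
        # Only ancestors with the target label contribute, like (u:USRegion).
        if labels.get(ancestor) == "USRegion":
            totals[ancestor] += count

print(dict(totals))  # {'region_x': 22}
```

Because the number of hops is not fixed, the same traversal works whether a county is two or more levels below its region.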


### COVID-19 cases aggregated by UN Region
Here we aggregate country-level data over up to 3 hops:

`country -> UNSubRegion -> UNIntermediateRegion -> UNRegion`

using the variable-length relationship `[:IN*]`.

In [12]:
query = """
MATCH (c:Cases{date: date($day), source: 'JHU'})-[:REPORTED_IN]->(:Country)-[:IN*]->(u:UNRegion)
RETURN sum(c.cases) AS count, u.name AS UNRegion
ORDER BY count DESC
"""
df = graph.run(query, day=yesterday).to_data_frame()
df

Unnamed: 0,count,UNRegion
0,13156406,Americas
1,6808125,Asia
2,3546247,Europe
3,1241841,Africa
4,2715,Oceania


In [13]:
print("Total number of confirmed cases:", df['count'].sum())

Total number of confirmed cases: 24755334
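
As a quick cross-check, the regional counts can be summed by hand and expressed as shares of the total. The sketch below copies the five values from the UN region output above into a plain dictionary, so it runs without the database; the variable names are illustrative.

```python
# Counts copied from the UN region output above
counts = {
    "Americas": 13156406,
    "Asia": 6808125,
    "Europe": 3546247,
    "Africa": 1241841,
    "Oceania": 2715,
}

total = sum(counts.values())
print("Total:", total)  # Total: 24755334

# Share of each region as a percentage of the total
shares = {region: 100 * n / total for region, n in counts.items()}
for region, pct in shares.items():
    print(f"{region}: {pct:.1f}%")
```

The total matches the value printed by the notebook, which confirms that the five regions account for every counted case.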
