# Example queries for Case Counts on COVID-19 Knowledge Graph
[Work in progress]

This notebook demonstrates how to run Cypher queries to retrieve and aggregate COVID-19 case counts.

COVID-19 case numbers are provided by:

Country and US County level data: [JHU](https://github.com/covid-19-net/covid-19-community/blob/master/reference_data/DataProvider.csv#L6)

San Diego data by zip code: [SDHHSA](https://github.com/covid-19-net/covid-19-community/blob/master/reference_data/DataProvider.csv#L7)

In [1]:
import datetime
import pandas as pd
from py2neo import Graph

In [2]:
pd.options.display.max_rows = None  # display all rows
pd.options.display.max_columns = None  # display all columns

#### Connect to COVID-19-Net Knowledge Graph

In [3]:
graph = Graph("bolt://132.249.238.185:7687", user="reader", password="demo")

In [4]:
# "yesterday" is currently defined as two days before the current UTC date,
# because for parts of the day the previous day's data are not yet
# available.
today = datetime.datetime.utcnow().date()
yesterday = today - datetime.timedelta(days=2)
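
The two-day lag used above can be wrapped in a small helper so the offset is explicit and easy to change. This is a minimal sketch and not part of the original notebook: the function name `reporting_day` and its `lag_days` and `now` parameters are hypothetical, and it uses a timezone-aware clock in place of `utcnow()`.

```python
import datetime


def reporting_day(lag_days=2, now=None):
    """Return the date `lag_days` before `now` (hypothetical helper).

    `now` is assumed to be a timezone-aware datetime in UTC; it defaults
    to the current UTC time. Passing it explicitly makes testing easy.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return now.date() - datetime.timedelta(days=lag_days)


# Example with a fixed timestamp instead of the real clock
fixed = datetime.datetime(2020, 7, 2, 3, 0, tzinfo=datetime.timezone.utc)
print(reporting_day(now=fixed))  # 2020-06-30
```

The result could be passed as the `day` parameter of the queries below in the same way as `yesterday`.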

## COVID Case Data

### Case counts by Country

In [5]:
query = """
MATCH (c:Cases{date: date($day), source: 'JHU'})-[:REPORTED_IN]->(n:Country)
RETURN n.name, c.cummulativeConfirmed, c.cummulativeDeaths, c.date as dateUTC
ORDER by n.name
"""
graph.run(query, day=yesterday).to_data_frame()

Unnamed: 0,n.name,c.cummulativeConfirmed,c.cummulativeDeaths,dateUTC
0,Afghanistan,31517,746,2020-06-30
1,Albania,2535,62,2020-06-30
2,Algeria,13907,912,2020-06-30
3,Andorra,855,52,2020-06-30
4,Angola,284,13,2020-06-30
5,Anguilla,3,0,2020-06-30
6,Antigua and Barbuda,69,3,2020-06-30
7,Argentina,64530,1307,2020-06-30
8,Armenia,25542,443,2020-06-30
9,Aruba,103,3,2020-06-30


### Case counts by US States aggregated from US Counties
Note: some county entries in the Johns Hopkins dataset cannot be mapped to US counties. They are listed as "None".

In [6]:
query = """
MATCH (c:Cases{date: $date, source: 'JHU'})-[:REPORTED_IN]->(a:Admin2)-[:IN]->(a1:Admin1)
RETURN a1.name as state, a1.code, a1.location, sum(c.cummulativeDeaths) as deaths, sum(c.cummulativeConfirmed) as confirmed, c.date as date
ORDER BY a1.code
"""
graph.run(query, date=yesterday).to_data_frame()

Unnamed: 0,state,a1.code,a1.location,deaths,confirmed,date
0,Alaska,AK,"(-150.00028, 64.00028)",14,934,2020-06-30
1,Alabama,AL,"(-86.75026, 32.75041)",924,37244,2020-06-30
2,Arkansas,AR,"(-92.50044, 34.75037)",267,19244,2020-06-30
3,Arizona,AZ,"(-111.50098, 34.5003)",1644,79227,2020-06-30
4,California,CA,"(-119.75126, 37.25022)",6032,227815,2020-06-30
5,Colorado,CO,"(-105.50083, 39.00027)",1690,32698,2020-06-30
6,Connecticut,CT,"(-72.66648, 41.66704)",4322,46310,2020-06-30
7,Delaware,DE,"(-75.49992, 39.00039)",509,11436,2020-06-30
8,Florida,FL,"(-82.5001, 28.75054)",3451,149617,2020-06-30
9,Georgia,GA,"(-83.50018, 32.75042)",2756,74581,2020-06-30


### Current cases in San Diego County

In [7]:
admin2 = 'San Diego County'

query = """
MATCH (c:Cases{date: date($day), source: 'JHU'})-[:REPORTED_IN]->(a:Admin2{name: $admin2})
RETURN a.name as name, c.cummulativeConfirmed as confirmed, c.cummulativeDeaths as deaths, c.date as date
"""
graph.run(query, admin2=admin2, day=yesterday).to_data_frame()

Unnamed: 0,name,confirmed,deaths,date
0,San Diego County,14149,365,2020-06-30


### COVID-19 Cases Time Series in San Diego County

In [8]:
query = """
MATCH (c:Cases{source: 'JHU'})-[:REPORTED_IN]->(a:Admin2{name: $admin2})
RETURN a.name as name, c.cummulativeConfirmed as confirmed, c.cummulativeDeaths as deaths, c.date as date
ORDER BY c.date DESC
"""
graph.run(query, admin2=admin2).to_data_frame()

Unnamed: 0,name,confirmed,deaths,date
0,San Diego County,14623,370,2020-07-01
1,San Diego County,14149,365,2020-06-30
2,San Diego County,13832,361,2020-06-29
3,San Diego County,13334,361,2020-06-28
4,San Diego County,12837,360,2020-06-27
5,San Diego County,12401,358,2020-06-26
6,San Diego County,11961,352,2020-06-25
7,San Diego County,11626,347,2020-06-24
8,San Diego County,11294,341,2020-06-23
9,San Diego County,11096,338,2020-06-22
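
Because the counts are cumulative, daily new cases can be derived by differencing consecutive days. The sketch below uses a few rows copied from the output above rather than a live query; the column names `newConfirmed` and `newDeaths` are made up for illustration.

```python
import pandas as pd

# A few rows copied from the San Diego County time series output
ts = pd.DataFrame({
    "date": pd.to_datetime(
        ["2020-07-01", "2020-06-30", "2020-06-29", "2020-06-28"]
    ),
    "confirmed": [14623, 14149, 13832, 13334],
    "deaths": [370, 365, 361, 361],
})

# The query returns newest first; sort oldest first before differencing
ts = ts.sort_values("date").reset_index(drop=True)

# Day-over-day change; the first row has no predecessor and becomes NaN
ts["newConfirmed"] = ts["confirmed"].diff()
ts["newDeaths"] = ts["deaths"].diff()
print(ts)
```

The same two `diff()` lines should work on the full data frame returned by the query, provided it is sorted by date in ascending order first.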


### COVID-19 Cases in San Diego County by Zip code
The latest data may show up with a delay. If no output is shown, adjust the date to a day earlier using this format: `date('2020-06-21')`.

In [9]:
query = """
MATCH (c:Cases{date: date($day), source: 'SDHHSA'})-[:REPORTED_IN]->(p:PostalCode)
RETURN p.name as zip, p.placeName, p.location, c.cummulativeConfirmed as confirmed, c.date as date
ORDER by zip
"""
graph.run(query, day=yesterday).to_data_frame()

Unnamed: 0,zip,p.placeName,p.location,confirmed,date
0,91901,Alpine,"(-116.7543, 32.8282)",26,2020-06-30
1,91902,Bonita,"(-117.0221, 32.6671)",73,2020-06-30
2,91905,Boulevard,"(-116.32, 32.6719)",4,2020-06-30
3,91906,Campo,"(-116.4905, 32.6605)",7,2020-06-30
4,91910,Chula Vista,"(-117.0676, 32.6371)",643,2020-06-30
5,91911,Chula Vista,"(-117.0565, 32.6084)",860,2020-06-30
6,91913,Chula Vista,"(-116.9852, 32.6513)",279,2020-06-30
7,91914,Chula Vista,"(-116.9652, 32.6587)",89,2020-06-30
8,91915,Chula Vista,"(-116.9408, 32.6315)",150,2020-06-30
9,91916,Descanso,"(-116.6027, 32.873)",3,2020-06-30
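
Since the latest zip-level data may arrive late, stepping back one day at a time can be automated. The following is only a sketch under stated assumptions: `latest_non_empty` and `fake_run` are hypothetical names, and `fake_run` is a stand-in for the database with made-up availability, so the example runs without a connection.

```python
import datetime
import pandas as pd


def latest_non_empty(run_for_day, start_day, max_lookback=7):
    """Try start_day, then earlier days, until a query returns rows.

    run_for_day: callable taking a datetime.date and returning a DataFrame,
    for example: lambda d: graph.run(query, day=d).to_data_frame()
    Returns (day, DataFrame), or (None, empty DataFrame) if nothing is found.
    """
    day = start_day
    for _ in range(max_lookback + 1):
        df = run_for_day(day)
        if not df.empty:
            return day, df
        day -= datetime.timedelta(days=1)
    return None, pd.DataFrame()


# Stand-in for the database: pretend data exist only up to 2020-06-28
def fake_run(day):
    if day <= datetime.date(2020, 6, 28):
        return pd.DataFrame({"zip": ["91901"], "confirmed": [26]})
    return pd.DataFrame()


day, df = latest_non_empty(fake_run, datetime.date(2020, 6, 30))
print(day)  # 2020-06-28
```

Taking a callable rather than the `graph` object keeps the helper independent of the query and makes it testable offline.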


### COVID-19 Cases Time Series for Carlsbad, California
Cases are aggregated from the zip-level data (note: some zip code areas may cross city boundaries).

In [10]:
query = """
MATCH (c:Cases{source: 'SDHHSA'})-[:REPORTED_IN]->(p:PostalCode{placeName:'Carlsbad'})-[:IN*]->(a:Admin1{name: 'California'})
RETURN c.date as date, sum(c.cummulativeConfirmed) as confirmed
ORDER by date DESC
"""
graph.run(query).to_data_frame()

Unnamed: 0,date,confirmed
0,2020-06-30,182
1,2020-06-29,167
2,2020-06-28,160
3,2020-06-27,155
4,2020-06-26,149
5,2020-06-25,139
6,2020-06-24,136
7,2020-06-23,127
8,2020-06-22,122
9,2020-06-21,115
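
The same zip-to-place aggregation can also be done on the client side with pandas. As a sketch, the rows below are copied from the zip-level output for 2020-06-30 shown earlier in this notebook (Carlsbad zip codes are not among the rows displayed there, so other places are used); the variable names are illustrative.

```python
import pandas as pd

# Rows copied from the zip-level output for 2020-06-30
zips = pd.DataFrame({
    "zip": ["91901", "91902", "91910", "91911", "91913", "91914", "91915"],
    "placeName": ["Alpine", "Bonita"] + ["Chula Vista"] * 5,
    "confirmed": [26, 73, 643, 860, 279, 89, 150],
})

# Sum the cumulative counts of all zip codes that share a place name
by_place = zips.groupby("placeName", as_index=False)["confirmed"].sum()
print(by_place)
```

Doing the aggregation in Cypher, as in the query above, avoids transferring the zip-level rows; the pandas version is useful when those rows are already loaded.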


### COVID-19 cases aggregated by US Regions
Here we aggregate US county-level data over 2 hops:

`Admin2 -> USDivision -> USRegion`

using the variable-length relationship `[:IN*]`.

In [11]:
query = """
MATCH (c:Cases{date: date($day), source: 'JHU'})-[:REPORTED_IN]->(:Admin2)-[:IN*]->(u:USRegion)
RETURN sum(c.cummulativeConfirmed) AS count, u.name AS USRegion
ORDER by count DESC
"""
graph.run(query, day=yesterday).to_data_frame()

Unnamed: 0,count,USRegion
0,835921,Northeast Region
1,815518,South Region
2,448123,Midwest Region
3,438263,West Region
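
To illustrate what a variable-length `[:IN*]` traversal does conceptually, the toy example below follows child-to-parent links until it reaches a region and sums the counts per region. It is a plain-Python sketch with invented node names and numbers; it does not reflect how Neo4j executes the query or the actual contents of the graph.

```python
from collections import defaultdict

# Toy hierarchy (hypothetical names): child -> parent, mimicking :IN edges
parent = {
    "County A": "Division X",
    "County B": "Division X",
    "County C": "Division Y",
    "Division X": "Region 1",
    "Division Y": "Region 2",
}
regions = {"Region 1", "Region 2"}

# Invented case counts per county
cases = {"County A": 10, "County B": 5, "County C": 7}


def find_region(node):
    """Follow parent links for any number of hops until a region is reached."""
    while node is not None and node not in regions:
        node = parent.get(node)
    return node


totals = defaultdict(int)
for county, count in cases.items():
    region = find_region(county)
    if region is not None:  # skip counties with no path to a region
        totals[region] += count

print(dict(totals))  # {'Region 1': 15, 'Region 2': 7}
```

As in the Cypher pattern, the number of hops is not fixed in advance, so the same code works whether a county is two or more levels below its region.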


### COVID-19 cases aggregated by UN Region
Here we aggregate country-level data over up to 3 hops:

`country -> UNSubRegion -> UNIntermediateRegion -> UNRegion`

using the variable-length relationship `[:IN*]`.

In [12]:
query = """
MATCH (c:Cases{date: date($day), source: 'JHU'})-[:REPORTED_IN]->(:Country)-[:IN*]->(u:UNRegion)
RETURN sum(c.cummulativeConfirmed) AS count, u.name AS UNRegion
ORDER by count DESC
"""
df = graph.run(query, day=yesterday).to_data_frame()
df

Unnamed: 0,count,UNRegion
0,5217921,Americas
1,2427296,Europe
2,2222484,Asia
3,407399,Africa
4,1640,Oceania


In [13]:
print("Total number of confirmed cases:", df['count'].sum())

Total number of confirmed cases: 10276740
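
Each region's share of the total can be computed from the same data frame. The sketch below rebuilds the frame from the counts shown above so it runs without a database connection; the column name `sharePct` is made up for illustration.

```python
import pandas as pd

# Counts copied from the UN region output above (2020-06-30)
df = pd.DataFrame({
    "count": [5217921, 2427296, 2222484, 407399, 1640],
    "UNRegion": ["Americas", "Europe", "Asia", "Africa", "Oceania"],
})

total = df["count"].sum()

# Percentage of all confirmed cases per region, rounded to two decimals
df["sharePct"] = (100 * df["count"] / total).round(2)
print(df)
```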
