# Example queries for Case Counts on COVID-19 Knowledge Graph
[Work in progress]

This notebook demonstrates how to run Cypher queries to retrieve and aggregate COVID-19 case counts.

COVID-19 case numbers are provided by:

Country and US County level data: [JHU](https://github.com/covid-19-net/covid-19-community/blob/master/reference_data/DataProvider.csv#L6)

San Diego data by zip code: [SDHHSA](https://github.com/covid-19-net/covid-19-community/blob/master/reference_data/DataProvider.csv#L7)

In [1]:
import datetime
import pandas as pd
from py2neo import Graph

In [2]:
pd.options.display.max_rows = None  # display all rows
pd.options.display.max_columns = None  # display all columns

#### Connect to COVID-19-Net Knowledge Graph

In [3]:
graph = Graph("bolt://132.249.238.185:7687", user="reader", password="demo")

#### Get latest update date for JHU data

In [4]:
query = """
MATCH (:Admin2)<-[:REPORTED_IN]-(c:Cases{source: 'JHU'}) 
RETURN max(c.date)  // returns a neotime.Date object
"""
date = graph.evaluate(query).to_native() # convert to datetime.date object

## COVID Case Data

### Case counts by Country (using Johns Hopkins COVID-19 data)

In [5]:
query = """
MATCH (c:Cases{date: date($date), source: 'JHU'})-[:REPORTED_IN]->(n:Country)
RETURN n.name, c.cases, c.deaths, c.date as date
ORDER by n.name
"""
graph.run(query, date=date).to_data_frame()

Unnamed: 0,n.name,c.cases,c.deaths,date
0,Afghanistan,54483,2370,2021-01-22
1,Albania,70655,1303,2021-01-22
2,Algeria,105124,2856,2021-01-22
3,Andorra,9416,93,2021-01-22
4,Angola,19269,452,2021-01-22
5,Anguilla,15,0,2021-01-22
6,Antigua and Barbuda,195,6,2021-01-22
7,Argentina,1853830,46575,2021-01-22
8,Armenia,165711,3030,2021-01-22
9,Aruba,6656,56,2021-01-22


### Case counts by US States aggregated from US Counties
Note: some counties in the Johns Hopkins dataset cannot be mapped to US counties. They are listed as "None".

In [6]:
query = """
MATCH (c:Cases{date: $date, source: 'JHU'})-[:REPORTED_IN]->(a:Admin2)-[:IN]->(a1:Admin1)
RETURN a1.name as state, a1.code, a1.location, sum(c.deaths) as deaths, sum(c.cases) as confirmed, c.date as date
ORDER BY a1.code
"""
graph.run(query, date=date).to_data_frame()

Unnamed: 0,state,a1.code,a1.location,deaths,confirmed,date
0,Alaska,AK,"(-150.00028, 64.00028)",254,52669,2021-01-22
1,Alabama,AL,"(-86.75026, 32.75041)",6486,436087,2021-01-22
2,Arkansas,AR,"(-92.50044, 34.75037)",4549,278195,2021-01-22
3,Arizona,AZ,"(-111.50098, 34.5003)",12001,708040,2021-01-22
4,California,CA,"(-119.75126, 37.25022)",36259,3121386,2021-01-22
5,Colorado,CO,"(-105.50083, 39.00027)",5462,382973,2021-01-22
6,Connecticut,CT,"(-72.66648, 41.66704)",6814,237028,2021-01-22
7,District of Columbia,DC,"(-77.00025, 38.91706)",867,34905,2021-01-22
8,Delaware,DE,"(-75.49992, 39.00039)",1027,73061,2021-01-22
9,Florida,FL,"(-82.5001, 28.75054)",25011,1624312,2021-01-22


### Current cases in San Diego County

In [7]:
admin2 = 'San Diego County'

query = """
MATCH (c:Cases{date: date($date), source: 'JHU'})-[:REPORTED_IN]->(a:Admin2{name: $admin2})
RETURN a.name as name, c.cases as confirmed, c.deaths as deaths, c.date as date
"""
graph.run(query, admin2=admin2, date=date).to_data_frame()

Unnamed: 0,name,confirmed,deaths,date
0,San Diego County,222578,2301,2021-01-22


### COVID-19 Cases Time Series in San Diego County

In [8]:
query = """
MATCH (c:Cases{source: 'JHU'})-[:REPORTED_IN]->(a:Admin2{name: $admin2})
RETURN a.name as name, c.cases as confirmed, c.deaths as deaths, c.date as date
ORDER BY c.date DESC
"""
graph.run(query, admin2=admin2).to_data_frame()

Unnamed: 0,name,confirmed,deaths,date
0,San Diego County,222578,2301,2021-01-22
1,San Diego County,219731,2222,2021-01-21
2,San Diego County,218555,2174,2021-01-20
3,San Diego County,216835,2109,2021-01-19
4,San Diego County,214337,2103,2021-01-18
5,San Diego County,211787,2103,2021-01-17
6,San Diego County,209897,2065,2021-01-16
7,San Diego County,206870,2037,2021-01-15
8,San Diego County,204175,2005,2021-01-14
9,San Diego County,201580,1952,2021-01-13
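
Assuming the JHU counts are cumulative totals (the values above only ever increase), daily new cases and deaths can be derived by differencing consecutive days. The following is a minimal sketch that uses a few rows copied from the output above rather than a live query; the column names `new_cases` and `new_deaths` are introduced here for illustration.

```python
import pandas as pd

# Sample rows copied from the output above (cumulative counts, newest first).
ts = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-22", "2021-01-21", "2021-01-20", "2021-01-19"]),
    "confirmed": [222578, 219731, 218555, 216835],
    "deaths": [2301, 2222, 2174, 2109],
})

# Sort oldest first so that diff() subtracts the previous day's total.
daily = ts.sort_values("date").set_index("date")
daily["new_cases"] = daily["confirmed"].diff()   # e.g. 222578 - 219731 = 2847 on 2021-01-22
daily["new_deaths"] = daily["deaths"].diff()     # e.g. 2301 - 2222 = 79 on 2021-01-22

# The first row has no previous day, so its differences are NaN.
print(daily)
```

If days are missing from a series, each difference spans the whole gap, so it may be worth checking that the dates are contiguous before interpreting the values as daily counts.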


### COVID-19 Cases in San Diego County by Zip code

Get the latest update date for SDHHSA data

In [9]:
query = """
MATCH (:PostalCode)<-[:REPORTED_IN]-(c:Cases{source: 'SDHHSA'}) 
RETURN max(c.date)  // returns a neotime.Date object
"""
date_sdhhsa = graph.evaluate(query).to_native() # convert to datetime.date object

In [10]:
query = """
MATCH (c:Cases{date: date($date), source: 'SDHHSA'})-[:REPORTED_IN]->(p:PostalCode)
RETURN p.name as zip, p.placeName, p.location, c.cases as confirmed, c.date as date
ORDER by zip
"""
graph.run(query, date=date_sdhhsa).to_data_frame()

Unnamed: 0,zip,p.placeName,p.location,confirmed,date
0,91901,Alpine,"(-116.7543, 32.8282)",1004,2021-01-21
1,91902,Bonita,"(-117.0221, 32.6671)",1091,2021-01-21
2,91905,Boulevard,"(-116.32, 32.6719)",56,2021-01-21
3,91906,Campo,"(-116.4905, 32.6605)",239,2021-01-21
4,91910,Chula Vista,"(-117.0676, 32.6371)",7439,2021-01-21
5,91911,Chula Vista,"(-117.0565, 32.6084)",9188,2021-01-21
6,91913,Chula Vista,"(-116.9852, 32.6513)",3718,2021-01-21
7,91914,Chula Vista,"(-116.9652, 32.6587)",1162,2021-01-21
8,91915,Chula Vista,"(-116.9408, 32.6315)",2248,2021-01-21
9,91916,Descanso,"(-116.6027, 32.873)",73,2021-01-21
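
As a client-side alternative to aggregating in Cypher, zip-level counts can be rolled up by place name with pandas. This sketch uses a subset of rows copied from the output above in place of a live query result, so the totals cover only those rows; `zip_df` and `by_place` are names chosen for illustration.

```python
import pandas as pd

# A subset of the zip-level rows shown above (zip, place name, confirmed cases).
rows = [
    ("91901", "Alpine", 1004),
    ("91902", "Bonita", 1091),
    ("91910", "Chula Vista", 7439),
    ("91911", "Chula Vista", 9188),
    ("91913", "Chula Vista", 3718),
    ("91914", "Chula Vista", 1162),
    ("91915", "Chula Vista", 2248),
]
zip_df = pd.DataFrame(rows, columns=["zip", "placeName", "confirmed"])

# Sum confirmed cases over all zip codes that share a place name.
by_place = (
    zip_df.groupby("placeName")["confirmed"]
    .sum()
    .sort_values(ascending=False)
)
print(by_place)
```

Because zip code areas can cross city boundaries, totals grouped this way are approximations of city-level counts.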


### COVID-19 Cases Time Series for Carlsbad, California
Cases are aggregated from the zip-level data (note: some zip code areas may cross city boundaries).

In [11]:
query = """
MATCH (c:Cases{source: 'SDHHSA'})-[:REPORTED_IN]->(p:PostalCode{placeName:'Carlsbad'})-[:IN*]->(a:Admin1{name: 'California'})
RETURN c.date as date, sum(c.cases) as confirmed
ORDER by date DESC
"""
graph.run(query).to_data_frame()

Unnamed: 0,date,confirmed
0,2021-01-21,3683
1,2021-01-20,3650
2,2021-01-19,3613
3,2021-01-18,3588
4,2021-01-16,3505
5,2021-01-15,3479
6,2021-01-14,3436
7,2021-01-13,3379
8,2021-01-12,3334
9,2021-01-11,3292


### COVID-19 cases aggregated by US Regions
Here we aggregate US county-level data over 2 hops:

`Admin2 -> USDivision -> USRegion`

using the variable-length relationship `[:IN*]`.
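
To illustrate what the variable-length pattern does, the sketch below mimics it in plain Python: it follows parent links for one or more hops and adds each county's count to any target region it reaches. The hierarchy, names, and counts are made-up toy values, not data from the knowledge graph, and `ancestors` and `rollup` are hypothetical helpers.

```python
from collections import defaultdict

# Toy child -> parent links standing in for [:IN] relationships (hypothetical).
PARENT = {
    "county_a": "division_1",
    "county_b": "division_1",
    "county_c": "division_2",
    "division_1": "region_x",
    "division_2": "region_y",
}

# Toy case counts per county (hypothetical).
COUNTS = {"county_a": 10, "county_b": 20, "county_c": 5}


def ancestors(node, parent):
    """Yield every node reachable by following parent links one or more times."""
    while node in parent:
        node = parent[node]
        yield node


def rollup(counts, parent, targets):
    """Sum counts into each target node reachable from the counted node."""
    totals = defaultdict(int)
    for node, n in counts.items():
        for anc in ancestors(node, parent):
            if anc in targets:
                totals[anc] += n
    return dict(totals)


totals = rollup(COUNTS, PARENT, targets={"region_x", "region_y"})
print(totals)  # {'region_x': 30, 'region_y': 5}
```

In the Cypher query the database performs this traversal itself; the label on the end node (`USRegion`) plays the role of `targets` here.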

In [12]:
query = """
MATCH (c:Cases{date: date($date), source: 'JHU'})-[:REPORTED_IN]->(:Admin2)-[:IN*]->(u:USRegion)
RETURN sum(c.cases) AS count, u.name AS USRegion
ORDER by count DESC
"""
graph.run(query, date=date).to_data_frame()

Unnamed: 0,count,USRegion
0,9533817,South Region
1,5709329,West Region
2,5543101,Midwest Region
3,3662754,Northeast Region


### COVID-19 cases aggregated by UN Region
Here we aggregate country-level data over up to 3 hops:

`country -> UNSubRegion -> UNIntermediateRegion -> UNRegion`

using the variable-length relationship `[:IN*]`.

In [13]:
query = """
MATCH (c:Cases{date: date($date), source: 'JHU'})-[:REPORTED_IN]->(:Country)-[:IN*]->(u:UNRegion)
RETURN sum(c.cases) AS count, u.name AS UNRegion
ORDER by count DESC
"""
df = graph.run(query, date=date).to_data_frame()
df

Unnamed: 0,count,UNRegion
0,42827946,Americas
1,28659580,Europe
2,22397619,Asia
3,3406952,Africa
4,21087,Oceania


In [14]:
print("Total number of confirmed cases:", df['count'].sum())

Total number of confirmed cases: 97313184
