# Example queries for Case Counts on COVID-19 Knowledge Graph
[Work in progress]

This notebook demonstrates how to run Cypher queries to retrieve and aggregate COVID-19 case counts.

COVID-19 case numbers are provided by:

Country and US County level data: [JHU](https://github.com/covid-19-net/covid-19-community/blob/master/reference_data/DataProvider.csv#L6)

San Diego data by zip code: [SDHHSA](https://github.com/covid-19-net/covid-19-community/blob/master/reference_data/DataProvider.csv#L7)

In [1]:
import datetime
import pandas as pd
from py2neo import Graph

In [2]:
pd.options.display.max_rows = None  # display all rows
pd.options.display.max_columns = None  # display all columns

#### Connect to COVID-19-Net Knowledge Graph

In [3]:
graph = Graph("bolt://132.249.238.185:7687", user="reader", password="demo")

In [4]:
# "yesterday" is defined as one day before the current UTC date.
# At certain times of day, data for the previous UTC day may not be
# available yet; in that case, increase days to 2.
today = datetime.datetime.utcnow().date()
yesterday = today - datetime.timedelta(days=1)
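
The reporting lag can also be made an explicit parameter. The following is a minimal sketch, not part of the original notebook: `reference_date` and `lag_days` are hypothetical names, and the timezone-aware `datetime.now(timezone.utc)` call is used in place of `utcnow()`, which is deprecated as of Python 3.12.

```python
import datetime


def reference_date(lag_days=1, now=None):
    """Return the UTC date `lag_days` before `now` (default: current UTC time)."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return now.date() - datetime.timedelta(days=lag_days)


# Hypothetical usage: step back two days if the latest data are not loaded yet.
yesterday = reference_date(lag_days=1)
two_days_ago = reference_date(lag_days=2)
```

The returned value is a plain `datetime.date`, so it can be passed as the `day` parameter in the queries below in the same way as `yesterday`.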

## COVID Case Data

### Case counts by Country (using Johns Hopkins COVID-19 data)

In [5]:
query = """
MATCH (c:Cases{date: date($day), source: 'JHU'})-[:REPORTED_IN]->(n:Country)
RETURN n.name, c.cases, c.deaths, c.date as date
ORDER by n.name
"""
graph.run(query, day=yesterday).to_data_frame()

Unnamed: 0,n.name,c.cases,c.deaths,date
0,Afghanistan,41032,1523,2020-10-27
1,Albania,19729,487,2020-10-27
2,Algeria,56706,1931,2020-10-27
3,Andorra,4410,72,2020-10-27
4,Angola,9871,271,2020-10-27
5,Anguilla,3,0,2020-10-27
6,Antigua and Barbuda,124,3,2020-10-27
7,Argentina,1116609,29730,2020-10-27
8,Armenia,80410,1222,2020-10-27
9,Aruba,4437,36,2020-10-27


### Case counts by US States aggregated from US Counties
Note: some counties in the Johns Hopkins dataset cannot be mapped to US counties. They are listed as "None".

In [6]:
query = """
MATCH (c:Cases{date: $date, source: 'JHU'})-[:REPORTED_IN]->(a:Admin2)-[:IN]->(a1:Admin1)
RETURN a1.name as state, a1.code, a1.location, sum(c.deaths) as deaths, sum(c.cases) as confirmed, c.date as date
ORDER BY a1.code
"""
graph.run(query, date=yesterday).to_data_frame()

Unnamed: 0,state,a1.code,a1.location,deaths,confirmed,date
0,Alaska,AK,"(-150.00028, 64.00028)",70,14736,2020-10-27
1,Alabama,AL,"(-86.75026, 32.75041)",2892,186437,2020-10-27
2,Arkansas,AR,"(-92.50044, 34.75037)",1857,105581,2020-10-27
3,Arizona,AZ,"(-111.50098, 34.5003)",5890,240121,2020-10-27
4,California,CA,"(-119.75126, 37.25022)",17460,914888,2020-10-27
5,Colorado,CO,"(-105.50083, 39.00027)",2236,98710,2020-10-27
6,Connecticut,CT,"(-72.66648, 41.66704)",4595,68484,2020-10-27
7,District of Columbia,DC,"(-77.00025, 38.91706)",644,16906,2020-10-27
8,Delaware,DE,"(-75.49992, 39.00039)",686,24158,2020-10-27
9,Florida,FL,"(-82.5001, 28.75054)",16505,784455,2020-10-27


### Current cases in San Diego County

In [7]:
admin2 = 'San Diego County'

query = """
MATCH (c:Cases{date: date($day), source: 'JHU'})-[:REPORTED_IN]->(a:Admin2{name: $admin2})
RETURN a.name as name, c.cases as confirmed, c.deaths as deaths, c.date as date
"""
graph.run(query, admin2=admin2, day=yesterday).to_data_frame()

Unnamed: 0,name,confirmed,deaths,date
0,San Diego County,55210,877,2020-10-27


### COVID-19 Cases Time Series in San Diego County

In [8]:
query = """
MATCH (c:Cases{source: 'JHU'})-[:REPORTED_IN]->(a:Admin2{name: $admin2})
RETURN a.name as name, c.cases as confirmed, c.deaths as deaths, c.date as date
ORDER BY c.date DESC
"""
graph.run(query, admin2=admin2).to_data_frame()

Unnamed: 0,name,confirmed,deaths,date
0,San Diego County,55210,877,2020-10-27
1,San Diego County,54941,870,2020-10-26
2,San Diego County,54583,870,2020-10-25
3,San Diego County,54314,868,2020-10-24
4,San Diego County,53928,867,2020-10-23
5,San Diego County,53498,866,2020-10-22
6,San Diego County,53263,863,2020-10-21
7,San Diego County,53000,857,2020-10-20
8,San Diego County,52735,853,2020-10-19
9,San Diego County,52355,853,2020-10-18
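
The counts in this time series grow monotonically, which suggests they are cumulative. Assuming that is the case, daily new cases can be derived by differencing consecutive rows. The sketch below uses only the ten confirmed values shown above; `daily_increase` is a hypothetical helper name.

```python
# Cumulative confirmed counts copied from the rows above, newest first.
confirmed = [55210, 54941, 54583, 54314, 53928, 53498, 53263, 53000, 52735, 52355]


def daily_increase(cumulative_newest_first):
    """Return day-over-day increases for a newest-first cumulative series."""
    return [
        newer - older
        for newer, older in zip(cumulative_newest_first, cumulative_newest_first[1:])
    ]


new_cases = daily_increase(confirmed)
print(new_cases)  # [269, 358, 269, 386, 430, 235, 263, 265, 380]
```

The result has one entry fewer than the input, because the oldest row has no earlier value to compare against.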


### COVID-19 Cases in San Diego County by Zip code
The latest data may show up with a delay. If no output is shown, set the date to an earlier day using this format: `date('2020-06-21')`.

In [9]:
query = """
MATCH (c:Cases{date: date($day), source: 'SDHHSA'})-[:REPORTED_IN]->(p:PostalCode)
RETURN p.name as zip, p.placeName, p.location, c.cases as confirmed, c.date as date
ORDER by zip
"""
graph.run(query, day=yesterday).to_data_frame()
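
Stepping back to an earlier date when a query returns nothing can be automated. The following is a minimal sketch under stated assumptions: `first_non_empty`, `run_for_day`, and `max_lag` are hypothetical names, and the only requirement on the query result is that `len()` reports its number of rows (as it does for a pandas DataFrame).

```python
import datetime


def first_non_empty(run_for_day, start_day, max_lag=3):
    """Try start_day, then earlier days, until run_for_day returns rows.

    run_for_day is any callable that takes a datetime.date and returns a
    sized result. Returns (day, result), or (None, None) if every attempt
    within max_lag days was empty.
    """
    for lag in range(max_lag + 1):
        day = start_day - datetime.timedelta(days=lag)
        result = run_for_day(day)
        if len(result) > 0:
            return day, result
    return None, None
```

In this notebook, `run_for_day` could be `lambda d: graph.run(query, day=d).to_data_frame()`, with `start_day=yesterday`.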

### COVID-19 Cases Time Series for Carlsbad, California
Cases are aggregated from the zip-level data (note that some zip code areas may cross city boundaries).

In [10]:
query = """
MATCH (c:Cases{source: 'SDHHSA'})-[:REPORTED_IN]->(p:PostalCode{placeName:'Carlsbad'})-[:IN*]->(a:Admin1{name: 'California'})
RETURN c.date as date, sum(c.cases) as confirmed
ORDER by date DESC
"""
graph.run(query).to_data_frame()

Unnamed: 0,date,confirmed
0,2020-10-26,841
1,2020-10-25,838
2,2020-10-24,832
3,2020-10-23,830
4,2020-10-22,822
5,2020-10-21,815
6,2020-10-20,808
7,2020-10-19,801
8,2020-10-18,794
9,2020-10-17,787


### COVID-19 cases aggregated by US Regions
Here we aggregate US county-level data over 2 hops:

`Admin2 -> USDivision -> USRegion`

using the variable-length relationship `[:IN*]`.

In [11]:
query = """
MATCH (c:Cases{date: date($day), source: 'JHU'})-[:REPORTED_IN]->(:Admin2)-[:IN*]->(u:USRegion)
RETURN sum(c.cases) AS count, u.name AS USRegion
ORDER by count DESC
"""
graph.run(query, day=yesterday).to_data_frame()

Unnamed: 0,count,USRegion
0,3895410,South Region
1,1772984,Midwest Region
2,1757378,West Region
3,1195430,Northeast Region
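
In Cypher, `[:IN*]` matches a chain of one or more `IN` relationships, so each county is connected to its region regardless of how many intermediate levels lie between them. The plain-Python analogue below illustrates the idea only: it is a toy sketch, and every name and count in it is made up rather than taken from the knowledge graph.

```python
from collections import defaultdict

# Hypothetical toy hierarchy: child -> parent, mimicking chains of IN relationships.
parent = {
    "County A": "Division X",
    "County B": "Division X",
    "County C": "Division Y",
    "Division X": "Region 1",
    "Division Y": "Region 2",
}
regions = {"Region 1", "Region 2"}

# Made-up case counts per county.
cases = {"County A": 10, "County B": 5, "County C": 7}


def find_region(node):
    """Follow parent links (any number of hops) until a region is reached."""
    while node not in regions:
        node = parent[node]  # raises KeyError if no region is reachable
    return node


# Sum county counts under the region each county rolls up to.
totals = defaultdict(int)
for county, count in cases.items():
    totals[find_region(county)] += count

print(dict(totals))  # {'Region 1': 15, 'Region 2': 7}
```

One difference from the query: a county with no path to a region raises an error in this sketch, whereas a Cypher `MATCH` leaves unmatched nodes out of the result.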


### COVID-19 cases aggregated by UN Region
Here we aggregate country-level data over up to 3 hops:

`country -> UNSubRegion -> UNIntermediateRegion -> UNRegion`

using the variable-length relationship `[:IN*]`.

In [12]:
query = """
MATCH (c:Cases{date: date($day), source: 'JHU'})-[:REPORTED_IN]->(:Country)-[:IN*]->(u:UNRegion)
RETURN sum(c.cases) AS count, u.name AS UNRegion
ORDER by count DESC
"""
df = graph.run(query, day=yesterday).to_data_frame()
df

Unnamed: 0,count,UNRegion
0,19799899,Americas
1,13186291,Asia
2,8824410,Europe
3,1747522,Africa
4,9030,Oceania


In [13]:
print("Total number of confirmed cases:", df['count'].sum())

Total number of confirmed cases: 43567152
