# Example queries for Case Counts on COVID-19 Knowledge Graph
[Work in progress]

This notebook demonstrates how to run Cypher queries to retrieve and aggregate COVID-19 case data.

COVID-19 case numbers are provided by:

Country and US County level data: [JHU](https://github.com/covid-19-net/covid-19-community/blob/master/reference_data/DataProvider.csv#L6)

San Diego data by zip code: [SDHHSA](https://github.com/covid-19-net/covid-19-community/blob/master/reference_data/DataProvider.csv#L7)

In [1]:
import datetime
import pandas as pd
from py2neo import Graph

In [2]:
pd.options.display.max_rows = None  # display all rows
pd.options.display.max_columns = None  # display all columns

#### Connect to COVID-19-Net Knowledge Graph

In [3]:
graph = Graph("bolt://132.249.238.185:7687", user="reader", password="demo")

In [4]:
# "yesterday" is currently defined as two days before the current UTC date,
# because there are periods during the day when the previous day's data
# are not yet available.
today = datetime.datetime.utcnow().date()
yesterday = today - datetime.timedelta(days=2)
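
The two-day lag above could also be wrapped in a small helper so that it is easy to adjust. This is only a minimal sketch: the function name `report_date` and its `lag_days` and `now` parameters are hypothetical and not part of the original notebook.

```python
import datetime


def report_date(lag_days=2, now=None):
    """Return the UTC calendar date `lag_days` before `now`.

    `now` should be a timezone-aware datetime; it defaults to the current UTC time.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    utc_date = now.astimezone(datetime.timezone.utc).date()
    return utc_date - datetime.timedelta(days=lag_days)


# Example with a fixed timestamp so the result is reproducible
fixed = datetime.datetime(2020, 6, 24, 12, 0, tzinfo=datetime.timezone.utc)
print(report_date(now=fixed))  # 2020-06-22
```

Passing a fixed `now` keeps the result reproducible, which helps when re-running queries against a known date.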

## COVID Case Data

### Case counts by Country

In [5]:
query = """
MATCH (c:Cases{date: date($day), source: 'JHU'})-[:REPORTED_IN]->(n:Country)
RETURN n.name, c.cummulativeConfirmed, c.cummulativeDeaths, c.date as dateUTC
ORDER by n.name
"""
graph.run(query, day=yesterday).to_data_frame()

Unnamed: 0,n.name,c.cummulativeConfirmed,c.cummulativeDeaths,dateUTC
0,Afghanistan,29157,598,2020-06-22
1,Albania,1995,44,2020-06-22
2,Algeria,11920,852,2020-06-22
3,Andorra,855,52,2020-06-22
4,Angola,186,10,2020-06-22
5,Anguilla,3,0,2020-06-22
6,Antigua and Barbuda,26,3,2020-06-22
7,Argentina,44931,1043,2020-06-22
8,Armenia,20588,360,2020-06-22
9,Aruba,101,3,2020-06-22


### Case counts by US States aggregated from US Counties
Note that some counties in the Johns Hopkins dataset cannot be mapped to US counties. They are listed as "None".

In [6]:
query = """
MATCH (c:Cases{date: $date, source: 'JHU'})-[:REPORTED_IN]->(a:Admin2)-[:IN]->(a1:Admin1)
RETURN a1.name as state, a1.code, a1.location, sum(c.cummulativeDeaths) as deaths, sum(c.cummulativeConfirmed) as confirmed, c.date as dateUTC
ORDER BY a1.code
"""
graph.run(query, date=yesterday).to_data_frame()

Unnamed: 0,state,a1.code,a1.location,deaths,confirmed,dateUTC
0,Alaska,AK,"(-150.00028, 64.00028)",12,755,2020-06-22
1,Alabama,AL,"(-86.75026, 32.75041)",829,29827,2020-06-22
2,Arkansas,AR,"(-92.50044, 34.75037)",224,14707,2020-06-22
3,Arizona,AZ,"(-111.50098, 34.5003)",1350,54599,2020-06-22
4,California,CA,"(-119.75126, 37.25022)",5518,181580,2020-06-22
5,Colorado,CO,"(-105.50083, 39.00027)",1651,30689,2020-06-22
6,Connecticut,CT,"(-72.66648, 41.66704)",4263,45512,2020-06-22
7,Delaware,DE,"(-75.49992, 39.00039)",435,10803,2020-06-22
8,Florida,FL,"(-82.5001, 28.75054)",3125,98376,2020-06-22
9,Georgia,GA,"(-83.50018, 32.75042)",2606,60913,2020-06-22
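
In the Cypher query above, the non-aggregated columns in `RETURN` act as implicit grouping keys for `sum()`. As a rough client-side illustration of the same roll-up, the sketch below sums county rows to states with pandas. The county rows are made-up sample values, not data from the knowledge graph.

```python
import pandas as pd

# Hypothetical county-level rows (illustrative values only, not from the graph)
counties = pd.DataFrame(
    {
        "state": ["California", "California", "Nevada"],
        "code": ["CA", "CA", "NV"],
        "confirmed": [100, 50, 30],
        "deaths": [4, 1, 2],
    }
)

# Group by the non-aggregated columns and sum the counts,
# mirroring the implicit grouping in the Cypher RETURN clause
states = (
    counties.groupby(["state", "code"], as_index=False)[["confirmed", "deaths"]]
    .sum()
    .sort_values("code")
    .reset_index(drop=True)
)
print(states)
```

Aggregating in Cypher, as the notebook does, avoids transferring every county row to the client; the pandas version is mainly useful when the county-level frame is already loaded.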


### Current cases in San Diego County

In [7]:
admin2 = 'San Diego County'

query = """
MATCH (c:Cases{date: date($day), source: 'JHU'})-[:REPORTED_IN]->(a:Admin2{name: $admin2})
RETURN a.name as name, c.cummulativeConfirmed as confirmed, c.cummulativeDeaths as deaths, c.date as date
"""
graph.run(query, admin2=admin2, day=yesterday).to_data_frame()

Unnamed: 0,name,confirmed,deaths,date
0,San Diego County,11096,338,2020-06-22


### COVID-19 Cases Time Series in San Diego County

In [8]:
query = """
MATCH (c:Cases{source: 'JHU'})-[:REPORTED_IN]->(a:Admin2{name: $admin2})
RETURN a.name as name, c.cummulativeConfirmed as confirmed, c.cummulativeDeaths as deaths, c.date as dateUTC
ORDER BY c.date DESC
"""
graph.run(query, admin2=admin2).to_data_frame()

Unnamed: 0,name,confirmed,deaths,dateUTC
0,San Diego County,11096,338,2020-06-22
1,San Diego County,10794,338,2020-06-21
2,San Diego County,10484,338,2020-06-20
3,San Diego County,10350,332,2020-06-19
4,San Diego County,10092,331,2020-06-18
5,San Diego County,9730,323,2020-06-17
6,San Diego County,9730,323,2020-06-16
7,San Diego County,9610,320,2020-06-15
8,San Diego County,9440,319,2020-06-14
9,San Diego County,9314,319,2020-06-13
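
Because the counts are cumulative, daily new cases can be derived by differencing consecutive days. The sketch below does this with pandas on a few rows copied from the output above. The column name `newConfirmed` is an illustrative choice, not a property in the knowledge graph, and dates are written as ISO strings here for simplicity.

```python
import pandas as pd

# A few rows copied from the time series output above (newest first)
df = pd.DataFrame(
    {
        "dateUTC": ["2020-06-22", "2020-06-21", "2020-06-20", "2020-06-19"],
        "confirmed": [11096, 10794, 10484, 10350],
    }
)

# Sort oldest-first so that diff() subtracts the previous day's total;
# the first row has no predecessor and therefore becomes NaN
df = df.sort_values("dateUTC").reset_index(drop=True)
df["newConfirmed"] = df["confirmed"].diff()
print(df)
```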


### Current COVID-19 Cases in San Diego County by Zip code

In [9]:
query = """
MATCH (c:Cases{date: date($day), source: 'SDHHSA'})-[:REPORTED_IN]->(p:PostalCode)
RETURN p.name as zip, p.placeName, p.location, c.cummulativeConfirmed as confirmed, c.date as date
ORDER by zip
"""
graph.run(query, day=yesterday).to_data_frame()

### COVID-19 Cases Time Series for Carlsbad, California
Cases are aggregated from the zip-level data. Note that some zip code areas may cross city boundaries.

In [10]:
query = """
MATCH (c:Cases{source: 'SDHHSA'})-[:REPORTED_IN]->(p:PostalCode{placeName:'Carlsbad'})-[:IN*]->(a:Admin1{name: 'California'})
RETURN c.date as date, sum(c.cummulativeConfirmed) as confirmed
ORDER by date
"""
graph.run(query).to_data_frame()

Unnamed: 0,date,confirmed
0,2020-03-30,31
1,2020-03-31,32
2,2020-04-01,37
3,2020-04-02,39
4,2020-04-03,41
5,2020-04-04,42
6,2020-04-05,42
7,2020-04-06,42
8,2020-04-07,43
9,2020-04-08,43
