# Example queries on COVID-19 Knowledge Graph

This notebook shows how to run simple [Cypher](https://neo4j.com/developer/cypher-query-language/) queries on the knowledge graph.

In [1]:
import os
import time
import pandas as pd
from py2neo import Graph

### Set up pandas display options

In [2]:
pd.options.display.max_rows = None  # display all rows
pd.options.display.max_columns = None  # display all columns

In [3]:
def make_clickable(val):
    """Render a URL as an HTML link that opens in a new tab."""
    return f'<a target="_blank" href="{val}">{val}</a>'
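The helper above places the URL into the HTML without escaping it. For URLs that contain `&` or quote characters, escaping the value first may be safer. The following is a minimal sketch under that assumption; `make_clickable_escaped` is a hypothetical name and is not part of the original notebook.

```python
import html


def make_clickable_escaped(val):
    """Return an HTML anchor for val, escaping characters such as & and quotes."""
    # html.escape with quote=True also escapes " and ', so the value
    # cannot break out of the href attribute.
    safe = html.escape(str(val), quote=True)
    return f'<a target="_blank" href="{safe}">{safe}</a>'


link = make_clickable_escaped("https://example.org/?a=1&b=2")
print(link)
```

It can be passed to `df.style.format` in the same way as `make_clickable`.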

### Start Neo4j database

In [4]:
NEO4J_HOME = os.getenv('NEO4J_HOME')
print(NEO4J_HOME)

/Users/peter/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-993db298-6374-4f0a-9a9a-d0783480877a/installation-3.5.14
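If `NEO4J_HOME` is unset or points to the wrong directory, the start command fails with a shell error that can be hard to interpret. A small check before starting the database may help. This is a sketch only: `find_neo4j_binary` is a hypothetical helper, and it assumes the launcher lives at `bin/neo4j` under `NEO4J_HOME`, as in the commands used in this notebook.

```python
import os
from pathlib import Path


def find_neo4j_binary(home):
    """Return the path to bin/neo4j under home, or raise a descriptive error."""
    if not home:
        raise EnvironmentError("NEO4J_HOME is not set")
    binary = Path(home) / "bin" / "neo4j"
    if not binary.is_file():
        raise FileNotFoundError(f"No neo4j launcher found at {binary}")
    return binary


try:
    print(find_neo4j_binary(os.getenv("NEO4J_HOME")))
except OSError as err:
    # EnvironmentError is an alias of OSError in Python 3, and
    # FileNotFoundError is a subclass, so both cases are caught here.
    print(f"Cannot start Neo4j: {err}")
```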


In [5]:
!"$NEO4J_HOME"/bin/neo4j start

Active database: graph.db
Directories in use:
  home:         /Users/peter/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-993db298-6374-4f0a-9a9a-d0783480877a/installation-3.5.14
  config:       /Users/peter/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-993db298-6374-4f0a-9a9a-d0783480877a/installation-3.5.14/conf
  logs:         /Users/peter/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-993db298-6374-4f0a-9a9a-d0783480877a/installation-3.5.14/logs
  plugins:      /Users/peter/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-993db298-6374-4f0a-9a9a-d0783480877a/installation-3.5.14/plugins
  import:       NOT SET
  data:         /Users/peter/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-993db298-6374-4f0a-9a9a-d0783480877a/installation-3.5.14/data
  certificates: /Users/peter/Library/Application Support/Neo4j Desktop/Application/neo4

Wait until the database has started.

In [9]:
# TODO: check the database status instead of sleeping for 15 seconds. If the steps below fail, run this cell again and retry.
time.sleep(15)
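One way to address the TODO above is to poll the server's port instead of sleeping for a fixed time. The sketch below assumes that Neo4j listens for Bolt connections on `localhost` at the default port 7687; `wait_for_port` is a hypothetical helper. An open port only shows that the server accepts connections, so the first query may still need a retry.

```python
import socket
import time


def wait_for_port(host="localhost", port=7687, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or timeout elapses.

    Returns True once a connection is accepted, False if the timeout passes.
    The default port 7687 is an assumption (Neo4j's default Bolt port).
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example use in place of time.sleep(15):
# if not wait_for_port():
#     raise RuntimeError("Neo4j did not start within 60 seconds")
```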

In [10]:
graph = Graph(password="neo4jbinder")

## Query the graph for available dashboards

### List COVID-19 Dashboards

In [11]:
query = """
MATCH (:Outbreak{name:'COVID-19'})-[:EXPLORE_IN]->(d:Dashboard)
RETURN d.name as name, d.description as description, d.url as url
"""
df = graph.run(query).to_data_frame()
df.style.format({'url': make_clickable})

Unnamed: 0,name,description,url
0,Florida-COVID-19,Florida's COVID-19 Data and Surveillance Dashboard,https://experience.arcgis.com/experience/96dd742462124fa0b38ddedb9b25e429
1,Informationisbeautiful-COVID-19,COVID-19 #Coronavirus DataPack,https://informationisbeautiful.net/visualizations/covid-19-coronavirus-infographic-datapack/
2,OurWorldInData-COVID-19,Coronavirus Disease (COVID-19) – Statistics and Research,https://ourworldindata.org/coronavirus
3,KINEVIZ-COVID-19,Global SARS-CoV-2 (COVID-19) Tracking,https://www.kineviz.com/covid19
4,HongKong-COVID-19,Coronavirus Disease (COVID-19) in HK,https://www.coronavirus.gov.hk/eng/index.html
5,HealthMap-COVID-19,Novel Coronavirus (COVID-19),https://www.healthmap.org/covid-19/?mod=article_inline
6,BBC-COVID-19,Coronavirus: A visual guide to the pandemic,https://www.bbc.com/news/world-51235105
7,TheBaseLab-COVID-19,Data & News Update,https://coronavirus.thebaselab.com/
8,Nextstrain-COVID-19,Genomic epidemiology of novel coronavirus,https://nextstrain.org/ncov
9,Singapore-COVID-19,Dashboard of the COVID-19 Virus Outbreak in Singapore,https://co.vid19.sg/dashboard


### Find COVID-19 Dashboards for specific cities

In [12]:
query = """
MATCH (c:City)-[:EXPLORE_IN]-(d:Dashboard)
RETURN c.name as city, d.name as name, d.url as url
"""
df = graph.run(query).to_data_frame()
df.style.format({'url': make_clickable})

Unnamed: 0,city,name,url
0,Singapore,Singapore-COVID-19,https://co.vid19.sg/dashboard
1,Hong Kong,HongKong-COVID-19,https://www.coronavirus.gov.hk/eng/index.html


## Explore Strain Data

### List coronavirus outbreaks

In [13]:
query = """
MATCH (p:Pathogen)-[:CAUSES]->(o:Outbreak)
RETURN p.acronym as acronym, p.name as pathogen, p.taxonomy_id as taxonomy_id, o.name as outbreak, o.start_date as start_date
"""
graph.run(query).to_data_frame()

Unnamed: 0,acronym,pathogen,taxonomy_id,outbreak,start_date
0,SARS-CoV-2,Severe acute respiratory syndrome coronavirus 2,2697049,COVID-19,2019
1,MERS-CoV,Middle East respiratory syndrome-related coron...,1335626,MERS,2012
2,SARS-CoV,Severe acute respiratory syndrome-related coro...,694009,SARS,2003


### List person demographics and strain information for California
Note: demographic data have recently become unavailable (see https://github.com/nextstrain/ncov/issues/251).

In [14]:
query = """
MATCH (a:Admin1)<-[:LOCATED_IN]-(p:Person)-[:CARRIES]->(s:Strain)
WHERE a.name = 'California'
RETURN p.age as age, p.sex as sex, p.exposure_location as exposure_location, s.name as strain, s.clade as clade
"""
graph.run(query).to_data_frame()

Unnamed: 0,age,sex,exposure_location,strain,clade
0,,,California,USA/CA9/2020,A7
1,,,California,USA/CA8/2020,
2,,,Hubei,USA/CA7/2020,B4
3,,,Hubei,USA/CA6/2020,
4,,,Hubei,USA/CA5/2020,
5,,,California,USA/CA4/2020,
6,,,California,USA/CA3/2020,
7,,,Hubei,USA/CA2/2020,
8,,,Hubei,USA/CA1/2020,B
9,,,California,USA/CA-PC101P/2020,A2a


#### Same query using parameterized Cypher
Parameters can be passed to Cypher queries as keyword arguments of `graph.run`. Within the query, each parameter is referenced by name and wrapped in curly braces, for example `{admin1}`.

In [15]:
admin1 = 'California'

query = """
MATCH (a:Admin1{name: {admin1}})<-[:LOCATED_IN]-(p:Person)-[:CARRIES]->(s:Strain)
RETURN p.age as age, p.sex as sex, p.exposure_location as exposure_location, 
       s.name as strain, s.clade as clade, s.date as date
ORDER BY s.date
"""
graph.run(query, admin1=admin1).to_data_frame().head(100)

Unnamed: 0,age,sex,exposure_location,strain,clade,date
0,,,Hubei,USA/CA2/2020,,2020-01-22
1,,,Hubei,USA/CA1/2020,B,2020-01-23
2,,,Hubei,USA/CA6/2020,,2020-01-27
3,,,Hubei,USA/CA5/2020,,2020-01-29
4,,,California,USA/CA4/2020,,2020-01-29
5,,,California,USA/CA3/2020,,2020-01-29
6,,,Hubei,USA/CA7/2020,B4,2020-02-06
7,,,California,USA/CA8/2020,,2020-02-10
8,,,California,USA/CA9/2020,A7,2020-02-23
9,,,California,USA/CA-CDPH-UC4/2020,A7,2020-02-27
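The curly-brace parameter form used in this notebook works with the Neo4j 3.5 installation shown earlier, but it was deprecated in favor of `$name` and removed in Neo4j 4.0. If the queries need to run on a newer server, they could be rewritten along the following lines. This is a sketch only: `to_dollar_params` is a hypothetical helper, and the regular expression would also rewrite a brace-wrapped identifier inside a string literal.

```python
import re

# Matches legacy parameters such as {admin1}: braces around a single identifier.
# Map literals such as {name: ...} do not match because of the colon.
LEGACY_PARAM = re.compile(r"\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}")


def to_dollar_params(query):
    """Rewrite legacy {name} Cypher parameters as $name."""
    return LEGACY_PARAM.sub(r"$\1", query)


legacy = "MATCH (a:Admin1{name: {admin1}}) RETURN a.name"
print(to_dollar_params(legacy))
```

The keyword arguments passed to `graph.run` stay the same; only the query text changes.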


### Where did clade A originate?

In [16]:
clade = 'A'

query = """
MATCH (s:Strain)--(a:Country)
WHERE s.clade STARTS WITH {clade}
RETURN s.clade as clade, s.name, s.date, a.name
ORDER BY s.date
"""
graph.run(query, clade=clade).to_data_frame().head(100)

Unnamed: 0,clade,s.name,s.date,a.name
0,A3,Wuhan/HBCDC-HB-05/2020,2020-01-18,Mainland China
1,A3,Shandong/IVDC-SD-001/2020,2020-01-19,Mainland China
2,A1a,Hangzhou/ZJU-01/2020,2020-01-25,Mainland China
3,A2,China/Shanghai/SH0014,2020-01-28,Mainland China
4,A2,Germany/BavPat1/2020,2020-01-28,Germany
5,A1a,Italy/INMI1-cs/2020,2020-01-29,Italy
6,A1a,Italy/SPL1/2020,2020-01-29,Italy
7,A3,China/Shanghai/SH0022,2020-01-30,Mainland China
8,A3,China/Shanghai/SH0023,2020-01-30,Mainland China
9,A2,China/Shanghai/SH0086,2020-01-31,Mainland China


### Find persons who imported the virus from another location

In [17]:
query = """
MATCH (c:Admin1)<-[:LOCATED_IN]-(p:Person)-[:CARRIES]->(s:Strain)
WHERE c.name <> p.exposure_location
RETURN c.name as `state/province`, p.age as age, p.sex as sex, p.exposure_location as exposure_location, 
       s.name as strain, s.clade as clade
ORDER BY p.exposure_location
"""
graph.run(query).to_data_frame()

Unnamed: 0,state/province,age,sex,exposure_location,strain,clade
0,Kerala,,,China,India/1-31/2020,B
1,Kerala,,,China,India/1-27/2020,
2,Panama City,,,Comunitat Valenciana,Panama/328677/2020,A2a
3,British Columbia,,,Europe,Canada/BC_78548/2020,A1a
4,British Columbia,,,Grand Princess,Canada/BC_64686/2020,B1
5,Minnesota,,,Grand Princess,USA/MN3-MDH3/2020,B1
6,Minnesota,,,Grand Princess,USA/MN1-MDH1/2020,
7,British Columbia,,,Hong Kong,Canada/BC_35720/2020,
8,New South Wales,,,Hubei,Australia/NSW01/2020,B
9,Queensland,,,Hubei,Australia/QLD01/2020,B4


### Strains in Sydney

In [18]:
city = 'Sydney'

query = """
MATCH (c:City{name: {city}})<-[:LOCATED_IN]-(p:Person)-[:CARRIES]->(s:Strain)
RETURN c.name as city, s.name as strain, s.clade as clade, p.exposure_location, s.date as date
ORDER BY s.date
"""
graph.run(query, city=city).to_data_frame()

Unnamed: 0,city,strain,clade,p.exposure_location,date
0,Sydney,Australia/NSW02/2020,,New South Wales,2020-01-22
1,Sydney,Australia/NSW01/2020,B,Hubei,2020-01-24
2,Sydney,Australia/NSW03/2020,,New South Wales,2020-01-25
3,Sydney,Australia/NSW10/2020,,New South Wales,2020-02-28
4,Sydney,Australia/NSW09/2020,A3,New South Wales,2020-02-28
5,Sydney,Australia/NSW08/2020,,New South Wales,2020-02-28
6,Sydney,Australia/NSW05/2020,A3,Iran,2020-02-28
7,Sydney,Australia/NSW07/2020,A3,New South Wales,2020-02-29
8,Sydney,Australia/NSW06/2020,A3,Iran,2020-02-29
9,Sydney,Australia/NSW11/2020,A3,Iran,2020-03-02


### Stop Neo4j database when done

In [20]:
!"$NEO4J_HOME"/bin/neo4j stop

Stopping Neo4j.. stopped
