# Mapping COVID-19 with BigQuery

My colleague, Amir Hormati, pointed me to a Johns Hopkins dataset of COVID-19 confirmed cases. It’s a [nice little CSV file](https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv) and, of course, I wanted to explore the data in BigQuery. Follow along with me!


## Load the data into BigQuery

The file is small enough that we can download it directly with `wget`.

In [3]:
%%bash
rm -rf time_series_19-covid-Confirmed.csv
wget --quiet https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv

In [5]:
!ls *.csv

time_series_19-covid-Confirmed.csv


### Create schema file

In [29]:
import json
def fix_col_name(name):
    if name == 'Lat':
        return 'latitude'
    elif name == 'Long':
        return 'longitude'
    elif str(name[0]).isnumeric():
        # it's a date.
        return 'on_' + name
    else:
        return name

with open('time_series_19-covid-Confirmed.csv') as ifp:
    schema = []
    columns = ifp.readline().strip().split(',')
    for index, col in enumerate(columns):
        entry = {
            'mode' : 'nullable',
            'name' : fix_col_name(col.replace('/', '_')),
            'type' : 'STRING' if (index < 2) else ('FLOAT' if (index < 4) else 'INTEGER')
        }
        schema.append(entry)
    with open('schema.json', 'w') as ofp:
        ofp.write(json.dumps(schema, indent=2))
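The cell above splits the header on commas, which works for this file because no header name contains a comma. As a minimal sketch of the same logic in a reusable form, the version below parses the header with Python's `csv` module so quoted names would also survive. The `build_schema` helper and the `sample_header` string are assumptions for illustration; the sample is chosen to match the column names shown in `schema.json`.

```python
import csv
import io

def fix_col_name(name):
    """Turn a CSV header into a column name BigQuery will accept."""
    name = name.replace('/', '_')
    if name == 'Lat':
        return 'latitude'
    if name == 'Long':
        return 'longitude'
    if name[:1].isdigit():
        # Date columns such as 1_22_20 cannot start with a digit.
        return 'on_' + name
    return name

def build_schema(header_line):
    """Hypothetical helper: build the schema list from one header line."""
    columns = next(csv.reader(io.StringIO(header_line)))
    schema = []
    for index, col in enumerate(columns):
        if index < 2:
            col_type = 'STRING'    # province and country
        elif index < 4:
            col_type = 'FLOAT'     # latitude and longitude
        else:
            col_type = 'INTEGER'   # one count column per date
        schema.append({
            'name': fix_col_name(col.strip()),
            'mode': 'nullable',
            'type': col_type,
        })
    return schema

# Assumed sample header, shortened to two date columns.
sample_header = 'Province/State,Country/Region,Lat,Long,1/22/20,1/23/20'
schema = build_schema(sample_header)
print([entry['name'] for entry in schema])
```

Keeping the type rule in one place makes it easy to adjust if the file ever gains or loses a leading column.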

In [30]:
!head -30 schema.json

[
  {
    "name": "Province_State",
    "mode": "nullable",
    "type": "STRING"
  },
  {
    "name": "Country_Region",
    "mode": "nullable",
    "type": "STRING"
  },
  {
    "name": "latitude",
    "mode": "nullable",
    "type": "FLOAT"
  },
  {
    "name": "longitude",
    "mode": "nullable",
    "type": "FLOAT"
  },
  {
    "name": "on_1_22_20",
    "mode": "nullable",
    "type": "INTEGER"
  },
  {
    "name": "on_1_23_20",
    "mode": "nullable",
    "type": "INTEGER"


### Load the data into BigQuery

Create the dataset `advdata` if it doesn't already exist.

In [31]:
%%bash

bq_safe_mk() {
    dataset=$1
    exists=$(bq ls -d | grep -w "$dataset")
    if [ -n "$exists" ]; then
       echo "Not creating $dataset since it already exists"
    else
       echo "Creating $dataset"
       bq mk $dataset
    fi
}

bq_safe_mk advdata

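SCHEMA="--schema=schema.json --skip_leading_rows=1"

# LOC is not set in this cell, so bq falls back to its default location.
# To pin one, set it first, e.g. LOC="--location=US".
bq $LOC \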
   load --null_marker=NULL --replace \
   --source_format=CSV $SCHEMA \
   advdata.covid19 \
   time_series_19-covid-Confirmed.csv

Not creating advdata since it already exists


Upload complete.
Waiting on bqjob_r71701c1c310c22e1_00000170a2aec9a1_1 ... (1s) Current status: DONE   
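If you would rather drive the load from Python than from a `%%bash` cell, one option is to assemble the same `bq load` command as a list and hand it to `subprocess`. This is only a sketch: `build_bq_load_command` is a hypothetical helper, and it uses the same flags as the bash cell plus an optional `--location`.

```python
def build_bq_load_command(table, csv_path, schema_path, location=None):
    """Hypothetical helper: return the bq load command as an argument list."""
    cmd = ['bq']
    if location:
        # Global flags go before the subcommand.
        cmd.append('--location=' + location)
    cmd += [
        'load',
        '--null_marker=NULL',
        '--replace',
        '--source_format=CSV',
        '--schema=' + schema_path,
        '--skip_leading_rows=1',
        table,
        csv_path,
    ]
    return cmd

cmd = build_bq_load_command(
    'advdata.covid19',
    'time_series_19-covid-Confirmed.csv',
    'schema.json',
)
print(' '.join(cmd))

# To actually run it (needs the bq CLI and credentials):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems if a path ever contains spaces.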


### Sanity check with a query

In [32]:
%%bigquery

SELECT Province_State, on_1_22_20 as confirmed_cases
FROM advdata.covid19
ORDER BY confirmed_cases desc
LIMIT 5

|   | Province_State | confirmed_cases |
|---|----------------|-----------------|
| 0 | Hubei          | 444             |
| 1 | Guangdong      | 26              |
| 2 | Beijing        | 14              |
| 3 | Zhejiang       | 10              |
| 4 | Shanghai       | 9               |


### Plot map using folium

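Let's plot the number of confirmed cases. The query below reads `on_1_22_20`, the first day in the file; swap in the most recent date column to plot the current numbers.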

In [36]:
%pip install --quiet folium

Note: you may need to restart the kernel to use updated packages.


In [4]:
%%bigquery df

SELECT Province_State, latitude, longitude, on_1_22_20 as confirmed_cases
FROM advdata.covid19
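Because every date is its own column, getting the latest numbers means knowing the name of the newest `on_M_D_YY` column. Here is a minimal sketch of one way to find it from a list of column names. The `latest_date_column` helper and the sample names are assumptions; the names are parsed as dates because a plain string sort would put `on_2_9_20` after `on_2_10_20`.

```python
from datetime import datetime

def latest_date_column(column_names):
    """Hypothetical helper: return the newest on_M_D_YY column name."""
    dated = []
    for name in column_names:
        if not name.startswith('on_'):
            continue
        try:
            day = datetime.strptime(name[3:], '%m_%d_%y')
        except ValueError:
            # Starts with on_ but is not a date column; skip it.
            continue
        dated.append((day, name))
    if not dated:
        raise ValueError('no on_M_D_YY columns found')
    return max(dated)[1]

# Assumed sample of column names in the style of schema.json.
columns = ['Province_State', 'latitude', 'longitude',
           'on_1_22_20', 'on_2_9_20', 'on_2_10_20']
newest = latest_date_column(columns)

query = (
    'SELECT Province_State, latitude, longitude, '
    '{} AS confirmed_cases FROM advdata.covid19'.format(newest)
)
print(query)
```

The resulting string could then be run with whichever BigQuery client you prefer instead of hard-coding the date in the `%%bigquery` cell.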

In [5]:
df

|     | Province_State                       | latitude | longitude | confirmed_cases |
|-----|--------------------------------------|----------|-----------|-----------------|
| 0   |                                      | 32.0000  | 53.0000   | 0               |
| 1   | Omaha, NE (From Diamond Princess)    | 41.2545  | -95.9758  | 0               |
| 2   | Travis, CA (From Diamond Princess)   | 38.2721  | -121.9399 | 0               |
| 3   | From Diamond Princess                | 35.4437  | 139.6380  | 0               |
| 4   | Lackland, TX (From Diamond Princess) | 29.3829  | -98.6134  | 0               |
| ... | ...                                  | ...      | ...       | ...             |
| 136 | Diamond Princess cruise ship         | 35.4437  | 139.6380  | 0               |
| 137 | Hunan                                | 27.6104  | 111.7088  | 4               |
| 138 | Guangxi                              | 23.8298  | 108.7881  | 2               |
| 139 | Shaanxi                              | 35.1917  | 108.8701  | 0               |


In [6]:
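import folium
map_pts = folium.Map(location=[47, -122], zoom_start=8)
for idx, row in df.iterrows():
    popup = '{} {}'.format(row['Province_State'], row['confirmed_cases'])
    # A plain Marker has no radius, so use CircleMarker, whose radius
    # (in pixels) tracks the case count. Scale it down if circles get too large.
    folium.CircleMarker(
        location=[row['latitude'], row['longitude']],
        popup=popup,
        radius=float(row['confirmed_cases']),
    ).add_to(map_pts)
map_pts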
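The map above opens on a fixed center and zoom, so most of the points start off-screen. One option is to compute a bounding box from the coordinates and let folium fit the view to it with `fit_bounds`. This is a sketch: `bounding_box` is a hypothetical helper, and the sample points are three of the rows shown earlier.

```python
import math

def bounding_box(points):
    """Hypothetical helper: return [[south, west], [north, east]] for (lat, lon) pairs."""
    valid = [
        (lat, lon) for lat, lon in points
        if not (math.isnan(lat) or math.isnan(lon))  # drop missing coordinates
    ]
    if not valid:
        raise ValueError('no valid coordinates')
    lats = [lat for lat, _ in valid]
    lons = [lon for _, lon in valid]
    return [[min(lats), min(lons)], [max(lats), max(lons)]]

# Hunan, Guangxi and Shaanxi from the table above.
sample = [(27.6104, 111.7088), (23.8298, 108.7881), (35.1917, 108.8701)]
bounds = bounding_box(sample)
print(bounds)

# In the notebook, after adding the markers:
#   map_pts.fit_bounds(bounding_box(zip(df['latitude'], df['longitude'])))
```

Fitting to the data means the starting view does not need retuning as new regions appear in the file.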

Copyright 2020 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License