# Genericizing the Covid-19 Query

In a previous notebook, we used a query that compared two specific weeks in April against each other to find week-to-week increases in the case rate. What if we wanted to make that query generic, so we could run it at any time and get an up-to-date result? Here's what that query looks like.

# First Make Sure the NYT Data is Up To Date
This will pull down the latest CSV if there's no data from yesterday in the local Postgres table.

In [None]:
# See the file my_nyt_update.py in the jupyter_notebooks directory
from my_nyt_update import update_nyt_if_needed

update_nyt_if_needed()
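
The contents of `my_nyt_update.py` aren't reproduced here. The decision described above might reduce to something like the following rough sketch. It assumes the helper looks up the most recent `date` already loaded into the local table and compares it with yesterday. `needs_refresh` is a hypothetical name, not a function in `my_nyt_update`.

```python
from datetime import date, timedelta
from typing import Optional


def needs_refresh(latest_loaded: Optional[date], today: date) -> bool:
    """Hypothetical sketch of the freshness check; not the real my_nyt_update code.

    latest_loaded is assumed to be the newest date already in the local table
    (None if the table is empty). A new CSV is needed when that date is
    missing or earlier than yesterday.
    """
    yesterday = today - timedelta(days=1)
    return latest_loaded is None or latest_loaded < yesterday


# Table ends on 2020-04-28, so on 2020-05-01 a download would be triggered
print(needs_refresh(date(2020, 4, 28), date(2020, 5, 1)))  # True
# Table already has yesterday's data, so nothing to do
print(needs_refresh(date(2020, 4, 30), date(2020, 5, 1)))  # False
```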

## Using Python for the Date Math

You could do the date math in SQL as well, but I think doing it in Python is a little more readable.

In [None]:
from datetime import datetime, timedelta
import pytz

# "last" week is 7 days ending yesterday, "prev" week is the 7 days before that
tz = pytz.timezone("Etc/UTC")
# datetime.today() is the local date and time; localize() only attaches the UTC label
# without converting, so the calendar date used below is still the local date
todays_date = tz.localize(datetime.today())
prev_week_start = todays_date - timedelta(days=14)
prev_week_end = todays_date - timedelta(days=8)
last_week_start = todays_date - timedelta(days=7)
last_week_end = todays_date - timedelta(days=1)
p = "%Y-%m-%d"

print("Today is %s.  We will compare %s through %s inclusive to %s through %s." %
      (todays_date.strftime(p), prev_week_start.strftime(p), prev_week_end.strftime(p),
       last_week_start.strftime(p), last_week_end.strftime(p)))
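
The same offsets can be run against a fixed, arbitrary date as a quick check. This confirms that the two windows are each 7 days long, sit back to back, and end yesterday. Plain `date` objects keep the result independent of when the cell runs. `comparison_windows` is a hypothetical helper, not something defined elsewhere in these notebooks.

```python
from datetime import date, timedelta


def comparison_windows(today: date):
    """Return (prev_start, prev_end, last_start, last_end), all inclusive.

    Same offsets as the cell above: "last" week is the 7 days ending
    yesterday, "prev" week is the 7 days immediately before that.
    """
    return (
        today - timedelta(days=14),
        today - timedelta(days=8),
        today - timedelta(days=7),
        today - timedelta(days=1),
    )


prev_start, prev_end, last_start, last_end = comparison_windows(date(2020, 5, 1))
print(prev_start, prev_end, last_start, last_end)
# 2020-04-17 2020-04-23 2020-04-24 2020-04-30
```

Because SQL's `BETWEEN` is inclusive at both ends, each window covers 7 dates: `(prev_end - prev_start).days + 1 == 7`.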

# Compare the Last 2 Weeks for Counties Where the Case Rate has Doubled

Building on the query from Notebook 5, let's take a more generic approach that we can run at any time to do the following:
- Sum the cases over the last 7 days for each region (county)
- Also sum the 7 days before that
- Compare the two
- Find regions where the case rate has increased more than 100%
- Limit the search to regions with more than 25 cases last week
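
Here is the same sequence of steps as a small pure-Python sketch over made-up rows, which may make the SQL below easier to follow. It assumes input rows of `(fips, date, cases_since_prev_day)` and mirrors the query's inclusive date windows, its `'None'` FIPS filter, and its division-by-zero guard. `find_doubled` and the toy FIPS codes are hypothetical.

```python
from collections import defaultdict
from datetime import date


def find_doubled(rows, prev_window, last_window, min_cases=25, min_pct=100):
    """Pure-Python sketch of the SQL below, on (fips, date, new_cases) tuples.

    prev_window and last_window are inclusive (start, end) date pairs,
    like SQL's BETWEEN. Rows whose fips is 'None' are skipped, as in the query.
    """
    prev = defaultdict(float)
    last = defaultdict(float)
    for fips, day, new_cases in rows:
        if fips == 'None':
            continue
        if prev_window[0] <= day <= prev_window[1]:
            prev[fips] += new_cases
        if last_window[0] <= day <= last_window[1]:
            last[fips] += new_cases

    results = []
    for fips in set(prev) | set(last):
        week1, week2 = prev[fips], last[fips]
        # Same guard as the SQL CASE: no cases in the prev week -> 0% change
        pct = 0.0 if week1 == 0 else (week2 - week1) / week1 * 100
        if week2 > min_cases and pct > min_pct:
            results.append({"fips": fips, "prev_week_cases": week1,
                            "last_week_cases": week2, "case_change": week2 - week1,
                            "percent_change": pct})
    return sorted(results, key=lambda r: r["percent_change"], reverse=True)


# Made-up toy rows: (county, day, new cases)
rows = [("00001", date(2020, 4, 20), 10), ("00001", date(2020, 4, 27), 40),  # 10 -> 40: +300%
        ("00002", date(2020, 4, 20), 20), ("00002", date(2020, 4, 27), 30),  # 20 -> 30: +50%
        ("00003", date(2020, 4, 27), 50),                                    # 0 -> 50: guard gives 0%
        ("00004", date(2020, 4, 20), 5), ("00004", date(2020, 4, 27), 20)]   # 5 -> 20: too few cases
windows = ((date(2020, 4, 17), date(2020, 4, 23)), (date(2020, 4, 24), date(2020, 4, 30)))
for r in find_doubled(rows, *windows):
    print(r["fips"], r["percent_change"])
# 00001 300.0
```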

We'll put the result into a new table called ```cases_change_by_fips_yesterday``` so we can map it later.

In [None]:
from my_connect import my_connect
import psycopg2.sql as sql

connection = my_connect()
cursor = connection.cursor()

# This selects the cases for the "prev" week in each FIPS region into a temp table
q1 = sql.SQL("""
DROP TABLE IF EXISTS temp_table2;
SELECT fips, 
SUM(CASE WHEN date BETWEEN {} AND {} AND fips <> 'None' THEN cases_since_prev_day ELSE 0.00 END) AS week1
INTO TEMP TABLE temp_table2
FROM nyt_us_covid19
GROUP BY (fips);
""")

# This selects the cases for the "last" week (most recent 7 days ending yesterday) and computes the difference

q2 = sql.SQL("""

DROP TABLE IF EXISTS cases_change_by_fips_yesterday;

SELECT nyt_us_covid19.fips, week1 as prev_week_cases,

SUM(CASE WHEN date BETWEEN {} AND {} AND nyt_us_covid19.fips <> 'None' 
    THEN cases_since_prev_day ELSE 0.00 END) as last_week_cases,
    
SUM(CASE WHEN date BETWEEN {} AND {} AND nyt_us_covid19.fips <> 'None' 
    THEN cases_since_prev_day ELSE 0.00 END) - week1 AS case_change,
    
-- This prevents division by zero when there are no new cases
(CASE WHEN week1 = 0 THEN 0 ELSE 
  (SUM(CASE WHEN date BETWEEN {} AND {} AND nyt_us_covid19.fips <> 'None' 
    THEN cases_since_prev_day ELSE 0.00 END) - week1) / week1
END) * 100 AS percent_change,

fips.area_name, fips.state, CONCAT(fips.area_name, ', ', fips.state) AS county_state

INTO cases_change_by_fips_yesterday
FROM nyt_us_covid19 
JOIN temp_table2 ON (temp_table2.fips = nyt_us_covid19.fips)
JOIN fips ON (fips.fipstxt = nyt_us_covid19.fips)
GROUP BY (nyt_us_covid19.fips, week1, fips.area_name, fips.state)

HAVING

-- only select rows where last_week_cases > 25
(SUM(CASE WHEN date BETWEEN {} AND {} AND nyt_us_covid19.fips <> 'None' 
    THEN cases_since_prev_day ELSE 0.00 END) > 25)
    
AND

-- only select rows where percent_change > 100
((CASE WHEN week1 = 0 THEN 0 ELSE 
  (SUM(CASE WHEN date BETWEEN {} AND {} AND nyt_us_covid19.fips <> 'None' 
    THEN cases_since_prev_day ELSE 0.00 END) - week1) / week1
END) * 100 > 100)
    
ORDER BY percent_change desc;
""")

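# Every {} in q2 is filled with the "last" week's start (y) and end (z) dates, in that
# order, five times. sql.Literal quotes the date strings so they're safe to embed.
y = sql.Literal(last_week_start.strftime(p))
z = sql.Literal(last_week_end.strftime(p))
q1_str = q1.format(sql.Literal(prev_week_start.strftime(p)), sql.Literal(prev_week_end.strftime(p)))
q2_str = q2.format(y, z, y, z, y, z, y, z, y, z)

# q2 reads temp_table2, so both statements must run in the same session (same connection)
cursor.execute(q1_str)
connection.commit()
cursor.execute(q2_str)
connection.commit()
cursor.close()
connection.close()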
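
Written out, the `percent_change` column and the two `HAVING` conditions amount to the following. Here $w_1$ is the "prev" week's summed cases (`week1`), $w_2$ is the "last" week's summed cases (`last_week_cases`), and $p$ is `percent_change`:

$$
p =
\begin{cases}
0, & \text{if } w_1 = 0,\\
100 \cdot \dfrac{w_2 - w_1}{w_1}, & \text{otherwise,}
\end{cases}
\qquad
\text{keep the county} \iff w_2 > 25 \ \text{and}\ p > 100.
$$

For $w_1 > 0$, the condition $p > 100$ is the same as $w_2 > 2w_1$, i.e. the weekly count more than doubled. One consequence of the guard is that a county whose "prev" week total was 0 gets $p = 0$. It is therefore filtered out, no matter how many cases it reported last week.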

In [None]:
import pandas

connection = my_connect()
q = """
SELECT * FROM cases_change_by_fips_yesterday ORDER BY percent_change desc limit 500;
"""
df = pandas.read_sql_query(q, connection)
connection.close()
pandas.set_option('display.max_rows', 500)
pandas.set_option('display.width', 150)
print(df.head(500))

Here's an example of an ArcGIS map in Power BI:

<img src="images/map-cases-doubled.png">

# How to Create the Map in Power BI

- Create a new file
- Get Data

<img src="images/filled-map/10-get-data.png">

- Select PostgreSQL and click Connect

<img src="images/filled-map/20-postgres.png">

- Enter 'localhost' as the server and 'sales' as the database, then select the DirectQuery radio button:

<img src="images/filled-map/30-postgres-settings.png">

- Select the 'cases_change_by_fips_yesterday' table and click Load

<img src="images/filled-map/40-table-selection.png">

- Under Visualizations, select 'ArcGIS Maps for Power BI'

<img src="images/filled-map/50-arcgis.png">

- For ArcGIS Maps to place the regions correctly, you need to use the county and state name; the FIPS code won't work. For this purpose, the query above generates a column called ```county_state```:

<img src="images/filled-map/60-county-state.png">

- Make sure the Fields tab/area is selected (highlighted box on the left below)
- Drag the ```county_state``` field to Location
- Drag the ```fips``` field to Tooltips
- Drag the ```percent_change``` field to Color

<img src="images/filled-map/70-drag-fields.png">

- Here's an example of the resultant map:

<img src="images/filled-map/80-map-example.png">



*Contents © Copyright 2020 HP Development Company, L.P. SPDX-License-Identifier: MIT*