## Covid data for the counties I care about

The Washington Post has convenient data by state. I care about Washington DC, where I live, and about how certain other locations are doing. The state-level data is not fine-grained enough for what I want to see. MSA- and county-level covid data are available online, but they are overwhelming and not easily filterable to what I want. I'm creating this tool to provide information at the county level.

## Plan:

Turn these into GitHub issues.

- Use plotly for interactive visualizations 
- Serve the website via FastAPI or put it in Streamlit.
- Use GitHub Actions to get data from the NYT repo into a forked repo. I forked, updated, and released a GitHub action.
- Use Prefect to run a daily script that pulls the data from the NYT website and processes it.
- Use Great Expectations for data quality checking.
- Use PyTest for code checking. 
- May use DVC to version data.
- Could push the data to a database for fun/speed.

- Make an app that allows other users to choose which counties they want to include. Have to make some design decisions about how to show users that information. 

Imports and config

In [1]:
import pandas as pd
import plotly.express as px

pd.options.display.max_rows = 100


Read in data

In [2]:
df_2022 = pd.read_csv('https://raw.githubusercontent.com/nytimes/covid-19-data/master/rolling-averages/us-counties-2022.csv', index_col='date')

df_2022


date,geoid,county,state,cases,cases_avg,cases_avg_per_100k,deaths,deaths_avg,deaths_avg_per_100k
2022-01-01,USA-72999,Unknown,Puerto Rico,0,328.14,,0,0.00,
2022-01-01,USA-72153,Yauco,Puerto Rico,0,66.50,196.40,0,0.00,0.00
2022-01-01,USA-72151,Yabucoa,Puerto Rico,0,63.13,196.30,0,0.00,0.00
2022-01-01,USA-72149,Villalba,Puerto Rico,0,47.50,221.18,0,0.00,0.00
2022-01-01,USA-72147,Vieques,Puerto Rico,0,7.63,91.16,0,0.00,0.00
...,...,...,...,...,...,...,...,...,...
2022-01-31,USA-69110,Saipan,Northern Mariana Islands,0,0.00,0.00,0,0.00,0.00
2022-01-31,USA-69100,Rota,Northern Mariana Islands,0,0.00,0.00,0,0.00,0.00
2022-01-31,USA-78030,St. Thomas,Virgin Islands,27,17.86,34.58,3,0.43,0.83
2022-01-31,USA-78020,St. John,Virgin Islands,8,2.71,65.09,0,0.00,0.00


In [3]:
df_2022.info()


<class 'pandas.core.frame.DataFrame'>
Index: 100920 entries, 2022-01-01 to 2022-01-31
Data columns (total 9 columns):
 #   Column               Non-Null Count   Dtype  
---  ------               --------------   -----  
 0   geoid                100920 non-null  object 
 1   county               100920 non-null  object 
 2   state                100920 non-null  object 
 3   cases                100920 non-null  int64  
 4   cases_avg            100920 non-null  float64
 5   cases_avg_per_100k   99863 non-null   float64
 6   deaths               100920 non-null  int64  
 7   deaths_avg           100920 non-null  float64
 8   deaths_avg_per_100k  99863 non-null   float64
dtypes: float64(4), int64(2), object(3)
memory usage: 7.7+ MB


Finding counties that could be tricky to match spelling/format.

In [4]:
df_2022[df_2022["county"].str.startswith("Alexandria")].head(2)


date,geoid,county,state,cases,cases_avg,cases_avg_per_100k,deaths,deaths_avg,deaths_avg_per_100k
2022-01-01,USA-51510,Alexandria city,Virginia,0,281.14,176.34,0,0.38,0.24
2022-01-02,USA-51510,Alexandria city,Virginia,0,281.14,176.34,0,0.38,0.24


In [5]:
df_2022[df_2022["state"].str.startswith("District")].head(2)


date,geoid,county,state,cases,cases_avg,cases_avg_per_100k,deaths,deaths_avg,deaths_avg_per_100k
2022-01-01,USA-11001,District of Columbia,District of Columbia,0,2103.0,297.98,0,0.4,0.06
2022-01-02,USA-11001,District of Columbia,District of Columbia,0,2103.0,297.98,0,0.4,0.06


In [6]:
df_2022[df_2022["state"].str.contains("York")].head(2)


date,geoid,county,state,cases,cases_avg,cases_avg_per_100k,deaths,deaths_avg,deaths_avg_per_100k
2022-01-01,USA-36998,New York City,New York,45341,34646.38,415.58,20,27.89,0.33
2022-01-01,USA-36123,Yates,New York,28,13.25,53.19,0,0.14,0.57


Filter to counties of interest

In [7]:
counties = [
    "District of Columbia",
    "Wood",
    "Putnam",
    "Montgomery",
    "Prince George's",
    "Arlington",
    "Alexandria city",
    "New York City",  # README at NYT mentions some NE are city, not county
    "Allegheny",
    "Cook",
    "Baltimore",
    "Franklin",
    "Clermont",
    "Somerset",
    "Philadelphia",
    "Denver",
    "Boulder",
    "San Francisco",
    "Los Angeles",
    "Pima",
    "Manatee",
    "Fairfax"]


In [8]:
cols = ["county", "state", "geoid", "cases_avg_per_100k"]


See each state/county once.

In [9]:
# df_2022_smaller is only defined in the next cell, so filter inline here.
df_2022.loc[df_2022["county"].isin(counties), cols].drop_duplicates(subset=["county", "state"])


In [None]:

df_2022_smaller = df_2022.loc[df_2022["county"].isin(counties), cols]
df_2022_smaller


date,county,state,geoid,cases_avg_per_100k
2022-01-01,Wood,Wisconsin,USA-55141,82.19
2022-01-01,Wood,West Virginia,USA-54107,55.42
2022-01-01,Putnam,West Virginia,USA-54079,73.14
2022-01-01,Franklin,Washington,USA-53021,32.26
2022-01-01,Montgomery,Virginia,USA-51121,46.25
...,...,...,...,...
2022-01-30,Montgomery,Arkansas,USA-05097,104.93
2022-01-30,Franklin,Arkansas,USA-05047,152.41
2022-01-30,Pima,Arizona,USA-04019,187.36
2022-01-30,Montgomery,Alabama,USA-01101,199.57
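
The output above shows that filtering on county name alone also pulls in same-named counties in other states (Wood in Wisconsin and West Virginia, for example). Before switching to FIPS codes, one option is to match on (county, state) pairs. The following is a minimal sketch on toy rows; the `wanted` list and the Ohio value are made up for illustration.

```python
import pandas as pd

# Toy rows standing in for df_2022; the Ohio value is invented.
df = pd.DataFrame(
    {
        "county": ["Wood", "Wood", "Wood", "Pima"],
        "state": ["Wisconsin", "West Virginia", "Ohio", "Arizona"],
        "cases_avg_per_100k": [82.19, 55.42, 60.0, 187.36],
    }
)

# Hypothetical list of (county, state) pairs of interest.
wanted = [("Wood", "Ohio"), ("Pima", "Arizona")]

# Build a two-level key per row and keep rows whose key is in the wanted list.
pairs = pd.MultiIndex.from_frame(df[["county", "state"]])
df_pairs = df[pairs.isin(wanted)]
print(df_pairs)
```

Only the Ohio "Wood" row survives, while a name-only filter would keep all three.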


Get 2021 data

In [None]:
df_2021 = pd.read_csv('https://raw.githubusercontent.com/nytimes/covid-19-data/master/rolling-averages/us-counties-2021.csv', index_col='date')
df_2021.shape

(1189116, 9)

In [None]:
df_2021_smaller = df_2021.loc[df_2021["county"].isin(counties), cols]
df_2021_smaller

date,county,state,geoid,cases_avg_per_100k
2021-01-01,Wood,Wisconsin,USA-55141,48.73
2021-01-01,Wood,West Virginia,USA-54107,79.02
2021-01-01,Putnam,West Virginia,USA-54079,58.71
2021-01-01,Franklin,Washington,USA-53021,51.01
2021-01-01,Montgomery,Virginia,USA-51121,32.33
...,...,...,...,...
2021-12-31,Montgomery,Arkansas,USA-05097,12.72
2021-12-31,Franklin,Arkansas,USA-05047,20.97
2021-12-31,Pima,Arizona,USA-04019,47.44
2021-12-31,Montgomery,Alabama,USA-01101,148.42


In [None]:
df_2020 = pd.read_csv('https://raw.githubusercontent.com/nytimes/covid-19-data/master/rolling-averages/us-counties-2020.csv', index_col='date')
df_2020.shape

(888907, 9)

In [None]:
df_2020_smaller = df_2020.loc[df_2020["county"].isin(counties), cols]
df_2020_smaller

date,county,state,geoid,cases_avg_per_100k
2020-01-24,Cook,Illinois,USA-17031,0.00
2020-01-25,Cook,Illinois,USA-17031,0.00
2020-01-26,Cook,Illinois,USA-17031,0.00
2020-01-26,Los Angeles,California,USA-06037,0.00
2020-01-27,Cook,Illinois,USA-17031,0.00
...,...,...,...,...
2020-12-31,Montgomery,Arkansas,USA-05097,46.10
2020-12-31,Franklin,Arkansas,USA-05047,47.58
2020-12-31,Pima,Arizona,USA-04019,82.53
2020-12-31,Montgomery,Alabama,USA-01101,50.15


In [None]:
df_combo = pd.concat([df_2020_smaller, df_2021_smaller, df_2022_smaller])
df_combo

date,county,state,geoid,cases_avg_per_100k
2020-01-24,Cook,Illinois,USA-17031,0.00
2020-01-25,Cook,Illinois,USA-17031,0.00
2020-01-26,Cook,Illinois,USA-17031,0.00
2020-01-26,Los Angeles,California,USA-06037,0.00
2020-01-27,Cook,Illinois,USA-17031,0.00
...,...,...,...,...
2022-01-30,Montgomery,Arkansas,USA-05097,104.93
2022-01-30,Franklin,Arkansas,USA-05047,152.41
2022-01-30,Pima,Arizona,USA-04019,187.36
2022-01-30,Montgomery,Alabama,USA-01101,199.57
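
The read, filter, and concatenate steps repeat for each year, so they could be folded into one loop. This is a sketch, not the notebook's actual code: `load_years` and its `source` parameter are hypothetical names, and unlike the cells above it parses the `date` index into real datetimes. The demo uses two in-memory rows copied from the outputs above so it runs without a network call.

```python
import io
import pandas as pd

BASE_URL = (
    "https://raw.githubusercontent.com/nytimes/covid-19-data/"
    "master/rolling-averages/us-counties-{year}.csv"
)

def load_years(years, counties, cols, source=BASE_URL.format):
    """Read one CSV per year, filter each, and stack the results.

    `source` is called as source(year=...) and must return anything
    pd.read_csv accepts (URL, path, or file-like object).
    """
    frames = []
    for year in years:
        df = pd.read_csv(source(year=year), index_col="date", parse_dates=True)
        frames.append(df.loc[df["county"].isin(counties), cols])
    return pd.concat(frames).sort_index()

# Offline demo: in-memory CSV text keyed by year instead of the NYT URLs.
header = "date,county,state,geoid,cases_avg_per_100k\n"
fake = {
    2020: header + "2020-12-31,Pima,Arizona,USA-04019,82.53\n",
    2021: header + "2021-12-31,Pima,Arizona,USA-04019,47.44\n",
}
demo = load_years(
    [2020, 2021],
    counties=["Pima"],
    cols=["county", "state", "geoid", "cases_avg_per_100k"],
    source=lambda year: io.StringIO(fake[year]),
)
print(demo)
```

With the default `source`, `load_years([2020, 2021, 2022], counties, cols)` would fetch the three NYT files used above.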


Convert geoid to FIPS code for plotting

In [None]:
df_combo["fips"] = df_combo["geoid"].str[-5:]
df_combo


date,county,state,geoid,cases_avg_per_100k,fips
2020-01-24,Cook,Illinois,USA-17031,0.00,17031
2020-01-25,Cook,Illinois,USA-17031,0.00,17031
2020-01-26,Cook,Illinois,USA-17031,0.00,17031
2020-01-26,Los Angeles,California,USA-06037,0.00,06037
2020-01-27,Cook,Illinois,USA-17031,0.00,17031
...,...,...,...,...,...
2022-01-30,Montgomery,Arkansas,USA-05097,104.93,05097
2022-01-30,Franklin,Arkansas,USA-05047,152.41,05047
2022-01-30,Pima,Arizona,USA-04019,187.36,04019
2022-01-30,Montgomery,Alabama,USA-01101,199.57,01101


In [None]:
df_2022.info()

<class 'pandas.core.frame.DataFrame'>
Index: 97671 entries, 2022-01-01 to 2022-01-30
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   geoid                97671 non-null  object 
 1   county               97671 non-null  object 
 2   state                97671 non-null  object 
 3   cases                97671 non-null  int64  
 4   cases_avg            97671 non-null  float64
 5   cases_avg_per_100k   96641 non-null  float64
 6   deaths               97671 non-null  int64  
 7   deaths_avg           97671 non-null  float64
 8   deaths_avg_per_100k  96641 non-null  float64
dtypes: float64(4), int64(2), object(3)
memory usage: 7.5+ MB


Filter to fips codes of counties I want. 

If this ever becomes an app, I'll change it so users choose a state and then a county from drop-downs.
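
For the drop-down idea, the app would need a state-to-county lookup. A rough sketch of building one from the NYT columns follows; `county_options` is a hypothetical helper and the rows are a toy subset.

```python
import pandas as pd

# Toy rows with the same columns as the NYT data.
df = pd.DataFrame(
    {
        "state": ["Ohio", "Ohio", "Maryland", "Ohio"],
        "county": ["Wood", "Putnam", "Montgomery", "Montgomery"],
        "geoid": ["USA-39173", "USA-39137", "USA-24031", "USA-39113"],
    }
)

def county_options(frame):
    """Map state -> {county name -> 5-digit FIPS} for two dependent drop-downs."""
    unique = frame.drop_duplicates(subset=["state", "county"])
    options = {}
    for state, group in unique.groupby("state"):
        options[state] = dict(zip(group["county"], group["geoid"].str[-5:]))
    return options

options = county_options(df)
print(sorted(options))          # choices for the state drop-down
print(sorted(options["Ohio"]))  # choices for the county drop-down once Ohio is picked
```

Selecting by state first also resolves same-named counties, since the county name only has to be unique within a state.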


In [None]:
fips_counties = [
    "11001",
    "24033",
    "24031",
    "17031",
    "39173",
    "39137",
    "39113",
    "39049",
    "51013",
    "42101",
    "42003",
    "39025",
    "08031",
    "08013",
    "04019",
    "24005",
    "06037",
    "06075",
    "36998",
    # "36061",
    "12081",
    "51510",
    "51059",
    "55083"
]

cols = ["county", "state", "fips", "cases_avg_per_100k"]

df_combo_counties = df_combo.loc[df_combo["fips"].isin(fips_counties), cols]
df_combo_counties


date,county,state,fips,cases_avg_per_100k
2020-01-24,Cook,Illinois,17031,0.00
2020-01-25,Cook,Illinois,17031,0.00
2020-01-26,Cook,Illinois,17031,0.00
2020-01-26,Los Angeles,California,06037,0.00
2020-01-27,Cook,Illinois,17031,0.00
...,...,...,...,...
2022-01-30,Denver,Colorado,08031,84.17
2022-01-30,Boulder,Colorado,08013,129.72
2022-01-30,San Francisco,California,06075,156.79
2022-01-30,Los Angeles,California,06037,220.07


In [None]:
px.line(
    df_combo_counties, x=df_combo_counties.index, y="cases_avg_per_100k", color="county"
)


Montgomery is kind of a mess
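
One likely cause: the FIPS list includes Montgomery County in both Maryland (24031) and Ohio (39113), and `color="county"` groups them under a single name. A possible fix is to color by a combined "county, state" label. The sketch below uses toy rows (the Ohio value is invented) and a hypothetical `label` column.

```python
import pandas as pd

# Toy rows shaped like df_combo_counties; the Ohio value is made up.
df = pd.DataFrame(
    {
        "county": ["Montgomery", "Montgomery", "Fairfax"],
        "state": ["Maryland", "Ohio", "Virginia"],
        "fips": ["24031", "39113", "51059"],
        "cases_avg_per_100k": [42.43, 50.0, 74.86],
    },
    index=pd.Index(["2022-01-30"] * 3, name="date"),
)

# One label per county/state pair, so same-named counties stay separate.
labeled = df.assign(label=df["county"] + ", " + df["state"])
print(labeled["label"].tolist())

# The plot call would then become:
# px.line(labeled, x=labeled.index, y="cases_avg_per_100k", color="label")
```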

7-day rolling average of cases as of yesterday's data

TODO: Include older data.

Read historic data and concatenate DataFrames.

## Map

Most recent 7 day moving average.

Scatter geo. 

Future direction: could make an animation over time. Could do choropleth too. Could show the DataFrame for just the most recent day in an app, too.

In [None]:
most_recent_date = df_combo_counties.index.max()

In [None]:
int('012'.zfill(5))

12

In [None]:
df_newest = df_combo_counties[df_combo_counties.index == most_recent_date].sort_values(
    by="cases_avg_per_100k"
)
df_newest

# could add % change from week earlier


date,county,state,fips,cases_avg_per_100k
2022-01-30,Baltimore,Maryland,24005,29.82
2022-01-30,Prince George's,Maryland,24033,31.0
2022-01-30,Montgomery,Maryland,24031,42.43
2022-01-30,Philadelphia,Pennsylvania,42101,50.77
2022-01-30,District of Columbia,District of Columbia,11001,52.97
2022-01-30,Fairfax,Virginia,51059,74.86
2022-01-30,New York City,New York,36998,75.46
2022-01-30,Putnam,Ohio,39137,75.94
2022-01-30,Franklin,Ohio,39049,79.72
2022-01-30,Arlington,Virginia,51013,84.14
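
The "% change from week earlier" idea could be sketched as follows. This assumes exactly one row per county per day with no gaps, so that shifting by seven rows within a county lands on the same weekday one week back. `add_weekly_change` and `pct_change_7d` are hypothetical names, and the data is synthetic.

```python
import pandas as pd

# Synthetic two weeks for one county: 200, 190, ..., 70.
dates = pd.date_range("2022-01-17", "2022-01-30", freq="D")
df = pd.DataFrame(
    {
        "fips": ["11001"] * len(dates),
        "cases_avg_per_100k": [200.0 - 10 * i for i in range(len(dates))],
    },
    index=dates,
)

def add_weekly_change(frame, value_col="cases_avg_per_100k", days=7):
    """Add the percent change versus the value `days` rows earlier, per county."""
    out = frame.sort_index().copy()
    week_ago = out.groupby("fips")[value_col].shift(days)
    out["pct_change_7d"] = (out[value_col] / week_ago - 1) * 100
    return out

result = add_weekly_change(df)
print(result.tail(3))
```

The first seven rows per county have no earlier value and come out as NaN; a week-ago value of zero would produce an infinite or undefined change and may need special handling.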


In [None]:
# Need a shape file or the lat/lon for each of the counties.
# Harder to find than expected. Found a mapping here: https://simplemaps.com/data/us-counties.


In [None]:
df_mapping = pd.read_csv('uscounties.csv')
df_mapping

Unnamed: 0,county,county_ascii,county_full,county_fips,state_id,state_name,lat,lng,population
0,Los Angeles,Los Angeles,Los Angeles County,6037,CA,California,34.3207,-118.2248,10081570
1,Cook,Cook,Cook County,17031,IL,Illinois,41.8401,-87.8168,5198275
2,Harris,Harris,Harris County,48201,TX,Texas,29.8577,-95.3936,4646630
3,Maricopa,Maricopa,Maricopa County,4013,AZ,Arizona,33.3490,-112.4915,4328810
4,San Diego,San Diego,San Diego County,6073,CA,California,33.0341,-116.7353,3316073
...,...,...,...,...,...,...,...,...,...
3137,Arthur,Arthur,Arthur County,31005,NE,Nebraska,41.5689,-101.6958,427
3138,McPherson,McPherson,McPherson County,31117,NE,Nebraska,41.5682,-101.0605,395
3139,King,King,King County,48269,TX,Texas,33.6166,-100.2558,237
3140,Loving,Loving,Loving County,48301,TX,Texas,31.8493,-103.5799,98


The mapping file stores county FIPS codes as integers, so codes that begin with 0 lose the leading zero. Left-pad them to five characters.

In [None]:
df_mapping['county_fips_str'] = df_mapping['county_fips'].astype(str).str.zfill(5)
df_mapping

Unnamed: 0,county,county_ascii,county_full,county_fips,state_id,state_name,lat,lng,population,county_fips_str
0,Los Angeles,Los Angeles,Los Angeles County,6037,CA,California,34.3207,-118.2248,10081570,06037
1,Cook,Cook,Cook County,17031,IL,Illinois,41.8401,-87.8168,5198275,17031
2,Harris,Harris,Harris County,48201,TX,Texas,29.8577,-95.3936,4646630,48201
3,Maricopa,Maricopa,Maricopa County,4013,AZ,Arizona,33.3490,-112.4915,4328810,04013
4,San Diego,San Diego,San Diego County,6073,CA,California,33.0341,-116.7353,3316073,06073
...,...,...,...,...,...,...,...,...,...,...
3137,Arthur,Arthur,Arthur County,31005,NE,Nebraska,41.5689,-101.6958,427,31005
3138,McPherson,McPherson,McPherson County,31117,NE,Nebraska,41.5682,-101.0605,395,31117
3139,King,King,King County,48269,TX,Texas,33.6166,-100.2558,237,48269
3140,Loving,Loving,Loving County,48301,TX,Texas,31.8493,-103.5799,98,48301
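
An alternative that avoids the integer round trip is to read the FIPS column as text in the first place. A small sketch, using two rows from the table above in an in-memory CSV rather than the real `uscounties.csv`:

```python
import io
import pandas as pd

# Two rows copied from the mapping table, trimmed to three columns.
csv_text = "county,county_fips,state_id\nLos Angeles,6037,CA\nCook,17031,IL\n"

# dtype=str keeps the codes as text, so padding is a plain string operation.
df_mapping = pd.read_csv(io.StringIO(csv_text), dtype={"county_fips": str})
df_mapping["county_fips_str"] = df_mapping["county_fips"].str.zfill(5)

# Sanity check: every code should now be exactly five digits.
all_five_digits = df_mapping["county_fips_str"].str.fullmatch(r"\d{5}").all()
print(df_mapping, all_five_digits)
```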


In [None]:
df_ll = pd.merge(left=df_newest, right=df_mapping, how='left', left_on='fips', right_on='county_fips_str')
df_ll


Unnamed: 0,county_x,state,fips,cases_avg_per_100k,county_y,county_ascii,county_full,county_fips,state_id,state_name,lat,lng,population,county_fips_str
0,Baltimore,Maryland,24005,29.82,Baltimore,Baltimore,Baltimore County,24005.0,MD,Maryland,39.4627,-76.6393,828018.0,24005.0
1,Prince George's,Maryland,24033,31.0,Prince George's,Prince George's,Prince George's County,24033.0,MD,Maryland,38.8295,-76.8473,908670.0,24033.0
2,Montgomery,Maryland,24031,42.43,Montgomery,Montgomery,Montgomery County,24031.0,MD,Maryland,39.1363,-77.2042,1043530.0,24031.0
3,Philadelphia,Pennsylvania,42101,50.77,Philadelphia,Philadelphia,Philadelphia County,42101.0,PA,Pennsylvania,40.0077,-75.1339,1579075.0,42101.0
4,District of Columbia,District of Columbia,11001,52.97,District of Columbia,District of Columbia,District of Columbia,11001.0,DC,District of Columbia,38.9047,-77.0163,692683.0,11001.0
5,Fairfax,Virginia,51059,74.86,Fairfax,Fairfax,Fairfax County,51059.0,VA,Virginia,38.8368,-77.277,1145862.0,51059.0
6,New York City,New York,36998,75.46,,,,,,,,,,
7,Putnam,Ohio,39137,75.94,Putnam,Putnam,Putnam County,39137.0,OH,Ohio,41.0221,-84.1317,33911.0,39137.0
8,Franklin,Ohio,39049,79.72,Franklin,Franklin,Franklin County,39049.0,OH,Ohio,39.9695,-83.0093,1290360.0,39049.0
9,Arlington,Virginia,51013,84.14,Arlington,Arlington,Arlington County,51013.0,VA,Virginia,38.8786,-77.1011,233464.0,51013.0


In [None]:
df_ll.set_index('county_x', inplace=True)


In [None]:
df_ll

county_x,state,fips,cases_avg_per_100k,county_y,county_ascii,county_full,county_fips,state_id,state_name,lat,lng,population,county_fips_str
Baltimore,Maryland,24005,29.82,Baltimore,Baltimore,Baltimore County,24005.0,MD,Maryland,39.4627,-76.6393,828018.0,24005.0
Prince George's,Maryland,24033,31.0,Prince George's,Prince George's,Prince George's County,24033.0,MD,Maryland,38.8295,-76.8473,908670.0,24033.0
Montgomery,Maryland,24031,42.43,Montgomery,Montgomery,Montgomery County,24031.0,MD,Maryland,39.1363,-77.2042,1043530.0,24031.0
Philadelphia,Pennsylvania,42101,50.77,Philadelphia,Philadelphia,Philadelphia County,42101.0,PA,Pennsylvania,40.0077,-75.1339,1579075.0,42101.0
District of Columbia,District of Columbia,11001,52.97,District of Columbia,District of Columbia,District of Columbia,11001.0,DC,District of Columbia,38.9047,-77.0163,692683.0,11001.0
Fairfax,Virginia,51059,74.86,Fairfax,Fairfax,Fairfax County,51059.0,VA,Virginia,38.8368,-77.277,1145862.0,51059.0
New York City,New York,36998,75.46,,,,,,,,,,
Putnam,Ohio,39137,75.94,Putnam,Putnam,Putnam County,39137.0,OH,Ohio,41.0221,-84.1317,33911.0,39137.0
Franklin,Ohio,39049,79.72,Franklin,Franklin,Franklin County,39049.0,OH,Ohio,39.9695,-83.0093,1290360.0,39049.0
Arlington,Virginia,51013,84.14,Arlington,Arlington,Arlington County,51013.0,VA,Virginia,38.8786,-77.1011,233464.0,51013.0
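
Rows that fail to match in a left merge, like New York City above, can be surfaced automatically with the `indicator` option of `pd.merge` instead of being spotted by eye. A minimal sketch on two toy rows taken from the tables above:

```python
import pandas as pd

df_newest = pd.DataFrame(
    {
        "county": ["Fairfax", "New York City"],
        "fips": ["51059", "36998"],
        "cases_avg_per_100k": [74.86, 75.46],
    }
)
df_mapping = pd.DataFrame(
    {
        "county_fips_str": ["51059", "36061"],
        "lat": [38.8368, 40.7781],
        "lng": [-77.277, -73.9675],
    }
)

# indicator=True adds a "_merge" column saying where each row came from.
merged = df_newest.merge(
    df_mapping, how="left", left_on="fips", right_on="county_fips_str", indicator=True
)
unmatched = merged.loc[merged["_merge"] == "left_only", ["county", "fips"]]
print(unmatched)
```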


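Get New York City's lat/lon manually. NYT aggregates data for the whole city under geoid USA-36998, which has no row in the mapping file, so use the coordinates for Manhattan (New York County, FIPS 36061).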

In [None]:
df_mapping.head(30)


Unnamed: 0,county,county_ascii,county_full,county_fips,state_id,state_name,lat,lng,population,county_fips_str
0,Los Angeles,Los Angeles,Los Angeles County,6037,CA,California,34.3207,-118.2248,10081570,6037
1,Cook,Cook,Cook County,17031,IL,Illinois,41.8401,-87.8168,5198275,17031
2,Harris,Harris,Harris County,48201,TX,Texas,29.8577,-95.3936,4646630,48201
3,Maricopa,Maricopa,Maricopa County,4013,AZ,Arizona,33.349,-112.4915,4328810,4013
4,San Diego,San Diego,San Diego County,6073,CA,California,33.0341,-116.7353,3316073,6073
5,Orange,Orange,Orange County,6059,CA,California,33.703,-117.7611,3168044,6059
6,Miami-Dade,Miami-Dade,Miami-Dade County,12086,FL,Florida,25.615,-80.5623,2699428,12086
7,Dallas,Dallas,Dallas County,48113,TX,Texas,32.7666,-96.7779,2606868,48113
8,Kings,Kings,Kings County,36047,NY,New York,40.6395,-73.9385,2589974,36047
9,Riverside,Riverside,Riverside County,6065,CA,California,33.7437,-115.9938,2411439,6065


In [None]:
df_mapping[df_mapping['county_fips'] == 36061]


Unnamed: 0,county,county_ascii,county_full,county_fips,state_id,state_name,lat,lng,population,county_fips_str
20,New York,New York,New York County,36061,NY,New York,40.7781,-73.9675,1631993,36061


In [None]:
df_mapping.loc[df_mapping['county_fips'] == 36061, 'lng']

20   -73.9675
Name: lng, dtype: float64

In [None]:
df_ll.loc['New York City', 'lat'] = 40.7781
df_ll.loc['New York City', 'lng'] = -73.9675
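
Rather than typing the coordinates by hand, they could be copied from the mapping row. The sketch below assumes a hypothetical `STAND_IN_FIPS` dictionary that maps an NYT aggregate code to a representative county; the frames are toy versions of `df_mapping` and `df_ll`.

```python
import pandas as pd

df_mapping = pd.DataFrame(
    {
        "county_fips_str": ["36061", "51059"],
        "lat": [40.7781, 38.8368],
        "lng": [-73.9675, -77.277],
    }
)
df_ll = pd.DataFrame(
    {
        "fips": ["36998", "51059"],
        "lat": [float("nan"), 38.8368],
        "lng": [float("nan"), -77.277],
    },
    index=pd.Index(["New York City", "Fairfax"], name="county_x"),
)

# Hypothetical: NYT aggregate code -> county whose coordinates stand in for it.
STAND_IN_FIPS = {"36998": "36061"}

coords = df_mapping.set_index("county_fips_str")[["lat", "lng"]]
for county, fips in df_ll["fips"].items():
    stand_in = STAND_IN_FIPS.get(fips)
    if stand_in is not None:
        # Copy lat/lng from the stand-in county's row in the mapping.
        df_ll.loc[county, ["lat", "lng"]] = coords.loc[stand_in].to_numpy()

print(df_ll)
```

This keeps the choice of stand-in county in one place if other aggregated areas turn up later.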

In [None]:
df_ll

county_x,state,fips,cases_avg_per_100k,county_y,county_ascii,county_full,county_fips,state_id,state_name,lat,lng,population,county_fips_str
Baltimore,Maryland,24005,29.82,Baltimore,Baltimore,Baltimore County,24005.0,MD,Maryland,39.4627,-76.6393,828018.0,24005.0
Prince George's,Maryland,24033,31.0,Prince George's,Prince George's,Prince George's County,24033.0,MD,Maryland,38.8295,-76.8473,908670.0,24033.0
Montgomery,Maryland,24031,42.43,Montgomery,Montgomery,Montgomery County,24031.0,MD,Maryland,39.1363,-77.2042,1043530.0,24031.0
Philadelphia,Pennsylvania,42101,50.77,Philadelphia,Philadelphia,Philadelphia County,42101.0,PA,Pennsylvania,40.0077,-75.1339,1579075.0,42101.0
District of Columbia,District of Columbia,11001,52.97,District of Columbia,District of Columbia,District of Columbia,11001.0,DC,District of Columbia,38.9047,-77.0163,692683.0,11001.0
Fairfax,Virginia,51059,74.86,Fairfax,Fairfax,Fairfax County,51059.0,VA,Virginia,38.8368,-77.277,1145862.0,51059.0
New York City,New York,36998,75.46,,,,,,,40.7781,-73.9675,,
Putnam,Ohio,39137,75.94,Putnam,Putnam,Putnam County,39137.0,OH,Ohio,41.0221,-84.1317,33911.0,39137.0
Franklin,Ohio,39049,79.72,Franklin,Franklin,Franklin County,39049.0,OH,Ohio,39.9695,-83.0093,1290360.0,39049.0
Arlington,Virginia,51013,84.14,Arlington,Arlington,Arlington County,51013.0,VA,Virginia,38.8786,-77.1011,233464.0,51013.0


In [None]:
px.scatter_geo(
    df_ll,
    lat="lat",
    lon="lng",
    size="cases_avg_per_100k",
    scope="usa",
    color="cases_avg_per_100k",
    color_continuous_scale='Temps',
    hover_name=df_ll.index,  # county_x; county_y is NaN for New York City
)


Scatter geo circle sizes often represent population, but I don't think I want to do that here. Maybe if this were MSA data. Things could get tricky in places where the NYT data isn't broken out by county (e.g. NYC).

Let's change the labels.

Could display gauges/indicators/KPI-type information, but might just do that directly in Streamlit if I decide to use Streamlit.

Could explore sparklines.

Subplots are possible. Probably need to drop down to vanilla plotting.

In [None]:

df = px.data.gapminder().query("continent == 'Oceania'")
fig = px.line(df, x='year', y='gdpPercap', facet_row='country')
fig.update_yaxes(title=None, showgrid=False)  # update_yaxes applies to every facet's y axis
fig.update_xaxes(showgrid=False)


fig.show()

# example with lower level api example on https://community.plotly.com/t/sparklines-from-dataframe/38174

# maybe make a template for subplots if possible

# move the country label to the left and make it just the name

The following are adapted from answers [here](https://stackoverflow.com/questions/64462790/how-to-plot-plotly-gauge-charts-next-to-each-other-with-python). The method of making traces seems to work better in terms of spacing between the plots.


In [None]:

import plotly.graph_objs as go
from plotly.subplots import make_subplots

# make_subplots places each indicator by row/col, so no domain is set on the traces
trace1 = go.Indicator(mode="gauge+number", value=400, title={'text': "Speed 1"})
trace2 = go.Indicator(mode="gauge+number", value=250, title={'text': "Speed 2"})

fig = make_subplots(
    rows=1,
    cols=2,
    specs=[[{'type' : 'indicator'}, {'type' : 'indicator'}]],
    )

# add_trace replaces the deprecated append_trace
fig.add_trace(trace1, row=1, col=1)
fig.add_trace(trace2, row=1, col=2)

fig

In [None]:
import plotly.graph_objs as go

# traces with separate domains to form a subplot
trace1 = go.Indicator(mode="gauge+number",    value=400,    domain={'x': [0.0, 0.4], 'y': [0.0, 1]},    title={'text': "Speed 1"})

trace2 = go.Indicator(mode="gauge+number",    value=250,    domain={'x': [0.6, 1.0], 'y': [0., 1.00]},    title={'text': "Speed 2"})

# layout and figure production
layout = go.Layout(height = 600,
                   width = 600,
                   autosize = False,
                   title = 'Side by side gauge diagrams')
fig = go.Figure(data = [trace1, trace2], layout = layout)
fig

I could not get Geopandas working together with plotly and Jupyter in a conda environment on Python 3.10, and 3.9 looks like it fails too. I tried several fresh environments, installing from conda-forge, other conda channels, and pip.

In [None]:
# px.choropleth(df_newest,  locations='fips', color='cases_avg_per_100k', scope='usa'  )
# %pip install plotly-geo pyshp
# %pip install shapely

In [None]:
# import plotly.figure_factory as ff

# ff.create_choropleth(fips=df_newest['fips'], values=df_newest['cases_avg_per_100k'])

In [None]:
# %conda install plotly -c conda-forge
# %conda install -c plotly plotly-geo
# %conda install geopandas -c conda-forge 