## Covid data for the counties I care about

The Washington Post has convenient data by state. I care about Washington, DC, where I live, and about how certain other locations are doing. The state-level data is not fine-grained enough for what I want to see. MSA- and county-level covid data are available online, but they are overwhelming and not easily filterable to what I want. I'm creating this tool to provide information at the county level.

## Plan:

Turn these into GitHub issues.

- Use plotly for interactive visualizations 
- Serve the website via FastAPI or put it into Streamlit. If I use FastAPI, use MyPy to check that the selected county/state option is a dict key/value.
- Use GitHub Actions to get data from the NYT repo into a forked repo. I forked, updated, and released a GitHub action.
- Use Prefect to run my script where I pull the data from NYT website and process it daily.
- Use Great Expectations for data quality checking.
- Use PyTest for code checking. 
- May use DVC to version data.
- Could push the data to a database for fun/speed.

- Make an app that allows other users to choose which counties they want to include. Have to make some design decisions about how to show users that information. 
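
The data-quality and testing items in the plan could start very small. A minimal sketch, using plain pandas checks as a stand-in for Great Expectations (this is not the Great Expectations API). `check_quality` and `REQUIRED_COLUMNS` are hypothetical names, and the `USA-` plus five digits pattern is an assumption based on the geoid values shown later in this notebook:

```python
import pandas as pd

# Assumption: the columns the rest of the notebook relies on.
REQUIRED_COLUMNS = ["geoid", "county", "state", "cases_avg_per_100k"]


def check_quality(df: pd.DataFrame) -> list[str]:
    """Return a list of problems found; an empty list means the frame passed."""
    problems = []
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems  # the remaining checks need these columns
    if df["geoid"].isna().any():
        problems.append("null geoid values")
    if not df["geoid"].astype(str).str.fullmatch(r"USA-\d{5}").all():
        problems.append("geoid values not shaped like USA-12345")
    return problems


sample = pd.DataFrame(
    {
        "geoid": ["USA-11001", "USA-24033"],
        "county": ["District of Columbia", "Prince George's"],
        "state": ["District of Columbia", "Maryland"],
        "cases_avg_per_100k": [17.75, 6.66],
    }
)
print(check_quality(sample))
```

A function like this is easy to call from a pytest test, and the same checks could later be ported to Great Expectations if the project grows.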

Imports and config

In [None]:
import pandas as pd
import plotly.express as px

pd.options.display.max_rows = 100


Read in data

In [2]:
def read_data(oldest_year: int = 2020, newest_year: int = 2022) -> pd.DataFrame:
    """Read the data from the nytimes repo and concatenate it into a single pandas DataFrame.
    
    Args:
      oldest_year: first year of data to use
      newest_year: most recent year of data to use
    """

    # TODO use MyPy class to check that user value is between 2020 and 2022

    df_dicts = {}   # dictionary to hold the data for each year before concatenation

    for year in range(oldest_year, newest_year + 1):
        df_dicts[f'df_{year}'] = pd.read_csv(
            f'https://raw.githubusercontent.com/nytimes/covid-19-data/master/rolling-averages/us-counties-{year}.csv',
            index_col='date',
        )
    
    return pd.concat(df_dicts.values())
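
The TODO in `read_data` mentions MyPy, but MyPy checks types statically and does not see runtime values, so a range check on the years has to run when the function is called. A minimal sketch, assuming a hypothetical helper `validate_years` (not part of this notebook) and assuming 2020 through 2022 are the only years with files:

```python
FIRST_YEAR = 2020  # assumption: earliest rolling-averages file
LAST_YEAR = 2022   # assumption: latest file at the time of writing


def validate_years(oldest_year: int, newest_year: int) -> range:
    """Return the years to download, or raise ValueError for an invalid range."""
    if not (FIRST_YEAR <= oldest_year <= newest_year <= LAST_YEAR):
        raise ValueError(
            f"years must satisfy {FIRST_YEAR} <= oldest <= newest <= {LAST_YEAR}, "
            f"got {oldest_year} and {newest_year}"
        )
    return range(oldest_year, newest_year + 1)


years = list(validate_years(2021, 2022))
print(years)
```

`read_data` could loop over the returned range instead of building its own, which would fail fast before any download starts.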
    

In [3]:
df_21_22 = read_data(2021, 2022)

In [4]:
df_21_22.head(2)

date,geoid,county,state,cases,cases_avg,cases_avg_per_100k,deaths,deaths_avg,deaths_avg_per_100k
2021-01-01,USA-72999,Unknown,Puerto Rico,-17,35.29,,0,0.0,
2021-01-01,USA-72153,Yauco,Puerto Rico,4,3.0,8.86,0,0.0,0.0


In [5]:
df_21_22.shape

(1355319, 9)

In [42]:
df_21_22.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1355319 entries, 2021-01-01 to 2022-02-20
Data columns (total 10 columns):
 #   Column               Non-Null Count    Dtype  
---  ------               --------------    -----  
 0   geoid                1355319 non-null  object 
 1   county               1355319 non-null  object 
 2   state                1355319 non-null  object 
 3   cases                1355319 non-null  int64  
 4   cases_avg            1355319 non-null  float64
 5   cases_avg_per_100k   1339968 non-null  float64
 6   deaths               1355319 non-null  int64  
 7   deaths_avg           1355319 non-null  float64
 8   deaths_avg_per_100k  1339968 non-null  float64
 9   fips                 1355319 non-null  object 
dtypes: float64(4), int64(2), object(4)
memory usage: 113.7+ MB


Write out the file: Parquet for a smaller file, Feather for fast reads. The data could also go into a database.

Feather can't store a non-default index, so reset the index before writing.

In [64]:
df_21_22.reset_index().to_feather(f'../data/2021-2022-all-covid-data-through-{df_21_22.tail(1).index.values[0]}.feather')  #48.8mb feb. 21 

In [66]:
df = pd.read_feather(f"../data/2021-2022-all-covid-data-through-{df_21_22.tail(1).index.values[0]}.feather")

# Converting the date column back to datetimes negates some of the speed benefit of feather.
df["date"] = pd.to_datetime(df["date"])
df.set_index("date", inplace=True)

# 1.7s

In [68]:
df_21_22.to_parquet(f'../data/2021-2022-all-covid-data-through-{df_21_22.tail(1).index.values[0]}.parquet') #11.5mb feb. 21

In [78]:
df = pd.read_parquet("../data/2021-2022-all-covid-data-through-2022-02-20.parquet")

# .8s

In [79]:
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1355319 entries, 2021-01-01 to 2022-02-20
Data columns (total 10 columns):
 #   Column               Non-Null Count    Dtype  
---  ------               --------------    -----  
 0   geoid                1355319 non-null  object 
 1   county               1355319 non-null  object 
 2   state                1355319 non-null  object 
 3   cases                1355319 non-null  int64  
 4   cases_avg            1355319 non-null  float64
 5   cases_avg_per_100k   1339968 non-null  float64
 6   deaths               1355319 non-null  int64  
 7   deaths_avg           1355319 non-null  float64
 8   deaths_avg_per_100k  1339968 non-null  float64
 9   fips                 1355319 non-null  object 
dtypes: float64(4), int64(2), object(4)
memory usage: 113.7+ MB


In [62]:
# df_21_22.to_csv(f'2021-2022-all-covid-data-through{df_21_22.tail(1).index.values[0]}.csv') # 92.9mb feb. 21

Filter to counties of interest

In [6]:
counties = [
    "District of Columbia",
    "Wood",
    "Putnam",
    "Montgomery",
    "Prince George's",
    "Arlington",
    "Alexandria city",
    "New York City",  # README at NYT mentions some NE are city, not county!
    "Allegheny",
    "Cook",
    "Baltimore",
    "Franklin",
    "Clermont",
    "Somerset",
    "Philadelphia",
    "Denver",
    "Boulder",
    "San Francisco",
    "Los Angeles",
    "Pima",
    "Manatee",
    "Fairfax"]


In [7]:
cols = ["county", "state", "geoid", "cases_avg_per_100k"]


See each state/county of selected counties once.

In [8]:
df_once = df_21_22.loc[df_21_22["county"].isin(counties), cols]
df_once


date,county,state,geoid,cases_avg_per_100k
2021-01-01,Wood,Wisconsin,USA-55141,48.73
2021-01-01,Wood,West Virginia,USA-54107,79.02
2021-01-01,Putnam,West Virginia,USA-54079,58.71
2021-01-01,Franklin,Washington,USA-53021,51.01
2021-01-01,Montgomery,Virginia,USA-51121,32.33
...,...,...,...,...
2022-02-20,Montgomery,Arkansas,USA-05097,17.49
2022-02-20,Franklin,Arkansas,USA-05047,25.00
2022-02-20,Pima,Arizona,USA-04019,37.05
2022-02-20,Montgomery,Alabama,USA-01101,17.91
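
The output above shows that county names alone are ambiguous: "Wood" and "Montgomery" match counties in several states, and each match repeats for every date. A minimal sketch of listing each county/state pair once, using a toy frame built from rows in the output above (the second date is only there to create repeats):

```python
import pandas as pd

# Toy frame: three county/state pairs from the output above, repeated on two dates.
toy = pd.DataFrame(
    {
        "county": ["Wood", "Wood", "Montgomery"] * 2,
        "state": ["Wisconsin", "West Virginia", "Virginia"] * 2,
        "geoid": ["USA-55141", "USA-54107", "USA-51121"] * 2,
    },
    index=pd.Index(["2021-01-01"] * 3 + ["2021-01-02"] * 3, name="date"),
)

counties_of_interest = ["Wood", "Montgomery"]
matches = toy.loc[toy["county"].isin(counties_of_interest)]

# Keep the first row for each county/state pair and drop the date index.
once = matches.drop_duplicates(subset=["county", "state"]).reset_index(drop=True)
print(once)
```

Reading the `geoid` column off a table like this is one way to pick the exact counties wanted.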


In [9]:
df_21_22.drop_duplicates(subset=["county", "state"])


date,geoid,county,state,cases,cases_avg,cases_avg_per_100k,deaths,deaths_avg,deaths_avg_per_100k
2021-01-01,USA-72999,Unknown,Puerto Rico,-17,35.29,,0,0.0,
2021-01-01,USA-72153,Yauco,Puerto Rico,4,3.00,8.86,0,0.0,0.0
2021-01-01,USA-72151,Yabucoa,Puerto Rico,10,7.29,22.66,0,0.0,0.0
2021-01-01,USA-72149,Villalba,Puerto Rico,8,2.43,11.31,0,0.0,0.0
2021-01-01,USA-72147,Vieques,Puerto Rico,0,1.00,11.96,0,0.0,0.0
...,...,...,...,...,...,...,...,...,...
2021-01-01,USA-78030,St. Thomas,Virgin Islands,4,4.00,7.75,0,0.0,0.0
2021-01-01,USA-78020,St. John,Virgin Islands,0,1.00,23.98,0,0.0,0.0
2021-01-01,USA-78010,St. Croix,Virgin Islands,1,3.00,5.93,0,0.0,0.0
2021-09-17,USA-60999,Unknown,American Samoa,1,0.14,,0,0.0,


Convert geoid to FIPS code for plotting

In [11]:
df_21_22["fips"] = df_21_22["geoid"].str[-5:]
df_21_22


date,geoid,county,state,cases,cases_avg,cases_avg_per_100k,deaths,deaths_avg,deaths_avg_per_100k,fips
2021-01-01,USA-72999,Unknown,Puerto Rico,-17,35.29,,0,0.00,,72999
2021-01-01,USA-72153,Yauco,Puerto Rico,4,3.00,8.86,0,0.00,0.00,72153
2021-01-01,USA-72151,Yabucoa,Puerto Rico,10,7.29,22.66,0,0.00,0.00,72151
2021-01-01,USA-72149,Villalba,Puerto Rico,8,2.43,11.31,0,0.00,0.00,72149
2021-01-01,USA-72147,Vieques,Puerto Rico,0,1.00,11.96,0,0.00,0.00,72147
...,...,...,...,...,...,...,...,...,...,...
2022-02-20,USA-69110,Saipan,Northern Mariana Islands,0,0.00,0.00,0,0.00,0.00,69110
2022-02-20,USA-69100,Rota,Northern Mariana Islands,0,0.00,0.00,0,0.00,0.00,69100
2022-02-20,USA-78030,St. Thomas,Virgin Islands,0,9.11,17.65,0,0.57,1.11,78030
2022-02-20,USA-78020,St. John,Virgin Islands,0,0.00,0.00,0,0.00,0.00,78020
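
Slicing the last five characters works when every geoid has the `USA-` plus five digits shape seen above. A slightly stricter sketch, assuming that shape, uses a regular expression so that a malformed value becomes missing instead of a silently wrong code. The `"bad-value"` entry is a hypothetical malformed row, not something seen in the data:

```python
import pandas as pd

geoids = pd.Series(["USA-72153", "USA-04019", "USA-36998", "bad-value"])

# expand=False returns a Series; values that do not match the pattern become NaN.
fips = geoids.str.extract(r"^USA-(\d{5})$", expand=False)
print(fips)
```

The codes stay strings, so leading zeros such as the one in `04019` are preserved.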


Filter to the FIPS codes of the counties I want.

If this ever becomes an app, I'll change it so people choose a state and then a county from drop-downs.
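
For the state-then-county drop-down idea, the lookup could be a nested dict from state to county to FIPS code. A minimal sketch on a toy frame; `options` is a hypothetical name, and the three FIPS codes are taken from the list in this notebook:

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "state": ["Maryland", "Ohio", "Maryland"],
        "county": ["Montgomery", "Montgomery", "Prince George's"],
        "fips": ["24031", "39113", "24033"],
    }
)

# state -> {county -> fips}; de-duplicate first because the real data repeats per date.
options = {
    state: dict(zip(group["county"], group["fips"]))
    for state, group in toy.drop_duplicates(subset=["state", "county"]).groupby("state")
}
print(options)
```

The outer keys would fill the state drop-down, and the inner dict for the chosen state would fill the county drop-down and return the FIPS code to filter on.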


In [12]:
fips_counties = [
    "11001",
    "24033",
    "24031",
    "17031",
    "39173",
    "39137",
    "39113",
    "39049",
    "51013",
    "42101",
    "42003",
    "39025",
    "08031",
    "08013",
    "04019",
    "24005",
    "06037",
    "06075",
    "36998",
    # "36061",
    "12081",
    "51510",
    "51059",
    "55083"
]

cols = ["county", "state", "fips", "cases_avg_per_100k"]


In [13]:

df_21_22_counties = df_21_22.loc[df_21_22["fips"].isin(fips_counties), cols]
df_21_22_counties


date,county,state,fips,cases_avg_per_100k
2021-01-01,Oconto,Wisconsin,55083,46.70
2021-01-01,Fairfax,Virginia,51059,34.45
2021-01-01,Arlington,Virginia,51013,34.92
2021-01-01,Alexandria city,Virginia,51510,34.14
2021-01-01,Philadelphia,Pennsylvania,42101,36.06
...,...,...,...,...
2022-02-20,Denver,Colorado,08031,24.95
2022-02-20,Boulder,Colorado,08013,31.27
2022-02-20,San Francisco,California,06075,31.24
2022-02-20,Los Angeles,California,06037,30.00


In [24]:
px.line(
    df_21_22_counties, x=df_21_22_counties.index, y="cases_avg_per_100k", color="county"
)
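
One thing to watch with `color="county"`: the selection includes two Montgomery counties (Maryland and Ohio), and plotly express groups rows by the color value, so both would likely end up in a single trace. A minimal sketch of one workaround, assuming a new `label` column (a hypothetical name) that combines county and state; the values are from the most recent day in this notebook:

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "county": ["Montgomery", "Montgomery", "Fairfax"],
        "state": ["Maryland", "Ohio", "Virginia"],
        "cases_avg_per_100k": [10.70, 13.94, 14.15],
    }
)

# "Montgomery, Maryland" and "Montgomery, Ohio" are now distinct color keys.
toy["label"] = toy["county"] + ", " + toy["state"]
print(toy["label"].tolist())
```

Passing `color="label"` to `px.line` instead of `color="county"` should then draw one line per county.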

In [21]:
df_21_22_counties.index = pd.to_datetime(df_21_22_counties.index)
df_21_22_counties.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 9568 entries, 2021-01-01 to 2022-02-20
Data columns (total 4 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   county              9568 non-null   object 
 1   state               9568 non-null   object 
 2   fips                9568 non-null   object 
 3   cases_avg_per_100k  9568 non-null   float64
dtypes: float64(1), object(3)
memory usage: 373.8+ KB


In [22]:
df_22_counties = df_21_22_counties.loc['2022']

In [23]:
px.line(
    df_22_counties, x=df_22_counties.index, y="cases_avg_per_100k", color="county"
)

## Map

Most recent 7-day moving average.

Scatter geo.

Future direction: could make an animation over time. Could also show the DataFrame for just the most recent day in an app.

In [26]:
most_recent_date = df_21_22_counties.index.max()

In [27]:
int('012'.zfill(5))

12

In [28]:
df_newest = df_21_22_counties[df_21_22_counties.index == most_recent_date].sort_values(
    by="cases_avg_per_100k"
)
df_newest

# could add % change from week earlier


date,county,state,fips,cases_avg_per_100k
2022-02-20,Prince George's,Maryland,24033,6.66
2022-02-20,Putnam,Ohio,39137,9.7
2022-02-20,Baltimore,Maryland,24005,10.53
2022-02-20,Montgomery,Maryland,24031,10.7
2022-02-20,Philadelphia,Pennsylvania,42101,12.41
2022-02-20,Montgomery,Ohio,39113,13.94
2022-02-20,Fairfax,Virginia,51059,14.15
2022-02-20,Franklin,Ohio,39049,14.57
2022-02-20,New York City,New York,36998,15.33
2022-02-20,District of Columbia,District of Columbia,11001,17.75
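
The percent change from a week earlier mentioned in the cell above could be computed per county with a grouped shift. A minimal sketch, assuming one row per county per day with no gaps (so that seven rows back means seven days back); the numbers below are made up for illustration and `pct_change_7d` is a hypothetical column name:

```python
import pandas as pd

# Made-up values for a single county over eight consecutive days.
dates = pd.date_range("2022-02-13", periods=8, freq="D")
toy = pd.DataFrame(
    {
        "fips": ["11001"] * 8,
        "cases_avg_per_100k": [20.0, 19.0, 18.0, 17.0, 16.0, 15.0, 14.0, 10.0],
    },
    index=dates,
).sort_index()

# Value from seven rows earlier within the same county.
week_ago = toy.groupby("fips")["cases_avg_per_100k"].shift(7)
toy["pct_change_7d"] = (toy["cases_avg_per_100k"] - week_ago) / week_ago * 100
print(toy.tail(1))
```

The first seven days of each county have no earlier value and come out as NaN, so only the most recent rows are meaningful for a short window like this.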


In [29]:
# Need a shapefile or the lat/lon for each of the counties.
# Harder to find than expected. Found a mapping here: https://simplemaps.com/data/us-counties.


In [30]:
df_mapping = pd.read_csv('../uscounties.csv')
df_mapping

Unnamed: 0,county,county_ascii,county_full,county_fips,state_id,state_name,lat,lng,population
0,Los Angeles,Los Angeles,Los Angeles County,6037,CA,California,34.3207,-118.2248,10081570
1,Cook,Cook,Cook County,17031,IL,Illinois,41.8401,-87.8168,5198275
2,Harris,Harris,Harris County,48201,TX,Texas,29.8577,-95.3936,4646630
3,Maricopa,Maricopa,Maricopa County,4013,AZ,Arizona,33.3490,-112.4915,4328810
4,San Diego,San Diego,San Diego County,6073,CA,California,33.0341,-116.7353,3316073
...,...,...,...,...,...,...,...,...,...
3137,Arthur,Arthur,Arthur County,31005,NE,Nebraska,41.5689,-101.6958,427
3138,McPherson,McPherson,McPherson County,31117,NE,Nebraska,41.5682,-101.0605,395
3139,King,King,King County,48269,TX,Texas,33.6166,-100.2558,237
3140,Loving,Loving,Loving County,48301,TX,Texas,31.8493,-103.5799,98


The mapping's county FIPS codes were read as integers, so they need to be zero-padded back to five characters.

In [31]:
df_mapping['county_fips_str'] = df_mapping['county_fips'].astype(str).str.zfill(5)
df_mapping

Unnamed: 0,county,county_ascii,county_full,county_fips,state_id,state_name,lat,lng,population,county_fips_str
0,Los Angeles,Los Angeles,Los Angeles County,6037,CA,California,34.3207,-118.2248,10081570,06037
1,Cook,Cook,Cook County,17031,IL,Illinois,41.8401,-87.8168,5198275,17031
2,Harris,Harris,Harris County,48201,TX,Texas,29.8577,-95.3936,4646630,48201
3,Maricopa,Maricopa,Maricopa County,4013,AZ,Arizona,33.3490,-112.4915,4328810,04013
4,San Diego,San Diego,San Diego County,6073,CA,California,33.0341,-116.7353,3316073,06073
...,...,...,...,...,...,...,...,...,...,...
3137,Arthur,Arthur,Arthur County,31005,NE,Nebraska,41.5689,-101.6958,427,31005
3138,McPherson,McPherson,McPherson County,31117,NE,Nebraska,41.5682,-101.0605,395,31117
3139,King,King,King County,48269,TX,Texas,33.6166,-100.2558,237,48269
3140,Loving,Loving,Loving County,48301,TX,Texas,31.8493,-103.5799,98,48301


In [32]:
df_ll = pd.merge(left=df_newest, right=df_mapping, how='left', left_on='fips', right_on='county_fips_str')
df_ll


Unnamed: 0,county_x,state,fips,cases_avg_per_100k,county_y,county_ascii,county_full,county_fips,state_id,state_name,lat,lng,population,county_fips_str
0,Prince George's,Maryland,24033,6.66,Prince George's,Prince George's,Prince George's County,24033.0,MD,Maryland,38.8295,-76.8473,908670.0,24033.0
1,Putnam,Ohio,39137,9.7,Putnam,Putnam,Putnam County,39137.0,OH,Ohio,41.0221,-84.1317,33911.0,39137.0
2,Baltimore,Maryland,24005,10.53,Baltimore,Baltimore,Baltimore County,24005.0,MD,Maryland,39.4627,-76.6393,828018.0,24005.0
3,Montgomery,Maryland,24031,10.7,Montgomery,Montgomery,Montgomery County,24031.0,MD,Maryland,39.1363,-77.2042,1043530.0,24031.0
4,Philadelphia,Pennsylvania,42101,12.41,Philadelphia,Philadelphia,Philadelphia County,42101.0,PA,Pennsylvania,40.0077,-75.1339,1579075.0,42101.0
5,Montgomery,Ohio,39113,13.94,Montgomery,Montgomery,Montgomery County,39113.0,OH,Ohio,39.7546,-84.2906,531670.0,39113.0
6,Fairfax,Virginia,51059,14.15,Fairfax,Fairfax,Fairfax County,51059.0,VA,Virginia,38.8368,-77.277,1145862.0,51059.0
7,Franklin,Ohio,39049,14.57,Franklin,Franklin,Franklin County,39049.0,OH,Ohio,39.9695,-83.0093,1290360.0,39049.0
8,New York City,New York,36998,15.33,,,,,,,,,,
9,District of Columbia,District of Columbia,11001,17.75,District of Columbia,District of Columbia,District of Columbia,11001.0,DC,District of Columbia,38.9047,-77.0163,692683.0,11001.0
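
The New York City row above came back empty because nothing in the mapping has its code. Rather than spotting that by eye, `pd.merge` can flag unmatched rows with `indicator=True`. A minimal sketch on toy frames, with coordinates copied from the mapping output in this notebook:

```python
import pandas as pd

newest = pd.DataFrame(
    {"county": ["Fairfax", "New York City"], "fips": ["51059", "36998"]}
)
mapping = pd.DataFrame(
    {
        "county_fips_str": ["51059", "36061"],
        "lat": [38.8368, 40.7781],
        "lng": [-77.2770, -73.9675],
    }
)

merged = newest.merge(
    mapping, how="left", left_on="fips", right_on="county_fips_str", indicator=True
)

# "left_only" marks rows from the left frame that found no partner in the mapping.
unmatched = merged.loc[merged["_merge"] == "left_only", "county"].tolist()
print(unmatched)
```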


In [33]:
df_ll.set_index('county_x', inplace=True)

In [34]:
df_ll

county_x,state,fips,cases_avg_per_100k,county_y,county_ascii,county_full,county_fips,state_id,state_name,lat,lng,population,county_fips_str
Prince George's,Maryland,24033,6.66,Prince George's,Prince George's,Prince George's County,24033.0,MD,Maryland,38.8295,-76.8473,908670.0,24033.0
Putnam,Ohio,39137,9.7,Putnam,Putnam,Putnam County,39137.0,OH,Ohio,41.0221,-84.1317,33911.0,39137.0
Baltimore,Maryland,24005,10.53,Baltimore,Baltimore,Baltimore County,24005.0,MD,Maryland,39.4627,-76.6393,828018.0,24005.0
Montgomery,Maryland,24031,10.7,Montgomery,Montgomery,Montgomery County,24031.0,MD,Maryland,39.1363,-77.2042,1043530.0,24031.0
Philadelphia,Pennsylvania,42101,12.41,Philadelphia,Philadelphia,Philadelphia County,42101.0,PA,Pennsylvania,40.0077,-75.1339,1579075.0,42101.0
Montgomery,Ohio,39113,13.94,Montgomery,Montgomery,Montgomery County,39113.0,OH,Ohio,39.7546,-84.2906,531670.0,39113.0
Fairfax,Virginia,51059,14.15,Fairfax,Fairfax,Fairfax County,51059.0,VA,Virginia,38.8368,-77.277,1145862.0,51059.0
Franklin,Ohio,39049,14.57,Franklin,Franklin,Franklin County,39049.0,OH,Ohio,39.9695,-83.0093,1290360.0,39049.0
New York City,New York,36998,15.33,,,,,,,,,,
District of Columbia,District of Columbia,11001,17.75,District of Columbia,District of Columbia,District of Columbia,11001.0,DC,District of Columbia,38.9047,-77.0163,692683.0,11001.0


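Get New York City's lat/lon manually. The NYT aggregates data for the whole city under the code 36998, which is not a real county FIPS code, so the merge finds no match. Use the coordinates for Manhattan (New York County, FIPS 36061) instead.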

In [35]:
df_mapping.head(30)


Unnamed: 0,county,county_ascii,county_full,county_fips,state_id,state_name,lat,lng,population,county_fips_str
0,Los Angeles,Los Angeles,Los Angeles County,6037,CA,California,34.3207,-118.2248,10081570,6037
1,Cook,Cook,Cook County,17031,IL,Illinois,41.8401,-87.8168,5198275,17031
2,Harris,Harris,Harris County,48201,TX,Texas,29.8577,-95.3936,4646630,48201
3,Maricopa,Maricopa,Maricopa County,4013,AZ,Arizona,33.349,-112.4915,4328810,4013
4,San Diego,San Diego,San Diego County,6073,CA,California,33.0341,-116.7353,3316073,6073
5,Orange,Orange,Orange County,6059,CA,California,33.703,-117.7611,3168044,6059
6,Miami-Dade,Miami-Dade,Miami-Dade County,12086,FL,Florida,25.615,-80.5623,2699428,12086
7,Dallas,Dallas,Dallas County,48113,TX,Texas,32.7666,-96.7779,2606868,48113
8,Kings,Kings,Kings County,36047,NY,New York,40.6395,-73.9385,2589974,36047
9,Riverside,Riverside,Riverside County,6065,CA,California,33.7437,-115.9938,2411439,6065


In [36]:
df_mapping[df_mapping['county_fips'] == 36061]


Unnamed: 0,county,county_ascii,county_full,county_fips,state_id,state_name,lat,lng,population,county_fips_str
20,New York,New York,New York County,36061,NY,New York,40.7781,-73.9675,1631993,36061


In [37]:
df_mapping.loc[df_mapping['county_fips'] == 36061, 'lng']

20   -73.9675
Name: lng, dtype: float64

In [38]:
df_ll.loc['New York City', 'lat'] = 40.7781
df_ll.loc['New York City', 'lng'] = -73.9675
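
Using Manhattan's coordinates is a reasonable stand-in for the whole city. An alternative sketch, if a more central point is wanted, is a population-weighted average over the five borough counties in the mapping table. `weighted_center` is a hypothetical helper, averaging lat/lng directly is only a rough approximation that works for points this close together, and the toy mapping below holds only the rows visible in this notebook's output (two boroughs plus Los Angeles as a non-match):

```python
import pandas as pd

# Bronx, Kings, New York, Queens, Richmond counties.
NYC_BOROUGH_FIPS = [36005, 36047, 36061, 36081, 36085]


def weighted_center(mapping: pd.DataFrame, fips_codes: list[int]) -> tuple[float, float]:
    """Population-weighted mean of lat/lng over the rows whose county_fips is listed."""
    rows = mapping[mapping["county_fips"].isin(fips_codes)]
    weights = rows["population"] / rows["population"].sum()
    return float((rows["lat"] * weights).sum()), float((rows["lng"] * weights).sum())


toy_mapping = pd.DataFrame(
    {
        "county_fips": [36047, 36061, 6037],
        "lat": [40.6395, 40.7781, 34.3207],
        "lng": [-73.9385, -73.9675, -118.2248],
        "population": [2589974, 1631993, 10081570],
    }
)
lat, lng = weighted_center(toy_mapping, NYC_BOROUGH_FIPS)
print(lat, lng)
```

With the full mapping file, the same call would use all five boroughs.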

In [39]:
df_ll

county_x,state,fips,cases_avg_per_100k,county_y,county_ascii,county_full,county_fips,state_id,state_name,lat,lng,population,county_fips_str
Prince George's,Maryland,24033,6.66,Prince George's,Prince George's,Prince George's County,24033.0,MD,Maryland,38.8295,-76.8473,908670.0,24033.0
Putnam,Ohio,39137,9.7,Putnam,Putnam,Putnam County,39137.0,OH,Ohio,41.0221,-84.1317,33911.0,39137.0
Baltimore,Maryland,24005,10.53,Baltimore,Baltimore,Baltimore County,24005.0,MD,Maryland,39.4627,-76.6393,828018.0,24005.0
Montgomery,Maryland,24031,10.7,Montgomery,Montgomery,Montgomery County,24031.0,MD,Maryland,39.1363,-77.2042,1043530.0,24031.0
Philadelphia,Pennsylvania,42101,12.41,Philadelphia,Philadelphia,Philadelphia County,42101.0,PA,Pennsylvania,40.0077,-75.1339,1579075.0,42101.0
Montgomery,Ohio,39113,13.94,Montgomery,Montgomery,Montgomery County,39113.0,OH,Ohio,39.7546,-84.2906,531670.0,39113.0
Fairfax,Virginia,51059,14.15,Fairfax,Fairfax,Fairfax County,51059.0,VA,Virginia,38.8368,-77.277,1145862.0,51059.0
Franklin,Ohio,39049,14.57,Franklin,Franklin,Franklin County,39049.0,OH,Ohio,39.9695,-83.0093,1290360.0,39049.0
New York City,New York,36998,15.33,,,,,,,40.7781,-73.9675,,
District of Columbia,District of Columbia,11001,17.75,District of Columbia,District of Columbia,District of Columbia,11001.0,DC,District of Columbia,38.9047,-77.0163,692683.0,11001.0


In [40]:
px.scatter_geo(
    df_ll,
    lat="lat",
    lon="lng",
    size="cases_avg_per_100k",
    scope="usa",
    color="cases_avg_per_100k",
    color_continuous_scale='Temps',
    hover_name=df_ll.index,  # county_x is the index; county_y is NaN for New York City
)


Scatter geo circle sizes often encode population, but I don't think I want to do that here. Maybe if this were MSA data. Things could get tricky in places where the NYT data isn't broken out by county (e.g. NYC).

Let's change the labels.

Could display gauges/indicators/KPI-type information, but might just do that directly in Streamlit if I decide to use Streamlit.

Could explore sparklines.

Subplots are possible. Probably need to drop down to vanilla plotting.

In [41]:

df = px.data.gapminder().query("continent == 'Oceania'")
fig = px.line(df, x='year', y='gdpPercap', facet_row='country')
fig.update_layout(yaxis_title=None) # only removes final y title
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)


fig.show()

# example using the lower-level API: https://community.plotly.com/t/sparklines-from-dataframe/38174

# maybe make a template for subplots if possible

# move the country label to the left and make it just the name

The following are adapted from answers [here](https://stackoverflow.com/questions/64462790/how-to-plot-plotly-gauge-charts-next-to-each-other-with-python). The method of making traces seems to work better in terms of spacing between the plots.


In [None]:

import plotly.graph_objs as go
from plotly.subplots import make_subplots

trace1 = go.Indicator(mode="gauge+number",    value=400,    domain={'row' : 1, 'column' : 1}, title={'text': "Speed 1"})
trace2 = go.Indicator(mode="gauge+number",    value=250,    domain={'row' : 1, 'column' : 2}, title={'text': "Speed 2"})

fig = make_subplots(
    rows=1,
    cols=2,
    specs=[[{'type' : 'indicator'}, {'type' : 'indicator'}]],
    )

# add_trace with row/col replaces the deprecated append_trace
fig.add_trace(trace1, row=1, col=1)
fig.add_trace(trace2, row=1, col=2)

fig

In [None]:
import plotly.graph_objs as go

# traces with separate domains to form a subplot
trace1 = go.Indicator(mode="gauge+number",    value=400,    domain={'x': [0.0, 0.4], 'y': [0.0, 1]},    title={'text': "Speed 1"})

trace2 = go.Indicator(mode="gauge+number",    value=250,    domain={'x': [0.6, 1.0], 'y': [0., 1.00]},    title={'text': "Speed 2"})

# layout and figure production
layout = go.Layout(height = 600,
                   width = 600,
                   autosize = False,
                   title = 'Side by side gauge diagrams')
fig = go.Figure(data = [trace1, trace2], layout = layout)
fig

I could not get Geopandas working together with plotly and Jupyter under conda on Python 3.10, and 3.9 looks like a dead end, too. That held whether I installed from conda-forge, the default conda channels, or pip. I tried several fresh environments.

In [None]:
# px.choropleth(df_newest,  locations='fips', color='cases_avg_per_100k', scope='usa'  )
# %pip install plotly-geo pyshp
# %pip install shapely

In [None]:
# import plotly.figure_factory as ff

# ff.create_choropleth(fips=df_newest['fips'], values=df_newest['cases_avg_per_100k'])

In [None]:
# %conda install plotly -c conda-forge
# %conda install -c plotly plotly-geo
# %conda install geopandas -c conda-forge 