# Example of using the Delphi Epidata API

In [None]:
from requests import get
import pandas as pd
from pandas import Grouper

Additional documentation for the Delphi Epidata API is available on <a href=https://cmu-delphi.github.io/delphi-epidata/>the Delphi Epidata documentation site</a>.



The code below retrieves data from the <a href=https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/fb-survey.html>COVID-19 Trends and Impact Survey</a> (source name `fb-survey`), which Delphi ran in partnership with Facebook. The survey was discontinued in mid-2022, so data are only available for a limited window (the anxiety measure used here runs from March 2021 to June 2022). For this example, I'm only pulling March 2021, but you would want to widen the date range to get all of the data.


Setting up the query:

In [None]:
geo_type = 'state' # change to 'nation' for US average or 'county' for county-level data
# go to https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/fb-survey.html for list of other measures

signal = 'fb-survey:smoothed_wanxious_7d'  # could change to fb-survey:smoothed_wdepressed_7d for depression estimates

time_unit = 'day'  # for weekly data, use time_unit = 'week' and give start_date/end_date as YYYY-WW (year-week)

start_date = '2021-03-01'   # start date of data. This measure only goes back to 2021-03-02, so you won't get data before this

end_date = '2021-04-01' # last date to look for (this measure stops in June 2022)

query = f'https://api.delphi.cmu.edu/epidata/covidcast/?signal={signal}&time={time_unit}:{start_date}--{end_date}&geo_type={geo_type}&geo_value=*'

# send the query
result = get(query)
result.raise_for_status()  # stop here if the HTTP request itself failed
# parse the JSON body of the response
data = result.json()
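
If you plan to change the signal, dates, or geography often, it can help to build the query URL from its parts instead of editing the f-string by hand. The sketch below uses the standard library's `urllib.parse.urlencode`; `build_covidcast_url` is a hypothetical helper name, and it only reproduces the parameters already used in the query above (it does not cover every option the API accepts).

```python
from urllib.parse import urlencode

# Base URL of the covidcast endpoint used in the query above.
COVIDCAST_URL = "https://api.delphi.cmu.edu/epidata/covidcast/"


def build_covidcast_url(signal, start_date, end_date,
                        geo_type="state", geo_value="*", time_unit="day"):
    """Return a covidcast query URL with the same parameters as the f-string above.

    Hypothetical helper for illustration; not part of any Delphi client library.
    """
    params = {
        "signal": signal,                                  # "source:signal"
        "time": f"{time_unit}:{start_date}--{end_date}",  # date range
        "geo_type": geo_type,
        "geo_value": geo_value,                            # "*" means all locations
    }
    # Keep ':' and '*' unescaped so the URL stays readable; any other special
    # characters are percent-encoded.
    return COVIDCAST_URL + "?" + urlencode(params, safe=":*")


url = build_covidcast_url("fb-survey:smoothed_wdepressed_7d", "2021-03-01", "2021-04-01")
print(url)
```

The returned string can be sent with `get` in the same way as `query`, e.g. `data = get(url).json()`.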

Now convert the result to a data frame. The `epidata` field of the response is a list of records (one per location and date), so `pd.DataFrame` can read it directly:

In [53]:
df = pd.DataFrame(data['epidata'])
df.head()

| | geo_value | signal | source | geo_type | time_type | time_value | direction | issue | lag | missing_value | missing_stderr | missing_sample_size | value | stderr | sample_size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ak | smoothed_wanxious_7d | fb-survey | state | day | 20210302 | | 20210317 | 15 | 0 | 0 | 0 | 13.624087 | 3.198903 | 115.0 |
| 1 | al | smoothed_wanxious_7d | fb-survey | state | day | 20210302 | | 20210317 | 15 | 0 | 0 | 0 | 21.060845 | 1.868853 | 476.012 |
| 2 | ar | smoothed_wanxious_7d | fb-survey | state | day | 20210302 | | 20210317 | 15 | 0 | 0 | 0 | 21.450088 | 2.361732 | 302.0741 |
| 3 | az | smoothed_wanxious_7d | fb-survey | state | day | 20210302 | | 20210317 | 15 | 0 | 0 | 0 | 17.344475 | 1.536289 | 607.4173 |
| 4 | ca | smoothed_wanxious_7d | fb-survey | state | day | 20210302 | | 20210317 | 15 | 0 | 0 | 0 | 15.766829 | 0.700815 | 2704.0873 |
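
Before building the data frame, it can be worth checking that the API actually returned rows. Besides `epidata`, the JSON response includes a `result` code and a `message`; the sketch below assumes that a `result` of 1 means success and treats anything else as an error. `epidata_to_frame` is a hypothetical helper, and the sample payload is hand-built in the response's shape using values from the first row of the output above.

```python
import pandas as pd


def epidata_to_frame(payload):
    """Convert a parsed Epidata JSON response into a DataFrame.

    Hypothetical helper; assumes the response has `result`, `message` and
    `epidata` keys and that result == 1 signals success.
    """
    code = payload.get("result")
    if code != 1:
        # Assumption: any other code means no (or incomplete) data came back.
        raise ValueError(f"Epidata returned result={code}: {payload.get('message')}")
    return pd.DataFrame(payload["epidata"])


# Hand-built payload in the same shape as the API response.
sample = {
    "result": 1,
    "message": "success",
    "epidata": [{"geo_value": "ak", "time_value": 20210302, "value": 13.624087}],
}
print(epidata_to_frame(sample))
```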


The `time_value` column is returned as an integer in YYYYMMDD form with no dashes, so 2021-03-02 appears as 20210302. The <a href = https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html#pandas.to_datetime>`to_datetime`</a> function converts these values into pandas datetimes (stored here in a new `date` column), which makes it much easier to aggregate or plot the data by date.

In [51]:
# convert time_value to a proper date format. You can find an explanation for the format = argument here: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
df['date'] = pd.to_datetime(df.time_value, format='%Y%m%d')
df.head()

| | geo_value | signal | source | geo_type | time_type | time_value | direction | issue | lag | missing_value | missing_stderr | missing_sample_size | value | stderr | sample_size | date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ak | smoothed_wanxious_7d | fb-survey | state | day | 20210302 | | 20210317 | 15 | 0 | 0 | 0 | 13.624087 | 3.198903 | 115.0 | 2021-03-02 |
| 1 | al | smoothed_wanxious_7d | fb-survey | state | day | 20210302 | | 20210317 | 15 | 0 | 0 | 0 | 21.060845 | 1.868853 | 476.012 | 2021-03-02 |
| 2 | ar | smoothed_wanxious_7d | fb-survey | state | day | 20210302 | | 20210317 | 15 | 0 | 0 | 0 | 21.450088 | 2.361732 | 302.0741 | 2021-03-02 |
| 3 | az | smoothed_wanxious_7d | fb-survey | state | day | 20210302 | | 20210317 | 15 | 0 | 0 | 0 | 17.344475 | 1.536289 | 607.4173 | 2021-03-02 |
| 4 | ca | smoothed_wanxious_7d | fb-survey | state | day | 20210302 | | 20210317 | 15 | 0 | 0 | 0 | 15.766829 | 0.700815 | 2704.0873 | 2021-03-02 |
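
Once `date` holds real datetimes, a common next step before plotting is to reshape the data so that each state becomes its own column. The sketch below applies `DataFrame.pivot` to five rows copied from the output above; with the full `df` you would get one row per date. `pivot` requires each (date, state) pair to appear only once, so if you combine several signals or issues, filter first or use `pivot_table` instead.

```python
import pandas as pd

# Five rows copied from the output above (2021-03-02 only), keeping just the
# columns needed for the reshape.
long_df = pd.DataFrame({
    "geo_value": ["ak", "al", "ar", "az", "ca"],
    "time_value": [20210302] * 5,
    "value": [13.624087, 21.060845, 21.450088, 17.344475, 15.766829],
})
long_df["date"] = pd.to_datetime(long_df.time_value, format="%Y%m%d")

# Wide layout: one row per date, one column per state.
wide = long_df.pivot(index="date", columns="geo_value", values="value")
print(wide)
```

With matplotlib installed, calling `wide.plot()` on the full data draws one line per state.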


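Now, if you wanted to aggregate the daily metrics to a weekly or monthly measure, you could use pandas' `Grouper` to group rows by a date frequency. The code below gives the average values for each state in each week. With `freq='1W'`, each week runs Monday through Sunday and is labelled by its Sunday date, so the 2021-03-07 rows summarize the week ending Sunday, March 7: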

In [52]:
df.groupby([Grouper(key="date", freq='1W'), 'geo_value']).mean(numeric_only=True).reset_index()

| | date | geo_value | time_value | issue | lag | missing_value | missing_stderr | missing_sample_size | value | stderr | sample_size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-03-07 | ak | 20210304.50 | 20210317.0 | 12.5 | 0.0 | 0.0 | 0.0 | 11.173942 | 1.857505 | 368.833333 |
| 1 | 2021-03-07 | al | 20210304.50 | 20210317.0 | 12.5 | 0.0 | 0.0 | 0.0 | 18.836762 | 1.112486 | 1594.041450 |
| 2 | 2021-03-07 | ar | 20210304.50 | 20210317.0 | 12.5 | 0.0 | 0.0 | 0.0 | 20.506961 | 1.415102 | 1059.330717 |
| 3 | 2021-03-07 | az | 20210304.50 | 20210317.0 | 12.5 | 0.0 | 0.0 | 0.0 | 15.914306 | 0.896962 | 2179.506583 |
| 4 | 2021-03-07 | ca | 20210304.50 | 20210317.0 | 12.5 | 0.0 | 0.0 | 0.0 | 15.718593 | 0.425519 | 9453.830133 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 250 | 2021-04-04 | vt | 20210347.75 | 20210404.5 | 5.0 | 0.0 | 0.0 | 0.0 | 15.377518 | 1.353386 | 710.500750 |
| 251 | 2021-04-04 | wa | 20210347.75 | 20210404.5 | 5.0 | 0.0 | 0.0 | 0.0 | 17.150510 | 0.493463 | 5835.140075 |
| 252 | 2021-04-04 | wi | 20210347.75 | 20210404.5 | 5.0 | 0.0 | 0.0 | 0.0 | 13.437350 | 0.506981 | 4525.990625 |
| 253 | 2021-04-04 | wv | 20210347.75 | 20210404.5 | 5.0 | 0.0 | 0.0 | 0.0 | 16.253731 | 0.893131 | 1705.330425 |
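
Because `mean(numeric_only=True)` averages every numeric column, identifier columns such as `time_value` and `issue` get averaged too, which is why values like 20210347.75 appear in the weekly output. A minimal sketch that averages only `value`, and also shows a monthly version using the `'MS'` (month start) frequency, is below. The daily numbers are made up purely so the arithmetic is easy to check (they are not survey estimates), and `mean_by_period` is a hypothetical helper name.

```python
import pandas as pd

# Toy daily data for one state: 2021-03-01 (a Monday) through 2021-03-14,
# with value = 1, 2, ..., 14. Made-up numbers for illustration only.
daily = pd.DataFrame({
    "date": pd.date_range("2021-03-01", periods=14, freq="D"),
    "geo_value": "ak",
    "value": list(range(1, 15)),
})


def mean_by_period(frame, freq):
    """Average only the `value` column for each state and date period."""
    return (
        frame.groupby([pd.Grouper(key="date", freq=freq), "geo_value"])["value"]
        .mean()
        .reset_index()
    )


weekly = mean_by_period(daily, "W")    # weeks ending Sunday, labelled by that Sunday
monthly = mean_by_period(daily, "MS")  # calendar months, labelled by the 1st
print(weekly)
print(monthly)
```

Here the weekly means are 4.0 for the week ending 2021-03-07 (the mean of 1 to 7) and 11.0 for the week ending 2021-03-14, and the monthly mean for March is 7.5.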
