# Project Introduction Demo

## Logistics

If you haven’t already filled out this team form, please do it TODAY:
https://docs.google.com/spreadsheets/d/1EvWqolU8TT8_RMpLE9IYjQYpXbw8wkKPX2WeQ9g_xzc/edit#gid=0


## Motivation

We will be looking at COVID data to answer the following two questions:
* Given the past data, can we build predictive models that forecast the future of the pandemic so that we can see one step ahead and prepare accordingly?
* Which data is highly relevant to the prediction and how should that affect our policies?


### What You Will Be Doing
You are going to build a predictive model that uses historical COVID case counts and related data to forecast the short-term future number of COVID cases in a particular region. You will do this by creating a time series forecasting model.
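To make "time series forecasting" concrete, here is a minimal sketch of how a series of daily counts can be turned into (past values, next value) training pairs, together with a simple persistence baseline to compare your models against. The numbers are made up for illustration, and `make_windows` is a hypothetical helper name, not part of covidcast or the project requirements.

```python
# Toy daily case counts: made-up numbers for illustration, not real COVID data.
cases = [120, 135, 150, 160, 158, 170, 190, 210, 205, 220]


def make_windows(series, n_lags, horizon=1):
    """Turn a series into (features, target) pairs.

    Each feature row holds n_lags consecutive values; the target is the
    value `horizon` steps after the last value in that row.
    """
    rows, targets = [], []
    for end in range(n_lags, len(series) - horizon + 1):
        rows.append(series[end - n_lags:end])
        targets.append(series[end + horizon - 1])
    return rows, targets


X, y = make_windows(cases, n_lags=3)

# Persistence baseline: predict that tomorrow equals the most recent day.
baseline_preds = [row[-1] for row in X]
baseline_mae = sum(abs(p - t) for p, t in zip(baseline_preds, y)) / len(y)

print(X[0], "->", y[0])
print("baseline MAE:", round(baseline_mae, 2))
```

Any model you train later should beat this baseline's error; if it does not, the extra complexity is not paying for itself.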

## Using COVIDCast API

In [2]:
# Installing covidcast
!pip install covidcast



In [3]:
from datetime import date
import covidcast


https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html


The page above lists the available signals, with documentation for each one. The documentation gives you everything you need to call a signal: when the first data points were collected, whether the data is available on a daily or weekly basis, which regions you can request the signal for, and so on.


In [6]:
# Fetch the fb-survey signal for people who reported COVID-like symptoms,
# from 5-1-2020 to 5-7-2020, in all counties
data = covidcast.signal("fb-survey", "smoothed_cli",
                        date(2020, 5, 1), date(2020, 5, 7),
                        "county")

In [7]:
data.head()

Unnamed: 0,geo_value,signal,time_value,issue,lag,missing_value,missing_stderr,missing_sample_size,value,stderr,sample_size,geo_type,data_source
0,1000,smoothed_cli,2020-05-01,2020-09-03,125,0,0,0,0.82541,0.136003,1722.4551,county,fb-survey
1,1001,smoothed_cli,2020-05-01,2020-09-03,125,0,0,0,1.299425,0.967136,115.8025,county,fb-survey
2,1003,smoothed_cli,2020-05-01,2020-09-03,125,0,0,0,0.696597,0.324753,584.3194,county,fb-survey
3,1015,smoothed_cli,2020-05-01,2020-09-03,125,0,0,0,0.428271,0.548566,122.5577,county,fb-survey
4,1031,smoothed_cli,2020-05-01,2020-09-03,125,0,0,0,0.025579,0.360827,114.8318,county,fb-survey
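The `time_value`, `issue`, and `lag` columns are easy to mix up. In the rows above, `lag` appears to be the number of days between the date the value describes (`time_value`) and the date that version of the value was published (`issue`). The sketch below checks that reading against the first row using only the standard library; treat the interpretation as an assumption and confirm it in the signal documentation.

```python
from datetime import date

time_value = date(2020, 5, 1)  # the day the measurement describes
issue = date(2020, 9, 3)       # the day this version of the value was published

# Assumed definition: lag = issue - time_value, in days.
computed_lag = (issue - time_value).days
print(computed_lag)
```

This matters for forecasting because a value issued months later may have been revised, so it is not necessarily what you would have seen on the day itself.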


In [27]:
# Here's another example: doctor visits from 11-19-2020 to 1-1-2021
# If I don't specify geo_values, I get the data for all states on each day

data = covidcast.signal("doctor-visits", "smoothed_adj_cli", date(2020,11,19),
                        date(2021, 1, 1), geo_type="state")


In [28]:
data.head()

Unnamed: 0,geo_value,signal,time_value,issue,lag,missing_value,missing_stderr,missing_sample_size,value,stderr,sample_size,geo_type,data_source
0,ak,smoothed_adj_cli,2020-11-19,2021-01-31,73,0,5,5,4.794389,,,state,doctor-visits
1,al,smoothed_adj_cli,2020-11-19,2021-01-31,73,0,5,5,8.155846,,,state,doctor-visits
2,ar,smoothed_adj_cli,2020-11-19,2021-01-31,73,0,5,5,12.353495,,,state,doctor-visits
3,az,smoothed_adj_cli,2020-11-19,2021-01-31,73,0,5,5,10.33724,,,state,doctor-visits
4,ca,smoothed_adj_cli,2020-11-19,2021-01-31,73,0,5,5,12.001913,,,state,doctor-visits


In [29]:
# Or I can specify which state I want to look at
data = covidcast.signal("doctor-visits", "smoothed_adj_cli", date(2020,11,19),
                        date(2021, 2, 14), geo_type="state", geo_values="ca")

In [30]:
data.head()

Unnamed: 0,geo_value,signal,time_value,issue,lag,missing_value,missing_stderr,missing_sample_size,value,stderr,sample_size,geo_type,data_source
0,ca,smoothed_adj_cli,2020-11-19,2021-01-31,73,0,5,5,12.001913,,,state,doctor-visits
0,ca,smoothed_adj_cli,2020-11-20,2021-02-01,73,0,5,5,12.648509,,,state,doctor-visits
0,ca,smoothed_adj_cli,2020-11-21,2021-02-02,73,0,5,5,13.103558,,,state,doctor-visits
0,ca,smoothed_adj_cli,2020-11-22,2021-02-03,73,0,5,5,14.299251,,,state,doctor-visits
0,ca,smoothed_adj_cli,2020-11-23,2021-02-04,73,0,5,5,16.211898,,,state,doctor-visits


In [31]:
# I can also look at data from a select list of states
stateList = ['ca', 'ny', 'mo']
data = covidcast.signal("doctor-visits", "smoothed_adj_cli", date(2020,11,19),
                        date(2021, 2, 14), geo_type="state", geo_values=stateList)

In [32]:
data.head()

Unnamed: 0,geo_value,signal,time_value,issue,lag,missing_value,missing_stderr,missing_sample_size,value,stderr,sample_size,geo_type,data_source
0,ca,smoothed_adj_cli,2020-11-19,2021-01-31,73,0,5,5,12.001913,,,state,doctor-visits
1,mo,smoothed_adj_cli,2020-11-19,2021-01-31,73,0,5,5,16.679426,,,state,doctor-visits
2,ny,smoothed_adj_cli,2020-11-19,2021-01-31,73,0,5,5,17.764187,,,state,doctor-visits
0,ca,smoothed_adj_cli,2020-11-20,2021-02-01,73,0,5,5,12.648509,,,state,doctor-visits
1,mo,smoothed_adj_cli,2020-11-20,2021-02-01,73,0,5,5,16.479506,,,state,doctor-visits


COVIDCast has a short tutorial with a few more function calls that can be found here:

https://cmu-delphi.github.io/covidcast/covidcast-py/html/getting_started.html

## COVIDCast Practice Set (10 points)

This .ipynb notebook is found in the project folder.


### Working with geographic codes

Get the FIPS codes for Los Angeles County, Santa Barbara County, and Orange County.

Find out which counties correspond to the FIPS codes 06059 and 42003.

Find the FIPS codes of all counties in California. Create and print out a dictionary that maps county names to FIPS codes for all the counties in California. Hint: Look at the last example from https://cmu-delphi.github.io/covidcast/covidcast-py/html/getting_started.html.
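Two properties of county FIPS codes are useful here: a county code is five characters long, and its first two characters identify the state (`06` is California). Codes that get read as integers lose their leading zero, which is probably why the county output earlier shows `1001` rather than `01001`. The sketch below illustrates both points on a tiny hand-written table; `SAMPLE_COUNTIES`, `normalize_fips`, and `counties_in_state` are hypothetical names for illustration, and the table deliberately leaves out the counties the exercise asks about.

```python
# A tiny hand-written lookup table, for illustration only. In the exercise
# you would get these pairs from covidcast instead of typing them in.
SAMPLE_COUNTIES = {
    "01001": "Autauga County",
    "06001": "Alameda County",
    "06075": "San Francisco County",
    "36061": "New York County",
    "48201": "Harris County",
}

CALIFORNIA_STATE_FIPS = "06"


def normalize_fips(code):
    """Return a county FIPS code as a zero-padded 5-character string."""
    return str(code).zfill(5)


def counties_in_state(fips_to_name, state_fips):
    """Map county name -> FIPS for every code that starts with state_fips."""
    return {
        name: code
        for code, name in fips_to_name.items()
        if code.startswith(state_fips)
    }


print(normalize_fips(1001))
print(counties_in_state(SAMPLE_COUNTIES, CALIFORNIA_STATE_FIPS))
```

Keeping FIPS codes as strings throughout avoids the lost-zero problem when you later join tables on `geo_value`.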

### Fetching and merging data

Get the number of daily new COVID cases in California, New York, and Texas from May 2020 to July 2020 by fetching the "USAFacts Cases and Deaths" data source (https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/usa-facts.html).

Get the daily percentages of doctor visits that are related to COVID in California, New York, and Texas from May 2020 to July 2020 by fetching the "Doctor Visits" data source (https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/doctor-visits.html).

Merge the two tables using the covidcast.aggregate_signals method.
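Conceptually, merging two signals means lining up their rows by region and date. The library-free sketch below shows that idea as an outer join keyed on `(geo_value, time_value)`. It is not how `covidcast.aggregate_signals` is implemented, and the real method returns a DataFrame with its own column naming. The doctor-visits values are copied from the state-level output earlier in this notebook; the case counts are made-up placeholders, and `outer_join` is a hypothetical helper.

```python
# Doctor-visits values copied from the state-level output earlier in this
# notebook. The case counts are made-up placeholders, not real data.
doctor_visits = [
    {"geo_value": "ca", "time_value": "2020-11-19", "value": 12.001913},
    {"geo_value": "mo", "time_value": "2020-11-19", "value": 16.679426},
    {"geo_value": "ny", "time_value": "2020-11-19", "value": 17.764187},
]
new_cases = [
    {"geo_value": "ca", "time_value": "2020-11-19", "value": 1000.0},
    {"geo_value": "ny", "time_value": "2020-11-19", "value": 500.0},
    {"geo_value": "tx", "time_value": "2020-11-19", "value": 750.0},
]


def outer_join(signals):
    """Merge named signals on (geo_value, time_value), keeping every key."""
    merged = {}
    for name, rows in signals.items():
        for row in rows:
            key = (row["geo_value"], row["time_value"])
            merged.setdefault(key, {})[name] = row["value"]
    # Fill in None wherever a signal has no row for a key.
    return {
        key: {name: values.get(name) for name in signals}
        for key, values in sorted(merged.items())
    }


table = outer_join({"doctor_visits": doctor_visits, "new_cases": new_cases})
for key, values in table.items():
    print(key, values)
```

Notice that `mo` and `tx` each end up with one missing value. After a real merge, check for gaps like these before training a model, since they usually mean the two signals cover different regions or dates.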

## The Rest of the Project

### Data Preparation (25 points)
* Create a notebook called data preparation.ipynb
* Collect and merge data from the sources listed in the project description PDF
* Calculate the ground-truth number of COVID cases
* Collect data from at least 5 signals from the data sources listed in the project description

### Create a Time Series Model (25 points)
* Create a notebook named time series.ipynb
* Create and train 2 models based on the data you collected (an interpretable model and a complex model)
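As one possible starting point for the interpretable model, the sketch below fits a first-order autoregressive model, `y[t] = a + b * y[t-1]`, by ordinary least squares and evaluates it on a held-out tail of the series. This is an illustrative choice and not a requirement of the project; the data is made up, and `fit_ar1` is a hypothetical helper written with the standard library only.

```python
# Toy daily counts: made-up numbers for illustration, not real COVID data.
series = [100, 110, 121, 130, 142, 155, 171, 188, 205, 226, 250, 273]

# Split chronologically: never shuffle a time series before splitting,
# or the model gets to train on the future.
split = 8
train, test = series[:split], series[split:]


def fit_ar1(values):
    """Least-squares fit of y[t] = a + b * y[t-1]; returns (a, b)."""
    x = values[:-1]
    y = values[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a, b = fit_ar1(train)

# One-step-ahead forecasts on the held-out tail: each prediction uses the
# true previous value, which is known by the time the forecast is made.
previous = [train[-1]] + test[:-1]
preds = [a + b * p for p in previous]
mae = sum(abs(p - t) for p, t in zip(preds, test)) / len(test)

print(f"a={a:.2f}, b={b:.3f}, test MAE={mae:.2f}")
```

The model is interpretable because it has only two numbers to read: `b` says how strongly tomorrow's count follows today's (a value above 1 means growth), and `a` is a constant offset.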


### Summarize Your Data (25 points)
Write a short (1-2 page) summary of your project.

### Poster Presentation (15 points)
Prepare a poster about your project.
