# Particle Filter Forecasting Project

I've been connecting several pre-existing pieces into a single forecasting pipeline:

### Process the data:
- Run the hospitalization data through a particle filter to infer the transmission rate, beta.
- Use trend forecasting to predict beta up to 4 weeks into the future. 
- Use the predicted beta to make hospitalization predictions.

### Analyze the data:
- Determine forecast accuracy by comparing the predictions against observed data.
- Finally, compare the accuracy of the particle filter predictions with that of the MCMC predictions.


### Hospitalization Data

```python
import pandas as pd

hosp_data = pd.read_csv('../datasets/hosp_data/hosp_04.csv')
hosp_data.tail(10)
```

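|      | Unnamed: 0 | date       | state | previous_day_admission_influenza_confirmed |
|-----:|-----------:|:-----------|:------|-------------------------------------------:|
| 1508 | 73455      | 2024-04-18 | AZ    | 10.0                                       |
| 1509 | 67347      | 2024-04-19 | AZ    | 11.0                                       |
| 1510 | 70735      | 2024-04-20 | AZ    | 11.0                                       |
| 1511 | 65634      | 2024-04-21 | AZ    | 16.0                                       |
| 1512 | 68359      | 2024-04-22 | AZ    | 19.0                                       |
| 1513 | 70383      | 2024-04-23 | AZ    | 15.0                                       |
| 1514 | 74168      | 2024-04-24 | AZ    | 11.0                                       |
| 1515 | 66431      | 2024-04-25 | AZ    | 13.0                                       |
| 1516 | 71719      | 2024-04-26 | AZ    | 13.0                                       |
| 1517 | 73926      | 2024-04-27 | AZ    | 19.0                                       |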
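These counts are daily, while the hospitalization forecasts in this notebook target dates one week apart (`reference_date` 2023-10-21, `target_end_date` 2023-10-28, both Saturdays). To compare forecasts with observations, the daily counts would need to be summed over the same weeks. The sketch below is a minimal version of that step under two assumptions to check against the actual target definition: weekly targets are Sunday-to-Saturday totals, and `previous_day_admission_influenza_confirmed` on date d counts admissions from day d - 1 (so dates are shifted back one day). `weekly_admissions` is a hypothetical helper name.

```python
import pandas as pd


def weekly_admissions(hosp: pd.DataFrame, state: str) -> pd.Series:
    """Sum one state's daily admissions into weeks ending on Saturday.

    Assumption: the value reported on date d describes admissions on d - 1,
    so every date is shifted back by one day before aggregating.
    """
    col = "previous_day_admission_influenza_confirmed"
    df = hosp.loc[hosp["state"] == state, ["date", col]].copy()
    df["date"] = pd.to_datetime(df["date"]) - pd.Timedelta(days=1)
    # 'W-SAT' bins are closed and labelled on the right: Sunday through Saturday,
    # labelled by the Saturday that ends the week.
    weekly = df.resample("W-SAT", on="date")[col].sum()
    return weekly.rename("weekly_admissions")


if __name__ == "__main__":
    demo = pd.DataFrame({
        "date": pd.date_range("2024-04-15", periods=14, freq="D").strftime("%Y-%m-%d"),
        "state": "AZ",
        "previous_day_admission_influenza_confirmed": [float(v) for v in range(14)],
    })
    print(weekly_admissions(demo, "AZ"))
```

With the notebook data this would be called as `weekly_admissions(hosp_data, "AZ")`. Weeks at either end of the series may be partial and should be dropped before scoring.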


After running the particle filter, we have the inferred beta at each time step (averaged over particles, as the file name `04_average_beta.csv` suggests):

```python
beta_data = pd.read_csv('../datasets/pf_results/04_average_beta.csv')
beta_data.head(10)
```

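|   | Unnamed: 0 | 0        |
|--:|-----------:|---------:|
| 0 | 0          | 0.128752 |
| 1 | 1          | 0.143991 |
| 2 | 2          | 0.144571 |
| 3 | 3          | 0.148769 |
| 4 | 4          | 0.148426 |
| 5 | 5          | 0.147922 |
| 6 | 6          | 0.142427 |
| 7 | 7          | 0.135995 |
| 8 | 8          | 0.136475 |
| 9 | 9          | 0.13353  |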
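The particle filter code itself isn't shown in this notebook. For readers new to the technique, here is a minimal bootstrap particle filter sketch that estimates a time-varying beta from daily admissions. It is *not* the project's implementation: the discrete-time stochastic SIR model, the random walk on log(beta), the Poisson observation model, and every parameter (`population`, `gamma`, `hosp_rate`, `sigma`, initial conditions) are illustrative assumptions. Like `04_average_beta.csv`, it returns one particle-averaged beta per day.

```python
import numpy as np


def bootstrap_pf_beta(admissions, population=7_000_000, n_particles=2000,
                      gamma=0.25, hosp_rate=0.01, sigma=0.05, seed=0):
    """Bootstrap particle filter for a time-varying transmission rate beta.

    Illustrative state-space model (assumptions, not the project's model):
      beta_t      = beta_{t-1} * exp(Normal(0, sigma))   # random walk on log(beta)
      new_inf_t   ~ Binomial(S, 1 - exp(-beta_t * I / N))
      new_rec_t   ~ Binomial(I, 1 - exp(-gamma))
      admission_t ~ Poisson(hosp_rate * new_inf_t)
    Returns the weighted particle mean of beta for each day.
    """
    rng = np.random.default_rng(seed)
    N = population  # assumed, roughly Arizona-sized
    # Initial particles: a small seeded outbreak and a spread of beta guesses.
    S = np.full(n_particles, N - 100, dtype=np.int64)
    I = np.full(n_particles, 100, dtype=np.int64)
    beta = rng.uniform(0.1, 0.5, n_particles)
    beta_mean = np.empty(len(admissions))

    for t, y in enumerate(admissions):
        # 1. Propagate each particle through the transition model.
        beta = beta * np.exp(rng.normal(0.0, sigma, n_particles))
        new_inf = rng.binomial(S, 1 - np.exp(-beta * I / N))
        new_rec = rng.binomial(I, 1 - np.exp(-gamma))
        S = S - new_inf
        I = I + new_inf - new_rec

        # 2. Weight particles by the Poisson likelihood of today's admissions
        #    (the log(y!) term is the same for every particle, so it is dropped).
        lam = np.maximum(hosp_rate * new_inf, 1e-9)
        log_w = y * np.log(lam) - lam
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        beta_mean[t] = np.sum(w * beta)

        # 3. Resample particles in proportion to their weights.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        S, I, beta = S[idx], I[idx], beta[idx]

    return beta_mean
```

A call such as `bootstrap_pf_beta(hosp_data['previous_day_admission_influenza_confirmed'].to_numpy())` would produce a series shaped like the table above; missing values would have to be handled first, since the sketch does not.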


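### Trend Forecasting

We had a pre-existing R script that forecasts beta up to 28 days into the future. Below is its output for location `04` (Arizona) with forecast date 2023-10-21: each row is one sampled beta trajectory, and columns `d01` through `d28` are days ahead.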

```python
beta_forecast = pd.read_csv('../datasets/beta_forecast_output/04/2023-10-21/out_logit-beta_trj_rnorm.csv')
beta_forecast.head()
```

|   | d01      | d02      | d03      | d04      | d05      | d06      | d07      | d08      | d09      | d10      | ... | d19      | d20      | d21      | d22      | d23      | d24      | d25      | d26      | d27      | d28      |
|--:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|:---:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|
| 0 | 0.131945 | 0.131891 | 0.139822 | 0.138122 | 0.137549 | 0.144387 | 0.142999 | 0.134766 | 0.132679 | 0.132662 | ... | 0.140529 | 0.138735 | 0.135249 | 0.137083 | 0.134792 | 0.134906 | 0.135586 | 0.131542 | 0.14033  | 0.142688 |
| 1 | 0.129465 | 0.136867 | 0.137794 | 0.135342 | 0.139278 | 0.141691 | 0.143023 | 0.143377 | 0.143149 | 0.14045  | ... | 0.138486 | 0.13757  | 0.142695 | 0.142045 | 0.143323 | 0.142983 | 0.142884 | 0.149308 | 0.146088 | 0.152211 |
| 2 | 0.127714 | 0.13291  | 0.134148 | 0.135436 | 0.13707  | 0.13434  | 0.133643 | 0.13044  | 0.12851  | 0.133441 | ... | 0.135457 | 0.142364 | 0.140797 | 0.135937 | 0.139404 | 0.140197 | 0.141466 | 0.144062 | 0.142385 | 0.146091 |
| 3 | 0.133414 | 0.135189 | 0.139751 | 0.139685 | 0.136467 | 0.141228 | 0.143515 | 0.143042 | 0.141554 | 0.137035 | ... | 0.138632 | 0.138178 | 0.135432 | 0.137949 | 0.136347 | 0.131745 | 0.13479  | 0.142474 | 0.140553 | 0.144844 |
| 4 | 0.127421 | 0.12998  | 0.134134 | 0.1358   | 0.136086 | 0.133169 | 0.130732 | 0.128726 | 0.132606 | 0.130459 | ... | 0.145264 | 0.145604 | 0.144135 | 0.141382 | 0.132817 | 0.141907 | 0.136012 | 0.142408 | 0.151633 | 0.142166 |
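The R script isn't reproduced here. As a rough illustration only, the Python sketch below produces output shaped like this file (rows are sampled trajectories, columns `d01` to `d28`). It fits a least-squares line to the most recent values of logit(beta), extends it 28 days, and adds a Gaussian random walk on the logit scale before transforming back. The logit scale and normal noise are guesses prompted by `logit` and `rnorm` in the file name; the window length, noise level, and function names are hypothetical, and the real script may work quite differently.

```python
import numpy as np
import pandas as pd


def logit(p):
    return np.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + np.exp(-x))


def forecast_beta_trajectories(beta, horizon=28, n_traj=100, window=14,
                               noise_sd=0.02, seed=0):
    """Sketch: linear trend on logit(beta) plus Gaussian random-walk noise.

    Assumed method (not the project's R script): fit a line to the last
    `window` values of logit(beta), extend it `horizon` days, and add
    cumulative Normal(0, noise_sd) shocks for each sampled trajectory.
    """
    rng = np.random.default_rng(seed)
    y = logit(np.asarray(beta, dtype=float)[-window:])
    t = np.arange(window)
    slope, intercept = np.polyfit(t, y, deg=1)  # highest degree first

    future_t = np.arange(window, window + horizon)
    trend = intercept + slope * future_t                      # shape (horizon,)
    shocks = rng.normal(0.0, noise_sd, size=(n_traj, horizon))
    paths = trend + np.cumsum(shocks, axis=1)                 # shape (n_traj, horizon)

    columns = [f"d{h:02d}" for h in range(1, horizon + 1)]
    return pd.DataFrame(inv_logit(paths), columns=columns)
```

With the particle filter output loaded above, the beta column is named `'0'`, so a call would look like `forecast_beta_trajectories(beta_data['0'])`. Working on the logit scale keeps every sampled beta strictly between 0 and 1.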


### Hospitalization Forecasting

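Then, we use the forecast beta trajectories to predict hospitalizations. The results are stored as quantiles: for each `target_end_date`, `output_type_id` is the quantile level and `value` is the predicted number of hospital admissions at that level.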

```python
hosp_forecast = pd.read_csv('../datasets/hosp_forecasts/2023-10-21-PF-flu-predictions.csv')
hosp_forecast = hosp_forecast[['reference_date', 'target_end_date', 'output_type_id', 'value']]
hosp_forecast.head()
```

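|   | reference_date | target_end_date | output_type_id | value |
|--:|:---------------|:----------------|---------------:|------:|
| 0 | 2023-10-21     | 2023-10-28      | 0.01           | 16    |
| 1 | 2023-10-21     | 2023-10-28      | 0.025          | 21    |
| 2 | 2023-10-21     | 2023-10-28      | 0.05           | 25    |
| 3 | 2023-10-21     | 2023-10-28      | 0.1            | 32    |
| 4 | 2023-10-21     | 2023-10-28      | 0.15           | 36    |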
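The projection code isn't shown in this notebook either. Here is a hedged sketch of how beta trajectories could be turned into a quantile table like the one above, under stated assumptions. Each trajectory drives a deterministic discrete-time SIR model from a fixed starting state. A fixed fraction `hosp_rate` of new infections is admitted. Column `d01` is the day after `reference_date`. Daily admissions are summed into 7-day totals. Each week is then summarized at a 23-level quantile set, assumed here, whose first five levels match the `output_type_id` values shown. The function names and all model parameters are hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed quantile levels (23 in total); the first five match the
# output_type_id values shown in the table above.
QUANTILES = np.array([0.01, 0.025, *np.round(np.arange(0.05, 0.951, 0.05), 2), 0.975, 0.99])


def project_admissions(beta_traj, S0, I0, population, gamma=0.25, hosp_rate=0.01):
    """Deterministic SIR driven by each beta trajectory (rows = trajectories).

    Returns daily new admissions with the same shape as `beta_traj`.
    """
    beta_traj = np.asarray(beta_traj, dtype=float)
    S = np.full(beta_traj.shape[0], float(S0))
    I = np.full(beta_traj.shape[0], float(I0))
    admissions = np.empty_like(beta_traj)
    for d in range(beta_traj.shape[1]):
        new_inf = S * (1 - np.exp(-beta_traj[:, d] * I / population))
        new_rec = I * (1 - np.exp(-gamma))
        S, I = S - new_inf, I + new_inf - new_rec
        admissions[:, d] = hosp_rate * new_inf
    return admissions


def to_quantile_table(admissions, reference_date):
    """Sum daily admissions into 7-day totals and summarize each week as quantiles."""
    n_weeks = admissions.shape[1] // 7
    weekly = admissions[:, : n_weeks * 7].reshape(len(admissions), n_weeks, 7).sum(axis=2)
    ref = pd.Timestamp(reference_date)
    rows = []
    for w in range(n_weeks):
        values = np.quantile(weekly[:, w], QUANTILES)
        target = ref + pd.Timedelta(weeks=w + 1)
        for q, v in zip(QUANTILES, values):
            rows.append({"reference_date": ref.date().isoformat(),
                         "target_end_date": target.date().isoformat(),
                         "output_type_id": float(q), "value": int(round(v))})
    return pd.DataFrame(rows)
```

With the forecast file above, the inputs would be the `d01` to `d28` columns of `beta_forecast`. In a real pipeline, the starting `S0` and `I0` would presumably come from the particle filter's final state rather than being fixed by hand.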


## Parallel Processing on Monsoon
This pipeline had to be run for 52 locations and 25 forecast dates per location, or 1,300 runs in total.

Because each location/date run is independent of the others, I had Monsoon
process all of a location's dates in parallel.
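The Monsoon job scripts aren't included here. As a hedged, single-machine sketch of the same idea, the snippet below runs all of one location's dates concurrently with `concurrent.futures` and handles locations one after another. `run_pipeline` is a hypothetical placeholder for the particle filter, beta forecast, and hospitalization forecast steps for one (location, date) pair. On a cluster, each location would more typically be submitted as its own scheduler job.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def run_pipeline(location: str, date: str) -> tuple[str, str]:
    """Hypothetical placeholder: PF -> beta forecast -> hospitalization forecast."""
    # The real work for one (location, date) pair would go here.
    return location, date


def run_location(location: str, dates: list[str], backend: str = "process",
                 max_workers: int = 25) -> list[tuple[str, str]]:
    """Run every date for one location concurrently; the runs are independent.

    'process' suits CPU-bound Python work; 'thread' is enough when each run
    mostly waits on an external program (for example an R script).
    """
    executor_cls = {"process": ProcessPoolExecutor, "thread": ThreadPoolExecutor}[backend]
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as `dates`.
        return list(pool.map(run_pipeline, [location] * len(dates), dates))


if __name__ == "__main__":
    dates = ["2023-10-14", "2023-10-21", "2023-10-28"]
    for location in ["04", "06"]:  # in practice, all 52 locations
        print(run_location(location, dates))
```

The `if __name__ == "__main__":` guard matters for the process backend. On platforms that start worker processes by re-importing the main module, unguarded top-level code would run again in every worker.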

### Results:
- Total runtime: 331.32 minutes (about 5.5 hours).
- Average runtime per location: 6.37 minutes (331.32 minutes / 52 locations).

## Next Steps
- Score forecast accuracy using the Weighted Interval Score (WIS).
- Compare these scores with the MCMC results.
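For the planned scoring step, here is a minimal sketch of the Weighted Interval Score for one quantile forecast, following the standard definition. Let m be the median and [l_k, u_k] be K central prediction intervals at levels 1 - α_k. Then WIS = (1 / (K + 1/2)) · (½·|y - m| + Σ_k (α_k / 2)·IS_k). Here IS_k = (u_k - l_k) + (2/α_k)·max(l_k - y, 0) + (2/α_k)·max(y - u_k, 0). The function name and the dict input are choices made for this sketch. It assumes the quantile levels are a 0.5 median plus symmetric pairs, as in the forecast table above.

```python
import numpy as np


def weighted_interval_score(quantiles: dict[float, float], y: float) -> float:
    """Weighted Interval Score (WIS) of one quantile forecast against observation y.

    `quantiles` maps quantile level -> predicted value. Levels must be a median
    (0.5) plus symmetric pairs (tau, 1 - tau), e.g. {0.025: 21, 0.5: 40, 0.975: 70}.
    """
    levels = sorted(quantiles)
    K = len(levels) // 2                       # number of central intervals
    if len(levels) % 2 == 0 or not np.isclose(levels[K], 0.5):
        raise ValueError("quantile levels must include the median 0.5")

    # Median term: weight w0 = 1/2 on the absolute error.
    total = 0.5 * abs(y - quantiles[levels[K]])
    for k in range(K):
        tau_lo, tau_hi = levels[k], levels[-1 - k]
        if not np.isclose(tau_lo + tau_hi, 1.0):
            raise ValueError("quantile levels must be symmetric around 0.5")
        alpha = 2 * tau_lo                     # [lower, upper] is the central (1 - alpha) interval
        lower, upper = quantiles[tau_lo], quantiles[tau_hi]
        interval_score = ((upper - lower)
                          + (2 / alpha) * max(lower - y, 0)
                          + (2 / alpha) * max(y - upper, 0))
        total += (alpha / 2) * interval_score  # interval weight w_k = alpha_k / 2
    return total / (K + 0.5)
```

For example, `weighted_interval_score({0.25: 2, 0.5: 5, 0.75: 8}, 10)` gives 4.0. To compare methods, this score would be computed for every location, date, and target, and then the mean WIS of the particle filter forecasts compared with that of the MCMC forecasts. Lower WIS is better.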
