# Project Title: Time Series Analysis of Pennsylvania and Illinois Weather, Energy Consumption, and Flu Contagion (2013–2015)

## Introduction

In this project, we aim to identify and explore temporal patterns, correlations, and potential causal relationships between weather conditions, energy consumption, and flu contagion across Pennsylvania and Illinois from 2013 to 2015. By analyzing these interrelated datasets, we hope to derive insights that can inform stakeholders in public health and energy management.

### Objectives
- Analyze daily/hourly weather data to identify trends and seasonal patterns.
- Examine hourly energy consumption data to understand usage patterns in relation to weather.
- Investigate weekly flu contagion data to determine correlations with weather and energy consumption.
- Utilize statistical modeling techniques to assess interdependencies among the datasets.
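A lagged correlation is one simple, non-causal first look at such interdependencies. You correlate one weekly series with another that has been shifted forward by k weeks, then check which lag gives the strongest relationship. The sketch below runs on synthetic data. The helper `lagged_correlations` and the series `temp_weekly` and `flu_level` are hypothetical placeholders, not columns from the project's datasets.

```python
import numpy as np
import pandas as pd


def lagged_correlations(x: pd.Series, y: pd.Series, max_lag: int) -> pd.Series:
    """Pearson correlation between x(t) and y(t + lag) for lag = 0..max_lag."""
    # y.shift(-lag) pulls later values of y back so they line up with x(t);
    # Series.corr ignores the NaN pairs created by the shift.
    return pd.Series(
        {lag: x.corr(y.shift(-lag)) for lag in range(max_lag + 1)},
        name="pearson_r",
    )


# Synthetic weekly data (weeks ending on Saturday): a yearly cycle plus noise,
# and a hypothetical "flu" series mirroring the negated temperature two weeks later.
rng = np.random.default_rng(0)
weeks = pd.date_range("2013-01-05", periods=156, freq="W-SAT")
season = np.sin(2 * np.pi * np.arange(156) / 52)
temp_weekly = pd.Series(season + rng.normal(0, 0.5, size=156), index=weeks)
flu_level = -temp_weekly.shift(2) + rng.normal(0, 0.01, size=156)

corrs = lagged_correlations(temp_weekly, flu_level, max_lag=6)
print(corrs.round(2))
print("strongest lag (weeks):", corrs.abs().idxmax())
```

In the project, `x` and `y` would be weekly aggregates aligned on the same week-ending dates. A strong correlation at some lag is consistent with a causal link but does not prove one.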

### Key Datasets
1. **USA Weather Dataset (2013–2015)**:
   - Temporal granularity: Hourly
   - Variable of interest: temperature (in kelvins), for Pittsburgh (Pennsylvania) and Chicago (Illinois).

2. **PJM Historic Energy Consumption**:
   - Temporal granularity: Hourly
   - Scope: Energy usage data for Pennsylvania (Duquesne Light) and Illinois (ComEd).

3. **Flu Contagion Dataset by State (2013–2015)**:
   - Temporal granularity: Weekly
   - Variables: influenza-like illness (ILI) [activity](https://www.cdc.gov/mmwr/volumes/67/wr/mm6722a4.htm) level for each US state (plus jurisdictions such as New York City), from Level 1 (Minimal) to Level 10 (High).

This notebook will guide you through the data ingestion, preparation, exploratory data analysis (EDA), time series modeling, and visualization phases of the project.

In [1]:
# Data Manipulation and Analysis
import pandas as pd  # For data manipulation and analysis using DataFrames
import numpy as np   # For numerical operations and handling arrays

# Statistical Analysis
import scipy.stats as stats  # For statistical tests and distributions
from statsmodels.tsa.stattools import adfuller  # For stationarity tests

# Time Series Analysis
from statsmodels.tsa.arima.model import ARIMA  # For ARIMA modeling
from statsmodels.tsa.seasonal import seasonal_decompose  # For seasonal decomposition of time series

# Machine Learning Libraries
from sklearn.model_selection import train_test_split  # For splitting datasets into training and testing sets
from sklearn.ensemble import RandomForestRegressor  # For regression tasks
import xgboost as xgb  # For gradient boosting models

# Visualization Libraries
import matplotlib.pyplot as plt  # For creating static visualizations
import seaborn as sns            # For enhanced statistical visualizations
import plotly.express as px      # For interactive plots (optional)

# Date and Time Handling
from datetime import datetime     # For date/time manipulation


## Data Ingestion/Wrangling

We begin by loading the relevant datasets into DataFrames.

The weather dataset has hourly granularity and covers many cities, but we only need one city for each state of interest: Pittsburgh (Pennsylvania) and Chicago (Illinois). This dataset defines the time frame of our study. Since we are interested in its influence on energy consumption and flu activity, we keep only the temperature, which is given in kelvins.
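The temperature CSV is wide, with one column per city. Its folder name indicates that it spans 2012 to 2017, while the study covers 2013–2015. The minimal sketch below shows one way to prepare it. It assumes we index by timestamp, restrict the data to the study window, and convert kelvins to degrees Celsius for readability. The conversion is our own choice and is not required by the dataset. The helper `prepare_temperature` and the sample values are hypothetical.

```python
import pandas as pd


def prepare_temperature(df: pd.DataFrame, cities: list, start: str, end: str) -> pd.DataFrame:
    """Index by datetime, keep the given city columns, restrict to [start, end], convert K to °C."""
    out = df.copy()
    out["datetime"] = pd.to_datetime(out["datetime"])
    out = out.set_index("datetime").sort_index()
    # Partial-string slicing on a sorted DatetimeIndex includes the whole end day.
    out = out.loc[start:end, cities]
    return out - 273.15  # kelvins to degrees Celsius


# Tiny synthetic frame with the same wide layout as temperature.csv (values are made up).
raw = pd.DataFrame({
    "datetime": ["2012-12-31 23:00:00", "2013-01-01 00:00:00",
                 "2015-12-31 23:00:00", "2016-01-01 00:00:00"],
    "Pittsburgh": [270.15, 271.15, 273.15, 274.15],
    "Chicago": [268.15, 269.15, 272.15, 275.15],
    "Boston": [280.0, 280.0, 280.0, 280.0],
})
celsius = prepare_temperature(raw, ["Pittsburgh", "Chicago"], "2013-01-01", "2015-12-31")
print(celsius)
```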

In [20]:
path = r'sources/historical_hourly_weather_data- 2012_to_2017/temperature.csv'
df = pd.read_csv(path)
display(df)

| | datetime | Vancouver | Portland | San Francisco | Seattle | Los Angeles | San Diego | Las Vegas | Phoenix | Albuquerque | ... | Philadelphia | New York | Montreal | Boston | Beersheba | Tel Aviv District | Eilat | Haifa | Nahariyya | Jerusalem |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2012-10-01 12:00:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | 309.100000 | NaN | NaN | NaN |
| 1 | 2012-10-01 13:00:00 | 284.630000 | 282.080000 | 289.480000 | 281.800000 | 291.870000 | 291.530000 | 293.410000 | 296.600000 | 285.120000 | ... | 285.630000 | 288.220000 | 285.830000 | 287.170000 | 307.590000 | 305.470000 | 310.580000 | 304.4 | 304.4 | 303.5 |
| 2 | 2012-10-01 14:00:00 | 284.629041 | 282.083252 | 289.474993 | 281.797217 | 291.868186 | 291.533501 | 293.403141 | 296.608509 | 285.154558 | ... | 285.663208 | 288.247676 | 285.834650 | 287.186092 | 307.590000 | 304.310000 | 310.495769 | 304.4 | 304.4 | 303.5 |
| 3 | 2012-10-01 15:00:00 | 284.626998 | 282.091866 | 289.460618 | 281.789833 | 291.862844 | 291.543355 | 293.392177 | 296.631487 | 285.233952 | ... | 285.756824 | 288.326940 | 285.847790 | 287.231672 | 307.391513 | 304.281841 | 310.411538 | 304.4 | 304.4 | 303.5 |
| 4 | 2012-10-01 16:00:00 | 284.624955 | 282.100481 | 289.446243 | 281.782449 | 291.857503 | 291.553209 | 293.381213 | 296.654466 | 285.313345 | ... | 285.850440 | 288.406203 | 285.860929 | 287.277251 | 307.145200 | 304.238015 | 310.327308 | 304.4 | 304.4 | 303.5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 45248 | 2017-11-29 20:00:00 | NaN | 282.000000 | NaN | 280.820000 | 293.550000 | 292.150000 | 289.540000 | 294.710000 | 285.720000 | ... | 290.240000 | NaN | 275.130000 | 288.080000 | NaN | NaN | NaN | NaN | NaN | NaN |
| 45249 | 2017-11-29 21:00:00 | NaN | 282.890000 | NaN | 281.650000 | 295.680000 | 292.740000 | 290.610000 | 295.590000 | 286.450000 | ... | 289.240000 | NaN | 274.130000 | 286.020000 | NaN | NaN | NaN | NaN | NaN | NaN |
| 45250 | 2017-11-29 22:00:00 | NaN | 283.390000 | NaN | 282.750000 | 295.960000 | 292.580000 | 291.340000 | 296.250000 | 286.440000 | ... | 286.780000 | NaN | 273.480000 | 283.940000 | NaN | NaN | NaN | NaN | NaN | NaN |
| 45251 | 2017-11-29 23:00:00 | NaN | 283.020000 | NaN | 282.960000 | 295.650000 | 292.610000 | 292.150000 | 297.150000 | 286.140000 | ... | 284.570000 | NaN | 272.480000 | 282.170000 | NaN | NaN | NaN | NaN | NaN | NaN |


In [23]:
df_temperature = df[['datetime', 'Pittsburgh', 'Chicago']].copy()
display(df_temperature)
# df_temperature.to_csv('temperature.csv')

| | datetime | Pittsburgh | Chicago |
|---|---|---|---|
| 0 | 2012-10-01 12:00:00 | NaN | NaN |
| 1 | 2012-10-01 13:00:00 | 281.000000 | 284.010000 |
| 2 | 2012-10-01 14:00:00 | 281.024767 | 284.054691 |
| 3 | 2012-10-01 15:00:00 | 281.088319 | 284.177412 |
| 4 | 2012-10-01 16:00:00 | 281.151870 | 284.300133 |
| ... | ... | ... | ... |
| 45248 | 2017-11-29 20:00:00 | 285.300000 | 281.340000 |
| 45249 | 2017-11-29 21:00:00 | 285.330000 | 281.690000 |
| 45250 | 2017-11-29 22:00:00 | 282.910000 | 281.070000 |
| 45251 | 2017-11-29 23:00:00 | 280.140000 | 280.060000 |
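The first row is NaN for both cities, so missing readings need to be handled before modeling. The minimal sketch below assumes the frame has already been indexed by its `datetime` column, because `method="time"` requires a DatetimeIndex. It first counts the gaps per city. It then interpolates linearly in time, but only between valid readings (`limit_area="inside"`), so leading and trailing NaNs are left for the caller to drop or inspect. The helper name and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def report_and_fill(hourly: pd.DataFrame) -> tuple:
    """Return the NaN count per column and a copy with interior gaps time-interpolated."""
    missing = hourly.isna().sum()
    # "time" weights the interpolation by the actual timestamps; "inside" leaves NaNs
    # at the start/end of each column untouched, since they are not bracketed by valid readings.
    filled = hourly.interpolate(method="time", limit_area="inside")
    return missing, filled


idx = pd.date_range("2013-01-01", periods=6, freq="60min")
hourly = pd.DataFrame(
    {
        "Pittsburgh": [np.nan, 270.0, np.nan, 272.0, 273.0, np.nan],
        "Chicago": [268.0, np.nan, np.nan, 271.0, 272.0, 273.0],
    },
    index=idx,
)
missing, filled = report_and_fill(hourly)
print(missing)
print(filled)
```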


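Next, we load the flu dataset. Note that its granularity is weekly rather than hourly, so the weather and energy series will have to be aggregated to weeks before they can be compared with it.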

In [24]:
path = r'sources/fluview/StateDatabySeason55_54,53,52,57,56.csv'  # forward slashes work on Windows, macOS and Linux
df = pd.read_csv(path)
display(df)

| | STATENAME | URL | WEBSITE | ACTIVITY LEVEL | ACTIVITY LEVEL LABEL | WEEKEND | WEEK | SEASON |
|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | http://adph.org/influenza/ | Influenza Surveillance | Level 1 | Minimal | Jun-10-2017 | 23 | 2016-17 |
| 1 | Alabama | http://adph.org/influenza/ | Influenza Surveillance | Level 10 | High | Mar-25-2017 | 12 | 2016-17 |
| 2 | Alabama | http://adph.org/influenza/ | Influenza Surveillance | Level 9 | High | Apr-01-2017 | 13 | 2016-17 |
| 3 | Alabama | http://adph.org/influenza/ | Influenza Surveillance | Level 4 | Low | Apr-08-2017 | 14 | 2016-17 |
| 4 | Alabama | http://adph.org/influenza/ | Influenza Surveillance | Level 3 | Minimal | Apr-15-2017 | 15 | 2016-17 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16826 | New York City | http://www1.nyc.gov/site/doh/providers/health-... | Surveillance Data | Level 8 | High | Dec-22-2012 | 51 | 2012-13 |
| 16827 | New York City | http://www1.nyc.gov/site/doh/providers/health-... | Surveillance Data | Level 1 | Minimal | Oct-26-2013 | 43 | 2013-14 |
| 16828 | New York City | http://www1.nyc.gov/site/doh/providers/health-... | Surveillance Data | Level 1 | Minimal | Oct-19-2013 | 42 | 2013-14 |
| 16829 | New York City | http://www1.nyc.gov/site/doh/providers/health-... | Surveillance Data | Level 1 | Minimal | Oct-12-2013 | 41 | 2013-14 |
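Two columns need parsing before the flu data can be used as a time series. `ACTIVITY LEVEL` is a string such as `Level 10`, and `WEEKEND` is a date string such as `Jun-10-2017`. The rows are also not in chronological order. The minimal sketch below handles all three issues. The new column name `ACTIVITY_LEVEL_NUM` and the helper `parse_flu` are hypothetical.

```python
import pandas as pd


def parse_flu(df: pd.DataFrame) -> pd.DataFrame:
    """Add a numeric activity level, parse week-ending dates, and sort chronologically per state."""
    out = df.copy()
    # "Level 10" -> 10
    out["ACTIVITY_LEVEL_NUM"] = (
        out["ACTIVITY LEVEL"].str.extract(r"(\d+)", expand=False).astype(int)
    )
    # "Jun-10-2017" -> Timestamp("2017-06-10")
    out["WEEKEND"] = pd.to_datetime(out["WEEKEND"], format="%b-%d-%Y")
    return out.sort_values(["STATENAME", "WEEKEND"]).reset_index(drop=True)


# Two Alabama rows copied from the output above.
sample = pd.DataFrame(
    {
        "STATENAME": ["Alabama", "Alabama"],
        "ACTIVITY LEVEL": ["Level 1", "Level 10"],
        "ACTIVITY LEVEL LABEL": ["Minimal", "High"],
        "WEEKEND": ["Jun-10-2017", "Mar-25-2017"],
    }
)
parsed = parse_flu(sample)
print(parsed[["STATENAME", "WEEKEND", "ACTIVITY_LEVEL_NUM"]])
```

Both week-ending dates in this sample fall on a Saturday.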


In [25]:
df_flu = df[df['STATENAME'].isin(['Pennsylvania', 'Illinois'])].copy()  # keep only the two states of interest
display(df_flu)
# df_flu.to_csv('flu.csv')

| | STATENAME | URL | WEBSITE | ACTIVITY LEVEL | ACTIVITY LEVEL LABEL | WEEKEND | WEEK | SEASON |
|---|---|---|---|---|---|---|---|---|
| 4057 | Illinois | http://www.dph.illinois.gov/topics-services/di... | Seasonal Influenza Surveillance Reports | Level 10 | High | Dec-23-2017 | 51 | 2017-18 |
| 4058 | Illinois | http://www.dph.illinois.gov/topics-services/di... | Seasonal Influenza Surveillance Reports | Level 6 | Moderate | Dec-16-2017 | 50 | 2017-18 |
| 4059 | Illinois | http://www.dph.illinois.gov/topics-services/di... | Seasonal Influenza Surveillance Reports | Level 3 | Minimal | Dec-09-2017 | 49 | 2017-18 |
| 4060 | Illinois | http://www.dph.illinois.gov/topics-services/di... | Seasonal Influenza Surveillance Reports | Level 3 | Minimal | Dec-02-2017 | 48 | 2017-18 |
| 4061 | Illinois | http://www.dph.illinois.gov/topics-services/di... | Seasonal Influenza Surveillance Reports | Level 1 | Minimal | Nov-25-2017 | 47 | 2017-18 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12188 | Pennsylvania | https://www.health.pa.gov/topics/disease/Flu/P... | Influenza Weekly Report | Level 1 | Minimal | Jul-15-2017 | 28 | 2016-17 |
| 12189 | Pennsylvania | https://www.health.pa.gov/topics/disease/Flu/P... | Influenza Weekly Report | Level 1 | Minimal | Jul-22-2017 | 29 | 2016-17 |
| 12190 | Pennsylvania | https://www.health.pa.gov/topics/disease/Flu/P... | Influenza Weekly Report | Level 1 | Minimal | May-06-2017 | 18 | 2016-17 |
| 12191 | Pennsylvania | https://www.health.pa.gov/topics/disease/Flu/P... | Influenza Weekly Report | Level 1 | Minimal | May-13-2017 | 19 | 2016-17 |
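To compare the hourly temperatures with the weekly flu levels, the hourly series must be aggregated to the same weeks. The minimal sketch below makes two assumptions. First, as the Saturday `WEEKEND` dates suggest, each flu week runs from Sunday through Saturday. Second, the temperature frame is indexed by timestamp. The sketch maps each hour to the Saturday that ends its week, averages per week, and joins on `WEEKEND`. The helper name and the temperature values are hypothetical, and `ACTIVITY_LEVEL_NUM` is assumed to be a numeric version of `ACTIVITY LEVEL`.

```python
import numpy as np
import pandas as pd


def weekly_mean_by_week_ending(hourly: pd.DataFrame) -> pd.DataFrame:
    """Average hourly readings over Sunday-Saturday weeks, labelled by the Saturday ending each week."""
    idx = hourly.index
    # dayofweek: Monday=0 ... Saturday=5, Sunday=6, so this is the number of days to the next Saturday.
    days_to_saturday = (5 - idx.dayofweek) % 7
    week_end = idx.normalize() + pd.to_timedelta(days_to_saturday, unit="D")
    weekly = hourly.groupby(week_end).mean()
    weekly.index.name = "WEEKEND"
    return weekly


# Two synthetic weeks of hourly Chicago temperatures (Sun 2017-12-10 to Sat 2017-12-23).
hours = pd.date_range("2017-12-10 00:00", "2017-12-23 23:00", freq="60min")
hourly = pd.DataFrame(
    {"Chicago": np.where(hours < pd.Timestamp("2017-12-17"), 270.0, 280.0)}, index=hours
)
# Illinois activity levels for the same weeks, taken from the output above.
flu_il = pd.DataFrame(
    {"WEEKEND": pd.to_datetime(["2017-12-16", "2017-12-23"]), "ACTIVITY_LEVEL_NUM": [6, 10]}
)

merged = weekly_mean_by_week_ending(hourly).reset_index().merge(flu_il, on="WEEKEND", how="inner")
print(merged)
```

The week-ending date is computed explicitly rather than with `resample("W-SAT")` on the raw hourly index. Weekly resample bins are closed on the right at midnight by default, so that approach would assign Saturday's 01:00 to 23:00 readings to the following week.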
