# Statistical Analysis of Rate Increases

Our hypothesis is that the electricity rates in coal-heavy Appalachian states have been increasing at a (statistically) faster rate than in other US states. Presumably, states that have pivoted to renewables and gas have had more stable rates.

To test this hypothesis, we want to see if there is a correlation between coal consumption and electricity rates over time, using a time-series regression.

## Regional Trends

First, we visually inspect the rate trends. Different regions of the country have experienced characteristic patterns of rate changes since 2003. We cluster the states into the 8 regions below:

![](figs/regions/newengland.png)

The New England states generally experienced a large spike in rates around 2009, followed by a steady decline, then another spike around 2014 (possibly due to unstable gas prices?).

![](figs/regions/midatlantic.png)

Maryland, Virginia, and DC saw huge rate increases from 2004-2009, while the other Mid-Atlantic states have been steadily increasing.

![](figs/regions/appalachia.png)

The Appalachian states saw a price spike in 2009, followed by a steady increase through the present.

![](figs/regions/greatlakes.png)

The Great Lakes region (eastern Midwest) experienced a smaller spike in 2009, followed by various fluctuations. 

![](figs/regions/greatplains.png)

The Great Plains states (western Midwest) have very similar rates, steadily increasing since 2003.

![](figs/regions/southerngas.png)

Gulf Coast/Southeast states saw two spikes: first in 2006 and then in 2009. These are all states where gas makes up the majority of electricity generation. Curiously, Texas has seen a steady decline since 2009.

![](figs/regions/mountain.png)

The Mountain states have all seen rate increases over time, except for Nevada, perhaps due to its switch from coal to natural gas (to be investigated further).

![](figs/regions/pacific.png)

Pacific states (excluding Hawaii) have seen gradual rate increases with few fluctuations. The Pacific Northwest states have low rates due to hydro.

## Average Rate Increases

As a simple initial analysis, we use linear regression to find the average change in electricity rates in each state. For some states (e.g., those in the Mid-Atlantic), a linear fit is clearly a poor choice for modeling rate trends. Nonetheless, it provides a reasonable first approximation of the magnitude of rate increases in each state.

We constrain the time period to 2009 onward, to avoid any influence from the 2008 recession and to focus on the recent uptake of renewables. Of the 50 states, 35 have R^2 values >0.6 for the linear fit.
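As a minimal sketch of the per-state fit described above, the snippet below estimates the average yearly rate change and the R^2 of a straight-line fit. The function name `average_rate_change` and the sample rates are hypothetical; the actual analysis code may use a different routine.

```python
import numpy as np

def average_rate_change(years, rates):
    """Fit rates = slope * t + intercept by ordinary least squares.

    Returns (slope, r_squared), where slope is the average change per year.
    """
    years = np.asarray(years, dtype=float)
    rates = np.asarray(rates, dtype=float)
    t = years - years[0]  # center on the first year for numerical stability
    slope, intercept = np.polyfit(t, rates, deg=1)
    fitted = slope * t + intercept
    ss_res = np.sum((rates - fitted) ** 2)        # residual sum of squares
    ss_tot = np.sum((rates - rates.mean()) ** 2)  # total sum of squares
    r_squared = 1.0 - ss_res / ss_tot
    return slope, r_squared

# Hypothetical rates (cents/kWh) for one state, 2009-2018; not real data.
years = np.arange(2009, 2019)
rates = 8.0 + 0.25 * (years - 2009)
slope, r2 = average_rate_change(years, rates)
```

A low R^2 for a given state is a sign that the slope is a poor summary of that state's trend.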

Below we plot the average change in electricity rates vs. % coal generation for each state in 2018:

![](figs/coal_vs_rate_change.png)

Generally, our coal-heavy states (at the top) tend to have higher rate increases (towards the right) than the rest of the US.

The p-value for this correlation is <0.05, which suggests there is a statistically significant correlation between rate increases and coal generation. Take this with a grain of salt, however, as the linear regressions used to generate the average rate changes fit some states poorly.
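For illustration, a correlation and its p-value could be computed as below. This sketch assumes a Pearson correlation, which may not be the test behind the figure above, and adds a Spearman rank correlation as a check that does not assume linearity. All values are made up.

```python
import numpy as np
from scipy import stats

# Hypothetical values for six states; not the data behind the figure.
coal_share = np.array([5.0, 12.0, 30.0, 45.0, 70.0, 90.0])    # % of generation from coal
rate_change = np.array([0.05, 0.08, 0.12, 0.15, 0.22, 0.30])  # cents/kWh per year

# Pearson: strength of the linear relationship, with a two-sided p-value.
r, p_value = stats.pearsonr(coal_share, rate_change)

# Spearman: rank-based alternative that only assumes a monotonic relationship.
rho, p_rank = stats.spearmanr(coal_share, rate_change)
```

If the two measures disagree strongly on the real data, a few outlier states are probably driving the result.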

Ignoring the states with unique circumstances (Alaska, California), there are some notable exceptions to this trend:
* South Dakota is replacing coal with wind, but still has higher rate increases than North Dakota
* Wind is also replacing the dominant generation source in the following states, but rates are still increasing: Minnesota (replacing coal), Idaho (replacing hydro), and Nebraska (replacing nuclear)
* Kansas and Massachusetts are replacing coal with gas, but still have high rate increases.

The states with the lowest rate increases are the New England, Mid-Atlantic, and southern states. That seems to make a compelling case for gas.

## Time Series Regression

The next step (IN PROGRESS) is to identify if changes in electricity rates are correlated with changes in generation mix, on a year-by-year basis. This would allow us to determine if, for example, shuttering coal for natural gas stabilizes rates, while increasing renewables has little effect. 

In other words: instead of just comparing the *average* rate change with the *2018* coal generation, I want to determine if the *time series* of rates (2009-2018) can be predicted by the *time series* of coal generation (2009-2018), as well as other generation sources.

This was inspired by the generation mix visualizations published last December in the [NYTimes](https://www.nytimes.com/interactive/2018/12/24/climate/how-electricity-generation-changed-in-your-state.html) (below). The data source is [here](https://www.eia.gov/electricity/data/state/annual_generation_state.xls), and is ingested into a dataframe in `generation_mix.py`.

![](figs/nytimes_iowa.png)
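To turn generation data into a per-state time series of generation shares, something like the following could work. This is only a sketch: the column names (`year`, `state`, `source`, `mwh`), the function `generation_shares`, and the sample numbers are all hypothetical and do not reflect what `generation_mix.py` actually does. If the raw data includes aggregate "total" rows, those would need to be dropped first.

```python
import pandas as pd

# Tiny made-up sample in long format; column names are hypothetical.
generation = pd.DataFrame({
    "year":   [2009, 2009, 2009, 2018, 2018, 2018],
    "state":  ["IA"] * 6,
    "source": ["Coal", "Wind", "Natural Gas"] * 2,
    "mwh":    [70.0, 20.0, 10.0, 45.0, 40.0, 15.0],
})

def generation_shares(df):
    """Return a (state, year) x source table of percent shares."""
    totals = df.pivot_table(index=["state", "year"], columns="source",
                            values="mwh", aggfunc="sum", fill_value=0.0)
    # Divide each row by its total generation to get shares.
    return totals.div(totals.sum(axis=1), axis=0) * 100.0

shares = generation_shares(generation)
# Coal share by year for one state, indexed by year.
coal_series = shares.xs("IA", level="state")["Coal"]
```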

The next step is to implement a time-series regression to determine how well coal and other generation sources predict rate changes. We choose the neural-network [MLPRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html) in Python's scikit-learn package, a nonlinear model, to accommodate nonlinear relationships in the rate changes.

The predictor variables (coal generation, hydro, renewables, etc.) need to be scaled (standardized), and a training set must be selected. If the model is to be used to forecast future rate changes, it should be cross-validated over several different train/test splits. More important than forecasting, however, is simply determining how well coal generation predicts rate increases.
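A rough sketch of this setup is below, using synthetic data in place of the real state-year panel. Several choices here are assumptions rather than settled decisions: holding out whole states with `GroupKFold`, the `lbfgs` solver and hidden layer size, and using permutation importance as the measure of predictive power. The feature names and the made-up target are purely illustrative.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the state-year panel: 20 "states" x 10 years.
n_states, n_years = 20, 10
features = ["coal", "gas", "hydro", "renewables"]  # hypothetical column order
X = rng.uniform(0, 100, size=(n_states * n_years, len(features)))
# Made-up target: rate change driven mostly by the coal column, plus noise.
y = 0.03 * X[:, 0] + 0.005 * X[:, 1] + rng.normal(0, 0.2, size=len(X))
groups = np.repeat(np.arange(n_states), n_years)  # state index for each row

# Standardize predictors inside the pipeline so scaling is refit on each fold.
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(16,), solver="lbfgs",
                 max_iter=1000, random_state=0),
)

# Each fold holds out entire states, so test states are unseen in training.
scores = cross_val_score(model, X, y, groups=groups,
                         cv=GroupKFold(n_splits=5), scoring="r2")

# Permutation importance: drop in score when one predictor is shuffled.
model.fit(X, y)
importance = permutation_importance(model, X, y, n_repeats=10, random_state=0)
ranked = sorted(zip(features, importance.importances_mean),
                key=lambda kv: kv[1], reverse=True)
```

Placing the scaler inside the pipeline keeps test-fold statistics out of training. With only ten years per state, a simpler linear model might be a useful baseline to compare against.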

## Specific Case Studies

I began to take some qualitative notes on how the generation mix appears to be changing in each state. States are listed in order of rate increase (highest first).
* `Coal > wind` indicates that coal is historically dominant, but wind share is increasing.
* `__` indicates states with consistently high coal.
* `++` indicates states with historically high coal but transitioning.

After performing statistical analysis on all 50 states, I want to look at specific states, or pairs of neighboring states, to compare:
* Are Illinois / Iowa doing better than the neighboring Appalachian states due to fuel diversification / deregulation?
* Is natural gas currently having the lowest rate impact on customers? How does this correlate with gas prices, and how would we expect this to change if gas prices surge?