### How we built Observ's Power Burn Forecast ###

If you care about gas fundamentals, you have probably spent much of the past three years pondering how to forecast power burn. How much demand will there be at this price level? How do I model the switching?

Some people model with a rough regression built at the national level. Some use power generation data from alternative fuels as a directional indicator. Some may even try borrowing a plant-level stack model from the power traders. Don't worry. I have tried them all, too, and it was tedious, exhausting, and not all that predictive.

But what if we take the best of each method and combine them? This analysis is about a new approach to power burn modeling: the kind of model that brings you closer to understanding the gas market.

**Part 1: Evaluate the methodologies**

The more I play with EIA data, the more I love America. To have this much data public at such granularity is amazing. This is a data scientist's wildest dream.

But it's not perfect. The API is buggy, and finding the right data series for the analysis is tricky. We combed through the electricity data but did not find any good leads.

So we sat down and debated each method:

*Regression at national level* - RISKY. Regional dynamics have evolved so much in the past five years. Even splitting by EIA storage region is too broad. Modeling at the state level is the way to go.

*Using power generation data from alternative fuel as a directional indicator* - SPOTTY. Generation from renewables, hydro, and coal is helpful in explaining some changes. But it's the same as playing Monday morning quarterback. It totally makes sense in hindsight, but I wouldn't trade on it.

*Using power traders' plant-level stack model* - COSTLY. Licensing fees for a stack model run into six figures. That doesn't even include the internal headcount to run the model. The model has way too many variables. You can generate 100-page reports using stack models. Probably useful in project finance. But trading... no.

**Then we had an epiphany.** To get the merits of the power stack model, we will look at plant-level data. To get regional signals, we will model the burn by state. To get signals from other fuels, we will make adjustments.

**Part 2: Sort through the power plants. We tried machine learning, but at this scale humans are plenty for version 1 of the model.**

```python
import plant_analysis3
```

```
Temperature for station 53910 at houston, texas fetched

ValueError: Expected object or value
```

**Part 3: Combine them with some Python magic.**

Now we have all the gas-fired plants grouped into either price-sensitive or price-insensitive. It's time to bring in other variables: 

**Temperature**: Daily temperatures for each state.

**Cash prices**: Physical settlement prices for each state, using the hub that is liquid. At least liquid enough for power plant operators to transact hedges.

**Coal prices**: Using the coal forward curve. We can spend three days arguing about how to treat coal prices and still have no consensus. But one thing is for sure - **the price-parity chart is useless.**

**Hydro**: EIA hydro generation data. We pull in BPA data, etc., to refine forecasts.

**Nuclear outages**: NERC data.

Then it's Python magic time! We make heavy use of the nearest neighbor regression functions in [scikit-learn](http://scikit-learn.org/stable/) to find the relationship between all these variables and the power burn of the two separate groups in each state.
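To make the idea concrete, here is a minimal, dependency-free sketch of nearest neighbor regression. It is not our production code: the function name `knn_predict`, the two features, and every number in the toy dataset are made up for illustration. scikit-learn's `KNeighborsRegressor` does the same job with far more options.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict by averaging the targets of the k training rows closest to `query`."""
    n_features = len(query)
    n_rows = len(train_X)
    # Standardize each feature so temperature (deg F) and price ($/MMBtu)
    # contribute on a comparable scale to the distance.
    means = [sum(row[j] for row in train_X) / n_rows for j in range(n_features)]
    stds = []
    for j in range(n_features):
        var = sum((row[j] - means[j]) ** 2 for row in train_X) / n_rows
        stds.append(math.sqrt(var) or 1.0)  # fall back to 1.0 for constant columns

    def scale(row):
        return [(row[j] - means[j]) / stds[j] for j in range(n_features)]

    q = scale(query)
    # Sort training rows by Euclidean distance to the query.
    ranked = sorted((math.dist(scale(row), q), y) for row, y in zip(train_X, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Made-up daily observations for one state's price-sensitive group:
# [average temperature (F), gas cash price ($/MMBtu)]
train_X = [
    [95.0, 2.5],
    [93.0, 2.6],
    [60.0, 2.5],
    [62.0, 4.0],
    [94.0, 4.2],
    [58.0, 4.1],
]
train_y = [3.0, 2.9, 1.8, 1.2, 2.2, 1.1]  # power burn in Bcf/d (illustrative only)

hot_cheap = knn_predict(train_X, train_y, [94.0, 2.55], k=2)
hot_expensive = knn_predict(train_X, train_y, [94.0, 4.2], k=1)
print(hot_cheap, hot_expensive)
```

In this toy data, a hot day with cheap gas lands next to the two high-burn observations, while the same temperature at a higher price lands on a lower-burn day. That is the kind of price sensitivity the grouping is meant to isolate.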

**Part 4: Add all the other pieces. Alternative fuels definitely matter. We account for them with arithmetic from the stack model.**

After all the black magic of machine learning, we come back to regular arithmetic and make some adjustments:

1. New natural gas plant build
2. Gas plant retirement
3. Coal plant retirement
4. Growth in renewables
5. Wind generation
6. Solar generation 
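The adjustments above can be sketched as simple addition and subtraction on top of the machine learning output. Everything here is an assumption for illustration: the field names, the sign convention (new gas builds and coal retirements add gas burn; gas retirements, renewables growth, wind, and solar subtract from it), the 7 MMBtu/MWh heat rate, and the roughly 1,037 Btu per cubic foot heat content used to convert generation into gas volume.

```python
from dataclasses import dataclass

MMBTU_PER_BCF = 1.037e6  # assumes ~1,037 Btu per cubic foot of gas

def mwh_to_bcf(generation_mwh, heat_rate=7.0):
    """Convert generation (MWh) into gas volume (Bcf) at an assumed heat rate (MMBtu/MWh)."""
    return generation_mwh * heat_rate / MMBTU_PER_BCF

@dataclass
class Adjustments:
    """Adjustments in Bcf/d of gas burn equivalent. Names and signs are hypothetical."""
    new_gas_builds: float = 0.0     # new gas plants add burn
    gas_retirements: float = 0.0    # retired gas plants remove burn
    coal_retirements: float = 0.0   # gas picks up displaced coal generation
    renewables_growth: float = 0.0  # new renewable capacity displaces gas
    wind_generation: float = 0.0    # wind output beyond what the model already reflects
    solar_generation: float = 0.0   # solar output beyond what the model already reflects

def adjusted_burn(model_burn, adj):
    """Apply the arithmetic adjustments to a state's modeled burn (Bcf/d)."""
    additions = adj.new_gas_builds + adj.coal_retirements
    reductions = (
        adj.gas_retirements
        + adj.renewables_growth
        + adj.wind_generation
        + adj.solar_generation
    )
    # Burn cannot go negative, so floor the result at zero.
    return max(model_burn + additions - reductions, 0.0)

# Illustrative numbers only.
example = Adjustments(
    new_gas_builds=0.1,
    gas_retirements=0.05,
    coal_retirements=0.2,
    wind_generation=0.15,
)
print(adjusted_burn(2.5, example))
```

Keeping the adjustments as plain arithmetic means each one can be inspected and overridden by hand, which is harder to do once a variable is buried inside the fitted model.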


With about 15 variables, calculated at the state level, the model may seem busy. But, phew, this is like a poem compared to a power stack model.

**Part 5: The forecasts are out, but the fun has only begun...**

Once the forecasts are out, it's how we use them that creates the edge in the trades. Here are some ideas:
1. Compare how the forecast has changed. Getting the trend right is so much more useful than getting the precision right. No one gets paid for nailing the monthly power burn to one decimal place. You get paid when you get the surprises and the direction right.
2. Use the power burn number to forecast the weekly storage number. You can forecast the storage number either with storage samples or with net S&D. Opportunities arise when the two methods diverge.
3. The relationship between all these variables and power burn is best assessed by feeding the model different scenarios. The model results are not normally distributed; they are asymmetrical. It's always worth the ad hoc analysis when you spot asymmetry in the modeling results.
4. We can also try moving the price curve around and observing how demand responds.
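As a rough sketch of the second idea, the net S&D route to a weekly storage number is just supply minus demand, scaled to a week, and the signal is the gap against the sample-based estimate. The component names, the 5 Bcf threshold, and all the numbers below are hypothetical.

```python
DAYS_PER_WEEK = 7

def implied_weekly_storage_change(supply, demand):
    """Net S&D: weekly storage change (Bcf) from daily average flows (Bcf/d).

    A positive result is an injection, a negative result a withdrawal.
    """
    return DAYS_PER_WEEK * (sum(supply.values()) - sum(demand.values()))

def divergence(sd_estimate, sample_estimate, threshold=5.0):
    """Return the gap between the two estimates and whether it exceeds the threshold (Bcf)."""
    gap = sd_estimate - sample_estimate
    return gap, abs(gap) >= threshold

# Illustrative daily averages in Bcf/d. Power burn comes from the forecast model.
supply = {"production": 90.0, "imports": 6.0}
demand = {"power_burn": 32.0, "res_comm": 10.0, "industrial": 22.0, "exports": 20.0}

sd_estimate = implied_weekly_storage_change(supply, demand)
gap, flagged = divergence(sd_estimate, sample_estimate=76.0)
print(sd_estimate, gap, flagged)
```

With these made-up inputs the net S&D route implies a larger injection than the sample route, which is exactly the kind of gap worth investigating before the weekly number comes out.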

These are just a few big ideas for leveraging the power burn model. We can also drill down into the impact of specific variables, different groupings, etc. But that's for another time.