# Overview
A feeder is a bundle of high-capacity electrical wires leaving an electricity substation.  PG\&E has about 3,000 feeders, and each one typically serves a few hundred to a few thousand electricity customers.  In much of PG\&E's territory, feeders are supported by wooden poles, though underground feeders are becoming more common.

This data set organizes information for each feeder in PG\&E, and it's a subset of the data used in [this](https://iopscience.iop.org/article/10.1088/1748-9326/ac8d18/meta) paper.  You can think of each row in the data set we're about to load as a "day in the life of a feeder".  That is, each row represents data that were true for the feeder on that day.  Some things don't change (feeder length, voltage level).  But other things do -- like whether or not an ignition occurred on that day, or the day's maximum wind speed.

Let's take a quick look at the data.

In [1]:
import pandas as pd
import numpy as np

In [2]:
wildfire_df = pd.read_csv('feeder_daily_small.csv', index_col=0)
wildfire_df.head()

| Time Period | FeederID | VOLTNUM | Total Miles | Average Transformers Age | Average Support Structure Age | ERC Max | GridMET WS Max | Wire Down | Ignitions | Historical Ignition Count | Hist WD Count |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018-01-01 | 252951104 | 12.0 | 14.44 | 20.670214 | 42.483765 | 68.0 | 0.9 | 0 | 0 | 0.0 | 0.0 |
| 2018-01-01 | 103201101 | 12.0 | 48.53 | 27.914755 | 41.002144 | 60.0 | 2.2 | 0 | 0 | 0.0 | 13.0 |
| 2018-01-01 | 253911104 | 12.0 | 52.63 | 24.57563 | 49.305794 | 86.0 | 1.8 | 0 | 0 | 3.0 | 3.0 |
| 2018-01-01 | 82021102 | 12.0 | 6.09 | 21.706293 | 41.477966 | 41.0 | 1.8 | 0 | 0 | 0.0 | 2.0 |
| 2018-01-01 | 152571104 | 12.0 | 11.62 | 23.358622 | 46.34694 | 37.0 | 1.5 | 0 | 0 | 0.0 | 3.0 |


In [3]:
wildfire_df.shape

(1081130, 11)

Let's go through these headers one at a time.

* Time Period: This is the day of the record.  
* FeederID: This is a unique value for each feeder.  You can figure out how many feeders are in the data set using the `.unique` method.  See below.
* VOLTNUM: This is the rated voltage of the feeder, in kilovolts.  There are four voltages in PG\&E's system.  See below.
* Total Miles: This is the length of the feeder, in miles.
* Average Transformers Age: This field refers to the age (in years) of so-called "secondary transformers."  These are the cylindrical "cans" you see mounted to poles near buildings.  They convert feeder-level voltages (in kilovolts, as above) to voltages we can safely use in our buildings (e.g. 120V, 208V, 240V).  To construct this field, we aggregated together all transformers that belong to a given feeder and calculated their average age.  The age changes (ever so slightly) every day.
* Average Support Structure Age: To construct this field, we aggregated together all poles that belong to a given feeder and calculated their average age (in years).  The age changes (ever so slightly) every day.
* ERC Max: This is the maximum energy release component of the vegetation surrounding the feeder for the day in question.  It is measured in BTU per square foot, is derived from daily satellite and weather observations, and is an element of the GridMET data set from UC Merced.
* GridMET WS Max: This is the maximum daily wind speed, in meters per second, taken from GridMET.
* Wire Down: If PG\&E reported that a feeder wire contacted the ground on a given day, this column reports a 1.  Otherwise it is 0.
* Ignitions: If PG\&E reported that an ignition occurred on a feeder on a given day, this column reports a 1.  Otherwise it is 0.
* Historical Ignition Count: This is a tally of all ignitions that occurred on this feeder on days leading up to the day in question.
* Hist WD Count: This is a tally of all wire down events that occurred on this feeder on days leading up to the day in question.
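
To make the two "historical" tallies concrete, here is a minimal sketch of one way a running count of prior events could be computed.  This is an assumption about the logic, not the code that built the real columns, and the toy feeders below are made up.  The real tallies evidently also draw on records from before 2018, since some feeders already have nonzero counts on 2018-01-01.

```python
import pandas as pd

# Made-up records for two hypothetical feeders (not taken from the real data).
toy = pd.DataFrame({
    "Time Period": pd.to_datetime(
        ["2018-01-01", "2018-01-02", "2018-01-03",
         "2018-01-01", "2018-01-02", "2018-01-03"]),
    "FeederID": [1, 1, 1, 2, 2, 2],
    "Ignitions": [1, 0, 1, 0, 0, 1],
})

# Sort so the running total follows calendar order within each feeder.
toy = toy.sort_values(["FeederID", "Time Period"]).reset_index(drop=True)

# Cumulative ignitions per feeder, minus today's value, leaves only the
# events on days strictly before the current one.
running_total = toy.groupby("FeederID")["Ignitions"].cumsum()
toy["Prior Ignitions"] = running_total - toy["Ignitions"]

print(toy)
```

Subtracting the current day's value matters if you use a column like this as a model feature: it keeps the tally from leaking the outcome you are trying to predict.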

There are 2,096 unique feeders:

In [4]:
wildfire_df.FeederID.unique().shape

(2096,)
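
If you just want the count as a number, `Series.nunique()` returns it directly, and a `groupby` tells you how many daily records each feeder has.  Here is a small sketch on a made-up frame (the feeder IDs are hypothetical); the same two calls should work on `wildfire_df`.

```python
import pandas as pd

# Made-up feeder-day rows (hypothetical IDs, not from the real file).
toy = pd.DataFrame({
    "FeederID": [101, 101, 101, 202, 202, 303],
    "Ignitions": [0, 0, 1, 0, 0, 0],
})

# Number of distinct feeders, returned as a plain integer.
n_feeders = toy["FeederID"].nunique()

# Number of daily records held for each feeder.
days_per_feeder = toy.groupby("FeederID").size()

print(n_feeders)
print(days_per_feeder.to_dict())
```

On the real data, 1,081,130 rows spread over 2,096 feeders works out to roughly 516 rows per feeder on average.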

There are four voltage levels for the feeders:

In [5]:
wildfire_df.VOLTNUM.unique()

array([12.  , 21.  , 17.  ,  4.16])

There are 621 ignitions in the dataset:

In [6]:
np.sum(wildfire_df.Ignitions)

621

There are about 4,700 wire down records.

In [7]:
np.sum(wildfire_df['Wire Down'])

4726
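
Because both columns hold only 0s and 1s, the sums above are counts of feeder-days with an event.  It is worth turning them into rates to see how rare these events are.  The sketch below only does arithmetic on the numbers already reported above.

```python
# Counts reported in the cells above.
n_rows = 1_081_130      # feeder-days (first entry of wildfire_df.shape)
n_ignitions = 621       # feeder-days with an ignition
n_wire_down = 4_726     # feeder-days with a wire down event

ignition_rate = n_ignitions / n_rows
wire_down_rate = n_wire_down / n_rows

print(f"Ignition rate:  {ignition_rate:.4%}")
print(f"Wire down rate: {wire_down_rate:.4%}")
print(f"About 1 ignition per {n_rows / n_ignitions:,.0f} feeder-days")
```

Ignitions show up on about 0.06% of feeder-days and wire down events on about 0.44%, so any model built on these data has to cope with a heavily imbalanced target.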

# Things you could do with these data:
I'll give you a few examples:

1. The data lend themselves cleanly to a 'classification' problem.  Specifically, you could build models that classify feeders into two groups: those that are likely to experience an ignition on a given day, and those that are not.  We'll be learning lots of tools in the course to support that type of modeling.
2. You could also merge data on public safety power shutoffs (PSPS) to try to predict PSPS durations based on feeder characteristics.  This would be a regression problem. We'll learn lots of tools in the course for those, too.  
3. If we decide where to cut off power using risk measures from a predictive model built with these data, what types of communities are most likely to be impacted by the cutoffs?  You could use spatial information on where feeders are located along with census data for this analysis.

But you should think for yourselves.  If you're interested in the grid, wildfire, and public safety, this would be a really fun area to learn more about to come up with your own questions.
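
If you go the classification route (example 1 above), one reasonable first step is a simple baseline to beat.  The sketch below is not a learned model: it flags feeder-days whose wind speed exceeds a threshold picked from earlier data, then scores the rule with precision and recall on later data.  The data are synthetic, and the link between wind and ignitions in them is an assumption I made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic feeder-days (made up for illustration, not the real data):
# 100 days with 50 rows each.
rng = np.random.default_rng(0)
dates = pd.date_range("2018-01-01", periods=100, freq="D")
toy = pd.DataFrame({
    "Time Period": dates.repeat(50),
    "GridMET WS Max": rng.gamma(shape=2.0, scale=1.5, size=5000),
})

# Assumption for the toy data only: ignition odds rise with wind speed.
p_ignite = (0.002 + 0.004 * toy["GridMET WS Max"]).clip(upper=1.0)
toy["Ignitions"] = rng.binomial(1, p_ignite.to_numpy())

# Split by time, not at random, so the rule is scored on later days.
cutoff = dates[70]
train = toy[toy["Time Period"] < cutoff]
test = toy[toy["Time Period"] >= cutoff]

# Baseline rule: flag days windier than the 90th percentile of training days.
threshold = train["GridMET WS Max"].quantile(0.90)
predicted = test["GridMET WS Max"] >= threshold
actual = test["Ignitions"] == 1

true_pos = int((predicted & actual).sum())
precision = true_pos / max(int(predicted.sum()), 1)
recall = true_pos / max(int(actual.sum()), 1)

# Accuracy of always predicting "no ignition", for comparison.
accuracy_all_zero = float((~actual).mean())

print(f"precision={precision:.3f} recall={recall:.3f}")
print(f"accuracy of predicting no ignitions at all: {accuracy_all_zero:.3f}")
```

The last line is the reason to look at precision and recall: when events are this rare, a model that never predicts an ignition still scores very high on accuracy.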

Note also that I have data sets that cover more time (back to 2015) and include many more features.  These are too big to keep on GitHub, but they are available to you if you wish.  I do not think DataHub would easily accommodate the file sizes, but one of our connector assistants could potentially help you find workarounds.

Finally, these data can be merged with other geospatial information if you wish.  We can get you centroids of each feeder, or even shapefiles that describe the entire geographic scope of each feeder. 
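
A merge like that would most naturally be keyed on `FeederID`.  Here is a minimal sketch, assuming a centroid table with one row per feeder.  The table, its column names (`centroid_lat`, `centroid_lon`), and the coordinates are all hypothetical; the real files may look different.

```python
import pandas as pd

# Made-up feeder-day rows (hypothetical IDs).
feeder_days = pd.DataFrame({
    "FeederID": [101, 101, 202, 303],
    "Ignitions": [0, 1, 0, 0],
})

# Hypothetical centroid table: column names and coordinates are invented.
centroids = pd.DataFrame({
    "FeederID": [101, 202],
    "centroid_lat": [38.5, 37.9],
    "centroid_lon": [-121.5, -122.1],
})

# Left merge keeps every feeder-day; validate raises an error if the
# centroid table has more than one row for a feeder.
merged = feeder_days.merge(
    centroids, on="FeederID", how="left",
    validate="many_to_one", indicator=True,
)

# Feeders with no centroid end up with missing coordinates.
unmatched = merged.loc[merged["_merge"] == "left_only", "FeederID"].unique()

print(merged)
print("Feeders without a centroid:", list(unmatched))
```

Checking for unmatched feeders right after the merge is a cheap way to catch ID mismatches before they quietly turn into missing values in a spatial analysis.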