## Fuel type definition based on EIA 923 Fuel Receipts and Costs data 

In this notebook, we'll develop a function to classify the fuel type of a given plant based on the aggregate heat content of the fuel delivered to that plant during a given year. The function will take the following arguments:

* PUDL plant ID (Int)
* Year (Int)
* Threshold needed to assign a classification (Float)

Let's talk through the pseudocode needed to make this function a reality:
1. Pull in the necessary fuel receipts and costs information from the database using SQLAlchemy
2. Group by plant and year
3. For these groups, sum the fuel heat content delivered to get
    * a total sum
    * a sum for each of the three major fuel types: oil, gas, and coal
4. For each of the three major fuel types, compute its heat content delivered as a percentage of the total heat content delivered
5. If a percentage crosses the threshold, assign the plant the applicable fuel type string.

Returns:

The PUDL plant ID, the year, and the assigned fuel type.
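
The steps above can be sketched on an in-memory DataFrame before wiring in the database. This is a minimal illustration, not the final function: the name `classify_fuel`, the `"mixed"` label, and the toy rows are assumptions made for the example, and the column names are borrowed from the `fuel_receipts_costs_eia923` query used later in this notebook.

```python
import pandas as pd

def classify_fuel(frc, plant_id_pudl, year, threshold):
    """Return (plant_id_pudl, year, fuel_type) based on heat-content shares.

    `frc` is assumed to have the columns plant_id_pudl, report_date,
    fuel_group, fuel_quantity and average_heat_content.
    `threshold` is a percentage (for example 51).
    """
    frc = frc.assign(
        year=pd.to_datetime(frc["report_date"]).dt.year,
        mmbtu=frc["fuel_quantity"] * frc["average_heat_content"],
    )
    plant = frc[(frc["plant_id_pudl"] == plant_id_pudl) & (frc["year"] == year)]
    # Sum heat content per fuel group; fuels with no deliveries get 0.
    sums = (plant.groupby("fuel_group")["mmbtu"].sum()
            .reindex(["Coal", "Natural Gas", "Petroleum"], fill_value=0.0))
    total = sums.sum()
    if total == 0:
        return plant_id_pudl, year, None  # no deliveries to classify
    shares = sums / total * 100
    top = shares.idxmax()
    fuel_type = top if shares[top] > threshold else "mixed"
    return plant_id_pudl, year, fuel_type

# Toy rows (hypothetical values, not taken from the database).
toy = pd.DataFrame({
    "plant_id_pudl": [1, 1, 2],
    "report_date": ["2009-01-01", "2009-02-01", "2009-01-01"],
    "fuel_group": ["Coal", "Natural Gas", "Natural Gas"],
    "fuel_quantity": [100.0, 1000.0, 500.0],
    "average_heat_content": [20.0, 1.0, 1.0],
})
print(classify_fuel(toy, 1, 2009, 51))
```

In the toy data, plant 1 receives 2000 mmBTU of coal and 1000 mmBTU of gas, so coal is about two thirds of the total and clears a threshold of 51 but not one of 70.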

In [1]:
import sys
import os
sys.path.append(os.path.abspath(os.path.join('..','..')))
from pudl import pudl, ferc1, eia923, settings, constants
from pudl import models, models_ferc1, models_eia923
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sqlalchemy as sa
from sqlalchemy import and_, tuple_
%matplotlib inline

In [2]:
pudl_engine = pudl.db_connect_pudl()

In [55]:
Session = sa.orm.sessionmaker()
Session.configure(bind = pudl_engine)
session = Session()

frc_table = models.PUDLBase.metadata.tables['fuel_receipts_costs_eia923']
plants_eia923_tbl = models.PUDLBase.metadata.tables['plants_eia923']

frc_select = sa.sql.select([frc_table.c.plant_id,
                            plants_eia923_tbl.c.plant_name,
                            plants_eia923_tbl.c.plant_id_pudl,
                            frc_table.c.fuel_group,
                            frc_table.c.fuel_quantity,
                            frc_table.c.average_heat_content,
                            frc_table.c.report_date,
                            frc_table.c.fuel_cost]).\
                            where(frc_table.c.plant_id == plants_eia923_tbl.c.plant_id)
    
frc_df = pd.read_sql(frc_select, pudl_engine)
frc_df

Unnamed: 0,plant_id,plant_name,plant_id_pudl,fuel_group,fuel_quantity,average_heat_content,report_date,fuel_cost
0,3,Barry,32,Coal,120393.0,24.000,2009-01-01,631.1
1,3,Barry,32,Coal,199388.0,23.000,2009-01-01,350.3
2,3,Barry,32,Coal,43105.0,22.785,2009-01-01,355.7
3,3,Barry,32,Coal,9458.0,23.790,2009-01-01,498.0
4,3,Barry,32,Coal,9094.0,24.000,2009-01-01,629.0
5,3,Barry,32,Natural Gas,1902799.0,1.036,2009-01-01,680.9
6,3,Barry,32,Natural Gas,28469.0,1.045,2009-01-01,568.0
7,7,Gadsden,203,Coal,21205.0,24.908,2009-01-01,397.6
8,7,Gadsden,203,Natural Gas,3189.0,1.014,2009-01-01,638.1
9,7,Gadsden,203,Natural Gas,11.0,1.009,2009-01-01,612.1


In [57]:
frc_df[frc_df.fuel_group == 'Petroleum']

Unnamed: 0,plant_id,plant_name,plant_id_pudl,fuel_group,fuel_quantity,average_heat_content,report_date,fuel_cost
32,8,Gorgas,226,Petroleum,362.0,5.875,2009-01-01,1182.3
44,26,E C Gaston,175,Petroleum,859.0,5.825,2009-01-01,2095.5
71,50,Widows Creek,1306,Petroleum,1176.0,5.670,2009-01-01,1046.2
72,50,Widows Creek,1306,Petroleum,2547.0,5.670,2009-01-01,1042.2
73,50,Widows Creek,1306,Petroleum,349.0,5.670,2009-01-01,1116.2
84,56,Charles R Lowman,1310,Petroleum,387.0,5.500,2009-01-01,1238.1
85,56,Charles R Lowman,1310,Petroleum,358.0,5.500,2009-01-01,1265.3
89,64,Lemon Creek,320,Petroleum,20726.0,5.712,2009-01-01,1640.4
90,75,Anchorage 1,1320,Petroleum,100.0,5.520,2009-01-01,1854.4
98,99,Frederickson,197,Petroleum,554.0,5.870,2009-01-01,1247.0


In [109]:
def fuel_type_assigner(pudl_id, year, threshold):
    
    Session = sa.orm.sessionmaker()
    Session.configure(bind = pudl_engine)
    session = Session()
    
    frc_table = models.PUDLBase.metadata.tables['fuel_receipts_costs_eia923']
    plants_eia923_tbl = models.PUDLBase.metadata.tables['plants_eia923']
    
    frc_select = sa.sql.select([frc_table.c.plant_id,
                            plants_eia923_tbl.c.plant_name,
                            plants_eia923_tbl.c.plant_id_pudl,
                            frc_table.c.report_date,
                            frc_table.c.fuel_group,
                            frc_table.c.fuel_quantity,
                            frc_table.c.average_heat_content,
                            frc_table.c.fuel_cost]).\
                            where(frc_table.c.plant_id == plants_eia923_tbl.c.plant_id)
    
    f1_fuel_table = models.PUDLBase.metadata.tables['fuel_ferc1']
    plants_ferc1_tbl = models.PUDLBase.metadata.tables['plants_ferc1']
    
    f1_select = sa.sql.select([plants_ferc1_tbl.c.plant_name,
                            plants_ferc1_tbl.c.plant_id_pudl,
                            f1_fuel_table.c.report_year,
                            f1_fuel_table.c.fuel,
                            f1_fuel_table.c.fuel_qty_burned,
                            f1_fuel_table.c.fuel_avg_mmbtu_per_unit,
                            f1_fuel_table.c.fuel_cost_per_unit_burned]).\
                            where(f1_fuel_table.c.respondent_id == plants_ferc1_tbl.c.respondent_id).\
                            where(f1_fuel_table.c.plant_name == plants_ferc1_tbl.c.plant_name)
    
    frc_df = pd.read_sql(frc_select, pudl_engine)
    
    frc_df['report_date'] = pd.to_datetime(frc_df['report_date'])
    frc_df['mmbtu_delivered'] = frc_df['fuel_quantity'] * frc_df['average_heat_content']
    frc_df['year'] = frc_df['report_date'].dt.year
    
    eia_selected_plant = frc_df[(frc_df.plant_id_pudl == pudl_id) & (frc_df.year == year)]
    
    eia_total_mmbtu_delivered = eia_selected_plant['mmbtu_delivered'].sum()
    
    eia_fuel_group = eia_selected_plant.groupby('fuel_group')
    eia_fuel_sums = eia_fuel_group.aggregate(np.sum)['mmbtu_delivered']
    
    if any(eia_selected_plant.fuel_group == 'Coal'):
        eia_coal_percentage = eia_fuel_sums['Coal'] / eia_total_mmbtu_delivered * 100
    else:
        eia_coal_percentage = 0
    if any(eia_selected_plant.fuel_group == 'Natural Gas'):
        eia_gas_percentage = eia_fuel_sums['Natural Gas'] / eia_total_mmbtu_delivered * 100
    else:
        eia_gas_percentage = 0
    if any(eia_selected_plant.fuel_group == 'Petroleum'):
        eia_oil_percentage = eia_fuel_sums['Petroleum'] / eia_total_mmbtu_delivered * 100
    else:
        eia_oil_percentage = 0
    
    f1_df = pd.read_sql(f1_select, pudl_engine)
    
    # Copy the filtered rows so the column assignment below does not write to a slice of f1_df
    ferc_selected_plant = f1_df[(f1_df.plant_id_pudl == pudl_id) & (f1_df.report_year == year)].copy()
    
    ferc_selected_plant['mmbtu_burned'] = ferc_selected_plant['fuel_qty_burned'] * ferc_selected_plant['fuel_avg_mmbtu_per_unit']
    
    ferc_total_mmbtu_burned = ferc_selected_plant['mmbtu_burned'].sum()  
    ferc_fuel_group = ferc_selected_plant.groupby('fuel')
    ferc_fuel_sums = ferc_fuel_group.aggregate(np.sum)['mmbtu_burned']
    
    if any(ferc_selected_plant.fuel == 'coal'):
        ferc_coal_percentage = ferc_fuel_sums['coal'] / ferc_total_mmbtu_burned * 100
    else:
        ferc_coal_percentage = 0
    if any(ferc_selected_plant.fuel == 'gas'):
        ferc_gas_percentage = ferc_fuel_sums['gas'] / ferc_total_mmbtu_burned * 100
    else:
        ferc_gas_percentage = 0
    if any(ferc_selected_plant.fuel == 'oil'):
        ferc_oil_percentage = ferc_fuel_sums['oil'] / ferc_total_mmbtu_burned * 100
    else:
        ferc_oil_percentage = 0
    
    if eia_coal_percentage > threshold and ferc_coal_percentage > threshold:
        plant_type = 'coal'
    
    elif eia_gas_percentage > threshold and ferc_gas_percentage > threshold:
        plant_type = 'gas'
        
    elif eia_oil_percentage > threshold and ferc_oil_percentage > threshold:
        plant_type = 'oil'
    
    else:
        plant_type = 'a mixture of fuels'
    
    # Print a summary; adjacent string literals are joined, so the sentences stay separated by a space
    print('EIA923 mmBTUs delivered to %s in %s, %s percent coal, %s percent gas, and %s percent oil. '
          'FERC: %s percent coal, %s percent gas, %s percent oil. This plant runs mainly on %s.'
          % (eia_selected_plant.plant_name.iloc[0], eia_selected_plant.year.iloc[0], eia_coal_percentage,
             eia_gas_percentage, eia_oil_percentage, ferc_coal_percentage,
             ferc_gas_percentage, ferc_oil_percentage, plant_type))

In [112]:
fuel_type_assigner(1495,2011, 51)

EIA923 mmBTUs delivered to George Birdsall in 2011, 0 percent coal, 100.0 percent gas, and 0 percent oil. FERC: 0 percent coal, 0 percent gas, 0 percent oil. This plant runs mainly on a mixture of fuels.
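
The repeated `if any(...)` / `else` blocks in `fuel_type_assigner` all do the same thing: look up a fuel's summed heat content and fall back to 0 when that fuel is absent. One way to collapse them is a small helper built on `Series.reindex` with `fill_value`. The helper name `fuel_shares` and the toy frame are hypothetical, introduced only for this sketch.

```python
import pandas as pd

def fuel_shares(df, fuel_col, mmbtu_col, fuels):
    """Percent of total heat content for each fuel in `fuels` (0 if absent)."""
    sums = df.groupby(fuel_col)[mmbtu_col].sum().reindex(fuels, fill_value=0.0)
    total = df[mmbtu_col].sum()  # total over all fuels, as in the function above
    if total == 0:
        return sums  # no heat content recorded: every share stays 0
    return sums / total * 100

# Toy FERC-style rows (hypothetical values).
ferc_toy = pd.DataFrame({
    "fuel": ["coal", "coal", "oil"],
    "mmbtu_burned": [60.0, 30.0, 10.0],
})
shares = fuel_shares(ferc_toy, "fuel", "mmbtu_burned", ["coal", "gas", "oil"])
print(shares)

# A plant with no rows yields all-zero shares instead of raising.
empty_shares = fuel_shares(ferc_toy.iloc[:0], "fuel", "mmbtu_burned", ["coal", "gas", "oil"])
print(empty_shares)
```

The all-zero case matters for the output above: plant 1495 shows 100 percent gas in EIA 923 but 0 percent for every fuel in FERC, so the `and` conditions fail and the plant falls through to 'a mixture of fuels'. Whether a source with no records should veto the other source is a design choice the function does not yet make explicit.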


In [56]:
def f1_fuel_type_assigner(pudl_id, year):
    
    Session = sa.orm.sessionmaker()
    Session.configure(bind = pudl_engine)
    session = Session()
    
    f1_fuel_table = models.PUDLBase.metadata.tables['fuel_ferc1']
    plants_ferc1_tbl = models.PUDLBase.metadata.tables['plants_ferc1']
    
    f1_select = sa.sql.select([plants_ferc1_tbl.c.plant_name,
                            plants_ferc1_tbl.c.plant_id_pudl,
                            f1_fuel_table.c.report_year,
                            f1_fuel_table.c.fuel,
                            f1_fuel_table.c.fuel_qty_burned,
                            f1_fuel_table.c.fuel_avg_mmbtu_per_unit,
                            f1_fuel_table.c.fuel_cost_per_unit_burned]).\
                            where(f1_fuel_table.c.respondent_id == plants_ferc1_tbl.c.respondent_id).\
                            where(f1_fuel_table.c.plant_name == plants_ferc1_tbl.c.plant_name)
    
    f1_df = pd.read_sql(f1_select, pudl_engine)
    
    ferc_selected_plant = f1_df[(f1_df.plant_id_pudl == pudl_id) & (f1_df.report_year == year)]
    
    ferc_selected_plant['mmbtu_burned'] = ferc_selected_plant['fuel_qty_burned'] * ferc_selected_plant['fuel_avg_mmbtu_per_unit']
    
    ferc_total_mmbtu_burned = ferc_selected_plant['mmbtu_burned'].sum()  
    ferc_fuel_group = ferc_selected_plant.groupby('fuel')
    ferc_fuel_sums = ferc_fuel_group.aggregate(np.sum)['mmbtu_burned']
    
    if any(ferc_selected_plant.fuel == 'coal'):
        ferc_coal_percentage = ferc_fuel_sums['coal'] / ferc_total_mmbtu_burned * 100
    else:
        ferc_coal_percentage = 0
    if any(ferc_selected_plant.fuel == 'gas'):
        ferc_gas_percentage = ferc_fuel_sums['gas'] / ferc_total_mmbtu_burned * 100
    else:
        ferc_gas_percentage = 0
    if any(ferc_selected_plant.fuel == 'oil'):
        ferc_oil_percentage = ferc_fuel_sums['oil'] / ferc_total_mmbtu_burned * 100
    else:
        ferc_oil_percentage = 0
    
    # fuel_ferc1 reports fuel burned, so the message says "burned" rather than "delivered"
    print('According to FERC, of the mmBTUs burned at %s in %s, %s percent was coal, '
          '%s percent natural gas, %s percent oil.'
          % (ferc_selected_plant.plant_name.iloc[0], ferc_selected_plant.report_year.iloc[0],
             ferc_coal_percentage, ferc_gas_percentage, ferc_oil_percentage))

In [57]:
f1_fuel_type_assigner(429, 2011)

According to FERC, of the mmBTUs burned at Oklaunion in 2011, 99.7504819562 percent was coal, 0 percent natural gas, 0.2495180438 percent oil.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
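
The message above is pandas' `SettingWithCopyWarning`. It is triggered where `f1_fuel_type_assigner` adds the `mmbtu_burned` column to `ferc_selected_plant`, which is a boolean-filtered slice of `f1_df`. A common remedy is to take an explicit copy of the filtered rows before assigning to them. The sketch below uses a toy frame with hypothetical values in place of the database query.

```python
import pandas as pd

# Toy stand-in for the FERC query result (hypothetical values).
f1_df = pd.DataFrame({
    "plant_id_pudl": [429, 429, 7],
    "report_year": [2011, 2011, 2011],
    "fuel": ["coal", "oil", "gas"],
    "fuel_qty_burned": [1000.0, 10.0, 500.0],
    "fuel_avg_mmbtu_per_unit": [17.0, 5.8, 1.0],
})

# .copy() makes the selection an independent DataFrame, so adding a
# column to it is unambiguous and leaves f1_df untouched.
selected = f1_df[(f1_df.plant_id_pudl == 429) & (f1_df.report_year == 2011)].copy()
selected["mmbtu_burned"] = selected["fuel_qty_burned"] * selected["fuel_avg_mmbtu_per_unit"]
print(selected[["fuel", "mmbtu_burned"]])
```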


In [48]:
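# `barry` is assumed to hold the Barry plant's rows of frc_df, with an
# mmbtu_delivered column already added (it is defined in a cell not shown here).
total_mmbtu_delivered = barry['mmbtu_delivered'].sum()
total_mmbtu_delivered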

102520465.12199998

In [54]:
grouped = barry.groupby(['fuel_group'])
grouped.aggregate(np.sum)['mmbtu_delivered']

fuel_group
Coal           4.714560e+07
Natural Gas    5.537486e+07
Name: mmbtu_delivered, dtype: float64
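
To connect these sums back to the classification rule, the per-fuel totals can be turned into percentages and compared against a threshold. The sketch below hard-codes the two values exactly as displayed above (so they are rounded), and reuses the threshold of 51 from the earlier `fuel_type_assigner` call. It makes no assumption about which period `barry` covers.

```python
import pandas as pd

# Values copied from the grouped output above, rounded as displayed.
sums = pd.Series({"Coal": 4.714560e7, "Natural Gas": 5.537486e7},
                 name="mmbtu_delivered")

shares = sums / sums.sum() * 100
threshold = 51  # same threshold used in the fuel_type_assigner call earlier
dominant = shares[shares > threshold]

print(shares.round(2))
print(dominant.index.tolist())
```

With these numbers coal comes to roughly 46 percent and natural gas to roughly 54 percent of the heat content delivered, so only natural gas clears a 51 percent threshold.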