# Analyze parsed reports

This notebook analyzes data from the Texas Commission on Environmental Quality's Air Emission Event Reporting Database, parsed in the previous step. It focuses on emissions-event and air-shutdown reports for events that began on or after Aug. 23, 2017, in the Texas counties covered by the state's Hurricane Harvey disaster declaration.

In [1]:
import pandas as pd

In [2]:
with open("../inputs/disaster-declaration-counties.txt") as f:
    DISASTER_DECLARATION_COUNTIES = f.read().strip().split("\n")
len(DISASTER_DECLARATION_COUNTIES)

54
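
The cell above keeps every line of the county file, so an extra blank line would add an empty entry and a name that isn't upper case would never match the `.str.upper().isin(...)` filters used later. Here is a minimal sketch of a more forgiving loader, assuming the file holds one county name per line. `load_county_list` is a hypothetical helper, and `io.StringIO` stands in for the real file.

```python
import io
from typing import Iterable, List


def load_county_list(lines: Iterable[str]) -> List[str]:
    """Return upper-cased, de-duplicated county names, skipping blank lines."""
    counties = []
    for line in lines:
        name = line.strip().upper()
        if name and name not in counties:
            counties.append(name)
    return counties


# Stand-in for open("../inputs/disaster-declaration-counties.txt")
sample = io.StringIO("Harris\nGalveston\n\nJefferson\nHARRIS\n")
counties = load_county_list(sample)
print(counties)  # ['HARRIS', 'GALVESTON', 'JEFFERSON']
```

With the real file, the same helper would be called as `with open("../inputs/disaster-declaration-counties.txt") as f: counties = load_county_list(f)`.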

## Load the data

In [3]:
reports = pd.read_csv(
    "../outputs/report-metadata-raw.csv",
)
reports["Event began"] = pd.to_datetime(reports["Event began"])
reports["Event ended"] = pd.to_datetime(reports["Event ended"], errors="coerce")

In [4]:
reports.head()

| | Action taken | Cause | City, County | Emissions estimation method | Event began | Event ended | Physical location | Regulated entity RN number | Regulated entity name | This is based on the | Type(s) of air emissions event | report_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | Unauthorized discharge at 150 Persimmon Manhol... | BAYTOWN, HARRIS | | 2017-08-10 20:43:00 | 2017-08-10 23:45:00 | | RN101611457 | EAST DISTRICT | FINAL REPORT | WASTEWATER BYPASS | 265500 |
| 1 | | Chlorinated Excursion; Cleared Private Line; C... | HOUSTON, HARRIS | | 2017-08-04 00:00:00 | 2017-08-04 00:00:00 | | RN101607596 | BELTWAY WWTP | FINAL REPORT | WASTEWATER BYPASS | 265502 |
| 2 | | Scheduled for Further Repairs; | HOUSTON, HARRIS | | 2017-08-04 00:00:00 | 2017-08-04 00:00:00 | | RN101612158 | FWSD 23 WWTP | FINAL REPORT | WASTEWATER BYPASS | 265503 |
| 3 | | Unauthorized Discharge at 1016 Applewood manho... | FRIENDSWOOD, HARRIS | | 2017-08-14 19:00:00 | 2017-08-14 22:00:00 | | RN102183340 | BLACKHAWK REGIONAL WTP | FINAL REPORT | WASTEWATER BYPASS | 265504 |
| 4 | | Chlorinated Excursion; Cleared Private Line; C... | HOUSTON, HARRIS | | 2017-08-04 00:00:00 | 2017-08-04 00:00:00 | 9400 White Chapel Ln, Houston, TX | RN101614113 | KEEGANS BAYOU WWTP | FINAL REPORT | WASTEWATER BYPASS | 265505 |


## Separate city and county

In [5]:
reports["City"] = reports["City, County"].apply(lambda x: x.split(", ")[0])
reports["City"].value_counts().head(10)

                  108
HOUSTON           103
SAN ANTONIO        28
GOLDSMITH          23
CRANE              23
CORPUS CHRISTI     15
PASADENA           11
BAYTOWN             9
MIDLAND             9
PORT ARTHUR         8
Name: City, dtype: int64

In [6]:
reports["County"] = reports["City, County"].apply(lambda x: x.split(", ")[1])
reports["County"].value_counts().head(10)

HARRIS       149
BEXAR         29
CRANE         27
ECTOR         25
TARRANT       23
NUECES        19
GALVESTON     17
BRAZORIA      17
JEFFERSON     15
GRAYSON       11
Name: County, dtype: int64
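
The two `apply` calls above split each value on `", "`. pandas' vectorized string methods can produce both columns in one pass, and they pass missing values through as NaN instead of raising an `AttributeError` on a float. A sketch on made-up rows; the blank-city row is assumed to look like `", HARRIS"`, which is consistent with the 108 blank cities counted above.

```python
import pandas as pd

# Made-up rows in the same "City, County" shape, including a blank city
df = pd.DataFrame({"City, County": ["BAYTOWN, HARRIS", ", HARRIS", "CORPUS CHRISTI, NUECES"]})

# n=1 caps each row at two pieces; expand=True returns them as columns 0 and 1
parts = df["City, County"].str.split(", ", n=1, expand=True)
df["City"] = parts[0]
df["County"] = parts[1]
print(df[["City", "County"]])
```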

### Counties we're not analyzing (as a data check)

In [7]:
print("\n".join(sorted(reports[~reports["County"].str.upper().isin(DISASTER_DECLARATION_COUNTIES)]["County"].unique())))

ANDERSON
ANDREWS
ANGELINA
BELL
BOSQUE
BOWIE
CAMP
CASS
CHEROKEE
CRANE
CULBERSON
DALLAS
DAWSON
DIMMIT
DUVAL
EASTLAND
ECTOR
EL PASO
ELLIS
ERATH
FALLS
FANNIN
FRIO
GAINES
GLASSCOCK
GRAYSON
HARRISON
HAYS
HENDERSON
HIDALGO
HILL
HOOD
HOUSTON
HOWARD
HUNT
HUTCHINSON
JONES
LIMESTONE
LUBBOCK
MARTIN
MCLENNAN
MCMULLEN
MIDLAND
MILAM
MITCHELL
MONTAGUE
NACOGDOCHES
NOLAN
ORANGE
PALO PINTO
PANOLA
PARKER
PECOS
POTTER
RANDALL
REAGAN
REEVES
RUNNELS
RUSK
SABINE
SAN AUGUSTINE
SHELBY
SHERMAN
STEPHENS
TARRANT
TOM GREEN
TRAVIS
UPTON
VAN ZANDT
WEBB
WILLIAMSON
WINKLER
WISE
YOAKUM


## Select reports of interest

In [8]:
with open("../inputs/reports-to-ignore.txt") as f:
    # Each line holds a report ID, optionally followed by a "#" comment
    REPORT_IDS_TO_IGNORE = [ int(line.split("#")[0].strip()) for line in f ]
REPORT_IDS_TO_IGNORE

[266113, 266073, 266136, 266156, 266246]
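
The list comprehension above expects every line of `reports-to-ignore.txt` to start with a report ID, optionally followed by a `#` comment; a blank or comment-only line would make `int()` raise `ValueError`. A minimal sketch that skips such lines, under the same assumed file format. `parse_ignore_list` is a hypothetical name, and the sample comment text is a placeholder.

```python
import io
from typing import Iterable, List


def parse_ignore_list(lines: Iterable[str]) -> List[int]:
    """Parse report IDs from lines shaped like '266113  # reason'.

    Text after '#' is treated as a comment; blank and comment-only
    lines are skipped instead of raising ValueError in int().
    """
    ids = []
    for line in lines:
        value = line.split("#", 1)[0].strip()
        if value:
            ids.append(int(value))
    return ids


# Stand-in for open("../inputs/reports-to-ignore.txt"); comments are placeholders
sample = io.StringIO(
    "266113 # reason for exclusion\n"
    "# a comment-only line\n"
    "\n"
    "266073\n"
)
print(parse_ignore_list(sample))  # [266113, 266073]
```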

In [9]:
reports_of_interest = reports[
    reports["County"].str.upper().isin(DISASTER_DECLARATION_COUNTIES) &
    ~reports["report_id"].isin(REPORT_IDS_TO_IGNORE) &
    (reports["Event began"] >= "2017-08-23") &
    (reports["Type(s) of air emissions event"].isin([ "EMISSIONS EVENT", "AIR SHUTDOWN" ]))
]
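
The selection above combines four boolean masks with `&` and `~`. Wrapping it in a function makes the criteria easy to check against a few hand-made rows. This is a sketch: `select_reports_of_interest` and the toy rows are hypothetical, and the trailing `.copy()` is an addition that lets later cells add columns without a `SettingWithCopyWarning`.

```python
import pandas as pd

KEEP_TYPES = ["EMISSIONS EVENT", "AIR SHUTDOWN"]


def select_reports_of_interest(reports, counties, ignore_ids, start="2017-08-23"):
    """Apply the county, ignore-list, date and type filters; return a copy."""
    mask = (
        reports["County"].str.upper().isin(counties)
        & ~reports["report_id"].isin(ignore_ids)
        & (reports["Event began"] >= pd.Timestamp(start))
        & reports["Type(s) of air emissions event"].isin(KEEP_TYPES)
    )
    # .copy() lets later cells add columns without SettingWithCopyWarning
    return reports[mask].copy()


# Hypothetical rows: only report 1 passes all four filters
toy = pd.DataFrame({
    "report_id": [1, 2, 3, 4],
    "County": ["HARRIS", "harris", "DALLAS", "HARRIS"],
    "Event began": pd.to_datetime(["2017-08-25", "2017-08-20", "2017-08-26", "2017-08-27"]),
    "Type(s) of air emissions event": [
        "EMISSIONS EVENT", "AIR SHUTDOWN", "EMISSIONS EVENT", "WASTEWATER BYPASS",
    ],
})
picked = select_reports_of_interest(toy, ["HARRIS"], ignore_ids=[])
print(picked["report_id"].tolist())  # [1]
```

The parentheses around the date comparison matter: in Python, `&` binds more tightly than `>=`.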

## Summarize report data

In [10]:
reports_of_interest["Regulated entity RN number"].nunique()

42

In [11]:
reports_of_interest.groupby([ "County", "Type(s) of air emissions event" ])\
    .size()\
    .unstack()\
    .fillna(0)\
    .astype(int)\
    .assign(total=lambda x: x.sum(axis=1))\
    .sort_values(["total"], ascending=False)

| County | AIR SHUTDOWN | EMISSIONS EVENT | total |
|---|---|---|---|
| HARRIS | 3 | 16 | 19 |
| JEFFERSON | 3 | 6 | 9 |
| NUECES | 2 | 5 | 7 |
| BRAZORIA | 1 | 5 | 6 |
| CHAMBERS | 0 | 3 | 3 |
| GALVESTON | 1 | 2 | 3 |
| CALHOUN | 1 | 1 | 2 |
| WASHINGTON | 0 | 2 | 2 |
| ATASCOSA | 0 | 1 | 1 |
| GRIMES | 0 | 1 | 1 |
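
The `groupby`/`size`/`unstack` chain above builds a county-by-type count table. `pd.crosstab` is a more compact way to get the same shape; it is swapped in here for illustration, on toy rows rather than the real data.

```python
import pandas as pd

# Toy rows, not the real reports
toy = pd.DataFrame({
    "County": ["HARRIS", "HARRIS", "HARRIS", "NUECES", "CHAMBERS"],
    "Type(s) of air emissions event": [
        "EMISSIONS EVENT", "EMISSIONS EVENT", "AIR SHUTDOWN", "AIR SHUTDOWN", "EMISSIONS EVENT",
    ],
})

# crosstab counts rows per (County, type) pair and fills absent pairs with 0
counts = pd.crosstab(toy["County"], toy["Type(s) of air emissions event"])
counts["total"] = counts.sum(axis=1)
counts = counts.sort_values("total", ascending=False)
print(counts)
```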


## Analyze contaminants emitted

In [12]:
emissions = pd.read_csv("../outputs/report-emissions-raw.csv")\
    .pipe(lambda x: x[x["report_id"].isin(reports_of_interest["report_id"])])\
    .copy()  # an explicit copy avoids SettingWithCopyWarning when adding columns below
emissions.head()

| | report_id | contaminant | authorization | limit | amount_released |
|---|---|---|---|---|---|
| 701 | 266106 | Opacity | NSR Permit 813 | 20.0 % op | 100.0 % op (est.) |
| 702 | 266106 | Opacity | NSR Permit 812 | 20.0 % op | 100.0 % op (est.) |
| 703 | 266106 | Opacity | NSR Permit 812 | 20.0 % op | 100.0 % op (est.) |
| 765 | 266114 | Benzene | TCEQ Permit 6308 | 1270.0 LBS/HR | 50.0 lbs (est.) |
| 766 | 266114 | Carbon Monoxide | TCEQ Permit 6308 | 390.2 LBS/HR | 10000.0 lbs (est.) |


In [13]:
import numpy as np  # pd.np was deprecated in pandas 1.0 and has since been removed

emissions["quantity"] = emissions["amount_released"].apply(lambda x: x.split(" ", 1)[0])\
    .replace("Unknown", np.nan)\
    .astype(float)
emissions["quantity"].head()

701      100.0
702      100.0
703      100.0
765       50.0
766    10000.0
Name: quantity, dtype: float64

In [14]:
emissions["units"] = emissions["amount_released"].apply(lambda x: x.split(" ", 1)[1] if " " in x else None)\
    .replace("Unknown", np.nan)
emissions["units"].head()

701    % op (est.)
702    % op (est.)
703    % op (est.)
765     lbs (est.)
766     lbs (est.)
Name: units, dtype: object

In [15]:
emissions["units"].value_counts()

lbs (est.)     918
% op (est.)     12
Name: units, dtype: int64
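
Cells 13 and 14 split `amount_released` into a number and its units. Here is a sketch of the same step with vectorized string methods, on sample values copied from the output above plus an `"Unknown"` entry (the raw value the `replace` calls target).

```python
import pandas as pd

# Sample values in the shapes seen above: "<number> <units>", plus "Unknown"
amounts = pd.Series(["100.0 % op (est.)", "50.0 lbs (est.)", "Unknown", "10000.0 lbs (est.)"])

# Split once on the first space: number on the left, units on the right
parts = amounts.str.split(" ", n=1, expand=True)

# errors="coerce" turns anything non-numeric (like "Unknown") into NaN
quantity = pd.to_numeric(parts[0], errors="coerce")
units = parts[1]  # missing where there was no space to split on

print(quantity.tolist())
print(units.tolist())
```

One trade-off: `pd.to_numeric(..., errors="coerce")` silently turns *any* non-numeric token into NaN, not just `"Unknown"`. The notebook's `.replace(...).astype(float)` approach fails loudly on an unexpected value instead, which can be the safer choice when you want to notice new formats in the source data.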

In [16]:
EMISSIONS_SUMMARY_COLS = [ 
    "report_id",
    "Event began",
    "Event ended",
    "Regulated entity RN number",
    "Regulated entity name",
    "Type(s) of air emissions event",
    "County",
    "contaminant",
    "authorization",
    "limit",
    "quantity",
    "units",
]

In [17]:
emissions_lbs = emissions[
    emissions["units"] == "lbs (est.)"
].sort_values(["quantity", "report_id", "contaminant"], ascending=False)\
    .pipe(pd.merge, reports, on="report_id")\
    [EMISSIONS_SUMMARY_COLS]
    
emissions_lbs.to_csv("../outputs/largest-emissions-in-lbs.csv", index=False)
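
The cell above sorts the emissions before merging them with the report metadata. Depending on the pandas version, an inner `merge` may not keep the left frame's row order exactly, so if the exported CSV should be strictly ordered by quantity, one option is to sort after the merge. A sketch on toy frames with trimmed columns; the plant names are hypothetical.

```python
import pandas as pd

# Toy frames with trimmed columns
emissions = pd.DataFrame({
    "report_id": [2, 1, 2],
    "contaminant": ["Benzene", "NOX", "Carbon Monoxide"],
    "quantity": [50.0, 300.0, 10000.0],
})
reports = pd.DataFrame({
    "report_id": [1, 2],
    "Regulated entity name": ["PLANT A", "PLANT B"],
})

# Merge first, then sort, so the final order depends only on sort_values
merged = (
    emissions.merge(reports, on="report_id", how="inner")
    .sort_values(["quantity", "report_id", "contaminant"], ascending=False)
    .reset_index(drop=True)
)
print(merged)
```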

In [18]:
emissions_lbs.head()

| | report_id | Event began | Event ended | Regulated entity RN number | Regulated entity name | Type(s) of air emissions event | County | contaminant | authorization | limit | quantity | units |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 266261 | 2017-08-27 | 2017-09-06 | RN103919817 | CHEVRON PHILLIPS CHEMICAL CEDAR BAYOU PLANT | AIR SHUTDOWN | HARRIS | Carbon Monoxide | 1504A | 1892.04 LBS/HR | 244040.0 | lbs (est.) |
| 1 | 266261 | 2017-08-27 | 2017-09-06 | RN103919817 | CHEVRON PHILLIPS CHEMICAL CEDAR BAYOU PLANT | AIR SHUTDOWN | HARRIS | Carbon Monoxide | 37063 | 106.64 LBS/HR | 47000.0 | lbs (est.) |
| 2 | 266261 | 2017-08-27 | 2017-09-06 | RN103919817 | CHEVRON PHILLIPS CHEMICAL CEDAR BAYOU PLANT | AIR SHUTDOWN | HARRIS | Ethylene (gaseous) | 1504A | 1795.72 LBS/HR | 46861.0 | lbs (est.) |
| 3 | 266261 | 2017-08-27 | 2017-09-06 | RN103919817 | CHEVRON PHILLIPS CHEMICAL CEDAR BAYOU PLANT | AIR SHUTDOWN | HARRIS | NOX | 1504A | 261.96 LBS/HR | 33788.0 | lbs (est.) |
| 4 | 266261 | 2017-08-27 | 2017-09-06 | RN103919817 | CHEVRON PHILLIPS CHEMICAL CEDAR BAYOU PLANT | AIR SHUTDOWN | HARRIS | Benzene | 1504A | 1795.72 LBS/HR | 27505.0 | lbs (est.) |


In [19]:
emissions_lbs["quantity"].sum()

3436945.7755999998

## Analyze emissions by facility

In [20]:
lbs_grp = emissions_lbs.groupby([ "Regulated entity RN number" ])

lbs_by_entity = pd.DataFrame({
    "Regulated entity name": lbs_grp["Regulated entity name"].first(),
    "County": lbs_grp["County"].first(),
    "quantity_1000s": (lbs_grp["quantity"].sum() / 1000).round(2),
    "contaminants": lbs_grp["contaminant"].apply(lambda x: " • ".join(sorted(x.str.lower().unique()))),
    "report_ids": lbs_grp["report_id"].apply(lambda x: " • ".join(sorted(x.astype(str).str.lower().unique()))),
    "units": "lbs"
}).sort_values(["quantity_1000s", "Regulated entity name"], ascending=False)

lbs_by_entity.to_csv("../outputs/facilities-with-most-emissions-lbs.csv")
lbs_by_entity

| Regulated entity RN number | County | Regulated entity name | contaminants | quantity_1000s | report_ids | units |
|---|---|---|---|---|---|---|
| RN103919817 | HARRIS | CHEVRON PHILLIPS CHEMICAL CEDAR BAYOU PLANT | 1,3-butadiene • acetylene • benzene • butane •... | 745.47 | 266261 • 266262 | lbs |
| RN100217389 | JEFFERSON | FLINT HILLS RESOURCES PORT ARTHUR FACILITY | 1,3-butadiene • acetylene • benzene • butanes ... | 533.45 | 266301 • 266378 | lbs |
| RN100224815 | HARRIS | PASADENA TERMINAL | 1 c,2t,3-trimethylcyclopentane • 1,2,4-trimeth... | 406.56 | 266269 • 266556 | lbs |
| RN102579307 | HARRIS | EXXON MOBIL BAYTOWN REFINERY | 1,3-butadiene • ammonia • benzene • butane • b... | 383.85 | 266277 • 266294 | lbs |
| RN100825249 | BRAZORIA | CHEVRON PHILLIPS CHEMICAL SWEENY OLD OCEAN FAC... | 1,3-butadiene • acetylene • benzene • butanes ... | 222.73 | 266372 | lbs |
| RN100210608 | GALVESTON | MARATHON PETROLEUM TEXAS CITY REFINERY | ammonia • benzene • butadiene • butane, i • bu... | 201.51 | 266376 • 266570 | lbs |
| RN100238708 | BRAZORIA | CHOCOLATE BAYOU PLANT | 1,3-butadiene • butene • carbon monoxide • eth... | 152.72 | 266271 | lbs |
| RN102450756 | JEFFERSON | EXXONMOBIL BEAUMONT REFINERY | carbon monoxide • hydrogen sulfide • nitrogen ... | 108.69 | 266314 • 266516 | lbs |
| RN100235266 | NUECES | FHR CORPUS CHRISTI WEST PLANT | benzene • carbon monoxide • hydrogen sulfide •... | 108.51 | 266120 • 266278 | lbs |
| RN100218973 | CALHOUN | FORMOSA POINT COMFORT PLANT | acetylene • benzene • butadiene • butane • but... | 91.84 | 266124 | lbs |
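
The per-facility table above is built from several separate `groupby` results glued into a `DataFrame`. Named aggregation (`.agg(new_col=(source_col, func))`, available since pandas 0.25) can produce a similar summary in one pass. A sketch on toy rows; `join_unique`, the RN numbers and the plant names are hypothetical.

```python
import pandas as pd

# Toy rows, not the real emissions
emissions_lbs = pd.DataFrame({
    "Regulated entity RN number": ["RN1", "RN1", "RN2"],
    "Regulated entity name": ["PLANT A", "PLANT A", "PLANT B"],
    "County": ["HARRIS", "HARRIS", "JEFFERSON"],
    "report_id": [10, 11, 12],
    "contaminant": ["Benzene", "benzene", "NOX"],
    "quantity": [1500.0, 500.0, 4000.0],
})


def join_unique(values):
    """Lower-case, de-duplicate and sort values, then join them with bullets."""
    return " • ".join(sorted(values.astype(str).str.lower().unique()))


lbs_by_entity = (
    emissions_lbs.groupby("Regulated entity RN number")
    .agg(
        # ** unpacking allows an output column name that contains spaces
        **{"Regulated entity name": ("Regulated entity name", "first")},
        County=("County", "first"),
        quantity_1000s=("quantity", lambda s: round(s.sum() / 1000, 2)),
        contaminants=("contaminant", join_unique),
        report_ids=("report_id", join_unique),
    )
    .assign(units="lbs")
    .sort_values(["quantity_1000s", "Regulated entity name"], ascending=False)
)
print(lbs_by_entity)
```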

