In [1]:
%pylab inline
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import csv
import numpy
import pandas as pd

Populating the interactive namespace from numpy and matplotlib


## North Carolina TRI Data, 1997-2001

In this notebook, we look at releases reported to the Toxics Release Inventory (TRI) in North Carolina for the years 1997 through 2001. The three main goals of this notebook are:

- Explore top releasing facilities in terms of total releases and carcinogenic releases
- Explore top releasing industry sectors in terms of total releases and carcinogenic releases
- Visualize on-site releases in NC over this time period using Basemap

#### Top Releasing Facilities
Identifying the facilities in NC with the highest total releases, and the highest carcinogenic releases, during this period gives us an idea of which locations in the state may be most heavily affected by toxic releases.

#### Top Releasing Industry Sectors
Looking at the top releasing industry sectors in NC is interesting in its own right. It also lets us ask whether a sector's releases are high because of the nature of the sector itself or because that industry type is relatively dense in NC. This will be useful later when comparing against other states.

#### Basemap Visualization
Using Basemap to visualise features of these data in NC is a good way to work out which features are amenable and relevant to geographical mapping, as opposed to numerical trends that are more naturally represented with charts.

### Importing the TRI Data

In [2]:
# reading in each set of csv data
nc_97 = pd.read_csv('./data/TRI_1997_NC.csv')
nc_98 = pd.read_csv('./data/TRI_1998_NC.csv')
nc_99 = pd.read_csv('./data/TRI_1999_NC.csv')
nc_00 = pd.read_csv('./data/TRI_2000_NC.csv')
nc_01 = pd.read_csv('./data/TRI_2001_NC.csv')

# store the yearly DataFrames in one list so we can loop over them in comprehensions, etc.
nc_data = [nc_97, nc_98, nc_99, nc_00, nc_01]
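
The five `read_csv` calls above follow one pattern, so the loading step could also be written as a small helper. This is only a sketch: `load_tri_years` is a hypothetical name, not something the notebook defines, and it assumes every yearly file shares the same columns.

```python
import pandas as pd

def load_tri_years(paths):
    """Read one CSV per year and stack them into a single DataFrame.

    `paths` may hold file paths or open file-like objects,
    since pandas.read_csv accepts both.
    """
    frames = [pd.read_csv(path) for path in paths]
    # ignore_index gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)
```

With the files used in this notebook, it could be called as `load_tri_years(['./data/TRI_%d_NC.csv' % y for y in range(1997, 2002)])`. A single combined frame is convenient for multi-year aggregates, while the `nc_data` list stays handy for year-by-year comparisons.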

## Top Releasing Facilities

In [42]:
# code, charts and graphs related to top releasing facilities

# keep only the attributes of interest for each year
totalreleases = [yr[['YEAR', 'FACILITY_NAME', 'CHEMICAL', 'TOTAL_RELEASES']] for yr in nc_data]
# ten largest individual release records (one facility and chemical pair each) per year
topten_byyear = [yr.sort_values('TOTAL_RELEASES', ascending=False).head(10) for yr in totalreleases]
topten_byyear[0]

|  | YEAR | FACILITY_NAME | CHEMICAL | TOTAL_RELEASES |
|---|---|---|---|---|
| 2512 | 1997 | ELEMENTIS CHROMIUM INC | CHROMIUM COMPOUNDS(EXCEPT CHROMITE ORE MINED I... | 9120298.0 |
| 1197 | 1997 | PCS PHOSPHATE CO INC | PHOSPHORIC ACID | 8391999.0 |
| 1538 | 1997 | PCS PHOSPHATE CO INC | AMMONIA | 3692213.0 |
| 807 | 1997 | SMITHFIELD-TAR HEEL | NITRATE COMPOUNDS | 3682714.0 |
| 608 | 1997 | GERDAU LONG STEEL NA-CHARLOTTE MILL | ZINC COMPOUNDS | 3292326.0 |
| 942 | 1997 | BLUE RIDGE PAPER PRODUCTS INC (DBA EVERGREEN P... | METHANOL | 1932700.0 |
| 281 | 1997 | SHURTAPE TECHNOLOGIES LLC - HICKORY TAPE PLANT | TOLUENE | 1864554.0 |
| 1065 | 1997 | INTERNATIONAL PAPER RIEGELWOOD MILL | METHANOL | 1571000.0 |
| 2266 | 1997 | DOMTAR PAPER CO LLC-PLYMOUTH MILL | METHANOL | 1480000.0 |
| 107 | 1997 | RAILROAD FRICTION PRODUCTS CORP | N-HEXANE | 1120000.0 |
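
The table above ranks individual facility and chemical records for a single year. To rank facilities themselves over the whole period, one option is to sum every record per facility across all five years. The sketch below is hedged: `top_facilities` is a hypothetical helper, and the `CARCINOGEN` column with `'YES'`/`'NO'` values is an assumption about the TRI files that should be checked against the actual column names before use.

```python
import pandas as pd

def top_facilities(frames, n=10, carcinogen_only=False):
    """Return the n facilities with the largest summed TOTAL_RELEASES."""
    combined = pd.concat(frames, ignore_index=True)
    if carcinogen_only:
        # ASSUMPTION: a CARCINOGEN column holding 'YES'/'NO' strings
        combined = combined[combined['CARCINOGEN'] == 'YES']
    # add up every chemical and every year for each facility
    totals = combined.groupby('FACILITY_NAME')['TOTAL_RELEASES'].sum()
    return totals.nlargest(n)
```

Calling `top_facilities(nc_data)` would give the overall ranking, and `top_facilities(nc_data, carcinogen_only=True)` the carcinogenic one, provided the assumed column exists.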


## Top Releasing Industry Sectors

In [48]:
# code, charts and graphs related to top releasing industry sectors
sectors = [yr[['INDUSTRY_SECTOR', 'CHEMICAL', 'TOTAL_RELEASES']] for yr in nc_data]
# total 1997 releases for each sector and chemical pair (left in group order, not sorted by size)
sectors[0].groupby(['INDUSTRY_SECTOR', 'CHEMICAL']).sum()

| INDUSTRY_SECTOR | CHEMICAL | TOTAL_RELEASES |
|---|---|---|
| Apparel | AMMONIA | 1065.0 |
| Apparel | CHLORINE | 0.0 |
| Apparel | DIMETHYL PHTHALATE | 976.0 |
| Apparel | NAPHTHALENE | 157.0 |
| Apparel | SULFURIC ACID (1994 AND AFTER ACID AEROSOLS" ONLY)" | 0.0 |
| Beverages | AMMONIA | 101600.0 |
| Beverages | CHLORINE | 0.0 |
| Beverages | HYDROCHLORIC ACID (1995 AND AFTER ACID AEROSOLS" ONLY)" | 82000.0 |
| Beverages | NITRATE COMPOUNDS | 21300.0 |
| Beverages | PHOSPHORIC ACID | 0.0 |
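
To separate "this sector releases a lot" from "this sector simply has many facilities in NC", one rough approach is to compare each sector's total releases with the number of distinct reporting facilities. The helper below is a sketch under that assumption: `sector_summary` and its output column names are hypothetical, and releases per facility is only a crude proxy for sector intensity.

```python
import pandas as pd

def sector_summary(year_df):
    """Summarise one year of TRI data by industry sector."""
    grouped = year_df.groupby('INDUSTRY_SECTOR')
    summary = pd.DataFrame({
        # all releases reported by the sector
        'total_releases': grouped['TOTAL_RELEASES'].sum(),
        # number of distinct facilities reporting in the sector
        'facility_count': grouped['FACILITY_NAME'].nunique(),
    })
    summary['releases_per_facility'] = (
        summary['total_releases'] / summary['facility_count']
    )
    return summary.sort_values('total_releases', ascending=False)
```

For example, `sector_summary(nc_data[0])` would summarise 1997. A sector with a high total but a modest per-facility figure is probably large mainly because of how many facilities it has in the state.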


## Basemap Visualization

In [5]:
#code and Basemap visualizations
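
As a starting point for this cell, the sketch below sums releases per facility location and draws them as scaled dots on a Basemap of NC. Several things here are assumptions rather than facts from the data: the `LATITUDE` and `LONGITUDE` column names, the approximate bounding box, and the helper names `releases_by_location` and `plot_releases`. It sums `TOTAL_RELEASES` by default; an on-site release column could be passed as `value_col` if the files provide one.

```python
import pandas as pd

def releases_by_location(frames, value_col='TOTAL_RELEASES'):
    """Sum releases per facility location across all years."""
    combined = pd.concat(frames, ignore_index=True)
    # ASSUMPTION: LATITUDE and LONGITUDE columns in decimal degrees
    keys = ['FACILITY_NAME', 'LATITUDE', 'LONGITUDE']
    return combined.groupby(keys, as_index=False)[value_col].sum()

def plot_releases(points, value_col='TOTAL_RELEASES'):
    """Scatter facilities on a map of NC, sized by summed releases."""
    # imported here so the aggregation above works without Basemap installed
    from mpl_toolkits.basemap import Basemap
    import matplotlib.pyplot as plt

    # approximate bounding box for North Carolina (illustrative values)
    m = Basemap(projection='merc', resolution='i',
                llcrnrlon=-84.5, llcrnrlat=33.5,
                urcrnrlon=-75.0, urcrnrlat=37.0)
    m.drawcoastlines()
    m.drawstates()
    # convert longitude/latitude to map projection coordinates
    x, y = m(points['LONGITUDE'].values, points['LATITUDE'].values)
    # scale marker area relative to the largest releaser
    sizes = 200 * points[value_col] / points[value_col].max()
    m.scatter(x, y, s=sizes, alpha=0.5, color='red')
    plt.title('Summed releases by facility')
    plt.show()
```

Usage would be `plot_releases(releases_by_location(nc_data))`. Because a few facilities dominate the totals, a logarithmic size scale may be worth trying so that smaller facilities stay visible.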