# RIDOH
## Public School Water Lead Contamination Data

#### Source:
http://www.health.ri.gov/data/schools/water/

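Note that if you view the source of one of these pages, you'll see that all the data is pulled from a Google Sheets spreadsheet through this query URL: https://docs.google.com/spreadsheet/tq?key=&transpose=0&headers=1&gid=736080368. The response is plain text: a JSON object wrapped in a `google.visualization.Query.setResponse(...)` call.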
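The sketch below illustrates the shape of that response, assuming it follows the usual Google Visualization layout: `table.cols` holds the column labels and each `table.rows[i].c` holds cells of the form `{"v": value}`, or `null` for an empty cell. The payload is made up and abbreviated for illustration; only the field names (`table`, `cols`, `label`, `rows`, `c`, `v`) match the ones used in the cells below, and `unwrap_gviz` is a hypothetical helper.

```python
import json

# Made-up, abbreviated payload in the assumed format:
# a JSON object wrapped in google.visualization.Query.setResponse(...);
raw = (
    "google.visualization.Query.setResponse("
    '{"table": {"cols": [{"label": "Municipality"}, {"label": "Parts per Billion"}],'
    ' "rows": [{"c": [{"v": "Providence"}, {"v": "ND"}]},'
    ' {"c": [{"v": "East Greenwich"}, null]}]}}'
    ");"
)

def unwrap_gviz(text):
    """Hypothetical helper: parse the JSON between the first '(' and the last ')'."""
    return json.loads(text[text.index("(") + 1 : text.rindex(")")])

payload = unwrap_gviz(raw)
labels = [col["label"] for col in payload["table"]["cols"]]

# An empty cell can be null (None) instead of a dict, so check before reading "v"
rows = [[cell["v"] if isinstance(cell, dict) else None for cell in row["c"]]
        for row in payload["table"]["rows"]]

print(labels)  # ['Municipality', 'Parts per Billion']
print(rows)    # [['Providence', 'ND'], ['East Greenwich', None]]
```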

In [1]:
import re
import json
import pandas as pd
import requests

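fn = "../../../data/ridoh/json.txt"

# The saved response wraps the JSON in a JavaScript call,
# google.visualization.Query.setResponse({...}); on the file's second line
with open(fn, 'r') as f:
    wrapped = f.readlines()[1].strip()

# Keep only the JSON object between the first "(" and the last ")"
txt = wrapped[wrapped.index('(') + 1:wrapped.rindex(')')]
ridoh_json = json.loads(txt)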

In [2]:
# Convert the relevant parts of the JSON into a DataFrame

col_headers = [col['label'] for col in ridoh_json['table']['cols']]

def cell_value(cell):
    """Return a cell's value; empty cells are null (None) rather than dicts."""
    if not isinstance(cell, dict):
        return None
    value = cell['v']
    if isinstance(value, list):
        value = str(value)  # Collection times are lists or None
    return value

rows = [[cell_value(cell) for cell in row['c']]
        for row in ridoh_json['table']['rows']]

df = pd.DataFrame.from_records(rows, columns=col_headers)
df.head()

| | Municipality | School | Location | Type | Collection Date | Parts per Billion | Recommendation | COLLECTION_TIME | DETECTION_LIMIT (ppb) | Comments | School District / Daycare |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Providence | Meeting Street School | | | 12/20/2006 | ND | `<a href="http://health.ri.gov/water/about/lead...` | | 0.0 | Not detected-no limits listed | Charter |
| 1 | East Greenwich | Frenchtown School | Cafeteria | Faucet | 4/5/2016 | Less than 5 | `<a href="http://health.ri.gov/water/about/lead...` | | 5.0 | flush | East Greenwich |
| 2 | East Greenwich | Meadowbrook Farms School | | Water fountain | 4/5/2016 | Less than 5 | `<a href="http://health.ri.gov/water/about/lead...` | | 5.0 | flush | East Greenwich |
| 3 | East Providence | Alice M. Waddington School | Sample 101-4 | Water fountain | 7/12/2016 | Less than 5 | `<a href="http://health.ri.gov/water/about/lead...` | | 5.0 | First Draw | East Providence |
| 4 | East Providence | Alice M. Waddington School | Sample 101-5 | Water fountain | 7/12/2016 | Less than 5 | `<a href="http://health.ri.gov/water/about/lead...` | | 5.0 | First Draw | East Providence |


In [3]:
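# Some collection dates are ranges ("start-end"); split them into two columns.
# n=1 caps the split at two parts; single dates leave Date2 empty (NaT).
dates = df['Collection Date'].str.split(pat='-', n=1, expand=True)
dates.columns = ['Date1', 'Date2']
dates['Date1'] = pd.to_datetime(dates['Date1'].str.strip())
dates['Date2'] = pd.to_datetime(dates['Date2'].str.strip())
df = df.join(dates)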

We can use the data to find which schools have gone the longest since their water was last tested.

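Below, for each Municipality + School pair, I've listed the date of the most recent test and the lead level (ppb) recorded on that date, sorted from oldest to newest. When several samples were taken on that date, the highest ppb is shown. Results with no number, such as ND (not detected), are coded as -1, and results like "Less than 5" are recorded as the limit, 5. I've listed the 30 oldest.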
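The same selection logic can be sketched on a small made-up frame: find each school's latest test date, keep only the samples from that date, then take the highest ppb among them. This sketch swaps in `groupby(...).transform('max')` for the merge used in the next cell; the school names, dates, and values are invented for illustration.

```python
import pandas as pd

# Invented sample results for two schools
samples = pd.DataFrame({
    "School": ["A", "A", "A", "B", "B"],
    "max_date": pd.to_datetime(["2014-09-18", "2016-04-05", "2016-04-05",
                                "2012-08-16", "2012-08-16"]),
    "ppb": [20.0, 3.0, 4.0, 1.0, -1.0],
})

# Each school's most recent test date, broadcast back to every row
samples["last_date"] = samples.groupby("School")["max_date"].transform("max")

# Only samples from that date count, so A's older 20 ppb result is ignored
latest = samples[samples["max_date"] == samples["last_date"]]
summary = (latest.groupby(["School", "last_date"], as_index=False)["ppb"].max()
           .rename(columns={"ppb": "last_ppb"})
           .sort_values("last_date"))
print(summary)
```

Because `transform` returns a result aligned with the original rows, the per-school date can be compared row by row without creating `_x`/`_y` columns.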

In [4]:
# Latest date for each sample (the end date, if it was collected over a range)
df['max_date'] = df[['Date1', 'Date2']].max(axis=1)

# Each school's most recent test date. After the merge, max_date_x is the
# per-sample date and max_date_y is the school's latest test date.
all_max_dates = df.groupby(['Municipality', 'School'], as_index=False)['max_date'].max()
df = df.merge(all_max_dates, how='left', on=['Municipality', 'School'])

def parse_ppb(value):
    """Return the trailing number of a result ("Less than 5" -> 5.0).
    Results with no number, such as "ND" (not detected), are coded as -1."""
    match = re.search(r'(\d+(?:\.\d+)?)\s*$', str(value))
    return float(match.group(1)) if match else -1.0

df['ppb'] = df['Parts per Billion'].apply(parse_ppb)

# Keep only the samples taken on each school's most recent test date
on_last_date = (df['max_date_y'] == df['Date1']) | (df['max_date_y'] == df['Date2'])
df_last_test_results = df[on_last_date]

# Highest ppb recorded on that date, per school
last_ppb = (df_last_test_results
            .groupby(['Municipality', 'School', 'max_date_y'])['ppb']
            .max()
            .rename('last_ppb'))

# The 30 schools whose most recent test is oldest
(last_ppb.reset_index()
    .sort_values(by='max_date_y', ascending=True)
    .reset_index()
    .head(n=30))

| | index | Municipality | School | max_date_y | last_ppb |
|---|---|---|---|---|---|
| 0 | 199 | Providence | Meeting Street School | 2006-12-20 | -1 |
| 1 | 40 | Cranston | DCYF Alternative Education Program | 2012-08-16 | 1 |
| 2 | 26 | Charlestown | Lakeview Charlestown Early Learning Center | 2012-12-31 | 2 |
| 3 | 101 | Glocester | Pinewood Park School | 2014-08-18 | 3 |
| 4 | 206 | Providence | Rhode Island School for the Deaf | 2014-08-27 | 5 |
| 5 | 107 | Hopkinton | Trinity Lutheran Preschool | 2014-08-31 | 4 |
| 6 | 33 | Coventry | Washington Oak Elementary | 2014-09-18 | 1 |
| 7 | 34 | Coventry | Western Coventry Elementary | 2014-09-18 | 2 |
| 8 | 290 | West Greenwich | The Greene School - Building 1 | 2014-09-25 | 7 |
| 9 | 136 | Middletown | Silveira Kindergarten & Nursery School | 2015-08-30 | 1 |


In [5]:
## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## WORK IN PROGRESS
## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Next step: bring in Census data by school district.
#
# These variables could then be used to predict lead concentrations.
#
# Once we find which variables give the best fit, we can apply the
# model to data outside of Rhode Island (preferably to other New
# England states, to limit unobserved regional effects).

fn_census = "../../../data/census/config.txt"
with open(fn_census, 'r') as f:
    census_api_key = f.readline().strip()  # drop the trailing newline before using it in a URL

SCHOOL_DISTRICTS = ['elementary', 'secondary', 'unified'] # https://api.census.gov/data/2017/acs/acs5/geography.html

DATA_SERIES = {
    'B22003_002E' : 'Household received Food Stamps/SNAP in the past 12 months',
    'B19001_001E' : 'Total households',
    'B19001_002E' : 'Households with income less than $10k',
    'B19001_003E' : 'Households with income $10k-14,999',
    'B15003_017E' : 'High school diploma, 25+ years old',
    'B15003_001E' : 'Whole population 25+ years old'
}

census_api_req = 'https://api.census.gov/data/2017/acs/acs5?get=NAME,{}&for=school%20district%20({}):*&in=state:44&key={}'
# 44 is the state FIPS code for Rhode Island


# TODO: loop over SCHOOL_DISTRICTS and DATA_SERIES; for now, request
# one series (B19001_002E) for unified school districts
data = requests.get(census_api_req.format('B19001_002E', 'unified', census_api_key)).content

data.decode('utf-8').split('\n')

['[["NAME","B19001_002E","state","school district (unified)"],',
 '["Chariho Regional School District, Rhode Island","164","44","00150"],',
 '["Cumberland School District, Rhode Island","675","44","00270"],',
 '["Barrington School District, Rhode Island","179","44","00030"],',
 '["Bristol-Warren Regional School District, Rhode Island","808","44","00065"],',
 '["Central Falls School District, Rhode Island","847","44","00120"],',
 '["Burrillville School District, Rhode Island","252","44","00090"],',
 '["Coventry School District, Rhode Island","404","44","00210"],',
 '["Cranston School District, Rhode Island","1666","44","00240"],',
 '["East Greenwich School District, Rhode Island","214","44","00300"],',
 '["East Providence School District, Rhode Island","1504","44","00330"],',
 '["Johnston School District, Rhode Island","654","44","00540"],',
 '["Lincoln School District, Rhode Island","236","44","00570"],',
 '["Middletown School District, Rhode Island","353","44","00630"],',
 '["Narragan