## PEI Drinking Water Quality Summary

This notebook explores the data in [OD0039 Drinking Water Quality Summary Results](https://data.princeedwardisland.ca/Environment-and-Food/OD0039-Drinking-Water-Quality-Summary-Results/jq4v-y6dv), starting with its structure and contents.

From the Open Data Portal:

>This Data set provides detailed information about the quality of drinking water. The application provides a summary of the on-going testing of drinking water done by the Prince Edward Island Analytical Laboratories.

In [1]:
%matplotlib inline
# Dependencies.
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

# Style.
from matplotlib import style
style.use('fivethirtyeight')

In [2]:
# Data.
df = pd.read_csv('../resources/OD0039_Drinking_Water_Quality_Summary_Results.csv')
df.head()

Unnamed: 0,ID,Sample No,Watershed,Community,VMV Code,Value Flag,Value,Lab Analysis Date,Sample Date,Year-Month-Day,Year,Units,Variable Name,Variable Group,Method Description
0,255000001,141119020,West River,Darlington,7,,10.8,11/25/2014 12:00:00 AM,11/19/2014 12:00:00 AM,2014-11-19,2014,(mg/L),Alk Total,general chemistry,colourimetry
1,255000002,141119020,West River,Darlington,140,L,4.0,11/25/2014 12:00:00 AM,11/19/2014 12:00:00 AM,2014-11-19,2014,(µg/L),"Arsenic, dissolved, MS",general chemistry,Mass Spec
2,255000003,141119020,West River,Darlington,156,,9.24,11/25/2014 12:00:00 AM,11/19/2014 12:00:00 AM,2014-11-19,2014,(mg/L),"Calcium, dissolved, OES",general chemistry,OES
3,255000004,141119020,West River,Darlington,5,,15.5,11/25/2014 12:00:00 AM,11/19/2014 12:00:00 AM,2014-11-19,2014,(mg/L),Chloride,general chemistry,colourimetry
4,255000005,141119020,West River,Darlington,147,L,9.0,11/25/2014 12:00:00 AM,11/19/2014 12:00:00 AM,2014-11-19,2014,(µg/L),"Iron, dissolved, MS",general chemistry,Mass Spec


In [3]:
df.nunique()

ID                    162443
Sample No              10870
Watershed                229
Community                536
VMV Code                  19
Value Flag                 2
Value                   8328
Lab Analysis Date        975
Sample Date             1264
Year-Month-Day          1264
Year                       6
Units                      3
Variable Name             19
Variable Group             3
Method Description         6
dtype: int64
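
`Sample Date` and `Year-Month-Day` both report 1264 unique values, which suggests they carry the same information in two formats. A minimal sketch of how that could be confirmed, assuming the date formats seen in the `df.head()` output hold for every row. The `sample` frame is a hypothetical stand-in built from two rows shown in this notebook, and `date_columns_agree` is an illustrative helper, not part of the dataset or of pandas.

```python
import pandas as pd

# Hypothetical stand-in for the real CSV: values copied from rows shown in this notebook.
sample = pd.DataFrame({
    'Sample Date': ['11/19/2014 12:00:00 AM', '06/13/2019 12:00:00 AM'],
    'Year-Month-Day': ['2014-11-19', '2019-06-13'],
    'Year': [2014, 2019],
})

def date_columns_agree(frame):
    """Return True if Sample Date, Year-Month-Day and Year describe the same dates."""
    # Assumed format: month/day/year with a 12-hour clock and an AM/PM marker.
    parsed = pd.to_datetime(frame['Sample Date'], format='%m/%d/%Y %I:%M:%S %p')
    ymd = pd.to_datetime(frame['Year-Month-Day'], format='%Y-%m-%d')
    same_day = (parsed.dt.normalize() == ymd).all()
    same_year = (parsed.dt.year == frame['Year']).all()
    return bool(same_day and same_year)

print(date_columns_agree(sample))
```

If the same call returns `True` on the full `df`, the two derived columns could be dropped or ignored in later analysis without losing information.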

In [4]:
# Show that a VMV Code identifies a single Variable Name.
df.loc[df['Variable Name'] == 'Alk Total', 'VMV Code'].unique()

array([7], dtype=int64)
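
The cell above checks one variable only. A sketch of how the same check could cover every variable at once, by counting distinct partners in both directions. The `toy` frame and the `is_one_to_one` helper are hypothetical and used only for illustration; on the real data the call would take `df` instead.

```python
import pandas as pd

# Hypothetical miniature of the real data, using two name/code pairs shown in the notebook.
toy = pd.DataFrame({
    'Variable Name': ['Alk Total', 'Chloride', 'Alk Total', 'Chloride'],
    'VMV Code': [7, 5, 7, 5],
})

def is_one_to_one(frame, left, right):
    """Return True if each value in `left` pairs with exactly one value in `right`, and vice versa."""
    left_ok = frame.groupby(left)[right].nunique().eq(1).all()
    right_ok = frame.groupby(right)[left].nunique().eq(1).all()
    return bool(left_ok and right_ok)

print(is_one_to_one(toy, 'Variable Name', 'VMV Code'))
```

Checking both directions matters: matching unique counts (19 and 19 in `df.nunique()`) are consistent with a one-to-one mapping but do not prove it on their own.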

In [5]:
# Build a lookup table pairing each Variable Name with its VMV Code.
var_names = df['Variable Name'].unique().tolist()
var_codes = []
for var in var_names:
    # Take the first code recorded for this variable.
    var_codes.append(df.loc[df['Variable Name'] == var, 'VMV Code'].iloc[0])
df_codes = pd.DataFrame({'Variable Name': var_names, 'VMV Code': var_codes})
df_codes

Unnamed: 0,Variable Name,VMV Code
0,Alk Total,7
1,"Arsenic, dissolved, MS",140
2,"Calcium, dissolved, OES",156
3,Chloride,5
4,"Iron, dissolved, MS",147
5,"Magnesium, dissolved, OES",157
6,"Manganese, dissolved",149
7,Nitrate-N,13
8,pH (chem lab),18
9,"Phosphorus, dissolved, OES",161


In [6]:
# Count the rows recorded for each variable (printed in the same order as var_names).
for var in var_names:
    print(df[df['Variable Name'] == var]['ID'].count())

10665
10274
10681
10668
10299
10680
10295
10781
10673
10676
10676
10205
10688
10678
10584
1612
1631
673
4
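
The loop above prints counts without labels, so each number has to be matched to `var_names` by position. A sketch of an alternative that keeps the names attached, using `Series.value_counts`. The `toy` frame, the `rare_variables` helper, and the threshold are hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical miniature of the 'Variable Name' column.
toy = pd.DataFrame({
    'Variable Name': ['Alk Total', 'Chloride', 'Alk Total',
                      'Ammonia-N', 'Chloride', 'Alk Total'],
})

def variable_counts(frame):
    """Rows per variable, labelled by name and sorted from most to least frequent."""
    return frame['Variable Name'].value_counts()

def rare_variables(frame, threshold):
    """Names of variables that appear fewer than `threshold` times."""
    counts = variable_counts(frame)
    return counts[counts < threshold].index.tolist()

print(variable_counts(toy))
print(rare_variables(toy, threshold=2))
```

On the full dataset, a call such as `rare_variables(df, threshold=100)` would be one way to surface sparsely measured variables like the one with only 4 rows.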


In [7]:
# The 'Ammonia-N' variable appears only 4 times; inspect those rows.
df[df['Variable Name'] == 'Ammonia-N']

Unnamed: 0,ID,Sample No,Watershed,Community,VMV Code,Value Flag,Value,Lab Analysis Date,Sample Date,Year-Month-Day,Year,Units,Variable Name,Variable Group,Method Description
145829,255145826,190613063,Charlottetown,Not indicated on request form,102,L,0.1,06/18/2019 12:00:00 AM,06/13/2019 12:00:00 AM,2019-06-13,2019,(mg/L),Ammonia-N,general chemistry,colourimetry
157973,255157974,190918059,Charlottetown,Charlottetown,102,L,0.1,10/02/2019 12:00:00 AM,09/18/2019 12:00:00 AM,2019-09-18,2019,(mg/L),Ammonia-N,general chemistry,colourimetry
159185,255159186,190927038,Charlottetown,Charlottetown,102,,0.154,10/22/2019 12:00:00 AM,09/26/2019 12:00:00 AM,2019-09-26,2019,(mg/L),Ammonia-N,general chemistry,colourimetry
160670,255160671,191011044,Charlottetown,Charlottetown,102,,0.124,10/22/2019 12:00:00 AM,10/10/2019 12:00:00 AM,2019-10-10,2019,(mg/L),Ammonia-N,general chemistry,colourimetry


In [8]:
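# Spot-check a single sample: list every result recorded for Sample No 190613063,
# one of the samples that includes an Ammonia-N measurement.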
df[df['Sample No'] == 190613063]

Unnamed: 0,ID,Sample No,Watershed,Community,VMV Code,Value Flag,Value,Lab Analysis Date,Sample Date,Year-Month-Day,Year,Units,Variable Name,Variable Group,Method Description
145828,255145825,190613063,Charlottetown,Not indicated on request form,7,L,8.0,06/25/2019 12:00:00 AM,06/13/2019 12:00:00 AM,2019-06-13,2019,(mg/L),Alk Total,general chemistry,colourimetry
145829,255145826,190613063,Charlottetown,Not indicated on request form,102,L,0.1,06/18/2019 12:00:00 AM,06/13/2019 12:00:00 AM,2019-06-13,2019,(mg/L),Ammonia-N,general chemistry,colourimetry
145830,255145827,190613063,Charlottetown,Not indicated on request form,140,L,0.1,06/24/2019 12:00:00 AM,06/13/2019 12:00:00 AM,2019-06-13,2019,(µg/L),"Arsenic, dissolved, MS",general chemistry,Mass Spec
145831,255145828,190613063,Charlottetown,Not indicated on request form,143,L,0.2,07/05/2019 12:00:00 AM,06/13/2019 12:00:00 AM,2019-06-13,2019,(µg/L),"Cadmium, dissolved, MS",general chemistry,Mass Spec
145832,255145829,190613063,Charlottetown,Not indicated on request form,156,L,0.2,06/24/2019 12:00:00 AM,06/13/2019 12:00:00 AM,2019-06-13,2019,(mg/L),"Calcium, dissolved, OES",general chemistry,OES
145833,255145830,190613063,Charlottetown,Not indicated on request form,5,,1.9,06/25/2019 12:00:00 AM,06/13/2019 12:00:00 AM,2019-06-13,2019,(mg/L),Chloride,general chemistry,colourimetry
145834,255145831,190613063,Charlottetown,Not indicated on request form,144,L,0.2,07/05/2019 12:00:00 AM,06/13/2019 12:00:00 AM,2019-06-13,2019,(µg/L),"Chromium, dissolved, MS",general chemistry,Mass Spec
145835,255145832,190613063,Charlottetown,Not indicated on request form,147,L,2.0,06/24/2019 12:00:00 AM,06/13/2019 12:00:00 AM,2019-06-13,2019,(µg/L),"Iron, dissolved, MS",general chemistry,Mass Spec
145836,255145833,190613063,Charlottetown,Not indicated on request form,157,L,0.1,06/24/2019 12:00:00 AM,06/13/2019 12:00:00 AM,2019-06-13,2019,(mg/L),"Magnesium, dissolved, OES",general chemistry,OES
145837,255145834,190613063,Charlottetown,Not indicated on request form,149,,1.4,06/24/2019 12:00:00 AM,06/13/2019 12:00:00 AM,2019-06-13,2019,(µg/L),"Manganese, dissolved",general chemistry,Manganese
