**Water Quality Units Analysis**

I need to make sure that the units of measure are comparable when comparing the state regulations to the measured values. To do this, I'll join the lab results table with the state regulations table and identify where the units are mismatched. Initially, I want to look at all of the lab results that match contaminants in the state\_regulations table. In the cleaning phase, the contaminant names in the state regulations should have been changed to match the naming conventions in the lab results file.

In [33]:
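-- Every distinct parameter/unit pair in lab_results not reported in mg/L
-- (GROUP BY already removes duplicates, so DISTINCT is not needed)
SELECT  
    parameter, 
    units
FROM    
    lab_results 
WHERE 
    units <> 'mg/L'
GROUP BY parameter, units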

parameter,units
Specific Conductance,uS/cm@25 °C
"1,2-Dichlorobenzene",ug/L
"1,3-Dichlorobenzene",ug/L
Chlorobenzene,ug/L
Dissolved Beryllium,ug/L
Dissolved Copper,ug/L
Total Iron,ug/L
Total Selenium,ug/L
Organic Phosphorus Pesticides (OPP),ug/L
"2,4-D",ug/L


In [28]:
SELECT 
    *
FROM 
    state_regulations

Contaminant,State_MCL,State_DLR,State_PHG,PHG_Date,Federal_MCL,Federal_MCLG,Units
"1,1,1-Trichloroethane",0.2,0.0005,1.0,2006.0,0.2,0.2,mg/L
"1,1,2,2-Tetrachloroethane",0.001,0.0005,0.0001,2003.0,0.1,0.1,mg/L
"1,1,2-Trichloroethane",0.005,0.0005,0.0003,2006.0,0.005,0.003,mg/L
"1,1,2-Trichlorotrifluoroethane",1.2,0.01,4.0,2011.0,,,mg/L
"1,1-Dichloroethane",0.005,0.0005,0.003,2003.0,,,mg/L
"1,1-Dichloroethene",0.006,0.0005,0.01,1999.0,0.007,0.007,mg/L
"1,2,3-Trichloropropane",5e-06,5e-06,7e-07,2009.0,,,mg/L
"1,2,4-Trichlorobenzene",0.005,0.0005,0.005,1999.0,0.07,0.07,mg/L
"1,2-Dibromo-3-chloropropane (DBCP)",0.0002,1e-05,3e-06,2020.0,0.0002,0.0,mg/L
"1,2-Dichlorobenzene",0.6,0.0005,0.6,2009.0,0.6,0.6,mg/L


In [38]:
SELECT  
        l.station_type,
        l.station_id, 
        l.county_name,
        l.parameter, 
        l.result, 
        l.units,
        s.State_MCL,
        s.units
FROM    lab_results l 
        INNER JOIN 
        state_regulations s
                ON l.parameter = s.contaminant
WHERE 
    parameter = 'Dissolved Nitrate'

station_type,station_id,county_name,parameter,result,units,State_MCL,units.1
Groundwater,13510,Alameda,Dissolved Nitrate,9.0,mg/L,1,mg/L as N
Groundwater,13510,Alameda,Dissolved Nitrate,8.0,mg/L,1,mg/L as N
Groundwater,13510,Alameda,Dissolved Nitrate,10.0,mg/L,1,mg/L as N
Groundwater,13510,Alameda,Dissolved Nitrate,9.0,mg/L,1,mg/L as N
Groundwater,13511,Alameda,Dissolved Nitrate,16.0,mg/L,1,mg/L as N
Groundwater,13511,Alameda,Dissolved Nitrate,12.0,mg/L,1,mg/L as N
Groundwater,13513,Alameda,Dissolved Nitrate,26.0,mg/L,1,mg/L as N
Groundwater,13513,Alameda,Dissolved Nitrate,23.0,mg/L,1,mg/L as N
Groundwater,13513,Alameda,Dissolved Nitrate,20.0,mg/L,1,mg/L as N
Groundwater,13513,Alameda,Dissolved Nitrate,23.0,mg/L,1,mg/L as N


Well, this is fun!

Nitrate in mg/L vs. nitrate as N in mg/L

"Nitrate as N" reports only the mass of the nitrogen atom within the nitrate ion (NO3⁻), and nitrogen makes up only about 22.6% of the nitrate ion's mass (14.007 g/mol of N out of 62.004 g/mol of NO3⁻).

Measurements reported as Dissolved Nitrate in mg/L can therefore be converted to nitrate nitrogen with: nitrate as N = nitrate x 0.226.

The state/federal nitrate standards are all recorded as nitrate as N, and likewise for nitrite. Since no nitrite results were recorded as plain Dissolved Nitrite (that is, not as N), nitrite will not need to be converted.

Unfortunately, this is not an adjustment that can be made on the state regulations, since the affected values are field/lab results recorded at different stations and times. So every Dissolved Nitrate result in mg/L will have to be multiplied by 0.226 and relabeled with units of mg/L as N.
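As a quick check on the 22.6% figure, here is a small Python sketch that derives the factor from standard atomic weights (N ≈ 14.007 g/mol, O ≈ 15.999 g/mol) and applies it to one of the results above. The function name `nitrate_to_nitrate_as_n` is hypothetical, not something from the original notebook.

```python
# Standard atomic weights in g/mol (rounded)
N_MOLAR_MASS = 14.007
O_MOLAR_MASS = 15.999

# Nitrate ion NO3-: one nitrogen atom plus three oxygen atoms (~62.004 g/mol)
NITRATE_MOLAR_MASS = N_MOLAR_MASS + 3 * O_MOLAR_MASS

# Fraction of the nitrate ion's mass contributed by nitrogen (~0.226)
NITRATE_TO_N = N_MOLAR_MASS / NITRATE_MOLAR_MASS


def nitrate_to_nitrate_as_n(nitrate_mg_per_l: float) -> float:
    """Convert a nitrate result in mg/L (as NO3) to mg/L as N."""
    return nitrate_mg_per_l * NITRATE_TO_N


if __name__ == "__main__":
    print(f"factor = {NITRATE_TO_N:.3f}")
    # First Alameda station 13510 reading from the query output above
    print(f"9.0 mg/L nitrate = {nitrate_to_nitrate_as_n(9.0):.2f} mg/L as N")
```

With these weights the factor comes out to about 0.2259, which rounds to the 0.226 used here; a 9.0 mg/L nitrate reading is roughly 2.03 mg/L as N.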

In [39]:
SELECT  
    l.station_type, 
    l.county_name,
    l.parameter, 
    l.result, 
    l.units,
    s.state_MCL,
    s.units
FROM    
    lab_results l 
    INNER JOIN 
    state_regulations s
        ON l.parameter = s.contaminant
WHERE 
    l.units = s.units

station_type,county_name,parameter,result,units,state_MCL,units.1
Surface Water,Alameda,Dissolved Mercury,0.0,mg/L,0.002,mg/L
Surface Water,Alameda,Dissolved Nickel,0.002,mg/L,0.1,mg/L
Surface Water,Alameda,Dissolved Nitrate + Nitrite,0.05,mg/L as N,1.0,mg/L as N
Surface Water,Alameda,Dissolved Selenium,0.0,mg/L,0.05,mg/L
Surface Water,Alameda,Dissolved Aluminum,0.0,mg/L,1.0,mg/L
Surface Water,Alameda,Dissolved Antimony,0.0,mg/L,0.006,mg/L
Surface Water,Alameda,Dissolved Arsenic,0.002,mg/L,0.01,mg/L
Surface Water,Alameda,Dissolved Barium,0.059,mg/L,1.0,mg/L
Surface Water,Alameda,Dissolved Beryllium,0.0,mg/L,0.004,mg/L
Surface Water,Alameda,Dissolved Cadmium,0.0,mg/L,0.005,mg/L


The query above returns every row where both the parameter and the units match. These rows are ready to be analyzed, and no changes need to be made. However, I also need to look at the rows where the units do not match and adjust the standards table to account for the differences.

Furthermore, I will have to deal with the nitrate situation within the lab results using a calculated field.
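One way to express that calculated field is a pair of `CASE` expressions. The sketch below runs the SQL against a tiny in-memory SQLite table via Python's standard `sqlite3` module, as a stand-in for whatever database the notebook actually queries; the column aliases `result_adj` and `units_adj` are hypothetical names, and the rows are copied from the outputs above.

```python
import sqlite3

# Toy in-memory stand-in for lab_results (rows copied from the outputs above)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lab_results (parameter TEXT, result REAL, units TEXT)")
conn.executemany(
    "INSERT INTO lab_results VALUES (?, ?, ?)",
    [
        ("Dissolved Nitrate", 9.0, "mg/L"),
        ("Dissolved Nitrate", 26.0, "mg/L"),
        ("Dissolved Nitrate + Nitrite", 0.05, "mg/L as N"),
    ],
)

# Two calculated fields: plain-nitrate results in mg/L are scaled by 0.226
# and relabeled "mg/L as N"; every other row passes through unchanged.
query = """
SELECT
    parameter,
    CASE WHEN parameter = 'Dissolved Nitrate' AND units = 'mg/L'
         THEN result * 0.226 ELSE result END AS result_adj,
    CASE WHEN parameter = 'Dissolved Nitrate' AND units = 'mg/L'
         THEN 'mg/L as N' ELSE units END AS units_adj
FROM lab_results
ORDER BY rowid
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Computing the adjustment in the query leaves the raw `result` column untouched, so the conversion can be revisited later without reloading data.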

In [41]:
SELECT  
        l.station_type, 
        l.county_name,
        l.parameter, 
        l.result, 
        l.units,
        s.state_MCL,
        s.units
FROM    lab_results l 
        INNER JOIN 
        state_regulations s
                ON l.parameter = s.contaminant
WHERE 
    l.units <> s.units


station_type,county_name,parameter,result,units,state_MCL,units.1
Surface Water,Alameda,"1,2,3-Trichloropropane",0.0,ug/L,5e-06,mg/L
Surface Water,Alameda,"1,2,4-Trichlorobenzene",0.0,ug/L,0.005,mg/L
Surface Water,Alameda,"1,2-Dibromo-3-chloropropane (DBCP)",0.0,ug/L,0.0002,mg/L
Surface Water,Alameda,"1,2-Dichlorobenzene",0.0,ug/L,0.6,mg/L
Surface Water,Alameda,"1,2-Dichloroethane",0.0,ug/L,0.0005,mg/L
Surface Water,Alameda,"1,2-Dichloropropane",0.0,ug/L,0.005,mg/L
Surface Water,Alameda,"1,4-Dichlorobenzene",0.0,ug/L,0.005,mg/L
Surface Water,Alameda,Benzene,0.0,ug/L,0.001,mg/L
Surface Water,Alameda,Carbon tetrachloride,0.0,ug/L,0.0005,mg/L
Surface Water,Alameda,Chlorobenzene,0.0,ug/L,0.07,mg/L


There are nearly as many rows with mismatched units as with matching units. I'm going to group the parameters and units distinctly to find out which ones are mismatched, to determine where to make adjustments in the Descontaminate file.

Because the state regulations won't change each time there is a new measurement, and data gathering will most likely continue to use the same units, it seems more reasonable to convert the state\_regulations values to ug/L wherever the lab measurements are in ug/L and the standard is in mg/L. That way we don't have to transform the raw data each time it is collected.
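A minimal sketch of that adjustment, assuming it is done with a SQL `UPDATE` (shown here on in-memory SQLite tables through Python's `sqlite3`; the MCLs and units are copied from the query outputs above). It rescales a standard from mg/L to ug/L only when the lab reports that contaminant exclusively in ug/L, so parameters that show up in both mg/L and ug/L, such as Dissolved Barium, are left for separate handling.

```python
import sqlite3

# In-memory toy tables; MCLs and units copied from the query outputs above
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state_regulations (Contaminant TEXT, State_MCL REAL, Units TEXT);
CREATE TABLE lab_results (parameter TEXT, units TEXT);
INSERT INTO state_regulations VALUES
    ('Benzene', 0.001, 'mg/L'),
    ('Chlorobenzene', 0.07, 'mg/L'),
    ('Dissolved Barium', 1.0, 'mg/L');
INSERT INTO lab_results VALUES
    ('Benzene', 'ug/L'),
    ('Chlorobenzene', 'ug/L'),
    ('Dissolved Barium', 'mg/L'),
    ('Dissolved Barium', 'ug/L');
""")

# Rescale mg/L standards to ug/L (1 mg = 1000 ug), but only for contaminants
# the lab reports exclusively in ug/L; mixed-unit parameters are skipped.
# The IS NOT NULL guard keeps NOT IN from failing on NULL parameters.
conn.execute("""
UPDATE state_regulations
SET State_MCL = State_MCL * 1000,
    Units = 'ug/L'
WHERE Units = 'mg/L'
  AND Contaminant IN (SELECT parameter FROM lab_results WHERE units = 'ug/L')
  AND Contaminant NOT IN (
      SELECT parameter FROM lab_results
      WHERE units <> 'ug/L' AND parameter IS NOT NULL)
""")

regs = {
    name: (mcl, units)
    for name, mcl, units in conn.execute(
        "SELECT Contaminant, State_MCL, Units FROM state_regulations")
}
print(regs)
```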

Before making any changes, I want to look at every distinct parameter/unit combination between the lab results and the state/federal regulations again.

In [67]:
SELECT  
        l.parameter, 
        l.units,
        s.units
FROM    lab_results l 
        INNER JOIN 
        state_regulations s
                ON l.parameter = s.contaminant
GROUP BY 
    l.parameter,
    l.units, 
    s.units 
ORDER BY 
    l.parameter

parameter,units,units.1
"1,1,1-Trichloroethane",ug/L,mg/L
"1,1,2,2-Tetrachloroethane",ug/L,mg/L
"1,1,2-Trichloroethane",ug/L,mg/L
"1,1,2-Trichlorotrifluoroethane",ug/L,mg/L
"1,1-Dichloroethane",ug/L,mg/L
"1,1-Dichloroethene",ug/L,mg/L
"1,2,3-Trichloropropane",ug/L,mg/L
"1,2,4-Trichlorobenzene",ug/L,mg/L
"1,2-Dibromo-3-chloropropane (DBCP)",ug/L,mg/L
"1,2-Dichlorobenzene",ug/L,mg/L


1. Dissolved Uranium: mg/L as N vs. pCi/L. Either of these should be in ug/L; mg/L as N makes no sense for uranium.
2. Uranium mass converts to activity at about 0.67 pCi/ug, so ug/L x 0.67 = pCi/L (equivalently, mg/L x 670) - FIXED
3. Dissolved Strontium: mg/L vs. pCi/L
4. EPA's lifetime health advisory for strontium is 4 mg/L - FIXED
5. Dissolved Nitrate: mg/L (not mg/L as N) -- factor of x 0.226
6. 2,3,7,8-Tetrachlorodibenzo... : pg/L vs. mg/L -- factor of 10^9 - FIXED
7. Dissolved Mercury: ng/L vs. mg/L -- factor of 10^6. This one is more complicated: there are measurements in ug/L, ng/L, and mg/L, so I'm setting this to ug/L.
8. bis(2-Ethylhexyl) phthalate: mg/Kg vs. mg/L -- for water samples these should be approximately equivalent, since 1 L of water has a mass of about 1 kg - FIXED
9. ug/L vs. mg/L (see the grouped query output above) -- factor of 1000
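The mass-unit factors in the notes above can be collected into one lookup so each conversion is applied the same way everywhere. This is a hedged Python sketch: the names `TO_UG_PER_L` and `to_ug_per_l` are hypothetical, ug/L is chosen as the target unit to match the standardization described earlier, and treating mg/Kg as mg/L assumes dilute water samples (1 kg ≈ 1 L).

```python
# Factors that convert each mass-concentration unit in the notes to ug/L.
TO_UG_PER_L = {
    "pg/L": 1e-6,   # 1 ug = 1,000,000 pg
    "ng/L": 1e-3,   # 1 ug = 1,000 ng
    "ug/L": 1.0,
    "mg/L": 1e3,    # 1 mg = 1,000 ug
    # Assumption: dilute water samples, so 1 kg of sample is about 1 L
    "mg/Kg": 1e3,
}


def to_ug_per_l(value: float, units: str) -> float:
    """Convert a concentration to ug/L.

    Raises KeyError for units that are not plain mass concentrations
    (e.g. "pCi/L" or "mg/L as N"), which need their own handling.
    """
    return value * TO_UG_PER_L[units]


if __name__ == "__main__":
    # Dissolved Mercury MCL from the outputs above: 0.002 mg/L
    print(to_ug_per_l(0.002, "mg/L"))
```

The pg/L and mg/L entries differ by a factor of 10^9 and the ng/L and mg/L entries by 10^6, matching items 6 and 7. Letting unknown units raise an error, rather than silently passing through, keeps activity units and "as N" units from slipping into the comparison unconverted.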
  
Duplicates with different units:
1. bis(2-Ethylhexyl) phthalate in both mg/Kg and ug/L
2. Dissolved Aluminum in mg/L and ug/L
3. Dissolved Antimony in mg/L and ug/L
4. Dissolved Arsenic in mg/L and ug/L
5. Dissolved Barium in mg/L and ug/L
6. Dissolved Beryllium, Cadmium, Copper, and Lead in mg/L and ug/L
7. Dissolved Mercury in mg/L, ug/L, and ng/L
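A list like this can be generated directly instead of read off by eye: group by parameter and keep only groups with more than one distinct unit. The sketch below is an assumption about how that might be done, run on a toy in-memory SQLite table through Python's `sqlite3`; the parameter/unit pairs come from the list above and the earlier outputs, and the aliases `n_units` and `unit_list` are hypothetical.

```python
import sqlite3

# Toy stand-in for lab_results; parameter/unit pairs taken from the list above
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lab_results (parameter TEXT, units TEXT)")
conn.executemany(
    "INSERT INTO lab_results VALUES (?, ?)",
    [
        ("Dissolved Mercury", "mg/L"),
        ("Dissolved Mercury", "ug/L"),
        ("Dissolved Mercury", "ng/L"),
        ("Dissolved Barium", "mg/L"),
        ("Dissolved Barium", "ug/L"),
        ("Benzene", "ug/L"),
        ("Benzene", "ug/L"),
    ],
)

# Parameters that appear under more than one unit, plus the units involved
query = """
SELECT parameter,
       COUNT(DISTINCT units)        AS n_units,
       GROUP_CONCAT(DISTINCT units) AS unit_list
FROM lab_results
GROUP BY parameter
HAVING COUNT(DISTINCT units) > 1
ORDER BY parameter
"""
dupes = conn.execute(query).fetchall()
for row in dupes:
    print(row)
```

Benzene appears twice but always in ug/L, so the `HAVING` clause drops it. The order of units inside `unit_list` is not guaranteed by SQLite.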

In [65]:
SELECT  
        l.parameter
FROM    lab_results l 
        INNER JOIN 
        state_regulations s
                ON l.parameter = s.contaminant
WHERE 
    l.units = 'mg/L'
    AND 
    s.units = 'mg/L'
GROUP BY 
    l.parameter,
    l.units, 
    s.units 
ORDER BY 
    l.parameter

parameter
Cyanide
Dissolved Aluminum
Dissolved Antimony
Dissolved Arsenic
Dissolved Barium
Dissolved Beryllium
Dissolved Cadmium
Dissolved Copper
Dissolved Fluoride
Dissolved Lead


Next step: I have a reduced dataset, regulated_contaminants, that contains only the lab results that inner-join to the state regulations. It is about 7x smaller than the original set. I need to apply the unit conversions to resolve the discrepancies before loading it into Tableau.
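One possible shape for that step, sketched under assumptions: a small lookup table of unit factors (the name `unit_factors` is hypothetical) is joined to both sides, so each row of regulated_contaminants carries the result and the MCL in the same unit. This runs on in-memory SQLite through Python's `sqlite3` with rows copied from the outputs above; it illustrates the idea and is not the notebook's actual build.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lab_results (parameter TEXT, result REAL, units TEXT);
CREATE TABLE state_regulations (Contaminant TEXT, State_MCL REAL, Units TEXT);
-- Hypothetical lookup table: factor that converts each unit to ug/L
CREATE TABLE unit_factors (units TEXT PRIMARY KEY, to_ug_per_l REAL);

INSERT INTO lab_results VALUES
    ('Benzene', 0.0, 'ug/L'),
    ('Dissolved Barium', 0.059, 'mg/L');
INSERT INTO state_regulations VALUES
    ('Benzene', 0.001, 'mg/L'),
    ('Dissolved Barium', 1.0, 'mg/L');
INSERT INTO unit_factors VALUES
    ('pg/L', 1e-6), ('ng/L', 1e-3), ('ug/L', 1.0), ('mg/L', 1000.0);

-- Keep only lab rows that match a regulated contaminant, with both the
-- result and the MCL expressed in ug/L
CREATE TABLE regulated_contaminants AS
SELECT l.parameter,
       l.result * lf.to_ug_per_l    AS result_ug_per_l,
       s.State_MCL * sf.to_ug_per_l AS mcl_ug_per_l
FROM lab_results l
JOIN state_regulations s ON l.parameter = s.Contaminant
JOIN unit_factors lf     ON lf.units = l.units
JOIN unit_factors sf     ON sf.units = s.Units;
""")

rows = conn.execute(
    "SELECT * FROM regulated_contaminants ORDER BY parameter"
).fetchall()
for row in rows:
    print(row)
```

Because these are inner joins, any row whose units are missing from the lookup (pCi/L, mg/L as N) silently drops out, so the nitrate and radionuclide cases would need to be converted before this step or added to the lookup explicitly.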