This starts the next phase of the cleaning. We are now working with the cleaned State Regulations table, but there remains a group of contaminants in the lab results that are reported in more than one set of units, or in units that differ from the regulation units.

In [4]:
SELECT  s.contaminant, 
        s.reg_units, 
        l.units
FROM    state_regulations s
    LEFT JOIN 
        lab_results l
    ON s.contaminant = l.parameter
WHERE s.reg_units <> l.units
GROUP BY s.contaminant, s.reg_units, l.units

contaminant,reg_units,units
Dissolved Copper,ug/L,mg/L
Dissolved Lead,ug/L,mg/L
Dissolved Selenium,ug/L,mg/L
Dissolved Mercury,ug/L,mg/L
Dissolved Barium,ug/L,mg/L
Dissolved Strontium,ug/L,mg/L
Dissolved Antimony,ug/L,mg/L
Dissolved Beryllium,ug/L,mg/L
bis(2-Ethylhexyl) phthalate,ug/L,mg/Kg
Dissolved Uranium,ug/L,mg/L as N
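
The query above compares lab units against regulation units. A complementary check is to ask which parameters appear in the lab results with more than one unit at all. The following is a minimal sketch, using Python's built-in `sqlite3` module as a stand-in for the real database. The table is cut down to three columns and the copper rows are made-up values; only the two mercury rows echo results that appear later in this notebook.

```python
import sqlite3

# In-memory stand-in for the real lab_results table (assumption: 3 columns only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lab_results (parameter TEXT, units TEXT, result REAL)")
conn.executemany(
    "INSERT INTO lab_results VALUES (?, ?, ?)",
    [
        ("Dissolved Mercury", "mg/L", 0.002),
        ("Dissolved Mercury", "ng/L", 1.6),
        ("Dissolved Copper", "mg/L", 0.01),   # made-up value
        ("Dissolved Copper", "mg/L", 0.02),   # made-up value
    ],
)

# Parameters that were reported in more than one distinct unit.
multi_unit = conn.execute(
    """
    SELECT parameter, COUNT(DISTINCT units) AS n_units
    FROM lab_results
    GROUP BY parameter
    HAVING COUNT(DISTINCT units) > 1
    ORDER BY parameter
    """
).fetchall()

print(multi_unit)
```

With these toy rows only Dissolved Mercury is flagged, since copper is reported in a single unit.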


Most of these are an easy fix - the lab reports mg/L where the regulation uses ug/L, which is a factor of 1000 (1 mg/L = 1000 ug/L). These are shown below:

In [5]:
SELECT  s.contaminant, 
        s.reg_units, 
        l.units
FROM    state_regulations s
    LEFT JOIN 
        lab_results l
    ON  s.contaminant = l.parameter
WHERE   s.reg_units = 'ug/L'
    AND 
        l.units = 'mg/L'
GROUP BY s.contaminant, s.reg_units, l.units

contaminant,reg_units,units
Dissolved Aluminum,ug/L,mg/L
Dissolved Antimony,ug/L,mg/L
Dissolved Arsenic,ug/L,mg/L
Dissolved Barium,ug/L,mg/L
Dissolved Beryllium,ug/L,mg/L
Dissolved Cadmium,ug/L,mg/L
Dissolved Copper,ug/L,mg/L
Dissolved Lead,ug/L,mg/L
Dissolved Mercury,ug/L,mg/L
Dissolved Nickel,ug/L,mg/L


Those which are not listed here are:

- bis(2-Ethylhexyl) phthalate, which is reported in mg/Kg. Because it was measured in water, and 1 kg of water is approximately 1 L, mg/Kg can be treated as mg/L and converted to ug/L the same way as the above (multiply by 1000).
- Dissolved Uranium, whose lab units (mg/L as N) make no sense for uranium. I can either assume that mg/L was intended or disregard these measurements altogether.
- Dissolved Nitrate. Many of the nitrate results were reported as mg/L. As discussed previously, these have to be multiplied by 0.226 to convert to mg/L as N.
- Dissolved Mercury, which has results in both mg/L and ng/L. The ng/L values have to be divided by 1000 to convert them to ug/L.
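
The conversion factors in this list can be checked with a little arithmetic. The sketch below is in Python rather than SQL so that it runs on its own; the `FACTORS` lookup table and the `convert` helper are hypothetical names used only for illustration. It also shows where 0.226 comes from: the mass fraction of nitrogen in the nitrate ion, using standard atomic masses.

```python
# Atomic masses in g/mol (standard values).
N_MASS = 14.007
O_MASS = 15.999

# Mass fraction of nitrogen in the nitrate ion, NO3-.
NITRATE_TO_N = N_MASS / (N_MASS + 3 * O_MASS)

# (lab units, regulation units) -> multiplier. Hypothetical lookup table.
FACTORS = {
    ("mg/L", "ug/L"): 1000.0,
    ("mg/Kg", "ug/L"): 1000.0,            # assumes a water sample, 1 kg ~ 1 L
    ("ng/L", "ug/L"): 0.001,
    ("mg/L", "mg/L as N"): NITRATE_TO_N,  # only meaningful for nitrate
}


def convert(result, lab_units, reg_units):
    """Return result expressed in reg_units; unchanged if the units already match."""
    if lab_units == reg_units:
        return result
    return result * FACTORS[(lab_units, reg_units)]


print(round(NITRATE_TO_N, 3))          # 0.226
print(convert(0.002, "mg/L", "ug/L"))
print(convert(1.6, "ng/L", "ug/L"))
```

An unknown unit pair raises a `KeyError` instead of passing the value through silently, which is usually the safer behavior when cleaning data.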

In [13]:
SELECT  s.contaminant, 
        s.reg_units, 
        l.units, 
        l.result,
    CASE 
        WHEN (s.reg_units = 'ug/L' AND l.units = 'mg/L') THEN l.result * 1000  -- mg/L to ug/L
        WHEN l.units = 'mg/Kg' THEN l.result * 1000  -- treat mg/Kg in water as mg/L, then to ug/L
        WHEN (s.contaminant = 'Dissolved Uranium' AND l.units = 'mg/L as N') THEN l.result * 1000  -- assume mg/L was intended
        WHEN (s.contaminant = 'Dissolved Nitrate' AND l.units = 'mg/L') THEN l.result * 0.226  -- mg/L to mg/L as N
        WHEN (s.contaminant = 'Dissolved Mercury' AND l.units = 'ng/L') THEN l.result / 1000  -- ng/L to ug/L
        ELSE l.result  -- no conversion applied
    END AS Fixed_Result

FROM    state_regulations s
    LEFT JOIN 
        lab_results l
    ON  s.contaminant = l.parameter
WHERE s.contaminant = 'Dissolved Mercury' AND l.result <> 0

contaminant,reg_units,units,result,Fixed_Result
Dissolved Mercury,ug/L,mg/L,0.2,200.0
Dissolved Mercury,ug/L,mg/L,0.2,200.0
Dissolved Mercury,ug/L,mg/L,0.2,200.0
Dissolved Mercury,ug/L,mg/L,0.01,10.0
Dissolved Mercury,ug/L,mg/L,0.002,2.0
Dissolved Mercury,ug/L,ng/L,0.8,0.0008
Dissolved Mercury,ug/L,ng/L,1.6,0.0016
Dissolved Mercury,ug/L,ng/L,1.0,0.001
Dissolved Mercury,ug/L,ng/L,1.1,0.0011
Dissolved Mercury,ug/L,ng/L,1.0,0.001


All of the corrected values look normal except for the data converted from ng/L to ug/L. These look at least 1000 times too low, which suggests that they may have been measured in ug/L or mg/L, and not actually in ng/L. I want to see whether these measurements came from a small subset of counties or station IDs, and which test method was used for them.

After looking into the EPA 1631 E (D) method for quantifying mercury, I found that its minimum detection limit is 0.2 ng/L, and the lab parameters suggest a minimum reporting limit of 0.5 ng/L. These values look very low compared to many others because this test is far more sensitive. It's possible that many of the 0 ug/L or 0 mg/L records had values that would have been reported at the ng/L level, but were recorded as 0 because the assays used for those records were less sensitive.
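
To make that last point concrete, here is a small sketch in Python. It takes the non-zero ng/L mercury results from the output above and expresses them in mg/L, then applies a reporting limit for a less sensitive assay. That limit (`HYPOTHETICAL_LIMIT_MG_PER_L`) is an invented number purely for illustration; the real reporting limits of the mg/L records are not known from this section.

```python
NG_PER_MG = 1_000_000  # 1 mg = 1,000,000 ng

# Non-zero ng/L mercury results from the output above.
results_ng_per_l = [0.8, 1.6, 1.0, 1.1, 1.0]

# HYPOTHETICAL reporting limit of a less sensitive assay, in mg/L.
HYPOTHETICAL_LIMIT_MG_PER_L = 0.0002


def as_reported(value_mg_per_l, limit_mg_per_l):
    """Record a value as 0 when it falls below the assay's reporting limit."""
    return value_mg_per_l if value_mg_per_l >= limit_mg_per_l else 0.0


reported = [
    as_reported(v / NG_PER_MG, HYPOTHETICAL_LIMIT_MG_PER_L)
    for v in results_ng_per_l
]
print(reported)
```

Under this assumed limit every one of these samples would be recorded as 0, which is consistent with the idea that the zeros elsewhere in the data may hide real ng/L-level concentrations.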

In [18]:
SELECT  s.contaminant, 
        s.reg_units, 
        l.units, 
        l.result,
        l.*,

    CASE 
        WHEN (s.reg_units = 'ug/L' AND l.units = 'mg/L') THEN l.result * 1000 
        WHEN l.units = 'mg/Kg' THEN l.result * 1000
        WHEN (s.contaminant = 'Dissolved Uranium' AND l.units = 'mg/L as N') THEN l.result * 1000
        WHEN (s.contaminant = 'Dissolved Nitrate' AND l.units = 'mg/L') THEN l.result * 0.226
        WHEN (s.contaminant = 'Dissolved Mercury' AND l.units = 'ng/L') THEN l.result / 1000
        ELSE l.result 
    END AS Fixed_Result

FROM    state_regulations s
    LEFT JOIN 
        lab_results l
    ON  s.contaminant = l.parameter
WHERE s.contaminant = 'Dissolved Mercury' AND l.result <> 0 AND l.units = 'ng/L'

contaminant,reg_units,units,result,station_id,station_name,full_station_name,station_number,station_type,latitude,longitude,status_,county_name,sample_code,sample_date,sample_depth,sample_depth_units,parameter,result.1,reporting_limit,units.1,method_name,Fixed_Result
Dissolved Mercury,ug/L,ng/L,9.9,47675,CCSB Weir North,Cache Creek Settling Basin Overflow Weir- North,A0270001,Surface Water,38.687236,121.673675,Reviewed and Validated,Yolo,EH0117B0004,2017-01-11 10:10:00.000,1.0,Meters,Dissolved Mercury,9.9,0.5,ng/L,EPA 1631 E (D),0.0099
Dissolved Mercury,ug/L,ng/L,1.5,47675,CCSB Weir North,Cache Creek Settling Basin Overflow Weir- North,A0270001,Surface Water,38.687236,121.673675,Reviewed and Validated,Yolo,EH0317B0574,2017-03-28 08:45:00.000,1.0,Meters,Dissolved Mercury,1.5,0.5,ng/L,EPA 1631 E (D),0.0015
Dissolved Mercury,ug/L,ng/L,1.6,47676,CCSB Weir South,Cache Creek Settling Basin Overflow Weir- South,A0270002,Surface Water,38.682525,121.673456,Reviewed and Validated,Yolo,EH0317B0385,2017-03-01 11:15:00.000,1.0,Meters,Dissolved Mercury,1.6,0.5,ng/L,EPA 1631 E (D),0.0016
Dissolved Mercury,ug/L,ng/L,2.1,145,Toe Drain YB LISBON,Yolo Bypass Toe Drain Below Lisbon Weir,B9D82851352,Surface Water,38.4749,121.5883,Reviewed and Validated,Yolo,EH0217B0351,2017-02-16 13:55:00.000,1.0,Meters,Dissolved Mercury,2.1,0.5,ng/L,EPA 1631 E (D),0.0021
Dissolved Mercury,ug/L,ng/L,1.6,145,Toe Drain YB LISBON,Yolo Bypass Toe Drain Below Lisbon Weir,B9D82851352,Surface Water,38.4749,121.5883,Reviewed and Validated,Yolo,EH0417B0771,2017-04-26 16:00:00.000,1.0,Meters,Dissolved Mercury,1.6,0.5,ng/L,EPA 1631 E (D),0.0016
Dissolved Mercury,ug/L,ng/L,1.1,47768,ShagSl Bl Stairsteps,Shag Slough Below the Stairsteps,B9S81911416,Surface Water,38.31843,121.69314,Reviewed and Validated,Yolo,EH0317B0531,2017-03-16 09:05:00.000,1.0,Meters,Dissolved Mercury,1.1,0.5,ng/L,EPA 1631 E (D),0.0011
Dissolved Mercury,ug/L,ng/L,3.1,779,Putah Cr@Mace,Putah Cr@Mace,J06137534264208,Other,38.519,121.6951,Reviewed and Validated,Yolo,EH0316B0241,2016-03-15 18:15:00.000,1.0,Meters,Dissolved Mercury,3.1,0.5,ng/L,EPA 1631 E (D),0.0031
Dissolved Mercury,ug/L,ng/L,2.3,47839,LibCut ds Stairsteps,Liberty Cut downstream of the Stairsteps,B9D82001400,Surface Water,38.33334,121.667308,Reviewed and Validated,Yolo,EH0317B0460,2017-03-02 11:35:00.000,1.0,Meters,Dissolved Mercury,2.3,0.5,ng/L,EPA 1631 E (D),0.0023
Dissolved Mercury,ug/L,ng/L,4.7,145,Toe Drain YB LISBON,Yolo Bypass Toe Drain Below Lisbon Weir,B9D82851352,Surface Water,38.4749,121.5883,Reviewed and Validated,Yolo,EH0117B0044,2017-01-12 12:00:00.000,1.0,Meters,Dissolved Mercury,4.7,0.5,ng/L,EPA 1631 E (D),0.0047
Dissolved Mercury,ug/L,ng/L,0.6,47884,North Lindsey,Lindsey Slough Tidal Wetland,B9150500,Surface Water,38.210789,121.793659,Reviewed and Validated,Solano,EH0118B0064,2018-02-06 12:00:00.000,0.5,Meters,Dissolved Mercury,0.6,0.5,ng/L,EPA 1631 E (D),0.0006
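
To answer the county, station, and method question directly, the rows can be grouped instead of read one by one. The sketch below loads only the ten rows shown above into an in-memory SQLite table through Python's `sqlite3` module (a stand-in for the real database, with the columns cut down to four), so the counts describe this output only and may differ on the full table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mercury_ng (station_id INTEGER, county_name TEXT, "
    "method_name TEXT, result REAL)"
)

METHOD = "EPA 1631 E (D)"
# (station_id, county_name, result) copied from the ten rows shown above.
rows = [
    (47675, "Yolo", 9.9), (47675, "Yolo", 1.5), (47676, "Yolo", 1.6),
    (145, "Yolo", 2.1), (145, "Yolo", 1.6), (47768, "Yolo", 1.1),
    (779, "Yolo", 3.1), (47839, "Yolo", 2.3), (145, "Yolo", 4.7),
    (47884, "Solano", 0.6),
]
conn.executemany(
    "INSERT INTO mercury_ng VALUES (?, ?, ?, ?)",
    [(station, county, METHOD, result) for station, county, result in rows],
)

# Samples and distinct stations per county and test method.
summary = conn.execute(
    """
    SELECT county_name,
           method_name,
           COUNT(*) AS n_samples,
           COUNT(DISTINCT station_id) AS n_stations
    FROM mercury_ng
    GROUP BY county_name, method_name
    ORDER BY n_samples DESC
    """
).fetchall()

for line in summary:
    print(line)
```

For these ten rows, all results come from a single method, and nine of the ten come from Yolo County.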


Final step before shifting this to Tableau for a dashboard: I do not want to keep all of the data from the original lab results table, but I do want to include the new calculated results. I will create a new table and dataset for this analysis, and save it as a CSV.

In [3]:
SELECT TOP 1 * 
FROM state_regulations

contaminant,state_max,state_det_limit,state_health_goal,state_health_date,federal_max,federal_max_goal,reg_units
Dissolved Aluminum,1000,50,600,2001,,,ug/L


The columns from the state_regulations table that need to be kept in the analysis:

- contaminant
- state_max
- federal_max
- reg_units

In [4]:
SELECT TOP 1 *
FROM lab_results

station_id,station_name,full_station_name,station_number,station_type,latitude,longitude,status_,county_name,sample_code,sample_date,sample_depth,sample_depth_units,parameter,result,reporting_limit,units,method_name
8135,01S04E32C001M,01S04E32C001M,01S04E32C001M,Groundwater,37.8073,121.5617,Review Status Unknown,Alameda,WDIS_0719152,1967-05-03 09:00:00.000,,Feet,Conductance,3480,1,uS/cm,EPA 120.1


The columns that need to be maintained from the lab_results table:

- station_id
- station_name
- station_type
- latitude
- longitude
- county_name
- sample_date
- parameter
- result
- units

  

Furthermore, I don't need the unregulated data, so I'll perform the join before creating the new table. Joining from state_regulations keeps only the lab results whose parameter matches a regulated contaminant.

In [6]:
SELECT  s.contaminant, 
        s.state_max, 
        s.federal_max,
        l.result,
        s.reg_units, 
        l.station_id,
        l.station_name,
        l.station_type, 
        l.latitude, 
        l.longitude, 
        l.county_name,  
        l.sample_date,
    CASE 
        WHEN (s.reg_units = 'ug/L' AND l.units = 'mg/L') THEN l.result * 1000 
        WHEN l.units = 'mg/Kg' THEN l.result * 1000
        WHEN (s.contaminant = 'Dissolved Uranium' AND l.units = 'mg/L as N') THEN l.result * 1000
        WHEN (s.contaminant = 'Dissolved Nitrate' AND l.units = 'mg/L') THEN l.result * 0.226
        WHEN (s.contaminant = 'Dissolved Mercury' AND l.units = 'ng/L') THEN l.result / 1000
        ELSE l.result 
    END AS Fixed_Result
FROM    state_regulations s
    LEFT JOIN 
        lab_results l
    ON  s.contaminant = l.parameter
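
The query above only selects the rows; it does not yet create the new table or write the CSV. One way those two remaining steps could look is sketched below in Python with `sqlite3` and `csv`. Everything here is an assumption for illustration: the table name `regulated_results`, the cut-down column set, the shortened CASE expression, and the toy rows (the aluminum lab value is made up, and the mercury limits are left blank as placeholders). The sketch uses an inner join so that regulated contaminants with no samples do not produce empty rows, whereas the LEFT JOIN above would keep them.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state_regulations (contaminant TEXT, state_max REAL,
                                federal_max REAL, reg_units TEXT);
CREATE TABLE lab_results (station_id INTEGER, county_name TEXT,
                          parameter TEXT, result REAL, units TEXT);
""")
conn.executemany("INSERT INTO state_regulations VALUES (?, ?, ?, ?)", [
    ("Dissolved Aluminum", 1000, None, "ug/L"),
    ("Dissolved Mercury", None, None, "ug/L"),   # limits: placeholder blanks
])
conn.executemany("INSERT INTO lab_results VALUES (?, ?, ?, ?, ?)", [
    (1, "Alameda", "Dissolved Aluminum", 0.05, "mg/L"),   # made-up row
    (47676, "Yolo", "Dissolved Mercury", 1.6, "ng/L"),
    (8135, "Alameda", "Conductance", 3480, "uS/cm"),      # unregulated
])

# Materialize the joined, unit-corrected rows as a new table.
conn.execute("""
CREATE TABLE regulated_results AS
SELECT s.contaminant, s.state_max, s.federal_max, s.reg_units,
       l.station_id, l.county_name, l.result,
       CASE
           WHEN s.reg_units = 'ug/L' AND l.units = 'mg/L' THEN l.result * 1000
           WHEN s.contaminant = 'Dissolved Mercury' AND l.units = 'ng/L'
               THEN l.result / 1000
           ELSE l.result
       END AS Fixed_Result
FROM state_regulations s
JOIN lab_results l ON s.contaminant = l.parameter
""")

cursor = conn.execute("SELECT * FROM regulated_results ORDER BY contaminant")
header = [col[0] for col in cursor.description]
rows = cursor.fetchall()

# Write to an in-memory buffer; swap in open("some_file.csv", "w", newline="")
# to produce a real file for Tableau.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

In T-SQL the closest equivalent of `CREATE TABLE ... AS SELECT` is `SELECT ... INTO new_table FROM ...`. The unregulated Conductance row never reaches the new table because it has no match in `state_regulations`.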