This file is the second iteration of the regulated contaminants table: 

- It starts with queries that look for unit inconsistencies between the state regulations and the lab results.
- The lab results will likely continue to be reported in their current units and are updated each year, while the state regulations are fixed. It therefore makes more sense to convert the units in the state regulations so that they match most of the lab results.
- Throughout this analysis, several conversions were changed and documented in the code, which ultimately affected the Decontaminator function in Decontaminator.py.
- There are still inconsistencies across California, where some labs measured the same contaminant in different units than others.
    - These will be addressed in a future set of functions.
- Ultimately, the changes were made to the Decontaminator function, and the database was then repopulated with the corrected data, reading from the original documentation each time to maintain the fidelity of the data. Since the original documentation has only 98 rows, every value was verified to ensure these are the correct regulatory standards with appropriate units.

In [9]:
-- Rebuild the joined table: each lab result paired with its state and federal limits
DROP TABLE IF EXISTS regulated_contaminants;

SELECT  s.contaminant,
        s.state_max   AS State_Max,
        s.federal_max AS Federal_Max,
        s.reg_units   AS Reg_Units,
        l.*
INTO    regulated_contaminants
FROM    state_regulations s
    INNER JOIN
        lab_results l
    ON  l.parameter = s.contaminant;

In [10]:
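-- Results above the state limit whose units differ from the regulation units (ug/L results excluded).
-- Note: factor is the raw ratio result/State_Max; no unit conversion is applied.
SELECT  Contaminant, 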
        State_Max,
        Reg_Units, 
        county_name,
        result,
        units, 
        sample_date, 
        (result/State_Max) AS factor
FROM regulated_contaminants
WHERE State_Max < result
    AND 
    Reg_Units <> units
    AND units <> 'ug/L'
ORDER BY (result/State_Max) DESC;

Contaminant,State_Max,Reg_Units,county_name,result,units,sample_date,factor
Dissolved Nitrate,1,10 as N mg/L,Kings,1460.0,mg/L,2001-05-21 09:15:00.000,1460.0
Dissolved Nitrate,1,10 as N mg/L,Lassen,975.0,mg/L,1965-08-10 00:00:00.000,975.0
Dissolved Nitrate,1,10 as N mg/L,Santa Barbara,974.0,mg/L,1961-06-18 00:00:00.000,974.0
Dissolved Nitrate,1,10 as N mg/L,Kern,808.0,mg/L,1963-01-28 00:00:00.000,808.0
Dissolved Nitrate,1,10 as N mg/L,Kings,734.0,mg/L,2001-09-25 10:00:00.000,734.0
Dissolved Nitrate,1,10 as N mg/L,Los Angeles,720.0,mg/L,1957-10-09 00:00:00.000,720.0
Dissolved Nitrate,1,10 as N mg/L,Ventura,700.0,mg/L,1965-10-02 00:00:00.000,700.0
Dissolved Nitrate,1,10 as N mg/L,Ventura,650.0,mg/L,1966-04-21 00:00:00.000,650.0
Dissolved Nitrate,1,10 as N mg/L,Fresno,650.0,mg/L,1966-07-12 00:04:00.000,650.0
Dissolved Arsenic,10,ug/L,Sacramento,5880.0,mg/L,1991-10-18 09:00:00.000,588.0


In [11]:
SELECT  Contaminant, 
        State_Max,
        Reg_Units, 
        county_name,
        result,
        units, 
        sample_date, 
        (result/State_Max) AS factor
FROM regulated_contaminants
WHERE Contaminant = 'Dissolved Uranium'

ORDER BY (result/State_Max) DESC;

Contaminant,State_Max,Reg_Units,county_name,result,units,sample_date,factor
Dissolved Uranium,20,ug/L,Kings,8.47,mg/L as N,2000-01-10 09:10:00.000,0.4235
Dissolved Uranium,20,ug/L,Kings,7.83,mg/L as N,2000-01-11 11:15:00.000,0.3915
Dissolved Uranium,20,ug/L,Kings,7.49,mg/L as N,2000-01-11 10:20:00.000,0.3745


In [1]:
SELECT * 
FROM lab_results
WHERE parameter LIKE '%Uranium%'

station_id,station_name,full_station_name,station_number,station_type,latitude,longitude,status_,county_name,sample_code,sample_date,sample_depth,sample_depth_units,parameter,result,reporting_limit,units,method_name
5553,TD VGD3906,TILE DRAIN VGD3906,VGD3906,Other,,,Review Status Unknown,Kings,FSZ0100B0527,2000-01-10 09:10:00.000,1.0,Meters,Dissolved Uranium,8.47,0.05,mg/L as N,EPA 300.0 28d Hold
4323,TD ERR7525,TILE DRAIN ERR7525,ERR7525,Other,,,Review Status Unknown,Kings,FSZ0100B0518,2000-01-11 10:20:00.000,1.0,Meters,Dissolved Uranium,7.49,0.05,mg/L as N,EPA 300.0 28d Hold
5408,TD GSY0855,TILE DRAIN GSY0855,GSY0855,Other,,,Review Status Unknown,Kings,FSZ0100B0521,2000-01-11 11:15:00.000,1.0,Meters,Dissolved Uranium,7.83,0.05,mg/L as N,EPA 300.0 28d Hold


In [3]:
SELECT * 
FROM state_regulations
WHERE Contaminant LIKE '%Uranium%'

Contaminant,State_MCL,State_DLR,State_PHG,PHG_Date,Federal_MCL,Federal_MCLG,Units
Dissolved Uranium,20,1,0.43,2001,30,0,pCi/L


Changes have been made to the units in the state regulations data, so the state_regulations table will be dropped and repopulated.

In [5]:
DROP TABLE dbo.state_regulations;

CREATE TABLE state_regulations (
	contaminant VARCHAR(250) NOT NULL, 
	state_max FLOAT NOT NULL, 
	state_det_limit FLOAT, 
	state_health_goal FLOAT, 
	state_health_date INT, 
	federal_max FLOAT,
	federal_max_goal FLOAT, 
	reg_units VARCHAR(50)
);


In [6]:
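-- Load the corrected regulations from the source CSV, skipping the header row
BULK INSERT dbo.state_regulations
FROM 'C:\Users\justi\OneDrive\Desktop\Analytics\Water_Quality\Data\state_regulations.csv'
WITH 
(
	FORMAT = 'CSV',   -- FORMAT = 'CSV' requires SQL Server 2017 or later
	FIRSTROW = 2
);
GO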

In [7]:
SELECT * 
FROM state_regulations;

contaminant,state_max,state_det_limit,state_health_goal,state_health_date,federal_max,federal_max_goal,reg_units
Dissolved Aluminum,1000.0,50.0,600.0,2001.0,,,ug/L
Dissolved Antimony,6.0,6.0,1.0,2016.0,6.0,6.0,ug/L
Dissolved Arsenic,10.0,2.0,0.004,2004.0,10.0,0.0,ug/L
"Asbestos, Chrysotile",7.0,0.2,7.0,2003.0,7.0,7.0,MFL
Dissolved Barium,1000.0,100.0,2000.0,2003.0,2000.0,2000.0,ug/L
Dissolved Beryllium,4.0,1.0,1.0,2003.0,4.0,4.0,ug/L
Dissolved Cadmium,5.0,1.0,0.04,2006.0,5.0,5.0,ug/L
Total Chromium,50.0,10.0,,1999.0,100.0,100.0,ug/L
Cyanide,0.15,0.1,0.15,1997.0,0.2,0.2,mg/L
Dissolved Fluoride,2.0,0.1,1.0,1997.0,4.0,4.0,mg/L


Now that the state_regulations table has been updated with the most consistent naming and units, the next step is to determine which lab measurements are consistent with it and which are problematic.

In [4]:
SELECT  s.contaminant, 
        s.reg_units, 
        l.units
FROM    state_regulations s
    INNER JOIN 
        lab_results l
    ON  l.parameter = s.Contaminant
WHERE   reg_units <> units
GROUP BY s.contaminant, s.reg_units, l.units 
ORDER BY contaminant

contaminant,reg_units,units
"2,3,7,8-Tetrachlorodibenzo-p-dioxin",ug/L,pg/L
bis(2-Ethylhexyl) phthalate,ug/L,mg/Kg
Dissolved Aluminum,ug/L,mg/L
Dissolved Antimony,ug/L,mg/L
Dissolved Arsenic,ug/L,mg/L
Dissolved Barium,ug/L,mg/L
Dissolved Beryllium,ug/L,mg/L
Dissolved Cadmium,ug/L,mg/L
Dissolved Copper,ug/L,mg/L
Dissolved Lead,ug/L,mg/L


The query returns 23 mismatched unit combinations. Next, I want to see all contaminant and unit combinations and check whether any others could be adjusted in the state regulations table first. Previously, only Fluoride and Cyanide had been adjusted.

In [5]:
SELECT  s.contaminant, 
        s.reg_units, 
        l.units
FROM    state_regulations s
    INNER JOIN 
        lab_results l
    ON  l.parameter = s.Contaminant
GROUP BY s.contaminant, s.reg_units, l.units 
ORDER BY contaminant;


contaminant,reg_units,units
"1,1,1-Trichloroethane",ug/L,ug/L
"1,1,2,2-Tetrachloroethane",ug/L,ug/L
"1,1,2-Trichloroethane",ug/L,ug/L
"1,1,2-Trichlorotrifluoroethane",ug/L,ug/L
"1,1-Dichloroethane",ug/L,ug/L
"1,1-Dichloroethene",ug/L,ug/L
"1,2,3-Trichloropropane",ug/L,ug/L
"1,2,4-Trichlorobenzene",ug/L,ug/L
"1,2-Dibromo-3-chloropropane (DBCP)",ug/L,ug/L
"1,2-Dichlorobenzene",ug/L,ug/L


Most of the contaminants with mismatched units have some measurements recorded in ug/L and others in mg/L. These are:

- Aluminum
- Antimony
- Arsenic
- Barium
- Beryllium
- bis(2-Ethylhexyl) phthalate
- Cadmium
- Copper
- Lead
- Nickel
- Selenium
- Thallium
- Total Chromium
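The exceedance query earlier divides `result` by `State_Max` without converting units. As a result, the Dissolved Arsenic row (5880 mg/L against a 10 ug/L limit) reports a factor of 588, when the true ratio is 588,000. The minimal Python sketch below normalizes both sides to ug/L before dividing. The helper names and the unit table are hypothetical and are not taken from Decontaminator.py.

```python
# Hypothetical helpers (not from Decontaminator.py): compare a lab result with its
# state limit only after both are expressed in ug/L.

# Multipliers that convert each supported mass-per-litre unit into ug/L
TO_UG_PER_L = {
    "pg/L": 1e-6,
    "ng/L": 1e-3,
    "ug/L": 1.0,
    "mg/L": 1e3,
}


def to_ug_per_l(value: float, units: str) -> float:
    """Convert a concentration to ug/L; raise for units this sketch does not handle."""
    if units not in TO_UG_PER_L:
        raise ValueError(f"unsupported units: {units!r}")
    return value * TO_UG_PER_L[units]


def exceedance_factor(result: float, units: str, state_max: float, reg_units: str) -> float:
    """Ratio of a lab result to its state limit, with both converted to ug/L."""
    return to_ug_per_l(result, units) / to_ug_per_l(state_max, reg_units)


# Dissolved Arsenic row from the exceedance query: 5880 mg/L against a 10 ug/L limit
print(exceedance_factor(5880.0, "mg/L", 10.0, "ug/L"))  # 588000.0
```

The function raises on units it does not recognize, such as the mg/Kg reported for bis(2-Ethylhexyl) phthalate. This prevents solid-phase results from being silently compared against a water limit.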

The others that can be fixed in the database:

- 2,3,7,8-TCDD (2,3,7,8-Tetrachlorodibenzo-p-dioxin): the lab results are in pg/L. Since 1 pg/L = 10^-6 ug/L, the ug/L limit must be multiplied by 10^6
- Nitrite: only the unit label needs to change, to mg/L as N
- Nitrate + Nitrite: change the units to mg/L as N and multiply the limit by 10

More complicated issues: 

- Nitrate in the lab results is mostly recorded as mg/L, but these values can be converted to mg/L as N
    - Each station seems to have measured one way or the other, so this will have to be adjusted on the lab results side
    - On the state regulations side, the limit needs to be adjusted by a factor of 10
- Mercury
    - This has measurements in ng/L, ug/L and mg/L
    - These can be fixed with an extra if statement in the lab results cleanup
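The Mercury fix in the lab results cleanup could be a single if/elif chain. The sketch below assumes the Mercury limit is kept in ug/L, like the other metals. `clean_mercury` is a hypothetical name, and the sample values are illustrative rather than actual lab results.

```python
# Hypothetical cleanup step: Mercury appears in the lab results in ng/L, ug/L and
# mg/L, so every result is rewritten in ug/L (assumed to be the regulation unit).

def clean_mercury(result: float, units: str) -> tuple[float, str]:
    """Return (result, units) with the Mercury result expressed in ug/L."""
    if units == "ng/L":
        return result / 1000.0, "ug/L"   # 1 ug = 1000 ng
    elif units == "mg/L":
        return result * 1000.0, "ug/L"   # 1 mg = 1000 ug
    elif units == "ug/L":
        return result, "ug/L"            # already in the target unit
    else:
        raise ValueError(f"unexpected Mercury units: {units!r}")


# Illustrative rows only, not real measurements
rows = [(250.0, "ng/L"), (0.4, "ug/L"), (0.002, "mg/L")]
for result, units in rows:
    print(clean_mercury(result, units))
```

Any unit outside the three observed ones raises an error, so a new reporting convention is flagged instead of passing through unconverted.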

In [19]:
SELECT  s.contaminant, 
        s.reg_units, 
        l.units, 
        l.station_id
FROM    state_regulations s
    INNER JOIN 
        lab_results l
    ON  l.parameter = s.contaminant
WHERE   s.contaminant = 'Dissolved Nitrate' 
    AND  
        l.units = 'mg/L'
GROUP BY l.station_id, s.contaminant, s.reg_units, l.units
ORDER BY l.station_id
;

contaminant,reg_units,units,station_id
Dissolved Nitrate,10 as N mg/L,mg/L,1
Dissolved Nitrate,10 as N mg/L,mg/L,3
Dissolved Nitrate,10 as N mg/L,mg/L,7
Dissolved Nitrate,10 as N mg/L,mg/L,10
Dissolved Nitrate,10 as N mg/L,mg/L,12
Dissolved Nitrate,10 as N mg/L,mg/L,13
Dissolved Nitrate,10 as N mg/L,mg/L,73
Dissolved Nitrate,10 as N mg/L,mg/L,74
Dissolved Nitrate,10 as N mg/L,mg/L,75
Dissolved Nitrate,10 as N mg/L,mg/L,78


- To convert the Nitrate measured as mg/L, these values will be multiplied by 0.226 to get the new units of mg/L as N.
    - Why? N is the nitrogen atom in nitrate (NO3), and it makes up only about 22.6% of nitrate's molecular weight (14/62).
    - Since the Nitrite and Nitrate + Nitrite measures are in mg/L as N, this is the convention I will stick with.
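The Nitrate conversion can be written out using the rounded molar masses from the note that follows (N = 14, NO3 = 62). This is a minimal sketch with hypothetical function names. The last line applies it to the Kings result of 1460 mg/L from the exceedance query.

```python
# Sketch of the Nitrate conversion using rounded molar masses (mg per mmol).
N_MASS = 14.0     # mg of N per mmol
NO3_MASS = 62.0   # mg of NO3 per mmol

NO3_TO_N = N_MASS / NO3_MASS   # fraction of nitrate's mass that is nitrogen


def nitrate_to_n(result_mg_per_l: float) -> float:
    """Convert a Nitrate result reported as mg/L (as NO3) to mg/L as N."""
    return result_mg_per_l * NO3_TO_N


def n_to_nitrate(result_mg_n_per_l: float) -> float:
    """Inverse conversion: mg/L as N back to mg/L as NO3."""
    return result_mg_n_per_l / NO3_TO_N


print(round(NO3_TO_N, 4))               # 0.2258
print(round(n_to_nitrate(10.0), 2))     # 44.29
print(round(nitrate_to_n(1460.0), 1))   # 329.7
```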

Further note: 

> "10 as N" means 10 mg/L as N, which is equivalent to about 44.29 mg/L of **Nitrate**.
> 
> This comes from: $10\ \text{mg N} \times \dfrac{62\ \text{mg NO}_3/\text{mmol}}{14\ \text{mg N}/\text{mmol}} \approx 44.29\ \text{mg NO}_3$
> 
> Converting NO3 in mg/L to mg/L as N is simply the inverse of this:
> 
> $X\ \text{mg NO}_3 \times \dfrac{14\ \text{mg N}/\text{mmol}}{62\ \text{mg NO}_3/\text{mmol}} \approx 0.2258\,X\ \text{mg N}$
> 
> **The conversion factor from mg Nitrate to mg N is 0.2258.**
> 
> The State\_MCL is listed as 1.00 in units of "10 as N", so the maximum permissible concentration of Nitrate is 10 mg/L as N, or about 44.29 mg/L as NO3.
> 
> For **Nitrite (NO2)**, the unit is "1 as N", so the maximum permissible concentration is 1 mg/L as N.
> 
> No conversion is needed for Nitrite, since the units match. Nitrate + Nitrite, however, is in "10 as N", so the maximum permissible concentration is 10 mg/L of N from the two combined. Because the individual concentrations are unknown, the limit can't be split between the two compounds. It effectively caps the combined nitrate and nitrite nitrogen at 10 mg/L (State\_MCL).
> 
> To fix the discrepancies in the table, the state limits for Dissolved Nitrate and Dissolved Nitrate + Nitrite need to be multiplied by 10, so that the units can remain mg/L as N.
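The ×10 rescaling of the Nitrate and Nitrate + Nitrite limits can be expressed as a small parser for scaled units. This sketch assumes every scaled unit follows the `10 as N mg/L` pattern seen in the exceedance query output. The exact string stored for Nitrite and Nitrate + Nitrite is an assumption, and `rescale_as_n` is a hypothetical helper.

```python
import re

# Hypothetical parser for regulation units written as "<K> as N mg/L"
# (e.g. "10 as N mg/L"): the stored State_Max is a multiple of K mg/L as N.
SCALED_AS_N = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+as\s+N\s+mg/L\s*$")


def rescale_as_n(state_max: float, reg_units: str) -> tuple[float, str]:
    """Return (limit, units) with the limit expressed directly in mg/L as N.

    Units that do not match the scaled pattern are returned unchanged.
    """
    match = SCALED_AS_N.match(reg_units)
    if match is None:
        return state_max, reg_units
    return state_max * float(match.group(1)), "mg/L as N"


print(rescale_as_n(1.0, "10 as N mg/L"))  # (10.0, 'mg/L as N')
print(rescale_as_n(10.0, "ug/L"))         # (10.0, 'ug/L')
```

Units that don't match the pattern pass through unchanged, so the function could run over every row of the state regulations without special-casing.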

- Converting pCi/L to a mass concentration:
    - The commonly used EPA factor for natural uranium is about 0.67 pCi per ug, so ug/L \* 2/3 ≈ pCi/L (the factor applies to ug/L, not mg/L)
    - Are any of the lab recorded results in pCi/L? Apparently not (see query below)
    - This is another conversion that can be made to the state regs: pCi/L \* 1.5 ≈ ug/L

In [21]:
SELECT  s.contaminant, 
        s.reg_units, 
        l.units, 
        l.station_id
FROM    state_regulations s
    INNER JOIN 
        lab_results l
    ON  l.parameter = s.contaminant
WHERE   l.units = 'pCi/L' 
;

contaminant,reg_units,units,station_id


Which measures in the state regulations need to be converted?

In [22]:
SELECT  contaminant, 
        reg_units
FROM    state_regulations
WHERE   reg_units = 'pCi/L';

contaminant,reg_units
Gross Alpha Particle,pCi/L
Radium-226 + Radium-228,pCi/L
Dissolved Strontium,pCi/L
Tritium,pCi/L


Only four contaminants use this unit. I expect this to affect only Dissolved Strontium, because the other three were not measured in the lab results.

In [24]:
SELECT  s.contaminant, 
        s.reg_units, 
        l.units, 
        AVG(l.result)
FROM    state_regulations s
    INNER JOIN 
        lab_results l
    ON  l.parameter = s.contaminant
WHERE   s.reg_units = 'pCi/L' 
GROUP BY s.contaminant, s.reg_units, l.units;

contaminant,reg_units,units,(No column name)
Dissolved Strontium,pCi/L,mg/L,2.9099984478075283
Dissolved Strontium,pCi/L,ug/L,123.42037037037036


To maintain consistency with the rest of the measurements, I will express the Dissolved Strontium limit in ug/L rather than mg/L. The factor above already produces ug/L, so no additional factor of 1000 is needed:

pCi/L \* 1.5 ≈ ug/L

This will be applied in the state regulation table. One caveat: mass-to-activity conversions depend on each isotope's specific activity, so the 0.67 pCi/ug factor is specific to natural uranium. A pCi/L limit for strontium most likely refers to radioactive Strontium-90, whereas the lab results appear to report dissolved strontium by mass. This conversion should therefore be treated as provisional.

  

---

  

At this point, I will update the database, so this will be the end of this set of queries:

In [None]:
DROP TABLE dbo.state_regulations;

CREATE TABLE state_regulations (
	contaminant VARCHAR(250) NOT NULL, 
	state_max FLOAT NOT NULL, 
	state_det_limit FLOAT, 
	state_health_goal FLOAT, 
	state_health_date INT, 
	federal_max FLOAT,
	federal_max_goal FLOAT, 
	reg_units VARCHAR(50)
);


In [None]:
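-- Load the corrected regulations from the source CSV, skipping the header row
BULK INSERT dbo.state_regulations
FROM 'C:\Users\justi\OneDrive\Desktop\Analytics\Water_Quality\Data\state_regulations.csv'
WITH 
(
	FORMAT = 'CSV',   -- FORMAT = 'CSV' requires SQL Server 2017 or later
	FIRSTROW = 2
);
GO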