**Water Quality Queries**

Query to see how many Federal Maximum Levels are lower than the California Maximum Levels. 

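This should return an empty set. Federal maximum levels are a baseline that state standards must meet or exceed: a state may set a stricter (lower) limit, but not a looser (higher) one. California also tends to be more conservative in its standards than the federal government.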

```sql
SELECT *
FROM dbo.state_regulations
WHERE federal_max_level < state_max_level;
```

| contaminant | state_max_level | state_detection_limit | state_health_goal | state_health_date | federal_max_level | federal_max_level_goal | units |
|---|---|---|---|---|---|---|---|
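Here is a minimal sketch of the same sanity check. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the table is plain `state_regulations` with no `dbo` schema. The Barium and Benzene values come from this notebook's output, and the `example_*` rows are hypothetical. The sketch also shows a caveat: a comparison involving NULL is never true. Rows missing either level therefore drop out of the check silently, and an empty result only covers rows where both levels are present.

```python
import sqlite3

# In-memory stand-in for dbo.state_regulations (SQLite has no dbo schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE state_regulations "
    "(contaminant TEXT, state_max_level REAL, federal_max_level REAL)"
)
conn.executemany(
    "INSERT INTO state_regulations VALUES (?, ?, ?)",
    [
        ("Barium", 1.0, 2.0),               # from the notebook output
        ("Benzene", 0.001, 0.005),          # from the notebook output
        ("example_equal", 0.01, 0.01),      # hypothetical
        ("example_no_federal", 0.5, None),  # hypothetical: no federal limit
    ],
)

# The sanity check: federal limits that are stricter than the state's.
stricter_federal = conn.execute(
    "SELECT contaminant FROM state_regulations "
    "WHERE federal_max_level < state_max_level"
).fetchall()

# Rows the comparison above cannot evaluate, because a level is NULL.
unchecked = conn.execute(
    "SELECT COUNT(*) FROM state_regulations "
    "WHERE federal_max_level IS NULL OR state_max_level IS NULL"
).fetchone()[0]

print(stricter_federal)  # []
print(unchecked)         # 1
```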


As expected, this returned an empty set. Next, I'd like to see how many state standards are lower than the federal standards.

This is more complicated than it seems for someone new to SQL. The following query is simple: it shows WHICH state standards are lower than the federal standards. Because the list is short, we can count the rows and see that the number is 27 (only the first ten rows of the result are reproduced here).

The original question, however, only asked HOW MANY contaminants have lower state standards than federal ones. With a much larger dataset, reading the answer off the number of rows returned would be cumbersome. It is better to write a query that returns the count itself.
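The difference between WHICH and HOW MANY can be sketched with Python's built-in sqlite3 module standing in for SQL Server, so the table is plain `state_regulations` without the `dbo` schema. The Barium, Cyanide and Fluoride values are copied from the output below, and `example_equal` is hypothetical. Note that a plain `COUNT(*)` with the same WHERE clause already answers HOW MANY without any nesting. The subquery and CTE methods that follow become worthwhile once the filtered set needs a name or is reused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE state_regulations "
    "(contaminant TEXT, state_max_level REAL, federal_max_level REAL)"
)
conn.executemany(
    "INSERT INTO state_regulations VALUES (?, ?, ?)",
    [
        ("Barium", 1.0, 2.0),
        ("Cyanide", 0.15, 0.2),
        ("Fluoride", 2.0, 4.0),
        ("example_equal", 0.01, 0.01),  # hypothetical
    ],
)

condition = "WHERE federal_max_level > state_max_level"

# WHICH: one row per stricter state standard; the reader has to count them.
which_rows = conn.execute(
    f"SELECT contaminant FROM state_regulations {condition}"
).fetchall()

# HOW MANY: the database does the counting and returns a single value.
how_many = conn.execute(
    f"SELECT COUNT(*) FROM state_regulations {condition}"
).fetchone()[0]

print(len(which_rows), how_many)  # 3 3
```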

```sql
SELECT  contaminant, 
        state_max_level, 
        federal_max_level
FROM dbo.state_regulations
WHERE federal_max_level > state_max_level;
```

| contaminant | state_max_level | federal_max_level |
|---|---|---|
| Barium | 1.0 | 2.0 |
| Chromium, Total | 0.05 | 0.1 |
| Cyanide | 0.15 | 0.2 |
| Fluoride | 2.0 | 4.0 |
| Uranium | 20.0 | 30.0 |
| Benzene | 0.001 | 0.005 |
| Carbon tetrachloride | 0.0005 | 0.005 |
| 1,4-Dichlorobenzene(p-DCB) | 0.005 | 0.075 |
| 1,2-Dichloroethane (1,2-DCA) | 0.0005 | 0.005 |
| 1,1-Dichloroethylene (1,1-DCE) | 0.006 | 0.007 |


**Method 1: Using a Subquery**

What we really want is the COUNT of this result: counting its rows gives the number we're looking for.

The easiest place to introduce the above query as a subquery is the FROM clause. I'm constructing the nested query from the innermost query outward.

```sql
SELECT COUNT(subquery.contaminant) AS Number_of_Stricter_State_Maximums
FROM (
    SELECT  contaminant, 
            state_max_level, 
            federal_max_level
    FROM    dbo.state_regulations
    WHERE   federal_max_level > state_max_level
) AS subquery
```


| Number_of_Stricter_State_Maximums |
|---|
| 27 |


It is possible to write SELECT COUNT(\*) AS ... in the first line instead. However, that would hide the fact that the subquery MUST be named. In SQL Server, a subquery in the FROM clause (a derived table) requires an alias even when the outer query never refers to it. For clarity, I count subquery.contaminant, which keeps the alias visible. When I write a query like this myself, I give the subquery a more descriptive name, because readability drops as complexity grows. The code below is the same query with the more descriptive name.
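The sketch below illustrates both points, again with sqlite3 standing in for SQL Server and a hypothetical three-row table. SQLite happens to accept a derived table without an alias, but SQL Server does not, so the alias is kept. The sketch also shows that `COUNT(*)` and `COUNT(Stricter_State_Maximums.contaminant)` are not always interchangeable. `COUNT(*)` counts rows, while `COUNT(column)` skips NULLs, so the two agree only when `contaminant` is never NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE state_regulations "
    "(contaminant TEXT, state_max_level REAL, federal_max_level REAL)"
)
conn.executemany(
    "INSERT INTO state_regulations VALUES (?, ?, ?)",
    [
        ("Barium", 1.0, 2.0),
        ("Cyanide", 0.15, 0.2),
        (None, 0.5, 1.0),  # hypothetical row with a missing contaminant name
    ],
)

query = """
SELECT COUNT(*)                                   AS all_rows,
       COUNT(Stricter_State_Maximums.contaminant) AS named_rows
FROM (
    SELECT contaminant, state_max_level, federal_max_level
    FROM state_regulations
    WHERE federal_max_level > state_max_level
) AS Stricter_State_Maximums
"""
all_rows, named_rows = conn.execute(query).fetchone()
print(all_rows, named_rows)  # 3 2
```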

```sql
SELECT COUNT(Stricter_State_Maximums.contaminant) AS Number_of_Stricter_State_Maximums
FROM (
    SELECT  contaminant, 
            state_max_level, 
            federal_max_level
    FROM    dbo.state_regulations
    WHERE   federal_max_level > state_max_level
) AS Stricter_State_Maximums
```

| Number_of_Stricter_State_Maximums |
|---|
| 27 |


Hopefully it is now easier to see that the subquery isolates the contaminants, and their levels, where the state level is lower than the federal one. The parent query then applies COUNT to that subset, Stricter_State_Maximums, returning Number_of_Stricter_State_Maximums.

**Method 2: Using a WITH clause, aka CTE or Common Table Expression**

The WITH clause introduces the subset at the beginning of the statement. This gives more clarity: you follow the structure linearly from top to bottom, rather than digging into nested subqueries and climbing back out to the SELECT clause.

This uses the same original query inside the WITH clause to define the common table expression. For consistency with the subquery example, I first label the expression cte, and then give it a more appropriate name.

```sql
WITH cte AS (
    SELECT  contaminant, 
            state_max_level, 
            federal_max_level
    FROM dbo.state_regulations
    WHERE federal_max_level > state_max_level
)
SELECT COUNT(contaminant) AS Number_of_Stricter_State_Maximums
FROM cte
```

| Number_of_Stricter_State_Maximums |
|---|
| 27 |
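A CTE exists only for the single statement it is attached to, in SQL Server as in SQLite. The sketch below uses sqlite3 as a stand-in with a hypothetical two-row table. It runs the CTE query above and then shows that a follow-up statement can no longer see `cte`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE state_regulations "
    "(contaminant TEXT, state_max_level REAL, federal_max_level REAL)"
)
conn.executemany(
    "INSERT INTO state_regulations VALUES (?, ?, ?)",
    [("Barium", 1.0, 2.0), ("Fluoride", 2.0, 4.0)],
)

cte_query = """
WITH cte AS (
    SELECT contaminant, state_max_level, federal_max_level
    FROM state_regulations
    WHERE federal_max_level > state_max_level
)
SELECT COUNT(contaminant) AS Number_of_Stricter_State_Maximums
FROM cte
"""
count = conn.execute(cte_query).fetchone()[0]
print(count)  # 2

# The CTE went out of scope when that statement finished.
try:
    conn.execute("SELECT COUNT(*) FROM cte")
    cte_still_visible = True
except sqlite3.OperationalError:  # SQLite reports "no such table: cte"
    cte_still_visible = False
print(cte_still_visible)  # False
```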


In both methods, the original query is copied verbatim into a clause. In the first, it is part of the FROM clause; in the second, it is introduced up front in the WITH clause.

As queries get more complicated, the CTEs in a single WITH clause can draw from one another or from additional tables. This reduces the need for multiple levels of nested subqueries.
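Here is a sketch of one CTE drawing from another in the same WITH clause, again with sqlite3 as a stand-in for SQL Server. The first three rows are copied from the output above, and `example_equal` is hypothetical. The second CTE, `Much_Stricter_State_Maximums`, is a hypothetical name. Its threshold (a state limit at most half the federal one) is an arbitrary illustration, not a regulatory category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE state_regulations "
    "(contaminant TEXT, state_max_level REAL, federal_max_level REAL)"
)
conn.executemany(
    "INSERT INTO state_regulations VALUES (?, ?, ?)",
    [
        ("Barium", 1.0, 2.0),
        ("Carbon tetrachloride", 0.0005, 0.005),
        ("1,1-Dichloroethylene (1,1-DCE)", 0.006, 0.007),
        ("example_equal", 0.01, 0.01),  # hypothetical
    ],
)

query = """
WITH Stricter_State_Maximums AS (
    SELECT contaminant, state_max_level, federal_max_level
    FROM state_regulations
    WHERE federal_max_level > state_max_level
),
-- Hypothetical second CTE: it reads from the first CTE, not the base table,
-- keeping rows whose state limit is at most half the federal one.
Much_Stricter_State_Maximums AS (
    SELECT contaminant
    FROM Stricter_State_Maximums
    WHERE state_max_level * 2 <= federal_max_level
)
SELECT COUNT(contaminant) FROM Much_Stricter_State_Maximums
"""
much_stricter = conn.execute(query).fetchone()[0]
print(much_stricter)  # 2 (Barium and Carbon tetrachloride)
```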

Here is the more intuitive labeling promised above.

```sql
WITH Stricter_State_Maximums AS (
    SELECT  contaminant, 
            state_max_level, 
            federal_max_level
    FROM dbo.state_regulations
    WHERE federal_max_level > state_max_level
)
SELECT COUNT(contaminant) AS Number_of_Stricter_State_Maximums
FROM Stricter_State_Maximums
```

| Number_of_Stricter_State_Maximums |
|---|
| 27 |


**Adding to the existing Query...**

Show the number of contaminants that have stricter state maximums, stricter federal maximums, and identical state and federal maximum contamination levels.

I'm already cringing at the logic with the Subqueries, so I'm going to start with the WITH clause approach!

**Method 2: WITH**  

The only change between the three expressions is the comparison operator: \>, \<, and =.

```sql
WITH 
Stricter_State_Maximums AS (
    SELECT  contaminant, 
            state_max_level, 
            federal_max_level
    FROM dbo.state_regulations
    WHERE federal_max_level > state_max_level
), 
Stricter_Federal_Maximums AS (
    SELECT  contaminant, 
            state_max_level, 
            federal_max_level
    FROM dbo.state_regulations
    WHERE federal_max_level < state_max_level
), 
Same_State_Federal_Maximums AS (
    SELECT  contaminant, 
            state_max_level, 
            federal_max_level
    FROM dbo.state_regulations
    WHERE federal_max_level = state_max_level
)
SELECT COUNT(contaminant) AS Number_of_Levels, 'State' AS Stricter_Restriction_Levels
FROM Stricter_State_Maximums
UNION ALL 
SELECT COUNT(contaminant), 'Federal' 
FROM Stricter_Federal_Maximums
UNION ALL 
SELECT COUNT(contaminant), 'Equal'
FROM Same_State_Federal_Maximums;
```

| Number_of_Levels | Stricter_Restriction_Levels |
|---|---|
| 27 | State |
| 0 | Federal |
| 49 | Equal |
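For comparison, here is a sketch of a single-pass alternative. It uses a different technique from the three CTEs: a CASE expression that labels each row, followed by GROUP BY. It runs against a sqlite3 stand-in for SQL Server, with a hypothetical table in which no federal limit is stricter. This exposes a trade-off. GROUP BY only returns categories that actually occur, so the `Federal` row with a count of 0, which the UNION ALL version above does report, disappears here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE state_regulations "
    "(contaminant TEXT, state_max_level REAL, federal_max_level REAL)"
)
conn.executemany(
    "INSERT INTO state_regulations VALUES (?, ?, ?)",
    [
        ("Barium", 1.0, 2.0),
        ("Uranium", 20.0, 30.0),
        ("example_equal_1", 0.01, 0.01),  # hypothetical
        ("example_equal_2", 0.05, 0.05),  # hypothetical
    ],
)

# Label each row once in a derived table, then count rows per label.
query = """
SELECT Stricter_Restriction_Levels, COUNT(*) AS Number_of_Levels
FROM (
    SELECT CASE
               WHEN federal_max_level > state_max_level THEN 'State'
               WHEN federal_max_level < state_max_level THEN 'Federal'
               WHEN federal_max_level = state_max_level THEN 'Equal'
           END AS Stricter_Restriction_Levels
    FROM state_regulations
) AS Categorized
GROUP BY Stricter_Restriction_Levels
ORDER BY Stricter_Restriction_Levels
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Equal', 2), ('State', 2)]; there is no 'Federal' row
```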


This alternative output puts each count in its own column. I've done this only to match the shape of the subquery version that follows.

```sql
WITH 
Stricter_State_Maximums AS (
    SELECT  contaminant, 
            state_max_level, 
            federal_max_level
    FROM dbo.state_regulations
    WHERE federal_max_level > state_max_level
), 
Stricter_Federal_Maximums AS (
    SELECT  contaminant, 
            state_max_level, 
            federal_max_level
    FROM dbo.state_regulations
    WHERE federal_max_level < state_max_level
), 
Same_State_Federal_Maximums AS (
    SELECT  contaminant, 
            state_max_level, 
            federal_max_level
    FROM dbo.state_regulations
    WHERE federal_max_level = state_max_level
)
SELECT  COUNT(SSM.contaminant) AS Number_of_Stricter_State_Restriction_Levels, 
        COUNT(SFM.contaminant) AS Number_of_Stricter_Federal_Restriction_Levels, 
        COUNT(SM.contaminant) AS Number_of_Same_Restriction_Levels
FROM    dbo.state_regulations AS SR
LEFT JOIN Stricter_State_Maximums AS SSM
    ON SR.contaminant = SSM.contaminant
LEFT JOIN Stricter_Federal_Maximums AS SFM
    ON SR.contaminant = SFM.contaminant
LEFT JOIN Same_State_Federal_Maximums AS SM
    ON SR.contaminant = SM.contaminant
```


| Number_of_Stricter_State_Restriction_Levels | Number_of_Stricter_Federal_Restriction_Levels | Number_of_Same_Restriction_Levels |
|---|---|---|
| 27 | 0 | 49 |
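The column-per-count layout can also be produced in one pass with conditional aggregation, a different technique from the CTE-plus-LEFT JOIN version above. Each SUM(CASE ...) adds 1 only for rows in its own category, so a category with no rows still shows up, as 0. The sketch below uses sqlite3 as a stand-in for SQL Server and a hypothetical table; the column names are copied from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE state_regulations "
    "(contaminant TEXT, state_max_level REAL, federal_max_level REAL)"
)
conn.executemany(
    "INSERT INTO state_regulations VALUES (?, ?, ?)",
    [
        ("Barium", 1.0, 2.0),
        ("Uranium", 20.0, 30.0),
        ("example_equal_1", 0.01, 0.01),  # hypothetical
        ("example_equal_2", 0.05, 0.05),  # hypothetical
    ],
)

# One scan of state_regulations; each SUM counts only its own category.
query = """
SELECT
    SUM(CASE WHEN federal_max_level > state_max_level THEN 1 ELSE 0 END)
        AS Number_of_Stricter_State_Restriction_Levels,
    SUM(CASE WHEN federal_max_level < state_max_level THEN 1 ELSE 0 END)
        AS Number_of_Stricter_Federal_Restriction_Levels,
    SUM(CASE WHEN federal_max_level = state_max_level THEN 1 ELSE 0 END)
        AS Number_of_Same_Restriction_Levels
FROM state_regulations
"""
counts = conn.execute(query).fetchone()
print(counts)  # (2, 0, 2)
```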


Since all three CTEs draw the same kind of data, it is easy to piece together an easy-to-understand output. Three similar common table expressions under a single WITH clause give the number of stricter state maximums, the number of stricter federal maximums, and the number of equal values. The first version stacks these counts as rows with UNION ALL. The second joins each CTE back to the base table so that each count gets its own column.

**Method 1: Subqueries**

```sql
SELECT  COUNT(Stricter_State_Maximums.contaminant) AS Number_of_Stricter_State_Maximums,
        COUNT(Stricter_Federal_Maximums.contaminant) AS Number_of_Stricter_Federal_Maximums,
        COUNT(Same_State_Federal_Maximums.contaminant) AS Number_of_Same_State_Federal_Maximums
FROM dbo.state_regulations AS SR
    LEFT JOIN
    (
        SELECT  contaminant, 
                state_max_level, 
                federal_max_level
        FROM    dbo.state_regulations
        WHERE   federal_max_level > state_max_level
    ) AS Stricter_State_Maximums
    ON SR.contaminant = Stricter_State_Maximums.contaminant
    LEFT JOIN
    (
        SELECT  contaminant, 
                state_max_level, 
                federal_max_level
        FROM    dbo.state_regulations
        WHERE   federal_max_level = state_max_level
    ) AS Same_State_Federal_Maximums
    ON SR.contaminant = Same_State_Federal_Maximums.contaminant
    LEFT JOIN
    (
        SELECT  contaminant, 
                state_max_level, 
                federal_max_level
        FROM    dbo.state_regulations
        WHERE   federal_max_level < state_max_level
    ) AS Stricter_Federal_Maximums
    -- Each ON clause must reference the subquery it joins
    ON SR.contaminant = Stricter_Federal_Maximums.contaminant
```

| Number_of_Stricter_State_Maximums | Number_of_Stricter_Federal_Maximums | Number_of_Same_State_Federal_Maximums |
|---|---|---|
| 27 | 0 | 49 |
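Each LEFT JOIN's ON clause has to compare against the subquery it is joining. The sketch below shows what happens otherwise, using sqlite3 as a stand-in for SQL Server. Its hypothetical table has two stricter federal limits, unlike the real data. When the last ON clause compares against `SSM` instead of `SFM`, every stricter-state row matches every stricter-federal row, and two of the counts double. If the stricter-federal subset is empty, as it is in the real data, such a mistake goes unnoticed. The helper `join_counts` and the short aliases are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE state_regulations "
    "(contaminant TEXT, state_max_level REAL, federal_max_level REAL)"
)
conn.executemany(
    "INSERT INTO state_regulations VALUES (?, ?, ?)",
    [
        ("Barium", 1.0, 2.0),
        ("Uranium", 20.0, 30.0),
        ("example_federal_1", 0.2, 0.1),  # hypothetical stricter federal limit
        ("example_federal_2", 0.4, 0.3),  # hypothetical stricter federal limit
        ("example_equal", 0.01, 0.01),    # hypothetical
    ],
)

def join_counts(federal_on_alias):
    # federal_on_alias: which alias the last ON clause compares against.
    query = f"""
    SELECT COUNT(SSM.contaminant), COUNT(SFM.contaminant), COUNT(SM.contaminant)
    FROM state_regulations AS SR
    LEFT JOIN (SELECT contaminant FROM state_regulations
               WHERE federal_max_level > state_max_level) AS SSM
        ON SR.contaminant = SSM.contaminant
    LEFT JOIN (SELECT contaminant FROM state_regulations
               WHERE federal_max_level = state_max_level) AS SM
        ON SR.contaminant = SM.contaminant
    LEFT JOIN (SELECT contaminant FROM state_regulations
               WHERE federal_max_level < state_max_level) AS SFM
        ON SR.contaminant = {federal_on_alias}.contaminant
    """
    return conn.execute(query).fetchone()

correct = join_counts("SFM")
miswired = join_counts("SSM")
print(correct)   # (2, 2, 1)
print(miswired)  # (4, 4, 1)
```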


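In the above query, each subquery has to be joined back to the original table. The subqueries are mutually exclusive: no contaminant appears in more than one of them, so they cannot be joined to one another. A LEFT JOIN is essential here. It keeps every row of state_regulations and leaves NULLs where a contaminant is not in a subset, and COUNT skips those NULLs. An INNER JOIN would instead discard every row missing from any one subset, which, with mutually exclusive subsets, means every row.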
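The sketch below shows why the LEFT JOIN matters, using sqlite3 as a stand-in for SQL Server and a hypothetical table. No row of the base table can match both mutually exclusive subsets, so chaining INNER JOINs discards every row and all counts collapse to 0. The helper `subset_counts` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE state_regulations "
    "(contaminant TEXT, state_max_level REAL, federal_max_level REAL)"
)
conn.executemany(
    "INSERT INTO state_regulations VALUES (?, ?, ?)",
    [
        ("Barium", 1.0, 2.0),
        ("Cyanide", 0.15, 0.2),
        ("example_equal", 0.01, 0.01),  # hypothetical
    ],
)

def subset_counts(join_type):
    # join_type is "LEFT" or "INNER"; both subsets are joined to the base table.
    query = f"""
    SELECT COUNT(SSM.contaminant), COUNT(SM.contaminant)
    FROM state_regulations AS SR
    {join_type} JOIN (SELECT contaminant FROM state_regulations
                      WHERE federal_max_level > state_max_level) AS SSM
        ON SR.contaminant = SSM.contaminant
    {join_type} JOIN (SELECT contaminant FROM state_regulations
                      WHERE federal_max_level = state_max_level) AS SM
        ON SR.contaminant = SM.contaminant
    """
    return conn.execute(query).fetchone()

left_counts = subset_counts("LEFT")
inner_counts = subset_counts("INNER")
print(left_counts)   # (2, 1)
print(inner_counts)  # (0, 0)
```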

The output is also shaped differently from the UNION ALL version: each count is a column with its own header, rather than a row with an identifying label.