# T-SQL Tutorials

## SQL Server Bikestores Database

    Subquery – explain the subquery concept and show you how to use various subquery types to select data.
    Correlated subquery – introduce you to the correlated subquery concept.
    EXISTS – test for the existence of rows returned by a subquery.
    ANY – compare a value with a single-column set of values returned by a subquery and return TRUE if the value matches any value in the set.
    ALL – compare a value with a single-column set of values returned by a subquery and return TRUE if the value matches all values in the set.
    CROSS APPLY – perform an inner join of a table with a table-valued function or a correlated subquery.
    OUTER APPLY – perform a left join of a table with a table-valued function or a correlated subquery.

In [2]:
import pyodbc
import os
import pandas as pd

# Check that a SQL Server ODBC driver is installed
[x for x in pyodbc.drivers() if "SQL Server" in x]

# Define the connection string
conn_str = (
    r'DRIVER={ODBC Driver 17 for SQL Server};'
    r'SERVER=localhost;'
    r'DATABASE=BikeStores;'
    r'Trusted_Connection=yes;'
)

# Establish the connection
conn = pyodbc.connect(conn_str)

# Create a cursor
cursor = conn.cursor()

### Subquery

A subquery is a query nested inside another statement such as SELECT, INSERT, UPDATE, or DELETE.

The following statement shows how to use a subquery in the WHERE clause of a SELECT statement to find the sales orders of the customers located in New York:

In [3]:
# execute a query
cursor.execute('''
SELECT
    order_id,
    order_date,
    customer_id
FROM
    sales.orders
WHERE
    customer_id IN (
        SELECT
            customer_id
        FROM
            sales.customers
        WHERE
            city = 'New York'
    )
ORDER BY
    order_date DESC;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

Unnamed: 0,order_id,order_date,customer_id
0,1510,2018-04-09,16
1,1351,2018-01-16,1016
2,1020,2017-07-23,16
3,572,2016-11-24,178
4,514,2016-10-19,927
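
The fetch, column-name, and dictionary steps in the cell above repeat for every query in this notebook. As a minimal sketch, they could be wrapped in a small helper. The name `rows_as_dicts` is hypothetical, and the demo uses Python's built-in sqlite3 module with made-up rows only so that it runs without a SQL Server instance. The helper relies only on `cursor.description` and `cursor.fetchall()`, which the pyodbc cursor used above also provides.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Return all remaining rows of an executed cursor as a list of dicts."""
    # cursor.description holds one tuple per column; item 0 is the column name
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Demo on an in-memory SQLite database (assumption: a stand-in for SQL Server)
demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
demo_cursor.execute("INSERT INTO orders VALUES (1510, 16), (572, 178)")
demo_cursor.execute("SELECT order_id, customer_id FROM orders ORDER BY order_id")

data = rows_as_dicts(demo_cursor)
print(data)
```

With the notebook's pyodbc cursor, `pd.DataFrame(rows_as_dicts(cursor))` should build the same DataFrame as the cell above.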


Note that you must always enclose the SELECT query of a subquery in parentheses ().

A subquery is also known as an inner query or inner select, while the statement containing the subquery is called an outer select or outer query:

![Subquery](SQL-Server-Subquery.png)

Conceptually, SQL Server evaluates the query above in two steps. First, it executes the subquery to get the customer identification numbers of the customers located in New York:

In [4]:


# execute a query
cursor.execute('''
SELECT
    customer_id
FROM
    sales.customers
WHERE
    city = 'New York'
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

Unnamed: 0,customer_id
0,16
1,178
2,327
3,411
4,854


Second, SQL Server substitutes the customer identification numbers returned by the subquery into the IN operator and executes the outer query to get the final result set.

As you can see, by using the subquery, you can combine two steps. The subquery removes the need for selecting the customer identification numbers and plugging them into the outer query. Moreover, the query itself automatically adjusts whenever the customer data changes.
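
The two-step reading can be checked on a toy data set. The sketch below is not run against BikeStores: it assumes an in-memory SQLite database as a stand-in for SQL Server, attaches a second in-memory database named `sales` so the table names match, and uses made-up rows. It compares the manual two-step approach with the single statement that uses a subquery.

```python
import sqlite3

# Assumption: SQLite in memory stands in for SQL Server; rows are made up.
demo = sqlite3.connect(":memory:")
demo.execute("ATTACH DATABASE ':memory:' AS sales")  # allows sales.<table> names
demo.executescript("""
CREATE TABLE sales.customers (customer_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sales.orders (
    order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT
);
INSERT INTO sales.customers VALUES (1, 'New York'), (2, 'Buffalo'), (3, 'New York');
INSERT INTO sales.orders VALUES
    (10, 1, '2017-01-05'), (11, 2, '2017-02-10'),
    (12, 3, '2018-03-15'), (13, 2, '2018-04-20');
""")

# Step 1: run the inner query on its own.
ids = [row[0] for row in demo.execute(
    "SELECT customer_id FROM sales.customers "
    "WHERE city = 'New York' ORDER BY customer_id"
)]

# Step 2: plug the returned ids into the outer query by hand.
placeholders = ", ".join("?" for _ in ids)
manual = demo.execute(
    f"SELECT order_id FROM sales.orders WHERE customer_id IN ({placeholders}) "
    "ORDER BY order_date DESC",
    ids,
).fetchall()

# Single statement: the subquery performs both steps.
combined = demo.execute("""
    SELECT order_id FROM sales.orders
    WHERE customer_id IN (
        SELECT customer_id FROM sales.customers WHERE city = 'New York'
    )
    ORDER BY order_date DESC
""").fetchall()

print(manual == combined)
```

If a customer later moves to New York, the single statement picks that up on its next run, while the manual list of ids would have to be rebuilt.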

### Nesting subquery

A subquery can be nested within another subquery. SQL Server supports up to 32 levels of nesting. Consider the following example:

In [5]:
# execute a query
cursor.execute('''
SELECT
    product_name,
    list_price
FROM
    production.products
WHERE
    list_price > (
        SELECT
            AVG (list_price)
        FROM
            production.products
        WHERE
            brand_id IN (
                SELECT
                    brand_id
                FROM
                    production.brands
                WHERE
                    brand_name = 'Strider'
                OR brand_name = 'Trek'
            )
    )
ORDER BY
    list_price;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

Unnamed: 0,product_name,list_price
0,Surly Karate Monkey 27.5+ Frameset - 2017,2499.99
1,Trek Fuel EX 7 29 - 2018,2499.99
2,Surly Krampus Frameset - 2018,2499.99
3,Surly Troll Frameset - 2018,2499.99
4,Trek Domane SL 5 Disc Women's - 2018,2499.99


First, SQL Server executes the following subquery to get a list of brand identification numbers of the Strider and Trek brands:

In [6]:

# execute a query
cursor.execute('''
SELECT
    brand_id
FROM
    production.brands
WHERE
    brand_name = 'Strider'
OR brand_name = 'Trek';
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

Unnamed: 0,brand_id
0,6
1,9


Second, SQL Server calculates the average list price of all products that belong to those brands:

In [7]:

# execute a query
cursor.execute('''
SELECT
    AVG (list_price)
FROM
    production.products
WHERE
    brand_id IN (6,9)
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

Unnamed: 0,Unnamed: 1
0,2450.279855


Third, SQL Server finds the products whose list price is greater than the average list price of all products with the Strider or Trek brand.
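
The three steps can be mirrored in plain Python to make the order of evaluation explicit. This is only an illustration of the logic, with hypothetical product names and prices rather than BikeStores data; it is not how SQL Server executes the query internally.

```python
from statistics import mean

# Hypothetical rows, not BikeStores data.
brands = [(1, "Electra"), (6, "Strider"), (9, "Trek")]  # (brand_id, brand_name)
products = [  # (product_name, brand_id, list_price)
    ("Bike A", 1, 300.0),
    ("Bike B", 6, 200.0),
    ("Bike C", 9, 1000.0),
    ("Bike D", 9, 3000.0),
    ("Bike E", 1, 2500.0),
]

# Step 1 (innermost subquery): brand ids of Strider and Trek.
brand_ids = {brand_id for brand_id, name in brands if name in ("Strider", "Trek")}

# Step 2 (middle subquery): average list price of those brands' products.
avg_price = mean(price for _, brand_id, price in products if brand_id in brand_ids)

# Step 3 (outer query): products priced above that average, ordered by price.
result = sorted(
    ((name, price) for name, _, price in products if price > avg_price),
    key=lambda row: row[1],
)

print(avg_price, result)
```

Note that the outer query filters all products, not only those of the two brands, which is why a product of another brand can appear in the result.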

### SQL Server subquery types

You can use a subquery in many places:

    In place of an expression
    With IN or NOT IN
    With ANY or ALL
    With EXISTS or NOT EXISTS
    In an UPDATE, DELETE, or INSERT statement
    In the FROM clause

### SQL Server subquery used in place of an expression

If a subquery returns a single value, it can be used anywhere an expression is used.

In the following example, a subquery is used as a column expression named max_list_price in a SELECT statement.

In [9]:

# execute a query
cursor.execute('''
SELECT
    order_id,
    order_date,
    (
        SELECT
            MAX (list_price)
        FROM
            sales.order_items i
        WHERE
            i.order_id = o.order_id
    ) AS max_list_price
FROM
    sales.orders o
order by order_date desc;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

Unnamed: 0,order_id,order_date,max_list_price
0,1615,2018-12-28,2499.99
1,1614,2018-11-28,2299.99
2,1613,2018-11-18,4999.99
3,1612,2018-10-21,1559.99
4,1611,2018-09-06,3199.99
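
The column subquery above refers to `o.order_id` from the outer query, so it can be read as a lookup that is evaluated once per order row. A rough Python analogy, using made-up rows and a hypothetical helper name `max_list_price`, might look like this:

```python
# Hypothetical rows, not BikeStores data.
orders = [(1, "2018-01-01"), (2, "2018-01-02"), (3, "2018-01-03")]  # (order_id, order_date)
order_items = [(1, 599.99), (1, 1799.99), (2, 429.00)]              # (order_id, list_price)


def max_list_price(order_id):
    """Mimic the scalar subquery: MAX(list_price) for one order."""
    prices = [price for item_order_id, price in order_items if item_order_id == order_id]
    # MAX over zero rows yields NULL in SQL; None plays that role here
    return max(prices) if prices else None


# The outer query: one output row per order, with the subquery as a column.
rows = [(order_id, order_date, max_list_price(order_id)) for order_id, order_date in orders]

for row in rows:
    print(row)
```

The order without items still appears in the output, with an empty value in the computed column.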


### SQL Server subquery used with the IN operator

A subquery that is used with the IN operator returns a set of zero or more values. After the subquery returns values, the outer query makes use of them.

The following query finds the names of all mountain bike and road bike products that Bike Stores sells.

In [10]:

# execute a query
cursor.execute('''
SELECT
    product_id,
    product_name
FROM
    production.products
WHERE
    category_id IN (
        SELECT
            category_id
        FROM
            production.categories
        WHERE
            category_name = 'Mountain Bikes'
        OR category_name = 'Road Bikes'
    );
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

Unnamed: 0,product_id,product_name
0,1,Trek 820 - 2016
1,2,Ritchey Timberwolf Frameset - 2016
2,3,Surly Wednesday Frameset - 2016
3,4,Trek Fuel EX 8 29 - 2016
4,5,Heller Shagamaw Frame - 2016


This query is evaluated in two steps:

First, the inner query returns a list of category identification numbers that match the names Mountain Bikes and Road Bikes.

Second, these values are substituted into the outer query, which finds the products whose category identification number matches one of the values in the list.

### SQL Server subquery used with the ANY operator

A subquery introduced with the ANY operator has the following syntax:

    scalar_expression comparison_operator ANY (subquery)

Assume that the subquery returns a list of values v1, v2, … vn. The ANY operator returns TRUE if at least one comparison pair (scalar_expression, vi) evaluates to TRUE; otherwise, it returns FALSE.
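
For values that are not NULL, this rule behaves like Python's built-in `any()`. The sketch below uses a hypothetical helper `ge_any` and made-up brand averages; it deliberately ignores SQL's three-valued logic for NULL.

```python
def ge_any(value, subquery_values):
    """Emulate `value >= ANY (subquery)` for non-NULL values."""
    return any(value >= v for v in subquery_values)


brand_averages = [450.0, 1200.0, 2500.0]  # hypothetical per-brand averages

checks = {
    500.0: ge_any(500.0, brand_averages),  # 500 >= 450 holds, so TRUE
    400.0: ge_any(400.0, brand_averages),  # no pair holds, so FALSE
}
empty_case = ge_any(500.0, [])  # no rows from the subquery, so FALSE

print(checks, empty_case)
```

For a non-empty list, `>= ANY` gives the same answer as comparing against the smallest value in the list.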

For example, the following query finds the products whose list prices are greater than or equal to the average list price of at least one brand.

In [11]:

# execute a query
cursor.execute('''
SELECT
    product_name,
    list_price
FROM
    production.products
WHERE
    list_price >= ANY (
        SELECT
            AVG (list_price)
        FROM
            production.products
        GROUP BY
            brand_id
    )
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

Unnamed: 0,product_name,list_price
0,Trek 820 - 2016,379.99
1,Ritchey Timberwolf Frameset - 2016,749.99
2,Surly Wednesday Frameset - 2016,999.99
3,Trek Fuel EX 8 29 - 2016,2899.99
4,Heller Shagamaw Frame - 2016,1320.99


For each brand, the subquery finds the average list price. The outer query uses these averages and returns the products whose list price is greater than or equal to at least one brand’s average list price.

### SQL Server subquery used with the ALL operator

The ALL operator has the same syntax as the ANY operator:

    scalar_expression comparison_operator ALL (subquery)

The ALL operator returns TRUE if all comparison pairs (scalar_expression, vi) evaluate to TRUE; otherwise, it returns FALSE.
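
For values that are not NULL, ALL corresponds to Python's built-in `all()`. As with the ANY sketch, the helper name `ge_all` and the brand averages are hypothetical, and NULL handling is left out.

```python
def ge_all(value, subquery_values):
    """Emulate `value >= ALL (subquery)` for non-NULL values."""
    return all(value >= v for v in subquery_values)


brand_averages = [450.0, 1200.0, 2500.0]  # hypothetical per-brand averages

high = ge_all(2899.99, brand_averages)   # at or above every average, so TRUE
middle = ge_all(1300.0, brand_averages)  # below 2500.0, so FALSE
empty_case = ge_all(1300.0, [])          # no rows to violate the condition, so TRUE

print(high, middle, empty_case)
```

For a non-empty list, `>= ALL` gives the same answer as comparing against the largest value in the list, which is why far fewer products pass this filter than the ANY version.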

The following query finds the products whose list price is greater than or equal to the average list price of every brand:

In [12]:

# execute a query
cursor.execute('''
SELECT
    product_name,
    list_price
FROM
    production.products
WHERE
    list_price >= ALL (
        SELECT
            AVG (list_price)
        FROM
            production.products
        GROUP BY
            brand_id
    )
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

Unnamed: 0,product_name,list_price
0,Trek Fuel EX 8 29 - 2016,2899.99
1,Trek Slash 8 27.5 - 2016,3999.99
2,Trek Conduit+ - 2016,2999.99
3,Trek Fuel EX 9.8 29 - 2017,4999.99
4,Trek Fuel EX 9.8 27.5 Plus - 2017,5299.99

### SQL Server subquery used with EXISTS or NOT EXISTS

The EXISTS operator returns TRUE if the subquery returns at least one row; otherwise, it returns FALSE. The following query finds the customers who bought products in 2017:


In [13]:

# execute a query
cursor.execute('''
SELECT
    customer_id,
    first_name,
    last_name,
    city
FROM
    sales.customers c
WHERE
    EXISTS (
        SELECT
            customer_id
        FROM
            sales.orders o
        WHERE
            o.customer_id = c.customer_id
        AND YEAR (order_date) = 2017
    )
ORDER BY
    first_name,
    last_name;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

Unnamed: 0,customer_id,first_name,last_name,city
0,75,Abby,Gamble,Amityville
1,1224,Abram,Copeland,Harlingen
2,673,Adam,Henderson,Los Banos
3,1023,Adena,Blake,Ballston Spa
4,1412,Adrien,Hunter,Rego Park


If you use NOT EXISTS instead of EXISTS, you can find the customers who did not buy any products in 2017:

In [14]:

# execute a query
cursor.execute('''
SELECT
    customer_id,
    first_name,
    last_name,
    city
FROM
    sales.customers c
WHERE
    NOT EXISTS (
        SELECT
            customer_id
        FROM
            sales.orders o
        WHERE
            o.customer_id = c.customer_id
        AND YEAR (order_date) = 2017
    )
ORDER BY
    first_name,
    last_name;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

Unnamed: 0,customer_id,first_name,last_name,city
0,1174,Aaron,Knapp,Yonkers
1,338,Abbey,Pugh,Forest Hills
2,1085,Adam,Thornton,Central Islip
3,195,Addie,Hahn,Franklin Square
4,1261,Adelaida,Hancock,San Pablo
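
EXISTS and NOT EXISTS with the same subquery split the customers into two groups that do not overlap. The sketch below checks this on made-up rows. It assumes an in-memory SQLite database as a stand-in for SQL Server, so `strftime('%Y', ...)` replaces the T-SQL `YEAR()` function, and the `sales` schema is imitated with an attached database.

```python
import sqlite3

# Assumption: SQLite in memory stands in for SQL Server; rows are made up.
demo = sqlite3.connect(":memory:")
demo.execute("ATTACH DATABASE ':memory:' AS sales")
demo.executescript("""
CREATE TABLE sales.customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE sales.orders (
    order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT
);
INSERT INTO sales.customers VALUES (1, 'Abby'), (2, 'Adam'), (3, 'Aaron');
INSERT INTO sales.orders VALUES
    (10, 1, '2017-01-05'), (11, 2, '2016-06-01'), (12, 2, '2018-03-15');
""")

QUERY = """
SELECT c.customer_id
FROM sales.customers c
WHERE {op} (
    SELECT o.customer_id
    FROM sales.orders o
    WHERE o.customer_id = c.customer_id
      AND strftime('%Y', o.order_date) = '2017'  -- SQLite stand-in for YEAR()
)
ORDER BY c.customer_id
"""

with_2017 = [r[0] for r in demo.execute(QUERY.format(op="EXISTS"))]
without_2017 = [r[0] for r in demo.execute(QUERY.format(op="NOT EXISTS"))]

print(with_2017, without_2017)
```

Customer 3 has no orders at all, so the subquery returns no rows for that customer and NOT EXISTS includes them.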


### SQL Server subquery in the FROM clause

Suppose that you want to find the average number of orders handled per sales staff member. To do this, you can first find the number of orders by staff:

In [15]:


# execute a query
cursor.execute('''
SELECT 
   staff_id, 
   COUNT(order_id) order_count
FROM 
   sales.orders
GROUP BY 
   staff_id;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

Unnamed: 0,staff_id,order_count
0,9,86
1,3,184
2,6,553
3,7,540
4,2,164


Then, you can apply the AVG() function to this result set. Since a query returns a result set that looks like a virtual table, you can place the whole query in the FROM clause of another query like this:

In [16]:

# execute a query
cursor.execute('''
SELECT
    AVG(order_count) average_order_count_by_staff
FROM
(
    SELECT
        staff_id,
        COUNT(order_id) order_count
    FROM
        sales.orders
    GROUP BY
        staff_id
) t;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

Unnamed: 0,average_order_count_by_staff
0,269


The query that you place in the FROM clause must have a table alias. In this example, we used t as the table alias for the subquery. To come up with the final result, SQL Server carries out the following steps:

    Execute the subquery in the FROM clause.
    
    Use the result of the subquery and execute the outer query.
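
The two steps above can be traced on a small example. The sketch assumes an in-memory SQLite database as a stand-in for SQL Server, with made-up order rows and an attached database imitating the `sales` schema. It runs the inner query alone and then wraps the same text as a derived table with alias `t`.

```python
import sqlite3

# Assumption: SQLite in memory stands in for SQL Server; rows are made up.
demo = sqlite3.connect(":memory:")
demo.execute("ATTACH DATABASE ':memory:' AS sales")
demo.executescript("""
CREATE TABLE sales.orders (order_id INTEGER PRIMARY KEY, staff_id INTEGER);
INSERT INTO sales.orders VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 3), (6, 3);
""")

inner_sql = """
    SELECT staff_id, COUNT(order_id) AS order_count
    FROM sales.orders
    GROUP BY staff_id
"""

# Step 1: the subquery on its own gives one row per staff member.
per_staff = demo.execute(inner_sql + " ORDER BY staff_id").fetchall()

# Step 2: the outer query treats that result as a table named t.
average = demo.execute(
    f"SELECT AVG(order_count) FROM ({inner_sql}) t"
).fetchone()[0]

print(per_staff, average)
```

SQLite returns the average as a floating-point number. In SQL Server, AVG over an integer column returns an integer, which is consistent with the whole number shown in the notebook output above.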