### Topics:
    Correlated Subquery
    Exists
    Cross Apply
    Outer Apply

![SQL Server sample database](SQL-Server-Sample-Database.png)

## SQL Server Correlated Subquery

A correlated subquery is a subquery that uses the values of the outer query. In other words, the correlated subquery depends on the outer query for its values.

Because of this dependency, a correlated subquery cannot be executed on its own, unlike a simple subquery.

Moreover, a correlated subquery is logically evaluated repeatedly, once for each row processed by the outer query. For this reason, it is also known as a repeating subquery.

In [1]:
import pyodbc
import os
import pandas as pd

# Check that a SQL Server ODBC driver is installed
[x for x in pyodbc.drivers() if "SQL Server" in x]

# Define the connection string
conn_str = (
    r'DRIVER={ODBC Driver 17 for SQL Server};'
    r'SERVER=localhost;'
    r'DATABASE=BikeStores;'
    r'Trusted_Connection=yes;'
)

# Establish the connection
conn = pyodbc.connect(conn_str)

# Create a cursor
cursor = conn.cursor()
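
The cells that follow repeat the same fetch-and-convert steps after every query. As a minimal sketch, those steps could be wrapped in a small helper. `fetch_dicts` is a hypothetical name that is not part of pyodbc or of the original notebook; it relies only on the DB-API `description` attribute and `fetchall()` method, which pyodbc cursors also provide. The demonstration uses the standard-library `sqlite3` module so that it runs without a SQL Server instance.

```python
import sqlite3


def fetch_dicts(cursor):
    """Return all remaining rows of an executed query as a list of dicts.

    Hypothetical helper; it should work with any DB-API cursor.
    """
    # cursor.description holds one tuple per column; item 0 is the column name
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Demonstration against an in-memory SQLite database (an assumption for
# portability; the notebook itself talks to SQL Server through pyodbc)
demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("SELECT 1 AS id, 'Trek' AS brand")
records = fetch_dicts(demo_cursor)
print(records)  # [{'id': 1, 'brand': 'Trek'}]
demo_conn.close()
```

With pandas installed, `pd.DataFrame(fetch_dicts(cursor))` would build the same kind of DataFrame as the cells below.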

In [2]:
# execute a query
cursor.execute('''
SELECT
    product_name,
    list_price,
    category_id
FROM
    production.products p1
WHERE
    list_price IN (
        SELECT
            MAX (p2.list_price)
        FROM
            production.products p2
        WHERE
            p2.category_id = p1.category_id
        GROUP BY
            p2.category_id
    )
ORDER BY
    category_id,
    product_name;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(10)

Unnamed: 0,product_name,list_price,category_id
0,Electra Straight 8 3i (20-inch) - Boy's - 2017,489.99,1
1,Electra Townie 3i EQ (20-inch) - Boys' - 2017,489.99,1
2,Trek Superfly 24 - 2017/2018,489.99,1
3,Electra Townie Go! 8i - 2017/2018,2599.99,2
4,Electra Townie Commute Go! - 2018,2999.99,3
5,Electra Townie Commute Go! Ladies' - 2018,2999.99,3
6,Trek Boone 7 Disc - 2018,3999.99,4
7,Trek Powerfly 7 FS - 2018,4999.99,5
8,Trek Powerfly 8 FS Plus - 2017,4999.99,5
9,Trek Super Commuter+ 8S - 2018,4999.99,5


In this example, for each product evaluated by the outer query, the subquery finds the highest price of all products in its category.

If the price of the current product is equal to the highest price of all products in its category, the product is included in the result set. This process continues for the next product and so on.

As you can see, the correlated subquery is executed once for each product evaluated by the outer query.
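
The row-by-row evaluation described above can be sketched in plain Python and compared with the same correlated subquery run on an in-memory SQLite database. The table is a simplified stand-in for production.products, and the product rows are made-up sample data, not BikeStores data.

```python
import sqlite3

# Made-up sample rows: (product_name, list_price, category_id)
products = [
    ("Kids Bike A", 489.99, 1),
    ("Kids Bike B", 489.99, 1),
    ("Kids Bike C", 299.99, 1),
    ("Comfort Bike A", 2599.99, 2),
    ("Comfort Bike B", 899.99, 2),
]


def max_price_in_category(category_id):
    """Play the role of the subquery: it needs a value from the outer row."""
    return max(price for _, price, cat in products if cat == category_id)


# Play the role of the outer query: the "subquery" runs once per product
loop_result = sorted(
    (row for row in products if row[1] == max_price_in_category(row[2])),
    key=lambda row: (row[2], row[0]),
)

# The same logic expressed as a correlated subquery
conn_demo = sqlite3.connect(":memory:")
conn_demo.execute(
    "CREATE TABLE products (product_name TEXT, list_price REAL, category_id INTEGER)"
)
conn_demo.executemany("INSERT INTO products VALUES (?, ?, ?)", products)
sql_result = conn_demo.execute("""
SELECT product_name, list_price, category_id
FROM products p1
WHERE list_price IN (
    SELECT MAX(p2.list_price)
    FROM products p2
    WHERE p2.category_id = p1.category_id  -- value taken from the outer row
    GROUP BY p2.category_id
)
ORDER BY category_id, product_name;
""").fetchall()
conn_demo.close()

print([name for name, _, _ in sql_result])
# ['Kids Bike A', 'Kids Bike B', 'Comfort Bike A']
print(loop_result == sql_result)  # True
```

The Python loop only mirrors the logical meaning of the query; a database engine is free to choose a different physical execution plan.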

## SQL Server EXISTS

The EXISTS operator is a logical operator that allows you to check whether a subquery returns any row. The EXISTS operator returns TRUE if the subquery returns one or more rows.

The following shows the syntax of the SQL Server EXISTS operator:

EXISTS (subquery)

In this syntax, the subquery is a SELECT statement only. As soon as the subquery returns a row, the EXISTS operator returns TRUE and stops processing immediately.

Note that even if the subquery returns a row that contains only NULL, the EXISTS operator still evaluates to TRUE, because it checks for the presence of rows, not their values.

### A) Using EXISTS with a subquery that returns NULL example

In [3]:

# execute a query
cursor.execute('''
SELECT
    customer_id,
    first_name,
    last_name
FROM
    sales.customers
WHERE
    EXISTS (SELECT NULL)
ORDER BY
    first_name,
    last_name;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(10)

Unnamed: 0,customer_id,first_name,last_name
0,1174,Aaron,Knapp
1,338,Abbey,Pugh
2,75,Abby,Gamble
3,1224,Abram,Copeland
4,673,Adam,Henderson
5,1085,Adam,Thornton
6,195,Addie,Hahn
7,1261,Adelaida,Hancock
8,22,Adelle,Larsen
9,1023,Adena,Blake


In this example, the subquery returned a result set that contains NULL which causes the EXISTS operator to evaluate to TRUE. Therefore, the whole query returns all rows from the customers table.
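
The difference between a row that contains NULL and no row at all can be checked with a short sketch. It uses SQLite rather than SQL Server, which is an assumption made for portability: SQLite allows EXISTS directly in the select list and reports the result as 1 or 0.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")

# The subquery yields one row whose only value is NULL, so EXISTS is true
null_row = conn_demo.execute("SELECT EXISTS (SELECT NULL)").fetchone()[0]

# The subquery yields no rows at all, so EXISTS is false
no_rows = conn_demo.execute(
    "SELECT EXISTS (SELECT NULL WHERE 1 = 0)"
).fetchone()[0]

conn_demo.close()
print(null_row, no_rows)  # 1 0
```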

### B) Using EXISTS with a correlated subquery example

In [4]:

# execute a query
cursor.execute('''
SELECT
    customer_id,
    first_name,
    last_name
FROM
    sales.customers c
WHERE
    EXISTS (
        SELECT
            COUNT (*)
        FROM
            sales.orders o
        WHERE
            customer_id = c.customer_id
        GROUP BY
            customer_id
        HAVING
            COUNT (*) > 2
    )
ORDER BY
    first_name,
    last_name;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(10)

Unnamed: 0,customer_id,first_name,last_name
0,20,Aleta,Shepard
1,32,Araceli,Golden
2,64,Bobbie,Foster
3,47,Bridgette,Guerra
4,17,Caren,Stephens
5,5,Charolette,Rice
6,50,Cleotilde,Booth
7,24,Corene,Wall
8,4,Daryl,Spence
9,1,Debra,Burks


In this example, the correlated subquery selects customers who have placed more than two orders.

If the number of orders placed by a customer is less than or equal to two, the subquery returns an empty result set, which causes the EXISTS operator to evaluate to FALSE.

The customer is included in the result set only when the EXISTS operator evaluates to TRUE.
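
A minimal sketch of the same pattern on an in-memory SQLite database follows. The tables are simplified stand-ins for sales.customers and sales.orders, and the order counts are invented so that exactly one customer passes the HAVING filter.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Debra'), (2, 'Kasha'), (3, 'Tameka');
-- Debra has three orders, Kasha has two, Tameka has none
INSERT INTO orders (customer_id) VALUES (1), (1), (1), (2), (2);
""")

frequent = conn_demo.execute("""
SELECT c.customer_id, c.first_name
FROM customers c
WHERE EXISTS (
    SELECT COUNT(*)
    FROM orders o
    WHERE o.customer_id = c.customer_id
    GROUP BY o.customer_id
    HAVING COUNT(*) > 2
)
ORDER BY c.first_name;
""").fetchall()
conn_demo.close()

print(frequent)  # [(1, 'Debra')]
```

Kasha and Tameka are filtered out because the subquery returns no row for them, which makes EXISTS evaluate to FALSE.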

### C) EXISTS vs. IN example

The following statement uses the IN operator to find the orders of the customers from San Jose:

In [5]:

# execute a query
cursor.execute('''
SELECT
    *
FROM
    sales.orders
WHERE
    customer_id IN (
        SELECT
            customer_id
        FROM
            sales.customers
        WHERE
            city = 'San Jose'
    )
ORDER BY
    customer_id,
    order_date;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(10)

Unnamed: 0,order_id,customer_id,order_status,order_date,required_date,shipped_date,store_id,staff_id
0,1411,109,4,2018-03-01,2018-03-02,2018-03-02,1,2
1,1584,109,2,2018-04-26,2018-04-26,,1,3
2,1275,165,4,2017-11-29,2017-12-01,2017-11-30,1,2
3,1591,165,2,2018-04-27,2018-04-27,,1,2
4,156,357,4,2016-04-03,2016-04-06,2016-04-05,1,3
5,868,868,4,2017-05-01,2017-05-04,2017-05-02,1,3
6,1336,904,4,2018-01-09,2018-01-10,2018-01-12,1,2
7,1026,1370,4,2017-07-26,2017-07-28,2017-07-29,1,2
8,927,1438,4,2017-06-03,2017-06-05,2017-06-06,1,2


The following statement uses the EXISTS operator and returns the same result:

In [6]:

# execute a query
cursor.execute('''
SELECT
    *
FROM
    sales.orders o
WHERE
    EXISTS (
        SELECT
            customer_id
        FROM
            sales.customers c
        WHERE
            o.customer_id = c.customer_id
        AND city = 'San Jose'
    )
ORDER BY
    o.customer_id,
    order_date;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(10)

Unnamed: 0,order_id,customer_id,order_status,order_date,required_date,shipped_date,store_id,staff_id
0,1411,109,4,2018-03-01,2018-03-02,2018-03-02,1,2
1,1584,109,2,2018-04-26,2018-04-26,,1,3
2,1275,165,4,2017-11-29,2017-12-01,2017-11-30,1,2
3,1591,165,2,2018-04-27,2018-04-27,,1,2
4,156,357,4,2016-04-03,2016-04-06,2016-04-05,1,3
5,868,868,4,2017-05-01,2017-05-04,2017-05-02,1,3
6,1336,904,4,2018-01-09,2018-01-10,2018-01-12,1,2
7,1026,1370,4,2017-07-26,2017-07-28,2017-07-29,1,2
8,927,1438,4,2017-06-03,2017-06-05,2017-06-06,1,2
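
Both statements above return the same orders. As a quick check of that equivalence, the sketch below runs the IN form and the EXISTS form against a small in-memory SQLite database and compares the results. The tables and rows are invented stand-ins for sales.customers and sales.orders.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'San Jose'), (2, 'Oakland'), (3, 'San Jose');
INSERT INTO orders VALUES (10, 1), (11, 2), (12, 3), (13, 1);
""")

in_rows = conn_demo.execute("""
SELECT order_id, customer_id
FROM orders
WHERE customer_id IN (
    SELECT customer_id FROM customers WHERE city = 'San Jose'
)
ORDER BY customer_id, order_id;
""").fetchall()

exists_rows = conn_demo.execute("""
SELECT order_id, customer_id
FROM orders o
WHERE EXISTS (
    SELECT 1
    FROM customers c
    WHERE c.customer_id = o.customer_id
      AND c.city = 'San Jose'
)
ORDER BY o.customer_id, o.order_id;
""").fetchall()
conn_demo.close()

print(in_rows)                 # [(10, 1), (13, 1), (12, 3)]
print(in_rows == exists_rows)  # True
```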


### EXISTS vs. JOIN

The EXISTS operator returns TRUE or FALSE while the JOIN clause returns rows from another table.

You use the EXISTS operator to test whether a subquery returns any row; it short-circuits as soon as the first row is found. On the other hand, you use JOIN to extend the result set by combining it with the columns from related tables.

In practice, you use EXISTS when you need to check for the existence of rows in related tables without returning data from them.
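
The difference shows up clearly when one customer has several orders. In the sketch below (SQLite, invented sample rows), the JOIN returns one row per matching order together with an order column, while EXISTS returns each qualifying customer once and exposes no order data.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Debra'), (2, 'Kasha');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 1);
""")

# JOIN: one output row per matching order, with columns from both tables
join_rows = conn_demo.execute("""
SELECT c.first_name, o.order_id
FROM customers c
INNER JOIN orders o ON o.customer_id = c.customer_id
ORDER BY o.order_id;
""").fetchall()

# EXISTS: a pure filter; each customer appears at most once
exists_rows = conn_demo.execute("""
SELECT c.first_name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);
""").fetchall()
conn_demo.close()

print(join_rows)    # [('Debra', 10), ('Debra', 11), ('Debra', 12)]
print(exists_rows)  # [('Debra',)]
```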

## SQL Server CROSS APPLY

The CROSS APPLY clause allows you to perform an inner join of a table with a table-valued function or a correlated subquery.

The CROSS APPLY clause works like an INNER JOIN clause. But instead of joining two tables, the CROSS APPLY clause joins a table with a table-valued function or a correlated subquery.


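The following shows the basic syntax of the CROSS APPLY clause:

    SELECT select_list
    FROM table1
    CROSS APPLY table_function(table1.column) AS alias;

In this syntax: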
    
- table1: the main table that you want to join.
- table_function: the table-valued function to apply to each row. Alternatively, you can use a correlated subquery.
- column: the column from table1 that is passed as an argument to table_function.
- alias: the alias for the result set returned by table_function.
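
The behaviour described by these placeholders can be imitated in plain Python. This is only an illustration of the semantics, not of how SQL Server implements the clause; `cross_apply` and `products_in_category` are hypothetical names, and the rows are invented.

```python
def cross_apply(left_rows, table_function):
    """Yield each left row merged with every row table_function returns for it.

    Left rows for which the function returns nothing are dropped,
    which is the inner-join-like behaviour of CROSS APPLY.
    """
    for left in left_rows:
        for right in table_function(left):
            yield {**left, **right}


categories = [
    {"category_id": 1, "category_name": "Children Bicycles"},
    {"category_id": 2, "category_name": "Comfort Bicycles"},
    {"category_id": 3, "category_name": "Empty Category"},
]
products = [
    {"category_id": 1, "product_name": "Kids Bike A"},
    {"category_id": 2, "product_name": "Comfort Bike A"},
    {"category_id": 2, "product_name": "Comfort Bike B"},
]


def products_in_category(category):
    """Stand-in for a table-valued function taking a column of the left row."""
    return [
        {"product_name": p["product_name"]}
        for p in products
        if p["category_id"] == category["category_id"]
    ]


applied = list(cross_apply(categories, products_in_category))
for row in applied:
    print(row["category_name"], "|", row["product_name"])
# Children Bicycles | Kids Bike A
# Comfort Bicycles | Comfort Bike A
# Comfort Bicycles | Comfort Bike B
```

"Empty Category" does not appear in the output because the function returns no rows for it.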

##### 1) Using the SQL Server CROSS APPLY clause to join a table with a correlated subquery

The following example uses the CROSS APPLY clause to join the production.categories table with a correlated subquery to retrieve the top two most expensive products for each product category:

In [7]:

# execute a query
cursor.execute('''
SELECT
  c.category_name,
  r.product_name,
  r.list_price
FROM
  production.categories c
  CROSS APPLY (
    SELECT
      TOP 2 *
    FROM
      production.products p
    WHERE
      p.category_id = c.category_id
    ORDER BY
      list_price DESC,
      product_name
  ) r
ORDER BY
  c.category_name,
  r.list_price DESC;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(10)

Unnamed: 0,category_name,product_name,list_price
0,Children Bicycles,Electra Straight 8 3i (20-inch) - Boy's - 2017,489.99
1,Children Bicycles,Electra Townie 3i EQ (20-inch) - Boys' - 2017,489.99
2,Comfort Bicycles,Electra Townie Go! 8i - 2017/2018,2599.99
3,Comfort Bicycles,Electra Townie Balloon 7i EQ - 2018,899.99
4,Cruisers Bicycles,Electra Townie Commute Go! - 2018,2999.99
5,Cruisers Bicycles,Electra Townie Commute Go! Ladies' - 2018,2999.99
6,Cyclocross Bicycles,Trek Boone 7 Disc - 2018,3999.99
7,Cyclocross Bicycles,Trek Boone 7 - 2017,3499.99
8,Electric Bikes,Trek Powerfly 7 FS - 2018,4999.99
9,Electric Bikes,Trek Powerfly 8 FS Plus - 2017,4999.99


For each row from the production.categories table, the CROSS APPLY clause executes the correlated subquery, which retrieves the two most expensive products in that category.
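
For experimenting without SQL Server, the same "top two per category" result can be approximated in SQLite. SQLite has no APPLY operator, so this sketch swaps in a different technique: a join whose condition is a correlated IN subquery with LIMIT 2. The schema and rows are invented, and the hypothetical product_id column is added only to identify rows.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
CREATE TABLE categories (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    list_price   REAL,
    category_id  INTEGER
);
INSERT INTO categories VALUES (1, 'Children Bicycles'), (2, 'Comfort Bicycles');
INSERT INTO products (product_name, list_price, category_id) VALUES
    ('Kids Bike A', 489.99, 1),
    ('Kids Bike B', 489.99, 1),
    ('Kids Bike C', 299.99, 1),
    ('Comfort Bike A', 2599.99, 2),
    ('Comfort Bike B', 899.99, 2),
    ('Comfort Bike C', 599.99, 2);
""")

top_two = conn_demo.execute("""
SELECT c.category_name, p.product_name, p.list_price
FROM categories c
JOIN products p ON p.product_id IN (
    -- evaluated per category: its two most expensive products
    SELECT p2.product_id
    FROM products p2
    WHERE p2.category_id = c.category_id
    ORDER BY p2.list_price DESC, p2.product_name
    LIMIT 2
)
ORDER BY c.category_name, p.list_price DESC, p.product_name;
""").fetchall()
conn_demo.close()

for category_name, product_name, list_price in top_two:
    print(category_name, "|", product_name, "|", list_price)
```

Each category contributes at most two rows, ordered by price and then by product name, which matches the tie-breaking used in the SQL Server query.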

##### 2) Using the CROSS APPLY clause to join a table with a table-valued function

First, define a table-valued function that returns the top two most expensive products by category id:

In [8]:
# execute a query
cursor.execute('''
CREATE FUNCTION GetTopProductsByCategory (@category_id INT)
RETURNS TABLE
AS
RETURN (
    SELECT TOP 2 *
    FROM production.products p
    WHERE p.category_id = @category_id 
    ORDER BY list_price DESC, product_name
);
''')


<pyodbc.Cursor at 0x1a05ef54c30>

Second, use the CROSS APPLY clause with the table-valued function GetTopProductsByCategory to retrieve the top two most expensive products within each category:

In [9]:

# execute a query
cursor.execute('''
SELECT
  c.category_name,
  r.product_name,
  r.list_price
FROM
  production.categories c
  CROSS APPLY GetTopProductsByCategory(c.category_id) r
ORDER BY
  c.category_name,
  r.list_price DESC;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(10)

Unnamed: 0,category_name,product_name,list_price
0,Children Bicycles,Electra Straight 8 3i (20-inch) - Boy's - 2017,489.99
1,Children Bicycles,Electra Townie 3i EQ (20-inch) - Boys' - 2017,489.99
2,Comfort Bicycles,Electra Townie Go! 8i - 2017/2018,2599.99
3,Comfort Bicycles,Electra Townie Balloon 7i EQ - 2018,899.99
4,Cruisers Bicycles,Electra Townie Commute Go! - 2018,2999.99
5,Cruisers Bicycles,Electra Townie Commute Go! Ladies' - 2018,2999.99
6,Cyclocross Bicycles,Trek Boone 7 Disc - 2018,3999.99
7,Cyclocross Bicycles,Trek Boone 7 - 2017,3499.99
8,Electric Bikes,Trek Powerfly 7 FS - 2018,4999.99
9,Electric Bikes,Trek Powerfly 8 FS Plus - 2017,4999.99


It returns the same result as the query that uses the correlated subquery above.

##### 3) Using the CROSS APPLY clause to process JSON data

First, create a table called product_json to store the product data:

In [11]:
cursor.execute('''
CREATE TABLE product_json(
    id INT IDENTITY PRIMARY KEY,
    info NVARCHAR(MAX)
);
''')

<pyodbc.Cursor at 0x1a05ef54c30>

In the product_json table:

- id is the primary key column with the identity attribute.
- info is an NVARCHAR(MAX) column that stores the JSON data.

Second, insert rows into the product_json table:

In [12]:
cursor.execute('''
INSERT INTO product_json(info)
VALUES 
    ('{"Name": "Laptop", "Price": 999, "Category": "Electronics"}'),
    ('{"Name": "Headphones", "Price": 99, "Category": "Electronics"}'),
    ('{"Name": "Book", "Price": 15, "Category": "Books"}');
''')

<pyodbc.Cursor at 0x1a05ef54c30>

Third, extract information from the info JSON data using the CROSS APPLY clause with the OPENJSON() function:

In [13]:
cursor.execute('''
SELECT
  p.id,
  j.*
FROM
  product_json p
  CROSS APPLY OPENJSON (p.info) WITH
  (
    Name NVARCHAR(100),
    Price DECIMAL(10, 2),
    Category NVARCHAR(100)
  ) AS j;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(10)

Unnamed: 0,id,Name,Price,Category
0,1,Laptop,999.0,Electronics
1,2,Headphones,99.0,Electronics
2,3,Book,15.0,Books
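
Conceptually, the query parses the JSON text of each row and exposes the three listed properties as typed columns. A rough Python analogue with the standard `json` module is sketched below; `open_json` is a hypothetical helper that only imitates the shape of the result and is not an implementation of OPENJSON.

```python
import json
from decimal import Decimal

# The same rows that were inserted into product_json: (id, info)
product_json = [
    (1, '{"Name": "Laptop", "Price": 999, "Category": "Electronics"}'),
    (2, '{"Name": "Headphones", "Price": 99, "Category": "Electronics"}'),
    (3, '{"Name": "Book", "Price": 15, "Category": "Books"}'),
]


def open_json(info):
    """Return a one-row list holding the three columns named in the WITH clause."""
    doc = json.loads(info)
    price = doc.get("Price")
    return [{
        "Name": doc.get("Name"),
        "Price": Decimal(str(price)) if price is not None else None,
        "Category": doc.get("Category"),
    }]


# One output row per (table row, parsed row) pair, as with CROSS APPLY
parsed_rows = [
    {"id": row_id, **parsed}
    for row_id, info in product_json
    for parsed in open_json(info)
]

for row in parsed_rows:
    print(row["id"], row["Name"], row["Price"], row["Category"])
# 1 Laptop 999 Electronics
# 2 Headphones 99 Electronics
# 3 Book 15 Books
```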


##### 4) Using the CROSS APPLY clause to remove the nested REPLACE() function

First, create a table called companies that stores the company names:

In [14]:
cursor.execute('''
CREATE TABLE companies(
   id INT IDENTITY PRIMARY KEY,
   name VARCHAR(255) NOT NULL
);
''')

<pyodbc.Cursor at 0x1a05ef54c30>

Second, insert rows into the companies table:

In [15]:
cursor.execute('''
INSERT INTO
  companies (name)
VALUES
  ('ABC Corporation'),
  ('XYZ Inc.'),
  ('JK Pte Ltd');
''')

<pyodbc.Cursor at 0x1a05ef54c30>

Suppose you want to get the company names without words like Corporation, Inc., and Pte Ltd. To achieve this, you can use multiple REPLACE() functions.

Third, retrieve the company names from the companies table:

In [17]:
cursor.execute('''
SELECT TRIM(REPLACE(REPLACE(REPLACE(name,'Corporation',''), 'Inc.',''),'Pte Ltd','')) company_name
FROM companies;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(10)

Unnamed: 0,company_name
0,ABC
1,XYZ
2,JK


The query works as expected, but it is quite complex. To simplify it, you can use the CROSS APPLY clause as follows:

In [18]:
cursor.execute('''
SELECT TRIM(r3.name) company_name
FROM companies c
CROSS APPLY (SELECT REPLACE(c.name,'Corporation', '') name) AS r1 
CROSS APPLY (SELECT REPLACE(r1.name,'Inc.', '') name) AS r2
CROSS APPLY (SELECT REPLACE(r2.name,'Pte Ltd', '') name) AS r3;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(10)

Unnamed: 0,company_name
0,ABC
1,XYZ
2,JK


In this query, we use a series of CROSS APPLY clauses to progressively remove specific words (Corporation, Inc., and Pte Ltd) from the company names. Each clause can reference the name column produced by the previous one.
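
The same step-by-step idea can be sketched in Python, where each intermediate result gets its own name instead of being nested inside the next call. The variable names r1, r2, and r3 mirror the aliases in the query; `strip_suffixes` is a hypothetical helper.

```python
names = ["ABC Corporation", "XYZ Inc.", "JK Pte Ltd"]


def strip_suffixes(name):
    """Remove the three suffixes one step at a time, then trim whitespace."""
    r1 = name.replace("Corporation", "")  # first CROSS APPLY
    r2 = r1.replace("Inc.", "")           # second CROSS APPLY, builds on r1
    r3 = r2.replace("Pte Ltd", "")        # third CROSS APPLY, builds on r2
    return r3.strip()                     # TRIM(r3.name)


company_names = [strip_suffixes(name) for name in names]
print(company_names)  # ['ABC', 'XYZ', 'JK']
```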

Outer Apply: https://www.sqlservertutorial.net/sql-server-basics/sql-server-outer-apply/

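## SQL Server OUTER APPLY

The OUTER APPLY clause works like the CROSS APPLY clause, except that it also keeps the rows from the left table for which the table-valued function or correlated subquery returns no rows; the columns from the right side are NULL for those rows.

##### Using the OUTER APPLY clause to join a table with a table-valued function

First, define a table-valued function GetLatestQuantityDiscount that returns the most recent order item for a given product: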

In [19]:
cursor.execute('''
CREATE FUNCTION GetLatestQuantityDiscount (@product_id INT) 
RETURNS TABLE 
AS RETURN (
  SELECT
    TOP 1 i.*
  FROM
    sales.order_items i
    INNER JOIN sales.orders o ON o.order_id = i.order_id
  WHERE
    product_id = @product_id
  ORDER BY
    order_date DESC
);
''')

<pyodbc.Cursor at 0x1a05ef54c30>

Second, use the OUTER APPLY clause with the table-valued function GetLatestQuantityDiscount to retrieve the latest quantity and discount of each product of brand 1 in the production.products table:

In [20]:
cursor.execute('''
SELECT
  p.product_name,
  r.quantity,
  r.discount
FROM
  production.products p 
OUTER APPLY GetLatestQuantityDiscount(p.product_id) r
WHERE
  p.brand_id = 1
ORDER BY
  r.quantity;
''')


# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(10)

Unnamed: 0,product_name,quantity,discount
0,Electra Townie Go! 8i Ladies' - 2018,,
1,Electra Savannah 1 (20-inch) - Girl's - 2018,,
2,Electra Sweet Ride 1 (20-inch) - Girl's - 2018,,
3,Electra Townie Original 21D - 2018,1.0,0.2
4,Electra Townie Balloon 7i EQ - 2018,1.0,0.1
5,Electra Townie Balloon 3i EQ - 2017/2018,1.0,0.1
6,Electra Townie Balloon 8D EQ - 2016/2017/2018,1.0,0.07
7,Electra Townie Commute 27D - 2018,1.0,0.07
8,Electra Townie Commute 27D Ladies - 2018,1.0,0.05
9,Electra Soft Serve 1 (16-inch) - Girl's - 2018,1.0,0.2
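
The empty quantity and discount values in the first rows belong to products that have no order items. That row-preserving behaviour can be imitated in plain Python as a rough sketch; `outer_apply` and `latest_quantity_discount` are hypothetical names, and the rows are invented.

```python
def outer_apply(left_rows, table_function, right_columns):
    """Like a cross apply, but keep left rows that get no match.

    Unmatched left rows are emitted once, with None for every right column.
    """
    for left in left_rows:
        matches = table_function(left)
        if matches:
            for right in matches:
                yield {**left, **right}
        else:
            yield {**left, **dict.fromkeys(right_columns)}


products = [
    {"product_id": 1, "product_name": "Bike A"},
    {"product_id": 2, "product_name": "Bike B"},  # never ordered
]
order_items = [
    {"product_id": 1, "quantity": 2, "discount": 0.2, "order_date": "2018-01-05"},
    {"product_id": 1, "quantity": 1, "discount": 0.1, "order_date": "2018-04-01"},
]


def latest_quantity_discount(product):
    """Return at most one row: the most recent order item for the product."""
    items = [i for i in order_items if i["product_id"] == product["product_id"]]
    items.sort(key=lambda i: i["order_date"], reverse=True)
    return [{"quantity": i["quantity"], "discount": i["discount"]} for i in items[:1]]


latest = list(outer_apply(products, latest_quantity_discount, ["quantity", "discount"]))
for row in latest:
    print(row["product_name"], row["quantity"], row["discount"])
# Bike A 1 0.1
# Bike B None None
```

Replacing `outer_apply` with a version that skips unmatched rows would drop "Bike B", which is the difference between OUTER APPLY and CROSS APPLY.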
