# T-SQL Tutorials

## SQL Server HR Database

https://www.sqltutorial.org/sql-sample-database/

In [1]:
import pyodbc
import pandas as pd

# Check that an ODBC driver for SQL Server is installed
# (the connection string below uses "ODBC Driver 17 for SQL Server")
print([x for x in pyodbc.drivers() if "SQL Server" in x])

# Define the connection string
conn_str = (
    r'DRIVER={ODBC Driver 17 for SQL Server};'
    r'SERVER=localhost;'
    r'DATABASE=hr;'
    r'Trusted_Connection=yes;'
)

# Establish the connection
conn = pyodbc.connect(conn_str)

# Create a cursor
cursor = conn.cursor()
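
Rather than hard-coding the driver name, you could pick the newest installed SQL Server ODBC driver from the list that `pyodbc.drivers()` returns. This is a minimal sketch: `pick_sql_server_driver` and `build_conn_str` are hypothetical helper names, and the example driver list is made up so the code runs without pyodbc. ODBC Driver 18 encrypts connections by default, so a local server with a self-signed certificate may also need `TrustServerCertificate=yes` in the connection string.

```python
import re


def pick_sql_server_driver(drivers):
    """Return the newest 'ODBC Driver N for SQL Server' name in `drivers`, or None.

    `drivers` is any list of driver names, e.g. the result of pyodbc.drivers().
    """
    pattern = re.compile(r"^ODBC Driver (\d+) for SQL Server$")
    candidates = []
    for name in drivers:
        match = pattern.match(name)
        if match:
            candidates.append((int(match.group(1)), name))
    if not candidates:
        return None
    # max() on (version, name) tuples picks the highest version number
    return max(candidates)[1]


def build_conn_str(driver, server="localhost", database="hr"):
    """Build a Windows-authentication connection string like the one above."""
    return (
        f"DRIVER={{{driver}}};"
        f"SERVER={server};"
        f"DATABASE={database};"
        "Trusted_Connection=yes;"
    )


# Hypothetical list; with pyodbc installed you would pass pyodbc.drivers()
installed = ["SQL Server", "ODBC Driver 17 for SQL Server", "ODBC Driver 18 for SQL Server"]
driver = pick_sql_server_driver(installed)
print(driver)
print(build_conn_str(driver))
```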

In [3]:
# execute a query
cursor.execute('''
SELECT  
    *
FROM 
    hr.candidates
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

|   | id | fullname |
|---|----|----------|
| 0 | 1 | John Doe |
| 1 | 2 | Lily Bush |
| 2 | 3 | Peter Drucker |
| 3 | 4 | Jane Doe |

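Each cell in this notebook repeats the same execute, fetch, and convert steps. Below is a minimal sketch of a reusable helper, assuming any DB-API 2.0 cursor (pyodbc cursors qualify). `fetch_df` is a hypothetical name, and the in-memory `sqlite3` database only stands in for the hr database so the sketch runs anywhere. Building the DataFrame from the rows and the column list, rather than from per-row dicts, also returns the correct columns for an empty result. `pandas.read_sql` is an alternative, but recent pandas versions emit a warning when given a raw pyodbc connection.

```python
import sqlite3

import pandas as pd


def fetch_df(cursor, sql):
    """Run `sql` on a DB-API 2.0 cursor and return the result as a DataFrame."""
    cursor.execute(sql)
    # cursor.description holds one 7-item tuple per column; item 0 is the name
    columns = [col[0] for col in cursor.description]
    rows = [tuple(row) for row in cursor.fetchall()]
    # rows + columns keeps duplicate column names and works for zero rows
    return pd.DataFrame(rows, columns=columns)


# Stand-in for the hr database (SQLite has no "hr." schema, so none is used)
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE candidates (id INTEGER PRIMARY KEY, fullname TEXT)")
demo_cur.executemany(
    "INSERT INTO candidates VALUES (?, ?)",
    [(1, "John Doe"), (2, "Lily Bush"), (3, "Peter Drucker"), (4, "Jane Doe")],
)

candidates_df = fetch_df(demo_cur, "SELECT * FROM candidates")
print(candidates_df)
```

With the helper, a cell would reduce to `df = fetch_df(cursor, '''SELECT * FROM hr.candidates''')` followed by `df.head()`.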

In [4]:
# execute a query
cursor.execute('''
SELECT  
    *
FROM 
    hr.employees
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

|   | id | fullname |
|---|----|----------|
| 0 | 1 | John Doe |
| 1 | 2 | Jane Doe |
| 2 | 3 | Michael Scott |
| 3 | 4 | Jack Sparrow |


### SQL Server Inner Join

Inner join produces a result set that includes only the rows from the left table that have matching rows in the right table, combined with those matching rows.

The following example uses the inner join clause to get the rows from the candidates table that have corresponding rows with the same values in the fullname column of the employees table:

In [3]:
# execute a query
cursor.execute('''
SELECT  
    c.id candidate_id,
    c.fullname candidate_name,
    e.id employee_id,
    e.fullname employee_name
FROM 
    hr.candidates c
    INNER JOIN hr.employees e 
        ON e.fullname = c.fullname;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

|   | candidate_id | candidate_name | employee_id | employee_name |
|---|--------------|----------------|-------------|---------------|
| 0 | 1 | John Doe | 1 | John Doe |
| 1 | 4 | Jane Doe | 2 | Jane Doe |

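The same inner join can be reproduced in pandas with `DataFrame.merge`. The sketch below rebuilds the two tables from the outputs above; `how="inner"` keeps only the full names present in both frames, and the suffixes disambiguate the two `id` columns.

```python
import pandas as pd

# The rows shown in the outputs above, typed in by hand
candidates = pd.DataFrame(
    {"id": [1, 2, 3, 4], "fullname": ["John Doe", "Lily Bush", "Peter Drucker", "Jane Doe"]}
)
employees = pd.DataFrame(
    {"id": [1, 2, 3, 4], "fullname": ["John Doe", "Jane Doe", "Michael Scott", "Jack Sparrow"]}
)

# how="inner" keeps only names found in both frames, like INNER JOIN ... ON e.fullname = c.fullname
inner = candidates.merge(
    employees, on="fullname", how="inner", suffixes=("_candidate", "_employee")
)
print(inner)
```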

### SQL Server Left Join

Left join selects data starting from the left table and matching rows in the right table. The left join returns all rows from the left table and the matching rows from the right table. If a row in the left table does not have a matching row in the right table, the columns of the right table will have nulls.

The left join is also known as the left outer join. The outer keyword is optional.

The following statement joins the candidates table with the employees table using left join:

In [4]:

# execute a query
cursor.execute('''
SELECT  
	c.id candidate_id,
	c.fullname candidate_name,
	e.id employee_id,
	e.fullname employee_name
FROM 
	hr.candidates c
	LEFT JOIN hr.employees e 
		ON e.fullname = c.fullname;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

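|   | candidate_id | candidate_name | employee_id | employee_name |
|---|--------------|----------------|-------------|---------------|
| 0 | 1 | John Doe | 1.0 | John Doe |
| 1 | 2 | Lily Bush |  |  |
| 2 | 3 | Peter Drucker |  |  |
| 3 | 4 | Jane Doe | 2.0 | Jane Doe |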

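In the output above, employee_id shows 1.0 and 2.0 rather than 1 and 2. The NULLs arrive from pyodbc as Python `None`, and pandas stores an integer column that contains missing values as `float64` with `NaN`. A minimal sketch of the effect, using rows shaped like the result above, and of converting the column to pandas' nullable `Int64` dtype:

```python
import pandas as pd

# Rows shaped like the LEFT JOIN result above: None stands for SQL NULL
data = [
    {"candidate_id": 1, "employee_id": 1},
    {"candidate_id": 2, "employee_id": None},
    {"candidate_id": 3, "employee_id": None},
    {"candidate_id": 4, "employee_id": 2},
]
left_df = pd.DataFrame(data)
print(left_df.dtypes)  # employee_id is float64 because of the missing values

# The nullable integer dtype keeps whole numbers and shows missing values as <NA>
left_df["employee_id"] = left_df["employee_id"].astype("Int64")
print(left_df)
```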

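To get the rows that are available only in the left table, add a WHERE clause that keeps only the rows whose right-table columns are NULL:

In [8]: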

# execute a query
cursor.execute('''
SELECT  
    c.id candidate_id,
    c.fullname candidate_name,
    e.id employee_id,
    e.fullname employee_name
FROM 
    hr.candidates c
    LEFT JOIN hr.employees e 
        ON e.fullname = c.fullname
WHERE 
    e.id IS NULL;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

|   | candidate_id | candidate_name | employee_id | employee_name |
|---|--------------|----------------|-------------|---------------|
| 0 | 2 | Lily Bush |  |  |
| 1 | 3 | Peter Drucker |  |  |

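The left-only result can also be computed in pandas by keeping the candidates whose fullname does not occur among the employees. A sketch using the rows shown above:

```python
import pandas as pd

candidates = pd.DataFrame(
    {"id": [1, 2, 3, 4], "fullname": ["John Doe", "Lily Bush", "Peter Drucker", "Jane Doe"]}
)
employees = pd.DataFrame(
    {"id": [1, 2, 3, 4], "fullname": ["John Doe", "Jane Doe", "Michael Scott", "Jack Sparrow"]}
)

# Candidates whose fullname is not among the employees
# (same rows as LEFT JOIN ... WHERE e.id IS NULL)
only_candidates = candidates[~candidates["fullname"].isin(employees["fullname"])]
print(only_candidates)
```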

### SQL Server Right Join

The right join or right outer join selects data starting from the right table. It is a reversed version of the left join.

The right join returns a result set that contains all rows from the right table and the matching rows in the left table. If a row in the right table does not have a matching row in the left table, all columns in the left table will contain nulls.

The following example uses the right join to query rows from the candidates and employees tables:

In [9]:
# execute a query
cursor.execute('''
SELECT  
    c.id candidate_id,
    c.fullname candidate_name,
    e.id employee_id,
    e.fullname employee_name
FROM 
    hr.candidates c
    RIGHT JOIN hr.employees e 
        ON e.fullname = c.fullname;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

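|   | candidate_id | candidate_name | employee_id | employee_name |
|---|--------------|----------------|-------------|---------------|
| 0 | 1.0 | John Doe | 1 | John Doe |
| 1 | 4.0 | Jane Doe | 2 | Jane Doe |
| 2 |  |  | 3 | Michael Scott |
| 3 |  |  | 4 | Jack Sparrow |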


Similarly, you can get rows that are available only in the right table by adding a WHERE clause to the above query as follows:

In [10]:

# execute a query
cursor.execute('''
SELECT  
    c.id candidate_id,
    c.fullname candidate_name,
    e.id employee_id,
    e.fullname employee_name
FROM 
    hr.candidates c
    RIGHT JOIN hr.employees e 
        ON e.fullname = c.fullname
WHERE
    c.id IS NULL;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

|   | candidate_id | candidate_name | employee_id | employee_name |
|---|--------------|----------------|-------------|---------------|
| 0 |  |  | 3 | Michael Scott |
| 1 |  |  | 4 | Jack Sparrow |

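A RIGHT JOIN is a LEFT JOIN with the two tables swapped, which is handy on engines that lack RIGHT JOIN (SQLite before version 3.39, for example). The sketch below uses Python's built-in `sqlite3` module with an in-memory copy of the sample rows as a stand-in for SQL Server; only the table order and the join keyword change, while the select list stays the same.

```python
import sqlite3

# In-memory stand-in for the hr tables (SQLite has no "hr." schema)
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE candidates (id INTEGER PRIMARY KEY, fullname TEXT);
    CREATE TABLE employees  (id INTEGER PRIMARY KEY, fullname TEXT);
    INSERT INTO candidates VALUES (1, 'John Doe'), (2, 'Lily Bush'),
                                  (3, 'Peter Drucker'), (4, 'Jane Doe');
    INSERT INTO employees VALUES (1, 'John Doe'), (2, 'Jane Doe'),
                                 (3, 'Michael Scott'), (4, 'Jack Sparrow');
    """
)

# "candidates c RIGHT JOIN employees e" rewritten as "employees e LEFT JOIN candidates c"
base = """
    SELECT c.id AS candidate_id, c.fullname AS candidate_name,
           e.id AS employee_id, e.fullname AS employee_name
    FROM employees e
    LEFT JOIN candidates c ON e.fullname = c.fullname
"""
rows = db.execute(base + " ORDER BY e.id").fetchall()

# Rows available only in the right table (employees)
only_employees = db.execute(base + " WHERE c.id IS NULL ORDER BY e.id").fetchall()

for row in rows:
    print(row)
print(only_employees)
```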

### SQL Server Full Join

The full outer join or full join returns a result set that contains all rows from both left and right tables, with the matching rows from both sides where available. In case there is no match, the missing side will have NULL values.

The following example shows how to perform a full join between the candidates and employees tables:

In [11]:

# execute a query
cursor.execute('''
SELECT  
    c.id candidate_id,
    c.fullname candidate_name,
    e.id employee_id,
    e.fullname employee_name
FROM 
    hr.candidates c
    FULL JOIN hr.employees e 
        ON e.fullname = c.fullname;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

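|   | candidate_id | candidate_name | employee_id | employee_name |
|---|--------------|----------------|-------------|---------------|
| 0 | 1.0 | John Doe | 1.0 | John Doe |
| 1 | 2.0 | Lily Bush |  |  |
| 2 | 3.0 | Peter Drucker |  |  |
| 3 | 4.0 | Jane Doe | 2.0 | Jane Doe |
| 4 |  |  | 3.0 | Michael Scott |

`df.head()` displays only the first five rows; the full result also contains a sixth row for Jack Sparrow, who is an employee but not a candidate.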


To select rows that exist in only one of the two tables, exclude the rows common to both tables by adding a WHERE clause, as shown in the following query:

In this query, c is the alias for the hr.candidates table and e is the alias for the hr.employees table.

In [12]:

# execute a query
cursor.execute('''
SELECT  
    c.id candidate_id,
    c.fullname candidate_name,
    e.id employee_id,
    e.fullname employee_name
FROM 
    hr.candidates c
    FULL JOIN hr.employees e 
        ON e.fullname = c.fullname
WHERE
    c.id IS NULL OR
    e.id IS NULL;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

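|   | candidate_id | candidate_name | employee_id | employee_name |
|---|--------------|----------------|-------------|---------------|
| 0 | 2.0 | Lily Bush |  |  |
| 1 | 3.0 | Peter Drucker |  |  |
| 2 |  |  | 3.0 | Michael Scott |
| 3 |  |  | 4.0 | Jack Sparrow |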

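In pandas, `how="outer"` corresponds to FULL JOIN, and `indicator=True` adds a `_merge` column recording whether each row matched (`"both"`) or came from one side only. A sketch using the rows shown above, which drops the `"both"` rows just like the WHERE clause in the last query:

```python
import pandas as pd

candidates = pd.DataFrame(
    {"id": [1, 2, 3, 4], "fullname": ["John Doe", "Lily Bush", "Peter Drucker", "Jane Doe"]}
)
employees = pd.DataFrame(
    {"id": [1, 2, 3, 4], "fullname": ["John Doe", "Jane Doe", "Michael Scott", "Jack Sparrow"]}
)

# how="outer" keeps every name from both frames; _merge says where each row came from
full = candidates.merge(
    employees, on="fullname", how="outer", suffixes=("_candidate", "_employee"), indicator=True
)
print(full)

# Dropping the "both" rows mirrors WHERE c.id IS NULL OR e.id IS NULL
unmatched = full[full["_merge"] != "both"]
print(unmatched[["id_candidate", "fullname", "id_employee", "_merge"]])
```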

In [13]:
conn.close()

## SQL Server BikeStores Database

### INNER JOIN clause

In [17]:
import pyodbc
import pandas as pd

# Check that an ODBC driver for SQL Server is installed
# (the connection string below uses "ODBC Driver 17 for SQL Server")
print([x for x in pyodbc.drivers() if "SQL Server" in x])

# Define the connection string
conn_str = (
    r'DRIVER={ODBC Driver 17 for SQL Server};'
    r'SERVER=localhost;'
    r'DATABASE=BikeStores;'
    r'Trusted_Connection=yes;'
)

# Establish the connection
conn = pyodbc.connect(conn_str)

# Create a cursor
cursor = conn.cursor()

In [18]:

# execute a query
cursor.execute('''
SELECT
    product_name,
    list_price,
    category_id
FROM
    production.products
ORDER BY
    product_name DESC;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

|   | product_name | list_price | category_id |
|---|--------------|------------|-------------|
| 0 | Trek XM700+ Lowstep - 2018 | 3499.99 | 5 |
| 1 | Trek XM700+ - 2018 | 3499.99 | 5 |
| 2 | Trek X-Caliber Frameset - 2018 | 1499.99 | 6 |
| 3 | Trek X-Caliber 8 - 2018 | 999.99 | 6 |
| 4 | Trek X-Caliber 8 - 2017 | 999.99 | 6 |


The query returned only a list of category identification numbers, not the category names. To include the category names in the result set, you use the INNER JOIN clause as follows:

In [20]:

# execute a query
cursor.execute('''
SELECT
    product_name,
    category_name,
    list_price
FROM
    production.products p
INNER JOIN production.categories c 
    ON p.category_id = c.category_id
ORDER BY
    product_name DESC;

''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

|   | product_name | category_name | list_price |
|---|--------------|---------------|------------|
| 0 | Trek XM700+ Lowstep - 2018 | Electric Bikes | 3499.99 |
| 1 | Trek XM700+ - 2018 | Electric Bikes | 3499.99 |
| 2 | Trek X-Caliber Frameset - 2018 | Mountain Bikes | 1499.99 |
| 3 | Trek X-Caliber 8 - 2018 | Mountain Bikes | 999.99 |
| 4 | Trek X-Caliber 8 - 2017 | Mountain Bikes | 999.99 |


In this query:

- c and p are the table aliases of the production.categories and production.products tables.
- Aliases let you reference a column as alias.column_name instead of table_name.column_name. For example, the query uses c.category_id instead of production.categories.category_id, which saves some typing.

For each row in the production.products table, the inner join clause compares it with every row in the production.categories table based on the values of the category_id column:

- If both rows have the same value in the category_id column, the inner join forms a new row whose columns come from the rows of the production.categories and production.products tables, according to the select list, and includes this new row in the result set.
- If a row in the production.products table has no matching row in the production.categories table, the inner join ignores it and does not include it in the result set.

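The matching process described above can be sketched as a nested loop in plain Python. This illustrates the logical result only; SQL Server's optimizer may instead choose a hash or merge join. The category ids 5 and 6 and their names come from the outputs above, while the product with category_id 99 is hypothetical and deliberately has no matching category.

```python
def inner_join(left, right, key):
    """Nested-loop inner join of two lists of dicts on a shared column `key`."""
    result = []
    for l_row in left:              # each row of the left table (products)
        for r_row in right:         # compared with every row of the right table
            if l_row[key] == r_row[key]:
                # matching pair: combine the columns of both rows
                result.append({**l_row, **r_row})
        # a left row with no match adds nothing to the result
    return result


products = [
    {"product_name": "Trek XM700+ - 2018", "category_id": 5, "list_price": 3499.99},
    {"product_name": "Trek X-Caliber 8 - 2018", "category_id": 6, "list_price": 999.99},
    {"product_name": "Hypothetical Bike", "category_id": 99, "list_price": 100.00},
]
categories = [
    {"category_id": 5, "category_name": "Electric Bikes"},
    {"category_id": 6, "category_name": "Mountain Bikes"},
]

for row in inner_join(products, categories, "category_id"):
    print(row["product_name"], "|", row["category_name"], "|", row["list_price"])
```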
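To query data from three tables, use two INNER JOIN clauses. The following statement joins production.products with production.categories and production.brands to add the brand name:

In [21]: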

# execute a query
cursor.execute('''
SELECT
    product_name,
    category_name,
    brand_name,
    list_price
FROM
    production.products p
INNER JOIN production.categories c ON c.category_id = p.category_id
INNER JOIN production.brands b ON b.brand_id = p.brand_id
ORDER BY
    product_name DESC;

''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head()

|   | product_name | category_name | brand_name | list_price |
|---|--------------|---------------|------------|------------|
| 0 | Trek XM700+ Lowstep - 2018 | Electric Bikes | Trek | 3499.99 |
| 1 | Trek XM700+ - 2018 | Electric Bikes | Trek | 3499.99 |
| 2 | Trek X-Caliber Frameset - 2018 | Mountain Bikes | Trek | 1499.99 |
| 3 | Trek X-Caliber 8 - 2018 | Mountain Bikes | Trek | 999.99 |
| 4 | Trek X-Caliber 8 - 2017 | Mountain Bikes | Trek | 999.99 |


### LEFT JOIN clause

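The following query uses LEFT JOIN to return every product together with the ids of the orders that contain it. Products that have never been ordered have NULL in the order_id column; SQL Server sorts NULLs first in ascending order, so these products appear at the top of the result:

In [24]: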

# execute a query
cursor.execute('''
SELECT
    product_name,
    order_id
FROM
    production.products p
LEFT JOIN sales.order_items o ON o.product_id = p.product_id
ORDER BY
    order_id;

''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(20)

|   | product_name | order_id |
|---|--------------|----------|
| 0 | Electra Townie Go! 8i Ladies' - 2018 |  |
| 1 | Trek Checkpoint ALR 5 Women's - 2019 |  |
| 2 | Electra Savannah 1 (20-inch) - Girl's - 2018 |  |
| 3 | Trek Checkpoint ALR Frameset - 2019 |  |
| 4 | Trek Precaliber 12 Girl's - 2018 |  |
| 5 | Surly Krampus Frameset - 2018 |  |
| 6 | Trek 820 - 2016 |  |
| 7 | Trek Checkpoint SL 5 Women's - 2019 |  |
| 8 | Trek Checkpoint ALR 4 Women's - 2019 |  |
| 9 | Trek Kids' Dual Sport - 2018 |  |


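Adding WHERE order_id IS NOT NULL keeps only the products that appear in at least one order, which returns the same rows as an INNER JOIN on the same condition:

In [25]: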


# execute a query
cursor.execute('''
SELECT
    product_name,
    order_id
FROM
    production.products p
LEFT JOIN sales.order_items o ON o.product_id = p.product_id
WHERE order_id IS NOT NULL

''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(20)

|   | product_name | order_id |
|---|--------------|----------|
| 0 | Electra Townie Original 7D EQ - Women's - 2016 | 1 |
| 1 | Trek Remedy 29 Carbon Frameset - 2016 | 1 |
| 2 | Surly Straggler - 2016 | 1 |
| 3 | Electra Townie Original 7D EQ - 2016 | 1 |
| 4 | Trek Fuel EX 8 29 - 2016 | 1 |
| 5 | Electra Townie Original 7D EQ - Women's - 2016 | 2 |
| 6 | Electra Townie Original 7D EQ - 2016 | 2 |
| 7 | Surly Wednesday Frameset - 2016 | 3 |
| 8 | Electra Townie Original 7D EQ - Women's - 2016 | 3 |
| 9 | Ritchey Timberwolf Frameset - 2016 | 4 |


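The following example uses two LEFT JOIN clauses to join three tables: production.products, sales.order_items, and sales.orders.

![MULTIPLE LEFT JOIN](orders-order_items-products.png)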

In [26]:

# execute a query
cursor.execute('''
SELECT
    p.product_name,
    o.order_id,
    i.item_id,
    o.order_date
FROM
    production.products p
	LEFT JOIN sales.order_items i
		ON i.product_id = p.product_id
	LEFT JOIN sales.orders o
		ON o.order_id = i.order_id
ORDER BY
    order_id;

''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(20)

|   | product_name | order_id | item_id | order_date |
|---|--------------|----------|---------|------------|
| 0 | Electra Savannah 1 (20-inch) - Girl's - 2018 |  |  |  |
| 1 | Electra Townie Go! 8i Ladies' - 2018 |  |  |  |
| 2 | Trek Checkpoint ALR 5 Women's - 2019 |  |  |  |
| 3 | Trek Checkpoint ALR Frameset - 2019 |  |  |  |
| 4 | Trek Precaliber 12 Girl's - 2018 |  |  |  |
| 5 | Surly Krampus Frameset - 2018 |  |  |  |
| 6 | Trek Checkpoint SL 5 Women's - 2019 |  |  |  |
| 7 | Trek 820 - 2016 |  |  |  |
| 8 | Trek Checkpoint ALR 4 Women's - 2019 |  |  |  |
| 9 | Trek Kids' Dual Sport - 2018 |  |  |  |


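#### LEFT JOIN: conditions in ON vs. WHERE clause

The following query uses a WHERE clause to return the products that belong to the order with id 100:

In [27]: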

# execute a query
cursor.execute('''
SELECT
    product_name,
    order_id
FROM
    production.products p
LEFT JOIN sales.order_items o 
   ON o.product_id = p.product_id
WHERE order_id = 100
ORDER BY
    order_id;

''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(20)

|   | product_name | order_id |
|---|--------------|----------|
| 0 | Electra Townie Original 21D - 2016 | 100 |
| 1 | Surly Straggler 650b - 2016 | 100 |
| 2 | Trek Slash 8 27.5 - 2016 | 100 |
| 3 | Electra Townie Original 7D EQ - 2016 | 100 |
| 4 | Electra Townie Original 21D - 2016 | 100 |


Let’s move the condition order_id = 100 to the ON clause:

In [28]:

# execute a query
cursor.execute('''
SELECT
    p.product_id,
    product_name,
    order_id
FROM
    production.products p
    LEFT JOIN sales.order_items o 
         ON o.product_id = p.product_id AND 
            o.order_id = 100
ORDER BY
    order_id DESC;

''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(20)

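|   | product_id | product_name | order_id |
|---|------------|--------------|----------|
| 0 | 24 | Electra Townie Original 21D - 2016 | 100.0 |
| 1 | 7 | Trek Slash 8 27.5 - 2016 | 100.0 |
| 2 | 16 | Electra Townie Original 7D EQ - 2016 | 100.0 |
| 3 | 11 | Surly Straggler 650b - 2016 | 100.0 |
| 4 | 12 | Electra Townie Original 21D - 2016 | 100.0 |
| 5 | 13 | Electra Cruiser 1 (24-Inch) - 2016 |  |
| 6 | 14 | Electra Girl's Hawaii 1 (16-inch) - 2015/2016 |  |
| 7 | 15 | Electra Moto 1 - 2016 |  |
| 8 | 17 | Pure Cycles Vine 8-Speed - 2016 |  |
| 9 | 18 | Pure Cycles Western 3-Speed - Women's - 2015/2016 |  |

The query returns all products, but only the rows that belong to order 100 have an order_id; the other products show NULL. A condition in the WHERE clause filters the rows after the join, so products without a matching row are removed, whereas a condition in the ON clause only limits which rows of sales.order_items are matched, so every product from the left table is kept.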

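To see the ON vs. WHERE difference on a scale where every row is visible, the sketch below runs both variants against a tiny in-memory SQLite database via Python's `sqlite3` module. The table names, products, and order ids are hypothetical stand-ins for production.products and sales.order_items.

```python
import sqlite3

# Tiny hypothetical stand-ins for production.products and sales.order_items
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE order_items (order_id INTEGER, product_id INTEGER);
    INSERT INTO products VALUES (1, 'Bike A'), (2, 'Bike B'), (3, 'Bike C');
    INSERT INTO order_items VALUES (100, 1), (100, 2), (101, 3);
    """
)

# Condition in WHERE: filters the joined rows, so Bike C (order 101 only) disappears
where_rows = db.execute(
    """
    SELECT p.product_name, o.order_id
    FROM products p
    LEFT JOIN order_items o ON o.product_id = p.product_id
    WHERE o.order_id = 100
    ORDER BY p.product_id
    """
).fetchall()

# Condition in ON: only restricts which order_items rows match; every product stays
on_rows = db.execute(
    """
    SELECT p.product_name, o.order_id
    FROM products p
    LEFT JOIN order_items o ON o.product_id = p.product_id AND o.order_id = 100
    ORDER BY p.product_id
    """
).fetchall()

print(where_rows)
print(on_rows)
```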

### RIGHT JOIN

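The following query uses RIGHT JOIN to return all rows from the right table (production.products) together with the matching order ids from the left table (sales.order_items). Products without any order have NULL in the order_id column:

In [29]: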

# execute a query
cursor.execute('''
SELECT
    product_name,
    order_id
FROM
    sales.order_items o
    RIGHT JOIN production.products p 
        ON o.product_id = p.product_id
ORDER BY
    order_id;


''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(20)

|   | product_name | order_id |
|---|--------------|----------|
| 0 | Electra Savannah 1 (20-inch) - Girl's - 2018 |  |
| 1 | Electra Townie Go! 8i Ladies' - 2018 |  |
| 2 | Trek Checkpoint ALR 5 Women's - 2019 |  |
| 3 | Trek Checkpoint ALR Frameset - 2019 |  |
| 4 | Trek Precaliber 12 Girl's - 2018 |  |
| 5 | Surly Krampus Frameset - 2018 |  |
| 6 | Trek Checkpoint SL 5 Women's - 2019 |  |
| 7 | Trek 820 - 2016 |  |
| 8 | Trek Checkpoint ALR 4 Women's - 2019 |  |
| 9 | Trek Kids' Dual Sport - 2018 |  |


### FULL OUTER JOIN clause

The FULL OUTER JOIN is a clause of the SELECT statement. It returns a result set that includes rows from both the left and right tables.

When no matching rows exist for a row in the left table, the columns of the right table contain NULL. Likewise, when no matching rows exist for a row in the right table, the columns of the left table contain NULL.

In [31]:

# execute a query
cursor.execute('''
SELECT
    *
FROM
    sales.order_items o
    FULL OUTER JOIN production.products p 
        ON o.product_id = p.product_id
ORDER BY
    order_id;


''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(20)

|   | order_id | item_id | product_id | quantity | list_price | discount | product_name | brand_id | category_id | model_year |
|---|----------|---------|------------|----------|------------|----------|--------------|----------|-------------|------------|
| 0 |  |  | 284 |  | 319.99 |  | Electra Savannah 1 (20-inch) - Girl's - 2018 | 1 | 1 | 2018 |
| 1 |  |  | 195 |  | 2599.99 |  | Electra Townie Go! 8i Ladies' - 2018 | 1 | 5 | 2018 |
| 2 |  |  | 318 |  | 1999.99 |  | Trek Checkpoint ALR 5 Women's - 2019 | 9 | 7 | 2019 |
| 3 |  |  | 321 |  | 3199.99 |  | Trek Checkpoint ALR Frameset - 2019 | 9 | 7 | 2019 |
| 4 |  |  | 267 |  | 199.99 |  | Trek Precaliber 12 Girl's - 2018 | 9 | 1 | 2018 |
| 5 |  |  | 121 |  | 2499.99 |  | Surly Krampus Frameset - 2018 | 8 | 6 | 2018 |
| 6 |  |  | 319 |  | 2799.99 |  | Trek Checkpoint SL 5 Women's - 2019 | 9 | 7 | 2019 |
| 7 |  |  | 1 |  | 379.99 |  | Trek 820 - 2016 | 9 | 6 | 2016 |
| 8 |  |  | 316 |  | 1699.99 |  | Trek Checkpoint ALR 4 Women's - 2019 | 9 | 7 | 2019 |
| 9 |  |  | 125 |  | 469.99 |  | Trek Kids' Dual Sport - 2018 | 9 | 6 | 2018 |

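SELECT * in this query returns the product_id and list_price columns twice (once from sales.order_items, once from production.products), yet the output above shows each name only once. Building a dict per row with `dict(zip(columns, row))` keeps only the last value for a repeated key, so the sales.order_items copies are silently dropped. The sketch below reproduces this with a simplified, hypothetical column list; listing the columns explicitly with distinct aliases, or building the DataFrame from the rows and column list directly, avoids the loss.

```python
import pandas as pd

# Simplified, hypothetical column list in SELECT * order:
# first the order_items columns, then the products columns
columns = ["order_id", "product_id", "list_price", "product_id", "product_name", "list_price"]
# One unmatched product: the order_items side is all NULL (None)
row = (None, None, None, 284, "Electra Savannah 1 (20-inch) - Girl's - 2018", 319.99)

# dict(zip(...)) keeps only the last value for each repeated key
as_dict = dict(zip(columns, row))
print(len(columns), "columns ->", len(as_dict), "keys")

# Building the DataFrame from rows + column names keeps all six columns
df_all = pd.DataFrame([row], columns=columns)
print(df_all.shape)
```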

### CROSS JOIN clause

A cross join allows you to combine rows from the first table with every row of the second table. In other words, it returns the Cartesian product of two tables.

Here’s the basic syntax for a cross join:

SELECT
  select_list
FROM
  T1
CROSS JOIN T2;

In this syntax:

- T1 and T2 are the tables that you want to cross join.
- Unlike other join types such as INNER JOIN or LEFT JOIN, the cross join does not require a join condition.

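Logically, a cross join pairs every row of T1 with every row of T2, so a table with n rows crossed with a table with m rows yields n * m rows. A minimal Python sketch with `itertools.product`; the two products and the store ids 1 to 3 are taken from the example output below, and the zero quantity mirrors its `0 AS quantity` column.

```python
from itertools import product

# Two (product_id, product_name) rows and the store ids from the example output
products = [
    (257, "Electra Amsterdam Fashion 3i Ladies' - 2017/2018"),
    (81, "Electra Amsterdam Fashion 7i Ladies' - 2017"),
]
store_ids = [1, 2, 3]

# Every (product, store) pair with a starting quantity of 0,
# like SELECT product_id, product_name, store_id, 0 AS quantity ... CROSS JOIN
stock_sheet = [
    (product_id, product_name, store_id, 0)
    for (product_id, product_name), store_id in product(products, store_ids)
]
for line in stock_sheet:
    print(line)
print(len(stock_sheet))  # 2 products * 3 stores
```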
![CROSS JOIN](SQL-Server-CROSS-JOIN-example.png)

In [32]:

# execute a query
cursor.execute('''
SELECT
    product_id,
    product_name,
    store_id,
    0 AS quantity
FROM
    production.products
CROSS JOIN sales.stores
ORDER BY
    product_name,
    store_id;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(20)

|   | product_id | product_name | store_id | quantity |
|---|------------|--------------|----------|----------|
| 0 | 257 | Electra Amsterdam Fashion 3i Ladies' - 2017/2018 | 1 | 0 |
| 1 | 257 | Electra Amsterdam Fashion 3i Ladies' - 2017/2018 | 2 | 0 |
| 2 | 257 | Electra Amsterdam Fashion 3i Ladies' - 2017/2018 | 3 | 0 |
| 3 | 81 | Electra Amsterdam Fashion 7i Ladies' - 2017 | 1 | 0 |
| 4 | 81 | Electra Amsterdam Fashion 7i Ladies' - 2017 | 2 | 0 |
| 5 | 81 | Electra Amsterdam Fashion 7i Ladies' - 2017 | 3 | 0 |
| 6 | 70 | Electra Amsterdam Original 3i - 2015/2017 | 1 | 0 |
| 7 | 70 | Electra Amsterdam Original 3i - 2015/2017 | 2 | 0 |
| 8 | 70 | Electra Amsterdam Original 3i - 2015/2017 | 3 | 0 |
| 9 | 82 | Electra Amsterdam Original 3i Ladies' - 2017 | 1 | 0 |


The result set can be used for the stocktaking procedure at the month-end or year-end closing.

The following statement finds, for each store, the products that have no sales in that store:

In [33]:

# execute a query
cursor.execute('''
SELECT
    s.store_id,
    p.product_id,
    ISNULL(sales, 0) sales
FROM
    sales.stores s
CROSS JOIN production.products p
LEFT JOIN (
    SELECT
        s.store_id,
        p.product_id,
        SUM (quantity * i.list_price) sales
    FROM
        sales.orders o
    INNER JOIN sales.order_items i ON i.order_id = o.order_id
    INNER JOIN sales.stores s ON s.store_id = o.store_id
    INNER JOIN production.products p ON p.product_id = i.product_id
    GROUP BY
        s.store_id,
        p.product_id
) c ON c.store_id = s.store_id
AND c.product_id = p.product_id
WHERE
    sales IS NULL
ORDER BY
    product_id,
    store_id;
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(20)

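|   | store_id | product_id | sales |
|---|----------|------------|-------|
| 0 | 1 | 1 | 0.0 |
| 1 | 2 | 1 | 0.0 |
| 2 | 3 | 1 | 0.0 |
| 3 | 3 | 34 | 0.0 |
| 4 | 1 | 35 | 0.0 |
| 5 | 3 | 47 | 0.0 |
| 6 | 3 | 54 | 0.0 |
| 7 | 3 | 55 | 0.0 |
| 8 | 3 | 68 | 0.0 |
| 9 | 1 | 72 | 0.0 |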

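The query above builds every (store, product) pair with the cross join and then keeps only the pairs with no matching row in the aggregated sales, which is a left anti-join. In Python terms this is a set difference. A sketch in which the store ids, product ids, and the set of pairs that sold are hypothetical:

```python
from itertools import product

store_ids = [1, 2, 3]
product_ids = [1, 34, 35]
# Hypothetical (store_id, product_id) pairs that have at least one sale
sold = {(1, 34), (2, 34), (2, 35), (3, 35)}

# All combinations (CROSS JOIN) minus those that sold (LEFT JOIN ... WHERE sales IS NULL),
# ordered by product_id, then store_id, like the query's ORDER BY
unsold = sorted(set(product(store_ids, product_ids)) - sold, key=lambda pair: (pair[1], pair[0]))
print(unsold)
```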

### Self Join

A self join allows you to join a table to itself. It helps query hierarchical data or compare rows within the same table.

A self join uses the inner join or left join clause. Because the query that uses the self join references the same table, the table alias is used to assign different names to the same table within the query.

In [34]:


# execute a query
cursor.execute('''
SELECT
    e.first_name + ' ' + e.last_name employee,
    m.first_name + ' ' + m.last_name manager
FROM
    sales.staffs e
INNER JOIN sales.staffs m ON m.staff_id = e.manager_id
ORDER BY
    manager;

''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(20)

|   | employee | manager |
|---|----------|---------|
| 0 | Mireya Copeland | Fabiola Jackson |
| 1 | Jannette David | Fabiola Jackson |
| 2 | Kali Vargas | Fabiola Jackson |
| 3 | Marcelene Boyer | Jannette David |
| 4 | Venita Daniel | Jannette David |
| 5 | Genna Serrano | Mireya Copeland |
| 6 | Virgie Wiggins | Mireya Copeland |
| 7 | Layla Terrell | Venita Daniel |
| 8 | Bernardine Houston | Venita Daniel |

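A self join is an ordinary join between two copies of the same table. Below is a pandas sketch with a hypothetical staff table (ids and names are made up for illustration): merging the table with itself on manager_id = staff_id pairs each employee with their manager. As with the INNER JOIN above, a staff member whose manager_id is NULL does not appear in the employee column; `how="left"` would keep such rows with a missing manager.

```python
import pandas as pd

# Hypothetical staff table; manager_id refers to staff_id in the same table
staffs = pd.DataFrame(
    {
        "staff_id": pd.array([1, 2, 3, 4], dtype="Int64"),
        "name": ["Ada", "Ben", "Cy", "Di"],
        "manager_id": pd.array([None, 1, 1, 2], dtype="Int64"),
    }
)

# Join the table to itself: the left copy plays "employee", the right copy "manager"
pairs = staffs.merge(
    staffs,
    left_on="manager_id",
    right_on="staff_id",
    how="inner",
    suffixes=("_employee", "_manager"),
)
result = pairs[["name_employee", "name_manager"]].rename(
    columns={"name_employee": "employee", "name_manager": "manager"}
)
print(result.sort_values("manager"))
```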