In [15]:
import duckdb
import pandas as pd

In [16]:
pd.set_option("display.max_columns", None)
pd.set_option("display.max_colwidth", None)
pd.set_option("display.max_rows", None)

In [17]:
%load_ext sql


In [18]:
%config SqlMagic.autopandas = True
%config SqlMagic.displaycon = False
%config SqlMagic.feedback = False

In [19]:
%sql duckdb:///:memory:

In [20]:
orders_df = pd.read_excel("../data/Sample - Superstore.xls", "Orders")
returns_df = pd.read_excel("../data/Sample - Superstore.xls", "Returns")
people_df = pd.read_excel("../data/Sample - Superstore.xls", "People")

In [21]:
%%sql
CREATE TABLE IF NOT EXISTS orders AS
SELECT *
FROM orders_df


Unnamed: 0,Count


In [22]:
%%sql
CREATE TABLE IF NOT EXISTS returns AS
SELECT *
FROM returns_df

Unnamed: 0,Count


In [23]:
%%sql
CREATE TABLE IF NOT EXISTS people AS
SELECT *
FROM people_df

Unnamed: 0,Count


# Intermediate Notebook

In this second notebook you will begin to look at more complicated SQL concepts. If you work with SQL professionally, you will more often than not be working with more than one table, and unions and joins are the operators that let you do that.

In SQL, the UNION operator combines the results of two or more SELECT statements into a single result set. It lets you stack rows from different tables or queries that have the same column structure.

A JOIN operation combines rows from two or more tables based on a related column between them, so you can retrieve data that is spread across multiple tables. The types of joins we will be focusing on are as follows:
* INNER JOIN: Returns only the matching rows between the tables based on the specified join condition.

* LEFT JOIN (or LEFT OUTER JOIN): Returns all the rows from the left table and the matching rows from the right table. If there is no match, NULL values are returned for the columns of the right table.

* RIGHT JOIN (or RIGHT OUTER JOIN): Returns all the rows from the right table and the matching rows from the left table. If there is no match, NULL values are returned for the columns of the left table.

* FULL JOIN (or FULL OUTER JOIN): Returns all the rows from both tables and combines them based on the join condition. If there is no match, NULL values are returned for the unmatched columns.

JOINs are typically used with the ON keyword, which specifies the column(s) used to match the rows between tables. By utilizing JOIN operations, you can extract data from multiple tables simultaneously, retrieve related information, and perform complex queries by leveraging the relationships within a database.
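
The difference between the join types is easiest to see on a tiny dataset. The following is a minimal sketch, not part of the Superstore data: the tables toy_orders and toy_returns are hypothetical, and it uses Python's built-in sqlite3 module instead of the DuckDB connection above so that it runs on its own. It compares an INNER JOIN with a LEFT JOIN on the same ON condition.

```python
import sqlite3

# Assumption: toy tables invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE toy_orders (order_id TEXT, region TEXT);
    CREATE TABLE toy_returns (order_id TEXT, returned TEXT);
    INSERT INTO toy_orders VALUES ('A-1', 'West'), ('A-2', 'East'), ('A-3', 'West');
    INSERT INTO toy_returns VALUES ('A-2', 'Yes');
""")

# INNER JOIN: only orders that have a matching row in toy_returns.
inner_rows = conn.execute("""
    SELECT o.order_id, r.returned
    FROM toy_orders AS o
    INNER JOIN toy_returns AS r ON o.order_id = r.order_id
    ORDER BY o.order_id
""").fetchall()

# LEFT JOIN: every order; unmatched orders get NULL (None in Python).
left_rows = conn.execute("""
    SELECT o.order_id, r.returned
    FROM toy_orders AS o
    LEFT JOIN toy_returns AS r ON o.order_id = r.order_id
    ORDER BY o.order_id
""").fetchall()

print(inner_rows)  # [('A-2', 'Yes')]
print(left_rows)   # [('A-1', None), ('A-2', 'Yes'), ('A-3', None)]
```

Only the join keyword changes between the two queries; the ON condition is identical.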






## UNION

UNION is important in SQL because it allows you to combine and merge data from different tables or queries into a single result set. This functionality provides several key benefits:

* Data Integration: In real-world scenarios, data is often stored in multiple tables or databases. UNION enables you to integrate data from various sources, facilitating comprehensive analysis and reporting. It allows you to create a unified view of the data, making it easier to work with and draw insights from diverse datasets.

* Simplified Queries: Instead of executing separate queries and manually merging the results outside of SQL, UNION lets you achieve the same result with a single SQL statement. This streamlines the querying process and reduces the need for complex data manipulation in other programming environments.

* Consistent Data Representation: When working with different databases or tables that have similar column structures, UNION stacks their rows into one result with a single set of columns, matched by position. A plain UNION also removes duplicate rows, so each row in the result set is unique; use UNION ALL if you want to keep the duplicates.


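Note that with a UNION, both SELECT statements must return the same number of columns, and the columns in each position must have compatible data types. The column names can be different; the result takes its column names from the first SELECT. In the query below, the second column holds customer names from the orders data and return flags from the returns data, both under the name customer_name.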

In [34]:
%%sql
SELECT order_id, customer_name
FROM orders_df

UNION

SELECT order_id, returned
FROM returns_df
LIMIT 20

Unnamed: 0,order_id,customer_name
0,CA-2016-152156,Claire Gute
1,CA-2016-138688,Darrin Van Huff
2,US-2015-108966,Sean O'Donnell
3,CA-2014-115812,Brosina Hoffman
4,CA-2017-114412,Andrew Allen
5,CA-2016-161389,Irene Maddox
6,US-2015-118983,Harold Pawlan
7,CA-2014-105893,Pete Kriz
8,CA-2014-167164,Alejandro Grove
9,CA-2014-143336,Zuschuss Donatelli
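

To make the duplicate handling and column naming of UNION concrete, here is a minimal sketch on hypothetical toy tables (toy_orders and toy_returns are not part of the Superstore data). It uses Python's built-in sqlite3 module rather than the DuckDB connection so that it runs standalone; the behaviour shown is standard SQL.

```python
import sqlite3

# Assumption: toy tables invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE toy_orders (order_id TEXT, customer_name TEXT);
    CREATE TABLE toy_returns (order_id TEXT, returned TEXT);
    INSERT INTO toy_orders VALUES ('A-1', 'Ann'), ('A-1', 'Ann'), ('A-2', 'Bob');
    INSERT INTO toy_returns VALUES ('A-2', 'Yes');
""")

# UNION removes duplicate rows from the combined result.
cursor = conn.execute("""
    SELECT order_id, customer_name FROM toy_orders
    UNION
    SELECT order_id, returned FROM toy_returns
    ORDER BY 1, 2
""")
union_columns = [col[0] for col in cursor.description]
union_rows = cursor.fetchall()

# UNION ALL keeps every row, including the repeated ('A-1', 'Ann').
union_all_rows = conn.execute("""
    SELECT order_id, customer_name FROM toy_orders
    UNION ALL
    SELECT order_id, returned FROM toy_returns
    ORDER BY 1, 2
""").fetchall()

print(union_columns)   # ['order_id', 'customer_name']
print(union_rows)      # [('A-1', 'Ann'), ('A-2', 'Bob'), ('A-2', 'Yes')]
print(union_all_rows)  # [('A-1', 'Ann'), ('A-1', 'Ann'), ('A-2', 'Bob'), ('A-2', 'Yes')]
```

The column names come from the first SELECT, which is why the 'Yes' value from the returned column ends up under customer_name.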


## JOIN

Joins are vital and powerful in SQL because they enable you to establish relationships between tables and retrieve data from multiple sources based on these relationships. Here are some key reasons why joins are essential in SQL:

* Relational Database Strength: Joins are at the core of the relational database model, which is widely used in modern data management. Because related tables can be connected on shared columns at query time, each fact can be stored once in its own table and recombined when needed, which helps keep the data consistent.

* Complex Data Analysis: Many real-world data scenarios involve data spread across different tables with complex relationships. Joins empower you to perform intricate data analysis by combining tables with common keys, enabling you to draw insights from interconnected datasets that would be challenging or impossible to achieve with separate queries.

* Single Cohesive Result: Joins merge data from multiple tables into a single cohesive result set. This unified view of data simplifies reporting, visualization, and downstream processing, providing a holistic perspective that aids decision-making processes.

* Multi-Table Reporting: In business intelligence and reporting systems, joins are indispensable for generating comprehensive reports that draw data from various tables. They empower analysts to craft meaningful reports with enriched data, showcasing relevant information in a single output.

As mentioned before, there are several types of joins that you will be using, starting with inner joins.

#### Inner Join

In [25]:
%%sql
SELECT o.customer_id, o.order_id, r.returned
FROM orders_df AS o
INNER JOIN returns_df AS r ON o.order_id = r.order_id
WHERE r.returned = 'Yes'
LIMIT 10


Unnamed: 0,customer_id,order_id,returned
0,ZD-21925,CA-2014-143336,Yes
1,ZD-21925,CA-2014-143336,Yes
2,ZD-21925,CA-2014-143336,Yes
3,TB-21055,CA-2016-111682,Yes
4,TB-21055,CA-2016-111682,Yes
5,TB-21055,CA-2016-111682,Yes
6,TB-21055,CA-2016-111682,Yes
7,TB-21055,CA-2016-111682,Yes
8,TB-21055,CA-2016-111682,Yes
9,TB-21055,CA-2016-111682,Yes


This example joins the orders data with the returns data to match customer IDs with returned orders. I use the AS keyword to give each table a short alias, o for orders and r for returns, and the "o." and "r." prefixes tell SQL which table each selected column comes from. Because the orders data has one row per line item, an order with several items appears several times in the result.
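
The inner join output above repeats the same order several times, once per matching row in the orders data. The following is a minimal sketch of that effect and of one way to collapse it with SELECT DISTINCT. The tables toy_orders and toy_returns are hypothetical, and Python's built-in sqlite3 module stands in for the DuckDB connection so the snippet runs on its own.

```python
import sqlite3

# Assumption: toy tables invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE toy_orders (row_id INTEGER, order_id TEXT, customer_id TEXT);
    CREATE TABLE toy_returns (order_id TEXT, returned TEXT);
    INSERT INTO toy_orders VALUES (1, 'A-1', 'C-10'), (2, 'A-1', 'C-10'), (3, 'A-2', 'C-20');
    INSERT INTO toy_returns VALUES ('A-1', 'Yes');
""")

# One output row per line item of the returned order.
per_item = conn.execute("""
    SELECT o.customer_id, o.order_id, r.returned
    FROM toy_orders AS o
    INNER JOIN toy_returns AS r ON o.order_id = r.order_id
""").fetchall()

# DISTINCT collapses identical rows, leaving one row per returned order.
per_order = conn.execute("""
    SELECT DISTINCT o.customer_id, o.order_id, r.returned
    FROM toy_orders AS o
    INNER JOIN toy_returns AS r ON o.order_id = r.order_id
""").fetchall()

print(per_item)   # [('C-10', 'A-1', 'Yes'), ('C-10', 'A-1', 'Yes')]
print(per_order)  # [('C-10', 'A-1', 'Yes')]
```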

In [37]:
%%sql
SELECT *
FROM people_df

Unnamed: 0,person,region
0,Anna Andreadi,West
1,Chuck Magee,East
2,Kelly Williams,Central
3,Cassandra Brandow,South


In [41]:
%%sql
SELECT *
FROM returns 
LIMIT 10

Unnamed: 0,returned,order_id
0,Yes,CA-2017-153822
1,Yes,CA-2017-129707
2,Yes,CA-2014-152345
3,Yes,CA-2015-156440
4,Yes,US-2017-155999
5,Yes,CA-2014-157924
6,Yes,CA-2017-131807
7,Yes,CA-2016-124527
8,Yes,CA-2017-135692
9,Yes,CA-2014-123225


In [59]:
%%sql
SELECT *
FROM orders_df
LIMIT 6

Unnamed: 0,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,postal_code,region,product_id,category,sub_category,product_name,sales,quantity,discount,profit
0,1,CA-2016-152156,2016-11-08,2016-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,2,0.0,41.9136
1,2,CA-2016-152156,2016-11-08,2016-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back",731.94,3,0.0,219.582
2,3,CA-2016-138688,2016-06-12,2016-06-16,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,California,90036,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters by Universal,14.62,2,0.0,6.8714
3,4,US-2015-108966,2015-10-11,2015-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775,5,0.45,-383.031
4,5,US-2015-108966,2015-10-11,2015-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368,2,0.2,2.5164
5,6,CA-2014-115812,2014-06-09,2014-06-14,Standard Class,BH-11710,Brosina Hoffman,Consumer,United States,Los Angeles,California,90032,West,FUR-FU-10001487,Furniture,Furnishings,"Eldon Expressions Wood and Plastic Desk Accessories, Cherry Wood",48.86,7,0.0,14.1694


#### Left Join

In this example, we are going to be using two LEFT JOINs. The first LEFT JOIN combines the "orders" and "returns" tables based on the "Order_id" column, which retrieves the return status for each order (if available). The second LEFT JOIN combines the result with the "people" table based on the "Region" column, which retrieves the regional manager for each order's region.


In [63]:
%%sql
SELECT o.Order_id, o.Quantity, o.sales, o.Region,
       r.Returned, p.person AS regional_manager
FROM orders o
LEFT JOIN returns r ON o.Order_id = r.Order_id
LEFT JOIN people p ON o.Region = p.Region
LIMIT 10

Unnamed: 0,order_id,quantity,sales,region,returned,regional_manager
0,CA-2014-143336,2,8.56,West,Yes,Anna Andreadi
1,CA-2014-143336,3,213.48,West,Yes,Anna Andreadi
2,CA-2014-143336,4,22.72,West,Yes,Anna Andreadi
3,CA-2016-111682,6,208.56,East,Yes,Chuck Magee
4,CA-2016-111682,5,32.4,East,Yes,Chuck Magee
5,CA-2016-111682,5,319.41,East,Yes,Chuck Magee
6,CA-2016-111682,2,14.56,East,Yes,Chuck Magee
7,CA-2016-111682,2,30.0,East,Yes,Chuck Magee
8,CA-2016-111682,4,48.48,East,Yes,Chuck Magee
9,CA-2016-111682,1,1.68,East,Yes,Chuck Magee


As you can see, the LEFT JOINs combine the "orders" and "returns" tables to show the return status for each order and then combine the result with the "people" table to display the regional manager for each order's region. LEFT JOINs ensure that all orders are included in the result, and additional information from the related tables is brought in where available. This provides a comprehensive view of orders along with return status and regional manager details.
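
Because a LEFT JOIN fills unmatched rows with NULL, two common follow-ups are replacing the NULL with a default value and filtering for the rows that had no match. The following is a minimal sketch of both, assuming hypothetical toy tables (toy_orders, toy_returns) and using Python's built-in sqlite3 module in place of the DuckDB connection so it runs standalone.

```python
import sqlite3

# Assumption: toy tables invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE toy_orders (order_id TEXT, region TEXT);
    CREATE TABLE toy_returns (order_id TEXT, returned TEXT);
    INSERT INTO toy_orders VALUES ('A-1', 'West'), ('A-2', 'East'), ('A-3', 'West');
    INSERT INTO toy_returns VALUES ('A-2', 'Yes');
""")

# COALESCE swaps the NULL from an unmatched row for a default value.
with_default = conn.execute("""
    SELECT o.order_id, COALESCE(r.returned, 'No') AS returned
    FROM toy_orders AS o
    LEFT JOIN toy_returns AS r ON o.order_id = r.order_id
    ORDER BY o.order_id
""").fetchall()

# Keeping only rows where the right side is NULL finds orders with no return.
not_returned = conn.execute("""
    SELECT o.order_id
    FROM toy_orders AS o
    LEFT JOIN toy_returns AS r ON o.order_id = r.order_id
    WHERE r.order_id IS NULL
    ORDER BY o.order_id
""").fetchall()

print(with_default)  # [('A-1', 'No'), ('A-2', 'Yes'), ('A-3', 'No')]
print(not_returned)  # [('A-1',), ('A-3',)]
```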

#### Right Join

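Now let's look at RIGHT JOIN. A RIGHT JOIN is the mirror image of a LEFT JOIN: it keeps all the rows from the right table instead of the left. The query below is written with LEFT JOINs starting from the "people" table, which returns the same rows as putting "people" on the right-hand side of RIGHT JOINs. It retrieves every regional manager along with the orders in their region and each order's return status (if available):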

In [64]:
%%sql
SELECT p.Region, p.person AS regional_manager, o.Order_id, o.Quantity, r.Returned, o.sales
FROM people p
LEFT JOIN orders o ON p.Region = o.Region
LEFT JOIN returns r ON o.Order_id = r.Order_id
LIMIT 10

Unnamed: 0,region,regional_manager,order_id,quantity,returned,sales
0,West,Anna Andreadi,CA-2014-143336,2,Yes,8.56
1,West,Anna Andreadi,CA-2014-143336,3,Yes,213.48
2,West,Anna Andreadi,CA-2014-143336,4,Yes,22.72
3,East,Chuck Magee,CA-2016-111682,6,Yes,208.56
4,East,Chuck Magee,CA-2016-111682,5,Yes,32.4
5,East,Chuck Magee,CA-2016-111682,5,Yes,319.41
6,East,Chuck Magee,CA-2016-111682,2,Yes,14.56
7,East,Chuck Magee,CA-2016-111682,2,Yes,30.0
8,East,Chuck Magee,CA-2016-111682,4,Yes,48.48
9,East,Chuck Magee,CA-2016-111682,1,Yes,1.68


In this example, the "people" table is the left table, so every regional manager is kept. The first LEFT JOIN brings in the orders for each manager's region based on the "Region" column, and the second LEFT JOIN brings in the return status for each order based on the "Order_id" column (if available). The same result could be written with the RIGHT JOIN keyword by reversing the table order, for example orders o RIGHT JOIN people p ON o.Region = p.Region, because a RIGHT JOIN keeps every row of its right table.
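
The relationship between LEFT JOIN and RIGHT JOIN can be checked on a small example. This is a minimal sketch on hypothetical toy tables (toy_people, toy_orders), using Python's built-in sqlite3 module instead of the DuckDB connection. SQLite only supports RIGHT JOIN from version 3.39.0, so the sketch runs the RIGHT JOIN form only when the bundled SQLite is new enough; DuckDB has no such restriction.

```python
import sqlite3

# Assumption: toy tables invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE toy_people (person TEXT, region TEXT);
    CREATE TABLE toy_orders (order_id TEXT, region TEXT);
    INSERT INTO toy_people VALUES ('Anna', 'West'), ('Chuck', 'East'), ('Kelly', 'Central');
    INSERT INTO toy_orders VALUES ('A-1', 'West'), ('A-2', 'East');
""")

# LEFT JOIN form: toy_people on the left, so every manager is kept.
left_form = conn.execute("""
    SELECT p.region, p.person, o.order_id
    FROM toy_people AS p
    LEFT JOIN toy_orders AS o ON p.region = o.region
    ORDER BY p.region
""").fetchall()

# RIGHT JOIN form: same tables in reverse order (needs SQLite >= 3.39.0).
right_form = None
if sqlite3.sqlite_version_info >= (3, 39, 0):
    right_form = conn.execute("""
        SELECT p.region, p.person, o.order_id
        FROM toy_orders AS o
        RIGHT JOIN toy_people AS p ON p.region = o.region
        ORDER BY p.region
    """).fetchall()

print(left_form)  # [('Central', 'Kelly', None), ('East', 'Chuck', 'A-2'), ('West', 'Anna', 'A-1')]
if right_form is not None:
    print(right_form == left_form)  # True
```

Kelly's region has no orders, so her row appears with NULL (None) in the order column in both forms.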



#### Full Join

In [65]:
%%sql
SELECT o.order_id, o.customer_name, o.region AS order_region, p.region AS people_region, p.person AS regional_manager
FROM orders_df AS o
FULL JOIN people_df AS p ON o.region = p.region
LIMIT 15

Unnamed: 0,order_id,customer_name,order_region,people_region,regional_manager
0,CA-2016-152156,Claire Gute,South,South,Cassandra Brandow
1,CA-2016-152156,Claire Gute,South,South,Cassandra Brandow
2,CA-2016-138688,Darrin Van Huff,West,West,Anna Andreadi
3,US-2015-108966,Sean O'Donnell,South,South,Cassandra Brandow
4,US-2015-108966,Sean O'Donnell,South,South,Cassandra Brandow
5,CA-2014-115812,Brosina Hoffman,West,West,Anna Andreadi
6,CA-2014-115812,Brosina Hoffman,West,West,Anna Andreadi
7,CA-2014-115812,Brosina Hoffman,West,West,Anna Andreadi
8,CA-2014-115812,Brosina Hoffman,West,West,Anna Andreadi
9,CA-2014-115812,Brosina Hoffman,West,West,Anna Andreadi


Here the FULL JOIN keeps every row from both the orders data and the people data, matched on the region column. In the rows shown, every order has a matching regional manager, so no NULL values appear. An order in a region with no manager, or a manager whose region has no orders, would still be returned, with NULL in the columns from the other table.
