# Introduction to Joins

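- **Definition**: A join is a table operator, used in the `FROM` clause, that combines rows from two input tables, usually by matching values in related columns (for example, an employee ID that appears in both tables).
- **Purpose**: Joins are used to retrieve data from multiple tables in a single query; a query that needs more than two tables chains several joins.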

```sql
SELECT *
FROM TSQLV6.Sales.Orders;
```

| orderid | custid | empid | orderdate | requireddate | shippeddate | shipperid | freight | shipname | shipaddress | shipcity | shipregion | shippostalcode | shipcountry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10248 | 85 | 5 | 2020-07-04 | 2020-08-01 | 2020-07-16 | 3 | 32.38 | Ship to 85-B | 6789 rue de l'Abbaye | Reims |  | 10345 | France |
| 10249 | 79 | 6 | 2020-07-05 | 2020-08-16 | 2020-07-10 | 1 | 11.61 | Ship to 79-C | Luisenstr. 9012 | Münster |  | 10328 | Germany |
| 10250 | 34 | 4 | 2020-07-08 | 2020-08-05 | 2020-07-12 | 2 | 65.83 | Destination SCQXA | Rua do Paço, 7890 | Rio de Janeiro | RJ | 10195 | Brazil |
| 10251 | 84 | 3 | 2020-07-08 | 2020-08-05 | 2020-07-15 | 1 | 41.34 | Ship to 84-A | 3456, rue du Commerce | Lyon |  | 10342 | France |
| 10252 | 76 | 4 | 2020-07-09 | 2020-08-06 | 2020-07-11 | 2 | 51.3 | Ship to 76-B | Boulevard Tirou, 9012 | Charleroi |  | 10318 | Belgium |
| 10253 | 34 | 3 | 2020-07-10 | 2020-07-24 | 2020-07-16 | 2 | 58.17 | Destination JPAIY | Rua do Paço, 8901 | Rio de Janeiro | RJ | 10196 | Brazil |
| 10254 | 14 | 5 | 2020-07-11 | 2020-08-08 | 2020-07-23 | 2 | 22.98 | Destination YUJRD | Hauptstr. 1234 | Bern |  | 10139 | Switzerland |
| 10255 | 68 | 9 | 2020-07-12 | 2020-08-09 | 2020-07-15 | 3 | 148.33 | Ship to 68-A | Starenweg 6789 | Genève |  | 10294 | Switzerland |
| 10256 | 88 | 3 | 2020-07-15 | 2020-08-12 | 2020-07-17 | 2 | 13.97 | Ship to 88-B | Rua do Mercado, 5678 | Resende | SP | 10354 | Brazil |
| 10257 | 35 | 4 | 2020-07-16 | 2020-08-13 | 2020-07-22 | 3 | 81.91 | Destination JYDLM | Carrera1234 con Ave. Carlos Soublette #8-35 | San Cristóbal | Táchira | 10199 | Venezuela |


## Inner Join

Combines rows from two tables where there is a match in the columns specified in the join condition.

- The most commonly used join.
- Retrieves rows from both tables where the join condition is met (i.e., where there is a match between the tables).
- If the join condition is not TRUE for a pair of rows (it evaluates to FALSE, or to UNKNOWN when a NULL is involved), that pair is excluded from the result set.

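To see which rows an inner join keeps, here is a minimal, self-contained sketch that does not depend on TSQLV6. The two inline tables are built with table value constructors; their names (`C`, `O`) and data are made up for illustration. Customer 3 has no orders and order 104 references a customer that does not exist, so neither shows up in the result.

```sql
-- Hypothetical inline tables; no TSQLV6 objects are used.
SELECT C.custid, C.custname, O.orderid
FROM (VALUES (1, N'Alpha'), (2, N'Bravo'), (3, N'Charlie'))
         AS C(custid, custname)
    INNER JOIN (VALUES (101, 1), (102, 1), (103, 2), (104, 9))
         AS O(orderid, custid)
        ON C.custid = O.custid   -- keep only pairs whose custid values match
ORDER BY O.orderid;
-- Expected result (3 rows):
-- custid  custname  orderid
-- 1       Alpha     101
-- 1       Alpha     102
-- 2       Bravo     103
```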

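```sql
-- SQL-92 Syntax
-- The WHERE clause is an extra filter on top of the join;
-- without it, this query returns the same rows as the SQL-89 version below.

SELECT E.empid, E.firstname, E.lastname, O.orderid
FROM TSQLV6.HR.Employees AS E
    INNER JOIN TSQLV6.Sales.Orders AS O
    ON E.empid = O.empid
WHERE E.empid = 1
  AND YEAR(O.orderdate) = 2020
  AND O.custid = 71;
```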


```sql
-- SQL-89 Syntax

SELECT E.empid, E.firstname, E.lastname, O.orderid
FROM TSQLV6.HR.Employees AS E, TSQLV6.Sales.Orders AS O
WHERE E.empid = O.empid;
```

| empid | firstname | lastname | orderid |
|---|---|---|---|
| 1 | Sara | Davis | 10258 |
| 1 | Sara | Davis | 10270 |
| 1 | Sara | Davis | 10275 |
| 1 | Sara | Davis | 10285 |
| 1 | Sara | Davis | 10292 |
| 1 | Sara | Davis | 10293 |
| 1 | Sara | Davis | 10304 |
| 1 | Sara | Davis | 10306 |
| 1 | Sara | Davis | 10311 |
| 1 | Sara | Davis | 10314 |

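One practical reason to prefer the SQL-92 syntax: if you forget the join condition in the SQL-89 form, the query still runs and silently returns a cross join, whereas an `INNER JOIN` without an `ON` clause fails to parse. The sketch below uses made-up inline tables (`E`, `O`) to show the SQL-89 form with the condition left out.

```sql
-- SQL-89 form with the join condition accidentally omitted:
-- every row of E is paired with every row of O (2 x 3 = 6 rows).
SELECT E.empid, O.orderid
FROM (VALUES (1), (2)) AS E(empid),
     (VALUES (10, 1), (11, 1), (12, 2)) AS O(orderid, empid);

-- The SQL-92 form without ON is rejected with a syntax error,
-- so the mistake is caught immediately (left commented out on purpose):
-- SELECT E.empid, O.orderid
-- FROM (VALUES (1), (2)) AS E(empid)
--     INNER JOIN (VALUES (10, 1), (11, 1), (12, 2)) AS O(orderid, empid);
```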

## Composite Join

A _composite join_ is simply a join for which you need to match multiple attributes from each side.

Syntax:

            SELECT *
            FROM dbo.Table1 AS T1
            INNER JOIN dbo.Table2 AS T2
                ON T1.col1 = T2.col1
                AND T1.col2 = T2.col2

```sql
SELECT E.empid, E.firstname, E.lastname, O.orderid, E.country, O.shipcountry
FROM TSQLV6.HR.Employees AS E
    INNER JOIN TSQLV6.Sales.Orders AS O
    ON E.empid = O.empid
    AND E.country = O.shipcountry;
```

| empid | firstname | lastname | orderid | country | shipcountry |
|---|---|---|---|---|---|
| 8 | Maria | Cameron | 10262 | USA | USA |
| 7 | Russell | King | 10289 | UK | UK |
| 4 | Yael | Peled | 10294 | USA | USA |
| 8 | Maria | Cameron | 10305 | USA | USA |
| 2 | Don | Funk | 10307 | USA | USA |
| 8 | Maria | Cameron | 10310 | USA | USA |
| 1 | Sara | Davis | 10314 | USA | USA |
| 1 | Sara | Davis | 10316 | USA | USA |
| 4 | Yael | Peled | 10329 | USA | USA |
| 4 | Yael | Peled | 10338 | USA | USA |

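A composite join matters when no single column identifies the related row. In this hypothetical sketch, a price list is keyed by both `productid` and `region`, so the join must match both columns; matching on `productid` alone would pair each sale with every region's price for that product. Table names and data are invented for illustration.

```sql
-- Hypothetical inline tables: prices are keyed by (productid, region).
SELECT S.saleid, S.productid, S.region, P.price
FROM (VALUES (1, 10, 'EU'), (2, 10, 'US'), (3, 20, 'EU'))
         AS S(saleid, productid, region)
    INNER JOIN (VALUES (10, 'EU', 5.00), (10, 'US', 6.00), (20, 'EU', 7.50))
         AS P(productid, region, price)
        ON  S.productid = P.productid   -- first attribute of the key
        AND S.region    = P.region      -- second attribute of the key
ORDER BY S.saleid;
-- Expected: sale 1 -> 5.00, sale 2 -> 6.00, sale 3 -> 7.50 (one row per sale)
```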

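## Cross Join and Self Join

- A _cross join_ returns the Cartesian product of its two inputs: every row of the left table is paired with every row of the right table, so inputs of *m* and *n* rows produce *m* × *n* rows. It has no join condition.

- You can also join multiple instances of the same table. This capability is known as a _self join_ and is supported with all fundamental join types (cross joins, inner joins, and outer joins). Each instance needs its own alias (for example, `E1` and `E2`) so that column references are unambiguous.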

```sql
-- SQL-92 Syntax: a cross join takes no ON clause (it has no join condition).
-- Self join: the same table appears twice, distinguished by the aliases E1 and E2.

SELECT
    E1.empid, E1.firstname, E1.lastname,
    E2.empid, E2.firstname, E2.lastname
FROM TSQLV6.HR.Employees AS E1
    CROSS JOIN TSQLV6.HR.Employees AS E2;
```

```sql
-- SQL-92 Syntax: a cross join takes no ON clause (it has no join condition).

SELECT C.custid, E.empid
FROM TSQLV6.Sales.Customers AS C
    CROSS JOIN TSQLV6.HR.Employees AS E;
```

```sql
-- SQL-89 Syntax: tables separated by commas, with no join condition in WHERE.

SELECT C.custid, E.empid
FROM TSQLV6.Sales.Customers AS C, TSQLV6.HR.Employees AS E;
```

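A quick sanity check of cross-join cardinality against TSQLV6: the number of rows returned by the Customers × Employees cross join should equal the product of the two table counts. The column aliases below are illustrative.

```sql
USE TSQLV6;

-- numcrossjoinrows should equal numcustomers * numemployees.
SELECT
    (SELECT COUNT(*) FROM Sales.Customers)            AS numcustomers,
    (SELECT COUNT(*) FROM HR.Employees)               AS numemployees,
    (SELECT COUNT(*)
     FROM Sales.Customers AS C
         CROSS JOIN HR.Employees AS E)               AS numcrossjoinrows;
```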
## How to use the Auxiliary Table  

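- Table: **dbo.Nums**, a helper table in the TSQLV6 database with a single integer column, `n`, that holds a sequence of consecutive integers starting at 1.

- Cross joining a table with the rows of `dbo.Nums` where `n <= k` produces *k* copies of each row of that table.

- Example: generate five copies of each employee row.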

```sql
-- Using the auxiliary table TSQLV6.dbo.Nums

USE TSQLV6;

SELECT E.empid, E.firstname, E.lastname, N.n
FROM TSQLV6.HR.Employees AS E
    CROSS JOIN dbo.Nums AS N
WHERE N.n <= 5
ORDER BY E.empid, N.n;
```

| empid | firstname | lastname | n |
|---|---|---|---|
| 1 | Sara | Davis | 1 |
| 1 | Sara | Davis | 2 |
| 1 | Sara | Davis | 3 |
| 1 | Sara | Davis | 4 |
| 1 | Sara | Davis | 5 |
| 2 | Don | Funk | 1 |
| 2 | Don | Funk | 2 |
| 2 | Don | Funk | 3 |
| 2 | Don | Funk | 4 |
| 2 | Don | Funk | 5 |

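If you work in a database that has no `dbo.Nums`, you can derive a similar sequence on the fly, and doing so is itself a cross-join technique: cross joining a 10-row digits table with itself three times yields 1,000 combinations, which map to the numbers 1 through 1,000. This is a sketch; the CTE names `Digits` and `SeqNums` are illustrative and are not objects in TSQLV6. SQL Server 2022 also added a built-in `GENERATE_SERIES` function for this purpose.

```sql
-- Digits 0-9 as an inline table.
WITH Digits AS
(
    SELECT d FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS D(d)
),
-- Three cross-joined copies give 10 x 10 x 10 = 1,000 rows.
-- Reading them as hundreds, tens, and units yields 0..999; adding 1 gives 1..1000.
SeqNums AS
(
    SELECT H.d * 100 + T.d * 10 + U.d + 1 AS n
    FROM Digits AS H
        CROSS JOIN Digits AS T
        CROSS JOIN Digits AS U
)
-- Same pattern as the dbo.Nums query: five copies of each employee row.
SELECT E.empid, E.firstname, E.lastname, N.n
FROM TSQLV6.HR.Employees AS E
    CROSS JOIN SeqNums AS N
WHERE N.n <= 5
ORDER BY E.empid, N.n;
```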

## Non-equi Join

- When a **Join condition** involves <u>only an equality operator</u>, the Join is said to be an _equi join_. 

- When a **Join condition** involves any <u>operator besides equality</u>, the Join is said to be a _non-equi join_.

```sql
SELECT
    E1.empid, E1.firstname, E1.lastname,
    E2.empid, E2.firstname, E2.lastname
FROM TSQLV6.HR.Employees AS E1
    INNER JOIN TSQLV6.HR.Employees AS E2
    ON E1.empid < E2.empid;
```

| empid | firstname | lastname | empid | firstname | lastname |
|---|---|---|---|---|---|
| 1 | Sara | Davis | 2 | Don | Funk |
| 1 | Sara | Davis | 3 | Judy | Lew |
| 2 | Don | Funk | 3 | Judy | Lew |
| 1 | Sara | Davis | 4 | Yael | Peled |
| 2 | Don | Funk | 4 | Yael | Peled |
| 3 | Judy | Lew | 4 | Yael | Peled |
| 1 | Sara | Davis | 5 | Sven | Mortensen |
| 2 | Don | Funk | 5 | Sven | Mortensen |
| 3 | Judy | Lew | 5 | Sven | Mortensen |
| 4 | Yael | Peled | 5 | Sven | Mortensen |

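The choice of operator in the self join above matters. With `<>`, each pair of different employees appears twice, once in each order; with `<`, each pair appears exactly once, and no employee is paired with itself. The sketch below uses a hypothetical three-row inline table (`T`) so the difference is easy to count.

```sql
-- Hypothetical three-row table: ids 1, 2, 3.
WITH T AS
(
    SELECT id FROM (VALUES (1), (2), (3)) AS V(id)
)
SELECT
    (SELECT COUNT(*) FROM T AS A CROSS JOIN T AS B)                  AS all_pairs,     -- 3 * 3 = 9
    (SELECT COUNT(*) FROM T AS A INNER JOIN T AS B ON A.id <> B.id)  AS ordered_pairs, -- 6: (1,2) and (2,1) both count
    (SELECT COUNT(*) FROM T AS A INNER JOIN T AS B ON A.id < B.id)   AS unique_pairs;  -- 3: (1,2), (1,3), (2,3)
```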

## Multi-join

A join table operator operates only on two tables, but a single query can have multiple joins. 

In general, when more than one table operator appears in the FROM clause, the table operators are logically processed in written order. 

That is, the result table of the first table operator is treated as the left input to the second table operator; the result of the second table operator is treated as the left input to the third table operator; and so on.

```sql
SELECT
    C.custid, C.companyname, O.orderid, OD.productid, OD.qty
FROM TSQLV6.Sales.Customers AS C
    INNER JOIN TSQLV6.Sales.Orders AS O
        ON C.custid = O.custid
    INNER JOIN TSQLV6.Sales.OrderDetails AS OD
        ON O.orderid = OD.orderid;
```

| custid | companyname | orderid | productid | qty |
|---|---|---|---|---|
| 85 | Customer ENQZT | 10248 | 11 | 12 |
| 85 | Customer ENQZT | 10248 | 42 | 10 |
| 85 | Customer ENQZT | 10248 | 72 | 5 |
| 79 | Customer FAPSM | 10249 | 14 | 9 |
| 79 | Customer FAPSM | 10249 | 51 | 40 |
| 34 | Customer IBVRG | 10250 | 41 | 10 |
| 34 | Customer IBVRG | 10250 | 51 | 35 |
| 34 | Customer IBVRG | 10250 | 65 | 15 |
| 84 | Customer NRCSK | 10251 | 22 | 6 |
| 84 | Customer NRCSK | 10251 | 57 | 15 |

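To make the left-to-right processing concrete, the same three-table query can be written with the first join's result wrapped in a derived table, which then becomes the left input of the second join. This rewrite only illustrates the logical order (the derived-table name `CO` is made up); for inner joins, SQL Server's optimizer is free to choose a different physical join order, and both forms return the same rows.

```sql
USE TSQLV6;

-- Step 1 (derived table CO): Customers joined to Orders.
-- Step 2: CO is the left input to the join with OrderDetails.
SELECT CO.custid, CO.companyname, CO.orderid, OD.productid, OD.qty
FROM (SELECT C.custid, C.companyname, O.orderid
      FROM Sales.Customers AS C
          INNER JOIN Sales.Orders AS O
              ON C.custid = O.custid) AS CO
    INNER JOIN Sales.OrderDetails AS OD
        ON CO.orderid = OD.orderid;
```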

## Left (Outer) Join

- Retrieves **all rows from the left table** (the _preserved_ side) and the matching rows from the right table (the _nonpreserved_ side).

- If no match is found, **NULL** values are returned for columns <u>from the right table</u>.

```sql
-- Not every customer has placed an order;
-- customers without orders appear once, with a NULL orderid.

SELECT C.custid, C.companyname, O.orderid
FROM TSQLV6.Sales.Customers AS C
    LEFT OUTER JOIN TSQLV6.Sales.Orders AS O
    ON C.custid = O.custid;
```

| custid | companyname | orderid |
|---|---|---|
| 1 | Customer NRZBB | 10643 |
| 1 | Customer NRZBB | 10692 |
| 1 | Customer NRZBB | 10702 |
| 1 | Customer NRZBB | 10835 |
| 1 | Customer NRZBB | 10952 |
| 1 | Customer NRZBB | 11011 |
| 2 | Customer MLTDN | 10308 |
| 2 | Customer MLTDN | 10625 |
| 2 | Customer MLTDN | 10759 |
| 2 | Customer MLTDN | 10926 |


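```sql
-- Return only the outer rows: keep the rows in which an attribute
-- from the nonpreserved side of the join is NULL.
-- Test a column that cannot be NULL in the base table (here the primary key,
-- orderid), so a NULL can only mean "no matching order".

SELECT C.custid, C.companyname
FROM TSQLV6.Sales.Customers AS C
    LEFT OUTER JOIN TSQLV6.Sales.Orders AS O
    ON C.custid = O.custid
WHERE O.orderid IS NULL;
```

| custid | companyname |
|---|---|
| 22 | Customer DTDMN |
| 57 | Customer WVAXS |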

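The outer join plus `IS NULL` filter above is one way to find rows with no match (an anti-join). A common alternative, shown here for comparison rather than as a replacement, expresses the same question with `NOT EXISTS`; it should return the same two customers.

```sql
-- Customers for whom no order exists
-- (same result as the LEFT OUTER JOIN ... WHERE O.orderid IS NULL query).
SELECT C.custid, C.companyname
FROM TSQLV6.Sales.Customers AS C
WHERE NOT EXISTS
    (SELECT *
     FROM TSQLV6.Sales.Orders AS O
     WHERE O.custid = C.custid);
```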

## Right (Outer) Join

- Retrieves **all rows from the right table** (the preserved side) and the matching rows from the left table.

- If no match is found, **NULL** values are returned for columns <u>from the left table</u>.

```sql
-- Every order has a customer, so here the result matches an inner join.

SELECT C.custid, C.companyname, O.orderid
FROM TSQLV6.Sales.Customers AS C
    RIGHT OUTER JOIN TSQLV6.Sales.Orders AS O
    ON C.custid = O.custid;
```

| custid | companyname | orderid |
|---|---|---|
| 1 | Customer NRZBB | 10643 |
| 1 | Customer NRZBB | 10692 |
| 1 | Customer NRZBB | 10702 |
| 1 | Customer NRZBB | 10835 |
| 1 | Customer NRZBB | 10952 |
| 1 | Customer NRZBB | 11011 |
| 2 | Customer MLTDN | 10308 |
| 2 | Customer MLTDN | 10625 |
| 2 | Customer MLTDN | 10759 |
| 2 | Customer MLTDN | 10926 |

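A right outer join is a left outer join with the inputs swapped, so the query above can also be written with Orders on the left. Many developers standardize on `LEFT OUTER JOIN` for readability; this form returns the same rows as the `RIGHT OUTER JOIN` version (possibly in a different order, since neither query has an `ORDER BY`).

```sql
-- Equivalent to the RIGHT OUTER JOIN above: Orders is now the left (preserved) input.
SELECT C.custid, C.companyname, O.orderid
FROM TSQLV6.Sales.Orders AS O
    LEFT OUTER JOIN TSQLV6.Sales.Customers AS C
    ON C.custid = O.custid;
```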

## Full Outer Join

- Retrieves **all rows from both tables**: matching rows are combined, and rows without a match on either side are preserved.

- If a row from one table has no match in the other, the result has **NULL** values in the columns <u>from the other table</u>.

```sql
SELECT C.custid, C.companyname, O.orderid
FROM TSQLV6.Sales.Customers AS C
    FULL OUTER JOIN TSQLV6.Sales.Orders AS O
    ON C.custid = O.custid;
```

| custid | companyname | orderid |
|---|---|---|
| 1 | Customer NRZBB | 10643 |
| 1 | Customer NRZBB | 10692 |
| 1 | Customer NRZBB | 10702 |
| 1 | Customer NRZBB | 10835 |
| 1 | Customer NRZBB | 10952 |
| 1 | Customer NRZBB | 11011 |
| 2 | Customer MLTDN | 10308 |
| 2 | Customer MLTDN | 10625 |
| 2 | Customer MLTDN | 10759 |
| 2 | Customer MLTDN | 10926 |

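In the TSQLV6 query above, unmatched rows can only come from the Customers side (every order has a customer), so only one kind of NULL-extended row appears. The hypothetical inline tables below have unmatched rows on both sides, so both kinds show up.

```sql
-- Hypothetical data: customer 3 has no orders; order 104 points to a missing customer 9.
SELECT C.custid, C.custname, O.orderid, O.custid AS order_custid
FROM (VALUES (1, N'Alpha'), (2, N'Bravo'), (3, N'Charlie'))
         AS C(custid, custname)
    FULL OUTER JOIN (VALUES (101, 1), (102, 2), (104, 9))
         AS O(orderid, custid)
        ON C.custid = O.custid;
-- Expected rows (order not guaranteed without ORDER BY):
-- 1,    Alpha,   101,  1      matched
-- 2,    Bravo,   102,  2      matched
-- 3,    Charlie, NULL, NULL   left row with no match
-- NULL, NULL,    104,  9      right row with no match
```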

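## Outer Joins in Multi-Join Query

When an outer join is followed by other joins, the outer rows it produces can be discarded later. In the first query below, customers with no orders come out of the `LEFT OUTER JOIN` with NULLs in the Orders columns; the following `INNER JOIN` compares `O.orderid = OD.orderid`, which evaluates to UNKNOWN for those rows, so they are filtered out and the outer join is effectively nullified. The other three queries keep those customers: by making the second join a `LEFT OUTER JOIN` as well, by joining Orders to OrderDetails first and then using a `RIGHT OUTER JOIN` to Customers, or by putting the inner join in parentheses so it is processed as a unit before the outer join.
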
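```sql
-- Pitfall: the INNER JOIN discards the outer rows produced by the LEFT OUTER JOIN.
SELECT C.custid, O.orderid, OD.productid, OD.qty
FROM Sales.Customers AS C
    LEFT OUTER JOIN Sales.Orders AS O
        ON C.custid = O.custid
    INNER JOIN Sales.OrderDetails AS OD
        ON O.orderid = OD.orderid;
```

| custid | orderid | productid | qty |
|---|---|---|---|
| 85 | 10248 | 11 | 12 |
| 85 | 10248 | 42 | 10 |
| 85 | 10248 | 72 | 5 |
| 79 | 10249 | 14 | 9 |
| 79 | 10249 | 51 | 40 |
| 34 | 10250 | 41 | 10 |
| 34 | 10250 | 51 | 35 |
| 34 | 10250 | 65 | 15 |
| 84 | 10251 | 22 | 6 |
| 84 | 10251 | 57 | 15 |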


```sql
-- Fix 1: make the second join a LEFT OUTER JOIN too.
SELECT C.custid, O.orderid, OD.productid, OD.qty
FROM Sales.Customers AS C
    LEFT OUTER JOIN Sales.Orders AS O
        ON C.custid = O.custid
    LEFT OUTER JOIN Sales.OrderDetails AS OD
        ON O.orderid = OD.orderid;
```

| custid | orderid | productid | qty |
|---|---|---|---|
| 85 | 10248 | 11 | 12 |
| 85 | 10248 | 42 | 10 |
| 85 | 10248 | 72 | 5 |
| 79 | 10249 | 14 | 9 |
| 79 | 10249 | 51 | 40 |
| 34 | 10250 | 41 | 10 |
| 34 | 10250 | 51 | 35 |
| 34 | 10250 | 65 | 15 |
| 84 | 10251 | 22 | 6 |
| 84 | 10251 | 57 | 15 |


```sql
-- Fix 2: join Orders to OrderDetails first, then preserve Customers with a RIGHT OUTER JOIN.
SELECT C.custid, O.orderid, OD.productid, OD.qty
FROM Sales.Orders AS O
    INNER JOIN Sales.OrderDetails AS OD
        ON O.orderid = OD.orderid
    RIGHT OUTER JOIN Sales.Customers AS C
        ON O.custid = C.custid;
```

| custid | orderid | productid | qty |
|---|---|---|---|
| 85 | 10248 | 11 | 12 |
| 85 | 10248 | 42 | 10 |
| 85 | 10248 | 72 | 5 |
| 79 | 10249 | 14 | 9 |
| 79 | 10249 | 51 | 40 |
| 34 | 10250 | 41 | 10 |
| 34 | 10250 | 51 | 35 |
| 34 | 10250 | 65 | 15 |
| 84 | 10251 | 22 | 6 |
| 84 | 10251 | 57 | 15 |


```sql
-- Fix 3: parentheses make the inner join a single unit,
-- which is then left-joined to Customers.
SELECT C.custid, O.orderid, OD.productid, OD.qty
FROM Sales.Customers AS C
    LEFT OUTER JOIN
        (Sales.Orders AS O
            INNER JOIN Sales.OrderDetails AS OD
                ON O.orderid = OD.orderid)
        ON C.custid = O.custid;
```

| custid | orderid | productid | qty |
|---|---|---|---|
| 85 | 10248 | 11 | 12 |
| 85 | 10248 | 42 | 10 |
| 85 | 10248 | 72 | 5 |
| 79 | 10249 | 14 | 9 |
| 79 | 10249 | 51 | 40 |
| 34 | 10250 | 41 | 10 |
| 34 | 10250 | 51 | 35 |
| 34 | 10250 | 65 | 15 |
| 84 | 10251 | 22 | 6 |
| 84 | 10251 | 57 | 15 |

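One way to check whether a multi-join query still preserves the outer rows is to count the distinct customers it returns. This sketch compares the pitfall form with the all-LEFT form; since customers 22 and 57 have no orders (per the anti-join query earlier), the second count should be at least two higher. A subtle difference among the fixes: the all-LEFT version would also keep an order that has no order lines, while the other two would drop such an order (every customer would still appear at least once).

```sql
USE TSQLV6;

-- Distinct customers returned by the pitfall form (outer rows are discarded).
SELECT COUNT(DISTINCT C.custid) AS customers_left_then_inner
FROM Sales.Customers AS C
    LEFT OUTER JOIN Sales.Orders AS O
        ON C.custid = O.custid
    INNER JOIN Sales.OrderDetails AS OD
        ON O.orderid = OD.orderid;

-- Distinct customers returned when both joins are outer joins (all customers kept).
SELECT COUNT(DISTINCT C.custid) AS customers_left_then_left
FROM Sales.Customers AS C
    LEFT OUTER JOIN Sales.Orders AS O
        ON C.custid = O.custid
    LEFT OUTER JOIN Sales.OrderDetails AS OD
        ON O.orderid = OD.orderid;
```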

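## Using the COUNT aggregate with outer joins

In an outer join, a customer with no orders still produces one row, with NULL in every Orders column. `COUNT(*)` counts rows, so it reports one order for such a customer; `COUNT(O.orderid)` counts only non-NULL values, so it correctly reports zero. In the first rows displayed below, both queries agree because those customers all have orders.
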
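```sql
-- COUNT(*): counts rows, including the outer row of a customer with no orders

SELECT C.custid, COUNT(*) AS numorders
FROM Sales.Customers AS C
    LEFT OUTER JOIN Sales.Orders AS O
        ON C.custid = O.custid
GROUP BY C.custid;
```

| custid | numorders |
|---|---|
| 1 | 6 |
| 2 | 4 |
| 3 | 7 |
| 4 | 13 |
| 5 | 18 |
| 6 | 7 |
| 7 | 11 |
| 8 | 3 |
| 9 | 17 |
| 10 | 14 |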


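```sql
-- COUNT(O.orderid): ignores NULLs, so a customer with no orders gets 0

SELECT C.custid, COUNT(O.orderid) AS numorders
FROM Sales.Customers AS C
    LEFT OUTER JOIN Sales.Orders AS O
        ON C.custid = O.custid
GROUP BY C.custid;
```

| custid | numorders |
|---|---|
| 1 | 6 |
| 2 | 4 |
| 3 | 7 |
| 4 | 13 |
| 5 | 18 |
| 6 | 7 |
| 7 | 11 |
| 8 | 3 |
| 9 | 17 |
| 10 | 14 |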

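The difference between the two counts only shows up for customers with no orders, which are not among the rows displayed above. Filtering to the two customers returned by the earlier anti-join (22 and 57) puts both counts side by side; the column aliases are illustrative.

```sql
USE TSQLV6;

SELECT
    C.custid,
    COUNT(*)         AS count_star,     -- counts the single outer row: 1
    COUNT(O.orderid) AS count_orderid   -- ignores the NULL orderid: 0
FROM Sales.Customers AS C
    LEFT OUTER JOIN Sales.Orders AS O
        ON C.custid = O.custid
WHERE C.custid IN (22, 57)
GROUP BY C.custid;
-- Expected: both customers show count_star = 1 and count_orderid = 0.
```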

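## Filtering attributes from the nonpreserved side of an outer join

A `WHERE` predicate on a column from the nonpreserved side, such as `O.orderdate >= '20220101'`, evaluates to UNKNOWN for outer rows because their `O.orderdate` is NULL. `WHERE` keeps only rows for which the predicate is TRUE, so the outer rows are removed and the `LEFT OUTER JOIN` behaves like an inner join: every customer without a qualifying order, including customers 22 and 57 that have no orders at all, is missing from the result below.
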
```sql
-- The WHERE filter discards the outer rows, nullifying the outer join.
SELECT C.custid, C.companyname, O.orderid, O.orderdate
FROM Sales.Customers AS C
    LEFT OUTER JOIN Sales.Orders AS O
        ON C.custid = O.custid
WHERE O.orderdate >= '20220101';
```

| custid | companyname | orderid | orderdate |
|---|---|---|---|
| 1 | Customer NRZBB | 10835 | 2022-01-15 |
| 1 | Customer NRZBB | 10952 | 2022-03-16 |
| 1 | Customer NRZBB | 11011 | 2022-04-09 |
| 2 | Customer MLTDN | 10926 | 2022-03-04 |
| 3 | Customer KBUDE | 10856 | 2022-01-28 |
| 4 | Customer HFBZG | 10864 | 2022-02-02 |
| 4 | Customer HFBZG | 10953 | 2022-03-16 |
| 4 | Customer HFBZG | 10920 | 2022-03-03 |
| 4 | Customer HFBZG | 11016 | 2022-04-10 |
| 5 | Customer HGVLZ | 10924 | 2022-03-04 |

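If the intent is to keep every customer and show only their orders from 2022 onward, one option is to move the date predicate into the `ON` clause. In an outer join, `ON` only decides which rows match; it does not remove preserved rows, so customers with no qualifying orders come back with NULLs. This sketch answers a slightly different question than the `WHERE` version, so choose based on the result you actually want.

```sql
USE TSQLV6;

-- The date predicate is part of the matching condition, not a filter on the result:
-- every customer is returned; customers with no order from 2022 onward
-- get NULL orderid and orderdate.
SELECT C.custid, C.companyname, O.orderid, O.orderdate
FROM Sales.Customers AS C
    LEFT OUTER JOIN Sales.Orders AS O
        ON  C.custid = O.custid
        AND O.orderdate >= '20220101';
```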

## Including missing values

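You can use outer joins to identify and include missing values when querying data.

- For example, suppose you need to query all orders from the Orders table in the TSQLV6 database and ensure that the output has at least one row for each date from January 1, 2020 through December 31, 2022, including dates with no orders.
- To do this, generate the full sequence of dates from `dbo.Nums` (day *n* is `DATEADD(day, n - 1, '20200101')`, for *n* up to the number of days in the range) and make it the preserved side of a `LEFT OUTER JOIN` to Orders. Dates with no orders come back once, with NULLs in the Orders columns.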

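```sql
SELECT
    DATEADD(day, Nums.n - 1, CAST('20200101' AS DATE)) AS orderdate,
    O.orderid,
    O.custid,
    O.empid
FROM dbo.Nums
    LEFT OUTER JOIN Sales.Orders AS O
        ON DATEADD(day, Nums.n - 1, CAST('20200101' AS DATE)) = O.orderdate
WHERE Nums.n <= DATEDIFF(day, '20200101', '20221231') + 1
ORDER BY orderdate;
```

| orderdate | orderid | custid | empid |
|---|---|---|---|
| 2020-01-01 | NULL | NULL | NULL |
| 2020-01-02 | NULL | NULL | NULL |
| 2020-01-03 | NULL | NULL | NULL |
| 2020-01-04 | NULL | NULL | NULL |
| 2020-01-05 | NULL | NULL | NULL |
| 2020-01-06 | NULL | NULL | NULL |
| 2020-01-07 | NULL | NULL | NULL |
| 2020-01-08 | NULL | NULL | NULL |
| 2020-01-09 | NULL | NULL | NULL |
| 2020-01-10 | NULL | NULL | NULL |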

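A natural follow-up is to count orders per day over the same range. Grouping the outer join by the generated date and counting `O.orderid` (not `*`) yields 0 for days with no orders, for the same reason discussed in the COUNT section. This sketch reuses `dbo.Nums` the same way as the query above; the derived-table name `D` is illustrative.

```sql
USE TSQLV6;

-- D: one row per calendar day from 2020-01-01 through 2022-12-31.
SELECT D.orderdate, COUNT(O.orderid) AS numorders   -- 0 on days without orders
FROM (SELECT DATEADD(day, n - 1, CAST('20200101' AS DATE)) AS orderdate
      FROM dbo.Nums
      WHERE n <= DATEDIFF(day, '20200101', '20221231') + 1) AS D
    LEFT OUTER JOIN Sales.Orders AS O
        ON O.orderdate = D.orderdate
GROUP BY D.orderdate
ORDER BY D.orderdate;
```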