# Q1 

## Explain the difference between the `UNION ALL` and `UNION` operators. In what cases are they equivalent? When they are equivalent, which one should you use?

# A1

## Differences
    
### 1. Duplicate Handling

- `UNION`: <u>Removes duplicate rows</u> from the final result set.<br><u>Identifying duplicates requires an extra step, typically a sort or a hash-based comparison</u>, which adds computational overhead.
- `UNION ALL`: Includes <u>all rows from the combined `SELECT` statements, **including duplicates**</u>.<br>It <u>does not perform duplicate removal</u>, which makes it generally <u>faster</u>.
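
The difference in duplicate handling can be seen with a minimal sketch that uses two inline row sets. The values and the aliases `a`, `b`, and `n` are made up for illustration and are not part of the exercise database.

```sql
-- Two small inline row sets that overlap on the value 2 (hypothetical data).
SELECT n FROM (VALUES (1), (2)) AS a(n)
UNION ALL
SELECT n FROM (VALUES (2), (3)) AS b(n);
-- UNION ALL keeps both copies of 2: four rows (1, 2, 2, 3), in no guaranteed order.

SELECT n FROM (VALUES (1), (2)) AS a(n)
UNION
SELECT n FROM (VALUES (2), (3)) AS b(n);
-- UNION removes the duplicate: three rows (1, 2, 3), in no guaranteed order.
```

Neither operator promises an output order; add an `ORDER BY` at the end of the whole statement if the order matters.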

### 2. Performance

- Because `UNION` must <u>detect and remove duplicates</u> (typically by sorting or hashing the combined rows), it tends to be <u>slower</u> than `UNION ALL`, especially for large result sets.
- `UNION ALL` has <u>better performance</u> because it <u>skips the step of duplicate removal</u>, and it is more efficient for large data sets.

## Cases Where They Are Equivalent

`UNION` and `UNION ALL` are equivalent when <u>the combined result of the `SELECT` statements contains no duplicate rows</u>, neither within a single query's result nor across queries.<br>In this case, both operators produce <u>identical output</u>.
- For example, if <u>the rows are **guaranteed to be unique across the entire result set**</u> (say, each query returns a unique key and the queries filter on non-overlapping ranges), then there are <u>no duplicates to remove</u>. A `DISTINCT` clause in each query is not enough on its own, because the same row can still be returned by more than one query.
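
As a rough sketch of such a case, the queries below assume the `Sales.Orders` table of the TSQLV4 sample database, with `orderid` as its primary key (an assumption about the schema, not something stated in the question). Each order falls into exactly one date range and `orderid` is unique, so no row can appear twice.

```sql
-- Assumes TSQLV4's Sales.Orders with orderid as the primary key.
USE TSQLV4;

-- The two date ranges do not overlap, and orderid is unique,
-- so the combined result cannot contain duplicate rows.
SELECT orderid, orderdate
FROM Sales.Orders
WHERE orderdate < '20160101'
UNION ALL
SELECT orderid, orderdate
FROM Sales.Orders
WHERE orderdate >= '20160101';
-- Replacing UNION ALL with UNION here would return the same rows,
-- but would ask the engine to check for duplicates that cannot exist.
```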

## Which One to Use?

- If you know that there will be <u>no duplicate rows</u> (either because of constraints, or just due to the nature of the data), then <u>`UNION ALL`</u> is the better choice due to its performance advantage.
- If you need to <u>remove duplicates</u> from the <u>combined result</u>, use `UNION`.

In summary, use `UNION ALL` wherever possible for better performance, unless duplicates must be removed.

# Q2
Write a query that returns customer and employee pairs that had order activity in January 2016 but not in February 2016.

 Tables Involved: Orders table

 Desired Output:

 ```
custid      empid
----------- -----------
1           1
3           3
5           8
5           9
6           9
7           6
9           1
12          2
16          7
17          1
20          7
24          8
25          1
26          3
32          4
38          9
39          3
40          2
41          2
42          2
44          8
47          3
47          4
47          8
49          7
55          2
55          3
56          6
59          8
63          8
64          9
65          3
65          8
66          5
67          5
70          3
71          2
75          1
76          2
76          5
80          1
81          1
81          3
81          4
82          6
84          1
84          3
84          4
88          7
89          4

(50 row(s) affected)

 ```


# A2

```sql
-- Solution against the TSQLV4 sample database.
USE TSQLV4;

-- Pairs with orders in January 2016, minus pairs with orders in February 2016.
SELECT custid, empid
FROM Sales.Orders
WHERE orderdate >= '20160101' AND orderdate < '20160201'
EXCEPT
SELECT custid, empid
FROM Sales.Orders
WHERE orderdate >= '20160201' AND orderdate < '20160301';

-- Same query against Northwinds2022TSQLV7, which uses different table and column names.
USE Northwinds2022TSQLV7;

SELECT CustomerId, EmployeeId
FROM Sales.[Order]
WHERE OrderDate >= '20160101' AND OrderDate < '20160201'
EXCEPT
SELECT CustomerId, EmployeeId
FROM Sales.[Order]
WHERE OrderDate >= '20160201' AND OrderDate < '20160301';
```
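
To see why `EXCEPT` fits this question, here is a small self-contained sketch. The inline `(custid, empid)` pairs and the aliases `jan` and `feb` are hypothetical stand-ins for the two months of activity, not rows from the real table.

```sql
-- Hypothetical (custid, empid) pairs standing in for January and February activity.
SELECT custid, empid
FROM (VALUES (1, 1), (1, 1), (2, 5), (3, 3)) AS jan(custid, empid)
EXCEPT
SELECT custid, empid
FROM (VALUES (2, 5), (4, 7)) AS feb(custid, empid);
-- Returns (1, 1) and (3, 3): the pair (2, 5) also appears in February and is dropped,
-- and the repeated (1, 1) is returned once because EXCEPT yields distinct rows.
```

Because `EXCEPT` returns distinct rows, a pair with several January orders still appears only once in the result, which matches the desired output.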