# SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY [LIMIT]

The following

```sql
SELECT
FROM 
WHERE
GROUP BY 
HAVING 
ORDER BY
```

is logically processed in the following order

```sql
FROM
WHERE
GROUP BY
HAVING 
SELECT
ORDER BY
```

In SQL Server, aliases created in SELECT can be used only in ORDER BY, because ORDER BY is the only clause processed after SELECT.

If GROUP BY is used, then all phases after GROUP BY - HAVING, SELECT, and ORDER BY - must work on groups, not individual rows.

Elements that do not participate in GROUP BY should be used with aggregate functions. All aggregate functions except for `COUNT(*)` ignore NULL values.

```sql
# Assume col contains 10,20,NULL,10,20.
COUNT(*)              # 5
COUNT(col)            # 4
COUNT(DISTINCT col)   # 2
```
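The counts above can be reproduced with a short script. This is only a sketch: it uses Python's built-in `sqlite3` module instead of MySQL or SQL Server, and the table name `t` is a placeholder.

```python
import sqlite3

# In-memory database with the column values 10, 20, NULL, 10, 20.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col INTEGER)")
conn.executemany(
    "INSERT INTO t (col) VALUES (?)",
    [(10,), (20,), (None,), (10,), (20,)],
)

# COUNT(*) counts rows; COUNT(col) and COUNT(DISTINCT col) skip NULL.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(col), COUNT(DISTINCT col) FROM t"
).fetchone()

# AVG also ignores NULL: (10 + 20 + 10 + 20) / 4, not / 5.
avg = conn.execute("SELECT AVG(col) FROM t").fetchone()[0]

print(counts, avg)
conn.close()
```

The NULL row still counts toward `COUNT(*)`, but it is left out of the denominator of `AVG(col)`.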




## SELECT [DISTINCT]

```sql
SELECT DISTINCT col FROM ...
```

In the example above, if col has NULL values, MySQL keeps only one NULL value because DISTINCT treats all NULL values as the same value.


### SELECT ... INTO

```sql
SELECT expr INTO a_variable FROM tbl;
```

In the following (SQL Server), new_tbl is created with a structure based on the selected columns of tbl:

```sql
DROP TABLE IF EXISTS new_tbl;
SELECT a, b, c INTO new_tbl FROM tbl;
```

In the following, new_tbl is created by the result of an EXCEPT operation:

```sql
SELECT a,b,c
INTO new_tbl
FROM tbl1
EXCEPT
SELECT a,b,c
FROM tbl2
```


### SELECT TOP (SQL Server)

```sql
SELECT TOP n [PERCENT] [WITH TIES] column(s)
FROM ... ORDER BY ...
```

`WITH TIES` also returns any additional rows whose ORDER BY values tie with the last row of the limited result.


### OFFSET .. FETCH (SQL Server)

A query that uses OFFSET-FETCH must have an ORDER BY clause.

```sql
SELECT ... FROM ... ORDER BY ...

OFFSET 10 ROWS;                             # skip the first 10 

OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY;     # return rows 11 to 20

OFFSET 0 ROWS FETCH FIRST 10 ROWS ONLY;     # select the top 10
```

FIRST and NEXT are interchangeable.


## WHERE

```sql
WHERE x1='a' AND x2 < '2016-05-28'
WHERE (x1='a' AND x2 < '2016-05-28') OR (x1='b' AND x3 IS NULL)
WHERE x IS NULL AND NOT (... OR ...)
WHERE x <> 'a'
WHERE x BETWEEN '2020-01-20' AND '2020-12-03'
WHERE x = (SELECT ...)

WHERE x <> (SELECT ...)     # raises an error if the subquery returns more than one row.

WHERE x IN (SELECT ...)
WHERE x IN ('apple','orange')
WHERE x NOT IN (...)
WHERE x <> ALL (...)

WHERE name COLLATE Latin1_General_CS_AS = N'John'

WHERE x LIKE N'A%';
# N stands for National, used for a unicode data type.
# Here x is NCHAR or NVARCHAR type.

WHERE x LIKE '_a%' OR x LIKE '%a'      # % matches any number (>= 0) of characters and _ matches exactly one character.
WHERE x LIKE '(___)___-____'
WHERE x LIKE '[AB]%'                   # the first character is A or B
WHERE x LIKE '[A-C]%'                  # the first character is A, B, or C.
WHERE x LIKE '[^A-C]%'                 # the first character is not A, B, or C.
WHERE x LIKE '%20\%%' ESCAPE '\';      # find any row containing 20%

WHERE EXISTS (SELECT 1 ...)
WHERE NOT EXISTS (SELECT 1 ...)

WHERE x REGEXP '\\(.{3}\\).{3}-.{4}'   # MySQL; the parentheses are escaped so they match literally
```


* Find a row corresponding to the maximum of a column:

```sql
SELECT ... FROM tbl
WHERE col = (SELECT MAX(col) FROM tbl)
```

* Find rows whose values of a column are greater than the average value of the column:

```sql
SELECT ... FROM tbl
WHERE col > (SELECT AVG(col) FROM tbl)
```



## GROUP BY


A GROUP BY clause without using an aggregate function is like the DISTINCT clause.

The following are equivalent:

```sql
SELECT col FROM tbl GROUP BY col;
SELECT DISTINCT col FROM tbl;
```

We can use an alias in the GROUP BY clause in MySQL, but not in SQL Server.

```sql
SELECT expr AS e
FROM tbl
GROUP BY e
```


### GROUP BY ... HAVING

HAVING is similar to WHERE, but it applies to groups rather than to single rows.

HAVING can refer to aliases, but WHERE cannot do so (MySQL).



```sql
SELECT x FROM tbl 
GROUP BY x 
HAVING COUNT(y) > 1;

SELECT expr1 AS e1, expr2 AS e2 FROM tbl 
GROUP BY e1
HAVING e2 > 1;
```

In SQL Server, using aliases in HAVING is not allowed.

```sql
SELECT x, expr AS y
FROM tbl
GROUP BY x

HAVING y > 1;       # Error
HAVING expr > 1;    # okay

SELECT x, MAX(y) max_y, MIN(y) min_y
FROM tbl
GROUP BY x

HAVING MAX(y) > 10 OR MIN(y) < 3;
HAVING AVG(y) BETWEEN 2 AND 8;

```


### GROUP BY ... WITH ROLLUP, ROLLUP(), GROUPING()

```sql
SELECT * FROM t;
+------+-------+----------+
| name | size  | quantity |
+------+-------+----------+
| ball | small |       10 |
| ball | large |       20 |
| hoop | small |       15 |
| hoop | large |        5 |
| ball | small |        8 |
| ball | large |       18 |
| hoop | small |       14 |
| hoop | large |       25 |
+------+-------+----------+

SELECT name, size, SUM(quantity) total
FROM t
GROUP BY name, size WITH ROLLUP;
+------+-------+-------+
| name | size  | total |
+------+-------+-------+
| ball | large |    38 |
| ball | small |    18 |
| ball | NULL  |    56 |
| hoop | large |    30 |
| hoop | small |    29 |
| hoop | NULL  |    59 |
| NULL | NULL  |   115 |
+------+-------+-------+
```

SQL Server:

```sql
SELECT name, size, SUM(quantity) total
FROM t
GROUP BY ROLLUP(name, size);
```

The GROUPING() function returns 1 when the NULL in a column marks a super-aggregate row; otherwise, it returns 0.

```sql
SELECT name, size, SUM(quantity) total, GROUPING(name), GROUPING(size)
FROM t
GROUP BY name, size WITH ROLLUP;
+------+-------+-------+----------------+----------------+
| name | size  | total | GROUPING(name) | GROUPING(size) |
+------+-------+-------+----------------+----------------+
| ball | large |    38 |              0 |              0 |
| ball | small |    18 |              0 |              0 |
| ball | NULL  |    56 |              0 |              1 |
| hoop | large |    30 |              0 |              0 |
| hoop | small |    29 |              0 |              0 |
| hoop | NULL  |    59 |              0 |              1 |
| NULL | NULL  |   115 |              1 |              1 |
+------+-------+-------+----------------+----------------+
```

```sql
SELECT IF(GROUPING(name),'all_names',name) name, IF(GROUPING(size),'all_sizes',size) size, SUM(quantity) total
FROM t
GROUP BY name, size WITH ROLLUP;
+-----------+-----------+-------+
| name      | size      | total |
+-----------+-----------+-------+
| ball      | large     |    38 |
| ball      | small     |    18 |
| ball      | all_sizes |    56 |
| hoop      | large     |    30 |
| hoop      | small     |    29 |
| hoop      | all_sizes |    59 |
| all_names | all_sizes |   115 |
+-----------+-----------+-------+
```

### GROUP BY GROUPING SETS(), GROUP BY CUBE()

The following

```sql
SELECT x, y, SUM(z) AS sum_z
FROM tbl
GROUP BY GROUPING SETS( (x,y),(x),(y),());
```

or

```sql
SELECT x, y, SUM(z) AS sum_z
FROM tbl
GROUP BY CUBE(x,y);
```

is equivalent to

```sql
SELECT x, y, SUM(z) AS sum_z
FROM tbl
GROUP BY x, y
  UNION ALL
SELECT x, NULL,SUM(z)
FROM tbl
GROUP BY x
  UNION ALL
SELECT NULL, y, SUM(z)
FROM tbl
GROUP BY y
  UNION ALL
SELECT NULL,NULL,SUM(z)
FROM tbl
```

but more efficient.
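The UNION ALL form can be tried on the toy table `t` from the ROLLUP section. The sketch below runs it through Python's built-in `sqlite3` module, chosen here only because SQLite does not accept `CUBE` or `GROUPING SETS`, so the expanded form is the only one available there. The dictionary `cube` is just a convenient way to inspect the result.

```python
import sqlite3

# Same rows as the table t shown in the ROLLUP section.
rows = [
    ("ball", "small", 10), ("ball", "large", 20),
    ("hoop", "small", 15), ("hoop", "large", 5),
    ("ball", "small", 8), ("ball", "large", 18),
    ("hoop", "small", 14), ("hoop", "large", 25),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, size TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# One SELECT per grouping set: (name, size), (name), (size), ().
cube_sql = """
SELECT name, size, SUM(quantity) AS total FROM t GROUP BY name, size
  UNION ALL
SELECT name, NULL, SUM(quantity) FROM t GROUP BY name
  UNION ALL
SELECT NULL, size, SUM(quantity) FROM t GROUP BY size
  UNION ALL
SELECT NULL, NULL, SUM(quantity) FROM t
"""
cube = {(name, size): total for name, size, total in conn.execute(cube_sql)}

for key in sorted(cube, key=str):
    print(key, cube[key])
conn.close()
```

Compared with the ROLLUP output, the cube adds the per-size subtotals, which appear under the keys `(None, 'large')` and `(None, 'small')`.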


## ORDER BY [FIELD()]

```sql
ORDER BY x, y;

ORDER BY 2, 7;    # positions of columns in the SELECT list

ORDER BY x DESC, y;

ORDER BY LEN(x);   # SQL Server; use LENGTH() or CHAR_LENGTH() in MySQL

ORDER BY LEFT(x, 3);

ORDER BY RAND();   # Rearrange rows randomly

ORDER BY FIELD(col_name, val1, val2, val3);
# If x is a value in the column col_name, FIELD() returns 1 if x is val1, 2 if x is val2, 3 if x is val3, 0 otherwise.
```


### Natural sorting

Suppose the values of a column x in a table are '1A', '2Bk', '1Wy', '3B', '10Cr', '2Ar', '3Bu'.
To sort the rows naturally, by the leading number first and then by the text (MySQL):

```sql
SELECT ...
FROM ...
ORDER BY x+0, x;
```
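The difference between plain and natural ordering is easy to see on the sample values. The sketch below uses Python's built-in `sqlite3` module rather than MySQL, and it spells the numeric conversion as `CAST(x AS INTEGER)`, which in SQLite takes the leading integer prefix of the text. That cast is a stand-in chosen for this illustration, playing the role of `x+0` in the MySQL query.

```python
import sqlite3

values = ["1A", "2Bk", "1Wy", "3B", "10Cr", "2Ar", "3Bu"]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (x TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?)", [(v,) for v in values])

# Plain text ordering compares character by character, so '10Cr' sorts before '1A'.
plain = [r[0] for r in conn.execute("SELECT x FROM tbl ORDER BY x")]

# Natural ordering: leading number first, then the full text as a tie-breaker.
natural = [
    r[0]
    for r in conn.execute("SELECT x FROM tbl ORDER BY CAST(x AS INTEGER), x")
]

print(plain)
print(natural)
conn.close()
```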


## Derived table, Subquery

The following raises an error if an alias is not used:

```sql
SELECT ... 
FROM (SELECT ... FROM ...) AS t;
```

With ANY or ALL:

```sql
SELECT x, y FROM tbl
WHERE y >= ANY( SELECT AVG(y) FROM tbl GROUP BY z )
```

If a subquery uses the data from its outer query, the subquery is evaluated once for each row in the outer query.

```sql
SELECT x FROM t AS t1
WHERE x > (SELECT AVG(x) FROM t WHERE y = t1.y) 
```

A subquery can be used as a column expression:

```sql
SELECT x, (SELECT MAX(y) FROM t2 WHERE t2.id = t1.id) AS max_y
FROM  t1
```

# INNER JOIN, LEFT JOIN, RIGHT JOIN, APPLY

In MySQL, JOIN, CROSS JOIN, and INNER JOIN (without using ON or USING) are equivalent.


```sql
SELECT t1.x, t2.y
FROM tbl1 AS t1 INNER JOIN tbl2 AS t2
ON t1.a = t2.b         # or USING (col_name) if column names are equal
WHERE ...
```

You can use other operators in ON:

```sql
...
ON t1.col1 = t2.col2 AND t1.col3 > t2.col4 
```


Be careful about whether a condition goes in the ON clause or in the WHERE clause. The following left-joins two tables and then keeps only the rows whose id is 100.

```sql
SELECT t1.id, t2.amount
FROM t1 LEFT JOIN t2 ON t1.id = t2.id 
WHERE t1.id = 100  
```

Meanwhile, the following returns all rows of t1. If t1.id is not 100, the row has the form (id, NULL). Why? Such a row cannot satisfy the join condition t1.id = t2.id AND t1.id = 100, but because this is a left join, the row still appears in the result, with NULL in the columns that come from t2.

```sql
SELECT t1.id, t2.amount
FROM t1 LEFT JOIN t2 ON t1.id = t2.id AND t1.id = 100
```
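The two queries can be compared side by side on a tiny data set. This is a sketch using Python's built-in `sqlite3` module; the rows in `t1` and `t2` are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER)")
conn.execute("CREATE TABLE t2 (id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(100,), (200,)])
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [(100, 5), (200, 7)])

# Condition in WHERE: filters the joined result, so only id 100 survives.
filter_in_where = conn.execute("""
    SELECT t1.id, t2.amount
    FROM t1 LEFT JOIN t2 ON t1.id = t2.id
    WHERE t1.id = 100
    ORDER BY t1.id
""").fetchall()

# Condition in ON: only restricts matching, so every t1 row is kept.
filter_in_on = conn.execute("""
    SELECT t1.id, t2.amount
    FROM t1 LEFT JOIN t2 ON t1.id = t2.id AND t1.id = 100
    ORDER BY t1.id
""").fetchall()

print(filter_in_where)
print(filter_in_on)
conn.close()
```

Note that id 200 does have a matching row in `t2`, yet its amount comes back as NULL in the second query, because the extra ON condition rejects the match.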

## CROSS JOIN (in SQL Server)

The CROSS JOIN returns a Cartesian product of rows from both tables. Unlike the INNER JOIN or LEFT JOIN, the cross join does not establish a relationship between the joined tables.

```sql
SELECT ... FROM t1 CROSS JOIN t2;
```

Assume tbl has a column col whose values are 0, 1, ..., 9. The following generates the integers from 1 to 1000:

```sql
SELECT t3.col*100 + t2.col*10 + t1.col + 1 AS num
FROM tbl AS t1 CROSS JOIN tbl AS t2 CROSS JOIN tbl AS t3
ORDER BY num;
```
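A quick check of the number generator, sketched with Python's built-in `sqlite3` module (SQLite also accepts `CROSS JOIN`); the ten-row table `tbl` is built in the script itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?)", [(d,) for d in range(10)])

# Three copies of the digits 0..9 act as the hundreds, tens, and units places.
nums = [
    row[0]
    for row in conn.execute("""
        SELECT t3.col*100 + t2.col*10 + t1.col + 1 AS num
        FROM tbl AS t1 CROSS JOIN tbl AS t2 CROSS JOIN tbl AS t3
        ORDER BY num
    """)
]

print(len(nums), nums[0], nums[-1])
conn.close()
```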

## Example: Use a self-join

Let col1 and col2 be two columns of tbl. We want to see the distinct rows of the form (val1, val2_1, val2_2), where val1 is a value in col1 and val2_1 and val2_2 are values in col2.

```sql
SELECT t1.col1, t1.col2, t2.col2
FROM tbl t1 INNER JOIN tbl t2
ON t1.col1 = t2.col1 AND t1.col2 > t2.col2
[ORDER BY t1.col1, t1.col2, t2.col2];
```

## Example: Use all types of join

Consider the following data:

| table | columns |
| --- | --- |
| t1 | id, c1 |
| t2 | id, c2 |
| t3 | id1, id2, c3 |

where id1 and id2 in t3 are foreign keys referencing t1.id and t2.id, respectively.


The following may not show all combinations of (c1, c2), since we use inner joins.

```sql
SELECT c1, c2, SUM(c3) AS total
FROM t3 
    INNER JOIN t1 ON t1.id = t3.id1
    INNER JOIN t2 ON t2.id = t3.id2
GROUP BY c1, c2;
```

Suppose we want to see all combinations of (c1, c2). If a pair (c1, c2) does not exist in the inner-joined table, the value of total in that row will be set to 0.

Step1: Make all combinations of t1 and t2 using the cross join:

```sql
SELECT c1, c2, t1.id, t2.id                 # or SELECT *
FROM t1 JOIN t2 # or use CROSS JOIN
```

Step2: Build the inner-joined table from above.

```sql
SELECT c1, c2, SUM(c3) AS total, t1.id, t2.id 
FROM t3 
    INNER JOIN t1 ON t1.id = t3.id1
    INNER JOIN t2 ON t2.id = t3.id2
GROUP BY t1.id, t2.id, c1, c2;   # include the ids so every selected column is grouped
```

Step3: Join the two tables created in Step1 and Step2 by LEFT JOIN on id. Note that we need to modify the column total properly.

```sql
SELECT t1.c1, t2.c2, IFNULL(t4.total, 0) AS total 
FROM t1 JOIN t2 LEFT JOIN
    (SELECT c1, c2, SUM(c3) AS total, t1.id AS t1_id, t2.id AS t2_id
     FROM t3 INNER JOIN t1 ON t1.id = t3.id1 INNER JOIN t2 ON t2.id = t3.id2
     GROUP BY t1.id, t2.id, c1, c2) AS t4 ON t4.t1_id = t1.id AND t4.t2_id = t2.id
[ORDER BY t1.c1, t2.c2];
```
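The three steps can be checked end to end on a small data set. The sketch below uses Python's built-in `sqlite3` module; the sample rows are invented, and the derived table groups by the two ids, which is an assumption made here so that every selected column is covered by the GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (id INTEGER PRIMARY KEY, c1 TEXT);
CREATE TABLE t2 (id INTEGER PRIMARY KEY, c2 TEXT);
CREATE TABLE t3 (id1 INTEGER, id2 INTEGER, c3 INTEGER);
INSERT INTO t1 VALUES (1, 'a'), (2, 'b');
INSERT INTO t2 VALUES (1, 'x'), (2, 'y');
INSERT INTO t3 VALUES (1, 1, 10), (1, 1, 5), (2, 2, 7);
""")

# Inner joins only: pairs without rows in t3 are missing.
inner_only = conn.execute("""
    SELECT c1, c2, SUM(c3) AS total
    FROM t3
        INNER JOIN t1 ON t1.id = t3.id1
        INNER JOIN t2 ON t2.id = t3.id2
    GROUP BY c1, c2
    ORDER BY c1, c2
""").fetchall()

# Cross join for all pairs, then left join the aggregated totals.
all_pairs = conn.execute("""
    SELECT t1.c1, t2.c2, IFNULL(t4.total, 0) AS total
    FROM t1 CROSS JOIN t2 LEFT JOIN
        (SELECT SUM(c3) AS total, t1.id AS t1_id, t2.id AS t2_id
         FROM t3 INNER JOIN t1 ON t1.id = t3.id1 INNER JOIN t2 ON t2.id = t3.id2
         GROUP BY t1.id, t2.id) AS t4 ON t4.t1_id = t1.id AND t4.t2_id = t2.id
    ORDER BY t1.c1, t2.c2
""").fetchall()

print(inner_only)
print(all_pairs)
conn.close()
```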

## APPLY

APPLY (SQL Server) operates on two input tables, left and right. The right table is usually a derived table or a TVF (table-valued function).

`CROSS APPLY` is similar to CROSS JOIN, but in `tbl1 CROSS APPLY tbl2`, tbl1 is evaluated first and tbl2 is evaluated once per row of tbl1.

```sql
SELECT ...
FROM tbl1 CROSS APPLY 
  (SELECT TOP 3 ...
   FROM tbl2
   WHERE tbl2.id = tbl1.id
   ORDER BY ...) AS t;   # the derived table needs an alias
```

If the right table expression returns an empty set, the CROSS APPLY does not return the corresponding left row. `OUTER APPLY` will return rows from the left table in that case. It is similar to `LEFT OUTER JOIN`. 

# UNION, INTERSECT, MINUS, EXCEPT


## UNION

UNION [DISTINCT] removes duplicate rows, but UNION ALL keeps all rows.

```sql
SELECT ... UNION [ALL | DISTINCT] SELECT ...;

expr_1 UNION expr_2
```
where each `expr_i` is of the form `SELECT ... FROM t` or `TABLE t`.


In the example below, the column names are u.x1 and u.y1, not u.x2 and u.y2.

```
SELECT u.x1, u.y1
FROM ((SELECT x1, y1 FROM tbl1)
  UNION ALL
     (SELECT x2, y2 FROM tbl2)
) u;
```

## INTERSECT

MySQL versions before 8.0.31 do not support INTERSECT, but we can implement it easily.

```sql
(SELECT c1 FROM t1) INTERSECT (SELECT c2 FROM t2);

# is equivalent either to
SELECT DISTINCT c1 FROM t1 INNER JOIN t2 ON t1.c1 = t2.c2;

# or to
SELECT DISTINCT c1 FROM t1 WHERE c1 IN (SELECT c2 FROM t2);
```
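The equivalence can be checked on a database that has INTERSECT built in. This sketch uses Python's built-in `sqlite3` module with made-up rows that include duplicates; SQLite does not allow parentheses around the operands of a compound SELECT, so they are dropped here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (c1 INTEGER)")
conn.execute("CREATE TABLE t2 (c2 INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,), (2,), (3,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(2,), (3,), (3,), (4,)])

def column(sql):
    """Run a query and return its single column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

native = column("SELECT c1 FROM t1 INTERSECT SELECT c2 FROM t2")
via_join = column("SELECT DISTINCT c1 FROM t1 INNER JOIN t2 ON t1.c1 = t2.c2")
via_in = column("SELECT DISTINCT c1 FROM t1 WHERE c1 IN (SELECT c2 FROM t2)")

print(native, via_join, via_in)
conn.close()
```

The `DISTINCT` matters: without it, the join form would return the value 2 twice and the value 3 twice for this data.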

## MINUS

MySQL does not support MINUS (Oracle's name for EXCEPT), and versions before 8.0.31 do not support EXCEPT either, but we can implement it easily.


```sql
(SELECT c1 FROM t1) MINUS (SELECT c2 FROM t2);

# is equivalent to
SELECT DISTINCT c1 FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c2 WHERE t2.c2 IS NULL;

# or, if c2 contains no NULL values, to
SELECT DISTINCT c1 FROM t1 WHERE c1 NOT IN (SELECT c2 FROM t2);
```
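Two details of these emulations are worth checking: the set operator returns distinct rows, and `NOT IN` behaves differently once the subquery yields a NULL. The sketch below uses Python's built-in `sqlite3` module, where the operator is spelled `EXCEPT`; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (c1 INTEGER)")
conn.execute("CREATE TABLE t2 (c2 INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (1,), (2,), (3,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(2,), (4,)])

def column(sql):
    """Run a query and return its single column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

LEFT_JOIN = "FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c2 WHERE t2.c2 IS NULL"
NOT_IN = "SELECT DISTINCT c1 FROM t1 WHERE c1 NOT IN (SELECT c2 FROM t2)"

native = column("SELECT c1 FROM t1 EXCEPT SELECT c2 FROM t2")
join_plain = column("SELECT c1 " + LEFT_JOIN)              # keeps the duplicate 1
join_distinct = column("SELECT DISTINCT c1 " + LEFT_JOIN)  # matches EXCEPT
not_in = column(NOT_IN)

# A NULL in t2.c2 makes every NOT IN comparison unknown, so no row passes.
conn.execute("INSERT INTO t2 VALUES (NULL)")
not_in_with_null = column(NOT_IN)
native_with_null = column("SELECT c1 FROM t1 EXCEPT SELECT c2 FROM t2")

print(native, join_plain, join_distinct, not_in)
print(not_in_with_null, native_with_null)
conn.close()
```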

## EXCEPT

The following returns distinct rows that appear in tbl1 but not tbl2. 

```sql
SELECT ...
FROM tbl1 
EXCEPT 
SELECT ...
FROM tbl2;
```



# Pivot/Unpivot

## PIVOT()

Assume column x consists of x1, x2, and x3. The following will pivot the table in the FROM clause: 

```sql
SELECT id, x1, x2, x3
FROM (SELECT id, x, y FROM tbl) AS t
PIVOT(SUM(y) FOR x IN(x1, x2, x3)) AS p;
```

## UNPIVOT()

```sql
SELECT id, x, y
FROM tbl_pivotted
UNPIVOT(y FOR x IN(x1, x2, x3)) AS u;
```

# Common queries


## Find the row holding the maximum of a column

```sql
SELECT ... FROM tbl
WHERE x = (SELECT MAX(x) FROM tbl)

SELECT ...
FROM tbl t1 LEFT JOIN tbl t2 ON t1.x < t2.x
WHERE t2.x IS NULL;

SELECT ... FROM tbl
ORDER BY x DESC LIMIT 1

# Using a variable:
SELECT @max_x := MAX(x) FROM tbl;
SELECT ... FROM tbl WHERE x = @max_x;
```

## Find the rows holding the group-wise maximum of a column

For each x, find y with max z.

Using JOIN:

```sql
SELECT t1.x, y, t1.z
FROM tbl t1
JOIN ( SELECT x, MAX(z) as z FROM tbl GROUP BY x ) AS t2
  ON t1.x = t2.x AND t1.z = t2.z;
```

Using LEFT JOIN:

```sql
SELECT t1.x, t1.y, t1.z
FROM tbl t1 LEFT JOIN tbl t2 ON t1.x = t2.x AND t1.z < t2.z
WHERE t2.z IS NULL
```

Using a window:

```sql
WITH t AS (
    SELECT x, y, z, ROW_NUMBER() OVER (PARTITION BY x ORDER BY z DESC) AS row_no
    FROM tbl
)
SELECT x, y, z FROM t WHERE row_no=1;
```
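The three approaches can be compared on a small table. This sketch uses Python's built-in `sqlite3` module with invented rows; the window-function variant assumes the bundled SQLite is version 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (x TEXT, y TEXT, z INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [("a", "p", 1), ("a", "q", 3), ("b", "r", 2), ("b", "s", 5), ("b", "t", 4)],
)

queries = {
    "join": """
        SELECT t1.x, y, t1.z
        FROM tbl t1
        JOIN (SELECT x, MAX(z) AS z FROM tbl GROUP BY x) AS t2
          ON t1.x = t2.x AND t1.z = t2.z
        ORDER BY t1.x""",
    "left_join": """
        SELECT t1.x, t1.y, t1.z
        FROM tbl t1 LEFT JOIN tbl t2 ON t1.x = t2.x AND t1.z < t2.z
        WHERE t2.z IS NULL
        ORDER BY t1.x""",
    "window": """
        WITH t AS (
            SELECT x, y, z,
                   ROW_NUMBER() OVER (PARTITION BY x ORDER BY z DESC) AS row_no
            FROM tbl
        )
        SELECT x, y, z FROM t WHERE row_no = 1
        ORDER BY x""",
}
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}

for name, rows in results.items():
    print(name, rows)
conn.close()
```

The results agree here because each group has a unique maximum. When several rows tie for the maximum z, the two join forms return all of them, while `ROW_NUMBER()` keeps only one row per group.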

## Sort by a column that contains missing values

x, y: columns of tbl

Order by x and y. If x is NULL, display the row at the bottom.

MS SQL: 

```mssql
SELECT ...
FROM tbl
ORDER BY CASE WHEN x IS NULL THEN 1 ELSE 0 END, x, y
```

## Find rows whose order dates were in 2018

```sql
WHERE YEAR(orderDate) = 2018
```
Wrapping the column in YEAR() as above prevents the database from using an index on orderDate. Use a range condition instead:

```sql
WHERE orderDate >= '2018-01-01' AND orderDate < '2019-01-01'
```

Note that if orderDate is a TIMESTAMP rather than a DATE, the following can be wrong:

```sql
WHERE orderDate >= '2018-01-01' AND orderDate <= '2018-12-31'
```

This is because it excludes every time after '2018-12-31 00:00:00'.
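The effect of an inclusive upper bound on a timestamp column can be seen with three sample rows. This is a sketch using Python's built-in `sqlite3` module, under the assumption that timestamps are stored as `YYYY-MM-DD HH:MM:SS` text (the usual SQLite convention), so comparisons are plain string comparisons. The table name `orders` is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderDate TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?)",
    [
        ("2018-01-01 00:00:00",),  # first instant of 2018
        ("2018-12-31 15:30:00",),  # afternoon of the last day of 2018
        ("2019-01-01 00:00:00",),  # first instant of 2019
    ],
)

# Half-open range: includes all of 2018-12-31 and excludes 2019.
half_open = conn.execute(
    "SELECT COUNT(*) FROM orders "
    "WHERE orderDate >= '2018-01-01' AND orderDate < '2019-01-01'"
).fetchone()[0]

# Inclusive upper bound on the date: misses the afternoon order.
closed = conn.execute(
    "SELECT COUNT(*) FROM orders "
    "WHERE orderDate >= '2018-01-01' AND orderDate <= '2018-12-31'"
).fetchone()[0]

print(half_open, closed)
conn.close()
```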


## Find the top 10% on a column

MS SQL:
```
SELECT TOP 10 PERCENT * FROM tbl 
ORDER BY x DESC;
```

## Select 10% randomly

MS SQL:
```
SELECT TOP 10 PERCENT * FROM tbl ORDER BY NEWID();
```

## Compute the total number of rows per employee and the number per employee satisfying some condition.

```
WITH t1 AS (
    SELECT employeeID, totalN = COUNT(*)
    FROM tbl
    GROUP BY employeeID
),
t2 AS (
    SELECT employeeID, totalWithCond = COUNT(*)
    FROM tbl
    WHERE condition(s)
    GROUP BY employeeID
)
SELECT employeeName, e.employeeID, totalN, totalWithCond
FROM employeeTbl e JOIN t1 ON t1.employeeID = e.employeeID 
  JOIN t2 ON t2.employeeID = e.employeeID;
```