### Introduction
While most of the SQL commands below will work in many different SQL databases, all the examples here have been tested on MySQL only. Some statements may be specific to MySQL.  

SQL statements are broadly classified into DDL and DML:
- **DDL:** stands for Data Definition Language. It includes `ALTER`, `CREATE`, `DROP`, `TRUNCATE` and `RENAME`.
- **DML:** stands for Data Manipulation Language. It includes `SELECT`, `INSERT`, `DELETE`, `UPDATE`, `CALL`, etc.

### Querying
**Limiting Result:** The `LIMIT` clause accepts an optional offset, written as `LIMIT offset, row_count`. The offset of the first row is 0 (not 1).

In [1]:
# %%
%load_ext sql

# %%
%sql mysql+mysqldb://root:root@localhost/employees

'Connected: root@employees'

In [11]:
%%sql

SELECT * FROM employees e LIMIT 5,5 -- rows 6 to 10

 * mysql+mysqldb://root:***@localhost/employees
5 rows affected.


emp_no,birth_date,first_name,last_name,gender,hire_date
10006,1953-04-20,Anneke,Preusig,F,1989-06-02
10007,1957-05-23,Tzvetan,Zielinski,F,1989-02-10
10008,1958-02-19,Saniya,Kalloufi,M,1994-09-15
10009,1952-04-19,Sumant,Peac,F,1985-02-18
10010,1963-06-01,Duangkaew,Piveteau,F,1989-08-24


The same can be achieved with the `OFFSET` keyword:

In [12]:
%%sql

SELECT * FROM employees e LIMIT 5 OFFSET 5

 * mysql+mysqldb://root:***@localhost/employees
5 rows affected.


emp_no,birth_date,first_name,last_name,gender,hire_date
10006,1953-04-20,Anneke,Preusig,F,1989-06-02
10007,1957-05-23,Tzvetan,Zielinski,F,1989-02-10
10008,1958-02-19,Saniya,Kalloufi,M,1994-09-15
10009,1952-04-19,Sumant,Peac,F,1985-02-18
10010,1963-06-01,Duangkaew,Piveteau,F,1989-08-24
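
The equivalence of the two forms, and the usual page-to-offset arithmetic, can be checked on a small table. This is only a sketch: it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for MySQL (SQLite happens to accept both `LIMIT offset, count` and `LIMIT count OFFSET offset`), and the `nums` table and `page` helper are hypothetical names introduced for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (n INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO nums VALUES (?)", [(i,) for i in range(1, 21)])


def page(page_no, page_size):
    """Return one page of values; page_no starts at 1, the offset starts at 0."""
    offset = (page_no - 1) * page_size
    rows = conn.execute(
        "SELECT n FROM nums ORDER BY n LIMIT ? OFFSET ?", (page_size, offset)
    ).fetchall()
    return [r[0] for r in rows]


# Comma form: the first number is the offset, the second is the row count.
comma_form = [r[0] for r in conn.execute("SELECT n FROM nums ORDER BY n LIMIT 5, 5")]
second_page = page(2, 5)

print(comma_form)   # rows 6 to 10
print(second_page)  # the same rows via OFFSET
```

The sketch adds an `ORDER BY` because, without one, the database is free to return rows in any order, which makes offsets unreliable for paging.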


**DUAL Table:** Some `SELECT` queries do not involve any table column, for example:

In [3]:
%%sql

SELECT 1+1 AS sum

 * mysql+mysqldb://root:***@localhost/employees
1 rows affected.


sum
2


In some database systems `SELECT` must be paired with `FROM`, so the dummy table `DUAL` is used. MySQL accepts it as well:

In [4]:
%%sql

SELECT 1+1 AS sum FROM dual

 * mysql+mysqldb://root:***@localhost/employees
1 rows affected.


sum
2


**Getting unique result:** The `DISTINCT` keyword is not a function, so it is used without parentheses. It can be applied to one or many columns. With one column, every unique value is returned (NULL counts as one value).

In [13]:
%%sql

SELECT DISTINCT title FROM titles

 * mysql+mysqldb://root:***@localhost/employees
7 rows affected.


title
Senior Engineer
Staff
Engineer
Senior Staff
Assistant Engineer
Technique Leader
Manager


Using DISTINCT with multiple columns returns only those rows where the combination of the columns is unique. A value may therefore repeat within a single column, as `Aamodt` does in the result of the next query.

In [14]:
%%sql

SELECT DISTINCT last_name, first_name FROM employees ORDER BY last_name LIMIT 10

 * mysql+mysqldb://root:***@localhost/employees
10 rows affected.


last_name,first_name
Aamodt,Abdelkader
Aamodt,Adhemar
Aamodt,Aemilian
Aamodt,Alagu
Aamodt,Aleksander
Aamodt,Alexius
Aamodt,Alois
Aamodt,Aluzio
Aamodt,Amabile
Aamodt,Anestis
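
The behaviour of `DISTINCT` on column combinations and on NULL is easier to see on a handful of rows. The following is a minimal sketch, assuming Python's built-in `sqlite3` module as a stand-in for MySQL; the `people` table and its rows are made up for the illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (last_name TEXT, first_name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [
        ("Aamodt", "Alois"),
        ("Aamodt", "Alois"),    # exact duplicate pair, collapsed by DISTINCT
        ("Aamodt", "Amabile"),  # same last name, different pair, kept
        ("Baba", "Alois"),      # same first name, different pair, kept
        (None, "Alois"),        # NULL is treated as a value of its own
        (None, "Alois"),        # duplicate of the NULL pair, collapsed
    ],
)

# Uniqueness is judged on the (last_name, first_name) combination.
pairs = conn.execute(
    "SELECT DISTINCT last_name, first_name FROM people"
).fetchall()

# With a single column, NULL appears once among the unique values.
last_names = conn.execute("SELECT DISTINCT last_name FROM people").fetchall()

print(len(pairs), len(last_names))
```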


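**WHERE Clause And Alias:** An alias defined in the `SELECT` list can only be used in the `GROUP BY`, `ORDER BY`, or `HAVING` clauses, not in the `WHERE` clause, because the aliased value may not have been determined yet when `WHERE` is evaluated.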

**RANK OVER:** `RANK` assigns a rank to each row within a partition (window functions are available from MySQL 8.0). The expression generally looks like `RANK() OVER (PARTITION BY <expr> ORDER BY <expr>)`. The `PARTITION BY` part splits the rows into groups based on an expression, and `ORDER BY` sorts the rows within each group. The rank is then assigned following that sorted order. For example, the query below gets the top 3 salaries per year:

In [9]:
%%sql

SELECT * FROM
(
    SELECT CONCAT(employees.first_name, ' ', employees.last_name) fullname, salaries.salary, YEAR(salaries.to_date) `year`, 
        RANK() OVER (PARTITION BY YEAR(salaries.to_date) ORDER BY salaries.salary DESC) salary_rank
    FROM employees
    INNER JOIN salaries
    ON employees.emp_no = salaries.emp_no
) subquery
WHERE subquery.salary_rank <= 3
ORDER BY subquery.`year`, subquery.salary_rank
LIMIT 10;

 * mysql+mysqldb://root:***@localhost/employees
10 rows affected.


fullname,salary,year,salary_rank
Hauke Bouloucos,96581,1985,1
Licheng Thummel,92293,1985,2
Karoline Businaro,88919,1985,3
Tsutomu Alameldin,123668,1986,1
Tokuyasu Pesch,116058,1986,2
Chirstian Kobara,113013,1986,3
Tsutomu Alameldin,126169,1987,1
Toshimo Reghbati,124357,1987,2
Tokuyasu Pesch,119115,1987,3
Tsutomu Alameldin,129434,1988,1


`RANK` is part of a bigger group called `Window functions`.
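
One detail the large result above does not show is how `RANK` treats ties: equal values share a rank and the next rank is skipped. Below is a small sketch of that behaviour, assuming Python's built-in `sqlite3` module as a stand-in for MySQL (window functions need SQLite 3.25 or newer); the `pay` table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pay (name TEXT, yr INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO pay VALUES (?, ?, ?)",
    [
        ("Ann", 1985, 900),
        ("Bob", 1985, 900),  # ties with Ann within 1985
        ("Cid", 1985, 800),
        ("Ann", 1986, 950),
        ("Bob", 1986, 990),
    ],
)

# Ranking restarts for every year because of PARTITION BY yr.
ranked = conn.execute(
    """
    SELECT name, yr, salary,
           RANK() OVER (PARTITION BY yr ORDER BY salary DESC) AS salary_rank
    FROM pay
    ORDER BY yr, salary_rank, name
    """
).fetchall()

for row in ranked:
    print(row)
```

In 1985 both tied rows get rank 1 and the following row gets rank 3, so a filter such as `salary_rank <= 3` can return more than three rows per partition when ties occur.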

**AND, OR Precedence:** `AND` has higher precedence than `OR`, so `AND` conditions are grouped first, as if they were wrapped in parentheses.

In [15]:
%%sql

SELECT
    dept_name
FROM
    departments
WHERE
    1 = 0
AND
    1 = 0
OR
    1 = 1
AND
    1 = 0
AND
    2 = 2
OR
    1 = 1
AND
    2 = 2;

 * mysql+mysqldb://root:***@localhost/employees
9 rows affected.


dept_name
Customer Service
Development
Finance
Human Resources
Marketing
Production
Quality Management
Research
Sales


The query above is equivalent to:

In [None]:
%%sql

SELECT
    dept_name
FROM
    departments
WHERE 
    (
        1 = 0
    AND
        1 = 0
    )
OR
    (   
        1 = 1
    AND
        1 = 0
    AND
        2 = 2
    )
OR
    (
        1 = 1
    AND
        2 = 2
    )
;

**BETWEEN:** The range is inclusive of both endpoints.

In [16]:
%%sql

SELECT * FROM employees WHERE emp_no BETWEEN 10019 AND 10024

 * mysql+mysqldb://root:***@localhost/employees
6 rows affected.


emp_no,birth_date,first_name,last_name,gender,hire_date
10019,1953-01-23,Lillian,Haddadi,M,1999-04-30
10020,1952-12-24,Mayuko,Warwick,M,1991-01-26
10021,1960-02-20,Ramzi,Erde,M,1988-02-10
10022,1952-07-08,Shahaf,Famili,M,1995-08-22
10023,1953-09-29,Bojan,Montemayor,F,1989-12-17
10024,1958-09-05,Suzette,Pettey,F,1997-05-19


In [18]:
%%html
<style>
table {float:left}
</style>

**STRING functions:** supported String functions in MySQL (only the most useful ones are listed):  

| Function  | Description                                                            |
|-----------|------------------------------------------------------------------------|
| CONCAT    | Concatenate two or more strings into a single string                   |
| INSTR     | Return the position of the first occurrence of a substring in a string |
| LOWER     | Convert a string to lowercase                                          |
| UPPER     | Convert a string to uppercase                                          |
| LTRIM     | Remove whitespace from left                                            |
| RTRIM     | Remove whitespace from right                                           |
| TRIM      | Remove whitespace from left and right                                  |
| REPLACE   | Search and replace a substring in a string                             |
| SUBSTRING | Extract a substring starting from a position with a specific length.   |  

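By default MySQL does not have a string concatenation operator like `||` or `+`: `||` is a logical OR unless the `PIPES_AS_CONCAT` SQL mode is enabled, and `+` is numeric addition. Use `CONCAT` instead. It also doesn't have a SPLIT function.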
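
A few of the functions from the table can be tried without a table at all. This is a rough sketch that assumes Python's built-in `sqlite3` module as a stand-in for MySQL: it only uses functions that SQLite also provides with the same meaning, uses `SUBSTR` (a synonym of `SUBSTRING` in MySQL), and leaves out `CONCAT` because older SQLite versions lack it.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")

row = conn.execute(
    """
    SELECT UPPER('Georgi'),
           LOWER('Facello'),
           TRIM('  padded  '),
           LTRIM('  padded  '),
           RTRIM('  padded  '),
           REPLACE('Senior Staff', 'Staff', 'Engineer'),
           INSTR('Technique Leader', 'Leader'),  -- positions start at 1
           SUBSTR('Technique Leader', 1, 9)      -- start position, length
    """
).fetchone()

print(row)
```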

**DATE functions:** supported Date/Time functions in MySQL (only the most useful ones are listed): 

| Function    | Description                                                               |
|-------------|---------------------------------------------------------------------------|
| CURDATE     | Returns the current date.                                                 |
| SYSDATE     | Returns the current date and time at which the function itself executes.  |
| NOW         | Returns the date and time at which the statement began to execute.        |
| DATEDIFF    | Calculates the number of days between two DATE values.                    |
| TIMEDIFF    | Calculates the difference between two TIME or DATETIME values.            |
| STR_TO_DATE | Converts a string into a date and time value based on a specified format. |
| DATE_FORMAT | Formats a date value based on a specified date format.                    |
| DATE_ADD    | Adds a time interval to a date value.                                     |

**Counting:** Rows are counted with the `COUNT` aggregate function, which comes in a few flavours. `COUNT(*)` counts the number of rows (whether or not they contain NULL values), whereas `COUNT(column)` counts non-null values only. `COUNT(1)` is the same as `COUNT(*)`.

In [19]:
%%sql

SELECT COUNT(*) FROM employees

 * mysql+mysqldb://root:***@localhost/employees
1 rows affected.


COUNT(*)
300024


In [20]:
%%sql
-- Checking if any employee has no last name
SELECT COUNT(last_name) FROM employees

 * mysql+mysqldb://root:***@localhost/employees
1 rows affected.


COUNT(last_name)
300024
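
Since the `employees` table has no missing last names, both counts above are equal. The difference only shows once NULL values are present. Here is a minimal sketch, assuming Python's built-in `sqlite3` module as a stand-in for MySQL and a hypothetical `staff` table with two missing last names.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (emp_no INTEGER, last_name TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [(1, "Facello"), (2, None), (3, "Bamford"), (4, None)],
)

# COUNT(*) and COUNT(1) count rows; COUNT(last_name) skips NULL values.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(1), COUNT(last_name) FROM staff"
).fetchone()

print(counts)
```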


**Grouping Result:** Some important rules about `GROUP BY` clause:
- GROUP BY clauses can contain as many columns as you want. This enables you to nest groups, providing you with more granular control over how data is grouped.
- If you have nested groups in your GROUP BY clause, data is summarized at the last specified group.
- Every column listed in GROUP BY must be a retrieved column or a valid expression (but not an aggregate function). If an expression is used in the SELECT, that same expression must be specified in GROUP BY. Standard SQL does not allow aliases here, although MySQL accepts them as an extension.
- Aside from the aggregate calculations, every column in your SELECT statement should be present in the GROUP BY clause.
- If the grouping column contains a row with a NULL value, NULL will be returned as a group. If there are multiple rows with NULL values, they’ll all be grouped together.
- The GROUP BY clause must come after any WHERE clause and before any ORDER BY clause.

In [21]:
%%sql

SELECT gender, COUNT(*) FROM employees
GROUP BY gender;

 * mysql+mysqldb://root:***@localhost/employees
2 rows affected.


gender,COUNT(*)
M,179973
F,120051
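
Two of the rules listed earlier, nested groups and the handling of NULL, can be seen together on a few rows. The sketch below assumes Python's built-in `sqlite3` module as a stand-in for MySQL; the `emp` table and its contents are invented for the example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (dept TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [
        ("Sales", "M"),
        ("Sales", "F"),
        ("Sales", "M"),
        (None, "F"),     # rows with a NULL dept end up in one shared group
        (None, "F"),
        ("Finance", "M"),
    ],
)

# Counts are produced for the innermost group, i.e. per (dept, gender) pair.
groups = conn.execute(
    "SELECT dept, gender, COUNT(*) FROM emp GROUP BY dept, gender"
).fetchall()

for g in groups:
    print(g)
```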


An aggregate function is not required with `GROUP BY`. Consider the table `data`:
```
Col1  Col2  Col3
 A     X     1
 A     Y     2
 A     Y     3
 B     X     0
 B     Y     3
 B     Z     1
```

Writing `SELECT Col1, Col2, Col3 FROM data GROUP BY Col1, Col2, Col3` returns the same table, because every row is already unique. This query is equivalent to `SELECT DISTINCT Col1, Col2, Col3 FROM data`.

Writing `SELECT Col1, Col2 FROM data GROUP BY Col1, Col2` would give the result:
```
Col1  Col2
 A     X  
 A     Y  
 B     X  
 B     Y  
 B     Z  
```
The above query is similar to `SELECT DISTINCT Col1, Col2 FROM data`.
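
The claim that grouping without an aggregate behaves like `DISTINCT` can be verified directly. This is a small sketch that loads the six-row `data` table from above into an in-memory SQLite database through Python's built-in `sqlite3` module, used here as a stand-in for MySQL.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (Col1 TEXT, Col2 TEXT, Col3 INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [("A", "X", 1), ("A", "Y", 2), ("A", "Y", 3),
     ("B", "X", 0), ("B", "Y", 3), ("B", "Z", 1)],
)

grouped = conn.execute(
    "SELECT Col1, Col2 FROM data GROUP BY Col1, Col2 ORDER BY Col1, Col2"
).fetchall()
distinct = conn.execute(
    "SELECT DISTINCT Col1, Col2 FROM data ORDER BY Col1, Col2"
).fetchall()

print(grouped == distinct)  # both queries return the same five pairs
```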

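MySQL rejects `SELECT Col1, Col2, Col3 FROM data GROUP BY Col1, Col2` when the `ONLY_FULL_GROUP_BY` SQL mode is enabled (the default since MySQL 5.7.5), because `Col3` is neither grouped nor aggregated. With that mode disabled, MySQL picks an arbitrary `Col3` value from each group, so one possible result is: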
```
Col1    Col2    Col3
A       X       1
A       Y       2
B       X       0
B       Y       3
B       Z       1
```

The `HAVING` clause filters the grouped data, whereas `WHERE` filters rows before they are grouped.
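
The difference between filtering rows and filtering groups can be illustrated with the six-row `data` table shown earlier. The sketch assumes Python's built-in `sqlite3` module as a stand-in for MySQL; the thresholds in the queries are arbitrary.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (Col1 TEXT, Col2 TEXT, Col3 INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [("A", "X", 1), ("A", "Y", 2), ("A", "Y", 3),
     ("B", "X", 0), ("B", "Y", 3), ("B", "Z", 1)],
)

# HAVING keeps or drops whole groups based on an aggregate value.
big_groups = conn.execute(
    """
    SELECT Col1, SUM(Col3) AS total
    FROM data
    GROUP BY Col1
    HAVING SUM(Col3) > 4
    """
).fetchall()

# WHERE removes rows first; HAVING then filters the groups that remain.
filtered = conn.execute(
    """
    SELECT Col1, COUNT(*) AS n
    FROM data
    WHERE Col3 >= 2
    GROUP BY Col1
    HAVING COUNT(*) >= 2
    """
).fetchall()

print(big_groups, filtered)
```

Group `A` sums to 6 and group `B` to 4, so only `A` passes the first filter. In the second query only two `A` rows and one `B` row survive the `WHERE`, so again only `A` remains.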

**Joins:** The most basic join is a simple cross product of tables, also known as a `CROSS JOIN`. Listing tables separated by commas without a join condition produces it:

In [25]:
%%sql

SELECT CONCAT(first_name,' ',last_name) FullName, salary
FROM employees e, salaries s
WHERE YEAR(from_date) = '1995'
LIMIT 10

 * mysql+mysqldb://root:***@localhost/employees
10 rows affected.


FullName,salary
Georgi Facello,76884
Bezalel Simmel,76884
Parto Bamford,76884
Chirstian Koblick,76884
Kyoichi Maliniak,76884
Anneke Preusig,76884
Tzvetan Zielinski,76884
Saniya Kalloufi,76884
Sumant Peac,76884
Duangkaew Piveteau,76884


A join like the one above rarely makes sense, because every employee is paired with every salary row. We have to provide the join condition explicitly:

In [26]:
%%sql

SELECT CONCAT(first_name,' ',last_name) FullName, salary
FROM employees e, salaries s
WHERE e.emp_no = s.emp_no
AND YEAR(from_date) = '1995'
LIMIT 10

 * mysql+mysqldb://root:***@localhost/employees
10 rows affected.


FullName,salary
Georgi Facello,76884
Parto Bamford,40006
Chirstian Koblick,60770
Kyoichi Maliniak,88448
Anneke Preusig,47917
Tzvetan Zielinski,68833
Sumant Peac,80944
Mary Sluis,54545
Patricio Bridgland,44195
Eberhardt Terkki,57590


An alternate syntax for the above uses `INNER JOIN ... ON`:

In [27]:
%%sql

SELECT CONCAT(first_name,' ',last_name) FullName, salary
FROM employees e
INNER JOIN salaries s
ON e.emp_no = s.emp_no
WHERE YEAR(from_date) = '1995'
LIMIT 10

 * mysql+mysqldb://root:***@localhost/employees
10 rows affected.


FullName,salary
Georgi Facello,76884
Parto Bamford,40006
Chirstian Koblick,60770
Kyoichi Maliniak,88448
Anneke Preusig,47917
Tzvetan Zielinski,68833
Sumant Peac,80944
Mary Sluis,54545
Patricio Bridgland,44195
Eberhardt Terkki,57590


**Union of Table:** `UNION` removes all duplicates when the results of two select statements are combined. Use `UNION ALL` to keep duplicate rows.
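
A quick way to see the difference is to combine two small tables that overlap. The sketch below assumes Python's built-in `sqlite3` module as a stand-in for MySQL; the tables `a` and `b` are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (n INTEGER)")
conn.execute("CREATE TABLE b (n INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (2,)])
conn.executemany("INSERT INTO b VALUES (?)", [(2,), (3,)])

# UNION drops every duplicate, including those inside a single SELECT.
union = conn.execute(
    "SELECT n FROM a UNION SELECT n FROM b ORDER BY n"
).fetchall()

# UNION ALL simply appends the second result to the first.
union_all = conn.execute(
    "SELECT n FROM a UNION ALL SELECT n FROM b ORDER BY n"
).fetchall()

print(union)
print(union_all)
```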

**Full Text Search:** Full-text searches are performed using two functions: `MATCH()` to specify the columns to be searched and `AGAINST()` to specify the search expression to be used. Columns that need full-text search must be indexed beforehand using the `FULLTEXT` keyword.

In [None]:
%%sql

SELECT note_text
FROM productnotes
WHERE MATCH(note_text) AGAINST('rabbit')

Full text search also has a rank feature which returns how well the search term matched our text.

In [None]:
%%sql

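SELECT note_text, MATCH(note_text) AGAINST('rabbit') AS `rank` -- RANK is a reserved word in MySQL 8.0, hence the backticks
FROM productnotes
HAVING `rank` > 0 -- a SELECT alias cannot be referenced in WHERE, so HAVING is used
ORDER BY `rank`;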

Rank = 0 means that the search term was not found. *Query Expansion* is another useful feature which also returns rows that do not contain an exact match for the search term. The steps performed in this case are:
- First, a basic full-text search is performed to find all rows that match the search criteria.
- Next, MySQL examines those matched rows and selects all useful words.
- Then, MySQL performs the full-text search again, this time using not just the original criteria, but also all of the useful words.

In [None]:
%%sql

SELECT note_text
FROM productnotes
WHERE MATCH(note_text) AGAINST('anvils' WITH QUERY EXPANSION);

There is another mode called *Boolean Mode*.

### Inserting Values
**Select and Insert:** Data obtained from a select query can be inserted using the `INSERT INTO ... SELECT` statement.

In [None]:
%%sql

INSERT INTO employee
SELECT * FROM dup_employee

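In the above query the employee and dup_employee tables must have the same structure (the same number of columns, with compatible types, in the same order). One problem with the above statement is key collisions: if dup_employee contains a primary or unique key value that already exists in employee, the statement fails with a duplicate-key error. Existing rows are only overwritten when `REPLACE` or `INSERT ... ON DUPLICATE KEY UPDATE` is used instead.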
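
What happens when the two tables share a key value can be tried out on two tiny tables. This is a minimal sketch, assuming Python's built-in `sqlite3` module as a stand-in for MySQL (the error class and message differ in MySQL); the two-column table layout is hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE dup_employee (emp_no INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Georgi"), (2, "Bezalel")])
conn.executemany("INSERT INTO dup_employee VALUES (?, ?)", [(2, "Bezalel"), (3, "Parto")])
conn.commit()

try:
    # Naming the columns avoids relying on identical column order.
    conn.execute(
        "INSERT INTO employee (emp_no, name) SELECT emp_no, name FROM dup_employee"
    )
    clash = False
except sqlite3.IntegrityError:
    clash = True  # emp_no 2 already exists, so the whole statement is rejected

# Copy only the rows whose key is not present in the target yet.
conn.execute(
    """
    INSERT INTO employee (emp_no, name)
    SELECT d.emp_no, d.name
    FROM dup_employee d
    WHERE NOT EXISTS (SELECT 1 FROM employee e WHERE e.emp_no = d.emp_no)
    """
)
names = conn.execute("SELECT emp_no, name FROM employee ORDER BY emp_no").fetchall()

print(clash, names)
```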

**Loading Data From Files:** If we have text files with tabulated data, we can load them into MySQL using the `LOAD DATA` statement. The statement accepts a number of options to specify:
- Whether the file is on the MySQL server, or is present on the client system and needs to be transferred to the server
- The structure of the file: delimiter, newline character
- Lines to ignore

The below statement copies a CSV file from client (notice the LOCAL keyword) to be loaded into the table. Since it is a CSV file we also define its structure

In [None]:
%%sql

LOAD DATA LOCAL INFILE '/tmp/emp_data.csv'
INTO TABLE employee
FIELDS TERMINATED BY ','
ENCLOSED BY '"' -- For string data inside CSV
LINES TERMINATED BY '\n' -- \r\n on Windows (most likely)
IGNORE 1 LINES; -- Ignore the first CSV header line

**Inserting Multiple Rows:** the maximum number of rows one can insert using `INSERT INTO ... VALUES` depends upon the following variable

In [6]:
%%sql

SHOW VARIABLES LIKE 'max_allowed_packet'

 * mysql+mysqldb://root:***@localhost/employees
1 rows affected.


Variable_name,Value
max_allowed_packet,4194304


The value is in bytes, so the above result means a single packet can carry up to 4MB. We can therefore insert any number of rows as long as the size of the insert statement stays below 4MB.

In [None]:
%%sql

INSERT INTO learning.data VALUES
('C', 'M', 5),
('C', 'M', 2),
('D', 'Z', 2);