# Structured Query Language (SQL)

## Where To Learn SQL

**1. SQLZoo: The Wikipedia of SQL Learning** https://sqlzoo.net/wiki/SQL_Tutorial

**2. SQLBolt**
https://sqlbolt.com/

## Database Management Systems (DBMS)

[Comparing MySQL, PostgreSQL, and MongoDB](https://vercel.com/guides/mysql-vs-postgresql-vs-mongodb)

**1. MySQL**

**2. PostgreSQL**

**3. InfluxDB**

**4. MongoDB**

## Process for planning and designing a database before you create the SQL Database and Tables

**1. Identify and understand the main objects in your project**

Define the table names, columns, data types, and relationships between tables.

Example:
* Machine name: Pump Test Bench
* Oil Temperature: °C
* Hydraulic System Pressure: bar
* Hydraulic System Flow Rate: l/min
* Electric Motor Speed: RPM
* Ambient Temperature: °C
* Humidity: rH
* Ambient Noise: 
* Vibration: 

**2. Ask questions**

Example:
* What is the relationship between A and B?
* If A changes, does B stay the same or not?
* What is the impact?

**3. Sketch the tables**

Example:
* Lucid: https://www.lucidchart.com/pages/database-diagram/database-design
* Sqldbm: https://sqldbm.com/

**4. Draft the SQL statements that will create your database**

`CREATE TABLE`, `DROP TABLE`, `INSERT INTO`, `SELECT`, `JOIN`, `WHERE`
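As a rough illustration of the planning steps above, the pump test bench measurements listed earlier could be drafted as a single table. This is only a sketch using Python's built-in `sqlite3` module with an in-memory database. The table name `pump_test_bench`, every column name, and the inserted reading are assumptions made up for this example.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind
conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per reading, with the unit encoded in each column name
conn.execute("""
    CREATE TABLE pump_test_bench (
        id              INTEGER PRIMARY KEY,
        recorded_at     TEXT NOT NULL,
        oil_temp_c      REAL,
        pressure_bar    REAL,
        flow_rate_l_min REAL,
        motor_speed_rpm REAL,
        ambient_temp_c  REAL,
        humidity_rh     REAL
    )
""")

# Insert one made-up reading; the ? placeholders keep values separate from the SQL text
conn.execute(
    "INSERT INTO pump_test_bench "
    "(recorded_at, oil_temp_c, pressure_bar, flow_rate_l_min, "
    "motor_speed_rpm, ambient_temp_c, humidity_rh) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("2024-01-01 12:00:00", 45.2, 180.0, 32.5, 1450.0, 22.0, 40.0),
)
conn.commit()

# Read back a few columns to confirm the row was stored
rows = conn.execute(
    "SELECT id, oil_temp_c, pressure_bar FROM pump_test_bench"
).fetchall()
print(rows)
```

Putting the unit in the column name is one possible design choice; a separate units table would be another.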


* To retrieve data from a SQL database, we need to write `SELECT` statements (referred to as *queries*).

    Given a table of data, we can query for specific columns:

    `SELECT column, another_column, ... FROM mytable`

    We can query all columns of data from a table:

    `SELECT * FROM mytable`

* To filter which rows are returned, we use a `WHERE` clause in the query.

    `SELECT column, another_column, ... FROM mytable WHERE condition AND/OR another_condition AND/OR ...;`

    Below are some useful operators for numerical data (integer or floating point):

In [2]:
import pandas as pd

# None disables truncation so the full text of every cell is displayed
pd.set_option("display.max_colwidth", None)

data = [
("=, !=, <, <=, >, >=",	"Standard numerical operators",	"col_name != 4"),
("BETWEEN … AND …",	"Number is within range of two values (inclusive)",	"col_name BETWEEN 1.5 AND 10.5"),
("NOT BETWEEN … AND …",	"Number is not within range of two values (inclusive)",	"col_name NOT BETWEEN 1 AND 10"),
("IN (…)",	"Number exists in a list",	"col_name IN (2, 4, 6)"),
("NOT IN (…)",	"Number does not exist in a list", "col_name NOT IN (1, 3, 5)")
]

df = pd.DataFrame(data, columns=["Operator", "Condition", "SQL Example"])

df

| | Operator | Condition | SQL Example |
|---|---|---|---|
| 0 | `=, !=, <, <=, >, >=` | Standard numerical operators | `col_name != 4` |
| 1 | `BETWEEN … AND …` | Number is within range of two values (inclusive) | `col_name BETWEEN 1.5 AND 10.5` |
| 2 | `NOT BETWEEN … AND …` | Number is not within range of two values (inclusive) | `col_name NOT BETWEEN 1 AND 10` |
| 3 | `IN (…)` | Number exists in a list | `col_name IN (2, 4, 6)` |
| 4 | `NOT IN (…)` | Number does not exist in a list | `col_name NOT IN (1, 3, 5)` |


Examples:

1. Find the temperature with a row **Id** of 6 --> `SELECT * FROM temp_table WHERE Id = 6;`

2. Find the price in the **year**s between 2000 and 2010 --> `SELECT * FROM sale_table WHERE year BETWEEN 2000 AND 2010;`

3. Find the price not in the **year**s between 2000 and 2010 --> `SELECT * FROM sale_table WHERE year NOT BETWEEN 2000 AND 2010;`

4. Find the first 5 hydraulic oil pressure readings and their timestamps (assuming **Id** values start at 1 with no gaps) --> `SELECT * FROM pressure_table WHERE Id BETWEEN 1 AND 5;`
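The numerical operators above can be tried out with Python's built-in `sqlite3` module. The following is a minimal sketch: the `pressure_bar` column and the six readings are made up for illustration, and only the table name `pressure_table` comes from the examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pressure_table (Id INTEGER PRIMARY KEY, pressure_bar REAL)")

# Six made-up readings; SQLite assigns Id values 1 to 6 automatically
conn.executemany(
    "INSERT INTO pressure_table (pressure_bar) VALUES (?)",
    [(150.0,), (160.0,), (170.0,), (180.0,), (190.0,), (200.0,)],
)

def ids(sql):
    """Run a query and return the first column of each row as a list."""
    return [row[0] for row in conn.execute(sql)]

# BETWEEN is inclusive on both ends
between_ids = ids("SELECT Id FROM pressure_table WHERE Id BETWEEN 2 AND 4 ORDER BY Id")

# IN matches any value in the list
in_list_ids = ids("SELECT Id FROM pressure_table WHERE Id IN (1, 3, 5) ORDER BY Id")

# Conditions can be combined with AND / OR
extreme_ids = ids(
    "SELECT Id FROM pressure_table "
    "WHERE pressure_bar < 160 OR pressure_bar >= 190 ORDER BY Id"
)

print(between_ids, in_list_ids, extreme_ids)
```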

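* When writing **WHERE** clauses with columns containing text data, SQL supports operators for exact string comparison and wildcard pattern matching. Whether a comparison is case sensitive depends on the DBMS and its collation settings; the table below matches SQLite's defaults, where `=` is case sensitive and `LIKE` is case insensitive for ASCII characters.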

    A few common text-data specific operators:

In [3]:
data = [
("=",	"Case sensitive exact string comparison (notice the single equals)",	"col_name = 'abc'"),
("!= or <>",	"Case sensitive exact string inequality comparison",	"col_name != 'abcd'"),
("LIKE",	"Case insensitive exact string comparison",	"col_name LIKE 'ABC'"),
("NOT LIKE",	"Case insensitive exact string inequality comparison",	"col_name NOT LIKE 'ABCD'"),
("%",	"Used anywhere in a string to match a sequence of zero or more characters (only with LIKE or NOT LIKE)",	"col_name LIKE '%AT%' (matches 'AT', 'ATTIC', 'CAT' or even 'BATS')"),
("_",	"Used anywhere in a string to match a single character (only with LIKE or NOT LIKE)",	"col_name LIKE 'AN_' (matches 'AND', but not 'AN')"),
("IN (…)",	"String exists in a list",	"col_name IN ('A', 'B', 'C')"),
("NOT IN (…)",	"String does not exist in a list",	"col_name NOT IN ('D', 'E', 'F')")
]

df = pd.DataFrame(data, columns=["Operator", "Condition", "SQL Example"])

df

| | Operator | Condition | SQL Example |
|---|---|---|---|
| 0 | `=` | Case sensitive exact string comparison (notice the single equals) | `col_name = 'abc'` |
| 1 | `!=` or `<>` | Case sensitive exact string inequality comparison | `col_name != 'abcd'` |
| 2 | `LIKE` | Case insensitive exact string comparison | `col_name LIKE 'ABC'` |
| 3 | `NOT LIKE` | Case insensitive exact string inequality comparison | `col_name NOT LIKE 'ABCD'` |
| 4 | `%` | Used anywhere in a string to match a sequence of zero or more characters (only with LIKE or NOT LIKE) | `col_name LIKE '%AT%'` (matches 'AT', 'ATTIC', 'CAT' or even 'BATS') |
| 5 | `_` | Used anywhere in a string to match a single character (only with LIKE or NOT LIKE) | `col_name LIKE 'AN_'` (matches 'AND', but not 'AN') |
| 6 | `IN (…)` | String exists in a list | `col_name IN ('A', 'B', 'C')` |
| 7 | `NOT IN (…)` | String does not exist in a list | `col_name NOT IN ('D', 'E', 'F')` |


Examples:

1. Find only the Toy Story movie --> `SELECT * FROM movies WHERE Title = 'Toy Story';`

2. Find all the Toy Story movies --> `SELECT * FROM movies WHERE Title LIKE 'Toy Story%';`

3. Find all the projects by Farees --> `SELECT * FROM projects WHERE Name = 'FAREES';`
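The text operators can be checked the same way. This sketch relies on SQLite's default behaviour (`=` case sensitive, `LIKE` case insensitive for ASCII), so other database systems may return different rows. The lower-case title `toy story 3` is deliberately made up to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (Id INTEGER PRIMARY KEY, Title TEXT)")
conn.executemany(
    "INSERT INTO movies (Title) VALUES (?)",
    [("Toy Story",), ("Toy Story 2",), ("toy story 3",), ("Cars",)],
)

def titles(sql):
    """Run a query and return the Title column as a list, in Id order."""
    return [row[0] for row in conn.execute(sql + " ORDER BY Id")]

# '=' is an exact, case-sensitive match in SQLite
exact = titles("SELECT Title FROM movies WHERE Title = 'Toy Story'")

# LIKE with % matches any suffix and ignores ASCII case in SQLite
prefix = titles("SELECT Title FROM movies WHERE Title LIKE 'toy story%'")

# _ matches exactly one character
one_char = titles("SELECT Title FROM movies WHERE Title LIKE 'Car_'")

print(exact, prefix, one_char)
```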

* SQL provides a convenient way to discard rows that have *duplicate* values in the selected columns by using the **DISTINCT** keyword.

    Select query with unique results:

    `SELECT DISTINCT column, another_column, ... FROM mytable WHERE condition(s);`

### Ordering Results

* SQL provides a way to sort the results of a query by a given column in *ascending* or *descending* order using the **ORDER BY** clause.

    Select query with ordered results:

    `SELECT column, another_column, ... FROM mytable WHERE condition(s) ORDER BY column ASC/DESC;`

    When an **ORDER BY** clause is specified, each row is sorted alpha-numerically based on the specified column's value.


### Limiting Results to a Subset

* **LIMIT** and **OFFSET** clauses are a useful optimization to indicate to the database the subset of the results you care about.

    * **LIMIT** reduces the number of rows to return

    * The optional **OFFSET** specifies where to begin counting the number of rows from

    Select query with limited rows:

    `SELECT column, another_column, ... FROM mytable WHERE condition(s) ORDER BY column ASC/DESC LIMIT num_limit OFFSET num_offset;`

Examples: 

`SELECT DISTINCT Director FROM movies ORDER BY Director;`

`SELECT title, year FROM movies ORDER BY year DESC LIMIT 4;`

`SELECT Title, Year FROM movies ORDER BY Title ASC LIMIT 5;`

`SELECT Title, Year FROM movies ORDER BY title ASC LIMIT 5 OFFSET 5;`
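To see how **DISTINCT**, **ORDER BY**, **LIMIT** and **OFFSET** interact, here is a small sketch using `sqlite3`. The `movies` table is reduced to three assumed columns (`Title`, `Director`, `Year`) and six rows, which is enough to show the effect of each clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (Title TEXT, Director TEXT, Year INTEGER)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        ("Toy Story", "John Lasseter", 1995),
        ("A Bug's Life", "John Lasseter", 1998),
        ("Monsters, Inc.", "Pete Docter", 2001),
        ("Finding Nemo", "Andrew Stanton", 2003),
        ("The Incredibles", "Brad Bird", 2004),
        ("Cars", "John Lasseter", 2006),
    ],
)

def first_column(sql):
    """Run a query and return the first column of each row as a list."""
    return [row[0] for row in conn.execute(sql)]

# DISTINCT collapses the three John Lasseter rows into one
directors = first_column("SELECT DISTINCT Director FROM movies ORDER BY Director")

# ORDER BY ... DESC with LIMIT keeps only the two most recent movies
newest_two = first_column("SELECT Title FROM movies ORDER BY Year DESC LIMIT 2")

# OFFSET 2 skips the first two titles, so this is the second "page" of two
page_two = first_column("SELECT Title FROM movies ORDER BY Title ASC LIMIT 2 OFFSET 2")

print(directors, newest_two, page_two)
```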

### Simple **SELECT** Queries

Examples:

`SELECT * FROM north_american_cities WHERE country = 'Canada';`

`SELECT * FROM north_american_cities WHERE country = 'United States' ORDER BY Latitude DESC;`

`SELECT city, longitude FROM north_american_cities WHERE longitude < -87.629798 ORDER BY longitude ASC;`

`SELECT * FROM north_american_cities WHERE country = 'Mexico' ORDER BY population DESC LIMIT 2;`

`SELECT * FROM north_american_cities WHERE country = 'United States' ORDER BY population DESC LIMIT 2 OFFSET 2;`

### Database Normalization

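* Normalization splits data across related tables so that each fact is stored only once. This minimizes duplicate data in any single table and allows different kinds of data in the database to grow independently of each other.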
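A minimal sketch of the idea, assuming two hypothetical tables: `buildings` stores each building's capacity once, and `employees` only refers to a building by name. All names and values here are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each building's capacity is stored in exactly one row
conn.execute("CREATE TABLE buildings (building_name TEXT PRIMARY KEY, capacity INTEGER)")

# Employees reference a building by name instead of repeating its capacity
conn.execute(
    "CREATE TABLE employees ("
    "name TEXT, building TEXT REFERENCES buildings(building_name))"
)

conn.execute("INSERT INTO buildings VALUES ('1e', 24)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Becky", "1e"), ("Dan", "1e")],
)

# Changing the capacity touches one row, no matter how many employees work there
cursor = conn.execute("UPDATE buildings SET capacity = 30 WHERE building_name = '1e'")
updated_rows = cursor.rowcount

capacity = conn.execute(
    "SELECT capacity FROM buildings WHERE building_name = '1e'"
).fetchone()[0]
print(updated_rows, capacity)
```

Had the capacity been stored on every employee row instead, the same change would need one update per employee, with the risk of rows drifting out of sync.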

### Multi-table Queries with JOINs

* Tables that share information about a single entity need to have a *primary key* that identifies that entity *uniquely* across the database.

* One common primary key type is an auto-incrementing integer, but it can also be a string or a hashed value, so long as it is unique.

* Using the **JOIN** clause in a query, we can combine row data across two separate tables using this unique key.

* Select query with **INNER JOIN** on multiple tables

        `SELECT column, another_table_column,...`

        `FROM mytable`

        `INNER JOIN another_table`

                `ON mytable.id = another_table.id`

        `WHERE condition(s)`

        `ORDER BY column, ... ASC/DESC`

        `LIMIT num_limit OFFSET num_offset;`

* The **INNER JOIN** is a process that matches rows from the first table and the second table which have the same key (as defined by the **ON** constraint) to create a result row with the combined columns from both tables. Other clauses will be applied after the tables are joined.

Examples:

`SELECT id, title, domestic_sales, international_sales FROM movies
INNER JOIN boxoffice ON movies.id = boxoffice.movie_id;`

`SELECT id, title, domestic_sales, international_sales FROM movies
INNER JOIN boxoffice ON movies.id = boxoffice.movie_id
WHERE international_sales > domestic_sales;`

`SELECT title, rating FROM movies
INNER JOIN boxoffice ON movies.id = boxoffice.movie_id
ORDER BY rating DESC;`
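The following sketch shows what an **INNER JOIN** keeps and what it drops. The `movies` and `boxoffice` tables are cut down to the columns used here, and the sales figures are small made-up numbers rather than real box office data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE boxoffice ("
    "movie_id INTEGER, domestic_sales INTEGER, international_sales INTEGER)"
)

# "Up" has no matching boxoffice row on purpose
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [(1, "Toy Story"), (2, "Cars"), (3, "Up")],
)
conn.executemany(
    "INSERT INTO boxoffice VALUES (?, ?, ?)",
    [(1, 100, 200), (2, 300, 250)],
)

# Only rows with a match in BOTH tables appear in the result
joined = conn.execute(
    "SELECT title, domestic_sales, international_sales "
    "FROM movies INNER JOIN boxoffice ON movies.id = boxoffice.movie_id "
    "ORDER BY title"
).fetchall()

# WHERE is applied after the tables are joined
international_hits = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM movies "
        "INNER JOIN boxoffice ON movies.id = boxoffice.movie_id "
        "WHERE international_sales > domestic_sales"
    )
]

print(joined, international_hits)
```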

* If the two tables have asymmetric data (rows in one table without a match in the other), we would use a **LEFT JOIN**, **RIGHT JOIN** or **FULL JOIN** to keep the unmatched rows:

        `SELECT column, another_table_column,...`

        `FROM mytable`

        `INNER/LEFT/RIGHT/FULL JOIN another_table`

                `ON mytable.id = another_table.matching_id`

        `WHERE condition(s)`

        `ORDER BY column, ... ASC/DESC`

        `LIMIT num_limit OFFSET num_offset;`

Examples: 

`SELECT DISTINCT building, building_name FROM employees
LEFT JOIN buildings ON employees.building = buildings.building_name;`

`SELECT * FROM buildings;`

`SELECT DISTINCT building_name, role 
FROM buildings 
  LEFT JOIN employees
    ON building_name = building;`

* You can test a column for **NULL** values in a **WHERE** clause by using either the **IS NULL** or **IS NOT NULL** constraint.

    `SELECT column, another_column FROM mytable WHERE column IS/IS NOT NULL AND/OR another_condition AND/OR ...;`

    Examples:

    `SELECT name, role FROM employees WHERE building IS NULL;`

    `SELECT building_name FROM buildings LEFT JOIN employees ON building_name = building WHERE building IS NULL;`
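A short sketch of how **LEFT JOIN** and **IS NULL** work together, using `sqlite3`. The building names and employee names are invented, and the tables contain only the columns needed for the queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE buildings (building_name TEXT)")
conn.execute("CREATE TABLE employees (name TEXT, building TEXT)")

# Building '3n' has no employees, and Yan has no building (NULL)
conn.executemany("INSERT INTO buildings VALUES (?)", [("1e",), ("2w",), ("3n",)])
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Becky", "1e"), ("Dan", "2w"), ("Yan", None)],
)

# LEFT JOIN keeps every building; missing employee columns come back as NULL (None)
left_joined = conn.execute(
    "SELECT building_name, name FROM buildings "
    "LEFT JOIN employees ON building_name = building "
    "ORDER BY building_name"
).fetchall()

# Filtering on IS NULL after the join finds buildings without employees
empty_buildings = [
    row[0]
    for row in conn.execute(
        "SELECT building_name FROM buildings "
        "LEFT JOIN employees ON building_name = building "
        "WHERE building IS NULL"
    )
]

# IS NULL on a single table finds employees without a building
unassigned = [
    row[0] for row in conn.execute("SELECT name FROM employees WHERE building IS NULL")
]

print(left_joined, empty_buildings, unassigned)
```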



### Queries with expressions

* We can use *expressions* to write more complex logic on column values in a query.

* These expressions can use mathematical and string functions along with basic arithmetic to transform values when the query is executed.

* Example: `SELECT particle_speed / 2.0 AS half_particle_speed FROM physics_data WHERE ABS(particle_position) * 10.0 > 500;`

* When expressions are used in the **SELECT** part of the query, it is good practice to give them a descriptive *alias* using the **AS** keyword.

* Example: `SELECT col_expression AS expr_description, ... FROM mytable;`

* Regular columns and even tables can also have aliases to make them easier to reference in the output and as a part of simplifying more complex queries:

* Example: `SELECT column AS better_column_name, ... FROM a_long_widgets_table_name AS mywidgets INNER JOIN widget_sales ON mywidgets.id = widget_sales.widget_id;`

Examples:

* List all movies and their combined sales in **millions** of dollars: `SELECT title, domestic_sales, international_sales, (domestic_sales + international_sales)/1000000 AS combined_sales_in_mil 
FROM boxoffice INNER JOIN movies ON id = movie_id;`

* List all movies and their ratings in **percent**: `SELECT id, title, rating, rating * 10 AS rating_in_percent FROM movies 
INNER JOIN boxoffice ON id = movie_id ORDER BY rating DESC;`

* List all movies that were released in even-numbered years: `SELECT title, year FROM movies WHERE year % 2 = 0;`
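Expressions and aliases can be sketched with `sqlite3` as follows. The sales figures are made up, and the divisor is written as `1000000.0` on the assumption that a fractional result is wanted: in SQLite, dividing two integers discards the remainder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.execute(
    "CREATE TABLE boxoffice ("
    "movie_id INTEGER, domestic_sales INTEGER, international_sales INTEGER)"
)
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [(1, "Toy Story", 1995), (2, "A Bug's Life", 1998)],
)
# Made-up sales figures, in dollars
conn.executemany(
    "INSERT INTO boxoffice VALUES (?, ?, ?)",
    [(1, 150000000, 250000000), (2, 100000000, 50000000)],
)

# The expression is evaluated per row and exposed under the alias given with AS
cursor = conn.execute(
    "SELECT title, (domestic_sales + international_sales) / 1000000.0 "
    "AS combined_sales_in_mil "
    "FROM boxoffice INNER JOIN movies ON id = movie_id ORDER BY title"
)
alias = cursor.description[1][0]  # name of the second result column
combined = cursor.fetchall()

# Expressions also work in WHERE: keep movies from even-numbered years
even_years = [
    row[0] for row in conn.execute("SELECT title FROM movies WHERE year % 2 = 0")
]

print(alias, combined, even_years)
```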

### Queries with aggregates

* SQL also supports the use of aggregate expressions (or functions) to summarize information about a group of rows of data

* Select query with aggregate functions over all rows:

    `SELECT AGG_FUNC(column_or_expression) AS aggregate_description, ... FROM mytable WHERE constraint_expression;`

* Each aggregate function is going to run on the whole set of result rows and return a single value

* Here are some common aggregate functions:

In [4]:
data = [
    ("COUNT(*), COUNT(column)", "A common function used to count the number of rows in the group if no column name is specified. Otherwise, count the number of rows in the group with non-NULL values in the specified column."),
    ("MIN(column)", "Finds the smallest numerical value in the specified column for all rows in the group."),
    ("MAX(column)", "Finds the largest numerical value in the specified column for all rows in the group."),
    ("AVG(column)", "Finds the average numerical value in the specified column for all rows in the group."),
    ("SUM(column)", "Finds the sum of all numerical values in the specified column for the rows in the group.")
]

df = pd.DataFrame(data, columns=["Function", "Description"])

df

| | Function | Description |
|---|---|---|
| 0 | `COUNT(*)`, `COUNT(column)` | A common function used to count the number of rows in the group if no column name is specified. Otherwise, count the number of rows in the group with non-NULL values in the specified column. |
| 1 | `MIN(column)` | Finds the smallest numerical value in the specified column for all rows in the group. |
| 2 | `MAX(column)` | Finds the largest numerical value in the specified column for all rows in the group. |
| 3 | `AVG(column)` | Finds the average numerical value in the specified column for all rows in the group. |
| 4 | `SUM(column)` | Finds the sum of all numerical values in the specified column for the rows in the group. |


* Instead of aggregating across all rows, you can apply the aggregate functions to individual groups of rows.

* This would then create as many results as there are unique groups defined by the **GROUP BY** clause.

    `SELECT AGG_FUNC(column_or_expression) AS aggregate_description, ... FROM mytable WHERE constraint_expression GROUP BY column;`

* The **GROUP BY** clause works by grouping rows that have the same value in the column specified

* Examples:

    * Find the longest time that an employee has been working: `SELECT MAX(years_employed) AS longest_employee FROM employees;`

    * For each role, find the average number of years employed by employees in that role: `SELECT role, AVG(years_employed) FROM employees GROUP BY role;`

    * Find the total number of employee years working in each building: `SELECT building, SUM(years_employed) FROM employees GROUP BY building;`

* An additional **HAVING** clause, used specifically with the **GROUP BY** clause, allows us to filter grouped rows from the result set.

    `SELECT group_by_column, AGG_FUNC(column_expression) AS aggregate_result_alias, ...` 
    
    `FROM mytable` 
    
    `WHERE condition`

    `GROUP BY column`

    `HAVING group_condition;`

* Examples:

    * Find the number of Artists in the studio (without a **HAVING** clause): `SELECT COUNT(*) FROM employees WHERE role = 'Artist';`

    * Find the number of Employees of each role in the studio: `SELECT role, COUNT(*) FROM employees GROUP BY role;`

    * Find the total number of years employed by all Engineers: `SELECT SUM(years_employed) FROM employees WHERE role = 'Engineer';` **OR** `SELECT role, SUM(years_employed) FROM employees GROUP BY role HAVING role = 'Engineer';`
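The aggregate queries above can be tried on a tiny `employees` table. The five rows below are invented for this sketch. Note that **WHERE** filters rows before grouping, while **HAVING** filters the groups afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "role TEXT, name TEXT, building TEXT, years_employed INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        ("Engineer", "Becky", "1e", 4),
        ("Engineer", "Dan", "1e", 2),
        ("Artist", "Tylar", "2w", 3),
        ("Artist", "Sherman", "2w", 5),
        ("Manager", "Yan", "2w", 6),
    ],
)

# Without GROUP BY, the aggregate runs over all rows and returns a single value
longest = conn.execute("SELECT MAX(years_employed) FROM employees").fetchone()[0]

# With GROUP BY, there is one result row per distinct role
avg_by_role = conn.execute(
    "SELECT role, AVG(years_employed) FROM employees GROUP BY role ORDER BY role"
).fetchall()

# One result row per building
total_by_building = conn.execute(
    "SELECT building, SUM(years_employed) FROM employees "
    "GROUP BY building ORDER BY building"
).fetchall()

# HAVING filters the groups: keep only roles held by more than one employee
shared_roles = conn.execute(
    "SELECT role, COUNT(*) FROM employees "
    "GROUP BY role HAVING COUNT(*) > 1 ORDER BY role"
).fetchall()

print(longest, avg_by_role, total_by_building, shared_roles)
```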

## Example in Python

In [5]:
# Example 1: Create an SQLite database to store our book data
import sqlite3

# Create (or open) a SQLite database file called "books-collection.db"
db = sqlite3.connect("books-collection.db")

# Create a cursor, which is used to run SQL statements against the database
cursor = db.cursor()

# Create a table called "books"; IF NOT EXISTS makes this cell safe to re-run
cursor.execute(
    "CREATE TABLE IF NOT EXISTS books ("
    "id INTEGER PRIMARY KEY, "
    "title VARCHAR(250) NOT NULL UNIQUE, "
    "author VARCHAR(250) NOT NULL, "
    "rating FLOAT NOT NULL)"
)

# Insert a row using ? placeholders; OR IGNORE skips it if it already exists
cursor.execute(
    "INSERT OR IGNORE INTO books VALUES (?, ?, ?, ?)",
    (2, "Elon Musk", "Ashlee Vance", 9.5),
)
db.commit()
db.close()
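As a possible follow-up, the stored row can be read back and the `UNIQUE` constraint on `title` can be observed. This sketch uses an in-memory database instead of the `books-collection.db` file so that it can be run repeatedly. The second insert, with the author "Someone Else", is a made-up row that exists only to trigger the constraint.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# sqlite3.Row lets columns be accessed by name instead of by position
db.row_factory = sqlite3.Row

db.execute(
    "CREATE TABLE books ("
    "id INTEGER PRIMARY KEY, "
    "title VARCHAR(250) NOT NULL UNIQUE, "
    "author VARCHAR(250) NOT NULL, "
    "rating FLOAT NOT NULL)"
)
db.execute(
    "INSERT INTO books VALUES (?, ?, ?, ?)",
    (2, "Elon Musk", "Ashlee Vance", 9.5),
)
db.commit()

# Read the row back, filtering by primary key
row = db.execute(
    "SELECT title, author, rating FROM books WHERE id = ?", (2,)
).fetchone()
book = dict(row)

# A second book with the same title violates the UNIQUE constraint
try:
    db.execute(
        "INSERT INTO books VALUES (?, ?, ?, ?)",
        (3, "Elon Musk", "Someone Else", 8.0),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(book, duplicate_rejected)
db.close()
```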