# Advanced databases

## Data Query Language - DQL
### dr  inż. Waldemar Bauer

## SQL Standard


- Structured Query Language (SQL) is a database query language used for storing and managing data in relational DBMSs.

- SQL is an ANSI/ISO standard, but different versions (dialects) of the language exist.

- The major commands and clauses, such as SELECT, UPDATE, DELETE, and WHERE, work similarly across SQL dialects.

- Most SQL database engines also have their own proprietary extensions in addition to the SQL standard.
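
As one illustration of dialect differences, the two queries below both limit a result to three rows. This is only a sketch and assumes the `actor` table from the sample database used later in these slides. `FETCH FIRST` comes from the SQL standard, while `LIMIT` is an extension that PostgreSQL accepts as well.

```sql
-- Standard SQL way to cap the number of rows; PostgreSQL accepts it
SELECT first_name, last_name
FROM actor
ORDER BY actor_id
FETCH FIRST 3 ROWS ONLY;

-- LIMIT is not part of the SQL standard, but has the same effect in PostgreSQL
SELECT first_name, last_name
FROM actor
ORDER BY actor_id
LIMIT 3;
```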

## SQL Commands

1. DDL: Data Definition Language
    - create
    - alter
    - truncate
    - drop
    - rename
2. DML: Data Manipulation Language
    - insert	
    - update	
    - delete
3. TCL: Transaction Control Language
    - commit
    - rollback
    - savepoint
4. DQL: Data Query Language
    - select
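
The four groups above can be seen side by side in a short session. The sketch below uses a hypothetical table named `demo_note`, which is not part of the tutorial database and is dropped again at the end.

```sql
-- DDL: define a (hypothetical) table
CREATE TABLE demo_note (id integer PRIMARY KEY, body text);

-- TCL and DML: change data inside a transaction
BEGIN;
INSERT INTO demo_note (id, body) VALUES (1, 'first');
SAVEPOINT before_update;
UPDATE demo_note SET body = 'changed' WHERE id = 1;
ROLLBACK TO SAVEPOINT before_update;  -- undoes only the UPDATE
COMMIT;                               -- keeps the INSERT

-- DQL: read the data back; the row still contains 'first'
SELECT id, body FROM demo_note;

-- DDL: remove the table again
DROP TABLE demo_note;
```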

## Select - basic form

```sql
SELECT [DISTINCT | ALL] { * | fieldExpression [AS newName] [, ...] }
FROM tableName [alias]
[WHERE condition]
[GROUP BY fieldName(s)]
[HAVING condition]
[ORDER BY fieldName(s)]
```
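
To make the template concrete, here is a sketch of a query that uses every clause of the basic form. It assumes the `actor` table from the tutorial database shown in the following slides; the aliases `a`, `surname`, and `actors` are chosen only for illustration.

```sql
SELECT last_name AS surname, count(*) AS actors
FROM actor a
WHERE a.actor_id <= 100          -- filter rows before grouping
GROUP BY last_name               -- one group per last name
HAVING count(*) > 1              -- keep only last names shared by several actors
ORDER BY actors DESC, surname;   -- sort the groups
```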

## Select - PostgreSQL form
<img src="./img/select_pg.png" width="40%" height="40%">

[source](https://www.postgresql.org/docs/current/sql-select.html)

## Tutorial database
<img src='./img/dvd-rental-sample-database-diagram.png' width="30%" height="30%">

## Select - first queries

**Query 1**
```sql
select 'a'
```
**result:**

| ?column?, text 	|
|:---------------:	|
|       "a"       	|

**Query 2**
```sql
select 4-(4+4)*4
```
**result:**

|  ?column?, integer 	|
|:---------------------:|
|         -28         	|
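
The result type of such a calculation depends on the operand types. A small sketch (the column aliases are illustrative):

```sql
select 7 / 2   as int_division,      -- 3: both operands are integers, so the result is truncated
       7 / 2.0 as numeric_division,  -- 3.5 as numeric (displayed with trailing zeros)
       7 % 2   as remainder;         -- 1
```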


## Alias

**Query 1**
```sql
select 'a' as "char"
```
**result:**

| char, text 	|
|:---------------:	|
|       "a"       	|

**Query 2**
```sql
select 4-(4+4)*4 as "calculation"
```
**result:**

|  calculation, integer 	|
|:---------------------:|
|         -28         	|
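
The double quotes around an alias matter in PostgreSQL: unquoted identifiers are folded to lower case, while quoted ones keep their case and may contain spaces. A short sketch with made-up aliases:

```sql
select 4-(4+4)*4 as Calculation,    -- column is named calculation (folded to lower case)
       4-(4+4)*4 as "Calculation",  -- column is named Calculation
       4-(4+4)*4 as "with space";   -- quoting allows spaces in the name
```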

## Select and function

**Query 1**
```sql
select upper('Anneth')
```
**result:**

|   **upper**, text  |
|:-----------------:|
|       "ANNETH"    |

**Query 2**
```sql
select sqrt (4*4*4*4)
```
**result:**

|  **sqrt**, double precision|
|:-------------------------:|
|         16            	|
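
Function calls can be nested and combined in one select list. The sketch below uses only built-in PostgreSQL functions; the cast to `numeric` is there because the two-argument `round` is defined for `numeric`, not for `double precision`.

```sql
select upper(concat('ann', 'eth'))  as upper_name,    -- ANNETH
       length('Anneth')             as name_length,   -- 6
       round(sqrt(2)::numeric, 3)   as rounded_root;  -- 1.414
```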

## Select from table

Select all columns from a table:

```sql
select * from actor
```

**result:**

| actor_id 	|  first_name 	|   last_name  	|       last_update      	|
|:--------:	|:-----------:	|:------------:	|:----------------------:	|
|     1    	|   Penelope  	|    Guiness   	| 2013-05-26 14:47:57.62 	|
|     2    	|     Nick    	|   Wahlberg   	| 2013-05-26 14:47:57.62 	|
|     3    	|      Ed     	|     Chase    	| 2013-05-26 14:47:57.62 	|
|     4    	|   Jennifer  	|     Davis    	| 2013-05-26 14:47:57.62 	|
|     5    	|    Johnny   	| Lollobrigida 	| 2013-05-26 14:47:57.62 	|
|     6    	|    Bette    	|   Nicholson  	| 2013-05-26 14:47:57.62 	|
|     7    	|    Grace    	|    Mostel    	| 2013-05-26 14:47:57.62 	|
|     8    	|   Matthew   	|   Johansson  	| 2013-05-26 14:47:57.62 	|
|     ...   |     ...     	|     ...    	|           ...         	|

## Select chosen columns from table

```sql
select first_name, last_name from actor
```
**result:**

|  first_name 	|   last_name  	|
|:-----------:	|:------------:	|
|   Penelope  	|    Guiness   	|
|     Nick    	|   Wahlberg   	|
|      Ed     	|     Chase    	|
|   Jennifer  	|     Davis    	|
|    Johnny   	| Lollobrigida 	|
|    Bette    	|   Nicholson  	|
|    Grace    	|    Mostel    	|
|   Matthew   	|   Johansson  	|
|     ...     	|     ...    	|

## Select chosen columns and concatenate the results

```sql
select Concat('First Name: ',first_name, ' Last Name: ', last_name) as "My text"
from actor
```

**result:**

|             My text                      |
|:-----------------------------------------|
|First Name: Penelope Last Name: Guiness   |
|First Name: Nick Last Name: Wahlberg      |
|First Name: Ed Last Name: Chase           |
|First Name: Jennifer Last Name: Davis     |
|First Name: Johnny Last Name: Lollobrigida|
|First Name: Bette Last Name: Nicholson    |
|First Name: Grace Last Name: Mostel       |
|First Name: Matthew Last Name: Johansson  |
|                 ...                      |

## Select chosen columns and concatenate the results part 2

```sql
select Concat('First Name: ',first_name, ' Last Name: ', last_name) as "My text",

last_update from actor
```

**result:**

| My text                                     	| last_update            	|
|:---------------------------------------------	|:------------------------	|
| First Name: Penelope Last Name: Guiness     	| 2013-05-26 14:47:57.62 	|
| First Name: Nick Last Name: Wahlberg        	| 2013-05-26 14:47:57.62 	|
| First Name: Ed Last Name: Chase             	| 2013-05-26 14:47:57.62 	|
| First Name: Jennifer Last Name: Davis       	| 2013-05-26 14:47:57.62 	|
| First Name: Johnny Last Name: Lollobrigida  	| 2013-05-26 14:47:57.62 	|
| First Name: Bette Last Name: Nicholson      	| 2013-05-26 14:47:57.62 	|
| First Name: Grace Last Name: Mostel         	| 2013-05-26 14:47:57.62 	|
| First Name: Matthew Last Name: Johansson    	| 2013-05-26 14:47:57.62 	|
|...                                            |             ...           |
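
PostgreSQL also offers the `||` operator for concatenation. The two approaches differ in how they treat NULL, which the following self-contained sketch illustrates:

```sql
select concat('a', null, 'b') as with_concat,    -- 'ab': concat ignores NULL arguments
       'a' || null || 'b'     as with_operator;  -- NULL: the operator propagates NULL
```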

## Select limit
```sql
select first_name, last_name from actor limit 10 
```

**result:**

| first_name 	| last_name    	 |
|:------------:	|:--------------:|
| Penelope   	| Guiness      	 |
| Nick       	| Wahlberg     	 |
| Ed         	| Chase        	 |
| Jennifer   	| Davis        	 |
| Johnny     	| Lollobrigida 	 |
| Bette      	| Nicholson    	 |
| Grace      	| Mostel       	 |
| Matthew    	| Johansson    	 |
| Joe        	| Swank        	 |
| Christian  	| Gable        	 |

- the *limit* value must not be negative (0 is allowed and returns no rows)
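
Without ORDER BY, PostgreSQL does not guarantee which rows a LIMIT query returns. A sketch of some variations, assuming that `actor_id` is a suitable sort key:

```sql
-- Sorting by the primary key makes the "first 10" well defined
select first_name, last_name from actor order by actor_id limit 10;

-- LIMIT 0 is allowed: it returns the column headers and no rows
select first_name, last_name from actor limit 0;

-- LIMIT ALL is the same as omitting the clause
select first_name, last_name from actor limit all;
```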

## Select limit and offset

```sql
select first_name, last_name from actor limit 5 offset 5
```
**result:**

| first_name 	| last_name    	 |
|:------------:	|:--------------:|
| Bette      	| Nicholson    	 |
| Grace      	| Mostel       	 |
| Matthew    	| Johansson    	 |
| Joe        	| Swank        	 |
| Christian  	| Gable        	 |

- the *offset* value must not be negative (0 is allowed and skips no rows)
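
A common use of LIMIT with OFFSET is paging. The sketch below fetches the third page with a page size of 5; the page size and the sort key are assumptions made for the example.

```sql
-- Page 3 with 5 rows per page: skip (3 - 1) * 5 = 10 rows
select first_name, last_name
from actor
order by actor_id   -- a stable order keeps the pages consistent
limit 5 offset 10;
```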

## Order by 
 
- Used to sort the result set in ascending (ASC) or descending (DESC) order
- Must be placed before LIMIT
- By default, ORDER BY sorts in ASC mode

```sql
select first_name, last_name from actor order by first_name ASC limit 10
```
**result:**

| first_name 	|  last_name  	|
|:----------:	|:-----------:	|
|    Adam    	|    Grant    	|
|    Adam    	|    Hopper   	|
|     Al     	|   Garland   	|
|    Alan    	|   Dreyfuss  	|
|   Albert   	|  Johansson  	|
|   Albert   	|    Nolte    	|
|    Alec    	|    Wayne    	|
|   Angela   	| Witherspoon 	|
|   Angela   	|    Hudson   	|
|  Angelina  	|   Astaire   	|
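
Rows with the same first name (for example the two rows for Angela) have no defined order relative to each other in the query above. A possible refinement is to add a second sort key, as in this sketch:

```sql
select first_name, last_name
from actor
order by first_name asc, last_name desc   -- ties on first_name are broken by last_name
limit 10;
```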

## Select with distinct

**Query 1**
```sql
SELECT first_name FROM actor
```
Returns 200 first names (one per row, duplicates included)

**Query 2**
```sql
SELECT DISTINCT first_name FROM actor
```
Returns 128 distinct first names
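
Both numbers can also be obtained in a single query with aggregate functions. This is a sketch; the column aliases are illustrative.

```sql
select count(*)                   as all_rows,             -- every row of actor
       count(distinct first_name) as distinct_first_names  -- duplicates counted once
from actor;
```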


## Select with distinct part 2

**Query 1**
```sql
SELECT first_name, last_name FROM actor
```
Returns 200 rows

**Query 2**
```sql
SELECT DISTINCT (first_name, last_name) FROM actor
```
Returns 199 rows, because DISTINCT removes one duplicated (first_name, last_name) pair
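
Note that parentheses around the column list change the shape of the result: PostgreSQL then builds one composite (row) value per row instead of returning separate columns. The sketch below shows the difference on a few made-up values rather than on the `actor` table.

```sql
-- Two result columns, duplicates removed: 2 rows
select distinct first_name, last_name
from (values ('Ann', 'Lee'), ('Ann', 'Lee'), ('Ann', 'Kim')) as names(first_name, last_name);

-- One composite result column, duplicates removed: also 2 rows
select distinct (first_name, last_name)
from (values ('Ann', 'Lee'), ('Ann', 'Lee'), ('Ann', 'Kim')) as names(first_name, last_name);
```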

## Query Explain

- Returns the execution plan that the PostgreSQL planner generates for a given statement.
- Shows how the tables involved in the query will be scanned (sequential scan, index scan, etc.), whether rows will be sorted, and which join algorithm will be used.
- The most important part of the EXPLAIN output is the estimated cost: the start-up cost before the first row can be returned and the total cost of returning the complete result set.

```sql
EXPLAIN [ ( option [, ...] ) ] statement
EXPLAIN [ ANALYZE ] [ VERBOSE ] statement
```


## Explain option

Options:

- ANALYZE [ boolean ] - default FALSE
- VERBOSE [ boolean ] - default FALSE
- COSTS [ boolean ] - default TRUE
- BUFFERS [ boolean ] - default FALSE
- TIMING [ boolean ] - default TRUE (applies only together with ANALYZE)
- SUMMARY [ boolean ] - default TRUE when ANALYZE is used, otherwise FALSE
- FORMAT { TEXT | XML | JSON | YAML } - default TEXT
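
Options in the parenthesized form can be combined freely. As a sketch, the following call hides the cost estimates and asks for JSON output, which can be convenient when the plan is processed by a program:

```sql
EXPLAIN (COSTS FALSE, FORMAT JSON)
select first_name, last_name from actor order by first_name ASC limit 10;
```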

## Explain option part 2

- ANALYZE
    - Causes the statement to be actually executed and then shows the actual run-time statistics next to the estimates.
    - Returns: the total elapsed time spent in each plan node and the number of rows it actually returned.
- VERBOSE
    - Displays additional information about the plan.
    - Returns: the output column list for each node in the plan tree, schema-qualified table and function names, variables in expressions labeled with their range table alias, and the name of each trigger for which statistics are displayed.
- COSTS
    - Returns: the estimated start-up and total cost of each plan node, as well as the estimated number of rows and the estimated width of each row (in bytes).
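
Because ANALYZE really executes the statement, using it on a data-modifying statement changes the data. A common way to avoid that is to wrap the call in a transaction and roll it back, as in this sketch (the UPDATE itself is just an example):

```sql
BEGIN;

-- The UPDATE is executed so that real timings can be measured
EXPLAIN ANALYZE
UPDATE actor SET last_update = now() WHERE actor_id = 1;

-- Discard the change made by the UPDATE
ROLLBACK;
```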

## Explain example 1

```sql
EXPLAIN
select first_name, last_name from actor  order by first_name ASC limit 10
```
<img src="./img/explain_basic.png">


## Explain example 2

```sql
EXPLAIN ANALYZE VERBOSE 
select first_name, last_name from actor order by first_name ASC limit 10
```
<img src="./img/explain_pro.png">


## Explain example 3

```sql
EXPLAIN (ANALYZE TRUE, VERBOSE TRUE, BUFFERS TRUE)
select first_name, last_name from actor  order by first_name ASC limit 10
```
<img src="./img/explain_full.png">


## Where in select

The WHERE clause uses a condition to filter the rows returned by the SELECT statement.

Standard operators in WHERE:

| Operator 	|      Description      	|
|:--------:	|:---------------------:	|
|     =    	|         Equal         	|
|     >    	|      Greater than     	|
|     <    	|       Less than       	|
|    >=    	| Greater than or equal 	|
|    <=    	|   Less than or equal  	|
| <> or != 	|       Not equal       	|
|    AND   	|  Logical operator AND 	|
|    OR    	|  Logical operator OR  	|


## Where examples

**Query 1**
```sql
select * from actor where actor_id < 5;
```
**Query 2**
```sql
select * from actor where actor_id < 10 and actor_id > 5;
```
**Query 3**
```sql
select * from actor where actor_id < 10 or actor_id > 5;
```

## Result of Query 1

| actor_id 	| first_name 	|  last_name 	|        last_update       	|
|:--------:	|:----------:	|:----------:	|:------------------------:	|
|     1    	| "Penelope" 	|  "Guiness" 	| "2013-05-26 14:47:57.62" 	|
|     2    	|   "Nick"   	| "Wahlberg" 	| "2013-05-26 14:47:57.62" 	|
|     3    	|    "Ed"    	|   "Chase"  	| "2013-05-26 14:47:57.62" 	|
|     4    	| "Jennifer" 	|   "Davis"  	| "2013-05-26 14:47:57.62" 	|
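
To compare how many rows each of the three conditions selects without listing them, the conditions can be counted in one pass. This sketch relies on the aggregate FILTER clause supported by PostgreSQL; the aliases are illustrative.

```sql
select count(*) filter (where actor_id < 5)                   as query_1_rows,
       count(*) filter (where actor_id < 10 and actor_id > 5) as query_2_rows,
       count(*) filter (where actor_id < 10 or actor_id > 5)  as query_3_rows,
       count(*)                                               as all_rows
from actor;
```

The condition of Query 3 is true for every non-NULL `actor_id`, so `query_3_rows` should equal `all_rows`.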

## Function in select

- A function is a set of SQL statements that performs a specific task.

- SQL Server, for example, provides many [predefined functions](https://www.w3schools.com/sql/sql_ref_sqlserver.asp).

- The full list of PostgreSQL predefined functions is available [here](https://www.postgresql.org/docs/current/functions.html).
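
Besides the predefined functions, PostgreSQL lets users define their own. The sketch below creates a small SQL-language function; the name `demo_full_name` and its parameter names are hypothetical and not part of the tutorial database.

```sql
-- Define a (hypothetical) helper that joins two strings with a space
CREATE FUNCTION demo_full_name(p_first text, p_last text) RETURNS text
LANGUAGE sql IMMUTABLE
AS $$ SELECT p_first || ' ' || p_last $$;

-- Use it like any predefined function
select demo_full_name(first_name, last_name) as full_name
from actor
order by actor_id
limit 3;

-- Clean up
DROP FUNCTION demo_full_name(text, text);
```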

## Function in select example
```sql
select * from actor where length(first_name) < 3;
```

Result:

| actor_id 	| first_name 	|  last_name  	|        last_update       	|
|:--------:	|:----------:	|:-----------:	|:------------------------:	|
|     3    	|    "Ed"    	|   "Chase"   	| "2013-05-26 14:47:57.62" 	|
|    136   	|    "Ed"    	| "Mansfield" 	| "2013-05-26 14:47:57.62" 	|
|    165   	|    "Al"    	|  "Garland"  	| "2013-05-26 14:47:57.62" 	|
|    179   	|    "Ed"    	|  "Guiness"  	| "2013-05-26 14:47:57.62" 	|

## Between in select

- Match a value against a range of values
- Equivalent to the condition `col_name >= value1 and col_name <= value2` (both bounds are included)

```sql
select * from actor where length(first_name)  between 2 and 3  limit 5;
```

Result:

| actor_id 	| first_name 	| last_name 	|        last_update       	|
|:--------:	|:----------:	|:---------:	|:------------------------:	|
|     3    	|    "Ed"    	|  "Chase"  	| "2013-05-26 14:47:57.62" 	|
|     9    	|    "Joe"   	|  "Swank"  	| "2013-05-26 14:47:57.62" 	|
|    13    	|    "Uma"   	|   "Wood"  	| "2013-05-26 14:47:57.62" 	|
|    18    	|    "Dan"   	|   "Torn"  	| "2013-05-26 14:47:57.62" 	|
|    19    	|    "Bob"   	| "Fawcett" 	| "2013-05-26 14:47:57.62" 	|
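
The inclusive bounds can be checked on a few numbers without touching any table. This self-contained sketch uses `generate_series` to produce the values 1 to 4:

```sql
select n,
       n between 2 and 3     as in_range,      -- true for 2 and 3
       n >= 2 and n <= 3     as same_check,    -- always equal to in_range
       n not between 2 and 3 as out_of_range   -- true for 1 and 4
from generate_series(1, 4) as t(n);
```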

## EXPLAIN between

```sql
EXPLAIN ANALYZE VERBOSE  
select * from actor where length(first_name)  between 2 and 3;
```
Result: 
"Seq Scan on public.actor  (cost=0.00..6.00 rows=1 width=25) (actual time=0.010..0.031 rows=28 loops=1)"

"  Output: actor_id, first_name, last_name, last_update"

"  Filter: ((length((actor.first_name)::text) >= 2) AND (length((actor.first_name)::text) <= 3))"

"  Rows Removed by Filter: 172"

"Planning Time: 0.045 ms"

"Execution Time: 0.040 ms"


## EXPLAIN between part 2

```sql
EXPLAIN ANALYZE VERBOSE  
select * from actor where length(first_name) >= 2 and  length(first_name) <=3 ;
```
Result: 
"Seq Scan on public.actor  (cost=0.00..6.00 rows=1 width=25) (actual time=0.013..0.035 rows=28 loops=1)"

"  Output: actor_id, first_name, last_name, last_update"

"  Filter: ((length((actor.first_name)::text) >= 2) AND (length((actor.first_name)::text) <= 3))"

"  Rows Removed by Filter: 172"

"Planning Time: 0.066 ms"

"Execution Time: 0.046 ms"


## 'In' operator in select

- The IN operator is used in the WHERE clause to check whether a value matches any value in a list of values.

```sql
select * from actor where actor_id in (1,20,30,18);
```
Result:

| actor_id 	| first_name 	| last_name 	|        last_update       	|
|:--------:	|:----------:	|:---------:	|:------------------------:	|
|     1    	| "Penelope" 	| "Guiness" 	| "2013-05-26 14:47:57.62" 	|
|    18    	|    "Dan"   	|   "Torn"  	| "2013-05-26 14:47:57.62" 	|
|    20    	|  "Lucille" 	|  "Tracy"  	| "2013-05-26 14:47:57.62" 	|
|    30    	|  "Sandra"  	|   "Peck"  	| "2013-05-26 14:47:57.62" 	|

## 'In' operator in select part 2
```sql
select * from actor where first_name in ('Ed','Al', 'Carmen', 'Jude');
```

Result:

| actor_id 	| first_name 	|  last_name  	|        last_update       	|
|:--------:	|:----------:	|:-----------:	|:------------------------:	|
|     3    	|    "Ed"    	|   "Chase"   	| "2013-05-26 14:47:57.62" 	|
|    52    	|  "Carmen"  	|    "Hunt"   	| "2013-05-26 14:47:57.62" 	|
|    57    	|   "Jude"   	|   "Cruise"  	| "2013-05-26 14:47:57.62" 	|
|    136   	|    "Ed"    	| "Mansfield" 	| "2013-05-26 14:47:57.62" 	|
|    165   	|    "Al"    	|  "Garland"  	| "2013-05-26 14:47:57.62" 	|
|    179   	|    "Ed"    	|  "Guiness"  	| "2013-05-26 14:47:57.62" 	|
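
Two details of IN are worth trying on literal values: it can be written as `= ANY` over an array, and NOT IN yields NULL rather than true when the list contains NULL and no match is found. A self-contained sketch:

```sql
select 2 in (1, 2, 3)           as in_list,      -- true
       2 = any (array[1, 2, 3]) as any_form,     -- true, same meaning as IN
       4 not in (1, 2, null)    as not_in_null;  -- NULL, so a WHERE clause would drop the row
```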

## Subquery in select

In from: 

```sql
select * from (select * from actor where length(first_name) < 3) tmp_actor 
where tmp_actor.actor_id < 100;
```

Result:

| actor_id 	| first_name 	|  last_name  	|        last_update       	|
|:--------:	|:----------:	|:-----------:	|:------------------------:	|
|     3    	|    "Ed"    	|   "Chase"   	| "2013-05-26 14:47:57.62" 	|

## Subquery in select

In from: 

```sql
select * from (select first_name, last_name from actor where length(first_name) < 3) tmp_actor 
where tmp_actor.actor_id < 100;
```

Result:

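ERROR:  column tmp_actor.actor_id does not exist

The subquery returns only first_name and last_name, so actor_id is not visible to the outer query.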


## Subquery in select

In from: 

```sql
select * from (select first_name, last_name from actor where length(first_name) < 3) tmp_actor 
where length(tmp_actor.first_name) < 3;
```

Result:

| first_name 	|  last_name  	|
|:----------:	|:-----------:	|
|    "Ed"    	|   "Chase"   	|
|    "Ed"    	| "Mansfield" 	|
|    "Al"    	|  "Garland"  	|
|    "Ed"    	|  "Guiness"  	|


## Subquery in select part 2
In where:
```sql
select * from actor where actor_id in (select actor_id from actor where length(first_name) < 3) and actor_id < 100;
```
Result:

| actor_id 	| first_name 	|  last_name  	|        last_update       	|
|:--------:	|:----------:	|:-----------:	|:------------------------:	|
|     3    	|    "Ed"    	|   "Chase"   	| "2013-05-26 14:47:57.62" 	|
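
A subquery in WHERE can often be written with EXISTS instead of IN. The following sketch is one possible rewrite of the query above; the aliases `a` and `b` are illustrative, and the result should be the same row.

```sql
select a.*
from actor a
where exists (
        -- correlated subquery: checked for each row of the outer query
        select 1
        from actor b
        where b.actor_id = a.actor_id
          and length(b.first_name) < 3
      )
  and a.actor_id < 100;
```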

## EXPLAIN subquery

```sql
EXPLAIN ANALYZE VERBOSE  
select * from (select * from actor where length(first_name) < 3) tmp_actor where tmp_actor.actor_id < 100;
```

"Seq Scan on public.actor  (cost=0.00..5.50 rows=33 width=25) (actual time=0.031..0.044 rows=1 loops=1)"

"  Output: actor.actor_id, actor.first_name, actor.last_name, actor.last_update"

"  Filter: ((actor.actor_id < 100) AND (length((actor.first_name)::text) < 3))"

"  Rows Removed by Filter: 199"

"Planning Time: 0.074 ms"

"Execution Time: 0.054 ms"
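
The plan above consists of a single Seq Scan whose filter contains both conditions, which suggests that the planner merged the subquery into the outer query. A flat query that states the same filter directly might look like this sketch:

```sql
EXPLAIN ANALYZE VERBOSE
select * from actor
where length(first_name) < 3   -- condition taken from the subquery
  and actor_id < 100;          -- condition taken from the outer query
```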


## EXPLAIN subquery part 2

```sql
EXPLAIN ANALYZE VERBOSE 
select * from actor where actor_id in (select actor_id from actor where length(first_name) < 3) and actor_id < 100;
```

"Hash Join  (cost=5.84..10.60 rows=33 width=25) (actual time=0.036..0.049 rows=1 loops=1)"

"  Output: actor.actor_id, actor.first_name, actor.last_name, actor.last_update"

"  Inner Unique: true"

"  Hash Cond: (actor.actor_id = actor_1.actor_id)"

"  ->  Seq Scan on public.actor  (cost=0.00..4.50 rows=99 width=25) (actual time=0.008..0.017 rows=99 loops=1)"

...

"  ->  Hash  (cost=5.00..5.00 rows=67 width=4) (actual time=0.023..0.023 rows=4 loops=1)"

...

"Planning Time: 0.214 ms"

"Execution Time: 0.068 ms"