# Basic SQL Queries
(The queries below are for PostgreSQL.)

### 1. SELECT
(To extract specific columns from a table: `Projection` operation, picking columns from data)

- From countries table, Select capital and population columns: <br>
    **SELECT capital, pop FROM countries**
- From countries table, Select all columns: <br>
    **SELECT * FROM countries**

### SELECT - DISTINCT
(To extract distinct values from a table)

- From countries table, Select DISTINCT continent names: <br>
    **SELECT DISTINCT continent FROM countries**
- From countries table, select DISTINCT combinations of the continent and region columns. When several column names follow DISTINCT, the unique combinations of all those columns are returned, not the unique values of any single column: <br>
    **SELECT DISTINCT continent, region FROM countries**
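- DISTINCT ON specific columns (PostgreSQL-specific): returns one row for each distinct value of the columns in parentheses. Unless ORDER BY is used, which row is kept for each value is unpredictable: <br>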
    **SELECT DISTINCT ON (continent) continent, region FROM countries**
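
A minimal sketch of one-column versus multi-column DISTINCT. It runs against an in-memory SQLite database through Python's built-in `sqlite3` module so that it is self-contained (an assumption: these notes target PostgreSQL, but plain DISTINCT behaves the same there). The rows are made up. `DISTINCT ON` is PostgreSQL-only, so it is not shown.

```python
import sqlite3

# In-memory SQLite database with a small, made-up countries table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE countries (country TEXT, continent TEXT, region TEXT);
    INSERT INTO countries VALUES
        ('India',  'Asia',   'Southern Asia'),
        ('Nepal',  'Asia',   'Southern Asia'),
        ('Japan',  'Asia',   'Eastern Asia'),
        ('France', 'Europe', 'Western Europe');
    """
)

# One column after DISTINCT: each continent appears once.
continents = conn.execute(
    "SELECT DISTINCT continent FROM countries ORDER BY continent"
).fetchall()

# Two columns after DISTINCT: each (continent, region) combination appears once,
# so 'Asia' still shows up twice, with two different regions.
pairs = conn.execute(
    "SELECT DISTINCT continent, region FROM countries ORDER BY continent, region"
).fetchall()

print(continents)  # [('Asia',), ('Europe',)]
print(pairs)       # [('Asia', 'Eastern Asia'), ('Asia', 'Southern Asia'), ('Europe', 'Western Europe')]
```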

Note:
- SELECT can compute new values, e.g. `SELECT runtime/60. AS duration FROM movies`. If 60 is written without the `.` and runtime is an integer column, integer division is performed and the fractional part is discarded.
- AS gives an alias, e.g. `SELECT avg_pop AS "Average Population"`. Aliases are used extensively in joins. Double quotes are needed only when the alias contains spaces or upper-case letters that must be kept (PostgreSQL folds unquoted identifiers to lower case).
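
A small sketch of integer versus decimal division and of a quoted alias, using Python's built-in `sqlite3` with an in-memory database (an assumption for self-containment; the integer-division rule is the same in PostgreSQL, which returns the decimal result as NUMERIC). The movie row is made up, and `60.0` is used where the note writes `60.`; both are decimal literals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, runtime INTEGER)")
conn.execute("INSERT INTO movies VALUES ('Sample Movie', 135)")  # made-up row

cur = conn.execute(
    """
    SELECT runtime / 60   AS whole_hours,      -- integer / integer: fraction dropped
           runtime / 60.0 AS "Duration Hours"  -- decimal literal: fraction kept
    FROM movies
    """
)
row = cur.fetchone()
aliases = [col[0] for col in cur.description]  # column names as the client sees them

print(row)      # (2, 2.25)
print(aliases)  # ['whole_hours', 'Duration Hours']
```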

### 2. WHERE
(To filter rows based on condition on specific fields)

From countries table, Select all capital cities in Asia: <br>
**SELECT country, capital FROM countries
        WHERE continent='Asia'**
        
Note:
- Column aliases cannot be used in the WHERE clause. SQL evaluates FROM and WHERE before the SELECT list, so an alias defined in SELECT does not exist yet when WHERE is evaluated.

## Condition testing:
- Comparison (=, <>, <, <=, >, >=)
- Pattern Matching (LIKE, NOT LIKE, ILIKE): ILIKE is the case-insensitive version of LIKE (PostgreSQL-specific)
- Range (BETWEEN): matches values between two numbers, strings, or dates/times, with both endpoints inclusive: `column_name [NOT] BETWEEN val1 AND val2`
- List (IN): checks whether a value is in a list: `column_name [NOT] IN (val1, val2, val3)`
- Null Testing (IS NULL, IS NOT NULL): `= NULL` never matches, because any comparison with NULL yields NULL
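
A quick sketch of BETWEEN, IN and NULL testing on made-up rows, run in an in-memory SQLite database via Python's `sqlite3` (an assumption for self-containment; these conditions behave the same way in PostgreSQL). The helper `names` is hypothetical and only collects the first column of each row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE countries (country TEXT, continent TEXT, pop INTEGER);
    INSERT INTO countries VALUES
        ('A', 'Asia',   10),
        ('B', 'Europe', 20),
        ('C', 'Africa', 30),
        ('D', NULL,     40);
    """
)

def names(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# BETWEEN includes both endpoints, so 10 and 30 both qualify.
between = names("SELECT country FROM countries WHERE pop BETWEEN 10 AND 30 ORDER BY country")
# IN is shorthand for several = comparisons joined by OR.
in_list = names("SELECT country FROM countries WHERE continent IN ('Asia', 'Africa') ORDER BY country")
# Comparing with NULL using = yields NULL (not true), so no row matches.
eq_null = names("SELECT country FROM countries WHERE continent = NULL")
# IS NULL is the correct test.
is_null = names("SELECT country FROM countries WHERE continent IS NULL")

print(between, in_list, eq_null, is_null)  # ['A', 'B', 'C'] ['A', 'C'] [] ['D']
```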

## AND, OR, NOT
- Logical operators combine multiple conditions. NOT binds most tightly, then AND, then OR, so use `( )` to make the intended grouping explicit.
- Use of NOT: WHERE NOT condition
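
The effect of AND binding tighter than OR can be seen with three made-up rows. This sketch uses an in-memory SQLite database through Python's `sqlite3` (an assumption for self-containment; PostgreSQL applies the same precedence).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE countries (country TEXT, continent TEXT, pop INTEGER);
    INSERT INTO countries VALUES
        ('A', 'Asia',   50),
        ('B', 'Europe', 200),
        ('C', 'Europe', 50);
    """
)

# AND binds tighter than OR, so this reads as:
#   continent = 'Asia' OR (continent = 'Europe' AND pop > 100)
implicit = [r[0] for r in conn.execute(
    "SELECT country FROM countries "
    "WHERE continent = 'Asia' OR continent = 'Europe' AND pop > 100 "
    "ORDER BY country"
)]

# Parentheses force the OR to be evaluated first.
explicit = [r[0] for r in conn.execute(
    "SELECT country FROM countries "
    "WHERE (continent = 'Asia' OR continent = 'Europe') AND pop > 100 "
    "ORDER BY country"
)]

print(implicit)  # ['A', 'B']
print(explicit)  # ['B']
```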

## Pattern Matching

- `%` : any number of characters (including none). To match a literal `%`, declare an escape character: `WHERE name LIKE '%$%%' ESCAPE '$'`
- `_` : exactly one character (`__` : two characters, and so on)
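
A sketch of `%`, `_` and ESCAPE on made-up names, using an in-memory SQLite database via Python's `sqlite3` (an assumption for self-containment). One difference to keep in mind: SQLite's LIKE ignores ASCII case by default, unlike PostgreSQL's, so the sample names avoid case differences.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?)",
    [("Chad",), ("Chile",), ("China",), ("Cuba",), ("50% Off",)],
)

def names(sql):
    """Return the first column of every row as a list."""
    return [r[0] for r in conn.execute(sql)]

# % matches any number of characters after 'Ch'.
starts_ch = names("SELECT name FROM items WHERE name LIKE 'Ch%' ORDER BY name")
# Each _ matches exactly one character, so only 4-letter names starting with 'Ch'.
four_letters = names("SELECT name FROM items WHERE name LIKE 'Ch__' ORDER BY name")
# $% is a literal percent sign because $ is declared as the escape character.
has_percent = names("SELECT name FROM items WHERE name LIKE '%$%%' ESCAPE '$'")

print(starts_ch)     # ['Chad', 'Chile', 'China']
print(four_letters)  # ['Chad']
print(has_percent)   # ['50% Off']
```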


### 3. GROUP BY
(To group rows based on common values in a column, and extract aggregate values)

From countries table, get total population of continents: <br>
**SELECT SUM(pop) AS total_population, continent <br>
    FROM countries<br>
    GROUP BY continent**

Notes:
- When GROUP BY is used, every column in the SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregation function
- Aggregation functions can be SUM, AVG, MAX, MIN, COUNT
- An alias can be specified (e.g. total_population)
- GROUP BY comes after the WHERE clause and before the ORDER BY clause
- Multiple columns can be used in the GROUP BY clause; one row is returned for each unique combination of those columns.
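
The GROUP BY query above, run on a few illustrative rows (population values are made up for the example). The sketch uses an in-memory SQLite database via Python's `sqlite3` so it is self-contained (an assumption; the query is the same in PostgreSQL), and adds ORDER BY only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE countries (country TEXT, continent TEXT, pop INTEGER);
    INSERT INTO countries VALUES
        ('India',   'Asia',   1400),
        ('Japan',   'Asia',    125),
        ('France',  'Europe',   68),
        ('Germany', 'Europe',   84),
        ('Kenya',   'Africa',   55);
    """
)

# One output row per continent; pop is summed within each group.
totals = conn.execute(
    """
    SELECT SUM(pop) AS total_population, continent
    FROM countries
    GROUP BY continent
    ORDER BY continent
    """
).fetchall()

print(totals)  # [(55, 'Africa'), (1525, 'Asia'), (152, 'Europe')]
```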

### 4. HAVING
(Aggregation functions cannot be used in the WHERE clause to filter rows. To filter on aggregate values at the group level, the HAVING clause is used)

From countries table, get continents with total population > 1000000: <br>
**SELECT SUM(pop) AS total_population, continent <br>
    FROM countries<br>
    GROUP BY continent<br>
    HAVING SUM(pop) > 1000000**
    
Notes:
- HAVING comes right after GROUP BY and filters the groups that GROUP BY produces.
- A SELECT alias cannot be used in WHERE or HAVING, because the SELECT list is evaluated after both; repeat the aggregate expression instead (e.g. `HAVING SUM(pop) > 1000000`). PostgreSQL does accept output-column aliases in GROUP BY and ORDER BY.

#### WHERE filters before grouping, HAVING filters after grouping
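
A sketch contrasting the two filters on made-up rows, using an in-memory SQLite database via Python's `sqlite3` (an assumption for self-containment; PostgreSQL evaluates WHERE and HAVING in the same order). Filtering rows with WHERE changes the group totals that HAVING then sees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE countries (country TEXT, continent TEXT, pop INTEGER);
    INSERT INTO countries VALUES
        ('India', 'Asia', 1400), ('Japan', 'Asia', 125),
        ('France', 'Europe', 68), ('Germany', 'Europe', 84),
        ('Kenya', 'Africa', 55);
    """
)

# HAVING only: all rows are grouped, then groups with a total <= 100 are dropped.
having_only = conn.execute(
    """
    SELECT SUM(pop) AS total_population, continent
    FROM countries
    GROUP BY continent
    HAVING SUM(pop) > 100
    ORDER BY continent
    """
).fetchall()

# WHERE first: France (68) and Kenya (55) are removed before grouping,
# so Europe's total falls to 84 and HAVING then drops Europe as well.
where_then_having = conn.execute(
    """
    SELECT SUM(pop) AS total_population, continent
    FROM countries
    WHERE pop > 70
    GROUP BY continent
    HAVING SUM(pop) > 100
    ORDER BY continent
    """
).fetchall()

print(having_only)        # [(1525, 'Asia'), (152, 'Europe')]
print(where_then_having)  # [(1525, 'Asia')]
```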

### 5. ORDER BY
(To sort in ascending or descending order)

From countries table, get total population of continents. Sort in ascending order: <br>
**SELECT SUM(pop) AS total_population, continent <br>
    FROM countries<br>
    GROUP BY continent<br>
    ORDER BY total_population**

Notes:
- Default order is ASC (ascending); use `ORDER BY total_population DESC` to sort in descending order.
- Alias can be used in ORDER BY clause
- Can specify ORDER BY multiple columns: `ORDER BY total_population DESC, continent ASC`
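- ORDER BY is evaluated after SELECT, which is why aliases work here. Columns that are not in the SELECT list can still be used for sorting, except with SELECT DISTINCT, where ORDER BY expressions must appear in the select list.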
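
A sketch of sorting by a SELECT alias in descending order, and of sorting by a column that is not selected at all. It runs on made-up rows in an in-memory SQLite database via Python's `sqlite3` (an assumption for self-containment; both forms are also valid in PostgreSQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE countries (country TEXT, continent TEXT, pop INTEGER);
    INSERT INTO countries VALUES
        ('India', 'Asia', 1400), ('Japan', 'Asia', 125),
        ('France', 'Europe', 68), ('Germany', 'Europe', 84),
        ('Kenya', 'Africa', 55);
    """
)

by_total = conn.execute(
    """
    SELECT SUM(pop) AS total_population, continent
    FROM countries
    GROUP BY continent
    ORDER BY total_population DESC   -- the SELECT alias is allowed here
    """
).fetchall()

# pop is not in the SELECT list but can still drive the sort order.
by_pop = [r[0] for r in conn.execute("SELECT country FROM countries ORDER BY pop DESC")]

print(by_total)  # [(1525, 'Asia'), (152, 'Europe'), (55, 'Africa')]
print(by_pop)    # ['India', 'Japan', 'Germany', 'France', 'Kenya']
```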

### 6. LIMIT
(To specify the number of rows to return)

From countries table, select the two largest continents in terms of total population: <br>
**SELECT SUM(pop) AS total_population, continent <br>
    FROM countries<br>
    GROUP BY continent<br>
    ORDER BY total_population DESC<br>
    LIMIT 2**
    
Note: LIMIT restricts the number of rows the query actually returns, so less data is sent to and held by the client. This differs from pandas `DataFrame.head()`, which only displays the first rows of a DataFrame that has already been loaded into memory in full. Without ORDER BY, which rows LIMIT returns is unpredictable.

## LIMIT - OFFSET
(To specify the number of rows to return, after skipping initial `offset` number of rows)

From countries table, select the second and third largest continents in terms of total population (i.e. skip the first row): <br>
**SELECT SUM(pop) AS total_population, continent <br>
    FROM countries<br>
    GROUP BY continent<br>
    ORDER BY total_population DESC<br>
    LIMIT 2 OFFSET 1**   
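
Both LIMIT queries above, run on made-up rows in an in-memory SQLite database via Python's `sqlite3` (an assumption for self-containment; LIMIT and OFFSET have the same meaning in PostgreSQL). With three continents, OFFSET 1 shifts the two-row window down by one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE countries (country TEXT, continent TEXT, pop INTEGER);
    INSERT INTO countries VALUES
        ('India', 'Asia', 1400), ('Japan', 'Asia', 125),
        ('France', 'Europe', 68), ('Germany', 'Europe', 84),
        ('Kenya', 'Africa', 55);
    """
)

base = """
    SELECT SUM(pop) AS total_population, continent
    FROM countries
    GROUP BY continent
    ORDER BY total_population DESC
"""

top_two = conn.execute(base + " LIMIT 2").fetchall()              # first two rows
second_third = conn.execute(base + " LIMIT 2 OFFSET 1").fetchall()  # skip one row, then take two

print(top_two)       # [(1525, 'Asia'), (152, 'Europe')]
print(second_third)  # [(152, 'Europe'), (55, 'Africa')]
```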

### 7. JOINS
(To retrieve data from two or more tables simultaneously and create virtual tables from a database)

**SELECT columns <br>
FROM left_table <br>
    JOIN_TYPE <br>
    right_table <br>
    ON <br>
    join_condition <br>
WHERE row_filter <br>
GROUP BY grouping_column <br>
HAVING group_filter <br>
ORDER BY ordering_column ASC/DESC <br>
LIMIT 2 OFFSET 1**

**Types of Joins**

**A. CROSS JOIN**
- Combines every row of one table with every row of the other table, without any condition (m x n rows)

**SELECT lt.col1, lt.col2, rt.col2 FROM left_table AS lt CROSS JOIN right_table AS rt;**

Note: 
- If both tables have columns with the same name, qualify them with a table alias or the full table name.
- Table aliases are valid only for the duration of the query.
- Conceptually, an inner join is a CROSS JOIN whose rows are then filtered by the join condition (the database need not actually build the full product).
- Once a table is given an alias, the alias must be used throughout the query; the original table name can no longer refer to it.
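
A sketch of the m x n row count of a CROSS JOIN, with two hypothetical tables (`colors`, `sizes`) that share a column name, so each column is qualified with its alias. It uses an in-memory SQLite database via Python's `sqlite3` (an assumption for self-containment; the SQL is the same in PostgreSQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE colors (name TEXT);
    CREATE TABLE sizes  (name TEXT);
    INSERT INTO colors VALUES ('red'), ('blue');
    INSERT INTO sizes  VALUES ('S'), ('M'), ('L');
    """
)

# Both tables have a column called "name", so each is qualified with its alias.
rows = conn.execute(
    """
    SELECT c.name, s.name
    FROM colors AS c CROSS JOIN sizes AS s
    ORDER BY c.name, s.name
    """
).fetchall()

print(len(rows))  # 6 = 2 colors x 3 sizes
print(rows[:3])   # [('blue', 'L'), ('blue', 'M'), ('blue', 'S')]
```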

**B. INNER JOIN**
- Intersection: returns only the records whose join keys match in both (or all) of the joined tables

**SELECT lt.col1, rt.col2 FROM left_table AS lt INNER JOIN right_table AS rt ON lt.id = rt.id**
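
A sketch of an INNER JOIN on two hypothetical tables, `countries` and `capitals`, where each side has one row without a partner. It uses an in-memory SQLite database via Python's `sqlite3` (an assumption for self-containment; the join behaves the same in PostgreSQL). Both unmatched rows disappear from the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE countries (id INTEGER, name TEXT);
    CREATE TABLE capitals  (country_id INTEGER, capital TEXT);
    INSERT INTO countries VALUES (1, 'France'), (2, 'Japan'), (3, 'Kenya');
    -- Kenya has no capital row here, and country_id 4 has no country row.
    INSERT INTO capitals  VALUES (1, 'Paris'), (2, 'Tokyo'), (4, 'Nowhere');
    """
)

rows = conn.execute(
    """
    SELECT c.name, k.capital
    FROM countries AS c
    INNER JOIN capitals AS k ON c.id = k.country_id
    ORDER BY c.name
    """
).fetchall()

print(rows)  # [('France', 'Paris'), ('Japan', 'Tokyo')]
```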

**C. SELF JOIN**
- Joins a table with itself
- Use case: find the instructors who work in the same department
- The same table is given two different aliases

**SELECT t1.name, t1.department, t2.name, t2.department <br>
FROM instructors t1 JOIN instructors t2 <br>
ON t1.department = t2.department AND t1.id <> t2.id;**

Note: `t1.id <> t2.id` removes the rows where an instructor is paired with themselves.
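
A sketch of the instructor self join on a hypothetical `instructors` table, run in an in-memory SQLite database via Python's `sqlite3` (an assumption for self-containment; PostgreSQL gives the same rows). With `<>` each pair appears twice, once in each order; switching to `<` keeps one row per pair. The helper `same_department` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE instructors (id INTEGER, name TEXT, department TEXT);
    INSERT INTO instructors VALUES
        (1, 'Ana', 'Math'), (2, 'Ben', 'Math'), (3, 'Cara', 'Physics');
    """
)

def same_department(id_condition):
    """Self-join instructors on department plus an extra id condition.

    id_condition is a fixed string written in this script, never user input.
    """
    sql = f"""
        SELECT t1.name, t1.department, t2.name, t2.department
        FROM instructors t1 JOIN instructors t2
          ON t1.department = t2.department AND {id_condition}
        ORDER BY t1.name
    """
    return conn.execute(sql).fetchall()

both_orders = same_department("t1.id <> t2.id")
one_per_pair = same_department("t1.id < t2.id")

print(both_orders)   # [('Ana', 'Math', 'Ben', 'Math'), ('Ben', 'Math', 'Ana', 'Math')]
print(one_per_pair)  # [('Ana', 'Math', 'Ben', 'Math')]
```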

**D. NATURAL JOIN**
- Automatically joins on all columns that have the same name in both tables

**E. LEFT OUTER JOIN / LEFT JOIN**
- All rows from the left table are returned, together with the matching rows from the right table. Where there is no match in the right table, its columns are NULL.

**F. RIGHT OUTER JOIN / RIGHT JOIN**
- All rows from the right table are returned, together with the matching rows from the left table. Where there is no match in the left table, its columns are NULL.

**G. FULL OUTER JOIN / FULL JOIN**
- All rows from both tables are returned. Matching rows are combined, and rows without a match in the other table are padded with NULLs.
- A CROSS JOIN pairs every row with every row regardless of any condition, so it produces no NULLs of its own. A FULL JOIN first finds the matching rows and then adds the non-matching rows from each side with NULL values.
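
A sketch of a LEFT JOIN on the same kind of hypothetical `countries` and `capitals` tables, run in an in-memory SQLite database via Python's `sqlite3` (an assumption for self-containment). RIGHT and FULL JOIN need SQLite 3.39 or newer, so only LEFT JOIN is shown; in PostgreSQL, a FULL JOIN on this data would additionally return a row with NULL for the country name and 'Nowhere' as the capital.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE countries (id INTEGER, name TEXT);
    CREATE TABLE capitals  (country_id INTEGER, capital TEXT);
    INSERT INTO countries VALUES (1, 'France'), (2, 'Japan'), (3, 'Kenya');
    INSERT INTO capitals  VALUES (1, 'Paris'), (2, 'Tokyo'), (4, 'Nowhere');
    """
)

# Every country is kept; Kenya has no matching capital, so capital is NULL (None in Python).
rows = conn.execute(
    """
    SELECT c.name, k.capital
    FROM countries AS c
    LEFT JOIN capitals AS k ON c.id = k.country_id
    ORDER BY c.name
    """
).fetchall()

print(rows)  # [('France', 'Paris'), ('Japan', 'Tokyo'), ('Kenya', None)]
```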

**USING WITH JOINS**
- `USING (col)` can replace `ON` when the join column has the same name in both tables: `USING (col1)` is shorthand for `ON t1.col1 = t2.col1`, and with `SELECT *` the shared column is returned only once

**SELECT * <br>
FROM table1 AS t1 <br>
JOIN table2 AS t2 <br>
USING (col1, col2, col3)**
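
A sketch of USING with hypothetical `countries` and `capitals` tables that share a `country_id` column, run in an in-memory SQLite database via Python's `sqlite3` (an assumption for self-containment; PostgreSQL also returns the shared column once, first). The column names reported by the cursor show that `country_id` is not duplicated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE countries (country_id INTEGER, name TEXT);
    CREATE TABLE capitals  (country_id INTEGER, capital TEXT);
    INSERT INTO countries VALUES (1, 'France'), (2, 'Japan');
    INSERT INTO capitals  VALUES (1, 'Paris'), (2, 'Tokyo');
    """
)

# USING (country_id) is shorthand for ON countries.country_id = capitals.country_id;
# the join column must have the same name in both tables.
cur = conn.execute(
    "SELECT * FROM countries JOIN capitals USING (country_id) "
    "ORDER BY countries.country_id"
)
columns = [col[0] for col in cur.description]
rows = cur.fetchall()

print(columns)  # ['country_id', 'name', 'capital']
print(rows)     # [(1, 'France', 'Paris'), (2, 'Japan', 'Tokyo')]
```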

## CASE - WHEN clause

To set the value of one column conditionally depending upon the value of another column e.g.: <br>

	SELECT
		CASE
			WHEN runtime > 90 THEN 'long'
			WHEN runtime BETWEEN 30 AND 90 THEN 'normal'
			ELSE 'short'
		END AS run_type
	FROM movies
        
General format for CASE-WHEN is:

	SELECT
		CASE
			WHEN condition_1 THEN expression_1
			WHEN condition_2 THEN expression_2
			ELSE expression_3
		END AS alias
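
The runtime example, run on made-up movies in an in-memory SQLite database via Python's `sqlite3` (an assumption for self-containment; CASE works the same in PostgreSQL). The WHEN conditions are tested top to bottom and the first true one wins, so a runtime of exactly 90 falls into 'normal' because BETWEEN is inclusive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, runtime INTEGER)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [("A", 120), ("B", 90), ("C", 45), ("D", 20)],  # made-up runtimes in minutes
)

rows = conn.execute(
    """
    SELECT title,
        CASE
            WHEN runtime > 90 THEN 'long'
            WHEN runtime BETWEEN 30 AND 90 THEN 'normal'
            ELSE 'short'
        END AS run_type
    FROM movies
    ORDER BY title
    """
).fetchall()

print(rows)  # [('A', 'long'), ('B', 'normal'), ('C', 'normal'), ('D', 'short')]
```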

## Aggregation Functions
(Computed over all rows of the selected column, or per group when GROUP BY is used)

- COUNT()
- MIN(), MAX()
- SUM(), AVG()

Note:
- All aggregation functions ignore NULL values, except COUNT(*)
- MIN() and MAX() also work on string values
- Multiple aggregation functions can be used together, but a non-aggregated column cannot appear alongside them unless it is listed in the GROUP BY clause.
- Aggregation functions are computed after the WHERE clause has filtered the rows, so they CANNOT be used within the WHERE clause. Use HAVING to filter groups, or a SUB-QUERY to compare rows against an aggregate value.
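
A sketch of how NULLs affect the aggregation functions, using made-up rows in an in-memory SQLite database via Python's `sqlite3` (an assumption for self-containment). PostgreSQL returns the same values, except that AVG of an integer column comes back as NUMERIC rather than a float.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE countries (country TEXT, pop INTEGER);
    INSERT INTO countries VALUES ('Chad', 10), ('Cuba', 20), ('Fiji', NULL);
    """
)

row = conn.execute(
    """
    SELECT COUNT(*),      -- counts rows, NULLs included
           COUNT(pop),    -- counts only non-NULL pop values
           SUM(pop),
           AVG(pop),      -- (10 + 20) / 2: the NULL row is skipped, not treated as 0
           MIN(country),  -- MIN/MAX also work on strings
           MAX(country)
    FROM countries
    """
).fetchone()

print(row)  # (3, 2, 30, 15.0, 'Chad', 'Fiji')
```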

## SUBQUERIES

A subquery is a SELECT statement used inside another SQL statement: a nested (INNER) query within a larger (OUTER) query.

**SIMPLE/ UNCORRELATED SUBQUERY:** A subquery where the inner query is completely independent of the outer query

e.g. SELECT countries whose area is more than the average area for all countries

**SELECT country FROM countries WHERE area > 
        (SELECT AVG(area) FROM countries)**

**CORRELATED SUBQUERY:** The inner query depends on the outer query: it refers to columns of the current outer row, so it has to return a different result for each row of the outer query. Conceptually, SQL goes through the outer query row by row and re-runs the inner query for each one, comparing every row against its related data. This is analogous to a nested for-loop in Python: in the naive case it is O(n^2), which can be very slow on large tables (some query planners can rewrite certain correlated subqueries, such as EXISTS, into joins).

e.g. SELECT country from each continent which has the maximum population in that continent

**SELECT c1.name, c1.continent FROM country c1 WHERE c1.population = (SELECT max(c2.population) FROM country c2 WHERE c2.continent = c1.continent);**

Notes:
- SUB-QUERIES must be enclosed in `( )` and should not end with `;`
- Can be used in SELECT, FROM, WHERE, HAVING or other clauses
- SUBQUERIES vs JOIN: subqueries are often more readable, but the columns of a subquery used in WHERE are not available to the outer SELECT list. Joins are often faster, but they can produce many intermediate (redundant) rows before filtering, which costs more memory.
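
Both subquery examples above, run on made-up rows in an in-memory SQLite database via Python's `sqlite3` (an assumption for self-containment; the SQL is also valid in PostgreSQL). The table is called `countries` here, with the `name`, `continent` and `population` columns used in the correlated example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE countries (name TEXT, continent TEXT, population INTEGER);
    INSERT INTO countries VALUES
        ('India', 'Asia', 1400), ('Japan', 'Asia', 125),
        ('France', 'Europe', 68), ('Germany', 'Europe', 84),
        ('Kenya', 'Africa', 55);
    """
)

# Uncorrelated: the inner query runs once and returns a single value (the average, 346.4).
above_avg = conn.execute(
    "SELECT name FROM countries "
    "WHERE population > (SELECT AVG(population) FROM countries)"
).fetchall()

# Correlated: the inner query refers to c1.continent from the outer row,
# so it is (conceptually) evaluated again for every outer row.
largest_per_continent = conn.execute(
    """
    SELECT c1.name, c1.continent
    FROM countries c1
    WHERE c1.population = (SELECT MAX(c2.population)
                           FROM countries c2
                           WHERE c2.continent = c1.continent)
    ORDER BY c1.name
    """
).fetchall()

print(above_avg)              # [('India',)]
print(largest_per_continent)  # [('Germany', 'Europe'), ('India', 'Asia'), ('Kenya', 'Africa')]
```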

### ANY | ALL (in subqueries)

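ANY and ALL compare a value from the outer query with the values returned by the subquery, using a comparison operator: `x > ALL (subquery)` is true when x is greater than every returned value, and `x = ANY (subquery)` is true when x equals at least one of them (the same as IN).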

### EXISTS or NOT EXISTS

EXISTS is true when the subquery returns at least one row, and NOT EXISTS is true when it returns none. They can be used in simple or correlated queries, but are typically used in correlated subqueries.
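
A sketch of EXISTS and NOT EXISTS with hypothetical `countries` and `cities` tables, run in an in-memory SQLite database via Python's `sqlite3` (an assumption for self-containment; PostgreSQL behaves the same). France has two cities but is listed once, because EXISTS only checks whether any matching row exists, whereas a join would repeat France.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE countries (id INTEGER, name TEXT);
    CREATE TABLE cities (name TEXT, country_id INTEGER);
    INSERT INTO countries VALUES (1, 'France'), (2, 'Japan'), (3, 'Kenya');
    INSERT INTO cities VALUES ('Paris', 1), ('Lyon', 1), ('Osaka', 2);
    """
)

# The subquery only needs to find one matching row; what it selects does not matter.
with_cities = conn.execute(
    """
    SELECT c.name FROM countries c
    WHERE EXISTS (SELECT 1 FROM cities ci WHERE ci.country_id = c.id)
    ORDER BY c.name
    """
).fetchall()

without_cities = conn.execute(
    """
    SELECT c.name FROM countries c
    WHERE NOT EXISTS (SELECT 1 FROM cities ci WHERE ci.country_id = c.id)
    """
).fetchall()

print(with_cities)     # [('France',), ('Japan',)]
print(without_cities)  # [('Kenya',)]
```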

## CAST - AS
(Used for Datatype conversion)

**CAST (column_name AS data_type)**

Note: Postgres specific CAST: column_name::data_type
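
A small CAST sketch using Python's `sqlite3` with an in-memory database (an assumption for self-containment). SQLite supports the standard `CAST(... AS ...)` form but not PostgreSQL's `::` shorthand; in PostgreSQL the first two expressions could also be written `'42'::integer + 1` and `7::real / 2`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

row = conn.execute(
    """
    SELECT CAST('42' AS INTEGER) + 1,  -- text -> integer
           CAST(7 AS REAL) / 2,        -- integer -> real, so the division keeps .5
           7 / 2                       -- without the cast: integer division
    """
).fetchone()

print(row)  # (43, 3.5, 3)
```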

## TIMEZONE

**SHOW TIMEZONE**

**SET timezone = 'America/New_York'**

## Mathematical Operators

- Operators: +, -, *, /, % (modulo, remainder), ^ (exponent), @ (absolute value, without sign)
- Functions: abs(), ceil(), exp(), floor(), ln(), log(), pi(), power(), round(), sqrt(). round(value, decimal_places) works only with the NUMERIC data type, not with REAL or DOUBLE PRECISION, so cast first, e.g. `round(value::numeric, 2)`

## String Functions
- Concatenation `||`: a non-string operand is converted to text (at least one operand must be a string).
- length(), lower(), upper()
- position(substring in string): returns where substring starts in string, counting from 1
- substring(string from start_position for number_of_characters)

## Datetime Operators

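- + and - work on dates, timestamps and intervals (e.g. `date + 7` adds seven days, and subtracting two dates gives the number of days between them), while * and / scale intervals (e.g. `interval '1 hour' * 2`)
- CURRENT_DATE, NOW(), CURRENT_TIME(0), CURRENT_TIMESTAMP(0), LOCALTIME(0), LOCALTIMESTAMP(0): the number in parentheses is the fractional-seconds precision
- EXTRACT is used to take out parts of a date/time value:
    **EXTRACT (hour FROM NOW())**
- age() calculates the interval between two timestamps, e.g. `age(timestamp1, timestamp2)`; with a single argument it measures from the current date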

## NULLIF
(NULLIF(a, b) returns NULL if a equals b; otherwise it returns a)
	
	SELECT
		NULLIF(column_name, value)
	FROM
		table_name
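
A sketch of NULLIF turning a sentinel value (an empty string) into NULL so that aggregation functions skip it, run on a hypothetical `survey` table in an in-memory SQLite database via Python's `sqlite3` (an assumption for self-containment). A common PostgreSQL use is `total / NULLIF(cnt, 0)`, which yields NULL instead of a division-by-zero error; SQLite already returns NULL for division by zero, so that case is not demonstrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE survey (id INTEGER, answer TEXT);
    INSERT INTO survey VALUES (1, 'yes'), (2, ''), (3, 'no');
    """
)

# Empty strings become NULL ...
cleaned = conn.execute(
    "SELECT NULLIF(answer, '') FROM survey ORDER BY id"
).fetchall()
# ... which COUNT(expression) then ignores.
answered = conn.execute(
    "SELECT COUNT(NULLIF(answer, '')) FROM survey"
).fetchone()[0]

print(cleaned)   # [('yes',), (None,), ('no',)]
print(answered)  # 2
```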

Overall Notes: 
- The order of clauses is fixed in SQL: SELECT, FROM (with JOINs), WHERE, GROUP BY, HAVING, ORDER BY, LIMIT/OFFSET
- Order of Operations: Arithmetic (+, *) >> Comparison (<, >=) >> Logical (AND, OR)

- `SELECT version()` shows the PostgreSQL version

## Date Functions
In PostgreSQL, you can easily extract values from date columns. You will see the most used date functions below.

	SELECT
		date_part('year', hire_date) AS year,
		date_part('month', hire_date) AS month,
		date_part('day', hire_date) AS day,
		date_part('dow', hire_date) AS dayofweek,    -- 0 = Sunday ... 6 = Saturday
		to_char(hire_date, 'Dy') AS day_name,        -- abbreviated day name, e.g. Mon
		to_char(hire_date, 'Month') AS month_name,   -- full month name, blank-padded to 9 characters
		hire_date
	FROM employees

## Comments
- `--` for single-line or inline comments
- `/* ... */` for multi-line comments

## Window Functions
Window functions apply aggregate and ranking functions over a particular window (set of rows). The OVER clause is used with window functions to define that window. The OVER clause does two things:
- Partitions rows into sets of rows (the PARTITION BY clause is used).
- Orders rows within those partitions (the ORDER BY clause is used).

Aggregate functions such as SUM(), COUNT(), AVG(), MAX(), and MIN() applied over a particular window (set of rows) are called aggregate window functions.
### Aggregation Examples
The following query will give you the average salary for each department.

	SELECT first_name, salary, department,
		ROUND(AVG(salary) OVER (PARTITION BY department)) AS avg_salary_by_dept
	FROM employees
	ORDER BY salary DESC
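
The aggregate window query above, run on a made-up `employees` table in an in-memory SQLite database via Python's `sqlite3` (an assumption for self-containment; window functions need SQLite 3.25 or newer). Unlike GROUP BY, every employee row is kept and the department average is attached to each one. SQLite's ROUND returns a float (4000.0), while PostgreSQL would return NUMERIC 4000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (first_name TEXT, salary INTEGER, department TEXT);
    INSERT INTO employees VALUES
        ('Ana', 5000, 'IT'), ('Ben', 3000, 'IT'),
        ('Cara', 4000, 'HR'), ('Dan', 2000, 'HR');
    """
)

rows = conn.execute(
    """
    SELECT first_name, salary, department,
           ROUND(AVG(salary) OVER (PARTITION BY department)) AS avg_salary_by_dept
    FROM employees
    ORDER BY salary DESC
    """
).fetchall()

for row in rows:
    print(row)
# ('Ana', 5000, 'IT', 4000.0)
# ('Cara', 4000, 'HR', 3000.0)
# ('Ben', 3000, 'IT', 4000.0)
# ('Dan', 2000, 'HR', 3000.0)
```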
### Ranking the Values
The RANK() function is a window function that assigns a rank to each row within a partition of a result set. Rows with equal values get the same rank, and the following rank number is skipped.
The following example orders the table by salary (descending). A rank value of 1 is the highest salary value.

	SELECT first_name, salary, RANK() OVER (ORDER BY salary DESC)
	FROM employees
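
A sketch of RANK() with a tie, on a made-up `employees` table in an in-memory SQLite database via Python's `sqlite3` (an assumption for self-containment; window functions need SQLite 3.25 or newer, and PostgreSQL ranks the same way). The two 4000 salaries share rank 2, so the next rank is 4. The outer ORDER BY adds first_name only to make the tie order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (first_name TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ana', 5000), ('Ben', 4000), ('Cara', 4000), ('Dan', 2000);
    """
)

rows = conn.execute(
    """
    SELECT first_name, salary,
           RANK() OVER (ORDER BY salary DESC) AS salary_rank
    FROM employees
    ORDER BY salary DESC, first_name
    """
).fetchall()

print(rows)  # [('Ana', 5000, 1), ('Ben', 4000, 2), ('Cara', 4000, 2), ('Dan', 2000, 4)]
```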