# Basic SQL Queries
(The queries below are written for PostgreSQL.)

### 1. SELECT
(To extract specific columns from a table: the `Projection` operation, which picks columns from the data)

- From the countries table, select the capital and population columns: <br>
    **SELECT capital, pop FROM countries**
- From the countries table, select all columns: <br>
    **SELECT * FROM countries**
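To try these queries without a PostgreSQL server, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. SQLite stands in for PostgreSQL (an assumption that holds for these basic queries), and the `countries` rows, including the rough `pop` figures in millions, are made-up sample data.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL.
# The rows below are made-up sample data (pop is a rough figure in millions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE countries "
    "(country TEXT, capital TEXT, continent TEXT, region TEXT, pop INTEGER)"
)
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?, ?, ?)",
    [
        ("Japan", "Tokyo", "Asia", "Eastern Asia", 125),
        ("China", "Beijing", "Asia", "Eastern Asia", 1410),
        ("India", "New Delhi", "Asia", "Southern Asia", 1420),
        ("France", "Paris", "Europe", "Western Europe", 68),
        ("Germany", "Berlin", "Europe", "Western Europe", 84),
        ("Kenya", "Nairobi", "Africa", "Eastern Africa", 55),
        ("Egypt", "Cairo", "Africa", "Northern Africa", 112),
    ],
)

# Projection: only the capital and pop columns are returned.
capital_pop = conn.execute("SELECT capital, pop FROM countries").fetchall()

# SELECT * returns every column of every row.
all_columns = conn.execute("SELECT * FROM countries").fetchall()

print(sorted(capital_pop)[:2])  # [('Beijing', 1410), ('Berlin', 84)]
print(len(all_columns[0]))      # 5
```

The results are sorted in Python before printing because, without ORDER BY, SQL does not guarantee any row order.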

### SELECT - DISTINCT
(To extract distinct values from a table)

- From the countries table, select the DISTINCT continent names: <br>
    **SELECT DISTINCT continent FROM countries**
- From the countries table, select DISTINCT combinations of the continent and region columns. When multiple columns are listed after DISTINCT, each unique combination of all those columns is returned, not the unique values of any single column: <br>
    **SELECT DISTINCT continent, region FROM countries**
- DISTINCT ON specific column(s) (PostgreSQL-specific): keeps only the first row for each distinct value of the column(s) in parentheses. Without an ORDER BY that starts with the same column(s), which row is kept is unpredictable: <br>
    **SELECT DISTINCT ON (continent) continent, region FROM countries**
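The difference between DISTINCT on one column and on several columns can be checked with a small sketch in Python's `sqlite3`, using the same made-up `countries` rows. SQLite does not support PostgreSQL's DISTINCT ON, so only the first two queries are reproduced here.

```python
import sqlite3

# Made-up sample data in an in-memory SQLite database (stand-in for PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE countries "
    "(country TEXT, capital TEXT, continent TEXT, region TEXT, pop INTEGER)"
)
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?, ?, ?)",
    [
        ("Japan", "Tokyo", "Asia", "Eastern Asia", 125),
        ("China", "Beijing", "Asia", "Eastern Asia", 1410),
        ("India", "New Delhi", "Asia", "Southern Asia", 1420),
        ("France", "Paris", "Europe", "Western Europe", 68),
        ("Germany", "Berlin", "Europe", "Western Europe", 84),
        ("Kenya", "Nairobi", "Africa", "Eastern Africa", 55),
        ("Egypt", "Cairo", "Africa", "Northern Africa", 112),
    ],
)

# One column: each continent appears once.
continents = conn.execute("SELECT DISTINCT continent FROM countries").fetchall()

# Two columns: each unique (continent, region) pair appears once,
# so a continent can show up several times.
pairs = conn.execute("SELECT DISTINCT continent, region FROM countries").fetchall()

print(sorted(c for (c,) in continents))  # ['Africa', 'Asia', 'Europe']
print(len(pairs))                        # 5
```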

### 2. WHERE
(To filter rows based on condition on specific fields)

From the countries table, select the countries and capital cities in Asia: <br>
**SELECT country, capital FROM countries
        WHERE continent='Asia'**
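A minimal sketch of the same filter through Python's `sqlite3` (SQLite standing in for PostgreSQL, made-up rows). The filter value is passed as a bound parameter (`?` is the placeholder style of `sqlite3`) instead of being pasted into the SQL string, which avoids SQL injection.

```python
import sqlite3

# Made-up sample data in an in-memory SQLite database (stand-in for PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE countries "
    "(country TEXT, capital TEXT, continent TEXT, region TEXT, pop INTEGER)"
)
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?, ?, ?)",
    [
        ("Japan", "Tokyo", "Asia", "Eastern Asia", 125),
        ("China", "Beijing", "Asia", "Eastern Asia", 1410),
        ("India", "New Delhi", "Asia", "Southern Asia", 1420),
        ("France", "Paris", "Europe", "Western Europe", 68),
        ("Germany", "Berlin", "Europe", "Western Europe", 84),
        ("Kenya", "Nairobi", "Africa", "Eastern Africa", 55),
        ("Egypt", "Cairo", "Africa", "Northern Africa", 112),
    ],
)

# WHERE keeps only the rows whose continent matches the bound value.
asia = conn.execute(
    "SELECT country, capital FROM countries WHERE continent = ?", ("Asia",)
).fetchall()

print(sorted(asia))  # [('China', 'Beijing'), ('India', 'New Delhi'), ('Japan', 'Tokyo')]
```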

### 3. GROUP BY
(To group rows based on common values in a column, and extract aggregate values)

From the countries table, get the total population of each continent: <br>
**SELECT SUM(pop) AS total_population, continent <br>
    FROM countries<br>
    GROUP BY continent**

Notes:
- When a GROUP BY clause is used, every column in the SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregate function
- Common aggregate functions are SUM, AVG, MAX, MIN, and COUNT
- An alias can be specified (e.g. total_population)
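Here is a minimal sketch of the GROUP BY query in Python's `sqlite3` (SQLite standing in for PostgreSQL, made-up rows). It also reads `cursor.description` to show that the alias becomes the result column's name.

```python
import sqlite3

# Made-up sample data in an in-memory SQLite database (stand-in for PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE countries "
    "(country TEXT, capital TEXT, continent TEXT, region TEXT, pop INTEGER)"
)
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?, ?, ?)",
    [
        ("Japan", "Tokyo", "Asia", "Eastern Asia", 125),
        ("China", "Beijing", "Asia", "Eastern Asia", 1410),
        ("India", "New Delhi", "Asia", "Southern Asia", 1420),
        ("France", "Paris", "Europe", "Western Europe", 68),
        ("Germany", "Berlin", "Europe", "Western Europe", 84),
        ("Kenya", "Nairobi", "Africa", "Eastern Africa", 55),
        ("Egypt", "Cairo", "Africa", "Northern Africa", 112),
    ],
)

cur = conn.execute(
    "SELECT SUM(pop) AS total_population, continent "
    "FROM countries GROUP BY continent"
)
# The alias given in SELECT is reported as the column name.
column_names = [col[0] for col in cur.description]
# One row per continent: (total_population, continent).
by_continent = {continent: total for total, continent in cur.fetchall()}

print(column_names)  # ['total_population', 'continent']
print(by_continent["Asia"], by_continent["Europe"], by_continent["Africa"])  # 2955 152 167
```

One caveat: SQLite is more permissive than PostgreSQL and accepts non-aggregated, non-grouped columns in SELECT. PostgreSQL rejects such queries, so test the GROUP BY rule against PostgreSQL itself.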

### 4. HAVING
(Aggregate functions cannot be used in the WHERE clause, because WHERE filters individual rows before they are grouped. To filter groups using aggregate values, use the HAVING clause.)

From the countries table, get the continents with a total population > 1000000: <br>
**SELECT SUM(pop) AS total_population, continent <br>
    FROM countries<br>
    GROUP BY continent<br>
    HAVING SUM(pop) > 1000000**
    
Notes:
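- A column alias defined in SELECT cannot be used in the WHERE or HAVING clauses, because those clauses are evaluated before the SELECT list is computed, so the alias does not exist yet. Repeat the expression instead (e.g. `HAVING SUM(pop) > 1000000`). PostgreSQL does accept output-column aliases in GROUP BY and ORDER BY.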
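This sketch, using Python's `sqlite3` with the made-up `countries` rows, contrasts WHERE (filters rows before grouping) with HAVING (filters groups after aggregation). Because `pop` is in millions in this sample data, the threshold is assumed to be 1000 rather than 1000000. The aggregate is repeated in HAVING, as PostgreSQL requires.

```python
import sqlite3

# Made-up sample data in an in-memory SQLite database (stand-in for PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE countries "
    "(country TEXT, capital TEXT, continent TEXT, region TEXT, pop INTEGER)"
)
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?, ?, ?)",
    [
        ("Japan", "Tokyo", "Asia", "Eastern Asia", 125),
        ("China", "Beijing", "Asia", "Eastern Asia", 1410),
        ("India", "New Delhi", "Asia", "Southern Asia", 1420),
        ("France", "Paris", "Europe", "Western Europe", 68),
        ("Germany", "Berlin", "Europe", "Western Europe", 84),
        ("Kenya", "Nairobi", "Africa", "Eastern Africa", 55),
        ("Egypt", "Cairo", "Africa", "Northern Africa", 112),
    ],
)

# HAVING filters whole groups using an aggregate value.
big = conn.execute(
    """
    SELECT SUM(pop) AS total_population, continent
    FROM countries
    GROUP BY continent
    HAVING SUM(pop) > 1000  -- threshold in millions for this toy data
    """
).fetchall()

# WHERE filters individual rows first; only countries with pop > 100
# are summed, so Africa's total shrinks and Europe disappears.
filtered_first = conn.execute(
    """
    SELECT SUM(pop) AS total_population, continent
    FROM countries
    WHERE pop > 100
    GROUP BY continent
    ORDER BY continent
    """
).fetchall()

print(big)             # [(2955, 'Asia')]
print(filtered_first)  # [(112, 'Africa'), (2955, 'Asia')]
```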

### 5. ORDER BY
(To sort in ascending or descending order)

From the countries table, get the total population of each continent, sorted in ascending order: <br>
**SELECT SUM(pop) AS total_population, continent <br>
    FROM countries<br>
    GROUP BY continent<br>
    ORDER BY total_population**

Notes:
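- The default order is ASC (ascending); use `ORDER BY total_population DESC` to sort in descending order.
- An alias can be used in the ORDER BY clause, because ordering happens after SELECT.
- You can order by multiple columns: `ORDER BY total_population DESC, continent ASC`
- A column does not have to appear in the SELECT list to be used in ORDER BY. The exceptions are SELECT DISTINCT, where ORDER BY expressions must appear in the SELECT list, and GROUP BY queries, where they must be grouped columns or aggregates.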
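A minimal sketch in Python's `sqlite3` (made-up rows, SQLite standing in for PostgreSQL) showing sorting by an alias with the default ascending order, and then a two-column sort where the second column breaks ties within the first.

```python
import sqlite3

# Made-up sample data in an in-memory SQLite database (stand-in for PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE countries "
    "(country TEXT, capital TEXT, continent TEXT, region TEXT, pop INTEGER)"
)
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?, ?, ?)",
    [
        ("Japan", "Tokyo", "Asia", "Eastern Asia", 125),
        ("China", "Beijing", "Asia", "Eastern Asia", 1410),
        ("India", "New Delhi", "Asia", "Southern Asia", 1420),
        ("France", "Paris", "Europe", "Western Europe", 68),
        ("Germany", "Berlin", "Europe", "Western Europe", 84),
        ("Kenya", "Nairobi", "Africa", "Eastern Africa", 55),
        ("Egypt", "Cairo", "Africa", "Northern Africa", 112),
    ],
)

# Sorting by the alias; ASC is applied when no direction is given.
ascending = conn.execute(
    """
    SELECT SUM(pop) AS total_population, continent
    FROM countries
    GROUP BY continent
    ORDER BY total_population
    """
).fetchall()

# Two sort keys: region A-Z first, then largest pop first within a region.
by_region = conn.execute(
    "SELECT country, region, pop FROM countries ORDER BY region ASC, pop DESC"
).fetchall()

print(ascending)      # [(152, 'Europe'), (167, 'Africa'), (2955, 'Asia')]
print(by_region[:3])  # [('Kenya', 'Eastern Africa', 55), ('China', 'Eastern Asia', 1410), ('Japan', 'Eastern Asia', 125)]
```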

### 6. LIMIT
(To specify the number of rows to return)

From the countries table, select the two largest continents by total population: <br>
**SELECT SUM(pop) AS total_population, continent <br>
    FROM countries<br>
    GROUP BY continent<br>
    ORDER BY total_population DESC<br>
    LIMIT 2**
    
Note: LIMIT restricts the number of rows the database actually returns, so the client does not have to receive or hold the full result set in memory. This differs from pandas `DataFrame.head()`, which only displays the first rows of a DataFrame that has already been fully loaded into memory.
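The same top-two query can be run through Python's `sqlite3` as a minimal sketch (SQLite standing in for PostgreSQL, made-up rows). Only two rows come back from the database; nothing is trimmed on the Python side.

```python
import sqlite3

# Made-up sample data in an in-memory SQLite database (stand-in for PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE countries "
    "(country TEXT, capital TEXT, continent TEXT, region TEXT, pop INTEGER)"
)
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?, ?, ?)",
    [
        ("Japan", "Tokyo", "Asia", "Eastern Asia", 125),
        ("China", "Beijing", "Asia", "Eastern Asia", 1410),
        ("India", "New Delhi", "Asia", "Southern Asia", 1420),
        ("France", "Paris", "Europe", "Western Europe", 68),
        ("Germany", "Berlin", "Europe", "Western Europe", 84),
        ("Kenya", "Nairobi", "Africa", "Eastern Africa", 55),
        ("Egypt", "Cairo", "Africa", "Northern Africa", 112),
    ],
)

# ORDER BY ... DESC puts the largest totals first; LIMIT keeps the first two.
top_two = conn.execute(
    """
    SELECT SUM(pop) AS total_population, continent
    FROM countries
    GROUP BY continent
    ORDER BY total_population DESC
    LIMIT 2
    """
).fetchall()

print(top_two)  # [(2955, 'Asia'), (167, 'Africa')]
```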

### LIMIT - OFFSET
(To specify the number of rows to return, after skipping initial `offset` number of rows)

From the countries table, select the second and third largest continents by total population (i.e. skip the first row): <br>
**SELECT SUM(pop) AS total_population, continent <br>
    FROM countries<br>
    GROUP BY continent<br>
    ORDER BY total_population DESC<br>
    LIMIT 2 OFFSET 1**   
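LIMIT and OFFSET are commonly combined for pagination. Below is a minimal sketch in Python's `sqlite3` (made-up rows). `fetch_page` is a hypothetical helper, not part of any library: LIMIT is the page size and OFFSET skips the earlier pages. A deterministic ORDER BY is needed so that pages do not overlap or shift between calls.

```python
import sqlite3

# Made-up sample data in an in-memory SQLite database (stand-in for PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE countries "
    "(country TEXT, capital TEXT, continent TEXT, region TEXT, pop INTEGER)"
)
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?, ?, ?)",
    [
        ("Japan", "Tokyo", "Asia", "Eastern Asia", 125),
        ("China", "Beijing", "Asia", "Eastern Asia", 1410),
        ("India", "New Delhi", "Asia", "Southern Asia", 1420),
        ("France", "Paris", "Europe", "Western Europe", 68),
        ("Germany", "Berlin", "Europe", "Western Europe", 84),
        ("Kenya", "Nairobi", "Africa", "Eastern Africa", 55),
        ("Egypt", "Cairo", "Africa", "Northern Africa", 112),
    ],
)

# The query from the notes: skip the largest continent, return the next two.
second_and_third = conn.execute(
    """
    SELECT SUM(pop) AS total_population, continent
    FROM countries
    GROUP BY continent
    ORDER BY total_population DESC
    LIMIT 2 OFFSET 1
    """
).fetchall()


def fetch_page(conn, page, page_size):
    """Hypothetical helper: return one 0-based page of continents by total population."""
    return conn.execute(
        """
        SELECT SUM(pop) AS total_population, continent
        FROM countries
        GROUP BY continent
        ORDER BY total_population DESC
        LIMIT ? OFFSET ?
        """,
        (page_size, page * page_size),  # OFFSET skips all earlier pages
    ).fetchall()


print(second_and_third)        # [(167, 'Africa'), (152, 'Europe')]
print(fetch_page(conn, 1, 2))  # [(152, 'Europe')]
```

Rows skipped by OFFSET still have to be computed by the database, so very large offsets can be slow.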

Overall Notes:
- The order of clauses matters in SQL. They must be written in this order: SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT/OFFSET.

### 7. JOIN (INNER, LEFT, RIGHT)
(To combine rows from two or more tables into a new, virtual result table)

An INNER JOIN creates a new (non-physical) table by combining rows that have matching values in the joined tables.

Example: query all employee information together with the details of each employee's department: <br>
**SELECT * FROM employees e<br>
    INNER JOIN departments d<br>
    ON e.department = d.department**

A LEFT JOIN returns all rows from the left table plus the matching rows from the right table. Where the right table has no matching row, its columns are filled with NULL. A RIGHT JOIN does the reverse: it keeps all rows from the right table.

Example: list the department of every employee next to the matching department from the departments table: <br>
**SELECT e.department, d.department FROM employees e<br>
    LEFT JOIN departments d<br>
    ON e.department = d.department**
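The join types can be compared with a minimal sketch in Python's `sqlite3`. The `employees` and `departments` rows (including the `division` column) are hypothetical sample data: HR has no row in `departments` and Finance has no employees, so each join type returns something different. SQLite versions before 3.39 do not support RIGHT JOIN, so it is emulated here by swapping the table order in a LEFT JOIN.

```python
import sqlite3

# Hypothetical sample tables in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, department TEXT, salary INTEGER)")
conn.execute("CREATE TABLE departments (department TEXT, division TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Alice", "Sales", 50000), ("Bob", "Sales", 60000),
     ("Carol", "IT", 80000), ("Dan", "IT", 70000), ("Eve", "HR", 55000)],
)
conn.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [("Sales", "Commercial"), ("IT", "Technology"), ("Finance", "Corporate")],
)

# INNER JOIN: only employees whose department exists in both tables (Eve is dropped).
inner = conn.execute("""
    SELECT e.first_name, d.division
    FROM employees e
    INNER JOIN departments d ON e.department = d.department
    ORDER BY e.first_name
""").fetchall()

# LEFT JOIN: every employee; d.department is NULL (None in Python) without a match.
left = conn.execute("""
    SELECT e.first_name, e.department, d.department
    FROM employees e
    LEFT JOIN departments d ON e.department = d.department
    ORDER BY e.first_name
""").fetchall()

# RIGHT JOIN emulated by swapping the tables: every department is kept.
right = conn.execute("""
    SELECT e.department, d.department
    FROM departments d
    LEFT JOIN employees e ON e.department = d.department
""").fetchall()

print(inner)    # [('Alice', 'Commercial'), ('Bob', 'Commercial'), ('Carol', 'Technology'), ('Dan', 'Technology')]
print(left[-1]) # ('Eve', 'HR', None)
print((None, "Finance") in right)  # True
```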

### 8. Date Functions
PostgreSQL makes it easy to extract parts of a date column. The most commonly used date functions are shown below: <br>
**SELECT<br>
    date_part('year', hire_date) AS year,<br>
    date_part('month', hire_date) AS month,<br>
    date_part('day', hire_date) AS day,<br>
    date_part('dow', hire_date) AS dayofweek,<br>
    to_char(hire_date, 'Dy') AS day_name,<br>
    to_char(hire_date, 'Month') AS month_name,<br>
    hire_date<br>
    FROM employees**

Notes:
- `date_part('dow', ...)` numbers the days of the week from 0 (Sunday) to 6 (Saturday)
- `to_char(hire_date, 'Month')` pads the month name with trailing blanks to 9 characters; use `'FMMonth'` to suppress the padding
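SQLite has no `date_part()` or `to_char()`, so this minimal sketch swaps in SQLite's `strftime()` to extract the same parts from a date stored as ISO-8601 text. It is an approximation of the PostgreSQL query above, not the same functions, and the `employees` rows are hypothetical. Like PostgreSQL's `'dow'`, `strftime('%w', ...)` numbers the days from 0 (Sunday) to 6 (Saturday).

```python
import sqlite3

# Hypothetical sample rows; dates are stored as 'YYYY-MM-DD' text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, hire_date TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Alice", "2019-03-15"), ("Carol", "2018-11-23")],
)

# strftime returns text, so each part is cast to INTEGER.
rows = conn.execute("""
    SELECT first_name,
           CAST(strftime('%Y', hire_date) AS INTEGER) AS year,
           CAST(strftime('%m', hire_date) AS INTEGER) AS month,
           CAST(strftime('%d', hire_date) AS INTEGER) AS day,
           CAST(strftime('%w', hire_date) AS INTEGER) AS dayofweek,  -- 0 = Sunday
           hire_date
    FROM employees
    ORDER BY hire_date
""").fetchall()

for row in rows:
    print(row)
# ('Carol', 2018, 11, 23, 5, '2018-11-23')
# ('Alice', 2019, 3, 15, 5, '2019-03-15')
```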

### 9. Subqueries
A subquery is a SQL query nested inside a larger query. A subquery may appear in:
- a SELECT clause
- a FROM clause
- a WHERE clause

Example: query the first_name, department, and salary of each employee, along with the maximum salary across all employees: <br>
**SELECT first_name, department, salary, (SELECT MAX(salary) FROM employees) AS max_salary<br>
    FROM employees**
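A minimal sketch in Python's `sqlite3` showing a subquery in each of the three places listed above, using hypothetical `employees` rows. The subquery in FROM (a derived table) is given an alias, `t`, which is accepted by every PostgreSQL version.

```python
import sqlite3

# Hypothetical sample rows in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Alice", "Sales", 50000), ("Bob", "Sales", 60000),
     ("Carol", "IT", 80000), ("Dan", "IT", 70000), ("Eve", "HR", 55000)],
)

# Subquery in SELECT: the same maximum is attached to every row.
with_max = conn.execute("""
    SELECT first_name, department, salary,
           (SELECT MAX(salary) FROM employees) AS max_salary
    FROM employees
    ORDER BY first_name
""").fetchall()

# Subquery in WHERE: keep only the top earner(s).
top_earners = conn.execute("""
    SELECT first_name FROM employees
    WHERE salary = (SELECT MAX(salary) FROM employees)
""").fetchall()

# Subquery in FROM: average of the per-department salary totals.
avg_dept_total = conn.execute("""
    SELECT AVG(dept_total)
    FROM (SELECT department, SUM(salary) AS dept_total
          FROM employees
          GROUP BY department) AS t
""").fetchone()[0]

print(with_max[0])     # ('Alice', 'Sales', 50000, 80000)
print(top_earners)     # [('Carol',)]
print(avg_dept_total)  # 105000.0
```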

### 10. Correlated Subqueries
A correlated subquery references columns of the outer query, so it is re-evaluated for every row the outer query considers. It is used whenever a subquery must return a different result (or set of results) for each candidate row of the main query.

Example: find the first name, salary, and department of each employee who earns more than the average salary of their department, together with that department average: <br>
**SELECT first_name, salary, department,<br>
    ROUND((SELECT AVG(salary)<br>
        FROM employees e2<br>
        WHERE e1.department = e2.department)) AS avg_salary_by_department<br>
    FROM employees e1<br>
    WHERE salary > (SELECT AVG(salary)<br>
        FROM employees e2<br>
        WHERE e1.department = e2.department)<br>
    ORDER BY salary**
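A minimal sketch of the correlated subquery in Python's `sqlite3`, with hypothetical `employees` rows. The inner `AVG(salary)` is recomputed for each outer row `e1`, restricted to that row's department, so every employee is compared with their own department's average.

```python
import sqlite3

# Hypothetical sample rows: Sales averages 55000, IT 75000, HR 55000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Alice", "Sales", 50000), ("Bob", "Sales", 60000),
     ("Carol", "IT", 80000), ("Dan", "IT", 70000), ("Eve", "HR", 55000)],
)

# e2 is correlated with the outer row e1 through e1.department.
rows = conn.execute("""
    SELECT first_name, salary, department,
           ROUND((SELECT AVG(salary) FROM employees e2
                  WHERE e2.department = e1.department)) AS avg_salary_by_department
    FROM employees e1
    WHERE salary > (SELECT AVG(salary) FROM employees e2
                    WHERE e2.department = e1.department)
    ORDER BY salary
""").fetchall()

print(rows)  # [('Bob', 60000, 'Sales', 55000.0), ('Carol', 80000, 'IT', 75000.0)]
```

Eve is excluded because her salary equals, rather than exceeds, her department's average.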

### 11. CASE WHEN Clause
The CASE expression sets the value of one column depending on the values in other columns.
It works like IF-ELSE logic, similar to the IF function in Excel.

Example: print each employee's first name, salary, and the average salary, plus a new column that shows whether the employee's salary is higher than average: <br>
**SELECT first_name, salary, (SELECT ROUND(AVG(salary)) FROM employees) AS average_salary,<br>
    (CASE WHEN salary > (SELECT AVG(salary) FROM employees) THEN 'higher_than_average'<br>
    ELSE 'lower_than_average' END) AS salary_case<br>
    FROM employees**
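A minimal sketch of the CASE WHEN query in Python's `sqlite3`, with hypothetical `employees` rows whose overall average salary is 63000. Note that the ELSE branch also catches salaries exactly equal to the average.

```python
import sqlite3

# Hypothetical sample rows; the overall average salary is 63000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Alice", "Sales", 50000), ("Bob", "Sales", 60000),
     ("Carol", "IT", 80000), ("Dan", "IT", 70000), ("Eve", "HR", 55000)],
)

# CASE picks a label per row by comparing the salary with the overall average.
rows = conn.execute("""
    SELECT first_name, salary,
           (SELECT ROUND(AVG(salary)) FROM employees) AS average_salary,
           CASE WHEN salary > (SELECT AVG(salary) FROM employees)
                THEN 'higher_than_average'
                ELSE 'lower_than_average'
           END AS salary_case
    FROM employees
    ORDER BY first_name
""").fetchall()

for row in rows:
    print(row)
# ('Alice', 50000, 63000.0, 'lower_than_average')
# ('Bob', 60000, 63000.0, 'lower_than_average')
# ('Carol', 80000, 63000.0, 'higher_than_average')
# ('Dan', 70000, 63000.0, 'higher_than_average')
# ('Eve', 55000, 63000.0, 'lower_than_average')
```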

### 12. Comments
- `--` for single-line or inline comments
- `/* ... */` for multi-line comments

### 13. Window Functions
Window functions apply aggregate and ranking functions over a particular window (set of rows). Unlike GROUP BY, they do not collapse the rows: every input row stays in the result. The OVER clause defines that window and can do two things:
- Partition the rows into sets of rows (using the PARTITION BY clause).
- Order the rows within those partitions (using the ORDER BY clause).

Aggregate functions such as SUM(), COUNT(), AVG(), MAX(), and MIN() applied over a window are called aggregate window functions.
### 13.1. Aggregation Examples
The following query shows each employee's salary alongside the average salary of their department: <br>
**SELECT first_name, salary, department,<br>
    ROUND(AVG(salary) OVER (PARTITION BY department)) AS avg_salary_by_dept<br>
    FROM employees<br>
    ORDER BY salary DESC**
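A minimal sketch comparing the window-function query with a plain GROUP BY in Python's `sqlite3`, using hypothetical `employees` rows. It assumes SQLite 3.25 or newer, the first version with window functions. The window version keeps all five rows, while GROUP BY collapses them to one row per department.

```python
import sqlite3

# Hypothetical sample rows (requires SQLite >= 3.25 for window functions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Alice", "Sales", 50000), ("Bob", "Sales", 60000),
     ("Carol", "IT", 80000), ("Dan", "IT", 70000), ("Eve", "HR", 55000)],
)

# Window aggregate: each row keeps its own columns plus its department's average.
rows = conn.execute("""
    SELECT first_name, salary, department,
           ROUND(AVG(salary) OVER (PARTITION BY department)) AS avg_salary_by_dept
    FROM employees
    ORDER BY salary DESC
""").fetchall()

# Plain GROUP BY: one row per department.
grouped = conn.execute(
    "SELECT department, ROUND(AVG(salary)) FROM employees GROUP BY department"
).fetchall()

print(rows[0])                  # ('Carol', 80000, 'IT', 75000.0)
print(len(rows), len(grouped))  # 5 3
```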
### 13.2. Ranking the Values
The RANK() function is a window function that assigns a rank to each row within a partition of a result set. Rows with equal values receive the same rank, and the following rank numbers are skipped (e.g. 1, 2, 2, 4).

The following example orders the table by salary (descending), so a rank of 1 corresponds to the highest salary: <br>
**SELECT first_name, salary, RANK() OVER (ORDER BY salary DESC) AS salary_rank<br>
    FROM employees**
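A minimal sketch of RANK() in Python's `sqlite3` (SQLite 3.25 or newer assumed), with hypothetical rows that include a salary tie between Dan and Frank. DENSE_RANK() is added alongside for comparison: it also gives ties the same rank but does not skip the next number.

```python
import sqlite3

# Hypothetical sample rows with a deliberate tie at 70000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Alice", 50000), ("Bob", 60000), ("Carol", 80000),
     ("Dan", 70000), ("Eve", 55000), ("Frank", 70000)],
)

rows = conn.execute("""
    SELECT first_name, salary,
           RANK()       OVER (ORDER BY salary DESC) AS salary_rank,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_salary_rank
    FROM employees
    ORDER BY salary DESC, first_name
""").fetchall()

for row in rows:
    print(row)
# ('Carol', 80000, 1, 1)
# ('Dan', 70000, 2, 2)
# ('Frank', 70000, 2, 2)
# ('Bob', 60000, 4, 3)
# ('Eve', 55000, 5, 4)
# ('Alice', 50000, 6, 5)
```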
These are some of the queries most commonly used by data professionals.
Hope you found it helpful! Thanks for reading!