# 1. Introduction

## 1.1 What is SQL ?

- SQL stands for Structured Query Language.
- It lets you access and manipulate databases.
- Although SQL is an ANSI/ISO standard, there are different versions/flavors of the language, depending on the database program used (e.g. MySQL, MS Access, SQL Server, Oracle).
- All versions are very similar in their construction and in their support for the main keywords (such as SELECT, UPDATE, DELETE, INSERT, WHERE).


## 1.2 What can SQL do ?


SQL is one of the most important technical skills/tools for a data professional. Some of the things it can do:

- SQL can create and delete databases
- SQL can create and delete tables in a database
- SQL can insert, delete and update records in a database
- SQL can retrieve data from a database
- SQL can also clean retrieved data before further manipulation (with Python, for example) to carry out an analysis.
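
The operations listed above can be tried out end to end with a minimal sketch. It assumes Python's built-in `sqlite3` module as a stand-in for a full database server; the table name `demo` and its rows are hypothetical. SQLite has no `CREATE DATABASE` statement, so opening the connection plays that role here.

```python
import sqlite3

# In-memory SQLite database as a stand-in (assumption; not the server used in this tutorial).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table (hypothetical name and columns).
cur.execute("CREATE TABLE demo (name TEXT, value REAL)")

# Insert, update, and delete records.
cur.executemany("INSERT INTO demo VALUES (?, ?)", [("a", 1.0), ("b", 2.0), ("c", 3.0)])
cur.execute("UPDATE demo SET value = value * 10 WHERE name = 'a'")
cur.execute("DELETE FROM demo WHERE name = 'c'")

# Retrieve what is left.
rows = cur.execute("SELECT name, value FROM demo ORDER BY name").fetchall()
print(rows)

# Delete the table again.
cur.execute("DROP TABLE demo")
conn.close()
```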

## 1.3 What are the main commands in SQL ?


The main commands of SQL can be easily remembered using this template SQL query, in which the clauses appear in the order they must be written:

   **SELECT** columns, aggregate(column) <br>
   FROM table_or_subquery  <br>
   **INNER/OUTER JOIN** other_table ON condition  <br>
   **WHERE** condition  <br>
   **GROUP BY** columns  <br>
   **HAVING** condition_after_aggregation  <br>
   **ORDER BY** column ASC|DESC  <br>
   **LIMIT** number;  <br>
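
To see the template in action, here is a small sketch that fills in most of its clauses (the JOIN is left out for brevity). It assumes Python's built-in `sqlite3` module as a stand-in database, and the rows are hypothetical placeholders, not real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE countries (country TEXT, population REAL, fertility REAL, continent TEXT)"
)
# Hypothetical placeholder rows, for illustration only.
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?, ?)",
    [
        ("Country A", 100, 1.5, "Asia"),
        ("Country B", 200, 2.5, "Asia"),
        ("Country C", 50, 3.0, "Africa"),
        ("Country D", 80, 5.0, "Africa"),
        ("Country E", 60, 1.8, "Europe"),
    ],
)

# Clauses in the same order as the template.
query = """
SELECT continent, COUNT(*) AS n_country, AVG(fertility) AS avg_fer
FROM countries
WHERE population > 55      -- row filter, applied before grouping
GROUP BY continent
HAVING COUNT(*) >= 2       -- group filter, applied after aggregation
ORDER BY avg_fer DESC
LIMIT 1;
"""
template_rows = conn.execute(query).fetchall()
print(template_rows)
```

Only the Asia group survives here: the `WHERE` clause removes Country C first, which leaves Africa with a single row that then fails the `HAVING` condition.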

# 2. Warm-up 

## 2.1 Q1

As a warm-up and recap of yesterday's lessons, let's:

- create a new DB and give it any name (e.g. examples)
- create a new table, named **countries**, and load the csv file named **large_countries_2015.csv** in it

## 2.2 A1

*in psql*

CREATE DATABASE examples;

\c examples;

CREATE TABLE countries ( <br>
country VARCHAR(60), <br>
population NUMERIC, <br>
fertility NUMERIC, <br>
continent VARCHAR(60)); <br>

COPY countries <br>
FROM 'csv_file_path' <br>
DELIMITER ',' <br>
CSV HEADER; <br>
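
The same create-and-load steps can also be sketched outside psql. The example below assumes Python's built-in `csv` and `sqlite3` modules and an in-memory SQLite database instead of PostgreSQL; the CSV text is a hypothetical stand-in, since the contents of the real file are not shown here.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for the CSV file (same column layout as the table above).
csv_text = """country,population,fertility,continent
Country A,1000000,1.5,Asia
Country B,2500000,2.5,Africa
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE countries (country TEXT, population NUMERIC, fertility NUMERIC, continent TEXT)"
)

reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip the header row, similar to the HEADER option of COPY
conn.executemany("INSERT INTO countries VALUES (?, ?, ?, ?)", reader)

n_rows = conn.execute("SELECT COUNT(*) FROM countries").fetchone()[0]
print(n_rows)
```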



# 3. Queries

## Q2: Examine the entire table. How many rows are there ?

## A2: 

SELECT * <br>
FROM countries; 

SELECT COUNT(\*) <br>
FROM countries;

## Q3: Select first 5 rows in table

## A3:

SELECT * <br>
FROM countries <br>
LIMIT 5;

## Q4: Select country and population columns in the first 10 rows of the table

## A4:

SELECT country, population <br>
FROM countries <br>
LIMIT 10;

## Q5: Update table so that population is in millions and round to the nearest million

## A5:

UPDATE countries <br>
SET population = round(population/1000000);
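
One thing worth keeping in mind: `UPDATE` overwrites the column in place, so running the statement a second time rescales values that are already in millions. The sketch below illustrates this, assuming Python's built-in `sqlite3` module as a stand-in database and a single hypothetical row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (country TEXT, population REAL)")
# Hypothetical value, for illustration only.
conn.execute("INSERT INTO countries VALUES ('Country A', 1234567890)")

rescale = "UPDATE countries SET population = ROUND(population / 1000000.0)"

# First run: raw headcount -> millions.
conn.execute(rescale)
after_first = conn.execute("SELECT population FROM countries").fetchone()[0]

# Second run: the already-rescaled value is divided again.
conn.execute(rescale)
after_second = conn.execute("SELECT population FROM countries").fetchone()[0]

print(after_first, after_second)
```

If the original values are still needed, one option is to compute the rescaled figure in a `SELECT` instead of updating the table.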

## Q6: Select rows where the country is in Asia, order by fertility rate

## A6:

SELECT * <br>
FROM countries <br>
WHERE continent = 'Asia' <br>
ORDER BY fertility;

## Q7: Select rows where country is NOT in Asia and fertility rate is greater than 2

## A7:

SELECT * <br>
FROM countries <br>
WHERE continent <> 'Asia' AND fertility > 2;

## Q8: Select rows where country is in (Asia OR Africa) AND fertility rate is greater than 2

## A8:

SELECT * <br>
FROM countries <br>
WHERE continent IN ('Asia', 'Africa') AND fertility > 2;
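
Using `IN` also avoids a precedence trap: in SQL, `AND` binds more tightly than `OR`, so writing the condition with a bare `OR` and no parentheses changes its meaning. A minimal sketch, assuming Python's built-in `sqlite3` module and hypothetical placeholder rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (country TEXT, fertility REAL, continent TEXT)")
# Hypothetical placeholder rows, for illustration only.
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?)",
    [
        ("Country A", 1.5, "Asia"),
        ("Country B", 2.5, "Asia"),
        ("Country C", 3.0, "Africa"),
        ("Country D", 5.0, "Africa"),
        ("Country E", 1.8, "Europe"),
    ],
)

# (Asia OR Africa) AND fertility > 2
with_in = conn.execute(
    "SELECT country FROM countries "
    "WHERE continent IN ('Asia', 'Africa') AND fertility > 2 "
    "ORDER BY country"
).fetchall()

# Without parentheses this is read as: Asia OR (Africa AND fertility > 2)
without_parens = conn.execute(
    "SELECT country FROM countries "
    "WHERE continent = 'Asia' OR continent = 'Africa' AND fertility > 2 "
    "ORDER BY country"
).fetchall()

print(with_in)
print(without_parens)
```

The second query also returns Country A, an Asian row with fertility below 2, because the fertility condition only applies to the Africa branch.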

## Q9: Which countries have a fertility rate between 1.4 and 2 ? What is their population ?

## A9: 

SELECT country, population, fertility <br>
FROM countries <br>
WHERE fertility BETWEEN 1.4 AND 2; <br>
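
Note that `BETWEEN` includes both endpoints, so rows with a fertility of exactly 1.4 or exactly 2 are returned. A quick check, assuming Python's built-in `sqlite3` module and hypothetical values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (fertility REAL)")
# Hypothetical values chosen to sit on and around the boundaries.
conn.executemany(
    "INSERT INTO countries VALUES (?)", [(1.3,), (1.4,), (1.7,), (2.0,), (2.1,)]
)

between_rows = conn.execute(
    "SELECT fertility FROM countries WHERE fertility BETWEEN 1.4 AND 2 ORDER BY fertility"
).fetchall()
print(between_rows)
```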

## Q10: Which countries are in continents that start with the letter A ?

## A10:

SELECT country, continent <br>
FROM countries <br>
WHERE continent LIKE 'A%'; <br>


## Q11: How many continents are in the table ?



## A11:

SELECT DISTINCT continent <br>
FROM countries; <br>

SELECT COUNT(DISTINCT continent) <br>
FROM countries;
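
The two queries answer slightly different questions: the first lists the distinct values, the second returns only how many there are. A small sketch, assuming Python's built-in `sqlite3` module and hypothetical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (country TEXT, continent TEXT)")
# Hypothetical placeholder rows, for illustration only.
conn.executemany(
    "INSERT INTO countries VALUES (?, ?)",
    [
        ("Country A", "Asia"),
        ("Country B", "Asia"),
        ("Country C", "Africa"),
        ("Country D", "Africa"),
        ("Country E", "Europe"),
    ],
)

# One row per distinct continent.
distinct_rows = conn.execute(
    "SELECT DISTINCT continent FROM countries ORDER BY continent"
).fetchall()

# A single number: how many distinct continents there are.
n_continents = conn.execute(
    "SELECT COUNT(DISTINCT continent) FROM countries"
).fetchone()[0]

print(distinct_rows, n_continents)
```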

## Q12: What is the average population and fertility rate of countries in Asia ?



## A12: 

SELECT AVG(population) AS avg_population,  AVG(fertility) AS avg_fertility <br>
FROM countries <br>
WHERE continent = 'Asia';


## Q13: What is the average population, minimum and maximum fertility rate per continent ?

## A13:

SELECT continent, <br>
       AVG(population) AS avg_pop, <br>
       MIN(fertility) AS min_fer, <br>
       MAX(fertility) AS max_fer <br>
FROM countries <br>
GROUP BY continent <br>
ORDER BY continent; <br>

## Q14: Same as Q13, but filter for continents with minimum fertility greater than 1.7. How many countries are represented in these continents ?

## A14:

SELECT continent, <br>
COUNT(country) AS n_country, <br>
AVG(population) AS avg_pop, <br>
MIN(fertility) AS min_fer, <br>
MAX(fertility) AS max_fer <br>
FROM countries <br>
GROUP BY continent <br>
HAVING MIN(fertility) > 1.7 <br>
ORDER BY continent;
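
The condition has to go in `HAVING` rather than `WHERE` because it refers to an aggregate. A `WHERE fertility > 1.7` filter would instead drop individual rows before grouping and give a different answer. The sketch below contrasts the two, assuming Python's built-in `sqlite3` module and hypothetical placeholder rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (country TEXT, fertility REAL, continent TEXT)")
# Hypothetical placeholder rows, for illustration only.
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?)",
    [
        ("Country A", 1.5, "Asia"),
        ("Country B", 2.5, "Asia"),
        ("Country C", 3.0, "Africa"),
        ("Country D", 5.0, "Africa"),
        ("Country E", 1.8, "Europe"),
    ],
)

# HAVING: keep only continents whose minimum fertility exceeds 1.7.
having_rows = conn.execute(
    "SELECT continent, COUNT(country), MIN(fertility) FROM countries "
    "GROUP BY continent HAVING MIN(fertility) > 1.7 ORDER BY continent"
).fetchall()

# WHERE: drop low-fertility countries first, then group what remains.
where_rows = conn.execute(
    "SELECT continent, COUNT(country), MIN(fertility) FROM countries "
    "WHERE fertility > 1.7 GROUP BY continent ORDER BY continent"
).fetchall()

print(having_rows)
print(where_rows)
```

With `WHERE`, Asia still shows up (with one country) even though its true minimum fertility is below the threshold.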


## Q15: Which countries have a fertility rate higher than the average of their respective continents ?



## A15:

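SELECT country, continent, fertility, <br>
       ROUND(cont_avg_fer, 2) AS cont_avg_fer, <br>
       ROUND(fertility - cont_avg_fer, 2) AS cont_fer_diff <br>
FROM ( <br>
SELECT *, AVG(fertility) OVER (PARTITION BY continent) AS cont_avg_fer <br>
FROM countries <br>
) AS with_avg <br>
WHERE fertility > cont_avg_fer <br>
ORDER BY country;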
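
A window function cannot be used directly in a `WHERE` clause, so keeping only the countries above their continent's average needs the window result to be computed in a subquery (or CTE) first. A minimal sketch of that pattern, assuming Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer) and hypothetical placeholder rows; the alias `with_avg` is also just illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (country TEXT, fertility REAL, continent TEXT)")
# Hypothetical placeholder rows, for illustration only.
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?)",
    [
        ("Country A", 1.5, "Asia"),
        ("Country B", 2.5, "Asia"),
        ("Country C", 3.0, "Africa"),
        ("Country D", 5.0, "Africa"),
        ("Country E", 1.8, "Europe"),
    ],
)

query = """
SELECT country, fertility, cont_avg_fer
FROM (
    -- Attach the continent average to every row without collapsing the rows.
    SELECT *, AVG(fertility) OVER (PARTITION BY continent) AS cont_avg_fer
    FROM countries
) AS with_avg
WHERE fertility > cont_avg_fer
ORDER BY country;
"""
above_avg = conn.execute(query).fetchall()
print(above_avg)
```

Country E is the only row in its continent, so its fertility equals the continent average and it is not returned.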


