## ☑️ Part 1: Data exploration using SQL

- Complete the following questions
- Make sure you run the following code cells before you attempt any of the questions
- We will work with the `flights.db` database throughout this workbook

First, import the pandas and sqlite3 libraries and create a connection to the `flights.db` database, located in the `data` folder:

In [None]:
import pandas as pd
import sqlite3

conn = sqlite3.connect('data/flights.db')

A database might have multiple tables. It's a good idea to do an initial exploration of the database by first querying the `sqlite_master` table to see what tables are in the database.

Run the following code cell to show all the tables in the `flights.db` database:

In [None]:
query = """
SELECT name 
FROM sqlite_master 
WHERE type = 'table';
"""
df = pd.read_sql_query(query, conn)
df


Run the following code cell to show the schema for each table in the `flights.db` database:

In [None]:
for table in ['airports','airlines','routes']:
    
    query = f"""
    SELECT sql 
    FROM sqlite_master 
    WHERE name = '{table}';
    """
    
    df = pd.read_sql_query(query, conn)
    # The `sql` column holds the CREATE TABLE statement for the table
    print(df.values[0, 0])

# SQL Statements

**Q1)** Refer to the `airlines` table and show all columns of the table:
- Then, show the first 5 rows using the pandas `head()` method
- Use `%%time` at the beginning of the cell to measure the query execution time

Please note you have been provided with the code for this question to carry out the necessary analysis work. Simply run the code cell to produce the desired results.

In [None]:
%%time

query = """
SELECT * 
FROM airlines;
"""

df = pd.read_sql_query(query, conn)
df.head()

**Q2)** Refer to the `airlines` table. Do the same as above, but this time return only 5 rows using SQL. Measure the time again using `%%time`. The query should run faster this time, because only 5 rows are read from the database instead of the whole table.

Please note you have been provided with the code for this question to carry out the necessary analysis work. Simply run the code cell to produce the desired results.

In [None]:
%%time

query = """
SELECT * 
FROM airlines
LIMIT 5;
"""

df = pd.read_sql_query(query, conn)
df
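
The difference between Q1 and Q2 can be sketched on a toy in-memory table. The `numbers` table and the `demo` connection below are made up for illustration and do not touch `flights.db`; the sketch uses plain `sqlite3` so that it runs on its own.

```python
import sqlite3

# Hypothetical toy database, separate from the workbook's `conn`.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE numbers (n INTEGER)")
demo.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(1000)])

# Q1 style: fetch every row, then keep 5 of them in Python (like df.head()).
all_rows = demo.execute("SELECT n FROM numbers").fetchall()
first_five_python = all_rows[:5]

# Q2 style: let the database stop after 5 rows.
first_five_sql = demo.execute("SELECT n FROM numbers LIMIT 5").fetchall()

print(len(all_rows), len(first_five_sql))  # 1000 5
demo.close()
```

With `LIMIT`, only 5 rows ever leave the database, which is why the second approach tends to be faster on large tables.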

**Q3)** Refer to the `country` column in the `airlines` table. Find 7 airlines from the United Kingdom.

- To extract all records that match `United Kingdom`, use the filter condition `country = 'United Kingdom'`

See the code syntax below for some guidance:
```SQL
SELECT *
FROM airlines
WHERE <filter_condition>
LIMIT 7;
```

In [None]:
#add your code below
#query = ...


df = pd.read_sql_query(query, conn)
df

#  Advanced Filtering with WHERE
## Predicates

**Q4)** Refer to the `airlines` table. Find the airlines whose `icao` is equal to `ACC` or `TWF`.

See the code syntax below for some guidance:
```SQL
SELECT *
FROM airlines
WHERE icao IN <list_of_values>;
```
The `<list_of_values>` compared to the column with the `IN` operator must be enclosed in parentheses, e.g. `('ACC', 'TWF')`.
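
As a rough illustration of how `IN` behaves, here is a minimal sketch on a made-up `carriers` table (the table, its rows, and the `demo` connection are hypothetical and not part of `flights.db`):

```python
import sqlite3

# Hypothetical toy database, separate from the workbook's `conn`.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE carriers (name TEXT, code TEXT)")
demo.executemany(
    "INSERT INTO carriers VALUES (?, ?)",
    [("Alpha Air", "AAA"), ("Bravo Jet", "BBB"), ("Charlie Wings", "CCC")],
)

# IN keeps a row when the column equals any value in the parenthesised list.
in_rows = demo.execute(
    "SELECT name FROM carriers WHERE code IN ('AAA', 'CCC') ORDER BY name"
).fetchall()

# The same filter written with OR, for comparison.
or_rows = demo.execute(
    "SELECT name FROM carriers WHERE code = 'AAA' OR code = 'CCC' ORDER BY name"
).fetchall()

print(in_rows)  # [('Alpha Air',), ('Charlie Wings',)]
demo.close()
```

Both queries return the same rows; `IN` is simply a shorter way to write a chain of `OR` equality tests.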

In [None]:
#add your code below
#query = ...


df = pd.read_sql_query(query, conn)
df.head(5)

**Q5)** Refer to the `name` column in the `airlines` table. Find 5 airlines whose names contain the word `Flight`.

See the code syntax below for some guidance:
```SQL
SELECT *
FROM airlines
WHERE name LIKE <pattern>
LIMIT 5;
```
For example, the pattern `'%Airline%'` matches any string that contains the word "Airline". In SQLite, `LIKE` is case-insensitive for ASCII letters by default, so uppercase and lowercase both match.
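
The wildcard behaviour can be checked on a small made-up table. This is only a sketch: the `carriers` table and its rows are hypothetical, and it assumes SQLite's default setting, in which `LIKE` ignores ASCII case.

```python
import sqlite3

# Hypothetical toy database, separate from the workbook's `conn`.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE carriers (name TEXT)")
demo.executemany(
    "INSERT INTO carriers VALUES (?)",
    [("Sky Airline",), ("AIRLINE ONE",), ("Cargo Express",)],
)

# % stands for any run of characters (including none) on either side.
like_rows = demo.execute(
    "SELECT name FROM carriers WHERE name LIKE '%airline%' ORDER BY name"
).fetchall()

print(like_rows)  # [('AIRLINE ONE',), ('Sky Airline',)]
demo.close()
```

Note that the lowercase pattern still matches both the mixed-case and the all-uppercase name.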

In [None]:
#add your code below
#query = ...


df = pd.read_sql_query(query, conn)
df

**Q6)** Refer to the `name`, `active`, and `callsign` columns in the `airlines` table. Find 5 active airlines that have a non-NULL callsign value.

- Look for all records that match both conditions: `active = 'Y'` and `callsign IS NOT NULL`

See the code syntax below for some guidance:
```SQL
SELECT name, active, callsign
FROM airlines
WHERE <condition1> AND <condition2>
LIMIT 5;
```
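
A short sketch of how `AND` and `IS NOT NULL` interact, using a made-up `carriers` table (all names and rows are hypothetical). It also shows two details that are easy to miss: an empty string is not NULL, and comparing with `= NULL` never matches anything.

```python
import sqlite3

# Hypothetical toy database, separate from the workbook's `conn`.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE carriers (name TEXT, active TEXT, callsign TEXT)")
demo.executemany(
    "INSERT INTO carriers VALUES (?, ?, ?)",
    [
        ("Alpha Air", "Y", "ALPHA"),
        ("Bravo Jet", "Y", None),        # None is stored as NULL
        ("Charlie Wings", "N", "CHARLIE"),
        ("Delta Hop", "Y", ""),          # empty string, which is not NULL
    ],
)

# Both conditions must hold for a row to be kept.
not_null_rows = demo.execute(
    "SELECT name, callsign FROM carriers "
    "WHERE active = 'Y' AND callsign IS NOT NULL ORDER BY name"
).fetchall()

# `= NULL` evaluates to NULL rather than true, so no row passes the filter.
equals_null_rows = demo.execute(
    "SELECT name FROM carriers WHERE callsign = NULL"
).fetchall()

print(not_null_rows)     # [('Alpha Air', 'ALPHA'), ('Delta Hop', '')]
print(equals_null_rows)  # []
demo.close()
```

If empty strings should also be excluded, an extra condition such as `callsign <> ''` would be needed.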

In [None]:
#add your code below
#query = ...

df = pd.read_sql_query(query, conn)
df

#  Sorting results

**Q7)** Refer to the `name`, `country`, and `altitude` columns in the `airports` table. Find the 5 airports with the highest `altitude`.

See the code syntax below for some guidance:
```SQL
SELECT name, country, altitude
FROM airports
ORDER BY <column_name> [ASC/DESC]
LIMIT 5;
```
Please note you have been provided with the code for this question to carry out the necessary analysis work. Simply run the code cell to produce the desired results.

In [None]:
query = """
SELECT name, country, altitude
FROM airports
ORDER BY altitude DESC
LIMIT 5;
"""

df = pd.read_sql_query(query, conn)
df

Please note that the `altitude` column in the `airports` table has been assigned the TEXT data type, so its values are sorted as strings, character by character, rather than as numbers. For example, `'950'` sorts above `'12000'` in descending order because `'9'` comes after `'1'`.

The report you see above is therefore misleading. In the following section, we will use the SQL `CAST()` function to correct the ordering.

## ☑️ Part 2: Column operations

**Q8)** Refer to the `airlines` table. How many airline names start with a digit between `0` and `9`?
- You can use `COUNT(*)` to count the number of rows returned

Please note you have been provided with the code for this question to carry out the necessary analysis work. Simply run the code cell to produce the desired results.

In [None]:
query = """
SELECT COUNT(*) 
FROM airlines 
WHERE SUBSTR(name, 1, 1) BETWEEN '0' AND '9';
"""
df = pd.read_sql_query(query, conn)
df

**Q9)** Refer to the `country` column in the `airlines` table. How many countries have at least one airline?

- Use the `IS NOT NULL` condition to filter out any NULL values in the `country` column

- Use the `DISTINCT` keyword to keep only distinct values and the `COUNT()` function to count the distinct values in the `country` column

See the code syntax below for some guidance:
```SQL
SELECT COUNT(DISTINCT <column_name>)
FROM airlines 
WHERE <column_name> IS NOT NULL;
```
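
To see how the different forms of `COUNT` treat duplicates and NULLs, here is a minimal sketch on a made-up `carriers` table (hypothetical names and rows, not taken from `flights.db`):

```python
import sqlite3

# Hypothetical toy database, separate from the workbook's `conn`.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE carriers (name TEXT, country TEXT)")
demo.executemany(
    "INSERT INTO carriers VALUES (?, ?)",
    [
        ("Alpha Air", "France"),
        ("Bravo Jet", "France"),
        ("Charlie Wings", "Spain"),
        ("Delta Hop", None),  # None is stored as NULL
    ],
)

# COUNT(*) counts rows, COUNT(col) skips NULLs,
# COUNT(DISTINCT col) skips NULLs and duplicates.
total, non_null, distinct = demo.execute(
    "SELECT COUNT(*), COUNT(country), COUNT(DISTINCT country) FROM carriers"
).fetchone()

print(total, non_null, distinct)  # 4 3 2
demo.close()
```

Because `COUNT(DISTINCT ...)` already ignores NULLs, the `IS NOT NULL` filter does not change the result here; it mainly makes the intent explicit.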

In [None]:
#add your code below
#query = ...


df = pd.read_sql_query(query, conn)
df

**Q10)** Refer to the `altitude` column in the `airports` table. Find the 5 airports with the highest `altitude`.

This question is similar to what you did before in **Q7**.

See the code syntax below from **Q7**:
```SQL
SELECT name, country, altitude
FROM airports
ORDER BY altitude DESC
LIMIT 5;
```
- In the `ORDER BY` clause, use the `CAST()` function to convert the `altitude` column to the `INT` data type, e.g. `CAST(altitude AS INT)`
- The values should now be ordered correctly
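
The effect of sorting numbers stored as TEXT, and of casting them first, can be sketched on a made-up `heights` table (the table and its values are hypothetical and chosen only to make the difference visible):

```python
import sqlite3

# Hypothetical toy database, separate from the workbook's `conn`.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE heights (site TEXT, altitude TEXT)")
demo.executemany(
    "INSERT INTO heights VALUES (?, ?)",
    [("A", "950"), ("B", "12000"), ("C", "80")],
)

# TEXT values are compared character by character, so '9...' beats '1...'.
text_order = [
    row[0]
    for row in demo.execute("SELECT altitude FROM heights ORDER BY altitude DESC")
]

# Casting to INT makes the comparison numeric.
numeric_order = [
    row[0]
    for row in demo.execute(
        "SELECT altitude FROM heights ORDER BY CAST(altitude AS INT) DESC"
    )
]

print(text_order)     # ['950', '80', '12000']
print(numeric_order)  # ['12000', '950', '80']
demo.close()
```

Only the sort key is cast; the selected column is still returned as text.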

In [None]:
#add your code below
#query = ...


df = pd.read_sql_query(query, conn)
df

**Q11)** Refer to the `name` and `altitude` columns in the `airports` table. Filter the data to get all airports in the `United Kingdom`.

See the code syntax below for some guidance:
```SQL
SELECT name, altitude
FROM airports
WHERE country='United Kingdom';
```
Now use a `CASE` expression to create a new calculated column called `altitude2` that reflects the conditions below:
- Return "Low" if the altitude is lower than 100m
- Otherwise, return "Medium" if the altitude is lower than 500m
- Otherwise, return "High"

See the code syntax below for some guidance:
```SQL
CASE
    WHEN CAST(altitude AS INT) < 100 THEN 'Low'
    WHEN CAST(altitude AS INT) < 500 THEN 'Medium'
    ELSE 'High'
END AS altitude2
```
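
As an illustration of where a `CASE` expression sits inside a full query, here is a sketch on a made-up `heights` table. The table, its rows, and the `band` column name are hypothetical; the thresholds mirror the ones above.

```python
import sqlite3

# Hypothetical toy database, separate from the workbook's `conn`.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE heights (site TEXT, altitude TEXT)")
demo.executemany(
    "INSERT INTO heights VALUES (?, ?)",
    [("Harbour", "12"), ("Valley", "250"), ("Ridge", "1800")],
)

# CASE is written in the SELECT list like any other column expression.
# Branches are tested top to bottom and the first match wins.
case_query = """
SELECT site,
       CASE
           WHEN CAST(altitude AS INT) < 100 THEN 'Low'
           WHEN CAST(altitude AS INT) < 500 THEN 'Medium'
           ELSE 'High'
       END AS band
FROM heights
ORDER BY CAST(altitude AS INT);
"""
bands = demo.execute(case_query).fetchall()

print(bands)  # [('Harbour', 'Low'), ('Valley', 'Medium'), ('Ridge', 'High')]
demo.close()
```

Because the first matching branch wins, the second `WHEN` does not need to repeat the lower bound.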

In [None]:
#add your code below
#query = ...

df = pd.read_sql_query(query, conn)
df

**Q12)** Refer to the `airports` table. Now, calculate the total number of airports.

Please note you have been provided with the code for this question to carry out the necessary analysis work. Simply run the code cell to produce the desired results.

In [None]:
query = """
SELECT COUNT(*)
FROM airports;
"""

df = pd.read_sql_query(query, conn)
df

# Group metrics

**Q13)** Refer to the `city` column in the `airports` table to find the number of airports per city.

- Use the `GROUP BY` clause to group the data by the `city` column

See the code syntax below for some guidance:
```SQL
SELECT <column_name>, COUNT(*)
FROM airports
GROUP BY <column_name>;
```
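
Grouping can be tried out on a made-up `stations` table first. This is a sketch only: the table, its rows, and the `n` alias are hypothetical and not part of `flights.db`.

```python
import sqlite3

# Hypothetical toy database, separate from the workbook's `conn`.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE stations (name TEXT, city TEXT)")
demo.executemany(
    "INSERT INTO stations VALUES (?, ?)",
    [("North", "Lyon"), ("South", "Lyon"), ("Central", "Porto"), ("East", "Lyon")],
)

# GROUP BY collapses rows that share a city; COUNT(*) then counts each group.
per_city = demo.execute(
    "SELECT city, COUNT(*) AS n FROM stations "
    "GROUP BY city ORDER BY n DESC, city"
).fetchall()

print(per_city)  # [('Lyon', 3), ('Porto', 1)]
demo.close()
```

The `ORDER BY` is optional; it is added here so that the largest groups come first.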

In [None]:
#add your code below
#query = ...

df = pd.read_sql_query(query, conn)
df.head(5)