# Optimized SQL queries Exercise: Flight Database
#### © Explore Data Science Academy

## Instructions to Students

#### This challenge was designed to assess what has been learned so far and to test knowledge of optimised SQL queries.

The questions below are answered with optimised SQL queries to test my understanding of the subject matter.

## Honour Code

I YINKA, AKINDELE, confirm - by submitting this document - that the solutions in this notebook are a result of my own work and that I abide by the EDSA honour code.

Non-compliance with the honour code constitutes a material breach of contract.

## US Flights Database
In this train, we use the US Flights database, which contains actual US flight data, to illustrate optimisation techniques on large amounts of data. The flights database consists of the following tables:

- **flights**:     all domestic flights in the USA in 2008
- **carriers**:     lookup table for all the carriers
- **airports**:     lookup table for all the airports
- **planes**:     lookup table for the planes

For convenience, the flights database ER diagram is provided below:



<img src="images/flights_db_ER.png" width=30% align="center">

In [1]:
%load_ext sql

In [2]:
%sql sqlite:///data/flights.db

<br>
<br>

#### The query below was used to view all the tables in the database

In [3]:
%%sql
SELECT name FROM sqlite_master WHERE type IN ('table', 'view') AND name NOT LIKE 'sqlite_%' ORDER BY 1

 * sqlite:///data/flights.db
Done.


name
airports
carriers
flights
planes
sysdiagrams


<br>

**A query to view the flights table**

In [4]:
%%sql

SELECT *
FROM flights
LIMIT 5;

 * sqlite:///data/flights.db
Done.


index,Date,DayOfWeek,DepTime,CRSDepTime,ArrTime,CRSArrTime,UniqueCarrier,FlightNum,TailNum,ActualElapsedTime,CRSElapsedTime,AirTime,ArrDelay,DepDelay,Origin,Dest,Distance,TaxiIn,TaxiOut,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay
0,2008/1/3,4,2003.0,1955,2211.0,2225,WN,335,N712SW,128.0,150,116.0,-14.0,8.0,IAD,TPA,810,4,8,0,,0,,,,,
1,2008/1/3,4,754.0,735,1002.0,1000,WN,3231,N772SW,128.0,145,113.0,2.0,19.0,IAD,TPA,810,5,10,0,,0,,,,,
2,2008/1/3,4,628.0,620,804.0,750,WN,448,N428WN,96.0,90,76.0,14.0,8.0,IND,BWI,515,3,17,0,,0,,,,,
3,2008/1/3,4,926.0,930,1054.0,1100,WN,1746,N612SW,88.0,90,78.0,-6.0,-4.0,IND,BWI,515,3,7,0,,0,,,,,
4,2008/1/3,4,1829.0,1755,1959.0,1925,WN,3920,N464WN,90.0,90,77.0,34.0,34.0,IND,BWI,515,3,10,0,,0,2.0,0.0,0.0,0.0,32.0


<br>
<br>

### Question
How many different carriers are there in total in the database?

### Solution

In [5]:
%%sql

SELECT COUNT(DISTINCT Description) AS "Number_of_carriers_in_database"
FROM carriers

 * sqlite:///data/flights.db
Done.


Number_of_carriers_in_database
1491


<br>
<br>

### Question
How long was the longest delay before departure?

### Solution

In [6]:
%%sql

SELECT MAX(DepDelay) AS "Maximum_departure_delay"
FROM flights

 * sqlite:///data/flights.db
Done.


Maximum_departure_delay
1355.0


<br>
<br>

### Question
How many flights departed on the 28th of January 2008?

### Solution

In [7]:
%%sql

SELECT COUNT(*) AS "Number_of_departed_flights_on_28_01_2008"
FROM flights
WHERE Date = '2008/1/28'
AND Cancelled <> 1

 * sqlite:///data/flights.db
Done.


Number_of_departed_flights_on_28_01_2008
19495
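
The sample rows shown earlier store `Date` as text without zero padding (for example `2008/1/3`), so the filter above is a plain string comparison. The sketch below uses Python's `sqlite3` module and a small hypothetical in-memory table (not the real `flights.db`) to illustrate that a zero-padded literal would match nothing under that assumption.

```python
import sqlite3

# Hypothetical toy table that mimics the unpadded text dates seen in the sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (Date TEXT, Cancelled INTEGER)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [("2008/1/28", 0), ("2008/1/28", 1), ("2008/1/28", 0), ("2008/1/29", 0)],
)

query = "SELECT COUNT(*) FROM flights WHERE Date = ? AND Cancelled <> 1"

# The literal must match the stored text exactly.
unpadded = conn.execute(query, ("2008/1/28",)).fetchone()[0]
padded = conn.execute(query, ("2008/01/28",)).fetchone()[0]

print(unpadded, padded)
```

If the real column were stored in ISO format instead, the literal would need to change accordingly.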


<br>
<br>

### Question
What is the distance between Midway Airport (MDW) and Houston Airport (HOU)?

### Solution

In [8]:
%%sql

SELECT Distance AS "Distance_btw_Midway_and_Houston_Airport"
FROM flights
WHERE Origin = 'MDW'
AND Dest = 'HOU'
LIMIT 1;

 * sqlite:///data/flights.db
Done.


Distance_btw_Midway_and_Houston_Airport
937
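
Since this exercise is about optimised queries, it may help to see how an index can change the plan for a lookup like the one above. The following is a minimal sketch on a hypothetical in-memory table; the index name `idx_flights_origin_dest` is an assumption, and whether the real `flights.db` already has such an index is not known from this notebook.

```python
import sqlite3

# Hypothetical one-row table; the distance value is the one reported above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (Origin TEXT, Dest TEXT, Distance INTEGER)")
conn.execute("INSERT INTO flights VALUES ('MDW', 'HOU', 937)")

query = "SELECT Distance FROM flights WHERE Origin = 'MDW' AND Dest = 'HOU' LIMIT 1"

def plan(sql):
    """Return the detail text of EXPLAIN QUERY PLAN (last column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_before = plan(query)  # full table scan without an index

# Assumed index name, created only on this toy table.
conn.execute("CREATE INDEX idx_flights_origin_dest ON flights (Origin, Dest)")
plan_after = plan(query)   # the planner can now search via the index

distance = conn.execute(query).fetchone()[0]
print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, so it is safer to look for the index name than to compare whole strings.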


<br>
<br>

### Question
Which day of the week had the highest number of cancelled flights? (where 1 = cancelled, 0 = not cancelled)

### Solution

In [9]:
%%sql

SELECT DayOfWeek, COUNT(Cancelled) AS 'Number of cancelled flight per day'
FROM flights
WHERE Cancelled = 1
GROUP BY DayOfWeek
ORDER BY [Number of cancelled flight per day] DESC

 * sqlite:///data/flights.db
Done.


DayOfWeek,Number of cancelled flight per day
4,3093
2,2993
3,2645
1,2617
5,2049
6,1984
7,1927


<br>
<br>

### Question
How many airports have the word "International" in their name?

### Solution

In [10]:
%%sql

SELECT COUNT(airport) AS "Number_of_Airport_with_international_in_their_names"
FROM airports
WHERE LOWER(airport) LIKE '%international%'

 * sqlite:///data/flights.db
Done.


Number_of_Airport_with_international_in_their_names
124
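
In SQLite, `LIKE` is case-insensitive for ASCII characters by default, so the `LOWER()` call above is likely redundant unless `PRAGMA case_sensitive_like` has been switched on. The sketch below checks this on a hypothetical in-memory table with made-up airport names.

```python
import sqlite3

# Hypothetical airports with mixed-case names (not taken from the real database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airports (iata TEXT, airport TEXT)")
conn.executemany(
    "INSERT INTO airports VALUES (?, ?)",
    [
        ("AAA", "Example International"),
        ("BBB", "SAMPLE INTERNATIONAL AIRPORT"),
        ("CCC", "Small Municipal"),
    ],
)

with_lower = conn.execute(
    "SELECT COUNT(*) FROM airports WHERE LOWER(airport) LIKE '%international%'"
).fetchone()[0]

# Default SQLite LIKE already ignores ASCII case.
without_lower = conn.execute(
    "SELECT COUNT(*) FROM airports WHERE airport LIKE '%international%'"
).fetchone()[0]

print(with_lower, without_lower)
```

Dropping the function call saves one evaluation per row, although a pattern with a leading `%` still has to examine every row.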


<br>
<br>

### Question
What is the most commonly produced model by the manufacturer "BOEING"?

### Solution

In [11]:
%%sql

SELECT model AS "Model", COUNT(model) AS "Number_of_Model"
FROM planes
WHERE LOWER(manufacturer) = 'boeing'
GROUP BY model
ORDER BY [Number_of_Model] DESC
LIMIT 5;

 * sqlite:///data/flights.db
Done.


Model,Number_of_Model
737-7H4,308
737-3H4,147
757-232,112
737-824,100
717-200,98


<br>
<br>

### Question
What manufacturer had the highest average delay time (DepDelay + ArrDelay)?

### Solution

In [12]:
%%sql

WITH CTE AS
(SELECT p.manufacturer AS Manufacturer, AVG(f.DepDelay + f.ArrDelay) AS "Manufacturer_with_max_Average_DelayTime"
FROM flights AS f

INNER JOIN planes AS p
ON p.tailnum = f.TailNum

GROUP BY p.manufacturer)

SELECT CTE.Manufacturer, CTE.Manufacturer_with_max_Average_DelayTime
FROM CTE
ORDER BY CTE.Manufacturer_with_max_Average_DelayTime DESC
LIMIT 5;

 * sqlite:///data/flights.db
Done.


Manufacturer,Manufacturer_with_max_Average_DelayTime
AVIAT AIRCRAFT INC,46.18867924528302
GULFSTREAM AEROSPACE,36.96
SIKORSKY,36.56944444444444
FRIEDEMANN JON,33.76521739130435
LEBLANC GLENN T,33.60769230769231
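
One detail worth keeping in mind for the query above: in SQL, `DepDelay + ArrDelay` is `NULL` whenever either operand is `NULL`, and `AVG` ignores `NULL` values. The sketch below shows this on hypothetical in-memory tables; the assumption that the real data contains `NULL` delays (for example for cancelled flights) is not verified here.

```python
import sqlite3

# Hypothetical data: one complete flight, one with both delays missing,
# and one with only the arrival delay missing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE planes (tailnum TEXT, manufacturer TEXT);
CREATE TABLE flights (TailNum TEXT, DepDelay REAL, ArrDelay REAL);
INSERT INTO planes VALUES ('N1', 'MAKER A');
INSERT INTO flights VALUES ('N1', 10, 20), ('N1', NULL, NULL), ('N1', 5, NULL);
""")

row = conn.execute("""
SELECT p.manufacturer,
       AVG(f.DepDelay + f.ArrDelay),    -- averages only rows where the sum is not NULL
       COUNT(f.DepDelay + f.ArrDelay),  -- number of rows that contribute to the average
       COUNT(*)                         -- number of joined rows
FROM flights AS f
INNER JOIN planes AS p ON p.tailnum = f.TailNum
GROUP BY p.manufacturer
""").fetchone()

print(row)
```

Here only one of the three flights contributes to the average, which is why manufacturers with very few complete rows can end up at the top of such a ranking.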


<br>
<br>

### Question
How many planes landed at Los Angeles International Airport?

### Solution

In [13]:
%%sql

SELECT COUNT(f.Dest) AS "Number_of_flights_that_landed_in_LA_international_airport"
FROM flights AS f, airports AS a
WHERE f.Dest = a.iata
AND a.airport = 'Los Angeles International'

 * sqlite:///data/flights.db
Done.


Number_of_flights_that_landed_in_LA_international_airport
18964


<br>
<br>

### Question
Which domestic carrier had the best on-time performance (OTP)? OTP is defined as the rate of on-time flights with a 15min buffer on departure and arrival.

### Solution

In [14]:
%%sql

SELECT c.Description, (f.ArrDelay + f.DepDelay) AS OTP
FROM flights AS f, carriers AS c
WHERE f.UniqueCarrier = c.Code

GROUP BY c.Description

ORDER BY [OTP]

LIMIT 5;

 * sqlite:///data/flights.db
Done.


Description,OTP
Pinnacle Airlines Inc.,
Hawaiian Airlines Inc.,-23.0
American Airlines Inc.,-15.0
Southwest Airlines Co.,-6.0
American Eagle Airlines Inc.,-3.0
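
Note that in SQLite a non-aggregated expression in a `GROUP BY` query is evaluated against an arbitrarily chosen row of each group, so the `OTP` column above is a single flight's delay sum rather than a rate. A minimal sketch of a rate-based calculation follows, on hypothetical in-memory tables. It assumes "on time" means both `DepDelay` and `ArrDelay` are at most 15 minutes, and that flights with `NULL` delays count as not on time; both choices are assumptions.

```python
import sqlite3

# Hypothetical carriers and flights (codes and names are made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE carriers (Code TEXT, Description TEXT);
CREATE TABLE flights (UniqueCarrier TEXT, DepDelay REAL, ArrDelay REAL);
INSERT INTO carriers VALUES ('X1', 'Carrier A'), ('X2', 'Carrier B');
INSERT INTO flights VALUES
  ('X1', 0, 0), ('X1', 10, 20), ('X1', 5, 5), ('X1', 30, 30),
  ('X2', 0, 0), ('X2', 15, 15), ('X2', -5, 3), ('X2', 16, 0);
""")

otp_rows = conn.execute("""
SELECT c.Description,
       -- share of flights within the assumed 15-minute buffer on both ends
       AVG(CASE WHEN f.DepDelay <= 15 AND f.ArrDelay <= 15
                THEN 1.0 ELSE 0.0 END) AS OTP
FROM flights AS f
INNER JOIN carriers AS c ON f.UniqueCarrier = c.Code
GROUP BY c.Description
ORDER BY OTP DESC
""").fetchall()

print(otp_rows)
```

Averaging a 0/1 flag gives the fraction of on-time flights per carrier, so sorting in descending order puts the best performer first.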
