The Mayor of London's office makes its data available to the public [here](https://data.london.gov.uk/dataset). In this project, you will work with a slightly modified version of a dataset containing information about public transport journey volume by transport type.

The data has been loaded into a **Snowflake** database called `TFL` with a single table called `JOURNEYS`, which contains the following columns:

## TFL.JOURNEYS

| Column | Definition | Data type |
|--------|------------|-----------|
| `MONTH`| Month in number format, e.g., `1` equals January | `INTEGER` |
| `YEAR` | Year | `INTEGER` |
| `DAYS` | Number of days in the given month | `INTEGER` |
| `REPORT_DATE` | Date that the data was reported | `DATE` |
| `JOURNEY_TYPE` | Method of transport used | `VARCHAR` |
| `JOURNEYS_MILLIONS` | Number of journeys, in millions (decimal value) | `FLOAT` |

Note that in Snowflake, unquoted names of databases, tables, and columns are stored in **upper case** by default.

You will write and execute SQL queries to answer the following three questions.

---
1. What are the most popular transport types, measured by the total number of journeys? The output should contain two columns, `JOURNEY_TYPE` and `TOTAL_JOURNEYS_MILLIONS`, sorted by the second column in descending order. Save the query as `most_popular_transport_types`.

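```sql
-- Total journeys (in millions) per transport type
WITH most_popular_transport_types AS (
    SELECT
        JOURNEY_TYPE,
        SUM(JOURNEYS_MILLIONS) AS TOTAL_JOURNEYS_MILLIONS
    FROM TFL.JOURNEYS
    GROUP BY JOURNEY_TYPE
)
-- Sort in the outer query: an ORDER BY inside a CTE does not guarantee the order of the final result
SELECT *
FROM most_popular_transport_types
ORDER BY TOTAL_JOURNEYS_MILLIONS DESC;
```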

| JOURNEY_TYPE | TOTAL_JOURNEYS_MILLIONS |
|--------------|-------------------------|
| Bus | 24905.193947 |
| Underground & DLR | 15020.466544 |
| Overground | 1666.845666 |
| TfL Rail | 411.313421 |
| Tram | 314.689875 |
| Emirates Airline | 14.583718 |
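
To check the aggregation logic without a Snowflake connection, the same query can be run against a small in-memory SQLite table. This is a sketch under stated assumptions: SQLite stands in for Snowflake, the rows are hypothetical toy values rather than TfL data, the table is named `JOURNEYS` without the `TFL.` prefix (SQLite has no `TFL` database here), and the unused `DAYS` and `REPORT_DATE` columns are left out. The helper function name is hypothetical.

```python
import sqlite3

# Hypothetical toy rows (MONTH, YEAR, JOURNEY_TYPE, JOURNEYS_MILLIONS); not real TfL data.
ROWS = [
    (1, 2020, "Bus", 2.5),
    (2, 2020, "Bus", 1.5),
    (1, 2020, "Tram", 0.25),
    (2, 2020, "Tram", 0.5),
    (1, 2020, "Overground", 1.0),
]

# Same aggregation as the Snowflake query, with the sort in the outermost query.
QUERY = """
SELECT
    JOURNEY_TYPE,
    SUM(JOURNEYS_MILLIONS) AS TOTAL_JOURNEYS_MILLIONS
FROM JOURNEYS
GROUP BY JOURNEY_TYPE
ORDER BY TOTAL_JOURNEYS_MILLIONS DESC
"""


def most_popular_transport_types(rows):
    """Load rows into an in-memory SQLite table and return (type, total) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE JOURNEYS (MONTH INTEGER, YEAR INTEGER, "
            "JOURNEY_TYPE TEXT, JOURNEYS_MILLIONS REAL)"
        )
        conn.executemany("INSERT INTO JOURNEYS VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for journey_type, total in most_popular_transport_types(ROWS):
        print(journey_type, total)
```

SQLite and Snowflake differ in many details, so this only exercises the `GROUP BY` and sort logic, not Snowflake-specific behavior.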


---
2. Which five month-and-year combinations were the most popular for the Emirates Airline? Return an output containing `MONTH`, `YEAR`, and `JOURNEYS_MILLIONS`, with the latter rounded to two decimal places and aliased as `ROUNDED_JOURNEYS_MILLIONS`. Exclude null values and save the result as `emirates_airline_popularity`.

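```sql
-- Five most popular months for the Emirates Airline, excluding NULLs
WITH emirates_airline_popularity AS (
    SELECT
        MONTH,
        YEAR,
        ROUND(JOURNEYS_MILLIONS, 2) AS ROUNDED_JOURNEYS_MILLIONS
    FROM TFL.JOURNEYS
    WHERE JOURNEY_TYPE = 'Emirates Airline'
      AND JOURNEYS_MILLIONS IS NOT NULL
    ORDER BY ROUNDED_JOURNEYS_MILLIONS DESC  -- together with LIMIT, selects the top five rows
    LIMIT 5
)
SELECT *
FROM emirates_airline_popularity
ORDER BY ROUNDED_JOURNEYS_MILLIONS DESC;  -- guarantees the order of the final output
```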

| MONTH | YEAR | ROUNDED_JOURNEYS_MILLIONS |
|-------|------|---------------------------|
| 5 | 2012 | 0.53 |
| 6 | 2012 | 0.38 |
| 4 | 2012 | 0.24 |
| 5 | 2013 | 0.19 |
| 5 | 2015 | 0.19 |
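
The top-five selection can be sketched the same way in SQLite. One assumption goes beyond the original query: `YEAR` and `MONTH` are added as tie-breakers after `ROUNDED_JOURNEYS_MILLIONS`. Rounding to two decimals can make several rows equal (as with the two `0.19` rows in the output above), and without a tie-breaker the database is free to pick any of the tied rows for the last slot. The toy rows are hypothetical and include a `NULL` and a non-Emirates row to exercise the `WHERE` clause.

```python
import sqlite3

# Hypothetical toy rows (MONTH, YEAR, JOURNEY_TYPE, JOURNEYS_MILLIONS); not real TfL data.
ROWS = [
    (5, 2012, "Emirates Airline", 0.534),
    (6, 2012, "Emirates Airline", 0.381),
    (4, 2012, "Emirates Airline", 0.242),
    (5, 2013, "Emirates Airline", 0.188),  # rounds to 0.19
    (3, 2014, "Emirates Airline", 0.186),  # rounds to 0.19
    (5, 2015, "Emirates Airline", 0.191),  # rounds to 0.19, loses the tie-break on YEAR
    (7, 2016, "Emirates Airline", None),   # NULL, excluded by IS NOT NULL
    (5, 2012, "Bus", 500.0),               # other transport type, excluded
]

# The YEAR, MONTH tie-breakers are an assumption added for a deterministic top five.
QUERY = """
SELECT
    MONTH,
    YEAR,
    ROUND(JOURNEYS_MILLIONS, 2) AS ROUNDED_JOURNEYS_MILLIONS
FROM JOURNEYS
WHERE JOURNEY_TYPE = 'Emirates Airline'
  AND JOURNEYS_MILLIONS IS NOT NULL
ORDER BY ROUNDED_JOURNEYS_MILLIONS DESC, YEAR, MONTH
LIMIT 5
"""


def emirates_airline_popularity(rows):
    """Run the top-five query against an in-memory SQLite copy of the rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE JOURNEYS (MONTH INTEGER, YEAR INTEGER, "
            "JOURNEY_TYPE TEXT, JOURNEYS_MILLIONS REAL)"
        )
        conn.executemany("INSERT INTO JOURNEYS VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for month, year, rounded in emirates_airline_popularity(ROWS):
        print(month, year, rounded)
```

With the tie-breakers, the earliest years win ties at the cut-off; any other deterministic rule (for example, sorting on the unrounded `JOURNEYS_MILLIONS`) would work equally well.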


---
3. Find the five years with the lowest volume of Underground & DLR journeys, saving the query as `least_popular_years_tube`. The results should contain the columns `YEAR`, `JOURNEY_TYPE`, and `TOTAL_JOURNEYS_MILLIONS`.

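```sql
-- Five years with the fewest Underground & DLR journeys
WITH least_popular_years_tube AS (
    SELECT
        YEAR,
        JOURNEY_TYPE,
        SUM(JOURNEYS_MILLIONS) AS TOTAL_JOURNEYS_MILLIONS
    FROM TFL.JOURNEYS
    WHERE JOURNEY_TYPE = 'Underground & DLR'
    GROUP BY YEAR, JOURNEY_TYPE
    ORDER BY TOTAL_JOURNEYS_MILLIONS ASC  -- together with LIMIT, selects the five lowest years
    LIMIT 5
)
SELECT *
FROM least_popular_years_tube
ORDER BY TOTAL_JOURNEYS_MILLIONS ASC;  -- guarantees the order of the final output
```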

| YEAR | JOURNEY_TYPE | TOTAL_JOURNEYS_MILLIONS |
|------|--------------|-------------------------|
| 2020 | Underground & DLR | 310.179316 |
| 2021 | Underground & DLR | 748.452544 |
| 2022 | Underground & DLR | 1064.859009 |
| 2010 | Underground & DLR | 1096.145588 |
| 2011 | Underground & DLR | 1156.647654 |
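
The yearly grouping and "five smallest totals" step can likewise be sketched in SQLite with hypothetical rows. Six toy years are used so that one of them is cut off by `LIMIT 5`, and a `Bus` row checks that only `Underground & DLR` journeys are summed. As before, SQLite stands in for Snowflake, and the table and helper names are assumptions.

```python
import sqlite3

TUBE = "Underground & DLR"

# Hypothetical toy rows (MONTH, YEAR, JOURNEY_TYPE, JOURNEYS_MILLIONS); not real TfL data.
ROWS = [
    (1, 2010, TUBE, 80.0), (2, 2010, TUBE, 90.0),    # 2010 total: 170
    (1, 2011, TUBE, 85.0), (2, 2011, TUBE, 95.0),    # 2011 total: 180
    (1, 2018, TUBE, 110.0), (2, 2018, TUBE, 105.0),  # 2018 total: 215, cut by LIMIT 5
    (1, 2019, TUBE, 100.0), (2, 2019, TUBE, 100.0),  # 2019 total: 200
    (1, 2020, TUBE, 25.0), (2, 2020, TUBE, 50.0),    # 2020 total: 75
    (1, 2021, TUBE, 60.0), (2, 2021, TUBE, 70.0),    # 2021 total: 130
    (1, 2020, "Bus", 1.0),                           # other transport type, excluded
]

QUERY = """
SELECT
    YEAR,
    JOURNEY_TYPE,
    SUM(JOURNEYS_MILLIONS) AS TOTAL_JOURNEYS_MILLIONS
FROM JOURNEYS
WHERE JOURNEY_TYPE = 'Underground & DLR'
GROUP BY YEAR, JOURNEY_TYPE
ORDER BY TOTAL_JOURNEYS_MILLIONS ASC
LIMIT 5
"""


def least_popular_years_tube(rows):
    """Return the five years with the lowest Underground & DLR totals."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE JOURNEYS (MONTH INTEGER, YEAR INTEGER, "
            "JOURNEY_TYPE TEXT, JOURNEYS_MILLIONS REAL)"
        )
        conn.executemany("INSERT INTO JOURNEYS VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for year, journey_type, total in least_popular_years_tube(ROWS):
        print(year, journey_type, total)
```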
