# Busiest Flight Routes 
In this assignment, you'll play the role of a freelance analyst for TravelAire, a boutique travel firm. They're preparing for a busy travel season, and have put together a list of the flights they expect will be most popular. While the CSV contains data such as departure and arrival airports and passenger counts from previous years, it does not contain the median price of a ticket for each of these popular routes. 

The travel firm has therefore asked you to use the Amadeus Flights API to augment this list with the most up-to-date price information for each route. Specifically, they want you to calculate the median ticket price for each route, and then compare the average ticket cost of domestic routes with that of international routes in this dataset. The firm hopes to use this information to better plan its marketing efforts for the upcoming season.

In order to complete this assignment, you will: 
- Connect to the Amadeus API 
- Load your CSV of Busiest Flight Routes data into a data frame 
- Write a function that will retrieve a list of flight offers based on the origin and destination airports 
- Write a function that will compute a median price from a given list of flight offers 
- Use the above functions to retrieve the median ticket price for each route in your CSV and update the data frame 
- Compare the average of all median domestic prices to the average of all median international prices

This is arguably the most difficult assignment in the term. Take your time, review your notes, and collaborate with your peers!


### Getting Started
To get started, download the following files:
- `Unit 23 - Business - Unsolved.ipynb` (_this notebook_)
- `BusiestFlightRoutes.csv`

Place these together in a dedicated directory on your hard drive. We recommend creating a folder in your `Documents` directory for this week of class, as follows:

```
Documents/
  Python/
    Term III/
      Unit 23 - Business - Unsolved.ipynb
      BusiestFlightRoutes.csv
```

Then, start Jupyter Notebook and open `Unit 23 - Business - Unsolved.ipynb` in your browser. Make sure the notebook and the `BusiestFlightRoutes.csv` file live in the same directory.

---

### Problem Structure
Each problem will be accompanied by:
- **Instructions**
  - Each problem features a markdown cell explaining the problem.
- **Unfinished Code Cells**
  - Each problem has unfinished code cells, where you will write code to solve the problem.
  - Cells will contain either starter code for you to finish, or a comment explaining what your code should do.
- **Expected Output**
  - Many unfinished code cells will have output below them. You will be expected to write code that produces the same output.
  - Some unfinished code cells do _not_ have output below them. This is simply because not all code will generate output. Your solutions for these cells should _not_ print anything.
  
---
  
### Deliverables
To receive credit for this assignment, you must submit the following files:
- Your completed Jupyter Notebook

Your completed Jupyter Notebook will be this file, but with all of the problems solved. This is the only file you will need to submit. When you're done with the assignment, run all cells to verify that your code executes as expected. Then, save and submit this notebook.

Good luck!

# First Steps: Registering for & Connecting to the Amadeus API

You need an Amadeus for Developers account to complete this assignment. Follow the steps below to sign up:
- Create a new account at [Amadeus for Developers](https://developers.amadeus.com/my-apps)
- Click `Create New Application`, and click through any fields that pop up
- Copy the API key and secret into the `API_KEY` and `API_SECRET` variables in the cell below.
- Read the [Authorization Guide](https://developers.amadeus.com/self-service/apis-docs/guides/authorization-262). Identify the URL used to generate a token, and save it into the `AUTH_URL` variable in the cell below.

Then, run the provided code cell with the `Request access token` comment to generate a token that you can use to make API requests.

**Important Note**: In order to prevent people from overusing the API, Amadeus invalidates tokens every 30 minutes. You may notice as you proceed that some of your requests fail, claiming that you lack adequate permissions. If this happens, simply re-run the cell below with the comment `# REQUEST ACCESS TOKEN`. This will fix the authentication error so that you can re-run the cell that produced the failed request.
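
Rather than waiting for a request to fail, you can track the token's age yourself. The sketch below is a minimal illustration, not part of the assignment: it assumes the token response includes an `expires_in` field giving the lifetime in seconds (typical for OAuth2 client-credentials responses), and both the helper name `token_is_stale` and the 60-second safety margin are hypothetical.

```python
import time

def token_is_stale(issued_at, expires_in, now=None, margin=60):
    """Return True when a token issued at `issued_at` should be refreshed.

    `issued_at` and `now` are Unix timestamps; `expires_in` is the token
    lifetime in seconds (assumed to come from the token response).
    `margin` is a hypothetical buffer so the refresh happens slightly early.
    """
    if now is None:
        now = time.time()
    return now - issued_at >= expires_in - margin

# Example with fixed timestamps and a 30-minute (1800 s) token.
issued = 1_000_000
print(token_is_stale(issued, 1800, now=issued + 600))   # 10 minutes in -> False
print(token_is_stale(issued, 1800, now=issued + 1790))  # inside the margin -> True
```

If the check returns `True`, re-running the token cell before the next request avoids the permissions error described above.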

---

Running the provided code will generate output similar to the following:

```
{'Authorization': 'Bearer RnieGwbz6kh3RPAJn2fbGGdIY7lc'}
```

Your specific `Bearer` token will look different, because authentication tokens are always unique.

In [1]:
# Provided Code -- Do NOT Edit!
import json
import pandas as pd
import matplotlib.pyplot as plt
import requests
import statistics
from pprint import pprint

# Provided for use in Part 2
def median(numbers):
    return statistics.median(numbers) if len(numbers) else None

auth_header = {}

In [2]:
# TODO: Save your `API_KEY`, `API_SECRET` and `AUTH_URL`
API_KEY = 'LwtIwiyvEYUlQYbsWAKpYvvZnF8xBHQW'
API_SECRET = 'zwGUBpXTTQeyfINj'
AUTH_URL = 'https://test.api.amadeus.com/v1/security/oauth2/token'

In [3]:
# TODO: Request access token -- CRITICAL NOTE: You'll need to run this code block periodically, since tokens expire after roughly 30 minutes
request_auth = requests.post(AUTH_URL, data={
    'client_id': API_KEY,
    'client_secret': API_SECRET,
    'grant_type': 'client_credentials'
})
request_auth.status_code

200

In [4]:
# Provided Code -- Do NOT Edit!
auth_header['Authorization'] = 'Bearer ' + request_auth.json()["access_token"]
auth_header

{'Authorization': 'Bearer b5YfEaQFdtHItGsrfXbpOuMxWvpO'}

## Part 1: Loading Data
In Part 1, you will load the `BusiestFlightRoutes.csv` data set.

### Problem 1: Load Flight Route Data
You have been provided with a `filename` containing the path to `BusiestFlightRoutes.csv`. Use it to complete the following steps:
- Load `BusiestFlightRoutes.csv` into a DataFrame called `routes`
- Print the `head` and `dtypes` of `routes`
---

Your code should print the following:

```

Airport 1	Airport 2	Distance (km)	Number of Passengers in 2018	Number of Passengers in 2017	Type
0	CJU	GMP	449	14107414	13460306	Domestic
1	CTS	HND	835	9698639	8726502	Domestic
2	SYD	MLB	705	9245392	9090941	Domestic
3	FUK	HND	889	8762547	7864000	Domestic
4	BOM	DEL	1150	7392155	7129943	Domestic

Airport 1                       object
Airport 2                       object
Distance (km)                    int64
Number of Passengers in 2018     int64
Number of Passengers in 2017     int64
Type                            object
dtype: object
```
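
As a rough illustration of the loading step, the sketch below reads a tiny in-memory CSV instead of the real file. It assumes the file is comma-separated with a header row matching the columns shown above; the two rows are copied from the expected output, and the names `sample_csv` and `sample_routes` are made up for this example.

```python
import io
import pandas as pd

# Two rows copied from the expected output, held in memory so the example
# does not depend on the real file being present.
sample_csv = io.StringIO(
    "Airport 1,Airport 2,Distance (km),Number of Passengers in 2018,"
    "Number of Passengers in 2017,Type\n"
    "CJU,GMP,449,14107414,13460306,Domestic\n"
    "CTS,HND,835,9698639,8726502,Domestic\n"
)

# `read_csv` accepts a path or any file-like object.
sample_routes = pd.read_csv(sample_csv)

print(sample_routes.head())
print(sample_routes.dtypes)
```

With the real data, you would pass `filename` to `pd.read_csv` in place of the in-memory buffer.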

In [5]:
# TODO: Load `BusiestFlightRoutes.csv`
filename = 'BusiestFlightRoutes.csv'
routes = pd.read_csv(filename)

In [6]:
# TODO: Print `head`
routes.head()

	Airport 1	Airport 2	Distance (km)	Number of Passengers in 2018	Number of Passengers in 2017	Type
0	CJU	GMP	449	14107414	13460306	Domestic
1	CTS	HND	835	9698639	8726502	Domestic
2	SYD	MLB	705	9245392	9090941	Domestic
3	FUK	HND	889	8762547	7864000	Domestic
4	BOM	DEL	1150	7392155	7129943	Domestic


In [7]:
# TODO: Print `dtypes`
routes.dtypes

Airport 1                       object
Airport 2                       object
Distance (km)                    int64
Number of Passengers in 2018     int64
Number of Passengers in 2017     int64
Type                            object
dtype: object

## Part 2: Retrieving Price Data from the API
In Part 2, you will use the API to add a `CurrentMedianPrice` column to your DataFrame. You will write two helper functions to do this:
- `get_current_offers`, which retrieves a list of flight offers
- `get_median_offer_price`, which computes the median price from a list of flight offers

Variables representing the requisite API URLs are provided below. Briefly review the documentation about the [Flight Offers Search](https://developers.amadeus.com/self-service/category/air/api-doc/flight-offers-search) and [Flight Offers Price Analysis](https://developers.amadeus.com/blog/inroducing-flight-price-analysis-historical-flight-price-data) endpoints before proceeding.

In [8]:
# Provided Code -- Do NOT Edit!
OFFERS_URL = 'https://test.api.amadeus.com/v2/shopping/flight-offers'
OFFERS_URL

'https://test.api.amadeus.com/v2/shopping/flight-offers'

### Problem 1: Implement `get_current_offers` Function
You will implement a function, called `get_current_offers`, which accepts the following parameters:
- `origin`: Departing airport
- `destination`: Destination airport

You have been given starter code in the cells below, consisting of the `params` dictionary required to make the request. This dictionary specifies the following query parameters:
- `originLocationCode`: Airport code of the originating airport.
- `destinationLocationCode`: Airport code of the destination airport.
- `departureDate`: The day on which to look for ticket prices. In this assignment, you'll hard-code this to `'2022-09-01'`.
- `adults`: The number of adults on the flight. In this assignment, you'll hard-code this to `1`.
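
To see what these query parameters amount to, the sketch below uses only the standard library to build the URL that a `GET` request with these `params` would roughly correspond to. It is an illustration of the encoding, not the request itself, and the `LHR`/`JFK` values are example inputs.

```python
from urllib.parse import urlencode

OFFERS_URL = 'https://test.api.amadeus.com/v2/shopping/flight-offers'

params = {
    'originLocationCode': 'LHR',
    'destinationLocationCode': 'JFK',
    'departureDate': '2022-09-01',
    'adults': 1,
}

# Each key/value pair becomes `key=value`, joined with `&` after a `?`.
full_url = OFFERS_URL + '?' + urlencode(params)
print(full_url)
```

Passing the dictionary through the `params` argument of `requests.get` saves you from assembling this string by hand.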

You must fill in the code required to actually perform the request, namely:
- Use the provided `params` to send a `GET` request, and save the result into a variable called `response`
- Convert the `response` to JSON, and `return` the `data` key

Note that the `params` we have provided have a hard-coded date. You will _only_ be looking at price data for flight offers on `2022-09-01`, as this is the beginning of the busy season.

**Hints**
- Recall that `requests.get` can take a [headers](https://realpython.com/python-requests/#request-headers) parameter. You must use this parameter to send your `auth_header` in order to connect to the API.
- If your request fails, re-run the cell at the top of this notebook to regenerate your `auth_header` variable. Then, try the request again.
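
A failed request usually means the JSON has no `data` key, so indexing it directly raises a `KeyError`. The sketch below shows one defensive way to pull out the offers. The helper name `extract_offers` is hypothetical, and the `errors` key with `status` and `title` fields is an assumption about the error payload's shape, so check it against a real failed response.

```python
def extract_offers(payload):
    """Return the list under `data`, or an empty list if it is missing."""
    if 'errors' in payload:
        # Assumed error shape: {'errors': [{'status': ..., 'title': ...}, ...]}
        for error in payload['errors']:
            print('API error:', error.get('status'), error.get('title'))
    return payload.get('data', [])

# Hand-written payloads standing in for `response.json()`.
ok_payload = {'data': [{'id': '1'}, {'id': '2'}]}
error_payload = {'errors': [{'status': 401, 'title': 'Invalid access token'}]}

print(len(extract_offers(ok_payload)))
print(extract_offers(error_payload))
```

Returning an empty list pairs well with the provided `median` helper, which returns `None` when given no numbers.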

In [9]:
# Declare `get_current_offers` function, accepting `origin` and `destination` parameters
def get_current_offers(origin, destination):
    # Query parameters
    params = {
        'originLocationCode': origin,
        'destinationLocationCode': destination,
        'departureDate': '2022-09-01',
        'adults': 1,
    }
    
    # TODO: Invoke `requests` with `OFFERS_URL`, `params` and your `auth_header`
    response = requests.get(OFFERS_URL, params=params, headers=auth_header)
    
    # TODO: Return `data` key of response JSON
    return response.json()['data']

Next, call your function with the following arguments, and save the result into a variable called `test_offers_data`:
- `origin='LHR'`
- `destination='JFK'`

Print the length of `test_offers_data`, as well as its first element. Your code's output should begin with the following:

```
128

{'type': 'flight-offer',
 'id': '1',
 'source': 'GDS',
 'instantTicketingRequired': False,
 'nonHomogeneous': False,
 'oneWay': False,
 'lastTicketingDate': '2021-04-24',
 'numberOfBookableSeats': 9,
 ...
}
```

Note that your results may be slightly different, as flight data changes daily.

In [10]:
# TODO: Test function on test values 
test_offers_data = get_current_offers('LHR','JFK')

In [11]:
# TODO: Print length of `test_offers_data`
len(test_offers_data)

136

In [12]:
# TODO: Print first element of `test_offers_data`
pprint(test_offers_data[0])

{'id': '1',
 'instantTicketingRequired': False,
 'itineraries': [{'duration': 'PT13H40M',
                  'segments': [{'aircraft': {'code': '320'},
                                'arrival': {'at': '2022-09-01T14:00:00',
                                            'iataCode': 'LIS',
                                            'terminal': '1'},
                                'blacklistedInEU': False,
                                'carrierCode': 'TP',
                                'departure': {'at': '2022-09-01T11:20:00',
                                              'iataCode': 'LHR',
                                              'terminal': '2'},
                                'duration': 'PT2H40M',
                                'id': '174',
                                'number': '1367',
                                'numberOfStops': 0,
                                'operating': {'carrierCode': 'TP'}},
                               {'aircraft': {'code': '339'},
    

### Problem 2: Computing Median of Current Offer Prices
The function above gets a list of all ticket offers between two airports. Next, you will write a function that takes a list of offer dictionaries and computes the median of their prices. After you write your function, you will test it using `test_offers_data` as the function argument. 

The cell below contains starter code. Complete it by following the steps below:
- Create an empty list called `prices`
- Iterate over each offer in `offers_data`
- Within the loop:
  - Extract each element's `price.grandTotal` value
  - Convert this value to a `float`
  - Append the result to `prices`
- Call `median` with `prices` as its argument, and return the result
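
The steps above can be tried on a few hand-written offers before touching the API. In the sketch below, the offers and their prices are invented and trimmed to the single field this step needs. The prices are stored as strings, which is an assumption suggested by the `float` conversion step.

```python
import statistics

# Hypothetical offers containing only the `price.grandTotal` field.
sample_offers = [
    {'price': {'grandTotal': '310.50'}},
    {'price': {'grandTotal': '120.00'}},
    {'price': {'grandTotal': '250.25'}},
    {'price': {'grandTotal': '199.75'}},
]

prices = []
for offer in sample_offers:
    prices.append(float(offer['price']['grandTotal']))

# With an even number of values, the median is the mean of the middle two:
# (199.75 + 250.25) / 2
print(statistics.median(prices))  # 225.0
```

Sorting is not required before calling `statistics.median`, since the function orders the values itself.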

In [13]:
# TODO: Declare `get_median_offer_price` function, accepting `offers_data` argument
def get_median_offer_price(offers_data):
    # TODO: Create an empty `prices` list to collect each offer's grand total
    prices = []

    # TODO: Iterate over each `offer` in `offers_data`
    for offer in offers_data:
        # TODO: Extract `price.grandTotal` and convert to `float`
        grand_total = float(offer['price']['grandTotal'])

        # TODO: Append result to `prices`
        prices.append(grand_total)

    # Return the median of `prices` (the provided `median` returns None for an empty list)
    return median(prices)

Next, test `get_median_offer_price` by invoking it on the `test_offers_data` variable you fetched in the previous problem. 

Your code should print the following:

```
1667.84
```

**Please note**, you may see a different number when you run this code, as flight prices change daily.

In [14]:
# TODO: Test by invoking on `test_offers_data`
get_median_offer_price(test_offers_data)

1786.61

### Problem 3: Fetching Required Data & Augmenting the `routes` DataFrame
Finally, you'll use the functions implemented in the previous problems to retrieve the median current price for each route in `routes`. You'll add these results to the DataFrame as the `CurrentMedianPrice` column.

You have been provided with an empty list, called `current_median_prices`, and a partially implemented `for` loop. Use them to complete the steps below:
- Use `iterrows` to iterate over each `index` and `row` in `routes`
- Within the loop:
    - Extract each `row`'s `Airport 1` and `Airport 2` fields into `origin` and `destination` variables, respectively
    - Invoke `get_current_offers` with `origin` and `destination` to get a list of `current_offers` (the departure date is hard-coded inside the function)
    - Invoke `get_median_offer_price` on `current_offers` to get the current median price
    - Append the result of the above call to `current_median_prices`
- Use `current_median_prices` to add a `CurrentMedianPrice` column to `routes`

Finally, print the `head` of your `routes` DataFrame.

---

Your code should print the following:

```
	Airport 1	Airport 2	Distance (km)	Number of Passengers in 2018	Number of Passengers in 2017	Type	CurrentMedianPrice
0	CJU	GMP	449	14107414	13460306	Domestic	129.94
1	CTS	HND	835	9698639	8726502	Domestic	186.41
2	SYD	MLB	705	9245392	9090941	Domestic	1960.74
3	FUK	HND	889	8762547	7864000	Domestic	185.21
4	BOM	DEL	1150	7392155	7129943	Domestic	67.14
```

**Note:** Your results may differ slightly from the exemplar above.
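
The loop-and-assign pattern can be rehearsed offline first. The sketch below is a minimal illustration with made-up numbers: `fake_median_for_route` is a hypothetical stand-in for the two API-backed helpers, and `sample_routes` holds just two of the routes.

```python
import pandas as pd

sample_routes = pd.DataFrame({
    'Airport 1': ['CJU', 'CTS'],
    'Airport 2': ['GMP', 'HND'],
})

# Hypothetical stand-in for the API-backed helpers: maps a route to an
# invented median price so the loop can run without network access.
fake_medians = {('CJU', 'GMP'): 130.0, ('CTS', 'HND'): 190.0}

def fake_median_for_route(origin, destination):
    return fake_medians.get((origin, destination))

median_prices = []
for index, row in sample_routes.iterrows():
    median_prices.append(fake_median_for_route(row['Airport 1'], row['Airport 2']))

# The list is built in row order, so it lines up with the DataFrame's rows.
sample_routes['CurrentMedianPrice'] = median_prices
print(sample_routes)
```

Because the list has exactly one entry per row, assigning it to a new column works without any further alignment.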

In [15]:
# Provided Code -- Do NOT Edit!
current_median_prices = []

In [16]:
# TODO: Iterate over each `index` and `row` in `routes`
for index, row in routes.iterrows():
    
    # TODO: Extract `origin` and `destination` from `row`
    origin = row['Airport 1']
    destination = row['Airport 2']

    # TODO: Fetch `current_offers` by invoking `get_current_offers` with `origin` and `destination`
    current_offers = get_current_offers(origin, destination)

    # TODO: Get `median_offer_price` of `current_offers`
    median_offer_price = get_median_offer_price(current_offers)

    # TODO: Append `median_offer_price` to `current_median_prices`
    current_median_prices.append(median_offer_price)


In [17]:
# TODO: Augment `routes` with a `CurrentMedianPrice` column
routes['CurrentMedianPrice'] = current_median_prices

In [18]:
# TODO: Print first 5 rows of `routes`
routes.head(5)

	Airport 1	Airport 2	Distance (km)	Number of Passengers in 2018	Number of Passengers in 2017	Type	CurrentMedianPrice
0	CJU	GMP	449	14107414	13460306	Domestic	133.01
1	CTS	HND	835	9698639	8726502	Domestic	246.06
2	SYD	MLB	705	9245392	9090941	Domestic	1705.08
3	FUK	HND	889	8762547	7864000	Domestic	201.13
4	BOM	DEL	1150	7392155	7129943	Domestic	66.45


## Part 3: Analyzing Price Trends
In Part 3, you will use the `CurrentMedianPrice` column to provide TravelAire with basic insight into the airfare landscape. Specifically, you will answer the following question:
- What is the difference between the average price of domestic vs international flights?

### Problem 1: Average Price of Domestic vs International Flights
Find the average price of Domestic vs International flights. Start by grouping your DataFrame on `Type`, and then computing the `mean` of the `CurrentMedianPrice` for each group. Print your result.

Your code should generate the following output:

```
Type
Domestic         230.642436
International    424.721111
Name: CurrentMedianPrice, dtype: float64
```
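
For a sense of how the grouping works, the sketch below runs the same `groupby` and `mean` steps on a four-row DataFrame with invented prices. The names `sample` and `mean_by_type` are illustrative only.

```python
import pandas as pd

sample = pd.DataFrame({
    'Type': ['Domestic', 'Domestic', 'International', 'International'],
    'CurrentMedianPrice': [100.0, 200.0, 400.0, 600.0],
})

# Group rows by `Type`, then average the price column within each group.
mean_by_type = sample.groupby('Type')['CurrentMedianPrice'].mean()
print(mean_by_type)

# Difference between the two group averages: 500.0 - 150.0
print(mean_by_type['International'] - mean_by_type['Domestic'])  # 350.0
```

The result is a Series indexed by `Type`, so each group's average can be looked up by its label.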

In [19]:
# TODO: Compute mean `CurrentMedianPrice` on `Type` groups
routes_by_type = routes.groupby(by='Type')
routes_by_type['CurrentMedianPrice'].mean()

Type
Domestic         277.375000
International    552.591667
Name: CurrentMedianPrice, dtype: float64