## Introduction

One of our large global financial clients spends millions of dollars a year on air travel. They suspect that some of their contracted airlines are deliberately overstating the quoted mileage for routes in order to lower their cost-per-mile KPI, which makes those airlines appear cheaper than the competition. Our company has therefore been tasked with tracking the point-to-point mileage between the client's top destinations and comparing it to the mileage quoted by the carriers.
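
To make the concern concrete, here is a minimal sketch of how an overstated mileage lowers a cost-per-mile figure. The fare and mileage values are hypothetical and chosen only for illustration, and `cost_per_mile` is a helper introduced here, not part of the client's reporting.

```python
def cost_per_mile(fare, miles):
    """Return the cost-per-mile KPI for a single fare."""
    if miles <= 0:
        raise ValueError("miles must be positive")
    return fare / miles


# Hypothetical figures, not taken from the client's data
fare_usd = 1200.0
true_miles = 3000.0
quoted_miles = 3300.0  # mileage overstated by 10%

true_kpi = cost_per_mile(fare_usd, true_miles)
quoted_kpi = cost_per_mile(fare_usd, quoted_miles)

print(f"KPI on true mileage:   {true_kpi:.4f} USD/mile")
print(f"KPI on quoted mileage: {quoted_kpi:.4f} USD/mile")
print(f"Apparent saving:       {(1 - quoted_kpi / true_kpi) * 100:.2f}%")
```

The fare is identical in both cases; only the denominator changes, so the carrier looks cheaper without charging any less.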

The first step is to establish the point-to-point distance between each pair of airports in the data. Our company maintains a master list of airport latitudes and longitudes provided by the International Air Transport Association (IATA), and we intend to use it to calculate these distances. The calculation can then be wrapped into a function and compared to the carrier-provided distances live in one of our analytics applications.

#### First, let's have a look at the provided dataset

In [5]:
import pandas as pd
import numpy as np
import geopy.distance
import airportsdata
import re

In [6]:
df = pd.read_csv("Flight-distances.csv")

In [7]:
df.head()

Unnamed: 0,Normalised City Pair,Departure Code,Arrival Code,Departure_lat,Departure_lon,Arrival_lat,Arrival_lon
0,"London, United Kingdom - New York, United Stat...",LHR,JFK,51.5,-0.45,40.64,-73.79
1,"Johannesburg, South Africa - London, United Ki...",JNB,LHR,-26.1,28.23,51.47,-0.45
2,"London, United Kingdom - New York, United Stat...",LHR,JFK,51.5,-0.45,40.64,-73.79
3,"Johannesburg, South Africa - London, United Ki...",JNB,LHR,-26.1,28.23,51.47,-0.45
4,"London, United Kingdom - Singapore, Singapore",SIN,LHR,1.3,103.98,51.47,-0.45


#### Calculating distances between airports 

In [8]:
# Calculate the geodesic distance (km) between each pair of airports,
# using the coordinates supplied in the provided dataset

distance_list = []
for dep_lat, dep_lon, arr_lat, arr_lon in zip(df.Departure_lat, df.Departure_lon,
                                              df.Arrival_lat, df.Arrival_lon):
    # geopy expects (latitude, longitude) tuples
    distance_km = geopy.distance.geodesic((dep_lat, dep_lon), (arr_lat, arr_lon)).km
    distance_list.append(round(distance_km, 2))

In [9]:
# Add the calculated distances to the data frame as a new column

df['Declared_distance_km'] = np.array(distance_list)
df.head()

Unnamed: 0,Normalised City Pair,Departure Code,Arrival Code,Departure_lat,Departure_lon,Arrival_lat,Arrival_lon,Declared_distance_km
0,"London, United Kingdom - New York, United Stat...",LHR,JFK,51.5,-0.45,40.64,-73.79,5555.04
1,"Johannesburg, South Africa - London, United Ki...",JNB,LHR,-26.1,28.23,51.47,-0.45,9040.01
2,"London, United Kingdom - New York, United Stat...",LHR,JFK,51.5,-0.45,40.64,-73.79,5555.04
3,"Johannesburg, South Africa - London, United Ki...",JNB,LHR,-26.1,28.23,51.47,-0.45,9040.01
4,"London, United Kingdom - Singapore, Singapore",SIN,LHR,1.3,103.98,51.47,-0.45,10890.57
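
As a rough cross-check on the values above, the same distances can be approximated with the haversine formula, which treats the Earth as a sphere. This is a different technique from `geopy.distance.geodesic`, which works on an ellipsoid, so the two results should agree to within a fraction of a percent rather than exactly. The sketch below uses only the standard library; `haversine_km` and `EARTH_RADIUS_KM` are names introduced here for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; the sphere is an approximation


def haversine_km(coords_a, coords_b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, coords_a)
    lat2, lon2 = map(math.radians, coords_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# LHR -> JFK, using the coordinates from the first row of the dataset
lhr_jfk_km = haversine_km((51.5, -0.45), (40.64, -73.79))
print(round(lhr_jfk_km, 2))
```

If a haversine value and a geodesic value for the same route differ by more than about one percent, the coordinates themselves are worth checking before blaming the carrier.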


#### Creating a function to compare the previously calculated distances with the actual distances between airports

In [10]:
# Loading IATA airports data

airports = airportsdata.load('IATA') 
airports = pd.DataFrame.from_dict(airports, 
                                  orient='index', 
                                  columns= ['name', 'city', 'lat', 'lon'])

In [11]:
def checking_distance():
    """Prompt for two IATA codes and compare the dataset distances with the IATA-based distance."""

    valid_code = re.compile(r"^[A-Z]{3}$")

    def prompt_for_code(label):
        # Keep asking until the input is a three-letter code that exists in the airports table.
        # The stripped value is returned, so stray whitespace cannot break the lookup below.
        while True:
            code = input(f'Enter {label} Code (Three capital letters according to IATA codes):').strip()
            if valid_code.match(code) and code in airports.index:
                return code
            print('Invalid code')

    dep_code_input = prompt_for_code('Departure')
    arr_code_input = prompt_for_code('Arrival')

    # Look up (latitude, longitude) for both airports in the IATA master list
    departure_coords = (airports.loc[dep_code_input, 'lat'], airports.loc[dep_code_input, 'lon'])
    arrival_coords = (airports.loc[arr_code_input, 'lat'], airports.loc[arr_code_input, 'lon'])

    # Rows of the provided dataset that match the requested route
    route = (df['Departure Code'] == dep_code_input) & (df['Arrival Code'] == arr_code_input)
    actual_km = round(geopy.distance.geodesic(departure_coords, arrival_coords).km, 2)

    print('\nChecking for distance...')
    print('\nProvided flights with calculated distances between airports based on given coordinates:\n')
    print(df.loc[route, ['Normalised City Pair', 'Departure Code', 'Arrival Code', 'Declared_distance_km']].to_string())
    print(f'\nActual distance between airports is: {actual_km} km')
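
Because `checking_distance()` reads its arguments from `input()`, it is awkward to test or to call from an analytics application. One possible refactoring, sketched below, is to move the validation into a pure function that takes the raw string and the set of known codes. `normalise_iata_code` and the `known` set are hypothetical names introduced for this sketch, and upper-casing the input is an added assumption; the function above accepts capital letters only.

```python
import re

VALID_CODE = re.compile(r"^[A-Z]{3}$")


def normalise_iata_code(raw, known_codes):
    """Return a cleaned IATA code, or None if it is malformed or not in known_codes."""
    code = raw.strip().upper()  # assumption: lower-case input is accepted and upper-cased
    if VALID_CODE.match(code) and code in known_codes:
        return code
    return None


# Hypothetical subset of codes, for illustration only
known = {"LHR", "JFK", "SIN", "HKG", "JNB"}

print(normalise_iata_code(" lhr ", known))  # LHR
print(normalise_iata_code("LH", known))     # None (too short)
print(normalise_iata_code("ZZZ", known))    # None (well-formed but unknown)
```

In the notebook, `airports.index` could presumably be passed as `known_codes`, since the `in` operator works on a pandas index as well as on a set.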
    

#### Let's try our function on the three most frequently flown routes

In [12]:
df[['Departure Code', 'Arrival Code']].value_counts().head(3)

Departure Code  Arrival Code
LHR             JFK             11
SIN             HKG              9
HKG             SIN              8
dtype: int64