# Shortest and Longest Scheduled Flights

Author: Daniel Eriksson

This notebook demonstrates how to use Python with Pandas
to analyze the [OpenFlights](http://openflights.org) dataset
in order to find the longest and shortest scheduled flights.

In [1]:
import pandas as pd
import numpy as np

Fetch route data

In [2]:
routes = pd.read_csv("openflights/data/routes.dat", sep=',',
     names=[
        'airline',
        'airline_id',
        'source_airport',
        'source_airport_id',
        'dest_airport',
        'dest_airport_id',
        'codeshare',
        'stops',
        'equipment'
    ],
    na_values=r'\N',
)
routes.head()

,airline,airline_id,source_airport,source_airport_id,dest_airport,dest_airport_id,codeshare,stops,equipment
0,2B,410.0,AER,2965.0,KZN,2990.0,,0,CR2
1,2B,410.0,ASF,2966.0,KZN,2990.0,,0,CR2
2,2B,410.0,ASF,2966.0,MRV,2962.0,,0,CR2
3,2B,410.0,CEK,2968.0,KZN,2990.0,,0,CR2
4,2B,410.0,CEK,2968.0,OVB,4078.0,,0,CR2


Fetch airport data

In [3]:
airports = pd.read_csv("openflights/data/airports.dat", sep=',',
    names=[
        'airport_id',
        'airport_name',
        'city',
        'country',
        'iata_faa',
        'icao',
        'lat',
        'lng',
        'alt',
        'timezone',
        'dst',
        'tz',
    ],
    index_col=0,
    na_values=r'\N',
)
airports.head()

airport_id,airport_name,city,country,iata_faa,icao,lat,lng,alt,timezone,dst,tz
1,Goroka,Goroka,Papua New Guinea,GKA,AYGA,-6.081689,145.391881,5282,10.0,U,Pacific/Port_Moresby
2,Madang,Madang,Papua New Guinea,MAG,AYMD,-5.207083,145.7887,20,10.0,U,Pacific/Port_Moresby
3,Mount Hagen,Mount Hagen,Papua New Guinea,HGU,AYMH,-5.826789,144.295861,5388,10.0,U,Pacific/Port_Moresby
4,Nadzab,Nadzab,Papua New Guinea,LAE,AYNZ,-6.569828,146.726242,239,10.0,U,Pacific/Port_Moresby
5,Port Moresby Jacksons Intl,Port Moresby,Papua New Guinea,POM,AYPY,-9.443383,147.22005,146,10.0,U,Pacific/Port_Moresby


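Define a function for computing the great-circle distance between two coordinates.
The [haversine formula](http://www.movable-type.co.uk/scripts/latlong.html)
treats the Earth as a sphere and stays numerically well behaved for short distances,
so it is suitable for both the shortest and the longest routes.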
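
For reference, the formula as implemented in the next cell can be written as follows. Here $\varphi_s$ and $\varphi_d$ are the source and destination latitudes in radians, $\Delta\lambda$ is the longitude difference in radians, and $r$ is the mean Earth radius assumed by the code (6371 km):

$$
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_d - \varphi_s}{2}\right) + \cos\varphi_s \,\cos\varphi_d \,\sin^2\!\left(\frac{\Delta\lambda}{2}\right) \\
c &= 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1-a}\right) \\
d &= r\,c
\end{aligned}
$$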

In [4]:
def deg2rad(deg):
    """Convert degrees to radians. Works on numpy arrays"""
    return np.pi*deg/180.0


def distance(source_lat, source_lng, dest_lat, dest_lng):
    """Calculate distance using the haversine formula"""
    r = 6.371e3  # km
    phi_s = deg2rad(source_lat)
    phi_d = deg2rad(dest_lat)
    d_lambda = deg2rad(dest_lng - source_lng)

    a = np.sin((phi_d - phi_s)/2)**2 + np.cos(phi_s)*np.cos(phi_d)*np.sin(d_lambda/2)**2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1-a))

    return r * c
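
A quick sanity check can help catch unit or sign mistakes in a distance function like this one. The sketch below is self-contained, so it restates the same formula under the hypothetical name `haversine_km` (using `np.radians` instead of `deg2rad`) and compares it against two cases whose answers are known in closed form.

```python
import numpy as np


def haversine_km(lat1, lng1, lat2, lng2, r=6.371e3):
    """Great-circle distance in km (hypothetical stand-in for `distance`)."""
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    d_lambda = np.radians(lng2 - lng1)
    a = (np.sin((phi2 - phi1) / 2) ** 2
         + np.cos(phi1) * np.cos(phi2) * np.sin(d_lambda / 2) ** 2)
    return r * 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))


# A quarter of the way around the equator should equal r * pi / 2.
quarter = haversine_km(0.0, 0.0, 0.0, 90.0)

# Two identical points should be exactly zero apart.
same = haversine_km(10.0, 20.0, 10.0, 20.0)

# The function is vectorized, so arrays (or pandas Series) work as well.
both = haversine_km(
    np.array([0.0, 10.0]), np.array([0.0, 20.0]),
    np.array([0.0, 10.0]), np.array([90.0, 20.0]),
)
```

Because only NumPy ufuncs are used, the same call works on whole columns at once, which is how the notebook applies `distance` to the route table.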
    

Grab the source and destination ids and look them up in the airports list.
Use the previously defined function `distance` to calculate distances between
source and destination.

In [6]:
route_airport_ids = routes[['source_airport_id', 'dest_airport_id']].dropna()
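# reindex (rather than .loc) yields NaN rows for route ids that are missing
# from the airports table instead of raising a KeyError in recent pandas.
sources = airports[['airport_name', 'lat', 'lng']].reindex(route_airport_ids.source_airport_id)
destinations = airports[['airport_name', 'lat', 'lng']].reindex(route_airport_ids.dest_airport_id)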

sources.columns = ['source_name', 'source_lat', 'source_lng']
destinations.columns = ['dest_name', 'dest_lat', 'dest_lng']

sources.index = range(sources.shape[0])
destinations.index = range(destinations.shape[0])

route_lens = pd.concat([sources, destinations], axis=1)

route_lens['length'] = distance(
    route_lens['source_lat'],
    route_lens['source_lng'],
    route_lens['dest_lat'],
    route_lens['dest_lng'])
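
The lookup above could also be expressed as two left joins, which keeps the route ids next to the coordinates. This is only a sketch on toy data, not the notebook's method: `airports_demo` and `routes_demo` are hypothetical stand-ins, the airport rows are copied from the `airports.head()` output, and the route ids (including the unmatched `999`) are made up for illustration.

```python
import pandas as pd

# Toy stand-in for `airports`, indexed by airport_id like the real frame.
airports_demo = pd.DataFrame(
    {
        "airport_name": ["Goroka", "Madang"],
        "lat": [-6.081689, -5.207083],
        "lng": [145.391881, 145.7887],
    },
    index=pd.Index([1, 2], name="airport_id"),
)

# Hypothetical routes; id 999 has no matching airport.
# (In the notebook these ids are floats because of missing values;
# integers are used here for simplicity.)
routes_demo = pd.DataFrame(
    {"source_airport_id": [1, 2, 1], "dest_airport_id": [2, 1, 999]}
)

src = airports_demo.rename(
    columns={"airport_name": "source_name", "lat": "source_lat", "lng": "source_lng"}
)
dst = airports_demo.rename(
    columns={"airport_name": "dest_name", "lat": "dest_lat", "lng": "dest_lng"}
)

# Left joins keep every route; unmatched ids produce NaN coordinates.
joined = (
    routes_demo
    .merge(src, left_on="source_airport_id", right_index=True, how="left")
    .merge(dst, left_on="dest_airport_id", right_index=True, how="left")
)
```

With this layout, rows whose coordinates are NaN identify routes that reference an airport absent from the airports table.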

Grab the ten longest flights. There are some duplicates, probably because several
airlines fly the same route or because codeshare routes are listed twice.

The longest flight in the database is **Sydney Intl** to **Dallas Fort Worth Intl** at about **13,800 km**.

In [7]:
route_lens.nlargest(10, 'length')

,source_name,source_lat,source_lng,dest_name,dest_lat,dest_lng,length
6831,Sydney Intl,-33.946111,151.177222,Dallas Fort Worth Intl,32.896828,-97.037997,13808.161124
46747,Sydney Intl,-33.946111,151.177222,Dallas Fort Worth Intl,32.896828,-97.037997,13808.161124
20060,Hartsfield Jackson Atlanta Intl,33.636719,-84.428067,Johannesburg Intl,-26.139166,28.246,13582.583646
20904,Johannesburg Intl,-26.139166,28.246,Hartsfield Jackson Atlanta Intl,33.636719,-84.428067,13582.583646
13929,Dubai Intl,25.252778,55.364444,Los Angeles Intl,33.942536,-118.408075,13400.076892
14056,Los Angeles Intl,33.942536,-118.408075,Dubai Intl,25.252778,55.364444,13400.076892
23083,Dubai Intl,25.252778,55.364444,Los Angeles Intl,33.942536,-118.408075,13400.076892
23167,Los Angeles Intl,33.942536,-118.408075,Dubai Intl,25.252778,55.364444,13400.076892
51032,King Abdulaziz Intl,21.679564,39.156536,Los Angeles Intl,33.942536,-118.408075,13389.824488
51078,Los Angeles Intl,33.942536,-118.408075,King Abdulaziz Intl,21.679564,39.156536,13389.824488
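
If the duplicates get in the way, one possible approach is to collapse repeated routes before ranking. The sketch below is an assumption about how that could be done, not part of the original analysis; the frame `top` is a hypothetical excerpt built from four rows of the output above.

```python
import pandas as pd

# Excerpt of the table above (names and lengths copied from the output).
top = pd.DataFrame(
    {
        "source_name": [
            "Sydney Intl", "Sydney Intl",
            "Hartsfield Jackson Atlanta Intl", "Johannesburg Intl",
        ],
        "dest_name": [
            "Dallas Fort Worth Intl", "Dallas Fort Worth Intl",
            "Johannesburg Intl", "Hartsfield Jackson Atlanta Intl",
        ],
        "length": [13808.161124, 13808.161124, 13582.583646, 13582.583646],
    }
)

# Step 1: drop exact repeats of the same directed route.
directed = top.drop_duplicates(subset=["source_name", "dest_name"])

# Step 2: treat A -> B and B -> A as one route by building an
# order-independent key from the two airport names.
keys = [
    tuple(sorted(pair))
    for pair in zip(directed["source_name"], directed["dest_name"])
]
is_repeat = pd.Series(keys, index=directed.index).duplicated()
undirected = directed[~is_repeat]
```

On the full `route_lens` frame, keying on airport ids rather than names would likely be more robust, since distinct airports can share a name.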


Grab the ten shortest flights. The first row appears to be a data error, since its source
and destination are the same airport.

Excluding that row, the shortest flight in the database is between **Philip S W Goldson Intl** and **Belize City Municipal Airport** at **1.2 km**.

In [8]:
route_lens.nsmallest(10, 'length')

,source_name,source_lat,source_lng,dest_name,dest_lat,dest_lng,length
33037,Iskandar,-2.705197,111.673208,Iskandar,-2.705197,111.673208,0.0
3714,Philip S W Goldson Intl,17.539144,-88.308203,Belize City Municipal Airport,17.5344,-88.298,1.203554
3738,Belize City Municipal Airport,17.5344,-88.298,Philip S W Goldson Intl,17.539144,-88.308203,1.203554
42316,Philip S W Goldson Intl,17.539144,-88.308203,Belize City Municipal Airport,17.5344,-88.298,1.203554
42328,Belize City Municipal Airport,17.5344,-88.298,Philip S W Goldson Intl,17.539144,-88.308203,1.203554
35266,Point Baker Seaplane Base,56.351944,-133.6225,Port Protection Seaplane Base,56.328889,-133.61,2.676851
38715,Papa Westray Airport,59.3517,-2.90028,Westray Airport,59.3503,-2.95,2.822657
38719,Westray Airport,59.3503,-2.95,Papa Westray Airport,59.3517,-2.90028,2.822657
1726,Kasigluk Airport,60.873333,-162.524444,Nunapitchuk Airport,60.905833,-162.439167,5.86009
2643,Kasigluk Airport,60.873333,-162.524444,Nunapitchuk Airport,60.905833,-162.439167,5.86009
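
Rows where the source and destination coincide can be filtered out before ranking. The following is a minimal sketch on a hypothetical three-row excerpt (`shortest`) of the output above; comparing airport names is an assumption made for brevity.

```python
import pandas as pd

# Excerpt of the table above (names and lengths copied from the output).
shortest = pd.DataFrame(
    {
        "source_name": [
            "Iskandar", "Philip S W Goldson Intl", "Point Baker Seaplane Base",
        ],
        "dest_name": [
            "Iskandar", "Belize City Municipal Airport",
            "Port Protection Seaplane Base",
        ],
        "length": [0.0, 1.203554, 2.676851],
    }
)

# Keep only routes that connect two different airports.
valid = shortest[shortest["source_name"] != shortest["dest_name"]]

# Rank what is left; the self-referencing Iskandar row is gone.
result = valid.nsmallest(2, "length")
```

On the real data, comparing `source_airport_id` with `dest_airport_id` before the lookup would be a more reliable test than comparing names.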
