# Mapping 

Transportation is about getting from place A to place B.  Therefore, most transportation data has a spatial component to it.  It is nice to be able to put these data on a map and see what is going on.  It is even better if we can put it on a map and interact with the data.  It would be even cooler if we could put our interactive map on a website to show it off!

To do this, we are going to use a package called folium.  You can find the documentation here: 

https://folium.readthedocs.io/en/latest/

And access it on github here: 

https://github.com/python-visualization/folium


### Credits

This lesson draws from the folium quickstart notebook and from Vik Paruchuri's DataQuest lesson: 

https://www.dataquest.io/blog/python-data-visualization-libraries/

### A side note on static mapping

Sometimes you may want to create a static map instead of an interactive map.  Interactive maps are nice for exploring your data, but static maps work well for an image that you can insert into a paper.  If you want to create static maps, then basemap is a good tool.  Here is a nice lesson focused on mapping earthquake activity: 

http://introtopython.org/visualization_earthquakes.html



### OK, back to interactive mapping, because that's fun...

It turns out that folium doesn't do much itself.  It is just a wrapper around something called leafletjs.  You can read more about that here:

http://leafletjs.com/index.html

Leaflet is a library in the JavaScript language.  JavaScript is the language used for most web applications.  We could do the same thing using JavaScript and leaflet directly, but then we would have to learn the syntax for another language.  That might not be too hard, but to keep it simple, we'll stick to the python wrapper for now.  It is good to be aware of, though, because if you want more options than folium allows, you can go directly to leaflet.  

What makes this possible is the fact that leaflet has a well-defined API.  That means that we can pass data back and forth, even from a different language.  


### Setup

Start by installing folium using pip.  At a command prompt, type: 

    pip install folium

Hmm...when I tried this on my desktop, I got an error that says: 

    PermissionError: [WinError 5] Access is denied: 'c:\\program files\\anaconda3\\Lib\\site-packages\\folium'
    
It seems that pip is trying to install something in the Program Files directory, which Windows has protected.  Whether you see this will depend on the security settings on your machine.  If you get this error, open a command prompt as an administrator.  In the Windows search bar, type cmd.  When you see the command prompt, right click it and select Run as administrator.  

This did the trick, and now I get: 

    Successfully installed folium-0.2.1
    
In addition, let's go to github and clone the folium repository (https://github.com/python-visualization/folium) to our desktop.  This gives us the source code on our local machine.  What we're really interested in is the examples folder, which gives us a bunch of jupyter notebooks showing how to do different stuff.  You are welcome to explore these as needed. 

You also need to install geopandas, which will make it easier to work with geographic data.  The pip installer doesn't work (the long explanation is here: http://geoffboeing.com/2014/09/using-geopandas-windows/), so we'll install using anaconda.  Type: 

    conda install -c conda-forge geopandas
 


Getting Started
---------------

To create a base map, simply pass your starting coordinates to Folium:

In [1]:
import folium

In [3]:
m = folium.Map(location=[38.034,-84.500])

To display it in your notebook, just ask for the object representation: 

In [4]:
m

To save it to a file:

In [5]:
m.save('lex.html')

In [6]:
folium.Map?

We can use different backgrounds, or tilesets.  Several are built in.  Options include Stamen Terrain, Stamen Toner, Mapbox Bright, and Mapbox Control Room tiles, though which names are available depends on your folium version. 

In [7]:
folium.Map(
    location=[38.034,-84.500],
    tiles='Stamen Toner',
    zoom_start=13
)

Pick one you like and work with that for the rest of the class.  

Folium also supports Cloudmade and Mapbox custom tilesets: simply pass your key to the API_key keyword.  These are services where you can buy more backgrounds to make your maps look nice. 

```python
folium.Map(location=[45.5236, -122.6750],
           tiles='Mapbox',
           API_key='your.API.key')
```

### Open flights

Let's go back to our openflight data and make some maps. 

In [2]:
import pandas as pd
import numpy as np

In [9]:
# These files use \N as a missing value indicator.  When reading the CSVs, we will tell
# it to use that value as missing or NA.  The double backslash is required because
# a single backslash starts an escape sequence in a Python string literal. 

# Read in the airports data.
airports = pd.read_csv("data/airports.dat", header=None, na_values='\\N')
airports.columns = ["id", "name", "city", "country", "iata", "icao", "latitude", "longitude", "altitude","timezone", "dst", "tz", "type", "source"]

# Read in the airlines data.
airlines = pd.read_csv("data/airlines.dat", header=None, na_values='\\N')
airlines.columns = ["id", "name", "alias", "iata", "icao", "callsign", "country", "active"]

# Read in the routes data.
routes = pd.read_csv("data/routes.dat", header=None, na_values='\\N')
routes.columns = ["airline", "airline_id", "source", "source_id", "dest", "dest_id", "codeshare", "stops", "equipment"]
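
To see why the doubled backslash matters, here is a small standard-library sketch, separate from the pandas code above. It checks that `'\\N'` is a two-character string (a backslash followed by N), the same as the raw string `r'\N'`, and then treats that marker as missing while parsing a made-up two-line sample. The sample rows and the `clean` helper are assumptions for illustration only.

```python
import csv
import io

# '\\N' is two characters: a backslash and an N.  A raw string spells the same thing.
marker = '\\N'
print(len(marker), marker == r'\N')

# A tiny made-up sample in the same style as the OpenFlights files.
sample = 'GKA,AYGA,10\nXXX,\\N,\\N\n'

def clean(value, na_marker=marker):
    """Return None for the missing-value marker, otherwise the value unchanged."""
    return None if value == na_marker else value

# Parse the sample and replace the marker with None in every field.
rows = [[clean(v) for v in row] for row in csv.reader(io.StringIO(sample))]
print(rows)
```

Passing `na_values='\\N'` to `pd.read_csv` plays the same role as `clean` here, except that pandas stores the missing entries as NaN.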

In [10]:
# let's peek at what we have
airports.head()

Unnamed: 0,id,name,city,country,iata,icao,latitude,longitude,altitude,timezone,dst,tz,type,source
0,1,Goroka Airport,Goroka,Papua New Guinea,GKA,AYGA,-6.08169,145.391998,5282,10.0,U,Pacific/Port_Moresby,airport,OurAirports
1,2,Madang Airport,Madang,Papua New Guinea,MAG,AYMD,-5.20708,145.789001,20,10.0,U,Pacific/Port_Moresby,airport,OurAirports
2,3,Mount Hagen Kagamuga Airport,Mount Hagen,Papua New Guinea,HGU,AYMH,-5.82679,144.296005,5388,10.0,U,Pacific/Port_Moresby,airport,OurAirports
3,4,Nadzab Airport,Nadzab,Papua New Guinea,LAE,AYNZ,-6.569803,146.725977,239,10.0,U,Pacific/Port_Moresby,airport,OurAirports
4,5,Port Moresby Jacksons International Airport,Port Moresby,Papua New Guinea,POM,AYPY,-9.44338,147.220001,146,10.0,U,Pacific/Port_Moresby,airport,OurAirports


In [11]:
airlines.head()

Unnamed: 0,id,name,alias,iata,icao,callsign,country,active
0,-1,Unknown,,-,,,,Y
1,1,Private flight,,-,,,,Y
2,2,135 Airways,,,GNL,GENERAL,United States,N
3,3,1Time Airline,,1T,RNX,NEXTIME,South Africa,Y
4,4,2 Sqn No 1 Elementary Flying Training School,,,WYT,,United Kingdom,N


In [12]:
routes.head()

Unnamed: 0,airline,airline_id,source,source_id,dest,dest_id,codeshare,stops,equipment
0,2B,410.0,AER,2965.0,KZN,2990.0,,0,CR2
1,2B,410.0,ASF,2966.0,KZN,2990.0,,0,CR2
2,2B,410.0,ASF,2966.0,MRV,2962.0,,0,CR2
3,2B,410.0,CEK,2968.0,KZN,2990.0,,0,CR2
4,2B,410.0,CEK,2968.0,OVB,4078.0,,0,CR2


Make a map with the airports on it.

In [None]:
# since there are a lot of airports, making the map can be slow
# so limit it to US airports
us_airports = airports[airports['country']=='United States']
len(us_airports)

In [13]:
# Get a basic world map.
# location is [latitude, longitude]: 30 degrees north, 0 degrees east (the prime meridian)
airports_map = folium.Map(location=[30, 0], zoom_start=2)

# Loop through the airports, and draw each one as a marker on the map
# popup tells it what to display when you click on it
for name, row in us_airports.iterrows():
    
    # For some reason, this one airport causes issues with the map.
    if row["name"] != "South Pole Station":
        marker = folium.Marker([row["latitude"], row["longitude"]]) # we exclude popup=row['name'] here to get the map
        marker.add_to(airports_map) #add my marker to my map
        
# Save it to an HTML file (it's kinda big for the notebook).
# If your browser blocks the page's content when you open the file, choose to allow it.
airports_map.save('airports.html')

Hmm...it looks like there are airports everywhere!  Let's try again with smaller markers. 

We can also specify the color.  A list of custom colors is available here: 

http://www.w3schools.com/cssref/css_colors.asp

In [14]:
# over-write the airports_map, rather than just adding more markers to it. 
airports_map = folium.Map(location=[30, 0], zoom_start=2)

# use circle markers this time, with custom size and color
for name, row in us_airports.iterrows():
        
    # For some reason, this one airport causes issues with the map.
    if row["name"] != "South Pole Station":
        marker = folium.CircleMarker([row["latitude"], row["longitude"]], 
                                     radius=5,
                                     color='DarkCyan',
                                     fill_color='DarkCyan')
        marker.add_to(airports_map)
        
airports_map.save('airports.html')

You can also select icons to use as markers.  That code would look like: 

    marker = folium.Marker([row["latitude"], row["longitude"]], 
                           icon=folium.Icon(icon='cloud'), 
                           popup=row['name'])
                           
The list of icons comes from something called bootstrap, and can be found here: 

http://www.bootstrapicons.com/


Or you can use clusters of markers to clean up the map.  This will group them when you zoom out, similar to a Craigslist map.  You can see how to do that here: 

https://ocefpaf.github.io/python4oceanographers/blog/2015/12/14/geopandas_folium/

You can clean up the rest of this airports map as part of your homework this week.  

Let's draw the routes, but since we have lots, let's just start with the routes departing Lexington. 

In [15]:
# Select the LEX routes, then join the source airports
lex_routes = routes[(routes['source']=="LEX")]
lex_routes = pd.merge(lex_routes, airports, left_on='source_id', right_on='id', how='left')

In [16]:
# join the destination airports.  Here we need to use the suffixes option, because 
# the column names overlap, and we want to distinguish between source and dest
lex_routes = pd.merge(lex_routes, airports, 
                      left_on='dest_id', 
                      right_on='id', 
                      how='left', 
                      suffixes=['_source','_dest'])

In [18]:
# here is what our data looks like
lex_routes

Unnamed: 0,airline,airline_id,source_x,source_id,dest,dest_id,codeshare,stops,equipment,id_source,...,iata_dest,icao_dest,latitude_dest,longitude_dest,altitude_dest,timezone_dest,dst_dest,tz_dest,type_dest,source
0,9E,3976.0,LEX,4017,ATL,3682,,0,CRJ,4017,...,ATL,KATL,33.6367,-84.428101,1026,-5.0,A,America/New_York,airport,OurAirports
1,AA,24.0,LEX,4017,CLT,3876,Y,0,CR7 CRJ,4017,...,CLT,KCLT,35.214001,-80.9431,748,-5.0,A,America/New_York,airport,OurAirports
2,AA,24.0,LEX,4017,DFW,3670,Y,0,ERD ER4,4017,...,DFW,KDFW,32.896801,-97.038002,607,-6.0,A,America/Chicago,airport,OurAirports
3,AA,24.0,LEX,4017,ORD,3830,Y,0,ERD ER4,4017,...,ORD,KORD,41.9786,-87.9048,672,-6.0,A,America/Chicago,airport,OurAirports
4,AF,137.0,LEX,4017,ATL,3682,Y,0,CRJ CR9,4017,...,ATL,KATL,33.6367,-84.428101,1026,-5.0,A,America/New_York,airport,OurAirports
5,DL,2009.0,LEX,4017,ATL,3682,,0,M88 717,4017,...,ATL,KATL,33.6367,-84.428101,1026,-5.0,A,America/New_York,airport,OurAirports
6,DL,2009.0,LEX,4017,DCA,3520,Y,0,CRJ,4017,...,DCA,KDCA,38.8521,-77.037697,15,-5.0,A,America/New_York,airport,OurAirports
7,DL,2009.0,LEX,4017,DTW,3645,Y,0,CR7 CRJ CR9,4017,...,DTW,KDTW,42.212399,-83.353401,645,-5.0,A,America/New_York,airport,OurAirports
8,DL,2009.0,LEX,4017,LGA,3697,,0,ERJ,4017,...,LGA,KLGA,40.777199,-73.872597,21,-5.0,A,America/New_York,airport,OurAirports
9,DL,2009.0,LEX,4017,MSP,3858,Y,0,CRJ,4017,...,MSP,KMSP,44.882,-93.221802,841,-6.0,A,America/Chicago,airport,OurAirports


In [19]:
lex_routes.to_csv('lex_routes1.csv')

In [18]:
# It looks like source has some duplicate names.  Drop the values from the airports
# file and keep the one from the routes file
lex_routes = lex_routes.drop(['source_y','source'], axis=1)
lex_routes = lex_routes.rename(columns={'source_x': 'source'})
lex_routes

Unnamed: 0,airline,airline_id,source,source_id,dest,dest_id,codeshare,stops,equipment,id_source,...,country_dest,iata_dest,icao_dest,latitude_dest,longitude_dest,altitude_dest,timezone_dest,dst_dest,tz_dest,type_dest
0,9E,3976.0,LEX,4017,ATL,3682,,0,CRJ,4017,...,United States,ATL,KATL,33.6367,-84.428101,1026,-5.0,A,America/New_York,airport
1,AA,24.0,LEX,4017,CLT,3876,Y,0,CR7 CRJ,4017,...,United States,CLT,KCLT,35.214001,-80.9431,748,-5.0,A,America/New_York,airport
2,AA,24.0,LEX,4017,DFW,3670,Y,0,ERD ER4,4017,...,United States,DFW,KDFW,32.896801,-97.038002,607,-6.0,A,America/Chicago,airport
3,AA,24.0,LEX,4017,ORD,3830,Y,0,ERD ER4,4017,...,United States,ORD,KORD,41.9786,-87.9048,672,-6.0,A,America/Chicago,airport
4,AF,137.0,LEX,4017,ATL,3682,Y,0,CRJ CR9,4017,...,United States,ATL,KATL,33.6367,-84.428101,1026,-5.0,A,America/New_York,airport
5,DL,2009.0,LEX,4017,ATL,3682,,0,M88 717,4017,...,United States,ATL,KATL,33.6367,-84.428101,1026,-5.0,A,America/New_York,airport
6,DL,2009.0,LEX,4017,DCA,3520,Y,0,CRJ,4017,...,United States,DCA,KDCA,38.8521,-77.037697,15,-5.0,A,America/New_York,airport
7,DL,2009.0,LEX,4017,DTW,3645,Y,0,CR7 CRJ CR9,4017,...,United States,DTW,KDTW,42.212399,-83.353401,645,-5.0,A,America/New_York,airport
8,DL,2009.0,LEX,4017,LGA,3697,,0,ERJ,4017,...,United States,LGA,KLGA,40.777199,-73.872597,21,-5.0,A,America/New_York,airport
9,DL,2009.0,LEX,4017,MSP,3858,Y,0,CRJ,4017,...,United States,MSP,KMSP,44.882,-93.221802,841,-6.0,A,America/Chicago,airport


In [19]:
one_stop_lex = lex_routes[(lex_routes['stops'] == 1 )]
one_stop_lex

Unnamed: 0,airline,airline_id,source,source_id,dest,dest_id,codeshare,stops,equipment,id_source,...,country_dest,iata_dest,icao_dest,latitude_dest,longitude_dest,altitude_dest,timezone_dest,dst_dest,tz_dest,type_dest


In [20]:
# Let's keep only one route between each airport pair
# so we don't have a bunch of lines on top of each other
# The subset option tells it to consider just those columns when determining
# what is a duplicate. 

lex_routes = lex_routes.drop_duplicates(subset=['source', 'dest'])
lex_routes

Unnamed: 0,airline,airline_id,source,source_id,dest,dest_id,codeshare,stops,equipment,id_source,...,country_dest,iata_dest,icao_dest,latitude_dest,longitude_dest,altitude_dest,timezone_dest,dst_dest,tz_dest,type_dest
0,9E,3976.0,LEX,4017,ATL,3682,,0,CRJ,4017,...,United States,ATL,KATL,33.6367,-84.428101,1026,-5.0,A,America/New_York,airport
1,AA,24.0,LEX,4017,CLT,3876,Y,0,CR7 CRJ,4017,...,United States,CLT,KCLT,35.214001,-80.9431,748,-5.0,A,America/New_York,airport
2,AA,24.0,LEX,4017,DFW,3670,Y,0,ERD ER4,4017,...,United States,DFW,KDFW,32.896801,-97.038002,607,-6.0,A,America/Chicago,airport
3,AA,24.0,LEX,4017,ORD,3830,Y,0,ERD ER4,4017,...,United States,ORD,KORD,41.9786,-87.9048,672,-6.0,A,America/Chicago,airport
6,DL,2009.0,LEX,4017,DCA,3520,Y,0,CRJ,4017,...,United States,DCA,KDCA,38.8521,-77.037697,15,-5.0,A,America/New_York,airport
7,DL,2009.0,LEX,4017,DTW,3645,Y,0,CR7 CRJ CR9,4017,...,United States,DTW,KDTW,42.212399,-83.353401,645,-5.0,A,America/New_York,airport
8,DL,2009.0,LEX,4017,LGA,3697,,0,ERJ,4017,...,United States,LGA,KLGA,40.777199,-73.872597,21,-5.0,A,America/New_York,airport
9,DL,2009.0,LEX,4017,MSP,3858,Y,0,CRJ,4017,...,United States,MSP,KMSP,44.882,-93.221802,841,-6.0,A,America/Chicago,airport
10,G4,35.0,LEX,4017,FLL,3533,,0,M80,4017,...,United States,FLL,KFLL,26.072599,-80.152702,9,-5.0,A,America/New_York,airport
11,G4,35.0,LEX,4017,PGD,7056,,0,M80,4017,...,United States,PGD,KPGD,26.9202,-81.990501,26,-5.0,A,America/New_York,airport


That looks better.  Now, let's create a map.  To avoid adding duplicate airports, we are going to use a container called a set.  A set is an unordered collection of unique elements.  This means we can keep adding LEX to the set, and end up with only 1 LEX in the end.  
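
As a quick illustration of that behavior, here is a minimal sketch using a short made-up list of airport codes. It shows that adding the same code repeatedly leaves a single copy, and that tuples such as (source, dest) pairs can be stored in a set as well.

```python
# Airport codes as they might come out of the routes table, with repeats.
codes = ['LEX', 'ATL', 'LEX', 'CLT', 'LEX', 'ATL']

airport_set = set()
for code in codes:
    if code not in airport_set:   # membership tests on a set are fast
        airport_set.add(code)

# Adding an element that is already present has no effect.
airport_set.add('LEX')
print(len(airport_set))

# Tuples are hashable, so (source, dest) pairs can go in a set too.
route_set = {('LEX', 'ATL'), ('LEX', 'CLT')}
route_set.add(('LEX', 'ATL'))
print(('LEX', 'ATL') in route_set, len(route_set))
```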

In [2]:
# create a basic map, centered on Lexington
lex_air = folium.Map(
    location=[38.034,-84.500],
    tiles='Stamen Toner',
    zoom_start=4
)


In [22]:
# Define some empty sets
airport_set = set()
route_set = set()

# Make sure we don't add duplicates, especially for the origins
for name, row in lex_routes.iterrows():
    
    if row['source'] not in airport_set: 
        popup_string = row['city_source'] + ' (' + row['source'] + ')'
        marker = folium.CircleMarker([row["latitude_source"], row["longitude_source"]], 
                                     color='DarkCyan',
                                     fill_color='DarkCyan', 
                                     radius=5, popup=popup_string)
        marker.add_to(lex_air)
        airport_set.add(row['source'])
        
    if row['dest'] not in airport_set: 
        popup_string = row['city_dest'] + ' (' + row['dest'] + ')'
        marker = folium.CircleMarker([row["latitude_dest"], row["longitude_dest"]], 
                                     color='MidnightBlue',
                                     fill_color='MidnightBlue', 
                                     radius=5, popup=popup_string)
        marker.add_to(lex_air)
        airport_set.add(row['dest'])
    
    # the parentheses indicate that we are adding a tuple to the route_set
    if (row['source'],row['dest']) not in route_set:            
        popup_string = row['source'] + '-' + row['dest']        
        line = folium.PolyLine([(row["latitude_source"], row["longitude_source"]), 
                                (row["latitude_dest"], row["longitude_dest"])], 
                                weight=2, popup=popup_string)
        line.add_to(lex_air)
        route_set.add((row['source'],row['dest']))
        
# save first, then leave the map as the last expression so the notebook displays it
lex_air.save('Lex airports.html')
lex_air

That's cool.  But airplanes don't fly along a straight line on the map.  They follow a great circle, which is the shortest path over the surface of the globe.  So when you fly from Chicago to London, you go over Greenland (which is really pretty on a clear day!).  Can we make the lines follow a great circle? 
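
To get a feel for why the route bends north, here is a rough sketch that interpolates along a great circle using only the standard library. It assumes a perfectly spherical Earth, so it is an approximation and not the ellipsoidal calculation that a geodesy library would do. The function names are made up for this illustration, and the London Heathrow coordinates are approximate values supplied here.

```python
import math

def to_xyz(lat, lon):
    """Convert latitude/longitude in degrees to a unit vector."""
    lat_r, lon_r = math.radians(lat), math.radians(lon)
    return (math.cos(lat_r) * math.cos(lon_r),
            math.cos(lat_r) * math.sin(lon_r),
            math.sin(lat_r))

def to_latlon(x, y, z):
    """Convert a unit vector back to (lat, lon) in degrees."""
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))

def sphere_path(lat1, lon1, lat2, lon2, n=10):
    """Return n + 1 evenly spaced (lat, lon) points along the great circle.

    Assumes a spherical Earth; exactly opposite (antipodal) points are not handled.
    """
    a, b = to_xyz(lat1, lon1), to_xyz(lat2, lon2)
    dot = max(-1.0, min(1.0, sum(p * q for p, q in zip(a, b))))
    omega = math.acos(dot)          # angle between the two points
    if omega == 0.0:
        return [(lat1, lon1)] * (n + 1)
    points = []
    for i in range(n + 1):
        f = i / n
        # spherical linear interpolation weights
        w1 = math.sin((1 - f) * omega) / math.sin(omega)
        w2 = math.sin(f * omega) / math.sin(omega)
        points.append(to_latlon(*(w1 * p + w2 * q for p, q in zip(a, b))))
    return points

# Chicago O'Hare (coordinates from the routes table) to London Heathrow (approximate).
path = sphere_path(41.9786, -87.9048, 51.4700, -0.4543)
mid_lat, mid_lon = path[len(path) // 2]
print(round(mid_lat, 1), round(mid_lon, 1))
```

The midpoint comes out at a higher latitude than either endpoint, which is why the path bows toward the pole when drawn on a flat map.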

It looks like there are some options here: 

http://gis.stackexchange.com/questions/47/what-tools-in-python-are-available-for-doing-great-circle-distance-line-creati

Let's try one of them. 

In [3]:
import pyproj

# when creating a function, it is good practice to define the API!
def getGreatCirclePoints(startlat, startlon, endlat, endlon): 
    """
    startlat - starting latitude 
    startlon - starting longitude 
    endlat   - ending latitude 
    endlon   - ending longitude 
    
    returns - a list of tuples, where each tuple is the lat-long for a point
              along the curve.  
    """
    # calculate distance between points
    g = pyproj.Geod(ellps='WGS84')
    (az12, az21, dist) = g.inv(startlon, startlat, endlon, endlat)

    # calculate line string along path with segments <= 20 km
    lonlats = g.npts(startlon, startlat, endlon, endlat,
                     1 + int(dist / 20000))

    # the npts function uses lon-lat, while the folium functions use lat-lon
    # This sort of thing is maddening!  What happens is the lines don't show
    # up on the map and you don't know why.  Learn from my mistakes
    latlons = []
    for lon_lat in lonlats: 
        
        # this is how you get values out of a tuple
        (lon, lat) = lon_lat
        
        # add them to our list
        latlons.append((lat, lon)) 
    
    # npts doesn't include start/end points, so prepend/append them
    latlons.insert(0, (startlat, startlon))
    latlons.append((endlat, endlon))
    
    return latlons
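
The coordinate swap inside the function could also be written as a list comprehension with tuple unpacking. A small sketch, using made-up (lon, lat) pairs rather than real `npts` output:

```python
# Made-up (lon, lat) pairs, in the order that npts returns them.
lonlats = [(-84.50, 38.03), (-84.46, 35.84), (-84.43, 33.64)]

# Unpack each pair and flip it to the (lat, lon) order that folium expects.
latlons = [(lat, lon) for lon, lat in lonlats]
print(latlons)
```

Either form works; the explicit loop is easier to step through, while the comprehension keeps the swap on one line.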


In [4]:
# any time we write a function, we should test that it works
p = getGreatCirclePoints(38.034, -84.500, 33.636700, -84.428101) 
p

[(38.034, -84.5),
 (37.864933929949096, -84.49708149511396),
 (37.695862920583586, -84.49417629534568),
 (37.52678697986378, -84.49128425988357),
 (37.35770611591111, -84.48840524964173),
 (37.18862033700805, -84.48553912723197),
 (37.019529651598035, -84.48268575693639),
 (36.85043406828536, -84.47984500468044),
 (36.68133359583508, -84.4770167380065),
 (36.51222824317288, -84.474200826048),
 (36.343118019385024, -84.47139713950395),
 (36.17400293371815, -84.46860555061399),
 (36.00488299557918, -84.46582593313389),
 (35.83575821453518, -84.46305816231151),
 (35.66662860031317, -84.46030211486323),
 (35.49749416279997, -84.45755766895067),
 (35.328354912042094, -84.45482470415813),
 (35.15921085824549, -84.45210310147009),
 (34.99006201177538, -84.44939274324938),
 (34.82090838315612, -84.44669351321562),
 (34.65174998307092, -84.44400529642411),
 (34.48258682236167, -84.44132797924507),
 (34.3134189120287, -84.43866144934323),
 (34.14424626323062, -84.43600559565783),
 (33.9750688872

In [25]:
# create a basic map, centered on Lexington
lex_air = folium.Map(
    location=[38.034,-84.500],
    tiles='Stamen Toner',
    zoom_start=4
)
lex_air

In [26]:
# define the map in the same way, but use great circles for the lines

# Define some empty sets
airport_set = set()
route_set = set()

# Make sure we don't add duplicates, especially for the origins
for name, row in lex_routes.iterrows():
    
    if row['source'] not in airport_set: 
        popup_string = row['city_source'] + ' (' + row['source'] + ')'
        marker = folium.CircleMarker([row["latitude_source"], row["longitude_source"]], 
                                     color='DarkCyan',
                                     fill_color='DarkCyan', 
                                     radius=5, popup=popup_string)
        marker.add_to(lex_air)
        airport_set.add(row['source'])
        
    if row['dest'] not in airport_set: 
        popup_string = row['city_dest'] + ' (' + row['dest'] + ')'
        marker = folium.CircleMarker([row["latitude_dest"], row["longitude_dest"]], 
                                     color='MidnightBlue',
                                     fill_color='MidnightBlue', 
                                     radius=5, popup=popup_string)
        marker.add_to(lex_air)
        airport_set.add(row['dest'])
    
    # PolyLine will accept a whole list of tuples, not just two
    if (row['source'],row['dest']) not in route_set:            
        popup_string = row['source'] + '-' + row['dest']       
        
        gc_points = getGreatCirclePoints(row["latitude_source"], 
                                         row["longitude_source"], 
                                         row["latitude_dest"], 
                                         row["longitude_dest"])
        
        line = folium.PolyLine(gc_points, weight=2, popup=popup_string)
        line.add_to(lex_air)
        route_set.add((row['source'],row['dest']))
        
lex_air   

In [27]:
# save it to its own file
lex_air.save("lex_air.html")

### Your turn

The above map shows everywhere you can get to from Lexington on a direct flight.  Your job is to:

1. Make a map of all the possible destinations with one transfer. 
2. Make a map of all the possible destinations with two transfers. 

Make the maps look nice!  Use color coding, vary the size of the features, or be selective about what you display in order to communicate the information effectively.  

Bonus: This is the air travel version of the Kevin Bacon game (https://oracleofbacon.org/).  What is the number N, such that you can reach every airport in the world with N or fewer transfers?  

Extra Bonus: Use this very important piece of knowledge to impress your friends at parties!
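
One way to think about the transfer questions is as a breadth-first search over the route network: airports are nodes, routes are directed edges, and the number of transfers is the number of flights minus one. The sketch below is one possible approach, not the required solution. It uses a handful of (source, dest) pairs that appear in this lesson's route tables, and the `min_flights` helper is a name introduced here for illustration.

```python
from collections import deque

# A few (source, dest) pairs that appear in this lesson's route tables.
routes = [('LEX', 'ATL'), ('LEX', 'ORD'), ('LEX', 'FLL'),
          ('ATL', 'MCN'), ('ORD', 'BRL'), ('FLL', 'BIM')]

def min_flights(routes, origin):
    """Return {airport: fewest flights needed to reach it from origin}."""
    # Build an adjacency map: source airport -> set of destinations.
    graph = {}
    for source, dest in routes:
        graph.setdefault(source, set()).add(dest)

    flights = {origin: 0}
    queue = deque([origin])
    while queue:
        airport = queue.popleft()
        for nxt in graph.get(airport, ()):
            if nxt not in flights:          # the first visit is the shortest
                flights[nxt] = flights[airport] + 1
                queue.append(nxt)
    return flights

flights = min_flights(routes, 'LEX')
# transfers = flights - 1 for every airport other than the origin
transfers = {a: n - 1 for a, n in flights.items() if a != 'LEX'}
print(transfers)
```

With the full routes table, the largest value in `transfers` would answer the bonus question for the airports reachable from the chosen origin.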

In [77]:
# These files use \N as a missing value indicator.  When reading the CSVs, we will tell
# it to use that value as missing or NA.  The double backslash is required because
# a single backslash starts an escape sequence in a Python string literal. 

# Read in the airports data.
airports = pd.read_csv("data/airports.dat", header=None, na_values='\\N')
airports.columns = ["id", "name", "city", "country", "iata", "icao", "latitude", "longitude", "altitude","timezone", "dst", "tz", "type", "source"]

# Read in the airlines data.
airlines = pd.read_csv("data/airlines.dat", header=None, na_values='\\N')
airlines.columns = ["id", "name", "alias", "iata", "icao", "callsign", "country", "active"]

# Read in the routes data.
routes = pd.read_csv("data/routes.dat", header=None, na_values='\\N')
routes.columns = ["airline", "airline_id", "source", "source_id", "dest", "dest_id", "codeshare", "stops", "equipment"]

In [78]:
# Select the LEX routes, then join the source airports
lex_routes = routes[(routes['source']=="LEX")]
#lex_routes = pd.merge(lex_routes, airports, left_on='source_id', right_on='id', how='left')
lex_routes

Unnamed: 0,airline,airline_id,source,source_id,dest,dest_id,codeshare,stops,equipment
3588,9E,3976.0,LEX,4017.0,ATL,3682.0,,0,CRJ
5763,AA,24.0,LEX,4017.0,CLT,3876.0,Y,0,CR7 CRJ
5764,AA,24.0,LEX,4017.0,DFW,3670.0,Y,0,ERD ER4
5765,AA,24.0,LEX,4017.0,ORD,3830.0,Y,0,ERD ER4
9641,AF,137.0,LEX,4017.0,ATL,3682.0,Y,0,CRJ CR9
21095,DL,2009.0,LEX,4017.0,ATL,3682.0,,0,M88 717
21096,DL,2009.0,LEX,4017.0,DCA,3520.0,Y,0,CRJ
21097,DL,2009.0,LEX,4017.0,DTW,3645.0,Y,0,CR7 CRJ CR9
21098,DL,2009.0,LEX,4017.0,LGA,3697.0,,0,ERJ
21099,DL,2009.0,LEX,4017.0,MSP,3858.0,Y,0,CRJ


In [79]:
lex_routes = lex_routes.drop_duplicates(subset=['dest'])
lex_routes

Unnamed: 0,airline,airline_id,source,source_id,dest,dest_id,codeshare,stops,equipment
3588,9E,3976.0,LEX,4017.0,ATL,3682.0,,0,CRJ
5763,AA,24.0,LEX,4017.0,CLT,3876.0,Y,0,CR7 CRJ
5764,AA,24.0,LEX,4017.0,DFW,3670.0,Y,0,ERD ER4
5765,AA,24.0,LEX,4017.0,ORD,3830.0,Y,0,ERD ER4
21096,DL,2009.0,LEX,4017.0,DCA,3520.0,Y,0,CRJ
21097,DL,2009.0,LEX,4017.0,DTW,3645.0,Y,0,CR7 CRJ CR9
21098,DL,2009.0,LEX,4017.0,LGA,3697.0,,0,ERJ
21099,DL,2009.0,LEX,4017.0,MSP,3858.0,Y,0,CRJ
29047,G4,35.0,LEX,4017.0,FLL,3533.0,,0,M80
29048,G4,35.0,LEX,4017.0,PGD,7056.0,,0,M80


In [80]:
lex_direct=lex_routes.dest.unique()
lex_direct

array(['ATL', 'CLT', 'DFW', 'ORD', 'DCA', 'DTW', 'LGA', 'MSP', 'FLL',
       'PGD', 'PIE', 'SFB', 'IAH'], dtype=object)

In [81]:

lex_direct_routes = routes[routes['source'].isin(lex_direct)]
lex_direct_routes

Unnamed: 0,airline,airline_id,source,source_id,dest,dest_id,codeshare,stops,equipment
265,3E,10739.0,ORD,3830.0,BRL,5726.0,,0,CNC
266,3E,10739.0,ORD,3830.0,DEC,4042.0,,0,CNC
443,3M,20710.0,ATL,3682.0,LWB,6958.0,,0,SF3
444,3M,20710.0,ATL,3682.0,MCN,3754.0,,0,SF3
445,3M,20710.0,ATL,3682.0,MEI,4335.0,,0,SF3
446,3M,20710.0,ATL,3682.0,MSL,5756.0,,0,SF3
447,3M,20710.0,ATL,3682.0,PIB,5759.0,,0,SF3
448,3M,20710.0,ATL,3682.0,TUP,5773.0,,0,SF3
455,3M,20710.0,FLL,3533.0,BIM,1937.0,,0,SF3
456,3M,20710.0,FLL,3533.0,ELH,1943.0,,0,SF3


In [82]:
lex_direct_routes=lex_direct_routes[['source','source_id','dest','dest_id']]

In [83]:
lex_direct_routes.columns=['layover','layover_id','dest','dest_id']

In [84]:
lex_direct_routes.head()

Unnamed: 0,layover,layover_id,dest,dest_id
265,ORD,3830.0,BRL,5726.0
266,ORD,3830.0,DEC,4042.0
443,ATL,3682.0,LWB,6958.0
444,ATL,3682.0,MCN,3754.0
445,ATL,3682.0,MEI,4335.0


In [85]:
lex_routes=lex_routes[['source','source_id','dest','dest_id']]
lex_routes.columns=['source','source_id','layover','layover_id']
lex_routes.head()

Unnamed: 0,source,source_id,layover,layover_id
3588,LEX,4017.0,ATL,3682.0
5763,LEX,4017.0,CLT,3876.0
5764,LEX,4017.0,DFW,3670.0
5765,LEX,4017.0,ORD,3830.0
21096,LEX,4017.0,DCA,3520.0


In [86]:
one_trsfr_routes=lex_routes.join(lex_direct_routes,how='right',on='layover',lsuffix='lex_routes',rsuffix='lex_direct_routes')
one_trsfr_routes

Unnamed: 0,layover,source,source_id,layoverlex_routes,layover_idlex_routes,layoverlex_direct_routes,layover_idlex_direct_routes,dest,dest_id
57029,265,,,,,ORD,3830.0,BRL,5726.0
57029,266,,,,,ORD,3830.0,DEC,4042.0
57029,443,,,,,ATL,3682.0,LWB,6958.0
57029,444,,,,,ATL,3682.0,MCN,3754.0
57029,445,,,,,ATL,3682.0,MEI,4335.0
57029,446,,,,,ATL,3682.0,MSL,5756.0
57029,447,,,,,ATL,3682.0,PIB,5759.0
57029,448,,,,,ATL,3682.0,TUP,5773.0
57029,455,,,,,FLL,3533.0,BIM,1937.0
57029,456,,,,,FLL,3533.0,ELH,1943.0
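
In the output above, the `source` columns come back empty. `DataFrame.join` with `on='layover'` matches the `layover` column of the left frame against the index of the right frame, and here that index holds row numbers rather than airport codes. A hedged alternative is `pd.merge`, which matches column to column. The sketch below uses small made-up frames with the same column names; it is one possible fix, not the only one.

```python
import pandas as pd

# First leg: direct flights out of LEX (the destination is the layover).
first_leg = pd.DataFrame({
    'source': ['LEX', 'LEX', 'LEX'],
    'layover': ['ATL', 'ORD', 'FLL'],
})

# Second leg: flights departing from those layover airports.
second_leg = pd.DataFrame({
    'layover': ['ORD', 'ATL', 'ATL', 'FLL'],
    'dest': ['BRL', 'LWB', 'MCN', 'BIM'],
})

# Match the two legs on the shared layover column.
one_transfer = pd.merge(first_leg, second_leg, on='layover', how='inner')
print(one_transfer)
```

Because the first leg already carries the `source` column, there is no need to fill in 'LEX' by hand afterwards.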


In [87]:
one_trsfr_routes=one_trsfr_routes[['layoverlex_direct_routes','layover_idlex_direct_routes','dest','dest_id']]

In [88]:
one_trsfr_routes['source']='LEX'
one_trsfr_routes['source_id']='4017.0'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
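
The warning above appears because `one_trsfr_routes` was created by selecting columns from another DataFrame, so pandas cannot tell whether the assignment is meant to affect the original. One common way to avoid it, sketched here on a small made-up frame, is to take an explicit `.copy()` before adding columns. Storing the ID as a number rather than the string `'4017.0'` also keeps it comparable with the other ID columns.

```python
import pandas as pd

# A small made-up frame with the same kind of columns as the routes table.
routes = pd.DataFrame({
    'layover': ['ORD', 'ATL'],
    'layover_id': [3830.0, 3682.0],
    'dest': ['BRL', 'MCN'],
    'dest_id': [5726.0, 3754.0],
    'stops': [0, 0],
})

# .copy() makes the selection an independent DataFrame, so adding columns is safe.
one_transfer = routes[['layover', 'layover_id', 'dest', 'dest_id']].copy()
one_transfer['source'] = 'LEX'
one_transfer['source_id'] = 4017.0   # numeric, like the other *_id columns

print(one_transfer.dtypes)
```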
  


In [89]:
one_trsfr_routes.columns=['layover','layover_id','dest','dest_id','source','source_id']

In [90]:
one_trsfr_routes.head()

Unnamed: 0,layover,layover_id,dest,dest_id,source,source_id
57029,ORD,3830.0,BRL,5726.0,LEX,4017.0
57029,ORD,3830.0,DEC,4042.0,LEX,4017.0
57029,ATL,3682.0,LWB,6958.0,LEX,4017.0
57029,ATL,3682.0,MCN,3754.0,LEX,4017.0
57029,ATL,3682.0,MEI,4335.0,LEX,4017.0


In [114]:
# join the destination airports.  Here we need to use the suffixes option, because 
# the column names overlap, and we want to distinguish between source and dest
one_trsfr_allroutes = pd.merge(one_trsfr_routes, airports, 
                      left_on='dest_id', 
                      right_on='id', 
                      how='left', 
                      suffixes=['_source','_dest'])
one_trsfr_allroutes


Unnamed: 0,layover,layover_id,dest,dest_id,source_source,source_id,id,name,city,country,iata,icao,latitude,longitude,altitude,timezone,dst,tz,type,source_dest
0,ORD,3830.0,BRL,5726,LEX,4017.0,5726.0,Southeast Iowa Regional Airport,Burlington,United States,BRL,KBRL,40.783199,-91.125504,698.0,-6.0,A,America/Chicago,airport,OurAirports
1,ORD,3830.0,DEC,4042,LEX,4017.0,4042.0,Decatur Airport,Decatur,United States,DEC,KDEC,39.834599,-88.865700,682.0,-6.0,A,America/Chicago,airport,OurAirports
2,ATL,3682.0,LWB,6958,LEX,4017.0,6958.0,Greenbrier Valley Airport,Lewisburg,United States,LWB,KLWB,37.858299,-80.399498,2302.0,-5.0,U,America/New_York,airport,OurAirports
3,ATL,3682.0,MCN,3754,LEX,4017.0,3754.0,Middle Georgia Regional Airport,Macon,United States,MCN,KMCN,32.692799,-83.649200,354.0,-5.0,A,America/New_York,airport,OurAirports
4,ATL,3682.0,MEI,4335,LEX,4017.0,4335.0,Key Field,Meridian,United States,MEI,KMEI,32.332600,-88.751900,297.0,-6.0,A,America/Chicago,airport,OurAirports
5,ATL,3682.0,MSL,5756,LEX,4017.0,5756.0,Northwest Alabama Regional Airport,Muscle Shoals,United States,MSL,KMSL,34.745300,-87.610199,551.0,-6.0,A,America/Chicago,airport,OurAirports
6,ATL,3682.0,PIB,5759,LEX,4017.0,5759.0,Hattiesburg Laurel Regional Airport,Hattiesburg/Laurel,United States,PIB,KPIB,31.467100,-89.337097,298.0,-6.0,A,America/Chicago,airport,OurAirports
7,ATL,3682.0,TUP,5773,LEX,4017.0,5773.0,Tupelo Regional Airport,Tupelo,United States,TUP,KTUP,34.268101,-88.769897,346.0,-6.0,A,America/Chicago,airport,OurAirports
8,FLL,3533.0,BIM,1937,LEX,4017.0,1937.0,South Bimini Airport,Alice Town,Bahamas,BIM,MYBS,25.699900,-79.264702,10.0,-5.0,U,America/Nassau,airport,OurAirports
9,FLL,3533.0,ELH,1943,LEX,4017.0,1943.0,North Eleuthera Airport,North Eleuthera,Bahamas,ELH,MYEH,25.474899,-76.683502,13.0,-5.0,U,America/Nassau,airport,OurAirports


In [115]:
one_trsfr_allroutes.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3558 entries, 0 to 3557
Data columns (total 20 columns):
layover          3558 non-null object
layover_id       3558 non-null float64
dest             3558 non-null object
dest_id          3558 non-null object
source_source    3558 non-null object
source_id        3558 non-null object
id               3553 non-null float64
name             3553 non-null object
city             3553 non-null object
country          3553 non-null object
iata             3553 non-null object
icao             3553 non-null object
latitude         3553 non-null float64
longitude        3553 non-null float64
altitude         3553 non-null float64
timezone         3553 non-null float64
dst              3553 non-null object
tz               3553 non-null object
type             3553 non-null object
source_dest      3553 non-null object
dtypes: float64(6), object(14)
memory usage: 583.7+ KB
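
The `info()` output above shows 3558 rows but only 3553 non-null values in the airport columns, so five destinations found no match in the airports table. A minimal sketch of how such rows could be isolated, using a made-up three-row table (the `toy_*` names and the `XXX` code with id 9999 are placeholders, not part of the real data):

```python
import pandas as pd

# Hypothetical toy data: three route destinations, one with no airport record
toy_routes = pd.DataFrame({'dest': ['LWB', 'MCN', 'XXX'],
                           'dest_id': [6958.0, 3754.0, 9999.0]})
toy_airports = pd.DataFrame({'id': [6958.0, 3754.0],
                             'city': ['Lewisburg', 'Macon']})

# A left merge keeps every route; unmatched rows get NaN in the airport columns
toy_merged = pd.merge(toy_routes, toy_airports,
                      left_on='dest_id', right_on='id', how='left')

# Rows where the airport lookup failed
toy_unmatched = toy_merged[toy_merged['id'].isnull()]
print(toy_unmatched[['dest', 'dest_id']])

# Rows that have an airport record and can be placed on a map
toy_mappable = toy_merged.dropna(subset=['id'])
print(len(toy_mappable))
```

Rows like these have no latitude or longitude, so they need to be skipped or dropped before any markers are drawn.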


In [116]:
# create a basic map, centered on Lexington
lex_air = folium.Map(
    location=[38.034,-84.500],
    tiles='Stamen Toner',
    zoom_start=4
)

In [117]:
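# Define some empty sets
airport_set = set()
route_set = set()

# Add one marker per destination airport (assumed intent), skipping duplicates
for name, row in one_trsfr_allroutes.iterrows():

    # Rows with no matching airport record have no coordinates to plot
    if pd.isnull(row['latitude']) or pd.isnull(row['longitude']):
        continue

    if row['iata'] not in airport_set:
        popup_string = row['city'] + ' (' + row['iata'] + ')'
        # location must be a single [lat, lon] list; two bare numbers would
        # put the longitude in the radius slot and raise a TypeError
        marker = folium.CircleMarker(location=[row['latitude'], row['longitude']],
                                     color='MidnightBlue', fill_color='MidnightBlue',
                                     radius=5, popup=popup_string)
        marker.add_to(lex_air)
        airport_set.add(row['iata'])

# Save first, then leave the map as the last expression so the notebook displays it
lex_air.save('one transfer.html')
lex_air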
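
Assuming `folium.CircleMarker` takes `location` as its first parameter and `radius` as its second, passing latitude and longitude as two bare numbers puts the longitude in the `radius` slot, and an explicit `radius=` keyword then raises a `TypeError`. The sketch below reproduces that behaviour with a hypothetical stand-in function, `circle_marker_like`, so it runs without folium installed. It is not folium's actual implementation.

```python
# Hypothetical stand-in with the assumed argument order:
# first positional slot is the location, second is the radius.
def circle_marker_like(location, radius=10, **kwargs):
    return {'location': list(location), 'radius': radius, **kwargs}

# Two bare numbers: -84.500 lands in the radius slot, then radius=5 collides
try:
    circle_marker_like(38.034, -84.500, radius=5)
    error_message = None
except TypeError as err:
    error_message = str(err)
print(error_message)

# Coordinates wrapped in one list: only `location` is filled positionally
ok = circle_marker_like([38.034, -84.500], radius=5, color='MidnightBlue')
print(ok)
```

Wrapping the coordinates in one list, or naming the argument with `location=[lat, lon]`, avoids the collision.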

In [118]:
layover2_direct=one_trsfr_allroutes.dest.unique()
layover2_direct

array(['BRL', 'DEC', 'LWB', 'MCN', 'MEI', 'MSL', 'PIB', 'TUP', 'BIM',
       'ELH', 'EYW', 'FPO', 'GGT', 'GHB', 'MCO', 'MHH', 'TCB', 'TPA',
       'EZE', 'CUR', 'AZO', 'CHA', 'CID', 'CRW', 'CVG', 'EVV', 'FWA',
       'GSO', 'GSP', 'LAN', 'LEX', 'MBS', 'MSP', 'ROA', 'SYR', 'TYS',
       'XNA', 'MSY', 'ATL', 'CLT', 'DFW', 'LHR', 'MIA', 'ORD', 'PHL',
       'PHX', 'ABE', 'AGS', 'ALB', 'ANU', 'AUA', 'AUS', 'AVL', 'AVP',
       'BDL', 'BHM', 'BNA', 'BOS', 'BTR', 'BUF', 'BWI', 'BZE', 'CAE',
       'CAK', 'CDG', 'CHO', 'CHS', 'CLE', 'CMH', 'CUN', 'CZM', 'DAB',
       'DAY', 'DCA', 'DEN', 'DSM', 'DTW', 'DUB', 'EWN', 'EWR', 'FAY',
       'FCO', 'FLL', 'FLO', 'FRA', 'GCM', 'GNV', 'GPT', 'GRU', 'HHH',
       'HPN', 'HSV', 'HTS', 'IAD', 'IAH', 'ILM', 'IND', 'JAN', 'JAX',
       'JFK', 'LAS', 'LAX', 'LGA', 'LIR', 'LIT', 'LYH', 'MBJ', 'MCI',
       'MDT', 'MEM', 'MEX', 'MGM', 'MHT', 'MKE', 'MLB', 'MOB', 'MYR',
       'NAS', 'OAJ', 'OMA', 'ORF', 'PBI', 'PDX', 'PGV', 'PHF', 'PIT',
       'PLS', 'PNS',

In [119]:
Final_dest = routes[routes['source'].isin(layover2_direct)]
Final_dest

Unnamed: 0,airline,airline_id,source,source_id,dest,dest_id,codeshare,stops,equipment
5,2B,410.0,DME,4029.0,KZN,2990.0,,0,CR2
6,2B,410.0,DME,4029.0,NBC,6969.0,,0,CR2
7,2B,410.0,DME,4029.0,TGK,,,0,CR2
8,2B,410.0,DME,4029.0,UUA,6160.0,,0,CR2
69,2I,8359.0,LIM,2789.0,AYP,2786.0,,0,142
70,2I,8359.0,LIM,2789.0,CUZ,2812.0,,0,142 141
71,2I,8359.0,LIM,2789.0,HUU,6067.0,,0,141
72,2I,8359.0,LIM,2789.0,PCL,2781.0,,0,143 146
73,2I,8359.0,LIM,2789.0,TPP,2806.0,,0,142 146
103,2K,1338.0,BOG,2709.0,GYE,2673.0,,0,319
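
The filter above keeps every route whose origin is one of the airports already reachable. A small sketch of how `isin()` builds that selection, using a made-up four-row routes table (`toy_routes` and `reachable` are illustrative names):

```python
import pandas as pd

# Hypothetical miniature of the routes table
toy_routes = pd.DataFrame({'source': ['DME', 'LIM', 'LEX', 'BOG'],
                           'dest':   ['KZN', 'CUZ', 'ORD', 'GYE']})

# Airports reachable so far (in the notebook this comes from .unique())
reachable = ['DME', 'BOG']

# isin() returns a boolean mask: True where 'source' is one of the reachable codes
mask = toy_routes['source'].isin(reachable)

# Indexing with the mask keeps only the matching rows
next_legs = toy_routes[mask]
print(next_legs)
```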


In [120]:
Final_dest=Final_dest[['source','source_id','dest','dest_id']]

In [121]:
Final_dest.columns=['layover2','layover_id2','dest','dest_id']

In [122]:
Final_dest.head()

Unnamed: 0,layover2,layover_id2,dest,dest_id
5,DME,4029.0,KZN,2990.0
6,DME,4029.0,NBC,6969.0
7,DME,4029.0,TGK,
8,DME,4029.0,UUA,6160.0
69,LIM,2789.0,AYP,2786.0
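
Assigning a list to `.columns` relies on the columns being in exactly the expected order. As a hedged alternative, the sketch below selects the columns with `.copy()` and renames them through an explicit mapping. The one-row `toy` frame is made up for illustration.

```python
import pandas as pd

# Hypothetical one-row stand-in for the filtered routes table
toy = pd.DataFrame({'source': ['DME'], 'source_id': [4029.0],
                    'dest': ['KZN'], 'dest_id': [2990.0], 'stops': [0]})

# .copy() makes the subset an independent frame, so later edits do not
# trigger pandas' SettingWithCopyWarning
toy_legs = toy[['source', 'source_id', 'dest', 'dest_id']].copy()

# rename() maps old names to new ones; columns not listed keep their names
toy_legs = toy_legs.rename(columns={'source': 'layover2',
                                    'source_id': 'layover_id2'})
print(list(toy_legs.columns))
```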


In [124]:
two_trsfr_routes= one_trsfr_allroutes [['source_source','source_id','layover','layover_id','dest','dest_id']]
two_trsfr_routes.columns=['source','source_id','layover','layover_id', 'layover2', 'layover_id2']
two_trsfr_routes

Unnamed: 0,source,source_id,layover,layover_id,layover2,layover_id2
0,LEX,4017.0,ORD,3830.0,BRL,5726
1,LEX,4017.0,ORD,3830.0,DEC,4042
2,LEX,4017.0,ATL,3682.0,LWB,6958
3,LEX,4017.0,ATL,3682.0,MCN,3754
4,LEX,4017.0,ATL,3682.0,MEI,4335
5,LEX,4017.0,ATL,3682.0,MSL,5756
6,LEX,4017.0,ATL,3682.0,PIB,5759
7,LEX,4017.0,ATL,3682.0,TUP,5773
8,LEX,4017.0,FLL,3533.0,BIM,1937
9,LEX,4017.0,FLL,3533.0,ELH,1943


In [125]:
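# Note: DataFrame.join(on=...) matches the left 'layover2' column against the
# *index* of Final_dest, not its 'layover2' column, so nothing matches here and
# the right join returns the Final_dest rows with empty left-hand columns
two_trsfr=two_trsfr_routes.join(Final_dest,how='right',on='layover2',lsuffix='two_trsfr_routes',rsuffix='Final_dest')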
two_trsfr.head()

Unnamed: 0,layover2,source,source_id,layover,layover_id,layover2two_trsfr_routes,layover_id2two_trsfr_routes,layover2Final_dest,layover_id2Final_dest,dest,dest_id
3557,5,,,,,,,DME,4029.0,KZN,2990.0
3557,6,,,,,,,DME,4029.0,NBC,6969.0
3557,7,,,,,,,DME,4029.0,TGK,
3557,8,,,,,,,DME,4029.0,UUA,6160.0
3557,69,,,,,,,LIM,2789.0,AYP,2786.0
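
The empty left-hand columns in the output above are what we would expect if the join found no matching keys. `DataFrame.join(on=...)` compares the named column of the calling frame with the index of the other frame, whereas `pd.merge(on=...)` compares two columns. The sketch below shows both ways of matching on the airport code, using made-up frames (`left`, `right`, and the destination codes `AAA`, `BBB`, `CCC` are placeholders):

```python
import pandas as pd

# Hypothetical two-row version of the routes found so far
left = pd.DataFrame({'source': ['LEX', 'LEX'],
                     'layover': ['ATL', 'FLL'],
                     'layover2': ['MCN', 'BIM']})

# Hypothetical onward legs, keyed by the airport they depart from
right = pd.DataFrame({'layover2': ['MCN', 'MCN', 'BIM'],
                      'dest': ['AAA', 'BBB', 'CCC']})

# Option 1: merge() compares the two 'layover2' columns directly
merged = pd.merge(left, right, on='layover2', how='inner')
print(merged)

# Option 2: join() matches left['layover2'] against the other frame's index,
# so the key has to be moved into the index first
joined = left.join(right.set_index('layover2'), on='layover2', how='inner')
print(joined)
```

Either option keeps the `source` and `layover` columns filled in, so the full itinerary stays attached to each final destination.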


In [126]:
two_trsfr=two_trsfr[['layover2Final_dest','layover_id2Final_dest','dest','dest_id']]

In [127]:
two_trsfr['source']='LEX'
two_trsfr['source_id']='4017.0'

In [128]:
two_trsfr.columns=['layover2', 'layover2_id','dest', 'dest_id', 'source','source_id']
two_trsfr

Unnamed: 0,layover2,layover2_id,dest,dest_id,source,source_id
3557,DME,4029.0,KZN,2990.0,LEX,4017.0
3557,DME,4029.0,NBC,6969.0,LEX,4017.0
3557,DME,4029.0,TGK,,LEX,4017.0
3557,DME,4029.0,UUA,6160.0,LEX,4017.0
3557,LIM,2789.0,AYP,2786.0,LEX,4017.0
3557,LIM,2789.0,CUZ,2812.0,LEX,4017.0
3557,LIM,2789.0,HUU,6067.0,LEX,4017.0
3557,LIM,2789.0,PCL,2781.0,LEX,4017.0
3557,LIM,2789.0,TPP,2806.0,LEX,4017.0
3557,BOG,2709.0,GYE,2673.0,LEX,4017.0


In [129]:
# Join the destination airports.  Both tables have a 'source' column, so the
# suffixes option renames them: source_source is the route origin (LEX) and
# source_dest is the airport record's data source (e.g. OurAirports)
two_trsfr_allroutes = pd.merge(two_trsfr, airports, 
                      left_on='dest_id', 
                      right_on='id', 
                      how='left', 
                      suffixes=['_source','_dest'])
two_trsfr_allroutes

Unnamed: 0,layover2,layover2_id,dest,dest_id,source_source,source_id,id,name,city,country,iata,icao,latitude,longitude,altitude,timezone,dst,tz,type,source_dest
0,DME,4029.0,KZN,2990,LEX,4017.0,2990.0,Kazan International Airport,Kazan,Russia,KZN,UWKD,55.606201,49.278702,411.0,3.0,N,Europe/Moscow,airport,OurAirports
1,DME,4029.0,NBC,6969,LEX,4017.0,6969.0,Begishevo Airport,Nizhnekamsk,Russia,NBC,UWKE,55.564701,52.092499,643.0,3.0,N,Europe/Moscow,airport,OurAirports
2,DME,4029.0,TGK,,LEX,4017.0,,,,,,,,,,,,,,
3,DME,4029.0,UUA,6160,LEX,4017.0,6160.0,Bugulma Airport,Bugulma,Russia,UUA,UWKB,54.639999,52.801701,991.0,3.0,N,Europe/Moscow,airport,OurAirports
4,LIM,2789.0,AYP,2786,LEX,4017.0,2786.0,Coronel FAP Alfredo Mendivil Duarte Airport,Ayacucho,Peru,AYP,SPHO,-13.154800,-74.204399,8917.0,-5.0,U,America/Lima,airport,OurAirports
5,LIM,2789.0,CUZ,2812,LEX,4017.0,2812.0,Alejandro Velasco Astete International Airport,Cuzco,Peru,CUZ,SPZO,-13.535700,-71.938797,10860.0,-5.0,U,America/Lima,airport,OurAirports
6,LIM,2789.0,HUU,6067,LEX,4017.0,6067.0,Alferez Fap David Figueroa Fernandini Airport,Huánuco,Peru,HUU,SPNC,-9.878810,-76.204803,6070.0,-5.0,U,America/Lima,airport,OurAirports
7,LIM,2789.0,PCL,2781,LEX,4017.0,2781.0,Cap FAP David Abenzur Rengifo International Ai...,Pucallpa,Peru,PCL,SPCL,-8.377940,-74.574303,513.0,-5.0,U,America/Lima,airport,OurAirports
8,LIM,2789.0,TPP,2806,LEX,4017.0,2806.0,Cadete FAP Guillermo Del Castillo Paredes Airport,Tarapoto,Peru,TPP,SPST,-6.508740,-76.373199,869.0,-5.0,U,America/Lima,airport,OurAirports
9,BOG,2709.0,GYE,2673,LEX,4017.0,2673.0,José Joaquín de Olmedo International Airport,Guayaquil,Ecuador,GYE,SEGU,-2.157420,-79.883598,19.0,-5.0,U,America/Guayaquil,airport,OurAirports
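
Note that `suffixes` only touches column names that appear in both tables; every other column keeps its name. A minimal sketch with made-up one-row tables (`toy_routes` and `toy_airports` are illustrative) shows which columns get renamed:

```python
import pandas as pd

# Hypothetical one-row tables; both contain a column called 'source'
toy_routes = pd.DataFrame({'source': ['LEX'], 'dest': ['KZN'],
                           'dest_id': [2990.0]})
toy_airports = pd.DataFrame({'id': [2990.0], 'city': ['Kazan'],
                             'source': ['OurAirports']})

toy_merged = pd.merge(toy_routes, toy_airports,
                      left_on='dest_id', right_on='id', how='left',
                      suffixes=('_source', '_dest'))

# Only the overlapping 'source' column receives the suffixes
print(list(toy_merged.columns))
```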


In [131]:
two_trsfr_allroutes.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 26262 entries, 0 to 26261
Data columns (total 20 columns):
layover2         26262 non-null object
layover2_id      26262 non-null float64
dest             26262 non-null object
dest_id          26232 non-null object
source_source    26262 non-null object
source_id        26262 non-null object
id               26097 non-null float64
name             26097 non-null object
city             26097 non-null object
country          26097 non-null object
iata             26097 non-null object
icao             26097 non-null object
latitude         26097 non-null float64
longitude        26097 non-null float64
altitude         26097 non-null float64
timezone         26097 non-null float64
dst              26097 non-null object
tz               26086 non-null object
type             26097 non-null object
source_dest      26097 non-null object
dtypes: float64(6), object(14)
memory usage: 4.2+ MB


In [130]:
# create a basic map, centered on Lexington
lex_air = folium.Map(
    location=[38.034,-84.500],
    tiles='Stamen Toner',
    zoom_start=4
)


       

In [133]:
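# Define some empty sets
airport_set = set()
route_set = set()

# Add one marker per destination airport (assumed intent), skipping duplicates
for name, row in two_trsfr_allroutes.iterrows():

    # Rows with no matching airport record have no coordinates to plot
    if pd.isnull(row['latitude']) or pd.isnull(row['longitude']):
        continue

    if row['iata'] not in airport_set:
        popup_string = row['city'] + ' (' + row['iata'] + ')'
        # location must be a single [lat, lon] list; two bare numbers would
        # put the longitude in the radius slot and raise a TypeError
        marker = folium.CircleMarker(location=[row['latitude'], row['longitude']],
                                     color='DarkCyan',
                                     fill_color='DarkCyan',
                                     radius=5, popup=popup_string)
        marker.add_to(lex_air)
        airport_set.add(row['iata'])

# Save first, then leave the map as the last expression so the notebook displays it
lex_air.save('two transfer.html')
lex_air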
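
With tens of thousands of route rows, it may be simpler to reduce the table to one row per airport before looping, instead of tracking a set by hand. The sketch below does that on a made-up four-row table and builds plain `(location, popup)` pairs. In the notebook, each pair would be handed to a marker. The `toy` frame and the `markers` list are illustrative only.

```python
import pandas as pd

# Hypothetical miniature of the merged table: one duplicate, one unmatched row
toy = pd.DataFrame({
    'iata': ['KZN', 'KZN', 'TGK', 'CUZ'],
    'city': ['Kazan', 'Kazan', None, 'Cuzco'],
    'latitude': [55.606201, 55.606201, None, -13.535700],
    'longitude': [49.278702, 49.278702, None, -71.938797],
})

# Keep rows that have coordinates, then keep one row per airport code
toy_points = (toy.dropna(subset=['latitude', 'longitude'])
                 .drop_duplicates(subset='iata'))

# Each remaining row becomes one ([lat, lon], popup text) pair
markers = [([row.latitude, row.longitude], row.city + ' (' + row.iata + ')')
           for row in toy_points.itertuples()]
print(markers)
```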