# Guided Project
### Visualizing Geographic Data

From scientific fields like meteorology and climatology, through to the software on our smartphones like Google Maps and Facebook check-ins, geographic data is always present in our everyday lives. Raw geographic data like latitudes and longitudes are difficult to understand using the data charts and plots we've discussed so far. To explore this kind of data, you'll need to learn how to visualize the data on maps.

In this mission, we'll explore the fundamentals of geographic coordinate systems and how to work with the basemap library to plot geographic data points on maps. We'll be working with flight data from the [openflights website](http://openflights.org/data.html). Here's a breakdown of the files we'll be working with and the most pertinent columns from each dataset:

* `airlines.csv` - data on each airline.
  * `country` - where the airline is headquartered.
  * `active` - if the airline is still active.
* `airports.csv` - data on each airport.
  * `name` - name of the airport.
  * `city` - city where the airport is located.
  * `country` - country where the airport is located.
  * `iata` - unique airport code.
  * `latitude` - latitude value.
  * `longitude` - longitude value.

* `routes.csv` - data on each flight route.
  * `airline` - airline for the route.
  * `source` - code of the starting airport for the route.
  * `dest` - code of the destination airport for the route.
  
We can explore a range of interesting questions and ideas using these datasets:
* **For each airport, which destination airport is the most common?**
* **Which cities are the most important hubs for airports and airlines?**

Before diving into coordinate systems, explore the datasets in the code cell below.

* Read in the 3 CSV files into 3 separate dataframe objects - `airlines`, `airports`, and `routes`.
* Use the `DataFrame.iloc[]` method to return the first row in each dataframe as a neat table.
* Display the first rows for all dataframes using the `print()` function. Try to answer the following questions:
  * What's the best way to link the data from these 3 different datasets together?
  * What are the formats of the latitude and longitude values?

In [1]:
import sys
import pathlib


In [2]:
pathlib

<module 'pathlib' from '/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pathlib.py'>

In [3]:
sys.path

['',
 '/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python36.zip',
 '/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6',
 '/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/lib-dynload',
 '/Users/choigww/Library/Python/3.6/lib/python/site-packages',
 '/usr/local/lib/python3.6/site-packages',
 '/usr/local/lib/python3.6/site-packages/six-1.11.0-py3.6.egg',
 '/usr/local/lib/python3.6/site-packages/IPython/extensions',
 '/Users/choigww/.ipython']

In [None]:
!python

Python 3.6.3 |Anaconda custom (64-bit)| (default, Nov  8 2017, 18:10:31) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 

In [1]:
from mpl_toolkits.basemap import Basemap

ModuleNotFoundError: No module named 'mpl_toolkits.basemap'

In [2]:
!conda install basemap

Solving environment: done

NotWritableError: The current user does not have write permissions to a required path.
  path: /Users/choigww/anaconda/pkgs/bzip2-1.0.6-h649919c_2/info/repodata_record.json
  uid: 501
  gid: 20

If you feel that permissions on this path are set incorrectly, you can manually
change them by executing

  $ sudo chown 501:20 /Users/choigww/anaconda/pkgs/bzip2-1.0.6-h649919c_2/info/repodata_record.json

In general, it's not advisable to use 'sudo conda'.




In [3]:
!sudo chown 501:20 /Users/choigww/anaconda/pkgs/bzip2-1.0.6-h649919c_2/info/repodata_record.json

Password:


In [6]:
import mpl_toolkits

In [7]:
mpl_toolkits.basemap

AttributeError: module 'mpl_toolkits' has no attribute 'basemap'

In [12]:
from mpl_toolkits.basemap import Basemap

ModuleNotFoundError: No module named 'mpl_toolkits.basemap'

In [9]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

In [2]:
airlines = pd.read_csv('data/airlines.csv', encoding="Latin-1")
airports = pd.read_csv('data/airports.csv', encoding="Latin-1")
routes = pd.read_csv('data/routes.csv', encoding='Latin-1')

In [3]:
routes.head()

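|   | airline | airline_id | source | source_id | dest | dest_id | codeshare | stops | equipment |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2B | 410 | AER | 2965 | KZN | 2990 | NaN | 0 | CR2 |
| 1 | 2B | 410 | ASF | 2966 | KZN | 2990 | NaN | 0 | CR2 |
| 2 | 2B | 410 | ASF | 2966 | MRV | 2962 | NaN | 0 | CR2 |
| 3 | 2B | 410 | CEK | 2968 | KZN | 2990 | NaN | 0 | CR2 |
| 4 | 2B | 410 | CEK | 2968 | OVB | 4078 | NaN | 0 | CR2 |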


In [4]:
airports.head()

|   | id | name | city | country | iata | icao | latitude | longitude | altitude | offset | dst | timezone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Goroka Airport | Goroka | Papua New Guinea | GKA | AYGA | -6.08169 | 145.391998 | 5282 | 10 | U | Pacific/Port_Moresby |
| 1 | 2 | Madang Airport | Madang | Papua New Guinea | MAG | AYMD | -5.20708 | 145.789001 | 20 | 10 | U | Pacific/Port_Moresby |
| 2 | 3 | Mount Hagen Kagamuga Airport | Mount Hagen | Papua New Guinea | HGU | AYMH | -5.82679 | 144.296005 | 5388 | 10 | U | Pacific/Port_Moresby |
| 3 | 4 | Nadzab Airport | Nadzab | Papua New Guinea | LAE | AYNZ | -6.569803 | 146.725977 | 239 | 10 | U | Pacific/Port_Moresby |
| 4 | 5 | Port Moresby Jacksons International Airport | Port Moresby | Papua New Guinea | POM | AYPY | -9.44338 | 147.220001 | 146 | 10 | U | Pacific/Port_Moresby |


In [5]:
airlines.head()

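|   | id | name | alias | iata | icao | callsign | country | active |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Private flight | \N | - | NaN | NaN | NaN | Y |
| 1 | 2 | 135 Airways | \N | NaN | GNL | GENERAL | United States | N |
| 2 | 3 | 1Time Airline | \N | 1T | RNX | NEXTIME | South Africa | Y |
| 3 | 4 | 2 Sqn No 1 Elementary Flying Training School | \N | NaN | WYT | NaN | United Kingdom | N |
| 4 | 5 | 213 Flight Unit | \N | NaN | TFU | NaN | Russia | N |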



In [6]:
print(airlines.iloc[0])
print(airports.iloc[0])
print(routes.iloc[0])

id                       1
name        Private flight
alias                   \N
iata                     -
icao                   NaN
callsign               NaN
country                NaN
active                   Y
Name: 0, dtype: object
id                              1
name               Goroka Airport
city                       Goroka
country          Papua New Guinea
iata                          GKA
icao                         AYGA
latitude                 -6.08169
longitude                 145.392
altitude                     5282
offset                         10
dst                             U
timezone     Pacific/Port_Moresby
Name: 0, dtype: object
airline         2B
airline_id     410
source         AER
source_id     2965
dest           KZN
dest_id       2990
codeshare      NaN
stops            0
equipment      CR2
Name: 0, dtype: object
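One possible answer to the linking question: the `source` and `dest` columns in `routes` hold the same kind of three-letter code as the `iata` column in `airports`, so the two dataframes can be joined on those codes. The sketch below is only an illustration under stated assumptions. The airport rows are copied from the output above, but the two routes and the airline code `XX` are invented for the example, and the `source_lat`, `source_lon`, `dest_lat`, and `dest_lon` column names are arbitrary choices.

```python
import pandas as pd

# Airport rows copied from the airports.head() output above.
airports = pd.DataFrame({
    "iata": ["GKA", "MAG", "POM"],
    "name": ["Goroka Airport", "Madang Airport",
             "Port Moresby Jacksons International Airport"],
    "latitude": [-6.08169, -5.20708, -9.44338],
    "longitude": [145.391998, 145.789001, 147.220001],
})

# Hypothetical routes, invented only to illustrate the join.
routes = pd.DataFrame({
    "airline": ["XX", "XX"],
    "source": ["GKA", "MAG"],
    "dest": ["POM", "POM"],
})

coords = airports[["iata", "latitude", "longitude"]]

# Look up the coordinates of each route's starting airport.
source_coords = coords.rename(columns={
    "iata": "source", "latitude": "source_lat", "longitude": "source_lon"})
linked = routes.merge(source_coords, on="source", how="left")

# Look up the coordinates of each route's destination airport.
dest_coords = coords.rename(columns={
    "iata": "dest", "latitude": "dest_lat", "longitude": "dest_lon"})
linked = linked.merge(dest_coords, on="dest", how="left")

print(linked)
```

A left join keeps every route even when its airport code has no match in `airports`; the missing coordinates then show up as `NaN`, which makes unmatched codes easy to spot.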


A geographic coordinate system allows us to locate any point on Earth using latitude and longitude coordinates.

![latlon](https://s3.amazonaws.com/dq-content/latitude_longitude.png)
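The latitude and longitude values printed earlier appear to be decimal degrees. In that format latitude runs from -90 (south pole) to 90 (north pole) and longitude from -180 to 180, with negative values west of the prime meridian. As a small sanity check, a helper like the following (the function name is hypothetical) can flag swapped or malformed pairs:

```python
def is_valid_coordinate(latitude, longitude):
    """Return True if the pair is a plausible decimal-degree coordinate."""
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0


# Goroka Airport, from the first row of the airports dataframe.
print(is_valid_coordinate(-6.08169, 145.391998))

# Swapping the two values pushes "latitude" outside its valid range.
print(is_valid_coordinate(145.391998, -6.08169))
```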

### Map Projection: 3D sphere (latitude, longitude) ---> 2D plane (x, y)
In most cases, we want to visualize latitude and longitude points on two-dimensional maps. Two-dimensional maps are faster to render, easier to view and distribute on a computer, and closer to the experience of popular mapping software like Google Maps. Latitude and longitude values describe points on a sphere, which is three-dimensional. To plot the values on a two-dimensional plane, we need to convert the coordinates to the Cartesian coordinate system using a **map projection**.

* A [map projection](https://en.wikipedia.org/wiki/Map_projection) transforms points on a sphere to a two-dimensional plane. Projecting onto a two-dimensional plane distorts some properties. Each map projection makes trade-offs in which properties to preserve, and the Wikipedia article describes the different [trade-offs](https://en.wikipedia.org/wiki/Map_projection#Metric_properties_of_maps). We'll use the [Mercator projection](https://en.wikipedia.org/wiki/Mercator_projection), because it is commonly used by popular mapping software.
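To make the idea of a projection concrete, here is a minimal sketch of the textbook spherical Mercator formulas, x = R·λ and y = R·ln(tan(π/4 + φ/2)), where λ is the longitude and φ the latitude in radians. This is not how basemap computes its coordinates (a library may use an ellipsoidal model and shift the origin), so the numbers need not match its output. The radius value is an assumption chosen for illustration.

```python
import math

# Assumption: a spherical Earth with the WGS84 equatorial radius, in metres.
EARTH_RADIUS_M = 6378137.0


def mercator(latitude, longitude, radius=EARTH_RADIUS_M):
    """Project decimal-degree coordinates to spherical Mercator (x, y)."""
    lam = math.radians(longitude)
    phi = math.radians(latitude)
    x = radius * lam
    # y grows without bound as latitude approaches +/-90 degrees,
    # which is why Mercator maps are cut off before the poles.
    y = radius * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


# Goroka Airport, from the first row of the airports dataframe.
print(mercator(-6.08169, 145.391998))
```

The stretching of `y` near the poles is the trade-off this projection makes: shapes and angles are preserved locally, while areas far from the equator are exaggerated.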

### basemap toolkit
Before we convert our flight data to Cartesian coordinates and plot it, let's learn more about the basemap toolkit. Basemap is an extension to Matplotlib that makes it easier to work with geographic data. The documentation for basemap provides a good high-level overview of what the library does:

* The matplotlib basemap toolkit is a library for plotting 2D data on maps in Python. Basemap does not do any plotting on its own, but provides the facilities to transform coordinates to one of 25 different map projections.

Basemap makes it easy to convert from the spherical coordinate system (latitudes & longitudes) to the Mercator projection. While basemap uses Matplotlib to actually draw and control the map, the library provides many methods that enable us to work with maps quickly. Before we dive into how basemap works, let's get familiar with how to install it.

The easiest way to install basemap is through Anaconda. If you're new to Anaconda, we recommend checking out our [Python and Pandas installation project](https://www.dataquest.io/mission/118/project-python-and-pandas-installation):

```bash
conda install basemap
```

Because basemap draws with Matplotlib, you'll want to import `matplotlib.pyplot` alongside `Basemap` whenever you use it.

```python
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
```
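If the import fails with `ModuleNotFoundError`, one thing worth checking is whether the notebook kernel and the `conda` command refer to the same Python installation. In the cells earlier in this notebook, `sys.path` points at a Homebrew Python 3.6.2 while `!python` reports Anaconda Python 3.6.3, which may be why the module is not found. The following sketch uses only the standard library; the function name `basemap_available` is hypothetical.

```python
import importlib.util
import sys


def basemap_available():
    """Return True if mpl_toolkits.basemap can be found by this interpreter."""
    try:
        return importlib.util.find_spec("mpl_toolkits.basemap") is not None
    except ModuleNotFoundError:
        # Raised when the parent package mpl_toolkits itself is missing.
        return False


# The interpreter running this code; packages must be installed for this one.
print("Interpreter:", sys.executable)

if basemap_available():
    print("basemap is importable from this interpreter")
else:
    print("basemap was not found for this interpreter")
```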




### basemap toolkit basics
Here's what the general workflow will look like when working with two-dimensional maps:

* Create a new basemap instance with the specific map projection we want to use and how much of the map we want included.
* Convert spherical coordinates to Cartesian coordinates using the basemap instance.
* Use the matplotlib and basemap methods to customize the map.
* Display the map.

Let's focus on the first step and create a new basemap instance. To create a new instance of the basemap class, we call the [basemap constructor](http://matplotlib.org/basemap/api/basemap_api.html#mpl_toolkits.basemap.Basemap) and pass in values for the required parameters:

* `projection`: the map projection.
* `llcrnrlat`: latitude of the lower left-hand corner of the desired map domain.
* `urcrnrlat`: latitude of the upper right-hand corner of the desired map domain.
* `llcrnrlon`: longitude of the lower left-hand corner of the desired map domain.
* `urcrnrlon`: longitude of the upper right-hand corner of the desired map domain.
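Putting the workflow and the constructor parameters together, a minimal sketch might look like the following. It assumes basemap is installed; when it is not, the function prints a message and returns `None` instead of raising. The latitude bounds of -80 and 80 are an illustrative choice, made because the Mercator projection cannot represent the poles. The names `world_mercator_kwargs` and `plot_points` are hypothetical.

```python
def world_mercator_kwargs():
    """Constructor arguments for a near-global Mercator map (illustrative bounds)."""
    return dict(
        projection="merc",
        llcrnrlat=-80,   # lower left corner latitude
        urcrnrlat=80,    # upper right corner latitude
        llcrnrlon=-180,  # lower left corner longitude
        urcrnrlon=180,   # upper right corner longitude
    )


def plot_points(longitudes, latitudes):
    """Plot points on a Mercator map, or return None if basemap is missing."""
    try:
        import matplotlib.pyplot as plt
        from mpl_toolkits.basemap import Basemap
    except ImportError:
        print("matplotlib or basemap is not installed; skipping the plot")
        return None

    # Step 1: create the basemap instance.
    m = Basemap(**world_mercator_kwargs())
    # Step 2: convert spherical coordinates to Cartesian map coordinates.
    x, y = m(longitudes, latitudes)
    # Step 3: customize the map with basemap methods.
    m.scatter(x, y, s=5)
    m.drawcoastlines()
    # Step 4: display the map.
    plt.show()
    return m
```

With the dataframes loaded earlier, a call could look like `plot_points(airports["longitude"].tolist(), airports["latitude"].tolist())`. Note that the basemap instance is called with longitudes first and latitudes second.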

In [9]:
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap


ModuleNotFoundError: No module named 'mpl_toolkits.basemap'