# Basic analysis and plotting of geodata with Python
In this hands-on lesson, we will use this notebook as a guide to explore, analyze and plot the Airports dataset from Natural Earth. At each step, we will write code together to achieve the following goals:

1. Read and explore the Airports dataset
2. Group airports by country and report the top 10 countries with the most airports
3. Input a 2-letter country code and return a list of airports in it
4. Answer the question, "How many countries have only one airport?"
5. Report the proportion of airports in the Southern vs. Northern hemisphere
6. Find the furthest south and north airports by latitude
7. Create a custom "hemisphere" column based on a conditional statement
8. Plot all airports and per hemisphere

## Step 1: Read and explore the Airports dataset
Our first step is to import the libraries we'll need to read in and explore the airports dataset. Press Shift+Enter to run the cell below and import them.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import matplotlib.colors

Next, let's read in the airports_edit.csv file and store it as a DataFrame called "df". Pandas has a built-in function for reading csv files called read_csv(). Fill in the blank to read in the dataset and store it as df.

In [None]:
df = pd.read_csv()
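
As a rough sketch of how read_csv() behaves, the cell below reads a tiny, made-up CSV from memory instead of from disk. The airport names and the exact column set are assumptions for illustration only; check them against the real airports_edit.csv.

```python
import io
import pandas as pd

# Invented stand-in for airports_edit.csv; the real file's columns may differ.
csv_text = """name,country,latitude,longitude
North Field,NL,52.3,4.8
Canal Strip,NL,51.9,4.4
Harbour Airfield,AU,-33.9,151.2
"""

# read_csv accepts a file path or any file-like object.
sample_df = pd.read_csv(io.StringIO(csv_text))
print(sample_df)
```

With the real file, the only change is passing its path as a string in place of the in-memory buffer.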

Pandas offers several tools for exploring a dataset, such as .head(), .shape and .info(). Attached to the df DataFrame we just created, they give us quick information about its size, attributes and contents. Note that .shape is an attribute, so it is written without parentheses, while .head() and .info() are methods. Run the df.head() cell below, then try writing commands for .shape and .info(). Hint: if you put both in one cell, wrap df.shape in a print statement, because a notebook only displays the value of the last line in a cell.

In [None]:
df.head()
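
Here is a small sketch of the three exploration tools on an invented DataFrame (the rows are placeholders, not real airports). It shows shape accessed as an attribute and info() called as a method.

```python
import pandas as pd

# Invented rows for illustration only.
sample_df = pd.DataFrame({
    "name": ["North Field", "Canal Strip", "Harbour Airfield"],
    "country": ["NL", "NL", "AU"],
    "latitude": [52.3, 51.9, -33.9],
    "longitude": [4.8, 4.4, 151.2],
})

first_rows = sample_df.head(2)   # first two rows as a new DataFrame
print(first_rows)
print(sample_df.shape)           # (rows, columns) tuple; no parentheses after shape
sample_df.info()                 # prints column names, dtypes and non-null counts
```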

## Step 2: Group airports by country and report the top 10 countries with the most airports
The .value_counts() method can be used to count the number of airports per country. Alternatively, the groupby() method can group the data by country and count the number of unique airport name values associated with each country.

Because pandas objects have a plot() method built on Matplotlib, it is worth storing the grouped result in a variable so we can plot it directly later. Using the .describe() method, show some statistics about the airports after they are divided by country. For example, what is the mean number of airports per country? What is the most? What is the least?

Start by adding the value_counts() method to the country column from df in the cell below. Press Shift+Enter to run it and see what happens. Then try creating a new groupby object that sorts countries by number of airports, from most to least.

In [None]:
df['country'].

In [None]:
groupby_country = 
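
The two counting approaches described above can be sketched on invented data as follows. The column names "country" and "name" are assumed to match the real dataset; the rows are placeholders.

```python
import pandas as pd

# Invented rows: three airports in NL, two in AU, one in BR.
sample_df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "country": ["NL", "NL", "NL", "AU", "AU", "BR"],
})

# Option 1: count rows per country (already sorted from most to least).
counts = sample_df["country"].value_counts()

# Option 2: group by country, count unique airport names, then sort.
grouped = (
    sample_df.groupby("country")["name"]
    .nunique()
    .sort_values(ascending=False)
)

print(counts)
print(grouped)
print(grouped.describe())  # mean, min and max airports per country
```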

Create a bar plot of airports by country using Matplotlib. Because there are so many countries that the full plot would be hard to read, restrict your report to the top 10 by specifying rows 0:10 (the end of a slice is excluded, so this selects rows 0 through 9).

In [None]:
# create bar plot here
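
One possible way to plot a top-10 slice is sketched below with invented counts for twelve hypothetical country codes. The Agg backend line is only there so the sketch also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import pandas as pd

# Invented counts, already sorted from most to least.
counts = pd.Series(range(12, 0, -1), index=[f"C{i}" for i in range(12)])

top10 = counts.iloc[0:10]      # slice end is excluded, so this keeps 10 rows
ax = top10.plot(kind="bar")    # pandas draws the bars with Matplotlib
ax.set_xlabel("country")
ax.set_ylabel("number of airports")
```

In a notebook the figure renders inline after the cell runs; in a script you would call plt.show() or save the figure instead.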

## Step 3: Input a 2-letter country code and return a list of airports in it
There are many ways to query a DataFrame. Use the .loc method to find and show just those airports that are in the Netherlands (country code = "NL"). Try it with other countries too - this statement should be easily modifiable to call any country on earth and see the list of airports it has.

In [None]:
df.loc[]
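
A minimal sketch of a .loc query with a boolean condition, using invented rows. The helper name airports_in is hypothetical and only wraps the query so the country code is easy to swap.

```python
import pandas as pd

# Invented rows for illustration only.
sample_df = pd.DataFrame({
    "name": ["North Field", "Canal Strip", "Harbour Airfield"],
    "country": ["NL", "NL", "AU"],
})

def airports_in(frame, code):
    """Return the airport names whose country column equals the given code."""
    return frame.loc[frame["country"] == code, "name"].tolist()

print(airports_in(sample_df, "NL"))
print(airports_in(sample_df, "AU"))
```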

## Step 4: Answer the question, "How many countries have only one airport?"
Create a new dataframe where airports are grouped by country and use the .loc method to request just the rows where counts is equal to 1. Write a print statement that reports how many countries have only one airport. There is some code to get you started...

In [None]:
ap_per_country = df.groupby()
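
One way this could look on invented data is sketched below. The column name "counts" follows the wording above; how you name it in your own solution is up to you.

```python
import pandas as pd

# Invented rows: NL has two airports, AU and BR have one each.
sample_df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "country": ["NL", "NL", "AU", "BR"],
})

# Count airports per country and keep the result as a DataFrame.
ap_per_country = (
    sample_df.groupby("country")["name"]
    .count()
    .reset_index(name="counts")
)

# Keep only the rows where the count equals 1.
single = ap_per_country.loc[ap_per_country["counts"] == 1]
print(len(single), "countries have only one airport.")
```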

## Step 5: Report the proportion of airports in the Southern vs. Northern hemisphere
We can divide the airports dataset by latitude to see the proportion of airports that are located in the two hemispheres of the world. Southern locations are those with a negative latitude, and northern locations are those with a positive latitude. Then, we can store the number of airports in each sub-dataset using the len() function, and create a quick calculation of percentage to include in a print statement that reports the approximate percentage of airports in each hemisphere, respectively. 

The southern hemisphere dataset has already been created. Create the northern hemisphere dataset in the same way.

In [None]:
sh_df = df[df['latitude'] < 0]
sh_df.head()

In [None]:
print("Approximately",(),"percent of the world's airports are located in the Southern hemisphere.")
print("Approximately",(),"percent of the world's airports are located in the Northern hemisphere.")
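
The percentage calculation can be sketched on four invented latitudes like this. The variable names sh_pct and nh_pct are made up for the example.

```python
import pandas as pd

# Invented latitudes: one southern airport, three northern.
sample_df = pd.DataFrame({"latitude": [-33.9, 52.3, 51.9, 40.6]})

sh = sample_df[sample_df["latitude"] < 0]   # southern subset
nh = sample_df[sample_df["latitude"] > 0]   # northern subset

# Share of each subset, rounded to a whole percent.
sh_pct = round(100 * len(sh) / len(sample_df))
nh_pct = round(100 * len(nh) / len(sample_df))

print("Approximately", sh_pct, "percent are in the Southern hemisphere.")
print("Approximately", nh_pct, "percent are in the Northern hemisphere.")
```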

## Step 6: Find the furthest south and north airports by latitude

In [None]:
df.sort_values()
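
As a sketch of one approach, sorting by latitude puts the southernmost airport in the first row and the northernmost in the last. The rows below are invented placeholders.

```python
import pandas as pd

# Invented rows for illustration only.
sample_df = pd.DataFrame({
    "name": ["Mid Field", "Polar Strip", "Cape Airfield"],
    "latitude": [10.0, 78.2, -54.8],
})

by_lat = sample_df.sort_values("latitude")  # ascending: south to north
southernmost = by_lat.iloc[0]               # first row after sorting
northernmost = by_lat.iloc[-1]              # last row after sorting

print("Furthest south:", southernmost["name"], southernmost["latitude"])
print("Furthest north:", northernmost["name"], northernmost["latitude"])
```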

## Step 7: Create a custom column "hemisphere" based on a conditional statement
Create a new column called hemisphere in the df DataFrame. If latitude is less than 0 it should assign the label "southern"; otherwise it should assign "northern".

To accomplish this, use numpy’s built-in where() function. This function takes three arguments in sequence: the condition we’re testing for, the value to assign to our new column if that condition is true, and the value to assign if it is false.

Create a barplot showing the number of airports per hemisphere.

In [None]:
df['hemisphere'] = np.where()
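
The three arguments of np.where() can be seen in this small sketch on invented latitudes. Note that with a single condition, a latitude of exactly 0 falls into the "false" branch.

```python
import numpy as np
import pandas as pd

# Invented latitudes, including one exactly on the equator.
sample_df = pd.DataFrame({"latitude": [-33.9, 52.3, 0.0, -1.5]})

# np.where(condition, value_if_true, value_if_false)
sample_df["hemisphere"] = np.where(
    sample_df["latitude"] < 0, "southern", "northern"
)

print(sample_df)
print(sample_df["hemisphere"].value_counts())  # counts you could feed to a bar plot
```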

In [None]:
# create barplot here

## Step 8: Plot all airports and per hemisphere
Using longitude as the x axis and latitude as the y axis, we can plot the airports in their approximate geographic location. Make a scatterplot from the df dataframe and use plt.show() to display it.

In [None]:
df.plot(kind="scatter", x=" ", y=" ", alpha=0.4, figsize=(15,7))
plt.show()

Make a scatterplot from sh_df Dataframe that plots just the airports in the southern hemisphere. Add the ylim() specification to make sure you plot the whole world and it doesn't get distorted. You can also try it with the northern hemisphere.

In [None]:
sh_df.plot()
plt.ylim(-60, 80)
plt.show()

Add some color to the world by mapping the color of the points on the scatterplot to their latitude. 

In [None]:
df.plot(kind="scatter", x="longitude", y="latitude",
    c=" ", cmap=plt.get_cmap("jet"),
    colorbar=True, alpha=0.4, figsize=(15,7),
)
plt.show()
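
For reference, here is a hedged sketch of a latitude-coloured scatter plot on a handful of invented points. The fixed y-limits of -90 to 90 are an assumption chosen to cover every possible latitude, and the Agg backend line is only needed outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import pandas as pd

# Invented coordinates for illustration only.
sample_df = pd.DataFrame({
    "longitude": [4.8, 151.2, -58.5, 15.5],
    "latitude": [52.3, -33.9, -34.8, 78.2],
})

ax = sample_df.plot(
    kind="scatter", x="longitude", y="latitude",
    c="latitude", cmap="jet",      # colour each point by its latitude
    colorbar=True, alpha=0.4, figsize=(15, 7),
)
ax.set_ylim(-90, 90)               # keep the full latitude range in view
```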

## That's it! Nice job :)
Keep going and learning more with pandas, numpy and matplotlib! 