# GeoPandas

In chapter 1 we learnt:
* How to use **NumPy** (a package for scientific computing, part of the SciPy ecosystem, which we haven't covered yet)
* How to use **Pandas** (package providing data structures and data analysis tools)
* How to use **Matplotlib** (a 2D plotting library)
* How to load a CSV file into a pandas.DataFrame object

Now, we will see:
* How to plot geospatial data using **GeoPandas**
* How to load a JSON file into a geopandas.GeoDataFrame object
* How to use **Shapely**, a package providing geometric objects and operations
* How to use **missingno** to visualize missing data

# Load & Downloads

To visualize geographic data, you need two things:
* The original map that defines countries, roads, rivers, ...
* Your own data to display over it

Download steps:

* Download the **US States** map > 5m > GeoJSON file from https://eric.clst.org/tech/usgeojson/. The file should be named `gz_2010_us_040_00_5m.json`
* Download the **Florence Hurricane points** into a `florence.csv` file: http://flhurricane.com/cyclone/stormhistory.php?p=1&year=2018&storm=6   
Tip: if you are on Linux or macOS, you can use `wget` to download the file from its link:   
```
wget "http://flhurricane.com/cyclone/stormhistory.php?p=1&year=2018&storm=6"
```
Then rename the file to `florence.csv`
```
mv stormhistory.php\?p\=1\&year\=2018\&storm\=6 florence.csv
```
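
If `wget` is not available (for example on Windows), the same download can be sketched in Python with the standard library. The helper name `download` is hypothetical, and this assumes the server answers a plain request the same way it answers `wget`:

```python
from pathlib import Path
from urllib.request import urlretrieve


def download(url, destination):
    """Fetch `url`, save it as `destination`, and return the saved path."""
    path, _headers = urlretrieve(url, destination)
    return Path(path)


# Hypothetical usage (requires network access):
# download(
#     "http://flhurricane.com/cyclone/stormhistory.php?p=1&year=2018&storm=6",
#     "florence.csv",
# )
```

Saving directly under the final name avoids the separate rename step.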

Move both files to this folder. Then:
### If you use Pipenv
* Run `pipenv install` at the root of the repository. A few packages have been added to the Pipfile since last time.

### If you use Anaconda
* Run "Anaconda Prompt" with Administrator Rights.
* Add the conda-forge channel by typing `conda config --append channels conda-forge`
* Then, type `conda install geopandas shapely missingno descartes -c conda-forge`

In [None]:
# Load packages
import geopandas
import shapely

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
import missingno


%matplotlib inline

# Plotting the US Map with GeoJSON data

Let's load our JSON file with `geopandas`.

**`GeoPandas`** is an open source project to make working with geospatial data in Python easier.  
GeoPandas extends the datatypes used by `pandas` to allow spatial operations on geometric types.  
Geometric operations are performed by `shapely`.  
GeoPandas further depends on `fiona` for file access and on `descartes` and `matplotlib` for plotting.

In [None]:
country = geopandas.read_file("gz_2010_us_040_00_5m.json")
country.head()

You can see some shapes in the `geometry` column. Each value is a **Shapely** object. It can be a:
* Point
* LineString
* Polygon
* MultiPolygon

Each type suits a different kind of physical object, for example: Point for a building, LineString for a street, Polygon for a city, and MultiPolygon for a territory made of several separate pieces (such as a state with islands). For more information about each geometric object, read the **Shapely** documentation: https://shapely.readthedocs.io/en/stable/manual.html#geometric-objects
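
As a rough illustration of these geometry types, here is a minimal sketch that builds one of each by hand. All coordinates and variable names are made up for the example and have nothing to do with the US map:

```python
from shapely.geometry import LineString, MultiPolygon, Point, Polygon

# Hypothetical toy geometries
building = Point(2, 1)
street = LineString([(0, 0), (4, 0)])
city = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])
island = Polygon([(10, 10), (11, 10), (11, 11), (10, 11)])
territory = MultiPolygon([city, island])  # two separate pieces

print(street.length)            # 4.0
print(city.area)                # 12.0
print(city.contains(building))  # True
print(territory.geom_type)      # MultiPolygon
```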

In [None]:
print(type(country))

`country` is a GeoDataFrame: it is very similar to pandas DataFrame and both objects share a lot of functionalities, like plotting.

In [None]:
country.plot()

Here, the map also includes faraway states, so we will remove them to focus on the contiguous US.

With what we learnt in Chapter 01 - Lesson02, remove "Alaska" and "Hawaii" from the `country` DataFrame.  
Return the result in the `only_us` DataFrame variable.

In [None]:
print(country['NAME'].unique())

# TODO: Exclude Alaska and Hawaii
only_us = country
only_us.plot(figsize=(18, 18));

# Plotting the Hurricane points

In [None]:
florence = pd.read_csv('florence.csv')
florence.head()

Instead of counting the missing values "by hand", we can use the `missingno` package. Its `bar` function plots the number of non-N/A values for each column.

In [None]:
missingno.bar(florence);

You can get a more detailed view, showing which rows contain the missing values in each column, using `matrix`.

In [None]:
missingno.matrix(florence)

There's only one missing value, in the `Forecaster` column.
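
To double-check such a reading without a plot, the count can also be done with plain pandas. This is a minimal sketch on a hypothetical toy DataFrame (the values are invented, not taken from `florence.csv`):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a single missing value
toy = pd.DataFrame({
    "Wind": [35, 40, 45],
    "Forecaster": ["A", np.nan, "B"],
})

# Number of missing values per column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Rows that contain at least one missing value
rows_with_missing = toy[toy.isna().any(axis=1)]
print(rows_with_missing)
```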

While we're at it, let's drop this column along with some unused features of this dataset.

In [None]:
florence = florence.drop(['AdvisoryNumber', 'Forecaster', 'Received'], axis=1)
florence.head()

Look at how we defined `axis=1` in the `drop` function. Ring a bell? Indeed, we want to drop columns and not rows.
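
As a small reminder of what `axis` changes, here is a sketch on a hypothetical toy DataFrame. The `columns=` keyword is an equivalent spelling of `axis=1`:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

without_column = toy.drop(["b"], axis=1)  # drops the column named "b"
same_thing = toy.drop(columns=["b"])      # equivalent, more explicit
without_row = toy.drop([0], axis=0)       # drops the row with index label 0

print(list(without_column.columns))  # ['a', 'c']
print(list(without_row.index))       # [1]
```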

To take a **statistical** look at the data, we can use the `describe` function.

In [None]:
florence.describe()

* **count** is the number of values
* **mean** is the average
* **min** is the minimum value observed
* **max** is the maximum value observed
* **std** is the standard deviation
* **25%** is the 25th **percentile**. 

In the image below, you can see the 95th percentile: it is the value below which 95% of the observations may be found. 

![Percentile](https://upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Loi_fisher_95e_centile.svg/2560px-Loi_fisher_95e_centile.svg.png)
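
To make the definition concrete, here is a minimal sketch that computes percentiles on a hypothetical series of the integers 0 to 100, where the n-th percentile is simply n:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the integers 0, 1, ..., 100
values = pd.Series(np.arange(101))

p25 = np.percentile(values, 25)  # 25th percentile with NumPy
p95 = values.quantile(0.95)      # 95th percentile with pandas
summary = values.describe()      # the "25%" row is the same 25th percentile

print(p25)             # 25.0
print(p95)             # about 95
print(summary["25%"])  # 25.0
```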

## From coordinates to Shapely object

Now, let's take a moment to get into some theory with Latitude & Longitude.

![LatLongMap](http://www.satsig.net/world105.gif)

* **Latitude** is used to express how far north or south you are, relative to the equator. 
* **Longitude** shows your location in an east-west direction, relative to the Greenwich meridian. 

The usual notation is `(Long, Lat)` with `Long` from -180 to +180 and `Lat` from -90 to +90 (as you can see on the map above). By convention, positive values point **N**orth (for latitude) and **E**ast (for longitude).

In the output of the `describe()` function above, you can see that the **mean** (average) of the (Long, Lat) values is about (57, 26). Read in (N, E) notation, that is 26° North and 57° East, which would put us in the Middle East, near the Strait of Hormuz, far from the Atlantic.

Indeed, the values we got from the Florence Hurricane website are expressed as (**N**orth, **W**est), so we need to make the West values negative to plot the data correctly in the (N, E) notation.
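
The sign convention can be sketched as a small helper. The function name `to_signed` and the hemisphere letters are assumptions made for this illustration; the dataset itself only provides unsigned numbers:

```python
def to_signed(value, hemisphere):
    """Convert an unsigned coordinate plus a hemisphere letter to a signed value.

    North and East are positive; South and West are negative.
    """
    hemisphere = hemisphere.upper()
    if hemisphere not in {"N", "S", "E", "W"}:
        raise ValueError(f"Unknown hemisphere: {hemisphere!r}")
    return -abs(value) if hemisphere in {"S", "W"} else abs(value)


print(to_signed(57.0, "W"))  # -57.0
print(to_signed(26.0, "N"))  # 26.0
```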

In [None]:
florence['Long'] = -florence['Long']
florence.head()

Now that our Latitude & Longitude values are correct, we will transform them into a `Point` from the **shapely** library.

The `Point` constructor takes a sequence of coordinates, in 2 or 3 dimensions.

In [None]:
x, y = 10, 20

print(shapely.geometry.Point([x, y]))

z = 32

print(shapely.geometry.Point([x, y, z]))

In [None]:
# Create a third column, `coordinates`, taking for values the list [Long, Lat]
florence['coordinates'] = florence[['Long', 'Lat']].values.tolist()
florence.head()

`apply` is a method used on DataFrame or GeoDataFrame to apply a function to a whole column.

In [None]:
# Just an example

def multiplyby2(x):
    return x*2

florence["Lat"].apply(multiplyby2)

Notice that we did not pass any arguments to `multiplyby2` in the `apply` call.   
`apply` will call your function by itself for each value of the selected column.

Instead of defining a named function, you can use a `lambda` function, also called an anonymous function.

In [None]:
# This is equivalent to florence["Lat"].apply(multiplyby2)
florence["Lat"].apply(lambda x: x*2)

You can also apply the same function to multiple columns.

In [None]:
florence[["Lat", "Wind", "Pres"]].apply(multiplyby2)

`apply` only returns the modified data without touching our (Geo)DataFrame, so `florence` was not modified.  
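
A minimal sketch of this behavior on a hypothetical toy DataFrame: the result of `apply` has to be assigned back for the change to persist.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"Lat": [10.0, 20.0]})

doubled = toy["Lat"].apply(lambda x: x * 2)
before = toy["Lat"].tolist()

print(before)            # [10.0, 20.0] : the original column is untouched
print(doubled.tolist())  # [20.0, 40.0]

# Assign the result back to actually modify the DataFrame
toy["Lat"] = doubled
```
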
Let's return to our Point conversion.

In [None]:
# TODO: Apply the Point function on the `coordinates` column.
florence['coordinates'] = florence['coordinates']
florence.head()

## Convert the DataFrame to GeoDataFrame


In [None]:
geodf_florence = geopandas.GeoDataFrame(florence, geometry='coordinates')

# If running this cell gives you "TypeError: Input must be valid geometry objects", 
# it means your Point conversion above failed.
geodf_florence.head()

Our `coordinates` column is now similar to the `geometry` column from the US Map: it contains **Shapely** objects. We can plot it with the `plot` method from GeoPandas, the same as we did before.
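
As an aside, recent GeoPandas versions also offer `geopandas.points_from_xy`, which builds the point geometries directly from two numeric columns. The sketch below uses a hypothetical toy DataFrame and is an alternative route, not the one this lesson follows:

```python
import geopandas
import pandas as pd

# Hypothetical toy data with already-signed coordinates
toy = pd.DataFrame({"Long": [-57.0, -60.5], "Lat": [26.0, 27.5]})

# Build the geometry column from the Long (x) and Lat (y) columns
toy_geo = geopandas.GeoDataFrame(
    toy,
    geometry=geopandas.points_from_xy(toy["Long"], toy["Lat"]),
)

first = toy_geo.geometry.iloc[0]
print(first.x, first.y)
```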

In [None]:
geodf_florence.plot(figsize=(20,10));

# Plotting Hurricane points on the US Map using Matplotlib

In [None]:
fig, ax = plt.subplots(1, figsize=(18, 18))

# Plotting the base
base = only_us.plot(ax=ax)

# Plotting the hurricane position on top with cyan color to stand out:
geodf_florence.plot(ax=base, color='cyan', marker="*", markersize=10);

plt.show()

We can also make it even more beautiful and use more of the data. For example, we can:
* Color the Hurricane points differently depending on the force of the **Wind**.
* Remove the Lat, Long axis
* Add a title, legend, colors, ...
* Use a `divider` to align the legend with the plot

You can view the full list of [colormaps provided by matplotlib](https://matplotlib.org/tutorials/colors/colormaps.html).
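
To get a feel for how a colormap turns a number into a color, here is a minimal sketch that samples the `cool` colormap at both ends of a value range. The range of 30 to 140 is a made-up example, not the actual range of the `Wind` column:

```python
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

cmap = plt.get_cmap("cool")
norm = Normalize(vmin=30, vmax=140)  # hypothetical value range

# Normalize maps a value to [0, 1]; the colormap maps that to an RGBA tuple
weakest = cmap(norm(30))     # cyan end of the colormap
strongest = cmap(norm(140))  # magenta end of the colormap

print(weakest)
print(strongest)
```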

In [None]:
fig, ax = plt.subplots(1, figsize=(18, 8), facecolor=(0, 1, 1, .08))

divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="5%", pad=0.1)

base = only_us.plot(ax=ax, color='#3B3C6E')

points = geodf_florence.plot(ax=base, 
                             cax=cax,
                             
                             column='Wind', 
                             marker="<", 
                             markersize=10, 
                             cmap='cool', 
                             
                             label="Wind speed (mph)", 
                             legend=True)

ax.axis('off')

ax.set_title("Hurricane Florence in US Map", fontsize=20)

plt.show()

Now, instead of showing the force of the `Wind`:
* Show the value of the `Pres` column.
* Label the legend as "Pressure (hPa)"
* Change the color map `cmap` to "viridis"

In [None]:
# TODO: Display the pressure values for each point, change the label and the colors of the legend.

fig, ax = plt.subplots(1, figsize=(18, 5))

plt.show()

Just to see the evolution of the pressure, display a basic line chart of the pressure by date.

In [None]:
# TODO: Plot a line chart with the date of the Florence Hurricane as x-axis and its pressure as y-axis.

fig, ax = plt.subplots(1, figsize=(18, 5))

plt.show()

To learn more about what you can do with GeoPandas, [explore their gallery](http://geopandas.org/gallery/index.html).