# Load & Downloads

We're back to the French government website. They offer a [map of Paris' arrondissements](https://www.data.gouv.fr/fr/datasets/arrondissements-1/#_) in different formats: CSV, JSON, GeoJSON and Shapefile.

The accompanying PDF gives the definition of each attribute (you can skip it, I did the work for you).

1. Let's step out of our comfort zone and download the **Shapefile** format.
2. Move it to this folder (`chapter02`) and unzip it. On Linux or macOS, you can use `unzip arrondissements.zip`.
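
If the `unzip` command is not available (on Windows, for example), Python's standard `zipfile` module can do the extraction. This is a minimal sketch: `unzip_archive` is a hypothetical helper name, and the usage line assumes the archive was saved as `arrondissements.zip` in the current folder.

```python
import zipfile


def unzip_archive(zip_path, dest_dir="."):
    """Extract every member of `zip_path` into `dest_dir` and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)  # creates `dest_dir` if it does not exist
        return archive.namelist()


# Usage (assumes the downloaded archive is named arrondissements.zip):
# unzip_archive("arrondissements.zip")
```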

In [None]:
# Load packages
import geopandas
import shapely.geometry  # makes `shapely.geometry.Point` available below

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
import missingno


%matplotlib inline

# Paris Map

In [None]:
paris_df = geopandas.read_file("arrondissements.shp", encoding="utf-8")
paris_df.head()

In [None]:
paris_df.plot(figsize=(18, 12))

Yay! It's Paris.

# Trees Map

Now, download the [les-arbres.csv](https://raw.githubusercontent.com/alexisperrier/upem_python0918/master/jour_07/les-arbres.csv) file.

Tip: on Linux or macOS, you can download the file from its link with `wget` (macOS does not ship `wget` by default; `curl -O <link>` does the same job):   
`wget "https://raw.githubusercontent.com/alexisperrier/upem_python0918/master/jour_07/les-arbres.csv"`

Then, move it to this folder (`chapter02`).
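
If neither `wget` nor `curl` is at hand, the download can also be done from Python with `urllib.request.urlretrieve` from the standard library. A minimal sketch; `download` is a hypothetical helper name, and the file is saved in the current working directory.

```python
from urllib.request import urlretrieve

TREES_URL = ("https://raw.githubusercontent.com/alexisperrier/"
             "upem_python0918/master/jour_07/les-arbres.csv")


def download(url, filename):
    """Download `url` to the local path `filename` and return that path."""
    path, _headers = urlretrieve(url, filename)
    return path


# Usage (needs network access):
# download(TREES_URL, "les-arbres.csv")
```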

In [None]:
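# `on_bad_lines='skip'` (pandas >= 1.3) replaces `error_bad_lines=False`, which was removed in pandas 2.0
trees = pd.read_csv('les-arbres.csv', sep=';', on_bad_lines='skip')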
trees.head()

In [None]:
missingno.matrix(trees)

For clarity, let's drop the columns with a lot of missing values.

In [None]:
trees = trees.drop(['COMPLEMENTADRESSE', 'NUMERO', 'VARIETEOUCULTIVAR'], axis=1)
trees.describe()
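
Instead of listing the columns by hand, we could also select them from the share of missing values that `missingno` visualizes. Here is a sketch on a small made-up frame; the 50% threshold is an arbitrary assumption, not the rule used to pick the three columns above.

```python
import numpy as np
import pandas as pd

# Made-up frame standing in for `trees`
df = pd.DataFrame({
    "IDBASE": [1, 2, 3, 4],
    "NUMERO": [np.nan, np.nan, np.nan, 12.0],
    "GENRE": ["Platanus", "Tilia", None, "Acer"],
})

# isna() gives True/False per cell; mean() turns that into a missing ratio per column
missing_ratio = df.isna().mean()
print(missing_ratio)

# Keep only the columns that are at most 50% empty
df_kept = df.loc[:, missing_ratio <= 0.5]
print(df_kept.columns.tolist())
```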

## Series.str methods

A `Series` of strings (i.e. a column of a DataFrame) offers, through its `.str` accessor, methods that operate on each element. Missing values are handled automatically: they are skipped and stay missing in the result.

A few of the available `str` methods (the full list is much longer):
* lower(): transform to lowercase
* upper(): transform to uppercase
* len(): return the length of each string
* count(pat): count the occurrences of `pat` (a regular expression) in each string
* strip(): remove whitespace at the start and at the end of each string
* contains(pat): return True if the string contains `pat`, False otherwise (`pat` is treated as a regular expression by default)
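
Here is a short sketch applying a few of these methods to a made-up Series that contains a missing value, to show how `NaN` propagates. The `na=False` argument of `contains` turns missing results into `False`. This matters when the result is used as a Boolean mask, because pandas refuses to index with a mask that still contains `NaN`.

```python
import numpy as np
import pandas as pd

# Made-up values; dtype=object is the classic pandas representation of strings
words = pd.Series(["  Paris ", "Lyon", np.nan, "paris 11e"], dtype=object)

stripped = words.str.strip()   # leading/trailing spaces removed, NaN kept
lengths = words.str.len()      # float lengths, because of the NaN
has_paris = words.str.upper().str.contains("PARIS")                   # NaN stays NaN
has_paris_no_nan = words.str.upper().str.contains("PARIS", na=False)  # NaN becomes False

print(stripped, lengths, has_paris, has_paris_no_nan, sep="\n\n")
```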

In [None]:
s = pd.Series(['A', 'B', 'C', 
               'Never gonna give you up', 'Never gonna let you down', 
               'Never gonna run arOoOounnnd', 'and desert you.', 
               'never gonna give', 'nEvEr gOnNa gIvE',
               'Give you UP!'], dtype="string")

s.str.lower()

In [None]:
s.str.count("e")

You can also chain them:

In [None]:
s.str.lower().str.count("e")

And use them in Boolean masks:

In [None]:
s[s.str.lower().str.count("e") > 2]

Now, use `str.contains` to keep only the trees that are in Paris.

In [None]:
trees["ARRONDISSEMENT"].unique()

In [None]:
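# TODO: Complete this line to keep only the trees located in Paris, using `str` methods.
# Tip: end the line with `.copy()` so that adding columns later does not trigger a SettingWithCopyWarning.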
paris_trees = trees
paris_trees

## Merge arrondissements from both maps

#### First, the Paris Map

We will merge both maps on arrondissements.

We will transform them to integers, for example `10` for the 10th arrondissement. 

Use `apply` like we did before to convert the type of the `c_ar` column. The `int()` built-in will be useful.

In [None]:
int("42")  # Convert a string to an int
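
As a reminder of how `apply` works, here is a sketch on a made-up Series of floats: `apply` calls the given function (here the `int` built-in) on each element and collects the results in a new Series.

```python
import pandas as pd

codes = pd.Series([1.0, 11.0, 20.0])  # made-up float codes
as_int = codes.apply(int)             # int() is called on each element
print(as_int)
```

Note that `int()` raises a `ValueError` on `NaN`, so a column with missing values needs cleaning first. The vectorized `codes.astype(int)` is a common alternative to `apply(int)`.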

In [None]:
# TODO: Use `apply` to convert the float-type `c_ar` column to int type.
paris_df.loc[:, "c_ar"] = paris_df["c_ar"]
paris_df.head()

#### Then, the Trees Map

In [None]:
paris_trees["ARRONDISSEMENT"]

Now let's convert a string looking like `PARIS 11E ARRDT` to an int `11`.

Use the slicing syntax we learned with NumPy to get the 2 characters before the character "E" (reminder: `[i:j:k]`, i.e. `[start:stop:step]`).
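
Slicing works on Python strings exactly as on NumPy arrays. A quick sketch on an unrelated word; note that `stop` is excluded and that negative indices count from the end.

```python
word = "arrondissement"

first_five = word[0:5]     # indices 0 to 4, since `stop` is excluded
last_four = word[-4:]      # the last 4 characters
every_second = word[::2]   # every second character, starting at index 0

print(first_five, last_four, every_second)
```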

In [None]:
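def get_arrdt(s):
    # Non-string values (e.g. NaN) get the sentinel 41, which matches no arrondissement
    if not isinstance(s, str):
        return 41
    e_index = s.find("E")
    # TODO: Use `e_index` (the index of the letter 'E') to get the 2 characters before it in the string `s`.
    two_chars_before_e = "42"  # placeholder: replace it with your slice
    return int(two_chars_before_e)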
    
print(get_arrdt("PARIS 8E ARRDT"))
print(get_arrdt("PARIS 11E ARRDT"))

In [None]:
paris_trees.loc[:, "c_ar"] = paris_trees["ARRONDISSEMENT"].apply(get_arrdt)
# If you see a red warning below (typically a SettingWithCopyWarning), check that the "c_ar" column exists and only has integer values like 10, 11, 8, ...
# If that's the case, it worked and you can continue. Otherwise, try to debug it yourself or call me for help.
paris_trees
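
As an alternative to slicing around the letter "E", a regular expression can pull the digits out of every label at once with `Series.str.extract`. This is a sketch on made-up labels in the same format; unlike `get_arrdt`, labels without digits become a missing value (`<NA>`) instead of a sentinel such as `41`.

```python
import pandas as pd

# Made-up labels in the same format as the ARRONDISSEMENT column
labels = pd.Series(["PARIS 1ER ARRDT", "PARIS 8E ARRDT", "PARIS 11E ARRDT", "BOIS DE VINCENNES"],
                   dtype=object)

# (\d+) captures the first run of digits; expand=False returns a Series, NaN when nothing matches
digits = labels.str.extract(r"(\d+)", expand=False)

# Nullable integer dtype, so the missing value is kept as <NA>
arrdt = pd.to_numeric(digits).astype("Int64")
print(arrdt)
```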

Now, let's convert the geolocation to a `shapely` `Point`.  
To do the conversion, we first have to look at what we already have.

In [None]:
first_geoloc = paris_trees["geo_point_2d"].iloc[0]  # first row, whatever its index label after filtering
print(type(first_geoloc))
print(first_geoloc)

In [None]:
def string2point(s):
    lat, lon = s.split(",")  # the source string is "lat, lon", e.g. "48.8259993388, 2.32878574525"
    longlat = [float(lon), float(lat)]  # shapely expects (x, y), i.e. (longitude, latitude); float() ignores the extra space
    return shapely.geometry.Point(longlat)
    
p = string2point("48.8259993388, 2.32878574525")
print(p)
p

In [None]:
paris_trees.loc[:, "geometry"] = paris_trees["geo_point_2d"].apply(string2point)
paris_trees

In [None]:
paris_trees = geopandas.GeoDataFrame(paris_trees, geometry='geometry')
paris_trees
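
`geopandas.points_from_xy` builds all the points in one vectorized call instead of one `apply` per row. A sketch on two made-up rows; setting `crs="EPSG:4326"` assumes the coordinates are plain WGS84 longitude/latitude, which is worth checking against `paris_df.crs` before overlaying the two layers.

```python
import pandas as pd
import geopandas

# Two made-up rows in the "lat, lon" string format of geo_point_2d
df = pd.DataFrame({"geo_point_2d": ["48.8259993388, 2.32878574525",
                                    "48.8566, 2.3522"]})

# Split "lat, lon" into two columns, strip the spaces, convert to float
parts = df["geo_point_2d"].str.split(",", expand=True)
lat = parts[0].str.strip().astype(float)
lon = parts[1].str.strip().astype(float)

gdf = geopandas.GeoDataFrame(
    df,
    geometry=geopandas.points_from_xy(lon, lat),  # x = longitude, y = latitude
    crs="EPSG:4326",  # assumption: WGS84 longitude/latitude
)
print(gdf.geometry.iloc[0])
```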

# Display the Tree Map on the Paris Map

In [None]:
paris_trees.plot()

In [None]:
paris_df.plot()

In [None]:
fig, ax = plt.subplots(1, figsize=(18, 12))

# Plotting the Paris map
base = paris_df.plot(ax=ax, 
                     color='#d9d9d9')

# Plotting the trees
paris_trees.plot(ax=base, 
                 color='#669966', 
                 marker="^", 
                 markersize=1);

plt.show()
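
The overlay above only draws the two layers on top of each other. To find which polygon each tree actually falls in, without relying on the `ARRONDISSEMENT` text, geopandas offers a spatial join (`geopandas.sjoin`). A sketch on made-up squares and points; with the real data, both GeoDataFrames would first need the same CRS.

```python
import geopandas
from shapely.geometry import Point, box

# Two made-up square "arrondissements" and three made-up tree points
areas = geopandas.GeoDataFrame({"c_ar": [1, 2]},
                               geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)])
points = geopandas.GeoDataFrame({"IDBASE": [10, 11, 12]},
                                geometry=[Point(0.2, 0.5), Point(0.7, 0.1), Point(1.5, 0.5)])

# Attach to each point the attributes of the polygon that contains it
# (`predicate` needs geopandas >= 0.10; older versions call it `op`)
joined = geopandas.sjoin(points, areas, how="inner", predicate="within")
counts = joined.groupby("c_ar").size()
print(counts)
```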

# Choropleth Map

## Display the number of trees by arrondissement

First, let's use `groupby` to group the trees by arrondissement.

In [None]:
nb_trees = paris_trees.groupby('c_ar').count()[["IDBASE"]]
nb_trees

Because we grouped the rows by the `c_ar` column, `c_ar` became the index. We can turn it back into a regular column with `reset_index`.

In [None]:
nb_trees.reset_index(inplace=True)
nb_trees
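
Note that `count()` only counts non-missing values, so the number of trees per arrondissement relies on `IDBASE` being filled in. `size()` counts rows instead, and `reset_index(name=...)` gives the result a clearer column name. A sketch on a made-up frame; the name `nb_trees` is an illustrative choice (the cells below keep using `IDBASE`).

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"c_ar": [1, 1, 2, 2, 2],
                    "IDBASE": [10, 11, 12, np.nan, 14]})

# count(): non-missing values per column; size(): number of rows per group
per_count = toy.groupby("c_ar").count()[["IDBASE"]]
per_size = toy.groupby("c_ar").size().reset_index(name="nb_trees")
print(per_count)
print(per_size)
```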

In [None]:
merged = paris_df.merge(nb_trees, on="c_ar")
merged
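
`merge` performs an inner join by default, so arrondissements without a match silently disappear (for example if `get_arrdt` still returns the placeholder `42`). A quick way to check the keys is an outer merge with `indicator=True`, sketched here on made-up frames.

```python
import pandas as pd

districts = pd.DataFrame({"c_ar": [1, 2, 3], "label": ["1st", "2nd", "3rd"]})
tree_counts = pd.DataFrame({"c_ar": [1, 3, 42], "IDBASE": [120, 80, 5]})

# how="outer" keeps every key; indicator=True adds a `_merge` column saying
# whether each key was found in the left frame only, the right frame only, or both
check = districts.merge(tree_counts, on="c_ar", how="outer", indicator=True)
print(check[["c_ar", "_merge"]])
```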

In [None]:
fig, ax = plt.subplots(1, figsize=(18, 10))

divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="5%", pad=0.1)

merged.plot(ax=ax,
            cax=cax,
            column="IDBASE",
            cmap="YlGn",
            legend=True)

plt.show()

We can also change the legend's orientation, and add the location of the trees on the map.

We use the `alpha` keyword to lower the opacity of the trees.

In [None]:
fig, ax = plt.subplots(1, figsize=(18, 10))

merged.plot(ax=ax,
            column="IDBASE",
            cmap="YlGn",
            legend=True,
            legend_kwds={'label': "Number of trees by arrondissement",
                         'orientation': "horizontal"})

paris_trees.plot(ax=ax, 
                 alpha=0.1, 
                 color='#333333', 
                 marker="^", 
                 markersize=1)

plt.show()

Now, let's color the trees based on their height.

In [None]:
fig, ax = plt.subplots(1, figsize=(18, 10))

divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="5%", pad=0.1)

merged.plot(ax=ax,
            column="IDBASE",
            cmap="YlGn",
            legend=True,
            legend_kwds={'label': "Number of trees by arrondissement",
                         'orientation': "horizontal"})

paris_trees.plot(ax=ax,
                 cax=cax,
                 column="HAUTEUR (m)",
                 cmap="Greys",
                 alpha=0.1,
                 marker="*",
                 markersize=1,
                 legend=True,
                 legend_kwds={'label': "Tree height (m)"});

plt.show()

Why do all the trees have the same color?

Let's examine the data.

In [None]:
paris_trees.describe()

In [None]:
(paris_trees["HAUTEUR (m)"] > 1000).sum()  # number of trees taller than 1000 m

In [None]:
(paris_trees["HAUTEUR (m)"] > 30).sum()  # number of trees taller than 30 m

So only 17 trees are taller than 1000 m, and 516 are taller than 30 m, out of **160 000+ trees**.  
Let's remove them and display the color map again.

In [None]:
paris_trees = paris_trees[paris_trees["HAUTEUR (m)"] <= 30]  # drop the trees taller than 30 m
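
`Series.between` expresses the same kind of filter as a closed interval and makes the lower bound explicit too (negative heights would be dropped as well). A sketch on made-up heights; `between` includes both bounds by default.

```python
import pandas as pd

heights = pd.Series([5, 12, 0, 31, 2500, 18])  # made-up heights in metres

# Keep heights in the closed interval [0, 30]
plausible = heights[heights.between(0, 30)]
print(plausible)
```

Another option is to keep every row and only cap the color scale, with the `vmax` argument of `GeoDataFrame.plot`.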

In [None]:
fig, ax = plt.subplots(1, figsize=(18, 10))

divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="5%", pad=0.1)

merged.plot(ax=ax,
            column="IDBASE",
            cmap="Greys",
            legend=True,
            legend_kwds={'label': "Number of trees by arrondissement",
                         'orientation': "horizontal"})

paris_trees.plot(ax=ax,
                 cax=cax,
                 column="HAUTEUR (m)",
                 cmap="YlGn",
                 alpha=0.1,
                 marker="*",
                 markersize=2,
                 legend=True,
                 legend_kwds={'label': "Tree height (m)"});

plt.show()

It's the end of this Notebook!

You can now start the Project for the next course.