<h1 align=center><font size = 5>Generating Maps with Python</font></h1>

## Introduction

In this lab, we will learn how to create maps for different objectives. To do that, we will part ways with Matplotlib and work with another Python visualization library, namely **Folium**. What is nice about **Folium** is that it was developed for the sole purpose of visualizing geospatial data. While other libraries are available to visualize geospatial data, such as **plotly**, they might have a cap on how many API calls you can make within a defined time frame. **Folium**, on the other hand, is completely free.

# Exploring Datasets with *pandas* and Folium<a id="0"></a>

Toolkits: This lab heavily relies on [*pandas*](http://pandas.pydata.org/) and [**NumPy**](http://www.numpy.org/) for data wrangling, analysis, and visualization. The primary plotting library we will explore in this lab is [**Folium**](https://github.com/python-visualization/folium/).

Data sets (no need to download for now): 

1. San Francisco Police Department Incidents for the year 2016 - [Police Department Incidents](https://data.sfgov.org/Public-Safety/Police-Department-Incidents-Previous-Year-2016-/ritf-b9ki) from the San Francisco public data portal. Incidents are derived from the San Francisco Police Department (SFPD) Crime Incident Reporting system. Addresses and locations have been anonymized by moving them to mid-block or to an intersection.

2. Immigration to Canada from 1980 to 2013 - [International migration flows to and from selected countries - The 2015 revision](http://www.un.org/en/development/desa/population/migration/data/empirical2/migrationflows.shtml) from the United Nations website. The dataset contains annual data on the flows of international migrants as recorded by the countries of destination. The data presents both inflows and outflows according to the place of birth, citizenship or place of previous / next residence, both for foreigners and nationals.

# Download and Prep data <a id="2"></a>

Import Primary Modules:

```python
import numpy as np  # useful for scientific computing in Python
import pandas as pd  # primary data structure library
```

# Introduction to Folium <a id="4"></a>

**Folium** is a powerful Python library that helps you create several types of Leaflet maps (interactive web maps rendered with the Leaflet.js JavaScript library). Because **Folium** results are interactive, the library is very useful for building dashboards.

From the official **Folium** documentation page:

> Folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library. Manipulate your data in Python, then visualize it in a Leaflet map via Folium.

> Folium makes it easy to visualize data that's been manipulated in Python on an interactive Leaflet map. It enables both the binding of data to a map for choropleth visualizations as well as passing rich vector/raster/HTML visualizations as markers on the map.

#### Let's install **Folium**

```python
# !pip install folium
import folium

# print("Folium installed and imported!")
```

Generating the world map is straightforward in **Folium**. You simply create a **Folium** *Map* object and then display it. What is attractive about **Folium** maps is that they are interactive, so you can zoom into any region of interest regardless of the initial zoom level.

```python
# define the world map
world_map = folium.Map()

# display world map
world_map
```

Go ahead. Try zooming in and out of the rendered map above.

You can customize this default definition of the world map by specifying the center of your map and the initial zoom level.

All locations on a map are defined by their respective *latitude* and *longitude* values. So you can, for example, create a map centered at latitude and longitude **[0, 0]**.

For a defined center, you can also define the initial zoom level into that location when the map is rendered. The higher the zoom level, the more the map is zoomed into the center.

Let's create a map centered around Canada and play with the zoom level to see how it affects the rendered map.

```python
# define the world map centered around Canada with a low zoom level
world_map = folium.Map(location=[56.130, -106.35], zoom_start=2)

# display world map
world_map
```

Let's create the map again with a higher zoom level.

```python
# define the world map centered around Canada with a higher zoom level
world_map = folium.Map(location=[56.130, -106.35], zoom_start=8)

# display world map
world_map
```

As you can see, the higher the zoom level the more the map is zoomed into the given center.
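The effect of `zoom_start` can be made concrete. Leaflet, which Folium builds on, displays web map tiles in the Web Mercator projection, where each zoom step doubles the scale. The sketch below uses the common slippy-map formula for ground resolution. The 256-pixel tile size and the WGS 84 equatorial radius are standard conventions assumed here, not values taken from this lab, and `metres_per_pixel` is a hypothetical helper.

```python
import math

# Assumed Web Mercator conventions: WGS 84 equatorial radius and 256-pixel tiles
EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137
TILE_SIZE_PX = 256

def metres_per_pixel(latitude: float, zoom: int) -> float:
    """Approximate ground distance covered by one screen pixel in Web Mercator."""
    # At zoom z the whole world is TILE_SIZE_PX * 2**z pixels wide;
    # cos(latitude) accounts for the Mercator stretching away from the equator.
    return EARTH_CIRCUMFERENCE_M * math.cos(math.radians(latitude)) / (TILE_SIZE_PX * 2 ** zoom)

# The two Canada maps above: same center latitude, zoom 2 vs. zoom 8
low = metres_per_pixel(56.130, 2)
high = metres_per_pixel(56.130, 8)
print(f"zoom 2: {low:,.0f} m/px, zoom 8: {high:,.0f} m/px, ratio: {low / high:.0f}")
```

Zoom levels 2 and 8 are six steps apart, so each pixel at zoom 8 covers 2**6 = 64 times less ground.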

### Try to plot a map centered around your home (or a place where you have lived for a long time). Use a high zoom level.

```python
## write your code here
# Kyiv, Ukraine
my_home = folium.Map(location=[50.4546600, 30.5238000], zoom_start=11)
my_home
```

Another cool feature of **Folium** is that you can generate different map styles.

### A. Stamen Toner Maps (Stamen is a design studio: https://stamen.com/)

These are high-contrast B+W (black and white) maps. They are perfect for data mashups and exploring river meanders and coastal zones. 

Let's create a Stamen Toner map.

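```python
# create a Stamen Toner map of your favourite location (Amsterdam here); choose the zoom level as you like.
# Note: Stamen tiles are now hosted by Stadia Maps, so depending on your Folium version
# this tile name may no longer work or may require an API key.
my_favourite_place = folium.Map(location=[52.377956, 4.897070], zoom_start=11, tiles='Stamen Toner')

# display map
my_favourite_place
```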

Feel free to zoom in and out to see how this style compares to the default one.

### B. Stamen Terrain Maps

These are maps that feature hill shading and natural vegetation colors. They showcase advanced labeling and linework generalization of dual-carriageway roads.

Let's create a Stamen Terrain map with custom zoom level.

```python
# create a Stamen Terrain map of the location used above
my_favourite_place = folium.Map(location=[52.377956, 4.897070], zoom_start=11, tiles='Stamen Terrain')

# display map
my_favourite_place
```

Feel free to zoom in and out to see how this style compares to Stamen Toner and the default style.

### C. Mapbox Bright Maps (Mapbox is a developer of map visualization tools)

These maps are quite similar to the default style, except that the borders are not visible at a low zoom level. Furthermore, unlike the default style, where country names are displayed in each country's native language, the *Mapbox Bright* style displays all country names in English.

Let's create a world map with this style. Zoom in and notice how the borders start showing as you zoom in, and the displayed country names are in English.

```python
# 'Mapbox Bright' is no longer available in Folium, so use the "CartoDB positron" style instead
my_map = folium.Map(tiles="CartoDB positron")
# display the map
my_map
```

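The *Mapbox Bright* style has since been removed from Folium, which is why the cell above uses *CartoDB positron* instead. Find the other tile styles that Folium provides (the error raised when requesting the removed style is a good starting point) and try to understand how to use them.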

### Pick one of the styles above and create a map of Ukraine

```python
## write your code here
my_map = folium.Map(location=[48.926563, 31.475782], zoom_start=5, tiles='Stamen Terrain')
my_map
```

# Maps with Markers <a id="6"></a>


Read the San Francisco Police Department incidents dataset into a *pandas* dataframe:

```python
df_incidents = pd.read_csv('Police_Department_Incidents_-_2016_.csv')

print("Dataset read into pandas dataframe!")
```

Dataset read into pandas dataframe!


Let's take a look at the first five items in our dataset.

```python
df_incidents.head()
```

| | IncidntNum | Category | Descript | DayOfWeek | Date | Time | PdDistrict | Resolution | Address | X | Y | Location | PdId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 120058272 | WEAPON LAWS | POSS OF PROHIBITED WEAPON | Friday | 01/29/2016 12:00:00 AM | 11:00 | SOUTHERN | ARREST, BOOKED | 800 Block of BRYANT ST | -122.403405 | 37.775421 | (37.775420706711, -122.403404791479) | 12005827212120 |
| 1 | 120058272 | WEAPON LAWS | FIREARM, LOADED, IN VEHICLE, POSSESSION OR USE | Friday | 01/29/2016 12:00:00 AM | 11:00 | SOUTHERN | ARREST, BOOKED | 800 Block of BRYANT ST | -122.403405 | 37.775421 | (37.775420706711, -122.403404791479) | 12005827212168 |
| 2 | 141059263 | WARRANTS | WARRANT ARREST | Monday | 04/25/2016 12:00:00 AM | 14:59 | BAYVIEW | ARREST, BOOKED | KEITH ST / SHAFTER AV | -122.388856 | 37.729981 | (37.7299809672996, -122.388856204292) | 14105926363010 |
| 3 | 160013662 | NON-CRIMINAL | LOST PROPERTY | Tuesday | 01/05/2016 12:00:00 AM | 23:50 | TENDERLOIN | NONE | JONES ST / OFARRELL ST | -122.412971 | 37.785788 | (37.7857883766888, -122.412970537591) | 16001366271000 |
| 4 | 160002740 | NON-CRIMINAL | LOST PROPERTY | Friday | 01/01/2016 12:00:00 AM | 00:30 | MISSION | NONE | 16TH ST / MISSION ST | -122.419672 | 37.76505 | (37.7650501214668, -122.419671780296) | 16000274071000 |


So each row consists of 13 features:
> 1. **IncidntNum**: Incident Number
> 2. **Category**: Category of crime or incident
> 3. **Descript**: Description of the crime or incident
> 4. **DayOfWeek**: The day of week on which the incident occurred
> 5. **Date**: The Date on which the incident occurred
> 6. **Time**: The time of day on which the incident occurred
> 7. **PdDistrict**: The police department district
> 8. **Resolution**: The resolution of the crime in terms of whether the perpetrator was arrested or not
> 9. **Address**: The closest address to where the incident took place
> 10. **X**: The longitude value of the crime location 
> 11. **Y**: The latitude value of the crime location
> 12. **Location**: A tuple of the latitude and the longitude values
> 13. **PdId**: The police department ID
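Note that the dataset stores longitude in **X** and latitude in **Y**, while Folium expects locations as `[latitude, longitude]`. That is why the marker loops in this lab zip `Y` before `X`. The following is a small hypothetical helper (`to_folium_location` is not part of Folium or *pandas*) that makes the swap explicit and uses a range check to catch accidentally reversed columns:

```python
def to_folium_location(x: float, y: float) -> list:
    """Convert an (X=longitude, Y=latitude) pair into Folium's [lat, lon] order."""
    lat, lon = y, x
    # Latitude must lie in [-90, 90] and longitude in [-180, 180];
    # a latitude outside its range usually means X and Y were swapped.
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude {lat} out of range; were X and Y swapped?")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude {lon} out of range")
    return [lat, lon]

# First incident in the table above: X = -122.403405, Y = 37.775421
print(to_folium_location(-122.403405, 37.775421))
```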

Let's find out how many entries there are in our dataset.

```python
## write your code here
df_incidents.shape
```


(150500, 13)

So the dataframe consists of 150,500 incidents that took place in 2016. To reduce computational cost, let's work with just the first 500 incidents in this dataset.

```python
# get the first 500 crimes in the df_incidents dataframe
df_incidents = df_incidents.head(500)
```

Let's confirm that our dataframe now contains only 500 incidents.

```python
df_incidents.shape
```

(500, 13)

Now that we have reduced the data a little, let's visualize where these crimes took place in the city of San Francisco. We will use the default style and initialize the zoom level to 12.

```python
# San Francisco latitude and longitude values
latitude = 37.77
longitude = -122.42
```

```python
# create map and display it
## write your code here
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)
sanfran_map
```

Now let's superimpose the locations of the crimes onto the map. The way to do that in **Folium** is to create a *feature group* with its own features and style and then add it to `sanfran_map`.

We will use the `zip()` function, which pairs up the elements of several iterables position by position and stops when the shortest one is exhausted. The example below illustrates how it works.

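```python
a = [1, 2, 3, 4, 5]
b = [1, 2, 3, 4, 5]
## write your code here
mapped = zip(a, b)  # a lazy iterator of (a[i], b[i]) tuples

list(mapped)  # a list keeps the pairs in order (a set would not)
```

[(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]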
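Two further properties of `zip()` are worth knowing before using it on dataframe columns: it silently stops at the shortest input, and `zip(*pairs)` reverses the pairing. A short sketch with made-up coordinate lists:

```python
lats = [37.77, 37.73, 37.79]
lngs = [-122.40, -122.39]          # one value shorter on purpose

# zip() stops as soon as the shortest iterable is exhausted
pairs = list(zip(lats, lngs))
print(pairs)                       # the third latitude is silently dropped

# Unpacking with * "unzips" the pairs back into separate tuples
unzipped_lats, unzipped_lngs = zip(*pairs)
print(unzipped_lats, unzipped_lngs)
```

Columns of the same dataframe always have equal length, so zipping `df_incidents.Y` with `df_incidents.X` uses every row.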

```python
# instantiate a feature group for the incidents in the dataframe
incidents = folium.FeatureGroup()

# loop through the 500 incidents and add each to the incidents feature group
for lat, lng in zip(df_incidents.Y, df_incidents.X):
    incidents.add_child(
        folium.CircleMarker(
            [lat, lng],
            radius=5,  # define how big you want the circle markers to be
            color="yellow",
            fill_color="blue",
            fill_opacity=0.9,
        )
    )

# add incidents to map
sanfran_map.add_child(incidents)
```

You can also add some pop-up text that gets displayed when you click on a marker. Let's make each marker display the category of the crime when clicked.

```python
# start from a fresh map so the markers from the previous cell are not added twice
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# instantiate a feature group for the incidents in the dataframe
incidents = folium.FeatureGroup()

# loop through the 500 incidents and add each to the incidents feature group
for lat, lng in zip(df_incidents.Y, df_incidents.X):
    incidents.add_child(
        folium.CircleMarker(
            [lat, lng],
            radius=5,  # define how big you want the circle markers to be
            color="yellow",
            fill_color="blue",
            fill_opacity=0.6,
        )
    )

# add pop-up text to each marker on the map
latitudes = list(df_incidents.Y)
longitudes = list(df_incidents.X)
labels = list(df_incidents.Category)

for lat, lng, label in zip(latitudes, longitudes, labels):
    folium.Marker([lat, lng], popup=label).add_to(sanfran_map)

# add incidents to map
sanfran_map.add_child(incidents)
```

Isn't this really cool? Now you can see which crime category occurred at each marker. But the map looks congested with all these markers. One interesting solution that **Folium** offers is to cluster the markers that are in the same neighborhood, so that each cluster shows the number of crimes in that neighborhood. These clusters can be thought of as pockets of San Francisco which you can then analyze separately.

To implement this, we start off by instantiating a *MarkerCluster* object and adding all the data points in the dataframe to this object.

```python
from folium.plugins import MarkerCluster

# let's start again with a clean copy of the map of San Francisco
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# instantiate a marker cluster object for the incidents in the dataframe
incidents = MarkerCluster().add_to(sanfran_map)

# loop through the dataframe and add each data point to the marker cluster
for lat, lng, label in zip(df_incidents.Y, df_incidents.X, df_incidents.Category):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(incidents)

# display map
sanfran_map
```

Notice how when you zoom out all the way, all markers are grouped into one cluster, *the global cluster*, of 500 markers or crimes, which is the total number of incidents in our dataframe. Once you start zooming in, the *global cluster* will start breaking up into smaller clusters. Zooming in all the way will result in individual markers.
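`MarkerCluster` wraps the Leaflet.markercluster plugin, which regroups nearby markers at every zoom level. The sketch below is only a simplified illustration of that idea: it counts points per square grid cell, and the cell width halves at each zoom step. It is not the plugin's actual algorithm, and both the helper `grid_clusters` and its `base_cell_deg` rule are hypothetical.

```python
from collections import Counter

def grid_clusters(points, zoom, base_cell_deg=40.0):
    """Count points per square grid cell; cells shrink by half at each zoom step.

    points: iterable of (lat, lng) tuples
    base_cell_deg: cell width in degrees at zoom 0 (an arbitrary choice)
    """
    cell = base_cell_deg / 2 ** zoom
    counts = Counter()
    for lat, lng in points:
        # Integer cell index along each axis
        key = (int(lat // cell), int(lng // cell))
        counts[key] += 1
    return counts

# Hypothetical incident locations: three close together, one farther away
pts = [(37.775, -122.403), (37.776, -122.404), (37.774, -122.402), (37.730, -122.389)]
for z in (0, 10, 14):
    sizes = sorted(grid_clusters(pts, z).values(), reverse=True)
    print(f"zoom {z}: cluster sizes {sizes}")
```

As with the real map, one "global" group at low zoom splits into smaller groups and finally into single markers as the zoom increases.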

# Choropleth Maps <a id="8"></a>

A `Choropleth` map is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per-capita income. A choropleth map provides an easy way to visualize how a measurement varies across a geographic area, or how much it varies within a region. Below is a `Choropleth` map of the US depicting population per square mile by state:

<img src = "https://ibm.box.com/shared/static/2kzaknzdf6crt3n5rx6haskg3wiaklxl.png" width = 600> 

Now, let's create our own `Choropleth` map of the world depicting immigration from various countries to Canada.

Read the Canada immigration dataset into a *pandas* dataframe:

```python
import pandas as pd
import folium

# skip the first 20 rows of the sheet, which precede the data table
df_can = pd.read_excel('Canada.xlsx',
                       sheet_name="Canada by Citizenship",
                       skiprows=range(20))

print("Data downloaded and read into a dataframe!")
```

Data downloaded and read into a dataframe!


Let's take a look at the first five items in our dataset.

```python
df_can.head()
```

| | Type | Coverage | OdName | AREA | AreaName | REG | RegName | DEV | DevName | 1980 | ... | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Immigrants | Foreigners | Afghanistan | 935 | Asia | 5501 | Southern Asia | 902 | Developing regions | 16 | ... | 2978 | 3436 | 3009 | 2652 | 2111 | 1746 | 1758 | 2203 | 2635 | 2004 |
| 1 | Immigrants | Foreigners | Albania | 908 | Europe | 925 | Southern Europe | 901 | Developed regions | 1 | ... | 1450 | 1223 | 856 | 702 | 560 | 716 | 561 | 539 | 620 | 603 |
| 2 | Immigrants | Foreigners | Algeria | 903 | Africa | 912 | Northern Africa | 902 | Developing regions | 80 | ... | 3616 | 3626 | 4807 | 3623 | 4005 | 5393 | 4752 | 4325 | 3774 | 4331 |
| 3 | Immigrants | Foreigners | American Samoa | 909 | Oceania | 957 | Polynesia | 902 | Developing regions | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Immigrants | Foreigners | Andorra | 908 | Europe | 925 | Southern Europe | 901 | Developed regions | 0 | ... | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |


Let's find out how many entries there are in our dataset.

```python
# print the dimensions of the dataframe
df_can.shape
```

(197, 43)

Clean up data. We will make some modifications to the original dataset to make it easier to create our visualizations. 

```python
# Clean up the data set to remove unnecessary columns (e.g. REG)
df_can.drop(['AREA', 'REG', 'DEV', 'Type', 'Coverage'], axis=1, inplace=True)
```

<font color="green"> Briefly describe the drop() function and the meaning of its arguments</font>

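`drop()` removes rows or columns by label. Its main arguments are:

1. `labels`: a single label or a list of labels to drop (here, column names).
2. `axis`: `0` (or `'index'`) to drop rows, `1` (or `'columns'`) to drop columns.
3. `index` / `columns`: an alternative to `labels` + `axis`, e.g. `columns=['AREA', 'REG']`.
4. `inplace`: if `False` (the default), return a modified copy; if `True`, modify the dataframe itself and return `None`.
5. `errors`: `'raise'` (the default) raises a `KeyError` for labels that do not exist; `'ignore'` suppresses the error and drops only the existing labels.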
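A minimal sketch of these arguments on a small made-up dataframe (a stand-in for `df_can`, not the real data):

```python
import pandas as pd

# Small stand-in for df_can with a few of the columns dropped in the lab
df = pd.DataFrame({
    "Type": ["Immigrants", "Immigrants"],
    "OdName": ["Afghanistan", "Albania"],
    "AREA": [935, 908],
    "1980": [16, 1],
})

# axis=1 means "drop columns"; the default inplace=False returns a new frame
trimmed = df.drop(["Type", "AREA"], axis=1)
print(trimmed.columns.tolist())      # the original df is unchanged

# Equivalent spelling using the columns= keyword instead of axis
same = df.drop(columns=["Type", "AREA"])

# errors="ignore" skips labels that do not exist instead of raising KeyError
lenient = df.drop(columns=["AREA", "NoSuchColumn"], errors="ignore")

# inplace=True modifies df itself and returns None
result = df.drop(columns=["Type"], inplace=True)
print(result, df.columns.tolist())
```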

```python
# Let us rename the columns so that they make sense
df_can.rename(columns={'OdName': 'Country', 'AreaName': 'Continent', 'RegName': 'Region'}, inplace=True)
```

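<font color="green"> What kind of collection should be passed to the rename() function?</font>

A dictionary (or other dict-like mapping) from old column names to new ones, passed through the `columns` argument, e.g. `{'OdName': 'Country'}`. Labels that are not in the mapping are left unchanged. A function can also be passed instead; it is then applied to every label.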
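A short sketch of both forms on a one-row made-up dataframe:

```python
import pandas as pd

df = pd.DataFrame({"OdName": ["Afghanistan"], "AreaName": ["Asia"], "RegName": ["Southern Asia"]})

# A dict maps old labels to new ones; labels not in the dict are left as-is
renamed = df.rename(columns={"OdName": "Country", "AreaName": "Continent"})
print(renamed.columns.tolist())

# A function is also accepted and is applied to every label
lowered = df.rename(columns=str.lower)
print(lowered.columns.tolist())
```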

```python
# For sake of consistency, let us also make all column labels of type string
df_can.columns = list(map(str, df_can.columns))
```

<font color="green"> How does column name replacement work?</font>

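Assigning a list to `df_can.columns` replaces all column labels at once, so the list must contain exactly one label per column. Here `map(str, df_can.columns)` converts each existing label to a string (e.g. the integer year `1980` becomes `'1980'`), so the names stay the same and only their type changes.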
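A small sketch on a made-up one-row dataframe with mixed label types, similar to the integer year columns next to string columns in `df_can`:

```python
import pandas as pd

# Mixed label types: a string column next to integer year columns
df = pd.DataFrame([["Afghanistan", 16, 39]], columns=["Country", 1980, 1981])

# Replace all labels at once: the new list must have one entry per column
df.columns = list(map(str, df.columns))
print(df.columns.tolist())

# A list of the wrong length is rejected with a ValueError
try:
    df.columns = ["only", "two"]
except ValueError as err:
    print("ValueError:", err)
```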

<font color="green"> How does the map() function work? Give an example</font>

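`map(function, iterable, ...)` applies `function` to every item of the iterable (or, with several iterables, to items taken from each in parallel) and returns a lazy iterator; wrap it in `list()` to see the results. For example:

```python
def myfunc(a, b):
    return a + b

x = map(myfunc, [1], [2])
print(list(x))
```

output: [3]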

```python
# Add total column
df_can['Total'] = df_can.sum(numeric_only=True, axis=1)
```

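<font color="green"> Why is axis = 1 given?</font>

Because `axis=1` sums *across* the columns of each row, producing one total per row (that is, per country). With the default `axis=0`, `sum()` would instead add up each column, giving one total per year.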
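A minimal sketch contrasting the two axes on a made-up two-country dataframe:

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Afghanistan", "Albania"],
    "1980": [16, 1],
    "1981": [39, 0],
})

# axis=1: sum across the columns of each row -> one total per country
row_totals = df.sum(numeric_only=True, axis=1)
# axis=0 (the default): sum down each column -> one total per year
col_totals = df.sum(numeric_only=True, axis=0)

print(row_totals.tolist())   # per-country totals
print(col_totals.to_dict())  # per-year totals
```

`numeric_only=True` keeps the text column `Country` out of the sums.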

Let's take a look at the first five items of our clean dataframe.

```python
df_can.head()
```

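| | Country | Continent | Region | DevName | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 | ... | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | Southern Asia | Developing regions | 16 | 39 | 39 | 47 | 71 | 340 | ... | 3436 | 3009 | 2652 | 2111 | 1746 | 1758 | 2203 | 2635 | 2004 | 58639 |
| 1 | Albania | Europe | Southern Europe | Developed regions | 1 | 0 | 0 | 0 | 0 | 0 | ... | 1223 | 856 | 702 | 560 | 716 | 561 | 539 | 620 | 603 | 15699 |
| 2 | Algeria | Africa | Northern Africa | Developing regions | 80 | 67 | 71 | 69 | 63 | 44 | ... | 3626 | 4807 | 3623 | 4005 | 5393 | 4752 | 4325 | 3774 | 4331 | 69439 |
| 3 | American Samoa | Oceania | Polynesia | Developing regions | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 |
| 4 | Andorra | Europe | Southern Europe | Developed regions | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 15 |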


In order to create a `Choropleth` map, we need a GeoJSON file that defines the areas/boundaries of the state, county, or country that we are interested in. In our case, since we are endeavoring to create a world map, we want a GeoJSON file that defines the boundaries of all world countries. For your convenience, you can use the existing file **world_countries.json**.
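In the cells below, `key_on='feature.properties.name'` tells Folium to look up the `name` field inside each GeoJSON feature's `properties` object and match it against the `Country` column. If a country is spelled differently in the two sources, it will not match, and Folium draws it with the fill color reserved for missing data (`nan_fill_color`). The sketch below uses a tiny hand-written FeatureCollection with geometries omitted; it is not the real **world_countries.json**. The `resolve_key` helper and the sample country names are hypothetical. It lists the mismatched names before any plotting:

```python
import json

# Minimal stand-in for world_countries.json: geometries omitted for brevity
geojson_text = """
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature", "properties": {"name": "Afghanistan"}, "geometry": null},
   {"type": "Feature", "properties": {"name": "United States of America"}, "geometry": null}
 ]}
"""

def resolve_key(feature: dict, key_on: str):
    """Follow a dotted path like 'feature.properties.name' inside one feature."""
    parts = key_on.split(".")
    if parts[0] == "feature":
        parts = parts[1:]   # the path is relative to the feature itself
    value = feature
    for part in parts:
        value = value[part]
    return value

geo = json.loads(geojson_text)
geo_names = {resolve_key(f, "feature.properties.name") for f in geo["features"]}

# Sample names as they might appear in the dataframe's 'Country' column
df_names = {"Afghanistan", "United States of America", "Viet Nam"}

print("missing from GeoJSON:", sorted(df_names - geo_names))
```

With the real data, the same check would compare `set(df_can['Country'])` against the names loaded from **world_countries.json**.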

Now that we have the GeoJSON file, let's create a world map, centered around **[0, 0]** *latitude* and *longitude* values, with an initial zoom level of 2, using the default (OpenStreetMap) style.

```python
world_geo = 'world_countries.json'  # GeoJSON file

# create a plain world map
world_map = folium.Map(location=[0, 0], zoom_start=2)

# generate a choropleth map of the total number of immigrants from each country to Canada, 1980 to 2013
folium.Choropleth(
    geo_data=world_geo,
    data=df_can,
    columns=['Country', 'Total'],
    key_on='feature.properties.name',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Immigration to Canada'
).add_to(world_map)

# display map
world_map
```

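As per our `Choropleth` map legend, the darker the color of a country and the closer the color to red, the higher the number of immigrants from that country. However, this is hard to see on the map. The totals are very unevenly distributed between the minimum and the maximum, so with Folium's default equal-width bins most countries land in the lowest bin and get the same color. In the next cell, the bin edges are instead taken from quantiles of the `Total` column.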
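The problem can be reproduced without Folium. This pure-Python sketch uses hypothetical skewed totals: many small values and one very large one. It compares how many values fall into each bin with equal-width edges versus quantile edges. The `quantile` helper follows the linear-interpolation rule that *pandas* `Series.quantile` uses by default, and the helper names are illustrative only.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile (the same rule pandas uses by default)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def count_per_bin(values, edges):
    """Count values in each [edge_i, edge_i+1) interval; the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            last = i == len(edges) - 2
            if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                counts[i] += 1
                break
    return counts

# Hypothetical skewed totals: many small countries, one very large one
totals = sorted([5, 10, 20, 40, 80, 150, 300, 600, 1200, 50000])

# Six equal-width bins between min and max
lo, hi = totals[0], totals[-1]
linear_edges = [lo + (hi - lo) * k / 6 for k in range(7)]

# Edges placed at quantiles, similar to the bins used in the next cell
quantile_edges = [quantile(totals, q) for q in (0, 0.25, 0.5, 0.75, 1)]

print("equal-width:", count_per_bin(totals, linear_edges))
print("quantile:   ", count_per_bin(totals, quantile_edges))
```

With equal-width bins, nine of the ten values share the first color. Quantile bins spread them almost evenly across the color scale.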

```python
world_geo = 'world_countries.json'  # GeoJSON file

# create a plain world map
world_map = folium.Map(location=[0, 0], zoom_start=2)

# Divide the entire range of the dataset into quantile-based bins
bins = list(df_can["Total"].quantile([0, 0.02, 0.25, 0.5, 0.75, 0.9, 0.98, 1]))

# generate a choropleth map of the total number of immigrants from each country to Canada, 1980 to 2013
folium.Choropleth(
    geo_data=world_geo,
    data=df_can,
    columns=['Country', 'Total'],
    key_on='feature.properties.name',
    fill_color='YlGnBu',
    fill_opacity=1,
    line_opacity=0.2,
    legend_name='Immigration to Canada',
    nan_fill_color='White',
    bins=bins
).add_to(world_map)

world_map
```

<font color="green"> Try to find bin edges for the legend so that it reads clearly</font>
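Raw quantile edges are usually long decimals, which makes the legend hard to read. The following is a minimal sketch of one possible approach, written as a hypothetical helper `nice_edges` (not a Folium or *pandas* function). It rounds each interior edge to two significant figures, rounds the lowest edge down and the highest edge up so every value still falls inside some bin, and drops duplicates that rounding creates. The two-significant-figure choice is an assumption; adjust it to taste.

```python
import math

def round_sig(x, sig=2, mode="nearest"):
    """Round x to `sig` significant figures; mode is 'nearest', 'down' or 'up'."""
    if x == 0:
        return 0
    # Decimal places that keep `sig` significant figures (negative = tens, hundreds, ...)
    ndigits = sig - 1 - math.floor(math.log10(abs(x)))
    if mode == "nearest":
        return round(x, ndigits)
    op = math.floor if mode == "down" else math.ceil
    if ndigits >= 0:
        scale = 10 ** ndigits
        return op(x * scale) / scale
    scale = 10 ** (-ndigits)
    return op(x / scale) * scale

def nice_edges(edges, sig=2):
    """Round bin edges for a readable legend while still covering [min, max]."""
    first = round_sig(edges[0], sig, "down")
    last = round_sig(edges[-1], sig, "up")
    middle = [round_sig(e, sig) for e in edges[1:-1]]
    # Keep edges strictly increasing; this drops duplicates created by rounding
    result = [first]
    for e in middle + [last]:
        if e > result[-1]:
            result.append(e)
    return result

print(nice_edges([0, 2.7, 318.5, 1234.2, 5371.9, 58639]))
```

Applied to the lab's data, this would be `bins = nice_edges(list(df_can["Total"].quantile([0, 0.02, 0.25, 0.5, 0.75, 0.9, 0.98, 1])))`, passed to `folium.Choropleth` as before.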