# GENERATING MAPS IN PYTHON

Import pandas and numpy

In [None]:
import numpy as np
import pandas as pd 

# Introduction to Folium <a id="4"></a>

Folium is a powerful Python library that helps you create several types of Leaflet maps. Folium results are interactive, which makes this library very useful for dashboard building.

From the official Folium documentation page:

> Folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library. Manipulate your data in Python, then visualize it in on a Leaflet map via Folium.

> Folium makes it easy to visualize data that's been manipulated in Python on an interactive Leaflet map. It enables both the binding of data to a map for choropleth visualizations as well as passing Vincent/Vega visualizations as markers on the map.

> The library has a number of built-in tilesets from OpenStreetMap, Mapbox, and Stamen, and supports custom tilesets with Mapbox or Cloudmade API keys. Folium supports both GeoJSON and TopoJSON overlays, as well as the binding of data to those overlays to create choropleth maps with color-brewer color schemes.

**Folium** is not available by default. If you have not installed it yet, install it before you try to import it.

```!conda install -c conda-forge folium=0.5.0 --yes```
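Before running the install command, it can help to check whether Folium is already importable. The following is a minimal sketch that uses only the standard library; the variable name `folium_available` is illustrative.

```python
import importlib.util

# find_spec returns None when the package cannot be found on the import path
folium_available = importlib.util.find_spec("folium") is not None

if folium_available:
    print("Folium is already installed.")
else:
    print("Folium is not installed yet; run the install command first.")
```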

In [None]:
# import folium
import folium

print('Folium imported!')

Generating the world map is straightforward in **Folium**. You simply create a **Folium** *Map* object and then display it. What is attractive about **Folium** maps is that they are interactive, so you can zoom into any region of interest regardless of the initial zoom level.

In [None]:
# define the world map
world_map = folium.Map()

# display world map
world_map

You can customize this default definition of the world map by specifying the centre of your map and the initial zoom level.

Let's create a map centered around Turkey and play with the zoom level to see how it affects the rendered map.

In [None]:
# define the world map centered around Turkey with a low zoom level
tr_lat = 38.9637451
tr_lng = 35.24332055

world_map = folium.Map(location=[tr_lat,tr_lng], zoom_start=4)

# display world map
world_map
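To get a feel for what the zoom level means, the sketch below assumes the standard Web Mercator "slippy map" tiling used by OpenStreetMap tiles, where each zoom level doubles the number of tiles along each axis. The helper names `tiles_at_zoom` and `tile_for` are hypothetical and are not part of Folium.

```python
import math

def tiles_at_zoom(zoom):
    """Number of tiles covering the whole world at a given zoom level."""
    return 4 ** zoom  # 2**zoom columns times 2**zoom rows

def tile_for(lat, lng, zoom):
    """Column and row of the tile that contains the given point."""
    n = 2 ** zoom
    x = int((lng + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

tr_lat, tr_lng = 38.9637451, 35.24332055  # same Turkey coordinates as in the cell above

for zoom in (0, 4, 10):
    print(zoom, tiles_at_zoom(zoom), tile_for(tr_lat, tr_lng, zoom))
```

A higher `zoom_start` therefore shows a smaller part of the world in more detail.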

**Question**: Create a map of Istanbul with a zoom level of 10.

In [None]:
### type your answer here



You can generate different map styles in  **Folium**.

### Stamen Toner Maps

These are high-contrast B+W (black and white) maps. They are perfect for exploring rivers and coastal zones. 

Let's create a Stamen Toner map of Turkey with a zoom level of 6.

In [None]:
# create a Stamen Toner map of the world centered around Turkey
world_map = folium.Map(location=[tr_lat, tr_lng], zoom_start=6, tiles='Stamen Toner')

# display map
world_map

### Stamen Terrain Maps

These are maps that feature hill shading and natural vegetation colors. They showcase advanced labeling and linework generalization of dual-carriageway roads.

Let's create a Stamen Terrain map of Turkey with zoom level 6.

In [None]:
# create a Stamen Terrain map of the world centered around Turkey
world_map = folium.Map(location=[tr_lat, tr_lng], zoom_start=6, tiles='Stamen Terrain')

# display map
world_map


Feel free to zoom in and out to see how this style compares to Stamen Toner and the default style.

There are other tiles available. 


**Question**: Create a map of Istanbul to visualize its hill shading and natural vegetation. Use a zoom level of 6.

In [None]:
### type your answer here





# Maps with Markers <a id="6"></a>


Let's create a map of Turkey and add a marker for Ankara.

In [None]:
tr_map = folium.Map(location =[tr_lat, tr_lng], zoom_start = 6)

ankara_marker = folium.Marker(
    location = [39.9334,32.8597],
    popup = 'Capital',
    icon = folium.Icon(color ='blue', icon='info-sign')
    )
                             
tr_map.add_child(ankara_marker)
tr_map

Now let's add another marker to the map we created, this time for Istanbul.

In [None]:
tr_map = folium.Map(location =[tr_lat, tr_lng], zoom_start = 6)

ankara_marker = folium.Marker(
    location = [39.9334,32.8597],
    popup = 'Capital',
    icon = folium.Icon(color ='blue', icon='info-sign')
    )
                             
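# add a CircleMarker for Istanbul with radius 10
istanbul_marker = folium.CircleMarker(
    location = [41.0082, 28.9784],
    radius = 10,
    popup = 'Istanbul'
    )
tr_map.add_child(istanbul_marker)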

tr_map.add_child(ankara_marker)
tr_map

**Police Incidents dataset**

Let's import the data on police department incidents using the *pandas* `read_csv()` function.

In [None]:
# read the dataset into a pandas dataframe:
df_incidents = pd.read_csv('Police_Department_Incidents-2016.csv')

print('Dataset read into a pandas dataframe!')

Let's take a look at the first five items in our dataset.

In [None]:
df_incidents.head()

So each row consists of 13 features:
> 1. **IncidntNum**: Incident Number
> 2. **Category**: Category of crime or incident
> 3. **Descript**: Description of the crime or incident
> 4. **DayOfWeek**: The day of week on which the incident occurred
> 5. **Date**: The Date on which the incident occurred
> 6. **Time**: The time of day on which the incident occurred
> 7. **PdDistrict**: The police department district
> 8. **Resolution**: The resolution of the crime in terms of whether the perpetrator was arrested or not
> 9. **Address**: The closest address to where the incident took place
> 10. **X**: The longitude value of the crime location 
> 11. **Y**: The latitude value of the crime location
> 12. **Location**: A tuple of the latitude and the longitude values
> 13. **PdId**: The police department ID
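Note that `X` holds the longitude and `Y` the latitude, while Folium expects locations as `[latitude, longitude]`. Here is a small sketch of that column swap on a two-row sample; the values are invented for illustration and are not taken from the real dataset.

```python
import csv
import io

# Invented sample rows that mimic the Category, X, and Y columns
sample_csv = """Category,X,Y
EXAMPLE A,-122.40,37.78
EXAMPLE B,-122.45,37.76
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Folium locations are [latitude, longitude], so Y comes before X
locations = [[float(row["Y"]), float(row["X"])] for row in rows]
print(locations)
```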

Let's find out how many entries there are in our dataset.

In [None]:
df_incidents.shape

So the dataframe consists of 150,500 crimes, which took place in the year 2016. In order to reduce computational cost, let's just work with the first 100 incidents in this dataset.

In [None]:
# get the first 100 crimes in the df_incidents dataframe
limit = 100
df_incidents = df_incidents.iloc[0:limit, :]

Let's confirm that our dataframe now consists only of 100 crimes.

In [None]:
df_incidents.shape

Now that we reduced the data, let's visualize where these crimes took place in the city of San Francisco. We will use the default style and we will initialize the zoom level to 12. 

In [None]:
# San Francisco latitude and longitude values
sf_lat = 37.77
sf_lng = -122.42

In [None]:
# create map and display it
sanfran_map = folium.Map(location=[sf_lat, sf_lng], zoom_start=12)

# display the map of San Francisco
sanfran_map

Now let's superimpose the locations of the crimes onto the map. To do that in **Folium**, we will create a *feature group* with its own features and style and then add it to the sanfran_map.

In [None]:
# create map 
sanfran_map = folium.Map(location=[sf_lat, sf_lng], zoom_start=12)

# instantiate a feature group for the incidents in the dataframe
incidents = folium.map.FeatureGroup()

# loop through the 100 crimes and add each to the incidents feature group
for lat, lng in zip(df_incidents.Y, df_incidents.X):
    incidents.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# add incidents to map
sanfran_map.add_child(incidents)


**Question:** Add some markers with pop-up text that is displayed when you click a marker. Make each marker display the category of the crime.

In [None]:
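# start again with a clean copy of the map so the circles are not drawn twice
sanfran_map = folium.Map(location=[sf_lat, sf_lng], zoom_start=12)

incidents = folium.map.FeatureGroup()

# loop through the 100 crimes and add each to the incidents feature group
for lat, lng in zip(df_incidents.Y, df_incidents.X):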
    incidents.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )
# add markers
for lat, lng, label in zip(df_incidents.Y, df_incidents.X, df_incidents.Category):
    folium.Marker(
        location = [lat,lng],
        icon = None,
        popup =label).add_to(incidents)

# add incidents to map
sanfran_map.add_child(incidents)


With 100 markers, the map gets congested. **Marker clusters** fix that by grouping nearby markers, and each cluster is labeled with the number of crimes it contains. These clusters can be thought of as pockets of San Francisco which you can then analyze separately.

To implement this, we start off by instantiating a *MarkerCluster* object and adding all the data points in the dataframe to this object.

In [None]:
from folium import plugins

# let's start again with a clean copy of the map of San Francisco
sanfran_map = folium.Map(location = [sf_lat, sf_lng], zoom_start = 12)

# instantiate a marker cluster object for the incidents in the dataframe
incidents = plugins.MarkerCluster().add_to(sanfran_map)

# loop through the dataframe and add each data point to the marker cluster
for lat, lng, label in zip(df_incidents.Y, df_incidents.X, df_incidents.Category):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(incidents)

# display map
sanfran_map
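To build intuition for what the number on a cluster represents, here is a simplified sketch that groups points into square grid cells and counts them. It is an illustration only: it is not the algorithm the marker cluster plugin uses, and the points, the `grid_counts` helper, and the `cell_size` parameter are all made up.

```python
from collections import Counter

def grid_counts(points, cell_size):
    """Count how many (lat, lng) points fall into each square grid cell."""
    counts = Counter()
    for lat, lng in points:
        # floor division assigns every point to the cell that contains it
        cell = (int(lat // cell_size), int(lng // cell_size))
        counts[cell] += 1
    return counts

# Invented points: three close together and one farther away
points = [
    (37.771, -122.421),
    (37.772, -122.423),
    (37.774, -122.425),
    (37.801, -122.401),
]

counts = grid_counts(points, cell_size=0.01)
print(sorted(counts.values()))
```

The three nearby points end up in one cell and the distant point in another, which mirrors how a cluster label summarizes several markers.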

# Choropleth Maps <a id="8"></a>

A `Choropleth` map is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per-capita income. The choropleth map provides an easy way to visualize how a measurement varies across a geographic area or it shows the level of variability within a region.


Let's create a `Choropleth` map of the world depicting immigration from various countries to Canada.

Read the dataset into a *pandas* dataframe:

In [None]:
df_can = pd.read_excel('Canada.xlsx',
                     sheet_name='Canada by Citizenship',
                     skiprows=20,
                     skipfooter=2)

print('Data read into a dataframe!')

Let's take a look at the first five items in our dataset.

In [None]:
df_can.head()

Let's find out how many entries there are in our dataset.

In [None]:
# print the dimensions of the dataframe
print(df_can.shape)

Clean up data. We will make some modifications to the original dataset to make it easier to create our visualizations. Refer to *Introduction to Matplotlib and Line Plots* and *Area Plots, Histograms, and Bar Plots* notebooks for a detailed description of this preprocessing.

In [None]:
# clean up the dataset to remove unnecessary columns (e.g. REG)
df_can.drop(['AREA','REG','DEV','Type','Coverage'], axis=1, inplace=True)

# let's rename the columns so that they make sense
df_can.rename(columns={'OdName':'Country', 'AreaName':'Continent','RegName':'Region'}, inplace=True)

# add total column (sum only the numeric year columns)
df_can['Total'] = df_can.sum(axis=1, numeric_only=True)

# years that we will be using in this lesson - useful for plotting later on
years = list(map(str, range(1980, 2014)))
print('data dimensions:', df_can.shape)

Let's take a look at the first five items of our cleaned dataframe.

In [None]:
df_can.head()

In order to create a `Choropleth` map, we need a GeoJSON file that defines the areas/boundaries of the state, county, or country that we are interested in. In our case, since we want to create a world map, we need a GeoJSON file that defines the boundaries of all world countries. We will use the **world_countries.json** file, and then create a world map, centered around **[0, 0]** *latitude* and *longitude* values, with an initial zoom level of 2.

In [None]:
# read countries geojson file
world_geo = r'world_countries.json' # geojson file

# create a plain world map
world_map = folium.Map(location=[0, 0], zoom_start=2)
world_map

And now to create a `Choropleth` map, we will use the *choropleth* method with the following main parameters:

1. `geo_data`, which is the GeoJSON file.
2. `data`, which is the dataframe containing the data.
3. `columns`, which represents the columns in the dataframe that will be used to create the `Choropleth` map.
4. `key_on`, which is the key or variable in the GeoJSON file that contains the name of the variable of interest. To determine that, open the GeoJSON file in any text editor and note the name of the key that contains the country names, since the countries are our variable of interest. In this case, **name** is the key in the GeoJSON file that contains the country names. Note that this key is case-sensitive, so you need to pass it exactly as it appears in the GeoJSON file.
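To make the `key_on` path more concrete, the sketch below walks a dotted path such as `feature.properties.name` through a tiny, hypothetical GeoJSON document. The `resolve_key` helper is an assumption for illustration and is not how Folium resolves the key internally.

```python
import json

# A tiny hypothetical GeoJSON document with the same shape as a countries file
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "properties": {"name": "Canada"}, "geometry": null},
    {"type": "Feature", "properties": {"name": "Turkey"}, "geometry": null}
  ]
}
"""

def resolve_key(feature, key_on):
    """Follow a dotted path such as 'feature.properties.name' inside one feature."""
    parts = key_on.split(".")[1:]  # drop the leading 'feature'
    value = feature
    for part in parts:
        value = value[part]  # raises KeyError if the case does not match
    return value

geo = json.loads(geojson_text)
names = [resolve_key(f, "feature.properties.name") for f in geo["features"]]
print(names)
```

The values found this way are what need to match the first entry of `columns` (here, the country names in the dataframe).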

In [None]:
# create a plain world map
world_map = folium.Map(location=[0, 0], zoom_start=2)

# generate choropleth map using the total immigration from each country to Canada from 1980 to 2013
world_map.choropleth(
    geo_data=world_geo,
    data=df_can,
    columns=['Country', 'Total'],
    key_on='feature.properties.name',
    fill_color='Reds', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Immigration to Canada'
)

# display map
world_map

As per our `Choropleth` map legend, the darker the color of a country and the closer the color to red, the higher the number of immigrants from that country. Accordingly, the highest immigration over the 34 years covered by the data (1980 through 2013) was from China, India, and the Philippines, followed by Poland, Pakistan, and interestingly, the US.

Notice how the legend is displaying a negative boundary or threshold. We can fix that by defining our own thresholds and starting with 0 instead of -6,918!

In [None]:
# create a numpy array of 6 linearly spaced values from the minimum to the maximum total immigration
threshold_scale = np.linspace(df_can['Total'].min(),
                              df_can['Total'].max(),
                              6, dtype=int)
threshold_scale = threshold_scale.tolist() # change the numpy array to a list
threshold_scale[-1] = threshold_scale[-1] + 1 # make sure that the last value of the list is greater than the maximum immigration

# pass our own thresholds instead of letting Folium determine the scale
world_map = folium.Map(location=[0, 0], zoom_start=2)
world_map.choropleth(
    geo_data=world_geo,
    data=df_can,
    columns=['Country', 'Total'],
    key_on='feature.properties.name',
    threshold_scale=threshold_scale,
    fill_color='Reds', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Immigration to Canada',
    reset=True
)


world_map
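To see how a threshold scale like the one above sorts values into color bins, here is a pure-Python sketch. It assumes simple left-closed bins and uses hypothetical helpers (`make_thresholds`, `bin_index`) on round numbers; it illustrates the idea of binning rather than Folium's internal implementation.

```python
from bisect import bisect_right

def make_thresholds(lowest, highest, count=6):
    """Evenly spaced integer thresholds, with the last one pushed past the maximum."""
    step = (highest - lowest) / (count - 1)
    thresholds = [int(lowest + i * step) for i in range(count)]
    thresholds[-1] += 1  # keep the maximum value inside the last bin
    return thresholds

def bin_index(value, thresholds):
    """Index of the bin whose lower threshold is at or below the value."""
    return bisect_right(thresholds, value) - 1

thresholds = make_thresholds(0, 100)
print(thresholds)
print([bin_index(v, thresholds) for v in (0, 19, 20, 100)])
```

Six thresholds define five bins, which is why the maximum has to sit strictly below the last threshold to get a color.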

Play around with the data and perhaps create `Choropleth` maps for individual years, or perhaps decades, and see how they compare with the entire period from 1980 to 2013.
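One possible starting point for the per-decade maps is to group the year labels by decade. The sketch below only builds the groups; it assumes the year columns in `df_can` are labeled with these same strings, in which case each list could be used to select columns and sum them into a per-decade total.

```python
# Same list of year labels as defined during the data cleanup
years = list(map(str, range(1980, 2014)))

# Group the year labels by decade, e.g. '1980s' -> ['1980', ..., '1989']
decades = {}
for year in years:
    label = year[:3] + "0s"
    decades.setdefault(label, []).append(year)

for label, cols in decades.items():
    print(label, cols[0], "to", cols[-1], f"({len(cols)} years)")
```

Note that the last group is shorter than the others because the data ends in 2013, so decade totals are not directly comparable without accounting for that.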