
# Lab 7 -  Generating Maps with Python

Estimated time needed: **30** minutes

## Objectives

After completing this lab you will be able to:

-   Visualize geospatial data with Folium


## Introduction

In this lab, we will learn how to create maps for different objectives. To do that, we will part ways with Matplotlib and work with another Python visualization library, **Folium**, which was developed for the sole purpose of visualizing geospatial data. Other libraries that can visualize geospatial data, such as **plotly**, might cap how many API calls you can make within a defined time frame. **Folium**, on the other hand, is completely free.


# Exploring Datasets with _pandas_ and Folium<a id="0"></a>

Toolkits: This lab relies heavily on [_pandas_](http://pandas.pydata.org) and [**NumPy**](http://www.numpy.org) for data wrangling, analysis, and visualization. The primary plotting library we will explore in this lab is [**Folium**](https://github.com/python-visualization/folium/).

Datasets: 

1.  San Francisco Police Department Incidents from 2018 to present - [Police Department Incidents](https://data.sfgov.org/Public-Safety/Map-of-Police-Department-Incident-Reports-2018-to-/jq29-s5wp) from the San Francisco public data portal. Incidents are derived from the San Francisco Police Department (SFPD) Crime Incident Reporting system. The dataset is updated daily and covers 2018 to present. Addresses and locations have been anonymized by moving them to mid-block or to an intersection.

2.  Immigration to the USA from 1980 to 2013 - [International migration flows to and from selected countries - The 2015 revision](http://www.un.org/en/development/desa/population/migration/data/empirical2/migrationflows.shtml) from the United Nations' website. The dataset contains annual data on the flows of international migrants as recorded by the countries of destination. The data presents both inflows and outflows according to the place of birth, citizenship or place of previous / next residence both for foreigners and nationals. For this lesson, we will focus on the USA immigration data.


# Downloading and Prepping Data <a id="2"></a>


Import Primary Modules:


In [1]:
import numpy as np  # useful for many scientific computing tasks in Python
import pandas as pd # primary data structure library

# Introduction to Folium <a id="4"></a>


Folium is a powerful Python library that helps you create several types of Leaflet maps. The fact that the Folium results are interactive makes this library very useful for dashboard building.

From the official Folium documentation page:

> Folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library. Manipulate your data in Python, then visualize it in on a Leaflet map via Folium.

> Folium makes it easy to visualize data that's been manipulated in Python on an interactive Leaflet map. It enables both the binding of data to a map for choropleth visualizations as well as passing Vincent/Vega visualizations as markers on the map.

> The library has a number of built-in tilesets from OpenStreetMap, Mapbox, and Stamen, and supports custom tilesets with Mapbox or Cloudmade API keys. Folium supports both GeoJSON and TopoJSON overlays, as well as the binding of data to those overlays to create choropleth maps with color-brewer color schemes.


#### Let's install **Folium**


**Folium** is not available by default. So, we first need to install it before we are able to import it.


In [2]:
!conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Folium installed and imported!')


Generating the world map is straightforward in **Folium**. You simply create a **Folium** _Map_ object and then display it. What is attractive about **Folium** maps is that they are interactive, so you can zoom into any region of interest regardless of the initial zoom level.


In [3]:
# define the world map
world_map = folium.Map()

# display world map
world_map

Go ahead. Try zooming in and out of the rendered map above.


You can customize this default definition of the world map by specifying the center of your map and the initial zoom level.

All locations on a map are defined by their respective _Latitude_ and _Longitude_ values. So you can create a map and pass in a center of _Latitude_ and _Longitude_ values of **[0, 0]**. 
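Folium expects a location as `[latitude, longitude]`, in that order, and passing the two values swapped is an easy mistake to make. The following is a minimal sketch of a sanity check; the helper name `is_valid_location` is hypothetical and not part of Folium.

```python
def is_valid_location(location):
    """Return True if location is a plausible [latitude, longitude] pair."""
    lat, lon = location
    return -90 <= lat <= 90 and -180 <= lon <= 180


usa_center = [37.0902, -95.7129]   # [latitude, longitude], as used in this lab
swapped = [-95.7129, 37.0902]      # same values in the wrong order

print(is_valid_location(usa_center))
print(is_valid_location(swapped))
```

Note that this check only catches a swap when the longitude falls outside the valid latitude range of -90 to 90, so it is a safety net rather than a guarantee.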

For a defined center, you can also define the initial zoom level into that location when the map is rendered. **The higher the zoom level the more the map is zoomed into the center**.

Let's create a map centered around the USA and play with the zoom level to see how it affects the rendered map.


In [4]:
# define the world map centered around USA with a low zoom level
world_map = folium.Map(location=[37.0902, -95.7129], zoom_start=4)

# display world map
world_map

Let's create the map again with a higher zoom level


In [5]:
# define the world map centered around USA with a higher zoom level
world_map = folium.Map(location=[37.0902, -95.7129], zoom_start=8)

# display world map
world_map

As you can see, the higher the zoom level the more the map is zoomed into the given center.
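To get a feel for what a zoom level means numerically, the sketch below assumes the standard web-map tiling scheme used by Leaflet's default OpenStreetMap tiles, in which zoom level 0 shows the whole world in a single tile and each additional level doubles the number of tiles along each axis. The function names are illustrative and not part of Folium.

```python
def tiles_per_axis(zoom):
    """Number of tiles along one axis of the world map at a zoom level."""
    return 2 ** zoom


def degrees_of_longitude_per_tile(zoom):
    """Width of a single tile in degrees of longitude."""
    return 360 / tiles_per_axis(zoom)


for zoom in (0, 4, 8, 12):
    print(zoom, tiles_per_axis(zoom), degrees_of_longitude_per_tile(zoom))
```

Under this assumption, going from zoom level 4 to zoom level 8 shrinks each tile's width from 22.5 to about 1.4 degrees of longitude, which is why the second map above shows a much smaller area.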


**Question**: Create a map of Mexico with a zoom level of 4 (you can look up the coordinates online).


In [7]:
### type your answer here
# define the world map centered around Mexico with a zoom level of 4
world_map = folium.Map(location=[23.6260, -102.5375], zoom_start=4)

# display world map
world_map

Another cool feature of **Folium** is that you can generate different map styles.


### A. Stamen Toner Maps

These are high-contrast B+W (black and white) maps. They are perfect for data mashups and exploring river meanders and coastal zones. 


Let's create a Stamen Toner map of the USA with a zoom level of 4.


In [8]:
# create a Stamen Toner map of the world centered around the USA
world_map = folium.Map(location=[37.0902, -95.7129], zoom_start=4, tiles='Stamen Toner')

# display map
world_map

Feel free to zoom in and out to see how this style compares to the default one.


### B. Stamen Terrain Maps

These are maps that feature hill shading and natural vegetation colors. They showcase advanced labeling and linework generalization of dual-carriageway roads.


Let's create a Stamen Terrain map of the USA with zoom level 4.


In [9]:
# create a Stamen Terrain map of the world centered around the USA
world_map = folium.Map(location=[37.0902, -95.7129], zoom_start=4, tiles='Stamen Terrain')

# display map
world_map

Feel free to zoom in and out to see how this style compares to Stamen Toner and the default style.


Next, let's create a default-style map of the USA and mark a single location on it, in this case Texas.


In [10]:
USA_map = folium.Map(
    location=[37.0902, -95.7129],
    zoom_start=4
)
USA_map

In [11]:
# create a feature group
Texas = folium.map.FeatureGroup()

# create a circle marker and add it to the feature group
Texas.add_child(
    folium.features.CircleMarker(
        [31.9686, -99.9018],
        radius=5,
        color='red',
        fill=True,
        fill_color='red'
    )
)

# add the feature group to the map
USA_map.add_child(Texas)

# label the location with a marker whose pop-up reads 'Texas'
folium.Marker([31.9686, -99.9018], popup='Texas').add_to(USA_map)

# display the map
USA_map


Zoom in and notice how the borders start showing, and that the displayed country names are in English.


**Question**: Create a map of Mexico to visualize its hill shading and natural vegetation. Use a zoom level of 6.


In [12]:
### type your answer here
world_map = folium.Map(location=[23.6260, -102.5375], zoom_start=6, tiles='Stamen Terrain')

# display map
world_map

# Maps with Markers <a id="6"></a>


Let's download and import the data on police department incidents using the _pandas_ `read_csv()` function.


Download the dataset and read it into a _pandas_ dataframe:


In [13]:
df_incidents = pd.read_csv('Police_Department_Incident_Reports__2018_to_Present.csv')

print('Dataset downloaded and read into a pandas dataframe!')

Dataset downloaded and read into a pandas dataframe!


Let's take a look at the first five items in our dataset.


In [14]:
df_incidents.head()

Unnamed: 0,Incident Datetime,Incident Date,Incident Time,Incident Year,Incident Day of Week,Report Datetime,Row ID,Incident ID,Incident Number,CAD Number,...,SF Find Neighborhoods,Current Police Districts,Current Supervisor Districts,Analysis Neighborhoods,HSOC Zones as of 2018-06-05,OWED Public Spaces,Central Market/Tenderloin Boundary Polygon - Updated,Parks Alliance CPSI (27+TL sites),ESNCAG - Boundary File,"Areas of Vulnerability, 2016"
0,2018/01/01 09:26:00 AM,2018/01/01,09:26,2018,Monday,2018/01/01 09:27:00 AM,61893007041,618930,171052174,173641140.0,...,88.0,2.0,9.0,1.0,,,,,,2.0
1,2018/01/01 02:30:00 AM,2018/01/01,02:30,2018,Monday,2018/01/01 08:21:00 AM,61893105041,618931,180000768,180010668.0,...,90.0,9.0,1.0,7.0,,,,,,2.0
2,2018/01/01 10:00:00 AM,2018/01/01,10:00,2018,Monday,2018/01/01 10:20:00 AM,61893275000,618932,180000605,180010893.0,...,20.0,4.0,10.0,36.0,,,1.0,,,2.0
3,2018/01/01 10:03:00 AM,2018/01/01,10:03,2018,Monday,2018/01/01 10:04:00 AM,61893565015,618935,180000887,180011579.0,...,,9.0,1.0,28.0,,,,,,1.0
4,2018/01/01 09:01:00 AM,2018/01/01,09:01,2018,Monday,2018/01/01 09:39:00 AM,61893607041,618936,171052958,180011403.0,...,106.0,6.0,3.0,6.0,,,,,,2.0


The data has a lot of columns that we don't need, so we are going to drop those. We will also rename the remaining columns so that their names don't contain spaces, which may cause problems later (for example, `df_incidents.Category`-style access only works for names without spaces):

In [15]:
# clean up the dataset to remove unnecessary columns (e.g. Incident Year)
df_incidents.drop(['Incident Datetime', 'Incident Year', 'Report Datetime', 'Row ID', 'CAD Number','Report Type Code','Filed Online','Incident Code', 'Incident Subcategory','Analysis Neighborhood','Supervisor District','SF Find Neighborhoods','Current Police Districts','Current Supervisor Districts','Analysis Neighborhoods','HSOC Zones as of 2018-06-05','OWED Public Spaces','Central Market/Tenderloin Boundary Polygon - Updated','Parks Alliance CPSI (27+TL sites)','ESNCAG - Boundary File','Areas of Vulnerability, 2016','CNN'], axis=1, inplace=True)



In [16]:
# let's rename the columns 
df_incidents.rename(columns={'Incident Date':'Date', 'Incident Time':'Time','Incident ID':'PDId','Incident Number':'IncidntNum','Incident Category':'Category','Incident Description':'Descript','Resolution':'Resolution','Intersection':'Address','Police District':'PdDistrict','Latitude':'Y','Longitude':'X','point':'Location', 'Incident Day of Week':'DayOfWeek','Report Type Description':'ReportType' }, inplace=True)

In [17]:
df_incidents

Unnamed: 0,Date,Time,DayOfWeek,PDId,IncidntNum,ReportType,Category,Descript,Resolution,Address,PdDistrict,Y,X,Location
0,2018/01/01,09:26,Monday,618930,171052174,Vehicle Supplement,Recovered Vehicle,"Vehicle, Recovered, Auto",Open or Active,03RD ST \ HOLLISTER AVE,Southern,37.721716,-122.395944,"(37.72171587946975, -122.39594382884452)"
1,2018/01/01,02:30,Monday,618931,180000768,Initial,Burglary,"Burglary, Residence, Forcible Entry",Open or Active,LISBON ST \ PERSIA AVE,Ingleside,37.722000,-122.433606,"(37.722000219874225, -122.43360633930074)"
2,2018/01/01,10:00,Monday,618932,180000605,Initial Supplement,Missing Person,Found Person,Open or Active,VAN NESS AVE \ WILLOW ST,Northern,37.783370,-122.420832,"(37.78337048750076, -122.42083185184009)"
3,2018/01/01,10:03,Monday,618935,180000887,Initial,Other Miscellaneous,"Driving, No License Issued",Cite or Arrest Adult,BRAZIL AVE \ MISSION ST,Ingleside,37.724683,-122.434798,"(37.72468255342173, -122.43479841474401)"
4,2018/01/01,09:01,Monday,618936,171052958,Vehicle Supplement,Recovered Vehicle,"Vehicle, Recovered, Auto",Open or Active,CUSTOM HOUSE PL \ JACKSON ST,Central,37.796698,-122.401294,"(37.796698028315056, -122.40129440446798)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
440015,2021/03/01,20:00,Monday,1009963,216022541,Coplogic Initial,Larceny Theft,"Theft, Other Property, >$950",Open or Active,GEARY ST \ JONES ST,Central,37.786730,-122.413161,"(37.78672974391054, -122.41316144562205)"
440016,2021/03/03,06:00,Wednesday,1010092,210139564,Initial,Suspicious Occ,Suspicious Occurrence,Open or Active,STOCKTON ST \ BAY ST,Central,37.806028,-122.410311,"(37.8060277166905, -122.41031084283831)"
440017,2021/02/27,15:00,Saturday,1010097,210140282,Initial,Embezzlement,"Embezzlement, Grand Theft By Employee",Open or Active,PAGE ST \ CENTRAL AVE,Park,37.771358,-122.443836,"(37.771358111902316, -122.44383614984872)"
440018,2021/03/03,07:17,Wednesday,1010106,210139611,Initial,Malicious Mischief,"Malicious Mischief, Vandalism to Property",Open or Active,05TH ST \ HOWARD ST,Southern,37.781500,-122.404933,"(37.781499507548546, -122.40493334783943)"


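Each row now has 14 columns: the report type description (for example, *Initial* or *Vehicle Supplement*) plus the following 13 features: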

> 1.  **IncidntNum**: Incident Number
> 2.  **Category**: Category of crime or incident
> 3.  **Descript**: Description of the crime or incident
> 4.  **DayOfWeek**: The day of week on which the incident occurred
> 5.  **Date**: The Date on which the incident occurred
> 6.  **Time**: The time of day on which the incident occurred
> 7.  **PdDistrict**: The police department district
> 8.  **Resolution**: The resolution of the crime in terms of whether the perpetrator was arrested or not
> 9.  **Address**: The closest address to where the incident took place
> 10. **X**: The longitude value of the crime location 
> 11. **Y**: The latitude value of the crime location
> 12. **Location**: The latitude and longitude values as a `(latitude, longitude)` pair, stored as text
> 13. **PDId**: The police department ID
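
Because the dataframe is read from a CSV file, the **Location** values arrive as text rather than as Python tuples. The **Y** and **X** columns already provide the numbers, but if you ever need to recover them from **Location**, a minimal sketch with the standard library could look like this. It assumes the value has the `(latitude, longitude)` text form shown in the table above; the sample string is copied from the first row.

```python
import ast

# Location value of the first incident, as it appears in the table above
location_text = "(37.72171587946975, -122.39594382884452)"

# literal_eval safely parses Python literals such as tuples of numbers
lat, lng = ast.literal_eval(location_text)

print(lat, lng)
```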


Let's find out how many entries there are in our dataset.


In [18]:
df_incidents.shape

(440020, 14)

So the dataframe consists of 440,020 incidents reported from 2018 onward. To reduce computational cost, let's work with just the first 100 incidents in this dataset.


In [19]:
# get the first 100 crimes in the df_incidents dataframe
limit = 100
df_incidents = df_incidents.iloc[0:limit, :]

Let's confirm that our dataframe now consists only of 100 crimes.


In [20]:
df_incidents.shape

(100, 14)

Now that we have reduced the data, let's visualize where these crimes took place in the city of San Francisco. We will use the default style and initialize the zoom level to 12.


In [21]:
# San Francisco latitude and longitude values
latitude = 37.77
longitude = -122.42

In [22]:
# create map and display it
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# display the map of San Francisco
sanfran_map
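
The center above is hard-coded. An alternative, sketched here under the assumption that the data points are spread fairly evenly around the area of interest, is to center the map on the mean of the coordinates. The sample points are the first five incidents from the table shown earlier.

```python
from statistics import mean

# (latitude, longitude) of the first five incidents shown in the table above
sample_points = [
    (37.721716, -122.395944),
    (37.722000, -122.433606),
    (37.783370, -122.420832),
    (37.724683, -122.434798),
    (37.796698, -122.401294),
]

center_lat = mean(lat for lat, _ in sample_points)
center_lng = mean(lng for _, lng in sample_points)

print(round(center_lat, 4), round(center_lng, 4))
```

With the real dataframe, the same idea would be `df_incidents.Y.mean()` and `df_incidents.X.mean()`.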

Now let's superimpose the locations of the crimes onto the map. The way to do that in **Folium** is to create a _feature group_ with its own features and style and then add it to the sanfran_map.


In [23]:
# instantiate a feature group for the incidents in the dataframe
incidents = folium.map.FeatureGroup()

# loop through the 100 crimes and add each to the incidents feature group
for lat, lng in zip(df_incidents.Y, df_incidents.X):
    incidents.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# add incidents to map
sanfran_map.add_child(incidents)

You can also add some pop-up text that gets displayed when you click a marker. Let's make each marker display the category of the crime when clicked.


In [24]:
# instantiate a feature group for the incidents in the dataframe
incidents = folium.map.FeatureGroup()

# loop through the 100 crimes and add each to the incidents feature group
for lat, lng in zip(df_incidents.Y, df_incidents.X):
    incidents.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# add pop-up text to each marker on the map
latitudes = list(df_incidents.Y)
longitudes = list(df_incidents.X)
labels = list(df_incidents.Category)

for lat, lng, label in zip(latitudes, longitudes, labels):
    folium.Marker([lat, lng], popup=label).add_to(sanfran_map)    
    
# add incidents to map
sanfran_map.add_child(incidents)

Isn't this really cool? Now you are able to know what crime category occurred at each marker.

If you find the map too congested with all these markers, there are two remedies. The simpler one is to remove the location markers and attach the pop-up text to the circle markers themselves, as follows:


In [25]:
# create map and display it
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# loop through the 100 crimes and add each to the map
for lat, lng, label in zip(df_incidents.Y, df_incidents.X, df_incidents.Category):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5, # define how big you want the circle markers to be
        color='yellow',
        fill=True,
        popup=label,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(sanfran_map)

# show map
sanfran_map

The other, more scalable remedy is to group nearby markers into clusters. Each cluster is then labeled with the number of crimes it contains. These clusters can be thought of as pockets of San Francisco which you can then analyze separately.

To implement this, we start off by instantiating a _MarkerCluster_ object and adding all the data points in the dataframe to this object.


In [26]:
from folium import plugins

# let's start again with a clean copy of the map of San Francisco
sanfran_map = folium.Map(location = [latitude, longitude], zoom_start = 12)

# instantiate a marker cluster object for the incidents in the dataframe
incidents = plugins.MarkerCluster().add_to(sanfran_map)

# loop through the dataframe and add each data point to the marker cluster
for lat, lng, label in zip(df_incidents.Y, df_incidents.X, df_incidents.Category):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(incidents)

# display map
sanfran_map

Notice how when you zoom out all the way, all markers are grouped into one cluster, _the global cluster_, of 100 markers or crimes, which is the total number of crimes in our dataframe. Once you start zooming in, the _global cluster_ will start breaking up into smaller clusters. Zooming in all the way will result in individual markers.
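
The zoom-dependent grouping can be illustrated with a deliberately simplified sketch: bucket the points by the web-map tile they fall into at a given zoom level. This is not the algorithm the `MarkerCluster` plugin uses internally; it is only a stand-in that reproduces the same qualitative behavior. The helper names are hypothetical, and the points are the first five incidents from the table shown earlier.

```python
import math
from collections import Counter


def tile_of(lat, lng, zoom):
    """Return the (x, y) web-map tile containing a point at a zoom level."""
    n = 2 ** zoom
    x = int((lng + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y


def cluster_counts(points, zoom):
    """Count how many points fall into each tile at the given zoom level."""
    return Counter(tile_of(lat, lng, zoom) for lat, lng in points)


# (latitude, longitude) of the first five incidents shown earlier
points = [
    (37.721716, -122.395944),
    (37.722000, -122.433606),
    (37.783370, -122.420832),
    (37.724683, -122.434798),
    (37.796698, -122.401294),
]

# print the number of groups at a low, medium, and high zoom level
for zoom in (0, 12, 18):
    print(zoom, len(cluster_counts(points, zoom)))
```

At zoom level 0 every point lands in the same tile, mirroring the single global cluster, while at a high zoom level each point ends up in its own tile.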


# Choropleth Maps <a id="8"></a>

A `Choropleth` map is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per-capita income. The choropleth map provides an easy way to visualize how a measurement varies across a geographic area or it shows the level of variability within a region. 




Now, let's create our own `Choropleth` map of the world depicting immigration from various countries to the USA.

Let's start with a plain world map at an initial zoom level of 2.


In [27]:
world_map = folium.Map(zoom_start=2)


In [28]:
world_map

Next, download our primary USA immigration dataset and read it into a _pandas_ dataframe using the `read_excel()` function:


In [29]:
df_USA = pd.read_excel('UnitedStatesofAmerica.xlsx',
                       sheet_name='USA by Place of birth',
                       skiprows=range(20),
                       skipfooter=2)

print('Data read into a pandas dataframe!')

Data read into a pandas dataframe!


Let's take a look at the first five items in our dataset.


In [30]:
df_USA.head()

Unnamed: 0,Type,Coverage,OdName,AREA,AreaName,REG,RegName,DEV,DevName,1980,...,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013
0,Immigrants,Foreigners,Afghanistan,935,Asia,5501,Southern Asia,902,Less developed regions,722,...,2137,4749,3417,1753,2813,3165,2017,1648,1617,2196
1,Immigrants,Foreigners,Albania,908,Europe,925,Southern Europe,901,More developed regions,30,...,3840,5947,7914,5737,5754,5137,4711,3612,3364,3186
2,Immigrants,Foreigners,Algeria,903,Africa,912,Northern Africa,902,Less developed regions,175,...,805,1115,1300,1036,1037,1485,1305,1364,1369,1241
3,Immigrants,Foreigners,American Samoa,909,Oceania,957,Polynesia,902,Less developed regions,0,...,12,15,28,11,14,19,14,D,-,D
4,Immigrants,Foreigners,Andorra,908,Europe,925,Southern Europe,901,More developed regions,2,...,..,..,..,..,..,..,..,..,..,..


Let's find out how many entries there are in our dataset.


In [31]:
# print the dimensions of the dataframe
print(df_USA.shape)

(219, 43)


In [32]:

# clean up the dataset to remove unnecessary columns (e.g. REG)
df_USA.drop(['AREA', 'REG', 'DEV', 'Type', 'Coverage'], axis=1, inplace=True)

# let's rename the columns so that they make sense
df_USA.rename(columns={'OdName':'Country', 'AreaName':'Continent','RegName':'Region'}, inplace=True)

# for sake of consistency, let's also make all column labels of type string
df_USA.columns = list(map(str, df_USA.columns))

# years that we will be using in this lesson - useful for plotting later on
years = list(map(str, range(1980, 2014)))

# change all invalid values ('D', '..', '-') to zero and make the year columns numeric
df_USA[years] = df_USA[years].replace(['D', '..', '-'], 0).apply(pd.to_numeric)

# add total column, summing only the year columns
df_USA['Total'] = df_USA[years].sum(axis=1)

print('data dimensions:', df_USA.shape)


data dimensions: (219, 39)
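
To see what replacing the invalid values with zero does to a total, here is a small standard-library sketch on a single partial row. The values are the 2004 to 2013 entries for American Samoa copied from the raw table above; the helper `to_count` is hypothetical and simply mirrors the replacement step.

```python
# 2004-2013 values for American Samoa, copied from the raw table above
raw_values = [12, 15, 28, 11, 14, 19, 14, 'D', '-', 'D']

PLACEHOLDERS = {'D', '..', '-'}  # non-numeric placeholders found in the data


def to_count(value):
    """Map placeholder strings to 0 and leave real counts unchanged."""
    return 0 if value in PLACEHOLDERS else int(value)


total = sum(to_count(v) for v in raw_values)
print(total)
```

Keep in mind that a placeholder replaced by zero is counted as no immigrants at all, so totals for countries with many placeholders are best read as lower bounds.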


Let's take a look at the first five items of our cleaned dataframe.


In [33]:
df_USA.head()

Unnamed: 0,Country,Continent,Region,DevName,1980,1981,1982,1983,1984,1985,...,2005,2006,2007,2008,2009,2010,2011,2012,2013,Total
0,Afghanistan,Asia,Southern Asia,Less developed regions,722,1881,1569,2566,3222,2794,...,4749,3417,1753,2813,3165,2017,1648,1617,2196,74430
1,Albania,Europe,Southern Europe,More developed regions,30,11,23,22,32,45,...,5947,7914,5737,5754,5137,4711,3612,3364,3186,87380
2,Algeria,Africa,Northern Africa,Less developed regions,175,184,190,201,197,202,...,1115,1300,1036,1037,1485,1305,1364,1369,1241,23281
3,American Samoa,Oceania,Polynesia,Less developed regions,0,0,7,7,0,0,...,15,28,11,14,19,14,0,0,0,220
4,Andorra,Europe,Southern Europe,More developed regions,2,3,2,1,0,0,...,0,0,0,0,0,0,0,0,0,18


In order to create a `Choropleth` map, we need a GeoJSON file that defines the areas/boundaries of the state, county, or country that we are interested in. In our case, since we are endeavoring to create a world map, we want a GeoJSON that defines the boundaries of all world countries. You have been provided with this file via Moodle. Let's name it **world_countries.json**.


Now that we have the GeoJSON file, let's create a world map, centered around **[0, 0]** _latitude_ and _longitude_ values, with an initial zoom level of 2.


In [34]:
world_geo = r'world_countries.json' # geojson file

# create a plain world map
world_map = folium.Map(location=[0, 0], zoom_start=2)


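To create the `Choropleth` map, we will use the map's _choropleth_ method (available in the Folium version pinned above, 0.5.0; later releases deprecate it in favor of the `folium.Choropleth` class) with the following main parameters: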

1.  `geo_data`, which is the GeoJSON file.
2.  `data`, which is the dataframe containing the data.
3.  `columns`, which represents the columns in the dataframe that will be used to create the `Choropleth` map.
4.  `key_on`, which is the key or variable in the GeoJSON file that contains the name of the variable of interest. To determine that, open the GeoJSON file in any text editor and note the name of the key that contains the country names, since the countries are our variable of interest. In this case, **name** is that key, so we pass `feature.properties.name`. Note that this key is case-sensitive, so you need to pass it exactly as it exists in the GeoJSON file.
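
Instead of reading the GeoJSON file by eye, you can also inspect it with a few lines of Python. The sketch below uses a tiny made-up stand-in for **world_countries.json** (two features, no geometry) so that it is self-contained; it lists the available property keys and then checks which country names in the data have no counterpart in the GeoJSON.

```python
import json

# A tiny stand-in for world_countries.json (hypothetical, two features only)
geojson_text = '''
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature", "properties": {"name": "Mexico"}, "geometry": null},
   {"type": "Feature", "properties": {"name": "Canada"}, "geometry": null}
 ]}
'''
world = json.loads(geojson_text)

# inspect which property keys are available on a feature
property_keys = list(world['features'][0]['properties'].keys())
print(property_keys)

# key_on='feature.properties.name' refers to this value on every feature
geo_names = {feature['properties']['name'] for feature in world['features']}

# names in the data without a matching feature cannot be drawn on the map
data_countries = ['Mexico', 'Canada', 'Atlantis']  # 'Atlantis' is made up
unmatched = [c for c in data_countries if c not in geo_names]
print(unmatched)
```

With the real file, you would load it with `json.load(open('world_countries.json'))` and compare against `df_USA['Country']`; spelling differences between the two sources are a common reason for countries that stay uncolored.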


In [35]:
# generate choropleth map using the total immigration of each country to USA from 1980 to 2013
world_map.choropleth(
    geo_data=world_geo,
    data=df_USA,
    columns=['Country', 'Total'],
    key_on='feature.properties.name',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Immigration to USA'
)

# display map
world_map

As per our Choropleth map legend, the darker the color of a country and the closer that color is to red, the higher the number of immigrants from that country. Accordingly, the highest immigration over the 34 years of data (1980 to 2013, inclusive) was from Mexico.

Notice how the legend is displaying a negative boundary or threshold. Let's fix that by defining our own thresholds, starting with 0 instead of -60,780!

In [36]:
world_geo = r'world_countries.json'

# create a numpy array of length 6 with linear spacing from the minimum total immigration to the maximum total immigration
threshold_scale = np.linspace(df_USA['Total'].min(),
                              df_USA['Total'].max(),
                              6, dtype=int)
threshold_scale = threshold_scale.tolist() # change the numpy array to a list
threshold_scale[-1] = threshold_scale[-1] + 1 # make sure that the last value of the list is greater than the maximum immigration

# pass our own thresholds to Folium instead of letting it determine the scale
world_map = folium.Map(location=[0, 0], zoom_start=2)
world_map.choropleth(
    geo_data=world_geo,
    data=df_USA,
    columns=['Country', 'Total'],
    key_on='feature.properties.name',
    threshold_scale=threshold_scale,
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Immigration to USA',
    reset=True
)
world_map
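
The threshold computation above can be traced on small numbers. The sketch below is a pure-Python stand-in for the `np.linspace(..., dtype=int)` call followed by the `+ 1` adjustment; `linear_thresholds` is a hypothetical helper, and the input range is made up for illustration.

```python
def linear_thresholds(lowest, highest, bins=6):
    """Return `bins` evenly spaced integer thresholds from lowest to highest."""
    step = (highest - lowest) / (bins - 1)
    thresholds = [int(lowest + i * step) for i in range(bins)]
    thresholds[-1] += 1  # make the last threshold greater than the maximum
    return thresholds


# made-up range for illustration only
print(linear_thresholds(0, 1000))
```

Because the first threshold is the minimum of the data, which is 0 or more for immigration counts, the legend no longer starts at a negative value.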