# Maps

# Introduction

This is Part 6 of the project "Data Visualization with Python". The imported data were read into pandas dataframes. Pandas and NumPy were used for data wrangling, data analysis, and data visualization. Various types of maps were created, primarily with the Folium library.

Author: Avery Jan

Date: 4-30-2022

***
# Datasets

### San Francisco Police Department Incidents for the year 2016:

The data came from the San Francisco public data portal. The incidents are derived from the San Francisco Police Department (SFPD) Crime Incident Reporting system. The source is updated daily, and this dataset shows data for the entire year of 2016. Addresses and locations have been anonymized by moving them to mid-block or to an intersection. The data was modified by IBM Developer Skills Network, IBM Corporation 2020.
https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/Police_Department_Incidents_-_Previous_Year__2016_.csv


### Immigration to Canada from 1980 to 2013:

The data came from the United Nations website. The dataset contains annual data on the flows of international migrants as recorded by the countries of destination. The data present both inflows and outflows according to the place of birth, citizenship, or place of previous / next residence, for both foreigners and nationals. This part of the project uses the Canadian Immigration data subset. The subset was created by IBM Developer Skills Network, IBM Corporation 2020.
https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/Canada.xlsx
***

# Downloading and Reading Data into Dataframe <a id="2"></a>


In [1]:
# Install openpyxl, the module pandas needs to read .xlsx files (older setups used xlrd for this), and folium.

import piplite

await piplite.install(['openpyxl==3.0.9', 'folium'])

In [2]:
# Import two key data analysis modules: pandas and numpy.

import numpy as np  # useful for scientific computing in Python
import pandas as pd # primary data structure library




# I. Creating Maps using Folium


In [2]:
# Install Folium.
!pip3 install folium==0.5.0

# Import folium library
import folium

print('Folium installed and imported!')

Folium installed and imported!


In [10]:
# Define the world map centered around Canada with a low zoom level.
world_map = folium.Map(location=[56.130, -106.35], zoom_start=4)

# Display world map.
world_map

In [9]:
# Define the world map centered around Canada with a zoom level of 8.
world_map = folium.Map(location=[56.130, -106.35], zoom_start=8)

# Display world map.
world_map

Observation: The higher the zoom level, the more the map is zoomed in on the given center.
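
The effect of the zoom level can be made concrete with a small sketch. It assumes the standard Web Mercator tiling used by OpenStreetMap-style tile servers, in which the world is divided into 2^zoom by 2^zoom tiles. The helper names are illustrative and are not part of Folium.

```python
def tiles_per_side(zoom):
    """Number of tiles along one axis of the world map at this zoom level."""
    return 2 ** zoom


def degrees_of_longitude_per_tile(zoom):
    """Width of a single tile in degrees of longitude."""
    return 360 / tiles_per_side(zoom)


# Compare the two zoom levels used for the maps of Canada above.
for zoom in (4, 8):
    print(zoom, tiles_per_side(zoom), degrees_of_longitude_per_tile(zoom))
```

Under this assumption, each additional zoom level halves the width of a tile, so going from zoom level 4 to 8 shrinks the area covered by one tile by a factor of 16 in each direction.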


In [11]:
# Create a map of Mexico with a zoom level of 4.

# define Mexico's geolocation coordinates
mexico_latitude = 23.6345
mexico_longitude = -102.5528

# define the map centered around Mexico with a low zoom level
mexico_map = folium.Map(location=[mexico_latitude, mexico_longitude], zoom_start=4)

# display the map of Mexico
mexico_map


# II. Generating Different Map Styles


### (a) Stamen Toner Maps

These are high-contrast B+W (black and white) maps. They are perfect for data mashups and exploring river meanders and coastal zones.


In [12]:
# Create a Stamen Toner map of Canada with a zoom level of 4.
world_map = folium.Map(location=[56.130, -106.35], zoom_start=4, tiles='Stamen Toner')

# display map
world_map

### (b) Stamen Terrain Maps

These are maps that feature hill shading and natural vegetation colors. They showcase advanced labeling and linework generalization of dual-carriageway roads.


In [13]:
# Create a Stamen Terrain map of Canada with zoom level 4.
world_map = folium.Map(location=[56.130, -106.35], zoom_start=4, tiles='Stamen Terrain')

# Display map.
world_map

In [14]:
# Create a map of Mexico to visualize its hill shading and natural vegetation at a zoom level of 6.

#define Mexico's geolocation coordinates
mexico_latitude = 23.6345
mexico_longitude = -102.5528

# define the map centered around Mexico with a higher zoom level
mexico_map = folium.Map(location=[mexico_latitude, mexico_longitude], zoom_start=6, tiles='Stamen Terrain')

# display the map of Mexico
mexico_map


## (c) Maps with Markers <a id="6"></a>


In [15]:
# Download the dataset and read it into a pandas dataframe.

from js import fetch
import io

URL = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/Police_Department_Incidents_-_Previous_Year__2016_.csv'
resp = await fetch(URL)
text = io.BytesIO((await resp.arrayBuffer()).to_py())

df_incidents =  pd.read_csv(text)

print('Dataset downloaded and read into a pandas dataframe!')

Dataset downloaded and read into a pandas dataframe!


In [16]:
# View the first five rows of the dataset.
df_incidents.head()

Unnamed: 0,IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,Address,X,Y,Location,PdId
0,120058272,WEAPON LAWS,POSS OF PROHIBITED WEAPON,Friday,01/29/2016 12:00:00 AM,11:00,SOUTHERN,"ARREST, BOOKED",800 Block of BRYANT ST,-122.403405,37.775421,"(37.775420706711, -122.403404791479)",12005827212120
1,120058272,WEAPON LAWS,"FIREARM, LOADED, IN VEHICLE, POSSESSION OR USE",Friday,01/29/2016 12:00:00 AM,11:00,SOUTHERN,"ARREST, BOOKED",800 Block of BRYANT ST,-122.403405,37.775421,"(37.775420706711, -122.403404791479)",12005827212168
2,141059263,WARRANTS,WARRANT ARREST,Monday,04/25/2016 12:00:00 AM,14:59,BAYVIEW,"ARREST, BOOKED",KEITH ST / SHAFTER AV,-122.388856,37.729981,"(37.7299809672996, -122.388856204292)",14105926363010
3,160013662,NON-CRIMINAL,LOST PROPERTY,Tuesday,01/05/2016 12:00:00 AM,23:50,TENDERLOIN,NONE,JONES ST / OFARRELL ST,-122.412971,37.785788,"(37.7857883766888, -122.412970537591)",16001366271000
4,160002740,NON-CRIMINAL,LOST PROPERTY,Friday,01/01/2016 12:00:00 AM,00:30,MISSION,NONE,16TH ST / MISSION ST,-122.419672,37.76505,"(37.7650501214668, -122.419671780296)",16000274071000


Each row consists of 13 features (the leading unnamed column in the output above is the dataframe index):

> 1.  **IncidntNum**: Incident Number
> 2.  **Category**: Category of crime or incident
> 3.  **Descript**: Description of the crime or incident
> 4.  **DayOfWeek**: The day of week on which the incident occurred
> 5.  **Date**: The date on which the incident occurred
> 6.  **Time**: The time of day on which the incident occurred
> 7.  **PdDistrict**: The police department district
> 8.  **Resolution**: The resolution of the crime in terms of whether the perpetrator was arrested or not
> 9.  **Address**: The closest address to where the incident took place
> 10. **X**: The longitude value of the crime location
> 11. **Y**: The latitude value of the crime location
> 12. **Location**: A tuple of the latitude and the longitude values
> 13. **PdId**: The police department ID
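
Because `X` holds the longitude and `Y` holds the latitude, while Folium expects a location as `[latitude, longitude]`, the two are easy to swap. The sketch below uses the values from the first row shown above and a hypothetical helper, `parse_location`, to parse the `Location` string and check its ordering against `Y` and `X`.

```python
import ast

# Values copied from the first row of the table above.
row = {
    "X": -122.403405,   # longitude
    "Y": 37.775421,     # latitude
    "Location": "(37.775420706711, -122.403404791479)",
}


def parse_location(text):
    """Turn a '(lat, lng)' string into a (lat, lng) tuple of floats."""
    lat, lng = ast.literal_eval(text)
    return float(lat), float(lng)


lat, lng = parse_location(row["Location"])

# Location is ordered (latitude, longitude), which matches (Y, X), not (X, Y).
assert abs(lat - row["Y"]) < 1e-6
assert abs(lng - row["X"]) < 1e-6
print([lat, lng])  # the order Folium expects for a marker location
```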


In [17]:
# Get the number of entries in the dataset.
df_incidents.shape

(150500, 13)

In [18]:
# Limit to the first 100 crimes in the df_incidents dataframe. (The dataframe includes 150,500 crimes.)
limit = 100
df_incidents = df_incidents.iloc[0:limit, :]

In [19]:
# Confirm that the dataframe now consists only of 100 crimes.
df_incidents.shape

(100, 13)

In [20]:
# Goal: Visualize where these crimes took place in the city of San Francisco.

# San Francisco latitude and longitude values
latitude = 37.77
longitude = -122.42

In [25]:
# Goal: Add pop-up text (the category of the crime) that is displayed when a marker is clicked.

# Step 1: Create the map of San Francisco and instantiate a feature group for the incidents in the dataframe.
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)
incidents = folium.map.FeatureGroup()

# Step 2: Loop through the 100 crimes and add each to the incidents feature group.
for lat, lng in zip(df_incidents.Y, df_incidents.X):
    incidents.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5, # Define how big you want the circle markers to be.
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# Step 3: Add pop-up text to each marker on the map.
latitudes = list(df_incidents.Y)
longitudes = list(df_incidents.X)
labels = list(df_incidents.Category)

for lat, lng, label in zip(latitudes, longitudes, labels):
    folium.Marker([lat, lng], popup=label).add_to(sanfran_map)

# Step 4: Add incidents to map.
sanfran_map.add_child(incidents)

In [26]:
# Goal: Add the text (crime category) directly to the circle markers. 

# Step 1: Create map. 
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# Step 2: loop through the 100 crimes and add each to the map.
for lat, lng, label in zip(df_incidents.Y, df_incidents.X, df_incidents.Category):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5, # define how big the circle markers are going to be.
        color='yellow',
        fill=True,
        popup=label,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(sanfran_map)

# Step 3: Display the map.
sanfran_map

In [27]:
# Goal:  Group the markers into different clusters. 
# Each cluster is then represented by the number of crimes in each neighborhood.

# Step 1: Import plugins module.
from folium import plugins

# Step 2: Create a clean copy of the map of San Francisco.
sanfran_map = folium.Map(location = [latitude, longitude], zoom_start = 12)

# Step 3: Instantiate a marker cluster object for the incidents in the dataframe.
incidents = plugins.MarkerCluster().add_to(sanfran_map)

# Step 4: Loop through the dataframe and add each data point to the marker cluster.
for lat, lng, label in zip(df_incidents.Y, df_incidents.X, df_incidents.Category):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(incidents)

# Step 5: Display the map.
sanfran_map

### Note: 
When the map is zoomed out all the way, all markers are grouped into one cluster, the global cluster, of 100 markers or crimes, which is the total number of crimes in the dataframe. As you zoom in, the global cluster breaks up into smaller clusters. Zooming in all the way results in individual markers.
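
The behavior described in this note can be imitated with a deliberately simplified sketch. It groups points by grid cell, with the cell edge halving at each zoom level. This is an illustration only and is not the algorithm that the MarkerCluster plugin uses. The cell size and the function name are assumptions, and the points are the five incidents shown by `df_incidents.head()`.

```python
import math
from collections import Counter

# (latitude, longitude) of the five incidents shown by df_incidents.head().
points = [
    (37.775421, -122.403405),
    (37.775421, -122.403405),
    (37.729981, -122.388856),
    (37.785788, -122.412971),
    (37.765050, -122.419672),
]


def cluster_sizes(points, zoom):
    """Group points by grid cell; the cell edge halves with each zoom level."""
    cell = 360 / 2 ** zoom  # cell edge in degrees (an assumed, illustrative choice)
    cells = Counter(
        (math.floor((lat + 90) / cell), math.floor((lng + 180) / cell))
        for lat, lng in points
    )
    return sorted(cells.values(), reverse=True)


for zoom in (0, 20):
    print(zoom, cluster_sizes(points, zoom))
```

At zoom level 0 all five points share one cell, and at zoom level 20 only the two incidents recorded at the same coordinates remain together.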


## (d) Choropleth Maps

A choropleth map is a thematic map in which areas are shaded or patterned in proportion to the value of the statistical variable being displayed on the map, such as population density or per-capita income. It provides an easy way to visualize how a measurement varies across a geographic area and to see the level of variability within a region.
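
The core idea, assigning each area a shade according to the class its value falls into, can be sketched in a few lines. The class boundaries and shade names below are hypothetical and chosen only for illustration. The values are the `Total` figures that appear for the first five countries later in this document.

```python
from bisect import bisect_right

# Hypothetical class boundaries and shade names, chosen for illustration.
boundaries = [0, 100, 20_000, 60_000, 100_000]
shades = ["lightest", "light", "dark", "darkest"]

# Total immigration for the first five countries in the dataset.
totals = {
    "Afghanistan": 58639,
    "Albania": 15699,
    "Algeria": 69439,
    "American Samoa": 6,
    "Andorra": 15,
}


def shade_for(value):
    """Return the shade of the class [boundaries[i], boundaries[i + 1]) holding value."""
    index = bisect_right(boundaries, value) - 1
    return shades[index]


for country, total in totals.items():
    print(country, shade_for(total))
```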


### Choropleth Map - Downloading data and reading it into a dataframe


In [30]:
from js import fetch
import io

# Step 1: Download the Canadian Immigration dataset and read it into a pandas dataframe.
URL = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/Canada.xlsx'
resp = await fetch(URL)
text = io.BytesIO((await resp.arrayBuffer()).to_py())

df_can = pd.read_excel(
    text,
    sheet_name='Canada by Citizenship',
    skiprows=range(20),
    skipfooter=2)

print('Data downloaded and read into a dataframe!')

Data downloaded and read into a dataframe!


In [31]:
# View the first 5 rows of the dataframe.
df_can.head()

Unnamed: 0,Type,Coverage,OdName,AREA,AreaName,REG,RegName,DEV,DevName,1980,...,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013
0,Immigrants,Foreigners,Afghanistan,935,Asia,5501,Southern Asia,902,Developing regions,16,...,2978,3436,3009,2652,2111,1746,1758,2203,2635,2004
1,Immigrants,Foreigners,Albania,908,Europe,925,Southern Europe,901,Developed regions,1,...,1450,1223,856,702,560,716,561,539,620,603
2,Immigrants,Foreigners,Algeria,903,Africa,912,Northern Africa,902,Developing regions,80,...,3616,3626,4807,3623,4005,5393,4752,4325,3774,4331
3,Immigrants,Foreigners,American Samoa,909,Oceania,957,Polynesia,902,Developing regions,0,...,0,0,1,0,0,0,0,0,0,0
4,Immigrants,Foreigners,Andorra,908,Europe,925,Southern Europe,901,Developed regions,0,...,0,0,1,1,0,0,0,0,1,1


In [32]:
# Print the dimensions of the dataframe as (rows, columns).
print(df_can.shape)

(195, 43)



### Choropleth Map - Data Wrangling


In [33]:
# Clean up the dataset to remove unnecessary columns (e.g., REG).
df_can.drop(['AREA','REG','DEV','Type','Coverage'], axis=1, inplace=True)

# Rename the columns so that they make sense.
df_can.rename(columns={'OdName':'Country', 'AreaName':'Continent','RegName':'Region'}, inplace=True)

# Make all column labels of type string.
df_can.columns = list(map(str, df_can.columns))

# Add the "Total" column by summing only the numeric (year) columns.
df_can['Total'] = df_can.sum(axis=1, numeric_only=True)

# Create a list of all years as strings, which is useful for plotting later on.
years = list(map(str, range(1980, 2014)))
print('data dimensions:', df_can.shape)


data dimensions: (195, 39)


In [34]:
# View the first five rows of the modified dataframe.
df_can.head()

Unnamed: 0,Country,Continent,Region,DevName,1980,1981,1982,1983,1984,1985,...,2005,2006,2007,2008,2009,2010,2011,2012,2013,Total
0,Afghanistan,Asia,Southern Asia,Developing regions,16,39,39,47,71,340,...,3436,3009,2652,2111,1746,1758,2203,2635,2004,58639
1,Albania,Europe,Southern Europe,Developed regions,1,0,0,0,0,0,...,1223,856,702,560,716,561,539,620,603,15699
2,Algeria,Africa,Northern Africa,Developing regions,80,67,71,69,63,44,...,3626,4807,3623,4005,5393,4752,4325,3774,4331,69439
3,American Samoa,Oceania,Polynesia,Developing regions,0,1,0,0,0,0,...,0,1,0,0,0,0,0,0,0,6
4,Andorra,Europe,Southern Europe,Developed regions,0,0,0,0,0,0,...,0,1,1,0,0,0,0,1,1,15


In [35]:
# Download the GeoJSON file of world countries in preparation for creating a choropleth map.

from js import fetch
import io
import json

URL = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/world_countries.json'
resp = await fetch(URL)
data = io.BytesIO((await resp.arrayBuffer()).to_py())
world_geo = json.load(data)

print('GeoJSON file loaded!')

GeoJSON file loaded!
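
The choropleth call in the next section joins the dataframe to this file through `key_on='feature.properties.name'`, so the country names in the two sources must match. The sketch below shows what that key path refers to, using a tiny hypothetical GeoJSON object in place of the downloaded file. The three features and their null geometries are assumptions made for brevity.

```python
# A tiny, hypothetical GeoJSON FeatureCollection (geometry omitted for brevity).
sample_geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Afghanistan"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Albania"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Algeria"}, "geometry": None},
    ],
}

# Country names as they appear in the first rows of the Country column.
data_countries = ["Afghanistan", "Albania", "Algeria", "American Samoa", "Andorra"]

# 'feature.properties.name' means: for each feature, look up properties -> name.
geo_names = {feature["properties"]["name"] for feature in sample_geo["features"]}

matched = [c for c in data_countries if c in geo_names]
unmatched = [c for c in data_countries if c not in geo_names]
print("matched:", matched)
print("unmatched:", unmatched)
```

A check of this kind, run against the real `world_geo` object, would list the countries in the dataframe that have no shape with a matching name.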



### Choropleth Map - Immigration to Canada from all countries 


In [42]:
# Goal: Generate choropleth map of immigration to Canada from 1980 to 2013.

# Step 1: Create a numpy array of length 6 with linear spacing from the minimum to the maximum total immigration.
threshold_scale = np.linspace(df_can['Total'].min(),
                              df_can['Total'].max(),
                              6, dtype=int)
threshold_scale = threshold_scale.tolist() # change the numpy array to a list
threshold_scale[-1] = threshold_scale[-1] + 1 # make sure that the last value of the list is greater than the maximum immigration

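# Step 2: Create the world map and add the choropleth layer, using the threshold scale defined above.
# Note: Map.choropleth() is the API of the folium 0.5.0 release installed earlier; newer folium releases use the folium.Choropleth class instead.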
world_map = folium.Map(location=[0, 0], zoom_start=2)
world_map.choropleth(
    geo_data=world_geo,
    data=df_can,
    columns=['Country', 'Total'],
    key_on='feature.properties.name',
    threshold_scale=threshold_scale,
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Immigration to Canada',
    reset=True
)
world_map
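
The reason for adding 1 to the last threshold can be illustrated without Folium. The sketch below is a pure-Python stand-in for the `np.linspace(..., dtype=int)` call above. It assumes half-open bins of the form [lower, upper), which may not be how Folium bins values internally. The function names are hypothetical, and the minimum and maximum are taken from the five rows shown by `df_can.head()` rather than from the full dataset.

```python
from bisect import bisect_right


def linear_thresholds(lowest, highest, count):
    """Evenly spaced integer thresholds from lowest to highest (truncated)."""
    step_count = count - 1
    return [int(lowest + i * (highest - lowest) / step_count) for i in range(count)]


def bin_index(edges, value):
    """Index of the half-open bin [edges[i], edges[i + 1]) containing value."""
    return bisect_right(edges, value) - 1


# Smallest and largest Total among the five rows shown by df_can.head().
edges = linear_thresholds(6, 69439, 6)
bin_count = len(edges) - 1

# Without the adjustment, the maximum falls past the last bin.
assert bin_index(edges, 69439) == bin_count

# Raising the last edge by 1 brings the maximum into the last bin.
edges[-1] += 1
assert bin_index(edges, 69439) == bin_count - 1
print(edges)
```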