# Canada Immigration 2013
### *Normalized to world population*
#### James Cage April 11, 2019

As a class exercise for Coursera's *Data Visualization with Python*, I worked through an example of creating a choropleth map showing immigration to Canada in 2013. Each country was shaded by total immigration from that country. This is a great map if you are interested in the "flavor" of immigration coming into Canada, that is, immigration from the Canadian perspective. But this map may just be a world population map. China and India have the largest populations on earth; is it any surprise that a lot of immigrants come from there?

What if you want to know in which countries people are most likely to emigrate to the Great White North?

To answer this, I obtain world population data for 2013 and use it to display the fraction of people in each country who moved to Canada.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

1. [Zero](#0)<br>
2. [Two](#2)<br>
3. [Four](#4) <br>
4. [Six](#6) <br>
</div>
<hr>

# Setup

Material in this section is adapted from [a notebook](https://labs.cognitiveclass.ai/tools/jupyterlab/lab/tree/labs/DV0101EN/DV0101EN-3-5-1-Generating-Maps-in-Python-py-v2.0.ipynb) created by [Alex Aklson](https://www.linkedin.com/in/aklson/) for a course on **Coursera** called *Data Visualization with Python*.

#### Import Libraries

```python
import numpy as np   # useful for scientific computing in Python
import pandas as pd  # primary data structure library
```

#### Install Folium & Import Library

Note: If Folium is **not installed**, un-comment this line at the top of the following cell:

`!conda install -c conda-forge folium=0.5.0 --yes`

```python
# Un-comment the following line if Folium is not installed
#!conda install -c conda-forge folium=0.5.0 --yes

import folium
print('Folium installed and imported!')
```

Folium installed and imported!


#### Obtain & Clean Canadian Immigration Information

**NOTE**: The data provided for the class included information for 49 regions that were not in the GeoJSON file (see below). This was due to differences in how country names were spelled in the two files, as well as outright omissions. Most of these regions had very little immigration to Canada in 2013, but several were major sources (including the UK, Russia, Iran, and Venezuela). I corrected the names for the 7 largest missing countries and saved the file as 'Canada2.xlsx', which I use below.
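A quick way to spot such mismatches is a set difference between the country names in the immigration data and the names in the GeoJSON file. The sketch below uses small, hypothetical name lists in place of the real files; the helper `unmatched_names` is illustrative and not part of the original notebook.

```python
def unmatched_names(data_names, geo_names):
    """Return the names in data_names with no exact match in geo_names, sorted."""
    return sorted(set(data_names) - set(geo_names))


# Hypothetical sample: a spelling difference makes an exact-match lookup fail.
data_names = ["Afghanistan", "Albania", "United Kingdom of Great Britain and Northern Ireland"]
geo_names = ["Afghanistan", "Albania", "United Kingdom"]

print(unmatched_names(data_names, geo_names))
```

Each name in the result would be left uncolored on the map, so the list is a checklist of rows to rename or accept as missing.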

Now download and import the Canadian immigration dataset using the *pandas* `read_excel()` method. *pandas* relies on the **xlrd** module to read Excel files, so install it first if needed. Then download the dataset and read it into a *pandas* DataFrame:

```python
# If xlrd is not installed, uncomment the following line.
# (Newer pandas releases read .xlsx files with openpyxl rather than xlrd.)
#!conda install -c anaconda xlrd --yes

# Source of "raw" Excel file: 'https://ibm.box.com/shared/static/lw190pt9zpy5bd1ptyg2aw15awomz9pu.xlsx'

df_can = pd.read_excel('https://github.com/JamesDCage/31-31-Canada-01/blob/master/Canada2.xlsx?raw=true',
                       sheet_name='Canada by Citizenship',
                       skiprows=range(20),
                       skipfooter=2)

print('Data downloaded and read into a dataframe!')
```

Data downloaded and read into a dataframe!
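The `skiprows=range(20)` and `skipfooter=2` arguments drop the title block at the top of the sheet and the notes at the bottom. The same two parameters exist on `pd.read_csv`, so this sketch illustrates their effect on a small, made-up CSV held in memory (a stand-in for the Excel sheet, so it runs without downloading anything). In `read_csv`, `skipfooter` requires the Python parsing engine.

```python
import io

import pandas as pd

# Made-up file: 2 junk lines, a header plus 2 data rows, then 1 footer line.
raw = """Title line
Another note line
Country,2013
Albania,603
Algeria,4331
Source: made-up footer
"""

df = pd.read_csv(io.StringIO(raw),
                 skiprows=range(2),   # skip lines 0 and 1 at the top
                 skipfooter=1,        # drop the last line
                 engine="python")     # skipfooter needs the python engine
print(df)
```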


If desired, use some or all of the following to take a look at the dataset.

```python
print(df_can.shape)
df_can.head()
```

(195, 43)


| | Type | Coverage | OdName | AREA | AreaName | REG | RegName | DEV | DevName | 1980 | ... | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Immigrants | Foreigners | Afghanistan | 935 | Asia | 5501 | Southern Asia | 902 | Developing regions | 16 | ... | 2978 | 3436 | 3009 | 2652 | 2111 | 1746 | 1758 | 2203 | 2635 | 2004 |
| 1 | Immigrants | Foreigners | Albania | 908 | Europe | 925 | Southern Europe | 901 | Developed regions | 1 | ... | 1450 | 1223 | 856 | 702 | 560 | 716 | 561 | 539 | 620 | 603 |
| 2 | Immigrants | Foreigners | Algeria | 903 | Africa | 912 | Northern Africa | 902 | Developing regions | 80 | ... | 3616 | 3626 | 4807 | 3623 | 4005 | 5393 | 4752 | 4325 | 3774 | 4331 |
| 3 | Immigrants | Foreigners | American Samoa | 909 | Oceania | 957 | Polynesia | 902 | Developing regions | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Immigrants | Foreigners | Andorra | 908 | Europe | 925 | Southern Europe | 901 | Developed regions | 0 | ... | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |


Clean up the data. We will modify the original dataset to make it easier to create our visualizations. Refer to the *Introduction to Matplotlib and Line Plots* and *Area Plots, Histograms, and Bar Plots* notebooks for a detailed description of this preprocessing.

```python
# clean up the dataset to remove unnecessary columns (e.g. REG)
df_can.drop(['AREA', 'REG', 'DEV', 'Type', 'Coverage'], axis=1, inplace=True)

# rename the columns so that they make sense
df_can.rename(columns={'OdName': 'Country', 'AreaName': 'Continent', 'RegName': 'Region'}, inplace=True)

# for the sake of consistency, make all column labels of type string
df_can.columns = list(map(str, df_can.columns))

# years that we will be using in this lesson - useful for plotting later on
years = list(map(str, range(1980, 2014)))

# add a Total column; sum only the year columns so the text columns
# (Country, Continent, Region, DevName) are not included in the sum
df_can['Total'] = df_can[years].sum(axis=1)

print('data dimensions:', df_can.shape)
```

data dimensions: (195, 39)


Take a look at the cleaned dataset:

```python
df_can.head(2)
```

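| | Country | Continent | Region | DevName | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 | ... | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | Southern Asia | Developing regions | 16 | 39 | 39 | 47 | 71 | 340 | ... | 3436 | 3009 | 2652 | 2111 | 1746 | 1758 | 2203 | 2635 | 2004 | 58639 |
| 1 | Albania | Europe | Southern Europe | Developed regions | 1 | 0 | 0 | 0 | 0 | 0 | ... | 1223 | 856 | 702 | 560 | 716 | 561 | 539 | 620 | 603 | 15699 |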


In order to create a `Choropleth` map, we need a GeoJSON file that defines the areas/boundaries of the state, county, or country that we are interested in. In our case, we need a GeoJSON file that defines the boundaries of all world countries. We will use the file from the Coursera class and save it as **world_countries.json**.

```python
# download countries geojson file (the leading "!" runs a shell command from Jupyter)
!wget --quiet https://ibm.box.com/shared/static/cto2qv7nx6yq19logfcissyy4euo8lho.json -O world_countries.json

print('GeoJSON file downloaded!')
```

GeoJSON file downloaded!
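Folium matches each row of the data to a GeoJSON feature through a property path such as `feature.properties.name`. The sketch below builds a tiny, hypothetical FeatureCollection as a Python dict (real files contain full boundary polygons; the coordinates here are made up) and extracts the names a choropleth would key on. It assumes each country name is stored under `properties['name']`; check this against your own copy of **world_countries.json**.

```python
import json

# Hypothetical, heavily simplified GeoJSON with made-up triangle "borders".
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "id": "AFG",
     "properties": {"name": "Afghanistan"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[61.2, 35.6], [62.2, 35.3], [62.9, 36.0], [61.2, 35.6]]]}},
    {"type": "Feature", "id": "ALB",
     "properties": {"name": "Albania"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[20.6, 41.1], [20.5, 41.9], [19.9, 41.7], [20.6, 41.1]]]}}
  ]
}
"""

geo = json.loads(geojson_text)

# The values a key_on='feature.properties.name' lookup would compare against.
geo_names = [feature["properties"]["name"] for feature in geo["features"]]
print(geo_names)
```

For the real file, replace `json.loads(geojson_text)` with `json.load(open('world_countries.json'))` and the same list comprehension applies.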


# Get Population Data, Clean, and Merge

In this step, I obtained the world population data from the [World Bank Open Data website](https://data.worldbank.org/indicator/sp.pop.totl?end=2013&start=1980). Many countries of interest were named in ways inconsistent with the GeoJSON file obtained above and the Canadian immigration data. (For example, "United States" instead of "United States of America".) Thus I cleaned the data in a separate notebook and created a new CSV file in which the country names are consistent. For details, see "Clean world_countries.json.ipynb".
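The separate cleaning notebook is not reproduced here, but one common way to reconcile names is a mapping dictionary passed to `Series.replace`, which swaps whole values (not substrings). In this sketch only the "United States" entry comes from the text above; the second entry and the sample rows are hypothetical.

```python
import pandas as pd

# Map World Bank spellings to the spellings used in the GeoJSON file.
# Only the first entry comes from the text; the second is a hypothetical example.
name_fixes = {
    "United States": "United States of America",
    "Russian Federation": "Russia",
}

# Hypothetical sample of population data (values are placeholders, not real figures).
pop = pd.DataFrame({"Country": ["United States", "Russian Federation", "Albania"],
                    "2013": [1.0, 2.0, 3.0]})

pop["Country"] = pop["Country"].replace(name_fixes)
print(pop["Country"].tolist())
```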

```python
# Import the data I cleaned in a separate Jupyter notebook.

# The raw World Bank download, kept here for reference:
# df_world_pop = pd.read_excel('http://api.worldbank.org/v2/en/indicator/SP.POP.TOTL?downloadformat=excel',
#                              sheet_name='Data',
#                              skiprows=range(3))

df_world_pop = pd.read_csv('https://raw.githubusercontent.com/JamesDCage/31-31-Canada-01/master/world_population.csv')

print('Data downloaded and read into a dataframe!')
```


Data downloaded and read into a dataframe!


```python
print(df_world_pop.shape)
df_world_pop.head()
```

(264, 36)


| | Country | ID | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 | 1986 | 1987 | ... | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aruba | ABW | 60096.0 | 60567.0 | 61345.0 | 62201.0 | 62836.0 | 63026.0 | 62644.0 | 61833.0 | ... | 98737.0 | 100031.0 | 100832.0 | 101220.0 | 101353.0 | 101453.0 | 101669.0 | 102053.0 | 102577.0 | 103187.0 |
| 1 | Afghanistan | AFG | 13248370.0 | 13053954.0 | 12749645.0 | 12389269.0 | 12047115.0 | 11783050.0 | 11601041.0 | 11502761.0 | ... | 24118979.0 | 25070798.0 | 25893450.0 | 26616792.0 | 27294031.0 | 28004331.0 | 28803167.0 | 29708599.0 | 30696958.0 | 31731688.0 |
| 2 | Angola | AGO | 8929900.0 | 9244507.0 | 9582156.0 | 9931562.0 | 10277321.0 | 10609042.0 | 10921037.0 | 11218268.0 | ... | 18865716.0 | 19552542.0 | 20262399.0 | 20997687.0 | 21759420.0 | 22549547.0 | 23369131.0 | 24218565.0 | 25096150.0 | 25998340.0 |
| 3 | Albania | ALB | 2671997.0 | 2726056.0 | 2784278.0 | 2843960.0 | 2904429.0 | 2964762.0 | 3022635.0 | 3083605.0 | ... | 3026939.0 | 3011487.0 | 2992547.0 | 2970017.0 | 2947314.0 | 2927519.0 | 2913021.0 | 2905195.0 | 2900401.0 | 2895092.0 |
| 4 | Andorra | AND | 36067.0 | 37500.0 | 39114.0 | 40867.0 | 42706.0 | 44600.0 | 46517.0 | 48455.0 | ... | 76244.0 | 78867.0 | 80991.0 | 82683.0 | 83861.0 | 84462.0 | 84449.0 | 83751.0 | 82431.0 | 80788.0 |
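The merge step itself is not shown in this section. One way to do it, sketched here under the assumption that both tables share a `Country` column with consistent names and a `'2013'` column, is an inner `merge` on `Country`. The sample rows reuse the Afghanistan and Albania figures from the two previews above; the "World" row is a hypothetical stand-in for an aggregate with no match in the immigration table, which an inner join drops.

```python
import pandas as pd

# 2013 immigration to Canada (values from the df_can preview above).
immig = pd.DataFrame({"Country": ["Afghanistan", "Albania"],
                      "2013": [2004, 603]})

# 2013 population (values from the df_world_pop preview above); "World" is a
# hypothetical aggregate row with a placeholder value.
pop = pd.DataFrame({"Country": ["Afghanistan", "Albania", "World"],
                    "2013": [31731688.0, 2895092.0, 1.0]})

# Inner join keeps only countries present in both tables; the suffixes
# disambiguate the two '2013' columns.
merged = immig.merge(pop, on="Country", how="inner",
                     suffixes=("_immigrants", "_population"))
print(merged)
```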


# Choropleth Maps

Here's the choropleth map we created in Coursera's Data Visualization with Python. (After the class, I cleaned up the data a bit to show some important missing countries, like the UK, Iran, and Russia.) It shows the source countries for Canadian immigration, and I suspect it's skewed a bit by outliers (China, India). Perhaps a log scale would work better? But that's for another day.
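One way to try the log-scale idea, sketched here as an assumption rather than something this notebook does, is to color the map by `log10(count + 1)` instead of the raw count. Adding 1 keeps countries with zero immigrants defined (they map to 0). The counts below are the 2013 values from the immigration preview above.

```python
import numpy as np
import pandas as pd

# 2013 immigration counts taken from the df_can preview above.
counts = pd.Series({"Afghanistan": 2004, "Albania": 603, "Algeria": 4331,
                    "American Samoa": 0, "Andorra": 1})

# log10(count + 1): zero stays at 0, and large sources are compressed
# so a few big countries no longer dominate the color scale.
log_counts = np.log10(counts + 1)
print(log_counts.round(2))
```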

For now I would like to divide immigration by the population of the source country, to get the fraction of each country's citizenry that emigrated to Canada. This should give a better sense of how attractive Canada is to people in each source country. I will use units of emigrants per 100,000 population.
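The normalization is immigrants divided by population, scaled to 100,000 people. This minimal sketch applies it to the Afghanistan and Albania figures from the two previews above (2013 immigrants and 2013 population); the function name `per_100k` is illustrative.

```python
def per_100k(immigrants, population):
    """Immigrants to Canada per 100,000 residents of the source country."""
    return immigrants / population * 100_000


# 2013 figures from the df_can and df_world_pop previews above.
print(round(per_100k(2004, 31_731_688), 2))  # Afghanistan -> 6.32
print(round(per_100k(603, 2_895_092), 2))    # Albania -> 20.83
```

Albania sent fewer than a third as many immigrants as Afghanistan in 2013, yet relative to its population its rate is more than three times higher. That is the effect the normalized map is meant to reveal.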

# Reference - delete after completion