# Investigating the Use of Immigration Settlement Program Services

**By Chimdindu Ohaegbu, Junior Research Officer with the IRCC Chief Data Officer Branch (CDO) Fall 2021**

## Project Description
### Introduction

Immigration, Refugees, and Citizenship Canada (IRCC) works with many partners to help new arrivals to Canada integrate and settle into Canadian society. These partners can request funding through the following four initiatives:

- Pre-Arrival Settlement Services
- Racialized Newcomer Women Pilot
- Service Delivery Improvement 
- Action Plan for Official Languages Francophone Integration Pathway

These initiatives serve several broad purposes. The Racialized Newcomer Women Pilot and the Action Plan for Official Languages Francophone Integration Pathway help specific populations thrive in Canada, while the Pre-Arrival Settlement Services are designed to help certain groups of Permanent Residents and refugees begin their settlement process before they arrive. Lastly, Service Delivery Improvement serves as a form of research and development into new ideas and best practices for delivering settlement programs.

Further information on IRCC Settlement Program Initiatives can be found here: https://www.canada.ca/en/immigration-refugees-citizenship/corporate/partners-service-providers/settlement-program-initiatives.html

The data set used for this analysis is called _IRCC Settlement Program Initiatives: Active Agreements as of Jan 2021_. It is available under the Government of Canada's Open Data initiative and is freely accessible on their website, here: https://open.canada.ca/data/en/dataset/0f249746-7337-4ef7-8ecd-a5f9db4c8b62. It contains information on which settlement initiative each agreement falls under, the activities involved, start and end dates, the amount of funding provided, and the cities in which the programs took place from 2017 to 2020.

### Focus of this Analysis
This investigation will answer the following questions:
1. Which settlement initiatives were used the most often in the data set? 
2. Based on start dates, which year saw the most settlement services requested overall?

This data set was chosen for its categorical nature and because the data was collected recently. The questions offer a clear snapshot of how popular the different settlement initiatives were from 2017 to 2020, and of which years saw changes in demand for these services relative to the others. This information can then be cross-checked against IRCC policy changes during this period to identify possible causes for the trends presented here.

The intended audience for this investigation is people who are interested in immigration programs but who may not be familiar with the data itself. Users of this Jupyter Notebook are expected to have some familiarity with Python, although comments are provided to explain how the code works.



Data Source: https://open.canada.ca/data/dataset/0f249746-7337-4ef7-8ecd-a5f9db4c8b62/resource/fb87a39d-38e0-42e2-a167-aced0a422955/download/ircc-sn-initiatives-report-q3-2020-2021.csv

### Getting Started

We'll begin by importing the libraries we need, as well as the data itself. We print the first 10 rows to ensure everything is formatted properly.

In [1]:
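import pandas as pd
import numpy as np
import plotly.express as px

# Direct download link for the CSV on the Open Government portal
url = "https://open.canada.ca/data/dataset/0f249746-7337-4ef7-8ecd-a5f9db4c8b62/resource/fb87a39d-38e0-42e2-a167-aced0a422955/download/ircc-sn-initiatives-report-q3-2020-2021.csv"

# Short column labels kept for reference; they are not passed to read_csv,
# so the data frame keeps the headers found in the file.
Names = ['Agreement Number', "Titre de l'entente FR", "Initiative EN", "Initiative FR", "Activities EN", "Activités FR", "Start Date", "End Date", "Agreement Value", "Organization", "City", "Province/Territory", "Country"]

# header=6 uses row 6 (zero-indexed) as the column names;
# encoding='latin1' avoids the UnicodeDecodeError described in the note below.
IRCC_Settlement_df = pd.read_csv(url, header=6, encoding='latin1')
print(IRCC_Settlement_df.head(10))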


  Agreement Number\nNumro de l'entente\n  \
0                              O191001001   
1                              O193935001   
2                              O193935002   
3                              O193935003   
4                              O193935004   
5                              O193935005   
6                              O193935006   
7                              O198719001   
8                              O198719002   
9                              O198719003   

                                  Agreement Title EN  \
0               Settlement Online Pre-Arrival (SOPA)   
1  Global Onboarding of Talent Initiative (GO Tal...   
2  Next Stop Canada Pre-Arrival Coordinated Servi...   
3          Pre-Arrival Supports and Services Program   
4                                           Build ON   
5  Canadian Employment Connections and Entreprene...   
6         Canada InfoNet (formerly known as CanPrep)   
7                                Planning for Canada   


Note that reading the CSV with the default encoding raises the following error: "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 14269: invalid continuation byte". The file is not UTF-8 encoded, so the argument encoding='latin1' was added, following tips from several sites including Stack Overflow. Sources: codegrepper.com/code-examples/python/UnicodeDecodeError%3A+%27utf-8%27+codec+can%27t+decode+byte+0xe9+in+position+14269%3A+invalid+continuation+byte and https://stackoverflow.com/questions/5552555/unicodedecodeerror-invalid-continuation-byte
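To make the cause of that error concrete, here is a minimal sketch using only the standard library. The sample word is hypothetical (it is not read from the CSV); it only shows that an accented character stored as the single Latin-1 byte 0xe9 cannot be decoded as UTF-8, while decoding it as Latin-1 works.

```python
# Illustrative only: the sample text is hypothetical, not taken from the CSV.
sample = "Activités"
raw = sample.encode("latin1")  # in Latin-1, 'é' is stored as the single byte 0xe9

try:
    raw.decode("utf-8")        # 0xe9 starts a multi-byte sequence in UTF-8
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False            # the next byte is not a valid continuation byte

decoded = raw.decode("latin1")  # Latin-1 maps every byte to one character
print(utf8_ok, decoded)
```

Passing encoding='latin1' to read_csv applies the same decoding to the whole file.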



## Analysis
### Question 1: Which settlement initiatives were used the most often in the data set?

To answer Question 1, we use pandas Series.str.count to count, row by row, how many times each of the following strings appears in the column Initiative EN, which here amounts to 1 when the row names that initiative and 0 when it does not:

- "Pre-Arrival Settlement Services"
- "Visible Minority Newcomer Women" (abbreviated as VMNW outside of the code)
- "Service Delivery Improvement" (abbreviated as SDI outside of the code)
- "Action Plan for Official Languages Francophone Integration Pathway"

We then apply pandas Series.sum to add up the per-row counts and obtain a total for each initiative.

**Note:** VMNW is the name present in the csv file, but the name had been changed to Racialized Newcomer Women Pilot before this notebook was created.


In [2]:
# Select the column that names the initiative for each agreement
Initiative_EN = IRCC_Settlement_df["Initiative EN"]

# str.count gives the number of matches in each row; sum adds them into a total
Pre_Arrival_sum = Initiative_EN.str.count("Pre-Arrival Settlement Services").sum()
VMNW_sum = Initiative_EN.str.count("Visible Minority Newcomer Women").sum()
SDI_sum = Initiative_EN.str.count("Service Delivery Improvement").sum()
Offic_Lang_Fr_sum = Initiative_EN.str.count("Action Plan for Official Languages Francophone Integration Pathway").sum()

print("Pre_Arrival_count: ", Pre_Arrival_sum, "|VMNW_count: ", VMNW_sum, "|SDI_count: ", SDI_sum, "|Offic_Lang_Fr_count: ", Offic_Lang_Fr_sum)

Pre_Arrival_count:  17 |VMNW_count:  19 |SDI_count:  76 |Offic_Lang_Fr_count:  35
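As a cross-check on the approach above, the sketch below runs the same two steps on a small, hypothetical Series (the three rows are invented for illustration). It also shows pandas Series.value_counts, which could serve as an alternative tally if each cell holds exactly one initiative name; that is an assumption about the column, not something verified here.

```python
import pandas as pd

# Hypothetical toy rows standing in for the "Initiative EN" column.
toy = pd.Series([
    "Service Delivery Improvement",
    "Pre-Arrival Settlement Services",
    "Service Delivery Improvement",
])

# str.count returns an integer per row (number of matches), not True/False
per_row = toy.str.count("Service Delivery Improvement")
sdi_total = int(per_row.sum())

# value_counts tallies every distinct cell value in one call
tally = toy.value_counts()

print(per_row.tolist(), sdi_total)
print(tally)
```

With value_counts there is no need to type each initiative name by hand, which avoids silent zero counts caused by a misspelled search string.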


We then group our individual pandas Series into a data frame to display the results more cleanly, and use plotly.express to visualize the distribution as a pie chart.

In [15]:
# Collect the four totals in a dictionary, then convert it to a two-column data frame
Initiatives_dict = {
    "Pre-Arrival Settlement Services": Pre_Arrival_sum,
    "Visible Minority Newcomer Women": VMNW_sum,
    "Service Delivery Improvement": SDI_sum,
    "Action Plan for Official Languages Francophone Integration Pathway": Offic_Lang_Fr_sum,
}
Initiatives_Data = pd.DataFrame(list(Initiatives_dict.items()), columns=['Initiatives', 'Count'])
print(Initiatives_Data)

colour_list = ['#02818a', '#67a9cf', '#f6eff7', '#bdc9e1']
Initiatives_Chart = px.pie(Initiatives_Data, values='Count', names='Initiatives', title='Demand for IRCC Settlement Programs', color_discrete_sequence=colour_list)
Initiatives_Chart.show()

                                         Initiatives  Count
0                    Pre-Arrival Settlement Services     17
1                    Visible Minority Newcomer Women     19
2                       Service Delivery Improvement     76
3  Action Plan for Official Languages Francophone...     35


**Note:** The colour palette of the pie chart is colour-blind safe, and was obtained from: https://colorbrewer2.org/#type=sequential&scheme=PuBuGn&n=4

Thus, we can see that more than half (51.7%) of the settlement initiative agreements running between 2017 and 2020 fall under Service Delivery Improvement. The official languages program for Francophone integration comes second at 23.8%, while VMNW and Pre-Arrival Settlement Services are used to a similar extent, at 12.9% and 11.6% respectively.
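The percentages shown on the pie chart can be reproduced from the four counts printed earlier. The following is a small sketch of that arithmetic in plain Python; the variable names are illustrative and not part of the notebook's cells.

```python
# Counts as printed in the output of the previous cell
counts = {
    "Pre-Arrival Settlement Services": 17,
    "Visible Minority Newcomer Women": 19,
    "Service Delivery Improvement": 76,
    "Action Plan for Official Languages Francophone Integration Pathway": 35,
}

total = sum(counts.values())  # 147 agreements in all

# Share of each initiative as a percentage, rounded to one decimal place
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{pct:5.1f}%  {name}")
```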

Please note that some code was borrowed from https://datatofish.com/dictionary-to-dataframe/ to prevent an error from being raised when trying to pass a dictionary to pd.DataFrame and keep the columns intact. Borrowed code: list(Initiatives_dict.items()). 
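For readers curious about the error mentioned above, here is a minimal sketch with a hypothetical two-entry dictionary. It shows that pd.DataFrame raises a ValueError when given a dictionary of plain scalars, that the list(dict.items()) approach works, and that going through a pandas Series is one possible alternative (offered as an option, not as the notebook's method).

```python
import pandas as pd

counts = {"A": 1, "B": 2}  # hypothetical scalar-valued dictionary

# A dict of scalars has no row index for pandas to use, so this raises ValueError
try:
    pd.DataFrame(counts)
    raised = False
except ValueError:
    raised = True

# Approach used in this notebook: a list of (key, value) pairs becomes rows
via_items = pd.DataFrame(list(counts.items()), columns=["Initiatives", "Count"])

# Alternative: build a Series, name its index, then turn the index into a column
via_series = pd.Series(counts).rename_axis("Initiatives").reset_index(name="Count")

print(raised)
print(via_items)
print(via_series)
```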

### Question 2: Based on start dates, which year saw the most settlement services requested overall?

The code for Question 2 is very similar to that of Question 1; however, the results are visualized as a bar graph instead.
The bar graph makes it easier to see the stark differences in the number of services requested in some years compared to others, as we'll see below.

In [4]:
# Select the start-date column (the header in the file is bilingual and spans several lines)
Start_Date = IRCC_Settlement_df["Start Date\nDate de dbut\n"]

# Count the rows whose start date contains each year
Starts_2020 = Start_Date.str.count("2020").sum()
Starts_2019 = Start_Date.str.count("2019").sum()
Starts_2018 = Start_Date.str.count("2018").sum()
Starts_2017 = Start_Date.str.count("2017").sum()

print("Started in 2020: ", Starts_2020, "|Started in 2019: ", Starts_2019, "|Started in 2018: ", Starts_2018, "|Started in 2017: ", Starts_2017)

Started in 2020:  31 |Started in 2019:  48 |Started in 2018:  67 |Started in 2017:  1
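Counting year substrings works for this data set, but parsing the dates is another option that does not require listing each year by hand. The sketch below uses hypothetical dates in ISO format; the actual date format in the CSV is not confirmed here, so the format string is an assumption that would need adjusting.

```python
import pandas as pd

# Hypothetical start dates; the ISO format is an assumption about the CSV.
toy_dates = pd.Series(["2018-04-01", "2018-09-15", "2019-04-01", "2020-01-06"])

# Parse the strings into timestamps and keep only the year of each one
years = pd.to_datetime(toy_dates, format="%Y-%m-%d").dt.year

# Tally the rows per year and order the result chronologically
starts_per_year = years.value_counts().sort_index()
print(starts_per_year)
```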


In [5]:
Start_Dates_dict = {"2017": Starts_2017 , "2018": Starts_2018 ,"2019": Starts_2019,"2020": Starts_2020} 
Start_Dates_Data = pd.DataFrame(list(Start_Dates_dict.items()), columns = ['Year','Count'])
print (Start_Dates_Data)

Start_Dates_Graph = px.bar(Start_Dates_Data, x = 'Year', y ='Count', title = 'Number of Times Any Settlement Program was Used Per Year')
Start_Dates_Graph.show()


   Year  Count
0  2017      1
1  2018     67
2  2019     48
3  2020     31


It is easy to see that significantly more services were requested in 2018 than in any other year, and that there was a massive increase in these requests between 2017 and 2018. This increase can be explained, however, by the start dates of some of the settlement initiatives listed in the data set. Service Delivery Improvement (SDI) funding began in 2017, while the Official Languages Action Plan (OLAP) is intended to span from 2018 to 2023. As seen in the pie chart earlier, these two initiatives represent a combined 75.5% of the total data set, which spans from 2017 to 2020. Thus, it is logical that fewer services were requested in 2017; the most utilized ones were either nascent or non-existent.
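One way this explanation could be checked against the data is to cross-tabulate initiative by start year. The sketch below uses a handful of hypothetical rows and illustrative column names (not the headers of the CSV), so it only demonstrates the technique and says nothing about the real distribution.

```python
import pandas as pd

# Hypothetical rows; column names and values are illustrative only.
toy = pd.DataFrame({
    "initiative": ["SDI", "SDI", "OLAP", "Pre-Arrival", "OLAP"],
    "start_year": [2018, 2018, 2018, 2019, 2020],
})

# Rows are initiatives, columns are start years, cells are agreement counts
by_year = pd.crosstab(toy["initiative"], toy["start_year"])
print(by_year)
```

Applied to the real columns, a table like this would show directly how much of the 2018 total comes from SDI and OLAP agreements.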

Additionally, 2020 marked the beginning of the COVID-19 pandemic for Canada, which saw decreases in immigration and permanent resident applications. As such, a decrease in settlement service requests could be attributed to those factors, although more extensive research could be done for a more detailed picture of all the contributing factors. **Source: https://www150.statcan.gc.ca/n1/daily-quotidien/210318/dq210318c-eng.htm**

Taken together, the two measurements (most requested service and busiest year) allow for some analysis of the value of specific programs. The skyrocketing popularity of the settlement service programs in 2018, after the SDI and OLAP initiatives became established, can help inform future decisions about funding allocation, given that these metrics could demonstrate community need for these programs. However, these metrics will need to be contextualized and paired with qualitative data before any decisions are made.

For instance, the VMNW program focuses on a specific subset of the population (racialized persons), and then further subdivides them by gender. As such, that program could easily have fewer requests due to fewer eligible participants. Thus, this metric would need to be cross-referenced with qualitative data on participant satisfaction, and quantitative data from the SDI reports to ascertain the value of individual programs to those involved. As well, the dataset does not speak to how much money was spent building awareness of each respective settlement program in communities; perhaps some were more requested than others because they were better known. Thus, the analysis presented in this notebook is a good starting point for discussions about social programs the government offers, but it must be contextualized with other datasets to be an effective aid in policy analysis.

## Conclusion

From this data set, we learned that Service Delivery Improvement (SDI) funding was the most utilized settlement initiative, while Pre-Arrival Settlement Services were utilized the least from 2017 to 2020. We also learned that 2018 was the busiest year for these requests, while 2017 was the least busy. The data was also cross-referenced with some easily obtainable facts, such as the start dates of particular programs and the pandemic, in order to understand some potential causes for the numbers displayed. Lastly, we learned to look for what the data doesn't say and how it could affect the numbers we see, even if the numerical analysis itself would remain unchanged by such insight.