# Unemployment Rate Data Analytics
### DATA 601 Fall 2020
### By Bowen Li, Zhan(Gerry) Lin, Jialing Cai

## Introduction

In today's society, a paid job enables an individual to obtain the necessities of life as well as some discretionary spending that supports healthy living (Association, 2006). For most people, employment provides a sense of identity, social class and social connectedness. However, under the combined impact of the global pandemic and the decline in international oil prices, more and more people worldwide have lost their jobs. We have become used to hearing in the news that various industry sectors are laying off employees or planning to do so. According to the Government of Alberta, the unemployment rate peaked at around 15.5% in 2020 (Government of Alberta, 2020), which means that roughly 15 to 16 of every 100 people in the labour force had their regular income cut off.

The domain of our project is how unemployment in Canada has changed from the past to the present. Our aim is to investigate and analyze changes in the unemployment rate and to look for the reasons behind its fluctuations. We also ask whether different industry sectors show different patterns of change in the unemployment rate. The population is the non-institutionalised population 15 years of age and over in Canada's provinces and territories (Statistics Canada, 2020). The data are collected directly from respondents to a mandatory survey (Statistics Canada, 2020). Responses are captured through a computerized questionnaire and sent electronically to the Statistics Canada Regional Office.

## Dataset

Two datasets are used in this project. The first mainly covers population, labour force, employment, full-time employment, part-time employment and unemployment across Canada for the years 1976-2020. It also includes employment data broken down by sex, age group and geography. There are 11 sub-items under geography, which indicate the provinces of Canada. There are two groups under sex and 9 age groups from 15 years to 64 years. This dataset is provided by Statistics Canada and is available at https://doi.org/10.25318/1410028701-eng. The data in this file are structured and tabular, in .xlsx file format.

The second dataset contains the monthly employed population by industry from January 1976 to August 2020 and is available in .csv file format. It is collected from Statistics Canada (https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1410020201) and is permitted for public use. The data in this file are structured and tabular.

For this project, we mainly focus on sex groups (females and males), age groups (15 to 24 years, 25 to 54 years and 55 to 64 years) and industries. We initially wanted to analyze age groups and sex groups within each industry sector, but we could not find a dataset that provides that breakdown. In this report we therefore analyze them separately.

In [35]:
import pandas as pd
#import folium
import matplotlib.pyplot as plt
import datetime as dt
import plotly.express as px
import plotly.graph_objs as go
import plotly.offline as py
import geopandas as gpd
import json
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
import time
from plotly.subplots import make_subplots
import matplotlib as mpl
import matplotlib.ticker as mtick
import matplotlib.dates as mdates
pd.options.mode.chained_assignment = None 

## Data Wrangling
1. Filter the data from 1976-2020 down to 2000-2020, since the aim is to capture recent trends.
2. Merge the two datasets and make sure they use the same date format.
3. Create tables to support the visualization for each question.
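
The steps above can be sketched on a toy table. This is a minimal sketch, assuming `REF_DATE` holds `YYYY-MM` strings; the two small frames, their values and the helper name `prepare` are made up for illustration and are not part of the project data.

```python
import pandas as pd

# Hypothetical miniature versions of the two raw tables (values are made up).
basic = pd.DataFrame({
    "REF_DATE": ["1999-12", "2000-01", "2000-02"],
    "GEO": ["Canada", "Canada", "Canada"],
    "VALUE": [6.8, 6.8, 6.9],
})
industry = pd.DataFrame({
    "REF_DATE": ["1999-12", "2000-01", "2000-02"],
    "GEO": ["Canada", "Canada", "Canada"],
    "VALUE": [14500.0, 14550.0, 14600.0],
})

def prepare(df, start="2000-01-01"):
    """Parse REF_DATE into a datetime column and keep rows from `start` onward."""
    out = df.copy()
    out["YEARMMDD"] = pd.to_datetime(out["REF_DATE"], format="%Y-%m")
    out = out.drop(columns=["REF_DATE"])
    return out[out["YEARMMDD"] >= pd.Timestamp(start)]

basic_recent = prepare(basic)
industry_recent = prepare(industry)

# Both tables now share one datetime column, so they can be joined on it.
merged = basic_recent.merge(
    industry_recent, on=["GEO", "YEARMMDD"], suffixes=("_rate", "_employment")
)
print(merged)
```

Parsing both tables with the same explicit `format` is one way to guarantee that the date columns are comparable before any join.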


In [36]:
#load data set
file_emp_bas=r'./Data/emp_basic_data.csv'
file_emp_ind=r'./Data/emp_industry_data.csv'

# unnecessary columns
drop_columns=['DGUID','Statistics','Data type',
              'UOM','UOM_ID','SCALAR_FACTOR','SCALAR_ID',
              'VECTOR','COORDINATE','STATUS','SYMBOL','TERMINATED','DECIMALS']

# read raw data into dataframe
df_emp_bas=pd.read_csv(file_emp_bas)
df_emp_ind=pd.read_csv(file_emp_ind)

# drop unnecessary columns
df_emp_bas.drop(drop_columns, axis=1, inplace=True)
df_emp_ind.drop(drop_columns, axis=1, inplace=True)
df_emp_bas['YEARMMDD'],df_emp_ind['YEARMMDD']=pd.to_datetime(df_emp_bas['REF_DATE']),pd.to_datetime(df_emp_ind['REF_DATE'])
df_emp_bas.drop(['REF_DATE'], axis=1, inplace=True)
df_emp_ind.drop(['REF_DATE'], axis=1, inplace=True)
df_emp_bas,df_emp_ind=df_emp_bas[(df_emp_bas['YEARMMDD']>='20000101')],df_emp_ind[(df_emp_ind['YEARMMDD']>='20000101')]
pgeo=['Alberta','British Columbia','Manitoba','New Brunswick',
      'Newfoundland and Labrador','Northwest Territories','Nova Scotia',
      'Nunavut','Ontario','Prince Edward Island','Quebec','Saskatchewan','Yukon','Canada']
             




Columns (15) have mixed types.Specify dtype option on import or set low_memory=False.


Columns (13) have mixed types.Specify dtype option on import or set low_memory=False.
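
The mixed-type warnings above are raised when `pandas.read_csv` parses a file in chunks and infers different types for the same column in different chunks. A minimal sketch of the two remedies the warning itself suggests is shown below. The in-memory CSV is made up, and picking `STATUS` and `SYMBOL` as the affected columns is an assumption, since the warning only reports column positions.

```python
import io
import pandas as pd

# Made-up CSV text standing in for the raw file; a sparse flag column is typical.
csv_text = "REF_DATE,VALUE,STATUS,SYMBOL\n2000-01,6.8,,\n2000-02,6.9,E,\n"

# Option 1: read the whole file in one pass so types are inferred once.
df_whole = pd.read_csv(io.StringIO(csv_text), low_memory=False)

# Option 2: declare the type of the ambiguous columns up front
# (assumed here to be STATUS and SYMBOL).
df_typed = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"STATUS": "string", "SYMBOL": "string"},
)
print(df_typed.dtypes)
```

Since these columns are dropped right after loading in this notebook, either option would only serve to silence the warning.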



In [37]:
#prepare dataframes for all regions (both sexes, 15 years and over); the Alberta-only filter below is commented out
ts=df_emp_bas[(df_emp_bas['Labour force characteristics'].isin(['Unemployment rate','Labour force','Employment']))&
             (df_emp_bas['Sex']=='Both sexes')&
             (df_emp_bas['Age group']=='15 years and over')]
#              (df_emp_bas['GEO']=='Alberta')]
lf_df=ts[(ts['Labour force characteristics']=='Labour force')][
    ['GEO','YEARMMDD','VALUE','Labour force characteristics']]
em_df=ts[(ts['Labour force characteristics']=='Employment')][
    ['GEO','YEARMMDD','VALUE','Labour force characteristics']]
#Unemployment rate dataframe
un_emp_rate=ts[(ts['Labour force characteristics']=='Unemployment rate')][
    ['GEO','YEARMMDD','VALUE','Labour force characteristics']]
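
The three series selected above are linked by the standard definition of the unemployment rate: the number of unemployed persons as a percentage of the labour force, where the labour force is the sum of employed and unemployed persons. The sketch below recomputes the rate from labour force and employment. The numbers are made up and the frame `toy` is hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "YEARMMDD": pd.to_datetime(["2020-02-01", "2020-05-01"]),
    "Labour force": [2500.0, 2400.0],   # thousands of persons (made up)
    "Employment": [2325.0, 2040.0],     # thousands of persons (made up)
})

# Unemployed = labour force minus employed.
toy["Unemployment"] = toy["Labour force"] - toy["Employment"]

# Unemployment rate in percent of the labour force.
toy["Unemployment rate"] = (100 * toy["Unemployment"] / toy["Labour force"]).round(1)
print(toy)
```

This also explains why employment and labour force can both grow while the unemployment rate rises: the rate depends on the gap between the two, not on their levels.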



## Guiding Questions
1. What is the unemployment rate's trend for Canada and Alberta?
2. What is the distribution of the provincial unemployment rate?
3. What is the unemployment rate for each sex group in Canada from 2000 to 2020?
4. What is the employment distribution across industries?


### 1. What is the unemployment rate's trend for Canada and Alberta?
This shows how the unemployment rate has moved over the years. Do Canada and Alberta follow the same pattern? What happened in the years when the unemployment rate rose or fell sharply?

In [38]:


# display(un_emp_rate)
def show_prov_unemp(x):
    subfig = make_subplots(specs=[[{"secondary_y": True}]])
    fig1 = px.line(lf_df[(lf_df['GEO']==x)], x="YEARMMDD", color='Labour force characteristics', y="VALUE" )
    fig2 = px.line(em_df[(em_df['GEO']==x)], x="YEARMMDD", color='Labour force characteristics', y="VALUE")
    fig3 = px.line(un_emp_rate[(un_emp_rate['GEO']==x)], x="YEARMMDD", color='Labour force characteristics',y="VALUE")
    fig4 = px.line(un_emp_rate[(un_emp_rate['GEO']=='Canada')], x="YEARMMDD", color='Labour force characteristics',y="VALUE")
    fig3.update_traces(yaxis="y2",name=x+' Unemployment rate')
    fig4.update_traces(yaxis="y2",name='Canada Unemployment rate')
    subfig.add_traces(fig1.data + fig2.data+fig3.data+fig4.data)
    subfig.layout.width=900
    subfig.layout.height=500
    subfig.layout.xaxis.title="Year"
    subfig.layout.yaxis.title="Labour force & Employment (thousands)"

    subfig.layout.yaxis.tickformat=','
    subfig.layout.yaxis2.type="log"
    subfig.layout.yaxis2.title="Unemployment rate"
    subfig.layout.yaxis2.ticksuffix="%"
    subfig.for_each_trace(lambda t: t.update(line=dict(color=t.marker.color)))
    subfig.layout.title=x+' Employment Statistics, 2000-2020'
    subfig.layout.legend=dict(
        yanchor="top",
        y=0.99,
        xanchor="left",
        x=0.01
    )
    subfig.show()

    
interact(show_prov_unemp,x=widgets.Dropdown(options=pgeo,value='Alberta', description='Provinces and Territories'))    



In [5]:
interact(show_prov_unemp,x=widgets.Dropdown(options=pgeo,value='Quebec', description='Provinces and Territories'))  


The first thing the plot above shows is that the labour force and employment of both Alberta and Canada kept increasing from 2000 to 2020.
The two unemployment rates followed a similar pattern between 2000 and 2015, and Alberta's rate was lower than Canada's over that period.
The two rates started to move in different directions around October/November 2014, and Alberta's unemployment rate has been higher than Canada's since December 2015.
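
The comparison described above can also be checked numerically instead of being read off the chart. The sketch below pivots a long table into one column per region and finds the first month in which the provincial rate exceeds the national one. The six monthly values per region are made up for illustration, and the names `rates`, `wide` and `first_above` are hypothetical.

```python
import pandas as pd

dates = pd.date_range("2015-09-01", periods=6, freq="MS")
rates = pd.DataFrame({
    "YEARMMDD": list(dates) * 2,
    "GEO": ["Alberta"] * 6 + ["Canada"] * 6,
    "VALUE": [6.6, 6.7, 6.9, 7.1, 7.5, 7.9,    # Alberta, percent (made up)
              7.1, 7.0, 7.1, 7.0, 7.2, 7.3],   # Canada, percent (made up)
})

# One column per region, indexed by month.
wide = rates.pivot(index="YEARMMDD", columns="GEO", values="VALUE")
wide["gap"] = wide["Alberta"] - wide["Canada"]

# First month in which the provincial rate is above the national rate.
above = wide.index[wide["gap"] > 0]
first_above = above.min() if len(above) else None
print(wide)
print("First month above the national rate:", first_above)
```

Applied to `un_emp_rate`, the same pivot would give the gap for any province in `pgeo`.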

### 2. What is the distribution of the provincial unemployment rate?

In [6]:
emp_simple_df=df_emp_bas[(df_emp_bas['Labour force characteristics'
                                    ].isin(['Unemployment rate','Labour force','Employment']))&
             (df_emp_bas['Sex']=='Both sexes')&
             (df_emp_bas['Age group']=='15 years and over')
             ]
un_emp_df=emp_simple_df[(emp_simple_df['Labour force characteristics']=='Unemployment rate')]
un_emp_df=un_emp_df.drop(['Sex','Age group'], axis=1)
un_emp_df=un_emp_df.groupby([un_emp_df['GEO'],un_emp_df['Labour force characteristics'],
                             un_emp_df['YEARMMDD'].dt.year]).agg({'VALUE':'mean'}).reset_index()
tpdf={'GEO':['Alberta','British Columbia','Manitoba','New Brunswick',
             'Newfoundland and Labrador','Northwest Territories','Nova Scotia',
             'Nunavut','Ontario','Prince Edward Island','Quebec','Saskatchewan','Yukon']}
can_provs = pd.DataFrame(tpdf, columns = ['GEO'])

In [7]:
with open(r'./Data/canada_provinces.geojson') as f:
    prov = json.load(f)
    

In [39]:


def func_bin(x):
    rt='No Data'
    if  (x<4) :
        rt='<4%'
    elif (x>=4) and (x<5):
        rt='4%-5%'
    elif (x>=5) and (x<6):
        rt='5%-6%'
    elif (x>=6) and (x<7):
        rt='6%-7%'
    elif (x>=7) and (x<8):
        rt='7%-8%'
    elif (x>=8) and (x<10):
        rt='8%-10%'
    elif (x>=10) and (x<12):
        rt='10%-12%'
    elif (x>=12):
        rt='>12%'
    return rt
def show_map(x=2011):
    
    map_df=un_emp_df[un_emp_df['YEARMMDD']==x]
    map_df=can_provs.merge(map_df,on='GEO',how='left')
    map_df['YEARMMDD']=map_df[['YEARMMDD']].fillna(value=x)
    map_df['Range']=map_df.apply(lambda x:func_bin(x['VALUE']), axis = 1)
    map_df['Rate']=map_df.apply(lambda x:str(round(x['VALUE'],2)), axis = 1)
    map_df['Labour force characteristics']=map_df[['Labour force characteristics']
                                                 ].fillna(value='Unemployment rate')
    map_df=map_df.sort_values(by=['VALUE'], ascending=False)
    
    fig = px.choropleth(map_df, geojson=prov, locations='GEO', color='Range',
                        featureidkey="properties.NAME",
                        hover_data =['Rate'],
                        hover_name='GEO',
                        projection='robinson',
                         color_discrete_map={'No Data':'#d9d9d9','<4%':'#fde0dd','4%-5%':'#fcc5c0',
                                             '5%-6%':'#fa9fb5','6%-7%':'#f768a1','7%-8%':'#dd3497',
                                            '8%-10%':'#ae017e','10%-12%':'#7a0177','>12%':'#49006a'},
                          )


    
    fig.update_layout(
    title={
        'text': 'Canada '+str(x)+' Unemployment rate',
        'y':1
        },
    margin={"r":0,"t":5,"l":0,"b":0})
    fig.update_layout()    
    fig.update_geos(fitbounds="locations", visible=False)
    fig.show()

years=range(2000,2021)
interact(show_map, x=widgets.Dropdown(options=years, value=2019,description='Year'))


In [40]:
years=range(2000,2021)
interact(show_map, x=widgets.Dropdown(options=years, value=2006,description='Year'));
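
The `if`/`elif` chain in `func_bin` could also be expressed with `pandas.cut`, which bins a whole column at once. This is an alternative sketch rather than the method used in the notebook. The helper name `bin_rates` is hypothetical, while the bin edges and labels are copied from `func_bin`.

```python
import numpy as np
import pandas as pd

# Same boundaries and labels as func_bin; right=False makes each bin [low, high).
edges = [-np.inf, 4, 5, 6, 7, 8, 10, 12, np.inf]
labels = ["<4%", "4%-5%", "5%-6%", "6%-7%", "7%-8%", "8%-10%", "10%-12%", ">12%"]

def bin_rates(values):
    """Map unemployment rates (in percent) to legend ranges; NaN becomes 'No Data'."""
    binned = pd.cut(
        pd.Series(values, dtype="float64"), bins=edges, labels=labels, right=False
    )
    return binned.cat.add_categories("No Data").fillna("No Data")

print(bin_rates([3.9, 4.0, 7.5, 12.0, np.nan]).tolist())
```

Because the result is an ordered categorical, sorting by it keeps the legend entries in range order rather than alphabetical order.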


### 3. The unemployment rate for different sex groups in Canada from year 2000 to year 2020

In [None]:
group_female= df_emp_bas[(df_emp_bas['Sex']=='Females')&
                       (df_emp_bas['Age group']=='15 years and over')&
                       (df_emp_bas['Labour force characteristics']=='Unemployment rate')&
                       (df_emp_bas['GEO']=='Canada')].reset_index()
group_female['VALUE%']=group_female['VALUE']/100
display(group_female)
group_male= df_emp_bas[(df_emp_bas['Sex']=='Males')&
                       (df_emp_bas['Age group']=='15 years and over')&
                       (df_emp_bas['Labour force characteristics']=='Unemployment rate')&
                       (df_emp_bas['GEO']=='Canada')].reset_index()
group_male['VALUE%']=group_male['VALUE']/100
#display(group_male)
fig = go.Figure()
fig.add_trace(go.Scatter(x=group_female['YEARMMDD'],y=group_female['VALUE%'],mode='lines',name='Females'))
fig.add_trace(go.Scatter(x=group_male['YEARMMDD'],y=group_male['VALUE%'],mode='lines',name='Males'))
fig.update_layout(title='Canada Unemployment Rate From Year 2000 to Year 2020 Among Different Sex Groups',
                 legend_title='Sex groups',
                 xaxis_title='Year 2000 to Year 2020',
                 yaxis_title='Unemployment rate')
fig.show()

From the graph above, the overall unemployment rate for females is lower than that for males. The unemployment rate for women ranges from about 0.049 (May 2019) to 0.135 (April 2020). The unemployment rate for men ranges from about 0.058 (February 2020) to 0.14 (May 2020). For both sexes, the highest peaks appear in 2020 and 2010. The sharp increase in 2020 is a consequence of the COVID-19 pandemic. According to the research (Roger S. McIntyre, Yena Lee, 2020), approximately one-third of the global population was under some form of lockdown or quarantine. In 2009, Canada also faced the financial crisis for the entire year (2). Although females have a lower unemployment rate than males, the chart shows that the trend of the unemployment rate over time does not depend on gender.
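
The lowest and highest values quoted above can be extracted programmatically rather than read from the hover labels. The following is a minimal sketch using `idxmin` and `idxmax` per sex group. The frame `toy` holds a handful of illustrative values, not the full series, and the helper name `extremes` is hypothetical.

```python
import pandas as pd

months = ["2019-05-01", "2020-02-01", "2020-04-01", "2020-05-01"]
toy = pd.DataFrame({
    "YEARMMDD": pd.to_datetime(months * 2),
    "Sex": ["Females"] * 4 + ["Males"] * 4,
    "VALUE": [4.9, 5.3, 13.5, 13.0,    # females, percent (illustrative)
              5.9, 5.8, 12.6, 14.0],   # males, percent (illustrative)
})

def extremes(df):
    """Return the lowest and highest rate per sex with the month each occurs in."""
    rows = []
    for sex, grp in df.groupby("Sex"):
        lo = grp.loc[grp["VALUE"].idxmin()]
        hi = grp.loc[grp["VALUE"].idxmax()]
        rows.append({
            "Sex": sex,
            "min": lo["VALUE"], "min_month": lo["YEARMMDD"],
            "max": hi["VALUE"], "max_month": hi["YEARMMDD"],
        })
    return pd.DataFrame(rows).set_index("Sex")

summary = extremes(toy)
print(summary)
```

Running the same helper on the concatenation of `group_female` and `group_male` would reproduce the figures quoted in the text, provided the `Sex` and `VALUE` columns are kept.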

In [None]:
def fun_show_unemp_rate_v(x):
    group_female= df_emp_bas[(df_emp_bas['Sex']=='Females')&
                       (df_emp_bas['Age group']=='15 years and over')&
                       (df_emp_bas['Labour force characteristics']=='Unemployment rate')&
                       (df_emp_bas['GEO']==x)].reset_index()
    group_female['VALUE%']=group_female['VALUE']/100
    #display(group_female)
    group_male= df_emp_bas[(df_emp_bas['Sex']=='Males')&
                       (df_emp_bas['Age group']=='15 years and over')&
                       (df_emp_bas['Labour force characteristics']=='Unemployment rate')&
                       (df_emp_bas['GEO']==x)].reset_index()
    group_male['VALUE%']=group_male['VALUE']/100    
    fig = go.Figure()
    fig.add_traces(go.Violin(
        y=group_female['VALUE%'],
        x=group_female['Sex'],
        name='Female Unemployment Rate',
        marker_color='#FF851B',
        box_visible=True,
        meanline_visible=True
    ))
    fig.add_traces(go.Violin(
        y=group_male['VALUE%'],
        x=group_male['Sex'],
        name='Male Unemployment Rate',
        marker_color='#3D9970',
        box_visible=True,
        meanline_visible=True
    ))
    fig.update_traces(points='all', jitter=0)
    fig.update_layout(title_text="Violin Plot for "+x+" from Year 2000 to Year 2020 by Different Sex Groups",
                      yaxis_title='Unemployment rate',
                      xaxis_title='Different Sex Groups')
    fig.show()

interact(fun_show_unemp_rate_v,x=widgets.Dropdown(options=pgeo,value='Alberta', description='Provinces and Territories'))
    


The violin plot above shows summary statistics of the unemployment rate for each sex from 2000 to 2020. Most values for females lie in the range 0.049 to 0.076; for males, most lie in the range 0.057 to 0.101. The overall mean is 0.065 for females and 0.0745 for males. The maximum value is 0.014 for females and 0.135 for males. A violin plot also provides more information than a box plot: it shows the density of the values, that is, how the unemployment rate is distributed over the 20 years. Reading the graph, the most frequent unemployment rate is about 0.068 for females and 0.076 for males. According to Statistics Canada, the lower unemployment rate for females is linked to the growth of service industries in Canada: the unemployment rate in service industries is lower than in the goods-producing sector, and about 88.4% of employed women worked in service industries (Statistics Canada, 2008).

ref: Statistics Canada (2008), Unemployment rates, by sex. Available at: https://www150.statcan.gc.ca/n1/pub/71-222-x/2008001/sectionb/b-unemployment-chomage-eng.htm
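
The statistics a violin plot summarises can also be computed directly. The sketch below derives quartiles, mean and maximum, and uses a histogram as a coarse stand-in for the density curve to locate the most populated range. The ten rates are made up, the one-point bin width is an arbitrary choice, and a histogram only approximates the kernel density that the violin draws.

```python
import numpy as np
import pandas as pd

# Made-up monthly unemployment rates in percent.
rates = pd.Series([5.0, 5.5, 6.5, 6.6, 6.8, 6.9, 7.0, 7.4, 9.0, 13.5])

# Box-plot style statistics.
quartiles = rates.quantile([0.25, 0.5, 0.75])
print("Q1, median, Q3:", quartiles.tolist())
print("mean:", rates.mean(), "max:", rates.max())

# Coarse density estimate: count values per one-point bin and pick the fullest.
counts, bin_edges = np.histogram(rates, bins=np.arange(4, 15, 1))
busiest = counts.argmax()
busiest_bin = (int(bin_edges[busiest]), int(bin_edges[busiest + 1]))
print("most populated bin:", busiest_bin)
```

The widest part of a violin corresponds roughly to the fullest bin, which is what the "highest frequency" values in the text refer to.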

### Unemployment rate by sex group for all Canadian provinces for years 2014, 2017 and 2020

In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

df_emp_bas['Year']=df_emp_bas['YEARMMDD'].dt.year
df_emp_bas['Month']=df_emp_bas['YEARMMDD'].dt.month
df_emp_bas['Day']=df_emp_bas['YEARMMDD'].dt.day

df_emp_basn = pd.DataFrame({'GEO':df_emp_bas['GEO'],'Year':df_emp_bas['Year'],'Month':df_emp_bas['Month'],
                           'Day':df_emp_bas['Day'],'VALUE':df_emp_bas['VALUE'],'Sex':df_emp_bas['Sex'],
                           'Age_group':df_emp_bas['Age group'],
                           'Labour_for_cecharacteristics':df_emp_bas['Labour force characteristics']})

all_2014_f= df_emp_basn [df_emp_basn.Sex.isin(['Females'])]
#display(ab_2014)
all_2014_f = all_2014_f[all_2014_f.Labour_for_cecharacteristics.isin(['Unemployment rate'])]
all_2014_f = all_2014_f[all_2014_f.Year.isin([2014])]
all_2014_f = all_2014_f[all_2014_f.GEO!='Canada']
all_2014_f= all_2014_f.groupby(['Year','Sex','GEO'])['VALUE'].mean().reset_index()
#display(ab_2014_f)

all_2014_m= df_emp_basn [df_emp_basn.Sex.isin(['Males'])]
all_2014_m = all_2014_m[all_2014_m.Labour_for_cecharacteristics.isin(['Unemployment rate'])]
all_2014_m = all_2014_m[all_2014_m.Year.isin([2014])]
all_2014_m = all_2014_m [all_2014_m .GEO!='Canada']

all_2014_m= all_2014_m.groupby(['Year','Sex','GEO'])['VALUE'].mean().reset_index()

#display(ab_2014_m)
fig = make_subplots(rows=3, cols=1,shared_xaxes=True,subplot_titles=("Year 2014","Year 2017","Year 2020"),
                    vertical_spacing=0.1)
fig.append_trace(go.Bar(y=all_2014_f['VALUE'],x=all_2014_f['GEO'],name="Females 2014",marker_color='blue'),row=1,col=1)
fig.append_trace(go.Bar(y=all_2014_m['VALUE'],x=all_2014_m['GEO'],name="Males 2014",marker_color='red'),row=1,col=1)

all_2017_f= df_emp_basn [df_emp_basn.Sex.isin(['Females'])]
all_2017_f = all_2017_f[all_2017_f.Labour_for_cecharacteristics.isin(['Unemployment rate'])]
all_2017_f = all_2017_f[all_2017_f.Year.isin([2017])]
all_2017_f = all_2017_f[all_2017_f.GEO!='Canada']
all_2017_f= all_2017_f.groupby(['Year','Sex','GEO'])['VALUE'].mean().reset_index()

all_2017_m= df_emp_basn [df_emp_basn.Sex.isin(['Males'])]
all_2017_m = all_2017_m[all_2017_m.Labour_for_cecharacteristics.isin(['Unemployment rate'])]
all_2017_m = all_2017_m[all_2017_m.Year.isin([2017])]
all_2017_m = all_2017_m [all_2017_m .GEO!='Canada']
all_2017_m= all_2017_m.groupby(['Year','Sex','GEO'])['VALUE'].mean().reset_index()

fig.append_trace(go.Bar(y=all_2017_f['VALUE'],x=all_2017_f['GEO'],name="Females 2017",marker_color='blue'),row=2,col=1)
fig.append_trace(go.Bar(y=all_2017_m['VALUE'],x=all_2017_m['GEO'],name="Males 2017",marker_color='red'),row=2,col=1)

all_2020_f= df_emp_basn [df_emp_basn.Sex.isin(['Females'])]
all_2020_f = all_2020_f[all_2020_f.Labour_for_cecharacteristics.isin(['Unemployment rate'])]
all_2020_f = all_2020_f[all_2020_f.Year.isin([2020])]
all_2020_f = all_2020_f[all_2020_f.GEO!='Canada']
all_2020_f= all_2020_f.groupby(['Year','Sex','GEO'])['VALUE'].mean().reset_index()

all_2020_m= df_emp_basn [df_emp_basn.Sex.isin(['Males'])]
all_2020_m = all_2020_m[all_2020_m.Labour_for_cecharacteristics.isin(['Unemployment rate'])]
all_2020_m = all_2020_m[all_2020_m.Year.isin([2020])]
all_2020_m = all_2020_m [all_2020_m .GEO!='Canada']
all_2020_m= all_2020_m.groupby(['Year','Sex','GEO'])['VALUE'].mean().reset_index()

fig.append_trace(go.Bar(y=all_2020_f['VALUE'],x=all_2020_f['GEO'],name="Females 2020",marker_color='blue'),row=3,col=1)
fig.append_trace(go.Bar(y=all_2020_m['VALUE'],x=all_2020_m['GEO'],name="Males 2020",marker_color='red'),row=3,col=1)
fig.update_xaxes(title_text="Province", row=3, col=1)
fig.update_yaxes(title_text="Unemployment Rate (%)", row=1, col=1)
fig.update_yaxes(title_text="Unemployment Rate (%)", row=2, col=1)
fig.update_yaxes(title_text="Unemployment Rate (%)", row=3, col=1)

fig.update_layout(height=700,
                  width=700,
                  title_text="Unemployment Rate by Sex Group Across Canadian Provinces for 2014, 2017 and 2020",
                  #yaxis_title='Unemployement Rate',
                  #xaxis_title='Different porvince',
                  legend_title='Sex Group',
                 )
fig.show()
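
The six near-identical filter blocks in the cell above could be folded into one helper. The following is a sketch, not the notebook's own code: the function name `annual_rate_by_sex` is hypothetical, and the `age_group` filter is an added assumption. The cell above does not filter on age group, so its means are taken over the rows of every age group.

```python
import pandas as pd

def annual_rate_by_sex(df, year, sex, age_group="15 years and over"):
    """Mean unemployment rate per province for one year and one sex group.

    `df` is expected to have the columns GEO, Year, Sex, Age_group,
    Labour_for_cecharacteristics and VALUE, as in `df_emp_basn`.
    """
    mask = (
        (df["Sex"] == sex)
        & (df["Year"] == year)
        & (df["Age_group"] == age_group)          # assumption, see lead-in
        & (df["Labour_for_cecharacteristics"] == "Unemployment rate")
        & (df["GEO"] != "Canada")                 # provinces only
    )
    return df[mask].groupby(["Year", "Sex", "GEO"])["VALUE"].mean().reset_index()

# Made-up rows to exercise the helper.
toy = pd.DataFrame({
    "GEO": ["Alberta", "Alberta", "Ontario", "Canada"],
    "Year": [2014, 2014, 2014, 2014],
    "Sex": ["Females"] * 4,
    "Age_group": ["15 years and over"] * 4,
    "Labour_for_cecharacteristics": ["Unemployment rate"] * 4,
    "VALUE": [5.0, 5.2, 7.0, 6.5],
})
result = annual_rate_by_sex(toy, 2014, "Females")
print(result)
```

With such a helper, the three subplot rows could be filled in a loop over `[2014, 2017, 2020]` and `["Females", "Males"]`.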

- The graph above presents the average unemployment rate for each sex in 2014, 2017 and 2020. In almost every province the unemployment rate for females is lower than for males, and the difference is most pronounced in Newfoundland and Labrador. In Alberta, the rates for the two sex groups were about the same in 2014 (5.197% for males and 5.076% for females). The male rate then rose faster: in 2017 the unemployment rate was 7.58% for females and 9.19% for males. By 2020 the gap had closed, with the rate at 13.43% for women and 13.1% for men.
- Another interesting finding is that in 2017 the female unemployment rate was lower than the male rate in every province, whereas in 2020 the two rates were closer in provinces such as Ontario, Quebec and Nova Scotia.

### 4. The unemployment rate for different age groups in Canada from year 2000 to year 2020

In [None]:
group_15_to_24 = df_emp_bas[(df_emp_bas['Sex']=='Both sexes')&
                       (df_emp_bas['Age group']=='15 to 24 years')&
                       (df_emp_bas['Labour force characteristics']=='Unemployment rate')&
                       (df_emp_bas['GEO']=='Canada')].reset_index()
display(group_15_to_24)
group_15_to_24['VALUE%']=group_15_to_24['VALUE']/100
#display(group_15_to_24)

group_25_to_54 = df_emp_bas[(df_emp_bas['Sex']=='Both sexes')&
                            (df_emp_bas['Age group']=='25 to 54 years')&
                             (df_emp_bas['Labour force characteristics']=='Unemployment rate')&
                            (df_emp_bas['GEO']=='Canada')].reset_index()
group_25_to_54['VALUE%']=group_25_to_54['VALUE']/100
#display(group_25_to_54)
group_55_to_64 = df_emp_bas[(df_emp_bas['Sex']=='Both sexes')&
                            (df_emp_bas['Age group']=='55 to 64 years')&
                             (df_emp_bas['Labour force characteristics']=='Unemployment rate')&
                            (df_emp_bas['GEO']=='Canada')].reset_index()
group_55_to_64['VALUE%']=group_55_to_64['VALUE']/100

fig = go.Figure()
fig.add_trace(go.Scatter(x=group_15_to_24['YEARMMDD'],y=group_15_to_24['VALUE%'],mode='lines',name='15 to 24years'))
fig.add_trace(go.Scatter(x=group_25_to_54['YEARMMDD'],y=group_25_to_54['VALUE%'],mode='lines',name='25 to 54 years'))
fig.add_trace(go.Scatter(x=group_55_to_64['YEARMMDD'],y=group_55_to_64['VALUE%'],mode='lines',name='55 to 64 years'))
fig.update_layout(title='Canada Unemployment Rate From Year 2000 to Year 2020 Among Different Age Groups',
                 legend_title='Age Groups',
                 xaxis_title='Year 2000 to Year 2020',
                 yaxis_title='Unemployment rate')
fig.show()

Because the trends for the 25 to 54 and 55 to 64 age groups are about the same, we use a box plot to verify the information.

In [None]:
group_15_to_24= group_15_to_24.groupby(['YEARMMDD','Age group'])['VALUE'].sum().reset_index()
#display(group_15_to_24)
group_25_to_54 = group_25_to_54.groupby(['YEARMMDD','Age group'])['VALUE'].sum().reset_index()
group_55_to_64 = group_55_to_64.groupby(['YEARMMDD','Age group'])['VALUE'].sum().reset_index()
fig = go.Figure()
group_15_to_24['VALUE%']=group_15_to_24['VALUE']/100
group_25_to_54['VALUE%']=group_25_to_54['VALUE']/100
group_55_to_64['VALUE%']=group_55_to_64['VALUE']/100

fig.add_trace(go.Box(
    y=group_15_to_24['VALUE%'],
    x=group_15_to_24['Age group'],
    name='Canada 15-24',
    marker_color='#3D9970',
    boxmean=True
))
fig.add_trace(go.Box(
    y=group_25_to_54['VALUE%'],
    x=group_25_to_54['Age group'],
    name='Canada 25-54',
    marker_color='#FF4136',
    boxmean=True
))
fig.add_trace(go.Box(
    y=group_55_to_64['VALUE%'],
    x=group_55_to_64['Age group'],
    name='Canada 55-64',
    marker_color='#FF851B',
    boxmean=True
))
fig.update_traces(boxpoints='all', jitter=0)
fig.update_layout(title_text="Box Plot for Canada from Year 2000 to Year 2020 by Different Age Groups",
                  yaxis_title='Unemployment rate',
                  xaxis_title='Different Age Groups')
fig.show()

The box plot gives more statistical detail than the line plot. It shows that the 15 to 24 age group has the highest mean (0.1323), median (0.1305), maximum (0.294) and minimum (0.102). For the 25 to 54 and 55 to 64 age groups, the maximum is only 0.118 and 0.122 respectively.

The 15 to 24 years group also has the largest outlier. Of the other two groups, the 55 to 64 years group has the slightly higher outlier: its maximum unemployment rate is 0.122, against 0.118 for the 25 to 54 years group.
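
Which points count as outliers depends on the rule used. The sketch below applies the common convention of flagging values more than 1.5 times the interquartile range beyond the quartiles. The helper name `iqr_outliers` and the sample values are made up, and the quartile interpolation used by the plotting library may differ slightly from the pandas default.

```python
import pandas as pd

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    s = pd.Series(values, dtype="float64")
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)]

# Made-up unemployment rates (as fractions) with one extreme month.
rates = [0.10, 0.11, 0.12, 0.13, 0.14, 0.29]
print(iqr_outliers(rates).tolist())
```

Applied to the `VALUE%` column of each age group, this lists the months that the box plot draws as separate points.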

In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

#display(df_emp_basn)

df_emp_bas['Year']=df_emp_bas['YEARMMDD'].dt.year
df_emp_bas['Month']=df_emp_bas['YEARMMDD'].dt.month
df_emp_bas['Day']=df_emp_bas['YEARMMDD'].dt.day

df_emp_basn = pd.DataFrame({'GEO':df_emp_bas['GEO'],'Year':df_emp_bas['Year'],'Month':df_emp_bas['Month'],
                           'Day':df_emp_bas['Day'],'VALUE':df_emp_bas['VALUE'],'Sex':df_emp_bas['Sex'],
                           'Age_group':df_emp_bas['Age group'],
                           'Labour_for_cecharacteristics':df_emp_bas['Labour force characteristics']})
#display(df_emp_basn)
alberta_2014_total = df_emp_basn [df_emp_basn.Age_group.isin(['15 to 24 years','25 to 54 years','55 years and over'])]
alberta_2014_total = alberta_2014_total[alberta_2014_total.GEO.isin(['Alberta'])]
alberta_2014_total = alberta_2014_total[alberta_2014_total.Labour_for_cecharacteristics.isin(['Unemployment rate'])]
alberta_2014_total = alberta_2014_total[alberta_2014_total.Year.isin([2014])]
#alberta_2014_10_total = alberta_2014_10_total[alberta_2014_10_total.Month.isin([10])]
alberta_2014_total = alberta_2014_total[alberta_2014_total.Sex.isin(['Both sexes'])]
alberta_2014_total= alberta_2014_total.groupby(['Year','Age_group'])['VALUE'].mean().reset_index()
alberta_2014_total['VALUE%']=alberta_2014_total['VALUE']/100
#display(alberta_2014_10_total)
#display(alberta_2014_10_55_over)
#display(albeta_2014_total)
CA_2014_10_total =df_emp_basn [df_emp_basn.Age_group.isin(['15 to 24 years','25 to 54 years','55 years and over'])]
CA_2014_10_total = CA_2014_10_total[CA_2014_10_total.GEO.isin(['Canada'])]
CA_2014_10_total = CA_2014_10_total[CA_2014_10_total.Labour_for_cecharacteristics.isin(['Unemployment rate'])]
CA_2014_10_total = CA_2014_10_total[CA_2014_10_total.Year.isin([2014])]
#BC_2014_10_total = BC_2014_10_total[BC_2014_10_total.Month.isin([10])]
CA_2014_10_total = CA_2014_10_total[CA_2014_10_total.Sex.isin(['Both sexes'])]
CA_2014_10_total= CA_2014_10_total.groupby(['Year','Age_group'])['VALUE'].mean().reset_index()
CA_2014_10_total['VALUE%']=CA_2014_10_total['VALUE']/100
#display(BC_2014_10_total)

AB_2017_10_total =df_emp_basn [df_emp_basn.Age_group.isin(['15 to 24 years','25 to 54 years','55 years and over'])]
AB_2017_10_total = AB_2017_10_total[AB_2017_10_total.GEO.isin(['Alberta'])]
AB_2017_10_total = AB_2017_10_total[AB_2017_10_total.Labour_for_cecharacteristics.isin(['Unemployment rate'])]
AB_2017_10_total = AB_2017_10_total[AB_2017_10_total.Year.isin([2017])]
#AB_2017_10_total = AB_2017_10_total[AB_2017_10_total.Month.isin([10])]
AB_2017_10_total = AB_2017_10_total[AB_2017_10_total.Sex.isin(['Both sexes'])]
AB_2017_10_total= AB_2017_10_total.groupby(['Year','Age_group'])['VALUE'].mean().reset_index()
AB_2017_10_total['VALUE%']=AB_2017_10_total['VALUE']/100

#display(AB_2017_10_total)

CA_2017_10_total =df_emp_basn [df_emp_basn.Age_group.isin(['15 to 24 years','25 to 54 years','55 years and over'])]
CA_2017_10_total = CA_2017_10_total[CA_2017_10_total.GEO.isin(['Canada'])]
CA_2017_10_total = CA_2017_10_total[CA_2017_10_total.Labour_for_cecharacteristics.isin(['Unemployment rate'])]
CA_2017_10_total = CA_2017_10_total[CA_2017_10_total.Year.isin([2017])]
#BC_2017_10_total = BC_2017_10_total[BC_2017_10_total.Month.isin([10])]
CA_2017_10_total = CA_2017_10_total[CA_2017_10_total.Sex.isin(['Both sexes'])]
CA_2017_10_total= CA_2017_10_total.groupby(['Year','Age_group'])['VALUE'].mean().reset_index()
CA_2017_10_total['VALUE%']=CA_2017_10_total['VALUE']/100

#display(AB_2017_10_total)
#fin = alberta_2014_10_total.append(BC_2014_10_total)
#fin = fin.append(AB_2017_10_total)
#display(fin)
AB_2020_6_total =df_emp_basn [df_emp_basn.Age_group.isin(['15 to 24 years','25 to 54 years','55 years and over'])]
AB_2020_6_total = AB_2020_6_total[AB_2020_6_total.GEO.isin(['Alberta'])]
AB_2020_6_total = AB_2020_6_total[AB_2020_6_total.Labour_for_cecharacteristics.isin(['Unemployment rate'])]
AB_2020_6_total = AB_2020_6_total[AB_2020_6_total.Year.isin([2020])]
#AB_2020_6_total = AB_2020_6_total[AB_2020_6_total.Month.isin([6])]
AB_2020_6_total = AB_2020_6_total[AB_2020_6_total.Sex.isin(['Both sexes'])]
AB_2020_6_total= AB_2020_6_total.groupby(['Year','Age_group'])['VALUE'].mean().reset_index()
AB_2020_6_total['VALUE%']=AB_2020_6_total['VALUE']/100


CA_2020_6_total =df_emp_basn [df_emp_basn.Age_group.isin(['15 to 24 years','25 to 54 years','55 years and over'])]
CA_2020_6_total = CA_2020_6_total[CA_2020_6_total.GEO.isin(['Canada'])]
CA_2020_6_total = CA_2020_6_total[CA_2020_6_total.Labour_for_cecharacteristics.isin(['Unemployment rate'])]
CA_2020_6_total = CA_2020_6_total[CA_2020_6_total.Year.isin([2020])]
#BC_2020_6_total = BC_2020_6_total[BC_2020_6_total.Month.isin([6])]
CA_2020_6_total = CA_2020_6_total[CA_2020_6_total.Sex.isin(['Both sexes'])]
CA_2020_6_total= CA_2020_6_total.groupby(['Year','Age_group'])['VALUE'].mean().reset_index()
CA_2020_6_total['VALUE%']=CA_2020_6_total['VALUE']/100

fig = make_subplots(rows=3, cols=1,subplot_titles=("Year 2014","Year 2017","Year 2020"),vertical_spacing=0.05,shared_xaxes=True)
fig.add_trace(go.Bar(x=alberta_2014_total["Age_group"],y=alberta_2014_total['VALUE%'],name='AB 2014',
                     marker_color='red'),
                     row=1,col=1)
fig.add_trace(go.Bar(x=CA_2014_10_total['Age_group'],y=CA_2014_10_total['VALUE%'],name='CA 2014',marker_color='blue'),
                     row=1,col=1)
fig.add_trace(go.Bar(x=AB_2017_10_total['Age_group'],y=AB_2017_10_total['VALUE%'],name='AB 2017',marker_color='red'),
                     row=2,col=1)
fig.add_trace(go.Bar(x=CA_2017_10_total['Age_group'],y=CA_2017_10_total['VALUE%'],name='CA 2017',marker_color='blue'),
                     row=2,col=1)
fig.add_trace(go.Bar(x=AB_2020_6_total['Age_group'],y=AB_2020_6_total['VALUE%'],name='AB 2020',marker_color='red'),
                     row=3,col=1)
fig.add_trace(go.Bar(x=CA_2020_6_total['Age_group'],y=CA_2020_6_total['VALUE%'],name='CA 2020',marker_color='blue'),
                     row=3,col=1)
fig.update_xaxes(title_text="Different Age Groups", row=3, col=1)
fig.update_yaxes(title_text="Unemployment Rate", row=1, col=1)
fig.update_yaxes(title_text="Unemployment Rate", row=2, col=1)
fig.update_yaxes(title_text="Unemployment Rate", row=3, col=1)

fig.update_layout(title='Unemployment Rate for AB and Canada for Years 2014, 2017, 2020 by Age Group',
                 #xaxis_title='Diffrent Age Groups',
                 #yaxis_title='Unemployement Rate',
                 legend_title='AB vs Canada',
                 height=800,
                 width = 700,
                 barmode='group')


fig.show()

The graph above shows the distribution of the unemployment rate for Alberta and Canada across age groups. In 2014, the average unemployment rate in Alberta was lower than in Canada for all age groups. The trend changed in 2017 and 2020, when Alberta's unemployment rate rose above the Canadian average. For example, in 2014 the unemployment rate for the 25 to 54 years age group was only 0.0385 in Alberta, compared with 0.057 in Canada. In 2020, it increased to 0.0959 in Alberta, while the average unemployment rate in Canada was 0.0818.
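The same filter-then-average steps are repeated above for each region and year. As a minimal sketch, they could be wrapped in one helper. The function name `mean_rate_by_age` and the sample values below are hypothetical and only illustrate the pattern; the column names follow `df_emp_basn`.

```python
import pandas as pd

AGE_GROUPS = ['15 to 24 years', '25 to 54 years', '55 years and over']

def mean_rate_by_age(df, geo, year):
    """Average unemployment rate per age group for one region and year."""
    mask = (
        df['Age_group'].isin(AGE_GROUPS)
        & (df['GEO'] == geo)
        & (df['Labour_for_cecharacteristics'] == 'Unemployment rate')
        & (df['Year'] == year)
        & (df['Sex'] == 'Both sexes')
    )
    out = df[mask].groupby(['Year', 'Age_group'])['VALUE'].mean().reset_index()
    out['VALUE%'] = out['VALUE'] / 100  # same scaling as in the cells above
    return out

# Made-up sample rows, for illustration only (not real survey values).
sample = pd.DataFrame({
    'GEO': ['Alberta', 'Alberta', 'Canada', 'Alberta'],
    'Year': [2020, 2020, 2020, 2020],
    'Sex': ['Both sexes'] * 4,
    'Age_group': ['25 to 54 years', '25 to 54 years',
                  '25 to 54 years', '15 to 24 years'],
    'Labour_for_cecharacteristics': ['Unemployment rate', 'Unemployment rate',
                                     'Unemployment rate', 'Employment'],
    'VALUE': [8.0, 10.0, 7.0, 50.0],
})

ab_2020 = mean_rate_by_age(sample, 'Alberta', 2020)
print(ab_2020)
```

With the real data, a call such as `mean_rate_by_age(df_emp_basn, 'Canada', 2017)` would be expected to play the role of `CA_2017_10_total`.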

Visualization for age groups for all provinces

In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

#display(df_emp_basn)

df_emp_bas['Year']=df_emp_bas['YEARMMDD'].dt.year
df_emp_bas['Month']=df_emp_bas['YEARMMDD'].dt.month
df_emp_bas['Day']=df_emp_bas['YEARMMDD'].dt.day

df_emp_basn = pd.DataFrame({'GEO':df_emp_bas['GEO'],'Year':df_emp_bas['Year'],'Month':df_emp_bas['Month'],
                           'Day':df_emp_bas['Day'],'VALUE':df_emp_bas['VALUE'],'Sex':df_emp_bas['Sex'],
                           'Age_group':df_emp_bas['Age group'],
                           'Labour_for_cecharacteristics':df_emp_bas['Labour force characteristics']})
all_province = df_emp_basn[df_emp_basn.Labour_for_cecharacteristics.isin(['Unemployment rate'])]
all_province = all_province[all_province.Age_group.isin(['15 to 24 years','25 to 54 years','55 years and over'])]
all_province = all_province [all_province.GEO!='Canada']

#display(all_province)

all_province = all_province.groupby(['Year','Age_group','GEO'])['VALUE'].mean().reset_index()
#all_province = df_emp_basn
#display(all_province)

fig = go.Figure()
fig.add_traces(go.Violin(x=all_province['GEO'][all_province['Age_group']=='15 to 24 years'],
                        y=all_province['VALUE'][all_province['Age_group']=='15 to 24 years'],
                         legendgroup='15-24', scalegroup='15-24', name='15-24 years',line_color='blue'))
fig.add_traces(go.Violin(x=all_province['GEO'][all_province['Age_group']=='25 to 54 years'],
                        y=all_province['VALUE'][all_province['Age_group']=='25 to 54 years'],
                         legendgroup='25-54', scalegroup='25-54', name='25-54 years',line_color='red'))
fig.add_traces(go.Violin(x=all_province['GEO'][all_province['Age_group']=='55 years and over'],
                        y=all_province['VALUE'][all_province['Age_group']=='55 years and over'],
                         legendgroup='55 and up', scalegroup='55 and up', name='55 and up',line_color='orange'))
fig.update_traces(box_visible=True, meanline_visible=True)
fig.update_layout(violinmode='group',
                  title_text="Violin Plot for Canada from Year 2000 to Year 2020 by Ages",
                  yaxis_title='Unemployment rate (%)',
                  xaxis_title='Province')
fig.show()

The plot shows that in every Canadian province the highest unemployment rate always occurs in the 15-24 years age group. Newfoundland and Labrador has the highest unemployment rate in each age group. Alberta has the lowest unemployment rate. However, Alberta has more extreme outliers than the other provinces, which indicates that Alberta's unemployment rate fluctuates more than that of the other provinces.
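The visual impression of fluctuation can be backed by a simple number. The sketch below, on made-up yearly means (not real survey values), computes the standard deviation and range per province; the frame `yearly` is a hypothetical stand-in for `all_province`.

```python
import pandas as pd

# Made-up yearly mean unemployment rates, for illustration only.
yearly = pd.DataFrame({
    'GEO': ['Alberta'] * 3 + ['Ontario'] * 3,
    'VALUE': [4.0, 8.0, 6.0, 6.0, 6.5, 7.0],
})

# Spread of the yearly means per province: a larger std or range
# means the rate moved around more over the years.
spread = (yearly.groupby('GEO')['VALUE']
          .agg(['mean', 'std', 'min', 'max'])
          .reset_index())
spread['range'] = spread['max'] - spread['min']

most_variable = spread.sort_values('std', ascending=False)['GEO'].iloc[0]
print(spread)
print(most_variable)
```

Applied to `all_province`, grouping by both `GEO` and `Age_group` would give one spread value per violin.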

### 4. What is employment distribution for industries?

In [None]:
# global variables for figure size
wid = 1500
hei = 1000
mar = 300

In [None]:
#Data wrangling for industry data
df_emp_ind['YEARMMDD'] = pd.to_datetime(df_emp_ind['YEARMMDD'])
date = pd.Series(df_emp_ind['YEARMMDD'])
date = pd.to_datetime(date)

Y = date.dt.year
M = date.dt.month
D = date.dt.day

df_emp_ind1 = pd.DataFrame({'NAICS' : df_emp_ind['North American Industry Classification System (NAICS)'], 
                            'YEAR' : Y,'MONTH':M,'DAY':D, 'GEO':df_emp_ind['GEO'],'VALUE' : df_emp_ind['VALUE']})
df_emp_ind1 = df_emp_ind1.drop(['MONTH','DAY'], axis = 1)

df_emp_ind1 = df_emp_ind1.drop(df_emp_ind1.loc[df_emp_ind1["NAICS"] 
                                               == "Total employed, all industries"].index)# Data without Total employed,all industries
df_emp_ind_pr = df_emp_ind1.drop(df_emp_ind1.loc[df_emp_ind1["GEO"] 
                                                 == "Canada"].index)

df_emp_ind_ca = df_emp_ind1.drop(df_emp_ind1.loc[df_emp_ind1["GEO"] 
                                                 != "Canada"].index)# Canada data for all industries
df_emp_ind_ca = df_emp_ind_ca.drop(['GEO'],axis = 1)
df_emp_ind_ab = df_emp_ind1.drop(df_emp_ind1.loc[df_emp_ind1["GEO"] 
                                                 != "Alberta"].index)# Alberta data for all industries
df_emp_ind_ab = df_emp_ind_ab.drop(['GEO'],axis = 1)


# 2020 data for Alberta and Canada
df_emp_ind_ca2020 = df_emp_ind_ca.drop(df_emp_ind_ca[df_emp_ind_ca["YEAR"] != 2020].index)# Canada data for all industries in 2020
df_emp_ind_ab2020 = df_emp_ind_ab.drop(df_emp_ind_ab[df_emp_ind_ab["YEAR"] != 2020].index)# Alberta data for all industries in 2020
df_emp_ind_ca2020 = df_emp_ind_ca2020.drop(['YEAR'],axis = 1)
df_emp_ind_ab2020 = df_emp_ind_ab2020.drop(['YEAR'],axis = 1)
df_emp_ind_ab2020 = df_emp_ind_ab2020.groupby(['NAICS']).sum().reset_index()
df_emp_ind_ca2020 = df_emp_ind_ca2020.groupby(['NAICS']).sum().reset_index()
df_emp_ind_ca2020.sort_values(by=['VALUE'],ascending = False,inplace = True)
df_emp_ind_ab2020.sort_values(by=['VALUE'],ascending = False,inplace = True)

#Calculate percentage of each industry
df_emp_ind_ca2020['PERCENTAGE'] = df_emp_ind_ca2020['VALUE']/df_emp_ind_ca2020['VALUE'].sum()
df_emp_ind_ab2020['PERCENTAGE'] = df_emp_ind_ab2020['VALUE']/df_emp_ind_ab2020['VALUE'].sum()


# Plot industry distribution of Canada and Alberta
df_emp_ind_ca2020B = go.Bar(y = df_emp_ind_ca2020['NAICS'], x = df_emp_ind_ca2020['PERCENTAGE'],
                            marker_color='indianred',  name  = 'Canada',orientation = 'h')
df_emp_ind_ab2020B = go.Bar(y = df_emp_ind_ab2020['NAICS'], x = df_emp_ind_ab2020['PERCENTAGE'],
                            marker_color='lightsalmon',  name  = 'Alberta',orientation = 'h')

# margin is passed as a plain dict; go.Margin is a deprecated alias
layout18 = go.Layout(barmode='group', title='Distribution of Industry in Canada and Alberta',
                     yaxis=dict(title='Industry'), xaxis=dict(title='Percentage of Industry'),
                     width=wid, height=hei, margin=dict(l=400, r=mar, b=mar, t=mar, pad=4))
df_emp_ind_ca_ab2020 = [df_emp_ind_ca2020B,df_emp_ind_ab2020B]

fig18 = go.Figure(data = df_emp_ind_ca_ab2020, layout = layout18)
py.iplot(fig18)


From the graph above we can conclude that the Goods-producing sector, Construction, and Forestry, fishing, mining, quarrying, oil and gas account for a larger share of employment in Alberta than in Canada. The chart also shows the ranking of industries for Canada. Hovering over the bars shows each industry's share in 2020.

In [None]:
df_emp_ind_ca2020pie = df_emp_ind_ca2020.reset_index()
df_emp_ind_ab2020pie =df_emp_ind_ab2020.reset_index()
fig = make_subplots(rows=1,cols=2,specs=[[{'type':'domain'}, {'type':'domain'}]])
fig.add_trace(go.Pie(labels=df_emp_ind_ca2020pie['NAICS'],values=df_emp_ind_ca2020pie['PERCENTAGE'], name="Canada Employment"),
              1, 1)
fig.add_trace(go.Pie(labels=df_emp_ind_ab2020pie['NAICS'], values=df_emp_ind_ab2020pie['PERCENTAGE'], name="Alberta Employment"),
              1, 2)
fig.update_traces(hole=.4, hoverinfo="label+percent+name")
fig.update_layout(
    title_text="Canada and Alberta Employment",
    
    annotations=[dict(text='Canada', x=0.18, y=0.5, font_size=20, showarrow=False),
                 dict(text='Alberta', x=0.82, y=0.5, font_size=20, showarrow=False)])
fig.show()

The charts above show that the employment structures of Alberta and Canada are similar, except that Alberta has a larger share in the construction and the forestry, fishing, mining, quarrying, oil and gas sectors, whereas manufacturing ranks lower.
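The difference between the two structures can also be read off as numbers. The following sketch uses made-up employment counts and computes each industry's share within its region, then subtracts the Canadian share from the Alberta share. The frame `emp` and the column names `SHARE` and `AB_minus_CA` are hypothetical.

```python
import pandas as pd

# Made-up employment counts, for illustration only.
emp = pd.DataFrame({
    'GEO': ['Canada', 'Canada', 'Alberta', 'Alberta'],
    'NAICS': ['Construction', 'Manufacturing', 'Construction', 'Manufacturing'],
    'VALUE': [300.0, 700.0, 60.0, 40.0],
})

# Share of each industry within its own region.
emp['SHARE'] = emp['VALUE'] / emp.groupby('GEO')['VALUE'].transform('sum')

# One row per industry, one column per region, then the gap between them.
shares = emp.pivot(index='NAICS', columns='GEO', values='SHARE')
shares['AB_minus_CA'] = shares['Alberta'] - shares['Canada']
print(shares)
```

A positive `AB_minus_CA` marks an industry that is over-represented in Alberta relative to Canada.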

To get an overall view of each industry's contribution to employment across all provinces in Canada, a stacked bar chart is used.

In [None]:
df_emp_ind12020 = df_emp_ind1.drop(df_emp_ind1.loc[df_emp_ind1["YEAR"] != 2020].index)
df_emp_ind12020 = df_emp_ind12020.drop(['YEAR'],axis = 1)
df_emp_ind12020 = df_emp_ind12020.groupby(['GEO','NAICS']).sum().reset_index()
#display(df_emp_ind12020)
# Work on a copy so the per-industry frame above is not modified
df_emp_ind12020_others = df_emp_ind12020.copy()
other_industries = ['Utilities [22]','Wholesale and retail trade [41, 44-45]',
                    'Transportation and warehousing [48-49]','Other services (except public administration) [81]',
                    'Professional, scientific and technical services [54]',
                    'Information, culture and recreation [51, 71]','Accommodation and food services [72]']
# Relabel the smaller industries in a single .loc call (avoids chained assignment)
df_emp_ind12020_others.loc[df_emp_ind12020_others['NAICS'].isin(other_industries), 'NAICS'] = 'OTHER'

df_emp_ind12020_others = df_emp_ind12020_others.groupby(['GEO','NAICS'])['VALUE'].sum().reset_index()
# Total employment per region (sum only the numeric VALUE column)
df_emp_ind12020_othersT = df_emp_ind12020_others.groupby(['GEO'])['VALUE'].sum().reset_index()

df_emp_ind12020_othersT = df_emp_ind12020_othersT.rename(columns={"VALUE":"Total"})
df_emp_indT1 = pd.merge(df_emp_ind12020_others, df_emp_ind12020_othersT,
              on=['GEO'], how='inner')
df_emp_indT = pd.DataFrame(df_emp_indT1)

df_emp_indT['PERCENTAGE'] = df_emp_indT['VALUE']/df_emp_indT['Total']
df_emp_indT=df_emp_indT.sort_values(by=['PERCENTAGE'], ascending=False)
df_emp_indT2= px.bar(df_emp_indT, y = df_emp_indT['PERCENTAGE'], x = df_emp_indT['GEO'], color = 'NAICS')
layout19=go.Layout(barmode='stack', title='Percent of Employment by Industry in Canada and Provinces (2020)',\
                 xaxis=dict(title='Region',titlefont=dict(size=15)), yaxis=dict(title='Employment (%)',titlefont=dict(size=15)))
fig19 = go.Figure(data=df_emp_indT2, layout=layout19)
py.iplot(fig19) #shows the figure

From this percent bar graph we notice that the share of each industry varies from province to province. The trends for the provinces follow the trend observed for Canada. This is because of the availability of resources in every province. For example, British Columbia (BC), Manitoba (MB), Quebec (QC), and Newfoundland (NL) use hydro to generate their electricity because they have abundant water. Similarly, Alberta (AB), Saskatchewan (SK) and Nova Scotia (NS) use coal for their electricity generation, as most of Canada's coal mines are located in these three provinces. It can also be observed that Prince Edward Island (PE) generates most of its electricity using wind. Additionally, Nunavut (NU) uses oil and diesel, as it has no significant primary energy production and relies on imported oil for its energy needs.

To take a closer look at each industry, `interact` is used as follows.

In [None]:
def fun_show_unemp_ind_v(x):
    df_emp_ind12020 = df_emp_ind1.drop(df_emp_ind1.loc[df_emp_ind1["YEAR"] != 2020].index)
    df_emp_ind12020 = df_emp_ind12020.drop(['YEAR'],axis = 1)
    df_emp_ind12020 = df_emp_ind12020.groupby(['GEO','NAICS']).sum()
    df_emp_ind120201 = df_emp_ind12020['VALUE'].groupby(['GEO']).sum().reset_index()
    df_emp_ind120201 = df_emp_ind120201.rename(columns={"VALUE":"Total"})
    df_emp_ind12020 = df_emp_ind12020.reset_index()
    df_emp_indT = pd.merge(df_emp_ind12020, df_emp_ind120201,
             on=['GEO'], how='inner')
    df_emp_indT = pd.DataFrame(df_emp_indT)
    df_emp_ind_data = df_emp_indT
    df_emp_indT = df_emp_indT[df_emp_indT['NAICS']==x].reset_index()
    df_emp_indT['PERCENTAGE'] = df_emp_indT['VALUE']/df_emp_indT['Total']
    industry = df_emp_indT['NAICS'].unique()
    
#display(df_emp_ind120201)

    df_emp_indT= px.bar(df_emp_indT, y = df_emp_indT['PERCENTAGE'], x = df_emp_indT['GEO'])
    layout19=go.Layout( title='Percent of Employment by Industry in Canada and Provinces (2020)',
                 xaxis=dict(title='Region',titlefont=dict(size=15)), yaxis=dict(title='Employment (%)',titlefont=dict(size=15)))
    fig19 = go.Figure(data=df_emp_indT, layout=layout19)
    fig19.update_yaxes(range=[0, 0.5])
    py.iplot(fig19) 
    
# Build the dropdown options from the unmodified industry names
ind_list = df_emp_ind1['NAICS'].unique()    
interact(fun_show_unemp_ind_v,x=widgets.Dropdown(options=ind_list,value='Services-producing sector', description='Industries'))

Having looked at the distribution of employment in 2020, we use a line chart to display the trend for each industry over the years.

In [None]:
def dis_top_n_ind(TopN=10):
    # Drop the all-industry total and the national rows so only provincial industry rows remain
    naics_col = "North American Industry Classification System (NAICS)"
    df_emp_indca_ = df_emp_ind[(df_emp_ind[naics_col] != "Total employed, all industries")
                               & (df_emp_ind["GEO"] != "Canada")]
    # Sum employment over the provinces for each industry and date
    df_emp_ind_ca_10 = df_emp_indca_.groupby([naics_col, 'YEARMMDD'])['VALUE'].sum().reset_index()
    # Keep the TopN largest industries at each date
    df_emp_ind_ca_10 = df_emp_ind_ca_10.groupby('YEARMMDD').apply(
        lambda x: x.nlargest(TopN, 'VALUE')).reset_index(drop=True) 

    #Plot the trend
    data=[]
    #add source names in a list
    l=df_emp_ind_ca_10['North American Industry Classification System (NAICS)'].unique()

    for i in l:
        source=df_emp_ind_ca_10 [(df_emp_ind_ca_10['North American Industry Classification System (NAICS)']==i)]
        source.reset_index(level=0, inplace=True) 
        sourcegen = go.Scatter(x=source['YEARMMDD'], y = source['VALUE'], name=i, mode = 'lines', hoverinfo ='y+name')
        data.append(sourcegen)
    #sets the layout for the plot 
    layout = dict(title = 'Employment from 2000-2020 in Canada', xaxis = dict(title = 'Year'),
                  yaxis = dict(title = 'Employment', hoverformat = '.2f'))
    fig = dict(data=data, layout=layout)
    py.iplot(fig) 





# The keyword must match the function's parameter name (TopN)
interact(dis_top_n_ind, TopN=widgets.IntSlider(min=3, max=20, step=1, value=10));


From the graph, we notice that most industries stay at roughly the same employment level. The services-producing sector grows rapidly over the years. At the beginning of 2020, employment in all industries drops dramatically. In 2008, all industries experienced either a small decrease or no change because of the global financial crisis. Furthermore, manufacturing is the only industry with a downward trend, which means that manufacturing employment is declining in Canada.
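The "top N industries at each date" selection behind this chart can be sketched without `apply`, by sorting and taking the first rows of each group. The helper name `top_n_per_date`, the short column name `NAICS`, and the values below are hypothetical and for illustration only.

```python
import pandas as pd

# Made-up monthly employment by industry, for illustration only.
monthly = pd.DataFrame({
    'YEARMMDD': pd.to_datetime(['2020-01-01'] * 3 + ['2020-02-01'] * 3),
    'NAICS': ['A', 'B', 'C'] * 2,
    'VALUE': [5.0, 9.0, 7.0, 8.0, 2.0, 6.0],
})

def top_n_per_date(df, n):
    """Return the n rows with the largest VALUE for each date."""
    ranked = df.sort_values(['YEARMMDD', 'VALUE'], ascending=[True, False])
    return ranked.groupby('YEARMMDD').head(n).reset_index(drop=True)

top2 = top_n_per_date(monthly, 2)
print(top2)
```

Note that an industry can enter or leave the top N from one date to the next, which is why some lines in such a chart may have gaps.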

To better understand employment across provinces, the employment growth rate from 2000 to 2019 is calculated.

In [None]:
df_emp_ind12000 = df_emp_ind1[df_emp_ind1['YEAR']==2000]
df_emp_ind12000 = df_emp_ind12000.set_index('GEO')
df_emp_ind12000= df_emp_ind12000.drop(['YEAR','NAICS'],axis =1)
df_emp_ind12000 =df_emp_ind12000.groupby(['GEO'],sort =False).sum()
df_emp_ind12000 = df_emp_ind12000.rename(columns={'VALUE':'2000'})
df_emp_ind12000.reset_index(level=0,inplace=True)

df_emp_ind12019 = df_emp_ind1[df_emp_ind1['YEAR']==2019]
df_emp_ind12019 = df_emp_ind12019.set_index('GEO')
df_emp_ind12019= df_emp_ind12019.drop(['YEAR','NAICS'],axis =1)
df_emp_ind12019 =df_emp_ind12019.groupby(['GEO'],sort=False).sum()
df_emp_ind12019 = df_emp_ind12019.rename(columns={'VALUE':'2019'})
df_emp_ind12019.reset_index(level=0,inplace=True)
#df_emp_ind12019=df_emp_ind12019.drop(['GEO'],axis=1)

# Join the 2000 and 2019 totals on region
df_emp_ind1c = df_emp_ind12019.merge(df_emp_ind12000, on=['GEO'],how='outer')

df_emp_ind1c.reset_index(level=0,inplace=True)

# Percent change from 2000 to 2019, computed column-wise
df_emp_ind1c['Percent Increase'] = ((df_emp_ind1c['2019']-df_emp_ind1c['2000'])/df_emp_ind1c['2000'])*100



In [None]:
percentage = go.Bar(x=df_emp_ind1c['GEO'], y = df_emp_ind1c['Percent Increase'], name='Percent Increase',marker_color ='blue')
data=[percentage]
#sets the layout for the plot
layout=go.Layout(barmode='group', title='Percent Increase in Employment from 2000 to 2019 for Canada and Provinces',
    xaxis=dict(title='Region', titlefont=dict(size=13)), yaxis=dict(title='Percent Increase (%)', titlefont=dict(size=13)))
fig = go.Figure(data=data, layout=layout)
py.iplot(fig) #shows the plot

The graph above shows that all provinces in Canada experienced some growth. Alberta has the highest employment growth rate, about 47.84%. It is followed by BC, Ontario and Quebec, with growth rates of 32.44%, 28.10% and 27.60% respectively. New Brunswick and Nova Scotia, on the other hand, have the lowest growth rates, 7.50% and 13.27% respectively. Alberta and BC are the only provinces with growth rates above that of Canada. Hence we can say that Alberta and BC contribute the most to employment growth.

In [None]:
df_emp_ind12000I = df_emp_ind1[df_emp_ind1['YEAR']==2000]
df_emp_ind12000I = df_emp_ind12000I.set_index('NAICS')
df_emp_ind12000I= df_emp_ind12000I.drop(['YEAR','GEO'],axis =1)
df_emp_ind12000I =df_emp_ind12000I.groupby(['NAICS'],sort =False).sum()
df_emp_ind12000I = df_emp_ind12000I.rename(columns={'VALUE':'2000'})
df_emp_ind12000I.reset_index(level=0,inplace=True)

df_emp_ind12019I = df_emp_ind1[df_emp_ind1['YEAR']==2019]
df_emp_ind12019I = df_emp_ind12019I.set_index('NAICS')
df_emp_ind12019I= df_emp_ind12019I.drop(['YEAR','GEO'],axis =1)
df_emp_ind12019I =df_emp_ind12019I.groupby(['NAICS'],sort=False).sum()
df_emp_ind12019I = df_emp_ind12019I.rename(columns={'VALUE':'2019'})
df_emp_ind12019I.reset_index(level=0,inplace=True)
#df_emp_ind12019=df_emp_ind12019.drop(['GEO'],axis=1)

# Join the 2000 and 2019 totals on industry
df_emp_ind1cI = df_emp_ind12019I.merge(df_emp_ind12000I, on=['NAICS'],how='outer')

df_emp_ind1cI.reset_index(level=0,inplace=True)

# Percent change from 2000 to 2019, computed column-wise
df_emp_ind1cI['Percent Increase'] = ((df_emp_ind1cI['2019']-df_emp_ind1cI['2000'])/df_emp_ind1cI['2000'])*100

In [None]:
percentage = go.Bar(x=df_emp_ind1cI['NAICS'], y = df_emp_ind1cI['Percent Increase'], name='Percent Increase',marker_color ='green')
data=[percentage]
#sets the layout for the plot
layout=go.Layout(barmode='group', title='Percent Increase in Employment from 2000 to 2019 for Industries',
    xaxis=dict(title='Industries', titlefont=dict(size=13)), yaxis=dict(title='Percent Increase (%)', titlefont=dict(size=13)),margin=dict(
        l=50,
        r=50,
        b=100,
        t=100,
        pad=4
    ), autosize=False,
    width=1500,
    height=1000)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig) #shows the plot

Agriculture and manufacturing have negative growth rates, while construction has the highest growth rate. Professional, scientific and technical services and health care and social assistance have growth rates over 50%.

According to the trend line in the first part, there is a drop from 2015 to 2016. Hence the employment growth rate by province is calculated for that period.

In [None]:
df_emp_ind12015 = df_emp_ind1[df_emp_ind1['YEAR']==2015]
df_emp_ind12015 = df_emp_ind12015.set_index('GEO')
df_emp_ind12015= df_emp_ind12015.drop(['YEAR','NAICS'],axis =1)
df_emp_ind12015 =df_emp_ind12015.groupby(['GEO'],sort =False).sum()
df_emp_ind12015 = df_emp_ind12015.rename(columns={'VALUE':'2015'})
df_emp_ind12015.reset_index(level=0,inplace=True)

df_emp_ind12016 = df_emp_ind1[df_emp_ind1['YEAR']==2016]
df_emp_ind12016 = df_emp_ind12016.set_index('GEO')
df_emp_ind12016= df_emp_ind12016.drop(['YEAR','NAICS'],axis =1)
df_emp_ind12016 =df_emp_ind12016.groupby(['GEO'],sort=False).sum()
df_emp_ind12016 = df_emp_ind12016.rename(columns={'VALUE':'2016'})
df_emp_ind12016.reset_index(level=0,inplace=True)
#df_emp_ind12019=df_emp_ind12019.drop(['GEO'],axis=1)

# Join the 2015 and 2016 totals on region
df_emp_ind1c_14 = df_emp_ind12016.merge(df_emp_ind12015, on=['GEO'],how='outer')

df_emp_ind1c_14.reset_index(level=0,inplace=True)

# Percent change from 2015 to 2016, computed column-wise
df_emp_ind1c_14['Percent Increase'] = ((df_emp_ind1c_14['2016']-df_emp_ind1c_14['2015'])/df_emp_ind1c_14['2015'])*100



In [None]:
percentage = go.Bar(x=df_emp_ind1c_14['GEO'], y = df_emp_ind1c_14['Percent Increase'], name='Percent Increase',marker_color ='blue')
data14=[percentage]
#sets the layout for the plot
layout=go.Layout(barmode='group', title='Percent Increase in Employment from 2015 to 2016 for Canada and Provinces',
    xaxis=dict(title='Region', titlefont=dict(size=13)), yaxis=dict(title='Percent Increase (%)', titlefont=dict(size=13)))
fig1 = go.Figure(data=data14, layout=layout)
py.iplot(fig1) #shows the plot

We can tell that Canada experienced rough years. Seven provinces had negative growth. Prince Edward Island performed the worst and British Columbia the best, although even there employment increased by less than 3 percent. To further examine what happened in Alberta during those years, the employment growth rate by industry is visualized.

In [None]:
df_emp_ind12015I = df_emp_ind1[df_emp_ind1['YEAR']==2015]
df_emp_ind12015I = df_emp_ind12015I[df_emp_ind12015I['GEO']=='Alberta']
df_emp_ind12015I = df_emp_ind12015I.set_index('NAICS')
df_emp_ind12015I= df_emp_ind12015I.drop(['YEAR','GEO'],axis =1)
df_emp_ind12015I =df_emp_ind12015I.groupby(['NAICS'],sort =False).sum()
df_emp_ind12015I = df_emp_ind12015I.rename(columns={'VALUE':'2015'})
df_emp_ind12015I.reset_index(level=0,inplace=True)

df_emp_ind12016I = df_emp_ind1[df_emp_ind1['YEAR']==2016]
df_emp_ind12016I = df_emp_ind12016I[df_emp_ind12016I['GEO']=='Alberta']

df_emp_ind12016I = df_emp_ind12016I.set_index('NAICS')
df_emp_ind12016I= df_emp_ind12016I.drop(['YEAR','GEO'],axis =1)
df_emp_ind12016I =df_emp_ind12016I.groupby(['NAICS'],sort=False).sum()
df_emp_ind12016I = df_emp_ind12016I.rename(columns={'VALUE':'2016'})
df_emp_ind12016I.reset_index(level=0,inplace=True)
#df_emp_ind12019=df_emp_ind12019.drop(['GEO'],axis=1)

# Join the 2015 and 2016 Alberta totals on industry
df_emp_ind1cI15 = df_emp_ind12016I.merge(df_emp_ind12015I, on=['NAICS'],how='outer')

df_emp_ind1cI15.reset_index(level=0,inplace=True)

# Percent change from 2015 to 2016, computed column-wise
df_emp_ind1cI15['Percent Increase'] = ((df_emp_ind1cI15['2016']-df_emp_ind1cI15['2015'])/df_emp_ind1cI15['2015'])*100

In [None]:
percentage = go.Bar(x=df_emp_ind1cI15['NAICS'], y = df_emp_ind1cI15['Percent Increase'], name='Percent Increase',marker_color ='green')
data3=[percentage]
#sets the layout for the plot
layout=go.Layout(barmode='group', title='Percent Increase in Employment from 2015 to 2016 for Industries in Alberta',
    xaxis=dict(title='Industries', titlefont=dict(size=13)), yaxis=dict(title='Percent Increase (%)', titlefont=dict(size=13)),margin=dict(
        l=50,
        r=50,
        b=100,
        t=100,
        pad=4
    ), autosize=False,
    width=1500,
    height=1000)
fig = go.Figure(data=data3, layout=layout)
py.iplot(fig) #shows the plot

From the graph, we can tell that agriculture; forestry, fishing, mining, quarrying, oil and gas; and manufacturing each decreased by over 10%. Educational services increased by about 8.7%. The results are consistent with the unemployment rate: the oil price crash caused the unemployment rate to increase rapidly.

Now we know that over the past 19 years employment has grown. But what happened after the COVID-19 outbreak in Canada? We know that there was a dramatic decrease, but how bad was it?

In [None]:
df_emp_ind1_2020p = df_emp_ind1[df_emp_ind1['YEAR']==2020]
df_emp_ind1_2020p = df_emp_ind1_2020p.set_index('GEO')
df_emp_ind1_2020p= df_emp_ind1_2020p.drop(['YEAR','NAICS'],axis =1)
df_emp_ind1_2020p =df_emp_ind1_2020p.groupby(['GEO']).sum()
df_emp_ind1_2020p = df_emp_ind1_2020p.rename(columns={'VALUE':'2020'})
df_emp_ind1_2020p.reset_index(level=0,inplace=True)


df_emp_ind1_2019P = df_emp_ind1[df_emp_ind1['YEAR']==2019]
df_emp_ind1_2019P = df_emp_ind1_2019P.set_index('GEO')
df_emp_ind1_2019P= df_emp_ind1_2019P.drop(['YEAR','NAICS'],axis =1)
df_emp_ind1_2019P =df_emp_ind1_2019P.groupby(['GEO']).sum()
df_emp_ind1_2019P = df_emp_ind1_2019P.rename(columns={'VALUE':'2019'})
df_emp_ind1_2019P.reset_index(level=0,inplace=True)
df_emp_ind1_2019P = df_emp_ind1_2019P.drop(['GEO'],axis =1)


df_emp_ind1_cp = pd.concat([df_emp_ind1_2020p,df_emp_ind1_2019P], axis=1)
df_emp_ind1_cp.reset_index(level=0,inplace=True)

# Percent change from 2019 to 2020, computed column-wise
df_emp_ind1_cp['Percent Increase'] = ((df_emp_ind1_cp['2020']-df_emp_ind1_cp['2019'])/df_emp_ind1_cp['2019'])*100

df_emp_ind1_cp = df_emp_ind1_cp.sort_values(by = ['Percent Increase'])

In [None]:
GEOpercentage = go.Bar(y=df_emp_ind1_cp['GEO'], x = df_emp_ind1_cp['Percent Increase'], name='Percent Increase',orientation = 'h',marker_color ='blue')
data1=[GEOpercentage]
#sets the layout for the plot
layout=go.Layout(barmode='group', title='Percent Increase in Employment from 2019 to 2020 for Canada and Provinces',
    xaxis=dict(title='Percent Increase (%)', titlefont=dict(size=13)), yaxis=dict(title='Provinces', titlefont=dict(size=13)))
fig = go.Figure(data=data1, layout=layout)
py.iplot(fig) #shows the plot

For Canada and all the provinces, employment decreased by more than 35%, based on the yearly sums of the monthly values. Alberta and BC have the largest decreases.
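The percentages above compare yearly sums of monthly values. If the 2020 data cover fewer months than 2019 (an assumption here, not checked against the dataset), a sum-based comparison overstates the drop, and comparing monthly averages may be more informative. The sketch below uses made-up numbers; `pct_change_by_year` is a hypothetical helper.

```python
import pandas as pd

# Made-up monthly employment: 2019 has three months, 2020 only two.
monthly = pd.DataFrame({
    'GEO': ['Alberta'] * 5,
    'YEAR': [2019, 2019, 2019, 2020, 2020],
    'VALUE': [100.0, 100.0, 100.0, 90.0, 90.0],
})

def pct_change_by_year(df, how):
    """Percent change from 2019 to 2020 of the yearly aggregate `how`."""
    wide = df.groupby(['GEO', 'YEAR'])['VALUE'].agg(how).unstack('YEAR')
    return (wide[2020] - wide[2019]) / wide[2019] * 100

change_sum = pct_change_by_year(monthly, 'sum')    # affected by missing months
change_mean = pct_change_by_year(monthly, 'mean')  # comparable per-month level
print(change_sum)
print(change_mean)
```

In this toy case the sum-based change is about -40% while the average-based change is about -10%, even though every observed month fell by the same 10%.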

In [None]:
df_emp_ind1_2020 = df_emp_ind1[df_emp_ind1['YEAR']==2020]
df_emp_ind1_2020 = df_emp_ind1_2020.set_index('NAICS')
df_emp_ind1_2020= df_emp_ind1_2020.drop(['YEAR','GEO'],axis =1)
df_emp_ind1_2020 =df_emp_ind1_2020.groupby(['NAICS']).sum()
df_emp_ind1_2020 = df_emp_ind1_2020.rename(columns={'VALUE':'2020'})
df_emp_ind1_2020.reset_index(level=0,inplace=True)


df_emp_ind1_2019 = df_emp_ind1[df_emp_ind1['YEAR']==2019]
df_emp_ind1_2019 = df_emp_ind1_2019.set_index('NAICS')
df_emp_ind1_2019= df_emp_ind1_2019.drop(['YEAR','GEO'],axis =1)
df_emp_ind1_2019 =df_emp_ind1_2019.groupby(['NAICS']).sum()
df_emp_ind1_2019 = df_emp_ind1_2019.rename(columns={'VALUE':'2019'})
df_emp_ind1_2019.reset_index(level=0,inplace=True)
df_emp_ind1_2019 = df_emp_ind1_2019.drop(['NAICS'],axis =1)


df_emp_ind1_c = pd.concat([df_emp_ind1_2020,df_emp_ind1_2019], axis=1)
df_emp_ind1_c.reset_index(level=0,inplace=True)

# Percent change from 2019 to 2020, computed column-wise
df_emp_ind1_c['Percent Increase'] = ((df_emp_ind1_c['2020']-df_emp_ind1_c['2019'])/df_emp_ind1_c['2019'])*100

df_emp_ind1_c = df_emp_ind1_c.sort_values(by = ['Percent Increase'])


In [None]:
NAICSpercentage = go.Bar(y=df_emp_ind1_c['NAICS'], x = df_emp_ind1_c['Percent Increase'], 
                         name='Percent Increase',orientation = 'h',marker_color ='green')
data1=[NAICSpercentage]
#sets the layout for the plot
layout=go.Layout(barmode='group', title='Percent Increase in Employment from 2019 to 2020 for Industries',
    xaxis=dict(title='Percent Increase (%)', titlefont=dict(size=13)), 
                 yaxis=dict(title='Industries', titlefont=dict(size=13)))
fig = go.Figure(data=data1, layout=layout)
py.iplot(fig) #shows the plot

From the graph, we can easily notice that the change is negative for all industries. Accommodation and food services; Information, culture and recreation; Business, building and other support services; Transportation and warehousing; and Forestry, fishing, mining, quarrying, oil and gas have the largest decreases.

## Conclusions and Suggestion

1. Alberta and Canada have similar trends before 2015. After 2015, the unemployment rate of Alberta increases dramatically and surpasses Canada's in 2016.

2. Females in Canada have a lower unemployment rate than males. The average unemployment rates for females and males in Canada are 0.065 and 0.0745 respectively. Alberta follows the same pattern as Canada, with unemployment rates of 0.053 for females and 0.057 for males. Moreover, the gender difference in the unemployment rate is more pronounced in Newfoundland. This could be because the manufacturing industry has declined over time while health care and social assistance services have grown.

3. Youth in Canada face a significantly higher unemployment rate than adults. The average unemployment rate for youth (15 to 24 years) is about 0.132. For the middle and older age groups, the overall unemployment rates are 0.059 and 0.06 respectively. Alberta used to have a lower unemployment rate than the Canadian average. However, after the downturn of the oil and gas industry, its unemployment rate rose and exceeded the Canadian average. Among all the provinces, Newfoundland has the highest unemployment rate for all age groups.

4. Even though Canada experienced rough years in 2015 and 2016, all the provinces have had steady employment growth over the past 20 years. The manufacturing industry has declined over the years. Employment in Alberta is affected by the oil and gas industry. However, the COVID-19 outbreak has had a much deeper influence on both Alberta and Canada: employment in all industries declined dramatically, and accommodation and food services decreased by nearly 50%.

5. Due to the limitations of the datasets, there is no breakdown by gender and age group for each industry. More data should be investigated to explore whether certain genders or age groups have higher or lower unemployment rates in each industry.

## References

- Adam Hayes. (2020). Bureau of Labor Statistics (BLS). Reviewed by Peter Westfall.
- Roger S. McIntyre, Yena Lee. (2020). Projected increases in suicide in Canada as a consequence of COVID-19. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7236718/pdf/main.pdf
- Canadian Public Health Association. (2006). Unemployment, Mental Health, and Substance Use.
https://economicdashboard.alberta.ca/Unemployment
- Government of Alberta. (2020). Unemployment Rate. https://economicdashboard.alberta.ca/Unemployment
- Statistics Canada. (2020). Labour Force Survey (LFS). https://www23.statcan.gc.ca/imdb/p2SV.pl?Function=getSurvey&SDDS=3701
- Statistics Canada. (2020). Employment by industry, monthly, seasonally adjusted and unadjusted, and trend-cycle. https://www150.statcan.gc.ca/t1/tbl1/en/cv.action?pid=1410002301


## Acknowledgements

- We would like to thank Professor Usman Alim for his kind guidance and expert advice throughout this difficult project, as well as TA Mike Mireku Kwakye for his encouragement.
- I would also like to thank my colleagues for their wonderful collaboration. Bowen completed the industry comparison. Jialing contributed the sex and age group analysis. Gerry completed the overall unemployment rate trend in Canada, the heat map, and the interactive part.