# Does Money Buy Happiness?
##### An exploratory study of the impact of average total income on the quality of life of the elderly
Group Members:

Zakaria Hassen 150506690

Tiffany Chan   150841540


## Abstract

There is no doubt that we live in a society that highly values material wealth, as it brings elevated social status and a comfortable lifestyle, but is wealth truly an indicator of happiness? Can money really buy happiness? In this project, we aimed to answer this very question by analyzing the relationship between income levels and mental health among Canadians. We defined four pillars of mental health that we use as indicators of happiness:

* **Very good or excellent self-rated mental health**
* **Health care needs associated with mental health problems, met**
* **Major depressive episode, measured criteria not met**
* **Persons who do not consider themselves a gambler**

The sample population was restricted by age (65 years and over), year (2002) and location (grouped by province). These indicators of positive mental health were then cross-referenced with the average income of the same groups, which yielded interesting results. For example, women were found to be more prone to major depressive episodes, whereas lower-income men were not. Furthermore, self-rated mental health varied widely among the lower-income segments. Mental health care was found to be relatively accessible to Canadians in all income brackets, with a few outliers.

Additionally, unhealthy coping mechanisms such as gambling were found to be more common among those in moderate to high income brackets. When the four pillars were observed as a whole, women across the provinces accounted for a larger share of those who identified with the positive mental health indicators, despite earning significantly less than men.

In conclusion, there did not appear to be a strong correlation between income and happiness for either sex. However, we concluded that at lower income levels, income may be a contributing factor to poorer mental health indicators.


## Section 1: Introduction and Motivation

In settling upon a research question, we asked ourselves the age-old question: does money buy happiness? Money certainly brings the ability to sustain a secure lifestyle, but does this equate to a person's general well-being? Many adults were told in their youth that happiness cannot be bought with money. Although the question is subjective in nature, our team postulates that an answer can be arrived at through the analysis of relevant data sets.

The very nature of the question encompasses many different factors, including age, region, and individual characteristics, all of which are significant determinants of both income and happiness. To help isolate these factors, the study focuses on elderly citizens, specifically those aged 65 and over. We also chose this demographic because we assume that the elderly are relatively settled in their walks of life: many are well established in their lifestyles, and their attitudes towards wellness have largely stabilized. We therefore assume that trends among the elderly will be evident and consistent throughout the research period.

Furthermore, to reduce the number of environmental factors affecting individuals, income and happiness are compared regionally: the average income of each province is contrasted with general attitudes towards happiness and mental health in the same province. By grouping by region and by age in this way, we are able to closely observe the fascinating ties between money and happiness.


## Section 2: Data Set Description, Statistics, and Processing

### 2.1 Income Data Set

After formulating a question to guide the study, it then became a matter of finding the right data sets. The open government movement across the globe has greatly eased the search for quality data. First, in searching for relevant data sets we took to exploring the [official Canadian open data site](https://open.canada.ca/en/open-data). 

It was our goal during the information search to ensure that the data sets had some of the same key variables in common, namely region, sex, age and the year in which the study was conducted. 
 
The first relevant data set we settled on was the [Income of individuals, by sex, age group and income source data set](https://www150.statcan.gc.ca/n1/tbl/csv/11100159-eng.zip). From it, we chose to focus on the elderly (65 years of age and older) in all ten provinces of Canada.

We first loaded the data set into a dataframe and then filtered it for the needed year, the necessary age group, the average income of recipients, total income as the income source, and lastly the ten provinces. Our final step was to split the result into two dataframes: one with separate rows for males and females, and another with the combined "Both sexes" rows.
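The sequential filters can also be written as a single combined boolean mask. The sketch below assumes a small made-up DataFrame that only borrows the column names of the StatCan file; `filter_income` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Made-up rows that mimic the long format of the income CSV (values are illustrative only)
toy = pd.DataFrame({
    "REF_DATE": [2002, 2002, 2002, 2001],
    "GEO": ["Ontario", "Ontario", "Canada", "Ontario"],
    "Age group": ["65 years and over"] * 4,
    "Sex": ["Males", "Both sexes", "Males", "Males"],
    "Income recipient": ["Average income of recipients"] * 4,
    "Income source": ["Total income"] * 4,
    "VALUE": [100.0, 90.0, 95.0, 80.0],
})

def filter_income(df, year, provinces):
    """Hypothetical helper: apply every row filter at once with one boolean mask."""
    mask = (
        (df["REF_DATE"] == year)
        & (df["Age group"] == "65 years and over")
        & (df["Income recipient"] == "Average income of recipients")
        & (df["Income source"] == "Total income")
        & df["GEO"].isin(provinces)
    )
    return df.loc[mask]

filtered = filter_income(toy, 2002, ["Ontario"])
# Split into separate-sex rows and the combined "Both sexes" rows
by_sex = filtered[filtered["Sex"] != "Both sexes"]
both = filtered[filtered["Sex"] == "Both sexes"]
```

Combining the conditions avoids creating a chain of intermediate dataframes, and the "Canada" and 2001 rows are dropped in the same step.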

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gdp
import numpy as np
%matplotlib inline

# Load the data into a dataframe
df = pd.read_csv('11100159.csv')

# Filter for the needed year
needed_year = df["REF_DATE"] == 2002
dy = df[needed_year]

# Filter for the necessary age group
sixtyFive = dy["Age group"] == "65 years and over"
dy = dy[sixtyFive]

# Filter for Average income of recipients
incomeRecipient = dy["Income recipient"] == 'Average income of recipients'
dy = dy[incomeRecipient]

# Filter for total income as the income source
incomeSource = dy['Income source'] == "Total income"
dy = dy[incomeSource]

# Keep only the ten provinces
provinces = ["Newfoundland and Labrador","Prince Edward Island","Nova Scotia","New Brunswick","Quebec","Ontario","Manitoba","Saskatchewan","Alberta","British Columbia"]
dy = dy[dy.GEO.isin(provinces)]

# Create two DataFrames: one with separate Male and Female rows and one with the combined "Both sexes" rows
sexMask = (dy['Sex'] != 'Both sexes')
finalIncome = dy[sexMask]
dataForChloropleth = dy[~sexMask]

# Drop the unnecessary columns 
finalIncome = finalIncome.drop(["DGUID","UOM","UOM_ID","SCALAR_FACTOR","SCALAR_ID","VECTOR","COORDINATE","TERMINATED","DECIMALS","STATUS","SYMBOL"], axis=1)
dataForChloropleth = dataForChloropleth.drop(["DGUID","UOM","UOM_ID","SCALAR_FACTOR","SCALAR_ID","VECTOR","COORDINATE","TERMINATED","DECIMALS","STATUS","SYMBOL"], axis=1)
finalIncome['key'] = finalIncome["GEO"] + finalIncome["Sex"]
dataForChloropleth['key'] = dataForChloropleth["GEO"] + dataForChloropleth["Sex"]
finalIncome


### 2.2: The Mental Health Data Set

The second relevant data set we settled upon was the [mental health indicators data set](https://www150.statcan.gc.ca/n1/tbl/csv/13100465-eng.zip). This data set includes counts and percentages for a whole host of mental health indicators, from drug abuse statistics to perceived mental health and barriers to accessing mental health services.

In a process similar to that used for the previous data set, we loaded the data into a dataframe and filtered it so that it contained the fields needed for comparison. This included filtering for the necessary year (2002), the necessary age group (65 years of age and older), and the data characteristic type, which we wanted to be percentages of the population, e.g. the percentage of males in Ontario who felt their health care needs were met.

We then filtered by sex so that the data included separate male and female rows for each locale, while also keeping a copy of the "Both sexes" rows for mapping. Lastly, we filtered the data so that only provincial data remained and dropped the columns unnecessary for analysis.
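Both data sets end up carrying a `key` column (province name concatenated with sex) so that rows can be matched for comparison. A minimal sketch of that join, assuming two tiny made-up tables in place of the real StatCan data:

```python
import pandas as pd

# Made-up income and mental health rows (illustrative values, not the 2002 data)
income = pd.DataFrame({
    "GEO": ["Ontario", "Ontario", "Alberta"],
    "Sex": ["Males", "Females", "Males"],
    "VALUE": [40000.0, 27000.0, 38000.0],
})
health = pd.DataFrame({
    "GEO": ["Ontario", "Ontario", "Alberta"],
    "Sex": ["Males", "Females", "Males"],
    "MDE": [97.5, 96.0, 98.0],
})

# Same composite key as the notebook: province name followed by sex
income["key"] = income["GEO"] + income["Sex"]
health["key"] = health["GEO"] + health["Sex"]

# Inner join keeps only the (province, sex) pairs present in both tables
combined = health[["key", "MDE"]].merge(income, on="key", how="inner")
```

An alternative design is to skip the string key and merge on both columns directly with `health.merge(income, on=["GEO", "Sex"])`, which avoids any chance of two different province/sex pairs concatenating to the same string.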

In [None]:
# Load the data into a dataframe
dfProfile = pd.read_csv('MentalHealthCAD.csv')

# Filter for the necessary year
needed_year = dfProfile["REF_DATE"] == 2002
dfProfile = dfProfile[needed_year]

# Filter for the necessary age group 
sixtyFive = dfProfile["Age group"] == "65 years and over"
dfProfile = dfProfile[sixtyFive]

# Filter the data characteristic type to be in percentages
percentMask = dfProfile['Characteristics'] == "Percent"
dfProfile = dfProfile[percentMask]

# Split by sex: dfProfile keeps the separate Male and Female rows, dfProfileCopy keeps the "Both sexes" rows
sexMask = (dfProfile['Sex'] != 'Both sexes')
dfProfileCopy = dfProfile[~sexMask]
dfProfile = dfProfile[sexMask]

# Keep only provincial data (drop the nationwide "Canada" rows) in both DataFrames
provincesOnly = (dfProfile['GEO'] != 'Canada')
dfProfile = dfProfile[provincesOnly]
dfSecondChloropleth = dfProfileCopy[dfProfileCopy['GEO'] != 'Canada']

# Drop the unnecessary columns
dfProfile = dfProfile.drop(["DGUID","UOM","UOM_ID","SCALAR_FACTOR","SCALAR_ID","VECTOR","COORDINATE","TERMINATED","DECIMALS","STATUS","SYMBOL"], axis=1)
dfSecondChloropleth = dfSecondChloropleth.drop(["DGUID","UOM","UOM_ID","SCALAR_FACTOR","SCALAR_ID","VECTOR","COORDINATE","TERMINATED","DECIMALS","STATUS","SYMBOL"], axis=1)
dfProfile['key'] = dfProfile["GEO"] + dfProfile["Sex"]

dfProfile

### 2.3: Overall Mental Health Indicators by Percentage of Sex

The following table highlights the differences in mental health indicators between the sexes. We decided to create it after producing Figure 3.2, where we observed the differences in income between men and women. It answers the question: of all the people surveyed in the Mental Health Indicator data set, which sex identified most with each indicator? The table is derived from the **Mental Health Data Set** (Section 2.2).

It is also from this data set that we selected four primary pillars of *“Happiness”*, each defined by a mental health indicator:
* **Very good or excellent self-rated mental health**: this gives a good general snapshot of the level of self-perceived happiness that the individual identifies with.
* **Health care needs associated with mental health problems, met**: if the individual needs mental health guidance or facilities, are they able to obtain them and pursue their happiness?
* **Major depressive episode, measured criteria not met**: this metric shows whether or not the individual has experienced a major depressive episode, in which their happiness and well-being were threatened.
* **Persons who do not consider themselves a gambler**: as a coping mechanism for poor mental health, individuals often resort to gambling and other addictive behaviours. Engagement in such activities is therefore also a useful indicator of mental health and general happiness.

To produce this table, the Mental Health Indicator data set was filtered down to the four indicators above. For each indicator, the number of people of each sex was summed across the provinces, and each sex's share of the combined total was then expressed as a **percentage**. The results are organized by sex and indicator.
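The summing and percentage steps can be illustrated on a handful of rows. This is a sketch with made-up counts; only the column names follow the notebook. Dividing the pivot table by its column sums makes the two sexes' shares of each indicator add up to 100.

```python
import pandas as pd

# Made-up counts of people per province, sex and indicator
counts = pd.DataFrame({
    "GEO": ["Ontario", "Ontario", "Quebec", "Quebec"],
    "Sex": ["Females", "Males", "Females", "Males"],
    "Mental health and well-being profile": ["Very good or excellent self-rated mental health"] * 4,
    "VALUE": [300, 200, 150, 150],
})

# Sum the counts across provinces for every (sex, indicator) pair
totals = counts.pivot_table(values="VALUE", index="Sex",
                            columns="Mental health and well-being profile",
                            aggfunc="sum")

# Divide each column by its own total, so each indicator's shares add up to 100
shares = (100.0 * totals / totals.sum()).round(0)
```

Here females total 450 and males 350 out of 800, so the shares round to 56 and 44.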


In [None]:
# Create dataframe from Mental Health Indicator Data Set
dfIndicators = pd.read_csv('MentalHealthCAD.csv')

# Filter for necessary year
needed_year = dfIndicators["REF_DATE"] == 2002
dfIndicators = dfIndicators[needed_year]

# Filter for necessary age group
sixtyFive = dfIndicators["Age group"] == "65 years and over"
dfIndicators = dfIndicators[sixtyFive]

# Filter for number of people who fall under the indicator categories
numPpl = dfIndicators['Characteristics'] == "Number of persons"
dfIndicators = dfIndicators[numPpl]

# Filter for sex
sexMask = (dfIndicators['Sex'] != 'Both sexes')
dfIndicators = dfIndicators[sexMask]

# Filter for province distinction
provincesOnly = (dfIndicators['GEO'] != 'Canada')
dfIndicators = dfIndicators[provincesOnly]

# Drop unnecessary columns
dfIndicators = dfIndicators.drop(["DGUID","UOM","UOM_ID","SCALAR_FACTOR","SCALAR_ID","VECTOR","COORDINATE","TERMINATED","DECIMALS","STATUS","SYMBOL"], axis=1)

# Create key using Mental Health Indicator and Sex
dfIndicators['key'] = dfIndicators['Mental health and well-being profile'] + dfIndicators["Sex"]

# Filter for specific indicators
categories = ['Persons who do not consider themselves a gambler','Health care needs associated with mental health problems, met','Major depressive episode, measured criteria not met','Very good or excellent self-rated mental health']
dfIndicators = dfIndicators.loc[dfIndicators['Mental health and well-being profile'].isin(categories)]

# Pivot table to sum the number of people in each indicator group by sex
indicatorsBySex = dfIndicators.pivot_table(values = 'VALUE', index = ['Sex'], columns=['Mental health and well-being profile'], aggfunc = 'sum')

# Convert the totals into each sex's percentage share of every indicator (each column sums to 100)
indicatorsBySex = (100. * indicatorsBySex / indicatorsBySex.sum()).round(0)
indicatorsBySex

## Section 3: Visualizations 

To glean learnings from the data it is important to visualize the data in a way that the information is clear to our audience. 



### Figure 3.1: Average Total Income of Both Sexes 65 Years and Over

The first visualization below shows the breakdown of the average total income of both sexes aged 65 and over across the country.

At a glance, we can already see some interesting information: on average, the elderly made the least in Newfoundland and Labrador and the most in Ontario.

Aside from Ontario, the "richer" provinces include British Columbia and Alberta. At the other end of the spectrum, the poorer provinces also include Prince Edward Island and New Brunswick.
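The extremes read off the map can also be pulled out of the data directly. A minimal sketch, assuming a made-up Series of "Both sexes" average incomes indexed by province (not the real 2002 figures):

```python
import pandas as pd

# Made-up "Both sexes" average incomes indexed by province (illustrative only)
avg_income = pd.Series(
    {"Ontario": 31000.0, "Newfoundland and Labrador": 21000.0, "Alberta": 29000.0},
    name="VALUE",
)

richest = avg_income.idxmax()  # province label with the largest average income
poorest = avg_income.idxmin()  # province label with the smallest average income
```

With the notebook's own data, `dataForChloropleth.set_index('GEO')['VALUE']` should produce a Series of the same shape to which `idxmax()` and `idxmin()` can be applied.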

In [None]:
# Load in the shape file using geopandas
file_path = 'lpr_000b16a_e.shp'
map_df = gdp.read_file(file_path)

# Merge the map and the data together 
merged = map_df.set_index('PRENAME').join(dataForChloropleth.set_index('GEO'))

# Select the value to show and set your min and max values
col = 'VALUE'
min_value,max_value = 20000,33000

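# Plot the data, using the same min and max as the colour bar so the map colours match the legend
fig, axis = plt.subplots(1, figsize=(20, 10))
merged.plot(column=col, cmap='OrRd', vmin=min_value, vmax=max_value, linewidth=0.8, ax=axis, edgecolor='0.7')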

# Format title and annotations
axis.axis('off')
axis.set_title('Average Total Income of Both Sexes 65 Years and Over',\
              fontdict={'fontsize': '25',
                        'fontweight' : '3'})
axis.annotate('https://open.canada.ca/en/open-data - 2020/04/12',xy=(.01, .08), xycoords='figure fraction', horizontalalignment='left')

# Format colourbar
legend = plt.cm.ScalarMappable(cmap='OrRd', norm=plt.Normalize(vmin=min_value, vmax=max_value))
legend._A = []
color_bar = fig.colorbar(legend)
color_bar.set_label('In 2011 CAD($)',\
              fontdict={'fontsize': '15',
                        'fontweight' : '3'},labelpad=20)

### Figure 3.2: Average Total Income of Males and Females over 65 years of age

For the analysis of the research question it was also important to look at the income data on a per sex information level. 

As the per-province data show, the glass ceiling is unfortunately very much in effect, with the earnings of females markedly lower than those of males across the country. Specifically, males made the most in Ontario and the least in Newfoundland and Labrador.

Although females also made the most in Ontario and the least in Newfoundland and Labrador, they made \$13,200 less than males in the former and \$7,500 less in the latter.
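The per-province gap between the sexes can be computed by reshaping the long-format rows so that each province has one column per sex. The sketch uses made-up incomes, not the real figures quoted above:

```python
import pandas as pd

# Made-up rows in the notebook's long format (one row per province and sex)
income = pd.DataFrame({
    "GEO": ["Ontario", "Ontario", "Nova Scotia", "Nova Scotia"],
    "Sex": ["Males", "Females", "Males", "Females"],
    "VALUE": [40000.0, 27000.0, 30000.0, 22000.0],
})

# One row per province, one column per sex, then subtract to get the gap
wide = income.pivot(index="GEO", columns="Sex", values="VALUE")
wide["Gap"] = wide["Males"] - wide["Females"]
```

Because `pivot` aligns the values by province, the gap does not depend on the male and female rows appearing in the same order, which is an assumption the bar chart code makes when it plots `men_means` and `women_means` side by side.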

In [None]:
# Create two masks to filter for the necessary data based on sex
MaleMask = finalIncome['Sex'] == 'Males'
FemaleMask = finalIncome['Sex'] == 'Females'

men_means = finalIncome[MaleMask]
women_means = finalIncome[FemaleMask]

# Use the provinces as labels and set the width of the bars
x = np.arange(len(dataForChloropleth['GEO']))  
width = 0.35

fig, ax = plt.subplots(1, figsize=(20, 10))
rects1 = ax.bar(x - width/2, men_means['VALUE'], width, label='Males')
rects2 = ax.bar(x + width/2, women_means['VALUE'], width, label='Females')

# Add figure data such as the labels, title and legend
ax.set_ylabel('Average Total Income',\
              fontdict={'fontsize': '20',
                        'fontweight' : '3'},labelpad=20)
ax.set_xlabel('Provinces',\
              fontdict={'fontsize': '20',
                        'fontweight' : '3'})

ax.set_title('Average Total Income of Males and Females over 65 years of age',\
              fontdict={'fontsize': '25',
                        'fontweight' : '3'},pad=30)
ax.set_xticks(x)
ax.set_xticklabels(dataForChloropleth['GEO'], rotation = 90,\
              fontdict={'fontsize': '20',
                        'fontweight' : '3'})
ax.legend()
fig.tight_layout()
plt.legend(['Males', 'Females'], bbox_to_anchor=(1, 1.3),fontsize = 20)
plt.yticks(fontsize = 20)
plt.show()

### Figure 3.3: Percentage of Population Who Have Not Experienced Barriers in Availability to Mental Health Services


Moving on to the mental health data, we can see some interesting trends in the data for both sexes across the country in relation to their experience with barriers to mental health services.

Relative to the other provinces, Alberta had the highest percentage of people who did experience availability barriers to mental health services.

At the other end of the spectrum, all respondents (both sexes) polled in New Brunswick reported never experiencing such barriers. The remaining provinces ranked in the following order, from worst (just ahead of Alberta) to best (just behind New Brunswick):

* Newfoundland and Labrador
* Saskatchewan
* Quebec
* Prince Edward Island
* Nova Scotia
* British Columbia
* Manitoba 
* Ontario
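A ranking like the one above can be produced by sorting the percentages. This sketch assumes a made-up Series of the percentage of people who did not encounter availability barriers; it is not the actual survey data.

```python
import pandas as pd

# Made-up percentages of people who did NOT encounter availability barriers
no_barriers = pd.Series({"Alberta": 97.2, "Ontario": 99.6, "New Brunswick": 100.0, "Quebec": 98.4})

# Ascending order lists provinces from worst (most barriers) to best
worst_to_best = no_barriers.sort_values().index.tolist()
```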


In [None]:
# Filter for barriers to accessing mental health services
accessing = (dfSecondChloropleth['Mental health and well-being profile'] == 'Barriers accessing mental health services not encountered due to availability issues')
finalAccess = dfSecondChloropleth[accessing]

# Load in the shape file using geopandas
fp = 'lpr_000b16a_e.shp'
map_df = gdp.read_file(fp)

merged = map_df.set_index('PRENAME').join(finalAccess.set_index('GEO'))

variable = 'VALUE'
# set the range for the choropleth
vmin, vmax = 97, 100

# create figure and axes for Matplotlib
fig, ax = plt.subplots(1, figsize=(20, 10))

# create map (vmin/vmax match the colour bar range so the colours agree with the legend)
merged.plot(column=variable, cmap='Blues', vmin=vmin, vmax=vmax, linewidth=0.8, ax=ax, edgecolor='0.8')
ax.axis('off')

# add a title
ax.set_title('Percentage of Population Who Have Not Experienced Barriers in Availability to Mental Health Services', \
              fontdict={'fontsize': '19',
                        'fontweight' : '3'})
ax.annotate('https://open.canada.ca/en/open-data - 2020/04/12',xy=(.01, .08), xycoords='figure fraction', horizontalalignment='left')

# Create colorbar as a legend
sm = plt.cm.ScalarMappable(cmap='Blues', norm=plt.Normalize(vmin=vmin, vmax=vmax))
sm._A = []
cbar = fig.colorbar(sm)
cbar.set_label('Percentage of Population (%)',\
              fontdict={'fontsize': '15',
                        'fontweight' : '3'},labelpad=20)

### Figure 3.4: Income vs Likelihood of Avoiding Major Depressive Episodes

Now that we have looked at the data sets in isolation, it is pertinent to look at them in conjunction and observe the trends the data show. This figure shows the relationship between income and the likelihood of avoiding major depressive episodes.

The graph continues to reflect the income inequality observed in earlier figures, and although most of the points are clustered relatively close to one another, a distinction emerges when the values are compared.

Below the 97% likelihood of avoiding major depressive episodes, there are more female provincial segments (6) than male ones (4). In other words, male segments more often have a higher likelihood of avoiding major depressive episodes.
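Counts like these can be taken from the merged dataframe rather than from the scatter plot. A minimal sketch, assuming a few made-up merged rows with the notebook's `Sex` and `MDE` columns:

```python
import pandas as pd

# Made-up merged rows: one per province and sex; MDE = % for whom MDE criteria were not met
dfDep = pd.DataFrame({
    "Sex": ["Males", "Males", "Females", "Females", "Females"],
    "MDE": [97.5, 96.8, 96.0, 95.5, 98.1],
})

threshold = 97.0
# Summing the boolean "below threshold" column per sex counts the segments under 97%
below = (dfDep["MDE"] < threshold).groupby(dfDep["Sex"]).sum()
```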

In [None]:
# Filter for specific mental health profile
depressionMask = (dfProfile['Mental health and well-being profile'] == 'Major depressive episode, measured criteria not met')
dfDepression = dfProfile[depressionMask]
dfDepression = dfDepression.drop(['REF_DATE','GEO','Age group','Sex','Mental health and well-being profile','Characteristics'], axis=1)
dfDepression.rename(columns={'VALUE':'MDE'}, inplace=True)

# Merge on key
dfDepIncome = dfDepression.merge(finalIncome, left_on='key', right_on='key')

# Separate into male and female DF's
maleMask = dfDepIncome['Sex'] == 'Males'
males = dfDepIncome[maleMask]
femaleMask = dfDepIncome['Sex'] == 'Females'
females = dfDepIncome[femaleMask]

# Plot the graph
plt.figure(figsize=(15, 10)) 
ax = plt.subplot()
male_x, male_y = males.MDE, males.VALUE
female_x,female_y = females.MDE, females.VALUE

# Set the y axis label
ax.set_ylabel('Average Total Income', \
              fontdict={'fontsize': '15',
                        'fontweight' : '3'}, labelpad = 20)
# Set the x axis label
ax.set_xlabel('Probability of Avoiding Major Depressive Episodes (%)',\
              fontdict={'fontsize': '15',
                        'fontweight' : '3'}, labelpad = 20)

ax.set_title('Income vs Likelihood of Avoiding Major Depressive Episodes',\
              fontdict={'fontsize': '22',
                        'fontweight' : '3'}, pad = 30)

ax.scatter(male_x,male_y,label='Males',c='b')
ax.scatter(female_x,female_y,label='Females',c='orange')
plt.legend()



### Figure 3.5: Percentage of Population Who Ranked Their Mental Health Highly

This graph explores the relationship between income and self-rated mental health, testing the assumption that those with higher incomes rate their mental health more highly.

Although it is clearly visible that women earn significantly less than men, the distribution of the points for both sexes is very similar in shape: both sexes share an approximate “arch” shape.

Interestingly, the lowest-earning segments of both sexes are found at the high end of the self-rated mental health scale, in the lower right of the graph. The lowest-earning female segment has a percentage of 76%, while the lowest-earning male segment has 78%. The highest-earning segments of both sexes are found in the middle of the range, between 50% and 71%.
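One way to check whether the "arch" is more than a visual impression is to fit a quadratic to income as a function of the self-rated percentage: a negative leading coefficient indicates an arch. This is a sketch on made-up points. With only ten provinces per sex, such a fit would be indicative at best.

```python
import numpy as np

# Made-up points: x = % rating their mental health highly, y = average total income
x = np.array([50.0, 60.0, 65.0, 71.0, 78.0])
y = np.array([30000.0, 36000.0, 38000.0, 35000.0, 24000.0])

# Least-squares fit of y = a*x**2 + b*x + c (coefficients returned highest power first)
a, b, c = np.polyfit(x, y, deg=2)

# x position of the fitted arch's peak (vertex of the parabola)
peak_x = -b / (2 * a)
```

For these made-up points the fitted `a` is negative and the peak lands a little above 60%, in the middle of the x range, which is the pattern the text describes.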

In [None]:
# Filter for specific mental health profile
selfRate = (dfProfile['Mental health and well-being profile'] == 'Very good or excellent self-rated mental health')
dfselfRate = dfProfile[selfRate]
dfselfRate = dfselfRate.drop(['REF_DATE','GEO','Age group','Sex','Mental health and well-being profile','Characteristics'], axis=1)
dfselfRate.rename(columns={'VALUE':'SELF_RATE'}, inplace=True)

# Join on key
dfselfRate = dfselfRate.merge(finalIncome, left_on='key', right_on='key')

# Separate into male and female DF's
maleMask = dfselfRate['Sex'] == 'Males'
males = dfselfRate[maleMask]
femaleMask = dfselfRate['Sex'] == 'Females'
females = dfselfRate[femaleMask]

# Plot the graph
plt.figure(figsize=(15, 10)) 
ax = plt.subplot()
male_x, male_y = males.SELF_RATE, males.VALUE
female_x,female_y = females.SELF_RATE, females.VALUE



# Set the y axis label
ax.set_ylabel('Average Total Income',fontsize=15, labelpad = 20)
# Set the x axis label
ax.set_xlabel('Percentage of Population Who Ranked Their Mental Health Highly (%)',fontsize=15, labelpad = 20)

ax.set_title('Income vs. Highly Self-Rated Mental Health',\
              fontdict={'fontsize': '22',
                        'fontweight' : '3'}, pad = 10)

ax.scatter(male_x,male_y,label='Males',c='b')
ax.scatter(female_x,female_y,label='Females',c='orange')
plt.legend()

### Figure 3.6: Percentage of Population Who Had Their Mental Health Needs Met

The following graph illustrates the percentage of men and women whose mental health care needs were met, in relation to their average income. It explores whether money allows individuals to access more and better mental health facilities, ultimately leading them to happiness.

Most of the segments lie in the 98.5-99.5% range on the x-axis. Interestingly, both the highest- and lowest-earning segments of both sexes are found in this range.

However, five segments fall below 98%, at the lower end of the spectrum, and four of them are female segments.
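The ranges discussed above can be counted by binning the x-axis values with `pd.cut` and cross-tabulating by sex. The rows below are made up, and the bin edges are an assumption chosen to match the ranges in the text (with `pd.cut`'s default right-closed intervals, a value of exactly 98.0 falls in the first bin).

```python
import pandas as pd

# Made-up merged rows: % whose mental health care needs were met, per province and sex
met = pd.DataFrame({
    "Sex": ["Males", "Females", "Females", "Males", "Females"],
    "MET_HEALTH": [99.1, 97.5, 98.8, 99.4, 97.9],
})

# Bin the x-axis values into the ranges discussed in the text
bins = [0, 98.0, 98.5, 99.5, 100.0]
labels = ["<=98", "98-98.5", "98.5-99.5", ">99.5"]
met["range"] = pd.cut(met["MET_HEALTH"], bins=bins, labels=labels)

# Number of segments in each range, split by sex
counts = pd.crosstab(met["range"], met["Sex"])
```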



In [None]:
# Filter for specific mental health profile
metHealth = (dfProfile['Mental health and well-being profile'] == 'Health care needs associated with mental health problems, met')
metHealth = dfProfile[metHealth]
metHealth = metHealth.drop(['REF_DATE','GEO','Age group','Sex','Mental health and well-being profile','Characteristics'], axis=1)
metHealth.rename(columns={'VALUE':'SELF_RATE'}, inplace=True)

# Join on key
metHealth = metHealth.merge(finalIncome, left_on='key', right_on='key')

# Seperate into male and female DF's
maleMask = metHealth['Sex'] == 'Males'
males = metHealth[maleMask]
femaleMask = metHealth['Sex'] == 'Females'
females = metHealth[femaleMask]

# Plot the graph
plt.figure(figsize=(15, 10)) 
ax = plt.subplot()
male_x, male_y = males.SELF_RATE, males.VALUE
female_x,female_y = females.SELF_RATE, females.VALUE


# Set the y axis label
ax.set_ylabel('Average Total Income',fontsize=15, labelpad=20)
# Set the x axis label
ax.set_xlabel('Percentage of Population Who Had Their Mental Health Needs Met (%)',fontsize=15, labelpad=20)

ax.set_title('Income vs. Mental Healthcare Needs Met',\
              fontdict={'fontsize': '22',
                        'fontweight' : '3'}, pad = 10)

ax.scatter(male_x,male_y,label='Males',c='b')
ax.scatter(female_x,female_y,label='Females',c='orange')
plt.legend()

### Figure 3.7: Percentage of Population Who Consider Themselves a Gambler vs Mental Health Needs Met
The following scatter bubble graph shows three variables at once. Average income is represented by the size of the bubbles, with larger bubbles representing higher incomes.

The graph analyzes the relationship between the percentage of people who had their mental health needs met, which is considered a healthy mechanism, and the percentage who identify as gamblers, which is an unhealthy mechanism for those prone to unhappiness.

As we can observe, the highest-earning segments are found in the 25%+ range for identifying as a gambler. This may suggest that having more money is associated with higher odds of gambling. The same segments, however, appear to have their mental health needs met at similar rates, in the 98.5-99% range. Lower-income segments are scattered throughout the graph.
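Matplotlib's `s=` argument is a marker area in points squared, and it needs one value per plotted point. The notebook divides income by 100. An alternative, sketched below with a hypothetical helper `bubble_sizes`, maps incomes linearly onto a chosen size range so that differences between segments are easier to see.

```python
import numpy as np

def bubble_sizes(values, min_size=50.0, max_size=1000.0):
    """Hypothetical helper: map values linearly onto marker areas for plt.scatter(s=...)."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # All values equal: give every bubble the same mid-range size
        return np.full_like(values, (min_size + max_size) / 2)
    return min_size + (values - values.min()) / span * (max_size - min_size)

sizes = bubble_sizes([20000, 26000, 32000])
```

The trade-off is that bubble area is no longer proportional to income (the smallest income gets `min_size`, not zero), so the sizes show ranking and relative spacing rather than exact ratios.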


In [None]:
# Filter for specific mental health profile
health = (dfProfile['Mental health and well-being profile'] == 'Health care needs associated with mental health problems, met')
dfMetHealth = dfProfile[health]
dfMetHealth = dfMetHealth.drop(['REF_DATE','GEO','Age group','Sex','Mental health and well-being profile','Characteristics'], axis=1)
dfMetHealth.rename(columns={'VALUE':'MET_HEALTH'}, inplace=True)
# Note: GAMBLE holds the percentage of people who do NOT consider themselves a gambler
gamble = (dfProfile['Mental health and well-being profile'] == 'Persons who do not consider themselves a gambler')
dfGamble = dfProfile[gamble]
dfGamble = dfGamble.drop(['REF_DATE','GEO','Age group','Sex','Mental health and well-being profile','Characteristics'], axis=1)
dfGamble.rename(columns={'VALUE':'GAMBLE'}, inplace=True)

# Join on key
dfGamble = dfGamble.merge(finalIncome, left_on='key', right_on='key')
dfGamble = dfGamble.merge(dfMetHealth, left_on='key', right_on='key')

# Separate into male and female DF's
maleMask = dfGamble['Sex'] == 'Males'
males = dfGamble[maleMask]
femaleMask = dfGamble['Sex'] == 'Females'
females = dfGamble[femaleMask]

# Plot the graph
plt.figure(figsize=(20, 10)) 
ax = plt.subplot()

male_x, male_y = males.MET_HEALTH, males.GAMBLE
female_x,female_y = females.MET_HEALTH, females.GAMBLE
# Bubble sizes need one entry per plotted point, so scale each sex's own incomes
male_sizes = males.VALUE/100
female_sizes = females.VALUE/100

# Set the y axis label
ax.set_ylabel('Percentage Consider Themselves a Gambler (%)',\
              fontdict={'fontsize': '20',
                        'fontweight' : '3'}, labelpad = 30)
# Set the x axis label
ax.set_xlabel('Percentage of Population Who Had Their Mental Health Needs Met (%)',\
              fontdict={'fontsize': '20',
                        'fontweight' : '3'}, labelpad = 30)
ax.set_title('Percentage of Population Who Consider Themselves a Gambler vs Mental Health Needs Met',\
              fontdict={'fontsize': '25',
                        'fontweight' : '3'}, pad = 30)
plt.yticks(fontsize = 20)
plt.xticks(fontsize = 20)

ax.scatter(male_x,male_y,label='Males',c='b', s=male_sizes, alpha = 0.3)
ax.scatter(female_x,female_y,label='Females',c='orange', s=female_sizes, alpha = 0.3)
plt.legend(fontsize = 14)

### Figure 3.8: Mental Health Indicators by Sex Across Canada

As mentioned in the data set description (Section 2.3), this graph highlights the relationship between the chosen mental health indicators and sex. The two sexes' shares add up to 100% for each indicator, so each bar shows the split between the sexes among those surveyed.
Looking at the black line at 50% on the y-axis, we can see that female participants outnumber male participants for every mental health indicator. The link to income comes from Figure 3.2, where we observed that women face an income glass ceiling and earn significantly less than men. This suggests that the lower-paid segments make up a higher share of those identifying with the positive indicators. Because women outnumber men among Canadians aged 65 and over, however, these shares partly reflect population size rather than rates.

In [None]:
# Transpose Indicators by Sex 
transposed = indicatorsBySex.transpose()

# Plot Bar Graph with colour
colour = ['orange','b']
ax = transposed.plot(kind = 'bar', stacked=True, figsize=(20, 10),fontsize=20, color=colour)

# X-tick labels: take them from the transposed index so each label matches its bar
# (pivot_table sorts the indicator columns alphabetically, so a hand-written list can be out of order)
categories = transposed.index

# Format legend, labels, and title, colour
ax.set_ylabel('Percentage of Population(%)',fontsize=20)
ax.set_xlabel('Mental Health Indicator',fontsize=20)
ax.set_title('Mental Health Indicators by Sex',pad=30, fontsize = 20)
ax.set_xticklabels(categories,fontsize = 15,rotation=70)
ax.legend(['Females', 'Males'], bbox_to_anchor=(1, 1.2), fontsize = 20)
ax.axhline(y=50, color='black',linestyle='-')

## Section 4: Conclusion

In answering the question *“Does money buy happiness?”*, we selected relevant data sets, filtered and cleaned them, and then created visualizations to understand the nature of the data and the relationships between variables. We observed the following properties of the data:

### On Income

Some of the richer provinces for both males and females included Ontario, British Columbia, and Alberta. We postulate that this can possibly be attributed to the oil industry in Alberta and to the larger economies of British Columbia and Ontario, each home to a major metropolitan city (Vancouver and Toronto, respectively). Some of the relatively “poorer” provinces with lower average total income included Newfoundland and Labrador, Prince Edward Island, and New Brunswick.

Men made more than women in every province across the country without exception. Notably, in Ontario, the most “prosperous” province for both male and female earners, where the combined average total income was at its highest, the income inequality was also very significant, with females making \$13,200 less than their male counterparts. We posit that among the elderly, gender roles are more likely than not traditional in nature: men were traditionally expected to be in the workforce, while women were expected to stay home and tend to domestic affairs. This may explain the discrepancy, along with the glass ceiling that remains even today.


### Correlation with Mental Health Indicators

Although the percentage of people in each province who did not encounter barriers to mental health services was high across the country, a closer comparison across provinces shows that this metric was best in New Brunswick and worst in Alberta.

After examining each dataset on its own and looking for overarching trends, we compared income against each mental health indicator, beginning with **the likelihood of avoiding a major depressive episode (MDE)**. Nationwide, females were more likely than males to experience an MDE. Since women earned less than men during this period, this might at first suggest that income is a contributing factor, and men did report the highest likelihood of avoiding an MDE. However, there was no clear indication that money was a deciding factor for males: the males earning the least reported the lowest probability of an MDE.

Overall, for both sexes, the lower income segments captured both the lowest and the highest rates of **self-assessed mental health**, while the higher income brackets mostly fell in the middle. This suggests that higher income is associated with moderate levels of self-assessed mental health, whereas lower income is associated with both the most positive and the least positive levels. Because income level did not determine self-assessed mental health, factors other than income likely influence happiness and mental health.
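One way to check the pattern described above is to compare the spread (range) of self-rated mental health percentages within each income group: a wide range in the lower group and a narrow one in the higher group would match what we observed. A minimal sketch, assuming each segment has already been labelled with a hypothetical income group such as `"lower"` or `"higher"`; the function name `spread_by_income_group` and the sample values are illustrative only:

```python
from collections import defaultdict


def spread_by_income_group(records):
    """Summarize indicator values within each income group.

    records: iterable of (income_group, percentage) pairs, where
    income_group is a label such as "lower" or "higher" (hypothetical).
    Returns {income_group: (minimum, maximum, range)}.
    """
    groups = defaultdict(list)
    for group, pct in records:
        groups[group].append(pct)
    return {g: (min(v), max(v), max(v) - min(v)) for g, v in groups.items()}


if __name__ == "__main__":
    # Placeholder percentages chosen only to illustrate the shape of the output.
    sample = [("lower", 40), ("lower", 70), ("higher", 55), ("higher", 60)]
    print(spread_by_income_group(sample))
    # {'lower': (40, 70, 30), 'higher': (55, 60, 5)}
```

The range is used here because the claim concerns extremes; a standard deviation per group would be a reasonable alternative if the segments are numerous.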

**Access to mental health care** appears relatively equal across all income brackets: the highest and lowest income segments fall within roughly the same percentage range, and the greatest concentration of segments also appears in that range. This suggests that access to mental health services is not limited by income. However, among the outliers in Figure 3.6, at the lower percentages on the x-axis, relatively more women are found to not have their mental health needs met. Since women earn significantly less than men, lower income may be a contributing factor to reduced access to mental health services for females.

As a whole, Canadians with larger incomes tended to **identify as gamblers** despite having good access to mental health services. This may suggest that having more money is associated with higher odds of gambling. Among the outliers with lower access to mental health services, those in smaller income segments were less prone to gambling problems, which may indicate that they do not turn to unhealthy coping mechanisms even when faced with mental health adversity. Generally, these findings are consistent with the maxim “more money, more problems,” as the data associate higher income with more gambling.
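The relationships in this section are correlational. A minimal sketch of how the strength of such a relationship could be quantified with the Pearson correlation coefficient, using only the standard library; `pearson_r` is a hypothetical helper, and pairing per-segment average income with the percentage of persons who do not consider themselves gamblers is an assumption about how the data would be arranged:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Placeholder values: income rises while the non-gambler share falls.
    income = [1, 2, 3]
    pct_not_gambler = [6, 4, 2]
    print(pearson_r(income, pct_not_gambler))  # -1.0
```

A value of r near 0 indicates a weak linear relationship, while a negative r between income and the share of non-gamblers would correspond to the pattern described above; on its own, r does not show that income causes gambling. On Python 3.10+, `statistics.correlation` computes the same coefficient.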

When the **mental health profiles are looked at as a whole**, females represent a larger portion of those who identify with positive mental health indicators. This must be weighed against the glaring fact that women earn disproportionately less than men; here, an inverse relationship between income and good mental health is observed. However, this difference between the sexes may not be limited to the income discrepancy and can be influenced by external factors such as gender roles.

Ultimately, based on these trends, we conclude that although money may be a contributing factor to happiness, it does not appear to be the sole factor, nor is its impact as great as those who believe money can buy happiness assume. We did not find a strong correlation between income level and any of the mental health metrics we measured. We also found that those in higher income ranges were more likely to self-identify as gamblers, a trait that in extreme cases can lead to one's ruin and which we viewed as a potential problem. So, in response to the age-old question *“Does money buy happiness?”*, we believe the answer is no!