In [10]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

wh_2015 = pd.read_csv(r'C:\Users\lumum\Documents\Data Projects\transform-data\World_Happiness_2015.csv')

wh_2015.head()


Unnamed: 0,Country,Region,Happiness Rank,Happiness Score,Standard Error,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual
0,Switzerland,Western Europe,1,7.587,0.03411,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678,2.51738
1,Iceland,Western Europe,2,7.561,0.04884,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363,2.70201
2,Denmark,Western Europe,3,7.527,0.03328,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139,2.49204
3,Norway,Western Europe,4,7.522,0.0388,1.459,1.33095,0.88521,0.66973,0.36503,0.34699,2.46531
4,Canada,North America,5,7.427,0.03553,1.32629,1.32261,0.90563,0.63297,0.32957,0.45811,2.45176


In this notebook we'll explore the following functions and methods:
<ul>
    <li><b>Series.map()</b></li>
    <li><b>Series.apply()</b></li>
    <li><b>DataFrame.applymap()</b></li>
    <li><b>DataFrame.apply()</b></li>
    <li><b>pd.melt()</b></li>
</ul>

In [11]:
#dictionary for renaming columns
mapping = {'Economy (GDP per Capita)': 'Economy', 'Health (Life Expectancy)': 'Health', 'Trust (Government Corruption)': 'Trust' }

wh_2015 = wh_2015.rename(mapping, axis=1)
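The `axis=1` argument tells `rename()` that the dictionary keys are column labels. The sketch below uses a tiny two-row frame (values copied from the first two rows above, not the full dataset) to show that passing the same dictionary through the `columns=` keyword gives an identical result.

```python
import pandas as pd

# Small stand-in for wh_2015 with two of the long column names
df = pd.DataFrame({'Economy (GDP per Capita)': [1.39651, 1.30232],
                   'Health (Life Expectancy)': [0.94143, 0.94784]})
mapping = {'Economy (GDP per Capita)': 'Economy',
           'Health (Life Expectancy)': 'Health'}

# axis=1 applies the mapping to the column labels...
renamed_axis = df.rename(mapping, axis=1)
# ...which is equivalent to using the columns= keyword
renamed_columns = df.rename(columns=mapping)

print(list(renamed_axis.columns))            # ['Economy', 'Health']
print(renamed_axis.equals(renamed_columns))  # True
```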

In [12]:
wh_2015.head()

Unnamed: 0,Country,Region,Happiness Rank,Happiness Score,Standard Error,Economy,Family,Health,Freedom,Trust,Generosity,Dystopia Residual
0,Switzerland,Western Europe,1,7.587,0.03411,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678,2.51738
1,Iceland,Western Europe,2,7.561,0.04884,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363,2.70201
2,Denmark,Western Europe,3,7.527,0.03328,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139,2.49204
3,Norway,Western Europe,4,7.522,0.0388,1.459,1.33095,0.88521,0.66973,0.36503,0.34699,2.46531
4,Canada,North America,5,7.427,0.03553,1.32629,1.32261,0.90563,0.63297,0.32957,0.45811,2.45176


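We want to analyse how each factor (economy, family, health, freedom, trust and generosity) affects the overall happiness score. As a first step, we'll label each factor value as 'High' if it is greater than 1 and 'Low' otherwise: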
<code>
def label(element):
    if element > 1:
        return 'High'
    else:
        return 'Low'
</code>
    
Pandas has two methods for applying a custom function to the values of a column:
<ol>
<li>Series.map() method</li>
<li>Series.apply() method</li>
</ol>
<br>Both methods apply a function element-wise to a column.
Element-wise means that the function is called once for each value in the series: it receives that single value and returns its transformed version.</br>
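To make "element-wise" concrete, the sketch below records every value the function receives. It uses a hypothetical three-value Series rather than the real `Economy` column, and the `seen` list exists only for illustration.

```python
import pandas as pd

seen = []  # collects each value passed to label()

def label(element):
    seen.append(element)  # one call per value in the Series
    return 'High' if element > 1 else 'Low'

economy = pd.Series([1.39651, 0.8, 1.2], name='Economy')  # toy values
result = economy.map(label)

print(result.tolist())  # ['High', 'Low', 'High']
print(len(seen))        # 3, one call per element
```

Swapping `economy.map(label)` for `economy.apply(label)` produces the same labels, which is what the `equal` check in the next cell verifies on the real data.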

![image.png](attachment:image.png)

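Both methods use the same syntax, with the function name passed without parentheses: <b>df[col].map(function_name)</b> or <b>df[col].apply(function_name)</b>.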
![image.png](attachment:image.png)

In [13]:
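def label(element):
    # Label a single value 'High' if it is greater than 1, otherwise 'Low'
    if element > 1:
        return 'High'
    else:
        return 'Low'

# Pass the function name without parentheses; pandas calls it once per value
economy_impact_map = wh_2015['Economy'].map(label)
economy_impact_apply = wh_2015['Economy'].apply(label)
# True if both methods produced identical labels
equal = economy_impact_map.equals(economy_impact_apply)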

In [14]:
economy_impact_map.describe()

count     158
unique      2
top       Low
freq       92
Name: Economy, dtype: object

In [15]:
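def label(element, x):
    # Compare each value against the threshold x instead of a fixed 1
    if element > x:
        return 'High'
    else:
        return 'Low'

# Extra keyword arguments given to apply() are forwarded to label()
economy_impact_apply = wh_2015['Economy'].apply(label, x=0.8)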

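The cell above shows that <b>Series.apply()</b> can pass additional arguments (here <b>x=0.8</b>) to the function it applies element-wise. <b>Series.map()</b> has no equivalent parameter in the pandas versions this notebook was written for, so calling <b>wh_2015['Economy'].map(label, x=0.8)</b> raises a TypeError.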
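If you still want to use `Series.map()`, one common workaround is to bind the extra argument before passing the function in. The sketch below compares `apply()` with two such workarounds, a `lambda` and `functools.partial`, on a hypothetical three-value Series; all three should give the same labels.

```python
import pandas as pd
from functools import partial

def label(element, x):
    # 'High' when the value exceeds the threshold x, otherwise 'Low'
    return 'High' if element > x else 'Low'

economy = pd.Series([1.39651, 0.75, 0.9], name='Economy')  # toy values

# Series.apply() forwards extra keyword arguments to the function
via_apply = economy.apply(label, x=0.8)

# With Series.map(), bind the threshold first, e.g. with a lambda...
via_map_lambda = economy.map(lambda value: label(value, x=0.8))
# ...or with functools.partial
via_map_partial = economy.map(partial(label, x=0.8))

print(via_apply.tolist())  # ['High', 'Low', 'High']
```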

Now we want to apply the same function to multiple columns.
We can do that with the <b>DataFrame.applymap()</b> method (renamed <b>DataFrame.map()</b> in pandas 2.1, which deprecates <b>applymap()</b>).

![image.png](attachment:image.png)

Just as with the Series.map() and Series.apply() methods, we pass the function name to the <b>df.applymap()</b> method without parentheses.
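Because `applymap()` was renamed in pandas 2.1, a version-tolerant sketch can pick whichever method exists. The two-column frame here is a hypothetical stand-in for `wh_2015[factors]`, with made-up values.

```python
import pandas as pd

def label(element):
    return 'High' if element > 1 else 'Low'

# Toy frame standing in for wh_2015[factors]
df = pd.DataFrame({'Economy': [1.39651, 0.5], 'Family': [1.34951, 1.1]})

# DataFrame.map() exists from pandas 2.1; older versions only have applymap()
if hasattr(pd.DataFrame, 'map'):
    labelled = df.map(label)
else:
    labelled = df.applymap(label)

print(labelled)
```

Either way, `label()` is called on every individual cell, so each column comes back as a column of 'High'/'Low' strings.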

In [16]:
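def label(element):
    # Label a single value 'High' if it is greater than 1, otherwise 'Low'
    if element > 1:
        return 'High'
    else:
        return 'Low'

factors = ['Economy', 'Family', 'Health', 'Freedom', 'Trust', 'Generosity']
# applymap() calls label() on every cell of the selected columns
# (pandas 2.1+ deprecates it in favour of the equivalent DataFrame.map())
factors_impact = wh_2015[factors].applymap(label)
factors_impact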

Unnamed: 0,Economy,Family,Health,Freedom,Trust,Generosity
0,High,High,Low,Low,Low,Low
1,High,High,Low,Low,Low,Low
2,High,High,Low,Low,Low,Low
3,High,High,Low,Low,Low,Low
4,High,High,Low,Low,Low,Low
...,...,...,...,...,...,...
153,Low,Low,Low,Low,Low,Low
154,Low,Low,Low,Low,Low,Low
155,Low,Low,Low,Low,Low,Low
156,Low,Low,Low,Low,Low,Low


We can count the <b>'High'</b> and <b>'Low'</b> values in each column by applying the <b>pd.value_counts</b> function to every column with <b>df.apply()</b>.

In [17]:
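# Count the labels in each column; recent pandas versions deprecate the
# top-level pd.value_counts, and pd.Series.value_counts does the same job here
factors_impact.apply(pd.value_counts)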

Unnamed: 0,Economy,Family,Health,Freedom,Trust,Generosity
High,66,89,2,,,
Low,92,69,156,158.0,158.0,158.0


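Economy and Family have the most 'High' values (66 and 89). Health has only 2, and Freedom, Trust and Generosity have none: a label that never appears in a column shows up as NaN (blank), which is also why those columns are displayed as floats.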
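The NaN gaps can be turned into explicit zeros. The sketch below assumes a small hypothetical label frame in which `Trust` never contains 'High', mirroring the situation in the output above, and uses `pd.Series.value_counts` as the per-column function.

```python
import pandas as pd

# Toy labelled frame: 'Trust' has no 'High' values at all
labels = pd.DataFrame({'Economy': ['High', 'Low', 'High'],
                       'Trust': ['Low', 'Low', 'Low']})

# Count each label per column; a label missing from a column becomes NaN
counts = labels.apply(pd.Series.value_counts)

# Replace the NaN gaps with 0 and restore integer counts
counts = counts.fillna(0).astype(int)
print(counts)
```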
![image.png](attachment:image.png)

Notice that we used the df.apply() method to transform multiple columns.

<br>This is only possible because the pd.value_counts function operates on a whole series rather than on a single value.</br>

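<br>If we tried to use the df.apply() method with a function written to work element-wise, such as label(), we'd get an error: df.apply() passes each whole column to the function, so the comparison element > 1 produces a boolean series, and the if-statement cannot decide whether that series is True or False.</br>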
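The failure can be reproduced on a hypothetical two-column frame with made-up values. The sketch catches the `ValueError` so the notebook keeps running.

```python
import pandas as pd

def label(element):
    if element > 1:
        return 'High'
    else:
        return 'Low'

df = pd.DataFrame({'Economy': [1.39651, 0.5], 'Family': [1.34951, 1.1]})

raised = False
try:
    # df.apply() hands label() a whole column, so `element > 1` is a
    # boolean Series and the if-statement cannot reduce it to True/False
    df.apply(label)
except ValueError as err:
    raised = True
    print('ValueError:', err)  # message starts "The truth value of a Series is ambiguous"
```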

<br>The DataFrame.apply() method has different capabilities.</br>
<br>Instead of applying functions element-wise, the df.apply() method applies functions along an axis, either column-wise (axis=0, the default) or row-wise (axis=1).</br>
<br>When we create a function to use with df.apply(), we set it up to accept a series, most commonly a column.</br>
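A function written for df.apply() therefore takes a series and returns a result for it. The sketch below uses a hypothetical helper, `high_share()`, which returns the fraction of values above 1, and applies it both column-wise and row-wise to a toy frame.

```python
import pandas as pd

def high_share(series):
    # Receives a whole Series and returns the fraction of values above 1
    return (series > 1).mean()

df = pd.DataFrame({'Economy': [1.39651, 0.5, 1.2],
                   'Family': [1.34951, 1.1, 1.3]})  # toy values

per_column = df.apply(high_share)       # axis=0 (default): one call per column
per_row = df.apply(high_share, axis=1)  # axis=1: one call per row

print(per_column)  # Economy ~0.667, Family 1.0
print(per_row)     # 1.0, 0.5, 1.0
```

Because `high_share()` only uses vectorised comparisons on the series it receives, it works along either axis, unlike the element-wise `label()` above.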

In [18]:
factors_impact['Economy'].size

158