# Challenge: What test to use

1. Did people become less trusting from 2012 to 2014? Compute results for each country in the sample.
2. Did people become happier from 2012 to 2014? Compute results for each country in the sample.
3. Who reported watching more TV in 2012, men or women?
4. Who was more likely to believe people were fair in 2012, people living with a partner or people living alone?
5. Pick three or four of the countries in the sample and compare how often people met socially in 2014. Are there differences, and if so, which countries stand out?
6. Pick three or four of the countries in the sample and compare how often people took part in social activities, relative to others their age, in 2014. Are there differences, and if so, which countries stand out?

In [69]:
import numpy as np
import pandas as pd

In [70]:
# Columns: https://thinkful-ed.github.io/data-201-resources/ESS_practice_data/ESS_codebook.html
df = pd.read_csv("https://raw.githubusercontent.com/Thinkful-Ed/data-201-resources/master/ESS_practice_data/ESSdata_Thinkful.csv")

print(df.shape)
df.columns

(8594, 13)


Index(['cntry', 'idno', 'year', 'tvtot', 'ppltrst', 'pplfair', 'pplhlp',
       'happy', 'sclmeet', 'sclact', 'gndr', 'agea', 'partner'],
      dtype='object')

In [71]:
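# Did people become less trusting from 2012 to 2014? Compute results for each country in the sample.
# Here year == 6 is the 2012 wave and year == 7 is the 2014 wave.
# Open question: the means below only describe the change; which significance test (e.g. a t-test) should be used?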

print(df.groupby(["cntry", "year"])["ppltrst"].mean())

trust_2012 = df[df["year"] == 6][["cntry", "ppltrst"]].groupby("cntry").mean()
trust_2014 = df[df["year"] == 7][["cntry", "ppltrst"]].groupby("cntry").mean()

print((trust_2014-trust_2012)/trust_2012 * 100)

cntry  year
CH     6       5.677878
       7       5.751617
CZ     6       4.362519
       7       4.424658
DE     6       5.214286
       7       5.357143
ES     6       5.114592
       7       4.895128
NO     6       6.649315
       7       6.598630
SE     6       6.058499
       7       6.257709
Name: ppltrst, dtype: float64
        ppltrst
cntry          
CH     1.298701
CZ     1.424368
DE     2.739726
ES    -4.290937
NO    -0.762258
SE     3.288114


In [89]:
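import scipy.stats as stats
# Note: trust_2012 and trust_2014 hold one mean per country, so this compares
# six country means with six country means, not individual respondents.
print(stats.ttest_ind(trust_2012, trust_2014))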

Ttest_indResult(statistic=array([-0.07405572]), pvalue=array([ 0.94242648]))
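
The test above pools the six country means, so it cannot say whether trust changed within any one country. A minimal sketch of a per-country alternative follows. It assumes that `idno` identifies the same respondent in both waves within a country (an assumption, not confirmed here), which makes a paired t-test (`scipy.stats.ttest_rel`) a reasonable candidate. The helper name `paired_change_by_country` and the toy frame are made up for illustration.

```python
import pandas as pd
from scipy import stats


def paired_change_by_country(data, column):
    """Paired t-test of `column` between year 6 (2012) and year 7 (2014), per country."""
    rows = []
    for country, group in data.groupby("cntry"):
        # One row per respondent, one column per wave; keep respondents seen in both waves.
        wide = group.pivot_table(index="idno", columns="year", values=column).dropna()
        result = stats.ttest_rel(wide[7], wide[6])
        rows.append({
            "cntry": country,
            "n_pairs": len(wide),
            "mean_change": (wide[7] - wide[6]).mean(),
            "t": result.statistic,
            "p": result.pvalue,
        })
    return pd.DataFrame(rows).set_index("cntry")


# Tiny synthetic frame (made-up values, NOT the ESS data) so the sketch runs offline.
toy = pd.DataFrame({
    "cntry": ["AA"] * 8 + ["BB"] * 8,
    "idno": [1, 2, 3, 4] * 4,
    "year": ([6] * 4 + [7] * 4) * 2,
    "ppltrst": [5, 6, 4, 7, 6, 8, 5, 9, 5, 6, 4, 7, 5, 5, 5, 6],
})
trust_change = paired_change_by_country(toy, "ppltrst")
print(trust_change)
```

On the notebook's data the call would be `paired_change_by_country(df, "ppltrst")`. A negative `mean_change` with a small `p` would suggest that trust fell in that country.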


In [49]:
# Did people become happier from 2012 to 2014? Compute results for each country in the sample.
# Open question: the means below only describe the change; which significance test (e.g. a t-test) should be used?

print(df.groupby(["cntry", "year"])["happy"].mean())

happy_2012 = df[df["year"] == 6][["cntry", "happy"]].groupby("cntry").mean()
happy_2014 = df[df["year"] == 7][["cntry", "happy"]].groupby("cntry").mean()

print((happy_2014-happy_2012)/happy_2012 * 100)


cntry  year
CH     6       8.088312
       7       8.116429
CZ     6       6.770898
       7       6.914110
DE     6       7.428571
       7       7.857143
ES     6       7.548680
       7       7.419967
NO     6       8.251719
       7       7.915185
SE     6       7.907387
       7       7.946961
Name: happy, dtype: float64
          happy
cntry          
CH     0.347635
CZ     2.115120
DE     5.769231
ES    -1.705104
NO    -4.078359
SE     0.500473
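
The percentage changes above describe the direction of the shift but not whether it could be noise. Because `happy` is a bounded rating rather than a continuous measurement, one option is a Wilcoxon signed-rank test on each respondent's change, sketched below. As with any paired test here, it assumes `idno` matches the same person across waves within a country. The helper name and the toy values are hypothetical.

```python
import pandas as pd
from scipy import stats


def wilcoxon_by_country(data, column):
    """Wilcoxon signed-rank test of `column`, year 7 vs. year 6, per country."""
    rows = []
    for country, group in data.groupby("cntry"):
        # Wide layout: index = respondent, columns = wave (6 or 7).
        wide = group.pivot_table(index="idno", columns="year", values=column).dropna()
        diff = wide[7] - wide[6]
        result = stats.wilcoxon(wide[7], wide[6])
        rows.append({
            "cntry": country,
            "n_pairs": len(wide),
            "median_change": diff.median(),
            "W": result.statistic,
            "p": result.pvalue,
        })
    return pd.DataFrame(rows).set_index("cntry")


# Synthetic example (made-up values, NOT the ESS data).
toy = pd.DataFrame({
    "cntry": ["AA"] * 12,
    "idno": [1, 2, 3, 4, 5, 6] * 2,
    "year": [6] * 6 + [7] * 6,
    "happy": [5, 6, 4, 7, 3, 8, 6, 8, 7, 3, 8, 2],
})
happy_change = wilcoxon_by_country(toy, "happy")
print(happy_change)
```

On the notebook's data this would be `wilcoxon_by_country(df, "happy")`. Note that `scipy.stats.wilcoxon` drops zero differences by default and raises an error if every difference is zero.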


In [54]:
# Who reported watching more TV in 2012, men or women?
# gndr codes: 1 = Male, 2 = Female
# ANSWER: Women, by a small margin in the raw means (3.94 vs. 3.90)
# Open question: which significance test (e.g. a t-test) would confirm this?

df[df["year"] == 6].groupby(["gndr"])["tvtot"].mean()

gndr
1.0    3.901906
2.0    3.944393
Name: tvtot, dtype: float64
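
The two means above differ by about 0.04, which may or may not be more than sampling noise. Men and women are separate groups of respondents, so an independent two-sample test fits. The sketch below uses Welch's t-test (`scipy.stats.ttest_ind` with `equal_var=False`), which does not assume equal variances. The helper name `compare_two_groups` and the toy data are illustrative only.

```python
import pandas as pd
from scipy import stats


def compare_two_groups(data, group_col, value_col, first, second):
    """Welch's t-test of `value_col` between two levels of `group_col`."""
    a = data.loc[data[group_col] == first, value_col].dropna()
    b = data.loc[data[group_col] == second, value_col].dropna()
    result = stats.ttest_ind(a, b, equal_var=False)
    return {
        "mean_first": a.mean(),
        "mean_second": b.mean(),
        "t": result.statistic,
        "p": result.pvalue,
    }


# Synthetic example (made-up values, NOT the ESS data).
toy = pd.DataFrame({
    "gndr": [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0],
    "tvtot": [2, 3, 4, 3, 4, 5, 3, 4],
})
tv_result = compare_two_groups(toy, "gndr", "tvtot", 1.0, 2.0)
print(tv_result)
```

On the notebook's data the call would be `compare_two_groups(df[df["year"] == 6], "gndr", "tvtot", 1.0, 2.0)`. A large `p` would mean the gap between men and women is not distinguishable from chance.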

In [56]:
# Who was more likely to believe people were fair in 2012, people living with a partner or people living alone?
# partner codes: 1 = Lives with husband/wife/partner at household grid, 2 = Does not
# ANSWER: People living with a partner, by the raw means (6.08 vs. 5.86)
# Open question: which significance test (e.g. a t-test) would confirm this?

df[df["year"] == 6].groupby(["partner"])["pplfair"].mean()

partner
1.0    6.080736
2.0    5.856965
Name: pplfair, dtype: float64
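
To check whether the gap between the two `partner` groups is more than noise, a rank-based test is one option that avoids treating the `pplfair` rating as normally distributed. The sketch below uses `scipy.stats.mannwhitneyu`; Welch's t-test would be a reasonable alternative. The helper name and the toy values are hypothetical.

```python
import pandas as pd
from scipy import stats


def mann_whitney_two_groups(data, group_col, value_col, first, second):
    """Two-sided Mann-Whitney U test of `value_col` between two levels of `group_col`."""
    a = data.loc[data[group_col] == first, value_col].dropna()
    b = data.loc[data[group_col] == second, value_col].dropna()
    result = stats.mannwhitneyu(a, b, alternative="two-sided")
    return {
        "median_first": a.median(),
        "median_second": b.median(),
        "U": result.statistic,
        "p": result.pvalue,
    }


# Synthetic example (made-up values, NOT the ESS data).
toy = pd.DataFrame({
    "partner": [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0],
    "pplfair": [6, 7, 8, 9, 1, 2, 3, 4],
})
fair_result = mann_whitney_two_groups(toy, "partner", "pplfair", 1.0, 2.0)
print(fair_result)
```

On the notebook's data this would be `mann_whitney_two_groups(df[df["year"] == 6], "partner", "pplfair", 1.0, 2.0)`.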

In [57]:
# Pick three or four of the countries in the sample and compare how often people met socially in 2014.
# Are there differences, and if so, which countries stand out?
# ANSWER: The countries whose happiness grew the most (CZ, DE) met socially the least, which seems counterintuitive
# Open question: which significance test would show whether these country differences are real?

df[df["year"] == 7].groupby("cntry")["sclmeet"].mean()

cntry
CH    5.160622
CZ    4.445802
DE    4.428571
ES    5.260116
NO    5.302326
SE    5.426211
Name: sclmeet, dtype: float64
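
With three or four countries, one common approach is a one-way ANOVA for the overall question ("are there differences?") followed by pairwise comparisons for the follow-up ("which countries stand out?"). The sketch below uses `scipy.stats.f_oneway` and then pairwise Welch t-tests with a Bonferroni correction. The choice of correction, the helper name, and the toy data are assumptions made for illustration.

```python
from itertools import combinations

import pandas as pd
from scipy import stats


def compare_countries(data, column, countries):
    """One-way ANOVA across `countries`, then Bonferroni-corrected pairwise Welch t-tests."""
    samples = {c: data.loc[data["cntry"] == c, column].dropna() for c in countries}
    f_stat, p_overall = stats.f_oneway(*samples.values())

    pairs = list(combinations(countries, 2))
    rows = []
    for a, b in pairs:
        result = stats.ttest_ind(samples[a], samples[b], equal_var=False)
        rows.append({
            "pair": f"{a}-{b}",
            "mean_diff": samples[a].mean() - samples[b].mean(),
            # Bonferroni: multiply by the number of comparisons, capped at 1.
            "p_bonferroni": min(1.0, result.pvalue * len(pairs)),
        })
    return f_stat, p_overall, pd.DataFrame(rows).set_index("pair")


# Synthetic example (made-up values, NOT the ESS data).
toy = pd.DataFrame({
    "cntry": ["AA"] * 5 + ["BB"] * 5 + ["CC"] * 5,
    "sclmeet": [5, 6, 5, 6, 5, 5, 6, 6, 5, 5, 2, 3, 2, 3, 2],
})
f_stat, p_overall, pairwise = compare_countries(toy, "sclmeet", ["AA", "BB", "CC"])
print(f_stat, p_overall)
print(pairwise)
```

On the notebook's data a call might look like `compare_countries(df[df["year"] == 7], "sclmeet", ["CH", "CZ", "ES", "SE"])`. The pairs with small corrected p-values are the ones where a country stands out.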

In [66]:
# Pick three or four of the countries in the sample and compare how often people took part in social activities, relative to others their age, in 2014.
# Are there differences, and if so, which countries stand out?
# ANSWER: ES and CZ have some of the lowest means in most age bins; DE is mixed (highest for 30-50, lowest for 70+)
# Open question: which significance test would show whether these country differences are real?

bins = [1, 30, 50, 70, 100]
df_y = df[df["year"] == 7]

groups = df_y.groupby(["cntry", pd.cut(df_y["agea"], bins)])["sclact"].mean()
groups.unstack()


agea    (1, 30]  (30, 50]  (50, 70]  (70, 100]
cntry                                         
CH     2.812500  2.795455  2.741784   2.800000
CZ     2.940741  2.634146  2.732394   2.487179
DE          NaN  3.000000  2.857143   2.000000
ES     2.728814  2.691038  2.626471   2.240260
NO     2.956790  2.789474  2.827586   2.948052
SE     2.909548  2.863830  2.847973   2.916084
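
If, as the question wording suggests, `sclact` already records participation relative to others of the same age, the countries can be compared directly without binning by `agea`. A hedged sketch using the Kruskal-Wallis test (`scipy.stats.kruskal`), a rank-based counterpart to one-way ANOVA, is shown below. The helper name and the toy values are hypothetical.

```python
import pandas as pd
from scipy import stats


def kruskal_countries(data, column, countries):
    """Kruskal-Wallis H-test of `column` across the listed countries."""
    samples = [data.loc[data["cntry"] == c, column].dropna() for c in countries]
    result = stats.kruskal(*samples)
    medians = {c: s.median() for c, s in zip(countries, samples)}
    return result.statistic, result.pvalue, medians


# Synthetic example (made-up values, NOT the ESS data).
toy = pd.DataFrame({
    "cntry": ["AA"] * 6 + ["BB"] * 6 + ["CC"] * 6,
    "sclact": [3, 3, 4, 3, 4, 3, 2, 3, 2, 2, 3, 2, 3, 4, 3, 3, 4, 4],
})
h_stat, p_value, medians = kruskal_countries(toy, "sclact", ["AA", "BB", "CC"])
print(h_stat, p_value, medians)
```

On the notebook's data a call might look like `kruskal_countries(df[df["year"] == 7], "sclact", ["CZ", "ES", "NO", "SE"])`. A small p-value only says that at least one country differs, so pairwise follow-up tests would still be needed to identify which one.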
