# Challenge: What to use

In this dataset, the same participants answered questions in 2012 and again in 2014.

1. Did people become less trusting from 2012 to 2014? Compute results for each country in the sample.
2. Did people become happier from 2012 to 2014? Compute results for each country in the sample.
3. Who reported watching more TV in 2012, men or women?
4. Who was more likely to believe people were fair in 2012, people living with a partner or people living alone?
5. Pick three or four of the countries in the sample and compare how often people met socially in 2014. Are there differences, and if so, which countries stand out?
6. Pick three or four of the countries in the sample and compare how often people took part in social activities, relative to others their age, in 2014. Are there differences, and if so, which countries stand out?


In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats as stats
%matplotlib inline

In [2]:
df = pd.read_csv('ESSdata_Thinkful.txt')

## 1. Did people become less trusting from 2012 to 2014? Compute results for each country in the sample.

Answer: No, they didn't (paired t-test on all countries pooled, p = 0.89).

In [3]:
df['ppltrst'] = df['ppltrst'].fillna(df['ppltrst'].mean())

In [4]:
t_stat, pVal = stats.ttest_rel(df[df['year'] == 6]['ppltrst'], df[df['year'] == 7]['ppltrst'])

In [5]:
t_stat

0.13802065795077995

In [6]:
pVal

0.890230559647673
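The question asks for results per country, while the cells above pool all countries. A minimal sketch of a per-country paired test follows. It assumes a participant identifier column named `idno` (not shown in this notebook) that is unique within each country and year, and it drops participants missing either year instead of mean-filling. The helper `paired_ttest_by_country` and the toy data are hypothetical.

```python
import pandas as pd
from scipy import stats


def paired_ttest_by_country(df, col, id_col="idno"):
    """Paired t-test of `col` between year 6 (2012) and year 7 (2014), per country.

    Assumes `id_col` identifies a participant uniquely within a country and year.
    """
    rows = []
    for cntry, grp in df.groupby("cntry"):
        # One row per participant, one column per survey year.
        wide = grp.pivot(index=id_col, columns="year", values=col)
        # Keep only participants who answered in both years.
        wide = wide.dropna(subset=[6, 7])
        t_stat, p_val = stats.ttest_rel(wide[6], wide[7])
        rows.append({
            "cntry": cntry,
            "n": len(wide),
            "mean_diff": (wide[7] - wide[6]).mean(),  # 2014 minus 2012
            "t": t_stat,
            "p": p_val,
        })
    return pd.DataFrame(rows).set_index("cntry")


# Made-up data for illustration only, not taken from the ESS file.
toy = pd.DataFrame({
    "cntry": ["AA"] * 6 + ["BB"] * 6,
    "idno": [1, 2, 3, 1, 2, 3] * 2,
    "year": [6, 6, 6, 7, 7, 7] * 2,
    "ppltrst": [5, 6, 7, 4, 4, 6, 3, 4, 5, 3, 5, 4],
})
result = paired_ttest_by_country(toy, "ppltrst")
print(result)
```

On the real data the call would be `paired_ttest_by_country(df, 'ppltrst')`, provided the identifier assumption holds. A negative `mean_diff` with a small `p` would indicate that trust fell in that country.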

## 2. Did people become happier from 2012 to 2014? Compute results for each country in the sample.

Answer: No, they didn't (paired t-test on all countries pooled, p = 0.11).

In [7]:
df['happy'] = df['happy'].fillna(df['happy'].mean())

In [8]:
t_stat, pVal = stats.ttest_rel(df[df['year'] == 6]['happy'], df[df['year'] == 7]['happy'])

In [9]:
t_stat

1.569733494456447

In [10]:
pVal

0.11655077112347458

## 3. Who reported watching more TV in 2012, men or women?

Answer: Neither. Assuming `gndr` 1 = men and 2 = women, women's mean (3.94) is slightly higher than men's (3.90), but the difference is not significant (independent t-test, p = 0.49).

In [11]:
df['tvtot'] = df['tvtot'].fillna(df['tvtot'].mean())

In [12]:
df_2012 = df[df['year'] == 6]

In [13]:
t_stat, pVal = stats.ttest_ind(df_2012[df_2012['gndr'] == 1]['tvtot'], df_2012[df_2012['gndr'] == 2]['tvtot'])

In [14]:
t_stat

-0.6899854275080107

In [15]:
pVal

0.49024063084629854

In [16]:
df_2012[df_2012['gndr'] == 1]['tvtot'].mean()

3.901850489265741

In [17]:
df_2012[df_2012['gndr'] == 2]['tvtot'].mean()

3.944277159999256
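The comparison above can be wrapped in one helper that reports both group means and the test together. This is only a sketch: `compare_two_groups` is a hypothetical name, the toy data is made up, and it makes two choices that differ from the cells above. It drops missing values instead of mean-filling, and it uses Welch's t-test (`equal_var=False`), which does not assume equal variances.

```python
import pandas as pd
from scipy import stats


def compare_two_groups(df, group_col, value_col, a, b):
    """Return both group means and a Welch t-test for group `a` vs group `b`."""
    x = df.loc[df[group_col] == a, value_col].dropna()
    y = df.loc[df[group_col] == b, value_col].dropna()
    # equal_var=False selects Welch's t-test instead of Student's t-test.
    t_stat, p_val = stats.ttest_ind(x, y, equal_var=False)
    return {"mean_a": x.mean(), "mean_b": y.mean(), "t": t_stat, "p": p_val}


# Made-up data for illustration only; None marks a missing answer.
toy_2012 = pd.DataFrame({
    "gndr": [1, 1, 1, 1, 2, 2, 2, 2],
    "tvtot": [2, 3, 4, None, 4, 5, 6, 7],
})
summary = compare_two_groups(toy_2012, "gndr", "tvtot", 1, 2)
print(summary)
```

With the notebook's data the equivalent call would be `compare_two_groups(df_2012, 'gndr', 'tvtot', 1, 2)`.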

## 4. Who was more likely to believe people were fair in 2012, people living with a partner or people living alone?
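
Answer: People living with a husband, wife, or partner (`partner` = 1, mean 6.08) were significantly more likely to believe people are fair than people living alone (`partner` = 2, mean 5.86; independent t-test, p < 0.001).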

In [23]:
df_2012 = df_2012.copy()  # explicit copy, so the assignments below do not trigger SettingWithCopyWarning
df_2012['pplfair'] = df_2012['pplfair'].fillna(df_2012['pplfair'].mean())
df_2012['partner'] = df_2012['partner'].fillna(0)

In [24]:
t_stat, pVal = stats.ttest_ind(df_2012[df_2012['partner'] == 1]['pplfair'], df_2012[df_2012['partner'] == 2]['pplfair'])

In [25]:
t_stat

3.319724959187467

In [26]:
pVal

0.0009085893209899278

In [27]:
df_2012[df_2012['partner'] == 1]['pplfair'].mean()

6.080395232959723

In [28]:
df_2012[df_2012['partner'] == 2]['pplfair'].mean()

5.857662850105444

## 5. Pick three or four of the countries in the sample and compare how often people met socially in 2014. Are there differences, and if so, which countries stand out?

Answer: Yes (one-way ANOVA, p < 0.001). Among CH, CZ, and DE, CH stands out: its mean (5.12) is significantly higher than CZ's (4.55; t-test, p < 0.001). DE (4.71) does not differ significantly from either (p = 0.11 and p = 0.58). Note that the cells below use both survey years, not only 2014.

In [44]:
df['cntry'].unique()
df['sclmeet'] = df['sclmeet'].fillna(df['sclmeet'].mean())

In [45]:
pick_3 = df[df['cntry'].isin(['CH', 'CZ', 'DE'])]

In [46]:
stats.f_oneway(pick_3[pick_3['cntry'] == 'CH']['sclmeet'], pick_3[pick_3['cntry'] == 'CZ']['sclmeet'], pick_3[pick_3['cntry'] == 'DE']['sclmeet'])

F_onewayResult(statistic=55.84394375204852, pvalue=1.6019571080508603e-24)

In [48]:
pick_3.groupby('cntry')['sclmeet'].mean()

cntry
CH    5.120809
CZ    4.550270
DE    4.714286
Name: sclmeet, dtype: float64

In [49]:
stats.ttest_ind(pick_3[pick_3['cntry'] == 'CH']['sclmeet'], pick_3[pick_3['cntry'] == 'CZ']['sclmeet'])

Ttest_indResult(statistic=10.526473546834529, pvalue=1.892760425960061e-25)

In [50]:
stats.ttest_ind(pick_3[pick_3['cntry'] == 'CH']['sclmeet'], pick_3[pick_3['cntry'] == 'DE']['sclmeet'])

Ttest_indResult(statistic=1.6191447891606487, pvalue=0.10561670200820897)

In [51]:
stats.ttest_ind(pick_3[pick_3['cntry'] == 'CZ']['sclmeet'], pick_3[pick_3['cntry'] == 'DE']['sclmeet'])

Ttest_indResult(statistic=-0.5480959616123593, pvalue=0.5837171679989348)
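The question is about 2014, but `pick_3` above is built from `df` without a year filter. A sketch of the same ANOVA and pairwise tests restricted to one year is shown here, with a Bonferroni-adjusted threshold for the three pairwise comparisons. The helper `anova_with_pairwise` and the toy data are hypothetical, and missing values are dropped instead of mean-filled.

```python
from itertools import combinations

import pandas as pd
from scipy import stats


def anova_with_pairwise(df, value_col, countries, year=7, alpha=0.05):
    """One-way ANOVA plus Bonferroni-corrected pairwise t-tests for one survey year."""
    mask = (df["year"] == year) & df["cntry"].isin(countries)
    sub = df[mask].dropna(subset=[value_col])
    groups = {c: sub.loc[sub["cntry"] == c, value_col] for c in countries}

    f_stat, p_anova = stats.f_oneway(*groups.values())

    pairs = list(combinations(countries, 2))
    threshold = alpha / len(pairs)  # Bonferroni: split alpha across all comparisons
    rows = []
    for a, b in pairs:
        t_stat, p_val = stats.ttest_ind(groups[a], groups[b])
        rows.append({"pair": f"{a}-{b}", "t": t_stat, "p": p_val,
                     "significant": p_val < threshold})
    return f_stat, p_anova, pd.DataFrame(rows)


# Made-up data for illustration only; year 6 rows should be ignored by the filter.
toy = pd.DataFrame({
    "cntry": ["AA"] * 4 + ["BB"] * 4 + ["CC"] * 4 + ["AA", "BB", "CC"],
    "year": [7] * 12 + [6] * 3,
    "sclmeet": [1, 2, 1, 2, 6, 7, 6, 7, 1, 2, 2, 1, 7, 1, 7],
})
f_stat, p_anova, pairwise = anova_with_pairwise(toy, "sclmeet", ["AA", "BB", "CC"])
print(f_stat, p_anova)
print(pairwise)
```

On the notebook's data the call would be `anova_with_pairwise(df, 'sclmeet', ['CH', 'CZ', 'DE'])`. The results will differ from the outputs printed above, since those include 2012 responses.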

## 6. Pick three or four of the countries in the sample and compare how often people took part in social activities, relative to others their age, in 2014. Are there differences, and if so, which countries stand out?

Answer: Yes, but only weakly (one-way ANOVA, p = 0.039). CH reported more social activity than CZ (t-test, p = 0.012); DE does not differ significantly from either (p = 0.80 and p = 0.46). Note that the cells below use both survey years, not only 2014.



In [52]:
df['sclact'] = df['sclact'].fillna(df['sclact'].mean())

In [53]:
pick_3 = df[df['cntry'].isin(['CH', 'CZ', 'DE'])]

In [54]:
stats.f_oneway(pick_3[pick_3['cntry'] == 'CH']['sclact'], pick_3[pick_3['cntry'] == 'CZ']['sclact'], pick_3[pick_3['cntry'] == 'DE']['sclact'])

F_onewayResult(statistic=3.241697785407571, pvalue=0.03923981919512365)

In [55]:
stats.ttest_ind(pick_3[pick_3['cntry'] == 'CH']['sclact'], pick_3[pick_3['cntry'] == 'CZ']['sclact'])

Ttest_indResult(statistic=2.500641792504975, pvalue=0.012452388797292846)

In [56]:
stats.ttest_ind(pick_3[pick_3['cntry'] == 'CH']['sclact'], pick_3[pick_3['cntry'] == 'DE']['sclact'])

Ttest_indResult(statistic=-0.2494953556393485, pvalue=0.8030102288867258)

In [57]:
stats.ttest_ind(pick_3[pick_3['cntry'] == 'CZ']['sclact'], pick_3[pick_3['cntry'] == 'DE']['sclact'])

Ttest_indResult(statistic=-0.7417567155562307, pvalue=0.4583645392298574)
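Because the ANOVA p-value here is close to 0.05, a rank-based check can be a useful complement. The sketch below swaps in the Kruskal-Wallis test (`scipy.stats.kruskal`), which is not what the notebook uses; it is shown on the assumption that `sclact` is a short ordinal scale, which this notebook does not state. The toy data is made up, and only 2014 rows (`year == 7`) are kept.

```python
import pandas as pd
from scipy import stats

# Made-up data for illustration only, not taken from the ESS file.
toy = pd.DataFrame({
    "cntry": ["AA"] * 5 + ["BB"] * 5 + ["CC"] * 5,
    "year": [7] * 15,
    "sclact": [1, 2, 2, 3, 2, 3, 4, 4, 5, 4, 2, 3, 3, 2, 3],
})

# Restrict to 2014 and collect one series of answers per country.
toy_2014 = toy[toy["year"] == 7]
groups = [g["sclact"].dropna() for _, g in toy_2014.groupby("cntry")]

# Kruskal-Wallis compares the groups using ranks instead of means.
h_stat, p_val = stats.kruskal(*groups)
print(h_stat, p_val)
```

If the rank-based test and the ANOVA agree, the conclusion is less likely to depend on treating the scale as continuous.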