### Challenge: What to use
Using selected questions from the 2012 and 2014 editions of the European Social Survey, address the following questions. Keep track of your code and results in a Jupyter notebook or other source that you can share with your mentor. For each question, explain why you chose the approach you did.

Here is the data file. And here is the codebook, with information about the variable coding and content.


#### Data Key: https://thinkful-ed.github.io/data-201-resources/ESS_practice_data/ESS_codebook.html


In this dataset, the same participants answered questions in 2012 and again in 2014.

Did people become less trusting from 2012 to 2014? Compute results for each country in the sample.

Did people become happier from 2012 to 2014? Compute results for each country in the sample.

Who reported watching more TV in 2012, men or women?

Who was more likely to believe people were fair in 2012, people living with a partner or people living alone?

Pick three or four of the countries in the sample and compare how often people met socially in 2014. Are there differences, and if so, which countries stand out?

Pick three or four of the countries in the sample and compare how often people took part in social activities, relative to others their age, in 2014. Are there differences, and if so, which countries stand out?

Submit a link to your work below.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd
%matplotlib inline

In [2]:
PATH = r'https://raw.githubusercontent.com/Thinkful-Ed/data-201-resources/master/ESS_practice_data/ESSdata_Thinkful.csv'
data = pd.read_csv(PATH)

In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8594 entries, 0 to 8593
Data columns (total 13 columns):
cntry      8594 non-null object
idno       8594 non-null float64
year       8594 non-null int64
tvtot      8586 non-null float64
ppltrst    8580 non-null float64
pplfair    8555 non-null float64
pplhlp     8569 non-null float64
happy      8563 non-null float64
sclmeet    8579 non-null float64
sclact     8500 non-null float64
gndr       8584 non-null float64
agea       8355 non-null float64
partner    8577 non-null float64
dtypes: float64(11), int64(1), object(1)
memory usage: 873.0+ KB


In [4]:
data

Unnamed: 0,cntry,idno,year,tvtot,ppltrst,pplfair,pplhlp,happy,sclmeet,sclact,gndr,agea,partner
0,CH,5.0,6,3.0,3.0,10.0,5.0,8.0,5.0,4.0,2.0,60.0,1.0
1,CH,25.0,6,6.0,5.0,7.0,5.0,9.0,3.0,2.0,2.0,59.0,1.0
2,CH,26.0,6,1.0,8.0,8.0,8.0,7.0,6.0,3.0,1.0,24.0,2.0
3,CH,28.0,6,4.0,6.0,6.0,7.0,10.0,6.0,2.0,2.0,64.0,1.0
4,CH,29.0,6,5.0,6.0,7.0,5.0,8.0,7.0,2.0,2.0,55.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
8589,SE,3729.0,7,3.0,4.0,5.0,3.0,6.0,6.0,2.0,1.0,18.0,2.0
8590,SE,3732.0,7,5.0,6.0,4.0,4.0,10.0,6.0,3.0,1.0,15.0,2.0
8591,SE,3743.0,7,4.0,5.0,7.0,6.0,8.0,6.0,3.0,1.0,44.0,2.0
8592,SE,3744.0,7,5.0,8.0,8.0,6.0,9.0,7.0,3.0,1.0,15.0,2.0


In [5]:
def missing_vals(df):
    # Count and fraction of missing values per column, largest first
    total_missing = df.isnull().sum().sort_values(ascending=False)
    percent_missing = (df.isnull().sum() / len(df)).sort_values(ascending=False)
    return pd.concat([total_missing, percent_missing], axis=1, keys=['Total', 'Percent'])

In [6]:
missing_vals(data)

Unnamed: 0,Total,Percent
agea,239,0.02781
sclact,94,0.010938
pplfair,39,0.004538
happy,31,0.003607
pplhlp,25,0.002909
partner,17,0.001978
sclmeet,15,0.001745
ppltrst,14,0.001629
gndr,10,0.001164
tvtot,8,0.000931


In [7]:
data.isnull().sum()

cntry        0
idno         0
year         0
tvtot        8
ppltrst     14
pplfair     39
pplhlp      25
happy       31
sclmeet     15
sclact      94
gndr        10
agea       239
partner     17
dtype: int64

No field is missing more than about 2.8% of its values (agea is the highest). For this exercise I'll just impute the missing values with the mean for each country.

In [8]:
def impute_over_by(df, common_col, v):
    # common_col is the column "str" you may want to lock onto
    # v is the column "list" of missing values 
    by = df[common_col].unique()
    for cols in v:
        for col in by:
            df.loc[df[str(common_col)] == col, cols] = df.loc[df[str(common_col)] == col, cols].fillna(
                df[df[str(common_col)] == col][cols].mean())

In [9]:
v = ["agea", "sclact", "pplfair", "happy", "pplhlp", "partner", "sclmeet", "ppltrst", "gndr", "tvtot"]
impute_over_by(data, "cntry", v)
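The per-country fill can also be written with `groupby(...).transform`. The sketch below is only an illustration on a made-up frame: `toy`, `impute_by_group`, and the `mean_cols`/`mode_cols` split are hypothetical names, not part of the notebook. It uses the group mean for scale-like columns and the group mode for coded columns such as `partner`, since a mean fill on a coded column produces fractional codes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the survey data.
toy = pd.DataFrame({
    "cntry": ["CH", "CH", "CH", "SE", "SE", "SE"],
    "happy": [8.0, np.nan, 6.0, 9.0, 7.0, np.nan],
    "partner": [1.0, 1.0, np.nan, 2.0, np.nan, 2.0],
})

def impute_by_group(df, group_col, mean_cols=(), mode_cols=()):
    """Return a copy with NaNs filled per group: mean for scales, mode for codes."""
    out = df.copy()
    for col in mean_cols:
        group_mean = out.groupby(group_col)[col].transform("mean")
        out[col] = out[col].fillna(group_mean)
    for col in mode_cols:
        # Series.mode() may return several values; take the first one.
        # Assumes every group has at least one non-missing value.
        group_mode = out.groupby(group_col)[col].transform(lambda s: s.mode().iloc[0])
        out[col] = out[col].fillna(group_mode)
    return out

filled = impute_by_group(toy, "cntry", mean_cols=["happy"], mode_cols=["partner"])
print(filled)
```

With a mode fill, `partner` keeps only its original codes, so no rows would need to be dropped afterwards for that reason.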

In [10]:
data.isnull().sum()

cntry      0
idno       0
year       0
tvtot      0
ppltrst    0
pplfair    0
pplhlp     0
happy      0
sclmeet    0
sclact     0
gndr       0
agea       0
partner    0
dtype: int64

In [11]:
data.cntry.value_counts()

ES    2426
SE    1816
CH    1546
NO    1462
CZ    1316
DE      28
Name: cntry, dtype: int64

In [12]:
data.partner.value_counts()

1.000000    5276
2.000000    3301
1.426606       8
1.385537       6
1.369431       3
Name: partner, dtype: int64

### quick note on countries
es == spain

se == sweden

ch == switzerland

no == norway

cz == czech republic

de == germany, but we only have 28 entries, compared to more than a thousand for each of the other countries. (for that reason we'll drop those entries and compare the other countries)

mean imputation also left some non-integer values in partner (e.g. 1.426606), so we'll drop those observations

In [13]:
# copying data and storing id
data1 = data.copy()
IDS = data1['idno']
data1 = data1.drop(['idno'], axis=1)
data1 = data1[data1['cntry'] != 'DE']
index_names = data1[(data1['partner'] != 1) & (data1['partner'] != 2)].index
data1.drop(index_names, inplace=True)

In [14]:
data1.cntry.value_counts()

ES    2420
SE    1816
CH    1546
NO    1459
CZ    1308
Name: cntry, dtype: int64

In [15]:
data1.partner.value_counts()

1.0    5255
2.0    3294
Name: partner, dtype: int64

In [16]:
def dist_cover(df):
    num_cols = df.select_dtypes(['float64']).columns
    plt.figure(figsize=(22, 95))
    plt.subplots_adjust(hspace=1, wspace=1)
    for i, col in enumerate(num_cols):
        plt.subplot(len(num_cols), 3, i+1)
        sns.distplot(df[col], kde=True)
        plt.title(col, fontsize=20)
    plt.tight_layout()
    return

In [17]:
data1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8549 entries, 0 to 8593
Data columns (total 12 columns):
cntry      8549 non-null object
year       8549 non-null int64
tvtot      8549 non-null float64
ppltrst    8549 non-null float64
pplfair    8549 non-null float64
pplhlp     8549 non-null float64
happy      8549 non-null float64
sclmeet    8549 non-null float64
sclact     8549 non-null float64
gndr       8549 non-null float64
agea       8549 non-null float64
partner    8549 non-null float64
dtypes: float64(10), int64(1), object(1)
memory usage: 868.3+ KB


In [18]:
year_mask_2012 = data1['year'] == 6
year_mask_2014 = data1['year'] == 7

data_2012 = data1[year_mask_2012]
data_2014 = data1[year_mask_2014]

spain_mask = data1['cntry'] == 'ES'
sweden_mask = data1['cntry'] == 'SE'
switzerland_mask = data1['cntry'] == 'CH'
norway_mask = data1['cntry'] == 'NO'
czech_mask = data1['cntry'] == 'CZ'

# Combine the year and country masks on data1 so mask and frame share one index
spain2012 = data1[year_mask_2012 & spain_mask]
spain2014 = data1[year_mask_2014 & spain_mask]
sweden2012 = data1[year_mask_2012 & sweden_mask]
sweden2014 = data1[year_mask_2014 & sweden_mask]
switz2012 = data1[year_mask_2012 & switzerland_mask]
switz2014 = data1[year_mask_2014 & switzerland_mask]
norway2012 = data1[year_mask_2012 & norway_mask]
norway2014 = data1[year_mask_2014 & norway_mask]
czech2012 = data1[year_mask_2012 & czech_mask]
czech2014 = data1[year_mask_2014 & czech_mask]

male_mask = data_2012['gndr'] == 1
male_2012 = data_2012[male_mask]
# ~ inverts a boolean mask; unary minus is not supported on booleans
female_2012 = data_2012[~male_mask]

solo_mask = data_2012['partner'] == 2
alone_2012 = data_2012[solo_mask]
not_alone_2012 = data_2012[~solo_mask]



### Did people become less trusting from 2012 to 2014? Compute results for each country in the sample.

cntry = country

year = survey round (6 = 2012, 7 = 2014)

ppltrst = Most people can be trusted or you can't be too careful

H0: ppltrst(year6) = ppltrst(year7)

In [19]:
data1.groupby(['cntry', 'year'])['ppltrst'].describe().T

cntry,CH,CH,CZ,CZ,ES,ES,NO,NO,SE,SE
year,6,7,6,7,6,7,6,7,6,7
count,773.0,773.0,656.0,652.0,1210.0,1210.0,729.0,730.0,908.0,908.0
mean,5.677878,5.751617,4.353287,4.408579,5.113223,4.900008,6.645575,6.595375,6.058719,6.257709
std,2.130701,2.143888,2.392804,2.303178,2.186031,2.141964,1.749634,1.808995,2.053292,2.005422
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,4.0,5.0,3.0,3.0,4.0,3.0,5.0,5.0,5.0,5.0
50%,6.0,6.0,4.0,5.0,5.0,5.0,7.0,7.0,7.0,7.0
75%,7.0,7.0,6.0,6.0,7.0,6.0,8.0,8.0,8.0,8.0
max,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0


In [20]:
print('Spain', f_oneway(spain2012['ppltrst'], spain2014['ppltrst']))
print('Sweden', f_oneway(sweden2012['ppltrst'], sweden2014['ppltrst']))
print('Switzerland', f_oneway(switz2012['ppltrst'], switz2014['ppltrst']))
print('Norway', f_oneway(norway2012['ppltrst'], norway2014['ppltrst']))
print('Czech', f_oneway(czech2012['ppltrst'], czech2014['ppltrst']))

Spain F_onewayResult(statistic=5.872623703722467, pvalue=0.015451078645645377)
Sweden F_onewayResult(statistic=4.364598150674105, pvalue=0.03683218341456192)
Switzerland F_onewayResult(statistic=0.4600524426784201, pvalue=0.49770110247170185)
Norway F_onewayResult(statistic=0.29023942208729453, pvalue=0.5901494431976062)
Czech F_onewayResult(statistic=0.1812490546776069, pvalue=0.6703721600709842)


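Only Spain and Sweden show significant changes in ppltrst.

Spain 'ES' and Sweden 'SE' both have p-values below 0.05, but they moved in opposite directions: mean trust in Spain fell from 5.11 to 4.90, so people there became less trusting, while mean trust in Sweden rose from 6.06 to 6.26, so people there became more trusting.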
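With only two groups, a one-way ANOVA is equivalent to an independent two-sample t-test with equal variances: the F statistic is the square of the t statistic and the p-values match. The sketch below checks this on made-up scores; `wave_a` and `wave_b` are hypothetical stand-ins, not the ESS columns.

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

rng = np.random.default_rng(0)
# Hypothetical 0-10 scores for two survey waves (not the ESS data).
wave_a = rng.integers(0, 11, size=200).astype(float)
wave_b = rng.integers(0, 11, size=200).astype(float)

f_stat, f_p = f_oneway(wave_a, wave_b)
t_stat, t_p = ttest_ind(wave_a, wave_b)  # equal_var=True by default

print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
print(f"p (ANOVA) = {f_p:.4f}, p (t-test) = {t_p:.4f}")
```

Both tests treat the two waves as independent samples, and neither reports the direction of a change, so the group means still have to be read alongside the p-value.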

### Did people become happier from 2012 to 2014? Compute results for each country in the sample.

happy = Taking all things together, how happy would you say you are?

H0: happy(year6) = happy(year7)

In [21]:
data1.groupby(['cntry', 'year'])['happy'].describe().T

cntry,CH,CH,CZ,CZ,ES,ES,NO,NO,SE,SE
year,6,7,6,7,6,7,6,7,6,7
count,773.0,773.0,656.0,652.0,1210.0,1210.0,729.0,730.0,908.0,908.0
mean,8.088366,8.116429,6.769991,6.89886,7.547508,7.422714,8.252856,7.916438,7.907409,7.946896
std,1.435124,1.405725,2.037283,1.895292,1.914102,1.868511,1.41878,1.581747,1.520975,1.403828
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,3.0
25%,7.0,7.0,5.0,6.0,7.0,7.0,8.0,7.0,7.0,7.0
50%,8.0,8.0,7.0,7.0,8.0,8.0,8.0,8.0,8.0,8.0
75%,9.0,9.0,8.0,8.0,9.0,9.0,9.0,9.0,9.0,9.0
max,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0


In [22]:
print('Spain', f_oneway(spain2012['happy'], spain2014['happy']))
print('Sweden', f_oneway(sweden2012['happy'], sweden2014['happy']))
print('Switzerland', f_oneway(switz2012['happy'], switz2014['happy']))
print('Norway', f_oneway(norway2012['happy'], norway2014['happy']))
print('Czech', f_oneway(czech2012['happy'], czech2014['happy']))

Spain F_onewayResult(statistic=2.633610227152548, pvalue=0.10475393302746475)
Sweden F_onewayResult(statistic=0.33047410717193054, pvalue=0.5654513378980963)
Switzerland F_onewayResult(statistic=0.1508478728325312, pvalue=0.6977799596182745)
Norway F_onewayResult(statistic=18.28545272372105, pvalue=2.0252146769781443e-05)
Czech F_onewayResult(statistic=1.4024538818987706, pvalue=0.23652862301689215)


The people of Norway became less happy in 2014 (mean 8.25 down to 7.92, p < 0.001). The changes in the other countries are not significant at the 0.05 level.
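Because the challenge states that the same participants answered in both years, a paired comparison is one alternative to treating the two waves as independent groups. This is a different technique from the `f_oneway` calls above, sketched on a made-up frame: `toy` and `paired_by_country` are hypothetical, and it assumes `idno` identifies a respondent within a country, so it would have to run on a frame that still has `idno`.

```python
import pandas as pd
from scipy.stats import ttest_rel

# Hypothetical long-format frame: one row per respondent per wave.
toy = pd.DataFrame({
    "cntry": ["NO"] * 8,
    "idno": [1, 2, 3, 4, 1, 2, 3, 5],
    "year": [6, 6, 6, 6, 7, 7, 7, 7],
    "happy": [8.0, 9.0, 7.0, 8.0, 7.0, 8.0, 7.0, 6.0],
})

def paired_by_country(df, value_col):
    """Paired t-test per country on respondents present in both waves."""
    wide = df.pivot_table(index=["cntry", "idno"], columns="year", values=value_col)
    wide = wide.dropna()  # keep only respondents observed in both waves
    rows = {}
    for cntry, grp in wide.groupby(level="cntry"):
        stat, p = ttest_rel(grp[6], grp[7])
        rows[cntry] = {"n_pairs": len(grp), "t": stat, "p": p}
    return pd.DataFrame(rows).T

result = paired_by_country(toy, "happy")
print(result)
```

In the toy data only respondents 1, 2 and 3 appear in both waves, so the test runs on three pairs. A positive `t` means the round 6 scores were higher than the round 7 scores.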

### Who reported watching more TV in 2012, men or women?

Gender: 1 = male, 2 = female

tvtot: TV watching, total time on average weekday

In [23]:
male_2012.describe()

Unnamed: 0,year,tvtot,ppltrst,pplfair,pplhlp,happy,sclmeet,sclact,gndr,agea,partner
count,2146.0,2146.0,2146.0,2146.0,2146.0,2146.0,2146.0,2146.0,2146.0,2146.0,2146.0
mean,6.0,3.896054,5.625824,5.947273,5.290372,7.743982,5.226734,2.780306,1.0,46.960703,1.353681
std,0.0,1.979532,2.20748,2.12839,2.170728,1.719228,1.46749,0.912533,0.0,17.694042,0.478223
min,6.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,15.0,1.0
25%,6.0,2.0,4.0,5.0,4.0,7.0,4.0,2.0,1.0,33.0,1.0
50%,6.0,4.0,6.0,6.0,5.0,8.0,6.0,3.0,1.0,46.74318,1.0
75%,6.0,5.0,7.0,8.0,7.0,9.0,6.0,3.0,1.0,61.0,2.0
max,6.0,7.0,10.0,10.0,10.0,10.0,7.0,5.0,1.0,103.0,2.0


In [24]:
female_2012.describe()

Unnamed: 0,year,tvtot,ppltrst,pplfair,pplhlp,happy,sclmeet,sclact,gndr,agea,partner
count,2130.0,2130.0,2130.0,2130.0,2130.0,2130.0,2130.0,2130.0,2130.0,2130.0,2130.0
mean,6.0,3.938846,5.495154,6.044637,5.48593,7.701211,5.229295,2.698958,2.0,48.059986,1.4
std,0.0,2.047985,2.263383,2.139287,2.164137,1.804373,1.489651,0.906466,0.0,18.151017,0.490013
min,6.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,2.0,15.0,1.0
25%,6.0,2.0,4.0,5.0,4.0,7.0,4.0,2.0,2.0,34.0,1.0
50%,6.0,4.0,6.0,6.0,6.0,8.0,6.0,3.0,2.0,48.0,1.0
75%,6.0,6.0,7.0,8.0,7.0,9.0,6.0,3.0,2.0,61.75,2.0
max,6.0,7.0,10.0,10.0,10.0,10.0,7.0,5.0,2.0,97.0,2.0


In [25]:
print('Tv Test', f_oneway(male_2012['tvtot'], female_2012['tvtot']))

Tv Test F_onewayResult(statistic=0.4826287776023437, pvalue=0.4872717578758936)


Men and women reported statistically similar levels of TV watching in 2012 (p = 0.49).

### Who was more likely to believe people were fair in 2012, people living with a partner or people living alone?

In [26]:
alone_2012.describe()

Unnamed: 0,year,tvtot,ppltrst,pplfair,pplhlp,happy,sclmeet,sclact,gndr,agea,partner
count,1611.0,1611.0,1611.0,1611.0,1611.0,1611.0,1611.0,1611.0,1611.0,1611.0,1611.0
mean,6.0,3.852779,5.396482,5.860766,5.248211,7.321911,5.514306,2.771953,1.528864,42.326163,2.0
std,0.0,2.129613,2.279474,2.158397,2.196962,1.917382,1.464927,0.94156,0.499321,21.263494,0.0
min,6.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,15.0,2.0
25%,6.0,2.0,4.0,5.0,4.0,7.0,5.0,2.0,1.0,22.0,2.0
50%,6.0,4.0,6.0,6.0,5.0,8.0,6.0,3.0,2.0,39.0,2.0
75%,6.0,6.0,7.0,7.0,7.0,9.0,7.0,3.0,2.0,59.0,2.0
max,6.0,7.0,10.0,10.0,10.0,10.0,7.0,5.0,2.0,103.0,2.0


In [27]:
not_alone_2012.describe()

Unnamed: 0,year,tvtot,ppltrst,pplfair,pplhlp,happy,sclmeet,sclact,gndr,agea,partner
count,2665.0,2665.0,2665.0,2665.0,2665.0,2665.0,2665.0,2665.0,2665.0,2665.0,2665.0
mean,6.0,3.956415,5.660024,6.077385,5.472158,7.96494,5.054943,2.720339,1.47955,50.640897,1.0
std,0.0,1.939808,2.204122,2.115558,2.148586,1.61397,1.459765,0.890518,0.499675,14.715924,0.0
min,6.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,19.0,1.0
25%,6.0,2.0,4.0,5.0,4.0,7.0,4.0,2.0,1.0,39.0,1.0
50%,6.0,4.0,6.0,7.0,5.0,8.0,5.0,3.0,1.0,49.040046,1.0
75%,6.0,5.0,7.0,8.0,7.0,9.0,6.0,3.0,2.0,62.0,1.0
max,6.0,7.0,10.0,10.0,10.0,10.0,7.0,5.0,2.0,95.0,1.0


In [28]:
print('Trust/Partner Test', f_oneway(alone_2012['pplfair'], not_alone_2012['pplfair']))

Trust/Partner Test F_onewayResult(statistic=10.367130723932021, pvalue=0.0012923877013200893)


People living with a partner were, on average, more likely to believe people are fair than those living alone (mean pplfair 6.08 vs. 5.86), with a p-value below 0.01.

### Pick three or four of the countries in the sample and compare how often people met socially in 2014. Are there differences, and if so, which countries stand out?

sclmeet : How often socially meet with friends, relatives or colleagues



### cntry key
es == spain

se == sweden

ch == switzerland

no == norway

cz == czech republic

In [29]:
sclmeet = np.asarray(spain2014['sclmeet'].sample(600).tolist() +
                     czech2014['sclmeet'].sample(600).tolist() +
                     switz2014['sclmeet'].sample(600).tolist())

cntry = np.array(['Spain', 'Czech Republic', 'Switzerland'])
cntry = np.repeat(cntry, 600)

tukey = pairwise_tukeyhsd(endog=sclmeet,      # Data
                          groups=cntry,       # Groups
                          alpha=0.05)         # Significance level

tukey.summary()  

group1,group2,meandiff,p-adj,lower,upper,reject
Czech Republic,Spain,0.8736,0.001,0.6827,1.0644,True
Czech Republic,Switzerland,0.665,0.001,0.4742,0.8559,True
Spain,Switzerland,-0.2085,0.0282,-0.3994,-0.0177,True


All three pairwise comparisons reject the null at the 0.05 level, so how often people met socially differed between every pair of countries in this sample. The gap between Spain and Switzerland is the smallest (meandiff -0.21, adjusted p = 0.028). The Czech Republic stands out: meandiff is the second group's mean minus the first's, so respondents there met socially less often than those in Spain (by about 0.87 points) and in Switzerland (by about 0.67 points).
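To read the sign of `meandiff` in a Tukey HSD table, it helps to compare it with the raw group means. The sketch below uses made-up scores for three hypothetical groups `A`, `B` and `C` and checks that the first row's `meandiff` equals the mean of `group2` minus the mean of `group1`.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)
# Hypothetical scores for three made-up groups (not the ESS data).
scores = {
    "A": rng.normal(4.5, 1.0, size=100),
    "B": rng.normal(5.3, 1.0, size=100),
    "C": rng.normal(5.1, 1.0, size=100),
}
endog = np.concatenate(list(scores.values()))
groups = np.repeat(list(scores.keys()), 100)

res = pairwise_tukeyhsd(endog=endog, groups=groups, alpha=0.05)
print(res.summary())

# The first row compares A (group1) with B (group2).
first_diff = res.meandiffs[0]
manual_diff = scores["B"].mean() - scores["A"].mean()
print(first_diff, manual_diff)
```

A positive `meandiff` therefore means the group listed second has the higher mean.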

### Pick three or four of the countries in the sample and compare how often people took part in social activities, relative to others their age, in 2014. Are there differences, and if so, which countries stand out?

In [30]:
def age_bracket(df):
    if (df['agea'] > 0) and (df['agea'] < 11):
        return 'child'
    elif (df['agea'] > 10) and (df['agea'] < 21):
        return 'adolscent'
    elif (df['agea'] > 20) and (df['agea'] < 31):
        return 'young adult'
    elif (df['agea'] > 30) and (df['agea'] < 51):
        return 'middle aged adult'
    elif (df['agea'] > 50) and (df['agea'] < 70):
        return 'mature adult'
    else:
        return 'senior'
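The same bracketing can be done in one vectorised call with `pandas.cut`. The sketch below is an assumption-laden alternative, not the notebook's method: `AGE_BINS`, `AGE_LABELS` and `label_ages` are hypothetical names, the bin edges are chosen to mirror the function above for positive ages, and the label spellings differ from the notebook's, so the later masks would need matching strings.

```python
import numpy as np
import pandas as pd

# Left-closed bins [left, right) chosen to mirror the function above for ages > 0.
AGE_BINS = [0, 11, 21, 31, 51, 70, np.inf]
AGE_LABELS = ["child", "adolescent", "young adult",
              "middle aged adult", "mature adult", "senior"]

def label_ages(ages):
    """Vectorised age bracketing with pandas.cut."""
    return pd.cut(ages, bins=AGE_BINS, labels=AGE_LABELS, right=False)

ages = pd.Series([15.0, 20.0, 21.0, 30.0, 50.0, 69.0, 70.0, 95.0])
labels = label_ages(ages).tolist()
print(labels)
```

Assigning the result with `df['age_label'] = label_ages(df['agea'])` avoids a row-wise `apply`.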

In [31]:
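data_2014 = data_2014.copy()  # explicit copy so the new column does not trigger SettingWithCopyWarning
data_2014['age_label'] = data_2014.apply(age_bracket, axis=1)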



In [32]:
sweden_mask1 = data_2014['cntry'] == 'SE'
switzerland_mask1 = data_2014['cntry'] == 'CH'
norway_mask1 = data_2014['cntry'] == 'NO'

In [33]:
swed_2014 = data_2014[sweden_mask1]
switz_2014 = data_2014[switzerland_mask1]
nor_2014 = data_2014[norway_mask1]

In [34]:
nor_age_mask_mad = nor_2014['age_label'] == 'middle aged adult'
nor_age_mask_ma = nor_2014['age_label'] == 'mature adult'
nor_age_mask_ya = nor_2014['age_label'] == 'young adult'
nor_age_mask_s = nor_2014['age_label'] == 'senior'
nor_age_mask_a = nor_2014['age_label'] == 'adolscent'

switz_age_mask_mad = switz_2014['age_label'] == 'middle aged adult'
switz_age_mask_ma = switz_2014['age_label'] == 'mature adult'
switz_age_mask_ya = switz_2014['age_label'] == 'young adult'
switz_age_mask_s = switz_2014['age_label'] == 'senior'
switz_age_mask_a = switz_2014['age_label'] == 'adolscent'

swed_age_mask_mad = swed_2014['age_label'] == 'middle aged adult'
swed_age_mask_ma = swed_2014['age_label'] == 'mature adult'
swed_age_mask_ya = swed_2014['age_label'] == 'young adult'
swed_age_mask_s = swed_2014['age_label'] == 'senior'
swed_age_mask_a = swed_2014['age_label'] == 'adolscent'

In [35]:
nor_mad = nor_2014[nor_age_mask_mad]
nor_ma = nor_2014[nor_age_mask_ma]
nor_ya = nor_2014[nor_age_mask_ya]
nor_s = nor_2014[nor_age_mask_s]
nor_a = nor_2014[nor_age_mask_a]

switz_mad = switz_2014[switz_age_mask_mad]
switz_ma = switz_2014[switz_age_mask_ma]
switz_ya = switz_2014[switz_age_mask_ya]
switz_s = switz_2014[switz_age_mask_s]
switz_a = switz_2014[switz_age_mask_a]

swed_mad = swed_2014[swed_age_mask_mad]
swed_ma = swed_2014[swed_age_mask_ma]
swed_ya = swed_2014[swed_age_mask_ya]
swed_s = swed_2014[swed_age_mask_s]
swed_a = swed_2014[swed_age_mask_a]
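Before sampling from each country and age cell, a `groupby` summary shows how many respondents each cell holds and what its mean is. The sketch below uses a made-up frame; `toy` is a hypothetical stand-in for `data_2014` with its `age_label` column.

```python
import pandas as pd

# Hypothetical stand-in for data_2014 with an age_label column.
toy = pd.DataFrame({
    "cntry": ["NO", "NO", "NO", "SE", "SE", "SE"],
    "age_label": ["senior", "senior", "young adult",
                  "senior", "young adult", "young adult"],
    "sclact": [3.0, 2.0, 3.0, 2.0, 4.0, 3.0],
})

# One row per (country, age bracket) with its size and mean sclact.
summary = toy.groupby(["cntry", "age_label"])["sclact"].agg(["count", "mean"])
print(summary)

# The smallest cell bounds how many rows can be sampled per cell without replacement.
smallest_cell = int(summary["count"].min())
print(smallest_cell)
```

The smallest `count` is the largest per-cell sample size that works for every cell when sampling without replacement.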

In [36]:
sclact = np.asarray(swed_mad['sclact'].sample(71).tolist() +
                    swed_ma['sclact'].sample(71).tolist() +
                    swed_ya['sclact'].sample(71).tolist() +
                    swed_s['sclact'].sample(71).tolist() +
                    swed_a['sclact'].sample(71).tolist() +
                    switz_mad['sclact'].sample(71).tolist() +
                    switz_ma['sclact'].sample(71).tolist() +
                    switz_ya['sclact'].sample(71).tolist() +
                    switz_s['sclact'].sample(71).tolist() +
                    switz_a['sclact'].sample(71).tolist() +
                    nor_mad['sclact'].sample(71).tolist() +
                    nor_ma['sclact'].sample(71).tolist() +
                    nor_ya['sclact'].sample(71).tolist() +
                    nor_s['sclact'].sample(71).tolist() +
                    nor_a['sclact'].sample(71).tolist())

ages = np.array(['Sweden - Middle', 'Sweden - Mature', 'Sweden - Y-Adult', 'Sweden - Senoir', 'Sweden - Adolescent',
                  'Switzerland - Middle', 'Switzerland - Mature', 'Switzerland - Y-Adult', 'Switzerland - Senoir', 'Switzerland - Adolescent',
                  'Norway - Middle', 'Norway - Mature', 'Norway - Y-Adult', 'Norway - Senoir', 'Norway - Adolescent'])
ages = np.repeat(ages, 71)

tukey = pairwise_tukeyhsd(endog=sclact,       # Data
                          groups=ages,        # Groups
                          alpha=0.05)         # Significance level
tukey.summary()  

group1,group2,meandiff,p-adj,lower,upper,reject
Norway - Adolescent,Norway - Mature,-0.0986,0.9,-0.5922,0.395,False
Norway - Adolescent,Norway - Middle,-0.1972,0.9,-0.6908,0.2964,False
Norway - Adolescent,Norway - Senoir,-0.0704,0.9,-0.564,0.4232,False
Norway - Adolescent,Norway - Y-Adult,0.0,0.9,-0.4936,0.4936,False
Norway - Adolescent,Sweden - Adolescent,-0.0423,0.9,-0.5358,0.4513,False
Norway - Adolescent,Sweden - Mature,-0.1549,0.9,-0.6485,0.3387,False
Norway - Adolescent,Sweden - Middle,-0.3521,0.4971,-0.8457,0.1415,False
Norway - Adolescent,Sweden - Senoir,-0.0141,0.9,-0.5077,0.4795,False
Norway - Adolescent,Sweden - Y-Adult,-0.0563,0.9,-0.5499,0.4372,False
Norway - Adolescent,Switzerland - Adolescent,-0.0845,0.9,-0.5781,0.4091,False


It comes as a major surprise to me that none of the pairwise comparisons shown reveals a significant difference in how often people took part in social activities, across three different countries and 5 separate age groups.