# Two-way Independent Groups Factorial ANOVA (no significant interaction)

ANOVA designs take categorical (grouping) independent variables (IVs) and are used to assess whether the different categories (groups) of an IV differ significantly in their mean scores on some scale (continuous or discrete) dependent variable (DV). 

In previous notebooks I demonstrated the different types of one-way ANOVA that can be conducted. One-way designs take one IV with multiple groups (or levels) and compare those groups on their mean scores on the DV. In factorial ANOVA designs we still have one scale DV, but we add further categorical IVs measured on the same participants. This lets us investigate how the groups within each IV differ in their scores on the DV (so-called main effects) and whether scores on the DV depend on particular combinations of the IV categories (so-called interaction effects). 
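
To make the distinction between main effects and interaction effects concrete, here is a minimal sketch using a made-up 2 x 3 table of cell means. The numbers are hypothetical and are not taken from the dataset analysed in this notebook. Main effects compare marginal means (each IV averaged over the other), while the interaction asks whether the gap between the levels of one IV changes across the levels of the other.

```python
import pandas as pd

# Hypothetical cell means (NOT from the fear-of-crime data).
cell_means = pd.DataFrame(
    {"low": [1.6, 2.1], "medium": [1.7, 2.2], "high": [1.9, 2.4]},
    index=["male", "female"],
)

# Main effects compare marginal means: each IV averaged over the other IV.
sex_marginals = cell_means.mean(axis=1)
anx_marginals = cell_means.mean(axis=0)

# Interaction: does the female minus male gap change across anxiety levels?
sex_gap = cell_means.loc["female"] - cell_means.loc["male"]

print(sex_marginals.round(2))
print(anx_marginals.round(2))
print(sex_gap.round(2))
```

In this invented table the female minus male gap is the same at every anxiety level, which is the pattern we would expect to see when there is no interaction.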

In this notebook I will demonstrate how to run a two-way independent groups factorial ANOVA. The DV is a scale measure of participants' scores on a questionnaire evaluating the extent to which they worry about crime. The IVs are two categorical variables: sex, with two groups (male or female), and anxiety level, with three groups (low, medium or high). 

I will work through the different stages of the analysis, starting by testing for homogeneity of variance using Levene's test; then conducting the factorial ANOVA and interpreting the results of both main effects and the interaction between the IVs; after which, if any of the ANOVA results are significant, I will follow these up with post-hoc tests and tests of simple effects if required. 

In [1]:
# Importing the key software libraries that will be used. 

import pandas as pd
import numpy as np
import scipy.stats as stats
import seaborn as sns
import matplotlib.pyplot as plt
import pingouin as pg

In [2]:
# Importing the dataset for analysis.

crime_df = pd.read_csv('fearofcrime.csv')

crime_df.head()

Unnamed: 0,sex,anxlevel,stress,totalworry,construct
0,2,2,1.3,3.0375,3.04878048780488
1,2,2,2.1,3.21875,2.95121951219512
2,1,3,1.95,2.025,3.29268292682927
3,2,2,2.1,1.80625,2.19512195121951
4,2,2,2.05,2.5625,2.80487804878049


The data saved in the csv for this dataset only has numbers (1 and 2) to represent the sex categorical variable, and numbers (1, 2 and 3) to represent the three anxiety level groups. It would be helpful to give labels to these numbers so they are more meaningful and easier to interpret when we conduct the analysis. In this case, the sex variable numbers represent the following codes: 1 = Male, 2 = Female, and the anxiety level variable (anxlevel) numbers represent the following codes: 1 = Low, 2 = Medium, 3 = High. 

I will create two extra columns on the dataframe that map these numbers to their respective labels. 

In [3]:
# At present all variables are shown as objects rather than floats or integers. 
# These will need changing in order for the data to have the correct types and be in the right format for analysis. 
crime_df.dtypes

sex           object
anxlevel      object
stress        object
totalworry    object
construct     object
dtype: object

In [4]:
crime_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 235 entries, 0 to 234
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   sex         235 non-null    object
 1   anxlevel    235 non-null    object
 2   stress      235 non-null    object
 3   totalworry  235 non-null    object
 4   construct   235 non-null    object
dtypes: object(5)
memory usage: 9.3+ KB


In [5]:
# First, converting all variables to numeric using the pandas to_numeric method. 
crime_df = crime_df.apply(pd.to_numeric, errors = 'coerce')

crime_df.dtypes

sex           float64
anxlevel      float64
stress        float64
totalworry    float64
construct     float64
dtype: object

In [6]:
# Now that these values have been converted to numeric values we should be able to see any that were missing
# as they will have been converted to NaN by errors = coerce

crime_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 235 entries, 0 to 234
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   sex         234 non-null    float64
 1   anxlevel    228 non-null    float64
 2   stress      234 non-null    float64
 3   totalworry  228 non-null    float64
 4   construct   222 non-null    float64
dtypes: float64(5)
memory usage: 9.3 KB
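
The effect of errors = 'coerce' is easiest to see on a small example. The sketch below uses an invented Series (raw is a hypothetical name, and the entries are not from the csv) in which two entries cannot be parsed as numbers.

```python
import pandas as pd

# Invented values: two entries cannot be parsed as numbers.
raw = pd.Series(["1", "2.5", "missing", "3", "n/a"])

# errors="coerce" turns unparseable entries into NaN instead of raising an error.
converted = pd.to_numeric(raw, errors="coerce")

print(converted)
print("Number of values coerced to NaN:", converted.isna().sum())
```

Any entry that pandas cannot parse becomes NaN, which is why the non-null counts fall after the conversion.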


In [7]:
# Dropping missing values

crime_df2 = crime_df.dropna()

In [8]:
# We are now left with a dataframe of 209 participants/ rows. 
crime_df2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 209 entries, 0 to 234
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   sex         209 non-null    float64
 1   anxlevel    209 non-null    float64
 2   stress      209 non-null    float64
 3   totalworry  209 non-null    float64
 4   construct   209 non-null    float64
dtypes: float64(5)
memory usage: 9.8 KB
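
Note that dropna() with no arguments removes a row if any column is missing, including stress and construct, which are not used in the ANOVA. An alternative worth considering is to drop rows only when a variable in the model is missing, using the subset parameter of dropna. The sketch below uses a small invented dataframe (toy is a hypothetical name) rather than the real data, so the row counts are illustrative only.

```python
import numpy as np
import pandas as pd

# Invented data: each NaN sits in a different column.
toy = pd.DataFrame({
    "sex": [1, 2, 2, np.nan, 1],
    "anxlevel": [1, 2, 3, 2, 1],
    "totalworry": [2.0, 3.1, np.nan, 1.8, 2.4],
    "construct": [3.0, np.nan, 2.9, 3.1, 2.2],
})

# Listwise deletion: a NaN in any column removes the row.
listwise = toy.dropna()

# Only require complete data on the variables used in the model.
analysis_only = toy.dropna(subset=["sex", "anxlevel", "totalworry"])

print(len(listwise), len(analysis_only))
```

Here the second approach keeps one extra row, because a missing construct value no longer excludes a participant.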


In [9]:
# Using the Series.map method to add labels to the sex categorical variable.
# DataFrame.assign returns a new dataframe, which avoids the SettingWithCopyWarning
# that direct column assignment on crime_df2 can trigger.

sex_labels = {1: "male", 2: "female"}

crime_df2 = crime_df2.assign(sex_cat = crime_df2['sex'].map(sex_labels))

In [10]:
# Using the Series.map method to add labels to the anxlevel categorical variable.

anx_labels = {1: "low", 2: "medium", 3: "high"}

crime_df2 = crime_df2.assign(anx_cat = crime_df2['anxlevel'].map(anx_labels))

crime_df2.head()


Unnamed: 0,sex,anxlevel,stress,totalworry,construct,sex_cat,anx_cat
0,2.0,2.0,1.3,3.0375,3.04878,female,medium
1,2.0,2.0,2.1,3.21875,2.95122,female,medium
2,1.0,3.0,1.95,2.025,3.292683,male,high
3,2.0,2.0,2.1,1.80625,2.195122,female,medium
4,2.0,2.0,2.05,2.5625,2.804878,female,medium


In [11]:
crime_df2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 209 entries, 0 to 234
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   sex         209 non-null    float64
 1   anxlevel    209 non-null    float64
 2   stress      209 non-null    float64
 3   totalworry  209 non-null    float64
 4   construct   209 non-null    float64
 5   sex_cat     209 non-null    object 
 6   anx_cat     209 non-null    object 
dtypes: float64(5), object(2)
memory usage: 13.1+ KB
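
Because the labels are stored as plain strings, sorted tables and plots will order the anxiety levels alphabetically (high, low, medium). One option, sketched below on an invented dataframe (toy and labels are hypothetical names), is to store the labels as an ordered categorical. I have not applied this to the analysis columns in this notebook.

```python
import pandas as pd

# Invented anxiety codes, using the same 1/2/3 coding as the dataset.
toy = pd.DataFrame({"anxlevel": [3, 1, 2, 1, 3]})
labels = {1: "low", 2: "medium", 3: "high"}

# An ordered categorical keeps the levels in their natural order.
toy["anx_cat"] = pd.Categorical(
    toy["anxlevel"].map(labels),
    categories=["low", "medium", "high"],
    ordered=True,
)

print(toy["anx_cat"].cat.categories.tolist())
print(toy.sort_values("anx_cat")["anx_cat"].tolist())
```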


With the data cleaned and stored in the correct data types, we can now start the analysis, beginning with a test of the homogeneity of variance assumption using Levene's test. 

### Test for homogeneity of variance

Here I will conduct Levene's test using the homoscedasticity function from pingouin. 

In [12]:
pg.homoscedasticity(crime_df2, dv = 'totalworry', group = 'sex_cat', center = 'mean')

Unnamed: 0,W,pval,equal_var
levene,2.309495,0.130112,True


In [13]:
pg.homoscedasticity(crime_df2, dv = 'totalworry', group = 'anx_cat', center = 'mean')

Unnamed: 0,W,pval,equal_var
levene,0.729261,0.483506,True


We can see that the two Levene's tests, conducted on each of the IVs, are not statistically significant. This tells us we are probably safe to assume equal variances for the groups in the dataset. 

We could report these results as follows:

Levene's test assessing equality of variance for the sex IV was non-significant (F(1, 207) = 2.31, p = 0.13) and the Levene's test on the anxiety level IV was also non-significant (F(2, 206) = 0.73, p = 0.48). 
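
In a factorial design the equal variance assumption is usually stated for the cells of the design, that is, each combination of sex and anxiety level, so it can also be worth running Levene's test across all six cells. The sketch below is one possible way to do this with scipy.stats.levene. It uses synthetic data (toy is a hypothetical stand-in for crime_df2), so the statistic it prints says nothing about the real dataset. The degrees of freedom are k - 1 and N - k, where k is the number of groups and N is the total sample size.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in for crime_df2: 20 scores per sex x anxiety cell.
cells = [(s, a) for s in ["male", "female"] for a in ["low", "medium", "high"]]
toy = pd.DataFrame(
    [(s, a, y) for s, a in cells for y in rng.normal(2.0, 0.6, size=20)],
    columns=["sex_cat", "anx_cat", "totalworry"],
)

# One array of DV scores per cell of the design.
samples = [
    group["totalworry"].to_numpy()
    for _, group in toy.groupby(["sex_cat", "anx_cat"])
]

# center="mean" matches the setting used in the pingouin calls above.
W, p = stats.levene(*samples, center="mean")

k, n = len(samples), len(toy)
print(f"Levene's test: F({k - 1}, {n - k}) = {W:.2f}, p = {p:.3f}")
```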

### Two-way Factorial ANOVA

Assuming equal variances for all analyses, we can now conduct and interpret the two-way independent groups ANOVA. 

Here I am using the anova function from pingouin. 

In [14]:
pg.anova(crime_df2, dv = 'totalworry', between = ['sex_cat', 'anx_cat']).round(3)

Unnamed: 0,Source,SS,DF,MS,F,p-unc,np2
0,sex_cat,11.862,1.0,11.862,31.332,0.0,0.134
1,anx_cat,2.164,2.0,1.082,2.858,0.06,0.027
2,sex_cat * anx_cat,0.374,2.0,0.187,0.494,0.611,0.005
3,Residual,76.852,203.0,0.379,,,


The above results show that we have a significant main effect of sex (F(1, 203) = 31.33, p < 0.001), a non-significant main effect of anxiety level (F(2, 203) = 2.86, p = 0.06), and a non-significant interaction between sex and anxiety (F(2, 203) = 0.49, p = 0.61). 
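
As a check on how the columns of the ANOVA table relate to each other, the F ratios and partial eta squared values can be recomputed by hand from the sums of squares and degrees of freedom. Each F is the effect mean square divided by the residual mean square, and partial eta squared is SS_effect / (SS_effect + SS_residual). The sketch below copies the rounded values from the table above, so small rounding differences are possible.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table above.
ss = {"sex_cat": 11.862, "anx_cat": 2.164, "sex_cat * anx_cat": 0.374}
df = {"sex_cat": 1, "anx_cat": 2, "sex_cat * anx_cat": 2}
ss_resid, df_resid = 76.852, 203

# Residual mean square: the denominator of every F ratio.
ms_resid = ss_resid / df_resid

results = {}
for source in ss:
    ms_effect = ss[source] / df[source]
    f_ratio = ms_effect / ms_resid
    np2 = ss[source] / (ss[source] + ss_resid)
    results[source] = (f_ratio, np2)
    print(f"{source}: F = {f_ratio:.2f}, partial eta squared = {np2:.3f}")
```

The recomputed values agree with the F and np2 columns reported by pingouin.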

As sex has only two levels (male and female) we don't need to conduct post-hoc tests to assess where the significant difference lies. All we need to do is obtain descriptive statistics for each sex category. We can then compare the two mean scores and see which group had the higher level of total worry about crime. 

In [15]:
# Selecting the total worry scores for each sex category.
male_worry = crime_df2.loc[crime_df2.sex_cat == 'male', 'totalworry']
female_worry = crime_df2.loc[crime_df2.sex_cat == 'female', 'totalworry']

In [16]:
print(f"Male Mean Worry: {male_worry.mean():.2f}")
print(f"Female Mean Worry: {female_worry.mean():.2f}")

Male Mean Worry: 1.73
Female Mean Worry: 2.22


Comparing the mean worry scores by sex, we can see that females (Mean = 2.22) had significantly higher worry scores in relation to crime than males (Mean = 1.73). 
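
When reporting a main effect it is common to give the standard deviation and group size alongside each mean. A minimal sketch of getting all three in one step with groupby is shown below. It uses an invented dataframe (toy is a hypothetical name), so the numbers are not the results for this dataset.

```python
import pandas as pd

# Invented scores for illustration only.
toy = pd.DataFrame({
    "sex_cat": ["male", "male", "male", "female", "female", "female"],
    "totalworry": [1.5, 1.75, 2.0, 2.0, 2.25, 2.5],
})

# Group size, mean and standard deviation of the DV for each sex category.
summary = toy.groupby("sex_cat")["totalworry"].agg(["count", "mean", "std"])

print(summary.round(2))
```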