<a href="https://colab.research.google.com/github/MMRES-PyBootcamp/MMRES-python-bootcamp2022/blob/main/11_misophonia_II.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Session 11 - Misophonia (second part)
When reporting the results of a study, we first describe the variables of interest in tables and figures.

We describe demographics (sex, age, marital status, etc.)

We describe outcome variables (misophonia)

We describe explanatory variables (cephalometric measures, anxiety, depression)

We then test the main hypotheses of the study.

We state the main relationships we want to study and formulate the statistical hypothesis (Introduction)

We describe how the study was performed and the statistical methods to test the hypothesis (Methods)

We describe the results of the hypothesis tests with statistics, and significance measures.

We illustrate the results with figures.





<div class="alert alert-block alert-success"><b>Practice:</b> 
Imagine we want to study the anxiety of participants in the misophonia study. We formulate the following hypothesis:

> Participants who enrolled in the study had an increased level of anxiety from their baseline (trait) that is related to their:
<ul>
  <li> age
  <li> sex
  <li> misophonia state.
</ul>
</div>

**This document is devised as a tool to enable your self-learning process. If you get stuck at some step or need any kind of help, please don't hesitate to raise your hand and ask for the teacher's guidance.**

---

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from scipy import stats



## Data loading

Let's begin again by loading Pandas with the `pd` alias and by importing the misophonia dataset `misophonia_data.xlsx` from the `/MMRES-python-bootcamp2024/datasets` sub-folder:

In [None]:
# Load package with its corresponding alias
import pandas as pd

# Read an Excel spreadsheet and store it as a DataFrame called `df`
# df = pd.read_excel('https://github.com/MMRES-PyBootcamp/MMRES-python-bootcamp2022/blob/main/datasets/misophoinia_data.xlsx?raw=true')
df = pd.read_excel('https://github.com/MMRES-PyBootcamp/MMRES-python-bootcamp2024/raw/master/datasets/misophonia_data.xlsx')

# Return the DataFrame
df

## Data description

Here is the description of the variables

[1] “Misofonia”: Binary (si: misophonic, no: not misophonic)

[2] “Misofonia.dic”: Categorical (0: not misophonic, 1: severity 1, 2: severity 2, 3: severity 3, 4: severity 4)

[3] “Estado”: Marital status (casado: married, soltero: single, viuda: widow, divorciado: divorced)

[4] “Estado.dic”: Numeric Marital status

[5] “ansiedad.rasgo”: Score from 0-100 with anxiety personality trait

[6] “ansiedad.rasgo.dic”: Binary score (0,1) of anxiety personality trait

[7] “ansiedad.estado”: Score from 0-100 with current state of anxiety

[8] “ansiedad.estado.dic”: Binary score (0,1) with current state of anxiety

[9] “ansiedad.medicada”: Diagnosed with anxiety disorder (si, no)

[10] “ansiedad.medicada.dic”: Diagnosed with anxiety disorder (1, 0)

[11] “depresion”: Score from 0-50 with current state of depression

[12] “depresion.dic” : Binary score (0,1) with current state of depression

[13] “Sexo”: Male=H, Female:M

[14] “Edad”: Age

[15] “CLASE”: Type of jaw

[16] “Angulo_convexidad”: convexity angle

[17] “protusion.mandibular”: Projection of the jaw

[18] “Angulo_cuelloYtercio”: angle between jaw and neck

[19] “Subnasal_H”: Nasal angle

[20] “cambio.autoconcepto”: Whether people changed their self-concept after treatment.

[21] “Misofonia.post”: Misophonia diagnosed (A-MISO) after an educational program, where patients were made aware of a condition called misophonia.

[22] “Misofonia.pre”: Misophonia diagnosed (A-MISO) before an educational program, where patients were made aware of a condition called misophonia.

[23] “ansiedad.dif”: Difference between anxiety state and anxiety trait scores

<br><br>


## 1. Correlation

Are the state and trait correlated?

In [None]:
# Recent seaborn versions require x and y to be passed as keyword arguments
sns.scatterplot(x=df['ansiedad.estado'], y=df['ansiedad.rasgo'])

In [None]:
# Same scatter plot with a fitted regression line and its confidence band
sns.regplot(x=df['ansiedad.estado'], y=df['ansiedad.rasgo'])

In [None]:
df['ansiedad.estado'].corr(df['ansiedad.rasgo'],method='pearson')

In [None]:
# what happens here?
stats.linregress(df['ansiedad.estado'],df['ansiedad.rasgo'])

In [None]:
# Let's remove NA values
mask = ~np.isnan(df['ansiedad.estado']) & ~np.isnan(df['ansiedad.rasgo'])
stats.linregress(df[mask]['ansiedad.estado'],df[mask]['ansiedad.rasgo'])

In [None]:
slope, intercept, r_value, pv, se = stats.linregress(df[mask]['ansiedad.estado'],df[mask]['ansiedad.rasgo'])
sns.regplot(x="ansiedad.estado", y="ansiedad.rasgo", data=df[mask], 
      ci=None, label="y={0:.2f}x+{1:.2f}".format(slope, intercept)).legend(loc="best")
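The three ways of measuring the linear association used above should agree once missing values are handled consistently. The following is a minimal sketch on synthetic data: the `toy` DataFrame and its values are invented for illustration and are not the misophonia data; only the column names are borrowed from it.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for the two anxiety scores (NOT the real misophonia data)
rng = np.random.default_rng(0)
trait = rng.normal(50, 10, size=100)
state = 0.6 * trait + rng.normal(20, 8, size=100)
toy = pd.DataFrame({'ansiedad.estado': state, 'ansiedad.rasgo': trait})
toy.loc[[3, 17], 'ansiedad.estado'] = np.nan  # inject two missing values

# pandas silently ignores incomplete pairs
r_pandas = toy['ansiedad.estado'].corr(toy['ansiedad.rasgo'], method='pearson')

# scipy needs complete cases, so we drop rows with a missing value first
complete = toy.dropna(subset=['ansiedad.estado', 'ansiedad.rasgo'])
r_scipy, p_value = stats.pearsonr(complete['ansiedad.estado'], complete['ansiedad.rasgo'])
fit = stats.linregress(complete['ansiedad.estado'], complete['ansiedad.rasgo'])

print(f"pandas r = {r_pandas:.3f}, pearsonr r = {r_scipy:.3f}, linregress r = {fit.rvalue:.3f}")
print(f"p-value of the correlation = {p_value:.3g}")
```

Unlike `Series.corr`, `stats.pearsonr` also returns a p-value for the null hypothesis of no correlation.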

## Student’s t-test
We are interested in the variable `ansiedad.dif`, that is, the observed excess of anxiety over the trait (excess = state − trait). We recompute it below as `excess`.

Is excess in anxiety higher than 0?

In [None]:
df['excess']=df['ansiedad.estado']-df['ansiedad.rasgo']
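# `distplot` is deprecated in recent seaborn versions; `histplot` with a KDE overlay replaces it
sns.histplot(df['excess'], kde=True)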
plt.axvline(0,color='r')

In [None]:
df['excess'].describe()

`scipy.stats.ttest_1samp()` tests whether the population mean of the data is likely to be equal to a given value (technically, whether the observations are drawn from a Gaussian distribution with the given population mean). It returns the t statistic and the p-value:

In [None]:
res=stats.ttest_1samp(df['excess'], 0) 
print (res[0])

# What's wrong?

In [None]:
# let's remove missing values
res=stats.ttest_1samp(df['excess'].dropna(), 0) 
print (res)


In [None]:
# or let's just ignore missing values
res=stats.ttest_1samp(df['excess'], 0,nan_policy='omit') 
print (res)
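To see what the function computes, the t statistic can be reproduced by hand as the sample mean divided by its standard error. This is a sketch on synthetic values: the array `excess` below is invented for illustration and only mimics `df['excess']`, including a couple of missing values.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for df['excess'] (NOT the real data), with two missing values
rng = np.random.default_rng(1)
excess = rng.normal(loc=1.0, scale=10.0, size=60)
excess[[5, 20]] = np.nan

x = excess[~np.isnan(excess)]  # keep complete observations only
n = x.size

# t = (sample mean - hypothesised mean) / (sample sd / sqrt(n))
t_manual = (x.mean() - 0) / (x.std(ddof=1) / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)  # two-sided p-value

res = stats.ttest_1samp(x, 0)
print(t_manual, res.statistic)
print(p_manual, res.pvalue)

# The question "is the excess higher than 0?" is one-sided;
# the `alternative` argument requests that test directly
res_greater = stats.ttest_1samp(x, 0, alternative='greater')
print(res_greater.pvalue)
```

The one-sided test only counts evidence in the stated direction, so its p-value differs from the two-sided one reported by default.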

We do not see a significant excess of anxiety: the mean difference is not significantly different from 0. Enrollment in the study does not seem to select for individuals with an excess of anxiety.

Is excess in anxiety higher than 0 for men and women separately?

We first describe the conditional distributions.



In [None]:
sns.boxplot(data=df, y='excess',x='Sexo')


We perform the hypothesis test for each sex separately


In [None]:
# Males
res=stats.ttest_1samp(df[df['Sexo']=='H']['excess'], 0,nan_policy='omit') 
print (res)


In [None]:
# Females
res=stats.ttest_1samp(df[df['Sexo']=='M']['excess'], 0,nan_policy='omit') 
print (res)

## 2-sample t-test: testing for difference across populations

We see that women (M) show a reduction in anxiety relative to their trait (almost significant), while men (H) show an increase (not significant). Why? Perhaps because women tend to consult doctors earlier than men do.

Is the excess of anxiety significantly different between the sexes?

To test whether this difference is significant, we run a 2-sample t-test with `scipy.stats.ttest_ind()`:

In [None]:
female_excess = df[df['Sexo']=='M']['excess']
male_excess = df[df['Sexo']=='H']['excess']
# `excess` contains missing values, so we ask scipy to ignore them
stats.ttest_ind(female_excess, male_excess, nan_policy='omit')

We see that the difference between the group means is at the limit of significance, with women having less excess anxiety than men.
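By default `ttest_ind` assumes that both groups share the same variance. When that assumption is doubtful, Welch's version of the test (`equal_var=False`) is a common alternative. The sketch below compares both on synthetic groups: `female_toy` and `male_toy` are invented values, not the study data.

```python
import numpy as np
from scipy import stats

# Synthetic groups with different sizes and spreads (NOT the real data)
rng = np.random.default_rng(2)
female_toy = rng.normal(-2.0, 8.0, size=80)
male_toy = rng.normal(1.0, 12.0, size=40)
male_toy[:3] = np.nan                       # a few missing values
male_clean = male_toy[~np.isnan(male_toy)]  # drop them before testing

pooled = stats.ttest_ind(female_toy, male_clean)                  # equal variances assumed
welch = stats.ttest_ind(female_toy, male_clean, equal_var=False)  # Welch's t-test

# Welch's statistic by hand: difference in means over the unpooled standard error
se = np.sqrt(female_toy.var(ddof=1) / female_toy.size
             + male_clean.var(ddof=1) / male_clean.size)
t_welch_manual = (female_toy.mean() - male_clean.mean()) / se

print("pooled:", pooled.statistic, pooled.pvalue)
print("Welch: ", welch.statistic, welch.pvalue)
```

If the two versions lead to different conclusions, the group variances are worth inspecting before interpreting the result.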


In [None]:
# `distplot(hist=False)` is deprecated; `kdeplot` draws the same density curves
sns.kdeplot(female_excess, color='purple')
sns.kdeplot(male_excess, color='blue')


## Linear model
Given two sets of observations, sex and excess of anxiety, we want to test the hypothesis that excess of anxiety is a linear function of sex.



In [None]:
from statsmodels.formula.api import ols
model = ols("excess ~ Sexo", df).fit()
print(model.summary())
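A linear model with a single two-level factor is equivalent to the pooled 2-sample t-test. The sketch below illustrates this with `scipy.stats.linregress` on a 0/1 dummy variable, using synthetic data (the groups and the dummy `is_male` are invented for illustration). In the formula above, the coding is handled automatically, with the first level in alphabetical order (here `H`) presumably taken as the reference.

```python
import numpy as np
from scipy import stats

# Synthetic excess values for two groups (NOT the real data)
rng = np.random.default_rng(3)
female_toy = rng.normal(-2.0, 10.0, size=70)
male_toy = rng.normal(1.0, 10.0, size=50)

# Stack the outcome and code sex as a dummy variable: 0 = female, 1 = male
y = np.concatenate([female_toy, male_toy])
is_male = np.concatenate([np.zeros(female_toy.size), np.ones(male_toy.size)])

fit = stats.linregress(is_male, y)
ttest = stats.ttest_ind(male_toy, female_toy)  # pooled-variance t-test

# The intercept is the mean of the reference group (dummy = 0),
# and the slope is the difference between the two group means
print("intercept:", fit.intercept, "female mean:", female_toy.mean())
print("slope:", fit.slope, "mean difference:", male_toy.mean() - female_toy.mean())
print("p-values:", fit.pvalue, ttest.pvalue)
```

The advantage of the linear model formulation is that further covariates, such as age, can be added to the same formula.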


Is excess in anxiety higher in older people?


In [None]:
sns.regplot(data=df, x='Edad', y='excess')

In [None]:
# We fit the regression model
model = ols("excess ~ Edad", df).fit()
print(model.summary())

The association, while positive, is not significant. What happens if we adjust by sex?



In [None]:
model = ols("excess ~ Edad + Sexo", df).fit()
print(model.summary())

If we adjust by sex, the association is a bit stronger but still not significant.

In [None]:
sns.lmplot(data=df, x='Edad', y='excess', hue='Sexo')

Is excess in anxiety different between misophonia grades?


In [None]:
sns.boxplot(data=df, y='excess',x= 'Misofonia.dic')

We test the hypothesis H0: the mean excess of anxiety is equal across misophonia categories, against H1: at least one of the means is different.

In [None]:
model = ols("excess ~ Misofonia.dic", df).fit()
print(model.summary())

In [None]:
# What happened here?
# OLS formulas are picky about variable names: names are evaluated as Python
# expressions, so the dot in `Misofonia.dic` is not read as part of the name.

# let's rename this variable:
df=df.rename(columns={'Misofonia.dic':'misophonic'})

# or alternatively you can just create an additional column with a different name
#df['misophonic']=df['Misofonia.dic']

model = ols("excess ~ misophonic", df).fit()
print(model.summary())

Notice that the `misophonic` variable was treated as a quantitative variable and not as a categorical factor. We convert it to a category and fit the model again:

In [None]:
df['misophonic']=df['misophonic'].astype('category')
model = ols("excess ~ misophonic", df).fit()
print(model.summary())
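As an alternative to renaming the column and changing its type, the formula language offers quoting and categorical helpers. The sketch below assumes that statsmodels is using patsy as its formula engine, where `Q("...")` quotes an awkward column name and `C(...)` marks a term as categorical. The `toy` DataFrame is synthetic and only borrows the column name.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic data with the same awkward column name (NOT the real data)
rng = np.random.default_rng(4)
toy = pd.DataFrame({'Misofonia.dic': rng.integers(0, 5, size=100)})
toy['excess'] = 2.0 * toy['Misofonia.dic'] + rng.normal(0, 5, size=100)

# Q("...") lets the formula refer to a column whose name is not a valid Python name
model_q = ols('excess ~ Q("Misofonia.dic")', data=toy).fit()
print(model_q.params)  # intercept plus one slope: grade treated as a number

# C(...) treats the grades as categories without modifying the DataFrame
model_c = ols('excess ~ C(Q("Misofonia.dic"))', data=toy).fit()
print(model_c.params)  # intercept plus one coefficient per non-reference grade
```

Renaming remains the more readable option when the column is used in many formulas.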

In [None]:
from statsmodels.stats import anova
anova.anova_lm(model)

We see that the anxiety excess for misophonia grade 1 is significantly higher than for grade 0 (no misophonia), as is that for grade 3. The ANOVA table shows that we reject the null hypothesis in favor of the alternative: the variation between groups is significantly larger than the variation within groups.
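The F statistic in the ANOVA table is the ratio of the between-group mean square to the within-group mean square. Here is a minimal sketch that computes it by hand and checks it against `scipy.stats.f_oneway`. The four groups are synthetic, with invented means and sizes, and do not reproduce the study results.

```python
import numpy as np
from scipy import stats

# Synthetic excess values for four hypothetical grades (NOT the real data)
rng = np.random.default_rng(5)
groups = [rng.normal(mu, 8.0, size=n)
          for mu, n in [(0.0, 40), (4.0, 25), (1.0, 20), (5.0, 15)]]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k, n_total = len(groups), all_values.size

# Sums of squares: between groups and within groups
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = between-group mean square / within-group mean square
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f_manual, f_scipy)
print(p_manual, p_scipy)
```

A large F means that the group means differ by more than the spread within groups would explain.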

Are the differences in excess anxiety between misophonia grades modulated by sex?


In [None]:
sns.boxplot(data=df, y='excess',x='misophonic',hue='Sexo')

In [None]:
model = ols("excess ~ misophonic * Sexo", df).fit()
print(model.summary())

In [None]:
anova.anova_lm(model)

We do not see a significant interaction between sex and misophonia grade; that is, sex does not significantly modulate the group differences.

We cannot say that the profiles of anxiety excess across misophonia grades are different between sexes.