<a href="https://colab.research.google.com/github/MMRES-PyBootcamp/MMRES-python-bootcamp2022/blob/main/07_misophonia_I.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Session 7 - Misophonia (First part)
> Misophonia is a recently described neurological condition in which patients feel strong anxiety when hearing particular noises (someone blowing their nose, a mobile phone ringing, trains passing, etc.). It is believed that 5% of the population suffers from this condition without knowing it, likely blaming their anxiety on other causes.

The misophonia dataset is from a recent (unpublished) study that aimed to describe the relationships between misophonia and anxiety, depression, and cephalometric measures (shape of the jaw).

<div class="alert alert-block alert-success"><b>Practice:</b> Practice cells announce exercises that you should try during the current boot camp session.
</div>

<div class="alert alert-block alert-warning"><b>Extension:</b> Extension cells correspond to exercises (or links to contents) that are a bit more advanced. We recommend trying them after the current boot camp session.
</div>

<div class="alert alert-block alert-info"><b>Tip:</b> Tip cells just give some advice or complementary information.
</div>

<div class="alert alert-block alert-danger"><b>Caveat:</b> Caveat cells warn you about the most common pitfalls people run into when they start learning Python.

</div>

**This document is devised as a tool to enable your self-learning process. If you get stuck at some step or need any kind of help, please don't hesitate to raise your hand and ask for the teacher's guidance.**

---

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

## Data loading

Let's begin again by loading Pandas with the `pd` alias and by importing the misophonia dataset `misophonia_data.xlsx` from the `/MMRES-python-bootcamp2022/datasets` sub-folder:

In [None]:
# Load package with its corresponding alias
import pandas as pd

# Read an Excel spreadsheet and store it as a DataFrame called `df`
# df = pd.read_excel('https://github.com/MMRES-PyBootcamp/MMRES-python-bootcamp2022/blob/main/datasets/misophoinia_data.xlsx?raw=true')
df = pd.read_excel('https://github.com/MMRES-PyBootcamp/MMRES-python-bootcamp2024/raw/master/datasets/misophonia_data.xlsx')

# Return the DataFrame
df

<div class="alert alert-block alert-warning"><b>Extension:</b>

We can load data from different file formats. Export the current dataset as a `.csv` or `.tsv` file and load it again using the appropriate function. You will need to look the functions up in the documentation:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html

https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html
</div>

<div class="alert alert-block alert-info"><b>Tip:</b> 
Uncomment the following cell to see an example.
</div> 

In [None]:

#df = pd.read_csv('https://raw.githubusercontent.com/MMRES-PyBootcamp/MMRES-python-bootcamp2022/main/datasets/misophoinia_data.mod.csv', sep=',', na_values="NA")
#df
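The export and reload round trip described in the Extension above can be sketched as follows. This is a minimal illustration on a made-up toy DataFrame (the values and the file names `toy.csv` and `toy.tsv` are assumptions, not part of the dataset); it uses `DataFrame.to_csv` and `pd.read_csv`, with `sep="\t"` for the tab-separated variant.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the misophonia DataFrame (made-up values, for illustration only)
toy = pd.DataFrame({
    "Sexo": ["H", "M", "M"],
    "Edad": [34, 51, 28],
    "Estado": ["casado", "soltero", None],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "toy.csv")  # hypothetical file names
    tsv_path = os.path.join(tmp, "toy.tsv")

    # index=False avoids writing the row index as an extra column;
    # na_rep controls how missing values are written
    toy.to_csv(csv_path, index=False, na_rep="NA")
    toy.to_csv(tsv_path, index=False, sep="\t", na_rep="NA")

    # sep must match the delimiter used when writing
    from_csv = pd.read_csv(csv_path, na_values="NA")
    from_tsv = pd.read_csv(tsv_path, sep="\t", na_values="NA")

print(from_csv.equals(from_tsv))
```

The same calls should work on `df` itself; only the delimiter differs between the two formats.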


### Data description

Here is the description of the variables:

[1] “Misofonia”: Binary (si: misophonic, no: not misophonic)

[2] “Misofonia.dic”: Categorical (0: not misophonic, 1: severity 1, 2: severity 2, 3: severity 3, 4: severity 4)

[3] “Estado”: Marital status (casado: married, soltero: single, viuda: widow, divorciado: divorced)

[4] “Estado.dic”: Numeric Marital status

[5] “ansiedad.rasgo”: Score from 0-100 with anxiety personality trait

[6] “ansiedad.rasgo.dic”: Binary score (0,1) of anxiety personality trait

[7] “ansiedad.estado”: Score from 0-100 with current state of anxiety

[8] “ansiedad.estado.dic”: Binary score (0,1) with current state of anxiety

[9] “ansiedad.medicada”: Diagnosed with anxiety disorder (si, no)

[10] “ansiedad.medicada.dic”: Diagnosed with anxiety disorder (1, 0)

[11] “depresion”: Score from 0-50 with current state of depression

[12] “depresion.dic” : Binary score (0,1) with current state of depression

[13] “Sexo”: Sex (H: male, M: female)

[14] “Edad”: Age

[15] “CLASE”: Type of jaw

[16] “Angulo_convexidad”: convexity angle

[17] “protusion.mandibular”: Projection of the jaw

[18] “Angulo_cuelloYtercio”: angle between jaw and neck

[19] “Subnasal_H”: Nasal angle

[20] “cambio.autoconcepto”: Whether people changed their self-concept after treatment.

[21] “Misofonia.post”: Misophonia diagnosed (A-MISO) after an educational program, where patients were made aware of a condition called misophonia.

[22] “Misofonia.pre”: Misophonia diagnosed (A-MISO) before an educational program, where patients were made aware of a condition called misophonia.

[23] “ansiedad.dif”: Difference between anxiety state and anxiety trait scores

<br><br>

When reporting the results of a study, we first describe the variables of interest in tables and figures.

We describe demographics (sex, age, marital status, etc.).

We describe outcome variables (misophonia)

### 1. Descriptive statistics of explanatory variables


We describe explanatory variables such as cephalometric measures, anxiety, depression, etc.


<div class="alert alert-block alert-success"><b>Practice:</b>

Imagine we want to study the anxiety of participants in the misophonia study. Once the data is loaded, describe the participants’ sex, age and marital status.
</div>


In [None]:
# Count the number of occurrences of each value in a categorical variable
from collections import Counter
print(Counter(df['Sexo']))


In [None]:

# this is the same using pandas
df.groupby('Sexo').size()


In [None]:
# this way we can get the percentages
df['Sexo'].value_counts(normalize=True) * 100

In [None]:
# plot the percentages using matplotlib
sex_pct = df['Sexo'].value_counts(normalize=True) * 100

fig, ax = plt.subplots()
# take the labels from the index so each bar matches its category
ax.bar(sex_pct.index, sex_pct.values)
ax.set_ylabel('Percentage')

In [None]:
# plot the counts using seaborn
fig, ax = plt.subplots()
sns.countplot(data=df,x='Sexo')

In [None]:
#mean with pandas
df['Edad'].mean()

In [None]:
#mean with numpy
np.mean(df['Edad'])

In [None]:
#standard deviation with numpy (population SD: ddof=0 by default)
np.std(df['Edad'])

In [None]:
#standard deviation with pandas (sample SD: ddof=1 by default)
df['Edad'].std()
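The two standard deviation cells above may return slightly different numbers. A minimal sketch on made-up ages shows why: `np.std` divides by n by default (`ddof=0`), whereas `Series.std` divides by n - 1 (`ddof=1`). Passing `ddof` explicitly makes the two agree.

```python
import numpy as np
import pandas as pd

ages = pd.Series([20, 30, 40, 50])  # made-up values, not from the dataset

population_sd = np.std(ages.to_numpy())            # ddof=0: divide by n
sample_sd = ages.std()                             # ddof=1: divide by n - 1
sample_sd_numpy = np.std(ages.to_numpy(), ddof=1)  # numpy with the pandas convention

print(population_sd, sample_sd, sample_sd_numpy)
```

For descriptive tables of a study sample, the n - 1 version is the one usually reported, and it is also the value shown by `describe()`.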

In [None]:
df['Edad'].describe()


<div class="alert alert-block alert-success"><b>Practice:</b>

Try to obtain the descriptive statistics shown above with the individual functions of numpy and/or pandas.
 
</div>
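As a starting point for this exercise, here is one possible sketch on a made-up Series (the values are assumptions for illustration). It rebuilds each row of `describe()` with an individual pandas method; the same can be tried with the numpy equivalents.

```python
import pandas as pd

ages = pd.Series([20, 30, 40, 50, 60], name="Edad")  # made-up values

# One pandas method per row of the describe() output
manual = pd.Series({
    "count": ages.count(),
    "mean": ages.mean(),
    "std": ages.std(),
    "min": ages.min(),
    "25%": ages.quantile(0.25),
    "50%": ages.median(),
    "75%": ages.quantile(0.75),
    "max": ages.max(),
})

print(manual)
print(ages.describe())
```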

In [None]:
# histogram with a density curve (distplot is deprecated; histplot replaces it)
sns.histplot(df['Edad'], kde=True)

In [None]:
sns.boxplot(y=df['Edad'])

In [None]:
fig, ax = plt.subplots()
fig.set_size_inches(3, 5)
sns.boxplot(data=df, y='Edad',ax=ax)

In [None]:
# Age by sex
maleage = df[df['Sexo']=="H"]['Edad']
femaleage = df[df['Sexo']=="M"]['Edad']
print('male age\n',maleage.describe())
print('female age\n',femaleage.describe())
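The per-sex summary above filters the DataFrame once per category. An alternative sketch, shown here on a made-up toy DataFrame, uses `groupby` so that all categories are summarised in a single table without naming them explicitly.

```python
import pandas as pd

# Made-up values, for illustration only
toy = pd.DataFrame({
    "Sexo": ["H", "H", "M", "M", "M"],
    "Edad": [30, 40, 25, 35, 45],
})

# One row per category of Sexo, one column per summary statistic
age_by_sex = toy.groupby("Sexo")["Edad"].describe()
print(age_by_sex)
```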

In [None]:
sns.boxplot(data=df,y='Edad', x='Sexo')

In [None]:
# Marital status:
maritaldf=df['Estado'].value_counts(normalize=True) * 100
maritaldf

In [None]:
plt.pie(maritaldf,labels=maritaldf.index)

### 2. Descriptive statistics of clinical outcome

We have four measures of anxiety:
<ul>
  <li> ansiedad.rasgo (are you an anxious person?) continuous:0-100 </li>
  <li> ansiedad.estado (are you currently feeling anxious?) continuous:0-100 </li>
  <li> ansiedad.medicada (have you been diagnosed with an anxiety disorder?) binary (si, no) </li>
  <li> ansiedad.dif (difference between ansiedad.estado and ansiedad.rasgo) </li>
</ul>
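To make the last measure concrete, the sketch below derives a difference column from the two continuous scores on made-up data. The sign convention (state minus trait) is an assumption; the dataset description does not state which way the subtraction goes, so compare against the existing `ansiedad.dif` column before relying on it.

```python
import pandas as pd

# Made-up scores, for illustration only
toy = pd.DataFrame({
    "ansiedad.rasgo": [40, 75, 10],
    "ansiedad.estado": [55, 60, 10],
})

# Assumed convention: current state minus personality trait
toy["ansiedad.dif"] = toy["ansiedad.estado"] - toy["ansiedad.rasgo"]
print(toy)
```

Under this convention, a positive value means the participant currently feels more anxious than their usual trait level.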

#### Anxiety trait 
are you an anxious person?
continuous:0-100

In [None]:
df['ansiedad.rasgo'].describe()

In [None]:
sns.histplot(df['ansiedad.rasgo'], kde=True)

In [None]:
fig, ax = plt.subplots()
fig.set_size_inches(3, 5)
sns.boxplot(data=df,y='ansiedad.rasgo',ax=ax)

#### Anxiety state
are you currently feeling anxious? 
*continuous*:0-100

In [None]:
df['ansiedad.estado'].describe()

In [None]:
sns.histplot(df['ansiedad.estado'], kde=True)

In [None]:
fig, ax = plt.subplots()
fig.set_size_inches(3, 5)
sns.boxplot(data=df,y='ansiedad.estado',ax=ax)

#### Diagnosed
have you been diagnosed with an anxiety disorder? binary (si, no)


In [None]:
# plot the counts using seaborn
fig, ax = plt.subplots()
sns.countplot(data=df,x='ansiedad.medicada')

#### Relationships between outcomes

In [None]:
fig, ax = plt.subplots()
ax.scatter(df['ansiedad.rasgo'],df['ansiedad.estado'])
ax.set_xlabel('ansiedad.rasgo')
ax.set_ylabel('ansiedad.estado')


In [None]:
sns.regplot(data=df,x='ansiedad.rasgo',y='ansiedad.estado')

In [None]:
sns.relplot(data=df,x='ansiedad.rasgo',y='ansiedad.estado',hue='ansiedad.medicada')
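The scatter and regression plots show the relationship visually; a correlation coefficient summarises it in a single number. A minimal sketch on made-up scores, using `Series.corr` (Pearson correlation by default):

```python
import pandas as pd

# Made-up scores, for illustration only
toy = pd.DataFrame({
    "ansiedad.rasgo": [10, 20, 30, 40, 50],
    "ansiedad.estado": [15, 18, 35, 38, 55],
})

# Pearson correlation between the two columns
r = toy["ansiedad.rasgo"].corr(toy["ansiedad.estado"])

# The full correlation matrix gives the same value off the diagonal
corr_matrix = toy.corr()

print(round(r, 3))
print(corr_matrix)
```

On the real data, the same call on `df` would quantify how strongly trait and state anxiety move together.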

#### Relationships between explanatory and outcome variables

In [None]:
# Trait by sex
fig, ax = plt.subplots()
fig.set_size_inches(3, 5)
sns.boxplot(data=df, x='Sexo', y='ansiedad.rasgo', ax=ax)

In [None]:
# State by sex
fig, ax = plt.subplots()
fig.set_size_inches(3, 5)
sns.boxplot(data=df, x='Sexo', y='ansiedad.estado', ax=ax)

In [None]:
# State by sex, split by diagnosis
fig, ax = plt.subplots()
fig.set_size_inches(3, 5)
sns.boxplot(data=df, x='Sexo', y='ansiedad.estado', hue='ansiedad.medicada', ax=ax)

In [None]:
# plot the percentages using seaborn

x, y, hue = "ansiedad.medicada", "proportion", "Sexo"
hue_order = ["M", "H"]

(df[x]
 .groupby(df[hue])
 .value_counts(normalize=True)
 .rename(y)
 .reset_index()
 .pipe((sns.barplot, "data"), x=x, y=y, hue=hue, hue_order=hue_order))

In [None]:
#conditional frequencies by sex
pd.crosstab(df['Sexo'],df['ansiedad.medicada'],normalize='index')
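The `normalize` argument decides which totals the frequencies are divided by. The sketch below, on a made-up toy DataFrame, contrasts `normalize='index'` (each row sums to 1, so it answers "within each sex, what share is diagnosed?") with `normalize='columns'` (each column sums to 1).

```python
import pandas as pd

# Made-up values, for illustration only
toy = pd.DataFrame({
    "Sexo": ["H", "H", "H", "H", "M", "M"],
    "ansiedad.medicada": ["si", "no", "no", "no", "si", "no"],
})

# Conditional on sex: every row sums to 1
by_row = pd.crosstab(toy["Sexo"], toy["ansiedad.medicada"], normalize="index")

# Conditional on diagnosis: every column sums to 1
by_col = pd.crosstab(toy["Sexo"], toy["ansiedad.medicada"], normalize="columns")

print(by_row)
print(by_col)
```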


In [None]:
#Trait Vs age
sns.scatterplot(data=df, x='Edad', y='ansiedad.rasgo')

In [None]:
#State Vs age
sns.scatterplot(data=df, x='Edad', y='ansiedad.estado')

In [None]:
#age by diagnosis
sns.boxplot(data=df,y='Edad',x='ansiedad.medicada')