# <font color= #900c3f >MUIT - TSA 2018</font>
## Some simple Acoustic Analysis of Speech from Sleep Apnea Patients

### some references:

- #### [Formant Frequencies and Bandwidths in Relation to Clinical Variables in an Obstructive Sleep Apnea Population](https://www.sciencedirect.com/science/article/pii/S0892199715000077)
- #### [Obstructive Sleep Apnea in Women: Study of Speech and Craniofacial Characteristics](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5696580/)



===================================================================
## Extracting data: $Formants$ from sustained vowel /i/
<br>

- ### UPLOAD: OSA_Excel.zip file from [TSA GitHub](https://github.com/MUIT-TSA/Python)

In [None]:
! unzip -o /resources/data/audio/OSA/OSA_Excel.zip -d /resources/data/audio/OSA/

Archive:  /resources/data/audio/OSA/OSA_Excel.zip

### <font color=  #dc7633  >Now we will use Pandas DataFrames to read and analyze data</font>

In [None]:
import pandas as pd 

In [None]:
! pip install xlrd

### ... read only one file...

In [None]:
file = '/resources/data/audio/OSA/OSA_18.xls'
# recent pandas versions name this argument sheet_name (sheetname was removed)
df1 = pd.read_excel(file, sheet_name='Sheet1')

In [None]:
df1

### ... then read all files and concatenate DataFrames

In [None]:
## Get a list with all xls files in /resources/data/audio/OSA/

import os

exPath='/resources/data/audio/OSA/'
fileList=os.listdir(exPath)


In [None]:
len(fileList)

### ... read all the files in the list
### and concatenate all dataframes into a df_OSA dataframe

In [None]:
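frames = []  # one DataFrame per Excel file

for exFile in fileList:
    if exFile.endswith('.xls'):
        df = pd.read_excel(os.path.join(exPath, exFile), sheet_name='Sheet1')
        frames.append(df)

# concatenate once at the end; each file keeps its own row index, as before
df_OSA = pd.concat(frames, axis=0)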

In [None]:
df_OSA.info()

In [None]:
df_OSA['F2'][0:10]

In [None]:
df_OSA.head(5)

In [None]:
df_OSA.iloc[1:3]

### ... unique

In [None]:
Gender_categories=pd.unique(df_OSA['Gender'])
print('Gender categories : ',Gender_categories)

### ...simple stats...

In [None]:
df_OSA['F1'].mean()

In [None]:
df_OSA.describe()

## $Groupby$
### ... analyze data by category or group

In [None]:
df_OSA.groupby('Gender').describe()
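
`describe()` on a grouped DataFrame returns every statistic for every column, which gets wide quickly. A minimal sketch of a narrower summary follows. It uses a small made-up table in place of `df_OSA`: the values and the 0/1 coding of `Gender` are invented for illustration, and only the column names `Gender`, `F1`, and `F2` come from the data above.

```python
import pandas as pd

# Toy stand-in for df_OSA: invented values, real column names
toy = pd.DataFrame({
    'Gender': [0, 0, 0, 1, 1, 1],
    'F1': [280.0, 300.0, 290.0, 340.0, 360.0, 350.0],
    'F2': [2100.0, 2200.0, 2150.0, 2500.0, 2600.0, 2550.0],
})

# Mean and standard deviation of F1 and F2 for each Gender category
summary = toy.groupby('Gender')[['F1', 'F2']].agg(['mean', 'std'])
print(summary)
```

The same call should work on the real data by replacing `toy` with `df_OSA`.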

In [None]:
F1=df_OSA['F1']
F2=df_OSA['F2']

In [None]:
type(F1)

In [None]:
# convert F1 and F2 to arrays...

import numpy as np

F1=np.array(F1)
F2=np.array(F2)

In [None]:
type(F1)

# MATPLOTLIB

### [Matplotlib](https://matplotlib.org/#) is a Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.

<img src="https://matplotlib.org/_static/logo2.png">

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
# Scatter plot
plt.figure(figsize=(9,6))
plt.scatter(F1,F2)
plt.xlabel('F1',fontsize=18)
plt.ylabel('F2',fontsize=18)
plt.title('First and Second Formants in OSA Patients',fontsize=18)
plt.grid(color='k', linestyle='--')

In [None]:
import matplotlib.pylab as pylab

pylab.rcParams['figure.figsize']=14,12

df_OSA['logAHI']=np.log(df_OSA['AHI']+1)

# scatter_matrix lives in pd.plotting (pd.tools.plotting was removed from pandas)
figure=pd.plotting.scatter_matrix(df_OSA)

Corr_matrix=df_OSA.corr()

In [26]:

Corr_matrix=df_OSA.corr()
Corr_matrix

| | Gender | F1 | F2 | F3 | age | height | weight | cervper | AHI | logAHI |
|---|---|---|---|---|---|---|---|---|---|---|
| Gender | 1.0 | 0.674369 | 0.808553 | 0.76699 | 0.180988 | -0.745747 | -0.381738 | -0.673181 | -0.195652 | -0.231785 |
| F1 | 0.674369 | 1.0 | 0.551874 | 0.543594 | 0.158547 | -0.550602 | -0.316091 | -0.498241 | -0.066469 | -0.095096 |
| F2 | 0.808553 | 0.551874 | 1.0 | 0.831283 | 0.096365 | -0.705537 | -0.391327 | -0.606713 | -0.275134 | -0.266446 |
| F3 | 0.76699 | 0.543594 | 0.831283 | 1.0 | 0.095002 | -0.628157 | -0.323657 | -0.57797 | -0.219677 | -0.253628 |
| age | 0.180988 | 0.158547 | 0.096365 | 0.095002 | 1.0 | -0.383982 | -0.1353 | 0.084769 | 0.117151 | 0.206456 |
| height | -0.745747 | -0.550602 | -0.705537 | -0.628157 | -0.383982 | 1.0 | 0.399327 | 0.489564 | 0.067964 | 0.031462 |
| weight | -0.381738 | -0.316091 | -0.391327 | -0.323657 | -0.1353 | 0.399327 | 1.0 | 0.689657 | 0.364809 | 0.294792 |
| cervper | -0.673181 | -0.498241 | -0.606713 | -0.57797 | 0.084769 | 0.489564 | 0.689657 | 1.0 | 0.414359 | 0.378732 |
| AHI | -0.195652 | -0.066469 | -0.275134 | -0.219677 | 0.117151 | 0.067964 | 0.364809 | 0.414359 | 1.0 | 0.86844 |
| logAHI | -0.231785 | -0.095096 | -0.266446 | -0.253628 | 0.206456 | 0.031462 | 0.294792 | 0.378732 | 0.86844 | 1.0 |


In [None]:
import seaborn as sns

sns.heatmap(Corr_matrix)
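
A heatmap gives the overall picture, but a ranked list is often easier to read when one variable is of main interest. The sketch below ranks the other variables by the strength of their correlation with `AHI`. To stay self-contained it copies the values from the `AHI` column of the correlation matrix printed above instead of reading `Corr_matrix`; `logAHI` is left out because it is computed from `AHI` itself.

```python
import pandas as pd

# Correlations with AHI, copied from the AHI column of the matrix above
corr_with_AHI = pd.Series({
    'Gender': -0.195652, 'F1': -0.066469, 'F2': -0.275134, 'F3': -0.219677,
    'age': 0.117151, 'height': 0.067964, 'weight': 0.364809, 'cervper': 0.414359,
})

# Order by absolute value so strong negative correlations are not pushed to the bottom
order = corr_with_AHI.abs().sort_values(ascending=False).index
ranked = corr_with_AHI.reindex(order)
print(ranked)
```

In the notebook itself, the same Series can presumably be obtained with `Corr_matrix['AHI'].drop(['AHI', 'logAHI'])`.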

## Fitting a linear regression model

### We will use [Scikit-Learn](http://scikit-learn.org/), a machine learning library for Python

<img src="http://scikit-learn.org/stable/_static/scikit-learn-logo-small.png">

### Choosing predictors and target

In [None]:
columns=

target=
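
The two assignments above are left open. As a hedged illustration only, the sketch below shows the shape such a choice could take. The names `example_columns` and `example_target`, the particular predictors, and all values in the toy table are assumptions made for this example; only the column names are taken from the correlation matrix.

```python
import pandas as pd

# Toy stand-in for df_OSA (invented values; column names as in the correlation matrix)
toy = pd.DataFrame({
    'F1': [290.0, 310.0, 350.0, 340.0],
    'F2': [2150.0, 2200.0, 2550.0, 2500.0],
    'cervper': [44.0, 42.0, 36.0, 37.0],
    'weight': [95.0, 88.0, 70.0, 72.0],
    'AHI': [35.0, 20.0, 4.0, 8.0],
})

# Hypothetical choice: any subset of the numeric columns could serve as predictors
example_columns = ['F1', 'F2', 'cervper', 'weight']
example_target = 'AHI'

X = toy[example_columns]   # 2-D: one row per patient, one column per predictor
y = toy[example_target]    # 1-D: the value to predict

print(X.shape, y.shape)
```

Selecting with a list of names returns a DataFrame, while selecting with a single name returns a Series, which matches the 2-D predictors and 1-D target that scikit-learn estimators expect.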

## Splitting train and test sets

## Fitting the linear regression model

## Predicting...
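
The three steps named above (splitting, fitting, predicting) can be sketched end to end with scikit-learn. This is a minimal sketch under stated assumptions, not the analysis itself. The data is synthetic, and the target is deliberately built as a known linear function of two predictors (coefficients 2.0 and 0.5, chosen arbitrarily) so that the fitted model can be checked against it. These numbers carry no clinical meaning.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for df_OSA: the target is a known linear function
# of the predictors plus noise, so the fit can be sanity-checked
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    'cervper': rng.normal(40.0, 3.0, n),
    'weight': rng.normal(85.0, 12.0, n),
})
toy['AHI'] = 2.0 * toy['cervper'] + 0.5 * toy['weight'] + rng.normal(0.0, 1.0, n)

X = toy[['cervper', 'weight']]
y = toy['AHI']

# Hold out 25% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit ordinary least squares on the training rows only
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the held-out rows and report R^2 on the test set
y_pred = model.predict(X_test)
print('coefficients:', model.coef_)
print('test R^2:', model.score(X_test, y_test))
```

On the real data, `X` and `y` would come from `df_OSA` using the chosen `columns` and `target`. The test-set score is the number to look at, since a score computed on the training rows tends to be optimistic.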