Collaborative coding using GitHub
===========

Alexandre Perera Luna, Mónica Rojas Martínez

December 15th 2023


# Goal

The objective of this assignment is to build a project through collaborative coding that showcases an Exploratory Data Analysis (EDA) and a classification task. To help you learn GitHub, we will reuse code snippets from previous exercises, so that you can focus on the process rather than on the final outcome. The current notebook serves as the main entry point of the project; each participant is required to develop additional components and integrate their contributions into the main branch.


## Requirements

To use functions defined in other Jupyter notebooks, install the `nbimporter` package from a shell with the following command:

`pip install nbimporter`

`nbimporter` lets you import Jupyter notebooks as modules. Once it is installed and imported, a statement like the following imports a function called *fibbonaci*, stored in a notebook named *fibbo_func* in the same directory as the present notebook, under the alias *fibbo*:

`from fibbo_func import fibbonaci as fibbo`
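
The contents of *fibbo_func* are not shown in this notebook. As a minimal sketch, assuming it defines a single function that returns the n-th Fibonacci number (the indexing convention F(0) = 0, F(1) = 1 is an assumption), a code cell in that notebook might look like this:

```python
# Hypothetical content of a code cell in fibbo_func.ipynb.
# The name `fibbonaci` matches the import statement; the body is an assumption.
def fibbonaci(n):
    """Return the n-th Fibonacci number, with F(0) = 0 and F(1) = 1."""
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    previous, current = 0, 1
    for _ in range(n):
        # Advance one step along the sequence.
        previous, current = current, previous + current
    return previous
```

With `nbimporter` imported in the main notebook, the import statement would then expose this function under the alias `fibbo`.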



In [1]:
## Modify this cell to import all the modules you need to solve the assignment. Note that we import
## the nbimporter library, which you will need to call functions created in other notebooks.
import nbimporter
import pandas as pd

ModuleNotFoundError: No module named 'nbimporter'

In [2]:
# Here is an example of invoking the Fibonacci function, which should be located in the same directory as the main notebook:
from fibbo_func import fibbonaci as fibbo
fibbo(24)

ModuleNotFoundError: No module named 'fibbo_func'

## Exercises
As an illustration of a Git workflow, you will analyze the *Parkinson's* dataset, which was examined in past assignments. Each team member has specific responsibilities that may be crucial for the progress of others, so make sure you all organize your tasks accordingly. We have structured the analysis into modules to help you track your tasks, but feel free to deviate from this structure if you prefer.
Please use Markdown cells to describe your workflow and explain your findings.
Remember that you need both to modify this notebook and to create additional functions outside it. Your work becomes available to others only once you merge your changes.


In [3]:
# We will start by loading the Parkinson's dataset. The rest is up to you!
df = pd.read_csv('parkinsons.data', 
                 dtype = { # indicate categorical variables
                     'status': 'category'})
df.head(5)

NameError: name 'pd' is not defined

### 1. Cleaning and tidying the dataset

In [None]:
dict_names = {'MDVP:Fo(Hz)':'avFF',
              'MDVP:Fhi(Hz)':'maxFF', 
              'MDVP:Flo(Hz)':'minFF',
              'MDVP:Jitter(%)': 'percJitter',
              'MDVP:Jitter(Abs)':'absJitter' ,
              'MDVP:RAP': 'rap',
              'MDVP:PPQ': 'ppq',
              'Jitter:DDP': 'ddp',
              'MDVP:Shimmer' : 'lShimer',
              'MDVP:Shimmer(dB)': 'dbShimer',
              'Shimmer:APQ3':'apq3',
              'Shimmer:APQ5': 'apq5',
              'MDVP:APQ':'apq',
              'Shimmer:DDA':'dda'}
# Rename variables
from renamevars import renamevars
df = renamevars(df, dict_names)
df.head(5)
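
The `renamevars` function lives in a separate notebook that is not shown here. As a hedged sketch, assuming it is a thin wrapper around `DataFrame.rename`, it might look like the following; the body and the toy data are assumptions for illustration, not the team's actual implementation.

```python
import pandas as pd

def renamevars(df, dict_names):
    """Return a copy of df with columns renamed according to dict_names."""
    # Hypothetical implementation: keys that are not columns of df are ignored.
    return df.rename(columns=dict_names)

# Toy data for illustration only (not values from parkinsons.data).
toy = pd.DataFrame({'MDVP:Fo(Hz)': [120.0, 150.0], 'status': ['1', '0']})
renamed = renamevars(toy, {'MDVP:Fo(Hz)': 'avFF', 'MDVP:RAP': 'rap'})
print(list(renamed.columns))
```

Returning a new dataframe instead of renaming in place keeps the call `df = renamevars(df, dict_names)` safe to rerun from the top of the notebook.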

### 2. Basic EDA based on plots and descriptive statistics

In [None]:
# your code here
from scat_plt_function import scat_plt

# Fundamental frequency variables
fundamental_frequency_vars = df[['avFF', 'maxFF', 'minFF', 'status']]

# Generating the scatter plot using the scat_plt function
scat_plt(fundamental_frequency_vars['avFF'], fundamental_frequency_vars['maxFF'], fundamental_frequency_vars['status'])
scat_plt(fundamental_frequency_vars['avFF'], fundamental_frequency_vars['minFF'], fundamental_frequency_vars['status'])
scat_plt(fundamental_frequency_vars['minFF'], fundamental_frequency_vars['maxFF'], fundamental_frequency_vars['status'])

# Keep minFF as the representative fundamental frequency variable; drop the other frequency, jitter and shimmer columns
cleaned_df_fundamental = df.drop(columns = ['avFF', 'maxFF','percJitter','absJitter','rap','ppq','ddp','lShimer','dbShimer','apq3', 'apq5', 'apq', 'dda'])

#########################################################################################################################################################################

# Jitter scatter plots
from itertools import combinations

jitter_vars = df[['absJitter', 'rap', 'ppq', 'ddp', 'status']]

# Generate one scatter plot per pair of jitter variables using the scat_plt function
for x_var, y_var in combinations(['absJitter', 'rap', 'ppq', 'ddp'], 2):
    scat_plt(jitter_vars[x_var], jitter_vars[y_var], jitter_vars['status'])

# Keep rap, ppq and ddp as the representative jitter variables; drop the frequency, shimmer and remaining jitter columns
cleaned_df_jitter = df.drop(columns = ['maxFF','minFF','avFF','percJitter','absJitter','lShimer','dbShimer','apq3','apq5','apq','dda'])

#######################################################################################################################################################

# Shimmer scatter plots
shimer_vars = df[['lShimer', 'dbShimer', 'apq3', 'apq5', 'apq', 'dda', 'status']]

# Generate one scatter plot per pair of shimmer variables using the scat_plt function
for x_var, y_var in combinations(['lShimer', 'dbShimer', 'apq3', 'apq5', 'apq', 'dda'], 2):
    scat_plt(shimer_vars[x_var], shimer_vars[y_var], shimer_vars['status'])

# Keep lShimer, dbShimer, apq3 and dda as the representative shimmer variables; drop the frequency, jitter and remaining shimmer columns
cleaned_df_shimer = df.drop(columns=["minFF", "avFF", "maxFF", "percJitter", "absJitter", "rap", "ppq", "ddp", "apq", "apq5"])

########################################################################################################################################################

# Combine the cleaned dataframes
cleaned_df = pd.merge(cleaned_df_fundamental, cleaned_df_jitter, left_index=True, right_index=True).merge(cleaned_df_shimer, left_index=True, right_index=True)

# Show the first rows of the combined dataframe. Columns present in all three inputs (e.g. 'status')
# appear several times, distinguished by merge suffixes such as '_x' and '_y'
cleaned_df.head(5)
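
Merging on the index keeps every copy of a column that the inputs share, so a column such as `status` ends up as `status_x`, `status_y` and `status`. The sketch below reproduces this on a toy dataframe (illustrative values, not taken from `parkinsons.data`) and shows one possible alternative, selecting the representative columns directly. The names `part_a`, `part_b`, `part_c` and `selected` are hypothetical.

```python
import pandas as pd

# Toy frame for illustration only (not values from parkinsons.data).
toy = pd.DataFrame({
    'minFF': [75.0, 110.0],
    'rap': [0.003, 0.002],
    'dda': [0.06, 0.03],
    'status': ['1', '0'],
})

# Three views, each keeping one representative variable plus the shared 'status'.
part_a = toy.drop(columns=['rap', 'dda'])
part_b = toy.drop(columns=['minFF', 'dda'])
part_c = toy.drop(columns=['minFF', 'rap'])

# Index-based merges keep every copy of the shared column, adding suffixes on conflict.
merged = pd.merge(part_a, part_b, left_index=True, right_index=True).merge(
    part_c, left_index=True, right_index=True)
print(list(merged.columns))

# Selecting the representative columns directly keeps a single 'status' column.
selected = toy[['minFF', 'rap', 'dda', 'status']]
print(list(selected.columns))
```

Whether the duplicated columns matter depends on what the later steps expect; direct selection is simply one way to avoid them.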

### 3. Aggregating and transforming variables in the dataset

In [None]:
# your code here
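# Grouping variable
gv = 'rap'
# Drop the first column of the combined dataframe
df = cleaned_df.iloc[:, 1:]

from group_and_average import group_and_average

# Group the rows by gv and average the remaining variables
result = group_and_average(df, gv)
print(result)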
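
The `group_and_average` function is defined outside this notebook and is not shown. A minimal sketch, assuming it groups the rows by one variable and averages the numeric columns, could look like this; the body and the toy data are assumptions.

```python
import pandas as pd

def group_and_average(df, group_var):
    """Group df by group_var and return the mean of each numeric column."""
    # Hypothetical implementation; non-numeric columns are skipped.
    return df.groupby(group_var).mean(numeric_only=True)

# Toy data for illustration only (not values from parkinsons.data).
toy = pd.DataFrame({
    'status': ['0', '0', '1', '1'],
    'minFF': [100.0, 120.0, 70.0, 90.0],
    'name': ['a', 'b', 'c', 'd'],
})
result = group_and_average(toy, 'status')
print(result)
```

Grouping by a continuous variable such as `rap` tends to produce roughly one group per distinct value, so a categorical variable such as `status` may give a more informative summary.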


### 4. Differentiating between controls (healthy subjects) and patients

In [None]:
# your code here