
# Machine Learning in Healthcare

**Practice 1**: *EDA + Modeling clinical data*

**Professor**: Fernando Pozo

**Starting date**: September 17, 2021

**Due date**: October 25, 2021

**Degree**: B.Sc. in Data Science and Engineering

**Course**: 2021-2022

---

## Introduction

First of all, if you are not familiar with Google Colab, follow this [overview of Colaboratory features](https://colab.research.google.com/notebooks/basic_features_overview.ipynb) or feel free to search online for it.

## Getting data from [Physionet](https://www.physionet.org/)

PhysioNet is a repository of freely available medical research data, managed by the MIT Laboratory for Computational Physiology.

[Register](https://physionet.org/register/) for a free account.

### Clinical data from the MIMIC-II database for a case study on indwelling arterial catheters

[Reference](https://physionet.org/content/mimic2-iaccd/1.0/).

**Indwelling arterial catheters** (IACs) are used extensively in the ICU for hemodynamic monitoring and for blood gas analysis. IAC use also poses potentially serious risks, including bloodstream infections and vascular complications. In 2015, Hsu et al. published a study assessing whether IAC use was associated with mortality in mechanically ventilated patients who do not require vasopressor support.

The dataset is derived from MIMIC-II, the publicly accessible critical care database, and contains summary clinical data and outcomes for 1,776 patients. Its 46 variables include demographics (e.g. age, weight), clinical observations collected during the first day of the ICU stay (e.g. white blood cell count, heart rate) and outcomes (e.g. 28-day mortality and length of stay).

The data are shared as a comma-separated value (CSV) file, `full_cohort_data.csv`, whose header contains descriptive variable names, together with a data dictionary (`data_dictionary.txt`) holding the metadata for each variable. `day_28_flg` was the main outcome of interest, while `aline_flg` was the primary covariate of interest.
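Before any modeling, it can help to look at the two key variables together. The following is a minimal sketch, assuming `df` has been loaded as in the *Loading the database* cell and that both flags are coded 0/1; `mortality_by_iac` is a hypothetical helper, not part of the original material. This crude comparison ignores confounding: patients who receive an IAC may differ systematically from those who do not.

```python
import pandas as pd


def mortality_by_iac(df: pd.DataFrame,
                     exposure: str = "aline_flg",
                     outcome: str = "day_28_flg") -> pd.DataFrame:
    """Crude (unadjusted) 28-day mortality per IAC group.

    Returns one row per exposure level with the number of patients,
    the number of deaths and the observed mortality rate.
    """
    grouped = df.groupby(exposure)[outcome]
    return pd.DataFrame({
        "patients": grouped.size(),       # rows per group
        "deaths": grouped.sum(),          # outcome flag == 1
        "mortality_rate": grouped.mean(), # deaths / patients
    })


# Usage on the loaded cohort:
# print(df.shape)  # the description states 1,776 patients and 46 variables
# print(mortality_by_iac(df))
```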

In [None]:
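#@title Downloading the database
# Mirror the dataset folder from PhysioNet:
# -r recursive, -N only fetch files newer than local copies,
# -c resume partial downloads, -np never ascend to the parent directory.
!wget -r -N -c -np https://physionet.org/files/mimic2-iaccd/1.0/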

#### Setup

Below, import the libraries you will use throughout this hands-on session.

In [None]:
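#@title Loading the database
import pandas as pd

# wget -r stores the files under the current directory (/content in Colab),
# mirroring the URL path.
df = pd.read_csv('/content/physionet.org/files/mimic2-iaccd/1.0/full_cohort_data.csv')
df.head(15)  # show the first 15 patients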

## Exercise 1 (20%):
Please analyse and explore the data set and report the particularities you notice in this type of data. Then discuss some of your ideas with your classmates in the [class document](https://docs.google.com/document/d/1zwZ3AAOqsv1N2wpmsyzD8rfl6om970l4JNzVxJO8Rlc/edit). Once you have commented in the document, collect these ideas in your final report and justify them.

In [None]:
#@title 1.1 What are the particularities of clinical data? Justify your answer.


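One particularity worth checking is missing data: laboratory values, for instance, may be absent simply because a test was not ordered, so gaps are not necessarily random. A minimal pandas sketch (the helper name `missingness_report` is hypothetical) that ranks columns by their fraction of missing values:

```python
import pandas as pd


def missingness_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and fraction of missing values per column, worst first."""
    n_missing = df.isna().sum()
    report = pd.DataFrame({
        "n_missing": n_missing,
        "frac_missing": n_missing / len(df),
    })
    # Keep only columns with at least one missing value, sorted descending.
    report = report[report["n_missing"] > 0]
    return report.sort_values("frac_missing", ascending=False)


# Usage on the cohort loaded above:
# missingness_report(df)
```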
## Exercise 2 (80%): 
This data set is used throughout [Chapter 16 (Data Analysis)](https://link.springer.com/chapter/10.1007/978-3-319-43742-2_16) by Raffa J. et al. to investigate the effect of indwelling arterial catheters on mortality in hemodynamically stable patients with respiratory failure. This [repository](https://github.com/MIT-LCP/aline-mimic-ii) also contains details of the reported data analysis. Students are expected to:
- Read the materials provided by the authors. They explain how the data set was built, some of the particularities of clinical data (discussed above) and the data analysis methods used.
- Create an exploratory data analysis (EDA) of this data set (40%).
- Develop machine learning models different from the ones shown in the report (40%).
- Try to gain extra insights through the proposed activities 2.3, 2.4 and 2.5.
- Reuse material from previous machine learning courses and consult any useful resources online to achieve the best results in this assignment. Of course, ask the Professor if you don't know how to do something.
- Share your **own work** with the Professor by email ([fpozoca@gmail.com](mailto:fpozoca@gmail.com)) before **25 October 2021 (23:59 CEST)**. It can be shared directly through a link to this notebook or by setting up your own GitHub repository containing it.

In [None]:
#@title 2.1 Exploratory data analysis (compulsory)
!pip install dataprep
from dataprep.eda import plot

In [None]:
plot(df)

In [None]:
plot(df, "age")

In [None]:
from dataprep.eda import create_report
report = create_report(df, title='My first report')
report.save(filename='report_01', to='/content')
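The automated report describes each variable on its own; for this study it can also help to compare baseline characteristics of patients with and without an IAC, in the spirit of a "Table 1". A minimal sketch, assuming `df` is loaded as above and the grouping flag is numeric; `baseline_by_group` is a hypothetical helper:

```python
import pandas as pd


def baseline_by_group(df: pd.DataFrame, group: str = "aline_flg") -> pd.DataFrame:
    """Median of every numeric column, split by the grouping flag.

    Medians are used rather than means because many clinical
    measurements are skewed. Non-numeric columns are ignored.
    """
    numeric = df.select_dtypes(include="number")
    # Grouping by column name excludes the flag itself from the result.
    medians = numeric.groupby(group).median()
    # One row per variable, one column per group level.
    return medians.T


# Usage: baseline_by_group(df)
```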

In [None]:
#@title 2.2 Data modelling (compulsory)


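As one possible starting point for 2.2, the sketch below estimates the cross-validated ROC AUC of a random forest predicting `day_28_flg`, with median imputation inside a scikit-learn pipeline. This is an illustrative choice, not the chapter's method; check that your own models differ from those reported there. `rf_cv_auc` is a hypothetical helper and its hyperparameters are arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def rf_cv_auc(df: pd.DataFrame, feature_cols, outcome: str = "day_28_flg",
              n_splits: int = 5, seed: int = 0) -> np.ndarray:
    """Cross-validated ROC AUC of an imputed random forest.

    The imputer sits inside the pipeline, so it is re-fitted on each
    training fold and never sees the held-out fold.
    """
    X = df[list(feature_cols)]
    y = df[outcome]
    model = make_pipeline(
        SimpleImputer(strategy="median"),
        RandomForestClassifier(n_estimators=300, random_state=seed),
    )
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
```

`feature_cols` should list numeric baseline variables chosen from `data_dictionary.txt` (string columns need encoding first). Leave out the other outcome variables, such as the remaining mortality and length-of-stay fields, since they would leak the outcome into the model.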
In [None]:
#@title 2.3 Model evaluation (optional)


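A single AUC from one split says little about its uncertainty, especially with 1,776 patients. A percentile bootstrap over held-out predictions is one simple way to attach a confidence interval. The helper below is a hypothetical sketch built on scikit-learn's `roc_auc_score`; `y_score` is assumed to hold predicted probabilities for a held-out set.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap CI for ROC AUC."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample patients with replacement
        # A resample containing a single class has no defined AUC; skip it.
        if np.unique(y_true[idx]).size < 2:
            continue
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lower, upper = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(y_true, y_score), lower, upper
```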
In [None]:
#@title 2.4 Model interpretability (optional)


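Permutation importance is one model-agnostic way to see which variables a fitted model relies on: each feature is shuffled in turn on held-out data and the resulting drop in AUC is recorded. A minimal sketch using scikit-learn's `permutation_importance`; `rank_features` is a hypothetical helper, and `fitted_model` can be any fitted scikit-learn estimator or pipeline.

```python
import pandas as pd
from sklearn.inspection import permutation_importance


def rank_features(fitted_model, X_test: pd.DataFrame, y_test,
                  n_repeats: int = 20, seed: int = 0) -> pd.DataFrame:
    """Permutation importance on held-out data, largest AUC drop first."""
    result = permutation_importance(
        fitted_model, X_test, y_test,
        scoring="roc_auc", n_repeats=n_repeats, random_state=seed,
    )
    table = pd.DataFrame({
        "importance_mean": result.importances_mean,
        "importance_std": result.importances_std,
    }, index=X_test.columns)
    return table.sort_values("importance_mean", ascending=False)
```

Correlated clinical variables can share importance, so a low score does not necessarily mean a variable is uninformative.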
In [None]:
#@title 2.5 Extra content (optional)


## Extra notes
If you set up a GitHub repository (recommended), please name it following the structure `MLH16803_P1_XX`, where `XX` are your initials.

For example, my repository would be called `MLH16803_P1_FP`. If you don't know how to create a GitHub repository, please check this [tutorial](https://docs.github.com/en/get-started/quickstart/create-a-repo). If you are not familiar with version control systems such as Git, we will hold a dedicated hands-on session on this topic, but you can start with this [tutorial](https://swcarpentry.github.io/git-novice/).

## Useful references for notebook code formatting

* [Github Markdown basics](https://help.github.com/articles/markdown-basics/)
* [Github flavored Markdown](https://help.github.com/articles/github-flavored-markdown/)
* [Original Markdown spec: Syntax](http://daringfireball.net/projects/markdown/syntax)
* [Original Markdown spec: Basics](http://daringfireball.net/projects/markdown/basics)
* [marked.js library used by Colab](https://github.com/chjj/marked)
* [LaTeX mathematics for equations](https://en.wikibooks.org/wiki/LaTeX/Mathematics)

# Conclusion

Feel free to ask me anything by email ([fpozoca@gmail.com](mailto:fpozoca@gmail.com)), in class or through [my website](https://www.fpozoc.com/).