In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

## Google 🧠 Ventilator Lazy Prediction & EDA 🔮

###### An extract from the Wikipedia article on mechanical ventilation: https://en.wikipedia.org/wiki/Mechanical_ventilation

![Mechanical Ventilation](https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Respiratory_therapist.jpg/440px-Respiratory_therapist.jpg)
![A Ventilator](https://upload.wikimedia.org/wikipedia/commons/thumb/9/94/VIP_Bird2.jpg/440px-VIP_Bird2.jpg)
![Display](https://upload.wikimedia.org/wikipedia/commons/thumb/5/59/PIA23775-NASA-VITAL-Ventilator-20200430.jpg/490px-PIA23775-NASA-VITAL-Ventilator-20200430.jpg)

### History of Ventilators
The history of mechanical ventilation begins with various versions of what was eventually called the iron lung, a form of noninvasive negative-pressure ventilator widely used during the polio epidemics of the twentieth century after the introduction of the "Drinker respirator" in 1928, improvements introduced by John Haven Emerson in 1931, and the Both respirator in 1937. Other forms of noninvasive ventilators, also used widely for polio patients, include Biphasic Cuirass Ventilation, the rocking bed, and rather primitive positive pressure machines.

### Mechanical ventilation
Mechanical ventilation is indicated when the patient's spontaneous breathing is inadequate to maintain life. It is also indicated as prophylaxis for imminent collapse of other physiologic functions, or for ineffective gas exchange in the lungs. Because mechanical ventilation only assists breathing and does not cure a disease, the patient's underlying condition should be identified and treated so that it resolves over time. In addition, other factors must be taken into consideration because mechanical ventilation is not without its complications. One of the main reasons a patient is admitted to an ICU is for delivery of mechanical ventilation. Monitoring a patient on mechanical ventilation has many clinical applications: enhancing understanding of pathophysiology, aiding diagnosis, guiding patient management, avoiding complications and assessing trends. In general, mechanical ventilation is initiated to protect the airway, reduce the work of breathing and/or correct blood gases.

### Common Uses
Common specific medical indications for use include:
* Acute lung injury, including acute respiratory distress syndrome (ARDS) and trauma
* Apnea with respiratory arrest, including cases from intoxication
* Acute severe asthma requiring intubation
* Acute or chronic respiratory acidosis, most commonly with chronic obstructive pulmonary disease (COPD) and obesity hypoventilation syndrome
* Acute respiratory acidosis with partial pressure of carbon dioxide (pCO₂) > 50 mmHg and pH < 7.25, which may result from paralysis of the diaphragm due to Guillain–Barré syndrome, myasthenia gravis, motor neuron disease, spinal cord injury, or the effect of anaesthetics and muscle relaxants
* Increased work of breathing as evidenced by significant tachypnea, retractions, and other physical signs of respiratory distress
* Hypoxemia with arterial partial pressure of oxygen (PaO₂) < 55 mmHg with supplemental fraction of inspired oxygen (FiO₂) = 1.0
* Hypotension, including in sepsis, shock and congestive heart failure
* Neurological diseases such as muscular dystrophy and amyotrophic lateral sclerosis (ALS)
* Breathing problems in newborn infants

Mechanical ventilation can be used as a short-term measure, for example during an operation or critical illness (often in the setting of an intensive-care unit). It may be used at home or in a nursing or rehabilitation institution if patients have chronic illnesses that require long-term ventilatory assistance.

### Positive pressure

>> In 1950, Carl Gunnar Engström invented one of the first intermittent positive-pressure ventilators, which delivered air directly into the lungs through an endotracheal tube placed in the windpipe.

>> The design of modern positive-pressure ventilators was based mainly on technical developments by the military during World War II to supply oxygen to fighter pilots at high altitude. Such ventilators replaced the iron lung once safe endotracheal tubes with high-volume/low-pressure cuffs were developed.

>> Positive-pressure ventilators work by increasing the patient's airway pressure through an endotracheal or tracheostomy tube. The positive pressure allows air to flow into the airway until the ventilator breath is terminated. The airway pressure then drops back to baseline (zero, or PEEP if one is set), and the elastic recoil of the chest wall and lungs pushes out the tidal volume through passive exhalation.
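
The pressure side of a positive-pressure breath can be sketched with the single-compartment "equation of motion" of the respiratory system, $P_{aw} = \text{PEEP} + R\,\dot{V} + V/C$, where $R$ is airway resistance, $C$ is lung compliance, $\dot{V}$ is flow and $V$ is the volume delivered so far. $R$ and $C$ correspond to the `R` (cmH2O/L/s) and `C` (mL/cmH2O) lung attributes in the competition data. This is a simplified sketch, assuming a passive (sedated) patient, a constant inspiratory flow and a PEEP of 5 cmH2O; `airway_pressure_constant_flow` is a hypothetical helper, not part of any library.

```python
import numpy as np

def airway_pressure_constant_flow(t_s, flow_l_s, r_cmh2o_l_s, c_ml_cmh2o, peep_cmh2o=5.0):
    """Hypothetical helper: airway pressure (cmH2O) during constant-flow inspiration
    of a passive single-compartment lung, P = PEEP + R * flow + V / C."""
    t_s = np.asarray(t_s, dtype=float)
    volume_ml = flow_l_s * 1000.0 * t_s       # volume delivered so far (L/s -> mL/s)
    resistive = r_cmh2o_l_s * flow_l_s        # pressure needed to push flow through the airway
    elastic = volume_ml / c_ml_cmh2o          # pressure needed to stretch the lung
    return peep_cmh2o + resistive + elastic

# 0.5 L/s for 1 s into a lung with R = 20 cmH2O/L/s and C = 50 mL/cmH2O
times = np.linspace(0.0, 1.0, 5)
print(airway_pressure_constant_flow(times, 0.5, 20, 50))  # rises linearly from 15 to 25 cmH2O
```

With constant flow the resistive term is a fixed step at the start of the breath, and the elastic term grows linearly as the lung fills.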

### Negative pressure machines

>> Negative-pressure mechanical ventilators are produced in small, field-type and larger formats. The most prominent design of the smaller devices is the cuirass, a shell-like unit that applies negative pressure only to the chest using a combination of a fitting shell and a soft bladder.

### Intermittent abdominal pressure ventilator
>> Another type is the intermittent abdominal pressure ventilator, which applies pressure externally via an inflated bladder to force exhalation (sometimes termed exsufflation).

### Monitoring
>> In ventilated patients, pulse oximetry is commonly used when titrating FiO₂. A reliable SpO₂ target is greater than 95%.

>> Different strategies exist to set PEEP in patients with ARDS, guided by esophageal pressure, the stress index, or the static airway pressure-volume curve. In such patients, some experts recommend limiting PEEP to low levels (~10 cmH₂O). In patients with diffuse loss of aeration, PEEP can be used provided it does not raise the plateau pressure above the upper inflection point.


### Breath delivery mechanisms
#### Trigger
>> The trigger is what causes a breath to be delivered by a mechanical ventilator. Breaths may be triggered by a patient taking their own breath, a ventilator operator pressing a manual breath button, or by the ventilator based on the set breath rate and mode of ventilation.
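
Two of the trigger sources above can be sketched as a simple decision function. For a ventilator-triggered breath, the set breath rate fixes the interval between mandatory breaths at 60 / rate seconds. For the patient trigger, this sketch assumes a pressure trigger: an effort is detected when airway pressure falls below PEEP by a sensitivity threshold. The function `should_trigger` and the default `trigger_sensitivity_cmh2o=2.0` are hypothetical illustration choices (real ventilators may use pressure or flow triggering with clinician-set thresholds), and the manual breath button is not modelled.

```python
def should_trigger(time_since_last_breath_s, set_rate_bpm, airway_pressure_cmh2o,
                   peep_cmh2o=5.0, trigger_sensitivity_cmh2o=2.0):
    """Hypothetical trigger logic: return 'patient', 'ventilator' or None.

    A patient effort is assumed when airway pressure drops below PEEP by at least
    the sensitivity; otherwise a mandatory breath is due once 60 / rate seconds
    have passed since the previous breath."""
    if airway_pressure_cmh2o <= peep_cmh2o - trigger_sensitivity_cmh2o:
        return "patient"
    if time_since_last_breath_s >= 60.0 / set_rate_bpm:
        return "ventilator"
    return None

print(should_trigger(2.0, 12, 5.0))  # None: 12 breaths/min means one breath every 5 s
print(should_trigger(5.0, 12, 5.0))  # ventilator
print(should_trigger(2.0, 12, 2.5))  # patient: pressure fell 2.5 cmH2O below PEEP
```

Checking the patient trigger first lets spontaneous efforts take priority over the mandatory-breath timer.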

#### Cycle
>> The cycle is what causes the breath to transition from the inspiratory phase to the exhalation phase. Breaths may be cycled by a mechanical ventilator when a set time has been reached, or when a preset flow or percentage of the maximum flow delivered during a breath is reached depending on the breath type and the settings. Breaths can also be cycled when an alarm condition such as a high pressure limit has been reached, which is a primary strategy in pressure regulated volume control.
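
The time- and flow-cycling rules above can be sketched on a sampled inspiratory flow waveform. The sketch assumes flow is sampled every `dt_s` seconds and tracks the running peak so that it could run in real time; flow cycling fires once flow has started to decelerate and falls to `flow_fraction` of that peak. `cycle_index` and the example waveform are hypothetical, and alarm-based cycling (e.g. a high-pressure limit) is not modelled.

```python
def cycle_index(flow_l_s, dt_s, max_inspiratory_time_s=None, flow_fraction=None):
    """Hypothetical helper: return the sample index at which inspiration cycles
    to exhalation, or None if neither criterion is met.

    Time cycling: elapsed time reaches max_inspiratory_time_s.
    Flow cycling: flow has fallen below its running peak and is at or below
    flow_fraction * running peak."""
    running_peak = 0.0
    for i, flow in enumerate(flow_l_s):
        running_peak = max(running_peak, flow)
        if max_inspiratory_time_s is not None and i * dt_s >= max_inspiratory_time_s:
            return i  # time-cycled
        if (flow_fraction is not None and flow < running_peak
                and flow <= flow_fraction * running_peak):
            return i  # flow-cycled
    return None

# Made-up decelerating flow (L/s) sampled every 0.1 s
flow = [0.0, 0.6, 0.9, 0.7, 0.5, 0.3, 0.2, 0.1]
print(cycle_index(flow, 0.1, flow_fraction=0.25))           # 6: 0.2 L/s <= 25% of 0.9 L/s
print(cycle_index(flow, 0.1, max_inspiratory_time_s=0.45))  # 5: 0.5 s >= 0.45 s
```

When both criteria are set, whichever is met first ends the breath.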

#### Limit
>> Limit is how the breath is controlled. Breaths may be limited to a set maximum circuit pressure or a set maximum flow.

#### Breath exhalation
>> Exhalation in mechanical ventilation is almost always completely passive. The ventilator's expiratory valve is opened, and expiratory flow is allowed until the baseline pressure (PEEP) is reached. Expiratory flow is determined by patient factors such as compliance and resistance.
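
Because exhalation is passive, a linear single-compartment lung empties roughly exponentially with time constant τ = R·C, so resistance and compliance together set how fast expiratory flow decays. The sketch below assumes constant R and C and uses the same units as the competition's `R` (cmH2O/L/s) and `C` (mL/cmH2O) columns; `time_constant_s` and `volume_remaining_ml` are hypothetical helpers.

```python
import math

def time_constant_s(r_cmh2o_l_s, c_ml_cmh2o):
    """tau = R * C, converting compliance from mL/cmH2O to L/cmH2O so tau is in seconds."""
    return r_cmh2o_l_s * (c_ml_cmh2o / 1000.0)

def volume_remaining_ml(tidal_volume_ml, t_s, r_cmh2o_l_s, c_ml_cmh2o):
    """Volume still above end-expiratory volume t_s seconds into a passive exhalation."""
    tau = time_constant_s(r_cmh2o_l_s, c_ml_cmh2o)
    return tidal_volume_ml * math.exp(-t_s / tau)

tau = time_constant_s(20, 50)                     # 1.0 s
print(tau)
print(volume_remaining_ml(500, tau, 20, 50))      # about 184 mL left after one tau (e^-1 ~ 0.37)
print(volume_remaining_ml(500, 3 * tau, 20, 50))  # about 25 mL left, i.e. ~95% exhaled after 3 tau
```

A stiff, low-resistance lung (small R·C) empties quickly, while a compliant lung behind narrow airways needs a longer expiratory time.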

<a id="table-of-contents"></a>
<h1 style='background:#B2FF33; border:0;'><center>Table of Contents</center></h1>

## [1. Introduction](#1)
### [1.1 Loading of Libraries](#1.1)
### [1.2 Data Loading](#1.2)
## [2. Data Exploration](#2)
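### [2.1 Number of Rows and Columns](#2.1)
### [2.2 Missing Values Data](#2.2)
### [2.3 Header Rows, Skewness and Kurtosis](#2.3)
### [2.4 Automated EDA using SweetViz and AutoViz](#2.4)
## [3. Features](#3)
## [4. Use of LazyPredict for Regression](#4)
### [4.1 Build and Run the Regression Models](#4.1)
### [4.2 Plot the Outcomes](#4.2)
### [4.3 SHAP and XGBoost Bee Swarm Plot](#4.3)
## [Work In Progress](#999)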



###### [back to top](#table-of-contents)
# [¶](#1)
<h1 style='background:#B2FF33; border:0;'><center>1. Introduction</center></h1>

## This competition is about predicting the airway pressure in the respiratory circuit of a mechanical ventilator during each breath, to help improve ventilator control in pandemic/epidemic disaster situations such as *COVID-19*

***Some nice papers to read*** 

- [Improving Mechanical Ventilator Clinical Decision Support Systems with A Machine Learning Classifier for Determining Ventilator Mode <== ***by*** ==> Gregory B. Rehm, Brooks T. Kuhn, Jimmy Nguyen, Nicholas R. Anderson, Chen-Nee Chuah, Jason Y. Adams](https://arxiv.org/ftp/arxiv/papers/1904/1904.12969.pdf)
- [Machine learning methods to predict mechanical ventilation and mortality in patients with COVID-19 <== ***by*** ==> Limin Yu, Alexandra Halalau, Bhavinkumar Dalal, Amr E. Abbas, Felicia Ivascu, Mitual Amin, Girish B. Nair](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0249285)
- [PubMed entry for the above paper](https://pubmed.ncbi.nlm.nih.gov/33793600/)
- [Artificial Intelligence in the Intensive Care Unit <== ***by*** ==> Guillermo Gutierrez](https://ccforum.biomedcentral.com/track/pdf/10.1186/s13054-020-2785-y.pdf)
- [Development and validation of a reinforcement learning algorithm to dynamically optimize mechanical ventilation in critical care <== ***by*** ==> Arne Peine, Ahmed Hallawa, Johannes Bickenbach, Guido Dartmann, Lejla Begic Fazlic, Anke Schmeink, Gerd Ascheid, Christoph Thiemermann, Andreas Schuppert, Ryan Kindle, Leo Celi, Gernot Marx & Lukas Martin](https://www.nature.com/articles/s41746-021-00388-6)


###### [back to top](#table-of-contents)
# [¶](#1.1)
<h3 style='background:#B2FF33; border:0;'><center>1.1. Loading of Libraries</center></h3>

In [None]:
from IPython.display import clear_output
!pip3 install -U lazypredict
clear_output()

In [None]:
!pip3 install -U pandas==1.2.3  # Pin pandas to 1.2.3 (restart the kernel if another version was already imported)
import numpy as np
import pandas as pd 

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import matplotlib.dates as mdates
import matplotlib.colors as mcolors
from matplotlib import style

import seaborn as sns
sns.set_style('whitegrid')
plt.style.use("fivethirtyeight")
%matplotlib inline

from sklearn.metrics import mean_absolute_error, explained_variance_score, max_error
from sklearn.metrics import mean_squared_error, mean_squared_log_error
from sklearn.metrics import median_absolute_error
from sklearn.metrics import r2_score, mean_poisson_deviance
from sklearn.metrics import mean_gamma_deviance, mean_tweedie_deviance

import lazypredict
from lazypredict import Supervised
from lazypredict.Supervised import LazyRegressor, LazyClassifier
clear_output()

import warnings
warnings.filterwarnings('ignore')


###### [back to top](#table-of-contents)
# [¶](#1.2)
<h3 style='background:#B2FF33; border:0;'><center>1.2. Data Loading</center></h3>

In [None]:
# import required modules
import pandas as pd 
from sklearn.model_selection import train_test_split
import datetime

train = pd.read_csv("../input/ventilator-pressure-prediction/train.csv")
test = pd.read_csv("../input/ventilator-pressure-prediction/test.csv")

###### [back to top](#table-of-contents)
# [¶](#2)
<h1 style='background:#B2FF33; border:0;'><center>2. Data Exploration</center></h1>

###### [back to top](#table-of-contents)
# [¶](#2.1)
<h3 style='background:#B2FF33; border:0;'><center>2.1. Number of rows and columns</center></h3>

In [None]:
print('Rows and Columns in train dataset:', train.shape)
print('Rows and Columns in test dataset:', test.shape)

###### [back to top](#table-of-contents)
# [¶](#2.2)
<h3 style='background:#B2FF33; border:0;'><center>2.2. Missing Values Data</center></h3>

In [None]:
print('Missing values in train dataset:', sum(train.isnull().sum()))
print('Missing values in test dataset:', sum(test.isnull().sum()))

In [None]:
print('Missing values per column in train dataset')
for col in train.columns:
    temp_col = train[col].isnull().sum()
    print(f'{col}: {temp_col}')

In [None]:
print('Missing values per column in test dataset')
for col in test.columns:
    temp_col = test[col].isnull().sum()
    print(f'{col}: {temp_col}')

###### [back to top](#table-of-contents)
# [¶](#2.3)
<h3 style='background:#B2FF33; border:0;'><center>2.3. Header Rows, Skewness and Kurtosis</center></h3>

In [None]:
train.head()

In [None]:
test.head()

In [None]:
print(train.kurt())
print(train.skew())



## The training dataset shows that ***u_in*** and ***pressure*** have ***skewed*** distributions whose ***kurtosis*** differs from that of a normal distribution!
- In statistics, *skewness is a measure of the asymmetry of the probability distribution of a random variable* about its mean. As a rule of thumb, if skewness is less than -1 or greater than 1, the distribution is highly skewed; if it is between -1 and -0.5 or between 0.5 and 1, the distribution is moderately skewed.
- Kurtosis is a *statistical measure of how heavily the tails of a distribution differ from the tails of a normal distribution*. In other words, kurtosis indicates whether the tails of a given distribution contain extreme values.
> Source: Google Search
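
pandas' `Series.skew()` and `Series.kurt()` return bias-corrected sample estimates, and `kurt()` reports *excess* kurtosis (Fisher's definition, 0 for a normal distribution), so kurtosis values printed by `train.kurt()` should be compared against 0 rather than 3. The sketch below re-derives both statistics from the central moments with NumPy and checks them against pandas on a small, made-up right-skewed sample; `sample_skewness` and `sample_excess_kurtosis` are hypothetical helper names.

```python
import numpy as np
import pandas as pd

def sample_skewness(x):
    """Adjusted Fisher-Pearson skewness G1 = sqrt(n(n-1)) / (n-2) * m3 / m2**1.5."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    m2, m3 = np.mean(d**2), np.mean(d**3)   # biased central moments
    return np.sqrt(n * (n - 1)) / (n - 2) * m3 / m2**1.5

def sample_excess_kurtosis(x):
    """Bias-corrected excess kurtosis G2 = (n-1) / ((n-2)(n-3)) * ((n+1) g2 + 6)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    m2, m4 = np.mean(d**2), np.mean(d**4)
    g2 = m4 / m2**2 - 3.0                   # excess kurtosis, biased version
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)

# Mostly small values with one large outlier, i.e. a long right tail
values = [0, 0, 1, 1, 2, 3, 5, 40]
print(sample_skewness(values), pd.Series(values).skew())
print(sample_excess_kurtosis(values), pd.Series(values).kurt())
```

Each pair of printed numbers should agree, and the skewness is well above 1, which the rule of thumb above classes as highly skewed.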

In [None]:
print(test.kurt())
print(test.skew())



## The test dataset shows that ***u_in*** is also ***skewed*** (the test set has no ***pressure*** column, because pressure is the target to predict)!

In [None]:
import seaborn as sns

corr=train.corr()
plt.figure(figsize=(10,10))
sns.heatmap(corr,annot=True,cmap=plt.cm.tab20b)

corr=test.corr()
plt.figure(figsize=(10,10))
sns.heatmap(corr,annot=True,cmap=plt.cm.tab20b)

###### [back to top](#table-of-contents)
# [¶](#2.4)
<h3 style='background:#B2FF33; border:0;'><center>2.4. Automated EDA using SweetViz and AutoViz </center></h3>

In [None]:
try:
    import sweetviz
except ImportError:
    !pip install sweetviz
    import sweetviz

In [None]:


import pandas as pd
df = pd.read_csv('../input/ventilator-pressure-prediction/train.csv')
my_report  = sweetviz.analyze([df,'Train'], target_feat='pressure')
my_report.show_html('FinalReport.html')

In [None]:
my_report.show_notebook()

In [None]:
# Install AutoViz and its helpers only if it is not already available
try:
    from autoviz.AutoViz_Class import AutoViz_Class
except ImportError:
    !pip install statsmodels xlrd autoviz
    from autoviz.AutoViz_Class import AutoViz_Class

In [None]:


AV = AutoViz_Class()
df = AV.AutoViz('../input/ventilator-pressure-prediction/train.csv')

###### [back to top](#table-of-contents)
# [¶](#3)
<h1 style='background:#B2FF33; border:0;'><center>3. Features</center></h1>

## Build a KDE plot for each of the key columns (train in yellow, test in red)


###### Source code credit: the kernel https://www.kaggle.com/dwin183287/tps-september-2021-eda by Grandmaster SHARLTO COPE

In [None]:
plt.rcParams['figure.dpi'] = 600
background_color = "#f6f5f5"
fig = plt.figure(figsize=(10, 5), facecolor=background_color)
gs = fig.add_gridspec(2, 4)
gs.update(wspace=0.3, hspace=0.3)

# Create a 2 x 4 grid of axes, kept in a list instead of generated variable names
axes = []
for row in range(0, 2):
    for col in range(0, 4):
        ax = fig.add_subplot(gs[row, col])
        ax.set_facecolor(background_color)
        for s in ["top", "right"]:
            ax.spines[s].set_visible(False)
        axes.append(ax)

# The first seven columns are shared by train and test (pressure is train-only)
features = list(train.columns[0:7])
print(features)

# Overlay the train (yellow) and test (red) KDE of each feature on the same axis
for ax, col in zip(axes, features):
    sns.kdeplot(ax=ax, x=train[col], zorder=2, alpha=1, linewidth=1, color='#ffd514')
    sns.kdeplot(ax=ax, x=test[col], zorder=2, alpha=1, linewidth=1, color='#ff355d')
    ax.grid(which='major', axis='x', zorder=0, color='#EEEEEE', linewidth=0.4)
    ax.grid(which='major', axis='y', zorder=0, color='#EEEEEE', linewidth=0.4)
    ax.set_ylabel('')
    ax.set_xlabel(col, fontsize=4, fontweight='bold')
    ax.tick_params(labelsize=4, width=0.5)
    ax.xaxis.offsetText.set_fontsize(4)
    ax.yaxis.offsetText.set_fontsize(4)

plt.show()

## Plot the Histogram distributions

In [None]:
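import random

def random_color():
    """Return a random hex color string such as '#3FA2C7'."""
    rand = lambda: random.randint(1, 255)
    return '#%02X%02X%02X' % (rand(), rand(), rand())

def plot_feature_distributions(figrows, figcols, colstart, colend, collist, df_to_plot):
    """Draw one histogram per column in collist[colstart:colend] on a figrows x figcols grid."""
    # One figure with a 2-D array of axes (squeeze=False keeps it 2-D even for a 1 x 1 grid)
    fig, axes = plt.subplots(figrows, figcols, figsize=(4, 3), squeeze=False)
    for ax, item in zip(axes.flat, collist[colstart:colend]):
        ax.hist(x=df_to_plot[item], color=random_color(), alpha=0.75)
        ax.set_title(item)
        ax.grid(True)
    # Adjust the layout once, after all subplots are drawn
    fig.subplots_adjust(top=1.5, bottom=0.2, left=0.10, right=0.95, hspace=0.3, wspace=0.35)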

In [None]:
plt.rcParams['figure.dpi'] = 600

# Histograms of the first five feature columns of the training data on a 2 x 3 grid
plot_feature_distributions(2, 3, 0, 5, features, train)

###### [back to top](#table-of-contents)
# [¶](#4)
<h1 style='background:#B2FF33; border:0;'><center>4. Use Of LazyPredict Library for Regression</center></h1> 

* Create the training and validation datasets

In [None]:
X = train.drop(['id','breath_id','pressure'], axis=1)
y = train['pressure']
# Split into training and validation sets (a random, row-level split)
X_train, X_valid, y_train, y_valid = train_test_split(X, y,test_size=.2,random_state =1)
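
A random row-level split places different time steps of the same breath in both sets, so validation scores can look better than they would on unseen breaths. One option is scikit-learn's `GroupShuffleSplit` with `breath_id` as the group, which keeps every breath entirely on one side. The sketch below runs it on a small, made-up frame (`toy`) so it is self-contained; on the real data the call would be `splitter.split(X, y, groups=train['breath_id'])`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Made-up frame: 10 breaths x 4 time steps, mirroring the breath_id column
toy = pd.DataFrame({
    "breath_id": np.repeat(np.arange(10), 4),
    "u_in": np.arange(40, dtype=float),
    "pressure": np.linspace(5.0, 25.0, 40),
})

# Hold out 20% of the *breaths*, not 20% of the rows
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=1)
train_idx, valid_idx = next(splitter.split(toy, groups=toy["breath_id"]))
train_part, valid_part = toy.iloc[train_idx], toy.iloc[valid_idx]

# No breath appears on both sides of the split
print(set(train_part["breath_id"]) & set(valid_part["breath_id"]))  # set()
```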

# [¶](#4.1)
<h2 style='background:#B2FF33; border:0;'><center>4.1 Build and run the regression models</center></h2> 

In [None]:
skip_list = [8,9,10,11,12,15, 16,17,18,26,27,28,29,30,31,33, 37]
#skip_list = [0,1,2,3,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41]

regs_name =[]
regs = []

for i in range(len(lazypredict.Supervised.REGRESSORS)):
    if i in skip_list:
        print('Skipping', i, " ->", lazypredict.Supervised.REGRESSORS[i][0])
    else:
        regs_name.append(lazypredict.Supervised.REGRESSORS[i][0])
        regs.append(lazypredict.Supervised.REGRESSORS[i][1])
#        print(i, " ->", lazypredict.Supervised.REGRESSORS[i][0])

#print(regs_name)

- Run the model

In [None]:

# Hold out the last 10% of rows as the evaluation set. Rows of each breath are stored
# contiguously in train.csv, so this tail split keeps (almost) whole breaths out of training.
offset = int(X.shape[0] * 0.9)
X_train, y_train = X[:offset], y[:offset]
X_test, y_test = X[offset:], y[offset:]

# Fit each selected regressor separately with LazyRegressor (a regression use case)
# and collect its scores; custom_metric adds an extra "r2_score" column.
num_models = len(regs)
model_scores = []
for i in range(num_models):
    print(i, regs_name[i])
    reg = LazyRegressor(verbose=0,
                        ignore_warnings=True,
                        predictions=True,
                        custom_metric=r2_score,
                        regressors=[regs[i]])
    # Evaluate on the held-out tail, which does not overlap X_train
    models, predictions = reg.fit(X_train, X_test, y_train, y_test)
    models.index = [regs_name[i]]
    model_scores.append(models)

results = pd.concat(model_scores)
clear_output()
print(results)

- Sort the results by r2_score and shade the table with a color gradient

In [None]:
results = results.sort_values(by = "r2_score")
results.style.background_gradient(cmap ='viridis')
    

In [None]:
# Use the index (each model's name) so the labels stay aligned after sorting
results['Model'] = results.index


# [¶](#4.2)
<h2 style='background:#B2FF33; border:0;'><center>4.2 Plot the outcomes</center></h2> 

- *Plot the key regression metrics:* ***Adjusted R-Squared***, ***R-Squared*** and ***r2_score*** (RMSE and time taken are plotted separately)

In [None]:
x_len = 10 + (2 * (num_models/10))
y_len = 2.25 *  num_models

# Font
plt.rcParams['font.family'] = 'Cursive'

# Visualization
ax=results.plot(x="Model", y=["Adjusted R-Squared","R-Squared", "r2_score"], kind="barh",figsize=(x_len,y_len))
ax.patch.set_facecolor('#e4f2f7')

# Remove ticks
ax.xaxis.set_ticks_position('none')
ax.yaxis.set_ticks_position('none')

# Annotate values
#for p in ax.patches:
#    ax.annotate(str(p.get_width()), (p.get_y() * 1.005, p.get_width() * 1.005))
    
# Remove axes splines
for i in ['top', 'bottom', 'left', 'right']:
    ax.spines[i].set_visible(False)

# Remove grid
ax.grid(False)


# Y axis position
ax.spines['left'].set_position(('data', -0.5))


# Labels titles
ax.set_ylabel('Regression Algorithm', fontsize=8)
ax.set_xlabel('Value', fontsize=8, labelpad=20)


# Title
ax.set_title('Performance of various LazyPredict Regression Algorithms', fontsize=15)

plt.show()
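
As a reference for reading the chart above, the sketch below uses the standard textbook definitions: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1) for n rows and p features, and RMSE = √MSE. It assumes LazyPredict's "Adjusted R-Squared" and "RMSE" columns follow these definitions; `adjusted_r2` is a hypothetical helper and the four data points are made up. Here X has five feature columns (R, C, time_step, u_in, u_out) and the evaluation set has hundreds of thousands of rows, so the adjustment should be negligible. Because `r2_score` is the same metric as R-Squared, the three bars are likely to be nearly identical for each model.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def adjusted_r2(y_true, y_pred, n_features):
    """Hypothetical helper: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(y_true)
    r2 = r2_score(y_true, y_pred)
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

r2 = r2_score(y_true, y_pred)
adj = adjusted_r2(y_true, y_pred, n_features=1)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # square root of the mean squared error
print(round(r2, 4), round(adj, 4), round(rmse, 4))   # 0.9486 0.9229 0.6124
```

With only four rows the penalty for one feature is visible; it shrinks toward zero as n grows.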


- *Plot the RMSE value*

In [None]:
# Draw on the axes created here (passing figsize to .plot() would open a second figure)
fig, ax = plt.subplots(1, 1, figsize=(10, 5))
results["RMSE"].plot(kind='line',
                     ax=ax,
                     linewidth=4,
                     marker='h',
                     markerfacecolor='skyblue',
                     markeredgewidth=2,
                     markersize=12,
                     markevery=1)
ax.set_title('Performance of various LazyPredict Regression Algorithms', fontsize=15)
ax.set_xlabel('Regression Algorithm', fontsize=12)
ax.set_ylabel('RMSE Value', fontsize=12)
ax.patch.set_facecolor('#f4e2f7')
# Remove ticks
ax.xaxis.set_ticks_position('none')
ax.yaxis.set_ticks_position('none')

# Remove axes splines
for i in ['top', 'bottom', 'left', 'right']:
    ax.spines[i].set_visible(False)

# Remove grid
ax.grid(False)
plt.show()


- *plot the time taken*

In [None]:
# Draw on the axes created here (passing figsize to .plot() would open a second figure)
fig, ax = plt.subplots(1, 1, figsize=(10, 5))
results["Time Taken"].plot(kind='line',
                           ax=ax,
                           linewidth=4,
                           marker='h',
                           markerfacecolor='lightgreen',
                           markeredgewidth=2,
                           markersize=12,
                           markevery=1)
ax.set_title('Performance of various LazyPredict Regression Algorithms', fontsize=15)
ax.set_xlabel('Regression Algorithm', fontsize=12)
ax.set_ylabel('Time Taken', fontsize=12)
ax.patch.set_facecolor('#e4f2f7')
# Remove ticks
ax.xaxis.set_ticks_position('none')
ax.yaxis.set_ticks_position('none')

# Remove axes splines
for i in ['top', 'bottom', 'left', 'right']:
    ax.spines[i].set_visible(False)

# Remove grid
ax.grid(False)
plt.show()


# [¶](#4.3)

<h2 style='background:#B2FF33; border:0;'><center>4.3 Try SHAP and XGB to plot the values as a Bee Swarm</center></h2> 

In [None]:
import shap  # pip install shap
import xgboost as xgb

# Load and train a model
clf = xgb.XGBRegressor().fit(X[0:10000], y[0:10000])

# Explain model's predictions with SHAP
explainer = shap.Explainer(clf)
shap_values = explainer(X[0:10000])

# Visualize the predictions' explanation
shap.plots.beeswarm(shap_values)

###### [back to top](#table-of-contents)
<a id="999"></a>
# Work in Progress: More to Come