# Intro
Welcome to the [Google Brain - Ventilator Pressure Prediction](https://www.kaggle.com/c/ventilator-pressure-prediction/data) competition.

![](https://storage.googleapis.com/kaggle-competitions/kaggle/29594/logos/header.png)

# Libraries

```python
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor

import warnings
warnings.filterwarnings("ignore")
```

# Path

```python
path = '/kaggle/input/ventilator-pressure-prediction/'
os.listdir(path)
```

# Load Data

```python
train_data = pd.read_csv(path+'train.csv')
test_data = pd.read_csv(path+'test.csv')
samp_subm = pd.read_csv(path+'sample_submission.csv')
```

# Overview

```python
print('Number of train samples: ', len(train_data.index))
print('Number of test samples: ', len(test_data.index))
# All columns, including 'id' and the target 'pressure'
print('Number of columns: ', len(train_data.columns))
```

**Features**
* id - globally-unique time step identifier across an entire file
* breath_id - globally-unique identifier of a breath; all time steps of the same breath share it.
* R - lung attribute indicating how restricted the airway is (in cmH2O/L/s). Physically, this is the change in pressure per change in flow (air volume per time). Intuitively, one can imagine blowing up a balloon through a straw. We can change R by changing the diameter of the straw, with higher R being harder to blow.
* C - lung attribute indicating how compliant the lung is (in mL/cmH2O). Physically, this is the change in volume per change in pressure. Intuitively, one can imagine the same balloon example. We can change C by changing the thickness of the balloon’s latex, with higher C meaning thinner latex that is easier to blow.
* time_step - the actual time stamp.
* u_in - the control input for the inspiratory solenoid valve. Ranges from 0 to 100.
* u_out - the control input for the expiratory solenoid valve. Either 0 or 1.
* pressure - the airway pressure measured in the respiratory circuit, measured in cmH2O.
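The R and C definitions above match the textbook single-compartment lung model (the "equation of motion" of the respiratory system), where airway pressure is a resistive term plus an elastic term: P = R·Q + V/C + PEEP. This model is not part of the competition data; the sketch below only illustrates how the units combine. The helper name `rc_lung_pressure`, the PEEP default of 0 and the example values are assumptions for illustration. Flow must be given in L/s and volume in mL to match the units of R and C.

```python
def rc_lung_pressure(R, C, flow_l_per_s, volume_ml, peep=0.0):
    """Hypothetical helper: idealized single-compartment lung model.

    R: airway resistance in cmH2O/L/s
    C: lung compliance in mL/cmH2O
    flow_l_per_s: airflow in L/s
    volume_ml: volume delivered above the resting volume, in mL
    peep: positive end-expiratory pressure in cmH2O (assumed 0 by default)
    Returns the airway pressure in cmH2O.
    """
    resistive = R * flow_l_per_s  # (cmH2O/L/s) * (L/s) = cmH2O
    elastic = volume_ml / C       # mL / (mL/cmH2O) = cmH2O
    return resistive + elastic + peep


# A narrower straw (higher R) and a stiffer balloon (lower C) both raise the pressure.
print(rc_lung_pressure(R=20, C=50, flow_l_per_s=0.5, volume_ml=300))  # 10.0 + 6.0 = 16.0
print(rc_lung_pressure(R=50, C=10, flow_l_per_s=0.5, volume_ml=300))  # 25.0 + 30.0 = 55.0
```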

```python
train_data.head()
```

# Exploratory Data Analysis

```python
train_data[train_data.columns[1:]].describe().round(3)
```

```python
train_data['pressure'].hist(bins=100);
```
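Rows are grouped into breaths, and the competition scores only the inspiratory phase (rows with `u_out == 0`). A per-breath summary is a quick way to check how many time steps each breath has and what share of them is scored. The helper `breath_summary` is hypothetical; on the real data it would be called as `breath_summary(train_data)`, and the toy frame below only stands in for it.

```python
import pandas as pd

def breath_summary(df):
    """Hypothetical helper: number of steps and inspiratory share per breath."""
    n_steps = df.groupby('breath_id').size()
    # u_out == 0 marks the inspiratory phase; the mean of a boolean is a share
    insp_share = (df['u_out'] == 0).groupby(df['breath_id']).mean()
    return pd.DataFrame({'n_steps': n_steps, 'insp_share': insp_share})

# Toy stand-in for train_data: two breaths with four time steps each
toy = pd.DataFrame({
    'breath_id': [1, 1, 1, 1, 2, 2, 2, 2],
    'u_out':     [0, 0, 1, 1, 0, 1, 1, 1],
})
summary = breath_summary(toy)
print(summary)
```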

# Scale Data

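The minimum and maximum are computed on the training data and reused for the test data, so both sets go through the same transformation. Scaling each set with its own statistics would map the same raw value to different scaled values.

```python
def scale_data(df, ref):
    """Min-max scale df with the column-wise min and max of ref."""
    return (df - ref.min()) / (ref.max() - ref.min())
```

```python
# Feature columns: everything except 'id' (first) and the target 'pressure' (last)
feature_cols = train_data.columns[1:-1]
train_ref = train_data[feature_cols].copy()  # statistics come from the training data only
train_data[feature_cols] = scale_data(train_data[feature_cols], train_ref)
test_data[feature_cols] = scale_data(test_data[feature_cols], train_ref)
```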
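Min-max scaling is a monotonic transformation of each feature, so it does not change which rows a tree split separates. Tree-based models such as the `DecisionTreeRegressor` and `XGBRegressor` used here are therefore largely insensitive to it; the step matters mainly for scale-sensitive models. The sketch below checks this for a single decision tree on synthetic data; the data, the integer-valued features and `max_depth=4` are arbitrary assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Integer-valued features keep ties identical before and after scaling
X = rng.integers(0, 1000, size=(200, 2)).astype(float)
y = 3 * X[:, 0] + rng.normal(0, 5, size=200)  # target driven by feature 0

# Min-max scale each column with its own min and max
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

tree_raw = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
tree_scaled = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_scaled, y)

# Same partitions of the training rows, hence the same predictions
same = np.allclose(tree_raw.predict(X), tree_scaled.predict(X_scaled))
print(same)
```

For gradient boosting the argument is the same in spirit, since each tree still splits on feature thresholds.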

# Split Data
We define the training and test inputs. The goal is to predict the pressure. We drop the *id* column from both the train and the test data. To test the workflow quickly, `number_subset` can be lowered to train on a subset of the training data; by default the full set is used.

```python
number_subset = len(train_data)  # e.g. 1000000 for a quicker test run
X_train = train_data[train_data.columns[1:-1]][:number_subset]
y_train = train_data['pressure'][:number_subset]
X_test = test_data[test_data.columns[1:]]
```

We split the training data further: 70% of the rows are used to train the model and 30% are held out as validation data to evaluate it:

```python
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size = 0.3, random_state=2021)
```
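`train_test_split` shuffles individual rows, so time steps of the same breath can end up in both the training and the validation set. Because neighboring steps of a breath are strongly correlated, this may make the validation MAE optimistic. A group-aware split keeps each breath on one side. The sketch below uses scikit-learn's `GroupShuffleSplit` with `breath_id` as the group; the toy frame stands in for the real data, and `test_size=0.3` and `random_state=2021` simply mirror the split above (note that for `GroupShuffleSplit` the 30% refers to breaths, not rows).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Toy stand-in for the training data: 10 breaths with 5 time steps each
toy = pd.DataFrame({
    'breath_id': np.repeat(np.arange(10), 5),
    'u_in': np.linspace(0, 1, 50),
    'pressure': np.linspace(5, 30, 50),
})
X = toy.drop(columns='pressure')
y = toy['pressure']

splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=2021)
train_idx, val_idx = next(splitter.split(X, y, groups=X['breath_id']))

X_tr, X_va = X.iloc[train_idx], X.iloc[val_idx]
y_tr, y_va = y.iloc[train_idx], y.iloc[val_idx]

# No breath appears on both sides of the split
overlap = set(X_tr['breath_id']) & set(X_va['breath_id'])
print(len(overlap))
```

On the real data, `X_train`, `y_train` and `groups=X_train['breath_id']` would take the place of the toy objects; the scaled `breath_id` values are still distinct per breath, so they work as group labels.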

# Model
We start with a gradient-boosted tree regressor (XGBoost); a single decision tree is left commented out as a simpler alternative:

```python
model = XGBRegressor(objective='reg:squarederror', n_estimators=500)
#model = DecisionTreeRegressor(random_state=2021)
model.fit(X_train, y_train)
y_val_pred = model.predict(X_val)
print('MAE:', mean_absolute_error(y_val, y_val_pred))
```
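The competition scores the MAE only over the inspiratory phase (`u_out == 0`), while the cell above averages over all rows. A minimal sketch of the competition-style metric, assuming `u_out` is still coded as 0/1 (min-max scaling leaves a 0/1 column unchanged); `inspiratory_mae` is a hypothetical helper name.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

def inspiratory_mae(y_true, y_pred, u_out):
    """MAE restricted to rows in the inspiratory phase (u_out == 0)."""
    mask = np.asarray(u_out) == 0
    return mean_absolute_error(np.asarray(y_true)[mask], np.asarray(y_pred)[mask])

y_true = np.array([10.0, 12.0, 20.0, 6.0])
y_pred = np.array([11.0, 12.0, 25.0, 9.0])
u_out = np.array([0, 0, 1, 1])
print(inspiratory_mae(y_true, y_pred, u_out))  # (1 + 0) / 2 = 0.5
print(mean_absolute_error(y_true, y_pred))     # (1 + 0 + 5 + 3) / 4 = 2.25
```

In the notebook it would be called as `inspiratory_mae(y_val, y_val_pred, X_val['u_out'])`.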

# Feature Importance

```python
importance = model.feature_importances_
fig = plt.figure(figsize=(10, 6))
x = X_train.columns.values
plt.barh(x, 100*importance)
plt.title('Feature Importance', loc='left')
plt.xlabel('Percentage')
plt.grid()
plt.show()
```

# Analyse The Error

```python
y_train_pred = model.predict(X_train)
y_val_pred = model.predict(X_val)

fig, axs = plt.subplots(1, 2, figsize=(22, 6))
fig.subplots_adjust(hspace = .5, wspace=.5)
axs = axs.ravel()
axs[0].plot(y_train, y_train_pred, 'ro')
axs[0].plot(y_train, y_train, 'blue')
axs[1].plot(y_val, y_val_pred, 'ro')
axs[1].plot(y_val, y_val, 'blue')
for i in range(2):
    axs[i].grid()
    axs[i].set_xlabel('true')
    axs[i].set_ylabel('pred')
axs[0].set_title('train')
axs[1].set_title('val')
plt.show()
```

# Predict Test Data

```python
y_test = model.predict(X_test)
samp_subm['pressure'] = y_test
```
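Assigning `y_test` directly to `samp_subm['pressure']` assumes that the rows of `sample_submission.csv` are in the same `id` order as `test.csv`. A cheap safeguard is to align the predictions on `id` explicitly. The helper `fill_submission` below is hypothetical and is shown on toy frames; in the notebook it would be called as `samp_subm = fill_submission(samp_subm, test_data['id'], y_test)` (the `id` column is not scaled, so it can be used as a key).

```python
import numpy as np
import pandas as pd

def fill_submission(subm, test_ids, preds):
    """Return a copy of subm with predictions matched to its rows by 'id'."""
    pred_df = pd.DataFrame({'id': np.asarray(test_ids), 'pressure': np.asarray(preds)})
    # A left merge keeps the row order of subm; validate rejects duplicate ids
    out = subm[['id']].merge(pred_df, on='id', how='left', validate='one_to_one')
    if out['pressure'].isna().any():
        raise ValueError('some submission ids have no prediction')
    return out

subm = pd.DataFrame({'id': [1, 2, 3], 'pressure': [0.0, 0.0, 0.0]})
result = fill_submission(subm, test_ids=[3, 1, 2], preds=[30.0, 10.0, 20.0])
print(result)
```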

# Export

```python
samp_subm.head()
```

```python
samp_subm.to_csv('submission.csv', index=False)
```