# Code Graveyard

For our own reference. Code that we have removed from our project but may need again is dumped here.

### Dropping Outliers

We should also drop any outliers that may affect our model. The boxplots show some outliers for the `trestbps`, `chol`, `thalach` and `oldpeak` variables. Let us begin by removing them.

In [None]:
# Compute the bounds for every variable before filtering any rows, so that
# removing outliers for one variable does not shift the quantiles of the next.
# To do this, we first store the lower and upper bounds of each variable in a list.

bounds = []

for v in num_var:
    q1 = clean_data[v].quantile(0.25)
    q3 = clean_data[v].quantile(0.75)
    iqr = q3 - q1  # interquartile range
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = clean_data[v][(clean_data[v] < lower) | (clean_data[v] > upper)]
    print(f"{v}: {outliers.shape[0]}")
    bounds.append((lower, upper))

# Pair each variable name with its bounds, then drop the rows that fall outside them.
bounds = list(zip(num_var, bounds))

for v, (lower, upper) in bounds:
    clean_data = clean_data[(clean_data[v] >= lower) & (clean_data[v] <= upper)]

print("\nNew dimension of clean_data:", clean_data.shape)
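For future reference, the same 1.5 × IQR rule can be tried out on a small frame without the project data. This is only a sketch: the helper names `iqr_bounds` and `drop_iqr_outliers` and the `toy` values are made up for illustration and are not part of our project.

```python
import pandas as pd


def iqr_bounds(df, columns, k=1.5):
    """Return {column: (lower, upper)} computed on the unfiltered frame."""
    q1 = df[columns].quantile(0.25)
    q3 = df[columns].quantile(0.75)
    iqr = q3 - q1
    return {c: (q1[c] - k * iqr[c], q3[c] + k * iqr[c]) for c in columns}


def drop_iqr_outliers(df, columns, k=1.5):
    """Keep only the rows that lie within the bounds of every listed column."""
    bounds = iqr_bounds(df, columns, k)
    mask = pd.Series(True, index=df.index)
    for c, (lower, upper) in bounds.items():
        # between() is inclusive on both ends by default
        mask &= df[c].between(lower, upper)
    return df[mask]


# Hypothetical toy data: one obviously extreme chol value
toy = pd.DataFrame({
    "chol": [200, 210, 220, 230, 240, 600],
    "oldpeak": [0.0, 0.5, 1.0, 1.5, 2.0, 1.0],
})

filtered = drop_iqr_outliers(toy, ["chol", "oldpeak"])
print("Rows before:", toy.shape[0], "rows after:", filtered.shape[0])
```

Because all bounds are computed before any row is dropped, the order of the columns does not change which rows survive.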

### Fixing Skew

Many of the numerical variables also show some skew. Let us take a closer look by calling `.skew()` on each of them.

In [11]:
for v in num_var:
    print(f"{v}: {clean_data[v].skew().round(2)}")

age: -0.21
trestbps: 0.29
chol: 0.19
thalach: -0.43
oldpeak: 0.92


We can see that the skew values of `age`, `trestbps`, `chol` and `thalach` are all below 0.5 in absolute value and thus quite low. However, `oldpeak` has a noticeably higher positive skew of 0.92, and we should address this.
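One common way to reduce a positive skew is to apply a concave transform such as a square root or `log1p`. The sketch below is not necessarily what we used in the project. It assumes the variable is non-negative (which both transforms require), and the sample values are invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed, non-negative sample standing in for oldpeak
oldpeak = pd.Series(
    [0.0, 0.0, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.4, 1.8, 2.6, 4.0],
    name="oldpeak",
)

# Candidate transforms; log1p(x) = log(1 + x), so zeros are handled safely
candidates = {
    "original": oldpeak,
    "sqrt": np.sqrt(oldpeak),
    "log1p": np.log1p(oldpeak),
}

# Compare the skew of each version
skews = {name: float(s.skew()) for name, s in candidates.items()}
for name, value in skews.items():
    print(f"{name}: {value:.2f}")
```

Whichever transform is chosen, it would need to be applied consistently to any data later passed to the model.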