
<h1 style='background-color:Green; font-family:newtimeroman; font-size:250%; text-align:center; border-radius: 15px 50px;' > Auto-Sklearn </h1>


##### The Auto-Sklearn architecture is composed of three phases: meta-learning, Bayesian optimization, and ensemble selection. The key idea of the meta-learning phase is to reduce the search space by starting from configurations that performed well on similar datasets. The Bayesian optimization phase then takes the search space produced by the meta-learning step and fits Bayesian models to find the best-performing pipeline configuration. Finally, an ensemble is built by reusing the most accurate models found during the Bayesian optimization step. The figure below shows the Auto-Sklearn architecture.



<img src="https://miro.medium.com/max/1000/1*w8qIzewO97qdqmiZi69Maw.jpeg" width="800px">
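
The ensemble selection phase can be illustrated with a small greedy forward-selection routine. This is a minimal sketch of the general idea, not auto-sklearn's own code: the function name `greedy_ensemble_selection`, the toy validation predictions, and the use of plain accuracy as the selection metric are all assumptions made for illustration.

```python
import numpy as np

def greedy_ensemble_selection(val_probs, y_val, ensemble_size):
    """Greedy forward selection (with replacement) over model predictions.

    val_probs: array (n_models, n_samples) of predicted P(class = 1)
    y_val:     array (n_samples,) of 0/1 labels
    Returns the list of chosen model indices; repeats are allowed.
    """
    chosen = []
    running_sum = np.zeros(val_probs.shape[1])
    for _ in range(ensemble_size):
        best_idx, best_acc = None, -1.0
        for i, probs in enumerate(val_probs):
            # Average prediction if model i were added to the ensemble
            avg = (running_sum + probs) / (len(chosen) + 1)
            acc = np.mean((avg >= 0.5).astype(int) == y_val)
            if acc > best_acc:
                best_idx, best_acc = i, acc
        chosen.append(best_idx)
        running_sum += val_probs[best_idx]
    return chosen

# Toy validation predictions from three hypothetical models
y_val = np.array([0, 0, 1, 1])
val_probs = np.array([
    [0.1, 0.6, 0.7, 0.9],  # model 0: wrong on the second sample
    [0.2, 0.3, 0.4, 0.8],  # model 1: wrong on the third sample
    [0.9, 0.8, 0.2, 0.1],  # model 2: wrong everywhere
])

chosen = greedy_ensemble_selection(val_probs, y_val, ensemble_size=2)
# Ensemble weight of each model = how often it was picked
weights = np.bincount(chosen, minlength=len(val_probs)) / len(chosen)
print(chosen, weights)
```

In this toy case the two partially correct models cancel each other's mistakes when averaged, which is why an ensemble can beat each of its members.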



<h1 style='background-color:Green; font-family:newtimeroman; font-size:250%; text-align:center; border-radius: 15px 50px;' > Heart Failure </h1>

##### Heart failure, sometimes known as congestive heart failure, occurs when your heart muscle doesn't pump blood as well as it should. Certain conditions, such as narrowed arteries in your heart (coronary artery disease) or high blood pressure, gradually leave your heart too weak or stiff to fill and pump efficiently.


<img src="https://2rdnmg1qbg403gumla1v9i2h-wpengine.netdna-ssl.com/wp-content/uploads/sites/3/2017/01/HeartDiseaseCloggedArteries-650x428.jpg" width="800px">


<h1 style='background-color:Green; font-family:newtimeroman; font-size:250%; text-align:center; border-radius: 15px 50px;' > Dataset in this link </h1>




#### [Click Here](https://www.kaggle.com/andrewmvd/heart-failure-clinical-data)

In [None]:
pip install auto-sklearn

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from autosklearn.classification import AutoSklearnClassifier

In [None]:
# Load the dataset

df = pd.read_csv('../input/heart-failure-clinical-data/heart_failure_clinical_records_dataset.csv')
df.head()

In [None]:
df.info()

In [None]:
df.describe().T

In [None]:
df["DEATH_EVENT"].value_counts()

df["DEATH_EVENT"].value_counts() * 100 / len(df)


sns.countplot(x="DEATH_EVENT", data=df, palette='viridis')

In [None]:
from sklearn.utils import resample

# Separate the two classes: 0 = patient survived, 1 = patient died
survived = df[df["DEATH_EVENT"] == 0]
died = df[df["DEATH_EVENT"] == 1]

# Upsample the minority class (DEATH_EVENT == 1)
died_upsampled = resample(died,
                          replace=True, # sample with replacement
                          n_samples=len(survived), # match number in majority class
                          random_state=27) # reproducible results

# Combine majority and upsampled minority
upsampled = pd.concat([survived, died_upsampled])

# Check new class counts
upsampled["DEATH_EVENT"].value_counts()

In [None]:
upsampled["DEATH_EVENT"].value_counts()

# Percentages must be relative to the upsampled frame, not the original df
upsampled["DEATH_EVENT"].value_counts() * 100 / len(upsampled)


sns.countplot(x="DEATH_EVENT", data=upsampled, palette='viridis')

In [None]:
import pandas_profiling as pp
pp.ProfileReport(upsampled)

In [None]:
sns.kdeplot(data=upsampled, x="time", y="age", hue="DEATH_EVENT")

In [None]:
# Features (x) and target (y)
x = upsampled.drop(["DEATH_EVENT"], axis=1)
y = upsampled["DEATH_EVENT"]

In [None]:
# Split into train and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=23)
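
The split above is applied to a frame that was already upsampled with replacement, so copies of the same patient row can end up in both the training and the test set, which tends to inflate the test score. A common alternative is to split first and upsample only the training part. The sketch below shows that order on a small synthetic frame; the `toy` data and its values are made up for illustration, and only the `DEATH_EVENT` column name is taken from the dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Small synthetic stand-in for the heart-failure frame (values are made up)
toy = pd.DataFrame({
    "age": list(range(40, 60)),
    "DEATH_EVENT": [0] * 14 + [1] * 6,
})

# 1) Split first, stratifying so both parts keep the class ratio
train_df, test_df = train_test_split(
    toy, test_size=0.25, stratify=toy["DEATH_EVENT"], random_state=23
)

# 2) Upsample the minority class inside the training part only
majority = train_df[train_df["DEATH_EVENT"] == 0]
minority = train_df[train_df["DEATH_EVENT"] == 1]
minority_up = resample(minority,
                       replace=True,             # sample with replacement
                       n_samples=len(majority),  # match the majority class
                       random_state=27)          # reproducible results
train_balanced = pd.concat([majority, minority_up])

# 3) No original row appears on both sides of the split
overlap = set(train_balanced.index) & set(test_df.index)
counts = train_balanced["DEATH_EVENT"].value_counts()
print(counts.to_dict(), "shared rows:", len(overlap))
```

The test set stays imbalanced in this variant, which matches what the model would see on new patients.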

In [None]:
print('Training Shape x:', x_train.shape)
print('Testing Shape x:', x_test.shape)
print('*****___________*****___________*****')
print('Training Shape y:', y_train.shape)
print('Testing Shape y:', y_test.shape)

In [None]:
# Auto-Sklearn Initialization

# time_left_for_this_task: Time limit in seconds for the whole search
# per_run_time_limit: Time limit in seconds for each individual model fit
# ensemble_size: Number of models added to the ensemble (1 = single best model)
# initial_configurations_via_metalearning: "k" configurations used to warm-start
#     the Bayesian optimization (0 disables the meta-learning warm start)
model = AutoSklearnClassifier(time_left_for_this_task=300,
                              per_run_time_limit=9,
                              ensemble_size=1,
                              initial_configurations_via_metalearning=0)
# Start training
model.fit(x_train, y_train)

In [None]:
model.score(x_train, y_train)

In [None]:
model.score(x_test, y_test)

In [None]:
print(model.sprint_statistics())

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = model.predict(x_test)
cm = confusion_matrix(y_test, y_pred)
print(f'CM:\n{cm}')
print(f'Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f} %')

In [None]:
# confusion_matrix expects (y_true, y_pred): rows are true classes, columns are predictions
conf_matrix = confusion_matrix(y_test, y_pred)

print(f'Confusion Matrix: \n{conf_matrix}\n')

sns.heatmap(conf_matrix, annot=True)

In [None]:
#Performance Measures
tn = conf_matrix[0,0]
fp = conf_matrix[0,1]
tp = conf_matrix[1,1]
fn = conf_matrix[1,0]

total = tn + fp + tp + fn
real_positive = tp + fn
real_negative = tn + fp

In [None]:
accuracy  = (tp + tn) / total # Accuracy Rate
precision = tp / (tp + fp) # Positive Predictive Value
recall    = tp / (tp + fn) # True Positive Rate
f1score  = 2 * precision * recall / (precision + recall)
specificity = tn / (tn + fp) # True Negative Rate
error_rate = (fp + fn) / total # Misclassification Rate
prevalence = real_positive / total
miss_rate = fn / real_positive # False Negative Rate
fall_out = fp / real_negative # False Positive Rate

print(f'Accuracy    : {accuracy}')
print(f'Precision   : {precision}')
print(f'Recall      : {recall}')
print(f'F1 score    : {f1score}')
print(f'Specificity : {specificity}')
print(f'Error Rate  : {error_rate}')
print(f'Prevalence  : {prevalence}')
print(f'Miss Rate   : {miss_rate}')
print(f'Fall Out    : {fall_out}')
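
As a sanity check on the formulas above, here is a small worked example with made-up counts. It assumes scikit-learn's layout for `confusion_matrix(y_true, y_pred)`, where rows are true classes and columns are predicted classes; the numbers are illustrative and are not results from this notebook.

```python
# Hypothetical 2x2 confusion matrix: rows = true class, columns = predicted class
toy_matrix = [[40, 10],   # true 0: 40 predicted 0 (TN), 10 predicted 1 (FP)
              [5, 45]]    # true 1:  5 predicted 0 (FN), 45 predicted 1 (TP)

tn, fp = toy_matrix[0]
fn, tp = toy_matrix[1]
total = tn + fp + fn + tp

toy_accuracy = (tp + tn) / total            # 85 / 100
toy_precision = tp / (tp + fp)              # 45 / 55
toy_recall = tp / (tp + fn)                 # 45 / 50
toy_f1 = 2 * toy_precision * toy_recall / (toy_precision + toy_recall)
toy_specificity = tn / (tn + fp)            # 40 / 50

print(f'Accuracy    : {toy_accuracy:.3f}')
print(f'Precision   : {toy_precision:.3f}')
print(f'Recall      : {toy_recall:.3f}')
print(f'F1 score    : {toy_f1:.3f}')
print(f'Specificity : {toy_specificity:.3f}')
```

Swapping the two arguments of `confusion_matrix` transposes the matrix, which exchanges the false-positive and false-negative cells and therefore precision and recall.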

In [None]:
# Classification Report (arguments are y_true first, then y_pred)
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))