
# **🎯 Project Objective**
## Predict whether a patient has heart disease based on medical attributes, using machine learning, while minimizing medically dangerous errors.

In [1]:
pip install ucimlrepo

Collecting ucimlrepo
  Downloading ucimlrepo-0.0.7-py3-none-any.whl.metadata (5.5 kB)
Downloading ucimlrepo-0.0.7-py3-none-any.whl (8.0 kB)
Installing collected packages: ucimlrepo
Successfully installed ucimlrepo-0.0.7


# **Problem Statement**
## The objective of this project is to predict the presence of heart disease in patients based on clinical and demographic features using machine learning techniques. The focus is on building a reliable classification model with appropriate evaluation metrics, as false negatives in medical diagnosis can be critical.

In [2]:
from ucimlrepo import fetch_ucirepo

# fetch dataset
heart_disease = fetch_ucirepo(id=45)

# data (as pandas dataframes)
X = heart_disease.data.features
y = heart_disease.data.targets

# metadata
print(heart_disease.metadata)

# variable information
print(heart_disease.variables)
print("Features (X):")
print(X.head())
print("\nTargets (y):")
print(y.head())


{'uci_id': 45, 'name': 'Heart Disease', 'repository_url': 'https://archive.ics.uci.edu/dataset/45/heart+disease', 'data_url': 'https://archive.ics.uci.edu/static/public/45/data.csv', 'abstract': '4 databases: Cleveland, Hungary, Switzerland, and the VA Long Beach', 'area': 'Health and Medicine', 'tasks': ['Classification'], 'characteristics': ['Multivariate'], 'num_instances': 303, 'num_features': 13, 'feature_types': ['Categorical', 'Integer', 'Real'], 'demographics': ['Age', 'Sex'], 'target_col': ['num'], 'index_col': None, 'has_missing_values': 'yes', 'missing_values_symbol': 'NaN', 'year_of_dataset_creation': 1989, 'last_updated': 'Fri Nov 03 2023', 'dataset_doi': '10.24432/C52P4X', 'creators': ['Andras Janosi', 'William Steinbrunn', 'Matthias Pfisterer', 'Robert Detrano'], 'intro_paper': {'ID': 231, 'type': 'NATIVE', 'title': 'International application of a new probability algorithm for the diagnosis of coronary artery disease.', 'authors': 'R. Detrano, A. Jánosi, W. Steinbrunn, 

In [3]:
###Task B
print(y.value_counts())
y_binary=(y>0).astype(int)
print(y_binary.value_counts())


num
0      164
1       55
2       36
3       35
4       13
Name: count, dtype: int64
num
0      164
1      139
Name: count, dtype: int64
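
The binarization above can be checked by hand: every non-zero severity level (1 to 4) collapses into the positive class. The sketch below redoes that arithmetic from the printed counts. The variable names are illustrative and are not part of the notebook.

```python
# Counts per original severity level, copied from the value_counts() output above
severity_counts = {0: 164, 1: 55, 2: 36, 3: 35, 4: 13}

# Collapse levels 1-4 into a single positive class, mirroring (y > 0).astype(int)
binary_counts = {0: 0, 1: 0}
for level, count in severity_counts.items():
    binary_counts[int(level > 0)] += count

total = sum(binary_counts.values())
share = {label: 100 * count / total for label, count in binary_counts.items()}

print(binary_counts)  # {0: 164, 1: 139}
print({label: round(pct, 2) for label, pct in share.items()})  # {0: 54.13, 1: 45.87}
```

The two classes end up fairly close in size, so accuracy is not badly distorted by imbalance here, although recall on the positive class remains the metric that tracks missed diagnoses.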


In [4]:
import pandas as pd

# Rename the 'num' column in y_binary to 'target'
target_df = y_binary.rename(columns={'num': 'target'})

# Concatenate X and the correctly named target_df
df = pd.concat([X, target_df], axis=1)

print(df.head())
print(df.shape)
print(df.columns)
print(df.isnull().sum())

   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  \
0   63    1   1       145   233    1        2      150      0      2.3      3   
1   67    1   4       160   286    0        2      108      1      1.5      2   
2   67    1   4       120   229    0        2      129      1      2.6      2   
3   37    1   3       130   250    0        0      187      0      3.5      3   
4   41    0   2       130   204    0        2      172      0      1.4      1   

    ca  thal  target  
0  0.0   6.0       0  
1  3.0   3.0       1  
2  2.0   7.0       1  
3  0.0   3.0       0  
4  0.0   3.0       0  
(303, 14)
Index(['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
       'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target'],
      dtype='object')
age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          4
thal        2
target      0
dtype: int

In [5]:
print(df['target'].value_counts(normalize=True)*100)
print(df.describe())
print(df.groupby('target')['age'].mean())
print(df.groupby('target')['chol'].mean())
print(df.groupby('target')['trestbps'].mean())
print(df.groupby('target')['thalach'].mean())
print(df.groupby('target')['sex'].mean())
print(df.groupby('target')['cp'].mean())
print(df.groupby('target')['thal'].mean())
print(df.groupby('target')['ca'].mean())
# Observation: sex appears to be the feature most related to the target


target
0    54.125413
1    45.874587
Name: proportion, dtype: float64
              age         sex          cp    trestbps        chol         fbs  \
count  303.000000  303.000000  303.000000  303.000000  303.000000  303.000000   
mean    54.438944    0.679868    3.158416  131.689769  246.693069    0.148515   
std      9.038662    0.467299    0.960126   17.599748   51.776918    0.356198   
min     29.000000    0.000000    1.000000   94.000000  126.000000    0.000000   
25%     48.000000    0.000000    3.000000  120.000000  211.000000    0.000000   
50%     56.000000    1.000000    3.000000  130.000000  241.000000    0.000000   
75%     61.000000    1.000000    4.000000  140.000000  275.000000    0.000000   
max     77.000000    1.000000    4.000000  200.000000  564.000000    1.000000   

          restecg     thalach       exang     oldpeak       slope          ca  \
count  303.000000  303.000000  303.000000  303.000000  303.000000  299.000000   
mean     0.990099  149.607261    0.326

In [6]:
##Task D
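# Note: the minimum age in this dataset is 29, so with these bins every patient falls into 'Senior'
df['AgeGroup'] = pd.cut(df['age'], bins=[0, 11, 21, df['age'].max() + 1], labels=['Child', 'Adult', 'Senior'], right=False)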
df['CholesterolRisk']=(df['chol']>=240).astype(int)
df['BloodPressureRisk']=(df['trestbps']>=140).astype(int)
df['exercise_related_heart_stress'] = df['oldpeak'] / df['thalach']

# Impute missing values in 'ca' with its median
# (assign the result back; chained fillna(..., inplace=True) is deprecated in pandas)
df['ca'] = df['ca'].fillna(df['ca'].median())

# Impute missing values in 'thal' with its mode
df['thal'] = df['thal'].fillna(df['thal'].mode()[0])

# Handle any remaining missing AgeGroup values by filling with the most frequent category
df['AgeGroup'] = df['AgeGroup'].fillna(df['AgeGroup'].mode()[0])

print(df[['oldpeak', 'thalach', 'exercise_related_heart_stress']].head())
print(df.head())
print(df.groupby('target')['AgeGroup'].value_counts())
print(df.groupby('target')['CholesterolRisk'].value_counts())
print(df.groupby('target')['BloodPressureRisk'].value_counts())
print(df.groupby('target')['exercise_related_heart_stress'].value_counts())
print("Missing values after imputation:")
print(df.isnull().sum())

   oldpeak  thalach  exercise_related_heart_stress
0      2.3      150                       0.015333
1      1.5      108                       0.013889
2      2.6      129                       0.020155
3      3.5      187                       0.018717
4      1.4      172                       0.008140
   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  \
0   63    1   1       145   233    1        2      150      0      2.3      3   
1   67    1   4       160   286    0        2      108      1      1.5      2   
2   67    1   4       120   229    0        2      129      1      2.6      2   
3   37    1   3       130   250    0        0      187      0      3.5      3   
4   41    0   2       130   204    0        2      172      0      1.4      1   

    ca  thal  target AgeGroup  CholesterolRisk  BloodPressureRisk  \
0  0.0   6.0       0   Senior                0                  1   
1  3.0   3.0       1   Senior                1                  1   



In [7]:
y=df['target']
x=df.drop(columns=['target'])
num_col=x.select_dtypes(include=['int64','float64']).columns
cat_col=x.select_dtypes(include=['object','category']).columns
print(num_col)
print(cat_col)
print(x)

Index(['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
       'exang', 'oldpeak', 'slope', 'ca', 'thal', 'CholesterolRisk',
       'BloodPressureRisk', 'exercise_related_heart_stress'],
      dtype='object')
Index(['AgeGroup'], dtype='object')
     age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  \
0     63    1   1       145   233    1        2      150      0      2.3   
1     67    1   4       160   286    0        2      108      1      1.5   
2     67    1   4       120   229    0        2      129      1      2.6   
3     37    1   3       130   250    0        0      187      0      3.5   
4     41    0   2       130   204    0        2      172      0      1.4   
..   ...  ...  ..       ...   ...  ...      ...      ...    ...      ...   
298   45    1   1       110   264    0        0      132      0      1.2   
299   68    1   4       144   193    1        0      141      0      3.4   
300   57    1   4       130   131    0        0    

In [8]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score,confusion_matrix,classification_report,precision_score,recall_score,f1_score
import matplotlib.pyplot as plt
import seaborn as sns

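# Baseline pipeline: scale numeric columns, one-hot encode categorical ones, then fit logistic regression.
# handle_unknown='ignore' keeps prediction from failing on a category that never appeared in training.
model = Pipeline(steps=[
    ('preprocessor', ColumnTransformer(transformers=[
        ('num', StandardScaler(), num_col),
        ('cat', OneHotEncoder(handle_unknown='ignore'), cat_col),
    ])),
    ('classifier', LogisticRegression(max_iter=1000)),
])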
x_train,x_test,y_train,y_test=train_test_split(x,y_binary.squeeze(),test_size=0.2,random_state=42)
model.fit(x_train,y_train)
y_pred=model.predict(x_test)
print(accuracy_score(y_test,y_pred))
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(precision_score(y_test,y_pred))
print(recall_score(y_test,y_pred))

0.8360655737704918
[[24  5]
 [ 5 27]]
              precision    recall  f1-score   support

           0       0.83      0.83      0.83        29
           1       0.84      0.84      0.84        32

    accuracy                           0.84        61
   macro avg       0.84      0.84      0.84        61
weighted avg       0.84      0.84      0.84        61

0.84375
0.84375


In [9]:
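# Same pipeline, but class_weight='balanced' reweights the classes inversely to their frequency.
# handle_unknown='ignore' keeps prediction from failing on a category that never appeared in training.
model_ = Pipeline(steps=[
    ('preprocessor', ColumnTransformer(transformers=[
        ('num', StandardScaler(), num_col),
        ('cat', OneHotEncoder(handle_unknown='ignore'), cat_col),
    ])),
    ('classifier', LogisticRegression(max_iter=1000, class_weight='balanced')),
])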
x_train,x_test,y_train,y_test=train_test_split(x,y_binary.squeeze(),test_size=0.2,random_state=42)
model_.fit(x_train,y_train)
y_pred_=model_.predict(x_test)
print(accuracy_score(y_test,y_pred_))
print(confusion_matrix(y_test,y_pred_))
print(classification_report(y_test,y_pred_))
print(precision_score(y_test,y_pred_))
print(recall_score(y_test,y_pred_))

0.8688524590163934
[[24  5]
 [ 3 29]]
              precision    recall  f1-score   support

           0       0.89      0.83      0.86        29
           1       0.85      0.91      0.88        32

    accuracy                           0.87        61
   macro avg       0.87      0.87      0.87        61
weighted avg       0.87      0.87      0.87        61

0.8529411764705882
0.90625
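
The accuracy, precision, and recall printed for both models follow directly from their confusion matrices. As a worked check, the sketch below recomputes them from the four cell counts, assuming scikit-learn's layout of `[[TN, FP], [FN, TP]]`. The helper name `metrics_from_confusion` is illustrative and is not part of the notebook.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Return accuracy, precision, and recall for the positive class."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "false_negatives": fn,
    }

# Values copied from the two confusion matrices printed above,
# in scikit-learn's layout [[TN, FP], [FN, TP]]
baseline = metrics_from_confusion(tn=24, fp=5, fn=5, tp=27)
balanced = metrics_from_confusion(tn=24, fp=5, fn=3, tp=29)

for name, result in (("baseline", baseline), ("balanced", balanced)):
    print(name, {key: round(value, 4) for key, value in result.items()})
```

On this 61-patient test split, `class_weight='balanced'` reduces the missed cases (false negatives) from 5 to 3 while the false-positive count stays at 5, which is why recall rises from 0.84375 to 0.90625.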


In [10]:
print(model_.predict_proba(x_test))


[[0.20706479 0.79293521]
 [0.1818227  0.8181773 ]
 [0.16846323 0.83153677]
 [0.47292999 0.52707001]
 [0.28947191 0.71052809]
 [0.14827058 0.85172942]
 [0.02308793 0.97691207]
 [0.00341159 0.99658841]
 [0.71439481 0.28560519]
 [0.36291422 0.63708578]
 [0.93548184 0.06451816]
 [0.98044062 0.01955938]
 [0.18939243 0.81060757]
 [0.01202348 0.98797652]
 [0.00338229 0.99661771]
 [0.91702913 0.08297087]
 [0.96072845 0.03927155]
 [0.28066015 0.71933985]
 [0.00115036 0.99884964]
 [0.88376076 0.11623924]
 [0.1764381  0.8235619 ]
 [0.92168307 0.07831693]
 [0.00527858 0.99472142]
 [0.94071779 0.05928221]
 [0.00514709 0.99485291]
 [0.96951245 0.03048755]
 [0.6211149  0.3788851 ]
 [0.10228289 0.89771711]
 [0.08859678 0.91140322]
 [0.45596205 0.54403795]
 [0.7601539  0.2398461 ]
 [0.35593484 0.64406516]
 [0.97824056 0.02175944]
 [0.53649954 0.46350046]
 [0.78465069 0.21534931]
 [0.27932138 0.72067862]
 [0.01569259 0.98430741]
 [0.41428403 0.58571597]
 [0.0069483  0.9930517 ]
 [0.80611025 0.19388975]


In [11]:
positive_class_probabilities = model_.predict_proba(x_test)[:, 1]
print("Probabilities for the positive class:")
print(positive_class_probabilities)

Probabilities for the positive class:
[0.79293521 0.8181773  0.83153677 0.52707001 0.71052809 0.85172942
 0.97691207 0.99658841 0.28560519 0.63708578 0.06451816 0.01955938
 0.81060757 0.98797652 0.99661771 0.08297087 0.03927155 0.71933985
 0.99884964 0.11623924 0.8235619  0.07831693 0.99472142 0.05928221
 0.99485291 0.03048755 0.3788851  0.89771711 0.91140322 0.54403795
 0.2398461  0.64406516 0.02175944 0.46350046 0.21534931 0.72067862
 0.98430741 0.58571597 0.9930517  0.19388975 0.98700503 0.30683987
 0.9117747  0.04762202 0.26895359 0.94131036 0.12430773 0.17941468
 0.85233517 0.841808   0.33804159 0.02620832 0.0262289  0.1781316
 0.9590998  0.20212201 0.04709275 0.97560795 0.92995536 0.95279634
 0.15667743]


In [12]:
custom_threshold = 0.4
custom_predictions = (positive_class_probabilities >= custom_threshold).astype(int)

print(f"Predictions using a custom threshold of {custom_threshold}:")
print(custom_predictions)

# You can also evaluate these custom predictions against y_test if needed
# from sklearn.metrics import classification_report
# print(classification_report(y_test, custom_predictions))

Predictions using a custom threshold of 0.4:
[1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 0 0 1 1 0 1 0 1 0 1 0 0 1 1 1 0 1 0 1 0 1 1
 1 1 0 1 0 1 0 0 1 0 0 1 1 0 0 0 0 1 0 0 1 1 1 0]
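
Lowering the threshold from the default 0.5 to 0.4 trades precision for recall: more borderline patients are flagged, so fewer true cases are missed. The sketch below illustrates that trade-off on a small set of hypothetical labels and probabilities. The data and the helper `precision_recall_at` are assumptions for illustration, not values from the notebook's test set.

```python
def precision_recall_at(y_true, probabilities, threshold):
    """Threshold probabilities, then compute precision and recall for class 1."""
    predictions = [int(p >= threshold) for p in probabilities]
    tp = sum(1 for t, p in zip(y_true, predictions) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, predictions) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, predictions) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical labels and probabilities, NOT taken from the notebook's test set
toy_labels = [1, 1, 1, 0, 0, 1]
toy_probabilities = [0.90, 0.45, 0.30, 0.42, 0.10, 0.60]

for threshold in (0.5, 0.4):
    precision, recall = precision_recall_at(toy_labels, toy_probabilities, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

In the notebook, the same comparison could be made on the real data by passing `y_test` and `positive_class_probabilities` to a function like this one, or by uncommenting the `classification_report` call in the cell above.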


# Task
Build an interactive prediction UI for the trained model, starting with the `ipywidgets` import.

## Import UI Libraries

### Subtask:
Import necessary libraries for building the interactive UI, specifically `ipywidgets`.


**Reasoning**:
Import the `ipywidgets` library as `widgets` to enable interactive UI elements for the Colab notebook.



In [13]:
import ipywidgets as widgets
print("ipywidgets imported as widgets.")

ipywidgets imported as widgets.


## Define Prediction Function

### Subtask:
Create a Python function that takes new patient data (as a Pandas DataFrame or Series), applies the trained preprocessing pipeline (StandardScaler and OneHotEncoder), and then uses the `model_` to predict the probability of heart disease. This function will also apply the `custom_threshold` to return a binary prediction.


**Reasoning**:
To define the prediction function, I will create a Python function `predict_heart_disease` that takes patient data, uses the pre-trained `model_` to predict probabilities, applies the `custom_threshold`, and returns both the probability and the binary prediction.



In [20]:
def predict_heart_disease(patient_data):
    """Return (probability of heart disease, thresholded 0/1 prediction) for one patient."""
    # Ensure patient_data is a one-row DataFrame, even if a Series is passed.
    # infer_objects() restores numeric dtypes lost when a mixed-type Series becomes a row.
    if isinstance(patient_data, pd.Series):
        patient_data = pd.DataFrame([patient_data]).infer_objects()

    # model_.predict_proba returns one [P(class 0), P(class 1)] pair per row
    probabilities = model_.predict_proba(patient_data)
    positive_class_probability = probabilities[0, 1]  # probability of heart disease

    # Apply the custom threshold for the binary prediction
    binary_prediction = int(positive_class_probability >= custom_threshold)

    return positive_class_probability, binary_prediction

## Create Input Widgets

### Subtask:
Develop interactive input widgets for each feature in the `x` DataFrame, including numerical inputs (e.g., `IntSlider`, `FloatText`) and categorical dropdowns (e.g., `Dropdown`), ensuring they cover all original and engineered features like 'AgeGroup', 'CholesterolRisk', 'BloodPressureRisk', and 'exercise_related_heart_stress'.


**Reasoning**:
To create interactive widgets for each feature in the `x` DataFrame, I will iterate through its columns and generate `ipywidgets` widgets (IntSlider, FloatText, Dropdown) with appropriate min/max values or categorical options based on the data in `df` and descriptions from `heart_disease.variables`.



In [15]:
input_widgets = {}

# Get ranges for numerical features
feature_ranges = {
    'age': (int(df['age'].min()), int(df['age'].max()), 1),
    'trestbps': (int(df['trestbps'].min()), int(df['trestbps'].max()), 1),
    'chol': (int(df['chol'].min()), int(df['chol'].max()), 1),
    'thalach': (int(df['thalach'].min()), int(df['thalach'].max()), 1),
    'oldpeak': (df['oldpeak'].min(), df['oldpeak'].max(), 0.1),
    'exercise_related_heart_stress': (df['exercise_related_heart_stress'].min(), df['exercise_related_heart_stress'].max(), 0.001),
    'ca': (0, 3, 1) # ca is specified as 0-3
}

# Define options for categorical features
categorical_options = {
    'sex': [('Male', 1), ('Female', 0)],
    'fbs': [('No (<120 mg/dl)', 0), ('Yes (>120 mg/dl)', 1)],
    'exang': [('No', 0), ('Yes', 1)],
    'cp': [('Typical Angina', 1), ('Atypical Angina', 2), ('Non-anginal Pain', 3), ('Asymptomatic', 4)],
    'restecg': [('Normal', 0), ('ST-T Wave Abnormality', 1), ('Left Ventricular Hypertrophy', 2)],
    'slope': [('Upsloping', 1), ('Flat', 2), ('Downsloping', 3)],
    'thal': [('Normal', 3), ('Fixed Defect', 6), ('Reversible Defect', 7)],
    'AgeGroup': [('Child', 'Child'), ('Adult', 'Adult'), ('Senior', 'Senior')],
    'CholesterolRisk': [('No Risk', 0), ('At Risk', 1)],
    'BloodPressureRisk': [('No Risk', 0), ('At Risk', 1)]
}

# Create widgets for each feature
for col in x.columns:
    if col in ['age', 'trestbps', 'chol', 'thalach']:
        min_val, max_val, step_val = feature_ranges[col]
        input_widgets[col] = widgets.IntSlider(
            min=min_val, max=max_val, step=step_val, description=col.capitalize(),
            value=int(df[col].median()), continuous_update=False
        )
    elif col == 'ca':
        min_val, max_val, step_val = feature_ranges[col]
        input_widgets[col] = widgets.IntSlider(
            min=min_val, max=max_val, step=step_val, description=col.upper(),
            value=int(df[col].median()), continuous_update=False
        )
    elif col in ['oldpeak', 'exercise_related_heart_stress']:
        # FloatText has no min/max traits, so only the step is passed here;
        # widgets.BoundedFloatText would be needed to enforce the range.
        _, _, step_val = feature_ranges[col]
        input_widgets[col] = widgets.FloatText(
            step=step_val, description=col.replace('_', ' ').title(),
            value=df[col].median()
        )
    elif col in categorical_options:
        options = categorical_options[col]
        # Default each dropdown to the most frequent value in the data
        default_value = df[col].mode()[0]
        input_widgets[col] = widgets.Dropdown(
            options=options, description=col.replace('_', ' ').title(),
            value=default_value
        )
    else:
        print(f"Warning: No specific widget type defined for column: {col}")

# Verify all columns have a widget
if len(input_widgets) == len(x.columns):
    print(f"Successfully created {len(input_widgets)} input widgets for all features.")
else:
    print(f"Error: Expected {len(x.columns)} widgets, but created {len(input_widgets)}.")

# Display some widgets to confirm (optional)
# from IPython.display import display
# for key, widget in list(input_widgets.items())[:5]: # Display first 5 widgets
#     display(widget)

Successfully created 17 input widgets for all features.
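
Because the four engineered features are entered through their own widgets, a user can set values that contradict the raw inputs (for example, `chol` of 300 with `CholesterolRisk` set to "No Risk"). One way to avoid this is to derive them from the raw values before prediction. The sketch below mirrors the rules from the feature-engineering cell. The helper `derive_engineered_features` is hypothetical, and it treats every age of 21 or more as 'Senior', whereas the notebook's last bin ends at the maximum age in the data plus one.

```python
def derive_engineered_features(age, chol, trestbps, oldpeak, thalach):
    """Recompute the four engineered columns from raw inputs (hypothetical helper)."""
    # Same half-open lower bins as the pd.cut call: [0, 11) Child, [11, 21) Adult.
    # Assumption: any age of 21 or more counts as Senior.
    if age < 11:
        age_group = "Child"
    elif age < 21:
        age_group = "Adult"
    else:
        age_group = "Senior"
    return {
        "AgeGroup": age_group,
        "CholesterolRisk": int(chol >= 240),
        "BloodPressureRisk": int(trestbps >= 140),
        "exercise_related_heart_stress": oldpeak / thalach,
    }

# First row of the dataset as printed earlier: age 63, chol 233, trestbps 145
features = derive_engineered_features(age=63, chol=233, trestbps=145, oldpeak=2.3, thalach=150)
print(features)
```

With a helper like this, the UI would only need widgets for the 13 original features, and the remaining four columns could be filled in just before calling the model.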


## Arrange Widgets and Button

### Subtask:
Organize the input widgets and a 'Predict' button using `ipywidgets.VBox` or `HBox` for a structured UI layout.


**Reasoning**:
To organize the input widgets and a 'Predict' button, I will import the necessary `ipywidgets` classes (`VBox`, `HBox`, `Button`) and then arrange the previously created `input_widgets` along with a new 'Predict' button into a structured UI layout using `VBox`.



In [16]:
from ipywidgets import VBox, HBox, Button, Layout
from IPython.display import display

# Create a Predict button
predict_button = Button(
    description='Predict Heart Disease',
    disabled=False,
    button_style='', # 'success', 'info', 'warning', 'danger' or ''
    tooltip='Click to predict',
    icon='heart'
)

# Arrange widgets in a grid-like structure for better readability
# Grouping related features together

age_sex_cp = HBox([input_widgets['age'], input_widgets['sex'], input_widgets['cp']])
chol_bp_fbs = HBox([input_widgets['trestbps'], input_widgets['chol'], input_widgets['fbs']])
restecg_thalach_exang = HBox([input_widgets['restecg'], input_widgets['thalach'], input_widgets['exang']])
oldpeak_slope_ca_thal = HBox([input_widgets['oldpeak'], input_widgets['slope'], input_widgets['ca'], input_widgets['thal']])
engineered_features = HBox([input_widgets['AgeGroup'], input_widgets['CholesterolRisk'], input_widgets['BloodPressureRisk'], input_widgets['exercise_related_heart_stress']])

# Combine all groups into a vertical box
ui_layout = VBox([
    age_sex_cp,
    chol_bp_fbs,
    restecg_thalach_exang,
    oldpeak_slope_ca_thal,
    engineered_features
])

# Combine the arranged widgets and the predict button into the final UI
full_ui = VBox([ui_layout, predict_button])

print("Predict button and UI layout created.")
# Display the full UI (optional, will be displayed in the next step when connecting functionality)
# display(full_ui)

Predict button and UI layout created.


**Reasoning**:
The previous step created the UI layout and the predict button. Now, it's essential to display this created UI to the user to visually confirm its structure and readiness for interaction.



In [17]:
display(full_ui)
print("UI displayed.")

VBox(children=(VBox(children=(HBox(children=(IntSlider(value=56, continuous_update=False, description='Age', m…

UI displayed.


## Implement Prediction Logic and Display

### Subtask:
Write the logic to be executed when the 'Predict' button is clicked. This will involve retrieving values from the input widgets, passing them to the prediction function, and displaying the predicted probability and binary outcome in an `ipywidgets.Output` area. Ensure robust error handling for invalid inputs.


**Reasoning**:
To implement the prediction logic, I will first import the `Output` widget for displaying results and initialize an `output_area`. Then, I will define the `on_predict_button_clicked` function to handle data retrieval from widgets, call the prediction function, display results, and include error handling. Finally, I will link this function to the predict button's click event and update the UI to include the output area.



In [22]:
from ipywidgets import Output

# 1. & 2. Define an output area for displaying results
output_area = Output()

# 3. Create a function that will be called when the predict_button is clicked
def on_predict_button_clicked(b):
    with output_area: # Direct output to this widget
        output_area.clear_output() # Clear previous output
        try:
            # 4b. Collect the current values from each widget
            patient_data_dict = {}
            for col, widget in input_widgets.items():
                patient_data_dict[col] = widget.value

            # 4c. Convert dictionary to a Pandas Series
            # Ensure the order of features matches the x DataFrame used during model training
            patient_data_series = pd.Series(patient_data_dict, index=x.columns)

            # 4d. Call the predict_heart_disease function
            probability, binary_prediction = predict_heart_disease(patient_data_series)

            # 4e. Display the predicted probability and binary prediction
            print(f"Predicted Heart Disease Probability: {probability:.4f}")
            if binary_prediction == 1:
                print("Prediction: Heart Disease (Positive)")
            else:
                print("Prediction: No Heart Disease (Negative)")

        except Exception as e:
            # 4f. Implement error handling
            print(f"An error occurred: {e}")

# 5. Attach the on_predict_button_clicked function to the predict_button's on_click event
predict_button.on_click(on_predict_button_clicked)

# 6. Update the full_ui layout to include the output_area
full_ui = VBox([ui_layout, predict_button, output_area])

print("Prediction logic implemented, button linked, and UI updated.")
display(full_ui)

Prediction logic implemented, button linked, and UI updated.


VBox(children=(VBox(children=(HBox(children=(IntSlider(value=37, continuous_update=False, description='Age', m…
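
The click handler above relies on a single broad `except` to report problems. A more targeted option is to check the collected values before calling the model, so that the user sees which field is at fault. The sketch below shows one possible shape for such a check. The function `validate_patient_inputs` and the three-column schema in the example are hypothetical and not part of the notebook.

```python
def validate_patient_inputs(values, expected_columns):
    """Return a list of human-readable problems; an empty list means the inputs look usable."""
    problems = []
    for column in expected_columns:
        if column not in values:
            problems.append(f"missing value for '{column}'")
        elif values[column] is None:
            problems.append(f"'{column}' is empty")
    extra = sorted(set(values) - set(expected_columns))
    if extra:
        problems.append(f"unexpected fields: {', '.join(extra)}")
    return problems

# Hypothetical example with a three-column schema
expected = ["age", "chol", "thalach"]
print(validate_patient_inputs({"age": 54, "chol": None}, expected))
```

In the notebook, such a check could be called with `patient_data_dict` and `x.columns` at the start of the `try` block, printing the problems and returning early when the list is not empty.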

## Final Task

### Subtask:
Provide a summary of the UI and how to use it for making new predictions.


## Summary:

### Q&A
The UI allows users to make new predictions by adjusting various patient health parameters using interactive widgets and then clicking a "Predict Heart Disease" button. The results, including the predicted probability and binary outcome, are displayed directly within the UI.

To use the UI for making new predictions:
1.  **Adjust Input Parameters**: Utilize the `IntSlider`, `FloatText`, and `Dropdown` widgets to input new patient data. Each widget corresponds to a specific feature (e.g., 'age', 'sex', 'chol', 'AgeGroup', 'exercise_related_heart_stress').
2.  **Initiate Prediction**: Once all desired input values are set, click the "Predict Heart Disease" button.
3.  **View Results**: The predicted probability of heart disease and the binary classification derived from it (Heart Disease: Positive/Negative) appear in the output area below the button.

### Data Analysis Key Findings
*   The `ipywidgets` library was successfully imported and aliased as `widgets`.
*   A `predict_heart_disease` function was defined to take patient data (DataFrame or Series), apply the pre-trained `model_` for probability prediction, and then use a `custom_threshold` to return a binary outcome.
*   Interactive input widgets were successfully created for all 17 features of the dataset, including original features ('age', 'sex', 'chol', etc.) and engineered features ('AgeGroup', 'CholesterolRisk', 'BloodPressureRisk', 'exercise_related_heart_stress').
    *   Numerical features used `IntSlider` or `FloatText` widgets, with ranges and steps derived from the dataset's min/max values.
    *   Categorical features used `Dropdown` widgets with predefined options.
*   The widgets and a 'Predict Heart Disease' button were organized into a structured UI layout using `ipywidgets.VBox` and `HBox`, enhancing readability by grouping related features.
*   Prediction logic was implemented to:
    *   Collect values from all input widgets upon button click.
    *   Convert collected data into a Pandas Series, ensuring feature order matches the model's training data.
    *   Call the `predict_heart_disease` function.
    *   Display the predicted heart disease probability (formatted to four decimal places) and a clear binary prediction (e.g., "Prediction: Heart Disease (Positive)") in an `ipywidgets.Output` area.
    *   Catch any exception raised during prediction and print its message in the output area.

### Insights or Next Steps
*   The interactive UI lets a user vary individual patient parameters and immediately see the model's predicted probability and the thresholded classification.
*   Further enhancements could include visual indicators (e.g., color-coded output) for high/low risk, or the ability to save prediction results for comparative analysis.
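
As a starting point for the color-coded output mentioned above, the sketch below maps a predicted probability to a label and a color. The function `risk_label`, its labels, and its colors are hypothetical choices. The default cutoff of 0.4 matches the notebook's `custom_threshold`.

```python
def risk_label(probability, threshold=0.4):
    """Map a predicted probability to a (label, color) pair for display."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= threshold:
        return "Higher risk", "red"
    return "Lower risk", "green"

print(risk_label(0.79))
print(risk_label(0.06))
```

One possible use is to wrap the returned label in a colored HTML snippet and show it with an `ipywidgets.HTML` widget next to the numeric probability.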
