# Decision Tree | Assignment


#### Q.1) What is a Decision Tree, and how does it work in the context of classification?

Answer ->

Decision tree: 
A Decision Tree is a supervised machine learning algorithm that is structured like a flowchart or an upside-down tree. It's used for both classification (predicting a category) and regression (predicting a value) tasks.


It's one of the most intuitive algorithms because it mimics human decision-making. It breaks down a complex decision into a series of smaller, simpler questions.


🌳 How It Works for Classification
In classification, the goal is to predict which class a new piece of data belongs to. A decision tree does this by learning a set of "if-then" rules from the data it's trained on.


Here's the step-by-step process:

1. Start at the Root Node: The tree begins with a single node, called the root node, which represents the entire dataset.

2. Find the Best Split: The algorithm searches for the best feature (e.g., "Outlook," "Age," "Humidity") and the best value to split the data on. The "best" split is the one that most reduces class mixing in the resulting subsets, as measured by an impurity metric such as Gini Impurity or Entropy.

   For example, if predicting whether to play tennis, splitting by "Outlook" (Sunny, Overcast, Rain) might be the most informative first question.

3. Create Branches: This split creates new branches, each leading to a child node. A child that still contains a mix of classes becomes an internal node (or decision node), which tests another feature.

4. Repeat Recursively: The algorithm repeats Steps 2 and 3 for each new internal node, splitting the data into smaller and smaller subsets.

5. Reach Leaf Nodes: This process stops when a node is "pure" (meaning all data points in it belong to a single class) or when a predefined stopping condition is met (like a maximum tree depth). These final nodes are called leaf nodes.

6. Assign Class Labels: Each leaf node is assigned the class label that is most common among the data points that ended up there. To classify a new sample, the tree routes it from the root down the branches that match its feature values and returns the label of the leaf it reaches.
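To make the "if-then" idea concrete, here is a minimal sketch of how a finished tree classifies a sample. The tree is written by hand for the play-tennis example rather than learned from data, and the `Humidity` and `Wind` branches are assumptions added for illustration.

```python
# A hand-written tree for the "play tennis" example. Internal nodes are dicts
# that test one feature; leaves are plain class labels (strings).
# The structure and feature values are illustrative, not learned from data.
TENNIS_TREE = {
    "feature": "Outlook",
    "branches": {
        "Sunny": {
            "feature": "Humidity",
            "branches": {"High": "No", "Normal": "Yes"},
        },
        "Overcast": "Yes",
        "Rain": {
            "feature": "Wind",
            "branches": {"Strong": "No", "Weak": "Yes"},
        },
    },
}


def predict(tree, sample):
    """Follow the branches that match the sample until a leaf label is reached."""
    node = tree
    while isinstance(node, dict):
        value = sample[node["feature"]]   # answer to this node's question
        node = node["branches"][value]    # move down the matching branch
    return node


sunny_day = {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}
overcast_day = {"Outlook": "Overcast", "Humidity": "High", "Wind": "Strong"}
print(predict(TENNIS_TREE, sunny_day))     # No
print(predict(TENNIS_TREE, overcast_day))  # Yes
```

Each path from the root to a leaf reads as one rule, for example "if Outlook is Sunny and Humidity is High, then No."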

#### Q.2) Explain the concepts of Gini Impurity and Entropy as impurity measures. How do they impact the splits in a Decision Tree?

Answer ->

In a decision tree, both Gini Impurity and Entropy are metrics used to measure the "impurity" or "disorder" of a node.

Think of impurity as how mixed-up the classes are in a single node.

A pure node (low impurity) has data from only one class (e.g., 10 "Spam" emails, 0 "Not Spam"). This is the goal.

An impure node (high impurity) has a mix of classes (e.g., 5 "Spam" emails, 5 "Not Spam"). This is what the tree tries to fix.

The algorithm's main job is to find splits that reduce impurity as much as possible.

* Gini Impurity: 

Gini Impurity measures the probability of incorrectly classifying a randomly chosen element in the node, if it were randomly labeled according to the class distribution in that node.

Range: For a binary (2-class) problem, the Gini score is between 0 and 0.5.

0 (Pure): The node is perfectly pure (e.g., 100% "Spam"). The probability of misclassification is 0.

0.5 (Impure): The node is maximally impure (e.g., 50% "Spam," 50% "Not Spam").

The formula is:

$$Gini = 1 - \sum_{i=1}^{C} (p_i)^2$$

Where $p_i$ is the probability (or fraction) of class $i$ in the node.

Example (10 data points):

Pure Node: 10 "Spam," 0 "Not Spam"

$p_{spam} = 1.0$, $p_{not\_spam} = 0.0$

$$Gini = 1 - ( (1.0)^2 + (0.0)^2 ) = 1 - 1 = 0$$

Impure Node: 5 "Spam," 5 "Not Spam"

$p_{spam} = 0.5$, $p_{not\_spam} = 0.5$

$$Gini = 1 - ( (0.5)^2 + (0.5)^2 ) = 1 - (0.25 + 0.25) = 1 - 0.5 = 0.5$$
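The Gini formula can be checked with a few lines of Python. This is a minimal sketch; the function name `gini_impurity` and the choice to pass per-class counts are assumptions made for illustration.

```python
def gini_impurity(class_counts):
    """Gini impurity of a node, given the number of samples in each class."""
    total = sum(class_counts)
    if total == 0:
        return 0.0  # an empty node has nothing to misclassify
    # 1 minus the sum of squared class fractions
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


print(gini_impurity([10, 0]))  # pure node: 0.0
print(gini_impurity([5, 5]))   # evenly mixed node: 0.5
```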

* Entropy:

Entropy is a concept from information theory that measures the amount of "uncertainty" or "disorder" in a set.
Range: For a binary (2-class) problem, Entropy is between 0 and 1.

0 (Pure): The node is perfectly pure (e.g., 100% "Spam"). There is no uncertainty.

1 (Impure): The node is maximally impure (e.g., 50% "Spam," 50% "Not Spam"). You have the least information and maximum uncertainty.

The formula is:

$$Entropy = - \sum_{i=1}^{C} p_i \log_2(p_i)$$

Where $p_i$ is the probability of class $i$.

Example (10 data points):

Pure Node: 10 "Spam," 0 "Not Spam"

$p_{spam} = 1.0$, $p_{not\_spam} = 0.0$

$$Entropy = - [ (1.0 \cdot \log_2(1.0)) + (0 \cdot \log_2(0)) ] = 0$$

(Note: $0 \cdot \log_2(0)$ is treated as 0)

Impure Node: 5 "Spam," 5 "Not Spam"

$p_{spam} = 0.5$, $p_{not\_spam} = 0.5$

$$Entropy = - [ (0.5 \cdot \log_2(0.5)) + (0.5 \cdot \log_2(0.5)) ] = - [ (0.5 \cdot (-1)) + (0.5 \cdot (-1)) ] = 1$$
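The same two nodes can be checked in code. This is a minimal sketch (the function name `entropy` and the count-based interface are illustrative assumptions). Classes with zero samples are skipped, which implements the convention that $0 \cdot \log_2(0)$ is treated as 0.

```python
from math import log2


def entropy(class_counts):
    """Shannon entropy (in bits) of a node, given per-class sample counts."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    result = 0.0
    for count in class_counts:
        if count > 0:  # skip empty classes: 0 * log2(0) is treated as 0
            p = count / total
            result += -p * log2(p)
    return result


print(entropy([10, 0]))  # pure node: 0.0
print(entropy([5, 5]))   # evenly mixed node: 1.0
```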

#### Q.3) What is the difference between Pre-Pruning and Post-Pruning in Decision Trees? Give one practical advantage of using each.

Answer ->

Pruning is a technique used to reduce the size of a decision tree by removing sections of the tree that are non-essential, helping to reduce overfitting and improve the model's ability to generalize to new data.

The two main types are Pre-Pruning and Post-Pruning, and their core difference is when the simplification happens: while the tree is being built, or after it has been fully grown.

-  Pre-Pruning (Early Stopping)
This method stops the tree's growth during the training process, before it's fully built.

It works by setting a "stopping rule" or a limit. If a new split doesn't meet a certain threshold, the algorithm just stops and turns that node into a leaf. Common rules include:

Maximum Depth: Stop splitting once the tree reaches a certain number of levels.

Minimum Samples per Leaf: Reject a split if it would leave any child node with fewer than a specified number of data points.

Minimum Improvement: Stop splitting if the split doesn't reduce the impurity (Gini or Entropy) by at least a certain amount.

Practical Advantage: Efficiency

Its main advantage is speed and computational efficiency. By not bothering to build branches that it will likely throw away, pre-pruning saves a significant amount of training time and resources. This is very useful for large datasets.

-  Post-Pruning (Trimming)
This method allows the tree to grow to its maximum, complex size first, letting it fully overfit the training data. After the tree is built, it goes back and "prunes" (trims) branches that don't add significant predictive power.

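Two common methods illustrate the idea. Reduced Error Pruning checks whether replacing a whole subtree with a leaf (turning an internal node into a leaf) hurts performance on a separate validation dataset; if the simpler, trimmed tree performs just as well or better, the branch is permanently removed. Cost Complexity Pruning instead adds a penalty proportional to the number of leaves and removes the subtrees whose reduction in training error is too small to justify their size, with the penalty strength usually chosen by validation or cross-validation.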

Practical Advantage: Accuracy

Its main advantage is often higher accuracy and better generalization. By letting the tree grow fully, it can see the "whole picture." It avoids a problem called the "horizon effect," where pre-pruning might stop a split that looks weak, even if it would have led to very good, informative splits further down the line.
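Both styles can be tried side by side in scikit-learn. The sketch below assumes the Iris dataset as a stand-in and uses arbitrary illustrative settings: `max_depth`, `min_samples_leaf`, and `min_impurity_decrease` for pre-pruning, and `ccp_alpha` (cost-complexity pruning) for post-pruning. The alpha value is simply taken from the middle of the pruning path; in practice it would be chosen by cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Reference: an unpruned tree that grows until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruning: stopping rules applied while the tree is being grown.
pre_pruned = DecisionTreeClassifier(
    max_depth=3,                 # maximum depth
    min_samples_leaf=5,          # minimum samples per leaf
    min_impurity_decrease=0.01,  # minimum improvement per split
    random_state=42,
).fit(X_train, y_train)

# Post-pruning: grow fully, then prune with a cost-complexity penalty.
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
# Arbitrary illustrative choice: the middle alpha on the pruning path.
alpha = max(path.ccp_alphas[len(path.ccp_alphas) // 2], 0.0)
post_pruned = DecisionTreeClassifier(
    ccp_alpha=alpha, random_state=42
).fit(X_train, y_train)

for name, model in [
    ("full", full_tree),
    ("pre-pruned", pre_pruned),
    ("post-pruned", post_pruned),
]:
    print(
        f"{name:12s} depth={model.get_depth()} "
        f"nodes={model.tree_.node_count} "
        f"test accuracy={model.score(X_test, y_test):.4f}"
    )
```

Comparing node counts and test accuracy across the three models shows how much structure each pruning style removes and what it costs, if anything, in accuracy.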

#### Q.4) What is Information Gain in Decision Trees, and why is it important for choosing the best split?

Answer ->

Information Gain is the metric a decision tree uses to measure how much "purity" it gains (or how much "uncertainty" it reduces) by splitting the data on a particular feature.

In simple terms, it's the reduction in impurity (which you know as Gini Impurity or Entropy) that a split provides.

---- Why It's Important for Choosing the Best Split:

A decision tree is a "greedy" algorithm, meaning it wants to make the best possible decision at every single step. Information Gain is the tool it uses to do this.

Here's the process for picking the "best" split at any node:

Calculate Parent Impurity: First, the algorithm calculates the impurity (e.g., Entropy) of the current, unsplit node. This is its baseline "messiness."

Test All Possible Splits: The algorithm then "previews" every possible split it could make.

For a feature like "Outlook" (Sunny, Overcast, Rain), it checks the split on "Outlook."

For a numeric feature like "Temperature," it checks candidate thresholds, such as whether Temperature > 70°.

Calculate Child Impurity: For each previewed split, it calculates the weighted average impurity of the new child nodes that would be created.

Calculate the Gain: It then finds the Information Gain for that specific split using this formula:

Information Gain = Impurity(Parent) - Weighted Average Impurity(Children)

Choose the Winner: After calculating the Information Gain for all possible splits, the algorithm simply chooses the split that produced the highest Information Gain.
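The steps above can be sketched in a few lines of Python. The class counts below come from the classic 14-row play-tennis toy dataset (9 "Yes," 5 "No"), which is an assumption brought in for illustration; the function names are likewise illustrative.

```python
from math import log2


def entropy(class_counts):
    """Shannon entropy (in bits) from per-class sample counts."""
    total = sum(class_counts)
    return sum(-(c / total) * log2(c / total) for c in class_counts if c > 0)


def information_gain(parent_counts, children_counts):
    """Impurity(Parent) minus the weighted average impurity of the children."""
    total = sum(parent_counts)
    weighted_children = sum(
        (sum(child) / total) * entropy(child) for child in children_counts
    )
    return entropy(parent_counts) - weighted_children


parent = [9, 5]  # [Yes, No] counts before splitting

# Candidate splits: [Yes, No] counts in each child node.
candidate_splits = {
    "Outlook": [[2, 3], [4, 0], [3, 2]],  # Sunny, Overcast, Rain
    "Wind": [[6, 2], [3, 3]],             # Weak, Strong
}

gains = {
    feature: information_gain(parent, children)
    for feature, children in candidate_splits.items()
}
for feature, gain in gains.items():
    print(f"{feature}: information gain = {gain:.4f}")

# The greedy choice: the split with the highest gain.
best_feature = max(gains, key=gains.get)
print("Best split:", best_feature)
```

Here "Outlook" wins with a gain of about 0.247 bits, against about 0.048 for "Wind," so the tree would split on "Outlook" first.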

#### Q.5) What are some common real-world applications of Decision Trees, and what are their main advantages and limitations?

Answer ->

Common Real-World Applications

Here are some of the most common ways decision trees are used:

Healthcare (Medical Diagnosis): Doctors can use them as a diagnostic tool. The tree asks a series of questions based on symptoms, lab results, and patient history (e.g., "Is body temperature > 101°F?", "Does the patient have a cough?") to suggest a potential diagnosis.

Finance (Credit Scoring & Fraud Detection):

Credit Scoring: Banks use decision trees to determine if a loan applicant is a high or low credit risk. The tree branches on factors like income, age, credit history, and loan amount.

Fraud Detection: They can classify financial transactions as "Legitimate" or "Fraudulent" in real-time by analyzing factors like transaction amount, location, and user's purchase history.

#### Q.6) Write a Python program to:

● Load the Iris Dataset

● Train a Decision Tree Classifier using the Gini criterion

● Print the model's accuracy and feature importances

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def train_iris_decision_tree():
    """
    Loads the Iris dataset, trains a Decision Tree Classifier,
    and prints the model's accuracy and feature importances.
    """
    
    # 1. Load the Iris Dataset
    iris = load_iris()
    X = iris.data
    y = iris.target
    feature_names = iris.feature_names
    
    print("Dataset loaded successfully.\n")

    # 2. Split the data into training and testing sets
    # We use a 70/30 split and a random_state for reproducibility
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )
    
    # 3. Train a Decision Tree Classifier using the Gini criterion
    # We explicitly set criterion='gini' (which is also the default)
    # We set random_state for reproducibility of the tree's construction
    clf = DecisionTreeClassifier(criterion='gini', random_state=42)
    
    print("Training Decision Tree with Gini criterion...")
    clf.fit(X_train, y_train)
    print("Training complete.\n")

    # 4. Make predictions on the test set
    y_pred = clf.predict(X_test)

    # 5. Print the model's accuracy
    accuracy = accuracy_score(y_test, y_pred)
    print(f"--- Model Performance ---")
    print(f"Accuracy on the test set: {accuracy:.4f} (or {accuracy*100:.2f}%)")
    print("\n")

    # 6. Print the feature importances
    # Feature importances show how much each feature contributed to
    # reducing the Gini impurity in the tree.
    importances = clf.feature_importances_
    
    # Create a pandas Series for easier viewing
    feature_importance_series = pd.Series(
        importances, index=feature_names
    ).sort_values(ascending=False)

    print("--- Feature Importances ---")
    print(feature_importance_series)
    print("\n")
    
    # Interpretation, derived from the computed values rather than hard-coded
    top_features = feature_importance_series.index[:2].tolist()
    print("Interpretation:")
    print(f"The model classified {accuracy*100:.2f}% of the test samples correctly.")
    print(f"The two most important features were '{top_features[0]}' and '{top_features[1]}'.")

if __name__ == "__main__":
    train_iris_decision_tree()


#### Q.7) Write a Python program to:

● Load the Iris Dataset

● Train a Decision Tree Classifier with max_depth=3 and compare its accuracy to a fully-grown tree.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import numpy as np

def compare_tree_depths():
    """
    Loads the Iris dataset, splits it, trains two Decision Tree
    Classifiers (one fully-grown, one with max_depth=3),
    and compares their accuracy.
    """
    
    # 1. Load the Iris Dataset
    iris = load_iris()
    X = iris.data
    y = iris.target
    
    print("Dataset loaded successfully.\n")

    # 2. Split the data into training and testing sets
    # We use a 70/30 split.
    # We set random_state=42 to ensure that both models
    # are trained and tested on the *exact same* data split,
    # which is essential for a fair comparison.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )
    
    print("Data split into 70% train (105 samples) and 30% test (45 samples).\n")

    # --- Model 1: Fully-Grown Decision Tree ---
    # We don't specify max_depth, so the tree can grow
    # as deep as it needs to, which risks overfitting.
    # We set random_state=42 for reproducible results.
    clf_full = DecisionTreeClassifier(random_state=42)
    
    print("Training fully-grown tree...")
    clf_full.fit(X_train, y_train)
    
    # Evaluate the fully-grown tree
    y_pred_full = clf_full.predict(X_test)
    acc_full = accuracy_score(y_test, y_pred_full)
    
    print("Training complete.\n")

    # --- Model 2: Pre-Pruned Decision Tree (max_depth=3) ---
    # We set max_depth=3, which is a form of pre-pruning.
    # This stops the tree from growing past 3 levels.
    clf_pruned = DecisionTreeClassifier(max_depth=3, random_state=42)
    
    print("Training pre-pruned (max_depth=3) tree...")
    clf_pruned.fit(X_train, y_train)
    
    # Evaluate the pruned tree
    y_pred_pruned = clf_pruned.predict(X_test)
    acc_pruned = accuracy_score(y_test, y_pred_pruned)
    
    print("Training complete.\n")
    
    # --- 4. Print the Comparison ---
    print("--- Model Accuracy Comparison ---")
    print(f"Fully-Grown Tree Depth: {clf_full.get_depth()} levels")
    print(f"Fully-Grown Tree Accuracy: {acc_full:.4f} (or {acc_full*100:.2f}%)")
    print("-" * 30)
    print(f"Pre-Pruned Tree Depth: {clf_pruned.get_depth()} levels")
    print(f"Pre-Pruned (max_depth=3) Tree Accuracy: {acc_pruned:.4f} (or {acc_pruned*100:.2f}%)")
    print("\n")
    
    print("--- Conclusion ---")
    if acc_full == acc_pruned:
        print("In this specific case, both models achieved the same accuracy.")
        print("This is common on the simple Iris dataset.")
        print("The max_depth=3 tree is simpler and just as effective, making it the better model.")
    elif acc_pruned > acc_full:
        print("The pre-pruned (max_depth=3) tree performed *better*.")
        print("This suggests the fully-grown tree was overfitting the training data.")
    else:
        print("The fully-grown tree performed *better*.")
        print("This suggests max_depth=3 is too shallow for this split and underfits.")


if __name__ == "__main__":
    compare_tree_depths()

#### Q.8) Write a Python program to:

● Load the Boston Housing Dataset

● Train a Decision Tree Regressor

● Print the Mean Squared Error (MSE) and feature importances

import pandas as pd
import numpy as np
from sklearn.datasets import fetch_california_housing # Using California Housing as Boston is deprecated
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

def train_housing_regressor():
    """
    Loads the California Housing dataset (in place of Boston Housing),
    trains a Decision Tree Regressor, and prints the model's
    Mean Squared Error (MSE) and feature importances.
    """
    
    # 1. Load the Dataset
    # Note: load_boston() is deprecated and removed from scikit-learn.
    # We are using fetch_california_housing() as the modern alternative.
    housing = fetch_california_housing()
    X = housing.data
    y = housing.target
    feature_names = housing.feature_names
    
    print("Loaded California Housing dataset (as Boston Housing is deprecated/removed).\n")

    # 2. Split the data into training and testing sets
    # This is crucial for getting a meaningful MSE on unseen data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )
    
    print(f"Split data into {len(X_train)} training samples and {len(X_test)} test samples.\n")

    # 3. Train a Decision Tree Regressor
    # We set random_state=42 for reproducible results
    regressor = DecisionTreeRegressor(random_state=42)
    
    print("Training Decision Tree Regressor...")
    regressor.fit(X_train, y_train)
    print("Training complete.\n")

    # 4. Make predictions on the test set
    y_pred = regressor.predict(X_test)

    # 5. Print the Mean Squared Error (MSE)
    mse = mean_squared_error(y_test, y_pred)
    rmse = np.sqrt(mse) # Get Root Mean Squared Error for easier interpretation

    print("--- Model Performance ---")
    print(f"Mean Squared Error (MSE) on test set: {mse:.4f}")
    print(f"Root Mean Squared Error (RMSE) on test set: {rmse:.4f}")
    print("(Note: RMSE is in the same unit as the target, $100,000s)\n")

    # 6. Print the feature importances
    importances = regressor.feature_importances_
    
    # Create a pandas Series for easier viewing, sorted by importance
    feature_importance_series = pd.Series(
        importances, index=feature_names
    ).sort_values(ascending=False)

    print("--- Feature Importances ---")
    print(feature_importance_series)
    print("\n")
    
    print("Interpretation:")
    print("The feature importances show which features (like MedInc - median income)")
    print("were the most decisive for predicting the housing price.")


if __name__ == "__main__":
    train_housing_regressor()

#### Q.9) Write a Python program to:

● Load the Iris Dataset

● Tune the Decision Tree's max_depth and min_samples_split using GridSearchCV

● Print the best parameters and the resulting model accuracy

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def tune_iris_decision_tree():
    """
    Loads the Iris dataset, tunes a Decision Tree's hyperparameters
    using GridSearchCV, and prints the best parameters and final accuracy.
    """
    
    # 1. Load the Iris Dataset
    iris = load_iris()
    X = iris.data
    y = iris.target
    
    print("Dataset loaded successfully.\n")

    # 2. Split the data into training and testing sets
    # We use the training set for tuning and the test set
    # for a final, unbiased evaluation.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )
    
    print(f"Split data into {len(X_train)} training samples and {len(X_test)} test samples.\n")

    # 3. Define the parameter grid to search
    # This grid includes different values to try for
    # 'max_depth' and 'min_samples_split'.
    param_grid = {
        'max_depth': [2, 3, 4, 5, None],  # 'None' means the tree grows fully
        'min_samples_split': [2, 5, 10, 15]
    }
    
    print("Parameter grid to search:")
    print(param_grid)
    print("\n")

    # 4. Initialize the Decision Tree and GridSearchCV
    # We use a base DecisionTreeClassifier (random_state for reproducibility)
    # and wrap it in GridSearchCV.
    dt_classifier = DecisionTreeClassifier(random_state=42)
    
    # cv=5 means 5-fold cross-validation
    # scoring='accuracy' means we optimize for accuracy
    # n_jobs=-1 uses all available CPU cores to speed up the search
    grid_search = GridSearchCV(
        estimator=dt_classifier,
        param_grid=param_grid,
        cv=5,
        scoring='accuracy',
        n_jobs=-1
    )

    # 5. Run the Grid Search on the training data
    print("Running GridSearchCV... (This may take a moment)")
    grid_search.fit(X_train, y_train)
    print("Tuning complete.\n")

    # 6. Print the best parameters found
    print("--- Best Hyperparameters Found ---")
    print(grid_search.best_params_)
    print("\n")
    
    # 7. Print the resulting model accuracy
    # GridSearchCV automatically finds the best model and re-trains
    # it on the *entire* training set. We can access this
    # model with grid_search.best_estimator_
    best_model = grid_search.best_estimator_
    
    # We now evaluate this single best model on our *test set*
    # to get a final, unbiased performance score.
    y_pred = best_model.predict(X_test)
    final_accuracy = accuracy_score(y_test, y_pred)
    
    # You can also see the best cross-validation score from the training phase
    best_cv_score = grid_search.best_score_
    
    print("--- Model Performance ---")
    print(f"Best cross-validation accuracy (on training data): {best_cv_score:.4f}")
    print(f"Final accuracy on the test set: {final_accuracy:.4f} (or {final_accuracy*100:.2f}%)")

if __name__ == "__main__":
    tune_iris_decision_tree()

#### Q.10) Imagine you're working as a data scientist for a healthcare company that wants to predict whether a patient has a certain disease. You have a large dataset with mixed data types and some missing values. Explain the step-by-step process you would follow to:

● Handle the missing values

● Encode the categorical features

● Train a Decision Tree model

● Tune its hyperparameters

● Evaluate its performance

And describe what business value this model could provide in the real-world setting.


Answer ->

Project Plan: Patient Disease Prediction Model
Objective: To develop a reliable machine learning model to predict the presence or absence of a specific disease in patients, using a large dataset with mixed data types and missing values. We will use a Decision Tree classifier as the core algorithm.

Step 1: Data Preprocessing (Handling Missing Values & Encoding)
The raw dataset cannot be fed directly into a model. We must first clean and transform it.

Handle Missing Values:

Numerical Features (e.g., Age, Blood_Pressure, Cholesterol):

Strategy: Impute using the Median. We choose the median over the mean because it is robust to outliers (e.g., a few extremely high blood pressure readings won't skew the fill value).

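Example: If 5% of Cholesterol readings are missing, we will calculate the median cholesterol and fill the missing spots with that value. The median should be computed on the training portion of the data only and then applied to the test portion, so that no test-set information leaks into preprocessing.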

Categorical Features (e.g., Blood_Type, Symptom_Severity):

Strategy: Impute using the Mode (the most frequent value).

Alternative: If the "missing-ness" itself is predictive (e.g., a "missing" test result means the test was never ordered), we will create a new category called 'Unknown' or 'Missing'. This treats the absence of data as its own feature.

Encode Categorical Features: The DecisionTreeClassifier in scikit-learn requires all inputs to be numeric. We will encode our categorical features as follows:

Nominal Features (no inherent order):

Features: Gender, Blood_Type.

Method: One-Hot Encoding. This creates new binary (0/1) columns for each category. For example, Gender would be split into two columns: Gender_Male and Gender_Female. This prevents the model from incorrectly assuming that one category is "greater than" another.

Ordinal Features (a clear order exists):

Features: Symptom_Severity (e.g., 'Low', 'Medium', 'High').

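Method: Ordinal Encoding with an explicitly specified category order. We will map these to integers that preserve their order, such as Low=0, Medium=1, High=2. A plain label encoder is not suitable here because it assigns integers alphabetically (High=0, Low=1, Medium=2), which scrambles the order.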
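The imputation and encoding choices in Step 1 can be combined into one scikit-learn preprocessing object. The sketch below is one possible arrangement, not a definitive implementation: the six-row patient table is made up, and the column names are the hypothetical ones used in this plan. For brevity it fits on the whole toy table; in a real project the transformer would be fitted on the training split only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Tiny made-up patient table with missing values (np.nan) in every column.
patients = pd.DataFrame({
    "Age": [34, 51, np.nan, 62, 45, 29],
    "Cholesterol": [180, np.nan, 240, 210, 195, 170],
    "Gender": ["Male", "Female", "Female", np.nan, "Male", "Female"],
    "Blood_Type": ["A", "O", "B", "A", np.nan, "O"],
    "Symptom_Severity": ["Low", "High", "Medium", np.nan, "Low", "Medium"],
})

# Numerical features: fill gaps with the median.
numeric = SimpleImputer(strategy="median")

# Nominal features: fill gaps with the mode, then one-hot encode.
nominal = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# Ordinal features: fill gaps with the mode, then map Low=0, Medium=1, High=2.
ordinal = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OrdinalEncoder(categories=[["Low", "Medium", "High"]])),
])

preprocess = ColumnTransformer(
    [
        ("num", numeric, ["Age", "Cholesterol"]),
        ("nom", nominal, ["Gender", "Blood_Type"]),
        ("ord", ordinal, ["Symptom_Severity"]),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

X = preprocess.fit_transform(patients)
print(X.shape)  # 2 numeric + 2 Gender + 3 Blood_Type + 1 ordinal columns
print(X)
```

Wrapping the steps in a `ColumnTransformer` means the same fitted medians, modes, and category mappings are reused on the test set, which keeps preprocessing consistent between training and evaluation.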

Step 2: Train a Baseline Decision Tree Model
Before we tune, we need a baseline to know if our efforts are working.

Split the Data: We will split our preprocessed data into two sets:

80% Training Set: Used to train the model and for hyperparameter tuning.

20% Test Set: "Locked away" and used only once at the very end for an unbiased evaluation.

Train Baseline Model: We will train a DecisionTreeClassifier on the training set with all its default parameters. This tree will almost certainly be overfit (i.e., it will grow to be perfectly accurate on the training data but will perform poorly on new data).

Step 3: Tune Hyperparameters with GridSearchCV
Our goal is to prevent overfitting by "pruning" the tree. We will use GridSearchCV to test all combinations of key parameters and find the best-performing set.

Define Parameter Grid: We will create a "grid" of parameters to test:

criterion: ['gini', 'entropy'] (The two impurity measures).

max_depth: [3, 5, 7, 10] (Controls how deep the tree can grow).

min_samples_split: [10, 20, 40] (The minimum number of patients in a node required to split it further).

min_samples_leaf: [5, 10, 20] (The minimum number of patients allowed in a final "leaf" node).

Run Grid Search: GridSearchCV will use 5-fold cross-validation on our training set. It will automatically:

Split the 80% training set into 5 "folds."

Train on 4 folds and validate on the 5th, for every single parameter combination.

Rotate this process 5 times.

Identify the parameter combination (e.g., max_depth=5, min_samples_leaf=10) that had the best average performance.

Get Best Model: The output will be our "best" tuned model, trained and ready.

Step 4: Evaluate Model Performance (The Right Way)
This is the most critical step, especially in healthcare. A simple accuracy score is not enough.

Use the Test Set: We will take our "best" tuned model from Step 3 and make predictions on the 20% test set (which the model has never seen).

Generate a Confusion Matrix: This is our primary tool.

True Positives (TP): Model predicted "Disease," and the patient has it. (Good)

True Negatives (TN): Model predicted "No Disease," and the patient doesn't have it. (Good)

False Positives (FP): Model predicted "Disease," but the patient doesn't have it. (Type I Error)

False Negatives (FN): Model predicted "No Disease," but the patient has it. (Type II Error)

Analyze Key Metrics:

Accuracy: (TP+TN) / Total. The percentage of correct predictions.

Precision: TP / (TP+FP). "Of all patients we predicted have the disease, how many actually do?" High precision reduces unnecessary panic and follow-up tests (lowers FPs).

Recall (Sensitivity): TP / (TP+FN). "Of all patients who actually have the disease, how many did we find?" This is our most important metric. A "False Negative" (a sick patient told they are healthy) is the worst possible outcome in diagnostics. We must maximize Recall.

F1-Score: The harmonic mean of Precision and Recall. A great single score for balancing both.

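ROC Curve and AUC: The ROC curve plots the true positive rate against the false positive rate across all decision thresholds, and the area under it (AUC) summarizes in one number how well the model distinguishes between the two classes.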
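The count-based metrics above follow directly from the four confusion-matrix cells. Here is a minimal sketch; the counts are invented for illustration and do not come from any real patient data.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical test-set results for 100 patients.
metrics = classification_metrics(tp=40, tn=45, fp=10, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, accuracy is 0.85 while recall is about 0.89: five sick patients were still missed, which is the number a diagnostic model should be judged on first.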

Real-World Business Value
This model, once properly validated, provides immense value beyond a simple prediction.

Improved Patient Outcomes (Clinical Value):

Early Detection: The model can act as an early warning system, flagging at-risk patients before their symptoms become severe. This directly leads to better prognoses and saved lives.

Reducing False Negatives: Our focus on maximizing Recall means the model is optimized to "catch" as many sick patients as possible, reducing the chance that a person with the disease is mistakenly sent home.

Operational Efficiency (Decision Support):

Patient Triage: The model provides a risk score that helps clinicians prioritize. A patient with a 95% predicted risk can be fast-tracked to a specialist, while a 5% risk patient can be scheduled for a routine check-up.

Resource Allocation: Helps hospitals allocate limited resources (like specific diagnostic machines or specialist time) to the patients who need them most.

Cost Reduction (Financial Value):

Optimizing Diagnostics: By managing Precision, the model helps reduce the number of "False Positives." This saves the healthcare system (and the patient) money by avoiding expensive, invasive, and unnecessary follow-up tests on healthy individuals.

Research & Discovery (Strategic Value):

Explainability: A key advantage of Decision Trees is that the learned rules can be inspected directly. The feature_importances_ attribute ranks which factors the model found most predictive (e.g., Cholesterol, Blood_Pressure, Blood_Type), and the tree's split rules show the actual thresholds it uses (e.g., Cholesterol > 200, Blood_Pressure > 140). This can provide new medical insights and guide future research.
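As a rough illustration of this kind of inspection, the sketch below prints the split rules and feature importances of a small tree. It uses the Iris dataset as a stand-in, since no patient dataset is available here, and the `max_depth=2` setting is an arbitrary choice to keep the output short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=42)
clf.fit(iris.data, iris.target)

# Human-readable if-then rules, including the learned thresholds.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)

# Per-feature importance scores (they sum to 1 for a tree with at least one split).
importances = dict(zip(iris.feature_names, clf.feature_importances_))
for name, importance in importances.items():
    print(f"{name}: {importance:.3f}")
```

On a clinical dataset the same two calls would list thresholds on features such as cholesterol or blood pressure, which clinicians can then review for plausibility.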