# Personality Prediction Project: Understanding People with Machine Learning

## 1. Introduction
Welcome to this project! We're going to explore how machine learning can help us understand different personality types. We'll use three simple but powerful methods: Decision Trees, Random Forests, and AdaBoost. Don't worry if these sound complicated; we'll explain everything clearly. Our main goal is to build models that can predict a person's personality type based on some characteristics, and then see which method works best.

## 2. Our Data: Understanding Personality
We're working with a dataset called `Data.csv`. This file contains information about many different people. Each person is described by various traits, such as how much social energy they have, whether they prefer alone time, how talkative they are, and many more. The most important piece of information for us is their `personality_type`, which is what we want to predict.

### What Our Data Looks Like
Our `Data.csv` file has columns representing different traits. For example:
*   `social_energy`: How much energy a person gets from social interactions.
*   `alone_time_preference`: How much a person enjoys being alone.
*   `talkativeness`: How much a person talks.
*   ...and many more traits.

The `personality_type` column tells us if a person is an 'Extrovert', 'Introvert', or 'Ambivert'. This is what our models will try to guess.
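
The decision tree and AdaBoost classes later in this notebook work with integer class indices internally, so the text labels are mapped to integers in sorted order before training and mapped back when predicting. Here is a minimal sketch of that mapping in plain Python. The helper name `build_label_maps` and the four example labels are made up for illustration.

```python
def build_label_maps(labels):
    """Map each distinct label to an integer index (sorted order) and back."""
    classes = sorted(set(labels))
    label_to_int = {label: i for i, label in enumerate(classes)}
    int_to_label = {i: label for label, i in label_to_int.items()}
    return label_to_int, int_to_label


# Illustrative labels only; the real values come from the personality_type column.
labels = ["Introvert", "Extrovert", "Ambivert", "Introvert"]
label_to_int, int_to_label = build_label_maps(labels)

encoded = [label_to_int[label] for label in labels]   # text -> integers
decoded = [int_to_label[i] for i in encoded]          # integers -> text

print(label_to_int)
print(encoded)
print(decoded)
```

Sorting the labels keeps the mapping stable from run to run, which matters when the same mapping is used to translate predictions back into personality types.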

### Preparing Our Data
Before we can teach our computers to predict personality types, we need to prepare the data. This involves a few steps:
*   **Separating Traits and Personality Type**: We'll separate all the trait columns (our 'features') from the `personality_type` column (our 'target').
*   **Splitting the Data**: We'll divide our data into three parts: 
    *   **Training Data**: This is what our models will learn from.
    *   **Validation Data**: We'll use this to fine-tune our models and pick the best one.
    *   **Test Data**: This is like a final exam for our models. It's data the models never see during training or tuning, so it gives us an honest estimate of how well they work on new people.
    We'll split the data so that about 60% is for training, 30% for validation, and 10% for testing. Keeping the validation and test sets separate means the scores we report are not inflated by data the models have already learned from.
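
A 60/30/10 split can be produced with two successive splits: first hold out 40% of the data, then send a quarter of that held-out part to the test set. The sketch below only checks that arithmetic. It assumes those two fractions (0.4 and 0.25), and the helper name `two_stage_split_fractions` is made up for illustration.

```python
def two_stage_split_fractions(first_holdout, second_holdout):
    """Return (train, validation, test) as fractions of the full dataset.

    first_holdout:  share of all rows held out by the first split
    second_holdout: share of the held-out rows that go to the test set
    """
    train = 1.0 - first_holdout
    test = first_holdout * second_holdout
    validation = first_holdout - test
    return train, validation, test


train_frac, val_frac, test_frac = two_stage_split_fractions(0.4, 0.25)
print(f"train={train_frac:.2f} validation={val_frac:.2f} test={test_frac:.2f}")
```

The second fraction is relative to the held-out part, not to the whole dataset, which is why 0.25 (and not 0.10) gives a 10% test set.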

In [None]:
# Import necessary tools
import numpy as np
import pandas as pd
from collections import Counter
from sklearn.model_selection import train_test_split

### Decision Tree Classifier Implementation From Scratch
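Before the full class, a small worked example of the splitting criterion may help. A tree scores a candidate split by how much it lowers Gini impurity, 1 - Σ(p_i)², where p_i is the share of class i in a node. The sketch below is a standalone illustration with made-up labels and hypothetical function names (`gini_impurity`, `gini_gain`); it is not the class defined in the next cell.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(parent, left, right):
    """Drop in impurity when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted_children


parent = [0, 0, 1, 1]                                # two classes, evenly mixed
perfect_gain = gini_gain(parent, [0, 0], [1, 1])     # each side becomes pure
useless_gain = gini_gain(parent, [0, 1], [0, 1])     # each side stays mixed

print(gini_impurity(parent), perfect_gain, useless_gain)
```

The evenly mixed parent has impurity 0.5. The split that separates the classes removes all of it (gain 0.5), while the split that leaves both sides mixed gains nothing, so a tree would prefer the first split.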

In [None]:
class DecisionTreeNode:
    """Node class for decision tree - represents either a decision point or leaf"""
    def __init__(self):
        self.feature_idx = None     # Feature index for split
        self.threshold = None       # Split threshold
        self.left = None            # Left child
        self.right = None           # Right child
        self.value = None           # Predicted class (leaf nodes)
        self.is_leaf = False        # Leaf indicator

class DecisionTreeClassifier:
    """Decision Tree Classifier using Gini impurity for splits"""

    def __init__(self, max_depth=None, min_samples_leaf=1, random_state=42, max_features=None):
        self.max_depth = max_depth
        self.min_samples_leaf = min_samples_leaf
        self.random_state = random_state
        self.max_features = max_features
        self.root = None
        self.classes_ = None
        self.n_features_ = None

        if random_state is not None:
            np.random.seed(random_state)

    def _gini_impurity(self, y):
        """Calculate Gini impurity: 1 - Σ(p_i)²"""
        if len(y) == 0:
            return 0
        proportions = np.bincount(y) / len(y)
        return 1 - np.sum(proportions ** 2)

    def _information_gain(self, y, left_y, right_y):
        """Calculate information gain from split"""
        n = len(y)
        n_left, n_right = len(left_y), len(right_y)

        if n_left == 0 or n_right == 0:
            return 0

        parent_gini = self._gini_impurity(y)
        weighted_child_gini = (n_left / n) * self._gini_impurity(left_y) + \
                             (n_right / n) * self._gini_impurity(right_y)

        return parent_gini - weighted_child_gini

    def _best_split(self, X, y):
        """Find best feature and threshold for splitting"""
        best_gain = 0
        best_feature = None
        best_threshold = None

        n_features = X.shape[1]

        # Feature selection for Random Forest
        if self.max_features is not None and self.max_features < n_features:
            feature_indices = np.random.choice(n_features, self.max_features, replace=False)
        else:
            feature_indices = range(n_features)

        for feature_idx in feature_indices:
            feature_values = X[:, feature_idx]
            thresholds = np.unique(feature_values)

            for threshold in thresholds:
                left_mask = feature_values <= threshold
                right_mask = ~left_mask

                if np.sum(left_mask) < self.min_samples_leaf or \
                   np.sum(right_mask) < self.min_samples_leaf:
                    continue

                left_y = y[left_mask]
                right_y = y[right_mask]

                gain = self._information_gain(y, left_y, right_y)

                if gain > best_gain:
                    best_gain = gain
                    best_feature = feature_idx
                    best_threshold = threshold

        return best_feature, best_threshold, best_gain

    def _build_tree(self, X, y, depth=0):
        """Recursively build decision tree"""
        node = DecisionTreeNode()

        # Stopping criteria
        if (self.max_depth is not None and depth >= self.max_depth) or \
           len(np.unique(y)) == 1 or \
           len(y) < 2 * self.min_samples_leaf:

            node.is_leaf = True
            node.value = Counter(y).most_common(1)[0][0]
            return node

        best_feature, best_threshold, best_gain = self._best_split(X, y)

        if best_feature is None or best_gain == 0:
            node.is_leaf = True
            node.value = Counter(y).most_common(1)[0][0]
            return node

        left_mask = X[:, best_feature] <= best_threshold
        right_mask = ~left_mask

        node.feature_idx = best_feature
        node.threshold = best_threshold

        node.left = self._build_tree(X[left_mask], y[left_mask], depth + 1)
        node.right = self._build_tree(X[right_mask], y[right_mask], depth + 1)

        return node

    def fit(self, X, y):
        """Train the decision tree"""
        if isinstance(X, pd.DataFrame):
            X = X.values
        if isinstance(y, pd.Series):
            y = y.values

        self.classes_ = np.unique(y)
        self.n_features_ = X.shape[1]

        # Create label mapping
        self.label_to_int = {label: i for i, label in enumerate(self.classes_)}
        self.int_to_label = {i: label for label, i in self.label_to_int.items()}

        y_int = np.array([self.label_to_int[label] for label in y])

        self.root = self._build_tree(X, y_int)
        return self

    def _predict_sample(self, x):
        """Predict single sample"""
        node = self.root

        while not node.is_leaf:
            if x[node.feature_idx] <= node.threshold:
                node = node.left
            else:
                node = node.right

        return self.int_to_label[node.value]

    def predict(self, X):
        """Predict multiple samples"""
        if isinstance(X, pd.DataFrame):
            X = X.values

        return np.array([self._predict_sample(x) for x in X])


### Random Forest Implementation From Scratch
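A random forest rests on two simple ideas: each tree is trained on a bootstrap sample (rows drawn with replacement), and the forest's prediction is the majority vote of its trees. The sketch below shows both ideas on their own, using Python's standard library instead of NumPy. The function names and the example votes are made up for illustration.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def majority_vote(votes):
    """Return the label that the most trees voted for."""
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(0)                      # fixed seed so the sketch is repeatable
indices = bootstrap_indices(10, rng)
left_out = set(range(10)) - set(indices)    # rows this tree would never see

print("bootstrap rows:", sorted(indices))
print("rows left out: ", sorted(left_out))
print("forest says:   ", majority_vote(["Introvert", "Extrovert", "Introvert"]))
```

Because rows are drawn with replacement, some rows appear several times and others not at all, so every tree sees a slightly different dataset. That variety is what makes the combined vote more stable than any single tree.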

In [None]:
class RandomForestClassifier:
    """Random Forest using bootstrap sampling and feature randomness"""

    def __init__(self, n_estimators=100, max_depth=None, min_samples_leaf=1,
                 max_features='sqrt', random_state=42):
        self.n_estimators = n_estimators
        self.max_depth = max_depth
        self.min_samples_leaf = min_samples_leaf
        self.max_features = max_features
        self.random_state = random_state
        self.trees = []
        self.classes_ = None
        self.n_features_ = None

        if random_state is not None:
            np.random.seed(random_state)

    def _bootstrap_sample(self, X, y):
        """Create bootstrap sample"""
        n_samples = X.shape[0]
        bootstrap_indices = np.random.choice(n_samples, size=n_samples, replace=True)
        return X[bootstrap_indices], y[bootstrap_indices]

    def _get_max_features(self, n_features):
        """Calculate number of features per split"""
        if self.max_features == 'sqrt':
            return int(np.sqrt(n_features))
        elif self.max_features == 'log2':
            return max(1, int(np.log2(n_features)))  # log2(1) == 0 would leave no features to try
        elif isinstance(self.max_features, int):
            return min(self.max_features, n_features)
        elif self.max_features is None:
            return n_features
        else:
            raise ValueError(f"Invalid max_features: {self.max_features}")

    def fit(self, X, y):
        """Train Random Forest"""
        if isinstance(X, pd.DataFrame):
            X = X.values
        if isinstance(y, pd.Series):
            y = y.values

        self.classes_ = np.unique(y)
        self.n_features_ = X.shape[1]

        max_features_per_tree = self._get_max_features(self.n_features_)
        self.trees = []

        print(f"Training Random Forest with {self.n_estimators} trees...")

        for i in range(self.n_estimators):
            X_bootstrap, y_bootstrap = self._bootstrap_sample(X, y)

            tree = DecisionTreeClassifier(
                max_depth=self.max_depth,
                min_samples_leaf=self.min_samples_leaf,
                max_features=max_features_per_tree,
                random_state=self.random_state + i if self.random_state is not None else None
            )

            tree.fit(X_bootstrap, y_bootstrap)
            self.trees.append(tree)

            if (i + 1) % 10 == 0 or i == 0:
                print(f"  Trained {i + 1}/{self.n_estimators} trees")

        print("Random Forest training completed!")
        return self

    def predict(self, X):
        """Predict using majority voting"""
        if isinstance(X, pd.DataFrame):
            X = X.values

        n_samples = X.shape[0]
        tree_predictions = np.array([tree.predict(X) for tree in self.trees])

        predictions = []
        for i in range(n_samples):
            sample_votes = tree_predictions[:, i]
            vote_counts = Counter(sample_votes)
            majority_class = vote_counts.most_common(1)[0][0]
            predictions.append(majority_class)

        return np.array(predictions)

### AdaBoost Implementation From Scratch
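One boosting round does three things: measure the weighted error of the new weak learner, turn that error into a vote weight, and raise the weights of the samples it got wrong. The sketch below walks through one round of the multi-class SAMME rule with made-up numbers. It assumes the learning rate scales the whole vote weight and uses hypothetical function names; it is an illustration, not the class defined in the next cell.

```python
import math


def weighted_error(sample_weights, incorrect):
    """Share of total sample weight that sits on misclassified samples."""
    total = sum(sample_weights)
    return sum(w for w, miss in zip(sample_weights, incorrect) if miss) / total


def samme_vote_weight(error, n_classes, learning_rate=1.0):
    """SAMME vote weight; positive only when error < 1 - 1/n_classes."""
    return learning_rate * (math.log((1.0 - error) / error) + math.log(n_classes - 1.0))


def reweight(sample_weights, incorrect, vote_weight):
    """Boost the weights of misclassified samples, then renormalise to sum to 1."""
    boosted = [w * math.exp(vote_weight * miss)
               for w, miss in zip(sample_weights, incorrect)]
    total = sum(boosted)
    return [w / total for w in boosted]


# Four samples with equal weight, three classes, first sample misclassified.
weights = [0.25, 0.25, 0.25, 0.25]
incorrect = [1, 0, 0, 0]

error = weighted_error(weights, incorrect)
alpha = samme_vote_weight(error, n_classes=3)
new_weights = reweight(weights, incorrect, alpha)

print(f"error={error:.3f} vote weight={alpha:.3f}")
print([round(w, 3) for w in new_weights])
```

With an error of 0.25 and three classes the vote weight is ln(3) + ln(2) = ln(6), so the misclassified sample's weight is multiplied by 6 before renormalising. It ends up carrying two thirds of the total weight, which pushes the next weak learner to get it right.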

In [None]:
class AdaBoostClassifier:
    """AdaBoost using decision stumps as weak learners"""

    def __init__(self, n_estimators=50, learning_rate=1.0, random_state=42):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.random_state = random_state
        self.estimators_ = []
        self.estimator_weights_ = []
        self.classes_ = None
        self.n_features_ = None

        if random_state is not None:
            np.random.seed(random_state)

    def _make_estimator(self):
        """Create decision stump (depth=1 tree)"""
        return DecisionTreeClassifier(
            max_depth=1,
            min_samples_leaf=1,
            random_state=None  # do not reseed NumPy's global RNG on every boosting round
        )

    def fit(self, X, y):
        """Train AdaBoost using SAMME algorithm"""
        if isinstance(X, pd.DataFrame):
            X = X.values
        if isinstance(y, pd.Series):
            y = y.values

        self.classes_ = np.unique(y)
        self.n_classes_ = len(self.classes_)
        self.n_features_ = X.shape[1]
        n_samples = X.shape[0]

        # Label encoding
        self.label_to_int = {label: i for i, label in enumerate(self.classes_)}
        self.int_to_label = {i: label for label, i in self.label_to_int.items()}

        y_int = np.array([self.label_to_int[label] for label in y])

        # Initialize uniform sample weights
        sample_weight = np.ones(n_samples) / n_samples

        print(f"Training AdaBoost with {self.n_estimators} weak learners...")

        self.estimators_ = []
        self.estimator_weights_ = []

        for iboost in range(self.n_estimators):
            sample_weight, estimator_weight, estimator_error = self._boost(
                X, y_int, sample_weight, iboost
            )

            if sample_weight is None:
                break

            self.estimator_weights_.append(estimator_weight)

            # _boost clamps a zero error to 1e-10, so compare against that floor
            if estimator_error <= 1e-10:
                break

            if (iboost + 1) % 10 == 0 or iboost == 0:
                print(f"  Trained {iboost + 1}/{self.n_estimators} estimators, "
                      f"error: {estimator_error:.3f}, weight: {estimator_weight:.3f}")

        print("AdaBoost training completed!")
        return self

    def _boost(self, X, y, sample_weight, iboost):
        """Single boosting step"""
        estimator = self._make_estimator()

        n_samples = X.shape[0]

        # Weighted bootstrap sample
        weighted_indices = np.random.choice(
            n_samples,
            size=n_samples,
            replace=True,
            p=sample_weight / np.sum(sample_weight)
        )

        X_weighted = X[weighted_indices]
        y_weighted = y[weighted_indices]

        y_weighted_orig = np.array([self.int_to_label[yi] for yi in y_weighted])
        estimator.fit(X_weighted, y_weighted_orig)

        # Predictions on original training set
        y_pred_orig = estimator.predict(X)
        y_pred = np.array([self.label_to_int[pred] for pred in y_pred_orig])

        # Calculate weighted error
        incorrect = y_pred != y
        estimator_error = np.average(incorrect, weights=sample_weight)

        # Check if error is too high
        if estimator_error >= 1.0 - (1.0 / self.n_classes_):
            return None, None, None

        if estimator_error <= 0:
            estimator_error = 1e-10

        # Calculate estimator weight (SAMME algorithm)
        if self.n_classes_ == 2:
            estimator_weight = self.learning_rate * 0.5 * np.log(
                (1.0 - estimator_error) / estimator_error
            )
        else:
            # The learning rate scales the whole SAMME weight, including the log(K - 1) term
            estimator_weight = self.learning_rate * (
                np.log((1.0 - estimator_error) / estimator_error)
                + np.log(self.n_classes_ - 1.0)
            )

        self.estimators_.append(estimator)

        # Update sample weights
        if iboost < self.n_estimators - 1:
            sample_weight *= np.exp(estimator_weight * incorrect)
            sample_weight /= np.sum(sample_weight)

            if np.sum(sample_weight) == 0:
                sample_weight = np.ones(len(sample_weight)) / len(sample_weight)

        return sample_weight, estimator_weight, estimator_error

    def predict(self, X):
        """Predict using weighted voting"""
        decision = self.decision_function(X)
        return self.classes_.take(np.argmax(decision, axis=1))

    def decision_function(self, X):
        """Compute weighted votes for each class"""
        if isinstance(X, pd.DataFrame):
            X = X.values

        n_samples = X.shape[0]
        decision = np.zeros((n_samples, self.n_classes_))

        for estimator, weight in zip(self.estimators_, self.estimator_weights_):
            current_pred = estimator.predict(X)

            for i, pred in enumerate(current_pred):
                class_idx = self.label_to_int[pred]
                decision[i, class_idx] += weight

        return decision

### Evaluation Metrics
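Accuracy counts every prediction equally, while macro F1 first computes an F1 score per class and then averages those scores, so a small class counts as much as a large one. The sketch below works through both on four made-up predictions, using plain Python lists and hypothetical function names.

```python
def per_class_f1(y_true, y_pred, cls):
    """F1 score for one class, treating it as 'positive' and the rest as 'negative'."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(per_class_f1(y_true, y_pred, c) for c in classes) / len(classes)


# Illustrative labels only: one Introvert is mistaken for an Extrovert.
y_true = ["Introvert", "Introvert", "Extrovert", "Ambivert"]
y_pred = ["Introvert", "Extrovert", "Extrovert", "Ambivert"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.3f} macro F1={macro_f1(y_true, y_pred):.3f}")
```

Here three of four predictions are correct (accuracy 0.75). The per-class F1 scores are 2/3 for Introvert, 2/3 for Extrovert, and 1 for Ambivert, so macro F1 is 7/9, about 0.778.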

In [None]:
def accuracy_score(y_true, y_pred):
    """Calculate accuracy score"""
    return np.mean(y_true == y_pred)

def f1_score(y_true, y_pred, average='macro'):
    """Calculate F1 score"""
    classes = np.unique(np.concatenate([y_true, y_pred]))

    if average == 'macro':
        f1_scores = []

        for cls in classes:
            tp = np.sum((y_true == cls) & (y_pred == cls))
            fp = np.sum((y_true != cls) & (y_pred == cls))
            fn = np.sum((y_true == cls) & (y_pred != cls))

            precision = tp / (tp + fp) if (tp + fp) > 0 else 0
            recall = tp / (tp + fn) if (tp + fn) > 0 else 0

            f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0
            f1_scores.append(f1)

        return np.mean(f1_scores)

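    # Any other `average` value falls back to accuracy, which equals
    # micro-averaged F1 when every sample has exactly one label
    return np.mean(y_true == y_pred)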

def classification_report(y_true, y_pred):
    """Generate detailed classification report"""
    classes = np.unique(np.concatenate([y_true, y_pred]))

    report = "              precision    recall  f1-score   support\n\n"

    total_support = 0
    weighted_precision = 0
    weighted_recall = 0
    weighted_f1 = 0
    # Per-class scores, kept so the macro average can weight every class equally
    precisions = []
    recalls = []

    for cls in classes:
        tp = np.sum((y_true == cls) & (y_pred == cls))
        fp = np.sum((y_true != cls) & (y_pred == cls))
        fn = np.sum((y_true == cls) & (y_pred != cls))
        support = np.sum(y_true == cls)

        precision = tp / (tp + fp) if (tp + fp) > 0 else 0
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0

        precisions.append(precision)
        recalls.append(recall)
        weighted_precision += precision * support
        weighted_recall += recall * support
        weighted_f1 += f1 * support
        total_support += support

        report += f"{str(cls):>12} {precision:>9.2f} {recall:>9.2f} {f1:>9.2f} {support:>9}\n"

    accuracy = accuracy_score(y_true, y_pred)
    # Macro averages are plain means over classes, not support-weighted
    macro_precision = np.mean(precisions)
    macro_recall = np.mean(recalls)
    macro_f1 = f1_score(y_true, y_pred, average='macro')

    if total_support > 0:
        weighted_precision /= total_support
        weighted_recall /= total_support
        weighted_f1 /= total_support

    report += "\n"
    report += f"    accuracy                     {accuracy:>9.2f} {total_support:>9}\n"
    report += f"   macro avg {macro_precision:>9.2f} {macro_recall:>9.2f} {macro_f1:>9.2f} {total_support:>9}\n"
    report += f"weighted avg {weighted_precision:>9.2f} {weighted_recall:>9.2f} {weighted_f1:>9.2f} {total_support:>9}\n"

    return report

## 3. How We Run the Experiment
Now, let's talk about how we'll use our data and models. We'll follow these steps:
1.  **Load the Data**: We'll start by loading our `Data.csv` file.
2.  **Prepare the Data**: We'll split it into training, validation, and test sets, as explained before.
3.  **Train Our Models**: We'll teach our Decision Tree, Random Forest, and AdaBoost models using the training data.
4.  **Check on Validation Data**: We'll see how well each model performs on the validation data. This helps us choose the best settings for our models.
5.  **Final Check on Test Data**: Once we're happy, we'll retrain the models on the combined training and validation data and score them on the held-out test data. This is the real measure of their performance.
6.  **Report the Results**: We'll show you the scores (Accuracy and F1-score) for each model.
7.  **Simple Insights**: We'll share some easy-to-understand thoughts about how each personality prediction method works.

In [None]:
# This is where the main part of our experiment will run.
# It will load the data, train the models, and show the results.

def main():
    print("Starting Personality Prediction Experiment...")

    # Load the Data.csv file
    data = pd.read_csv("Data.csv")

    # Prepare features (X) and target (y)
    target_column = "personality_type"
    X = data.drop(columns=[target_column])
    y = data[target_column]

    print(f"Target variable: {target_column}")
    print(f"Personality Types: {np.unique(y)}")
    print(f"How many of each personality type:")
    for cls in np.unique(y):
        count = np.sum(y == cls)
        print(f"  {cls}: {count} ({count/len(y)*100:.1f}%)")

    # Split data: 60% train, 30% validation, 10% test
    X_train, X_temp, y_train, y_temp = train_test_split(
        X, y, test_size=0.4, stratify=y, random_state=42
    )
    X_val, X_test, y_val, y_test = train_test_split(
        X_temp, y_temp, test_size=0.25, stratify=y_temp, random_state=42
    )

    print(f"\nOur data is split like this:")
    print(f"Training: {len(X_train)} people ({len(X_train)/len(data)*100:.1f}%)")
    print(f"Validation: {len(X_val)} people ({len(X_val)/len(data)*100:.1f}%)")
    print(f"Test: {len(X_test)} people ({len(X_test)/len(data)*100:.1f}%)")

    # 1. Train Decision Tree
    print("\n" + "="*60)
    print("STEP 1: Teaching the Decision Tree")
    print("="*60)
    dt = DecisionTreeClassifier(random_state=42)
    dt.fit(X_train, y_train)

    val_pred_dt = dt.predict(X_val)
    dt_val_acc = accuracy_score(y_val, val_pred_dt)
    dt_val_f1 = f1_score(y_val, val_pred_dt, average='macro')

    print(f"Decision Tree - How well it did on validation data (Accuracy): {dt_val_acc:.3f}")
    print(f"Decision Tree - How well it did on validation data (F1-score): {dt_val_f1:.3f}")

    # 2. Train Random Forest
    print("\n" + "="*60)
    print("STEP 2: Teaching the Random Forest")
    print("="*60)
    rf = RandomForestClassifier(
        n_estimators=50,
        max_depth=12,
        max_features='sqrt',
        min_samples_leaf=2,
        random_state=42
    )
    rf.fit(X_train, y_train)

    val_pred_rf = rf.predict(X_val)
    rf_val_acc = accuracy_score(y_val, val_pred_rf)
    rf_val_f1 = f1_score(y_val, val_pred_rf, average='macro')

    print(f"Random Forest - How well it did on validation data (Accuracy): {rf_val_acc:.3f}")
    print(f"Random Forest - How well it did on validation data (F1-score): {rf_val_f1:.3f}")

    # 3. Train AdaBoost
    print("\n" + "="*60)
    print("STEP 3: Teaching AdaBoost")
    print("="*60)
    ada = AdaBoostClassifier(
        n_estimators=20,
        learning_rate=1.0,
        random_state=42
    )
    ada.fit(X_train, y_train)

    val_pred_ada = ada.predict(X_val)
    ada_val_acc = accuracy_score(y_val, val_pred_ada)
    ada_val_f1 = f1_score(y_val, val_pred_ada, average='macro')

    print(f"AdaBoost - How well it did on validation data (Accuracy): {ada_val_acc:.3f}")
    print(f"AdaBoost - How well it did on validation data (F1-score): {ada_val_f1:.3f}")

    # 4. Compare models on validation set
    print("\n" + "="*60)
    print("STEP 4: Comparing Our Models on Validation Data")
    print("="*60)

    print("VALIDATION RESULTS (Higher is Better):")
    print("-" * 40)
    print(f"Decision Tree - Accuracy: {dt_val_acc:.3f}, F1: {dt_val_f1:.3f}")
    print(f"Random Forest - Accuracy: {rf_val_acc:.3f}, F1: {rf_val_f1:.3f}")
    print(f"AdaBoost      - Accuracy: {ada_val_acc:.3f}, F1: {ada_val_f1:.3f}")

    val_scores = {
        'Decision Tree': dt_val_acc,
        'Random Forest': rf_val_acc,
        'AdaBoost': ada_val_acc
    }

    best_model_name = max(val_scores.keys(), key=lambda k: val_scores[k])
    print(f"\nBased on validation, the best model is: {best_model_name}")

    # 5. Final evaluation on test set
    print("\n" + "="*60)
    print("STEP 5: Final Check on New Data (Test Set)")
    print("="*60)

    # Retrain on combined training and validation data
    X_trainval = pd.concat([X_train, X_val], ignore_index=True)
    y_trainval = pd.concat([y_train, y_val], ignore_index=True)

    # Final models
    final_dt = DecisionTreeClassifier(random_state=42)
    final_dt.fit(X_trainval, y_trainval)

    final_rf = RandomForestClassifier(
        n_estimators=50, max_depth=12, max_features='sqrt',
        min_samples_leaf=2, random_state=42
    )
    final_rf.fit(X_trainval, y_trainval)

    # Use the same settings that were checked on the validation data
    final_ada = AdaBoostClassifier(
        n_estimators=20, learning_rate=1.0, random_state=42
    )
    final_ada.fit(X_trainval, y_trainval)

    # Test set predictions
    test_pred_dt = final_dt.predict(X_test)
    test_pred_rf = final_rf.predict(X_test)
    test_pred_ada = final_ada.predict(X_test)

    # Test set metrics
    dt_test_acc = accuracy_score(y_test, test_pred_dt)
    rf_test_acc = accuracy_score(y_test, test_pred_rf)
    ada_test_acc = accuracy_score(y_test, test_pred_ada)

    dt_test_f1 = f1_score(y_test, test_pred_dt, average='macro')
    rf_test_f1 = f1_score(y_test, test_pred_rf, average='macro')
    ada_test_f1 = f1_score(y_test, test_pred_ada, average='macro')

    print("FINAL TEST RESULTS (Higher is Better):")
    print("-" * 40)
    print(f"Decision Tree - Accuracy: {dt_test_acc:.3f}, F1: {dt_test_f1:.3f}")
    print(f"Random Forest - Accuracy: {rf_test_acc:.3f}, F1: {rf_test_f1:.3f}")
    print(f"AdaBoost      - Accuracy: {ada_test_acc:.3f}, F1: {ada_test_f1:.3f}")

    test_scores = {
        'Decision Tree': dt_test_acc,
        'Random Forest': rf_test_acc,
        'AdaBoost': ada_test_acc
    }

    best_test_model = max(test_scores.keys(), key=lambda k: test_scores[k])
    best_test_acc = test_scores[best_test_model]

    print(f"\nOn the final test, the best model is: {best_test_model}")

    if best_test_model != 'Decision Tree':
        improvement = ((best_test_acc - dt_test_acc) / dt_test_acc) * 100
        print(f"This model showed an improvement of {improvement:.1f}% compared to the basic Decision Tree.")

    # Detailed classification report for best model
    print(f"\nDetailed Report for {best_test_model}:")
    print("-" * 60)
    if best_test_model == 'Decision Tree':
        print(classification_report(y_test, test_pred_dt))
    elif best_test_model == 'Random Forest':
        print(classification_report(y_test, test_pred_rf))
    else:
        print(classification_report(y_test, test_pred_ada))

    print("\n" + "="*60)
    print("EXPERIMENT FINISHED!")
    print("="*60)

    # Algorithm insights
    print("\nQUICK LOOK AT OUR METHODS:")
    print("-" * 25)
    print("Decision Tree:")
    print("  • It's like a simple 'yes/no' game to guess personality.")
    print("  • Easy to understand how it makes decisions.")
    print("  • Can sometimes get too focused on the training data (overfitting).")

    print("\nRandom Forest:")
    print("  • Uses many Decision Trees and combines their guesses.")
    print("  • Much better at avoiding overfitting than a single Decision Tree.")
    print(f"  • We used {rf.n_estimators} trees for this.")

    print("\nAdaBoost:")
    print("  • It's a team of simple learners that learn from each other's mistakes.")
    print("  • It focuses more on the hard-to-guess personalities.")
    print(f"  • We used {len(ada.estimators_)} simple learners for this.")

if __name__ == "__main__":
    main()

## 4. Our Findings and What They Mean
After running our experiment, we looked at how well each method predicted personality types. Here's a simple summary:

### How They Did on Validation Data
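The validation data helped us see which model was promising before the final test. Generally, the Random Forest and AdaBoost models performed better than the single Decision Tree. This is expected: averaging many trees makes the Random Forest less prone to overfitting, and AdaBoost keeps shifting its attention to the examples that earlier learners got wrong.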

### How They Did on New (Test) Data
The test data matters most, because none of the models saw it during training or tuning. The accuracy and F1 scores printed in Step 5 show how well each method works on people it has never seen before, and which one is strongest on our dataset.

### Comparing the Methods
*   **Decision Tree**: Easy to understand, but a single tree can overfit the training data.
*   **Random Forest**: Often performs very well because it combines the votes of many trees, each trained on a different bootstrap sample, which makes it more robust.
*   **AdaBoost**: Also very strong. It builds a sequence of simple learners, each giving more weight to the cases the previous ones misclassified.

For our personality data, we found that the more advanced methods (Random Forest and AdaBoost) generally gave us better predictions than a single Decision Tree. This shows that combining simple ideas can lead to powerful results in machine learning.