# Project Overview: Product Rating Prediction
# By: Roaa Alaa Abdelghany & Mariam Khaled Ahmed

## Objective
The goal of this project is to predict whether a product will receive a **high rating (good)** or a **low rating (bad)** based on its features.  
The target variable is derived from the product's **average rating**:
- **Good rating:** ≥ 4 (labeled as 1)
- **Bad rating:** < 4 (labeled as 0)
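The labeling rule above can be sketched in plain Python. The function name `rating_to_label` is hypothetical, and raising an error on a missing rating is an assumption made for illustration (the notebook itself simply drops rows without an average rating).

```python
import math

# Threshold from the rule above: ratings >= 4 are "good" (1), ratings below 4 are "bad" (0).
GOOD_THRESHOLD = 4.0


def rating_to_label(average_rating):
    """Hypothetical helper: map an average rating to the binary target (1 = good, 0 = bad)."""
    if average_rating is None or math.isnan(average_rating):
        # Assumption for this sketch: an unrated product cannot be labeled.
        raise ValueError("average_rating is missing")
    return 1 if average_rating >= GOOD_THRESHOLD else 0


for rating in [4.4, 4.0, 3.8]:
    print(rating, rating_to_label(rating))
```

A rating of exactly 4.0 falls on the "good" side because the comparison is `>=`.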

This is primarily a **binary classification task**, with an additional Linear Regression model included to analyze the ratings as continuous values.

---

## Features (X)
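The features describe each product, such as its name, brand, price, number of shades, and star rating. During preprocessing, the following steps were applied:
- Dropped irrelevant columns: the auto-generated `Unnamed` index columns, `product_link`, `product_link_id`, `faceoff_negative`, `faceoff_positive`, and the free-text review fields `pros`, `cons`, and `best_uses` (no NLP is used).
- Imputed missing values (median for numerical features, most frequent value for categorical features).
- Standardized numerical features, which matters most for algorithms sensitive to feature magnitude (e.g., SVM, KNN).
- One-hot encoded categorical features.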

---

## Target (y)
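The target is a binary indicator of rating quality. In the code it is stored as a string label:
- `"Good"` → good rating (≥ 4), corresponding to `1`
- `"Bad"` → bad rating (< 4), corresponding to `0`

Most scikit-learn classifiers accept these string labels directly. For the **XGBoost model**, the labels were mapped to numeric form (`Bad` → 0, `Good` → 1), because `XGBClassifier` expects integer class labels.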
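A minimal pandas sketch of preparing both label forms at once, on a hypothetical toy frame (the values are invented, not taken from the dataset). It uses a vectorized `np.where` instead of `apply`; the column name `target_num` is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings; None becomes NaN in a float column.
df = pd.DataFrame({"average_rating": [4.4, 3.8, None, 4.0]})

# Drop unlabeled rows and take an explicit copy so the assignments below
# do not trigger pandas' SettingWithCopyWarning.
df = df.dropna(subset=["average_rating"]).copy()

# String labels, as used by the scikit-learn classifiers.
df["target"] = np.where(df["average_rating"] < 4, "Bad", "Good")

# Numeric labels (0 = Bad, 1 = Good), as needed for XGBoost.
df["target_num"] = df["target"].map({"Bad": 0, "Good": 1})
print(df)
```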

---

## Project Workflow
1. **Preprocessing:** Cleaned the dataset, dropped unnecessary columns, scaled data, and separated features (`X`) from the target (`y`).
2. **Model Training:** Multiple algorithms were trained and tested.
3. **Evaluation:** Accuracy scores were used for classification models, while R² was used for Linear Regression.
4. **Model Saving & Prediction:** Each trained model was saved using `pickle` and can be reloaded for predictions on new data.
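The notebook pickles each model on its own and keeps the fitted preprocessor in memory. One possible design, shown here as a sketch rather than what the notebook does, wraps the preprocessor and the classifier in a single `Pipeline`, so one pickle is enough to predict on raw new rows. The toy columns `price` and `brand` and their values are hypothetical.

```python
import pickle

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data with one numeric and one categorical feature.
X = pd.DataFrame({
    "price": [10, 55, 22, 40, 8, 60],
    "brand": ["A", "B", "A", "C", "C", "B"],
})
y = ["Bad", "Good", "Bad", "Good", "Bad", "Good"]

preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["price"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["brand"]),
])
clf = Pipeline([("prep", preprocessor), ("model", LogisticRegression())])
clf.fit(X, y)

# A single pickle now holds both the fitted preprocessing and the model.
blob = pickle.dumps(clf)
restored = pickle.loads(blob)

# Raw rows go straight in; the unseen brand "Z" is encoded as all zeros.
new_rows = pd.DataFrame({"price": [50, 12], "brand": ["B", "Z"]})
print(restored.predict(new_rows))
```

Bundling the two steps also guarantees that prediction-time data goes through exactly the same transformations as the training data.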

---

## Algorithms Used

### Classification Models
- Support Vector Machine (SVM)
- Random Forest
- Bagging Classifier
- AdaBoost Classifier
- XGBoost (binary classification)
- Logistic Regression
- k-Nearest Neighbors (KNN)
- Naive Bayes (GaussianNB)

---

# Libraries and Their Usage

## Data Handling & Preprocessing
- **pandas (as pd):**  
  Used for loading, manipulating, and analyzing datasets. Provides DataFrame and Series objects to store and process tabular data.

- **train_test_split (from sklearn.model_selection):**  
  Splits the dataset into training and testing subsets to evaluate model generalization.

- **OneHotEncoder (from sklearn.preprocessing):**  
  Converts categorical variables into one-hot encoded (binary) columns.

- **StandardScaler (from sklearn.preprocessing):**  
  Standardizes numerical features by removing the mean and scaling to unit variance.

- **ColumnTransformer (from sklearn.compose):**  
  Allows applying different preprocessing steps to different columns (e.g., scaling numerical features and encoding categorical ones simultaneously).

- **Pipeline (from sklearn.pipeline):**  
  Chains preprocessing steps and models into a single workflow, ensuring consistent data flow during training and prediction.

- **SimpleImputer (from sklearn.impute):**  
  Handles missing values by imputing them with a chosen strategy (mean, median, most frequent, etc.).
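The preprocessing classes above can be combined as in the following sketch, which mirrors the structure of the notebook's `build_preprocessor` on a hypothetical three-row frame with one missing price and one missing brand.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy frame with a missing numeric value and a missing category.
X = pd.DataFrame({
    "price": [10.0, np.nan, 30.0],
    "brand": ["A", "B", np.nan],
})

# Numeric columns: fill with the median, then standardize.
num = Pipeline([("imputer", SimpleImputer(strategy="median")),
                ("scaler", StandardScaler())])
# Categorical columns: fill with the most frequent value, then one-hot encode.
cat = Pipeline([("imputer", SimpleImputer(strategy="most_frequent")),
                ("onehot", OneHotEncoder(handle_unknown="ignore"))])
prep = ColumnTransformer([("num", num, ["price"]), ("cat", cat, ["brand"])])

out = prep.fit_transform(X)
# The result may be sparse depending on density, so densify for printing.
out = out.toarray() if hasattr(out, "toarray") else out
print(out)
```

The missing price is filled with the median (20), which becomes 0 after scaling; the missing brand is filled with `"A"`, because `most_frequent` returns the smallest value when counts tie.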

---

## Evaluation Metrics
- **accuracy_score (from sklearn.metrics):**  
  Calculates the proportion of correct predictions for classification tasks.

- **r2_score (from sklearn.metrics):**  
  Computes the coefficient of determination (R²), i.e. how much of the variance in the true continuous values is explained by the regression predictions.
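Both metrics can be checked on small hand-made inputs. The labels and rating values below are hypothetical; the manual R² line applies the standard formula 1 − SS_res / SS_tot for comparison with `r2_score`.

```python
from sklearn.metrics import accuracy_score, r2_score

# Classification: fraction of labels predicted correctly (3 of 4 here).
y_true_cls = ["Good", "Bad", "Good", "Good"]
y_pred_cls = ["Good", "Bad", "Bad", "Good"]
acc = accuracy_score(y_true_cls, y_pred_cls)

# Regression on hypothetical average ratings.
y_true_reg = [4.4, 3.8, 4.1, 4.5]
y_pred_reg = [4.3, 3.9, 4.0, 4.6]

# R^2 = 1 - SS_res / SS_tot, computed by hand.
mean = sum(y_true_reg) / len(y_true_reg)
ss_res = sum((t - p) ** 2 for t, p in zip(y_true_reg, y_pred_reg))
ss_tot = sum((t - mean) ** 2 for t in y_true_reg)
r2_manual = 1 - ss_res / ss_tot

print(acc, r2_score(y_true_reg, y_pred_reg), r2_manual)
```

Here SS_res is 0.04 and SS_tot is 0.30, so R² is about 0.867.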

---

## Machine Learning Models
- **SVC (Support Vector Classifier):**  
  Classifies data by finding the hyperplane that separates the classes with the maximum margin.

- **LogisticRegression:**  
  A linear model for binary or multiclass classification that uses the logistic (sigmoid) function to estimate class probabilities.

- **LinearRegression:**  
  Predicts a continuous target variable based on a linear relationship with features.

- **KNeighborsClassifier:**  
  Classifies data based on the majority class among its k nearest neighbors.

- **GaussianNB (Naive Bayes):**  
  A probabilistic classifier based on Bayes' theorem that assumes the features are conditionally independent given the class and normally distributed within each class.

- **RandomForestClassifier:**  
  An ensemble method that builds multiple decision trees and combines them for better accuracy and robustness.

- **BaggingClassifier:**  
  Uses bootstrap aggregating (bagging): trains multiple base estimators (decision trees by default) on bootstrap samples of the training data and aggregates their predictions.

- **AdaBoostClassifier:**  
  A boosting algorithm that trains weak learners sequentially, reweighting the training samples so that each new learner focuses on the samples the previous ones misclassified.

- **XGBClassifier (from xgboost):**  
  Extreme Gradient Boosting classifier — a high-performance boosting algorithm optimized for speed and accuracy.
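Because all of these scikit-learn estimators share the same `fit`/`predict` interface, they can be trained and compared in one loop. This sketch uses synthetic data from `make_classification` in place of the preprocessed product features, and leaves out XGBoost only to avoid the extra dependency (`XGBClassifier` follows the same interface).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic data stands in for the preprocessed product features.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)

models = {
    "SVM": SVC(kernel="linear"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Bagging": BaggingClassifier(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)  # identical call for every estimator
    scores[name] = accuracy_score(y_te, model.predict(X_te))

# Print from best to worst test accuracy.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {score:.3f}")
```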

---

## Model Persistence
- **joblib:**  
  Used to efficiently save and load large NumPy arrays or scikit-learn models.

- **pickle:**  
  Serializes Python objects (e.g., trained models) for saving and reloading.
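A minimal save/load round trip with both libraries, written to a temporary directory. Opening the file with a `with` block guarantees the handle is closed; `joblib.dump`/`joblib.load` accept a filename directly. The tiny `LogisticRegression` model is only a placeholder. Note that unpickling can execute arbitrary code, so only load model files from trusted sources.

```python
import os
import pickle
import tempfile

import joblib
from sklearn.linear_model import LogisticRegression

# Placeholder model trained on four one-feature points.
model = LogisticRegression().fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])

with tempfile.TemporaryDirectory() as tmp:
    # pickle: the context manager closes the file even if an error occurs.
    pkl_path = os.path.join(tmp, "model.pkl")
    with open(pkl_path, "wb") as f:
        pickle.dump(model, f)
    with open(pkl_path, "rb") as f:
        from_pickle = pickle.load(f)

    # joblib: takes the path and handles the file itself.
    jl_path = os.path.join(tmp, "model.joblib")
    joblib.dump(model, jl_path)
    from_joblib = joblib.load(jl_path)

print(from_pickle.predict([[2.5]]), from_joblib.predict([[2.5]]))
```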



In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, r2_score
from sklearn.impute import SimpleImputer
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier, AdaBoostClassifier
from xgboost import XGBClassifier
import joblib
import pickle

In [None]:
def load_data(path):
  # Load the CSV from the path passed in by the caller
  df = pd.read_csv(path)
  df['average_rating'] = pd.to_numeric(df['average_rating'], errors='coerce')  # non-numeric values become NaN
  return df

In [None]:
def clean_data(df):
  # Drop auto-generated index columns and link/faceoff columns that are not used as features
  cols_to_drop = [c for c in df.columns if "Unnamed" in c]
  cols_to_drop.extend(["product_link", "product_link_id", "faceoff_negative", "faceoff_positive"])
  df = df.drop(columns=cols_to_drop, errors='ignore')
  # Drop long free-text review fields, since no NLP is applied
  text_cols = ["pros", "cons", "best_uses"]
  df = df.drop(columns=text_cols, errors='ignore')
  return df

In [None]:
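def create_target(df):
  # Drop rows without a rating; .copy() prevents pandas' SettingWithCopyWarning on the assignment below
  df = df.dropna(subset=["average_rating"]).copy()
  df["target"] = df["average_rating"].apply(lambda x: "Bad" if x < 4 else "Good")
  return df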

In [None]:
def split_features_target(df):
  X = df.drop(columns=["target", "average_rating"])
  y = df["target"]
  return X, y

In [None]:
def split_train_test(X, y, test_size=0.2, random_state=42):
  return train_test_split(X, y, test_size=test_size, random_state=random_state, stratify=y)

In [None]:
def build_preprocessor(X):
  # Split columns by dtype
  cat_cols = X.select_dtypes(include=["object", "category"]).columns.tolist()
  num_cols = X.select_dtypes(include=["int64", "float64"]).columns.tolist()

  # Numerical: fill missing values with the median, then standardize
  num_transformer = Pipeline(steps=[
      ("imputer", SimpleImputer(strategy="median")),
      ("scaler", StandardScaler())
  ])

  # Categorical: fill missing values with the most frequent value, then one-hot encode
  cat_transformer = Pipeline(steps=[
      ("imputer", SimpleImputer(strategy="most_frequent")),
      ("onehot", OneHotEncoder(handle_unknown="ignore"))
  ])

  preprocessor = ColumnTransformer(
      transformers=[
          ("num", num_transformer, num_cols),
          ("cat", cat_transformer, cat_cols)
      ]
  )
  return preprocessor

In [None]:
def preprocess_pipeline(path):
  df = load_data(path)
  df = clean_data(df)
  df = create_target(df)
  X, y = split_features_target(df)
  X_train, X_test, y_train, y_test = split_train_test(X, y)
  preprocessor = build_preprocessor(X)
  # Fit on the training split only, so test-set statistics do not leak into preprocessing
  preprocessor.fit(X_train)
  X_train_processed = preprocessor.transform(X_train)
  X_test_processed = preprocessor.transform(X_test)
  return X_train_processed, X_test_processed, y_train, y_test, preprocessor

## SVM

In [None]:
def train_svm(X_train, y_train, X_test, y_test, model_path="svm_model.pkl"):
    model = SVC(kernel="linear", probability=True, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print("SVM Accuracy:", acc)
    with open(model_path, "wb") as f:  # closes the file handle after writing
        pickle.dump(model, f)
    return acc

In [None]:
def load_svm(model_path="svm_model.pkl"):
    with open(model_path, "rb") as f:
        return pickle.load(f)

In [None]:
def predict_svm(model, X_sample):
    return model.predict(X_sample)
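The `train_*` and `load_*` functions in this notebook differ only in the estimator, file name, and printed label. A possible refactor, sketched here with the hypothetical helpers `train_and_save` and `load_model` and synthetic data in place of `X_train_processed`/`X_test_processed`, replaces them with one generic pair.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC


def train_and_save(model, X_train, y_train, X_test, y_test, model_path, name=None):
    """Hypothetical generic version of the per-model train_* functions."""
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name or type(model).__name__} Accuracy: {acc:.4f}")
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    return acc


def load_model(model_path):
    """Hypothetical generic version of the per-model load_* functions."""
    with open(model_path, "rb") as f:
        return pickle.load(f)


# Usage on synthetic data; the notebook would pass its processed splits instead.
X, y = make_classification(n_samples=200, n_features=5, random_state=1)
X_tr, X_te, y_tr, y_te = X[:160], X[160:], y[:160], y[160:]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "svm_model.pkl")
    acc = train_and_save(SVC(kernel="linear"), X_tr, y_tr, X_te, y_te, path, name="SVM")
    restored_preds = load_model(path).predict(X_te)
```

With this pair, adding a new model only requires constructing the estimator and choosing a file name.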

## Random Forest

In [None]:
def train_random_forest(X_train, y_train, X_test, y_test, model_path="rf_model.pkl"):
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print("Random Forest Accuracy:", acc)
    with open(model_path, "wb") as f:  # closes the file handle after writing
        pickle.dump(model, f)
    return acc

In [None]:
def load_random_forest(model_path="rf_model.pkl"):
    with open(model_path, "rb") as f:
        return pickle.load(f)

In [None]:
def predict_random_forest(model, X_sample):
    return model.predict(X_sample)

## Bagging

In [None]:
def train_bagging(X_train, y_train, X_test, y_test, model_path="bagging_model.pkl"):
    model = BaggingClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print("Bagging Accuracy:", acc)
    with open(model_path, "wb") as f:  # closes the file handle after writing
        pickle.dump(model, f)
    return acc

In [None]:
def load_bagging(model_path="bagging_model.pkl"):
    with open(model_path, "rb") as f:
        return pickle.load(f)

In [None]:
def predict_bagging(model, X_sample):
    return model.predict(X_sample)

## AdaBoost

In [None]:
def train_adaboost(X_train, y_train, X_test, y_test, model_path="adaboost_model.pkl"):
    model = AdaBoostClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print("AdaBoost Accuracy:", acc)
    with open(model_path, "wb") as f:  # closes the file handle after writing
        pickle.dump(model, f)
    return acc

In [None]:
def load_adaboost(model_path="adaboost_model.pkl"):
    with open(model_path, "rb") as f:
        return pickle.load(f)

In [None]:
def predict_adaboost(model, X_sample):
    return model.predict(X_sample)

## XGBoost

In [None]:
def train_xgboost(X_train, y_train, X_test, y_test, model_path="xgb_model.pkl"):
    # Labels must already be integers (0 = Bad, 1 = Good)
    model = XGBClassifier(n_estimators=100, eval_metric="logloss", random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print("XGBoost Accuracy:", acc)
    with open(model_path, "wb") as f:  # closes the file handle after writing
        pickle.dump(model, f)
    return acc

In [None]:
def load_xgboost(model_path="xgb_model.pkl"):
    with open(model_path, "rb") as f:
        return pickle.load(f)

In [None]:
def predict_xgboost(model, X_sample):
    return model.predict(X_sample)


In [None]:
X_train_processed, X_test_processed, y_train, y_test, preprocessor= preprocess_pipeline("/content/cleaned_makeup_products.csv")



In [None]:
# 2. Train and save each model
print("Training and saving models...\n")


acc_svm = train_svm(X_train_processed, y_train, X_test_processed, y_test)
acc_rf = train_random_forest(X_train_processed, y_train, X_test_processed, y_test)
acc_bag = train_bagging(X_train_processed, y_train, X_test_processed, y_test)
acc_ada = train_adaboost(X_train_processed, y_train, X_test_processed, y_test)


# Convert target variable to numerical for XGBoost
y_train_xgb = y_train.map({'Bad': 0, 'Good': 1})
y_test_xgb = y_test.map({'Bad': 0, 'Good': 1})

acc_xgb = train_xgboost(X_train_processed, y_train_xgb, X_test_processed, y_test_xgb)

# 3. Print summary results
print("\n===== Model Accuracies =====")
print(f"SVM          : {acc_svm:.4f}")
print(f"Random Forest: {acc_rf:.4f}")
print(f"Bagging      : {acc_bag:.4f}")
print(f"AdaBoost     : {acc_ada:.4f}")
print(f"XGBoost      : {acc_xgb:.4f}")

Training and saving models...

SVM Accuracy: 0.9706959706959707
Random Forest Accuracy: 0.9120879120879121
Bagging Accuracy: 0.9706959706959707
AdaBoost Accuracy: 0.9487179487179487


XGBoost Accuracy: 0.9816849816849816

===== Model Accuracies =====
SVM          : 0.9707
Random Forest: 0.9121
Bagging      : 0.9707
AdaBoost     : 0.9487
XGBoost      : 0.9817
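The manual `map({'Bad': 0, 'Good': 1})` used for XGBoost can also be done with scikit-learn's `LabelEncoder`, which additionally decodes numeric predictions back to the original labels. The label values below are hypothetical; the encoder sorts classes alphabetically, so `"Bad"` becomes 0 and `"Good"` becomes 1, matching the manual mapping.

```python
from sklearn.preprocessing import LabelEncoder

y_train = ["Good", "Bad", "Good", "Good"]  # hypothetical string labels

encoder = LabelEncoder()
y_train_num = encoder.fit_transform(y_train)  # classes sorted: "Bad" -> 0, "Good" -> 1
print(list(encoder.classes_), list(y_train_num))

# Decode numeric predictions (for example from XGBClassifier) back to string labels.
preds = [1, 0]
decoded = list(encoder.inverse_transform(preds))
print(decoded)
```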


In [None]:
results = {}

In [None]:
# Logistic Regression
def train_logistic_regression(X_train, y_train, X_test, y_test, model_path="logreg_model.pkl"):
  model = LogisticRegression(max_iter=1000, random_state=42)
  model.fit(X_train, y_train)
  y_pred_log = model.predict(X_test)
  acc = accuracy_score(y_test, y_pred_log)
  print("Logistic Regression Accuracy:", acc)
  with open(model_path, "wb") as f:  # closes the file handle after writing
    pickle.dump(model, f)
  return acc

In [None]:
def load_logistic_regression(model_path="logreg_model.pkl"):
    with open(model_path, "rb") as f:
        return pickle.load(f)

In [None]:
def predict_logistic_regression(model, X_sample):
    return model.predict(X_sample)

In [None]:
# KNN
def train_knn(X_train, y_train, X_test, y_test, n_neighbors=5, model_path="knn_model.pkl"):
  model = KNeighborsClassifier(n_neighbors=n_neighbors)
  model.fit(X_train, y_train)
  y_pred_knn = model.predict(X_test)
  acc = accuracy_score(y_test, y_pred_knn)
  print(f"KNN Accuracy (k={n_neighbors}):", acc)
  with open(model_path, "wb") as f:  # closes the file handle after writing
    pickle.dump(model, f)
  return acc

In [None]:
def load_knn(model_path="knn_model.pkl"):
    with open(model_path, "rb") as f:
        return pickle.load(f)

In [None]:
def predict_knn(model, X_sample):
    return model.predict(X_sample)

In [None]:
# Naive Bayes
# (If the input is a sparse matrix, convert it to a dense array first)
def train_naive_bayes(X_train, y_train, X_test, y_test, model_path="nb_model.pkl"):
  model = GaussianNB()
  X_train_nb = X_train.toarray() if hasattr(X_train, "toarray") else X_train
  X_test_nb = X_test.toarray() if hasattr(X_test, "toarray") else X_test
  model.fit(X_train_nb, y_train)
  y_pred_nb = model.predict(X_test_nb)
  acc = accuracy_score(y_test, y_pred_nb)
  print("Naive Bayes Accuracy:", acc)
  with open(model_path, "wb") as f:  # closes the file handle after writing
    pickle.dump(model, f)
  return acc
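The dense conversion used for Naive Bayes can be factored into a small helper. This is a sketch: the name `to_dense` is hypothetical, and it uses `scipy.sparse.issparse` instead of the notebook's `hasattr(..., "toarray")` check. `OneHotEncoder` output is typically sparse, while `GaussianNB` requires dense input.

```python
import numpy as np
from scipy import sparse
from sklearn.naive_bayes import GaussianNB


def to_dense(X):
    """Hypothetical helper: return a dense NumPy array, densifying sparse matrices."""
    return X.toarray() if sparse.issparse(X) else np.asarray(X)


# Toy sparse matrix, similar in form to one-hot encoded features.
X_sparse = sparse.csr_matrix(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]))
y = ["Good", "Bad", "Good", "Bad"]

# GaussianNB is fitted on the dense version of the data.
model = GaussianNB().fit(to_dense(X_sparse), y)
print(model.predict(to_dense(X_sparse)))
```

Densifying is fine at this dataset's size, but memory grows with rows × one-hot columns, so it can become expensive for very high-cardinality categorical features.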

In [None]:
def load_naive_bayes(model_path="nb_model.pkl"):
    with open(model_path, "rb") as f:
        return pickle.load(f)

In [None]:
def predict_naive_bayes(model, X_sample):
    return model.predict(X_sample)

In [None]:
results_df = pd.DataFrame(list(results.items()), columns=["Model", "Score"])
print(results_df)

Empty DataFrame
Columns: [Model, Score]
Index: []


In [None]:
results = {}

# Example usage
results["Logistic Regression"] = train_logistic_regression(X_train_processed, y_train, X_test_processed, y_test)
results["KNN"] = train_knn(X_train_processed, y_train, X_test_processed, y_test)
results["Naive Bayes"] = train_naive_bayes(X_train_processed, y_train, X_test_processed, y_test)

# Convert to DataFrame
results_df = pd.DataFrame(list(results.items()), columns=["Model", "Score"])
print(results_df)

Logistic Regression Accuracy: 0.9560439560439561
KNN Accuracy (k=5): 0.8278388278388278
Naive Bayes Accuracy: 0.8351648351648352
                 Model     Score
0  Logistic Regression  0.956044
1                  KNN  0.827839
2          Naive Bayes  0.835165
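The accuracies are reported in two separate places above. A sketch of one combined, ranked table follows. The values are copied (rounded to four decimals) from the printed outputs of this notebook, each measured once on the same test split with `random_state=42`; in the notebook itself, the `acc_*` variables and the `results` dictionary could be merged instead of hard-coding numbers.

```python
import pandas as pd

# Test-set accuracies copied from the outputs above (single split, random_state=42).
all_results = {
    "SVM": 0.9707,
    "Random Forest": 0.9121,
    "Bagging": 0.9707,
    "AdaBoost": 0.9487,
    "XGBoost": 0.9817,
    "Logistic Regression": 0.9560,
    "KNN": 0.8278,
    "Naive Bayes": 0.8352,
}

summary = (
    pd.DataFrame(list(all_results.items()), columns=["Model", "Accuracy"])
    .sort_values("Accuracy", ascending=False)
    .reset_index(drop=True)
)
print(summary)
```

Because each score comes from one train/test split, small differences between models (such as SVM versus Bagging) may not be meaningful.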


In [None]:
import pandas as pd

# Load the original dataset
original_df = pd.read_csv('/content/cleaned_makeup_products.csv')

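# Display the first 5 rows, the shape, and the number of missing values per column
display(original_df.head())  # display() is available in Jupyter/Colab
print(original_df.shape)
print(original_df.isnull().sum())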

Unnamed: 0,product_link_id,product_link,category,item_id,product_name,brand,price,num_shades,rating,num_reviews,...,Unnamed: 73,Unnamed: 74,Unnamed: 75,Unnamed: 76,Unnamed: 77,Unnamed: 78,Unnamed: 79,Unnamed: 80,Unnamed: 81,Unnamed: 82
0,8,https://www.ulta.com/p/futurist-skin-tint-seru...,,2612458,Futurist Skin Tint Serum Foundation SPF 20,Find your shade,55,,4.4,951,...,,,,,,,,,,
1,13,https://www.ulta.com/p/dior-forever-fluid-skin...,,2605037,Dior Forever Fluid Skin Glow Foundation,Dior,57,42.0,4.5,2234,...,,,,,,,,,,
2,22,https://www.ulta.com/p/barepro-24hr-wear-skin-...,,2619782,BAREPRO 24HR Wear Skin-Perfecting Matte Liquid...,bareMinerals,44,,4.1,3359,...,,,,,,,,,,
3,32,https://www.ulta.com/p/futurist-hydra-rescue-m...,,2559137,Futurist Hydra Rescue Moisturizing Foundation ...,Estée Lauder,55,,4.4,5071,...,,,,,,,,,,
4,15,https://www.ulta.com/p/mini-cc-cream-with-spf-...,Foundation,2603710,Mini CC+ Cream with SPF 50+,IT Cosmetics,22,,4.3,28718,...,,,,,,,,,,


product_link_id       0
product_link          0
category             22
item_id             520
product_name          0
                   ... 
Unnamed: 78        1386
Unnamed: 79        1384
Unnamed: 80        1384
Unnamed: 81        1384
Unnamed: 82        1384
Length: 83, dtype: int64


In [None]:
#High rated sample

# Reuse df_cleaned if it already exists; otherwise load and clean the data
if 'df_cleaned' not in locals():
    df = load_data('/content/cleaned_makeup_products.csv')
    df_cleaned = clean_data(df)


# Filter for products with average rating >= 4 (Good rating)
good_rating_products = df_cleaned[df_cleaned['average_rating'] >= 4]

# Select a sample product from the filtered list (e.g., the first one)
if not good_rating_products.empty:
    sample_product_good = good_rating_products.iloc[0]

    # Get the average rating and determine the target label
    average_rating_good = sample_product_good['average_rating']
    target_label_good = "Bad" if average_rating_good < 4 else "Good" # This will always be "Good" for these samples

    print("Sample Makeup Product Details (Average Rating >= 4):")
    print(sample_product_good)
    print("\nAverage Rating:", average_rating_good)
    print("Rating Category (from average rating):", target_label_good)
else:
    print("No products found with an average rating greater than or equal to 4 in the dataset.")

Sample Makeup Product Details (Average Rating >= 4):
category                                                                               NaN
item_id                                                                            2612458
product_name                                    Futurist Skin Tint Serum Foundation SPF 20
brand                                                                      Find your shade
price                                                                                   55
num_shades                                                                             NaN
rating                                                                                 4.4
num_reviews                                                                            951
description                              Use the following zoom and pan buttons to cont...
describe_yourself                        20s beauty lovet 26 year female 30 30 somethin...
review_star_1                        

In [None]:
#Low rated sample

# Reuse df_cleaned if it already exists; otherwise load and clean the data
if 'df_cleaned' not in locals():
    df = load_data('/content/cleaned_makeup_products.csv')
    df_cleaned = clean_data(df)


# Filter for products with average rating < 4 (Bad rating)
bad_rating_products = df_cleaned[df_cleaned['average_rating'] < 4]

# Select a sample product from the filtered list (e.g., the first one)
if not bad_rating_products.empty:
    sample_product_bad = bad_rating_products.iloc[0]

    # Get the average rating and determine the target label
    average_rating = sample_product_bad['average_rating']
    target_label = "Bad" if average_rating < 4 else "Good" # This will always be "Bad" for these samples

    print("Sample Makeup Product Details (Average Rating < 4):")
    print(sample_product_bad)
    print("\nAverage Rating:", average_rating)
    print("Rating Category (from average rating):", target_label)
else:
    print("No products found with an average rating less than 4 in the dataset.")

Sample Makeup Product Details (Average Rating < 4):
category                                                                        Foundation
item_id                                                                            2592558
product_name                              Pro Filt'r Soft Matte Longwear Liquid Foundation
brand                                                              FENTY BEAUTY by Rihanna
price                                                                                   40
num_shades                                                                              50
rating                                                                                 3.8
num_reviews                                                                            771
description                              Use the following zoom and pan buttons to cont...
describe_yourself                                                                      NaN
review_star_1                         

In [None]:
# Assuming sample_product_good and sample_product_bad are available from previous steps
# If not, you might need to re-run the cells that create them
if 'sample_product_good' not in locals() or 'sample_product_bad' not in locals():
    print("Sample products not found. Please run the cells to select sample products first.")
else:
    # Convert sample products to DataFrames (preprocessor expects DataFrame-like input)
    # Need to ensure columns match the original X used for preprocessor fitting
    # Load and clean the data again to get the correct column order
    df_loaded_for_cols = load_data('/content/cleaned_makeup_products.csv')
    df_cleaned_for_cols = clean_data(df_loaded_for_cols)
    X_original_for_cols, y_original_for_cols = split_features_target(create_target(df_cleaned_for_cols))


    # Create DataFrames for the sample products, ensuring column order
    sample_product_good_df = pd.DataFrame([sample_product_good], columns=X_original_for_cols.columns)
    sample_product_bad_df = pd.DataFrame([sample_product_bad], columns=X_original_for_cols.columns)

    # Preprocess the sample products using the fitted preprocessor
    # Assuming preprocessor is available from a previous step
    if 'preprocessor' not in locals():
        X_train_processed, X_test_processed, y_train, y_test, preprocessor = preprocess_pipeline('/content/cleaned_makeup_products.csv')


    X_sample_good_processed = preprocessor.transform(sample_product_good_df)
    X_sample_bad_processed = preprocessor.transform(sample_product_bad_df)


    print("Predictions for Sample Products:\n")

    # Load and predict using each trained model
    models = {
        "Logistic Regression": load_logistic_regression(),
        "KNN": load_knn(),
        "Naive Bayes": load_naive_bayes(),
        "SVM": load_svm(),
        "Random Forest": load_random_forest(),
        "Bagging": load_bagging(),
        "AdaBoost": load_adaboost(),
        "XGBoost": load_xgboost()

    }

    # For Naive Bayes, ensure dense input if needed
    X_sample_good_nb = X_sample_good_processed.toarray() if hasattr(X_sample_good_processed, "toarray") else X_sample_good_processed
    X_sample_bad_nb = X_sample_bad_processed.toarray() if hasattr(X_sample_bad_processed, "toarray") else X_sample_bad_processed


    for model_name, model in models.items():
        try:
            if model_name == "Naive Bayes":
                prediction_good = model.predict(X_sample_good_nb)[0]
                prediction_bad = model.predict(X_sample_bad_nb)[0]
            else:
                prediction_good = model.predict(X_sample_good_processed)[0]
                prediction_bad = model.predict(X_sample_bad_processed)[0]

            print(f"{model_name}:")
            print(f"  Sample (Good): Predicted = {prediction_good} (True = Good)")
            print(f"  Sample (Bad): Predicted = {prediction_bad} (True = Bad)")
            print("\n")
        except Exception as e:
            print(f"  Error predicting with {model_name}: {e}")
            print("\n")


Predictions for Sample Products:

Logistic Regression:
  Sample (Good): Predicted = Good (True = Good)
  Sample (Bad): Predicted = Bad (True = Bad)


KNN:
  Sample (Good): Predicted = Good (True = Good)
  Sample (Bad): Predicted = Good (True = Bad)


Naive Bayes:
  Sample (Good): Predicted = Good (True = Good)
  Sample (Bad): Predicted = Bad (True = Bad)


SVM:
  Sample (Good): Predicted = Good (True = Good)
  Sample (Bad): Predicted = Bad (True = Bad)


Random Forest:
  Sample (Good): Predicted = Good (True = Good)
  Sample (Bad): Predicted = Bad (True = Bad)


Bagging:
  Sample (Good): Predicted = Good (True = Good)
  Sample (Bad): Predicted = Bad (True = Bad)


AdaBoost:
  Sample (Good): Predicted = Good (True = Good)
  Sample (Bad): Predicted = Bad (True = Bad)


XGBoost:
  Sample (Good): Predicted = 1 (True = Good)
  Sample (Bad): Predicted = 0 (True = Bad)


