# Final Random Forest Model for Flask Application

This notebook contains the finalized code to build the Random Forest model for the breast cancer prediction web app. It uses the 13 features identified during the feature selection process in `Set3Model.ipynb`.

**Purpose:** To train the model on the *entire* dataset and then serialize (save) the trained model object and the data scaler to disk using `joblib`.

These saved files (`breast_cancer_model.joblib` and `scaler.joblib`) are what the Flask application will load at startup to make predictions on new user input. This separates the model training process (which is done once, here) from the prediction process (which is done by the live app).
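
As a rough illustration of the prediction side, the sketch below loads the two files and classifies a single row. It is not the app's actual code: `load_artifacts` and `predict_one` are hypothetical helper names, and the temporary directory and small 20-tree forest only stand in for this notebook's real training run so that the snippet runs on its own. A Flask route would presumably call something like `predict_one` with values taken from the request.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Column order used at training time; the app must supply features in this order.
SELECTED = [22, 20, 27, 23, 7, 6, 3, 13, 2, 0, 26, 21, 1]


def load_artifacts(model_dir):
    """Hypothetical helper: load the model and scaler once, e.g. at app startup."""
    model = joblib.load(os.path.join(model_dir, "breast_cancer_model.joblib"))
    scaler = joblib.load(os.path.join(model_dir, "scaler.joblib"))
    return model, scaler


def predict_one(model, scaler, raw_features):
    """Hypothetical helper: scale one row of 13 raw values, then classify it."""
    row = np.asarray(raw_features, dtype=float).reshape(1, -1)
    if row.shape[1] != scaler.n_features_in_:
        raise ValueError(
            f"expected {scaler.n_features_in_} features, got {row.shape[1]}"
        )
    scaled = scaler.transform(row)  # reuse the fitted scaler; never refit here
    label = int(model.predict(scaled)[0])
    proba = model.predict_proba(scaled)[0]
    return label, proba


# Stand-in for the training notebook so the sketch is self-contained.
data = load_breast_cancer()
X = data.data[:, SELECTED]
y = data.target

with tempfile.TemporaryDirectory() as tmp:
    scaler = StandardScaler().fit(X)
    model = RandomForestClassifier(n_estimators=20, random_state=123)
    model.fit(scaler.transform(X), y)
    joblib.dump(model, os.path.join(tmp, "breast_cancer_model.joblib"))
    joblib.dump(scaler, os.path.join(tmp, "scaler.joblib"))

    loaded_model, loaded_scaler = load_artifacts(tmp)
    label, proba = predict_one(loaded_model, loaded_scaler, X[0])
    # In this dataset, label 0 is malignant and label 1 is benign.
    print(data.target_names[label], proba)
```

Loading once and reusing the fitted scaler keeps request handling cheap and ensures new inputs are scaled with the same statistics the model saw during training.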

In [3]:
import numpy as np
import joblib
import os
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

print("--- Starting Final Model and Scaler Creation ---")

# --- 1. Load Data ---
breast_cancer = load_breast_cancer()
print("Breast cancer dataset loaded.")

# --- 2. Feature Selection ---
# These are the 13 features selected from the analysis in Set3Model.ipynb
# 'worst perimeter', 'worst radius', 'worst concave points', 'worst area', 'mean concave points', 
# 'mean concavity', 'mean area', 'area error', 'mean perimeter', 'mean radius', 
# 'worst concavity', 'worst texture', 'mean texture'
selected_features_indices = [22, 20, 27, 23, 7, 6, 3, 13, 2, 0, 26, 21, 1]

# Get the corresponding names for printing
feature_names = breast_cancer.feature_names[selected_features_indices]
print(f"\nUsing the following 13 features for the model: {list(feature_names)}")

# Select the feature data (X) and target (y)
X = breast_cancer.data[:, selected_features_indices]
y = breast_cancer.target

# --- 3. Data Scaling ---
# The scaler is fitted on the entire dataset. This SAME fitted scaler must be used
# to transform the input data for prediction in the Flask app.
print("\nFitting the StandardScaler on the entire dataset...")
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print("Data scaling complete.")

# --- 4. Model Training ---
# The Random Forest model is configured with the best parameters found during
# experimentation in Set3Model.ipynb. It is now trained on the ENTIRE scaled dataset
# so the deployed model uses all available data. Because no data is held out here,
# this notebook reports no accuracy figures.
print("\nTraining the final Random Forest model on the entire dataset...")
model = RandomForestClassifier(
    max_features=6,
    n_estimators=200,
    max_depth=10,
    min_samples_split=2,
    min_samples_leaf=1,
    random_state=123
)
model.fit(X_scaled, y)
print("Model training complete.")

# --- 5. Save Model and Scaler ---
# The trained model and scaler are saved to the saved_models directory under the
# current working directory. The Flask app is configured to load these specific files.
print("\nSaving model and scaler to disk using joblib...")
saved_models_dir = os.path.join(os.getcwd(), 'saved_models')
# Create the directory first; joblib.dump does not create missing parent directories.
os.makedirs(saved_models_dir, exist_ok=True)

joblib.dump(model, os.path.join(saved_models_dir, 'breast_cancer_model.joblib'))
joblib.dump(scaler, os.path.join(saved_models_dir, 'scaler.joblib'))


print("\n--- Model, Scaler, and Feature Names Saving Complete ---")
print(f"Files saved in current directory: {saved_models_dir}")
print("  - breast_cancer_model.joblib")
print("  - scaler.joblib")
print("\nThese files are now ready to be used by the Flask application.")

--- Starting Final Model and Scaler Creation ---
Breast cancer dataset loaded.

Using the following 13 features for the model: [np.str_('worst perimeter'), np.str_('worst radius'), np.str_('worst concave points'), np.str_('worst area'), np.str_('mean concave points'), np.str_('mean concavity'), np.str_('mean area'), np.str_('area error'), np.str_('mean perimeter'), np.str_('mean radius'), np.str_('worst concavity'), np.str_('worst texture'), np.str_('mean texture')]

Fitting the StandardScaler on the entire dataset...
Data scaling complete.

Training the final Random Forest model on the entire dataset...
Model training complete.

Saving model and scaler to disk using joblib...

--- Model, Scaler, and Feature Names Saving Complete ---
Files saved in current directory: /Users/christopherphan/School/171_ECS/final_project2/ECS-171/saved_models
  - breast_cancer_model.joblib
  - scaler.joblib

These files are now ready to be used by the Flask application.
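
The hard-coded index list in step 2 can be cross-checked against the feature names listed in its comment. The sketch below is an optional sanity check, not part of the original notebook; `selected_names`, `hard_coded`, and `derived` are illustrative variable names.

```python
from sklearn.datasets import load_breast_cancer

# Names copied from the comment in step 2, in the same order.
selected_names = [
    "worst perimeter", "worst radius", "worst concave points", "worst area",
    "mean concave points", "mean concavity", "mean area", "area error",
    "mean perimeter", "mean radius", "worst concavity", "worst texture",
    "mean texture",
]
# Index list copied from the notebook.
hard_coded = [22, 20, 27, 23, 7, 6, 3, 13, 2, 0, 26, 21, 1]

# Look up each name's column position in the full 30-feature dataset.
all_names = load_breast_cancer().feature_names.tolist()
derived = [all_names.index(name) for name in selected_names]

assert derived == hard_coded, f"index mismatch: {derived}"
print(derived)
```

Deriving the indices from names like this guards against a silent column mix-up if the list is ever edited by hand.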
