### ==============================================================================
### 🚀 SCENARIO - 1: The Impact of Downtime
### ==============================================================================

We take every production run in the test set with high downtime (more than 60 minutes) and ask the model:

`What would the defect rate have been for these exact same runs if their downtime had been reduced to a low level (15 minutes)?`
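
The question above is a counterfactual prediction: the same rows are scored twice, once as observed and once with a single feature overridden. A minimal sketch of that pattern follows. `toy_predict` is a hypothetical stand-in for the fitted model, `what_if` is an illustrative helper, and the downtime values are made up.

```python
import pandas as pd

def what_if(predict, runs, column, new_value):
    """Score `runs` as observed and again with `column` overridden by `new_value`."""
    baseline = predict(runs)
    altered = runs.copy()          # copy so the observed data is left untouched
    altered[column] = new_value
    return baseline, predict(altered)

# Hypothetical stand-in for a fitted model (not the notebook's predictor)
def toy_predict(df):
    return 2.0 + 0.02 * df["Downtime (Minutes)"]

runs = pd.DataFrame({"Downtime (Minutes)": [220.0, 420.0, 270.0]})
baseline, counterfactual = what_if(toy_predict, runs, "Downtime (Minutes)", 15.0)
print((baseline - counterfactual).mean())  # average predicted change across runs
```

The difference between the two sets of predictions shows how the model responds to the changed input. It is not by itself a causal estimate of what would happen on the factory floor.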

In [1]:
import joblib
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

In [2]:
print("1. Loading the saved model, preprocessor, and original test data...")

# Load the pre-trained model and the preprocessor
loaded_model = joblib.load("../model/defect_rate_predictor.joblib")
loaded_preprocessor = joblib.load("../model/preprocessor.joblib")

print("✅ Model and preprocessor loaded successfully.")

1. Loading the saved model, preprocessor, and original test data...
✅ Model and preprocessor loaded successfully.


In [3]:
# Load the raw dataset to get the original, untransformed test data
df_raw = pd.read_csv("../data/smart_phone_surface_plastic_manufacture.csv")
# Features: drop the target, the production output, and the identifier/date columns
X = df_raw.drop(columns=['Defect Rate (%)', 'Production Output (Units)', 'Unnamed: 0', 'Production Run ID', 'Date'])
# Target: keep only the rows where the defect rate is recorded
y = df_raw['Defect Rate (%)'].dropna()

In [4]:
# Split the data into training and testing sets
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X.loc[y.index], y, test_size=0.2, random_state=42)
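
The split above passes `X.loc[y.index]` rather than `X` because `dropna()` removes rows from the target but keeps the original index labels. Selecting by those labels keeps features and target row-aligned. A small sketch with made-up values (the `toy` frame is hypothetical):

```python
import numpy as np
import pandas as pd

# Made-up toy frame: one row has a missing target
toy = pd.DataFrame({
    "Downtime (Minutes)": [30.0, 220.0, 45.0, 310.0],
    "Defect Rate (%)": [2.1, np.nan, 3.0, 9.8],
})

X_toy = toy.drop(columns=["Defect Rate (%)"])
y_toy = toy["Defect Rate (%)"].dropna()   # drops row 1, keeps original index labels

X_aligned = X_toy.loc[y_toy.index]        # keep only rows that still have a target

print(list(y_toy.index))
print(X_aligned.index.equals(y_toy.index))
```

Without the `.loc` selection, `X` would contain more rows than `y`, and `train_test_split` would raise an error about inconsistent sample counts.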

In [5]:
# --- 2. Identify "High Downtime" Runs from the Real Test Set ---
# A "problem run" is any run with more than 60 minutes of downtime
high_downtime_runs = X_test_raw[X_test_raw['Downtime (Minutes)'] > 60].copy()
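
Before overriding downtime for these runs, it can be worth checking that the replacement value lies inside the range of downtime the model saw during training, since predictions outside that range are extrapolations. A minimal sketch, assuming a hypothetical helper `within_observed_range` and made-up training values (in this notebook the natural input would be `X_train_raw['Downtime (Minutes)']`):

```python
import pandas as pd

def within_observed_range(values, candidate):
    """True if `candidate` lies between the min and max of the non-missing values."""
    observed = pd.Series(values).dropna()
    return bool(observed.min() <= candidate <= observed.max())

# Made-up training downtimes, for illustration only
train_downtime = pd.Series([5.0, 12.0, 40.0, 95.0, 300.0])

print(within_observed_range(train_downtime, 15.0))
print(within_observed_range(train_downtime, 1.0))
```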

In [6]:
high_downtime_runs

Row index,Temperature (°C),Pressure (Pa),Cooling Rate (°C/min),Machine Speed (RPM),Raw Material Quality (Score),Humidity (%),Ambient Temperature (°C),Maintenance (Days Since),Operator Shift,Batch Size (Units),Energy Consumption (kWh),Production Line,Downtime (Minutes)
3551,399.391217,15286.9654,0.583935,2045.611881,37.528843,74.199083,,190.0,Night,450.0,96.16514,Line 3,220.0
824,401.890164,15593.199726,0.520471,2344.201651,45.816788,80.523322,32.855938,,Day,470.0,99.568212,Line 1,420.0
2816,375.574849,14581.173646,0.44928,2281.459214,45.915524,74.702273,38.040283,150.0,Night,485.0,79.601751,Line 1,270.0
3376,412.361969,15314.84183,0.528416,2239.080858,41.77093,78.366749,34.783061,180.0,Day,505.0,102.315304,Line 1,320.0
800,400.158578,16035.632392,0.398215,2321.532181,44.406143,86.549924,37.873521,135.0,Night,415.0,103.418507,Line 1,300.0
2055,367.464597,,0.473049,2249.092416,46.437182,83.240119,35.295756,135.0,Night,590.0,102.966068,Line 2,280.0
2379,402.637983,14848.718297,0.420459,2313.905968,42.90056,72.15505,38.533344,,Day,590.0,88.46537,Line 3,310.0
1619,417.150112,14968.417926,0.473741,2232.080469,42.536542,65.387961,40.695524,145.0,Night,525.0,,Line 3,310.0
739,391.405873,15706.981672,0.605575,2264.940691,40.854567,70.69465,39.221904,125.0,Day,625.0,95.279107,Line 3,270.0
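
Several rows in the table above have blank cells, which are missing values. Any feature derived from those columns, such as a ratio of energy to batch size, is also missing for those rows. The saved preprocessor therefore has to tolerate `NaN` inputs, for example through an imputation step. Whether it does is an assumption, since its definition is not shown here. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Made-up rows: the second run has no recorded energy consumption
toy = pd.DataFrame({
    "Energy Consumption (kWh)": [96.0, np.nan, 80.0],
    "Batch Size (Units)": [450.0, 525.0, 400.0],
})

# A ratio feature inherits the missing value from its numerator
toy["Energy_per_Unit"] = toy["Energy Consumption (kWh)"] / (toy["Batch Size (Units)"] + 1e-6)

missing_per_column = toy.isna().sum()
print(missing_per_column)
```

Running `high_downtime_runs.isna().sum()` in the notebook would give the same kind of per-column count for the real data.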


In [7]:
if high_downtime_runs.empty:
    print("No high downtime runs found in the test set to simulate. Please adjust the threshold if needed.")
else:
    # --- 3. Create a "Simulated Optimized" Version of These Runs ---
    # What if these exact runs had their downtime reduced to just 15 minutes?
    simulated_optimized_runs = high_downtime_runs.copy()
    simulated_optimized_runs['Downtime (Minutes)'] = 15.0

    # --- Re-engineer features for both original and simulated data ---
    for df in [high_downtime_runs, simulated_optimized_runs]:
        df['Energy_per_Unit'] = df['Energy Consumption (kWh)'] / (df['Batch Size (Units)'] + 1e-6)
        df['Process_Stress_Index'] = df['Temperature (°C)'] * df['Pressure (Pa)']


    # --- 4. Preprocess and Predict for Both Groups ---
    # Preprocess the original high-downtime runs
    original_processed = loaded_preprocessor.transform(high_downtime_runs)
    # Preprocess the simulated low-downtime runs
    simulated_processed = loaded_preprocessor.transform(simulated_optimized_runs)

    # Rebuild the column names so the high-VIF features can be dropped by name.
    # NOTE: the concatenation order below is assumed to match the order of the
    # transformers inside the ColumnTransformer ('skewed', 'symmetric', 'cat').
    ohe_feature_names = loaded_preprocessor.named_transformers_['cat'].get_feature_names_out()
    all_feature_names = (
        list(loaded_preprocessor.named_transformers_['skewed'].feature_names_in_) +
        list(loaded_preprocessor.named_transformers_['symmetric'].feature_names_in_) +
        list(ohe_feature_names)
    )

    original_processed_df = pd.DataFrame(original_processed, columns=all_feature_names)
    simulated_processed_df = pd.DataFrame(simulated_processed, columns=all_feature_names)

    # Drop high-VIF columns from both
    cols_to_drop = ['Process_Stress_Index', 'Pressure (Pa)']
    original_final = original_processed_df.drop(columns=cols_to_drop)
    simulated_final = simulated_processed_df.drop(columns=cols_to_drop)

    # Predict the defect rate for both groups
    original_predicted_defects = loaded_model.predict(original_final)
    simulated_predicted_defects = loaded_model.predict(simulated_final)


    # --- 5. Display the Clear Business Impact ---
    avg_original_defect_rate = np.mean(original_predicted_defects)
    avg_simulated_defect_rate = np.mean(simulated_predicted_defects)
    improvement = avg_original_defect_rate - avg_simulated_defect_rate

    print("\n--- Business Impact Simulation ---")
    print(f"Average Predicted Defect Rate for High-Downtime Runs: {avg_original_defect_rate:.2f}%")
    print(f"Predicted Defect Rate if Downtime is Reduced: {avg_simulated_defect_rate:.2f}%")
    print("-" * 50)
    print(f"✅ By focusing on reducing downtime, the model predicts an average improvement of {improvement:.2f} percentage points in the defect rate.")
    print("\nThis provides a clear, data-driven justification for prioritizing process improvements to reduce downtime, directly leading to increased effective capacity.")


--- Business Impact Simulation ---
Average Predicted Defect Rate for High-Downtime Runs: 10.05%
Predicted Defect Rate if Downtime is Reduced: 3.21%
--------------------------------------------------
✅ By focusing on reducing downtime, the model predicts an average improvement of 6.84 percentage points in the defect rate.

This provides a clear, data-driven justification for prioritizing process improvements to reduce downtime, directly leading to increased effective capacity.
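
An average over a handful of runs can hide a lot of spread, so it may help to look at the per-run change as well. A minimal sketch, using made-up prediction arrays rather than the notebook's values (in the notebook the inputs would be `original_predicted_defects` and `simulated_predicted_defects`):

```python
import numpy as np

# Made-up predictions for illustration only, not the notebook's values
original = np.array([9.0, 11.0, 10.0, 12.0])
simulated = np.array([3.0, 4.0, 3.5, 2.5])

per_run_improvement = original - simulated   # percentage points, one value per run

summary = {
    "mean": float(per_run_improvement.mean()),
    "min": float(per_run_improvement.min()),
    "max": float(per_run_improvement.max()),
}
print(summary)
```

If the minimum and maximum are far apart, the average improvement is driven by a few runs and should be read with more caution.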
