# Part 3 - Predict: Layoff Risk Demo

Use the trained layoff-risk model to score a sample CSV and summarize results.

## Setup
- Data: `Data/cleaned_dataset.csv` (with target) and `Data/prediction_sample.csv` (features only).
- Best model: Decision Tree from Part 2 (`class_weight='balanced'`, `max_depth=4`, `min_samples_leaf=2`).
- Task: load the sample CSV, predict risk, print the predictions, and give a brief takeaway.

In [None]:
from pathlib import Path
import pandas as pd
import joblib
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

pd.set_option('display.max_columns', None)
NOTEBOOK_DIR = Path(__file__).resolve().parent if '__file__' in globals() else Path().resolve()
DATA_DIR = NOTEBOOK_DIR.parent / 'Data'
TRAIN_PATH = DATA_DIR / 'cleaned_dataset.csv'
PRED_PATH = DATA_DIR / 'prediction_sample.csv'
MODEL_PATH = NOTEBOOK_DIR.parent / "Model" / "best_model.pkl"
print('Model path:', MODEL_PATH)
print('Train path:', TRAIN_PATH)
print('Predict input:', PRED_PATH)


In [None]:
# Load training data

df = pd.read_csv(TRAIN_PATH)
target_col = 'target_high_risk'
X = df.drop(columns=[target_col])
y = df[target_col]

categorical_cols = ['company', 'industry', 'headquarter_location', 'status']
X.head()


In [None]:
# Load or train the best model (Decision Tree)

preprocess = ColumnTransformer(
    [('cat', OneHotEncoder(handle_unknown='ignore'), categorical_cols)],
    remainder='passthrough'
)

dt_model = DecisionTreeClassifier(
    max_depth=4,
    min_samples_leaf=2,
    class_weight='balanced',
    random_state=42,
)

if MODEL_PATH.exists():
    clf = joblib.load(MODEL_PATH)
    print("Loaded saved model from", MODEL_PATH)
else:
    clf = Pipeline([
        ('prep', preprocess),
        ('model', dt_model),
    ])
    clf.fit(X, y)
    MODEL_PATH.parent.mkdir(exist_ok=True)
    joblib.dump(clf, MODEL_PATH)
    print("Model trained and saved to", MODEL_PATH)

print("Class balance:")
print(y.value_counts())


In [None]:
# Load prediction CSV and run inference

pred_df = pd.read_csv(PRED_PATH)
preds = clf.predict(pred_df)
pred_df_with_preds = pred_df.copy()
pred_df_with_preds['pred_high_risk'] = preds

pred_df_with_preds[['company', 'industry', 'layoff_ratio', 'pred_high_risk']]
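
The table above lists individual predictions. A quick aggregate view can make the takeaway easier to state. The sketch below is one possible way to summarize a scored table; the `scored` DataFrame is a toy stand-in for `pred_df_with_preds`, and its values are illustrative, not real results.

```python
import pandas as pd

# Toy stand-in for pred_df_with_preds (assumed columns; values are made up).
scored = pd.DataFrame({
    "industry": ["Retail", "Finance", "Retail", "Tech"],
    "pred_high_risk": [1, 0, 1, 0],
})

# Number of rows predicted 0 (not high risk) and 1 (high risk).
counts = scored["pred_high_risk"].value_counts().sort_index()

# Fraction of rows flagged as high risk.
high_risk_share = scored["pred_high_risk"].mean()

# Fraction flagged as high risk within each industry.
by_industry = scored.groupby("industry")["pred_high_risk"].mean()

print(counts)
print(f"Share flagged high risk: {high_risk_share:.0%}")
print(by_industry)
```

In the notebook, the same three expressions could be applied to `pred_df_with_preds` directly.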


## Prediction summary
- The model flags a subset of companies as high layoff risk (`pred_high_risk = 1`), based on historical layoff ratios and the recency of layoffs.
- Expected features: the same columns as `cleaned_dataset.csv` minus `target_high_risk`; see `Data/prediction_sample.csv` for the format.
- To score a new file, replace `prediction_sample.csv` with your own CSV (same columns), or point `PRED_PATH` at it, and rerun the notebook.
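
Since a new file must carry the same columns as the training features, it can help to check this before calling `predict`. The sketch below shows one way to do so. `align_features` is a hypothetical helper, not part of the notebook, and the toy DataFrame stands in for a real CSV.

```python
import pandas as pd


def align_features(new_df: pd.DataFrame, expected_cols: list) -> pd.DataFrame:
    """Return new_df restricted to expected_cols, in that order.

    Hypothetical helper: raises ValueError if a training feature is
    missing, and drops any column the model was not trained on.
    """
    missing = [c for c in expected_cols if c not in new_df.columns]
    if missing:
        raise ValueError(f"Missing feature columns: {missing}")
    return new_df[expected_cols]


# Toy data: column names are assumed for illustration only.
expected = ["company", "industry", "layoff_ratio"]
new_file = pd.DataFrame({
    "layoff_ratio": [0.12, 0.40],
    "company": ["A", "B"],
    "extra_note": ["x", "y"],  # not a training feature, will be dropped
    "industry": ["Retail", "Finance"],
})

aligned = align_features(new_file, expected)
print(list(aligned.columns))
```

In the notebook this might be used as `clf.predict(align_features(pd.read_csv(PRED_PATH), list(X.columns)))`, assuming `X` still holds the training features and the saved model was fit on those same columns.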