# Stage 13 Homework Starter — Productization

## Objective
Deploy your trained model as a **reusable, handoff-ready API or dashboard** and finalize your project for reproducibility and clarity.

## Steps
1. Create a basic mock analysis in a notebook.
2. Clean your notebook by removing exploratory cells and documenting your code.
3. Move reusable functions into `/src/`.
4. Load your trained model from Stage 12 or earlier stages.
5. Pickle (save) the model and test that it reloads.
6. Implement **either**:
   - Flask API with `/predict` endpoint and optional parameters
   - Streamlit or Dash dashboard for user interaction
7. Include:
   - Error handling for invalid inputs
   - `requirements.txt` for reproducibility
   - Documentation in `README.md`
8. Test your deployment locally and provide evidence.
9. Organize project folders and finalize notebooks for handoff.

In [6]:
from pathlib import Path
import os, pickle
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import joblib
import sys
sys.path.append("..")  # make the project root (and src/) importable when running from notebooks/

## 1. Create a basic mock analysis

In [2]:
# detect project root (either current dir or parent if running from notebooks/)
CWD = os.getcwd()
if os.path.exists(os.path.join(CWD, "model")) and os.path.exists(os.path.join(CWD, "data")):
    ROOT = CWD
else:
    ROOT = os.path.dirname(CWD)

DATA_DIR = os.path.join(ROOT, "data")
MODEL_DIR = os.path.join(ROOT, "model")
REPORTS_DIR = os.path.join(ROOT, "reports")

# make sure dirs exist
for d in [DATA_DIR, MODEL_DIR, REPORTS_DIR]:
    os.makedirs(d, exist_ok=True)

print(f"[Path] ROOT={ROOT}")
print(f"[Path] MODEL_DIR={MODEL_DIR}")
print(f"[Path] REPORTS_DIR={REPORTS_DIR}")

# generate synthetic data
X, y = make_regression(n_samples=200, n_features=2, noise=0.3, random_state=42)
df = pd.DataFrame(X, columns=["feature1", "feature2"])
df["target"] = y

# Drop duplicates and fill missing numeric values with the column mean
df = df.drop_duplicates().copy()
for col in df.select_dtypes(include="number").columns:
    df[col] = df[col].fillna(df[col].mean())

# add an interaction feature
df["f1_x_f2"] = df["feature1"] * df["feature2"]

# Train/test split + model training 
X = df[["feature1", "feature2", "f1_x_f2"]]
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


model = LinearRegression()
model.fit(X_train, y_train)

# Evaluation metrics
preds = model.predict(X_test)
# Take the square root explicitly; newer scikit-learn releases drop the squared= argument
rmse = np.sqrt(mean_squared_error(y_test, preds))
r2 = r2_score(y_test, preds)
print(f"[Metrics] RMSE = {rmse:.3f} | R² = {r2:.3f}")

# Save a simple scatter plot (True vs Predicted) 
plt.figure()
plt.scatter(y_test, preds, alpha=0.7)
plt.xlabel("True")
plt.ylabel("Predicted")
plt.title("True vs Predicted (mock)")
line_min = float(min(y_test.min(), preds.min()))
line_max = float(max(y_test.max(), preds.max()))
plt.plot([line_min, line_max], [line_min, line_max], "r--")
plot_path = os.path.join(REPORTS_DIR, "true_vs_pred.png")
plt.savefig(plot_path, dpi=150, bbox_inches="tight")
plt.close()
print(f"[Saved] Plot -> {plot_path}")

# Save the model to disk 
model_path = os.path.join(MODEL_DIR, "model.pkl")
joblib.dump(model, model_path)
print(f"[Saved] Model -> {model_path}")

# Reload model + single prediction 
reloaded = joblib.load(model_path)
sample = pd.DataFrame([{
    "feature1": 0.5,
    "feature2": 4.2,
    "f1_x_f2": 0.5 * 4.2
}])
sample_pred = reloaded.predict(sample)[0]
print(f"[Reload check] Single prediction = {sample_pred:.3f}")


[Path] ROOT=C:\Users\27228\bootcamp_Zimeng_He\homework\homework13
[Path] MODEL_DIR=C:\Users\27228\bootcamp_Zimeng_He\homework\homework13\model
[Path] REPORTS_DIR=C:\Users\27228\bootcamp_Zimeng_He\homework\homework13\reports
[Metrics] RMSE = 0.303 | R² = 1.000
[Saved] Plot -> C:\Users\27228\bootcamp_Zimeng_He\homework\homework13\reports\true_vs_pred.png
[Saved] Model -> C:\Users\27228\bootcamp_Zimeng_He\homework\homework13\model\model.pkl
[Reload check] Single prediction = 135.475


## 2. Notebook Cleanup
Remove exploratory cells and document your code.

In [3]:
# TODO: Remove exploratory cells
# TODO: Document your code clearly
# Example placeholder for cleaned analysis
print("Notebook cleaned and ready for handoff.")

Notebook cleaned and ready for handoff.


## 3. Move reusable functions to /src/
Create `src/utils.py` and move your reusable functions there.
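
As a minimal sketch of what could live in `src/utils.py`: the notebook builds the `f1_x_f2` interaction in Section 1 and the API rebuilds it inside `/predict`, so one shared helper would keep the two in sync. The function name `build_feature_row` is hypothetical and not part of the starter.

```python
# src/utils.py (hypothetical module layout)
from typing import Dict


def build_feature_row(feature1: float, feature2: float) -> Dict[str, float]:
    """Return one model-input row, including the interaction term.

    The keys mirror the training columns used in Section 1.
    """
    f1 = float(feature1)
    f2 = float(feature2)
    return {"feature1": f1, "feature2": f2, "f1_x_f2": f1 * f2}
```

In the notebook or in `app.py`, `pd.DataFrame([build_feature_row(0.5, 4.2)])` could then replace the hand-written dictionaries.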

## 4. Folder Structure Reminder

Ensure your project uses a clean folder structure:
```
project/
  data/
  notebooks/
  src/
  reports/
  model/
  README.md
```
For an API or dashboard, a minimal layout is:
```
project/
    app.py
    model.pkl
    requirements.txt
    README.md
```

## 5. Pickle / Save Final Model

### TODO: Replace this with your trained model

In [4]:
import pickle
os.makedirs("model", exist_ok=True)
model_path = os.path.join("model", "model.pkl")

with open(model_path, "wb") as f:
    pickle.dump(model, f)

print(f"Model saved to {model_path}")

Model saved to model\model.pkl
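
Step 5 also asks for a reload test, which the cell above does not cover for the pickle file. A minimal sketch of a round-trip check follows. The helper name `roundtrip_pickle` and the stand-in dictionary are assumptions for illustration; in the notebook you would pass the fitted `model` and compare predictions instead.

```python
import os
import pickle
import tempfile


def roundtrip_pickle(obj, path):
    """Write obj to path with pickle, then read it back and return the copy."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in object; replace with the fitted model in the notebook.
stand_in = {"coef": [1.5, -2.0, 0.25], "intercept": 0.1}

# A temporary directory keeps the check from touching the real model/ folder.
with tempfile.TemporaryDirectory() as tmp:
    reloaded_obj = roundtrip_pickle(stand_in, os.path.join(tmp, "model.pkl"))

print("Reload matches original:", reloaded_obj == stand_in)
```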


## 6. Flask API Starter

### TODO: Implement Flask endpoints for /predict and /plot

In [8]:
from flask import Flask, request, jsonify, Response
import joblib, io
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import os

app = Flask(__name__)
MODEL_PATH = os.path.join("model", "model.pkl")

# Load the model once at import time
# (the before_first_request hook was removed in Flask 2.3)
model = joblib.load(MODEL_PATH)

@app.route("/predict", methods=["POST"])
def predict():
    """
    Example: POST /predict
    Body: {"feature1": 0.5, "feature2": 4.2}
    """
    try:
        data = request.get_json(force=True)
        f1 = float(data["feature1"])
        f2 = float(data["feature2"])
        df = pd.DataFrame([{"feature1": f1, "feature2": f2, "f1_x_f2": f1 * f2}])
        pred = model.predict(df)[0]
        return jsonify({"prediction": float(pred)})
    except Exception as e:
        return jsonify({"error": str(e)}), 400

@app.route("/plot", methods=["GET"])
def plot():
    # Simple sine curve for demo
    x = np.linspace(0, 2*np.pi, 200)
    y = np.sin(x)

    fig, ax = plt.subplots()
    ax.plot(x, y)
    ax.set_title("Health Check Plot")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    buf.seek(0)
    return Response(buf.getvalue(), mimetype="image/png")

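if __name__ == "__main__":
    # In a notebook, debug=True starts the reloader, which ends the cell with
    # SystemExit; run this file as `python app.py` or pass use_reloader=False.
    app.run(host="127.0.0.1", port=5000, debug=True)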



 * Serving Flask app '__main__'
 * Debug mode: on


 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
 * Restarting with watchdog (windowsapi)


SystemExit: 1
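
Step 7 asks for error handling on invalid inputs. The `/predict` handler above returns whatever exception text Python raises, so a missing key surfaces as just `'feature1'`. One possible refinement, sketched here with a hypothetical helper `parse_features`, validates the JSON body first and produces readable messages that the handler could return with status 400.

```python
import math

REQUIRED_FIELDS = ("feature1", "feature2")  # matches the /predict example body


def parse_features(payload):
    """Return (feature1, feature2) as finite floats, or raise ValueError."""
    if not isinstance(payload, dict):
        raise ValueError("Request body must be a JSON object.")
    values = []
    for name in REQUIRED_FIELDS:
        if name not in payload:
            raise ValueError(f"Missing required field: {name}")
        raw = payload[name]
        # bool is a subclass of int, so reject it explicitly
        if isinstance(raw, bool):
            raise ValueError(f"Field {name} must be a number.")
        try:
            value = float(raw)
        except (TypeError, ValueError):
            raise ValueError(f"Field {name} must be a number.") from None
        if not math.isfinite(value):
            raise ValueError(f"Field {name} must be finite.")
        values.append(value)
    return tuple(values)
```

Inside `predict()`, `f1, f2 = parse_features(data)` could replace the two `float(...)` lines, with `except ValueError as e` mapped to the 400 response.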

## 7. Testing the Flask API from Notebook

### TODO: Modify examples with your actual features

In [11]:
import requests

base = "http://127.0.0.1:5000"

# 1) Test POST /predict
resp = requests.post(f"{base}/predict", json={"feature1": 0.5, "feature2": 4.2})
print("POST /predict:", resp.json())

# 2) Test GET /plot
resp = requests.get(f"{base}/plot")
print("GET /plot:", resp.status_code, "bytes:", len(resp.content))

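# Save returned plot as evidence (REPORTS_DIR is defined in Section 1;
# this avoids changing the working directory with os.chdir)
with open(os.path.join(REPORTS_DIR, "api_plot_check.png"), "wb") as f:
    f.write(resp.content)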


POST /predict: {'prediction': 135.47472387825084}
GET /plot: 200 bytes: 26217


## 8. Handoff Best Practices

- Ensure `README.md` is complete and clear
- Provide `requirements.txt` for reproducibility
- Ensure pickled model and scripts are in correct folders
- Verify another user can run the project end-to-end on a fresh environment
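
For the `requirements.txt` item, pinning the versions actually installed in the working environment makes the fresh-environment check more likely to succeed. A rough sketch using the standard library's `importlib.metadata` follows. The package list is an assumption based on the imports in this notebook, and the helper name `pinned_requirements` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed from the imports used in this notebook; adjust to your project.
PACKAGES = ["flask", "joblib", "matplotlib", "numpy", "pandas", "requests", "scikit-learn"]


def pinned_requirements(packages):
    """Return 'name==version' lines; packages that are not installed become comments."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            lines.append(f"# {name}: not installed in this environment")
    return lines


print("\n".join(pinned_requirements(PACKAGES)))
```

The printed lines can be pasted into `requirements.txt`; any commented line flags a dependency to install or remove before handoff.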