# Stage 13 Homework Starter — Productization

## Objective
Deploy your trained model as a **reusable, handoff-ready API or dashboard** and finalize your project for reproducibility and clarity.

## Steps
1. Create a mock, very basic analysis in a notebook.
2. Clean your notebook by removing exploratory cells and documenting your code.
3. Move reusable functions into `/src/`.
4. Load your trained model from Stage 12 or earlier stages.
5. Pickle/save the model and test reload.
6. Implement **either**:
   - Flask API with `/predict` endpoint and optional parameters
   - Streamlit or Dash dashboard for user interaction
7. Include:
   - Error handling for invalid inputs
   - `requirements.txt` for reproducibility
   - Documentation in `README.md`
8. Test your deployment locally and provide evidence.
9. Organize project folders and finalize notebooks for handoff.

## 1. Create mock, very basic analysis

In [1]:
import numpy as np
import pandas as pd

df = pd.read_csv('../../project/data/processed/VIX_S&P500_features.csv', parse_dates = ['date'])
df.head()

Unnamed: 0,date,vix_close,vix_high,vix_low,vix_open,sp500_close,sp500_high,sp500_low,sp500_open,sp500_volume,log_sp500_close,vix_delta,vix_spread,sp500_delta,sp500_spread
0,2019-08-28,19.35,21.639999,19.1,20.549999,2887.939941,2890.030029,2853.050049,2861.280029,3102480000,7.968299,-1.199999,2.539999,26.659912,36.97998
1,2019-08-29,17.879999,19.200001,17.6,19.02,2924.580078,2930.5,2905.669922,2910.370117,3177150000,7.980906,-1.140001,1.6,14.209961,24.830078
2,2019-08-30,18.98,19.18,17.09,17.940001,2926.459961,2940.429932,2913.320068,2937.090088,3009910000,7.981549,1.039999,2.09,-10.630127,27.109863
3,2019-09-03,19.66,21.15,19.41,20.959999,2906.27002,2914.389893,2891.850098,2909.01001,3427830000,7.974626,-1.299999,1.74,-2.73999,22.539795
4,2019-09-04,17.33,18.83,17.26,18.23,2937.780029,2938.840088,2921.860107,2924.669922,3167900000,7.985409,-0.9,1.57,13.110107,16.97998


## 2. Notebook Cleanup
Remove exploratory cells and document your code.

In [2]:
print("Notebook cleaned and ready for handoff.")

Notebook cleaned and ready for handoff.


## 3. Move reusable functions to /src/
Create src/utils.py and store functions there.

In [4]:
import sys
import os

sys.path.append(os.path.abspath(''))
from src.utils import *

print('src/utils.py successfully implemented.')

src/utils.py successfully implemented.


## 4. Folder Structure Reminder

Ensure your project uses a clean folder structure:
```
project/
  data/
  notebooks/
  src/
  reports/
  model/
  README.md
```
For API/Dashboard: minimal example:
```
project/
    app.py
    model.pkl
    requirements.txt
    README.md
```

## 5. Pickle / Save Final Model

In [15]:
# Transformed regression with delta & train-test split

import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split
import scipy.stats as st

X = df[['vix_close']]
y = df['log_sp500_close']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, shuffle = True)
lr = LinearRegression().fit(X_train, y_train)
y_pred = lr.predict(X_test)
r2 = r2_score(y_test, y_pred)
rmse = mean_squared_error(y_test, y_pred, squared = False)
print(f'Baseline R² = {r2:.4f}, RMSE = {rmse:.6f}')

Baseline R² = 0.2446, RMSE = 0.194553


In [20]:
import pickle

os.makedirs('model', exist_ok = True)

with open('model/model.pkl', 'wb') as f:
    pickle.dump(lr, f)

with open('model/model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

print(loaded_model.predict([[12]]))
print(loaded_model.predict([[15]]))
print(loaded_model.predict([[18]]))
print(loaded_model.predict([[21]]))
print(loaded_model.predict([[24]]))
print(loaded_model.predict([[27]]))

[8.47520556]
[8.43615754]
[8.39710951]
[8.35806149]
[8.31901347]
[8.27996545]




## 6. Flask API Starter

### Implement Flask endpoints for /predict and /plot

In [32]:
import jinja2
try:
    from markupsafe import Markup, escape
    jinja2.Markup = Markup
    jinja2.escape = escape
except Exception as e:
    raise ImportError("markupsafe is required for Jinja2 >= 3.1") from e

import os, io, base64, threading, pickle
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from flask import Flask, request, jsonify

MODEL_PATH = "model/model.pkl"
FEATURES = ["vix_close", "log_sp500_close"]

if os.path.exists(MODEL_PATH):
    with open(MODEL_PATH, "rb") as f:
        model = pickle.load(f)
else:
    class _DummyModel:
        def predict(self, X):
            X = np.asarray(X)
            w = np.array([0.002, -0.0001])[:X.shape[1]]
            b = 0.05
            return (X @ w) + b
    model = _DummyModel()

app = Flask(__name__)

def _validate_features_dict(d):
    if not isinstance(d, dict):
        return None, "features must be a JSON object"
    missing = [k for k in FEATURES if k not in d]
    if missing:
        return None, f"missing features: {missing}"
    try:
        row = [float(d[k]) for k in FEATURES]
    except Exception:
        return None, "all features must be numeric"
    return row, None

@app.route("/health", methods=["GET"])
def health():
    return jsonify({"ok": True, "model_loaded": model is not None, "features": FEATURES})

@app.route("/predict", methods=["POST"])
def predict_post():
    payload = request.get_json(silent=True) or {}
    feats = payload.get("features")
    if feats is None:
        return jsonify({"error": "send JSON with a 'features' object"}), 400
    row, err = _validate_features_dict(feats)
    if err:
        return jsonify({"error": err}), 400
    try:
        yhat = float(model.predict([row])[0])
        return jsonify({"prediction": yhat})
    except Exception as e:
        return jsonify({"error": str(e)}), 500

@app.route("/predict/<input1>", methods=["GET"])
def predict_get_one(input1):
    try:
        x1 = float(input1)
    except Exception:
        return jsonify({"error": "input1 must be numeric"}), 400
    row = [x1] + [0.0] * (len(FEATURES) - 1)
    try:
        yhat = float(model.predict([row])[0])
        return jsonify({"prediction": yhat, "features_used": dict(zip(FEATURES, row))})
    except Exception as e:
        return jsonify({"error": str(e)}), 500

@app.route("/predict/<input1>/<input2>", methods=["GET"])
def predict_get_two(input1, input2):
    if len(FEATURES) < 2:
        return jsonify({"error": "model expects < 2 features; use /predict/<input1>"}), 400
    try:
        x1, x2 = float(input1), float(input2)
    except Exception:
        return jsonify({"error": "both inputs must be numeric"}), 400
    row = [x1, x2] + [0.0] * (len(FEATURES) - 2)
    try:
        yhat = float(model.predict([row])[0])
        return jsonify({"prediction": yhat, "features_used": dict(zip(FEATURES, row))})
    except Exception as e:
        return jsonify({"error": str(e)}), 500

@app.route("/plot", methods=["GET"])
def plot():
    title = jinja2.escape(request.args.get("title", "Realized Volatility vs SPX Returns"))

    # 1. Log S&P 500 over time
    
    fig, ax = plt.subplots()
    ax.plot(df['date'], df['log_sp500_close'])
    ax.set_title("Log S&P 500 over Time")
    ax.set_xlabel("date")
    ax.set_ylabel("SPX log_sp500_close")
    ax.grid(True)
    
    buf = io.BytesIO()
    fig.savefig(buf, format='png')
    buf.seek(0)
    img_bytes = base64.b64encode(buf.read()).decode('utf-8')
    images_html += f'<h3>Log S&P 500 over Time</h3><img src="data:image/png;base64,{img_bytes}"/><br>'
    plt.close(fig)

    # 2. VIX Over Time
    
    fig, ax = plt.subplots()
    ax.plot(df['date'], df['vix_close'])
    ax.set_title("VIX Over Time")
    ax.set_xlabel("date")
    ax.set_ylabel("vix_close")
    ax.grid(True)

    buf = io.BytesIO()
    fig.savefig(buf, format='png')
    buf.seek(0)
    img_bytes = base64.b64encode(buf.read()).decode('utf-8')
    images_html += f'<h3>VIX Over Time</h3><img src="data:image/png;base64,{img_bytes}"/><br>'
    plt.close(fig)

def _run(): app.run(port=5000, debug=False, use_reloader=False)
threading.Thread(target=_run, daemon=True).start()
print("Flask running at http://127.0.0.1:5000  (try /health, /predict, /plot)")

Flask running at http://127.0.0.1:5000  (try /health, /predict, /plot)
 * Serving Flask app "__main__" (lazy loading)
 * Environment: production
[2m   Use a production WSGI server instead.[0m
 * Debug mode: off


Exception in thread Thread-6 (_run):
Traceback (most recent call last):
  File "/Users/willwu/anaconda3/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/Users/willwu/anaconda3/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/var/folders/6w/t2sqjkx12cg3m5nlk8pwxlxm0000gn/T/ipykernel_24980/2096075346.py", line 127, in _run
  File "/Users/willwu/anaconda3/lib/python3.10/site-packages/flask/app.py", line 990, in run
    run_simple(host, port, self, **options)
  File "/Users/willwu/anaconda3/lib/python3.10/site-packages/werkzeug/serving.py", line 1052, in run_simple
    inner()
  File "/Users/willwu/anaconda3/lib/python3.10/site-packages/werkzeug/serving.py", line 996, in inner
    srv = make_server(
  File "/Users/willwu/anaconda3/lib/python3.10/site-packages/werkzeug/serving.py", line 847, in make_server
    return ThreadedWSGIServer(
  File "/Users/willwu/anaconda3/lib/python3.10/site-packages/werkze

## 7. Testing the Flask API from Notebook

### TODO: Modify examples with your actual features

In [34]:
import requests
from IPython.display import display, HTML

# Example feature vector
example_features = [0.070698, 0.387695, 0.85, 0.09, 260.319, 6.9, 721.180880, -0.999462, 9.0]

# POST /predict
response = requests.post(
    'http://127.0.0.1:5000/predict',
    json={'features':example_features}
)
print(response.json())

# GET /predict/<input1>
response2 = requests.get('http://127.0.0.1:5000/predict/2.0')
print(response2.json())

# GET /predict/<input1>/<input2>
response3 = requests.get('http://127.0.0.1:5000/predict/1.0/3.0')
print(response3.json())

# GET /plot
response_plot = requests.get('http://127.0.0.1:5000/plot')
display(HTML(response_plot.text))

127.0.0.1 - - [28/Aug/2025 14:18:56] "[37mGET /health HTTP/1.1[0m" 200 -
127.0.0.1 - - [28/Aug/2025 14:18:56] "[31m[1mPOST /predict HTTP/1.1[0m" 400 -
127.0.0.1 - - [28/Aug/2025 14:18:56] "[35m[1mGET /predict/15.2 HTTP/1.1[0m" 500 -
127.0.0.1 - - [28/Aug/2025 14:18:56] "[35m[1mGET /predict/15.2/5200 HTTP/1.1[0m" 500 -
127.0.0.1 - - [28/Aug/2025 14:18:56] "[37mGET /plot?title=Demo%20Chart HTTP/1.1[0m" 200 -


Health: {'features': ['vix_close', 'log_sp500_close'], 'model_loaded': True, 'ok': True}
POST /predict: 400 {"error":"missing features: ['log_sp500_close']"}

GET one: {"error":"X has 2 features, but LinearRegression is expecting 1 features as input."}

GET two: {"error":"X has 2 features, but LinearRegression is expecting 1 features as input."}

Plot HTML length: 38758


## 8. Optional Streamlit / Dash Dashboard

### TODO: Add dashboard in a separate file (`app_streamlit.py` or `app_dash.py`)

## 9. Handoff Best Practices

- Ensure README.md is complete and clear
- Provide `requirements.txt` for reproducibility
- Ensure pickled model and scripts are in correct folders
- Verify another user can run the project end-to-end on a fresh environment