# Stage 13 Homework Starter — Productization

## Objective
Deploy your trained model as a **reusable, handoff-ready API or dashboard** and finalize your project for reproducibility and clarity.

## Steps
1. Create a mock, very basic analysis in a notebook.
2. Clean your notebook by removing exploratory cells and documenting your code.
3. Move reusable functions into `/src/`.
4. Load your trained model from Stage 12 or earlier stages.
5. Pickle/save the model and test reload.
6. Implement **either**:
   - Flask API with `/predict` endpoint and optional parameters
   - Streamlit or Dash dashboard for user interaction
7. Include:
   - Error handling for invalid inputs
   - `requirements.txt` for reproducibility
   - Documentation in `README.md`
8. Test your deployment locally and provide evidence.
9. Organize project folders and finalize notebooks for handoff.

## 1. Create a mock, very basic analysis

In [2]:
import pandas as pd
import numpy as np

# TODO: Basic analysis step 1 - Load data
# Here we simulate a small dataset as an example
df = pd.DataFrame({
    "feature1": np.random.normal(0, 1, 100),
    "feature2": np.random.normal(5, 2, 100),
    "target": np.random.choice([0, 1], size=100)
})

print("Step 1: Data loaded.")
print(df.head())

# TODO: Basic analysis step 2 - Summary statistics
summary = df.describe()
print("\nStep 2: Summary statistics:")
print(summary)

# TODO: Basic analysis step 3 - Correlation analysis
corr = df.corr()
print("\nStep 3: Correlation matrix:")
print(corr)

# TODO: Basic analysis step 4 - Simple model fit (logistic regression as example)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X = df[["feature1", "feature2"]]
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)

score = model.score(X_test, y_test)
print(f"\nStep 4: Logistic Regression test accuracy = {score:.3f}")

# Final step
print("\nBasic analysis complete.")


Step 1: Data loaded.
   feature1  feature2  target
0 -0.286585  5.295582       1
1 -0.056931  5.030768       1
2  0.467220  1.759712       0
3 -0.499170  0.480741       0
4 -0.095233  6.134598       0

Step 2: Summary statistics:
         feature1    feature2      target
count  100.000000  100.000000  100.000000
mean    -0.055147    5.159591    0.500000
std      0.970519    1.964934    0.502519
min     -3.512905    0.149378    0.000000
25%     -0.606554    4.020283    0.000000
50%     -0.003779    5.198314    0.500000
75%      0.547156    6.442581    1.000000
max      2.377478    9.538248    1.000000

Step 3: Correlation matrix:
          feature1  feature2    target
feature1  1.000000  0.022601 -0.020480
feature2  0.022601  1.000000  0.018578
target   -0.020480  0.018578  1.000000

Step 4: Logistic Regression test accuracy = 0.350

Basic analysis complete.


## 2. Notebook Cleanup
Remove exploratory cells and document your code.

In [None]:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "feature1": np.random.normal(0, 1, 100),
    "feature2": np.random.normal(5, 2, 100),
    "target": np.random.choice([0, 1], size=100)
})

X = df[["feature1", "feature2"]]
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression()
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Model test accuracy = {accuracy:.3f}")

print("Notebook cleaned and ready for handoff.")


Model test accuracy = 0.250
Notebook cleaned and ready for handoff.
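
The two runs above report different accuracies (0.350 and 0.250) because the mock data is redrawn from an unseeded generator each time. For a handoff notebook, one option is to draw the data from a seeded generator so that every run sees the same rows. A minimal sketch, assuming a hypothetical helper `make_mock_data` and an arbitrary seed of 42 (neither is part of the starter):

```python
import numpy as np
import pandas as pd


def make_mock_data(n_rows: int = 100, seed: int = 42) -> pd.DataFrame:
    """Build the same mock dataset on every call for a given seed (hypothetical helper)."""
    rng = np.random.default_rng(seed)  # local generator; leaves NumPy's global state alone
    return pd.DataFrame({
        "feature1": rng.normal(0, 1, n_rows),
        "feature2": rng.normal(5, 2, n_rows),
        "target": rng.choice([0, 1], size=n_rows),
    })


df_a = make_mock_data()
df_b = make_mock_data()
print(df_a.equals(df_b))  # same seed, same rows
```

Because the labels are drawn independently of the features, the accuracy on 20 test rows is expected to hover around 0.5 whatever the seed; seeding only makes the number repeatable.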


## 3. Move reusable functions to /src/
Create `src/utils.py` and store reusable functions there.

In [4]:
import pandas as pd  # needed for the type hints once this code lives in src/utils.py
from sklearn.metrics import accuracy_score, precision_score, recall_score


def calculate_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """
    Return basic descriptive statistics for a DataFrame.

    Parameters
    ----------
    df : pd.DataFrame
        Input DataFrame.

    Returns
    -------
    pd.DataFrame
        Descriptive statistics of the DataFrame.
    """
    return df.describe()


def split_features_target(df: pd.DataFrame, target_col: str):
    """
    Split a DataFrame into features (X) and target (y).

    Parameters
    ----------
    df : pd.DataFrame
        Input DataFrame.
    target_col : str
        Column name of target variable.

    Returns
    -------
    X : pd.DataFrame
        Features.
    y : pd.Series
        Target.
    """
    X = df.drop(columns=[target_col])
    y = df[target_col]
    return X, y


def evaluate_model(y_true, y_pred) -> dict:
    """
    Evaluate a classification model with accuracy, precision, and recall.

    Parameters
    ----------
    y_true : array-like
        True labels.
    y_pred : array-like
        Predicted labels.

    Returns
    -------
    dict
        Dictionary with accuracy, precision, and recall.
    """
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
    }
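
Once these functions live in `src/utils.py`, a notebook stored under `notebooks/` can only import them if the project root is on `sys.path`. One way to arrange that is sketched below, under the assumption that the root is the nearest parent directory containing `src/`. The helper name `add_project_root` is hypothetical.

```python
import sys
from pathlib import Path
from typing import Optional


def add_project_root(start: Optional[Path] = None, marker: str = "src") -> Path:
    """Walk upward from `start` to the first directory containing `marker` and put it on sys.path."""
    here = Path(start or Path.cwd()).resolve()
    for candidate in [here, *here.parents]:
        if (candidate / marker).is_dir():
            if str(candidate) not in sys.path:
                sys.path.insert(0, str(candidate))
            return candidate
    raise FileNotFoundError(f"No parent of {here} contains a '{marker}/' directory")
```

After calling `add_project_root()` in the first notebook cell, `from src.utils import evaluate_model` should resolve, assuming the project follows the folder layout used in this notebook.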

## 4. Folder Structure Reminder

Ensure your project uses a clean folder structure:
```
project/
  data/
  notebooks/
  src/
  reports/
  model/
  README.md
```
For an API or dashboard, a minimal layout is:
```
project/
    app.py
    model/
        model.pkl
    requirements.txt
    README.md
```
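
The first layout can also be created by a short script rather than by hand. A sketch, assuming the folder names from the first listing and a hypothetical function name `scaffold_project`; it only creates what is missing and never overwrites existing files:

```python
from pathlib import Path

FOLDERS = ["data", "notebooks", "src", "reports", "model"]


def scaffold_project(root: str = "project") -> Path:
    """Create the standard folders and an empty README.md under `root` if they are absent."""
    root_path = Path(root)
    for name in FOLDERS:
        (root_path / name).mkdir(parents=True, exist_ok=True)
    readme = root_path / "README.md"
    if not readme.exists():
        readme.touch()
    return root_path
```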

## 5. Pickle / Save Final Model

### TODO: Replace this with your trained model

In [None]:
import pickle
from pathlib import Path

df = pd.DataFrame({
    "feature1": np.random.normal(0, 1, 100),
    "feature2": np.random.normal(5, 2, 100),
    "target": np.random.choice([0, 1], size=100)
})

X = df[["feature1", "feature2"]]
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression()
model.fit(X_train, y_train)

model_dir = Path("model")
model_dir.mkdir(parents=True, exist_ok=True)

with open(model_dir / "model.pkl", "wb") as f:
    pickle.dump(model, f)

print("Model saved to model/model.pkl")

with open(model_dir / "model.pkl", "rb") as f:
    loaded_model = pickle.load(f)

print("Model loaded successfully.")

example_input = [[0.1, 0.2]]
prediction = loaded_model.predict(example_input)
print(f"Example prediction for {example_input}: {prediction}")


Model saved to model/model.pkl
Model loaded successfully.
Example prediction for [[0.1, 0.2]]: [0]
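
A single example prediction shows that the file loads, but not that the reloaded model behaves like the original. A slightly stronger check compares predictions before and after the round trip on a batch of inputs. The sketch below assumes only that the model exposes `predict`. `ThresholdModel` is a hypothetical stand-in so the example runs without scikit-learn, and `roundtrip_matches` is an illustrative name.

```python
import pickle
import tempfile
from pathlib import Path


class ThresholdModel:
    """Hypothetical stand-in for a trained model: predicts 1 when the first feature exceeds a cutoff."""

    def __init__(self, cutoff: float = 0.0):
        self.cutoff = cutoff

    def predict(self, X):
        return [int(row[0] > self.cutoff) for row in X]


def roundtrip_matches(model, X, path: Path) -> bool:
    """Pickle `model` to `path`, reload it, and compare predictions on X."""
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        reloaded = pickle.load(f)
    return list(model.predict(X)) == list(reloaded.predict(X))


with tempfile.TemporaryDirectory() as tmp:
    X_check = [[-1.0, 5.0], [0.5, 4.0], [2.0, 6.0]]
    print(roundtrip_matches(ThresholdModel(), X_check, Path(tmp) / "model.pkl"))
```

With the notebook's own objects, the same check could be called as `roundtrip_matches(model, X_test, model_dir / "model.pkl")`.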




## 6. Flask API Starter

### TODO: Implement Flask endpoints for /predict and /plot

In [7]:
from flask import Flask, request, jsonify
import pickle
import numpy as np
import matplotlib.pyplot as plt
import io
import base64

app = Flask(__name__)

# Load the pickled model once at startup (path is relative to the working directory)
with open("model/model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route('/predict', methods=['POST'])
def predict():
    # silent=True returns None instead of raising when the body is not valid JSON
    data = request.get_json(silent=True)
    if not isinstance(data, dict) or data.get('features') is None:
        return jsonify({'error': 'No features provided'}), 400
    try:
        X = np.array(data['features'], dtype=float).reshape(1, -1)
        pred = model.predict(X).tolist()
        return jsonify({'prediction': pred})
    except Exception as e:
        return jsonify({'error': str(e)}), 400

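# signed=True also matches negative values; the float converter still needs a decimal point (2.0, not 2)
@app.route('/predict/<float(signed=True):input1>', methods=['GET'])
def predict_one(input1):
    try:
        X = np.array([[input1, 0]])  # feature2 is not supplied on this route, so it is fixed at 0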
        pred = model.predict(X).tolist()
        return jsonify({'prediction': pred})
    except Exception as e:
        return jsonify({'error': str(e)}), 400

@app.route('/predict/<float(signed=True):input1>/<float(signed=True):input2>', methods=['GET'])
def predict_two(input1, input2):
    try:
        X = np.array([[input1, input2]])
        pred = model.predict(X).tolist()
        return jsonify({'prediction': pred})
    except Exception as e:
        return jsonify({'error': str(e)}), 400

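@app.route('/plot')
def plot():
    # Render a placeholder chart and return it as an inline base64-encoded PNG
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
    buf = io.BytesIO()
    fig.savefig(buf, format='png')
    plt.close(fig)  # release the figure so repeated requests do not accumulate memory
    buf.seek(0)
    img_bytes = base64.b64encode(buf.read()).decode('utf-8')
    return f'<img src="data:image/png;base64,{img_bytes}"/>'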

if __name__ == '__main__':
    app.run(port=5000)


 * Serving Flask app '__main__'
 * Debug mode: off


 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
127.0.0.1 - - [29/Aug/2025 00:35:24] "GET / HTTP/1.1" 404 -
127.0.0.1 - - [29/Aug/2025 00:35:24] "GET /favicon.ico HTTP/1.1" 404 -
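
The `/predict` handler leaves it to NumPy and the model to reject bad input, which tends to produce error messages that API users cannot easily act on. One option is to validate the payload explicitly before it reaches the model. A minimal sketch, assuming the two-feature model from this notebook; `parse_features` and its messages are illustrative, not part of the starter:

```python
import math
from typing import List, Optional, Tuple


def parse_features(payload, n_features: int = 2) -> Tuple[Optional[List[float]], Optional[str]]:
    """Return (features, None) for a valid payload or (None, error message) otherwise."""
    if not isinstance(payload, dict) or "features" not in payload:
        return None, "Request body must be a JSON object with a 'features' key"
    features = payload["features"]
    if not isinstance(features, list) or len(features) != n_features:
        return None, f"'features' must be a list of {n_features} numbers"
    cleaned = []
    for value in features:
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return None, f"Non-numeric feature: {value!r}"
        try:
            number = float(value)
        except OverflowError:  # integer too large to represent as a float
            number = math.inf
        if not math.isfinite(number):
            return None, "Features must be finite numbers"
        cleaned.append(number)
    return cleaned, None


print(parse_features({"features": [0.1, 0.2]}))
print(parse_features({"features": [0.1, "abc"]}))
```

Inside the handler, a non-`None` error message would be returned with status 400 before `model.predict` is called.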


## 7. Testing the Flask API from Notebook

### TODO: Modify examples with your actual features

In [8]:
import requests
from IPython.display import display, HTML

# POST /predict
response = requests.post(
    'http://127.0.0.1:5000/predict',
    json={'features':[0.1, 0.2]}
)
print(response.json())

# GET /predict/<input1>
response2 = requests.get('http://127.0.0.1:5000/predict/2.0')
print(response2.json())

# GET /predict/<input1>/<input2>
response3 = requests.get('http://127.0.0.1:5000/predict/1.0/3.0')
print(response3.json())

# GET /plot
response_plot = requests.get('http://127.0.0.1:5000/plot')
display(HTML(response_plot.text))


ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000206E7564400>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it.'))
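
The `ConnectionError` above means nothing was listening on `127.0.0.1:5000` when the requests were sent. A likely cause is that `app.run()` blocks the notebook cell that starts it, so the server is no longer running by the time the test cell executes. Starting the app from a separate terminal avoids this. A small guard can also make the failure explicit before any request goes out. The sketch below uses only the standard library; `wait_for_port` is a hypothetical helper name and the timeouts are arbitrary choices.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 5000, timeout: float = 5.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=0.5):
                return True
        except OSError:
            time.sleep(0.1)  # server not up yet; retry shortly
    return False


if not wait_for_port(timeout=1.0):
    print("Nothing is listening on 127.0.0.1:5000; start the Flask app in a separate terminal first.")
```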

## 8. Optional Streamlit / Dash Dashboard

### TODO: Add dashboard in a separate file (`app_streamlit.py` or `app_dash.py`)

## 9. Handoff Best Practices

- Ensure `README.md` is complete and clear
- Provide `requirements.txt` for reproducibility
- Ensure the pickled model and scripts are in the correct folders
- Verify that another user can run the project end-to-end in a fresh environment
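
One way to keep `requirements.txt` in step with the environment that actually produced the model is to pin the versions currently installed. A sketch using the standard library's `importlib.metadata`; the package list is an assumption based on the imports in this notebook, and `pinned_requirements` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Iterable, List

# Distribution names as published on PyPI (assumed from this notebook's imports)
PACKAGES = ["pandas", "numpy", "scikit-learn", "flask", "matplotlib", "requests"]


def pinned_requirements(packages: Iterable[str] = PACKAGES) -> List[str]:
    """Return 'name==version' lines for the packages that are installed; skip the rest."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            print(f"# not installed, skipped: {name}")
    return lines


print("\n".join(pinned_requirements()))
```

Writing the returned lines to `requirements.txt` and then running `pip install -r requirements.txt` in a fresh virtual environment is a direct way to check the last point in the list.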