In [None]:
# Print the settings lambda/lambda_function.py needs (this cell does not edit any files)
print('Open lambda/lambda_function.py and set EndpointName to:', ENDPOINT_NAME)
print('\nEnsure the Lambda execution role has:')
print('- sagemaker:InvokeEndpoint')
print('- dynamodb:PutItem on table PhishingDetections')

### Update Lambda with endpoint name

If you deploy to SageMaker, update `lambda/lambda_function.py` so that `EndpointName` matches `ENDPOINT_NAME` above. The Lambda execution role also needs `sagemaker:InvokeEndpoint` on the endpoint and `dynamodb:PutItem` on the `PhishingDetections` table.
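A minimal sketch of a least-privilege IAM policy document for the Lambda role. The region, account ID, and function name `lambda_policy` are hypothetical placeholders; the endpoint and table names are the ones used in this notebook. The cell only builds and prints the JSON; attaching it to the role (console, CLI, or infrastructure-as-code) is left to you.

```python
import json

# Hypothetical placeholders: substitute your own region and account ID.
REGION = "us-west-2"
ACCOUNT_ID = "123456789012"
ENDPOINT_NAME = "phishing-detector-endpoint-1"
TABLE_NAME = "PhishingDetections"


def lambda_policy(region: str, account_id: str, endpoint_name: str, table_name: str) -> dict:
    """Build a policy that allows invoking one endpoint and writing to one table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sagemaker:InvokeEndpoint",
                # Endpoint names appear lowercased in SageMaker ARNs, so lowercase to be safe.
                "Resource": f"arn:aws:sagemaker:{region}:{account_id}:endpoint/{endpoint_name.lower()}",
            },
            {
                "Effect": "Allow",
                "Action": "dynamodb:PutItem",
                "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}",
            },
        ],
    }


policy = lambda_policy(REGION, ACCOUNT_ID, ENDPOINT_NAME, TABLE_NAME)
print(json.dumps(policy, indent=2))
```

Scoping each action to a single resource ARN, rather than `"Resource": "*"`, keeps the Lambda from invoking other endpoints or writing to other tables.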

In [None]:
# Build a CSV body using the same feature extractor
def features_to_csv(url: str):
    X = extract_features(url).reshape(-1)
    return ','.join(map(str, X.tolist()))

sample_url = 'http://phishy.example.com/login'
body = features_to_csv(sample_url)

resp = sagemaker_runtime.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType='text/csv',
    Body=body
)

# invoke_endpoint returns the payload as a streaming body; read() yields bytes, so decode it
result = resp['Body'].read().decode('utf-8')
print('Raw runtime response:', result)

### Invoke the endpoint using sagemaker-runtime

This cell invokes the endpoint with a `text/csv` body whose columns match the feature vector the model was trained on (the output of `extract_features`).
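To score several URLs per request, one option is to send several CSV rows in one body. Whether your container accepts multi-row CSV, and what it returns, depends on its inference handler; this sketch assumes one numeric score per row in the response, which you should check against your container. `rows_to_csv` and `parse_csv_scores` are hypothetical helper names.

```python
import csv
import io
from typing import List, Sequence


def rows_to_csv(rows: Sequence[Sequence[float]]) -> str:
    """Serialize feature rows as headerless CSV, one record per line."""
    buf = io.StringIO()
    # lineterminator="\n" avoids the csv module's default "\r\n" line endings.
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def parse_csv_scores(body: str) -> List[float]:
    """Parse a response assumed to contain numeric scores, one per line or comma-separated."""
    scores = []
    for line in body.strip().splitlines():
        scores.extend(float(tok) for tok in line.split(",") if tok.strip())
    return scores


# Feature rows for 'https://secure-login.example.com/account' and 'http://phishy.example.com/login'
body = rows_to_csv([[40, 2, 1, 0, 1, 1], [31, 2, 0, 0, 0, 1]])
print(repr(body))
```

The resulting string can be passed as `Body` to `sagemaker_runtime.invoke_endpoint` with `ContentType='text/csv'`, and `parse_csv_scores` applied to the decoded response.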

In [None]:
try:
    resp = sagemaker.create_endpoint(
        EndpointName=ENDPOINT_NAME,
        EndpointConfigName=ENDPOINT_CONFIG_NAME
    )
    print('Create endpoint response:', resp)
except sagemaker.exceptions.ClientError as e:
    print('Create endpoint failed or already exists:', e)

# Wait for endpoint to be InService
print('Waiting for endpoint to be InService... (this may take several minutes)')
waiter = sagemaker.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=ENDPOINT_NAME)
print('Endpoint is InService')

### Create endpoint

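Create the endpoint from the endpoint config. This can take several minutes depending on the instance type. If the endpoint already exists, `create_endpoint` fails; to roll out a new configuration, call `update_endpoint` with a new endpoint config name instead.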
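A sketch of a create-or-update helper built on the boto3 SageMaker client methods `describe_endpoint`, `create_endpoint`, and `update_endpoint`. It assumes that `describe_endpoint` raises a `ClientError` with code `ValidationException` when the endpoint does not exist, and that each update uses a freshly named endpoint config. `deploy_endpoint` is a hypothetical name.

```python
def deploy_endpoint(client, endpoint_name: str, config_name: str) -> str:
    """Create the endpoint if it is missing, otherwise point it at a new config.

    `client` is a boto3 SageMaker client (or any object with the same methods).
    Returns "created" or "updated".
    """
    try:
        client.describe_endpoint(EndpointName=endpoint_name)
    except client.exceptions.ClientError as err:
        # A missing endpoint is reported as a ValidationException; re-raise anything else.
        if err.response["Error"]["Code"] != "ValidationException":
            raise
        client.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=config_name)
        return "created"
    client.update_endpoint(EndpointName=endpoint_name, EndpointConfigName=config_name)
    return "updated"


# Usage with the clients from the setup cell:
# print(deploy_endpoint(sagemaker, ENDPOINT_NAME, ENDPOINT_CONFIG_NAME))
# sagemaker.get_waiter('endpoint_in_service').wait(EndpointName=ENDPOINT_NAME)
```

The `endpoint_in_service` waiter also covers updates, since an updating endpoint returns to `InService` when the new config is live.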

In [None]:
try:
    resp = sagemaker.create_endpoint_config(
        EndpointConfigName=ENDPOINT_CONFIG_NAME,
        ProductionVariants=[{
            'VariantName': 'AllTraffic',
            'ModelName': MODEL_NAME,
            'InitialInstanceCount': INSTANCE_COUNT,
            'InstanceType': INSTANCE_TYPE,
            'InitialVariantWeight': 1
        }]
    )
    print('Endpoint config created:', resp)
except sagemaker.exceptions.ClientError as e:
    print('Create endpoint config failed or already exists:', e)

### Create endpoint configuration

Create an endpoint config referencing the model name and instance type.

In [None]:
create_model_payload = {
    'ModelName': MODEL_NAME,
    'PrimaryContainer': {
        'Image': IMAGE_URI,
        'ModelDataUrl': S3_MODEL_ARTIFACT
    },
    'ExecutionRoleArn': ROLE_ARN
}

try:
    resp = sagemaker.create_model(**create_model_payload)
    print('Create model response:', resp)
except sagemaker.exceptions.ClientError as e:
    print('Create model failed or model already exists:', e)

### Create model (custom container)

If you're using a custom inference container, register the model with the ECR image URI (`IMAGE_URI`) and the S3 URI of the model artifact tarball (`S3_MODEL_ARTIFACT`). For scikit-learn models you can instead use the prebuilt SageMaker scikit-learn containers or package your own.
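SageMaker downloads the archive at `ModelDataUrl` and extracts it into `/opt/ml/model` inside the container, so the model file should sit at the root of the tarball. A minimal packaging sketch using the standard library `tarfile` module; `package_model` is a hypothetical name, and the S3 bucket and key in the usage comment are placeholders.

```python
import os
import tarfile


def package_model(artifact_path: str, output_path: str = "model.tar.gz") -> str:
    """Pack a single model file into a gzipped tarball with the file at the archive root."""
    with tarfile.open(output_path, "w:gz") as tar:
        # arcname drops the directory prefix so the file lands at the tarball root.
        tar.add(artifact_path, arcname=os.path.basename(artifact_path))
    return output_path


# Usage (placeholder bucket/key; upload_file takes filename, bucket, key):
# package_model('../output/model.joblib', '../output/model.tar.gz')
# boto3.client('s3').upload_file('../output/model.tar.gz', 'your-bucket', 'path/to/model.tar.gz')
```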

In [None]:
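import boto3

# Clients use the default AWS region and credentials from your environment or profile.
# Note: the name `sagemaker` shadows the SageMaker Python SDK package if you import it later.
sagemaker = boto3.client('sagemaker')
sagemaker_runtime = boto3.client('sagemaker-runtime')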

# Configuration variables - update before running
S3_MODEL_ARTIFACT = 's3://your-bucket/path/to/model.tar.gz'  # artifact tarball containing model.joblib or framework-specific model
MODEL_NAME = 'phishing-detector-model-1'
ENDPOINT_CONFIG_NAME = 'phishing-detector-endpoint-config-1'
ENDPOINT_NAME = 'phishing-detector-endpoint-1'
ROLE_ARN = 'arn:aws:iam::123456789012:role/SageMakerExecutionRole'  # replace with your role
IMAGE_URI = '123456789012.dkr.ecr.us-west-2.amazonaws.com/phishing-detector:latest'  # for custom container
INSTANCE_TYPE = 'ml.t2.medium'
INSTANCE_COUNT = 1

print('SageMaker client ready')

## SageMaker deployment (optional)

The cells below show how to register the model with SageMaker, create an endpoint configuration, create an endpoint, wait for it to be InService, and invoke it using the `sagemaker-runtime` client. These operations require AWS credentials configured in the environment with permissions for SageMaker and S3.

Important: this notebook does not create IAM roles. Use an existing SageMaker execution role ARN with appropriate permissions.

In [None]:
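import hashlib, json, os, subprocess
import joblib

# Save the model with metadata: version, git commit, and SHA-256 of the artifact
os.makedirs('../output', exist_ok=True)
artifact_path = '../output/model.joblib'
if 'model' in globals():
    joblib.dump(model, artifact_path)
    try:
        # `git rev-parse` works from any subdirectory of the repo (unlike checking for '.git' in cwd)
        commit_hash = subprocess.check_output(['git', 'rev-parse', '--short', 'HEAD'],
                                              stderr=subprocess.DEVNULL).decode().strip()
    except (OSError, subprocess.CalledProcessError):
        commit_hash = 'local'  # not a git checkout, or git is not installed
    with open(artifact_path, 'rb') as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    metadata = {'artifact': os.path.basename(artifact_path), 'version': '1.0',
                'commit': commit_hash, 'sha256': digest}
    with open('../output/model.metadata.json', 'w') as f:
        json.dump(metadata, f, indent=2)
    print('Saved model and metadata to ../output')
else:
    print('No model in memory to save. Copy your trained artifact to ../output/model.joblib')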

## 13) Save and version model artifacts

Save the model artifact alongside metadata (version, commit hash) and optionally upload to S3 or an artifact store.
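One possible S3 layout keeps every upload immutable by encoding the version and commit in the key. The prefix scheme and the helper name `artifact_key` are hypothetical; the upload call in the comment uses boto3's `upload_file(Filename, Bucket, Key)`.

```python
def artifact_key(prefix: str, version: str, commit: str, filename: str = "model.tar.gz") -> str:
    """Build an S3 key like '<prefix>/v<version>/<commit>/<filename>'."""
    parts = [prefix.strip("/"), f"v{version}", commit, filename]
    # Skip empty parts so an empty prefix does not produce a leading '/'.
    return "/".join(p for p in parts if p)


key = artifact_key("models/phishing-detector", "1.0", "a1b2c3d")
print(key)
# Upload (placeholder bucket):
# boto3.client('s3').upload_file('../output/model.tar.gz', 'your-bucket', key)
```

Because each version and commit gets its own key, rolling back is a matter of pointing `S3_MODEL_ARTIFACT` at an older key.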

In [None]:
workflow_yaml = '''
name: CI
on: [push, pull_request]

jobs:
  test-and-build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.8", "3.9"]
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Run tests
        run: |
          pytest -q
      - name: Build Docker image
        run: |
          docker build -t phishing-detector:${{ github.sha }} .
'''

print('Sample workflow YAML stored in workflow_yaml (save it as .github/workflows/ci.yml to use it)')

## 12) Automated tests: CI workflow (GitHub Actions)

A sample GitHub Actions workflow that runs the tests on Python 3.8 and 3.9 and builds the Docker image on every push and pull request.
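GitHub Actions picks up workflow files from `.github/workflows/` at the repository root. A small sketch that writes a workflow string there; `write_workflow` is a hypothetical helper, and the `repo_root` you pass depends on where this notebook lives in your repository.

```python
from pathlib import Path


def write_workflow(yaml_text: str, repo_root: str = ".") -> Path:
    """Write a workflow to <repo_root>/.github/workflows/ci.yml and return the path."""
    path = Path(repo_root) / ".github" / "workflows" / "ci.yml"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Strip the leading newline left by the triple-quoted string.
    path.write_text(yaml_text.lstrip(), encoding="utf-8")
    return path


# write_workflow(workflow_yaml, repo_root='..')  # adjust to your repository root
```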

## 11) Build and run Docker container (commands)

Example commands (run in PowerShell):

```powershell
docker build -t phishing-detector:latest .
docker run -p 8000:8000 --rm phishing-detector:latest
```

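Then call the endpoint. In Windows PowerShell 5.1, `curl` is an alias for `Invoke-WebRequest` and does not accept curl's `-X`/`-H`/`-d` flags, so use `Invoke-RestMethod`:

```powershell
Invoke-RestMethod -Uri http://127.0.0.1:8000/predict -Method Post -ContentType 'application/json' -Body '{"url":"https://example.com/login"}'
```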
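For a check that does not depend on shell quoting, the same request can be sent from Python with only the standard library. `post_predict` is a hypothetical helper; it assumes the `/predict` contract of `model_server.py` (a JSON body with a `url` field in, JSON out).

```python
import json
import urllib.request


def post_predict(base_url: str, url: str, timeout: float = 5.0) -> dict:
    """POST {"url": url} to <base_url>/predict and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/predict",
        data=json.dumps({"url": url}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# print(post_predict("http://127.0.0.1:8000", "https://example.com/login"))
```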

In [None]:
dockerfile_content = '''
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY model_server.py ./
COPY model.joblib ./
EXPOSE 8000
CMD ["uvicorn","model_server:app","--host","0.0.0.0","--port","8000"]
'''
with open('Dockerfile','w') as f:
    f.write(dockerfile_content)
print('Wrote Dockerfile')

## 10) Create Dockerfile for containerized deployment

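Create a Dockerfile that installs dependencies, copies the app and the model, and runs uvicorn. `COPY model.joblib ./` expects `model.joblib` in the build context (the directory passed to `docker build`), so copy `../output/model.joblib` there before building.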
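A small sketch that stages the trained artifact into the build context under the file name the Dockerfile expects. `stage_build_context` is a hypothetical helper; the default paths are the ones used elsewhere in this notebook.

```python
import shutil
from pathlib import Path


def stage_build_context(artifact: str = "../output/model.joblib", context_dir: str = ".") -> Path:
    """Copy the trained model into the Docker build context as model.joblib."""
    src = Path(artifact)
    if not src.is_file():
        # Fail early instead of letting `docker build` fail on COPY.
        raise FileNotFoundError(f"Model artifact not found: {src}")
    dest = Path(context_dir) / "model.joblib"
    shutil.copy2(src, dest)  # copy2 also preserves the file's modification time
    return dest


# stage_build_context()  # then: docker build -t phishing-detector:latest .
```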

In [None]:
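# Run uvicorn in a subprocess for local development; the cell stops it at the end
import subprocess
import sys
import time

import requests

# Log to a file: an unread stdout/stderr PIPE can fill up and block the server.
# model_server.py loads model.joblib from the current working directory.
log = open('uvicorn.log', 'w')
uvicorn_proc = subprocess.Popen(
    [sys.executable, '-m', 'uvicorn', 'model_server:app', '--host', '127.0.0.1', '--port', '8000'],
    stdout=log, stderr=subprocess.STDOUT)
print('Started uvicorn PID', uvicorn_proc.pid)

try:
    # Poll /health instead of sleeping a fixed time; startup time varies
    for _ in range(30):
        try:
            if requests.get('http://127.0.0.1:8000/health', timeout=1).ok:
                break
        except requests.ConnectionError:
            pass
        time.sleep(0.5)
    resp = requests.post('http://127.0.0.1:8000/predict', json={'url': example_url}, timeout=5)
    print('Status:', resp.status_code)
    print('Response:', resp.json())
finally:
    # Terminate uvicorn even if the request fails
    uvicorn_proc.terminate()
    uvicorn_proc.wait()
    log.close()
    print('Stopped uvicorn')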

## 9) Run API locally with Uvicorn

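Run the FastAPI app locally and call the `/predict` endpoint with `requests`. The cell below starts uvicorn as a separate child process, sends one request, and then terminates the process. If a server is left running (for example after an error), stop it before re-running the cell, or port 8000 will already be in use.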

In [None]:
%%writefile model_server.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib
import numpy as np


class InputSchema(BaseModel):
    url: str


class OutputSchema(BaseModel):
    url: str
    prediction: str
    confidence: float


app = FastAPI()

MODEL_PATH = 'model.joblib'  # resolved relative to the server's working directory

try:
    model = joblib.load(MODEL_PATH)
except Exception:
    model = None  # /health reports model_loaded=False and /predict returns 503


def extract_features(url: str) -> np.ndarray:
    # Must match the feature extraction used during training
    return np.array([len(url), url.count('.'), url.count('-'), url.count('@'),
                     int('https' in url), int('login' in url)]).reshape(1, -1)


@app.get('/health')
async def health():
    return {'status': 'ok', 'model_loaded': model is not None}


@app.post('/predict', response_model=OutputSchema)
async def predict(payload: InputSchema):
    if model is None:
        raise HTTPException(status_code=503, detail='Model not loaded')
    X = extract_features(payload.url)
    if hasattr(model, 'predict_proba'):
        proba = model.predict_proba(X)
        score = float(proba[0, 1]) if proba.shape[1] > 1 else float(proba[0, 0])
        label = 'phishing' if score > 0.5 else 'legit'
    else:
        pred = model.predict(X)
        label = 'phishing' if int(pred[0]) == 1 else 'legit'
        score = float(pred[0])  # hard 0/1 label, not a probability
    return {'url': payload.url, 'prediction': label, 'confidence': round(score, 4)}

## 8) Build a FastAPI inference endpoint

We'll create a minimal FastAPI app in this notebook (written to `model_server.py`) that loads the model and exposes /health and /predict endpoints.
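`model_server.py` reimplements the feature extraction instead of importing it, so the two copies can drift apart. One way to catch this training/serving skew is to compare both implementations on a set of sample URLs. `check_feature_parity` is a hypothetical helper; it expects each extractor to return a flat sequence of numbers.

```python
from typing import Callable, Iterable, List, Sequence


def check_feature_parity(
    extract_a: Callable[[str], Sequence[float]],
    extract_b: Callable[[str], Sequence[float]],
    urls: Iterable[str],
) -> List[str]:
    """Return the URLs on which two feature extractors disagree (empty list means parity)."""
    mismatches = []
    for url in urls:
        a = [float(v) for v in extract_a(url)]
        b = [float(v) for v in extract_b(url)]
        if a != b:
            mismatches.append(url)
    return mismatches


# Usage: flatten the notebook's (1, 6) array first, e.g.
# check_feature_parity(lambda u: extract_features(u).ravel(), server_extract, sample_urls)
# where server_extract and sample_urls are whatever serving extractor and URLs you want to compare.
```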

In [None]:
import os

# Write a small pytest file (for demonstration).
# Assumption: extract_features and postprocess_proba have been exported to a module named
# model_deploy_demo.py that is importable when pytest runs; this notebook does not create it.
os.makedirs('tests', exist_ok=True)
with open('tests/test_inference.py', 'w') as f:
    f.write('''
import numpy as np
from model_deploy_demo import extract_features, postprocess_proba

def test_extract_features():
    f = extract_features('http://a.b')
    assert f.shape == (1, 6)

def test_postprocess():
    label, score = postprocess_proba(np.array([[0.2, 0.8]]))
    assert label == 'phishing'
    assert score == 0.8
''')

print('Wrote tests/test_inference.py')

# Run pytest from a terminal (python -m pytest adds the current directory to sys.path)
print('\nTo run tests from a terminal:')
print('python -m pytest -q tests')

## 7) Local inference tests and unit tests

Create quick pytest tests to validate preprocessing and inference logic.

In [None]:
def predict_url(url: str, model):
    X = extract_features(url)
    if hasattr(model, 'predict_proba'):
        proba = model.predict_proba(X)
        label, score = postprocess_proba(proba)
    else:
        pred = model.predict(X)
        # No probabilities available: assume binary 0/1 labels and report the hard label as the score
        label = 'phishing' if int(pred[0])==1 else 'legit'
        score = float(pred[0])
    return {
        'url': url,
        'prediction': label,
        'confidence': score
    }

# Quick local test (will warn if model not loaded)
if 'model' in globals():
    print('Test prediction:', predict_url(example_url, model))
else:
    print('Model not loaded. Skipping quick test.')

## 6) Model inference function

Combine preprocessing, `predict_proba` (or `predict` when probabilities are unavailable), and postprocessing into one function. scikit-learn models run on CPU, so no device handling is needed.
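scikit-learn estimators accept a 2-D batch, so several URLs can be scored in one call. A sketch, assuming binary 0/1 labels (scikit-learn sorts `classes_`, so the last `predict_proba` column is class 1) and a `featurize` function that returns one flat feature list per URL, such as `lambda u: extract_features(u).ravel().tolist()`. `predict_batch` is a hypothetical name.

```python
from typing import Callable, Dict, List, Sequence


def predict_batch(
    urls: Sequence[str],
    model,
    featurize: Callable[[str], Sequence[float]],
    threshold: float = 0.5,
) -> List[Dict]:
    """Score several URLs with a single predict_proba/predict call."""
    X = [list(featurize(u)) for u in urls]
    if hasattr(model, "predict_proba"):
        # Assumes binary 0/1 labels: the last column is the phishing class.
        scores = [float(row[-1]) for row in model.predict_proba(X)]
    else:
        # Hard 0/1 labels only; the score is not a probability.
        scores = [float(p) for p in model.predict(X)]
    return [
        {"url": u, "prediction": "phishing" if s > threshold else "legit", "confidence": round(s, 4)}
        for u, s in zip(urls, scores)
    ]


# predict_batch([example_url, 'http://phishy.example.com/login'], model,
#               lambda u: extract_features(u).ravel().tolist())
```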

In [None]:
class OutputSchema(BaseModel):
    url: str
    prediction: str
    confidence: float


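def postprocess_proba(proba: np.ndarray, threshold: float = 0.5):
    # Column 1 is the positive (phishing) class for 0/1 labels. A single-column output means
    # the model saw only one class during training, so that score may not refer to phishing.
    score = float(proba[0,1]) if proba.shape[1] > 1 else float(proba[0,0])
    label = 'phishing' if score > threshold else 'legit'  # strictly greater: 0.5 maps to 'legit'
    return label, round(score, 4)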

# Example output
print('Example postprocess:', postprocess_proba(np.array([[0.3,0.7]])))

## 5) Define postprocessing and output schema

Map model outputs to human-readable labels and format the confidence score.
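Rather than assuming the phishing probability is in column 1, the column can be looked up from the fitted classifier's `classes_` attribute, which scikit-learn classifiers expose in the same order as the `predict_proba` columns. A sketch; `positive_column` and `phishing_score` are hypothetical names, and `positive_label=1` assumes phishing was encoded as 1 in training.

```python
from typing import Sequence


def positive_column(classes: Sequence, positive_label=1) -> int:
    """Index of the positive class among the predict_proba columns, given model.classes_."""
    labels = list(classes)
    if positive_label not in labels:
        # e.g. a model trained on a single class
        raise ValueError(f"positive label {positive_label!r} not among classes {labels!r}")
    return labels.index(positive_label)


def phishing_score(proba_row: Sequence[float], classes: Sequence, positive_label=1) -> float:
    """Probability of the positive (phishing) class for one predict_proba row."""
    return float(proba_row[positive_column(classes, positive_label)])


# phishing_score(model.predict_proba(X)[0], model.classes_)
```

Raising on a missing positive class turns the silent single-column case into an explicit error.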

In [1]:
import numpy as np
from pydantic import BaseModel


class InputSchema(BaseModel):
    url: str


def extract_features(url: str):
    return np.array([
        len(url),
        url.count('.'),
        url.count('-'),
        url.count('@'),
        int('https' in url),
        int('login' in url)
    ]).reshape(1, -1)

# Example
example_url = 'https://secure-login.example.com/account'
print('Example features:', extract_features(example_url))

## 4) Define input schema and preprocessing

We replicate the feature extraction used during training. The Lambda and training notebooks use similar logic: URL length, dot count, dash count, @ count, https presence, suspicious tokens (e.g., 'login'). Adjust as needed to match your training features.
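If a consumer such as the Lambda function cannot ship numpy, the same six features can be computed in plain Python. A sketch; `extract_features_list` is a hypothetical name, and it must stay identical in order and definition to the training features. Note that `'https' in url` is a substring test, so it also matches `https` outside the scheme; changing that would change the features the model was trained on.

```python
from typing import List


def extract_features_list(url: str) -> List[int]:
    """Dependency-free version of extract_features, returning a flat list of 6 ints."""
    return [
        len(url),             # URL length
        url.count("."),       # dot count
        url.count("-"),       # dash count
        url.count("@"),       # '@' count
        int("https" in url),  # substring check, not a strict scheme check
        int("login" in url),  # suspicious token
    ]


print(extract_features_list("https://secure-login.example.com/account"))
# [40, 2, 1, 0, 1, 1]
```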

In [None]:
# Update this path to the model artifact produced by training
MODEL_PATH = os.path.join('..','output','model.joblib')

if not os.path.exists(MODEL_PATH):
    print('Model artifact not found at', MODEL_PATH)
else:
    model = joblib.load(MODEL_PATH)
    print('Loaded model from', MODEL_PATH)
    try:
        # scikit-learn style
        print('Model type:', type(model))
        if hasattr(model, 'predict_proba'):
            print('Supports predict_proba')
    except Exception as e:
        print('Model inspection failed:', e)

## 3) Load trained model artifact

This section loads a pre-trained model artifact saved with `joblib`. Update `MODEL_PATH` to point to the artifact location (local or S3-extracted local path).
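`joblib.load` unpickles the file, which can execute arbitrary code, so it is worth checking that the artifact is the one you saved. If you record a SHA-256 digest when saving (assumed here as a `sha256` field in `model.metadata.json`), a sketch of the check is below; `sha256_file` and `verify_artifact` are hypothetical names.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_artifact(artifact_path: str, metadata_path: str) -> bool:
    """True if the artifact digest matches the (assumed) 'sha256' field in the metadata JSON."""
    meta = json.loads(Path(metadata_path).read_text(encoding="utf-8"))
    expected = meta.get("sha256")
    if expected is None:
        raise KeyError(f"No 'sha256' field in {metadata_path}")
    return sha256_file(artifact_path) == expected


# if verify_artifact(MODEL_PATH, os.path.join('..', 'output', 'model.metadata.json')):
#     model = joblib.load(MODEL_PATH)
```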

In [None]:
import os
import json
import joblib
import numpy as np
import pandas as pd
from typing import List, Dict

# pydantic schemas (FastAPI itself is imported in model_server.py)
from pydantic import BaseModel

print('Imports ready')

## 2) Import required libraries

The notebook uses the following imports for model loading, preprocessing, and serving.

In [None]:
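# Show the kernel's Python version and installed packages (run after activating .venv).
# `sys.executable` targets the kernel's interpreter; `pip list` avoids Unix-only tools like sed.
import sys
print(sys.version)
!{sys.executable} -m pip list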

## 1) Notebook setup and environment

- Python version: 3.8+ recommended (match deployment target)
- Use a virtual environment and `requirements.txt` for reproducibility.

Create a local virtual environment and install dependencies:

```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
```

Example `requirements.txt` (used by cells later):

```
fastapi
uvicorn[standard]
pydantic
numpy
pandas
scikit-learn
joblib
pytest
requests
```

Verify installed packages in a notebook shell cell below.
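A standard-library alternative that checks the interpreter version and reports which of the `requirements.txt` packages are installed, using `importlib.metadata` (Python 3.8+). `check_environment` is a hypothetical helper; the package list mirrors the example `requirements.txt` above (distribution names, so `uvicorn` rather than `uvicorn[standard]`).

```python
import sys
from importlib import metadata

REQUIRED = ["fastapi", "uvicorn", "pydantic", "numpy", "pandas",
            "scikit-learn", "joblib", "pytest", "requests"]


def check_environment(packages, min_python=(3, 8)):
    """Return (python_ok, {distribution name: installed version or None})."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return sys.version_info[:2] >= min_python, versions


ok, found = check_environment(REQUIRED)
print("Python", sys.version.split()[0], "OK" if ok else "too old")
for name, ver in found.items():
    print(f"{name:15s} {ver or 'MISSING'}")
```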

# Deploy Model: package, local API, Docker, and CI

This notebook demonstrates preparing, testing, and packaging a trained ML model for deployment.

Outline:
1. Environment setup
2. Imports
3. Load model artifact
4. Preprocessing & schema
5. Postprocessing & schema
6. Inference function
7. Tests
8. FastAPI app
9. Run locally with uvicorn
10. Dockerfile
11. Docker build/run
12. CI workflow
13. Save/version artifact
14. SageMaker deployment (optional)

This notebook is intended to be runnable locally for development and to provide artifacts for deployment on AWS SageMaker or a container platform.