# Fraud Detection API: Complete Project Documentation
### From Local Development to Production CI/CD Pipeline

## Table of Contents
#### 1- Project Setup & Structure

#### 2- Code Modularization

#### 3- Local Testing (Flask API)

#### 4- Docker Containerization

#### 5- Kubernetes Deployment

#### 6- CI/CD Pipeline Implementation

#### 7- Troubleshooting Guide

#### 8- Final Workflow Diagrams

### Phase 1: Building the Fraud Detection Model
#### Initial Repository Setup

In [None]:
# In PowerShell
# Create project folder
mkdir fraud-detection-cicd
cd fraud-detection-cicd

# Initialize Git
git init

# Create directory structure
# (PowerShell has no brace expansion, and the modules under src/ are files, not folders)
mkdir src, tests, deployments, data, models, logs, .github\workflows

#### Project Structure 

In [None]:
fraud-detection-cicd/
├── .github/workflows/
│   ├── ci.yml
│   └── cd.yml
├── src/
│   ├── config.py
│   ├── preprocess.py
│   ├── feature_engineer.py
│   ├── train.py
│   ├── predict.py
│   ├── evaluate.py
│   ├── serve.py
│   ├── app.py
│   └── __init__.py
├── tests/
│   ├── __init__.py
│   └── test_app.py
├── deployments/
│   ├── deployment.yaml
│   ├── codebuild-project.json
│   ├── codebuild-role.json
│   ├── ecs-policy.json
│   ├── ecs-service.yaml
│   ├── pipeline.json
│   ├── task-definition.json
│   └── service.yaml
├── data/
│   └── sample_fraud.csv
├── logs/
├── models/
│   └── fraud_model.joblib
├── bucket-policy.json
├── buildspec.yml
├── codebuild-access-policy.json
├── codebuild-policy.json
├── setup.py
├── requirements.txt
├── Dockerfile
└── .gitignore

#### Install Dependencies
##### Create requirements.txt with your specifications

In [None]:
pandas>=1.5.0
scikit-learn>=1.2.0
Flask>=2.0.0
joblib>=1.0.0
imbalanced-learn>=0.10.0
scipy>=1.7.0
numpy>=1.21.0
waitress>=2.1.0
gunicorn==20.1.0
python-dotenv==1.0.0

#### Key Files:

#### src/: Contains all Python modules

#### deployments/: AWS infrastructure templates

#### .github/workflows/: CI/CD automation

### Code Modularization
#### Key Modules

#### 1- Data Configuration (src/config.py)

In [None]:
from pathlib import Path
from datetime import datetime
import os
import sys

# Project setup
PROJECT_ROOT = Path(__file__).parent.parent

# Data configuration
DATA_DIR = PROJECT_ROOT / 'data'
DATA_DIR.mkdir(exist_ok=True)

# Try these data files in order (first found will be used)
DATA_PATHS = [
    DATA_DIR / 'sample_fraud.csv',  # Small sample for CI/testing (should be committed)
    DATA_DIR / 'Fraud.csv',        # Full dataset for local development (gitignored)
    Path(r'C:\Projects\fraud_detection\data\Fraud.csv')  # Fallback to original location
]

DATA_PATH = None
for path in DATA_PATHS:
    if path.exists():
        DATA_PATH = path
        break

if DATA_PATH is None:
    print("\nERROR: No suitable data file found. Please:", file=sys.stderr)
    print("1. Add 'sample_fraud.csv' to project's data/ folder for testing", file=sys.stderr)
    print("2. Or add 'Fraud.csv' to project's data/ folder for development", file=sys.stderr)
    print(f"3. Or keep original at C:\\Projects\\fraud_detection\\data\\Fraud.csv", file=sys.stderr)
    print("\nCreating empty data directory...", file=sys.stderr)
    (DATA_DIR / '.gitkeep').touch()
    sys.exit(1)

print(f"\nℹ️ Using data file at: {DATA_PATH}")

# Model configuration
MODEL_DIR = PROJECT_ROOT / 'models'
MODEL_DIR.mkdir(exist_ok=True)
MODEL_PATH = MODEL_DIR / 'fraud_model.joblib'

# Logs configuration
LOG_DIR = PROJECT_ROOT / 'logs'
LOG_DIR.mkdir(exist_ok=True)

# Data processing parameters
N_ROWS = None  # Set to None to use all rows, or specify a number (e.g., 100000)
AMOUNT_PERCENTILE = 0.95
BALANCE_PERCENTILE = 0.9

# Model training parameters
RANDOM_STATE = 42
TEST_SIZE = 0.3
SMOTE_RATIO = 0.3

class AppConfig:
    # API Settings
    HOST = "0.0.0.0"
    PORT = 8080
    DEBUG = False
    
    # Model Monitoring
    PREDICTION_LOGS = LOG_DIR / "predictions.log"
    DRIFT_THRESHOLD = 0.15
    
    # Performance
    MAX_REQUEST_SIZE = 1024 * 1024  # 1MB
    
    @classmethod
    def validate_paths(cls):
        """Ensure all required directories exist"""
        required_dirs = [
            DATA_DIR,
            MODEL_DIR,
            LOG_DIR
        ]
        for directory in required_dirs:
            directory.mkdir(exist_ok=True)
            
        if not DATA_PATH.exists():
            raise FileNotFoundError(f"Data file not found at {DATA_PATH}")

# Initialize directories
AppConfig.validate_paths()

# Environment detection
IS_CI = os.getenv('CI') == 'true'
IS_TEST = os.getenv('TEST_MODE') == 'true'

if __name__ == '__main__':
    print("\nCurrent Configuration:")
    print(f"Project Root: {PROJECT_ROOT}")
    print(f"Data File: {DATA_PATH}")
    print(f"Model Path: {MODEL_PATH}")
    print(f"Log Directory: {LOG_DIR}")
    print(f"CI Mode: {IS_CI}")
    print(f"Test Mode: {IS_TEST}")
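
The data-file lookup above follows a "first existing path wins" pattern. A minimal, standalone sketch of that pattern is shown here; the helper name `first_existing` and the temporary folder are illustrative assumptions, not part of the project code:

```python
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Iterable, Optional


def first_existing(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that exists on disk, or None if none do."""
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None


def demo() -> str:
    """Build a throwaway data/ folder and resolve a file inside it."""
    with TemporaryDirectory() as tmp:
        data_dir = Path(tmp) / "data"
        data_dir.mkdir()
        (data_dir / "Fraud.csv").write_text("step,amount\n1,100\n")
        chosen = first_existing([
            data_dir / "sample_fraud.csv",  # missing, so it is skipped
            data_dir / "Fraud.csv",         # present, so it is returned
        ])
        return chosen.name if chosen else "none"


if __name__ == "__main__":
    print(demo())
```

Returning None instead of exiting leaves the decision to the caller, which can be convenient in tests where a missing data file should not terminate the process.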

#### 2- Data Preprocessing (src/preprocess.py)

#### What it does

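Loads transaction data

Applies feature engineering

Splits into stratified train/test sets

Scales features with a Yeo-Johnson power transform fitted on the training set only

Oversamples the fraud class in the training set with SMOTE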

In [None]:
import pandas as pd
from sklearn.preprocessing import PowerTransformer
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
from src.config import DATA_PATH, N_ROWS, TEST_SIZE, RANDOM_STATE, SMOTE_RATIO
from src.feature_engineer import engineer_features

def load_and_preprocess():
    """Load and preprocess data with proper error handling"""
    try:
        print(f"Loading data from: {DATA_PATH}")
        df = pd.read_csv(DATA_PATH, nrows=N_ROWS)
        
        print("Applying feature engineering...")
        df = engineer_features(df)
        
        X = df.drop(['isFraud', 'nameOrig', 'nameDest'], axis=1)
        y = df['isFraud']
        
        # Train-test split
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=TEST_SIZE, random_state=RANDOM_STATE, stratify=y
        )
        
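        # Scaling
        # pandas output keeps the column names, so train.py can record feature_order
        pt = PowerTransformer(method='yeo-johnson').set_output(transform='pandas')
        X_train_scaled = pt.fit_transform(X_train)
        X_test_scaled = pt.transform(X_test)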
        
        # Resampling
        smote = SMOTE(sampling_strategy=SMOTE_RATIO, random_state=RANDOM_STATE)
        X_res, y_res = smote.fit_resample(X_train_scaled, y_train)
        
        return X_res, y_res, X_test_scaled, y_test, pt
        
    except Exception as e:
        print(f"Error in preprocessing: {str(e)}")
        raise
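
To make the effect of SMOTE_RATIO concrete: with a float `sampling_strategy`, imbalanced-learn treats the value as the desired minority-to-majority ratio after resampling. The arithmetic can be sketched in plain Python as below; the function name and the class counts are hypothetical, and clamping at zero is a simplification made for this sketch:

```python
def smote_target_counts(n_majority: int, n_minority: int, ratio: float) -> dict:
    """Estimate class counts after oversampling the minority class to `ratio`.

    ratio = minority count / majority count after resampling.
    """
    target_minority = int(n_majority * ratio)
    # Only synthetic rows are added; the majority class is left untouched.
    synthetic_rows = max(target_minority - n_minority, 0)
    return {
        "majority": n_majority,
        "minority_after": n_minority + synthetic_rows,
        "synthetic_rows": synthetic_rows,
    }


if __name__ == "__main__":
    # Hypothetical training split: 10,000 legitimate rows and 100 fraud rows.
    print(smote_target_counts(n_majority=10_000, n_minority=100, ratio=0.3))
```

Because resampling happens after the train/test split, the test set keeps its original class balance, so the evaluation metrics reflect the real fraud rate.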

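#### To run:

This module has no entry point of its own; train.py imports it. To check that it imports cleanly, run it as a module from the project root:

python -m src.preprocess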

#### 3- Feature Engineering (src/feature_engineer.py)

#### Key Features Created:

Transaction amount relative to account balance

Time-based features (hour, day of week)

Suspicious activity flags

In [None]:
import pandas as pd
import numpy as np
from src.config import AMOUNT_PERCENTILE, BALANCE_PERCENTILE

def engineer_features(df):
    """Feature engineering pipeline"""
    # Transaction features
    amt_thresh = df[df['isFraud']==0]['amount'].quantile(AMOUNT_PERCENTILE)
    bal_thresh = df[df['isFraud']==0]['oldbalanceOrg'].quantile(BALANCE_PERCENTILE)
    
    df['amount_to_balance'] = df['amount'] / (df['oldbalanceOrg'] + 1)
    df['high_amount_flag'] = (df['amount'] > amt_thresh).astype(int)
    df['balance_change_abs'] = df['oldbalanceOrg'] - df['newbalanceOrig']
    df['suspicious_withdrawal'] = (
        (df['balance_change_abs'] > bal_thresh) & 
        (df['amount_to_balance'] > 0.5)
    ).astype(int)
    
    # Time features
    df['hour_of_day'] = ((df['step'] - 1) % 24) + 1
    df['day_of_week'] = ((df['step'] - 1) // 24) % 7
    df['is_weekend'] = ((df['day_of_week'] == 5) | (df['day_of_week'] == 6)).astype(int)
    
    # Categorical encoding
    df = pd.get_dummies(df, columns=['type'], prefix='type')
    
    return df
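
The time features above are plain integer arithmetic on `step`. The same formulas are repeated here for a single value as a small worked sketch; it assumes, as the code above does, that `step` counts hours starting at 1, and the function name is illustrative:

```python
def time_features(step: int) -> dict:
    """Derive the time features for one transaction from its hourly step."""
    hour_of_day = ((step - 1) % 24) + 1    # 1..24
    day_of_week = ((step - 1) // 24) % 7   # 0..6, where 0 is the first simulated day
    is_weekend = int(day_of_week in (5, 6))
    return {
        "hour_of_day": hour_of_day,
        "day_of_week": day_of_week,
        "is_weekend": is_weekend,
    }


if __name__ == "__main__":
    for step in (1, 24, 25, 121):
        print(step, time_features(step))
```

Note that day 0 is simply the first day in the data; whether it corresponds to a real calendar weekday depends on the dataset, so `is_weekend` is best read as "days 5 and 6 of each 7-day cycle".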

#### 4- Model Training (src/train.py)

#### Implementation:

In [None]:
# src/train.py
from sklearn.ensemble import RandomForestClassifier
from joblib import dump
from src.preprocess import load_and_preprocess
from src.config import MODEL_PATH, RANDOM_STATE
from sklearn.metrics import classification_report
from datetime import datetime  # required for the training_date metadata below
import pandas as pd
import os

def train_model():
    print("🚀 Starting model training...")
    
    # Load and preprocess data
    print("🔍 Loading and preprocessing data...")
    X_res, y_res, X_test, y_test, pt = load_and_preprocess()
    
    # Initialize model
    print("🤖 Initializing Random Forest model...")
    model = RandomForestClassifier(
        class_weight='balanced',
        n_estimators=50,
        max_depth=7,
        max_samples=0.8,
        n_jobs=-1,
        random_state=RANDOM_STATE
    )
    
    # Train model
    print("⚡ Training model...")
    model.fit(X_res, y_res)
    
    # Evaluate
    print("🧪 Evaluating model...")
    y_pred = model.predict(X_test)
    print(classification_report(y_test, y_pred))
    
    # Create models directory if it doesn't exist
    os.makedirs(os.path.dirname(MODEL_PATH), exist_ok=True)
    
    # Save ALL required artifacts
    artifacts = {
        'model': model,
        'transformer': pt,
        'feature_order': X_res.columns.tolist() if hasattr(X_res, 'columns') else [],
        'metadata': {
            'training_date': datetime.now().isoformat(),
            'git_commit': os.getenv('GIT_COMMIT', 'unknown'),
            'python_version': os.getenv('PYTHON_VERSION', 'unknown')
        }
    }
    
    dump(artifacts, MODEL_PATH)
    print(f"\n✅ Model successfully saved to {MODEL_PATH}")
    
if __name__ == '__main__':
    train_model()

#### To train the model:

Run it as a module from the project root so that the src.* imports resolve:

python -m src.train

#### Expected Output:

In [None]:
🚀 Starting model training...
✅ Model successfully saved to models/fraud_model.joblib

#### 5- Prediction (src/predict.py)

In [None]:
# src/predict.py
from joblib import load
import pandas as pd
from src.config import MODEL_PATH, AppConfig
import json
from datetime import datetime
import logging
import os

class FraudPredictor:
    def __init__(self, test_mode=False):
        self.test_mode = test_mode
        self.model = None
        self.pt = None
        self.feature_order = []
        self._init_logging()
        
        if not test_mode:
            try:
                self._load_model()
            except Exception as e:
                print(f"⚠️ Failed to load model: {str(e)}")
                # Fallback to test mode if model loading fails
                self.test_mode = True

    def _load_model(self):
        """Load model artifacts with validation"""
        if not os.path.exists(MODEL_PATH):
            raise FileNotFoundError(f"Model file not found at {MODEL_PATH}")
            
        artifacts = load(MODEL_PATH)
        
        # Validate all required components exist
        required_keys = {'model', 'transformer', 'feature_order'}
        missing_keys = required_keys - set(artifacts.keys())
        if missing_keys:
            raise ValueError(f"Missing required keys in model file: {missing_keys}")
            
        self.model = artifacts['model']
        self.pt = artifacts['transformer']
        self.feature_order = artifacts.get('feature_order', [])
        
        print("✅ Model loaded successfully")
        print(f"Model trained on: {artifacts.get('metadata', {}).get('training_date', 'unknown')}")

    def _init_logging(self):
        """Set up prediction logging"""
        os.makedirs(os.path.dirname(AppConfig.PREDICTION_LOGS), exist_ok=True)
        logging.basicConfig(
            filename=AppConfig.PREDICTION_LOGS,
            format='%(asctime)s - %(message)s',
            level=logging.INFO
        )
        self.logger = logging.getLogger(__name__)

    def _validate_input(self, data: dict) -> None:
        """Ensure minimum required fields exist"""
        required_fields = {
            'amount', 'oldbalanceOrg', 'newbalanceOrig',
            'oldbalanceDest', 'newbalanceDest', 'step',
            'isFlaggedFraud', 'type'
        }
        missing = required_fields - set(data.keys())
        if missing:
            raise ValueError(f"Missing required fields: {missing}")

    def log_prediction(self, data: dict, prediction: int):
        """Log prediction with context"""
        log_entry = {
            "timestamp": datetime.now().isoformat(),
            "input": {k: v for k, v in data.items() if k != 'type'},
            "prediction": prediction,
            "model_version": "1.0.0",
            "test_mode": self.test_mode
        }
        self.logger.info(json.dumps(log_entry))

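    def preprocess(self, transaction_data: dict):
        """Recreate the training-time features for a single transaction"""
        df = pd.DataFrame([transaction_data])
        
        # Feature engineering
        # NOTE: feature_engineer.py derives these two thresholds from quantiles
        # of the non-fraud training rows; the fixed values used here (10000 and
        # 5000) only approximate them. Storing the training thresholds in the
        # model artifacts would remove this train/serve mismatch.
        df['amount_to_balance'] = df['amount'] / (df['oldbalanceOrg'] + 1)
        df['high_amount_flag'] = (df['amount'] > 10000).astype(int)
        df['balance_change_abs'] = df['oldbalanceOrg'] - df['newbalanceOrig']
        df['suspicious_withdrawal'] = (
            (df['balance_change_abs'] > 5000) & 
            (df['amount_to_balance'] > 0.5)
        ).astype(int)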
        
        # Time features
        df['hour_of_day'] = ((df['step'] - 1) % 24) + 1
        df['day_of_week'] = ((df['step'] - 1) // 24) % 7
        df['is_weekend'] = ((df['day_of_week'] == 5) | (df['day_of_week'] == 6)).astype(int)
        
        # Transaction type handling
        valid_types = ['CASH_IN', 'CASH_OUT', 'DEBIT', 'PAYMENT', 'TRANSFER']
        for t in valid_types:
            df[f'type_{t}'] = 0
        if 'type' in df and df['type'].iloc[0] in valid_types:
            df[f'type_{df["type"].iloc[0]}'] = 1
            
        # Verify feature match
        missing = set(self.feature_order) - set(df.columns)
        if missing:
            raise ValueError(f"Missing features after processing: {missing}")
            
        return self.pt.transform(df[self.feature_order])

    def predict(self, transaction_data: dict) -> int:
        """Make a fraud prediction"""
        if self.test_mode:
            self.log_prediction(transaction_data, 0)
            return 0  # Dummy prediction in test mode
            
        try:
            self._validate_input(transaction_data)
            processed = self.preprocess(transaction_data)
            prediction = int(self.model.predict(processed)[0])
            self.log_prediction(transaction_data, prediction)
            return prediction
        except Exception as e:
            self.logger.error(f"Prediction failed: {str(e)}")
            raise RuntimeError(f"Prediction failed: {str(e)}")

# For testing the predictor directly
if __name__ == '__main__':
    predictor = FraudPredictor(test_mode=True)
    test_data = {
        "amount": 100,
        "oldbalanceOrg": 1000,
        "newbalanceOrig": 900,
        "oldbalanceDest": 500,
        "newbalanceDest": 600,
        "step": 1,
        "isFlaggedFraud": 0,
        "type": "TRANSFER"
    }
    print("Test prediction:", predictor.predict(test_data))
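
The transaction-type handling and the final column selection in `preprocess` can be illustrated without pandas. The sketch below one-hot encodes `type` and orders the values by a feature list; the helper names and the three-item `feature_order` are assumptions made for this example only:

```python
VALID_TYPES = ["CASH_IN", "CASH_OUT", "DEBIT", "PAYMENT", "TRANSFER"]


def one_hot_type(transaction_type: str) -> dict:
    """Return one type_<NAME> flag per known type; unknown types give all zeros."""
    return {f"type_{name}": int(name == transaction_type) for name in VALID_TYPES}


def align_features(features: dict, feature_order: list) -> list:
    """Order feature values as the model expects, failing loudly on gaps."""
    missing = [name for name in feature_order if name not in features]
    if missing:
        raise ValueError(f"Missing features after processing: {missing}")
    return [features[name] for name in feature_order]


if __name__ == "__main__":
    features = {"amount": 100, "step": 1, **one_hot_type("TRANSFER")}
    # Hypothetical feature order; the real one is stored in the model artifacts.
    print(align_features(features, ["step", "amount", "type_TRANSFER"]))
```

Selecting by an explicit feature order means extra keys in the request are ignored, while a missing key raises before the model is called.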

#### 6- Evaluation (src/evaluate.py)

In [None]:
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

def evaluate(y_true, y_pred):
    print(classification_report(y_true, y_pred))
    
    cm = confusion_matrix(y_true, y_pred)
    plt.figure(figsize=(6,4))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
    plt.show()

#### Now create empty __init__.py files in src/ and tests/

#### setup.py

In [None]:
from setuptools import setup, find_packages

setup(
    name="fraud-detection",
    version="0.1",
    packages=find_packages(include=['src*']), 
    install_requires=[
        'pandas>=1.5.0',
        'scikit-learn>=1.2.0',
        'Flask>=2.0.0',
        'joblib>=1.0.0',
        'imbalanced-learn>=0.10.0',
        'scipy>=1.7.0',
        'numpy>=1.21.0',
        'waitress>=2.1.0'
    ],
)

### Phase 2: Local Testing & Validation

#### Flask API Implementation (src/app.py)

#### Key Endpoints:
/health: Service status check

/predict: Fraud prediction endpoint

In [None]:
from flask import Flask, request, jsonify
from src.predict import FraudPredictor
from datetime import datetime
from src.config import AppConfig
import os
from pathlib import Path

app = Flask(__name__)

def get_model_path():
    """Resolve model path for both development and Docker environments"""
    # Try multiple possible locations
    possible_paths = [
        Path('models/fraud_model.joblib'),  # Development
        Path('/app/models/fraud_model.joblib'),  # Docker
        Path(__file__).parent.parent / 'models' / 'fraud_model.joblib'  # Relative to app
    ]
    
    for path in possible_paths:
        if path.exists():
            return str(path)
    return None

def get_data_path():
    """Resolve data file path"""
    possible_paths = [
        Path('data/sample_fraud.csv'),  # Development
        Path('/app/data/sample_fraud.csv'),  # Docker
        Path(__file__).parent.parent / 'data' / 'sample_fraud.csv'  # Relative to app
    ]
    
    for path in possible_paths:
        if path.exists():
            return str(path)
    return None

# Initialize predictor
# model_path and data_path are resolved only so the responses can report them;
# FraudPredictor accepts test_mode only and loads the model from src.config.MODEL_PATH.
model_path = get_model_path()
data_path = get_data_path()

is_ci = os.getenv('GITHUB_ACTIONS') == 'true'
predictor = FraudPredictor(test_mode=is_ci or not model_path)

@app.route('/predict', methods=['POST'])
def predict():
    try:
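        # silent=True returns None for a missing or malformed JSON body,
        # so the 400 branch below handles it instead of the generic 500
        data = request.get_json(silent=True)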
        
        if not data:
            return jsonify({"error": "No JSON provided"}), 400
            
        # Validate required fields
        required_fields = {
            'amount', 'oldbalanceOrg', 'newbalanceOrig',
            'oldbalanceDest', 'newbalanceDest', 'step',
            'isFlaggedFraud', 'type'
        }
        missing = required_fields - set(data.keys())
        if missing:
            return jsonify({"error": f"Missing required fields: {missing}", "status": "input_error"}), 400

        prediction = predictor.predict(data)
        
        return jsonify({
            "fraud_prediction": prediction,
            "model_info": {
                "version": "1.0.0",
                "type": "RandomForest",
                "test_mode": predictor.test_mode,
                "model_used": str(model_path) if model_path else "none",
                "data_used": str(data_path) if data_path else "none"
            },
            "status": "success"
        })
        
    except Exception as e:
        return jsonify({"error": str(e), "status": "server_error"}), 500

@app.route('/health', methods=['GET'])
def health():
    return jsonify({
        "status": "healthy",
        "timestamp": datetime.now().isoformat(),
        "model_loaded": not predictor.test_mode,
        "model_path": str(model_path) if model_path else "none",
        "data_path": str(data_path) if data_path else "none"
    })

if __name__ == '__main__':
    app.run(host=AppConfig.HOST, port=AppConfig.PORT, debug=AppConfig.DEBUG)

#### To run locally (from the project root):

In [None]:
# Install dependencies
pip install -r requirements.txt

# Train the model
python -m src.train

# Run Flask
python -m src.app

#### Test the API:

In [None]:
# In Command Prompt (cmd.exe)
# Only the raw transaction fields are required; the API derives the engineered features itself
curl -X POST http://localhost:8080/predict ^
-H "Content-Type: application/json" ^
-d "{\"step\":1,\"amount\":9839.64,\"oldbalanceOrg\":170136.0,\"newbalanceOrig\":160296.36,\"oldbalanceDest\":0.0,\"newbalanceDest\":9839.64,\"isFlaggedFraud\":0,\"type\":\"CASH_OUT\"}"
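
Shell quoting for JSON bodies differs between cmd.exe, PowerShell and bash, so a small Python client can be a more portable way to exercise the endpoint. The following is a sketch using only the standard library; the function names are illustrative, and the base URL assumes the API is running locally on port 8080 as configured above:

```python
import json
import urllib.request


def build_predict_request(payload: dict, base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /predict endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


SAMPLE_TRANSACTION = {
    "step": 1,
    "amount": 9839.64,
    "oldbalanceOrg": 170136.0,
    "newbalanceOrig": 160296.36,
    "oldbalanceDest": 0.0,
    "newbalanceDest": 9839.64,
    "isFlaggedFraud": 0,
    "type": "CASH_OUT",
}

if __name__ == "__main__":
    request = build_predict_request(SAMPLE_TRANSACTION)
    print(request.get_method(), request.full_url)
    # With the API running, uncomment the next line to send the request:
    # print(send(request))
```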

#### Create tests/test_app.py 

In [None]:
import unittest
from src.app import app
import os
import json

class TestAPI(unittest.TestCase):
    def setUp(self):
        app.config['TESTING'] = True
        self.client = app.test_client()
        self.test_data = {
            "amount": 100,
            "oldbalanceOrg": 1000,
            "newbalanceOrig": 900,
            "oldbalanceDest": 500,
            "newbalanceDest": 600,
            "step": 1,
            "isFlaggedFraud": 0,
            "type": "TRANSFER"
        }
    
    def test_health_check(self):
        response = self.client.get('/health')
        self.assertEqual(response.status_code, 200)
        self.assertEqual(response.json['status'], 'healthy')
        # Don't assert model_loaded since it depends on environment

    def test_predict_endpoint(self):
        response = self.client.post('/predict', json=self.test_data)
        self.assertEqual(response.status_code, 200)
        self.assertIn('fraud_prediction', response.json)
        # Accept either test mode or not
        self.assertIn(response.json['model_info']['test_mode'], [True, False])

    def test_invalid_input(self):
        invalid_data = self.test_data.copy()
        invalid_data.pop('amount')
        response = self.client.post('/predict', json=invalid_data)
        self.assertIn(response.status_code, [400, 500])  # Accept either error code

if __name__ == '__main__':
    unittest.main()

### 4. Docker Containerization
#### Dockerfile Configuration

In [None]:
FROM python:3.10-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    python3-dev \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code (app.py and config.py live inside src/)
COPY src/ ./src/

# Create data directory and copy sample data
RUN mkdir -p /app/data
COPY data/sample_fraud.csv /app/data/

# Copy model file (must exist in build context)
COPY models/fraud_model.joblib /app/models/

# Environment variables
ENV MODEL_PATH=/app/models/fraud_model.joblib
ENV FLASK_APP=src.app
ENV PYTHONPATH=/app

EXPOSE 8080
CMD ["python", "-c", "from waitress import serve; from src.app import app; serve(app, host='0.0.0.0', port=8080)"]

#### Create serve.py, which serves two main purposes:

#### 1- Replaces Flask's Development Server

##### Flask's built-in server (app.run()) is not suitable for production (slow, insecure, single-threaded).

##### waitress is a production-ready WSGI server that handles multiple requests efficiently.

#### 2- Standardizes the Startup Process

##### Provides a consistent entry point for Docker to launch your app.

##### Ensures directories exist and logging is configured before starting.

#### serve.py

In [None]:
from waitress import serve
from src.app import app  # Import the Flask app from the src package
from src.config import PROJECT_ROOT
import os
import logging

# Production configuration
MODEL_DIR = PROJECT_ROOT / 'models'
os.makedirs(MODEL_DIR, exist_ok=True)

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('waitress')
logger.info('Starting server...')

if __name__ == '__main__':
    print(f"🚀 Serving fraud detection API on http://localhost:8080")
    serve(app, host='0.0.0.0', port=8080)  # Production-ready server

#### Build & Run

In [None]:
docker build -t fraud-detection-api .
docker run -p 8080:8080 fraud-detection-api 

#### Test the API 

In [None]:
curl -X POST http://localhost:8080/predict ^
-H "Content-Type: application/json" ^
-d "{\"step\":1,\"amount\":1000,\"oldbalanceOrg\":5000,\"newbalanceOrig\":4000,\"oldbalanceDest\":0,\"newbalanceDest\":1000,\"isFlaggedFraud\":0,\"amount_to_balance\":0.2,\"high_amount_flag\":0,\"balance_change_abs\":1000,\"suspicious_withdrawal\":0,\"hour_of_day\":10,\"day_of_week\":2,\"is_weekend\":0,\"type\":\"CASH_OUT\",\"type_CASH_IN\":0,\"type_CASH_OUT\":1,\"type_DEBIT\":0,\"type_PAYMENT\":0,\"type_TRANSFER\":0}"


### 5. Local Kubernetes Deployment

#### Kubernetes Deployment Manifest
#### deployment.yaml

In [None]:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fraud-detection-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: fraud-detection
  template:
    metadata:
      labels:
        app: fraud-detection
    spec:
      containers:
      - name: fraud-api
        image: moeyahya/fraud-detection-api:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
        env:
        - name: MODEL_PATH
          value: "/app/models/fraud_model.joblib"
        # No volume is mounted at /app/models: an emptyDir there would hide
        # the model file that the Dockerfile copies into the image.
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"

#### service.yaml 

In [None]:
apiVersion: v1
kind: Service
metadata:
  name: fraud-detection-service
spec:
  type: NodePort
  selector:
    app: fraud-detection
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
      nodePort: 30080

#### To deploy:

In [None]:
# Apply configurations
kubectl apply -f deployments/deployment.yaml
kubectl apply -f deployments/service.yaml

#### Verify deployment:

In [None]:
kubectl get pods
kubectl get services
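
Once the pods report Running, the /health endpoint can be polled until the service answers. The sketch below shows one way to do that with a retry loop; the function names are illustrative, and the URL in the usage comment assumes the NodePort 30080 from service.yaml is reachable on localhost, which depends on the local cluster setup:

```python
import json
import time
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> dict:
    """Fetch a health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_healthy(probe: Callable[[], dict], attempts: int = 5, delay_seconds: float = 2.0) -> bool:
    """Call probe() until it reports status 'healthy' or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            if probe().get("status") == "healthy":
                return True
        except OSError:
            # Connection refused or timed out while the pods are still starting.
            pass
        if attempt < attempts:
            time.sleep(delay_seconds)
    return False


# Usage against a local cluster (assumed address):
# ok = wait_until_healthy(lambda: http_probe("http://localhost:30080/health"))
```

Passing the probe in as a callable keeps the retry logic testable without a running cluster.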

### AWS ECS Fargate Deployment

#### AWS Infrastructure Setup

#### 1- Create ECR Repository:

In [None]:
aws ecr create-repository --repository-name fraud-detection-api --region ca-central-1

#### 2- Push Docker Image:

In [None]:
aws ecr get-login-password --region ca-central-1 | docker login --username AWS --password-stdin 311410995726.dkr.ecr.ca-central-1.amazonaws.com

docker tag fraud-detection-api:latest 311410995726.dkr.ecr.ca-central-1.amazonaws.com/fraud-detection-api:latest

docker push 311410995726.dkr.ecr.ca-central-1.amazonaws.com/fraud-detection-api:latest
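
The image name used in the tag and push commands follows the pattern `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. A small helper, sketched here with an illustrative function name, keeps the pieces in one place when scripting these steps:

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Compose the full ECR image URI from its parts."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{tag}"


if __name__ == "__main__":
    # Values taken from the commands above.
    print(ecr_image_uri("311410995726", "ca-central-1", "fraud-detection-api"))
```

Passing a commit hash as `tag` instead of `latest` is one option for making each deployed image traceable to a specific build.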

### GitHub Setup & CI
#### Initialize Git Repository

In [None]:
git init
git add .
git commit -m "Initial commit"

# Create the GitHub repo manually first, then point the local repo at it
git branch -M main
git remote add origin https://github.com/yourusername/fraud-detection.git
git push -u origin main

### Configure GitHub Actions
#### GitHub Actions Workflow (ci.yml)

In [None]:
name: Continuous Integration

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 15

    steps:
    - uses: actions/checkout@v4
    
    - name: Set up Python 3.10
      uses: actions/setup-python@v4
      with:
        python-version: '3.10'
        cache: 'pip'

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install pandas pytest pytest-cov joblib scikit-learn imbalanced-learn
        pip install -e .

    - name: Create sample data
      run: |
        mkdir -p data
        python -c "
        import pandas as pd;
        df = pd.DataFrame({
            'step': [1, 2, 3, 4, 5],
            'type': ['CASH_IN', 'CASH_OUT', 'PAYMENT', 'TRANSFER', 'DEBIT'],
            'amount': [100, 200, 300, 400, 500],
            'nameOrig': ['A', 'B', 'C', 'D', 'E'],
            'oldbalanceOrg': [1000, 2000, 3000, 4000, 5000],
            'newbalanceOrig': [900, 1900, 2900, 3900, 4900],
            'nameDest': ['X', 'Y', 'Z', 'W', 'V'],
            'oldbalanceDest': [500, 600, 700, 800, 900],
            'newbalanceDest': [600, 700, 800, 900, 1000],
            'isFraud': [0, 1, 0, 1, 0],
            'isFlaggedFraud': [0, 0, 0, 0, 0]
        });
        df.to_csv('data/sample_fraud.csv', index=False)
        "

    - name: Run tests
      run: |
        PYTHONPATH=$PYTHONPATH:$GITHUB_WORKSPACE pytest tests/ \
          --cov=src \
          --cov-report=xml \
          --cov-report=term-missing \
          -v

    - name: Upload coverage
      uses: codecov/codecov-action@v3
      with:
        token: ${{ secrets.CODECOV_TOKEN }}
        files: coverage.xml
        flags: unittests

#### AWS Infrastructure Setup

#### Create S3 Bucket for Artifacts

In [None]:
$BUCKET_NAME = "fraud-detection-artifacts-" + (Get-Date -Format "yyyyMMddHHmmss")
aws s3api create-bucket --bucket $BUCKET_NAME --region ca-central-1 --create-bucket-configuration LocationConstraint=ca-central-1

#### Bucket-policy.json 

In [None]:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "codepipeline.amazonaws.com"
      },
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:GetBucketVersioning",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::fraud-detection-artifacts-*/*",
        "arn:aws:s3:::fraud-detection-artifacts-*"
      ]
    }
  ]
}


In [None]:
aws s3api put-bucket-policy --bucket $BUCKET_NAME --policy file://bucket-policy.json

#### Create IAM Roles
##### CodeBuild Role (deployments/codebuild-role.json)

In [None]:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "codebuild.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}


In [None]:
aws iam create-role --role-name CodeBuildFraudDetectionRole --assume-role-policy-document file://deployments/codebuild-role.json

# Attach managed policies
aws iam attach-role-policy --role-name CodeBuildFraudDetectionRole --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPowerUser
aws iam attach-role-policy --role-name CodeBuildFraudDetectionRole --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam attach-role-policy --role-name CodeBuildFraudDetectionRole --policy-arn arn:aws:iam::aws:policy/AmazonECS_FullAccess

#### Attach Custom Policies
##### codebuild-policy.json

In [None]:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:GetRepositoryPolicy",
        "ecr:DescribeRepositories",
        "ecr:ListImages",
        "ecr:DescribeImages",
        "ecr:BatchGetImage",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload",
        "ecr:PutImage",
        "ecr:CreateRepository"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:ca-central-1:311410995726:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::codepipeline-ca-central-1-*/*"
    }
  ]
}

In [None]:
aws iam put-role-policy --role-name CodeBuildFraudDetectionRole --policy-name CodeBuildECR --policy-document file://codebuild-policy.json

#### CodePipeline Role
##### codepipeline-role.json

In [None]:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "codepipeline.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

In [None]:
aws iam create-role --role-name CodePipelineServiceRole --assume-role-policy-document file://deployments/codepipeline-role.json

# Attach managed policies
aws iam attach-role-policy --role-name CodePipelineServiceRole --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam attach-role-policy --role-name CodePipelineServiceRole --policy-arn arn:aws:iam::aws:policy/AWSCodePipeline_FullAccess

#### Attach custom ECS policy
##### ecs-policy.json

In [None]:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecs:DescribeServices",
                "ecs:DescribeTaskDefinition",
                "ecs:DescribeTasks",
                "ecs:ListTasks",
                "ecs:RegisterTaskDefinition",
                "ecs:UpdateService"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": [
                "arn:aws:iam::311410995726:role/ecsTaskExecutionRole"
            ]
        }
    ]
}

In [None]:
aws iam put-role-policy --role-name CodePipelineServiceRole --policy-name ECSPermissions --policy-document file://deployments/ecs-policy.json

#### ECS Task Execution Role

In [None]:
aws iam create-role --role-name ecsTaskExecutionRole --assume-role-policy-document file://deployments/ecs-task-execution-role.json

# Attach managed policies
aws iam attach-role-policy --role-name ecsTaskExecutionRole --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
aws iam attach-role-policy --role-name ecsTaskExecutionRole --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly

#### Create ECR Repository

In [None]:
aws ecr create-repository --repository-name fraud-detection-api --region ca-central-1

#### Push Docker Image

In [None]:
aws ecr get-login-password --region ca-central-1 | docker login --username AWS --password-stdin 311410995726.dkr.ecr.ca-central-1.amazonaws.com

docker tag fraud-detection-api:latest 311410995726.dkr.ecr.ca-central-1.amazonaws.com/fraud-detection-api:latest

docker push 311410995726.dkr.ecr.ca-central-1.amazonaws.com/fraud-detection-api:latest

#### Configure ECS
##### task-definition.json

In [None]:
{
  "family": "fraud-detection-task",
  "networkMode": "awsvpc",
  "executionRoleArn": "arn:aws:iam::311410995726:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "fraud-detection-api",
      "image": "311410995726.dkr.ecr.ca-central-1.amazonaws.com/fraud-detection-api:latest",
      "essential": true,
      "portMappings": [
        {
          "containerPort": 8080,
          "hostPort": 8080,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "MODEL_PATH",
          "value": "/app/models/fraud_model.joblib"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/fraud-detection-task",
          "awslogs-region": "ca-central-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ],
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512"
}
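
Before registering the task definition, it can help to sanity-check the file locally. The following is a minimal sketch, not part of the original project: the function name `check_task_definition` is hypothetical, and the size table covers only the three smallest Fargate CPU values, so larger sizes are simply not checked.

```python
# Subset of the Fargate task-size table: CPU units -> allowed memory in MiB.
# Assumption: only the three smallest CPU sizes are listed in this sketch.
FARGATE_MEMORY_MIB = {
    256: {512, 1024, 2048},
    512: {1024, 2048, 3072, 4096},
    1024: {2048, 3072, 4096, 5120, 6144, 7168, 8192},
}


def check_task_definition(task_def: dict) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if "FARGATE" in task_def.get("requiresCompatibilities", []):
        if task_def.get("networkMode") != "awsvpc":
            problems.append("Fargate tasks require networkMode 'awsvpc'")
        cpu = int(task_def.get("cpu", 0))
        memory = int(task_def.get("memory", 0))
        # CPU sizes outside the subset table are skipped rather than flagged.
        if cpu in FARGATE_MEMORY_MIB and memory not in FARGATE_MEMORY_MIB[cpu]:
            problems.append(f"memory={memory} is not valid for cpu={cpu}")
    if task_def.get("networkMode") == "awsvpc":
        for container in task_def.get("containerDefinitions", []):
            for mapping in container.get("portMappings", []):
                host_port = mapping.get("hostPort", mapping["containerPort"])
                if host_port != mapping["containerPort"]:
                    problems.append(
                        f"{container['name']}: hostPort must equal containerPort in awsvpc mode"
                    )
    return problems
```

To use it, load `deployments/task-definition.json` with `json.load` and pass the resulting dict to the function; the file shown above should produce an empty list.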

#### Register Task Definition

In [None]:
aws ecs register-task-definition --cli-input-json file://deployments/task-definition.json

#### Create ECS Service

##### ecs-service.yaml

In [None]:
serviceName: fraud-detection-service
cluster: fraud-detection-cluster
taskDefinition: fraud-detection-task
desiredCount: 1
launchType: FARGATE
networkConfiguration:
  awsvpcConfiguration:
    subnets:
      - subnet-12345678  # ← Replace with your subnet
      - subnet-87654321  # ← Replace with your subnet
    securityGroups:
      - sg-12345678     # ← Replace with your security group
    assignPublicIp: ENABLED

In [None]:
# A YAML input file needs --cli-input-yaml (AWS CLI v2); --cli-input-json accepts JSON only
aws ecs create-service --cli-input-yaml file://deployments/ecs-service.yaml

#### AWS CodePipeline Setup

#### Pipeline Structure:

#### 1- Source Stage: GitHub repository

#### 2- Build Stage: CodeBuild project

#### 3- Deploy Stage: ECS service update

#### CodeBuild Configuration
##### buildspec.yml

In [None]:
version: 0.2

env:
  variables:
    AWS_ACCOUNT_ID: "311410995726"
    AWS_REGION: "ca-central-1"
    ECR_REPOSITORY: "fraud-detection-api"
    IMAGE_TAG: "latest"

phases:
  pre_build:
    commands:
      - echo "Logging in to ECR..."
      - aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com
      - echo "Checking ECR repo..."
      - aws ecr describe-repositories --repository-names $ECR_REPOSITORY || aws ecr create-repository --repository-name $ECR_REPOSITORY
  build:
    commands:
      - echo "Building Docker image..."
      - docker build -t $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$ECR_REPOSITORY:$IMAGE_TAG .
  post_build:
    commands:
      - echo "Pushing to ECR..."
      - docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$ECR_REPOSITORY:$IMAGE_TAG
      - echo "Creating imagedefinitions.json..."
      - printf '[{"name":"fraud-detection-api","imageUri":"%s"}]' $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$ECR_REPOSITORY:$IMAGE_TAG > imagedefinitions.json

artifacts:
  files:
    - imagedefinitions.json
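
The `printf` step in `post_build` writes the file that the ECS deploy action reads. The `name` in that file has to match the container name in the task definition. As a rough sketch of the same output built in Python, which avoids shell quoting, assuming hypothetical helper names `ecr_image_uri` and `build_image_definitions`:

```python
import json


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Build the ECR image URI in the same form the buildspec uses."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def build_image_definitions(container_name: str, image_uri: str) -> str:
    """Return the JSON text for imagedefinitions.json with one container entry."""
    return json.dumps([{"name": container_name, "imageUri": image_uri}])


# Same values as the env block of the buildspec.
uri = ecr_image_uri("311410995726", "ca-central-1", "fraud-detection-api", "latest")
print(build_image_definitions("fraud-detection-api", uri))
```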

#### Verification

##### Check ECS Service:

In [None]:
aws ecs describe-services --cluster fraud-detection-cluster --services fraud-detection-service
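
The JSON returned by `describe-services` is long, so a small helper can reduce it to a yes/no answer. This is a sketch only: `service_is_steady` is a hypothetical name, and "steady" is defined here as an ACTIVE service with a single deployment whose running count equals the desired count, which is a simplification.

```python
def service_is_steady(response: dict, service_name: str) -> bool:
    """Return True if the named service looks fully rolled out."""
    for service in response.get("services", []):
        if service.get("serviceName") != service_name:
            continue
        return (
            service.get("status") == "ACTIVE"
            and service.get("runningCount") == service.get("desiredCount")
            # More than one deployment usually means a rollout is still in progress.
            and len(service.get("deployments", [])) == 1
        )
    return False
```

Run the CLI command with `--output json`, parse the result with `json.loads`, and pass the dict together with `"fraud-detection-service"`.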

#### CodePipeline Setup
##### pipeline.json

In [None]:
{
  "pipeline": {
    "name": "fraud-detection-pipeline",
    "roleArn": "arn:aws:iam::311410995726:role/CodePipelineServiceRole",
    "artifactStore": {
      "type": "S3",
      "location": "fraud-detection-artifacts-20250430013204"
    },
    "stages": [
      {
        "name": "Source",
        "actions": [
          {
            "name": "GitHub_Source",
            "actionTypeId": {
              "category": "Source",
              "owner": "ThirdParty",
              "provider": "GitHub",
              "version": "1"
            },
            "configuration": {
              "Owner": "moeyahya",
              "Repo": "fraud-detection-ml-api-aws-cicd",
              "Branch": "main",
              "OAuthToken": "{{resolve:secretsmanager:GITHUBTOKENSECRET:SecretString:token}}"
            },
            "outputArtifacts": [
              {
                "name": "SourceOutput"
              }
            ]
          }
        ]
      },
      {
        "name": "Build",
        "actions": [
          {
            "name": "Build",
            "actionTypeId": {
              "category": "Build",
              "owner": "AWS",
              "provider": "CodeBuild",
              "version": "1"
            },
            "configuration": {
              "ProjectName": "fraud-detection-build"
            },
            "inputArtifacts": [
              {
                "name": "SourceOutput"
              }
            ],
            "outputArtifacts": [
              {
                "name": "BuildOutput"
              }
            ]
          }
        ]
      },
      {
        "name": "Deploy",
        "actions": [
          {
            "name": "Deploy",
            "actionTypeId": {
              "category": "Deploy",
              "owner": "AWS",
              "provider": "ECS",
              "version": "1"
            },
            "configuration": {
              "ClusterName": "fraud-detection-cluster",
              "ServiceName": "fraud-detection-service",
              "FileName": "imagedefinitions.json"
            },
            "inputArtifacts": [
              {
                "name": "BuildOutput"
              }
            ]
          }
        ]
      }
    ]
  }
}
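
Each stage in `pipeline.json` hands its output to the next one by artifact name (`SourceOutput`, then `BuildOutput`), so a typo in one of those names breaks the chain. The sketch below checks that wiring before the pipeline is created. `check_artifact_wiring` is a hypothetical helper, and it is stricter than CodePipeline itself: it only accepts artifacts produced by an earlier stage, not by an earlier action in the same stage.

```python
def check_artifact_wiring(pipeline: dict) -> list:
    """Return (stage, action, artifact) tuples for inputs no earlier stage produced."""
    produced = set()
    missing = []
    for stage in pipeline["stages"]:
        stage_outputs = set()
        for action in stage["actions"]:
            for artifact in action.get("inputArtifacts", []):
                if artifact["name"] not in produced:
                    missing.append((stage["name"], action["name"], artifact["name"]))
            for artifact in action.get("outputArtifacts", []):
                stage_outputs.add(artifact["name"])
        # Outputs become available only to the stages that follow.
        produced |= stage_outputs
    return missing
```

Pass the dict stored under the top-level `"pipeline"` key of the file; an empty list means every input artifact has a producer.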

#### Create Pipeline

In [None]:
aws codepipeline create-pipeline --cli-input-json file://deployments/pipeline.json

#### Start Pipeline 

In [None]:
aws codepipeline start-pipeline-execution `
  --name fraud-detection-pipeline `
  --region ca-central-1

### Verification & Monitoring

#### Check Pipeline Status

In [None]:
aws codepipeline get-pipeline-state --name fraud-detection-pipeline
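
The state output can be condensed to one status per stage. A minimal sketch, assuming the response has already been parsed into a dict; `summarize_pipeline_state` is a hypothetical name, and `"NotStarted"` is a placeholder label chosen here for stages that have no execution yet, not a value returned by AWS.

```python
def summarize_pipeline_state(state: dict) -> dict:
    """Map each stage name to the status of its latest execution."""
    summary = {}
    for stage in state.get("stageStates", []):
        latest = stage.get("latestExecution") or {}
        # "NotStarted" is this sketch's own label for a stage that has never run.
        summary[stage["stageName"]] = latest.get("status", "NotStarted")
    return summary


# Illustrative input only, not captured from a real pipeline run.
sample = {
    "pipelineName": "fraud-detection-pipeline",
    "stageStates": [
        {"stageName": "Source", "latestExecution": {"status": "Succeeded"}},
        {"stageName": "Build", "latestExecution": {"status": "InProgress"}},
        {"stageName": "Deploy"},
    ],
}
print(summarize_pipeline_state(sample))
```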

#### Test Live Endpoint

In [None]:
# In bash (the $(...) substitution and curl flags below are not PowerShell syntax)
ALB_DNS=$(aws elbv2 describe-load-balancers --query 'LoadBalancers[0].DNSName' --output text)
curl -X POST http://$ALB_DNS/predict -H "Content-Type: application/json" -d '{
    "amount": 5000,
    "oldbalanceOrg": 10000,
    "newbalanceOrig": 5000,
    "oldbalanceDest": 2000,
    "newbalanceDest": 7000,
    "step": 1,
    "type": "CASH_OUT"
}'
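
The same request can be sent from Python using only the standard library, which works the same way from PowerShell and bash. This is a sketch under assumptions: the helper names are hypothetical, and it assumes the `/predict` endpoint replies with a JSON body.

```python
import json
import urllib.request


def build_predict_request(base_url: str, transaction: dict) -> urllib.request.Request:
    """Build a POST request for the /predict endpoint with a JSON body."""
    body = json.dumps(transaction).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url: str, transaction: dict, timeout: float = 10.0) -> dict:
    """Send the request and decode the reply (assumed to be JSON)."""
    request = build_predict_request(base_url, transaction)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Same payload as the curl call above.
transaction = {
    "amount": 5000,
    "oldbalanceOrg": 10000,
    "newbalanceOrig": 5000,
    "oldbalanceDest": 2000,
    "newbalanceDest": 7000,
    "step": 1,
    "type": "CASH_OUT",
}
```

Call `predict("http://<your-load-balancer-dns>", transaction)` once the service is reachable.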

#### View Logs

In [None]:
aws logs tail /ecs/fraud-detection-task --follow

#### CD Pipeline (cd.yml)

In [None]:
name: Continuous Deployment

on:
  push:
    branches: [ "main" ]
    paths:
      - 'src/**'
      - 'Dockerfile'
      - 'requirements.txt'
      - 'deployments/**'

jobs:
  deploy-to-ecs:
    runs-on: ubuntu-latest
    environment: production
    
    steps:
    - uses: actions/checkout@v4
    
    - name: Configure AWS Credentials
      uses: aws-actions/configure-aws-credentials@v2
      with:
        aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        aws-region: ca-central-1

    - name: Login to Amazon ECR
      id: login-ecr
      uses: aws-actions/amazon-ecr-login@v1

    - name: Build, tag, and push to ECR
      env:
        ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
        ECR_REPOSITORY: fraud-detection-api
        IMAGE_TAG: latest
      run: |
        docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
        docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG

    - name: Deploy to ECS
      run: |
        # Register task definition
        aws ecs register-task-definition \
          --cli-input-json file://deployments/task-definition.json \
          --region ca-central-1

        # Update ECS service
        aws ecs update-service \
          --cluster fraud-detection-cluster \
          --service fraud-detection-service \
          --task-definition fraud-detection-task \
          --region ca-central-1

#### Create a .gitignore File (can be created at the very start of the project, since it is needed throughout)

In [None]:
# Data files (ignore the contents, not the directory, so .gitkeep can be re-included)
data/*
!data/.gitkeep

# Byte-compiled files
__pycache__/
*.pyc

# Logs
logs/

# Models
models/

# Environment files
.env
.venv
venv/

# Editor files
.idea/
.vscode/
*.swp
*.swo

# System files
.DS_Store


### 2. Set Up AWS Infrastructure
#### 2.1 Create ECR Repository

In [None]:
aws ecr create-repository --repository-name fraud-detection-api --region ca-central-1

#### 2.2 Create ECS Cluster (Fargate) 

In [None]:
aws ecs create-cluster --cluster-name fraud-detection-cluster --region ca-central-1

#### 2.3 Create S3 Bucket for Artifacts

In [None]:
$BUCKET_NAME = "fraud-detection-artifacts-" + (Get-Date -Format "yyyyMMddHHmmss")
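# Outside us-east-1, create-bucket needs an explicit LocationConstraint
aws s3api create-bucket --bucket $BUCKET_NAME --region ca-central-1 --create-bucket-configuration LocationConstraint=ca-central-1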

#### 2.4 Create CodeBuild Project

##### codebuild-project.json

In [None]:
{
  "name": "fraud-detection-build",
  "source": { "type": "CODEPIPELINE" },
  "artifacts": { "type": "CODEPIPELINE" },
  "environment": {
    "type": "LINUX_CONTAINER",
    "image": "aws/codebuild/amazonlinux2-x86_64-standard:4.0",
    "computeType": "BUILD_GENERAL1_SMALL",
    "privilegedMode": true,
    "environmentVariables": [
      { "name": "AWS_ACCOUNT_ID", "value": "311410995726" },
      { "name": "AWS_REGION", "value": "ca-central-1" }
    ]
  },
  "serviceRole": "CodeBuildFraudDetectionRole"
}

In [None]:
aws codebuild create-project --cli-input-json file://deployments/codebuild-project.json --region ca-central-1