Experiment Tracking MLOps Pipeline

Capstone Project - MBDS Term 3

Author: Roche - Group 1

Date: March 16, 2026

Project Version: 0.2.0

Overview

This project implements an end-to-end MLOps pipeline for tracking experiments at risk of delay in a laboratory setting. The system predicts operational risk scores for experiments based on workflow logs, instrument telemetry, reagent data, and queue management information.

Key Features

Data Ingestion: Automated collection from lab instruments and workflow systems
Data Processing: Multi-layer data lake architecture (Bronze, Silver, Gold) with event-driven ETL
Risk Prediction: LightGBM final model for experiment delay prediction (holdout ROC-AUC 0.977, PR-AUC 0.94)
Real-time Monitoring: Event-driven architecture via AWS EventBridge for continuous risk assessment
Automated Alerts: Email notifications via SendGrid for high-risk experiments
Interactive Dashboard: Streamlit application with embedded Tableau for experiment monitoring
Model Governance: MLflow tracking for model versioning and artifact management
Drift Detection: Continuous model performance monitoring with automated retraining via SageMaker
Model Training at Scale: Dockerized SageMaker training infrastructure for automated model updates

Business Value

The system helps laboratory managers:

Proactively identify experiments likely to experience delays
Optimize resource allocation and scheduling
Reduce operational costs through predictive maintenance
Improve overall lab efficiency and throughput

Project Structure

├── 01-Documents/                       # Project documentation
├── 02-Architecture/                    # Architecture diagrams
│   └── Roche_RFP_Architecture.drawio
├── 03-Data/                            # Data generation and processing
│   ├── 01_generate_workflow_logs.py
│   ├── 02_generate_instrument_telemetry.py
│   ├── 03_generate_reagent_logs.py
│   ├── 04_generate_queue_logs.py
│   ├── 05_dataset_generator.py
│   ├── config.py
│   ├── data_review.ipynb
│   ├── Raw/                            # Raw generated data
│   ├── Processed/                      # Processed datasets with feature importance
│   └── Documents/
├── 04-EDA/                             # Exploratory Data Analysis notebooks
│   ├── EDA_processed_file.ipynb        # Analysis of processed features
│   ├── EDA_workflow.ipynb
│   ├── EDA_telemetry.ipynb
│   ├── EDA_reagent.ipynb
│   ├── EDA_queue.ipynb
│   ├── helpers.py
│   └── figures/                        # EDA visualizations and summaries
├── 05-Experiment/                      # Machine learning experiments
│   ├── ML Final Model.ipynb            # Finalized LightGBM model
│   ├── ML Model I.ipynb
│   ├── ML Model II.ipynb
│   ├── ML Model III.ipynb
│   ├── helpers.py
│   └── ml_files/
├── 06-Deployment/                      # Production deployment code
│   ├── Docs/
│   ├── Experiment_interface/           # Streamlit dashboard application
│   │   ├── app.py
│   │   ├── Dockerfile
│   │   └── requirements.txt
│   ├── Inference_API/                  # Flask ML API for real-time inference
│   │   ├── app.py
│   │   ├── detect_drift.py
│   │   ├── inference.py
│   │   ├── retraining.py
│   │   ├── send_retraining_alert.py
│   │   ├── Dockerfile
│   │   └── requirements.txt
│   ├── Lambda_functions/               # Serverless processing functions
│   │   ├── template.yaml
│   │   ├── consolidate_dataset/        # Data consolidation ETL
│   │   ├── dashboard_data/             # Dashboard data synchronization
│   │   ├── generate_datasets/          # Dataset generation triggers
│   │   ├── run_inference/              # Inference orchestration
│   │   └── send_email_alert/           # Risk alert notifications
│   └── Sagemaker_Training_Image/       # Docker image for SageMaker training
│       ├── Dockerfile
│       ├── requirements.txt
│       └── retraining.py
├── 07-Deliverables/                    # Final project deliverables
│   ├── Roche_G1_Dashboard.twb          # Dashboard embedded in Streamlit
│   ├── Roche_G1_ML_Final_Model.ipynb   # Final trained model and results
│   ├── Roche_G1_Poster.pdf
│   └── Roche_G1_ppt.pdf
├── 00-Backups/                         # Previous versions and backups
├── pyproject.toml                      # Project configuration
├── requirements.txt                    # Dependencies
└── README.md                           # This file

Architecture

High-Level Architecture

The system follows an event-driven lakehouse architecture deployed on AWS, utilizing serverless components and containerized services for scalability and cost-efficiency.

flowchart TD

    Sources["🔬 Lab Instruments & Workflow Systems"]

    subgraph AWS[" "]
        subgraph DataLake["Data Lake Architecture"]
            S3Bronze["S3 Bronze Layer<br/>Raw Data"]
            S3Silver["S3 Silver Layer<br/>Cleaned Data"]
            S3Gold["S3 Gold Layer<br/>Features"]
        end

        subgraph Events["Event Orchestration"]
            EventBridge["AWS EventBridge<br/>Rules Engine"]
        end

        subgraph Serverless["Serverless Processing"]
            LambdaConsolidate["Lambda:<br/>Consolidate"]
            LambdaInference["Lambda:<br/>Inference"]
            LambdaEmail["Lambda:<br/>Email Alert"]
            LambdaDashboard["Lambda:<br/>Dashboard"]
        end

        subgraph MLOps["ML Operations"]
            EC2API["Flask API<br/>EC2 Docker"]
            SageMaker["SageMaker<br/>Training"]
            MLflow["MLflow<br/>Tracking"]
            Models["S3 Models &<br/>Artifacts"]
        end

        subgraph Presentation["Visualization & Alerts"]
            Dashboard["Streamlit<br/>Dashboard"]
            Tableau["Tableau<br/>Embedded"]
            SendGrid["SendGrid<br/>Alerts"]
        end
    end

    CI_CD["GitHub Actions<br/>CI/CD"]

    %% Data Flow
    Sources -->|Raw Data| S3Bronze
    S3Bronze -->|Trigger| EventBridge
    EventBridge -->|Process| LambdaConsolidate
    LambdaConsolidate -->|Store| S3Silver

    S3Silver -->|Trigger| EventBridge
    EventBridge -->|Execute| LambdaInference
    LambdaInference -->|Invoke| EC2API
    EC2API -->|Predict| S3Gold

    EC2API -->|Detect Drift| MLflow
    MLflow -->|Trigger| SageMaker
    SageMaker -->|Update| Models
    EC2API -->|Load| Models

    S3Gold -->|Monitor| EventBridge
    EventBridge -->|Alert| LambdaEmail
    LambdaEmail -->|Send| SendGrid

    EventBridge -->|Sync| LambdaDashboard
    LambdaDashboard -->|Update| Dashboard
    Dashboard -->|Display| Tableau

    CI_CD -->|Deploy| EC2API
    CI_CD -->|Deploy| LambdaConsolidate
    CI_CD -->|Deploy| LambdaInference
    CI_CD -->|Deploy| LambdaEmail
    CI_CD -->|Deploy| LambdaDashboard
    CI_CD -->|Build| SageMaker

Event-driven data pipeline with serverless processing, ML inference, and real-time monitoring dashboards.

Low-Level Architecture

Detailed component interactions showing the complete MLOps workflow with training infrastructure.

flowchart TB

    Sources["🔬 Lab Instruments &<br/>Workflow Systems"]

    subgraph AWS["AWS Event-Driven Lakehouse MLOps Platform"]

        subgraph DataLake["📊 Data Lake"]
            Bronze["S3 Bronze<br/>Raw Data<br/>Ingestion"]
            Silver["S3 Silver<br/>Cleaned &<br/>Transformed"]
            Gold["S3 Gold<br/>Feature Store<br/>Ready for ML"]
            DashboardData["S3 Gold/<br/>dashboard_data"]
        end

        subgraph EventOrch["⚙️ Event Orchestration"]
            EventBridge["AWS EventBridge<br/>Rule Engine"]
        end

        subgraph ETL["🔄 ETL Lambdas"]
            ConsolidateLambda["Lambda:<br/>Consolidate Dataset<br/>Bronze→Silver"]
            GenDataLambda["Lambda:<br/>Generate Datasets"]
        end

        subgraph Inference["🎯 Inference Layer"]
            InferenceLambda["Lambda:<br/>Run Inference<br/>Trigger"]
            FlaskAPI["Flask API<br/>on EC2 Docker<br/>REST Endpoints"]
            Preprocessor["Model<br/>Preprocessor"]
            Model["LightGBM<br/>Final Model<br/>ROC-AUC 0.977 (holdout)"]
        end

        subgraph Training["🚀 Training Infrastructure"]
            DriftDetect["Drift Detection<br/>Module"]
            SageMaker["SageMaker<br/>Training Job<br/>Docker Container"]
            TrainScript["retraining.py<br/>Training Logic"]
        end

        subgraph MLGov["📋 ML Governance"]
            MLflow["MLflow<br/>Tracking Server"]
            RDS[("RDS SQL<br/>Tracking DB")]
            MLArtifacts["S3 ml/<br/>mlflow_artifacts"]
            Models["S3 ml/<br/>models &<br/>preprocessors"]
        end

        subgraph Notifications["📧 Alerts"]
            EmailLambda["Lambda:<br/>Send Risk Email"]
            SendGrid["SendGrid<br/>Email Service"]
        end

        subgraph Dashboard["📈 Visualization"]
            DashboardLambda["Lambda:<br/>Dashboard Sync"]
            ElasticBeanstalk["Elastic Beanstalk<br/>Host"]
            Streamlit["Streamlit<br/>Dashboard"]
            Tableau["Embedded Tableau<br/>Analytics"]
        end

        subgraph CICD["🔧 DevOps"]
            GitHub["GitHub Actions<br/>CI/CD"]
            ECR["ECR<br/>Repositories"]
        end
    end

    %% Data Ingestion Flow
    Sources -->|Raw Data| Bronze
    Bronze -->|Trigger Event| EventBridge
    EventBridge -->|Process| ConsolidateLambda
    ConsolidateLambda -->|Store| Silver

    %% Inference Flow
    Silver -->|Trigger Event| EventBridge
    EventBridge -->|Invoke| InferenceLambda
    InferenceLambda -->|Call API| FlaskAPI
    Models -->|Load| Preprocessor
    Models -->|Load| Model
    Preprocessor -->|Transform| FlaskAPI
    FlaskAPI -->|Predict| Model
    FlaskAPI -->|Store Predictions| Gold

    %% Drift & Retraining
    FlaskAPI -->|Monitor| DriftDetect
    DriftDetect -->|Detected| SageMaker
    SageMaker -->|Execute| TrainScript
    TrainScript -->|Log Metrics| MLflow
    MLflow -->|Track| RDS
    MLflow -->|Store| MLArtifacts
    TrainScript -->|Save| Models
    DriftDetect -->|Alert if Drift| SendGrid

    %% Alert Flow
    Gold -->|Trigger Event| EventBridge
    EventBridge -->|High Risk| EmailLambda
    EmailLambda -->|Send Alert| SendGrid

    %% Dashboard Flow
    Gold -->|Trigger Event| EventBridge
    EventBridge -->|Sync| DashboardLambda
    DashboardLambda -->|Update| DashboardData
    DashboardData -->|Load| Streamlit
    Streamlit -->|Display| Tableau
    ElasticBeanstalk -->|Host| Streamlit

    %% CI/CD
    GitHub -->|Build & Push| ECR
    GitHub -->|Deploy| ConsolidateLambda
    GitHub -->|Deploy| InferenceLambda
    GitHub -->|Deploy| EmailLambda
    GitHub -->|Deploy| DashboardLambda
    GitHub -->|Deploy| FlaskAPI
    GitHub -->|Deploy| ElasticBeanstalk
    ECR -->|Image| SageMaker

Key Components

Data Lake Layers (Medallion Architecture)

Bronze Layer: Raw data ingestion from lab instruments and workflow systems
Silver Layer: Cleaned, validated, and transformed data after quality checks
Gold Layer: Aggregated features and curated datasets ready for ML and analytics

Event Orchestration

AWS EventBridge: Decouples services and triggers workflows based on S3 events and custom rules
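
As a minimal sketch of this decoupling, a Lambda handler could inspect the S3 "Object Created" event that EventBridge delivers and decide which stage to run from the object key. The `bronze/`, `silver/`, and `gold/` prefixes, the bucket name, and the `route_s3_event` helper are assumptions made for illustration; the project's real bucket layout and rules are not documented here.

```python
from typing import Optional

# Hypothetical prefix-to-stage mapping; the real bucket layout may differ.
STAGES_BY_PREFIX = {
    "bronze/": ("consolidate_dataset",),
    "silver/": ("run_inference",),
    "gold/": ("send_email_alert", "dashboard_data"),
}


def route_s3_event(event: dict) -> Optional[dict]:
    """Return the stages to run for an S3 'Object Created' EventBridge event."""
    if event.get("source") != "aws.s3" or event.get("detail-type") != "Object Created":
        return None  # not an S3 object-creation event
    detail = event.get("detail", {})
    bucket = detail.get("bucket", {}).get("name")
    key = detail.get("object", {}).get("key", "")
    for prefix, stages in STAGES_BY_PREFIX.items():
        if key.startswith(prefix):
            return {"stages": stages, "bucket": bucket, "key": key}
    return None  # key is outside the known layers


# Trimmed-down example event with only the fields the router reads.
sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "example-lab-bucket"},
        "object": {"key": "bronze/workflow_logs/2026-03-16.csv"},
    },
}
routed = route_s3_event(sample_event)
```

In a deployed setup the same filtering would more likely live in the EventBridge rule's event pattern, so each Lambda only receives the events it cares about.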

Serverless Data Processing (Lambda Functions)

Consolidate Dataset: ETL pipeline aggregating raw data from Bronze to Silver
Generate Datasets: Triggers dataset generation and feature engineering on schedule
Run Inference: Orchestrates model predictions on new data by calling the Flask API
Send Risk Email: Sends high-risk experiment alerts via SendGrid
Dashboard Sync: Synchronizes Gold layer data to the dashboard data store (S3 Gold/dashboard_data)
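
The alerting step can be pictured as a simple filter over scored experiments. The sketch below is illustrative only: the 0.8 cut-off, the `experiment_id` and `risk_score` field names, and both helper functions are hypothetical and not taken from the project's `send_email_alert` code.

```python
HIGH_RISK_THRESHOLD = 0.8  # hypothetical cut-off, not taken from the project


def select_high_risk(predictions: list[dict], threshold: float = HIGH_RISK_THRESHOLD) -> list[dict]:
    """Keep predictions at or above the threshold, highest risk first."""
    flagged = [p for p in predictions if p["risk_score"] >= threshold]
    return sorted(flagged, key=lambda p: p["risk_score"], reverse=True)


def format_alert(flagged: list[dict]) -> str:
    """Build a plain-text email body listing the flagged experiments."""
    lines = [f"{len(flagged)} experiment(s) at high risk of delay:"]
    lines += [f"- {p['experiment_id']}: risk {p['risk_score']:.2f}" for p in flagged]
    return "\n".join(lines)


predictions = [
    {"experiment_id": "EXP-001", "risk_score": 0.35},
    {"experiment_id": "EXP-002", "risk_score": 0.92},
    {"experiment_id": "EXP-003", "risk_score": 0.81},
]
flagged = select_high_risk(predictions)
alert_body = format_alert(flagged)
```

The resulting `alert_body` is what would be handed to the email service; sending it through SendGrid is left out of the sketch.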

ML Inference Layer

Flask API (EC2 Docker): RESTful API for real-time risk predictions
Final Model (LightGBM): Selected production model (holdout ROC-AUC 0.977, PR-AUC 0.94)
Model Preprocessor: Standardized feature engineering and transformation

Model Training Infrastructure

SageMaker Training: Containerized training environment for model retraining
Training Script (retraining.py): Orchestrates model retraining with latest data
Drift Detection: Monitors data and model performance drift

ML Governance & Tracking

MLflow Tracking Server: Experiment tracking and artifact management
RDS SQL Server: Database for MLflow tracking metadata
S3 Artifact Storage: Stores models, preprocessors, and training artifacts

Notifications & Monitoring

SendGrid Email Service: Delivers alerts for high-risk experiments and drift notifications

Visualization & Analytics

Streamlit Dashboard: Interactive web application for monitoring and analytics
Embedded Tableau: Advanced analytics and business intelligence visualizations
Elastic Beanstalk: Managed hosting for dashboard applications

CI/CD & DevOps

GitHub Actions: Automated testing, building, and deployment workflows
ECR Repositories: Docker image registry for containerized services

CI/CD Pipeline Architecture

The project uses GitHub Actions for continuous integration and deployment; tests and deployments run automatically on every push to the master branch.

flowchart LR
    subgraph Source["🔀 Source Control"]
        GitHub["GitHub<br/>master"]
    end
    
    subgraph Routing["🔀 Path-Based Routing"]
        PathFilter["Route on<br/>Changed Files"]
    end
    
    subgraph Workflows["⚙️ Workflows"]
        DeployLambdas["Lambdas<br/>deploy_lambdas.yml"]
        DeployMLAPI["ML API<br/>deploy_ml_api.yml"]
        DeployWebsite["Dashboard<br/>deploy_website.yml"]
        DeploySageMaker["SageMaker<br/>deploy_sagemaker_training.yml"]
    end
    
    subgraph AWS["☁️ AWS Deployment"]
        Lambda["Lambda<br/>Functions"]
        EC2["EC2<br/>Docker"]
        Beanstalk["Elastic<br/>Beanstalk"]
        ECRDeploy["ECR<br/>Repository"]
    end

    subgraph Monitoring["📊 Monitoring"]
        Health["Health<br/>Checks"]
        Logs["CloudWatch<br/>Logs"]
    end
    
    GitHub -->|Push| PathFilter
    
    PathFilter -->|Lambda/**| DeployLambdas
    PathFilter -->|API/**| DeployMLAPI
    PathFilter -->|Interface/**| DeployWebsite
    PathFilter -->|Training/**| DeploySageMaker
    
    DeployLambdas -->|Update| Lambda
    DeployMLAPI -->|Deploy| EC2
    DeployWebsite -->|Deploy| Beanstalk
    DeploySageMaker -->|Push| ECRDeploy
    
    Lambda -->|Monitor| Health
    EC2 -->|Monitor| Health
    Beanstalk -->|Monitor| Health
    Health -->|Log| Logs

CI/CD Workflow Configuration

All deployment workflows trigger on a push to the master branch and use path filters, so each one runs only when its own component's code changes (delete_artifacts.yml is the exception and runs manually):

deploy_lambdas.yml - Deploys serverless Lambda functions
- Trigger: Push to 06-Deployment/Lambda_functions/
- Steps: Checkout → Configure AWS → Login ECR → Build Docker → Push to ECR → Deploy Lambdas
- Functions: consolidate_dataset, run_inference, send_email_alert, dashboard_data
deploy_ml_api.yml - Deploys Flask ML API to EC2
- Trigger: Push to 06-Deployment/Inference_API/
- Steps: Checkout → Configure AWS → Login ECR → Build Docker → Push to ECR → Update EC2 Container
- Endpoint: http://endpoint:5000 (inference and retraining)
deploy_website.yml - Deploys Streamlit dashboard to Elastic Beanstalk
- Trigger: Push to 06-Deployment/Experiment_interface/
- Steps: Checkout → Configure AWS → Install EB CLI → Deploy to Elastic Beanstalk
- Interface: Interactive web dashboard for experiment monitoring
deploy_sagemaker_training.yml - Builds and pushes SageMaker training image
- Trigger: Push to 06-Deployment/Sagemaker_Training_Image/
- Steps: Checkout → Configure AWS → Login ECR → Build Docker → Tag → Push to ECR
- Usage: SageMaker uses this image for automated model retraining
delete_artifacts.yml - Manual cleanup of build artifacts
- Trigger: Manual workflow dispatch
- Steps: Require confirmation → Delete all artifacts from GitHub Actions

Deployment Strategy

Automated on push: All deployments run automatically when code is pushed to master with relevant path changes
Path-based filtering: Only components with code changes are rebuilt and deployed
AWS credentials: All workflows use GitHub Secrets for AWS authentication
Docker-based: Services are containerized for consistency across environments
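
GitHub Actions applies this filtering natively through the `paths` filter on the push trigger. The Python sketch below only illustrates the resulting behaviour, using the trigger paths and workflow file names listed above; the `workflows_for` helper itself is hypothetical.

```python
# Trigger paths and workflow names as listed in the workflow configuration.
WORKFLOW_PATHS = {
    "06-Deployment/Lambda_functions/": "deploy_lambdas.yml",
    "06-Deployment/Inference_API/": "deploy_ml_api.yml",
    "06-Deployment/Experiment_interface/": "deploy_website.yml",
    "06-Deployment/Sagemaker_Training_Image/": "deploy_sagemaker_training.yml",
}


def workflows_for(changed_files: list[str]) -> list[str]:
    """Return the workflows a push would trigger, given the changed file paths."""
    triggered = {
        workflow
        for path in changed_files
        for prefix, workflow in WORKFLOW_PATHS.items()
        if path.startswith(prefix)
    }
    return sorted(triggered)


# A push touching the API and the README only redeploys the API.
api_only = workflows_for(["06-Deployment/Inference_API/app.py", "README.md"])

# A push touching two components triggers both of their workflows.
two_components = workflows_for([
    "06-Deployment/Lambda_functions/run_inference/handler.py",  # hypothetical file
    "06-Deployment/Sagemaker_Training_Image/retraining.py",
])
```

A push that changes only documentation or notebooks matches no prefix, so no deployment runs.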

Data Pipeline

Data Sources

Workflow Logs: Experiment execution details, timing, and status
Instrument Telemetry: Real-time instrument performance metrics
Reagent Logs: Reagent usage and availability tracking
Queue Logs: Laboratory queue management and wait times

Data Processing Flow

Ingestion: Raw data collected from lab systems into S3 Bronze layer
Validation & Cleaning: Data quality checks and basic transformations
Feature Engineering: Aggregation and feature creation for ML models
Storage: Processed data stored in optimized formats (Parquet)

Generated Dataset Features

Experiment metadata (type, priority, scientist, instrument)
Temporal features (start/end times, duration, delays)
Resource utilization metrics
Risk scores and predictions
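
To make the temporal features concrete, here is a small sketch of how start delay and duration could be derived from timestamps. The column names (`planned_start`, `actual_start`, `actual_end`) and the output feature names are assumptions for illustration; the generated dataset may use different ones.

```python
from datetime import datetime


def temporal_features(planned_start: str, actual_start: str, actual_end: str) -> dict:
    """Derive simple temporal features from ISO 8601 timestamps."""
    planned = datetime.fromisoformat(planned_start)
    started = datetime.fromisoformat(actual_start)
    ended = datetime.fromisoformat(actual_end)
    start_delay_min = (started - planned).total_seconds() / 60
    return {
        "start_hour": started.hour,
        "day_of_week": started.weekday(),  # Monday == 0
        "start_delay_min": start_delay_min,
        "duration_min": (ended - started).total_seconds() / 60,
        "started_late": start_delay_min > 0,
    }


features = temporal_features(
    planned_start="2026-03-16T09:00:00",
    actual_start="2026-03-16T09:25:00",
    actual_end="2026-03-16T11:40:00",
)
```

For this example the experiment started 25 minutes late and ran for 135 minutes.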

Machine Learning Pipeline

Problem Statement

Predict the operational risk score for experiments, indicating likelihood of delay or failure.

Model Development

Exploratory Data Analysis

Univariate and multivariate analysis
Correlation analysis and feature importance
Time-series analysis of telemetry data
Queue congestion pattern identification

Model Selection

LightGBM for primary risk prediction (final selection)
XGBoost as an alternative gradient-boosting model
Ensemble methods for improved accuracy

Business Perspective — Why LightGBM

From a business and operational point of view, we selected LightGBM as the production model because it best satisfies Roche's objectives of minimizing operational cost while remaining deployable and maintainable:

Cost-aware decisioning: LightGBM yielded the lowest expected operational misclassification cost at the optimized decision threshold, which directly translates to fewer missed risky experiments and lower overall operational losses.
High recall for risk detection: The model delivers strong recall on the risky class (reducing missed alerts), which aligns with the business priority of proactively catching at-risk experiments.
Stable, well-calibrated probabilities: Acceptable calibration and stable probability estimates make thresholds reliable for operational workflows and escalation playbooks.
Production efficiency & lower infra cost: Faster training and inference with lower memory footprint reduces compute and hosting costs (important for SageMaker jobs, EC2 containers, and Lambda-based orchestration).
Operational flexibility: The model adapts well to cost-sensitive threshold tuning, enabling product owners to change FP/FN trade-offs without retraining.
Explainability & governance: LightGBM integrates well with SHAP and MLflow for explainability and audit trails, supporting regulatory and stakeholder review.

Together, these business-aligned attributes (cost minimization, high recall on risky experiments, operational efficiency, and explainability) make LightGBM the preferred production choice for Roche's experiment risk pipeline.
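
Cost-sensitive threshold tuning can be sketched as a search over candidate thresholds for the one with the lowest total misclassification cost. Everything in the example is illustrative: the labels, probabilities, and cost values are made up, and the function names are hypothetical rather than taken from the project's notebooks.

```python
def expected_cost(y_true, y_prob, threshold, cost_fp, cost_fn):
    """Total cost of false positives and false negatives at a given threshold."""
    cost = 0.0
    for label, prob in zip(y_true, y_prob):
        predicted_risky = prob >= threshold
        if predicted_risky and label == 0:
            cost += cost_fp  # false alarm
        elif not predicted_risky and label == 1:
            cost += cost_fn  # missed risky experiment
    return cost


def best_threshold(y_true, y_prob, cost_fp, cost_fn):
    """Lowest-cost threshold among 0.01, 0.02, ..., 0.99 (first one wins ties)."""
    candidates = [i / 100 for i in range(1, 100)]
    return min(candidates, key=lambda t: expected_cost(y_true, y_prob, t, cost_fp, cost_fn))


# Toy data: 1 = experiment was delayed, 0 = on time.
y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.10, 0.35, 0.40, 0.55, 0.70, 0.90]

# Missing a risky experiment costs five times more than a false alarm.
recall_oriented = best_threshold(y_true, y_prob, cost_fp=1.0, cost_fn=5.0)

# Reversing the costs pushes the threshold up without retraining the model.
precision_oriented = best_threshold(y_true, y_prob, cost_fp=5.0, cost_fn=1.0)
```

When missed risky experiments are the expensive error, the chosen threshold drops so that the 0.40 case is still flagged; when false alarms are the expensive error, it rises above 0.55. Only the threshold changes, which is why the FP/FN trade-off can be adjusted without retraining.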

Feature Engineering

Temporal aggregations
Categorical encoding
Interaction features
Time-series derived metrics

Model Performance with optimized threshold (test set)

Metric      XGBoost   LightGBM
ROC-AUC     0.981     0.980
Precision   0.811     0.795
Recall      0.908     0.919
F1-Score    0.856     0.852

Model Monitoring

Drift Detection: Statistical tests for data drift
Performance Monitoring: Continuous evaluation metrics
Automated Retraining: Trigger-based model updates via SageMaker Training Image
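
This README does not say which statistical test the drift module uses. As one common option, the sketch below computes the Population Stability Index (PSI) between a reference sample and a current sample of a single feature. The choice of PSI, the bin count, and the `psi` function are assumptions for illustration and not a description of `detect_drift.py`.

```python
import math


def psi(reference: list[float], current: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index using equal-width bins over the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into edge bins
            counts[idx] += 1
        # eps keeps empty bins from causing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


reference = [i / 100 for i in range(100)]        # roughly uniform on [0, 1)
shifted = [0.5 + i / 200 for i in range(100)]    # squeezed into [0.5, 1)

no_drift = psi(reference, reference)
drift = psi(reference, shifted)
```

A PSI of zero means the two samples fall into the bins identically. A commonly cited rule of thumb treats values above roughly 0.2 as a significant shift, though any retraining trigger would need tuning against the project's own data.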

Deliverables

The 07-Deliverables folder contains:

Roche_G1_ML_Final_Model.ipynb: Complete final model implementation with:
- LightGBM model trained and optimized on full dataset (final production model)
- Feature importance analysis and visualization
- Model evaluation metrics and performance analysis
- Predictions with interpreted risk scores
- Documentation of model decisions and trade-offs

Deployment

Infrastructure as Code

AWS Services: Lambda, EC2, S3, RDS, EventBridge, Elastic Beanstalk, SageMaker
Containerization: Docker for API, dashboard, and training services
CI/CD: GitHub Actions for automated deployment

Production Components

1. Data Processing (Lambda Functions)

consolidate_dataset: ETL pipeline to consolidate and transform raw data from Bronze to Silver layer
generate_datasets: Triggers dataset generation and feature engineering workflows
run_inference: Orchestrates model inference on processed data, stores predictions in Gold layer
send_email_alert: Sends alerts via SendGrid for high-risk experiments
dashboard_data: Synchronizes processed data for visualization in dashboards

2. ML API (Flask on EC2)

RESTful API container deployed on EC2 providing:

Real-time risk predictions via /process endpoint
Model retraining triggers via /retraining endpoint with drift detection
Integration with MLflow for model versioning and governance
Automated drift detection and retraining orchestration

3. Dashboard Application

Streamlit-based web interface for:

Real-time experiment risk visualization
Historical trend analysis
Model performance monitoring
Alert management and investigation

4. Model Training Infrastructure

SageMaker Training Image provides:

Dockerized training environment for model retraining
Integration with processed datasets from S3
Artifact storage in MLflow and S3
Automated hyperparameter optimization

Scalability Considerations

Serverless architecture for automatic scaling
Event-driven processing for efficient resource utilization
Multi-layer caching for improved performance
Containerized services for easy horizontal scaling

Installation

Prerequisites

Python 3.12 or higher
AWS CLI configured with appropriate credentials
Docker and Docker Compose (for local deployment)
Git

Local Development Setup

Clone the repository:

git clone <repository-url>
cd "Capstone Project"

Create and activate virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Configure environment variables (if needed):

cp .env.example .env
# Edit .env with your AWS and SendGrid configuration

Data Generation Setup

Run the data generation scripts in sequence from 03-Data directory:

cd 03-Data
python 01_generate_workflow_logs.py
python 02_generate_instrument_telemetry.py
python 03_generate_reagent_logs.py
python 04_generate_queue_logs.py
python 05_dataset_generator.py

This generates synthetic lab data simulating real-world experiment workflows for model training and testing.
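
For a sense of what such a generator does, here is a minimal seeded sketch that produces synthetic workflow log records. The field names, value ranges, and status weights are hypothetical and are not those used by `01_generate_workflow_logs.py`.

```python
import random

STATUSES = ["completed", "delayed", "failed"]  # hypothetical status values


def generate_workflow_logs(n: int, seed: int = 42) -> list[dict]:
    """Generate n synthetic workflow log records, reproducible for a given seed."""
    rng = random.Random(seed)  # seeded so reruns produce identical data
    logs = []
    for i in range(n):
        planned = rng.randint(30, 240)               # planned duration in minutes
        overrun = max(0, round(rng.gauss(10, 20)))   # non-negative overrun in minutes
        logs.append({
            "experiment_id": f"EXP-{i:05d}",
            "priority": rng.choice(["low", "medium", "high"]),
            "planned_duration_min": planned,
            "actual_duration_min": planned + overrun,
            "status": rng.choices(STATUSES, weights=[0.8, 0.15, 0.05])[0],
        })
    return logs


logs = generate_workflow_logs(5)
```

Seeding the generator keeps the synthetic dataset reproducible, which matters when model metrics are compared across runs.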

Usage

Reviewing the Final Model

For the complete final model implementation and results:

Open 07-Deliverables/Roche_G1_ML_Final_Model.ipynb in Jupyter

Running EDA Analyses

Explore data patterns across different data sources:

cd 04-EDA
jupyter notebook EDA_workflow.ipynb        # Workflow log analysis
jupyter notebook EDA_telemetry.ipynb       # Instrument telemetry analysis
jupyter notebook EDA_reagent.ipynb         # Reagent usage patterns
jupyter notebook EDA_queue.ipynb           # Queue congestion analysis
jupyter notebook EDA_processed_file.ipynb  # Final processed feature analysis

Model Experimentation and Training

Review model development process:

cd 05-Experiment
jupyter notebook "ML Final Model.ipynb"   # Final production model
jupyter notebook "ML Model I.ipynb"       # Initial model iteration
jupyter notebook "ML Model II.ipynb"      # Improved model iteration
jupyter notebook "ML Model III.ipynb"     # Alternative model approaches

Starting the Inference API (Production)

Start the ML API locally for real-time predictions (in production the same app runs as a Docker container on EC2):

cd 06-Deployment/Inference_API
python app.py

The API will start on http://localhost:5000.

Running the Dashboard Application

Start the interactive monitoring dashboard:

cd 06-Deployment/Experiment_interface
streamlit run app.py

The dashboard will open in your default browser.

API Endpoints

POST /process: Submit experiment data and get risk prediction
- Input: Experiment features (workflow, telemetry, reagent, queue data)
- Output: Risk score and prediction confidence
POST /retraining: Triggers model retraining if drift is detected
- Input: Current dataset for drift evaluation
- Output: Retraining status and updated model metrics
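
A client call to the prediction endpoint might look like the sketch below, which builds (but does not send) a POST request with the standard library. The JSON field names in the payload are hypothetical, since the request schema is not documented here, and `build_process_request` is an illustrative helper.

```python
import json
import urllib.request


def build_process_request(base_url: str, features: dict) -> urllib.request.Request:
    """Build a JSON POST request for the /process endpoint."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/process",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical feature payload; the real API may expect different fields.
payload = {
    "experiment_id": "EXP-00042",
    "priority": "high",
    "queue_wait_min": 35,
    "instrument_error_count": 2,
}
request = build_process_request("http://localhost:5000", payload)

# To actually send it against a running API:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

The same helper pattern would apply to /retraining by changing the path and payload.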

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Development Guidelines

Follow PEP 8 style guidelines
Add tests for new features and Lambda functions
Update documentation for API changes and Lambda modifications
Ensure all tests pass before submitting PR
For SageMaker training changes, test locally with Docker first
Update README with any new deployments or configurations

Testing

# Run unit tests
pytest tests/

# Test Docker containers locally
docker build -t experiment-api:latest 06-Deployment/Inference_API/
docker run -p 5000:5000 experiment-api:latest

Recent Updates (March 2026)

✅ Finalized ML model with LightGBM (holdout ROC-AUC: 0.977, PR-AUC: 0.94, Recall: 0.911)
✅ Completed comprehensive EDA across all data sources (workflow, telemetry, reagent, queue)
✅ Implemented SageMaker Training Image for automated model retraining
✅ Enhanced Lambda functions for complete ETL pipeline
✅ Added detailed feature importance analysis and interpretation
✅ Project documentation and deliverables finalized

Known Issues and Future Enhancements

Model retraining currently requires a manual trigger via the /retraining API endpoint (future: fully automated via SageMaker schedules)
All data is synthetically generated (future: integrate with actual lab systems)
Dashboard currently supports single Tableau instance (future: multi-tenant support)

License

This project is licensed under the MIT License - see the LICENSE file for details.

This capstone project demonstrates the application of MLOps principles to solve real-world laboratory management challenges through predictive analytics and automated monitoring systems.
