**Title:** HPLC/GC Performance Data Analytics for Quality Control Intelligence

**Author:** Christopher Edozie Sunday     
**Tools Used:** Excel, SQL (SQLite), Python, Jupyter Notebook & Tableau    
**Domain:** Analytical Chemistry / Data Analytics  
**Date:** 21st December 2025

---
#### Business Task Statement:
Analytical laboratories rely on HPLC and GC instruments to generate data that directly impacts product quality, regulatory compliance, and operational efficiency. However, chromatographic QC data often remains under-utilized, fragmented across spreadsheets, databases, and instrument software.
This project demonstrates how laboratory QC data can be transformed into actionable quality intelligence, enabling early detection of instrument drift, calibration instability, and process anomalies before they compromise results.

#### Primary Business Goal:
Enable data-driven quality decisions by converting chromatographic QC data into clear, traceable, and regulator-aligned performance insights.

#### Scope of Work:
This project delivers a complete, reproducible analytics pipeline that:

- Simulates realistic chromatographic QC datasets
- Structures data into a normalized relational database
- Computes regulatory-aligned QC metrics
- Applies statistical quality control (SQC) and trend analytics
- Visualizes results in interactive dashboards for decision-makers

#### Project Overview
This project demonstrates how HPLC/GC quality control data can be transformed into actionable insights using:

- Excel → Data simulation, initial data profiling, and validation
- SQLite + SQL (DBeaver) → Data modeling, normalization, querying
- Python (Pandas, NumPy, Scikit-Learn) → QC metrics, SQC, trend analysis
- Tableau → Interactive dashboards for QC monitoring and reporting

#### Project Objectives

- Clean and structure chromatographic QC datasets
- Design a 3NF relational schema suitable for chromatographic QC data
- Implement foreign keys and indexing, and demonstrate SQL joins
- Compute accuracy, precision, calibration, and system suitability metrics
- Detect instrument drift, anomalies, and out-of-control conditions early
- Provide a reproducible, end-to-end analytics workflow in JupyterLab, including the SQL DDL and queries
- Produce narrative-driven Jupyter Notebooks explaining analytical intent and methodology
- Create scientific visualizations in Python to support QC interpretation and decision-making
- Build interactive Tableau dashboards for exploratory analysis
- Generate a KPI-driven report in Power BI for management and quality review
- Enable audit-ready, traceable data analysis
- Bridge analytical chemistry and data analytics

This report is divided into sections corresponding to the phases of the data analysis life cycle: Ask, Prepare, Process, Analyze, Share, and Act.

#### ASK PHASE — Defining the Problem

**Problem Statement**

Laboratory QC data is often:
- Locked inside vendor software
- Reviewed manually and retrospectively
- Poorly structured for trend analysis

This limits early detection of:
- Calibration drift
- Instrument instability
- Method performance degradation

**Key Business Questions Addressed**

- Are HPLC/GC instruments operating consistently over time?
- Are calibration models stable and linear?
- Can early warning signals detect drift before failure?
- How can QC data be summarized clearly for decision-makers?

#### PREPARE PHASE — Data Sources & Rationale
**Data Source Description:**

Because real chromatographic QC data is often proprietary, this project uses a simulated but analytically realistic dataset designed to reflect:

- HPLC & GC retention behavior
- Peak area variability
- Calibration responses
- System suitability metrics

**Raw Dataset:**
hplc_gc_qc_data_raw.xlsx

**Key Fields:**

- Sample_ID
- Instrument_ID
- RetentionTime_min
- Peak_Area
- PeakWidth_min
- Concentration_mgL
- TrueValue_mgL
- Run_Date

This approach mimics real-world analytical behavior while ensuring reproducibility and data privacy.
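The dataset itself was simulated in Excel. As a rough Python equivalent, the sketch below generates a table with the same key fields. The instrument IDs, QC levels, noise levels, and the slow upward drift in peak area are all illustrative assumptions, not the parameters behind `hplc_gc_qc_data_raw.xlsx`.

```python
import numpy as np
import pandas as pd

def simulate_qc_data(n_runs_per_instrument=30, seed=42):
    """Simulate a small HPLC/GC QC dataset with the Key Fields listed above.

    All distribution parameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical instrument IDs mapped to a nominal retention time (min)
    instruments = {"HPLC-01": 5.2, "HPLC-02": 5.4, "GC-01": 8.1}
    dates = pd.date_range("2025-01-01", periods=n_runs_per_instrument, freq="D")
    rows = []
    for inst, rt_nominal in instruments.items():
        for i, run_date in enumerate(dates):
            true_value = rng.choice([10.0, 25.0, 50.0])   # assumed QC levels (mg/L)
            drift = 1 + 0.001 * i                         # assumed 0.1 % response drift per run
            conc = true_value * rng.normal(1.0, 0.02)     # ~2 % measurement noise
            rows.append({
                "Sample_ID": f"{inst}-S{i + 1:03d}",
                "Instrument_ID": inst,
                "RetentionTime_min": rt_nominal + rng.normal(0, 0.02),
                # assumed response of ~1000 area units per mg/L plus baseline noise
                "Peak_Area": conc * 1000 * drift + rng.normal(0, 50),
                "PeakWidth_min": abs(rng.normal(0.15, 0.01)),
                "Concentration_mgL": conc,
                "TrueValue_mgL": true_value,
                "Run_Date": run_date,
            })
    return pd.DataFrame(rows)

df = simulate_qc_data()
print(df.head())
```

Fixing the random seed keeps the simulated dataset identical between runs, which supports the reproducibility goal.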

#### PROCESS PHASE — Cleaning & Structuring

**Data Cleaning & Validation (Excel)**

**Why This Matters:**

QC statistics are only meaningful if the underlying data is valid.

**Key Checks Performed:**

- Missing-value detection
- Outlier screening (not blind removal)
- Date standardization for time-series analysis
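These checks were carried out in Excel. For a scripted version, a minimal pandas sketch might look like the following; it flags problems rather than deleting rows. The column names come from the Key Fields list, while the flag column names and the |z| > 3 threshold are assumptions.

```python
import pandas as pd

def validate_qc_data(df, z_threshold=3.0):
    """Run the three checks above and return flags; no rows are dropped."""
    out = df.copy()
    # Date standardization: parse to datetime; unparseable entries become NaT
    out["Run_Date"] = pd.to_datetime(out["Run_Date"], errors="coerce")
    # Missing-value detection (NaT from the step above counts as missing)
    missing_counts = out.isna().sum()
    out["has_missing"] = out.isna().any(axis=1)
    # Outlier screening: flag, but keep, peak areas far from their instrument's mean
    grp = out.groupby("Instrument_ID")["Peak_Area"]
    z = (out["Peak_Area"] - grp.transform("mean")) / grp.transform("std")
    out["area_outlier_flag"] = z.abs() > z_threshold
    return out, missing_counts
```

One caveat: with the sample standard deviation, a single point's |z| cannot exceed (n − 1)/√n, so a |z| > 3 rule cannot fire in groups of fewer than 11 results.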

**Cleaned Output:**

hplc_gc_qc_data_cleaned.xlsx

This file serves as the single source of truth for all downstream analysis.

**SQL Data Modeling & Normalization**

A fully normalized SQLite schema (3NF) was designed to emulate real laboratory data infrastructure.

**Core Tables:**
- instruments
- samples
- sample_metrics
- calibrations
- system_suitability
- control_summary

**Why This Matters:**
- Eliminates redundancy
- Enables traceability
- Supports many-to-one relationships
- Supports robust and scalable QC analytics
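The table list does not spell out columns, so the following is a hypothetical minimal layout for three of the core tables, built with Python's standard `sqlite3` module. It illustrates foreign keys, an index, and a many-to-one join; every column name here is an assumption, and the project's real schema (including `calibrations`, `system_suitability`, and `control_summary`) may differ.

```python
import sqlite3

# Hypothetical columns for three of the core tables
DDL = """
CREATE TABLE instruments (
    instrument_id  TEXT PRIMARY KEY,
    technique      TEXT NOT NULL CHECK (technique IN ('HPLC', 'GC'))
);
CREATE TABLE samples (
    sample_id      TEXT PRIMARY KEY,
    instrument_id  TEXT NOT NULL REFERENCES instruments(instrument_id),
    run_date       TEXT NOT NULL,            -- ISO-8601 date string
    true_value_mgL REAL NOT NULL
);
CREATE TABLE sample_metrics (
    sample_id          TEXT PRIMARY KEY REFERENCES samples(sample_id),
    retention_time_min REAL,
    peak_area          REAL,
    peak_width_min     REAL,
    concentration_mgL  REAL
);
-- Index supporting per-instrument trend queries ordered by date
CREATE INDEX idx_samples_instrument_date ON samples(instrument_id, run_date);
"""

def build_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(DDL)
    return conn

conn = build_db()
conn.execute("INSERT INTO instruments VALUES ('HPLC-01', 'HPLC')")
conn.execute("INSERT INTO samples VALUES ('S001', 'HPLC-01', '2025-01-01', 25.0)")
conn.execute("INSERT INTO sample_metrics VALUES ('S001', 5.21, 25150.0, 0.15, 25.15)")

# Many-to-one join: each metrics row resolves to one sample and one instrument
query = """
SELECT i.instrument_id, s.run_date,
       100.0 * m.concentration_mgL / s.true_value_mgL AS recovery_pct
FROM sample_metrics AS m
JOIN samples     AS s ON s.sample_id = m.sample_id
JOIN instruments AS i ON i.instrument_id = s.instrument_id
"""
print(conn.execute(query).fetchall())
```

With `PRAGMA foreign_keys = ON`, inserting a sample that points at a non-existent instrument raises `sqlite3.IntegrityError`, which is what keeps the data traceable.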

#### ANALYZE PHASE — Statistical & QC Analytics
**Python-Driven QC Analytics**

Using Pandas, NumPy, and Scikit-Learn, the project computes the following:

**(a) Sample-Level Metrics**
- Error (mg/L, %)
- Percent recovery
- Response factor
- Z-score outlier detection
- %RSD (precision)
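The sample-level metrics above reduce to a few column operations. A minimal pandas sketch follows, assuming the Key Fields column names. The response factor is taken as `Peak_Area / Concentration_mgL`, one common convention rather than a confirmed project definition, and the output column names are hypothetical.

```python
import pandas as pd

def sample_metrics(df):
    """Row-level accuracy metrics plus %RSD per instrument and QC level."""
    out = df.copy()
    out["Error_mgL"] = out["Concentration_mgL"] - out["TrueValue_mgL"]
    out["Error_pct"] = 100 * out["Error_mgL"] / out["TrueValue_mgL"]
    out["Recovery_pct"] = 100 * out["Concentration_mgL"] / out["TrueValue_mgL"]
    out["Response_Factor"] = out["Peak_Area"] / out["Concentration_mgL"]
    # z-score of peak area within each instrument (sample std, ddof=1)
    grp = out.groupby("Instrument_ID")["Peak_Area"]
    out["Area_z"] = (out["Peak_Area"] - grp.transform("mean")) / grp.transform("std")
    # %RSD (precision): sample std / mean * 100 per instrument and QC level
    rsd = (
        out.groupby(["Instrument_ID", "TrueValue_mgL"])["Concentration_mgL"]
        .agg(lambda s: 100 * s.std() / s.mean())
        .rename("RSD_pct")
    )
    return out, rsd
```

Grouping %RSD by QC level as well as instrument keeps precision at low and high concentrations from being pooled into one figure.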

**(b) Calibration Analytics**
- Slope, intercept, R²
- Response factor stability
- Linearity assessment
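A sketch of the calibration statistics, using `numpy.polyfit` for an ordinary least-squares line (scikit-learn's `LinearRegression` would give the same fit). The function name and the use of response-factor %RSD as the stability measure are illustrative choices, not necessarily the project's exact method.

```python
import numpy as np

def calibration_stats(conc, area):
    """Fit area = slope * conc + intercept; report R^2 and response-factor %RSD."""
    conc = np.asarray(conc, dtype=float)
    area = np.asarray(area, dtype=float)
    slope, intercept = np.polyfit(conc, area, deg=1)
    predicted = slope * conc + intercept
    ss_res = np.sum((area - predicted) ** 2)
    ss_tot = np.sum((area - area.mean()) ** 2)
    r_squared = 1 - ss_res / ss_tot
    rf = area / conc                                  # response factor per level
    rf_rsd = 100 * rf.std(ddof=1) / rf.mean()         # stability across levels
    return {"slope": slope, "intercept": intercept,
            "r_squared": r_squared, "rf_rsd_pct": rf_rsd}

# Five-level calibration with illustrative values (not project data)
print(calibration_stats([5, 10, 25, 50, 100], [5020, 9950, 25100, 49800, 100300]))
```

A line with a non-zero intercept can have R² = 1 while the response factor (slope + intercept/conc) still changes with concentration, which is one reason to track RF alongside R².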

**(c) Statistical Process Control**
- Shewhart limits
- EWMA charts
- CUSUM charts
- Rolling mean, standard deviation, and coefficient of variation (CV)
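The three control-chart techniques can be sketched in NumPy as below, assuming an in-control mean μ₀ and standard deviation σ estimated from a baseline period. The EWMA uses the exact time-varying limits, and the CUSUM is the standard tabular form; λ = 0.2, L = 3, k = 0.5, and h = 5 are common textbook defaults, not values confirmed for this project.

```python
import numpy as np

def shewhart_limits(baseline):
    """Center line and +/-3 sigma limits from an in-control baseline.

    Sigma is the baseline sample std here; a moving-range estimate is a
    common alternative for individuals charts.
    """
    mu = float(np.mean(baseline))
    sigma = float(np.std(baseline, ddof=1))
    return mu - 3 * sigma, mu, mu + 3 * sigma

def ewma(x, mu0, sigma, lam=0.2, L=3.0):
    """EWMA z_t = lam * x_t + (1 - lam) * z_{t-1}, z_0 = mu0, with exact limits."""
    x = np.asarray(x, dtype=float)
    z = np.empty_like(x)
    prev = mu0
    for i, xi in enumerate(x):
        prev = lam * xi + (1 - lam) * prev
        z[i] = prev
    t = np.arange(1, len(x) + 1)
    half_width = L * sigma * np.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
    return z, np.abs(z - mu0) > half_width

def cusum(x, mu0, sigma, k=0.5, h=5.0):
    """Tabular CUSUM: reference value K = k*sigma, decision interval H = h*sigma."""
    x = np.asarray(x, dtype=float)
    c_hi, c_lo = np.zeros_like(x), np.zeros_like(x)
    hi = lo = 0.0
    for i, xi in enumerate(x):
        hi = max(0.0, xi - (mu0 + k * sigma) + hi)   # accumulates upward deviations
        lo = max(0.0, (mu0 - k * sigma) - xi + lo)   # accumulates downward deviations
        c_hi[i], c_lo[i] = hi, lo
    return c_hi, c_lo, (c_hi > h * sigma) | (c_lo > h * sigma)

# Illustrative series: 10 in-control points at 100, then a sustained +1.5 sigma shift
x = np.array([100.0] * 10 + [101.5] * 20)
lcl, cl, ucl = shewhart_limits([99.0, 100.0, 101.0])  # mu = 100, sigma = 1
z, ewma_signal = ewma(x, mu0=cl, sigma=1.0)
c_hi, c_lo, cusum_signal = cusum(x, mu0=cl, sigma=1.0)
print("Shewhart points out:", int(((x > ucl) | (x < lcl)).sum()))
print("First EWMA signal at index:", int(np.argmax(ewma_signal)))
print("First CUSUM signal at index:", int(np.argmax(cusum_signal)))
```

On this constructed series the ±3σ Shewhart check flags nothing, while EWMA and CUSUM both signal within six points of the shift. This illustrates why the accumulating statistics are better at catching small, sustained drifts.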

**(d) System Suitability**
- Plate count
- Resolution
- Tailing factor
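The system suitability parameters follow the standard USP <621> expressions. The functions below are a plain-Python sketch. They assume `PeakWidth_min` is a tangent-baseline width; that is an assumption, since the dataset description does not say how width was measured. If widths are taken at half height, the 5.54 form of the plate count applies instead.

```python
def plate_count(t_r, width, method="tangent"):
    """Theoretical plates N (USP <621>).

    tangent:     N = 16 * (t_R / W)^2,    W = tangent-baseline peak width
    half_height: N = 5.54 * (t_R / W_h)^2, W_h = peak width at half height
    """
    if method == "tangent":
        return 16.0 * (t_r / width) ** 2
    if method == "half_height":
        return 5.54 * (t_r / width) ** 2
    raise ValueError("method must be 'tangent' or 'half_height'")

def resolution(t_r1, t_r2, w1, w2):
    """Resolution Rs = 2 (t_R2 - t_R1) / (W1 + W2), tangent-baseline widths."""
    return 2.0 * (t_r2 - t_r1) / (w1 + w2)

def tailing_factor(w_005, f):
    """Tailing factor T = W_0.05 / (2 f).

    W_0.05 is the peak width at 5 % of peak height; f is the distance from the
    leading edge to the perpendicular through the apex, at that same height.
    """
    return w_005 / (2.0 * f)

# Illustrative inputs in minutes (not project data)
print(plate_count(5.0, 0.20))
print(resolution(5.0, 5.6, 0.20, 0.20))
print(tailing_factor(0.12, 0.05))
```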

These analyses align conceptually with USP <621>, USP <1225>, and ICH Q2(R2) expectations.

**Key Analysis Categories:**
- Calibration Trend & Stability
- QC & Anomaly Detection
- Method Performance (Accuracy & Precision)
- Instrument & System Suitability

#### SHARE PHASE — Visualization & Communication
**Tableau Dashboards**

**Dashboard 1 — Calibration Performance Overview**
- Parity plots
- Accuracy heatmaps
- Response factor trends
- R² linearity indicators

**Dashboard 2 — Instrument Health Monitoring**
- Peak area control charts
- EWMA & CUSUM charts
- Rolling statistics

**Dashboard 3 — Method Performance**
- Precision distributions
- Accuracy (%Recovery)
- Outlier detection maps
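An accuracy heatmap needs one value per instrument and period. One way to prepare that input is a long-format table of mean % recovery per instrument and calendar month, a layout Tableau handles well. The sketch assumes `Recovery_pct` has already been computed and `Run_Date` is a datetime column; the function and output column names are hypothetical.

```python
import pandas as pd

def accuracy_heatmap_table(df):
    """Mean % recovery and result count per instrument per month (long format)."""
    month = df["Run_Date"].dt.to_period("M").astype(str)   # e.g. "2025-01"
    return (
        df.assign(Run_Month=month)
        .groupby(["Instrument_ID", "Run_Month"])
        .agg(Mean_Recovery_pct=("Recovery_pct", "mean"),
             N_Results=("Recovery_pct", "count"))
        .reset_index()
    )
```

The result could be written out with `to_csv(..., index=False)` and laid out in Tableau with `Instrument_ID` on rows, `Run_Month` on columns, and `Mean_Recovery_pct` on color. Keeping `N_Results` alongside helps reviewers discount cells built from very few runs.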

**Audience Considerations**
| Audience        | Needs                          |
| --------------- | ------------------------------ |
| QC Managers     | Stability & compliance signals |
| Lab Supervisors | Instrument health indicators   |
| Data Teams      | Reproducibility & structure    |
| Recruiters      | Tool proficiency & clarity     |

#### ACT PHASE — Insights & Recommendations
**Key Findings**
- Calibration models remain linear but show early response factor drift
- EWMA and CUSUM detect subtle instability before control limit breaches
- Precision varies slightly by instrument, highlighting maintenance sensitivity
- System suitability trends flag potential column or injector degradation

**Actionable Recommendations**
- Adopt EWMA & CUSUM for routine QC review
- Monitor response factor trends, not just R²
- Standardize QC dashboards for monthly review
- Use rolling CV as an early warning KPI
- Centralize QC computations in Python for auditability
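As an illustration of the rolling-CV recommendation, the sketch below computes a per-instrument rolling %CV of peak area and flags windows above a limit. The window length and the 2 % limit are placeholder assumptions; in practice both would be set from the method's own precision data.

```python
import pandas as pd

def rolling_cv_alert(df, window=10, cv_limit_pct=2.0):
    """Rolling %CV of Peak_Area per instrument, flagged above cv_limit_pct."""
    out = df.sort_values(["Instrument_ID", "Run_Date"]).copy()
    grp = out.groupby("Instrument_ID")["Peak_Area"]
    # Rolling statistics are computed separately within each instrument's run sequence
    roll_mean = grp.transform(lambda s: s.rolling(window, min_periods=window).mean())
    roll_std = grp.transform(lambda s: s.rolling(window, min_periods=window).std())
    out["Rolling_CV_pct"] = 100 * roll_std / roll_mean
    # Incomplete windows give NaN, which compares as False (no alert)
    out["CV_Alert"] = out["Rolling_CV_pct"] > cv_limit_pct
    return out
```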

**Expected Impact**
- Earlier detection of analytical drift
- Reduced risk of invalid results
- Improved regulatory defensibility
- Faster QC decision-making
- A scalable foundation for more mature QC analytics

**Deliverables**
| Deliverable          | Description                     |
| -------------------- | ------------------------------- |
| Cleaned Dataset      | Excel-validated QC data         |
| SQLite Database      | Fully normalized QC schema      |
| Python Notebooks     | Reproducible analytics pipeline |
| Tableau Dashboards   | Interactive QC monitoring       |
| GitHub Documentation | End-to-end project narrative    |

**Schedule Overview**
| Phase                      | Duration |
| -------------------------- | -------- |
| Data Simulation & Cleaning | Week 1   |
| SQL Modeling & Seeding     | Week 2   |
| Python Analytics           | Week 3   |
| Visualization & Reporting  | Week 4   |

Estimated Completion: 4 weeks (part-time, revision-inclusive)

**Final Takeaway**

This project demonstrates how laboratory QC data can evolve from static records into proactive quality intelligence using accessible analytics tools. It bridges analytical chemistry and data analytics, showcasing a skill set directly relevant to modern, data-driven scientific and industrial environments.