ML Testing Framework

Enterprise-grade machine learning testing suite built on Enterprise Patterns

Python 3.8+ | pytest | License: MIT

🎯 Overview

A comprehensive ML testing framework designed specifically for cryptocurrency trading systems. Built with Enterprise Patterns for production-ready model validation, data quality assurance, and performance benchmarking.

🚀 Key Features

  • 🔍 Model Validation: Comprehensive testing of ML models (classification, regression, time series)
  • 📊 Data Quality Testing: Automated detection of data issues, drift, and anomalies
  • ⚡ Performance Benchmarking: Latency, throughput, memory usage, and scalability testing
  • 📈 Drift Detection: Statistical tests for data and concept drift detection
  • 🎨 Advanced Visualization: Interactive dashboards and publication-ready reports
  • 🔧 Test Automation: Comprehensive test harness with parallel execution
  • 💰 Crypto-Specific: Specialized features for cryptocurrency trading scenarios

📁 Project Structure


ml-testing/
├── src/
│   ├── model_testing/           # Model validation and testing
│   │   ├── model_validator.py   # Core model validation
│   │   ├── performance_tester.py # Performance testing
│   │   └── regression_tester.py  # Regression testing
│   ├── data_testing/           # Data quality and validation
│   │   ├── data_quality.py     # Data quality validator
│   │   ├── feature_validator.py # Feature engineering validation
│   │   └── drift_detector.py   # Data drift detection
│   ├── benchmarking/          # Performance benchmarking
│   │   ├── benchmark_runner.py # Benchmark execution
│   │   └── metrics_calculator.py # Metrics calculation
│   ├── integration/           # Integration components
│   │   ├── test_harness.py    # Main test orchestrator
│   │   └── pytest_fixtures.py # pytest fixtures
│   ├── utils/                 # Utilities
│   │   ├── test_data_generator.py # Test data generation
│   │   ├── visualization.py   # Visualization utilities
│   │   └── index.ts          # TypeScript exports
│   └── __init__.py
├── tests/
│   └── test_framework.py      # Comprehensive test suite
├── package.json              # Node.js configuration
├── pyproject.toml           # Python configuration
└── README.md               # This file

🛠️ Installation

Prerequisites

  • Python 3.8+
  • Node.js 16+ (for TypeScript integration)
  • Poetry or pip for dependency management

Python Dependencies

# Using pip
pip install -r requirements.txt

# Or using poetry
poetry install

Node.js Dependencies

npm install
# or
yarn install

Key Dependencies

  • ML Libraries: scikit-learn, pandas, numpy
  • Deep Learning: torch, tensorflow (optional)
  • Statistical Testing: scipy, statsmodels
  • Visualization: matplotlib, seaborn, plotly
  • Async/Performance: asyncio, psutil
  • Testing: pytest, pytest-asyncio
  • Experiment Tracking: mlflow (experiment tracking and model management)

🚀 Quick Start

1. Basic Model Validation

from src.model_testing.model_validator import create_crypto_trading_validator
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

# Create validator
validator = create_crypto_trading_validator()

# Your model and data
model = RandomForestClassifier()
X = pd.DataFrame(...)  # Your features
y = pd.Series(...)     # Your targets

# Train model
model.fit(X, y)

# Validate
result = validator.validate_model(model, X, y)
print(f"Model is valid: {result.is_valid}")
print(f"Validation score: {result.validation_score:.3f}")

2. Data Quality Testing

from src.data_testing.data_quality import create_crypto_data_validator

# Create data validator
data_validator = create_crypto_data_validator()

# Validate your data
quality_result = data_validator.validate_data(your_dataframe)

print(f"Overall quality score: {quality_result.overall_quality_score:.3f}")
print(f"Issues found: {len(quality_result.quality_issues)}")

3. Performance Benchmarking

from src.benchmarking.benchmark_runner import create_crypto_benchmark_runner

# Create benchmark runner
benchmark_runner = create_crypto_benchmark_runner()

# Run benchmark
benchmark_result = benchmark_runner.run_benchmark(model, test_data)

print(f"Average latency: {benchmark_result.latency_metrics['mean_ms']:.2f}ms")
print(f"Throughput: {benchmark_result.throughput_samples_per_sec:.0f} samples/sec")

4. Complete Testing Suite

import asyncio
from src.integration.test_harness import create_crypto_trading_harness

async def run_complete_tests():
    # Create test harness
    harness = create_crypto_trading_harness(
        environment="staging",
        risk_tolerance="conservative"
    )

    # Run complete test suite
    results = await harness.run_test_suite(
        model=your_model,
        data=your_data,
        target=your_target
    )

    print(f"Tests completed: {results.passed_count}/{len(results.test_results)}")
    return results

# Run tests
results = asyncio.run(run_complete_tests())

🔬 Advanced Features

Data Drift Detection

from src.data_testing.drift_detector import create_crypto_drift_detector

drift_detector = create_crypto_drift_detector()

# Compare reference data with current data
drift_result = drift_detector.detect_drift(
    reference_data=historical_data,
    current_data=new_data
)

if drift_result.drift_detected:
    print(f"⚠️ Data drift detected! Score: {drift_result.drift_score:.3f}")
    print(f"Features with drift: {list(drift_result.feature_drifts.keys())}")

Crypto Market Data Generation

from src.utils.test_data_generator import create_crypto_trading_data_generator
from src.utils.test_data_generator import MarketCondition

# Create data generator
generator = create_crypto_trading_data_generator()

# Generate crypto price data for different market conditions
bull_market_data = generator.generate_crypto_price_data(
    market_condition=MarketCondition.BULL_MARKET,
    symbol="BTCUSDT"
)

volatile_data = generator.generate_crypto_price_data(
    market_condition=MarketCondition.HIGH_VOLATILITY,
    symbol="ETHUSDT"
)
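
Synthetic price paths like these are commonly generated with geometric Brownian motion, mapping each market condition to a drift/volatility pair. A minimal stand-alone sketch follows; the drift and volatility values are illustrative assumptions, not the generator's actual parameters.

import numpy as np
import pandas as pd

def generate_gbm_prices(n_steps: int = 1000, start_price: float = 30_000.0,
                        drift: float = 0.0005, volatility: float = 0.02,
                        seed: int = 42) -> pd.Series:
    """Simulate a close-price series via geometric Brownian motion.

    A bull market corresponds to a positive drift; a high-volatility
    regime corresponds to a larger volatility parameter.
    """
    rng = np.random.default_rng(seed)
    log_returns = rng.normal(loc=drift, scale=volatility, size=n_steps)
    return pd.Series(start_price * np.exp(np.cumsum(log_returns)), name="close")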

Interactive Visualization

from src.utils.visualization import create_crypto_trading_visualizer

# Create visualizer
visualizer = create_crypto_trading_visualizer(
    interactive=True,
    enterprise_theme=True
)

# Create performance dashboard
dashboard_path = visualizer.create_ml_testing_dashboard(
    test_results=your_test_results,
    title="Crypto Trading ML Dashboard"
)

print(f"Dashboard saved to: {dashboard_path}")

📊 Enterprise Patterns

This framework implements Enterprise Patterns for production ML systems:

1. Governance & Compliance

  • Automated quality gates and compliance checks
  • Audit logging for all testing activities
  • Enterprise-grade reporting and documentation

2. Observability & Monitoring

  • Real-time performance monitoring
  • Distributed tracing integration
  • Comprehensive metrics collection

3. Reliability & Resilience

  • Circuit breaker patterns for external dependencies (see the sketch below)
  • Graceful degradation and error recovery
  • Automated rollback capabilities
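
To make the circuit-breaker idea concrete, here is a generic sketch (not this framework's implementation): after a configured number of consecutive failures the breaker opens and short-circuits calls until a cooldown elapses.

import time

class CircuitBreaker:
    """Minimal circuit breaker: opens after max_failures, retries after cooldown."""

    def __init__(self, max_failures: int = 3, cooldown_seconds: float = 30.0):
        self.max_failures = max_failures
        self.cooldown_seconds = cooldown_seconds
        self.failure_count = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_seconds:
                raise RuntimeError("Circuit open: call skipped")
            self.opened_at = None  # cooldown elapsed, allow a trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            if self.failure_count >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failure_count = 0  # any success resets the breaker
        return result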

4. Security & Risk Management

  • Input validation and sanitization
  • Secure credential management
  • Risk assessment and mitigation

5. Performance & Scalability

  • Parallel test execution
  • Resource optimization
  • Auto-scaling capabilities

🧪 Testing

Run All Tests

# Run comprehensive test suite
python -m pytest tests/test_framework.py -v

# Run with coverage
python -m pytest tests/ --cov=src --cov-report=html

# Run specific test categories
python -m pytest tests/test_framework.py::TestMLFrameworkIntegration::test_end_to_end_classification_workflow -v

Test Categories

  • Integration Tests: End-to-end workflow testing
  • Unit Tests: Individual component testing
  • Performance Tests: Benchmark and latency testing
  • Edge Case Tests: Error handling and recovery
  • Crypto-Specific Tests: Trading scenario validation
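
If these categories are exposed as pytest markers (an assumption; check the repository's pytest configuration), a category-specific test might look like this and be selected with python -m pytest -m performance:

import pytest

@pytest.mark.performance  # hypothetical marker; register it in pyproject.toml
def test_latency_meets_sla(benchmark_runner, model, test_data):
    # benchmark_runner, model, test_data are hypothetical fixture names
    result = benchmark_runner.run_benchmark(model, test_data)
    assert result.latency_metrics["mean_ms"] < 100  # SLA from the config section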

📈 Performance Metrics

The framework provides comprehensive metrics for ML model evaluation:

Classification Metrics

  • Accuracy, Precision, Recall, F1-Score
  • ROC AUC, PR AUC
  • Confusion Matrix Analysis
  • Class-specific metrics
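
All of these can be computed directly with scikit-learn; a minimal sketch for a binary classifier (illustrative, not the framework's metrics module):

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

def classification_metrics(y_true, y_pred, y_proba):
    """y_proba holds predicted probabilities of the positive class."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_proba),
        "confusion_matrix": confusion_matrix(y_true, y_pred).tolist(),
    }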

Regression Metrics

  • MSE, MAE, RMSE, R²
  • Mean Absolute Percentage Error (MAPE)
  • Explained Variance Score
  • Custom crypto trading metrics
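
The regression metrics follow their standard scikit-learn definitions; a corresponding sketch (mean_absolute_percentage_error requires scikit-learn >= 0.24):

import numpy as np
from sklearn.metrics import (mean_squared_error, mean_absolute_error, r2_score,
                             explained_variance_score, mean_absolute_percentage_error)

def regression_metrics(y_true, y_pred):
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        "mae": mean_absolute_error(y_true, y_pred),
        "r2": r2_score(y_true, y_pred),
        "mape": mean_absolute_percentage_error(y_true, y_pred),
        "explained_variance": explained_variance_score(y_true, y_pred),
    }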

Performance Metrics

  • Prediction Latency (mean, p95, p99)
  • Throughput (samples per second)
  • Memory Usage (peak, average)
  • CPU Utilization
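
Percentile latencies of this kind can be measured with a simple timing loop; an illustrative sketch (not the benchmark runner's implementation):

import time
import numpy as np

def measure_latency(model, X, n_runs: int = 100) -> dict:
    """Time repeated predict() calls and report latencies in milliseconds."""
    model.predict(X[:1])  # warm-up call, excluded from the measurements
    timings_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        model.predict(X)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": float(np.mean(timings_ms)),
        "p95_ms": float(np.percentile(timings_ms, 95)),
        "p99_ms": float(np.percentile(timings_ms, 99)),
    }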

Business Metrics (Crypto Trading)

  • Profit Factor
  • Sharpe Ratio
  • Maximum Drawdown
  • Win Rate
  • Risk-Adjusted Returns
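
These trading metrics follow their standard definitions; for example (illustrative sketches over a series of per-period returns, assuming 365 trading days for 24/7 crypto markets and a zero risk-free rate):

import numpy as np

def sharpe_ratio(returns, periods_per_year: int = 365) -> float:
    """Annualized Sharpe ratio of a per-period return series."""
    returns = np.asarray(returns)
    return float(np.sqrt(periods_per_year) * returns.mean() / returns.std(ddof=1))

def max_drawdown(equity_curve) -> float:
    """Largest peak-to-trough decline of an equity curve, as a fraction."""
    equity = np.asarray(equity_curve)
    running_peak = np.maximum.accumulate(equity)
    return float(np.max((running_peak - equity) / running_peak))

def win_rate(returns) -> float:
    """Fraction of periods with a positive return."""
    return float((np.asarray(returns) > 0).mean())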

🔧 Configuration

Environment Variables

# MLflow Configuration
MLFLOW_TRACKING_URI=http://localhost:5000
MLFLOW_EXPERIMENT_NAME=crypto_ml_testing

# Testing Configuration
ML_TESTING_PARALLEL=true
ML_TESTING_WORKERS=4
ML_TESTING_TIMEOUT=3600

# Crypto Trading Configuration
TRADING_ENVIRONMENT=staging
RISK_TOLERANCE=conservative
MARKET_DATA_SOURCE=binance
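
These variables are read at startup in the usual way; for example, wiring up the MLflow settings with the standard mlflow API (a sketch, using the documented defaults as fallbacks):

import os
import mlflow

mlflow.set_tracking_uri(os.environ.get("MLFLOW_TRACKING_URI", "http://localhost:5000"))
mlflow.set_experiment(os.environ.get("MLFLOW_EXPERIMENT_NAME", "crypto_ml_testing"))

parallel = os.environ.get("ML_TESTING_PARALLEL", "false").lower() == "true"
workers = int(os.environ.get("ML_TESTING_WORKERS", "4"))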

Configuration File

# config.py
ML_TESTING_CONFIG = {
    "model_validation": {
        "strict_mode": True,
        "performance_threshold": 0.8,
        "drift_threshold": 0.1
    },
    "data_quality": {
        "missing_threshold": 0.05,
        "outlier_threshold": 0.02,
        "quality_score_min": 0.7
    },
    "benchmarking": {
        "latency_sla_ms": 100,
        "throughput_min": 1000,
        "memory_limit_mb": 512
    }
}
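
How the framework consumes this dictionary is not shown here; as a plain-Python illustration, a quality gate might compare a validation result from the Quick Start against the configured threshold:

from config import ML_TESTING_CONFIG

def enforce_validation_gate(result):
    """Fail fast when a validation result misses the configured threshold."""
    threshold = ML_TESTING_CONFIG["model_validation"]["performance_threshold"]
    if result.validation_score < threshold:
        raise AssertionError(
            f"Validation score {result.validation_score:.3f} is below "
            f"the configured threshold {threshold}"
        )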

📚 API Reference

Core Classes

ModelValidator

class ModelValidator:
    def validate_model(self, model, data, target) -> ValidationResult: ...
    def check_determinism(self, model, data) -> bool: ...
    def detect_overfitting(self, model, train_data, val_data) -> OverfittingResult: ...

DataQualityValidator

class DataQualityValidator:
    def validate_data(self, data, target=None) -> QualityResult: ...
    def check_missing_values(self, data) -> MissingValueResult: ...
    def detect_outliers(self, data) -> OutlierResult: ...

BenchmarkRunner

class BenchmarkRunner:
    def run_benchmark(self, model, data) -> BenchmarkResult: ...
    def test_latency(self, model, data) -> LatencyMetrics: ...
    def test_throughput(self, model, data) -> ThroughputMetrics: ...

Factory Functions

All components can be created using factory functions with crypto trading defaults:

# Model Testing
create_crypto_trading_validator()
create_crypto_performance_tester()
create_crypto_regression_tester()

# Data Testing
create_crypto_data_validator()
create_crypto_feature_validator()
create_crypto_drift_detector()

# Benchmarking
create_crypto_benchmark_runner()
create_crypto_metrics_calculator()

# Integration
create_crypto_trading_harness()
create_crypto_trading_data_generator()
create_crypto_trading_visualizer()

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Setup

# Clone the repository
git clone <repository-url>
cd ml-testing

# Install development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Run tests
python -m pytest tests/ -v

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Enterprise Patterns for ML architecture guidance
  • scikit-learn community for ML testing best practices
  • Crypto trading community for domain-specific requirements
  • MLflow for experiment tracking integration

💬 Support

For questions and support, please open an issue on GitHub.
