Raghul-M/MLOps_Learning_Notes

MLOps Mastery - Learning Notes

A structured learning path toward MLOps mastery, covering tools, platforms, and core concepts across the full ML lifecycle.


Concepts

Foundations

  • ML lifecycle stages (data, training, evaluation, deployment, monitoring)
  • Technical debt in ML systems
  • Reproducibility and experiment tracking
  • ML system design patterns
  • Offline vs online evaluation
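
To make the last bullet concrete, here is a minimal stdlib sketch of offline vs online evaluation: offline evaluation scores a frozen, labeled holdout set once, while online evaluation updates a running metric as labeled feedback arrives from production. All names (`offline_accuracy`, `OnlineAccuracy`, the toy model) are illustrative, not from any library.

```python
def offline_accuracy(model, holdout):
    """Score predictions against a frozen, labeled holdout set."""
    correct = sum(1 for x, y in holdout if model(x) == y)
    return correct / len(holdout)

class OnlineAccuracy:
    """Incrementally track accuracy from live feedback events."""
    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, prediction, ground_truth):
        self.total += 1
        self.correct += int(prediction == ground_truth)

    @property
    def value(self):
        return self.correct / self.total if self.total else 0.0

# Toy model: predicts 1 when the input is non-negative.
model = lambda x: int(x >= 0)
holdout = [(-2, 0), (-1, 0), (0, 1), (3, 1)]
print(offline_accuracy(model, holdout))  # 1.0
```

The two numbers often diverge in practice: the holdout score is fixed at training time, while the online score reflects whatever distribution production traffic actually has.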

Data Engineering for ML

  • Data versioning and lineage
  • Feature engineering pipelines
  • Feature stores (online vs offline)
  • Data validation and schema enforcement
  • Handling data drift and skew
  • ETL/ELT for ML workloads
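
The data validation and schema enforcement bullet can be sketched in plain Python (no Great Expectations; the schema format and field names below are made up for illustration):

```python
# Illustrative schema: field name -> required Python type.
SCHEMA = {
    "user_id": int,
    "age": int,
    "country": str,
}

def validate_record(record, schema):
    """Return a list of violations for one record; empty means valid."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}, "
                          f"got {type(record[field]).__name__}")
    return errors

good = {"user_id": 1, "age": 34, "country": "DE"}
bad = {"user_id": "1", "age": 34}
print(validate_record(good, SCHEMA))  # []
print(validate_record(bad, SCHEMA))   # one type violation, one missing field
```

Tools like Great Expectations generalize this idea with declarative expectation suites, profiling, and reporting, but the core contract (reject or quarantine records that break the schema before they reach training) is the same.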

Model Training & Experimentation

  • Experiment tracking and metadata management
  • Hyperparameter optimization strategies
  • Distributed training patterns
  • GPU/TPU resource management
  • Model registry and artifact management
  • Transfer learning and fine-tuning workflows
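
The experiment tracking and metadata bullets boil down to one idea: every run records its parameters, metrics, and identity so results stay reproducible and comparable. A minimal stdlib sketch (real tools like MLflow or W&B add storage backends, UIs, and artifact handling on top; the `Run` class here is illustrative):

```python
import json
import time
import uuid

class Run:
    """Record one training run's parameters and metric history."""
    def __init__(self, experiment):
        self.data = {
            "run_id": uuid.uuid4().hex,
            "experiment": experiment,
            "started_at": time.time(),
            "params": {},
            "metrics": {},
        }

    def log_param(self, key, value):
        self.data["params"][key] = value

    def log_metric(self, key, value):
        # Append, so the full metric history per run is preserved.
        self.data["metrics"].setdefault(key, []).append(value)

    def to_json(self):
        return json.dumps(self.data)

run = Run("baseline-vs-tuned")
run.log_param("learning_rate", 0.01)
run.log_metric("val_loss", 0.42)
run.log_metric("val_loss", 0.31)
print(run.to_json())
```

A model registry extends the same metadata idea to promoted artifacts: each registered model version points back to the run that produced it.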

Model Serving & Deployment

  • Batch vs real-time inference
  • Model serialization formats (ONNX, TorchScript, SavedModel)
  • Blue/green and canary deployments for models
  • Shadow deployments and A/B testing
  • Model compression and quantization
  • Edge deployment considerations
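
The canary deployment bullet can be sketched as deterministic traffic splitting: a stable hash of the request key sends a fixed fraction of traffic to the candidate model, so each user consistently sees the same version. The routing function and fraction below are illustrative, not from any serving framework:

```python
import hashlib

def route(request_id, canary_fraction=0.1):
    """Return 'canary' for ~canary_fraction of keys, else 'stable'."""
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = digest[0] / 255.0  # deterministic value in [0, 1]
    return "canary" if bucket < canary_fraction else "stable"

counts = {"stable": 0, "canary": 0}
for i in range(1000):
    counts[route(f"user-{i}")] += 1
print(counts)  # roughly 10% canary
```

Shadow deployments differ only in what happens with the result: the candidate model receives a copy of live traffic but its predictions are logged for comparison rather than returned to users.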

CI/CD for ML

  • Continuous training pipelines
  • ML-specific testing (data tests, model tests, integration tests)
  • Automated retraining triggers
  • Pipeline orchestration patterns
  • Infrastructure as Code for ML
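
An automated retraining trigger is typically just a policy over monitored signals; a minimal sketch, with illustrative thresholds and metric names:

```python
def should_retrain(drift_score, live_accuracy,
                   drift_threshold=0.2, accuracy_floor=0.85):
    """Trigger retraining on data drift or performance degradation."""
    if drift_score > drift_threshold:
        return True, "data drift exceeded threshold"
    if live_accuracy < accuracy_floor:
        return True, "accuracy dropped below floor"
    return False, "metrics within bounds"

print(should_retrain(drift_score=0.05, live_accuracy=0.91))  # (False, ...)
print(should_retrain(drift_score=0.31, live_accuracy=0.91))  # (True, ...)
```

In a continuous training pipeline, an orchestrator evaluates this check on a schedule or on new-data events and, when it fires, kicks off the training DAG rather than retraining on a fixed calendar.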

Monitoring & Observability

  • Data drift detection
  • Model performance degradation tracking
  • Prediction logging and auditing
  • Alerting strategies for ML systems
  • Feedback loops and ground truth collection
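
One standard drift metric behind the first bullet is the Population Stability Index (PSI), computed over binned feature distributions. A common rule of thumb: PSI below 0.1 is stable, 0.1-0.25 is moderate shift, above 0.25 is significant drift. The histograms below are toy data:

```python
import math

def psi(expected, actual, eps=1e-6):
    """PSI between two binned distributions (lists of proportions)."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) on empty bins
        total += (a - e) * math.log(a / e)
    return total

baseline = [0.25, 0.25, 0.25, 0.25]   # training-time feature histogram
current = [0.10, 0.20, 0.30, 0.40]    # production histogram
print(round(psi(baseline, current), 4))  # ~0.23: moderate-to-significant shift
```

Libraries like Evidently compute this (and related metrics such as KL divergence and KS statistics) per feature and surface the results in reports and dashboards.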

Governance & Reliability

  • Model explainability and interpretability
  • Fairness and bias auditing
  • Regulatory compliance (GDPR, model cards)
  • Cost optimization for ML infrastructure
  • Disaster recovery for ML systems

Tools & Platforms

Experiment Tracking & Model Registry

  • MLflow
  • Weights & Biases (W&B)
  • Neptune.ai
  • CometML

Data & Feature Management

  • DVC (Data Version Control)
  • Feast (Feature Store)
  • Great Expectations (Data Validation)
  • Delta Lake / Lakehouse formats
  • Apache Spark for ML

Pipeline Orchestration

  • Apache Airflow
  • Kubeflow Pipelines
  • Prefect
  • Dagster
  • ZenML
  • Metaflow
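
The core idea shared by all of these orchestrators is the same: tasks form a DAG, and each task runs only after its dependencies finish. A minimal stdlib sketch using `graphlib` (Python 3.9+); the task names and toy pipeline are illustrative:

```python
from graphlib import TopologicalSorter

def run_pipeline(tasks, deps):
    """Execute task callables in dependency order; return that order."""
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        tasks[name]()
    return order

results = {}
tasks = {
    "extract": lambda: results.setdefault("raw", [3, 1, 2]),
    "transform": lambda: results.setdefault("clean", sorted(results["raw"])),
    "train": lambda: results.setdefault("model", sum(results["clean"])),
}
deps = {"transform": {"extract"}, "train": {"transform"}}
print(run_pipeline(tasks, deps))  # ['extract', 'transform', 'train']
```

What the real tools add on top of this skeleton is scheduling, retries, parallelism, caching, and observability; choosing between them is largely about which of those you need.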

Model Serving & Inference

  • TensorFlow Serving
  • TorchServe
  • Triton Inference Server
  • BentoML
  • Seldon Core
  • Ray Serve
  • vLLM (LLM serving)

Containerization & Infrastructure

  • Docker
  • Kubernetes
  • Helm charts for ML workloads
  • Terraform / Pulumi
  • KServe

CI/CD & Automation

  • GitHub Actions for ML
  • GitLab CI/CD for ML
  • CML (Continuous Machine Learning)
  • DVC Pipelines

Monitoring & Observability

  • Evidently AI
  • Prometheus + Grafana
  • Arize AI
  • WhyLabs

Cloud ML Platforms

  • AWS SageMaker
  • Google Vertex AI
  • Azure Machine Learning
  • Databricks MLflow

LLMOps (Large Language Models)

  • LangChain / LangSmith
  • LlamaIndex
  • Prompt engineering and management
  • RAG (Retrieval-Augmented Generation) pipelines
  • Fine-tuning LLMs (LoRA, QLoRA)
  • LLM evaluation frameworks (RAGAS, DeepEval)
  • Guardrails and content filtering
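
The retrieval step of a RAG pipeline can be sketched end to end in the stdlib: embed the query and documents (here with a toy bag-of-words vector standing in for a real embedding model), rank by cosine similarity, and prepend the best match to the prompt. Everything here is illustrative; production systems use learned embeddings and a vector store:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "MLflow tracks experiments and registers models",
    "Feature stores serve online and offline features",
    "Canary deployments shift a fraction of traffic",
]
question = "how do I track experiments"
context = retrieve(question, docs)[0]
prompt = f"Context: {context}\n\nQuestion: {question}"
print(context)
```

Frameworks like LangChain and LlamaIndex wrap exactly this flow (embed, retrieve, assemble prompt, call the LLM), and evaluation frameworks like RAGAS score how faithful the generated answer is to the retrieved context.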

Progress

Area                              Status
Foundations                       Not Started
Data Engineering for ML           Not Started
Model Training & Experimentation  Not Started
Model Serving & Deployment        Not Started
CI/CD for ML                      Not Started
Monitoring & Observability        Not Started
Governance & Reliability          Not Started
Tools & Platforms                 Not Started
LLMOps                            Not Started
