## MLOps
MLOps (Machine Learning Operations) is a set of practices that combines Machine Learning, DevOps, and Data Engineering to deploy, monitor, and manage ML models in production reliably and efficiently.

### 🔑 What is MLOps?  
MLOps stands for Machine Learning Operations. It aims to:

- Automate and streamline the ML lifecycle.
- Ensure reproducibility, scalability, and monitoring of ML models.
- Bridge the gap between data science and IT operations teams.

### ML Lifecycle Stages
MLOps supports the full ML lifecycle:

| Stage                | Description                                |
| -------------------- | ------------------------------------------ |
| **Data Collection**  | Collecting and storing raw data            |
| **Data Processing**  | Cleaning, transforming, and preparing data |
| **Model Training**   | Selecting algorithms and tuning parameters |
| **Model Evaluation** | Validating model performance               |
| **Model Deployment** | Serving the model in production            |
| **Monitoring**       | Observing performance, drift, and failures |
| **Retraining**       | Updating the model with new data           |
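
As a rough illustration of how these stages chain together, the sketch below wires toy stand-ins for the first stages into one function. Every name here (`collect`, `process`, `train`, `evaluate`, `run_pipeline`) and the threshold "model" are hypothetical and exist only for this example; a real pipeline would evaluate on held-out data rather than on the training rows.

```python
from statistics import mean

def collect():
    # Data Collection: hypothetical raw records (feature, label); None marks a missing value.
    return [(1.0, 0), (2.0, 0), (None, 1), (8.0, 1), (9.0, 1), (1.5, 0)]

def process(rows):
    # Data Processing: drop rows with a missing feature.
    return [(x, y) for x, y in rows if x is not None]

def train(rows):
    # Model Training: a toy "model" that is just a threshold halfway
    # between the mean feature value of each class.
    mean0 = mean(x for x, y in rows if y == 0)
    mean1 = mean(x for x, y in rows if y == 1)
    return (mean0 + mean1) / 2

def predict(threshold, x):
    return 1 if x > threshold else 0

def evaluate(threshold, rows):
    # Model Evaluation: accuracy on the given rows (training rows here, for brevity).
    return mean(1.0 if predict(threshold, x) == y else 0.0 for x, y in rows)

def run_pipeline(min_accuracy=0.9):
    rows = process(collect())
    model = train(rows)
    accuracy = evaluate(model, rows)
    # Deployment gate: only mark the model deployable if it clears the bar.
    return {"model": model, "accuracy": accuracy, "deploy": accuracy >= min_accuracy}

result = run_pipeline()
print(result)
```

Monitoring and retraining would then amount to re-running `run_pipeline` when new data arrives or when production metrics degrade.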

### ⚙️ Core Practices in MLOps
1. Versioning
   - Data Versioning (e.g., DVC)
   - Model Versioning (e.g., MLflow, Weights & Biases)
   - Code Versioning (e.g., Git)

2. Continuous Integration / Continuous Deployment (CI/CD)
   - Test and validate ML code and pipelines automatically
   - Tools: GitHub Actions, Jenkins, GitLab CI, Azure Pipelines

3. Experiment Tracking
   - Log metrics, parameters, and model artifacts
   - Tools: MLflow, Weights & Biases, Neptune.ai

4. Model Registry
   - Store and manage models ready for production
   - Tools: MLflow Model Registry, SageMaker Model Registry

5. Model Deployment
   - Batch, real-time (REST API), or edge deployment
   - Tools: Docker, Kubernetes, Seldon, BentoML, SageMaker

6. Monitoring and Logging
   - Detect data drift, concept drift, performance degradation
   - Tools: Evidently, Prometheus, Grafana, Sentry

7. Data and Model Governance
   - Ensuring compliance, auditability, and security
   - Access controls, logging, lineage tracking

### Popular MLOps tools
| Category               | Tools                                              |
| ---------------------- | -------------------------------------------------- |
| Workflow Orchestration | Kubeflow, Airflow, Metaflow                        |
| Experiment Tracking    | MLflow, Weights & Biases, Neptune.ai               |
| Deployment             | Docker, Kubernetes, Seldon, BentoML                |
| Monitoring             | Prometheus, Grafana, Evidently, Arize AI           |
| CI/CD                  | GitHub Actions, Jenkins, GitLab CI                 |
| Model Registry         | MLflow Model Registry, SageMaker Model Registry    |
| Feature Store          | Tecton                                             |

### MLOps architecture (Typical)
                        ┌───────────────────┐
                        │   Data Sources    │
                        └────────┬──────────┘
                                 ▼
                        ┌───────────────────┐
                        │ Data Engineering  │ ← Data pipelines (ETL, feature store)
                        └────────┬──────────┘
                                 ▼
                    ┌─────────────────────────┐
                    │ Training Environment    │ ← Model Dev, Experiments
                    └────────┬────────────────┘
                             ▼
                    ┌─────────────────────────┐
                    │ Model Registry          │ ← Versioning & Approvals
                    └────────┬────────────────┘
                             ▼
                    ┌─────────────────────────┐
                    │ Deployment (API, Batch) │ ← Serve models
                    └────────┬────────────────┘
                             ▼
                    ┌─────────────────────────┐
                    │ Monitoring & Logging    │ ← Alerts, Drift Detection
                    └─────────────────────────┘

![image.png](attachment:image.png)
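
The "Versioning & Approvals" step of the registry can be sketched as a small class. This is a toy under stated assumptions, not how MLflow or SageMaker implement their registries: `ModelRegistry`, its methods, the model name `churn`, the artifact file names, and the metric values are all hypothetical.

```python
class ModelRegistry:
    """Toy registry: numbered versions per model, one production version (hypothetical)."""

    def __init__(self):
        self._versions = {}    # model name -> list of version entries
        self._production = {}  # model name -> version number in production

    def register(self, name, artifact, metrics):
        entries = self._versions.setdefault(name, [])
        entry = {
            "version": len(entries) + 1,
            "artifact": artifact,
            "metrics": metrics,
            "approved": False,
        }
        entries.append(entry)
        return entry["version"]

    def approve(self, name, version):
        self._get(name, version)["approved"] = True

    def promote(self, name, version):
        # Approval gate: unapproved versions cannot reach production.
        if not self._get(name, version)["approved"]:
            raise PermissionError("version must be approved before promotion")
        self._production[name] = version

    def production(self, name):
        return self._production.get(name)

    def _get(self, name, version):
        return self._versions[name][version - 1]

registry = ModelRegistry()
v1 = registry.register("churn", "model-v1.pkl", {"auc": 0.81})
v2 = registry.register("churn", "model-v2.pkl", {"auc": 0.84})
registry.approve("churn", v2)
registry.promote("churn", v2)
print(registry.production("churn"))
```

The deployment stage would then ask the registry which version is in production instead of hard-coding a file path.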

### Skills Needed for MLOps (in general)
- ML/DS skills (modeling, evaluation)
- DevOps skills (CI/CD, containers, cloud)
- Data engineering (ETL, databases, feature engineering)
- Software engineering (APIs, testing, version control)
- Monitoring & automation

### 📈 Why MLOps Matters  
Without MLOps:

- Models are hard to reproduce.
- Deployment is manual and error-prone.
- Models in production become stale or unreliable.
- Teams struggle to scale ML efforts.

With MLOps:

- The path from experimentation to deployment is faster.
- Teams collaborate better.
- Models are observable, maintainable, and scalable.

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

![image-3.png](attachment:image-3.png)

![image-4.png](attachment:image-4.png)

![image-5.png](attachment:image-5.png)

![image-6.png](attachment:image-6.png)

![image-7.png](attachment:image-7.png)


![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

![image-3.png](attachment:image-3.png)

![image-4.png](attachment:image-4.png)

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)



### Development and deployment environments may differ

![image.png](attachment:image.png)

Deploying a model can be challenging because the development and production runtime environments often differ. Containers address this problem: a container is like a box that holds a program together with all the tools, libraries, and settings it needs to run.
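
To see what can differ between two environments, the sketch below records a few details of the current runtime and compares two such records. A container image effectively pins these details so they are identical wherever the model runs. The function names `environment_snapshot` and `diff_environments`, the package names, and the "production" values are hypothetical and chosen only for this example.

```python
import platform
from importlib import metadata

def environment_snapshot(packages=()):
    """Collect details that commonly differ between development and production."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "os": platform.system(),
        "machine": platform.machine(),
        "packages": versions,
    }

def diff_environments(dev, prod):
    """Return {key: (dev_value, prod_value)} for every key whose values differ."""
    keys = dev.keys() | prod.keys()
    return {k: (dev.get(k), prod.get(k)) for k in keys if dev.get(k) != prod.get(k)}

dev = environment_snapshot(["pip", "surely-not-installed-pkg"])
# Pretend production runs a different interpreter version (made-up value).
prod = dict(dev, python="0.0.0")
print(diff_environments(dev, prod))
```

A mismatch reported here (a different Python version or a missing package) is exactly the kind of problem that shipping the model inside a container is meant to prevent.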

![image-2.png](attachment:image-2.png)



### Machine learning deployment architecture
![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

![image-3.png](attachment:image-3.png)

![image-4.png](attachment:image-4.png)

### What is an API?
API stands for Application Programming Interface.

It is a set of rules and tools that allows one piece of software to communicate with another — like a contract between systems.

### 🔹 Real-Life Analogy:
Think of an API like a restaurant menu:

The menu (API) tells you what dishes (functions) are available.

You don’t need to know how the kitchen (backend) works.

You just place an order (make a request), and the food (response) is served.

### 🔹 Types of APIs:
Web APIs – used for communication over the internet (e.g., REST APIs like Twitter, Google Maps).

Library APIs – functions exposed by a library/module (e.g., NumPy, pandas).

Hardware APIs – interfaces for controlling devices (e.g., camera, printer).

### 🔹 Common API Terms:
- Endpoint: A specific URL where a service is available (e.g., /users/123)
- Request: The message sent to the API (usually HTTP)
- Response: The result returned by the API (often in JSON format)
- HTTP Methods:
  - GET – Retrieve data
  - POST – Send data
  - PUT – Update data
  - DELETE – Remove data
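
These terms can be seen together in a small, self-contained sketch that uses only the Python standard library. It starts a local server with one endpoint, sends a POST request with a JSON body, and reads the JSON response. The `/predict` path, the `features` and `prediction` field names, and the placeholder "model" (the mean of the inputs) are assumptions made for this example, not a standard.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":  # the only endpoint this server exposes
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        features = payload["features"]
        # Placeholder "model": the mean of the input features.
        body = json.dumps({"prediction": sum(features) / len(features)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging for the example

def call_predict(port, features):
    # Client side: build the request, send it, and decode the response.
    connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    connection.request(
        "POST",
        "/predict",
        body=json.dumps({"features": features}),
        headers={"Content-Type": "application/json"},
    )
    response = connection.getresponse()
    data = json.loads(response.read())
    connection.close()
    return response.status, data

server = HTTPServer(("127.0.0.1", 0), PredictHandler)  # port 0: use any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
status, result = call_predict(server.server_port, [1.0, 2.0, 3.0])
server.shutdown()
server.server_close()
print(status, result)
```

In practice a web framework would replace the hand-written handler, but the request and response cycle stays the same.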

## CI/CD Pipeline
![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

### 1. Basic (Direct or "All-at-once") Deployment
#### 🔹 What it is:
The old version is completely replaced by the new one at once.

#### ✅ Pros:
Simple to implement.

Quick rollout.

#### ❌ Cons:
High risk — if something goes wrong, the whole system is affected.

Rollback might be slow or tricky.

#### 👉 Use when:
The system is small.

You can afford downtime or have strong confidence in the new version.

### 2. Shadow Deployment
#### 🔹 What it is:
The new version runs in parallel to the current one, receiving a copy of real traffic, but its output is not exposed to users.

Used only for testing, monitoring, or logging.

#### ✅ Pros:
No user-facing risk, because the new version's output is never returned to users.

Allows real-world testing without affecting production.

#### ❌ Cons:
Extra compute cost (you’re running two systems).

Must handle non-intrusive logging and response comparison carefully.

#### 👉 Use when:
You want to validate model behavior (e.g., ML models).

You’re testing performance/scalability before production rollout.
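
A minimal sketch of the shadow pattern, assuming both versions can be called as plain functions: the user always receives the current version's answer, while the new version's answer is only logged for comparison. The names `serve_with_shadow`, `current_model`, and `candidate_model` are hypothetical, and a real system would usually call the shadow asynchronously so it adds no latency.

```python
def serve_with_shadow(request, primary, shadow, comparison_log):
    """Return the primary model's answer; run the shadow model only for logging."""
    response = primary(request)
    try:
        shadow_response = shadow(request)
        comparison_log.append({
            "request": request,
            "primary": response,
            "shadow": shadow_response,
            "match": response == shadow_response,
        })
    except Exception as error:
        # A failure in the shadow model must never affect the user-facing response.
        comparison_log.append({
            "request": request,
            "primary": response,
            "shadow_error": repr(error),
        })
    return response

def current_model(x):
    return x >= 5

def candidate_model(x):
    if x < 0:
        raise ValueError("negative input")
    return x >= 4

log = []
answers = [serve_with_shadow(x, current_model, candidate_model, log) for x in (2, 4, 7, -1)]
disagreements = [entry["request"] for entry in log if entry.get("match") is False]
print(answers, disagreements)
```

Reviewing the disagreements and shadow errors in the log is how the new version is validated before any user sees its output.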

### 3. Canary Deployment
#### 🔹 What it is:
Deploy the new version to a small subset of users first.

Gradually increase rollout as confidence grows.

#### ✅ Pros:
Lower risk — errors affect only a small segment.

Real user feedback.

Easy rollback in early stages.

#### ❌ Cons:
Slightly more complex setup.

Requires good monitoring and traffic routing.

#### 👉 Use when:
You want gradual rollout with safety nets.

Your user base is large or diverse.
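
One common way to pick the "small subset of users" is to hash each user ID into a stable bucket and send the lowest buckets to the new version. The sketch below assumes string user IDs; `bucket` and `route` are hypothetical names for this example. Because the bucket depends only on the ID, a user stays on the same version between requests, and raising the percentage only ever adds users to the canary group.

```python
import hashlib

def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % 100

def route(user_id: str, canary_percent: int) -> str:
    """Send users in the lowest `canary_percent` buckets to the canary version."""
    return "canary" if bucket(user_id) < canary_percent else "stable"

users = [f"user-{i}" for i in range(1000)]
at_5 = {u for u in users if route(u, 5) == "canary"}
at_25 = {u for u in users if route(u, 25) == "canary"}
print(len(at_5), len(at_25))
```

Rolling back is then a matter of setting the percentage to 0, which routes every user to the stable version again.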

![image-3.png](attachment:image-3.png)



### Automation and scaling in different phases of ML lifecycle
![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

![image-3.png](attachment:image-3.png)

### Monitoring an ML model
![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

![image-3.png](attachment:image-3.png)

### Retraining an ML model
Retraining uses new data to develop a fresh version of an ML model.

#### Data Drift
🔹 What is it?
Data drift occurs when the distribution of input data changes over time compared to the data used to train the model.

➡️ The model sees "different kinds of data" than it was trained on.

🔍 Example:
A fraud detection model trained on 2022 transaction data sees a new type of digital payment in 2025 that behaves differently.
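
One way to quantify such a shift for a single numeric feature is the Population Stability Index (PSI), which compares how a reference sample and a new sample are spread across bins. The sketch below is a simplified version with equal-width bins taken from the reference sample; the function name `psi`, the bin count, and the synthetic data are assumptions for illustration. A commonly quoted rule of thumb treats values above roughly 0.2 as a notable shift, but the threshold should be tuned per feature.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample and a new sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all reference values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            index = int((v - lo) / width)
            counts[min(max(index, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

reference = [i / 100 for i in range(1000)]  # synthetic training-time feature values
shifted = [x + 3 for x in reference]        # the same feature after a shift

print(psi(reference, reference), psi(reference, shifted))
```

Running this check on a schedule for each important feature gives an early warning before model accuracy visibly drops.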

#### Concept Drift
🔹 What is it?
Concept drift occurs when the relationship between input and output changes over time.

➡️ The same inputs now call for different outputs, so predictions become less accurate.

🔍 Example:
In spam detection, a model trained on certain spam patterns becomes outdated as spammers evolve their tactics.
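
Concept drift cannot be seen from the inputs alone, so it is usually detected by comparing predictions with true labels once they arrive. The sketch below keeps a rolling window of outcomes and flags drift when accuracy falls well below a baseline. `AccuracyMonitor`, the window size, the baseline, and the tolerance are hypothetical values chosen for this example.

```python
from collections import deque

class AccuracyMonitor:
    """Track accuracy over the most recent labelled predictions."""

    def __init__(self, window=50, baseline=0.9, tolerance=0.1):
        self.outcomes = deque(maxlen=window)  # old outcomes fall off automatically
        self.baseline = baseline
        self.tolerance = tolerance

    def record(self, prediction, label):
        self.outcomes.append(prediction == label)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def drift_suspected(self):
        # Only judge once the window is full, to avoid noisy early alerts.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.baseline - self.tolerance

monitor = AccuracyMonitor(window=50)
for _ in range(50):
    monitor.record(prediction=1, label=1)  # the model is right at first
before = monitor.drift_suspected()
for _ in range(20):
    monitor.record(prediction=1, label=0)  # the relationship has changed
after = monitor.drift_suspected()
print(before, after, monitor.accuracy())
```

When the monitor fires, the usual response is to retrain on recent labelled data or, if the change is structural, to redesign the model.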

| Term          | What Changes?               | Impact                               | Solution                  |
| ------------- | --------------------------- | ------------------------------------ | ------------------------- |
| Data Drift    | Input data distribution     | Features become unfamiliar           | Monitor & retrain         |
| Concept Drift | Relationship between X & Y  | Labels/predictions become unreliable | Retrain or redesign model |



![image.png](attachment:image.png)



### MLOps maturity
MLOps maturity refers to how advanced and reliable an organization’s machine learning operations are — from developing models to deploying, monitoring, and maintaining them at scale.

![image.png](attachment:image.png)

The number of MLOps maturity levels varies between frameworks. One common division is shown below:

![image-2.png](attachment:image-2.png)

![image-3.png](attachment:image-3.png)

![image-4.png](attachment:image-4.png)

![image-5.png](attachment:image-5.png)



## Data and machine learning tools

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

![image-3.png](attachment:image-3.png)

![image-4.png](attachment:image-4.png)

### MLOps tools

![image-5.png](attachment:image-5.png)

![image-6.png](attachment:image-6.png)

![image-7.png](attachment:image-7.png)

![image-8.png](attachment:image-8.png)

![image-9.png](attachment:image-9.png)

![image-10.png](attachment:image-10.png)