![Image](https://mlflow.org/docs/latest/assets/images/tracking-metrics-ui-temp-ffc0da57b388076730e20207dbd7f9c4.png)

![Image](https://miro.medium.com/max/1400/1%2AWNn0RhcPt_UCDhwRUnNj4A.gif)

![Image](https://mlflow.org/docs/latest/assets/images/oss_registry_3_overview-daec63473b4d7bbf47c559600bf5c35d.png)


---

# MLflow for Experiment Tracking (Day 6 – MLOps)

## 1. Purpose of MLflow in MLOps

Machine learning development is inherently experimental. Teams iterate across:

* Data preprocessing strategies (IQR vs Z-score, scaling, imputation)
* Feature engineering choices
* Model families (Random Forest, SVM, Neural Networks)
* Hyperparameters

**MLflow** provides a centralized, auditable system to track these experiments by logging:

* Parameters
* Metrics
* Models
* Artifacts (plots, files, scripts)

This allows teams to objectively compare approaches and reproduce results.

---

## 2. MLflow vs DVC (Experiment Tracking)

DVC excels at **data versioning**, but MLflow is the more established standard for **experiment tracking**.

**Why MLflow is preferred for experiments**

* **Independence**: MLflow does not require tight coupling with Git workflows.
* **UI Quality**: Native UI enables fast, visual comparison of runs.
* **Collaboration**: Centralized tracking across ML and DL teams.

DVC and MLflow are complementary, not mutually exclusive.

---

## 3. Core Concepts

### Experiments

A high-level modeling strategy.
Examples:

* “Random Forest Approach”
* “Neural Network Approach”

### Runs

A single execution under an experiment with a specific configuration.
Examples:

* `n_estimators=10`
* `n_estimators=50`

Each run logs its own parameters, metrics, artifacts, and model snapshot.
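As a minimal sketch of how the two concepts map onto the API: one experiment is selected by name, and each configuration becomes its own run inside it. The `run_name` helper and its naming scheme are illustrative assumptions, not part of MLflow.

```python
from typing import Dict, List


def run_name(params: Dict[str, object]) -> str:
    """Build a readable run name such as 'rf-n_estimators=10' (hypothetical naming scheme)."""
    return "rf-" + "-".join(f"{key}={value}" for key, value in sorted(params.items()))


def log_runs(experiment_name: str, configs: List[Dict[str, object]]) -> List[str]:
    """Create one run per configuration inside a single experiment and return the run IDs."""
    import mlflow  # imported here so run_name() stays usable without MLflow installed

    mlflow.set_experiment(experiment_name)  # creates the experiment if it does not exist
    run_ids = []
    for params in configs:
        with mlflow.start_run(run_name=run_name(params)) as run:
            mlflow.log_params(params)
            run_ids.append(run.info.run_id)
    return run_ids
```

Calling `log_runs("Random Forest Approach", [{"n_estimators": 10}, {"n_estimators": 50}])` would produce the two runs from the examples above under one experiment.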

---

## 4. Local Workflow and Setup

### Step 1: Installation and UI

```bash
pip install mlflow
mlflow ui
```

The UI starts at:

```
http://127.0.0.1:5000
```

---

### Step 2: Fix Tracking URI Artifact Bug

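By default, the MLflow client writes to a local `./mlruns` directory (a file-based tracking URI) instead of the server started by `mlflow ui`. Runs logged that way can end up outside the server's store, and artifact logging may fail or write to an unexpected location.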

Force HTTP tracking:

```python
import mlflow
mlflow.set_tracking_uri("http://127.0.0.1:5000")
```
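A small guard can catch a misconfigured URI before any training starts. This is a sketch only: `is_server_uri` and `check_tracking_uri` are hypothetical helper names, while `mlflow.get_tracking_uri()` is the real call that reports the active URI.

```python
from urllib.parse import urlparse


def is_server_uri(tracking_uri: str) -> bool:
    """Return True when the URI points at an HTTP(S) tracking server rather than local files."""
    return urlparse(tracking_uri).scheme in ("http", "https")


def check_tracking_uri() -> str:
    """Fail fast if the active tracking URI is not an HTTP(S) server."""
    import mlflow  # imported lazily; is_server_uri() needs only the standard library

    uri = mlflow.get_tracking_uri()
    if not is_server_uri(uri):
        raise RuntimeError(f"Expected an http(s) tracking URI, got: {uri!r}")
    return uri
```

Calling `check_tracking_uri()` at the top of a training script turns a silent misconfiguration into an immediate error.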

---

### Step 3: Manual Logging (Core Pattern)

Training code is wrapped in a run context.

```python
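import mlflow
import mlflow.sklearn

# `model` is assumed to be an already fitted scikit-learn estimator,
# and confusion_matrix.png a file that already exists on disk.
with mlflow.start_run():
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("accuracy", 0.95)
    mlflow.sklearn.log_model(model, "random_forest_model")
    mlflow.log_artifact("confusion_matrix.png")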
```

Logged items:

* Parameters → searchable and comparable
* Metrics → plotted across runs
* Models → versioned and reloadable
* Artifacts → stored with full provenance

---

## 5. Advanced Experiment Management

### Autologging

```python
mlflow.autolog()
```

Automatically captures:

* Parameters
* Metrics
* Models
* Training metadata

Supported for many major ML libraries, including scikit-learn, XGBoost, LightGBM, and TensorFlow/Keras.

---

### Hyperparameter Tuning (Nested Runs)

When using `GridSearchCV` or similar tools:

* Use **nested runs**
* One **parent run**
* Multiple **child runs** (one per parameter combination)

This enables:

* Structured visualization
* Direct comparison of parameter effects in the UI

---

## 6. Model Registry

![Image](https://www.databricks.com/wp-content/uploads/2020/04/databricks-adds-access-control-to-mlflow-model-registry_01.jpg)

![Image](https://docs.databricks.com/aws/en/assets/images/stage_transition_prod-0029b892b3785e9cd8a28c6568191fe3.png)

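The Model Registry manages the **model lifecycle**. The stage-based workflow in this section still works in MLflow 2.x, but stages have been deprecated since MLflow 2.9 in favor of model version aliases and tags.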

### Stages

1. **None / Development**
2. **Staging** (Testing)
3. **Production** (Live)
4. **Archived / Retired**

### Capabilities

* Multiple versions per model
* Controlled promotion between stages
* Rollback to previous versions
* Full lineage tracking

### Auditing Use Case

If a model decision must be explained months later (e.g., loan denial, recommendation ranking), teams can retrieve:

* The exact model version
* Parameters
* Training data references
* Metrics

This is critical for compliance and governance.
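A sketch of such a lookup, assuming the model was registered from a tracked run. `MlflowClient.get_model_version` and `MlflowClient.get_run` are real client calls; `audit_record`, `fetch_audit_record`, and the stand-in objects at the end are illustrative only and carry no real results.

```python
from types import SimpleNamespace
from typing import Any, Dict


def audit_record(model_version: Any, run: Any) -> Dict[str, Any]:
    """Flatten a registry entry and its source run into one dictionary for an audit report."""
    return {
        "model": model_version.name,
        "version": model_version.version,
        "run_id": model_version.run_id,
        "params": dict(run.data.params),
        "metrics": dict(run.data.metrics),
        "tags": dict(run.data.tags),  # a data reference could be stored here, e.g. a dataset hash
    }


def fetch_audit_record(name: str, version: str) -> Dict[str, Any]:
    """Look up a registered model version and the run that produced it."""
    from mlflow.tracking import MlflowClient  # lazy import: audit_record() needs no MLflow

    client = MlflowClient()
    model_version = client.get_model_version(name=name, version=version)
    run = client.get_run(model_version.run_id)
    return audit_record(model_version, run)


# Stand-in objects with the same attribute names, to show the output shape without a server.
fake_version = SimpleNamespace(name="credit_risk_model", version="3", run_id="abc123")
fake_run = SimpleNamespace(
    data=SimpleNamespace(params={"n_estimators": "200"}, metrics={"accuracy": 0.92}, tags={})
)
record = audit_record(fake_version, fake_run)
```

Training data references are only retrievable if they were logged in the first place, for example as a tag or parameter holding a dataset version or hash.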

---

## 7. Local vs Remote Architecture

### Local Setup

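* Metadata → `mlruns/`
* Artifacts → `mlartifacts/` when logged through a local `mlflow ui` / `mlflow server`; otherwise they sit under `mlruns/` next to the metadata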
* Suitable for individual experimentation

### Remote Setup

* Centralized tracking server
* Shared access for teams
* Required for production workflows

Options:

* Self-hosted (EC2 + S3 + IAM)
* Managed platforms (e.g., DagsHub)

---

## 8. Remote Tracking with DagsHub

DagsHub provides a managed MLflow server without custom cloud setup.

### Steps

1. Connect GitHub repository to DagsHub
2. Install dependency:

```bash
pip install dagshub
```

3. Initialize in code:

```python
import dagshub
import mlflow

dagshub.init(repo_owner="username", repo_name="repo", mlflow=True)
mlflow.set_tracking_uri("https://dagshub.com/username/repo.mlflow")
```

All runs are now logged remotely and visible to the entire team.

---

## 9. Real-World Significance

MLflow is not just organizational tooling.

It enables:

* Reproducibility
* Accountability
* Regulatory auditing
* Long-term model governance

Archived models preserve historical decision logic, even years later.

---



![Image](https://mlflow.org/docs/latest/assets/images/tag-exp-run-relationship-fc898eccc4bb05fe59f41372ab5f6b50.svg)

![Image](https://dagshub.com/blog/content/images/2021/07/Experiment-Tracking-Comparison-1.png)



---

# Hyperparameter Tuning with MLflow Nested Runs

## 1. Purpose

Hyperparameter tuning is often treated as a black-box optimization step that surfaces only a single “best” model. This approach loses critical information about *why* a model performed well.

MLflow transforms hyperparameter search into a **fully traceable experimentation process**, capturing:

* Every parameter combination evaluated
* The metric outcome of each trial
* The relationship between parameters and performance

This is essential for reproducibility, stakeholder reporting, and regulated environments.

---

## 2. Parent–Child Run Model (Nested Runs)

The professional pattern for hyperparameter tuning in MLflow is **Nested Runs**, which creates a hierarchical structure in the tracking UI.

### Run Hierarchy

* **Parent Run**

  * Represents the entire tuning job (e.g., full `GridSearchCV`)
  * Stores global context and the final selected model

* **Child Runs**

  * One run per parameter combination
  * Each child logs:

    * Exact hyperparameters
    * Resulting metrics
    * Optional artifacts

This structure mirrors how humans reason about experiments: one strategy, many controlled trials.

---

## 3. End-to-End Workflow

### Step 1: Define the Search Space

Create a parameter grid for the estimator (e.g., Random Forest).
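A possible search space, with hypothetical values for a Random Forest. The same dictionary can be passed to `GridSearchCV` as `param_grid`; the `expand_grid` helper is only there to show how many trials (and therefore child runs) the grid implies.

```python
from itertools import product
from typing import Dict, List, Sequence


def expand_grid(grid: Dict[str, Sequence]) -> List[dict]:
    """Enumerate every parameter combination in a grid, one dict per trial."""
    keys = sorted(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[key] for key in keys))]


# Hypothetical search space for a RandomForestClassifier.
param_grid = {
    "n_estimators": [10, 50, 100],
    "max_depth": [5, 10],
}

combinations = expand_grid(param_grid)
print(len(combinations))  # 3 * 2 = 6 trials, so 6 child runs
```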

### Step 2: Start the Parent Run

The full tuning process is wrapped in a single parent run.

```python
with mlflow.start_run(run_name="rf_grid_search") as parent_run:
    ...
```

---

### Step 3: Log Each Trial as a Child Run

After fitting `GridSearchCV`, iterate over `cv_results_` and log each trial using `nested=True`.

```python
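# Runs inside the parent run's `with` block.
results = grid_search.cv_results_
for params, score in zip(results["params"], results["mean_test_score"]):
    with mlflow.start_run(nested=True):
        mlflow.log_params(params)
        # mean_test_score is the mean cross-validated score for this combination;
        # it is an accuracy only if the search was configured with accuracy scoring.
        mlflow.log_metric("accuracy", float(score))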
```

Key properties:

* `nested=True` links the run to the active parent
* Each run is independently comparable in the UI
* No metrics are overwritten or aggregated incorrectly

---

### Step 4: Log the Best Model to the Parent Run

Once the search completes, log the best-performing model once, at the parent level.

```python
# Still inside the parent run, after all child runs have closed.
mlflow.sklearn.log_model(
    grid_search.best_estimator_,
    artifact_path="best_model"
)
```

This ensures:

* One authoritative model artifact
* Clean separation between trials and final output
* Easy discovery for downstream workflows
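Putting steps 1 to 4 together, the whole workflow might look like the sketch below. It assumes a scikit-learn classifier, accuracy scoring, and 5-fold cross-validation; `trials_from_cv_results` and `run_grid_search` are hypothetical helper names, and the `example` mapping is hand-written to show the data shape, not a real result.

```python
from typing import Any, Dict, List, Mapping, Tuple


def trials_from_cv_results(cv_results: Mapping[str, Any]) -> List[Tuple[Dict[str, Any], float]]:
    """Pair each parameter combination with its mean cross-validated score."""
    return [
        (dict(params), float(score))
        for params, score in zip(cv_results["params"], cv_results["mean_test_score"])
    ]


def run_grid_search(estimator, param_grid, X, y, run_name: str = "rf_grid_search"):
    """Fit GridSearchCV under one parent run, log each trial as a child, log the best model once."""
    import mlflow
    import mlflow.sklearn
    from sklearn.model_selection import GridSearchCV

    with mlflow.start_run(run_name=run_name):
        search = GridSearchCV(estimator, param_grid, scoring="accuracy", cv=5)
        search.fit(X, y)

        for params, score in trials_from_cv_results(search.cv_results_):
            with mlflow.start_run(nested=True):
                mlflow.log_params(params)
                mlflow.log_metric("accuracy", score)

        # Parent-level logging happens after the child contexts have closed.
        mlflow.log_params(search.best_params_)
        mlflow.log_metric("best_accuracy", search.best_score_)
        mlflow.sklearn.log_model(search.best_estimator_, artifact_path="best_model")
    return search


# Shape check with a hand-written cv_results_-style mapping (not real results).
example = {
    "params": [{"max_depth": 5}, {"max_depth": 10}],
    "mean_test_score": [0.5, 0.75],
}
trials = trials_from_cv_results(example)
```

Keeping the pairing logic in its own function makes it easy to test without fitting a model or contacting a tracking server.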

---

## 4. Visualization and Analysis

### What the UI Enables

* Select all **child runs**
* Generate comparison plots:

  * `max_depth` vs accuracy
  * `n_estimators` vs precision
* Identify parameter interactions and diminishing returns

This replaces intuition-driven tuning with **evidence-backed reasoning**.
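The same comparison can be done outside the UI. The sketch below assumes child runs carry the `mlflow.parentRunId` tag that MLflow sets for nested runs, and uses `mlflow.search_runs` to pull them into a pandas DataFrame; the helper names are hypothetical.

```python
def child_runs_filter(parent_run_id: str) -> str:
    """Build a search filter that matches runs whose parent is the given run."""
    return f"tags.mlflow.parentRunId = '{parent_run_id}'"


def load_child_runs(experiment_name: str, parent_run_id: str):
    """Return the child runs as a pandas DataFrame, best accuracy first."""
    import mlflow  # lazy import: child_runs_filter() needs only the standard library

    return mlflow.search_runs(
        experiment_names=[experiment_name],
        filter_string=child_runs_filter(parent_run_id),
        order_by=["metrics.accuracy DESC"],
    )
```

The resulting DataFrame has columns such as `params.max_depth` and `metrics.accuracy`, which can be grouped or plotted with the usual pandas tools.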

---

## 5. Practical Benefits

* Full experiment lineage
* Reproduction of any trial from its logged parameters (given fixed random seeds and the same data)
* Clear justification for model selection
* Seamless transition into Model Registry workflows

This pattern scales cleanly from local experimentation to automated CI-driven hyperparameter searches.

---

## 6. Natural Continuation

Once the best model is logged at the parent level, it becomes a first-class candidate for:

* Registration
* Stage promotion (Staging → Production)
* Approval workflows
* Rollback safety



![Image](https://mlflow.org/docs/latest/assets/images/tracking-setup-overview-3d8cfd511355d9379328d69573763331.png)

![Image](https://miro.medium.com/v2/resize%3Afit%3A1400/1%2AQYRPcnLlW9MznPKTr7ZFfA.png)

![Image](https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2023/05/02/ML-12213-mlflow_architecture.png)



---

# S3-Backed MLflow Tracking Server with IAM

## 1. Architecture Overview

### Components

* **Tracking Server**: MLflow (stateless)
* **Metadata Store**: PostgreSQL (RDS)
* **Artifact Store**: Amazon S3
* **AuthN/AuthZ**: IAM Roles
* **Compute**: EC2 / ECS / EKS
* **Network**: VPC + Security Groups

### Trust Boundaries

* Public → Load Balancer (HTTPS)
* Private Subnet → MLflow Server
* IAM Role → S3 + RDS (least privilege)

---

## 2. High-Level Data Flow

1. Client obtains AWS credentials through IAM (needed for direct S3 access)
2. MLflow logs metadata → PostgreSQL
3. Artifacts uploaded directly → S3
4. Model Registry metadata stored → PostgreSQL
5. Model binaries retrieved from S3 at inference time

---

## 3. AWS Resource Provisioning

### 3.1 S3 Bucket (Artifact Store)

```bash
aws s3api create-bucket \
  --bucket mlflow-artifacts-prod \
  --region us-east-1
```

Enable:

* Versioning
* Server-side encryption (SSE-S3 or SSE-KMS)
* Block public access (ALL)

---

### 3.2 IAM Policy (Least Privilege)

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::mlflow-artifacts-prod",
        "arn:aws:s3:::mlflow-artifacts-prod/*"
      ]
    }
  ]
}
```

Attach to:

* EC2 Instance Role
* ECS Task Role
* EKS IRSA Role

No static credentials allowed.
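When several environments need their own bucket, the policy can be generated instead of copied. The sketch below is one way to do it and is a slightly stricter variant of the JSON above: it puts `s3:ListBucket` (a bucket-level action) and the object-level actions in separate statements. `artifact_policy` is a hypothetical helper name.

```python
import json


def artifact_policy(bucket: str) -> dict:
    """Build an S3 policy with bucket-level and object-level actions in separate statements."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": [bucket_arn]},
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


policy = artifact_policy("mlflow-artifacts-prod")
print(json.dumps(policy, indent=2))
```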

---

### 3.3 PostgreSQL (RDS)

Requirements:

* PostgreSQL ≥ 13
* Private subnet
* TLS enforced

Example database:

```
mlflow_db
```

---

## 4. MLflow Server Configuration

### 4.1 Python Environment

```
# requirements.txt (interpreter: Python 3.11)
mlflow==2.11.3
psycopg2-binary==2.9.9
boto3==1.34.14
```

Pinned versions required for reproducibility.

---

### 4.2 Server Startup Command

```bash
mlflow server \
  --backend-store-uri postgresql://mlflow_user:password@rds-endpoint:5432/mlflow_db \
  --default-artifact-root s3://mlflow-artifacts-prod \
  --host 0.0.0.0 \
  --port 5000
```

Notes:

* MLflow automatically uses IAM credentials via AWS SDK
* No S3 keys stored or injected

---

## 5. Network Security

### Security Groups

* Allow inbound 5000 only from Load Balancer
* Allow outbound 443 (S3, STS)
* Allow outbound 5432 to RDS

### TLS

* Terminate TLS at ALB (TLS 1.3)
* Internal traffic remains private

---

## 6. Client Configuration

### Environment Variables (CI/CD or Dev)

```bash
export MLFLOW_TRACKING_URI=https://mlflow.company.com
export AWS_REGION=us-east-1
```

Authentication:

* IAM Role (preferred)
* OIDC (GitHub Actions)
* Temporary STS credentials only

---

## 7. Model Logging Example (S3-backed)

```python
import mlflow
import mlflow.sklearn

mlflow.set_tracking_uri("https://mlflow.company.com")
mlflow.set_experiment("credit-risk")

with mlflow.start_run():
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("accuracy", 0.92)
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="credit_risk_model"
    )
```

Artifacts are written directly to S3.

---

## 8. CI/CD with IAM (GitHub Actions)

### OIDC-Based Auth (No Secrets)

```yaml
permissions:
  id-token: write
  contents: read
```

```yaml
- name: Configure AWS credentials
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/mlflow-ci-role
    aws-region: us-east-1
```

This grants temporary credentials scoped to:

* S3 artifact access
* MLflow tracking

---

## 9. Observability

### Metrics

* ALB latency
* S3 request counts
* RDS connections

### Logs

* MLflow server logs → CloudWatch
* Access logs → ALB + S3

---

## 10. Backup and Recovery

* S3 versioning protects artifacts
* RDS automated backups (≥ 7 days)
* Cross-region S3 replication (optional)

---

## 11. Scalability Characteristics

* MLflow server is stateless → horizontal scaling
* S3 scales to effectively unlimited artifact storage
* PostgreSQL scales vertically; read replicas optional
* Many concurrent runs can be supported; the practical limit is usually PostgreSQL connection capacity

---

## 12. Compliance and Audit Readiness

This setup provides:

* Full lineage tracking
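* Versioned artifacts (S3 versioning retains overwritten and deleted objects; strict immutability additionally requires S3 Object Lock)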
* IAM-based access control
* Deterministic rollback via Model Registry

Suitable for:

* Financial services
* Healthcare
* Enterprise MLOps platforms

---






## 1. Absolute First Rule (Non-Optional)

```python
mlflow.set_tracking_uri("http://127.0.0.1:5000")
# or a remote server, e.g. https://dagshub.com/<user>/<repo>.mlflow
```

If this is wrong:

* Artifacts fail
* Registry fails
* CI/CD fails

---

## 2. Experiment Boundary (Always Set)

```python
mlflow.set_experiment("experiment_name")
```

Rule:

* One **idea / strategy** = one experiment
* Never mix approaches in the same experiment

---

## 3. Run Boundary (The Core Primitive)

```python
with mlflow.start_run():
    ...
```

What a run represents:

* One **decision attempt**
* One **parameter configuration**
* One **result**

No run = no accountability.

---

## 4. Parameters vs Metrics (Never Confuse These)

```python
mlflow.log_param("max_depth", 5)      # inputs (static)
mlflow.log_metric("accuracy", 0.91)   # outcomes (numeric, time-series)
```

Rules:

* Params = configuration
* Metrics = performance
* Never log metrics as params
* Never log params as metrics

---

## 5. Nested Runs (Hyperparameter Tuning)

### Parent Run (Intent)

```python
with mlflow.start_run(run_name="grid_search") as parent:
    ...
```

### Child Runs (Evidence)

```python
with mlflow.start_run(nested=True):
    mlflow.log_params(params)
    mlflow.log_metric("accuracy", score)
```

Rules:

* One grid search = one parent
* One parameter combo = one child
* `nested=True` is mandatory

---

## 6. Log the Best Model (Once, at Parent Level)

```python
mlflow.sklearn.log_model(
    best_model,
    artifact_path="model"
)
```

Rules:

* Never log *all* models to the registry
* Only the selected model is promoted
* Trials stay as metrics, not registry entries

---

## 7. Register Model (Lifecycle Starts Here)

```python
mlflow.sklearn.log_model(
    model,
    artifact_path="model",
    registered_model_name="model_name"
)
```

Rules:

* Registration ≠ Production
* Registration = governance entry point

---

## 8. Model Promotion (Controlled Action)

```python
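from mlflow.tracking import MlflowClient

client = MlflowClient()
# `version` is the registered version number of the model, e.g. "3".
client.transition_model_version_stage(
    name="model_name",
    version=version,
    stage="Staging",
    archive_existing_versions=True
)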
```

Stages to remember **in order**:

1. None / Dev
2. Staging
3. Production
4. Archived

---

## 9. Load Model by Stage (Never by Path)

```python
mlflow.pyfunc.load_model(
    "models:/model_name/Production"
)
```

Rule:

* Code never changes for rollback
* Only metadata changes

---

## 10. Autologging (Use or Explicitly Avoid)

```python
mlflow.autolog()
```

Rule:

* Use for speed
* Disable when you need full control
* Never mix partial manual + autolog unintentionally

---

## 11. Artifact Logging (Evidence, Not Decoration)

```python
mlflow.log_artifact("confusion_matrix.png")
```

Artifacts should answer:

* How was performance measured?
* What data snapshot was used?
* What script produced this run?

---

## 12. S3 + IAM Mental Rule

You never do this:

```bash
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
```

You always do this:

* IAM Role
* OIDC
* Temporary credentials

MLflow must **inherit identity**, not store secrets.

---

## 13. CI/CD Invariant

CI must be able to:

* Train
* Evaluate
* Register
* Promote
* Roll back

**Without human judgment embedded in code.**

Thresholds are code.
Approval is metadata.
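"Thresholds are code" could look like the gate below: a pure function that CI calls before any promotion. The metric name, the `0.90` floor, and the `min_gain` rule are hypothetical choices for illustration.

```python
from typing import Mapping, Optional


def should_promote(
    candidate: Mapping[str, float],
    production: Optional[Mapping[str, float]],
    min_accuracy: float = 0.90,
    min_gain: float = 0.0,
) -> bool:
    """Decide promotion from metrics alone: clear an absolute floor and do not regress."""
    accuracy = candidate["accuracy"]
    if accuracy < min_accuracy:
        return False
    if production is None:  # nothing in Production yet, so the floor is the only gate
        return True
    return accuracy - production["accuracy"] >= min_gain
```

Because the function takes plain mappings, the same rule can be unit-tested in CI and applied to metrics fetched from the tracking server.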

---

## Final Memory Compression (Write This on Paper)

```
Experiment = idea
Run = decision
Nested run = controlled trial
Registry = authority
Stage = trust level
S3 + IAM = immutability
CI/CD = consistency
```


