
# Summary of Logging Methods in MLflow for Databricks

MLflow is an open-source platform used to manage the machine learning lifecycle, including experimentation, reproducibility, and deployment. In Databricks, MLflow is available as a managed service, so experiments and models are tracked inside the workspace. Below is a summary of the key logging methods in MLflow used within Databricks.

## 1. **Logging Parameters**
Parameters record hyperparameters, inputs, and configuration values for a training run. Each parameter is a key-value pair, and MLflow stores the value as a string.

### Method:
```python
import mlflow

# Log a parameter
mlflow.log_param("param_name", "param_value")
```

### Example:
```python
mlflow.log_param("learning_rate", 0.01)
mlflow.log_param("batch_size", 32)
```

### 1.1 **Log multiple parameters**
```python
params = {"batch_size": 64, "epochs": 10, "optimizer": "adam"}
mlflow.log_params(params)
```

## 2. **Logging Metrics**
Metrics are numerical values that represent the model’s performance, such as accuracy, loss, and others.

### Method:
```python
import mlflow

# Log single metric
mlflow.log_metric("accuracy", 0.85)

# Log metric at a specific step
mlflow.log_metric("loss", 0.12, step=1)

# Log multiple metrics
metrics = {"precision": 0.9, "recall": 0.82, "f1": 0.86}
mlflow.log_metrics(metrics)
```

## 3. **Logging Artifacts**
Artifacts can be files, directories, or other outputs generated during training (e.g., model weights, plots, or logs). These are useful for storing and reviewing outputs.

### Method:
```python
import mlflow

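# Log a local file or directory as an artifact of the active run.
# artifact_path is an optional subdirectory inside the run's artifact store.
local_path = "model_output.png"  # must already exist on local disk
mlflow.log_artifact(local_path, artifact_path=None)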
```

### Example:
```python
mlflow.log_artifact("model_output.png", "outputs")
mlflow.log_artifact("logs/training_log.txt")
```

## 4. **Logging Models**
MLflow allows logging and tracking of models during training. It supports various model formats, including scikit-learn, TensorFlow, PyTorch, and custom models.

### Method:
```python
import mlflow
import mlflow.sklearn

# Log a fitted scikit-learn model; the second argument is the artifact path
# under which the model is stored in the run
mlflow.sklearn.log_model(model, "model_name")
```

### Example:
```python
mlflow.sklearn.log_model(model, "random_forest_model")
```

## 5. **Starting and Ending Runs**
MLflow allows you to start a run to track an experiment and log metrics, parameters, and artifacts within that run. The run can be ended manually with `mlflow.end_run()` or automatically when a `with mlflow.start_run():` block exits.

### Methods:
```python
import mlflow

# Start a new run; it ends automatically when the 'with' block exits
with mlflow.start_run():
    # Log parameters, metrics, and artifacts
    mlflow.log_param("epochs", 10)
    mlflow.log_metric("accuracy", 0.95)
```

### Example:
```python
with mlflow.start_run():
    mlflow.log_param("epochs", 20)
    mlflow.log_metric("accuracy", 0.92)
```

## 6. **Tracking Experiments**
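In Databricks, MLflow organizes runs within experiments. `mlflow.set_experiment` sets the active experiment, creating it if it does not exist yet, and every run started afterwards is recorded under it. In Databricks the experiment name is a workspace path such as `/Users/your_username/experiment_1`.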

### Method:
```python
import mlflow

# Set the active experiment
mlflow.set_experiment("experiment_name")

# Log parameters and metrics within the experiment.
# The 'with' block ends the run even if the code inside raises.
with mlflow.start_run():
    mlflow.log_param("batch_size", 64)
    mlflow.log_metric("accuracy", 0.97)
```

### Example:
```python
mlflow.set_experiment("/Users/your_username/experiment_1")
```

## 7. **Using MLflow with Databricks Notebooks**
In Databricks notebooks, MLflow runs are logged to the notebook's own experiment by default, so each notebook execution is tracked without choosing an experiment first. Databricks integrates the MLflow tracking UI directly into the workspace.

### Key Points:
- Use `mlflow.log_*` functions to log parameters, metrics, artifacts, and models.
- The **MLflow UI** in Databricks allows you to visualize and compare different runs in an experiment.
- Databricks users can leverage the `MLflow` tracking capabilities directly in the workspace without additional setup.


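## 8. **Model Signature Logging**
A model signature records the schema of a model's inputs and outputs (column names and types), so that anyone loading the model knows what data it expects.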
```python
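import mlflow.sklearn
from mlflow.models.signature import infer_signature

# Assumes `model` is a fitted estimator and `X_train` is its training data.
# Infer the input/output schema from sample inputs and predictions
signature = infer_signature(X_train, model.predict(X_train))

# Log model with signature
mlflow.sklearn.log_model(model, "model", signature=signature)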
```


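## 9. **Tags and Notes**
Tags are free-form key-value strings attached to a run. The reserved `mlflow.note.content` tag holds the run's description.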
```python
import mlflow

# Add tags to a run
mlflow.set_tag("model_type", "classification")
mlflow.set_tags({"priority": "high", "team": "data_science"})

# Add a note to the run
mlflow.set_tag("mlflow.note.content", "This run uses the improved feature engineering pipeline")
```


## Conclusion
MLflow provides essential tools for logging parameters, metrics, artifacts, and models, and integrates seamlessly into Databricks for a streamlined machine learning workflow. The ability to track and compare different experiments allows data scientists and engineers to optimize and manage their models efficiently.

## Q. How to set run descriptions?
> Add detailed descriptions to document the purpose of an experiment.

Use `mlflow.set_tag` with the reserved description key. The most common way to add a run description is to set the `mlflow.note.content` tag:

```python
import textwrap

import mlflow

with mlflow.start_run() as run:
    # Set a detailed description for the run. Dedent the text so the stored
    # note does not carry the indentation of the surrounding code.
    description = textwrap.dedent("""
        This experiment tests the impact of feature engineering on model performance.
        We're comparing three different approaches:
        1. Raw features only
        2. Engineered features (polynomial features)
        3. PCA-reduced features

        Expected outcome: Approach #2 should perform best on test data
        while maintaining interpretability.
    """).strip()
    mlflow.set_tag("mlflow.note.content", description)
    
    # Rest of your experiment code
    mlflow.log_param("learning_rate", 0.01)
    # etc.
```
