# Complete Guide to Tracking Your Machine Learning Experiments With MLFlow and DagsHub
## Create reproducible and flexible ML projects
![](images/pexels.jpg)
<figcaption style="text-align: center;">
    <strong>
        Photo by 
        <a href='https://images.pexels.com/photos/2280571/pexels-photo-2280571.jpeg?auto=compress&cs=tinysrgb&dpr=2&h=650&w=940'>Chokniti Khongchum
        </a>
        on 
        <a href='https://images.pexels.com/photos/2280571/pexels-photo-2280571.jpeg?auto=compress&cs=tinysrgb&dpr=2&h=650&w=940'>Pexels.</a>
    </strong>
</figcaption>

### Introduction

Do you ever get sick of hearing that data scientists spend 80% of their time cleaning data? Even if that is true, another part of the machine learning workflow deserves just as much attention: experimentation.

### What does it mean to track machine learning experiments?

In any machine learning project, it takes a lot of time and effort to build a model that meets your expectations.

### Setup

Although this article can be read on its own, it is part two of a project I've been building with the Pet Pawpularity competition data from Kaggle. The aim of the project is to predict a pet's cuteness score from its image and metadata.

In [part one](https://towardsdatascience.com/open-source-ml-project-with-dagshub-improve-pet-adoption-with-machine-learning-1-e9403f8f7711?source=your_stories_page----------------------------------------), I explained my approach to solving the problem, described the tools I used, and took a first look at the images and their metadata. I suggest reading its EDA section to familiarize yourself with the data.

You can also run the following commands to download the repo into your environment and set up the dependencies. 

```bash
git clone https://github.com/BexTuychiev/pet_pawpularity.git
cd pet_pawpularity
pip install -r requirements.txt

dvc pull
```

Note that the data files are larger than 1 GB, so the final `dvc pull` command takes a while to complete. The requirements file also lists many dependencies, so you may prefer a lighter setup.

In that case, I recommend installing only these libraries and downloading the metadata CSV of the images:

```bash
pip install mlflow dagshub pandas scikit-learn xgboost
wget https://dagshub.com/BexTuychiev/pet_pawpularity/raw/8c799cd3c985087f31da10bf3207cf701a0790fa/data/raw/train.csv
```
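
Once `train.csv` is downloaded, a minimal sketch for loading it and separating the features from the target might look like the following. The column names `Id` and `Pawpularity` and the helper `load_metadata` are assumptions for illustration (they are not defined in this article), so check them against your copy of the file.

```python
import pandas as pd


def load_metadata(csv_path="train.csv", id_col="Id", target_col="Pawpularity"):
    """Load the metadata CSV and split it into features (X) and target (y).

    The column names are assumptions; adjust them to match your file.
    """
    df = pd.read_csv(csv_path)
    # Drop the image identifier and the target to keep only the feature columns
    X = df.drop(columns=[id_col, target_col])
    y = df[target_col]
    return X, y
```

Calling `X, y = load_metadata("train.csv")` then gives you a feature table and a target column that you can pass to any scikit-learn or XGBoost estimator.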

### Logging experiments with Git and DagsHub

```python
import dagshub


def log_to_git(params, metrics, metrics_path="metrics.csv", hparams_path="params.yaml"):
    """
    Log hyperparameters and metrics to files tracked by Git, using the DagsHub logger.
    """
    # Create the logger and use it as a context manager
    with dagshub.dagshub_logger(
        metrics_path=metrics_path, hparams_path=hparams_path
    ) as logger:
        logger.log_hyperparams(params)
        logger.log_metrics(metrics)
```

### Basics of logging experiments with MLFlow

```python
import os

# Placeholders: replace both values with your own DagsHub credentials
os.environ["MLFLOW_TRACKING_USERNAME"] = "your_dagshub_username"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "your_dagshub_password"
```

```bash
setx MLFLOW_TRACKING_USERNAME "your_dagshub_username"
setx MLFLOW_TRACKING_PASSWORD "your_dagshub_password"
```
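
If you would rather not hard-code credentials in a notebook, one option is to set the two variables outside of Python and only verify that they exist before connecting. The sketch below uses a hypothetical helper, `check_mlflow_credentials`, that is not part of MLFlow or DagsHub; it only reads the environment.

```python
import os

REQUIRED_VARS = ("MLFLOW_TRACKING_USERNAME", "MLFLOW_TRACKING_PASSWORD")


def check_mlflow_credentials(env=None):
    """Return the names of required credential variables that are missing or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = check_mlflow_credentials()
if missing:
    print("Set these variables before logging to the remote server:", ", ".join(missing))
```

Checking up front like this tends to give a clearer message than an authentication error raised in the middle of a training run.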

```python
import mlflow

mlflow.set_tracking_uri("https://dagshub.com/BexTuychiev/pet_pawpularity.mlflow")
```

```python
with mlflow.start_run():
    mlflow.log_params({...})
    mlflow.log_metrics({...})
```

### Deep dive into MLFlow workflow

### Analyzing experiment results with DagsHub

### Conclusion