# Organize ML runs

## Introduction

This guide will show you how to:

- Keep track of code, data, environment and parameters
- Log results like evaluation metrics and model files
- Find runs on the dashboard with tags
- Organize runs in a dashboard view and save it for later

## Setup

Install dependencies

In [1]:
! pip install neptune-client scikit-learn==0.24.2

Collecting neptune-client
  Downloading neptune-client-0.14.3.tar.gz (301 kB)
Collecting scikit-learn==0.24.2
  Downloading scikit_learn-0.24.2-cp37-cp37m-manylinux2010_x86_64.whl (22.3 MB)
Collecting bravado
  Downloading bravado-11.0.3-py2.py3-none-any.whl (38 kB)
Collecting future>=0.17.1
  Downloading future-0.18.2.tar.gz (829 kB)
Collecting PyJWT
  Downloading PyJWT-2.3.0-py3-none-any.whl (16 kB)
Collecting websocket-client!=1.0.0,>=0.35.0
  Downloading websocket_client-1.2.3-py3-none-any.whl (53 kB)
Collecting GitPython>=2.0.8
  Downloading GitPython-3.1.26-py3-none-any.whl (180 kB)
Collecting boto3>=1.16.0
  Downloading boto3-1.20.53-py3-none-any.whl (132 kB)

## Create a basic training script

In [2]:
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.4, random_state=1234
)

params = {
    "n_estimators": 9,
    "max_depth": 5,
    "min_samples_leaf": 3,
    "min_samples_split": 2,
    "max_features": 3,
}
clf = RandomForestClassifier(**params)

clf.fit(X_train, y_train)
y_train_pred = clf.predict_proba(X_train)
y_test_pred = clf.predict_proba(X_test)

train_f1 = f1_score(y_train, y_train_pred.argmax(axis=1), average="macro")
test_f1 = f1_score(y_test, y_test_pred.argmax(axis=1), average="macro")
print(f"Train f1:{train_f1} | Test f1:{test_f1}")

Train f1:0.9895424836601308 | Test f1:0.9867446393762184


## Initialize Neptune and create new run

Connect your script to the Neptune application and create a new run.

In [3]:
import neptune.new as neptune

run = neptune.init(
    project="gho0thubun/org",
    api_token="eyJhcGlfYWRkcmVzcyI6Imh0dHBzOi8vYXBwLm5lcHR1bmUuYWkiLCJhcGlfdXJsIjoiaHR0cHM6Ly9hcHAubmVwdHVuZS5haSIsImFwaV9rZXkiOiJiNDJlYTdmOS1lZWExLTQ5ZTgtYWZmMy03MjU1ODdlNTQ3NDMifQ==",
)  # your credentials

#run = neptune.init(project="common/quickstarts", api_token="ANONYMOUS")

https://app.neptune.ai/gho0thubun/org/e/ORG-3


Info (NVML): Driver Not Loaded. GPU usage metrics may not be reported. For more information, see https://docs.neptune.ai/you-should-know/what-can-you-log-and-display#hardware-consumption


Remember to stop your run once you’ve finished logging your metadata (https://docs.neptune.ai/api-reference/run#.stop). It will be stopped automatically only when the notebook kernel/interactive console is terminated.


Click on the link above to open this run in Neptune.

The run is empty for now, but keep the tab open to see what happens next.

**A few explanations**

In the code above, you tell Neptune:

* **who you are**: your Neptune API token, `api_token`
* **where you want to send your data**: your Neptune `project`

At this point you have a new run in Neptune. From now on, you will use `run` to log metadata to it.

---

**Note**


Instead of logging data to the public project `common/quickstarts` as the anonymous user `neptuner`, you can log it to your own project.

To do that:

1. Get your [Neptune API token](https://docs-beta.neptune.ai/administration/security-and-privacy/how-to-find-and-set-neptune-api-token)
2. Pass the token to the ``api_token`` argument of ``neptune.init()``: ``api_token=YOUR_API_TOKEN``
3. Pass your project name to the ``project`` argument of ``neptune.init()``.

For example:

```python
neptune.init(project='my_workspace/my_project', 
             api_token='MY_API_TOKEN')
```
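
A token pasted directly into a notebook cell is easy to leak when the notebook is shared. The sketch below shows one way to avoid that, assuming you have stored your credentials in the `NEPTUNE_API_TOKEN` and `NEPTUNE_PROJECT` environment variables. The helper name `neptune_init_kwargs` is hypothetical, and the fallback values are the anonymous public project from the commented-out line above.

```python
import os


def neptune_init_kwargs(env=None):
    """Build keyword arguments for neptune.init() from environment variables.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to try out.
    """
    env = os.environ if env is None else env
    return {
        # Fall back to the public quickstart project and the anonymous token.
        "project": env.get("NEPTUNE_PROJECT", "common/quickstarts"),
        "api_token": env.get("NEPTUNE_API_TOKEN", "ANONYMOUS"),
    }


init_kwargs = neptune_init_kwargs()
# In the notebook, after `import neptune.new as neptune`:
# run = neptune.init(**init_kwargs)
print(init_kwargs["project"])  # print only the project, never the token
```

With this approach the token never appears in the notebook itself, so the cell can be committed or shared as is.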

## Save parameters

In [4]:
run["parameters"] = params

## Add tags to organize things

Pass a list of strings to the ``.add()`` method of the run's ``sys/tags`` field.

In [5]:
run["sys/tags"].add(["swrulez"])

## Add logging of train and evaluation metrics

In [6]:
run["train/f1"] = train_f1
run["test/f1"] = test_f1

A run can be viewed as a dictionary-like structure made up of **namespaces** that you define in your code. The hierarchy you give your metadata, for example `train/f1` and `test/f1`, is reflected in the UI, so you can organize metadata in whatever way is most convenient.

There is one special namespace: the **system namespace**, denoted `sys`. You can use it to add a name and tags to the run.
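
To make the namespace idea concrete, here is a small standalone sketch of how a nested dictionary corresponds to slash-separated field paths such as `parameters/max_depth` or `test/f1`. It only illustrates the naming scheme and is not Neptune's implementation; `flatten_fields` is a hypothetical helper and the metric values are placeholders, not results.

```python
def flatten_fields(metadata, prefix=""):
    """Yield (path, value) pairs, joining nested dictionary keys with '/'."""
    for key, value in metadata.items():
        path = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, dict):
            # A nested dict plays the role of a namespace: recurse into it.
            yield from flatten_fields(value, path)
        else:
            yield path, value


metadata = {
    "parameters": {"n_estimators": 9, "max_depth": 5},
    "train": {"f1": 0.0},  # placeholder value
    "test": {"f1": 0.0},  # placeholder value
}

for path, value in flatten_fields(metadata):
    print(path, value)
```

Each printed path is the kind of string you would use as a key on the run object, as in `run["test/f1"]`.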

## Execute a few runs with different parameters

Let's execute some runs with different model configurations.

Change the parameters in the ``params`` dictionary from the **Create a basic training script** section, for example:

```python
params = {
    "n_estimators": 10,
    "max_depth": 3,
    "min_samples_leaf": 1,
    "min_samples_split": 2,
    "max_features": 3,
}
```

Then run all the cells again to log the new run to Neptune.

In [7]:
params = {
    "n_estimators": 10,
    "max_depth": 4,
    "min_samples_leaf": 1,
    "min_samples_split": 2,
    "max_features": 3,
}
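
Editing `params` by hand works for two or three runs. If you want to try more combinations, a sketch like the one below can generate the configurations for you. It is an illustration only: `param_grid` is a hypothetical helper, the override values are arbitrary examples, and the training and logging cells above would still need to be run once per configuration.

```python
from itertools import product


def param_grid(base, overrides):
    """Yield copies of `base` with every combination of the override values."""
    names = list(overrides)
    for values in product(*(overrides[name] for name in names)):
        config = dict(base)  # copy, so the base dictionary is left untouched
        config.update(zip(names, values))
        yield config


base_params = {
    "n_estimators": 9,
    "max_depth": 5,
    "min_samples_leaf": 3,
    "min_samples_split": 2,
    "max_features": 3,
}

# Example overrides: 2 values x 3 values gives 6 configurations.
configs = list(param_grid(base_params, {"n_estimators": [9, 10], "max_depth": [3, 4, 5]}))
for config in configs:
    print(config)
```

Each `config` can take the place of `params` in the training cell, with one Neptune run created per configuration.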

## Stop logging

<font color=red>**Warning:**</font><br>
Once you are done logging, you should stop tracking the run using the `stop()` method.
This is needed only while logging from a notebook environment. While logging through a script, Neptune automatically stops tracking once the script has completed execution.

In [None]:
run.stop()

Shutting down background jobs, please wait a moment...
Done!


Waiting for the remaining 1 operations to synchronize with Neptune. Do not kill this process.


All 1 operations synced, thanks for waiting!
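
Because it is easy to forget `run.stop()` in a notebook, one option is to wrap the run in a small context manager that always calls it, even when a cell raises an error. This is a sketch, not part of the Neptune API: `stopping` and `DummyRun` are hypothetical names, and `DummyRun` only stands in for a real run so the example executes without credentials.

```python
from contextlib import contextmanager


@contextmanager
def stopping(run):
    """Yield `run` and call run.stop() on exit, even if the body raises."""
    try:
        yield run
    finally:
        run.stop()


class DummyRun:
    """Stand-in for a Neptune run; it only records assignments and stop()."""

    def __init__(self):
        self.fields = {}
        self.stopped = False

    def __setitem__(self, path, value):
        self.fields[path] = value

    def stop(self):
        self.stopped = True


with stopping(DummyRun()) as demo_run:
    demo_run["test/f1"] = 0.0  # placeholder value, not a real metric

print(demo_run.stopped)
```

With a real run, you would pass the object returned by `neptune.init(...)` to `stopping` instead of `DummyRun()`.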


## Go to Neptune UI

Click one of the run links printed when you created a run, or go directly to the app.

If you are logging things to the public project ``common/quickstarts`` you can just [follow this link](https://app.neptune.ai/o/common/org/quickstarts/e/QUI-10/parameters).

## See that everything got logged

Go to one of the runs you executed and check that everything was logged correctly:

- Click the run link or one of the rows in the runs table in the UI
- Go to the ``Parameters`` section to see your parameters
- Go to ``Monitoring`` to see hardware utilization charts
- Go to ``All metadata`` to review all logged metadata

![image](https://neptune.ai/wp-content/uploads/docs-organize-runs-review.gif)

## Filter runs by tag

Go to the runs table and filter by the tag you added to your runs, for example ``swrulez`` from the code above.

Neptune then shows only the runs that carry that tag.

![img](https://neptune.ai/wp-content/uploads/docs-organize-ml-runs-tags.gif)
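
The same filter can also be applied from code. The sketch below assumes the `neptune.new` API installed earlier in this guide, where `get_project()` returns a project object and `fetch_runs_table()` accepts a `tag` filter; it also assumes pandas is installed for `to_pandas()`. The function name `fetch_tagged_runs` is hypothetical, and the call is left commented out because it needs network access.

```python
def fetch_tagged_runs(project_name, tag, api_token="ANONYMOUS"):
    """Return a pandas DataFrame of the runs in `project_name` that carry `tag`."""
    # Imported inside the function so that defining it does not require Neptune.
    import neptune.new as neptune

    project = neptune.get_project(name=project_name, api_token=api_token)
    return project.fetch_runs_table(tag=[tag]).to_pandas()


# Example call (requires network access and the neptune-client package):
# runs_df = fetch_tagged_runs("common/quickstarts", "swrulez")
# print(runs_df.head())
```

Replace the project name, tag, and token with your own values when you log to a private project.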

## Choose parameter and metric columns you want to see

Use the ``Add column`` button to choose the columns for the runs table:

- Click ``Add column``.
- Type the name of the metadata field you are interested in, for example ``test/f1``.
- Click ``test/f1`` to add it.

![img](https://neptune.ai/wp-content/uploads/docs-organize-ml-runs-cols.gif)

## Save the view of runs table

You can save the current view of the runs table for later:

- Click ``Save as new``.

Both the columns and the row filters are saved in the view.

![img](https://neptune.ai/wp-content/uploads/docs-organize-ml-runs-view.gif)

---
**Tip:**  
Create and save multiple views of the runs table for different use cases or groups of runs.

---