# Getting Started with MLRUN
----------------------------

<a id='top'></a>
### **Understanding functions and running tasks locally**

**[install mlrun](#install)**<br>
**[mlrun setup](#setup)**<br>
**[create and run a local function](#create-local)**<br>
**[create a new mlrun Task and run it](#create-new-task)**<br>
**[inspecting the run results and outputs](#inspecting)**<br>
**[using hyperparameter tasks](#using-hyperparamter-tasks)**<br>
**[running tasks through the CLI](#tasks-cli)**<br>
**[inline code and running on multiple runtimes](#inline)**<br>
**[running locally in the notebook](#run-locally)**<br>
**[hyper parameters taken from a csv file](#run-csv)**

<a id="install" ></a>
______________________________________________

# **install**

In [1]:
# Uncomment this to install mlrun package, restart the kernel after

# !pip install mlrun

<a id="setup"></a>
______________________________________________

# **mlrun setup**

MLRun tracks jobs and artifacts, collecting metadata in a local file directory or in a DB.

The DB/API path can be set using the environment variable ```MLRUN_DBPATH``` or the config object ```mlconf.dbpath```. Here we try to get it from the environment first.

**Note:** for _distributed jobs_ and an _interactive UI_ you must use the `mlrun-api` service (and not the file DB).
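
The fallback logic described above can be sketched in plain Python, without importing mlrun. This is only an illustration: the helper names `resolve_dbpath` and `is_api_service` are hypothetical, and the precedence shown (explicit value, then `MLRUN_DBPATH`, then a local default) is an assumption rather than a description of MLRun internals.

```python
import os


def resolve_dbpath(configured=None, default="./"):
    """Return the first non-empty value among: an explicit setting,
    the MLRUN_DBPATH environment variable, and a local-folder default."""
    return configured or os.environ.get("MLRUN_DBPATH") or default


def is_api_service(dbpath):
    """True when the path looks like an HTTP(S) endpoint, not a directory."""
    return dbpath.startswith(("http://", "https://"))


dbpath = resolve_dbpath()
print(dbpath, "-> API service" if is_api_service(dbpath) else "-> file DB")
```

A file DB path such as `./` is enough for the local examples in this notebook; an `http://` endpoint is what the note above requires for distributed jobs and the UI.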

For a local file DB, in the current folder:

In [1]:
from mlrun import run_local, new_task, mlconf
from os import path

mlconf.dbpath = mlconf.dbpath or "./"

For the ```mlrun-api``` service (in Kubernetes) use:

In [2]:
# uncomment for working with the DB
# mlconf.dbpath = mlconf.dbpath or 'http://mlrun-api:8080'

<a id="create-local"></a>
______________________________________________

# **create a new mlrun task and submit it to a local function**

An mlrun ```Task``` defines job inputs/outputs and metadata:
* input parameters
* hyper-parameters (or parameter files)
* input datasets
* default paths (for input/output)
* secrets (job credentials)
* ```Task``` metadata: name, project, labels, etc.

If a function supports multiple handlers (another term for method, function), we also need to define the specific handler (or set the `spec.default_handler`).

`Task` objects have helper methods such as `.with_params()`, `.with_secrets()`, `.with_input()`, and `.set_label()` for convenience.

This example shows how we can create a new task with various parameters and later use `.run()` to submit the task to our new function.<br>

Artifacts from each run are stored in the `artifact_path`, which can be set globally through an environment variable (`MLRUN_ARTIFACT_PATH`) or through the config. If it is not already set, we can create a directory and use it in our runs. Using `{{run.uid}}` in the path creates a unique directory per run; when we use pipelines we can use the `{{workflow.uid}}` template option.

> Note: the artifact path can be a local path or a URL (starting with s3://, v3io://, etc.). If we want the artifacts to show in the UI, the artifact path must be on shared file or object storage and should not be a relative path. On the Iguazio platform the notebooks are always on the shared file system.
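
To make the `{{run.uid}}` template concrete, here is a small stand-alone sketch of the substitution and of the "URL or absolute path" check from the note. The function names are hypothetical and the code only mimics the idea; MLRun performs the real substitution itself when the run starts.

```python
from os import path


def expand_artifact_path(template, run_uid):
    """Replace the {{run.uid}} placeholder with a concrete run id."""
    return template.replace("{{run.uid}}", run_uid)


def is_url_or_absolute(artifact_path):
    """True for URLs (s3://, v3io://, ...) and for absolute local paths."""
    return "://" in artifact_path or path.isabs(artifact_path)


template = path.join(path.abspath("./data"), "{{run.uid}}")
print(expand_artifact_path(template, "abc123"))
print(is_url_or_absolute(template), is_url_or_absolute("./data"))
```

Passing the check is necessary but not sufficient for the UI: an absolute path must also live on storage that the UI can reach.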

In [3]:
out = mlconf.artifact_path or path.abspath("./data")
# {{run.uid}} will be substituted with the run id, so output will be written to a different directory per run
artifact_path = path.join(out, "{{run.uid}}")

Then we create a new task and set its properties using helper methods:<br>
The [secrets `file`](secrets.txt) is a list of `key=value` properties.
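
As an illustration of the `key=value` format, the sketch below parses such text into a dict. The keys and values are invented for the example, and skipping blank lines and `#` comments is an assumption about the format, not something this notebook states.

```python
def parse_secrets(text):
    """Parse key=value lines into a dict (blank lines and # comments skipped)."""
    secrets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # split on the first "=" only, so values may themselves contain "="
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {line!r}")
        secrets[key.strip()] = value.strip()
    return secrets


# hypothetical file content, for illustration only
example = "ACCESS_KEY=1234\n# a comment\nTOKEN=abc=def\n"
print(parse_secrets(example))
```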

In [4]:
task = (
    new_task(name="demo", params={"p1": 5}, artifact_path=artifact_path)
    .with_secrets("file", "secrets.txt")
    .set_label("type", "demo")
)

<a id="create-new-task"></a>
______________________________________________

### **run local code**
The following example creates a temporary local function mapped to the **[training.py](training.py)** code file (located in the same folder as this notebook) and runs it.

mlrun supports multiple _**runtimes**_ (handler, local, nuclio, job, spark, mpi, etc., see **[supported runtimes](https://github.com/mlrun/mlrun/tree/master/mlrun/runtimes)** for more details). _**local**_ runtime runs code in your local/notebook environment.

In [5]:
# run our task using our new function
run_object = run_local(task, command="training.py")

<b>Hover over the inputs/artifacts to see the full link, or click to see the content.</b>
<a id="inspecting"></a>
______________________________________________

# **inspecting the run results and outputs**

Every ```run``` object (the result of a `.run()` call) has the following properties and methods:
* `.uid()`   - return the unique id
* `.state()` - return the last known state
* `.show()`  - show the latest task state and data in a visual widget (with hyperlinks and hints)
* `.outputs` - return a dict of the run results and artifact paths
* `.logs()`  - return the latest logs, use `watch=False` to disable interactive mode in running tasks
* `.artifact(key)` - return full artifact details
* `.output(key)`   - return specific result or artifact (path)
* `.to_dict()`, `.to_yaml()`, `.to_json()` - convert the run object to dict/yaml/json

In [6]:
run_object.uid()

In [7]:
run_object.to_dict()

In [8]:
run_object.state()

In [9]:
run_object.show()

In [10]:
run_object.outputs

In [11]:
run_object.logs()

In [12]:
run_object.artifact("dataset")

<a id="using-hyperparamter-tasks"></a>
______________________________________________

# **using hyper-parameter tasks**
In many cases we want to run the same function with different input values and select the best result.<br>

You can specify parameters with a list of values and mlrun will run all the parameter combinations as a single hyper-param task.<br>

Each unique run combination is called an _**iteration**_, where iteration '0' is the parent task.

Use `.with_hyper_params()` and provide lists of values. The selector string indicates which iteration will be selected as the winning result, in the form [```min``` or ```max```].[```output-value```].
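
The grid expansion and selector semantics can be sketched in plain Python. This is not MLRun's implementation: `expand_grid`, `select_best`, and the `toy_train` objective are hypothetical stand-ins used only to show how a `min.loss` selector picks one iteration.

```python
from itertools import product


def expand_grid(hyper_params):
    """Return one dict per combination of the listed parameter values."""
    keys = list(hyper_params)
    value_lists = (hyper_params[k] for k in keys)
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


def select_best(results, selector):
    """Pick a result using a '<min|max>.<output-key>' selector string."""
    op, _, key = selector.partition(".")
    pick = {"min": min, "max": max}[op]
    return pick(results, key=lambda r: r[key])


def toy_train(p1, p2):
    """Hypothetical objective standing in for a real training function."""
    return {"p2": p2, "loss": p1 * p2, "accuracy": p1 + p2}


grid = expand_grid({"p2": [5, 2, 3]})
results = [toy_train(p1=5, **params) for params in grid]
best = select_best(results, "min.loss")
print(best)  # {'p2': 2, 'loss': 10, 'accuracy': 7}
```

With more than one list, the number of iterations is the product of the list lengths, so grids grow quickly.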

In [13]:
run = run_local(
    task.with_hyper_params({"p2": [5, 2, 3]}, "min.loss"), command="training.py"
)

In [14]:
run.outputs

<a id="tasks-cli"></a>
______________________________________________

# **running tasks through the cli**

In [15]:
%env MLRUN_DBPATH={mlconf.dbpath}

In [16]:
!mlrun run --name train_hyper -x p1="[3,7,5]" --selector max.accuracy -p p2=5 --out-path {artifact_path} training.py
if _exit_code != 0:
    raise RuntimeError()

In [17]:
# see other CLI commands
!mlrun
if _exit_code != 0:
    raise RuntimeError()

<a id="inline"></a>
______________________________________________

# **using (inline) code and running on different runtimes**

In [20]:
from mlrun.artifacts import ChartArtifact, PlotArtifact
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import time

# define a handler function that receives the run context as its first parameter


def handler(context, p1=1, p2="xx"):
    """this is a simple function

    :param p1:  first param
    :param p2:  another param
    """
    # access input metadata, values, and inputs
    print(f"Run: {context.name} (uid={context.uid})")
    print(f"Params: p1={p1}, p2={p2}")

    time.sleep(1)

    # log the run results (scalar values)
    context.log_result("accuracy", p1 * 2)
    context.log_result("loss", p1 * 3)

    # add a label/tag to this run
    context.set_label("category", "tests")

    # create a matplot figure and store as artifact
    fig, ax = plt.subplots()
    np.random.seed(0)
    x, y = np.random.normal(size=(2, 200))
    color, size = np.random.random((2, 200))
    ax.scatter(x, y, c=color, s=500 * size, alpha=0.3)
    ax.grid(color="lightgray", alpha=0.7)

    context.log_artifact(PlotArtifact("myfig", body=fig, title="my plot"))

    # create a dataframe artifact
    df = pd.DataFrame([{"A": 10, "B": 100}, {"A": 11, "B": 110}, {"A": 12, "B": 120}])
    context.log_dataset("mydf", df=df)

    # Log an ML Model artifact
    context.log_model(
        "mymodel",
        body=b"abc is 123",
        model_file="model.txt",
        model_dir="data",
        metrics={"accuracy": 0.85},
        parameters={"xx": "abc"},
        labels={"framework": "xgboost"},
    )

    return "my resp"

<a id="run-locally"></a>
______________________________________________

# **run locally in the notebook**

In [21]:
task = new_task(name="demo2", handler=handler, artifact_path=artifact_path).with_params(
    p1=7
)
run = run_local(task)

<a id="run-csv"></a>
______________________________________________

# **run with hyper parameters taken from a csv file**
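The contents of `params.csv` are not shown in this notebook. The sketch below assumes a common layout, with a header row of parameter names and one row per iteration, and uses invented values. It reads the rows with the standard library and applies the `max.accuracy` idea, reusing the `accuracy = p1 * 2` rule from the handler above.

```python
import csv
import io

# hypothetical file content: header = parameter names, one row per iteration
PARAMS_CSV = """p1,p2
1,a
5,b
3,c
"""


def coerce(value):
    """Convert a CSV string to int or float when possible, else keep it."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def read_param_rows(text):
    """Return one parameter dict per CSV row."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: coerce(val) for key, val in row.items()} for row in reader]


rows = read_param_rows(PARAMS_CSV)
# same rule as the notebook handler: accuracy = p1 * 2
best = max(rows, key=lambda r: r["p1"] * 2)
print(best)  # {'p1': 5, 'p2': 'b'}
```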

In [22]:
task = new_task(
    name="demo2", handler=handler, artifact_path=artifact_path
).with_param_file("params.csv", "max.accuracy")
run = run_local(task)