# ClearML Agent intro

ClearML Agent is a virtual environment and execution manager for machine learning solutions.
It focuses on:

- Reproducing experiments, including their complete environments.
- Scaling workflows on multiple target machines.

![ClearML Agent flow diagram](data/clearml_agent_flow_diagram-36ad7ef4aad58ab97a70192ef65c66f2.jpg)

## Installation
To install, run:
```shell
pip install clearml-agent
```

## Configuration
After installation, run:
```shell
clearml-agent init
```
This starts a setup wizard that collects the required credentials (ClearML, Git, file server, ...).
Alternatively, if you already have a `clearml.conf` file, you can edit it directly.

## Executing an Agent
To start an agent that listens to a queue, run:
```shell
clearml-agent daemon --queue <queue_name>
```

## Executing in Background
To execute an agent in the background, run:

```shell
clearml-agent daemon --queue <execution_queue_to_pull_from> --detached
```

## Stopping Agents
To stop an agent running in the background, run:
```shell
clearml-agent daemon <arguments> --stop
```


## Allocating Resources
To specify the GPUs associated with an agent, add the `--gpus` flag. To run multiple agents on the same machine (typically assigning a different GPU to each agent), run:
```shell
clearml-agent daemon --detached --queue default --gpus 0
clearml-agent daemon --detached --queue default --gpus 1
```

To allocate more than one GPU, provide a list of allocated GPUs:
```shell
clearml-agent daemon --gpus 0,1 --queue dual_gpu &
```
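
When several agents are started from a provisioning script, commands like the ones above can be generated instead of typed by hand. The following is a minimal sketch, assuming a hypothetical helper named `build_daemon_command` (it is not part of ClearML) that only assembles the `--detached`, `--queue`, and `--gpus` flags shown in this document:

```python
from typing import List, Optional, Sequence


def build_daemon_command(
    queues: Sequence[str],
    gpus: Optional[Sequence[int]] = None,
    detached: bool = True,
) -> List[str]:
    """Assemble a `clearml-agent daemon` argument list (hypothetical helper)."""
    if not queues:
        raise ValueError("at least one queue name is required")
    cmd = ["clearml-agent", "daemon"]
    if detached:
        cmd.append("--detached")
    cmd += ["--queue", *queues]
    if gpus:
        # GPU indices are passed as one comma-separated value, e.g. 0,1
        cmd += ["--gpus", ",".join(str(g) for g in gpus)]
    return cmd


# One single-GPU agent per device, all listening to the same queue.
for gpu in (0, 1):
    print(" ".join(build_daemon_command(["default"], gpus=[gpu])))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` on a machine where `clearml-agent` is installed and configured.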

## Queue Prioritization

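A single agent can listen to multiple queues. The queues are prioritized in the order in which they are listed: the agent first tries to pull a task from the first queue, and moves on to the next queue only if the first one is empty.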
```shell
clearml-agent daemon --detached --queue high_q low_q --gpus 0
```

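To make sure an agent pulls from all queues equally, add the `--order-fairness` flag. The agent then pulls from the queues in turn (round-robin) instead of always starting with the first one.

```shell
clearml-agent daemon --detached --queue group_a group_b --order-fairness --gpus 0
```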
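
The difference between the two pulling strategies can be illustrated with a toy simulation. This is only a sketch, not ClearML Agent's actual implementation: `pull_order` and the queue contents are made up for illustration, and the sketch assumes that strict priority means emptying the first-listed queue before touching the next one.

```python
from collections import deque
from typing import Dict, List


def pull_order(queues: Dict[str, List[str]], order_fairness: bool = False) -> List[str]:
    """Return the order in which a single agent would pull tasks (toy model)."""
    pending = [deque(tasks) for tasks in queues.values()]  # listed order = priority
    pulled: List[str] = []
    start = 0  # index of the queue that is checked first
    while any(pending):
        for offset in range(len(pending)):
            idx = (start + offset) % len(pending)
            if pending[idx]:
                pulled.append(pending[idx].popleft())
                # Fair mode: the next pull starts at the following queue.
                # Priority mode: always start again from the first queue.
                start = (idx + 1) % len(pending) if order_fairness else 0
                break
    return pulled


queues = {"high_q": ["h1", "h2"], "low_q": ["l1", "l2"]}
print(pull_order(queues))                       # strict priority
print(pull_order(queues, order_fairness=True))  # round-robin
```

With strict priority, both `high_q` tasks are pulled before any `low_q` task; with fairness enabled, the two queues alternate.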


## Docker mode

When executing the ClearML Agent in Docker mode, it will:

1. Run the provided Docker container.
2. Install ClearML Agent in the container.
3. Execute the Task in the container, and monitor the process.

ClearML Agent uses the provided default Docker container, which can be overridden from the UI.

All ClearML Agent flags (such as `--gpus` and `--foreground`) are applicable to Docker mode as well.

To execute ClearML Agent in Docker mode, run:

```shell
clearml-agent daemon --queue <execution_queue_to_pull_from> --docker [optional default docker image to use]
```

## Services mode

ClearML Agent supports a services mode in which, as soon as a task is pulled from its queue and launched, the agent moves on to the next task without waiting for the previous one to complete. This mode is intended for lightweight tasks that are mostly idle, such as periodic cleanup services, an HPO task, or a pipeline controller.

To start an agent in services mode, run:
```shell
clearml-agent daemon --services-mode --queue services --docker --cpu-only
```
To limit the number of tasks that run simultaneously in services mode, pass the maximum number immediately after the `--services-mode` option (e.g. `--services-mode 5`).

# Use case

```python
from clearml import Task, TaskTypes
```

## Clone, change and execute task on agents

### Get the base task

```python
base_task = Task.get_task(
    project_name="ClearmlStudySessions/agents",
    task_name="base task",
)
base_task.id
```

```text
'1b17da9671624d8db0e35381babb2da1'
```

### Check the parameters

```python
params = base_task.get_parameters_as_dict()
params
```

```text
{'General': {'base_model': 'MobileNetV3',
  'batch_size': '256',
  'image_size': '224',
  'lr': '0.0001'}}
```
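
Note that the values in this output are strings. If numeric values are needed for comparisons or arithmetic, they can be converted back. The following is a minimal sketch, assuming a hypothetical helper `parse_params` (not a ClearML function); the input is a literal copy of the output shown here, so the snippet runs without a ClearML server:

```python
import ast
from typing import Any, Dict


def parse_params(params: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, Any]]:
    """Convert string values to Python literals where possible (hypothetical helper)."""

    def parse(value: str) -> Any:
        try:
            return ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return value  # not a literal (e.g. a model name): keep the string

    return {
        section: {name: parse(value) for name, value in values.items()}
        for section, values in params.items()
    }


params = {"General": {"base_model": "MobileNetV3", "batch_size": "256",
                      "image_size": "224", "lr": "0.0001"}}
typed = parse_params(params)
print(typed["General"]["batch_size"] * 2)
```

`ast.literal_eval` only evaluates Python literals, so arbitrary strings such as `MobileNetV3` are left unchanged rather than executed.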

### Clone the task

```python
cloned_task = Task.clone(source_task=base_task, name="Clone of base task")
```

```python
cloned_params = cloned_task.get_parameters_as_dict()
cloned_params
```

```text
{'General': {'base_model': 'MobileNetV3',
  'batch_size': '256',
  'image_size': '224',
  'lr': '0.0001'}}
```

### Change or add parameters

```python
# Use "Section/name" to set a parameter in a specific section
cloned_task.set_parameter(name="Extra/test", value="test")

# Without a section prefix, the parameter goes into the "General" section
cloned_task.set_parameter(name="base_model", value="test")
```

```python
cloned_task.set_parameters({
    "base_model": "MobileNetV2",
    "batch_size": 128,
    "image_size": 256,
    "lr": 1e-6,
    "pooling": "avg",
})
```
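
`get_parameters_as_dict()` returns a nested dictionary, while `set_parameter` addresses a value with a `Section/name` string. The following is a small sketch of converting from the nested shape to the flat one, assuming a hypothetical helper `flatten_params` (not part of ClearML) and exactly two levels of nesting, as in the output shown earlier:

```python
from typing import Any, Dict


def flatten_params(nested: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Turn {'Section': {'name': value}} into {'Section/name': value}."""
    return {
        f"{section}/{name}": value
        for section, values in nested.items()
        for name, value in values.items()
    }


overrides = flatten_params({
    "General": {"base_model": "MobileNetV2", "batch_size": 128},
    "Extra": {"test": "test"},
})
print(overrides)
```

Each resulting key can then be passed to `cloned_task.set_parameter(name=key, value=value)`.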

### Enqueue the task to be run remotely

```python
Task.enqueue(task=cloned_task, queue_name="test")
```

```text
<tasks.EnqueueResponse: {
    "updated": 1,
    "fields": {
        "status": "queued",
        "status_reason": "",
        "status_message": "",
        "status_changed": "2023-05-31T01:41:39.292002+00:00",
        "last_update": "2023-05-31T01:41:39.292002+00:00",
        "last_change": "2023-05-31T01:41:39.292002+00:00",
        "last_changed_by": "d245343a43f4403489141b8d4663c4ff",
        "enqueue_status": "created",
        "execution.queue": "0d49c3b6f98c403aac20fa7920cac044"
    },
    "queued": 1
}>
```
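
The steps of this use case (clone, override parameters, enqueue) can be wrapped in one function. The following is a minimal sketch, assuming a hypothetical helper `clone_and_enqueue`; it relies only on the `Task.clone`, `set_parameter`, and `Task.enqueue` calls used in this walkthrough. The `task_cls` argument is an assumption added for illustration, so that the function can be exercised with a stand-in class when no ClearML server is available:

```python
from typing import Any, Dict, Optional


def clone_and_enqueue(
    base_task: Any,
    overrides: Dict[str, Any],
    queue_name: str,
    clone_name: str = "Clone of base task",
    task_cls: Optional[Any] = None,
) -> Any:
    """Clone `base_task`, apply `overrides`, and enqueue the clone (hypothetical helper)."""
    if task_cls is None:
        # Imported lazily so this module loads even where clearml is not installed.
        from clearml import Task as task_cls

    cloned = task_cls.clone(source_task=base_task, name=clone_name)
    for name, value in overrides.items():
        # Keys use the "Section/name" form, e.g. "General/lr".
        cloned.set_parameter(name=name, value=value)
    task_cls.enqueue(task=cloned, queue_name=queue_name)
    return cloned
```

With a configured ClearML environment, a call such as `clone_and_enqueue(base_task, {"General/lr": 1e-6}, queue_name="test")` would reproduce the steps of this section; an agent listening to the `test` queue then picks up the cloned task.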