# Collect and show metrics in Graphite

In this example we will learn how to collect metrics using Toloka-kit and
send them to a remote metrics server (we will use [Graphite](https://graphiteapp.org), but switching to any other solution is easy).

In [None]:
%%capture
!pip install toloka-kit==0.1.26
!pip install crowd-kit==1.0.0

import socket
import asyncio
import logging
import getpass

import toloka.metrics as metrics
import toloka.client as toloka
from toloka.metrics import MetricCollector

In [None]:
toloka_client = toloka.TolokaClient(getpass.getpass('Enter your OAuth token: '), 'PRODUCTION') # Or switch to 'SANDBOX'
print(toloka_client.get_requester())

For this example we will run the pipeline from the [Streaming pipeline example](https://github.com/Toloka/toloka-kit/tree/main/examples/6.streaming_pipelines/streaming_pipelines.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/6.streaming_pipelines/streaming_pipelines.ipynb).
If you are running this Jupyter notebook in Colab, download the necessary script with the following line of code:

In [None]:
!wget --quiet --show-progress "https://raw.githubusercontent.com/Toloka/toloka-kit/main/examples/metrics/find_items_pipeline.py"

In [None]:
from find_items_pipeline import FindItemsPipeline
pipeline = FindItemsPipeline(client=toloka_client)

Create the projects and pools needed for the pipeline:

In [None]:
pipeline.init_pipeline()

## Configuring metrics collection in Graphite

You need to [configure](https://graphite.readthedocs.io/en/stable/install.html) a Graphite server before proceeding
to this section. An easy option is to use the official Docker container. The choice of user interface is up to you
(while creating this example we used [Grafana](https://grafana.com)).

In [None]:
# specify your Graphite instance url and port
CARBON_ADDRESS = 'localhost'
CARBON_PORT = 2003

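try:
    # create_connection closes the socket when the block exits;
    # the timeout keeps the check from hanging on an unreachable host
    with socket.create_connection((CARBON_ADDRESS, CARBON_PORT), timeout=5):
        pass
except OSError as exc:
    raise RuntimeError('Graphite server is unreachable!') from exc
else:
    print('Congratulations, connected to Graphite server!')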

First, let's define a callback for handling metrics values. We'll use it to store the data on a Graphite server.

In [None]:
class GraphiteLogger:
    def __init__(self, carbon_address, carbon_port, use_ipv6=False):
        self.carbon_address = carbon_address
        self.carbon_port = carbon_port
        self.use_ipv6 = use_ipv6
        self.logger = logging.getLogger('GraphiteLogger')

    def __call__(self, metric_dict):
        if self.use_ipv6:
            s = socket.socket(socket.AF_INET6)
            address = (self.carbon_address, self.carbon_port, 0, 0)
        else:
            s = socket.socket()
            address = (self.carbon_address, self.carbon_port)

        # The context manager closes the socket even if sending fails
        with s:
            s.connect(address)
            for metric in metric_dict:
                for timestamp, value in metric_dict[metric]:
                    # One data point per line: "<metric> <value> <timestamp>"
                    s.sendall(
                        f'{metric} {value} {timestamp.timestamp()}\n'.encode()
                    )
                    self.logger.log(
                        logging.INFO,
                        f'Logged {metric} {value} {timestamp.timestamp()}'
                    )


graphite_logger = GraphiteLogger(
    CARBON_ADDRESS, CARBON_PORT,
    # specify use_ipv6=True if your Graphite server is available only via IPv6
    # (this may be the case if you are running Graphite inside Docker on macOS)
    use_ipv6=False,
)
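
To make the wire format easier to see, here is a minimal sketch that builds the same `<metric> <value> <timestamp>` lines without opening a socket. The helper name `to_plaintext_lines` and the sample payload are hypothetical, and truncating the timestamp to whole seconds is a choice of this sketch (the class above sends the float value as is).

```python
from datetime import datetime, timezone


def to_plaintext_lines(metric_dict):
    """Render collected points as lines of text, one data point per line."""
    lines = []
    for metric, points in metric_dict.items():
        for timestamp, value in points:
            # Same field order as in GraphiteLogger: name, value, Unix time
            lines.append(f'{metric} {value} {int(timestamp.timestamp())}')
    return lines


# Hypothetical payload with the same shape that GraphiteLogger iterates over
sample = {
    'sbs_pool.expenses': [
        (datetime(2022, 1, 1, 12, 0, tzinfo=timezone.utc), 0.25),
    ],
}
print(to_plaintext_lines(sample))
# ['sbs_pool.expenses 0.25 1641038400']
```

Keeping the formatting in a pure function like this can make the callback easier to check before pointing it at a real server.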

To send metrics to Graphite we have to:
- Define which metrics we'll collect.
- Describe what we'll do with these metrics, as a callable object.
- Bind a `TolokaClient` to each metric.
- Asynchronously call `run` for the `MetricCollector` instance.
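
The callable from the second step does not have to talk to Graphite. As a rough sketch, assuming the callback receives the same `{metric_name: [(timestamp, value), ...]}` mapping that `GraphiteLogger.__call__` iterates over, a hypothetical `InMemoryLogger` could simply accumulate points in a dictionary:

```python
from collections import defaultdict
from datetime import datetime, timezone


class InMemoryLogger:
    """Keeps every received point in memory instead of sending it anywhere."""

    def __init__(self):
        self.points = defaultdict(list)

    def __call__(self, metric_dict):
        # Assumed shape: {metric_name: [(timestamp, value), ...]}
        for metric, values in metric_dict.items():
            self.points[metric].extend(values)


in_memory_logger = InMemoryLogger()
now = datetime(2022, 1, 1, tzinfo=timezone.utc)
# Simulate two consecutive callback invocations with made-up values
in_memory_logger({'find_items_pool.submitted_assignments': [(now, 3)]})
in_memory_logger({'find_items_pool.submitted_assignments': [(now, 5)]})
print(len(in_memory_logger.points['find_items_pool.submitted_assignments']))
# 2
```

An object like this could be passed as `callback=` in place of `graphite_logger`, which is handy for a dry run without a metrics server.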

For this example we will collect the number of submitted assignments, the number of accepted assignments, and the total expenses for each pool. All available metrics can be found in the [documentation](https://toloka.ai/en/docs/toloka-kit/reference/toloka.metrics.metrics.BaseMetric).

In [None]:
metric_collector = MetricCollector(
    [
        # Assignments in pools. We will track submitted assignments and
        # accepted assignments counts for every pool.
        metrics.AssignmentsInPool(
            pipeline.verification_pool.id,
            submitted_name='verification_pool.submitted_assignments',
            accepted_name='verification_pool.accepted_assignments',
        ),
        metrics.AssignmentsInPool(
            pipeline.find_items_pool.id,
            submitted_name='find_items_pool.submitted_assignments',
            accepted_name='find_items_pool.accepted_assignments',
        ),
        metrics.AssignmentsInPool(
            pipeline.sbs_pool.id,
            submitted_name='sbs_pool.submitted_assignments',
            accepted_name='sbs_pool.accepted_assignments',
        ),
        # Budget spent for every pool
        metrics.SpentBudgetOnPool(
            pipeline.verification_pool.id,
            'verification_pool.expenses'
        ),
        metrics.SpentBudgetOnPool(
            pipeline.find_items_pool.id,
            'find_items_pool.expenses'
        ),
        metrics.SpentBudgetOnPool(
            pipeline.sbs_pool.id,
            'sbs_pool.expenses'
        )
    ],
    callback=graphite_logger
)

# You can specify toloka_client argument in each metric instead of calling
# bind_client if you want to use different clients for different metrics
metrics.bind_client(metric_collector.metrics, toloka_client)

## Running pipeline

Let's launch our pipeline and watch the metrics update. Metrics will be sent to the configured Graphite server.

⚠️ **Be careful**:
real projects will be created and money will be spent if you run this in the production environment! ⚠️


In [None]:
# Google Colab already runs a global event loop,
# so to run our pipeline there we apply nest_asyncio, which allows nested use of that loop
if 'google.colab' in str(get_ipython()):
    import nest_asyncio
    nest_asyncio.apply()
    asyncio.get_event_loop().run_until_complete(asyncio.gather(metric_collector.run(), pipeline.run()))
else:
    await asyncio.gather(metric_collector.run(), pipeline.run())
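
The cell above relies on `asyncio.gather` to run metric collection and the pipeline on the same event loop. The following sketch illustrates that idea with two hypothetical stand-in coroutines (`fake_collector` and `fake_pipeline`); it does not call Toloka and says nothing about how the real `run` methods are implemented.

```python
import asyncio


async def fake_collector(log, ticks=3):
    # Stand-in for metric_collector.run(): record a tick, then yield to the loop
    for i in range(ticks):
        log.append(f'collect {i}')
        await asyncio.sleep(0.01)


async def fake_pipeline(log, steps=2):
    # Stand-in for pipeline.run(): record a step, then yield to the loop
    for i in range(steps):
        log.append(f'pipeline {i}')
        await asyncio.sleep(0.01)


async def main():
    log = []
    # gather schedules both coroutines on the same loop and waits for both
    await asyncio.gather(fake_collector(log), fake_pipeline(log))
    return log
```

In a notebook cell this can be tried with `log = await main()`; in a plain script, `asyncio.run(main())` does the same. The entries from both coroutines end up interleaved in `log`, which is why metrics can be collected while the pipeline is still working.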

Here is an example of metrics displayed in Grafana, with Graphite as the data source, after the pipeline has completed.

<table  align="center">
  <tr><td>
    <img src="./img/grafana_metrics.png" width="1000">
  </td></tr>
  <tr><td align="center">
    <b>Figure 2.</b> Grafana web view.
  </td></tr>
</table>

## Using Graphite in production

In normal usage it's better to gather metrics from Toloka once every ten minutes or less often, so you need to prepare your Graphite configuration for that. Typically it already has a `count` aggregation rule that looks like this:
```
    [count]
    pattern = \.count$
    xFilesFactor = 0
    aggregationMethod = sum
```


It means that every new metric whose name ends with `.count` is aggregated by summing its values whenever Graphite needs to roll the metric up over some interval.
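
To check which metric names such a rule would pick up, the pattern can be tried in Python. This is only a sketch: it assumes the pattern is applied as a regular-expression search against the full metric name, and the helper `uses_count_rule` is hypothetical.

```python
import re

# The same expression as in the [count] section
COUNT_PATTERN = re.compile(r'\.count$')


def uses_count_rule(metric_name):
    """Return True if the metric name ends with a literal '.count'."""
    return COUNT_PATTERN.search(metric_name) is not None


print(uses_count_rule('find_items_pool.submitted_assignments.count'))  # True
print(uses_count_rule('find_items_pool.submitted_assignments'))        # False
```

Note that the metric names used earlier in this example do not end with `.count`, so under this assumption they would not fall under the rule unless they are renamed.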


But for metrics that cannot be summed, for example a completion percentage, there is no suitable rule by default. So you need to add one to `storage-aggregation.conf`:
```
    [metric]
    pattern = _metric$
    xFilesFactor = 0
    aggregationMethod = average
```

It means that if you send Graphite a metric whose name ends with `_metric`, it will be aggregated as an average over any interval.
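
The difference between the two aggregation methods is easiest to see on numbers. Below is a simplified model of a roll-up, not Graphite's actual code: the function `roll_up`, its handling of missing points, and the sample values are all assumptions made for illustration.

```python
def roll_up(values, method, x_files_factor=0.0):
    """Collapse several fine-grained points into one coarse point."""
    known = [v for v in values if v is not None]
    # With no known points, or too few of them, the coarse point stays empty
    if not known or len(known) / len(values) < x_files_factor:
        return None
    if method == 'sum':
        return sum(known)
    if method == 'average':
        return sum(known) / len(known)
    raise ValueError(f'unsupported method: {method}')


# Six hypothetical 10-minute points that fall into one hour (None = no data)
new_assignments = [4, 0, 7, None, 2, 5]          # counter-like metric
completion_percent = [10, 20, 30, None, 50, 60]  # percentage-like metric

print(roll_up(new_assignments, 'sum'))         # 18
print(roll_up(completion_percent, 'average'))  # 34.0
print(roll_up(completion_percent, 'sum'))      # 170
```

Summing the counter gives a meaningful hourly total, while summing the percentage gives 170, which is not a valid percentage. In this model, `x_files_factor=0` means the roll-up is computed even when most of the fine-grained points are missing.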

You also need to set up the right retention for this metric in `storage-schemas.conf`, for example:
```
    [metric]
    pattern = _metric$
    retentions = 10m:7d,1h:360d
```
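
The retention string above reads as "one point per 10 minutes for 7 days, then one point per hour for 360 days". The first precision matches the ten-minute collection interval mentioned earlier. As a small sketch, the hypothetical helper `parse_retentions` below converts such a string into seconds per point and number of points per archive; it handles only the simple `precision:duration` form with the unit suffixes listed in `UNIT_SECONDS`.

```python
UNIT_SECONDS = {'s': 1, 'm': 60, 'h': 3600, 'd': 86400, 'w': 604800, 'y': 31536000}


def to_seconds(spec):
    """Convert a value such as '10m' or '7d' to seconds."""
    return int(spec[:-1]) * UNIT_SECONDS[spec[-1]]


def parse_retentions(retentions):
    """Return (seconds_per_point, number_of_points) for every archive."""
    archives = []
    for archive in retentions.split(','):
        precision, duration = archive.strip().split(':')
        step = to_seconds(precision)
        archives.append((step, to_seconds(duration) // step))
    return archives


print(parse_retentions('10m:7d,1h:360d'))
# [(600, 1008), (3600, 8640)]
```

So with this schema each metric keeps 1008 ten-minute points and 8640 hourly points.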