This example illustrates how crowdsourcing using Toloka can be made easier and cheaper by integrating an ML model (which we refer to as an autohelper) into the usual pipeline. Furthermore, it shows how to run the whole project in the cloud using [Prefect](https://www.prefect.io/), which makes workflow orchestration much simpler.

The main steps are:
* setting up Prefect
* getting predictions using ML
* evaluating predictions' quality
* sending tasks whose prediction confidence falls below a certain threshold to Toloka users
* aggregating the results

Such a process leads to better quality and reduces costs by cutting the number of tasks that need manual labeling.

## Setting up Prefect

Prefect is a workflow management system, designed for modern infrastructure and powered by the open-source Prefect Core workflow engine.

Prefect offers many options for workflow management. We'll use its [cloud-based service](https://www.prefect.io/cloud/) for orchestration and run the examples on a local machine with local storage. What follows is a quick setup guide (for more detailed information, refer to [this material](https://docs.prefect.io/orchestration/getting-started/quick-start.html)).

First, let's make sure Prefect is installed.

In [None]:
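# this example uses the Prefect 1.x API (`prefect backend`, `Flow`, `LocalResult`),
# which was removed in Prefect 2, so pin a 1.x version
!pip install "prefect<2"
# !conda install -c conda-forge "prefect<2"
# !pipenv install "prefect<2"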

To use Prefect Cloud, we'll need to log in to (or create an account for) Prefect Cloud at https://cloud.prefect.io. Once that's done, let's set the backend to Prefect Cloud.

In [None]:
!prefect backend cloud

Next, we'll need to authenticate with the backend: follow [these instructions](https://docs.prefect.io/orchestration/getting-started/set-up.html#authenticate-with-prefect-cloud) to do that, then enter your key in the next cell.

In [None]:
import getpass

# getpass hides the key while you type it
YOUR_KEY = getpass.getpass('Prefect API key: ')
!prefect auth login --key $YOUR_KEY

All that remains is to create a project and start an agent that will run Prefect flows on the local machine.
The Prefect agent is responsible for starting and monitoring flow runs.

In [None]:
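PROJECT_NAME = input()
# PROJECT_NAME = 'Toloka test project 1'
# quote the name: it may contain spaces
!prefect create project "$PROJECT_NAME"
# the agent keeps running until stopped and blocks this cell;
# start it from a separate terminal if you want to keep using this notebook
!prefect agent local start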

Prefect uses an abstraction called [Executor](https://docs.prefect.io/api/latest/executors.html#executor) to run tasks. The executor is local by default, but Prefect also [natively supports](https://docs.prefect.io/orchestration/flow_config/executors.html#daskexecutor) Dask. Other storage [types](https://docs.prefect.io/orchestration/flow_config/storage.html) and agent [options](https://docs.prefect.io/orchestration/agents/overview.html) are also supported, but we'll keep everything local for simplicity.

## Writing code

For the project, we have a set of customer reviews, and we need to classify them as “Positive” or “Negative”. We ask performers to read a review and decide which category it belongs to.
For more details, refer to the official Toloka-Kit [example](https://github.com/Toloka/toloka-kit/blob/main/examples/5.nlp/sentiment_analysis/sentiment_analysis.ipynb) on which this project is based.

### Call to action
If you find a bug or have an idea for a new feature, don't hesitate to [open a new issue on GitHub](https://github.com/Toloka/toloka-kit/issues/new/choose).
Like our library and examples? Star [our repo on GitHub](https://github.com/Toloka/toloka-kit).

Prepare the environment and import everything we'll need.

In [182]:
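%%capture
!pip install toloka-kit==0.1.26
!pip install crowd-kit==1.0.0
# the autohelper model below needs PyTorch and Hugging Face Transformers
!pip install torch transformers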

import datetime
import requests
import os
import time
import getpass
from typing import List, Tuple

import numpy as np
import pandas as pd

import toloka.client as toloka
from toloka.client import Pool, Project, TolokaClient
from toloka.client.analytics_request import CompletionPercentagePoolAnalytics
from crowdkit.aggregation import DawidSkene

import prefect
from prefect import Flow, task
from prefect.engine.results import LocalResult

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

Set up a helper that loads the JSON configs for the project and the pool from GitHub.

In [102]:
GITHUB_RAW = 'https://raw.githubusercontent.com'
GITHUB_BASE_PATH = 'Toloka/toloka-kit/main/examples/9.toloka_and_ml_on_prefect/configs'


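def _load_json_from_github(filename: str) -> dict:
    # join URL parts with '/' explicitly: os.path.join would insert backslashes on Windows
    url = '/'.join([GITHUB_RAW, GITHUB_BASE_PATH, filename])
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()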

Now we can start building the project.
Prefect refers to each step as a [*task*](https://docs.prefect.io/core/about_prefect/thinking-prefectly.html#tasks). Put simply, a task is a Python function representing a logically distinct stage of a process.
This example is split into separate tasks for project and pool creation, data preparation, and task processing by the autohelper and by Toloka. Most tasks receive the Toloka API token and the environment name and create their own Toloka client from them; this avoids the difficulties of sharing a client object between tasks.
Some tasks also specify that `print()` output should be sent to the Prefect Cloud logs, which makes debugging easier: `@task(log_stdout=True)`.

Let's create all the necessary building blocks for our flow.
First, create a project.

In [103]:
@task
def create_project(token: str, env: str) -> str:
    client = TolokaClient(token, env)
    project = Project.structure(_load_json_from_github('project.json'))
    project = client.create_project(project)
    return project.id

Create a skill and a pool that uses it for quality control.

In [184]:
@task
def create_skill(token: str, env: str, name='sentiment-analysis') -> str:
    client = TolokaClient(token, env)
    skill = next(client.get_skills(name=name), None) or client.create_skill(name=name)
    return skill.id

@task
def create_pool(token: str, env: str, project_id: str, skill_id: str, reward: float) -> str:
    client = TolokaClient(token, env)
    pool = Pool.structure(_load_json_from_github('pool.json'))
    pool.project_id = project_id
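    # admit performers who have no skill value yet or whose skill is at least 90
    # (toloka-kit overloads `==` here, so `is None` would not build a filter)
    skill_filter = (toloka.filter.Skill(skill_id) == None) | (toloka.filter.Skill(skill_id) >= 90)
    pool.set_filter(pool.filter & skill_filter)
    # after more than 4 golden-task answers, set the skill to the performer's
    # percentage of correct answers over their last 10 golden tasks
    pool.quality_control.add_action(
        collector=toloka.collectors.GoldenSet(history_size=10),
        conditions=[toloka.conditions.TotalAnswersCount > 4],
        action=toloka.actions.SetSkillFromOutputField(skill_id=skill_id, from_field='correct_answers_rate')
    )
    pool.reward_per_assignment = reward
    pool.will_expire = datetime.datetime.utcnow() + datetime.timedelta(days=7)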
    pool = client.create_pool(pool)
    return pool.id
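To make the pool's access rule concrete, here is a plain-Python sketch of how we read the filter and the golden-set action above. It is a simplified model under our own assumptions, not how Toloka evaluates rules internally; `can_access_pool`, `updated_skill`, and the constants are hypothetical names.

```python
from typing import List, Optional

MIN_SKILL = 90          # Skill(skill_id) >= 90 in the pool filter
MIN_GOLDEN_ANSWERS = 5  # TotalAnswersCount > 4
HISTORY_SIZE = 10       # GoldenSet(history_size=10)


def can_access_pool(skill: Optional[float]) -> bool:
    """Mirror of (Skill == None) | (Skill >= 90): newcomers and good performers pass."""
    return skill is None or skill >= MIN_SKILL


def updated_skill(golden_history: List[bool], current: Optional[float]) -> Optional[float]:
    """Skill value after the golden-set rule is checked (simplified model).

    golden_history holds the correctness of a performer's golden answers, oldest first.
    """
    if len(golden_history) < MIN_GOLDEN_ANSWERS:
        return current  # not enough golden answers yet: the rule does not fire
    recent = golden_history[-HISTORY_SIZE:]  # only the last HISTORY_SIZE answers count
    return 100 * sum(recent) / len(recent)   # correct_answers_rate, in percent
```

For example, under this model a newcomer who answers 4 of their first 5 golden tasks correctly gets a skill of 80 and would no longer pass the pool filter.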

We will use the [Grammar and Online Product Reviews](https://data.world/datafiniti/grammar-and-online-product-reviews) dataset, distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.


[![CC BY-NC-SA 4.0](https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/)

Download the data and split it into regular and golden tasks.
We'll use `cnt_tasks` regular tasks and `cnt_golden` golden tasks, each split evenly between positive and negative reviews.

In [104]:
@task(nout=2, log_stdout=True)
def prepare_dataset(
    dataset_url: str,
    cnt_tasks: int,
    cnt_golden: int,
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    dataset = pd.read_csv(dataset_url)
    print(f'Initial dataset size: {len(dataset)}')

    dataset = dataset[['reviews.text', 'reviews.doRecommend']].dropna().reset_index(drop=True)
    dataset = dataset.replace({'reviews.doRecommend': {True: 'pos', False: 'neg'}})

    positive_tasks = dataset[dataset['reviews.doRecommend'] == 'pos']
    negative_tasks = dataset[dataset['reviews.doRecommend'] == 'neg']
    print(f'positive count: {len(positive_tasks)}. negative count: {len(negative_tasks)}')

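    # the first cnt_tasks // 2 reviews of each class become regular tasks,
    # the next cnt_golden // 2 become golden tasks
    slice_tasks = cnt_tasks // 2
    slice_golden = slice_tasks + cnt_golden // 2
    # positional slicing with iloc (np.split on a DataFrame is deprecated in recent pandas)
    pos_task_dataset = positive_tasks.iloc[:slice_tasks]
    pos_golden_dataset = positive_tasks.iloc[slice_tasks:slice_golden]
    neg_task_dataset = negative_tasks.iloc[:slice_tasks]
    neg_golden_dataset = negative_tasks.iloc[slice_tasks:slice_golden]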

    task_dataset = pd.concat([pos_task_dataset, neg_task_dataset])
    golden_dataset = pd.concat([pos_golden_dataset, neg_golden_dataset])

    return task_dataset, golden_dataset
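The slicing in `prepare_dataset` takes the first `cnt_tasks // 2` reviews of each class as regular tasks and the next `cnt_golden // 2` as golden tasks, so the two sets never overlap. A minimal pure-Python sketch of the same rule on plain lists (`balanced_split` is a hypothetical helper, not part of the pipeline):

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar('T')


def balanced_split(positive: Sequence[T], negative: Sequence[T],
                   cnt_tasks: int, cnt_golden: int) -> Tuple[List[T], List[T]]:
    """Split two class-specific lists into disjoint regular and golden sets, half from each class."""
    slice_tasks = cnt_tasks // 2
    slice_golden = slice_tasks + cnt_golden // 2
    tasks = list(positive[:slice_tasks]) + list(negative[:slice_tasks])
    golden = list(positive[slice_tasks:slice_golden]) + list(negative[slice_tasks:slice_golden])
    return tasks, golden
```

Because of the integer division, odd counts are rounded down: `cnt_tasks=5` yields 4 regular tasks.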

Create a function that loads the ML model and tokenizer; the model will serve as the autohelper in our project. We'll use a ready-made model from [Hugging Face](https://huggingface.co/), namely [DistilBERT fine-tuned on SST-2](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).

In [105]:
def _get_resources(model_name: str) -> Tuple[AutoModelForSequenceClassification, AutoTokenizer]:
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return model, tokenizer

Set up a function to get autohelper predictions.
We'll use the confidence scores later to decide whether to trust the autohelper's answer or to send the task to Toloka.

In [106]:
# batch is a list of review texts
def _make_predictions(batch: List[str], model: AutoModelForSequenceClassification, tokenizer: AutoTokenizer):
    encoded = tokenizer(batch, padding=True, truncation=True, return_tensors='pt')
    print('start apply...')
    # inference only: disable gradient tracking to save memory
    with torch.no_grad():
        outputs = model(**encoded)
    print('apply done')
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1).numpy()
    # predictions is an array of pairs: confidence scores for class 0 ('neg') and class 1 ('pos'),
    # so predictions[idx, 1] > 0.5 means the model assigns review idx to 'pos'
    labels = np.where(predictions[:, 1] > 0.5, 'pos', 'neg')
    confidence = predictions.max(axis=1)
    return labels, confidence
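For a single review, the softmax-to-label step in `_make_predictions` can be sketched in plain Python as follows. The helpers are hypothetical; the class order `[neg, pos]` matches the comment in the code above.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_and_confidence(logits: List[float]) -> Tuple[str, float]:
    probs = softmax(logits)  # [P(neg), P(pos)]
    label = 'pos' if probs[1] > 0.5 else 'neg'
    return label, max(probs)
```

With two classes the winning probability is always at least 0.5, so any useful confidence threshold lies between 0.5 and 1.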

Create a function that decides which tasks got adequate answers from the autohelper (`accepted_tasks`) and which should be sent to Toloka (`manual_tasks`).
We assume that if the model's confidence is below a certain threshold, the task should be given to Toloka users. For simplicity, the threshold is the 90th percentile of the confidence scores on golden tasks the autohelper got wrong (a robust enough estimate for our case); if it answers every golden task correctly, the threshold defaults to 0.95. Golden tasks make this possible because we know their correct responses.

In [120]:
@task(nout=2, log_stdout=True)
def apply(
    model_name: str,
    task_dataset: pd.DataFrame,
    golden_dataset: pd.DataFrame,
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    model, tokenizer = _get_resources(model_name)
    print(f'Model loaded: {model_name}')

    # find the threshold using golden tasks:
    golden_items = golden_dataset['reviews.text'].tolist()
    autohelper_golden_labels, golden_confidence = _make_predictions(golden_items, model, tokenizer)
    # extract true answers
    true_golden_labels = golden_dataset['reviews.doRecommend'].to_numpy()
    # find wrong answers
    wrong_answers_mask = true_golden_labels != autohelper_golden_labels
    # set threshold to 90th percentile of confidence scores when the model was wrong
    # or if the model got all answers right, then set it to 0.95
    if wrong_answers_mask.any():
        threshold = np.percentile(golden_confidence[wrong_answers_mask], 90)
    else:
        threshold = 0.95

    # find elements where we think the model is likely to predict the right answer
    nongolden_items = task_dataset['reviews.text'].tolist()
    nongolden_labels, nongolden_confidence = _make_predictions(nongolden_items, model, tokenizer)
    accepted_solutions_mask = nongolden_confidence > threshold

    # make a dataframe from the answers which we accepted
    accepted_tasks = pd.DataFrame({
        'review': task_dataset[accepted_solutions_mask]['reviews.text'],
        'sentiment': nongolden_labels[accepted_solutions_mask]
    })
    manual_tasks = task_dataset[~accepted_solutions_mask]

    print(f'accepted_tasks count: {len(accepted_tasks)}')
    print(f'manual_tasks count: {len(manual_tasks)}')

    return accepted_tasks, manual_tasks
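The threshold rule in `apply` can be sketched without NumPy or pandas. `percentile` below reimplements the linear interpolation that `np.percentile` uses by default; `confidence_threshold` and `route` are hypothetical helpers mirroring the logic above, shown on plain lists.

```python
import math
from typing import List, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile with linear interpolation (the default rule of np.percentile)."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def confidence_threshold(true_labels: Sequence[str], predicted: Sequence[str],
                         confidence: Sequence[float], q: float = 90.0,
                         default: float = 0.95) -> float:
    """q-th percentile of confidences on wrongly answered golden tasks, or a default."""
    wrong = [c for t, p, c in zip(true_labels, predicted, confidence) if t != p]
    return percentile(wrong, q) if wrong else default


def route(confidence: Sequence[float], threshold: float) -> Tuple[List[int], List[int]]:
    """Indices of tasks to accept from the autohelper and to send to Toloka."""
    accepted = [i for i, c in enumerate(confidence) if c > threshold]
    manual = [i for i, c in enumerate(confidence) if c <= threshold]
    return accepted, manual
```

Note that `apply` accepts only answers with confidence strictly above the threshold, so a task whose confidence equals the threshold goes to Toloka.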

Send the golden tasks and the non-golden tasks that need manual labeling to Toloka.

In [108]:
@task
def send_to_toloka(
    token: str,
    env: str,
    pool_id: str,
    golden_dataset: pd.DataFrame,
    manual_tasks: pd.DataFrame,
) -> None:
    client = TolokaClient(token, env)

    golden_tasks = [
        toloka.Task(
            pool_id=pool_id,
            input_values={'review': row['reviews.text']},
            known_solutions = [{'output_values': {'sentiment': row['reviews.doRecommend']}}],
            infinite_overlap=True,
        )
        for _, row in golden_dataset.iterrows()
    ]

    tasks = [
        toloka.Task(pool_id=pool_id, input_values={'review': review})
        for review in manual_tasks['reviews.text']
    ]

    client.create_tasks(golden_tasks + tasks, allow_defaults=True, open_pool=True)

Create a task that waits for the pool to close, printing its completion percentage every minute.

In [109]:
@task(log_stdout=True)
def wait_pool_for_close(token: str, env: str, pool_id: str) -> None:
    client = TolokaClient(token, env)

    while True:
        pool = client.get_pool(pool_id)
        if pool.is_closed():
            print(f'Pool {pool_id} is closed.')
            return
        op = client.get_analytics([CompletionPercentagePoolAnalytics(subject_id=pool_id)])
        percentage = client.wait_operation(op).details['value'][0]['result']['value']
        print(f'   {datetime.datetime.now().strftime("%H:%M:%S")}\t'
              f'Pool {pool_id} - {percentage}%')
        time.sleep(60)
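`wait_pool_for_close` relies on the pool eventually closing (it expires after 7 days, per `will_expire` in `create_pool`). If you prefer an explicit upper bound in the flow itself, for example while testing in the sandbox, the same polling pattern can be written with a timeout. The helper below is hypothetical (not part of toloka-kit or Prefect); `sleep` and `clock` are injectable so it can be tested without actually waiting.

```python
import time
from typing import Callable


def wait_until(is_done: Callable[[], bool], interval: float = 60.0,
               timeout: float = 7 * 24 * 3600,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll is_done() every `interval` seconds.

    Returns True as soon as it succeeds, or False once `timeout` seconds have passed.
    """
    deadline = clock() + timeout
    while True:
        if is_done():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Inside the task, `is_done` could be `lambda: client.get_pool(pool_id).is_closed()`, and the task could raise an error when `wait_until` returns `False`.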

Create a task for processing Toloka responses and combining them with the autohelper's answers.

We'll run aggregation using the Dawid-Skene model.

We use this aggregation model because our questions are of the same difficulty, and we don't have many control tasks.

Read more about the Dawid-Skene model in the Requester’s Guide or get an overview of different aggregation models in our Knowledge Base.
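Dawid-Skene estimates a confusion matrix for every performer and uses it to weight their answers, alternating between the two estimates with expectation-maximization. For intuition, here is a simplified, dependency-free sketch of that EM loop. It is not crowd-kit's implementation: the majority-vote initialization and the additive `smoothing` constant are choices made for this sketch, and the function name is hypothetical. Answers are `(task, worker, label)` tuples, mirroring the `task`, `worker`, and `label` columns prepared in `collect_results`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Answer = Tuple[str, str, str]  # (task, worker, label)


def dawid_skene(answers: List[Answer], n_iter: int = 20, smoothing: float = 0.01) -> Dict[str, str]:
    """Simplified Dawid-Skene EM: returns the most probable label for each task."""
    labels = sorted({lbl for _, _, lbl in answers})
    tasks = sorted({task for task, _, _ in answers})

    # initialise the posteriors P(true label | answers) with majority-vote shares
    posteriors = {t: {lbl: 0.0 for lbl in labels} for t in tasks}
    for task, _, lbl in answers:
        posteriors[task][lbl] += 1.0
    for t in tasks:
        total = sum(posteriors[t].values())
        posteriors[t] = {lbl: v / total for lbl, v in posteriors[t].items()}

    for _ in range(n_iter):
        # M-step: class priors and a smoothed confusion matrix per worker,
        # confusion[worker][true][observed] = P(worker answers observed | true label)
        priors = {lbl: sum(posteriors[t][lbl] for t in tasks) / len(tasks) for lbl in labels}
        counts = defaultdict(lambda: {true: {obs: smoothing for obs in labels} for true in labels})
        for task, worker, lbl in answers:
            for true in labels:
                counts[worker][true][lbl] += posteriors[task][true]
        confusion = {
            w: {true: {obs: c / sum(row.values()) for obs, c in row.items()} for true, row in m.items()}
            for w, m in counts.items()
        }
        # E-step: posterior is proportional to prior times each worker's answer likelihood
        scores = {t: dict(priors) for t in tasks}
        for task, worker, lbl in answers:
            for true in labels:
                scores[task][true] *= confusion[worker][true][lbl]
        for t in tasks:
            total = sum(scores[t].values())
            posteriors[t] = {lbl: v / total for lbl, v in scores[t].items()}

    return {t: max(posteriors[t], key=posteriors[t].get) for t in tasks}
```

On large pools, multiplying many probabilities can underflow; a production implementation would work in log space.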


To save the data, we'll use Prefect's [output persistence option](https://docs.prefect.io/core/concepts/persistence.html#persisting-output): we set the task's `checkpoint` flag to `True` and specify where the pickled result is stored using Prefect's `LocalResult` (`dir` is the directory for the result and `location` is the file name, so the file's relative path will be `./prefect_results/sentiments`).

In [183]:
@task(
    log_stdout=True,
    checkpoint=True,
    result=LocalResult(dir="./prefect_results", location='sentiments')
)
def collect_results(
    token: str,
    env: str,
    pool_id: str,
    autohelper_results: pd.DataFrame
) -> pd.DataFrame:
    client = TolokaClient(token, env)

    toloka_answers_df = client.get_assignments_df(pool_id)
    # Drop golden tasks
    toloka_answers_df = toloka_answers_df[toloka_answers_df['GOLDEN:sentiment'].isna()]
    # Prepare DataFrame for aggregation
    toloka_answers_df = toloka_answers_df.rename(columns={
        'INPUT:review': 'task',
        'OUTPUT:sentiment': 'label',
        'ASSIGNMENT:worker_id': 'worker'
    })

    print(f'Toloka answers count: {len(toloka_answers_df)}')

    toloka_predicted_answers = DawidSkene(n_iter=20).fit_predict(toloka_answers_df)
    toloka_results = pd.DataFrame({
        'review': toloka_predicted_answers.index,
        'sentiment': toloka_predicted_answers.values
    })

    answers = pd.concat([autohelper_results, toloka_results])

    return answers
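Roughly speaking, `LocalResult` writes the serialized return value to the file `os.path.join(dir, location)`. Below is a stdlib sketch of that round trip with hypothetical `persist_result` and `load_result` helpers; it uses `pickle` rather than `cloudpickle`, an assumption made to keep the example dependency-free.

```python
import os
import pickle
from typing import Any


def persist_result(value: Any, result_dir: str, location: str) -> str:
    """Write `value` to <result_dir>/<location>, roughly what a LocalResult checkpoint does."""
    os.makedirs(result_dir, exist_ok=True)
    path = os.path.join(result_dir, location)
    with open(path, 'wb') as f:
        pickle.dump(value, f)
    return path


def load_result(path: str) -> Any:
    """Read a pickled result back from disk."""
    with open(path, 'rb') as f:
        return pickle.load(f)
```

With `result_dir='./prefect_results'` and `location='sentiments'`, the path matches the one used in the flow above.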

Now we can finally set up our flow. We'll use Prefect's [Parameters](https://docs.prefect.io/core/concepts/parameters.html) to pass the Toloka API token to the flow at run time instead of hardcoding it, and to choose the environment (`SANDBOX` or `PRODUCTION`).

In [None]:
with Flow('ML assisted pipeline example') as flow:
    # Toloka API token
    token = prefect.Parameter('token')
    # project environment
    env = prefect.Parameter('env')

    DATASET_URL = 'https://tlk.s3.yandex.net/ext_dataset/datafiniti_grammar_and_online_product_reviews.csv'
    model_name = 'distilbert-base-uncased-finetuned-sst-2-english'

    project_id = create_project(token, env)
    skill_id = create_skill(token, env)
    pool_id = create_pool(token, env, project_id, skill_id, reward=0.03)

    task_dataset, golden_dataset = prepare_dataset(DATASET_URL, cnt_tasks=200, cnt_golden=20)
    accepted_tasks, manual_tasks = apply(model_name, task_dataset, golden_dataset)

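    sent = send_to_toloka(token, env, pool_id, golden_dataset, manual_tasks)
    # upstream_tasks adds ordering-only dependencies: wait for the pool only after
    # the tasks are uploaded, and collect results only after the pool is closed
    pooling = wait_pool_for_close(token, env, pool_id, upstream_tasks=[sent])

    collect_results(token, env, pool_id, accepted_tasks, upstream_tasks=[pooling])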

# register the flow with the project we've created in the beginning
flow.register(project_name=PROJECT_NAME)

Open the link in the last cell's output; it leads to the Prefect Cloud UI.
<img src="https://raw.githubusercontent.com/Toloka/toloka-kit/main/examples/9.toloka_and_ml_on_prefect/images/tabs.png" alt="Tabs" width="800">

Click the *SETTINGS* tab and turn *Heartbeat* off. Running tasks send *heartbeats* at regular intervals; this is how Prefect protects against zombie tasks (more info [here](https://docs.prefect.io/orchestration/concepts/services.html#zombie-killer)). In our case, however, Toloka users may take a long time to complete the pool, and Prefect may start treating the long-running *pooling* task as a zombie.
<img src="https://raw.githubusercontent.com/Toloka/toloka-kit/main/examples/9.toloka_and_ml_on_prefect/images/heartbeat.png" alt="Heartbeat" width="800">

Next, navigate to the *RUN* tab, enter the *env* and *token* parameters, and click the run button at the bottom of the page.
<img src="https://raw.githubusercontent.com/Toloka/toloka-kit/main/examples/9.toloka_and_ml_on_prefect/images/parameters.png" alt="Parameters" width="800">

You can also use the Cloud UI to inspect the flow's progress, see its structure (the *SCHEMATIC* tab), view the logs (choose the necessary *run* in the *RUNS* tab and select the *LOGS* tab) and many other things.

## Viewing the results

Let's unpickle the data we've saved and view it (by default, Prefect uses `cloudpickle` for data serialization).

In [188]:
import cloudpickle

FILEPATH = './prefect_results/sentiments'
with open(FILEPATH, 'rb') as file:
    results = cloudpickle.loads(file.read())

In [197]:
results.sample(10)

| | review | sentiment |
|---|---|---|
| 24 | Great product! Exactly what it says works very... | pos |
| 49 | This cream did not do much for my face or thro... | neg |
| 6 | Got as a surprise for my husband there is noth... | neg |
| 53 | I've been using this product for years and it ... | neg |
| 13 | I bought this to try to spice things up, but I... | neg |
| 9 | Bought this to enhance our time a bit, did abs... | pos |
| 22 | Exceptional product, this is smooth, not slimy... | pos |
| 27 | You will LOVE this lotion. I smile every time ... | pos |
| 14 | I bought this because it had better reviews th... | neg |
| 52 | I am so disappointed! I have used this product... | neg |
