# AutoQuality

This example shows how to use the `toloka.autoquality` module. AutoQuality is a tool that helps you set up quality control for a Toloka project. It uses random search to find the best set of quality control parameters: every parameter has its own distribution, and AutoQuality creates several pools with different sampled parameter values and compares them. You can modify both the distributions and the optimality criteria.
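
As a rough illustration of the random-search idea (not AutoQuality's actual implementation), the sketch below samples a few configurations from per-parameter distributions, scores each one, and keeps the best. The parameter ranges and the `toy_score` function are hypothetical stand-ins; in AutoQuality the scores come from running real pools.

```python
import random

# Hypothetical search space: each parameter has its own sampling rule.
SEARCH_SPACE = {
    'overlap': lambda rng: rng.choice([2, 3, 4]),
    'incorrect_answers_rate': lambda rng: rng.uniform(30.0, 90.0),
}


def toy_score(params):
    """Stand-in objective; AutoQuality would measure a real pool instead."""
    return params['overlap'] - abs(params['incorrect_answers_rate'] - 60.0) / 10.0


def random_search(n_iter, seed=0):
    """Sample n_iter configurations and return (best, all_candidates)."""
    rng = random.Random(seed)
    candidates = [
        {name: sample(rng) for name, sample in SEARCH_SPACE.items()}
        for _ in range(n_iter)
    ]
    return max(candidates, key=toy_score), candidates


best, candidates = random_search(n_iter=10)
print(best)
```

Each candidate here plays the role of one autoquality pool: it is an independent draw, so the quality of the result depends on how many configurations are tried.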

In [None]:
!pip install pandas
!pip install toloka-kit[autoquality]==0.1.26

In [1]:
import logging
import sys

logging.basicConfig(
    format='[%(levelname)s] %(name)s: %(message)s',
    level=logging.INFO,
    stream=sys.stdout,
)

In [None]:
import toloka.client as toloka
import toloka.client.project.template_builder as tb
from toloka.autoquality import AutoQuality

import datetime
import numpy as np
import os
import requests
import pandas as pd
from tqdm import tqdm

In this example, our task is sentiment analysis of movie reviews. We will use the [Large Movie Review Dataset](https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews):

In [3]:
N_ROWS = 1000

def sample_stratified(df, label_column, n_rows):
    """Sample n_rows from a dataframe while preserving the class distribution."""
    return df.groupby(label_column, group_keys=False) \
            .apply(lambda x: x.sample(int(np.rint(n_rows*len(x)/len(df))))) \
            .sample(frac=1)

base_url = 'https://tlk.s3.yandex.net/ext_dataset/aclImdb'
# Build the URL with '/' explicitly: os.path.join is meant for filesystem paths.
df = pd.read_csv(f'{base_url}/test.csv')
df_control = sample_stratified(df, 'label', n_rows=1000)
df = df.drop(df_control.index)
df = sample_stratified(df, 'label', n_rows=N_ROWS)

df_control = df_control.reset_index(drop=True)
df = df.reset_index(drop=True)
df.head()

Unnamed: 0,path,text,label
0,test/neg/8744_2.txt,"This joins the endless line of corny, predicta...",neg
1,test/pos/6011_10.txt,Swift's writing really has more in common with...,pos
2,test/pos/9149_8.txt,This film is a good start for novices that hav...,pos
3,test/pos/6504_9.txt,"Wonderfully funny, awe-inspiring feature on th...",pos
4,test/neg/970_4.txt,Eddy Murphy and Robert De Niro should be a com...,neg


In [4]:
df.label.value_counts()

neg    500
pos    500
Name: label, dtype: int64
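
The per-class sample size in `sample_stratified` is `n_rows * len(x) / len(df)` rounded to the nearest integer. A minimal pure-Python sketch of that arithmetic (the helper name `stratified_quotas` and the class counts are made up for illustration) shows that class proportions are preserved, and that rounding can make the total differ slightly from `n_rows`:

```python
def stratified_quotas(class_counts, n_rows):
    """Rows to draw per class so the sample keeps the class proportions."""
    total = sum(class_counts.values())
    # round() uses round-half-to-even, the same rule as np.rint.
    return {label: round(n_rows * count / total)
            for label, count in class_counts.items()}


# Hypothetical imbalanced dataset: the 70/30 split is preserved.
imbalanced = stratified_quotas({'pos': 700, 'neg': 300}, n_rows=100)
print(imbalanced)

# With three equal classes the quotas sum to 99, not 100.
three_way = stratified_quotas({'a': 100, 'b': 100, 'c': 100}, n_rows=100)
print(three_way, sum(three_way.values()))
```

With two balanced classes, as in the output above, the quotas are exact and the sample contains 500 reviews per label.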

In [None]:
def load_texts(urls):
    texts = []
    for url in tqdm(urls):
        resp = requests.get(url)
        texts.append(resp.text)
    return texts

df['text'] = load_texts(base_url + '/' + df.path)
df_control['text'] = load_texts(base_url + '/' + df_control.path)

## Project setup

Let's create an appropriate Toloka project. AutoQuality requires a training pool and a base pool. The base pool should be set up like the regular pools you will be running. AutoQuality will clone this pool and change its quality control settings to explore different configurations.


In [None]:
token = input("Enter your token:")
toloka_client = toloka.TolokaClient(token, 'PRODUCTION')

In [7]:
project = toloka.Project(
    public_name='Movie review classification',
    public_description='Classify sentiment of movie reviews',
    private_comment='Auto quality control optimization experiments',
)
input_specification = {'text': toloka.project.StringSpec()}
output_specification = {'result': toloka.project.StringSpec()}

In [8]:
text_viewer = tb.TextViewV1(tb.InputData('text'))

radio_group_field = tb.ButtonRadioGroupFieldV1(
    tb.OutputData('result'),
    [
        tb.GroupFieldOption('pos', '😃 Positive'),
        tb.GroupFieldOption('neg', '😡 Negative'),
    ],
    label='What is the review sentiment?',
    validation=tb.RequiredConditionV1(hint='You need to select one answer'),
)

task_width_plugin = tb.TolokaPluginV1(
    layout=tb.TolokaPluginV1.TolokaPluginLayout(
        kind='pager', 
        task_width=500,
    )
)

hot_keys_plugin = tb.HotkeysPluginV1(
    key_1=tb.SetActionV1(tb.OutputData('result'), 'pos'),
    key_2=tb.SetActionV1(tb.OutputData('result'), 'neg'),
)

project_interface = toloka.project.TemplateBuilderViewSpec(
    view=tb.ListViewV1([radio_group_field, text_viewer]),
    plugins=[task_width_plugin, hot_keys_plugin],
)

project.task_spec = toloka.project.task_spec.TaskSpec(
    input_spec=input_specification,
    output_spec=output_specification,
    view_spec=project_interface,
)

In [9]:
project.public_instructions = """
<h2>How to complete the task</h2>
<ul>
<li>1. Look at the movie review text.</li>
<li>2. If it seems 😃 positive, assign the positive label. Otherwise assign the 😡 negative label.</li>
<li>3. If you are unsure, choose the label that seems most appropriate.</li>
</ul>

In case of problems send us a message. Good luck!
""".strip()

In [10]:
project = toloka_client.create_project(project)

[INFO] toloka.client: A new project with ID "100100" has been created. Link to open in web interface: https://toloka.dev/requester/project/100100


## Training and base pool setup

In [11]:
training_pool = toloka.training.Training(project_id=project.id,
    private_name='Training pool',  
    training_tasks_in_task_suite_count=5, 
    task_suites_required_to_pass=1,
    may_contain_adult_content=False,
    inherited_instructions=True,
    assignment_max_duration_seconds=60*5,
    retry_training_after_days=5,
    mix_tasks_in_creation_order=True,
    shuffle_tasks_in_task_suite=True,
)

In [12]:
training_pool = toloka_client.create_training(training_pool)

[INFO] toloka.client: A new training with ID "34223462" has been created. Link to open in web interface: https://toloka.dev/requester/project/100100/training/34223462


In [13]:
label_to_hint_map = {
    'pos': 'Positive', 
    'neg': 'Negative',
}


tasks = []
for label in ['pos', 'neg']:
    examples = df[df.label == label].head(3)
    
    for ex_tuple in examples.itertuples():
        tasks.append(
            toloka.Task(
                input_values={'text': ex_tuple.text},
                known_solutions=[toloka.task.BaseTask.KnownSolution(output_values={'result': ex_tuple.label})],
                message_on_unknown_solution=f'Incorrect label! The actual label is: {label_to_hint_map[ex_tuple.label]}',
                infinite_overlap=True,
                pool_id=training_pool.id
            )
        )

result = toloka_client.create_tasks(tasks, allow_defaults=True)

In [14]:
base_pool = toloka.Pool(
        project_id=project.id,
        private_name='AutoQuality Base Pool',
        may_contain_adult_content=False,
        reward_per_assignment=0.01, 
        assignment_max_duration_seconds=60*7, 
        will_expire=datetime.datetime.utcnow() + datetime.timedelta(days=365), 
        filter=(
            (toloka.filter.Languages.in_('EN')) &
            (
                (toloka.filter.ClientType == 'TOLOKA_APP') | 
                (toloka.filter.ClientType == 'BROWSER')
            )
        ),
    )

In [15]:
base_pool.set_mixer_config(
    real_tasks_count=4,
    golden_tasks_count=1
)

In [16]:
base_pool = toloka_client.create_pool(base_pool)

[INFO] toloka.client: A new pool with ID "34223545" has been created. Link to open in web interface: https://toloka.dev/requester/project/100100/pool/34223545


## AutoQuality basic usage

To use the AutoQuality class, you need to set `project_id`, `base_pool_id`, and `training_pool_id`. If your target label field is different from `label`, then you also need to specify it.

In [17]:
aq = AutoQuality(
  toloka_client=toloka_client,
  project_id=project.id,
  base_pool_id=base_pool.id,
  training_pool_id=training_pool.id,
  label_field='result'
  # you can also use exam pool
  # exam_pool_id = ...,
  # exam_skill_id = ...,
)

First, call `setup_pools` to create multiple pools with different quality control settings (autoquality pools).

In [18]:
aq.setup_pools()

[INFO] toloka.autoquality.optimizer: Creating pools
[INFO] toloka.client: A new pool with ID "34223548" has been cloned. Link to open in web interface: https://toloka.dev/requester/project/100100/pool/34223548
[INFO] toloka.client: A new skill with ID "46804" has been created. Link to open in web interface: https://toloka.dev/requester/quality/skill/46804
[INFO] toloka.autoquality.optimizer: {'AssignmentSubmitTime': {'avg_page_seconds': 90, 'history_size': 5, 'too_fast_fraction': 0.1477143300195338}, 'ExamRequirement': {'exam_passing_skill_value': 53.285216686727665}, 'GoldenSet': {'history_size': 5, 'incorrect_answers_rate': 84.85131966116259}, 'MajorityVote': {'history_size': 5, 'incorrect_answers_rate': 78.20005144646932}, 'TrainingRequirement': {'training_passing_skill_value': 54.66533962883042}, 'overlap': 2}
[INFO] toloka.client: A new pool with ID "34223549" has been cloned. Link to open in web interface: https://toloka.dev/requester/project/100100/pool/34223549
[INFO] toloka.cl

Then use `create_tasks` to add tasks to every autoquality pool. AutoQuality usually requires 300-500 tasks to work properly (you also need enough control tasks if the Golden Set quality control rule is used).

In [19]:
n_optim = 200
df_optim = df_control.iloc[:n_optim].copy()
df_optim_golden = df_control.iloc[n_optim:].copy()
df_optim.shape, df_optim_golden.shape

((200, 3), (800, 3))

In [20]:
aq_tasks = []

In [21]:
for row in df_optim.itertuples():
    aq_tasks.append(
        toloka.Task(
            input_values={'text': row.text}, 
        )
    )
for row in df_optim_golden.itertuples():
    aq_tasks.append(
        toloka.Task(
            input_values={'text': row.text},
            known_solutions=[toloka.task.BaseTask.KnownSolution(output_values={'result': row.label})]
        )
    )

In [22]:
aq.create_tasks(aq_tasks)

[INFO] toloka.autoquality.optimizer: Creating tasks in pools
[INFO] toloka.autoquality.optimizer: Populated pool 34223548 with 1000 tasks
[INFO] toloka.autoquality.optimizer: Populated pool 34223549 with 1000 tasks
[INFO] toloka.autoquality.optimizer: Populated pool 34223550 with 1000 tasks
[INFO] toloka.autoquality.optimizer: Populated pool 34223551 with 1000 tasks
[INFO] toloka.autoquality.optimizer: Populated pool 34223553 with 1000 tasks
[INFO] toloka.autoquality.optimizer: Populated pool 34223554 with 1000 tasks
[INFO] toloka.autoquality.optimizer: Populated pool 34223555 with 1000 tasks
[INFO] toloka.autoquality.optimizer: Populated pool 34223556 with 1000 tasks
[INFO] toloka.autoquality.optimizer: Populated pool 34223557 with 1000 tasks
[INFO] toloka.autoquality.optimizer: Populated pool 34223558 with 1000 tasks
[INFO] toloka.autoquality.optimizer: Setup complete, please verify


Finally, call `run` to launch AutoQuality.

In [None]:
aq.run()

After the run completes, the AutoQuality instance exposes the results through several attributes.

In [27]:
aq.best_pool_id

'34223549'

In [28]:
aq.best_pool_params

{'AssignmentSubmitTime': {'avg_page_seconds': 90,
  'history_size': 5,
  'too_fast_fraction': 0.23201628347219955},
 'ExamRequirement': {'exam_passing_skill_value': 69.62253027714169},
 'GoldenSet': {'history_size': 5,
  'incorrect_answers_rate': 56.978409332827354},
 'MajorityVote': {'history_size': 5,
  'incorrect_answers_rate': 49.64673390232204},
 'TrainingRequirement': {'training_passing_skill_value': 88.51934128668412},
 'overlap': 4}

You can also compare all autoquality pools by a variety of metrics.

In [29]:
aq.ranks

Unnamed: 0,pool_id,accuracy_golden,accuracy_mv,alpha_krippendorff,uncertanity,time_spent_seconds,unique_submitters_count,spent_budget,avg_submit_assignment_millis,num_bans,...,accuracy_golden_rank,accuracy_mv_rank,alpha_krippendorff_rank,spending_per_task_rank,tasks_per_second_rank,bans_ratio_rank,avg_quality_rank,avg_rank,optimal_quality_rank,main_rank
0,34223548,0.295918,0.774235,0.257751,0.424374,1017,98.0,1.0,92376.0,58,...,3,3,2,2,9,6,2.67,4.9175,3.402,3.402
1,34223549,0.452862,0.850938,0.581601,0.374584,1158,165.0,2.003,134946.0,79,...,8,8,10,1,8,10,8.67,6.9175,7.268667,7.268667
2,34223550,0.319298,0.788596,0.331257,0.395112,547,95.0,1.0,98809.0,57,...,5,4,4,2,10,5,4.33,5.3325,4.331333,4.331333
3,34223551,0.396078,0.813725,0.372302,0.422319,1598,85.0,1.0,102025.0,43,...,7,6,6,2,7,9,6.33,6.0825,5.864667,5.864667
4,34223553,0.255435,0.763587,0.20766,0.460486,1816,92.0,1.0,83584.0,61,...,2,2,1,2,6,2,1.67,2.9175,2.068667,2.068667
5,34223554,0.314234,0.799111,0.427109,0.42726,2249,185.0,2.003,112834.0,113,...,4,5,8,1,5,4,5.67,3.9175,4.468667,4.468667
6,34223555,0.3321,0.822762,0.476627,0.397302,3068,113.0,1.503,103747.0,64,...,6,7,9,1,3,8,7.33,4.8325,5.864667,5.864667
7,34223556,0.248355,0.80318,0.392882,0.398519,2903,76.0,1.003,116097.0,55,...,1,5,7,1,4,1,4.33,2.5825,3.198,3.198
8,34223557,0.251695,0.747279,0.294782,0.460207,3263,56.0,1.003,131404.0,37,...,1,1,3,1,2,3,1.67,1.9175,1.735333,1.735333
9,34223558,0.296667,0.765042,0.343792,0.5294,3495,106.0,2.003,118670.0,62,...,3,3,5,1,1,7,3.67,3.1675,3.402,3.402
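
The `spent_budget` column can be sanity-checked with simple arithmetic. The sketch below is a rough estimate, assuming each task page holds 4 regular tasks (as set in `set_mixer_config`), each submitted page costs `reward_per_assignment`, and fees or extra assignments are ignored. `estimate_budget` is a hypothetical helper, not part of toloka-kit.

```python
import math


def estimate_budget(n_tasks, tasks_per_page, overlap, reward_per_assignment):
    """Approximate spend for labeling n_tasks with the given overlap."""
    pages = math.ceil(n_tasks / tasks_per_page)
    return pages * overlap * reward_per_assignment


for overlap in (2, 3, 4):
    cost = estimate_budget(n_tasks=200, tasks_per_page=4,
                           overlap=overlap, reward_per_assignment=0.01)
    print(f'overlap={overlap}: ~${cost:.2f}')
```

For 200 regular tasks this gives about 1.00 for overlap 2 and about 2.00 for overlap 4, in line with the 1.0 and 2.003 reported for pools 34223548 and 34223549, which the setup log and `best_pool_params` suggest ran with overlap 2 and 4 respectively.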


Once you are done, archive all pools created by AutoQuality.

In [30]:
aq.archive_autoquality_pools()

## AutoQuality advanced usage

The AutoQuality class provides many ways to customize the optimization algorithm. Let's create another instance with different settings.

First of all, you can set the `n_iter` parameter, which determines how many autoquality pools will be created.

In [31]:
aq = AutoQuality(
  toloka_client=toloka_client,
  project_id=project.id,
  base_pool_id=base_pool.id,
  training_pool_id=training_pool.id,
  label_field='result',
  n_iter=5
)

You can also change the distributions of the quality control parameters that AutoQuality optimizes. In this example we change the distributions for the majority vote rule. AutoQuality samples new values for every autoquality pool from these distributions.

In [32]:
from scipy import stats
aq.parameter_distributions['MajorityVote'] = dict(
    history_size=[3, 5, 7], 
    incorrect_answers_rate=stats.norm(loc=70, scale=10)
)
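
The two kinds of values in this dictionary, a plain list and a frozen `scipy.stats` distribution, follow the convention used by random-search tools such as scikit-learn's `ParameterSampler`: a list is sampled uniformly, and any object with an `rvs` method is asked for a random draw. Whether AutoQuality samples in exactly this way is an assumption. The sketch below uses a small stdlib stand-in (`GaussianSpec`, hypothetical) instead of `stats.norm` so that it runs without SciPy.

```python
import random


class GaussianSpec:
    """Stand-in for stats.norm(loc, scale): exposes an rvs() draw."""

    def __init__(self, loc, scale, seed=None):
        self.loc, self.scale = loc, scale
        self._rng = random.Random(seed)

    def rvs(self):
        return self._rng.gauss(self.loc, self.scale)


def sample_params(spec, rng):
    """Draw one value per parameter from a list or an rvs-style object."""
    sampled = {}
    for name, dist in spec.items():
        if hasattr(dist, 'rvs'):
            sampled[name] = dist.rvs()
        else:
            sampled[name] = rng.choice(list(dist))
    return sampled


majority_vote_spec = {
    'history_size': [3, 5, 7],
    'incorrect_answers_rate': GaussianSpec(loc=70, scale=10, seed=0),
}
rng = random.Random(0)
draws = [sample_params(majority_vote_spec, rng) for _ in range(5)]
for draw in draws:
    print(draw)
```

With `loc=70` and `scale=10`, about 95% of the sampled rates fall between 50 and 90, so the search concentrates on fairly lenient majority vote thresholds.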

Finally, you can customize the methods that calculate scores or ranks. Let's modify the ranking function to give preference to cheaper pools. Do not forget to assign your new rank to the `main_rank` column so that AutoQuality knows how to choose the best pool.

In [33]:
from toloka.autoquality.scoring import default_calc_ranks


def my_new_calc_ranks(scores_df: pd.DataFrame) -> pd.DataFrame:
    # Start from the default ranks, then add a cost-weighted rank on top.
    ranks = default_calc_ranks(scores_df)
    # Read the rank columns from the frame returned by default_calc_ranks.
    ranks['my_new_rank'] = (
        0.5 * ranks['spending_per_task_rank']
        + 0.4 * ranks['avg_quality_rank']
        + 0.05 * ranks['bans_ratio_rank']
        + 0.05 * ranks['tasks_per_second_rank']
    )
    # AutoQuality picks the best pool by main_rank.
    ranks['main_rank'] = ranks['my_new_rank']
    return ranks


aq.ranking_func = my_new_calc_ranks

You can create completely new scoring and ranking functions to use AutoQuality the way you need. Just keep the same signature as in the [default methods](https://github.com/Toloka/toloka-kit/blob/main/src/autoquality/scoring.py).

Now let's run the modified AutoQuality instance.

In [34]:
aq.setup_pools()

[INFO] toloka.autoquality.optimizer: Creating pools
[INFO] toloka.client: A new pool with ID "34224967" has been cloned. Link to open in web interface: https://toloka.dev/requester/project/100100/pool/34224967
[INFO] toloka.autoquality.optimizer: {'AssignmentSubmitTime': {'avg_page_seconds': 90, 'history_size': 5, 'too_fast_fraction': 0.23797148002513394}, 'ExamRequirement': {'exam_passing_skill_value': 67.66191223513454}, 'GoldenSet': {'history_size': 5, 'incorrect_answers_rate': 77.59380523413068}, 'MajorityVote': {'history_size': 7, 'incorrect_answers_rate': 61.81437838740685}, 'TrainingRequirement': {'training_passing_skill_value': 42.91563427986361}, 'overlap': 3}
[INFO] toloka.client: A new pool with ID "34224968" has been cloned. Link to open in web interface: https://toloka.dev/requester/project/100100/pool/34224968
[INFO] toloka.autoquality.optimizer: {'AssignmentSubmitTime': {'avg_page_seconds': 90, 'history_size': 5, 'too_fast_fraction': 0.21702013951044616}, 'ExamRequiremen

In [35]:
aq.create_tasks(aq_tasks)

[INFO] toloka.autoquality.optimizer: Creating tasks in pools
[INFO] toloka.autoquality.optimizer: Populated pool 34224967 with 1000 tasks
[INFO] toloka.autoquality.optimizer: Populated pool 34224968 with 1000 tasks
[INFO] toloka.autoquality.optimizer: Populated pool 34224971 with 1000 tasks
[INFO] toloka.autoquality.optimizer: Populated pool 34224972 with 1000 tasks
[INFO] toloka.autoquality.optimizer: Populated pool 34224973 with 1000 tasks
[INFO] toloka.autoquality.optimizer: Setup complete, please verify


In [None]:
aq.run()

In [40]:
aq.best_pool_params

{'AssignmentSubmitTime': {'avg_page_seconds': 90,
  'history_size': 5,
  'too_fast_fraction': 0.2431566430532228},
 'ExamRequirement': {'exam_passing_skill_value': 41.63475243222442},
 'GoldenSet': {'history_size': 5,
  'incorrect_answers_rate': 38.278764737735415},
 'MajorityVote': {'history_size': 7,
  'incorrect_answers_rate': 78.1961475817833},
 'TrainingRequirement': {'training_passing_skill_value': 72.95585692343867},
 'overlap': 2}

In [38]:
aq.ranks

Unnamed: 0,pool_id,accuracy_golden,accuracy_mv,alpha_krippendorff,uncertanity,time_spent_seconds,unique_submitters_count,spent_budget,avg_submit_assignment_millis,num_bans,...,accuracy_mv_rank,alpha_krippendorff_rank,spending_per_task_rank,tasks_per_second_rank,bans_ratio_rank,avg_quality_rank,avg_rank,optimal_quality_rank,main_rank,my_new_rank
0,34224967,0.280797,0.797883,0.352331,0.477875,949,92.0,1.502,99863.0,48,...,2,2,1,5,5,2.0,3.25,2.4,1.8,1.8
1,34224968,0.259922,0.835943,0.476938,0.391633,1051,103.0,1.506,107552.0,64,...,4,5,1,4,3,3.33,2.8325,2.864667,2.182,2.182
2,34224971,0.330405,0.753688,0.16605,0.490057,1395,74.0,1.0,82150.0,45,...,1,1,2,3,4,2.0,2.75,2.333333,2.15,2.15
3,34224972,0.306494,0.815658,0.420929,0.403925,1550,77.0,1.0,117902.0,51,...,3,4,2,2,2,3.33,2.3325,2.798,2.532,2.532
4,34224973,0.310127,0.804282,0.381867,0.414965,1720,79.0,1.003,118541.0,56,...,2,3,1,1,1,2.67,1.4175,2.002,1.668,1.668
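
The `my_new_rank` values in this table can be reproduced by hand from the weights used in `my_new_calc_ranks`. The sketch below copies the four relevant rank columns from the table and recomputes the weighted sum. It also assumes that a higher `main_rank` marks a better pool, which is what the first run suggests (the best pool there has the highest `main_rank`).

```python
# Rank columns copied from the table above, in this order:
# (spending_per_task, avg_quality, bans_ratio, tasks_per_second)
pool_ranks = {
    '34224967': (1, 2.0, 5, 5),
    '34224968': (1, 3.33, 3, 4),
    '34224971': (2, 2.0, 4, 3),
    '34224972': (2, 3.33, 2, 2),
    '34224973': (1, 2.67, 1, 1),
}
WEIGHTS = (0.5, 0.4, 0.05, 0.05)  # same weights as in my_new_calc_ranks

my_new_rank = {
    pool_id: round(sum(w * r for w, r in zip(WEIGHTS, ranks)), 3)
    for pool_id, ranks in pool_ranks.items()
}
for pool_id, rank in my_new_rank.items():
    print(pool_id, rank)

# Assumption: the pool with the highest main_rank is selected as best.
top_pool = max(my_new_rank, key=my_new_rank.get)
print('highest rank:', top_pool)
```

The recomputed values match the `my_new_rank` column (1.8, 2.182, 2.15, 2.532, 1.668). Because half of the weight goes to `spending_per_task_rank`, the two pools with the better spending rank come out on top.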


In [39]:
aq.archive_autoquality_pools()