# Search relevance

We have a set of search queries and products on a website. We need to determine the extent to which each query is relevant to the corresponding product on the website. We ask performers to look at the search query and the product image from the website and rate the relevance level.

To try out Toloka tools for free, you can use the promo code **TOLOKAKIT1** for $20 on your [profile page](https://toloka.yandex.com/requester/profile?utm_source=github&utm_medium=site&utm_campaign=tolokakit) after registration.

### Call to action
If you find a bug or have an idea for a new feature, don't hesitate to [open a new issue on GitHub](https://github.com/Toloka/toloka-kit/issues/new/choose).
Like our library and examples? Star [our repo on GitHub](https://github.com/Toloka/toloka-kit).

Prepare the environment and import everything we'll need.

In [None]:
%%capture
!pip install toloka-kit==0.1.26
!pip install crowd-kit==1.0.0

import datetime
import sys
import time
import logging
import getpass
import urllib.request

import pandas
import numpy as np

import toloka.client as toloka
import toloka.client.project.template_builder as tb
from crowdkit.aggregation import DawidSkene

In [None]:
logging.basicConfig(
    format='[%(levelname)s] %(name)s: %(message)s',
    level=logging.INFO,
    stream=sys.stdout,
)

Create a toloka-client instance. All API calls will go through it. Read more about the OAuth token in our [Learn the basics example](https://github.com/Toloka/toloka-kit/tree/main/examples/0.getting_started/0.learn_the_basics) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/0.getting_started/0.learn_the_basics/learn_the_basics.ipynb)

In [None]:
toloka_client = toloka.TolokaClient(getpass.getpass('Enter your OAuth token: '), 'PRODUCTION') # Or switch to 'SANDBOX'
print(toloka_client.get_requester())

## Create a project
Enter a clear project name and description.
> Note: The project name and description will be visible to the performers.

In [None]:
project = toloka.Project(
    public_name='Classify search query relevance',
    public_description='Analyze a website with a product and decide to what extent it meets the search query',
)

Create task interface.
> Read about configuring the [task interface](https://toloka.ai/docs/guide/reference/interface-spec.html?utm_source=github&utm_medium=site&utm_campaign=tolokakit) in the Requester’s Guide.

> Check the [Interface section](https://toloka.ai/knowledgebase/interface?utm_source=github&utm_medium=site&utm_campaign=tolokakit) of our Knowledge Base for more tips on interface design.

This interface contains a search query, a product image, and the product title to be assessed. A button opens the query in Google, which is handy because the query may not be obvious and performers will often need to look it up. The relevance label is a required field, so a response cannot be submitted without choosing one, and a plugin sets the task width.

In [None]:
# left column
product_image = tb.ImageViewV1(tb.InputData('imagepath'))
product_description = tb.MarkdownViewV1(tb.InputData('title'), label='Product title:')

# right column
request = tb.AlertViewV1(tb.TextViewV1(tb.InputData('query')), label='Search query', theme='info')
google_link = tb.ActionButtonViewV1(tb.OpenLinkActionV1(tb.InputData('search_url')), label='Search query in Google')
divider = tb.DividerViewV1()
label = tb.RadioGroupFieldV1(
    tb.OutputData('result_class'),
    label='Choose relevance class',
    options=[
        tb.GroupFieldOption('relevant', 'Relevant'),
        tb.GroupFieldOption('relevant_minus', 'Slightly relevant'),
        tb.GroupFieldOption('irrelevant', 'Irrelevant'),
    ],
    validation=tb.RequiredConditionV1()
    )

# create interface with two columns
general_interface = tb.SidebarLayoutV1(
    tb.ListViewV1([product_image, product_description], direction='vertical'),
    tb.ListViewV1([request, google_link, divider, label], direction='vertical'),
    min_width=400,
)

task_width_plugin = tb.TolokaPluginV1(
    layout=tb.TolokaPluginV1.TolokaPluginLayout(
        kind='scroll',
        task_width=600,
    )
)

project_interface = toloka.project.TemplateBuilderViewSpec(
    view=tb.ListViewV1([general_interface]),
    plugins=[task_width_plugin],
)

For performers, our interface will look like this.

<table  align="center">
  <tr><td>
    <img src="./img/performer_interface.png"
         alt="Task page"  width="1000">
  </td></tr>
  <tr><td align="center">
    <b>Figure 1.</b> What the task looks like.
  </td></tr>
</table>

Specifications are a description of input data that will be used in a project and the output data that will be collected from the performers.

We are using screenshots to make this demo more robust against possible webpage changes. Another way is to use an iframe and let the performers assess the whole webpage.

> Read more about [input and output data specifications](https://yandex.ru/support/toloka-tb/operations/create-specs.html?utm_source=github&utm_medium=site&utm_campaign=tolokakit) in the Requester’s Guide.

In [None]:
input_specification = {
    'imagepath': toloka.project.UrlSpec(),
    'title': toloka.project.StringSpec(),
    'query': toloka.project.StringSpec(),
    'search_url': toloka.project.UrlSpec(),
}
output_specification = {'result_class': toloka.project.StringSpec()}

project.task_spec = toloka.project.task_spec.TaskSpec(
    input_spec=input_specification,
    output_spec=output_specification,
    view_spec=project_interface,
)
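
Every task uploaded later must supply exactly the input fields declared in the specification. The following is a minimal sketch of a local sanity check in plain Python; `check_input_values` is a hypothetical helper written for this example and is not part of toloka-kit, and the URL and texts are placeholders.

```python
# Field names copied from the input specification above.
INPUT_FIELDS = {'imagepath', 'title', 'query', 'search_url'}


def check_input_values(input_values: dict) -> tuple:
    """Return (missing, unexpected) field names for one task's input values."""
    keys = set(input_values)
    return INPUT_FIELDS - keys, keys - INPUT_FIELDS


# Placeholder values for illustration only.
missing, unexpected = check_input_values({
    'imagepath': 'https://example.com/image.jpg',
    'title': 'Example product',
    'query': 'example query',
})
print(missing, unexpected)
```

Running a check like this before uploading can surface a forgotten field (here `search_url`) without spending an API call.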

Write comprehensive instructions.

Instructions are essential for complex tasks like relevance evaluation that are based on a set of rules and various criteria. Make sure to not only describe the general idea, but also go through examples and explain the evaluation logic in each case. We recommend trying to evaluate around two dozen cases yourself to get more insights for the instructions.

> Get more tips on [designing instructions](https://toloka.ai/knowledgebase/instruction?utm_source=github&utm_medium=site&utm_campaign=tolokakit) in our Knowledge Base.

In [None]:
project.public_instructions = """Your task is to determine whether a product is relevant to the search query and to what degree.<br>
<br>
Imagine that you're searching for this product and get such an answer for your query.<br>
<br>
<b>Basic steps:</b>
<ul><li>Look at the title and the image of the product</li>
<li>Compare it with the query</li>
<li>Choose the most appropriate level of relevance: Relevant, Slightly relevant, or Irrelevant.</li></ul>
<br>
<i>If the image is too small, click the expand button!</i>
<br>
Relevant:<br>
<ol><li>The product fully matches the query</li></ol>
<br>
Slightly relevant:<br>
<ol><li>The product is somewhat right but some properties are different</li></ol>
<br>
Irrelevant:<br>
<ol><li>There is a completely different product in the image</li>
<li>The title doesn't match the query at all</li></ol>
"""

In [None]:
example_images = [
    {
        'label': 'Relevant',
        'product': 'Bodum Bistro Electric Burr Coffee Grinder-(Brand New)',
        'query': 'coffee grinder',
        'img_url': 'https://tlklab.s3.yandex.net/screenshots/1026.jpg'
    },
    {
        'label': 'Slightly relevant',
        'product': 'The Hobbit: The Desolation of Smaug',
        'query': 'Bluray Hobbit extended',
        'img_url': 'https://tlklab.s3.yandex.net/screenshots/1037.jpg'
    },
    {
        'label': 'Irrelevant',
        'product': 'NEW IKEA RUSCH BATTERY OPERATED WHITE WALL CLOCK',
        'query': 'stop watches',
        'img_url': 'https://tlklab.s3.yandex.net/screenshots/1066.jpg'
    },
]

table_rows = ''.join([
    f'<tr><td>{row["label"]}</td>'
    f'<td>{row["product"]}</td>'
    f'<td>{row["query"]}</td>'
    f'<td><img alt="{row["label"]}" src="{row["img_url"]}" width="200" height="205"></td></tr>\n'
    for row in example_images
])

project.public_instructions = project.public_instructions + f"""
<br>
<b>Examples:</b><br>
<table border="1">
<tr><td>Class</td><td>Product</td><td>Query</td><td>Image</td></tr>
{table_rows}
</table>
"""
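
The examples table is assembled from plain strings, so a product title containing `&` or `<` would be interpreted as HTML. A possible precaution, sketched here with the standard `html` module, is to escape each value before inserting it. `table_cell` is a hypothetical helper for this example, not something the notebook defines.

```python
import html


def table_cell(text: str) -> str:
    """Wrap text in a <td>, escaping characters that are special in HTML."""
    return f'<td>{html.escape(text)}</td>'


print(table_cell('Graham & Graham Dress Shirt'))
```

The three hard-coded examples above contain no special characters, so this only matters if the table is built from arbitrary dataset rows.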

Create a project.

In [None]:
project = toloka_client.create_project(project)

## Preparing data
This example uses the [eCommerce search relevance](https://data.world/crowdflower/ecommerce-search-relevance) dataset, which is distributed under the Public Domain License [![License: PDDL](https://img.shields.io/badge/License-PDDL-brightgreen.svg)](https://opendatacommons.org/licenses/pddl/)

Let's load this dataset and split it.

In [None]:
!curl https://tlk.s3.yandex.net/ext_dataset/ecommerce_search_relevance.csv --output dataset.csv

dataset = pandas.read_csv('dataset.csv')
dataset = dataset.sample(frac=1).reset_index(drop=True)

with pandas.option_context("max_colwidth", 100):
    display(dataset)

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27.2M  100 27.2M    0     0  20.1M      0  0:00:01  0:00:01 --:--:-- 20.1M


Unnamed: 0,unit_id,relevance,relevance_variance,product_image,product_link,product_price,product_title,query,rank,source,url,product_description
0,711168135,3.67,0.471,http://edge.shop.com/ccimg.shop.com/op/455930000/455936000/455936012/image__175x175__.jpg,http://www.shop.com/Vivitar+ViviCam+T027+digital+camera+-455936012-o+.xhtml,$59.99,Vivitar ViviCam T027 - digital camera,digital camera,17,Shop.com,http://www.shop.com/search/digital camera?k=60&sort_popular=&t=0,"This flagship model has a high resolution and a large screen, combined with great choice of feat..."
1,713186181,,,http://thumbs2.ebaystatic.com/d/l225/m/m3NcN1Mb7p-g12jEAD12iIA.jpg,http://www.ebay.com/itm/Sergeants-Bansect-Flea-and-Tick-Control-Squeeze-On-Tubes-For-Dogs-Over-3...,$4.75,Sergeants Bansect Flea and Tick Control Squeeze-On Tubes For Dogs Over 33 Lbs,flea and tick control for dogs,36,eBay,http://www.ebay.com/sch/i.html?_from=R40&_trksid=p2050601.m570.l1313.TR11.TRC1.A0.H0.Xplant.TRS0...,<ul>\n\t\t<li>\n\t\t\tEnglish \n\t\t\t\t</li>\n \t<li>\n \t\t \n \t</li>\n \t<li>...
2,711178139,1.00,0.000,http://i.walmartimages.com/i/mp/MP/10/00/52/69/MP10005269453_P255045_180X180.jpg,http://www.walmart.com/ip/Oriental-Furniture-29-Classic-Woven-Top-Barstool-in-Black/20520408,$99.00,Oriental Furniture Classic 29'' Bar Stool,metal lathe,7,walmart,http://www.walmart.com/search/?query=metal%20lathe,
3,711171739,4.00,0.000,http://scene7.targetimg1.com/is/image/Target/16721129?wid=138&hei=138,http://www.target.com/p/graham-graham-men-s-dress-shirt-tie-set-blue-check/-/A-16721129#prodSlot...,$39.99,Graham & Graham Men's Dress Shirt & Tie Set - Blue Check,mens dress shirts,5,Target,http://www.target.com/s?searchTerm=mens%20dress%20shirts,This handsome Men's Dress Shirt & Tie Set will definitely makes things a bit easier on you. Let'...
4,713188658,,,http://ak1.ostkcdn.com/images/products/9800299/P16967916.jpg,http://www.overstock.com/Electronics/INSTEN-TPU-Rubber-Candy-Skin-Transparent-Bumper-Frame-Phone...,$4.49,INSTEN TPU Rubber Candy Skin Transparent Bumper Frame Phone Case Cover For Apple iPhone 4/ 4S,iphone 4 case,43,Overstock,http://www.overstock.com/search?keywords=iphone%204%20case,This is an INSTEN TPU case for Apple iPhone 4/ 4S. This case cover is specifically designed for ...
...,...,...,...,...,...,...,...,...,...,...,...,...
32666,711169109,3.67,0.471,http://edge.shop.com/ccimg.shop.com/250000/253400/253444/products/1273903111__175x175__.jpg,http://www.shop.com/nbts/p1241856375-xinternalsearch-link_off.xhtml,"$44.99,Sale $35.99",Feellib Women's Mesh-yoke Sleeveless Long Dress,long prom dress,21,Shop.com,http://www.shop.com/search/long prom dress?k=60&sort_popular=&t=0,
32667,713194516,,,http://edge.shop.com/ccimg.shop.com/240000/243400/243416/products/1048065033__175x175__.jpg,http://www.shop.com/nbts/p1015735591-xinternalsearch-link_off.xhtml,$4.89,"Insten Leather Case w/ Stand For Apple iPad 2 / 3 / 4, Black (Supports Auto Sleep/Wake)",ipad 2 heavy duty case,54,Shop.com,http://www.shop.com/search/ipad 2 heavy duty case?k=60&sort_popular=&t=0,
32668,711163127,4.00,0.000,http://ak1.ostkcdn.com/images/products/9248966/P16414656.jpg,http://www.overstock.com/Home-Garden/Guy-Fieri-Unleaded-Decaf-Single-Serve-Coffee-K-Cups/9248966...,$30.06 - $76.26,Guy Fieri Unleaded Decaf Single Serve Coffee K-Cups,k cups,15,Overstock,http://www.overstock.com/search?keywords=k%20cups,
32669,711175120,4.00,0.000,http://i5.walmartimages.com/dfw/63fd9f59-33e4/k2-_1255bd77-c218-4f2a-99ce-14731eeaa110.v1.gif,http://www.walmart.com/ip/George-Big-Men-s-Long-Sleeve-Poplin-Dress-Shirt/40038670,$12.00,George Big Men's Long Sleeve Poplin Dress Shirt,mens dress shirts,6,walmart,http://www.walmart.com/search/?query=mens%20dress%20shirts,Get attractive daily wear that has professional flair with the George Men's Long-Sleeve Poplin S...


Because this is an old dataset, some image links may no longer work, so we need to check the images. Let's take 80 rows with valid images:
- 10 - for training
- 10 - for exam
- 10 - for golden-set in the main pool
- 50 - main tasks

In [None]:
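rows_cnt = 80
valid_rows = []  # collect rows in a list; DataFrame.append was removed in pandas 2.0
for row in dataset.itertuples():
    try:
        # Download the image to make sure the link still works
        with urllib.request.urlopen(row.product_image, timeout=10) as response:
            data = response.read()
    except Exception:
        # Skip rows with broken or unreachable image links
        continue
    # Very small payloads are most likely placeholders, not real product images
    if len(data) > 2000:
        valid_rows.append(
            {
                'relevance': row.relevance,
                'product_image': row.product_image,
                'product_title': row.product_title,
                'query': row.query,
            }
        )
        print(len(valid_rows), row.product_image)
        if len(valid_rows) >= rows_cnt:
            break

new_dataset = pandas.DataFrame(valid_rows, columns=['relevance', 'product_image', 'product_title', 'query'])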

1 http://scene7.targetimg1.com/is/image/Target/16721129?wid=138&hei=138
2 http://ak1.ostkcdn.com/images/products/9800299/P16967916.jpg
3 http://scene7.targetimg1.com/is/image/Target/11108929?wid=138&hei=138
4 http://ak1.ostkcdn.com/images/products/8595192/Organize-It-All-Cherry-Open-Drawer-Storage-Cube-P15865898.jpg
5 http://ak1.ostkcdn.com/images/products/8597293/P15867585.jpg
6 http://ak1.ostkcdn.com/images/products/8396033/Tubular-6-port-Sunflower-Seed-Feeder-P15697821.jpg
7 http://ak1.ostkcdn.com/images/products/8756258/Coffee-Grinder-Vinyl-Wall-Decal-P15999634.jpg
8 http://ak1.ostkcdn.com/images/products/P13759467.jpg
9 http://ak1.ostkcdn.com/images/products/7860036/7860036/White-Mark-Womens-Ibiza-Yellow-and-Turquoise-Printed-Sleeveless-Dress-P15245439.jpg
10 http://ak1.ostkcdn.com/images/products/9490830/P16671734.jpg
11 http://scene7.targetimg1.com/is/image/Target/16413566?wid=138&hei=138
12 http://ak1.ostkcdn.com/images/products/5900276/75/372/NCAA-Idaho-Vandals-Round-Patio-Set

Split the dataset into four parts.

In [None]:
dataset_with_answers = new_dataset[~new_dataset['relevance'].isna()].head(30)
main_dataset = new_dataset.drop(dataset_with_answers.index)
# Slice the 30 labelled rows into three consecutive groups of 10
training_dataset, exam_dataset, gold_dataset = (dataset_with_answers.iloc[start:start + 10] for start in (0, 10, 20))

print(f'training_dataset - {len(training_dataset)}')
print(f'exam_dataset - {len(exam_dataset)}')
print(f'gold_dataset - {len(gold_dataset)}')
print(f'main_dataset - {len(main_dataset)}')

training_dataset - 10
exam_dataset - 10
gold_dataset - 10
main_dataset - 50


In the dataset, relevance is a float, where 1.0 is "irrelevant" and 4.0 is fully "relevant". In our project, however, we need three string labels. Let's write a function to convert one to the other.

In [None]:
def str_relevance(relevance: float) -> str:
    if relevance > 3:
        return 'relevant'
    if relevance > 2:
        return 'relevant_minus'
    return 'irrelevant'

print(str_relevance(1.0))
print(str_relevance(3.0))
print(str_relevance(3.66))

irrelevant
relevant_minus
relevant
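
Before building the pools, it can help to see how the converted labels are distributed and where the boundary scores land. The sketch below repeats the thresholds from `str_relevance` and counts labels with `collections.Counter`. The scores are made up for illustration; in the notebook you would pass `dataset_with_answers['relevance']` instead.

```python
from collections import Counter


def str_relevance(relevance: float) -> str:
    # Same thresholds as the function above.
    if relevance > 3:
        return 'relevant'
    if relevance > 2:
        return 'relevant_minus'
    return 'irrelevant'


# Made-up scores, including the boundary values 2.0 and 3.0.
sample_scores = [1.0, 2.0, 2.33, 3.0, 3.67, 4.0]
label_counts = Counter(str_relevance(score) for score in sample_scores)
print(label_counts)
```

Note that the comparisons are strict, so a score of exactly 2.0 maps to `irrelevant` and exactly 3.0 maps to `relevant_minus`.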


## Create a training pool

Since relevance evaluation is based on rules, not just common sense or certain skills, we recommend investing some time in learning how to explain all the rules. Training needs to involve both common and extreme cases. The comments should explain the underlying logic rather than just state the correct answers.

> A well-grounded training exercise is also a great tool for scaling your task, because you can run it any time you need new performers.

Read more about [selecting performers](https://toloka.ai/knowledgebase/quality-control?utm_source=github&utm_medium=site&utm_campaign=tolokakit) in our Knowledge Base.

Read more about [training pools](https://toloka.ai/docs/guide/concepts/train.html?utm_source=github&utm_medium=site&utm_campaign=tolokakit) in our Requester’s Guide.

In [None]:
training = toloka.Training(
    project_id=project.id,
    private_name='Search relevance training',
    may_contain_adult_content=False,
    assignment_max_duration_seconds=60*10,
    mix_tasks_in_creation_order=True,
    shuffle_tasks_in_task_suite=True,
    training_tasks_in_task_suite_count=2,
    task_suites_required_to_pass=5,
    retry_training_after_days=2,
    inherited_instructions=True,
)
training = toloka_client.create_training(training)

Upload training tasks to the pool, without opening the training pool.

> We recommend opening the training pool along with the main pool. Otherwise Tolokers will spend their time on training but get no access to real tasks, which is frustrating. Also, do not forget to close the training pool when there are no main tasks available anymore.

In [None]:
hint_messages = {
    'irrelevant': 'The product does not fit the request.',
    'relevant_minus': 'The product is similar, but does not fully satisfy the request.',
    'relevant': 'The product fully satisfies the request.',
}

training_tasks = [
    toloka.Task(
        pool_id=training.id,
        input_values={
            'imagepath': row.product_image,
            'title': row.product_title,
            'query': row.query,
            'search_url': f'https://www.google.ru/search?q={row.query}',
        },
        known_solutions = [toloka.task.BaseTask.KnownSolution(output_values={'result_class': str_relevance(row.relevance)})],
        message_on_unknown_solution=hint_messages[str_relevance(row.relevance)],
    )
    for row in training_dataset.itertuples()
]
result = toloka_client.create_tasks(training_tasks, allow_defaults=True)
print(len(result.items))

10
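
The `search_url` values above are built by inserting the raw query into the URL, so spaces and characters such as `&` are passed through unencoded. Browsers usually cope with spaces, but a query containing `&` or `#` would be cut short. A minimal sketch of an encoded variant, using the standard `urllib.parse.quote_plus`, is shown here; `build_search_url` is a hypothetical helper and the second query is made up.

```python
from urllib.parse import quote_plus


def build_search_url(query: str) -> str:
    """Build a Google search URL with the query percent-encoded."""
    return f'https://www.google.ru/search?q={quote_plus(query)}'


print(build_search_url('white jeans'))
print(build_search_url('shirt & tie set'))
```

If you adopt something like this, use the same helper for the training, exam, and main tasks so that all pools produce identical links.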


## Create an exam pool
We recommend adding an exam pool along with the training because relevance evaluation projects are usually more complicated than most crowdsourcing projects, and it takes effort to master all the guidelines. The more guidelines there are, the greater the need to check whether the performers have really learned them.

Set up exam quality calculation via a skill.
Create a new skill.

In [None]:
exam_skill = next(toloka_client.get_skills(name='Search relevance exam'), None)
if exam_skill:
    print('Skill already exists')
else:
    exam_skill = toloka_client.create_skill(
        name='Search relevance exam',
        hidden=True,
        public_requester_description={'EN': 'How well the performer did on the search relevance exam'},
    )



Set the price per task suite (for example, $0.03).
> You can use a zero price as well. However, if the exam is time-consuming, a zero price might be unfair, as the performers will spend a lot of time completing it.

Read more about [pricing principles](https://toloka.ai/knowledgebase/pricing?utm_source=github&utm_medium=site&utm_campaign=tolokakit) in our Knowledge Base.

In [None]:
exam = toloka.Pool(
    project_id=project.id,
    # Give the pool any convenient name. You are the only one who will see it.
    private_name='Classify search query relevance - exam',
    may_contain_adult_content=False,
    # Set the price per task page.
    reward_per_assignment=0.03,
    will_expire=datetime.datetime.utcnow() + datetime.timedelta(days=365),
    # Time allowed for completing a task page
    assignment_max_duration_seconds=600,
    filter=(toloka.filter.Languages.in_('EN')),
)

exam.set_mixer_config(golden_tasks_count=10)

Attach the training pool.

> The passing threshold for the training can be low, just enough to filter out potential cheating, because we expect performers to make mistakes and learn from them (again, relevance is a complicated task type).

In [None]:
exam.set_training_requirement(training_pool_id=training.id, training_passing_skill_value=10)

We will have 10 tasks in the exam pool, so the quality will be calculated after the whole exam has been completed.
> We will then use this parameter as an entry filter for the main pool.

In [None]:
exam.quality_control.add_action(
    collector=toloka.collectors.GoldenSet(history_size=10),
    conditions=[toloka.conditions.TotalAnswersCount >= 10,],
    action=toloka.actions.SetSkillFromOutputField(
        skill_id=exam_skill.id,
        from_field='correct_answers_rate',
    ),
)

In [None]:
exam = toloka_client.create_pool(exam)

Add tasks to exam.

In [None]:
exam_tasks = [
    toloka.Task(
        pool_id=exam.id,
        input_values={
            'imagepath': row.product_image,
            'title': row.product_title,
            'query': row.query,
            'search_url': f'https://www.google.ru/search?q={row.query}',
        },
        known_solutions = [toloka.task.BaseTask.KnownSolution(output_values={'result_class': str_relevance(row.relevance)})],
        infinite_overlap=True,
    )
    for row in exam_dataset.itertuples()
]
result = toloka_client.create_tasks(exam_tasks, allow_defaults=True)
print(len(result.items))

10


## Create the main pool
A pool is a set of paid tasks grouped into task pages. These tasks are sent out for completion at the same time.

>Note: All tasks within a pool have the same settings (price, quality control, etc.)

Set the price per task suite to $0.03. Read more about [pricing principles](https://toloka.ai/knowledgebase/pricing?utm_source=github&utm_medium=site&utm_campaign=tolokakit) in our Knowledge Base.

Set an overlap of 3. This is the number of performers who will complete the same task. We will aggregate the results after the pool is completed. To understand [how this rule works](https://toloka.ai/docs/guide/concepts/mvote.html?utm_source=github&utm_medium=site&utm_campaign=tolokakit), go to the Requester’s Guide.

Let's add a language filter so that only performers who speak English are invited to complete this task.

In [None]:
pool = toloka.Pool(
    project_id=project.id,
    # Give the pool any convenient name. You are the only one who will see it.
    private_name='Classify search query relevance',
    may_contain_adult_content=False,
    # Set the price per task page.
    reward_per_assignment=0.03,
    will_expire=datetime.datetime.utcnow() + datetime.timedelta(days=365),
    # Overlap. This is the number of users who will complete the same task.
    defaults=toloka.Pool.Defaults(default_overlap_for_new_task_suites=3),
    # Time allowed for completing a task page
    assignment_max_duration_seconds=600,
    filter=(
        (toloka.filter.Languages.in_('EN')) &
        (toloka.filter.Skill(exam_skill.id) >= 90)
    )
)
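
The skill filter interacts with the size of the exam. Assuming the skill stores the percentage of correct control answers on a 0 to 100 scale (which the threshold of 90 suggests), a 10-task exam can only produce multiples of 10, so `>= 90` admits performers with at most one mistake. A small worked sketch, where `exam_skill_value` is a hypothetical stand-in for the platform's calculation:

```python
def exam_skill_value(correct_answers: int, total_answers: int) -> float:
    """Percentage of correct control answers (assumed to mirror correct_answers_rate)."""
    return 100 * correct_answers / total_answers


EXAM_TASKS = 10       # golden_tasks_count in the exam pool
ENTRY_THRESHOLD = 90  # value used in the main pool filter

for correct in range(EXAM_TASKS, 6, -1):
    skill = exam_skill_value(correct, EXAM_TASKS)
    print(correct, skill, skill >= ENTRY_THRESHOLD)
```

If too few performers pass, lowering the threshold to 80 would allow two mistakes; values in between have no effect with a 10-task exam.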

Set up [Quality control](https://toloka.ai/docs/guide/concepts/control.html?utm_source=github&utm_medium=site&utm_campaign=tolokakit):
  - Set the number of responses and the percentage of correct responses. We will record a quality parameter in the same skill we used in the quality filter.
  - Set up the [Fast responses](https://toloka.ai/docs/guide/concepts/quick-answers.html?utm_source=github&utm_medium=site&utm_campaign=tolokakit) rule. This rule allows you to ban performers who submit tasks at a suspiciously high speed.

Read more about [quality control principles](https://toloka.ai/knowledgebase/quality-control?utm_source=github&utm_medium=site&utm_campaign=tolokakit) in our Knowledge Base or check out [control tasks settings](https://toloka.ai/docs/guide/concepts/goldenset.html?utm_source=github&utm_medium=site&utm_campaign=tolokakit) in the Requester’s Guide.

In [None]:
pool.quality_control.add_action(
    collector=toloka.collectors.GoldenSet(history_size=20),
    conditions=[toloka.conditions.TotalAnswersCount >= 1,],
    action=toloka.actions.SetSkillFromOutputField(
        skill_id=exam_skill.id,
        from_field='correct_answers_rate',
    ),
)

pool.quality_control.add_action(
    collector=toloka.collectors.AssignmentSubmitTime(history_size=5, fast_submit_threshold_seconds=10),
    conditions=[toloka.conditions.FastSubmittedCount >= 1],
    action=toloka.actions.RestrictionV2(
        scope=toloka.user_restriction.UserRestriction.PROJECT,
        duration=10,
        duration_unit='DAYS',
        private_comment='Fast responses',
    )
)
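
To make the Fast responses settings concrete, here is a simplified local model of the rule configured above: look at the last 5 submitted pages and restrict the performer as soon as at least one of them took less than 10 seconds. This is only an illustration under those assumptions, not Toloka's actual implementation, and the submit times are made up.

```python
from collections import deque


def should_restrict(submit_times, history_size=5, threshold_seconds=10, min_fast=1):
    """Return True once the recent history holds at least `min_fast` fast submissions."""
    recent = deque(maxlen=history_size)  # keeps only the latest pages
    for seconds in submit_times:
        recent.append(seconds)
        fast_count = sum(1 for s in recent if s < threshold_seconds)
        if fast_count >= min_fast:
            return True
    return False


print(should_restrict([45, 38, 52]))  # no page faster than 10 seconds
print(should_restrict([45, 38, 7]))   # the third page took 7 seconds
```

With `FastSubmittedCount >= 1` the rule is strict: a single fast page triggers a 10-day restriction on the project, so the threshold should be well below the time an attentive performer needs.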

Specify the number of tasks per page. We recommend putting as many tasks on one page as a performer can complete in 1 to 5 minutes. That way, performers are less likely to get tired, and they won’t lose a significant amount of data if a technical issue occurs.

To learn more about [grouping tasks](https://toloka.ai/docs/search/?utm_source=github&utm_medium=site&utm_campaign=tolokakit&query=smart+mixing) into suites, read the Requester’s Guide.

In [None]:
pool.set_mixer_config(
    real_tasks_count=4,
    golden_tasks_count=1,
)
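
The mixer settings, the overlap, and the price per page together determine the rough budget for the main pool. The following back-of-the-envelope sketch uses the numbers from this example. It assumes the last, incomplete page is still issued, and it ignores the platform fee and the exam pool, so treat the result as an estimate only.

```python
import math

main_tasks = 50          # rows in main_dataset
real_tasks_per_page = 4  # real_tasks_count
overlap = 3              # default_overlap_for_new_task_suites
reward_per_page = 0.03   # reward_per_assignment, in USD

pages = math.ceil(main_tasks / real_tasks_per_page)
assignments = pages * overlap
estimated_rewards = round(assignments * reward_per_page, 2)
print(pages, assignments, estimated_rewards)
```

Each page also carries one control task (`golden_tasks_count=1`), which performers complete but which does not add to the number of pages.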

Create the pool.

In [None]:
pool = toloka_client.create_pool(pool)

## Preparing and uploading tasks

Create the control (golden-set) tasks and the main tasks, then upload them to the main pool in a single request.

In [None]:
golden_tasks = [
    toloka.Task(
        pool_id=pool.id,
        input_values={
            'imagepath': row.product_image,
            'title': row.product_title,
            'query': row.query,
            'search_url': f'https://www.google.ru/search?q={row.query}',
        },
        known_solutions = [toloka.task.BaseTask.KnownSolution(output_values={'result_class': str_relevance(row.relevance)})],
        infinite_overlap=True,
    )
    for row in gold_dataset.itertuples()
]

tasks = [
    toloka.Task(
        pool_id=pool.id,
        input_values={
            'imagepath': row.product_image,
            'title': row.product_title,
            'query': row.query,
            'search_url': f'https://www.google.ru/search?q={row.query}',
        },
    )
    for row in main_dataset.itertuples()
]
created_tasks = toloka_client.create_tasks(golden_tasks + tasks, allow_defaults=True)
print(len(created_tasks.items))

44


You can visit the web interface and preview the task suites.
<table  align="center">
  <tr><td>
    <img src="./img/task_suite_interface.png"
         alt="How performers will see your tasks"  height="600">
  </td></tr>
  <tr><td align="center">
    <b>Figure 2.</b> How performers will see your tasks
  </td></tr>
</table>

Start the pools.

**Important.** Remember that real Toloka performers will complete the tasks.
Double-check that everything is correct
with your project configuration before you start the pools.

In [None]:
training = toloka_client.open_training(training.id)
print(f'training - {training.status}')

exam = toloka_client.open_pool(exam.id)
print(f'exam - {exam.status}')

pool = toloka_client.open_pool(pool.id)
print(f'main pool - {pool.status}')

training - Status.OPEN
exam - Status.OPEN
main pool - Status.OPEN


## Receiving responses

Wait until the pool is completed.

In [None]:
pool_id = pool.id

def wait_pool_for_close(pool_id, minutes_to_wait=1):
    sleep_time = 60 * minutes_to_wait
    pool = toloka_client.get_pool(pool_id)
    while not pool.is_closed():
        op = toloka_client.get_analytics([toloka.analytics_request.CompletionPercentagePoolAnalytics(subject_id=pool.id)])
        op = toloka_client.wait_operation(op)
        percentage = op.details['value'][0]['result']['value']
        print(
            f'   {datetime.datetime.now().strftime("%H:%M:%S")}\t'
            f'Pool {pool.id} - {percentage}%'
        )
        time.sleep(sleep_time)
        pool = toloka_client.get_pool(pool.id)
    print('Pool was closed.')

wait_pool_for_close(pool_id)

exam = toloka_client.close_pool(exam.id)
print(f'exam - {exam.status}')

training = toloka_client.close_training(training.id)
print(f'training - {training.status}')
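
The waiting loop above runs until the pool closes, however long that takes. If you prefer an upper bound, the same polling pattern can be wrapped with a timeout. Below is a generic sketch in plain Python; `wait_until` is a hypothetical helper, and the iterator stands in for a real check such as `toloka_client.get_pool(pool_id).is_closed()`.

```python
import time


def wait_until(is_done, poll_seconds=60.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll `is_done()` until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while not is_done():
        if time.monotonic() >= deadline:
            return False  # gave up before the condition became true
        sleep(poll_seconds)
    return True


# Stand-in for the real pool status check: closed on the third poll.
states = iter([False, False, True])
print(wait_until(lambda: next(states), poll_seconds=0.01, timeout_seconds=5))
```

A `False` result means the timeout was reached, at which point you can decide whether to keep waiting or close the pool manually.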

Get responses

When all the tasks are completed, look at the responses from performers.

In [None]:
answers_df = toloka_client.get_assignments_df(pool.id, field=['ASSIGNMENT:task_id', 'ASSIGNMENT:worker_id'])

answers_df = answers_df[answers_df['GOLDEN:result_class'].isna()]

answers_df = answers_df.rename(columns={
    'ASSIGNMENT:task_id': 'task',
    'OUTPUT:result_class': 'label',
    'ASSIGNMENT:worker_id': 'worker',
    'INPUT:query': 'query',
    'INPUT:imagepath': 'imagepath',
    'INPUT:title': 'title',
})

answers_to_aggregate = answers_df[['task', 'label', 'worker']]

with pandas.option_context("max_colwidth", None):
    display(answers_df)



Unnamed: 0,query,title,imagepath,INPUT:search_url,label,GOLDEN:result_class,HINT:text,HINT:default_language,task,performer
0,white jeans,Hailey Jeans Co. Junior's Printed Fold-over Maxi Skirt,http://ak1.ostkcdn.com/images/products/9046426/Hailey-Jeans-Co.-Juniors-Printed-Fold-over-Maxi-Skirt-P16243513.jpg,https://www.google.ru/search?q=white jeans,irrelevant,,,,0001af5ed9--6154972ec2aa9f5eaeb3c7d2,e9fdffeffb5e12b0072362bea5003a61
1,lego star wars,The Empire Strikes Back (Paperback),http://ak1.ostkcdn.com/images/products/8462050/The-Empire-Strikes-Back-Paperback-P15753970.jpg,https://www.google.ru/search?q=lego star wars,relevant_minus,,,,0001af5ed9--61549730c2aa9f5eaeb3c81c,e9fdffeffb5e12b0072362bea5003a61
2,storage drawers,Seville Classics Heavy Duty Chrome File Cart With Storage Drawers,http://ak1.ostkcdn.com/images/products/9140420/Seville-Classics-Heavy-Duty-Chrome-File-Cart-With-Storage-Drawers-P16321869.jpg,https://www.google.ru/search?q=storage drawers,relevant_minus,,,,0001af5ed9--6154972fc2aa9f5eaeb3c7f3,e9fdffeffb5e12b0072362bea5003a61
4,dean guitar,Elton Dean - Just US,http://ak1.ostkcdn.com/images/products/262449//bmmg/ent/Elton-Dean-Just-US-P045775010328.JPG,https://www.google.ru/search?q=dean guitar,irrelevant,,,,0001af5ed9--61549730c2aa9f5eaeb3c825,e9fdffeffb5e12b0072362bea5003a61
5,gold dress,SensatioNail Gel Polish,http://scene7.targetimg1.com/is/image/Target/14766038?wid=138&hei=138,https://www.google.ru/search?q=gold dress,irrelevant,,,,0001af5ed9--6154972fc2aa9f5eaeb3c809,e9fdffeffb5e12b0072362bea5003a61
...,...,...,...,...,...,...,...,...,...,...
182,Cocoa Butter,Karess Krafters Natural Exfoliant Sugar Body Polish,http://ak1.ostkcdn.com/images/products/9776734/P16946758.jpg,https://www.google.ru/search?q=Cocoa Butter,irrelevant,,,,0001af5ed9--6154972fc2aa9f5eaeb3c7f5,6295ea32db0184a0987a0c6271f1157f
183,golf clubs,Nextt Golf T2 Platinum 3 Hybrid,http://ak1.ostkcdn.com/images/products/10041317/P17185933.jpg,https://www.google.ru/search?q=golf clubs,relevant,,,,0001af5ed9--61549730c2aa9f5eaeb3c82f,78ba124f9b1d24d127420df882141936
184,Cocoa Butter,Karess Krafters Natural Exfoliant Sugar Body Polish,http://ak1.ostkcdn.com/images/products/9776734/P16946758.jpg,https://www.google.ru/search?q=Cocoa Butter,irrelevant,,,,0001af5ed9--6154972fc2aa9f5eaeb3c7f5,78ba124f9b1d24d127420df882141936
186,golf clubs,Nextt Golf T2 Platinum 3 Hybrid,http://ak1.ostkcdn.com/images/products/10041317/P17185933.jpg,https://www.google.ru/search?q=golf clubs,relevant,,,,0001af5ed9--61549730c2aa9f5eaeb3c82f,8c623f044db5d2702e2df54826bca331


Aggregate the results using the Dawid-Skene model. We use this aggregation model because our questions are of comparable difficulty, and we don't have many control tasks.

Read more about the [Dawid-Skene model](https://toloka.ai/docs/guide/concepts/result-aggregation.html?utm_source=github&utm_medium=site&utm_campaign=tolokakit#aggr__dawid-skene) in the Requester’s Guide or get an overview of different [aggregation models](https://toloka.ai/knowledgebase/aggregation) in our Knowledge Base.

More aggregation models in [Crowd-Kit](https://github.com/Toloka/crowd-kit#crowd-kit-computational-quality-control-for-crowdsourcing).

In [None]:
predicted_answers = DawidSkene(n_iter=20).fit_predict(answers_to_aggregate).reset_index(name='result')

predicted_answers = pandas.merge(predicted_answers, answers_df.drop_duplicates(subset='task'), on='task')

with pandas.option_context("max_colwidth", None):
    display(predicted_answers[['query', 'imagepath', 'title', 'result']])

Unnamed: 0,query,imagepath,title,result
0,white jeans,http://ak1.ostkcdn.com/images/products/9046426/Hailey-Jeans-Co.-Juniors-Printed-Fold-over-Maxi-Skirt-P16243513.jpg,Hailey Jeans Co. Junior's Printed Fold-over Maxi Skirt,irrelevant
1,lego star wars,http://ak1.ostkcdn.com/images/products/8462050/The-Empire-Strikes-Back-Paperback-P15753970.jpg,The Empire Strikes Back (Paperback),relevant_minus
2,storage drawers,http://ak1.ostkcdn.com/images/products/9140420/Seville-Classics-Heavy-Duty-Chrome-File-Cart-With-Storage-Drawers-P16321869.jpg,Seville Classics Heavy Duty Chrome File Cart With Storage Drawers,relevant_minus
3,dean guitar,http://ak1.ostkcdn.com/images/products/262449//bmmg/ent/Elton-Dean-Just-US-P045775010328.JPG,Elton Dean - Just US,irrelevant
4,gold dress,http://scene7.targetimg1.com/is/image/Target/14766038?wid=138&hei=138,SensatioNail Gel Polish,irrelevant
5,k cups,http://ak1.ostkcdn.com/images/products/6852833/P14378268.jpg,Green Mountain Coffee Caramel Vanilla Cream 48 K-Cups for Keurig Brewers,irrelevant
6,bike lock,http://ak1.ostkcdn.com/images/products/9539970/P16717998.jpg,Advantage SportsRack Dual Lock Cable and Hitch Lock Assembly,relevant
7,projector,http://ak1.ostkcdn.com/images/products/4709067/P12622936.jpg,Premium Power Products Lamp for NEC Front Projector,relevant
8,toy trucks,http://ak1.ostkcdn.com/images/products/9183302/P16357747.jpg,New Bright Remote Control Full Function Dodge Mopar Ram Truck,relevant
9,ipad 2 heavy duty case,http://scene7.targetimg1.com/is/image/Target/14057344?wid=138&hei=138,iSound Sesame Street Cookie Monster Plush Portfolio for iPad 2 - Blue (ISOUND-4611),relevant_minus
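
As a sanity check on the aggregated labels, it can be useful to compare them with a plain majority vote over the same responses. The sketch below is a simple baseline in standard Python, not the Dawid-Skene model used above; `majority_vote` is a hypothetical helper and the responses are made up.

```python
from collections import Counter, defaultdict


def majority_vote(rows):
    """Aggregate (task, label, worker) rows by picking the most frequent label per task."""
    votes = defaultdict(Counter)
    for task, label, _worker in rows:
        votes[task][label] += 1
    # On a tie, the label that was seen first for the task wins.
    return {task: counts.most_common(1)[0][0] for task, counts in votes.items()}


# Made-up responses: three performers per task, matching the overlap of 3.
rows = [
    ('t1', 'relevant', 'w1'), ('t1', 'relevant', 'w2'), ('t1', 'relevant_minus', 'w3'),
    ('t2', 'irrelevant', 'w1'), ('t2', 'irrelevant', 'w2'), ('t2', 'irrelevant', 'w3'),
]
print(majority_vote(rows))
```

Tasks where the two methods disagree are good candidates for manual review, since Dawid-Skene weights performers by their estimated reliability while majority vote treats them equally.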
