# Get your medium level

Welcome to the medium level Toloka-Kit tutorial! In this notebook, we will explore some of the powerful features and capabilities that Toloka-Kit has to offer. This tutorial is designed for users who already have some experience with Toloka-Kit and are looking to expand their knowledge and implement more complex workflows in their crowdsourcing projects.

We will cover topics such as advanced quality control, custom interface design and using various collectors, conditions, and actions to create more sophisticated task pools.

## Importing libraries

First of all, let's import the necessary libraries. If Toloka-Kit is not installed yet, you can install it with `pip install toloka-kit`.

In [None]:
import datetime
import getpass
import time

import pandas as pd
import toloka.client as toloka
import toloka.client.project.template_builder as tb

## Authorization in Toloka API

To interact with the Toloka API, you need to provide an API key for authentication.

You can obtain your API key from the Toloka website https://platform.toloka.ai/requester/profile/integration.

Make sure to keep your API key secret and not share it with anyone.

In [None]:
toloka_client = toloka.TolokaClient(getpass.getpass('Enter your OAuth token: '), 'PRODUCTION') # Or switch to 'SANDBOX'
# Lines below check that the OAuth token is correct and print your account's name
print(toloka_client.get_requester())

# Creating a new project

Here we create an instance of the Toloka project with a specified public name and description.

The public name and description will be visible to the performers who work on your tasks.

Make sure to provide a clear and concise name and description that accurately represent the purpose and requirements of your crowdsourcing project.

In [None]:
project = toloka.Project(
    public_name='Your project name',
    public_description='Your project description',
)

## Input and output data

Define the input and output data types for our project.

The `input_specification` is a dictionary that describes the format of the input data (in this case, a URL pointing to an image).

The `output_specification` is a dictionary that describes the format of the output data (in this case, a string that will contain the result provided by the performer).

In [None]:
input_specification = {'image': toloka.project.UrlSpec()}
output_specification = {'result': toloka.project.StringSpec()}
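Toloka checks task inputs and performer outputs against these specifications, so every task must provide an `image` value that is a URL. As a rough, stdlib-only illustration of that idea (not Toloka's actual validation logic), the hypothetical `validate` helper below checks a dict of values against a spec made of simple Python checks:

```python
from urllib.parse import urlparse


def is_url(value):
    """Accept strings with an http(s) scheme and a host part."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return parsed.scheme in ('http', 'https') and bool(parsed.netloc)


def is_string(value):
    """Accept any string."""
    return isinstance(value, str)


# Hypothetical stand-ins for the specifications above: field name -> check
INPUT_SPEC = {'image': is_url}
OUTPUT_SPEC = {'result': is_string}


def validate(values, spec):
    """Return a list of problems; an empty list means the values match the spec."""
    problems = []
    for name, check in spec.items():
        if name not in values:
            problems.append(f'missing field: {name}')
        elif not check(values[name]):
            problems.append(f'bad value for {name}: {values[name]!r}')
    for name in values:
        if name not in spec:
            problems.append(f'unexpected field: {name}')
    return problems


print(validate({'image': 'https://example.com/cat.jpg'}, INPUT_SPEC))  # []
print(validate({'image': 'cat.jpg'}, INPUT_SPEC))  # ["bad value for image: 'cat.jpg'"]
```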

## Task interface

We will configure the task interface that performers will interact with while working on the tasks.
A well-designed interface is essential for obtaining high-quality results. We will create different input and output fields
to capture various types of user responses, and use Toloka's Template Builder to customize the interface appearance and layout.

The task interface is assembled from Template Builder components, available in the `tb` module imported above. Input fields such as text fields, radio groups, checkboxes and dropdowns write the performer's answer to an output field given by `tb.OutputData`, while view components such as images read task data through `tb.InputData`. Combining these components lets you build tasks that are user-friendly and accessible to performers.

Here we create a text field UI element using the `TextFieldV1` class.
The `data` parameter, a `tb.OutputData` object, specifies the output field that stores the performer's input.
The `label` parameter sets the label that will be displayed next to the text field, providing instructions for the performer.
A component appears in the task only after it is included in the task view, for example in a `tb.ListViewV1` layout.

In [None]:
text_field = tb.TextFieldV1(data=tb.OutputData('text_result'), label='Enter your answer here:')

Next, we create a radio group UI element using the `RadioGroupFieldV1` class.
The `data` parameter specifies the output field that stores the performer's selected option.
The `label` parameter sets the label that will be displayed.
The `options` parameter is a list of `GroupFieldOption` instances, each representing a selectable option: `value` is what gets saved in the output, and `label` is what the performer sees.

In [None]:
radio_group = tb.RadioGroupFieldV1(
    data=tb.OutputData('radio_result'),
    label='Select an option:',
    options=[
        tb.GroupFieldOption(label='Option 1', value='option_1'),
        tb.GroupFieldOption(label='Option 2', value='option_2'),
    ],
)

Next, we create a checkbox group UI element using the `CheckboxGroupFieldV1` class.
The `data` parameter specifies the output field that stores the performer's selected options as a list.
The `label` parameter sets the label that will be displayed.
The `options` parameter is a list of `GroupFieldOption` instances, each representing a selectable option in the checkbox group.

In [None]:
checkbox_group = tb.CheckboxGroupFieldV1(
    data=tb.OutputData('checkbox_result'),
    label='Select all that apply:',
    options=[
        tb.GroupFieldOption(label='Option 1', value='option_1'),
        tb.GroupFieldOption(label='Option 2', value='option_2'),
    ],
)

Next, we create a dropdown UI element using the `SelectFieldV1` class, Template Builder's dropdown list.
The `data` parameter specifies the output field that stores the performer's selected option.
The `label` parameter sets the label that will be displayed.
The `options` parameter is a list of `GroupFieldOption` instances, each representing a selectable option in the dropdown menu.

In [None]:
dropdown = tb.SelectFieldV1(
    data=tb.OutputData('dropdown_result'),
    label='Choose an option:',
    options=[
        tb.GroupFieldOption(label='Option 1', value='option_1'),
        tb.GroupFieldOption(label='Option 2', value='option_2'),
    ],
)

The interface writes to four output fields (`text_result`, `radio_result`, `checkbox_result` and `dropdown_result`) instead of the single `result` field defined earlier, and every output field must be declared in the project's output specification. So we update `output_specification` with one entry per field.

Text, radio group and dropdown answers are strings, while the checkbox group can hold several selected values, so its field is an array of strings. `required=True` means the performer must provide a value for the field before submitting.

In [None]:
output_specification = {
    'text_result': toloka.project.StringSpec(required=True),
    'radio_result': toloka.project.StringSpec(required=True),
    'checkbox_result': toloka.project.field_spec.ArrayStringSpec(required=True),
    'dropdown_result': toloka.project.StringSpec(required=True),
}

Creating a TolokaPluginV1 object with the scroll plugin to set the task width in the project interface.
The `task_width` parameter is set to 400 pixels, but you can adjust it to fit the content of your tasks.

In [None]:
task_width_plugin = tb.TolokaPluginV1(
    'scroll',
    task_width=400,
)

Here, we configure how the task will be presented to the performers.
We create a `TemplateBuilderViewSpec` object that shows the task image followed by all four input fields in a `ListViewV1`, and includes the `task_width_plugin` we created earlier to control the task width.
`tb.InputData('image')` reads the image URL from the task's `image` input field, so performers see the picture they are working on.

In [None]:
project_interface = toloka.project.TemplateBuilderViewSpec(
    view=tb.ListViewV1([
        # Show the image from the task input
        tb.ImageViewV1(tb.InputData('image'), ratio=[1, 1]),
        text_field,
        radio_group,
        checkbox_group,
        dropdown,
    ]),
    plugins=[task_width_plugin],
)

Next, we assign the task interface and input/output data specifications to the project.

In [None]:
project.task_spec = toloka.project.task_spec.TaskSpec(
    input_spec=input_specification,
    output_spec=output_specification,
    view_spec=project_interface,
)

Provide a text with instructions that will be visible to the performers before they start working on the tasks.
Clear and comprehensive instructions are crucial for obtaining high-quality results from the performers.

In [None]:
project.public_instructions = 'Your text with instruction'

Finally, we submit the project configuration to the Toloka platform by calling the `create_project` method.
It creates a new project on the platform with the specified settings and returns the created project object.

In [None]:
project = toloka_client.create_project(project)

# Training

We should create a training pool for the project. Training pools help educate performers on how to complete the tasks correctly.
By setting up a training pool, you ensure that performers understand the instructions and requirements before they start working on the main tasks.

In [None]:
training = toloka.Training(
    project_id=project.id,
    private_name='Training for your project',
    may_contain_adult_content=False,
    inherited_instructions=True,
    assignment_max_duration_seconds=60 * 20,
)

Training tasks are displayed with the project's own task interface, so the training pool does not need an interface of its own. What makes a task a training task is that it has known solutions (`known_solutions`) and a hint (`message_on_unknown_solution`) that is shown to performers who answer incorrectly. These tasks are uploaded to the training pool after it has been created.

Finally, we create the training pool using the Toloka API client and the configured training object.

In [None]:
training = toloka_client.create_training(training)

# Pool

Here, we will create a pool for our project. A pool is a group of tasks with similar characteristics and settings, assigned to performers with specific skills or attributes. We will configure the pool settings, such as the reward per assignment, auto-acceptance of solutions, assignment duration, and dynamic pricing, to ensure efficient task distribution and management.

In [None]:
pool = toloka.Pool(
    # The project ID to which the pool is related
    project_id=project.id,
    # A private name for the pool, visible only to you
    private_name='Your pool name',
    # Whether the pool may contain adult content
    may_contain_adult_content=False,
    # The pool's expiration date
    will_expire=datetime.datetime.utcnow() + datetime.timedelta(days=365),
    # The reward per assignment for performers
    reward_per_assignment=0.01,
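    # Accept submitted assignments automatically, without manual review
    auto_accept_solutions=True,
    # If you switch to manual review (auto_accept_solutions=False), assignments
    # that are not reviewed within this many days are accepted automatically
    auto_accept_period_day=7,
    # Maximum duration for a performer to complete an assignment
    assignment_max_duration_seconds=60 * 20,
    # Skill-based dynamic pricing: the reward depends on the performer's skill value
    dynamic_pricing_config=toloka.pool.DynamicPricingConfig(
        type=toloka.pool.DynamicPricingConfig.Type.SKILL,
        # Placeholder: the ID of the skill that determines the reward
        skill_id='your_custom_skill_id',
        intervals=[
            # Performers with a skill value from 80 to 100 get 1.5 times the base reward
            toloka.pool.DynamicPricingConfig.Interval(from_=80, to=100, reward_per_assignment=0.015),
        ],
    ),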
)
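With skill-based dynamic pricing, performers with a higher skill value receive a higher reward for the same assignment. The sketch below models that as a lookup in reward tiers keyed by a minimum skill value; the tier boundary, the 1.5x reward and the `reward_for` helper are illustrative assumptions, and the exact interval semantics are defined by Toloka's `DynamicPricingConfig`, not by this code.

```python
from decimal import Decimal

# Hypothetical pricing setup: the pool's base reward plus reward tiers
# keyed by the minimum skill value needed to reach them
BASE_REWARD = Decimal('0.01')
REWARD_TIERS = [
    (80, Decimal('0.015')),  # skill value of 80 or more: 1.5x the base reward
]


def reward_for(skill_value, tiers=REWARD_TIERS, base=BASE_REWARD):
    """Return the reward per assignment for a performer with the given skill value."""
    if skill_value is None:
        # Performers who have no value for the skill get the base reward
        return base
    # Check the highest tier first so a performer gets the best tier they qualify for
    for min_skill, reward in sorted(tiers, reverse=True):
        if skill_value >= min_skill:
            return reward
    return base


print(reward_for(None), reward_for(50), reward_for(85))  # 0.01 0.01 0.015
```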


Setting the default overlap for tasks in the pool. The overlap value determines the number of different performers who will complete the same task. In this case, each task will be completed by 3 different performers, allowing for more reliable results through aggregation.

In [None]:
pool.defaults = toloka.Pool.Defaults(
    default_overlap_for_new_tasks=3,
    default_overlap_for_new_task_suites=3,
)

In this section, we configure the Task Suite mixer for the pool. The mixer determines the composition of tasks within a Task Suite for performers. In this example, we set the following configuration:

- real_tasks_count: The number of actual tasks in a Task Suite, which is set to 5.
- golden_tasks_count: The number of golden tasks (tasks with known correct answers) in a Task Suite, which is set to 1. Golden tasks help monitor and evaluate the quality of performer's work.
- training_tasks_count: The number of training tasks in a Task Suite, which is set to 0. Training tasks are used to teach performers how to complete tasks correctly before they start working on actual tasks.

In [None]:
pool.set_mixer_config(real_tasks_count=5, golden_tasks_count=1, training_tasks_count=0)
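With smart mixing, each assignment is one task suite issued to one performer, so the overlap and the mixer settings together determine how many assignments a pool produces and roughly what it costs. The hypothetical `estimate_pool` helper below does that arithmetic for the settings used here (5 real tasks and 1 golden task per suite, overlap 3, reward 0.01); it ignores platform fees and dynamic pricing, so treat the result as a ballpark figure.

```python
import math
from decimal import Decimal


def estimate_pool(real_tasks, real_per_suite=5, golden_per_suite=1, overlap=3,
                  reward_per_assignment=Decimal('0.01')):
    """Ballpark numbers for a pool that uses smart mixing.

    The last, partially filled suite is counted as a whole suite, and every
    suite is assumed to be completed by `overlap` performers.
    """
    task_suites = math.ceil(real_tasks / real_per_suite)
    assignments = task_suites * overlap  # one assignment = one suite for one performer
    return {
        'task_suites': task_suites,
        'assignments': assignments,
        'golden_tasks_shown': assignments * golden_per_suite,
        'reward_total': assignments * reward_per_assignment,
    }


print(estimate_pool(100))
```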

## Filters

Here, we apply filters to the pool so that only performers with certain characteristics can work on the tasks. We combine filters on language, a custom skill, the region detected from the performer's IP address, and the type of client the performer uses. Conditions joined with `&` must all hold, while conditions joined with `|` require at least one to hold. By using these filters, we can better control the quality of the work and attract performers who meet our requirements.

In [None]:
# Placeholders: RegionByIp expects numeric Toloka region IDs, not country codes.
# Replace None with the IDs of the regions you want to target.
first_region_id = None
second_region_id = None

pool.filter = (
    # Language filter
    toloka.filter.Languages.in_('EN') &
    # Filter by custom skill
    (toloka.filter.Skill('your_custom_skill_id') > 0.8) &
    # Filter by the region detected from the performer's IP address;
    # in_() takes a single region ID, so several regions are combined with OR
    (toloka.filter.RegionByIp.in_(first_region_id) | toloka.filter.RegionByIp.in_(second_region_id)) &
    # Combination of filters using the OR condition on the client type
    (
        toloka.filter.ClientType.eq(toloka.filter.ClientType.ClientType.BROWSER) |
        toloka.filter.ClientType.eq(toloka.filter.ClientType.ClientType.TOLOKA_APP)
    )
)
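Filters are combined into a single expression: `&` requires every condition to hold, `|` requires at least one. Because Python's `&` binds more tightly than `|`, the client-type alternatives need their own parentheses. The stdlib sketch below models this composition with a hypothetical `Rule` class over a plain dict of performer attributes; it only illustrates the logic and is not how Toloka-Kit implements filters.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Rule:
    """Hypothetical filter: a predicate over a performer's attributes."""
    check: Callable[[Dict[str, Any]], bool]

    def __and__(self, other):
        # Both rules must pass
        return Rule(lambda p: self.check(p) and other.check(p))

    def __or__(self, other):
        # At least one rule must pass
        return Rule(lambda p: self.check(p) or other.check(p))

    def __call__(self, performer):
        return self.check(performer)


speaks_english = Rule(lambda p: 'EN' in p.get('languages', []))
skilled = Rule(lambda p: p.get('skill', 0) > 0.8)
browser = Rule(lambda p: p.get('client') == 'BROWSER')
mobile_app = Rule(lambda p: p.get('client') == 'TOLOKA_APP')

# Parentheses keep the OR between client types from swallowing the other conditions
pool_filter = speaks_english & skilled & (browser | mobile_app)

print(pool_filter({'languages': ['EN'], 'skill': 0.9, 'client': 'TOLOKA_APP'}))  # True
print(pool_filter({'languages': ['EN'], 'skill': 0.5, 'client': 'BROWSER'}))     # False
```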

Creating a custom skill lets you track and rate performers based on their performance in our tasks. With `hidden=True`, the skill is not shown to performers. The custom skill can then be used as a filter in the pool to select performers with specific skill levels.

In [None]:
custom_skill = toloka_client.create_skill(
    toloka.Skill(
        name='custom_skill',
        # Hidden skills are not shown to performers
        hidden=True,
    )
)

Here are a few more operations for managing skills.

To get a list of all skills in your account, you can use the `get_skills` method.

In [None]:
all_skills = toloka_client.get_skills()
for skill in all_skills:
    print(f'Skill ID: {skill.id}, Name: {skill.name}')

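To remove a skill value from a specific performer, you can use the `delete_user_skill` method. It takes the ID of the performer's skill record, which you can look up with `get_user_skills`.

In [None]:
# Remove the custom skill from one performer; the skill itself stays in your account
for user_skill in toloka_client.get_user_skills(skill_id=custom_skill.id, user_id='your_performer_id'):
    toloka_client.delete_user_skill(user_skill.id)
    print(f'Removed skill {custom_skill.id} from performer {user_skill.user_id}')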

You can set a skill value for a specific performer using the `set_user_skill` method.

In [None]:
performer_id = 'your_performer_id'
updated_skill_value = 0.9

toloka_client.set_user_skill(
    user_id=performer_id,
    skill_id=custom_skill.id,
    value=updated_skill_value,
)

You can update the properties of a skill using the `update_skill` method. It takes the skill ID and a `Skill` object with the new settings.

In [None]:
updated_skill_name = 'Updated Custom Skill'
updated_skill_comment = 'An updated private comment for the custom skill'

# Start from the existing skill so that the settings you don't change are kept
custom_skill.name = updated_skill_name
custom_skill.private_comment = updated_skill_comment

updated_skill = toloka_client.update_skill(custom_skill.id, custom_skill)

## Tasks in pool

Load your data into a pandas DataFrame. The source can be in any format pandas can read, for example CSV, TSV, JSON or the result of an SQL query.

In [None]:
dataset = pd.read_csv('Your dataset')

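Next, we prepare tasks and upload them to the pool:

- `tasks`: a list of `toloka.Task` objects, one per dataset row. `input_values` must use the keys from the project's input specification (`image` here); other `toloka.Task` parameters, such as `overlap` or `known_solutions`, can be set when needed.
- `toloka_client.create_tasks`: adds the tasks to the pool. With `allow_defaults=True`, tasks that don't set their own overlap use the pool's default overlap.

Tasks can only be added to a pool that already exists, so run the upload cells after the pool has been created (see "Final settings and pool creation").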

In [None]:
# Forming tasks from data: one task per dataset row.
# Replace 'url' with the column that holds image URLs in your dataset
tasks = [
    toloka.Task(
        # Keys must match the project's input specification
        input_values={'image': row['url']},
        pool_id=pool.id,
    )
    for _, row in dataset.iterrows()
]

# Add tasks to a pool
toloka_client.create_tasks(tasks, allow_defaults=True)
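The only part of each task that depends on the data is `input_values`, whose keys must match the input specification (`image` here). The stdlib sketch below shows that mapping without pandas or Toloka-Kit; the sample rows, the `url` column name and the `rows_to_inputs` helper are hypothetical. Each resulting dict is what you would pass as `input_values` to `toloka.Task`.

```python
import csv
import io

# Hypothetical sample shaped like a TSV file with a `url` column
SAMPLE_TSV = 'url\tlabel\nhttps://example.com/1.jpg\tcat\nhttps://example.com/2.jpg\tdog\n'


def rows_to_inputs(tsv_text, column='url', field='image'):
    """Turn each TSV row into an input_values dict keyed by the input field name."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter='\t')
    return [{field: row[column]} for row in reader]


inputs = rows_to_inputs(SAMPLE_TSV)
print(inputs)
```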

For example, we are downloading a dataset of cats and dogs images, loading it into a pandas dataframe, shuffling the dataset, and then creating tasks from the image URLs in the dataset. Finally, we are adding these tasks to the pool.

In [None]:
!curl https://tlk.s3.yandex.net/dataset/cats_vs_dogs/toy_dataset.tsv --output dataset.tsv

dataset = pd.read_csv('dataset.tsv', sep='\t')
print(f'Dataset contains {len(dataset)} rows\n')
dataset = dataset.sample(frac=1).reset_index(drop=True)

tasks = [
    toloka.Task(input_values={'image': url}, pool_id=pool.id)
    for url in dataset['url']
]
# Add tasks to a pool
toloka_client.create_tasks(tasks, allow_defaults=True)
print(f'Populated pool with {len(tasks)} tasks')
print(f'To view this pool, go to https://toloka.dev/requester/project/{project.id}/pool/{pool.id}')

`TaskSuites` group several tasks that are meant to be completed together by a single performer. This can be useful for maintaining context or consistency across tasks, or simply for optimizing the workflow for performers.

In contrast, individual tasks are single units of work that can be completed independently by a performer. They fit projects where context and consistency are not crucial or where tasks are unrelated to each other.

Uploading task suites yourself is an alternative to smart mixing: a pool configured with `set_mixer_config`, like the one above, builds suites from individual tasks automatically, so create task suites in a pool without a mixer configuration.

In [None]:
# Create a list of Task Suites: consecutive groups of 5 images from the dataset
suite_size = 5
urls = list(dataset['url'])
task_suites = [
    toloka.task_suite.TaskSuite(
        pool_id=pool.id,
        # Tasks inside a suite carry only their input values
        tasks=[toloka.task.BaseTask(input_values={'image': url}) for url in urls[i:i + suite_size]],
        overlap=3,
    )
    for i in range(0, len(urls), suite_size)
]

In [None]:
# Add the Task Suites to the pool
toloka_client.create_task_suites(task_suites)
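When you build task suites yourself, you decide how tasks are grouped. A minimal sketch of the grouping step, assuming consecutive chunks of a fixed size (the hypothetical `split_into_suites` helper), could look like this:

```python
from typing import List, Sequence, TypeVar

T = TypeVar('T')


def split_into_suites(items: Sequence[T], suite_size: int) -> List[List[T]]:
    """Split items into consecutive groups of at most suite_size elements."""
    if suite_size < 1:
        raise ValueError('suite_size must be at least 1')
    return [list(items[i:i + suite_size]) for i in range(0, len(items), suite_size)]


print(split_into_suites(['t1', 't2', 't3', 't4', 't5', 't6', 't7'], 3))
# [['t1', 't2', 't3'], ['t4', 't5', 't6'], ['t7']]
```

The last suite may be shorter than the others; if your tasks need equally sized suites, pad or drop the remainder explicitly.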

## Quality control

Quality control in a Toloka task pool consists of three main components: collectors, conditions, and actions. They work together to monitor the performance of workers and take measures when quality issues are detected.
- Collectors: These collect information about the worker's performance while they complete tasks. For example, collectors may gather information about the task completion time, correctness scores, etc.

- Conditions: Conditions determine under which circumstances actions will be applied to a worker. Each condition checks a value gathered by the rule's collector, such as the number of fast submissions or the share of correct answers on control tasks.

- Actions: These are measures applied to a worker when the conditions are met. For example, an action can restrict a worker's access to the pool or project for a certain period, set the worker's skill value, accept or reject all of the worker's assignments, or increase the overlap of the tasks the worker completed.
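To make the collector → condition → action flow concrete, here is a stdlib model of a single rule. `QualityRule`, `rejected_count` and the action name are hypothetical and only mirror the idea; Toloka evaluates real rules on its side.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class QualityRule:
    """Hypothetical model of one quality control rule: collector -> condition -> action."""
    collect: Callable[[List[dict]], Dict[str, int]]  # collector: assignment history -> metrics
    condition: Callable[[Dict[str, int]], bool]      # condition: metrics -> fire or not
    action: str                                      # name of the measure to apply

    def evaluate(self, history: List[dict]) -> Optional[str]:
        """Return the action to apply, or None if the condition does not hold."""
        metrics = self.collect(history)
        return self.action if self.condition(metrics) else None


def rejected_count(history):
    """Collector: count the performer's rejected assignments."""
    return {'rejected': sum(1 for a in history if a['status'] == 'REJECTED')}


rule = QualityRule(
    collect=rejected_count,
    condition=lambda metrics: metrics['rejected'] >= 2,
    action='RESTRICT_IN_PROJECT',
)

history = [{'status': 'REJECTED'}, {'status': 'ACCEPTED'}, {'status': 'REJECTED'}]
print(rule.evaluate(history))  # RESTRICT_IN_PROJECT
```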

In this section, we set a quality control requirement based on the performer's training.

- *pool.quality_control.training_requirement*: This attribute of the pool object sets the training requirement for quality control.

- *toloka.pool.QualityControl.TrainingRequirement*: This class defines the training requirement configuration.

1) *training_pool_id*: the ID of the training pool that performers must pass before they can work on the main pool tasks. Here we use the training pool created earlier.

2) *training_passing_skill_value*: the minimum percentage of correct answers that performers must achieve in the training pool to be allowed to work on the main pool tasks. In this example, it is set to 80.

By setting this training requirement, you ensure that only performers who passed the training with at least 80% correct answers can access and work on the tasks in the main pool.

In [None]:
pool.quality_control.training_requirement = toloka.pool.QualityControl.TrainingRequirement(
    training_pool_id=training.id,
    training_passing_skill_value=80,
)
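The training requirement boils down to comparing the share of correct training answers with the passing value. The hypothetical `passes_training` helper below shows that check under the simplifying assumption that every training task counts equally:

```python
def passes_training(answers, correct, passing_score=80):
    """Return True if the percentage of correct training answers reaches passing_score."""
    if len(answers) != len(correct):
        raise ValueError('answers and correct must have the same length')
    if not answers:
        return False
    right = sum(a == c for a, c in zip(answers, correct))
    return right * 100 / len(answers) >= passing_score


# 4 of 5 answers are correct, which is exactly 80%
print(passes_training(['cat', 'dog', 'cat', 'cat', 'dog'], ['cat', 'dog', 'cat', 'dog', 'dog']))  # True
```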

In this section, we add a quality control rule to the pool based on assignment submit time.

- `pool.quality_control.add_action`: This method adds a quality control rule to the pool.

- `collector=toloka.collectors.AssignmentSubmitTime(history_size=5, fast_submit_threshold_seconds=20)`: This collector looks at the performer's last 5 assignments and counts those submitted faster than 20 seconds. The threshold is an example value; set it below the time a careful performer needs for one assignment.

- `conditions=[toloka.conditions.FastSubmittedCount > 1]`: The rule fires when more than one of those assignments was submitted too fast.

- `action=toloka.actions.RestrictionV2(...)`: This action restricts the performer who meets the condition. The restriction parameters are as follows:

  1) `scope='PROJECT'`: The restriction applies to the entire project.

  2) `duration=3`: The restriction lasts for 3 units of time.

  3) `duration_unit='DAYS'`: The unit of time for the restriction duration is days, so the restriction lasts for 3 days.

  4) `private_comment='Submit time too fast'`: A note visible only to you that records the reason for the restriction.

By adding this rule, you restrict access to the project for performers who repeatedly submit assignments too quickly. This helps to prevent low-quality submissions from performers who rush through tasks.

In [None]:
pool.quality_control.add_action(
    # Count assignments submitted faster than the threshold among the last 5
    collector=toloka.collectors.AssignmentSubmitTime(history_size=5, fast_submit_threshold_seconds=20),
    conditions=[
        toloka.conditions.FastSubmittedCount > 1,
    ],
    action=toloka.actions.RestrictionV2(
        scope='PROJECT',
        duration=3,
        duration_unit='DAYS',
        private_comment='Submit time too fast',
    ),
)
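A common way to detect rushed work is to look at a performer's most recent assignments and count how many were submitted faster than a threshold. The hypothetical `FastSubmitTracker` below sketches that sliding-window logic; the window of 5 assignments mirrors `history_size=5`, while the 20-second threshold and the limit of one fast submission are assumptions for illustration.

```python
from collections import deque


class FastSubmitTracker:
    """Hypothetical tracker that flags a performer whose recent submissions are too fast."""

    def __init__(self, history_size=5, threshold_seconds=20, max_fast=1):
        self.history = deque(maxlen=history_size)  # keeps only the last N durations
        self.threshold_seconds = threshold_seconds
        self.max_fast = max_fast

    def add(self, seconds_spent):
        """Record one assignment and return True if the performer should be restricted."""
        self.history.append(seconds_spent)
        fast = sum(1 for s in self.history if s < self.threshold_seconds)
        return fast > self.max_fast


tracker = FastSubmitTracker()
for seconds in [120, 15, 90, 10]:
    flagged = tracker.add(seconds)
print(flagged)  # True: two of the last four assignments took under 20 seconds
```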

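Here, we add another quality control rule, this time based on answers to control (golden) tasks, and use it to keep the custom skill up to date.

- `collector=toloka.collectors.GoldenSet(history_size=10)`: tracks the performer's answers to the last 10 control tasks.

- `conditions=[toloka.conditions.GoldenSetAnswersCount >= 3]`: the rule fires only after the performer has answered at least 3 control tasks, so a single early mistake does not define the skill value.

- `action=toloka.actions.SetSkillFromOutputField(...)`: sets the custom skill to the percentage of correct answers on those control tasks (`correct_answers_rate`).

Access by skill value is then controlled by the skill filter from the Filters section. Skill values in Toloka range from 0 to 100, so choose the filter threshold on that scale.

In [None]:
pool.quality_control.add_action(
    # Track answers to the last 10 control tasks
    collector=toloka.collectors.GoldenSet(history_size=10),
    conditions=[
        toloka.conditions.GoldenSetAnswersCount >= 3,
    ],
    # Store the share of correct answers (0-100) in the custom skill
    action=toloka.actions.SetSkillFromOutputField(
        skill_id=custom_skill.id,
        from_field='correct_answers_rate',
    ),
)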
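Skill-style quality scores are often derived from accuracy on control (golden) tasks over a recent window. The hypothetical `GoldenAccuracy` class below computes that rolling percentage; the window size and the answers are illustrative assumptions, and the result could then be compared with a threshold such as 80.

```python
from collections import deque


class GoldenAccuracy:
    """Hypothetical rolling accuracy over a performer's last N control-task answers."""

    def __init__(self, history_size=10):
        self.results = deque(maxlen=history_size)  # True for a correct answer

    def record(self, answer, expected):
        self.results.append(answer == expected)

    def rate(self):
        """Percentage of correct answers, or None if there is no data yet."""
        if not self.results:
            return None
        return 100 * sum(self.results) / len(self.results)


acc = GoldenAccuracy(history_size=4)
for answer, expected in [('cat', 'cat'), ('dog', 'cat'), ('dog', 'dog'), ('cat', 'cat')]:
    acc.record(answer, expected)
print(acc.rate())  # 75.0
```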

## Final settings and pool creation

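Next, we limit the number of assignments a single performer can complete in the pool. In Toloka-Kit, such a limit is expressed as a quality control rule: the `AnswerCount` collector counts the performer's accepted assignments, and once the count reaches 50, the performer is blocked in this pool.

In [None]:
pool.quality_control.add_action(
    collector=toloka.collectors.AnswerCount(),
    conditions=[toloka.conditions.AssignmentsAcceptedCount >= 50],
    action=toloka.actions.RestrictionV2(scope='POOL', duration_unit='PERMANENT', private_comment='Assignment limit reached'),
)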
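A per-performer cap spreads the work across more people, so the results do not depend too heavily on any single performer. The hypothetical `AssignmentLimit` class below sketches the bookkeeping behind such a cap with a counter per performer ID; it uses a cap of 2 only to keep the demo short.

```python
from collections import Counter


class AssignmentLimit:
    """Hypothetical per-performer cap on the number of assignments in one pool."""

    def __init__(self, max_count=50):
        self.max_count = max_count
        self.counts = Counter()  # performer ID -> assignments taken so far

    def try_assign(self, performer_id):
        """Count one assignment and return True, or return False if the cap is reached."""
        if self.counts[performer_id] >= self.max_count:
            return False
        self.counts[performer_id] += 1
        return True


limit = AssignmentLimit(max_count=2)
decisions = [limit.try_assign('w1') for _ in range(3)]
print(decisions)  # [True, True, False]
```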

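Webhook subscriptions send event notifications for the pool to your server. Each subscription refers to a specific pool, so it is created once the pool exists.

Next, we create the pool, subscribe to its events and open it so performers can start working on the tasks.

In [None]:
pool = toloka_client.create_pool(pool)

# Notify your server when assignments in this pool are submitted or approved
toloka_client.upsert_webhook_subscriptions([
    toloka.webhook_subscription.WebhookSubscription(
        webhook_url='https://your-webhook-url.com',
        event_type=toloka.webhook_subscription.WebhookSubscription.EventType.ASSIGNMENT_SUBMITTED,
        pool_id=pool.id,
    ),
    toloka.webhook_subscription.WebhookSubscription(
        webhook_url='https://your-webhook-url.com',
        event_type=toloka.webhook_subscription.WebhookSubscription.EventType.ASSIGNMENT_APPROVED,
        pool_id=pool.id,
    ),
])

toloka_client.open_pool(pool.id)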

## Semi-Manual Acceptance and Rejection of Assignments

In some cases, you might want to review the assignments submitted by performers manually and decide whether to accept or reject them.

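The usual pipeline is as follows:
- Get an assignment by ID;
- Review the assignment;
- Accept or reject the assignment.

Manual review works when `auto_accept_solutions` is set to `False` in the pool settings; with `auto_accept_solutions=True`, as in the pool above, submitted assignments are accepted automatically.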


In [None]:
# To retrieve an assignment using its ID, you can use the get_assignment function.
assignment_id = 'your_assignment_id'
assignment = toloka_client.get_assignment(assignment_id)

In [None]:
# If you're satisfied with the performer's work, you can accept the assignment using the accept_assignment function.
toloka_client.accept_assignment(assignment_id, 'Well done!')

In [None]:
# If the assignment doesn't meet your expectations or requirements, you can reject it using the reject_assignment function.
toloka_client.reject_assignment(assignment_id, 'Please follow the instructions carefully.')
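A semi-manual review usually applies a few simple checks to the performer's answers before a human looks at borderline cases. The hypothetical `review` helper below rejects an assignment if any required answer is empty; the field names come from the interface defined earlier, and the rule itself is only an example. With the real client, the answers of an assignment are available in `assignment.solutions`, one entry per task, each with an `output_values` dict.

```python
def review(output_values, required=('text_result', 'radio_result')):
    """Hypothetical review rule: reject if any required answer is missing or empty.

    Returns a (decision, comment) pair that could be passed on to
    accept_assignment or reject_assignment.
    """
    missing = [name for name in required if not output_values.get(name)]
    if missing:
        return 'reject', 'Please fill in: ' + ', '.join(missing)
    return 'accept', 'Well done!'


print(review({'text_result': 'a cat', 'radio_result': 'option_1'}))  # ('accept', 'Well done!')
print(review({'text_result': ''}))  # ('reject', 'Please fill in: text_result, radio_result')
```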

# Get responses

In this section, we define a function to monitor the progress of the pool and wait for its completion

In [None]:
pool_id = pool.id

def wait_pool_for_close(pool_id, minutes_to_wait=1):
    sleep_time = 60 * minutes_to_wait
    pool = toloka_client.get_pool(pool_id)
    while not pool.is_closed():
        op = toloka_client.get_analytics([toloka.analytics_request.CompletionPercentagePoolAnalytics(subject_id=pool.id)])
        op = toloka_client.wait_operation(op)
        percentage = op.details['value'][0]['result']['value']
        print(
            f'   {datetime.datetime.now().strftime("%H:%M:%S")}\t'
            f'Pool {pool.id} - {percentage}%'
        )
        time.sleep(sleep_time)
        pool = toloka_client.get_pool(pool.id)
    print('Pool was closed.')

wait_pool_for_close(pool_id)
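The loop above waits for as long as the pool stays open. If you prefer to stop waiting after a fixed time, a polling helper with a timeout is a small change. The hypothetical `wait_until` below is a stdlib sketch that accepts injectable `sleep` and `clock` functions so it can be demonstrated without actually waiting; with the Toloka client, the predicate could be `lambda: toloka_client.get_pool(pool_id).is_closed()`.

```python
import time


def wait_until(predicate, timeout_seconds, interval_seconds=60, *,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() until it returns True or timeout_seconds elapse.

    Returns True if the predicate succeeded and False on timeout.
    """
    deadline = clock() + timeout_seconds
    while True:
        if predicate():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline
        sleep(min(interval_seconds, remaining))


# Demo with a simulated clock so the example runs instantly
now = [0.0]
progress = iter([40, 80, 100])  # hypothetical completion percentages
finished = wait_until(
    lambda: next(progress) == 100,
    timeout_seconds=600,
    interval_seconds=60,
    sleep=lambda s: now.__setitem__(0, now[0] + s),
    clock=lambda: now[0],
)
print(finished, now[0])  # True 120.0
```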

Next, we use the Toloka API to retrieve the completed assignments from the specified pool and store the results in a pandas DataFrame called `answers_df`. This DataFrame will contain all the information about the completed tasks, such as the input values, the performer's answers, and other metadata. You can then use this DataFrame to analyze the results or perform additional quality control steps.

In [None]:
answers_df = toloka_client.get_assignments_df(pool_id)

## Aggregate the results

For this step, see the example notebooks in the Crowd-Kit repository:
https://github.com/Toloka/crowd-kit/tree/main/examples
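Crowd-Kit provides aggregation methods such as majority vote and Dawid-Skene. To show the basic idea behind combining overlapping answers, here is a plain majority vote over `(task_id, worker_id, label)` triples; it is a simplified stdlib sketch, not Crowd-Kit's implementation, and it breaks ties by taking the label that appeared first for that task.

```python
from collections import Counter, defaultdict


def majority_vote(answers):
    """Aggregate (task_id, worker_id, label) triples by simple majority.

    Ties are broken by the label that appears first among the answers for
    that task, which keeps the result deterministic.
    """
    by_task = defaultdict(list)
    for task_id, _worker_id, label in answers:
        by_task[task_id].append(label)
    result = {}
    for task_id, labels in by_task.items():
        counts = Counter(labels)
        top = max(counts.values())
        # First label, in answer order, that reaches the top count
        result[task_id] = next(label for label in labels if counts[label] == top)
    return result


answers = [
    ('t1', 'w1', 'cat'), ('t1', 'w2', 'cat'), ('t1', 'w3', 'dog'),
    ('t2', 'w1', 'dog'), ('t2', 'w2', 'cat'),
]
print(majority_vote(answers))  # {'t1': 'cat', 't2': 'dog'}
```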

## Summary

In this notebook, we covered the following main aspects:

- Authorization in Toloka API: authorizing with Toloka using an OAuth token.

- Creating a new project: Describes the process of creating a new project in Toloka, adding a description, and public instructions.

- Task interface description: Includes creating a task interface using various components like text fields, radio buttons, checkboxes, and dropdowns.

- Creating and setting up a training pool: Describes the process of creating and setting up a training pool for training performers before they work on actual tasks.

- Creating and setting up the main pool: Involves creating and configuring the main pool for tasks execution, setting up performer filters, dynamic pricing, limits, and quality control mechanisms.

- Preparing and uploading data: Preparing data for creating tasks and uploading them to the pool.

- Launching the pool and tracking progress: Describes the process of launching the pool and tracking its progress until it is closed.

- Retrieving results: Explains how to retrieve the results of completed tasks and save them in a DataFrame.

This notebook provides a detailed guide to creating, configuring, and managing Toloka projects using Python and the Toloka API, as well as to various quality control mechanisms and workflow optimizations.