# How to collect images for a dataset

The goal of this project is to collect images of dogs and cats for a dataset. The dataset will contain photos of cats, photos of dogs, and photos that show neither animal.

Performers will be asked to take a photo of their pet and specify the type of animal.

A real project of this kind should be split into separate validation and markup subprojects to make sure each photo is correct and contains the object it claims to. This example is simplified and skips that split.

In [None]:
# Prepare the environment and import everything you'll need
!pip install toloka-kit==0.1.5

import datetime
import time

import toloka.client as toloka
import toloka.client.project.template_builder as tb

Click [here](https://github.com/Toloka/toloka-kit/blob/main/README.md) to learn about Toloka and how to get an OAuth token.

[Image segmentation example](https://github.com/Toloka/toloka-kit/blob/main/examples/image_segmentation/image_segmentation.ipynb).

In [None]:
# Create a toloka-client instance
# All API calls will go through it
try:
    token = input("Enter your token:")
    toloka_client = toloka.TolokaClient(token, 'PRODUCTION')  # Or switch to 'SANDBOX'
    # The lines below verify the OAuth token and check that the account balance is sufficient
    requester = toloka_client.get_requester()
    print('Enough money on your account:', requester.balance > 3.0)
except Exception:
    print('You probably entered an invalid token. Please, run this cell again.')
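If you rerun the notebook often, typing the token each time gets tedious. The helper below is a minimal sketch of one alternative: it looks for the token in an environment variable first and only prompts when the variable is missing. The variable name `TOLOKA_TOKEN` and the function `read_token` are hypothetical and not part of toloka-kit.

```python
import os
from getpass import getpass


def read_token(env=os.environ, prompt=getpass):
    """Return the OAuth token from the environment, or ask for it interactively.

    TOLOKA_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    token = env.get('TOLOKA_TOKEN', '').strip()
    if not token:
        # getpass hides the typed value, so the token does not end up in the cell output
        token = prompt('Enter your token: ').strip()
    if not token:
        raise ValueError('An empty token was provided.')
    return token
```

The result can be passed to `toloka.TolokaClient` in place of the `input(...)` value used above.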

---
---
## Starting a project

Note: Go to the next section to get results for the **already launched project**.

### Create a new project

Prepare the task interface.

The task interface should:
- Contain the description of the task.
- Permit uploading images.
- Allow selecting the type of object depicted in the image.

Structure of output data:

In [None]:
output_specification = {
    'image': toloka.project.field_spec.FileSpec(),
    'label': toloka.project.field_spec.StringSpec(allowed_values=['cat', 'dog', 'none'])
}
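To make the shape of a response concrete, here is a plain-Python sketch that checks a single `output_values` dictionary against the same rules. It does not use toloka-kit. It assumes the `image` field arrives as a non-empty file ID string, and the helper name `find_output_problems` is made up for this example.

```python
ALLOWED_LABELS = {'cat', 'dog', 'none'}  # same values as in output_specification


def find_output_problems(output_values):
    """Return a list of human-readable problems; an empty list means the response looks valid."""
    problems = []
    image = output_values.get('image')
    # Assumption: the uploaded file is represented by its ID, a non-empty string
    if not isinstance(image, str) or not image:
        problems.append('image: expected a non-empty file ID')
    label = output_values.get('label')
    if label not in ALLOWED_LABELS:
        problems.append(f'label: {label!r} is not one of {sorted(ALLOWED_LABELS)}')
    return problems
```

A check like this can be handy later, when you filter downloaded responses before adding them to the dataset.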

Configure the task interface.

Click [here](https://yandex.com/support/toloka-tb/index.html) to learn more about Template Builder, an environment for task interface configuration.

In [None]:
# Radio buttons to choose the label type
radio_group_field = tb.fields.RadioGroupFieldV1(
    data=tb.data.OutputData(path='label'),
    label='What\'s in your photograph?',
    validation=tb.conditions.RequiredConditionV1(),
    options=[
        tb.fields.GroupFieldOption(label='Cat', value='cat'),
        tb.fields.GroupFieldOption(label='Dog', value='dog'),
        tb.fields.GroupFieldOption(label='Neither a cat nor a dog', value='none'),
    ]
)

# Buttons for loading an image or taking a photo
image_loader = tb.fields.MediaFileFieldV1(
    label='Upload a photo of your cat or your dog. Read the instructions carefully.',
    data=tb.data.OutputData(path='image'),
    validation=tb.conditions.RequiredConditionV1(),
    accept=tb.fields.MediaFileFieldV1.Accept(photo=True, gallery=True),
    multiple=False,
)

# How performers will see the task
project_interface = toloka.project.view_spec.TemplateBuilderViewSpec(
    config=tb.TemplateBuilder(
        view=tb.view.ListViewV1(items=[image_loader, radio_group_field])
    ),
    settings={
        'showSubmit': True,
        'showFinish': True,
        'showTimer': True,
        'showReward': True,
        'showTitle': True,
        'showRoute': True,
        'showComplain': True,
        'showMessage': True,
        'showSubmitExit': True,
        'showFullscreen': True,
        'showInstructions': True,
    },
)

public_instruction = """If your pet is a cat or a dog, take a picture of it and select the appropriate label.<br><br>
If you don't have a cat or a dog, take a photo of anything and select the "Neither a cat nor a dog" label. There should be exactly one animal in the photo, clearly visible, not cropped. The animal can be photographed from any side and in any position. You can take a picture of a pet in your arms.<br><br>
It should be clearly visible what animal is depicted (e.g. do not photograph your pet's back in the dark).
"""

# Create a project
new_project = toloka.project.Project(
    assignments_issuing_type=toloka.project.Project.AssignmentsIssuingType.AUTOMATED,
    public_name='Take a photo of your pet',
    public_description='If you have a cat or a dog, take a picture of it. If you don\'t have any such animals, take a random photo.',
    public_instructions=public_instruction,
    # Set up the task interface and output parameters
    task_spec=toloka.project.task_spec.TaskSpec(
        input_spec={'label': toloka.project.field_spec.StringSpec(required=False, hidden=True)},
        output_spec=output_specification,
        view_spec=project_interface,
    ),
)

# An API request to create a new project
new_project = toloka_client.create_project(new_project)
print(f'Created project with id {new_project.id}')
print(f'To view the project, go to https://toloka.yandex.com/requester/project/{new_project.id}')
# print(f'To view the project, go to https://sandbox.toloka.yandex.com/requester/project/{new_project.id}') # Print a sandbox version link

### Create a pool

Create a task pool and set its quality control rules.

This project allows only one answer per performer. This can be implemented with a skill:

1. A performer gets the skill after sending a response.
2. The performers with the skill are not allowed to perform the task.
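The two steps above can be modeled in a few lines of plain Python. This is only an illustration of the logic, not toloka-kit code: the dictionary stands in for one performer's skills, and `PET_SKILL_ID` is a hypothetical identifier.

```python
PET_SKILL_ID = 'pet-photo'  # hypothetical skill identifier for this sketch


def can_take_task(performer_skills, skill_id):
    """Mirror of the pool filter: only performers without the skill see the task."""
    return performer_skills.get(skill_id) is None


def register_response(performer_skills, skill_id):
    """Mirror of the quality control rule: a submitted response sets the skill to 1."""
    performer_skills[skill_id] = 1


skills = {}                                   # a performer who has not answered yet
first = can_take_task(skills, PET_SKILL_ID)   # allowed before the first response
register_response(skills, PET_SKILL_ID)
second = can_take_task(skills, PET_SKILL_ID)  # blocked after the first response
print(first, second)  # True False
```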

In [None]:
# Create a skill
skill_name = 'Pet photo'
pet_skill = next(toloka_client.get_skills(name=skill_name), None)
if pet_skill:
    print('Skill already exists')
else:
    print('Creating new skill')
    pet_skill = toloka_client.create_skill(
        name=skill_name,
        hidden=True,
        public_requester_description={'EN': 'The performer took a photo of their pet.'},
    )

Access to tasks is granted for:

1. Toloka Mobile users.

   _Why: A phone is a convenient tool for taking photos. A phone also makes it harder to cheat by uploading a random file._

2. English-speaking performers.

   _Why: The task instruction is written in English._

In [None]:
# Create a pool
new_pool = toloka.pool.Pool(
    project_id=new_project.id,
    private_name='Pool 1',
    may_contain_adult_content=False,
    will_expire=datetime.datetime.utcnow() + datetime.timedelta(days=365),
    reward_per_assignment=0.05,
    auto_accept_solutions=True,
    assignment_max_duration_seconds=60*10,
    defaults=toloka.pool.Pool.Defaults(default_overlap_for_new_task_suites=1),
    filter=(
        (toloka.filter.Languages.in_('EN')) &
        (toloka.filter.Skill(pet_skill.id) == None) &
        (toloka.filter.ClientType == toloka.filter.ClientType.ClientType.TOLOKA_APP)
    ),
)

# Automatically updating skills
new_pool.quality_control.add_action(
    collector=toloka.collectors.AnswerCount(),
    # If the performer completed at least one task,
    conditions=[toloka.conditions.AssignmentsAcceptedCount > 0],
    # This sets the skill value to 1; it does not increment an existing value
    action=toloka.actions.SetSkill(skill_id=pet_skill.id, skill_value=1),
)

new_pool = toloka_client.create_pool(new_pool)
print(f'Created pool with id {new_pool.id}')
print(f'To view the pool, go to https://toloka.yandex.com/requester/project/{new_project.id}/pool/{new_pool.id}')
# print(f'To view this pool, go to https://sandbox.toloka.yandex.com/requester/project/{new_project.id}/pool/{new_pool.id}') # Print a sandbox version link

Open the project for preview.

Mobile devices will display the task like this:

<table  align="center">
  <tr><td>
    <img src="./img/performer_interface.png"
         alt="How performers will see your task on mobile"  height="600">
  </td></tr>
  <tr><td align="center">
    <b>Figure 1.</b> How performers will see your task on mobile
  </td></tr>
</table>

Note: In preview mode you won't be able to upload an image and look at the result. This restriction is related to the preview features and doesn't affect performers.

### Add a task and run the project
Add one task.

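Adjust the number of images you want to get by changing the overlap. The single task is issued once per unit of overlap, and each performer can answer only once, so an overlap of 10 yields up to 10 images.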

In [None]:
image_count = 10  # How many images you will receive.
new_tasks_suite = toloka.task_suite.TaskSuite(
    pool_id=new_pool.id,
    tasks=[toloka.task.Task(input_values={'label': 'Cats vs Dogs'})],
    overlap=image_count,
)

# Add task suites to the pool
toloka_client.create_task_suite(new_tasks_suite)
print(f'Added a task suite to pool {new_pool.id}')
print(f'To view this pool, go to https://toloka.yandex.com/requester/project/{new_project.id}/pool/{new_pool.id}')
# print(f'To view this pool, go to https://sandbox.toloka.yandex.com/requester/project/{new_project.id}/pool/{new_pool.id}') # Print a sandbox version link

# Open the pool
new_pool = toloka_client.open_pool(new_pool.id)
pool_id = new_pool.id
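Because the pool holds one task with an overlap of `image_count`, the reward budget is simply the number of images times the reward per assignment. The sketch below works that out with `Decimal` to avoid float rounding. The `fee_rate` argument is a placeholder assumption: check the platform's current pricing for the real value, since the 0.3 used here is made up for illustration.

```python
from decimal import Decimal


def estimate_pool_cost(image_count, reward_per_assignment, fee_rate):
    """Return (rewards, total) for a pool with one task and overlap == image_count.

    fee_rate is a hypothetical platform fee expressed as a fraction of the rewards.
    """
    rewards = Decimal(str(reward_per_assignment)) * image_count
    fee = rewards * Decimal(str(fee_rate))
    return rewards, rewards + fee


# 10 images at 0.05 each, with an illustrative (not official) 30% fee
rewards, total = estimate_pool_cost(image_count=10, reward_per_assignment=0.05, fee_rate=0.3)
print(f'Rewards: {rewards}, total with fee: {total}')
```

Comparing the total with the balance reported by `get_requester()` before opening the pool avoids a launch that stalls for lack of funds.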

## Getting responses

Wait for performers to complete the tasks, then download the results.

### If your work with the notebook was interrupted

If you launched the tasks but then reloaded the notebook:

1. Uncomment the line in the cell below and set your pool ID.
2. Run all the code cells.

If you are running the notebook for the first time right now, **skip the next cell**.

In [None]:
# pool_id = 22791482

### Wait for the responses

Wait for all the tasks in the pool to be completed.

In [None]:
def wait_pool_for_close(pool_id, minutes_to_wait=1):
    sleep_time = 60 * minutes_to_wait
    pool = toloka_client.get_pool(pool_id)
    while not pool.is_closed():
        op = toloka_client.get_analytics([toloka.analytics_request.CompletionPercentagePoolAnalytics(subject_id=pool.id)])
        op = toloka_client.wait_operation(op)
        percentage = op.details['value'][0]['result']['value']
        print(
            f'   {datetime.datetime.now().strftime("%H:%M:%S")}\t'
            f'Pool {pool.id} - {percentage}%'
        )
        time.sleep(sleep_time)
        pool = toloka_client.get_pool(pool.id)
    print('Pool was closed.')

wait_pool_for_close(pool_id)
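The loop above waits indefinitely, which can leave a notebook hanging if the pool never fills up. As a hedged alternative, here is a generic polling helper with a timeout. The name `wait_until` and its parameters are assumptions made for this sketch; the check itself is passed in as a callable.

```python
import time


def wait_until(is_done, interval_seconds=60, timeout_seconds=3600,
               sleep=time.sleep, clock=time.monotonic):
    """Poll is_done() until it returns True or the timeout expires.

    Returns True if the condition was met and False on timeout.
    sleep and clock are injectable so the helper can be tested without waiting.
    """
    deadline = clock() + timeout_seconds
    while not is_done():
        if clock() >= deadline:
            return False
        sleep(interval_seconds)
    return True
```

With the client from this notebook, a call could look like `wait_until(lambda: toloka_client.get_pool(pool_id).is_closed())`, which reuses the same `get_pool` and `is_closed` calls as the loop above.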

Download the results. 

Note: Download the file IDs, not the files themselves. The files are only needed right before you review them.

In [None]:
results_list = []

for assignment in toloka_client.get_assignments(pool_id=pool_id, status=toloka.assignment.Assignment.ACCEPTED):
    for solution in assignment.solutions:
        results_list.append(solution.output_values)
print(len(results_list))
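Before looking at individual photos, it helps to see how the labels are distributed. The sketch below counts labels with `collections.Counter`. The `sample_results` list is invented data with the same shape as `results_list`, and `count_labels` is a helper name introduced here.

```python
from collections import Counter


def count_labels(results):
    """Count how many responses were submitted with each label."""
    return Counter(result['label'] for result in results)


# Hypothetical responses shaped like the entries of results_list
sample_results = [
    {'image': 'file-id-1', 'label': 'cat'},
    {'image': 'file-id-2', 'label': 'dog'},
    {'image': 'file-id-3', 'label': 'cat'},
    {'image': 'file-id-4', 'label': 'none'},
]

for label, count in count_labels(sample_results).most_common():
    print(f'{label}: {count}')
```

Calling `count_labels(results_list)` gives the same summary for the real responses.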

---
---
## Showing results

Configure data display.

In [None]:
!pip install ipyplot
from PIL import Image
import ipyplot

results_iter = iter(results_list)

Run the cell below multiple times to see different responses.

In [None]:
res = next(results_iter, None)
if res is not None:
    with open('tmp_image_file', 'w+b') as out_f:
        toloka_client.download_attachment(res['image'], out_f)
        image = Image.open(out_f).convert("RGBA")
        print(f"label: '{res['label']}'")
        ipyplot.plot_images(
            [image],
            max_images=1,
            img_width=600,
        )
else:
    print('No more results')
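The cell above overwrites one temporary file on every run. To keep the images, you could store each one in a folder named after its label. The helper below is a sketch under two assumptions: the file ID is safe to use as a file name, and `download` is any callable with the same `(attachment_id, out_file)` shape as the `toloka_client.download_attachment` call used above. The name `save_result` and the `dataset` folder are made up for this example.

```python
from pathlib import Path


def save_result(download, result, root='dataset'):
    """Save one response as <root>/<label>/<file id> and return the path.

    download(attachment_id, out_file) must write the file's bytes to out_file.
    """
    label_dir = Path(root) / result['label']
    label_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the file ID contains no characters that are invalid in file names
    target = label_dir / result['image']
    with open(target, 'wb') as out_f:
        download(result['image'], out_f)
    return target
```

For example, `save_result(toloka_client.download_attachment, res)` would place a photo labeled `cat` under `dataset/cat/`.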

## Summary

This project uses the minimum set of settings needed to collect labeled images for your dataset.

In real projects you should also configure:
- Non-automatic acceptance, so that you have time to review the images.
- A linked project for validating the photos and labeling the object type.
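One possible way to combine the two points is to accept or reject each photo based on the labels that validators assign in the linked project. The following sketch shows a simple majority rule in plain Python. It is an illustration only: the function `review_decision`, the `min_agreement` threshold, and the three outcomes are assumptions, not something this notebook or toloka-kit prescribes.

```python
from collections import Counter


def review_decision(claimed_label, validator_labels, min_agreement=0.5):
    """Decide what to do with a photo given the labels chosen by validators.

    Returns 'accept' when a clear majority confirms the claimed label,
    'reject' when a clear majority picks a different label,
    and 'undecided' when no label exceeds the agreement threshold.
    """
    if not validator_labels:
        return 'undecided'
    top_label, votes = Counter(validator_labels).most_common(1)[0]
    if votes / len(validator_labels) <= min_agreement:
        return 'undecided'
    return 'accept' if top_label == claimed_label else 'reject'
```

Photos that end up `undecided` could be sent to additional validators before a final decision is made.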