# Survey manual

### Call to action
If you find a bug or have an idea for a new feature, don't hesitate to [open a new issue on GitHub](https://github.com/Toloka/toloka-kit/issues/new/choose).
Like our library and examples? Star [our repo on GitHub](https://github.com/Toloka/toloka-kit).

Prepare the environment and import everything we'll need.

In [None]:
%%capture
!pip install toloka-kit==0.1.26
!pip install pandas
!pip install plotly

import datetime
import getpass  # used below to read the OAuth token without echoing it
import sys
import time
import logging

import plotly.express as px
import pandas

import toloka.client as toloka
import toloka.client.project.template_builder as tb

In [None]:
logging.basicConfig(
    format='[%(levelname)s] %(name)s: %(message)s',
    level=logging.INFO,
    stream=sys.stdout,
)

Create a Toloka client instance. All API calls will go through it. More about OAuth tokens in our [Learn the basics example](https://github.com/Toloka/toloka-kit/tree/main/examples/0.getting_started/0.learn_the_basics) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/0.getting_started/0.learn_the_basics/learn_the_basics.ipynb)

In [None]:
toloka_client = toloka.TolokaClient(getpass.getpass('Enter your OAuth token: '), 'PRODUCTION') # Or switch to 'SANDBOX'
logging.info(toloka_client.get_requester())

## Create a project
Enter a clear project name and description.
> The project name and description will be visible to the performers.

In [None]:
project = toloka.Project(
    public_name='Survey on stress management',
    public_description='This survey will take about 1-2 minutes',
)

Create the task interface.
Each question is described by one field:

  - RadioGroupFieldV1 is used for questions with only one possible answer.
  - CheckboxGroupFieldV1 is used for questions with several possible answers.


To replace a question or add a new one:

  - create a new field,
  - add it to the ListViewV1 in project_interface,
  - add its output path to output_specification below.

In [None]:
work_mode_field = tb.RadioGroupFieldV1(
    tb.OutputData('workmode'),
    [
        tb.GroupFieldOption('office', 'Office'),
        tb.GroupFieldOption('home', 'Home office'),
    ],
    label='Where do you work?',
    validation=tb.RequiredConditionV1(hint='Select an option'),
)

stress_field = tb.RadioGroupFieldV1(
    tb.OutputData('stress'),
    [
        tb.GroupFieldOption('alot', 'A lot'),
        tb.GroupFieldOption('notmuch', 'Not much'),
    ],
    label='Is there a lot of stress in your everyday life?',
    validation=tb.RequiredConditionV1(hint='Select an option'),
)

cope_field = tb.CheckboxGroupFieldV1(
    tb.OutputData('coping'),
    [
        tb.GroupFieldOption('family', 'Spending time with family'),
        tb.GroupFieldOption('sleeping', 'Sleeping'),
        tb.GroupFieldOption('goingout', 'Going out to restaurants, cinemas etc'),
        tb.GroupFieldOption('sport', 'Sport'),
        tb.GroupFieldOption('meditation', 'Meditation'),
        tb.GroupFieldOption('therapy', 'Therapy'),
        tb.GroupFieldOption('alcohol', 'Alcohol'),
        tb.GroupFieldOption('other', 'Other'),
        tb.GroupFieldOption('none', 'None of the above'),
    ],
    label='How do you cope with stress? You can select several options',
    validation=tb.RequiredConditionV1(hint='Choose one or more options'),
)

meditation_field = tb.RadioGroupFieldV1(
    tb.OutputData('meditation'),
    [
        tb.GroupFieldOption('practice', 'I practice meditation'),
        tb.GroupFieldOption('usedtopractice', 'I used to practice meditation'),
        tb.GroupFieldOption('wanttotry', 'I have never practiced but I\'d like to try'),
        tb.GroupFieldOption('dontwant', 'I have never practiced and I don\'t want to try'),
    ],
    label='How do you feel about meditation?',
    validation=tb.RequiredConditionV1(hint='Select an option'),
)

# Add an attention check question (or several). Since there are no correct
# answers to a survey and we can’t just check if they are right or wrong,
# we need to use some workaround techniques to ensure quality.
honeypot_field = tb.RadioGroupFieldV1(
    tb.OutputData('honeypot'),
    [
        tb.GroupFieldOption('yes', 'Yes'),
        tb.GroupFieldOption('no', 'No'),
    ],
    label='Are you now completing a survey on Toloka?',
    validation=tb.RequiredConditionV1(hint='Select an option'),
)

mobile_apps_field = tb.RadioGroupFieldV1(
    tb.OutputData('apps'),
    [
        tb.GroupFieldOption('yes', 'Yes'),
        tb.GroupFieldOption('dontneed', 'No, I don\'t need them'),
        tb.GroupFieldOption('dontpay', 'No, I\'m not ready to pay'),
    ],
    label='Do you buy mobile apps?',
    validation=tb.RequiredConditionV1(hint='Select an option'),
)

project_interface = toloka.project.TemplateBuilderViewSpec(
    view=tb.ListViewV1(
        [
            work_mode_field,
            stress_field,
            cope_field,
            meditation_field,
            honeypot_field,
            mobile_apps_field
        ]
    ),
    plugins=[tb.TolokaPluginV1(kind='scroll', task_width=500)],
)

Make sure the specifications include all output data paths that you have created.
> Specifications are a description of input data that will be used in a project and the output data that will be collected from the performers.

Read more about [input and output data specifications](https://yandex.ru/support/toloka-tb/operations/create-specs.html?utm_source=github&utm_medium=site&utm_campaign=tolokakit) in the Requester’s Guide.

In [None]:
input_specification = {'theme': toloka.project.StringSpec()}
output_specification = {
    'workmode': toloka.project.StringSpec(),
    'stress': toloka.project.StringSpec(),
    'coping': toloka.project.JsonSpec(),
    'meditation': toloka.project.StringSpec(),
    'honeypot': toloka.project.StringSpec(),
    'apps': toloka.project.StringSpec(),
}

project.task_spec = toloka.project.task_spec.TaskSpec(
    input_spec=input_specification,
    output_spec=output_specification,
    view_spec=project_interface,
)
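A forgotten path is easy to miss, so it can help to compare the paths written by the interface with the keys of output_specification before creating the project. The sketch below is plain Python and does not call toloka-kit: the path list is copied by hand from the tb.OutputData(...) calls above, and missing_output_specs and unused_output_specs are hypothetical helpers.

```python
# Output data paths used by the fields in project_interface
# (copied by hand from the tb.OutputData(...) calls).
interface_paths = ['workmode', 'stress', 'coping', 'meditation', 'honeypot', 'apps']

# Keys of output_specification (the spec types are irrelevant for this check).
output_spec_keys = {'workmode', 'stress', 'coping', 'meditation', 'honeypot', 'apps'}


def missing_output_specs(paths, spec_keys):
    """Return interface paths that have no matching output specification."""
    return [path for path in paths if path not in spec_keys]


def unused_output_specs(paths, spec_keys):
    """Return specification keys that no interface field writes to."""
    return sorted(set(spec_keys) - set(paths))


print(missing_output_specs(interface_paths, output_spec_keys))  # []
print(unused_output_specs(interface_paths, output_spec_keys))   # []
```

Both lists should be empty; a non-empty result points at the field or spec entry to fix.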

If there is anything important about the survey that the performers should know, put it in the instructions. In that case, the attention check question can be based on this information.

In [None]:
project.public_instructions = """We are conducting research on how people cope with stress in their everyday life<br>
Answer the questions by selecting one or more possible answers."""

Create the project.

In [None]:
project = toloka_client.create_project(project)

## Create a pool
A pool is a set of paid tasks grouped into task pages. These tasks are sent out for completion at the same time.
> All tasks within a pool have the same settings (price, quality control, etc.).

We will use manual (non-automatic) acceptance: an assignment is accepted only if the attention check question is answered correctly.

In [None]:
pool = toloka.Pool(
    project_id=project.id,
    # Give the pool any name you find suitable. You are the only one who will see it.
    private_name='Survey on stress management',
    may_contain_adult_content=False,
    # Set the price per task page.
    reward_per_assignment=0.01,
    # We will check the completed tasks manually before paying for them.
    auto_accept_solutions=False,
    # Number of days after which submitted assignments are accepted automatically if we haven't reviewed them.
    auto_accept_period_day=1,
    will_expire=datetime.datetime.utcnow() + datetime.timedelta(days=365),
    # Overlap. This is the number of users who will complete the same task.
    defaults=toloka.Pool.Defaults(default_overlap_for_new_task_suites=50),
    # Time allowed for completing a task page
    assignment_max_duration_seconds=60*10,
)
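With auto_accept_solutions=False, the acceptance decision boils down to comparing the attention check answer with the known solution. Here is a plain-Python sketch of that decision. It is an illustration, not how Toloka implements it: review_decision is a hypothetical helper, and output_values is assumed to be a dict shaped like one task's output from an assignment.

```python
# Known solution for the attention check question (mirrors the task's known_solutions).
KNOWN_SOLUTION = {'honeypot': 'yes'}


def review_decision(output_values, known_solution=KNOWN_SOLUTION):
    """Return 'ACCEPT' if every control field matches the known solution, else 'REJECT'."""
    for field, expected in known_solution.items():
        # A missing control field counts as a wrong answer.
        if output_values.get(field) != expected:
            return 'REJECT'
    return 'ACCEPT'


print(review_decision({'workmode': 'home', 'honeypot': 'yes'}))   # ACCEPT
print(review_decision({'workmode': 'office', 'honeypot': 'no'}))  # REJECT
```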

Select English-speaking performers.


Allow access from both the Toloka web version and the Toloka mobile app. Most surveys are easy to complete on a mobile device, and this will speed up pool completion.


We would like to run our survey among people living in the UK and the USA who are over 30.
> Please note that personal information like dates of birth is provided by Tolokers themselves. The platform does not control the accuracy of this info. The region can be double-checked using the Region by IP parameter.

In [None]:
pool.filter = (
    (toloka.filter.Languages.in_('EN')) &
    ((toloka.filter.ClientType == 'BROWSER') | (toloka.filter.ClientType == 'TOLOKA_APP')) &
    ((toloka.filter.Country == 'US') | (toloka.filter.Country == 'GB')) &
    (toloka.filter.RegionByIp.in_(102) | toloka.filter.RegionByIp.in_(84)) &
    # %m is the month; %M would parse minutes instead.
    (toloka.filter.DateOfBirth < int(datetime.datetime.strptime('01.01.1991', '%d.%m.%Y').timestamp()))
)
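The DateOfBirth condition above is compared with a Unix timestamp in seconds. A minimal sketch of deriving that cutoff from a minimum age, assuming a reference date of 1 January 2021 (which reproduces the 01.01.1991 cutoff for "over 30") and UTC. Note that the cell above converts a naive datetime, which Python interprets in the machine's local time zone. birth_date_cutoff is a hypothetical helper.

```python
from datetime import datetime, timezone


def birth_date_cutoff(reference_date, min_age_years):
    """Unix timestamp (seconds) of the date exactly min_age_years before reference_date."""
    # replace() raises ValueError for a 29 February reference date in a non-leap target year.
    cutoff = reference_date.replace(year=reference_date.year - min_age_years)
    return int(cutoff.timestamp())


# People born before 1 January 1991 are at least 30 years old on 1 January 2021.
reference = datetime(2021, 1, 1, tzinfo=timezone.utc)
print(birth_date_cutoff(reference, 30))  # 662688000

# Parsing the same date with the correct month directive gives the same value in UTC.
parsed = datetime.strptime('01.01.1991', '%d.%m.%Y').replace(tzinfo=timezone.utc)
print(int(parsed.timestamp()))  # 662688000
```

If the survey runs later, move the reference date forward so the age threshold stays accurate.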

Set up [Quality control](https://toloka.ai/en/docs/guide/concepts/control?utm_source=github&utm_medium=site&utm_campaign=tolokakit). Read more about [configuring this rule](https://toloka.ai/en/docs/guide/concepts/goldenset?utm_source=github&utm_medium=site&utm_campaign=tolokakit) in our Requester’s Guide.


If a performer has answered at least one control question and all of their control answers are correct (100%), their assignments are accepted automatically.

In [None]:
pool.quality_control.add_action(
    collector=toloka.collectors.GoldenSet(history_size=1),
    conditions=[toloka.conditions.GoldenSetCorrectAnswersRate == 100],
    action=toloka.actions.ApproveAllAssignments()
)
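To make the condition concrete, here is a small plain-Python model of the rule above: only the last history_size control answers are kept, and approval requires at least one of them, all correct. It is a simplification for illustration; Toloka evaluates the real rule on its side, and the GoldenSetRule class is hypothetical.

```python
from collections import deque


class GoldenSetRule:
    """Plain-Python model of GoldenSet(history_size=N) with 'rate == 100 -> approve'."""

    def __init__(self, history_size=1):
        # Only the most recent `history_size` control answers are kept.
        self.history = deque(maxlen=history_size)

    def record(self, is_correct):
        self.history.append(bool(is_correct))

    def correct_answers_rate(self):
        # Percentage of correct control answers in the current window.
        if not self.history:
            return None
        return 100 * sum(self.history) / len(self.history)

    def should_approve(self):
        # At least one control answer, and all of them correct.
        return len(self.history) >= 1 and self.correct_answers_rate() == 100


rule = GoldenSetRule(history_size=1)
print(rule.should_approve())  # False: no control answers yet
rule.record(True)
print(rule.should_approve())  # True
rule.record(False)  # with history_size=1 the earlier answer is dropped
print(rule.should_approve())  # False
```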

Create a hidden “stress-management” skill whose value is the performer's rate of correct control answers, so it reflects response quality. It can later be used if you re-run the survey and need to exclude those who have already taken part in it.

In [None]:
survey_skill = next(toloka_client.get_skills(name='stress-management'), None)
if survey_skill:
    logging.info('Survey skill already exists')
else:
    survey_skill = toloka_client.create_skill(
        name='stress-management',
        hidden=True,
    )

pool.quality_control.add_action(
    collector=toloka.collectors.GoldenSet(history_size=1),
    conditions=[toloka.conditions.TotalAnswersCount > 0],
    action=toloka.actions.SetSkillFromOutputField(
        skill_id=survey_skill.id,
        from_field='correct_answers_rate',
    ),
)
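On a re-run, the skill works as a "has already taken part" marker. The sketch below shows that exclusion logic in plain Python; in a real pool you would express it as a skill-based pool filter (for example, one built with toloka.filter.Skill) rather than in notebook code. The performer IDs, skill values, and eligible_for_rerun helper are all hypothetical.

```python
# Hypothetical skill values after the first run: with history_size=1,
# correct_answers_rate is either 0 or 100.
survey_skill_values = {'performer-1': 100, 'performer-2': 0}


def eligible_for_rerun(performer_id, skill_values):
    """Allow only performers who have no value for the survey skill yet."""
    return performer_id not in skill_values


candidates = ['performer-1', 'performer-2', 'performer-3']
print([p for p in candidates if eligible_for_rerun(p, survey_skill_values)])  # ['performer-3']
```

Checking for the presence of the skill (rather than its value) excludes everyone who took part, including those who failed the attention check.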

Add the “Processing rejected and accepted assignments” rule. Each rejected assignment increases the task's overlap by one (and reopens the pool if needed), so another performer completes the task instead.

In [None]:
pool.quality_control.add_action(
    collector=toloka.collectors.AssignmentsAssessment(),
    conditions=[toloka.conditions.AssessmentEvent == 'REJECT'],
    action=toloka.actions.ChangeOverlap(delta=1, open_pool=True),
)
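The effect of ChangeOverlap(delta=1) is simple arithmetic: every rejection adds one more slot for the task. A plain-Python sketch, assuming each rejected assignment fires the rule exactly once; final_overlap is a hypothetical helper.

```python
def final_overlap(initial_overlap, rejected_count, delta=1):
    """Overlap after ChangeOverlap(delta) has fired once per rejected assignment."""
    return initial_overlap + delta * rejected_count


# 50 responses requested; if 7 are rejected, the task is issued 57 times in total,
# so the number of non-rejected responses can still reach 50.
print(final_overlap(50, 7))  # 57
```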

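Specify the number of tasks per page. Each page contains a single control task: the survey task has a known solution for the attention check question.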

In [None]:
pool.set_mixer_config(golden_tasks_count=1)

Create the pool.

In [None]:
pool = toloka_client.create_pool(pool)

## Preparing and uploading tasks
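Create the pool task. Its known solution for the attention check question is what the quality control rules compare performers' answers against.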

In [None]:
tasks = [
    toloka.Task(
        pool_id=pool.id,
        input_values={'theme': 'Stress management'},
        known_solutions=[
            toloka.task.BaseTask.KnownSolution(
                output_values={'honeypot': 'yes'}
            )
        ],
    )
]

Upload the tasks.

In [None]:
created_tasks = toloka_client.create_tasks(tasks, allow_defaults=True)
logging.info(len(created_tasks.items))

You can open the pool in the web interface and preview the performer's interface.

<table  align="center">
  <tr><td>
    <img src="./img/task_interface.png"
         alt="Task page preview"  width="1000">
  </td></tr>
  <tr><td align="center">
    <b>Figure 1.</b> What the task page preview can look like.
  </td></tr>
</table>

Start the pool.

**Important.** Remember that real Toloka performers will complete the tasks.
Double-check your project configuration before you start the pool.

In [None]:
pool = toloka_client.open_pool(pool.id)
logging.info(pool.status)

## Receiving responses

Wait until the pool is completed.

In [None]:
pool_id = pool.id

def wait_pool_for_close(pool_id, minutes_to_wait=1):
    sleep_time = 60 * minutes_to_wait
    pool = toloka_client.get_pool(pool_id)
    while not pool.is_closed():
        op = toloka_client.get_analytics([toloka.analytics_request.CompletionPercentagePoolAnalytics(subject_id=pool.id)])
        op = toloka_client.wait_operation(op)
        percentage = op.details['value'][0]['result']['value']
        logging.info(
            f'   {datetime.datetime.now().strftime("%H:%M:%S")}\t'
            f'Pool {pool.id} - {percentage}%'
        )
        time.sleep(sleep_time)
        pool = toloka_client.get_pool(pool.id)
    logging.info('Pool was closed.')

wait_pool_for_close(pool_id)
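The loop above polls until the pool closes and has no upper bound. If you want the cell to give up eventually, a timeout-guarded variant of the same polling pattern might look like this. wait_until is a hypothetical helper; the usage comment relies only on toloka_client.get_pool(...).is_closed(), which the notebook already calls.

```python
import time


def wait_until(is_done, poll_seconds=60, timeout_seconds=None,
               sleep=time.sleep, clock=time.monotonic):
    """Call is_done() every poll_seconds until it returns True or the timeout expires.

    Returns True if the condition was met, False on timeout.
    """
    deadline = None if timeout_seconds is None else clock() + timeout_seconds
    while not is_done():
        if deadline is not None and clock() >= deadline:
            return False
        sleep(poll_seconds)
    return True


# Usage in this notebook (not executed here):
# closed = wait_until(lambda: toloka_client.get_pool(pool_id).is_closed(),
#                     poll_seconds=60, timeout_seconds=24 * 60 * 60)

# Self-contained demo with a fake condition that becomes true on the third check.
checks = iter([False, False, True])
print(wait_until(lambda: next(checks), poll_seconds=0))  # True
```

Injecting sleep and clock keeps the function testable without actually waiting.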

Review the responses. Assignments with a correct answer to the attention check have already been accepted by the quality control rule. Assignments that are still in the SUBMITTED status failed the attention check, so reject them.

In [None]:
for assignment in toloka_client.get_assignments(status='SUBMITTED', pool_id=pool_id):
    toloka_client.reject_assignment(assignment.id, 'There was an attention check question that was failed.')

Let's look at the distribution of answers to each question.

In [None]:
answers_df = toloka_client.get_assignments_df(pool_id)
answers_df = answers_df.rename(columns={
    'OUTPUT:apps': 'pay for apps',
    'OUTPUT:coping': 'coping with stress',
    'OUTPUT:stress': 'stress level',
    'OUTPUT:workmode': 'work mode',
    'OUTPUT:meditation': 'using meditation',
})

Single-choice questions.

In [87]:
fig = px.histogram(answers_df, x='work mode', histnorm='percent')
fig.show()

In [90]:
fig = px.histogram(answers_df, x='stress level', histnorm='percent', color='work mode')
fig.show()

In [88]:
fig = px.histogram(answers_df, x='using meditation', histnorm='percent')
fig.show()

In [89]:
fig = px.histogram(answers_df, x='pay for apps', histnorm='percent')
fig.show()
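With histnorm='percent', each bar shows the share of respondents who chose that answer rather than a raw count. The same numbers can be computed in plain Python; percent_distribution is a hypothetical helper and the answers below are made-up sample values, not survey results.

```python
from collections import Counter


def percent_distribution(values):
    """Share of each answer in percent, as in a histogram with histnorm='percent'."""
    counts = Counter(values)
    total = sum(counts.values())
    return {answer: 100 * count / total for answer, count in counts.items()}


# Made-up answers to the "work mode" question.
work_mode = ['home', 'office', 'home', 'home']
print(percent_distribution(work_mode))  # {'home': 75.0, 'office': 25.0}
```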

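Multiple-choice question. Each answer is stored as a JSON string, so it is parsed and flattened into one column per option before plotting.

In [94]: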
import json
coping_df = pandas.json_normalize(answers_df['coping with stress'].apply(json.loads))
fig = px.histogram(coping_df, barmode='group')
fig.show()
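If you only need the number of respondents who selected each coping option, you can tally the raw answers directly. This sketch assumes each answer is a JSON object mapping option values to booleans (for example, '{"family": true, "sport": false}'), which is how the checkbox output is expected to look; check a few rows of answers_df before relying on it. count_selected_options and the sample rows are hypothetical.

```python
import json
from collections import Counter


def count_selected_options(raw_answers):
    """Count how many respondents selected each checkbox option.

    Assumes each raw answer is a JSON object mapping option value -> bool.
    """
    counts = Counter()
    for raw in raw_answers:
        selected = json.loads(raw)
        # Only options that were actually checked are counted.
        counts.update(option for option, checked in selected.items() if checked)
    return counts


raw = [
    '{"family": true, "sleeping": true, "sport": false}',
    '{"family": false, "sleeping": true, "sport": true}',
]
print(dict(count_selected_options(raw)))  # {'family': 1, 'sleeping': 2, 'sport': 1}
```

In the notebook this would be called as count_selected_options(answers_df['coping with stress']).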