## Pipeline for collecting a text summarization dataset

The main goal of this pipeline is to collect a dataset of text-summary pairs for later use in NLP models.
We work with texts of 4000 to 8000 characters.

## Decomposition

To solve this task, we decompose the work into several stages, as shown in the figure below.

<table  align="center">
  <tr><td>
    <img src="./pictures/pipeline.png"
         alt="Pipeline"  width="1200">
  </td></tr>
</table>

### Input data
Any texts can be used, but expect better results for shorter texts. In this example, we use texts parsed from various Russian-language web pages.  

We also do not recommend using Wikipedia articles as input: they are already concise and informative, so it is hard to condense them further.

In [None]:
import datetime
import json  # used to parse the JSON Lines input file
import os
import time  # used when polling pool status

import toloka.client as toloka
import toloka.client.project.template_builder as tb

In [None]:
LANGUAGE="ru"

In [None]:
token = input("Enter your token:")
toloka_client = toloka.TolokaClient(token, 'SANDBOX')  # or switch to PRODUCTION
print(toloka_client.get_requester())

### Prepare text data

In this example, the input is a file in JSON Lines format: each line holds a JSON object of the form `{"text": "###", "url": "###"}`.  
If your data is in a different format, you need to write your own iterator over the texts in your dataset.
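A minimal sketch of such an iterator for the JSON Lines layout above, assuming UTF-8 input and the same 4000 to 8000 character bounds. The function name `iter_texts` is hypothetical; for another format you would replace only the parsing inside the loop.

```python
import json
from typing import Iterator, Tuple


def iter_texts(path: str, min_len: int = 4000, max_len: int = 8000) -> Iterator[Tuple[str, str]]:
    """Yield (text, url) pairs from a JSON Lines file, keeping only texts
    whose length in characters lies within [min_len, max_len]."""
    with open(path, encoding="utf-8") as file:
        for line in file:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            text = record.get("text", "")
            if min_len <= len(text) <= max_len:
                yield text, record.get("url", "")
```

Because it is a generator, large input files are read line by line instead of being loaded into memory at once, e.g. `for text, url in iter_texts("data/input"): ...`.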

In [None]:
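import hashlib
import json

filename = 'data/input'
min_len = 4000  # minimum text length, in characters
max_len = 8000  # maximum text length, in characters
texts = []

with open(filename, encoding='utf-8') as file:
    for line in file:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        text = record["text"]
        if min_len <= len(text) <= max_len:
            # Task pages render HTML, so keep line breaks as <br> and drop tabs
            text = text.replace('\n', '<br>').replace('\t', '')
            # Stable id: the built-in hash() of a str changes between Python runs.
            # 12 hex digits (48 bits) also stay exact as a JavaScript number.
            text_id = int(hashlib.md5(text.encode('utf-8')).hexdigest()[:12], 16)
            texts.append((text, text_id))

print(f'Texts count: {len(texts)}')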

### 0. Prevalidation project

At this step, we want to exclude texts that are unsuitable for summarization, such as product ingredient lists, medicine instructions, or song translations.
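The notebook does not show how prevalidation answers are turned into a list of texts to keep. A minimal sketch, assuming the `summarizable` answers have already been collected as hypothetical `(text_id, summarizable)` pairs, keeps a text when a strict majority of its performers (overlap 3 in the pool below) marked it as summarizable. This is a plain local majority vote, not Toloka's server-side aggregation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def summarizable_text_ids(votes: Iterable[Tuple[int, bool]]) -> List[int]:
    """Return ids of texts that a strict majority of performers marked as summarizable.

    `votes` is a hypothetical flat list of (text_id, summarizable) answers,
    e.g. collected from accepted prevalidation assignments.
    """
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # text_id -> [yes, total]
    for text_id, summarizable in votes:
        counts[text_id][1] += 1
        if summarizable:
            counts[text_id][0] += 1
    return [text_id for text_id, (yes, total) in counts.items() if 2 * yes > total]
```

With an odd overlap a tie is impossible; with an even overlap, a tied text is dropped.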

In [None]:
with open(f"projects/prevalidation/{LANGUAGE}/page.css", "r", encoding="utf-8") as f:
    prevalidation_page_styles = f.read()

with open(f"projects/prevalidation/{LANGUAGE}/page.html", "r", encoding="utf-8") as f:
    prevalidation_page_text = f.read()
    
with open(f"projects/prevalidation/{LANGUAGE}/page.css", "r", encoding="utf-8") as f:
    prevalidation_page_script = f.read()

prevalidation_input_spec = {
    "text": toloka.project.field_spec.StringSpec(required=True),
    "text_id": toloka.project.field_spec.IntSpec(required=True),
}

prevalidation_output_spec = {
    'summarizable': toloka.project.field_spec.BooleanSpec(required=True),
}

prevalidation_view_spec = toloka.project.view_spec.ClassicViewSpec(
    script=prevalidation_page_script,
    markup=prevalidation_page_text,
    styles=prevalidation_page_styles,
    assets=toloka.project.view_spec.ClassicViewSpec.Assets(
            style_urls=["https://storage.mds.yandex.net/get-ang2-data/40144/static/material-icons.css?content_type=text/css"],
            script_urls=["$TOLOKA_ASSETS/js/toloka-handlebars-templates.js"]
        )
)

prevalidation_project = toloka.project.Project(
    assignments_issuing_type=toloka.project.Project.AssignmentsIssuingType.AUTOMATED,
    public_name=open(f"projects/prevalidation/{LANGUAGE}/name.txt", encoding="utf-8").read().strip(),
    public_description=open(f"projects/prevalidation/{LANGUAGE}/comment.txt", encoding="utf-8").read().strip(),
    public_instructions=open(f"projects/prevalidation/{LANGUAGE}/instruction.html", encoding="utf-8").read().strip(),
    
    task_spec=toloka.project.task_spec.TaskSpec(
        input_spec=prevalidation_input_spec,
        output_spec=prevalidation_output_spec,
        view_spec=prevalidation_view_spec,
    ),
)

In [None]:
prevalidation_project = toloka_client.create_project(prevalidation_project)
print(f'Created prevalidation project with id {prevalidation_project.id}')

### Prevalidation pool

In [None]:
# Setting up pool
prevalidation_pool = toloka.pool.Pool(
    project_id=prevalidation_project.id,
    private_name='Prevalidation of texts for summarization',
    may_contain_adult_content=True,
    will_expire=datetime.datetime.utcnow() + datetime.timedelta(days=365),
    reward_per_assignment=0.01,
    auto_accept_solutions=True,
    assignment_max_duration_seconds=60*15,
    defaults=toloka.pool.Pool.Defaults(
        default_overlap_for_new_task_suites=3
    ),
    filter=toloka.filter.FilterAnd([
        toloka.filter.Languages.in_(LANGUAGE.upper()),
        toloka.filter.ClientType.eq(toloka.filter.ClientType.ClientType.BROWSER),
    ])
)

# Setting task mixing configuration
prevalidation_pool.set_mixer_config(
    real_tasks_count=5,
    golden_tasks_count=0,
    training_tasks_count=0
)

prevalidation_pool.quality_control.add_action(
    collector=toloka.collectors.MajorityVote(answer_threshold=2),
    conditions=[
        toloka.conditions.TotalAnswersCount > 9,
        toloka.conditions.CorrectAnswersRate < 60,
    ],
    action=toloka.actions.RejectAllAssignments(public_comment='Too low quality')
)

In [None]:
prevalidation_pool = toloka_client.create_pool(prevalidation_pool)
print(f'Created prevalidation pool with id {prevalidation_pool.id}')

### 1. Training project

Here we want to teach performers what a good text summary looks like, and to filter out performers who do poorly at this task.  
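The training pool below grants the `Text summarization` skill (value 100) to performers with more than 2 assignments and an acceptance rate above 50%. Toloka evaluates this rule on its side; the sketch below only mirrors the thresholds locally, assuming hypothetical per-performer counters, to make the strict inequalities explicit.

```python
from dataclasses import dataclass


@dataclass
class PerformerStats:
    """Hypothetical per-performer counters for the training pool."""
    submitted: int  # assignments the performer has completed
    accepted: int   # of those, how many were accepted


def qualifies_for_skill(stats: PerformerStats, min_assignments: int = 2, min_rate: float = 50.0) -> bool:
    """Mirror the training-pool rule: more than `min_assignments` assignments
    and an acceptance rate (in percent) strictly above `min_rate`."""
    if stats.submitted <= min_assignments:
        return False
    rate = 100.0 * stats.accepted / stats.submitted
    return rate > min_rate
```

For example, 2 accepted out of 4 gives exactly 50%, which does not pass, while 2 out of 3 does.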

<table  align="center">
  <tr><td>
    <img src="./pictures/training.png"
         alt="Pipeline"  width="1200">
  </td></tr>
</table>

In [None]:
with open(f"projects/training/{LANGUAGE}/page.css", "r", encoding="utf-8") as f:
    training_page_styles = f.read()

with open(f"projects/training/{LANGUAGE}/page.html", "r", encoding="utf-8") as f:
    training_page_text = f.read()
    
with open(f"projects/training/{LANGUAGE}/page.css", "r", encoding="utf-8") as f:
    training_page_script = f.read()

training_input_spec = {
    "paragraph": toloka.project.field_spec.StringSpec(required=True),
    "summary1": toloka.project.field_spec.StringSpec(required=True),
    "summary2": toloka.project.field_spec.StringSpec(required=True),
}

training_output_spec = {
    'result': toloka.project.field_spec.StringSpec(required=True),
}

training_view_spec = toloka.project.view_spec.ClassicViewSpec(
    script=training_page_script,
    markup=training_page_text,
    styles=training_page_styles,
    assets=toloka.project.view_spec.ClassicViewSpec.Assets(
            style_urls=["https://storage.mds.yandex.net/get-ang2-data/40144/static/material-icons.css?content_type=text/css"],
            script_urls=["$TOLOKA_ASSETS/js/toloka-handlebars-templates.js"]
        )
)

training_project = toloka.project.Project(
    assignments_issuing_type=toloka.project.Project.AssignmentsIssuingType.AUTOMATED,
    public_name=open(f"projects/training/{LANGUAGE}/name.txt", encoding="utf-8").read().strip(),
    public_description=open(f"projects/training/{LANGUAGE}/comment.txt", encoding="utf-8").read().strip(),
    public_instructions=open(f"projects/training/{LANGUAGE}/instruction.html", encoding="utf-8").read().strip(),
    
    task_spec=toloka.project.task_spec.TaskSpec(
        input_spec=training_input_spec,
        output_spec=training_output_spec,
        view_spec=training_view_spec,
    ),
)

In [None]:
training_project = toloka_client.create_project(training_project)
print(f'Created training project with id {training_project.id}')

In [None]:
# Create a hidden skill that admits performers to the main projects (reuse it if it already exists)
skill_name = 'Text summarization'
summ_skill = next(toloka_client.get_skills(name=skill_name), None)

if summ_skill:
    print('Skill already exists')
else:
    print('Creating new skill')
    summ_skill = toloka_client.create_skill(
        name=skill_name,
        hidden=True,
    )

In [None]:
# Setting up pool
training_pool = toloka.pool.Pool(
    project_id=training_project.id,
    private_name='Training for text summarization',
    may_contain_adult_content=True,
    will_expire=datetime.datetime.utcnow() + datetime.timedelta(days=365),
    reward_per_assignment=0.00,
    auto_accept_solutions=True,
    assignment_max_duration_seconds=60*15,
    defaults=toloka.pool.Pool.Defaults(
        default_overlap_for_new_task_suites=3
    ),
    filter=toloka.filter.FilterAnd([
        toloka.filter.Languages.in_(LANGUAGE.upper()),
        toloka.filter.ClientType.eq(toloka.filter.ClientType.ClientType.BROWSER),
    ])
)

# Setting task mixing configuration
training_pool.set_mixer_config(
    real_tasks_count=5,
    golden_tasks_count=5,
    training_tasks_count=5
)

training_pool.quality_control.add_action(
    collector=toloka.collectors.AcceptanceRate(),
    conditions=[
        toloka.conditions.TotalAssignmentsCount > 2,
        toloka.conditions.AcceptedAssignmentsRate > 50,
    ],
    action=toloka.actions.SetSkill(skill_id=summ_skill.id, skill_value=100),
)

In [None]:
training_pool = toloka_client.create_pool(training_pool)

### 2.1. Summarizing small paragraphs

<table  align="center">
  <tr><td>
    <img src="./pictures/summarization.png"
         alt="Pipeline"  width="1200">
  </td></tr>
</table>

In [None]:
with open(f"projects/summarization_small/{LANGUAGE}/page.css", "r", encoding="utf-8") as f:
    summarization_page_styles = f.read()

with open(f"projects/summarization_small/{LANGUAGE}/page.html", "r", encoding="utf-8") as f:
    summarization_page_text = f.read()
    
with open(f"projects/summarization_small/{LANGUAGE}/page.css", "r", encoding="utf-8") as f:
    summarization_page_script = f.read()

summarization_input_spec = {
    "text_id": toloka.project.field_spec.IntegerSpec(required=True),
    "paragraph_id": toloka.project.field_spec.IntegerSpec(required=True),
    "paragraph": toloka.project.field_spec.StringSpec(required=True),
    "text_full": toloka.project.field_spec.StringSpec(required=True),
}

summarization_output_spec = {
    'summary': toloka.project.field_spec.StringSpec(required=True),
}

summarization_view_spec = toloka.project.view_spec.ClassicViewSpec(
    script=summarization_page_script,
    markup=summarization_page_text,
    styles=summarization_page_styles,
    assets=toloka.project.view_spec.ClassicViewSpec.Assets(
            script_urls=["$TOLOKA_ASSETS/js/toloka-handlebars-templates.js"]
        )
)

summarization_project = toloka.project.Project(
    assignments_issuing_type=toloka.project.Project.AssignmentsIssuingType.AUTOMATED,
    public_name=open(f"projects/summarization_small/{LANGUAGE}/name.txt", encoding="utf-8").read().strip(),
    public_description=open(f"projects/summarization_small/{LANGUAGE}/comment.txt", encoding="utf-8").read().strip(),
    public_instructions=open(f"projects/summarization_small/{LANGUAGE}/instruction.html", encoding="utf-8").read().strip(),
    
    task_spec=toloka.project.task_spec.TaskSpec(
        input_spec=summarization_input_spec,
        output_spec=summarization_output_spec,
        view_spec=summarization_view_spec,
    ),
)

In [None]:
summarization_project = toloka_client.create_project(summarization_project)
print(f'Created summarization project with id {summarization_project.id}')

### Summarization pool

In [None]:
# Setting up pool
summarization_pool = toloka.pool.Pool(
    project_id=summarization_project.id,
    private_name='Summarization paragraphs',
    may_contain_adult_content=True,
    will_expire=datetime.datetime.utcnow() + datetime.timedelta(days=365),
    reward_per_assignment=0.02,
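    # Assignments stay SUBMITTED until the validation pool accepts or rejects them
    auto_accept_solutions=False,
    auto_accept_period_day=1,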
    assignment_max_duration_seconds=60*15,
    defaults=toloka.pool.Pool.Defaults(
        default_overlap_for_new_task_suites=2
    ),
    filter=toloka.filter.FilterAnd([
        toloka.filter.Languages.in_(LANGUAGE.upper()),
        toloka.filter.ClientType.eq(toloka.filter.ClientType.ClientType.BROWSER),
        toloka.filter.Skill(summ_skill.id).eq(100)
    ])
)

# Setting task mixing configuration
summarization_pool.set_mixer_config(
    real_tasks_count=1,
    golden_tasks_count=0,
    training_tasks_count=0
)


# Setting up pool quality control

# Ban performers who submit assignments too fast
summarization_pool.quality_control.add_action(
    collector=toloka.collectors.AssignmentSubmitTime(
        history_size=5, 
        fast_submit_threshold_seconds=20
    ),
    conditions=[toloka.conditions.FastSubmittedCount > 1],
    action=toloka.actions.RestrictionV2(
        scope=toloka.user_restriction.UserRestriction.PROJECT,
        duration_unit=toloka.user_restriction.DurationUnit.PERMANENT,
        private_comment='Fast answers'
    )
)

# Increasing overlap for the task if the assignment was rejected
summarization_pool.quality_control.add_action(
    collector=toloka.collectors.AssignmentsAssessment(),
    conditions=[toloka.conditions.AssessmentEvent == toloka.conditions.AssessmentEvent.REJECT],
    action=toloka.actions.ChangeOverlap(delta=1, open_pool=True)
)

summarization_pool = toloka_client.create_pool(summarization_pool)
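The fast-submit rule in the summarization pool above bans a performer when more than 1 of their last 5 assignments was submitted in under 20 seconds. As a local illustration only, assuming a hypothetical list of submit durations in seconds (oldest first) and a strict "faster than" comparison, the check amounts to:

```python
from typing import Sequence


def should_ban_for_fast_submits(submit_times: Sequence[float], history_size: int = 5,
                                fast_threshold_seconds: float = 20.0, max_fast: int = 1) -> bool:
    """Return True if, among the last `history_size` assignments, more than
    `max_fast` were submitted faster than `fast_threshold_seconds`."""
    recent = submit_times[-history_size:]  # only the most recent assignments count
    fast_count = sum(1 for t in recent if t < fast_threshold_seconds)
    return fast_count > max_fast
```

Because only the sliding window is considered, a couple of fast submissions long ago no longer count once five normal ones follow.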

### 3.1. Validation for small paragraphs

<table  align="center">
  <tr><td>
    <img src="./pictures/validation.png"
         alt="Pipeline"  width="1200">
  </td></tr>
</table>

In [None]:
with open(f"projects/validation_small/{LANGUAGE}/page.css", "r", encoding="utf-8") as f:
    validation_page_styles = f.read()

with open(f"projects/validation_small/{LANGUAGE}/page.html", "r", encoding="utf-8") as f:
    validation_page_text = f.read()
    
with open(f"projects/validation_small/{LANGUAGE}/page.css", "r", encoding="utf-8") as f:
    validation_page_script = f.read()

validation_input_spec = {
    "summary": toloka.project.field_spec.StringSpec(required=True),
    "text_id": toloka.project.field_spec.IntegerSpec(required=True),
    "paragraph": toloka.project.field_spec.StringSpec(required=True),
    "paragraph_id": toloka.project.field_spec.IntegerSpec(required=True),
    "text_full": toloka.project.field_spec.StringSpec(required=True),
    "assignment_id": toloka.project.field_spec.StringSpec(required=True),
}

validation_output_spec = {
    'theme': toloka.project.field_spec.BooleanSpec(required=True),
    'correct': toloka.project.field_spec.BooleanSpec(required=True),
    'quality': toloka.project.field_spec.BooleanSpec(required=True),
    'original': toloka.project.field_spec.BooleanSpec(required=True),
}

validation_view_spec = toloka.project.view_spec.ClassicViewSpec(
    script=validation_page_script,
    markup=validation_page_text,
    styles=validation_page_styles,
    assets=toloka.project.view_spec.ClassicViewSpec.Assets(
            script_urls=["$TOLOKA_ASSETS/js/toloka-handlebars-templates.js"]
        )
)

validation_project = toloka.project.Project(
    assignments_issuing_type=toloka.project.Project.AssignmentsIssuingType.AUTOMATED,
    public_name=open(f"projects/validation_small/{LANGUAGE}/name.txt", encoding="utf-8").read().strip(),
    public_description=open(f"projects/validation_small/{LANGUAGE}/comment.txt", encoding="utf-8").read().strip(),
    public_instructions=open(f"projects/validation_small/{LANGUAGE}/instruction.html", encoding="utf-8").read().strip(),
    
    task_spec=toloka.project.task_spec.TaskSpec(
        input_spec=validation_input_spec,
        output_spec=validation_output_spec,
        view_spec=validation_view_spec,
    ),
)

In [None]:
validation_project = toloka_client.create_project(validation_project)
print(f'Created validation project with id {validation_project.id}')

### Validation pool

In [None]:
# Setting up pool
validation_pool = toloka.pool.Pool(
    project_id=validation_project.id,
    private_name='Validation after summarizing',
    may_contain_adult_content=True,
    will_expire=datetime.datetime.utcnow() + datetime.timedelta(days=365),
    reward_per_assignment=0.01,
    auto_accept_solutions=True,
    assignment_max_duration_seconds=60*15,
    defaults=toloka.pool.Pool.Defaults(
        default_overlap_for_new_task_suites=3
    ),
    filter=toloka.filter.FilterAnd([
        toloka.filter.Languages.in_(LANGUAGE.upper()),
        toloka.filter.ClientType.eq(toloka.filter.ClientType.ClientType.BROWSER),
        toloka.filter.Skill(summ_skill.id).eq(100)
    ])
)

# Setting task mixing configuration
validation_pool.set_mixer_config(
    real_tasks_count=5,
    golden_tasks_count=0,
    training_tasks_count=0
)

# Setting up pool quality control

# Reject all assignments of performers who agree with the majority in less than 60% of their answers
validation_pool.quality_control.add_action(
    collector=toloka.collectors.MajorityVote(answer_threshold=2),
    conditions=[
        toloka.conditions.TotalAnswersCount > 7,
        toloka.conditions.CorrectAnswersRate < 60,
    ],
    action=toloka.actions.RejectAllAssignments(public_comment='Too low quality')
)


In [None]:
validation_pool = toloka_client.create_pool(validation_pool)
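The validation pool uses a `MajorityVote` collector: a performer's answer counts as correct when it matches the answer given by at least 2 performers, and performers with more than 7 answers and under 60% agreement are rejected. The sketch below is a simplified local approximation of that idea for a single boolean field (for example `correct`), assuming hypothetical `(task_id, performer_id, answer)` triples; it is not Toloka's exact server-side algorithm.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def majority_answer(answers: List[bool], threshold: int = 2) -> Optional[bool]:
    """Most common answer, or None if it is tied or given by fewer than `threshold` performers."""
    ranked = Counter(answers).most_common(2)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no majority
    value, count = ranked[0]
    return value if count >= threshold else None


def agreement_rates(labels: List[Tuple[str, str, bool]], threshold: int = 2) -> Dict[str, float]:
    """Percent of each performer's answers that match the task majority.

    Tasks without a majority are ignored.
    """
    by_task: Dict[str, List[Tuple[str, bool]]] = defaultdict(list)
    for task_id, performer_id, answer in labels:
        by_task[task_id].append((performer_id, answer))

    matched: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for answers in by_task.values():
        majority = majority_answer([a for _, a in answers], threshold)
        if majority is None:
            continue
        for performer_id, answer in answers:
            total[performer_id] += 1
            matched[performer_id] += answer == majority
    return {p: 100.0 * matched[p] / total[p] for p in total}
```

With an overlap of 3 and boolean answers, every task has a majority of at least 2, so no task is skipped.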

### 4. Choose the best of two summaries

In [None]:
with open(f"projects/choose_best/{LANGUAGE}/page.css", "r", encoding="utf-8") as f:
    choose_best_page_styles = f.read()

with open(f"projects/choose_best/{LANGUAGE}/page.html", "r", encoding="utf-8") as f:
    choose_best_page_text = f.read()
    
with open(f"projects/choose_best/{LANGUAGE}/page.css", "r", encoding="utf-8") as f:
    choose_best_page_script = f.read()

choose_best_input_spec = {
    "text_id": toloka.project.field_spec.IntegerSpec(required=True),
    "summary1": toloka.project.field_spec.StringSpec(required=True),
    "summary2": toloka.project.field_spec.StringSpec(required=False),
    "summary3": toloka.project.field_spec.StringSpec(required=False),
    "summary4": toloka.project.field_spec.StringSpec(required=False),
    "paragraph": toloka.project.field_spec.StringSpec(required=True),
    "paragraph_id": toloka.project.field_spec.IntegerSpec(required=True),
}

choose_best_output_spec = {
    'result': toloka.project.field_spec.IntegerSpec(required=True),
}

choose_best_view_spec = toloka.project.view_spec.ClassicViewSpec(
    script=choose_best_page_script,
    markup=choose_best_page_text,
    styles=choose_best_page_styles,
    assets=toloka.project.view_spec.ClassicViewSpec.Assets(
            script_urls=["$TOLOKA_ASSETS/js/toloka-handlebars-templates.js"]
        )
)

choose_best_project = toloka.project.Project(
    assignments_issuing_type=toloka.project.Project.AssignmentsIssuingType.AUTOMATED,
    public_name=open(f"projects/choose_best/{LANGUAGE}/name.txt", encoding="utf-8").read().strip(),
    public_description=open(f"projects/choose_best/{LANGUAGE}/comment.txt", encoding="utf-8").read().strip(),
    public_instructions=open(f"projects/choose_best/{LANGUAGE}/instruction.html", encoding="utf-8").read().strip(),
    
    task_spec=toloka.project.task_spec.TaskSpec(
        input_spec=choose_best_input_spec,
        output_spec=choose_best_output_spec,
        view_spec=choose_best_view_spec,
    ),
)

In [None]:
choose_best_project = toloka_client.create_project(choose_best_project)
print(f'Created choose best project with id {choose_best_project.id}')
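The notebook creates the choose-best project but does not show how its tasks are built. Its input spec expects `summary1` (required) and up to three optional summaries per paragraph. A minimal sketch, assuming accepted summarization results are available as hypothetical dicts with `text_id`, `paragraph_id`, `paragraph` and `summary` keys, groups them per paragraph into that shape:

```python
from typing import Dict, Iterable, List, Tuple


def group_summaries(rows: Iterable[dict], max_summaries: int = 4) -> List[dict]:
    """Build input_values dicts for the choose-best project.

    Summaries of the same (text_id, paragraph_id) go into summary1..summaryN;
    paragraphs with fewer than two summaries are skipped, since there is nothing to compare.
    """
    grouped: Dict[Tuple[int, int], dict] = {}
    for row in rows:
        key = (row["text_id"], row["paragraph_id"])
        entry = grouped.setdefault(key, {
            "text_id": row["text_id"],
            "paragraph_id": row["paragraph_id"],
            "paragraph": row["paragraph"],
            "summaries": [],
        })
        if len(entry["summaries"]) < max_summaries:
            entry["summaries"].append(row["summary"])

    input_values_list = []
    for entry in grouped.values():
        summaries = entry.pop("summaries")
        if len(summaries) < 2:
            continue
        for i, summary in enumerate(summaries, start=1):
            entry[f"summary{i}"] = summary
        input_values_list.append(entry)
    return input_values_list
```

Each resulting dict could then be wrapped as `toloka.task.Task(input_values=..., pool_id=...)` for a choose-best pool, which this notebook does not create.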

### Run pools

In [None]:
def split_to_paragraphs(text, max_paragraph_size=1500):
    # Greedily merge <br>-separated lines into chunks shorter than max_paragraph_size characters
    prepared = []
    cur_paragraph = ""
    for paragraph in text.split("<br>"):
        if not cur_paragraph:
            # Start a new chunk without a leading <br>
            cur_paragraph = paragraph
        elif len(cur_paragraph) + len(paragraph) < max_paragraph_size:
            cur_paragraph += "<br>" + paragraph
        else:
            prepared.append(cur_paragraph)
            cur_paragraph = paragraph
    if cur_paragraph:
        prepared.append(cur_paragraph)

    return prepared

def text_to_task(text, text_id):
    tasks_by_text = []
    paragraphs = split_to_paragraphs(text)
    for i, paragraph in enumerate(paragraphs):                
        text_full = '<br>'.join(paragraphs[:i]) + \
                    f'<br><b> {paragraph} </b><br>' + \
                    '<br>'.join(paragraphs[i+1:])
        task = toloka.task.Task(
                    input_values={
                        'text_id': text_id,
                        'paragraph': paragraph,
                        'text_full': text_full,
                        'paragraph_id': i
                    },
                    pool_id = summarization_pool.id,
                )
        tasks_by_text.append(task)
    return tasks_by_text

In [None]:
tasks = []
for text, text_id in texts:
    # One summarization task per paragraph of each text
    tasks.extend(text_to_task(text, text_id))

In [None]:
tasks_op = toloka_client.create_tasks_async(tasks, allow_defaults=True)
toloka_client.wait_operation(tasks_op)

In [None]:
def wait_pool_for_close(pool):
    sleep_time = 60
    pool = toloka_client.get_pool(pool.id)
    while not pool.is_closed():
        print(
            f'\t{datetime.datetime.now().strftime("%H:%M:%S")}\t'
            f'Pool {pool.id} has status {pool.status}.'
        )
        time.sleep(sleep_time)
        pool = toloka_client.get_pool(pool.id)



def prepare_validation_tasks():
    validation_tasks = []
    request = toloka.search_requests.AssignmentSearchRequest(
        status=toloka.assignment.Assignment.SUBMITTED,  # Only take completed tasks that haven't been accepted or rejected
        pool_id=summarization_pool.id,
    )
    # Create and store new tasks
    for assignment in toloka_client.get_assignments(request):
        for task, solution in zip(assignment.tasks, assignment.solutions):
            validation_tasks.append(
                toloka.task.Task(
                    input_values={
                        'summary': solution.output_values['summary'],
                        'text_id': task.input_values['text_id'],
                        'paragraph': task.input_values['paragraph'],
                        'text_full': task.input_values['text_full'],
                        'paragraph_id': task.input_values['paragraph_id'],
                        'assignment_id': assignment.id,
                    },
                    pool_id=validation_pool.id,
                )
            )
    print(f'Generated {len(validation_tasks)} new validation tasks')
    return validation_tasks



def run_validation_pool(validation_tasks):
    validation_tasks_op = toloka_client.create_tasks_async(
        validation_tasks,
        toloka.task.CreateTasksParameters(allow_defaults=True)
    )
    toloka_client.wait_operation(validation_tasks_op)
    validation_tasks_result = [task for task in toloka_client.get_tasks(pool_id=validation_pool.id) if not task.known_solutions]

    task_to_assignment = {}
    for task in validation_tasks_result:
        task_to_assignment[task.id] = task.input_values['assignment_id']

    # Open the validation pool
    run_pool2_operation = toloka_client.open_pool(validation_pool.id)
    run_pool2_operation = toloka_client.wait_operation(run_pool2_operation)
    print(f'Validation pool status - {run_pool2_operation.status}')
    return task_to_assignment


def get_aggregation_results():
    print('Start aggregation in the validation pool')
    # Aggregate the 'correct' field of validation_output_spec with Dawid-Skene
    aggregation_operation = toloka_client.aggregate_solutions_by_pool(
        type='DAWID_SKENE',
        pool_id=validation_pool.id,
        fields=[toloka.aggregation.PoolAggregatedSolutionRequest.Field(name='correct')]
    )
    aggregation_operation = toloka_client.wait_operation(aggregation_operation)
    print('Results aggregated')

    # Results are paginated: keep requesting until has_more is False
    aggregation_result = toloka_client.find_aggregated_solutions(aggregation_operation.id)
    validation_results = aggregation_result.items
    while aggregation_result.has_more:
        aggregation_result = toloka_client.find_aggregated_solutions(
            aggregation_operation.id,
            task_id_gt=aggregation_result.items[-1].task_id,
        )
        validation_results = validation_results + aggregation_result.items
    return validation_results


def set_answers_status(validation_results, min_correct_share=0.8):
    print('Started accepting or rejecting summarization assignments')
    # assignment_id -> [number of correct tasks, number of validated tasks]
    assignment_results = dict()
    for r in validation_results:
        if r.task_id not in task_to_assignment:
            continue

        assignment_id = task_to_assignment[r.task_id]
        counts = assignment_results.setdefault(assignment_id, [0, 0])
        counts[1] += 1
        # Count the task as correct if validators judged the summary correct
        if r.output_values['correct']:
            counts[0] += 1

    for assignment_id, (correct_num, total_num) in assignment_results.items():
        assignment = toloka_client.get_assignment(assignment_id)
        if assignment.status.value == 'SUBMITTED':
            # Summarization assignments hold a single task (real_tasks_count=1),
            # so compare the share of correct tasks instead of a fixed count
            if correct_num / total_num >= min_correct_share:
                toloka_client.accept_assignment(assignment_id, 'Well done!')
            else:
                toloka_client.reject_assignment(assignment_id, 'Incorrect answers')
    print('Finished accepting or rejecting summarization assignments')

In [None]:
toloka_client.open_pool(training_pool.id)
toloka_client.open_pool(summarization_pool.id)
toloka_client.open_pool(validation_pool.id)

In [None]:
# Run the pipeline
while True:
    print('\nWaiting for summarization pool to close')
    wait_pool_for_close(summarization_pool)
    print(f'Summarization pool {summarization_pool.id} is finally closed!')

    # Preparing tasks
    validation_tasks = prepare_validation_tasks()

    # Make sure all the tasks are done
    if not validation_tasks:
        print('All the tasks in our project are done')
        break

    # Add it to the pool and run the pool
    task_to_assignment = run_validation_pool(validation_tasks)

    print('\nWaiting for validation pool to close')
    wait_pool_for_close(validation_pool)
    print(f'Validation pool {validation_pool.id} is finally closed!')

    # Aggregation operation
    validation_results = get_aggregation_results()
    # Accept or reject assignments in the summarization pool
    set_answers_status(validation_results)


print(f'Results received at {datetime.datetime.now()}')