# Tweets Sentiment Analysis

In this tutorial we get tweets from any Twitter profile using Bright Data and classify them as “Positive”, “Neutral” or “Negative” with help from Toloka. We ask workers to read several tweets and decide which category each one belongs to.

To get acquainted with Toloka tools for free, you can use the promo code **TOLOKAKIT1** for $20 on your [profile page](https://toloka.yandex.com/requester/profile?utm_source=github&utm_medium=site&utm_campaign=tolokakit) after registration.

Prepare the environment and import everything we'll need.

In [1]:
!pip install toloka-kit==0.1.25
!pip install crowd-kit==1.0.0
!pip install ipyplot

import datetime
import getpass
import json
import requests
import sys
import time
import logging

import numpy as np
import pandas as pd
pd.set_option('display.max_colwidth', None)

import toloka.client as toloka
import toloka.client.project.template_builder as tb
from crowdkit.aggregation import DawidSkene

logging.basicConfig(
    format='[%(levelname)s] %(name)s: %(message)s',
    level=logging.INFO,
    stream=sys.stdout,
)





Create a toloka-client instance. All API calls will go through it. Read more about the OAuth token in our [Learn the basics example](https://github.com/Toloka/toloka-kit/tree/main/examples/0.getting_started/0.learn_the_basics) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/0.getting_started/0.learn_the_basics/learn_the_basics.ipynb)

In [2]:
toloka_client = toloka.TolokaClient(getpass.getpass('Enter your OAuth token: '), 'PRODUCTION') # Or switch to 'SANDBOX'
print(toloka_client.get_requester())

Enter your OAuth token: ········
Requester(_unexpected={}, id='b39ea2ce2474c437ed0ee0d4aeec630b', balance=Decimal('671.4218'), public_name={'EN': 'Ya.Apollyon', 'FR': 'Ya.Apollyon', 'ID': 'Ya.Apollyon', 'RU': 'Я.Аполлион', 'TR': 'Ya.Apollyon'}, company=Requester.Company(_unexpected={}, id='1', superintendent_id='56caaaeeea84b3b3765420ef45a08262'))


## Project creation

<b>Note</b>: The project name and description will be visible to the workers.

In [3]:
project = toloka.Project(
    public_name='Classify tweets as positive, neutral or negative',
    public_description='Decide whether a tweet is positive, neutral or negative'
)

Create the task interface. Read more about the Template Builder in the [Requester’s Guide](https://yandex.ru/support/toloka-tb/index.html?utm_source=github&utm_medium=site&utm_campaign=tolokakit).

In [4]:
review_view = tb.GroupViewV1(tb.TextViewV1(tb.InputData('tweet')))

radio_group_field = tb.ButtonRadioGroupFieldV1(
    tb.OutputData('sentiment'),
    [
        tb.GroupFieldOption('pos', 'Positive'),
        tb.GroupFieldOption('neu', 'Neutral'),
        tb.GroupFieldOption('neg', 'Negative'),
    ],
    label='Is this tweet positive, neutral or negative?',
    validation=tb.RequiredConditionV1(),
)

task_width_plugin = tb.TolokaPluginV1(
    layout=tb.TolokaPluginV1.TolokaPluginLayout(
        kind='scroll',
        task_width=650,
    )
)

hot_keys_plugin = tb.HotkeysPluginV1(
    key_1=tb.SetActionV1(tb.OutputData('sentiment'), 'pos'),
    key_2=tb.SetActionV1(tb.OutputData('sentiment'), 'neu'),
    key_3=tb.SetActionV1(tb.OutputData('sentiment'), 'neg'),
)

project_interface = toloka.project.view_spec.TemplateBuilderViewSpec(
    view=tb.ListViewV1([review_view, radio_group_field]),
    plugins=[task_width_plugin, hot_keys_plugin],
)

<b>Note</b>: Specifications are a description of input data that will be used in a project and the output data that will be collected from the workers.

Read more about input and output data specifications in the [Requester’s Guide](https://yandex.ru/support/toloka-tb/operations/create-specs.html?utm_source=github&utm_medium=site&utm_campaign=tolokakit).

In [5]:
input_specification = {'tweet': toloka.project.field_spec.StringSpec()}
output_specification = {'sentiment': toloka.project.field_spec.StringSpec()}

Assign the task interface and data specifications to the project.

In [6]:
project.task_spec = toloka.project.task_spec.TaskSpec(
    input_spec=input_specification,
    output_spec=output_specification,
    view_spec=project_interface,
)

Write comprehensive instructions. Be sure to add examples for non-obvious cases.

In [7]:
project.public_instructions = """<p>In the task you will have to read tweets and define whether they are positive, neutral or negative</p>
<ul><li><b>Positive.</b> Choose this option if the tweet reflects a user's <b>good</b> attitude towards the topic. For your convenience, you can also use the short-cut by pressing "1"</li>
<li><b>Neutral.</b> Choose this option if the tweet reflects a user's <b>neutral</b> attitude towards the topic or <b>no emotions</b> about the topic at all. For your convenience, you can also use the short-cut by pressing "2"</li>
<li><b>Negative.</b> Choose this option if the tweet reflects a user's <b>bad</b> attitude towards the topic. For your convenience, you can also use the short-cut by pressing "3"</li>
</ul>"""

Create a project.

In [8]:
project = toloka_client.create_project(project)

[INFO] toloka.client: A new project with ID "96206" has been created. Link to open in web interface: https://toloka.yandex.com/requester/project/96206


## Create training and exam pool

We want to be sure that the labelled data is of good quality and that workers haven't just picked labels at random, so we follow one of the most popular crowd-labelling patterns: Train-Exam-Main Pool. Before working with real data in the main pool, workers practise on a **training set**, which contains labelled data with hints shown to the worker after each answer is submitted. The worker then takes an **exam** and, after successfully completing it, receives a **skill**.

A pool is a set of paid tasks grouped into task pages. These tasks are sent out for completion at the same time.

<b>Note</b>: All tasks within a pool have the same settings (price, quality control, etc.)

We start by creating a training pool and uploading training tasks.

In [9]:
training_dataset = pd.read_csv('training_tasks.csv').sample(frac=1)
exam_dataset = pd.read_csv('exam_tasks.csv').sample(frac=1)

In [10]:
training = toloka.Training(
    project_id=project.id,
    private_name='Tweets sentiment training',
    may_contain_adult_content=False,
    assignment_max_duration_seconds=60*10,
    mix_tasks_in_creation_order=True,
    shuffle_tasks_in_task_suite=True,
    training_tasks_in_task_suite_count=10,
    task_suites_required_to_pass=3,
    retry_training_after_days=2,
    inherited_instructions=True,
)
training = toloka_client.create_training(training)

[INFO] toloka.client: A new training with ID "33804144" has been created. Link to open in web interface: https://toloka.yandex.com/requester/project/96206/training/33804144


In [11]:
hint_messages = {
    'pos': 'This tweet shows mostly positive attitude towards the topic',
    'neu': 'This tweet shows neutral attitude towards the topic or just gives some information',
    'neg': 'This tweet shows mostly negative attitude towards the topic',
}

training_tasks = [
    toloka.Task(
        pool_id=training.id,
        input_values={
            'tweet': row.tweet
        },
        known_solutions = [toloka.task.BaseTask.KnownSolution(output_values={'sentiment': row.sentiment})],
        message_on_unknown_solution=hint_messages[row.sentiment],
    )
    for row in training_dataset.itertuples()
]
training_tasks = toloka_client.create_tasks(training_tasks, allow_defaults=True)

Then we prepare an **exam pool** with a skill that will be assigned to workers.

In [12]:
exam_skill = next(toloka_client.get_skills(name='Tweets sentiment exam'), None)
if exam_skill:
    print('Tweets sentiment exam exists')
else:
    exam_skill = toloka_client.create_skill(
        name='Tweets sentiment exam',
        hidden=True,
        public_requester_description={'EN': 'How the performer did on the tweets sentiment exam'},
    )

Tweets sentiment exam exists


In [13]:
exam = toloka.Pool(
    project_id=project.id,
    # Give the pool any convenient name. You are the only one who will see it.
    private_name='Classify tweets as positive, neutral or negative - exam',
    may_contain_adult_content=False,
    type='EXAM',
    # Set the price per task page.
    reward_per_assignment=0,
    will_expire=datetime.datetime.utcnow() + datetime.timedelta(days=365),
    # Time allowed for completing a task page
    assignment_max_duration_seconds=600,
    filter=(toloka.filter.Languages.in_('EN')),
)

exam.set_mixer_config(golden_tasks_count=10)

In [14]:
exam.set_training_requirement(training_pool_id=training.id, training_passing_skill_value=10)

In [15]:
exam.quality_control.add_action(
    collector=toloka.collectors.GoldenSet(history_size=10),
    conditions=[toloka.conditions.TotalAnswersCount >= 10,],
    action=toloka.actions.SetSkillFromOutputField(
        skill_id=exam_skill.id,
        from_field='correct_answers_rate',
    ),
)
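
The rule above sets the exam skill from `correct_answers_rate`. As a rough sketch of what that value represents, assuming it is the percentage of correct answers over the worker's last `history_size` golden tasks: the helper below is hypothetical and is not part of toloka-kit.

```python
from typing import Optional, Sequence


def correct_answers_rate(
    answers: Sequence[str],
    known_solutions: Sequence[str],
    history_size: int = 10,
    min_answers: int = 10,
) -> Optional[float]:
    """Percentage of correct answers over the most recent golden tasks.

    Returns None until the worker has given at least `min_answers` answers,
    mirroring the `TotalAnswersCount >= 10` condition (an assumption about
    how the platform applies it).
    """
    if len(answers) != len(known_solutions):
        raise ValueError('answers and known_solutions must have equal length')
    if len(answers) < min_answers:
        return None
    recent = list(zip(answers, known_solutions))[-history_size:]
    correct = sum(given == expected for given, expected in recent)
    return 100.0 * correct / len(recent)


# Hypothetical exam: the worker gets 9 of 10 golden tasks right.
expected = ['pos', 'neu', 'neg', 'pos', 'neu', 'neg', 'pos', 'neu', 'neg', 'pos']
given = ['pos', 'neu', 'neg', 'pos', 'neu', 'neg', 'pos', 'neu', 'neg', 'neg']
skill = correct_answers_rate(given, expected)
print(skill)
```

Under these assumptions, one mistake in ten exam tasks gives a skill of 90, and two mistakes give 80.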

In [16]:
exam = toloka_client.create_pool(exam)

[INFO] toloka.client: A new pool with ID "33804145" has been created. Link to open in web interface: https://toloka.yandex.com/requester/project/96206/pool/33804145


In [17]:
exam_tasks = [
    toloka.Task(
        pool_id=exam.id,
        input_values={
            'tweet': row.tweet
        },
        known_solutions = [toloka.task.BaseTask.KnownSolution(output_values={'sentiment': row.sentiment})],
        infinite_overlap=True,
    )
    for row in exam_dataset.itertuples()
]
exam_tasks = toloka_client.create_tasks(exam_tasks, allow_defaults=True)
print(len(exam_tasks.items))

36


## Create the main pool

Specify the [pool parameters.](https://toloka.ai/docs/guide/concepts/pool_poolparams.html?utm_source=github&utm_medium=site&utm_campaign=tolokakit)

Simple classification tasks like this one are normally paid as basic tasks because they do not take much time. Read more about [pricing principles](https://toloka.ai/knowledgebase/pricing?utm_source=github&utm_medium=site&utm_campaign=tolokakit) in our Knowledge Base.

Choose `Languages.in_('EN')` as your first filter so that only performers who speak English are invited to complete this task.
The second filter, `Skill(exam_skill.id) >= 90`, admits only performers whose exam skill is at least 90.

In [18]:
pool = toloka.Pool(
    project_id=project.id,
    # Give the pool any name you find suitable. You are the only one who will see it.
    private_name='Classify Elon Musk\'s tweets as positive, neutral or negative',
    may_contain_adult_content=False,
    # Set the price per task suite.
    reward_per_assignment=0.01,
    will_expire=datetime.datetime.utcnow() + datetime.timedelta(days=365),
    # Overlap. This is the number of users who will complete the same task.
    # Set an overlap of 3 to get a more confident final label.
    defaults=toloka.Pool.Defaults(default_overlap_for_new_task_suites=3),
    # Specify the time given to complete a task suite (here, 600 seconds). To understand how much time it should take to
    # complete a task suite, try doing it yourself.
    assignment_max_duration_seconds=600,
    # Filter performers who can access the task.
    filter=(
        (toloka.filter.Languages.in_('EN')) &
        (toloka.filter.Skill(exam_skill.id) >= 90)
    )
)


## Set up quality control

Ban performers who give incorrect responses to control tasks.

Since tasks such as these have an answer that can be used as a ground truth, we can use standard quality control rules such as golden sets.

Read more about [quality control principles](https://toloka.ai/knowledgebase/quality-control?utm_source=github&utm_medium=site&utm_campaign=tolokakit) in our Knowledge Base or [check out control tasks settings](https://toloka.ai/docs/guide/concepts/goldenset.html?utm_source=github&utm_medium=site&utm_campaign=tolokakit) in the Requester’s Guide.

Set up a rule for Captcha. It will be shown at medium frequency, and performers who fail too many captchas will be suspended from the project for a day.

Captcha is a good tool to check performers’ attention. Read more about different Quality Control rules in [Toloka Knowledge Base.](https://toloka.ai/knowledgebase/quality-control?utm_source=github&utm_medium=site&utm_campaign=tolokakit)


In [19]:
# Turns on captchas
pool.set_captcha_frequency('MEDIUM')

pool.quality_control.add_action(
    collector=toloka.collectors.Captcha(history_size=10),
    conditions=[
        toloka.conditions.StoredResultsCount >= 4,
        toloka.conditions.SuccessRate < 75,
    ],
    action=toloka.actions.RestrictionV2(
        scope='PROJECT',
        duration=1,
        duration_unit='DAYS',
        private_comment='captcha'
    )
)
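
To make the two conditions concrete, here is a minimal sketch of how such a rule could be evaluated. It assumes that `StoredResultsCount` is the number of captcha results kept in the `history_size` window and that `SuccessRate` is a percentage over that window. `CaptchaRule` is a hypothetical class, not part of toloka-kit.

```python
from collections import deque


class CaptchaRule:
    """Tracks recent captcha results for one worker."""

    def __init__(self, history_size=10, min_results=4, min_success_rate=75):
        # Only the latest `history_size` results are remembered.
        self.results = deque(maxlen=history_size)
        self.min_results = min_results
        self.min_success_rate = min_success_rate

    def record(self, solved: bool) -> bool:
        """Store one captcha result; return True if the worker should be restricted."""
        self.results.append(solved)
        if len(self.results) < self.min_results:
            return False
        success_rate = 100 * sum(self.results) / len(self.results)
        return success_rate < self.min_success_rate


rule = CaptchaRule()
outcomes = [rule.record(solved) for solved in [True, False, False, True, True]]
print(outcomes)
```

In this sketch nothing happens during the first three captchas; from the fourth one on, a success rate below 75% triggers the restriction.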

Set up the [Fast responses rule](https://toloka.ai/docs/guide/concepts/quick-answers.html?utm_source=github&utm_medium=site&utm_campaign=tolokakit). It lets you ban performers who submit tasks suspiciously fast.

In [20]:
pool.quality_control.add_action(
    collector=toloka.collectors.AssignmentSubmitTime(fast_submit_threshold_seconds=20),
    conditions=[
        toloka.conditions.TotalSubmittedCount > 4,
        toloka.conditions.FastSubmittedCount > 2,
    ],
    action=toloka.actions.RestrictionV2(
        scope='PROJECT',
        duration=1,
        duration_unit='DAYS',
        private_comment='fast responses'
    )
)
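
The thresholds above can be read as a simple predicate. The sketch below assumes that a response counts as fast when it takes strictly less than the threshold; `should_ban_for_speed` is a hypothetical helper used only for illustration.

```python
from typing import Sequence


def should_ban_for_speed(
    durations_seconds: Sequence[float],
    threshold_seconds: float = 20,
    min_total: int = 4,
    max_fast: int = 2,
) -> bool:
    """True when more than `min_total` pages were submitted and more than `max_fast` of them were fast."""
    fast = sum(duration < threshold_seconds for duration in durations_seconds)
    return len(durations_seconds) > min_total and fast > max_fast


# Hypothetical completion times for task pages, in seconds.
print(should_ban_for_speed([12, 45, 8, 60, 15]))  # three fast pages out of five
print(should_ban_for_speed([12, 45, 8, 60]))      # too few pages to judge
```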

Set up the [Submitted responses](https://toloka.ai/docs/guide/concepts/submitted-answers.html?utm_source=github&utm_medium=site&utm_campaign=tolokakit) rule. This is used to get more variety in answers so that the answers won’t be biased toward only a few productive performers.

In [21]:
pool.quality_control.add_action(
    collector=toloka.collectors.AnswerCount(),
    conditions=[
        toloka.conditions.AssignmentsAcceptedCount >= 30,
    ],
    action=toloka.actions.RestrictionV2(
        scope='PROJECT',
        duration=1,
        duration_unit='DAYS',
        private_comment='too many responses'
    )
)

Set the Smart mixing option in the pool settings and specify the number of tasks of each type per page. We recommend putting as many tasks on one page as a performer can complete in 1 to 5 minutes. This volume keeps performers from getting tired and protects them from significant data loss in case of a technical issue. To learn more about grouping tasks into suites, read the Requester’s Guide.

In [22]:
pool.set_mixer_config(real_tasks_count=7, golden_tasks_count=3, training_tasks_count=0)
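
The page layout, overlap and price together determine the pool budget. The following is a rough sketch of that arithmetic, assuming every page holds 7 real tasks, every page is completed by 3 performers, and the platform fee is ignored. `estimate_pool_cost` and the figure of 35 tweets are illustrative only.

```python
import math


def estimate_pool_cost(real_tasks, real_per_page=7, overlap=3, reward_per_page=0.01):
    """Return (pages, paid assignments, total reward in dollars)."""
    pages = math.ceil(real_tasks / real_per_page)  # the last page may be partly filled
    assignments = pages * overlap                  # each page is done by `overlap` performers
    return pages, assignments, round(assignments * reward_per_page, 2)


pages, assignments, cost = estimate_pool_cost(35)
print(pages, assignments, cost)
```

Golden tasks do not add pages in this estimate because they are mixed into the same pages as the real tasks.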

Create the pool.

In [23]:
pool = toloka_client.create_pool(pool)

[INFO] toloka.client: A new pool with ID "33804147" has been created. Link to open in web interface: https://toloka.yandex.com/requester/project/96206/pool/33804147


## Collect tweets from the Bright Data collector

Now we need the data that we want to label. We use Bright Data Collector to get the most recent tweets from Elon Musk's profile.

To proceed with the demo, you need a Bright Data API token and a Data Collector ID.

In [24]:
brightdata_api = getpass.getpass('Enter Bright Data API token: ')

Enter Bright Data API token: ········


In [25]:
brightdata_collector = getpass.getpass('Enter Bright Data Collector ID: ')

Enter Bright Data Collector ID: ········


In [26]:
data = {'url': 'https://twitter.com/elonmusk'}
headers = {'Content-Type': 'application/json','Authorization': f'Bearer {brightdata_api}'}
r = requests.post(f'https://api.brightdata.com/dca/trigger_immediate?collector={brightdata_collector}', data=json.dumps(data), headers=headers)
print(r.content)

b'{"response_id":"z2798t1654551105893r9rkp9njelf8"}'


In [27]:
response_id = json.loads(r.content)['response_id']
response_id

'z2798t1654551105893r9rkp9njelf8'

`response_id` now stores the data collection task ID. We need to wait for the collector to finish scraping the data.

In [28]:
collection_result = json.loads(r.content)
while 'response_id' in collection_result or 'pending' in collection_result:
    time.sleep(60)
    r = requests.get(f'https://api.brightdata.com/dca/get_result?response_id={response_id}', headers=headers)
    collection_result = json.loads(r.content)
    print(collection_result)
    
print(collection_result)

{'input': {'url': 'https://twitter.com/elonmusk'}, 'lines': [{'posts': 18309, 'profile_background_image_url': 'https://pbs.twimg.com/profile_banners/44196397/1576183471', 'profile_image_url': 'https://pbs.twimg.com/profile_images/1529956155937759233/Nyn1HZWF_normal.jpg', 'profile_name': 'Elon Musk', 'profile_url': 'https://twitter.com/elonmusk', 'isVerified': True, 'bio': '', 'following': 114, 'followers': 96846282, 'scrape_time': '2022-06-06T21:32:27.088Z', 'posts_info': [{'text': 'What resolution is life in, 8k? – SJM', 'time': 'Mon Jun 06 17:05:47 +0000 2022', 'id': '1533857698575310848', 'replies': 8626, 'retweets': 3490, 'likes': 57603}, {'text': 'If chess was released as a video game https://t.co/8SuK8Mg7yT', 'time': 'Mon Jun 06 01:17:17 +0000 2022', 'id': '1533619000986554368', 'media_url': 'https://pbs.twimg.com/media/FUiCU3CX0AAw1BX.jpg', 'replies': 8882, 'retweets': 16170, 'likes': 226709}, {'text': 'The acid test for any two competing socioeconomic systems is which side need
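
The waiting loop above has no upper bound on how long it runs. A possible variant with a timeout is sketched below. It reuses the same readiness check, takes the request as an injected `fetch` callable so it can be tried without network access, and uses made-up payloads for the demo. `poll_until_ready` is a hypothetical helper, not a Bright Data API.

```python
import time
from typing import Any, Callable, Dict


def is_pending(result: Dict[str, Any]) -> bool:
    """Same readiness check as the loop above."""
    return 'response_id' in result or 'pending' in result


def poll_until_ready(
    fetch: Callable[[], Dict[str, Any]],
    interval_seconds: float = 60.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call `fetch` until the result is ready or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = fetch()
        if not is_pending(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('collection did not finish before the timeout')
        sleep(interval_seconds)


# Stand-in for the real HTTP call; the payloads are invented for the demo.
fake_responses = iter([{'pending': True}, {'pending': True}, {'lines': []}])
result = poll_until_ready(lambda: next(fake_responses), interval_seconds=0, sleep=lambda _: None)
print(result)
```

In the notebook, `fetch` could wrap the `requests.get(...)` call from the cell above and return the decoded JSON.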

After receiving the data, we prepare it for upload to the Toloka pool.

In [29]:
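task_dataset = pd.DataFrame.from_dict(collection_result['lines'][-1]['posts_info'])
# Strip links and line breaks (regex=True is explicit because pandas 2.0 changed the default to False)
task_dataset['text'] = task_dataset['text'].str.replace(r'\s*https?://\S+(\s+|$)', '', regex=True).str.replace('\n', '', regex=False).str.strip()
# Drop tweets that are empty after cleaning
task_dataset['text'] = task_dataset['text'].replace('', np.nan)
task_dataset = task_dataset.dropna(subset=['text'])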
task_dataset.head()

| | text | time | id | replies | retweets | likes | media_url | views | quoted_id | is_quoted |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | What resolution is life in, 8k? – SJM | Mon Jun 06 17:05:47 +0000 2022 | 1533857698575310848 | 8626 | 3490 | 57603 | | | | |
| 1 | If chess was released as a video game | Mon Jun 06 01:17:17 +0000 2022 | 1533619000986554368 | 8882 | 16170 | 226709 | https://pbs.twimg.com/media/FUiCU3CX0AAw1BX.jpg | | | |
| 2 | The acid test for any two competing socioeconomic systems is which side needs to build a wall to keep people from escaping? That’s the bad one! | Mon Jun 06 01:06:53 +0000 2022 | 1533616384747442176 | 7886 | 16496 | 144093 | | | | |
| 3 | Realized what I have in common with environmentalists, but also why they’re so annoyingly wrong: They are conservationists of what is, whereas they should be conservationists of our potential over time, our cosmic endowment.(From a friend) | Mon Jun 06 00:25:51 +0000 2022 | 1533606056756137985 | 7168 | 5289 | 65182 | | | | |
| 4 | From Shakespeare’s The Tempest, but I much prefer it literally vs ironically | Mon Jun 06 00:15:25 +0000 2022 | 1533603431373578241 | 1968 | 2083 | 37949 | | | | |
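
To see what the cleaning step does to individual tweets, the same pattern can be applied with the standard `re` module. This is a small sketch: `clean_tweet` is a hypothetical helper, and the second and fourth sample strings are made up.

```python
import re
from typing import Optional

# Same pattern as in the pandas cell above.
URL_PATTERN = re.compile(r'\s*https?://\S+(\s+|$)')


def clean_tweet(text: str) -> Optional[str]:
    """Remove links and line breaks; return None if nothing is left."""
    cleaned = URL_PATTERN.sub('', text).replace('\n', '').strip()
    return cleaned or None


samples = [
    'If chess was released as a video game https://t.co/8SuK8Mg7yT',
    'https://t.co/8SuK8Mg7yT',
    'Literally …\n',
    'see https://t.co/x now',
]
cleaned = [clean_tweet(sample) for sample in samples]
print(cleaned)
```

The last sample shows a side effect of the pattern: a link in the middle of a sentence takes the whitespace on both sides with it, so the neighbouring words are joined. Replacing matches with a single space instead of an empty string would be one way to avoid that.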


## Prepare and upload tasks

We start preparing tasks with control tasks, which are also called golden tasks. Control tasks are tasks that already contain the correct response. They are used for checking the quality of responses from performers. The performer's response is compared to the response you provided. If they match, it means the performer answered correctly.

<b>Tip.</b> Make sure to include different variations of correct responses in equal amounts.

We will use the [Twitter Tweets Sentiment Dataset](https://www.kaggle.com/datasets/yasserh/twitter-tweets-sentiment-dataset) from Kaggle as golden tasks. It is distributed under the Creative Commons CC0 1.0 Universal (Public Domain Dedication) license.

[![CC0 1.0](https://img.shields.io/badge/License-CC0%20%201.0-lightgrey.svg)](https://creativecommons.org/publicdomain/zero/1.0/)

In [30]:
golden_dataset = pd.read_csv('tweets_golden_set.csv').sample(frac=1).reset_index(drop=True)
golden_dataset.head()

| | tweet | sentiment |
|---|---|---|
| 0 | resistance was futile! Needed pretties to knit with | neg |
| 1 | FCKeditor is giving me problems! Posts just fine, but only edits in plain text! Help! | neg |
| 2 | please check out www.mysweetebony.com and lmk what you think ... my first paysite ... post up your site too! | neu |
| 3 | so ur name is also Naina | neu |
| 4 | kenny u alive!!!...I\`m here getting da hair done..to bad I\`m not chillin w/ u todat kinda sad | neg |


In [31]:
golden_amount = int(len(task_dataset) * 3 / 7) + 3

positive_golden_dataset = golden_dataset[golden_dataset.sentiment == 'pos'].sample(golden_amount // 3)
neutral_golden_dataset = golden_dataset[golden_dataset.sentiment == 'neu'].sample(golden_amount // 3)
negative_golden_dataset = golden_dataset[golden_dataset.sentiment == 'neg'].sample(golden_amount // 3)

new_golden_dataset = pd.concat([positive_golden_dataset, neutral_golden_dataset, negative_golden_dataset]).sample(frac=1)
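
The cell above sizes the golden set from the page layout of 7 real and 3 golden tasks. A worked example of that arithmetic is sketched below. `golden_plan` is a hypothetical helper, and the figure of 35 tweets is illustrative, although it would be consistent with the 53 tasks reported after upload (35 real plus 18 golden).

```python
def golden_plan(real_tasks, real_per_page=7, golden_per_page=3, classes=3):
    """Return (golden_amount, golden tasks per class, golden tasks actually sampled)."""
    # Same formula as in the cell above: 3 golden per 7 real tasks, plus 3 extra.
    golden_amount = int(real_tasks * golden_per_page / real_per_page) + 3
    per_class = golden_amount // classes  # equal share for pos, neu and neg
    return golden_amount, per_class, per_class * classes


print(golden_plan(35))
print(golden_plan(20))
```

When `golden_amount` is not divisible by three, the integer division rounds each class down, so slightly fewer golden tasks are sampled than `golden_amount` suggests.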

In [32]:

golden_tasks = [
    toloka.Task(
        pool_id=pool.id,
        input_values={'tweet': row['tweet']},
        known_solutions = [
            toloka.task.BaseTask.KnownSolution(
                output_values={'sentiment': row['sentiment']}
            )
        ],
        infinite_overlap=True,
    )
    for _, row in new_golden_dataset.iterrows()
]

In [33]:
tasks = [
    toloka.Task(
        pool_id=pool.id,
        input_values={'tweet': tweet},
    )
    for tweet in task_dataset['text']
]

Upload the tasks.

In [34]:
created_tasks = toloka_client.create_tasks(golden_tasks + tasks, allow_defaults=True)
print(len(created_tasks.items))

53


Open the pool preview page in the web interface to see something like this:
<table  align="center">
  <tr><td>
      <img src="./img/pool_preview_for_workers.png"
         alt="Pool interface"  width="1000">
  </td></tr>
  <tr><td align="center">
    <b>Figure 1.</b> What the pool interface might look like.
  </td></tr>
</table>

Start the pool.

<b>Note</b>: Remember that the tasks will be completed by actual Tolokers. Double check that everything is correct with your project configuration.

In [35]:
training = toloka_client.open_training(training.id)
print(f'training - {training.status}')

exam = toloka_client.open_pool(exam.id)
print(f'exam - {exam.status}')

pool = toloka_client.open_pool(pool.id)
print(f'main pool - {pool.status}')

training - Status.OPEN
exam - Status.OPEN
main pool - Status.OPEN


## Receiving responses

Wait until the pool is completed.

In [36]:
pool_id = pool.id

def wait_pool_for_close(pool_id, minutes_to_wait=1):
    sleep_time = 60 * minutes_to_wait
    pool = toloka_client.get_pool(pool_id)
    while not pool.is_closed():
        op = toloka_client.get_analytics([toloka.analytics_request.CompletionPercentagePoolAnalytics(subject_id=pool.id)])
        op = toloka_client.wait_operation(op)
        percentage = op.details['value'][0]['result']['value']
        print(
            f'   {datetime.datetime.now().strftime("%H:%M:%S")}\t'
            f'Pool {pool.id} - {percentage}%'
        )
        time.sleep(sleep_time)
        pool = toloka_client.get_pool(pool.id)
    print('Pool was closed.')

wait_pool_for_close(pool_id)

exam = toloka_client.close_pool(exam.id)
print(f'exam - {exam.status}')

training = toloka_client.close_training(training.id)
print(f'training - {training.status}')

   00:32:59	Pool 33804147 - 0%
   00:34:02	Pool 33804147 - 13%
   00:35:03	Pool 33804147 - 60%
   00:36:05	Pool 33804147 - 80%
   00:37:07	Pool 33804147 - 86%
   00:38:09	Pool 33804147 - 93%
   00:39:10	Pool 33804147 - 100%
Pool was closed.
exam - Status.CLOSED
training - Status.CLOSED


Get responses.

In [37]:
answers_df = toloka_client.get_assignments_df(pool_id)

# Drop golden tasks
answers_df = answers_df[answers_df['GOLDEN:sentiment'].isna()]

# Prepare DataFrame for aggregation
answers_df = answers_df.rename(columns={
    'INPUT:tweet': 'task',
    'OUTPUT:sentiment': 'label',
    'ASSIGNMENT:worker_id': 'worker'
})

print(f'answers count: {len(answers_df)}')

answers count: 105


Run aggregation using the [Dawid-Skene](https://toloka.ai/docs/guide/concepts/result-aggregation.html?utm_source=github&utm_medium=site&utm_campaign=tolokakit#aggr__dawid-skene) model.

We use this aggregation model because our questions are of the same difficulty, and we don't have many control tasks.

Read more about the Dawid-Skene model in the Requester’s Guide or get an overview of different aggregation models in our Knowledge Base.


In [38]:
# Run aggregation
predicted_answers = DawidSkene(n_iter=20).fit_predict(answers_df)
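
As a sanity check on the aggregated labels, it can help to compare them with a plain majority vote. This is a simpler technique than Dawid-Skene, because it ignores worker reliability. The sketch below uses only the standard library; the toy answers and the alphabetical tie-break are assumptions made for the example.

```python
from collections import Counter, defaultdict


def majority_vote(answers):
    """Aggregate (task, worker, label) triples by the most frequent label per task."""
    votes = defaultdict(Counter)
    for task, _worker, label in answers:
        votes[task][label] += 1
    result = {}
    for task, counter in votes.items():
        best = max(counter.values())
        # Break ties alphabetically so the output is deterministic.
        result[task] = min(label for label, count in counter.items() if count == best)
    return result


# Hypothetical answers from three workers on three tasks.
toy_answers = [
    ('t1', 'w1', 'pos'), ('t1', 'w2', 'pos'), ('t1', 'w3', 'neu'),
    ('t2', 'w1', 'neg'), ('t2', 'w2', 'neu'), ('t2', 'w3', 'neg'),
    ('t3', 'w1', 'neu'), ('t3', 'w2', 'pos'), ('t3', 'w3', 'neg'),
]
baseline = majority_vote(toy_answers)
print(baseline)
```

Tasks where the baseline and the Dawid-Skene prediction disagree, or where the vote is tied as for `t3` here, are good candidates for manual review.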

Look at the results.

In [39]:
pd.DataFrame({'tweets': predicted_answers.index, 'sentiment': predicted_answers.values}).sample(10)

| | tweets | sentiment |
|---|---|---|
| 15 | It is rare for me to endorse political candidates. My political leanings are moderate, so neither fully Republican nor Democrat, which I am confident is the case for most Americans.Executive competence is super underrated in politics – we should care about that a lot more! | neu |
| 17 | When thinking about deep time, what is more astounding is to think about how much time is ahead! | pos |
| 19 | Realized what I have in common with environmentalists, but also why they’re so annoyingly wrong: They are conservationists of what is, whereas they should be conservationists of our potential over time, our cosmic endowment.(From a friend) | neg |
| 34 | 10 years since SpaceX’s first mission to @Space_Station | neu |
| 27 | Tesla Plaid S cruising around Austin with volume at 11 is sublime | neu |
| 3 | Literally … | neu |
| 1 | RT @SpaceX: Two years ago yesterday, SpaceX launched its first human spaceflight to the @space_station | neu |
| 6 | What resolution is life in, 8k? – SJM | neu |
| 11 | Tomorrow will be the first sunrise of the rest of ur life – make it what u want | pos |
| 28 | RT @Tesla: Tesla navigation will now take predicted crosswind, headwind, humidity &amp; temperature into account for calculating battery % on a… | neu |


Finally, let's count the tweets by sentiment.

In [40]:
predicted_answers.value_counts()

neu    22
pos     9
neg     4
Name: agg_label, dtype: int64