In this guide, we explore how to use crowd-kit's aggregation methods for texts and embeddings. Some of the steps rely on machine learning models, so running the notebook on a GPU is recommended for faster computation. First, let's install the necessary libraries:

In [None]:
!pip install pandas
!pip install ipyplot
!pip install crowd-kit==1.0.0
!pip install sentence_transformers
!pip install jiwer


The first task we are going to solve is the embedding aggregation problem. Let's import the necessary libraries.

In [2]:
from crowdkit.datasets import get_datasets_list
from crowdkit.datasets import load_dataset
import pandas as pd
import numpy as np

The function get_datasets_list() returns the list of all datasets available in crowd-kit:

In [3]:
get_datasets_list()

[('relevance-2',
  'This dataset, designed for evaluating answer aggregation methods in crowdsourcing, contains around 0.5 million anonymized crowdsourced labels collected in the Relevance 2 Gradations project in 2016 at Yandex. In this project, query-document pairs are provided with binary labels: relevant or non-relevant.'),
 ('relevance-5',
  'This dataset was designed for evaluating answer aggregation methods in crowdsourcing. It contains around 1 million anonymized crowdsourced labels collected in the Relevance 5 Gradations project in 2016 at Yandex. In this project, query-document pairs are labeled on a scale of 1 to 5. from least relevant to most relevant.'),
 ('mscoco',
  'A sample of 2,000 images segmentations from MSCOCO dataset (https://cocodataset.org, licensed under Creative Commons Attribution 4.0 International Public License.) annotated on Toloka by 911 peformers. For each image, 9 workers submitted segmentations.'),
 ('mscoco_small',
  'A sample of 100 images segmentati

The load_dataset function returns a pair of elements. The first element is a pandas data frame with the crowdsourced data. The second element is the ground truth, whenever it is available. For the crowdspeech-test-clean dataset used here, the data frame df has three columns: task, performer, and text. Each row holds the transcription (text) that a given annotator (performer) submitted for a given audio recording (task). The ground truth gt is a pandas Series that contains the correct transcriptions, indexed by task. We also convert the transcriptions to lowercase so that they can be compared with the ground truth later.

In [4]:
df, gt = load_dataset('crowdspeech-test-clean')

df['text'] = df['text'].apply(lambda s: s.lower())

Downloading crowdspeech-test-clean from remote
Unpacking crowdspeech-test-clean.zip


We need to rename the columns here to avoid errors: the crowdspeech dataset uses different column names, but the aggregation methods we are going to use expect specific ones (worker and output).

In [5]:
df = df.rename(columns={'performer': 'worker'})
df = df.rename(columns={'text': 'output'})

Let's check our dataset:

In [6]:
df

Unnamed: 0,task,worker,output
0,https://tlk.s3.yandex.net/annotation_tasks/lib...,964,then again there was no known way to lubricate...
1,https://tlk.s3.yandex.net/annotation_tasks/lib...,445,then again there was no known way to lubricate...
2,https://tlk.s3.yandex.net/annotation_tasks/lib...,1889,then again there was no known way to lubricate...
3,https://tlk.s3.yandex.net/annotation_tasks/lib...,445,almost instantly was forced to the top
4,https://tlk.s3.yandex.net/annotation_tasks/lib...,964,almost instantly you with wash to the ?
...,...,...,...
18335,https://tlk.s3.yandex.net/annotation_tasks/lib...,309,"here comes their, it glides. now it is up the ..."
18336,https://tlk.s3.yandex.net/annotation_tasks/lib...,901,kenneth and beth refrained
18337,https://tlk.s3.yandex.net/annotation_tasks/lib...,1548,i hope the two elder father and i responsible ...
18338,https://tlk.s3.yandex.net/annotation_tasks/lib...,3066,underscore these words for they are full of co...


In [7]:
gt

task
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/0.mp3       young fitzooth had been commanded to his mothe...
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/1.mp3       there befell an anxious interview mistress fit...
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/2.mp3       most of all robin thought of his father what w...
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/3.mp3       if for a whim you beggar yourself i cannot sta...
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/4.mp3       but take it whilst i live and wear montfichet'...
                                                                                                    ...                        
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/2615.mp3    it is evident therefore that the present trend...
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/2616.mp3    it is also noticeable

As you can see, the data frame df doesn't have an embedding column, so we need to create one before we can run the embedding aggregation methods. For this, we are going to use SentenceTransformer with the all-MiniLM-L6-v2 model:

In [None]:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
df['embedding'] = df['output'].apply(lambda s: model.encode(s))

As you can see, the new column has been added to the data frame:

In [9]:
df

Unnamed: 0,task,worker,output,embedding
0,https://tlk.s3.yandex.net/annotation_tasks/lib...,964,then again there was no known way to lubricate...,"[-0.04020776, 0.0067358925, 0.05979881, 0.1144..."
1,https://tlk.s3.yandex.net/annotation_tasks/lib...,445,then again there was no known way to lubricate...,"[-0.095029786, 0.0154505, 0.056588538, 0.13282..."
2,https://tlk.s3.yandex.net/annotation_tasks/lib...,1889,then again there was no known way to lubricate...,"[-0.06920603, -0.0011689191, 0.03827777, 0.115..."
3,https://tlk.s3.yandex.net/annotation_tasks/lib...,445,almost instantly was forced to the top,"[0.040035784, 0.052871052, 0.003033586, 0.0266..."
4,https://tlk.s3.yandex.net/annotation_tasks/lib...,964,almost instantly you with wash to the ?,"[-0.004239588, 0.043570317, 0.019411284, 0.090..."
...,...,...,...,...
18335,https://tlk.s3.yandex.net/annotation_tasks/lib...,309,"here comes their, it glides. now it is up the ...","[0.011544286, 0.035661004, 0.056194123, 0.0631..."
18336,https://tlk.s3.yandex.net/annotation_tasks/lib...,901,kenneth and beth refrained,"[-0.00089914503, 0.025945157, 0.046376966, -0...."
18337,https://tlk.s3.yandex.net/annotation_tasks/lib...,1548,i hope the two elder father and i responsible ...,"[0.018889101, -0.03745493, 0.04225569, -0.0968..."
18338,https://tlk.s3.yandex.net/annotation_tasks/lib...,3066,underscore these words for they are full of co...,"[0.0035546476, 0.012647274, 0.039280064, 0.074..."


Now we can use RASA (Reliability Aware Sequence Aggregation):

In [10]:
from crowdkit.aggregation import RASA

#fit_predict() fits the model and returns the aggregated outputs
result_RASA = RASA().fit_predict(df)

#Convert the text to lowercase so that the final result can be compared with the ground truth
result_RASA['output'] = result_RASA['output'].apply(lambda s: s.lower())

In [11]:
result_RASA

Unnamed: 0_level_0,output,embedding
task,Unnamed: 1_level_1,Unnamed: 2_level_1
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/0.mp3,young fitzu had been commanded to his mother's...,"[0.028490134, -0.004826027, 0.00010397238, -0...."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/1.mp3,there be felling anxious interview mister smit...,"[-0.05984814, 0.084188975, -0.026369223, 0.002..."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/10.mp3,dismiss your squire robin and bid me goody in,"[-0.04156442, 0.06937688, 0.025341447, -0.0066..."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/100.mp3,it will be no disappointment to me,"[-0.08131021, 0.021344986, 0.03386112, 0.01117..."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/1000.mp3,i was thinking it's very like the ace of heart...,"[-0.08748426, 0.07625206, -0.006550895, 0.0262..."
...,...,...
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/995.mp3,but her greeting the captain leak was more tha...,"[-0.033350002, 0.069584064, 0.0719789, 0.05270..."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/996.mp3,at dinner lake was easy and amusing,"[0.023130937, 0.058338255, 0.021346621, -0.001..."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/997.mp3,i'm glad you liked it says wilder chuckling be...,"[-0.0019007322, 0.045743566, -0.022639899, 0.0..."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/998.mp3,i believe i have a little taste that way those...,"[-0.08742515, -0.062198542, -0.0054394, -0.007..."


Let's try another method: HRRASA (Hybrid Reliability and Representation Aware Sequence Aggregation).

In [12]:
from crowdkit.aggregation import HRRASA

result_HRRASA = HRRASA().fit_predict(df)

result_HRRASA['output'] = result_HRRASA['output'].apply(lambda s: s.lower())

In [13]:
result_HRRASA

Unnamed: 0_level_0,output,embedding
task,Unnamed: 1_level_1,Unnamed: 2_level_1
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/0.mp3,young fitzu had been commanded to his mother's...,"[0.028490134, -0.004826027, 0.00010397238, -0...."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/1.mp3,there be felling anxious interview mister smit...,"[-0.05984814, 0.084188975, -0.026369223, 0.002..."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/10.mp3,dismiss your squire robin and bid me goody in,"[-0.04156442, 0.06937688, 0.025341447, -0.0066..."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/100.mp3,it will be no disappointment to me,"[-0.08131021, 0.021344986, 0.03386112, 0.01117..."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/1000.mp3,i was thinking it's very like the ace of heart...,"[-0.08748426, 0.07625206, -0.006550895, 0.0262..."
...,...,...
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/995.mp3,but her greeting to captain leek was more than...,"[-0.051279552, 0.09634754, 0.062635474, 0.0479..."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/996.mp3,at dinner lake was easy and amusing,"[0.023130937, 0.058338255, 0.021346621, -0.001..."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/997.mp3,"i'm glad you like it says wilder, chuckling be...","[0.052483276, 0.025777997, -0.007196051, 0.030..."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/998.mp3,i believe i have a little taste that way those...,"[-0.08742515, -0.062198542, -0.0054394, -0.007..."


The last embedding aggregation method is ClosestToAverage:

In [14]:
from crowdkit.aggregation import ClosestToAverage

#Here we additionally need to pass the 'distance' argument: a function that measures how far apart two vectors are. We use the Euclidean distance
result_ClosestToAverage = ClosestToAverage(distance=lambda x, y: np.sqrt(np.sum((x - y) ** 2))).fit_predict(df)

result_ClosestToAverage['output'] = result_ClosestToAverage['output'].apply(lambda s: s.lower())

In [15]:
result_ClosestToAverage

Unnamed: 0_level_0,output,embedding
task,Unnamed: 1_level_1,Unnamed: 2_level_1
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/0.mp3,young fitsuit had been commanded to his mother...,"[0.026935445, 0.07048155, -0.021111224, 0.0313..."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/1.mp3,there be felling anxious interview mister smit...,"[-0.05984814, 0.084188975, -0.026369223, 0.002..."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/10.mp3,dismiss your squire robin and bid me goody in,"[-0.04156442, 0.06937688, 0.025341447, -0.0066..."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/100.mp3,it will be no disappointment to me,"[-0.08131021, 0.021344986, 0.03386112, 0.01117..."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/1000.mp3,i was thinking it's very like the ace of heart...,"[-0.08872618, 0.06491811, -0.010857003, 0.0224..."
...,...,...
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/995.mp3,but her greeting the captain leak was more tha...,"[-0.033350002, 0.069584064, 0.0719789, 0.05270..."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/996.mp3,at dinner lake was easy and amusing,"[0.023130937, 0.058338255, 0.021346621, -0.001..."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/997.mp3,i'm glad you liked it says wilder chuckling be...,"[-0.0019007322, 0.045743566, -0.022639899, 0.0..."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/998.mp3,i believe i have a little taste that way those...,"[-0.08742515, -0.062198542, -0.0054394, -0.007..."
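
To make the idea behind ClosestToAverage more concrete, here is a minimal pure-Python sketch for a single task: average the embeddings of all responses, then keep the response whose embedding is nearest to that average. This is only an illustration under simplifying assumptions, not crowd-kit's implementation. The helper names `euclidean` and `closest_to_average` and the toy 2-dimensional embeddings are hypothetical.

```python
import math


def euclidean(x, y):
    # Euclidean (L2) distance between two equal-length vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def closest_to_average(responses, distance=euclidean):
    """Pick the (text, embedding) pair whose embedding is nearest to the mean embedding.

    `responses` is a list of (text, embedding) pairs submitted for one task.
    """
    dim = len(responses[0][1])
    # Component-wise mean of all embeddings for this task
    mean = [sum(emb[k] for _, emb in responses) / len(responses) for k in range(dim)]
    return min(responses, key=lambda pair: distance(pair[1], mean))


# Toy 2-dimensional "embeddings" (hypothetical values, for illustration only)
toy_responses = [
    ("the cat sat", [1.0, 0.0]),
    ("the cat sad", [0.8, 0.2]),
    ("a dog ran", [0.0, 1.0]),
]

best_text, best_embedding = closest_to_average(toy_responses)
print(best_text)
```

Note that the result is always one of the submitted responses: the average embedding itself usually does not correspond to any text, so the method returns the nearest real response instead.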


Let's compare the results of the embedding methods using the word error rate (WER) from jiwer. The lower the WER, the closer the aggregated word sequence is to the ground truth. Let's see the difference!

In [28]:
from jiwer import wer


input_RASA = list()
target_RASA = list()

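#Build the ground-truth list (input_RASA) and the aggregated list (target_RASA) in the same task order
#Note: jiwer documents wer() as wer(reference, hypothesis); here the aggregated texts are passed first and the ground truth second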
for i in gt.index:
  input_RASA.append(gt.loc[i])
  target_RASA.append(result_RASA['output'].loc[i])

print('The WER of RASA -', wer(target_RASA, input_RASA))

input_HRRASA = list()
target_HRRASA = list()

#Making target and input lists from HRRASA method and golden set
for i in gt.index:
  input_HRRASA.append(gt.loc[i])
  target_HRRASA.append(result_HRRASA['output'].loc[i])

print('The WER of HRRASA -', wer(target_HRRASA, input_HRRASA))


input_ClosestToAverage = list()
target_ClosestToAverage = list()

#Making target and input lists from ClosestToAverage method and golden set
for i in gt.index:
  input_ClosestToAverage.append(gt.loc[i])
  target_ClosestToAverage.append(result_ClosestToAverage['output'].loc[i])

print('The WER of ClosestToAverage -', wer(target_ClosestToAverage, input_ClosestToAverage))

The WER of RASA - 0.09224330677062433
The WER of HRRASA - 0.09590533720752409
The WER of ClosestToAverage - 0.09446771078347006
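
For intuition about what the metric measures, the sketch below computes a word error rate for a single pair of sentences: the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference. It is a simplified illustration, not jiwer's implementation, and the helper name `word_error_rate` is hypothetical. The hypothesis sentence is made up for the example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] is the edit distance between the reference prefix seen so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "it will be no disappointment to me"
hypothesis = "it will be disappointment for me"

# One deletion ("no") and one substitution ("to" -> "for"): 2 edits over 7 reference words
score = word_error_rate(reference, hypothesis)
# Swapping the arguments divides the same 2 edits by 6 words instead
swapped_score = word_error_rate(hypothesis, reference)
print(score, swapped_score)
```

Because the edit count is divided by the length of the first argument, the metric is not symmetric, so the order in which the ground truth and the aggregated text are passed can change the value.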


Now let's see how we can solve the aggregation problem for text responses. We will use three methods: ROVER, TextRASA, and TextHRRASA. Let's import them:

In [17]:
from crowdkit.aggregation import ROVER
from crowdkit.aggregation import TextHRRASA
from crowdkit.aggregation import TextRASA

First, we will use ROVER (Recognizer Output Voting Error Reduction). It is a method that uses dynamic programming to align sequences. Just as ClosestToAverage required a distance function, ROVER requires two extra arguments: a tokenizer and a detokenizer. The tokenizer splits a sentence by spaces, and the detokenizer joins the words back into a string:

In [20]:
tokenizer = lambda s: s.split(' ')
detokenizer = lambda tokens: ' '.join(tokens)

#ROVER expects the responses in a 'text' column, so we rename 'output' in a copy of the data frame
df_ROVER = df.rename(columns={'output': 'text'})

result_ROVER = ROVER(tokenizer, detokenizer).fit_predict(df_ROVER)

In [21]:
result_ROVER

task
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/0.mp3       young suit had been commanded to his mother's ...
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/1.mp3       there be an anxious interview mistress mr........
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/10.mp3           dismiss your squire robin and bit me evening
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/100.mp3                    it will be no disappointment to me
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/1000.mp3    i was thinking it's very like the ace of heart...
                                                                                                    ...                        
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/995.mp3     but her greeting to captain leak was more than...
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/996.mp3                   at dinn
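
The tokenizer and detokenizer used above can be checked in isolation. The short sketch below rewrites them as named functions and shows that they round-trip a sentence losslessly. It also shows one property worth knowing: splitting on a single space keeps empty tokens when the text contains consecutive spaces. The double-spaced sample string is made up for the example.

```python
def tokenizer(s):
    # Split on single spaces, exactly like the lambda used with ROVER
    return s.split(' ')


def detokenizer(tokens):
    # Join the tokens back into one string
    return ' '.join(tokens)


text = "almost instantly was forced to the top"
tokens = tokenizer(text)
round_trip = detokenizer(tokens)
print(tokens)
print(round_trip == text)

# Consecutive spaces produce an empty token with split(' ')
double_spaced = "kenneth  and beth"
with_empty_token = tokenizer(double_spaced)
# str.split() with no argument collapses runs of whitespace instead
collapsed = double_spaced.split()
print(with_empty_token)
print(collapsed)
```

If the transcriptions may contain repeated spaces, s.split() is one possible alternative tokenizer, at the cost of no longer reproducing the original spacing exactly.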

Next, we will try TextHRRASA. We need to provide an encoder: a callable that takes a text and returns a NumPy array containing the corresponding embedding. For that, we will use a model from sentence_transformers, whose model.encode method returns the embedding:

In [None]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-mpnet-base-v2')

result_TextHRRASA = TextHRRASA(encoder = model.encode).fit_predict(df)


In [23]:
result_TextHRRASA

Unnamed: 0_level_0,output
task,Unnamed: 1_level_1
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/0.mp3,young fitzu had been commanded to his mother's...
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/1.mp3,there be found an anxious interview me filthri...
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/10.mp3,dismiss your squire robin and bid me goody in
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/100.mp3,it will be no disappointment to me
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/1000.mp3,i was thinking it's very like the ace of heart...
...,...
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/995.mp3,but her greeting to captain leek was more than...
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/996.mp3,at dinner lake was easy and amusing
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/997.mp3,"i'm glad you like it says wilder, chuckling be..."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/998.mp3,i believe i have a little taste that way. thos...


The last method is TextRASA. We use it the same way as before:

In [24]:
from crowdkit.aggregation import TextRASA

result_TextRASA = TextRASA(encoder=model.encode).fit_predict(df)


In [25]:
result_TextRASA

Unnamed: 0_level_0,output
task,Unnamed: 1_level_1
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/0.mp3,young fitzu had been commanded to his mother's...
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/1.mp3,there be found an anxious interview me filthri...
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/10.mp3,dismiss your squire robin and bid me goody in
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/100.mp3,it will be no disappointment to me
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/1000.mp3,i was thinking it's very like the ace of heart...
...,...
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/995.mp3,but her greeting to captain leek was more than...
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/996.mp3,at dinner lake was easy and amusing
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/997.mp3,"i'm glad you like it says wilder, chuckling be..."
https://tlk.s3.yandex.net/annotation_tasks/librispeech/test-clean/998.mp3,i believe i have a little taste that way those...


Now let's compare these three text aggregation methods using the same WER metric:

In [29]:
input_TextRASA = list()
target_TextRASA = list()

#Making target and input lists from TextRASA method and golden set
for i in gt.index:
  input_TextRASA.append(gt.loc[i])
  target_TextRASA.append(result_TextRASA['output'].loc[i])

print('The WER of TextRASA -', wer(target_TextRASA, input_TextRASA))

input_TextHRRASA = list()
target_TextHRRASA = list()

#Making target and input lists from TextHRRASA method and golden set
for i in gt.index:
  input_TextHRRASA.append(gt.loc[i])
  target_TextHRRASA.append(result_TextHRRASA['output'].loc[i])

print('The WER of TextHRRASA -', wer(target_TextHRRASA, input_TextHRRASA))


input_ROVER = list()
target_ROVER = list()

#Making target and input lists from ROVER method and golden set
for i in gt.index:
  input_ROVER.append(gt.loc[i])
  target_ROVER.append(result_ROVER.loc[i])

print('The WER of ROVER -', wer(target_ROVER, input_ROVER))

The WER of TextRASA - 0.0907894234804838
The WER of TextHRRASA - 0.09415696009165553
The WER of ROVER - 0.07017374889239897
