Tweet Sentiment Quantification: An Experimental Re-Evaluation

ECIR2021: Reproducibility track

This repo contains the code to reproduce all the experiments discussed in the paper Tweet Sentiment Quantification: An Experimental Re-Evaluation, submitted for consideration to the ECIR 2021 Reproducibility track.


Requirements

  • scikit-learn, numpy, scipy
  • svmperf patched for quantification (see below)
  • absl-py
  • tqdm

A simple way to get started is to create a conda environment from the configuration file environment_q.yml. At this point it is useful to run the scripts that prepare the datasets and the svmperf package (explained below):

conda env create -n ecir -f environment_q.yml
conda activate ecir
git clone <repo-url>
cd TweetSentQuant
chmod +x *.sh

Test that everything works by running:

cd src
python3 --dataset hcr --method cc --learner lr

SVM-perf with quantification-oriented losses

In order to run experiments involving SVM(Q), SVM(KLD), SVM(NKLD), SVM(AE), or SVM(RAE), you first have to download the svmperf package, apply the patch svm-perf-quantification-ext.patch, and compile the sources. The script does the whole job; simply run:


The resulting directory svm_perf_quantification contains the patched version of svmperf with quantification-oriented losses. Make sure that the variable SVM_PERF_HOME in ./src/ points to the right path if you decide to move the directory somewhere else.

The svm-perf-quantification-ext.patch is an extension of the patch made available by Esuli et al. 2015, which allows SVMperf to optimize for the Q measure proposed by Barranquero et al. 2015 and for the KLD and NKLD proposed by Esuli et al. 2015 for quantification. This patch extends the former by also allowing SVMperf to optimize for AE and RAE.


Datasets

The 11 datasets used in this work can be downloaded from here. The datasets come in vector form, in a sparse format.

The file semeval15.test.feature.txt is corrupted in the zip file (all documents have the 0 label). The preparation script replaces the wrong labels with the correct ones, which are stored in semeval15.test.labels.npy.
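For orientation, here is a minimal sketch of how one might load a dataset and verify the label fix, assuming the feature files follow the svmlight/libsvm sparse format (the format and the loading call are assumptions; the repository's own preparation script is the authoritative version):

import numpy as np
from sklearn.datasets import load_svmlight_file

# Load the sparse document vectors; the labels embedded in this file are
# the corrupted ones (all documents labeled 0) -- assumes svmlight format.
X, corrupted_labels = load_svmlight_file('semeval15.test.feature.txt')

# Load the correct labels, which are stored separately as a numpy array.
y = np.load('semeval15.test.labels.npy')

# One correct label per document is expected after the fix.
assert len(y) == X.shape[0]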

In order to prepare the datasets (download them and patch the corrupted file), simply run the script:


Reproduce Experiments

All the experiments and tables reported in the paper can be reproduced by running the script in the ./src folder:


Each experiment runs the same Python script with different arguments. Run the command:

python --help

to display the arguments and options:

       USAGE: [flags]
  --dataset: the name of the dataset (e.g., sanders)
  --error: error to optimize for in model selection (none acce f1e mae mrae)
    (default: 'mae')
  --learner: a classification learner method (lr svmperf)
  --method: a quantification method (cc, acc, pcc, pacc, emq, svmq, svmkld,
    svmnkld, svmmae, svmmrae)
  --results: where to store the results, as a pickle containing the true
    prevalences and the estimated prevalences according to the artificial
    sampling protocol
    (default: '../results')
  --results_point: where to store the results, as a pickle containing the true
    prevalences and the estimated prevalences according to the natural
    prevalence protocol
    (default: '../results_point')
  --sample_size: sampling size
    (default: '100')
    (an integer)
  --seed: a numeric seed for aligning random processes and a suffix to be used
    in the result file path, e.g., "run0"
    (default: '0')
    (an integer)
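For reference, the error measures behind the mae and mrae options are the standard quantification measures AE and RAE averaged across test samples. The following is a minimal sketch of how they are typically computed (the smoothing used for rae is a common convention in the quantification literature, not necessarily the exact setting used in this code):

import numpy as np

def ae(p_true, p_estim):
    # absolute error: mean absolute difference between prevalence vectors
    return np.abs(p_true - p_estim).mean()

def rae(p_true, p_estim, eps=None, sample_size=100):
    # relative absolute error; additive smoothing avoids division by zero.
    # eps = 1/(2*sample_size) is a common choice (an assumption about this
    # code's exact setting).
    if eps is None:
        eps = 1. / (2. * sample_size)
    p_true = (p_true + eps) / (p_true + eps).sum()
    p_estim = (p_estim + eps) / (p_estim + eps).sum()
    return (np.abs(p_true - p_estim) / p_true).mean()

# mae and mrae then average ae/rae over all test samples, e.g.:
# mae = np.mean([ae(t, e) for t, e in zip(true_prevs, estim_prevs)])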

For example, the following command will train and test the Adjusted Classify & Count (ACC) variant with logistic regression (LR) as the learner, performing a grid-search optimization of the hyperparameters in terms of MAE, on the Sanders dataset.

python --dataset sanders --method acc --learner lr --error mae

The program will produce a pickle file in ../results/sanders-acc-lr-100-mae-run0.pkl that contains the true prevalences of the samples used during testing (a np.array of 5775 prevalences: 21x22/2 = 231 grid combinations x 25 repetitions, according to the artificial sampling protocol with three classes) and the estimated prevalences (a np.array with the 5775 estimations delivered by the ACC method for each of the test samples).
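The 5775 figure follows from the protocol: with three classes and a prevalence step of 0.05, there are 21x22/2 = 231 valid prevalence combinations, each sampled 25 times. A minimal sketch of enumerating the grid and scoring a run follows (that the pickle unpickles to a pair of arrays is an assumption based on the description above):

import pickle
import numpy as np

# Enumerate all (p1, p2, p3) with step 0.05 that sum to 1: 231 combinations.
grid = [(i / 20, j / 20, (20 - i - j) / 20)
        for i in range(21) for j in range(21 - i)]
assert len(grid) == 21 * 22 // 2  # 231; x25 repetitions = 5775 samples

# Load the result pickle; the (true, estimated) pair layout is an assumption.
with open('../results/sanders-acc-lr-100-mae-run0.pkl', 'rb') as f:
    true_prevs, estim_prevs = pickle.load(f)

mae = np.abs(np.asarray(true_prevs) - np.asarray(estim_prevs)).mean()
print(len(true_prevs), mae)  # expect 5775 samples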

The resulting pickles are used for evaluating and comparing the different runs. The evaluation of the current run is printed before the program exits. In this example:

optimization finished: refitting for {'C': 1000.0, 'class_weight': 'balanced'} (score=0.06271) on the whole development set

0.000, 0.000, 1.000->[0.023+-0.0162, 0.139+-0.0553, 0.838+-0.0514]
0.000, 0.050, 0.950->[0.016+-0.0191, 0.189+-0.0687, 0.795+-0.0647]
0.000, 0.100, 0.900->[0.016+-0.0210, 0.230+-0.0695, 0.753+-0.0642]
0.000, 0.150, 0.850->[0.025+-0.0283, 0.244+-0.0676, 0.731+-0.0589]
0.000, 0.200, 0.800->[0.016+-0.0193, 0.308+-0.0638, 0.675+-0.0637]
0.000, 0.250, 0.750->[0.019+-0.0256, 0.330+-0.0797, 0.652+-0.0752]
...
0.900, 0.000, 0.100->[0.896+-0.0506, 0.030+-0.0386, 0.074+-0.0352]
0.900, 0.050, 0.050->[0.894+-0.0601, 0.066+-0.0692, 0.040+-0.0259]
0.900, 0.100, 0.000->[0.871+-0.0757, 0.117+-0.0799, 0.012+-0.0222]
0.950, 0.000, 0.050->[0.936+-0.0562, 0.043+-0.0613, 0.021+-0.0233]
0.950, 0.050, 0.000->[0.928+-0.0506, 0.064+-0.0509, 0.008+-0.0139]
1.000, 0.000, 0.000->[0.978+-0.0240, 0.013+-0.0213, 0.010+-0.0143]

Evaluation Metrics:

I1030 19:53:29.857306 4762131904] saving results in ../results/sanders-acc-lr-100-mae-run0.pkl

Point-Test evaluation:
true-prev=0.164, 0.688, 0.148, estim-prev=0.163, 0.708, 0.129

Note that the first evaluation corresponds to the artificial sampling protocol, in which a grid of prevalences is explored. The second is a single evaluation, carried out on the test set with its natural prevalence, i.e., without performing sampling (as was done in past literature).
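As a concrete check, the absolute error of the natural-prevalence (point-test) evaluation above can be recomputed directly from the printed prevalences:

import numpy as np

true_prev = np.array([0.164, 0.688, 0.148])
estim_prev = np.array([0.163, 0.708, 0.129])

# mean absolute difference over the three classes
print(np.abs(true_prev - estim_prev).mean())  # (0.001+0.020+0.019)/3 ~ 0.0133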

