We present Pixie, a manually annotated dataset for Comparative Preference Classification (CPC) from app reviews. While previous work has focused on direct explicit comparisons ("A is better than B"), Pixie also covers implicit comparisons ("B is slow") and indirect comparisons ("A is here immediately while B takes forever"), which are common in user-generated text such as reviews. Pixie thus contains comparative sentences that reveal user preferences without the use of linguistic comparative structures.
/src contains the code for our experiments. We experiment with traditional machine learning approaches using BERT sentence embeddings, as well as transformer-based methods, on Pixie to identify the preferred entity in comparative sentences from app reviews. We also compare our results with the existing state-of-the-art CPC model trained on Pixie.
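To illustrate the overall setup, the sketch below trains a sentence-level preference classifier on a few toy comparative sentences. It uses TF-IDF features with logistic regression as a lightweight stand-in for BERT sentence embeddings, and the label names (FIRST, SECOND, NONE) are illustrative assumptions, not necessarily Pixie's actual label set.

```python
# Minimal CPC sketch: classify which entity (if any) a sentence prefers.
# TF-IDF + logistic regression stands in for BERT sentence embeddings;
# labels FIRST/SECOND/NONE are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy examples of the three comparison types the dataset covers.
sentences = [
    "App A is much better than App B",              # explicit comparison
    "App B is slow",                                # implicit comparison
    "A is here immediately while B takes forever",  # indirect comparison
    "Both apps work fine",                          # no preference
]
labels = ["FIRST", "SECOND", "FIRST", "NONE"]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(sentences, labels)

# Predict the preferred entity for an unseen comparative sentence.
pred = clf.predict(["A is way better than B"])[0]
print(pred)
```

A real run of the experiments would replace the TF-IDF features with sentence embeddings from a pretrained BERT model, or fine-tune a transformer end to end on Pixie.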
We find that transformer-based pretrained models fine-tuned on Pixie achieve a weighted average F1 score of 83.34%, notably outperforming the previous state-of-the-art method (73.99%).
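The weighted average F1 reported above weights each class's F1 score by its support (the number of true instances of that class). A minimal sketch of the computation, with made-up per-class counts rather than actual results:

```python
# Weighted-average F1: each class's F1 is weighted by its support.
# The per-class (tp, fp, fn) counts below are made-up illustrative numbers.
def f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# (true positives, false positives, false negatives) per class
per_class = {
    "FIRST":  (80, 10, 15),
    "SECOND": (60, 20, 10),
    "NONE":   (40, 5, 20),
}

# Support of a class = true positives + false negatives.
support = {c: tp + fn for c, (tp, fp, fn) in per_class.items()}
total = sum(support.values())

weighted_f1 = sum(
    support[c] / total * f1(*counts) for c, counts in per_class.items()
)
print(f"{weighted_f1:.4f}")  # → 0.8172
```

This matches scikit-learn's `f1_score(..., average="weighted")` convention, which is the usual way such scores are reported.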