Overview

This repository contains datasets used in the paper: "Balanced and Token-Efficient Summarization of User Reviews via Stratified Sampling and Large Language Models".

The paper introduces a methodology that uses Large Language Models (LLMs) such as BERT and GPT to produce comprehensive, balanced summaries of user-generated reviews. Conventional summarization methods often emphasize positive aspects to encourage purchases; this approach instead aims for an unbiased view that covers both the positive and the negative aspects of the reviews.

Datasets

The repository includes the following datasets:

Amazon Product Reviews:

Content: Reviews of 50 selected products, primarily in the electronics category. Each product has hundreds of reviews, totaling approximately 10,000 entries.
Features: Review text, rating, title, review reactions, user verification status, location, and date of the review.
Distribution: Balanced across all possible ratings (1 to 5 stars).
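
To illustrate how a rating-balanced subset can be drawn from review data like this, here is a minimal sketch of stratified sampling by star rating. It is not the paper's implementation: the function `stratified_sample`, the record layout, and the field names `rating` and `text` are hypothetical, and the toy records stand in for real rows.

```python
import random
from collections import defaultdict


def stratified_sample(reviews, per_rating, seed=0):
    """Draw up to `per_rating` reviews from each star rating present."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Group reviews into strata keyed by their star rating.
    by_rating = defaultdict(list)
    for review in reviews:
        by_rating[review["rating"]].append(review)

    # Sample without replacement inside each stratum; a stratum with
    # fewer than `per_rating` reviews contributes all of its reviews.
    sample = []
    for rating in sorted(by_rating):
        group = by_rating[rating]
        sample.extend(rng.sample(group, min(per_rating, len(group))))
    return sample


# Toy records; the field names are assumptions, not the dataset schema.
ratings = [1, 1, 2, 3, 3, 3, 4, 5, 5, 5]
reviews = [{"rating": r, "text": f"review {i}"} for i, r in enumerate(ratings)]
sample = stratified_sample(reviews, per_rating=2)
```

Capping each stratum at the same size keeps any single rating from dominating the text passed to a summarizer, which is the general motivation for sampling per rating instead of uniformly over all reviews.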

Tripadvisor Hotel Reviews:

Content: Reviews of 150 hotels in New York.
Features: Review title, user rating, language, travel date, type of trip.
Trip Types: Couples, solo, family, business, friends, and not specified.
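
As a rough illustration of working with the trip-type categories listed above, the sketch below counts reviews per trip type and folds missing or unrecognized values into "not specified". The field name `trip_type`, the lowercase spellings, and both helper functions are assumptions made for this example; the actual files may encode the field differently.

```python
from collections import Counter

# Categories named in this README, lowercased for comparison.
TRIP_TYPES = ("couples", "solo", "family", "business", "friends")


def normalize_trip_type(value):
    """Map a raw trip-type value to a listed category or 'not specified'."""
    if value is None:
        return "not specified"
    cleaned = value.strip().lower()
    return cleaned if cleaned in TRIP_TYPES else "not specified"


def count_by_trip_type(reviews):
    """Count reviews per normalized trip type."""
    return Counter(normalize_trip_type(r.get("trip_type")) for r in reviews)


# Toy records; `trip_type` is a hypothetical field name.
reviews = [
    {"trip_type": "Couples"},
    {"trip_type": "business"},
    {"trip_type": None},
    {},
    {"trip_type": " Family "},
]
counts = count_by_trip_type(reviews)
```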

How to Cite

If you use the datasets or the methodology described in this paper, please cite it as follows:

@InProceedings{ecml-pkdd-marozzo-2025,
  author    = {Fabrizio Marozzo and Loris Belcastro and Cristian Cosentino and Pietro Lio},
  title     = {Balanced and Token-Efficient Summarization of User Reviews via Stratified Sampling and Large Language Models},
  booktitle = {Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD)},
  year      = {2025}
}

Contact

For questions or feedback, please reach out to lbelcastro@dimes.unical.it, ccosentino@dimes.unical.it, or fmarozzo@dimes.unical.it.
