bookingcom/ml-dataset-reviews

Booking.com Accommodation Review Dataset

This repository contains the training set of a dataset of user-generated Booking.com accommodation reviews. The training set contains about 1.6M reviews from 40k accommodations around the world. All reviews were written by guests who stayed at the accommodation.

The dataset consists of English reviews published in 2023. All reviews have passed a moderation process that ensures they are genuine and do not violate the platform guidelines. To avoid displaying reviews that do not reflect the accommodation's overall sentiment, reviews scoring two or more points below the accommodation's average score were excluded. To preserve user privacy, no personally identifiable information is included in the data. Similarly, to protect business-sensitive statistics, the dataset is limited to tens of thousands of accommodations. Finally, only informative reviews that cover at least 3 topics were selected.
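The score-based exclusion described above can be sketched as a simple predicate. This is a minimal illustration, not the pipeline used to build the dataset: the function name `violates_score_rule`, the `margin` parameter, and the sample rows are hypothetical, and it assumes that `guest_score` and `accommodation_score` are on the same scale.

```python
def violates_score_rule(guest_score, accommodation_score, margin=2.0):
    """Return True if the review sits `margin` or more points below the accommodation average."""
    return accommodation_score - guest_score >= margin


# Made-up rows for illustration only; they are not taken from the dataset.
rows = [
    {"guest_score": 9.0, "accommodation_score": 8.5},
    {"guest_score": 7.0, "accommodation_score": 8.5},
    {"guest_score": 6.5, "accommodation_score": 8.5},
]

# Keep only the rows that the rule would not have excluded.
kept = [
    r for r in rows
    if not violates_score_rule(r["guest_score"], r["accommodation_score"])
]
```

A check like this may be useful as a sanity test on loaded data, although the published averages were not necessarily computed over exactly the reviews in this training set, so individual rows may not satisfy it strictly.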

The following table describes the fields in the dataset:

| Column | Description |
| --- | --- |
| review_title | The title of the review |
| review_positive | Positive ("liked") section of the review |
| review_negative | Negative ("disliked") section of the review |
| guest_score | Review score for the stay |
| review_helpful_votes | How many users marked the review as helpful |
| guest_type | One of 4 traveller types: Solo traveller (1 adult) / Couple (2 adults) / Group (>2 adults) / Family with children (adults & children) |
| guest_country | Anonymized country from which the reservation was made |
| room_nights | The length of the reservation, i.e. number of nights booked |
| month | The month of the check-in date of the reservation |
| accommodation_id | An anonymized accommodation ID |
| accommodation_type | The type of the accommodation, e.g. hotel, apartment, hostel |
| accommodation_score | The overall average guest review score for the accommodation |
| accommodation_country | Country of the accommodation |
| accommodation_star_rating | Accommodation star rating, provided by the property and usually determined by an official accommodation rating organisation or another third party |
| location_is_beach | Whether the accommodation is in a beach location |
| location_is_ski | Whether the accommodation is in a ski location |
| location_is_city_center | Whether the accommodation is in the city center |
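As a rough sketch of how these fields might be consumed, the snippet below averages `guest_score` per `guest_type` using only the Python standard library. It assumes the data is available as CSV with the column names listed above; the file format, the helper name `mean_score_by_guest_type`, and the sample rows (synthetic, with only a subset of the columns) are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Synthetic rows for illustration only; they use a subset of the documented columns.
SAMPLE = """review_title,guest_score,guest_type,room_nights,accommodation_id
Great stay,9.0,Couple,3,101
Nice location,8.0,Solo traveller,1,102
Good value,7.0,Couple,2,101
"""


def mean_score_by_guest_type(file_obj):
    """Return a dict mapping each guest_type to its mean guest_score."""
    totals = defaultdict(lambda: [0.0, 0])  # guest_type -> [score sum, review count]
    for row in csv.DictReader(file_obj):
        entry = totals[row["guest_type"]]
        entry[0] += float(row["guest_score"])
        entry[1] += 1
    return {guest_type: total / count for guest_type, (total, count) in totals.items()}


means = mean_score_by_guest_type(io.StringIO(SAMPLE))
```

To run the same function on a real file, pass an open file handle instead of the `io.StringIO` wrapper, for example `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the training set has been downloaded.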

License

The dataset is published under the following non-commercial license
