Sentiment Summarization of Amazon Reviews
Team Members: Jane Zhang, Matthew Bayer, Peter Husisian, Arnav Ghosh, Wes Gurnee
Evaluating potential purchases online is often made easier by reviews from other customers. However, sifting through hundreds of reviews to build a holistic view of the sentiment a product evokes can be time-consuming. Consequently, we propose a model that provides a comprehensive summary of all the reviews of a product. Our task is a specific instance of the broader problem of distilling the sentiments and opinions of a large group of people into a few key topics and summarizing them in a single generated review.
We take two approaches to the problem: an extractive technique, in which the summary review is assembled from important sentences sampled from the corpus, and an abstractive technique, in which we generate new text with an autoencoder that summarizes the original reviews.
Specifically, our project is divided into three pipeline components, plus evaluation:
Content Representation/Encoding: We encode reviews into content vectors. For the abstractive approach, we use an autoencoder trained on specific product categories (electronics, sports, etc.) to vectorize each sentence in each review. In the extractive case, we call the indico featurization API, which gives us embeddings from a larger pretrained language model.
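To illustrate the vectorization step in miniature, the sketch below maps sentences to bag-of-words count vectors. The corpus, vocabulary, and `embed` helper are toy stand-ins invented for this example; the actual pipeline uses autoencoder states (abstractive) or indico embeddings (extractive) instead of raw counts:

```python
from collections import Counter

def embed(sentence, vocab):
    """Map a sentence to a bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(sentence.lower().split())
    return [counts[w] for w in vocab]

reviews = [
    "battery life is great",
    "the battery drains fast",
    "great sound quality",
]
# Build the vocabulary from the corpus; every review becomes a
# fixed-length vector we can cluster downstream.
vocab = sorted({w for r in reviews for w in r.lower().split()})
vectors = [embed(r, vocab) for r in reviews]
```

Any sentence-level embedding that places semantically similar sentences near each other would slot into the same place in the pipeline.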
Generate Candidate Sentences for Artificial Review: We cluster similar sentences together and sample salient sentences from the clusters using a specialized application of DBSCAN: we choose a very selective density (eps) hyperparameter and a very small minimum-samples hyperparameter. In the extractive case, we then sample sentences that are close to the mean of each cluster but are not necessarily in the cluster DBSCAN produced. In the abstractive approach, we use our autoencoder to decode the mean of each group of candidate sentences.
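To make the clustering-and-selection step concrete, here is a minimal pure-Python DBSCAN run on toy 2-D "embeddings", followed by picking the point nearest each cluster mean as an extractive candidate. The points and the `eps`/`min_pts` values are illustrative, not the project's actual data or tuned hyperparameters:

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster label per point (-1 = noise)."""
    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs)
        while queue:                # expand the cluster from core points
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:
                queue.extend(jn)
    return labels

# Toy 2-D "sentence embeddings": two dense groups plus one outlier.
points = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (9, 9)]
labels = dbscan(points, eps=0.5, min_pts=2)  # tight eps, small min-samples

# Extractive selection: take the point nearest each cluster mean.
candidates = {}
for c in set(labels) - {-1}:
    members = [p for p, l in zip(points, labels) if l == c]
    mean = tuple(sum(x) / len(members) for x in zip(*members))
    candidates[c] = min(members, key=lambda p: math.dist(p, mean))
```

The tight eps and small minimum-samples settings mirror the selective configuration described above: only very dense neighborhoods form clusters, and everything else is treated as noise.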
Generate Artificial Review: We take subsets of our candidate sentences and concatenate them to form multiple candidate reviews. We fine-tune a BERT model to predict review helpfulness, and use a genetic algorithm to optimize the candidate reviews for maximum predicted helpfulness. TODO: Explain Siamese/Triplet Loss
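A rough sketch of the genetic-algorithm step: each candidate review is a bit-mask over the candidate sentences, and the population evolves by elitist selection, single-point crossover, and bit-flip mutation. The `fitness` function here is a hand-written stand-in for the fine-tuned BERT helpfulness model, and the candidate sentences are invented:

```python
import random

random.seed(0)

candidates = [
    "Battery life is excellent.",
    "The battery drains fast.",
    "Sound quality is great.",
    "Shipping was slow.",
    "Battery life is excellent.",  # duplicate: should be penalized
]

def fitness(mask):
    """Stand-in for the BERT helpfulness score: reward covering
    distinct sentences, penalize length and duplicates."""
    chosen = [s for s, keep in zip(candidates, mask) if keep]
    return len(set(chosen)) - 0.4 * len(chosen)

def crossover(a, b):
    """Single-point crossover between two parent masks."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(mask, rate=0.1):
    """Flip each bit with a small probability."""
    return [bit ^ (random.random() < rate) for bit in mask]

# Evolve subsets of candidate sentences into a single review.
pop = [[random.randint(0, 1) for _ in candidates] for _ in range(20)]
for _ in range(50):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]  # elitism: keep the top half
    pop = parents + [
        mutate(crossover(random.choice(parents), random.choice(parents)))
        for _ in range(10)
    ]
best = max(pop, key=fitness)
review = " ".join(s for s, keep in zip(candidates, best) if keep)
```

In the real pipeline the fitness call would be a forward pass through the fine-tuned BERT helpfulness predictor rather than this toy heuristic.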
Evaluation: We use ROUGE and cosine similarity against a gold-standard most-helpful review from the corpus to evaluate summarization quality. Because summarization is inherently subjective and multimodal, automatic evaluation is difficult, so we also include frequent human evaluation.
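As an illustration of the two automatic metrics, the functions below compute ROUGE-1 recall and bag-of-words cosine similarity from scratch; the example strings are invented, and in practice a standard ROUGE package and the pipeline's actual embeddings would be used:

```python
from collections import Counter
import math

def rouge1(candidate, reference):
    """ROUGE-1 recall: fraction of reference unigrams the candidate covers."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)

def cosine(candidate, reference):
    """Cosine similarity between bag-of-words count vectors."""
    a = Counter(candidate.lower().split())
    b = Counter(reference.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

generated = "battery life is great and sound is clear"
gold = "great battery life with clear sound"
scores = (rouge1(generated, gold), cosine(generated, gold))
```

Both metrics are lexical, which is exactly why they under-credit valid paraphrases and why the human evaluation above remains necessary.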
Related fields: natural language processing, sentiment analysis, abstractive summarization, deep learning, natural language generation
Setup:
pip install -r requirements.txt
python -m spacy download en_core_web_lg