ChanWenLe/Natural-Language-Processing


Project Summary

Customer reviews and their associated ratings are feedback that sellers and Amazon should monitor to better comprehend customer experiences and address concerns effectively, thereby helping them remain competitive and stay relevant in the fast-paced retail sector.

The purpose of this project was to understand:

  1. What messages and concerns would the customer like to convey in their comments?
  2. What are the products that are on the list for positive and negative sentiments?

Dataset is available Here

Scope & Project Steps

1. Scope

This project focuses on the Amazon Fashion segment, specifically analysing customer reviews of footwear to gain insights into customer sentiments and preferences. The primary aim is to use these insights to enhance customer satisfaction and loyalty, which are crucial for standing out from the crowd and maintaining a competitive edge in the online retail market.

2. Project Steps

  1. Data Preprocessing and Wrangling
  2. Sentiment Analysis
    -NLTK
    -TextBlob
    -SpaCy
  3. Evaluation of Results
  4. Conclusion

Data Preprocessing and Wrangling

With the raw dataset, a few data cleaning steps were conducted with Python. Code available in the "Amazon Reviews.ipynb" file.

1. Handling Null Values:
The dataset underwent a null value check, which revealed a few empty values in the 'style' column.

2. Data Type Correction:
To ensure consistency in textual analysis and facilitate analysis operations, the 'style' column was converted from its original format to string.

3. Data Preprocessing for Sentiment Analysis:
The text was cleaned by removing punctuation and converting all letters to lowercase. Stop words were removed to focus on more meaningful words. Lemmatization was then applied to bring words to their base or root form, which simplifies and standardizes the data for analysis.

4. Tokenisation of the Corpus:
To identify the list of words that emerge most frequently from customer comments, the cleaned corpus was broken down into individual words or phrases. This process is essential for analysing the frequency and distribution of distinct words.
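Steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration, not the code from "Amazon Reviews.ipynb": the stop-word set is a deliberately tiny, hypothetical stand-in for a full list (for example the one shipped with NLTK), the function names are made up for this example, and lemmatization is left as an optional callable so that the sketch runs without any downloaded language data.

```python
import string
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption); a real run
# would likely use a complete list such as NLTK's English stop words.
STOP_WORDS = {
    "a", "an", "and", "are", "but", "i", "is", "it", "my",
    "the", "these", "they", "this", "to", "very", "was",
}


def clean_review(text, lemmatize=None):
    """Lowercase, strip punctuation, drop stop words, optionally lemmatize."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    # Whitespace splitting stands in for a proper tokenizer here.
    tokens = [tok for tok in no_punct.split() if tok not in STOP_WORDS]
    if lemmatize is not None:
        # `lemmatize` is any callable mapping a word to its base form.
        tokens = [lemmatize(tok) for tok in tokens]
    return tokens


def top_words(reviews, n=10, lemmatize=None):
    """Count token frequencies across all cleaned reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(clean_review(review, lemmatize))
    return counts.most_common(n)


# Made-up example reviews, for illustration only.
reviews = [
    "These shoes are very comfortable, and they fit!",
    "The shoes fit, but the size was small.",
]
print(top_words(reviews, n=3))
```

To add lemmatization, a callable such as the `lemmatize` method of NLTK's `WordNetLemmatizer` could be passed as the `lemmatize` argument, provided the WordNet data is installed.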

Sentiment Analysis

Different libraries, such as NLTK and TextBlob, were used to conduct sentiment analysis. The results were then compared with customer ratings recorded under the 'overall' column, ranging from 1 to 5, with the assumption that these ratings genuinely reflect customer satisfaction levels.

Ratings of 1 and 2 are classified as negative sentiment, 3 as neutral, and 4 and 5 as positive sentiment. Similarly, sentiment scores from the models are categorized as negative if they are below -0.05, as positive if they are above 0.05, and as neutral otherwise. After both columns were standardized to these three labels, they were compared for accuracy using the sklearn library.
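The label mapping and comparison described above might look like the sketch below. It is an illustration under stated assumptions rather than the notebook's code: the function names are hypothetical, the example ratings and scores are made up, scores between the two thresholds are assumed to be neutral, and accuracy is computed in plain Python as the fraction of matching labels instead of through sklearn.

```python
def rating_to_label(rating):
    """Map a 1-5 star rating from the 'overall' column to a sentiment label."""
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"


def score_to_label(score, threshold=0.05):
    """Map a model sentiment score to a label using the +/-0.05 cut-offs."""
    if score < -threshold:
        return "negative"
    if score > threshold:
        return "positive"
    # Assumption: anything between the two thresholds counts as neutral.
    return "neutral"


def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the two label sequences agree."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


# Made-up ratings and model scores, for illustration only.
ratings = [5, 4, 3, 1, 2]
scores = [0.84, 0.02, 0.0, -0.61, 0.30]

true_labels = [rating_to_label(r) for r in ratings]
predicted_labels = [score_to_label(s) for s in scores]
print(accuracy(true_labels, predicted_labels))
```

Keeping the two mappings as separate functions makes it easy to try different thresholds without touching how the ratings are labelled.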

It was observed that NLTK's VADER model, with an accuracy of 81.23%, slightly outperformed TextBlob, which had an accuracy of 79.22%. Thus, the NLTK sentiment analysis model is preferred due to its higher accuracy rate.

1. NLTK:

Figures: Distribution of Sentiment Scores for NLTK (two plots)

2. TextBlob:

Figures: Distribution of Sentiment Scores for TextBlob (two plots)

3. spaCy:

After the initial sentiment analysis with NLTK and TextBlob, spaCy was used to extend the analysis. While NLTK and TextBlob are well suited to basic text processing, spaCy's capabilities in Named Entity Recognition (NER) and dependency parsing, along with its high performance and easy integration, make it a good fit for refining and expanding the analysis.

A predominance of 'CARDINAL' entities was observed. These typically consist of numeric data that could relate to product sizes, quantities, or frequency of use mentioned in customer reviews. Such insights could inform stock management decisions and help anticipate customer needs. 'ORG' is another entity type that deserves attention: it highlights brand mentions, which are valuable for measuring brand visibility and perception in the market. Brands like 'Nike' are prominently mentioned, indicating significant customer engagement that could drive strategic marketing campaigns.

'CARDINAL' terms such as 'half' or specific numbers may inform businesses about prevalent customer preferences or sizing issues. Meanwhile, the frequent citation of 'Nike' and similar 'supreme tr' entities could signal which brands are top-of-mind for consumers, guiding competitive strategy and brand positioning.

The presence of 'DATE' entities may reflect usage patterns, product durability, or customer experiences. For instance, frequent references to "several years" or "daily" hint at how long and how often the products are used, which can inform promotional narratives or product improvement strategies.
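Counting entity labels and their most frequent texts, as discussed above, can be sketched as follows. The function name `summarise_entities` is hypothetical, and the example uses small stand-in objects that mimic the `ents`, `text`, and `label_` attributes of spaCy documents and entity spans, so the snippet runs without a downloaded spaCy model. With spaCy installed, the same function could be fed the output of `nlp.pipe(texts)`, where `nlp` is a loaded pipeline; which model the project used is not stated in this README.

```python
from collections import Counter
from types import SimpleNamespace


def summarise_entities(docs, top_n=3):
    """Count entity labels and the most common entity texts per label."""
    label_counts = Counter()
    texts_by_label = {}
    for doc in docs:
        for ent in doc.ents:
            label_counts[ent.label_] += 1
            per_label = texts_by_label.setdefault(ent.label_, Counter())
            per_label[ent.text.lower()] += 1
    top_texts = {
        label: counter.most_common(top_n)
        for label, counter in texts_by_label.items()
    }
    return label_counts, top_texts


def fake_doc(*ents):
    """Stand-in for a spaCy Doc, built from (text, label) pairs (assumption)."""
    return SimpleNamespace(
        ents=[SimpleNamespace(text=text, label_=label) for text, label in ents]
    )


# Made-up entities, loosely based on the examples mentioned in the text.
docs = [
    fake_doc(("half", "CARDINAL"), ("Nike", "ORG")),
    fake_doc(("several years", "DATE"), ("half", "CARDINAL")),
]

label_counts, top_texts = summarise_entities(docs)
print(label_counts.most_common())
print(top_texts["CARDINAL"])
```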

Evaluation of Results

1. What messages and concerns would the customer like to convey in their comments?

Figures: word clouds for the positive and negative sentiment subsets

The NLTK model classified 82.7% of reviews as positive, 10.6% as neutral, and 6.6% as negative. Two subsets were then filtered out: a positive sentiment subset and a negative sentiment subset. The positive subset consists of reviews with a sentiment score of 0.5 or higher and a rating of 4 or 5, whereas the negative subset includes reviews with a sentiment score of -0.05 or below, or a rating of 1 or 2. Splitting the data this way makes it possible to examine the feedback under each sentiment closely, which is necessary because negative sentiment makes up only 6.6% of the total data.
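The two filters described above can be expressed as a short sketch. The `overall` field follows the column name used in this project, while the `score` field name, the function name, and the example records are assumptions made for illustration. The thresholds are taken as stated in the text: 0.5 combined with a rating of 4 or 5 for the positive subset, and -0.05 or a rating of 1 or 2 for the negative subset.

```python
def split_sentiment_subsets(reviews, pos_score=0.5, neg_score=-0.05):
    """Return (positive, negative) subsets using the rules stated above."""
    positive = [
        r for r in reviews
        if r["score"] >= pos_score and r["overall"] in (4, 5)
    ]
    negative = [
        r for r in reviews
        if r["score"] <= neg_score or r["overall"] in (1, 2)
    ]
    return positive, negative


# Made-up records; "score" is a hypothetical name for the model output.
reviews = [
    {"overall": 5, "score": 0.8},   # positive: high score and high rating
    {"overall": 4, "score": 0.2},   # neither: score below 0.5
    {"overall": 2, "score": 0.6},   # negative: low rating despite the score
    {"overall": 3, "score": -0.4},  # negative: low score
    {"overall": 3, "score": 0.0},   # neither
]

positive, negative = split_sentiment_subsets(reviews)
print(len(positive), len(negative))
```

Because the positive rule uses "and" while the negative rule uses "or", the negative subset is deliberately broader, and some reviews fall into neither subset.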

In the word cloud above, the most frequent words in positive sentiment feedback are 'shoe', 'comfortable', 'love', 'fit', and 'great', which points to what customers value and are satisfied with. The recurrence of terms like 'comfortable', 'love', and 'fit' suggests a positive reception of the products' quality and their alignment with advertised claims, while 'great' signals overall approval. These findings, especially the frequent mention of 'love', may indicate strong performance by sellers, reflecting well on customer satisfaction and product fit.

On the other hand, the most common key terms associated with negative sentiment feedback include "return," "size," and "hurt," indicating common issues customers have encountered. The significant presence of "return" suggests a substantial volume of returns due to dissatisfaction. Terms like "small" and "big" likely refer to problems with sizing or fit, while "discomfort" and "hurt" could point to issues with the physical experience of wearing the products.

2. What are the products that are on the list for positive and negative sentiments?


The analysis highlights the specific styles that have received positive feedback from customers, particularly in the shoe sizes of 8 B(M) US, 9 B(M) US, and 8.5 B(M) US, and in the colour combination Black/White/Anthracite/Stealth. This information is valuable for the platform and vendors to understand consumer preferences.

Conclusion

Leveraging insights from the NER results and the top 20 favoured style combinations, Amazon can make informed inventory decisions by ensuring that popular sizes and colours are readily available, especially during high-demand periods. Such a strategy can help sellers prevent stockouts and unnecessary price markdowns, leading to better sales turnover and profitability.

Furthermore, Amazon can utilise positive sentiment from the reviews to tailor its marketing campaigns to highlight the attributes which customers love the most, such as comfort and fit. Special promotions or loyalty programs centred around the most favoured styles can further capitalise on known customer preferences, which results in a higher conversion rate.

On the other hand, the prominence of the word "return" in the negative sentiment feedback suggests that dissatisfied customers tend to return their orders to the retailer. This trend should be addressed with a deeper analysis of the causes of dissatisfaction in order to reduce return rates. It would be advisable to investigate the specific reasons for returns, whether they relate to sizing issues, quality concerns, or mismatches between product descriptions and actual items. Understanding these reasons can help improve process flows, for instance by implementing tighter quality assurance standards and ensuring that product descriptions accurately reflect the actual product.

Additionally, customer representatives should proactively contact customers who intend to return orders, to ensure their concerns are attended to effectively and potentially reduce the need for returns.

About

SENTIMENT ANALYSIS ON AMAZON REVIEWS
