(EDA + Review Analysis)
This project performs Exploratory Data Analysis (EDA) on Amazon Sales data to uncover insights about product categories, pricing, customer behavior, and ratings.
The analysis involves data cleaning, preprocessing, visualization, and statistical exploration to better understand sales performance and business drivers.
View the full interactive notebook with visualizations on Kaggle:
- Clean and preprocess raw sales data
- Handle missing values, duplicates, and inconsistent entries
- Perform univariate, bivariate, and multivariate analysis
- Create meaningful visualizations for better storytelling
- product_id: Product ID
- product_name: Name of the Product
- category: Category of the Product
- discounted_price: Discounted Price
- actual_price: Actual Price
- discount_percentage: Percentage of Discount
- rating: Product Rating
- rating_count: Number of reviewers
- about_product: Product description
- user_id: Reviewer ID
- user_name: Reviewer name
- review_id: Review ID
- review_title: Short review
- review_content: Long review
- img_link: Product image URL
- product_link: Official product link
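Several of these columns hold numbers, and raw exports like this often store them as text. The pandas sketch below covers the preparation steps this project describes: type conversion, duplicates, missing values and outlier detection. It assumes, without confirmation from this README, that prices look like "₹1,099", percentages like "64%", counts like "24,269", and that category is a "|"-separated path. The helper names (prepare_columns, clean_rows, iqr_outlier_mask), the derived main_category column, the missing-value policy and the 1.5 × IQR fence are all hypothetical choices, not necessarily the notebook's. The sample rows are made up.

```python
import pandas as pd

# Assumption: these columns arrive as text such as "₹1,099", "64%", "24,269" or "4.2".
NUMERIC_TEXT_COLUMNS = ["discounted_price", "actual_price", "discount_percentage", "rating", "rating_count"]


def prepare_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: convert text-encoded numbers and derive main_category."""
    out = df.copy()
    for col in NUMERIC_TEXT_COLUMNS:
        # Keep only digits and the decimal point, e.g. "₹1,099" -> "1099".
        digits = out[col].astype(str).str.replace(r"[^0-9.]", "", regex=True)
        # Unparseable leftovers (e.g. "" from "|" or a missing value) become NaN.
        out[col] = pd.to_numeric(digits, errors="coerce")
    # Assumption: category is a "|"-separated path; keep its first segment.
    out["main_category"] = out["category"].str.split("|").str[0]
    return out


def clean_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical policy: drop exact duplicates and unrated rows,
    and treat a missing rating_count as zero reviewers."""
    out = df.drop_duplicates().dropna(subset=["rating"])
    out = out.assign(rating_count=out["rating_count"].fillna(0))
    return out.reset_index(drop=True)


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Made-up rows for illustration only; the second row duplicates the first.
raw = pd.DataFrame(
    {
        "product_id": ["P1", "P1", "P2", "P3", "P4", "P5"],
        "category": [
            "Electronics|WearableTechnology|SmartWatches",
            "Electronics|WearableTechnology|SmartWatches",
            "Computers&Accessories|Cables|USBCables",
            "Electronics|Headphones",
            "Home&Kitchen|Kitchen",
            "Electronics|Televisions",
        ],
        "discounted_price": ["₹1,099", "₹1,099", "₹199", "₹899", "₹1,299", "₹45,999"],
        "actual_price": ["₹2,499", "₹2,499", "₹349", "₹1,999", "₹2,199", "₹79,999"],
        "discount_percentage": ["56%", "56%", "43%", "55%", "41%", "43%"],
        "rating": ["4.2", "4.2", "4.0", "|", "4.1", "4.3"],
        "rating_count": ["24,269", "24,269", None, "1,200", "310", "87"],
    }
)

df = clean_rows(prepare_columns(raw))
df["price_outlier"] = iqr_outlier_mask(df["actual_price"])
print(df[["product_id", "main_category", "actual_price", "rating", "rating_count", "price_outlier"]])
```

Outliers are only flagged here, not dropped, because a very high price may be a genuine premium product rather than an entry error.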
- Data Understanding
  - Load dataset, explore features and statistics
- Data Cleaning & Preparation
  - Handle missing values, duplicates
  - Data type conversions
  - Outlier detection
- Univariate Analysis
  - Distribution of ratings and discount percentage
  - Most frequent product categories
  - WordCloud for keywords in product name
- Bivariate Analysis
  - Discount % vs sales (rating_count)
  - Actual price vs discounted price correlation
  - Ratings vs popularity
- Multivariate Analysis
  - Feature correlations
  - Categories balancing ratings, discounts, popularity
- Review Analysis
  - Explore customer reviews for common words and sentiments
- Key Insights & Observations
- Conclusion & Outcomes
- Most products priced between 0–25,000 (affordability matters)
- Price has weak correlation with ratings
- Ratings mostly between 4–4.5
- Black products ordered most, then white
- Smart Watches & Charging Cables most popular
- Top product: Fire-Boltt Ninja Call Pro Plus Smart Watch
- Most products have 50–60% discount
- Discount % and rating/sales have very weak negative correlation
- High ratings do not necessarily mean high popularity
- Frequent review terms: good, product quality, price, delivery, usability
- Cleaned dataset prepared for analysis
- Visual insights created for business strategy
- Actionable insights on pricing, discounts, ratings, and categories
- Ready for predictive modeling (sales forecasting, recommendation systems)
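Findings such as the weak price vs rating correlation, the very weak negative correlation between discount and rating_count, and the clustering of discounts in the 50 to 60% range come from simple summary statistics. The sketch below shows one way to compute them with pandas. It uses Pearson correlation via DataFrame.corr and right-closed 10-point discount bands via pd.cut, both assumed choices. The small frame is made up and does not reproduce the project's numbers.

```python
import pandas as pd

# Made-up values for illustration; they are not taken from the dataset.
df = pd.DataFrame(
    {
        "actual_price": [499, 1999, 999, 15999, 2999, 799],
        "discount_percentage": [55, 60, 52, 40, 58, 75],
        "rating": [4.1, 4.3, 4.0, 4.4, 4.2, 3.9],
        "rating_count": [1200, 350, 9800, 150, 4000, 60],
    }
)

# Pairwise Pearson correlations between the numeric columns.
corr = df.corr(method="pearson")
price_vs_rating = corr.loc["actual_price", "rating"]
discount_vs_popularity = corr.loc["discount_percentage", "rating_count"]
print(f"price vs rating r = {price_vs_rating:.2f}")
print(f"discount vs rating_count r = {discount_vs_popularity:.2f}")

# Right-closed 10-point discount bands: (0, 10], (10, 20], ..., (90, 100].
bands = pd.cut(df["discount_percentage"], bins=list(range(0, 101, 10)))
print(bands.value_counts().sort_index())
```

As in the bivariate step, rating_count stands in for sales or popularity. Pearson r only measures linear association, so DataFrame.corr(method="spearman") is a reasonable rank-based cross-check on skewed columns such as rating_count.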
- Language: Python
- Libraries:
  - pandas – Data manipulation
  - numpy – Numerical operations
  - matplotlib & seaborn – Visualization
  - plotly – Interactive graphs
  - scipy – Statistical analysis
  - wordcloud – Text visualization
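The Review Analysis step and the wordcloud entry both depend on word frequencies. Below is a standard-library sketch that counts the most common terms in review text. The tokenizer (lowercase ASCII letters only), the three-letter minimum, the small stop-word list and the helper name top_terms are assumptions, and the sample reviews are made up. A word-to-count mapping like this is the kind of input wordcloud's WordCloud.generate_from_frequencies accepts, if a rendered cloud is wanted.

```python
from __future__ import annotations

import re
from collections import Counter

# Assumed, deliberately small stop-word list; a fuller list would be used in practice.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "for", "of", "to", "this", "was", "but", "with", "in"}


def top_terms(reviews: list[str], n: int = 5) -> list[tuple[str, int]]:
    """Hypothetical helper: return the n most common words across all reviews,
    ignoring stop words and tokens shorter than three letters."""
    counts: Counter[str] = Counter()
    for text in reviews:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if len(w) >= 3 and w not in STOP_WORDS)
    # Ties keep first-seen order because most_common sorts stably by count.
    return counts.most_common(n)


# Made-up review snippets in the style of review_content.
reviews = [
    "Good product, quality is good for the price.",
    "Delivery was fast and the product quality is nice.",
    "Good value for price but delivery was late.",
]
print(top_terms(reviews))
# -> [('good', 3), ('product', 2), ('quality', 2), ('price', 2), ('delivery', 2)]
```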