### 📊 Amazon Best Sellers 2024 Analysis

#### Introduction

This analysis explores **Amazon’s best-selling products of 2024** to uncover key trends and insights within the e-commerce market. By examining product categories, prices, ratings, brands, and sales patterns over time, the goal is to understand what drives success on the platform.

The analysis focuses on five main questions:

1. **Which product categories dominate Amazon’s best sellers in 2024?**  
2. **What is the relationship between product price and sales performance?**  
3. **Do higher customer ratings correlate with higher sales?**  
4. **Which brands appear most frequently among the top-selling products?**  
5. **How do monthly sales trends change throughout 2024, especially around major events or seasons?**

Through this analysis, we aim to identify patterns that can help sellers, marketers, and analysts better understand **consumer behavior** and **market dynamics** on Amazon.

#### 📦 Next Steps   
- Set up the project structure (data, notebooks, src, reports).  
- Install the required Python libraries.  
- Begin exploratory data analysis (EDA).
-------------
#### ⚙️ Required Libraries

In [1]:
%pip install -q -U pandas numpy matplotlib seaborn 

Note: you may need to restart the kernel to use updated packages.


In [2]:
%pip install -q -U watermark

Note: you may need to restart the kernel to use updated packages.


### 📦 Project Libraries: Imports

This section lists all the Python libraries used in this project. Keeping them organized here helps with reproducibility and makes it easier to install dependencies.

In [3]:
# Importing the library for data manipulation in tables
import pandas as pd 

# Importing the NumPy library for mathematical operations and arrays
import numpy as np  

# Importing the Matplotlib library for generating plots
import matplotlib.pyplot as plt  

# Importing the Seaborn library for statistical data visualization
import seaborn as sns  

# Jupyter Notebook magic command to display plots directly in the notebook
%matplotlib inline

In [4]:
# Load the watermark extension
%reload_ext watermark

# Display metadata for your notebook
%watermark -a "Maykon Analysis" -d -u -v -p numpy,pandas,matplotlib,seaborn

Author: Maykon Analysis

Last updated: 2025-10-21

Python implementation: CPython
Python version       : 3.14.0
IPython version      : 9.6.0

numpy     : 2.3.4
pandas    : 2.3.3
matplotlib: 3.10.7
seaborn   : 0.13.2



#### Loading the dataset into a pandas DataFrame

In [5]:
data = r'D:\Desktop\Projects\Data-Analysis-Amazon-Best-Sellers-in-2024\data\best_sellers24.csv'

df = pd.read_csv(data)  
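
The absolute Windows path above only works on the original machine. As a hedged alternative, the sketch below loads the same file through a relative path and fails early with a clear message if the file is missing. The helper name `load_best_sellers` and the assumed layout (notebook in `notebooks/`, CSV in `data/`) are illustrative, not taken from the project.

```python
from pathlib import Path

import pandas as pd


def load_best_sellers(csv_path):
    """Read the best-sellers CSV, failing early with a clear message."""
    csv_path = Path(csv_path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"Dataset not found: {csv_path.resolve()}")
    return pd.read_csv(csv_path)


# Assumed layout: the notebook sits in notebooks/ and the CSV in data/.
# df = load_best_sellers(Path("..") / "data" / "best_sellers24.csv")
```

A relative path keeps the notebook runnable on any machine that has the same folder structure.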

In [None]:
df.shape  # Dimensions of the DataFrame as (rows, columns).

(218, 11)

In [7]:
df.head()  # Show the first 5 rows.

| | title | brand | description | starsBreakdown/3star | starsBreakdown/4star | starsBreakdown/5star | reviewsCount | price | price/currency | price/value | categoryPageData/productPosition |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ferrero Rocher, 24 Count, Premium Milk Chocola... | Ferrero Rocher | Ferrero Rocher's milk chocolate gift box offer... | 0.02 | 0.07 | 0.89 | 20021.0 | NaN | $ | 11.39 | 7 |
| 1 | HERSHEY'S NUGGETS Assorted Chocolate, Valentin... | HERSHEY'S | This HERSHEY'S NUGGETS candy assortment is fil... | 0.03 | 0.1 | 0.84 | 18891.0 | NaN | $ | 10.69 | 16 |
| 2 | LEGO Icons Flower Bouquet Building Decoration ... | LEGO | Giving and receiving beautiful flowers is such... | 0.01 | 0.05 | 0.92 | 19395.0 | NaN | $ | 47.99 | 2 |
| 3 | BodyRefresh Shower Steamers Aromatherapy - 8 P... | BodyRefresh | NaN | 0.07 | 0.15 | 0.67 | 593.0 | NaN | $ | 9.99 | 10 |
| 4 | JoJowell Shower Steamers Aromatherapy - 21Pcs ... | JoJowell | NaN | 0.1 | 0.15 | 0.63 | 816.0 | NaN | $ | 21.99 | 11 |


In [8]:
df.tail()  # Show the last 5 rows.

| | title | brand | description | starsBreakdown/3star | starsBreakdown/4star | starsBreakdown/5star | reviewsCount | price | price/currency | price/value | categoryPageData/productPosition |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 213 | JOYIN 28 Packs Valentine's Day Gift Cards with... | JOYIN | NaN | 0.14 | 0.17 | 0.55 | 30.0 | NaN | $ | 22.99 | 305 |
| 214 | THEMEROL Natural Gemstone Bracelet Gifts for D... | THEMEROL | NaN | 0.03 | 0.07 | 0.87 | 140.0 | NaN | NaN | NaN | 300 |
| 215 | Juegoal 28 Pack Valentines Day Gift Cards for ... | Juegoal | NaN | 0.05 | 0.16 | 0.68 | 113.0 | NaN | $ | 21.99 | 296 |
| 216 | Double Couple Gift for Mom Women-Rose Cute Bea... | Double Couple | NaN | 0.06 | 0.15 | 0.72 | 347.0 | NaN | $ | 19.59 | 298 |
| 217 | Valentine's Day Gifts For Her - Rose in Glass ... | Norcalway | NaN | 0.04 | 0.1 | 0.81 | 423.0 | NaN | $ | 23.97 | 310 |


In [None]:
df.info()  # Summary of the DataFrame: column names, non-null counts, dtypes, and memory usage.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 218 entries, 0 to 217
Data columns (total 11 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   title                             218 non-null    object 
 1   brand                             217 non-null    object 
 2   description                       46 non-null     object 
 3   starsBreakdown/3star              218 non-null    float64
 4   starsBreakdown/4star              218 non-null    float64
 5   starsBreakdown/5star              218 non-null    float64
 6   reviewsCount                      216 non-null    float64
 7   price                             0 non-null      float64
 8   price/currency                    169 non-null    object 
 9   price/value                       169 non-null    float64
 10  categoryPageData/productPosition  218 non-null    int64  
dtypes: float64(6), int64(1), object(4)
memory usage: 18.9+ KB


In [None]:
df.describe(include='all')  # Summary statistics for all columns, both numerical and categorical.

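| | title | brand | description | starsBreakdown/3star | starsBreakdown/4star | starsBreakdown/5star | reviewsCount | price | price/currency | price/value | categoryPageData/productPosition |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 218 | 217 | 46 | 218.0 | 218.0 | 218.0 | 216.0 | 0.0 | 169 | 169.0 | 218.0 |
| unique | 218 | 168 | 46 | NaN | NaN | NaN | NaN | NaN | 1 | NaN | NaN |
| top | Ferrero Rocher, 24 Count, Premium Milk Chocola... | Ferrero Rocher | Ferrero Rocher's milk chocolate gift box offer... | NaN | NaN | NaN | NaN | NaN | $ | NaN | NaN |
| freq | 1 | 6 | 1 | NaN | NaN | NaN | NaN | NaN | 169 | NaN | NaN |
| mean | NaN | NaN | NaN | 0.04922 | 0.105275 | 0.767936 | 2863.759259 | NaN | NaN | 18.046805 | 127.123853 |
| std | NaN | NaN | NaN | 0.041966 | 0.066756 | 0.151997 | 7104.953282 | NaN | NaN | 11.69647 | 91.812865 |
| min | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 1.0 | NaN | NaN | 1.99 | 1.0 |
| 25% | NaN | NaN | NaN | 0.03 | 0.0725 | 0.72 | 54.75 | NaN | NaN | 9.99 | 52.25 |
| 50% | NaN | NaN | NaN | 0.04 | 0.1 | 0.79 | 387.5 | NaN | NaN | 15.95 | 103.5 |
| 75% | NaN | NaN | NaN | 0.07 | 0.13 | 0.85 | 1581.5 | NaN | NaN | 21.99 | 193.75 |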


In [None]:
df.dtypes  # Data type (dtype) of each column.

title                                object
brand                                object
description                          object
starsBreakdown/3star                float64
starsBreakdown/4star                float64
starsBreakdown/5star                float64
reviewsCount                        float64
price                               float64
price/currency                       object
price/value                         float64
categoryPageData/productPosition      int64
dtype: object

#### Data Preparation (Cleaning & Transformation): identifying and handling missing values

In [13]:
df.isnull().sum()  # Count the missing values in each column.

title                                 0
brand                                 1
description                         172
starsBreakdown/3star                  0
starsBreakdown/4star                  0
starsBreakdown/5star                  0
reviewsCount                          2
price                               218
price/currency                       49
price/value                          49
categoryPageData/productPosition      0
dtype: int64
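
Raw counts are easier to act on when shown as a share of all rows, with fully empty columns flagged. The sketch below is one possible way to do that. The helper name `missing_report` is hypothetical, and the small `example` frame is stand-in data, not the real dataset.

```python
import numpy as np
import pandas as pd


def missing_report(frame):
    """Return per-column missing counts and percentages, worst first."""
    report = pd.DataFrame({
        "missing": frame.isnull().sum(),
        "missing_pct": (frame.isnull().mean() * 100).round(1),
    })
    # A column with no values at all carries no information.
    report["all_missing"] = (report["missing"] == len(frame)) & (len(frame) > 0)
    return report.sort_values("missing", ascending=False)


# Stand-in data (not the real dataset), shaped like a few of its columns.
example = pd.DataFrame({
    "brand": ["A", None, "C", "D"],
    "price": [np.nan, np.nan, np.nan, np.nan],
    "price/value": [9.99, np.nan, 21.99, 4.5],
})
print(missing_report(example))
```

Applied to the counts above, `description` is missing in 172 of 218 rows (about 78.9%), `price/value` and `price/currency` in 49 rows each (about 22.5%), and `price` in all 218 rows.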

In [26]:
df['reviewsCount'] = df['reviewsCount'].astype('Int64')  # Convert 'reviewsCount' to pandas' nullable integer dtype, which keeps missing values as <NA>.
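
The capital "I" in `'Int64'` matters. The short sketch below uses stand-in values (not rows from the dataset) to show how the nullable dtype differs from NumPy's `int64`.

```python
import numpy as np
import pandas as pd

# Stand-in values, not taken from the dataset.
counts = pd.Series([20021.0, np.nan, 593.0], name="reviewsCount")

# Capital-I "Int64" is pandas' nullable integer dtype: NaN becomes <NA>.
nullable = counts.astype("Int64")
print(nullable.dtype)         # Int64
print(nullable.isna().sum())  # 1

# Lowercase "int64" is the NumPy dtype and cannot represent missing values,
# so the conversion raises an error while NaN is still present.
try:
    counts.astype("int64")
except (ValueError, TypeError) as exc:
    print(f"int64 conversion failed: {exc}")
```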

In [None]:
df[df['price'].isnull()].head()  # Display the first rows where 'price' is missing.

| | title | brand | description | starsBreakdown/3star | starsBreakdown/4star | starsBreakdown/5star | reviewsCount | price | price/currency | price/value | categoryPageData/productPosition |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ferrero Rocher, 24 Count, Premium Milk Chocola... | Ferrero Rocher | Ferrero Rocher's milk chocolate gift box offer... | 0.02 | 0.07 | 0.89 | 20021.0 | NaN | $ | 11.39 | 7 |
| 1 | HERSHEY'S NUGGETS Assorted Chocolate, Valentin... | HERSHEY'S | This HERSHEY'S NUGGETS candy assortment is fil... | 0.03 | 0.1 | 0.84 | 18891.0 | NaN | $ | 10.69 | 16 |
| 2 | LEGO Icons Flower Bouquet Building Decoration ... | LEGO | Giving and receiving beautiful flowers is such... | 0.01 | 0.05 | 0.92 | 19395.0 | NaN | $ | 47.99 | 2 |
| 3 | BodyRefresh Shower Steamers Aromatherapy - 8 P... | BodyRefresh | NaN | 0.07 | 0.15 | 0.67 | 593.0 | NaN | $ | 9.99 | 10 |
| 4 | JoJowell Shower Steamers Aromatherapy - 21Pcs ... | JoJowell | NaN | 0.1 | 0.15 | 0.63 | 816.0 | NaN | $ | 21.99 | 11 |


In [None]:
# The 'price' column is empty in all 218 rows, so dropna(subset=['price']) would delete the entire dataset.
# Drop the column instead; the actual price is stored in 'price/value'.
df = df.drop(columns=['price'])

In [None]:
df['brand'] = df['brand'].fillna('Unknown')  # Fill the single missing brand with a placeholder.
df['reviewsCount'] = df['reviewsCount'].fillna(0)  # Fill missing review counts with 0.

In [None]:
df['description'] = df['description'].fillna('No description available')  # Fill missing descriptions with placeholder text.

In [None]:
df = df.dropna(subset=['price/value', 'price/currency'])  # Remove rows with a missing price/value or price/currency.

In [19]:
# Safeguard only: the dropna above leaves no missing currencies. '$' matches the only value present in the column.
df['price/currency'] = df['price/currency'].fillna('$')

In [23]:
df.duplicated().sum()  # Count fully duplicated rows.
# df = df.drop_duplicates()  # Uncomment to remove duplicates if any are found.

np.int64(0)

In [24]:
df.isnull().sum()  # Count the missing values in each column after cleaning.

title                               0
brand                               0
description                         0
starsBreakdown/3star                0
starsBreakdown/4star                0
starsBreakdown/5star                0
reviewsCount                        0
price/currency                      0
price/value                         0
categoryPageData/productPosition    0
dtype: int64
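
The cleaning cells above depend on the order in which they were run. One way to make the result reproducible is to gather the steps into a single function that works on a copy. The sketch below is an illustration under stated assumptions, not the notebook's actual code: the function name `clean_best_sellers` is hypothetical, and the `example` frame is stand-in data that reuses only the real column names.

```python
import numpy as np
import pandas as pd


def clean_best_sellers(raw):
    """Return a cleaned copy of `raw`; the input frame is left untouched."""
    cleaned = raw.copy()
    # 'price' holds no values; the usable price is in 'price/value'.
    cleaned = cleaned.drop(columns=["price"], errors="ignore")
    # Keep only rows that have a usable price and currency.
    cleaned = cleaned.dropna(subset=["price/value", "price/currency"])
    # Fill the remaining gaps with explicit placeholders.
    cleaned = cleaned.fillna({
        "brand": "Unknown",
        "description": "No description available",
        "reviewsCount": 0,
    })
    # Review counts are whole numbers, so store them as nullable integers.
    cleaned["reviewsCount"] = cleaned["reviewsCount"].astype("Int64")
    return cleaned.reset_index(drop=True)


# Stand-in data (not the real dataset).
example = pd.DataFrame({
    "title": ["Item A", "Item B", "Item C"],
    "brand": ["Brand A", None, "Brand C"],
    "description": [None, "Some text", None],
    "reviewsCount": [120.0, np.nan, 30.0],
    "price": [np.nan, np.nan, np.nan],
    "price/currency": ["$", "$", None],
    "price/value": [9.99, 21.99, np.nan],
})
cleaned = clean_best_sellers(example)
print(cleaned.shape)                      # (2, 6)
print(int(cleaned.isnull().sum().sum()))  # 0
```

Because the function never modifies its input, it can be re-run on the freshly loaded data at any time without restarting the kernel.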