---
# **Amazon Sales Analysis**
---

---
## Install Required Libraries
---

In [None]:

%pip install numpy pandas matplotlib seaborn plotly mysql-connector-python


---
## Importing Libraries
---

In [None]:

import mysql.connector
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
import matplotlib.dates as mdates
%matplotlib inline
import seaborn as sns
import plotly.express as px


---
## Reading the Data from MySQL
---

In [None]:

connection = mysql.connector.connect(
    user='root',
    password='Ashu',
    host='localhost',
    database='amazon_sales_analysis'
)
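
The connection cell above embeds the database password directly in the notebook. A minimal sketch of one alternative, assuming the credentials are exported as environment variables: the variable names `AMAZON_DB_USER`, `AMAZON_DB_PASSWORD`, and `AMAZON_DB_HOST` and the helper `load_db_config` are hypothetical and not part of the original notebook.

```python
import os

def load_db_config(env=None):
    """Build keyword arguments for mysql.connector.connect() from environment variables.

    The variable names are hypothetical; adjust them to your own setup.
    """
    env = os.environ if env is None else env
    return {
        "user": env.get("AMAZON_DB_USER", "root"),
        "password": env["AMAZON_DB_PASSWORD"],  # no default: fail loudly if unset
        "host": env.get("AMAZON_DB_HOST", "localhost"),
        "database": "amazon_sales_analysis",
    }

# Example with an explicit mapping instead of the real environment
config = load_db_config({"AMAZON_DB_PASSWORD": "example-secret"})
print(sorted(config))  # ['database', 'host', 'password', 'user']
```

With the variables set, the connection could then be opened as `mysql.connector.connect(**load_db_config())`.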


In [None]:

cursor = connection.cursor()


In [None]:

cursor.execute('select * from `sales_data`')


In [None]:

df = pd.DataFrame(cursor.fetchall(),columns=[desc[0] for desc in cursor.description])
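
The cell above takes its column names from `cursor.description`. The sketch below shows that pattern end to end, using the standard-library `sqlite3` module as a stand-in for MySQL (an assumption made so the example runs without a server). Both drivers follow the Python DB-API, where the first item of each `description` entry is the column name. The table contents are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (category TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?)",
    [("T-shirt", 399.0), ("Shirt", 549.0)],
)

cursor = conn.cursor()
cursor.execute("SELECT * FROM sales_data")

# Each entry of cursor.description is a tuple whose first item is the column name
columns = [desc[0] for desc in cursor.description]
rows = cursor.fetchall()

# Release the cursor and the connection once the rows are in memory
cursor.close()
conn.close()

print(columns)  # ['category', 'amount']
print(len(rows))  # 2
```

The `rows` and `columns` values can be passed to `pd.DataFrame(rows, columns=columns)` exactly as in the notebook cell.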


## Overview of the Data


In [None]:

print("\n---------------------------------------------------------------------------------")
print("First 5 Rows:")
print("---------------------------------------------------------------------------------\n")
df.head()


In [None]:

print("\n---------------------------------------------------------------------------------")
print("Last 5 Rows:")
print("---------------------------------------------------------------------------------\n")
df.tail(5)


In [None]:

print("\nShape:", df.shape)


In [None]:

print("\n---------------------------------------------------------------------------------")
print("Descriptive Statistics:")
print("---------------------------------------------------------------------------------\n")
df.describe(include='all')    


In [None]:

print("\n---------------------------------------------------------------------------------")
print("Column Names:")
print("---------------------------------------------------------------------------------\n")
df.columns


In [None]:

print("\n---------------------------------------------------------------------------------")
print("Info:")
print("---------------------------------------------------------------------------------\n")
df.info()


---

# Cleaning the Data

---

In [None]:

# Checking NULL Values 
pd.isnull(df) 


In [None]:

print("Total Rows & Columns:", df.shape)


In [None]:

# sum() counts the NULL values in each column
pd.isnull(df).sum()


In [None]:

# Drop the columns that contain only NULL values
df.drop(['new', 'pending_status'], axis=1, inplace=True)


---

**Note**: We removed the `new` and `pending_status` columns because they contained only NULL values.

---
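
Columns that are entirely NULL can also be found programmatically instead of being named by hand. A minimal sketch on a toy frame (the values are invented; only the column names `new` and `pending_status` come from the notebook):

```python
import numpy as np
import pandas as pd

# Toy frame: 'new' and 'pending_status' contain only missing values
toy = pd.DataFrame({
    "amount": [399.0, np.nan, 549.0],
    "new": [np.nan, np.nan, np.nan],
    "pending_status": [None, None, None],
})

# isna().all() is True for a column only when every value in it is missing
all_null_cols = toy.columns[toy.isna().all()].tolist()
cleaned = toy.drop(columns=all_null_cols)

print(all_null_cols)             # ['new', 'pending_status']
print(cleaned.columns.tolist())  # ['amount']
```

Note that `amount` is kept even though it has one missing value, because only fully empty columns are selected.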

In [None]:

print("\n---------------------------------------------------------------------------------")
print("Column:")
print("---------------------------------------------------------------------------------\n")
df.columns


---

We can observe that the NULL-only columns (`new` and `pending_status`) have been removed.

---

In [None]:

# Drop every row that still contains a NULL value
df.dropna(inplace=True)


In [None]:

# sum() counts the NULL values in each column
pd.isnull(df).sum()


---

We can observe that no NULL values remain.

---

In [None]:

print("\n---------------------------------------------------------------------------------")
print("Total Rows & Columns: ")
print("---------------------------------------------------------------------------------")
df.shape


In [None]:

# Changing Data Type
df['ship_postal_code']=df['ship_postal_code'].astype('float')


In [None]:

# Check whether the data type has changed
spc = df['ship_postal_code'].dtype
print('\nShip Postal Code:', spc)


---

Here we can observe that the data type of `ship_postal_code` is now `float64`.

---


In [None]:

# Rename Columns
df.rename(columns={'qty':'Quantity'},inplace= True)


In [None]:

print("\n---------------------------------------------------------------------------------")
print("Descriptive Statistics:")
print("---------------------------------------------------------------------------------\n")
df.describe(include='all')


In [None]:

# describe() returns summary statistics for the numeric columns (count, mean, std, min, etc.)
df.describe()


In [None]:

df.describe(include='object')


In [None]:

# Use describe() for specific columns
df[['Quantity','amount']].describe()


In [None]:

print("\n---------------------------------------------------------------------------------")
print("Duplicate rows:")
print("---------------------------------------------------------------------------------\n")
df[df.duplicated()]


In [None]:

print("\n---------------------------------------------------------------------------------")
duplicate_count = df.duplicated().sum()
print("Number of duplicate rows:", duplicate_count)
print("---------------------------------------------------------------------------------")


In [None]:

df.drop_duplicates(inplace=True)


In [None]:

print(df['fulfilment'].value_counts())


In [None]:

print("\n---------------------------------------------------------------------------------")
duplicate_count = df.duplicated().sum()
print("Number of duplicate rows:", duplicate_count)
print("---------------------------------------------------------------------------------")


---

We have successfully removed the duplicate rows.

---
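
To make the duplicate check concrete, here is a small sketch on invented data (the `order_id` column is hypothetical). `duplicated()` flags only the second and later copies of a fully identical row, so the count equals the number of rows that `drop_duplicates()` removes.

```python
import pandas as pd

toy = pd.DataFrame({
    "order_id": ["A1", "A2", "A2", "A3"],
    "amount": [399.0, 549.0, 549.0, 399.0],
})

# Rows 1 and 2 are identical, so exactly one row is flagged as a duplicate
n_duplicates = int(toy.duplicated().sum())
deduplicated = toy.drop_duplicates()

print(n_duplicates)       # 1
print(len(deduplicated))  # 3
```

Rows A1 and A3 share the same amount but are not duplicates, because all columns must match.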

---

# Exploratory Data Analysis

---

---

### **Question 1:** What are the total sales (Amount) for each product category?

---

In [None]:

total_sales_by_category = df.groupby('category')['amount'].sum().sort_values()

colors = plt.cm.tab10(np.linspace(0, 1, len(total_sales_by_category)))
plt.figure(figsize=(15, 10))
ax = total_sales_by_category.plot(kind='bar', color=colors, edgecolor='black')
plt.title('\nTotal Sales by Product Category\n', fontsize=20,fontweight='bold')
plt.xlabel('\nProduct Category\n', fontsize=18,fontweight='bold')
plt.ylabel('\nTotal Sales (Amount in Million)\n', fontsize=18,fontweight='bold')

for p in ax.patches:
    ax.annotate(f'{int(p.get_height()):,}',
    (p.get_x() + p.get_width() / 2., p.get_height()),
    ha = 'center' , va = 'bottom' , fontsize = 12,fontweight='bold')


plt.xticks(rotation=0, ha='center', fontsize=10)
plt.yticks([0, 2e6, 4e6 , 6e6, 8e6, 10e6, 12e6], ['0','2M','4M','6M','8M','10M','12M'])
plt.grid(axis='y', linestyle='--', alpha=0.3)
plt.tight_layout()
plt.show()

---

### **Conclusion:** T-shirts generated the highest total sales amount, making them the largest contributor to overall revenue. This suggests prioritizing T-shirts for promotions and stock optimization.
### **Recommendation:** Focus on improving sales performance in underperforming product categories to increase overall revenue.

---
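
The aggregation behind the bar chart can be checked on a toy frame. This is a sketch with invented amounts, not the real figures; it also derives each category's share of revenue, which is one way to quantify "largest contributor".

```python
import pandas as pd

toy = pd.DataFrame({
    "category": ["T-shirt", "Shirt", "T-shirt", "Wallet"],
    "amount": [400.0, 500.0, 300.0, 100.0],
})

# Sum the amount per category, smallest first (the same order the chart uses)
totals = toy.groupby("category")["amount"].sum().sort_values()
top_category = totals.idxmax()

# Share of total revenue per category, in percent
share = totals / totals.sum() * 100

print(totals.to_dict())  # {'Wallet': 100.0, 'Shirt': 500.0, 'T-shirt': 700.0}
print(top_category)      # T-shirt
```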

---

### **Question 2:** What is the percentage distribution of orders across product categories?

---

In [None]:

category_distribution = df['category'].value_counts()

plt.figure(figsize=(20, 10))
ax = category_distribution.plot(kind='pie', startangle=90, colors=['skyblue', 'orange', 'lightgreen', 'lightcoral', 'yellow', 'purple', 'pink'], autopct=None, labels=None)
plt.legend(category_distribution.index, title='Category', loc='lower left', fontsize=9)
percentage_table = category_distribution / category_distribution.sum() * 100
table_data = list(zip(category_distribution.index, category_distribution.values, percentage_table.round(2)))

plt.table(cellText=table_data, colLabels=['\n---- Category ----\n', '\n---- Orders ----\n', '\n---- Percentage (%) ----\n'], loc='bottom', cellLoc='center', colLoc='center', bbox=[0, -0.3, 1, 0.3])
plt.title('\nOrder Distribution by Product Category',fontsize = 20,fontweight='bold')
plt.ylabel('')  
plt.tight_layout()
plt.show()


---

### **Conclusion:** The pie chart reveals that the T-shirt category accounts for the largest share of orders, well ahead of the other product categories.
### **Recommendation:** Consider expanding the T-shirt product line and marketing efforts to capitalize on its strong sales performance.

---
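
`value_counts()` counts rows, so the percentages in the pie chart describe the share of orders rather than the share of revenue. A minimal sketch on invented data, using `normalize=True` to get the fractions directly:

```python
import pandas as pd

categories = pd.Series(["T-shirt", "T-shirt", "Shirt", "Wallet"], name="category")

# Number of rows (orders) per category
counts = categories.value_counts()

# normalize=True returns fractions that sum to 1; scale to percent
percentages = (categories.value_counts(normalize=True) * 100).round(2)

print(counts["T-shirt"])       # 2
print(percentages["T-shirt"])  # 50.0
```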

---

### **Question 3:** What is the trend of total orders (Quantity) over time?


---

In [None]:

df['Date'] = pd.to_datetime(df['date'], errors='coerce', format='%m-%d-%y')
orders_trend = df.groupby('Date')['Quantity'].sum()

plt.figure(figsize=(27, 8))
plt.plot(orders_trend.index, orders_trend.values, marker='o', color='red')
plt.title('\nTrend of Total Orders Over Time\n', fontsize=20, fontweight='bold')
plt.xlabel('\nDate\n', fontsize=18, fontweight='bold')
plt.ylabel('\nTotal Orders (Quantity)\n', fontsize=18, fontweight='bold')
plt.grid(alpha=0.3)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))
plt.gca().xaxis.set_major_locator(mdates.DayLocator(interval=5))
plt.tight_layout()
plt.show()


---

### **Conclusion:** The trend of total orders shows periodic spikes and declines, with varying levels of activity over time.
### **Recommendation:** Implement targeted promotions and improve inventory management to stabilize order trends and reduce fluctuations.

---
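
Daily totals are noisy, which can make spikes hard to tell apart from ordinary day-to-day variation. One possible way to smooth the series, sketched on synthetic numbers (the dates and quantities are invented), is to aggregate by week or to take a 7-day rolling mean:

```python
import pandas as pd

# Two weeks of synthetic daily quantities, starting on a Monday
daily = pd.Series(
    [10, 12, 8, 15, 9, 11, 14, 20, 18, 7, 13, 16, 12, 10],
    index=pd.date_range("2022-04-04", periods=14, freq="D"),
    name="Quantity",
)

# Weekly totals: each bin ends on a Sunday
weekly = daily.resample("W").sum()

# 7-day rolling mean: the first six values are NaN until the window fills
rolling = daily.rolling(window=7).mean()

print(weekly.tolist())  # [79, 96]
```

In the notebook, the same calls could be applied to `orders_trend`, provided its index holds valid datetimes.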

---

### **Question 4:** What is the distribution of sizes across different product categories?

---

In [None]:

x_data = df['category']  
y_data = df['size'] 

unique_sizes = y_data.unique()
colors = {size: plt.cm.tab10(i % 10) for i, size in enumerate(unique_sizes)}
dot_colors = y_data.map(colors)

plt.figure(figsize=(8, 8))
plt.scatter(x_data, y_data, c=dot_colors)
plt.xlabel('\nCategory\n', fontsize=20,fontweight='bold')  
plt.ylabel('\nSize\n', fontsize=18, fontweight='bold')  
plt.title('\nSizes Available per Product Category\n', fontsize=18, fontweight='bold') 
plt.show()


In [None]:

quantity_table = df.pivot_table(index='category', columns='size', values='Quantity', aggfunc='sum', fill_value=0)

print("\n---------------------------------------------------------------------------------")
print("Table of Quantity by Category and Size:")
print("---------------------------------------------------------------------------------\n")
quantity_table


---

### **Conclusion:** Sizes are distributed across categories, with some categories offering a broader range (e.g., T-shirts), while others like "Wallet" and "Perfume" primarily have a single size ("Free").
### **Recommendation:** Introduce more size variations in categories with limited options to enhance product appeal.

---

---

### **Question 5:** How is the sales amount distributed across all orders?

---

In [None]:

plt.figure(figsize=(20,10))
n, bins, patches = plt.hist(df['amount'], bins=30, color='purple', edgecolor='black', alpha=1)

for i in range(len(patches)):
    height = patches[i].get_height()
    plt.text(patches[i].get_x() + patches[i].get_width()/ 2, height, str(int(height)), ha ='center', va ='bottom', fontsize=12, fontweight='bold')

plt.title('\nDistribution of Sales Amount\n', fontsize=20,fontweight='bold')
plt.xlabel('\nSales Amount\n', fontsize=18, fontweight='bold')
plt.ylabel('\nFrequency\n', fontsize=18, fontweight='bold')
plt.grid(axis='y', alpha=0.5)
plt.tight_layout()
plt.show()



---

### **Conclusion:** The majority of sales amounts fall within the range of 0 to 1500, with a sharp decline in frequency beyond this range.
### **Recommendation:** Target marketing at higher-value orders to raise the share of sales above 1500.

---
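
A claim such as "the majority of amounts fall below 1500" can be backed by a number instead of being read off the histogram. A minimal sketch on invented amounts:

```python
import pandas as pd

amounts = pd.Series([250, 399, 549, 799, 1200, 1499, 1800, 2500])

# The mean of a boolean mask is the fraction of True values
share_up_to_1500 = (amounts <= 1500).mean() * 100

# Median and 90th percentile summarise where most orders sit
quantiles = amounts.quantile([0.5, 0.9])

print(share_up_to_1500)  # 75.0
```

Applied to `df['amount']`, the same expression gives the exact percentage of orders at or below the threshold.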

---

### **Question 6:** What is the distribution of total sales across different order statuses and product categories?

---

In [None]:

heatmap_data = df.pivot_table(index='status', columns='category', values='amount', aggfunc='sum', fill_value=0)
plt.figure(figsize=(20, 10))
sns.heatmap(heatmap_data, annot=True, fmt=".0f", cmap='coolwarm', cbar_kws={'label': '\nTotal Sales Amount\n'})
plt.title('\nHeatmap of Order Status vs Product Category\n', fontsize=20, fontweight='bold')
plt.xlabel('\nProduct Category\n', fontsize=18, fontweight='bold')
plt.ylabel('\nOrder Status\n', fontsize=18, fontweight='bold')

cbar = plt.gca().collections[0].colorbar
cbar_ticks = cbar.get_ticks() 
cbar_ticks = [tick for tick in cbar_ticks if tick <= 8e6]
cbar.set_ticks(cbar_ticks)
cbar.set_ticklabels([f'{int(x//1e6)}M' for x in cbar_ticks])
plt.tight_layout()
plt.show()


---

### **Conclusion:** The heatmap reveals that the majority of total sales are concentrated in "Shipped - Delivered to Buyer" status, with T-shirts dominating across product categories.
### **Recommendation:** Enhance shipping efficiency and explore opportunities to expand sales in all product categories.

---

---

### **Question 7:** What is the cumulative sales trend over time?

---

In [None]:

cumulative_sales = df.groupby('Date')['amount'].sum().cumsum()
plt.figure(figsize=(20, 10))
plt.fill_between(cumulative_sales.index,cumulative_sales.values, color='lightgreen', alpha=1)
plt.title('\nCumulative Sales Trend Over Time\n',fontsize=20, fontweight='bold')
plt.xlabel('\nDate\n', fontsize=18, fontweight='bold')
plt.ylabel('\nCumulative Sales (Amount)\n', fontsize=18, fontweight='bold')
plt.grid(alpha=1)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))
plt.xticks(rotation=0)

# Label the y-axis in steps of 2 million
y_ticks = range(0, int(cumulative_sales.max()) + 2000000, 2000000)
y_labels = [f'{int(y/1e6)}M' for y in y_ticks]
plt.yticks(ticks=y_ticks, labels=y_labels)
plt.tight_layout()
plt.show()


---

### **Conclusion:** The cumulative sales trend demonstrates a steady upward trajectory, with sharp increases in sales during specific periods, highlighting potential seasonal or promotional impacts.
### **Recommendation:** Identify and capitalize on seasonal and promotional peaks to sustain growth in sales throughout the year.

---
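
A cumulative total cannot decrease as long as the daily amounts are non-negative, so the informative part of the curve is its slope. The sketch below, on invented daily totals, shows `cumsum()` and how `diff()` recovers the daily increments whose size marks the steep periods.

```python
import pandas as pd

daily_sales = pd.Series(
    [100.0, 250.0, 50.0],
    index=pd.to_datetime(["2022-04-01", "2022-04-02", "2022-04-03"]),
)

# Running total over time
cumulative = daily_sales.cumsum()

# Differencing the running total gives back the daily amounts (first value is NaN)
increments = cumulative.diff()

print(cumulative.tolist())  # [100.0, 350.0, 400.0]
print(daily_sales.idxmax().date())  # 2022-04-02
```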

---

### **Question 8:** What are the top 3 states with the highest total revenue, broken down by category?

---

In [None]:

state_total_sales = df.groupby('ship_state')['amount'].sum()
top_3_states = state_total_sales.nlargest(3).index
top_3_state_data = df[df['ship_state'].isin(top_3_states)]
state_category_sales = top_3_state_data.groupby(['ship_state', 'category'])['amount'].sum().unstack()
ax = state_category_sales.plot(kind='bar', figsize=(30, 10), width=0.8, edgecolor='black')

plt.title('\nTop 3 States by Revenue and Category\n', fontsize=20, fontweight='bold')
plt.xlabel('\nShipping State\n', fontsize=18, fontweight='bold')
plt.ylabel('\nTotal Sales (Amount)\n', fontsize=18, fontweight='bold')
plt.xticks(rotation=0)
plt.grid(axis='y', alpha=0)

ax.get_yaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: f'{x/1000:.1f}K'))
for p in ax.patches:
    ax.annotate(f'{p.get_height():,.0f}',
                (p.get_x() + p.get_width() / 2., p.get_height()), 
                xytext=(0, 5),
                textcoords='offset points', 
                ha='center', va='bottom', fontsize=14)
plt.legend(fontsize=18)
plt.tight_layout()
plt.show()


---

### **Conclusion:** T-shirts generate the highest revenue across Karnataka, Maharashtra, and Uttar Pradesh, followed by Shirts, which also contribute significantly to the sales in these states.
### **Recommendation:** Focus on promoting T-shirts and Shirts in Karnataka, Maharashtra, and Uttar Pradesh to further enhance revenue in these high-performing states.

---
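
The two-step logic used for this question (rank states by revenue, then break the leaders down by category) is sketched below on invented data, with the top 2 states instead of 3 to keep it short. Passing `fill_value=0` to `unstack` is an optional choice made here so that a state with no sales in a category shows 0 rather than NaN.

```python
import pandas as pd

toy = pd.DataFrame({
    "ship_state": ["MAHARASHTRA", "MAHARASHTRA", "KARNATAKA", "GOA", "KARNATAKA"],
    "category": ["T-shirt", "Shirt", "T-shirt", "Shirt", "Shirt"],
    "amount": [500.0, 300.0, 400.0, 100.0, 200.0],
})

# Step 1: states ranked by total revenue
top_states = toy.groupby("ship_state")["amount"].sum().nlargest(2).index

# Step 2: revenue per category within those states, one column per category
breakdown = (
    toy[toy["ship_state"].isin(top_states)]
    .groupby(["ship_state", "category"])["amount"]
    .sum()
    .unstack(fill_value=0)
)

print(list(top_states))  # ['MAHARASHTRA', 'KARNATAKA']
print(breakdown.loc["KARNATAKA", "T-shirt"])  # 400.0
```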

---

### **Question 9:** How does the total sales (Amount) compare over time between two product categories (e.g., "Shirt" and "T-shirt")?

---

In [None]:

filtered_data = df[df['category'].isin(['Shirt', 'T-shirt'])]
category_sales_over_time = filtered_data.groupby(['Date', 'category'])['amount'].sum().unstack()
plt.figure(figsize=(27, 8))

plt.plot(category_sales_over_time.index,category_sales_over_time['Shirt'], marker='o', color='green', label='Shirt')
plt.plot(category_sales_over_time.index,category_sales_over_time['T-shirt'], marker='o', color='blue', label='T-shirt')

plt.title('\nTotal Sales Comparison Over Time: Shirt vs T-shirt\n',fontsize=20, fontweight='bold')
plt.xlabel('\nDate\n', fontsize=18, fontweight='bold')
plt.ylabel('\nTotal Sales (Amount)\n', fontsize=18, fontweight='bold')
plt.grid(alpha=0.3)
plt.xticks(rotation=0)
plt.tight_layout()
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))
plt.legend(fontsize=20)
plt.show()


---

### **Conclusion:** T-shirt sales consistently outperform shirt sales, indicating a higher demand and greater market potential for T-shirts.
### **Recommendation:** Focus on enhancing T-shirt offerings and promotions to further capitalize on its strong sales performance.

---

---

### **Question 10:** What is the distribution of total sales across shipping states?

---

In [None]:

state_sales = df.groupby('ship_state')['amount'].sum().reset_index()
state_sales.columns = ['State', 'Total Sales']

state_sales['State'] = state_sales['State'].str.upper()

state_coordinates = {
    'ANDHRA PRADESH': [15.9129, 79.7400], 'ARUNACHAL PRADESH': [28.2180, 94.7278], 'ASSAM': [26.2006, 92.9376],
    'BIHAR': [25.0961, 85.3131], 'CHHATTISGARH': [21.2787, 81.8661], 'DELHI': [28.7041, 77.1025],
    'GOA': [15.2993, 74.1240], 'GUJARAT': [22.2587, 71.1924], 'HARYANA': [29.0588, 76.0856],
    'HIMACHAL PRADESH': [31.1048, 77.1734], 'JHARKHAND': [23.6102, 85.2799], 'KARNATAKA': [15.3173, 75.7139],
    'KERALA': [10.8505, 76.2711], 'MADHYA PRADESH': [22.9734, 78.6569], 'MAHARASHTRA': [19.7515, 75.7139],
    'MANIPUR': [24.6637, 93.9063], 'MEGHALAYA': [25.4670, 91.3662], 'MIZORAM': [23.1645, 92.9376],
    'NAGALAND': [26.1584, 94.5624], 'ODISHA': [20.9517, 85.0985], 'PUNJAB': [31.1471, 75.3412],
    'RAJASTHAN': [27.0238, 74.2179], 'SIKKIM': [27.5330, 88.5122], 'TAMIL NADU': [11.1271, 78.6569],
    'TELANGANA': [18.1124, 79.0193], 'TRIPURA': [23.9408, 91.9882], 'UTTAR PRADESH': [26.8467, 80.9462],
    'UTTARAKHAND': [30.0668, 79.0193], 'WEST BENGAL': [22.9868, 87.8550]
}

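# Use .get() so a state missing from the dictionary yields NaN instead of a KeyError, then drop those rows
state_sales['Latitude'] = state_sales['State'].map(lambda x: state_coordinates.get(x, [np.nan, np.nan])[0])
state_sales['Longitude'] = state_sales['State'].map(lambda x: state_coordinates.get(x, [np.nan, np.nan])[1])
state_sales = state_sales.dropna(subset=['Latitude', 'Longitude'])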

fig = px.scatter_geo(
    state_sales,
    lat='Latitude',
    lon='Longitude',
    text='State',
    size='Total Sales',
    size_max=50,
    color='Total Sales',
    color_continuous_scale='RdYlGn',
    scope='asia',
    title='Total Sales by State (India)'
)

fig.update_layout(
    title=dict(x=0.5, font=dict(size=32, weight='bold')),
    geo=dict(
        scope='asia',
        showcountries=True,
        countrycolor='black',  
        showcoastlines=True,   
        coastlinecolor='black', 
        showland=True,
        landcolor='#A0C185',  
        showocean=True,
        oceancolor='#B0E0E6', 
        projection_type='mercator', 
        fitbounds="locations", 
        resolution=50, 
    ),
    width=1500, 
    height=1000,
    font=dict(
        family="Arial", 
        size=12, 
        color="black",
        weight="bold"
    ),
)
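
State names in raw order data often differ in case or spacing, and some may have no entry in a coordinate dictionary. A minimal sketch, on invented rows and a shortened hypothetical `coords` dictionary, of normalising the names before grouping and listing the states that cannot be placed on the map:

```python
import pandas as pd

toy = pd.DataFrame({
    "ship_state": ["Delhi", "DELHI", "delhi ", "Goa", "Ladakh"],
    "amount": [100.0, 200.0, 50.0, 80.0, 40.0],
})
# Shortened, illustrative coordinate lookup
coords = {"DELHI": (28.7041, 77.1025), "GOA": (15.2993, 74.1240)}
missing = (float("nan"), float("nan"))

# Normalise BEFORE grouping so spelling variants are summed together
toy["State"] = toy["ship_state"].str.strip().str.upper()
state_sales = toy.groupby("State", as_index=False)["amount"].sum()

# Unknown states get NaN coordinates instead of raising a KeyError
state_sales["Latitude"] = state_sales["State"].map(lambda s: coords.get(s, missing)[0])
state_sales["Longitude"] = state_sales["State"].map(lambda s: coords.get(s, missing)[1])

unmatched = state_sales.loc[state_sales["Latitude"].isna(), "State"].tolist()
plottable = state_sales.dropna(subset=["Latitude", "Longitude"])

print(unmatched)  # ['LADAKH']
```

Listing `unmatched` makes it visible which states are left off the map, so their coordinates can be added if needed.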


In [None]:

state_sales = df.groupby('ship_state')['amount'].sum()
sorted_sales = state_sales.sort_values()
colors = ['#FF5733', '#33FF57', '#3357FF', '#F1C40F', '#9B59B6', '#1ABC9C', '#E74C3C', '#3498DB']
colors = colors * (len(sorted_sales) // len(colors) + 1) 

plt.figure(figsize=(30, 20))
bars = plt.barh(sorted_sales.index, sorted_sales.values, color=colors[:len(sorted_sales)], edgecolor='black')
plt.title('\nTotal Sales by Shipping State\n', fontsize=20, fontweight='bold')
plt.xlabel('\nTotal Sales (Amount)\n', fontsize=18, fontweight='bold')
plt.ylabel('\nShipping State\n', fontsize=18, fontweight='bold')
plt.grid(axis='x', alpha=0)

for index, value in enumerate(sorted_sales.values):
    plt.text(value, index, f'{value:,.0f}', va='center', ha='left', fontsize=15, color='black')

# Abbreviate axis tick values as K (thousands) or M (millions)
def format_sales_labels(value):
    if value >= 1_000_000:
        return f'{value/1_000_000:.0f} M'
    elif value >= 1_000:
        return f'{value/1_000:.0f} K'
    else:
        return f'{value:.0f}'

xticks = plt.gca().get_xticks()
plt.xticks(xticks, [format_sales_labels(x) for x in xticks])
plt.tight_layout()
plt.show()


---

#### **Top 11 States**

---

In [None]:

state_sales = df.groupby('ship_state')['amount'].sum().reset_index()
state_sales.columns = ['State', 'Total Sales']

state_sales['State'] = state_sales['State'].str.upper()

top_states = state_sales.nlargest(11, 'Total Sales')

state_coordinates = {
    'ANDHRA PRADESH': [15.9129, 79.7400], 'ARUNACHAL PRADESH': [28.2180, 94.7278], 'ASSAM': [26.2006, 92.9376],
    'BIHAR': [25.0961, 85.3131], 'CHHATTISGARH': [21.2787, 81.8661], 'DELHI': [28.7041, 77.1025],
    'GOA': [15.2993, 74.1240], 'GUJARAT': [22.2587, 71.1924], 'HARYANA': [29.0588, 76.0856],
    'HIMACHAL PRADESH': [31.1048, 77.1734], 'JHARKHAND': [23.6102, 85.2799], 'KARNATAKA': [15.3173, 75.7139],
    'KERALA': [10.8505, 76.2711], 'MADHYA PRADESH': [22.9734, 78.6569], 'MAHARASHTRA': [19.7515, 75.7139],
    'MANIPUR': [24.6637, 93.9063], 'MEGHALAYA': [25.4670, 91.3662], 'MIZORAM': [23.1645, 92.9376],
    'NAGALAND': [26.1584, 94.5624], 'ODISHA': [20.9517, 85.0985], 'PUNJAB': [31.1471, 75.3412],
    'RAJASTHAN': [27.0238, 74.2179], 'SIKKIM': [27.5330, 88.5122], 'TAMIL NADU': [11.1271, 78.6569],
    'TELANGANA': [18.1124, 79.0193], 'TRIPURA': [23.9408, 91.9882], 'UTTAR PRADESH': [26.8467, 80.9462],
    'UTTARAKHAND': [30.0668, 79.0193], 'WEST BENGAL': [22.9868, 87.8550]
}
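# Use .get() so a state missing from the dictionary yields NaN instead of a KeyError, then drop those rows
top_states['Latitude'] = top_states['State'].map(lambda x: state_coordinates.get(x, [np.nan, np.nan])[0])
top_states['Longitude'] = top_states['State'].map(lambda x: state_coordinates.get(x, [np.nan, np.nan])[1])
top_states = top_states.dropna(subset=['Latitude', 'Longitude'])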

fig = px.scatter_geo(
    top_states,
    lat='Latitude',
    lon='Longitude',
    text='State',
    size='Total Sales',
    size_max=50,
    color='Total Sales',
    color_continuous_scale='RdYlGn',
    scope='asia',
    title='Top 11 States by Total Sales (India)'
)

fig.update_layout(
    title=dict(x=0.5, font=dict(size=32, weight='bold')),
    geo=dict(
        scope='asia',
        showcountries=True,
        countrycolor='black', 
        showcoastlines=True, 
        coastlinecolor='black',
        showland=True,
        landcolor='#A0C185',  
        showocean=True,
        oceancolor='#B0E0E6', 
        projection_type='mercator', 
        fitbounds="locations", 
        resolution=50, 
    ),
    width=1500,  
    height=1000,
    font=dict(
        family="Arial", 
        size=12, 
        color="black", 
        weight="bold" 
    ),
)

fig.show()


In [None]:

state_sales = df.groupby('ship_state')['amount'].sum()
top_11_sales = state_sales.sort_values(ascending=False).head(11)
colors = ['#FF5733', '#33FF57', '#3357FF', '#F1C40F', '#9B59B6', '#1ABC9C', '#E74C3C', '#3498DB', '#E67E22', '#2ECC71']
colors = colors * (len(top_11_sales) // len(colors) + 1) 

plt.figure(figsize=(20, 8))
bars = plt.bar(top_11_sales.index, top_11_sales.values, color=colors[:len(top_11_sales)], edgecolor='black')
plt.title('\nTop 11 Total Sales by Shipping State\n',fontsize=20, fontweight='bold')
plt.xlabel('\nShipping State\n',fontsize=18, fontweight='bold')
plt.ylabel('\nTotal Sales (Amount)\n', fontsize=18, fontweight='bold')
plt.xticks(rotation=0)
plt.grid(axis='y', alpha=0)

for index, value in enumerate(top_11_sales.values):
    plt.text(index, value, f'{value:,.0f}', ha='center', va='bottom', fontsize=18, color='black')

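# Abbreviate axis tick values as K (thousands) or M (millions)
def format_sales_labels(value):
    if value >= 1_000_000:
        return f'{value/1_000_000:.1f}M'
    elif value >= 1_000:
        return f'{value/1_000:.0f}K'
    else:
        # Values below 1,000, such as the 0 tick, are shown unabbreviated
        return f'{value:.0f}'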

plt.ylim(0, top_11_sales.max() * 1.1)
plt.yticks([y for y in plt.gca().get_yticks()], [format_sales_labels(y) for y in plt.gca().get_yticks()])
plt.tight_layout()
plt.show()


---

### **Conclusion:** The chart highlights that Maharashtra leads in total sales among all states, followed by Karnataka and Uttar Pradesh, while Haryana records the lowest sales among the top 11 states.
### **Recommendation:** Focus marketing and sales strategies in Haryana to improve its performance and explore opportunities to sustain growth in leading states like Maharashtra.

---

---

# **Conclusion**
### - The analysis shows key trends such as top-selling products, high-revenue categories, and peak sales periods.  
### - Insights into customer preferences and operational performance were identified.  
### - The report highlights Amazon's strengths and areas for improvement.  

---

---

# **Future Improvements**
### - Collect More Data: Add customer feedback and competitor pricing to enhance the analysis.
### - Use Predictive Tools: Apply forecasting techniques to predict sales trends and demands.
### - Improve Visuals: Create clearer and more interactive charts for better understanding.
### - Include Real-Time Data: Use live data for quicker and more accurate decisions.
### - Optimize Processes: Focus on improving inventory management and personalized marketing.
#### These steps can help make the analysis more detailed and actionable.

---