# 🛒 E-Commerce Sales Analysis

This notebook contains a complete exploratory data analysis (EDA) of an e-commerce dataset (20,000+ rows).
It is structured in a **portfolio-ready** format with:
- Data loading & cleaning  
- Exploratory data analysis  
- Visualizations (auto-saved as PNG)  
- Business insights  

You can run this notebook in **Google Colab** or **Jupyter Notebook**.


## 1. Import Libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import os

# Show all columns when printing dataframes
pd.set_option("display.max_columns", None)

# Create folder for plots
os.makedirs("plots", exist_ok=True)

def save_plot(name):
    """Save current Matplotlib figure into plots/ folder."""
    plt.savefig(f"plots/{name}.png", dpi=300, bbox_inches="tight")
    plt.show()


## 2. Load Dataset

> Replace the file path with your own, if needed.


In [None]:
# If running in Colab, you can upload the CSV manually:
# from google.colab import files
# uploaded = files.upload()

# Load dataset
df = pd.read_csv("ecommerce_dataset.csv")

print("Shape of dataset:", df.shape)
df.head()

## 3. Data Cleaning

In [None]:
# Convert order_date to datetime
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")

# Drop rows with invalid dates
df = df.dropna(subset=["order_date"])

# Drop duplicates
df = df.drop_duplicates()

# Drop rows that are missing any key column (optional: comment out to keep them)
df = df.dropna(subset=["total_sales", "customer_state", "payment_method", "product"])

print("Shape after cleaning:", df.shape)
df.head()
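
Before dropping rows, it can help to see how much data each column is missing, so the effect of the cleaning step is visible. The sketch below is one possible way to do that. The helper `missing_summary` and the tiny `sample` frame are hypothetical names with made-up values, not part of the original notebook.

```python
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value count and percentage per column, worst first."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "pct_missing": (counts / len(frame) * 100).round(2),
    })
    return summary.sort_values("missing", ascending=False)

# Made-up sample data, for illustration only
sample = pd.DataFrame({
    "order_date": ["2023-01-05", "not a date", "2023-02-10", None],
    "total_sales": [120.0, 80.0, None, 45.5],
    "customer_state": ["CA", "NY", "TX", "CA"],
})

# Unparseable dates become NaT, so they show up as missing
sample["order_date"] = pd.to_datetime(sample["order_date"], errors="coerce")

report = missing_summary(sample)
print(report)
```

On the real data, `missing_summary(df)` could be run right after the date conversion and before the `dropna` calls.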

## 4. Basic KPIs (Total Revenue, Orders, Customers)

In [None]:
total_revenue = df["total_sales"].sum()
total_orders = len(df)  # assumes each row represents one order
unique_customers = df["customer_id"].nunique() if "customer_id" in df.columns else None

print(f"Total Revenue       : {total_revenue:,.2f}")
print(f"Total Orders        : {total_orders}")
if unique_customers is not None:
    print(f"Unique Customers    : {unique_customers}")
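
Two derived KPIs that often accompany the totals above are average order value and revenue per customer. The following is a minimal sketch, assuming each row is one order. `basic_kpis` and the `sample` frame are hypothetical and use made-up values.

```python
import pandas as pd

def basic_kpis(frame: pd.DataFrame) -> dict:
    """Compute headline KPIs, assuming one row per order."""
    total_revenue = float(frame["total_sales"].sum())
    total_orders = len(frame)
    kpis = {
        "total_revenue": total_revenue,
        "total_orders": total_orders,
        # Guard against an empty frame to avoid dividing by zero
        "avg_order_value": total_revenue / total_orders if total_orders else 0.0,
    }
    # customer_id is optional, mirroring the check in the cell above
    if "customer_id" in frame.columns:
        customers = int(frame["customer_id"].nunique())
        kpis["unique_customers"] = customers
        kpis["revenue_per_customer"] = total_revenue / customers if customers else 0.0
    return kpis

# Made-up sample data, for illustration only
sample = pd.DataFrame({
    "total_sales": [100.0, 50.0, 150.0, 100.0],
    "customer_id": ["c1", "c2", "c1", "c3"],
})

kpis = basic_kpis(sample)
for name, value in kpis.items():
    print(f"{name:22}: {value}")
```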

## 5. State-wise Revenue

In [None]:
state_rev = df.groupby("customer_state")["total_sales"].sum().sort_values(ascending=False)

plt.figure(figsize=(12, 6))
plt.bar(state_rev.index, state_rev.values)
plt.title("State-wise Revenue")
plt.xlabel("State")
plt.ylabel("Revenue")
plt.xticks(rotation=45)

save_plot("state_revenue")

## 6. Monthly Revenue Trend

In [None]:
df["month"] = df["order_date"].dt.to_period("M").astype(str)
monthly = df.groupby("month")["total_sales"].sum().reset_index()

plt.figure(figsize=(10, 5))
plt.plot(monthly["month"], monthly["total_sales"], marker="o")
plt.xticks(rotation=45)
plt.title("Monthly Revenue Trend")
plt.xlabel("Month")
plt.ylabel("Revenue")

save_plot("monthly_revenue")
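
To put numbers on the trend, month-over-month growth can be computed from the same monthly totals. This is a sketch under the assumption that `order_date` is already a datetime column. `monthly_growth` and the `sample` frame are hypothetical and use made-up values.

```python
import pandas as pd

def monthly_growth(frame: pd.DataFrame) -> pd.DataFrame:
    """Monthly revenue with percentage change versus the previous row."""
    monthly = (
        frame.assign(month=frame["order_date"].dt.to_period("M").astype(str))
        .groupby("month")["total_sales"]
        .sum()
        .reset_index()
    )
    # The first month has no predecessor, so its growth is NaN
    monthly["mom_growth_pct"] = monthly["total_sales"].pct_change() * 100
    return monthly

# Made-up sample data, for illustration only
sample = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2023-01-10", "2023-01-20", "2023-02-05", "2023-03-15"]
    ),
    "total_sales": [60.0, 40.0, 150.0, 120.0],
})

growth = monthly_growth(sample)
print(growth)
```

Note that `pct_change` compares consecutive rows, so if a calendar month has no orders at all, the growth figure spans the gap rather than a single month.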

## 7. Weekday-wise Order Count

In [None]:
df["weekday"] = df["order_date"].dt.day_name()
weekday = df["weekday"].value_counts().reindex(
    ["Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday"]
)

plt.figure(figsize=(8, 5))
weekday.plot(kind="bar")
plt.title("Orders per Weekday")
plt.xlabel("Weekday")
plt.ylabel("Order Count")

save_plot("orders_weekday")

## 8. Payment Method Share

In [None]:
pay_count = df["payment_method"].value_counts()

plt.figure(figsize=(6, 6))
pay_count.plot(kind="pie", autopct="%1.1f%%", ylabel="")
plt.title("Payment Method Share")

save_plot("payment_share")
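
The same shares shown in the pie chart can also be produced as a small table, which is easier to quote in a write-up. A minimal sketch follows. `payment_share` and the `sample` frame are hypothetical, and the payment method names are made up.

```python
import pandas as pd

def payment_share(frame: pd.DataFrame) -> pd.Series:
    """Percentage of orders per payment method, rounded to one decimal."""
    return (
        frame["payment_method"]
        .value_counts(normalize=True)  # fractions that sum to 1
        .mul(100)
        .round(1)
    )

# Made-up sample data, for illustration only
sample = pd.DataFrame({"payment_method": ["Card", "Card", "COD", "UPI"]})

share = payment_share(sample)
print(share)
```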

## 9. Top 10 Best-Selling Products (by Revenue)

In [None]:
top_prod = df.groupby("product")["total_sales"].sum().sort_values(ascending=False).head(10)

plt.figure(figsize=(10, 6))
top_prod.plot(kind="barh")
plt.title("Top 10 Products by Revenue")
plt.xlabel("Revenue")

save_plot("top_products")

## 10. Business Insights

Fill in this section in your own words, based on the charts above. Example:

- **High Revenue States**: Top states like `X`, `Y`, `Z` contribute the majority of revenue → focus ads & inventory there.
- **Monthly Trend**: Revenue peaks in certain months → plan campaigns & stock accordingly.
- **Weekday Pattern**: Orders are higher on Thu–Fri → run targeted offers on these days.
- **Payment Behavior**: If the COD share is high → work on building trust in online payments.
- **Top Products**: Push these in recommendations, bundles, and ads.

You can adjust insights based on your actual charts/output.
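
To back an insight such as "top states contribute the majority of revenue" with a figure, the share of revenue held by the top N groups can be computed directly. This is one possible sketch. `top_n_share` and the `sample` frame are hypothetical, and the state codes and amounts are made up.

```python
import pandas as pd

def top_n_share(frame: pd.DataFrame, group_col: str,
                value_col: str = "total_sales", n: int = 3):
    """Return the top-n group labels and their combined share of the total (%)."""
    totals = frame.groupby(group_col)[value_col].sum().sort_values(ascending=False)
    top = totals.head(n)
    share_pct = float(top.sum() / totals.sum() * 100)
    return top.index.tolist(), share_pct

# Made-up sample data, for illustration only
sample = pd.DataFrame({
    "customer_state": ["CA", "CA", "NY", "TX", "FL"],
    "total_sales": [300.0, 200.0, 300.0, 100.0, 100.0],
})

states, share_pct = top_n_share(sample, "customer_state", n=2)
print(f"Top states {states} account for {share_pct:.1f}% of revenue")
```

The same helper could be pointed at the `product` column to quantify the top-products insight.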
