# Sales Data Analysis

   * Load a sales CSV with the columns `Date`, `Product`, `Price`, `Quantity`, and `Region`.
   * Find the top-selling products, ranked by total revenue.
   * Compute monthly sales trends using `resample()`.
   * Measure each region's contribution to total revenue.

In [6]:
import pandas as pd

# Load dataset
df = pd.read_csv("sales_data_million.csv")

# Ensure Date is in datetime format (unparseable values become NaT)
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

# Add Revenue column (unit price times units sold, per row)
df["Revenue"] = df["Price"] * df["Quantity"]

# 1. Top-selling products (by revenue)
top_products = (
    df.groupby("Product")["Revenue"]
    .sum()
    .reset_index()
    .sort_values(by="Revenue", ascending=False)
)
print("Top Selling Products:\n", top_products, "\n")

# 2. Monthly sales trends using resample
# "ME" is the month-end alias (pandas 2.2+); older versions use "M"
monthly_sales = (
    df.resample("ME", on="Date")["Revenue"]
    .sum()
    .reset_index()
)
print("Monthly Sales Trends:\n", monthly_sales.head(12), "\n")  # first 12 months

# 3. Contribution of each region in revenue
region_contribution = (
    df.groupby("Region")["Revenue"]
    .sum()
    .reset_index()
    .sort_values(by="Revenue", ascending=False)
)
print("Region Contribution:\n", region_contribution)


Top Selling Products:
       Product     Revenue
0      Camera  1053229822
4      Tablet  1052896466
2      Laptop  1048468273
3      Mobile  1047053655
1  Headphones  1045365945 

Monthly Sales Trends:
          Date    Revenue
0  2020-01-31  109948976
1  2020-02-29  103636986
2  2020-03-31  112142970
3  2020-04-30  107321575
4  2020-05-31  111057897
5  2020-06-30  108015482
6  2020-07-31  111464557
7  2020-08-31  111742610
8  2020-09-30  107997650
9  2020-10-31  111550564
10 2020-11-30  109078647
11 2020-12-31  112404868 

Region Contribution:
   Region     Revenue
2  South  1316354178
1  North  1312608223
0   East  1312293468
3   West  1305758292

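The region table above reports absolute revenue, which makes the relative contribution hard to read at a glance. A minimal sketch of turning such totals into percentage shares follows. The helper name `add_revenue_share`, the `SharePct` column, and the small `region_totals` table are assumptions made for illustration; the figures are made up and are not the real dataset.

```python
import pandas as pd

# Hypothetical per-region totals standing in for the region_contribution
# table computed earlier (illustrative values, not the real data).
region_totals = pd.DataFrame(
    {
        "Region": ["South", "North", "East", "West"],
        "Revenue": [400, 300, 200, 100],
    }
)


def add_revenue_share(totals: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a SharePct column: each row's percent of total revenue."""
    out = totals.copy()
    out["SharePct"] = (out["Revenue"] / out["Revenue"].sum() * 100).round(2)
    return out


shares = add_revenue_share(region_totals)
print(shares)
```

With the real data, the same helper could be applied to `region_contribution`. Judging by the totals printed above, each of the four regions accounts for roughly a quarter of revenue.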