# 💰 Finance Data Analysis (Dummy Dataset)
This notebook demonstrates financial data analysis using a **synthetic dataset** created for learning purposes.

The dataset simulates **monthly transactions** across customers, products, and regions.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style='whitegrid')
pd.set_option('display.max_columns', 50)


In [None]:
# Generate dummy finance dataset
np.random.seed(42)

n = 1000
# 36 month-start dates (2020-01 to 2022-12); "MS" avoids the "M" alias deprecated in pandas 2.2+
dates = pd.date_range(start="2020-01-01", periods=36, freq="MS")
data = {
    "Date": np.random.choice(dates, n),
    "CustomerID": np.random.randint(1000, 2000, n),
    "Region": np.random.choice(["North", "South", "East", "West"], n),
    "ProductCategory": np.random.choice(["Equity", "Bonds", "Insurance", "Mutual Funds", "Loans"], n),
    "Revenue": np.random.randint(5000, 50000, n),
    "Expenses": np.random.randint(2000, 30000, n)
}
df = pd.DataFrame(data)
df["Profit"] = df["Revenue"] - df["Expenses"]
df.head()
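
The same generation step can also be written with NumPy's `Generator` API, which keeps the random state local instead of seeding the global one. This is only a sketch under assumptions: the function name `make_transactions` and its parameters are hypothetical, and because it uses a different random stream it will not reproduce the exact rows of `df` above.

```python
import numpy as np
import pandas as pd

def make_transactions(n: int = 1000, seed: int = 42) -> pd.DataFrame:
    """Build a synthetic transactions frame (hypothetical helper, same columns as df)."""
    rng = np.random.default_rng(seed)  # local generator, no global seed
    dates = pd.date_range(start="2020-01-01", periods=36, freq="MS").to_numpy()
    frame = pd.DataFrame({
        "Date": rng.choice(dates, n),
        "CustomerID": rng.integers(1000, 2000, size=n),
        "Region": rng.choice(["North", "South", "East", "West"], n),
        "ProductCategory": rng.choice(
            ["Equity", "Bonds", "Insurance", "Mutual Funds", "Loans"], n
        ),
        "Revenue": rng.integers(5000, 50000, size=n),   # upper bound is exclusive
        "Expenses": rng.integers(2000, 30000, size=n),  # upper bound is exclusive
    })
    frame["Profit"] = frame["Revenue"] - frame["Expenses"]
    return frame

df_alt = make_transactions(n=200, seed=0)
print(df_alt.shape)
```

Calling the function twice with the same `seed` returns identical frames, which makes the dataset easy to regenerate in tests.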

In [None]:
# Basic info
print(df.shape)
df.info()  # info() prints its report directly; wrapping it in print() adds a stray "None"
df.describe()
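
The summary statistics can be complemented by a few explicit sanity checks. The sketch below assumes a frame with `Revenue`, `Expenses`, and `Profit` columns like `df`; the helper `quality_report` and the tiny `sample` frame are hypothetical and exist only to illustrate the idea.

```python
import pandas as pd

def quality_report(frame: pd.DataFrame) -> dict:
    """Count missing values, duplicate rows, and rows where Profit is inconsistent."""
    expected_profit = frame["Revenue"] - frame["Expenses"]
    return {
        "missing_values": int(frame.isna().sum().sum()),
        "duplicate_rows": int(frame.duplicated().sum()),
        "profit_mismatches": int((frame["Profit"] != expected_profit).sum()),
    }

# Hypothetical sample: the last two rows are identical on purpose
sample = pd.DataFrame({
    "Revenue": [10000, 20000, 20000],
    "Expenses": [4000, 5000, 5000],
})
sample["Profit"] = sample["Revenue"] - sample["Expenses"]

report = quality_report(sample)
print(report)
```

On the notebook's data, `quality_report(df)` would be the corresponding call.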

In [None]:
# Revenue over time
monthly = df.groupby(df["Date"].dt.to_period("M")).agg({"Revenue":"sum","Profit":"sum"}).reset_index()
monthly["Date"] = monthly["Date"].dt.to_timestamp()

plt.figure(figsize=(10,5))
sns.lineplot(data=monthly, x="Date", y="Revenue", marker="o", label="Revenue")
sns.lineplot(data=monthly, x="Date", y="Profit", marker="o", label="Profit")
plt.title("Monthly Revenue & Profit Trend")
plt.xlabel("Date")
plt.ylabel("Amount")
plt.xticks(rotation=45)
plt.legend()
plt.show()
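
Month-to-month totals can be noisy, so a rolling mean is one common way to make any underlying trend easier to see. The sketch below uses a small hypothetical `demo` frame in place of `monthly`; the 3-month window and the column name `Revenue_3m_avg` are illustrative choices, not part of the original analysis.

```python
import pandas as pd

# Hypothetical monthly totals standing in for the `monthly` frame
demo = pd.DataFrame({
    "Date": pd.date_range("2020-01-01", periods=6, freq="MS"),
    "Revenue": [100, 130, 90, 120, 150, 110],
})

# Trailing 3-month average; the first two months have no full window and stay NaN
demo["Revenue_3m_avg"] = demo["Revenue"].rolling(window=3).mean()
print(demo)
```

The smoothed column can be passed to `sns.lineplot` alongside the raw series to compare the two.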

In [None]:
# Revenue by Region
plt.figure(figsize=(6,4))
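# errorbar=None replaces the deprecated ci=None (seaborn >= 0.12);
# hue + legend=False keeps the palette without the seaborn >= 0.13 deprecation warning
sns.barplot(data=df, x="Region", y="Revenue", hue="Region", estimator="sum",
            errorbar=None, palette="Set2", legend=False)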
plt.title("Total Revenue by Region")
plt.show()
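
To compare regions and products at the same time, the totals can be arranged as a region-by-category table. This is a minimal sketch on a hypothetical `demo` frame that reuses the notebook's column names; `revenue_matrix` is an illustrative name.

```python
import pandas as pd

# Hypothetical transactions with the same column names as df
demo = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "South"],
    "ProductCategory": ["Equity", "Bonds", "Equity", "Equity", "Loans"],
    "Revenue": [100, 200, 300, 400, 500],
})

# Sum revenue for each (Region, ProductCategory) pair; missing pairs become 0
revenue_matrix = demo.pivot_table(
    index="Region",
    columns="ProductCategory",
    values="Revenue",
    aggfunc="sum",
    fill_value=0,
)
print(revenue_matrix)
```

A table like this can be passed directly to `sns.heatmap` for a visual comparison.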

In [None]:
# Product category contribution
plt.figure(figsize=(6,6))
df.groupby("ProductCategory")["Revenue"].sum().plot(kind="pie", autopct="%1.1f%%")
plt.title("Revenue Share by Product Category")
plt.ylabel("")
plt.show()
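
The percentages shown on the pie chart can also be computed as a table, which is easier to read when slices are close in size. The sketch below uses a hypothetical `demo` frame; `totals` and `shares` are illustrative names.

```python
import pandas as pd

# Hypothetical transactions with the same column names as df
demo = pd.DataFrame({
    "ProductCategory": ["Equity", "Bonds", "Equity", "Loans"],
    "Revenue": [3000, 1000, 1000, 5000],
})

# Total revenue per category, then each category's share of the grand total in percent
totals = demo.groupby("ProductCategory")["Revenue"].sum()
shares = (totals / totals.sum() * 100).round(1).sort_values(ascending=False)
print(shares)
```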

In [None]:
# Correlation heatmap
plt.figure(figsize=(5,4))
sns.heatmap(df[["Revenue","Expenses","Profit"]].corr(), annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation between Revenue, Expenses, and Profit")
plt.show()

## 🔍 Observations
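- **Revenue and Profit** fluctuate around a roughly flat level from month to month. Dates and amounts are drawn independently at random, so no genuine growth trend is expected.  
- **Regional totals** differ only by sampling noise: each region is equally likely and revenue is generated independently of region, so any leading region is an artifact of the random seed.  
- **Product categories** behave the same way: each of the five categories is expected to hold roughly 20% of revenue, and deviations from that reflect chance rather than real demand.  
- **Profit is strongly positively correlated with Revenue** (about 0.85 in expectation) and negatively correlated with Expenses (about -0.53). Both follow from Profit = Revenue - Expenses with independently generated inputs; Revenue and Expenses themselves are essentially uncorrelated.  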
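
The correlations involving Profit can be worked out from the generation step alone, without the sampled data. The sketch below assumes Revenue and Expenses are independent uniform integers on the ranges used by `np.random.randint` in the notebook; the helper `discrete_uniform_var` is a hypothetical name.

```python
import math

def discrete_uniform_var(low: int, high: int) -> float:
    """Variance of a uniform integer on [low, high), i.e. high is exclusive."""
    n = high - low
    return (n * n - 1) / 12

var_rev = discrete_uniform_var(5000, 50000)   # Revenue range from the notebook
var_exp = discrete_uniform_var(2000, 30000)   # Expenses range from the notebook

# Profit = Revenue - Expenses with independent inputs, so variances add
sd_profit = math.sqrt(var_rev + var_exp)

# Cov(Profit, Revenue) = Var(Revenue) and Cov(Profit, Expenses) = -Var(Expenses)
corr_profit_revenue = math.sqrt(var_rev) / sd_profit
corr_profit_expenses = -math.sqrt(var_exp) / sd_profit

print(round(corr_profit_revenue, 2), round(corr_profit_expenses, 2))  # 0.85 -0.53
```

With 1,000 sampled rows, the values in the heatmap should land close to these expected figures but will not match them exactly.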


## ✅ Outcome
This project demonstrates how to create and analyze a dummy **finance dataset** using Python.  
We practiced:  
- Data generation with NumPy & Pandas  
- Descriptive statistics and basic data inspection  
- Time series trend analysis  
- Comparative analysis across regions and products  
- Correlation analysis  

This notebook can serve as a template for **financial data analysis projects** in portfolios.  
