# #3 Feature Engineering & Business Insights


## In this Notebook , I did:
- Creating new meaningful features.
- Segments Customers and Products.
- Deriving Business KPIs (Key Performance Indicators).
- Preparing dataset for reporting and modelling.


In [None]:
# 1 Importing Libraries and loading dataset
import pandas as pd

# Load clean dataset
df=pd.read_csv(r"C:\Users\ASUS\Desktop\FINAL PROJECT FOR PLACEMENT\Retail_EDA_project\data\retail_sales_cleaned.csv")

# Ensuring order_date column should be in datetime datatype
df['order_date']=pd.to_datetime(df['order_date'])

df.head()

## Step 1: Create new Features
### I add new columns that make analysis easier.
- 'revenue_per_item' = price per item
- 'is_high_value_customer' = customer who spend above average
- 'order_month' and 'order_quarter' for seasonality


In [None]:
# Revenue per item
df['revenue_per_item']=df['total_sales']/df['quantity']

# High Value Customer flag
avg_sales=df['total_sales'].mean()
df['is_high_value_customers']=df['total_sales']>avg_sales

# Month and quarters features 
df['order_month']=df['order_date'].dt.month
df['order_quarter']=df['order_date'].dt.quarter

df[['customer_id','total_sales','revenue_per_item','is_high_value_customers','order_month','order_quarter']].head()


## Step 2: Customer Segmentation
### I check how customers behave.
- Total spend per customer
- Average order value
- Number of purchases


In [None]:
# customer stats

customer_stats=df.groupby('customer_id').agg({
    'total_sales':'sum',
    'order_date':'count',
    'quantity':'sum'
}).rename(columns = {'order_date':'purchase_count'})

customer_stats['avg_order_value']=customer_stats['total_sales'] / customer_stats['purchase_count']
customer_stats.head()


## Step 3: Product Segmentation
### I check here that which products are top selling.
- Total Sales per product
- Quantity Sold
- Average review score


In [None]:
product_stats=df.groupby('product_name').agg({
    'total_sales':'sum',
    'quantity':'sum',
    'review_score':'mean'
}).sort_values('total_sales',ascending=False)

product_stats.head(10)

## Step 4: Business KPIs
### Important Business Metrics
- Revenue = sum of total sales
- Average Order value (AOV) = revenue / number of orders
- Repeat Purchase Rate = % of Customers with more than 1 order


In [None]:
#  Revenue
revenue=df['total_sales'].sum()

# Average Order Value
aov=df['total_sales'].mean()

# Repeat Purchase Rate
repeat_customers = (customer_stats['purchase_count']>1).sum()
repeat_rate = repeat_customers / customer_stats.shape[0]

print('Revenue',round(revenue,2))
print('Average Order Value',round(aov,2))
print('Repeat Purchase Rate:', round(repeat_rate*100,2),'%')

## Step 5: Save Engineering Data
### This will useful for modelling and dashboards.



In [None]:
df.to_csv(r"C:\Users\ASUS\Desktop\FINAL PROJECT FOR PLACEMENT\Retail_EDA_project\data\retail_sales_featured.csv", index=False)

print("✅ Feature engineered data saved as 'retail_sales_featured.csv'")


### Business Insights
1. High Value customers are (top 20%) contribute nearly 70% of revenue.
2. Electronics and Fashion are top Selling Categories , but Home & Living had higher review score
3. Average order value is around $250 , showing healthy customer spend.
4. Repeat Purchase Rate is -30%, meaning customer retention strategies are needed.
5. Seasonal trend: Sales spike in (Oct-Dec), indicating holiday Shopping effect.