# Data Science Exercise: E-Commerce Sales Analysis

## 📊 Task
Create a dataset with 20 orders and analyze the sales data.

**Dataset should have:**
- Order ID (1-20)
- Product Category (Electronics, Clothing, Books, Food)
- Quantity (1-10 items)
- Price per item ($10-$500)
- Customer Age (18-65 years)

**Questions to answer:**
1. Calculate total revenue per order (quantity × price)
2. Find the average order value
3. Which category has the highest total sales?
4. What's the average age of customers buying Electronics?

## ✅ Solution

In [None]:
import pandas as pd
import numpy as np

np.random.seed(42)

# Create the dataset
order_ids = np.arange(1, 21)
categories = np.random.choice(['Electronics', 'Clothing', 'Books', 'Food'], size=20)
quantities = np.random.randint(1, 11, size=20)
prices = np.random.randint(10, 501, size=20)
ages = np.random.randint(18, 66, size=20)

# Create DataFrame
df = pd.DataFrame({
    'Order_ID': order_ids,
    'Category': categories,
    'Quantity': quantities,
    'Price': prices,
    'Customer_Age': ages
})

print("Dataset:")
print(df)
print("\n" + "="*50)
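
`np.random.randint` excludes its upper bound, which is why the cell above passes 11, 501, and 66 to cover the inclusive ranges 1-10, 10-500, and 18-65. A minimal sketch of a sanity check for those ranges follows. The helper name `check_ranges` and the variable `orders` are hypothetical and not part of the original solution; the block rebuilds the same columns so it can run on its own.

```python
import numpy as np
import pandas as pd


def check_ranges(df: pd.DataFrame) -> bool:
    """Return True if every column lies within the ranges given in the task.

    Hypothetical helper, not part of the original solution.
    """
    allowed = {'Electronics', 'Clothing', 'Books', 'Food'}
    return bool(
        df['Order_ID'].tolist() == list(range(1, 21))
        and set(df['Category']) <= allowed
        and df['Quantity'].between(1, 10).all()       # between() is inclusive
        and df['Price'].between(10, 500).all()
        and df['Customer_Age'].between(18, 65).all()
    )


# Rebuild the dataset the same way as the solution cell
# (randint upper bounds are exclusive).
np.random.seed(42)
orders = pd.DataFrame({
    'Order_ID': np.arange(1, 21),
    'Category': np.random.choice(['Electronics', 'Clothing', 'Books', 'Food'], size=20),
    'Quantity': np.random.randint(1, 11, size=20),
    'Price': np.random.randint(10, 501, size=20),
    'Customer_Age': np.random.randint(18, 66, size=20),
})

print(check_ranges(orders))
```

A check like this is mainly useful when the generation code changes, for example if someone swaps in different bounds and forgets that the upper one is exclusive.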

In [None]:
# Question 1: Calculate total revenue per order
df['Total_Revenue'] = df['Quantity'] * df['Price']
print("1. Total Revenue per Order:")
print(df[['Order_ID', 'Total_Revenue']])
print("\n" + "="*50)

In [None]:
# Question 2: Average order value
avg_order_value = df['Total_Revenue'].mean()
print(f"2. Average Order Value: ${avg_order_value:.2f}")
print("\n" + "="*50)

In [None]:
# Question 3: Category with highest total sales
category_sales = df.groupby('Category')['Total_Revenue'].sum().sort_values(ascending=False)
print("3. Total Sales by Category:")
print(category_sales)
print(f"\nHighest selling category: {category_sales.index[0]} with ${category_sales.iloc[0]:.2f}")
print("\n" + "="*50)
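
If only the top category is needed, `Series.idxmax` returns its label without sorting the whole result. A small sketch on hand-made figures (the rows in `toy` are illustrative and not taken from the generated dataset):

```python
import pandas as pd

# Illustrative orders, not the generated dataset
toy = pd.DataFrame({
    'Category': ['Books', 'Food', 'Books', 'Electronics'],
    'Total_Revenue': [100, 250, 200, 280],
})

sales = toy.groupby('Category')['Total_Revenue'].sum()
top_category = sales.idxmax()  # label of the largest total
top_value = sales.max()        # the largest total itself

print(f"Highest selling category: {top_category} with ${top_value:.2f}")
```

When two categories tie, `idxmax` reports the first one it encounters, so the sorted view is still the better choice if ties matter.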

In [None]:
# Question 4: Average age of Electronics buyers
electronics_buyers = df[df['Category'] == 'Electronics']
avg_age_electronics = electronics_buyers['Customer_Age'].mean()
print("4. Electronics Buyers:")
print(electronics_buyers[['Order_ID', 'Category', 'Customer_Age']])
print(f"\nAverage age of Electronics buyers: {avg_age_electronics:.1f} years")

## 📈 Summary Statistics

In [None]:
# Summary of the analysis
print("="*50)
print("SUMMARY")
print("="*50)
print(f"Total Orders: {len(df)}")
print(f"Total Revenue: ${df['Total_Revenue'].sum():.2f}")
print(f"Average Order Value: ${df['Total_Revenue'].mean():.2f}")
print("\nSales by Category:")
print(df.groupby('Category')['Total_Revenue'].sum())
print(f"\nAverage Customer Age: {df['Customer_Age'].mean():.1f} years")
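
The per-category figures can also be collected in a single pass with named aggregation. This is a sketch on hand-made rows: the values in `toy` are illustrative, and the output column names `total_sales`, `orders`, and `avg_age` are assumptions chosen here, not names from the original notebook.

```python
import pandas as pd

# Illustrative orders, not the generated dataset
toy = pd.DataFrame({
    'Category': ['Books', 'Food', 'Books', 'Electronics'],
    'Total_Revenue': [100, 250, 200, 280],
    'Customer_Age': [30, 40, 50, 25],
})

# One row per category: revenue total, order count, and mean customer age
summary = toy.groupby('Category').agg(
    total_sales=('Total_Revenue', 'sum'),
    orders=('Total_Revenue', 'count'),
    avg_age=('Customer_Age', 'mean'),
)

print(summary)
```

Keeping the metrics in one DataFrame makes it easier to sort or filter by any of them afterwards, rather than printing each statistic separately.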