-
Notifications
You must be signed in to change notification settings - Fork 47
Open
Labels
Question & AssignmentFurther information is requestedFurther information is requested
Description
This case study helps you build strong skills in data cleaning, data manipulation, aggregation, feature engineering, and exploratory data analysis (EDA) using Pandas. You will analyze an online retail sales dataset to understand customer purchasing behavior, sales performance, and revenue trends.
Problem Statement
You are working as a data analyst for an e-commerce company. Management wants insights into sales trends, customer behavior, product performance, and potential revenue issues to improve business decisions.
You have been provided with a dataset named online_retail_sales.csv, which contains detailed records of customer orders.
Dataset Overview (online_retail_sales.csv)
Columns Description
| Column Name | Data Type | Description |
|---|---|---|
| Order_ID | int | Unique order identifier |
| Customer_ID | int | Unique customer identifier |
| Order_Date | str | Date of order (YYYY-MM-DD) |
| Product | str | Name of the product |
| Category | str | Product category (Electronics, Clothing, Home, etc.) |
| Quantity | int | Number of units purchased |
| Unit_Price | float | Price per unit (USD) |
| Total_Amount | float | Total order amount |
| Payment_Method | str | Credit Card, UPI, PayPal, Cash, etc. |
| Order_Status | str | Completed, Cancelled, Returned |
| City | str | Customer city |
Assignment Tasks
1. Data Exploration and Cleaning
- Load the dataset into a Pandas DataFrame and display the first 5 rows.
- Check the dataset shape, column names, data types, and summary statistics.
- Identify missing values in each column.
- Handle missing values appropriately:
- Numerical columns: fill with mean or median.
- Categorical columns: fill with mode or drop.
- Convert
Order_Dateinto datetime format. - Extract Year, Month, and Day from
Order_Dateinto new columns.
2. Data Selection and Filtering
- Retrieve all orders placed in March 2024.
- Find all orders where:
Total_Amount > 1000Categoryis Electronics.
- Select only Completed orders.
- Retrieve orders placed from Mumbai or Delhi.
3. Data Manipulation and Feature Engineering
- Create a new column
Discounted_Amount:- Apply a 10 percent discount on orders where
Total_Amount > 800.
- Apply a 10 percent discount on orders where
- Create a column
Price_Levelbased onTotal_Amount:- Low: Below 200
- Medium: 200 to 800
- High: Above 800
- Create a new column
Revenue_Per_Item:Revenue_Per_Item = Total_Amount / Quantity
- Drop the
Productcolumn if more than 40 percent of values are missing.
4. Aggregation and Business Insights
- Find the total revenue per Category.
- Calculate the average order value per Payment_Method.
- Identify the top 5 cities by total sales.
- Find the number of cancelled and returned orders per Category.
- Determine the monthly sales trend.
5. Customer Behavior Analysis
- Identify customers who placed more than 5 orders in a single month.
- Find customers whose average order value exceeds 1500.
- Identify customers who placed orders from multiple cities.
- Find repeat customers with more than one order.
6. Anomaly and Risk Indicators
- Identify orders where:
- Quantity is greater than 20
- Total_Amount is less than 100
- Find returned orders with
Total_Amount > 2000. - Detect customers who placed multiple orders on the same day.
7. Data Merging and Joining
You are given another dataset named customer_details.csv with the following columns:
| Column Name | Description |
|---|---|
| Customer_ID | Unique customer identifier |
| Age | Customer age |
| Gender | Male or Female |
| Membership_Level | Silver, Gold, or Platinum |
Tasks
- Merge
online_retail_sales.csvwithcustomer_details.csvusingCustomer_ID. - Find the average order value by Membership_Level.
- Identify the membership group generating the highest revenue.
- Analyze purchasing behavior by age group.
Bonus Tasks (Optional)
- Visualize monthly revenue using a line chart.
- Create a bar chart for revenue by category.
- Plot the top 10 customers by total spending.
Metadata
Metadata
Assignees
Labels
Question & AssignmentFurther information is requestedFurther information is requested