This project analyzes retail sales data to understand revenue trends, customer behavior, and product performance across different cities and regions.
Approach:
- Cleaned and processed the raw transaction data
- Converted columns to appropriate types (dates and categorical variables)
- Derived new metrics: revenue, cost, and profit
- Performed exploratory data analysis (EDA) to identify patterns and trends
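The cleaning and metric steps above can be sketched in base R as follows. This is a minimal illustration, not the project's actual code. It assumes hypothetical column names (`order_date`, `city`, `category`, `quantity`, `unit_price`, `unit_cost`, `discount`), assumes `discount` is a fraction between 0 and 1, and uses a small made-up data frame. The project lists dplyr, where the same logic would typically be written with `mutate()` and `filter()`.

```r
# Sketch of the cleaning step. Column names are hypothetical, not taken
# from the project's dataset.
clean_sales <- function(raw) {
  df <- raw
  # Drop exact duplicate transactions
  df <- df[!duplicated(df), ]
  # Drop rows missing the fields needed to compute metrics
  key_cols <- c("order_date", "quantity", "unit_price", "unit_cost")
  df <- df[complete.cases(df[, key_cols]), ]

  # Convert types: ISO date strings to Date, text labels to factors
  df$order_date <- as.Date(df$order_date)
  df$city <- factor(df$city)
  df$category <- factor(df$category)

  # Derived metrics (assumes discount is a fraction in [0, 1])
  df$revenue <- df$quantity * df$unit_price * (1 - df$discount)
  df$cost    <- df$quantity * df$unit_cost
  df$profit  <- df$revenue - df$cost
  df
}

# Toy input for illustration only; not project data
raw <- data.frame(
  order_date = c("2024-01-05", "2024-01-05", "2024-02-10", NA),
  city       = c("New York", "New York", "Chicago", "Los Angeles"),
  category   = c("Electronics", "Electronics", "Home", "Home"),
  quantity   = c(2, 2, 1, 3),
  unit_price = c(100, 100, 50, 20),
  unit_cost  = c(60, 60, 30, 10),
  discount   = c(0.1, 0.1, 0, 0),
  stringsAsFactors = FALSE
)

sales <- clean_sales(raw)
print(sales[, c("order_date", "city", "revenue", "cost", "profit")])
```

In this toy input the second row duplicates the first and the fourth has no date, so only two transactions survive cleaning.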
Key findings:
- A small number of products generate a large share of total revenue (a Pareto effect)
- Major cities such as New York, Los Angeles, and Chicago contribute the most revenue
- Returning customers generate more revenue and profit than new customers
- Promotions increase sales volume but can reduce profit margins when discounts are too deep
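The trade-off behind the promotions finding can be illustrated with simple arithmetic. A discount `d` lowers per-unit profit from `price - cost` to `price * (1 - d) - cost`, so sales volume has to grow by the ratio of the two just to keep total profit flat. The sketch below uses made-up numbers (price 100, cost 60) rather than figures from the dataset, and the function names are hypothetical.

```r
# Illustrative arithmetic only; prices and costs are made-up examples.
unit_profit <- function(price, cost, discount) {
  price * (1 - discount) - cost
}

# Volume multiplier needed so that discounted profit equals undiscounted profit.
# Returns Inf when the discount wipes out the unit margin (each sale breaks
# even or loses money, so no amount of extra volume restores profit).
breakeven_volume_multiplier <- function(price, cost, discount) {
  base  <- unit_profit(price, cost, 0)
  promo <- unit_profit(price, cost, discount)
  ifelse(promo > 0, base / promo, Inf)
}

discounts <- c(0, 0.1, 0.2, 0.3, 0.5)
print(data.frame(
  discount    = discounts,
  unit_profit = unit_profit(100, 60, discounts),
  multiplier  = breakeven_volume_multiplier(100, 60, discounts)
))
```

With these example numbers, a 10% discount needs about 1.33x the volume to break even, a 30% discount needs 4x, and at 50% every unit sold loses money.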
Analyses performed:
- Revenue and profit analysis by product, city, and category
- Trend analysis over time (monthly and weekly patterns)
- Correlation analysis between price, quantity, and discount
- Distribution analysis using histograms and box plots
- Pareto analysis to identify top-performing products
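A Pareto analysis and a monthly trend can both be expressed as grouped sums. The sketch below is one possible base R version, not the project's implementation. It assumes hypothetical columns `product`, `order_date` (class `Date`), and `revenue`, uses an 80% cutoff as the conventional Pareto threshold, and runs on made-up data. With dplyr, the grouping would typically use `group_by()` and `summarise()`.

```r
# Pareto sketch: the smallest set of top products whose cumulative
# revenue share reaches the threshold (80% by default).
pareto_table <- function(sales, threshold = 0.8) {
  by_product <- aggregate(revenue ~ product, data = sales, FUN = sum)
  by_product <- by_product[order(by_product$revenue, decreasing = TRUE), ]
  by_product$cum_share <- cumsum(by_product$revenue) / sum(by_product$revenue)
  # A product is in the top group if the share accumulated *before* it is
  # still below the threshold, so the product that crosses it is included
  prev_share <- c(0, head(by_product$cum_share, -1))
  by_product$top_group <- prev_share < threshold
  rownames(by_product) <- NULL
  by_product
}

# Monthly trend: bucket order dates into calendar months and sum revenue
monthly_revenue <- function(sales) {
  sales$month <- format(sales$order_date, "%Y-%m")
  aggregate(revenue ~ month, data = sales, FUN = sum)
}

# Toy data for illustration only; not project data
toy <- data.frame(
  order_date = as.Date(c("2024-01-03", "2024-01-20", "2024-02-02",
                         "2024-02-15", "2024-03-01", "2024-03-09")),
  product = c("A", "B", "A", "C", "D", "A"),
  revenue = c(500, 200, 300, 100, 50, 200)
)

print(pareto_table(toy))
print(monthly_revenue(toy))
```

Here product A alone covers about 74% of revenue, so A and B together form the top group; the same cumulative-share column is what a Pareto chart plots.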
Tools:
- R
- dplyr
- ggplot2
- gt (tables)
- ggcorrplot
What I learned:
- How to clean and structure real-world datasets
- How to analyze business performance using data
- How to identify key drivers of revenue and profit
- How to communicate insights using visualizations
Md Boshirul Azad
Cybersecurity Student