LogapriyaS4/Python

Sales Data Analysis and Revenue Prediction Project


Step 1: Objective

Goal:
• Analyze sales data to understand revenue trends, top-selling products, regional performance, returns, and campaign ROI.
• Build a predictive model to forecast revenue using key features.

Why:
• Businesses need insights to make data-driven decisions and improve revenue and marketing efficiency.


Step 2: Required Libraries

Libraries used:
• pandas → for data loading, cleaning, and manipulation
• matplotlib & seaborn → for visualizing KPIs and trends
• sklearn (scikit-learn) → for encoding categorical data, splitting the dataset, building the regression model, and evaluating its performance

Example:

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.preprocessing import LabelEncoder
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error, r2_score

Why:
• These libraries provide all the tools needed for data analysis, visualization, and modeling.


Step 3: Load the Dataset

• Used pandas.read_csv() to load the sales data from a CSV file.
• Checked the first rows with data.head() to understand the data structure.
• data.info(): shows the dataset’s structure, including column names, data types, and non-null counts.
• data.describe(): provides a statistical summary (count, mean, standard deviation, min, max, and quartiles, where the 50% quartile is the median) for numeric columns.

Why:
• To explore and understand the dataset before preprocessing.
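The loading step might look roughly like the sketch below. The README does not give the CSV file name or the full column list, so the inline sample and its column names (Order Date, Product, Return Flag, and so on) are assumptions made for illustration only.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real sales CSV (columns are assumed).
csv_text = """Order Date,Product,Category,Region,Quantity,Price,Revenue,Campaign Cost,Return Flag
2024-01-05,Tablet,Electronics,South,2,300,600,50,No
2024-01-12,Laptop,Electronics,West,1,900,900,80,No
2024-02-03,Chair,Furniture,North,4,75,300,20,Yes
"""

# For a real file, pass its path to read_csv instead of the StringIO buffer.
data = pd.read_csv(io.StringIO(csv_text))

print(data.head())         # first rows of the dataset
data.info()                # column names, dtypes, non-null counts
summary = data.describe()  # count, mean, std, min, quartiles, max (numeric columns)
print(summary)
```

The 50% row of the describe() output is the median, which is why no separate median call is needed at this stage.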


Step 4: Data Cleaning & Preprocessing

  1. Check for missing values:
     o Used data.isnull().sum() to identify columns with missing data.
  2. Fill missing values:
     o Numeric columns (Quantity, Price, Revenue, Campaign Cost) → filled with the median or mean
     o Categorical column (Region) → filled with the mode
  3. Encode categorical variables:
     o LabelEncoder used to convert Category and Region into numerical values for regression.

Why:
• Missing values can affect calculations and models.
• Categorical data needs encoding for regression models.
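The three cleaning steps above can be sketched as follows. The sample DataFrame is made up for illustration, and the choice of the median for every numeric column is an assumption, since the README does not say which columns used the median and which used the mean.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample with gaps; the real dataset's values will differ.
data = pd.DataFrame({
    "Category": ["Electronics", "Furniture", "Electronics", "Furniture"],
    "Region": ["South", None, "West", "South"],
    "Quantity": [2, 4, np.nan, 1],
    "Price": [300.0, 75.0, 900.0, np.nan],
})

# 1. Count missing values per column.
missing_before = data.isnull().sum()
print(missing_before)

# 2. Fill numeric gaps with the median (assumed) and categorical gaps with the mode.
for col in ["Quantity", "Price"]:
    data[col] = data[col].fillna(data[col].median())
data["Region"] = data["Region"].fillna(data["Region"].mode()[0])

# 3. Encode each categorical column with its own LabelEncoder,
#    keeping the encoders so the integer codes can be mapped back later.
encoders = {}
for col in ["Category", "Region"]:
    encoders[col] = LabelEncoder()
    data[col] = encoders[col].fit_transform(data[col])

print(data)
```

LabelEncoder assigns integer codes in alphabetical order of the labels, so the codes carry an ordering that the regions and categories themselves do not have. One-hot encoding is a common alternative when that matters.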

Step 5: Calculate Key Performance Indicators (KPIs)

• Total Revenue → sum of the Revenue column
• Average Revenue per Order → mean of Revenue
• Total Quantity Sold → sum of Quantity
• Top Product → product with the highest total quantity sold
• Total Returns → count of orders with Return Flag = 'Yes'
• Campaign ROI → (Total Revenue - Campaign Cost) / Campaign Cost * 100

Why:
• KPIs summarize business performance and provide actionable insights.
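A minimal sketch of these KPI calculations, using a made-up four-row dataset. It assumes that each row is one order, that the product column is named Product, and that Campaign Cost in the ROI formula means the sum of the Campaign Cost column. None of these are confirmed by the README.

```python
import pandas as pd

# Hypothetical sample; each row is assumed to be one order.
data = pd.DataFrame({
    "Product": ["Tablet", "Laptop", "Tablet", "Chair"],
    "Quantity": [2, 1, 3, 4],
    "Revenue": [600.0, 900.0, 900.0, 300.0],
    "Campaign Cost": [50.0, 80.0, 70.0, 20.0],
    "Return Flag": ["No", "No", "Yes", "No"],
})

total_revenue = data["Revenue"].sum()
avg_revenue_per_order = data["Revenue"].mean()
total_quantity = data["Quantity"].sum()

# Product with the highest total quantity sold.
top_product = data.groupby("Product")["Quantity"].sum().idxmax()

# Number of orders flagged as returned.
total_returns = (data["Return Flag"] == "Yes").sum()

# ROI as a percentage of total campaign spend (assumed interpretation).
total_campaign_cost = data["Campaign Cost"].sum()
campaign_roi = (total_revenue - total_campaign_cost) / total_campaign_cost * 100

print(f"Total revenue: {total_revenue:.2f}")
print(f"Average revenue per order: {avg_revenue_per_order:.2f}")
print(f"Total quantity sold: {total_quantity}")
print(f"Top product: {top_product}")
print(f"Total returns: {total_returns}")
print(f"Campaign ROI: {campaign_roi:.1f}%")
```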


Step 6: Data Visualization

Visualizations created:

  1. Revenue by Region → bar chart
  2. Top 5 Products by Quantity Sold → bar chart
  3. Revenue Over Time → line chart

Why:
• Visuals make patterns easier to understand.
• They help identify high-performing regions, products, and seasonal trends.
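The three charts listed above could be produced along these lines. This is a sketch on made-up data using plain matplotlib (the project also imports seaborn, which is omitted here). The Order Date and Product column names and the monthly aggregation for the time series are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data.
data = pd.DataFrame({
    "Order Date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-18", "2024-03-02"]
    ),
    "Product": ["Tablet", "Laptop", "Tablet", "Chair", "Phone"],
    "Region": ["South", "West", "South", "North", "West"],
    "Quantity": [2, 1, 3, 4, 2],
    "Revenue": [600, 900, 900, 300, 800],
})

# Aggregate the data behind each chart.
revenue_by_region = data.groupby("Region")["Revenue"].sum().sort_values(ascending=False)
top_products = data.groupby("Product")["Quantity"].sum().nlargest(5)
monthly_revenue = data.groupby(data["Order Date"].dt.to_period("M"))["Revenue"].sum()

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# 1. Revenue by Region (bar chart)
axes[0].bar(revenue_by_region.index.tolist(), revenue_by_region.values)
axes[0].set_title("Revenue by Region")

# 2. Top 5 Products by Quantity Sold (bar chart)
axes[1].bar(top_products.index.tolist(), top_products.values)
axes[1].set_title("Top 5 Products by Quantity Sold")

# 3. Revenue Over Time (line chart, one point per month)
axes[2].plot(monthly_revenue.index.astype(str).tolist(), monthly_revenue.values, marker="o")
axes[2].set_title("Revenue Over Time")

fig.tight_layout()
# Call plt.show() or fig.savefig("sales_charts.png") to view the result.
```

Computing the aggregates first and plotting them second keeps the numbers behind each chart available for the KPI summary as well.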

Step 7: Revenue Prediction Using Linear Regression

  1. Select features: Category, Quantity, Price, Region, Campaign Cost
  2. Target variable: Revenue
  3. Split dataset: 80% training, 20% testing
  4. Train model: LinearRegression()
  5. Predict revenue: on test set
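  6. Evaluate model: MSE and R² score on the test set

Why:
• Predictive modeling helps forecast revenue and supports business planning.
• The fitted regression coefficients indicate how each feature relates to revenue; their magnitudes are only directly comparable when the features are on similar scales.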
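The six steps above can be sketched end to end as follows. The data here is synthetic and randomly generated, Revenue is assumed to equal Quantity × Price purely so the example has a target to fit, and random_state=42 is an arbitrary choice for reproducibility. The scores it prints say nothing about the project's actual results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-in for the sales data (all values are made up).
rng = np.random.default_rng(42)
n = 200
data = pd.DataFrame({
    "Category": rng.choice(["Electronics", "Furniture", "Clothing"], size=n),
    "Region": rng.choice(["North", "South", "East", "West"], size=n),
    "Quantity": rng.integers(1, 10, size=n),
    "Price": rng.uniform(10, 500, size=n).round(2),
    "Campaign Cost": rng.uniform(5, 100, size=n).round(2),
})
data["Revenue"] = data["Quantity"] * data["Price"]  # assumed relationship

# Encode categorical columns as integers, as in Step 4.
for col in ["Category", "Region"]:
    data[col] = LabelEncoder().fit_transform(data[col])

# 1-2. Features and target.
features = ["Category", "Quantity", "Price", "Region", "Campaign Cost"]
X = data[features]
y = data["Revenue"]

# 3. 80% training, 20% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 4. Train the model.
model = LinearRegression()
model.fit(X_train, y_train)

# 5. Predict revenue on the test set.
y_pred = model.predict(X_test)

# 6. Evaluate with MSE and R².
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.2f}")
print(f"R²: {r2:.3f}")

# Coefficients per feature (unscaled, so magnitudes are not directly comparable).
for name, coef in zip(features, model.coef_):
    print(f"{name}: {coef:.3f}")
```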

Step 8: Insights & Observations

• Campaign ROI was positive → marketing was effective.
• Tablets were the top-selling product → focus on their inventory and promotion.
• The South and West regions generated the most revenue → prioritize these regions.
• Revenue showed seasonal trends → plan promotions and stock accordingly.
• The Linear Regression model scored R² ≈ 0.71 → it explains about 71% of the variance in revenue.

