This is a complete Sales Analytics and Prediction System built as a web application using Python and Streamlit. Users can upload any CSV/Excel sales file and the system automatically:
- Cleans and processes the data
- Generates interactive visualizations
- Trains a Machine Learning model and predicts future sales
- Provides smart product recommendations
- Shows intelligent alerts and warnings
- Supports 3 languages: English, Gujarati (ΰͺΰ«ΰͺΰͺ°ΰͺΎΰͺ€ΰ«), Hindi (ΰ€Ήΰ€Ώΰ€¨ΰ₯ΰ€¦ΰ₯)
sales_analytics/
β
βββ app.py β Main Streamlit application (UI + logic)
β
βββ utils/
β βββ __init__.py
β βββ translations.py β All text in 3 languages (English/Gujarati/Hindi)
β βββ data_processor.py β Data loading, cleaning, aggregation, alerts
β βββ visualizations.py β All Plotly charts (line, bar, pie, etc.)
β
βββ models/
β βββ __init__.py
β βββ predictor.py β ML model (Polynomial Regression + Scikit-learn)
β
βββ data/
β βββ sample_sales_data.csv β Sample dataset for testing
β
βββ requirements.txt β All Python packages needed
βββ README.md β This file
Make sure Python 3.9 or higher is installed.
python --version
Open terminal / command prompt in the project folder and run:
pip install -r requirements.txtstreamlit run app.pyThe app will automatically open in your browser at: http://localhost:8501
Your CSV file should have these columns (column names are flexible β the system auto-detects them):
| Column | Description | Example |
|---|---|---|
| Date | Sale date | 2023-01-15 |
| Product | Product name | Laptop Pro |
| Category | Product category | Electronics |
| Units_Sold | Number of units sold | 25 |
| Unit_Price | Price per unit (βΉ) | 45000 |
| Total_Sales | Total revenue (βΉ) | 1125000 |
| Cost | Cost price | 787500 |
| Profit | Net profit | 337500 |
| Region | Sales region | North |
Note: The system is smart β it can work even if some columns are missing. It will auto-estimate missing values.
- File:
utils/translations.py - Contains all text in English, Gujarati, and Hindi
get_text(lang, key)function returns the correct translation- The language selector in the sidebar changes the entire app's text
- File:
utils/data_processor.py load_data()β reads CSV or Excel filespreprocess_data()β cleans data, detects columns automatically, creates Month/Year/Season columnsget_monthly_data()β groups sales by monthget_top_products()β finds best-selling productsgenerate_alerts()β compares recent vs. past performance
- File:
utils/visualizations.py - All charts use Plotly (interactive, hover-enabled)
- Line chart β monthly sales trend
- Bar chart β top products
- Pie/Donut chart β category distribution
- Grouped bar β profit vs sales
- Regional bar chart
- Seasonal bar chart
- Prediction chart with confidence band
- File:
models/predictor.py - Algorithm: Polynomial Regression (degree 2) using Scikit-learn
- Features used: Time index, sin/cos of month (seasonality), quarter
- Train/Test split: 80% training, 20% validation
- Metrics: RΒ² Score (accuracy) and MAE (mean absolute error)
- Output: Next 3 months predicted sales with confidence interval (Β±15%)
- Top 5 products ranked by composite score (sales + profit margin + volume)
- Seasonal performance analysis (Spring/Summer/Autumn/Winter)
- Best and worst performing months
- Compares last 3 months vs. previous 3 months
- Detects: declining sales, loss-making products, high demand trends, best seasons
What algorithm is used?
β Polynomial Regression (a smarter version of Linear Regression)
How does it work?
β It learns patterns from historical monthly sales data. It understands:
- Is sales going up or down over time?
- Which months tend to be high/low? (seasonality)
- What quarter are we in?
What does it predict?
β Total sales amount (βΉ) for each of the next 3 months
How accurate is it?
β Measured by RΒ² Score (0 to 1). An RΒ² of 0.85 means the model explains 85% of sales variation β which is good for this type of data.
Q1: What is the main purpose of your project?
A: Our project automates sales analysis. A business owner simply uploads their sales CSV file, and the system automatically analyzes the data, creates charts, predicts future sales using ML, and shows alerts β all in a multilingual web interface.
Q2: Which ML algorithm did you use and why?
A: We used Polynomial Regression from Scikit-learn. We chose it because sales data has seasonal patterns (non-linear) that simple linear regression misses. Polynomial regression can capture these curves. We also encode month using sin/cos to capture seasonality mathematically.
Q3: How does your multilingual support work?
A: We created a translations.py file with a Python dictionary containing every text in English, Gujarati, and Hindi. A get_text(language, key) function returns the correct translation. The user selects their language from the sidebar, and all text updates instantly.
Q4: How do you handle different CSV file formats?
A: Our data_processor.py uses keyword-based column detection. It searches for columns with names like "date", "product", "sales", "profit", etc. If a column is missing, it estimates it (e.g., profit = 25% of sales). This makes the system flexible with any CSV format.
Q5: What is the prediction accuracy?
A: The model is evaluated using RΒ² Score and Mean Absolute Error. With 12+ months of data, the RΒ² score is typically 0.80β0.95. We also show a confidence band (Β±15%) around predictions to communicate uncertainty.
Q6: How does the alert system work?
A: The generate_alerts() function compares the last 3 months average sales with the previous 3 months. If sales dropped by >10%, a danger alert is shown. It also checks for products with negative profit, identifies demand trends, and highlights the best season.
Q7: What technologies did you use?
A:
- Python β core programming language
- Streamlit β web framework for the UI
- Pandas & NumPy β data manipulation
- Scikit-learn β machine learning
- Plotly β interactive charts
Q8: Can this system work with real business data?
A: Yes! The system auto-detects column names, handles missing values, and works with any CSV or Excel file. We tested it with various formats. The flexible preprocessing makes it production-ready.
- Upload Screen β Clean upload area with sample CSV download button
- KPI Dashboard β 6 metric cards (total sales, profit, best month, etc.)
- Analytics Tab β 6 interactive charts in a dashboard layout
- Prediction Tab β ML model results + 3-month forecast chart + table
- Recommendations Tab β Top 5 product cards + seasonal analysis
- Alerts Tab β Color-coded business alerts (red=danger, yellow=warning, blue=info)
- Show the running app in browser
- Upload the sample CSV and let the system process it
- Explain each tab β Analytics β Prediction β Recommendations β Alerts
- Change the language to Gujarati or Hindi to demonstrate multilingual feature
- Show the code structure β explain which file does what
- Highlight the ML part β show RΒ² score and prediction chart
| Feature | Technology Used |
|---|---|
| Web UI | Streamlit |
| Data Processing | Pandas, NumPy |
| Charts | Plotly |
| Machine Learning | Scikit-learn (Poly Regression) |
| Multi-language | Custom translation system |
| File Support | CSV, XLSX (via openpyxl) |
Built with β€οΈ as a Final Year B.Tech/BCA/MCA Project