## **Maximizing Revenue with Data: 2019 Sales Trends and Product Performance**

### **Business Understanding**

#### **Problem Statement**

A client assigned by getINNOtized has collected transactional data for 2019 but has not been able to use it effectively to improve sales or operational efficiency. The client needs insights into sales performance, seasonal trends, product popularity, and city-level sales to help drive more sales and streamline operations.

#### **Goal and Objectives**

The main goal is to deliver a comprehensive business intelligence (BI) solution that helps the client:

- Identify trends and seasonality in sales.
- Analyze product performance to discover best- and worst-selling items.
- Compare sales across different time periods (monthly, weekly) for actionable insights.
- Analyze geographical sales performance to identify cities with higher demand.
- Segment products based on price and analyze their contribution to total sales.


#### **Stakeholders**
- Primary Stakeholders: Management team looking for sales and operational insights.
- Secondary Stakeholders: Sales and marketing teams who can use the insights for future campaigns.
- Analysts: Those responsible for deriving and communicating actionable insights.
- Operations Team: Can leverage insights for improving efficiency in delivering products to high-demand areas.

#### **Key Metrics and Success**
- Total Sales Revenue: Monthly and yearly revenue.
- Seasonality Metrics: Monthly/quarterly sales trends.
- Product Performance: Revenue and quantity sold by product.
- Geographic Metrics: Sales distribution by city.
- Product Category Performance: Revenue and quantity sold by product category (high-level vs. basic).
- Operational Efficiency: Timeliness of reporting, ease of extracting actionable insights, and ability to identify growth opportunities.

#### **Hypotheses**
Null Hypothesis (H0): There is no significant seasonality in sales across the year.

Alternate Hypothesis (H1): There is significant seasonality in sales, with certain months showing higher or lower sales than others.
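
One possible way to test H0 against H1 is a permutation test on daily revenue. The sketch below is an assumption, not the method prescribed by the project: it uses synthetic data with an artificial December uplift, and the function name `seasonality_p_value` is hypothetical. It measures how far monthly mean sales spread apart, then checks how often randomly shuffled month labels produce a spread at least as large.

```python
import numpy as np
import pandas as pd


def seasonality_p_value(daily_sales, n_permutations=2000, seed=0):
    """Permutation test of H0: the month label has no effect on daily sales."""
    rng = np.random.default_rng(seed)
    values = daily_sales.to_numpy(dtype=float)
    months = daily_sales.index.month.to_numpy()

    def spread_of_monthly_means(labels):
        # Mean daily sales per month (labels run 1..12), then their variance.
        totals = np.bincount(labels, weights=values, minlength=13)[1:]
        counts = np.bincount(labels, minlength=13)[1:]
        return (totals / counts).var()

    observed = spread_of_monthly_means(months)
    exceed = sum(
        spread_of_monthly_means(rng.permutation(months)) >= observed
        for _ in range(n_permutations)
    )
    # Add-one correction keeps the estimate strictly above zero.
    return (exceed + 1) / (n_permutations + 1)


# Synthetic daily revenue for 2019 with an artificial December uplift
# (illustrative values only, not the client's data).
days = pd.date_range("2019-01-01", "2019-12-31", freq="D")
rng = np.random.default_rng(42)
revenue = 1000 + rng.normal(0, 100, len(days))
revenue[days.month == 12] += 400
daily_sales = pd.Series(revenue, index=days)

p_value = seasonality_p_value(daily_sales)
print(f"p-value: {p_value:.4f}")
```

A small p-value (for example, below 0.05) would be evidence against H0. On the real data, the daily series would come from summing revenue per order date.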

#### **Analytical Questions**
1. How much money did we make this year? 

2. Can we identify any seasonality in the sales? 

3. What are our best- and worst-selling products?

4. How do sales compare to previous months or weeks? 

5. Which cities are our products delivered to most? 

6. How do product categories compare in revenue generated and quantities ordered? 
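
Most of these questions reduce to grouped aggregations. The following is a minimal sketch on a toy table; the column names (`Order_Date`, `Product`, `City`, `Quantity_Ordered`, `Price_Each`) and the rows are assumptions for illustration, since the actual schema has not yet been inspected at this point in the notebook.

```python
import pandas as pd

# Toy transactions; column names and values are hypothetical.
sales = pd.DataFrame({
    "Order_Date": pd.to_datetime(
        ["2019-01-05", "2019-01-20", "2019-02-11", "2019-02-15", "2019-03-02"]
    ),
    "Product": ["USB-C Cable", "Monitor", "USB-C Cable", "Headphones", "Monitor"],
    "City": ["Boston", "Dallas", "Boston", "Austin", "Dallas"],
    "Quantity_Ordered": [2, 1, 3, 1, 2],
    "Price_Each": [11.95, 149.99, 11.95, 99.99, 149.99],
})
sales["Revenue"] = sales["Quantity_Ordered"] * sales["Price_Each"]

# Q1: total revenue for the year.
total_revenue = sales["Revenue"].sum()

# Q2 and Q4: revenue per month and its month-over-month change.
monthly_revenue = sales.groupby(sales["Order_Date"].dt.month)["Revenue"].sum()
month_over_month = monthly_revenue.pct_change()

# Q3: best- and worst-selling products by units sold.
units_by_product = (
    sales.groupby("Product")["Quantity_Ordered"].sum().sort_values(ascending=False)
)
best_seller = units_by_product.index[0]
worst_seller = units_by_product.index[-1]

# Q5: number of orders delivered to each city.
orders_by_city = sales["City"].value_counts()

print(round(total_revenue, 2), best_seller, worst_seller)
```

Ranking products by revenue instead of units may give a different answer to question 3, so it is worth reporting both.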

#### **Scope and Constraints**
Scope:
- Analysis will focus on the sales data for 2019, broken down by months, weeks, and product categories.
- Sales data for January to June will be extracted from CSV files, and data for July to December will be pulled from the database.
- The analysis will also involve comparison between high-level and basic products based on unit price thresholds.

Constraints:
- Data integration: Combining two different data sources (CSV and database).
- Time constraints for accessing, cleaning, and preparing the data.
- Potential inconsistencies in data formatting between the first and second halves of the year.
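
The integration step can be sketched as follows. Everything concrete here is an assumption: an in-memory CSV stands in for the January to June files, an in-memory SQLite table (hypothetically named `sales_jul_dec`) stands in for the client's database, and the differing column names and date formats are invented to illustrate the kind of inconsistency that may appear.

```python
import io
import sqlite3

import pandas as pd

# Stand-in for one of the January-June CSV files (hypothetical layout).
first_half_csv = io.StringIO(
    "Order ID,Product,Quantity Ordered,Price Each,Order Date\n"
    "1001,USB-C Cable,2,11.95,01/05/19 10:15\n"
    "1002,Monitor,1,149.99,03/17/19 18:40\n"
)
first_half = pd.read_csv(first_half_csv)

# Stand-in for the July-December database table (hypothetical name and schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_jul_dec (Order_ID INTEGER, Product TEXT, "
    "Quantity_Ordered INTEGER, Unit_Price REAL, Order_Date TEXT)"
)
conn.execute(
    "INSERT INTO sales_jul_dec VALUES "
    "(2001, 'Headphones', 1, 99.99, '2019-08-02 09:30:00')"
)
conn.commit()
second_half = pd.read_sql_query("SELECT * FROM sales_jul_dec", conn)
conn.close()

# Harmonize column names so both halves share one schema.
first_half = first_half.rename(columns=lambda name: name.replace(" ", "_"))
second_half = second_half.rename(columns={"Unit_Price": "Price_Each"})

# Parse each half with its own date format before combining.
first_half["Order_Date"] = pd.to_datetime(
    first_half["Order_Date"], format="%m/%d/%y %H:%M"
)
second_half["Order_Date"] = pd.to_datetime(
    second_half["Order_Date"], format="%Y-%m-%d %H:%M:%S"
)

sales_2019 = pd.concat([first_half, second_half], ignore_index=True)
print(sales_2019.dtypes)
```

Parsing dates per source, rather than after concatenation, avoids ambiguous mixed-format parsing.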


#### **Extra Information**
- The products with a unit price greater than $99.99 will be categorized as high-level products, and those below or equal to $99.99 will be considered basic products. This categorization will be critical for revenue comparisons across product types.
- To answer the questions efficiently, a blend of SQL for database querying and Python/Excel for data analysis and visualization will be required.
- This project must be completed in two weeks.
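
The price-based categorization described above could be expressed as in this sketch. The $99.99 threshold comes from the brief; the column names and sample rows are assumptions for illustration.

```python
import numpy as np
import pandas as pd

HIGH_LEVEL_THRESHOLD = 99.99  # from the brief: strictly greater means high-level

# Hypothetical product lines; a 99.99 row is included to show the boundary case.
products = pd.DataFrame({
    "Product": ["USB-C Cable", "Headphones", "Keyboard", "Monitor"],
    "Price_Each": [11.95, 99.99, 100.00, 149.99],
    "Quantity_Ordered": [5, 2, 1, 3],
})

products["Category"] = np.where(
    products["Price_Each"] > HIGH_LEVEL_THRESHOLD, "High-level", "Basic"
)
products["Revenue"] = products["Price_Each"] * products["Quantity_Ordered"]

# Revenue, units, and share of total revenue per category.
summary = products.groupby("Category").agg(
    Revenue=("Revenue", "sum"),
    Quantity=("Quantity_Ordered", "sum"),
)
summary["Revenue_Share"] = summary["Revenue"] / summary["Revenue"].sum()
print(summary)
```

Note that a product priced at exactly $99.99 falls in the basic category, matching the "below or equal to" rule.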






### **Data Understanding**

# **Importation of libraries**
# Data manipulation and analysis
import pandas as pd
import numpy as np

# Managing environment variables
#from dotenv import dotenv_values


import warnings

warnings.filterwarnings('ignore')
 
# Database connectivity
import pyodbc
 
# Database ORM
from sqlalchemy import create_engine
 
# Data visualization
import matplotlib.pyplot as plt
import seaborn as sns
 

# Machine learning 
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import LabelEncoder
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import OrdinalEncoder 
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import RobustScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import MaxAbsScaler
from sklearn.preprocessing import PowerTransformer
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV


### **Data Preparation**

### **Modelling and Evaluation**

### **Deployment**