# Liquor Profits Prediction

![](https://github.com/LuciaZou/capstone/blob/main/Screenshot%20Capstone.png)

## Project Overview

### Problem Area
Our project helps liquor stores manage their extensive sales data. The central challenge is to turn this raw data into actionable insights that boost sales and support well-informed business decisions.

### Impact of the Solution
Revenue Boost: Targeted marketing and personalized customer experiences can increase sales, leading to revenue growth.

Operational Efficiency: Optimized inventory management will reduce wastage and enhance efficiency, leading to cost savings.

Strategic Growth: Informed decisions about store locations will ensure successful expansions, maximizing profitability.

Enhanced Customer Satisfaction: Personalized offerings and a well-stocked inventory tailored to customer preferences will increase satisfaction levels.

### Description of the Dataset
From: https://www.kaggle.com/datasets/gabrielramos87/iowa-sales-liquor-jan-2021jan-2022 

The dataset contains:

- Temporal Data: date of orders.
- Product Insights: invoice_and_item_number, item_number, item_description, volume_sold_liters, volume_sold_gallons, category, category_name and sale_dollars.
- Geospatial Data: address, city, zip_code, store_location, county_number and county.
- Sales Metrics: pack, bottle_volume_ml, state_bottle_cost, state_bottle_retail and bottles_sold.

The original data has 2,805,307 rows and 24 columns.


### Features
| Column Name                | Description                                                                                                                       | Data Type  |
|----------------------------|-----------------------------------------------------------------------------------------------------------------------------------|------------|
| invoice_and_item_number    | Concatenated invoice and line number associated with the liquor sold. This provides a unique identifier for the individual liquor products included in the store's sales. | object     |
| date                       | Date of sale.                                                                                                                     | object     |
| store_number               | Unique number assigned to the store that sold the liquor.                                                                         | int64      |
| store_name                 | Name of the store that sold the liquor.                                                                                            | object     |
| address                    | Address of the store that sold the liquor.                                                                                        | object     |
| city                       | City where the store that sold the liquor is located.                                                                             | object     |
| zip_code                   | ZIP code where the store that sold the liquor is located.                                                                         | float64    |
| store_location             | Location of the store that sold the liquor. The address, city, state, and ZIP code are geocoded to provide geographic coordinates. | object     |
| county_number              | Iowa county number for the county where the store that sold the liquor is located.                                             | float64    |
| county                     | County where the store that sold the liquor is located.                                                                           | object     |
| category                   | Category code associated with the liquor sold.                                                                                    | float64    |
| category_name              | Category of the liquor sold.                                                                                                      | object     |
| vendor_number              | The vendor number of the company for the brand of liquor sold.                                                                    | float64    |
| vendor_name                | The vendor name of the company for the brand of liquor sold.                                                                      | object     |
| item_number                | Item number for the individual liquor product sold.                                                                               | int64      |
| item_description           | Description of the individual liquor product sold.                                                                               | object     |
| pack                       | The number of bottles in a case for the liquor sold.                                                                              | int64      |
| bottle_volume_ml           | Volume of each liquor bottle sold in milliliters.                                                                                 | int64      |
| state_bottle_cost          | Amount the Iowa Alcoholic Beverages Division paid for each bottle of liquor ordered.                                              | float64    |
| state_bottle_retail        | Amount the store paid for each bottle of liquor ordered.                                                                          | float64    |
| bottles_sold               | The number of bottles of liquor sold by the store.                                                                               | int64      |
| sale_dollars               | Total dollar value of the liquor sold (number of bottles multiplied by the state bottle retail).                                  | float64    |
| volume_sold_liters         | Total volume of liquor sold in liters.                                                                                            | float64    |
| volume_sold_gallons        | Total volume of liquor sold in gallons.                                                                                          | float64    |
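
The description of `sale_dollars` implies a relationship that can be checked row by row. The following is a minimal sketch of such a check, not code from the project notebooks: the rows are made up for illustration, and the helper `sale_dollars_matches` and its tolerance are assumptions.

```python
import math

# Illustrative rows only; these values are made up, not taken from the dataset.
rows = [
    {"bottles_sold": 12, "state_bottle_retail": 15.00, "sale_dollars": 180.00},
    {"bottles_sold": 6, "state_bottle_retail": 9.75, "sale_dollars": 58.50},
    {"bottles_sold": 3, "state_bottle_retail": 20.00, "sale_dollars": 65.00},  # inconsistent on purpose
]

def sale_dollars_matches(row, tol=0.01):
    """Return True if sale_dollars equals bottles_sold * state_bottle_retail within tol dollars."""
    expected = row["bottles_sold"] * row["state_bottle_retail"]
    return math.isclose(row["sale_dollars"], expected, abs_tol=tol)

# Indices of rows where the stated relationship does not hold.
mismatches = [i for i, row in enumerate(rows) if not sale_dollars_matches(row)]
print(mismatches)  # [2]
```

Rows flagged this way would be candidates for closer inspection during data cleaning.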


## Project Workflow

### [Data Cleaning_PT1](https://github.com/LuciaZou/capstone/blob/main/DataCleaningPT1.ipynb)
We started by cleaning the dataset and filling all null values.
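
As a rough illustration of null filling, the sketch below replaces missing values in a column with that column's most frequent value. The fill strategy, the sample rows, and the helper name `fill_with_mode` are all assumptions made for this example; the notebook may fill nulls differently.

```python
from collections import Counter

def fill_with_mode(rows, column):
    """Return copies of rows where None in `column` is replaced by the most frequent value."""
    present = [row[column] for row in rows if row[column] is not None]
    mode_value = Counter(present).most_common(1)[0][0]
    return [
        {**row, column: mode_value if row[column] is None else row[column]}
        for row in rows
    ]

# Made-up sample rows for illustration.
rows = [
    {"city": "Des Moines", "county": "POLK"},
    {"city": "Ames", "county": "STORY"},
    {"city": "Des Moines", "county": None},
    {"city": "Ankeny", "county": "POLK"},
]

filled = fill_with_mode(rows, "county")
print(filled[2]["county"])  # POLK
```

The function returns new dictionaries, so the original rows stay unchanged and the two versions can be compared.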

### [Data Cleaning_PT2](https://github.com/LuciaZou/capstone/blob/main/DataCleaningPT2.ipynb)
Further data cleaning to ensure that the number of unique values in each categorical column matches that of its corresponding numeric column (for example, `category_name` and `category`).
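
One way to find where a code column and its name column disagree is to collect the names seen for each code and report any code with more than one name. This is a sketch under assumptions: the codes and names below are made up, and `inconsistent_codes` is a hypothetical helper rather than a function from the notebook.

```python
from collections import defaultdict

def inconsistent_codes(rows, code_col, name_col):
    """Return {code: sorted names} for every code that maps to more than one name."""
    names_by_code = defaultdict(set)
    for row in rows:
        names_by_code[row[code_col]].add(row[name_col])
    return {
        code: sorted(names)
        for code, names in names_by_code.items()
        if len(names) > 1
    }

# Made-up codes and names for illustration.
rows = [
    {"category": 101, "category_name": "American Vodka"},
    {"category": 101, "category_name": "American Vodkas"},
    {"category": 102, "category_name": "Spiced Rum"},
]

print(inconsistent_codes(rows, "category", "category_name"))
# {101: ['American Vodka', 'American Vodkas']}
```

The same helper can be pointed at other pairs, such as `county_number` and `county`.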

### [Data Cleaning_PT3](https://github.com/LuciaZou/capstone/blob/main/DataCleaningPT3.ipynb)
Last part of data cleaning.

### [EDA](https://github.com/LuciaZou/capstone/blob/main/EDA.ipynb)
Exploratory Data Analysis (EDA) has proven invaluable in comprehending feature distributions and their relationships with the target variable. This insight has not only influenced our choice of models but has also provided valuable guidance for feature engineering.

### [Feature Engineering](https://github.com/LuciaZou/capstone/blob/main/Feature%20Engineering.ipynb)
We transformed the data with one-hot encoding, converting categorical variables into dummy variables that are suitable for machine learning.
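
To show what one-hot encoding does to a categorical column, here is a small standard-library sketch. The sample rows and the helper `one_hot_encode` are assumptions for illustration; in pandas, `pandas.get_dummies` offers similar behavior.

```python
def one_hot_encode(rows, column):
    """Replace `column` with one 0/1 dummy column per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded

# Made-up sample rows for illustration.
rows = [
    {"city": "Ames", "bottles_sold": 12},
    {"city": "Des Moines", "bottles_sold": 6},
    {"city": "Ames", "bottles_sold": 3},
]

encoded = one_hot_encode(rows, "city")
print(encoded[0])
# {'bottles_sold': 12, 'city_Ames': 1, 'city_Des Moines': 0}
```

Each row ends up with exactly one dummy column set to 1, so the number of new columns equals the number of distinct values in the original column.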

### [Models PT1](https://github.com/LuciaZou/capstone/blob/main/Models%20PT1.ipynb)
- Linear Regression
- Decision Tree
- Random Forest

### [Models PT2](https://github.com/LuciaZou/capstone/blob/main/ModelsPT2.ipynb)
- Lasso and Ridge
- XGBoost
- Comparison

### Conclusions
The best model is XGBoost, with an R-squared of 0.892 on the training set and 0.862 on the test set.
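
For readers unfamiliar with the metric, R-squared compares the model's squared error with that of always predicting the mean. The sketch below computes it from scratch on made-up numbers; it is not the project's evaluation code, and the values are unrelated to the scores reported above.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1 - ss_res / ss_tot

# Made-up targets and predictions for illustration.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]

score = r_squared(y_true, y_pred)
print(round(score, 3))  # 0.948
```

A training score noticeably higher than the test score, as with 0.892 versus 0.862, suggests a modest amount of overfitting.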

Potential improvements include:
- Try different models, such as LightGBM, CNN, KNN and CatBoost.
- Evaluate with a separate validation dataset rather than only train and test sets.
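
One way to read the validation suggestion above is a three-way split, where the validation set guides model selection and the test set is kept for a final check. The sketch below is an assumption about how that could look; the fractions, the seed, and the helper `train_val_test_split` are illustrative choices.

```python
import random

def train_val_test_split(items, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle a copy of items and split it into train, validation and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Row indices stand in for dataset rows in this illustration.
train, val, test = train_val_test_split(range(10))
print(len(train), len(val), len(test))  # 6 2 2
```

For data ordered by date, a chronological split may be more appropriate than a random shuffle.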

This project shows how data science techniques can be applied to predict profits for liquor stores and provides valuable insights for marketing decisions. Future work will aim to improve the model's performance and interpretability by trying different models.