# SuperKart Project

## **Problem Statement**

### **Business Context:**

A sales forecast predicts future sales revenue based on historical data, industry trends, and the status of the current sales pipeline. Businesses use the sales forecast to estimate weekly, monthly, quarterly, and annual sales totals. A company needs an accurate sales forecast because it adds value across the organization and helps different verticals plan their future course of action. Forecasting helps an organization plan its sales operations by region and gives the supply chain team valuable insight into the procurement of goods and materials.
An accurate sales forecasting process has many benefits, including better decisions about the future and lower sales pipeline and forecast risk. It also reduces the time spent planning territory coverage and establishes benchmarks that can be used to assess future trends.


### **Objective:**

You have been hired as an MLOps Engineer, and your task is to build an automated MLOps pipeline with CI/CD that delivers accurate and reliable sales forecasts. The objective is to leverage historical sales data, industry trends, and the current pipeline status to predict weekly, monthly, quarterly, and annual revenues. By automating data ingestion, preprocessing, model training, evaluation, and deployment, the pipeline will ensure scalability, consistency, and minimal manual intervention. With CI/CD integration, forecasts will be continuously updated and seamlessly deployed, enabling different business verticals to plan sales operations by region, optimize supply chain procurement, reduce risks in sales pipelines, and establish benchmarks for future trend analysis. Ultimately, this solution will enhance decision-making, streamline planning efforts, and drive operational efficiency and business growth.

### **Data Dictionary:**

The data contains the different attributes of the various products and stores.

- **Product_Id**: Unique identifier of each product; starts with two letters followed by a number.
- **Product_Weight**: Weight of each product.
- **Product_Sugar_Content**: Sugar content of the product (low sugar, regular, no sugar).
- **Product_Allocated_Area**: Ratio of the display area allocated to the product relative to the total display area of all products in a store.
- **Product_Type**: Broad product category such as meat, snack foods, hard drinks, dairy, canned, soft drinks, health and hygiene, baking goods, bread, breakfast, frozen foods, fruits and vegetables, household, seafood, starchy foods, and others.
- **Product_MRP**: Maximum retail price of the product.
- **Store_Id**: Unique identifier of each store.
- **Store_Establishment_Year**: Year in which the store was established.
- **Store_Size**: Size of the store based on square footage (high, medium, low).
- **Store_Location_City_Type**: Type of city where the store is located (Tier 1, Tier 2, Tier 3); Tier 1 cities have a higher standard of living compared to Tier 2 and Tier 3.
- **Store_Type**: Type of store based on products sold (Departmental Store, Supermarket Type 1, Supermarket Type 2, Food Mart).
- **Product_Store_Sales_Total**: Total revenue generated from the sale of a particular product in a specific store.
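
The constraints in the data dictionary can be checked before any record enters the pipeline. The following is a minimal sketch using only the standard library. The exact spelling and casing of the categorical values in the real CSV, and the assumption that `Product_Allocated_Area` lies between 0 and 1, are not confirmed by the source; `validate_row` is a hypothetical helper name.

```python
import re

# Allowed values follow the data dictionary; the exact spelling and casing
# used in the real CSV is an assumption and may need adjusting.
SUGAR_LEVELS = {"Low Sugar", "Regular", "No Sugar"}
STORE_SIZES = {"High", "Medium", "Low"}
CITY_TYPES = {"Tier 1", "Tier 2", "Tier 3"}

# "Starts with two letters followed by a number"
PRODUCT_ID_PATTERN = re.compile(r"^[A-Za-z]{2}\d+$")


def validate_row(row):
    """Return a list of problems found in one record (empty list if none)."""
    problems = []

    if not PRODUCT_ID_PATTERN.match(str(row.get("Product_Id", ""))):
        problems.append("Product_Id must be two letters followed by a number")

    if row.get("Product_Sugar_Content") not in SUGAR_LEVELS:
        problems.append("unexpected Product_Sugar_Content")

    if row.get("Store_Size") not in STORE_SIZES:
        problems.append("unexpected Store_Size")

    if row.get("Store_Location_City_Type") not in CITY_TYPES:
        problems.append("unexpected Store_Location_City_Type")

    # Assumption: the allocated area is a ratio, so it should fall in [0, 1].
    try:
        area = float(row.get("Product_Allocated_Area", ""))
        if not 0.0 <= area <= 1.0:
            problems.append("Product_Allocated_Area must be between 0 and 1")
    except (TypeError, ValueError):
        problems.append("Product_Allocated_Area is not numeric")

    return problems
```

Rows read with `csv.DictReader` can be passed to `validate_row` one at a time, and any row with a non-empty result can be logged or rejected before training.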


# Model Building

In [None]:
# Create a master folder to keep all files created when executing the below code cells
import os
os.makedirs("tourism_project", exist_ok=True)

In [None]:
# Create a folder for storing the model building files
os.makedirs("tourism_project/model_building", exist_ok=True)

## Data Registration

In [None]:
os.makedirs("tourism_project/data", exist_ok=True)

Once the **data** folder has been created by executing the above cell, please upload **tourism.csv** into the folder.
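
Registering the data usually means pushing the folder to a dataset repository on the Hugging Face Hub, which is what the "Upload Dataset to Hugging Face Hub" workflow step later in this notebook refers to. The sketch below assumes the `huggingface_hub` package is installed; `register_dataset` and the repository ID are hypothetical names, not part of the original project. The client is injectable so the function can be exercised without network access.

```python
def register_dataset(folder_path, repo_id, token, api=None):
    """Create the dataset repo if it does not exist and upload a local folder.

    `api` is optional and exists only so a stub client can be passed in tests;
    by default a real `huggingface_hub.HfApi` client is created.
    """
    if api is None:
        # Imported lazily so the module loads even without huggingface_hub.
        from huggingface_hub import HfApi

        api = HfApi(token=token)

    # exist_ok=True makes the call safe to repeat on every pipeline run.
    api.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)

    # Upload everything inside the local data folder to the dataset repo.
    api.upload_folder(
        folder_path=folder_path,
        repo_id=repo_id,
        repo_type="dataset",
    )
    return repo_id
```

A typical call would look like `register_dataset("tourism_project/data", "<your-username>/<your-dataset>", os.environ["HF_TOKEN"])`, with the token read from the environment rather than written into the notebook.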

## Data Preparation
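
One common preparation step is splitting the records into training and test sets. The sketch below is an illustration under stated assumptions, not the project's prescribed method: it uses a seeded random split because the data dictionary lists no date column, and `train_test_split_rows` is a hypothetical helper written with the standard library only.

```python
import random


def train_test_split_rows(rows, test_size=0.2, seed=42):
    """Split a list of records into (train, test) with a reproducible shuffle."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")

    # Shuffle positions rather than the rows themselves, so the input list
    # is left untouched and the original order is preserved within each split.
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)

    n_test = int(round(len(rows) * test_size))
    test_positions = set(indices[:n_test])

    train = [row for i, row in enumerate(rows) if i not in test_positions]
    test = [row for i, row in enumerate(rows) if i in test_positions]
    return train, test
```

Fixing the seed keeps the split identical across pipeline runs, which makes metrics from different training runs comparable.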

## Model Training and Registration with Experimentation Tracking
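
Experiment tracking can be sketched as follows, assuming the `mlflow` package is installed and a tracking server is reachable at `http://localhost:5000` (the port used by the "Start MLflow Server" step in the workflow later in this notebook). The experiment name, the choice of RMSE as the metric, and the helper names `rmse` and `log_run` are assumptions for illustration; model registration is not covered here.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def log_run(params, metrics,
            tracking_uri="http://localhost:5000",
            experiment="superkart-sales-forecast"):
    """Record one training run's parameters and metrics in MLflow."""
    # Imported lazily so rmse() stays usable without mlflow installed.
    import mlflow

    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment(experiment)
    with mlflow.start_run():
        mlflow.log_params(params)    # e.g. {"n_estimators": 200}
        mlflow.log_metrics(metrics)  # e.g. {"rmse": 123.4}
```

After fitting a model, a call such as `log_run({"n_estimators": 200}, {"rmse": rmse(y_test, predictions)})` would record the run; the parameter shown is a placeholder, not a recommended setting.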

# Deployment

## Dockerfile

In [None]:
os.makedirs("tourism_project/deployment", exist_ok=True)

In [None]:
%%writefile tourism_project/deployment/Dockerfile
# Use the official Python 3.9 base image (the full image, not the slim variant)
FROM python:3.9

# Set the working directory inside the container to /app
WORKDIR /app

# Copy all files from the current directory on the host to the container's /app directory
COPY . .

# Install Python dependencies listed in requirements.txt
RUN pip3 install -r requirements.txt

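# Create a non-root user with UID 1000 and switch to it
RUN useradd -m -u 1000 user
USER user

# Point HOME at the new user's home directory and put user-level binaries on PATH
ENV HOME=/home/user \
	PATH=/home/user/.local/bin:$PATH

# Run the app from the user's home directory
WORKDIR $HOME/app

# Copy the project files again, this time owned by the non-root user
COPY --chown=user . $HOME/app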

# Define the command to run the Streamlit app on port "8501" and make it accessible externally
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0", "--server.enableXsrfProtection=false"]

Writing tourism_project/deployment/Dockerfile


## Streamlit App

Please ensure that the web app script is named `app.py`.
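
A minimal sketch of what `app.py` might contain is shown below, assuming the `streamlit` package is installed. Which fields the form collects, the default values, and the helper name `build_input_row` are assumptions; loading the trained model and calling it are project-specific and are left as a placeholder.

```python
# Category values are taken from the data dictionary.
STORE_TYPES = ["Departmental Store", "Supermarket Type 1",
               "Supermarket Type 2", "Food Mart"]
CITY_TYPES = ["Tier 1", "Tier 2", "Tier 3"]


def build_input_row(weight, mrp, allocated_area, store_type, city_type):
    """Assemble one model input record from the form values."""
    if store_type not in STORE_TYPES:
        raise ValueError(f"unknown store type: {store_type}")
    if city_type not in CITY_TYPES:
        raise ValueError(f"unknown city type: {city_type}")
    return {
        "Product_Weight": float(weight),
        "Product_MRP": float(mrp),
        "Product_Allocated_Area": float(allocated_area),
        "Store_Type": store_type,
        "Store_Location_City_Type": city_type,
    }


def main():
    # Imported here so build_input_row() can be tested without Streamlit.
    import streamlit as st

    st.title("SuperKart Sales Forecast")
    weight = st.number_input("Product weight", min_value=0.0, value=10.0)
    mrp = st.number_input("Product MRP", min_value=0.0, value=100.0)
    area = st.number_input("Allocated display area (ratio)",
                           min_value=0.0, max_value=1.0, value=0.05)
    store_type = st.selectbox("Store type", STORE_TYPES)
    city_type = st.selectbox("City type", CITY_TYPES)

    if st.button("Predict"):
        row = build_input_row(weight, mrp, area, store_type, city_type)
        # Placeholder: load the trained model and display its prediction here.
        st.write(row)

# In app.py, end the file with a call to main() so Streamlit renders the form.
```

Keeping the record-building logic in a plain function separates it from the UI code, so it can be unit tested without starting the app.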

## Dependency Handling

Please ensure that the dependency handling file is named `requirements.txt`.

# Hosting

# MLOps Pipeline with GitHub Actions Workflow

**Note:**

1. Before running the file below, make sure to add the HF_TOKEN to your GitHub secrets to enable authentication between GitHub and Hugging Face.
2. The below code is for a sample YAML file that can be updated as required to meet the requirements of this project.

```yaml
name: SuperKart Project Pipeline

on:
  push:
    branches:
      - main  # Automatically triggers on push to the main branch

jobs:

  register-dataset:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install Dependencies
        run: <add_code_here>
      - name: Upload Dataset to Hugging Face Hub
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: <add_code_here>

  data-prep:
    needs: register-dataset
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install Dependencies
        run: <add_code_here>
      - name: Run Data Preparation
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: <add_code_here>


  model-training:
    needs: data-prep
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install Dependencies
        run: <add_code_here>
      - name: Start MLflow Server
        run: |
          nohup mlflow ui --host 0.0.0.0 --port 5000 &  # Run MLflow UI in the background
          sleep 5  # Wait a moment to let the server start
      - name: Model Building
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: <add_code_here>


  deploy-hosting:
    runs-on: ubuntu-latest
    needs: [model-training, data-prep, register-dataset]
    steps:
      - uses: actions/checkout@v3
      - name: Install Dependencies
        run: <add_code_here>
      - name: Push files to Frontend Hugging Face Space
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: <add_code_here>

```

**Note:** To use this YAML file for our use case, we need to:

1. Go to the GitHub repository for the project
2. Create a folder named ***.github/workflows/***
3. In the above folder, create a file named ***pipeline.yml***
4. Copy and paste the above content for the YAML file into the ***pipeline.yml*** file
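
The same folder and file can also be created from the notebook before pushing, instead of through the GitHub web interface. The following is a minimal sketch using `pathlib`; `write_workflow` is a hypothetical helper, and the YAML text passed to it is whatever final version of the workflow you settle on.

```python
from pathlib import Path


def write_workflow(repo_root, workflow_yaml, filename="pipeline.yml"):
    """Write the workflow text to <repo_root>/.github/workflows/<filename>."""
    workflow_dir = Path(repo_root) / ".github" / "workflows"

    # parents=True creates .github as well; exist_ok=True allows re-runs.
    workflow_dir.mkdir(parents=True, exist_ok=True)

    target = workflow_dir / filename
    target.write_text(workflow_yaml, encoding="utf-8")
    return target
```

GitHub only picks up workflow files located under `.github/workflows/` at the root of the repository, so `repo_root` should be the cloned repository folder itself.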

## Requirements file for the GitHub Actions Workflow

## GitHub Authentication and Push Files

* Before moving forward, we need to generate a secret token to push files directly from Colab to the GitHub repository.
* Please follow the below instructions to create the GitHub token:
    - Open your GitHub profile.
    - Click on ***Settings***.
    - Go to ***Developer Settings***.
    - Expand the ***Personal access tokens*** section and select ***Tokens (classic)***.
    - Click ***Generate new token***, then choose ***Generate new token (classic)***.
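    - Add a note and select the required scopes: `repo` to push to the repository, and `workflow` to push files under `.github/workflows/`.
    - Click ***Generate token***.
    - Copy the generated token and store it somewhere safe; GitHub displays it only once.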
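
Rather than pasting the token into a notebook cell, where it would be saved with the notebook, it can be read at run time and used to build the push URL. The sketch below uses only the standard library; `prompt_for_token` and `build_push_url` are hypothetical helper names, and the URL layout mirrors the `git push` command used later in this section.

```python
from getpass import getpass
from urllib.parse import quote


def prompt_for_token():
    """Ask for the token interactively without echoing it to the output."""
    return getpass("GitHub personal access token: ")


def build_push_url(username, token, repo_name):
    """Build an HTTPS remote URL that embeds the username and token."""
    # Percent-encode each part so special characters cannot break the URL.
    user = quote(username, safe="")
    secret = quote(token, safe="")
    repo = quote(repo_name, safe="")
    return f"https://{user}:{secret}@github.com/{user}/{repo}.git"
```

In a notebook, the result can be stored in a variable and passed to the shell, for example `url = build_push_url(user, prompt_for_token(), repo)` followed by `!git push {url}`.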

In [None]:
# Install Git
!apt-get install -y git

# Set your Git identity (replace with your details)
!git config --global user.email "<-------GitHub Email Address------->"
!git config --global user.name "<--------GitHub UserName--------->"

# Clone your GitHub repository
!git clone https://github.com/<--------GitHub UserName--------->/<--------GitHub Reponame--------->.git

# Move your folder to the repository directory
!mv /content/tourism_project/ /content/<--------GitHub Reponame--------->

In [None]:
# Change directory to the cloned repository
%cd <--------GitHub Reponame--------->/

# Add the new folder to Git
!git add .

# Commit the changes
!git commit -m "first commit"

# Push to GitHub (GitHub does not accept account passwords for Git over HTTPS, so authenticate with the personal access token generated above)
!git push https://<--------GitHub UserName--------->:<--------GitHub Token--------->@github.com/<--------GitHub UserName--------->/<--------GitHub Reponame--------->.git

# Output Evaluation

- GitHub (link to repository, screenshot of folder structure and executed workflow)

- Streamlit on Hugging Face (link to HF space, screenshot of Streamlit app)
