# ML Zoomcamp 2024 Competition

## Problem Definition
__Business Problem__ : <br/>
__Goal__ : Forecast customer demand based on historical sales and related data from a retailer.

- Data Collection
  - Code: Import and preview data.
- Data Cleaning
  - Code: Handle missing values, outliers, etc.
  - Outputs: Summary statistics.
- EDA
  - Code: Generate visualizations (e.g., pair plots, histograms).
  - Outputs: Insights with markdown commentary.
- Feature Engineering
  - Code: Create and transform features.
  - Outputs: Updated dataset snapshot.
- Model Training and Evaluation
  - Code: Train models, compare metrics.
  - Outputs: Forecast error metrics (e.g., RMSE).
- Model Deployment Simulation
  - Markdown: Describe how the model would be deployed.
  - Code: Simple simulation (e.g., predict new data).
- Monitoring and Maintenance
  - Markdown: Discuss strategies for ongoing updates.

## Data Collection

### Import Data

In [6]:

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

from tabulate import tabulate
import qgrid as qg

# Set display options
pd.set_option('display.max_rows', 10)      # Max number of rows to display
pd.set_option('display.max_columns', 5)   # Max number of columns to display
pd.set_option('display.width', 100)       # Adjust the total display width
pd.set_option('display.colheader_justify', 'center')  # Align column headers

In [5]:
root_dir = '../data/raw/'
df_actual_matrix = pd.read_csv(root_dir + 'actual_matrix.csv')
df_catalog = pd.read_csv(root_dir + 'catalog.csv')
df_discounts_history = pd.read_csv(root_dir + 'discounts_history.csv')
df_markdowns = pd.read_csv(root_dir + 'markdowns.csv')
df_online = pd.read_csv(root_dir + 'online.csv')
df_price_history = pd.read_csv(root_dir + 'price_history.csv')
df_sales = pd.read_csv(root_dir + 'sales.csv')
df_stores = pd.read_csv(root_dir + 'stores.csv')

# (Kaggle only) test set and sample submission
df_sample_submission = pd.read_csv(root_dir + 'sample_submission.csv')
df_test = pd.read_csv(root_dir + 'test.csv')
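
The tables loaded above can be previewed together by collecting each DataFrame's shape and missing-value count in one summary. This is only a sketch: `summarize_frames` is a hypothetical helper, and the toy frames (including the `quantity` column) are assumptions standing in for the real CSV files.

```python
import pandas as pd


def summarize_frames(frames):
    """Return one row per DataFrame with its row count, column count, and missing cells."""
    rows = []
    for name, df in frames.items():
        rows.append({
            "table": name,
            "rows": df.shape[0],
            "columns": df.shape[1],
            "missing_cells": int(df.isna().sum().sum()),
        })
    return pd.DataFrame(rows).set_index("table")


# Toy frames for illustration only; in the notebook the dictionary could hold
# the loaded tables instead, e.g. {"stores": df_stores, "sales": df_sales}.
toy_frames = {
    "stores": pd.DataFrame({"store_id": [1, 2], "area": [100.0, None]}),
    "sales": pd.DataFrame({"store_id": [1, 1, 2], "quantity": [5, 3, 2]}),
}
summary = summarize_frames(toy_frames)
print(summary)
```

A single summary table makes it easier to spot which files need cleaning before looking at each one in detail.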

### Exploratory Data Analysis

Questions I would like to explore while working toward the goal of forecasting customer demand from historical data:

- How do holidays affect sales?
- How does store area relate to sales?
- How do online sales compare with storefront sales?
- Are some items marked down more often than others?
- Which items sell the most and the least, and when and in which stores?
- Do all stores carry the same products?
- Does the area/location of a store matter?
- How do seasons affect sales?
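
One of the questions above, whether all stores carry the same products, could be approached with a couple of group-bys. The sketch below runs on a toy table, not the competition data: `store_id` appears in the stores file, but `item_id` and `quantity` are assumed column names that may differ in `sales.csv`.

```python
import pandas as pd

# Toy stand-in for the sales table; `item_id` and `quantity` are assumed names.
toy_sales = pd.DataFrame({
    "store_id": [1, 1, 2, 2, 2, 3],
    "item_id": ["a", "b", "a", "b", "c", "a"],
    "quantity": [5, 3, 2, 7, 1, 4],
})

# Number of distinct items sold by each store.
items_per_store = toy_sales.groupby("store_id")["item_id"].nunique()

# Items sold in every store: count the distinct stores per item and
# keep the items whose count equals the total number of stores.
n_stores = toy_sales["store_id"].nunique()
stores_per_item = toy_sales.groupby("item_id")["store_id"].nunique()
common_items = sorted(stores_per_item[stores_per_item == n_stores].index)

print(items_per_store.to_dict())
print(common_items)
```

If the per-store counts differ widely, per-store features or per-store models may be worth considering later on.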


In [7]:
#Stores

print(f'Store Data Shape: {df_stores.shape}\n')
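# Build the tables first: reusing the same quote type inside an f-string
# is a SyntaxError on Python versions before 3.12.
stores_description = tabulate(df_stores.describe(), headers='keys', tablefmt='pretty')
stores_head = tabulate(df_stores.head(), headers='keys', tablefmt='pretty')
print(f'Store Data Description: \n{stores_description}')
print(f'Store Data Head(): \n{stores_head}')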

#qgrid_store_widget = qg.show_grid(df_stores, show_toolbar=True)
#qgrid_store_widget

Store Data Shape: (4, 6)

Store Data Description: 
+-------+--------------------+--------------------+-------------------+
|       |     Unnamed: 0     |      store_id      |       area        |
+-------+--------------------+--------------------+-------------------+
| count |        4.0         |        4.0         |        4.0        |
| mean  |        1.5         |        2.5         |       926.5       |
|  std  | 1.2909944487358056 | 1.2909944487358056 | 900.5814788235432 |
|  min  |        0.0         |        1.0         |       109.0       |
|  25%  |        0.75        |        1.75        |      184.75       |
|  50%  |        1.5         |        2.5         |       855.0       |
|  75%  |        2.25        |        3.25        |      1596.75      |
|  max  |        3.0         |        4.0         |      1887.0       |
+-------+--------------------+--------------------+-------------------+
Store Data Head(): 
+---+------------+----------+----------+------------------+------
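
The `Unnamed: 0` column in the summary above is what pandas typically produces when a CSV was saved with its index and read back without `index_col`. A minimal sketch of two ways to handle it, using an in-memory CSV with made-up values rather than the real `stores.csv`:

```python
import io

import pandas as pd

# Made-up CSV text whose first column has an empty header, like a saved index.
csv_text = ",store_id,area\n0,1,100\n1,2,200\n"

# Default read: the unnamed first column shows up as "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))

# Option 1: treat the first column as the index while reading.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Option 2: drop the column after reading.
dropped = raw.drop(columns=["Unnamed: 0"])

print(list(indexed.columns), list(dropped.columns))
```

Either option keeps the leftover index out of `describe()` and out of any feature matrix built later.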

### Visualize Data

### Feature Engineering

## Model Training

### Model Evaluation

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
$$
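
As a minimal sketch, the RMSE defined above can be computed with NumPy. The function name `rmse` and the sample values are illustrative assumptions, not part of the competition code.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # Square the residuals, average them, then take the square root.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Illustrative values only (not competition data).
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 7.0]
score = rmse(actual, predicted)
print(f"RMSE: {score:.4f}")
```

Because the errors are squared before averaging, a few large misses raise the score more than many small ones, which is worth keeping in mind when comparing models on items with occasional demand spikes.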


## Model Deployment