# 

## **1. Preliminary Data Exploration** 📊🔍

Getting a feel for the data is an important first step: it lets us understand the data, identify potential issues, and start thinking about how to structure the data for modeling. For the assignment, this step also helps in choosing a modeling approach 🤔💡.

In [None]:
# Import Libraries

import pandas as pd

In [None]:
# Load Data
original_df = pd.read_csv('../data/orders_autumn_2020.csv')

# Create a copy of the original dataframe
df = original_df.copy()

In [None]:
# Check columns and data types
df.info()

**🔍 Observation:** There are 18706 data points in the dataset with 13 features. There also seem to be some missing values in the dataset 🚫📉.

In [None]:
# Check missing values
df.isna().sum()

**🔍 Observation**: Some values for the weather 🌤️ features are missing. If we decide to use these features, we will have to handle the missing values (impute or drop them), unless the chosen model handles missing data natively.
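
The two options mentioned above (drop or impute) can be sketched on a toy frame. This is only an illustration under assumptions: the column names `TIMESTAMP`, `TEMPERATURE` and `WIND_SPEED` and all values are made up for the example and may not match the real dataset, and linear interpolation is just one plausible imputation choice for slowly changing weather readings.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data; column names and values are assumptions.
toy = pd.DataFrame(
    {
        "TIMESTAMP": pd.date_range("2020-09-01", periods=6, freq="D"),
        "TEMPERATURE": [15.0, np.nan, 16.0, np.nan, np.nan, 18.0],
        "WIND_SPEED": [3.0, 4.0, np.nan, 5.0, 6.0, 7.0],
    }
)
weather_cols = ["TEMPERATURE", "WIND_SPEED"]

# Option 1: drop every row that has a missing weather value.
dropped = toy.dropna(subset=weather_cols)

# Option 2: impute by interpolating linearly between neighbouring readings,
# after sorting by time so that "neighbouring" means adjacent in time.
imputed = toy.sort_values("TIMESTAMP").copy()
imputed[weather_cols] = imputed[weather_cols].interpolate(
    method="linear", limit_direction="both"
)

print(len(toy), len(dropped))
print(imputed)
```

Dropping is simpler but discards whole orders, while imputing keeps every row at the cost of introducing estimated values.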


In [None]:
# Check for duplicates
df.duplicated().sum()

**🔍 Observation**: There are no duplicate data points in the dataset.

In [None]:
# Get statistical summary of the dataset
df.describe().T

**🔍 Observations**: 
- ⏱️ The data indicates a generally efficient delivery system, with actual delivery times often being shorter than the estimated ones.
- 🌦️ The weather conditions show notable variation, which could impact delivery times.
- 🗺️ The proximity of users to venues (latitude, longitude) suggests a densely populated or urban area, possibly leading to quicker deliveries.


In [None]:
# Check what the dataset looks like
df.head(10)

## **2. Modeling Approach** 🤖📈

After getting a feel for the data, and taking into account what I have learnt about Wolt's business model, I chose the following modeling approach:

### **🔮 "What would be the order demand for the next day?"** 

### **Motivations for the modeling approach:**
Accurate forecasting of order demand supports Wolt's business model in the following ways:

- 💼 **Resource Management:** Ensures optimal allocation of delivery personnel and reduces operational costs. 

- 👩‍🍳 **Partner Restaurants Efficiency:** Helps restaurants prepare for demand, improving food quality and reducing waste. 

- 💰 **Dynamic Pricing:** Enables effective dynamic pricing and promotional strategies to manage demand. 

- 📈 **Financial Planning:** Essential for revenue forecasting and strategic decisions, like market expansion. 

## **3. Exploratory Data Analysis**

- Trend and Seasonality Analysis
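
A minimal sketch of how such an analysis could start, assuming the order log has one row per order and a timestamp column. The names `TIMESTAMP` and `ITEM_COUNT` and the six toy orders are assumptions for illustration only. Orders are aggregated into a daily demand series, a rolling mean gives a rough trend, and the day-of-week average gives a rough weekly seasonality profile.

```python
import pandas as pd

# Toy order log: one row per order. Column names and values are assumptions.
orders = pd.DataFrame(
    {
        "TIMESTAMP": pd.to_datetime(
            [
                "2020-09-01 11:05", "2020-09-01 18:30", "2020-09-02 12:10",
                "2020-09-04 19:45", "2020-09-04 20:15", "2020-09-04 21:00",
            ]
        ),
        "ITEM_COUNT": [2, 1, 3, 1, 4, 2],
    }
)

# Daily demand = number of orders per calendar day (days without orders count as 0).
daily = (
    orders.set_index("TIMESTAMP")
    .resample("D")["ITEM_COUNT"]
    .count()
    .rename("orders")
)

# Rough trend: centred rolling mean. The window is 3 only because the toy series
# is short; a 7-day window would be a natural choice for real daily data.
trend = daily.rolling(window=3, center=True, min_periods=1).mean()

# Rough weekly seasonality: average demand per day of week (0 = Monday).
weekday_profile = daily.groupby(daily.index.dayofweek).mean()

print(daily)
print(trend)
print(weekday_profile)
```

On the real data, the same daily series could then be passed to a dedicated decomposition routine; the rolling mean here is only a quick first look.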

## **4. Model Selection**

Given the nature of the data, time series forecasting models are a good choice: once the orders are aggregated over time, the data forms a time series of demand, and we want to predict the demand for the next day.

> ### **The following models are considered for the approach:**

- **📉 SARIMAX (Baseline):** A good baseline model for our data, since it can account for both seasonality and external factors. It is also easier to interpret and explain than black-box models.

- 🚀 **XGBoost:** XGBoost is a powerful ensemble machine learning model based on gradient-boosted trees. It can be applied to time series forecasting by framing the problem as a regression task, and it can capture complex patterns and dependencies in the data.

- **🧠 LSTM:** LSTM is a good choice for time series forecasting, as it can capture non-linearities in the data.
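
To use a regression-style model such as XGBoost for next-day forecasting, the daily series first has to be reshaped into a supervised table. One common way to do this, sketched here under assumptions, is to use lagged values of the series as features and to split the data chronologically rather than at random. The helper `make_lag_features`, the chosen lags (1 and 7 days) and the toy values are all hypothetical and only illustrate the idea.

```python
import pandas as pd

# Toy daily demand series; the values are invented for illustration.
daily = pd.Series(
    [120, 135, 128, 150, 170, 210, 190, 125, 140, 131],
    index=pd.date_range("2020-09-01", periods=10, freq="D"),
    name="orders",
)


def make_lag_features(series, lags=(1, 7)):
    """Build a supervised table: target y plus lagged values and day of week."""
    frame = pd.DataFrame({"y": series})
    for lag in lags:
        # shift(lag) moves past values forward, so each row only sees the past.
        frame[f"lag_{lag}"] = series.shift(lag)
    frame["dayofweek"] = frame.index.dayofweek
    # The first max(lags) rows have no complete history and are dropped.
    return frame.dropna()


supervised = make_lag_features(daily)

# Chronological split: train on the earlier rows, hold out the last day.
train, test = supervised.iloc[:-1], supervised.iloc[-1:]

print(supervised)
```

The columns other than `y` would serve as model inputs. Keeping the split chronological avoids leaking future information into training.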


## **5. Feature Engineering**

- Hourly, daily and weekly decomposition
- Calculate the distance between venue and delivery location
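
The distance feature could, for example, be computed with the haversine formula, which gives the great-circle distance between two latitude/longitude points. This is a sketch under assumptions: the function name `haversine_km` is hypothetical, the Earth is treated as a sphere with a mean radius of 6371 km, and the result is the straight-line distance, not the actual route a courier would travel.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of latitude along a meridian is roughly 111 km.
print(round(haversine_km(0.0, 0.0, 1.0, 0.0), 2))
```

Applied row by row to the user and venue coordinate columns (whatever they are called in the dataset), this would give one distance value per order.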

## **6. Modeling**

### **6.1 SARIMAX (Baseline)**

### **6.2 XGBoost**

### **6.3 LSTM**

## **7. Evaluation**

## **8. Further Development**