# Dynamic pricing

* A pricing strategy that adjusts the price of a product or service in real time
* Based on factors such as:
  * Market demand
  * Competition
  * Customer behaviour
* Goal is to maximize profit
  * By finding the optimal price point that customers are willing to pay
  * While accounting for the costs of producing and selling the product


## Algorithms

### Rule-based
Uses pre-defined rules to set prices, e.g. raise the price when demand is high and lower it when demand is low.
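
A minimal sketch of such a rule, assuming demand is summarised as a ratio of current demand to typical demand. The function name, the thresholds (`high`, `low`) and the 10% `step` are hypothetical choices for illustration, not values from this notebook.

```python
def rule_based_price(base_price, demand_ratio, high=1.2, low=0.8, step=0.10):
    """Adjust a base price with a simple demand rule.

    demand_ratio: current demand divided by typical demand (hypothetical input).
    """
    if demand_ratio > high:      # demand is high: raise the price
        return round(base_price * (1 + step), 2)
    if demand_ratio < low:       # demand is low: lower the price
        return round(base_price * (1 - step), 2)
    return base_price            # demand is normal: keep the price


# High, normal and low demand for a product with a base price of 100
prices = [rule_based_price(100.0, ratio) for ratio in (1.5, 1.0, 0.5)]
print(prices)
```

Rules like this are easy to audit, but the thresholds have to be tuned by hand.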

### Time-series forecasting
Uses historical data to forecast future demand and adjust prices accordingly. Takes into account seasonal patterns and trends to optimize pricing.
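
As a rough illustration, the sketch below forecasts demand per weekday from made-up daily sales and nudges the price up or down in proportion to the expected demand. The sales figures, `base_price` and `sensitivity` are assumptions, and averaging by weekday is only one very simple forecasting method.

```python
import pandas as pd

# Hypothetical daily unit sales for three weeks with a weekend peak
sales = pd.Series(
    [10, 12, 11, 13, 20, 30, 28] * 3,
    index=pd.date_range("2024-01-01", periods=21, freq="D"),  # starts on a Monday
)

# Forecast for a weekday = average historical sales on that weekday
weekday_mean = sales.groupby(sales.index.dayofweek).mean()
overall_mean = sales.mean()


def forecast_price(date, base_price=50.0, sensitivity=0.5):
    """Scale the base price by how far expected demand is from average demand."""
    expected = weekday_mean.loc[pd.Timestamp(date).dayofweek]
    uplift = sensitivity * (expected / overall_mean - 1)
    return round(base_price * (1 + uplift), 2)


saturday_price = forecast_price("2024-01-27")  # a Saturday
monday_price = forecast_price("2024-01-22")    # a Monday
print(saturday_price, monday_price)
```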

### Machine Learning algorithm
Uses historical data to learn patterns and adjusts prices based on predictive models. Techniques can include regression analysis, decision trees, and neural networks.

### Reinforcement Learning algorithm
Uses trial and error to optimize pricing. The algorithm makes decisions based on feedback from customer behaviour and adjusts prices to maximize profit.
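
One way to make this concrete is tabular Q-learning against a simulated market. Everything in the sketch below is an assumption for illustration: the two demand states, the three candidate prices, the unit cost, the purchase probabilities inside `step`, and the learning parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
prices = np.array([8.0, 10.0, 12.0])  # hypothetical price actions
n_states = 2                          # 0 = low demand, 1 = high demand
unit_cost = 5.0                       # hypothetical cost per unit


def step(state, action):
    """Simulated market: return the profit and the next demand state."""
    buy_prob = (0.9 if state == 1 else 0.6) - 0.05 * (prices[action] - 8.0)
    sold = rng.random() < buy_prob
    reward = (prices[action] - unit_cost) if sold else 0.0
    next_state = int(rng.integers(n_states))  # demand state changes at random
    return reward, next_state


Q = np.zeros((n_states, len(prices)))  # estimated value of each price per state
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate
state = 0
for _ in range(5000):
    # Explore with probability epsilon, otherwise pick the best known price
    if rng.random() < epsilon:
        action = int(rng.integers(len(prices)))
    else:
        action = int(np.argmax(Q[state]))
    reward, next_state = step(state, action)
    # Q-learning update towards reward plus discounted best future value
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

policy = prices[Q.argmax(axis=1)]  # learned price for each demand state
print(policy)
```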

### Multi-armed bandit algorithm
Used in situations where there are multiple products or pricing options. Uses trial and error to test different pricing strategies and learns which ones are most effective in maximizing profit.
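
A minimal epsilon-greedy sketch, treating each candidate price as one arm. The prices, the purchase probabilities in `true_buy_prob` (used only to simulate customers) and `epsilon` are made-up values; other bandit strategies such as UCB or Thompson sampling would follow the same loop structure.

```python
import numpy as np

rng = np.random.default_rng(42)
prices = np.array([9.0, 12.0, 15.0])       # hypothetical price options (arms)
true_buy_prob = np.array([0.8, 0.7, 0.3])  # simulation only, unknown to the algorithm

counts = np.zeros(len(prices))        # how often each price was offered
mean_revenue = np.zeros(len(prices))  # running average revenue per offer
epsilon = 0.1                         # share of rounds spent exploring

for _ in range(10_000):
    # Explore a random price with probability epsilon, otherwise exploit the best so far
    if rng.random() < epsilon:
        arm = int(rng.integers(len(prices)))
    else:
        arm = int(np.argmax(mean_revenue))
    # Revenue is the price if the simulated customer buys, otherwise zero
    revenue = prices[arm] * (rng.random() < true_buy_prob[arm])
    counts[arm] += 1
    mean_revenue[arm] += (revenue - mean_revenue[arm]) / counts[arm]

best_price = prices[int(np.argmax(mean_revenue))]
print(best_price, mean_revenue.round(2))
```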


In [1]:
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

## Join datasets

* Brazilian E-commerce data in multiple CSVs

In [6]:
# Read multiple CSV files
FILE_PATH = "/workspaces/dynamic_pricing/data/"

order_items    = pd.read_csv(FILE_PATH + "olist_order_items_dataset.csv")
orders         = pd.read_csv(FILE_PATH + "olist_orders_dataset.csv")
order_payments = pd.read_csv(FILE_PATH + "olist_order_payments_dataset.csv")
products       = pd.read_csv(FILE_PATH + "olist_products_dataset.csv")
customers      = pd.read_csv(FILE_PATH + "olist_customers_dataset.csv")
sellers        = pd.read_csv(FILE_PATH + "olist_sellers_dataset.csv")
product_category_translation = pd.read_csv(FILE_PATH + "product_category_name_translation.csv")

# Merge datasets
merged = order_items.merge(orders, on='order_id') \
                    .merge(order_payments, on='order_id') \
                    .merge(products, on='product_id') \
                    .merge(customers, on='customer_id') \
                    .merge(sellers, on='seller_id') \
                    .merge(product_category_translation, on='product_category_name')

# Save the consolidated dataset to a CSV file
merged.to_csv(FILE_PATH + 'brazilian_ecommerce_dataset.csv', index=False)
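
One thing to watch in a chain of merges like the one above: if a table has several rows per key (for example, more than one payment row for an order), joining it to order items multiplies the item rows. The toy frames below are invented to show the effect and one possible guard, using the `validate` argument of `DataFrame.merge`.

```python
import pandas as pd

# Toy data: order "a" has two items and two payment rows
items = pd.DataFrame({"order_id": ["a", "a", "b"], "price": [10.0, 20.0, 5.0]})
payments = pd.DataFrame({"order_id": ["a", "a", "b"], "payment_value": [15.0, 15.0, 5.0]})

# Plain merge: 2 items x 2 payments for "a" plus 1 x 1 for "b"
joined = items.merge(payments, on="order_id")
print(len(items), len(joined))

# Collapse payments to one row per order, then let pandas check the key is unique
payments_per_order = payments.groupby("order_id", as_index=False)["payment_value"].sum()
safe = items.merge(payments_per_order, on="order_id", validate="many_to_one")
print(len(safe))
```

With `validate="many_to_one"`, pandas raises a `MergeError` if the right-hand table has duplicate keys, so accidental row duplication surfaces early.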

### Load merged Brazilian E-commerce dataset

* Preprocess data
* Drop columns
  * seller_id
  * freight_value
* customer_state restricted to SP, RJ, MG
* Limit to columns
  * product_category_name
  * product_photos_qty
  * product_weight_g
  * product_length_cm
  * product_height_cm
  * product_width_cm
  * customer_state
  * price
* Remove price outliers
* Convert categorical variables to numerical values

In [10]:
# Load the dataset
df = pd.read_csv(FILE_PATH + "brazilian_ecommerce_dataset.csv")

# Preprocess the data
# Drop columns that are not used as features
df = df.drop(['seller_id', 'freight_value'], axis=1)
# Keep only customers in the states SP, RJ and MG
df = df[df['customer_state'].isin(["SP", "RJ", "MG"])]
# Keep the feature columns and the target column (price)
df = df[['product_category_name', 'product_photos_qty', 'product_weight_g',
         'product_length_cm', 'product_height_cm', 'product_width_cm',
         'customer_state', 'price']]
# Drop rows with missing values
df = df.dropna()

# Remove price outliers: keep prices between the 5th and 95th percentiles (inclusive)
low, high = df['price'].quantile([0.05, 0.95])
df = df[df['price'].between(low, high)]

# Convert categorical variables to numerical values
df = pd.get_dummies(df, columns=['product_category_name', 'customer_state'])

## Train-test split

In [None]:
# Split the data into training (80%) and testing (20%) sets
X = df.drop('price', axis=1)
y = df['price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## Build and train models to predict pricing

### 1. Random Forest Regressor

In [11]:
# Train a random forest with 100 trees (random_state fixes the seed for reproducibility)
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

### Evaluate model

* Evaluate using Mean Absolute Error (MAE): the average absolute difference between predicted and actual prices, in the same units as `price`

In [13]:
# Evaluate the model
y_pred = rf.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
print(f"Mean absolute error: {mae:.3f}")

Mean absolute error: 13.381
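
An MAE value is easier to judge next to a naive baseline. The sketch below shows the comparison on synthetic data, since it has to run on its own. The features, target and model sizes are invented, and `DummyRegressor` (which always predicts the training median here) stands in for "no model at all". With the notebook's own split, the same two `mean_absolute_error` calls would apply.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: the target depends on the first feature plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 3))
y = 50 + 40 * X[:, 0] + rng.normal(0, 2, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Baseline that ignores the features and always predicts the training median
baseline = DummyRegressor(strategy="median").fit(X_train, y_train)
model = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_train, y_train)

baseline_mae = mean_absolute_error(y_test, baseline.predict(X_test))
model_mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"Baseline MAE: {baseline_mae:.3f}, model MAE: {model_mae:.3f}")
```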


### Predict on new data

* 'product_category_name_beleza_saude' (health & beauty product) = True
* 'product_photos_qty' = 2.0
* 'product_weight_g' = 500.0
* 'product_length_cm' = 20.0
* 'product_height_cm' = 10.0
* 'product_width_cm' = 30.0
* 'customer_state_SP' = True
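
The values above can be turned into a single-row input as sketched below. To keep the example runnable on its own, it trains a small forest on a made-up stand-in for the one-hot encoded training frame. The extra category and state columns and all training values are assumptions. The key step is `reindex`, which aligns the new row with the training columns and sets every unspecified dummy column to False.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Made-up stand-in for the one-hot encoded training data
X_train = pd.DataFrame({
    "product_photos_qty": [1.0, 2.0, 3.0, 1.0],
    "product_weight_g": [300.0, 500.0, 1200.0, 800.0],
    "product_length_cm": [16.0, 20.0, 40.0, 30.0],
    "product_height_cm": [5.0, 10.0, 20.0, 15.0],
    "product_width_cm": [12.0, 30.0, 35.0, 25.0],
    "product_category_name_beleza_saude": [True, True, False, False],
    "product_category_name_esporte_lazer": [False, False, True, True],
    "customer_state_RJ": [False, False, True, False],
    "customer_state_SP": [True, True, False, True],
}).astype(float)  # cast booleans to floats so all columns share one dtype
y_train = pd.Series([40.0, 60.0, 150.0, 90.0])
rf = RandomForestRegressor(n_estimators=10, random_state=42).fit(X_train, y_train)

# New item described by the values listed above
new_item = pd.DataFrame([{
    "product_category_name_beleza_saude": True,
    "product_photos_qty": 2.0,
    "product_weight_g": 500.0,
    "product_length_cm": 20.0,
    "product_height_cm": 10.0,
    "product_width_cm": 30.0,
    "customer_state_SP": True,
}])

# Match the training column order; dummy columns not listed become False
new_item = new_item.reindex(columns=X_train.columns, fill_value=False).astype(float)

predicted_price = rf.predict(new_item)[0]
print(f"Predicted price: {predicted_price:.2f}")
```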