## Media Mix Modelling 
Here, MMM is implemented using `lightweight_mmm`. 

**Overview:**
Media mix models (MMM) are statistical models used to estimate the return on investment (ROI) of individual media channels and to optimize how the budget is allocated across them. The inputs are the `media spends` per channel (independent variables) and `sales/revenue` (dependent variable).

This exercise uses a [dataset available on Kaggle](https://www.kaggle.com/datasets/mediaearth/traditional-and-digital-media-impact-on-sales/data).
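
Media mix models usually transform raw spend before relating it to sales, so that carry-over effects (adstock) and diminishing returns (saturation) are represented. The sketch below illustrates both ideas with a geometric adstock and a Hill curve. It is an illustration under stated assumptions, not the implementation used by `lightweight_mmm`; the function names, parameter values, and sample spends are hypothetical.

```python
import numpy as np


def geometric_adstock(spend, decay):
    """Add a fraction `decay` of the previous period's effect to each period."""
    spend = np.asarray(spend, dtype=float)
    out = np.zeros_like(spend)
    carry = 0.0
    for t, x in enumerate(spend):
        carry = x + decay * carry  # current spend plus decayed carry-over
        out[t] = carry
    return out


def hill_saturation(x, half_saturation, slope):
    """Map non-negative spend to [0, 1) with diminishing returns."""
    x = np.asarray(x, dtype=float)
    return x**slope / (half_saturation**slope + x**slope)


# Hypothetical spend series for a single channel
spend = [100.0, 0.0, 0.0, 50.0]
adstocked = geometric_adstock(spend, decay=0.5)
saturated = hill_saturation(adstocked, half_saturation=50.0, slope=1.0)
print(adstocked)
print(saturated)
```

The adstocked series stays positive in periods with zero spend, which is the carry-over effect; the Hill curve returns 0.5 when the adstocked spend equals `half_saturation`.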

### Importing Required Libraries

In [2]:
import pandas as pd

# Access datasets from Kaggle
import kagglehub  # Interface to download datasets from Kaggle
from kagglehub import KaggleDatasetAdapter  # Adapter to interact with Kaggle datasets

# Import jax.numpy and any other library we might need.
import jax.numpy as jnp
import numpyro

# Import the relevant modules of the library
from lightweight_mmm import lightweight_mmm
from lightweight_mmm import optimize_media
from lightweight_mmm import plot
from lightweight_mmm import preprocessing
from lightweight_mmm import utils

### Importing Data

In [3]:
# Load the dataset from Kaggle using the KaggleHub adapter in pandas format
data = kagglehub.dataset_load(
    KaggleDatasetAdapter.PANDAS,
    "mediaearth/traditional-and-digital-media-impact-on-sales",  # Dataset slug
    "mediamix_sales.csv"  # Specific file to load
)

# Convert the 'Time' column to datetime format and store it in a new column 'Date'
data['Date'] = pd.to_datetime(data['Time'], format='%d/%m/%y')

# Drop the original 'Time' column as it's now redundant
data = data.drop(columns=['Time'])

# Remove any duplicate rows, then sort chronologically
data = data.drop_duplicates()
data = data.sort_values(by='Date', ascending=True)

# Display the first few rows of the cleaned dataset
data.head()

| Unnamed: 0 | tv_sponsorships | tv_cricket | tv_RON | radio | NPP | Magazines | OOH | Social | Programmatic | Display_Rest | Search | Native | sales | Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 119.652 | 66.729 | 43.719 | 37.8 | 55.36 | 13.84 | 35 | 41.8782 | 5 | 33.50256 | 26.802048 | 5 | 22100 | 2001-01-01 |
| 1 | 23.14 | 12.905 | 8.455 | 39.3 | 36.08 | 9.02 | 35 | 8.099 | 5 | 6.4792 | 5.18336 | 6 | 10400 | 2001-02-01 |
| 2 | 8.944 | 4.988 | 3.268 | 45.9 | 55.44 | 13.86 | 35 | 3.1304 | 5 | 2.50432 | 2.003456 | 7 | 9300 | 2001-03-01 |
| 3 | 78.78 | 43.935 | 28.785 | 41.3 | 46.8 | 11.7 | 35 | 27.573 | 5 | 22.0584 | 17.64672 | 5 | 18500 | 2001-04-01 |
| 4 | 94.016 | 52.432 | 34.352 | 10.8 | 46.72 | 11.68 | 35 | 32.9056 | 5 | 26.32448 | 21.059584 | 7 | 12900 | 2001-05-01 |
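
The `format='%d/%m/%y'` argument used when parsing `Time` states that the day comes first, which matters for strings that could be read either way. A small standard-library sketch using the same format codes; the sample string and helper names are hypothetical and not taken from the dataset.

```python
from datetime import datetime


def parse_day_first(text):
    """Parse 'dd/mm/yy', the layout passed to pd.to_datetime above."""
    return datetime.strptime(text, "%d/%m/%y").date()


def parse_month_first(text):
    """Parse 'mm/dd/yy', shown only for contrast."""
    return datetime.strptime(text, "%m/%d/%y").date()


sample = "01/02/01"  # hypothetical ambiguous value
day_first = parse_day_first(sample)
month_first = parse_month_first(sample)
print(day_first.isoformat(), month_first.isoformat())
```

With monthly data, a wrong day/month order would turn consecutive months into consecutive days of January, so checking the parsed `Date` column against the expected cadence is a cheap safeguard.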


In [9]:
# Media variables (spend per channel)
media_cols = ['tv_sponsorships', 'tv_cricket', 'tv_RON', 'radio', 'NPP', 'Magazines',
              'OOH', 'Social', 'Programmatic', 'Display_Rest', 'Search', 'Native']

# Dependent variable
dv = ['sales']

In [10]:
SEED = 42  # Random seed for reproducible sampling
data_size = len(data)

n_media_channels = len(media_cols)
n_extra_features = 0  # No control variables are used in this exercise
media_data = data[media_cols].to_numpy()
target = data['sales'].to_numpy()
# Total spend per channel over the whole period, used below as the media prior
costs = data[media_cols].sum().to_numpy()

# Split and scale data: hold out the last 24 periods for testing.
test_data_period_size = 24
split_point = data_size - test_data_period_size
# Media data
media_data_train = media_data[:split_point, ...]
media_data_test = media_data[split_point:, ...]

# Target
target_train = target[:split_point]

# Scale each series by its mean; the media and target scalers are fitted on the training split
media_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean)
target_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean)
cost_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean, multiply_by=0.15)

media_data_train = media_scaler.fit_transform(media_data_train)
target_train = target_scaler.fit_transform(target_train)
costs = cost_scaler.fit_transform(costs)


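Dividing each series by its mean, as requested through `divide_operation=jnp.mean`, puts every channel on a comparable scale with a training mean of 1. The NumPy sketch below mirrors that idea, assuming the scaler computes `multiply_by * data / mean` with statistics taken from the training rows only. The helper names are hypothetical and this is not the library's own code; the sample values are the first four rows of `tv_sponsorships` and `radio` shown earlier.

```python
import numpy as np


def fit_mean_scaler(train):
    """Return the per-column mean of the training data."""
    return np.mean(train, axis=0)


def transform(data, divide_by, multiply_by=1.0):
    """Scale data by the fitted means."""
    return multiply_by * np.asarray(data, dtype=float) / divide_by


def inverse_transform(scaled, divide_by, multiply_by=1.0):
    """Undo the scaling to get back to original units."""
    return np.asarray(scaled, dtype=float) * divide_by / multiply_by


media = np.array([
    [119.652, 37.8],
    [23.14, 39.3],
    [8.944, 45.9],
    [78.78, 41.3],
])
train, test = media[:3], media[3:]

divide_by = fit_mean_scaler(train)          # statistics from training rows only
train_scaled = transform(train, divide_by)
test_scaled = transform(test, divide_by)    # reuse the training statistics
restored = inverse_transform(train_scaled, divide_by)
print(train_scaled.mean(axis=0))
```

Reusing the training statistics on the held-out rows keeps the test period from influencing the scaling.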
# Fit the model with adstock carry-over and Hill saturation
mmm = lightweight_mmm.LightweightMMM(model_name="hill_adstock")
mmm.fit(
    media=media_data_train,
    media_prior=costs,
    target=target_train,
    media_names=media_cols,
    seed=SEED,
)


TypeError: asarray() got an unexpected keyword argument 'copy'
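
The `TypeError` above comes from an `asarray()` call that receives a `copy` keyword it does not recognise. One plausible cause, offered as an assumption rather than a confirmed diagnosis, is a mismatch between the installed versions of `jax`, `numpy`, and the packages built on them. A minimal sketch for recording the installed versions before troubleshooting; the helper name and the package list are illustrative.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package list is an assumption about what is relevant to this error
versions = installed_versions(["jax", "jaxlib", "numpy", "numpyro", "lightweight_mmm"])
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```

Comparing these versions against the requirements published for `lightweight_mmm` is a reasonable first step before changing any code.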