<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google/meridian/blob/main/demo/Meridian_Getting_Started.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/google/meridian/blob/main/demo/Meridian_Getting_Started.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

# **TireRack Meridian Implementation**

Welcome to the Meridian demo for TireRack. This simplified demo showcases the fundamental functionality and basic usage of the library, with working examples of the major modeling steps:


<ol start="0">
  <li><a href="#install">Install</a></li>
  <li><a href="#load-data">Load the data</a></li>
  <li><a href="#configure-model">Configure the model</a></li>
  <li><a href="#model-diagnostics">Run model diagnostics</a></li>
  <li><a href="#generate-summary">Generate model results & two-page output</a></li>
  <li><a href="#generate-optimize">Run budget optimization & two-page output</a></li>
  <li><a href="#save-model">Save the model object</a></li>
</ol>


This notebook uses synthetic data generated for TireRack. As a result, the numbers and results might not reflect what you encounter when working with a real dataset.

<a name="install"></a>
## Step 0: Install

In [None]:
github_token = 'Your-Github-Token: ghp_...'  #@param {type: "string"}

In [None]:
# Authenticate with Google Cloud. This is not supported in Colab Enterprise, where a WARNING is expected and can be ignored.
from google.colab import auth
import google.auth
from google.auth import impersonated_credentials
import google.auth.transport.requests

auth.authenticate_user()
source_credentials, _ = google.auth.default()
request = google.auth.transport.requests.Request()
source_credentials.refresh(request)
source_credentials.apply(headers = {'user-agent': 'cloud-solutions/meridian-mmm-deploy-v1.0'})
source_credentials.refresh(request)
if source_credentials.valid:
  print('Authenticated')
else:
  print('Authentication failed')

# Set your Google Cloud project ID
project_id = 'Your project ID'  # Replace with your project ID #@param {type: "string"}
!gcloud config set project {project_id}

# GCS bucket to store Meridian Processing Files
bucket_name = 'Your Bucket name ex. meridian_data'  #@param {type: "string"}


1\. Install the latest version of Meridian and verify that a GPU is available.

In [None]:
# Install meridian
!pip install --upgrade git+https://{github_token}@github.com/google/meridian.git

# (Optional) To use the simulated data, clone the Meridian repo.
!git clone https://{github_token}@github.com/google/meridian.git

In [None]:
import numpy as np
import pandas as pd
import pandas_gbq
import tensorflow as tf
import tensorflow_probability as tfp
import arviz as az

import IPython

from meridian import constants
from meridian.data import load
from meridian.data import test_utils
from meridian.model import model
from meridian.model import spec
from meridian.model import prior_distribution
from meridian.analysis import optimizer
from meridian.analysis import analyzer
from meridian.analysis import visualizer
from meridian.analysis import summarizer
from meridian.analysis import formatter

from google.colab import files
from google.cloud import storage

# Check the available RAM and whether a GPU is available
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
print("Num CPUs Available: ", len(tf.config.experimental.list_physical_devices('CPU')))

Your runtime has 89.6 gigabytes of available RAM

Num GPUs Available:  1
Num CPUs Available:  1


In [None]:
# Common helper functions to execute, check, and validate
from google.api_core.client_info import ClientInfo

def check_bucket_exists(bucket_name):
  """Return True if the GCS bucket exists in the configured project."""
  storage_client = storage.Client(project=project_id, client_info=ClientInfo(user_agent='cloud-solutions/meridian-mmm-deploy-v1.0'))
  return storage_client.bucket(bucket_name).exists()

In [None]:
# Upload a CSV file from your local machine to Colab, then copy the raw file to GCS.
uploaded = files.upload()
uploaded_file_name = list(uploaded.keys())[0]
file_to_upload = '/content/' + uploaded_file_name

if not check_bucket_exists(bucket_name):
    !gsutil mb -l us-central1 gs://{bucket_name}

# Quote both paths so that filenames with spaces or parentheses do not break the shell command.
!gsutil cp "{file_to_upload}" "gs://{bucket_name}/{uploaded_file_name}"

print('Uploaded files in GCS')
!gsutil ls gs://{bucket_name}


Saving BD-MMM-Modified-6.csv to BD-MMM-Modified-6 (1).csv
/bin/bash: -c: line 1: syntax error near unexpected token `('
/bin/bash: -c: line 1: `gsutil cp /content/BD-MMM-Modified-6 (1).csv gs://meridian_data/BD-MMM-Modified-6 (1).csv'
Uploaded files in GCS
gs://meridian_data/BD-MMM-Modified-3.csv
gs://meridian_data/BD-MMM-Modified-4.csv
gs://meridian_data/BD-MMM-Modified-5.csv
gs://meridian_data/BD-MMM-Modified.csv.orig
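
The syntax error in the output above comes from the uploaded filename `BD-MMM-Modified-6 (1).csv`: bash interprets the unquoted space and parentheses. One way to guard against this is to quote the paths before handing them to the shell. The following is a minimal sketch using the standard library; `build_gsutil_cp_command` is a hypothetical helper, not part of Meridian or the notebook.

```python
import shlex


def build_gsutil_cp_command(local_path: str, bucket: str, object_name: str) -> str:
    """Return a `gsutil cp` command with both paths quoted for the shell."""
    destination = f"gs://{bucket}/{object_name}"
    return f"gsutil cp {shlex.quote(local_path)} {shlex.quote(destination)}"


# The filename from the failing run above, including the space and parentheses.
command = build_gsutil_cp_command(
    "/content/BD-MMM-Modified-6 (1).csv",
    "meridian_data",
    "BD-MMM-Modified-6 (1).csv",
)
print(command)
```

The resulting string can be passed to `subprocess.run(command, shell=True)` or to a `!` cell. Renaming the file before upload so that it has no spaces or parentheses avoids the problem entirely.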


<a name="load-data"></a>
## Step 1: Load the data

Load the TireRack or the simulated dataset in CSV format as follows.

1\. Map the column names to their corresponding variable types. For example, the column names 'GQV', 'Discount', and 'Competitor_Sales' are mapped to `controls`. The required variable types are `time`, `controls`, `population`, `kpi`, `revenue_per_kpi`, `media` and `spend`. For the definition of each variable, see
[Collect and organize your data](https://developers.google.com/meridian/docs/user-guide/collect-data).

In [None]:
# A mapping between the desired and actual column names in the input data.

""" Notes:
Column Definitions:
   `geo`, `time`, `kpi`, `revenue_per_kpi`, `population` (single column)
   `controls` (multiple columns)
   (1) `media`, `media_spend` (multiple columns)
   (2) `reach`, `frequency`, `rf_spend` (multiple columns)

   - Time is sequential, in ascending date order.
   - The control variables use demo values. They measure the effect of these controls on the KPI. With 0 as the value, the model does not converge.
   - revenue_per_kpi: the TireRack file has multiple revenue-per-KPI columns. All of them were added to represent this column.
   - Column names in the CSV should match coord_to_columns, see below.
   - The TireRack data here does not contain reach and frequency. Including them is recommended for better analysis.
   - Only 6 channels are used for the TireRack data. The model does not converge using the 9 channels in the CSV file.
   - Details on the data definitions and shapes are needed to automate this step.
"""
coord_to_columns = load.CoordToColumns(
    time='time',
    geo='geo',
    controls=['GQV', 'Discount', 'Competitor_Sales'],
    population='population',
    kpi='conversions',
    revenue_per_kpi='revenue_per_conversion',
    media=[
        'facebook_impression',
        'faceboo_whatapp_spend',
        'tiktok_impression',
        'googledv360_impression',
        'tvchannelCNBC_impression',
        'CinemaMovieTeater_impression',
    ],
    media_spend=[
        'facebook_spend',
        'youtube_spend',
        'tiktok_spend',
        'googledv360_spend',
        'tvchannelCNBC_spend',
        'CinemaMovieTeater_spend',
    ],
)
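
Because the column names in the CSV must match the names given in `coord_to_columns`, it can help to check the file header before loading. The sketch below uses only the standard library; `missing_columns` is a hypothetical helper and the sample CSV text is made up for illustration.

```python
import csv
import io


def missing_columns(csv_text: str, required: list[str]) -> list[str]:
    """Return the required column names that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    present = set(header)
    return [name for name in required if name not in present]


# Illustrative data only: a tiny CSV that lacks two of the required columns.
sample_csv = (
    "time,geo,population,conversions,facebook_impression\n"
    "2021-01-25,national,1000,10,500\n"
)
required = [
    "time",
    "geo",
    "population",
    "conversions",
    "revenue_per_conversion",
    "facebook_impression",
    "facebook_spend",
]
missing = missing_columns(sample_csv, required)
print(missing)
```

For a real file, read the first line of the CSV and pass the full list of names used in `coord_to_columns`.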

In [None]:
# Map the media variables and the media spends to the designated channel names intended for display in the two-page HTML output.
correct_media_to_channel = {
    'Channel0_impression': 'facebook',
    'Channel1_impression': 'whatsapp',
    'Channel2_impression': 'Channel_2',
    'Channel3_impression': 'Channel_3',
    'Channel4_impression': 'Channel_4',
    'Channel5_impression': 'Channel_5',
}
correct_media_spend_to_channel = {
    'Channel0_spend': 'Channel_0',
    'Channel1_spend': 'Channel_1',
    'Channel2_spend': 'Channel_2',
    'Channel3_spend': 'Channel_3',
    'Channel4_spend': 'Channel_4',
    'Channel5_spend': 'facebook',
}
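
The keys in these two dictionaries (`Channel0_impression`, `Channel0_spend`, and so on) differ from the column names listed under `media` and `media_spend` in `coord_to_columns`. Assuming the loader expects each mapping key to be one of those column names, a quick consistency check can flag mismatches before loading. This is a minimal sketch; `unmapped_columns` is a hypothetical helper, and the `tiktok` entry in the example mapping is illustrative.

```python
def unmapped_columns(columns, column_to_channel):
    """Compare data column names against the keys of a column-to-channel mapping."""
    column_set = set(columns)
    mapped = set(column_to_channel)
    return {
        "columns_without_channel": [c for c in columns if c not in mapped],
        "keys_without_column": [k for k in column_to_channel if k not in column_set],
    }


# Two of the media columns from coord_to_columns, checked against a partial mapping.
media_columns = ["facebook_impression", "tiktok_impression"]
example_mapping = {
    "Channel0_impression": "facebook",  # key as written in the notebook
    "tiktok_impression": "tiktok",      # illustrative entry that matches a column
}
report = unmapped_columns(media_columns, example_mapping)
print(report)
```

An empty list under both keys means every column has a channel name and every mapping key refers to an existing column.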

In [None]:
# Load the CSV data using CsvDataLoader. Note that csv_path is the path to the data file location.
loader = load.CsvDataLoader(
    csv_path="/content/BD-MMM-Modified-6.csv",
    kpi_type='non_revenue',
    coord_to_columns=coord_to_columns,
    media_to_channel=correct_media_to_channel,
    media_spend_to_channel=correct_media_spend_to_channel,
)
data = loader.load()

# Alternatively, load the CSV data from a GCS path. Running this replaces the `data` object loaded above.
loader = load.CsvDataLoader(
    csv_path="gs://meridian2year101424/Meridian_Data_12.11.24csv.csv",
    kpi_type='non_revenue',
    coord_to_columns=coord_to_columns,
    media_to_channel=correct_media_to_channel,
    media_spend_to_channel=correct_media_spend_to_channel,
)
data = loader.load()

In [None]:
print(data)

InputData(kpi=<xarray.DataArray 'kpi' (geo: 1, time: 156)> Size: 1kB
array([[379693380, 357318530, 414199070, 407065540, 343760220, 425020130,
        457814240, 400064740, 338602500, 357265700, 411695780, 387659650,
        437840220, 439090600, 398970620, 404387070, 392051260, 433768450,
        395985300, 428536480, 379184350, 393646200, 357142080, 418816670,
        415547550, 415660060, 409616540, 364261570, 369517120, 448275460,
        362465660, 370611420, 439771330, 409001630, 380875520, 413250080,
        407499520, 404942700, 358461300, 379114000, 373319400, 376140060,
        417934980, 432121440, 434410270, 358719360, 407049380, 438124060,
        425296900, 386151940, 449264930, 410576160, 368908400, 431871870,
        389534400, 408580350, 406623800, 472384260, 449794270, 432928830,
        438586000, 467027360, 428283040, 398012030, 365261760, 430634140,
        376971870, 364065730, 432132300, 437757570, 385603800, 397800200,
        396950980, 397607070, 425400900, 39

<a name="configure-model"></a>
## Step 2: Configure the model

Meridian uses a Bayesian framework and Markov Chain Monte Carlo (MCMC) algorithms to sample from the posterior distribution.

1\. Initialize the `Meridian` class by passing the loaded data and the customized model specification. One advantage of Meridian is that it can calibrate the model directly through ROI priors, as described in [Media Mix Model Calibration With Bayesian Priors](https://research.google/pubs/media-mix-model-calibration-with-bayesian-priors/). In this example, the ROI priors for all media channels are identical, each represented as Lognormal(0.2, 0.9).

In [None]:
roi_mu = 0.2     # Mu for ROI prior for each media channel.
roi_sigma = 0.9  # Sigma for ROI prior for each media channel.
prior = prior_distribution.PriorDistribution(
    roi_m=tfp.distributions.LogNormal(roi_mu, roi_sigma, name=constants.ROI_M)
)
model_spec = spec.ModelSpec(prior=prior)

mmm = model.Meridian(input_data=data, model_spec=model_spec)
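
To get a feel for what a Lognormal(0.2, 0.9) prior implies about ROI, the sketch below computes a few summary values from the closed-form lognormal formulas, treating 0.2 and 0.9 as the mean and standard deviation of the underlying normal distribution. It uses only the standard library; `lognormal_summary` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def lognormal_summary(mu: float, sigma: float) -> dict:
    """Median, mean, and a central 90% interval of a Lognormal(mu, sigma)."""
    z95 = NormalDist().inv_cdf(0.95)  # standard normal 95th percentile
    return {
        "median": math.exp(mu),
        "mean": math.exp(mu + sigma**2 / 2),
        "p05": math.exp(mu - z95 * sigma),
        "p95": math.exp(mu + z95 * sigma),
    }


summary = lognormal_summary(0.2, 0.9)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The median is exp(0.2), roughly 1.22, so the prior places half of its mass on ROI values above about 1.22 for every channel. Changing `roi_mu` or `roi_sigma` and rerunning the helper shows how the implied range shifts.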



In [None]:
%%time
mmm.sample_prior(500)
mmm.sample_posterior(n_chains=7, n_adapt=500, n_burnin=500, n_keep=1000)




CPU times: user 5min 37s, sys: 3.99 s, total: 5min 41s
Wall time: 5min 43s




For more information about configuring the parameters and using a customized model specification, such as setting different ROI priors for each media channel, see [Configure the model](https://developers.google.com/meridian/docs/user-guide/configure-model).

<a name="model-diagnostics"></a>
## Step 3: Run model diagnostics

After the model is built, you must assess convergence, debug the model if needed, and then assess the model fit.

1\. Assess convergence. Run the following code to generate R-hat statistics. An R-hat close to 1.0 indicates convergence. An R-hat below 1.2 indicates approximate convergence and is a reasonable threshold for many problems.

In [None]:
model_diagnostics = visualizer.ModelDiagnostics(mmm)
model_diagnostics.plot_rhat_boxplot()
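
For intuition about what R-hat measures, the following sketch computes the classic Gelman-Rubin statistic for a single parameter from several chains. It compares between-chain and within-chain variance, so chains that explore the same distribution give a value near 1. This is an illustration under that textbook definition, not necessarily the exact variant that Meridian computes; `gelman_rubin_rhat` and the simulated chains are hypothetical.

```python
import math
import random
from statistics import fmean, variance


def gelman_rubin_rhat(chains):
    """Classic (non-split) R-hat for one parameter; chains have equal length."""
    n = len(chains[0])
    chain_means = [fmean(chain) for chain in chains]
    within = fmean([variance(chain) for chain in chains])  # W
    between = n * variance(chain_means)                    # B
    pooled = (n - 1) / n * within + between / n            # pooled variance estimate
    return math.sqrt(pooled / within)


random.seed(0)
# Four chains drawn from the same distribution: well mixed.
mixed = [[random.gauss(0.0, 1.0) for _ in range(500)] for _ in range(4)]
# Two chains centered on different values: not converged.
stuck = [[random.gauss(center, 1.0) for _ in range(500)] for center in (0.0, 3.0)]

mixed_rhat = gelman_rubin_rhat(mixed)
stuck_rhat = gelman_rubin_rhat(stuck)
print(f"well-mixed chains: {mixed_rhat:.3f}")
print(f"separated chains:  {stuck_rhat:.3f}")
```

The separated chains produce a value well above the 1.2 threshold because the chain means disagree far more than the within-chain spread can explain.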

2\. Assess the model's fit by comparing the expected sales against the actual sales.

In [None]:
model_fit = visualizer.ModelFit(mmm)
model_fit.plot_model_fit()
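
The fit can also be summarized numerically with R-squared, MAPE, and wMAPE, the metrics that appear in the two-page summary. The sketch below uses the standard textbook definitions on made-up numbers; Meridian's own computation may differ in detail, and `fit_metrics` is a hypothetical helper.

```python
def fit_metrics(actual, expected):
    """Return R-squared, MAPE, and wMAPE for paired actual and expected values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((a - e) ** 2 for a, e in zip(actual, expected))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    abs_errors = [abs(a - e) for a, e in zip(actual, expected)]
    return {
        "r_squared": 1 - ss_res / ss_tot,
        # MAPE: average of the per-period percentage errors.
        "mape": sum(err / abs(a) for err, a in zip(abs_errors, actual)) / n,
        # wMAPE: total absolute error divided by total actual volume.
        "wmape": sum(abs_errors) / sum(abs(a) for a in actual),
    }


# Illustrative values only, not taken from the TireRack data.
metrics = fit_metrics(actual=[100, 200, 300], expected=[110, 190, 300])
print(metrics)
```

wMAPE weights each period by its actual volume, so large-volume periods dominate it, whereas MAPE treats every period equally.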



For more information and additional model diagnostics checks, see [Modeling diagnostics](https://developers.google.com/meridian/docs/user-guide/model-diagnostics).

<a name="generate-summary"></a>
## Step 4: Generate model results & two-page output

To export the two-page HTML summary output, initialize the `Summarizer` class with the model object. Then pass in the filename, filepath, start date, and end date to `output_model_results_summary` to run the summary for that time duration and save it to the specified file.

In [None]:
mmm_summarizer = summarizer.Summarizer(mmm)

In [None]:
filepath = '/content/track-output'
start_date = '2021-01-25'
end_date = '2024-01-15'
mmm_summarizer.output_model_results_summary('summary_output.html', filepath, start_date, end_date)
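
Before exporting, it can be useful to confirm that the date window is valid and that the output directory exists. The sketch below does both with the standard library. `prepare_summary_run` is a hypothetical helper, and the weekly period count assumes that the data is weekly.

```python
import tempfile
from datetime import date
from pathlib import Path


def prepare_summary_run(output_dir: str, start_date: str, end_date: str):
    """Validate an ISO date window and make sure the output directory exists."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if start > end:
        raise ValueError(f"start_date {start_date} is after end_date {end_date}")
    directory = Path(output_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # Assumption: weekly data, so periods = whole weeks in the window plus one.
    weekly_periods = (end - start).days // 7 + 1
    return directory, weekly_periods


# A temporary directory stands in for '/content/track-output'.
with tempfile.TemporaryDirectory() as tmp:
    directory, weekly_periods = prepare_summary_run(
        str(Path(tmp) / "track-output"), "2021-01-25", "2024-01-15"
    )
    directory_created = directory.is_dir()
print(weekly_periods)
```

For the dates used above, the window spans 156 weekly periods, which is consistent with the `time: 156` dimension shown when printing the loaded data.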



Here is a preview of the two-page output based on the simulated data:

In [None]:
IPython.display.HTML(filename='/content/track-output/summary_output.html')

| Dataset | R-squared | MAPE | wMAPE |
| --- | --- | --- | --- |
| All Data | 0.8 | 5% | 5% |


For a customized two-page report, model results summary table, and individual visualizations, see [Model results report](https://developers.google.com/meridian/docs/user-guide/generate-model-results-report) and [plot media visualizations](https://developers.google.com/meridian/docs/user-guide/plot-media-visualizations).





<a name="generate-optimize"></a>
## Step 5: Run budget optimization & generate an optimization report

You can choose which scenario to run for the budget allocation. In the default scenario, you find the optimal allocation across channels for a given budget to maximize the return on investment (ROI).

1\. Instantiate the `BudgetOptimizer` class and run the `optimize()` method without any customization to run the library's default Fixed Budget Scenario, which maximizes ROI.

In [None]:
%%time
budget_optimizer = optimizer.BudgetOptimizer(mmm)
optimization_results = budget_optimizer.optimize()

CPU times: user 3min 16s, sys: 5.01 s, total: 3min 21s
Wall time: 3min 18s
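
To illustrate the idea behind a fixed budget scenario, the toy sketch below distributes a fixed budget across channels in small steps, always funding the channel with the largest incremental outcome. With a fixed total spend, maximizing the outcome also maximizes ROI. This is not Meridian's optimization algorithm, and the square-root response curves are hypothetical.

```python
import math


def greedy_allocate(budget: int, step: int, response_fns):
    """Allocate a fixed budget step by step to the channel with the best marginal gain."""
    spend = [0] * len(response_fns)
    for _ in range(budget // step):
        gains = [fn(s + step) - fn(s) for fn, s in zip(response_fns, spend)]
        best = max(range(len(gains)), key=gains.__getitem__)
        spend[best] += step
    return spend


# Hypothetical concave response curves with diminishing returns.
response_fns = [
    lambda x: 2.0 * math.sqrt(x),  # stronger channel
    lambda x: 1.0 * math.sqrt(x),  # weaker channel
]
allocation = greedy_allocate(budget=100, step=1, response_fns=response_fns)
print(allocation)
```

Because both curves show diminishing returns, the stronger channel receives most of the budget but not all of it: at some point the next unit of spend earns more in the weaker channel.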


2\. Export the two-page HTML optimization report, which contains the optimized spend allocations and ROI.

In [None]:
filepath = '/content/track-output'
optimization_results.output_optimization_summary('optimization_output.html', filepath)

In [None]:
IPython.display.HTML(filename='/content/track-output/optimization_output.html')

| Channel | Non-optimized spend | Optimized spend |
| --- | --- | --- |
| Shopping | 97% | 96% |
| Search | 1% | 2% |
| Spotify | 0% | 0% |
| Facebook | 0% | 0% |
| SirusXM | 0% | 0% |
| MNTN | 0% | 0% |


For information about customized optimization scenarios, such as flexible budget scenarios, see [Budget optimization scenarios](https://developers.google.com/meridian/docs/user-guide/budget-optimization-scenarios). For more information about optimization results summary and individual visualizations, see [optimization results output](https://developers.google.com/meridian/docs/user-guide/generate-optimization-results-output) and [optimization visualizations](https://developers.google.com/meridian/docs/user-guide/plot-optimization-visualizations).

<a name="save-model"></a>
## Step 6: Save the model object

We recommend that you save the model object for future use. This avoids repeated model runs and saves time and computational resources. After the model object is saved, you can load it later to continue the analysis or visualizations without re-running the model.


Run the following code to save the model object:

In [None]:
file_path='/content/track-output/saved_mmm.pkl'
model.save_mmm(mmm, file_path)

Run the following code to load the saved model:

In [None]:
mmm = model.load_mmm(file_path)
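
Saving and loading can be combined into a "load if present, otherwise fit and save" pattern so that a notebook re-run skips sampling when a saved model already exists. The sketch below shows the pattern with `pickle` as a stand-in for `model.save_mmm` and `model.load_mmm`; `load_or_fit` and `expensive_fit` are hypothetical names.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_fit(path: str, fit_fn):
    """Load a cached object from `path`, or call `fit_fn` and cache its result."""
    file = Path(path)
    if file.exists():
        with file.open("rb") as handle:
            return pickle.load(handle)
    result = fit_fn()
    file.parent.mkdir(parents=True, exist_ok=True)
    with file.open("wb") as handle:
        pickle.dump(result, handle)
    return result


calls = []


def expensive_fit():
    """Stand-in for the sampling step; records each time it actually runs."""
    calls.append(1)
    return {"status": "fitted"}


with tempfile.TemporaryDirectory() as tmp:
    cache_path = str(Path(tmp) / "track-output" / "saved_mmm.pkl")
    first = load_or_fit(cache_path, expensive_fit)   # fits and saves
    second = load_or_fit(cache_path, expensive_fit)  # loads from disk
print(len(calls))
```

In the notebook itself, the same structure would wrap the sampling calls in the fit function and use `model.save_mmm` and `model.load_mmm` in place of the `pickle` calls.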