<a href="https://colab.research.google.com/github/Dannah-TMP/MMM/blob/main/demo/Meridian_RF_Demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google/meridian/blob/main/demo/Meridian_RF_Demo.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/google/meridian/blob/main/demo/Meridian_RF_Demo.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

# **Meridian Reach and Frequency Demo**

Welcome to the Meridian end-to-end demo for Reach and Frequency data. This simplified demo showcases the fundamental functionalities and basic usage of the library on data containing Reach and Frequency channels, including working examples of the major modeling steps:


<ol start="0">
  <li><a href="#install">Install</a></li>
  <li><a href="#load-data">Load the data</a></li>
  <li><a href="#configure-model">Configure the model</a></li>
  <li><a href="#model-diagnostics">Run model diagnostics</a></li>
  <li><a href="#generate-summary">Generate model results & two-page output</a></li>
  <li><a href="#generate-optimize">Run budget optimization & two-page output</a></li>
  <li><a href="#save-model">Save the model object</a></li>
</ol>


Note that this notebook skips all of the exploratory data analysis and preprocessing steps. It assumes that you have completed these tasks before reaching this point in the demo.

This notebook utilizes sample data. As a result, the numbers and results obtained might not accurately reflect what you encounter when working with a real dataset.

<a name="install"></a>
## Step 0: Install

1\. Make sure you are using one of the available GPU Colab runtimes, which are **required** to run Meridian. You can change your notebook's runtime under `Runtime > Change runtime type` in the menu. All users can use the T4 GPU runtime free of charge, and it is sufficient to run this demo. Users with one of Colab's paid plans also have access to premium GPUs (such as NVIDIA V100, A100, or L4).

2\. Install the latest version of Meridian, and verify that a GPU is available.

In [None]:
# Install meridian: from PyPI @ latest release
!pip install --upgrade google-meridian[colab,and-cuda]

# Install meridian: from PyPI @ specific version
# !pip install google-meridian[colab,and-cuda]==1.0.3

# Install meridian: from GitHub @HEAD
# !pip install --upgrade "google-meridian[colab,and-cuda] @ git+https://github.com/google/meridian.git"

In [None]:
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_probability as tfp
import arviz as az

import IPython

from meridian import constants
from meridian.data import load
from meridian.data import test_utils
from meridian.model import model
from meridian.model import spec
from meridian.model import prior_distribution
from meridian.analysis import optimizer
from meridian.analysis import analyzer
from meridian.analysis import visualizer
from meridian.analysis import summarizer
from meridian.analysis import formatter

# Report available RAM and check whether TensorFlow can see a GPU
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
print("Num CPUs Available: ", len(tf.config.experimental.list_physical_devices('CPU')))

<a name="load-data"></a>
## Step 1: Load the data

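This notebook loads a campaign-level CSV from Google Drive (`/content/drive/MyDrive/MMM_Meridian.csv`), reshapes it into one row per region and week, and saves the result as `cleaned_meridian.csv`. (The upstream demo uses the [simulated dataset in CSV format](https://github.com/google/meridian/blob/main/meridian/data/simulated_data/csv/geo_media_rf.csv) instead.) Load the data as follows.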

1\. Map the column names to their corresponding variable types. In this notebook, for example, the columns 'GQV' and 'Impression_share' are mapped to `controls`. The required variable types are `time`, `controls`, `population`, `kpi`, `revenue_per_kpi`, `media`, and `media_spend`. If your data includes organic media or non-media treatments, you can add them using the `organic_media` and `non_media_treatments` arguments. For the definition of each variable, see [Collect and organize your data](https://developers.google.com/meridian/docs/user-guide/collect-data). Before the mapping is defined, the next two cells mount Google Drive and clean and reshape the raw export so that each variable has its own column.

In [None]:

# Import the library to mount Google Drive
from google.colab import drive
# Mount the Google Drive at /content/drive
drive.mount('/content/drive')

In [None]:
import numpy as np
import pandas as pd

# Google Drive is assumed to be mounted at /content/drive (see the previous cell).

# 1. Load the CSV file
file_path = "/content/drive/MyDrive/MMM_Meridian.csv"
df = pd.read_csv(file_path)

# 2. Clean column names (remove spaces and dots)
df.columns = df.columns.str.strip()                      # Remove extra spaces
df.columns = df.columns.str.replace('.', '', regex=False)  # Remove periods
df.columns = df.columns.str.replace(' ', '_')            # Replace spaces with underscores

# 3. Clean and convert numeric columns: strip thousands separators, invalid values -> NaN
for col in ['Impression', 'Conversions', 'Cost']:
    df[col] = pd.to_numeric(df[col].astype(str).str.replace(',', '', regex=False), errors='coerce')
df['Week_(Mon_to_Sun)'] = pd.to_datetime(df['Week_(Mon_to_Sun)'], format='%d/%m/%Y', errors='coerce')
# Keep the datetime format for now to ensure consistent weekly intervals
# df['Week_(Mon_to_Sun)'] = df['Week_(Mon_to_Sun)'].dt.strftime('%Y-%m-%d') # Convert later if needed

df["Campaign_type"] = df["Campaign_type"].replace('Performance_Max', 'PMAX', regex=True)
df["Campaign_type"] = df["Campaign_type"].replace('Demand_Gen', 'DemandGen', regex=True)

print(df["Campaign_type"].unique())

# Add cleaning for GQV and Impression_share here
# Assuming GQV might have commas or non-numeric characters
if 'GQV' in df.columns:
    df['GQV'] = df['GQV'].astype(str).str.replace(',', '', regex=False)
    df['GQV'] = pd.to_numeric(df['GQV'], errors='coerce') # Use coerce to turn errors into NaN

# Assuming Impression_share might be percentages like '10%'
if 'Impression_share' in df.columns:
    df['Impression_share'] = df['Impression_share'].astype(str).str.replace('%', '', regex=False)
    df['Impression_share'] = pd.to_numeric(df['Impression_share'], errors='coerce') # Use coerce

# 4. Create a new column for Conversion Rate
# Note: this rate is campaign-specific, based on the original data structure.
# Rows with zero impressions get NaN instead of a division by zero (np.nan keeps the column float64).
df['Conversion_Rate'] = df['Conversions'] / df['Impression'].replace(0, np.nan)

# 5. Select needed columns, including new ones (assumed to be non-campaign specific)
df_subset = df[['Week_(Mon_to_Sun)', 'Region_(Matched)', 'Campaign_type',
                'Impression', 'Conversions', 'Conversion_Rate', 'Cost',
                'GQV', 'Population', 'Impression_share', 'Revenue']]

# 6. Create a pivot table for campaign-type metrics
# Group by week, region, and campaign type, then unstack Campaign_type into columns.
# Conversion rates are left out because summing per-row rates is not meaningful;
# derive them from the summed Conversions and Impression columns if needed.
pivot_table_processed = df_subset.groupby(['Week_(Mon_to_Sun)', 'Region_(Matched)', 'Campaign_type'])[
    ['Impression', 'Conversions', 'Cost']
].sum().unstack(fill_value=0)  # fill_value=0 covers campaign types missing in a week/region

# Flatten the column names
pivot_table_processed.columns = [f"{col[1]}_{col[0]}" for col in pivot_table_processed.columns]

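# 6b. Aggregate non-campaign-specific columns separately
# GQV, Population and Impression_share are assumed to repeat on every campaign row of a
# week/region, so take one value instead of summing (a sum would multiply them by the
# number of campaign rows). Revenue is summed; switch it to 'first' as well if your
# export repeats the same revenue on every campaign row.
extras = df_subset.groupby(['Week_(Mon_to_Sun)', 'Region_(Matched)']).agg(
    {'GQV': 'first', 'Population': 'first', 'Impression_share': 'first', 'Revenue': 'sum'}
)  # Stay grouped for easier merge later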

total_conversions = df_subset.groupby(['Week_(Mon_to_Sun)', 'Region_(Matched)'])[ 'Conversions' ].sum().rename('Total_Conversions') # Stay grouped and rename

# Combine processed pivot table and extras
combined_df = pivot_table_processed.join([extras, total_conversions], how='left')

# 7. Reset index to turn grouped columns into regular columns
combined_df = combined_df.reset_index()
combined_df = combined_df.rename(columns={'Week_(Mon_to_Sun)': 'Week'})

# 8. Ensure complete time series and region combinations

# Get all unique regions
all_regions = combined_df['Region_(Matched)'].unique()

# Get the min and max dates to create a full date range
min_date = combined_df['Week'].min()
max_date = combined_df['Week'].max()

# Create a complete weekly date range.
# 'W-MON' assumes every week in the data starts on a Monday ("Week (Mon to Sun)").
full_date_range = pd.date_range(start=min_date, end=max_date, freq='W-MON')

# Create a complete grid of all week and region combinations
full_grid = pd.MultiIndex.from_product([full_date_range, all_regions], names=['Week', 'Region_(Matched)']).to_frame(index=False)

# Merge the combined data onto the full grid
# Use a left merge to keep all combinations from the full_grid
pivot_table = full_grid.merge(combined_df, on=['Week', 'Region_(Matched)'], how='left')

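# 9. Fill NaN values after ensuring all week/region combinations exist
# Metrics, controls, revenue and the KPI are treated as 0 where a week/region has no rows.
zero_fill_columns = [col for col in pivot_table.columns
                     if any(metric in col for metric in ['Impression', 'Cost', 'Conversions'])]
zero_fill_columns += ['GQV', 'Impression_share', 'Revenue', 'Total_Conversions']

for col in zero_fill_columns:
    if col in pivot_table.columns:
        pivot_table[col] = pivot_table[col].fillna(0)

# Population must stay positive (Meridian uses it to scale the data per capita), so fill
# gaps with the region's own observed value instead of 0.
pivot_table['Population'] = (
    pivot_table.groupby('Region_(Matched)')['Population']
    .transform(lambda s: s.ffill().bfill())
)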

# Recompute each campaign's conversion rate from its filled totals: summed per-row rates
# are not meaningful, and weeks with zero impressions get a rate of 0.
for campaign in df["Campaign_type"].unique():
    imp_col = f"{campaign}_Impression"
    conv_col = f"{campaign}_Conversions"
    cr_col = f"{campaign}_Conversion_Rate"
    if imp_col in pivot_table.columns and conv_col in pivot_table.columns:
        impressions = pivot_table[imp_col]
        pivot_table[cr_col] = (
            pivot_table[conv_col] / impressions.where(impressions != 0)
        ).fillna(0)

# Convert 'Week' back to string format expected by Meridian if necessary, after ensuring regularity
pivot_table['Week'] = pivot_table['Week'].dt.strftime('%Y-%m-%d')

# 10. Final reorder
# Identify columns dynamically
all_cols = pivot_table.columns.tolist()
fixed_cols = ['Week', 'Region_(Matched)']
dynamic_cols = sorted([col for col in all_cols if col not in fixed_cols and col != 'Total_Conversions'])

ordered_columns = fixed_cols + dynamic_cols + ['Total_Conversions']

# Ensure all ordered columns are actually in the dataframe before reordering
ordered_columns = [col for col in ordered_columns if col in pivot_table.columns]
pivot_table = pivot_table[ordered_columns]


# 11. Preview
print(pivot_table.head())
print(pivot_table['Week'].nunique(), "unique weeks")
print(pivot_table['Region_(Matched)'].nunique(), "unique regions")
print(pivot_table.shape, "rows, columns")

# 12. Save to CSV
output_path = '/content/drive/MyDrive/cleaned_meridian.csv'
pivot_table.to_csv(output_path, index=False)
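
The core reshaping step in the cell above turns a long table (one row per week, region, and campaign type) into a wide one (one column per campaign metric). The sketch below reproduces only that `groupby(...).sum().unstack(fill_value=0)` pattern on a tiny, made-up frame; the region names and numbers are hypothetical and serve only to show the resulting column names.

```python
import pandas as pd

# Hypothetical toy data in long format: one row per (week, region, campaign type).
long_df = pd.DataFrame({
    "Week": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-01", "2024-01-08"]),
    "Region": ["North", "North", "South", "North"],
    "Campaign_type": ["Search", "PMAX", "Search", "Search"],
    "Impression": [100, 50, 80, 120],
    "Cost": [10.0, 5.0, 8.0, 12.0],
})

# Sum each metric per (week, region, campaign), then move Campaign_type into the columns.
wide = (
    long_df.groupby(["Week", "Region", "Campaign_type"])[["Impression", "Cost"]]
    .sum()
    .unstack(fill_value=0)
)

# unstack() yields (metric, campaign) column tuples; flatten them to "<campaign>_<metric>".
wide.columns = [f"{campaign}_{metric}" for metric, campaign in wide.columns]
wide = wide.reset_index()
print(wide)
```

`fill_value=0` is what turns a campaign type that did not run in a given week and region into zero impressions and zero cost rather than a missing value.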

2\. Define the column-to-variable mapping from item 1 with `load.CoordToColumns`, then map the media variables and the media spends to the channel names displayed in the two-page HTML output. In the following example, 'Search_Impression' and 'Search_Cost' are connected to the same channel, 'Search'.

In [None]:
coord_to_columns = load.CoordToColumns(
    time='Week',
    geo='Region_(Matched)',
    controls=['GQV', 'Impression_share'],
    population='Population',
    kpi='Total_Conversions',
    # revenue_per_kpi must hold revenue per unit of KPI (here, per conversion),
    # not total revenue for the week.
    revenue_per_kpi='Revenue',
    media=[
        'Search_Impression',
        'PMAX_Impression',
        'Display_Impression',
        'DemandGen_Impression',
    ],
    media_spend=[
        'Search_Cost',
        'PMAX_Cost',
        'Display_Cost',
        'DemandGen_Cost',
    ],
)

In [None]:
correct_media_to_channel = {
    'Search_Impression': 'Search',
    'PMAX_Impression': 'Performance_Max',
    'Display_Impression': 'Display',
    'DemandGen_Impression': 'Demand_Gen',
}
correct_media_spend_to_channel = {
    'Search_Cost': 'Search',
    'PMAX_Cost': 'Performance_Max',
    'Display_Cost': 'Display',
    'DemandGen_Cost': 'Demand_Gen',
}
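
Meridian relies on these channel names to pair each media column with its spend, so a typo in one of the two dictionaries is easy to miss. The helper below is a hypothetical sanity check (not part of Meridian) that compares the column lists passed to `CoordToColumns` with the two channel dictionaries; the names are copied from the cells above.

```python
# Hypothetical sanity check (not a Meridian API): every mapped column has a channel
# name, and media and spend columns use the same set of channel names.
def check_channel_mappings(media_cols, spend_cols, media_to_channel, spend_to_channel):
    problems = [f"media column {c!r} has no channel name" for c in media_cols if c not in media_to_channel]
    problems += [f"spend column {c!r} has no channel name" for c in spend_cols if c not in spend_to_channel]
    media_channels = {media_to_channel[c] for c in media_cols if c in media_to_channel}
    spend_channels = {spend_to_channel[c] for c in spend_cols if c in spend_to_channel}
    if media_channels != spend_channels:
        # Symmetric difference: channel names that appear on only one side.
        problems.append(f"channels without a partner: {sorted(media_channels ^ spend_channels)}")
    return problems

# Column lists and dictionaries copied from the cells above.
media_cols = ['Search_Impression', 'PMAX_Impression', 'Display_Impression', 'DemandGen_Impression']
spend_cols = ['Search_Cost', 'PMAX_Cost', 'Display_Cost', 'DemandGen_Cost']
media_map = {
    'Search_Impression': 'Search',
    'PMAX_Impression': 'Performance_Max',
    'Display_Impression': 'Display',
    'DemandGen_Impression': 'Demand_Gen',
}
spend_map = {
    'Search_Cost': 'Search',
    'PMAX_Cost': 'Performance_Max',
    'Display_Cost': 'Display',
    'DemandGen_Cost': 'Demand_Gen',
}
print(check_channel_mappings(media_cols, spend_cols, media_map, spend_map))
```

An empty list means the mappings line up; any entry points at the column or channel name to fix before loading.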

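3\. Load the CSV data using `CsvDataLoader`. Here `csv_path` points to the cleaned file saved at the end of the data-preparation cell (`cleaned_meridian.csv`), and `kpi_type='non_revenue'` because the KPI is conversions rather than revenue.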

In [None]:
loader = load.CsvDataLoader(
    csv_path='/content/drive/MyDrive/cleaned_meridian.csv',
    kpi_type='non_revenue',
    coord_to_columns=coord_to_columns,
    media_to_channel=correct_media_to_channel,
    media_spend_to_channel=correct_media_spend_to_channel,
)
data = loader.load()
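
Meridian's input arrays are indexed by geo and time, so the cleaned CSV should form a regular panel: one row per region and week, the same weeks for every region, evenly spaced weeks, and a positive population. A minimal pandas check of those properties, written as a hypothetical helper (the function name and its defaults are assumptions that match the cleaned CSV's column names), could look like this:

```python
import pandas as pd

def check_panel(df, time_col="Week", geo_col="Region_(Matched)", population_col="Population"):
    """Hypothetical helper: return a list of panel problems (empty list = looks regular)."""
    problems = []
    # Every (geo, week) pair should appear exactly once.
    dupes = df.duplicated(subset=[geo_col, time_col]).sum()
    if dupes:
        problems.append(f"{dupes} duplicated (geo, week) rows")
    # Every geo should cover the same number of weeks.
    if df.groupby(geo_col)[time_col].nunique().nunique() > 1:
        problems.append("geos cover different numbers of weeks")
    # Consecutive distinct weeks should be exactly 7 days apart.
    weeks = pd.to_datetime(df[time_col]).drop_duplicates().sort_values()
    gaps = weeks.diff().dropna()
    if not (gaps == pd.Timedelta(days=7)).all():
        problems.append("weeks are not evenly spaced 7 days apart")
    # Population should be present and strictly positive.
    if df[population_col].isna().any() or (df[population_col] <= 0).any():
        problems.append("non-positive or missing population values")
    return problems

# Hypothetical two-region, two-week panel.
panel = pd.DataFrame({
    "Week": ["2024-01-01", "2024-01-08", "2024-01-01", "2024-01-08"],
    "Region_(Matched)": ["North", "North", "South", "South"],
    "Population": [1000, 1000, 500, 500],
})
print(check_panel(panel))
```

In the notebook you would run it on the file written in Step 1, for example `check_panel(pd.read_csv('/content/drive/MyDrive/cleaned_meridian.csv'))`.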

The upstream demo's simulated data contains reach and frequency channels, but the mapping above uses only impression-based media channels; no reach or frequency columns are mapped. We recommend including reach and frequency data whenever it is available. For information about the advantages of using reach and frequency, see [Bayesian Hierarchical Media Mix Model Incorporating Reach and Frequency Data](https://research.google/pubs/bayesian-hierarchical-media-mix-model-incorporating-reach-and-frequency-data/#:~:text=By%20incorporating%20R%26F%20into%20MMM,based%20on%20optimal%20frequency%20recommendations.).
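
Reach and frequency are tied to impressions by a simple identity: average frequency = impressions / reach. If reach data became available for a channel, the matching frequency column could be derived as in the sketch below. The `Reach` column and all values are hypothetical (the file used here has no reach data); the upstream demo maps such columns through the `reach`, `frequency`, and `rf_spend` arguments of `CoordToColumns`.

```python
import pandas as pd

# Hypothetical weekly data for one channel; "Reach" does not exist in this notebook's file.
rf = pd.DataFrame({
    "Week": ["2024-01-01", "2024-01-08", "2024-01-15"],
    "Impression": [30000, 0, 45000],
    "Reach": [10000, 0, 9000],
})

# Average frequency = impressions / reach; weeks with zero reach get frequency 0.
rf["Frequency"] = (rf["Impression"] / rf["Reach"].where(rf["Reach"] > 0)).fillna(0)
print(rf)
```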

<a name="configure-model"></a>
## Step 2: Configure the model

Meridian uses a Bayesian framework and Markov chain Monte Carlo (MCMC) algorithms to sample from the posterior distribution.

1\. Initialize the `Meridian` class by passing the loaded data and the customized model specification. One advantage of Meridian is its ability to calibrate the model directly through ROI priors, as described in [Media Mix Model Calibration With Bayesian Priors](https://research.google/pubs/media-mix-model-calibration-with-bayesian-priors/). In this example, all media channels share the same ROI prior, LogNormal(0.2, 0.9).

In [None]:
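roi_mu = 0.2     # Mu for the ROI prior of each media channel.
roi_sigma = 0.9  # Sigma for the ROI prior of each media channel.
# The data above has only impression-based media channels (no reach and frequency),
# so the prior is set on roi_m; roi_rf would only apply to RF channels.
prior = prior_distribution.PriorDistribution(
    roi_m=tfp.distributions.LogNormal(roi_mu, roi_sigma, name=constants.ROI_M)
)
model_spec = spec.ModelSpec(prior=prior)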

mmm = model.Meridian(input_data=data, model_spec=model_spec)
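
A LogNormal(μ, σ) prior means that log(ROI) follows Normal(μ, σ), so its median is exp(μ) and its mean is exp(μ + σ²/2). The sketch below computes these summaries for the LogNormal(0.2, 0.9) ROI prior used above and cross-checks them with NumPy draws (NumPy's `lognormal` uses the same μ, σ convention as TFP's `LogNormal(loc, scale)`). It is a way to sanity-check how loose or tight a prior is before fitting.

```python
import math
import numpy as np

mu, sigma = 0.2, 0.9  # the LogNormal(0.2, 0.9) ROI prior used above

# Closed-form summaries of LogNormal(mu, sigma).
median = math.exp(mu)
mean = math.exp(mu + sigma**2 / 2)
z95 = 1.6449  # standard normal 95th percentile
q05, q95 = math.exp(mu - z95 * sigma), math.exp(mu + z95 * sigma)
print(f"median={median:.2f}, mean={mean:.2f}, 90% interval=({q05:.2f}, {q95:.2f})")

# Monte Carlo cross-check with NumPy's lognormal sampler.
rng = np.random.default_rng(0)
draws = rng.lognormal(mean=mu, sigma=sigma, size=200_000)
print(f"sample median={np.median(draws):.2f}, sample mean={draws.mean():.2f}")
```

For these values the prior median ROI is about 1.22 and the mean about 1.83, with a central 90% interval of roughly 0.28 to 5.37, so the prior is fairly wide.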

2\. Use the `sample_prior()` and `sample_posterior()` methods to draw samples from the prior and posterior distributions of the model parameters. If you are using the T4 GPU runtime, this step may take about 10 minutes, depending on the size of your dataset.

In [None]:
%%time
mmm.sample_prior(500)
mmm.sample_posterior(n_chains=10, n_adapt=2000, n_burnin=500, n_keep=500, seed=1)
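
The sampling arguments above determine both the runtime and how many draws are available for analysis. Assuming, as the argument names suggest, that each chain runs `n_adapt` adaptation steps, then `n_burnin` discarded steps, then `n_keep` retained draws, the arithmetic for the call above is:

```python
# Values from the sample_prior / sample_posterior calls above.
n_prior = 500
n_chains, n_adapt, n_burnin, n_keep = 10, 2000, 500, 500

# Assumption: each chain runs adaptation, then burn-in, then the kept draws.
steps_per_chain = n_adapt + n_burnin + n_keep
retained_draws = n_chains * n_keep
print(f"{steps_per_chain} sampler steps per chain, "
      f"{retained_draws} posterior draws kept, {n_prior} prior draws")
```

Lowering `n_keep` reduces the number of retained draws (and memory), while adaptation and burn-in steps are spent on every chain regardless.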

For more information about configuring the parameters and using a customized model specification, such as setting different ROI priors for each media channel, see [Configure the model](https://developers.google.com/meridian/docs/user-guide/configure-model).

<a name="model-diagnostics"></a>
## Step 3: Run model diagnostics

After the model is built, you must assess convergence, debug the model if needed, and then assess the model fit.

1\. Assess convergence. Run the following code to generate R-hat statistics. R-hat values close to 1.0 indicate convergence. R-hat < 1.2 indicates approximate convergence and is a reasonable threshold for many problems.

In [None]:
model_diagnostics = visualizer.ModelDiagnostics(mmm)
model_diagnostics.plot_rhat_boxplot()
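
R-hat compares the variance between chains with the variance within chains; when all chains explore the same distribution, the ratio approaches 1. The sketch below implements the classic (non-split) Gelman-Rubin formula in NumPy on synthetic chains. Meridian's own computation may differ in detail (for example, split or rank-normalized variants), so treat this as an illustration of the idea rather than a reimplementation.

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Classic (non-split) Gelman-Rubin R-hat for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    # Between-chain variance B and mean within-chain variance W.
    b = n * chain_means.var(ddof=1)
    w = chains.var(axis=1, ddof=1).mean()
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * w + b / n
    return float(np.sqrt(var_hat / w))

rng = np.random.default_rng(1)
mixed = rng.normal(0.0, 1.0, size=(4, 1000))            # all chains target the same distribution
stuck = mixed + np.array([[0.0], [0.0], [0.0], [3.0]])  # one chain stuck in a different region
print(f"mixed: {gelman_rubin_rhat(mixed):.3f}, stuck: {gelman_rubin_rhat(stuck):.3f}")
```

A single chain stuck in a different region is enough to push R-hat well above the 1.2 threshold.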

2\. Assess the model's fit by comparing the expected KPI (here, conversions) against the actual KPI.

In [None]:
model_fit = visualizer.ModelFit(mmm)
model_fit.plot_model_fit()
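
Visual fit checks are often complemented by a few scalar summaries. The sketch below computes R-squared, MAPE, and weighted MAPE (wMAPE) between an actual and an expected series with NumPy. The numbers are made up, and Meridian's own fit metrics may use different conventions (for example, for periods with zero actual KPI).

```python
import numpy as np

def fit_metrics(actual, expected):
    actual = np.asarray(actual, dtype=float)
    expected = np.asarray(expected, dtype=float)
    resid = actual - expected
    # Share of the actual series' variance explained by the expected series.
    r_squared = 1 - np.sum(resid**2) / np.sum((actual - actual.mean()) ** 2)
    # MAPE is undefined where actual == 0, so those periods are skipped here.
    nonzero = actual != 0
    mape = np.mean(np.abs(resid[nonzero] / actual[nonzero]))
    # wMAPE weights errors by volume: total absolute error / total actual.
    wmape = np.sum(np.abs(resid)) / np.sum(np.abs(actual))
    return {"r_squared": float(r_squared), "mape": float(mape), "wmape": float(wmape)}

# Hypothetical weekly conversions (actual) and model expectations.
actual = [100, 120, 80, 0]
expected = [110, 115, 85, 5]
print(fit_metrics(actual, expected))
```

wMAPE is often the more stable of the two percentage metrics when some weeks have very small KPI values.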

For more information and additional model diagnostics checks, see [Modeling diagnostics](https://developers.google.com/meridian/docs/user-guide/model-diagnostics).

<a name="generate-summary"></a>
## Step 4: Generate model results & two-page output

To export the two-page HTML summary output, initialize the `Summarizer` class with the model object. Then pass in the filename, filepath, start date, and end date to `output_model_results_summary` to run the summary for that time duration and save it to the specified file.

In [None]:
mmm_summarizer = summarizer.Summarizer(mmm)

In [None]:
filepath = '/content/drive/MyDrive'
# Both dates should be weeks present in the data (Mondays, as in the cleaned CSV).
start_date = '2021-01-25'
end_date = '2024-01-15'
mmm_summarizer.output_model_results_summary('summary_output.html', filepath, start_date, end_date)
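
The `start_date` and `end_date` values above may have been carried over from the upstream demo's simulated data. For your own data, the safest choice is to use weeks that actually occur in the cleaned CSV. A minimal check, written as a hypothetical helper (`check_summary_window` is not a Meridian function):

```python
import pandas as pd

def check_summary_window(weeks, start_date, end_date):
    """Hypothetical helper: list problems with a summary window for the given weeks."""
    weeks = pd.to_datetime(pd.Series(weeks))
    start, end = pd.Timestamp(start_date), pd.Timestamp(end_date)
    problems = []
    if start > end:
        problems.append("start_date is after end_date")
    for label, date in (("start_date", start), ("end_date", end)):
        if not (weeks == date).any():
            problems.append(f"{label} {date.date()} is not a week in the data")
    return problems

# Hypothetical week list; in the notebook, use the 'Week' column of cleaned_meridian.csv.
weeks = ["2021-01-25", "2021-02-01", "2024-01-08", "2024-01-15"]
print(check_summary_window(weeks, "2021-01-25", "2024-01-15"))
print(check_summary_window(weeks, "2021-01-26", "2024-01-15"))
```

In the notebook, pass `pd.read_csv('/content/drive/MyDrive/cleaned_meridian.csv')['Week']` as `weeks`.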

Here is a preview of the two-page output:

In [None]:
IPython.display.HTML(filename='/content/drive/MyDrive/summary_output.html')

For a customized two-page report, model results summary table, and individual visualizations, see [Model results report](https://developers.google.com/meridian/docs/user-guide/generate-model-results-report) and [plot media visualizations](https://developers.google.com/meridian/docs/user-guide/plot-media-visualizations).





<a name="generate-optimize"></a>
## Step 5: Run budget optimization & generate an optimization report

You can choose which scenario to run for the budget allocation. In the default scenario, you find the optimal allocation across channels for a given budget to maximize the return on investment (ROI).

1\. Instantiate the `BudgetOptimizer` class and run the `optimize()` method without any customization to run the library's default fixed budget scenario, which maximizes ROI.

In [None]:
%%time
budget_optimizer = optimizer.BudgetOptimizer(mmm)
optimization_results = budget_optimizer.optimize()
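
With a fixed budget, maximizing ROI is the same as maximizing the total incremental outcome, so the problem reduces to splitting the budget across channels with diminishing returns. The toy sketch below illustrates that idea with made-up saturating (Hill-type) response curves and a greedy search that repeatedly funds the channel with the largest next-step gain. It is not Meridian's optimizer: the curves, their parameters, and the step size are all assumptions.

```python
import numpy as np

def hill_response(spend, max_effect, half_sat, shape=1.0):
    """Hypothetical saturating response curve: max_effect * s^k / (s^k + half_sat^k)."""
    spend = np.asarray(spend, dtype=float)
    return max_effect * spend**shape / (spend**shape + half_sat**shape)

def greedy_allocate(budget, step, curves):
    """Toy fixed-budget allocation: repeatedly give `step` to the channel whose
    response increases the most, until the whole budget is spent."""
    alloc = {name: 0.0 for name in curves}
    for _ in range(int(round(budget / step))):
        gains = {
            name: hill_response(alloc[name] + step, **params) - hill_response(alloc[name], **params)
            for name, params in curves.items()
        }
        best = max(gains, key=gains.get)
        alloc[best] += step
    return alloc

# Hypothetical channel curves; these numbers are made up for illustration.
curves = {
    "Search": {"max_effect": 500.0, "half_sat": 2000.0},
    "Display": {"max_effect": 200.0, "half_sat": 3000.0},
}
allocation = greedy_allocate(budget=10_000, step=100, curves=curves)
print(allocation)
```

For concave curves like these, the greedy rule ends close to the point where the channels' marginal returns are equal, which is the allocation that maximizes total response for the given budget.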

2\. Export the two-page HTML optimization report, which contains the optimized spend allocations and ROI.

In [None]:
filepath = '/content/drive/MyDrive'
optimization_results.output_optimization_summary('optimization_output.html', filepath)

In [None]:
IPython.display.HTML(filename='/content/drive/MyDrive/optimization_output.html')

For information about customized optimization scenarios, such as flexible budget scenarios, see [Budget optimization scenarios](https://developers.google.com/meridian/docs/user-guide/budget-optimization-scenarios). For more information about optimization results summary and individual visualizations, see [optimization results output](https://developers.google.com/meridian/docs/user-guide/generate-optimization-results-output) and [optimization visualizations](https://developers.google.com/meridian/docs/user-guide/plot-optimization-visualizations).

<a name="save-model"></a>
## Step 6: Save the model object

We recommend that you save the model object for future use. This helps you avoid repeated model runs, saving time and computational resources. After the model object is saved, you can load it later to continue the analysis or visualizations without re-running the model.


Run the following code to save the model object:

In [None]:
file_path='/content/drive/MyDrive/saved_mmm.pkl'
model.save_mmm(mmm, file_path)

Run the following code to load the saved model:

In [None]:
mmm = model.load_mmm(file_path)