<a href="https://colab.research.google.com/github/daeexe/Meridian/blob/dev/demo/Meridian_RF_Demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google/meridian/blob/main/demo/Meridian_RF_Demo.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/google/meridian/blob/main/demo/Meridian_RF_Demo.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

# **Meridian Reach and Frequency Demo**

Welcome to the Meridian end-to-end demo for Reach and Frequency data. This simplified demo showcases the fundamental functionalities and basic usage of the library on data containing Reach and Frequency channels, including working examples of the major modeling steps:


<ol start="0">
  <li><a href="#install">Install</a></li>
  <li><a href="#load-data">Load the data</a></li>
  <li><a href="#configure-model">Configure the model</a></li>
  <li><a href="#model-diagnostics">Run model diagnostics</a></li>
  <li><a href="#generate-summary">Generate model results & two-page output</a></li>
  <li><a href="#generate-optimize">Run budget optimization & two-page output</a></li>
  <li><a href="#save-model">Save the model object</a></li>
</ol>


Note that this notebook skips all of the exploratory data analysis and preprocessing steps. It assumes that you have completed these tasks before reaching this point in the demo.

This notebook utilizes sample data. As a result, the numbers and results obtained might not accurately reflect what you encounter when working with a real dataset.

<a name="install"></a>
## Step 0: Install

1\. Make sure you are using one of the available GPU Colab runtimes, which is **required** to run Meridian. You can change your notebook's runtime in `Runtime > Change runtime type` in the menu. All users can use the T4 GPU runtime, which is sufficient to run the demo colab, free of charge. Users who have purchased one of Colab's paid plans have access to premium GPUs (such as the V100, A100, or L4 Nvidia GPUs).

2\. Install the latest version of Meridian, and verify that a GPU is available.

In [None]:
# Install meridian: from PyPI @ latest release
!pip install --upgrade google-meridian[colab,and-cuda]

# Install meridian: from PyPI @ specific version
# !pip install google-meridian[colab,and-cuda]==1.0.3

# Install meridian: from GitHub @HEAD
# !pip install --upgrade "google-meridian[colab,and-cuda] @ git+https://github.com/google/meridian.git"

In [None]:
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_probability as tfp
import arviz as az

import IPython

from meridian import constants
from meridian.data import load
from meridian.data import test_utils
from meridian.model import model
from meridian.model import spec
from meridian.model import prior_distribution
from meridian.analysis import optimizer
from meridian.analysis import analyzer
from meridian.analysis import visualizer
from meridian.analysis import summarizer
from meridian.analysis import formatter

# Report the runtime's total RAM and check whether a GPU is available
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of total RAM\n'.format(ram_gb))
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
print("Num CPUs Available: ", len(tf.config.list_physical_devices('CPU')))

<a name="load-data"></a>
## Step 1: Load the data

Load the [simulated dataset in CSV format](https://github.com/google/meridian/blob/main/meridian/data/simulated_data/csv/geo_media_rf.csv) as follows.

1\. Map the column names to their corresponding variable types. For example, the column names 'Brand_GQV' and 'Generic_GQV' are mapped to `controls`. The required variable types are `time`, `controls`, `population`, `kpi`, `revenue_per_kpi`, `media` and `media_spend`. If your data includes organic media or non-media treatments, you can add them using the `organic_media` and `non_media_treatments` arguments. For the definition of each variable, see
[Collect and organize your data](https://developers.google.com/meridian/docs/user-guide/collect-data).
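
Because the mapping refers to columns by name, it helps to confirm that every mapped name exists in the file's header once the header has been cleaned. The following is a minimal sketch using only the standard library. The sample CSV text, the `clean_header` and `missing_columns` helpers, and the plain `mapping` dictionary are hypothetical stand-ins for illustration; they are not part of Meridian.

```python
import csv
import io


def clean_header(name: str) -> str:
    """Normalize a raw CSV header: trim, drop periods, use underscores."""
    name = name.strip().replace("\n", " ").replace(".", "")
    return name.replace(" ", "_")


def missing_columns(header: list[str], mapping: dict) -> list[str]:
    """Return mapped column names that are absent from the cleaned header."""
    cleaned = {clean_header(h) for h in header}
    wanted = []
    for value in mapping.values():
        # A variable type maps to either one column name or a list of names.
        wanted.extend([value] if isinstance(value, str) else value)
    return [col for col in wanted if col not in cleaned]


# Hypothetical one-row export; a real file has many more columns.
raw_csv = (
    "Week Start Date,Region,Population,Brand GQV,Total Conversions\n"
    "2024-01-01,North,1000,12.5,40\n"
)
header = next(csv.reader(io.StringIO(raw_csv)))

# Plain dict stand-in for the variable-type mapping (not a Meridian object).
mapping = {
    "time": "Week_Start_Date",
    "geo": "Region",
    "population": "Population",
    "kpi": "Total_Conversions",
    "controls": ["Brand_GQV", "Generic_GQV"],
}
print(missing_columns(header, mapping))
```

Any name the check reports has to be added to the data or removed from the mapping before the data is loaded.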

In [None]:

# Import the library to mount Google Drive
from google.colab import drive
# Mount the Google Drive at /content/drive
drive.mount('/content/drive')

In [None]:
import pandas as pd
from google.colab import drive

# 1. Load the CSV file
file_path = "/content/drive/MyDrive/MMM_Meridian.csv"
df = pd.read_csv(file_path)

# Print the raw column names to diagnose the issue
print("Raw column names:", df.columns.tolist())

# 2. Clean column names (remove spaces and dots)
# Let's be more robust in cleaning, including newlines and multiple spaces
df.columns = df.columns.str.strip()                      # Remove extra spaces
df.columns = df.columns.str.replace('\n', ' ', regex=False) # Replace newlines with spaces
df.columns = df.columns.str.replace('.', '', regex=False)  # Remove periods
df.columns = df.columns.str.replace(' ', '_')            # Replace spaces with underscores

# Print cleaned column names
print("Cleaned column names:", df.columns.tolist())


# 3. Clean and convert numeric columns
# Use try-except blocks or check if columns exist before processing
# Based on the printed cleaned names, the numeric columns are campaign-specific
# and also 'Population', 'Monetary_Value', 'Brand_GQV', 'Generic_GQV', 'SOV', 'Total_Conversions'
numeric_cols_to_clean = [
    'Population', 'Monetary_Value', 'Brand_GQV', 'Generic_GQV', 'SOV', 'Total_Conversions'
]

# Add campaign-specific metric columns dynamically
campaign_types_in_data = ['Display', 'Display_Plug', 'Display_STB', 'Facebook', 'Facebook_Awareness', 'Facebook_Consideration', 'Facebook_Conversion', 'Facebook_Share_to_Buy', 'Google_Demand_Gen', 'Google_Performance_Max', 'Google_Search', 'TikTok', 'YouTube'] # Based on cleaned column names
metrics_in_data = ['Impressions', 'Media_Cost', 'Total_Conversions', 'Total_Conversions_CPA'] # Based on cleaned column names

for campaign in campaign_types_in_data:
    for metric in metrics_in_data:
        col_name = f"{campaign}_{metric}"
        if col_name in df.columns:
            numeric_cols_to_clean.append(col_name)

# Remove duplicates and ensure columns exist
numeric_cols_to_clean = list(set(numeric_cols_to_clean))
numeric_cols_to_clean = [col for col in numeric_cols_to_clean if col in df.columns]

print("Numeric columns to clean:", numeric_cols_to_clean)

for col in numeric_cols_to_clean:
    try:
        df[col] = df[col].astype(str).str.replace(',', '', regex=False) # Remove commas
        df[col] = df[col].replace('--', pd.NA) # Handle '--' or other non-numeric strings
        df[col] = pd.to_numeric(df[col], errors='coerce')    # Convert to numeric, turn errors into NaN
    except Exception as e:
        print(f"Could not process column {col}: {e}")


# Clean and convert the date column
# Based on the printed raw column names, the date column is 'Week Start Date'
date_column_name_raw = 'Week Start Date'
date_column_name_cleaned = 'Week_Start_Date' # After cleaning

date_col_found = False
if date_column_name_cleaned in df.columns:
    try:
        df[date_column_name_cleaned] = pd.to_datetime(df[date_column_name_cleaned], errors='coerce')
        # Rename to 'Week' as expected by Meridian
        df = df.rename(columns={date_column_name_cleaned: 'Week'})
        date_col_found = True
        print(f"Successfully processed date column: {date_column_name_cleaned}")
    except Exception as e:
         print(f"Could not process date column {date_column_name_cleaned}: {e}")
elif date_column_name_raw in df.columns:
     # This case should not happen if cleaning works, but as a fallback
     try:
        df[date_column_name_raw] = pd.to_datetime(df[date_column_name_raw], errors='coerce')
        df = df.rename(columns={date_column_name_raw: 'Week'})
         # Check if the cleaned name is now in columns after renaming from raw
        if 'Week_Start_Date' in df.columns:
            df = df.rename(columns={'Week_Start_Date': 'Week'})
        date_col_found = True
        print(f"Successfully processed date column: {date_column_name_raw}")
     except Exception as e:
         print(f"Could not process date column {date_column_name_raw}: {e}")


if not date_col_found:
    print(f"Could not find or process date column. Looked for '{date_column_name_raw}' and '{date_column_name_cleaned}'.")


# Assuming 'Region' needs to be renamed to 'Region_(Matched)'
region_column_name_raw = 'Region'
region_column_name_cleaned = 'Region' # After cleaning, if no spaces
region_target_name = 'Region_(Matched)'

if region_column_name_cleaned in df.columns:
    if region_column_name_cleaned != region_target_name:
        df = df.rename(columns={region_column_name_cleaned: region_target_name})
    print(f"Successfully processed region column: {region_column_name_cleaned}")
elif region_column_name_raw in df.columns:
     if region_column_name_raw != region_target_name:
        df = df.rename(columns={region_column_name_raw: region_target_name})
     print(f"Successfully processed region column: {region_column_name_raw}")
else:
    print(f"Could not find region column. Looked for '{region_column_name_raw}' and '{region_column_name_cleaned}'.")


# Assuming 'Campaign_type' is not a column to pivot on based on the wide format
if 'Campaign_type' in df.columns:
    print("Warning: 'Campaign_type' column found but not used for pivoting based on wide data format.")
    # You might want to drop it or use it for filtering if needed


# 4. Select needed columns for the pivot_table
# Include 'Week', 'Region_(Matched)', and all cleaned numeric columns
selected_columns = ['Week', 'Region_(Matched)'] + numeric_cols_to_clean

# Ensure all selected columns actually exist in the DataFrame after renaming
selected_columns = [col for col in selected_columns if col in df.columns]
print("Selected columns for pivot_table:", selected_columns)

# Ensure 'Week' and 'Region_(Matched)' are in the selected columns for grouping
if 'Week' not in selected_columns:
     print("Error: 'Week' column is missing after processing.")
if 'Region_(Matched)' not in selected_columns:
     print("Error: 'Region_(Matched)' column is missing after processing.")


df_subset = df[selected_columns].copy() # Use .copy() to avoid SettingWithCopyWarning

# 5. Group by Week and Region_(Matched) and sum the relevant columns
# This step is still useful to aggregate data if there are multiple rows for the same week/region combination
# or just to ensure the expected structure.
# Identify columns to sum (all columns except 'Week' and 'Region_(Matched)')
columns_to_sum = [col for col in df_subset.columns if col not in ['Week', 'Region_(Matched)']]

if 'Week' in df_subset.columns and 'Region_(Matched)' in df_subset.columns:
    grouped_df = df_subset.groupby(['Week', 'Region_(Matched)'])[columns_to_sum].sum().reset_index()
    print("\nGrouped data info:")
    grouped_df.info()
else:
    print("\nSkipping grouping step as 'Week' or 'Region_(Matched)' is missing.")
    grouped_df = df_subset.copy() # If grouping keys are missing, just use the subset as is


# 6. Rename columns to the names used in the Meridian column mapping below.
# Only impression columns are renamed here, for example
# 'Google_Performance_Max_Impressions' -> 'Performance_Max_Impression'.
# The rule looks only at the last '_'-separated token of each column name, so
# spend columns such as 'Display_Media_Cost' and conversion columns such as
# 'Facebook_Total_Conversions' keep their cleaned names.

# Identify potential channel prefixes and metric suffixes from cleaned column names
channel_prefixes = list(set([col.split('_')[0] for col in grouped_df.columns if '_' in col]))
metric_suffixes = list(set([col.split('_')[-1] for col in grouped_df.columns if '_' in col]))

print("\nIdentified potential channel prefixes:", channel_prefixes)
print("Identified potential metric suffixes:", metric_suffixes)

# Map cleaned channel names to the channel names used in the column mapping below.
# Channels that are not listed here (for example 'Display', 'Facebook_Awareness')
# keep their cleaned name.
cleaned_to_meridian_channel_map = {
    'Google_Search': 'Search',
    'Google_Performance_Max': 'Performance_Max',
    'Google_Demand_Gen': 'Demand_Gen',
    # Add other mappings as needed based on your data and desired channels
}

# Non-media columns that keep their names: the controls ('Brand_GQV',
# 'Generic_GQV', 'SOV'), 'Population', and the KPI 'Total_Conversions'.
non_media_columns = ['Week', 'Region_(Matched)', 'Population', 'Brand_GQV', 'Generic_GQV', 'SOV', 'Total_Conversions']

rename_map = {}
for col in grouped_df.columns:
    if col == 'Monetary_Value':
        rename_map[col] = 'Revenue'
        continue
    if col in non_media_columns:
        rename_map[col] = col
        continue

    # Split 'Channel[_SubChannel]_Metric' at the last underscore
    channel_part, sep, metric_part = col.rpartition('_')
    if sep and metric_part == 'Impressions':
        mapped_channel_name = cleaned_to_meridian_channel_map.get(channel_part, channel_part)
        rename_map[col] = f"{mapped_channel_name}_Impression"
    else:
        # The last token of '..._Media_Cost' is 'Cost' and of '..._Total_Conversions'
        # is 'Conversions', so spend and conversion columns keep their cleaned names.
        rename_map[col] = col


print("\nRename map generated:", rename_map)
pivot_table = grouped_df.rename(columns=rename_map)


# Ensure 'Total_Conversions' is present and named correctly for KPI
if 'Total_Conversions' not in pivot_table.columns:
     print("Warning: 'Total_Conversions' column is missing after renaming.")
     # Attempt to create it by summing if campaign conversions exist
     conversion_cols = [col for col in pivot_table.columns if col.endswith('_Conversions')]
     if conversion_cols:
         pivot_table['Total_Conversions'] = pivot_table[conversion_cols].sum(axis=1)  # Row-wise sum across campaigns
         print("Created 'Total_Conversions' by summing campaign conversions.")
     else:
         print("Error: Could not create 'Total_Conversions'. No columns ending with '_Conversions' found.")
         pivot_table['Total_Conversions'] = 0 # Add a placeholder if no conversion columns are found

# Explicitly rename 'Monetary_Value' to 'Revenue' if it exists
if 'Monetary_Value' in pivot_table.columns:
    pivot_table = pivot_table.rename(columns={'Monetary_Value': 'Revenue'})
    print("Renamed 'Monetary_Value' to 'Revenue'.")


# 7. Ensure complete time series and region combinations (This part is still relevant)

# Get all unique regions from the grouped data
if 'Region_(Matched)' in pivot_table.columns:
    all_regions = pivot_table['Region_(Matched)'].unique()
else:
    print("Error: 'Region_(Matched)' column not found for generating full grid.")
    all_regions = [] # Cannot proceed with full grid without region

# Get the min and max dates to create a full date range from the grouped data
if 'Week' in pivot_table.columns:
    min_date = pivot_table['Week'].min()
    max_date = pivot_table['Week'].max()
    # Create a complete weekly date range in 7-day steps from the first observed week,
    # so the grid matches the data whichever weekday the weeks start on
    full_date_range = pd.date_range(start=min_date, end=max_date, freq='7D')
    print(f"\nGenerated full date range from {min_date} to {max_date}")
else:
    print("\nError: 'Week' column not found for generating full date range.")
    full_date_range = pd.to_datetime([]) # Empty date range

# Create a complete grid of all week and region combinations
if not full_date_range.empty and len(all_regions) > 0:  # len() also works for the empty-list fallback
    full_grid = pd.MultiIndex.from_product([full_date_range, all_regions], names=['Week', 'Region_(Matched)']).to_frame(index=False)
    print("Generated full grid of week and region combinations.")

    # Convert 'Week' in pivot_table to datetime for merging if it's not already
    pivot_table['Week'] = pd.to_datetime(pivot_table['Week'])

    # Merge the combined data onto the full grid
    # Use a left merge to keep all combinations from the full_grid
    pivot_table = full_grid.merge(pivot_table, on=['Week', 'Region_(Matched)'], how='left')
    print("Merged data onto full grid.")
else:
    print("Skipping full grid merge due to missing 'Week' or 'Region_(Matched)' column.")


# 8. Fill NaN values after ensuring all week/region combinations exist
# Every column except the 'Week' and 'Region_(Matched)' keys is filled with 0.
# Note: this also zero-fills 'Population' and the control columns for week/region
# rows that were missing from the source data; check that this suits your data.
zero_fill_columns = [col for col in pivot_table.columns if col not in ['Week', 'Region_(Matched)']]

for col in zero_fill_columns:
    if col in pivot_table.columns:
        pivot_table[col] = pivot_table[col].fillna(0)
print("Filled NaN values with 0 for metric columns.")


# Convert 'Week' back to string format expected by Meridian
if 'Week' in pivot_table.columns:
    pivot_table['Week'] = pivot_table['Week'].dt.strftime('%Y-%m-%d')
    print("Converted 'Week' column back to string format.")
else:
    print("Error: 'Week' column not found after filling NaNs.")


# 9. Final reorder (Adjust column ordering based on available columns)
# Identify columns dynamically based on the final pivot_table
all_cols = pivot_table.columns.tolist()
fixed_cols = ['Week', 'Region_(Matched)']

# Separate media and non-media metrics for ordering
# Media columns end with '_Impression' or '_Cost'
media_cols = sorted([col for col in all_cols if col.endswith('_Impression') or col.endswith('_Cost')])
# Non-media columns are everything else except the fixed columns and Total_Conversions and Revenue
non_media_cols = sorted([col for col in all_cols if col not in fixed_cols and col not in media_cols and col != 'Total_Conversions' and col != 'Revenue'])

ordered_columns = fixed_cols + media_cols + non_media_cols + ['Total_Conversions', 'Revenue']

# Ensure all ordered columns are actually in the dataframe before reordering
ordered_columns = [col for col in ordered_columns if col in pivot_table.columns]
pivot_table = pivot_table[ordered_columns]
print("\nReordered columns.")


# 10. Preview
print("\nPivoted and cleaned data preview:")
print(pivot_table.head())
print("\nData Info:")
pivot_table.info()

if 'Week' in pivot_table.columns:
    print(f"\n{pivot_table['Week'].nunique()} unique weeks")
else:
    print("\n'Week' column not found in final pivot_table.")

if 'Region_(Matched)' in pivot_table.columns:
    print(f"{pivot_table['Region_(Matched)'].nunique()} unique regions")
else:
    print("'Region_(Matched)' column not found in final pivot_table.")

print(f"{pivot_table.shape[0]} rows, {pivot_table.shape[1]} columns")


# 11. Save to CSV
output_path = '/content/drive/MyDrive/cleaned_meridian.csv'
try:
    pivot_table.to_csv(output_path, index=False)
    print(f"\nCleaned data saved to {output_path}")
except Exception as e:
    print(f"\nError saving cleaned data to {output_path}: {e}")
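
The grid-completion step in the cell above can be illustrated on a tiny example. This is a minimal sketch in plain Python rather than pandas: the `complete_grid` helper and the three sample rows are hypothetical. It assumes that all weeks in the data start on the same weekday, so stepping in 7-day increments from the first observed week reproduces every observed date.

```python
from datetime import date, timedelta
from itertools import product


def complete_grid(rows, value_keys):
    """Return one row per (week, region), zero-filling missing combinations.

    rows: list of dicts with 'Week' (datetime.date), 'Region', and metric keys.
    """
    weeks = sorted({r["Week"] for r in rows})
    regions = sorted({r["Region"] for r in rows})
    # Build the weekly range from the first observed week in 7-day steps.
    n_weeks = (weeks[-1] - weeks[0]).days // 7 + 1
    all_weeks = [weeks[0] + timedelta(days=7 * i) for i in range(n_weeks)]
    observed = {(r["Week"], r["Region"]): r for r in rows}
    grid = []
    for week, region in product(all_weeks, regions):
        row = observed.get((week, region))
        if row is None:
            # Combination absent from the data: add it with zero metrics.
            row = {"Week": week, "Region": region, **{k: 0 for k in value_keys}}
        grid.append(row)
    return grid


rows = [
    {"Week": date(2024, 1, 1), "Region": "North", "Impressions": 100},
    {"Week": date(2024, 1, 1), "Region": "South", "Impressions": 80},
    {"Week": date(2024, 1, 15), "Region": "North", "Impressions": 120},
]
grid = complete_grid(rows, ["Impressions"])
print(len(grid), "rows after completing the grid")
```

If the generated week dates do not line up with the observed ones, every observed row fails to match and the whole table is zero-filled, so it is worth comparing the totals before and after this step.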

2\. Map the media variables and the media spends to the designated channel names intended for display in the two-page HTML output. In the following example, 'Search_Impression' and 'Google_Search_Media_Cost' are connected to the same channel, 'Search'.

In [None]:
coord_to_columns = load.CoordToColumns(
    time='Week',
    geo='Region_(Matched)',
    controls=['Brand_GQV', 'Generic_GQV'],  # Updated to use available GQV columns
    population='Population',
    kpi='Total_Conversions',
    revenue_per_kpi='Revenue',
    media=[
        'Search_Impression',
        'Performance_Max_Impression', # Corrected to match cleaned data naming
        'Display_Impression',
        'Demand_Gen_Impression', # Corrected to match cleaned data naming
        'Facebook_Impression',
        'TikTok_Impression',
        'YouTube_Impression',
        'Display_Plug_Impression',
        'Display_STB_Impression',
        'Facebook_Awareness_Impression',
        'Facebook_Consideration_Impression',
        'Facebook_Conversion_Impression',
        'Facebook_Share_to_Buy_Impression',

    ],
    media_spend=[
        'Google_Search_Media_Cost', # Corrected to match cleaned data naming
        'Google_Performance_Max_Media_Cost', # Corrected to match cleaned data naming
        'Display_Media_Cost',
        'Google_Demand_Gen_Media_Cost', # Corrected to match cleaned data naming
        'Facebook_Media_Cost',
        'TikTok_Media_Cost',
        'YouTube_Media_Cost',
        'Display_Plug_Media_Cost',
        'Display_STB_Media_Cost',
        'Facebook_Awareness_Media_Cost',
        'Facebook_Consideration_Media_Cost',
        'Facebook_Conversion_Media_Cost',
        'Facebook_Share_to_Buy_Media_Cost',
    ]

)

In [None]:
correct_media_to_channel = {
    'Search_Impression': 'Search',
    'Performance_Max_Impression': 'Performance_Max', # Corrected to match coord_to_columns
    'Display_Impression': 'Display',
    'Demand_Gen_Impression': 'Demand_Gen', # Corrected to match coord_to_columns
    'Facebook_Impression': 'Facebook',
    'TikTok_Impression': 'TikTok',
    'YouTube_Impression': 'YouTube',
    'Display_Plug_Impression': 'Display_Plug',
    'Display_STB_Impression': 'Display_STB',
    'Facebook_Awareness_Impression': 'Facebook_Awareness',
    'Facebook_Consideration_Impression': 'Facebook_Consideration',
    'Facebook_Conversion_Impression': 'Facebook_Conversion',
    'Facebook_Share_to_Buy_Impression': 'Facebook_Share_to_Buy',
}
correct_media_spend_to_channel = {
    'Google_Search_Media_Cost': 'Search', # Corrected to match cleaned data naming and channel
    'Google_Performance_Max_Media_Cost': 'Performance_Max', # Corrected to match cleaned data naming and channel
    'Display_Media_Cost': 'Display',
    'Google_Demand_Gen_Media_Cost': 'Demand_Gen', # Corrected to match cleaned data naming and channel
    'Facebook_Media_Cost': 'Facebook',
    'TikTok_Media_Cost': 'TikTok',
    'YouTube_Media_Cost': 'YouTube',
    'Display_Plug_Media_Cost': 'Display_Plug',
    'Display_STB_Media_Cost': 'Display_STB',
    'Facebook_Awareness_Media_Cost': 'Facebook_Awareness',
    'Facebook_Consideration_Media_Cost': 'Facebook_Consideration',
    'Facebook_Conversion_Media_Cost': 'Facebook_Conversion',
    'Facebook_Share_to_Buy_Media_Cost': 'Facebook_Share_to_Buy',
}
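
Since each media column and its spend column must resolve to the same channel name, a quick consistency check on the two dictionaries can catch naming slips before loading. The sketch below uses only the standard library. The `check_channel_maps` helper and the shortened sample inputs are hypothetical; this is a pre-flight check, not a description of the validation Meridian itself performs.

```python
def check_channel_maps(media_cols, spend_cols, media_to_channel, spend_to_channel):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    for col in media_cols:
        if col not in media_to_channel:
            problems.append(f"media column without channel: {col}")
    for col in spend_cols:
        if col not in spend_to_channel:
            problems.append(f"spend column without channel: {col}")
    media_channels = {media_to_channel[c] for c in media_cols if c in media_to_channel}
    spend_channels = {spend_to_channel[c] for c in spend_cols if c in spend_to_channel}
    # Channels that appear for media but not spend, or the other way around.
    for channel in sorted(media_channels ^ spend_channels):
        problems.append(f"channel present on only one side: {channel}")
    return problems


# Shortened, hypothetical inputs: 'Display' has impressions but no spend column.
media_cols = ["Search_Impression", "Display_Impression"]
spend_cols = ["Google_Search_Media_Cost"]
media_map = {"Search_Impression": "Search", "Display_Impression": "Display"}
spend_map = {"Google_Search_Media_Cost": "Search"}

for problem in check_channel_maps(media_cols, spend_cols, media_map, spend_map):
    print(problem)
```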

3\. Load the CSV data using `CsvDataLoader`. Note that `csv_path` is the path to the data file location.

In [None]:
# Add diagnostic print statements
import pandas as pd

# Load the cleaned data to inspect columns
cleaned_df = pd.read_csv('/content/drive/MyDrive/cleaned_meridian.csv')
print("Columns in cleaned_meridian.csv:", cleaned_df.columns.tolist())

# Get the media_spend columns from the cleaned data
cleaned_media_spend_cols = [col for col in cleaned_df.columns if col in coord_to_columns.media_spend]
print("Media spend columns found in cleaned data (matching coord_to_columns):", cleaned_media_spend_cols)

# Get the keys from correct_media_spend_to_channel
media_spend_channel_keys = list(correct_media_spend_to_channel.keys())
print("Keys in correct_media_spend_to_channel:", media_spend_channel_keys)

# Get the media_spend list from coord_to_columns
coord_media_spend_list = coord_to_columns.media_spend
print("media_spend list in coord_to_columns:", coord_media_spend_list)

# Compare the sets of columns/keys
print("Are media_spend columns in cleaned data equal to media_spend list in coord_to_columns?", set(cleaned_media_spend_cols) == set(coord_media_spend_list))
print("Are media_spend list in coord_to_columns equal to keys in correct_media_spend_to_channel?", set(coord_media_spend_list) == set(media_spend_channel_keys))


loader = load.CsvDataLoader(
    csv_path='/content/drive/MyDrive/cleaned_meridian.csv',
    kpi_type='non_revenue',
    coord_to_columns=coord_to_columns,
    media_to_channel=correct_media_to_channel,
    media_spend_to_channel=correct_media_spend_to_channel,
)
data = loader.load()

Note that the column mapping above loads impression-based media channels only. We recommend including reach and frequency data whenever they are available; such channels are mapped through the `reach`, `frequency`, and `rf_spend` arguments of `CoordToColumns`. For information about the advantages of utilizing reach and frequency, see [Bayesian Hierarchical Media Mix Model Incorporating Reach and Frequency Data](https://research.google/pubs/bayesian-hierarchical-media-mix-model-incorporating-reach-and-frequency-data/#:~:text=By%20incorporating%20R%26F%20into%20MMM,based%20on%20optimal%20frequency%20recommendations.).

<a name="configure-model"></a>
## Step 2: Configure the model

Meridian uses a Bayesian framework and Markov Chain Monte Carlo (MCMC) algorithms to sample from the posterior distribution.

1\. Initialize the `Meridian` class by passing the loaded data and the customized model specification. One advantage of Meridian lies in its capacity to calibrate the model directly through ROI priors, as described in [Media Mix Model Calibration With Bayesian Priors](https://research.google/pubs/media-mix-model-calibration-with-bayesian-priors/). In this particular example, the ROI priors for all media channels are identical, with each being represented as Lognormal(0.2, 0.9).

In [None]:
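roi_mu = 0.2     # Mu for the ROI prior of each channel.
roi_sigma = 0.9  # Sigma for the ROI prior of each channel.
prior = prior_distribution.PriorDistribution(
    # Applies to the impression-based media channels loaded in Step 1.
    roi_m=tfp.distributions.LogNormal(roi_mu, roi_sigma, name=constants.ROI_M),
    # Applies only when reach and frequency channels are loaded.
    roi_rf=tfp.distributions.LogNormal(roi_mu, roi_sigma, name=constants.ROI_RF),
)
model_spec = spec.ModelSpec(prior=prior)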

mmm = model.Meridian(input_data=data, model_spec=model_spec)

2\. Use the `sample_prior()` and `sample_posterior()` methods to obtain samples from the prior and posterior distributions of model parameters. If you are using the T4 GPU runtime, this step may take about 10 minutes for the provided data set.

In [None]:
%%time
mmm.sample_prior(500)
mmm.sample_posterior(n_chains=10, n_adapt=2000, n_burnin=500, n_keep=500, seed=1)

For more information about configuring the parameters and using a customized model specification, such as setting different ROI priors for each media channel, see [Configure the model](https://developers.google.com/meridian/docs/user-guide/configure-model).
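
To get a feel for what a Lognormal(0.2, 0.9) ROI prior implies, it can help to look at its median, mean, and a central interval. The following sketch uses only the standard library and the closed-form lognormal formulas, where 0.2 and 0.9 are the mean and standard deviation of log(ROI). The `lognormal_summary` helper is hypothetical and is not part of Meridian or TensorFlow Probability.

```python
import math
from statistics import NormalDist


def lognormal_summary(mu: float, sigma: float) -> dict:
    """Median, mean, and central 90% interval of LogNormal(mu, sigma)."""
    z = NormalDist()  # standard normal, used for the quantiles of log(ROI)
    return {
        "median": math.exp(mu),
        "mean": math.exp(mu + sigma**2 / 2),
        "q05": math.exp(mu + sigma * z.inv_cdf(0.05)),
        "q95": math.exp(mu + sigma * z.inv_cdf(0.95)),
    }


summary = lognormal_summary(0.2, 0.9)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The prior median is exp(0.2), roughly 1.22, and the right skew of the lognormal puts the mean above the median. If these ranges look implausible for a channel, that is a reason to set a channel-specific prior.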

<a name="model-diagnostics"></a>
## Step 3: Run model diagnostics

After the model is built, you must assess convergence, debug the model if needed, and then assess the model fit.

1\. Assess convergence. Run the following code to generate R-hat statistics. R-hat values close to 1.0 indicate convergence. An R-hat below 1.2 indicates approximate convergence and is a reasonable threshold for many problems.

In [None]:
model_diagnostics = visualizer.ModelDiagnostics(mmm)
model_diagnostics.plot_rhat_boxplot()

2\. Assess the model's fit by comparing the expected sales against the actual sales.

In [None]:
model_fit = visualizer.ModelFit(mmm)
model_fit.plot_model_fit()

For more information and additional model diagnostics checks, see [Modeling diagnostics](https://developers.google.com/meridian/docs/user-guide/model-diagnostics).
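
For intuition about what R-hat measures, the sketch below computes the classic (non-split) Gelman-Rubin statistic for one scalar parameter from several chains, using only the standard library. It compares the variance between chain means with the variance within chains. This is an illustration under simplifying assumptions, not Meridian's or ArviZ's implementation, and the simulated chains are hypothetical.

```python
import math
import random
from statistics import mean, variance


def gelman_rubin_rhat(chains):
    """Classic (non-split) R-hat for equally long chains of one scalar parameter."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])  # W: mean within-chain variance
    between = n * variance(chain_means)           # B: scaled variance of chain means
    pooled = (n - 1) / n * within + between / n   # pooled variance estimate
    return math.sqrt(pooled / within)


rng = random.Random(0)
# Four chains sampling the same distribution.
mixed = [[rng.gauss(0, 1) for _ in range(500)] for _ in range(4)]
# Four chains centered at different locations (0, 1, 2, 3).
stuck = [[rng.gauss(shift, 1) for _ in range(500)] for shift in range(4)]

print(f"well-mixed chains: {gelman_rubin_rhat(mixed):.3f}")
print(f"chains at different locations: {gelman_rubin_rhat(stuck):.3f}")
```

Chains that agree give a value near 1.0, while chains that settle in different places inflate the between-chain term and push R-hat well above the 1.2 threshold.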

<a name="generate-summary"></a>
## Step 4: Generate model results & two-page output

To export the two-page HTML summary output, initialize the `Summarizer` class with the model object. Then pass in the filename, filepath, start date, and end date to `output_model_results_summary` to run the summary for that time duration and save it to the specified file.

In [None]:
mmm_summarizer = summarizer.Summarizer(mmm)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
filepath = '/content/drive/MyDrive'
start_date = '2021-01-25'
end_date = '2025-01-15'
mmm_summarizer.output_model_results_summary('summary_output.html', filepath, start_date, end_date)

Here is a preview of the two-page output based on the simulated data:

In [None]:
IPython.display.HTML(filename='/content/drive/MyDrive/summary_output.html')

For a customized two-page report, model results summary table, and individual visualizations, see [Model results report](https://developers.google.com/meridian/docs/user-guide/generate-model-results-report) and [plot media visualizations](https://developers.google.com/meridian/docs/user-guide/plot-media-visualizations).





<a name="generate-optimize"></a>
## Step 5: Run budget optimization & generate an optimization report

You can choose which scenario to run for the budget allocation. In the default scenario, you find the optimal allocation across channels for a given budget to maximize the return on investment (ROI).

1\. Instantiate the `BudgetOptimizer` class and run the `optimize()` method without any customization to run the library's default Fixed Budget Scenario, which maximizes ROI.

In [None]:
%%time
budget_optimizer = optimizer.BudgetOptimizer(mmm)
optimization_results = budget_optimizer.optimize()

2\. Export the 2-page HTML optimization report, which contains optimized spend allocations and ROI.

In [None]:
filepath = '/content/drive/MyDrive'
optimization_results.output_optimization_summary('optimization_output.html', filepath)

In [None]:
IPython.display.HTML(filename='/content/drive/MyDrive/optimization_output.html')

For information about customized optimization scenarios, such as flexible budget scenarios, see [Budget optimization scenarios](https://developers.google.com/meridian/docs/user-guide/budget-optimization-scenarios). For more information about optimization results summary and individual visualizations, see [optimization results output](https://developers.google.com/meridian/docs/user-guide/generate-optimization-results-output) and [optimization visualizations](https://developers.google.com/meridian/docs/user-guide/plot-optimization-visualizations).
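
The idea behind a fixed budget scenario can be shown with a toy example. With the total budget held constant, maximizing ROI is the same as maximizing the total incremental outcome, so each increment of spend should go to the channel with the largest marginal gain. The sketch below is a simplified illustration using the standard library. The square-root response curves, the channel scales, and the `allocate_fixed_budget` helper are hypothetical, and this is not Meridian's optimization algorithm.

```python
import math


def allocate_fixed_budget(response_fns, budget, step):
    """Greedily assign `budget` in `step` increments to the best marginal gain."""
    spend = {name: 0.0 for name in response_fns}

    def gain(name):
        # Extra outcome from adding one more `step` of spend to this channel.
        fn = response_fns[name]
        return fn(spend[name] + step) - fn(spend[name])

    for _ in range(int(budget // step)):
        best = max(response_fns, key=gain)
        spend[best] += step
    return spend


# Hypothetical concave response curves: outcome = scale * sqrt(spend).
curves = {
    "Search": lambda x: 3.0 * math.sqrt(x),
    "Display": lambda x: 1.0 * math.sqrt(x),
}
allocation = allocate_fixed_budget(curves, budget=100, step=1)
print(allocation)
```

Because both curves have diminishing returns, the stronger channel does not receive the whole budget: once its marginal gain drops below that of the weaker channel, the next increment goes to the weaker one.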

<a name="save-model"></a>
## Step 6: Save the model object

We recommend that you save the model object for future use. This helps you avoid repeated model runs and saves time and computational resources. After the model object is saved, you can load it at a later stage to continue the analysis or visualizations without having to re-run the model.


Run the following code to save the model object:

In [None]:
file_path='/content/drive/MyDrive/saved_mmm.pkl'
model.save_mmm(mmm, file_path)

Run the following code to load the saved model:

In [None]:
mmm = model.load_mmm(file_path)