In [None]:
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Author : Lavi Nigam, ML Engineering @ Google
# Linkedin: https://www.linkedin.com/in/lavinigam/

<table align="left">

  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/notebook_template.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/notebook_template.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/notebook_template.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
    </a>
  </td>
</table>

## E-Commerce Future Lifetime Value (LTV) prediction using Google Analytics (GA4) Data

Learn how to build a system that predicts the future Lifetime Value (LTV) of a GA4 e-commerce user using BigQuery ML (BQML).

Customer Lifetime Value (CLV), also known as Client Lifetime Value or simply Lifetime Value (LTV), is an important marketing metric: an estimate of the net profit attributable to the entire future relationship with a customer.

Many marketers target their ads at specific individuals or groups of similar users, but they do not always focus on their most valuable customers. In business, the Pareto principle, which states that roughly 20 percent of a company's customers account for 80 percent of its revenue, is frequently invoked.
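As a quick illustration of checking the Pareto principle on your own data, the sketch below computes the share of revenue contributed by the top 20% of customers. The `revenue` values and the `top_share` helper are hypothetical; with real data you would first aggregate purchase revenue per user.

```python
import pandas as pd

# Hypothetical per-customer revenue; replace with revenue aggregated per user from your data.
revenue = pd.Series(
    [500, 300, 60, 40, 30, 25, 20, 15, 6, 4],
    index=[f"user_{i}" for i in range(10)],
    name="revenue",
)


def top_share(revenue: pd.Series, top_fraction: float = 0.2) -> float:
    """Return the share of total revenue contributed by the top `top_fraction` of customers."""
    ranked = revenue.sort_values(ascending=False)
    # Number of customers in the top group (at least one).
    n_top = max(1, int(round(len(ranked) * top_fraction)))
    return ranked.iloc[:n_top].sum() / ranked.sum()


print(f"Top 20% of customers generate {top_share(revenue):.0%} of revenue")
```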

What if you could identify which of your customers make up that 20% of your business, not only in the past but also in the future? Predicting customer lifetime value (CLV) is a way to find those customers.


CLV models help you answer questions such as the following about your customers:

* Number of purchases: How many separate purchases will the customer make during a given future period?
* Lifetime: How much time will pass before the customer becomes permanently inactive?
* Monetary value: How much monetary value will the customer generate over a given future period?

Forecasting future lifetime value involves two main problems, each of which calls for its own data and modeling approach:

* Predict the future value of existing customers who have a known transaction history, using that history.

* Predict the future value of new customers who have just made their first purchase.
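To make the distinction concrete, here is a minimal pandas sketch that splits users into the two groups by purchase count. The toy `purchases` table and the threshold of two purchases are assumptions for illustration; in practice you would aggregate `purchase` events from the GA4 export.

```python
import pandas as pd

# Hypothetical purchase log: one row per purchase event.
purchases = pd.DataFrame(
    {
        "user_pseudo_id": ["a", "a", "b", "c", "c", "c"],
        "event_date": ["20210102", "20210110", "20210105", "20210103", "20210103", "20210120"],
    }
)

purchase_counts = purchases.groupby("user_pseudo_id").size()

# Users with two or more purchases have a transaction history to learn from;
# users with exactly one purchase are "new" and need a different approach.
segment = purchase_counts.map(lambda n: "existing" if n >= 2 else "new")
print(segment.to_dict())
```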

With recent changes, BigQuery ML can directly access GA4 data, bringing captured app and web data together in a single interface. This integration opens up many machine learning use cases. For example, e-commerce companies can funnel their GA4 data to BQML and extend their analytics with ML capabilities. This pattern aims to help such companies apply different ML algorithms and scale their analytics with BigQuery.

## Audience
This pattern is intended for enterprise marketing analytics teams, or for teams specifically responsible for analyzing Google Analytics data. It assumes that you have basic knowledge of the following:

- Machine Learning concepts
- Standard SQL & Python

## Costs
This tutorial uses the following billable components of Google Cloud:

- BigQuery
- BigQuery ML
- Cloud Storage


To generate a cost estimate based on your projected usage, use the [Pricing Calculator](https://cloud.google.com/products/calculator/). To learn more, see:

- [BigQuery pricing](https://cloud.google.com/bigquery/pricing)
- [BigQuery ML pricing](https://cloud.google.com/bigquery-ml/pricing)
- [Cloud Storage pricing](https://cloud.google.com/storage/pricing)

## The Dataset
The solution uses the public [GA4 Google Merchandise Store](https://console.cloud.google.com/bigquery?p=bigquery-public-data&d=ga4_obfuscated_sample_ecommerce) dataset.

Google Merchandise Store is an online store that sells Google-branded merchandise. The site uses Google Analytics 4's standard web e-commerce implementation along with enhanced measurement. The ga4_obfuscated_sample_ecommerce dataset is available through the BigQuery Public Datasets program and contains a sample of obfuscated BigQuery event export data for three months, from 2020-11-01 to 2021-01-31.


This dataset contains obfuscated data that emulates what a real-world dataset from an actual Google Analytics 4 implementation would look like. Certain fields contain placeholder values, including `<Other>`, `NULL`, and `" "`. Because of the obfuscation, the internal consistency of the dataset might be somewhat limited.


To explore the data in the BigQuery console, follow this [quick guide](https://developers.google.com/analytics/bigquery/web-ecommerce-demo-dataset#using_the_dataset).

You can check the schema details of the dataset [here](https://support.google.com/analytics/answer/7029846#zippy=)

The dataset has 23 columns with mixed data types and approximately 4 million rows. Each day's events are stored in a separate table, for a total of 92 daily tables.

## Exporting Google Analytics data to BigQuery
If you want to use data from your own GA4 property instead of the sample data, follow the instructions in [(GA4) Set up BigQuery Export](https://support.google.com/analytics/answer/9823238#zippy=%2Cin-this-article) to export it.

### Set up your local development environment

**If you are using Colab or Google Cloud Notebooks**, your environment already meets
all the requirements to run this notebook. You can skip this step.

**Otherwise**, make sure your environment meets this notebook's requirements.
You need the following:

* The Google Cloud SDK
* Python 3
* Jupyter notebook running in a virtual environment with Python 3

The Google Cloud guide to [Setting up a Python development
environment](https://cloud.google.com/python/setup) and the [Jupyter
installation guide](https://jupyter.org/install) provide detailed instructions
for meeting these requirements. The following steps provide a condensed set of
instructions:

1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)

1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)

1. [Install
   virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)
   and create a virtual environment that uses Python 3. Activate the virtual environment.

1. To install Jupyter, run `pip3 install jupyter` on the
command-line in a terminal shell.

1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.

1. Open this notebook in the Jupyter Notebook Dashboard.

### Install additional packages

Install additional package dependencies that are not already installed in your notebook environment, such as `plotly`, `pandas`, and `google-cloud-bigquery`. Use the latest major GA version of each package.

In [None]:
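import os

# The Google Cloud Notebook product has specific requirements
IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")

# Google Cloud Notebook requires dependencies to be installed with '--user'
USER_FLAG = ""
if IS_GOOGLE_CLOUD_NOTEBOOK:
    USER_FLAG = "--user"

# Install the libraries used in this notebook (db-dtypes is required by to_dataframe())
! pip3 install --upgrade --quiet google-cloud-bigquery db-dtypes pandas plotly $USER_FLAG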

### Restart the kernel

After you install the additional packages, you need to restart the notebook kernel so it can find the packages.

In [None]:
# Automatically restart the kernel after installs so that the new packages are found
import os

if not os.getenv("IS_TESTING"):
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

## Before you begin

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).


3. If you are running this notebook locally, you will need to install the [Cloud SDK](https://cloud.google.com/sdk).

4. Enter your project ID in the cell below. Then run the cell to make sure the
Cloud SDK uses the right project for all the commands in this notebook.

**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands.

#### Set your project ID

**If you don't know your project ID**, you may be able to get your project ID using `gcloud`.

In [None]:
import os

PROJECT_ID = ""

# Get your Google Cloud project ID from gcloud
if not os.getenv("IS_TESTING"):
    shell_output = !gcloud config list --format 'value(core.project)' 2>/dev/null
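    # Guard against an empty result when gcloud is missing or has no project configured
    PROJECT_ID = shell_output[0] if shell_output else ""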
    print("Project ID: ", PROJECT_ID)

Otherwise, set your project ID here.

In [None]:
if PROJECT_ID == "" or PROJECT_ID is None:
    PROJECT_ID = ""  # @param {type:"string"}

### Authenticate your Google Cloud account

**If you are using Google Cloud Notebooks**, your environment is already
authenticated. Skip this step.

In [None]:
import os
import sys

# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

# The Google Cloud Notebook product has specific requirements
IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")

# If on Google Cloud Notebooks, then don't execute this code
if not IS_GOOGLE_CLOUD_NOTEBOOK:
    if "google.colab" in sys.modules:
        from google.colab import auth as google_auth

        google_auth.authenticate_user()

    # If you are running this notebook locally, replace the string below with the
    # path to your service account key and run this cell to authenticate your GCP
    # account.
    elif not os.getenv("IS_TESTING"):
        %env GOOGLE_APPLICATION_CREDENTIALS ''

In [None]:
# Importing some important libraries that will be used during the notebook
import pandas as pd
import plotly.express as px
from google.cloud import bigquery

In [None]:
# Client manages connections to the BigQuery API and helps
# bundle configuration (project, credentials) needed for API requests.
client = bigquery.Client(PROJECT_ID)

# to make sure all columns are displayed while working with dataframe
pd.set_option("display.max_columns", None)
pd.set_option("display.max_colwidth", 50)

## Assumptions

## Exploratory Data Analysis (EDA)

Start by defining a few essential variables that you can change to match your data. It is generally better to use the most recent records in your data as features, so set `START_DATE` and `END_DATE` based on how recent your data is.

In this case, the date range is set for 3 months.

In [None]:
# Variables specific to the dataset (GA4 Google Merchandise Store).
# Change them to your dataset-specific values if you want to use the code with your own data.
# The code assumes that table names follow the "events_*" pattern.
PROJECT_ID_DATA = "bigquery-public-data"
DATASET_ID_DATA = "ga4_obfuscated_sample_ecommerce"
START_DATE = "20201101"
END_DATE = "20210131"  # the most recent 3 months of data
# The queries reference these variables, so you can point them at your own project,
# dataset, and dates with as few changes as possible. You don't need to change them
# for the public data, but defining them here keeps the queries editable.
# You can run (most of) the notebook on your own data by changing the values here.
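Before querying, it can help to sanity-check the date range. The Dataset section states that the public data has 92 daily tables for 2020-11-01 to 2021-01-31, so the sketch below lists the daily shard names implied by `START_DATE` and `END_DATE` and counts them. The `daily_shards` helper is hypothetical, repeats the date values so that it runs on its own, and assumes the `events_YYYYMMDD` table naming used by the GA4 export.

```python
from datetime import datetime, timedelta

START_DATE = "20201101"
END_DATE = "20210131"


def daily_shards(start: str, end: str, prefix: str = "events_") -> list[str]:
    """List the daily table names (events_YYYYMMDD) covered by an inclusive date range."""
    first = datetime.strptime(start, "%Y%m%d").date()
    last = datetime.strptime(end, "%Y%m%d").date()
    if last < first:
        raise ValueError("END_DATE must not be earlier than START_DATE")
    n_days = (last - first).days + 1
    return [f"{(first + timedelta(days=i)):%Y%m%d}".join([prefix, ""]) for i in range(n_days)]


shards = daily_shards(START_DATE, END_DATE)
print(len(shards), shards[0], shards[-1])
```

If the count does not match the number of tables in your dataset, either the dates contain a typo or some daily exports are missing.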

Start the data exploration by returning the first five rows of data.
The data has a separate events table for each day, so you can query all of them at once by using `events*` as a wildcard.

[GA4 Data Export Schema](https://support.google.com/analytics/answer/7029846#zippy=)

Note: BigQuery exports are, by default, [date-sharded tables](https://cloud.google.com/bigquery/docs/partitioned-tables#dt_partition_shard).
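The exploration queries in this notebook scan every shard matched by the wildcard. If you only want the tables between `START_DATE` and `END_DATE`, BigQuery wildcard tables expose the matched part of the table name as the `_TABLE_SUFFIX` pseudo-column. The sketch below builds such a query; `sharded_query` is a hypothetical helper, and it uses the `events_*` prefix so that `_TABLE_SUFFIX` is exactly the `YYYYMMDD` part (with `events*`, the suffix would start with an underscore).

```python
def sharded_query(project: str, dataset: str, start: str, end: str, limit: int = 5) -> str:
    """Build a query that reads only the daily shards between start and end (YYYYMMDD, inclusive)."""
    # Values are interpolated directly, as elsewhere in this notebook; only use trusted inputs.
    return f"""
SELECT
  *
FROM
  `{project}.{dataset}.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '{start}' AND '{end}'
LIMIT
  {limit}
"""


query = sharded_query("bigquery-public-data", "ga4_obfuscated_sample_ecommerce", "20201101", "20210131")
print(query)
```

You could run the resulting string with `client.query(query).to_dataframe()`, as the other cells do. Filtering on `_TABLE_SUFFIX` also limits which tables are scanned, which matters for on-demand query costs.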

In [None]:
query = f"""
SELECT
  *
FROM
  `{PROJECT_ID_DATA}.{DATASET_ID_DATA}.events*`
LIMIT
  5
"""
query_job = client.query(query)
top5_data = query_job.to_dataframe()
top5_data.head()

The first five rows show the composite structure of the tables' data types: numerical, categorical, array, and struct columns. Later, you will use this information to write specific `UNNEST` queries for [Arrays](https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#query_structs_in_an_array) and [Structs](https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#querying_array-type_fields_in_a_struct).
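`UNNEST` turns each element of an array into its own row, after which the fields of each struct can be selected as columns. If you pull nested data into pandas, `DataFrame.explode` plus `pd.json_normalize` is a rough local analogue for the `items` array of structs. The two events below are made up for illustration.

```python
import pandas as pd

# Two hypothetical GA4-style events; `items` is an array of structs (a list of dicts here).
events = pd.DataFrame(
    {
        "user_pseudo_id": ["u1", "u2"],
        "event_name": ["purchase", "add_to_cart"],
        "items": [
            [{"item_name": "Google Tee", "price": 20.0}, {"item_name": "Google Mug", "price": 10.0}],
            [{"item_name": "Google Cap", "price": 15.0}],
        ],
    }
)

# explode() emits one row per array element, similar to CROSS JOIN UNNEST(items) in BigQuery.
exploded = events.explode("items", ignore_index=True)
# json_normalize() turns each struct (dict) into its own columns, similar to selecting item.item_name.
item_cols = pd.json_normalize(exploded["items"].tolist())
flat = pd.concat([exploded.drop(columns="items"), item_cols], axis=1)
print(flat)
```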

Some columns also stand out as essential features, such as event_date, event_name, user_ltv, device, geo, traffic_source, platform, and items. However, you do not yet know their value distributions, availability, and data types.

You can check the data type of each column by using the `COLUMNS` view of [INFORMATION_SCHEMA](https://cloud.google.com/bigquery/docs/information-schema-tables), which provides detailed metadata about your columns.

In [None]:
query = f"""
SELECT DISTINCT
  column_name,
  data_type
FROM
  `{PROJECT_ID_DATA}.{DATASET_ID_DATA}.INFORMATION_SCHEMA.COLUMNS`
"""

query_job = client.query(query)
schema_df = query_job.to_dataframe()
schema_df

Start by getting a quick summary of the overall data: total events (event_count), total users (user_count), total days in the data (day_count), and total registered users of the platform (registered_user_id).
This gives you a sense of the scale of the data.

In [None]:
query = f"""

SELECT
  COUNT(*) AS event_count,
  COUNT(DISTINCT user_pseudo_id) AS user_count,
  COUNT(DISTINCT event_date) AS day_count,
  COUNT(DISTINCT user_id) AS registered_user_id
FROM
  `{PROJECT_ID_DATA}.{DATASET_ID_DATA}.events*`
"""
query_job = client.query(query)
summary_df = query_job.to_dataframe()
summary_df

As you can observe, there are roughly 4 million events with close to 270,000 users, stretched along 92 days of activity on the platform.

There is no registered-user data in the table. The user_pseudo_id is not a "user_id"; it is a client ID (cookie ID) for the user, so a single user can be represented by multiple pseudo IDs in the data.

For simplicity, we will assume that each user_pseudo_id represents a single unique user.

If your data has a `user_id` column, use it directly; otherwise, use `user_pseudo_id`.
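In SQL, the choice described above is `COALESCE(user_id, user_pseudo_id)`. The pandas sketch below does the same on a toy frame with made-up IDs, so that downstream code can group by a single column regardless of whether registered IDs exist; `user_key` is a hypothetical column name. For the public dataset, `user_id` is always empty, so every row falls back to `user_pseudo_id`.

```python
import pandas as pd

events = pd.DataFrame(
    {
        "user_id": [None, "crm_42", None],
        "user_pseudo_id": ["123.456", "789.012", "345.678"],
    }
)

# Prefer the registered user_id when present; otherwise fall back to the cookie-based user_pseudo_id.
events["user_key"] = events["user_id"].fillna(events["user_pseudo_id"])
print(events["user_key"].tolist())
```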


---
Next, look at the `event_name` distribution.

event_name is a significant column in this dataset. It contains all the events triggered as users interact with the Google Merchandise Store, such as page_view, scroll (scrolling the page), and view_item (viewing a specific item). Refer to the [GA4 events reference](https://developers.google.com/analytics/devguides/collection/ga4/reference/events) for a more detailed description of each event_name.


In [None]:
query = f"""
SELECT
  event_name,
  COUNT(*) as row_count
FROM
   `{PROJECT_ID_DATA}.{DATASET_ID_DATA}.events*`
GROUP BY
  event_name
ORDER BY
  row_count DESC
"""
query_job = client.query(query)
result_df = query_job.to_dataframe()
fig = px.bar(
    result_df, x="row_count", y="event_name", title="Event Name Frequency Distribution"
)
fig.show()

The frequencies of the different event names are highly imbalanced. The top five events by frequency are:

* page_view - User is viewing a page

* user_engagement - Sessions that last 10 seconds or longer

* scroll - User scrolling through a page

* view_item - A user viewed a specific item. You can use this to discover the most popular items.

* session_start - A user started a session by engaging with the site.


The other events have too few records to be useful as features on their own. Notice also that the typical purchase-funnel events ("add_to_cart", "begin_checkout", "add_shipping_info", "add_payment_info", and "purchase") have very few records, which indicates that the data contains few events where a user bought something.


So, `page_view` seems to be the best filter value for the `event_name` column, since it has the most records and covers users' general browsing behavior. However, you can still use the `add_to_cart` and `purchase` events for purchase information by counting each user's total events of these types.
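Counting each user's `add_to_cart` and `purchase` events turns sparse event types into usable per-user features. In BigQuery this is typically `COUNTIF(event_name = 'purchase')` grouped by user; the pandas sketch below shows the same reshaping on a toy event log (all values hypothetical) and keeps users with no such events as rows of zeros.

```python
import pandas as pd

events = pd.DataFrame(
    {
        "user_pseudo_id": ["u1", "u1", "u1", "u2", "u2", "u3"],
        "event_name": ["page_view", "add_to_cart", "purchase", "page_view", "add_to_cart", "page_view"],
    }
)

# One column per event type of interest, counting how often each user triggered it.
counts = (
    events[events["event_name"].isin(["add_to_cart", "purchase"])]
    .groupby(["user_pseudo_id", "event_name"])
    .size()
    .unstack(fill_value=0)
    # Reinstate users who never triggered these events, with zero counts.
    .reindex(events["user_pseudo_id"].unique(), fill_value=0)
)
print(counts)
```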

Also remember that the parameters of a `page_view` event are stored in the nested `event_params` array: each entry has a `key`, and its value is in `event_params.value.{int_value | float_value | double_value | string_value}`.
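The scalar-subquery pattern `(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')`, used throughout the feature queries, is a key lookup in a list of key/value structs. The sketch below mirrors it in plain Python for a single made-up event; `get_param` is a hypothetical helper and the parameter values are illustrative only. Note that BigQuery raises an error if such a subquery returns more than one row, whereas this sketch simply returns the first match.

```python
from typing import Any, Optional

# A hypothetical page_view event's event_params array, shaped like the GA4 export:
# each entry has a `key` and a `value` struct with typed slots.
event_params = [
    {"key": "ga_session_id", "value": {"int_value": 1609459200, "string_value": None}},
    {"key": "page_location", "value": {"int_value": None, "string_value": "https://shop.example.com/Apparel"}},
    {"key": "entrances", "value": {"int_value": 1, "string_value": None}},
]


def get_param(params: list, key: str, value_type: str = "string_value") -> Optional[Any]:
    """Mirror (SELECT value.<value_type> FROM UNNEST(event_params) WHERE key = '<key>')."""
    for param in params:
        if param["key"] == key:
            return param["value"].get(value_type)
    return None  # like the scalar subquery, a missing key yields NULL


print(get_param(event_params, "ga_session_id", "int_value"))
print(get_param(event_params, "page_location"))
```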

Data references:

[Dimensions & Metrics](https://support.google.com/analytics/topic/11151952?hl=en&ref_topic=9228654)


## Feature Engineering

Now that you have done some basic exploration of GA4 data, you can create different features for LTV.

Before doing that, create a BigQuery dataset named `ga4_ecomm_feature_set_ltv` in your project; the following cell does this for you. You can then create tables for the different kinds of features and store them in the dataset.

This retains the features for later use.

In [None]:
DATASET_NAME = "ga4_ecomm_feature_set_ltv"
feature_table = "ltv_features"
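try:
    # exists_ok=True makes re-running this cell safe if the dataset already exists
    dataset = client.create_dataset(DATASET_NAME, exists_ok=True, timeout=30)  # Make an API request.
    print("Created or found dataset {}.{}".format(client.project, dataset.dataset_id))
except Exception as e:
    print(e)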

Let's start building features that help models learn the user behavior needed to predict future LTV, based on the LTV values pre-computed in the GA4 dashboard. The goal is to go beyond the default formula and model that the dashboard uses for LTV. Refer to [this article](https://support.google.com/analytics/answer/9947257?hl=en) to learn more about the default LTV value.

Because the feature-building query is very long, it is split for readability into a base feature query and a final query with the feature aggregations (`GROUP BY`s).
The base query implements the core logic in several intermediate tables (CTEs). Once they are defined, the following query block computes the final aggregates and flags from those tables.
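In this notebook, the base CTEs live in a single string (`base_feature_tabel`, which ends with a trailing comma so that more CTEs and the final `SELECT` can be appended). One hypothetical way to keep a split query like this manageable is to store each CTE body separately and assemble the `WITH` clause programmatically. `build_with_query`, the CTE bodies, and the toy `events` table below are illustrative names, not part of the notebook.

```python
def build_with_query(ctes: dict[str, str], final_select: str) -> str:
    """Assemble named CTE bodies and a final SELECT into a single WITH query."""
    if not ctes:
        return final_select.strip()
    cte_sql = ",\n".join(f"{name} AS (\n{body.strip()}\n)" for name, body in ctes.items())
    return f"WITH {cte_sql}\n{final_select.strip()}"


ctes = {
    "engagement": "SELECT user_pseudo_id, COUNT(*) AS n FROM events GROUP BY user_pseudo_id",
    "pages_tracking": "SELECT user_pseudo_id, COUNT(*) AS pageviews FROM events GROUP BY user_pseudo_id",
}
final_select = """
SELECT e.user_pseudo_id, e.n, p.pageviews
FROM engagement AS e
LEFT JOIN pages_tracking AS p USING (user_pseudo_id)
"""
print(build_with_query(ctes, final_select))
```

Keeping each CTE as its own entry makes it easy to test or preview a single CTE before running the whole query.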

High-level features:

* engagement -> User engagement: sessions with session_engaged = 1.

* bounces -> User bounces: sessions with session_engaged = 0.

* returning_customers -> Returning customers: users with multiple (>= 2) purchase events.

* non_returning_customers -> Non-returning customers: users with a single purchase event or none.

* repeated_purchase -> Repeated-purchase flag: the number of purchases for returning customers, and 0 for everyone else.

* events_sequence -> Event sequence: each user's active dates, ordered by event_date.

* grouped_events_sequence -> Groups each user's event sequence into consecutive runs.

* user_events_counted -> User events counted: the number of days (count_of_days) and consecutive days (count_of_consecutive_days) on which the user was active.

* pages -> Pages visited as part of the page_view event. Multiple levels (the links accessed after the landing page) are tracked, covering the landing page, second page, and exit page:

  - pagepath_level_1
  - previous_page_path_level_1
  - landing_pagepath_level_1
  - second_pagepath_level_1
  - exit_pagepath_level_1
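The engagement and bounce features come from session-level data: a session counts as engaged when `session_engaged = 1` and as a bounce otherwise, and the rates divide by the user's distinct sessions on that date. The following pandas sketch mirrors the outer aggregation of the `engagement` CTE on made-up session rows; it assumes one row per session, as produced by the CTE's inner query.

```python
import pandas as pd

# Hypothetical session-level rows: one row per (user, date, session) with the
# session_engaged flag and total engagement time, like the engagement CTE's inner query.
sessions = pd.DataFrame(
    {
        "user_pseudo_id": ["u1", "u1", "u1", "u2"],
        "event_date": ["20210101", "20210101", "20210101", "20210101"],
        "session_id": [1, 2, 3, 7],
        "session_engaged": [1, 0, 1, 0],
        "engagement_time_msec": [12000, 0, 30400, None],
    }
)

features = sessions.groupby(["user_pseudo_id", "event_date"]).agg(
    total_sessions=("session_id", "nunique"),
    engaged_sessions=("session_engaged", "sum"),
    # sum() skips missing values, so an all-missing group yields 0 (like IFNULL(..., 0)).
    engagement_time_msec=("engagement_time_msec", "sum"),
)
features["bounces"] = features["total_sessions"] - features["engaged_sessions"]
# Each group has at least one session here, so these divisions are safe (SQL uses SAFE_DIVIDE).
features["engagement_rate"] = features["engaged_sessions"] / features["total_sessions"]
features["bounce_rate"] = features["bounces"] / features["total_sessions"]
features["engagement_time_seconds"] = (features["engagement_time_msec"] / 1000).round()
print(features)
```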


In [None]:
base_feature_tabel = f"""CREATE OR REPLACE TABLE `{DATASET_NAME}.{feature_table}` AS
with engagement as (
select
   user_pseudo_id,
   event_date,
   SAFE_DIVIDE(
       count(distinct case when session_engaged = 1 then concat(user_pseudo_id,session_id) end),
       COUNT(DISTINCT session_id)
   ) AS engagement_rate,
   count(distinct case when session_engaged = 1 then concat(user_pseudo_id,session_id) end) as engaged_sessions,
   count(distinct case when session_engaged = 0 then concat(user_pseudo_id,session_id) end) as bounces,
   SAFE_DIVIDE(
       count(distinct case when session_engaged = 0 then concat(user_pseudo_id,session_id) end),
       COUNT(DISTINCT session_id)
   ) as bounce_rate,
   COUNT(DISTINCT session_id) AS total_sessions,
   IFNULL(round(sum(engagement_time_msec)/1000),0) as engagement_time_seconds,
from (
   select
       user_pseudo_id,
       event_date,
       (select value.int_value from unnest(event_params) where key = 'ga_session_id') as session_id,
       IFNULL(max((select value.int_value from unnest(event_params) where key = 'session_engaged')), 0) as session_engaged,
       max((select value.int_value from unnest(event_params) where key = 'engagement_time_msec')) as engagement_time_msec
   from
       `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events*`
   WHERE parse_date("%Y%m%d",event_date) >= parse_date("%Y-%m-%d", '2021-01-01')
   group by
       user_pseudo_id,
       event_date,
       session_id)
   group by user_pseudo_id, event_date),

returning_customers as (
SELECT user_pseudo_id, event_date, MAX(unique_purchase) as unique_purchase
   FROM (
       SELECT
           user_pseudo_id,
           event_date,
           RANK() OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp ASC)
               AS unique_purchase
       FROM
           `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events*` AS GA
       WHERE event_name = 'purchase'
       AND parse_date("%Y%m%d",event_date) >= parse_date("%Y-%m-%d", '2021-01-01')
       GROUP BY user_pseudo_id, event_date, event_timestamp
   )
   WHERE unique_purchase >= 2
   GROUP BY user_pseudo_id, event_date
),

non_returning_customers as (
SELECT
       user_pseudo_id,
       event_date,
   FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events*` AS GA
   WHERE user_pseudo_id NOT IN (SELECT user_pseudo_id FROM returning_customers)
   AND parse_date("%Y%m%d",event_date) >= parse_date("%Y-%m-%d", '2021-01-01')
   GROUP BY user_pseudo_id, event_date
),

combined as (
 SELECT user_pseudo_id, event_date, unique_purchase
   FROM returning_customers
   UNION ALL
   SELECT user_pseudo_id, event_date, -1
   FROM non_returning_customers
   GROUP BY user_pseudo_id, event_date
),

repeated_purchase as (
 SELECT
   user_pseudo_id,
   event_date,
   CASE
       WHEN unique_purchase >= 0
       THEN unique_purchase ELSE 0 END AS has_repeated_purchase
 FROM
   combined
),

events_sequence AS (
   SELECT
     ROW_NUMBER() OVER () AS rownumber,
     ROW_NUMBER() OVER (PARTITION BY user_pseudo_id ORDER BY event_date) AS rownumber_by_user,
     user_pseudo_id,
     event_date
   FROM (
     SELECT DISTINCT
           user_pseudo_id,
           event_date,
       FROM
           `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events*` AS GA
       WHERE
       parse_date("%Y%m%d",event_date) >= parse_date("%Y-%m-%d", '2021-01-01')
       GROUP BY user_pseudo_id, event_date
   )
   ORDER BY user_pseudo_id, event_date
),

grouped_events_sequence AS (
 SELECT user_pseudo_id,
        event_date,
        DENSE_RANK() OVER (ORDER BY rownumber) - DENSE_RANK() OVER (PARTITION BY user_pseudo_id ORDER BY rownumber) AS car_group,
        rownumber_by_user
 FROM events_sequence
),

user_events_counted AS (
SELECT DISTINCT user_pseudo_id,
      count_of_consecutive_days,
      MAX(count_of_days) as count_of_days
FROM
(SELECT user_pseudo_id,
       COUNT(1) AS count_of_consecutive_days,
       MAX(rownumber_by_user) AS count_of_days
FROM grouped_events_sequence
GROUP BY user_pseudo_id, car_group
HAVING COUNT(1) >= 1)
GROUP BY user_pseudo_id, count_of_consecutive_days
),

pages as (
select
   user_pseudo_id,
   (select value.int_value from unnest(event_params) where event_name = 'page_view' and key = 'ga_session_id') as session_id,
   event_timestamp,
   event_date,
   event_name,
   (select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location') as page,
   lag((select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location'), 1) over (partition by user_pseudo_id,(select value.int_value from unnest(event_params) where event_name = 'page_view' and key = 'ga_session_id') order by event_timestamp asc) as previous_page,
   case when split(split((select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location'),'/')[safe_ordinal(4)],'?')[safe_ordinal(1)] = '' then null else concat('/',split(split((select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location'),'/')[safe_ordinal(4)],'?')[safe_ordinal(1)]) end as pagepath_level_1,
   case when split(split(lag((select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location'), 1) over (partition by user_pseudo_id,(select value.int_value from unnest(event_params) where event_name = 'page_view' and key = 'ga_session_id') order by event_timestamp asc),'/')[safe_ordinal(4)],'?')[safe_ordinal(1)] = '' then null else concat('/',split(split(lag((select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location'), 1) over (partition by user_pseudo_id,(select value.int_value from unnest(event_params) where event_name = 'page_view' and key = 'ga_session_id') order by event_timestamp asc),'/')[safe_ordinal(4)],'?')[safe_ordinal(1)]) end as previous_page_path_level_1,
   (select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_title') as page_title,
   case when (select value.int_value from unnest(event_params) where event_name = 'page_view' and key = 'entrances') = 1 then (select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location') end as landing_page,
   case when split(split((case when (select value.int_value from unnest(event_params) where event_name = 'page_view' and key = 'entrances') = 1 then (select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location') END),'/')[safe_ordinal(4)],'?')[safe_ordinal(1)] = '' then null else concat('/',split(split((case when (select value.int_value from unnest(event_params) where event_name = 'page_view' and key = 'entrances') = 1 then (select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location') END),'/')[safe_ordinal(4)],'?')[safe_ordinal(1)]) end as landing_pagepath_level_1,
   case when (select value.int_value from unnest(event_params) where event_name = 'page_view' and key = 'entrances') = 1 then lead((select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location'), 1) over (partition by user_pseudo_id,(select value.int_value from unnest(event_params) where event_name = 'page_view' and key = 'ga_session_id') order by event_timestamp asc) else null end as second_page,
   case when split(split((case when (select value.int_value from unnest(event_params) where event_name = 'page_view' and key = 'entrances') = 1 then lead((select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location'), 1) over (partition by user_pseudo_id,(select value.int_value from unnest(event_params) where event_name = 'page_view' and key = 'ga_session_id') order by event_timestamp asc) else null end),'/')[safe_ordinal(4)],'?')[safe_ordinal(1)] = '' then null else concat('/',split(split((case when (select value.int_value from unnest(event_params) where event_name = 'page_view' and key = 'entrances') = 1 then lead((select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location'), 1) over (partition by user_pseudo_id,(select value.int_value from unnest(event_params) where event_name = 'page_view' and key = 'ga_session_id') order by event_timestamp asc) else null end),'/')[safe_ordinal(4)],'?')[safe_ordinal(1)]) end as second_pagepath_level_1,
   case when (select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location') = first_value((select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location')) over (partition by user_pseudo_id,(select value.int_value from unnest(event_params) where event_name = 'page_view' and key = 'ga_session_id') order by event_timestamp desc) then ( select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location') else null end as exit_page,
   case when split(split((case when (select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location') = first_value((select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location')) over (partition by user_pseudo_id,(select value.int_value from unnest(event_params) where event_name = 'page_view' and key = 'ga_session_id') order by event_timestamp desc) then ( select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location') else null end),'/')[safe_ordinal(4)],'?')[safe_ordinal(1)] = '' then null else concat('/',split(split((case when (select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location') = first_value((select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location')) over (partition by user_pseudo_id,(select value.int_value from unnest(event_params) where event_name = 'page_view' and key = 'ga_session_id') order by event_timestamp desc) then ( select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location') else null end),'/')[safe_ordinal(4)],'?')[safe_ordinal(1)]) end as exit_pagepath_level_1,
from
   -- change this to your google analytics 4 export location in bigquery
   `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events*`
where
   event_name = 'page_view'
   AND parse_date("%Y%m%d",event_date) >= parse_date("%Y-%m-%d", '2021-01-01')),

pages_tracking as (
select
   user_pseudo_id,
   event_date,
   -- page (dimension | a page on the website specified by path and/or query parameters)
   page,
   -- page path level 1 (dimension | this dimension rolls up all the page paths in the first hierarchical level)
   MAX(pagepath_level_1) as pagepath_level_1,
   -- previous page path (dimension | a page visited before another page on the same property)
   MAX(previous_page) as previous_page,
   MAX(previous_page_path_level_1) as previous_page_path_level_1,
   -- landing page (dimension | the first page in users' sessions)
   landing_page,
   MAX(landing_pagepath_level_1) as landing_pagepath_level_1,
   -- second page (dimension | the second page in users' sessions)
   MAX(second_page) as second_page,
   MAX(second_pagepath_level_1) as second_pagepath_level_1,
   -- exit page (dimension | the last page in users' sessions)
   exit_page,
   MAX(exit_pagepath_level_1) as exit_pagepath_level_1,
   -- entrances (metric | the number of entrances to the property measured as the first pageview in a session)
   count(landing_page) as entrances,
   -- pageviews (metric | the total number of pageviews for the property)
   count(page) as pageviews,
   -- unique pageviews (metric | the number of sessions during which the specified page was viewed at least once, a unique pageview is counted for each page url + page title combination)
   count(distinct concat(page,page_title,session_id)) as unique_pageviews,
   -- pages / session (metric | the average number of pages viewed during a session, including repeated views of a single page)
   count(page) / count(distinct session_id) as pages_per_session,
   -- exits (metric | the number of exits from the property)
   count(exit_page) as exits,
   -- exit % (metric | the percentage of exits from the property that occurred out of the total pageviews)
   count(exit_page) / count(page) as exit_rate
from
   pages
group by
   user_pseudo_id,
   event_date,
   page,
   page_title,
   landing_page,
   exit_page
),"""
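The page metrics in `pages_tracking` are simple ratios over the rows of each `GROUP BY` group. The plain-Python sketch below reproduces those aggregates for a single group. It is for illustration only: the `PageViewRow` type, the `page_metrics` helper and the sample rows are hypothetical. SQL `COUNT(col)` skips NULLs, and the sketch mirrors that by skipping `None`.

```python
from typing import List, Optional, TypedDict


class PageViewRow(TypedDict):
    session_id: Optional[str]
    page: Optional[str]
    page_title: Optional[str]
    exit_page: Optional[str]


def page_metrics(rows: List[PageViewRow]) -> dict:
    """Reproduce the pages_tracking aggregates for one GROUP BY group (hypothetical helper)."""
    # COUNT(page): SQL COUNT skips NULLs, so skip None here.
    pageviews = sum(1 for r in rows if r["page"] is not None)
    # COUNT(DISTINCT session_id)
    sessions = {r["session_id"] for r in rows if r["session_id"] is not None}
    # COUNT(DISTINCT CONCAT(page, page_title, session_id)); CONCAT returns NULL if any
    # argument is NULL. A tuple replaces string concatenation to avoid accidental collisions.
    unique_pageviews = len(
        {
            (r["page"], r["page_title"], r["session_id"])
            for r in rows
            if None not in (r["page"], r["page_title"], r["session_id"])
        }
    )
    # COUNT(exit_page)
    exits = sum(1 for r in rows if r["exit_page"] is not None)
    return {
        "pageviews": pageviews,
        "unique_pageviews": unique_pageviews,
        # Like BigQuery's `/` operator, these raise on division by zero.
        "pages_per_session": pageviews / len(sessions),
        "exits": exits,
        "exit_rate": exits / pageviews,
    }


rows: List[PageViewRow] = [
    {"session_id": "s1", "page": "/home", "page_title": "Home", "exit_page": None},
    {"session_id": "s1", "page": "/home", "page_title": "Home", "exit_page": "/home"},
    {"session_id": "s2", "page": "/home", "page_title": "Home", "exit_page": "/home"},
    {"session_id": "s2", "page": "/cart", "page_title": "Cart", "exit_page": None},
]
print(page_metrics(rows))
```

In this made-up group, four pageviews spread over two sessions give two pages per session. Two of the four rows are exits, so the exit rate is 0.5.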

Final features (grouped per user and event date):

* Date-specific features: year, month_of_the_year, week_of_the_year, day_of_the_month, day_of_week, hour

* Counts of user event footprints: count_of_days, count_of_consecutive_days, count_total_events, count_of_sessions, count_view_item, count_add_to_cart_item, count_select_item, count_begin_checkout_item

* Whether the user is visiting for the first time: is_first_visit

* Purchase- and promotion-related features: has_purchased, had_purchased_before, has_promotion, has_added_payment_info, has_added_shipping_info

* Engagement and bounce numbers from the feature base tables: engagement_rate, bounce_rate, bounces, engagement_time_seconds, total_sessions

* Page-path trackers: pagepath_level_1, previous_page_path_level_1, landing_pagepath_level_1, second_pagepath_level_1

* Page-view metrics: entrances, pageviews, unique_pageviews, pages_per_session, exits, exit_rate, new_or_returning_visitor

* Campaign and traffic medium and source: campaign, traffic_medium, traffic_source

* Number of engaged-session events: engaged_session_event

* The lifetime revenue that GA4 exports for the user (user_ltv.revenue). This is our target (label) variable: ltv_revenue

* Device category and average item quantity: category, avg_item_quantity

* Device-specific information for the user: mobile_brand_name, operating_system, operating_system_version, browser, browser_version

* User location: continent, sub_continent, country, region, city
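The query derives the date features from `event_date` (a `YYYYMMDD` string) and `event_timestamp` (microseconds since the Unix epoch) using `format_date` and `extract`. The Python sketch below performs the same mapping for a single event. It assumes UTC for the hour, because BigQuery's `EXTRACT(HOUR FROM timestamp)` uses UTC unless a time zone is given. Python's `%U` and `%w` directives follow the same Sunday-based conventions as BigQuery's `format_date`. The function name `date_features` is hypothetical.

```python
from datetime import datetime, timezone


def date_features(event_date: str, event_timestamp_micros: int) -> dict:
    """Derive the date features from a GA4 event_date (YYYYMMDD) and event_timestamp."""
    day = datetime.strptime(event_date, "%Y%m%d")
    # GA4 event_timestamp is in microseconds; BigQuery's EXTRACT(HOUR ...) defaults to UTC.
    ts = datetime.fromtimestamp(event_timestamp_micros // 1_000_000, tz=timezone.utc)
    return {
        "year": day.year,
        "month_of_the_year": day.month,
        # %U: week of the year, Sunday as first day; days before the first Sunday are week 0.
        "week_of_the_year": int(day.strftime("%U")),
        "day_of_the_month": day.day,
        # %w: day of the week, 0 = Sunday ... 6 = Saturday.
        "day_of_week": int(day.strftime("%w")),
        "hour": ts.hour,
    }


print(date_features("20210131", 1_612_101_600_000_000))
```

January 31, 2021 was a Sunday, so `day_of_week` is 0. It was also the fifth Sunday of 2021, so `week_of_the_year` is 5.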


In [None]:
grouped_feature_query = """master_ga4_erp as (
SELECT
A.user_pseudo_id,
MAX(parse_date("%Y%m%d",A.event_date)) as event_date,
MAX(
 CASE
 WHEN ecommerce.purchase_revenue > 0
 THEN 1
 ELSE 0
END) AS has_purchased,
MAX(Repeated_Purchased.has_repeated_purchase) as had_purchased_before,
MAX(CAST(format_date('%Y',parse_date("%Y%m%d",A.event_date)) as INT64)) as year,
MAX(CAST(format_date('%m',parse_date("%Y%m%d",A.event_date)) as INT64)) as month_of_the_year,
MAX(CAST(format_date('%U',parse_date("%Y%m%d",A.event_date)) as INT64)) as week_of_the_year,
MAX(CAST(format_date('%d',parse_date("%Y%m%d",A.event_date)) as INT64)) as day_of_the_month,
MAX(CAST(format_date('%w',parse_date("%Y%m%d",A.event_date)) as INT64)) as day_of_week,
MAX(CAST(format("%02d",extract(hour from timestamp_micros(A.event_timestamp))) as INT64)) as hour,
MAX(User_Events_Counted.count_of_days) as count_of_days,
MAX(User_Events_Counted.count_of_consecutive_days) as count_of_consecutive_days,
COUNT(DISTINCT A.event_timestamp) as count_total_events,
SUM(CASE WHEN REGEXP_CONTAINS(event_name, '(?i)purchase') THEN 1 ELSE 0 END) as count_item_purchases,
SUM(CASE WHEN (REGEXP_CONTAINS(event_name, '(?i)session_start') AND ep.key = 'ga_session_number') THEN  ep.value.int_value ELSE 0 END) as count_of_sessions,
SUM(CASE WHEN REGEXP_CONTAINS(event_name, '(?i)view_item') THEN 1 ELSE 0 END) as count_view_item,
SUM(CASE WHEN REGEXP_CONTAINS(event_name, '(?i)add_to_cart') THEN 1 ELSE 0 END) as count_add_to_cart_item,
SUM(CASE WHEN REGEXP_CONTAINS(event_name, '(?i)select_item') THEN 1 ELSE 0 END) as count_select_item,
SUM(CASE WHEN REGEXP_CONTAINS(event_name, '(?i)begin_checkout') THEN 1 ELSE 0 END) as count_begin_checkout_item,
SUM(CASE WHEN REGEXP_CONTAINS(event_name, '(?i)first_visit') THEN 1 ELSE 0 END) as is_first_visit,
SUM(CASE WHEN REGEXP_CONTAINS(event_name, '(?i)select_promotion') THEN 1 ELSE 0 END) as has_promotion,
SUM(CASE WHEN REGEXP_CONTAINS(event_name, '(?i)add_payment_info') THEN 1 ELSE 0 END) as has_added_payment_info,
SUM(CASE WHEN REGEXP_CONTAINS(event_name, '(?i)add_shipping_info') THEN 1 ELSE 0 END) as has_added_shipping_info,
IFNULL(MAX(Engagement.engagement_rate), 0) AS engagement_rate,
IFNULL(MAX(Engagement.bounce_rate), 0) AS bounce_rate,
IFNULL(MAX(Engagement.bounces), 0) AS bounces,
IFNULL(MAX(Engagement.engagement_time_seconds), 0) AS engagement_time_seconds,
IFNULL(MAX(Engagement.total_sessions), 0) AS total_sessions,
MAX(Pages_Tracking.page) as page,
MAX(Pages_Tracking.pagepath_level_1) as pagepath_level_1,
MAX(Pages_Tracking.previous_page) as previous_page,
MAX(Pages_Tracking.previous_page_path_level_1) as previous_page_path_level_1,
MAX(Pages_Tracking.landing_page) as landing_page,
MAX(Pages_Tracking.landing_pagepath_level_1) as landing_pagepath_level_1,
MAX(Pages_Tracking.second_page) as second_page,
MAX(Pages_Tracking.second_pagepath_level_1) as second_pagepath_level_1,
MAX(Pages_Tracking.exit_page) exit_page,
MAX(Pages_Tracking.exit_pagepath_level_1) as exit_pagepath_level_1,
MAX(Pages_Tracking.entrances) as entrances,
MAX(Pages_Tracking.pageviews) as pageviews,
MAX(Pages_Tracking.unique_pageviews) as unique_pageviews,
MAX(Pages_Tracking.pages_per_session) as pages_per_session,
MAX(Pages_Tracking.exits) as exits,
MAX(Pages_Tracking.exit_rate) as exit_rate,
MAX(CASE ep.key
   WHEN "tax" THEN CAST(ep.value.double_value AS STRING)
   END)
AS tax,
MAX(CASE ep.key
   WHEN "ga_session_id" THEN CAST(ep.value.int_value AS STRING)
   END)
AS ga_session_id,
CASE WHEN MAX(CASE ep.key WHEN "ga_session_number" THEN ep.value.int_value END) = 1 THEN 'new' ELSE 'return' END
AS new_or_returning_visitor,
#MAX(CASE ep.key
#    WHEN "engagement_time_msec" THEN CAST(ep.value.int_value AS STRING)
#    END)
#AS engagement_time_msec,
MAX(CASE ep.key
   WHEN "shipping_tier" THEN CAST(ep.value.string_value AS STRING)
   END)
AS shipping_tier,
MAX(CASE ep.key
   WHEN "coupon" THEN CAST(ep.value.string_value AS STRING)
   END)
AS coupon,
MAX(CASE ep.key
   WHEN "promotion_name" THEN CAST(ep.value.string_value AS STRING)
   END)
AS promotion_name,
MAX(CASE ep.key
   WHEN "payment_type" THEN CAST(ep.value.string_value AS STRING)
   END)
AS payment_type,
MAX(CASE ep.key
   WHEN "campaign" THEN CAST(ep.value.string_value AS STRING)
   END)
AS campaign,
COUNT(CASE ep.key
   WHEN "engaged_session_event" THEN CAST(ep.value.int_value AS STRING)
   END)
AS engaged_session_event,
MAX(A.event_value_in_usd) as event_value_in_usd,
MAX(timestamp_micros(A.user_first_touch_timestamp)) as user_first_touch_timestamp,
MAX(A.user_ltv.revenue) as ltv_revenue,
MAX(A.device.category) as category,
MAX(A.device.mobile_brand_name) as mobile_brand_name,
MAX(A.device.operating_system) as operating_system,
MAX(A.device.operating_system_version) as operating_system_version,
MAX(A.device.web_info.browser) as browser,
MAX(A.device.web_info.browser_version) as browser_version,
MAX(A.geo.continent) as continent,
MAX(A.geo.sub_continent) as sub_continent,
MAX(A.geo.country) as country,
MAX(A.geo.region) as region,
MAX(A.geo.city) as city,
MAX(A.traffic_source.medium) as traffic_medium,
MAX(A.traffic_source.source) as traffic_source,
CAST(ROUND(AVG(it.quantity)) AS INT64) as avg_item_quantity,
FROM (
 select e.* EXCEPT(event_params), ep
 FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events*` e , UNNEST(event_params) ep
) A, UNNEST(A.items) it
LEFT JOIN repeated_purchase as Repeated_Purchased
ON A.user_pseudo_id = Repeated_Purchased.user_pseudo_id
LEFT JOIN engagement AS Engagement
ON A.user_pseudo_id = Engagement.user_pseudo_id AND A.event_date = Engagement.event_date
LEFT JOIN user_events_counted as User_Events_Counted
ON A.user_pseudo_id = User_Events_Counted.user_pseudo_id
LEFT JOIN pages_tracking AS Pages_Tracking
ON A.user_pseudo_id = Pages_Tracking.user_pseudo_id AND A.event_date = Pages_Tracking.event_date
WHERE
ep.key IN ('tax', 'ga_session_id', 'ga_session_number',
               'engagement_time_msec', 'shipping_tier',
               'coupon', 'promotion_name', 'payment_type',
               'page_location', 'campaign', 'engaged_session_event')
AND parse_date("%Y%m%d",A.event_date) >= parse_date("%Y-%m-%d", '2021-01-01')
GROUP BY A.user_pseudo_id, A.event_date)

SELECT
*
EXCEPT(page, previous_page, landing_page, second_page,
        exit_page, exit_pagepath_level_1, ga_session_id,
        event_value_in_usd, user_first_touch_timestamp, count_item_purchases,
        tax, shipping_tier, coupon, promotion_name, payment_type)
        #store_code, store_name, transaction_channel,
        #engagement_time_msec
        #item_category5, google_product_category_path, product_gtin)
FROM master_ga4_erp

"""

Combine both queries into a single statement and execute it:

In [None]:
final_feature_table = f"""
{base_feature_tabel}{grouped_feature_query}
"""
# print(final_feature_table)
query_job = client.query(final_feature_table)

Building the feature table takes approximately 5 minutes. Until the table exists, the following cell raises an error, so wait and retry. If the error persists after a while, the query may have failed; run it in the BigQuery console to see the error message.
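You can automate the "wait and retry" advice with a small polling helper. The sketch below is generic: `wait_until_ready` is a hypothetical name, and it calls any function until that function stops raising a given exception type. The notebook usage in the trailing comment assumes `client`, `DATASET_NAME` and `feature_table` from earlier cells. It also relies on `client.get_table` raising `google.api_core.exceptions.NotFound` while the table is missing.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def wait_until_ready(
    fetch: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 10,
    delay_seconds: float = 30.0,
) -> T:
    """Call fetch() until it stops raising one of `retry_on`; re-raise after the last attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts:
                raise  # Still failing: the CREATE TABLE query has probably failed.
            time.sleep(delay_seconds)
    raise AssertionError("unreachable")


# Possible notebook usage (assumes objects defined in earlier cells):
# from google.api_core.exceptions import NotFound
# wait_until_ready(lambda: client.get_table(f"{DATASET_NAME}.{feature_table}"), (NotFound,))
```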

In [None]:
query = f"""

SELECT
  *
FROM
  `{DATASET_NAME}.{feature_table}`
LIMIT
5
"""

query_job = client.query(query)
result_df = query_job.to_dataframe()
result_df.head()

## BQML Modeling

Once we have all the features, we can train the BQML model. We don't need to set specific hyperparameters; instead, we can rely on BQML's automatic hyperparameter tuning. As discussed, our target variable is "ltv_revenue". We want the model to learn the relationship between the user-specific features and "ltv_revenue", so that we can use it to predict future LTV values for customers. The model can also produce predictions for users whose default LTV (shown in the dashboard) is zero.

We can also use BQML's explainability features by setting enable_global_explain=TRUE. This gives us a ranking of the features by their attribution (importance). Combined with local explanations from ML.EXPLAIN_PREDICT, it also helps us understand why each prediction was made and which columns contributed most to the predicted value.

In [None]:
model_name = "customer_ltv_model"
num_trials = 8  # number of hyperparameter-tuning trials

linear_regression_query = f"""
CREATE OR REPLACE MODEL
  `{DATASET_NAME}.{model_name}` OPTIONS (model_type='linear_reg',
    input_label_cols=['ltv_revenue'],num_trials={num_trials},
   max_parallel_trials=4,
   enable_global_explain=TRUE) AS
SELECT
  *
FROM
  `{PROJECT_ID}.{DATASET_NAME}.{feature_table}`
WHERE
  ltv_revenue IS NOT NULL
"""
# print(linear_regression_query)
query_job = client.query(linear_regression_query)

In [None]:
model_name = "customer_ltv_model"
ml_evaluate_query = f"""
SELECT
  *
FROM
  ML.EVALUATE(MODEL `{DATASET_NAME}.{model_name}`,
    (
    SELECT
      *
    FROM
      `{PROJECT_ID}.{DATASET_NAME}.{feature_table}`
    WHERE
      ltv_revenue IS NOT NULL))
"""
query_job = client.query(ml_evaluate_query)
ml_info_df = query_job.to_dataframe()
ml_info_df

Once tuning finishes, we can check the evaluation results. The selected model has an r2_score of about 0.71 and an explained_variance of about 0.72. These are not great numbers, but they are a reasonable start. They are also computed on the same data used for training, so they are likely optimistic. We only have three months of user data, so expectations for accuracy should be modest. As more data flows in, we expect the model to explain considerably more of the variance (ideally over 90%) and give us a more reliable LTV.
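The two metrics above are closely related. The sketch below uses the standard textbook definitions, and we assume BQML's `r2_score` and `explained_variance` follow them. Explained variance ignores a constant offset in the residuals, so it is always at least as large as r2. The gap equals the squared mean residual divided by the variance of the label. Under that assumption, a 0.72 versus 0.71 gap would suggest the predictions are slightly biased on average. The input values are made up.

```python
from statistics import fmean, pvariance
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = fmean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def explained_variance(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 - Var(residuals) / Var(y); unlike r2, a constant bias in the residuals is ignored."""
    residuals = [y - p for y, p in zip(y_true, y_pred)]
    return 1.0 - pvariance(residuals) / pvariance(y_true)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.5]  # made-up predictions with a small average bias
print(r2_score(y_true, y_pred), explained_variance(y_true, y_pred))
```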

Because we asked BQML to run several tuning trials, we can inspect the hyperparameters of each trial and choose a different one if we want to. By default, BQML uses the trial marked is_optimal = TRUE, which is the best trial according to the tuning objective.
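The sketch below shows how you might choose a trial from the `ML.TRIAL_INFO` results after converting the DataFrame with `to_dict("records")`. It is a minimal example under stated assumptions. The column names (`trial_id`, `status`, `eval_loss`, `is_optimal`) and the `"SUCCEEDED"` status value are assumed from the `ML.TRIAL_INFO` output, and the trial records are hypothetical. Picking the lowest `eval_loss` is an alternative criterion, not BQML's default.

```python
from typing import Any, Dict, List, Optional


def pick_trial(trials: List[Dict[str, Any]], use_optimal: bool = True) -> Optional[Dict[str, Any]]:
    """Return the is_optimal trial, or else the successful trial with the lowest eval_loss."""
    if use_optimal:
        return next((t for t in trials if t["is_optimal"]), None)
    succeeded = [
        t for t in trials if t["status"] == "SUCCEEDED" and t["eval_loss"] is not None
    ]
    return min(succeeded, key=lambda t: t["eval_loss"], default=None)


# Hypothetical ML.TRIAL_INFO rows.
trials = [
    {"trial_id": 1, "status": "SUCCEEDED", "eval_loss": 5.0, "is_optimal": True},
    {"trial_id": 2, "status": "SUCCEEDED", "eval_loss": 4.0, "is_optimal": False},
    {"trial_id": 3, "status": "FAILED", "eval_loss": None, "is_optimal": False},
]
print(pick_trial(trials)["trial_id"], pick_trial(trials, use_optimal=False)["trial_id"])
```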

In [None]:
ml_trial_info_query = f"""
SELECT
  *
FROM
  ML.TRIAL_INFO(MODEL `{DATASET_NAME}.{model_name}`)
"""
query_job = client.query(ml_trial_info_query)
ml_trial_df = query_job.to_dataframe()
ml_trial_df

## Batch Prediction

Now that we have the final model, we can use it to predict new LTV values, which we treat as more comprehensive, forward-looking estimates. Note that we have not split the data into training and test sets ourselves; the whole feature set is used as training data. This is not an ideal setup, because the features we send to the model for prediction are the same ones it was trained on. The predicted LTV revenue is treated as the future value for each user.
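One common way to hold out a test set is to hash a stable ID, so that each user always lands in the same split. The Python sketch below illustrates the idea; `is_training_user` and the 80/20 ratio are hypothetical choices. In BigQuery, a comparable filter could be written with `FARM_FINGERPRINT`, for example `MOD(ABS(FARM_FINGERPRINT(user_pseudo_id)), 100) < 80`. The sketch uses SHA-256 instead, so its buckets will not match BigQuery's.

```python
import hashlib


def is_training_user(user_pseudo_id: str, train_fraction: float = 0.8) -> bool:
    """Deterministically assign a user to train (True) or test (False) by hashing the ID."""
    digest = hashlib.sha256(user_pseudo_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in [0, 100)
    return bucket < train_fraction * 100


ids = [f"user{i}" for i in range(1000)]
train_share = sum(is_training_user(u) for u in ids) / len(ids)
print(f"train share: {train_share:.3f}")  # close to 0.8
```

Hashing the user ID, rather than sampling rows at random, keeps all of a user's daily rows on the same side of the split. This avoids leaking a user's data from training into evaluation.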

Pay particular attention to users whose ltv_revenue is zero but who received a positive prediction. The model considers these users to have some potential, so they are good candidates for more targeted marketing.

Some users with a zero base value receive negative predictions instead. These can be treated as zero, meaning the model expects no future revenue from them.
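The post-processing described above can be sketched in plain Python. The sketch clips negative predictions to zero and flags zero-LTV users who receive a positive prediction. `postprocess_predictions`, the `untapped_potential` field and the sample rows are hypothetical. On the `predict_data` DataFrame, the clipping step corresponds to `predict_data["predicted_ltv_revenue"].clip(lower=0)`.

```python
from typing import Any, Dict, List


def postprocess_predictions(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Clip negative predictions to zero and flag zero-LTV users the model sees potential in."""
    out = []
    for r in rows:
        predicted = max(0.0, r["predicted_ltv_revenue"])  # negative -> no expected revenue
        out.append(
            {
                "user_pseudo_id": r["user_pseudo_id"],
                "ltv_revenue": r["ltv_revenue"],
                "predicted_ltv_revenue": predicted,
                # Candidates for targeted marketing: no revenue yet, positive prediction.
                "untapped_potential": r["ltv_revenue"] == 0 and predicted > 0,
            }
        )
    return out


sample = [
    {"user_pseudo_id": "a", "ltv_revenue": 0.0, "predicted_ltv_revenue": 12.5},
    {"user_pseudo_id": "b", "ltv_revenue": 0.0, "predicted_ltv_revenue": -3.0},
    {"user_pseudo_id": "c", "ltv_revenue": 40.0, "predicted_ltv_revenue": 35.0},
]
for row in postprocess_predictions(sample):
    print(row)
```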

In [None]:
prediction_data_table_name = "model_prediction_ltv"
query = f"""
CREATE OR REPLACE TABLE
  {DATASET_NAME}.{prediction_data_table_name} AS
SELECT
  *
FROM
  ML.PREDICT(MODEL `{DATASET_NAME}.{model_name}`,
    (
    SELECT
      *
    FROM
      `{PROJECT_ID}.{DATASET_NAME}.{feature_table}`
    WHERE
      ltv_revenue IS NOT NULL
    ))
"""
# print(query)
query_job = client.query(query)

In [None]:
query = f"""
SELECT
  *
FROM
  `{PROJECT_ID}.{DATASET_NAME}.{prediction_data_table_name}`
LIMIT 1000
"""
# print(query)
query_job = client.query(query)
predict_data = query_job.to_dataframe()
predict_data[predict_data["predicted_ltv_revenue"] >= 0.0].head()

In [None]:
predict_data[predict_data["predicted_ltv_revenue"] >= 0.0][
    ["user_pseudo_id", "predicted_ltv_revenue", "ltv_revenue"]
]

## Prediction Analysis

Because we enabled global explanations for the model, we can see the attribution (importance) of each feature, already ranked by importance. As the results show, new_or_returning_visitor, has_purchased, and count_add_to_cart_item are the three most important features for predicting LTV. This also makes sense from a real-world perspective.

In [None]:
query = f"""
#standardSQL
SELECT
  *
FROM
  ML.GLOBAL_EXPLAIN(MODEL `{DATASET_NAME}.{model_name}`)
"""

query_job = client.query(query)
features_weight = query_job.to_dataframe()
features_weight

Finally, we can go a step further and ask the model for local explanations with ML.EXPLAIN_PREDICT. These provide row-level (here, user-level) attributions for the most important features; top_k_features limits the output to the top 3.
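To read the explanation columns, it helps to know how feature attributions are usually combined. The sketch below assumes additive (Shapley-style) attributions. Under that assumption, `baseline_prediction_value` plus the sum of all feature attributions approximately equals the prediction, and `approximation_error` reports the size of that gap. It also assumes `top_k_features` keeps the k features with the largest absolute attributions. Both helper names and the attribution values are hypothetical.

```python
from typing import Dict, List, Tuple


def top_k_attributions(attributions: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Keep the k features with the largest absolute attribution (assumed top_k behaviour)."""
    return sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]


def reconstruct_prediction(baseline: float, attributions: Dict[str, float]) -> float:
    """Additive explanation: baseline plus the sum of ALL feature attributions."""
    return baseline + sum(attributions.values())


# Made-up attributions for one user.
attrs = {"new_or_returning_visitor": 6.0, "count_of_days": 4.0, "bounces": -0.5, "pageviews": 1.5}
print(top_k_attributions(attrs, 3))
print(reconstruct_prediction(1.0, attrs))
```

The sum is taken over all features, not only the top k. With `top_k_features` set to 3, the displayed attributions therefore do not generally add up to the prediction on their own.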

In [None]:
query = f"""
SELECT
  *
FROM
  ML.EXPLAIN_PREDICT(MODEL `{DATASET_NAME}.{model_name}`,
    (
    SELECT
      *
    FROM
      `{PROJECT_ID}.{DATASET_NAME}.{feature_table}`
    WHERE
      ltv_revenue IS NOT NULL
      ),
    STRUCT(3 as top_k_features))
"""

query_job = client.query(query)
explain_prediction_df = query_job.to_dataframe()
# explain_prediction_df.head()
columns_to_view = [
    "user_pseudo_id",
    "predicted_ltv_revenue",
    "ltv_revenue",
    "top_feature_attributions",
    "baseline_prediction_value",
    "approximation_error",
]
explain_prediction_df[explain_prediction_df["predicted_ltv_revenue"] > 10.0][
    columns_to_view
].head()

Take one user as an example: '88839956.1260646312'.
This user's default LTV revenue was zero, but the model predicted 12.79.

The attributions show why the model sees potential LTV for this user. The user's new_or_returning_visitor value and a relatively high count_of_days both push the prediction up. Together, these columns boost the user's LTV, and the model estimates a potential of roughly $12 for this user.

In [None]:
explain_prediction_df[explain_prediction_df["user_pseudo_id"] == "88839956.1260646312"][
    [
        "user_pseudo_id",
        "predicted_ltv_revenue",
        "ltv_revenue",
        "top_feature_attributions",
        "baseline_prediction_value",
        "approximation_error",
    ]
].iloc[0]

In [None]:
explain_prediction_df[explain_prediction_df["user_pseudo_id"] == "88839956.1260646312"][
    [
        "user_pseudo_id",
        "predicted_ltv_revenue",
        "ltv_revenue",
        "top_feature_attributions",
        "baseline_prediction_value",
        "approximation_error",
    ]
].iloc[0]["top_feature_attributions"]

## Next Steps

Once the model is ready, you can use LTV in several ways:


1. Periodic LTV Monitoring:

Monitoring your performance is the most straightforward use of future LTV. We recommend running the LTV calculation at least once a month and comparing the results over time. This shows how effective your marketing initiatives are at increasing the lifetime value of your customers.
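A minimal sketch of month-over-month monitoring follows. It compares the mean predicted LTV between consecutive monthly runs and assumes you keep each run's predictions keyed by a `YYYY-MM` string. The helper name and the numbers are hypothetical.

```python
from statistics import fmean
from typing import Dict, List


def monthly_ltv_change(snapshots: Dict[str, List[float]]) -> Dict[str, float]:
    """Percent change in mean predicted LTV between consecutive monthly runs.

    Keys are 'YYYY-MM' strings, so lexicographic order is chronological order.
    A month whose previous mean is zero would raise ZeroDivisionError.
    """
    months = sorted(snapshots)
    means = {m: fmean(snapshots[m]) for m in months}
    return {
        cur: (means[cur] - means[prev]) / means[prev] * 100.0
        for prev, cur in zip(months, months[1:])
    }


runs = {"2021-01": [10.0, 20.0], "2021-02": [15.0, 21.0], "2021-03": [18.0, 18.0]}
print(monthly_ltv_change(runs))
```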

2. Develop your marketing approach

By running "what-if" scenarios, you can study how different variables affect a customer's lifetime value and determine which factors you could change to get a different response from your target audience. For instance, you may discover that lowering the price of certain items not only encourages customers to buy from you more frequently but also reduces churn among them. With that knowledge, you can lower the prices of those items to increase the lifetime value of your customers.

3. Determine which of your marketing avenues bring in the most revenue.

Using the reports that Google Analytics provides, you can examine LTV by channel, campaign, source, and medium. Comparing LTV across these dimensions tells you whether you are spending too little or too much on each of them. For example, suppose customers acquired through email have an LTV of x dollars while customers acquired through a Facebook channel have an LTV of x + 100 dollars. Each Facebook customer is then worth 100 dollars more, which suggests giving the Facebook channel more of your time and budget than email.
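You can run the same comparison on the model's output. The prediction table keeps the input columns (`traffic_source`, `traffic_medium`, `campaign`) alongside `predicted_ltv_revenue`, so you can average the predictions per channel. On the `predict_data` DataFrame this is roughly `predict_data.groupby("traffic_source")["predicted_ltv_revenue"].mean()`. The standard-library sketch below does the same. The helper name and the rows are hypothetical.

```python
from collections import defaultdict
from statistics import fmean
from typing import Any, Dict, List


def average_ltv_by(rows: List[Dict[str, Any]], key: str) -> Dict[str, float]:
    """Mean predicted LTV per value of `key`, sorted from most to least valuable."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r["predicted_ltv_revenue"])
    means = ((k, fmean(v)) for k, v in groups.items())
    return dict(sorted(means, key=lambda kv: kv[1], reverse=True))


predictions = [
    {"traffic_source": "email", "predicted_ltv_revenue": 10.0},
    {"traffic_source": "email", "predicted_ltv_revenue": 20.0},
    {"traffic_source": "facebook", "predicted_ltv_revenue": 100.0},
    {"traffic_source": "facebook", "predicted_ltv_revenue": 130.0},
]
print(average_ltv_by(predictions, "traffic_source"))
```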

4. Establish a customer-loyalty programme.

You can divide your customers into groups according to how much they are expected to spend over their lifetime, and then tailor how you communicate with each group. With this in mind, you can design a customer-loyalty programme whose messages and incentives target the specific needs of each segment, while continuing to track and evaluate the programme's performance.
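The sketch below shows one way to segment users by predicted LTV. It uses fixed cutoffs. The tier names and the 10 and 50 thresholds are hypothetical and should be tuned to your own revenue distribution. For a DataFrame column, `pandas.cut` (fixed bins) or `pandas.qcut` (quantile bins) can produce similar tiers.

```python
from typing import Dict, Tuple


def assign_tiers(
    predicted_ltv: Dict[str, float], cutoffs: Tuple[float, float] = (10.0, 50.0)
) -> Dict[str, str]:
    """Map each user to a loyalty tier using hypothetical predicted-LTV cutoffs."""
    low, high = cutoffs
    tiers = {}
    for user, value in predicted_ltv.items():
        if value >= high:
            tiers[user] = "gold"
        elif value >= low:
            tiers[user] = "silver"
        else:
            tiers[user] = "bronze"
    return tiers


print(assign_tiers({"a": 75.0, "b": 12.79, "c": 0.0, "d": 50.0}))
```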

## Clean Up

In [None]:
# # Are you sure you want to do this? This is to delete all models
# models = client.list_models(dataset_id) # Make an API request.
# for model in models:
#     full_model_id = f"{model.dataset_id}.{model.model_id}"
#     client.delete_model(full_model_id)  # Make an API request.
#     print(f"Deleted: {full_model_id}")
# # Are you sure you want to do this? This is to delete all tables and views
# tables = client.list_tables(dataset_id)  # Make an API request.
# for table in tables:
#     full_table_id = f"{table.dataset_id}.{table.table_id}"
#     client.delete_table(full_table_id)  # Make an API request.
#     print(f"Deleted: {full_table_id}")