Observations:
- Many events are likely irrelevant to the funnel: scroll, click, user_engagement, etc.
- Major steps for the funnel analysis are: page_view, add_to_cart, begin_checkout and purchase
- Device category, operating system, mobile brand and browser will be useful for segmentation
- Columns also exist for country and language, both good for segmentation
- The ecommerce field seems to be completely blank
- The items field has sparse information and needs further exploration
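
As a rough illustration of how the four funnel events listed above could be counted once the data is in pandas, here is a minimal sketch. The function name `funnel_user_counts` and the sample data are hypothetical, and it assumes a DataFrame with `user_pseudo_id` and `event_name` columns. It counts an "open" funnel (users who fired each event at least once), which may not be the definition the final analysis uses.

```python
import pandas as pd

# Funnel steps named in the observations above, in order.
FUNNEL_STEPS = ["page_view", "add_to_cart", "begin_checkout", "purchase"]


def funnel_user_counts(events: pd.DataFrame, steps=None) -> pd.Series:
    """Count distinct users who fired each funnel event at least once.

    This is an "open" funnel: it does not require users to have completed
    the earlier steps, nor to have done so in order.
    """
    steps = list(steps) if steps is not None else list(FUNNEL_STEPS)
    in_funnel = events[events["event_name"].isin(steps)]
    counts = in_funnel.groupby("event_name")["user_pseudo_id"].nunique()
    # Reindex so the result follows funnel order and absent steps show as 0.
    return counts.reindex(steps, fill_value=0)


# Tiny synthetic sample (hypothetical data, not taken from the dataset).
sample_events = pd.DataFrame(
    {
        "user_pseudo_id": ["a", "a", "a", "b", "b", "c", "c"],
        "event_name": [
            "page_view", "scroll", "add_to_cart",
            "page_view", "click",
            "page_view", "add_to_cart",
        ],
    }
)
funnel = funnel_user_counts(sample_events)
print(funnel)
```

Events outside the funnel (scroll, click) are simply filtered out, which matches the idea of ignoring the irrelevant events.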

In [1]:
from google.cloud import bigquery
import pandas as pd

#### Setting up the connection

In [2]:
client = bigquery.Client(project='product-analytics-portfolio')

#### Exploratory Queries

In [None]:
# Looking at a user journey for a single user
example_user_journey_sql = """
SELECT user_pseudo_id,
       event_timestamp,
       event_name
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`
WHERE user_pseudo_id = "1026454.4271112504"
ORDER BY event_timestamp
"""

example_user_journey_query = client.query(example_user_journey_sql).to_dataframe()

example_user_journey_query

Observations:
- The first three events (page_view, session_start and first_visit) all have the same timestamp
    - This should be tested with some exploratory queries to know for sure
- Same finding for page_view and view_promotion
- This user did not proceed to adding items to cart or checking out
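
One way to test the shared-timestamp observation more broadly is to group events by user and timestamp and keep the groups with more than one event. The sketch below does this in pandas on synthetic data. The function name `shared_timestamp_groups` and the sample values are hypothetical, and it assumes the same three columns the query above returns.

```python
import pandas as pd


def shared_timestamp_groups(events: pd.DataFrame) -> pd.DataFrame:
    """Return (user, timestamp) pairs where more than one event was logged."""
    grouped = (
        events.groupby(["user_pseudo_id", "event_timestamp"])["event_name"]
        .agg(
            n_events="count",
            # Join the names into one string so each group yields a scalar.
            event_names=lambda names: ", ".join(sorted(names)),
        )
        .reset_index()
    )
    return grouped[grouped["n_events"] > 1].reset_index(drop=True)


# Synthetic journey for one made-up user (not real dataset values).
sample_journey = pd.DataFrame(
    {
        "user_pseudo_id": ["u1"] * 6,
        "event_timestamp": [100, 100, 100, 200, 300, 300],
        "event_name": [
            "page_view", "session_start", "first_visit",
            "scroll",
            "page_view", "view_promotion",
        ],
    }
)
ties = shared_timestamp_groups(sample_journey)
print(ties)
```

Running the same function on a DataFrame covering many users would show whether the pattern holds beyond this single journey.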

In [None]:
# Looking at the unique event names
event_names_sql = """
SELECT DISTINCT event_name
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`
"""

event_names_query = client.query(event_names_sql).to_dataframe()
event_names_query

In [None]:
# Looking at the unique device categories
device_sql = """
SELECT DISTINCT device.category,
       device.operating_system,
       device.mobile_brand_name,
       device.web_info.browser
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131` 
ORDER BY device.category
LIMIT 1000
"""

device_query = client.query(device_sql).to_dataframe()
device_query

#### Looking at the number of days of data

In [3]:
table_sql = """
SELECT table_name 
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.INFORMATION_SCHEMA.TABLES`
ORDER BY table_name
"""
tables = client.query(table_sql).to_dataframe()



In [22]:
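# Table names look like events_YYYYMMDD, so strip the prefix and parse the date suffix
tables['dates'] = pd.to_datetime(tables['table_name'].str.replace("events_", "", regex=False), format="%Y%m%d")

# If the span of days (inclusive) equals the number of tables, no day is missing
((tables.dates.max() - tables.dates.min()).days + 1) == tables.shape[0]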

True
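
The check above only says whether the number of tables matches the date span. If it ever returned False, listing the missing days would be the natural next step. A minimal sketch follows. The helper `missing_table_dates` and the example table names are hypothetical, and it assumes every name follows the `events_YYYYMMDD` pattern.

```python
import pandas as pd


def missing_table_dates(table_names) -> pd.DatetimeIndex:
    """List calendar days between the first and last table that have no table."""
    suffixes = pd.Series(list(table_names)).str.replace("events_", "", regex=False)
    dates = pd.to_datetime(suffixes, format="%Y%m%d")
    # Every day we would expect to see if the range were continuous.
    expected = pd.date_range(dates.min(), dates.max(), freq="D")
    return expected.difference(pd.DatetimeIndex(dates))


# Hypothetical table names with one gap (20210102 is missing).
example_names = ["events_20210101", "events_20210103", "events_20210104"]
gaps = missing_table_dates(example_names)
print(gaps)
```

An empty result means the daily tables are continuous, which is consistent with the `True` output above.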

#### Looking to see if users appear across days

In [27]:
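# Note: count(user_pseudo_id) counts event rows on 20210131 from users also seen on 20210130,
# not distinct users; COUNT(DISTINCT user_pseudo_id) would give a true user count
cross_day_sql = """SELECT count(user_pseudo_id) as total_users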
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`
WHERE user_pseudo_id IN (SELECT DISTINCT user_pseudo_id
                         FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210130`)"""
cross_day_query = client.query(cross_day_sql).to_dataframe()



In [28]:
print(cross_day_query)

   total_users
0         1394
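
Because the query uses `count(user_pseudo_id)` without `DISTINCT`, the figure above is a count of event rows from returning users rather than a count of the users themselves. In BigQuery the distinct version would be `COUNT(DISTINCT user_pseudo_id)`. The pandas sketch below shows the difference on synthetic data. The function name `cross_day_overlap` and the sample frames are hypothetical.

```python
import pandas as pd


def cross_day_overlap(day_a: pd.DataFrame, day_b: pd.DataFrame) -> dict:
    """Compare event rows vs. distinct users on day_b for users also seen on day_a."""
    users_a = day_a["user_pseudo_id"].unique()
    returning = day_b[day_b["user_pseudo_id"].isin(users_a)]
    return {
        "event_rows": len(returning),  # what COUNT(col) measures
        "distinct_users": returning["user_pseudo_id"].nunique(),  # COUNT(DISTINCT col)
    }


# Synthetic event logs for two consecutive days (made-up user ids).
day_a = pd.DataFrame({"user_pseudo_id": ["a", "b", "b"]})
day_b = pd.DataFrame({"user_pseudo_id": ["a", "a", "a", "c", "b"]})
overlap = cross_day_overlap(day_a, day_b)
print(overlap)
```

Here users `a` and `b` return on the second day, but they generate four event rows between them, so the two measures diverge.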
