<a href="https://colab.research.google.com/github/custerc/miscellaneous-marketing-code/blob/main/GA4_Landing_Page_Comparison_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# What is this?

Google Analytics will happily let you compare the performance of pages from one time period to another (e.g., this month vs. last month). But it won't let you sort by the percent or raw change between the periods, so it's hard to spot the changes that matter.

This script takes a data export from GA4's Reports > Engagement > Landing page report and lets you sort the data by percent and raw change in sessions, average engagement time per session, and conversions.
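As a minimal sketch of the two change measures, the snippet below computes the raw and percent change in sessions for two pages. The page paths and counts are made up; the column names mirror the ones the script creates.

```python
import pandas as pd

# Hypothetical data: sessions for two pages in the current and previous periods
df = pd.DataFrame({
    'Landing page': ['/pricing', '/blog/example'],
    'Sessions': [120, 450],
    'Sessions_prev': [100, 500],
})

# Raw change: current period minus previous period
df['SessionsDelta_raw'] = df['Sessions'] - df['Sessions_prev']

# Percent change: raw change relative to the previous period, rounded to one decimal
df['SessionsDelta_perc'] = (df['SessionsDelta_raw'] / df['Sessions_prev'] * 100).round(1)

# Ascending sort puts the largest relative drop first
print(df.sort_values('SessionsDelta_perc'))
```

Sorting by `SessionsDelta_perc` puts the largest relative drop first, while sorting by `SessionsDelta_raw` ranks pages by absolute change. Looking at both helps, since a large percentage swing on a low-traffic page can matter less than a small percentage change on a high-traffic one.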

## How do I use this?

Please refer to the following how-to video: https://www.youtube.com/watch?v=AcSrTlsMC7Y

### Step 1: Export your data from GA

The export must come from the Reports > Engagement > Landing page report with a date-range comparison applied, and the file must be named `data-export.csv`. Upload it to the notebook by clicking the folder icon on the left and then dragging and dropping the file into the "Files" tab.
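If the script fails to read your file, a quick check like the one below can help confirm you uploaded the right export. It is a hypothetical helper, not part of the original notebook, and it relies on the same assumptions the script makes: the first 9 rows come before the data, and the previous period starts at a row whose first cell is exactly `# All Users`.

```python
import csv
from pathlib import Path


def looks_like_comparison_export(path='data-export.csv', preamble_rows=9):
    """Hypothetical helper: True if the file exists and contains the
    '# All Users' separator row the script uses to find the previous period."""
    file_path = Path(path)
    if not file_path.is_file():
        return False
    with file_path.open(newline='', encoding='utf-8') as f:
        rows = list(csv.reader(f))
    # Mirror the script: ignore the leading rows, then look for the separator row
    return any(row and row[0] == '# All Users' for row in rows[preamble_rows:])
```

Run `looks_like_comparison_export()` in a cell after uploading; a result of `False` suggests the file is missing, misnamed, or not a comparison export.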

### Step 2: Change your session floor in the code block below, if desired

The session floor is the minimum number of sessions a landing page needs in each period to be included in the results. The ideal number will vary based on your site's traffic and the length of the periods you're comparing.

Once you've set the number you'd like, press the play button in the top-left corner of the block (hover over the block to reveal it).

In [8]:
# Make your changes here.
session_floor = 100
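For a concrete picture of the floor, the sketch below applies the script's two-step filter to a small made-up frame (the page paths and counts are hypothetical). One side effect worth knowing: a page that appears in only one period has `NaN` sessions for the other, and because `NaN < session_floor` evaluates to False, the filter keeps it.

```python
import numpy as np
import pandas as pd

session_floor = 100

# Hypothetical pages: '/b' is below the floor this period,
# and '/c' has no previous-period data (NaN)
df = pd.DataFrame({
    'Landing page': ['/a', '/b', '/c'],
    'Sessions': [250, 80, 300],
    'Sessions_prev': [200, 150, np.nan],
})

# Same two-step filter as the script: drop rows below the floor in each period
df = df.drop(df[df.Sessions_prev < session_floor].index)
df = df.drop(df[df.Sessions < session_floor].index)

print(df['Landing page'].tolist())  # ['/a', '/c']
```

If you would rather drop those single-period pages as well, keep only rows where both `Sessions` and `Sessions_prev` are `>= session_floor`, since comparisons against `NaN` are False in both directions.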

### Step 3: Press the play button in the block below to run the code and display the results

Once the results table appears, look for a button in the top-right corner of the output that converts it into a sortable, interactive table.

In [9]:
# Please make a copy and work there if you want to change any code in this block.
# If you're just running the script, all you need to do is click the play button on the left.

import pandas as pd
import csv

# Create empty lists for the rows from the current and previous periods in the export
current_list = []
previous_list = []

# Open the CSV file and read every row into a list
with open('data-export.csv', 'r', newline='') as file:
  data_reader = csv.reader(file)
  datalist = list(data_reader)

# Flag to determine whether to add rows to 'current' or 'previous'
add_to_current = True

# Skip the first 9 rows of the export, which come before the current-period data
for row in datalist[9:]:
  # Skip empty rows and the second round of start/end date rows
  if not row or 'Start date' in row[0] or 'End date' in row[0]:
    continue

  # keep add_to_current true until you hit # All Users
  if row[0] == '# All Users':
    add_to_current = False
    continue  # Skip the row containing '# All Users'

  # appending to the appropriate lists
  if add_to_current:
    current_list.append(row)
  else:
    previous_list.append(row)

# create two dataframes with the two lists
df_current = pd.DataFrame(current_list,columns=['Landing page', 'Sessions', 'Users', 'New users', 'Average engagement time per session', 'Conversions', 'Total revenue'])
df_previous = pd.DataFrame(previous_list,columns=['Landing page', 'Sessions_prev', 'Users_prev', 'New users_prev', 'Average engagement time per session_prev', 'Conversions_prev', 'Total revenue_prev'])

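# Merge the two periods on landing page, then strip thousands separators and percent signs
joined = pd.merge(df_current, df_previous, on='Landing page', how='outer')
joined.replace(',', '', regex=True, inplace=True)
joined.replace('%', '', regex=True, inplace=True)
# Drop any column-header row that came through as data. Filtering by content rather
# than by position matters because recent pandas versions sort the keys of an outer
# merge, so such a row is not guaranteed to land at index 0.
joined = joined[(joined['Sessions'] != 'Sessions') & (joined['Sessions_prev'] != 'Sessions')].copy()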

# Convert the metric columns (current and previous period) to numeric datatypes
metric_cols = ['Sessions', 'Users', 'New users', 'Average engagement time per session', 'Conversions', 'Total revenue']
numeric_cols = metric_cols + [col + '_prev' for col in metric_cols]
joined[numeric_cols] = joined[numeric_cols].apply(pd.to_numeric)

# Add columns tracking the percent and raw change in sessions, engagement time, and conversions
joined['SessionsDelta_perc'] = (joined['Sessions'] - joined['Sessions_prev']) / joined['Sessions_prev'] * 100
joined['SessionsDelta_raw'] = (joined['Sessions'] - joined['Sessions_prev'])
joined['EngagementDelta_perc'] = (joined['Average engagement time per session'] - joined['Average engagement time per session_prev']) / joined['Average engagement time per session_prev'] * 100
joined['EngagementDelta_raw'] = (joined['Average engagement time per session'] - joined['Average engagement time per session_prev'])
joined['ConversionsDelta_perc'] = (joined['Conversions'] - joined['Conversions_prev']) / joined['Conversions_prev'] * 100
joined['ConversionsDelta_raw'] = (joined['Conversions'] - joined['Conversions_prev'])

# Round the percent changes, engagement times, and non-integer raw changes to one decimal place
joined = joined.round({
  'SessionsDelta_perc': 1,
  'EngagementDelta_perc': 1,
  'ConversionsDelta_perc': 1,
  'ConversionsDelta_raw': 1,
  'EngagementDelta_raw': 1,
  'Average engagement time per session': 1,
  'Average engagement time per session_prev': 1,
})

# default column sorting & output
sortbycol = 'SessionsDelta_perc'
output = joined.sort_values(by=[sortbycol], ascending=True)
# rows_to_output = 1000 # if you're reading code on github, change this value to 1000 or higher


# setting session count floor
output = output.drop(output[output.Sessions_prev < session_floor].index)
output = output.drop(output[output.Sessions < session_floor].index)

#define and generate output
output = output[['Landing page', 'SessionsDelta_perc', 'SessionsDelta_raw', 'Sessions', 'Sessions_prev', 'EngagementDelta_perc', 'EngagementDelta_raw', 'ConversionsDelta_perc', 'ConversionsDelta_raw', 'Average engagement time per session', 'Average engagement time per session_prev']]
# output.head(rows_to_output)
output

| | Landing page | SessionsDelta_perc | SessionsDelta_raw | Sessions | Sessions_prev | EngagementDelta_perc | EngagementDelta_raw | ConversionsDelta_perc | ConversionsDelta_raw | Average engagement time per session | Average engagement time per session_prev |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | /blog/anonymous-article-18 | -6.3 | -52 | 771 | 823 | -4.7 | -2.4 | -100.0 | -1 | 48.5 | 50.9 |
| 17 | /blog/anonymous-article-17 | -6.3 | -55 | 818 | 873 | -4.9 | -2.8 | -33.3 | -1 | 54.1 | 56.9 |
| 10 | /blog/anonymous-article-10 | -6.1 | -98 | 1521 | 1619 | -2.6 | -1.4 | -7.7 | -1 | 50.6 | 51.9 |
| 16 | /blog/anonymous-article-16 | -6.1 | -55 | 843 | 898 | -2.6 | -1.9 | -14.3 | -1 | 70.4 | 72.3 |
| 15 | /blog/anonymous-article-15 | -6.1 | -55 | 846 | 901 | -5.8 | -2.4 | -20.0 | -1 | 38.5 | 40.9 |
| 13 | /blog/anonymous-article-13 | -6.1 | -71 | 1088 | 1159 | -4.3 | -0.6 | -7.7 | -1 | 12.7 | 13.2 |
| 12 | /blog/anonymous-article-12 | -6.1 | -71 | 1094 | 1165 | -2.1 | -1.3 | -20.0 | -1 | 61.4 | 62.7 |
| 19 | /blog/anonymous-article-19 | -6.1 | -48 | 737 | 785 | -1.7 | -1.3 | -7.7 | -1 | 72.8 | 74.1 |
| 14 | /blog/anonymous-article-14 | -6.0 | -69 | 1075 | 1144 | -3.0 | -1.7 | -7.7 | -4 | 55.3 | 57.0 |
| 11 | /blog/anonymous-article-11 | -6.0 | -80 | 1258 | 1338 | -3.0 | -1.5 | -7.7 | -1 | 49.1 | 50.6 |
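One case the percent-change columns do not special-case is a previous-period value of zero, which is common for conversions: pandas returns `inf` (or `NaN` for 0/0) instead of raising an error. A minimal sketch, assuming you would rather see `NaN` in those cells, is to replace the infinities after computing the percentage. The helper name `pct_change_vs_prev` and the sample counts are hypothetical.

```python
import numpy as np
import pandas as pd


def pct_change_vs_prev(current, previous):
    """Hypothetical helper: percent change from previous to current, rounded
    to one decimal, with NaN instead of +/-inf when the previous value is 0."""
    pct = (current - previous) / previous * 100
    return pct.replace([np.inf, -np.inf], np.nan).round(1)


# Made-up conversion counts for three pages
current = pd.Series([3, 0, 5])
previous = pd.Series([4, 0, 0])
print(pct_change_vs_prev(current, previous))
```

Note that `sort_values` places `NaN` rows last regardless of sort direction, so pages with no previous-period baseline would drop to the bottom of the table.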
