# Example usage
## Overview
Here we illustrate how to use the `dscigametrics` package to compute and analyze four key metrics within a specified time frame:
- **Ratio of new to returning visitors**: the ratio of new users to returning users for each campaign over a given period.
- **Conversion rate**: the percentage of users who complete a specific desired action for each campaign over a given period.
- **Total transaction revenue**: the total transaction revenue of each campaign over a given period.
- **Average transaction revenue**: the average transaction revenue of each campaign over a given period.

We'll guide you through obtaining summary statistics to grasp a marketing campaign's overall performance, visualizing daily trends for deeper temporal insights, and identifying the most and least effective marketing campaigns based on these metrics. Whether you're tracking daily performance fluctuations or assessing the impact of marketing strategies, `dscigametrics` provides the tools you need to make data-driven decisions.

In [1]:
from dscigametrics.compute_metrics import compute_metrics
from dscigametrics.stat_summary import stat_summary
from dscigametrics.daily_plot import daily_plot
from dscigametrics.find_campaigns import find_campaigns
import dscigametrics
import pandas as pd

In [2]:
print(dscigametrics.__version__)

0.1.0


## Load Test Data 

In [3]:
data_path = '../tests/ga_metrics_test_data.csv'
data = pd.read_csv(data_path)

## `compute_metrics` - Calculate four key metrics from GA data

Google Analytics provides a lot of information, so much that the raw spreadsheet can be dizzying to look at. Fortunately, we can start to assess the performance of a marketing campaign using just five columns, shown below.

In [5]:
data[['date',
'trafficSource.adwordsClickInfo.campaignId',
'totals.newVisits',
'totals.transactions',
'totals.transactionRevenue'
]].head()

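| | date | trafficSource.adwordsClickInfo.campaignId | totals.newVisits | totals.transactions | totals.transactionRevenue |
|---|---|---|---|---|---|
| 0 | 20220801 | 219011657 | 1.0 | NaN | NaN |
| 1 | 20220802 | 219011657 | NaN | NaN | NaN |
| 2 | 20220803 | 219011657 | 1.0 | NaN | NaN |
| 3 | 20220804 | 219011657 | 1.0 | NaN | NaN |
| 4 | 20220805 | 219011657 | 1.0 | NaN | NaN |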


Each row represents an event where someone clicked on an ad. The first column is the date of that event. The second is the unique ID of the campaign that the ad belonged to. The third records whether the visit was from someone new (1.0 means new), the fourth whether a purchase was made (1.0 indicates a purchase), and the last records the revenue from that purchase.

The function `compute_metrics` uses the first two columns to filter the data based on your specifications: you provide the ID of the campaign you're interested in and the date range you want to see data from.
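
As a rough illustration of that filtering step, here is a minimal pandas sketch. It is not the package's actual implementation: the helper name `filter_clicks` and the toy rows are hypothetical, and it assumes dates are stored as YYYYMMDD integers, which sort chronologically and can therefore be compared directly.

```python
import pandas as pd

# Toy rows using the same column names as the test data (values are made up).
toy = pd.DataFrame({
    "date": [20220731, 20220801, 20220815, 20220831, 20220901, 20220815],
    "trafficSource.adwordsClickInfo.campaignId": [1, 1, 1, 1, 1, 2],
})

def filter_clicks(df, campaign_id, start_date, end_date):
    """Hypothetical helper: keep rows for one campaign within [start_date, end_date]."""
    in_campaign = df["trafficSource.adwordsClickInfo.campaignId"] == campaign_id
    in_range = df["date"].between(start_date, end_date)  # inclusive on both ends
    return df[in_campaign & in_range]

subset = filter_clicks(toy, campaign_id=1, start_date=20220801, end_date=20220831)
print(subset["date"].tolist())
```

Both boundary dates are kept, and the row belonging to the other campaign is dropped.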

`compute_metrics` then uses the last three columns to compute four key metrics:

**Conversion rate:** this is the percentage of events where a purchase was made. Note that this is a percentage of the total number of clicks rather than of individuals: the same person could click on an ad 100 times but only make a purchase following the last click; the conversion rate in this case would be 1%. This tells you how successful the ad is in actually selling products.

**New to return rate:** this is the percentage of clicks that are made by someone who hasn't clicked on the ad before. In the last example, this would also be 1% because only one of the 100 clicks was from a new person. This gives a sense of both how many individuals are seeing the ad and whether they're clicking on it multiple times.

**Total transaction revenue:** this is the total amount of money made from all purchases made after clicking on an ad.

**Average transaction revenue:** this is the total amount of money made divided by the total number of times someone purchased something after clicking on an ad. Together, these last two metrics give you a sense of whether revenue is coming from many small purchases or a few large ones.
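
To make these definitions concrete, the sketch below recomputes the four metrics for the 100-click example above using plain pandas. It is an illustration under stated assumptions, not the package's source: `sketch_metrics` is a hypothetical helper, a missing value is assumed to mean "no" (not a new visit, no purchase), the purchase amount of 50.0 is made up, and rates are returned as fractions to match the printed output style (0.01 means 1%).

```python
import numpy as np
import pandas as pd

# One person clicks 100 times: new on the first click, purchases on the last.
n_clicks = 100
clicks = pd.DataFrame({
    "totals.newVisits": [1.0] + [np.nan] * (n_clicks - 1),
    "totals.transactions": [np.nan] * (n_clicks - 1) + [1.0],
    "totals.transactionRevenue": [np.nan] * (n_clicks - 1) + [50.0],  # made-up amount
})

def sketch_metrics(df):
    """Hypothetical helper mirroring the metric definitions in the text."""
    n = len(df)
    purchases = df["totals.transactions"].notna().sum()     # rows with a purchase
    new_visits = df["totals.newVisits"].notna().sum()       # rows from new visitors
    total_revenue = df["totals.transactionRevenue"].sum()   # missing values are skipped
    return {
        "conversion rate": purchases / n,
        "new to return rate": new_visits / n,
        "total transaction revenue": total_revenue,
        # Assumption: report 0.0 when there were no purchases at all.
        "average transaction revenue": total_revenue / purchases if purchases else 0.0,
    }

result = sketch_metrics(clicks)
print(result)
```

With a single purchase, the total and the average transaction revenue coincide; they diverge as soon as there is more than one purchase.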

Let's try it out! We'll collect data for the ad campaign with ID 123851219, starting on the 1st of August 2022, up to *and including* the 31st of August 2022. Notice below that these dates are encoded as integers in YYYYMMDD format.

In [6]:
campaign_id = 123851219
start_date = 20220801
end_date = 20220831

metric_dict = compute_metrics(data, campaign_id, start_date, end_date)

conversion rate: 0.116 
new to return rate: 0.88 
total transaction revenue: $14548.0 
average transaction revenue: $501.6551724137931


It's as easy as that! Note a few things here: the function prints a few lines, but there's no need to copy them down. If you assign the function's output to a variable, as we've done here with `metric_dict`, the results are stored there as a dictionary. You can access individual metrics from it like this:

In [8]:
conversion_rate = metric_dict['conversion rate']

print(conversion_rate)

0.116


As a final note, beware that the function does not round its outputs at all. This is to make sure you are in control of precision in case this matters for subsequent calculations, but it also means you'll probably want to round off the results before presenting them elsewhere.

In [9]:
metric_dict['average transaction revenue'].round(2)

501.66
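
If you want to round every metric at once for a report, one option is to build a separate rounded copy and leave the original values untouched. This is only a sketch: the numbers are copied from the printed output above, and the dictionary keys are assumed to match the printed labels.

```python
# Values copied from the printed output above; keys are assumed to match
# the printed labels.
metrics = {
    "conversion rate": 0.116,
    "new to return rate": 0.88,
    "total transaction revenue": 14548.0,
    "average transaction revenue": 501.6551724137931,
}

# Build a rounded copy for presentation; `metrics` keeps full precision.
rounded = {name: round(float(value), 2) for name, value in metrics.items()}
print(rounded)
```

Keeping the unrounded dictionary around means later calculations are not affected by presentation choices.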

## `stat_summary` - Return a Statistical Summary of a Specified Campaign
#### Function Description:
The `stat_summary` function returns summary statistics for the campaign that the user specifies by campaign ID and date range.
#### Steps:
- Pass a specific campaign ID and the start and end dates to the function.
- The function automatically calculates the mean, median and standard deviation of the data points, which are the values of the four metrics grouped by date.
- The output is a pandas DataFrame whose index holds the mean, median and standard deviation, and whose columns are the four metrics.
#### Notes:
- The four metrics are the same as those described above.
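
The summarising step described above can be sketched with pandas alone. This is a hedged illustration rather than the function's real code: the per-day values are made up, only two of the four metric columns are shown, and pandas' `std` defaults to the sample standard deviation (`ddof=1`), which may or may not be the convention `stat_summary` uses.

```python
import pandas as pd

# Made-up per-day metric values for one campaign (one row per day).
daily = pd.DataFrame(
    {
        "conversion_rate": [0.10, 0.20, 0.30],
        "ttl_revenue": [100.0, 300.0, 200.0],
    },
    index=[20220810, 20220811, 20220812],
)

# One row per statistic, one column per metric.
summary_sketch = daily.agg(["mean", "median", "std"])
summary_sketch.index = ["Mean", "Median", "Standard Deviation"]
print(summary_sketch)
```

The result has the same shape as the table returned by `stat_summary`: statistics as rows and metrics as columns.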

In [6]:
# Assign the campaign ID and dates.
campaign_id = 219011657  # campaign ID's data type should be int
start_date = 20220810  # the start date of the specified campaign, the data type should be int
end_date = 20220811  # the end date of the specified campaign, the data type should be int

In [7]:
# Call the function with the info above as the input arguments.
summary = stat_summary(data, campaign_id, start_date, end_date)
summary

| | return_rate | conversion_rate | ttl_revenue | avg_revenue |
|---|---|---|---|---|
| Mean | 0.85 | 0.05 | 389.5 | 38.95 |
| Median | 0.85 | 0.05 | 389.5 | 38.95 |
| Standard Deviation | 0.05 | 0.05 | 389.5 | 38.95 |


## `daily_plot` - Visualise Campaign Performance Changes Across Four Metrics
#### Function Description:
The `daily_plot` function is designed to plot a time series chart to show campaign performance over a specified period using data from Google Analytics. By defining the time period and campaign ID, users can view the performance of the campaign during a period from four perspectives. The `daily_plot` function is built upon [Altair](https://altair-viz.github.io/index.html).

#### Steps:  
- Define the analysis period (in the original Google Analytics raw data format).  
- Prepare the campaign ID to analyze (in int format).  
- Call the function to show the time series plot of each of the four metrics.

In [4]:
campaign_id = 219011657  # Data type of campaign ID should be int
start_date = 20220801  # Start date of the specified campaign, the data type should be int
end_date = 20220821  # End date of the specified campaign, the data type should be int

After defining the mandatory variables, users can call the function with the default values of the optional `width` and `height` arguments (`width=600` and `height=1000`), which set the size of the chart, or pass their own values. Note that the output shown below contains four charts, and `width` and `height` describe the size of the combined chart, so make sure the values you choose are large enough to accommodate all four.

In [8]:
plot = daily_plot(data, campaign_id, start_date, end_date, width=300, height=800)
plot

The chart above shows performance over time, with each of the four metrics in its own panel. The x-axis of every panel is the date, which is the fixed aggregation level, and the y-axes show the four metrics. If there are no transactions on a given day, the conversion rate, total transaction revenue and average transaction revenue for that day will be zero.
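
The zero-on-quiet-days behaviour can be pictured with a small pandas sketch of a per-day aggregation. This is an assumption about how such daily values might be derived, not the code behind `daily_plot`; the click events and the derived column names (`clicks`, `purchases`, `total_revenue`, `avg_revenue`) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up click events: 20220802 has clicks but no purchase.
events = pd.DataFrame({
    "date": [20220801, 20220801, 20220802, 20220802],
    "totals.transactions": [1.0, np.nan, np.nan, np.nan],
    "totals.transactionRevenue": [40.0, np.nan, np.nan, np.nan],
})

grouped = events.groupby("date")
daily = pd.DataFrame({
    "clicks": grouped.size(),
    "purchases": grouped["totals.transactions"].count(),           # non-missing rows
    "total_revenue": grouped["totals.transactionRevenue"].sum(),   # all-missing sums to 0
})
daily["conversion_rate"] = daily["purchases"] / daily["clicks"]
# 0 / 0 yields NaN on days without purchases; report those days as zero.
daily["avg_revenue"] = (daily["total_revenue"] / daily["purchases"]).fillna(0.0)
print(daily)
```

Filling the undefined average with zero keeps the line in a time series plot continuous instead of leaving a gap on days without purchases.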

## `find_campaigns` - Analyze Marketing Campaign Performance
#### Function Description:
The `find_campaigns` function is part of a toolkit designed to analyze marketing campaign performance over a specified period using data from Google Analytics. By passing in a dataframe containing campaign information, along with the desired date range and metric, users can quickly identify the most and least effective campaigns. The function returns a dictionary for further use.

#### Steps:
- Define the analysis period (in int or Timestamp data format)
- Prepare the list of campaign IDs to analyze (in int format)
- Decide the metric to evaluate campaign performance (in str format)
- Call the function to find the best and worst campaigns based on the conversion rate
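
If your dates start out in one of the two formats mentioned in the first step and you need the other, pandas can convert between them. The sketch below shows only the conversion itself; how `find_campaigns` handles a Timestamp internally is not shown in this tutorial, and the variable names are illustrative.

```python
import pandas as pd

start_int = 20220801

# Integer YYYYMMDD -> pandas Timestamp
start_ts = pd.to_datetime(str(start_int), format="%Y%m%d")

# pandas Timestamp -> integer YYYYMMDD
back_to_int = int(start_ts.strftime("%Y%m%d"))

print(start_ts.date(), back_to_int)
```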

In [8]:

# Define the analysis period
start_date = 20220801 
end_date = 20220825   

campaign_ids = [219011657, 140569061, 215934049, 123851219]

# Metric to evaluate campaign performance
metric = 'conversion_rate'

# Find the best and worst campaigns based on the conversion rate
output_dict = find_campaigns(
    data=data,
    start_date=start_date,
    end_date=end_date,
    campaign_ids=campaign_ids,
    metric=metric
)



In [9]:
print(output_dict)

{'best_campaign': {'id': 123851219, 'value': 0.116}, 'worst_campaign': {'id': 219011657, 'value': 0.056}}


In [10]:

print(f"Best Campaign: ID {output_dict['best_campaign']['id']} with a {metric} of {output_dict['best_campaign']['value']}")
print(f"Worst Campaign: ID {output_dict['worst_campaign']['id']} with a {metric} of {output_dict['worst_campaign']['value']}")


Best Campaign: ID 123851219 with a conversion_rate of 0.116
Worst Campaign: ID 219011657 with a conversion_rate of 0.056
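
The selection logic behind such a result can be sketched in plain Python, given one metric value per campaign. This is an illustration only, not the function's implementation: the first and last values are taken from the output above, while the two in between are hypothetical, since the tutorial does not print them.

```python
# Conversion rate per campaign ID.
rates = {
    123851219: 0.116,  # from the output above
    140569061: 0.090,  # hypothetical value
    215934049: 0.070,  # hypothetical value
    219011657: 0.056,  # from the output above
}

# Pick the campaign IDs with the highest and lowest metric value.
best_id = max(rates, key=rates.get)
worst_id = min(rates, key=rates.get)

result = {
    "best_campaign": {"id": best_id, "value": rates[best_id]},
    "worst_campaign": {"id": worst_id, "value": rates[worst_id]},
}
print(result)
```

For a metric where lower is better, the roles of `max` and `min` would simply swap.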


This is the end of the tutorial, in which you have seen the four functions of the `dscigametrics` package in action. If you want to build another function using Google Analytics data or suggest changes to the current functions, please take a look at the [contributor guide](contributing.md).