
Google Analytics V4 Reporting API

dirksteynberg edited this page Mar 16, 2022 · 3 revisions

Google Analytics Core v4 Reporting API

Overview

The Google Analytics Reporting API v4 provides programmatic methods to access report data in Google Analytics (Universal Analytics properties only).

More details can be found in Google's official documentation.

Configuration

| Param | Description | Example |
| --- | --- | --- |
| `view_id` | The property ID found in the GA dashboard for a given property | `000000000` |
| `start_date` | Date phrase or date string for the start date of the query | `2022-01-01`, `today`, `yesterday`, `3_days_ago`, `5_months_ago`, `1_year_ago` |
| `end_date` | Date phrase or date string for the end date of the query | `2022-01-01`, `today`, `yesterday`, `3_days_ago`, `5_months_ago`, `1_year_ago` |
| `metrics` | List of metrics that will be batch queried from the v4 API | `["ga:xxx"]` |
| `dimensions` | List of dimensions that will be batch queried from the v4 API | `["ga:xxx"]` |
| `filters_expression` | String expression that is a common GA filter clause | `ga:browser=~^Firefox` |
| `sampling_level` | The desired report sample size. If the samplingLevel field is unspecified, the DEFAULT sampling level is used | |
| `include_empty_rows` | If set to false, the response excludes rows whose retrieved metrics are all zero. The default is false, which excludes these rows | |
| `hide_totals` | If set to true, hides the total of all metrics for all matching rows, for every date range. The default is false, which returns the totals | |
| `hide_value_ranges` | If set to true, hides the minimum and maximum across all matching rows. The default is false, and the value ranges are returned | |
| `pivots` | The pivot definitions. Requests can have a maximum of 2 pivots | |
| `segments` | Segments the data returned for the request. A segment definition helps look at a subset of the request | |
| `metric_filter_clauses` | Filter clauses for the given metrics | `[{"filters":[{"metricName":"ga:adCost","operator":"GREATER_THAN","comparisonValue":"0"}]}]` |
| `dimension_filter_clauses` | Filter clauses for the given dimensions | `[{"filters":[{"dimensionName":"ga:adwordsCampaignID","operator":"EXACT","expressions":["8675309"]}]}]` |
| `order_bys` | Orders the returned data | `[{"fieldName":"ga:users","sortOrder":"DESCENDING"},{"fieldName":"ga:source"}]` |
| `credentials` | Expects a `.p12` file generated in the Google Cloud Console | |
| `service_account_email` | The service account email associated with the service account that owns the `.p12` file and has access to the GA API | |
| `use_resource_quotas` | If set to true, uses resource-based quotas for the request | |
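The `start_date`/`end_date` phrases can be mixed with plain ISO date strings. A minimal sketch of how such phrases might resolve to concrete dates (the `resolve_date_phrase` helper and its exact phrase handling are illustrative assumptions; the library's own parsing may differ):

```python
import re
from datetime import date, timedelta

def resolve_date_phrase(phrase: str, today: date) -> date:
    """Resolve phrases like 'yesterday' or '3_days_ago' to a concrete date."""
    if phrase == "today":
        return today
    if phrase == "yesterday":
        return today - timedelta(days=1)
    match = re.fullmatch(r"(\d+)_(day|month|year)s?_ago", phrase)
    if match:
        count, unit = int(match.group(1)), match.group(2)
        days = {"day": 1, "month": 30, "year": 365}[unit]  # rough approximation
        return today - timedelta(days=count * days)
    return date.fromisoformat(phrase)  # plain date string, e.g. 2022-01-01

print(resolve_date_phrase("3_days_ago", date(2022, 1, 10)))  # 2022-01-07
```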

Sampling

Values for the sampling level.

| Enum | Description |
| --- | --- |
| `SAMPLING_UNSPECIFIED` | If the samplingLevel field is unspecified, the DEFAULT sampling level is used. |
| `DEFAULT` | Returns a response with a sample size that balances speed and accuracy. |
| `SMALL` | Returns a fast response with a smaller sampling size. |
| `LARGE` | Returns a more accurate response using a larger sampling size, but the response may be slower. |
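In the raw v4 `batchGet` request body, the sampling level is simply a field on each report request. A sketch of such a fragment (the view ID, date range, and metric are placeholders):

```python
SAMPLING_LEVELS = {"SAMPLING_UNSPECIFIED", "DEFAULT", "SMALL", "LARGE"}

# One reportRequest fragment as sent to the v4 batchGet endpoint;
# omitting samplingLevel is equivalent to DEFAULT.
report_request = {
    "viewId": "000000000",
    "dateRanges": [{"startDate": "2022-01-01", "endDate": "2022-01-07"}],
    "metrics": [{"expression": "ga:sessions"}],
    "samplingLevel": "LARGE",  # trade speed for accuracy
}

assert report_request["samplingLevel"] in SAMPLING_LEVELS
```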

Filter Logical Operator

How the filters are logically combined.

| Enum | Description |
| --- | --- |
| `OPERATOR_UNSPECIFIED` | Unspecified operator. It is treated as an OR. |
| `OR` | The logical OR operator. |
| `AND` | The logical AND operator. |

Dimension Filter

A dimension filter specifies the filtering options on a dimension.

| Field | Type | Description |
| --- | --- | --- |
| `dimensionName` | string | The dimension to filter on. A DimensionFilter must contain a dimension. |
| `not` | boolean | Logical NOT operator. If set to true, matching dimension values are excluded from the report. The default is false. |
| `operator` | enum([Operator](https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet#Operator)) | How to match the dimension to the expression. The default is REGEXP. |
| `expressions[]` | string | Strings or regular expressions to match against. Only the first value of the list is used for comparison unless the operator is IN_LIST, in which case the entire list is used to filter the dimensions. |
| `caseSensitive` | boolean | Should the match be case sensitive? The default is false. |
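Putting the filter fields and logical operator together, a `dimension_filter_clauses` entry combines individual DimensionFilter objects under one logical operator. A sketch (the dimension names and values are illustrative):

```python
# Two dimension filters combined with AND: browser starts with "Fire"
# AND country is exactly "South Africa" (case-insensitive by default).
dimension_filter_clauses = [
    {
        "operator": "AND",  # OPERATOR_UNSPECIFIED would be treated as OR
        "filters": [
            {
                "dimensionName": "ga:browser",
                "operator": "BEGINS_WITH",
                "expressions": ["Fire"],
                "not": False,
                "caseSensitive": False,
            },
            {
                "dimensionName": "ga:country",
                "operator": "EXACT",
                "expressions": ["South Africa"],
            },
        ],
    }
]
```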

Operator

Different match types supported.

| Enum | Description |
| --- | --- |
| `OPERATOR_UNSPECIFIED` | If the match type is unspecified, it is treated as a REGEXP. |
| `REGEXP` | The match expression is treated as a regular expression. Other match types are not treated as regular expressions. |
| `BEGINS_WITH` | Matches values that begin with the match expression provided. |
| `ENDS_WITH` | Matches values that end with the match expression provided. |
| `PARTIAL` | Substring match. |
| `EXACT` | The value should match the match expression entirely. |
| `NUMERIC_EQUAL` | Integer comparison filter. Case sensitivity is ignored, and the expression is assumed to be a string representing an integer. Failure conditions: if the expression is not a valid int64, the client should expect an error; input dimensions that are not valid int64 values will never match the filter. |
| `NUMERIC_GREATER_THAN` | Checks if the dimension is numerically greater than the match expression. See NUMERIC_EQUAL for restrictions. |
| `NUMERIC_LESS_THAN` | Checks if the dimension is numerically less than the match expression. See NUMERIC_EQUAL for restrictions. |
| `IN_LIST` | Specifies a dimension filter whose expression can take any value from a selected list of values. This avoids evaluating multiple exact-match dimension filters OR'ed together for every response row. For example, with expressions `["A", "B", "C"]`, any response row whose dimension value is A, B, or C matches this filter. |
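A rough Python sketch of how these match types behave (an illustration of the semantics above, not the API's or library's implementation; the numeric operators are omitted for brevity):

```python
import re

def dimension_matches(value: str, operator: str, expressions: list,
                      case_sensitive: bool = False) -> bool:
    """Illustrate the v4 dimension-filter match semantics."""
    if not case_sensitive:
        value = value.lower()
        expressions = [e.lower() for e in expressions]
    expr = expressions[0]  # only the first value is used, except for IN_LIST
    if operator == "IN_LIST":
        return value in expressions  # the whole list is considered
    if operator in ("REGEXP", "OPERATOR_UNSPECIFIED"):
        return re.search(expr, value) is not None
    if operator == "BEGINS_WITH":
        return value.startswith(expr)
    if operator == "ENDS_WITH":
        return value.endswith(expr)
    if operator == "PARTIAL":
        return expr in value
    if operator == "EXACT":
        return value == expr
    raise ValueError(f"unsupported operator: {operator}")

print(dimension_matches("Firefox", "REGEXP", ["^Firefox"]))  # True
print(dimension_matches("B", "IN_LIST", ["A", "B", "C"]))    # True
```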

Metric Filter Operator

Different comparison type options.

| Enum | Description |
| --- | --- |
| `OPERATOR_UNSPECIFIED` | If the operator is not specified, it is treated as EQUAL. |
| `EQUAL` | The value of the metric should be exactly equal to the comparison value. |
| `LESS_THAN` | The value of the metric should be less than the comparison value. |
| `GREATER_THAN` | The value of the metric should be greater than the comparison value. |
| `IS_MISSING` | Checks whether the metric is missing. Does not take comparisonValue into account. |
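The metric-side comparisons can be sketched the same way (again an illustration of the semantics, not the library's code; note that comparison values arrive as strings in the request):

```python
def metric_matches(value, operator: str, comparison_value: str = "0") -> bool:
    """Illustrate the v4 metric-filter comparison semantics."""
    if operator == "IS_MISSING":
        return value is None  # comparisonValue is ignored
    if value is None:
        return False
    value, threshold = float(value), float(comparison_value)
    if operator in ("EQUAL", "OPERATOR_UNSPECIFIED"):
        return value == threshold
    if operator == "LESS_THAN":
        return value < threshold
    if operator == "GREATER_THAN":
        return value > threshold
    raise ValueError(f"unsupported operator: {operator}")

print(metric_matches("12.5", "GREATER_THAN", "0"))  # True
print(metric_matches(None, "IS_MISSING"))           # True
```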

MetricType

The types of metrics.

| Enum | Description |
| --- | --- |
| `METRIC_TYPE_UNSPECIFIED` | Metric type is unspecified. |
| `INTEGER` | Integer metric. |
| `FLOAT` | Float metric. |
| `CURRENCY` | Currency metric. |
| `PERCENT` | Percentage metric. |
| `TIME` | Time metric in HH:MM:SS format. |
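Response cells arrive as strings, and the MetricType tells you how to parse them. A hedged sketch of such a parser (the `parse_metric_value` helper is illustrative, not part of the library):

```python
def parse_metric_value(raw: str, metric_type: str):
    """Convert a v4 response cell string according to its MetricType."""
    if metric_type == "INTEGER":
        return int(raw)
    if metric_type in ("FLOAT", "CURRENCY", "PERCENT"):
        return float(raw)
    if metric_type == "TIME":  # HH:MM:SS -> total seconds
        hours, minutes, seconds = (int(part) for part in raw.split(":"))
        return hours * 3600 + minutes * 60 + seconds
    return raw  # METRIC_TYPE_UNSPECIFIED: leave as a string

print(parse_metric_value("01:02:03", "TIME"))  # 3723
```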

Example

An example of the configuration object could look like:

```json
{
  "start_date": "yesterday",
  "end_date": "yesterday",
  "view_ids": [
    "xxxxxxxxx"
  ],
  "metrics": [
    "ga:totalEvents"
  ],
  "dimensions": [
    "ga:date",
    "ga:source"
  ],
  "order_by": [
    {
      "fieldName": "ga:date",
      "sortOrder": "DESCENDING"
    }
  ]
}
```

Setting up a Python pipeline:

```python
from turbo_stream.google_analytics.reader import GoogleAnalyticsV3Reader


def google_analytics_pipeline():
    reader = GoogleAnalyticsV3Reader(
        configuration={
            "start_date": "yesterday",
            "end_date": "yesterday",
            "view_ids": [
                "xxxxxxxxx"
            ],
            "metrics": [
                "ga:totalEvents"
            ],
            "dimensions": [
                "ga:date",
                "ga:source"
            ],
            "order_by": [
                {
                    "fieldName": "ga:date",
                    "sortOrder": "DESCENDING"
                }
            ]
        },
        credentials="creds.p12",
        service_account_email="service@account.email"
    )

    data = reader.run_query()  # run the query defined above
    print(data)  # the response object is returned as a flat json structure

    # data can be written to AWS S3; the key's file extension selects the format
    reader.write_data_to_s3(bucket="my-bucket", key="path/data.json")
    reader.write_data_to_s3(bucket="my-bucket", key="path/data.csv")
    reader.write_data_to_s3(bucket="my-bucket", key="path/data.parquet")

    # any other extension is written as a blob under that name

    # data can also be partitioned before writing to S3: file names are grouped
    # in the bucket by a given field (commonly a date field), so repeated runs
    # can write to S3 without creating duplicates
    reader.write_partition_data_to_s3(bucket="my-bucket", path="my/path", partition="ga:date", fmt="json")
    reader.write_partition_data_to_s3(bucket="my-bucket", path="my/path", partition="ga:date", fmt="csv")
    reader.write_partition_data_to_s3(bucket="my-bucket", path="my/path", partition="ga:date", fmt="parquet")
```
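The partitioned write can be pictured as grouping rows by the partition field and deriving one object key per group. A sketch of the idea (the key layout shown is an assumption for illustration, not necessarily the library's exact naming):

```python
from collections import defaultdict

def partition_keys(rows, path: str, partition: str, fmt: str):
    """Group rows by a field and map each group to an S3-style object key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition]].append(row)
    # one object per partition value, so re-runs overwrite instead of duplicating
    return {f"{path}/{value}.{fmt}": group for value, group in groups.items()}

rows = [
    {"ga:date": "2022-01-01", "ga:totalEvents": "10"},
    {"ga:date": "2022-01-02", "ga:totalEvents": "7"},
    {"ga:date": "2022-01-01", "ga:totalEvents": "3"},
]
keys = partition_keys(rows, "my/path", "ga:date", "json")
print(sorted(keys))  # ['my/path/2022-01-01.json', 'my/path/2022-01-02.json']
```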

Features

turbo-stream comes with detailed logging functionality so that all pipelines can be tracked via logs. The request object can stagger requests to reduce rolling-quota issues, and it has retry support for common GA timeout and HttpError failures, attempting to re-run the query up to 5 times before failing. This behaviour is common across all vendors.
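The retry behaviour described above can be sketched generically (a simplified illustration; the library's actual backoff strategy, exception types, and attempt count may differ):

```python
import time

def run_with_retries(query, attempts: int = 5, backoff_seconds: float = 1.0):
    """Re-run a query callable on failure, up to a fixed number of attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return query()
        except Exception:  # e.g. GA timeouts or HttpError
            if attempt == attempts:
                raise  # give up after the final attempt
            time.sleep(backoff_seconds * attempt)  # simple linear backoff

calls = {"count": 0}

def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("transient GA timeout")
    return "report data"

print(run_with_retries(flaky_query, backoff_seconds=0.01))  # report data
```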

Comments

It is important to note that GA is a multi-dimensional database with features such as sampling of aggregates based on the query structure, and the ability to drop rows where dimensions are null. This means that when using the API, the user must expect some level of discrepancy when comparing API data to the data shown directly in the GA console.

For high-accuracy, big-data/raw-data downstream processing, it is suggested you make use of the beta Data API coupled with an advanced GA plan (such as 360), or make use of BigQuery to stream raw data directly from the Google APIs more accurately.