
Google Analytics V4 Reporting API

dirksteynberg edited this page Mar 16, 2022 · 3 revisions

Google Analytics Core v4 Reporting API

Overview

The Google Analytics Reporting API v4 provides programmatic methods to access report data in Google Analytics (Universal Analytics properties only).

More details can be found in Google's official documentation.

Configuration

| Param | Description | Example |
| --- | --- | --- |
| `view_id` | The property ID found in the GA dashboard for a given property | `000000000` |
| `start_date` | Date phrase or date string for the start date of the query | `2022-01-01`, `today`, `yesterday`, `3_days_ago`, `5_months_ago`, `1_year_ago` |
| `end_date` | Date phrase or date string for the end date of the query | `2022-01-01`, `today`, `yesterday`, `3_days_ago`, `5_months_ago`, `1_year_ago` |
| `metrics` | List of metrics that will be batch queried from the v4 API | `["ga:xxx"]` |
| `dimensions` | List of dimensions that will be batch queried from the v4 API | `["ga:xxx"]` |
| `filters_expression` | String expression that is a common GA filter clause | `ga:browser=~^Firefox` |
| `sampling_level` | The desired report sample size. If the samplingLevel field is unspecified, the DEFAULT sampling level is used | |
| `include_empty_rows` | If set to false, the response excludes rows whose retrieved metrics are all zero. The default is false, which excludes these rows | |
| `hide_totals` | If set to true, hides the total of all metrics for all matching rows, for every date range. The default is false, which returns the totals | |
| `hide_value_ranges` | If set to true, hides the minimum and maximum across all matching rows. The default is false, and the value ranges are returned | |
| `pivots` | The pivot definitions. Requests can have a maximum of 2 pivots | |
| `segments` | Segments the data returned for the request. A segment definition helps look at a subset of the request | |
| `metric_filter_clauses` | Filter clauses for the given metrics | `[{"filters":[{"metricName":"ga:adCost","operator":"GREATER_THAN","comparisonValue":"0"}]}]` |
| `dimension_filter_clauses` | Filter clauses for the given dimensions | `[{"filters":[{"dimensionName":"ga:adwordsCampaignID","operator":"EXACT","expressions":["8675309"]}]}]` |
| `order_bys` | Orders the returned data | `[{"fieldName":"ga:users","sortOrder":"DESCENDING"},{"fieldName":"ga:source"}]` |
| `credentials` | Expects a `.p12` file generated in the Google Cloud Console | |
| `service_account_email` | The service account email associated with the service account that owns the `.p12` file and has access to the GA API | |
| `use_resource_quotas` | If set to true, uses resource-based quotas for the request | |
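The `start_date`/`end_date` phrases can be mixed with plain ISO date strings. A minimal sketch of how such phrases might resolve to concrete dates (the `resolve_date_phrase` helper and its exact phrase handling are illustrative assumptions; the library's own parsing may differ):

```python
import re
from datetime import date, timedelta

def resolve_date_phrase(phrase: str, today: date) -> date:
    """Resolve phrases like 'yesterday' or '3_days_ago' to a concrete date."""
    if phrase == "today":
        return today
    if phrase == "yesterday":
        return today - timedelta(days=1)
    match = re.fullmatch(r"(\d+)_(day|month|year)s?_ago", phrase)
    if match:
        count, unit = int(match.group(1)), match.group(2)
        days = {"day": 1, "month": 30, "year": 365}[unit]  # rough approximation
        return today - timedelta(days=count * days)
    return date.fromisoformat(phrase)  # plain date string, e.g. 2022-01-01

print(resolve_date_phrase("3_days_ago", date(2022, 1, 10)))  # 2022-01-07
```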

Sampling

Values for the sampling level.

| Enum | Description |
| --- | --- |
| `SAMPLING_UNSPECIFIED` | If the samplingLevel field is unspecified, the DEFAULT sampling level is used. |
| `DEFAULT` | Returns a response with a sample size that balances speed and accuracy. |
| `SMALL` | Returns a fast response with a smaller sampling size. |
| `LARGE` | Returns a more accurate response using a larger sampling size, but the response may be slower. |
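In the raw v4 `batchGet` request body, the sampling level is simply a field on each report request. A sketch of such a fragment (the view ID, date range, and metric are placeholders):

```python
SAMPLING_LEVELS = {"SAMPLING_UNSPECIFIED", "DEFAULT", "SMALL", "LARGE"}

# One reportRequest fragment as sent to the v4 batchGet endpoint;
# omitting samplingLevel is equivalent to DEFAULT.
report_request = {
    "viewId": "000000000",
    "dateRanges": [{"startDate": "2022-01-01", "endDate": "2022-01-07"}],
    "metrics": [{"expression": "ga:sessions"}],
    "samplingLevel": "LARGE",  # trade speed for accuracy
}

assert report_request["samplingLevel"] in SAMPLING_LEVELS
```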

Filter Logical Operator

How the filters are logically combined.

| Enum | Description |
| --- | --- |
| `OPERATOR_UNSPECIFIED` | Unspecified operator. It is treated as an OR. |
| `OR` | The logical OR operator. |
| `AND` | The logical AND operator. |

Dimension Filter

A dimension filter specifies the filtering options on a dimension.

| Field | Type | Description |
| --- | --- | --- |
| `dimensionName` | string | The dimension to filter on. A DimensionFilter must contain a dimension. |
| `not` | boolean | Logical NOT operator. If set to true, matching dimension values are excluded from the report. The default is false. |
| `operator` | enum([Operator](https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet#Operator)) | How to match the dimension to the expression. The default is REGEXP. |
| `expressions[]` | string | Strings or regular expressions to match against. Only the first value of the list is used for comparison unless the operator is IN_LIST, in which case the entire list is used to filter the dimensions. |
| `caseSensitive` | boolean | Should the match be case sensitive? The default is false. |
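Putting the filter fields and logical operator together, a `dimension_filter_clauses` entry combines individual DimensionFilter objects under one logical operator. A sketch (the dimension names and values are illustrative):

```python
# Two dimension filters combined with AND: browser starts with "Fire"
# AND country is exactly "South Africa" (case-insensitive by default).
dimension_filter_clauses = [
    {
        "operator": "AND",  # OPERATOR_UNSPECIFIED would be treated as OR
        "filters": [
            {
                "dimensionName": "ga:browser",
                "operator": "BEGINS_WITH",
                "expressions": ["Fire"],
                "not": False,
                "caseSensitive": False,
            },
            {
                "dimensionName": "ga:country",
                "operator": "EXACT",
                "expressions": ["South Africa"],
            },
        ],
    }
]
```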

Operator

Different match types supported.

| Enum | Description |
| --- | --- |
| `OPERATOR_UNSPECIFIED` | If the match type is unspecified, it is treated as a REGEXP. |
| `REGEXP` | The match expression is treated as a regular expression. Other match types are not treated as regular expressions. |
| `BEGINS_WITH` | Matches values that begin with the match expression provided. |
| `ENDS_WITH` | Matches values that end with the match expression provided. |
| `PARTIAL` | Substring match. |
| `EXACT` | The value should match the match expression entirely. |
| `NUMERIC_EQUAL` | Integer comparison filter. Case sensitivity is ignored, and the expression is assumed to be a string representing an integer. Failure conditions: if the expression is not a valid int64, the client should expect an error; input dimensions that are not valid int64 values will never match the filter. |
| `NUMERIC_GREATER_THAN` | Checks if the dimension is numerically greater than the match expression. See NUMERIC_EQUAL for restrictions. |
| `NUMERIC_LESS_THAN` | Checks if the dimension is numerically less than the match expression. See NUMERIC_EQUAL for restrictions. |
| `IN_LIST` | Specifies a dimension filter whose expression can take any value from a selected list of values. This avoids evaluating multiple exact-match dimension filters OR'ed together for every response row. For example, with expressions `["A", "B", "C"]`, any response row whose dimension value is A, B, or C matches this filter. |
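A rough Python sketch of how these match types behave (an illustration of the semantics above, not the API's or library's implementation; the numeric operators are omitted for brevity):

```python
import re

def dimension_matches(value: str, operator: str, expressions: list,
                      case_sensitive: bool = False) -> bool:
    """Illustrate the v4 dimension-filter match semantics."""
    if not case_sensitive:
        value = value.lower()
        expressions = [e.lower() for e in expressions]
    expr = expressions[0]  # only the first value is used, except for IN_LIST
    if operator == "IN_LIST":
        return value in expressions  # the whole list is considered
    if operator in ("REGEXP", "OPERATOR_UNSPECIFIED"):
        return re.search(expr, value) is not None
    if operator == "BEGINS_WITH":
        return value.startswith(expr)
    if operator == "ENDS_WITH":
        return value.endswith(expr)
    if operator == "PARTIAL":
        return expr in value
    if operator == "EXACT":
        return value == expr
    raise ValueError(f"unsupported operator: {operator}")

print(dimension_matches("Firefox", "REGEXP", ["^Firefox"]))  # True
print(dimension_matches("B", "IN_LIST", ["A", "B", "C"]))    # True
```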

Metric Filter Operator

Different comparison type options.

| Enum | Description |
| --- | --- |
| `OPERATOR_UNSPECIFIED` | If the operator is not specified, it is treated as EQUAL. |
| `EQUAL` | The value of the metric should be exactly equal to the comparison value. |
| `LESS_THAN` | The value of the metric should be less than the comparison value. |
| `GREATER_THAN` | The value of the metric should be greater than the comparison value. |
| `IS_MISSING` | Checks whether the metric is missing. Does not take comparisonValue into account. |
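The metric-side comparisons can be sketched the same way (again an illustration of the semantics, not the library's code; note that comparison values arrive as strings in the request):

```python
def metric_matches(value, operator: str, comparison_value: str = "0") -> bool:
    """Illustrate the v4 metric-filter comparison semantics."""
    if operator == "IS_MISSING":
        return value is None  # comparisonValue is ignored
    if value is None:
        return False
    value, threshold = float(value), float(comparison_value)
    if operator in ("EQUAL", "OPERATOR_UNSPECIFIED"):
        return value == threshold
    if operator == "LESS_THAN":
        return value < threshold
    if operator == "GREATER_THAN":
        return value > threshold
    raise ValueError(f"unsupported operator: {operator}")

print(metric_matches("12.5", "GREATER_THAN", "0"))  # True
print(metric_matches(None, "IS_MISSING"))           # True
```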

MetricType

The types of metrics.

| Enum | Description |
| --- | --- |
| `METRIC_TYPE_UNSPECIFIED` | Metric type is unspecified. |
| `INTEGER` | Integer metric. |
| `FLOAT` | Float metric. |
| `CURRENCY` | Currency metric. |
| `PERCENT` | Percentage metric. |
| `TIME` | Time metric in HH:MM:SS format. |
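Response cells arrive as strings, and the MetricType tells you how to parse them. A hedged sketch of such a parser (the `parse_metric_value` helper is illustrative, not part of the library):

```python
def parse_metric_value(raw: str, metric_type: str):
    """Convert a v4 response cell string according to its MetricType."""
    if metric_type == "INTEGER":
        return int(raw)
    if metric_type in ("FLOAT", "CURRENCY", "PERCENT"):
        return float(raw)
    if metric_type == "TIME":  # HH:MM:SS -> total seconds
        hours, minutes, seconds = (int(part) for part in raw.split(":"))
        return hours * 3600 + minutes * 60 + seconds
    return raw  # METRIC_TYPE_UNSPECIFIED: leave as a string

print(parse_metric_value("01:02:03", "TIME"))  # 3723
```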

Example

An example of the configuration object could look like:

```json
{
  "start_date": "yesterday",
  "end_date": "yesterday",
  "view_ids": [
    "xxxxxxxxx"
  ],
  "metrics": [
    "ga:totalEvents"
  ],
  "dimensions": [
    "ga:date",
    "ga:source"
  ],
  "order_by": [
    {
      "fieldName": "ga:date",
      "sortOrder": "DESCENDING"
    }
  ]
}
```

Setting up a Python pipeline:

```python
from turbo_stream.google_analytics.reader import GoogleAnalyticsV3Reader


def google_analytics_pipeline():
    reader = GoogleAnalyticsV3Reader(
        configuration={
            "start_date": "yesterday",
            "end_date": "yesterday",
            "view_ids": [
                "xxxxxxxxx"
            ],
            "metrics": [
                "ga:totalEvents"
            ],
            "dimensions": [
                "ga:date",
                "ga:source"
            ],
            "order_by": [
                {
                    "fieldName": "ga:date",
                    "sortOrder": "DESCENDING"
                }
            ]
        },
        credentials="creds.p12",
        service_account_email="service@account.email"
    )

    data = reader.run_query()  # run the query defined above
    print(data)  # the response object is returned as a flat json structure

    # data can be written to AWS S3; the key's file extension selects the format
    reader.write_data_to_s3(bucket="my-bucket", key="path/data.json")
    reader.write_data_to_s3(bucket="my-bucket", key="path/data.csv")
    reader.write_data_to_s3(bucket="my-bucket", key="path/data.parquet")

    # any other extension is written as a blob under that name

    # data can also be partitioned before writing to S3: file names are grouped
    # in the bucket by a given field (commonly a date field), so repeated runs
    # can write to S3 without creating duplicates
    reader.write_partition_data_to_s3(bucket="my-bucket", path="my/path", partition="ga:date", fmt="json")
    reader.write_partition_data_to_s3(bucket="my-bucket", path="my/path", partition="ga:date", fmt="csv")
    reader.write_partition_data_to_s3(bucket="my-bucket", path="my/path", partition="ga:date", fmt="parquet")
```
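The partitioned write can be pictured as grouping rows by the partition field and deriving one object key per group. A sketch of the idea (the key layout shown is an assumption for illustration, not necessarily the library's exact naming):

```python
from collections import defaultdict

def partition_keys(rows, path: str, partition: str, fmt: str):
    """Group rows by a field and map each group to an S3-style object key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition]].append(row)
    # one object per partition value, so re-runs overwrite instead of duplicating
    return {f"{path}/{value}.{fmt}": group for value, group in groups.items()}

rows = [
    {"ga:date": "2022-01-01", "ga:totalEvents": "10"},
    {"ga:date": "2022-01-02", "ga:totalEvents": "7"},
    {"ga:date": "2022-01-01", "ga:totalEvents": "3"},
]
keys = partition_keys(rows, "my/path", "ga:date", "json")
print(sorted(keys))  # ['my/path/2022-01-01.json', 'my/path/2022-01-02.json']
```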

Features

turbo-stream comes with detailed logging functionality so that all pipelines can be tracked via logs. The request object can stagger requests to reduce rolling-quota issues, and it has retry support for common GA timeout and HttpError failures, attempting to re-run the query up to 5 times before failing. This behaviour is common across all vendors.
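The retry behaviour described above can be sketched generically (a simplified illustration; the library's actual backoff strategy, exception types, and attempt count may differ):

```python
import time

def run_with_retries(query, attempts: int = 5, backoff_seconds: float = 1.0):
    """Re-run a query callable on failure, up to a fixed number of attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return query()
        except Exception:  # e.g. GA timeouts or HttpError
            if attempt == attempts:
                raise  # give up after the final attempt
            time.sleep(backoff_seconds * attempt)  # simple linear backoff

calls = {"count": 0}

def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("transient GA timeout")
    return "report data"

print(run_with_retries(flaky_query, backoff_seconds=0.01))  # report data
```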

Comments

It is important to note that GA is a multi-dimensional database with features such as sampling of aggregates based on the query structure, and the ability to drop rows where dimensions are null. This means that when using the API, the user must expect some level of discrepancy when comparing API data to the data shown directly in the GA console.

For high-accuracy, big-data/raw-data downstream processing, it is suggested you make use of the beta Data API coupled with an advanced GA plan (such as 360), or make use of BigQuery to stream raw data directly from the Google APIs more accurately.