# Google Analytics V4 Reporting Api
The Google Analytics Reporting API v4 provides programmatic methods to access report data in Google Analytics (Universal Analytics properties only). With the Google Analytics Reporting API, you can build custom dashboards, automate complex reporting tasks, and integrate your Google Analytics data with other business applications. More details can be found in Google's documentation.
Param | Description | Example |
---|---|---|
view_id | The property ID found in the GA dashboard for a given property | 000000000 |
start_date | Date phrase or date string for the start date of the query | 2022-01-01, 'today', 'yesterday', '3_days_ago', '5_months_ago', '1_year_ago' |
end_date | Date phrase or date string for the end date of the query | 2022-01-01, 'today', 'yesterday', '3_days_ago', '5_months_ago', '1_year_ago' |
metrics | List of metrics that will be batch queried from the v4 API | [ga:xxx] |
dimensions | List of dimensions that will be batch queried from the v4 API | [ga:xxx] |
filters_expression | String expression that is a common GA filter clause | ga:browser=~^Firefox |
sampling_level | The desired report sample size. If the samplingLevel field is unspecified, the DEFAULT sampling level is used | |
include_empty_rows | If set to false, the response does not include rows where all the retrieved metrics are equal to zero. The default is false, which excludes these rows | |
hide_totals | If set to true, hides the total of all metrics for all the matching rows, for every date range. The default is false, which returns the totals | |
hide_value_ranges | If set to true, hides the minimum and maximum across all matching rows. The default is false, and the value ranges are returned | |
pivots | The pivot definitions. Requests can have a maximum of 2 pivots | |
segments | Segment the data returned for the request. A segment definition helps look at a subset of the segment request | |
metric_filter_clauses | Filter clauses for given metrics | [{"filters":[{"metricName":"ga:adCost","operator":"GREATER_THAN","comparisonValue":"0"}]}] |
dimension_filter_clauses | Filter clauses for given dimensions | [{"filters":[{"dimensionName":"ga:adwordsCampaignID","operator":"EXACT","expressions":["8675309"]}]}] |
order_bys | Order given data | [{"fieldName":"ga:users","sortOrder":"DESCENDING"},{"fieldName":"ga:source"}] |
credentials | Expects a .p12 file generated in the Google Cloud Console | |
service_account_email | The service account email associated with the service account owning the .p12 file and access to the GA API | |
use_resource_quotas | If set to true, uses resource-based quotas for the property. The default is false | |
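The date phrases above suggest that relative dates such as `3_days_ago` are resolved against the current day. A minimal sketch of such a resolver follows; this is a hypothetical helper for illustration, not turbo-stream's actual implementation, and months and years are approximated as 30 and 365 days:

```python
from datetime import date, timedelta


def resolve_date_phrase(phrase: str, today: date) -> date:
    """Resolve phrases like 'today', 'yesterday', '3_days_ago' into dates.

    Hypothetical helper: the library's real parsing logic may differ.
    """
    if phrase == "today":
        return today
    if phrase == "yesterday":
        return today - timedelta(days=1)
    # e.g. "3_days_ago" -> ("3", "days", "ago")
    amount, unit, _ = phrase.split("_")
    n = int(amount)
    if unit.startswith("day"):
        return today - timedelta(days=n)
    if unit.startswith("month"):
        return today - timedelta(days=30 * n)  # rough month approximation
    if unit.startswith("year"):
        return today - timedelta(days=365 * n)  # rough year approximation
    raise ValueError(f"unrecognised date phrase: {phrase}")
```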
Values for the sampling level.
Enum | Description |
---|---|
SAMPLING_UNSPECIFIED | If the samplingLevel field is unspecified, the DEFAULT sampling level is used. |
DEFAULT | Returns a response with a sample size that balances speed and accuracy. |
SMALL | Returns a fast response with a smaller sampling size. |
LARGE | Returns a more accurate response using a large sampling size, but this may result in the response being slower. |
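To request a specific sample size, the `sampling_level` key from the parameter table can be set in the configuration object, for example (assuming the key accepts these enum names directly):

```json
{
    "sampling_level": "LARGE"
}
```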
How the filters are logically combined.
Enum | Description |
---|---|
OPERATOR_UNSPECIFIED | Unspecified operator. It is treated as an OR. |
OR | The logical OR operator. |
AND | The logical AND operator. |
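Filters inside one clause are combined with these operators. For example, a `dimension_filter_clauses` entry requiring both conditions to hold might look like this (illustrative values):

```json
[
    {
        "operator": "AND",
        "filters": [
            {"dimensionName": "ga:browser", "operator": "EXACT", "expressions": ["Firefox"]},
            {"dimensionName": "ga:country", "operator": "EXACT", "expressions": ["Germany"]}
        ]
    }
]
```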
Dimension filter specifies the filtering options on a dimension.

Field | Description |
---|---|
dimensionName | string. The dimension to filter on. A DimensionFilter must contain a dimension. |
not | boolean. Logical NOT operator. If set to true, the matching dimension values will be excluded in the report. The default is false. |
operator | enum([Operator](https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet#Operator)). How to match the dimension to the expression. The default is REGEXP. |
expressions[] | string. Strings or a regular expression to match against. Only the first value of the list is used for comparison unless the operator is IN_LIST, in which case the entire list is used to filter the dimensions, as explained in the description of the IN_LIST operator. |
caseSensitive | boolean. Should the match be case sensitive? The default is false. |
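A `dimension_filter_clauses` entry exercising these fields, for example excluding direct traffic via the `not` flag (illustrative values):

```json
[
    {
        "filters": [
            {
                "dimensionName": "ga:source",
                "not": true,
                "operator": "EXACT",
                "expressions": ["(direct)"],
                "caseSensitive": false
            }
        ]
    }
]
```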
Different match types supported.
Enum | Description |
---|---|
OPERATOR_UNSPECIFIED | If the match type is unspecified, it is treated as a REGEXP. |
REGEXP | The match expression is treated as a regular expression. The other match types are not treated as regular expressions. |
BEGINS_WITH | Matches the values which begin with the match expression provided. |
ENDS_WITH | Matches the values which end with the match expression provided. |
PARTIAL | Substring match. |
EXACT | The value should match the match expression entirely. |
NUMERIC_EQUAL | Integer comparison filters. Case sensitivity is ignored for these, and the expression is assumed to be a string representing an integer. Failure conditions: if the expression is not a valid int64, the client should expect an error; input dimensions that are not valid int64 values will never match the filter. |
NUMERIC_GREATER_THAN | Checks if the dimension is numerically greater than the match expression. Read the description of NUMERIC_EQUAL for restrictions. |
NUMERIC_LESS_THAN | Checks if the dimension is numerically less than the match expression. Read the description of NUMERIC_EQUAL for restrictions. |
IN_LIST | Specifies a dimension filter whose expression can take any value from a selected list of values. This helps avoid evaluating multiple exact-match dimension filters which are OR'ed for every single response row. For example, with expressions ["A", "B", "C"], any response row whose dimension has the value A, B or C matches this DimensionFilter. |
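For instance, an IN_LIST filter matching any of several countries in a single clause (illustrative values):

```json
[
    {
        "filters": [
            {
                "dimensionName": "ga:country",
                "operator": "IN_LIST",
                "expressions": ["United States", "Canada", "Mexico"]
            }
        ]
    }
]
```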
Different comparison type options.
Enum | Description |
---|---|
OPERATOR_UNSPECIFIED | If the operator is not specified, it is treated as EQUAL. |
EQUAL | Should the value of the metric be exactly equal to the comparison value. |
LESS_THAN | Should the value of the metric be less than the comparison value. |
GREATER_THAN | Should the value of the metric be greater than the comparison value. |
IS_MISSING | Validates if the metric is missing. Does not take comparisonValue into account. |
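A `metric_filter_clauses` entry using one of these comparison types, for example keeping only rows with a bounce rate under 40 (illustrative values):

```json
[
    {
        "filters": [
            {
                "metricName": "ga:bounceRate",
                "operator": "LESS_THAN",
                "comparisonValue": "40"
            }
        ]
    }
]
```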
The types of metrics.
Enum | Description |
---|---|
METRIC_TYPE_UNSPECIFIED | Metric type is unspecified. |
INTEGER | Integer metric. |
FLOAT | Float metric. |
CURRENCY | Currency metric. |
PERCENT | Percentage metric. |
TIME | Time metric in HH:MM:SS format. |
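Since the API returns all metric values as strings, a consumer typically converts them according to the reported type. A minimal sketch of such a converter (a hypothetical helper, not part of turbo-stream's public API):

```python
def parse_metric_value(value: str, metric_type: str):
    """Convert a raw API string value according to its reported metric type.

    Hypothetical helper for illustration; the library may expose its own parsing.
    """
    if metric_type == "INTEGER":
        return int(value)
    if metric_type in ("FLOAT", "CURRENCY", "PERCENT"):
        return float(value)
    if metric_type == "TIME":
        # TIME values arrive as HH:MM:SS; convert to total seconds
        hours, minutes, seconds = (float(part) for part in value.split(":"))
        return hours * 3600 + minutes * 60 + seconds
    return value  # METRIC_TYPE_UNSPECIFIED: leave as string
```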
An example of the configuration object could look like:
```json
{
    "start_date": "yesterday",
    "end_date": "yesterday",
    "view_ids": ["xxxxxxxxx"],
    "metrics": ["ga:totalEvents"],
    "dimensions": ["ga:date", "ga:source"],
    "order_by": [
        {
            "fieldName": "ga:date",
            "sortOrder": "DESCENDING"
        }
    ]
}
```
Setting up a Python pipeline:

```python
from turbo_stream.google_analyitcs.reader import GoogleAnalyticsV3Reader


def google_analytics_pipeline():
    reader = GoogleAnalyticsV3Reader(
        configuration={
            "start_date": "yesterday",
            "end_date": "yesterday",
            "view_ids": ["xxxxxxxxx"],
            "metrics": ["ga:totalEvents"],
            "dimensions": ["ga:date", "ga:source"],
            "order_by": [
                {
                    "fieldName": "ga:date",
                    "sortOrder": "DESCENDING"
                }
            ]
        },
        credentials="creds.p12",
        service_account_email="service@account.email",
    )

    data = reader.run_query()  # start the above query
    print(data)  # option to return the response object as a flat json structure

    # a current option to write to AWS S3 exists; the key's file extension is supported
    reader.write_data_to_s3(bucket="my-bucket", key="path/data.json")
    reader.write_data_to_s3(bucket="my-bucket", key="path/data.csv")
    reader.write_data_to_s3(bucket="my-bucket", key="path/data.parquet")
    # anything else will be written as a blob with its given extension

    # additional option to partition data before writing to S3;
    # this allows users to write file names into a bucket grouped by a given field,
    # commonly a date field, as this allows writing to S3 without creating duplication
    reader.write_partition_data_to_s3(bucket="my-bucket", path="my/path", partition="ga:date", fmt="json")
    reader.write_partition_data_to_s3(bucket="my-bucket", path="my/path", partition="ga:date", fmt="csv")
    reader.write_partition_data_to_s3(bucket="my-bucket", path="my/path", partition="ga:date", fmt="parquet")
```
turbo-stream comes with detailed logging functionality so that all pipelines can be tracked via logs. The request object has functionality to stagger requests to reduce rolling quota issues, as well as retry support for common GA timeout and HttpError errors, re-running the query up to 5 times before failing. This behaviour is common across all vendors.
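The retry behaviour described above can be sketched as follows; this is an illustrative pattern with exponential backoff, not turbo-stream's actual implementation, and `query_fn` / `retryable` are hypothetical names:

```python
import logging
import time


def run_with_retries(query_fn, retryable=(TimeoutError,), attempts=5, backoff=2.0):
    """Retry query_fn up to `attempts` times, sleeping backoff**attempt seconds
    between failures. Illustrative sketch of the behaviour described above."""
    for attempt in range(1, attempts + 1):
        try:
            return query_fn()
        except retryable as err:
            logging.warning("attempt %d/%d failed: %s", attempt, attempts, err)
            if attempt == attempts:
                raise  # give up after the final attempt
            time.sleep(backoff ** attempt)
```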
It is important to note that GA is a multi-dimensional database with features such as sampling on aggregates based on the query structure, and the ability to drop rows where dimensions are null. This means that, when using the API, the user must account for some level of discrepancy when comparing API data to the data found directly in the GA Console.
For high-accuracy, big-data/raw-data downstreaming, it is suggested you make use of the beta Data API coupled with an advanced GA plan (such as 360), or make use of BigQuery to stream raw data directly from any Google API more accurately.
```
 \    /\
  )  ( ')  < meow...
 (  /  )
  \(__)|
```