
Google Search Console API


Google Search Console API

Overview

The Google Search Console API provides programmatic access to Search Analytics report data for sites verified in Search Console, such as clicks and CTR broken down by dimensions like date, query, and page.

More details can be found in Google's documentation.

Configuration

| Param | Description | Example |
| --- | --- | --- |
| `start_date` | Date phrase or date string for the start date of the query | `2022-01-01`, `today`, `yesterday`, `3_days_ago`, `5_months_ago`, `1_year_ago` |
| `end_date` | Date phrase or date string for the end date of the query | `2022-01-01`, `today`, `yesterday`, `3_days_ago`, `5_months_ago`, `1_year_ago` |
| `metrics` | List of metrics to batch-query from the API | `[ctr, clicks]` |
| `dimensions` | List of dimensions to batch-query from the API | `[date]` |
| `search_type` | GSC search type | `web`, `image`, `news`, etc. |
| `row_limit` | Maximum number of rows returned per response page | `25000` |
| `aggregation_type` | How values are aggregated over the given metrics and dimensions | `auto`, `byPage` or `byProperty` |
| `credentials` | Expects a `.pickle` file that you can generate with the feature described below | `creds.pickle` |
| `dimensionFilterGroups` | Filters that row values must match or exclude, depending on the operator | `[{"groupType":"string","filters":[{"dimension":"string","operator":"string","expression":"string"}]}]` |
| `data_state` | If `all` (case-insensitive), data will include fresh data. If `final` (case-insensitive) or omitted, only finalized data is returned | `all`, `final` |
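For example, a configuration that uses the relative date phrases to pull fresh data for the last three days could look like this (a minimal sketch based on the parameters above; `site_url`, `credentials` and the filter options are omitted for brevity):

```json
{
  "start_date": "3_days_ago",
  "end_date": "yesterday",
  "metrics": ["clicks", "ctr"],
  "dimensions": ["date"],
  "search_type": "web",
  "row_limit": 25000,
  "data_state": "all"
}
```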

Search Types

| Type | Description |
| --- | --- |
| `discover` | Discover results. |
| `googleNews` | Results from news.google.com and the Google News app on Android and iOS. Doesn't include results from the "News" tab in Google Search. |
| `news` | Search results from the "News" tab in Google Search. |
| `image` | Search results from the "Image" tab in Google Search. |
| `video` | Video search results. |
| `web` | [Default] Filter results to the combined ("All") tab in Google Search. Does not include Discover or Google News results. |

Dimension Filter Groups

| Operator | Description |
| --- | --- |
| `contains` | The row value must either contain or equal your expression (non-case-sensitive). |
| `equals` | [Default] Your expression must exactly equal the row value (case-sensitive for page and query dimensions). |
| `notContains` | The row value must not contain your expression, either as a substring or as a (non-case-sensitive) complete match. |
| `notEquals` | Your expression must not exactly equal the row value (case-sensitive for page and query dimensions). |
| `includingRegex` | An RE2-syntax regular expression that must be matched. |
| `excludingRegex` | An RE2-syntax regular expression that must NOT be matched. |

Aggregation Type

| Type | Description |
| --- | --- |
| `auto` | [Default] Let the service decide the appropriate aggregation type. |
| `byPage` | Aggregate values by URI. |
| `byProperty` | Aggregate values by property. Not supported for `type=discover` or `type=googleNews`. |

Example

An example of the configuration object could look like:

```json
{
  "start_date": "2022-02-01",
  "end_date": "2022-02-05",
  "dimensions": [
    "date"
  ],
  "metrics": [
    "clicks",
    "ctr"
  ],
  "search_type": "",
  "row_limit": 25000,
  "site_url": "https://www.mywebsite.com/",
  "aggregation_type": "auto"
}
```

Setting up a Python pipeline:

```python
from turbo_stream.google_search_console.reader import GoogleSearchConsoleReader


def google_search_console_pipeline():
    reader = GoogleSearchConsoleReader(
        configuration={
            "start_date": "2022-02-01",
            "end_date": "2022-02-05",
            "dimensions": ["date"],
            "metrics": ["clicks", "ctr"],
            "search_type": "",
            "row_limit": 25000,
            "site_url": "https://www.mywebsite.com/",
            "aggregation_type": "auto",
        },
        credentials="creds.pickle",
    )

    data = reader.run_query()  # run the query defined above
    print(data)  # option to return the response object as a flat JSON structure

    # writing to AWS S3 is supported; the key's file extension selects the format
    reader.write_data_to_s3(bucket="my-bucket", key="path/data.json")
    reader.write_data_to_s3(bucket="my-bucket", key="path/data.csv")
    reader.write_data_to_s3(bucket="my-bucket", key="path/data.parquet")

    # anything else will be written as a blob with its given extension

    # additional option to partition the data before writing to S3:
    # file names are grouped in the bucket by a given field (commonly a date
    # field), which allows re-writing to S3 without creating duplicates
    reader.write_partition_data_to_s3(bucket="my-bucket", path="my/path", partition="date", fmt="json")
    reader.write_partition_data_to_s3(bucket="my-bucket", path="my/path", partition="date", fmt="csv")
    reader.write_partition_data_to_s3(bucket="my-bucket", path="my/path", partition="date", fmt="parquet")
```
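With partitioning, each value of the partition field becomes part of the object key, so re-running the same date range overwrites the same objects rather than appending duplicates. A plausible resulting layout (the exact key naming below is an assumption, not confirmed behaviour of the library):

```
s3://my-bucket/my/path/2022-02-01.json
s3://my-bucket/my/path/2022-02-02.json
```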

Features

turbo-stream comes with detailed logging, so all pipelines can be tracked via their logs. The request object staggers requests to reduce rolling-quota issues and retries on common Google API timeout and HttpError errors, re-attempting the query up to 5 times before failing. This behaviour is common across all vendors.
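Conceptually, the retry behaviour is similar to the sketch below (illustrative only; the function name and delay values are assumptions, not the library's internals):

```python
import time

from googleapiclient.errors import HttpError


def run_with_retries(query_fn, attempts=5, base_delay=5):
    """Re-run query_fn on common Google API errors, up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return query_fn()
        except (HttpError, TimeoutError):
            if attempt == attempts:
                raise  # give up after the final attempt
            # stagger the next request to ease rolling-quota pressure
            time.sleep(base_delay * attempt)
```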

A user-friendly method to generate a `.pickle` file for future authentication is available in the reader as the `generate_authentication()` method. The first time, you will need to log in with your web browser via the web authentication flow; after that, your credentials are saved in a pickle file. Every subsequent time you run the script, it uses the "pickled" credentials to build the connection to Search Console.

```python
from turbo_stream.google_search_console.reader import GoogleSearchConsoleReader


def main():
    reader = GoogleSearchConsoleReader(
        configuration={},
        credentials="google_search_console_creds.json",  # to generate, make use of the secrets.json
    )

    reader.generate_authentication(auth_file_location="google_search_console_creds.pickle")


if __name__ == "__main__":
    main()
```
