Google Search Console API

The Google Search Console API provides programmatic methods to access Search Console report data, such as search analytics for your verified properties. More details can be found in Google's documentation.
Param | Description | Example |
---|---|---|
start_date | Date phrase or date string for the start date of the query | "2022-01-01", "today", "yesterday", "3_days_ago", "5_months_ago", "1_year_ago" |
end_date | Date phrase or date string for the end date of the query | "2022-01-01", "today", "yesterday", "3_days_ago", "5_months_ago", "1_year_ago" |
metrics | List of metrics that will be batch queried from the API | [ctr, clicks] |
dimensions | List of dimensions that will be batch queried from the API | [date] |
type | GSC search type | "web", "image", "news", etc. |
row_limit | Paging limit per response row | 25000 |
aggregation_type | Aggregation functionality based on the given metrics and dimensions | "auto", "byPage" or "byProperty" |
credentials | Expects a .pickle file that you can generate with the feature described below | |
dimensionFilterGroups | The value for the filter to match or exclude, depending on the operator | [{"groupType":"string","filters":[{"dimension":"string","operator":"string","expression":"string"}]}] |
data_state | If "all" (case-insensitive), data will include fresh data. If "final" (case-insensitive) or if this parameter is omitted, the returned data will include only finalized data. | |
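The relative date phrases resolve against today's date. As an illustration only, a hypothetical resolver (not turbo-stream's actual parser) could behave like this:

```python
from datetime import date, timedelta


def resolve_date_phrase(phrase, today=None):
    """Resolve phrases like 'today', '3_days_ago' or '1_year_ago' to an
    ISO date string. Hypothetical sketch -- not the library's real parser."""
    today = today or date.today()
    if phrase == "today":
        return today.isoformat()
    if phrase == "yesterday":
        return (today - timedelta(days=1)).isoformat()
    if phrase.endswith("_days_ago"):
        return (today - timedelta(days=int(phrase.split("_")[0]))).isoformat()
    if phrase.endswith("_months_ago"):
        # count whole months back, rolling over year boundaries
        months = today.year * 12 + today.month - 1 - int(phrase.split("_")[0])
        return today.replace(year=months // 12, month=months % 12 + 1).isoformat()
    if phrase.endswith("_year_ago") or phrase.endswith("_years_ago"):
        return today.replace(year=today.year - int(phrase.split("_")[0])).isoformat()
    return phrase  # assume an explicit YYYY-MM-DD date string
```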
Supported values for type:

Type | Description |
---|---|
"discover" | Discover results. |
"googleNews" | Results from news.google.com and the Google News app on Android and iOS. Doesn't include results from the "News" tab in Google Search. |
"news" | Search results from the "News" tab in Google Search. |
"image" | Search results from the "Image" tab in Google Search. |
"video" | Video search results. |
"web" | [Default] Filter results to the combined ("All") tab in Google Search. Does not include Discover or Google News results. |
Supported values for the dimensionFilterGroups operator:

Type | Description |
---|---|
"contains" | The row value must either contain or equal your expression (non-case-sensitive). |
"equals" | [Default] Your expression must exactly equal the row value (case-sensitive for page and query dimensions). |
"notContains" | The row value must not contain your expression, either as a substring or a (non-case-sensitive) complete match. |
"notEquals" | Your expression must not exactly equal the row value (case-sensitive for page and query dimensions). |
"includingRegex" | An RE2-syntax regular expression that must be matched. |
"excludingRegex" | An RE2-syntax regular expression that must NOT be matched. |
Supported values for aggregation_type:

Type | Description |
---|---|
"auto" | [Default] Let the service decide the appropriate aggregation type. |
"byPage" | Aggregate values by URI. |
"byProperty" | Aggregate values by property. Not supported for type=discover or type=googleNews. |
An example of the configuration object could look like:

```json
{
  "start_date": "2022-02-01",
  "end_date": "2022-02-05",
  "dimensions": ["date"],
  "metrics": ["clicks", "ctr"],
  "search_type": "",
  "row_limit": 25000,
  "site_url": "https://www.mywebsite.com/",
  "aggregation_type": "auto"
}
```
Setting up a python pipeline:

```python
from turbo_stream.google_search_console.reader import GoogleSearchConsoleReader


def google_search_console_pipeline():
    reader = GoogleSearchConsoleReader(
        configuration={
            "start_date": "2022-02-01",
            "end_date": "2022-02-05",
            "dimensions": ["date"],
            "metrics": ["clicks", "ctr"],
            "search_type": "",
            "row_limit": 25000,
            "site_url": "https://www.mywebsite.com/",
            "aggregation_type": "auto",
        },
        credentials="creds.pickle",
    )

    data = reader.run_query()  # start the above query
    print(data)  # option to return the response object as a flat json structure

    # an option to write to AWS S3 exists; the key's file extension sets the format
    reader.write_data_to_s3(bucket="my-bucket", key="path/data.json")
    reader.write_data_to_s3(bucket="my-bucket", key="path/data.csv")
    reader.write_data_to_s3(bucket="my-bucket", key="path/data.parquet")
    # anything else will be written as a blob with its given extension

    # additional option to partition data before writing to S3;
    # this writes file names into a bucket grouped by a given field,
    # commonly a date field, which allows writing to S3 without creating duplicates
    reader.write_partition_data_to_s3(bucket="my-bucket", path="my/path", partition="ga:date", fmt="json")
    reader.write_partition_data_to_s3(bucket="my-bucket", path="my/path", partition="ga:date", fmt="csv")
    reader.write_partition_data_to_s3(bucket="my-bucket", path="my/path", partition="ga:date", fmt="parquet")
```
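The partitioned writers group rows by the partition field and write one object per distinct value, so re-running an overlapping date range overwrites the same keys instead of duplicating data. A rough sketch of the resulting key layout, assuming a `path/value.fmt` naming pattern (turbo-stream's actual key format may differ):

```python
from collections import defaultdict


def partition_keys(rows, path, partition, fmt):
    """Group rows by the partition field and derive one S3 key per group.
    Hypothetical sketch of the key layout -- not turbo-stream's actual code."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition]].append(row)
    return {f"{path}/{value}.{fmt}": group for value, group in groups.items()}


rows = [
    {"ga:date": "2022-02-01", "clicks": 10},
    {"ga:date": "2022-02-01", "clicks": 4},
    {"ga:date": "2022-02-02", "clicks": 7},
]
keys = partition_keys(rows, path="my/path", partition="ga:date", fmt="json")
# one key per distinct date, each holding that date's rows
```

Because each date maps to a fixed key, rewriting a window of recent days simply replaces those objects in place.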
turbo-stream comes with detailed logging functionality so that all pipelines can be tracked via logs. The request object can stagger requests to reduce rolling quota issues, and includes retry support for common Google API timeout and HttpError errors, attempting to re-run the query up to 5 times before failing. This is common across all vendors.
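The retry behaviour described above can be sketched as a simple wrapper. This is illustrative only: it uses built-in exceptions in place of googleapiclient's HttpError, and the 5-attempt budget and delay are assumptions, not turbo-stream's exact internals:

```python
import logging
import time


def with_retries(func, attempts=5, delay=1.0):
    """Re-run func up to `attempts` times before failing.
    Illustrative sketch of the retry behaviour described above."""
    def wrapper(*args, **kwargs):
        for attempt in range(1, attempts + 1):
            try:
                return func(*args, **kwargs)
            except (TimeoutError, ConnectionError) as err:
                logging.warning("attempt %d/%d failed: %s", attempt, attempts, err)
                if attempt == attempts:
                    raise  # budget exhausted, surface the error
                time.sleep(delay)  # stagger to ease rolling quota limits
    return wrapper
```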
A user-friendly method to generate a .pickle file for future authentication is available in the reader as the generate_authentication() method. The first time, you will need to log in with your web browser via the web authentication flow; after that, your credentials are saved in a pickle file. Every subsequent run uses the "pickled" credentials stored in credentials.pickle to build the connection to Search Console.
```python
from turbo_stream.google_search_console.reader import GoogleSearchConsoleReader


def main():
    reader = GoogleSearchConsoleReader(
        configuration={},
        credentials="google_search_console_creds.json",  # to generate, make use of the secrets.json
    )
    reader.generate_authentication(auth_file_location="google_search_console_creds.pickle")


if __name__ == "__main__":
    main()
```
```
 \    /\
  )  ( ')  < meow...
 (  /  )
  \(__)|
```