
gsc-utils

Utilities for accessing and downloading statistics on a site's presence in Google's search results via the Search Console API.

Setup

pip3 install -U git+https://github.com/bearloga/gsc-utils.git

Credential authorization

Create an OAuth 2.0 Client ID on the Credentials page of the API console, then download the secrets JSON (which should look similar to this example). You will use it to create and save a set of authorized credentials. When run, the code below will ask you to navigate to a specific URL to authorize with your Google account and will prompt you for the verification code you receive after approving the authorization request.

from gsc_utils import utils

creds = utils.authorize('path/to/secrets.json')
utils.save_credentials(creds, 'path/to/credentials.json')

The created credentials.json should look similar to this example. You can re-use it in future sessions without having to re-authorize:

from gsc_utils import utils

creds = utils.load_credentials('path/to/credentials.json')

Note: Some of the code has been adapted from Quickstart: Run a Search Console App in Python and OAuth 2.0 Storage.

Usage

  • The gsc_utils.sites module has functions
    • list() for obtaining the list of sites registered in GSC
    • add() and remove() for registering and unregistering sites in GSC, respectively
  • The gsc_utils.performance module has a function stats() that returns impressions, clicks, ctr (clickthrough rate), and position for one or more sites registered in GSC (see the sketch below)

Refer to the example notebook.
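
As a quick orientation, here is a minimal sketch of a session. It assumes credentials saved as shown above; example.org is a hypothetical property, the date range is illustrative, and the start/end keyword names follow the R example further below:

from gsc_utils import utils, sites, performance

creds = utils.load_credentials('path/to/credentials.json')

# Sites registered in GSC for the authorized account
site_list = sites.list(creds)

# Impressions, clicks, CTR, and position for one site over January 2020
results = performance.stats(creds, 'example.org', start='2020-01-01', end='2020-01-31')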

Rich Card Results

Some results appear as rich cards in Google's search results, and their statistics are calculated differently. Specifically, there are two aggregation types: by site and by page. When rich_results=True, the returned statistics are aggregated by page; otherwise all results are considered and the aggregation is by site.

Refer to Aggregating data by site vs by page for details.
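
For instance, a sketch of both calls (same hypothetical site and date range as in the sketch above; rich_results is the only new argument):

# Default: all results are considered, aggregated by site
site_stats = performance.stats(creds, 'example.org', start='2020-01-01', end='2020-01-31')

# Rich card results only, aggregated by page
rich_stats = performance.stats(creds, 'example.org', start='2020-01-01', end='2020-01-31', rich_results=True)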

Example usage in R

# install.packages(c("purrr", "urltools", "readr", "reticulate"))

library(reticulate)

gsc_utils <- import("gsc_utils.utils")
sites <- import("gsc_utils.sites")
performance <- import("gsc_utils.performance")

creds <- gsc_utils$load_credentials('path/to/credentials.json')

site_list <- sites$list(creds)

results <- purrr::map_dfr(
  site_list$siteUrl,
  function(site_url, start, end) {
    # Split each site URL into its domain and protocol for stats()
    website <- urltools::domain(site_url)
    use_https <- urltools::scheme(site_url) == "https"
    result <- performance$stats(creds, website, start, end, use_https = use_https)
    return(result)
  },
  start = "2020-01-01", end = "2020-01-31"
)

readr::write_csv(results, "stats_2020-01.csv")

Since gsc_utils.performance.stats() can operate on a vector of websites, an alternative usage when all of the sites use the same protocol (all HTTPS) is:

results <- performance$stats(creds, urltools::domain(site_list$siteUrl), start = "2020-01-01", end = "2020-01-31")

Note: If gsc-utils is installed in a different virtual environment than the default one, include the following in the .Rprofile in the working directory:

Sys.setenv(RETICULATE_PYTHON = "path/to/python")

Refer to Python Version Configuration for more instructions and details.

Information

Maintainer: Mikhail Popov (mpopov at wikimedia dot org)
