
gsc-utils

Utilities for accessing and downloading statistics on a site's presence in Google's search results via the Search Console API.

Setup

pip3 install -U git+https://github.com/bearloga/gsc-utils.git

Credential authorization

Create an OAuth 2.0 Client ID on the Credentials page of the API console, then download the secrets JSON (which should look similar to this example). You will use it to create and save a set of authorized credentials. When run, the code below will ask you to navigate to a specific URL to authorize with your Google account and will prompt you for the verification code you receive after approving the authorization request.

from gsc_utils import utils

creds = utils.authorize('path/to/secrets.json')
utils.save_credentials(creds, 'path/to/credentials.json')

The created credentials.json should look similar to this example. You can re-use it in future sessions without having to re-authorize:

from gsc_utils import utils

creds = utils.load_credentials('path/to/credentials.json')

Note: Some of the code has been adapted from Quickstart: Run a Search Console App in Python and OAuth 2.0 Storage.

Usage

  • The gsc_utils.sites module has functions
    • list() for obtaining the list of sites registered in GSC
    • add() and remove() for registering and unregistering sites in GSC, respectively
  • The gsc_utils.performance module has a function stats() that returns impressions, clicks, ctr (clickthrough rate), and position for one or more sites registered in GSC (see the sketch below)

Refer to the example notebook.
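
As a quick orientation, here is a minimal sketch of a session. It assumes credentials saved as shown above; example.org is a hypothetical property, the date range is illustrative, and the start/end keyword names follow the R example further below:

from gsc_utils import utils, sites, performance

creds = utils.load_credentials('path/to/credentials.json')

# Sites registered in GSC for the authorized account
site_list = sites.list(creds)

# Impressions, clicks, CTR, and position for one site over January 2020
results = performance.stats(creds, 'example.org', start='2020-01-01', end='2020-01-31')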

Rich Card Results

Some results appear as rich cards in Google's search results, and their statistics are calculated differently. Specifically, there are two aggregation types: by site and by page. When rich_results=True, the returned statistics are aggregated by page; otherwise all results are considered and the aggregation is by site.

Refer to Aggregating data by site vs by page for details.
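
For instance, a sketch of both calls (same hypothetical site and date range as in the sketch above; rich_results is the only new argument):

# Default: all results are considered, aggregated by site
site_stats = performance.stats(creds, 'example.org', start='2020-01-01', end='2020-01-31')

# Rich card results only, aggregated by page
rich_stats = performance.stats(creds, 'example.org', start='2020-01-01', end='2020-01-31', rich_results=True)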

Example usage in R

# install.packages(c("purrr", "urltools", "readr", "reticulate"))

library(reticulate)

gsc_utils <- import("gsc_utils.utils")
sites <- import("gsc_utils.sites")
performance <- import("gsc_utils.performance")

creds <- gsc_utils$load_credentials('path/to/credentials.json')

site_list <- sites$list(creds)

results <- purrr::map_dfr(
  site_list$siteUrl,
  function(site_url, start, end) {
    # Split each site URL into its domain and protocol for stats()
    website <- urltools::domain(site_url)
    use_https <- urltools::scheme(site_url) == "https"
    result <- performance$stats(creds, website, start, end, use_https = use_https)
    return(result)
  },
  start = "2020-01-01", end = "2020-01-31"
)

readr::write_csv(results, "stats_2020-01.csv")

Since gsc_utils.performance.stats() can operate on a vector of websites, an alternative usage when all of the sites use the same protocol (all HTTPS) is:

results <- performance$stats(creds, urltools::domain(site_list$siteUrl), start = "2020-01-01", end = "2020-01-31")

Note: If gsc-utils is installed in a different virtual environment than the default one, include the following in the .Rprofile in the working directory:

Sys.setenv(RETICULATE_PYTHON = "path/to/python")

Refer to Python Version Configuration for more instructions and details.

Information

Maintainer: Mikhail Popov (mpopov at wikimedia dot org)
