# Google Search Console (GSC) API Examples


### Overview

If you care about your organic search traffic, Google Search Console is hands down
one of the richest data sources available.

Here are example scripts for connecting to this valuable data source.



### About Me

My name is Alton Alexander. I am a Data Science consultant turned entrepreneur building SaaS tools for SEO.

Find more of my free scripts, or ask me any questions, on Twitter: @alton_lex

# GSC API Examples:

In [1]:
# install dependencies (IPython shell command; only needed once)
!pip install pandas requests google-api-python-client google-auth

# load libraries
import datetime
import json

import pandas as pd
import requests

# googleapiclient is the current import name; "apiclient" is a legacy alias
from googleapiclient.discovery import build

from google.oauth2 import service_account
import google.auth.transport.requests



## 0) Set up a service account

In the Google Cloud Console, set up the following:

1. A project

2. The Google Search Console API, enabled for that project

3. A service account (make a note of the service account's email address)

4. OPTIONAL: Roles for the service account so that it can access additional resources (e.g. BigQuery)

5. A JSON key for the service account, saved locally for connecting from Python


In Google Search Console:

1. Add the service account email as a user on each property you want it to access
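The email to add in Search Console is stored in the downloaded key file itself, under `client_email`. A minimal sketch for reading it, assuming the key was saved to a local path such as `./service-account-key.json`; the function name `service_account_email` is hypothetical:

```python
import json
from pathlib import Path


def service_account_email(key_path: str) -> str:
    """Return the service account's email address from a downloaded JSON key file."""
    key = json.loads(Path(key_path).read_text())
    # service account keys identify themselves with "type": "service_account"
    if key.get("type") != "service_account":
        raise ValueError(f"{key_path} does not look like a service account key")
    # this is the address to add as a user in Search Console
    return key["client_email"]
```

Calling `service_account_email("./service-account-key.json")` prints nothing by itself; display or copy the returned string into the property's user settings.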

## 1) Connect with Service Account JSON key

Every request to the GSC API must be authenticated.

We use the service account's JSON key to get credentials.


In [2]:
# local path to the service account key
# service account's email must be added to one or more properties in GSC

#SERVICE_ACCOUNT_FILE = "../gcp-keys/website-analytics-161019-16456165cddc.json"
SERVICE_ACCOUNT_FILE = "./service-account-key.json"

In [3]:
SCOPES = ['https://www.googleapis.com/auth/webmasters']
credentials = service_account.Credentials.from_service_account_file(
        SERVICE_ACCOUNT_FILE, scopes=SCOPES)

In [4]:
webmasters_service = build(
    'webmasters',
    'v3',
    credentials=credentials
)

In [5]:
# test by pulling a list of sites
# reminder to add the service account email to every GSC property as a user


response = webmasters_service.sites().list().execute()
response

{'siteEntry': [{'siteUrl': 'sc-domain:frontanalytics.com',
   'permissionLevel': 'siteFullUser'}]}
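
The `permissionLevel` field shows the service account's access to each property. Entries marked `siteUnverifiedUser` are not expected to return search data. A small sketch that keeps only the usable `siteUrl` values from a `sites().list()` response; the helper name `usable_sites` is hypothetical:

```python
def usable_sites(response: dict) -> list:
    """Return siteUrls from a sites().list() response that can be queried."""
    return [
        entry["siteUrl"]
        # "siteEntry" is absent when no properties have been shared with the account
        for entry in response.get("siteEntry", [])
        if entry.get("permissionLevel") != "siteUnverifiedUser"
    ]
```

With the response shown above, `usable_sites(response)` returns `['sc-domain:frontanalytics.com']`.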

In [10]:
# the first API call above refreshed the credentials, so they now hold a short-lived OAuth access token in credentials.token

# preview the token
credentials.token[0:30]+"............"

'ya29.c.b0AT7lpjCAmdthmSx_9bNWk............'

## 2) Pull a list of pages

This pulls the pages that received impressions in Google Search over the last 30 days:
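A single Search Analytics query returns at most 1,000 rows unless `rowLimit` is set (the maximum is 25,000), and `startRow` pages through larger result sets. A minimal sketch of that paging loop, where `fetch` is a hypothetical stand-in for whatever function posts a request body and returns the parsed JSON (for example, a wrapper around the `requests.post` call in this section):

```python
from typing import Callable, Iterator


def paged_rows(fetch: Callable[[dict], dict], body: dict,
               page_size: int = 25000) -> Iterator[dict]:
    """Yield every row of a Search Analytics query, one page at a time."""
    start_row = 0
    while True:
        # copy the caller's body so it is not mutated between pages
        page_body = {**body, "rowLimit": page_size, "startRow": start_row}
        rows = fetch(page_body).get("rows", [])
        yield from rows
        # a short (or empty) page means there is nothing left to fetch
        if len(rows) < page_size:
            return
        start_row += page_size
```

Passing `fetch` in as a parameter keeps the paging logic separate from authentication, so the same loop works with `requests` or the discovery client.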

In [None]:
# define the domain that we are using

# hardcode the domain for the property
#site = "frontanalytics.com"

# or select the first property from the response above
# (assumes a Domain property, i.e. a siteUrl like "sc-domain:frontanalytics.com")
site = response['siteEntry'][0]['siteUrl'].split(":", 1)[1]
site
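
Splitting on `:` only works for Domain properties (`sc-domain:example.com`); for a URL-prefix property such as `https://example.com/` it returns `//example.com/`. A sketch of a more defensive parse, assuming only the bare host name is needed; the helper `property_host` is hypothetical. The request URLs in this notebook also prepend `sc-domain:`, so a URL-prefix property would need its full `siteUrl` passed there instead.

```python
from urllib.parse import urlparse


def property_host(site_url: str) -> str:
    """Return the host name for a GSC siteUrl (Domain or URL-prefix property)."""
    if site_url.startswith("sc-domain:"):
        # Domain property: everything after the prefix is the domain
        return site_url[len("sc-domain:"):]
    # URL-prefix property: a full URL such as "https://www.example.com/"
    return urlparse(site_url).netloc
```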

In [None]:
# 30 days (both dates are inclusive), ending 2 days ago
# because the most recent GSC data is usually incomplete
today = datetime.datetime.today()
startDate = today - datetime.timedelta(days=31)
endDate = today - datetime.timedelta(days=2)
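
The API treats `startDate` and `endDate` as inclusive, so a window of N days starts N - 1 days before the end date. A sketch of that arithmetic as a reusable function; the name `date_window` and the `lag_days=2` default are assumptions that mirror the two-day delay used in this notebook:

```python
import datetime
from typing import Optional, Tuple


def date_window(days: int, lag_days: int = 2,
                today: Optional[datetime.date] = None) -> Tuple[str, str]:
    """Return (startDate, endDate) strings covering exactly `days` inclusive days."""
    today = today or datetime.date.today()
    end = today - datetime.timedelta(days=lag_days)
    # subtract days - 1 because the API includes both endpoints
    start = end - datetime.timedelta(days=days - 1)
    return start.isoformat(), end.isoformat()
```

For example, `date_window(30, today=datetime.date(2023, 1, 13))` returns `('2022-12-13', '2023-01-11')`, and `date_window(2, ...)` with the same date covers January 10 and 11.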

In [None]:
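# pages: clicks, impressions, CTR and position for each URL
from urllib.parse import quote

# dates must be formatted like "2022-12-01"
data = {
  "startDate": startDate.strftime("%Y-%m-%d"),
  "endDate": endDate.strftime("%Y-%m-%d"),
  # dimensions is a list, even for a single dimension
  "dimensions": ["page"]
}
# URL-encode the property id and send the token in the Authorization header
api_url = ("https://www.googleapis.com/webmasters/v3/sites/"
           + quote("sc-domain:" + site, safe="") + "/searchAnalytics/query")
res = requests.post(api_url, json=data,
                    headers={"Authorization": "Bearer " + credentials.token})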
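
The property id is part of the URL path, so it should be URL-encoded (`sc-domain:` contains a colon, and URL-prefix properties contain slashes), and the token can travel in an `Authorization: Bearer` header rather than an `access_token` query parameter. A stdlib-only sketch that builds, but does not send, the same request; `build_query_request` is a hypothetical name:

```python
import json
from urllib.parse import quote
from urllib.request import Request

API_ROOT = "https://www.googleapis.com/webmasters/v3/sites/"


def build_query_request(site_url: str, token: str, body: dict) -> Request:
    """Build a POST request for the Search Analytics query endpoint."""
    # safe="" makes quote() encode ':' and '/' too, e.g. 'sc-domain%3Aexample.com'
    url = API_ROOT + quote(site_url, safe="") + "/searchAnalytics/query"
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` returns the same JSON as the `requests.post` call above.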

In [None]:
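# stop early if the API returned an error (e.g. expired token or missing permission)
if res.status_code != 200:
    raise RuntimeError(f"API error {res.status_code}: {res.text}\nVerify the credentials are active.")

# each row has 'keys' (the dimension values) plus clicks, impressions, ctr and position
rows = res.json().get('rows', [])
for row in rows:
    # with dimensions=["page"], keys[0] is the page URL
    row['url'] = row['keys'][0]

df_pages = pd.DataFrame(rows, columns=['keys', 'clicks', 'impressions', 'ctr', 'position', 'url'])

df_pages = df_pages.sort_values('impressions', ascending=False)
df_pages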

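## 3) Get all keywords (queries) for each page

This pulls every search query that produced impressions for each page, broken down by device and country. It makes one API request per page: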
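Each per-page request narrows the query with a `dimensionFilterGroups` filter on the `page` dimension. With the `contains` operator, a filter on the homepage URL `https://frontanalytics.com/` also matches every other page on the site, because every URL contains that string; `equals` restricts the results to one exact URL. A sketch of a request body with that choice, where `page_query_body` is a hypothetical helper:

```python
def page_query_body(page_url: str, start_date: str, end_date: str,
                    dimensions=("query", "device", "country")) -> dict:
    """Build a Search Analytics request body limited to one exact page URL."""
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": list(dimensions),
        "dimensionFilterGroups": [{
            "groupType": "and",
            "filters": [{
                "dimension": "page",
                # "equals" matches only this URL; "contains" matches any URL containing it
                "operator": "equals",
                "expression": page_url,
            }],
        }],
    }
```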

In [None]:
from urllib.parse import quote

# two days of data: 3 days ago through 2 days ago (both dates are inclusive)
today = datetime.datetime.today()
startDate = today - datetime.timedelta(days=3)
endDate = today - datetime.timedelta(days=2)

api_url = ("https://www.googleapis.com/webmasters/v3/sites/"
           + quote("sc-domain:" + site, safe="") + "/searchAnalytics/query")

query_frames = []

# Get all the queries, one request per page
for index, row in df_pages.iterrows():

    # get the url of this page
    page_url = row['url']

    data = {
      "startDate": startDate.strftime("%Y-%m-%d"),
      "endDate": endDate.strftime("%Y-%m-%d"),
      "dimensions": ["query", "device", "country"],
      "dimensionFilterGroups": [
        {
          "groupType": "and",
          "filters": [
            {
              "dimension": "page",
              # "equals" (not "contains"): the homepage URL is a substring of every URL on the site
              "operator": "equals",
              "expression": page_url
            }
          ]
        }
      ]
    }

    # access tokens are short-lived; refresh before the request if this one has expired
    if not credentials.valid:
        credentials.refresh(google.auth.transport.requests.Request())

    res = requests.post(api_url, json=data,
                        headers={"Authorization": "Bearer " + credentials.token})
    res.raise_for_status()

    # 'rows' is absent when the page had no impressions in the date range
    rows = res.json().get('rows', [])
    if not rows:
        continue

    df_queries = pd.DataFrame(rows)
    df_queries['url'] = page_url
    df_queries['property'] = site
    df_queries['start_date'] = startDate
    df_queries['update_at'] = today

    # 'keys' is a list of [query, device, country] in the order requested,
    # so expand it straight into named columns (no string splitting needed)
    keys = pd.DataFrame(df_queries['keys'].tolist(),
                        columns=["query", "device", "country"],
                        index=df_queries.index)
    # normalise to lower case (e.g. the API returns devices as "DESKTOP")
    keys = keys.apply(lambda col: col.str.lower())

    # replace the raw 'keys' column with the clean columns
    df_queries = pd.concat([df_queries.drop(columns="keys"), keys], axis=1)

    # save all the queries for this page with all other pages
    query_frames.append(df_queries)

df_all_queries = pd.concat(query_frames, ignore_index=True) if query_frames else pd.DataFrame()
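
Each row's `keys` list holds the dimension values in the same order as the requested `dimensions`. That means the keys can be mapped to named fields directly, instead of converting the list to a string and splitting on commas, which breaks for queries that contain a comma. A stdlib sketch of that mapping; `flatten_rows` is a hypothetical helper, and the lower-casing mirrors the cleanup done in this notebook:

```python
def flatten_rows(rows: list, dimensions: list) -> list:
    """Turn API rows into flat dicts with one named field per dimension."""
    flat = []
    for row in rows:
        # keep the metrics (clicks, impressions, ctr, position) as they are
        record = {k: v for k, v in row.items() if k != "keys"}
        # keys[i] is the value of dimensions[i]
        for name, value in zip(dimensions, row["keys"]):
            record[name] = value.lower()
        flat.append(record)
    return flat
```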

In [21]:
# Now you can save df_all_queries for additional analysis
df_all_queries

| | clicks | impressions | ctr | position | url | property | start_date | update_at | query | device | country |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 3 | https://frontanalytics.com/ | frontanalytics.com | 2023-01-10 22:11:59.430622 | 2023-01-13 22:11:59.430622 | front analytics | desktop | gbr |
| 1 | 0 | 1 | 0 | 21 | https://frontanalytics.com/ | frontanalytics.com | 2023-01-10 22:11:59.430622 | 2023-01-13 22:11:59.430622 | advanced analytics generally refers to | desktop | chl |
| 2 | 0 | 1 | 0 | 21 | https://frontanalytics.com/ | frontanalytics.com | 2023-01-10 22:11:59.430622 | 2023-01-13 22:11:59.430622 | advanced analytics generally refers to | desktop | col |
| 3 | 0 | 1 | 0 | 21 | https://frontanalytics.com/ | frontanalytics.com | 2023-01-10 22:11:59.430622 | 2023-01-13 22:11:59.430622 | advanced analytics generally refers to | desktop | rus |
| 4 | 0 | 2 | 0 | 23 | https://frontanalytics.com/ | frontanalytics.com | 2023-01-10 22:11:59.430622 | 2023-01-13 22:11:59.430622 | advanced analytics generally refers to | desktop | usa |
| 5 | 0 | 1 | 0 | 41 | https://frontanalytics.com/ | frontanalytics.com | 2023-01-10 22:11:59.430622 | 2023-01-13 22:11:59.430622 | advanced analytics refers to | desktop | usa |
| 6 | 0 | 1 | 0 | 95 | https://frontanalytics.com/ | frontanalytics.com | 2023-01-10 22:11:59.430622 | 2023-01-13 22:11:59.430622 | call center analytics case study | mobile | irl |
| 7 | 0 | 1 | 0 | 29 | https://frontanalytics.com/ | frontanalytics.com | 2023-01-10 22:11:59.430622 | 2023-01-13 22:11:59.430622 | call center optimization case study | desktop | mys |
| 8 | 0 | 1 | 0 | 29 | https://frontanalytics.com/ | frontanalytics.com | 2023-01-10 22:11:59.430622 | 2023-01-13 22:11:59.430622 | call center optimization case study | desktop | usa |
| 9 | 0 | 1 | 0 | 88 | https://frontanalytics.com/ | frontanalytics.com | 2023-01-10 22:11:59.430622 | 2023-01-13 22:11:59.430622 | define system dynamics | desktop | jpn |
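
Because the rows are split by device and country, the same query appears several times (as `advanced analytics generally refers to` does above). When rolling rows up to one line per query, a reasonable approach is to recompute CTR from summed clicks and impressions and to weight position by impressions, rather than averaging the per-row values. A stdlib sketch under those assumptions; `rollup_by_query` is hypothetical and expects flat rows with `query`, `clicks`, `impressions` and `position` fields:

```python
from collections import defaultdict


def rollup_by_query(rows: list) -> dict:
    """Combine per-device/per-country rows into one summary per query."""
    totals = defaultdict(lambda: {"clicks": 0, "impressions": 0, "weighted_pos": 0.0})
    for row in rows:
        t = totals[row["query"]]
        t["clicks"] += row["clicks"]
        t["impressions"] += row["impressions"]
        # weight each row's average position by its impressions
        t["weighted_pos"] += row["position"] * row["impressions"]
    summary = {}
    for query, t in totals.items():
        imps = t["impressions"]
        summary[query] = {
            "clicks": t["clicks"],
            "impressions": imps,
            "ctr": t["clicks"] / imps if imps else 0.0,
            "position": t["weighted_pos"] / imps if imps else None,
        }
    return summary
```

For the four `advanced analytics generally refers to` rows above, this gives 5 impressions, a CTR of 0.0 and an impression-weighted position of 21.8.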
