# Google Search Console GSC API Examples


### Overview

If you are interested in your organic traffic then the Google Search Console is hands down 
one of the richest datasources.

Here are example scripts for connecting to this valuable datasource.



### About Me

My name is Alton Alexander. I am a Data Science consultant turned entreprenuer building SaaS tools for SEO.

Find more about my free scripts or ask me any question on twitter: @alton_lex

# GSC API Examples:

In [1]:
# load libraries
import requests
import json
from urllib.parse import urlparse

import httplib2
from apiclient import errors
from apiclient.discovery import build

import datetime

import google.oauth2.credentials
import google.auth.transport.requests

!pip install pandas
import pandas as pd

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


## 0) Innitialize the Oauth Flow using the App Service Account

Every request must have user authentication.

Steps:

1. Using the service account credentials we initialize Oauth

2. Then the user authenticates with their google account associated with GSC

3. We save the user's credentials for future use

See the documentation to get the service account credentials first.


In [3]:
# get authorization
# https://developers.google.com/identity/protocols/oauth2
# https://github.com/googleapis/google-api-python-client/blob/main/docs/oauth.md

#get_ipython().system('pip uninstall google_auth_oauthlib -y')
#get_ipython().system('pip3 uninstall google_auth_oauthlib -y')
#get_ipython().system('pip3 install google-auth')
get_ipython().system('pip3 install google-auth-oauthlib')

from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

flow = InstalledAppFlow.from_client_secrets_file(
    './client_secret_824986252225-squl6c7v1oshe5l3d8fjc9s2j9hv8sdm.apps.googleusercontent.com.json',
    scopes=[
        'openid', 
        'https://www.googleapis.com/auth/userinfo.email', 
        'https://www.googleapis.com/auth/userinfo.profile',
        'https://www.googleapis.com/auth/webmasters.readonly',
        'https://www.googleapis.com/auth/webmasters'
    ])

flow.run_local_server(port=8081)

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Please visit this URL to authorize this application: https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=824986252225-squl6c7v1oshe5l3d8fjc9s2j9hv8sdm.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Flocalhost%3A8081%2F&scope=openid+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.profile+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fwebmasters.readonly+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fwebmasters&state=YdIs5jyXMr741aD3kP2e93yZICytHh&access_type=offline


KeyboardInterrupt: ignored

Next steps:
    
- click the link above, login with the account that has the site in Google Search Console

- Return to this page and the credentials are stored in memmory

- We save these credentails for future use without needing to ask the user again

In [None]:
# persist the credentials
credentials = flow.credentials

#now we turn the passed in credentials obj into a dicts obj
#note the expiry formatting
temp = {
    'token': credentials.token,
    'refresh_token': credentials.refresh_token,
    'id_token':credentials.id_token,
    'token_uri': credentials.token_uri,
    'client_id': credentials.client_id,
    'client_secret': credentials.client_secret,
    'scopes': credentials.scopes,
    'expiry':datetime.datetime.strftime(credentials.expiry,'%Y-%m-%d %H:%M:%S')
}

# Serializing json
json_object = json.dumps(temp, indent=4)
 
# Writing to sample.json
with open("../gcp-keys/gsc-user-creds.json", "w") as outfile:
    outfile.write(json_object)

## 1) Load the User's creds 

The user's credentials are saved from step 0 above.

Now we just need to refresh the token if it has expired.

In [None]:
# load the credentials for this user
# Opening JSON file
with open('../gcp-keys/gsc-user-creds.json', 'r') as openfile:
 
    # Reading from json file
    temp = json.load(openfile)


credentials = google.oauth2.credentials.Credentials(
    temp['token'],
    refresh_token=temp['refresh_token'],
    id_token=temp['id_token'],
    token_uri=temp['token_uri'],
    client_id=temp['client_id'],
    client_secret=temp['client_secret'],
    scopes=temp['scopes'],
)
expiry = temp['expiry']
expiry_datetime = datetime.datetime.strptime(expiry,'%Y-%m-%d %H:%M:%S')
credentials.expiry = expiry_datetime
#and now we refresh the token   
#but not if we know that its not a valid token.

request = google.auth.transport.requests.Request()
if credentials.expired:
    credentials.refresh(request)

## 1) Pull a list of pages

This pulls a list of all the pages that are showing on Google Search in the last 30 days:

In [None]:
# define the domain that we are using

site = "frontanalytics.com"


In [None]:
# set date for the last 30 days
today = datetime.datetime.today()
startDate = today - datetime.timedelta(days=32)
endDate = today - datetime.timedelta(days=2)

In [None]:
# by page

# recent
# Set the dates in this format "2022-12-01"
data = {
  "startDate": startDate.strftime("%Y-%m-%d"),
  "endDate": endDate.strftime("%Y-%m-%d"),
  "dimensions": "page"
}
res = requests.post("https://www.googleapis.com/webmasters/v3/sites/"+"sc-domain:"+site+"/searchAnalytics/query?access_token="+credentials.token, json=data)

In [None]:
j = json.loads(res.text)['rows']
for i in range(len(j)):
    j[i]['url'] = j[i]['keys'][0]

df_pages = pd.DataFrame(j)

df_pages = df_pages.sort_values('impressions',ascending=False)
df_pages

Unnamed: 0,keys,clicks,impressions,ctr,position,url
0,[https://frontanalytics.com/],10,200,0.05,9.915,https://frontanalytics.com/
4,[https://frontanalytics.com/advanced-analytics/],0,101,0.0,33.574257,https://frontanalytics.com/advanced-analytics/
1,[https://frontanalytics.com/system-dynamics-mo...,7,77,0.090909,42.311688,https://frontanalytics.com/system-dynamics-mod...
8,[https://frontanalytics.com/inbound-call-cente...,0,65,0.0,54.169231,https://frontanalytics.com/inbound-call-center...
2,[https://frontanalytics.com/about/],0,47,0.0,3.085106,https://frontanalytics.com/about/
5,[https://frontanalytics.com/contact/],0,46,0.0,3.043478,https://frontanalytics.com/contact/
9,[https://frontanalytics.com/privacy/],0,46,0.0,2.978261,https://frontanalytics.com/privacy/
6,[https://frontanalytics.com/data-monetization/],0,4,0.0,29.0,https://frontanalytics.com/data-monetization/
7,[https://frontanalytics.com/digital-strategy/],0,3,0.0,36.333333,https://frontanalytics.com/digital-strategy/
3,[https://frontanalytics.com/about/alton-alexan...,0,1,0.0,1.0,https://frontanalytics.com/about/alton-alexander/


## 2) Get all KWs for each page

This pulls every query used for each page:

In [66]:
# set date for the previous 1 days (day before yesterday)
today = datetime.datetime.today()
startDate = today - datetime.timedelta(days=3)
endDate = today - datetime.timedelta(days=2)

df_all_queries = pd.DataFrame()

# Get all the queries
for index, row in df_pages.iterrows():
    
    # get the url of this page
    page_url = row['url']

    # recent
    data = {
      "startDate": startDate.strftime("%Y-%m-%d"),
      "endDate": endDate.strftime("%Y-%m-%d"),
      "dimensions": ["query","device","country"],
      "dimensionFilterGroups": [
        {
          "groupType": "and",
          "filters": [
            {
              "dimension": "page",
              "operator": "contains",
              "expression": page_url
            }
          ]
        }
      ]
    }
    res = requests.post("https://www.googleapis.com/webmasters/v3/sites/"+"sc-domain:"+site+"/searchAnalytics/query?access_token="+credentials.token, json=data)

    # convert the response to a data frame
    j = json.loads(res.text).get('rows',[])

    if(len(j)):
        df_queries = pd.DataFrame(j)
        df_queries['url'] = page_url
        df_queries['property'] = site
        df_queries['start_date'] = startDate
        df_queries['update_at'] = today

        # By default the keys/dimensions are in a single column, let's split them out into separate columns.
        new_cols = df_queries['keys'].astype(str).str.replace("[","").str.replace("]","")
        new_cols = new_cols.str.split(pat=',',expand=True,n=2)

        # Give the columsn sensible names
        new_cols.columns = ["query","device","country"]

        # Bring back a key from the intial dataframe so we can join
        new_cols['key'] = df_queries['keys']

        # Get rid of quotation marks
        new_cols['query'] = new_cols['query'].str.replace("'","").str.lower()
        new_cols['device'] = new_cols['device'].str.replace("'","").str.lower()
        new_cols['country'] = new_cols['country'].str.replace("'","").str.lower()

        # Join in the new clean columns to our intiial data
        df_queries = pd.concat([df_queries, new_cols], axis=1, join='inner')

        # Drop the key columns
        df_queries = df_queries.drop(["key","keys"],axis=1)

        # save all the queries for this page with all other pages
        df_all_queries = pd.concat([df_all_queries, df_queries])

  new_cols = df_queries['keys'].astype(str).str.replace("[","").str.replace("]","")


In [67]:
# Now you can save df_all_queries for additional analysis
df_all_queries

Unnamed: 0,clicks,impressions,ctr,position,url,property,start_date,update_at,query,device,country
0,1,1,1,3,https://frontanalytics.com/,frontanalytics.com,2023-01-10 03:26:31.432918,2023-01-13 03:26:31.432918,front analytics,desktop,gbr
1,0,1,0,24,https://frontanalytics.com/,frontanalytics.com,2023-01-10 03:26:31.432918,2023-01-13 03:26:31.432918,advanced analytics generally refers to,desktop,usa
2,0,1,0,41,https://frontanalytics.com/,frontanalytics.com,2023-01-10 03:26:31.432918,2023-01-13 03:26:31.432918,advanced analytics refers to,desktop,usa
3,0,1,0,3,https://frontanalytics.com/,frontanalytics.com,2023-01-10 03:26:31.432918,2023-01-13 03:26:31.432918,front analytics,desktop,deu
4,0,1,0,5,https://frontanalytics.com/,frontanalytics.com,2023-01-10 03:26:31.432918,2023-01-13 03:26:31.432918,in front analytics,desktop,fra
5,0,1,0,58,https://frontanalytics.com/,frontanalytics.com,2023-01-10 03:26:31.432918,2023-01-13 03:26:31.432918,what can system dynamics modeling be used for,desktop,aut
6,0,1,0,60,https://frontanalytics.com/,frontanalytics.com,2023-01-10 03:26:31.432918,2023-01-13 03:26:31.432918,what is system dynamics modeling,desktop,hkg
0,0,1,0,58,https://frontanalytics.com/system-dynamics-mod...,frontanalytics.com,2023-01-10 03:26:31.432918,2023-01-13 03:26:31.432918,what can system dynamics modeling be used for,desktop,aut
1,0,1,0,60,https://frontanalytics.com/system-dynamics-mod...,frontanalytics.com,2023-01-10 03:26:31.432918,2023-01-13 03:26:31.432918,what is system dynamics modeling,desktop,hkg
0,0,1,0,24,https://frontanalytics.com/advanced-analytics/,frontanalytics.com,2023-01-10 03:26:31.432918,2023-01-13 03:26:31.432918,advanced analytics generally refers to,desktop,usa


In [1]:
from google.oauth2 import service_account
from googleapiclient.discovery import build
import requests
import json
import pandas as pd
from google.cloud import bigquery

In [2]:
SERVICE_ACCOUNT_FILE = "./website-analytics-161019-16456165cddc.json"

In [3]:
SCOPES = ['https://www.googleapis.com/auth/webmasters']
credentials = service_account.Credentials.from_service_account_file(
        SERVICE_ACCOUNT_FILE, scopes=SCOPES)

In [4]:
service = build(
    'webmasters',
    'v3',
    credentials=credentials
)

In [5]:
credentials.token

In [14]:
site = "frontanalytics.com"

today = datetime.datetime.today()
startDate = today - datetime.timedelta(days=32)
endDate = today - datetime.timedelta(days=2)

# Set the dates in this format "2022-12-01"
data = {
  "startDate": startDate.strftime("%Y-%m-%d"),
  "endDate": endDate.strftime("%Y-%m-%d"),
  "dimensions": "page"
}
res = requests.post("https://www.googleapis.com/webmasters/v3/sites/"+"sc-domain:"+site+"/searchAnalytics/query?access_token="+credentials.token, json=data)

In [10]:
import datetime

site_url = "frontanalytics.com"
today = datetime.datetime.today()
startDate = today - datetime.timedelta(days=32)
endDate = today - datetime.timedelta(days=2)

In [11]:
request = {
      'startDate': startDate.strftime("%Y-%m-%d"),
      'endDate': endDate.strftime("%Y-%m-%d"),
      'dimensions': ["page","device","query"], # uneditable to enforce a nice clean dataframe at the end!
      'rowLimit': 25000,
      'startRow': 0
       }

In [12]:
# Make the request to grab the data
response = service.searchanalytics().query(siteUrl=site_url, body=request).execute()
# Go down one level in the response object
res = response['rows']
# Create a DataFrame of the results
df = pd.DataFrame.from_dict(res)



HttpError: ignored

In [13]:
credentials.token

'ya29.c.b0AT7lpjDUVwsNBtD8nO3FbOWPvBkTw4nDGo63MAwoLGWiM54mIoU-luDi7LHjD_R61HQcvjzkgAwyl2EP2u_u35oPK8nJl0QIe09yQpqU93qJnFNNO0mb2FeWuBR3ZwHWcbOXSTJZZJANuK3nUg3jd8pO33EeGUJry23tVZ7OBAN8O1aYU5yqwdMEJ5gLhhUmFZxKHaAg_w7Rs5KlclJe-GyTavpXi64...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

In [15]:
j = json.loads(res.text)['rows']
for i in range(len(j)):
    j[i]['url'] = j[i]['keys'][0]

df_pages = pd.DataFrame(j)

df_pages = df_pages.sort_values('impressions',ascending=False)
df_pages

Unnamed: 0,keys,clicks,impressions,ctr,position,url
0,[https://frontanalytics.com/],9,210,0.042857,9.352381,https://frontanalytics.com/
1,[https://frontanalytics.com/system-dynamics-mo...,9,101,0.089109,48.990099,https://frontanalytics.com/system-dynamics-mod...
4,[https://frontanalytics.com/advanced-analytics/],0,100,0.0,34.16,https://frontanalytics.com/advanced-analytics/
8,[https://frontanalytics.com/inbound-call-cente...,0,71,0.0,53.802817,https://frontanalytics.com/inbound-call-center...
2,[https://frontanalytics.com/about/],0,50,0.0,4.34,https://frontanalytics.com/about/
5,[https://frontanalytics.com/contact/],0,50,0.0,3.1,https://frontanalytics.com/contact/
9,[https://frontanalytics.com/privacy/],0,50,0.0,3.08,https://frontanalytics.com/privacy/
6,[https://frontanalytics.com/data-monetization/],0,5,0.0,21.8,https://frontanalytics.com/data-monetization/
7,[https://frontanalytics.com/digital-strategy/],0,3,0.0,36.333333,https://frontanalytics.com/digital-strategy/
10,[https://frontanalytics.com/system-dynamics-fo...,0,2,0.0,67.0,https://frontanalytics.com/system-dynamics-for...


# upload to BQ


In [96]:

from google.cloud import bigquery

In [68]:
df_all_queries

Unnamed: 0,clicks,impressions,ctr,position,url,property,start_date,update_at,query,device,country
0,1,1,1,3,https://frontanalytics.com/,frontanalytics.com,2023-01-10 03:26:31.432918,2023-01-13 03:26:31.432918,front analytics,desktop,gbr
1,0,1,0,24,https://frontanalytics.com/,frontanalytics.com,2023-01-10 03:26:31.432918,2023-01-13 03:26:31.432918,advanced analytics generally refers to,desktop,usa
2,0,1,0,41,https://frontanalytics.com/,frontanalytics.com,2023-01-10 03:26:31.432918,2023-01-13 03:26:31.432918,advanced analytics refers to,desktop,usa
3,0,1,0,3,https://frontanalytics.com/,frontanalytics.com,2023-01-10 03:26:31.432918,2023-01-13 03:26:31.432918,front analytics,desktop,deu
4,0,1,0,5,https://frontanalytics.com/,frontanalytics.com,2023-01-10 03:26:31.432918,2023-01-13 03:26:31.432918,in front analytics,desktop,fra
5,0,1,0,58,https://frontanalytics.com/,frontanalytics.com,2023-01-10 03:26:31.432918,2023-01-13 03:26:31.432918,what can system dynamics modeling be used for,desktop,aut
6,0,1,0,60,https://frontanalytics.com/,frontanalytics.com,2023-01-10 03:26:31.432918,2023-01-13 03:26:31.432918,what is system dynamics modeling,desktop,hkg
0,0,1,0,58,https://frontanalytics.com/system-dynamics-mod...,frontanalytics.com,2023-01-10 03:26:31.432918,2023-01-13 03:26:31.432918,what can system dynamics modeling be used for,desktop,aut
1,0,1,0,60,https://frontanalytics.com/system-dynamics-mod...,frontanalytics.com,2023-01-10 03:26:31.432918,2023-01-13 03:26:31.432918,what is system dynamics modeling,desktop,hkg
0,0,1,0,24,https://frontanalytics.com/advanced-analytics/,frontanalytics.com,2023-01-10 03:26:31.432918,2023-01-13 03:26:31.432918,advanced analytics generally refers to,desktop,usa


In [111]:
BQ_PROJECT_NAME = 'website-analytics-161019'
BQ_DATASET_NAME = 'test'
BQ_TABLE_NAME = 'test_sc'


# establish a BigQuery client
client = bigquery.Client.from_service_account_json(SERVICE_ACCOUNT_FILE)
dataset_id = BQ_DATASET_NAME
table_name = BQ_TABLE_NAME

# create a job config
# Set the destination table

table_id = '{}.{}.{}'.format(BQ_PROJECT_NAME, BQ_DATASET_NAME, BQ_TABLE_NAME)



job_config = bigquery.LoadJobConfig(
    # Specify a (partial) schema. All columns are always written to the
    # table. The schema is used to assist in data type definitions.
    schema=[
        # Specify the type of columns whose type cannot be auto-detected. For
        # example the "title" column uses pandas dtype "object", so its
        # data type is ambiguous.
        #bigquery.SchemaField("title", bigquery.enums.SqlTypeNames.STRING),
        # Indexes are written if included in the schema by name.
        #bigquery.SchemaField("wikidata_id", bigquery.enums.SqlTypeNames.STRING),
    ],
    # Optionally, set the write disposition. BigQuery appends loaded rows
    # to an existing table by default, but with WRITE_TRUNCATE write
    # disposition it replaces the table with the loaded data.
    write_disposition="WRITE_APPEND",
)

job = client.load_table_from_dataframe(
    df_all_queries, table_id, job_config=job_config
)  # Make an API request.
job.result()  # Wait for the job to complete.



LoadJob<project=website-analytics-161019, location=us-central1, id=894f2726-2951-499a-886f-113073f638fb>

In [106]:

tables = client.list_tables(dataset_id)  # Make an API request.

print("Tables contained in '{}':".format(dataset_id))
for table in tables:
    print("{}.{}.{}".format(table.project, table.dataset_id, table.table_id))

Tables contained in 'test':
website-analytics-161019.test.test_sc


In [102]:
datasets = client.list_datasets()

for dataset in datasets:
  print("dataset", dataset.project)

dataset website-analytics-161019
dataset website-analytics-161019


In [103]:
projects = client.list_projects()

for dataset in projects:
  print("dataset", dataset)

dataset <google.cloud.bigquery.client.Project object at 0x7fab4d0c48e0>


In [104]:
datasets = list(client.list_datasets())  # Make an API request.
project = client.project
datasets

[<google.cloud.bigquery.dataset.DatasetListItem at 0x7fab4cee9400>,
 <google.cloud.bigquery.dataset.DatasetListItem at 0x7fab4e3c36a0>]

In [105]:
datasets

[<google.cloud.bigquery.dataset.DatasetListItem at 0x7fab4cee9400>,
 <google.cloud.bigquery.dataset.DatasetListItem at 0x7fab4e3c36a0>]