# Google Search Console (GSC) API Examples


### Overview

If you care about your organic traffic, Google Search Console (GSC) is one of the richest data sources available.

Here are example scripts for connecting to this valuable data source.


> SEE PART 1 for how to connect and download GSC data



### About Me

My name is Alton Alexander. I am a Data Science consultant turned entrepreneur building SaaS tools for SEO.

Find out more about my free scripts, or ask me a question on Twitter: @alton_lex

# GSC API Examples:

In [80]:
# load libraries
import requests
import json
from urllib.parse import urlparse

import httplib2
# googleapiclient is the current package name (apiclient is a legacy alias)
from googleapiclient import errors
from googleapiclient.discovery import build

import datetime

from google.oauth2 import service_account
import google.oauth2.credentials
import google.auth.transport.requests

!pip install pandas
import pandas as pd



## 0) Setup service account

In Google Cloud Console, set up the following:

1. Create a project

2. Enable access to the GSC API for that project

3. Create a service account (make a note of the service account's email address)

4. OPTIONAL: Add roles to the service account so that it can access additional resources (e.g. BigQuery)

5. Create a key for the service account and save it as JSON for connecting from Python


In Google Search Console:

1. Add the service account email as a user to each of the properties you want it to access
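
The email address to add in Search Console can be read straight from the downloaded key file. Here is a minimal sketch, assuming the usual service account key layout with `type` and `client_email` fields. The helper name `service_account_email` is made up for this example.

```python
import json


def service_account_email(key_path):
    """Return the client_email stored in a service account JSON key file."""
    with open(key_path) as f:
        key = json.load(f)
    # Guard against pointing at a different kind of credentials file
    if key.get("type") != "service_account":
        raise ValueError("not a service account key: " + key_path)
    return key["client_email"]
```

Calling `service_account_email("./service-account-key.json")` would return the address to paste into the GSC user settings.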

## 1) Connect with Service Account JSON key

Every request to the GSC API must be authenticated.

We use the service account JSON key to obtain credentials.


In [83]:
# local path to the service account key
# service account's email must be added to one or more properties in GSC

SERVICE_ACCOUNT_FILE = "../gcp-keys/website-analytics-161019-16456165cddc.json"
#SERVICE_ACCOUNT_FILE = "./service-account-key.json"

In [84]:
SCOPES = ['https://www.googleapis.com/auth/webmasters']
credentials = service_account.Credentials.from_service_account_file(
        SERVICE_ACCOUNT_FILE, scopes=SCOPES)

In [85]:
service = build(
    'webmasters',
    'v3',
    credentials=credentials
)

In [86]:
# test by pulling a list of sites
# reminder to add the service account email to every GSC property as a user


response = service.sites().list().execute()
response

{'siteEntry': [{'siteUrl': 'sc-domain:frontanalytics.com',
   'permissionLevel': 'siteOwner'}]}

In [87]:
# After the request above, the credentials object also holds an access token in credentials.token
credentials

<google.oauth2.service_account.Credentials at 0x7f4c1aa67550>

## 1) Pull a list of pages

This pulls a list of the pages that appeared in Google Search over a 30-day window:

In [88]:
# define the domain that we are using

site = "frontanalytics.com"


In [89]:
# 30-day window ending two days ago
today = datetime.datetime.today()
startDate = today - datetime.timedelta(days=32)
endDate = today - datetime.timedelta(days=2)
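
The same window can be wrapped in a small helper so that later cells reuse one definition. This is a sketch rather than part of the original notebook. `gsc_date_range` and its `lag_days` parameter are hypothetical names, and the two-day lag simply mirrors the offsets used above.

```python
import datetime


def gsc_date_range(days, lag_days=2, today=None):
    """Return (startDate, endDate) as YYYY-MM-DD strings for a window of
    `days` days that ends `lag_days` days before `today`."""
    today = today or datetime.date.today()
    end = today - datetime.timedelta(days=lag_days)
    start = end - datetime.timedelta(days=days)
    return start.isoformat(), end.isoformat()
```

For example, `gsc_date_range(30)` reproduces the 32-days-ago to 2-days-ago range from the cell above, already formatted for the request body.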

In [90]:
# query by page
# dates must be formatted like "2022-12-01"
data = {
  "startDate": startDate.strftime("%Y-%m-%d"),
  "endDate": endDate.strftime("%Y-%m-%d"),
  "dimensions": ["page"]
}
res = requests.post("https://www.googleapis.com/webmasters/v3/sites/"+"sc-domain:"+site+"/searchAnalytics/query?access_token="+credentials.token, json=data)
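
The property name goes into the URL path, so it is safer to percent-encode it. This matters most for URL-prefix properties such as `https://example.com/`. The token can also travel in an `Authorization` header instead of the query string. Below is a sketch of both ideas. `query_request` and `API_ROOT` are names invented here, and the endpoint is the one used in the cell above.

```python
from urllib.parse import quote

API_ROOT = "https://www.googleapis.com/webmasters/v3/sites/"


def query_request(site_url, token):
    """Build the URL and headers for a searchAnalytics/query call."""
    # safe="" also encodes ':' and '/', both of which occur in property names
    url = API_ROOT + quote(site_url, safe="") + "/searchAnalytics/query"
    # Standard OAuth 2.0 bearer token header
    headers = {"Authorization": "Bearer " + token}
    return url, headers
```

With this helper the call would look like `url, headers = query_request("sc-domain:" + site, credentials.token)` followed by `requests.post(url, headers=headers, json=data)`.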

In [91]:
rows = json.loads(res.text)['rows']
for row in rows:
    # 'keys' holds one value per requested dimension; here that is the page URL
    row['url'] = row['keys'][0]

df_pages = pd.DataFrame(rows)

df_pages = df_pages.sort_values('impressions', ascending=False)
df_pages

| | keys | clicks | impressions | ctr | position | url |
|---|---|---|---|---|---|---|
| 0 | [https://frontanalytics.com/] | 9 | 220 | 0.040909 | 9.15 | https://frontanalytics.com/ |
| 4 | [https://frontanalytics.com/advanced-analytics/] | 0 | 107 | 0.0 | 33.607477 | https://frontanalytics.com/advanced-analytics/ |
| 1 | [https://frontanalytics.com/system-dynamics-mo... | 9 | 104 | 0.086538 | 49.701923 | https://frontanalytics.com/system-dynamics-mod... |
| 8 | [https://frontanalytics.com/inbound-call-cente... | 0 | 74 | 0.0 | 53.689189 | https://frontanalytics.com/inbound-call-center... |
| 2 | [https://frontanalytics.com/about/] | 0 | 53 | 0.0 | 4.679245 | https://frontanalytics.com/about/ |
| 5 | [https://frontanalytics.com/contact/] | 0 | 52 | 0.0 | 3.096154 | https://frontanalytics.com/contact/ |
| 9 | [https://frontanalytics.com/privacy/] | 0 | 52 | 0.0 | 3.076923 | https://frontanalytics.com/privacy/ |
| 6 | [https://frontanalytics.com/data-monetization/] | 0 | 5 | 0.0 | 21.8 | https://frontanalytics.com/data-monetization/ |
| 7 | [https://frontanalytics.com/digital-strategy/] | 0 | 3 | 0.0 | 36.333333 | https://frontanalytics.com/digital-strategy/ |
| 10 | [https://frontanalytics.com/system-dynamics-fo... | 0 | 2 | 0.0 | 67.0 | https://frontanalytics.com/system-dynamics-for... |
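
A single query returns at most 1,000 rows unless `rowLimit` is set (the maximum is 25,000), so larger sites need paging with `startRow`. The loop below is a sketch of that paging. `fetch_all_rows` is a hypothetical helper, and `fetch_page` is assumed to be any callable that sends a request body and returns the parsed JSON response.

```python
def fetch_all_rows(fetch_page, body, page_size=25000):
    """Collect rows across pages by advancing startRow until a short page."""
    rows = []
    start_row = 0
    while True:
        # Copy the body so the caller's dict is left untouched
        page_body = dict(body, rowLimit=page_size, startRow=start_row)
        page = fetch_page(page_body).get("rows", [])
        rows.extend(page)
        # A page shorter than page_size means there is nothing left to fetch
        if len(page) < page_size:
            return rows
        start_row += page_size
```

Here `fetch_page` could be, for example, `lambda b: requests.post(url, json=b).json()`, with `url` being the same request URL used in the query cell.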


## 2) Get all keywords for each page

This pulls every query, split by device and country, for each page:

In [92]:
# short window: three days ago through two days ago (both dates are inclusive)
today = datetime.datetime.today()
startDate = today - datetime.timedelta(days=3)
endDate = today - datetime.timedelta(days=2)

df_all_queries = pd.DataFrame()

# Get all the queries
for index, row in df_pages.iterrows():
    
    # get the url of this page
    page_url = row['url']

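    # request body: queries by device and country, filtered to this page
    data = {
      "startDate": startDate.strftime("%Y-%m-%d"),
      "endDate": endDate.strftime("%Y-%m-%d"),
      "dimensions": ["query","device","country"],
      "dimensionFilterGroups": [
        {
          "groupType": "and",
          "filters": [
            {
              "dimension": "page",
              # match this exact URL only; "contains" would also match every
              # URL that includes it (for the home page, that is every page)
              "operator": "equals",
              "expression": page_url
            }
          ]
        }
      ]
    }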
    res = requests.post("https://www.googleapis.com/webmasters/v3/sites/"+"sc-domain:"+site+"/searchAnalytics/query?access_token="+credentials.token, json=data)

    # convert the response to a data frame
    j = json.loads(res.text).get('rows',[])

    if j:
        df_queries = pd.DataFrame(j)
        df_queries['url'] = page_url
        df_queries['property'] = site
        df_queries['start_date'] = startDate
        df_queries['update_at'] = today

        # The API returns the dimension values as a list in the 'keys' column.
        # Expand that list into one column per dimension; this avoids string
        # parsing, which breaks on queries that contain commas.
        new_cols = pd.DataFrame(
            df_queries['keys'].tolist(),
            columns=["query", "device", "country"],
            index=df_queries.index,
        )

        # Normalize everything to lower case
        new_cols = new_cols.apply(lambda col: col.str.lower())

        # Join the clean columns to the initial data and drop the raw keys
        df_queries = pd.concat([df_queries.drop(columns="keys"), new_cols], axis=1)

        # save all the queries for this page with all other pages
        df_all_queries = pd.concat([df_all_queries, df_queries])
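
The request body built inside the loop can also come from a small function, which makes the filter operator an explicit choice. The operator matters for URLs that are part of other URLs: `contains` on `https://frontanalytics.com/` also matches every deeper page, whereas `equals` matches only that exact URL. This is a sketch, and `page_query_body` is a hypothetical helper name.

```python
def page_query_body(start_date, end_date, page_url,
                    dimensions=("query", "device", "country"),
                    operator="equals"):
    """Build a searchAnalytics/query body restricted to a single page."""
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": list(dimensions),
        "dimensionFilterGroups": [{
            "groupType": "and",
            "filters": [{
                "dimension": "page",
                "operator": operator,
                "expression": page_url,
            }],
        }],
    }
```

Inside the loop this would be called as `page_query_body(startDate.strftime("%Y-%m-%d"), endDate.strftime("%Y-%m-%d"), page_url)`.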



In [93]:
# Now you can save df_all_queries for additional analysis
df_all_queries

| | clicks | impressions | ctr | position | url | property | start_date | update_at | query | device | country |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 3 | https://frontanalytics.com/ | frontanalytics.com | 2023-01-10 13:21:13.576277 | 2023-01-13 13:21:13.576277 | front analytics | desktop | gbr |
| 1 | 0 | 1 | 0 | 21 | https://frontanalytics.com/ | frontanalytics.com | 2023-01-10 13:21:13.576277 | 2023-01-13 13:21:13.576277 | advanced analytics generally refers to | desktop | chl |
| 2 | 0 | 1 | 0 | 21 | https://frontanalytics.com/ | frontanalytics.com | 2023-01-10 13:21:13.576277 | 2023-01-13 13:21:13.576277 | advanced analytics generally refers to | desktop | col |
| 3 | 0 | 1 | 0 | 21 | https://frontanalytics.com/ | frontanalytics.com | 2023-01-10 13:21:13.576277 | 2023-01-13 13:21:13.576277 | advanced analytics generally refers to | desktop | rus |
| 4 | 0 | 2 | 0 | 23 | https://frontanalytics.com/ | frontanalytics.com | 2023-01-10 13:21:13.576277 | 2023-01-13 13:21:13.576277 | advanced analytics generally refers to | desktop | usa |
| 5 | 0 | 1 | 0 | 41 | https://frontanalytics.com/ | frontanalytics.com | 2023-01-10 13:21:13.576277 | 2023-01-13 13:21:13.576277 | advanced analytics refers to | desktop | usa |
| 6 | 0 | 1 | 0 | 95 | https://frontanalytics.com/ | frontanalytics.com | 2023-01-10 13:21:13.576277 | 2023-01-13 13:21:13.576277 | call center analytics case study | mobile | irl |
| 7 | 0 | 1 | 0 | 29 | https://frontanalytics.com/ | frontanalytics.com | 2023-01-10 13:21:13.576277 | 2023-01-13 13:21:13.576277 | call center optimization case study | desktop | mys |
| 8 | 0 | 1 | 0 | 29 | https://frontanalytics.com/ | frontanalytics.com | 2023-01-10 13:21:13.576277 | 2023-01-13 13:21:13.576277 | call center optimization case study | desktop | usa |
| 9 | 0 | 1 | 0 | 88 | https://frontanalytics.com/ | frontanalytics.com | 2023-01-10 13:21:13.576277 | 2023-01-13 13:21:13.576277 | define system dynamics | desktop | jpn |


# Uploading to BigQuery (BQ)

Make sure your service account has a role that permits access to BigQuery.

In [95]:
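# the umbrella `google-cloud` package is deprecated; install the specific client libraries
!pip install --upgrade google-cloud-bigquery
!pip install --upgrade google-cloud-storage
# load_table_from_dataframe needs pyarrow to serialize the DataFrame
!pip install --upgrade pyarrow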
from google.cloud import bigquery


In [78]:

# include the name of your project
BQ_PROJECT_NAME = 'website-analytics-161019'

# create the tables manually to include all the fields
BQ_DATASET_NAME = 'test'
BQ_TABLE_NAME = 'test_sc'


# establish a BigQuery client
client = bigquery.Client.from_service_account_json(SERVICE_ACCOUNT_FILE)
dataset_id = BQ_DATASET_NAME
table_name = BQ_TABLE_NAME

# fully qualified ID of the destination table: project.dataset.table
table_id = '{}.{}.{}'.format(BQ_PROJECT_NAME, BQ_DATASET_NAME, BQ_TABLE_NAME)

# create a job config for the load
job_config = bigquery.LoadJobConfig(
    # Specify a (partial) schema. All columns are always written to the
    # table. The schema is used to assist in data type definitions.
    schema=[
        # Specify the type of columns whose type cannot be auto-detected. For
        # example the "query" column uses pandas dtype "object", so its
        # data type is ambiguous.
        #bigquery.SchemaField("query", bigquery.enums.SqlTypeNames.STRING),
        #bigquery.SchemaField("url", bigquery.enums.SqlTypeNames.STRING),
    ],
    # Optionally, set the write disposition. BigQuery appends loaded rows
    # to an existing table by default, but with WRITE_TRUNCATE write
    # disposition it replaces the table with the loaded data.
    write_disposition="WRITE_APPEND",
)

job = client.load_table_from_dataframe(
    df_all_queries, table_id, job_config=job_config
)  # Make an API request.
job.result()  # Wait for the job to complete.




In [106]:

tables = client.list_tables(dataset_id)  # Make an API request.

print("Tables contained in '{}':".format(dataset_id))
for table in tables:
    print("{}.{}.{}".format(table.project, table.dataset_id, table.table_id))

Tables contained in 'test':
website-analytics-161019.test.test_sc


In [102]:
datasets = client.list_datasets()

for dataset in datasets:
  print("dataset", dataset.project)

dataset website-analytics-161019
dataset website-analytics-161019


In [103]:
projects = client.list_projects()

for dataset in projects:
  print("dataset", dataset)

dataset <google.cloud.bigquery.client.Project object at 0x7fab4d0c48e0>


In [104]:
datasets = list(client.list_datasets())  # Make an API request.
project = client.project
datasets

[<google.cloud.bigquery.dataset.DatasetListItem at 0x7fab4cee9400>,
 <google.cloud.bigquery.dataset.DatasetListItem at 0x7fab4e3c36a0>]
