# Google Search Console (GSC) API Examples


### Overview

If you care about your organic traffic, Google Search Console (GSC) is one of the richest data sources available.

Here are example scripts for connecting to this data source.



### About Me

My name is Alton Alexander. I am a data science consultant turned entrepreneur building SaaS tools for SEO.

Find out more about my free scripts or ask me any question on Twitter: @alton_lex

# GSC API Examples:

In [31]:
# load libraries
import requests
import json
from urllib.parse import urlparse

import httplib2
# googleapiclient is the current package name; apiclient is a legacy alias
from googleapiclient import errors
from googleapiclient.discovery import build

import datetime

import google.oauth2.credentials
import google.auth.transport.requests

!pip install pandas
import pandas as pd



## 0) Initialize the OAuth Flow using the App's OAuth Client Credentials

Every request must be authorized by the user.

Steps:

1. Using the app's OAuth client credentials (the client secrets JSON file), we initialize the OAuth flow

2. Then the user authenticates with the Google account associated with GSC

3. We save the user's credentials for future use

See the documentation to get the client credentials first.


In [72]:
# get authorization
# https://developers.google.com/identity/protocols/oauth2
# https://github.com/googleapis/google-api-python-client/blob/main/docs/oauth.md

#get_ipython().system('pip uninstall google_auth_oauthlib -y')
#get_ipython().system('pip3 uninstall google_auth_oauthlib -y')
#get_ipython().system('pip3 install google-auth')
get_ipython().system('pip3 install google-auth-oauthlib')

from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

flow = InstalledAppFlow.from_client_secrets_file(
    '../gcp-keys/client_secret_824986252225-squl6c7v1oshe5l3d8fjc9s2j9hv8sdm.apps.googleusercontent.com.json',
    scopes=[
        'openid',    
        'https://www.googleapis.com/auth/userinfo.email', 
        'https://www.googleapis.com/auth/userinfo.profile',
        'https://www.googleapis.com/auth/webmasters.readonly',
        'https://www.googleapis.com/auth/webmasters'
    ])

flow.run_local_server(port=8080)

Please visit this URL to authorize this application: https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=824986252225-squl6c7v1oshe5l3d8fjc9s2j9hv8sdm.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Flocalhost%3A8080%2F&scope=openid+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.profile+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fwebmasters.readonly+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fwebmasters&state=Ko6XijFkFyQa8CHm9PJmc2fgJ63aNJ&access_type=offline


<google.oauth2.credentials.Credentials at 0x7f8697d3b3a0>

Next steps:

- Click the link above and log in with the account that has the site in Google Search Console

- Return to this page; the credentials are now stored in memory

- We save these credentials so that we do not need to ask the user again

In [82]:
# persist the credentials
credentials = flow.credentials

# turn the credentials object into a plain dict so it can be written as JSON
# note: expiry is a datetime, so it is stored as a formatted string
temp = {
    'token': credentials.token,
    'refresh_token': credentials.refresh_token,
    'id_token':credentials.id_token,
    'token_uri': credentials.token_uri,
    'client_id': credentials.client_id,
    'client_secret': credentials.client_secret,
    'scopes': credentials.scopes,
    'expiry':datetime.datetime.strftime(credentials.expiry,'%Y-%m-%d %H:%M:%S')
}

# serialize the dict to JSON
json_object = json.dumps(temp, indent=4)

# write the user's credentials to gsc-user-creds.json
with open("../gcp-keys/gsc-user-creds.json", "w") as outfile:
    outfile.write(json_object)
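
The saved file contains a refresh token and the client secret, so it may be worth limiting who can read it. Here is a minimal sketch, assuming a POSIX system; `save_json_private` is a hypothetical helper name and not part of any Google library:

```python
import json
import os


def save_json_private(path, data):
    """Write `data` as JSON to `path`, creating the file readable only by its owner."""
    # 0o600 = owner read/write only; the mode applies only when the file is created
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as outfile:
        json.dump(data, outfile, indent=4)
```

With the cell above, this could be called as `save_json_private("../gcp-keys/gsc-user-creds.json", temp)`.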

## 1) Load the User's credentials

The user's credentials are saved from step 0 above.

Now we just need to refresh the token if it has expired.

In [83]:
# load the credentials for this user
# Opening JSON file
with open('../gcp-keys/gsc-user-creds.json', 'r') as openfile:
 
    # Reading from json file
    temp = json.load(openfile)


credentials = google.oauth2.credentials.Credentials(
    temp['token'],
    refresh_token=temp['refresh_token'],
    id_token=temp['id_token'],
    token_uri=temp['token_uri'],
    client_id=temp['client_id'],
    client_secret=temp['client_secret'],
    scopes=temp['scopes'],
)
# restore the expiry as a datetime so that credentials.expired can be evaluated
expiry = temp['expiry']
expiry_datetime = datetime.datetime.strptime(expiry,'%Y-%m-%d %H:%M:%S')
credentials.expiry = expiry_datetime

# refresh the access token, but only if it has expired
request = google.auth.transport.requests.Request()
if credentials.expired:
    credentials.refresh(request)
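
The expiry is stored as text because a `datetime` cannot be written to JSON directly. The sketch below checks that round trip in isolation, using the same format string as the cells above. The helper names `expiry_to_text` and `expiry_from_text` are hypothetical, and the sample timestamp is made up:

```python
import datetime

EXPIRY_FORMAT = "%Y-%m-%d %H:%M:%S"  # same format string as in the cells above


def expiry_to_text(expiry):
    """Format a datetime for storage in the JSON file."""
    return expiry.strftime(EXPIRY_FORMAT)


def expiry_from_text(text):
    """Parse the stored string back into a naive datetime."""
    return datetime.datetime.strptime(text, EXPIRY_FORMAT)


# made-up timestamp, only for illustration
original = datetime.datetime(2022, 12, 1, 13, 45, 30, 123456)
restored = expiry_from_text(expiry_to_text(original))

# the round trip keeps everything except the microseconds
assert restored == original.replace(microsecond=0)
```

Losing the microseconds should not matter here, since the value is only used to decide whether the token needs a refresh.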

## 1) Pull a list of pages

This pulls the pages that appeared in Google Search over roughly the last 30 days (the API returns at most 1,000 rows per request unless `rowLimit` is set):

In [75]:
# define the domain that we are using

site = "frontanalytics.com"
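
The requests below address a domain property by prefixing the domain with `sc-domain:`. For a URL-prefix property the site identifier is a full URL, which contains slashes and needs percent-encoding before it is placed in the path. A minimal sketch, assuming the v3 endpoint used in this notebook; `search_analytics_url` is a hypothetical helper name:

```python
from urllib.parse import quote

API_ROOT = "https://www.googleapis.com/webmasters/v3/sites/"


def search_analytics_url(site_url):
    """Build the searchAnalytics/query endpoint for a GSC property."""
    # safe="" also escapes ":" and "/" so the property stays a single path segment
    return API_ROOT + quote(site_url, safe="") + "/searchAnalytics/query"


print(search_analytics_url("sc-domain:frontanalytics.com"))
print(search_analytics_url("https://frontanalytics.com/"))
```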


In [76]:
# set the date range to roughly the last 30 days, ending 2 days ago
# (start and end dates are both inclusive)
today = datetime.datetime.today()
startDate = today - datetime.timedelta(days=32)
endDate = today - datetime.timedelta(days=2)
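
Because the API treats both `startDate` and `endDate` as inclusive, the window above spans 31 calendar days. If an exact day count matters, a small helper can make the arithmetic explicit. This is a sketch only; `gsc_date_range` and its `lag_days` parameter are hypothetical names, and the 2-day lag simply mirrors the cell above:

```python
import datetime


def gsc_date_range(days, lag_days=2, today=None):
    """Return (startDate, endDate) strings covering `days` days, ending `lag_days` ago."""
    today = today or datetime.date.today()
    end = today - datetime.timedelta(days=lag_days)
    # both dates are inclusive, so a window of `days` days spans days - 1 offsets
    start = end - datetime.timedelta(days=days - 1)
    return start.strftime("%Y-%m-%d"), end.strftime("%Y-%m-%d")


print(gsc_date_range(30, today=datetime.date(2022, 12, 31)))
# → ('2022-11-30', '2022-12-29')
```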

In [77]:
# by page

# recent
# dates must be formatted like "2022-12-01"
data = {
  "startDate": startDate.strftime("%Y-%m-%d"),
  "endDate": endDate.strftime("%Y-%m-%d"),
  "dimensions": ["page"]
}
# send the access token in the Authorization header rather than in the URL
res = requests.post(
    "https://www.googleapis.com/webmasters/v3/sites/sc-domain:" + site + "/searchAnalytics/query",
    headers={"Authorization": "Bearer " + credentials.token},
    json=data,
)

In [78]:
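# convert the response rows to a data frame
j = res.json()['rows']
for row in j:
    # the only requested dimension is "page", so the first key is the URL
    row['url'] = row['keys'][0]

df_pages = pd.DataFrame(j)

# pages with the most impressions first
df_pages = df_pages.sort_values('impressions', ascending=False)
df_pages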

| index | keys | clicks | impressions | ctr | position | url |
|---|---|---|---|---|---|---|
| 0 | [https://frontanalytics.com/] | 10 | 200 | 0.05 | 9.915 | https://frontanalytics.com/ |
| 4 | [https://frontanalytics.com/advanced-analytics/] | 0 | 101 | 0.0 | 33.574257 | https://frontanalytics.com/advanced-analytics/ |
| 1 | [https://frontanalytics.com/system-dynamics-mo... | 7 | 77 | 0.090909 | 42.311688 | https://frontanalytics.com/system-dynamics-mod... |
| 8 | [https://frontanalytics.com/inbound-call-cente... | 0 | 65 | 0.0 | 54.169231 | https://frontanalytics.com/inbound-call-center... |
| 2 | [https://frontanalytics.com/about/] | 0 | 47 | 0.0 | 3.085106 | https://frontanalytics.com/about/ |
| 5 | [https://frontanalytics.com/contact/] | 0 | 46 | 0.0 | 3.043478 | https://frontanalytics.com/contact/ |
| 9 | [https://frontanalytics.com/privacy/] | 0 | 46 | 0.0 | 2.978261 | https://frontanalytics.com/privacy/ |
| 6 | [https://frontanalytics.com/data-monetization/] | 0 | 4 | 0.0 | 29.0 | https://frontanalytics.com/data-monetization/ |
| 7 | [https://frontanalytics.com/digital-strategy/] | 0 | 3 | 0.0 | 36.333333 | https://frontanalytics.com/digital-strategy/ |
| 3 | [https://frontanalytics.com/about/alton-alexan... | 0 | 1 | 0.0 | 1.0 | https://frontanalytics.com/about/alton-alexander/ |
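
A single request returns at most `rowLimit` rows (1,000 by default, 25,000 at most), so a larger site may need several requests to list every page. Below is a minimal sketch of paging with the API's `startRow` and `rowLimit` fields. `fetch_all_rows` and `run_query` are hypothetical names, and the query function is passed in so the loop does not depend on how the request is sent:

```python
def fetch_all_rows(run_query, body, page_size=25000):
    """Collect every row by advancing startRow until a short page comes back.

    `run_query` is any callable that takes a request body (dict) and returns
    the parsed JSON response (dict).
    """
    rows = []
    start_row = 0
    while True:
        # copy the body so the caller's dict is left untouched
        page_body = dict(body, rowLimit=page_size, startRow=start_row)
        page = run_query(page_body).get("rows", [])
        rows.extend(page)
        # a page shorter than page_size means there is nothing left to fetch
        if len(page) < page_size:
            return rows
        start_row += page_size
```

With the cells above, `run_query` could be a small function that posts the body to the searchAnalytics endpoint and returns the parsed JSON.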


## 2) Get all keywords for each page

This pulls the search queries for which each page appeared:

In [79]:
# set a short, recent date range: from 3 days ago to 2 days ago
# (start and end dates are both inclusive)
today = datetime.datetime.today()
startDate = today - datetime.timedelta(days=3)
endDate = today - datetime.timedelta(days=2)

df_all_queries = pd.DataFrame()

# Get all the queries
for index, row in df_pages.iterrows():
    
    # get the url of this page
    page_url = row['url']

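    # recent
    data = {
      "startDate": startDate.strftime("%Y-%m-%d"),
      "endDate": endDate.strftime("%Y-%m-%d"),
      "dimensions": ["query"],
      "dimensionFilterGroups": [
        {
          "groupType": "and",
          "filters": [
            {
              "dimension": "page",
              # "contains" is a substring match, so the home page URL also
              # matches every other page; use "equals" for an exact match
              "operator": "contains",
              "expression": page_url
            }
          ]
        }
      ]
    }
    # send the access token in the Authorization header rather than in the URL
    res = requests.post(
        "https://www.googleapis.com/webmasters/v3/sites/sc-domain:" + site + "/searchAnalytics/query",
        headers={"Authorization": "Bearer " + credentials.token},
        json=data,
    )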

    # convert the response to a data frame
    j = res.json().get('rows', [])
    for row in j:
        # the only requested dimension is "query", so use the first key
        row['query'] = row['keys'][0]

    if j:
        df_queries = pd.DataFrame(j)
        df_queries['url'] = page_url
        df_queries = df_queries.drop(columns=['keys'])

        # append this page's queries to the queries of all other pages
        df_all_queries = pd.concat([df_all_queries, df_queries])

In [80]:
# Now you can save df_all_queries for additional analysis
df_all_queries

| index | clicks | impressions | ctr | position | query | url |
|---|---|---|---|---|---|---|
| 0 | 0 | 2 | 0 | 22.5 | advanced analytics generally refers to | https://frontanalytics.com/ |
| 1 | 0 | 5 | 0 | 34.0 | call center optimization case study | https://frontanalytics.com/ |
| 2 | 0 | 1 | 0 | 4.0 | data consultant | https://frontanalytics.com/ |
| 3 | 0 | 8 | 0 | 2.75 | front analytics | https://frontanalytics.com/ |
| 4 | 0 | 1 | 0 | 96.0 | systems dynamics modeling | https://frontanalytics.com/ |
| 0 | 0 | 2 | 0 | 22.5 | advanced analytics generally refers to | https://frontanalytics.com/advanced-analytics/ |
| 0 | 0 | 1 | 0 | 96.0 | systems dynamics modeling | https://frontanalytics.com/system-dynamics-mod... |
| 0 | 0 | 5 | 0 | 34.0 | call center optimization case study | https://frontanalytics.com/inbound-call-center... |
| 0 | 0 | 2 | 0 | 3.0 | front analytics | https://frontanalytics.com/about/ |
| 0 | 0 | 2 | 0 | 3.0 | front analytics | https://frontanalytics.com/contact/ |