![Seeders logo](https://seeders.nl/wp-content/uploads/2020/03/seeders-logo.png)
## SERP Analyser
*This notebook retrieves SERPs for the top searched keywords in Europe and analyses the top 10 results to gain insight into important SEO ranking factors across Europe.*

We will analyse the SERPs based on the following questions:<br>
 - Is the domain extension a ranking factor?<br>
 - ??
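
As a starting point for the first question, the domain extension can be pulled from each result URL. The sketch below is a minimal illustration, not the notebook's final method: the helper name `domain_extension` and the sample URLs are hypothetical, and it takes only the last label of the hostname, so a multi-part suffix such as `.co.uk` is reported as `uk`.

```python
from collections import Counter
from urllib.parse import urlparse

def domain_extension(url):
    """Return the last label of the URL's hostname (e.g. 'nl'), or None."""
    host = urlparse(url).hostname
    if not host or "." not in host:
        return None
    return host.rsplit(".", 1)[-1]

# Hypothetical sample URLs, for illustration only
sample_urls = [
    "https://www.example.nl/autoverzekering",
    "https://www.example.com/car-insurance",
    "https://vergelijk.example.nl/",
]

# Count how often each extension occurs in the sample
extension_counts = Counter(domain_extension(u) for u in sample_urls)
print(extension_counts.most_common())
```

Counting extensions per country in this way would give a first indication of whether local extensions dominate the top 10.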

### Import libraries

In [1]:
import pandas as pd
import gspread
from gspread_dataframe import get_as_dataframe
from oauth2client.service_account import ServiceAccountCredentials

import requests as rq
from requests import get
from bs4 import BeautifulSoup
import time
from tqdm import tqdm
from urllib.parse import urlparse

import matplotlib.pyplot as plt 
import seaborn as sns

### Get the data from Google Sheets

In [2]:
scope = ['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive']
credentials = ServiceAccountCredentials.from_json_keyfile_name('C:/Users/Anne/PycharmProjects/crawlersAndscrapers/Scrapers and Crawlers-79156bc3792f.json', scope)
# credentials = ServiceAccountCredentials.from_json_keyfile_name('C:/Users/TJAwi/OneDrive/Bureaublad/githubSyncer/scrapers-and-crawlers-3e70bf97958c.json', scope)
client = gspread.authorize(credentials)
print("Authorizing.......")

spreadsheet_key = '1n6lCJTKjX6ZDlP8WSv_6ZNgq11SwFM_Owbkfo_hmCwo'
print("Opening.......")
sheet = client.open("Zoekwoorden voor onderzoek").sheet1

Authorizing.......
Opening.......


### Clean the data

In [3]:
df = get_as_dataframe(sheet, header=[0,1]) #READ THE SHEET WITH A TWO-ROW HEADER (COUNTRY, ATTRIBUTE)
df = df[0:21] #GET ONLY ROWS POPULATED WITH DATA
df = df.filter(regex='^((?!Unnamed).)*$', axis=1) #REMOVE ALL COLUMNS WHERE THE HEADER IS UNNAMED
df = df.filter(regex='^((?!Volume).)*$', axis=1) #REMOVE ALL COLUMNS CONTAINING VAGUE SEARCH VOLUME DATA
df = df.rename(columns=lambda x: x.strip()) #REMOVE WHITESPACE FROM COLUMN NAMES
df.head(3)

Unnamed: 0_level_0,Nederland,Duitsland,Engeland,Spanje,Italie,Frankrijk,Portugal,Belgie,Denemarken,Zweden,Polen
Unnamed: 0_level_1,Keyword,Keyword,Keyword,Keyword,Keyword,Keyword,Keyword,Keyword,Keyword,Keyword,Keyword
0,autoverzekering,Autoversicherung,Car insurance,seguro coche,assicurazione auto,assurance auto,seguro automóvel,autoverzekering,bilforsikring,bilförsäkring,ubezpieczenie samochodu
1,sneakers,Sneakers,Sneakers,Sneakers,scarpe da ginnastica,sneakers,ténis,sneakers,sneakers,sneakers,sneakers
2,geld lenen,Geld leihen,Money loan,prestar dinero,prestiti,prêt,empréstimo,geld lenen,låne penge,låna pengar,pożyczka gotówkowa


In [4]:
#Get only first two columns

df = df.filter(items=[( 'Nederland', 'Keyword'),('Duitsland', 'Keyword')])
df.columns.names = ['Country', 'Atts']
df = df.head(1) #only test with one entry (row) at a time
df

Country,Nederland,Duitsland
Atts,Keyword,Keyword
0,autoverzekering,Autoversicherung


In [5]:
# testlist = ['een', 'twee', 'drie', 'vier' , 'vijf', 'zes', 'zeven', 'acht', 'negen', 'tien', 'elf', 'twaalf', 'dertien', 'viertien', 'vijftien', 'zestien', 'zeventien', 'achttien', 'negentien', 'twintig',' eenentwintig']

## Build Google Search function

Use Google's own API at: https://github.com/googleapis/google-api-python-client/blob/master/docs/README.md
from:

 - https://towardsdatascience.com/current-google-search-packages-using-python-3-7-a-simple-tutorial-3606e459e0d4
 - --> Skip straight to: https://github.com/googleapis/google-api-python-client/blob/master/docs/start.md
 - --> ~Find out how to create the Google search ID~
 - --> Difference between Google Custom Search and google.com: https://support.google.com/programmable-search/answer/70392

_"The Google API client, to my knowledge, is the only Google-owned item from the ones we have looked at so far. 
They offer a wide range of capabilities when it comes to navigating this space. 
I list them last as the process of getting an account fully set up and ready to go can be a difficult space to navigate for some. 
However, if you are developing a model that heavily relies search queries, 
I would recommend that you jump to this option above all others."_

### To do:

 1. Check how to get results close(r) to real results
 2. Then check how to change region
 3. Find a way to visualize how close the results are compared to a manual search.
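
For item 3, one simple way to quantify how close the API results are to a manual search is to compare the two ranked URL lists. The sketch below is an assumption about how that could be done, not an established metric for this project: `compare_rankings` and both sample lists are hypothetical, and URLs are compared as exact strings.

```python
def compare_rankings(api_urls, manual_urls):
    """Compare two ranked URL lists (best result first).

    Returns the Jaccard overlap of the two lists and, for every URL found
    in both, the rank difference (API rank minus manual rank).
    """
    api_rank = {url: i for i, url in enumerate(api_urls, start=1)}
    manual_rank = {url: i for i, url in enumerate(manual_urls, start=1)}

    shared = set(api_rank) & set(manual_rank)
    union = set(api_rank) | set(manual_rank)

    # Share of all distinct URLs that appear in both lists
    overlap = len(shared) / len(union) if union else 0.0
    # Negative shift: the URL ranks higher in the API results than manually
    shifts = {url: api_rank[url] - manual_rank[url] for url in shared}
    return overlap, shifts

# Hypothetical sample data, for illustration only
api = ["https://a.example/", "https://b.example/", "https://c.example/"]
manual = ["https://b.example/", "https://a.example/", "https://d.example/"]

overlap, shifts = compare_rankings(api, manual)
print(overlap, shifts)
```

The overlap gives a single number per keyword that can be plotted per country, while the rank shifts show which URLs moved.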


In [20]:
from googleapiclient.discovery import build

api_key = "AIzaSyDBWkK3QLlaJLTclvPhRQtCYjGc7AVBWyU" #could be restricted better by specifying which API the key can be used for
cse_id = "426760db88ffae779"

def google_query(query, api_key, cse_id, **kwargs):
    query_service = build("customsearch", 
                          "v1", 
                          developerKey=api_key
                          )  
    query_results = query_service.cse().list(q=query,    # Query
                                             cx=cse_id,  # CSE ID
                                             gl='nl',    # Two-letter country code ('countryNL' is the format of the cr parameter)
                                             **kwargs    
                                             ).execute()
    return query_results.get('items', [])  # 'items' is absent when there are no results

my_results_list = []
my_results = google_query("apple iphone news 2019",
                          api_key, 
                          cse_id, 
                          num = 10
                          )



{'cse_thumbnail': [{'src': 'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQfsQs1NnaMw79H0pkYK4ic2vbv2zKEujKOUcFCnAat8zujdlFKQJRc4YE', 'width': '225', 'height': '225'}], 'BreadcrumbList': [{}], 'metatags': [{'analytics-s-bucket-1': 'appleglobal,applestoreww', 'analytics-s-bucket-0': 'appleglobal,applestoreww', 'og:image': 'https://www.apple.com/newsroom/images/tile-images/Apple_iPhone-11-Pro_Most-Powerful-Advanced_091019.jpg.og.jpg?202101220908', 'og:type': 'article', 'twitter:title': 'iPhone 11 Pro and iPhone 11 Pro Max: the most powerful and advanced smartphones', 'twitter:card': 'summary_large_image', 'og:site_name': 'Apple Newsroom', 'og:title': 'iPhone 11 Pro and iPhone 11 Pro Max: the most powerful and advanced smartphones', 'ac-gn-store-key': 'SX29D2YPJFKFAFC2P', 'og:description': 'Apple today announced iPhone 11 Pro and iPhone 11 Pro Max, a new pro line for iPhone that delivers advanced performance.', 'twitter:image': 'https://www.apple.com/newsroom/images/tile-images/Ap

In [74]:
print(type(my_results))
for result in my_results:
    # Print each result URL with the 'position' value of its first metatags entry (None if absent)
    metatags = result.get('pagemap', {}).get('metatags') or [{}]
    print(result['link'], metatags[0].get('position'))

<class 'list'>
https://www.apple.com/newsroom/2019/09/iphone-11-pro-and-iphone-11-pro-max-the-most-powerful-and-advanced-smartphones/ 1
https://investor.apple.com/investor-relations/default.aspx None
https://www.apple.com/apple-events/ 1
https://support.apple.com/en-us/HT202329 1
https://developer.apple.com/news/ None
https://www.macrumors.com/roundup/iphone-11/ None
https://www.businessinsider.com/new-iphones-from-apple-2019-rumors-features-specs-2019-3 None
https://www.bbc.com/news/business-48110709 None
https://qz.com/1702874/what-to-expect-from-apples-september-2019-iphone-event/ None
https://www.bloomberg.com/news/articles/2019-09-09/apple-foxconn-broke-a-chinese-labor-law-for-iphone-production None


### To Do
 1. Find out what pagemap can do for structured data on your website
 2. Find out whether 'position' is the ranking position in Google
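
Until it is clear what the `position` meta tag means, the rank can be recorded from the order of the returned list instead. The sketch below assumes, without having verified it here, that the API returns items in ranking order. `rank_results` is a hypothetical helper, and the sample items only mimic the `link` and `pagemap` fields seen in the output above.

```python
from urllib.parse import urlparse

def rank_results(items):
    """Pair each result with its 1-based position in the returned list."""
    rows = []
    for rank, item in enumerate(items, start=1):
        # Fall back to one empty dict when pagemap or metatags is missing
        metatags = item.get("pagemap", {}).get("metatags") or [{}]
        rows.append({
            "rank": rank,
            "link": item["link"],
            "domain": urlparse(item["link"]).hostname,
            "meta_position": metatags[0].get("position"),
        })
    return rows

# Hypothetical sample items, shaped like the API results above
sample_items = [
    {"link": "https://www.apple.com/apple-events/",
     "pagemap": {"metatags": [{"position": "1"}]}},
    {"link": "https://developer.apple.com/news/"},
]

rows = rank_results(sample_items)
for row in rows:
    print(row)
```

Keeping `rank` and `meta_position` side by side makes it easy to check later whether the two ever agree.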