# Scrape top Google Results

### Overview

If you are interested in search traffic (paid or organic) then the it is important to monitor SERPs.

Here is data you can get using just requests!


### About Me

My name is Alton Alexander. I am a Data Science consultant turned entreprenuer building SaaS tools for SEO.

Find more about my free scripts or ask me any questions

Follow me for more data and tutorials

- twitter: https://twitter.com/alton_lex @alton_lex

- linkedin: https://www.linkedin.com/in/altonalexander/


### About Data Winners

Join the conversation:

- private Discord community

- Video tutorials

- Feedback and support on this and other scripts

Join now: https://datawinners.gumroad.com/l/data-analytics-for-seo

###  Motivation:

scrape a SERP result for any given search query and extract the basic HTML tags like page titles, H1s, H2s-H6s, and so on

not interested on the entire domain but a particular page/post

this would be to improve the blog article outline process

so I can see how the top ranking results for any given query is structuring their content

##  Step 1) Get the SERP page

In [101]:
# get the libraries
!pip install googlesearch-python

Defaulting to user installation because normal site-packages is not writeable

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.2.2[0m[39;49m -> [0m[32;49m22.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3 -m pip install --upgrade pip[0m


In [113]:
import requests
from googlesearch import search
from bs4 import BeautifulSoup
search("Google")

<generator object search at 0x7f2ca26228f0>

In [114]:
query = "python for seo"
results_generator = search(query, num_results=10, lang="en")

In [115]:


# loop over the generator and save to a list
results = []

for link in list(results_generator):
    print(link)
    results.append(link)

https://www.jcchouinard.com/python-for-seo/
https://www.python.org/success-stories/python-seo-link-analyzer/
https://www.searchenginejournal.com/python-machine-learning-technical-seo/430000/
https://www.danielherediamejias.com/python-scripts-seo/
https://www.rankranger.com/blog/python-for-seo
https://importsem.com/
https://practicaldatascience.co.uk/data-science/19-python-seo-projects-that-will-improve-your-site
https://github.com/sethblack/python-seo-analyzer
https://www.seoradar.com/how-to-use-python-seo/
https://rockcontent.com/blog/python-for-seo/


# Step 2) A custom function to extract info from each link

Input a link

Get out a list of sections

In [116]:

def parse_page_to_sections_list( link ):
    '''
    
    Simple function to download the raw html of a link
    parses contents into an array of dictionaries
    
    '''
    page = requests.get(link)
    soup = BeautifulSoup(page.text, "lxml")

    title = soup.find("title").text

    headers = ['h1', 'h2', 'h3', 'h4', 'h5']

    df_json = []

    for header in headers:
        sections = soup.find_all(header)

        order = 0
        for section in sections:
            tmp = {
                "url": link,
                "page_title": title,
                "header": header,
                "order": order,
                "title": section.text
            }
            df_json.append( tmp )
            order = order +1

    return df_json

In [117]:
# test our function

link = "https://www.jcchouinard.com/python-for-seo/"
parse_page_to_sections_list( link )

[{'url': 'https://www.jcchouinard.com/python-for-seo/',
  'page_title': 'Python for SEO: Complete Guide (in 8 Chapters) - JC Chouinard',
  'header': 'h1',
  'order': 0,
  'title': 'Python for SEO: Complete Guide (in 8 Chapters)'},
 {'url': 'https://www.jcchouinard.com/python-for-seo/',
  'page_title': 'Python for SEO: Complete Guide (in 8 Chapters) - JC Chouinard',
  'header': 'h2',
  'order': 0,
  'title': 'Subscribe to my Newsletter'},
 {'url': 'https://www.jcchouinard.com/python-for-seo/',
  'page_title': 'Python for SEO: Complete Guide (in 8 Chapters) - JC Chouinard',
  'header': 'h2',
  'order': 1,
  'title': 'Chapter 1: Python Basics'},
 {'url': 'https://www.jcchouinard.com/python-for-seo/',
  'page_title': 'Python for SEO: Complete Guide (in 8 Chapters) - JC Chouinard',
  'header': 'h2',
  'order': 2,
  'title': 'Chapter 2: Technical SEO Challenges'},
 {'url': 'https://www.jcchouinard.com/python-for-seo/',
  'page_title': 'Python for SEO: Complete Guide (in 8 Chapters) - JC Chou

# Step 3: Parse each page in For loop

In [118]:
df_master_list = []

for link in results:
    print(link)
    df_json = parse_page_to_sections_list(link)
    df_master_list.extend(df_json)

df_master_list

https://www.jcchouinard.com/python-for-seo/
https://www.python.org/success-stories/python-seo-link-analyzer/
https://www.searchenginejournal.com/python-machine-learning-technical-seo/430000/
https://www.danielherediamejias.com/python-scripts-seo/
https://www.rankranger.com/blog/python-for-seo
https://importsem.com/
https://practicaldatascience.co.uk/data-science/19-python-seo-projects-that-will-improve-your-site
https://github.com/sethblack/python-seo-analyzer
https://www.seoradar.com/how-to-use-python-seo/
https://rockcontent.com/blog/python-for-seo/


[{'url': 'https://www.jcchouinard.com/python-for-seo/',
  'page_title': 'Python for SEO: Complete Guide (in 8 Chapters) - JC Chouinard',
  'header': 'h1',
  'order': 0,
  'title': 'Python for SEO: Complete Guide (in 8 Chapters)'},
 {'url': 'https://www.jcchouinard.com/python-for-seo/',
  'page_title': 'Python for SEO: Complete Guide (in 8 Chapters) - JC Chouinard',
  'header': 'h2',
  'order': 0,
  'title': 'Subscribe to my Newsletter'},
 {'url': 'https://www.jcchouinard.com/python-for-seo/',
  'page_title': 'Python for SEO: Complete Guide (in 8 Chapters) - JC Chouinard',
  'header': 'h2',
  'order': 1,
  'title': 'Chapter 1: Python Basics'},
 {'url': 'https://www.jcchouinard.com/python-for-seo/',
  'page_title': 'Python for SEO: Complete Guide (in 8 Chapters) - JC Chouinard',
  'header': 'h2',
  'order': 2,
  'title': 'Chapter 2: Technical SEO Challenges'},
 {'url': 'https://www.jcchouinard.com/python-for-seo/',
  'page_title': 'Python for SEO: Complete Guide (in 8 Chapters) - JC Chou

In [119]:
len(df_master_list)

283

# Step 4: Save results to CSV File

Using pandas

In [120]:
import pandas as pd
   
# Initialise data to lists.

df = pd.DataFrame(df_master_list )
df

Unnamed: 0,url,page_title,header,order,title
0,https://www.jcchouinard.com/python-for-seo/,Python for SEO: Complete Guide (in 8 Chapters)...,h1,0,Python for SEO: Complete Guide (in 8 Chapters)
1,https://www.jcchouinard.com/python-for-seo/,Python for SEO: Complete Guide (in 8 Chapters)...,h2,0,Subscribe to my Newsletter
2,https://www.jcchouinard.com/python-for-seo/,Python for SEO: Complete Guide (in 8 Chapters)...,h2,1,Chapter 1: Python Basics
3,https://www.jcchouinard.com/python-for-seo/,Python for SEO: Complete Guide (in 8 Chapters)...,h2,2,Chapter 2: Technical SEO Challenges
4,https://www.jcchouinard.com/python-for-seo/,Python for SEO: Complete Guide (in 8 Chapters)...,h2,3,Chapter 3: Automation With Python
...,...,...,...,...,...
278,https://rockcontent.com/blog/python-for-seo/,Python for SEO: A Guide on How to Automate You...,h3,16,More in SEO
279,https://rockcontent.com/blog/python-for-seo/,Python for SEO: A Guide on How to Automate You...,h3,17,\n\nLocal SEO For Retailers: How To Reach A Re...
280,https://rockcontent.com/blog/python-for-seo/,Python for SEO: A Guide on How to Automate You...,h3,18,\n\nHow Stage helped the IRKO Group overcome i...
281,https://rockcontent.com/blog/python-for-seo/,Python for SEO: A Guide on How to Automate You...,h3,19,"\n\nDomain, subdomain or subdirectory: which i..."


In [121]:
df.to_csv("search_sections_for_"+query+".csv", index=False)