Skip to content

IPRoyal/vba-web-scraping

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 

Repository files navigation

VBA Web Scraping Starter Kit

GitHub Banner

Overview

Web scraping is the process of extracting HTML content from websites and transforming it into useful data.
While Python is the most common language for this task, Visual Basic for Applications (VBA) provides a capable, built-in alternative within Microsoft Office.

VBA is the internal scripting language for Excel, Word, and other Office products.
By combining VBA with Excel’s data manipulation features, you can scrape and analyze information directly in Excel without relying on external libraries or tools.

This repository demonstrates two practical approaches to web scraping using VBA:

  • Automated browser path (Internet Explorer COM) – suitable for complex or dynamic pages that need to be rendered.
  • Direct HTTP path (ServerXMLHTTP) – faster and simpler for static web pages that return plain HTML.

Internet Explorer automation is deprecated on modern Windows systems. Use ServerXMLHTTP where possible, or integrate SeleniumBasic for modern browsers.


Enabling the VBA Developer Tab

VBA is disabled by default in Microsoft Excel. To enable it:

  1. Open Excel and click File

    Excel File Menu
  2. Click Options at the bottom of the sidebar.

    Excel Options
  3. Choose Customize Ribbon.

    Customize Ribbon
  4. On the right-hand side, enable the checkbox for Developer and click OK.

    Enable Developer Tab
  5. Confirm that the Developer tab is now visible at the top of Excel.

    Developer Tab Visible

You can now access the VBA Editor through Developer → Visual Basic.


Setting Up VBA References

Inside the VBA editor, go to Tools → References and enable the following libraries:

  • Microsoft HTML Object Library
  • Microsoft Internet Controls

These libraries are essential for interacting with HTML and automating Internet Explorer.


Basic Concepts of VBA Web Scraping

The two enabled libraries provide the foundation for VBA web scraping:

  • Microsoft HTML Object Library – allows manipulation of HTML elements such as tags, IDs, and classes.
  • Microsoft Internet Controls – provides access to the Internet Explorer engine to open and navigate web pages.

Together, they allow Excel to send requests, load pages, and extract information from HTML content.
Alternatively, you can use ServerXMLHTTP to fetch data directly from websites without launching a browser.


Writing Your First VBA Web Scraper

  1. Create a new worksheet in Excel and name it Test.
  2. Open the VBA editor (Developer → Visual Basic).
  3. Right-click your workbook, select Insert → Module, and paste the following code.

Example 1 – Using Internet Explorer Automation

Sub ScrapeWebsiteUsingIE()
    ' Create Internet Explorer instance
    Dim ie As Object
    Set ie = CreateObject("InternetExplorer.Application")

    ' Hide browser window
    ie.Visible = False
    ie.navigate "http://iproyal.com"

    ' Wait for page to load completely
    Do While ie.Busy Or ie.readyState <> 4
        DoEvents
    Loop

    ' Access the HTML document
    Dim html As Object
    Set html = ie.document

    ' Retrieve specific HTML elements
    Dim element As Object
    Set element = html.getElementsByClassName("mt-16 sm:mt-24 text-center lg:text-left tp-headline-m lg:tp-headline-xl")

    ' Reference the output worksheet
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Sheets("Test")

    Dim startRow As Integer
    startRow = 1

    ' Loop through elements and extract text
    If element.Length > 0 Then
        Dim i As Integer
        For i = 0 To element.Length - 1
            On Error Resume Next
            Dim elementText As String
            elementText = element.Item(i).innerText
            If Err.Number <> 0 Then
                Err.Clear
                elementText = element.Item(i).outerText
            End If
            On Error GoTo 0

            If elementText <> "" Then
                ws.Cells(startRow, 1).Value = elementText
            Else
                ws.Cells(startRow, 1).Value = "Element does not support innerText or outerText."
            End If

            startRow = startRow + 1
        Next i
    Else
        ws.Cells(startRow, 1).Value = "No elements found with the specified class name."
    End If

    ' Close Internet Explorer
    ie.Quit
    Set ie = Nothing
End Sub

Example 2 – Using XMLHTTP for Direct Requests

Sub ScrapeWebsiteUsingServerXMLHTTP()
    ' Create HTTP object
    Dim xml As Object
    Set xml = CreateObject("MSXML2.ServerXMLHTTP.6.0")

    ' Send GET request
    xml.Open "GET", "http://iproyal.com", False
    xml.send

    ' Wait for the response
    Do While xml.readyState <> 4
        DoEvents
    Loop

    ' Verify the request
    If xml.Status = 200 Then
        ' Parse HTML
        Dim html As Object
        Set html = CreateObject("HTMLFile")
        html.body.innerHTML = xml.responseText

        ' Get elements
        Dim elements As Object
        Set elements = html.getElementsByClassName("mt-16 sm:mt-24 text-center lg:text-left tp-headline-m lg:tp-headline-xl")

        Dim startRow As Integer
        startRow = 1

        ' Write results
        If elements.Length > 0 Then
            Dim i As Integer
            For i = 0 To elements.Length - 1
                Dim elementText As String
                On Error Resume Next
                elementText = elements.Item(i).innerText
                If Err.Number <> 0 Then
                    Err.Clear
                    elementText = elements.Item(i).outerText
                End If
                On Error GoTo 0

                Cells(startRow, 1).Value = elementText
                startRow = startRow + 1
            Next i
        Else
            Cells(startRow, 1).Value = "No elements found with the specified class name."
        End If
    Else
        MsgBox "Failed to retrieve the web page. Status: " & xml.Status
    End If

    ' Clean up
    Set xml = Nothing
End Sub

Comparison

Method Advantages Disadvantages
Internet Explorer Handles dynamic content and JavaScript Slower, deprecated
ServerXMLHTTP Fast and lightweight Can be blocked by websites or fail on dynamic pages

About

VBA Web Scraping: Extracting Data with Excel VBA

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages