**Reference**
- This example is from the following link
- https://www.pluralsight.com/guides/web-scraping-with-beautiful-soup
- Added some notes and made a few modifications

In [1]:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#Created on Thu Sep  6 11:17:11 2018

#@author: kerry

#import libraries

from bs4 import BeautifulSoup
import requests
import csv

- Fetch the page with requests, then create a BeautifulSoup object and specify the parser.

In [2]:
URL = 'https://en.wikipedia.org/wiki/List_of_game_engines'
content = requests.get(URL)
soup = BeautifulSoup(content.text, 'html.parser')
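Because the live Wikipedia page changes over time, it can help to experiment offline: the same constructor works on any HTML string. The minimal sketch below uses a made-up snippet (`SAMPLE_HTML` is hypothetical and is not the real page markup) and the built-in `html.parser`.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (not the real Wikipedia markup)
SAMPLE_HTML = """
<html><head><title>List of game engines - Wikipedia</title></head>
<body>
<table class="wikitable sortable">
<tr><th>Name</th><th>License</th></tr>
<tr><td>Engine A</td><td>MIT</td></tr>
</table>
</body></html>
"""

# 'html.parser' ships with Python; 'lxml' or 'html5lib' can be passed
# instead if those packages are installed
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(soup.title.get_text())  # List of game engines - Wikipedia
```

Swapping the parser name only changes how malformed markup is repaired; the search methods used below stay the same.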

- BeautifulSoup can extract a single occurrence or all occurrences of a specific tag, and it can also filter on attributes. The main search options are:

- find: takes a tag name as a string and returns the first matching tag in the parsed page (or None if nothing matches):

In [3]:
row = soup.find('tr') # Extract and return first occurrence of tr
print(row)            # Print row with HTML formatting
print("=========Text Result==========")
print(row.get_text()) # Print row as text

<tr>
<th style="width: 12em">Name
</th>
<th>Primary <a href="/wiki/Programming_language" title="Programming language">programming language</a>
</th>
<th><a href="/wiki/Scripting_language" title="Scripting language">Scripting</a>
</th>
<th><a class="mw-redirect" href="/wiki/Cross-platform" title="Cross-platform">Cross-platform</a>
</th>
<th>2D/3D oriented
</th>
<th>Target <a href="/wiki/Computing_platform" title="Computing platform">platform</a>
</th>
<th>Notable games
</th>
<th>License
</th>
<th class="unsortable">Notes and references
</th></tr>

Name

Primary programming language

Scripting

Cross-platform

2D/3D oriented

Target platform

Notable games

License

Notes and references
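The text output above keeps the newlines from the markup. A small offline sketch, using a made-up table rather than the Wikipedia page, shows two details worth knowing: `get_text()` accepts a separator and a `strip` flag, and `find` returns `None` when nothing matches, so checking the result before calling methods on it avoids an `AttributeError`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample table standing in for the real page
html = """
<table>
<tr><th>Name</th><th>License</th></tr>
<tr><td>Engine A</td><td>MIT</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

first_row = soup.find("tr")  # only the first <tr> is returned
# The separator joins the text pieces; strip=True trims and skips blank strings
row_text = first_row.get_text(" | ", strip=True)
print(row_text)  # Name | License

missing = soup.find("caption")  # the sample has no <caption>
print(missing)  # None
```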



In [4]:
# find_all: extract every occurrence of a particular tag from the parsed page
rows = soup.find_all('tr')
for row in rows:          # Print all occurrences
    print(row.get_text())


Name

Primary programming language

Scripting

Cross-platform

2D/3D oriented

Target platform

Notable games

License

Notes and references


4A Engine

C++



Yes

3D

Windows, OS X, Linux, PlayStation 3, PlayStation 4, Xbox 360, Xbox One

Metro 2033, Metro: Last Light, Metro Exodus

Proprietary




A-Frame (VR)

HTML, JavaScript

JavaScript

Yes

3D

Cross-platform



MIT

Open source Entity component system WebVR framework


Adventure Game Interpreter



C style

Yes

2D

DOS, Apple SOS, ProDOS, Classic Mac OS, Atari TOS

List

Proprietary




Adventure Game Studio

C++

AGSScript

Yes

2D

Windows, Linux

Chzo Mythos, Blackwell

Artistic 2.0

Mostly used to develop third-person pre-rendered graphic adventure games, one of the most popular for developing amateur adventure games


Alamo





Yes

3D

Windows, OS X, Xbox 360

Star Wars: Empire at War, Star Wars: Empire at War: Forces of Corruption, Universe at War: Earth Assault

Proprietary




Aleph One

C++

Lua, Marathon markup 

- find_all returns a ResultSet object (a subclass of list), which supports index-based access to the matches and can be iterated with a for loop.

- Pass a list: find_all accepts a list of tag names, e.g. soup.find_all(['th', 'td']), as well as keyword arguments such as id (to match tags by their id) and href (to match tags by their href attribute). Passing True matches any tag that has the attribute:

In [5]:
content = requests.get(URL)
soup = BeautifulSoup(content.text, 'html.parser')
# Tags that have both an id attribute and an href attribute (any value)
tags = soup.find_all(id=True, href=True)
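The cell above stores its matches without printing them. A minimal offline sketch on made-up markup (not the Wikipedia page) shows what a list of tag names and `id=True, href=True` each match, and that the ResultSet can be indexed like a list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one anchor has both id and href, the other only href
html = """
<p id="intro">
  <a id="top" href="#top">Top</a>
  <a href="/wiki/X">X</a>
</p>
<table><tr><th>Header</th><td>Data</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# A list of names matches any of them, in document order
cells = soup.find_all(["th", "td"])
print([c.name for c in cells])  # ['th', 'td']
print(cells[0].get_text())      # index-based access: Header

# True means "attribute present with any value"; both conditions must hold,
# so <p id="intro"> (no href) and the second <a> (no id) are excluded
tags = soup.find_all(id=True, href=True)
print([t.get_text() for t in tags])  # ['Top']
```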

- Pass a function: a function that takes a tag and returns True or False can hold custom matching logic, and find_all keeps every tag for which it returns True:

In [6]:
def isAnchorTagWithLargeText(tag):
    """Return True for <a> tags whose text is longer than 50 characters."""
    return tag.name == 'a' and len(tag.get_text()) > 50

In [7]:
content = requests.get(URL)
soup = BeautifulSoup(content.text, 'html.parser')
tags = soup.find_all(isAnchorTagWithLargeText, limit = 10)
for tag in tags:
    print(tag.get_text())

advanced: lighting, shadows, interactive GUI surfaces
Star Wars: Knights of the Old Republic II: The Sith Lords
"A Gentle Introduction to Frogatto Formula Language"
"Artifact will use Source 2, bringing the engine to iOS and Android"
https://en.wikipedia.org/w/index.php?title=List_of_game_engines&oldid=955516745
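The same function-filter idea can be tried offline. In this sketch the markup is made up, and `is_long_anchor` with its `min_len` parameter is a hypothetical generalisation of `isAnchorTagWithLargeText`, with a lower threshold so the small sample has a match.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a short anchor, a long anchor, and a long non-anchor tag
html = """
<a href="/a">short</a>
<a href="/b">this anchor text is clearly longer than twenty characters</a>
<span>this span text is also longer than twenty characters</span>
"""
soup = BeautifulSoup(html, "html.parser")

def is_long_anchor(tag, min_len=20):
    """Return True for <a> tags whose text is longer than min_len characters."""
    return tag.name == "a" and len(tag.get_text()) > min_len

# find_all calls the function with each tag and keeps those returning True;
# the <span> fails the tag.name check even though its text is long
matches = soup.find_all(is_long_anchor)
print([m["href"] for m in matches])  # ['/b']
```

Keeping the threshold as a default argument means the function still takes a single tag when find_all calls it.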


- Attribute-driven search: the result of find_all can also contain rows from other tables and other unwanted values, which are usually not desired. Attributes like id, class, or value can be used to further refine the search.
- Let's print the tables found on the page and inspect the first one (the content table) to identify its attributes:

In [30]:
tables = soup.find_all('table')  # every table on the page, as a list
print(tables)

[<table class="wikitable sortable" style="text-align: center; font-size: 85%; width: auto; table-layout: fixed;">
<tbody><tr>
<th style="width: 12em">Name
</th>
<th>Primary <a href="/wiki/Programming_language" title="Programming language">programming language</a>
</th>
<th><a href="/wiki/Scripting_language" title="Scripting language">Scripting</a>
</th>
<th><a class="mw-redirect" href="/wiki/Cross-platform" title="Cross-platform">Cross-platform</a>
</th>
<th>2D/3D oriented
</th>
<th>Target <a href="/wiki/Computing_platform" title="Computing platform">platform</a>
</th>
<th>Notable games
</th>
<th>License
</th>
<th class="unsortable">Notes and references
</th></tr>
<tr>
<th><a href="/wiki/4A_Engine" title="4A Engine">4A Engine</a>
</th>
<td><a href="/wiki/C%2B%2B" title="C++">C++</a>
</td>
<td>
</td>
<td class="table-yes" style="background:#9F9;vertical-align:middle;text-align:center;">Yes
</td>
<td>3D
</td>
<td><a href="/wiki/Microsoft_Windows" title="Microsoft Windows">Windows</a>, <a

- The content table has a unique CSS class attribute, wikitable sortable, which can be used to find the main content table:

In [31]:
contentTable = soup.find('table', {"class": "wikitable sortable"})  # Use a dictionary to pass attribute: value pairs
rows = contentTable.find_all('tr')
for row in rows:
    print(row.get_text())


Name

Primary programming language

Scripting

Cross-platform

2D/3D oriented

Target platform

Notable games

License

Notes and references


4A Engine

C++



Yes

3D

Windows, OS X, Linux, PlayStation 3, PlayStation 4, Xbox 360, Xbox One

Metro 2033, Metro: Last Light, Metro Exodus

Proprietary




A-Frame (VR)

HTML, JavaScript

JavaScript

Yes

3D

Cross-platform



MIT

Open source Entity component system WebVR framework


Adventure Game Interpreter



C style

Yes

2D

DOS, Apple SOS, ProDOS, Classic Mac OS, Atari TOS

List

Proprietary




Adventure Game Studio

C++

AGSScript

Yes

2D

Windows, Linux

Chzo Mythos, Blackwell

Artistic 2.0

Mostly used to develop third-person pre-rendered graphic adventure games, one of the most popular for developing amateur adventure games


Alamo





Yes

3D

Windows, OS X, Xbox 360

Star Wars: Empire at War, Star Wars: Empire at War: Forces of Corruption, Universe at War: Earth Assault

Proprietary




Aleph One

C++

Lua, Marathon markup 
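To see why restricting the search to the content table matters, here is a minimal offline sketch. The page is made up: a content table plus a hypothetical navigation table, so a page-wide `find_all('tr')` picks up rows from both.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the content table plus an unrelated navigation table
html = """
<table class="wikitable sortable">
<tr><th>Name</th></tr>
<tr><td>Engine A</td></tr>
</table>
<table class="navbox">
<tr><td>Related links</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

all_rows = soup.find_all("tr")  # rows from every table on the page
content_table = soup.find("table", {"class": "wikitable sortable"})
content_rows = content_table.find_all("tr")  # rows from the content table only

print(len(all_rows), len(content_rows))  # 3 2
```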

- Here find is more suitable than find_all, since only one table has the wikitable sortable class.
- Alternatively, the class_ keyword argument (not available in older versions of Beautiful Soup) can be used: soup.find_all('table', class_="wikitable sortable").
- Nested tags: nested tags can be found with CSS selectors using the select method:

In [33]:
print(soup.select("html head title")[0].get_text())  # List of game engines - Wikipedia

List of game engines - Wikipedia
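The `class_` keyword and CSS selectors can be compared on a small made-up document (not the real page). This sketch assumes a Beautiful Soup 4 release recent enough to provide `select` and `select_one`, which rely on the soupsieve package installed alongside it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the structure used above
html = """
<html><head><title>List of game engines - Wikipedia</title></head>
<body>
<table class="wikitable sortable"><tr><td>Engine A</td></tr></table>
<table class="navbox"><tr><td>Related links</td></tr></table>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ sidesteps Python's reserved word 'class'; this string must match
# the attribute value exactly
by_keyword = soup.find_all("table", class_="wikitable sortable")
# A single class name matches any tag whose class list contains it
by_single_class = soup.find_all("table", class_="wikitable")

# CSS selectors: a space means "descendant of", .name means "has this class"
cells = soup.select("table.wikitable td")
title = soup.select_one("html head title")  # first match, or None

print(len(by_keyword), len(by_single_class))  # 1 1
print([c.get_text() for c in cells])          # ['Engine A']
print(title.get_text())                        # List of game engines - Wikipedia
```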


- Beautiful Soup also lets you access a tag name as an attribute (e.g. soup.title) to get the first occurrence of that tag:

In [35]:
content = requests.get(URL)
soup = BeautifulSoup(content.text, 'html.parser')
print(soup.head, soup.title)
print(soup.table.tr)  # Print first row of the first table 

<head>
<meta charset="utf-8"/>
<title>List of game engines - Wikipedia</title>
<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"Xqyi2gpAMNYAA7SsX9cAAAAV","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_game_engines","wgTitle":"List of game engines","wgCurRevisionId":954338549,"wgRevisionId":954338549,"wgArticleId":2323909,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles with short description","Use mdy dates from June 2018","All articles with unsourced statements","Articles with unsourced statements from July 2015","Video game engines","Technology-related lists"],"wgPageContentLan
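Since the head of the live page is long, a small made-up document (not the real page) makes the attribute shortcut easier to inspect. The sketch shows that `soup.<name>` behaves like `find("<name>")`: first match only, and `None` when the tag is absent.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, much smaller than the real page head
html = """
<html><head><title>List of game engines - Wikipedia</title></head>
<body>
<table><tr><th>Name</th></tr><tr><td>Engine A</td></tr></table>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# soup.<name> is shorthand for soup.find("<name>"), so only the first match
print(soup.title.string)  # List of game engines - Wikipedia
print(soup.table.tr)      # <tr><th>Name</th></tr>
print(soup.table.tr is soup.table.find("tr"))  # True: same Tag object
print(soup.caption)       # None, since there is no <caption>
```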