# 2. BeautifulSoup (HTML CSS)

### HTML div Tag

    
**Definition and Usage**
    
The div tag defines a division or a section in an HTML document.

The div element is often used as a container for other HTML elements to style them with CSS or to perform certain tasks with JavaScript.


General form of a tag that carries an `id` attribute: `<tag id=""></tag>`



### HTML id Attribute

**Definition and Usage**

The id attribute specifies a unique identifier for an HTML element. Its value must be unique within the document.

It is used by CSS and JavaScript to style or manipulate that one specific element.

In CSS, an element is selected by its id with the `#` symbol followed by the id value, for example `#bodyContent`.
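
A minimal sketch of how an id lookup can be done with BeautifulSoup, assuming a small hand-written HTML snippet (the ids and text below are made up for illustration). `find(id=...)` and the CSS selector `#bodyContent` should both locate the same element.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration
sample_html = """
<div id="bodyContent"><p>Main text</p></div>
<div id="footer"><p>Footer text</p></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Look the element up by its id attribute
by_keyword = soup.find(id="bodyContent")

# The same lookup written as a CSS selector: '#' followed by the id
by_selector = soup.select_one("#bodyContent")

print(by_keyword.get_text())          # text inside the matched div
print(by_keyword == by_selector)      # both lookups find the same element
```

Because an id is meant to be unique, `find` (which returns only the first match) is usually enough here.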

### HTML Class Attribute

**Definition and Usage**

The class attribute specifies one or more class names for an HTML element. Unlike an id, the same class name can be shared by many elements.

The class attribute can be used on any HTML element.

The class name can be used by CSS and JavaScript to perform certain tasks for elements with the specified class name.
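
A short sketch of selecting elements by class with BeautifulSoup, again on a made-up HTML snippet (the class names `note` and `important` are assumptions for illustration only). It shows the `class_` keyword and the equivalent CSS selector form.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration
sample_html = """
<p class="note">First note</p>
<p class="note important">Second note</p>
<p>No class here</p>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# 'class' is a reserved word in Python, so BeautifulSoup uses the keyword class_
notes = soup.find_all("p", class_="note")
note_texts = [p.get_text() for p in notes]

# CSS selector form: '.' followed by the class name
important = soup.select("p.important")

# An element may carry several class names; bs4 returns them as a list
classes_of_second = notes[1]["class"]

print(note_texts)
print(classes_of_second)
```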


### How does a web page link to another?

By using the `<a>` (anchor) element, whose `href` attribute holds the address of the target page.

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a
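
As a small illustration, the sketch below reads the `href` and link text from two made-up anchors (the snippet is hypothetical, not taken from a real page). It also shows that `.get()` returns `None` when an anchor has no `href`, which is why it can be worth checking for the attribute before using it.

```python
from bs4 import BeautifulSoup

# Hypothetical anchors used only for illustration
sample_html = """
<a href="/wiki/Kuala_Lumpur" title="Kuala Lumpur">Kuala Lumpur</a>
<a name="top">Anchor without href</a>
"""

soup = BeautifulSoup(sample_html, "html.parser")
first, second = soup.find_all("a")

target = first["href"]           # address of the linked page
label = first.get_text()         # visible link text
missing = second.get("href")     # .get() returns None when the attribute is absent

print(target, label, missing)
```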


In [2]:
import requests
from bs4 import BeautifulSoup

html = requests.get('http://en.wikipedia.org/wiki/Malaysia')

In [4]:
bs = BeautifulSoup(html.content,'html.parser')
print(bs)

<!DOCTYPE html>

<html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1 vector-feature-limited-width-content-enabled vector-feature-zebra-design-enabled vector-feature-custom-font-size-clientpref-0 vector-feature-client-preferences-disabled vector-feature-client-prefs-pinned-disabled vector-toc-available" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>Malaysia - Wikipedia</title>
<script>(function(){var className="client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpre

In [5]:
for link in bs.find_all('a'):
    print(link.attrs)

{'class': ['mw-jump-link'], 'href': '#bodyContent'}
{'href': '/wiki/Main_Page', 'title': 'Visit the main page [z]', 'accesskey': ['z']}
{'href': '/wiki/Wikipedia:Contents', 'title': 'Guides to browsing Wikipedia'}
{'href': '/wiki/Portal:Current_events', 'title': 'Articles related to current events'}
{'href': '/wiki/Special:Random', 'title': 'Visit a randomly selected article [x]', 'accesskey': ['x']}
{'href': '/wiki/Wikipedia:About', 'title': 'Learn about Wikipedia and how it works'}
{'href': '//en.wikipedia.org/wiki/Wikipedia:Contact_us', 'title': 'How to contact Wikipedia'}
{'href': 'https://donate.wikimedia.org/wiki/Special:FundraiserRedirector?utm_source=donate&utm_medium=sidebar&utm_campaign=C13_en.wikipedia.org&uselang=en', 'title': 'Support us by donating to the Wikimedia Foundation'}
{'href': '/wiki/Help:Contents', 'title': 'Guidance on how to use and edit Wikipedia'}
{'href': '/wiki/Help:Introduction', 'title': 'Learn how to edit Wikipedia'}
{'href': '/wiki/Wikipedia:Community

In [None]:
for link in bs.find_all('a'):
    if 'href' in link.attrs:
        print(link.attrs['href'])

In [None]:
# Get all hyperlinks
import requests
from bs4 import BeautifulSoup

html = requests.get('http://en.wikipedia.org/wiki/Malaysia')
bs = BeautifulSoup(html.content, 'html.parser')
for link in bs.find_all('a'):
    if 'href' in link.attrs:
        print(link.attrs['href'])
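
The `href` values seen in the output above are a mix of fragments (`#bodyContent`), site-relative paths (`/wiki/Main_Page`) and protocol-relative URLs (`//en.wikipedia.org/...`). One possible way to turn them into full URLs is `urllib.parse.urljoin` from the standard library; this is a sketch that assumes the page URL used in this notebook as the base.

```python
from urllib.parse import urljoin

# Page on which the links were found (the one requested in this notebook)
base_url = "http://en.wikipedia.org/wiki/Malaysia"

# A few href values copied from the output above
hrefs = [
    "#bodyContent",
    "/wiki/Main_Page",
    "//en.wikipedia.org/wiki/Wikipedia:Contact_us",
]

# Resolve each href against the page it came from
absolute = [urljoin(base_url, h) for h in hrefs]

for url in absolute:
    print(url)
```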

In [None]:
import re

for elem in bs.find('div', {'id':'bodyContent'}).find_all('a',href=re.compile('^(/wiki/)((?!:).)*$')):
    if 'href' in elem.attrs:
        print(elem.attrs['href'])

In [None]:
# Retrieve only article links by using the regular expression ^(/wiki/)((?!:).)*$
import requests
import re
from bs4 import BeautifulSoup

html = requests.get('http://en.wikipedia.org/wiki/Malaysia')
bs = BeautifulSoup(html.content, 'html.parser')
for link in bs.find('div', {'id':'bodyContent'}).find_all('a', href=re.compile('^(/wiki/)((?!:).)*$')):
    if 'href' in link.attrs:
        print(link.attrs['href'])
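
To see what the pattern does, the sketch below applies it to a handful of sample `href` values (the list is illustrative). `^(/wiki/)` requires the link to start with `/wiki/`, `((?!:).)*` allows any further characters as long as none of them is a colon, and `$` anchors the end, so namespaced pages such as `Help:` or `Portal:` are filtered out.

```python
import re

# Same pattern as in the cell above
article_link = re.compile('^(/wiki/)((?!:).)*$')

candidates = [
    "/wiki/Kuala_Lumpur",                            # ordinary article
    "/wiki/Portal:Current_events",                   # namespaced page, contains ':'
    "/wiki/Help:Contents",                           # namespaced page, contains ':'
    "#bodyContent",                                  # fragment on the same page
    "//en.wikipedia.org/wiki/Wikipedia:Contact_us",  # does not start with /wiki/
]

# Keep only the hrefs that the pattern accepts
matches = [href for href in candidates if article_link.search(href)]
print(matches)
```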

### Saving the results to a CSV file

In [None]:
import csv 
import requests 

from bs4 import BeautifulSoup

html = requests.get("http://en.wikipedia.org/wiki/Comparison_of_text_editors")
bsObj = BeautifulSoup(html.content, 'html.parser')

# The main comparison table is currently the first table on the page
# (find_all is the current name; findAll is a deprecated alias)
table = bsObj.find_all("table", {"class": "wikitable"})[0]
rows = table.find_all("tr")

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open("editors.csv", 'w', newline='', encoding='utf8') as csvFile:
    writer = csv.writer(csvFile)
    for row in rows:
        csvRow = []
        # Header cells are <th>, data cells are <td>
        for cell in row.find_all(['td', 'th']):
            csvRow.append(cell.get_text())
        writer.writerow(csvRow)
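
Table cells scraped from a page often contain commas or line breaks. The sketch below, which uses made-up rows and an in-memory buffer instead of a file, suggests that `csv.writer` quotes such cells so they survive a round trip through `csv.reader`.

```python
import csv
import io

# Hypothetical rows; the second and third contain a comma and a newline
rows_out = [
    ["Name", "License"],
    ["Editor A", "GPL, MIT"],
    ["Editor B", "Proprietary\n"],
]

# newline='' keeps the buffer from translating line endings
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
for row in rows_out:
    writer.writerow(row)

text = buffer.getvalue()

# Read the text back and compare with what was written
rows_back = list(csv.reader(io.StringIO(text, newline="")))
print(rows_back == rows_out)
```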

In [None]:
# Print the column headers from the first table row
for cell in rows[0].find_all('th'):
    print(cell.get_text())

In [None]:
import csv

def load_csv(filename, delim=','):
    data = []
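    # Match the encoding used when writing; newline='' is recommended for csv files
    with open(filename, 'r', newline='', encoding='utf8') as f: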
        reader = csv.reader(f, delimiter=delim)
        for row in reader:
            data.append(row)
    return data


In [None]:
data = load_csv('editors.csv')
print(data)
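
If the first row holds the column headers, `csv.DictReader` is one alternative to a list of lists: each row becomes a mapping from header to value. The sketch below uses a small made-up CSV string in place of `editors.csv`, so the column names and values are assumptions.

```python
import csv
import io

# Hypothetical CSV content standing in for editors.csv
csv_text = "Name,License\nEditor A,GPL\nEditor B,MIT\n"

# DictReader takes the first row as the field names
reader = csv.DictReader(io.StringIO(csv_text))
records = list(reader)

# Build a lookup from editor name to license
licenses = {row["Name"]: row["License"] for row in records}
print(licenses)
```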