# 2. BeautifulSoup (HTML CSS)

### HTML and CSS

Let's start by exploring how a web page's HTML document works with CSS (Cascading Style Sheets):

https://codepen.io/buzztracer/pen/yLYJBOd



### HTML div Tag

    
**Definition and Usage**
    
The div tag defines a division or a section in an HTML document.

The div element is often used as a container for other HTML elements to style them with CSS or to perform certain tasks with JavaScript.


General form of a tag with an id attribute: `<tag id=""></tag>`
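To make the container idea concrete, here is a minimal sketch that parses a small, made-up HTML fragment and lists the elements grouped inside a div. The fragment, the `intro` id, and the variable names are illustrative assumptions, not taken from the CodePen example.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment: one div grouping a heading and a paragraph.
html_doc = """
<div id="intro">
  <h2>Welcome</h2>
  <p>First paragraph.</p>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find() returns the first matching tag, which here is the only div.
container = soup.find("div")

# Direct child tags of the div (whitespace-only text nodes are skipped).
child_names = [child.name for child in container.find_all(recursive=False)]
print(child_names)
```

Because the div wraps both elements, a scraper can locate the div once and then work only with what is inside it.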



### HTML id Attribute

**Definition and Usage**

The id attribute specifies a unique identifier for an HTML element; its value must be unique within the document.

It is used by CSS and JavaScript to perform a certain task for that one element.

In CSS, an element is selected by its id with the # symbol followed by the id value, for example #header.
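The same id lookup is available when scraping. The sketch below uses a made-up fragment (the `header` and `main` ids are assumptions) and shows two equivalent ways to fetch one element by id in BeautifulSoup: the `id` keyword of `find()` and the CSS selector form with `#`.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with two uniquely identified divs.
html_doc = """
<div id="header">Site title</div>
<div id="main">Page body</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Look up the element by its id attribute.
by_attr = soup.find(id="main")

# Same lookup written as a CSS selector: '#' followed by the id.
by_css = soup.select_one("#main")

main_text = by_attr.get_text()
print(main_text)
```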

### HTML Class Attribute

**Definition and Usage**

The class attribute specifies one or more class names for an HTML element.

The class attribute can be used on any HTML element.

The class name can be used by CSS and JavaScript to perform certain tasks for elements with the specified class name.
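Unlike an id, a class name can be shared by many elements, and one element can carry several class names. A short sketch with a made-up fragment (the class names `note`, `warning`, and `summary` are assumptions) shows how BeautifulSoup selects by class:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: two paragraphs share the class "note".
html_doc = """
<p class="note">Plain note</p>
<p class="note warning">Important note</p>
<p class="summary">Summary</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# class_ (with a trailing underscore, since "class" is a Python keyword)
# matches any element whose class list contains "note".
notes = soup.find_all("p", class_="note")
note_texts = [p.get_text() for p in notes]

# The CSS selector form uses '.' followed by the class name.
warning_classes = soup.select_one("p.warning")["class"]

print(note_texts)
print(warning_classes)
```

BeautifulSoup returns the class attribute as a list because it can hold more than one name.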


### How does a web page link to another?

By using the `<a>` (anchor) element:

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a
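The destination of a link lives in the anchor's `href` attribute, which may be an absolute URL or a path relative to the current page. As a small offline sketch, the made-up fragment below is parsed and each `href` is resolved against an assumed base URL with `urllib.parse.urljoin`; the fragment and the choice of base URL are illustrative assumptions.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed address of the page the fragment came from.
base_url = "https://en.wikipedia.org/wiki/Malaysia"

# Hypothetical fragment: a relative link, an absolute link, and an
# anchor without any href.
html_doc = """
<p>
  <a href="/wiki/Kuala_Lumpur">Kuala Lumpur</a>
  <a href="https://viaf.org/viaf/145332698">VIAF</a>
  <a name="top">No destination</a>
</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# href=True keeps only anchors that actually have an href attribute.
links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]

for url in links:
    print(url)
```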


In [4]:
# Get all hyperlinks on a page
import requests
from bs4 import BeautifulSoup

# Download the page and parse it with Python's built-in HTML parser
html = requests.get('http://en.wikipedia.org/wiki/Malaysia')
bs = BeautifulSoup(html.content, 'html.parser')

# Print the href of every <a> tag that has one
for link in bs.find_all('a'):
    if 'href' in link.attrs:
        print(link.attrs['href'])

Output (truncated):
https://id.loc.gov/authorities/names/n79022246
/wiki/MBAREA_(identifier)
https://musicbrainz.org/area/305d19c7-c040-349c-8d5f-6ac75d2d2a09
/wiki/NARA_(identifier)
https://catalog.archives.gov/id/10035733
/wiki/NDL_(identifier)
https://id.ndl.go.jp/auth/ndlna/00567491
/wiki/NKC_(identifier)
https://aleph.nkp.cz/F/?func=find-c&local_base=aut&ccl_term=ica=ge129933&CON_LNG=ENG
/wiki/NLI_(identifier)
http://uli.nli.org.il/F/?func=direct&doc_number=000088105&local_base=nlx10
/wiki/SELIBR_(identifier)
https://libris.kb.se/auth/153273
/wiki/TDV%C4%B0A_(identifier)
https://islamansiklopedisi.org.tr/malezya
/wiki/VIAF_(identifier)
https://viaf.org/viaf/145332698
/wiki/WorldCat_Identities_(identifier)
https://www.worldcat.org/identities/lccn-n79022246
https://en.wikipedia.org/w/index.php?title=Malaysia&oldid=1014283200
/wiki/Help:Category
/wiki/Category:Malaysia
/wiki/Category:Commonwealth_monarchies
/wiki/Category:Developing_8_Countries_member_states
/wiki/Category:Federal_monarchies
/wiki/Ca

In [6]:
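# Retrieve only links to articles by matching href against the
# regular expression ^(/wiki/)((?!:).)*$
import re

import requests
from bs4 import BeautifulSoup

html = requests.get('http://en.wikipedia.org/wiki/Malaysia')
bs = BeautifulSoup(html.content, 'html.parser')

# Article links start with /wiki/ and contain no colon
# (a colon marks pages such as Category: or Help:)
article_link = re.compile(r'^(/wiki/)((?!:).)*$')

# Search only inside the main content div, skipping navigation and footer links
for link in bs.find('div', {'id': 'bodyContent'}).find_all('a', href=article_link):
    if 'href' in link.attrs:
        print(link.attrs['href'])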

Output (truncated):
/wiki/Thomas_Cup
/wiki/Squash_Racquets_Association_Of_Malaysia
/wiki/Malaysia_men%27s_national_field_hockey_team
/wiki/FIH_World_Rankings
/wiki/Hockey_World_Cup
/wiki/Merdeka_Stadium
/wiki/Formula_One
/wiki/Sepang_International_Circuit
/wiki/Malaysian_Grand_Prix
/wiki/Silat_Melayu
/wiki/Ethnic_Malays
/wiki/1956_Summer_Olympics
/wiki/Olympic_Council_of_Malaysia
/wiki/Malaysia_at_the_Olympics
/wiki/1972_Munich_Olympic_Games
/wiki/Commonwealth_Games
/wiki/List_of_Malaysia-related_topics
/wiki/Outline_of_Malaysia
/wiki/National_Language_Act_1963/67
/wiki/National_Language_Act_1963/67
/wiki/Federal_Constitution_of_Malaysia
/wiki/National_Language_Act_1963/67
/wiki/ISBN_(identifier)
/wiki/International_Monetary_Fund
/wiki/ISBN_(identifier)
/wiki/ISBN_(identifier)
/wiki/International_Monetary_Fund
/wiki/ISBN_(identifier)
/wiki/Central_Intelligence_Agency
/wiki/Utusan_Malaysia
/wiki/ISBN_(identifier)
/wiki/ISBN_(identifier)
/wiki/ISBN_(identifier)
/wiki/ISBN_(identifier)
/wiki/I
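To see what the pattern `^(/wiki/)((?!:).)*$` accepts and rejects without fetching a page, it can be tried on a few sample hrefs. The sample list below is drawn from the kinds of links shown in the outputs above; the variable names are illustrative.

```python
import re

# Must start with /wiki/ and must not contain ':' anywhere after it.
# (?!:). consumes one character only if that character is not a colon.
article_link = re.compile(r'^(/wiki/)((?!:).)*$')

samples = [
    "/wiki/Thomas_Cup",                     # article
    "/wiki/Category:Malaysia",              # namespace page, has a colon
    "/wiki/Help:Category",                  # namespace page, has a colon
    "https://viaf.org/viaf/145332698",      # external link
    "/wiki/National_Language_Act_1963/67",  # article with a slash in its title
]

matches = [href for href in samples if article_link.match(href)]

for href in matches:
    print(href)
```

Only the two article paths pass: the external URL fails the `/wiki/` prefix, and the `Category:` and `Help:` pages fail the colon check.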

### Saving the results to a CSV file

In [13]:
import csv

import requests
from bs4 import BeautifulSoup

html = requests.get("http://en.wikipedia.org/wiki/Comparison_of_text_editors")
bsObj = BeautifulSoup(html.content, 'html.parser')

# The main comparison table is currently the first "wikitable" on the page
table = bsObj.find_all("table", {"class": "wikitable"})[0]
rows = table.find_all("tr")

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open("editors.csv", 'w', encoding='utf8', newline='') as csvFile:
    writer = csv.writer(csvFile)
    for row in rows:
        csvRow = []
        for cell in row.find_all(['td', 'th']):
            csvRow.append(cell.get_text())
        # Write each row once, after all of its cells have been collected
        writer.writerow(csvRow)
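The row-by-row logic can be checked offline on a tiny table before running it against a live page. This sketch uses a made-up three-row table (the editor names and licenses are placeholders) and writes to an in-memory buffer instead of a file, so the resulting CSV text can be inspected directly.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical table with one header row and two data rows.
html_doc = """
<table class="wikitable">
  <tr><th>Name</th><th>License</th></tr>
  <tr><td>Editor A</td><td>GPL</td></tr>
  <tr><td>Editor B</td><td>MIT</td></tr>
</table>
"""

soup = BeautifulSoup(html_doc, "html.parser")
table = soup.find("table", class_="wikitable")

# io.StringIO stands in for the file opened in the cell above.
buffer = io.StringIO()
writer = csv.writer(buffer)

for row in table.find_all("tr"):
    # Collect every header or data cell of this row, then write it once.
    cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
    writer.writerow(cells)

csv_text = buffer.getvalue()
print(csv_text)
```

Passing `strip=True` to `get_text()` is a choice made for this sketch: it trims the newlines that Wikipedia cells often carry, at the cost of differing slightly from the raw cell text.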
