# **ACM BPDC**

### **Web scraping is the process of gathering information from the Internet. Even copying and pasting the lyrics of your favorite song is a form of web scraping! However, the words “web scraping” usually refer to a process that involves automation. Some websites don’t like it when automatic scrapers gather their data, while others don’t mind.**
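Because some sites object to automated scraping, a sensible first step is to check the site's `robots.txt` rules. The sketch below uses the standard-library `urllib.robotparser` module on a made-up `robots.txt`, parsed from a string so that no network access is needed. The user-agent name `my-scraper` and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt; against a real site you would call
# rp.set_url("https://<site>/robots.txt") followed by rp.read()
robots_txt = """
User-agent: *
Disallow: /private/
Crawl-delay: 5
""".splitlines()

rp = RobotFileParser()
rp.parse(robots_txt)

allowed = rp.can_fetch("my-scraper", "https://www.example.com/carprices/")
blocked = rp.can_fetch("my-scraper", "https://www.example.com/private/data")
delay = rp.crawl_delay("my-scraper")  # seconds to wait between requests, if the file sets one

print(allowed, blocked, delay)  # True False 5
```

`robots.txt` is a convention rather than an access control, so it tells you what the site asks of scrapers, not what is technically possible.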

What is the use of **BeautifulSoup** in Python?
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It builds a parse tree from the page source, which you can then search and navigate to extract data in a hierarchical, readable way.
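To make the idea of a parse tree concrete, here is a minimal sketch on a small made-up snippet: tags can be reached by name (`soup.body.div`), searched (`find_all`), and walked upward through `.parent`.

```python
from bs4 import BeautifulSoup

# A small made-up document used only to illustrate the tree structure
html = "<html><body><div id='box'><p>First</p><p>Second</p></div></body></html>"
soup = BeautifulSoup(html, "html.parser")

box = soup.body.div                                 # walk down by tag name: body -> div
texts = [p.get_text() for p in box.find_all("p")]   # search inside that subtree
parent_name = soup.find("p").parent.name            # walk back up the tree

print(box["id"], texts, parent_name)  # box ['First', 'Second'] div
```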

In [2]:
!pip install BeautifulSoup4



In [3]:
#Import BeautifulSoup library
from bs4 import BeautifulSoup

# **Demo**

In [1]:
html_doc = """<html><head><title>ACM Coding Bootcamp 2021</title></head>
<body>
<p class="title"><b>Web Scarping</b></p>

<p class="description">This bootcamp will provide its audience the core foundation in this amazing field of programming by covering topics like </p><br>
<a href="https://github.com/acmbpdc/coding-bootcamp-2021/blob/main/docs/01-command-line-interface" class="topic" id="link1">Command Line Interface</a><br>
<a href="https://github.com/acmbpdc/coding-bootcamp-2021/blob/main/docs/02-overview-of-programming-languages" class="topic" id="link2">Overview of Prgramming Languages</a><br>
<a href="https://github.com/acmbpdc/coding-bootcamp-2021/blob/main/docs/03-data-structures" class="topic" id="link3">Data Structures</a><br>
<a href="https://github.com/acmbpdc/coding-bootcamp-2021/blob/main/docs/04-algorithms" class="topic" id="link4">Algorithms</a><br>
<a href="https://github.com/acmbpdc/coding-bootcamp-2021/blob/main/docs/05-python-1" class="topic" id="link5">Python</a><br>
<a href="https://github.com/acmbpdc/coding-bootcamp-2021/blob/main/docs/07-git-and-github-1" class="topic" id="link6">Git and GitHub</a><br>

</body></html>"""

In [4]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

In [None]:
soup

In [6]:
title = soup.find('p', class_ = 'title').text

In [8]:
title

'Web Scarping'

In [10]:
description = soup.find('p', class_ = 'description').text

In [11]:
description

'This bootcamp will provide its audience the core foundation in this amazing field of programming by covering topics like '
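The description above ends with a trailing space because `.text` returns the text exactly as it appears in the markup. A minimal sketch on a shortened copy of that paragraph shows `get_text(strip=True)` removing the surrounding whitespace, plus a guard for the case where `find()` matches nothing and returns `None` (calling `.text` on `None` would raise `AttributeError`).

```python
from bs4 import BeautifulSoup

html = '<p class="description">This bootcamp covers topics like </p>'
soup = BeautifulSoup(html, "html.parser")
p = soup.find("p", class_="description")

raw = p.text                      # keeps the trailing space
clean = p.get_text(strip=True)    # surrounding whitespace removed
print(repr(raw), repr(clean))

# find() returns None when nothing matches, so check before using .text
missing = soup.find("p", class_="no-such-class")
result = missing.text if missing is not None else "not found"
print(result)  # not found
```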

In [12]:
topics = soup.find_all('a')

In [13]:
topics

[<a class="topic" href="https://github.com/acmbpdc/coding-bootcamp-2021/blob/main/docs/01-command-line-interface" id="link1">Command Line Interface</a>,
 <a class="topic" href="https://github.com/acmbpdc/coding-bootcamp-2021/blob/main/docs/02-overview-of-programming-languages" id="link2">Overview of Prgramming Languages</a>,
 <a class="topic" href="https://github.com/acmbpdc/coding-bootcamp-2021/blob/main/docs/03-data-structures" id="link3">Data Structures</a>,
 <a class="topic" href="https://github.com/acmbpdc/coding-bootcamp-2021/blob/main/docs/04-algorithms" id="link4">Algorithms</a>,
 <a class="topic" href="https://github.com/acmbpdc/coding-bootcamp-2021/blob/main/docs/05-python-1" id="link5">Python</a>,
 <a class="topic" href="https://github.com/acmbpdc/coding-bootcamp-2021/blob/main/docs/07-git-and-github-1" id="link6">Git and GitHub</a>]
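Since `find_all` returns a list of tags, you can loop over it instead of indexing each element separately. This sketch uses a shortened, hypothetical version of the demo markup (the example.com links are placeholders) and collects each topic's text and `href` attribute into a dictionary.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical copy of the demo markup (two links only)
html = """
<a href="https://example.com/cli" class="topic" id="link1">Command Line Interface</a><br>
<a href="https://example.com/dsa" class="topic" id="link2">Data Structures</a><br>
"""
soup = BeautifulSoup(html, "html.parser")

# Map each link's visible text to its href attribute
topics = {a.get_text(): a["href"] for a in soup.find_all("a", class_="topic")}
print(topics)
```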

In [14]:
topic1 = soup.find_all('a')[0]

In [15]:
topic1

<a class="topic" href="https://github.com/acmbpdc/coding-bootcamp-2021/blob/main/docs/01-command-line-interface" id="link1">Command Line Interface</a>

In [16]:
topic2 = soup.find_all('a')[1]

In [17]:
topic2

<a class="topic" href="https://github.com/acmbpdc/coding-bootcamp-2021/blob/main/docs/02-overview-of-programming-languages" id="link2">Overview of Prgramming Languages</a>

In [18]:
#Alternative: find a single tag by class and id, then take its text
topic_1 = soup.find('a', class_ = 'topic', id = 'link1').text

In [19]:
topic_1

'Command Line Interface'
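Beautiful Soup also accepts CSS selectors through `select()` (all matches) and `select_one()` (first match), which can be more compact than keyword arguments. In recent versions this support comes from the `soupsieve` package, which is installed as a dependency of `beautifulsoup4`. A sketch on a trimmed, hypothetical copy of the demo links:

```python
from bs4 import BeautifulSoup

html = """
<a href="https://example.com/cli" class="topic" id="link1">Command Line Interface</a>
<a href="https://example.com/dsa" class="topic" id="link2">Data Structures</a>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.select_one("#link1").get_text()            # select by id
names = [a.get_text() for a in soup.select("a.topic")]  # select by tag and class
print(first)  # Command Line Interface
print(names)  # ['Command Line Interface', 'Data Structures']
```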

# **Web Scraping the Drive Arabia Website**

**Requests** is a popular third-party library for sending HTTP/1.1 requests from Python. It lets you attach headers, form data, multipart files, and query parameters with simple function calls, and exposes the response data (status code, headers, body) in the same straightforward way. This notebook itself uses the standard-library alternative, urllib.
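As an illustration of the Requests API (which the notebook's cells do not use), the sketch below builds a GET request with a custom header and query parameters, then prepares it without sending it, so it runs offline. The example.com URL, the parameter names and the `my-scraper/0.1` user agent are placeholders.

```python
import requests

# Build the request but do not send it: prepare() shows exactly what would go over the wire
req = requests.Request(
    "GET",
    "https://www.example.com/search",
    params={"q": "bronco", "year": "2021"},   # encoded into the query string
    headers={"User-Agent": "my-scraper/0.1"},
)
prepared = req.prepare()

print(prepared.method)                 # GET
print(prepared.url)                    # https://www.example.com/search?q=bronco&year=2021
print(prepared.headers["User-Agent"])  # my-scraper/0.1
```

In a real script you would typically send the request in one step with `requests.get(url, params=..., headers=..., timeout=...)` and read `response.text`.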

**urllib** is Python's built-in package for handling URLs (Uniform Resource Locators). Its urllib.request module fetches URLs with the urlopen function, which supports several protocols, including HTTP, HTTPS, FTP and local files.
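A minimal sketch of the two pieces the notebook uses. A `Request` object carries the URL and headers (here a placeholder example.com URL, which is never contacted), and `urlopen` opens it. To stay offline, the second half opens an RFC 2397 `data:` URL, one of the non-HTTP schemes `urlopen` understands.

```python
from urllib.request import Request, urlopen

# A Request bundles the URL with headers such as User-Agent; nothing is sent yet
req = Request("https://www.example.com/", headers={"User-Agent": "Mozilla/5.0"})
print(req.full_url)                  # https://www.example.com/
print(req.get_method())              # GET, because no request body was attached
print(req.get_header("User-agent"))  # urllib stores header names capitalized this way

# urlopen returns a file-like response; read() gives bytes that you decode yourself
with urlopen("data:text/plain;charset=utf-8,Hello%2C%20World%21") as resp:
    body = resp.read().decode("utf-8")
print(body)  # Hello, World!
```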

### **urllib.request** for opening and reading.

In [None]:
#Import Request and urlopen from the standard-library urllib.request module (not the third-party Requests library)
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

In [None]:
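drive_arabia_url = 'https://www.drivearabia.com/carprices/uae/ford/ford-bronco/2021/'
#Send a browser-like User-Agent: some sites reject urllib's default one ('Python-urllib/3.x')
hdr = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
req = Request(drive_arabia_url, headers=hdr)
page = urlopen(req)
#Name the parser explicitly; without it bs4 picks one itself and prints a warning
soup = BeautifulSoup(page, 'html.parser')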

In [None]:
#Displaying the parsed HTML of the Drive Arabia page
soup
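The cell above fails with an opaque traceback if the site is unreachable or returns an error status. A possible hardening, sketched as a hypothetical helper `fetch_soup`, adds a timeout, closes the response, and turns urllib's `HTTPError`/`URLError` into readable messages. It is exercised offline with a `data:` URL built from the manufacturer value scraped below.

```python
from urllib.error import HTTPError, URLError
from urllib.parse import quote
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_soup(url, user_agent="Mozilla/5.0", timeout=10):
    """Hypothetical helper: download `url` and return it parsed with html.parser."""
    req = Request(url, headers={"User-Agent": user_agent})
    try:
        # The `with` block closes the response once it has been read
        with urlopen(req, timeout=timeout) as page:
            html = page.read()
    except HTTPError as err:  # the server replied with an error status, e.g. 403 or 404
        raise RuntimeError(f"HTTP {err.code} while fetching {url}") from err
    except URLError as err:  # e.g. DNS failure or refused connection
        raise RuntimeError(f"Could not reach {url}: {err.reason}") from err
    return BeautifulSoup(html, "html.parser")


# Offline check: a data: URL stands in for the real page
demo_html = "<main class='main-content'><span itemprop='manufacturer'>Ford</span></main>"
demo_soup = fetch_soup("data:text/html," + quote(demo_html))
maker = demo_soup.find("main", class_="main-content").find("span", itemprop="manufacturer").text
print(maker)  # Ford
```

`HTTPError` is a subclass of `URLError`, so it has to be caught first.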

In [None]:
#Finding the <main> tag (class 'main-content') inside 'soup'
tag_having_main = soup.find('main', class_ = 'main-content')

In [None]:
#Opening 'tag_having_main' 
tag_having_main

In [None]:
#Finding Manufacturer Name inside 'tag_having_main' using itemprop as an argument and converting it to text
tag_having_manufacturer_name = tag_having_main.find('span', itemprop = 'manufacturer').text

In [None]:
#Printing 'tag_having_manufacturer_name' 
tag_having_manufacturer_name

'Ford'

In [None]:
#Finding Brand Name inside 'tag_having_main' using itemprop as an argument and converting it to text
tag_having_brand_name = tag_having_main.find('span', itemprop = 'brand').text

In [None]:
#Printing 'tag_having_brand_name'
tag_having_brand_name

'Bronco'

In [None]:
#Finding Model Number (year) inside 'tag_having_main' using itemprop as an argument and converting it to text
tag_having_model_number = tag_having_main.find('span', itemprop = 'model').text

In [None]:
#Printing 'tag_having_model_number'
tag_having_model_number

'2021'

In [None]:
#Finding Tag inside 'tag_having_main' containing Car price and converting it to text
tag_having_car_price = tag_having_main.find('span', itemprop = 'price').text

In [None]:
#Printing 'tag_having_car_price' containing Car price in AED
tag_having_car_price

'193,095'

In [None]:
#Finding the Tag containing Specification Table
tag_having_table = soup.find('table', class_ = 'table table-sm table-bordered')

In [None]:
#Opening 'tag_having_table' 
tag_having_table

<table class="table table-sm table-bordered">
<tbody>
<tr>
<th>Country of Origin</th>
<td dir="ltr">United States</td>
</tr>
<tr>
<th>Class</th>
<td dir="ltr" itemprop="bodyType">Midsize SUV</td>
</tr>
<tr>
<th>Body Styles</th>
<td dir="ltr" itemprop="additionalType">5-door wagon</td>
</tr>
<tr>
<th>Weight (kg)</th>
<td dir="ltr" itemprop="weight">2040 - 2413</td>
</tr>
</tbody>
</table>
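Every row of this table pairs a `<th>` label with a `<td>` value, so instead of one `find` per field you can read the whole table into a dictionary. A minimal sketch on a trimmed copy of the table shown above; note that `class_` matches a tag when *any one* of its classes equals the given value.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the specification table shown above
html = """
<table class="table table-sm table-bordered"><tbody>
<tr><th>Country of Origin</th><td dir="ltr">United States</td></tr>
<tr><th>Class</th><td dir="ltr" itemprop="bodyType">Midsize SUV</td></tr>
<tr><th>Weight (kg)</th><td dir="ltr" itemprop="weight">2040 - 2413</td></tr>
</tbody></table>
"""
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="table-bordered")  # one of the table's classes is enough

specs = {}
for row in table.find_all("tr"):
    header, value = row.find("th"), row.find("td")
    if header and value:  # skip rows that lack either cell
        specs[header.get_text(strip=True)] = value.get_text(strip=True)

print(specs)
```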

In [None]:
#Finding Country of Origin inside 'tag_having_table': this cell has no 'itemprop', so locate its row header <th> and take the next <td>
tag_having_country_of_origin = tag_having_table.find('th', string = 'Country of Origin').find_next_sibling('td').text

In [None]:
#Printing 'tag_having_country_of_origin' 
tag_having_country_of_origin

'United States'

In [None]:
#Finding Car Class inside 'tag_having_table' using 'itemprop' as an argument and converting to text
tag_having_car_class = tag_having_table.find('td', itemprop = 'bodyType').text

In [None]:
#Printing 'tag_having_car_class' 
tag_having_car_class

'Midsize SUV'

In [None]:
#Finding Car Body Styles inside 'tag_having_table' using 'itemprop' as an argument and converting to text
tag_having_car_body_styles = tag_having_table.find('td', itemprop = 'additionalType').text

In [None]:
#Printing 'tag_having_car_body_styles'
tag_having_car_body_styles

'5-door wagon'

In [None]:
#Finding Car Weight inside 'tag_having_table' using 'itemprop' as an argument and converting to text
tag_having_car_weight_kg = tag_having_table.find('td', itemprop = 'weight').text

In [None]:
tag_having_car_weight_kg

'2040 - 2413'
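All scraped values are strings such as `'193,095'` and `'2040 - 2413'`. If you want to compare or compute with them, they need converting to numbers. These hypothetical helpers assume the formats seen above: comma thousands separators in the price, and a hyphen between the lower and upper weight.

```python
def parse_price(text):
    """'193,095' -> 193095 (assumes commas are only thousands separators)."""
    return int(text.replace(",", "").strip())


def parse_weight_range(text):
    """'2040 - 2413' -> (2040, 2413); a single value '1700' -> (1700, 1700)."""
    parts = [int(p) for p in text.split("-")]  # int() ignores surrounding spaces
    return (parts[0], parts[-1])


print(parse_price("193,095"))             # 193095
print(parse_weight_range("2040 - 2413"))  # (2040, 2413)
```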

In [None]:
print('Manufacturer Name: ' + tag_having_manufacturer_name)
print('Brand Name: ' + tag_having_brand_name)
print('Model Number: ' + tag_having_model_number)
print('Car Price (AED): ' + tag_having_car_price)
print('Country of Origin: ' + tag_having_country_of_origin)
print('Car Class: ' + tag_having_car_class)
print('Car Body Style: ' + tag_having_car_body_styles)
print('Car Weight(Kg): ' + tag_having_car_weight_kg)

Manufacturer Name: Ford
Brand Name: Bronco
Model Number: 2021
Car Price (AED): 193,095
Country of Origin: United States
Car Class: Midsize SUV
Car Body Style: 5-door wagon
Car Weight(Kg): 2040 - 2413


# Write a Python function that takes the following arguments:


1.   Manufacturer Name
2.   Brand Name
3.   Model Number

# And returns the following:

*   Car Price (AED)
*   Country of Origin
*   Car Class
*   Body Style
*   Car Weight (Kg)

# By web scraping **Drive Arabia** Website


In [None]:
def drive_arabia(manufacturer_name,brand_name,model_number):
  #URL pattern taken from the Bronco example: /carprices/uae/<make>/<make>-<model>/<year>/
  url = 'https://www.drivearabia.com/carprices/uae/' + manufacturer_name + '/' + manufacturer_name + '-' + brand_name + '/' + model_number + '/'
  hdr = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
  req = Request(url, headers=hdr)
  page = urlopen(req)
  soup = BeautifulSoup(page, 'html.parser')
  tag_having_main = soup.find('main', class_ = 'main-content')
  tag_having_car_price = tag_having_main.find('span', itemprop = 'price').text
  tag_having_table = soup.find('table', class_ = 'table table-sm table-bordered')
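  #'Country of Origin' has no itemprop, so locate its row header and take the next <td>
  tag_having_country_of_origin = tag_having_table.find('th', string = 'Country of Origin').find_next_sibling('td').text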
  tag_having_car_class = tag_having_table.find('td', itemprop = 'bodyType').text
  tag_having_car_body_styles = tag_having_table.find('td', itemprop = 'additionalType').text
  tag_having_car_weight_kg = tag_having_table.find('td', itemprop = 'weight').text
  print('Car Price (AED): ' + tag_having_car_price)
  print('Country of Origin: ' + tag_having_country_of_origin)
  print('Car Class: ' + tag_having_car_class)
  print('Car Body Style: ' + tag_having_car_body_styles)
  print('Car Weight(Kg): ' + tag_having_car_weight_kg)


In [None]:
#Strip stray spaces and lowercase the names, since the URL path uses lowercase
manufacturer_name = input('Enter Manufacturer Name: ').strip().lower()
brand_name = input('Enter Brand Name: ').strip().lower()
model_number = input('Enter Model Number: ').strip()

Enter Manufacturer Name: cmc
Enter Brand Name: z7
Enter Model Number: 2017


In [None]:
drive_arabia(manufacturer_name,brand_name,model_number)

Car Price (AED): 27,000
Country of Origin: Taiwan
Car Class: Midsize Minivan
Car Body Style: 5-door minivan
Car Weight(Kg): 1700 - 1725
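The exercise asks for a function that *returns* the values, while `drive_arabia` prints them. One possible refactor is sketched below, assuming Drive Arabia pages keep the markup this notebook relies on. It separates building the URL (following the pattern of the Bronco example) from parsing the HTML, and returns a dictionary. `build_url` and `parse_car_page` are hypothetical names, and the sample HTML is a trimmed page reconstructed from the CMC Z7 output above.

```python
from bs4 import BeautifulSoup

BASE_URL = "https://www.drivearabia.com/carprices/uae/"


def build_url(manufacturer_name, brand_name, model_number):
    """Rebuild the notebook's URL pattern: <make>/<make>-<model>/<year>/"""
    make = manufacturer_name.strip().lower()
    model = brand_name.strip().lower()
    return f"{BASE_URL}{make}/{make}-{model}/{model_number.strip()}/"


def parse_car_page(html):
    """Return the five requested fields from a car page's HTML as a dict."""
    soup = BeautifulSoup(html, "html.parser")
    main = soup.find("main", class_="main-content")
    table = soup.find("table", class_="table-bordered")

    def cell(header):
        # Text of the <td> that follows the row header with this exact label
        return table.find("th", string=header).find_next_sibling("td").get_text(strip=True)

    return {
        "Car Price (AED)": main.find("span", itemprop="price").get_text(strip=True),
        "Country of Origin": cell("Country of Origin"),
        "Car Class": cell("Class"),
        "Body Style": cell("Body Styles"),
        "Car Weight (Kg)": cell("Weight (kg)"),
    }


# Offline usage with a trimmed sample page
sample = """
<main class="main-content"><span itemprop="price">27,000</span></main>
<table class="table table-sm table-bordered"><tbody>
<tr><th>Country of Origin</th><td>Taiwan</td></tr>
<tr><th>Class</th><td itemprop="bodyType">Midsize Minivan</td></tr>
<tr><th>Body Styles</th><td itemprop="additionalType">5-door minivan</td></tr>
<tr><th>Weight (kg)</th><td itemprop="weight">1700 - 1725</td></tr>
</tbody></table>
"""
print(build_url("CMC", "Z7", "2017"))
print(parse_car_page(sample))
```

Against the live site you would pass the downloaded page instead, e.g. `parse_car_page(urlopen(Request(build_url(...), headers=hdr)))`, since `BeautifulSoup` also accepts a file-like response. Keeping the parser separate means it can be checked without any network access.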
