# Tutorial: Web Scraping with Python Using Beautiful Soup

# What is Web Scraping?

Web scraping is a technique for accessing and extracting information from a website.

# Why Web Scraping?

- Collect data that is not available through an API
- Automate the process of collecting data (saves time and effort)
- Collect data that is updated dynamically

# Is Web Scraping Legal?

* Web scraping is generally acceptable as long as you’re not violating the website’s terms of service.
* If the website doesn’t have terms of service, you should generally be fine, but keep in mind that web scraping consumes server resources on the host website, so keep your request rate modest.

# The Components of a Web Page

* HTML: The markup language that defines the structure of a web page.
* CSS: The language that defines the style of a web page.
* JavaScript: The language that defines the behavior of a web page.

## HTML Tree Structure

    <html>
        <head>
            <title>Page Title</title>
        </head>
        <body>
            <h1>Heading 1</h1>
            <p>Paragraph 1</p>
            <p>Paragraph 2</p>
            <p>Paragraph 3</p>
        </body>
    </html>
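
To make the tree idea concrete, here is a minimal sketch that loads the sample page above into Beautiful Soup and moves between parent and child tags. It assumes the built-in `html.parser` rather than `lxml`, only so that it runs without an extra package.

```python
from bs4 import BeautifulSoup

# The same sample page shown above, as a Python string.
html = """
<html>
    <head>
        <title>Page Title</title>
    </head>
    <body>
        <h1>Heading 1</h1>
        <p>Paragraph 1</p>
        <p>Paragraph 2</p>
        <p>Paragraph 3</p>
    </body>
</html>
"""

# "html.parser" ships with Python, so this sketch needs no extra parser.
soup = BeautifulSoup(html, "html.parser")

# Tags can be reached by walking down the tree from the root.
page_title = soup.html.head.title.text

# Direct child tags of <body>; recursive=False skips deeper descendants.
body_children = [tag.name for tag in soup.body.find_all(True, recursive=False)]

# Walking up: the parent of the <h1> tag is <body>.
h1_parent = soup.h1.parent.name

print(page_title)
print(body_children)
print(h1_parent)
```

Each tag knows its parent and its children, which is what lets a scraper narrow a search to one part of the page.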

## CSS Classes and IDs



* CSS classes and IDs are used to identify specific elements on a web page.
* A CSS class can be shared by multiple elements on a web page.
* A CSS ID identifies a single element on a web page.
* Class: 
```
<p class="class1"> Paragraph 1 </p>
<h2 class="class1"> Heading 2 </h2>
```

* ID: 
```
<p id="id1">Paragraph 1</p>
```
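
As a rough illustration of why classes and IDs matter for scraping, the sketch below looks up the example tags above by class and by ID. It assumes the built-in `html.parser`, and the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# The class and ID examples from above, combined into one snippet.
html = """
<p class="class1"> Paragraph 1 </p>
<h2 class="class1"> Heading 2 </h2>
<p id="id1">Paragraph 1</p>
"""
soup = BeautifulSoup(html, "html.parser")

# A class can be shared, so find_all returns every matching tag.
# The keyword is class_ because class is a reserved word in Python.
by_class = soup.find_all(class_="class1")
class_tags = [tag.name for tag in by_class]
class_texts = [tag.text.strip() for tag in by_class]

# An ID should be unique on the page, so find is enough.
by_id = soup.find(id="id1")

# The same lookups written as CSS selectors: "." for a class, "#" for an ID.
same_by_class = soup.select(".class1")
same_by_id = soup.select_one("#id1")

print(class_tags, class_texts)
print(by_id.text)
```

Searching by class returns both the `<p>` and the `<h2>`, while searching by ID returns a single tag.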



# How Does Web Scraping Work?

<center> <img src="web.png"> </center>

* We’re essentially doing the same thing a web browser does: sending a request to a server for a specific URL and asking it to return the code for that page.
* Unlike a web browser, our web scraping code won’t interpret the page’s source code and display the page visually.
* Instead, we’ll write custom code that filters through the page’s source code, looks for the specific elements we’ve specified, and extracts whatever content we want.

* We write code that sends a request to the server that’s hosting the page we specified.
* The server returns the source code (HTML) for the page.
* We then parse the HTML to extract the data we want.

<center> <img src="response.png"> </center> 
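
The three steps above can be sketched as two small functions. The names `fetch_html` and `parse_html`, the 10-second timeout, and the use of the built-in `html.parser` are assumptions made for this illustration, not part of the original notebook.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Steps 1 and 2: send a GET request and return the page's HTML as text."""
    # The timeout value is an arbitrary choice for this sketch.
    response = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx answers instead of parsing an error page.
    response.raise_for_status()
    return response.text


def parse_html(html):
    """Step 3: turn the raw HTML into a searchable BeautifulSoup tree."""
    # "html.parser" is built into Python; "lxml" also works if it is installed.
    return BeautifulSoup(html, "html.parser")


# Offline demonstration of step 3 with a stand-in for a server response.
sample_html = "<html><head><title>Page Title</title></head><body></body></html>"
sample_soup = parse_html(sample_html)
print(sample_soup.title.text)

# With network access, the steps chain together like this:
# soup = parse_html(fetch_html("https://example.com"))
```

Keeping the request and the parsing separate makes it possible to try out the parsing logic on a saved or hand-written HTML string before sending any real requests.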

In [11]:
import requests
from bs4 import BeautifulSoup

In [32]:
# Request the Wuzzuf search results page for the query "data"
response = requests.get('https://wuzzuf.net/search/jobs/?q=data%20&a=hpb')
response

<Response [200]>

In [33]:
# Parse the response body with the lxml parser (requires the lxml package)
soup = BeautifulSoup(response.content, 'lxml')
soup

<!DOCTYPE html>
<html lang="en" translate="no">
<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width, initial-scale=1.0, shrink-to-fit=no" name="viewport"/>
<meta content="Thu Dec 08 2022 18:30:44 GMT+0200" http-equiv="expires"/>
<meta content="no-cache" http-equiv="Pragma"/>
<meta content="no-cache, no-store, must-revalidate" http-equiv="cache-control"/>
<meta content="notranslate" name="googlebot"/>
<title data-react-helmet="true">2,214 data  Jobs in Egypt – Discover Job Details Now!</title>
<meta charset="utf-8" data-react-helmet="true"/><meta content="Explore 2,214 data  Jobs in Egypt. Discover exciting opportunities with a leading recruitment company. Start your job search today!" data-react-helmet="true" name="description"/><meta content="jobs in Egypt, job in Egypt, careers egypt, jobs in Cairo, jobs in alexandria, employment in egypt, Egypt jobs, jobs vacancies, job vacancies in egypt, job search egypt, job vaca

In [34]:
# Find the first <h2> tag with the class "css-m604qf" (the first job title)
title = soup.find("h2", {'class': 'css-m604qf'})
title

<h2 class="css-m604qf"><style data-emotion="css o171kl">.css-o171kl{-webkit-text-decoration:none;text-decoration:none;color:inherit;}</style><a class="css-o171kl" href="https://wuzzuf.net/jobs/p/7CKrWa66tbe3-Senior-Data-Warehousing-Business-Intelligence-Engineer-with-Expertise-in-Microsoft-Technologies-The-Micro-Small-Medium-Enterprise-Development-Agency-Giza-Egypt" rel="noreferrer" target="_blank">Senior Data Warehousing &amp; Business Intelligence Engineer with Expertise in Microsoft Technologies</a></h2>

In [35]:
title.a.text

'Senior Data Warehousing & Business Intelligence Engineer with Expertise in Microsoft Technologies'
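
Besides the text, the link itself is often worth keeping. The sketch below reads the `href` attribute from a shortened stand-in for one job heading. The URL in it is a placeholder, not a real listing, and `html.parser` is assumed so the example runs without `lxml`.

```python
from bs4 import BeautifulSoup

# A shortened stand-in for one job heading; the href is a placeholder,
# not a real listing URL.
html = (
    '<h2 class="css-m604qf">'
    '<a class="css-o171kl" href="https://example.com/jobs/p/sample-job" '
    'rel="noreferrer" target="_blank">Data Entry Specialist</a>'
    "</h2>"
)
soup = BeautifulSoup(html, "html.parser")
title = soup.find("h2", {"class": "css-m604qf"})

# Text between the opening and closing <a> tags.
job_title = title.a.text

# Attributes behave like dictionary keys on the tag.
job_link = title.a["href"]

# .get() returns None instead of raising KeyError when an attribute is missing.
missing = title.a.get("data-id")

print(job_title)
print(job_link)
print(missing)
```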

In [36]:
# Find every <h2> tag with that class (all job titles on the page)
titles = soup.find_all("h2", {'class': 'css-m604qf'})
titles

[<h2 class="css-m604qf"><style data-emotion="css o171kl">.css-o171kl{-webkit-text-decoration:none;text-decoration:none;color:inherit;}</style><a class="css-o171kl" href="https://wuzzuf.net/jobs/p/7CKrWa66tbe3-Senior-Data-Warehousing-Business-Intelligence-Engineer-with-Expertise-in-Microsoft-Technologies-The-Micro-Small-Medium-Enterprise-Development-Agency-Giza-Egypt" rel="noreferrer" target="_blank">Senior Data Warehousing &amp; Business Intelligence Engineer with Expertise in Microsoft Technologies</a></h2>,
 <h2 class="css-m604qf"><a class="css-o171kl" href="https://wuzzuf.net/internship/FcFTIdEGvaCZ-Data-Entry-Specialist-Micro-Engineering-Cairo-Egypt" rel="noreferrer" target="_blank">Data Entry Specialist</a></h2>,
 <h2 class="css-m604qf"><a class="css-o171kl" href="https://wuzzuf.net/jobs/p/rcevcukHek6N-Senior-Data-and-AI-Engineer-Global-Brands-Cairo-Egypt" rel="noreferrer" target="_blank">Senior Data and AI Engineer</a></h2>,
 <h2 class="css-m604qf"><a class="css-o171kl" href="htt

In [40]:
titles[1].text

'Data Entry Specialist'

In [42]:
# Print the text of every job title found on the page
for job_title in titles:
    print(job_title.text)

Senior Data Warehousing & Business Intelligence Engineer with Expertise in Microsoft Technologies
Data Entry Specialist
Senior Data and AI Engineer
Data Admin
Instructor - Data Integration and ETL Processes
Data Entry
Data Analyst Lead / Manager
Data Entry Specialist
Data Analyst
Data Management & Business Admin Assistant
Data Engineer
Data Maintenance Specialist
Senior Data Integration Engineer
Data Entry & Reservation
Data Allocation Specialist / Data Entry
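
Printing is fine for a quick look, but the results are usually easier to work with as structured records. The following is a minimal sketch that collects each title together with its link into a list of dictionaries. The HTML is a small stand-in for the results page with placeholder links, and `html.parser` is assumed. Class names such as `css-m604qf` appear to be auto-generated, so they may change on the live site.

```python
from bs4 import BeautifulSoup

# A small stand-in for the search results page; the titles are taken from the
# output above, the links are placeholders.
html = """
<h2 class="css-m604qf"><a href="https://example.com/job-1">Data Entry Specialist</a></h2>
<h2 class="css-m604qf"><a href="https://example.com/job-2">Senior Data and AI Engineer</a></h2>
<h2 class="css-m604qf">Heading without a link</h2>
"""
soup = BeautifulSoup(html, "html.parser")

jobs = []
for heading in soup.find_all("h2", {"class": "css-m604qf"}):
    link = heading.a
    if link is None:
        # Skip headings that do not wrap a link, so ["href"] cannot fail.
        continue
    jobs.append({"title": link.text.strip(), "url": link["href"]})

for job in jobs:
    print(job["title"], "->", job["url"])
```

A list of dictionaries like `jobs` can later be written to a CSV file or loaded into a data frame.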
