## Web Scraping Demo
A short demonstration of web scraping in Python. The code uses the `requests` and `BeautifulSoup` libraries to extract job listings from two job boards, Wuzzuf and Bayt.


### Step 1: Install Required Libraries
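Install `lxml`, an optional parser backend for BeautifulSoup. The cells in this notebook pass `"html.parser"` (Python's built-in parser), so `lxml` is only used if you change that argument to `"lxml"`. The `requests` and `beautifulsoup4` packages are assumed to be installed already.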

In [1]:
!pip install lxml

Collecting lxml
  Downloading lxml-5.3.0-cp313-cp313-win_amd64.whl.metadata (3.9 kB)
Downloading lxml-5.3.0-cp313-cp313-win_amd64.whl (3.8 MB)
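
Whether `lxml` is actually importable can be checked before choosing a parser name. The following is a minimal sketch, assuming a hypothetical helper `choose_parser` (not part of the original notebook) that falls back to the built-in `html.parser`:

```python
from importlib.util import find_spec


def choose_parser() -> str:
    """Return "lxml" if the package is importable, else the built-in parser."""
    # find_spec returns None when the top-level module cannot be found.
    return "lxml" if find_spec("lxml") is not None else "html.parser"


parser_name = choose_parser()
print(parser_name)
```

The returned string can be passed as the second argument to `BeautifulSoup(...)`.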

### Step 2: Import Required Libraries
Import the necessary libraries for making HTTP requests and parsing HTML.

In [4]:
import requests
from bs4 import BeautifulSoup

### Step 3: Scrape Wuzzuf for Job Data
Define the target URL and fetch the page content.

In [6]:
# Define the URL of the Wuzzuf search results page
url = "https://wuzzuf.net/search/jobs?a=spbg&q=machine%20learning"

# Send a GET request and print the response object to check the status code
page = requests.get(url)
print(page)

# Parse the page content using BeautifulSoup
soup = BeautifulSoup(page.content, "html.parser")

<Response [200]>
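
The search URL above hard-codes the query `machine%20learning`. As a rough sketch, the same URL could be built for other search terms with the standard library. The helper name `build_search_url` is hypothetical, and the `a=spbg` parameter is copied verbatim from the URL above without knowing what it means:

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://wuzzuf.net/search/jobs"


def build_search_url(query: str) -> str:
    """Build a Wuzzuf search URL for a free-text query."""
    # "a=spbg" is copied from the notebook's URL; its meaning is not
    # documented here, so it is passed through unchanged.
    params = {"a": "spbg", "q": query}
    # quote_via=quote encodes spaces as %20 (the default would use "+").
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


print(build_search_url("machine learning"))
```

For the query `"machine learning"` this reproduces the URL used in the cell above.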


#### Step 3.1: Extract Job Titles
Use BeautifulSoup to find and print all job titles on the page.

In [16]:
# Find all job titles on the page
job_titles = soup.find_all("h2", class_="css-m604qf")
for job_title in job_titles:
    print(job_title.text)

Machine Learning Lead
AI Engineer
AI/ML Python Developer
Data Analyst
AI Engineer
Robotics and Programming Engineer
Sales Specialist
AWS Cloud Administrator
Senior Full Stack PHP Laravel Developer (Remote - Full Time)
AI Specialist
GIS Technician
CAM Engineer
Senior IT Help Desk
Receptionist
Chief Technology Officer / Co-Founder
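
`find_all` with `class_` has a CSS-selector equivalent in `select`. The sketch below compares the two on an inline HTML fragment. The markup is made up to mirror the class name used above and is not copied from Wuzzuf:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that reuses the class name from the cell above.
html = """
<div>
  <h2 class="css-m604qf"><a href="/jobs/p/1">Machine Learning Lead</a></h2>
  <h2 class="css-m604qf"><a href="/jobs/p/2">AI Engineer</a></h2>
  <h2 class="other">Not a job title</h2>
</div>
"""
soup_demo = BeautifulSoup(html, "html.parser")

# Keyword-argument form, as used in the notebook.
by_find_all = [h2.text for h2 in soup_demo.find_all("h2", class_="css-m604qf")]
# Equivalent CSS-selector form.
by_select = [h2.text for h2 in soup_demo.select("h2.css-m604qf")]

print(by_find_all)
print(by_select)
```

Class names such as `css-m604qf` look auto-generated, so they may change when the site is updated; if a cell starts returning nothing, the selector is the first thing to re-check.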


#### Step 3.2: Extract Job Locations
Extract and print the locations of the jobs.

In [17]:
# Find all job locations
job_locations = soup.find_all("span", class_="css-5wys0k")
for job_location in job_locations:
    print(job_location.text)

Maadi, Cairo, Egypt 
Cairo, Egypt 
Riyadh, Saudi Arabia 
Azarita, Alexandria, Egypt 
Heliopolis, Cairo, Egypt 
Alexandria, Egypt 
Downtown, Cairo, Egypt 
New Cairo, Cairo, Egypt 
New Cairo, Cairo, Egypt 
Cairo, Egypt 
New Cairo, Cairo, Egypt 
Nasr City, Cairo, Egypt 
New Nozha, Cairo, Egypt 
Madinaty, Cairo, Egypt 
New Cairo, Cairo, Egypt 
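
The location strings above carry a trailing space and come in two shapes: "Area, City, Country" and "City, Country". A minimal sketch of splitting them into named parts follows. The helper `split_location` is hypothetical, and the assumption that the last two parts are always city and country is based only on the output shown:

```python
def split_location(raw: str) -> dict:
    """Split "Area, City, Country" or "City, Country" into named parts."""
    parts = [part.strip() for part in raw.split(",")]
    # Assumption: the last part is the country and the one before it the city;
    # anything earlier is treated as the area.
    country = parts[-1]
    city = parts[-2] if len(parts) >= 2 else ""
    area = ", ".join(parts[:-2])
    return {"area": area, "city": city, "country": country}


print(split_location("Maadi, Cairo, Egypt "))
print(split_location("Riyadh, Saudi Arabia "))
```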


#### Step 3.3: Extract Company Names
Extract and print the company names associated with the jobs.

In [18]:
# Find all company names
company_names = soup.find_all("a", class_="css-17s97q8")
for company_name in company_names:
    print(company_name.text)

WUZZUF -
Rehabitaire -
BrainBox -
Mobility Pro DMCC -
Integrated Technology Group -
Smart Technology -
Gila Electric -
Citylogix ME -
HIGHBASE TRADING W.L.L -
LINK Development -
Citylogix ME -
Mekano -
Early Arrive   -
Body Fit EMS fitness  -
Confidential -
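
Each company name above ends with a " -" separator, sometimes preceded by extra spaces. One possible way to tidy them, using a hypothetical helper `clean_company`:

```python
def clean_company(raw: str) -> str:
    """Remove the trailing "-" separator and surrounding whitespace."""
    return raw.strip().rstrip("-").strip()


samples = ["WUZZUF -", "Early Arrive   -", "HIGHBASE TRADING W.L.L -"]
cleaned = [clean_company(sample) for sample in samples]
print(cleaned)
```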


#### Step 3.4: Extract Job Post Dates
Extract and print when the jobs were posted.

In [33]:
# Find all job posting dates (a list of classes matches elements that carry any one of them)
job_dates = soup.find_all("div", class_=["css-4c4ojb", "css-do6t5g"])
for job_date in job_dates:
    print(job_date.text)

7 days ago
15 days ago
29 days ago
1 month ago
1 month ago
2 months ago
1 month ago
27 days ago
17 days ago
25 days ago
5 days ago
20 days ago
3 days ago
12 days ago
17 days ago
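
The posting dates are relative strings such as "7 days ago" or "1 month ago". If approximate calendar dates are needed, they could be derived along these lines. This is a sketch only: the helper names are hypothetical, a month is assumed to be 30 days, and only the two units seen in the output above are handled:

```python
import re
from datetime import date, timedelta

# Approximate day counts per unit; 30 days per month is an assumption.
DAYS_PER_UNIT = {"day": 1, "month": 30}


def days_ago(text: str) -> int:
    """Convert strings such as "7 days ago" or "1 month ago" to a day count."""
    match = re.fullmatch(r"(\d+) (day|month)s? ago", text.strip())
    if match is None:
        # Any other wording is rejected rather than guessed at.
        raise ValueError(f"unrecognised relative date: {text!r}")
    return int(match.group(1)) * DAYS_PER_UNIT[match.group(2)]


def approx_posted_on(text: str, today: date) -> date:
    """Estimate the posting date relative to a given reference day."""
    return today - timedelta(days=days_ago(text))


print(days_ago("2 months ago"))
print(approx_posted_on("7 days ago", date(2024, 1, 8)))
```

The reference day is passed in explicitly so the result does not depend on when the code runs; the date used here is arbitrary.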


#### Step 3.5: Extract Job Types
Extract and print the types of jobs (e.g., Full-time, Part-time).

In [35]:
# Find all job types
job_types = soup.find_all("span", class_="css-1ve4b75 eoyjyou0")
for job_type in job_types:
    print(job_type.text)

Full Time
Full Time
Full Time
Full Time
Full Time
Full Time
Full Time
Full Time
Full Time
Full Time
Full Time
Full Time
Full Time
Full Time
Full Time
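
The cell above passes two class names in a single string. BeautifulSoup treats such a string as an exact match on the whole `class` attribute, so the order of the classes matters. The sketch below illustrates this on a made-up fragment that reuses the same class names:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the same two classes, written in different orders.
html = """
<span class="css-1ve4b75 eoyjyou0">Full Time</span>
<span class="eoyjyou0 css-1ve4b75">Part Time</span>
"""
soup_demo = BeautifulSoup(html, "html.parser")

# Exact-string match: only the element whose class attribute is this exact string.
exact = [s.text for s in soup_demo.find_all("span", class_="css-1ve4b75 eoyjyou0")]
# Single-class match: any element that carries this class.
single = [s.text for s in soup_demo.find_all("span", class_="css-1ve4b75")]
# CSS selector: elements that carry both classes, in any order.
both = [s.text for s in soup_demo.select("span.css-1ve4b75.eoyjyou0")]

print(exact)
print(single)
print(both)
```

Matching on a single class or using `select` is less sensitive to attribute order than the exact-string form.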


#### Step 3.6: Extract Job URLs
Extract and print the URLs for individual job postings.

In [38]:
# Extract job URLs
# Selecting the <a> tags directly caused an issue, so select the parent <h2>
# elements instead and read the href from the <a> inside each one
job_urls = []
urls = soup.find_all("h2", class_="css-m604qf")
for job_url in urls:
    job_urls.append(job_url.find("a").attrs["href"])
print(job_urls)

['https://wuzzuf.net/jobs/p/4Nf60vomcxgU-Machine-Learning-Lead-WUZZUF-Cairo-Egypt', 'https://wuzzuf.net/jobs/p/Xu4wzCQcUFb8-AI-Engineer-Rehabitaire-Cairo-Egypt', 'https://wuzzuf.net/jobs/p/yOPi6zGlGA0e-AIML-Python-Developer-BrainBox-Riyadh-Saudi-Arabia', 'https://wuzzuf.net/jobs/p/MSKFu1ytuKG0-Data-Analyst-Mobility-Pro-DMCC-Alexandria-Egypt', 'https://wuzzuf.net/jobs/p/mJweLehwNSbz-AI-Engineer-Integrated-Technology-Group-Cairo-Egypt', 'https://wuzzuf.net/jobs/p/jjEycNgftu1U-Robotics-and-Programming-Engineer-Smart-Technology-Alexandria-Egypt', 'https://wuzzuf.net/jobs/p/PfXPkSfnCCNs-Sales-Specialist-Gila-Electric-Cairo-Egypt', 'https://wuzzuf.net/jobs/p/XgaTEqFbNrDY-AWS-Cloud-Administrator-Citylogix-ME-Cairo-Egypt', 'https://wuzzuf.net/jobs/p/mXk2cwxACW2l-Senior-Full-Stack-PHP-Laravel-Developer-Remote---Full-Time-HIGHBASE-TRADING-W-L-L-Cairo-Egypt', 'https://wuzzuf.net/jobs/p/3rg82U4xRJsY-AI-Specialist-LINK-Development-Cairo-Egypt', 'https://wuzzuf.net/jobs/p/vm7kDATBesFt-GIS-Technician
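
The expression `job_url.find("a").attrs["href"]` raises an exception if a heading has no `<a>` child or the link has no `href`. A more defensive variant might look like the sketch below. The HTML fragment and the `example.com` URL are invented for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: only the first heading has a usable link.
html = """
<h2 class="css-m604qf"><a href="https://example.com/jobs/1">AI Engineer</a></h2>
<h2 class="css-m604qf">Heading without a link</h2>
<h2 class="css-m604qf"><a>Link without href</a></h2>
"""
soup_demo = BeautifulSoup(html, "html.parser")

job_urls_demo = []
for heading in soup_demo.find_all("h2", class_="css-m604qf"):
    link = heading.find("a")  # None when the heading has no <a> child
    # .get returns None instead of raising KeyError when href is absent.
    href = link.get("href") if link is not None else None
    if href:
        job_urls_demo.append(href)

print(job_urls_demo)
```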

#### Step 3.7: Fetch Individual Job Details
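Fetch a single job page by URL and print the text of its `span.css-w7fd6s` elements. In the run shown below, this selector returns job-category labels rather than the title of the job itself.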

In [39]:
# Example URL for testing
url = "https://wuzzuf.net/jobs/p/5NFxSMKMH5K0-SeniorMid-Senior-Deep-Learning-Engineer-Cairo-Egypt?o=1&l=sp&t=sj&a=machine%20learning|search-v3|spbg"

# Send a GET request and parse the content
page = requests.get(url)
print(page)
soup = BeautifulSoup(page.content, "html.parser")

<Response [200]>


In [40]:
jobs_titles = soup.find_all("span", class_="css-w7fd6s")
for job_title in jobs_titles:
    print(job_title.text)

Senior Management Jobs
Management Jobs
Experienced Jobs
Entry Level Jobs
Internships
All Jobs
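
When a selector returns the wrong elements, one way to find a better one is to search the parsed page for text you can see in the browser and then inspect the tag that contains it. The sketch below uses a made-up page; the `h1` tag and the `hypothetical-title` class are assumptions, not Wuzzuf's real markup:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical job page: tag and class names here are invented.
html = """
<html><body>
  <h1 class="hypothetical-title">Senior Deep Learning Engineer</h1>
  <span class="css-w7fd6s">All Jobs</span>
</body></html>
"""
soup_demo = BeautifulSoup(html, "html.parser")

# Search text nodes for a phrase visible on the rendered page,
# then report the enclosing tag name and its classes.
hits = [
    (node.parent.name, node.parent.get("class"))
    for node in soup_demo.find_all(string=re.compile("Deep Learning"))
]
print(hits)
```

The tag name and classes reported this way can then be tried in `find_all`.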


### Step 4: Scrape Bayt for Job Data
Switch to a different platform and extract job data from Bayt.

In [43]:
# Define the Bayt URL for data science jobs
url = "https://www.bayt.com/en/egypt/jobs/data-science-jobs/"

# Send a GET request with a user-agent header to mimic a browser
page = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
print(page)

soup = BeautifulSoup(page.content, "html.parser")

<Response [200]>
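
The fetch-and-parse pattern is repeated for every page in this notebook. It could be wrapped in one function, sketched below under a few assumptions: `fetch_soup` is a hypothetical helper, the 10-second timeout is an arbitrary choice, and the `User-Agent` value is the one from the cell above:

```python
import requests
from bs4 import BeautifulSoup

# Header value copied from the cell above.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_soup(url, session=None, timeout=10):
    """Fetch a page and return it parsed, raising on HTTP error statuses."""
    if session is None:
        session = requests.Session()
    response = session.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    return BeautifulSoup(response.content, "html.parser")
```

A call such as `fetch_soup("https://www.bayt.com/en/egypt/jobs/data-science-jobs/")` would then replace the three separate steps. Checking the status with `raise_for_status()` stops the cell early instead of relying on reading the printed `<Response [200]>`.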


#### Step 4.1: Extract Job Titles on Bayt
Extract and print job titles from Bayt.

In [44]:
# Find all job titles on the page
job_titles = soup.find_all("h2", class_="col u-stretch t-large m0 t-nowrap-d t-trim")
for job_title in job_titles:
    print(job_title.text)



Programme Associate - Data Analyst 



Odoo Developer Work In Suadi Arabia 



Brand Specialist, AVS (Amazon Vendor Services) 



Data Science Manager 



Senior Data Science Engineer 



Customer Success Manager – Focused on Data Science - 218456 



Team Lead Data Scientist 



Quality Audits Manager 



Data Governance Expert 



Data Engineer 



Data Scientist 



Senior Data Testing Consultant - 218657 



Enterprise Data Operations Assistant Analyst 



Area Sales Professional – Laboratory Diagnostics (Cairo & Delta Region) 



Senior IT Auditor 



Product Manager - SMS 



Public Relations and Communications Manager 



Chief Technology Officer (CTO) 



Chief Technology Officer (CTO) 



Developer, Packaging Development & Engineering 
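
The Bayt titles print with surrounding newlines and a trailing space because `.text` returns the whitespace inside the `<h2>` as well. A small sketch of normalizing such strings follows. The helper name is hypothetical and the sample string is constructed to resemble the output above:

```python
def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace and trim both ends."""
    # str.split() with no arguments splits on any whitespace run.
    return " ".join(text.split())


raw_title = "\n\n\nData Science Manager \n"
print(repr(normalize_whitespace(raw_title)))
```

In the loop above, `print(normalize_whitespace(job_title.text))` would print one title per line without the blank lines.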



#### Step 4.2: Extract Job URLs on Bayt
Extract and print the URLs for job postings on Bayt.

In [45]:
# Extract job URLs on Bayt
job_urls = []
urls = soup.find_all("h2", class_="col u-stretch t-large m0 t-nowrap-d t-trim")
for job_url in urls:
    job_urls.append(job_url.find("a").attrs["href"])
print(job_urls)


['/en/egypt/jobs/programme-associate-data-analyst-5225967/', '/en/egypt/jobs/odoo-developer-work-in-suadi-arabia-5219562/', '/en/egypt/jobs/brand-specialist-avs-amazon-vendor-services-5218648/', '/en/egypt/jobs/data-science-manager-71400725/', '/en/egypt/jobs/senior-data-science-engineer-66105998/', '/en/egypt/jobs/customer-success-manager-focused-on-data-science-218456-72042911/', '/en/egypt/jobs/team-lead-data-scientist-72183001/', '/en/egypt/jobs/quality-audits-manager-72197103/', '/en/egypt/jobs/data-governance-expert-72106658/', '/en/egypt/jobs/data-engineer-72165704/', '/en/egypt/jobs/data-scientist-72170930/', '/en/egypt/jobs/senior-data-testing-consultant-218657-72167031/', '/en/egypt/jobs/enterprise-data-operations-assistant-analyst-72141070/', '/en/egypt/jobs/area-sales-professional-laboratory-diagnostics-cairo-delta-region-72190065/', '/en/egypt/jobs/senior-it-auditor-72190543/', '/en/egypt/jobs/product-manager-sms-72189158/', '/en/egypt/jobs/public-relations-and-communicati
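
Unlike the Wuzzuf links, the Bayt `href` values are relative paths, so they cannot be passed to `requests.get` as they are. They can be resolved against the page URL with `urljoin`, as in this sketch using two of the paths from the output above:

```python
from urllib.parse import urljoin

# URL of the page the links were scraped from.
PAGE_URL = "https://www.bayt.com/en/egypt/jobs/data-science-jobs/"

relative_urls = [
    "/en/egypt/jobs/programme-associate-data-analyst-5225967/",
    "/en/egypt/jobs/data-engineer-72165704/",
]
# A path starting with "/" replaces the whole path of the base URL.
absolute_urls = [urljoin(PAGE_URL, href) for href in relative_urls]

for absolute_url in absolute_urls:
    print(absolute_url)
```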

### Quizzes
Try extracting additional fields from the scraped pages. In each cell, replace the `...` placeholders with the tag name and class you find by inspecting the page source:

#### Quiz 1: Find the Locations of Jobs

In [46]:
# Find the locations
locations = soup.find_all("...", class_="...")
for location in locations:
    print(location.text)

#### Quiz 2: Find the Company Names of Jobs

In [None]:
# Find the company names
companies = soup.find_all("...", class_="...")
for company in companies:
    print(company.text)

#### Quiz 3: Extract Job Descriptions

In [None]:
# Find the job descriptions
descriptions = soup.find_all("...", class_="...")
for description in descriptions:
    print(description.text)