# Python Web Scraping "Wuzzuf.com"

## Case study: Wuzzuf.com [web scraping]


Wuzzuf search URL (query: "data scientist"): 'https://wuzzuf.net/search/jobs/?q=data+scientist&a=hpb'

## How do you scrape data from a website?
- Find the URL of the page you want to scrape
- Inspect the page
- Find the data you want to extract
- Write the code
- Run the code and extract the data
- Store the data in the required format

### Import Libraries & Methods

In [1]:
from bs4 import BeautifulSoup as bs

In [2]:
from urllib.request import urlopen

#### Inputting the URL

In [3]:
url = 'https://wuzzuf.net/search/jobs/?q=data+scientist&a=hpb'

#### Create a Client-based Request to Get the URL

In [4]:
client = urlopen(url)

#### Getting the HTML code of the Full Page

In [5]:
html = client.read()

In [6]:
html

b'<!DOCTYPE html>\n<html lang="en">\n<head>\n    <meta charset="utf-8">\n    <meta http-equiv="X-UA-Compatible" content="IE=edge">\n    <meta name="viewport" content="width=device-width, initial-scale=1.0, shrink-to-fit=no">\n\n    <title data-react-helmet="true">Job Search | WUZZUF</title>\n\n<meta data-react-helmet="true" charset="utf-8"/><meta data-react-helmet="true" name="description" content="Searching for jobs in Egypt? Wuzzuf helps you in your online job search to find Jobs in Egypt and Middle East. Choose the right job using our online recruitment services."/><meta data-react-helmet="true" name="keywords" content="jobs in Egypt, job in Egypt, careers egypt, jobs in Cairo, jobs in alexandria, employment in egypt, Egypt jobs, jobs vacancies, job vacancies in egypt, job search egypt, job vacancies egypt, job recruitment in egypt, job opportunities in egypt, jobs cairo, job vacancy egypt , \xd9\x88\xd8\xb8\xd8\xa7\xd8\xa6\xd9\x81 \xd9\x85\xd8\xb5\xd8\xb1"/><meta data-react-helmet=
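The `b'...'` prefix in the output above shows that `client.read()` returns raw bytes, which is why the Arabic keywords appear as `\xd9\x88...` escapes. Below is a minimal sketch of decoding them, assuming the page is UTF-8 as its `<meta charset="utf-8">` tag declares. The variable names `raw` and `text` are illustrative only.

```python
# Bytes copied from the keywords meta tag in the output above.
raw = b"\xd9\x88\xd8\xb8\xd8\xa7\xd8\xa6\xd9\x81 \xd9\x85\xd8\xb5\xd8\xb1"

# Decode the raw bytes into a regular Python string (assumes UTF-8).
text = raw.decode("utf-8")
print(text)
```

BeautifulSoup performs this decoding itself when given bytes, so the step is only needed when working with the response body directly.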

#### Closing the Request

In [7]:
client.close()
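
Opening, reading, and closing can also be combined so that the connection is closed even if reading fails. The sketch below is one way to do that with the standard library. The helper names `build_request` and `fetch_html`, the `User-Agent` string, and the 10-second timeout are assumptions for illustration, not part of the original notebook.

```python
from urllib.request import Request, urlopen

def build_request(url, user_agent="Mozilla/5.0 (tutorial scraper)"):
    """Build a request carrying an explicit User-Agent header (hypothetical value)."""
    return Request(url, headers={"User-Agent": user_agent})

def fetch_html(url, timeout=10):
    """Return the page body as bytes; the connection closes when the with-block exits."""
    with urlopen(build_request(url), timeout=timeout) as client:
        return client.read()

# Usage (performs a real network request, so it is left commented out):
# html = fetch_html("https://wuzzuf.net/search/jobs/?q=data+scientist&a=hpb")
```

Some sites may reject requests that arrive without a browser-like `User-Agent`, which is why the header is set explicitly here.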

#### Creating an HTML Parser Using BeautifulSoup

In [8]:
soup = bs(html, "html.parser")

In [9]:
soup

<!DOCTYPE html>

<html lang="en">
<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width, initial-scale=1.0, shrink-to-fit=no" name="viewport"/>
<title data-react-helmet="true">Job Search | WUZZUF</title>
<meta charset="utf-8" data-react-helmet="true"><meta content="Searching for jobs in Egypt? Wuzzuf helps you in your online job search to find Jobs in Egypt and Middle East. Choose the right job using our online recruitment services." data-react-helmet="true" name="description"><meta content="jobs in Egypt, job in Egypt, careers egypt, jobs in Cairo, jobs in alexandria, employment in egypt, Egypt jobs, jobs vacancies, job vacancies in egypt, job search egypt, job vacancies egypt, job recruitment in egypt, job opportunities in egypt, jobs cairo, job vacancy egypt , وظائف مصر" data-react-helmet="true" name="keywords"><meta content="Jobs in Egypt | WUZZUF" data-react-helmet="true" property="og:title"/><meta content="website"

#### Creating Containers for the Needed Data

In [10]:
containers = soup.find_all("div",{"class":"css-1gatmva e1v1l3u10"})

In [11]:
len(containers)

12
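
Passing `"css-1gatmva e1v1l3u10"` as one string makes BeautifulSoup compare it against the exact value of the `class` attribute, so the match depends on the order of the classes in the page. A CSS selector is one alternative that matches each class independently. The markup below is a made-up miniature of the page, used only so the sketch runs offline. Class names of this form are typically auto-generated and may change when the site is updated.

```python
from bs4 import BeautifulSoup

# Made-up sample markup; the second card lists its classes in the opposite order.
sample = """
<div class="css-1gatmva e1v1l3u10"><h2>Senior Data Scientist</h2></div>
<div class="e1v1l3u10 css-1gatmva"><h2>Data Scientist</h2></div>
<div class="other"><h2>Not a job card</h2></div>
"""
sample_soup = BeautifulSoup(sample, "html.parser")

# Exact-string match: only the div whose class attribute reads exactly this.
exact = sample_soup.find_all("div", {"class": "css-1gatmva e1v1l3u10"})

# CSS selector: any div carrying both classes, in any order.
both = sample_soup.select("div.css-1gatmva.e1v1l3u10")

print(len(exact), len(both))
```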

In [12]:
bs.prettify(containers[0])

'<div class="css-1gatmva e1v1l3u10">\n <style data-emotion="css pkv5jc">\n  .css-pkv5jc{position:relative;min-height:60px;}\n </style>\n <div class="css-pkv5jc">\n  <a href="https://wuzzuf.net/jobs/careers/Fixed-Solutions-Egypt-18523" rel="noreferrer" target="_blank">\n   <style data-emotion="css 17095x3">\n    .css-17095x3{position:absolute;right:0;top:0;width:60px;height:60px;object-fit:contain;object-position:center center;}\n   </style>\n  </a>\n  <style data-emotion="css laomuu">\n   .css-laomuu{padding-right:60px;}\n  </style>\n  <div class="css-laomuu">\n   <style data-emotion="css m604qf">\n    .css-m604qf{font-size:16px;font-weight:600;font-style:normal;letter-spacing:-0.4px;line-height:24px;color:#0055D9;margin:0;}\n   </style>\n   <h2 class="css-m604qf">\n    <style data-emotion="css o171kl">\n     .css-o171kl{-webkit-text-decoration:none;text-decoration:none;color:inherit;}\n    </style>\n    <a class="css-o171kl" href="/jobs/p/PVAeMhWwvA6Q-Senior-Data-Scientist-Fixed-Solut

#### Accessing Page elements

In [13]:
# Access the job title of the first element
containers[0].div.h2.text

'Senior Data Scientist'

In [14]:
# Another way to access the job title of the first element
jtitle = containers[0].findAll("h2",{"class":"css-m604qf"})
jtitle[0].text

'Senior Data Scientist'

In [15]:
# Access the company name of the first element
cname = containers[0].findAll("a",{"class":"css-17s97q8"})
cname[0].text

'Fixed Solutions -'

In [16]:
# Access the job type of the first element
jtype = containers[0].findAll("div",{"class":"css-1lh32fc"})
jtype[0].text

'Full Time'
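
Indexing `[0]` on the result of `findAll` raises `IndexError` when a card lacks that element. A small helper can return a default value instead. This is a sketch: `text_or_default` is a hypothetical helper name, and the sample markup is invented to stand in for one job card.

```python
from bs4 import BeautifulSoup

def text_or_default(container, tag, css_class, default=""):
    """Return the stripped text of the first matching element, or `default` if none exists."""
    element = container.find(tag, {"class": css_class})
    return element.get_text(strip=True) if element is not None else default

# Invented markup: a card with a title but no job-type element.
card = BeautifulSoup(
    '<div><h2 class="css-m604qf"> Senior Data Scientist </h2></div>',
    "html.parser",
)

title = text_or_default(card, "h2", "css-m604qf")
job_type = text_or_default(card, "div", "css-1lh32fc", default="N/A")
print(title, "|", job_type)
```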

#### Bringing It All Together

In [17]:
f = open("data/wuzzuf_data_Scienist.csv",'w')
header = "jop_title, company_name, jop_type\n"
f.write(header)


34

In [18]:
for container in containers:
    jtitle = container.findAll("h2",{"class":"css-m604qf"})
    jop_title = jtitle[0].text.strip()
    
    cname = container.findAll("a",{"class":"css-17s97q8"})
    company_name = cname[0].text.strip()
    
    # Use the loop variable; containers[0] would repeat the first job's type on every row
    jtype = container.findAll("div",{"class":"css-1lh32fc"})
    jop_type = jtype[0].text.strip()
    
#     print(jop_title)
#     print(company_name)
#     print(jop_type)
#     print()
#     print(jop_title+", "+company_name+", "+jop_type)
#     print()
    f.write(jop_title +", "+ company_name +", "+ jop_type +"\n")
f.close()
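
Joining fields with `", "` breaks as soon as a job title or company name contains a comma, and it leaves a leading space in every field after the first. The standard `csv` module handles quoting automatically. Below is a minimal sketch that writes to an in-memory buffer so it runs without touching the disk. The first sample row comes from the values shown earlier in this notebook, the second row is invented to include a comma, and the `rstrip(" -")` cleanup of the trailing dash is an assumption about how the final data should look.

```python
import csv
import io

rows = [
    ("Senior Data Scientist", "Fixed Solutions -", "Full Time"),
    ("Data Scientist, NLP", "Example Co -", "Part Time"),  # invented row with a comma
]

# Swap the buffer for open(path, "w", newline="", encoding="utf-8") to write a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["jop_title", "company_name", "jop_type"])
for jop_title, company_name, jop_type in rows:
    # Drop the trailing " -" that follows each scraped company name.
    writer.writerow([jop_title, company_name.rstrip(" -"), jop_type])

print(buffer.getvalue())
```

The title containing a comma is wrapped in quotes in the output, so it still reads back as a single field.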

#### Inputting the File into Pandas

In [21]:
import pandas as pd
wuzzuf = pd.read_csv('data/wuzzuf_data_Scienist.csv',encoding="unicode_escape")

In [22]:
wuzzuf

|   | jop_title | company_name | jop_type |
|---|---|---|---|
| 0 | Senior Data Scientist | Fixed Solutions - | Full Time |
| 1 | Senior Data Scientist | BBI-Consultancy - | Full Time |
| 2 | Data Scientist | Confidential - | Full Time |
| 3 | Data Scientist | Seoudi Supermarket - | Full Time |
| 4 | Data Scientist Mathematics/Physics | Dafa - | Full Time |
| 5 | Junior Data Scientist | Limitless Labs - | Full Time |
| 6 | Senior Data Engineer | Edentech - | Full Time |
| 7 | Data Annotation | GAP CLOUD - | Full Time |
| 8 | Business Intelligence Specialist | Perfect Presentation - | Full Time |
| 9 | Offline Marketing Executive | Hamza Group - | Full Time |


In [23]:
wuzzuf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   jop_title      12 non-null     object
 1    company_name  12 non-null     object
 2    jop_type      12 non-null     object
dtypes: object(3)
memory usage: 416.0+ bytes
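
The `info()` output lists the second and third columns as ` company_name` and ` jop_type`, each with a leading space, because the header and rows were written with `", "` as the separator. One way to handle this at read time is `skipinitialspace=True`, which tells `read_csv` to skip spaces after each delimiter. The sketch below uses an in-memory copy of the first two rows rather than the real file. The names `csv_text` and `df` are illustrative.

```python
import io
import pandas as pd

csv_text = (
    "jop_title, company_name, jop_type\n"
    "Senior Data Scientist, Fixed Solutions -, Full Time\n"
    "Senior Data Scientist, BBI-Consultancy -, Full Time\n"
)

# skipinitialspace=True drops the space that follows each comma.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
print(list(df.columns))
```

With clean column names, expressions such as `df["company_name"]` work without having to remember the leading space.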


In [None]:
## Thank You