# Project 3: Sharing Dataset on Kaggle

## Introduction
> ### About Bayt.com
> Bayt.com is the leading job site in the Middle East and North Africa, connecting job seekers with employers looking to hire.

#### https://www.kaggle.com/haninalmarshad/bayt-com-webscraping

#### https://www.kaggle.com/dataset/84b8879e36c9dd9b7276837a7bc035170fd6438689816cd0a2967fbf6cf8b498


## Part 1: Web Scraping

### Import

In [1]:
import requests
import re
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
from lxml import html
import time
from datetime import datetime

### Define the Request Session

In [2]:
session = requests.session()

### Get Links to Jobs by Role in Saudi Arabia

In [3]:
roles_list = []
page = session.get('https://www.bayt.com/en/saudi-arabia/')
soup = BeautifulSoup(page.text, 'html.parser')
# each role link on the landing page carries this exact class string
links = soup.find_all('a', class_='t-regular-m p10y u-block')
for link in links:
    roles_list.append(link['href'])
roles_list

['https://www.bayt.com/en/saudi-arabia/jobs/roles/administration/',
 'https://www.bayt.com/en/saudi-arabia/jobs/roles/engineering/',
 'https://www.bayt.com/en/saudi-arabia/jobs/roles/hospitality-tourism/',
 'https://www.bayt.com/en/saudi-arabia/jobs/roles/human-resources-recruitment/',
 'https://www.bayt.com/en/saudi-arabia/jobs/roles/information-technology/',
 'https://www.bayt.com/en/saudi-arabia/jobs/roles/maintenance-repair-technician/',
 'https://www.bayt.com/en/saudi-arabia/jobs/roles/management/',
 'https://www.bayt.com/en/saudi-arabia/jobs/roles/marketing-pr/',
 'https://www.bayt.com/en/saudi-arabia/jobs/roles/medical-healthcare-nursing/',
 'https://www.bayt.com/en/saudi-arabia/jobs/roles/quality-control/',
 'https://www.bayt.com/en/saudi-arabia/jobs/roles/sales/',
 'https://www.bayt.com/en/saudi-arabia/jobs/roles/teaching-academics/']
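Each role URL in the list above ends with the role's slug. The following is a minimal sketch, assuming one wants to label scraped jobs with their role later on; `role_from_url` is a hypothetical helper and not part of the original notebook.

```python
from urllib.parse import urlparse

def role_from_url(url):
    """Return the role slug (last non-empty path segment) of a role URL."""
    path = urlparse(url).path  # e.g. '/en/saudi-arabia/jobs/roles/sales/'
    segments = [s for s in path.split('/') if s]
    return segments[-1]

# two URLs copied from the roles_list output above
example_roles = [
    'https://www.bayt.com/en/saudi-arabia/jobs/roles/administration/',
    'https://www.bayt.com/en/saudi-arabia/jobs/roles/marketing-pr/',
]
role_names = [role_from_url(u) for u in example_roles]
print(role_names)
```

Because only the path is inspected, the helper should also work on the paginated and per-job URLs built in the later cells, where a query string follows the slug.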

### Get the List of Page Links for Each Role

In [4]:
pages_list = []

start_time = datetime.now()

for url in roles_list:
    #print(url)
    page = session.get(url)
    # parse the page using the lxml html module
    tree = html.fromstring(page.content)
    # XPath to the span that holds the result count
    path_1 = '/html/body/div[2]/section[2]/div[1]/div/div[1]/div[1]/span/text()'
    
    # the span text looks like "xxx Jobs Found: Showing 1 - 20",
    # where xxx is the number of jobs found for the role
    
    jobs_num = tree.xpath(path_1)[0].split('Jobs')[0]
    # strip a thousands separator, in case the count is formatted with one
    jobs_num = int(jobs_num.replace(',', ''))
    
    # ceiling division with 20 jobs per page, so that an exact multiple
    # of 20 does not produce an extra, empty page
    pages_num = (jobs_num + 19) // 20
    
    # iterate over pages and append each page URL to pages_list
    for i in range(1, pages_num + 1):
        page_url = url + '?page=' + str(i)
        pages_list.append(page_url)
    
print( ' Run Time  '  , datetime.now() - start_time)

 Run Time   0:00:18.753400
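The page count is a ceiling division of the job count by the page size. Here is a small standalone sketch of that arithmetic, assuming the banner text has the form "xxx Jobs Found: Showing 1 - 20" as noted in the cell; `count_pages` and `page_urls` are illustrative names, not functions from the notebook.

```python
import re

JOBS_PER_PAGE = 20  # assumption taken from the "Showing 1 - 20" banner text

def count_pages(banner, per_page=JOBS_PER_PAGE):
    """Parse 'xxx Jobs Found: ...' and return the number of result pages."""
    match = re.search(r'(\d[\d,]*)\s*Jobs', banner)
    if match is None:
        raise ValueError(f'no job count found in {banner!r}')
    jobs_num = int(match.group(1).replace(',', ''))
    # ceiling division without floats: -(-a // b) rounds up
    return -(-jobs_num // per_page)

def page_urls(role_url, pages_num):
    """Build the '?page=N' URLs for one role, starting at page 1."""
    return [f'{role_url}?page={i}' for i in range(1, pages_num + 1)]

pages = count_pages('45 Jobs Found: Showing 1 - 20')
urls = page_urls('https://www.bayt.com/en/saudi-arabia/jobs/roles/sales/', pages)
print(pages, urls[-1])
```

With this form, 40 jobs yield 2 pages and 41 jobs yield 3, so no request is spent on a page past the last result.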


### Get List of links for each job page

In [5]:

jobs_list = []

start_time = datetime.now()

## iterate over pages
for url in pages_list:
    #print(url)
    page = session.get(url)
    # if the request failed, wait and retry until it succeeds
    while page.status_code != 200 :
        print(page)
        print(" sleep for 20 seconds ...")
        time.sleep(20)
        page = session.get(url)
    # the request succeeded, so parse the listing page
    soup = BeautifulSoup(page.text, 'html.parser')
    jobIDs = soup.find_all('a', {'data-js-aid':"jobID"})
    
    ## build a job URL from each 'a' tag's job id and append it to jobs_list
    for jobID in jobIDs:
        page_url = url + '&jobId=' +jobID['data-job-id']
        jobs_list.append(page_url)
    
print( '\n Run Time  '  , datetime.now() - start_time)


 Run Time   0:03:57.867485


In [6]:
len(jobs_list)

3469
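The retry loop in the cell above keeps requesting until it gets a 200 response, which can hang indefinitely if a URL is permanently broken. One possible variation is a bounded retry with a growing wait. This is a sketch under stated assumptions: `get_with_retry`, `max_tries`, and the linear backoff are hypothetical choices, not what the notebook ran.

```python
import time

def get_with_retry(session, url, max_tries=5, wait=20, sleep=time.sleep):
    """GET url, retrying on non-200 responses; give up after max_tries."""
    for attempt in range(1, max_tries + 1):
        page = session.get(url)
        if page.status_code == 200:
            return page
        if attempt < max_tries:
            # linear backoff: wait, 2 * wait, 3 * wait, ...
            sleep(wait * attempt)
    raise RuntimeError(f'{url} still failing after {max_tries} tries')
```

Inside the scraping loops, `page = get_with_retry(session, url)` could stand in for the `session.get` call plus the `while` block. The `sleep` parameter exists only so the waiting can be swapped out when trying the function without real delays.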

### Get job details 

In [7]:
jobs =[]
start_time = datetime.now()

## iterate over job URLs
for url in jobs_list:
    #print(url)
    page = session.get(url)    
    # check if page request was successful if not sleep then send another request
    while page.status_code != 200 :
        print(page)
        print(" sleep for 20 seconds ...")
        time.sleep(20)
        page = session.get(url)
    
    soup = BeautifulSoup(page.text, 'html.parser')
    
    # try to get the job details from its page
    job_info ={}
    try:
        job_info ['Job ID']= url.split('jobId=')[1]
        job_info ['Title'] = soup.find('h2',class_= "t-large").text.strip()
        job_info ['Job URL']= url
        company = soup.find('a', class_ = 'is-black')
        job_info['Company'] = company.text
        job_info['Company_URL'] = 'https://www.bayt.com' + company['href']  
        job_info['Date Posted'] = soup.find('span', class_ = 'p20l-d p10y-m u-block-m').text.strip()
        job_info['Job Description'] = soup.find('div', class_= 'card-content t-small bt p20').text.split('Job Details')[0]
        
        # the detail fields differ from one page to another, so pair each
        # <dt> label with its <dd> value instead of assuming fixed keys
        details_titles = soup.find_all('dt')
        details = soup.find_all('dd')
        
        for dt, dd in zip(details_titles, details):
            job_info[dt.text] = dd.text

        tags = soup.find_all('a', class_ = 'tag is-outline m10b')
        tags_list = []
        for tag in tags:
            tags_list.append(tag.text)

        job_info['Tags'] = tags_list

        jobs.append(job_info)
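    except Exception:
        # a missing element (soup.find returned None) usually means the
        # posting is no longer available; skip it and keep going
        print(url)
        print('Closed or Expired Job Posting')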
    #print( '\n Time  '  , datetime.now() - start_time)
    
print( '\n Total Run Time  '  , datetime.now() - start_time)


<Response [502]>
 sleep for 20 seconds ...
https://www.bayt.com/en/saudi-arabia/jobs/roles/sales/?page=10&jobId=3685821
Closed or Expired Job Posting

 Total Run Time   1:24:04.846090


In [8]:
len(jobs)

3468
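The job ID is read from the `jobId` query parameter of each URL. If a posting happens to be listed under more than one role, it would be scraped more than once. The sketch below shows one way to read the ID with the standard library and drop repeats before scraping; `job_id_from_url` and `dedupe_by_job_id` are hypothetical helpers, and whether duplicates actually occur in this dataset is not something the notebook checks.

```python
from urllib.parse import urlparse, parse_qs

def job_id_from_url(url):
    """Return the value of the 'jobId' query parameter as a string."""
    query = parse_qs(urlparse(url).query)
    return query['jobId'][0]

def dedupe_by_job_id(urls):
    """Keep the first URL seen for each job ID, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        job_id = job_id_from_url(url)
        if job_id not in seen:
            seen.add(job_id)
            unique.append(url)
    return unique

sample = [
    'https://www.bayt.com/en/saudi-arabia/jobs/roles/sales/?page=10&jobId=3685821',
    'https://www.bayt.com/en/saudi-arabia/jobs/roles/management/?page=2&jobId=3685821',
    'https://www.bayt.com/en/saudi-arabia/jobs/roles/sales/?page=1&jobId=4177110',
]
unique_urls = dedupe_by_job_id(sample)
print(len(unique_urls))
```

The second sample URL is made up for illustration: it reuses a job ID from the notebook's output under a different role path.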

### Data frame to CSV

In [11]:
jobs_df = pd.DataFrame(jobs)
# 'Job ID' is already a column, so write the file without the index
jobs_df.to_csv('../data/jobs_bayt_2023.csv', index=False)
jobs_df

Unnamed: 0,Job ID,Title,Job URL,Company,Company_URL,Date Posted,Job Description,Job Location,Company Industry,Company Type,...,Employment Type,Monthly Salary Range,Number of Vacancies,Career Level,Years of Experience,Residence Location,Degree,Tags,Age,Gender
0,4177110,OPERATIONS OFFICER,https://www.bayt.com/en/saudi-arabia/jobs/role...,NBP,https://www.bayt.com/en/company/nbp-1891015/,Date Posted: Apr 15,\nJob Description\nCustomer Service: Ensure pr...,"Riyadh, Saudi Arabia",Banking,Employer (Public Sector),...,Full Time Employee,Unspecified,Unspecified,Entry Level,Min: 1,"Riyadh,Saudi Arabia",Bachelor's degree / higher diploma,"[Banking, Business Administration, Operation, ...",,
1,4177085,Branch Administration Manager,https://www.bayt.com/en/saudi-arabia/jobs/role...,Kinetic Business Solutions,https://www.bayt.com/en/company/kinetic-busine...,Date Posted: Apr 15,\nJob Description\nOur client is a conglomerat...,"Riyadh, Saudi Arabia",Medical Clinic,Recruitment Agency,...,Full Time Employee,Unspecified,1,Mid Career,Min: 5,,,"[Hospital Operations, Community Health, Commun...",,
2,4174537,Admin Assistant(Analytical Department),https://www.bayt.com/en/saudi-arabia/jobs/role...,Jubail,https://www.bayt.com/en/saudi-arabia/jobs/loca...,Date Posted: Apr 15,\nJob Description\n1. Keep a record of appoint...,"Jubail, Saudi Arabia",Oil & Gas,Employer (Private Sector),...,Unspecified,Unspecified,Unspecified,Mid Career,,,,[Administration],,
3,4177016,فنيّ معلومات ومطور برامج,https://www.bayt.com/en/saudi-arabia/jobs/role...,Jeddah,https://www.bayt.com/en/saudi-arabia/jobs/loca...,Date Posted: Apr 15,\nJob Description\nفنيّ معلومات ومطور برامج\nS...,"Jeddah, Saudi Arabia",Facilities & Property Management; Corporate Ma...,Employer (Private Sector),...,Unspecified,Unspecified,Unspecified,Mid Career,Min: 1 Max: 5,,,"[Applications Support, Email Management, Web E...",Min: 18 Max: 40,
4,4177035,Admin Assistant,https://www.bayt.com/en/saudi-arabia/jobs/role...,Jubail,https://www.bayt.com/en/saudi-arabia/jobs/loca...,Date Posted: Apr 15,\nJob Description\nprovide general administrat...,"Jubail, Saudi Arabia",Oil & Gas,Employer (Private Sector),...,Unspecified,Unspecified,Unspecified,Mid Career,,,,[Administration],,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3463,51205039,Assistant / Associate / Full Professor in Comp...,https://www.bayt.com/en/saudi-arabia/jobs/role...,King Fahd University of Petroleum and Minerals,https://www.bayt.com/en/company/king-fahd-univ...,Date Posted: Aug 30,\nJob Description\n\t\tThe Prep year program “...,Saudi Arabia,Other Business Support Services,Unspecified,...,Unspecified,Unspecified,Unspecified,,,,,[],,
3464,51205040,Assistant Professor in English. (Saudi nationa...,https://www.bayt.com/en/saudi-arabia/jobs/role...,King Fahd University of Petroleum and Minerals,https://www.bayt.com/en/company/king-fahd-univ...,Date Posted: Aug 30,\nJob Description\n\t\tThe Department of Prep-...,Saudi Arabia,Other Business Support Services,Unspecified,...,Unspecified,Unspecified,Unspecified,,,,,[],,
3465,51205046,Assistant / Associate / Full Professor in Phys...,https://www.bayt.com/en/saudi-arabia/jobs/role...,King Fahd University of Petroleum and Minerals,https://www.bayt.com/en/company/king-fahd-univ...,Date Posted: Aug 30,\nJob Description\n\t\tThe Prep year program “...,Saudi Arabia,Other Business Support Services,Unspecified,...,Unspecified,Unspecified,Unspecified,,,,,[],,
3466,51205048,Graduate assistant in Mathematics and Statisti...,https://www.bayt.com/en/saudi-arabia/jobs/role...,King Fahd University of Petroleum and Minerals,https://www.bayt.com/en/company/king-fahd-univ...,Date Posted: Aug 30,\nJob Description\n\t\tThe Prep year program “...,Saudi Arabia,Other Business Support Services,Unspecified,...,Unspecified,Unspecified,Unspecified,,,,,[],,
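Several columns in the output above still carry scraping residue: a `Date Posted:` prefix, a leading `Job Description` label, and ranges written as `Min: 1 Max: 5`. A minimal cleaning sketch follows, assuming these string patterns hold across the whole dataset; the three helper names are illustrative and not part of the notebook.

```python
import re

def clean_date_posted(text):
    """'Date Posted: Apr 15' -> 'Apr 15'."""
    return text.replace('Date Posted:', '').strip()

def clean_description(text):
    """Drop the leading 'Job Description' label and surrounding whitespace."""
    return text.replace('Job Description', '', 1).strip()

def parse_min_max(text):
    """'Min: 1 Max: 5' -> (1, 5); a missing bound or cell becomes None."""
    if not isinstance(text, str):
        return (None, None)  # NaN or otherwise missing cell
    low = re.search(r'Min:\s*(\d+)', text)
    high = re.search(r'Max:\s*(\d+)', text)
    return (int(low.group(1)) if low else None,
            int(high.group(1)) if high else None)

print(clean_date_posted('Date Posted: Apr 15'))
print(parse_min_max('Min: 18 Max: 40'))
```

On the DataFrame, each helper could be applied column-wise, for example `jobs_df['Date Posted'].map(clean_date_posted)`.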
