<a href="https://colab.research.google.com/github/Lilwm/Web_Scraping_Python/blob/main/Web_Scraping_with_Python_Afterwork.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<font color='#2F4F4F'>To use this notebook on Colaboratory, you will need to make a copy of it. Go to File > Save a Copy in Drive. You can then use the new copy that will appear in the new tab.</font>


# <font color='#2F4F4F'>AfterWork Data Science: Web Scraping with Python</font>

### <strong><em>Problem Statement</em></strong>
Many job listings never reach the job seekers they are meant for. As a data scientist on this project team, your task is to scrape job titles and links and combine them into a single table that your team members can use to build a job aggregator.
<br>
You will scrape data from the following three technology job pages:

*   PigiaMe: https://www.pigiame.co.ke/it-software-jobs
*   MyJobMag: https://www.myjobmag.co.ke/jobs-by-field/information-technology
*   KenyaJob: https://www.kenyajob.com/job-vacancies-search-kenya?f%5B0%5D=im_field_offre_secteur%3A133

## <font color='#2F4F4F'>Prerequisites</font>

In [None]:
# We first import the required libraries
# ---
#
import pandas as pd             # library for data manipulation
import requests                 # library for fetching a web page
from bs4 import BeautifulSoup   # library for extracting contents from a web page

## <font color='#2F4F4F'>Step 1: Obtaining our Data</font>

In [None]:
# PigiaMe: https://www.pigiame.co.ke/it-software-jobs
# ---
#
pigia_me = requests.get('https://www.pigiame.co.ke/it-software-jobs')
pigia_me

<Response [200]>

In [None]:
# MyJobMag: https://www.myjobmag.co.ke/jobs-by-field/information-technology
# ---
#
my_job_mag = requests.get('https://www.myjobmag.co.ke/jobs-by-field/information-technology')
my_job_mag

<Response [200]>

In [39]:
# KenyaJob: https://www.kenyajob.com/job-vacancies-search-kenya?f%5B0%5D=im_field_offre_secteur%3A133
# ---
#
kenyan_job = requests.get('https://www.kenyajob.com/job-vacancies-search-kenya?f%5B0%5D=im_field_offre_secteur%3A133#')
kenyan_job


<Response [200]>

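The three requests above use the defaults of `requests.get`, which sets no timeout and does not raise an error when the server answers with a 4xx or 5xx status. A minimal sketch of a more defensive fetch follows. The helper names `make_session` and `fetch_html` and the User-Agent string are hypothetical and not part of the original notebook.

```python
import requests


def make_session(user_agent="afterwork-job-scraper/0.1"):
    """Create a session that sends the same (hypothetical) User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_html(session, url, timeout=10):
    """Return the page body as text, raising if the server answers with an error status."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return response.text
```

Calling `fetch_html(make_session(), 'https://www.pigiame.co.ke/it-software-jobs')` would return the HTML text that is handed to `BeautifulSoup` in Step 2.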
## <font color='#2F4F4F'>Step 2: Parsing</font>

In [None]:
# Parsing our document: pigia_me
# ---
# 
soup_pigia = BeautifulSoup(pigia_me.text, 'html.parser')

In [None]:
# Parsing our document: my_job_mag
# ---
#  
soup_jobmag = BeautifulSoup(my_job_mag.text, 'html.parser')

In [40]:
# Parsing our document: kenyan_job
# ---
# 
soup_kenyan_job = BeautifulSoup(kenyan_job.text, "html.parser")

## <font color='#2F4F4F'>Step 3: Extracting Required Elements</font>

In [None]:
# 1. Extracting job titles and links: pigia me
# ---
# 
titles_pigia = soup_pigia.find_all('div', {'class':'listing-card__header__title'})
get_links_pigia = soup_pigia.find_all('a', {'class': 'listing-card__inner'})

jobs_pigia = []
links_pigia = []

for item in titles_pigia:
  text = item.get_text().replace('\n', '').split('-')
  jobs_pigia.append(text[0])

for link in get_links_pigia:
  url_link = link.get('href')
  links_pigia.append(url_link)

# check the length of the two lists to confirm each job title has a link
print(len(jobs_pigia))
print(len(links_pigia))

10
10
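
Collecting titles and links in two separate `find_all` passes only works while both lists stay the same length and in the same order. One way to keep each title tied to its link is to loop over the cards once. The sketch below runs on a made-up HTML snippet and assumes the title `div` sits inside the `a.listing-card__inner` element, which should be checked against the live page.

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics the class names used above; the real page may differ.
sample_html = """
<a class="listing-card__inner" href="https://example.com/listings/frontend-developer">
  <div class="listing-card__header__title">
    Frontend Developer - Nairobi
  </div>
</a>
<a class="listing-card__inner" href="https://example.com/listings/it-admin">
  <div class="listing-card__header__title">
    Assistant IT Administrator
  </div>
</a>
"""

soup = BeautifulSoup(sample_html, "html.parser")

rows = []
for card in soup.find_all("a", class_="listing-card__inner"):
    title_tag = card.find("div", class_="listing-card__header__title")
    if title_tag is None:
        continue  # skip cards without a title instead of misaligning the lists
    # Same rule as the cell above: keep the text before the first hyphen.
    title = title_tag.get_text(strip=True).split("-")[0].strip()
    rows.append((title, card.get("href")))

print(rows)
```

Because the title is cut at the first hyphen, a title that itself contains a hyphen would be shortened, so that rule may need adjusting for real listings.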


In [None]:

# 2. Extracting job titles and links: my_job_mag
# ---
#
# find the tag where the jobs are actually listed in the page
my_job_mag_jobs = soup_jobmag.find_all('li', class_ = "job-list-li")

# create two lists to store the job titles and links
jobs_jobmag = []
links_jobmag = []

# iterate through each job in the found jobs
for job in my_job_mag_jobs:
    # check to confirm that the h2 tag in each job does not return a None type
    if job.h2:
      # find the job title text in the h2 tag and strip for any whitespaces
      title = job.h2.text.strip()
      # read the relative link from the a tag inside the h2 tag and concatenate it to the website URL to get the complete link
      link = 'https://www.myjobmag.co.ke' + job.h2.a['href']
      
      # append results to the respective lists
      jobs_jobmag.append(title)
      links_jobmag.append(link)

# check the length of the two lists to confirm each job title has a link
print(len(jobs_jobmag))
print(len(links_jobmag))

18
18
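
Building the full link by string concatenation assumes every `href` is a relative path that starts with `/`. The standard library's `urllib.parse.urljoin` also handles the other cases. The sketch below uses hypothetical `href` values to show how each case resolves.

```python
from urllib.parse import urljoin

base_url = "https://www.myjobmag.co.ke"

# Hypothetical href values, chosen to cover the three common cases.
hrefs = [
    "/job/example-role",                   # relative path with a leading slash
    "job/example-role",                    # relative path without a leading slash
    "https://other.example/job/external",  # already an absolute URL
]

full_links = [urljoin(base_url, href) for href in hrefs]
for link in full_links:
    print(link)
```

Plain concatenation would produce a broken URL for the second and third values, while `urljoin` leaves an absolute URL unchanged and inserts the missing slash.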


In [45]:
# 3. Extracting job titles and links: kenya_job
# ---
#

kj_url = 'https://www.kenyajob.com'
# create two lists to store the job titles and links
kenyan_job_link_content = []
kenyan_job_link_url = []
# get tag where the jobs are listed
kenyan_job_results = soup_kenyan_job.find_all('div', {'class': "col-lg-5 col-md-5 col-sm-5 col-xs-12 job-title"})

for t in kenyan_job_results:
  # append results to the respective lists
  kenyan_job_link_content.append(t.a.get_text()) 
  kenyan_job_link_url.append(kj_url + t.a.get('href'))

# check the length of the two lists to confirm each job title has a link
print(len(kenyan_job_link_content))
print(len(kenyan_job_link_url))

25
25
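
Passing the whole class string `"col-lg-5 col-md-5 col-sm-5 col-xs-12 job-title"` to `find_all` matches only elements whose `class` attribute is exactly that string. A CSS selector on the single `job-title` class is less sensitive to layout classes changing. The sketch below runs on made-up markup, so the nesting of the `a` tag is an assumption to verify against the live page.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Made-up markup: the second block lists its classes differently on purpose.
sample_html = """
<div class="col-lg-5 col-md-5 col-sm-5 col-xs-12 job-title">
  <h5><a href="/job-vacancies-kenya/example-cloud-engineer">Cloud Engineer (M/F)</a></h5>
</div>
<div class="job-title col-xs-12">
  <h5><a href="/job-vacancies-kenya/example-java-developer">JAVA Developer (M/F)</a></h5>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
base_url = "https://www.kenyajob.com"

# Select every link inside a div that carries the job-title class.
jobs = [
    (a.get_text(strip=True), urljoin(base_url, a["href"]))
    for a in soup.select("div.job-title a[href]")
]
print(jobs)
```

Both blocks are matched here, whereas the exact class string would match only the first one.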


## <font color='#2F4F4F'>Step 4: Saving our Data</font>

In [49]:
# Saving the scraped contents in a DataFrame and previewing our data
# ---
#
# combine the lists from the three sites into one list of titles and one list of links
job_titles = jobs_pigia + jobs_jobmag + kenyan_job_link_content
url_links = links_pigia + links_jobmag + kenyan_job_link_url

# create a pandas DataFrame using our combined lists
df = pd.DataFrame({"Job Title": job_titles, "link_url": url_links})

# preview 10 random jobs from the DataFrame
df.sample(10)

|    | Job Title | link_url |
|----|-----------|----------|
| 3  | Frontend Developer Reframe | https://www.pigiame.co.ke/listings/frontend-de... |
| 39 | Cloud System Administrator (M/F) | https://www.kenyajob.com/job-vacancies-kenya/c... |
| 19 | Full Stack Developer at SunCulture Kenya Ltd | https://www.myjobmag.co.ke/job/full-stack-deve... |
| 13 | Mid Machine Learning Engineer at Azenia | https://www.myjobmag.co.ke/job/mid-machine-lea... |
| 31 | Information Technology Sales Specialist | https://www.kenyajob.com/job-vacancies-kenya/i... |
| 37 | Cloud Engineer (M/F) | https://www.kenyajob.com/job-vacancies-kenya/c... |
| 5  | Assistant IT Administrator | https://www.pigiame.co.ke/listings/assistant-i... |
| 10 | Product Engineer at Access Afya | https://www.myjobmag.co.ke/job/product-enginee... |
| 49 | JAVA Developer (M/F) | https://www.kenyajob.com/job-vacancies-kenya/j... |
| 48 | IOS Developer (M/F) | https://www.kenyajob.com/job-vacancies-kenya/i... |
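
The combined table above does not record which site each row came from, and it lives only in memory. A possible extension, sketched here with made-up rows and `example.com` links, adds a `source` column, drops repeated links, and serialises the result as CSV. The `source` column and the `scraped` dictionary are assumptions and not part of the original notebook.

```python
import pandas as pd

# Made-up (title, link) pairs standing in for the three scraped lists.
scraped = {
    "PigiaMe": [("Example Frontend Developer", "https://example.com/pigiame/1")],
    "MyJobMag": [
        ("Example Data Engineer", "https://example.com/myjobmag/1"),
        ("Example Data Engineer", "https://example.com/myjobmag/1"),  # duplicate row
    ],
    "KenyaJob": [("Example Cloud Engineer", "https://example.com/kenyajob/1")],
}

# Build one frame per site and tag each row with the site name.
frames = [
    pd.DataFrame(rows, columns=["Job Title", "link_url"]).assign(source=site)
    for site, rows in scraped.items()
]

# Stack the frames, keep the first occurrence of each link, and renumber the rows.
jobs_df = (
    pd.concat(frames, ignore_index=True)
    .drop_duplicates(subset="link_url")
    .reset_index(drop=True)
)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = jobs_df.to_csv(index=False)
print(csv_text)
```

Passing a file name, for example `jobs_df.to_csv("jobs.csv", index=False)`, would write the same content to disk so team members can load it later.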
