<font color='#2F4F4F'>To use this notebook on Colaboratory, you will need to make a copy of it. Go to File > Save a Copy in Drive. You can then use the new copy that will appear in the new tab.</font>


# <font color='#2F4F4F'>AfterWork Data Science: Web Scraping with Python</font>

**Web Scraping with Python Project**

**Project Deliverables**

You are required to submit the following deliverable:

● A Python notebook with your solution.

**Instructions**

**Background Information**

Together with a team of startup entrepreneurs, you decide to work on an idea that could
change the way people search for jobs. You believe that job scraping could be the next
big thing because many people in the country, in this case Kenya, are actively looking
for jobs.

**Problem Statement**

The problem is that many job listings do not get visits from their target job
seekers. As the data scientist on the team, your task is to scrape job titles and
links and put them in a single table that your team members can use to build a
job aggregator.
You will be required to scrape data from the following three technology job pages:

● PigiaMe: https://www.pigiame.co.ke/it-software-jobs

● MyJobMag: https://www.myjobmag.co.ke/jobs-by-field/information-technology

● KenyaJob: https://www.kenyajob.com/job-vacancies-search-kenya?f%5B0%5D=im_field_offre_secteur%3A133

Your deliverable will be a Python script that performs the task described above. You can
use the following guiding notebook to get started [Link].

## <font color='#2F4F4F'>Prerequisites</font>

In [1]:
# We first import the required libraries
# ---
#
import pandas as pd             # library for data manipulation
import requests                 # library for fetching a web page
from bs4 import BeautifulSoup   # library for extracting contents from a webpage

## <font color='#2F4F4F'>Step 1: Obtaining our Data</font>

In [2]:
# PigiaMe: https://www.pigiame.co.ke/it-software-jobs
# ---
#
pigia_me = requests.get('https://www.pigiame.co.ke/it-software-jobs')
pigia_me

<Response [200]>

In [4]:
# MyJobMag: https://www.myjobmag.co.ke/jobs-by-field/information-technology
# ---
#
myjobmag = requests.get('https://www.myjobmag.co.ke/jobs-by-field/information-technology')
myjobmag


<Response [200]>

In [5]:
# KenyaJob: https://www.kenyajob.com/job-vacancies-search-kenya?f%5B0%5D=im_field_offre_secteur%3A133
# ---
#
kenyanjob = requests.get('https://www.kenyajob.com/job-vacancies-search-kenya?f%5B0%5D=im_field_offre_secteur%3A133')
kenyanjob

<Response [200]>
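
The cells above only display `<Response [200]>`, so a failed or hanging request would go unnoticed until the parsing step. A minimal sketch of a more defensive fetch is shown below. The helper name `fetch_html`, the 10-second timeout, and the `User-Agent` string are assumptions made for illustration and are not part of the original notebook.

```python
import requests

# Hypothetical header value; some sites may reject the default client agent.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; job-scraper-demo)"}


def fetch_html(url, session=None, timeout=10):
    """Return the HTML of `url` as text, raising on HTTP errors.

    `session` is any object with a requests-style `get` method; it
    defaults to the `requests` module itself.
    """
    client = session if session is not None else requests
    response = client.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    response.raise_for_status()  # raises an exception on 4xx/5xx responses
    return response.text
```

With this helper, a call such as `fetch_html('https://www.pigiame.co.ke/it-software-jobs')` would return the page text directly, or raise instead of silently handing an error page to the parser.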

## <font color='#2F4F4F'>Step 2: Parsing</font>

In [6]:
# Parsing our document: pigia_me
# ---
# 
SoupPigiaMe = BeautifulSoup(pigia_me.text, 'html.parser')

In [7]:
# Parsing our document: my_job_mag
# ---
#  
SoupMyJobMag = BeautifulSoup(myjobmag.text, 'html.parser')


In [8]:
# Parsing our document: kenyan_job
# ---
# 
SoupKenyanJob = BeautifulSoup(kenyanjob.text, "html.parser")
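
To see what a parsed `BeautifulSoup` object exposes before working on the live pages, the sketch below parses a small inline HTML string. The markup and the `card` class name are made up for illustration and do not come from any of the three sites.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for a downloaded page.
sample_html = """
<html>
  <head><title>IT Jobs</title></head>
  <body>
    <div class="card"><a href="/jobs/1">Data Analyst</a></div>
    <div class="card"><a href="/jobs/2">Web Developer</a></div>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

page_title = soup.title.text                  # text of the <title> tag
cards = soup.find_all("div", class_="card")   # every matching <div>
first_link = cards[0].a["href"]               # attribute access on a tag

print(page_title, len(cards), first_link)
```

The same three operations (finding tags, reading text, reading attributes) are what Step 3 relies on.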

## <font color='#2F4F4F'>Step 3: Extracting Required Elements</font>

In [11]:
# 1. Extracting job titles and links: pigia me
# ---
# find the tag
pigia_me_jobs = SoupPigiaMe.find_all('div', class_ = "listings-cards__list-item ")

# create two lists
pigia_me_job_title = []
pigia_me_job_link = []

# iterate through the listing cards
for job in pigia_me_jobs:
  title = job.find('div', class_ = 'listing-card__header__title').text.strip()
  link = job.a['href']
  pigia_me_job_title.append(title)
  pigia_me_job_link.append(link)

print(pigia_me_job_title)
print(pigia_me_job_link)


['Software Developer', 'Software Development Java, nodeJs (Onsite Nyeri) entry level', 'Frontend Developer Reframe', 'Chatbot Developer', 'Assistant IT Administrator', 'Frontend Developer', 'CRM Enginee', 'Analyst Programmer', 'Senior Game Developer', 'Lead Full Stack Engineer']
['https://www.pigiame.co.ke/listings/software-developer-5478002', 'https://www.pigiame.co.ke/listings/software-development-java-nodejs-onsite-nyeri-entry-level-5474653', 'https://www.pigiame.co.ke/listings/frontend-developer-reframe-5473981', 'https://www.pigiame.co.ke/listings/chatbot-developer-5472628', 'https://www.pigiame.co.ke/listings/assistant-it-administrator-5471542', 'https://www.pigiame.co.ke/listings/frontend-developer-5471441', 'https://www.pigiame.co.ke/listings/crm-enginee-5471294', 'https://www.pigiame.co.ke/listings/analyst-programmer-5469047', 'https://www.pigiame.co.ke/listings/senior-game-developer-5468130', 'https://www.pigiame.co.ke/listings/lead-full-stack-engineer-5467710']
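
The loop above assumes that every card contains both a title element and a link; a card without either would raise an exception. The following is a minimal sketch of a more tolerant version, run against made-up markup. The function name `extract_jobs` and the selectors `div.item` and `div.title` are hypothetical and are not taken from PigiaMe.

```python
from bs4 import BeautifulSoup


def extract_jobs(soup, card_selector, title_selector):
    """Return (titles, links) for every card that has both a title and a link."""
    titles, links = [], []
    for card in soup.select(card_selector):
        title_tag = card.select_one(title_selector)
        link_tag = card.find("a", href=True)
        if title_tag is None or link_tag is None:
            continue  # skip adverts or malformed cards instead of crashing
        titles.append(title_tag.get_text(strip=True))
        links.append(link_tag["href"])
    return titles, links


# Made-up markup; the second card has no link and is skipped.
demo = BeautifulSoup(
    """
    <div class="item"><a href="https://example.com/a"><div class="title"> Job A </div></a></div>
    <div class="item"><div class="title">Advert</div></div>
    """,
    "html.parser",
)
print(extract_jobs(demo, "div.item", "div.title"))
```

CSS selectors match individual class names, so this approach does not depend on the exact whitespace inside a `class` attribute.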


In [12]:
# 2. Extracting job titles and links: my_job_mag
# ---
# 
# find the tag of jobs listing
my_job_mag_jobs = SoupMyJobMag.find_all('li', class_ = "job-list-li")

# create two lists to store the results
my_job_mag_job_title = []
my_job_mag_job_link = []

# iterate through, skipping list items that have no <h2> heading
for job in my_job_mag_jobs:
    if job.h2:
        title = job.h2.text.strip()
        # hrefs on this page are relative, so prepend the site root
        link = 'https://www.myjobmag.co.ke' + job.h2.a['href']

        my_job_mag_job_title.append(title)
        my_job_mag_job_link.append(link)

print(my_job_mag_job_title)
print(my_job_mag_job_link)


['Software Engineering Manager at Sanergy', 'Quality Assurance Specialist at Britam', 'Software Developer at Aga Khan Hospital Kisumu', 'Software Engineer - Identity Management for Canonical Products at Canonical', 'Senior Application Security Engineer at Cellulant Corporation', 'Shopify Developer at Crystal Recruit', 'Senior Software Engineer at One Acre Fund', 'IDEMA MBSS Support Engineer at Innovative Software Technologies Ltd', 'Officer ICT for Development (ICT4D) at Concern Worldwide', 'Group Model Risk Validation Senior Manager (Financial Crime / Fraud Model / AML Model Validation) at CA Global', 'Head Digital Engineering at NCBA Group', 'Software Developer at Aga Khan Development Network (AKDN)', 'Internal Control Manager at ENGIE', 'Digital & E-Channels Support Officer at Co-operative Bank of Kenya', 'Geographical Information Systems Specialist (GIS) at Medecins Sans Frontieres (MSF)', 'Senior CRM Officer at Amref Kenya', 'Junior OPENMRS Developer at University of California Sa
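
The MyJobMag cell builds absolute links by concatenating the site root with each `href`. A small sketch of an alternative using the standard library's `urljoin` is shown below; it handles relative and already-absolute links alike. The `href` values here are hypothetical examples, not scraped data.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.myjobmag.co.ke"

# Hypothetical href values of the kind a listing page might contain.
relative_href = "/job/example-role"
absolute_href = "https://www.myjobmag.co.ke/job/example-role"

joined_relative = urljoin(BASE_URL, relative_href)
joined_absolute = urljoin(BASE_URL, absolute_href)

print(joined_relative)
print(joined_absolute)
```

Both calls produce the same absolute URL, whereas plain string concatenation would duplicate the site root in the second case.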

In [13]:
# 3. Extracting job titles and links: kenya_job
# ---
#
# find the tag of the jobs listing 
kenya_jobs = SoupKenyanJob.find_all('div', class_ = "job-description-wrapper")

# create two lists to store the results 
kenyan_job_title = []
kenyan_job_link = []

# iterate through 
for job in kenya_jobs:
  link = job['data-href']
  title = job.text.strip().split('\n')[0]

  kenyan_job_link.append(link)
  kenyan_job_title.append(title)

print(kenyan_job_link)
print(kenyan_job_title)


['https://www.kenyajob.com/job-vacancies-kenya/java-ee-java-8-developer-sql-skills-130458', 'https://www.kenyajob.com/job-vacancies-kenya/senior-freelance-web-designer-130459', 'https://www.kenyajob.com/job-vacancies-kenya/business-development-assistant-132274', 'https://www.kenyajob.com/job-vacancies-kenya/cctv-fire-alarms-systems-technician-127106', 'https://www.kenyajob.com/job-vacancies-kenya/cctv-fire-alarms-systems-technician-127107', 'https://www.kenyajob.com/job-vacancies-kenya/information-technology-sales-specialist-129253', 'https://www.kenyajob.com/job-vacancies-kenya/aws-cloud-architect-mf-129511', 'https://www.kenyajob.com/job-vacancies-kenya/aws-solutions-architect-mf-129512', 'https://www.kenyajob.com/job-vacancies-kenya/azure-solutions-architect-mf-129513', 'https://www.kenyajob.com/job-vacancies-kenya/cloud-architect-mf-129514', 'https://www.kenyajob.com/job-vacancies-kenya/cloud-computing-virtualization-engineer-mf-129515', 'https://www.kenyajob.com/job-vacancies-keny
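
The KenyaJob cell reads the link from a `data-href` attribute and takes the first line of the card text as the title. The sketch below shows one possible, more forgiving variant on made-up markup: `Tag.get` returns `None` when the attribute is missing, and `stripped_strings` yields the first non-empty text fragment without splitting on newlines. The inner `<h5>` and `<p>` tags are assumptions, not the site's real structure.

```python
from bs4 import BeautifulSoup

# Made-up markup loosely modelled on a card with a data-href attribute.
demo = BeautifulSoup(
    """
    <div class="job-description-wrapper" data-href="https://example.com/job/1">

        <h5>Cloud Architect (M/F)</h5>
        <p>Company name</p>
    </div>
    <div class="job-description-wrapper"><h5>No link here</h5></div>
    """,
    "html.parser",
)

rows = []
for job in demo.find_all("div", class_="job-description-wrapper"):
    link = job.get("data-href")               # None instead of KeyError when absent
    title = next(job.stripped_strings, None)  # first non-empty text fragment
    if link and title:
        rows.append((title, link))

print(rows)
```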

## <font color='#2F4F4F'>Step 4: Saving our Data</font>

In [14]:
# Saving the scraped contents in a DataFrame and previewing our data
# ---
#
# combine the various lists.
job_titles = pigia_me_job_title + my_job_mag_job_title + kenyan_job_title
url_links = pigia_me_job_link + my_job_mag_job_link + kenyan_job_link

# create a pandas DataFrame and preview 20 random jobs
df = pd.DataFrame({"Job Title": job_titles, "link_url": url_links})
df.sample(20)


| | Job Title | link_url |
|---|---|---|
| 20 | Head Digital Engineering at NCBA Group | https://www.myjobmag.co.ke/job/head-digital-en... |
| 41 | Cloud System Administrator (M/F) | https://www.kenyajob.com/job-vacancies-kenya/c... |
| 30 | Business Development Assistant | https://www.kenyajob.com/job-vacancies-kenya/b... |
| 40 | Cloud Microservices Architect (M/F) | https://www.kenyajob.com/job-vacancies-kenya/c... |
| 37 | Cloud Architect (M/F) | https://www.kenyajob.com/job-vacancies-kenya/c... |
| 14 | Senior Application Security Engineer at Cellul... | https://www.myjobmag.co.ke/job/senior-applicat... |
| 27 | Junior Frontend Developer at University of Cal... | https://www.myjobmag.co.ke/job/university-of-c... |
| 19 | Group Model Risk Validation Senior Manager (Fi... | https://www.myjobmag.co.ke/job/group-model-ris... |
| 13 | Software Engineer - Identity Management for Ca... | https://www.myjobmag.co.ke/job/software-engine... |
| 29 | Senior Freelance Web Designer | https://www.kenyajob.com/job-vacancies-kenya/s... |
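
The combined table does not record which site each row came from, and nothing is written to disk yet. As one possible extension, the sketch below adds a `source` column, drops rows that share a link, and writes the result as CSV. The tiny input lists, the `source` column name, and the in-memory buffer are assumptions for illustration; the real notebook would use the scraped lists and a file path.

```python
import io

import pandas as pd

# Tiny made-up lists standing in for the scraped results.
sources = {
    "PigiaMe": (["Job A", "Job B"], ["https://example.com/a", "https://example.com/b"]),
    "MyJobMag": (["Job B"], ["https://example.com/b"]),
}

# One DataFrame per site, each tagged with the site it came from.
frames = [
    pd.DataFrame({"Job Title": titles, "link_url": links, "source": name})
    for name, (titles, links) in sources.items()
]
jobs = pd.concat(frames, ignore_index=True)

# The same posting can appear more than once; keep the first occurrence.
jobs = jobs.drop_duplicates(subset="link_url").reset_index(drop=True)

# Written to an in-memory buffer here; a file path such as "jobs.csv" also works.
buffer = io.StringIO()
jobs.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Keeping the source alongside each row makes it easier for the rest of the team to trace a listing back to the page it was scraped from.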
