# Web Scraping with Python

### Background Information

Together with a team of startup entrepreneurs, you are working on an idea that could change the way people search for jobs. The team believes job scraping could be the next big thing, since many people in the country (in this case, Kenya) are actively looking for work.

### Problem Statement

The problem is that many job listings never reach the job seekers they are meant for. As the data scientist on this project, your task is to scrape job titles and links and combine them into a single table that your team members can use to build a job aggregator.

You will be required to scrape data from the following three technology job pages:

* PigiaMe: https://www.pigiame.co.ke/it-software-jobs

* MyJobMag: https://www.myjobmag.co.ke/jobs-by-field/information-technology

* KenyaJob: https://www.kenyajob.com/job-vacancies-search-kenya?f%5B0%5D=im_field_offre_secteur%3A133

Your deliverable will be a Python script that performs this task.

In [193]:
# We first import the required libraries
# ---
#
import pandas as pd            # library for data manipulation
import requests                 # library for fetching a web page
from bs4 import BeautifulSoup   # library for extracting contents from a web page


In [201]:
# PigiaMe: https://www.pigiame.co.ke/it-software-jobs
# ---
#
pigia_me = requests.get('https://www.pigiame.co.ke/it-software-jobs')
pigia_me

<Response [200]>

In [202]:
# MyJobMag: https://www.myjobmag.co.ke/jobs-by-field/information-technology
# ---
#

Myjob_mag = requests.get('https://www.myjobmag.co.ke/jobs-by-field/information-technology')
Myjob_mag

<Response [200]>

In [203]:
# KenyaJob: https://www.kenyajob.com/job-vacancies-search-kenya?f%5B0%5D=im_field_offre_secteur%3A133
# ---
#
Kenyan_job = requests.get('https://www.kenyajob.com/job-vacancies-search-kenya?f%5B0%5D=im_field_offre_secteur%3A133')
Kenyan_job

<Response [200]>
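
The three requests above assume every page responds promptly and successfully. A minimal sketch of a more defensive fetch is shown below. The helper name `fetch_page`, the `User-Agent` string, and the 10-second timeout are illustrative assumptions, not part of the original notebook.

```python
import requests


def fetch_page(url, session=None, timeout=10):
    """Fetch a page and return its HTML text, raising on HTTP errors.

    `session` is optional; any object with a requests-style `get` method
    works (for example a `requests.Session`).
    """
    http = session or requests
    # Hypothetical User-Agent value; some sites reject the default one.
    headers = {"User-Agent": "job-aggregator-tutorial/0.1"}
    response = http.get(url, headers=headers, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text
```

It could be used as `html = fetch_page('https://www.pigiame.co.ke/it-software-jobs')`, with the returned text passed to `BeautifulSoup` in place of `pigia_me.text`.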

In [195]:

# Parsing our document: pigia_me
# ---
# 
soup = BeautifulSoup(pigia_me.text, "html.parser")



# Parsing our document: my_job_mag
# ---
#  
my_job_mag_soup = BeautifulSoup(Myjob_mag.text, "html.parser")


# Parsing our document: kenyan_job
# ---
# 
KenyanJob_soup = BeautifulSoup(Kenyan_job.text, "html.parser")

In [196]:
# 1. Extracting job titles and links: pigia_me
# ---
#
# Locate the search results container, then every listing card inside it
results = soup.find(class_="search")
job_elements = results.find_all("div", class_="listings-cards__list-item")

# Create empty lists that we will use to store the content fetched
job_title = []
job_link = []

# Loop through the listing cards
for job_element in job_elements:

    # The title sits in its own div; the first anchor holds the listing link
    title_element = job_element.find("div", class_="listing-card__header__title")
    job_url = job_element.find('a')['href']

    # Append the results to the initialized lists
    job_title.append(title_element.text.strip())
    job_link.append(job_url)

df_pigia = pd.DataFrame({"Job Title":job_title, "Job Url": job_link})
df_pigia

Unnamed: 0,Job Title,Job Url
0,Bioinformatic and Software Developer,https://www.pigiame.co.ke/listings/bioinformatic-and-software-developer-5427801
1,IT Sales - Cybersecurity & Cloud,https://www.pigiame.co.ke/listings/it-sales-cybersecurity-cloud-5427375
2,"Executive, Support & Services",https://www.pigiame.co.ke/listings/executive-support-services-5427355
3,Data Engineer,https://www.pigiame.co.ke/listings/data-engineer-5419373
4,Software Development Trainer,https://www.pigiame.co.ke/listings/software-development-trainer-5418174
5,Oracle Database Administrator,https://www.pigiame.co.ke/listings/oracle-database-administrator-5417450
6,"UI/UX & Frontend Developer – Limuru, Kenya",https://www.pigiame.co.ke/listings/uiux-frontend-developer-limuru-kenya-5412804
7,IT Sales - Cybersecurity & Cloud,https://www.pigiame.co.ke/listings/it-sales-cybersecurity-cloud-5411841
8,Wordpress and Shopify Web Designer,https://www.pigiame.co.ke/listings/wordpress-and-shopify-web-designer-5403307
9,Entry level Software Developer,https://www.pigiame.co.ke/listings/entry-level-software-developer-5388823
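
The loop above assumes every listing card contains both a title `div` and a link. If a card lacks either one (an advert, for instance), `find` returns `None` and the cell raises an error. The sketch below shows one way to skip such cards. The sample HTML and the function name `extract_listings` are made up for illustration; only the class names come from the cell above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that mimics the class names used above.
SAMPLE_HTML = """
<div class="search">
  <div class="listings-cards__list-item">
    <a href="https://example.com/listings/data-engineer-1">
      <div class="listing-card__header__title"> Data Engineer </div>
    </a>
  </div>
  <div class="listings-cards__list-item">
    <span>Sponsored banner without a title or link</span>
  </div>
</div>
"""


def extract_listings(html):
    """Return a list of {"Job Title", "Job Url"} dicts, skipping incomplete cards."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="listings-cards__list-item"):
        title = card.find("div", class_="listing-card__header__title")
        link = card.find("a", href=True)
        # Skip cards that are missing either piece instead of raising an error.
        if title is None or link is None:
            continue
        rows.append({"Job Title": title.get_text(strip=True), "Job Url": link["href"]})
    return rows


listings = extract_listings(SAMPLE_HTML)
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame(listings)`.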


In [197]:

# 2. Extracting job titles and links: my_job_mag

#create empty lists that we will use to store content fetched
title_mag = []
url_mag = []

# Getting all tags required
# ---
#
results_my_job_mag = my_job_mag_soup.find("ul", class_="job-list")
job_elements_mag = results_my_job_mag.find_all("h2")

# We then loop through these tags
for result in job_elements_mag:
   
    # Getting our text from each tag
    text = result.get_text()

    # We concatenate our domain with href link that we scrape
    # in order to form a full link
    link = 'https://www.myjobmag.co.ke'+result.find('a')['href']

    # Then appending the text to our title list
    title_mag.append(text)

    # Then appending the link to our url list
    url_mag.append(link)

df_mag = pd.DataFrame({"Job Title": title_mag, "Job Url":url_mag})
df_mag


Unnamed: 0,Job Title,Job Url
0,IT Audit Managers at KPMG,https://www.myjobmag.co.ke/job/it-audit-managers-kpmg
1,Fineract / Mifos Software Developer at Corporate Staffing,https://www.myjobmag.co.ke/job/fineract-mifos-software-developer-corporate-staffing
2,IT Intern at PATH,https://www.myjobmag.co.ke/job/it-intern-path
3,Senior Solutions Architect at Gebeya Limited,https://www.myjobmag.co.ke/job/senior-solutions-architect-gebeya-limited
4,Senior Business Architect at Gebeya Limited,https://www.myjobmag.co.ke/job/senior-business-architect-gebeya-limited
5,Senior Application Developer at Gebeya Limited,https://www.myjobmag.co.ke/job/senior-application-developer-gebeya-limited-1
6,Application Developer at Gebeya Limited,https://www.myjobmag.co.ke/job/gebeya-inc
7,Research Assistant-IT at Strathmore University,https://www.myjobmag.co.ke/job/research-assistant-it-strathmore-university
8,Junior Research Fellow-AI at Strathmore University,https://www.myjobmag.co.ke/job/junior-research-fellow-ai-strathmore-university
9,Technical Lead - InfraOps at Kyosk Digital Services,https://www.myjobmag.co.ke/job/technical-lead-infraops-kyosk-digital-services
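
The cell above builds full links by concatenating the domain with each scraped `href`. That works while every `href` starts with `/`, but it would produce a broken link if the site ever returned an absolute URL. As a hedged alternative, the standard library's `urljoin` handles both cases. The helper name `absolute_url` is an assumption made for this sketch.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.myjobmag.co.ke"


def absolute_url(href, base=BASE_URL):
    """Resolve a scraped href against the site's base URL."""
    # urljoin leaves already-absolute URLs untouched and resolves
    # root-relative paths such as "/job/..." against the base.
    return urljoin(base, href)


relative_example = absolute_url("/job/it-intern-path")
absolute_example = absolute_url("https://www.myjobmag.co.ke/job/it-intern-path")
```

Both calls resolve to the same full link, so the loop body could use `absolute_url(result.find('a')['href'])` in place of the string concatenation.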


In [198]:
# 3. Extracting job titles and links: kenya_job
# ---
# Create empty lists that we will use to store the content fetched
title_KenyanJob = []
url_KenyanJob = []

results_KenyanJob = KenyanJob_soup.find("div", id="content-2")
job_KenyanJob = results_KenyanJob.find_all("h5")

# loop through these tags
for result in job_KenyanJob:
   
    # Getting text from each tag
    text = result.get_text()

    # We concatenate our domain with href link that we scrape
    # in order to form a full link
    link = 'https://www.kenyajob.com'+result.find('a')['href']

    # Then appending the text to our title list
    title_KenyanJob.append(text)

    # Then appending the link to our url list
    url_KenyanJob.append(link)

df_KenyanJob = pd.DataFrame({"Job Title": title_KenyanJob, "Job Url":url_KenyanJob})
df_KenyanJob



Unnamed: 0,Job Title,Job Url
0,JAVA EE / JAVA 8 Developer with SQL Skills,https://www.kenyajob.com/job-vacancies-kenya/java-ee-java-8-developer-sql-skills-130458
1,Senior Freelance Web Designer,https://www.kenyajob.com/job-vacancies-kenya/senior-freelance-web-designer-130459
2,Accountant/Administrator,https://www.kenyajob.com/job-vacancies-kenya/accountantadministrator-126803
3,CCTV and Fire Alarms Systems Technician,https://www.kenyajob.com/job-vacancies-kenya/cctv-fire-alarms-systems-technician-127106
4,CCTV and Fire Alarms Systems Technician,https://www.kenyajob.com/job-vacancies-kenya/cctv-fire-alarms-systems-technician-127107
5,Information Technology Sales Specialist,https://www.kenyajob.com/job-vacancies-kenya/information-technology-sales-specialist-129253
6,AWS Cloud Architect (M/F),https://www.kenyajob.com/job-vacancies-kenya/aws-cloud-architect-mf-129511
7,AWS Solutions Architect (M/F),https://www.kenyajob.com/job-vacancies-kenya/aws-solutions-architect-mf-129512
8,AZURE Solutions Architect (M/F),https://www.kenyajob.com/job-vacancies-kenya/azure-solutions-architect-mf-129513
9,Cloud Architect (M/F),https://www.kenyajob.com/job-vacancies-kenya/cloud-architect-mf-129514


In [199]:
# Combining the scraped contents into a single dataframe and previewing our data
# ---
#
jobs_df = pd.concat([df_pigia, df_mag, df_KenyanJob], ignore_index=True)

# Optionally save the dataframe to a CSV file
# jobs_df.to_csv('jobs.csv', index=False)

# Preview a random sample of rows (capped at the number of rows available)
jobs_df.sample(min(50, len(jobs_df)))

Unnamed: 0,Job Title,Job Url
15,Senior Application Developer at Gebeya Limited,https://www.myjobmag.co.ke/job/senior-application-developer-gebeya-limited-1
30,Accountant/Administrator,https://www.kenyajob.com/job-vacancies-kenya/accountantadministrator-126803
39,Cloud Engineer (M/F),https://www.kenyajob.com/job-vacancies-kenya/cloud-engineer-mf-129516
10,IT Audit Managers at KPMG,https://www.myjobmag.co.ke/job/it-audit-managers-kpmg
2,"Executive, Support & Services",https://www.pigiame.co.ke/listings/executive-support-services-5427355
18,Junior Research Fellow-AI at Strathmore University,https://www.myjobmag.co.ke/job/junior-research-fellow-ai-strathmore-university
23,Integrations Support Specialist at NCBA Group,https://www.myjobmag.co.ke/job/integrations-support-specialist-ncba-group
6,"UI/UX & Frontend Developer – Limuru, Kenya",https://www.pigiame.co.ke/listings/uiux-frontend-developer-limuru-kenya-5412804
26,SAP Developer at NCBA Group,https://www.myjobmag.co.ke/job/sap-developer-ncba-group
7,IT Sales - Cybersecurity & Cloud,https://www.pigiame.co.ke/listings/it-sales-cybersecurity-cloud-5411841
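
Because the same listing can appear more than once across pages, the combined table may benefit from de-duplication before it is handed to the rest of the team. The sketch below shows one possible approach using small made-up dataframes. The function name `combine_listings` and the choice to treat `Job Url` as the unique key are assumptions, not something the original notebook specifies.

```python
import pandas as pd


def combine_listings(frames):
    """Concatenate per-site dataframes and drop rows with a repeated Job Url."""
    combined = pd.concat(frames, ignore_index=True)
    # Keep the first occurrence of each URL and renumber the rows.
    return combined.drop_duplicates(subset="Job Url").reset_index(drop=True)


# Made-up example data standing in for the scraped dataframes.
df_a = pd.DataFrame({"Job Title": ["Data Engineer"],
                     "Job Url": ["https://example.com/a"]})
df_b = pd.DataFrame({"Job Title": ["Data Engineer", "IT Intern"],
                     "Job Url": ["https://example.com/a", "https://example.com/b"]})

jobs = combine_listings([df_a, df_b])

# With no path argument, to_csv returns the CSV content as a string.
csv_text = jobs.to_csv(index=False)

# Cap the sample size so sampling never asks for more rows than exist.
preview = jobs.sample(n=min(5, len(jobs)), random_state=0)
```

With the notebook's own data, the call would be `combine_listings([df_pigia, df_mag, df_KenyanJob])`.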
