# Web Scraping with Python

## Import libraries

In [42]:
import pandas as pd  # library for data manipulation
import requests  # library for fetching a web page
from bs4 import BeautifulSoup  # library for extracting contents from a web page
import re  # regular expressions, used below to match job-link URLs

## Step 1: Obtaining Data

#### PigiaMe data

In [2]:
pigia_me = requests.get('https://www.pigiame.co.ke/it-software-jobs')
pigia_me

<Response [200]>

#### MyJobMag data

In [4]:
my_job_mag = requests.get('https://www.myjobmag.co.ke/jobs-by-field/information-technology')
my_job_mag

<Response [200]>

#### KenyaJob data

In [5]:
kenyan_job = requests.get('https://www.kenyajob.com/job-vacancies-search-kenya?f%5B0%5D=im_field_offre_secteur%3A133')
kenyan_job

<Response [200]>
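
The KenyaJob URL carries its filter in a percent-encoded query string: `%5B` and `%5D` are `[` and `]`, and `%3A` is `:`. The sketch below uses only the standard library to decode that query and rebuild the identical URL from readable parts. Reading `im_field_offre_secteur:133` as the site's IT-sector filter is an assumption based on the page requested here, not something the site documents.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

KENYAJOB_URL = ("https://www.kenyajob.com/job-vacancies-search-kenya"
                "?f%5B0%5D=im_field_offre_secteur%3A133")

# Decode the query string into a dict of lists
query = parse_qs(urlsplit(KENYAJOB_URL).query)
print(query)  # {'f[0]': ['im_field_offre_secteur:133']}

# Rebuild the same URL; urlencode percent-encodes "[", "]" and ":"
base = "https://www.kenyajob.com/job-vacancies-search-kenya"
rebuilt = base + "?" + urlencode({"f[0]": "im_field_offre_secteur:133"})
print(rebuilt == KENYAJOB_URL)  # True
```

Passing the same dict as `requests.get(base, params={"f[0]": "im_field_offre_secteur:133"})` should let requests do this encoding, which keeps the filter readable in the notebook.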

## Step 2: Parsing

#### Parsing our document: pigia_me

In [39]:
pigia_me_soup_doc = BeautifulSoup(pigia_me.text, "html.parser")

#### Parsing our document: my_job_mag

In [40]:
my_job_mag_soup_doc = BeautifulSoup(my_job_mag.text, "html.parser")

#### Parsing our document: kenyan_job

In [41]:
kenyan_job_soup_doc = BeautifulSoup(kenyan_job.text, "html.parser")
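
`response.text` is decoded by requests using the charset from the response headers; when a server sends `text/html` without a charset, requests falls back to ISO-8859-1, which can garble non-ASCII characters. Passing the raw bytes in `response.content` instead lets BeautifulSoup detect the encoding itself, including from a `<meta charset>` tag. This is a minimal sketch on a hard-coded byte string standing in for a live response.

```python
from bs4 import BeautifulSoup

# Stand-in for response.content: raw UTF-8 bytes that declare their charset
raw_html = (
    '<html><head><meta charset="utf-8"></head>'
    '<body><h2>Développeur Front-End</h2></body></html>'
).encode("utf-8")

# Given bytes, BeautifulSoup detects the encoding instead of relying on requests
soup = BeautifulSoup(raw_html, "html.parser")
print(soup.h2.get_text())      # Développeur Front-End
print(soup.original_encoding)  # the encoding BeautifulSoup settled on
```

In the notebook this would read `BeautifulSoup(pigia_me.content, "html.parser")`, and likewise for the other two responses.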

## Step 3: Extracting Required Elements

#### 1) Extracting job titles and links: pigia_me

Extract job titles

In [21]:
pigia_me_results = pigia_me_soup_doc.find_all('div',attrs={'class':'listing-card__header__title'})
pigia_me_results

[<div class="listing-card__header__title">
 Front-End Web Developer
 </div>, <div class="listing-card__header__title">
 ICT Software Developer
 </div>, <div class="listing-card__header__title">
 Scrum Master
 </div>, <div class="listing-card__header__title">
 Senior Software Engineer – Integration Services
 </div>, <div class="listing-card__header__title">
 DHIS2 Developer
 </div>, <div class="listing-card__header__title">
 M-Pesa Africa – Scrum Master - Compliance
 </div>, <div class="listing-card__header__title">
 Mobile App Developer (Android &amp; iOS)
 </div>, <div class="listing-card__header__title">
 IT AND SOFTWARE
 </div>, <div class="listing-card__header__title">
 Freedom Project 01
 </div>, <div class="listing-card__header__title">
 Senior Software Engineer
 </div>]

Strip tags

In [23]:
pigia_me_results = [tag.get_text().strip() for tag in pigia_me_results]
pigia_me_results

['Front-End Web Developer',
 'ICT Software Developer',
 'Scrum Master',
 'Senior Software Engineer – Integration Services',
 'DHIS2 Developer',
 'M-Pesa Africa – Scrum Master - Compliance',
 'Mobile App Developer (Android & iOS)',
 'IT AND SOFTWARE',
 'Freedom Project 01',
 'Senior Software Engineer']
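
The same titles can be pulled in one step: `select()` takes a CSS selector, so the class lookup reads as `div.listing-card__header__title`, and `get_text(strip=True)` trims the surrounding whitespace. The sketch runs on a fragment copied from the PigiaMe output above rather than on a live page.

```python
from bs4 import BeautifulSoup

# A fragment copied from the PigiaMe output above
html = """
<div class="listing-card__header__title">
 Front-End Web Developer
 </div>
<div class="listing-card__header__title">
 Mobile App Developer (Android &amp; iOS)
 </div>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS-selector equivalent of find_all('div', attrs={'class': ...})
titles = [div.get_text(strip=True)
          for div in soup.select("div.listing-card__header__title")]
print(titles)
# ['Front-End Web Developer', 'Mobile App Developer (Android & iOS)']
```

One difference from `.get_text().strip()`: with `strip=True` each text fragment is stripped and the fragments are joined with no separator, so a title containing nested tags may need `get_text(" ", strip=True)`.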

Extract job links

In [43]:
for link in pigia_me_soup_doc.find_all('a', attrs={'href': re.compile(r"^https://www\.pigiame\.co\.ke/listing")}):
    # print the URL of each job listing
    print(link.get('href'))

https://www.pigiame.co.ke/listings/front-end-web-developer-4155874
https://www.pigiame.co.ke/listings/ict-software-developer-4152712
https://www.pigiame.co.ke/listings/scrum-master-4151991
https://www.pigiame.co.ke/listings/senior-software-engineer-integration-services-4151986
https://www.pigiame.co.ke/listings/dhis2-developer-4151468
https://www.pigiame.co.ke/listings/m-pesa-africa-scrum-master-compliance-4151255
https://www.pigiame.co.ke/listings/mobile-app-developer-android-ios-4146921
https://www.pigiame.co.ke/listings/it-and-software-4079363
https://www.pigiame.co.ke/listings/freedom-project-01-3976292
https://www.pigiame.co.ke/listings/senior-software-engineer-3943753
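
In a regular expression an unescaped `.` matches any character, so the dots in the domain need escaping (by hand or with `re.escape`) to match only the literal host. The sketch below also pulls the numeric suffix off each listing URL; treating that number as a listing ID is an assumption drawn from the URLs printed above, and `LISTING_ID` is a hypothetical helper pattern.

```python
import re

# Dots escaped, so only the literal host www.pigiame.co.ke matches
LISTING_URL = re.compile(r"^https://www\.pigiame\.co\.ke/listings/")
# Hypothetical helper pattern: the digits after the last hyphen
LISTING_ID = re.compile(r"-(\d+)$")

urls = [
    "https://www.pigiame.co.ke/listings/front-end-web-developer-4155874",
    "https://www.pigiame.co.ke/listings/dhis2-developer-4151468",
]

ids = [LISTING_ID.search(url).group(1) for url in urls if LISTING_URL.match(url)]
print(ids)  # ['4155874', '4151468']

# An unescaped pattern also accepts a lookalike host; the escaped one does not
lookalike = "https://wwwXpigiameXco.ke/listings/"
print(bool(re.match("^https://www.pigiame.co.ke/listing", lookalike)))  # True
print(bool(LISTING_URL.match(lookalike)))  # False
```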


#### 2) Extracting job titles: my_job_mag

Extract job titles

In [34]:
my_job_mag_results = my_job_mag_soup_doc.find_all('h2')
my_job_mag_results

[<h2><a href="/job/server-and-network-administrator-african-population-and-health-research-center-aphrc">Server and Network Administrator at African Population And Health Research Center (APHRC)</a></h2>,
 <h2><a href="/job/system-audit-officer-ii-national-transport-and-safety-authority">System Audit Officer II at National Transport and Safety Authority</a></h2>,
 <h2><a href="/job/graphic-design-consultant-un-women">Graphic Design Consultant at UN Women</a></h2>,
 <h2><a href="/job/city-launcher-mombasa-glovo">City Launcher - Mombasa at Glovo</a></h2>,
 <h2><a href="/job/senior-information-systems-assistant-temporary-united-nations-office-at-nairobi-unon">Senior Information Systems Assistant [temporary] at United Nations Office at Nairobi (UNON)</a></h2>,
 <h2><a href="/job/senior-golang-developer-cloud-integrations-acronis-3">Senior Golang Developer (Cloud Integrations) at Acronis</a></h2>,
 <h2><a href="/job/graphic-design-trainer-finn-church-aid-fca">Graphic Design Trainer at Finn 

Strip tags

In [35]:
my_job_mag_results = [tag.get_text().strip() for tag in my_job_mag_results]
my_job_mag_results

['Server and Network Administrator at African Population And Health Research Center (APHRC)',
 'System Audit Officer II at National Transport and Safety Authority',
 'Graphic Design Consultant at UN Women',
 'City Launcher - Mombasa at Glovo',
 'Senior Information Systems Assistant [temporary] at United Nations Office at Nairobi (UNON)',
 'Senior Golang Developer (Cloud Integrations) at Acronis',
 'Graphic Design Trainer at Finn Church Aid (FCA)',
 'Communications Analyst at International Potato Center',
 'Chief Digital Officer at United Nations Environment Programme (UNEP)',
 'Information Technology Assistant [temporary] at United Nations Office at Nairobi (UNON)',
 'Product Manager at BFA (Bankable Frontier Associates)',
 'Frontend Developer at Innovex Solutions',
 'IT Specialist at The African Economic Research Consortium (AERC)',
 'Software Engineer (Python/Linux/Packaging) at Canonical',
 'Senior Software Engineer (MongoDB/Python) at Canonical',
 'Associate Field Software Engineer
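
Each MyJobMag `<h2>` wraps an `<a>` whose `href` is relative (`/job/...`). A minimal sketch, run on two `<h2>` elements copied from the output above, keeps each link next to its title and turns it into an absolute URL with `urllib.parse.urljoin`, using the page URL requested earlier as the base. Skipping `<h2>` elements without a link is a precaution, on the assumption that the full page may contain other headings.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

PAGE_URL = "https://www.myjobmag.co.ke/jobs-by-field/information-technology"

# Two <h2> elements copied from the MyJobMag output above
html = """
<h2><a href="/job/graphic-design-consultant-un-women">Graphic Design Consultant at UN Women</a></h2>
<h2><a href="/job/city-launcher-mombasa-glovo">City Launcher - Mombasa at Glovo</a></h2>
"""
soup = BeautifulSoup(html, "html.parser")

jobs = []
for h2 in soup.find_all("h2"):
    link = h2.find("a")
    if link is None:  # skip headings that carry no job link
        continue
    jobs.append({
        "title": link.get_text(strip=True),
        "url": urljoin(PAGE_URL, link["href"]),  # "/job/..." -> absolute URL
    })

for job in jobs:
    print(job["url"])
# https://www.myjobmag.co.ke/job/graphic-design-consultant-un-women
# https://www.myjobmag.co.ke/job/city-launcher-mombasa-glovo
```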

#### 3) Extracting job titles: kenya_job

Extract job titles

In [30]:
kenya_job_results = kenyan_job_soup_doc.find_all('h5')
kenya_job_results

[<h5><a href="/job-vacancies-kenya/work-home-opportunity-103348">Work from Home Opportunity</a></h5>,
 <h5><a href="/job-vacancies-kenya/it-project-manager-10392">IT Project Manager</a></h5>,
 <h5><a href="/job-vacancies-kenya/it-analyst-24531">IT Analyst</a></h5>,
 <h5><a href="/job-vacancies-kenya/regional-project-lead-vendor-experience-103050">Regional Project Lead - Vendor Experience</a></h5>,
 <h5><a href="/job-vacancies-kenya/merchant-support-specialist-jumia-pay-full-time-103052">Merchant Support Specialist - Jumia Pay (Full Time)</a></h5>,
 <h5><a href="/job-vacancies-kenya/head-fulfilment-jumia-full-time-103053"> Head of Fulfilment - Jumia (Full-time)</a></h5>,
 <h5><a href="/job-vacancies-kenya/regional-growth-hacker-jumia-prime-full-time-103055">Regional Growth Hacker - Jumia Prime (Full Time)</a></h5>,
 <h5><a href="/job-vacancies-kenya/back-end-developer-88791">Back End Developer</a></h5>,
 <h5><a href="/job-vacancies-kenya/fullstack-developer-88794">FullStack Developer</a

Strip tags

In [31]:
kenya_job_results = [tag.get_text().strip() for tag in kenya_job_results]
kenya_job_results

['Work from Home Opportunity',
 'IT Project Manager',
 'IT Analyst',
 'Regional Project Lead - Vendor Experience',
 'Merchant Support Specialist - Jumia Pay (Full Time)',
 'Head of Fulfilment - Jumia (Full-time)',
 'Regional Growth Hacker - Jumia Prime (Full Time)',
 'Back End Developer',
 'FullStack Developer',
 'Sales and Marketing Agent',
 'Company Telephone Receptionist',
 'DRUPAL Developer',
 'Front End Developer',
 'JavaScript Developer',
 'PYTHON Developer',
 'Web Designer',
 'UX / UI Designer',
 'Artificial Intelligence Engineer – AI',
 'Digital Project Manager',
 'Software Engineer',
 '.Net Developer',
 'Customer Sales Team Leader, Nairobi',
 'Agent Expansion Manager, Nairobi',
 'Warehouse Clerk',
 'Project Assistant']
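
Even with the sector filter in the URL, the KenyaJob list includes roles such as `Warehouse Clerk` and `Company Telephone Receptionist`. If only technical roles are wanted, a rough keyword filter is one option. The keyword list below is a hypothetical choice for illustration; it will both miss and over-match some titles.

```python
import re

# Hypothetical keyword list for illustration; adjust to the roles you want
TECH_KEYWORDS = ["developer", "engineer", "designer", "analyst", "software", "IT", "AI"]

# \b word boundaries stop "IT" matching inside words such as "Digital"
tech_pattern = re.compile(r"\b(?:" + "|".join(TECH_KEYWORDS) + r")\b", re.IGNORECASE)

sample_titles = [
    "IT Analyst",
    "Back End Developer",
    "Warehouse Clerk",
    "Digital Project Manager",
    "UX / UI Designer",
    "Company Telephone Receptionist",
]

tech_titles = [t for t in sample_titles if tech_pattern.search(t)]
print(tech_titles)
# ['IT Analyst', 'Back End Developer', 'UX / UI Designer']
```

Applied to the scraped list this becomes `[t for t in kenya_job_results if tech_pattern.search(t)]`. Because of `re.IGNORECASE`, a standalone lowercase word "it" or "ai" in a title would also match.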

## Step 4: Saving the data to dataframes

In [36]:
pigia_me_df = pd.DataFrame({"title":pigia_me_results})
pigia_me_df.head()

|   | title |
|---|---|
| 0 | Front-End Web Developer |
| 1 | ICT Software Developer |
| 2 | Scrum Master |
| 3 | Senior Software Engineer – Integration Services |
| 4 | DHIS2 Developer |
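
The PigiaMe links from Step 3 can sit next to the titles as a second column. The links loop above only prints each `href`, so they would first need collecting into a list (for example a hypothetical `pigia_me_links = [link.get('href') for link in ...]`). The sketch assumes titles and links come back in the same order and number, as they did in the run above (ten each); the two-row lists are copied from the earlier output.

```python
import pandas as pd

# Two rows copied from the earlier output; in the notebook these would be
# pigia_me_results and a list of the hrefs printed by the links loop
titles = ["Front-End Web Developer", "ICT Software Developer"]
links = [
    "https://www.pigiame.co.ke/listings/front-end-web-developer-4155874",
    "https://www.pigiame.co.ke/listings/ict-software-developer-4152712",
]

# Stop with a clear message rather than build a misaligned table
if len(titles) != len(links):
    raise ValueError(f"{len(titles)} titles but {len(links)} links")

pigia_me_df = pd.DataFrame({"title": titles, "url": links})
print(pigia_me_df)
```

Equal lengths do not prove the rows line up; spot-checking a few titles against their URL slugs is a cheap sanity check.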


In [37]:
my_job_mag_df = pd.DataFrame({"title":my_job_mag_results})
my_job_mag_df.head()

|   | title |
|---|---|
| 0 | Server and Network Administrator at African Po... |
| 1 | System Audit Officer II at National Transport ... |
| 2 | Graphic Design Consultant at UN Women |
| 3 | City Launcher - Mombasa at Glovo |
| 4 | Senior Information Systems Assistant [temporar... |
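
MyJobMag titles follow a `<role> at <company>` pattern in the output above. Splitting on the first `" at "` separates the two, which keeps company names such as `United Nations Office at Nairobi (UNON)` intact. Treat it as a heuristic: a role that itself contains " at " would be split in the wrong place, and a title with no " at " leaves `company` empty. The `role` and `company` column names are illustrative choices.

```python
import pandas as pd

# Sample rows copied from the MyJobMag output above
my_job_mag_df = pd.DataFrame({"title": [
    "Graphic Design Consultant at UN Women",
    "Senior Information Systems Assistant [temporary] at United Nations Office at Nairobi (UNON)",
    "Frontend Developer at Innovex Solutions",
]})

# n=1 splits on the first " at " only; expand=True returns one column per part
parts = my_job_mag_df["title"].str.split(" at ", n=1, expand=True)
my_job_mag_df["role"] = parts[0]
my_job_mag_df["company"] = parts[1]
print(my_job_mag_df[["role", "company"]])
```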


In [38]:
kenya_job_df = pd.DataFrame({"title":kenya_job_results})
kenya_job_df.head()

|   | title |
|---|---|
| 0 | Work from Home Opportunity |
| 1 | IT Project Manager |
| 2 | IT Analyst |
| 3 | Regional Project Lead - Vendor Experience |
| 4 | Merchant Support Specialist - Jumia Pay (Full ... |
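
Since all three DataFrames share a `title` column, they can be stacked into one table. A minimal sketch: tag each frame with its site using `assign`, then combine them with `pd.concat(..., ignore_index=True)`. The small frames are stand-ins for `pigia_me_df`, `my_job_mag_df` and `kenya_job_df`, and the `source` labels are hypothetical.

```python
import pandas as pd

# Small stand-ins for pigia_me_df, my_job_mag_df and kenya_job_df
pigia_me_df = pd.DataFrame({"title": ["Front-End Web Developer", "Scrum Master"]})
my_job_mag_df = pd.DataFrame({"title": ["Graphic Design Consultant at UN Women"]})
kenya_job_df = pd.DataFrame({"title": ["IT Analyst", "Back End Developer"]})

# Hypothetical labels for the "source" column, one per site
frames = {"pigiame": pigia_me_df, "myjobmag": my_job_mag_df, "kenyajob": kenya_job_df}

# assign() adds a constant "source" column; ignore_index renumbers rows 0..n-1
all_jobs = pd.concat(
    [df.assign(source=name) for name, df in frames.items()],
    ignore_index=True,
)
print(all_jobs)
```

From here, `all_jobs.to_csv("jobs.csv", index=False)` would write the combined table to disk (the filename is only an example).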
