# Web Scraping Job Vacancies

## Introduction

In this project, we'll build a web scraper to extract job listings from a popular job search platform. We'll extract job titles, companies, locations, job descriptions, and other relevant information.

Here are the main steps we'll follow in this project:

1. Set up our development environment
2. Understand the basics of web scraping
3. Analyze the website structure of our job search platform
4. Write the Python code to extract job data from our job search platform
5. Save the data to a CSV file
6. Test our web scraper and refine our code as needed

## Prerequisites

Before starting this project, you should have some basic knowledge of Python programming and HTML structure. The project uses the following Python packages:

- requests
- beautifulsoup4 (imported as `bs4`)
- csv (part of the Python standard library)
- datetime (part of the Python standard library)

These packages should already be available in Coursera's Jupyter Notebook environment. If you are working off-platform, or need a package that the environment does not include, install it by running `!pip install packagename` in a notebook cell, for example:

- `!pip install requests`
- `!pip install beautifulsoup4`
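
Before importing anything, it can help to confirm which of these modules the current environment can actually see. The sketch below uses the standard-library `importlib.util.find_spec`; the helper name `missing_modules` is made up for this illustration. Note that Beautiful Soup is imported under the name `bs4`.

```python
import importlib.util

def missing_modules(names):
    """Return the module names from `names` that cannot be found here."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Import names used in this project (Beautiful Soup is imported as `bs4`).
required = ["requests", "bs4", "csv", "datetime"]
print("Missing modules:", missing_modules(required))
```

Any name printed in the list would need a `!pip install` before the later cells can run.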

## Step 1: Importing Required Libraries

In [3]:
# your code here

In [1]:
!pip install requests

You should consider upgrading via the '/opt/conda/bin/python3 -m pip install --upgrade pip' command.


In [1]:
!pip install selenium

Collecting selenium
  Downloading selenium-4.10.0-py3-none-any.whl (6.7 MB)
Collecting urllib3[socks]<3,>=1.26
  Downloading urllib3-2.0.4-py3-none-any.whl (123 kB)
Collecting trio-websocket~=0.9
  Downloading trio_websocket-0.10.3-py3-none-any.whl (17 kB)
Collecting trio~=0.17
  Downloading trio-0.22.2-py3-none-any.whl (400 kB)
Collecting certifi>=2021.10.8
  Downloading certifi-2023.7.22-py3-none-any.whl (158 kB)
Collecting exceptiongroup>=1.0.0rc9
  Downloading exceptiongroup-1.1.2-py3-none-any.whl (14 kB)
Collecting outcome
  Downloading outcome-1.2.0-py2.py3-none-any.whl (9.7 kB)
Collecting attrs>=20.1.0
  Downloading attrs-23.1.0-py3-none-any.whl (61 kB)

In [4]:
import requests
from bs4 import BeautifulSoup as bs

In [24]:
from selenium.webdriver import Chrome
from selenium.webdriver.common.keys import Keys

In [29]:
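from urllib.parse import quote

def getContent(title, loc):
    # Percent-encode the search terms so spaces and symbols are safe in the URL.
    url = 'https://www.linkedin.com/jobs/search?keywords='+quote(title)+'&location='+quote(loc)+'%3D103644278&trk=public_jobs_jobs-search-bar_search-submit&position=1&pageNum=0'
    r = requests.get(url, timeout=30)
    r.raise_for_status()  # stop on HTTP errors rather than parsing an error page
    return r.content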
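
The query string above is assembled by string concatenation. As an alternative sketch, `urllib.parse.urlencode` can build and encode the `keywords` and `location` parameters in one step. The helper name `build_search_url` is hypothetical, and the tracking parameters from the original URL are left out on the assumption that the search does not need them.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.linkedin.com/jobs/search"

def build_search_url(title, loc):
    """Return the search URL with `title` and `loc` percent-encoded."""
    # Assumption: only `keywords` and `location` are needed; the tracking
    # parameters in the original URL are omitted from this sketch.
    query = urlencode({"keywords": title, "location": loc})
    return f"{BASE_URL}?{query}"

print(build_search_url("Software Engineer", "Karachi Sindh"))
# prints https://www.linkedin.com/jobs/search?keywords=Software+Engineer&location=Karachi+Sindh
```

Letting `urlencode` handle the escaping keeps characters such as `+` or `&` in a job title from being misread as part of the query syntax.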

In [30]:
data = getContent('Software Engineer','Karachi Sindh')

In [42]:
import csv

header = ["Title","Company","Location","Link"]
soup = bs(data,'html.parser')

# Collect each field as a parallel list with one entry per job card.
titles = soup.find_all('h3',class_="base-search-card__title")
company = soup.find_all('h4',class_="base-search-card__subtitle")
# A multi-class string matches only when the class attribute is exactly this string.
links = soup.find_all('a',class_="base-card__full-link absolute top-0 right-0 bottom-0 left-0 p-0 z-[2]")
loc = soup.find_all('span',class_="job-search-card__location")

# zip() stops at the shortest list, so a missing field cannot raise IndexError.
# Rows can still misalign if a card lacks one of the fields.
jobs = []
for title, comp, place, link in zip(titles, company, loc, links):
    jobs.append({
        'Title': title.get_text(strip=True, separator=' '),
        'Company': comp.get_text(strip=True, separator=' '),
        'Location': place.get_text(strip=True, separator=' '),
        'Link': link.get('href'),
    })

# newline='' avoids blank rows on Windows; utf-8 keeps names such as 'Karāchi' intact.
with open('jobs.csv','w',newline='',encoding='utf-8') as f:
    writer = csv.DictWriter(f,fieldnames=header)
    writer.writeheader()
    writer.writerows(jobs)
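
To check the CSV step without touching the network, the same `csv.DictWriter` pattern can be exercised on in-memory data. This is a small sketch: the sample row is made up for illustration, and `io.StringIO` stands in for the `jobs.csv` file.

```python
import csv
import io

header = ["Title", "Company", "Location", "Link"]
# Made-up sample row; real rows come from the scraped page.
sample_jobs = [
    {"Title": "Software Developer", "Company": "Example Co",
     "Location": "Karāchi, Sindh, Pakistan", "Link": "https://example.com/jobs/1"},
]

# Write to an in-memory buffer instead of a file on disk.
# newline="" mirrors what the csv module expects for real files.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=header)
writer.writeheader()
writer.writerows(sample_jobs)

# Read the rows back to confirm the round trip preserves every field,
# including the commas inside the location.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows[0]["Location"])
```

Because the location contains commas, the writer quotes that field, and the reader restores it as a single value.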

In [44]:
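# Read the file back with the settings the csv module expects (newline='' and utf-8).
with open('jobs.csv','r',newline='',encoding='utf-8') as file:
    csvreader = csv.reader(file)
    for row in csvreader:
        print(row)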

['Title', 'Company', 'Location', 'Link']
['Junior Software Developer', 'Contour Software', 'Karāchi, Sindh, Pakistan', 'https://pk.linkedin.com/jobs/view/junior-software-developer-at-contour-software-3622567830?refId=XEjC5C4Y130LO3MvKcuVbw%3D%3D&trackingId=9Erh%2F788W%2B7TpBmte8PPwA%3D%3D&position=1&pageNum=0&trk=public_jobs_jserp-result_search-card']
['Frontend Developer', 'Xgrid.co', 'Karāchi, Sindh, Pakistan', 'https://pk.linkedin.com/jobs/view/frontend-developer-at-xgrid-co-3612714355?refId=XEjC5C4Y130LO3MvKcuVbw%3D%3D&trackingId=EY7VSH21kz0fkEL15ixFZw%3D%3D&position=2&pageNum=0&trk=public_jobs_jserp-result_search-card']
['Software Developer', 'Contour Software', 'Karāchi, Sindh, Pakistan', 'https://pk.linkedin.com/jobs/view/software-developer-at-contour-software-3630173163?refId=XEjC5C4Y130LO3MvKcuVbw%3D%3D&trackingId=3Kg0G%2FaScQ9XfZyTsrkNiQ%3D%3D&position=3&pageNum=0&trk=public_jobs_jserp-result_search-card']
['Software Developer', 'Contour Software', 'Karāchi, Sindh, Pakistan',