# Web Scraping Job Vacancies


## Introduction

In this project, we'll build a web scraper that extracts job listings from Coursera's careers search page (https://careers.coursera.com/jobs/search). For each listing, we'll extract the job title, department, and location.

Here are the main steps we'll follow in this project:

1. Set up our development environment
2. Understand the basics of web scraping
3. Analyze the website structure of our job search platform
4. Write the Python code to extract job data from our job search platform
5. Save the data to a CSV file
6. Test our web scraper and refine our code as needed

## Prerequisites

Before starting this project, you should have basic knowledge of Python programming and HTML structure. You will also use the following packages:

- requests (third-party)
- BeautifulSoup (third-party; installed as `beautifulsoup4` and imported from `bs4`)
- csv (standard library)
- datetime (standard library)


## Task 1 – Import required libraries:


```python
import csv

import requests
from bs4 import BeautifulSoup
```

## Task 2 – Request the job search page:



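```python
# `base_url` holds the Response object returned by requests.get(), not the URL itself
base_url = requests.get('https://careers.coursera.com/jobs/search', timeout=10)
```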

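```python
base_url.text
```

This returns the page's raw HTML as a string, beginning with `<!DOCTYPE html>` and including a `<title>Search Page</title>` element.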

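The request above hardcodes the search URL. A small function that generates the URL from search parameters makes the scraper easier to reuse for different searches. The sketch below uses the standard library's `urllib.parse.urlencode`; the function name `build_search_url` and the parameter names in the example (`query`, `page`) are hypothetical, so check which parameters the site actually uses (for example, by running a search in the browser and reading the address bar) before relying on them.

```python
from urllib.parse import urlencode

# The search page fetched in this notebook
SEARCH_URL = "https://careers.coursera.com/jobs/search"


def build_search_url(base=SEARCH_URL, **params):
    """Hypothetical helper: append non-empty query parameters to `base`."""
    # Skip parameters left as None or "" so they don't appear in the URL
    query = {key: value for key, value in params.items() if value not in (None, "")}
    return f"{base}?{urlencode(query)}" if query else base


# The parameter names here are placeholders, not confirmed site parameters
print(build_search_url(query="data science", page=2))
# https://careers.coursera.com/jobs/search?query=data+science&page=2
```

The returned string can be passed directly to `requests.get()`. `urlencode()` takes care of escaping, so the space in `data science` becomes `+`.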

## Task 3 – Extract the job data from a single job posting:

*  Define a function that takes the BeautifulSoup `Tag` for one job posting (a `<tr>` row of the results table) and returns its title, department, and location.



```python
# Parse the downloaded HTML with Python's built-in parser
soup = BeautifulSoup(base_url.text, "html.parser")
```

```python
soup
```


```python
# Each job posting is a <tr> row in the body of the results table
jobs = soup.select('table.table tbody tr')
```

```python
jobs
```

Each row holds the job title (linked to the posting), the department, and the location in separate `<td>` cells. The first row of the output:

```
[<tr data-action="click-&gt;jobs--table-results#navigate">
 <td class="job-search-results-title">
 <a aria-label="Title: Staff Machine Learning Scientist (Search Science) " href="https://careers.coursera.com/jobs/staff-machine-learning-scientist-search-science-united-states" id="link_job_title_1_0_0">
           Staff Machine Learning Scientist (Search Science) 
 </a> </td>
 <td class="job-search-results-department" id="department_1_0_0">
 <ul>
 <li aria-label="Department: Data Science" id="department_1_0_0_0">Data Science
 </li> </ul>
 </td>
 <td class="job-search-results-location">
 <ul>
 <li aria-label="Location: United States" id="location_1_0_0_0">United States</li>
 </ul>
 </td>
 </tr>,
 ...]
```
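The selector `table.table tbody tr` matches every `<tr>` inside a `<tbody>` that sits within a `<table>` carrying the class `table`. The offline sketch below uses a made-up HTML fragment (not the real page) to show that header rows in `<thead>` and rows from tables without that class are skipped.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure of the results table
SAMPLE_HTML = """
<table class="table">
  <thead>
    <tr><th>Title</th><th>Department</th><th>Location</th></tr>
  </thead>
  <tbody>
    <tr><td class="job-search-results-title"><a href="/jobs/a">Job A</a></td></tr>
    <tr><td class="job-search-results-title"><a href="/jobs/b">Job B</a></td></tr>
  </tbody>
</table>
<table class="other">
  <tbody><tr><td>Not a job row</td></tr></tbody>
</table>
"""

sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Descendant combinators: tr inside tbody inside a table with class "table"
sample_rows = sample_soup.select("table.table tbody tr")
sample_titles = [row.a.get_text(strip=True) for row in sample_rows]
print(sample_titles)  # ['Job A', 'Job B']
```

One caveat: Python's built-in `html.parser` does not insert a missing `<tbody>` the way browsers do, so a selector copied from the browser's developer tools can match nothing if the raw HTML has no `<tbody>`. Checking selectors against `base_url.text` rather than the rendered page avoids that surprise.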

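```python
def extract_job_data(job_posting):
    """Return (title, department, location) for one job row.

    Any field missing from the row is returned as an empty string.
    """
    # Title: text of the link inside the title cell
    try:
        job_title = job_posting.find('td', class_='job-search-results-title').a.text.strip()
    except AttributeError:  # cell or link missing, so find()/.a returned None
        job_title = ""

    # Department: first <li> in the department cell
    try:
        department = job_posting.find('td', class_='job-search-results-department').ul.li.text.strip()
    except AttributeError:
        department = ""

    # Location: first <li> in the location cell (any further locations are ignored)
    try:
        location_elem = job_posting.find('td', class_='job-search-results-location').ul.find_all('li')
        location = location_elem[0].text.strip() if location_elem else ""
    except AttributeError:
        location = ""

    return job_title, department, location
```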
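To test the extraction logic without hitting the live site, you can run it against a saved copy of one row. The sketch below is a hypothetical variant (`extract_job_record` is not part of the original notebook) that uses CSS selectors via `select_one()` and `select()`, returns a dict, and also captures the posting URL from the link's `href`. Unlike `extract_job_data()`, which keeps only the first department and location, it joins all of them with `"; "`; that convention is an assumption chosen so multiple values fit in one CSV cell.

```python
from bs4 import BeautifulSoup

# The first results row from the `jobs` output, used as offline test input
ROW_HTML = """
<table class="table"><tbody>
<tr data-action="click-&gt;jobs--table-results#navigate">
 <td class="job-search-results-title">
 <a aria-label="Title: Staff Machine Learning Scientist (Search Science) " href="https://careers.coursera.com/jobs/staff-machine-learning-scientist-search-science-united-states" id="link_job_title_1_0_0">
           Staff Machine Learning Scientist (Search Science)
 </a> </td>
 <td class="job-search-results-department" id="department_1_0_0">
 <ul><li aria-label="Department: Data Science" id="department_1_0_0_0">Data Science
 </li></ul>
 </td>
 <td class="job-search-results-location">
 <ul><li aria-label="Location: United States" id="location_1_0_0_0">United States</li></ul>
 </td>
</tr>
</tbody></table>
"""


def extract_job_record(row):
    """Hypothetical variant of extract_job_data(): returns a dict, adds the
    posting URL, and keeps every listed department and location."""
    # select_one() returns None instead of raising when nothing matches
    link = row.select_one("td.job-search-results-title a")
    departments = [li.get_text(strip=True) for li in row.select("td.job-search-results-department li")]
    locations = [li.get_text(strip=True) for li in row.select("td.job-search-results-location li")]
    return {
        "title": link.get_text(strip=True) if link is not None else "",
        "url": link.get("href", "") if link is not None else "",
        # Join multiple values with "; " so they fit in a single CSV cell
        "department": "; ".join(departments),
        "location": "; ".join(locations),
    }


sample_row = BeautifulSoup(ROW_HTML, "html.parser").select_one("tr")
record = extract_job_record(sample_row)
print(record["title"], "|", record["location"])
```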

## Task 4 – Define the main function:

*  Create a `main` function that extracts the data from every job row, saves it to `job_listings.csv`, and returns the extracted records.



```python
def main(jobs):
    """Extract data from every job row, save it to job_listings.csv, and return it."""
    # Extract a (title, department, location) tuple from each row
    job_data_list = [extract_job_data(posting) for posting in jobs]

    # newline='' lets the csv module control line endings, as its docs recommend
    with open('job_listings.csv', 'w', newline='', encoding='utf-8') as csvfile:
        fieldnames = ['Title', 'Department', 'Location']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for job_data in job_data_list:
            # Map each tuple onto the CSV column names
            writer.writerow(dict(zip(fieldnames, job_data)))

    print("Success! Job data extracted and saved to CSV.")
    return job_data_list
```
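Because `main()` writes straight to `job_listings.csv`, it can help to check the CSV step on its own. The sketch below writes rows to an in-memory `io.StringIO` buffer with `csv.writer` and reads them back with `csv.DictReader`. The helper names `write_jobs_csv` and `read_jobs_csv` are hypothetical; the sample rows are copied from the `main(jobs)` output in this notebook.

```python
import csv
import io

FIELDNAMES = ["Title", "Department", "Location"]


def write_jobs_csv(rows, fileobj):
    """Hypothetical helper: write a header plus (title, department, location) tuples."""
    writer = csv.writer(fileobj)
    writer.writerow(FIELDNAMES)
    writer.writerows(rows)


def read_jobs_csv(fileobj):
    """Hypothetical helper: read the CSV back as a list of dicts keyed by the header."""
    return list(csv.DictReader(fileobj))


# Two rows copied from the main(jobs) output
sample_jobs = [
    ("Staff Machine Learning Scientist (Search Science)", "Data Science", "United States"),
    ("Enrollment Counselor, LATAM", "Marketing", "Colombia"),
]

# newline="" mirrors the open(..., newline='') call in main()
buffer = io.StringIO(newline="")
write_jobs_csv(sample_jobs, buffer)
buffer.seek(0)  # rewind before reading
round_trip = read_jobs_csv(buffer)
print(round_trip[1]["Title"])  # Enrollment Counselor, LATAM
```

The second title contains a comma, so the `csv` module wraps it in double quotes when writing and removes them when reading. This is why writing rows with the `csv` module is safer than joining strings with commas.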

```python
main(jobs)
```

```
Success! Job data extracted and saved to CSV.

[('Staff Machine Learning Scientist (Search Science)',
  'Data Science',
  'United States'),
 ('Staff Accountant', 'Accounting', 'India'),
 ('Data Science Content Strategist', 'Credentials & Content', 'India'),
 ('SaaS Integrations Specialist', 'Product & Content Services', 'Canada'),
 ('ServiceNow Developer', 'IT', 'India'),
 ('Chief of Staff- CRO', 'Product Management', 'United States'),
 ('Machine Learning Engineer', 'Data Science', 'India'),
 ('Enrollment Counselor, LATAM', 'Marketing', 'Colombia'),
 ('Manager, Enrollment Services', 'Degrees Marketing', 'Colombia'),
 ('Senior Tax Analyst', 'Accounting', 'United States')]
```
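As a quick sanity check on the scraped results, you can count postings per location and department with `collections.Counter` and look for rows where a field came back empty. In the sketch below the list is copied from the output above; in the notebook you could use `results = main(jobs)` instead.

```python
from collections import Counter

# Tuples returned by main(jobs), copied from the notebook output
results = [
    ("Staff Machine Learning Scientist (Search Science)", "Data Science", "United States"),
    ("Staff Accountant", "Accounting", "India"),
    ("Data Science Content Strategist", "Credentials & Content", "India"),
    ("SaaS Integrations Specialist", "Product & Content Services", "Canada"),
    ("ServiceNow Developer", "IT", "India"),
    ("Chief of Staff- CRO", "Product Management", "United States"),
    ("Machine Learning Engineer", "Data Science", "India"),
    ("Enrollment Counselor, LATAM", "Marketing", "Colombia"),
    ("Manager, Enrollment Services", "Degrees Marketing", "Colombia"),
    ("Senior Tax Analyst", "Accounting", "United States"),
]

by_location = Counter(location for _title, _department, location in results)
by_department = Counter(department for _title, department, _location in results)

# extract_job_data() returns "" for a missing field, so look for empty strings
rows_with_gaps = [row for row in results if "" in row]

print(by_location.most_common())
# [('India', 4), ('United States', 3), ('Colombia', 2), ('Canada', 1)]
print(len(rows_with_gaps))  # 0
```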

## Conclusions:

This web scraping project takes a systematic, modular approach to extracting job data. `requests` downloads the Coursera careers search page, BeautifulSoup parses it, `extract_job_data()` pulls the title, department, and location from each results row, and `main()` writes the records to `job_listings.csv`. Each field lookup is wrapped in a `try`/`except AttributeError` block, so a row with a missing cell yields an empty string instead of stopping the scraper. As a portfolio piece, it demonstrates practical Python and HTML-parsing skills and a focus on clean, readable code. Future improvements might handle more complex website structures (for example, results split across several pages), capture every department and location listed for a posting, or integrate additional data sources for a more comprehensive job search tool.

