# Web Scraping Job Vacancies

## Introduction

In this project, we'll build a web scraper to extract job listings from a popular job search platform. We'll extract job titles, companies, locations, job descriptions, and other relevant information.

Here are the main steps we'll follow in this project:

1. Set up our development environment
2. Understand the basics of web scraping
3. Analyze the website structure of our job search platform
4. Write the Python code to extract job data from our job search platform
5. Save the data to a CSV file
6. Test our web scraper and refine our code as needed

## Prerequisites

Before starting this project, you should have some basic knowledge of Python programming and HTML structure. The code in this project uses the following Python modules:

- requests (third-party)
- BeautifulSoup (third-party; published on PyPI as `beautifulsoup4` and imported from `bs4`)
- csv (standard library)
- datetime (standard library)

The third-party packages should already be installed in Coursera's Jupyter Notebook environment. If you are working off platform, or need a package that this environment does not include, install it with `!pip install packagename` in a notebook cell, for example:

- `!pip install requests`
- `!pip install beautifulsoup4`
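
Before installing anything, it can help to check which modules the current environment already provides. The sketch below uses only the standard library; the helper name `missing_packages` is made up for this example and is not part of any package.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# These are import names, not PyPI names: BeautifulSoup is imported as "bs4"
required = ["requests", "bs4", "csv", "datetime"]
print("Missing modules:", missing_packages(required) or "none")
```

Any name reported as missing is a candidate for `!pip install`, using the package's PyPI name rather than its import name.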

## Step 1: Import Required Libraries

In [21]:
import requests
from bs4 import BeautifulSoup
import csv
from datetime import datetime

## Step 2: Analyze the Website Structure

1. Visit the URL: https://www.jobs.ch/en/vacancies/?location=zurich&term=data%20analyst.
2. Inspect the page (right-click → Inspect) to understand the HTML structure.
3. Identify the tags and classes for job titles, companies, locations, and descriptions.

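For jobs.ch, treat the following as starting points and confirm each one in the inspector, because the markup changes over time:

1. Job titles are usually in heading tags such as "h2", or in "span" tags styled as headings.
2. Company names and locations are often in "span", "p", or "div" tags.
3. Job descriptions might be in "div" tags with specific classes.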
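
Besides the browser inspector, a quick tally of tag and class combinations can show which elements repeat once per listing. The sketch below runs on a small made-up HTML snippet; the class names `card`, `title`, and `company` are placeholders, not the real jobs.ch classes, and `summarize_tags` is a hypothetical helper.

```python
from collections import Counter

from bs4 import BeautifulSoup


def summarize_tags(html, top=10):
    """Count (tag name, class string) pairs to reveal repeated page elements."""
    soup = BeautifulSoup(html, "html.parser")
    counts = Counter()
    for tag in soup.find_all(True):  # True matches every tag
        classes = " ".join(tag.get("class", []))
        counts[(tag.name, classes)] += 1
    return counts.most_common(top)


# Made-up markup standing in for a downloaded results page
sample_html = """
<div class="card"><span class="title">Data Analyst</span><p class="company">Acme AG</p></div>
<div class="card"><span class="title">BI Analyst</span><p class="company">Example GmbH</p></div>
"""

for (name, classes), count in summarize_tags(sample_html):
    print(f"{count:>3}  <{name} class='{classes}'>")
```

Combinations whose count equals the number of listings visible on the page are good candidates for the listing container and its fields.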

## Step 3: Write Python Code to Extract Job Data

In [None]:
# Define the URL of the job search page
url = "https://www.jobs.ch/en/vacancies/?term=data%20analyst"

# Send a GET request to the URL
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    print("Successfully fetched the webpage!")
else:
    print(f"Failed to fetch the webpage. Status code: {response.status_code}")

# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')
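
The cell above prints a message on failure but still goes on to parse the response. One possible variant, sketched here, wraps the request in a function that sets a timeout and raises an exception on HTTP errors. The function name `fetch_html`, the `get` parameter (there only so the function can be exercised without network access), and the User-Agent string are assumptions made for this example.

```python
import requests


def fetch_html(url, timeout=10, get=requests.get):
    """Return the page HTML as text, raising an exception on HTTP errors."""
    # Hypothetical identifier; some sites reject requests without a User-Agent
    headers = {"User-Agent": "job-scraper-tutorial/0.1"}
    response = get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return response.text


# Example call (requires network access):
# html = fetch_html("https://www.jobs.ch/en/vacancies/?term=data%20analyst")
```

Raising instead of printing means a failed request stops the notebook cell before any parsing is attempted on an error page.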

In [None]:
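# Find all job listings on the page. The class string is copied from the inspector
# and must match the element's class attribute exactly, so update it if the site changes.
job_listings = soup.find_all('div', class_='d_flex bg-c_brand.01 bdr_r16 flex-d_column h_100% p_s16 pos_relative')
print(f"Found {len(job_listings)} job listings")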
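
Passing a multi-class string to `class_` matches only elements whose class attribute is exactly that string, in that order. A CSS selector is one alternative that tolerates reordered or additional classes. The sketch below compares the two on made-up markup that reuses a few of the class names from the cell above.

```python
from bs4 import BeautifulSoup

# Made-up markup: the same classes in a different order, plus one extra class
sample_html = """
<div class="d_flex bdr_r16 p_s16">Card A</div>
<div class="p_s16 d_flex bdr_r16 extra">Card B</div>
"""
soup = BeautifulSoup(sample_html, "html.parser")

# Exact-string match: only Card A has precisely this class attribute
exact = soup.find_all("div", class_="d_flex bdr_r16 p_s16")

# CSS selector: any div carrying all three classes, in any order
by_selector = soup.select("div.d_flex.bdr_r16.p_s16")

print(len(exact), len(by_selector))
```

The selector here deliberately uses only class names that are plain CSS identifiers. Names containing characters such as `.` or `%` (for example `bg-c_brand.01` or `h_100%`) would need escaping inside a selector.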

In [None]:
# Initialize a list to store job data
jobs = []

# Loop through each job listing and extract details
for job in job_listings:
    # Extract job title (find() returns None when nothing matches, so check before reading .text)
    title_tag = job.find('span', class_='c_gray.90 mb_s4 mr_s8 textStyle_h4 wb_break-word ov_hidden tov_ellipsis d_-webkit-box hy_auto ov-wrap_break-word white-space_normal word-wrap_break-word box-orient_vertical lc_4')
    title = title_tag.text.strip() if title_tag else "N/A"
    
    # Extract company name
    company_tag = job.find('p', class_='mb_s12 lastOfType:mb_s0 textStyle_p2 c_gray.90')
    company = company_tag.text.strip() if company_tag else "N/A"
    
    # Extract job location
    location_tag = job.find('p', class_='mb_s12 lastOfType:mb_s0 textStyle_p2')
    location = location_tag.text.strip() if location_tag else "N/A"
    
    # Extract job description (if available)
    #description_tag = job.find('div', class_='vacancy-card__description')
    #description = description_tag.text.strip() if description_tag else "No description available"
    
    # Append the job details to the list
    jobs.append({
        'Title': title,
        'Company': company,
        'Location': location
        #'Description': description
    })

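# Print the first job to verify (guarding against an empty result)
print(jobs[0] if jobs else "No jobs extracted; re-check the tag names and classes")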
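
The same three lookups are repeated in the pagination loop in Step 5, so one option is to move them into a small function that also tolerates missing tags. This is a sketch on made-up markup: the class names `card`, `title`, `company`, and `location` are placeholders for the real jobs.ch class strings, and both helper names are hypothetical.

```python
from bs4 import BeautifulSoup


def text_or_default(card, tag_name, class_name, default="N/A"):
    """Return the stripped text of the first matching tag, or a default if absent."""
    tag = card.find(tag_name, class_=class_name)
    return tag.get_text(strip=True) if tag else default


def parse_job(card):
    """Build one job record from a listing element."""
    return {
        "Title": text_or_default(card, "span", "title"),
        "Company": text_or_default(card, "p", "company"),
        "Location": text_or_default(card, "p", "location"),
    }


# Made-up listings; the second one has no location tag
sample_html = """
<div class="card"><span class="title"> Data Analyst </span><p class="company">Acme AG</p><p class="location">Zurich</p></div>
<div class="card"><span class="title">BI Analyst</span><p class="company">Example GmbH</p></div>
"""
soup = BeautifulSoup(sample_html, "html.parser")
jobs = [parse_job(card) for card in soup.find_all("div", class_="card")]
print(jobs)
```

With a helper like this, a listing that lacks one field yields a placeholder value instead of stopping the whole run with an `AttributeError`.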

## Step 4: Save the Data to a CSV File

Now that we’ve extracted the data, let’s save it to a CSV file for further analysis.

In [None]:
# Define the filename with a timestamp
filename = f"job_postings_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"

# Write the data to a CSV file
with open(filename, mode='w', newline='', encoding='utf-8') as file:
    writer = csv.DictWriter(file, fieldnames=['Title', 'Company', 'Location'])
    writer.writeheader()
    writer.writerows(jobs)

print(f"Data saved to {filename}")
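
A quick way to confirm the file was written as intended is to read it back with `csv.DictReader`. The sketch below writes one made-up row to a temporary file first so that it runs on its own; in the notebook you would pass the `filename` from the cell above instead. The helper name `read_jobs` is hypothetical.

```python
import csv
import os
import tempfile


def read_jobs(path):
    """Load a jobs CSV back into a list of dictionaries, one per row."""
    with open(path, newline="", encoding="utf-8") as file:
        return list(csv.DictReader(file))


# Made-up row written to a temporary file so the example is self-contained
sample_jobs = [{"Title": "Data Analyst", "Company": "Acme AG", "Location": "Zurich"}]
path = os.path.join(tempfile.mkdtemp(), "job_postings_sample.csv")
with open(path, mode="w", newline="", encoding="utf-8") as file:
    writer = csv.DictWriter(file, fieldnames=["Title", "Company", "Location"])
    writer.writeheader()
    writer.writerows(sample_jobs)

rows = read_jobs(path)
print(f"{len(rows)} row(s), columns: {list(rows[0].keys())}")
```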

## Step 5: Test and Refine the Code

1. Run the code and check the output CSV file to ensure it contains the expected data.
2. If the website has pagination (multiple pages of job listings), modify the code to loop through all pages. For example:

In [None]:
base_url = "https://www.jobs.ch/en/vacancies/?term=data%20analyst"
jobs = []

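# Loop through the first 5 pages (adjust as needed)
for page in range(1, 6):  # Scrape pages 1 to 5
    # Add the page number as a query parameter. The parameter name "page" is an
    # assumption; confirm it against the URL your browser shows on the second page.
    url = f"{base_url}&page={page}"
    response = requests.get(url)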
    soup = BeautifulSoup(response.content, 'html.parser')
    job_listings = soup.find_all('div', class_='d_flex bg-c_brand.01 bdr_r16 flex-d_column h_100% p_s16 pos_relative')
    
    for job in job_listings:
        title = job.find('span', class_='c_gray.90 mb_s4 mr_s8 textStyle_h4 wb_break-word ov_hidden tov_ellipsis d_-webkit-box hy_auto ov-wrap_break-word white-space_normal word-wrap_break-word box-orient_vertical lc_4').text.strip()
        company = job.find('p', class_='mb_s12 lastOfType:mb_s0 textStyle_p2 c_gray.90').text.strip()
        location = job.find('p', class_='mb_s12 lastOfType:mb_s0 textStyle_p2').text.strip()
        
        jobs.append({
            'Title': title,
            'Company': company,
            'Location': location
        })

# Save the updated data to a CSV file
filename = f"job_postings_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
with open(filename, mode='w', newline='', encoding='utf-8') as file:
    writer = csv.DictWriter(file, fieldnames=['Title', 'Company', 'Location'])
    writer.writeheader()
    writer.writerows(jobs)

print(f"Data saved to {filename}")
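
A fixed page count can overshoot when a search returns fewer pages. One possible refinement, sketched below, stops at the first page that yields no listings and pauses between requests. The function `collect_pages` and its `fetch_page` callback are hypothetical: in the notebook, `fetch_page` would wrap the request and parsing code for a single page, while here a dictionary stands in for the website.

```python
import time


def collect_pages(fetch_page, max_pages=5, delay_seconds=1.0):
    """Gather jobs page by page, stopping early when a page comes back empty."""
    all_jobs = []
    for page in range(1, max_pages + 1):
        page_jobs = fetch_page(page)
        if not page_jobs:
            break  # no listings on this page, so assume the results have ended
        all_jobs.extend(page_jobs)
        if page < max_pages:
            time.sleep(delay_seconds)  # pause so the server is not hit in a tight loop
    return all_jobs


# Stand-in for the real site: two pages of made-up results, then nothing
fake_site = {
    1: [{"Title": "Data Analyst", "Company": "Acme AG", "Location": "Zurich"}],
    2: [{"Title": "BI Analyst", "Company": "Example GmbH", "Location": "Bern"}],
}
result = collect_pages(lambda page: fake_site.get(page, []), max_pages=5, delay_seconds=0)
print(f"Collected {len(result)} jobs")
```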

## Step 6: Analyze and Visualize the Data

Once the data is saved, you can analyze it using libraries like pandas and matplotlib. For example:
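
A minimal sketch with pandas, assuming it is installed. The rows below are made up so that the example runs on its own; with real data you would load the saved file with `pd.read_csv(filename)` instead.

```python
import pandas as pd

# Made-up rows in the same shape as the scraped data
sample_jobs = [
    {"Title": "Data Analyst", "Company": "Acme AG", "Location": "Zurich"},
    {"Title": "BI Analyst", "Company": "Example GmbH", "Location": "Bern"},
    {"Title": "Senior Data Analyst", "Company": "Acme AG", "Location": "Zurich"},
]
df = pd.DataFrame(sample_jobs)

# Count how many postings each location has, most common first
jobs_per_location = df["Location"].value_counts()
print(jobs_per_location)

# With matplotlib installed, the same counts can be drawn as a bar chart:
# jobs_per_location.plot(kind="bar")
```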