# Web Scraping Job Vacancies

## Introduction

In this project, we'll build a web scraper that extracts job listings from a job search platform (the code below uses remoteok.com). For each listing, we'll collect the job title, company, location, salary, skill tags, and a link to the posting.

Here are the main steps we'll follow in this project:

1. Set up our development environment
2. Understand the basics of web scraping
3. Analyze the website structure of our job search platform
4. Write the Python code to extract job data from our job search platform
5. Save the data to a CSV file
6. Test our web scraper and refine our code as needed

## Prerequisites

Before starting this project, you should have some basic knowledge of Python programming and HTML structure. In addition, you may want to use the following packages in your Python environment:

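- requests
- BeautifulSoup (installed as `beautifulsoup4`, imported from `bs4`)
- csv (part of the Python standard library)
- datetime (part of the Python standard library)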

These packages should already be installed in Coursera's Jupyter Notebook environment. If you are working off-platform, or need a package that this environment does not include, you can install it by running `!pip install packagename` in a notebook cell, for example:

- `!pip install requests`
- `!pip install beautifulsoup4`
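
Before running the scraper, it can help to confirm that the packages are importable. The sketch below is one way to do that using only the standard library. The helper name `missing_packages` is hypothetical, and note that BeautifulSoup is checked under its import name, `bs4`.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Module names as they are imported, not as they are installed with pip.
required = ["requests", "bs4", "csv", "datetime"]
missing = missing_packages(required)

if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```

If anything is reported as missing, install it with `!pip install` as described above and run the check again.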

## Step 1: Importing Required Libraries and Extracting Job Data

In [143]:
import requests
from bs4 import BeautifulSoup

url = "https://remoteok.com/remote-dev-jobs"
headers = {"User-Agent": "Mozilla/5.0"}

response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # stop early if the request failed (e.g. 403 or 500)
soup = BeautifulSoup(response.text, "html.parser")

# each job listing is a table row with the class "job"
job_rows = soup.find_all("tr", class_="job")

all_jobs = []

for job in job_rows:

    # for job title
    title = job.find("h2", itemprop="title")
    title = title.text.strip() if title else "Not found"

    # for company name
    company = job.find("h3", itemprop="name")
    company = company.text.strip() if company else "Not found"

    # for location and salary
    location = "Not found"
    salary = "Not found"

    location_and_salary = job.find_all("div", class_="location")

    for div in location_and_salary:
        text = div.text.strip()
        if "$" in text or "€" in text or "Salary" in text:
            salary = text
        else:
            if location == "Not found":
                location = text
            else:
                location += ", " + text

    # for skills (tags)
    skill_tags = job.find_all("div", class_="tag")
    skills = [tag.text.strip() for tag in skill_tags]
    if not skills:
        skills = ["Not found"]

    # for links
    link = job.get("data-href")
    link = "https://remoteok.com" + link if link else "Not found"

    # print all results
    print(f"Title    : {title}")
    print(f"Company  : {company}")
    print(f"Location : {location}")
    print(f"Salary   : {salary}")
    print(f"Skills   : {', '.join(skills)}")
    print(f"Link     : {link}")
    print("-" * 50)

    all_jobs.append([title, company, location, salary, ", ".join(skills), link])

Title    : Senior Fullstack Software Engineer
Company  : Blotato
Location : 🌏 Worldwide
Salary   : 💰 $80k - $120k
Skills   : JavaScript, Typescript, Heroku, AWS
Link     : https://remoteok.com/remote-jobs/remote-senior-fullstack-software-engineer-blotato-1093150
--------------------------------------------------
Title    : Typescript Engineer
Company  : wander.com
Location : 🌏 Worldwide
Salary   : 💰 $50k - $200k
Skills   : Not found
Link     : https://remoteok.com/remote-jobs/remote-typescript-engineer-wander-com-1093244
--------------------------------------------------
Title    : Go web dev and infra devops
Company  : SaasEdge
Location : 🌏 Worldwide, ⏰ Contractor
Salary   : 💰 $110k - $140k
Skills   : Golang, Linux, CSS, Nginx, Cybersecurity, Htmx, Dns, Nginx
Link     : https://remoteok.com/remote-jobs/remote-go-web-dev-and-infra-devops-saasedge-1093359
--------------------------------------------------
Title    : Lead Data Engineer
Company  : Open Architects
Location : 🇺🇸 United Stat
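
The location and salary logic in the loop above can also be pulled out into a small function, which makes it possible to test the rule without requesting the website. The following is a minimal sketch under that assumption. Both helper names, `split_location_and_salary` and `unique_in_order`, are hypothetical, and they work on plain strings rather than BeautifulSoup objects. The second helper addresses the repeated skill visible in the output ("Nginx" appears twice for one listing).

```python
def split_location_and_salary(texts, default="Not found"):
    """Separate salary-like strings from location-like strings.

    Mirrors the rule used in the scraping loop: any text containing
    "$", "€" or "Salary" is treated as the salary, and everything else
    is joined into the location.
    """
    locations = []
    salary = default
    for text in texts:
        if "$" in text or "€" in text or "Salary" in text:
            salary = text  # a later salary-like text overwrites an earlier one
        else:
            locations.append(text)
    location = ", ".join(locations) if locations else default
    return location, salary


def unique_in_order(items):
    """Drop repeated items while keeping the first occurrence of each."""
    return list(dict.fromkeys(items))


# Sample values taken from the output shown above
location, salary = split_location_and_salary(
    ["🌏 Worldwide", "⏰ Contractor", "💰 $110k - $140k"]
)
skills = unique_in_order(
    ["Golang", "Linux", "CSS", "Nginx", "Cybersecurity", "Htmx", "Dns", "Nginx"]
)

print(location)
print(salary)
print(", ".join(skills))
```

Inside the loop, the first helper could be called as `split_location_and_salary([div.text.strip() for div in location_and_salary])`.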

In [144]:
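import csv
import re

def remove_emojis(text):
    """Remove emojis and other symbols, keeping letters, digits, whitespace, and , . - / $ €."""
    if not text:
        return text
    # strip() drops the leading space left behind when an emoji prefix is removed
    return re.sub(r'[^\w\s,.\-/$€]', '', text).strip()

# Write the scraped jobs to a CSV file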

with open("remote_jobs.csv", "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["Title", "Company", "Location", "Salary", "Skills", "Link"])
    for job in all_jobs:
        # Clean emojis only for the CSV. The link (last column) is written as-is,
        # because remove_emojis would also strip the ":" from "https://".
        *fields, link = job
        writer.writerow([remove_emojis(field) for field in fields] + [link])
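
After writing the file, a quick way to check the result is to read it back with `csv.DictReader`. The sketch below does this on an in-memory buffer with a single sample row, so it runs without the scraper. The sample row is based on the output shown earlier, with the emojis already removed. With the real file, you would pass the handle from `open("remote_jobs.csv", newline="", encoding="utf-8")` to `csv.DictReader` instead of `buffer`.

```python
import csv
import io

header = ["Title", "Company", "Location", "Salary", "Skills", "Link"]

# One sample row, used here in place of the scraped all_jobs list
sample_rows = [
    [
        "Typescript Engineer",
        "wander.com",
        "Worldwide",
        "$50k - $200k",
        "Not found",
        "https://remoteok.com/remote-jobs/remote-typescript-engineer-wander-com-1093244",
    ],
]

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(sample_rows)

# Read the data back; each record is a dict keyed by the header row
buffer.seek(0)
records = list(csv.DictReader(buffer))

for record in records:
    print(record["Title"], "|", record["Company"], "|", record["Salary"])
```

Checking that the number of records matches `len(all_jobs)` and that every link still starts with `https://` is a simple sanity test for the export.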
