# Web Scraping Job Vacancies

## Introduction

In this project, we'll build a web scraper to extract job listings from a popular job search platform. We'll extract job titles, companies, locations, job descriptions, and other relevant information.

Here are the main steps we'll follow in this project:

1. Set up our development environment
2. Understand the basics of web scraping
3. Analyze the website structure of our job search platform
4. Write the Python code to extract job data from our job search platform
5. Save the data to a CSV file
6. Test our web scraper and refine our code as needed

## Prerequisites

Before starting this project, you should have basic knowledge of Python programming and HTML structure. The project uses the following packages and modules:

- requests
- BeautifulSoup (the `beautifulsoup4` package, imported as `bs4`)
- pandas
- csv (standard library, no installation needed)
- datetime (standard library, no installation needed)

These packages should already be installed in Coursera's Jupyter Notebook environment. If you are working off-platform, or need a package that is not included, install it with `!pip install packagename` in a notebook cell, for example:

- `!pip install requests`
- `!pip install beautifulsoup4`
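
Before installing anything, it can help to check which packages are already importable. The following is a minimal sketch using only the standard library; the `REQUIRED` mapping and the `missing_packages` helper are illustrative names, not part of the original notebook. Note that the import name and the pip name can differ (`bs4` versus `beautifulsoup4`).

```python
from importlib.util import find_spec

# Maps import name -> pip package name (the two can differ).
# This mapping is an assumption based on the imports used in this notebook.
REQUIRED = {
    "requests": "requests",
    "bs4": "beautifulsoup4",
    "pandas": "pandas",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for import_name, pip_name in required.items()
        if find_spec(import_name) is None
    ]


# Prints an empty list when everything is already installed.
print(missing_packages())
```

Each name in the printed list can then be installed with `!pip install <name>` in a notebook cell.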

## Step 1: Importing Required Libraries

In [112]:
import requests
from bs4 import BeautifulSoup
import pandas as pd



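In [113]:
url = "https://weworkremotely.com/"
# Fetch the homepage; the timeout keeps the request from hanging indefinitely.
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
soup = BeautifulSoup(response.content, 'html.parser')
html = soup.prettify()  # indented copy of the markup, for inspection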

In [114]:
print(html)

<!DOCTYPE html>
<html>
 <head>
  <meta charset="utf-8"/>
  <script>
   window.NREUM||(NREUM={});NREUM.info={"beacon":"bam.nr-data.net","errorBeacon":"bam.nr-data.net","licenseKey":"f7ae79e7ca","applicationID":"192262830","transactionName":"d1gPFhEMXVVWQxwMDVZETgsNB1RB","queueTime":2,"applicationTime":801,"agent":""}
  </script>
  <script>
   (window.NREUM||(NREUM={})).init={ajax:{deny_list:["bam.nr-data.net"]}};(window.NREUM||(NREUM={})).loader_config={licenseKey:"f7ae79e7ca",applicationID:"192262830"};;/*! For license information please see nr-loader-rum-1.283.2.min.js.LICENSE.txt */
  </script>
  <meta content="width=device-width" name="viewport"/>
  <link crossorigin="anonymous" href="https://cdnjs.cloudflare.com/ajax/libs/chosen/1.8.7/chosen.min.css" integrity="sha512-yVvxUQV0QESBt1SyZbNJMAwyKvFTLMyXSyBHDO4BG5t7k/Lw34tyqlSDlKIrIENIzCl+RVUNjmCPG+V/GMesRw==" rel="stylesheet"/>
  <link href="https://weworkremotely.com/assets/application-88f719eff2c409438cb547e6f0c975ca8226c1befa3d2a85

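In [130]:
# Each find_all() call returns every element that carries the given CSS class.
c_name = soup.find_all(class_="new-listing__company-name")  # company names
j = soup.find_all(class_="new-listing__header__title")  # job titles
j_type = soup.find_all(class_="new-listing__company-headquarters")  # company headquarters (not used below)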
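
Collecting each field with a separate `find_all()` call produces parallel lists that stay aligned only if every listing contains every field. A possible alternative, sketched below, is to loop over each listing container and look up the fields inside it. The sample markup, the `new-listing-container` class, and the helper names are assumptions made for illustration; only the three `new-listing__*` class names come from the notebook, and the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration. The container class, the tags,
# and the company data are made up; the real page may be structured differently.
SAMPLE_HTML = """
<ul>
  <li class="new-listing-container">
    <p class="new-listing__company-name">Example Co</p>
    <h3 class="new-listing__header__title">Backend Developer</h3>
    <p class="new-listing__company-headquarters">Berlin, Germany</p>
  </li>
  <li class="new-listing-container">
    <p class="new-listing__company-name">Sample Ltd</p>
    <h3 class="new-listing__header__title">UI Designer</h3>
  </li>
</ul>
"""


def text_or_none(parent, class_name):
    """Return the stripped text of the first match inside `parent`, or None."""
    node = parent.find(class_=class_name)
    return node.get_text(strip=True) if node is not None else None


def parse_listings(html):
    """Return one dict per listing card, keeping fields from the same card together."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all(class_="new-listing-container"):
        rows.append({
            "company name": text_or_none(card, "new-listing__company-name"),
            "Job Title": text_or_none(card, "new-listing__header__title"),
            "Headquarters": text_or_none(card, "new-listing__company-headquarters"),
        })
    return rows


rows = parse_listings(SAMPLE_HTML)
for row in rows:
    print(row)
```

With this layout, a listing that lacks a field gets `None` in that column instead of shifting the values of every later row.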



Found job posts:
https://weworkremotely.com/company/cdata-virtuality
https://weworkremotely.com/company/files-com
https://weworkremotely.com/company/used-conex-llc
https://weworkremotely.com/company/toggl
https://weworkremotely.comhttps://metana.io/full-stack-software-engineer-bootcamp/?utm_source=weworkremotely.com&utm_medium=homepage-ad
https://weworkremotely.com/company/onthegosystems
https://weworkremotely.com/company/the-hoyle-firm-pc
https://weworkremotely.com/company/subscript
https://weworkremotely.com/company/realiste
https://weworkremotely.com/company/ixdf-interaction-design-foundation
https://weworkremotely.com/company/chameleon
https://weworkremotely.com/company/spiralyze
https://weworkremotely.com/company/michael-findeisen-consulting
https://weworkremotely.com/company/6-figure-creative
https://weworkremotely.com/company/almabase
https://weworkremotely.com/company/patchstack
https://weworkremotely.com/company/close
https://weworkremotely.com/company/nogigiddy
https://wework
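
One entry in the list above, `https://weworkremotely.comhttps://metana.io/...`, is malformed: the site prefix was added in front of an href that was already an absolute URL. One way to handle both cases is `urllib.parse.urljoin` from the standard library, which resolves relative paths against a base and leaves absolute URLs unchanged. This is a sketch of that approach, not the method the notebook uses; `BASE_URL` and `absolute_link` are illustrative names.

```python
from urllib.parse import urljoin

BASE_URL = "https://weworkremotely.com"


def absolute_link(href, base=BASE_URL):
    """Resolve `href` against `base`; absolute URLs are returned unchanged."""
    return urljoin(base, href)


# A site-relative path gets the base prepended.
print(absolute_link("/company/toggl"))

# An href that is already absolute is left as it is.
print(absolute_link("https://metana.io/full-stack-software-engineer-bootcamp/"))
```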

In [132]:
# Extract the visible text of each matched element.
# find_all() returns only Tag objects, so no None check is needed.
company_name = [job.get_text(strip=True) for job in c_name]
job_title = [t.get_text(strip=True) for t in j]

# Collect the first link inside each logo tooltip div.
job_links = []
for div in soup.find_all('div', class_="tooltip--flag-logo"):
    link = div.find('a', href=True)
    if link:
        job_links.append(link['href'])

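In [136]:
# pd.DataFrame requires all three lists to have the same length.
data = {
    'company name': company_name,
    'Job Title': job_title,
    # Prefixing assumes every href is site-relative; an href that is already
    # absolute (such as an external ad link) gets the site URL glued in front.
    'Link': [f"https://weworkremotely.com{link}" for link in job_links]
}

# Create the DataFrame
df = pd.DataFrame(data)

# Display up to the first 201 rows
df.head(201)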

|  | company name | Job Title | Link |
|---|---|---|---|
| 0 | CData Virtuality | Senior Backend Developer SaaS | https://weworkremotely.com/company/cdata-virtu... |
| 1 | Files.com | UI/UX Designer | https://weworkremotely.com/company/files-com |
| 2 | Used Conex LLC | Entry Level Sales Agent - Chat Only - Side Hustle | https://weworkremotely.com/company/used-conex-llc |
| 3 | Toggl | Senior Product Designer | https://weworkremotely.com/company/toggl |
| 4 | Metana | Coding Tech Bootcamp - Job Guaranteed by Metana | https://weworkremotely.comhttps://metana.io/fu... |
| ... | ... | ... | ... |
| 196 | EveryoneSocial | Backend Engineer (profitable startup, veteran ... | https://weworkremotely.com/company/everyonesocial |
| 197 | Scalable Path | Machine Learning Engineer for Agentic AI | https://weworkremotely.com/company/scalable-path |
| 198 | SKYCATCHFIRE | Senior Django Developer | https://weworkremotely.com/company/skycatchfire |
| 199 | NERIS Analytics Limited | Experienced Backend Developer (Laravel/Vue) | https://weworkremotely.com/company/neris-analy... |