### Job Post Data Extraction and Analysis

#### Introduction

##### In the recruitment industry, sourcing job vacancies efficiently is critical for gaining a competitive edge. Manually browsing job posting sites is time-consuming and prone to missed opportunities. This project addresses this challenge by leveraging web scraping techniques to automate the extraction of job posting data from major job sites.

##### The objectives of this analysis are:

  - To increase the efficiency and accuracy of job vacancy sourcing.
  - To provide actionable insights that improve the quality of job postings delivered to clients.
  - To give clients a competitive advantage by surfacing relevant job openings faster.

##### Target Job Posting Sites:
  - LinkedIn
  - Indeed
  - Monster
  - Glassdoor

##### Our Target Data
  - Job title
  - Company name
  - Location
  - Salary (if available)
  - Job description
  - Date posted
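
The target fields above can be captured in one record type so that every site's scraper produces the same shape. This is a minimal sketch, not part of the original notebook: the `JobPosting` class, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class JobPosting:
    """One scraped job posting (illustrative schema, names are assumptions)."""
    title: str
    company: str
    location: str
    description: str = ""
    date_posted: Optional[str] = None  # kept as raw text from the site
    salary: Optional[str] = None       # None when the site shows no salary


# Hypothetical record, used only to show the shape of the data
example = JobPosting(title="Data Analyst", company="Example Corp", location="Remote")
record = asdict(example)
print(record)
```

A list of such dictionaries can be passed directly to `pd.DataFrame`, with missing salaries showing up as empty values rather than placeholder strings.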

In [15]:
# Install prerequisites

In [None]:
!pip install requests
!pip install beautifulsoup4
!pip install pandas
!pip install selenium

In [3]:
# Let's begin with the Indeed data

In [15]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

# URL for Indeed search results (Data Analyst jobs)
url = "https://www.indeed.com/jobs?q=data+analyst&l="
# Request page with a user-agent to mimic browser behavior
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}
# Set a timeout so the request cannot hang indefinitely
response = requests.get(url, headers=headers, timeout=30)

# Parse HTML
soup = BeautifulSoup(response.text, 'html.parser')
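
The search URL above is hard-coded for one query. Below is a small sketch of building the same URL from a keyword and an optional location, assuming only the `q` and `l` parameters seen in the URL in this notebook. The helper name `build_search_url` and the constant `INDEED_SEARCH_BASE` are hypothetical.

```python
from urllib.parse import urlencode

# Base of the search URL used in this notebook
INDEED_SEARCH_BASE = "https://www.indeed.com/jobs"


def build_search_url(query: str, location: str = "") -> str:
    """Build a search URL from the q (keywords) and l (location) parameters."""
    return f"{INDEED_SEARCH_BASE}?{urlencode({'q': query, 'l': location})}"


url_example = build_search_url("data analyst")
print(url_example)
```

`urlencode` escapes spaces and special characters, so multi-word job titles and locations do not need to be encoded by hand.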



In [31]:
for string in soup.strings:
    print(repr(string))

'Security Check - Indeed.com'
'\n'
' '
' '
' '
' '
' '
' '
' '
' '
'Find jobs'
' '
' '
' '
'Company reviews'
' '
' '
' '
'Find salaries'
' '
' '
' '
' '
'Sign in'
' '
' '
' '
' '
' '
' '
' '
'Upload your resume'
' '
' '
' '
'Sign in'
' '
' '
' '
'Employers / Post Job'
' '
' '
' '
'Find jobs'
' '
' '
' '
'Company reviews'
' '
' '
' '
'Find salaries'
' '
' '
' '
' '
' '
' '
'Additional Verification Required'
' '
'Enable JavaScript and cookies to continue'
' '
'Your Ray ID for this request is '
'8ff6b679bfe49aa2'
' '
'Need more help? '
'Contact us'
' '
' '
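
The output above shows that the request returned a verification page rather than search results. One way to catch this before parsing is to check the HTML for the phrases visible in that output. This is a rough sketch, not a reliable detector: the helper `looks_blocked` and the marker list are assumptions drawn only from the strings printed above.

```python
# Phrases taken from the verification page printed above
BLOCK_MARKERS = (
    "Security Check",
    "Additional Verification Required",
    "Enable JavaScript and cookies to continue",
)


def looks_blocked(html: str) -> bool:
    """Return True if the page text contains any known verification marker."""
    lowered = html.lower()
    return any(marker.lower() in lowered for marker in BLOCK_MARKERS)


# Small inline samples to show the behavior
blocked_sample = "<title>Security Check - Indeed.com</title>"
normal_sample = "<div class='job_seen_beacon'>Data Analyst</div>"
print(looks_blocked(blocked_sample), looks_blocked(normal_sample))
```

In this notebook the check would be called as `looks_blocked(response.text)` right after the request, so that the extraction step is skipped when no real results were returned.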


In [None]:
# Extract job postings
def text_or_default(parent, tag, class_name, default):
    """Return the stripped text of the first matching element, or a default."""
    element = parent.find(tag, class_=class_name)
    return element.text.strip() if element else default

jobs = []
for card in soup.find_all('div', class_='job_seen_beacon'):
    jobs.append({
        "Job Title": text_or_default(card, 'h2', 'jobTitle', 'No Title'),
        "Company": text_or_default(card, 'span', 'companyName', 'No Company'),
        "Location": text_or_default(card, 'div', 'companyLocation', 'No Location'),
    })

# Convert to DataFrame for better organization
df = pd.DataFrame(jobs)

# Display the data (optional)
print(df)

# Save to CSV only if postings were found; the security-check page yields none
if df.empty:
    print("No job cards found; the response was likely the security-check page.")
else:
    df.to_csv('indeed_jobs.csv', index=False)