# Job Listings Scraper
## Writing the Web Scraping Script for Indeed.com

In [42]:
import requests
from bs4 import BeautifulSoup
import pandas as pd


- **requests:** For making HTTP requests to fetch web page content.
- **BeautifulSoup:** For parsing the HTML content and extracting data.
- **pandas:** For storing and manipulating the extracted data.

In [43]:
# Sending a GET request to the website

In [50]:
url = 'https://de.indeed.com/jobs?q=Data%20Analyst&l=&from=searchOnDesktopSerp'
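response = requests.get(url, timeout=10)  # timeout stops the call from hanging indefinitely

print(response.status_code)  # 200 means success; 403 means the server refused (blocked) the request
print(response.text[:500])  # Print the first 500 characters of the page content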



403
<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><title>Blocked - Indeed.com</title>...


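- **url:** The Indeed search-results page to scrape: a "Data Analyst" query on de.indeed.com, with the space URL-encoded as %20.
- **requests.get(url):** Sends an HTTP GET request to the URL and returns a Response object, stored in response. The page HTML is available as response.text (decoded string) or response.content (raw bytes).
- **response.status_code:** The HTTP status of the reply. Here it is 403 (Forbidden), and the returned HTML is Indeed's "Blocked" page rather than the search results, so the parsing steps below have no job listings to work on.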
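Rather than hand-encoding the query string, the same search URL can be assembled from its parameters with Python's standard library. This is a minimal sketch: the helper name `build_search_url` is hypothetical, and the parameter names `q`, `l` and `from` are simply the ones visible in the URL above, not a documented Indeed API.

```python
from urllib.parse import quote, urlencode

# Base endpoint taken from the URL used above
BASE_URL = "https://de.indeed.com/jobs"


def build_search_url(query: str, location: str = "") -> str:
    """Hypothetical helper: build an Indeed search URL from its query parameters."""
    params = {
        "q": query,                     # search term
        "l": location,                  # location filter (empty = anywhere)
        "from": "searchOnDesktopSerp",  # copied verbatim from the original URL
    }
    # quote_via=quote encodes spaces as %20 (the default, quote_plus, would use +)
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


print(build_search_url("Data Analyst"))
```

With the defaults this reproduces the URL used in the cell above, and changing the search term or location no longer requires editing the encoded string by hand.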

In [45]:
# Parsing the HTML content
soup = BeautifulSoup(response.content, 'html.parser')


- **BeautifulSoup(response.content, 'html.parser'):** Parses the raw HTML bytes into a navigable tree of tags. 'html.parser' selects the parser built into Python's standard library, so no extra parser package needs to be installed.
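Because the request above returned Indeed's "Blocked" page (status 403), it is worth checking what was actually parsed before extracting anything. A minimal sketch, assuming the block page can be recognised by its `<title>` as in the output above; `is_blocked_page` is a hypothetical helper, and a title check is only a heuristic.

```python
from bs4 import BeautifulSoup


def is_blocked_page(html: str) -> bool:
    """Hypothetical check: True if the page <title> starts with 'Blocked'."""
    soup = BeautifulSoup(html, "html.parser")
    # soup.title is the first <title> tag in the document, or None if there is none
    if soup.title is None:
        return False
    return soup.title.get_text(strip=True).startswith("Blocked")


blocked = "<html><head><title>Blocked - Indeed.com</title></head></html>"
print(is_blocked_page(blocked))  # True
```

In the notebook this would be called as `is_blocked_page(response.text)` right after the request, so the extraction step is skipped when there is nothing to extract.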

In [52]:
# Extracting job listings
job_titles = []
companies = []
locations = []
summaries = []


def text_or_none(parent, tag, class_name):
    """Return the stripped text of the first matching element, or None if it is missing."""
    element = parent.find(tag, class_=class_name)
    return element.text.strip() if element is not None else None


# These class names may be outdated: Indeed changes its markup over time
for job_card in soup.find_all('div', class_='jobsearch-SerpJobCard'):
    job_titles.append(text_or_none(job_card, 'a', 'jobtitle'))
    companies.append(text_or_none(job_card, 'span', 'company'))
    locations.append(text_or_none(job_card, 'div', 'location'))
    summaries.append(text_or_none(job_card, 'div', 'summary'))


- **.find_all('div', class_='jobsearch-SerpJobCard'):** Returns a list of all div elements with that class, each representing one job card (an empty list if nothing matches).
- **.find():** Returns the first matching element inside a job card, or None if there is no match; used here to locate the job title, company, location, and summary.
- **.text.strip():** Extracts the text content from the HTML tags and removes any leading/trailing whitespace.
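The find_all/find pattern can be tried offline on a small hand-written HTML fragment. The markup below is a made-up fixture that reuses the class names from the cell above (it is not real Indeed HTML); the second card deliberately lacks a summary to show why checking for None before calling `.text.strip()` matters. The helper `text_or_none` is hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up fixture mimicking the class names used above (not real Indeed markup)
SAMPLE_HTML = """
<div class="jobsearch-SerpJobCard">
  <a class="jobtitle"> Data Analyst </a>
  <span class="company">Example GmbH</span>
  <div class="location">Berlin</div>
  <div class="summary"> Analyse sales data. </div>
</div>
<div class="jobsearch-SerpJobCard">
  <a class="jobtitle">Junior Data Analyst</a>
  <span class="company">Sample AG</span>
  <div class="location">Hamburg</div>
</div>
"""


def text_or_none(card, tag, class_name):
    """Return the stripped text of the first match inside card, or None if absent."""
    element = card.find(tag, class_=class_name)
    return element.text.strip() if element is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
jobs = []
for card in soup.find_all("div", class_="jobsearch-SerpJobCard"):
    jobs.append({
        "Job Title": text_or_none(card, "a", "jobtitle"),
        "Company": text_or_none(card, "span", "company"),
        "Location": text_or_none(card, "div", "location"),
        "Summary": text_or_none(card, "div", "summary"),  # None for the second card
    })

print(jobs)
```

Calling `card.find("div", class_="summary").text` directly on the second card would raise `AttributeError`, because `find` returns None there.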


In [51]:
# Storing the data in a pandas DataFrame
jobs_df = pd.DataFrame({
    'Job Title': job_titles,
    'Company': companies,
    'Location': locations,
    'Summary': summaries
})
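Building the DataFrame from four parallel lists relies on every list having the same length; if one list ended up shorter, pandas would raise a `ValueError`. An alternative sketch collects one dict per job card and lets pandas align the columns. The records below are hypothetical placeholders, and a field missing from a record becomes `NaN`.

```python
import pandas as pd

# Hypothetical records: one dict per job card; the second has no summary
records = [
    {"Job Title": "Data Analyst", "Company": "Example GmbH",
     "Location": "Berlin", "Summary": "Analyse sales data."},
    {"Job Title": "Junior Data Analyst", "Company": "Sample AG",
     "Location": "Hamburg"},
]

# Fixing the column order keeps the CSV layout stable, even when no jobs were found
COLUMNS = ["Job Title", "Company", "Location", "Summary"]
example_df = pd.DataFrame(records, columns=COLUMNS)

print(example_df)
```

Passing `columns=` also means an empty result (`pd.DataFrame([], columns=COLUMNS)`) still has the expected headers.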


In [48]:
# Saving the DataFrame to a CSV file
jobs_df.to_csv('job_listings_indeed.csv', index=False)
print("Job listings have been scraped and saved to job_listings_indeed.csv")


Job listings have been scraped and saved to job_listings_indeed.csv


In [49]:
pd.read_csv('job_listings_indeed.csv')  

Unnamed: 0,Job Title,Location
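The `Unnamed: 0` column in the output above is what pandas produces when a CSV was written with its index (the `to_csv` default) and read back without `index_col`; passing `index=False` to `to_csv`, as in the save cell, avoids it. A small in-memory round trip illustrating both cases, using a placeholder DataFrame and `io.StringIO` instead of a file:

```python
import io

import pandas as pd

# Placeholder DataFrame; in the notebook this would be jobs_df
df = pd.DataFrame({"Job Title": ["Data Analyst"], "Location": ["Berlin"]})

# to_csv() with no path returns the CSV text; by default it includes the index
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(list(with_index.columns))  # ['Unnamed: 0', 'Job Title', 'Location']

# index=False writes only the data columns
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(list(without_index.columns))  # ['Job Title', 'Location']
```

If a file already contains the extra column, `pd.read_csv(path, index_col=0)` reads that first column back as the index instead.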
