# Web Scraping Job Vacancies : from [Indeed website](www.indeed.com/)

### URL = https://www.indeed.com/jobs?q={query}&l={location}

## Introduction

In this project, we'll build a web scraper to extract job listings from a popular job search platform. We'll extract job titles, companies, locations, job descriptions, and other relevant information.

Here are the main steps we'll follow in this project:

1. Write the Python code to extract job data from our job search platform
2. Save the data to a CSV file

## Prerequisites

Before starting this project, you should have some basic knowledge of Python programming and HTML structure. In addition, you need to create Scrapfly account to get API key, and you will use the following packages in your Python environment:

- scrapfly-sdk
- BeautifulSoup
- csv
- markdownify
- pandas

you can install additional packages using `!pip install packagename` :

- `!pip install scrapfly-sdk`
- `!pip install BeautifulSoup`
- `!pip install pandas`
- `!pip install markdownify`

In [37]:
# Importing Required Libraries
from scrapfly import ScrapflyClient, ScrapeConfig
from bs4 import BeautifulSoup
import pandas as pd
import re
from markdownify import markdownify as md

In [38]:
# f"https://www.linkedin.com/jobs/search?keywords={query}&location={location}"
def get_html_content(url):
    client = ScrapflyClient(key="YOUR Scrapfly API KEY")
    result = client.scrape(ScrapeConfig(url= url, asp=True,))
    return result.content

# asp=True => enable Anti Scraping Protection

In [33]:
query = "data analytic"
location = "USA"
url= f"https://www.indeed.com/jobs?q={query}&l={location}"

search_result = get_html_content(url)

# Parsing the webpage content
soup = BeautifulSoup(search_result, 'html.parser')

jobs_container = soup.find('div', {'id': 'mosaic-provider-jobcards'})

all_jobs = jobs_container.find_all('li')

table_content = []

for row in all_jobs:
    td = row.find('td', {'class': 'resultContent'})
    
    if not td :
        continue
    #---------------------------
    job_info= td.find('h2', {'class': 'jobTitle'})
    job_link = job_info.find('a').attrs['href']
    job_title = job_info.text.strip()
    #---------------------------
    div_company_location = td.find('div', {'class': 'company_location'})
    company_name = div_company_location.find('span', {'data-testid': 'company-name'}).text.strip()
    location = div_company_location.find('div', {'data-testid': 'text-location'}).text.strip()
    #---------------------------
    posted_date =  td.find('div', {'class': 'jobMetaDataGroup'})
    posted_date = posted_date.find('span', {'data-testid': 'myJobsStateDate'}).text.strip()
    posted_date = re.sub(r'Posted', '', posted_date).strip()
    #---------------------------
    job_detail = get_html_content(f"https://www.indeed.com{job_link}")
    soup2 = BeautifulSoup(job_detail, 'html.parser')
    #---------------------------
    salaryInfoAndJobType = soup2.find('div', {'id': 'salaryInfoAndJobType'})
    if salaryInfoAndJobType :
        salaryInfoAndJobType = salaryInfoAndJobType.find_all('span')
        salaryInfo = None

        if len(salaryInfoAndJobType) == 1 :
            jobType = salaryInfoAndJobType[0].text.strip()
        else :
            salaryInfo = salaryInfoAndJobType[0].text.strip()
            jobType = salaryInfoAndJobType[1].text.strip()
    #---------------------------
    job_benefits_container = soup2.find('div', {'id': 'benefits'})
    job_benefits = None
    if job_benefits_container : 
        job_benefits_list = job_benefits_container.find_all('li')
        job_benefits = [text.text.strip() for text in job_benefits_list]
    #---------------------------
    job_description_container = soup2.find('div', {'id': 'jobDescriptionText'})
    # Convert HTML to Markdown
    job_description = md(str(job_description_container))
    #---------------------------
    print("job title : ", job_title)
    print('-'*50)
    
    table_content.append({
        'job title': job_title,
        'job link': f"https://www.indeed.com{job_link}",
        'company name': company_name,
        'location': location,
        'posted date': posted_date,
        'salary Info': salaryInfo,
        'job Type': jobType,
        'job description': job_description,
    })
    

job title :  Data Analytics Specialist
--------------------------------------------------
job title :  Audit Data Scientist
--------------------------------------------------
job title :  Molecular Breeding Data Analyst (McClean)
--------------------------------------------------
job title :  Data Reporting and Analytics Consultant IV - SQL, Tableau
--------------------------------------------------
job title :  Senior Data Analyst
--------------------------------------------------
job title :  Manager, Data Reporting and Analytics - Python, Tableau, SQL
--------------------------------------------------
job title :  Data Reporting and Analytics Consultant V, Programming
--------------------------------------------------
job title :  Data Scientist
--------------------------------------------------
job title :  Behavioral Data Scientist
--------------------------------------------------
job title :  Cloud Data Analytics Specialist
--------------------------------------------------
job 

In [34]:
print(f"Number of Jobs Found : {len(table_content)}")

Number of Jobs Found : 15


In [35]:
data = pd.DataFrame(table_content) 
data

Unnamed: 0,job title,job link,company name,location,posted date,salary Info,job Type,job description
0,Data Analytics Specialist,https://www.indeed.com/rc/clk?jk=d958dbc0f83a5...,"California State University, Long Beach","Long Beach, CA 90840 (State College Area area)",11 days ago,"$4,094 - $7,881 a month",- Full-time,\n\n\n**Job no:** 541893 \n **Work type:** St...
1,Audit Data Scientist,https://www.indeed.com/rc/clk?jk=9a27035fd66e0...,Logistics Management Institute,Remote,6 days ago,,Full-time,\n Overview: \n \n LMI is seeking for a **Audi...
2,Molecular Breeding Data Analyst (McClean),https://www.indeed.com/rc/clk?jk=0c6e7ce0e9311...,North Dakota State University,"Fargo, ND",11 days ago,"Desde $80,000 por año",- Full-time,\n\n\n\n**Description \& Details:**\n\n\nNDSU ...
3,Data Reporting and Analytics Consultant IV - S...,https://www.indeed.com/rc/clk?jk=3773fbb1ac1ac...,Kaiser Permanente,"Oakland, CA",4 days ago,"$127,600 - $165,110 por año",- Full-time,"\n\n\n**Remote from any KP location in CA, OR,..."
4,Senior Data Analyst,https://www.indeed.com/rc/clk?jk=0843d5fec93c4...,Tulane University,"New Orleans, LA",30+ days ago,"$127,600 - $165,110 por año",- Full-time,\n\n\n\n The Senior Data Analyst at the Cowen ...
5,"Manager, Data Reporting and Analytics - Python...",https://www.indeed.com/rc/clk?jk=f7f2dc634f81b...,Kaiser Permanente,"Hyattsville, MD 20785",25 days ago,"$133,500 - $172,700 a year",- Full-time,\n\n**Job Summary:**\n \n This manager level...
6,"Data Reporting and Analytics Consultant V, Pro...",https://www.indeed.com/rc/clk?jk=91b8d9cff9d31...,Kaiser Permanente,"Santa Ana, CA",5 days ago,"$141,600 - $183,150 a year",- Full-time,\n\n**Job Summary:**\n \n In addition to the...
7,Data Scientist,https://www.indeed.com/rc/clk?jk=5f6d77ee91a9f...,"Vizient, Inc.","Chicago, IL",5 days ago,,"$77,400 - $127,600 por año","\n\n\nWhen you’re the best, we’re the best. We..."
8,Behavioral Data Scientist,https://www.indeed.com/rc/clk?jk=d499a5703adc7...,Parsons,"Springfield, VA",30+ days ago,"$96,400 - $168,700 por año",- Full-time,"\n In a world of possibilities, pursue one wit..."
9,Cloud Data Analytics Specialist,https://www.indeed.com/rc/clk?jk=4601949755650...,Logistics Management Institute,Remote,6 days ago,,Full-time,\n Overview: \n \n LMI is seeking for a **Audi...


In [36]:
# saving data to csv file :
data.to_csv("indeed_jobs.csv")