Job listing webscraper using an ETL process to clean, transform and visualize word frequencies.

DylanRCastillo/Indeed-Web-Scraper

Indeed Web Scraper

Background

This project uses an ETL process to scrape Indeed job listing information, then analyzes and visualizes word frequencies to optimize resume performance.

Objective

Part 1:

Extract, Transform, and Load

  • 4 functions, including:

    • First, extract and transform functions to grab the URL for each job listing
    • Second, extract_expand and transform_expand functions to grab the job title, company, job description, website, and date scraped, saving the information into a dictionary

  • For loop, including:

    • Runs all of the functions above to scrape each listing and save the information to a CSV for later use
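The extract, transform, and load flow above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the sample HTML, the `load` helper, the CSV column names, and the placeholder field values are all assumptions, and the network fetch is replaced by an inline string so the sketch runs offline using only the standard library.

```python
import csv
import io
from datetime import date
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collects href values from <a> tags (assumed markup, not Indeed's real HTML)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def transform(search_html):
    """Return the job listing URLs found in a search results page."""
    parser = _LinkCollector()
    parser.feed(search_html)
    return parser.links


def transform_expand(url, title, company, description, scraped_on):
    """Pack one listing's fields into a dictionary."""
    return {
        "Job Title": title,
        "Company": company,
        "Job Description": description,
        "Website": url,
        "Date Scraped": scraped_on,
    }


def load(records, handle):
    """Write the list of dictionaries to an open file handle as CSV (hypothetical helper)."""
    fields = ["Job Title", "Company", "Job Description", "Website", "Date Scraped"]
    writer = csv.DictWriter(handle, fieldnames=fields)
    writer.writeheader()
    writer.writerows(records)


# Hypothetical sample page standing in for a fetched search result.
sample_page = '<a href="/job/1">Analyst</a><a href="/job/2">Engineer</a>'

records = []
for url in transform(sample_page):
    # A real run would fetch and parse each listing page here (extract_expand).
    records.append(
        transform_expand(url, "Data Analyst", "Acme", "SQL and Python", date.today().isoformat())
    )

buffer = io.StringIO()
load(records, buffer)
csv_text = buffer.getvalue()
```

Writing to an in-memory buffer keeps the example side-effect free; passing a handle from `open("jobs.csv", "w", newline="")` instead would produce a file on disk.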

Part 2:

Data Cleaning

  • Create an overview table of the CSV, including:
    • Company
    • Job Description
    • Website
    • Date Scraped

Cleaning

  • Separating and removing unnecessary symbols from the job description, including:

    • "-", "/", "$", ":"
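One possible way to do this cleaning step is shown below. It is a sketch under assumptions: the function name `clean_description` and the sample sentence are hypothetical, and replacing each symbol with a space (rather than deleting it) is a choice made here so that joined words such as "SQL/Python" split into separate tokens.

```python
# Symbols named in the list above; each is replaced by a space.
SYMBOLS = "-/$:"
_TABLE = str.maketrans({ch: " " for ch in SYMBOLS})


def clean_description(text):
    """Replace the listed symbols with spaces and collapse repeated whitespace."""
    return " ".join(text.translate(_TABLE).split())


cleaned = clean_description("Full-time: SQL/Python, $90k")
```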

Filtering & Tokenizing

  • Create a list of job listing stop words and use a list comprehension to keep only the relevant word tokens.

  • Create a list of resume stop words and use a list comprehension to keep only the relevant word tokens.
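The filtering step might look like the sketch below. The stop words shown are illustrative placeholders, not the lists used in the project, and `tokenize` is a hypothetical helper that splits on whitespace after lowercasing. A set is used instead of a list only because membership checks are faster; a list would behave the same way.

```python
# Illustrative stop words only; the project's actual lists are not shown in this README.
JOB_STOP_WORDS = {"and", "the", "a", "to", "of", "with", "in", "we", "are", "for"}
RESUME_STOP_WORDS = {"and", "the", "a", "to", "of", "with", "in", "i", "my"}


def tokenize(text, stop_words):
    """Lowercase, split on whitespace, and keep only words that are not stop words."""
    tokens = text.lower().split()
    return [word for word in tokens if word not in stop_words]


job_tokens = tokenize("We are looking for experience with Python and SQL", JOB_STOP_WORDS)
resume_tokens = tokenize("I built dashboards with Python", RESUME_STOP_WORDS)
```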

Saving to DataFrame

  • Loop through the words in the resume and check whether each one appears in the job description.

  • Loop through the words in the job description and check whether each one appears in the resume.
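The two comparisons above can be sketched with one helper applied in both directions. The helper name `match_table`, the column names, and the sample tokens are assumptions for illustration. The rows are plain dictionaries so the example needs only the standard library; a list like this can be passed to `pandas.DataFrame(rows)` to get a DataFrame.

```python
from collections import Counter


def match_table(source_tokens, target_tokens):
    """One row per distinct source word: its count and whether it appears in the target."""
    target = set(target_tokens)
    counts = Counter(source_tokens)
    return [
        {"word": word, "count": count, "in_other": word in target}
        for word, count in counts.most_common()
    ]


# Hypothetical token lists standing in for the filtered resume and job description.
resume_tokens = ["python", "excel", "python", "sql"]
job_tokens = ["python", "sql", "tableau"]

resume_rows = match_table(resume_tokens, job_tokens)  # resume words checked against the job
job_rows = match_table(job_tokens, resume_tokens)     # job words checked against the resume
```

Rows where `in_other` is false in `job_rows` are the job description words missing from the resume.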

Visualizing Results

  • Visualize word frequencies using a word cloud, one for each of:

    • Job Listing

    • Resume
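A word cloud can be drawn from a word-to-count mapping. The sketch below assumes the third-party `wordcloud` package, which this README does not name, so treat the choice of library as a guess. The function names, image size, and output path are hypothetical. The import is placed inside the rendering function so the frequency step runs even when the package is not installed.

```python
from collections import Counter


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return dict(Counter(tokens))


def save_word_cloud(frequencies, path):
    """Render the frequencies as a word cloud image (requires `pip install wordcloud`)."""
    from wordcloud import WordCloud  # imported lazily: optional third-party dependency

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(path)


job_frequencies = word_frequencies(["python", "sql", "python", "tableau"])
# save_word_cloud(job_frequencies, "job_listing_cloud.png")  # uncomment once wordcloud is installed
```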
