Overview

This project builds a tool for scraping data from job postings on CareerBuilder, with the aim of producing a bi-weekly report on the labor market. The report aims to answer these questions:

What jobs are currently in high demand?
Are there any direct competitors currently hiring?
What is the salary norm of the market?

Workflow

Techniques

The workflow of this project is implemented in scraper.py, which contains these main functions:

get_soup: Uses Selenium to drive the browser.
get_search_soups: Gets the html_soup of the job search pages.
extract_search_page: Uses BeautifulSoup to parse information from the html_soup.
extract_job_links: Gets the soup of each job_link page and parses data from it.
merge_search_page_n_job_link: Merges df_search_page with df_job_link.
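The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in scraper.py: the function signatures, the HTML markup and CSS class names, the job_link join key, and the column names are all assumptions made for the example, and the real CareerBuilder pages will look different. Selenium is imported inside get_soup so that the parsing and merging parts run without a browser driver.

```python
import pandas as pd
from bs4 import BeautifulSoup


def get_soup(url):
    """Load a page in a Selenium-driven browser and return its parsed HTML."""
    # Imported lazily so the rest of this sketch runs without a browser driver.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return BeautifulSoup(driver.page_source, "html.parser")
    finally:
        driver.quit()


# Hypothetical markup: the real CareerBuilder class names will differ.
SEARCH_HTML = """
<div class="job-card"><a class="job-link" href="/job/1">Data Analyst</a>
  <span class="company">Acme</span></div>
<div class="job-card"><a class="job-link" href="/job/2">Data Engineer</a>
  <span class="company">Globex</span></div>
"""


def extract_search_page(soup):
    """Collect one row per job card on a search results page."""
    rows = []
    for card in soup.select("div.job-card"):
        link = card.select_one("a.job-link")
        rows.append({
            "job_link": link["href"],
            "title": link.get_text(strip=True),
            "company": card.select_one("span.company").get_text(strip=True),
        })
    return pd.DataFrame(rows)


df_search_page = extract_search_page(BeautifulSoup(SEARCH_HTML, "html.parser"))

# Stand-in for the output of the job-detail step (hypothetical columns).
df_job_link = pd.DataFrame({
    "job_link": ["/job/1", "/job/2"],
    "salary": ["$50,000 - $70,000", None],
})

# A left join keeps every search result even if its detail page had no data.
df_merged = df_search_page.merge(df_job_link, on="job_link", how="left")
```

A left join is used here so that a posting found on the search page is not dropped when its detail page could not be parsed; whether the project makes the same choice is not stated in this README.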

Execution

scraper.py only defines the functions. The code that actually runs them can be found in data_collection.py.

The collected data is saved as .csv files and then cleaned using cleaning.py. After that, users can find the results in EDA.ipynb.
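As an illustration of what a cleaning step might involve, the sketch below reads a small CSV export and turns free-text salary ranges into numbers, which is the kind of preparation the salary question in the Overview would need. It is not taken from cleaning.py: the column names, the salary format, and the parse_salary helper are hypothetical, and the example uses only the Python standard library so it runs without extra dependencies (the project itself may well use pandas here).

```python
import csv
import io
import re
from statistics import median

# Hypothetical raw export; the real column names and salary formats may differ.
RAW_CSV = """title,company,salary
Data Analyst,Acme,"$50,000 - $70,000"
Data Engineer,Globex,
Data Analyst,Initech,"$60,000 - $80,000"
"""


def parse_salary(text):
    """Return (low, high) as integers, or None when no salary range is given."""
    numbers = [int(n.replace(",", "")) for n in re.findall(r"\d[\d,]*", text or "")]
    if len(numbers) < 2:
        return None
    return numbers[0], numbers[1]


cleaned = []
for row in csv.DictReader(io.StringIO(RAW_CSV)):
    salary = parse_salary(row["salary"])
    if salary is None:
        continue  # postings without a salary range cannot inform the salary norm
    low, high = salary
    cleaned.append({
        "title": row["title"],
        "low": low,
        "high": high,
        "mid": (low + high) / 2,
    })

# Median of the range midpoints as one simple summary of the salary norm.
median_mid = median(r["mid"] for r in cleaned)
```

The median of the range midpoints is only one possible summary; it is chosen here because it is less sensitive to a few very high or very low postings than the mean.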


DataSpi/scraping-jobs
