Bennykillua/Pyspark-Online-Job-Postings

Pyspark-Online Job Postings

Dataset of 19,000 online job posts from 2004 to 2015

This repo was created for the HDSC ’22 Premier Project: Real Life Machine Learning Topics.

View our list of contributors here

Project Details

Scope: Jobs and Career

Project Description

The labor market is continuously evolving as employers increasingly use technology to advertise job openings, hence the need to understand the demand for particular professions and job titles. There is also a need to identify the skills that employers most frequently require, to track how the distribution of required skills changes over time, and to make recommendations to job seekers and employers.
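To make the skill-frequency goal concrete, here is a minimal sketch of counting how often each skill is mentioned per year. It is written in plain Python rather than PySpark so it runs without a Spark session, and it is not the project's actual pipeline. The field names `year` and `required_qual`, the sample records, and the skill vocabulary are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical sample records. The field names "year" and "required_qual"
# are assumptions for illustration, not the dataset's confirmed schema.
SAMPLE_POSTS = [
    {"year": 2004, "required_qual": "Excel, English, accounting"},
    {"year": 2004, "required_qual": "English, Russian"},
    {"year": 2015, "required_qual": "Python, SQL, English"},
]

# Hypothetical skill vocabulary to search for.
SKILLS = ["english", "russian", "excel", "python", "sql"]


def skill_counts_by_year(posts, skills):
    """Count, per year, how many posts mention each skill.

    Matching is a naive case-insensitive substring test, so very short
    skill names could match inside unrelated words.
    """
    counts = defaultdict(Counter)
    for post in posts:
        text = (post.get("required_qual") or "").lower()
        for skill in skills:
            if skill in text:
                counts[post["year"]][skill] += 1
    return counts


counts = skill_counts_by_year(SAMPLE_POSTS, SKILLS)
for year in sorted(counts):
    print(year, counts[year].most_common())
```

Comparing the per-year counters gives a rough view of how the demand for a skill shifts over time. A real analysis would likely need tokenization or a curated skill list instead of substring matching.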

Learn more about the dataset here

Acknowledgements

The data collection and initial research were funded by the American University of Armenia’s research grant (2015).

About dataset

The dataset consists of 19,000 job postings published through the Armenian human resource portal CareerCenter. The data was extracted from the Yahoo! mailing group https://groups.yahoo.com/neo/groups/careercenter-am. This was the only online human resource portal in the early 2000s. A job posting usually has some structure, although the client (poster) does not necessarily fill out every field. The data was cleaned by removing posts that were not job-related or had no structure. The posts span 2004 to 2015.
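The cleaning step described above (dropping posts with no structure) could look roughly like the sketch below. The rule used here, a non-empty title plus at least one other filled field, is an assumption chosen for illustration and not the criterion the dataset authors applied. The field names are also hypothetical.

```python
# Hypothetical field names; the real dataset's columns may differ.
REQUIRED_FIELDS = ("title",)
OPTIONAL_FIELDS = ("company", "job_description", "required_qual")


def is_filled(post, name):
    """Return True if the field exists and holds non-blank text."""
    value = post.get(name)
    return isinstance(value, str) and value.strip() != ""


def is_structured(post, min_optional=1):
    """Assumed rule: every required field is filled, and at least
    `min_optional` of the optional fields are filled."""
    if not all(is_filled(post, name) for name in REQUIRED_FIELDS):
        return False
    filled_optional = sum(is_filled(post, name) for name in OPTIONAL_FIELDS)
    return filled_optional >= min_optional


def clean(posts):
    """Keep only posts that pass the structure check."""
    return [post for post in posts if is_structured(post)]


# Hypothetical sample records for demonstration.
raw_posts = [
    {"title": "Accountant", "company": "Example LLC", "job_description": ""},
    {"title": "", "company": "Example LLC"},   # dropped: no title
    {"title": "Announcement"},                 # dropped: no other fields
    {"title": "Developer", "required_qual": "Python"},
]

cleaned = clean(raw_posts)
print([post["title"] for post in cleaned])
```

Because posters often leave fields blank, the check tolerates missing optional fields and only rejects posts that carry almost no structured information.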

You can get more information here

Documentation

Read all about the project here

Exploratory data analysis

Everything about the EDA process can be found here
