Skip to content

This was a group project that i worked on for Data & Visual Analytics coursework at Georgia Tech. The motivation behind the project is to use ML, NLP to help job seekers with insights.

Notifications You must be signed in to change notification settings

Blitz464/Job-recommendation-scoring

Repository files navigation

DESCRIPTION

This repository contains code for cleaning & processing data, creating an LSTM-based model to extract skills from text data, and generating job fit scores for resumes. The model is implemented using TensorFlow and Keras and utilizes pre-trained GloVe word embeddings for enhanced performance.

Datasets:

Data Preparation(notebooks):

  • translate_desc_to_en.ipynb: Jupyter notebook containing code for cleaning the raw data. Steps include text translation, fuzzy job title matching, generating coordinates from location name etc.
  • data_prep_for_final_model.ipynb: Jupyter notebook - preparing the data for the model. includes grouping sectors, removing clusters with small sample sizes etc.

Model building and validation:

  • LSTM_model_creation.ipynb: Reference to source for Jupyter Notebook containing the code for LSTM model creation
  • np_train_skills_no_commas.csv: CSV file containing the training data.
  • lstm_skill_extractor.h5: Saved model file.
  • keyword_extraction_model-main.ipynb: Code for extracting keywords using the model output and further data processing and cleaning and data manipulation. Includes job similarity score calculation between different job titles.
  • Job_fit_scoring_based_on_resume_input.ipynb: Code for generating job fit scores for resumes.

EXECUTION

Ensure you have the necessary dependencies installed:

pip install pandas tensorflow keras tqdm nltk numpy matplotlib seaborn scikit-learn googletrans beautifulsoup4 geopandas

Execute the cells in the notebooks sequentially to clean the data, train the model and generate results.

The training process includes loading the dataset, tokenizing text, creating word embeddings, and training the LSTM model.

The trained model is saved as lstm_skill_extractor.h5 in the same project directory.

The model's performance is evaluated using accuracy, precision, recall, and a confusion matrix on a test set.

The model output is used to predict the best job fit for a sample of resume and compared the recommendation with actual job the resume was applied for.

Additional Notes

Other files like the Tableau Dashboard is not uploaded yet. Which contains the final visualizations such as top keyword clouds, extrinsic evaluation results on sample resumes, Job similarity scores between different jobs etc.

Apart from LSTM_model_creation.ipynb, which was inspired by the work done RemeAjayi, I took charge of the entire coding process for rest of the notebooks shared in the repo.

About

This was a group project that i worked on for Data & Visual Analytics coursework at Georgia Tech. The motivation behind the project is to use ML, NLP to help job seekers with insights.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published