This repository contains code for cleaning and processing job-posting data, building an LSTM-based model that extracts skills from text, and generating job fit scores for resumes. The model is implemented with TensorFlow and Keras and uses pre-trained GloVe word embeddings for improved performance.
- Data Souce Link.txt: Link to raw scraped data with 160K+ job postings from Glassdoor (Kaggle).
- resume_dataset.csv: Sample data used to check model performance in job recommendation.
- ne_50m_admin_0_countries.shp: Shapefile used for data visualization and extracting missing coordinates.
- glove.6B.100d.txt: Pre-trained GloVe word embeddings file reference (see the embedding-matrix sketch after this file list).
- translate_desc_to_en.ipynb: Jupyter notebook containing the code for cleaning the raw data. Steps include text translation, fuzzy job title matching, generating coordinates from location names, etc.
- data_prep_for_final_model.ipynb: Jupyter notebook for preparing the data for the model. Steps include grouping sectors, removing clusters with small sample sizes, etc.
- LSTM_model_creation.ipynb: Reference to the source for the Jupyter notebook containing the LSTM model creation code.
- np_train_skills_no_commas.csv: CSV file containing the training data.
- lstm_skill_extractor.h5: Saved model file.
- keyword_extraction_model-main.ipynb: Code for extracting keywords from the model output, plus further data processing and cleaning. Includes job similarity score calculation between different job titles.
- Job_fit_scoring_based_on_resume_input.ipynb: Code for generating job fit scores for resumes.
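The pre-trained GloVe vectors in glove.6B.100d.txt are parsed into an embedding matrix that initializes the model's embedding layer. The sketch below is a minimal, illustrative version of that step; the file path and the `word_index` variable (assumed to come from a fitted Keras `Tokenizer`) are assumptions, not taken verbatim from the notebooks.

```python
import numpy as np

EMBEDDING_DIM = 100  # matches glove.6B.100d.txt

# Parse the GloVe file into a {word: vector} lookup
embeddings_index = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        values = line.split()
        embeddings_index[values[0]] = np.asarray(values[1:], dtype="float32")

def build_embedding_matrix(word_index, embeddings_index, dim=EMBEDDING_DIM):
    """Build an embedding matrix aligned with a tokenizer's word index."""
    matrix = np.zeros((len(word_index) + 1, dim))
    for word, i in word_index.items():
        vector = embeddings_index.get(word)
        if vector is not None:
            matrix[i] = vector  # words not found in GloVe stay all-zero
    return matrix
```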
Ensure you have the necessary dependencies installed:
pip install pandas tensorflow keras tqdm nltk numpy matplotlib seaborn scikit-learn googletrans beautifulsoup4 geopandas
Execute the cells in the notebooks sequentially to clean the data, train the model and generate results.
The training process includes loading the dataset, tokenizing text, creating word embeddings, and training the LSTM model.
The trained model is saved as lstm_skill_extractor.h5 in the same project directory.
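A condensed sketch of that pipeline is shown below, reusing `build_embedding_matrix` and `embeddings_index` from the GloVe sketch above. The column names in np_train_skills_no_commas.csv, the binary skill label, and the hyperparameters (sequence length, LSTM units, epochs) are illustrative assumptions rather than the notebook's exact settings.

```python
import pandas as pd
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.initializers import Constant

MAX_LEN = 50  # illustrative maximum sequence length

# Load the training data (column names are assumptions)
df = pd.read_csv("np_train_skills_no_commas.csv")
texts, labels = df["text"], df["label"]

# Tokenize and pad the text
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
X = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=MAX_LEN)

# Embedding layer initialized with the GloVe matrix built earlier
embedding_matrix = build_embedding_matrix(tokenizer.word_index, embeddings_index)

model = Sequential([
    Embedding(input_dim=embedding_matrix.shape[0],
              output_dim=embedding_matrix.shape[1],
              embeddings_initializer=Constant(embedding_matrix),
              trainable=False),
    LSTM(64),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, labels.values, epochs=5, validation_split=0.2)

# Persist the trained model in the project directory
model.save("lstm_skill_extractor.h5")
```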
The model's performance is evaluated using accuracy, precision, recall, and a confusion matrix on a test set.
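A hedged sketch of that evaluation using scikit-learn, assuming a held-out `X_test`/`y_test` split and a binary label (the split and threshold are assumptions):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# X_test / y_test are assumed to come from a train/test split of the prepared data
y_prob = model.predict(X_test)
y_pred = (y_prob > 0.5).astype(int).ravel()  # threshold the sigmoid output

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
```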
The model output is used to predict the best job fit for a sample of resumes, and the recommendation is compared with the actual job each resume was submitted for.
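The exact scoring logic lives in Job_fit_scoring_based_on_resume_input.ipynb; the sketch below only illustrates one plausible approach, ranking job titles by cosine similarity between TF-IDF vectors of the resume's extracted skills and each job's skill keywords. The function and variable names here are hypothetical, not the notebook's API.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def job_fit_scores(resume_skills, job_skill_map):
    """Rank job titles by similarity between resume skills and job skill keywords.

    resume_skills : str, space-separated skills extracted from the resume
    job_skill_map : dict mapping job title -> space-separated skill keywords
    """
    titles = list(job_skill_map)
    corpus = [resume_skills] + [job_skill_map[t] for t in titles]
    tfidf = TfidfVectorizer().fit_transform(corpus)
    sims = cosine_similarity(tfidf[0], tfidf[1:]).ravel()
    return sorted(zip(titles, sims), key=lambda x: x[1], reverse=True)

# Example: the top-ranked title can be compared with the job the resume actually targeted
scores = job_fit_scores("python sql machine learning",
                        {"Data Scientist": "python machine learning statistics",
                         "Accountant": "excel bookkeeping auditing"})
print(scores[0])  # best-fit job title and its score
```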
Other files, such as the Tableau dashboard, are not uploaded yet. The dashboard contains the final visualizations, including top keyword clouds, extrinsic evaluation results on sample resumes, and job similarity scores between different jobs.
Apart from LSTM_model_creation.ipynb, which was inspired by the work of RemeAjayi, I wrote all of the remaining notebooks shared in this repository myself.