We will use a pre-trained Transformer model from Hugging Face to match the given job descriptions with a list of job titles and find the best-matching title for each description.
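In formula form, as a sketch of what the notebook computes (the symbols are our own notation): the model maps job description $i$ to an embedding $\mathbf{u}_i$ and job title $j$ to an embedding $\mathbf{v}_j$. Each pair is scored with cosine similarity, and the best title for description $i$ is the one with the highest score.

$$
\cos(\mathbf{u}_i, \mathbf{v}_j) = \frac{\mathbf{u}_i \cdot \mathbf{v}_j}{\lVert \mathbf{u}_i \rVert \, \lVert \mathbf{v}_j \rVert},
\qquad
\hat{j}(i) = \arg\max_{j} \, \cos(\mathbf{u}_i, \mathbf{v}_j)
$$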



In [1]:
%%capture
!pip install -U sentence-transformers

In [2]:
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer, util

In [3]:
# Read in the job descriptions from a CSV file

jobs = pd.read_csv("/content/Jobs.csv")
jobs

Unnamed: 0  job_id  job_description
0           134498  AWS and Python/Pyspark Developer will be respo...
1           167896  Talenteq Technology India Pvt Ltd is hiring ca...
2           198356  We are hiring for HR Recruiter. Share your res...
3            15472  Pandaj Web Services is looking to hire Team Le...
4            28761  Experience: 1-2 years\nLocation: Technopolis I...
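An `Unnamed: 0` column like the one above usually means the CSV was written together with its pandas index (for example by `DataFrame.to_csv` with the default `index=True`). Assuming that is what happened with `Jobs.csv`, passing `index_col=0` to `pd.read_csv` turns that column back into the row index. The inline CSV below is a hypothetical two-row stand-in, not the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for Jobs.csv: the first, unnamed column is a saved index
csv_text = """,job_id,job_description
0,134498,AWS and Python/Pyspark Developer
1,167896,HR Recruiter
"""

# Default read: the saved index appears as an extra "Unnamed: 0" column
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))  # ['Unnamed: 0', 'job_id', 'job_description']

# index_col=0 uses the first column as the row index instead
jobs_clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(jobs_clean.columns))  # ['job_id', 'job_description']
```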


In [4]:
# Let's look at the first job entry

print(jobs['job_description'][0])

AWS and Python/Pyspark Developer will be responsible for developing and debugging process flows Related to Data Analytics using AWS services. Understanding of AWS services mentioned below is a must:

1. Amazon S3
2. AWS Lambda
3. Amazon Redshift
4. AWS Glue and Data Catalog
5. Amazon EC2
6. AWS Athena
7. Data Lake

Location: Noida, Bangalore
Experience: Minimum 1 years
Designation: Business Analyst / Senior Business Analyst

Role and Responsibilities:

- Develop Server-less Lambda functions to achieve connectivity between services using any scripting language e.g. Python.
- Usage of AWS Glue Crawlers for populating Data Catalog with metadata.
- AWS Glue Job creation for ETL services for large sets of data in Pyspark
- Data validation using services within AWS.
- Implementation of EC2 servers and their working.
- Usage of Amazon Redshift for Database.
- Performing ETL operations in Python.
- Good knowledge of python OOPS concepts and libraries such as pandas, NumPy, mat plot, etc.

Cand

In [5]:
# Download the pre-trained model (cached locally after the first run) and load it

checkpoint = "all-MiniLM-L6-v2"
model = SentenceTransformer(checkpoint)
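Long descriptions such as the first one may not be seen by the model in full. The all-MiniLM-L6-v2 model card states that input longer than 256 word pieces is truncated by default, and the loaded model exposes this limit as `model.max_seq_length`. Below is a rough, dependency-free check, offered as a sketch. It assumes that for ordinary text every whitespace-separated word becomes at least one word piece and that the tokenizer adds two special tokens (`[CLS]` and `[SEP]`). Under those assumptions, a text with more than `max_seq_length - 2` words is certainly truncated. Shorter texts may still be truncated, so this is only a lower bound. The helper name `certainly_truncated` is hypothetical.

```python
def certainly_truncated(texts, max_seq_length=256, special_tokens=2):
    """Flag texts whose whitespace word count alone exceeds the token budget.

    Assumes each whitespace-separated word yields at least one word piece,
    so this is a lower bound: unflagged texts may still be truncated.
    """
    budget = max_seq_length - special_tokens  # room left after [CLS] and [SEP]
    return [len(text.split()) > budget for text in texts]


# Usage with hypothetical inputs; in the notebook you would pass
# list(jobs['job_description']) and model.max_seq_length instead.
short_text = "We are hiring for HR Recruiter."
long_text = " ".join(["word"] * 300)
print(certainly_truncated([short_text, long_text]))  # [False, True]
```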

In [8]:
job_desc = list(jobs['job_description'])

job_titles = [
    'Software Developer',
    'Sales Lead',
    'Painter',
    'Stock Trader',
    'Electrician',
    'Human resource recruiter'
]

# Obtaining the vector embedding representations of the job descriptions and the job titles
jd_embeddings = model.encode(job_desc)
jt_embeddings = model.encode(job_titles)

# Calculating cosine similarity between all pairs of the job descriptions and the job titles
cos_sim = util.cos_sim(jd_embeddings, jt_embeddings)
cos_sim

tensor([[ 0.2658,  0.1255, -0.0247,  0.0980,  0.2656,  0.2582],
        [ 0.2969,  0.0883,  0.1079,  0.1096,  0.4701,  0.3185],
        [ 0.2149, -0.0012,  0.0590,  0.1593,  0.2143,  0.5619],
        [ 0.2784,  0.4891,  0.0545,  0.2049,  0.2674,  0.4338],
        [ 0.3686,  0.0636,  0.0934,  0.1758,  0.1846,  0.2309]])
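The best match for each description is the column with the highest score in its row. The matrix above shows that the winner is not always clear-cut, though. In the first row, 'Software Developer' (0.2658) beats 'Electrician' (0.2656) by only 0.0002. The sketch below surfaces such near-ties by reporting the runner-up title and the score gap. It uses the rounded scores printed above as a hard-coded NumPy array, so it runs without the model. The function name `top2_with_margin` is our own.

```python
import numpy as np

job_titles = [
    'Software Developer',
    'Sales Lead',
    'Painter',
    'Stock Trader',
    'Electrician',
    'Human resource recruiter'
]

# Rounded cosine similarities copied from the cos_sim output above
# (rows: job descriptions, columns: job titles)
scores = np.array([
    [0.2658, 0.1255, -0.0247, 0.0980, 0.2656, 0.2582],
    [0.2969, 0.0883, 0.1079, 0.1096, 0.4701, 0.3185],
    [0.2149, -0.0012, 0.0590, 0.1593, 0.2143, 0.5619],
    [0.2784, 0.4891, 0.0545, 0.2049, 0.2674, 0.4338],
    [0.3686, 0.0636, 0.0934, 0.1758, 0.1846, 0.2309],
])


def top2_with_margin(sim):
    """Return best index, runner-up index and the score gap for each row."""
    order = np.argsort(-sim, axis=1)  # column indices, highest score first
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(sim.shape[0])
    margin = sim[rows, best] - sim[rows, second]
    return best, second, margin


best, second, margin = top2_with_margin(scores)
for i, (b, s, m) in enumerate(zip(best, second, margin)):
    print(f"JD {i}: {job_titles[b]!r} over {job_titles[s]!r} by {m:.4f}")
```

Because the inputs are rounded to four decimals, the margins are approximate. A minimum-margin threshold could flag descriptions for manual review instead of accepting the top score blindly, but its value would need tuning on labelled data.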

In [9]:
# Now let's see which job title best matches each job description

title_matches = np.argmax(cos_sim.numpy(), axis=1)

for jd, jt in zip(job_desc, title_matches):
    print("\nJob Description:\n\n", jd)
    print("\n>>>>> Best match for Job Title: ", job_titles[jt])
    print("---"*30)


Job Description:

 AWS and Python/Pyspark Developer will be responsible for developing and debugging process flows Related to Data Analytics using AWS services. Understanding of AWS services mentioned below is a must:

1. Amazon S3
2. AWS Lambda
3. Amazon Redshift
4. AWS Glue and Data Catalog
5. Amazon EC2
6. AWS Athena
7. Data Lake

Location: Noida, Bangalore
Experience: Minimum 1 years
Designation: Business Analyst / Senior Business Analyst

Role and Responsibilities:

- Develop Server-less Lambda functions to achieve connectivity between services using any scripting language e.g. Python.
- Usage of AWS Glue Crawlers for populating Data Catalog with metadata.
- AWS Glue Job creation for ETL services for large sets of data in Pyspark
- Data validation using services within AWS.
- Implementation of EC2 servers and their working.
- Usage of Amazon Redshift for Database.
- Performing ETL operations in Python.
- Good knowledge of python OOPS concepts and libraries such as pandas, NumPy, 