We will use a pre-trained Transformer model from Hugging Face for this NLP task: matching each job description against a list of job titles and finding the best match.



In [1]:
%%capture
!pip install -U sentence-transformers

In [2]:
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer, util

In [3]:
# Read the job descriptions from a CSV file

jobs = pd.read_csv("/content/Jobs.csv")
jobs

Unnamed: 0,job_id,job_description
0,134498,AWS and Python/Pyspark Developer will be respo...
1,167896,Talenteq Technology India Pvt Ltd is hiring ca...
2,198356,We are hiring for HR Recruiter. Share your res...
3,15472,Pandaj Web Services is looking to hire Team Le...
4,28761,Experience: 1-2 years\nLocation: Technopolis I...


In [4]:
# Let's check out the first job entry

print(jobs['job_description'][0])

AWS and Python/Pyspark Developer will be responsible for developing and debugging process flows Related to Data Analytics using AWS services. Understanding of AWS services mentioned below is a must:

1. Amazon S3
2. AWS Lambda
3. Amazon Redshift
4. AWS Glue and Data Catalog
5. Amazon EC2
6. AWS Athena
7. Data Lake

Location: Noida, Bangalore
Experience: Minimum 1 years
Designation: Business Analyst / Senior Business Analyst

Role and Responsibilities:

- Develop Server-less Lambda functions to achieve connectivity between services using any scripting language e.g. Python.
- Usage of AWS Glue Crawlers for populating Data Catalog with metadata.
- AWS Glue Job creation for ETL services for large sets of data in Pyspark
- Data validation using services within AWS.
- Implementation of EC2 servers and their working.
- Usage of Amazon Redshift for Database.
- Performing ETL operations in Python.
- Good knowledge of python OOPS concepts and libraries such as pandas, NumPy, mat plot, etc.

Cand

**We will use semantic search for this task.**

*   Convert each job description into an embedding vector
*   Convert each job title into an embedding vector
*   Run a semantic search of each job description embedding against the job title embeddings to find the best match (the highest cosine-similarity score)
*   Display the job title with the top score as the best match
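
To make the ranking step concrete, here is a minimal sketch of what a cosine-similarity search does, written in plain Python. The function names and the three-dimensional vectors are invented for illustration; they are not output from the model, and real sentence embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(query_vec, corpus_vecs):
    """Return (index, score) of the corpus vector most similar to the query."""
    scores = [cosine_similarity(query_vec, vec) for vec in corpus_vecs]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores[best_index]

# Hypothetical toy "embeddings", made up for this example only.
toy_titles = ["Software Developer", "Painter", "Chauffeur"]
toy_title_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
toy_description_vec = [0.8, 0.2, 0.1]

index, score = best_match(toy_description_vec, toy_title_vecs)
print(toy_titles[index], round(score, 3))
```

The toy description vector points in nearly the same direction as the first title vector, so that title wins. The library call used later in this notebook applies the same idea to the real embeddings.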



In [5]:
# Download the pre-trained model (it is cached locally after the first run)

checkpoint = "clips/mfaq"
model = SentenceTransformer(checkpoint)



In [6]:
# Search the list of job titles for the best match to each job description

job_titles = [
    'Software Developer',
    'Sales Lead',
    'Painter',
    'Stock Trader',
    'Electrician',
    'Human Resources Professional',
    'Strategy Consultant',
    'Chauffeur',
    'Copywriter'
]

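# Encode the candidate titles once; they form the search corpus
title_embeddings = model.encode(job_titles)

matched_job_title = []
for job_desc in jobs['job_description']:

    # Encode the description and rank the titles by cosine similarity
    job_embedding = model.encode(job_desc)
    scores = util.semantic_search(job_embedding, title_embeddings)

    # scores[0] holds the hits for this description, best match first
    best_title = job_titles[scores[0][0]['corpus_id']]
    matched_job_title.append(best_title)
    print(f"\n>>>>> Best match for Job Title is: {best_title}\n")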
    print("Job Description:\n", job_desc, sep='')
    print("----"*30)

jobs['job_title'] = matched_job_title


>>>>> Best match for Job Title is: Software Developer

Job Description:
AWS and Python/Pyspark Developer will be responsible for developing and debugging process flows Related to Data Analytics using AWS services. Understanding of AWS services mentioned below is a must:

1. Amazon S3
2. AWS Lambda
3. Amazon Redshift
4. AWS Glue and Data Catalog
5. Amazon EC2
6. AWS Athena
7. Data Lake

Location: Noida, Bangalore
Experience: Minimum 1 years
Designation: Business Analyst / Senior Business Analyst

Role and Responsibilities:

- Develop Server-less Lambda functions to achieve connectivity between services using any scripting language e.g. Python.
- Usage of AWS Glue Crawlers for populating Data Catalog with metadata.
- AWS Glue Job creation for ETL services for large sets of data in Pyspark
- Data validation using services within AWS.
- Implementation of EC2 servers and their working.
- Usage of Amazon Redshift for Database.
- Performing ETL operations in Python.
- Good knowledge of pytho

In [8]:
# Let's check out the jobs DataFrame with the matched titles

jobs

Unnamed: 0,job_id,job_description,job_title
0,134498,AWS and Python/Pyspark Developer will be respo...,Software Developer
1,167896,Talenteq Technology India Pvt Ltd is hiring ca...,Electrician
2,198356,We are hiring for HR Recruiter. Share your res...,Human Resources Professional
3,15472,Pandaj Web Services is looking to hire Team Le...,Sales Lead
4,28761,Experience: 1-2 years\nLocation: Technopolis I...,Software Developer
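
Because the loop always takes the top hit, every description receives a title even when none of the candidates fits well. One possible refinement is to keep the similarity score and fall back to a placeholder when it is low. The sketch below assumes the hit format used in the loop (`scores[0]` is a list of dicts with `corpus_id` and `score` keys, best first). The helper `pick_title`, the `0.5` threshold, and the sample hits are hypothetical and would need tuning against real data.

```python
def pick_title(hits, titles, min_score=0.5, fallback="Unmatched"):
    """Return (title, score) for the top hit, or the fallback if the score is too low.

    `hits` is a list of dicts with 'corpus_id' and 'score' keys, best first.
    """
    if not hits:
        return fallback, None
    top = hits[0]
    if top["score"] < min_score:
        return fallback, top["score"]
    return titles[top["corpus_id"]], top["score"]

# Hypothetical hit lists, invented for illustration (not real model output).
sample_titles = ["Software Developer", "Sales Lead", "Electrician"]
confident_hits = [{"corpus_id": 0, "score": 0.71}, {"corpus_id": 1, "score": 0.30}]
weak_hits = [{"corpus_id": 2, "score": 0.21}, {"corpus_id": 1, "score": 0.19}]

print(pick_title(confident_hits, sample_titles))
print(pick_title(weak_hits, sample_titles))
```

Storing the score in its own column alongside `job_title` would also make it easy to sort the DataFrame and review the weakest matches by hand.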
