Recruitment Industry Use Case: Job Description and Resume Matching leveraging Llama-2-7b, GPT-J Embeddings, FAISS, and Amazon SageMaker

In this notebook, we deploy the Llama-2-7b model and use it as the generation model that produces responses. The notebook itself runs on the Python 3 kernel on an ml.t3.medium instance.

The Llama-2-7b SageMaker endpoint is hosted on an ml.g5.4xlarge instance.

To run inference on the Llama models, pass custom_attributes='accept_eula=true' with each request to confirm acceptance of the end-user license agreement (EULA). The default is custom_attributes='accept_eula=false', so inference requests fail unless the value is explicitly set to true.
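
As a rough sketch of how that attribute travels with a request when the endpoint is called through the low-level runtime client instead of a predictor, the helper below only assembles the keyword arguments for boto3's `invoke_endpoint` (`EndpointName`, `ContentType`, `Body`, `CustomAttributes`). The helper name `build_invoke_args` and the sample payload are illustrative assumptions, and nothing is sent to AWS.

```python
import json

def build_invoke_args(endpoint_name, payload, accept_eula=True):
    """Assemble keyword arguments for sagemaker-runtime invoke_endpoint.

    Hypothetical helper for illustration only; it does not call AWS.
    """
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps(payload).encode("utf-8"),
        # Llama endpoints reject requests unless the EULA is explicitly accepted.
        "CustomAttributes": f"accept_eula={'true' if accept_eula else 'false'}",
    }

# Sample chat-style payload (made up for this sketch)
sample_payload = {"inputs": [[{"role": "user", "content": "Hello"}]]}
args = build_invoke_args("llama-2-recruitment-model", sample_payload)
print(args["CustomAttributes"])
```

With a real endpoint, these arguments could be passed as `client.invoke_endpoint(**args)`, where `client` is a `sagemaker-runtime` boto3 client.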

This notebook loads resumes and job descriptions with DirectoryLoader and splits them into smaller chunks with RecursiveCharacterTextSplitter. GPT-J 6B generates embeddings for both sets of chunks, and the embeddings are stored in FAISS vector stores for efficient similarity search. When a recruiter queries the system, FAISS retrieves the most relevant job descriptions and resumes. A prompt template then passes this context to the Llama-2-7b model, deployed via SageMaker JumpStart, which generates a response such as a recommendation score or a fitment summary.

Prerequisites

In [None]:
%pip install faiss-cpu==1.7.4 --quiet

In [None]:
%pip install langchain==0.0.222 --quiet

In [None]:
%%capture 
!pip install PyYAML

Importing necessary libraries

In [None]:
import requests
import logging
import boto3
import yaml
import json
import numpy as np
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import DirectoryLoader, PDFPlumberLoader, TextLoader
from langchain.embeddings.sagemaker_endpoint import EmbeddingsContentHandler
from langchain.embeddings import SagemakerEndpointEmbeddings
from langchain.vectorstores import FAISS
from sagemaker.jumpstart.model import JumpStartModel
from tqdm.contrib.concurrent import process_map
from multiprocessing import cpu_count

Setting up Logs

In [None]:
logger = logging.getLogger('sagemaker')
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())
print(logger)

Log versions of dependencies

In [None]:
logger.info(f'Using requests=={requests.__version__}')
logger.info(f'Using pyyaml=={yaml.__version__}')

SageMaker endpoint and region setup

In [None]:
import os
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"
TEXT_EMBEDDING_MODEL_ENDPOINT_NAME = 'huggingface-textembedding-gpt-j-6b-fp16-1705613925'
MODEL_ID = 'huggingface-textembedding-gpt-j-6b-fp16'  
MODEL_VERSION = '*'
INSTANCE_TYPE = 'ml.g4dn.2xlarge'
INSTANCE_COUNT = 1
IMAGE_SCOPE = 'inference'
MODEL_DATA_DOWNLOAD_TIMEOUT = 3600  # in seconds
CONTAINER_STARTUP_HEALTH_CHECK_TIMEOUT = 3600
CONTENT_TYPE = 'application/json'

Set up roles and clients 

In [None]:
client = boto3.client('sagemaker-runtime')
REGION_NAME = boto3.session.Session().region_name
print(REGION_NAME)

Load Resumes and Job Descriptions

In [None]:
!pip install pdfplumber
loader = DirectoryLoader("./resumes/", glob="**/*.pdf", loader_cls=PDFPlumberLoader)
job_description_loader = DirectoryLoader("./job_descriptions/", glob="**/*.txt", loader_cls=TextLoader)

resumes = loader.load()
job_descriptions = job_description_loader.load()

print(resumes[0].page_content)
print(resumes[0].metadata)

print(job_descriptions[0].page_content)
print(job_descriptions[0].metadata)

Split documents into smaller chunks (to handle large texts)

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
resume_chunks = text_splitter.split_documents(resumes)
job_description_chunks = text_splitter.split_documents(job_descriptions)
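
To make the effect of chunk_size and chunk_overlap concrete, here is a simplified sketch of overlapping fixed-size windows. It is not how RecursiveCharacterTextSplitter works internally (that splitter prefers to break on separators such as paragraph breaks, newlines, and spaces); the function name `split_with_overlap` and the sample text are assumptions made for illustration.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=100):
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

# Synthetic 2500-character document (digits repeating), made up for the demo
sample = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])
```

Each chunk repeats the last 100 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.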

Print information about document length

In [None]:
avg_doc_length = lambda documents: sum([len(doc.page_content) for doc in documents]) // len(documents)
avg_char_count_pre = avg_doc_length(resumes)
avg_char_count_post = avg_doc_length(resume_chunks)
print(f'Average length among {len(resumes)} resumes loaded is {avg_char_count_pre} characters.')
print(f'After the split we have {len(resume_chunks)} resume chunks. Average length is {avg_char_count_post} characters.')

In [None]:
# Generate embeddings for resumes and job descriptions using GPT-J model via SageMaker

from typing import Any, Dict, List, Optional

class ContentHandler(EmbeddingsContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, inputs: List[str], model_kwargs: Dict) -> bytes:
        # Serialize the batch of texts (plus any model kwargs) as the JSON request body
        input_str = json.dumps({"text_inputs": inputs, **model_kwargs})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> List[List[float]]:
        # At runtime `output` is a streaming response body, hence the .read() call
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json["embedding"]

content_handler = ContentHandler()
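
The request and response shapes handled by the content handler can be checked offline. The sketch below mirrors the same JSON layout (`text_inputs` in, `embedding` out) using a simulated reply; the function names and the tiny two-dimensional vectors are made up for illustration, and real embeddings are much longer.

```python
import io
import json

def encode_request(texts, **model_kwargs):
    """Build the JSON request body in the same layout as transform_input."""
    return json.dumps({"text_inputs": texts, **model_kwargs}).encode("utf-8")

def decode_response(stream):
    """Read a file-like response body and return the list of vectors."""
    return json.loads(stream.read().decode("utf-8"))["embedding"]

request_body = encode_request(["Java developer", "Data analyst"])

# Simulated endpoint reply: one stand-in vector per input text
fake_reply = io.BytesIO(
    json.dumps({"embedding": [[0.1, 0.2], [0.3, 0.4]]}).encode("utf-8")
)
vectors = decode_response(fake_reply)
print(len(vectors), "vectors returned")
```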

In [None]:
# SageMaker embedding model for resumes and job descriptions
embedding_model = SagemakerEndpointEmbeddings(
    endpoint_name=TEXT_EMBEDDING_MODEL_ENDPOINT_NAME,
    region_name=REGION_NAME,
    content_handler=content_handler,
)

In [None]:
# Generate embeddings for resumes
resume_embeddings = [embedding_model.embed_query(resume.page_content) for resume in resume_chunks]

In [None]:
# Generate embeddings for job descriptions
job_description_embeddings = [embedding_model.embed_query(job_desc.page_content) for job_desc in job_description_chunks]

In [None]:
# Create FAISS vector stores for efficient retrieval
faiss_resumes = FAISS.from_documents(resume_chunks, embedding_model)
faiss_job_descriptions = FAISS.from_documents(job_description_chunks, embedding_model)

In [None]:
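# FAISS.from_documents above already embedded and indexed every chunk, so the
# precomputed vectors are not added again: doing so would duplicate entries, and
# add_embeddings expects (text, embedding) pairs rather than bare vectors.
# Sanity check: each index should hold one vector per chunk.
print(faiss_resumes.index.ntotal, len(resume_chunks))
print(faiss_job_descriptions.index.ntotal, len(job_description_chunks))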

In [None]:
# Save FAISS index locally
faiss_resumes.save_local("faiss_resume_index")
faiss_job_descriptions.save_local("faiss_job_description_index")

In [None]:
# Search and retrieve relevant job descriptions based on a query
query = "Looking for a Software Engineer with experience in Java and microservices"
query_embedding = embedding_model.embed_query(query)
relevant_job_desc = faiss_job_descriptions.similarity_search_by_vector(query_embedding)
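
Conceptually, the vector search compares the query embedding against every stored embedding and returns the closest ones. The sketch below does this by brute force with Euclidean (L2) distance, assuming an exact, flat L2 index, which we believe is what the LangChain FAISS wrapper builds by default. The function name `l2_top_k` and the toy two-dimensional vectors are illustrative only.

```python
import math

def l2_top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k stored vectors closest to query_vec."""
    scored = []
    for idx, vec in enumerate(doc_vecs):
        dist = math.sqrt(sum((q - v) ** 2 for q, v in zip(query_vec, vec)))
        scored.append((dist, idx))
    scored.sort()  # smallest distance first
    return [idx for _, idx in scored[:k]]

# Toy "embeddings" for four documents (made up for the demo)
toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
top = l2_top_k([1.0, 0.0], toy_docs, k=2)
print(top)
```

Here documents 0 and 2 point in nearly the same direction as the query, so they are returned first; FAISS computes the same kind of ranking far more efficiently on high-dimensional vectors.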

In [None]:
# Extract context from relevant job descriptions
context_job_desc = ""
for job_desc in relevant_job_desc:
    context_job_desc += job_desc.page_content
context_job_desc = context_job_desc.replace("\n", " ")

In [None]:
# Search and retrieve relevant resumes based on the retrieved job descriptions
context_resume = ""
for job_desc in relevant_job_desc:
    # Use a separate variable so the recruiter's original `query` is not overwritten
    resume_query = f"Find resumes matching job description: {job_desc.page_content}"
    query_embedding_resume = embedding_model.embed_query(resume_query)
    relevant_resumes = faiss_resumes.similarity_search_by_vector(query_embedding_resume)

    for resume in relevant_resumes:
        context_resume += resume.page_content

context_resume = context_resume.replace("\n", " ")
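
Because the same resume chunk can be retrieved for several job descriptions, plain concatenation may repeat text and grow beyond what fits comfortably in the model's context window. One possible way to handle this, sketched below, is to drop duplicates and stop once a character budget is reached. The helper name `build_context` and the budget values are assumptions, not part of the original pipeline.

```python
def build_context(chunks, max_chars=6000):
    """Join unique chunks into one context string, capped at max_chars."""
    seen = set()
    parts = []
    used = 0
    for chunk in chunks:
        text = " ".join(chunk.split())  # collapse newlines and repeated spaces
        if not text or text in seen:
            continue  # skip empty or already included chunks
        if used + len(text) > max_chars:
            break  # budget exhausted; later chunks are dropped
        seen.add(text)
        parts.append(text)
        used += len(text) + 1  # +1 for the joining space
    return " ".join(parts)

# Toy chunks (made up); the second one duplicates the first
demo = build_context(["Java\nSpring", "Java\nSpring", "Python Django"], max_chars=20)
print(demo)
```

In the notebook, the list passed in would be the `page_content` of each retrieved resume chunk.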

In [None]:
# Define the prompt template for Llama-2-7b to generate answers
template = """
    As an experienced and detail-oriented assistant, your objective is to provide clear, accurate, and concise responses based on the information provided. 
    If the answer is not available or unclear, please indicate that you are unable to provide an answer.

    CONTEXT:
    {context}

    =========
    QUESTION: {question} 
    RESPONSE:
"""       
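# Pass both the retrieved job descriptions and the matching resumes as context
context = f"JOB DESCRIPTIONS: {context_job_desc} RESUMES: {context_resume}"
prompt = template.format(context=context, question="Is this candidate a good fit for the role?")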
print("Prompt sent to Llama-2-7b:")
print(prompt)

In [None]:
# Deploy the Llama-2-7b chat model (JumpStart model ID ending in "-f") on SageMaker JumpStart
import sagemaker

role = sagemaker.get_execution_role()
my_model = JumpStartModel(model_id="meta-textgeneration-llama-2-7b-f", role=role)
predictor = my_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.4xlarge",
    endpoint_name="llama-2-recruitment-model"
)

In [None]:
# Prepare the payload for the model
payload = {
    "inputs": [
        [
            {"role": "system", "content": prompt},
            {"role": "user", "content": query},
        ]
    ],
    "parameters": {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6, "return_full_text": False}
}

In [None]:
# Get the response from Llama-2-7b
response = predictor.predict(payload, custom_attributes='accept_eula=true')
print("Response from Llama-2-7b model:")
print(response[0]['generation']['content'])
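
The indexing above assumes the response is a list whose first element holds a `generation` dictionary with a `content` field. A slightly more defensive way to read it is sketched below; the helper name `extract_generation` and the mock response text are invented for illustration.

```python
def extract_generation(response):
    """Return the assistant text from a chat-style response, or None if the shape differs."""
    try:
        return response[0]["generation"]["content"].strip()
    except (IndexError, KeyError, TypeError, AttributeError):
        return None

# Mock response in the same shape the notebook indexes into (content is made up)
mock_response = [
    {"generation": {"role": "assistant", "content": " Strong fit: 5 years of Java. "}}
]
print(extract_generation(mock_response))
```

Returning None instead of raising makes it easier to log and skip malformed responses when scoring many candidates in a loop.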