# Creating Embeddings Using Sentence Transformers

First, change the working directory to the project root so that the relative `data/...` paths used below resolve correctly.

In [2]:
import os
%pwd
os.chdir("../")
%pwd
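`os.chdir("../")` only works if the notebook is started exactly one level below the project root, and running the cell twice moves up one level too far. A more robust option is to walk up the directory tree until a marker that only exists at the root is found. This is a minimal sketch: `find_project_root` is a hypothetical helper, and using the `data` folder as the marker is an assumption (any root-only file or folder, such as `.git`, would work).

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "data") -> Path:
    """Walk up from `start` until a directory containing `marker` is found.

    `marker` is an assumed root-only file or folder (e.g. "data", ".git").
    """
    start = start.resolve()
    # Check `start` itself first, then each parent up to the filesystem root
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No parent of {start} contains {marker!r}")


# Possible usage in place of the hard-coded "../":
# os.chdir(find_project_root(Path.cwd()))
```

Unlike a relative `chdir`, this is idempotent: re-running the cell always lands in the same directory.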

'c:\\Users\\amman\\Documents\\Generative AI\\End-to-End-AI-Resume-Matcher'

Load the cleaned job listings data.

In [3]:
import pandas as pd

In [4]:
df_jobs = pd.read_csv("data/jobs_desc_clean.csv")

We'll use **all-MiniLM-L6-v2**, a **small, fast MiniLM model** (a distilled, BERT-style transformer) that maps each text to a 384-dimensional embedding.

In [5]:
from sentence_transformers import SentenceTransformer

model_name = "all-MiniLM-L6-v2"

# Load the sentence transformer model
model = SentenceTransformer(model_name)

# Generate one embedding per job description (returned as a NumPy array)
job_embeddings = model.encode(df_jobs["description"].tolist(), show_progress_bar=True)

Batches: 100%|██████████| 63/63 [01:38<00:00,  1.56s/it]
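The progress bar reports 63 batches. `SentenceTransformer.encode` processes inputs in batches of 32 by default, so the batch count should equal the number of descriptions divided by 32, rounded up. A quick arithmetic check, assuming the default `batch_size` was left unchanged and using the 2000 rows that the FAISS index reports later:

```python
import math

n_descriptions = 2000  # rows in data/jobs_desc_clean.csv (as reported by the FAISS index)
batch_size = 32        # default batch_size of SentenceTransformer.encode

n_batches = math.ceil(n_descriptions / batch_size)
last_batch = n_descriptions - (n_batches - 1) * batch_size
print(n_batches, last_batch)  # 63 16 -> 62 full batches of 32, then one batch of 16
```

By the same reasoning, `job_embeddings.shape` should be `(2000, 384)`, since all-MiniLM-L6-v2 produces 384-dimensional vectors.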


## FAISS (Facebook AI Similarity Search)

Resumes will be uploaded as PDF files, so we first need a function that extracts their text.

In [6]:
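import fitz  # PyMuPDF

def extract_text_from_pdf(pdf_path: str) -> str:
    """Return the plain text of every page in a PDF, separated by newlines."""
    # Use a context manager so the file handle is closed after reading
    with fitz.open(pdf_path) as doc:
        text = "\n".join(page.get_text("text") for page in doc)

    return text.strip()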
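The job descriptions in `jobs_desc_clean.csv` appear to be lowercased, with digits, punctuation and line breaks removed (compare the printed job description later in this notebook), while `extract_text_from_pdf` returns raw text. Normalizing the resume in a similar way may make the two inputs more comparable. The exact cleaning used to build the CSV is not shown here, so the rules below are an assumption, and `clean_text` is a hypothetical helper; whether it improves match quality has not been measured.

```python
import re


def clean_text(text: str) -> str:
    """Hypothetical normalizer approximating the style of jobs_desc_clean.csv.

    Assumed rules (inferred from printed output, not from the original
    cleaning code): lowercase, keep only ASCII letters and whitespace,
    and collapse whitespace runs into single spaces.
    """
    text = text.lower()
    text = re.sub(r"[^a-z\s]", "", text)      # drop digits and punctuation
    return re.sub(r"\s+", " ", text).strip()  # newlines/tabs -> one space


print(clean_text("Senior Data-Analyst,\n 5+ years"))  # senior dataanalyst years
```

It would be applied as `clean_text(extract_text_from_pdf(resume_path))` before encoding the resume. Note that `[^a-z\s]` also strips accented letters.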




Next, we store the job listing embeddings created earlier in a FAISS index for fast similarity search.

In [7]:
import faiss
import numpy as np

# FAISS expects a contiguous float32 matrix of shape (n_jobs, dimension)
job_embeddings = np.ascontiguousarray(job_embeddings, dtype="float32")

# Create an exact (brute-force) L2 index and add the job embeddings
dimension = job_embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(job_embeddings)

print(f"FAISS index contains {index.ntotal} job descriptions.")

FAISS index contains 2000 job descriptions.
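`IndexFlatL2` performs exact (brute-force) search: each query is compared with every stored vector, and `search` returns the k smallest *squared* Euclidean distances together with their row indices. The NumPy sketch below reproduces that computation on toy random vectors (not the job embeddings), which can be handy for sanity-checking FAISS results on small data. `flat_l2_search` is a hypothetical helper, not part of FAISS.

```python
import numpy as np


def flat_l2_search(database: np.ndarray, queries: np.ndarray, k: int):
    """Exact k-nearest-neighbour search by squared L2 distance.

    Returns (distances, indices), each of shape (n_queries, k), with
    squared distances sorted ascending, like IndexFlatL2.search.
    """
    # Pairwise differences, shape (n_queries, n_database, dim): fine for small data
    diffs = queries[:, None, :] - database[None, :, :]
    sq_dists = np.einsum("qnd,qnd->qn", diffs, diffs)
    # Positions of the k smallest distances per query
    indices = np.argsort(sq_dists, axis=1)[:, :k]
    distances = np.take_along_axis(sq_dists, indices, axis=1)
    return distances, indices


rng = np.random.default_rng(0)
db = rng.standard_normal((100, 8)).astype("float32")
query = db[[42]] + 0.01  # a point very close to row 42
distances, indices = flat_l2_search(db, query, k=3)
print(indices[0][0])  # 42
```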


Now we create a function that finds the top k jobs for a given resume.

In [8]:
def find_top_jobs(resume_path, top_k=5):
    """Return the descriptions of the top_k jobs closest to the resume."""
    # Extract text from the resume PDF
    resume_text = extract_text_from_pdf(resume_path)
    # Embed the resume; this query vector is NOT added to the index
    resume_vector = model.encode([resume_text]).astype("float32")

    # Exact nearest-neighbour search: smallest L2 distance = best match
    distances, indices = index.search(resume_vector, top_k)

    # Look up the matched rows by position and return their descriptions
    matched_jobs = df_jobs["description"].iloc[indices[0]]

    return matched_jobs
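`find_top_jobs` ranks by squared Euclidean distance and discards `distances`, although a similarity score is often easier to show to users. For unit-length vectors, $\|a - b\|^2 = 2 - 2\cos(a, b)$, so the distances from `index.search` can be converted into cosine similarities. This only holds if the embeddings are normalized; passing `normalize_embeddings=True` to `model.encode` guarantees that, and we assume it here. `l2_to_cosine` is a hypothetical helper.

```python
import numpy as np


def l2_to_cosine(squared_l2: np.ndarray) -> np.ndarray:
    """Convert squared L2 distances between unit vectors to cosine similarity.

    Relies on ||a - b||^2 = 2 - 2 * cos(a, b), valid only for unit vectors.
    """
    return 1.0 - squared_l2 / 2.0


# Check the identity on two random unit vectors of the model's dimension
rng = np.random.default_rng(1)
a, b = rng.standard_normal((2, 384))
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

squared_l2 = np.sum((a - b) ** 2)
cosine = float(a @ b)
print(np.isclose(l2_to_cosine(squared_l2), cosine))  # True
```

Inside `find_top_jobs`, `l2_to_cosine(distances[0])` would then give one score per matched job, where 1.0 means identical direction.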


Let's test this on a randomly chosen PDF from the resume dataset.

In [9]:
import os
import random
import glob

# Each subfolder of "data/PDF resumes" holds the resumes for one job category
resume_root = os.path.join("data", "PDF resumes")
list_of_resume_types = [
    name for name in os.listdir(resume_root)
    if os.path.isdir(os.path.join(resume_root, name))
]

# Pick a random category, then a random PDF from that category
random_resume_type = random.choice(list_of_resume_types)
random_resume = os.path.relpath(random.choice(glob.glob(os.path.join(resume_root, random_resume_type, "*.pdf"))))
print(random_resume)

data\PDF resumes\BPO\30709029.pdf
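Because the cell above draws a new resume on every run, results are hard to reproduce. A minimal sketch that seeds a dedicated `random.Random` instance and uses `pathlib` (which also avoids hand-written path separators); `pick_random_resume` and the seed value are hypothetical, and the `data/PDF resumes/<category>/*.pdf` layout is assumed from the output above.

```python
import random
from pathlib import Path
from typing import Optional


def pick_random_resume(root: Path, seed: Optional[int] = None) -> Path:
    """Pick a random category folder under `root`, then a random PDF inside it.

    Choosing the category first (as the notebook does) makes every category
    equally likely regardless of how many PDFs it holds. Sorting makes the
    result depend only on the seed, not on filesystem listing order.
    """
    rng = random.Random(seed)
    categories = sorted(p for p in root.iterdir() if p.is_dir())
    category = rng.choice(categories)
    pdfs = sorted(category.glob("*.pdf"))
    return rng.choice(pdfs)  # raises IndexError if the category has no PDFs


# Example: the same seed selects the same resume on every run
# resume_path = pick_random_resume(Path("data") / "PDF resumes", seed=42)
```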


In [10]:
matched_jobs = find_top_jobs(resume_path=random_resume, top_k=3)

for job_description in matched_jobs:
    print(job_description)
    print('\n')

the real estate acquisition consultant is responsible for buying and selling homes for house buyers of america this person will follow up on leads value houses estimate repairs and close deals in addition to performing extensive due diligence on all acquisitions this role is fully remote what you will dofollow up on leads value properties analyze comps and acquire new homesnegotiate acquisitions and dispositions of propertiesperform extensive due diligence on all acquisitions and prepare contracts for ratificationestimate repairs and determine arv and asis value of propertiesmeet with homeowners facetoface to present offers at their propertieswork with realtors buyers lenders and title during the closing process about youyou have years of sales experienceyou have been consistently ranked within the top of sales staff in previous rolesyou have great communication skills and computer skills including microsoft officeyou have a bachelors degree or higher why we are a great place to workou

Let's make the results more readable by building a simple retrieval-augmented generation (RAG) pipeline: the top_k retrieved job descriptions are passed to an LLM, which presents them in a professional, readable format.
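Each job description can run to several hundred words, and the prompt grows with `top_k`, so it can help to cap how much text is sent to the LLM. A minimal sketch that truncates each description before joining them in the same `- description` format that `generate_rag_response` uses; `format_job_texts` and `max_chars_per_job` are hypothetical, and a character limit is only a rough proxy for a token limit.

```python
def format_job_texts(descriptions: list[str], max_chars_per_job: int = 1500) -> str:
    """Join job descriptions as a bulleted list, truncating each one.

    Truncation is by characters, a rough stand-in for a token budget.
    """
    formatted = []
    for desc in descriptions:
        if len(desc) > max_chars_per_job:
            # Cut at the last space before the limit to avoid splitting a word
            desc = desc[:max_chars_per_job].rsplit(" ", 1)[0] + " ..."
        formatted.append(f"- {desc}")
    return "\n\n".join(formatted)


print(format_job_texts(["short one", "word " * 10], max_chars_per_job=12))
# - short one
#
# - word word ...
```

It could replace the `"\n\n".join(...)` line in step 2 as `job_texts = format_job_texts(matched_jobs.tolist())`.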

In [13]:
import openai
import os
from dotenv import load_dotenv

load_dotenv()
openai_client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def generate_rag_response(resume_pdf_path: str, top_k=3):
    """
    Retrieves top_k job descriptions using FAISS and generates a summary using an LLM.
    """

    # Step 1: Retrieve top_k job descriptions using FAISS
    matched_jobs = find_top_jobs(resume_pdf_path, top_k)

    # Step 2: Format job descriptions as input for LLM
    job_texts = "\n\n".join([f"- {desc}" for desc in matched_jobs.tolist()])

    # Step 3: Define LLM prompt
    prompt = f"""
    You are an AI career assistant helping job seekers. The user has uploaded a resume, 
    and you've retrieved the most relevant job descriptions. Your task is to take these
    jobs and present them to the user in a professional, informative and readable manner.

    The job descriptions:
    {job_texts}

    Be concise and professional.
    """

    # Step 4: Call the OpenAI Chat Completions API (key loaded from .env above)
    try:
        response = openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a career coach."},
                {"role": "user", "content": prompt}
            ]
        )

        return response.choices[0].message.content

    # Catch only OpenAI client errors (auth, rate limit, connection, ...)
    except openai.OpenAIError as e:
        print("OpenAI API Error:", e)
        return "An error occurred while generating the response."

# Example Usage
summary = generate_rag_response(random_resume)
print(summary)


### Job Opportunity 1: Real Estate Acquisition Consultant at House Buyers of America

#### Responsibilities:
- Follow up on leads, value properties, and analyze comps
- Acquire new homes and negotiate acquisitions/dispositions
- Perform extensive due diligence on all acquisitions
- Estimate repairs and determine property values
- Meet with homeowners to present offers
- Work with realtors, buyers, lenders, and title during closing process

#### Requirements:
- Years of sales experience with top rankings
- Strong communication and computer skills
- Bachelor's degree or higher

#### Why Join Us:
- Fully remote work environment
- Competitive pay and great benefits
- Fast-growing company with strong revenue growth
- Ongoing nationwide expansion

[Learn more about us](www.housebuyersofamerica.com) | Equal Opportunity Employer

---

### Job Opportunity 2: Licensed Commercial Real Estate Salesperson in New York City

#### Responsibilities:
- Generate leads and cultivate client relationships
-