#### Resume Summarization AI Agent (RAG + LLM)
An end-to-end AI pipeline that leverages Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs) on Databricks to generate professional summaries of resumes.

#### Technologies Used:
 - Sentence Transformers for semantic similarity and intelligent chunk retrieval

 - Databricks-hosted LLaMA for scalable and secure LLM inference

 - Prompt Engineering to guide the model for clear, high-quality responses

 - Unity Catalog Volumes for governed storage of, and access to, the resume files

#### Problem Statement:
Recruiters and hiring managers often face the challenge of reviewing hundreds of resumes manually — a process that is both time-consuming and tedious.

#### Project Goal:
To streamline the recruitment process by generating concise, professional resume summaries with an LLM. The pipeline aims to improve accuracy and relevance by retrieving only the resume content most semantically relevant to the query before passing it to the model.



### Tools & Technologies:

This project is built using the following key components:

- Python – Core language for building and orchestrating the pipeline

- Sentence Transformers (all-MiniLM-L6-v2) – For semantic search and similarity-based retrieval of relevant resume sections

- Databricks LLM Endpoint (LLaMA 4) – Hosted large language model for generating high-quality resume summaries

- Apache Spark – To efficiently read and process resumes stored in Unity Catalog Volumes

- Prompt Engineering – To craft effective instructions and improve the LLM’s output relevance and clarity

- REST API Calls (via requests) – To communicate with the Databricks LLM endpoint for inference  


In [0]:
# resume_summary_pipeline.py

%pip install sentence-transformers

from pathlib import Path
import textwrap
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import os
import requests
from pyspark.sql import SparkSession

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


#### Read and Chunk Resume: 

Sending the LLM only the relevant parts of a resume keeps the prompt short and focused. We therefore split long resumes into small chunks (~800 characters) that can later be searched semantically.

In [0]:

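def read_and_chunk_resume(path, width=800):
    # Read the whole resume file as text (UTF-8 assumed)
    text = Path(path).read_text(encoding="utf-8")
    # Split into chunks of at most `width` characters, breaking on whitespace
    return textwrap.wrap(text, width=width)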


***Path(path).read_text()** reads the contents of the file at the given path, using `pathlib.Path` for consistent, cross-platform path handling.*

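***textwrap.wrap(text, width=800)** splits the resume string into chunks of at most **800 characters**, breaking on whitespace so words are not cut in half (only a single word longer than 800 characters would be split). By default, newlines are replaced with spaces. Small chunks keep the number of tokens sent to the LLM under control.*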
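A quick, self-contained check of how `textwrap.wrap` chunks text, run on a made-up sample string rather than a real resume. It confirms the behavior described above: chunks never exceed `width`, words stay whole, and newlines become spaces.

```python
import textwrap

# Made-up stand-in for resume text: about 1,600 characters with embedded newlines
sample = (
    "Data Scientist with experience in Python and SQL.\n"
    "Built ML models on Databricks.\n"
) * 20

chunks = textwrap.wrap(sample, width=800)

# No chunk is longer than `width` characters
print(max(len(c) for c in chunks) <= 800)  # → True
# Newlines were replaced with spaces (replace_whitespace=True by default)
print(any("\n" in c for c in chunks))  # → False
# Words are kept intact: rejoining the chunks gives back the same word sequence
print(" ".join(chunks).split() == sample.split())  # → True
```

Because section breaks are flattened into spaces, a chunk may span the end of one resume section and the start of the next; splitting on blank lines first is one alternative if section boundaries matter.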

#### Generate Embeddings:

Embeddings convert text chunks into high-dimensional vectors, so we can find text that is semantically similar to a query.


In [0]:

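from sentence_transformers import SentenceTransformer

def get_embeddings(chunks, model_name="all-MiniLM-L6-v2"):
    # Load the pretrained embedding model (downloaded on first use)
    model = SentenceTransformer(model_name)
    # One vector per chunk (384 dimensions for all-MiniLM-L6-v2)
    embeddings = model.encode(chunks)
    # Return the model too, so the query can be embedded with the same model
    return embeddings, model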


To convert text chunks into dense vector representations, a pretrained transformer model is loaded with **SentenceTransformer(model_name)**. The default, **"all-MiniLM-L6-v2"**, is a compact, efficient model that produces 384-dimensional embeddings.

The **model.encode(chunks)** method takes a list of text chunks and converts each into a dense vector of floating-point numbers that captures the semantic meaning of the text.

The function returns two values: `embeddings`, a NumPy array with one vector (row) per chunk, and the model itself, which is reused later to embed the query. Reusing the same model matters, because query and chunk vectors are only comparable when they come from the same embedding space.

#### Semantic Retrieval: 


This step finds the text chunks most semantically similar to a given query, using cosine similarity between embeddings. Passing only these chunks to the LLM saves tokens and keeps the model focused on relevant content.

**Steps:**
  1. Encodes the query into an embedding vector.
  2. Calculates cosine similarity between the query vector and all chunk embeddings.
  3. Sorts the chunks by similarity score in descending order.
  4. Returns the top-k most relevant chunks.
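The four steps above reduce to a few lines of NumPy. This sketch uses made-up 3-dimensional vectors in place of real embeddings and computes cosine similarity directly, $\cos(\mathbf{q}, \mathbf{c}) = \frac{\mathbf{q} \cdot \mathbf{c}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{c} \rVert}$, which is what scikit-learn's `cosine_similarity` computes for each query/chunk pair in `get_top_k_chunks`. `top_k_by_cosine` is a hypothetical helper name, not part of the pipeline.

```python
import numpy as np


def top_k_by_cosine(query_vec, chunk_vecs, k=3):
    """Return (indices of the k most similar rows, all similarity scores)."""
    q = np.asarray(query_vec, dtype=float)
    C = np.asarray(chunk_vecs, dtype=float)
    # Cosine similarity: dot product divided by the product of the vector norms
    sims = C @ q / (np.linalg.norm(C, axis=1) * np.linalg.norm(q))
    # argsort is ascending, so reverse it to put the highest scores first
    return np.argsort(sims)[::-1][:k], sims


# Toy 3-dimensional "embeddings" (all-MiniLM-L6-v2 vectors have 384 dimensions)
chunk_vecs = [
    [1.0, 0.0, 0.0],  # chunk 0
    [0.0, 1.0, 0.0],  # chunk 1
    [0.7, 0.7, 0.0],  # chunk 2
]
query_vec = [1.0, 0.1, 0.0]

idx, sims = top_k_by_cosine(query_vec, chunk_vecs, k=2)
print(idx.tolist())  # → [0, 2]
```

Zero vectors are not handled here; a zero-norm chunk or query would yield `nan` scores.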


In [0]:

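from sklearn.metrics.pairwise import cosine_similarity

def get_top_k_chunks(query, chunks, embeddings, model, top_k=3):
    # Embed the query with the same model used for the chunks
    query_embedding = model.encode([query])[0]
    # Cosine similarity between the query and every chunk -> 1-D array of scores
    similarities = cosine_similarity([query_embedding], embeddings)[0]
    # argsort is ascending; reverse it and keep the top_k highest-scoring indices
    top_indices = similarities.argsort()[::-1][:top_k]
    return [chunks[i] for i in top_indices]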


#### Read Resume from Unity Catalog via Spark: 

In [0]:

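def get_resume_from_unity(file_path):
    # Reuse the active Spark session (on Databricks, the notebook's `spark`)
    spark = SparkSession.builder.getOrCreate()
    # spark.read.text yields one row per line, in a column named "value"
    df = spark.read.text(file_path)
    # Collect the lines to the driver and rebuild the file as a single string
    return "\n".join(row.value for row in df.collect())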

#### Calling Databricks LLM:

In [0]:
import os
import requests

def call_databricks_llm(prompt, model="databricks-llama-4-maverick", max_tokens=500):
    # Workspace URL and personal access token come from environment variables,
    # e.g. DATABRICKS_HOST="https://dbc-7412fe25-6807.cloud.databricks.com".
    # Set both before calling this function.
    host = os.environ["DATABRICKS_HOST"].rstrip("/")
    token = os.environ["DATABRICKS_TOKEN"]

    # Serving-endpoint URL built from the endpoint (model) name
    api_url = f"{host}/serving-endpoints/{model}/invocations"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json"
    }

    # OpenAI-style chat payload: system message sets behavior, user message carries the prompt
    payload = {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant that summarizes resumes."},
            {"role": "user", "content": prompt}
        ],
        "max_tokens": max_tokens
    }

    response = requests.post(api_url, headers=headers, json=payload, timeout=120)
    if response.status_code == 200:
        response_json = response.json()
        print("🔍 Response JSON:", response_json)  # TEMP DEBUG LINE

        # Try common response formats
        if "predictions" in response_json:
            return response_json["predictions"][0]["message"]["content"]
        elif "choices" in response_json:
            return response_json["choices"][0]["message"]["content"]
        elif "data" in response_json:
            return response_json["data"]["message"]
        else:
            raise Exception(f"❌ Unknown response structure: {response_json}")
    else:
        raise Exception(f"❌ Request failed with status code {response.status_code}: {response.text}")



This function sends a prompt to a Databricks-hosted large language model (LLM) and returns the generated summary. The request goes to the model's serving-endpoint URL (`<workspace-host>/serving-endpoints/<endpoint-name>/invocations`) and authenticates with a bearer token. The payload follows the OpenAI-style chat format: a system message sets the assistant's behavior, and a user message carries the actual prompt. The `max_tokens` parameter caps the length of the response, keeping it concise. `requests.post()` sends the payload and headers to the endpoint; on success, the function extracts the generated text from whichever response format the endpoint returns (`predictions`, `choices`, or `data`), and otherwise raises an exception with the status code and error body.
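The response parsing can be separated from the HTTP call so it can be tested without network access or a token. A minimal sketch: `extract_llm_text` is a hypothetical helper whose branches mirror those in `call_databricks_llm`. The `predictions` and `data` shapes are assumptions carried over from that function, while the `choices` shape matches the sample response printed at the end of this notebook.

```python
def extract_llm_text(response_json):
    """Pull the generated text out of a serving-endpoint response dict.

    Branches mirror call_databricks_llm: 'predictions', then the
    OpenAI-style 'choices', then 'data'.
    """
    if "predictions" in response_json:
        return response_json["predictions"][0]["message"]["content"]
    if "choices" in response_json:
        return response_json["choices"][0]["message"]["content"]
    if "data" in response_json:
        return response_json["data"]["message"]
    raise ValueError(f"Unknown response structure: {response_json}")


# Trimmed-down response in the OpenAI-style chat-completion shape
sample_response = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "• 2.5 years of experience in Data Science"},
            "finish_reason": "stop",
        }
    ],
}
print(extract_llm_text(sample_response))  # → • 2.5 years of experience in Data Science
```

Inside `call_databricks_llm`, the `if`/`elif` chain could then become `return extract_llm_text(response.json())`.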


#### Format Prompt with Prompt Engineering:

This function is designed to build a well-structured prompt that will be sent to a Large Language Model (LLM) for resume summarization. 

It takes two main inputs: context, which consists of relevant resume chunks retrieved through semantic search, and query, which is the specific instruction or question for the model to answer (e.g., "Summarize this candidate’s experience"). 

The goal is to format these inputs into a cohesive and clear prompt that guides the LLM to produce accurate, concise, and relevant summaries. 

The structured format helps ensure the model focuses only on the important information from the resume while following the desired task.   


In [0]:
def format_prompt(context, query):
    return f"""
[System Role]
You are a Resume Summarization Agent built to support hiring managers and recruiters by extracting key candidate insights from resumes — all in under 3 seconds.

You understand:
- Resume structure (Experience, Skills, Education, Certifications)
- Tech roles and industry-specific terminology
- Professional, recruiter-facing tone

Goal:
Generate a concise, structured summary that enables rapid screening and reduces time spent reading full resumes.

Resume:
{context}

Task:
Summarize the candidate profile using these guidelines:

1. Include job titles, companies, industries, and total experience.
2. Highlight technical tools, platforms, or domains.
3. Mention notable accomplishments or quantifiable impact.
4. Include relevant responsibilities, avoiding overly detailed or repetitive lines.
5. Use clear bullet points for skimming efficiency.

Constraints:
- Avoid hallucination or extrapolation.
- Do **not** add anything that is not in the resume.
- Maintain a confident and neutral tone.

📤 Output Format:
- Bullet points only, no introduction or conclusion

🔍 Hiring Manager Query:
Now respond to this specific question based on the resume above:
**"{query}"**
"""




In [0]:
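def summarize_resume(file_path_txt, query="How many years of experience does she have in Python, SQL, ML, Databricks and solving business problems?"):
    # 1. Split the resume into ~800-character chunks
    chunks = read_and_chunk_resume(file_path_txt)
    # 2. Embed every chunk
    embeddings, model = get_embeddings(chunks)
    # 3. Keep only the chunks most similar to the query
    top_chunks = get_top_k_chunks(query, chunks, embeddings, model)
    context = "\n\n".join(top_chunks)
    # 4. Wrap the retrieved context and the query in the engineered prompt
    prompt = format_prompt(context, query)
    # 5. Generate the answer with the Databricks-hosted LLM
    summary = call_databricks_llm(prompt)
    return summary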
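To exercise the same chunk, embed, retrieve, prompt, generate flow without downloading a model or calling the endpoint, the embedder and LLM can be passed in as parameters. A minimal sketch under stated assumptions: `summarize_resume_offline`, `keyword_embed`, and the echo "LLM" are hypothetical stand-ins, a toy keyword-count embedding replaces all-MiniLM-L6-v2, and the prompt is a simplified version of `format_prompt`.

```python
import textwrap

import numpy as np


def summarize_resume_offline(text, query, embed, llm, top_k=3, width=800):
    """Hypothetical variant of summarize_resume with the embedder and LLM injected.

    embed: callable mapping a list of strings to a 2-D array (one row per string)
    llm:   callable mapping a prompt string to generated text
    """
    # 1. Chunk the raw resume text
    chunks = textwrap.wrap(text, width=width)
    # 2. Embed chunks and query with the same embedder
    chunk_vecs = np.asarray(embed(chunks), dtype=float)
    query_vec = np.asarray(embed([query]), dtype=float)[0]
    # 3. Rank chunks by cosine similarity to the query
    sims = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    top_chunks = [chunks[i] for i in np.argsort(sims)[::-1][:top_k]]
    # 4. Simplified prompt (stand-in for format_prompt)
    context = "\n\n".join(top_chunks)
    prompt = f"Resume:\n{context}\n\nQuery: {query}"
    # 5. Generate
    return llm(prompt)


VOCAB = ["python", "sql", "databricks", "spark", "marketing"]


def keyword_embed(texts):
    # Toy embedding: keyword counts plus a constant 1.0 so no vector has zero norm
    return [[t.lower().count(w) for w in VOCAB] + [1.0] for t in texts]


resume_text = (
    "Built Databricks pipelines on Spark. "
    "Ran marketing campaigns for retail. "
    "Wrote Python and SQL for analytics."
)

# An "LLM" that echoes its prompt, so the retrieved context is visible
result = summarize_resume_offline(
    resume_text, "Experience with Python and SQL?", keyword_embed, lambda p: p,
    top_k=1, width=37,
)
print(result)
```

Injecting the dependencies keeps the retrieval and prompt logic testable on its own; in the notebook, `get_embeddings` and `call_databricks_llm` would fill those roles.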

In [0]:
from pathlib import Path
import textwrap
import os 


if __name__ == "__main__":
    result = summarize_resume("/Volumes/workspace/default/default_resume/Shivani_Kanodia.txt")
    print(result)

🔍 Response JSON: {'id': 'chatcmpl_32dd2ceb-d8b4-4371-94de-dc5453b5edb9', 'object': 'chat.completion', 'created': 1754972153, 'model': 'meta-llama-4-maverick-040225', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': '• 2.5 years of experience in Data Science, with proficiency in Python and ML\n• Utilized Python libraries like Pandas, NumPy, and Scikit-learn for data analysis\n• Worked with SQL, optimizing complex queries by restructuring joins and creating indexes\n• Experienced in Databricks, building and managing clusters to run MLflow experiments\n• Applied ML models to drive business outcomes, such as saving $300K in hiring costs\n• 3 years of experience in Business Consulting, with a background in Business Analytics\n• Demonstrated ability to drive data-driven decision-making, resulting in a 20% improvement\n• Proficient in using various tools and technologies to solve business problems'}, 'finish_reason': 'stop', 'logprobs': None}], 'usage': {'prompt_tokens': 8