# Leveraging Azure SQL DB’s Native Vector Capabilities for Enhanced Resume Matching with NVIDIA AI and Azure Document Intelligence

In this tutorial, we will explore how to leverage Azure SQL DB’s new vector data type to store embeddings and perform similarity searches using built-in vector functions, enabling advanced resume matching to identify the most suitable candidates. 

By extracting and chunking content from PDF resumes using Azure Document Intelligence, generating embeddings with NVIDIA AI, and storing these embeddings in Azure SQL DB, we can perform sophisticated vector similarity searches and retrieval-augmented generation (RAG) to identify the most suitable candidates based on their resumes.

### **Tutorial Overview**

- This Python notebook will teach you to:
    1. **Chunk PDF Resumes**: Use **`Azure Document Intelligence`** to extract and chunk content from PDF resumes.
    2. **Create Embeddings**: Generate embeddings from the chunked content using the **`NVIDIA AI Models`**.
    3. **Vector Database Utilization**: Store embeddings in **`Azure SQL DB`** utilizing the **`new Vector Data Type`** and perform similarity searches using built-in vector functions to find the most suitable candidates.
    4. **LLM Generation Augmentation**: Enhance language model generation with embeddings from a vector database. In this case, we use the embeddings to inform a llama-3.3-70b-instruct model, enabling it to provide rich, context-aware answers about candidates based on their resumes.

## Dataset

We use a sample dataset from [Kaggle](https://www.kaggle.com/datasets/snehaanbhawal/resume-dataset) containing PDF resumes. For this tutorial we will use 120 resumes from the **Information-Technology** folder.

## Prerequisites

- **Azure Subscription**: [Create one for free](https://azure.microsoft.com/free/cognitive-services?azure-portal=true)
- **Azure SQL Database**: [Set up your database for free](https://learn.microsoft.com/azure/azure-sql/database/free-offer?view=azuresql)
- **Azure Document Intelligence**: [Create a free Azure Document Intelligence resource](https://learn.microsoft.com/azure/ai-services/document-intelligence/create-document-intelligence-resource?view=doc-intel-4.0.0)
- SSMS or VS Code to manage database

## Additional Requirements for Embedding Generation

- **NVIDIA AI**: [NVIDIA AI Models](https://build.nvidia.com/)
- **Python**: Version 3.7.1 or later from Python.org. (Sample has been tested with Python 3.11)
- **Python Libraries**: Install the required libraries openai, num2words, pandas, tiktoken, and pyodbc.
- **Jupyter Notebooks**: Visual Studio Code.


## Getting Started

1. **Database Setup**: Execute SQL commands from the `CreateTable.sql` script to create the necessary table in your database.
2. **Model Deployment**: Deploy an embeddings model via NVIDIA AI

![Deployed NVIDIA AI Model](../Assets/NVIDIA_embed.png)


3. **Connection String**: Find your Azure SQL DB connection string in the Azure portal under your database settings.
4. **Configuration**: Populate the `.env` file with your SQL server connection details , NVIDIA API keys and endpoints, Azure Document Intelligence key and endpoint values.

You can retrieve the NVIDIA AI _endpoint_ and _key_:

![NVIDIA AI Endpoint and Key](../Assets/NVIDIA_key_endpoint.png)


You can [retrieve](https://learn.microsoft.com/azure/ai-services/document-intelligence/create-document-intelligence-resource?view=doc-intel-4.0.0#get-endpoint-url-and-keys) the Document Intelligence _endpoint_ and _key_:

![Azure Document Intelligence Endpoint and Key](../Assets/docintelendpoint.png)
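
Before running the notebook, it can help to confirm that the `.env` file defines every setting the cells read. The sketch below is one possible check, not part of the original notebook. The variable names are the ones passed to `os.getenv` in the cells of this tutorial, while `missing_settings` and `sample_env` are hypothetical names introduced here for illustration.

```python
# Setting names read with os.getenv elsewhere in this notebook.
REQUIRED_SETTINGS = [
    "AZUREDOCINTELLIGENCE_ENDPOINT",
    "AZUREDOCINTELLIGENCE_API_KEY",
    "NVIDIA_API_KEY",
    "NVIDIA_CHAT_API_KEY",
    "FILE_PATH",
]
# The connection helper accepts either of these, so one is enough.
CONNECTION_SETTINGS = ["ENTRA_CONNECTION_STRING", "SQL_CONNECTION_STRING"]


def missing_settings(env):
    """Return the names of settings that are absent or empty in `env`."""
    missing = [name for name in REQUIRED_SETTINGS if not env.get(name)]
    if not any(env.get(name) for name in CONNECTION_SETTINGS):
        missing.append(" or ".join(CONNECTION_SETTINGS))
    return missing


# Hypothetical, partially filled configuration used only for demonstration.
sample_env = {"NVIDIA_API_KEY": "placeholder-key", "FILE_PATH": "resumes"}
print(missing_settings(sample_env))
```

In the notebook itself, the same function could be called as `missing_settings(os.environ)` after `load_dotenv()` has run.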

## Running the Notebook
Use Visual Studio Code


In [3]:
#Setup the python libraries required for this notebook
#Please ensure that you navigate to the directory containing the `requirements.txt` file in your terminal
%pip install -r requirements.txt

Note: you may need to restart the kernel to use updated packages.





In [4]:
#Load the env details
from dotenv import load_dotenv
load_dotenv()

True

# **PART 1: Extracting and Chunking Text from PDF Resumes using Azure Document Intelligence**

Create an instance of the [DocumentAnalysisClient](https://learn.microsoft.com/azure/ai-services/document-intelligence/create-document-intelligence-resource?view=doc-intel-4.0.0#get-endpoint-url-and-keys) using the endpoint and API key. 

[Azure Document Intelligence](https://learn.microsoft.com/azure/ai-services/document-intelligence/?view=doc-intel-4.0.0_) (previously known as Form Recognizer) is an Azure cloud service that uses machine learning to analyze text and structured data from your documents. This client will be used to send requests to the [Azure Document Intelligence](https://learn.microsoft.com/python/api/overview/azure/ai-formrecognizer-readme?view=azure-python) service and receive responses containing the extracted text from the PDF resumes.

In [5]:
%pip install azure-ai-ml

Note: you may need to restart the kernel to use updated packages.





In [6]:
import os
import re
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

# Load environment variables
endpoint = os.getenv("AZUREDOCINTELLIGENCE_ENDPOINT")
api_key = os.getenv("AZUREDOCINTELLIGENCE_API_KEY")

# Create a DocumentAnalysisClient
document_analysis_client = DocumentAnalysisClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(api_key)
)


### **Analyze input documents using prebuilt model in Azure Document Intelligence**

- `DocumentAnalysisClient` provides operations for analyzing input documents using prebuilt and custom models through the `begin_analyze_document` and `begin_analyze_document_from_url` APIs. In this tutorial we use the [prebuilt-layout](https://learn.microsoft.com/python/api/overview/azure/ai-formrecognizer-readme?view=azure-python#using-prebuilt-models) model.
    

### **Split text into chunks of 500 tokens**

- When content exceeds the embedding model's input limit, we chunk it into smaller pieces and embed those one at a time. Here we use [tiktoken](https://github.com/openai/tiktoken?tab=readme-ov-file) to split the extracted text into chunks of 500 tokens before passing them to the embedding model for [generating text embeddings](https://learn.microsoft.com/azure/ai-services/openai/tutorials/embeddings?tabs=python-new%2Ccommand-line&pivots=programming-language-python). Note that tiktoken's `cl100k_base` encoding is not necessarily the tokenizer the NVIDIA model uses, so a 500-token chunk can still exceed that model's limit; the embedding output later in this notebook shows a few chunks rejected for exceeding 512 tokens.
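
The fixed-window split can be illustrated without any tokenizer dependency. The following is a minimal sketch, assuming whitespace-separated words as a stand-in for real token ids; `chunk_tokens` is a hypothetical helper name, and the notebook's own cell uses tiktoken's `encode` and `decode` around the same slicing logic.

```python
def chunk_tokens(tokens, max_tokens=500):
    """Split a token sequence into consecutive windows of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


# Stand-in "tokens": 1234 whitespace-separated words instead of tokenizer ids.
words = ("resume " * 1234).split()
chunks = chunk_tokens(words, max_tokens=500)
print([len(chunk) for chunk in chunks])  # [500, 500, 234]
```

Only the last window can be shorter than `max_tokens`. Choosing a `max_tokens` value somewhat below the model's limit leaves headroom when the chunking tokenizer and the model's tokenizer count differently.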

**Note**: You need to provide the location of the folder where the PDF files reside in the below script.

In [23]:
import os
import re
import pandas as pd
import tiktoken

# Path to the directory containing PDF files
file_path = os.getenv("FILE_PATH")
folder_path = os.path.join(os.getcwd(), file_path)

def get_pdf_files(folder_path):
    for path, subdirs, files in os.walk(folder_path):
        for name in files:
            if name.endswith(".pdf"):
                yield os.path.join(path, name)

# Function to read PDF files and extract text using Azure AI Document Intelligence
def extract_text_from_pdf(pdf_path):
    with open(pdf_path, "rb") as f:
        poller = document_analysis_client.begin_analyze_document("prebuilt-layout", document=f)
    result = poller.result()
    text = ""
    for page in result.pages:
        for line in page.lines:
            text += line.content + " "
    return text

# Function to clean text and remove special characters
def clean_text(text):
    text = re.sub(r'\s+', ' ', text)  # Remove extra whitespace
    text = re.sub(r'[^a-zA-Z0-9\s]', '', text)  # Remove special characters
    return text

# Function to split text into chunks of 500 tokens
def split_text_into_token_chunks(text, max_tokens=500):
    tokenizer = tiktoken.get_encoding("cl100k_base")
    tokens = tokenizer.encode(text)
    chunks = []
    
    for i in range(0, len(tokens), max_tokens):
        chunk_tokens = tokens[i:i + max_tokens]
        chunk_text = tokenizer.decode(chunk_tokens)
        chunks.append(chunk_text)
    
    return chunks

# Count the number of PDF files in the directory
pdf_files = [f for f in get_pdf_files(folder_path)]
num_files = len(pdf_files)
print(f"Number of PDF files in the directory: {num_files}")

# Create a DataFrame to store the chunks
data = []

for file_id, pdf_file in enumerate(pdf_files):
    file_name = os.path.basename(pdf_file)  # Ensure the correct file name is used
    print(f"Processing file {file_id + 1}/{num_files}: {file_name}")
    pdf_path = os.path.join(folder_path, pdf_file)
    text = extract_text_from_pdf(pdf_path)
    cleaned_text = clean_text(text)
    chunks = split_text_into_token_chunks(cleaned_text)
    
    print(f"Number of chunks for file {file_name}: {len(chunks)}")
    
    for chunk_id, chunk in enumerate(chunks):
        chunk_text = chunk.strip() if chunk.strip() else "NULL"
        unique_chunk_id = f"{file_id}_{chunk_id}"
        print(f"File: {file_name}, Chunk ID: {chunk_id}, Unique Chunk ID: {unique_chunk_id}, Chunk Length: {len(chunk_text)}, Chunk Text: {chunk_text[:50]}...")
        data.append({
            "file_name": file_name,  # Store the correct file name
            "chunk_id": chunk_id,
            "chunk_text": chunk_text,
            "unique_chunk_id": unique_chunk_id
        })

df = pd.DataFrame(data)
df.head(3)


Number of PDF files in the directory: 120
Processing file 1/120: 10089434.pdf
Number of chunks for file 10089434.pdf: 3
File: 10089434.pdf, Chunk ID: 0, Unique Chunk ID: 0_0, Chunk Length: 3035, Chunk Text: INFORMATION TECHNOLOGY TECHNICIAN I Summary Versat...
File: 10089434.pdf, Chunk ID: 1, Unique Chunk ID: 0_1, Chunk Length: 2959, Chunk Text: Disaster Recovery plan and procedures  Researching...
File: 10089434.pdf, Chunk ID: 2, Unique Chunk ID: 0_2, Chunk Length: 2048, Chunk Text: Installing configuring and supporting McAfee antiv...
Processing file 2/120: 10247517.pdf
Number of chunks for file 10247517.pdf: 3
File: 10247517.pdf, Chunk ID: 0, Unique Chunk ID: 1_0, Chunk Length: 3191, Chunk Text: INFORMATION TECHNOLOGY MANAGER Professional Summar...
File: 10247517.pdf, Chunk ID: 1, Unique Chunk ID: 1_1, Chunk Length: 2744, Chunk Text: network which entailed changing software and LAN c...
File: 10247517.pdf, Chunk ID: 2, Unique Chunk ID: 1_2, Chunk Length: 729, Chunk Text: i4 City Sta

| | file_name | chunk_id | chunk_text | unique_chunk_id |
|---|---|---|---|---|
| 0 | 10089434.pdf | 0 | INFORMATION TECHNOLOGY TECHNICIAN I Summary Ve... | 0_0 |
| 1 | 10089434.pdf | 1 | Disaster Recovery plan and procedures Researc... | 0_1 |
| 2 | 10089434.pdf | 2 | Installing configuring and supporting McAfee a... | 0_2 |


In [24]:
#read the top5 rows of the dataframe
df.head(5)


| | file_name | chunk_id | chunk_text | unique_chunk_id |
|---|---|---|---|---|
| 0 | 10089434.pdf | 0 | INFORMATION TECHNOLOGY TECHNICIAN I Summary Ve... | 0_0 |
| 1 | 10089434.pdf | 1 | Disaster Recovery plan and procedures Researc... | 0_1 |
| 2 | 10089434.pdf | 2 | Installing configuring and supporting McAfee a... | 0_2 |
| 3 | 10247517.pdf | 0 | INFORMATION TECHNOLOGY MANAGER Professional Su... | 1_0 |
| 4 | 10247517.pdf | 1 | network which entailed changing software and L... | 1_1 |


### **Tokenization vs. Character Length (OPTIONAL)**

In this section, we will explore the difference between the character length of a text chunk and its tokenized representation. Character length simply counts the number of characters in a text, while tokenization breaks the text into meaningful units called tokens.

**Character Length**

First, let’s add a new column to our DataFrame to view the length of each chunk in terms of characters. Here, `chunk_length` represents the number of characters in each chunk.

In [25]:
# Add a new column 'chunk_length' to the DataFrame to view the length of each chunk
df['chunk_length'] = df['chunk_text'].apply(len)

# Display the first few rows of the DataFrame with the new column
print(df[['file_name', 'chunk_id', 'chunk_length']].head(5))


      file_name  chunk_id  chunk_length
0  10089434.pdf         0          3035
1  10089434.pdf         1          2959
2  10089434.pdf         2          2048
3  10247517.pdf         0          3191
4  10247517.pdf         1          2744


### Tokenization
To understand how text ultimately is tokenized, it can be helpful to run the below code: 

- We use the tiktoken library to tokenize the text. Tokenization breaks the text into smaller units, which can be words, subwords, or characters, depending on the tokenizer used. You can see that in some cases an entire word is represented by a single token, whereas in others, parts of a word are split across multiple tokens.

- If you then check the length of the `decode` variable, you'll find it matches 500, our specified token count. Chunking this way is meant to keep each piece we pass to the model for embedding under its input token limit.

- When we pass the documents to the embeddings model, it will break the documents into tokens similar (though not necessarily identical) to the examples below and then convert the tokens to a series of floating point numbers that will be accessible via vector search.
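
To make the relationship between tokens and characters concrete, the short sketch below joins the first few byte pieces from the tokenizer output shown in this section back into text. It is only an illustration: the `pieces` list is copied from that output, and the variable names are introduced here.

```python
# First nine byte pieces from the decode_tokens_bytes output in this section.
pieces = [b"IN", b"FORMATION", b" TECHNO", b"LOGY", b" TECH",
          b"NIC", b"IAN", b" I", b" Summary"]

# Concatenating the pieces reproduces the original text exactly.
text = b"".join(pieces).decode("utf-8")
print(text)

# Token count and character count measure different things.
print(f"tokens: {len(pieces)}, characters: {len(text)}")
```

Here a single word such as "TECHNICIAN" spans three tokens, while " Summary" (with its leading space) is one token, which is why chunk sizes measured in characters and in tokens differ.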

In [26]:
import tiktoken
tokenizer = tiktoken.get_encoding("cl100k_base")
sample_encode = tokenizer.encode(df.chunk_text[0]) 
decode = tokenizer.decode_tokens_bytes(sample_encode)
decode


[b'IN',
 b'FORMATION',
 b' TECHNO',
 b'LOGY',
 b' TECH',
 b'NIC',
 b'IAN',
 b' I',
 b' Summary',
 b' Vers',
 b'atile',
 b' Systems',
 b' Administrator',
 b' possessing',
 b' superior',
 b' troubleshooting',
 b' skills',
 b' for',
 b' networking',
 b' issues',
 b' end',
 b' user',
 b' problems',
 b' and',
 b' network',
 b' security',
 b' Experienced',
 b' in',
 b' server',
 b' management',
 b' systems',
 b' analysis',
 b' and',
 b' offering',
 b' inde',
 b'pth',
 b' understanding',
 b' of',
 b' IT',
 b' infrastructure',
 b' areas',
 b' Detail',
 b'oriented',
 b' independent',
 b' and',
 b' focused',
 b' on',
 b' taking',
 b' a',
 b' systematic',
 b' approach',
 b' to',
 b' solving',
 b' complex',
 b' problems',
 b' Demonstr',
 b'ated',
 b' exceptional',
 b' technical',
 b' knowledge',
 b' and',
 b' skills',
 b' while',
 b' working',
 b' with',
 b' various',
 b' teams',
 b' to',
 b' achieve',
 b' shared',
 b' goals',
 b' and',
 b' objectives',
 b' Highlights',
 b' ',
 b' Active',
 b' Dir

In [11]:
len(decode)

500

# **PART 2 : Generating Embeddings for Text Chunks using NVIDIA AI**

- After extracting and chunking the text from PDF resumes, we will generate embeddings for each chunk. These embeddings are numerical representations of the text that capture its semantic meaning. By creating embeddings for the text chunks, we can perform advanced similarity searches and enhance language model generation.

- We will use the NVIDIA AI Models to generate these embeddings. The `get_embedding` function defined below takes a piece of text as input and returns its embedding using the `nvidia/embed-qa-4` model.

- Ensure the environment variables are set correctly in the `.env` file.
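
Because the chunks are sized with a different tokenizer than the embedding model's, a few may be rejected as too long. One possible way to handle that, sketched below under stated assumptions, is to split a rejected chunk in half and retry each half. `embed_with_fallback` and `fake_embed` are hypothetical names; `fake_embed` is a stand-in that raises `ValueError` for long inputs, whereas a real client would raise its own exception type, which the `except` clause would need to match.

```python
def embed_with_fallback(text, embed_fn, min_words=1):
    """Embed text; if the embedder rejects it, split it in half by words and retry.

    Returns a list of (piece_text, embedding) pairs.
    """
    try:
        return [(text, embed_fn(text))]
    except ValueError:
        words = text.split()
        if len(words) <= min_words:
            raise  # Cannot split any further.
        mid = len(words) // 2
        left = " ".join(words[:mid])
        right = " ".join(words[mid:])
        return (embed_with_fallback(left, embed_fn, min_words)
                + embed_with_fallback(right, embed_fn, min_words))


def fake_embed(text, limit=4):
    """Hypothetical stand-in for an embedding call that rejects long inputs."""
    if len(text.split()) > limit:
        raise ValueError("input too long")
    return [float(len(text))]


pieces = embed_with_fallback("one two three four five six seven", fake_embed)
print([piece_text for piece_text, _ in pieces])
```

Splitting changes how many rows a resume produces, so chunk ids would need to be assigned after this step rather than before it.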

In [27]:
df.head(3)

| | file_name | chunk_id | chunk_text | unique_chunk_id | chunk_length |
|---|---|---|---|---|---|
| 0 | 10089434.pdf | 0 | INFORMATION TECHNOLOGY TECHNICIAN I Summary Ve... | 0_0 | 3035 |
| 1 | 10089434.pdf | 1 | Disaster Recovery plan and procedures Researc... | 0_1 | 2959 |
| 2 | 10089434.pdf | 2 | Installing configuring and supporting McAfee a... | 0_2 | 2048 |


In [28]:
print(df)

        file_name  chunk_id  \
0    10089434.pdf         0   
1    10089434.pdf         1   
2    10089434.pdf         2   
3    10247517.pdf         0   
4    10247517.pdf         1   
..            ...       ...   
309  91635250.pdf         2   
310  91697974.pdf         0   
311  91697974.pdf         1   
312  92069209.pdf         0   
313  92069209.pdf         1   

                                            chunk_text unique_chunk_id  \
0    INFORMATION TECHNOLOGY TECHNICIAN I Summary Ve...             0_0   
1    Disaster Recovery plan and procedures  Researc...             0_1   
2    Installing configuring and supporting McAfee a...             0_2   
3    INFORMATION TECHNOLOGY MANAGER Professional Su...             1_0   
4    network which entailed changing software and L...             1_1   
..                                                 ...             ...   
309  inmates provided receipts and inspected items ...           117_2   
310  INFORMATION TECHNOLOGY COORDIN

In [29]:
import os
import pandas as pd
import json
from openai import OpenAI

# Set NVIDIA API credentials
nvidia_api_key = os.getenv("NVIDIA_API_KEY")  # Ensure this is set
nvidia_base_url = "https://integrate.api.nvidia.com/v1"

# Initialize OpenAI client for NVIDIA API
client = OpenAI(
    api_key=nvidia_api_key,
    base_url=nvidia_base_url
)

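def get_embedding(text, input_type="query"):
    """
    Get a sentence embedding using NVIDIA's embed-qa-4 model.

    Args:
        text (str): Text to embed.
        input_type (str): Value sent as the API's `input_type` field.
            Defaults to "query", which this notebook uses for both
            resume chunks and search queries.

    Returns:
        list | None: The embedding as a list of floats, or None if the request failed.
    """
    try:
        response = client.embeddings.create(
            input=[text],
            model="nvidia/embed-qa-4",
            encoding_format="float",
            # With "truncate": "NONE" the API rejects over-long inputs with a
            # 400 error (see the output below) instead of truncating them.
            extra_body={"input_type": input_type, "truncate": "NONE"}
        )
        return response.data[0].embedding
    except Exception as e:
        print(f"Error generating embedding: {e}")
        return None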

# Embed every chunk; chunks whose request fails are skipped
all_filenames = []
all_chunkids = []
all_chunks = []
all_embeddings = []

# Assuming df is already defined with the required columns
for index, row in df.iterrows():
    filename = row['file_name']
    chunkid = row['unique_chunk_id']
    chunk = row['chunk_text']
    
    embedding = get_embedding(chunk)
    
    if embedding is not None:
        all_filenames.append(filename)
        all_chunkids.append(chunkid)
        all_chunks.append(chunk)
        all_embeddings.append(embedding)
    
    if (index + 1) % 50 == 0:  # Print progress every 50 rows
        print(f"Completed {index + 1} rows")

# Create a new DataFrame with the results
result_df = pd.DataFrame({
    'filename': all_filenames,
    'chunkid': all_chunkids,
    'chunk': all_chunks,
    'embedding': all_embeddings
})

print(result_df.head())  # Display the first few rows

# Save to CSV if needed
result_df.to_csv("embedded_results_nvidia.csv", index=False)


Completed 50 rows
Error generating embedding: Error code: 400 - {'type': 'urn:inference-service:problem-details:bad-request', 'title': 'Bad Request', 'status': 400, 'detail': 'Input length 519 exceeds maximum allowed token size 512'}
Error generating embedding: Error code: 400 - {'type': 'urn:inference-service:problem-details:bad-request', 'title': 'Bad Request', 'status': 400, 'detail': 'Input length 523 exceeds maximum allowed token size 512'}
Completed 100 rows
Error generating embedding: Error code: 400 - {'type': 'urn:inference-service:problem-details:bad-request', 'title': 'Bad Request', 'status': 400, 'detail': 'Input length 518 exceeds maximum allowed token size 512'}
Completed 150 rows
Error generating embedding: Error code: 400 - {'type': 'urn:inference-service:problem-details:bad-request', 'title': 'Bad Request', 'status': 400, 'detail': 'Input length 520 exceeds maximum allowed token size 512'}
Completed 200 rows
Error generating embedding: Error code: 400 - {'type': 'urn:i

# **PART 3 : Using Azure SQL DB as a Vector Database to store and query embeddings**

### **Load the embeddings into the Vector Database : Azure SQL DB**

First, let us define a function to connect to Azure SQL DB.

In [30]:
#lets define a function to connect to SQLDB
import os
from dotenv import load_dotenv
import pyodbc
import struct
from azure.identity import DefaultAzureCredential

# Load environment variables from .env file
load_dotenv()

def get_mssql_connection():
    # Retrieve the connection string from the environment variables
    entra_connection_string = os.getenv('ENTRA_CONNECTION_STRING')

    sql_connection_string = os.getenv('SQL_CONNECTION_STRING')

    # Determine the authentication method and connect to the database
    if entra_connection_string:
        # Entra ID Service Principal Authentication
        credential = DefaultAzureCredential(exclude_interactive_browser_credential=False)    
        token = credential.get_token('https://database.windows.net/.default')
        token_bytes = token.token.encode('UTF-16LE')
        token_struct = struct.pack(f'<I{len(token_bytes)}s', len(token_bytes), token_bytes)
        SQL_COPT_SS_ACCESS_TOKEN = 1256  # This connection option is defined by Microsoft in msodbcsql.h
        conn = pyodbc.connect(entra_connection_string, attrs_before={SQL_COPT_SS_ACCESS_TOKEN: token_struct})
    
    elif sql_connection_string:
        # SQL Authentication
        conn = pyodbc.connect(sql_connection_string)
        
    else:
        raise ValueError("No valid connection string found in the environment variables.")

    return conn


### **Insert embeddings into the native 'Vector' Data Type**

We will now insert our vectors into the SQL table. Azure SQL DB has a dedicated, native data type for storing vectors: the `vector` data type. Read about the preview [here](https://devblogs.microsoft.com/azure-sql/eap-for-vector-support-refresh-introducing-vector-type).

The `resumedocs` table has a column called `embedding` of type `vector(1024)`, matching the `CAST(? AS VECTOR(1024))` used in the insert below. Ensure you have created the table using the script `CreateTable.sql` before running the code below.
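
The insert cell passes each embedding to SQL as a JSON array string, which the `CAST(? AS VECTOR(1024))` expression converts to the native type. The sketch below shows one way to validate an embedding before serializing it. `to_vector_literal` is a hypothetical helper, and the three-element vector is a toy example; the notebook's real embeddings would use `expected_dim=1024`.

```python
import json
import math


def to_vector_literal(embedding, expected_dim=1024):
    """Serialize an embedding to the JSON array string passed to CAST(? AS VECTOR(n))."""
    if len(embedding) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(embedding)}")
    if not all(math.isfinite(value) for value in embedding):
        raise ValueError("embedding contains NaN or infinite values")
    return json.dumps([float(value) for value in embedding])


# Toy three-dimensional vector for demonstration only.
literal = to_vector_literal([0.25, -1.0, 3.0], expected_dim=3)
print(literal)  # [0.25, -1.0, 3.0]
```

Checking the dimension in Python surfaces a mismatch with a clear message before the row reaches the database.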

In [31]:
import json
import pyodbc
import pandas as pd

# Retrieve the connection string from the function get_mssql_connection()
conn = get_mssql_connection()

# Create a cursor object
cursor = conn.cursor()

# Enable fast_executemany
cursor.fast_executemany = True

# Loop through the DataFrame rows and insert them into the table
for index, row in result_df.iterrows():
    chunkid = row['chunkid']
    filename = row['filename']
    chunk = row['chunk']
    embedding = row['embedding']
    
    # Use placeholders for the parameters in the SQL query
    query = f"""
    INSERT INTO resumedocs (chunkid, filename, chunk, embedding)
    VALUES (?, ?, ?, CAST(? AS VECTOR(1024)))
    """
    # Execute the query with the parameters
    cursor.execute(query, chunkid, filename, chunk, json.dumps(embedding))

# Commit the changes
conn.commit()

# Print a success message
print("Data inserted successfully into the 'resumedocs' table.")

# Close the connection
conn.close()


Data inserted successfully into the 'resumedocs' table.


Let's take a look at the data in the `resumedocs` table:

In [32]:
from prettytable import PrettyTable

import pyodbc
import pandas as pd

# Load environment variables from .env file
load_dotenv()

# Open a connection using the get_mssql_connection() helper
conn = get_mssql_connection()

# Create a cursor object
cursor = conn.cursor()

# Select the first 10 rows, casting the vector column to text so it can be displayed
query = "SELECT TOP(10) filename, chunkid, chunk, CAST(embedding AS NVARCHAR(MAX)) as embedding FROM dbo.resumedocs ORDER BY Id"

# Execute the query
cursor.execute(query)
queryresults = cursor.fetchall()

# Get column names from cursor.description
column_names = [column[0] for column in cursor.description]

# Create a PrettyTable object
table = PrettyTable()

# Add column names to the table
table.field_names = column_names

# Set max width for each column to truncate data
table.max_width = 20

# Add rows to the table
for row in queryresults:
    # Truncate each value to 20 characters
    truncated_row = [str(value)[:20] for value in row]
    table.add_row(truncated_row)

# Print the table
print(table)

# Commit the changes
conn.commit()
# Close the connection
conn.close()


+--------------+---------+----------------------+----------------------+
|   filename   | chunkid |        chunk         |      embedding       |
+--------------+---------+----------------------+----------------------+
| 10089434.pdf |   0_0   | INFORMATION TECHNOLO | [8.8548660e-004,-4.9 |
| 10089434.pdf |   0_1   | Disaster Recovery pl | [7.2402954e-003,-4.9 |
| 10089434.pdf |   0_2   | Installing configuri | [1.4591217e-003,-6.4 |
| 10247517.pdf |   1_0   | INFORMATION TECHNOLO | [1.2557983e-002,-4.5 |
| 10247517.pdf |   1_1   | network which entail | [1.7562866e-002,-4.9 |
| 10247517.pdf |   1_2   | i4 City State 2015 M | [2.1152496e-003,-4.9 |
| 10265057.pdf |   2_0   | WORKING RF SYSTEMS E | [-6.2789917e-003,-4. |
| 10265057.pdf |   2_1   | surveys ElectricalVa | [6.9570541e-004,-2.8 |
| 10553553.pdf |   3_0   | INFORMATION TECHNOLO | [-1.4390945e-003,-4. |
| 10553553.pdf |   3_1   | XP Vista and Mac ope | [2.0690918e-002,-5.7 |
+--------------+---------+----------------------+--

### **Performing Vector Similarity Search in Azure SQL DB using the VECTOR\_DISTANCE built-in function**

Let's now query our `resumedocs` table to get the most similar candidates for the user's search query.

What we are doing: Given any user search query, we can obtain the vector representation of that text. We then use this vector to calculate the cosine distance against all the resume embeddings stored in the database. By selecting only the closest matches, we can identify the resumes most relevant to the user’s query. This helps in finding the most suitable candidates based on their resumes.
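
The ranking step can be pictured in plain Python. This is a minimal brute-force sketch with toy two-dimensional vectors and hypothetical file names, not the database's implementation; in the notebook the same ordering is produced by `ORDER BY` on the distance computed in SQL.

```python
import math


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vec, rows, k=5):
    """Rank (name, vector) rows by ascending cosine distance to query_vec."""
    scored = [(name, cosine_distance(query_vec, vec)) for name, vec in rows]
    return sorted(scored, key=lambda item: item[1])[:k]


# Toy stored embeddings; real ones have 1024 dimensions.
rows = [("a.pdf", [1.0, 0.0]), ("b.pdf", [0.0, 1.0]), ("c.pdf", [1.0, 1.0])]
best = top_k([1.0, 0.1], rows, k=2)
print([name for name, _ in best])  # ['a.pdf', 'c.pdf']
```

The query vector points almost along the first axis, so `a.pdf` is closest, `c.pdf` second, and `b.pdf` is cut off by `k=2`.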

The most common metric is cosine distance (1 minus the cosine similarity), which can be calculated quite easily in SQL with the help of the new distance functions.

```
VECTOR_DISTANCE('distance metric', V1, V2)

```

We can use **cosine**, **euclidean**, and **dot** as the distance metric today.
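
To see how the three metrics differ, the sketch below evaluates each on two vectors that point in the same direction but have different lengths. These are the textbook definitions written in plain Python with names introduced here; the exact sign and scaling conventions SQL applies, particularly for `dot`, should be confirmed in the `VECTOR_DISTANCE` documentation.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_dist(a, b):
    """1 minus the cosine of the angle between the vectors."""
    norms = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    return 1.0 - dot_product(a, b) / norms


v1 = [1.0, 0.0]
v2 = [2.0, 0.0]  # Same direction as v1, twice the magnitude.

print("cosine distance:", cosine_dist(v1, v2))
print("euclidean distance:", euclidean_distance(v1, v2))
print("dot product:", dot_product(v1, v2))
```

Cosine distance ignores magnitude and reports these vectors as identical in direction, while the euclidean distance and dot product both change with vector length.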

We will define the function `vector_search_sql`.

In [33]:
import os
import pyodbc
import json
from dotenv import load_dotenv

def vector_search_sql(query, num_results=5):
    # Load environment variables from .env file
    load_dotenv()

    # Use the get_mssql_connection function to get the connection string details
    conn = get_mssql_connection()

    # Create a cursor object
    cursor = conn.cursor()

    # Generate the query embedding for the user's search query
    user_query_embedding = get_embedding(query)
    
    # SQL query for similarity search using the function vector_distance to calculate cosine similarity
    sql_similarity_search = f"""
    SELECT TOP(?) filename, chunkid, chunk,
           1-vector_distance('cosine', CAST(? AS VECTOR(1024)), embedding) AS similarity_score,
           vector_distance('cosine', CAST(? AS VECTOR(1024)), embedding) AS distance_score
    FROM dbo.resumedocs
    ORDER BY distance_score 
    """

    cursor.execute(sql_similarity_search, num_results, json.dumps(user_query_embedding), json.dumps(user_query_embedding))
    results = cursor.fetchall()

    # Close the database connection
    conn.close()

    return results
    
#example usage
vector_search_sql("database administrator", num_results=3)

[('38753827.pdf', '95_3', 'Certified Administrator CCA Technical Skills', 0.6364656039761232, 0.3635343960238768),
 ('20237244.pdf', '43_1', 'Digital Media Collector', 0.6110346951661374, 0.3889653048338626),
 ('29051656.pdf', '81_1', 'and ACFS file systems  Migrated multiple standalone databases to RAC databases using Rconfig RMAN and OEM  Migrated single instance databases from NONASM to ASM file system to improve performance  Installed Oracle 10g 11g and 12c software and created multiple databases including Plugable and container databases in oracle 12c  Implemented TDE Data Redaction and database Auditing to improved data integrity and security  Applied PSU on standalone TEST environment using OPatch and same PSU Grid and RDBMS Patching on production cluster using OPatch Auto  Replicated data in real time using Oracle Golden Gate and Oracle streams  Analyzed interpreted and troubleshot Golden Gate related issues  Designed and implemented different backup strategies like Cold Hot ba

# **PART 4 : Use embeddings retrieved from an Azure SQL vector database to augment LLM generation**

Let's create a helper function to feed prompts into the [Completions model](https://build.nvidia.com/meta/llama-3_3-70b-instruct/) and create an interactive loop where you can pose questions to the model and receive information grounded in your data.

The function `generate_completion` is defined to help ground the llama-3.3-70b-instruct model with prompts and system instructions.  
Note that we pass the results of the `vector_search_sql` function we defined earlier to the model, and we define the system prompt.  
We are using the [NVIDIA llama-3_3-70b-instruct](https://build.nvidia.com/meta/llama-3_3-70b-instruct/) model here.

You can get more information on using NVIDIA AI models [here](https://build.nvidia.com/)
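
The grounding step amounts to turning the rows returned by the vector search into a context message placed between the system prompt and the user's question. The sketch below shows one way to assemble that message list. `build_messages` is a hypothetical helper, the sample row is modeled on the search output shown earlier with rounded scores, and serializing the context with `json.dumps` is a choice made here rather than what the notebook's cell does.

```python
import json


def build_messages(system_prompt, search_results, user_input):
    """Assemble chat messages: instructions, retrieved context, then the question.

    Each row is expected in the SELECT order used by the vector search:
    (filename, chunkid, chunk, similarity_score, distance_score).
    """
    context = [
        {
            "filename": row[0],
            "chunkid": row[1],
            "chunktext": row[2],
            "similarity_score": row[3],
        }
        for row in search_results
    ]
    return [
        {"role": "system", "content": system_prompt},
        {"role": "system", "content": json.dumps(context)},
        {"role": "user", "content": user_input},
    ]


# Sample row modeled on the search output above (scores rounded).
rows = [("38753827.pdf", "95_3", "Certified Administrator CCA Technical Skills", 0.636, 0.364)]
messages = build_messages("Answer only from search_results.", rows, "database administrator")
print(messages[1]["content"])
```

Keeping the retrieved context in its own message makes it easy to inspect exactly what the model was given for each question.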

In [34]:
import os
from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables from a .env file
load_dotenv()

# Use environment variables for the API key
chat_api_key = os.getenv("NVIDIA_CHAT_API_KEY") 
nvidia_endpoint = "https://integrate.api.nvidia.com/v1"
chat_model = "meta/llama-3.3-70b-instruct"  # NVIDIA-supported LLaMA model

# Initialize OpenAI client for NVIDIA API
client = OpenAI(
    base_url=nvidia_endpoint,
    api_key=chat_api_key
)

def generate_completion(search_results, user_input):
    system_prompt = '''
You are an intelligent & funny assistant who will exclusively answer based on the data provided in the `search_results`:
- Use the information from `search_results` to generate your top 3 responses. If the data is not a perfect match for the user's query, use your best judgment to provide helpful suggestions and include the following format:
  File: {filename}
  Chunk ID: {chunkid}
  Similarity Score: {similarity_score}
  Add a small snippet from the Relevant Text: {chunktext}
  Do not use the entire chunk.
- Avoid any other external data sources.
- Add a summary about why the candidate may be a good fit even if exact skills and the role being hired for are not matching. Ensure you call out which skills match the description and which ones are missing. If the candidate doesn't have prior experience for the hiring role, highlight areas to focus on during the interview.
- Add a Microsoft-related interesting fact about the technology that was searched.
'''

    messages = [{"role": "system", "content": system_prompt}]

    # Create an empty list to store the results
    result_list = []

    # Iterate through the search results and append relevant information to the list.
    # Each row follows the SELECT column order in vector_search_sql:
    # (filename, chunkid, chunk, similarity_score, distance_score)
    for result in search_results:
        filename = result[0]
        chunkid = result[1]
        chunktext = result[2]
        similarity_score = result[3]
        
        # Append the relevant information as a dictionary to the result_list
        result_list.append({
            "filename": filename,
            "chunkid": chunkid,
            "chunktext": chunktext,
            "similarity_score": similarity_score
        })

    # Convert the search results into a structured message
    messages.append({"role": "system", "content": f"{result_list}"})
    messages.append({"role": "user", "content": user_input})

    # Make a request to NVIDIA's LLaMA model
    completion = client.chat.completions.create(
        model=chat_model,
        messages=messages,
        temperature=0.2,
        top_p=0.7,
        max_tokens=1024,
        stream=True
    )

    # Print the response in a streaming manner
    response_text = ""
    for chunk in completion:
        if chunk.choices[0].delta.content is not None:
            print(chunk.choices[0].delta.content, end="")
            response_text += chunk.choices[0].delta.content

    return response_text


In [35]:
import json

print("🤖 Ask me anything about hiring roles and required skills! Type 'end' to exit.\n")

while True:
    user_input = input("You: ")
    if user_input.lower() == "end":
        print("Goodbye! 👋")
        break

    # Perform vector search to get relevant stored data
    search_results = vector_search_sql(user_input)

    # Generate AI response based on search results
    completion_results = generate_completion(search_results, user_input)

    # generate_completion streams the answer to stdout as it arrives and returns
    # the full text as a plain string (not JSON), so no parsing is needed here
    if not completion_results:
        print("\n⚠️ Error: Empty response from AI model.\n")
    else:
        print("\n")


🤖 Ask me anything about hiring roles and required skills! Type 'end' to exit.

Based on the provided search results, here are my top 3 responses:

1. **File: 70089206.pdf, Chunk ID: 110_0, Similarity Score: 0.6360205426579945**
The candidate has experience in software development, configuration management, and testing, with skills in languages like JavaScript, C, and Oracle PL/SQL. They also have experience with Serena Version Manager and Serena Business Manager.

2. **File: 10089434.pdf, Chunk ID: 0_2, Similarity Score: 0.6129052569918564**
This candidate has experience in installing, configuring, and supporting Microsoft Windows Server, Microsoft Office, and Microsoft Exchange. They also have skills in programming languages like C, Java, and Python, as well as experience with database administration and network administration.

3. **File: 10265057.pdf, Chunk ID: 2_0, Similarity Score: 0.6114908210512771**
The candidate has experience as a Working RF Systems Engineer, with skills in R
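
`generate_completion` returns the streamed answer as one plain string, built by concatenating the text delta of each chunk. The sketch below shows that accumulation on its own, using stand-in chunk objects so it runs without an API key. The helper names `collect_stream` and `fake_chunk` are hypothetical, and the guard for an empty `choices` list is a defensive assumption: some OpenAI-compatible endpoints can emit such chunks, although the notebook does not show whether the NVIDIA endpoint does.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Concatenate the text deltas of an OpenAI-style streaming response."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta is not None:  # role-only or final chunks have no text
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object with the same attribute path as a real chunk."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


stream = [
    fake_chunk("Top "),
    SimpleNamespace(choices=[]),
    fake_chunk(None),
    fake_chunk("3 candidates"),
]
text = collect_stream(stream)
```

Collecting the pieces in a list and joining once avoids repeated string concatenation on long responses.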

In [37]:
!pip install gradio


In [44]:
import os
import pyodbc
import json
import gradio as gr
from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables
load_dotenv()

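# Database connection setup: vector_search_sql below calls get_mssql_connection(),
# which is assumed to be defined in an earlier cell of this notebook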


# NVIDIA Chat API setup
chat_api_key = os.getenv("NVIDIA_CHAT_API_KEY") 
nvidia_endpoint = "https://integrate.api.nvidia.com/v1"
chat_model = "meta/llama-3.3-70b-instruct"
client = OpenAI(base_url=nvidia_endpoint, api_key=chat_api_key)


def get_embedding(text):
    """
    Get sentence embedding using NVIDIA's embed-qa-4 model.

    Args:
        text (str): Text to embed.

    Returns:
        list: A list containing the embedding.
    """
    try:
        response = client.embeddings.create(
            input=[text],
            model="nvidia/embed-qa-4",
            encoding_format="float",
            extra_body={"input_type": "query", "truncate": "NONE"}
        )
        return response.data[0].embedding
    except Exception as e:
        print(f"Error generating embedding: {e}")
        return None


# Function to perform vector search
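def vector_search_sql(query, num_results=5):
    user_query_embedding = get_embedding(query)
    if user_query_embedding is None:
        # get_embedding already logged the error; there is nothing to search with
        return []

    # The embedding is sent as a JSON array string and cast to VECTOR(1024) in SQL
    embedding_json = json.dumps(user_query_embedding)

    sql_similarity_search = """
    SELECT TOP(?) filename, chunkid, chunk,
           1-vector_distance('cosine', CAST(? AS VECTOR(1024)), embedding) AS similarity_score,
           vector_distance('cosine', CAST(? AS VECTOR(1024)), embedding) AS distance_score
    FROM dbo.resumedocs
    ORDER BY distance_score 
    """

    conn = get_mssql_connection()
    try:
        cursor = conn.cursor()
        # int() guards against a UI control passing the result count as a float
        cursor.execute(sql_similarity_search, int(num_results), embedding_json, embedding_json)
        results = cursor.fetchall()
    finally:
        conn.close()  # Always release the connection, even if the query fails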
    
    formatted_results = [
        {
            "Filename": row[0],
            "Chunk ID": row[1],
            "Chunk": row[2],
            "Similarity Score": round(row[3], 4),
            "Distance Score": round(row[4], 4)
        }
        for row in results
    ]
    return formatted_results

# Function to generate AI insights
def generate_completion(search_results, user_input):
    system_prompt = '''
You are an intelligent & funny assistant who will exclusively answer based on the data provided in the `search_results`:
- Use the information from `search_results` to generate your top 3 responses. If the data is not a perfect match for the user's query, use your best judgment to provide helpful suggestions and include the following format:
  File: {filename}
  Chunk ID: {chunkid}
  Similarity Score: {similarity_score}
  Add a small snippet from the Relevant Text: {chunktext}
  Do not use the entire chunk.
- Avoid any other external data sources.
- Add a summary of why the candidate may be a good fit even if their skills do not exactly match the role being hired for.
- Add a Microsoft-related interesting fact about the technology that was searched.
'''

    messages = [{"role": "system", "content": system_prompt}]
    messages.append({"role": "system", "content": json.dumps(search_results, indent=4)})
    messages.append({"role": "user", "content": user_input})

    completion = client.chat.completions.create(
        model=chat_model,
        messages=messages,
        temperature=0.2,
        top_p=0.7,
        max_tokens=1024,
        stream=True
    )

    response_text = ""
    for chunk in completion:
        if chunk.choices[0].delta.content is not None:
            response_text += chunk.choices[0].delta.content

    return response_text

# Gradio UI
def search_and_generate(query, num_results):
    results = vector_search_sql(query, num_results)
    ai_response = generate_completion(results, query)
    return json.dumps(results, indent=4), ai_response

iface = gr.Interface(
    fn=search_and_generate,
    inputs=[gr.Textbox(label="Search Query"), gr.Slider(minimum=1, maximum=10, step=1, label="Number of Results")],
    outputs=[gr.Textbox(label="Search Results"), gr.Textbox(label="AI Insights")],
    title="Resume Search with AI Insights",
    description="Enter a search query to find the most relevant resume chunks and receive AI-powered insights."
)

if __name__ == "__main__":
    iface.launch()


* Running on local URL:  http://127.0.0.1:7866

To create a public link, set `share=True` in `launch()`.
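
The query in `vector_search_sql` reports `1 - vector_distance('cosine', ...)` as the similarity score and sorts by the distance itself, so the closest chunks come first. As a rough illustration of what those two numbers mean, the sketch below recomputes cosine distance in plain Python on toy two-dimensional vectors and shows the JSON string form in which the embedding is passed as a query parameter. The function `cosine_distance` and the sample vectors are illustrative only; in the tutorial the computation happens inside Azure SQL DB on 1024-dimensional embeddings.

```python
import json
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query_vec = [1.0, 0.0]
same_direction = [2.0, 0.0]  # same direction, different length
orthogonal = [0.0, 3.0]      # unrelated direction

# Mirrors the SQL expression: similarity_score = 1 - distance_score
similarity_same = 1 - cosine_distance(query_vec, same_direction)
similarity_orth = 1 - cosine_distance(query_vec, orthogonal)

# The embedding travels to SQL as a JSON array string before the CAST
param = json.dumps(query_vec)
```

Because cosine distance ignores vector length, `same_direction` scores a similarity of 1 despite being twice as long as the query, while `orthogonal` scores 0.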
