# Leveraging LLMs in Code: An Interactive Exploration

Welcome to our Lunch & Learn session! Today, we'll explore how to harness the power of Large Language Models (LLMs) within our code. We'll cover:

- **Basics:** Understanding LLMs and effective prompt engineering.
- **API Calling:** How to interact with LLMs via API requests.
- **Retrieval-Augmented Generation (RAG):** Combining external data with LLM responses.
- **Function Calling:** Integrating LLMs with your code logic through function calls.
- **Miscellaneous:** Best practices, debugging, security, and real-world applications.

Let's dive in and explore the possibilities together!


# 1. Basics

In this section, we'll cover the foundational concepts necessary for working with LLMs.

## What are LLMs?
- **Definition:** Large Language Models are neural networks trained on large text corpora to predict the next token, which lets them interpret and generate human-like text.
- **Example applications:** Chatbots, code assistants, and content generators.
- **Key Points:** 
  - Trained on massive datasets
  - Use deep learning techniques
  - Adapt to various contexts and tasks

## Prompt Engineering
- **Importance:** The quality of your input prompt greatly influences the output.
- **Tips:**
  - Be clear and specific.
  - Experiment with different phrasings.
  - Use context to guide the model.
- **Example:**
  - Poor prompt: "Tell me about Python."
  - Better prompt: "Can you provide a brief overview of Python programming, including its key features and use cases?"
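The tips above can be folded into a small helper that assembles a prompt from explicit parts. This is only a minimal sketch: the function name `build_prompt` and its parameters are hypothetical and not part of any library.

```python
from typing import Optional, Sequence


def build_prompt(task: str, context: str = "",
                 constraints: Optional[Sequence[str]] = None) -> str:
    """Assemble a prompt from a task, optional context, and optional constraints."""
    parts = []
    if context:
        # Context goes first so the model reads it before the task
        parts.append(f"Context: {context}")
    parts.append(f"Task: {task}")
    if constraints:
        bullet_list = "\n".join(f"- {c}" for c in constraints)
        parts.append(f"Constraints:\n{bullet_list}")
    return "\n\n".join(parts)


example_prompt = build_prompt(
    task="Give a brief overview of Python programming.",
    context="The reader is a developer who is new to Python.",
    constraints=[
        "Cover key features and common use cases.",
        "Keep it under 150 words.",
    ],
)
print(example_prompt)
```

Keeping the task, context, and constraints as separate arguments makes it easier to experiment with one part of the phrasing at a time.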


# 3. Retrieval-Augmented Generation (RAG)

Enhance LLM outputs by integrating external data retrieval.

## What is RAG?
- **Concept:** Combine LLM capabilities with a retrieval system (like a vector database or document store).
- **Workflow:**
  1. Retrieve relevant data based on a query.
  2. Feed retrieved context into the LLM.
  3. Generate a more informed and accurate response.
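The three workflow steps can be tried without any API keys. The sketch below is a toy stand-in, not the approach used later in this notebook: it ranks documents by word-overlap cosine similarity instead of learned embeddings, and the names `retrieve`, `build_augmented_prompt`, and the sample documents are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, docs, k=2):
    """Step 1: return the k documents most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(docs, key=lambda d: cosine_similarity(q, bag_of_words(d)),
                    reverse=True)
    return ranked[:k]


def build_augmented_prompt(query, retrieved):
    """Step 2: place the retrieved context ahead of the question."""
    context = "\n\n".join(retrieved)
    return (
        "Use the context to answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer:"
    )


# Hypothetical sample documents
toy_docs = [
    "The office is closed on public holidays.",
    "Expense reports are due on the last Friday of each month.",
    "Parking permits are renewed every January.",
]
question = "When are expense reports due?"
top_docs = retrieve(question, toy_docs, k=1)
rag_prompt = build_augmented_prompt(question, top_docs)
print(rag_prompt)  # Step 3 would send this prompt to the LLM
```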

## Example Use Case
- **Scenario:** Answering customer queries by pulling information from internal documents.
- **Benefits:** 
  - More accurate responses.
  - Context-aware generation.

## Example Code (Python)

In [34]:
# Import libraries
import os
import json
import requests
import numpy as np
import faiss
from openai import OpenAI
from dotenv import load_dotenv

# Load API keys from .env
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
MERGE_API_KEY = os.getenv("MERGE_API_KEY")
MERGE_ACCOUNT_TOKEN = os.getenv("MERGE_ACCOUNT_TOKEN")
# Create the client, passing the key explicitly (no module-level `openai` name is imported above)
client = OpenAI(api_key=OPENAI_API_KEY)

print("Environment loaded. OpenAI and Merge API keys are set. OpenAI client created.")


Environment loaded. OpenAI and Merge API keys are set. OpenAI client created.


In [35]:
def fetch_hr_data():
    # Replace the URL with the actual endpoint from Merge that returns HR data
    url = "https://api.merge.dev/api/hris/v1/dependents"  
    headers = {"Authorization": f"Bearer {MERGE_API_KEY}", "X-Account-Token": MERGE_ACCOUNT_TOKEN}
    response = requests.get(url, headers=headers, timeout=30)  # Time out rather than hang on a stalled connection
    response.raise_for_status()  # Raises an error if the API call fails
    data = response.json()
    
    # Assume the response holds a list of dependent records under the "results" key
    documents = []
    for item in data.get("results", []):
        # Build a simple text snippet for each dependent record
        doc = (
            f"Name: {item.get('first_name', '')} {item.get('last_name', '')}\n"
            f"Relationship: {item.get('relationship', '')}\n"
            f"Gender: {item.get('gender', '')}"
        )
        documents.append(doc)
    return documents

# Fetch and inspect the documents
documents = fetch_hr_data()
print(f"Fetched {len(documents)} documents from Merge API.")
print("Sample document:\n", documents[0] if documents else "No data")


Fetched 20 documents from Merge API.
Sample document:
 Name: Samantha Harris
Relationship: SPOUSE
Gender: FEMALE


In [36]:
def compute_embeddings(docs):
    embeddings = []
    for doc in docs:
        res = client.embeddings.create(
            input=doc,
            model="text-embedding-3-small"
        )
        embedding = res.data[0].embedding
        embeddings.append(embedding)
    return np.array(embeddings).astype('float32')

# Compute embeddings for our HR documents
embeddings = compute_embeddings(documents)
print("Computed embeddings shape:", embeddings.shape)


Computed embeddings shape: (20, 1536)
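The loop above sends one request per document. The embeddings endpoint also accepts a list of inputs, so requests can be grouped into batches. The sketch below shows one possible batching helper; the names `embed_in_batches`, `embed_batch`, and `fake_embed_batch` are hypothetical, and the embedding call is passed in as a function so the example runs without network access.

```python
def embed_in_batches(docs, embed_batch, batch_size=100):
    """Embed docs in groups of batch_size, preserving input order."""
    vectors = []
    for start in range(0, len(docs), batch_size):
        batch = list(docs[start:start + batch_size])
        batch_vectors = embed_batch(batch)
        if len(batch_vectors) != len(batch):
            raise ValueError("embed_batch must return one vector per input")
        vectors.extend(batch_vectors)
    return vectors


def fake_embed_batch(texts):
    """Stand-in for a real embedding call: (character count, word count)."""
    return [[float(len(t)), float(len(t.split()))] for t in texts]


vectors = embed_in_batches(["a b", "c", "d e f"], fake_embed_batch, batch_size=2)
print(vectors)
```

With the real client, `embed_batch` could presumably wrap `client.embeddings.create(input=batch, model="text-embedding-3-small")` and return `[item.embedding for item in res.data]`.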


In [37]:
# Determine the dimensionality of our embeddings
dimension = embeddings.shape[1]
# Create a FAISS index (L2 distance based)
index = faiss.IndexFlatL2(dimension)
# Add our embeddings to the index
index.add(embeddings)
print("FAISS index built with", index.ntotal, "documents.")


FAISS index built with 20 documents.
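To make the search step less opaque, here is a pure-Python sketch of what a flat L2 index does conceptually: compare the query against every stored vector and keep the `k` closest. It is an illustration only, not FAISS's implementation; `brute_force_search` and the toy vectors are hypothetical. Like `IndexFlatL2`, it reports squared Euclidean distances.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(vectors, query, k):
    """Return (distances, indices) of the k vectors closest to query."""
    scored = sorted((squared_l2(v, query), i) for i, v in enumerate(vectors))
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


# Hypothetical 2-dimensional "embeddings"
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
toy_distances, toy_indices = brute_force_search(toy_vectors, [0.9, 0.1], k=2)
print(toy_indices, toy_distances)
```

Exhaustive comparison is fine for 20 documents; approximate index types become relevant only at much larger scales.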


In [47]:
def query_index(query, k=3):
    # Compute the query's embedding, reusing the client created earlier
    res = client.embeddings.create(
        input=query,
        model="text-embedding-3-small"
    )
    query_embedding = np.array(res.data[0].embedding, dtype='float32')
    # Perform the search in the FAISS index
    distances, indices = index.search(np.array([query_embedding]), k)
    return indices[0]

# Example query
query = "Who are the sons?"
top_indices = query_index(query, k=12)
print("Top matching document indices:", top_indices)

# Display the top matching documents
print("\nTop matching documents:")
for idx in top_indices:
    print("\n--- Document ---")
    print(documents[idx])


Top matching document indices: [17  9  5  8 18 10 16 12 15 14 19  7]

Top matching documents:

--- Document ---
Name: Benjamin Adams
Relationship: CHILD
Gender: MALE

--- Document ---
Name: Mason Vance
Relationship: CHILD
Gender: MALE

--- Document ---
Name: Brock Walsh
Relationship: CHILD
Gender: MALE

--- Document ---
Name: Mark Vance
Relationship: CHILD
Gender: MALE

--- Document ---
Name: Fred Abbott
Relationship: CHILD
Gender: MALE

--- Document ---
Name: Matthew Vance
Relationship: CHILD
Gender: MALE

--- Document ---
Name: Michael Adams
Relationship: SPOUSE
Gender: MALE

--- Document ---
Name: Jonathan Sterling
Relationship: SPOUSE
Gender: MALE

--- Document ---
Name: Sean Anderson
Relationship: SPOUSE
Gender: MALE

--- Document ---
Name: Susie Sterling
Relationship: CHILD
Gender: FEMALE

--- Document ---
Name: Kevin Abbott
Relationship: SPOUSE
Gender: MALE

--- Document ---
Name: Sierra Vance
Relationship: CHILD
Gender: FEMALE


In [48]:
# Retrieve the documents corresponding to the top indices
retrieved_docs = [documents[i] for i in top_indices]

# Construct the prompt
prompt = (
    "You are an HR assistant. Use the following HR documents to answer the question.\n\n"
    f"HR Documents:\n{chr(10).join(retrieved_docs)}\n\n"
    f"Question: {query}\n\nAnswer:"
)

print("Augmented prompt for the LLM:\n")
print(prompt)


Augmented prompt for the LLM:

You are an HR assistant. Use the following HR documents to answer the question.

HR Documents:
Name: Benjamin Adams
Relationship: CHILD
Gender: MALE
Name: Mason Vance
Relationship: CHILD
Gender: MALE
Name: Brock Walsh
Relationship: CHILD
Gender: MALE
Name: Mark Vance
Relationship: CHILD
Gender: MALE
Name: Fred Abbott
Relationship: CHILD
Gender: MALE
Name: Matthew Vance
Relationship: CHILD
Gender: MALE
Name: Michael Adams
Relationship: SPOUSE
Gender: MALE
Name: Jonathan Sterling
Relationship: SPOUSE
Gender: MALE
Name: Sean Anderson
Relationship: SPOUSE
Gender: MALE
Name: Susie Sterling
Relationship: CHILD
Gender: FEMALE
Name: Kevin Abbott
Relationship: SPOUSE
Gender: MALE
Name: Sierra Vance
Relationship: CHILD
Gender: FEMALE

Question: Who are the sons?

Answer:
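In the prompt above, the records are joined by single newlines, so one record runs straight into the next. One possible refinement, shown here as a sketch, is to label each record and separate records with a blank line. The helper name `format_documents` is hypothetical; the two sample records are taken from the output above.

```python
def format_documents(docs):
    """Number each document and separate documents with a blank line."""
    blocks = [f"[Document {i}]\n{doc}" for i, doc in enumerate(docs, start=1)]
    return "\n\n".join(blocks)


sample_docs = [
    "Name: Benjamin Adams\nRelationship: CHILD\nGender: MALE",
    "Name: Susie Sterling\nRelationship: CHILD\nGender: FEMALE",
]
formatted = format_documents(sample_docs)
print(formatted)
```

The result could be substituted for `chr(10).join(retrieved_docs)` when building the prompt.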


In [49]:
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "developer", "content": prompt}
    ],
)
answer = completion.choices[0].message.content.strip()
print("Generated Answer:\n", answer)


Generated Answer:
 The sons listed in the HR documents are Benjamin Adams, Mason Vance, Brock Walsh, Mark Vance, Fred Abbott, and Matthew Vance.


In [54]:
query = "Who are the kings of the company?"
top_indices = query_index(query, k=12)
retrieved_docs = [documents[i] for i in top_indices]
prompt = (
    "You are an HR assistant. Use the following HR documents to answer the question.\n\n"
    f"HR Documents:\n{chr(10).join(retrieved_docs)}\n\n"
    f"Question: {query}\n\nAnswer:"
)
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "developer", "content": prompt}
    ],
)
answer = completion.choices[0].message.content.strip()
print("Generated Answer:\n", answer)

Generated Answer:
 The term "kings of the company" typically refers to dominant or leading male figures within a company, but based on the HR documents provided, there is no explicit mention of any specific employee titles or roles that would categorize anyone as such. The documents simply list individuals in relation to others, generally as "SPOUSE" or "CHILD," and their genders. Therefore, there is not enough information in the provided documents to accurately determine who might be considered "kings of the company."
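The answer above relies on the model noticing that the retrieved documents are irrelevant. One possible complement, sketched below under assumptions, is to drop weak matches before building the prompt, using the distances that `index.search` returns and that `query_index` currently discards. The helper name `filter_by_distance`, the distance values, and the cutoff `max_distance` are all hypothetical; a real cutoff would need tuning on actual queries.

```python
def filter_by_distance(indices, distances, max_distance):
    """Keep only the indices whose distance is within the cutoff."""
    return [i for i, d in zip(indices, distances) if d <= max_distance]


# Illustrative values only, not measured from the index above
kept = filter_by_distance([17, 9, 5], [0.9, 1.1, 1.6], max_distance=1.2)
print(kept)

if not kept:
    # Nothing relevant was retrieved, so skip the LLM call entirely
    print("Not enough information in the HR documents to answer.")
```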


# 4. Function Calling

See how LLMs can translate natural-language requests into function calls that your code then executes.

## Overview
- **Definition:** The LLM interprets a natural-language input and returns the name of a function plus its arguments; your code then validates and executes the call.
- **Applications:**
  - Automating routine tasks.
  - Integrating user interfaces with backend logic.
  - Enhancing code assistants.

## How It Works
- **Step 1:** Parse the input using an LLM.
- **Step 2:** Identify which function to call based on the parsed input.
- **Step 3:** Validate parameters and execute the function.
- **Step 4:** Return the output.
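Steps 2 through 4 can be sketched with a small registry that maps function names to callables. This is a minimal illustration under assumptions: the LLM output is simulated as a JSON string with `name` and `arguments` keys, and the names `FUNCTIONS` and `dispatch` are hypothetical.

```python
import json


def schedule_meeting(time, date, participants):
    return f"Meeting scheduled on {date} at {time} with {participants}"


# Registry: function name -> (callable, set of required parameter names)
FUNCTIONS = {
    "schedule_meeting": (schedule_meeting, {"time", "date", "participants"}),
}


def dispatch(llm_output):
    """Look up, validate, and execute the call described by llm_output."""
    call = json.loads(llm_output)
    name = call.get("name")
    args = call.get("arguments", {})
    if name not in FUNCTIONS:
        raise ValueError(f"Unknown function: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    func, required = FUNCTIONS[name]
    missing = required - set(args)
    unexpected = set(args) - required
    if missing or unexpected:
        raise ValueError(
            f"Bad arguments: missing={sorted(missing)}, unexpected={sorted(unexpected)}"
        )
    return func(**args)


# Simulated LLM output (Step 1 would produce this from the user's request)
simulated_llm_output = json.dumps({
    "name": "schedule_meeting",
    "arguments": {"time": "3 PM", "date": "tomorrow", "participants": "the team"},
})
dispatch_result = dispatch(simulated_llm_output)
print(dispatch_result)
```

Validating before executing matters because the model's output is untrusted input: it may name a function that does not exist or omit a parameter.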

## Example Scenario
- **Task:** A user asks, "Can you schedule a meeting for me tomorrow at 3 PM?"
- **Process:**
  1. The LLM parses the input.
  2. It maps the request to a `schedule_meeting()` function.
  3. The application executes the function with the extracted parameters (time, date, and participants).
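For the LLM to map a request to `schedule_meeting()`, it needs a description of the function and its parameters. The sketch below writes such a description in the JSON Schema style used by the OpenAI Chat Completions `tools` parameter. The descriptions, the all-string parameter types, and the helper `missing_arguments` are assumptions made for illustration.

```python
import json

# Hypothetical tool description for schedule_meeting
schedule_meeting_tool = {
    "type": "function",
    "function": {
        "name": "schedule_meeting",
        "description": "Schedule a meeting at a given date and time.",
        "parameters": {
            "type": "object",
            "properties": {
                "time": {"type": "string", "description": "Start time, e.g. '3 PM'"},
                "date": {"type": "string", "description": "Meeting date, e.g. 'tomorrow'"},
                "participants": {"type": "string", "description": "Who should attend"},
            },
            "required": ["time", "date", "participants"],
        },
    },
}


def missing_arguments(tool, arguments):
    """List the required parameters that are absent from arguments."""
    required = tool["function"]["parameters"]["required"]
    return [name for name in required if name not in arguments]


print(json.dumps(schedule_meeting_tool, indent=2))
still_needed = missing_arguments(schedule_meeting_tool, {"time": "3 PM"})
print(still_needed)
```

A description like this would be passed to the model alongside the user's message, and the same `required` list can be reused to check the arguments the model returns.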

## Example Code Outline
```python
def schedule_meeting(time, date, participants):
    # Function logic here
    return f"Meeting scheduled on {date} at {time} with {participants}"

# The user's natural-language request
user_input = "Schedule a meeting for tomorrow at 3 PM with the team."
# Simulated LLM output: parameters extracted from the request
parsed_params = {
    "time": "3 PM",
    "date": "tomorrow",
    "participants": "the team"
}

result = schedule_meeting(**parsed_params)
print(result)