## LLM Insights - Guest Lecture by Dr. Alla Abdella
Welcome to this Jupyter Notebook demonstrating embeddings, retrieval with RAG, and usage of LLM-based pipelines with LangChain. We'll explore how to set up local LLMs, build a vector store, and create multi-step pipelines with memory and prompt templates.

# Embeddings & Retrieval: A Hands-On LangChain Demo

In [None]:
# Install Core LangChain and Related Libraries
!pip install -U langchain langchainhub langchain-nomic langchain_community langchain-groq tiktoken chromadb langgraph

# Install Sentence Embedding Models
!pip install sentence-transformers

# Install LLM Libraries
!pip install transformers gpt4all anthropic

# Install Additional Data Processing and Visualization Libraries
!pip install pandas scikit-learn matplotlib plotly

# Install Streamlit for Web Apps
!pip install streamlit

# Install the Tavily web search API client
!pip install tavily-python

# Upgrade Specific Tools and Libraries
!pip install --upgrade langchain
!pip install --upgrade --quiet langchain-text-splitters
!pip install llama-index





**Explanation of Installed Libraries:**  
- **langchain, langchainhub, langchain_community, langgraph**: Libraries for chaining together the steps of a language model pipeline, such as retrieving documents, running prompts, and storing conversation memory.
- **langchain-nomic, langchain-groq**: Integration packages that connect LangChain to Nomic embeddings and Groq-hosted models.
- **chromadb**: The vector database used later to store and search embeddings.
- **sentence-transformers**: Converts sentences into dense numeric vectors (embeddings) so we can measure how similar they are.
- **transformers, gpt4all, anthropic**: Different ways of loading or calling large language models, either locally or through an API.
- **pandas, scikit-learn, matplotlib, plotly**: Tools for data analysis and plotting.
- **streamlit**: A library for building web apps with little code.
- **tavily-python**: Python client for the Tavily web search API.
- **tiktoken**: OpenAI's tokenizer library, used here to count tokens when splitting text into chunks.
- **llama-index**: A framework for indexing and querying data with LLMs.


In [None]:
# Chains and memory
from langchain.chains import ConversationChain, LLMChain, SequentialChain, SimpleSequentialChain
from langchain.memory import ConversationBufferMemory

# Prompts, output parsing, and runnables
from langchain import hub
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Document loading, splitting, embedding, and storage
import bs4
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.embeddings import GPT4AllEmbeddings
from langchain_community.vectorstores import Chroma
from sentence_transformers import SentenceTransformer

# LLM provider
from langchain_groq import ChatGroq

import os
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = ""
os.environ["GROQ_API_KEY"] = ""
os.environ["LANGSMITH_API_KEY"] = ""
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["HF_TOKEN"] = ""

**What this code does:**
- **Imports** a variety of components from the LangChain ecosystem: chains, memory, prompts, vector stores, embeddings, etc.
- **ChatGroq** is the chat model class from `langchain_groq` that connects LangChain to models hosted on Groq's inference API.
- **Sets environment variables** for LangSmith tracing, Groq, and Hugging Face so the libraries can read credentials automatically. The keys are left blank here; fill in your own before running.

In [None]:
model_name = "sentence-transformers/all-MiniLM-L6-v2"

# Initialize the SentenceTransformer model.
model = SentenceTransformer(model_name)

**Explanation:**  
We set up a model named `all-MiniLM-L6-v2`, which is a well-known **Sentence Transformers** model. It is good for quickly generating embeddings (vectors) for short or medium-length sentences.

In [None]:
# Use the model to encode a simple string.
model.encode("Hello Students")

In [None]:
# Let's see the length of the vector.
len(model.encode("Hello Students"))

**What’s happening here?**  
- We first **encode** the phrase "Hello Students" into a numerical vector.
- Then we check the **length** of that vector.
- This length depends on the dimension of the embedding that the model outputs (384 for `all-MiniLM-L6-v2`).
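
Once text is encoded, vectors are usually compared with cosine similarity. The following is a minimal sketch in plain Python; the three short vectors are made-up stand-ins for real 384-dimensional embeddings, and `cosine_similarity` is an illustrative helper rather than part of Sentence Transformers.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (hypothetical values, not real model output).
hello_students = [0.9, 0.1, 0.0]
hi_class = [0.8, 0.2, 0.1]
biryani = [0.0, 0.2, 0.9]

sim_close = cosine_similarity(hello_students, hi_class)
sim_far = cosine_similarity(hello_students, biryani)
print(f"greeting vs greeting: {sim_close:.3f}")
print(f"greeting vs food:     {sim_far:.3f}")
```

With real embeddings the same comparison applies; only the vector length changes.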

In [None]:
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sentence_transformers import SentenceTransformer

# 1. Initialize a pre-trained model
# model_name = "sentence-transformers/all-MiniLM-L6-v2"
# model = SentenceTransformer(model_name)

# 2. Define sentences related to Urdu students and Pakistani culture
sentences = [
    # Greetings
    "Assalam-o-Alaikum! How are you today?",
    "Good morning! I hope you are doing well.",
    "Hi there! How has your day been?",

    # Technology
    "Learning Python is essential for students.",
    "Programming in Python is very interesting.",
    "Python is a useful language for students.",

    # Food
    "Biryani is a delicious dish from Pakistan.",
    "I love eating spicy biryani with friends.",
    "Pakistani biryani is the best comfort food.",
]

# 3. Compute embeddings
embeddings = model.encode(sentences)

# 4. Apply PCA to reduce to 2D for visualization
pca = PCA(n_components=2)
reduced_embeddings = pca.fit_transform(embeddings)

# 5. Plot the PCA results
plt.figure(figsize=(12, 8))
colors = ['red', 'green', 'blue']  # Unique color for each category
categories = ['Greetings', 'Technology', 'Food']  # Labels for the legend

# Define category boundaries
category_size = 3  # Number of sentences per category
num_categories = len(sentences) // category_size

for i in range(num_categories):
    start_idx = i * category_size
    end_idx = start_idx + category_size
    cluster = reduced_embeddings[start_idx:end_idx]
    label_group = sentences[start_idx:end_idx]
    color = colors[i % len(colors)]
    plt.scatter(cluster[:, 0], cluster[:, 1], color=color, label=categories[i])  # Use category label

    # Annotate each point
    for j, txt in enumerate(label_group):
        plt.annotate(txt, (cluster[j, 0], cluster[j, 1]), fontsize=9, alpha=0.7)

plt.title("PCA of Sentence Embeddings")
plt.xlabel("PCA Component 1")
plt.ylabel("PCA Component 2")
plt.legend(title="Categories")
plt.grid(True)
plt.tight_layout()
plt.show()

**Step-by-step**:
1. We have a list of sentences in different categories (Greetings, Technology, Food).
2. We turn them into embeddings using our model.
3. We then use **PCA** to reduce the dimension to 2D so we can plot them.
4. We color them by category to see if sentences in similar topics end up close to each other in the chart.

In [None]:
sentence = "I love math"

# The tokens are the individual words or subwords.
tokens = ["I", "love", "math"]

# This dictionary maps each token to its unique integer ID.
tokens2id = {token: i for i, token in enumerate(tokens)}

tokens2id


**Explanation:**
- This snippet demonstrates a simple example of how **tokenization** might work: splitting text into tokens, then assigning each token an ID.
- In real NLP systems, tokenization can be more complex (handling punctuation, unknown words, etc.).
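
To make the point about punctuation and unknown words concrete, here is a small sketch of a word-level tokenizer with a vocabulary and an unknown-token fallback. The `[UNK]` placeholder and the helper names are assumptions for illustration; production models typically use subword tokenizers instead.

```python
import re

UNK = "[UNK]"  # hypothetical placeholder for out-of-vocabulary tokens

def tokenize(text):
    """Lowercase the text and split it into words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def build_vocab(texts):
    """Assign each distinct token an integer ID; ID 0 is reserved for UNK."""
    vocab = {UNK: 0}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(text, vocab):
    """Map text to a list of token IDs, falling back to the UNK ID."""
    return [vocab.get(token, vocab[UNK]) for token in tokenize(text)]

vocab = build_vocab(["I love math.", "I love Python!"])
ids = encode("I love biryani.", vocab)
print(vocab)
print(ids)  # "biryani" is not in the vocabulary, so it maps to ID 0
```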

In [None]:
# #!pip install llama_index==0.10.18
# """
# This is a simple application for sentence embeddings: semantic search
#
# We have a corpus with various sentences. Then, for a given query sentence,
# we want to find the most similar sentence in this corpus.
#
# This script outputs for various queries the top 5 most similar sentences in the corpus.
# """

# import torch
# from sentence_transformers import SentenceTransformer
# import torch
# import numpy as np
# import matplotlib.pyplot as plt
# from mpl_toolkits.mplot3d import Axes3D  # Required for 3D plotting even if not referenced explicitly
# from sklearn.decomposition import PCA
# import plotly.graph_objects as go
# # from llama_index.embeddings.fastembed import FastEmbedEmbedding

# embedder = SentenceTransformer("all-MiniLM-L6-v2")
# #embedder = FastEmbedEmbedding(model_name="BAAI/bge-small-en-v1.5")

# corpus = [
#     "A man is eating food.",
#     "A man is eating a piece of bread.",
#     "The girl is carrying a baby.",
#     "A man is riding a horse.",
#     "A woman is playing violin.",
#     "Two men pushed carts through the woods.",
#     "A man is riding a white horse on an enclosed ground.",
#     "A monkey is playing drums.",
#     "A cheetah is running behind its prey.",
# ]
# # corpus_embeddings = embedder.encode(corpus, convert_to_tensor=True)
#
# # queries = [
# #     "A man is eating pasta.",
# # ]
#
# # top_k = min(2, len(corpus))
# # We'll store the query embeddings (as tensor) so that we can do PCA on all vectors together:
# query_embeddings_list = []
# results = {}  # To store top-2 indices for each query
#
# for query in queries:
#     query_embedding = embedder.encode(query, convert_to_tensor=True)
#     # similarity_scores = torch.matmul(query_embedding, corpus_embeddings.T)[0]
#     # scores, indices = torch.topk(similarity_scores, k=top_k)
#
#     print("\nQuery:", query)
#     print("Top 5 most similar sentences in corpus:")
#
#     for score, idx in zip(scores, indices):
#         print(corpus[idx], f"(Score: {score:.4f})")
#
#     query_embeddings_list.append(query_embedding[0].numpy())
#     results[query] = {
#         'top2_indices': indices[:2].numpy(),
#         'similarity_scores': scores[:2].numpy()
#     }
#
# # The rest of the code does 3D PCA plotting, omitted here.


**Explanation:**
This entire cell is commented-out code. It is a more extensive example of how to perform **semantic search** with embeddings:
- Creating embeddings for a **corpus** of sentences.
- Creating an embedding for a **query**.
- Using **cosine similarity** (or dot product) to find which sentences in the corpus are closest to the query.
- Retrieving the top `k` most similar sentences.
- Optionally applying dimensionality reduction to plot them in 2D/3D.
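
The retrieval steps listed above can be sketched without any model. This assumes hand-written 3-dimensional vectors in place of real embeddings; `normalize` and `top_k` are hypothetical helpers, not library functions.

```python
import heapq
import math

def normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def top_k(query_vec, corpus_vecs, k):
    """Return (score, index) pairs for the k corpus vectors most similar to the query."""
    q = normalize(query_vec)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), idx)
        for idx, vec in enumerate(corpus_vecs)
    )
    return heapq.nlargest(k, scored)

corpus = ["A man is eating food.", "A man is riding a horse.", "A monkey is playing drums."]
# Hypothetical 3-d embeddings standing in for embedder.encode(corpus).
corpus_vecs = [[0.9, 0.1, 0.1], [0.2, 0.9, 0.1], [0.1, 0.2, 0.9]]
query_vec = [0.8, 0.2, 0.1]  # stand-in for the embedding of "A man is eating pasta."

hits = top_k(query_vec, corpus_vecs, k=2)
for score, idx in hits:
    print(f"{corpus[idx]} (Score: {score:.4f})")
```

Normalizing first is a design choice: it lets a plain dot product serve as cosine similarity.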

In [None]:
!pip install colab-xterm
%load_ext colabxterm

**What is `colab-xterm`?**  
`colab-xterm` allows you to open a terminal-like interface directly in Google Colab. This can be handy for installing packages, exploring the file system, or running Linux commands in a more interactive shell.

In [None]:
%xterm
# curl -fsSL https://ollama.com/install.sh | sh
# ollama serve &
# ollama run llama3.2:3b

**Explanation**:
- The magic command `%xterm` will open an xterm in the Colab environment.
- The commented lines show how one might install and run `ollama`, a local LLM server, though they’re optional.

In [None]:
import requests
import os
import json

### Index

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import GPT4AllEmbeddings

urls = [
     "https://lilianweng.github.io/posts/2023-06-23-agent/",
     "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
     # You can add or remove URLs here.
]

# Load each URL with WebBaseLoader.
docs = [WebBaseLoader(url).load() for url in urls]

# Flatten the list of lists:
docs_list = [item for sublist in docs for item in sublist]

# Split the text into smaller chunks.
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=250,
    chunk_overlap=50
)
doc_splits = text_splitter.split_documents(docs_list)

# Build a Chroma vector store from the split documents.
vectorstore = Chroma.from_documents(
    documents=doc_splits,
    collection_name="rag-chroma1",
    embedding=GPT4AllEmbeddings(),
)

# Convert the vector store to a retriever that returns the top 5 chunks.
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Query and retrieve.
query = "What are the critical features of a gun"
documents = retriever.invoke(query)  # get relevant doc chunks

# Construct the context from retrieved documents.
context = "\n\n".join([f"Document {i+1}:\n{doc.page_content}" for i, doc in enumerate(documents)])

# RAG prompt.
prompt = f"""
You are an expert in Generative AI and autonomous agent systems. Below is the context retrieved from relevant documents. Use this context to provide a detailed and accurate answer to the user's query.
If you don't know the answer, say you don't know.

Context:
{context}

Query:
{query}

Answer:
"""

# Next, we call a local LLM API endpoint.
url = 'http://localhost:11434/api/generate'
payload = {
    "model": "llama3.2:3b",
    "prompt": prompt,
    # Ollama reads sampling parameters from the nested "options" object.
    "options": {"num_predict": 2000, "temperature": 0.0},
    "stream": True
}

response = requests.post(url, json=payload, stream=True)

if response.status_code == 200:
    print("Model Response:")
    assembled_response = ""
    for line in response.iter_lines(decode_unicode=True):
        if line.strip():
            try:
                data = json.loads(line)
                if "response" in data:
                    assembled_response += data["response"]
                    print(data["response"], end='', flush=True)
                if data.get("done", False):
                    break
            except json.JSONDecodeError as e:
                print(f"Error decoding JSON: {e}")
else:
    print(f"Error: {response.status_code} - {response.text}")

**Explanation**:
1. **Load** web documents from the provided URLs.
2. **Split** them up into smaller pieces with `RecursiveCharacterTextSplitter`.
3. **Embed** these chunks using `GPT4AllEmbeddings` and store them in a `Chroma` vector database.
4. Turn that database into a **retriever**.
5. Issue a **query**, retrieve the most relevant chunks.
6. Build a new prompt that includes those chunks as context.
7. **POST** that prompt to a local LLM endpoint for the final answer.
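
Step 2 relies on overlapping chunks so that text cut at a boundary still appears whole in one of its neighbours. The sketch below shows the idea with a fixed sliding window over words. It is a simplification under stated assumptions: `RecursiveCharacterTextSplitter` splits on separators and measures length in tokens, and `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
    return chunks

words = "agents plan tasks use tools and keep memory of past steps".split()
chunks = split_with_overlap(words, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(" ".join(chunk))
```

Each chunk starts with the last word of the previous one, which mirrors what `chunk_overlap=50` does at a larger scale.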

In [None]:
retriever.invoke(query)

**Explanation**:
We simply call `retriever.invoke(query)` again to see the raw retrieved documents or chunks. This typically returns a list of documents best matching your query.

In [None]:
# Alternatively, set this in the shell: export LANGCHAIN_TRACING_V2=true
import os

from langsmith import traceable
from langchain_groq import ChatGroq

# Tracing and API credentials (fill in your own keys).
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = ""
os.environ["GROQ_API_KEY"] = ""
os.environ["LANGSMITH_API_KEY"] = ""
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["HF_TOKEN"] = ""

### LLM
# Configure a ChatGroq instance.
llm = ChatGroq(
    model="llama-3.3-70b-versatile",  # alternative: "mixtral-8x7b-32768"
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)

chat = llm

@traceable
def get_messages():
    messages = [
        (
            "system",
            "You are a helpful AI assistant with deep expertise in AI, data engineering, and healthcare."
        ),
        (
            "human",
            """Please give me advice on how to start learning about the data science field.

            Please provide markdown that demonstrates:
              1. Use of syntax highlighting for different programming languages
              2. Colored code blocks
              3. Varied text styling (bold, italic, headers)
              4. Include at least one example of colored terminal/output text
            """
        ),
    ]
    return messages

@traceable(run_type="llm")
def invoke_llm(messages):
    return llm.invoke(messages)

@traceable
def parse_output(response):
    return response.content

@traceable
def run_pipeline():
    messages = get_messages()
    response = invoke_llm(messages)
    result = parse_output(response)
    return result

result = run_pipeline()

from IPython.display import display, Markdown
display(Markdown(result))

**Explanation**:
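- `@traceable` from `langsmith` records each decorated function call as a run in LangSmith, so the whole pipeline shows up as a nested trace.
- `get_messages()` builds the system and human messages.
- `invoke_llm(messages)` sends them to the Groq-hosted model; `run_type="llm"` marks the run as an LLM call in the trace.
- `parse_output(response)` extracts the text content from the model response.
- `run_pipeline()` calls the three steps in order and returns the final text.
- Finally, we render the result in the notebook with `display(Markdown(result))`.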
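
To give a feel for what a tracing decorator does, here is a rough stand-in built only on the standard library. It is not how LangSmith implements `@traceable`; `simple_trace`, `TRACE_LOG`, and `fake_llm` are hypothetical names, and the log lives in memory instead of being sent to a service.

```python
import functools
import time

TRACE_LOG = []  # hypothetical in-memory stand-in for a tracing backend

def simple_trace(func):
    """Record the name, duration, and output of every call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        TRACE_LOG.append({
            "name": func.__name__,
            "seconds": time.perf_counter() - start,
            "output": result,
        })
        return result
    return wrapper

@simple_trace
def get_messages():
    return [("system", "You are a helpful assistant."), ("human", "Hello")]

@simple_trace
def fake_llm(messages):
    # Placeholder for llm.invoke(messages); returns a canned reply.
    return f"Received {len(messages)} messages"

@simple_trace
def run_pipeline():
    return fake_llm(get_messages())

result = run_pipeline()
print(result)
print([entry["name"] for entry in TRACE_LOG])
```

Inner calls finish first, so they are logged before the outer `run_pipeline` entry.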

# Today's Agenda

1. **Concepts of embeddings and embedding layers**  
2. **Cost function(s) for training an embedding model**  
3. **Practical demonstration in Python using Sentence Transformers**  
4. **Sample training data in multiple formats**  
5. **Location of the embedding layer in BERT-based models**  
6. **Example code with multilingual sentences (English, Urdu, Arabic)**  
7. **PCA visualization to show similar sentences clustering**  
8. **Build a simple RAG (Retrieval-Augmented Generation)**  
9. **Build Habib Conversational Agent**  
10. **LangChain Chains**  
11. **LangChain Memory**

---

## Slide 1: Introduction to Embeddings

**Slide Content:**
- **Definition**: An embedding is a dense vector representation of text (or tokens) in a continuous vector space.
- **Key Idea**:
  - Each dimension in the vector space captures latent semantic or syntactic information.
- **Why embeddings?**
  - They allow models to capture semantic similarity, context, and relationships between words/sentences.

**Notes:**
- They are the foundation of modern NLP tasks (retrieval, classification, similarity, etc.).

---

## Slide 2: What is an Embedding Layer?

**Slide Content:**
- **Embedding Layer**: A neural network layer that maps each token to a trainable dense vector.
- **In BERT-like models**:
  - The embedding layer is part of the initial model component.
  - Input tokens are converted into embeddings, then fed into multiple attention layers.

**Notes:**
- In **BERT**, the input embedding for each token is the sum of three learned embeddings:
  1. **Token embeddings**  
  2. **Segment (token type) embeddings**  
  3. **Position embeddings**  
- The entire model (including the embedding layer) is typically trained end-to-end during fine-tuning.
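
An embedding layer is essentially a set of lookup tables. The sketch below sums a token, a position, and a segment (token type) vector per input position, as BERT does, using tiny randomly initialized tables. Table sizes and IDs are arbitrary assumptions, and the layer normalization and dropout that BERT applies afterwards are omitted.

```python
import random

random.seed(0)
HIDDEN = 4  # tiny hidden size for illustration; BERT-base uses 768

def make_table(rows, dim):
    """Create a rows x dim table of small random values (trainable in a real model)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]

token_table = make_table(10, HIDDEN)     # one row per vocabulary ID
position_table = make_table(8, HIDDEN)   # one row per position
segment_table = make_table(2, HIDDEN)    # sentence A / sentence B

def embed(token_ids, segment_ids):
    """Sum token, position, and segment vectors for each input position."""
    output = []
    for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids)):
        vec = [
            t + p + s
            for t, p, s in zip(token_table[tok], position_table[pos], segment_table[seg])
        ]
        output.append(vec)
    return output

vectors = embed(token_ids=[1, 5, 3], segment_ids=[0, 0, 0])
print(len(vectors), "positions x", len(vectors[0]), "dimensions")
```

Because the position vector is added in, the same token ID yields a different input vector at a different position.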

---

## Slide 3: Cost Functions for Training Embeddings

**Slide Content:**
- **Objective**: Ensure that semantically similar sentences/words have closer embeddings, while dissimilar ones are farther apart.
- Common **Loss Functions** in Sentence-Transformers:
  1. **Triplet Loss**
  2. **Contrastive Loss**
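  3. **Multiple Negatives Ranking Loss**

**Notes:**
- The choice depends on how your data is formatted.
- Triplet: (anchor, positive, negative).
- Contrastive loss uses pairs with a binary label; CosineSimilarityLoss uses pairs with a similarity score.
- Multiple Negatives Ranking Loss needs only (anchor, positive) pairs; the other positives in a batch act as negatives.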
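
As a worked example, triplet and contrastive loss can be computed by hand on 2-dimensional points. This uses one common formulation with Euclidean distance and a margin of 1.0; library implementations may differ in scaling and distance metric, and the function names are illustrative.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def contrastive_loss(a, b, label, margin=1.0):
    """label=1 pulls the pair together; label=0 pushes it apart up to `margin`."""
    d = euclidean(a, b)
    return label * d ** 2 + (1 - label) * max(0.0, margin - d) ** 2

anchor, positive, negative = [0.0, 0.0], [0.1, 0.0], [2.0, 0.0]
print(triplet_loss(anchor, positive, negative))     # already well separated
print(triplet_loss(anchor, negative, positive))     # roles swapped, so the loss is large
print(contrastive_loss(anchor, positive, label=1))  # similar pair, already close
print(contrastive_loss(anchor, positive, label=0))  # dissimilar pair that sits too close
```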

---

## Slide 4: Sample Training Data Formats

**Slide Content:**
1. **Triplet Format** \((anchor, positive, negative)\)  
2. **Contrastive Pairs**  
   \((sentence_1, sentence_2, label)\)  
3. **Sentence Pairs with a Score**  
   \((sentence_1, sentence_2, score)\)

**Notes:**
- Triplet Example: ("How to prepare for board exams", "Tips to study effectively", "Best recipes for biryani").
- Pair with Score Example: ("The cat sits on the mat", "A cat is lying on a rug", 0.8).
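
One way to hold these three formats in code is as small typed records. This is only a sketch: the class names are hypothetical, and the labeled pair is an illustrative row built from the triplet example above rather than data from the lecture.

```python
from typing import NamedTuple

class Triplet(NamedTuple):
    anchor: str
    positive: str
    negative: str

class LabeledPair(NamedTuple):
    sentence_1: str
    sentence_2: str
    label: int      # 1 = similar, 0 = dissimilar

class ScoredPair(NamedTuple):
    sentence_1: str
    sentence_2: str
    score: float    # graded similarity, e.g. in [0, 1]

training_data = [
    Triplet("How to prepare for board exams", "Tips to study effectively", "Best recipes for biryani"),
    LabeledPair("How to prepare for board exams", "Best recipes for biryani", 0),
    ScoredPair("The cat sits on the mat", "A cat is lying on a rug", 0.8),
]

for row in training_data:
    print(type(row).__name__, row._fields)
```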

---

## Slide 5: Where is the Embedding Layer in BERT?

**Slide Content:**
- **BERT Architecture** (simplified):
  1. **Input Embeddings** (Token + Segment + Position)
  2. **Transformer Layers**
  3. **Pooler**
- Embedding layer is at the very beginning.

**Notes:**
- Typically, we do **fine-tuning** on the entire BERT model.
- **Sentence-Transformers** adds a pooling layer on top to create fixed-size sentence embeddings.
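
Mean pooling is one common choice for that pooling layer: average the per-token output vectors, skipping padding. A minimal sketch with made-up 2-dimensional token vectors follows; `mean_pool` is an illustrative helper, not the library's implementation.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask is 1, ignoring padding positions."""
    dim = len(token_vectors[0])
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

# Hypothetical per-token outputs from the last transformer layer (3 real tokens + 1 pad).
token_vectors = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
    [9.0, 9.0],  # padding position
]
attention_mask = [1, 1, 1, 0]

sentence_embedding = mean_pool(token_vectors, attention_mask)
print(sentence_embedding)
```

However many tokens a sentence has, the output has the same size as a single token vector, which is what makes sentence embeddings directly comparable.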

---

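## Slide 6: Detailed Python Example 1

**Code Explanation**  
1. Load a pre-trained embedding model (here `BAAI/bge-small-en-v1.5` via FastEmbed).  
2. Embed a small corpus of English sentences and one query.
3. Score the query against the corpus and keep the top 2 matches.
4. Reduce all embeddings to 3D via PCA and plot them, with lines from the query to its closest matches.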

```python
!pip install sentence-transformers scikit-learn matplotlib llama-index-embeddings-fastembed --quiet

"""
Simple application for sentence embeddings: semantic search.

We have a corpus of sentences. For a given query sentence, we find the
most similar corpus sentences and print the top_k matches.
"""

import numpy as np
import torch
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib
from sklearn.decomposition import PCA
from sentence_transformers import SentenceTransformer  # only needed for the commented-out embedder
from llama_index.embeddings.fastembed import FastEmbedEmbedding

# embedder = SentenceTransformer("all-MiniLM-L6-v2")
embedder = FastEmbedEmbedding(model_name="BAAI/bge-small-en-v1.5")

corpus = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
    "A woman is playing violin.",
    "Two men pushed carts through the woods.",
    "A man is riding a white horse on an enclosed ground.",
    "A monkey is playing drums.",
    "A cheetah is running behind its prey.",
]
corpus_embeddings = embedder.get_text_embedding_batch(corpus, show_progress=True)

if isinstance(corpus_embeddings, list):
    corpus_embeddings = torch.tensor(corpus_embeddings)

queries = [
    "A man is eating pasta.",
]

top_k = min(2, len(corpus))
query_embeddings_list = []
results = {}

for query in queries:
    query_embedding = embedder.get_text_embedding_batch([query], show_progress=True)
    if isinstance(query_embedding, list):
        query_embedding = torch.tensor(query_embedding)

    similarity_scores = torch.matmul(query_embedding, corpus_embeddings.T)[0]

    scores, indices = torch.topk(similarity_scores, k=top_k)

    print("\nQuery:", query)
    print(f"Top {top_k} most similar sentences in corpus:")

    for score, idx in zip(scores, indices):
        print(corpus[idx], f"(Score: {score:.4f})")

    query_embeddings_list.append(query_embedding[0].numpy())
    results[query] = {
        'top2_indices': indices[:2].numpy(),
        'similarity_scores': scores[:2].numpy()
    }

corpus_np = corpus_embeddings.cpu().detach().numpy()
all_embeddings = np.vstack([corpus_np, np.array(query_embeddings_list)])

pca = PCA(n_components=3)
all_embeddings_3d = pca.fit_transform(all_embeddings)

num_corpus = corpus_np.shape[0]
corpus_3d = all_embeddings_3d[:num_corpus]
queries_3d = all_embeddings_3d[num_corpus:]

fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(corpus_3d[:, 0], corpus_3d[:, 1], corpus_3d[:, 2], c='blue', marker='o', s=80, label='Corpus')

for i, txt in enumerate(corpus):
    ax.text(corpus_3d[i, 0], corpus_3d[i, 1], corpus_3d[i, 2], txt, size=9, zorder=1, color='k')

ax.scatter(queries_3d[:, 0], queries_3d[:, 1], queries_3d[:, 2], c='red', marker='^', s=120, label='Query')

for i, query in enumerate(queries):
    q_point = queries_3d[i]
    top2_indices = results[query]['top2_indices']
    for idx in top2_indices:
        c_point = corpus_3d[idx]
        ax.plot([q_point[0], c_point[0]], [q_point[1], c_point[1]], [q_point[2], c_point[2]], 'g--', linewidth=1.5)

ax.set_xlabel("PCA Component 1")
ax.set_ylabel("PCA Component 2")
ax.set_zlabel("PCA Component 3")
ax.set_title("3D Visualization of Sentence Embeddings\nLines connect each query to its 2 most similar corpus sentences")
ax.legend()
plt.show()
```

---
## Slide 7: Detailed Python Example 2

```python
# Install necessary libraries
#!pip install sentence-transformers scikit-learn matplotlib plotly --quiet

import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sentence_transformers import SentenceTransformer

# 1. Initialize a pre-trained model
model_name = "sentence-transformers/all-MiniLM-L6-v2"
model = SentenceTransformer(model_name)

# 2. Define sentences related to Urdu students and Pakistani culture
sentences = [
    # Greetings
    "Assalam-o-Alaikum! How are you today?",
    "Good morning! I hope you are doing well.",
    "Hi there! How has your day been?",

    # Technology
    "Learning Python is essential for students.",
    "Programming in Python is very interesting.",
    "Python is a useful language for students.",

    # Food
    "Biryani is a delicious dish from Pakistan.",
    "I love eating spicy biryani with friends.",
    "Pakistani biryani is the best comfort food.",
]

# 3. Compute embeddings
embeddings = model.encode(sentences)

# 4. Apply PCA to reduce to 2D for visualization
pca = PCA(n_components=2)
reduced_embeddings = pca.fit_transform(embeddings)

# 5. Plot the PCA results
plt.figure(figsize=(12, 8))
colors = ['red', 'green', 'blue']  # Unique color for each category
categories = ['Greetings', 'Technology', 'Food']  # Must match the order of the sentence groups

category_size = 3  # Number of sentences per category
num_categories = len(sentences) // category_size

for i in range(num_categories):
    start_idx = i * category_size
    end_idx = start_idx + category_size
    cluster = reduced_embeddings[start_idx:end_idx]
    label_group = sentences[start_idx:end_idx]
    color = colors[i % len(colors)]
    plt.scatter(cluster[:, 0], cluster[:, 1], color=color, label=categories[i])
    for j, txt in enumerate(label_group):
        plt.annotate(txt, (cluster[j, 0], cluster[j, 1]), fontsize=9, alpha=0.7)

plt.title("PCA of Sentence Embeddings: Urdu Students and Pakistani Culture")
plt.xlabel("PCA Component 1")
plt.ylabel("PCA Component 2")
plt.legend(title="Categories")
plt.grid(True)
plt.tight_layout()
plt.show()
```

---

## Slide 8: What is a Vector Store & RAG?

**Indexing & Vector Stores**
- After embeddings are created, we can store them in a **vector store**.
- A **vector store** (e.g., Chroma) allows efficient similarity searches.

**RAG (Retrieval-Augmented Generation)**
- Combine a language model with external knowledge.
- Retrieve relevant documents, feed them to the LLM as context.
- The LLM then generates a final answer with that context.

---

## Slide 9: Build a Simple RAG

```python
import requests
import json
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import GPT4AllEmbeddings

urls = [
     "https://lilianweng.github.io/posts/2023-06-23-agent/",
     "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
]

docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=250,
    chunk_overlap=50
)
doc_splits = text_splitter.split_documents(docs_list)

vectorstore = Chroma.from_documents(
    documents=doc_splits,
    collection_name="rag-chroma1",
    embedding=GPT4AllEmbeddings(),
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
query = "What are the critical features of a generative AI-powered autonomous agent system?"
documents = retriever.invoke(query)
context = "\n\n".join([f"Document {i+1}:\n{doc.page_content}" for i, doc in enumerate(documents)])

prompt = f"""
You are an expert in Generative AI and autonomous agent systems.
Below is the context retrieved from relevant documents.
Use this context to provide a detailed and accurate answer to the user's query.
If you don't know the answer, say you don't know.

Context:
{context}

Query:
{query}

Answer:
"""

url = 'http://localhost:11434/api/generate'
payload = {
    "model": "llama3.2:3b",
    "prompt": prompt,
    # Ollama reads sampling parameters from the nested "options" object.
    "options": {"num_predict": 2000, "temperature": 0.0},
    "stream": True
}

response = requests.post(url, json=payload, stream=True)

if response.status_code == 200:
    print("Model Response:")
    assembled_response = ""
    for line in response.iter_lines(decode_unicode=True):
        if line.strip():
            try:
                data = json.loads(line)
                if "response" in data:
                    assembled_response += data["response"]
                    print(data["response"], end='', flush=True)
                if data.get("done", False):
                    break
            except json.JSONDecodeError as e:
                print(f"Error decoding JSON: {e}")
else:
    print(f"Error: {response.status_code} - {response.text}")
```

---
## Slide 9 (continued): Build a Simple RAG with LangChain

```python
import os

import bs4
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.embeddings import GPT4AllEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Tracing and API credentials
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = "<insert your key here>"
os.environ["GROQ_API_KEY"] = "<insert your key here>"
os.environ["LANGSMITH_API_KEY"] = "<insert your key here>"
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["HF_TOKEN"] = "<insert your key here>"

from langchain_groq import ChatGroq
llm = ChatGroq(
    model="llama-3.3-70b-versatile",#"mixtral-8x7b-32768",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)
chat = llm

# INDEXING
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=GPT4AllEmbeddings())
retriever = vectorstore.as_retriever()

# Prompt from LangChain Hub
prompt = hub.pull("rlm/rag-prompt")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough()
    }
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What is Task Decomposition?"))
```
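
The `chunk_size=1000, chunk_overlap=200` settings above decide how the page is cut up before it is embedded. The sketch below is a simplified fixed-width character splitter, not `RecursiveCharacterTextSplitter` itself (which also prefers to break on separators such as paragraphs and newlines). `split_with_overlap` is a hypothetical helper, and the tiny sizes are chosen only to make the overlap visible.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-width windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

# Toy input (assumption): the alphabet stands in for a long web page
text = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(text, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```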
---

## Slide 10: Using LangChain Chains

```python
from langchain.chains import SequentialChain
from langchain.prompts import ChatPromptTemplate
from langchain.chains import LLMChain
from langchain_groq import ChatGroq

llm = ChatGroq(
    model="llama-3.3-70b-versatile",
    temperature=0.9,
    max_tokens=400,
    verbose=True,
)

first_prompt = ChatPromptTemplate.from_template(
    "Translate the following review to English:\n\n{Review}"
)
chain_one = LLMChain(llm=llm, prompt=first_prompt, output_key="English_Review")

second_prompt = ChatPromptTemplate.from_template(
    "Can you summarize the following review in 1 sentence:\n\n{English_Review}"
)
chain_two = LLMChain(llm=llm, prompt=second_prompt, output_key="summary")

third_prompt = ChatPromptTemplate.from_template(
    "What language is the following review:\n\n{Review}"
)
chain_three = LLMChain(llm=llm, prompt=third_prompt, output_key="language")

fourth_prompt = ChatPromptTemplate.from_template(
    "Write a follow up response to the following summary in the specified language:\n\nSummary: {summary}\nLanguage: {language}"
)
chain_four = LLMChain(llm=llm, prompt=fourth_prompt, output_key="followup_message")

fifth_prompt = ChatPromptTemplate.from_template(
    "Translate the followup message to English language:\n\n{followup_message}"
)
chain_five = LLMChain(llm=llm, prompt=fifth_prompt, output_key="english_translation")

overall_chain = SequentialChain(
    chains=[chain_one, chain_two, chain_three, chain_four, chain_five],
    input_variables=["Review"],
    output_variables=["English_Review", "summary", "language", "followup_message", "english_translation"],
    verbose=True
)

review = "میں نے کنگ سائز سیٹ آرڈر کیا تھا۔ میری واحد تنقید یہ ہوگی کہ کاش بیچنے والا کنگ سائز سیٹ کے ساتھ چار تکیے کے غلاف فراہم کرتا..."

# .invoke() replaces the deprecated direct call overall_chain(review)
results = overall_chain.invoke({"Review": review})
results
```
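
Conceptually, the `SequentialChain` above threads one shared dictionary of named values through its steps: each step reads the keys its prompt template needs and writes its result under its `output_key`. The sketch below mimics only that data flow, with plain functions standing in for the LLM calls. `run_sequential` and the stub steps are hypothetical, not LangChain APIs.

```python
from typing import Callable

# (input keys, output key, function) describes one step of the pipeline
Step = tuple[list[str], str, Callable[..., str]]

def run_sequential(steps: list[Step], inputs: dict[str, str]) -> dict[str, str]:
    """Run steps in order, storing each result under its output key."""
    state = dict(inputs)
    for input_keys, output_key, fn in steps:
        missing = [key for key in input_keys if key not in state]
        if missing:
            raise KeyError(f"step '{output_key}' is missing inputs: {missing}")
        state[output_key] = fn(*(state[key] for key in input_keys))
    return state

# Stub steps (assumption): string formatting replaces the real model calls
steps: list[Step] = [
    (["Review"], "English_Review", lambda review: f"translated({review})"),
    (["English_Review"], "summary", lambda english: f"summary({english})"),
    (["Review"], "language", lambda review: "Urdu"),
    (["summary", "language"], "followup_message",
     lambda summary, language: f"reply in {language} to {summary}"),
]

result = run_sequential(steps, {"Review": "raw text"})
print(result["followup_message"])
```

Because every intermediate value stays in the dictionary, a later step can read the original `Review` as easily as the output of the step right before it, which is how the language-detection step above reads the untranslated text.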

---
## Slide 11: LangChain Memory

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryBufferMemory
from langchain_groq import ChatGroq

llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0.9)
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=100)

# Store sample conversation
memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
conversation = ConversationChain(llm=llm, memory=memory, verbose=True)
response = conversation.predict(input="What is on the schedule today?")
print(response)

# Example 2
schedule = "At 6:30 AM, you have a GEN AI Practical Presentation..."
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=100)
memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
memory.save_context({"input": "What is on the schedule today?"}, {"output": f"{schedule}"})
print(memory.load_memory_variables({}))  # inspect what the memory currently holds
conversation = ConversationChain(llm=llm, memory=memory, verbose=True)
print(conversation.predict(input="suggest some good questions to ask about genAI in practice?"))
```
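
`ConversationSummaryBufferMemory` keeps recent turns verbatim and folds the oldest ones into a running summary once the buffer grows past `max_token_limit`. The sketch below imitates that policy under loud simplifications: tokens are counted by whitespace splitting, and `summarize` is a hypothetical stand-in that truncates each dropped turn instead of calling an LLM.

```python
def count_tokens(text: str) -> int:
    """Crude token count (assumption): one token per whitespace-separated word."""
    return len(text.split())

def summarize(previous_summary: str, dropped: list[str]) -> str:
    """Stand-in for an LLM summary: keep the first three words of each dropped turn."""
    parts = [" ".join(turn.split()[:3]) for turn in dropped]
    prefix = previous_summary + " | " if previous_summary else ""
    return prefix + " | ".join(parts)

class SummaryBuffer:
    def __init__(self, max_token_limit: int):
        self.max_token_limit = max_token_limit
        self.summary = ""            # condensed older history
        self.buffer: list[str] = []  # recent turns kept verbatim

    def save(self, turn: str) -> None:
        self.buffer.append(turn)
        dropped = []
        # Move the oldest turns out of the buffer until it fits the limit
        while self.buffer and sum(count_tokens(t) for t in self.buffer) > self.max_token_limit:
            dropped.append(self.buffer.pop(0))
        if dropped:
            self.summary = summarize(self.summary, dropped)

memory = SummaryBuffer(max_token_limit=8)
for turn in ["Human: Hello", "AI: What's up", "Human: Not much, just hanging", "AI: Cool"]:
    memory.save(turn)

print("summary:", memory.summary)
print("buffer:", memory.buffer)
```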

---
## Slide 12: Build “Habib” Conversational Agent (Example)

```python
import re
import json
import requests

class HabibUniversity:
    def __init__(self, model="llama3.2:1b", temperature=0):
        """
        Initialize the assistant with a local model name and a sampling temperature.
        """
        self.model = model
        self.temperature = temperature
        self.url = 'http://localhost:11434/api/generate'
        self.system_prompt = """
        Your name is Habib-Pro. You are a knowledgeable and empathetic virtual assistant for Habib University...
        """

    def generate(self, prompt):
        full_prompt = f"{self.system_prompt}\n\nUser: {prompt}\nAssistant:"
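        payload = {
            "model": self.model,
            "prompt": full_prompt,
            # Ollama reads sampling parameters from the nested "options" object
            "options": {"temperature": self.temperature},
        }
        # stream=True lets iter_lines() yield chunks as the server produces them
        response = requests.post(self.url, json=payload, stream=True)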
        return self._handle_response(response)

    def _handle_response(self, response):
        assembled_response = ""
        for line in response.iter_lines(decode_unicode=True):
            if line.strip():
                try:
                    data = json.loads(line)
                    if "response" in data:
                        assembled_response += data["response"]
                        print(data["response"], end='', flush=True)
                    if data.get("done", False):
                        break
                except json.JSONDecodeError as e:
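                    print(f"Error decoding JSON: {e}")
        print()  # end the streamed line
        return assembled_response

if __name__ == "__main__":
    local_llm = "llama3.2:1b"
    llm = HabibUniversity(model=local_llm, temperature=0)
    prompt = "what's your name"
    # The reply is printed while it streams; the full text is also returned
    response = llm.generate(prompt)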
```

---
## Putting It All Together

- **Embeddings** represent text in dense vector form.
- **Vector Stores** like Chroma store these vectors.
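- **RAG** retrieves relevant chunks from the vector store and adds them to the LLM prompt as context.
- **LangChain** orchestrates multi-step pipelines: chains, memory, and prompt templates.
- **Local/Open-Source LLMs** can be integrated when a local inference server (such as the one at `localhost:11434` above) is running.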
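
The first three points can be illustrated without any external service. The sketch below ranks documents by cosine similarity to a query vector and builds a prompt from the best match. The three-dimensional vectors are made-up toy values (a real embedding model returns hundreds of dimensions), and `cosine` and `retrieve` are hypothetical helpers rather than Chroma or LangChain APIs.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy "vector store" (assumption): document text mapped to a hand-written embedding
store = {
    "Task decomposition splits a goal into smaller steps.": [0.9, 0.1, 0.0],
    "Chroma is a vector database.": [0.1, 0.9, 0.1],
    "Streamlit builds web apps in Python.": [0.0, 0.2, 0.9],
}

def retrieve(query_vector: list[float], k: int = 1) -> list[str]:
    """Return the k documents whose vectors are most similar to the query."""
    ranked = sorted(store, key=lambda doc: cosine(query_vector, store[doc]), reverse=True)
    return ranked[:k]

question = "What is task decomposition?"
query_vector = [0.8, 0.2, 0.1]  # pretend embedding of the question
context = retrieve(query_vector, k=1)

# The retrieved text is placed in the prompt, which is the "augmented" part of RAG
prompt = f"Context: {context[0]}\nQuestion: {question}"
print(prompt)
```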

**End of Notebook**

---  

### Thank You!

- Questions?
