<a href="https://colab.research.google.com/github/JoergNeumann/GenAI/blob/main/Pinecone_RAG_Demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RAG (Retrieval-Augmented Generation) with Pinecone

Install libraries

In [None]:
!pip install -qU \
    langchain-openai \
    langchain-community \
    langchain-pinecone \
    openai \
    datasets \
    pinecone \
    tiktoken

LangChain Setup

In [None]:
import os
from langchain_openai import ChatOpenAI

# Read the OpenAI API key from the Colab secrets
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

# Create the chat model
chat = ChatOpenAI(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    model='gpt-3.5-turbo'
)
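
The key lookup above only works inside Google Colab. As a minimal sketch, assuming you also want to run the notebook locally, a small helper can fall back to environment variables when `google.colab` is not importable. The name `get_secret` is hypothetical and not part of any library.

```python
import os


def get_secret(name: str) -> str:
    """Return a secret from Colab if available, otherwise from the environment."""
    try:
        # This module is only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        # Outside Colab: fall back to a regular environment variable.
        value = os.environ.get(name)
    else:
        value = userdata.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} is not set")
    return value


# Possible usage (commented out so the sketch runs without a real key):
# os.environ["OPENAI_API_KEY"] = get_secret("OPENAI_API_KEY")
```

Failing early with a clear error is a design choice here: a missing key otherwise only surfaces later as an authentication error from the API.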

We load Llama 2 papers, already split into chunks, from the dataset `"jamescalam/llama-2-arxiv-papers-chunked"`.
The data is loaded with the Hugging Face Datasets library.

In [None]:
from datasets import load_dataset

dataset = load_dataset(
    "jamescalam/llama-2-arxiv-papers-chunked",
    split="train"
)

dataset

Show the first dataset record

In [None]:
dataset[0]

Pinecone setup. Requires an [API key](https://app.pinecone.io).

In [None]:
from pinecone import Pinecone

# Read the Pinecone API key from the Colab secrets
from google.colab import userdata
api_key = userdata.get('PINECONE_API_KEY')

# configure client
pc = Pinecone(api_key=api_key)

Define the index spec and choose the cloud provider and region where the index will be stored.

In [None]:
from pinecone import ServerlessSpec

spec = ServerlessSpec(
    cloud="aws", region="us-west-2"
)

Initialize the index. Because we use OpenAI's `text-embedding-ada-002` model, we set `dimension` to `1536`.

In [None]:
import time

index_name = 'llama-2-rag'
existing_indexes = [
    index_info["name"] for index_info in pc.list_indexes()
]

# check if index already exists (it shouldn't if this is first time)
if index_name not in existing_indexes:
    # if does not exist, create index
    pc.create_index(
        index_name,
        dimension=1536,  # dimensionality of ada 002
        metric='dotproduct',
        spec=spec
    )
    # wait for index to be initialized
    while not pc.describe_index(index_name).status['ready']:
        time.sleep(1)

# connect to index
index = pc.Index(index_name)
time.sleep(1)
# view index stats
index.describe_index_stats()
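
The wait loop above polls until the index reports ready, but it never gives up. A minimal sketch of the same polling pattern with a timeout, assuming a hypothetical helper named `wait_until` (not part of the Pinecone client):

```python
import time
from typing import Callable


def wait_until(is_ready: Callable[[], bool],
               timeout: float = 60.0,
               interval: float = 1.0) -> None:
    """Poll is_ready() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while not is_ready():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout} seconds")
        time.sleep(interval)


# Possible usage with the client from this notebook (commented out, needs an API key):
# wait_until(lambda: pc.describe_index(index_name).status['ready'])
```

The default timeout of 60 seconds is an arbitrary assumption; adjust it to how long index creation takes in your project.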

Create vector embeddings with `text-embedding-ada-002`

In [None]:
# OpenAIEmbeddings now lives in langchain_openai; the langchain_community import is deprecated
from langchain_openai import OpenAIEmbeddings

embed_model = OpenAIEmbeddings(model="text-embedding-ada-002")

texts = [
    'this is the first chunk of text',
    'then another second chunk of text is here'
]

res = embed_model.embed_documents(texts)
len(res), len(res[0])

For the 2 chunks we get 2 embeddings with 1536 dimensions each.
Now we can iterate over the texts, create their embeddings, and store them in the index.

In [None]:
from tqdm.auto import tqdm  # for progress bar

data = dataset.to_pandas()  # this makes it easier to iterate over the dataset

batch_size = 100

for i in tqdm(range(0, len(data), batch_size)):
    i_end = min(len(data), i+batch_size)
    # get batch of data
    batch = data.iloc[i:i_end]
    # generate unique ids for each chunk
    ids = [f"{x['doi']}-{x['chunk-id']}" for _, x in batch.iterrows()]
    # get text to embed
    texts = [x['chunk'] for _, x in batch.iterrows()]
    # embed text
    embeds = embed_model.embed_documents(texts)
    # get metadata to store in Pinecone
    metadata = [
        {'text': x['chunk'],
         'source': x['source'],
         'title': x['title']} for _, x in batch.iterrows()
    ]
    # add to Pinecone
    index.upsert(vectors=list(zip(ids, embeds, metadata)))
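
The batching and ID scheme of the loop above can be tried out without any API calls. The following is a minimal sketch on made-up records; `make_id`, `batched`, and the sample values are hypothetical and only mimic the column names `doi`, `chunk-id`, and `chunk` used above.

```python
from typing import Iterator


def make_id(doi: str, chunk_id) -> str:
    """Build a unique vector ID from the paper DOI and the chunk number."""
    return f"{doi}-{chunk_id}"


def batched(records: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# Made-up records standing in for the rows of the dataset.
records = [
    {"doi": "0000.00000", "chunk-id": n, "chunk": f"placeholder text {n}"}
    for n in range(250)
]

for batch in batched(records, batch_size=100):
    ids = [make_id(r["doi"], r["chunk-id"]) for r in batch]
    # Each batch would be embedded and upserted in one request.
    print(len(batch), ids[0], ids[-1])
```

Because the IDs are derived from the data, running the upsert again overwrites the same vectors instead of creating duplicates.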

Inspect the populated index

In [None]:
index.describe_index_stats()

Create the LangChain vector store

In [None]:
from langchain_pinecone import PineconeVectorStore

index = pc.Index(index_name)
# reuse embed_model so queries are embedded with the same model as the indexed chunks
vectorstore = PineconeVectorStore(index=index, embedding=embed_model)

Search the index for a question

In [None]:
query = "What is so special about Llama 2?"

vectorstore.similarity_search(query, k=3)
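
The search above ranks the stored chunks with the `dotproduct` metric chosen when the index was created. As a minimal sketch on toy vectors, assuming unit-length embeddings (which OpenAI documents for its embedding models), the dot product yields the same ranking as cosine similarity. All vectors and chunk names below are made up.

```python
import math


def dot(a, b):
    """Dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity of two vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy 3-dimensional vectors standing in for 1536-dimensional embeddings.
query_vec = normalize([1.0, 2.0, 0.5])
chunks = {
    "chunk-a": normalize([1.0, 1.9, 0.4]),
    "chunk-b": normalize([-1.0, 0.2, 3.0]),
    "chunk-c": normalize([0.5, 2.0, 2.0]),
}

# Rank chunk names by similarity to the query, best match first.
by_dot = sorted(chunks, key=lambda k: dot(query_vec, chunks[k]), reverse=True)
by_cos = sorted(chunks, key=lambda k: cosine(query_vec, chunks[k]), reverse=True)
print(by_dot)
print(by_dot == by_cos)
```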

Next, we run the search on the vector store and build an augmented prompt from the results.

In [None]:
def augment_prompt(query: str):
    # get top 3 results from knowledge base
    results = vectorstore.similarity_search(query, k=3)
    # get the text from the results
    source_knowledge = "\n".join([x.page_content for x in results])
    # feed into an augmented prompt
    augmented_prompt = f"""Using the contexts below, answer the query.

    Contexts:
    {source_knowledge}

    Query: {query}"""
    return augmented_prompt

print(augment_prompt(query))
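
To see the prompt layout without calling Pinecone or OpenAI, the retrieval step can be replaced by a stub. This is a minimal sketch: `FakeDoc`, `fake_search`, and `build_prompt` are hypothetical stand-ins, and the chunk texts are placeholders rather than real search results.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str


def fake_search(query: str, k: int = 3) -> list:
    """Return canned chunks instead of querying the vector store."""
    corpus = [
        FakeDoc("placeholder chunk 1"),
        FakeDoc("placeholder chunk 2"),
        FakeDoc("placeholder chunk 3"),
    ]
    return corpus[:k]


def build_prompt(query: str, k: int = 3) -> str:
    """Join the retrieved chunks and wrap them in the instruction template."""
    results = fake_search(query, k=k)
    source_knowledge = "\n".join(doc.page_content for doc in results)
    return (
        "Using the contexts below, answer the query.\n\n"
        f"Contexts:\n{source_knowledge}\n\n"
        f"Query: {query}"
    )


print(build_prompt("What is so special about Llama 2?", k=2))
```

Building the string from separate pieces, rather than an indented triple-quoted string, keeps the function's indentation out of the prompt text.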

Send the query

In [None]:
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

# create a new user prompt
messages = [
    HumanMessage(content=augment_prompt(query))
]

# calling the model directly is deprecated; use invoke() instead
res = chat.invoke(messages)

print(res.content)