[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/docs/gen-qa-openai.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/docs/gen-qa-openai.ipynb)

# Installing Libraries

In [None]:
!pip install -qU \
    openai==0.27.7 \
    pinecone-client==3.1.0 \
    pinecone-datasets==0.7.0 \
    tqdm \
    pinecone-notebooks==0.1.1

---

## Building a Knowledge Base

In [None]:
from pinecone_datasets import load_dataset

dataset = load_dataset('youtube-transcripts-text-embedding-ada-002')
# drop the existing metadata column and use the blob column as metadata instead
dataset.documents.drop(['metadata'], axis=1, inplace=True)
dataset.documents.rename(columns={'blob': 'metadata'}, inplace=True)
dataset.head()

| | id | values | sparse_values | metadata |
|---|---|---|---|---|
| 0 | 35Pdoyi6ZoQ-t0.0 | [-0.010402066633105278, -0.018359748646616936,... | | {'channel_id': 'UCv83tO5cePwHMt1952IVVHw', 'en... |
| 1 | 35Pdoyi6ZoQ-t18.48 | [-0.011849376372992992, 0.0007984379190020263,... | | {'channel_id': 'UCv83tO5cePwHMt1952IVVHw', 'en... |
| 2 | 35Pdoyi6ZoQ-t32.36 | [-0.014534404501318932, -0.0003158661129418760... | | {'channel_id': 'UCv83tO5cePwHMt1952IVVHw', 'en... |
| 3 | 35Pdoyi6ZoQ-t51.519999999999996 | [-0.011597747914493084, -0.007550035137683153,... | | {'channel_id': 'UCv83tO5cePwHMt1952IVVHw', 'en... |
| 4 | 35Pdoyi6ZoQ-t67.28 | [-0.015879768878221512, 0.0030445053707808256,... | | {'channel_id': 'UCv83tO5cePwHMt1952IVVHw', 'en... |


## Creating an Index

In [None]:
import os
from pinecone import ServerlessSpec
from pinecone import Pinecone

api_key = os.environ.get("PINECONE_API_KEY")
pc = Pinecone(api_key=api_key)
cloud = os.environ.get('PINECONE_CLOUD') or 'aws'
region = os.environ.get('PINECONE_REGION') or 'us-east-1'
spec = ServerlessSpec(cloud=cloud, region=region)

In [None]:
index_name = 'gen-qa-openai-fast'

In [None]:
# check if index already exists
if index_name not in pc.list_indexes().names():
    # if does not exist, create index
    pc.create_index(
        index_name,
        dimension=1536,  # dimensionality of text-embedding-ada-002
        metric='cosine',
        spec=spec
    )
# connect to index
index = pc.Index(index_name)
# add data to index
for batch in dataset.iter_documents(batch_size=100):
    index.upsert(batch)
# view index stats (counts can lag behind recent upserts)
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}
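
The upsert loop above sends the dataset in batches of 100 rather than one vector at a time. As a rough illustration of that pattern (not how `pinecone_datasets` implements `iter_documents`), a generic batching helper could look like the sketch below. The name `batched` and the sample records are hypothetical and used only for this example.

```python
from itertools import islice


def batched(records, batch_size):
    """Yield successive lists of at most batch_size items from records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# hypothetical records standing in for dataset rows
records = [{"id": f"doc-{i}", "values": [0.0, 0.1]} for i in range(250)]
batch_sizes = [len(batch) for batch in batched(records, 100)]
print(batch_sizes)  # [100, 100, 50]
```

Smaller batches keep each request payload modest; the final batch simply holds whatever is left over.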

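Now we've added all of our YouTube transcript chunks to the index. The stats above may still report a `total_vector_count` of 0 immediately after upserting, since the count can take a moment to update. With that we can move on to retrieval and then answer generation.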
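
The index was created with `metric='cosine'`, so a query returns the stored vectors whose direction is closest to the query vector. The sketch below is a brute-force, in-memory illustration of that idea with toy 3-dimensional vectors. It is not how Pinecone searches internally, and `cosine_similarity`, `top_k`, and `toy_vectors` are hypothetical names introduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_a * norm_b)


def top_k(query_vector, vectors, k):
    """Return the k (id, score) pairs with the highest cosine similarity."""
    scored = [
        (vec_id, cosine_similarity(query_vector, values))
        for vec_id, values in vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# toy 3-dimensional vectors; the real index stores 1536-dimensional ones
toy_vectors = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
}
matches = top_k([1.0, 0.0, 0.0], toy_vectors, k=2)
print([vec_id for vec_id, _ in matches])  # ['a', 'b']
```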

## Retrieval

In [None]:
import openai
from google.colab import userdata

openai.api_key = userdata.get('OPENAI_API_KEY')

embed_model = "text-embedding-ada-002"

In [None]:
query = (
    "Which training method should I use for sentence transformers when " +
    "I only have pairs of related sentences?"
)

res = openai.Embedding.create(
    input=[query],
    engine=embed_model
)

# retrieve from Pinecone
xq = res['data'][0]['embedding']

# get the two most relevant transcript chunks, with their metadata
res = index.query(vector=xq, top_k=2, include_metadata=True)

res

In [None]:
import time

# maximum number of context characters to include in the prompt
limit = 3750

In [None]:
def retrieve(query):
    res = openai.Embedding.create(
        input=[query],
        engine=embed_model
    )

    # query vector used to search Pinecone
    xq = res['data'][0]['embedding']

    # get relevant contexts; freshly upserted vectors can take a while to
    # become queryable, so retry for up to 12 minutes
    contexts = []
    time_waited = 0
    while len(contexts) < 3 and time_waited < 60 * 12:
        res = index.query(vector=xq, top_k=3, include_metadata=True)
        contexts = [x['metadata']['text'] for x in res['matches']]
        if len(contexts) < 3:
            # wait before retrying (15 seconds is an arbitrary choice)
            time.sleep(15)
            time_waited += 15

    prompt_start = (
        "Answer the question based on the context below.\n\n"
        "Context:\n"
    )
    prompt_end = f"\n\nQuestion: {query}\nAnswer:"

    # keep adding contexts while the joined text stays under the limit
    selected = []
    for context in contexts:
        if len("\n\n---\n\n".join(selected + [context])) >= limit:
            break
        selected.append(context)
    return prompt_start + "\n\n---\n\n".join(selected) + prompt_end
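
The context-packing step in `retrieve` can be examined on its own, without calling OpenAI or Pinecone. The sketch below is a standalone variant that takes the contexts and the character limit as arguments. `build_prompt`, `SEPARATOR`, and the toy contexts are hypothetical names for illustration, and the budget is counted in characters, not tokens.

```python
SEPARATOR = "\n\n---\n\n"


def build_prompt(query, contexts, limit):
    """Join as many leading contexts as fit within limit characters."""
    selected = []
    for context in contexts:
        candidate = SEPARATOR.join(selected + [context])
        if len(candidate) >= limit:
            break
        selected.append(context)
    return (
        "Answer the question based on the context below.\n\n"
        "Context:\n"
        + SEPARATOR.join(selected)
        + f"\n\nQuestion: {query}\nAnswer:"
    )


# hypothetical contexts: the third one would push the total past the limit
contexts = ["a" * 40, "b" * 40, "c" * 40]
prompt = build_prompt("What is MLM?", contexts, limit=100)
print("c" * 40 in prompt)  # False
```

Because contexts arrive ranked by similarity, dropping from the end discards the least relevant text first.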

In [None]:
def complete(prompt):
    # instructions
    sys_prompt = "You are a helpful assistant that always answers questions."
    # query the chat completion model
    res = openai.ChatCompletion.create(
        model='gpt-4o-mini-2024-07-18',
        messages=[
            {"role": "system", "content": sys_prompt},
            {"role": "user", "content": prompt}
        ],
        temperature=0
    )
    return res['choices'][0]['message']['content'].strip()

In [None]:
query = (
    "How do you finetune the LLM model"
)
query_with_contexts = retrieve(query)

In [None]:
query_with_contexts

"Answer the question based on the context below.\n\nContext:\nSo typically with transforming what we do is we download a pre-trained model from Hugging Face and then at that point we can either use a pre-trained model as is, which in a lot of cases it will be good enough to actually do that. But then at other times we might want to actually fine-tune the model and that is what I'll be showing you how to do here. So core of BERT, there are two different training or fine-tuning approaches that we can use. And we can even use both of those together. But for this video, what we're going to have a look at is how to use a Master Language modeling, which is called MLM. And MLM is really the probably the most important of those two core training approaches. The other one being next sentence prediction. So what MLM is, is we essentially give BERT a input sequence. So like this, so this would be our input sequence and we ask BERT to predict the input sequence. To predict the same input sequence 

In [None]:
# then we complete the context-infused query
complete(query_with_contexts)

"To fine-tune a large language model (LLM), you typically follow these steps:\n\n1. **Download a Pre-trained Model**: Start by downloading a pre-trained model from a repository like Hugging Face.\n\n2. **Prepare a Curated Dataset**: Create a small, curated dataset that reflects the specific values or tasks you want the model to learn. For example, in the context provided, this could involve generating around 80 text samples that align with the desired behavioral values.\n\n3. **Fine-tuning Process**: Use the pre-trained model as a base and fine-tune it on your curated dataset. This involves training the model for a few steps on the new data, allowing it to adjust its weights to better align with the specified values or tasks.\n\n4. **Evaluate the Model**: After fine-tuning, evaluate the model's performance to see if it adheres more closely to the defined values or performs better on the specific tasks.\n\n5. **Iterate if Necessary**: If the results are not satisfactory, you may need to

---