## Simple RAG Pipeline

Download the "mini-llama-articles.csv" file to try out the pipeline.

In [9]:
import requests

url = "https://raw.githubusercontent.com/AlaFalaki/tutorial_notebooks/main/data/mini-llama-articles.csv"
response = requests.get(url)

if response.status_code == 200:
    with open("mini-llama-articles.csv", "wb") as file:
        file.write(response.content)
    print("File downloaded successfully!")
else:
    print(f"Failed to download file. Status code: {response.status_code}")


File downloaded successfully!


Before loading the data, a function is needed to divide the text into segments (chunks). Chunking is necessary because language models have limited context windows, so several full-length articles cannot be passed as context at once. It also makes it possible to send only the relevant passages to the model, which improves answer accuracy.

In [1]:
import csv

# Split the input text into fixed-size character chunks (the last chunk may be shorter).
def split_into_chunks(text, chunk_size=1024):
  chunks = []
  for i in range(0, len(text), chunk_size):
    chunks.append(text[i:i+chunk_size])

  return chunks

chunks = []
num_articles = 0
# Load the file as a CSV; the second column (index 1) holds the article text
with open("./mini-llama-articles.csv", mode="r", encoding="utf-8") as file:
  csv_reader = csv.reader(file)
  next(csv_reader)  # Skip header row
  for row in csv_reader:
    num_articles += 1
    chunks.extend(split_into_chunks(row[1]))

print("number of articles:", num_articles)
print("number of chunks:", len(chunks))

number of articles: 14
number of chunks: 174
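To see what `split_into_chunks` does, here is a toy run on a short string (the function is re-stated so the snippet runs on its own). Because the split is purely by character count, chunk boundaries can fall mid-word, as in chunks that begin with "ational code model". The `split_with_overlap` function is a hypothetical variant, not part of this notebook's pipeline: consecutive chunks share `overlap` characters, which reduces the chance that text cut at a boundary loses its surrounding context.

```python
def split_into_chunks(text, chunk_size=1024):
    """Fixed-size, non-overlapping character chunks (same logic as the cell above)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def split_with_overlap(text, chunk_size=1024, overlap=128):
    """Hypothetical variant: consecutive chunks share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


sample = "abcdefghij"
print(split_into_chunks(sample, chunk_size=4))              # ['abcd', 'efgh', 'ij']
print(split_with_overlap(sample, chunk_size=4, overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Overlap trades extra chunks (and embedding calls) for better continuity; with `overlap=0` the variant produces the same chunks as `split_into_chunks`.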


In [35]:
chunks[0]

"LLM Variants and Meta's Open Source Before shedding light on four major trends, I'd share the latest Meta's Llama 2 and Code Llama. Meta's Llama 2 represents a sophisticated evolution in LLMs. This suite spans models pretrained and fine-tuned across a parameter spectrum of 7 billion to 70 billion. A specialized derivative, Llama 2-Chat, has been engineered explicitly for dialogue-centric applications. Benchmarking revealed Llama 2's superior performance over most extant open-source chat models. Human-centric evaluations, focusing on safety and utility metrics, positioned Llama 2-Chat as a potential contender against proprietary, closed-source counterparts. The development trajectory of Llama 2 emphasized rigorous fine-tuning methodologies. Meta's transparent delineation of these processes aims to catalyze community-driven advancements in LLMs, underscoring a commitment to collaborative and responsible AI development. Code Llama is built on top of Llama 2 and is available in three mode

In [33]:
with open("./mini-llama-articles.csv", mode="r", encoding="utf-8") as file:
  csv_reader = csv.reader(file)
  next(csv_reader)  # Skip header row
  # Print the source URL of each article (third column, index 2)
  for row in csv_reader:
    print(row[2])

https://pub.towardsai.net/beyond-gpt-4-whats-new-cbd61a448eb9#dda8
https://pub.towardsai.net/building-a-q-a-bot-over-private-documents-with-openai-and-langchain-be975559c1e8#bead
https://pub.towardsai.net/enhancing-e-commerce-product-search-using-llms-30d5a2117f71#e5f3
https://pub.towardsai.net/exploring-large-language-models-part-3-ab60ee236950#d193
https://pub.towardsai.net/fine-tuning-a-llama-2-7b-model-for-python-code-generation-865453afdf73#bf4e
https://pub.towardsai.net/foundation-models-37074a2d70a1#7ebc
https://pub.towardsai.net/gptq-quantization-on-a-llama-2-7b-fine-tuned-model-with-huggingface-a7b291fbb871#34d2
https://pub.towardsai.net/llama-by-meta-leaked-by-an-anonymous-forum-questions-arises-on-meta-e1216e51db6#9001
https://pub.towardsai.net/llama-gpt4all-simplified-local-chatgpt-ab7d28d34923#485a
https://pub.towardsai.net/inside-code-llama-meta-ais-entrance-in-the-code-llm-space-9f286d13a48d#c9e0
https://pub.towardsai.net/metas-llama-2-revolutionizing-open-source-languag

In [2]:
import pandas as pd

# Convert the list of chunks into a pandas DataFrame with a single 'chunk' column
df = pd.DataFrame(chunks, columns=['chunk'])

print(df.keys())

Index(['chunk'], dtype='object')


In [6]:
import openai

# Convert a text to an embedding vector using OpenAI's text-embedding-3-small model.
# Returns None (and prints the error) if the API call fails.
def get_embedding(text):
  try:
    # Replace newlines with spaces before embedding
    text = text.replace("\n", " ")
    res = openai.Embedding.create(input=[text], model="text-embedding-3-small")

    return res.data[0].embedding

  except Exception as e:
    print(f"Embedding failed: {e}")
    return None

In [20]:
df

|     | chunk |
|-----|-------|
| 0 | LLM Variants and Meta's Open Source Before she... |
| 1 | ational code model;Codel Llama - Python specia... |
| 2 | erm "multimodal" refers to their ability to pr... |
| 3 | es it matter? LLM connections, like the LlamaI... |
| 4 | understand data in the AI-driven future. Fro... |
| ... | ... |
| 169 | versity. In-breadth Evolving solves this probl... |
| 170 | ns are done, the initial instruction dataset (... |
| 171 | er, the Prompt should be as follows: Best Use... |
| 172 | sis, and visualization.Machine Learning Pipeli... |


In [19]:
df['chunk'][0]

"LLM Variants and Meta's Open Source Before shedding light on four major trends, I'd share the latest Meta's Llama 2 and Code Llama. Meta's Llama 2 represents a sophisticated evolution in LLMs. This suite spans models pretrained and fine-tuned across a parameter spectrum of 7 billion to 70 billion. A specialized derivative, Llama 2-Chat, has been engineered explicitly for dialogue-centric applications. Benchmarking revealed Llama 2's superior performance over most extant open-source chat models. Human-centric evaluations, focusing on safety and utility metrics, positioned Llama 2-Chat as a potential contender against proprietary, closed-source counterparts. The development trajectory of Llama 2 emphasized rigorous fine-tuning methodologies. Meta's transparent delineation of these processes aims to catalyze community-driven advancements in LLMs, underscoring a commitment to collaborative and responsible AI development. Code Llama is built on top of Llama 2 and is available in three mode

In [7]:
from tqdm.notebook import tqdm

# Generate an embedding for every chunk
print("Generating embeddings...")
embeddings = []
for index, row in tqdm(df.iterrows(), total=len(df)):
  embeddings.append(get_embedding(row['chunk']))

# Chunks whose embedding failed (None) would break the similarity step below
print("failed embeddings:", sum(e is None for e in embeddings))

# Add the "embedding" column to the dataframe, right after "chunk"
embeddings_values = pd.Series(embeddings)
df.insert(loc=1, column='embedding', value=embeddings_values)

Generating embeddings...



__Cosine Similarity Metric:__

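Cosine similarity measures the cosine of the angle between two vectors in a multi-dimensional space. It indicates how closely the vectors point in the same direction regardless of their magnitude: 1 means the same direction, 0 means orthogonal, and -1 means opposite directions.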
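A minimal NumPy sketch of the metric, written directly from the definition cos(a, b) = a·b / (‖a‖ ‖b‖). The helper name `cosine` is illustrative; the notebook itself uses `sklearn.metrics.pairwise.cosine_similarity`, which computes the same quantity for every pair of rows.

```python
import numpy as np


def cosine(a, b):
    """Cosine of the angle between vectors a and b: dot(a, b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


print(cosine([1, 0], [1, 0]))   # same direction
print(cosine([1, 0], [0, 1]))   # orthogonal
print(cosine([1, 2], [2, 4]))   # same direction, different magnitude
print(cosine([1, 0], [-1, 0]))  # opposite direction
```

The third call shows why the metric suits embeddings: doubling a vector's length leaves the score at 1.0 (up to floating-point rounding), so only direction, not magnitude, drives the ranking.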

In [8]:
from sklearn.metrics.pairwise import cosine_similarity

QUESTION = "How many parameters LLaMA2 model has?"
QUESTION_emb = get_embedding(QUESTION)

# Similarity between the question and every chunk; the result has shape (1, number_of_chunks)
cosine_similarities = cosine_similarity([QUESTION_emb], df['embedding'].tolist())

In [9]:
import numpy as np

number_of_chunks_to_retrieve = 3

# Sort scores in descending order and keep the indices of the N highest-scoring chunks
indices = np.argsort(cosine_similarities[0])[::-1][:number_of_chunks_to_retrieve]
print(indices)

[114  75  89]
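The retrieval step (score every chunk, keep the `number_of_chunks_to_retrieve` best) can be sketched without any API calls on toy vectors. `top_k_chunks` and the 2-D "embeddings" below are hypothetical stand-ins for the real OpenAI vectors. The sketch normalizes every vector once, so a single matrix product yields all cosine scores, and then applies the same descending-`argsort` trick as the cell above.

```python
import numpy as np


def top_k_chunks(query_emb, chunk_embs, chunks, k=3):
    """Return (index, score, chunk) for the k chunks most similar to the query."""
    q = np.asarray(query_emb, dtype=float)
    m = np.asarray(chunk_embs, dtype=float)
    # Normalize to unit length so a plain dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    scores = m @ q
    # Ascending argsort, reversed, first k: highest scores first.
    best = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i]), chunks[i]) for i in best]


toy_chunks = ["about llamas", "about parameters", "about cooking"]
toy_embs = [[0.9, 0.1], [0.2, 0.98], [-1.0, 0.0]]
for idx, score, text in top_k_chunks([0.1, 1.0], toy_embs, toy_chunks, k=2):
    print(idx, round(score, 3), text)
```

The query vector points mostly along the second axis, so "about parameters" ranks first and the opposite-facing "about cooking" is dropped.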


#### Inference with Google Gemini

In [None]:
import os
import google.generativeai as genai

# Configure the Gemini client (assumes the key is stored in the GOOGLE_API_KEY environment variable)
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Use the Gemini API to answer the question based on the retrieved pieces of text.
try:
    # Formulate the system prompt and condition the model to answer only AI-related questions.
    system_prompt = (
        "You are an assistant and expert in answering questions from chunks of content. "
        "Only answer AI-related questions; otherwise, say that you cannot answer this question."
    )

    # Create a user prompt with the user's question
    prompt = (
        "Read the following information that might contain the context you require to answer the question. You can use the information starting from the <START_OF_CONTEXT> tag and ending with the <END_OF_CONTEXT> tag. Here is the content:\n\n<START_OF_CONTEXT>\n{}\n<END_OF_CONTEXT>\n\n"
        "Please provide an informative and accurate answer to the following question based on the available context. Be concise and take your time. \nQuestion: {}\nAnswer:"
    )
    # Add the retrieved pieces of text to the prompt, separated by blank lines.
    prompt = prompt.format("\n\n".join(df.chunk[indices]), QUESTION)

    model = genai.GenerativeModel(model_name="gemini-1.5-flash", system_instruction=system_prompt)

    result = model.generate_content(prompt, request_options={"timeout": 1000})
    res = result.text
    print("Generated Answer:", res)

except Exception as e:
    print(f"An error occurred: {e}")

#### Inference from OpenAI

In [10]:
import os
import openai

# Legacy (openai<1.0) SDK interface, matching openai.Embedding.create used above
openai.api_key = os.getenv("OPENAI_API_KEY2")

try:
    # Formulate the system prompt to condition the model
    system_prompt = (
        "You are an assistant and expert in answering questions from chunks of content. "
        "Only answer AI-related questions; otherwise, say that you cannot answer this question."
    )

    # Create a user prompt with the user's question
    prompt = (
        "Read the following information that might contain the context you require to answer the question. You can use the information starting from the <START_OF_CONTEXT> tag and ending with the <END_OF_CONTEXT> tag. Here is the content:\n\n<START_OF_CONTEXT>\n{}\n<END_OF_CONTEXT>\n\n"
        "Please provide an informative and accurate answer to the following question based on the available context. Be concise and take your time. \nQuestion: {}\nAnswer:"
    )
    # Add the retrieved pieces of text to the prompt, separated by blank lines
    prompt = prompt.format("\n\n".join(df.chunk[indices]), QUESTION)

    # Use OpenAI's chat completion API to generate the answer
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt}
        ],
        max_tokens=1000,
        temperature=0.7,
        request_timeout=1000,  # HTTP request timeout in seconds
    )

    # Extract the assistant's response
    res = response["choices"][0]["message"]["content"]
    print("Generated Answer:", res)

except Exception as e:
    print(f"An error occurred: {e}")

Generated Answer: The LLaMA2 model is available in four different sizes with varying parameters: 7 billion, 13 billion, 34 billion, and 70 billion parameters.
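Both inference cells repeat the same prompt template. A minimal sketch of factoring it into one helper, `build_prompt` (a hypothetical name, not part of the original notebook), so the Gemini and OpenAI cells would format identical prompts. It separates retrieved chunks with a blank line so that non-adjacent chunks do not run together, and it passes the context as a `format` argument, so braces inside chunk text are not interpreted as placeholders.

```python
PROMPT_TEMPLATE = (
    "Read the following information that might contain the context you require "
    "to answer the question. You can use the information starting from the "
    "<START_OF_CONTEXT> tag and ending with the <END_OF_CONTEXT> tag. "
    "Here is the content:\n\n<START_OF_CONTEXT>\n{context}\n<END_OF_CONTEXT>\n\n"
    "Please provide an informative and accurate answer to the following question "
    "based on the available context. Be concise and take your time.\n"
    "Question: {question}\nAnswer:"
)


def build_prompt(retrieved_chunks, question, separator="\n\n"):
    """Fill the shared template with the retrieved chunks and the user question."""
    context = separator.join(chunk.strip() for chunk in retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_prompt(
    ["Llama 2 spans 7B to 70B parameters.", "Code Llama builds on Llama 2."],
    "How many parameters LLaMA2 model has?",
)
print(prompt)
```

In the notebook this would be called as `build_prompt(df.chunk[indices], QUESTION)`; iterating over the selected pandas Series yields the chunk strings.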
