## RAG Application - English

## 1. Setup

Installing the required libraries.

In [101]:
! pip install langchain_openai langchain whisper langchain_community scikit-learn langchain_pinecone langchain[docarray] docarray pydantic==1.10.8 pytube  python-dotenv tiktoken ruff --quiet


Loading the environment variables we need to use.

In [102]:
import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
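If the key is missing, `os.getenv` returns `None` and the failure only surfaces later, when the model is first called. A minimal sketch of an early check, assuming a hypothetical helper named `require_env` (it is not part of the original notebook or of python-dotenv):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail early if it is unset.

    `require_env` is a hypothetical helper introduced for illustration only.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value


# Demonstration with a throwaway variable, so no real API key is needed.
os.environ["DEMO_API_KEY"] = "sk-demo"
print(require_env("DEMO_API_KEY"))
```

In the notebook itself, the same check could wrap the `OPENAI_API_KEY` lookup.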


Let's define the LLM that we'll use as part of the workflow.

In [103]:
from langchain_openai.chat_models import ChatOpenAI

model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model="gpt-3.5-turbo")

## 2. Loading the document

In [104]:
from langchain_community.document_loaders.csv_loader import CSVLoader


loader = CSVLoader(file_path='/workspaces/youtube-rag/convered.csv')
data = loader.load()

In [105]:
with open("/workspaces/youtube-rag/convered.csv") as file:
    transcription = file.read()

transcription[:100]

'ssage,id\n"Uruguay (official full name in  ; pron.  , Eastern Republic of  Uruguay) is a country loca'

## 3. Chunking & Indexing the CSV

In [106]:
from langchain_community.document_loaders import CSVLoader

loader = CSVLoader("/workspaces/youtube-rag/convered.csv")
text_documents = loader.load()
text_documents

[Document(metadata={'source': '/workspaces/youtube-rag/convered.csv', 'row': 0}, page_content='ssage: Uruguay (official full name in  ; pron.  , Eastern Republic of  Uruguay) is a country located in the southeastern part of South America.  It is home to 3.3 million people, of which 1.7 million live in the capital Montevideo and its metropolitan area.\nid: 0'),
 Document(metadata={'source': '/workspaces/youtube-rag/convered.csv', 'row': 1}, page_content='ssage: It is bordered by Brazil to the north, by Argentina across the bank of both the Uruguay River to the west and the estuary of RÃ\xado de la Plata to the southwest, and the South Atlantic Ocean to the southeast. It is the second smallest independent country in South America, larger only than Suriname and the French overseas department of French Guiana.\nid: 1'),
 Document(metadata={'source': '/workspaces/youtube-rag/convered.csv', 'row': 2}, page_content='ssage: Montevideo was founded by the Spanish in the early 18th century as a m

In [107]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split each CSV row into chunks of at most 100 characters, with 30 characters of overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=30)
documents = text_splitter.split_documents(text_documents)
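To build intuition for what `chunk_size=100` and `chunk_overlap=30` mean, here is a simplified sketch of a fixed-width sliding window. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter breaks text on separators such as newlines and spaces before merging pieces); `sliding_chunks` is a hypothetical function used only for illustration.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into fixed-width windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# 250 characters of repeating letters, so the overlap is visible.
sample = "".join(chr(97 + i % 26) for i in range(250))
chunks = sliding_chunks(sample, chunk_size=100, chunk_overlap=30)
print([len(c) for c in chunks])
print(chunks[0][-30:] == chunks[1][:30])
```

With a small chunk size like 100 characters, a single sentence is often split across several chunks, which affects how much context each retrieved chunk carries.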

## 4. Creating embeddings

Let's generate embeddings for an arbitrary query:

In [108]:
from langchain_openai.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
embedded_query = embeddings.embed_query("Who is Mary's sister?")

print(f"Embedding length: {len(embedded_query)}")
print(embedded_query[:10])

Embedding length: 1536
[-0.001359404530376196, -0.03437049686908722, -0.0114255640655756, 0.001291395165026188, -0.02616560459136963, 0.009161713533103466, -0.015621816739439964, 0.0018229621928185225, -0.011800787411630154, -0.03324482589960098]
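Embeddings are useful because texts with similar meaning tend to map to nearby vectors. The sketch below computes cosine similarity, one commonly used way to compare two embeddings. Whether the vector index used later relies on cosine similarity depends on how that index was configured, which the notebook does not show, so treat the metric as an assumption. The three-dimensional vectors are toy values, not real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors standing in for a query and two candidate chunks.
query_vec = [1.0, 2.0, 3.0]
close_vec = [2.0, 4.0, 6.0]    # same direction as the query
far_vec = [-3.0, 0.0, 1.0]     # orthogonal to the query

print(cosine_similarity(query_vec, close_vec))
print(cosine_similarity(query_vec, far_vec))
```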


## 5. Using Pinecone as Vector Database

In [109]:
from langchain_pinecone import PineconeVectorStore

index_name = "tmf2"

vectorstore = PineconeVectorStore.from_documents(
    documents, embeddings, index_name=index_name
)

Ingestion completes here.

### Retrieval & Generation

In [110]:
query = [
    "What are the three sections of a beetle?",
    "What did Coolidge do after graduating from Amherst?",
    "who is sunita williams?",
]

# Retrieve the most similar chunks for each question
results = []
for q in query:
    result = vectorstore.similarity_search(q)
    results.append(result)

In [163]:
import pandas as pd

# Sample 20 questions from the test set; a fixed random_state keeps the sample reproducible
df = pd.read_csv("/workspaces/youtube-rag/test.csv")
sample_df = df.sample(n=20, random_state=1)
print(sample_df)
df = sample_df
query = df['question']


                                                                                                  question  \
900                                                                        Did Wilson's father own slaves?   
570  Was Another primary objective of Fillmore to preserve the Union from the intensifying slavery debate?   
791                                                                               Are turtles ectothermic?   
189                                                                               Does snow fall in Egypt?   
372                                                                    What is the first word on the page?   
191                                                    Are elephants the largest land animals alive today?   
643                                                  Have penguins an average sense of hearing for birds ?   
474                                                               Different species of kangaroos eat what?   
65        

In [164]:
all_contexts = []

# For each question, retrieve the most similar chunks and keep only their text
for q in query:
    result = vectorstore.similarity_search(q)
    context_text = [doc.page_content for doc in result]
    all_contexts.append(context_text)
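In the retrieved contexts printed further down, every chunk appears twice. One possible cause (an assumption, since the notebook does not confirm it) is that the same documents were written to the index more than once. Whatever the cause, duplicates can be dropped before the context is used. A minimal sketch, with `dedupe_preserving_order` as a hypothetical helper:

```python
from typing import List


def dedupe_preserving_order(chunks: List[str]) -> List[str]:
    """Remove repeated chunks while keeping the original ranking order."""
    seen = set()
    unique = []
    for chunk in chunks:
        if chunk not in seen:
            seen.add(chunk)
            unique.append(chunk)
    return unique


# One retrieved context list, copied from the notebook's output.
retrieved = [
    "was to preserve the Union from the intensifying slavery debate.",
    "was to preserve the Union from the intensifying slavery debate.",
    "reflected on the Democratic Party). Another primary objective of Fillmore was to preserve the Union",
    "reflected on the Democratic Party). Another primary objective of Fillmore was to preserve the Union",
]
print(dedupe_preserving_order(retrieved))
```

Deduplicating leaves only two distinct chunks out of the four results here, so the duplicates take up half of the retrieval slots.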

In [165]:
from langchain.prompts import ChatPromptTemplate

# Create prompt template
PROMPT_TEMPLATE = """
Answer the question based only on the following context:
{context}
Answer the question based on the above context: {question}.
Provide a detailed answer.
Don’t justify your answers.
Don’t give information not mentioned in the CONTEXT INFORMATION.
Do not say "according to the context" or "mentioned in the context" or similar.
"""

prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
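The `{context}` placeholder is filled with whatever object is passed in. The sketch below uses plain `str.format` as a stand-in for the prompt template (an assumption made so the example runs without LangChain) to show the difference between passing a list of chunks and passing a joined string. The shortened `TEMPLATE` is illustrative, not the notebook's actual prompt.

```python
# Shortened, illustrative template; the real prompt has more instructions.
TEMPLATE = "Context:\n{context}\nQuestion: {question}"

chunks = [
    "kangaroos are predominantly grazers eating a wide variety of grasses whereas some other species",
    "a danger to smaller kangaroo species when other food sources are lacking.",
]
question = "Different species of kangaroos eat what?"

# Passing the list directly embeds its Python repr, brackets and quotes included.
as_list = TEMPLATE.format(context=chunks, question=question)

# Joining first yields plain text with a blank line between chunks.
as_text = TEMPLATE.format(context="\n\n".join(chunks), question=question)

print(as_list)
print("-" * 20)
print(as_text)
```

Joining the chunks keeps list syntax out of the prompt, so the model sees only the passage text.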

In [166]:
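responses = []
# Reuse the model defined in the setup section instead of creating a new client per question
for i, q in enumerate(query):
    # Join the retrieved chunks so the prompt receives plain text rather than a Python list
    context = "\n\n".join(all_contexts[i])
    prompt = prompt_template.format(context=context, question=q)
    response_text = model.invoke(prompt).content
    responses.append(response_text)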


for response in responses:
    print(response)

Yes.
Yes.
Yes, turtles are ectothermic, like other reptiles.
No.
The first word on the page is "the".
Yes, elephants are the largest land animals alive today.
Yes.
Different species of kangaroos eat a wide variety of grasses.
Yes, certain species of beetles are considered pests.
Yes, Woodrow Wilson created the League of Nations.
Yes.
Sea turtles lay their eggs on the beach.
Yes.
Yes.
Kangaroos use a method of locomotion called "crawl-walking" where they raise their hind feet forward.
Ford received a commission as ensign in the U.S. Naval Reserve on April 13, 1942.
No.
Canada has 13 provinces and territories.
Kangaroos.
Samuel de Champlain also established a permanent European settlement at Quebec City in 1608.


In [168]:
df = df.drop(columns=['id'])
df['answer'] = responses
#df["contexts"]
print("Existing columns:", df.columns)



Existing columns: Index(['question', 'ground_truth', 'answer'], dtype='object')


In [169]:
df

Unnamed: 0,question,ground_truth,answer
900,Did Wilson's father own slaves?,yes,Yes.
570,Was Another primary objective of Fillmore to preserve the Union from the intensifying slavery debate?,yes,Yes.
791,Are turtles ectothermic?,Yes,"Yes, turtles are ectothermic, like other reptiles."
189,Does snow fall in Egypt?,Yes.,No.
372,What is the first word on the page?,James,"The first word on the page is ""the""."
191,Are elephants the largest land animals alive today?,yes,"Yes, elephants are the largest land animals alive today."
643,Have penguins an average sense of hearing for birds ?,Yes,Yes.
474,Different species of kangaroos eat what?,different diets,Different species of kangaroos eat a wide variety of grasses.
65,Are certain species of beetles considered pests?,Yes,"Yes, certain species of beetles are considered pests."
890,Did Woodrow Wilson create the League of Nations?,yes,"Yes, Woodrow Wilson created the League of Nations."


### Evaluation


In [170]:
%pip install ragas

Note: you may need to restart the kernel to use updated packages.


In [171]:
print("Existing columns:", df.columns)

# Attach the retrieved chunks so ragas can score each answer against its context
df['contexts'] = all_contexts



Existing columns: Index(['question', 'ground_truth', 'answer'], dtype='object')


In [172]:
df

Unnamed: 0,question,ground_truth,answer,contexts
900,Did Wilson's father own slaves?,yes,Yes.,"[in his name. Grant's father Jesse Grant was involved; General James H. Wilson later explained,, in his name. Grant's father Jesse Grant was involved; General James H. Wilson later explained,, was Wilson's residence during his term as president of the university., was Wilson's residence during his term as president of the university.]"
570,Was Another primary objective of Fillmore to preserve the Union from the intensifying slavery debate?,yes,Yes.,"[was to preserve the Union from the intensifying slavery debate., was to preserve the Union from the intensifying slavery debate., reflected on the Democratic Party). Another primary objective of Fillmore was to preserve the Union, reflected on the Democratic Party). Another primary objective of Fillmore was to preserve the Union]"
791,Are turtles ectothermic?,Yes,"Yes, turtles are ectothermic, like other reptiles.","[ssage: Like other reptiles, turtles are ectothermic (or ""cold-blooded"" Reptile blood isn't, ssage: Like other reptiles, turtles are ectothermic (or ""cold-blooded"" Reptile blood isn't, This smaller group consists primarily of various freshwater turtles., This smaller group consists primarily of various freshwater turtles.]"
189,Does snow fall in Egypt?,Yes.,No.,"[in Cairo, Alexandria and other cities., in Cairo, Alexandria and other cities., the south in Egypt in spring, bringing sand and dust, and sometimes raises the temperature in the, the south in Egypt in spring, bringing sand and dust, and sometimes raises the temperature in the]"
372,What is the first word on the page?,James,"The first word on the page is ""the"".","[the Power of Words(2006) ISBN 1-4000-4039-6., the Power of Words(2006) ISBN 1-4000-4039-6., Utama, who, landing on the island after a thunderstorm, spotted an auspicious beast on the shore, Utama, who, landing on the island after a thunderstorm, spotted an auspicious beast on the shore]"
191,Are elephants the largest land animals alive today?,yes,"Yes, elephants are the largest land animals alive today.","[ssage: Elephants are mammals, and the largest land animals alive today. The elephant's gestation, ssage: Elephants are mammals, and the largest land animals alive today. The elephant's gestation, Today, according to IUCNâs African Elephant Status Report 2007, Today, according to IUCNâs African Elephant Status Report 2007]"
643,Have penguins an average sense of hearing for birds ?,Yes,Yes.,"[ssage: Penguins have an average sense of hearing for birds (Wever et al 1969); this is used by, ssage: Penguins have an average sense of hearing for birds (Wever et al 1969); this is used by, the fact that penguins look remarkably like Great Auks in general shape., the fact that penguins look remarkably like Great Auks in general shape.]"
474,Different species of kangaroos eat what?,different diets,Different species of kangaroos eat a wide variety of grasses.,"[kangaroos are predominantly grazers eating a wide variety of grasses whereas some other species, kangaroos are predominantly grazers eating a wide variety of grasses whereas some other species, a danger to smaller kangaroo species when other food sources are lacking., a danger to smaller kangaroo species when other food sources are lacking.]"
65,Are certain species of beetles considered pests?,Yes,"Yes, certain species of beetles are considered pests.","[beetles. These include the following:, beetles. These include the following:, of many beetle families are predatory like the adults (ground beetles, ladybirds, rove beetles)., of many beetle families are predatory like the adults (ground beetles, ladybirds, rove beetles).]"
890,Did Woodrow Wilson create the League of Nations?,yes,"Yes, Woodrow Wilson created the League of Nations.","[of the League of Nations"", Woodrow Wilson, delivered 25 Sept 1919 in Pueblo, CO. John B. Duff,, of the League of Nations"", Woodrow Wilson, delivered 25 Sept 1919 in Pueblo, CO. John B. Duff,, a peacemaking organization, which later emerged as the League of Nations., a peacemaking organization, which later emerged as the League of Nations.]"


In [173]:
import nest_asyncio
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_correctness

# Allow ragas to run its async evaluation inside the notebook's running event loop
nest_asyncio.apply()

dataset = Dataset.from_pandas(df)

score = evaluate(dataset, metrics=[faithfulness, answer_correctness])
print(score)

Evaluating: 100%|██████████| 40/40 [00:09<00:00,  4.18it/s]


{'faithfulness': 0.5500, 'answer_correctness': 0.2394}


In [174]:
score.to_pandas()

Unnamed: 0,question,ground_truth,answer,contexts,__index_level_0__,faithfulness,answer_correctness
0,Did Wilson's father own slaves?,yes,Yes.,"[in his name. Grant's father Jesse Grant was involved; General James H. Wilson later explained,, in his name. Grant's father Jesse Grant was involved; General James H. Wilson later explained,, was Wilson's residence during his term as president of the university., was Wilson's residence during his term as president of the university.]",900,0.0,0.225871
1,Was Another primary objective of Fillmore to preserve the Union from the intensifying slavery debate?,yes,Yes.,"[was to preserve the Union from the intensifying slavery debate., was to preserve the Union from the intensifying slavery debate., reflected on the Democratic Party). Another primary objective of Fillmore was to preserve the Union, reflected on the Democratic Party). Another primary objective of Fillmore was to preserve the Union]",570,1.0,0.225871
2,Are turtles ectothermic?,Yes,"Yes, turtles are ectothermic, like other reptiles.","[ssage: Like other reptiles, turtles are ectothermic (or ""cold-blooded"" Reptile blood isn't, ssage: Like other reptiles, turtles are ectothermic (or ""cold-blooded"" Reptile blood isn't, This smaller group consists primarily of various freshwater turtles., This smaller group consists primarily of various freshwater turtles.]",791,1.0,0.196595
3,Does snow fall in Egypt?,Yes.,No.,"[in Cairo, Alexandria and other cities., in Cairo, Alexandria and other cities., the south in Egypt in spring, bringing sand and dust, and sometimes raises the temperature in the, the south in Egypt in spring, bringing sand and dust, and sometimes raises the temperature in the]",189,0.0,0.22311
4,What is the first word on the page?,James,"The first word on the page is ""the"".","[the Power of Words(2006) ISBN 1-4000-4039-6., the Power of Words(2006) ISBN 1-4000-4039-6., Utama, who, landing on the island after a thunderstorm, spotted an auspicious beast on the shore, Utama, who, landing on the island after a thunderstorm, spotted an auspicious beast on the shore]",372,0.0,0.195419
5,Are elephants the largest land animals alive today?,yes,"Yes, elephants are the largest land animals alive today.","[ssage: Elephants are mammals, and the largest land animals alive today. The elephant's gestation, ssage: Elephants are mammals, and the largest land animals alive today. The elephant's gestation, Today, according to IUCNâs African Elephant Status Report 2007, Today, according to IUCNâs African Elephant Status Report 2007]",191,1.0,0.194992
6,Have penguins an average sense of hearing for birds ?,Yes,Yes.,"[ssage: Penguins have an average sense of hearing for birds (Wever et al 1969); this is used by, ssage: Penguins have an average sense of hearing for birds (Wever et al 1969); this is used by, the fact that penguins look remarkably like Great Auks in general shape., the fact that penguins look remarkably like Great Auks in general shape.]",643,1.0,0.232379
7,Different species of kangaroos eat what?,different diets,Different species of kangaroos eat a wide variety of grasses.,"[kangaroos are predominantly grazers eating a wide variety of grasses whereas some other species, kangaroos are predominantly grazers eating a wide variety of grasses whereas some other species, a danger to smaller kangaroo species when other food sources are lacking., a danger to smaller kangaroo species when other food sources are lacking.]",474,0.0,0.197509
8,Are certain species of beetles considered pests?,Yes,"Yes, certain species of beetles are considered pests.","[beetles. These include the following:, beetles. These include the following:, of many beetle families are predatory like the adults (ground beetles, ladybirds, rove beetles)., of many beetle families are predatory like the adults (ground beetles, ladybirds, rove beetles).]",65,0.0,0.200468
9,Did Woodrow Wilson create the League of Nations?,yes,"Yes, Woodrow Wilson created the League of Nations.","[of the League of Nations"", Woodrow Wilson, delivered 25 Sept 1919 in Pueblo, CO. John B. Duff,, of the League of Nations"", Woodrow Wilson, delivered 25 Sept 1919 in Pueblo, CO. John B. Duff,, a peacemaking organization, which later emerged as the League of Nations., a peacemaking organization, which later emerged as the League of Nations.]",890,0.0,0.195058
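Many ground truths in this sample are a bare "yes" while the generated answers are full sentences, which may partly explain the low `answer_correctness` values even when the answer agrees with the ground truth. As a complementary sanity check, the sketch below compares only the leading yes/no of each pair. `leading_yes_no` and `yes_no_match` are hypothetical helpers, not part of ragas, and the rows are copied from the table above.

```python
import re
from typing import Optional


def leading_yes_no(text: str) -> Optional[str]:
    """Return 'yes' or 'no' if the text starts with that word, else None."""
    match = re.match(r"\s*(yes|no)\b", text, flags=re.IGNORECASE)
    return match.group(1).lower() if match else None


def yes_no_match(ground_truth: str, answer: str) -> Optional[bool]:
    """Compare leading yes/no words; None when either side is not a yes/no reply."""
    expected = leading_yes_no(ground_truth)
    predicted = leading_yes_no(answer)
    if expected is None or predicted is None:
        return None
    return expected == predicted


# (ground_truth, answer) pairs taken from the results shown above.
rows = [
    ("yes", "Yes."),
    ("Yes", "Yes, turtles are ectothermic, like other reptiles."),
    ("Yes.", "No."),
    ("James", 'The first word on the page is "the".'),
]
for ground_truth, answer in rows:
    print(ground_truth, "|", answer, "|", yes_no_match(ground_truth, answer))
```

Pairs that return `None` are open-ended questions and still need a metric such as `answer_correctness`.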
