<a href="https://colab.research.google.com/github/Akshay-S-12/RAG-CRICKET-/blob/main/RAG(CRICKET).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install -U langchain langchain-community langchainhub chromadb sentence-transformers transformers accelerate




In [2]:
import os
import warnings
warnings.filterwarnings("ignore")


In [3]:
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader(
    web_paths=[
        "https://en.wikipedia.org/wiki/Cricket",
        "https://en.wikipedia.org/wiki/History_of_cricket",
        "https://en.wikipedia.org/wiki/Cricket_World_Cup"
    ]
)

docs = loader.load()

print("Total documents loaded:", len(docs))
#print(docs[0].page_content[:500])




Total documents loaded: 3
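
Pages loaded from the web tend to carry long runs of blank lines and tabs around navigation text. The following is a minimal sketch of whitespace normalization that could run before splitting. `normalize_whitespace` is a hypothetical helper, not part of LangChain, and the sample string is made up for illustration. It only collapses whitespace; it does not remove navigation text.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces/tabs and runs of blank lines (hypothetical helper)."""
    # Replace tabs and repeated spaces with a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop spaces that sit directly around a line break.
    text = re.sub(r" *\n *", "\n", text)
    # Collapse three or more newlines into exactly one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


# Made-up sample imitating the kind of text a web loader returns.
sample = "Cricket - Wikipedia\n\n\n\n\nJump to content\n\n\n\t\tNavigation\n\t\n\nMain page"
cleaned = normalize_whitespace(sample)
print(cleaned)
```

In this notebook, one option would be to apply the helper to each `doc.page_content` in `docs` before calling the splitter, so that fewer chunks are spent on empty lines.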


In [4]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)

splits = text_splitter.split_documents(docs)

print(len(splits))
print(splits[0])
print(splits[1])


315
page_content='Cricket - Wikipedia
Jump to content
Main menu
Main menu
move to sidebar
hide
Navigation
Main pageContentsCurrent eventsRandom articleAbout WikipediaContact us
Contribute
HelpLearn to editCommunity portalRecent changesUpload fileSpecial pages
Search
Search
Appearance
Donate
Create account
Log in
Personal tools
Donate Create account Log in
Contents
move to sidebar
hide
(Top)
1 History
Toggle History subsection
1.1 Origins
1.2 Growth of amateur and professional cricket in England
1.3 English cricket in the 18th and 19th centuries
1.4 Cricket becomes an international sport
1.5 Cricket in the 20th century
1.6 Cricket in the 21st century
2 Laws and gameplay
Toggle Laws and gameplay subsection
2.1 Playing area
2.2 Match st
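
To illustrate what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch using fixed character windows. This is an assumption-laden simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks, so its chunks will not line up with these windows. `sliding_window_chunks` is a hypothetical name used only for this example.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap (illustrative only)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(text, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each window repeats the last 4 characters of the previous one. With the notebook's settings (`chunk_size=1000`, `chunk_overlap=200`), the same idea lets a sentence that straddles a chunk boundary appear whole in at least one chunk.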

In [5]:
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)




In [6]:
from langchain_community.vectorstores import Chroma

vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings
)

# _collection is a private attribute of the Chroma wrapper and may change between versions.
print("Total vectors:", vectorstore._collection.count())


Total vectors: 315


In [7]:
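# Note: this query is unrelated to the cricket pages, so the top matches
# printed below are only the nearest chunks, not relevant answers.
query = "Are the recordings of the course available?"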
results = vectorstore.similarity_search(query, k=3)

for i, doc in enumerate(results):
    print(f"\nResult {i+1}:\n", doc.page_content[:400])



Result 1:
 ^ "ZC to launch six-team T10 tournament in March 2023". ESPNcricinfo. Archived from the original on 11 December 2022. Retrieved 11 December 2022.

^ "The Hundred – everything you need to know (or have forgotten)". ESPNcricinfo. 2 August 2022. Archived from the original on 11 December 2022. Retrieved 11 December 2022.

Result 2:
 A Game of Cricket at The Royal Academy Club in Marylebone Fields, now Regent's Park, depiction by unknown artist, c. 1790–1799
19th-century cricket[edit]
Main articles: History of cricket (1801–1825), History of English cricket (1826–1845), and History of English cricket (1846–1863)
View of Geneva's Plaine de Plainpalais with cricketers, 1817

Result 3:
 Haygarth, Arthur (1862). Frederick Lillywhite's Cricket Scores & Biographies, Volume 1 (1744–1826). London: Frederick Lillywhite.
Major, John (2007). More Than A Game. London: HarperCollins. ISBN 978-0-00-718364-7.
McCann, Tim (2004). Sussex Cricket in the Eighteenth Century. Lewes: Sussex Record So
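
The query above is about course recordings, yet three chunks still come back: a nearest-neighbour search returns the `k` closest vectors whether or not they are relevant. The sketch below ranks toy 2-dimensional vectors by cosine similarity to show the idea. The vectors are invented, `cosine_similarity` and `top_k` are hypothetical helpers, and the metric Chroma actually uses depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the k (score, index) pairs with the highest similarity to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]


# Invented embeddings standing in for three stored chunks.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_vec = [1.0, 0.1]

ranked = top_k(query_vec, doc_vecs, k=2)
for score, index in ranked:
    print(f"chunk {index}: similarity {score:.3f}")
```

Inspecting scores is one way to spot weak matches. LangChain's Chroma wrapper also provides `similarity_search_with_score`, which returns each document together with its score.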

In [8]:
# Retrieval: expose the vector store as a retriever that returns the top 2 chunks per query
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})


In [9]:
# Generation: wrap a local Hugging Face text2text pipeline (FLAN-T5 base) as a LangChain LLM
from transformers import pipeline
from langchain_community.llms import HuggingFacePipeline

generator = pipeline(
    "text2text-generation",
    model="google/flan-t5-base",
    max_new_tokens=100
)

llm = HuggingFacePipeline(pipeline=generator)



Device set to use cpu


In [10]:
# Augmentation: prompt template that injects the retrieved context and the question
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template="""
    Use ONLY the context below to answer the question in one clear sentence.

    Context:
    {context}

    Question:
    {question}

    Answer:
    """
)



In [11]:
# RAG chain: retrieve -> format context -> fill prompt -> generate -> parse to string
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

def format_docs(docs):
    return "\n".join(doc.page_content for doc in docs)

rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough()
    }
    | prompt
    | llm
    | StrOutputParser()
)
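
The chain above can be read as a sequence of plain function calls. The sketch below spells out those steps with stand-ins so it runs without any model or vector store. `fake_retriever` and `fake_llm` are hypothetical stubs, the chunk texts are placeholders, and the template is a compact copy of the one defined earlier. It is meant to show the data flow only, not how LangChain runs the chain internally.

```python
TEMPLATE = (
    "Use ONLY the context below to answer the question in one clear sentence.\n\n"
    "Context:\n{context}\n\n"
    "Question:\n{question}\n\n"
    "Answer:\n"
)


def fake_retriever(question):
    """Stand-in for the retriever: always returns two placeholder chunks."""
    return ["chunk A text", "chunk B text"]


def format_docs(chunks):
    """Join the retrieved chunk texts into one context string."""
    return "\n".join(chunks)


def fake_llm(prompt_text):
    """Stand-in for the language model: ignores the prompt and returns a fixed string."""
    return "stub answer"


def build_prompt(question):
    # "context": retriever | format_docs, "question": passed through unchanged
    context = format_docs(fake_retriever(question))
    return TEMPLATE.format(context=context, question=question)


def run_chain(question):
    prompt_text = build_prompt(question)  # | prompt
    raw_output = fake_llm(prompt_text)    # | llm
    return str(raw_output)                # | output parser: plain string


print(build_prompt("When did cricket originate?"))
print(run_chain("When did cricket originate?"))
```

Printing the filled prompt like this can help when debugging a real chain, since it shows exactly which retrieved text the model is asked to answer from.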





In [12]:
rag_chain.invoke("Who scored the most runs in cricket?")


'Sachin Tendulkar'

In [13]:
rag_chain.invoke("When was the first Cricket World Cup held?")


'June 1975'

In [14]:
rag_chain.invoke("When did cricket originate?")


'sometime during the medieval period'