In [1]:
import json
import os
import time

import numpy as np
import ollama
from numpy.linalg import norm

In [2]:
def parse_file(filename):
    """Split a text file into paragraphs separated by blank lines."""
    with open(filename, encoding="utf-8-sig") as f:
        paragraphs = []
        buffer = []
        for line in f:
            line = line.strip()
            if line:
                buffer.append(line)
            elif buffer:
                # A blank line closes the current paragraph.
                paragraphs.append(" ".join(buffer))
                buffer = []
        # Flush the last paragraph if the file does not end with a blank line.
        if buffer:
            paragraphs.append(" ".join(buffer))
        return paragraphs


def save_embeddings(filename, embeddings):
    """Write embeddings to embeddings/<filename>.json, creating the directory if needed."""
    os.makedirs("embeddings", exist_ok=True)
    with open(f"embeddings/{filename}.json", "w") as f:
        json.dump(embeddings, f)


def load_embeddings(filename):
    """Return cached embeddings for filename, or None if no cache file exists."""
    path = f"embeddings/{filename}.json"
    if not os.path.exists(path):
        return None
    with open(path, "r") as f:
        return json.load(f)


def get_embeddings(filename, modelname, chunks):
    """Return one embedding per chunk, reusing the on-disk cache when present.

    The cache is keyed on filename only, so delete the cached JSON file after
    changing the embedding model or the source text.
    """
    if (embeddings := load_embeddings(filename)) is not None:
        return embeddings
    # No cache yet: embed each chunk with Ollama, then save the result.
    embeddings = [
        ollama.embeddings(model=modelname, prompt=chunk)["embedding"]
        for chunk in chunks
    ]
    save_embeddings(filename, embeddings)
    return embeddings

def find_most_similar(needle, haystack):
    """Rank haystack vectors by cosine similarity to needle.

    Returns (score, index) pairs sorted from most to least similar.
    """
    needle_norm = norm(needle)
    similarity_scores = [
        np.dot(needle, item) / (needle_norm * norm(item)) for item in haystack
    ]
    return sorted(zip(similarity_scores, range(len(haystack))), reverse=True)


def rag_function(prompt):

    SYSTEM_PROMPT = """You are a helpful reading assistant who answers questions
        based on snippets of text provided in context. Answer only using the context provided,
        being as concise as possible. If you're unsure, just say that you don't know.
        Do not give answers that are outside the context given.
        Answer in about 150 words for each prompt.
        Context:
    """
    start_time = time.time()

    filename = "data.txt"
    paragraphs = parse_file(filename)

    # Embed every paragraph (cached on disk after the first run).
    embeddings = get_embeddings(filename, "nomic-embed-text", paragraphs)

    # Embed the question and keep the five most similar paragraphs.
    prompt_embedding = ollama.embeddings(model="nomic-embed-text", prompt=prompt)["embedding"]
    most_similar_chunks = find_most_similar(prompt_embedding, embeddings)[:5]

    # Ask the chat model, passing the retrieved paragraphs as context.
    response = ollama.chat(
        model="gemma2:9b",
        messages=[
            {
                "role": "system",
                "content": SYSTEM_PROMPT
                + "\n".join(paragraphs[item[1]] for item in most_similar_chunks),
            },
            {"role": "user", "content": prompt},
        ],
    )

    end_time = time.time()
    print("\n")
    print(response['message']['content'])
    print("\n")
    print(f"Execution Time {end_time-start_time} ")
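
The retrieval step in `find_most_similar` computes cosine similarity one paragraph at a time. As a rough sketch of the same ranking done with a single NumPy matrix operation, the function below (the name `find_most_similar_vectorized` is hypothetical and not part of the notebook) returns the same `(score, index)` pairs. It assumes every embedding has the same length and a non-zero norm, and it may order exactly tied scores differently from the original.

```python
import numpy as np


def find_most_similar_vectorized(needle, haystack):
    """Return (score, index) pairs sorted by descending cosine similarity."""
    needle = np.asarray(needle, dtype=float)      # shape (dim,)
    haystack = np.asarray(haystack, dtype=float)  # shape (n_chunks, dim)
    # Dot every row with the needle, then divide by the product of the norms.
    scores = haystack @ needle / (
        np.linalg.norm(haystack, axis=1) * np.linalg.norm(needle)
    )
    order = np.argsort(-scores)  # indices from highest to lowest score
    return [(float(scores[i]), int(i)) for i in order]


# Toy 2-D vectors standing in for real embeddings.
ranked = find_most_similar_vectorized([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
print([index for _, index in ranked])  # → [1, 2, 0]
```

For a handful of paragraphs the list comprehension in the notebook is perfectly adequate; the matrix form mainly pays off when the number of chunks grows large.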


In [3]:
# Straightforward question, with the gemma2:9b model
prompt="Enlist the achievements of Sachin Tendulkar"
rag_function(prompt)



Sachin Tendulkar is widely regarded as one of the greatest batsmen in cricket history.  

He holds numerous records:

* **Highest run-scorer** in both ODI and Test cricket, with over 18,000 and 15,000 runs respectively.
* **Most player of the match awards** in international cricket.
* **First sportsperson to receive the Bharat Ratna**, India's highest civilian award.
* Named **one of Time's most influential people** in the world in 2010.
* Awarded the **Sir Garfield Sobers Trophy for cricketer of the year** at the 2010 ICC Awards.

He was part of the Indian team that won the **2011 Cricket World Cup**, and was named "Player of the Tournament" at the **2003 World Cup**.  


Tendulkar is also recognized by Wisden as one of the greatest Test and ODI batsmen of all time, ranking him second behind Don Bradman and Viv Richards respectively. 



Execution Time 10.876875400543213 


In [4]:
#Reasoning RAG

prompt="Who is better batsman Sachin or Sourav?"
rag_function(prompt)



The provided text states that Sachin Tendulkar is widely regarded as one of the greatest batsmen in the history of cricket and is the all-time highest run-scorer in both ODI and Test cricket.  

It also says that Sourav Ganguly is a former cricketer who is popularly called the Maharaja of Indian Cricket. 


The text does not provide a comparison between Sachin Tendulkar and Sourav Ganguly as batsmen. 



Execution Time 4.4356677532196045 


In [5]:
#Reasoning RAG

prompt="Who is better Captain Dhoni or Sourav?"
rag_function(prompt)



The text provides information about both MS Dhoni and Sourav Ganguly's accomplishments as captains, but it doesn't directly compare them to say who is "better."  

It highlights Dhoni's success leading India to victory in the 2007 ICC World Twenty20, the 2011 Cricket World Cup, and the 2013 ICC Champions Trophy. It also mentions Ganguly leading India to win the 2002 ICC Champions Trophy and reaching the final of the 2003 Cricket World Cup.  

To determine who is considered a "better" captain, one would need to consider additional factors beyond the victories mentioned in the text. 



Execution Time 6.0236194133758545 


In [6]:
#Analytical RAG

prompt="Difference between total runs scored by Sachin and Dhoni"
rag_function(prompt)



The text states that Sachin Tendulkar holds the record for the highest run-scorer in both ODI and Test cricket with more than 18,000 runs and 15,000 runs respectively.  It also says that MS Dhoni has scored 17,266 runs in international cricket.  


We cannot determine the exact difference between the total runs scored by Sachin and Dhoni from the given text. 



Execution Time 4.317958116531372 


In [7]:
#Analytical RAG

prompt="Who has most accolades Dhoni or sachin?"
rag_function(prompt)



While both Dhoni and Sachin have achieved incredible success, the text doesn't provide a direct comparison of their total accolades.  

It does state that Dhoni received India's highest sports honor (Khel Ratna Award) in 2008, followed by the Padma Shri in 2009 and Padma Bhushan in 2018. Additionally, he holds an honorary rank of Lieutenant Colonel in the Indian Territorial Army.

The text focuses more on Dhoni's accomplishments as a cricketer, highlighting his captaincy record and achievements in various tournaments like the World Cup and Champions Trophy.  



Execution Time 5.381108522415161 


In [8]:
#Analytical RAG

prompt="Amongst three, who is the best player in World cups?"
rag_function(prompt)



Based on the context provided, it states that MS Dhoni led India to victory in the 2011 Cricket World Cup, while Sachin Tendulkar was part of the winning team in the 2011 Cricket World Cup. It also mentions that Sourav Ganguly led India to the final of the 2003 Cricket World Cup.

Therefore, it can be inferred that MS Dhoni and Sachin Tendulkar are the two players who have won the Cricket World Cup based on the information given.  





Execution Time 8.55366039276123 


In [9]:
#Analytical RAG

prompt="Amongst three, who has most awards?"
rag_function(prompt)



Based on the text provided, Sachin Tendulkar has received the most awards. 

The text states that he received the Arjuna Award, Khel Ratna Award, Padma Shri, Padma Vibhushan, and the Bharat Ratna.  It also mentions that he was named cricketer of the year at the 2010 ICC Awards and was included in Time magazine's list of the most influential people in the world in 2010.


Let me know if you have any other questions!


Execution Time 4.5958571434021 


In [10]:
#Summary RAG

prompt="Give me summary of Dhoni's career, in less than 200 words?"
rag_function(prompt)



MS Dhoni is an Indian cricketer renowned for his exceptional skills as a right-handed batter and wicketkeeper. He captained the Indian cricket team across all formats from 2007 to 2017, achieving significant success. 

Under his leadership, India won the 2007 ICC World Twenty20, the 2011 Cricket World Cup, and the 2013 ICC Champions Trophy, making him the only captain to win three different limited-overs ICC tournaments.  He also led CSK to victory in five IPL titles (2010, 2011, 2018, 2021, and 2023), cementing his legacy as a successful captain. Dhoni retired from international cricket in 2019 but continues to play in the IPL.  He is widely regarded as one of the most prolific wicketkeeper-batsmen and captains in cricket history. 





Execution Time 8.176948070526123 


In [11]:
#Summary RAG

prompt="In about 150 words, give an overview of Sourav's career"
rag_function(prompt)



Sourav Ganguly, nicknamed "Dada," is a former Indian cricketer widely regarded as one of India's most successful captains. He led the Indian national cricket team to victory in the 2002 ICC Champions Trophy and guided them to the finals of major tournaments like the 2003 Cricket World Cup, the 2000 ICC Champions Trophy, and the 2004 Asia Cup.

Known as the "Maharaja of Indian Cricket," Ganguly's captaincy era was marked by a shift in India's cricketing approach, emphasizing aggressive batting and fielding.  After retiring from playing, he transitioned into cricket commentary.  





Execution Time 6.233846187591553 


In [12]:
#Analytical RAG

prompt="Enlist the total runs scored by all three players"
rag_function(prompt)



The provided text gives the total runs for Sachin Tendulkar (34,357), Ganguly (11,363), and Dhoni (17,266). 


Let me know if you'd like to know more about any specific player! 



Execution Time 3.293548583984375 
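
The question in In [6] asked for the difference between Sachin's and Dhoni's total runs, which the model declined to compute. Taking the totals printed in the In [12] output at face value (an assumption: they come from model output and are not checked against `data.txt` here), the arithmetic is a one-liner:

```python
# Totals as reported in the In [12] output (not independently verified).
total_runs = {
    "Sachin Tendulkar": 34_357,
    "Sourav Ganguly": 11_363,
    "MS Dhoni": 17_266,
}

difference = total_runs["Sachin Tendulkar"] - total_runs["MS Dhoni"]
print(difference)  # → 17091
```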
