In [1]:
import os
import json
import time

import numpy as np
from numpy.linalg import norm
import ollama

In [5]:
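def parse_file(filename):
    # Split a text file into paragraphs: consecutive non-blank lines are
    # joined with spaces, and blank lines mark paragraph boundaries.
    # "utf-8-sig" strips a leading byte-order mark if one is present.
    with open(filename, encoding="utf-8-sig") as f:
        paragraphs = []
        buffer = []
        for line in f:
            line = line.strip()
            if line:
                buffer.append(line)
            elif buffer:
                paragraphs.append(" ".join(buffer))
                buffer = []
        if buffer:
            paragraphs.append(" ".join(buffer))
        return paragraphs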
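`parse_file` treats each run of blank lines as a paragraph boundary and flattens every paragraph onto one line before embedding. The same chunking can be sketched on an in-memory string with `itertools.groupby`, which makes it easy to check without a file on disk. `split_paragraphs` is a hypothetical helper, not part of the notebook, and it splits on `str.splitlines` boundaries, which recognise a few more line-break characters than file iteration does.

```python
from itertools import groupby


def split_paragraphs(text):
    """Chunk text into paragraphs like parse_file does (hypothetical helper)."""
    # Drop a leading byte-order mark, mirroring the "utf-8-sig" decoding.
    lines = (line.strip() for line in text.lstrip("\ufeff").splitlines())
    # groupby(key=bool) yields alternating runs of non-blank and blank lines;
    # each non-blank run becomes one paragraph, blank runs are discarded.
    return [" ".join(group) for is_text, group in groupby(lines, key=bool) if is_text]


sample = "First line\nof paragraph one.\n\n\nParagraph two.\n"
print(split_paragraphs(sample))
# ['First line of paragraph one.', 'Paragraph two.']
```

Grouping on the truthiness of each stripped line is what turns consecutive non-blank lines into a single chunk; any number of blank lines in a row collapses into one boundary.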


def save_embeddings(filename, embeddings):
    # create the cache directory if it doesn't exist
    os.makedirs("embeddings", exist_ok=True)
    # dump embeddings to JSON
    with open(f"embeddings/{filename}.json", "w", encoding="utf-8") as f:
        json.dump(embeddings, f)


def load_embeddings(filename):
    # check if file exists
    if not os.path.exists(f"embeddings/{filename}.json"):
        return False
    # load embeddings from json
    with open(f"embeddings/{filename}.json", "r") as f:
        return json.load(f)



def get_embeddings(filename, modelname, chunks):
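    # reuse cached embeddings if present; the cache is keyed by filename only,
    # so delete embeddings/<filename>.json after changing the model or the text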
    if (embeddings := load_embeddings(filename)) is not False:
        return embeddings
    # get embeddings from ollama
    embeddings = [
        ollama.embeddings(model=modelname, prompt=chunk)["embedding"]
        for chunk in chunks
    ]
    # save embeddings
    save_embeddings(filename, embeddings)
    return embeddings
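`get_embeddings` keys its cache on the file name alone, so switching from `nomic-embed-text` to another embedding model, or editing `data.txt`, would silently reuse stale vectors. One way around this, sketched under the assumption that the cache stays as JSON files in an `embeddings/` directory, is to fold the model name and a hash of the chunks into the cache path. `cache_path` is a hypothetical helper, not part of the notebook.

```python
import hashlib
import os


def cache_path(filename, modelname, chunks, cache_dir="embeddings"):
    """Build a cache path that changes whenever the model or the chunk text changes.

    Hypothetical helper: the notebook itself uses f"embeddings/{filename}.json".
    """
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk.encode("utf-8"))
        digest.update(b"\0")  # separator so ["ab", "c"] and ["a", "bc"] hash differently
    # ":" and "/" (as in "gemma2:9b") are not safe in file names on every OS.
    safe_model = modelname.replace("/", "_").replace(":", "_")
    name = f"{os.path.basename(filename)}.{safe_model}.{digest.hexdigest()[:12]}.json"
    return os.path.join(cache_dir, name)


print(cache_path("data.txt", "nomic-embed-text", ["para one", "para two"]))
```

`load_embeddings` and `save_embeddings` would then take this path instead of building `f"embeddings/{filename}.json"` themselves.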

def find_most_similar(needle, haystack):
    # Rank every chunk embedding in `haystack` by cosine similarity to the
    # query embedding `needle`; returns (score, index) pairs, best first.
    needle_norm = norm(needle)
    similarity_scores = [
        np.dot(needle, item) / (needle_norm * norm(item)) for item in haystack
    ]
    return sorted(zip(similarity_scores, range(len(haystack))), reverse=True)
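`find_most_similar` computes cosine similarity (the dot product of two vectors divided by the product of their norms) one chunk at a time in Python and then sorts the whole list. For a few hundred paragraphs that is cheap, but the same ranking can be sketched with a single matrix-vector product. `top_k_cosine` is a hypothetical replacement that returns the same `(score, index)` shape; ties may come out in a different order.

```python
import numpy as np


def top_k_cosine(needle, haystack, k=5):
    """Return (score, index) pairs for the k rows most similar to needle, best first."""
    if len(haystack) == 0:
        return []
    matrix = np.asarray(haystack, dtype=float)  # shape (n_chunks, dim)
    query = np.asarray(needle, dtype=float)     # shape (dim,)
    # Cosine similarity of every row with the query in one vectorised step.
    scores = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    k = min(k, len(scores))
    if k <= 0:
        return []
    # argpartition selects the k largest scores without a full sort,
    # then only those k are sorted in descending order.
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return [(float(scores[i]), int(i)) for i in top]


ranked = top_k_cosine([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], k=2)
print(ranked)  # indices 0 and 2, in that order
```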


def rag_function(prompt):

    SYSTEM_PROMPT = """You are a helpful reading assistant who answers questions
        based on snippets of text provided in context. Answer only using the context provided,
        being as concise as possible. If you're unsure, just say that you don't know.
        Do not give answers that are outside the context given.
        Answer in about 150 words for each prompt.
        Context:
    """
    # The timer covers file parsing, embedding lookup, retrieval and generation.
    start_time = time.time()

    # Split the source document into paragraph chunks and embed them
    # (cached in embeddings/data.txt.json after the first run).
    filename = "data.txt"
    paragraphs = parse_file(filename)
    embeddings = get_embeddings(filename, "nomic-embed-text", paragraphs)

    # Embed the question with the same model and keep the 5 closest chunks.
    prompt_embedding = ollama.embeddings(model="nomic-embed-text", prompt=prompt)["embedding"]
    most_similar_chunks = find_most_similar(prompt_embedding, embeddings)[:5]

    # Pass the retrieved chunks to the chat model as context in the system message.
    response = ollama.chat(
        model="gemma2:9b",
        messages=[
            {
                "role": "system",
                "content": SYSTEM_PROMPT
                + "\n".join(paragraphs[item[1]] for item in most_similar_chunks),
            },
            {"role": "user", "content": prompt},
        ],
    )

    end_time = time.time()
    print("\n")
    print(response["message"]["content"])
    print("\n")
    print(f"Execution Time {end_time-start_time} ")
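Inside `rag_function`, the retrieved paragraphs are appended to the system prompt joined by single newlines, so neighbouring chunks can run together. A sketch of the same message assembly pulled out into a pure function (hypothetical `build_messages`) that separates chunks with a blank line; keeping it free of Ollama calls makes the exact prompt easy to inspect and test without a running server.

```python
def build_messages(system_prompt, paragraphs, ranked, question, k=5):
    """Assemble chat messages from (score, index) pairs like find_most_similar returns.

    Hypothetical helper: chunks are separated by a blank line, whereas the
    notebook joins them with a single newline.
    """
    context = "\n\n".join(paragraphs[index] for _, index in ranked[:k])
    return [
        {"role": "system", "content": system_prompt + context},
        {"role": "user", "content": question},
    ]


paras = ["Paragraph A.", "Paragraph B.", "Paragraph C."]
ranked = [(0.91, 1), (0.40, 0), (0.10, 2)]
msgs = build_messages("Context:\n", paras, ranked, "Which paragraph?", k=2)
print(msgs[0]["content"])
```

The resulting list can be passed unchanged as the `messages` argument of the same `ollama.chat(model="gemma2:9b", ...)` call the notebook uses.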


In [4]:
# Straightforward question, with 27B model
prompt="Enlist the achievements of Sachin Tendulkar"
rag_function(prompt)



Sachin Tendulkar is widely regarded as one of the greatest batsmen in cricket history. He is the all-time highest run-scorer in both ODI and Test cricket, with over 18,000 runs in ODIs and more than 15,000 runs in Tests.

He holds the record for receiving the most player of the match awards in international cricket. Tendulkar was part of the Indian team that won the 2011 Cricket World Cup, his first win in six World Cup appearances. He was named "Player of the Tournament" at the 2003 World Cup.

Tendulkar also received numerous awards from the Indian government, including the Arjuna Award (1994), the Khel Ratna Award (1997), the Padma Shri (1998), and the Padma Vibhushan (2008). He was the first sportsperson to receive the Bharat Ratna, India's highest civilian award. In 2010, Time magazine included him in its annual list of the most influential people in the world.

In 2019, he was inducted into the ICC Cricket Hall of Fame.  



Execution Time 361.8600928783417 
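The reported execution time covers everything inside `rag_function`: parsing `data.txt`, loading or computing chunk embeddings, embedding the question, and generation, so a single number cannot show where the roughly 80 to 360 seconds go. A small sketch for timing each stage separately with `time.perf_counter`, a monotonic clock better suited to measuring intervals than `time.time`; the `stage` helper and the stage names are hypothetical, and `time.sleep` stands in for the real calls.

```python
import time
from contextlib import contextmanager


@contextmanager
def stage(name, timings):
    """Record the elapsed wall-clock time of the enclosed block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start


timings = {}
with stage("retrieve", timings):
    time.sleep(0.01)  # stand-in for embedding the question and ranking chunks
with stage("generate", timings):
    time.sleep(0.02)  # stand-in for the ollama.chat call
for name, seconds in timings.items():
    print(f"{name:>10}: {seconds:.3f} s")
```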


In [6]:
# Straightforward question, with 27B model
prompt="Enlist the achievements of Sachin Tendulkar"
rag_function(prompt)



Sachin Tendulkar is widely regarded as one of the greatest batsmen in cricket history.  

He holds the record for the all-time highest run-scorer in both ODI and Test cricket, with over 18,000 runs and 15,000 runs respectively. He also has the most player of the match awards in international cricket. 

Tendulkar has received numerous awards, including the Arjuna Award, Khel Ratna Award, Padma Shri, Padma Vibhushan, and the Bharat Ratna, India's highest civilian award.  He was also included in Time's annual list of the most influential people in the world in 2010 and awarded the Sir Garfield Sobers Trophy for cricketer of the year at the 2010 ICC Awards.

Tendulkar played for Mumbai domestically and represented India internationally for over 24 years. He was part of the Indian team that won the 2011 Cricket World Cup and was named "Player of the Tournament" at the 2003 World Cup.  He retired from all forms of cricket in November 2013 after playing his 200th Test match.   





Executi

In [7]:
#Reasoning RAG

prompt="Who is better batsman Sachin or Sourav?"
rag_function(prompt)



The text states that Sachin Tendulkar is widely regarded as one of the greatest batsmen in the history of cricket and is considered the world's most prolific batsman of all time.  It also says that he holds the record for the most runs in both ODI and Test cricket. 


The passage doesn't directly compare Sourav Ganguly to Sachin Tendulkar in terms of batting ability. 



Execution Time 79.15769934654236 


In [8]:
#Reasoning RAG

prompt="Who is better Captain Dhoni or Sourav?"
rag_function(prompt)



The text provides information about both MS Dhoni and Sourav Ganguly's achievements as captains but does not offer a comparison to determine who is "better".  

Both are recognized as highly successful Indian cricket captains. 
* Dhoni led India to victory in the 2007 ICC World Twenty20, the 2011 Cricket World Cup, and the 2013 ICC Champions Trophy. He also captained Chennai Super Kings to five IPL victories.

* Ganguly led India to win the 2002 ICC Champions Trophy and reached the final of the 2003 Cricket World Cup, among other achievements.


The passage focuses on their individual accomplishments rather than a comparative analysis.  



Execution Time 90.57676148414612 


In [9]:
#Analytical RAG

prompt="Difference between total runs scored by Sachin and Dhoni"
rag_function(prompt)



The provided text states that Sachin Tendulkar has scored more than 18,000 runs in ODI and 15,000 runs in Test cricket, while MS Dhoni has scored over 17,266 runs in international cricket. 


However, the text doesn't give a precise breakdown of how many runs each player has scored in ODIs and Tests. Therefore, it is impossible to determine the exact difference between their total runs based on the context provided. 



Execution Time 83.1014564037323 
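Here the model declines to compute the difference even though its own answer quotes the figures it would need. Arithmetic like this can be handed to ordinary code once the numbers are extracted. A minimal sketch, taking the quoted figures at face value: this is an assumption, since the text says "more than" and "over", so the result is only approximate. Note also that the quoted Sachin figure covers ODIs and Tests only, while Dhoni's is for all international cricket, so the comparison is not exactly like for like.

```python
# Figures as quoted in the model's answer; treated as point values (assumption).
sachin_odi_runs = 18_000
sachin_test_runs = 15_000
dhoni_international_runs = 17_266

sachin_total = sachin_odi_runs + sachin_test_runs  # ODI + Test only
difference = sachin_total - dhoni_international_runs
print(f"Sachin (ODI + Test): {sachin_total:,}")
print(f"Dhoni (international): {dhoni_international_runs:,}")
print(f"Approximate difference: {difference:,}")
# Approximate difference: 15,734
```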


In [10]:
#Analytical RAG

prompt="Who has most accolades Dhoni or sachin?"
rag_function(prompt)



Based on the provided text, it's difficult to say definitively who has more accolades, Dhoni or Sachin Tendulkar.  

The passage details Dhoni's achievements including various awards like the Khel Ratna Award, Padma Shri, and Padma Bhushan, as well as his successful captaincy record in international cricket and the IPL. 

However, the text focuses primarily on Dhoni's accomplishments. While it mentions Sachin Tendulkar's status as one of the greatest batsmen and his numerous records, it doesn't provide specific details about the awards he has received.  



Execution Time 85.93707776069641 


In [11]:
#Analytical RAG

prompt="Amongst three, who is the best player in World cups?"
rag_function(prompt)



Based on the text provided,  the context states that Sachin Tendulkar was part of the Indian team that won the 2011 Cricket World Cup and had previously been named "Player of the Tournament" at the 2003 World Cup.  

While it mentions Dhoni's success as captain in various ICC tournaments, it doesn't specify any individual World Cup achievements for him. Sourav Ganguly's accomplishments as captain are also detailed, but not focused on World Cup performance.


Therefore, based on the information given, Sachin Tendulkar appears to have the strongest World Cup record out of the three mentioned. 



Execution Time 128.3356900215149 


In [12]:
#Analytical RAG

prompt="Amongst three, who has most awards?"
rag_function(prompt)



Based on the context provided, Sachin Tendulkar has received the most awards.  

Here's why:

* **Tendulkar:** Received the Arjuna Award, Khel Ratna Award, Padma Shri, Padma Vibhushan, and the Bharat Ratna (India's highest civilian award). He was also named one of Time's most influential people and received the Sir Garfield Sobers Trophy.
* **Dhoni:** Received the Major Dhyan Chand Khel Ratna Award, Padma Shri, and Padma Bhushan. He also holds an honorary rank in the Indian Territorial Army.
* **Ganguly:** Received the Padma Shri award and was elected president of the Board of Control for Cricket in India. 


Let me know if you have any other questions about these cricketers! 



Execution Time 84.32193922996521 


In [13]:
#Summary RAG

prompt="Give me summary of Dhoni's career, in less than 200 words?"
rag_function(prompt)



Mahendra Singh Dhoni is a highly successful Indian cricketer known for his skills as a right-handed batter and wicketkeeper. He captained the Indian national cricket team across all formats from 2007 to 2017, leading them to victory in the 2007 ICC World Twenty20, the 2011 Cricket World Cup, and the 2013 ICC Champions Trophy.  

He is recognized as one of the most prolific wicketkeeper-batsmen and captains in cricket history. Dhoni led the Chennai Super Kings (CSK) to ten IPL finals, winning five titles. He also holds the distinction of being the first wicketkeeper to score over five thousand runs in the IPL. 


Dhoni retired from international cricket in 2019 but continues to play for CSK in the IPL. 



Execution Time 98.09301280975342 


In [14]:
#Summary RAG

prompt="In about 150 words, give an overview of Sourav's career"
rag_function(prompt)



Sourav Ganguly, nicknamed "Dada," is a former Indian cricketer and current commentator widely recognized as one of India's most successful captains.  

He led the Indian national team to victory in the 2002 ICC Champions Trophy and guided them to the finals of several other tournaments, including the 2003 Cricket World Cup, the 2000 ICC Champions Trophy, and the 2004 Asia Cup. Known for his aggressive batting style and leadership qualities, Ganguly earned the title "Maharaja of Indian Cricket."  



Execution Time 96.34282732009888 


In [None]:
#Analytical RAG

prompt="Enlist the total runs scored by all three players"
rag_function(prompt)