In [3]:
import ollama
import time
import os
import json
import numpy as np
from numpy.linalg import norm
from openai import OpenAI

In [4]:
def parse_file(filename):
    """Split a text file into paragraphs separated by blank lines."""
    with open(filename, encoding="utf-8-sig") as f:
        paragraphs = []
        buffer = []
        for line in f:
            line = line.strip()
            if line:
                buffer.append(line)
            elif buffer:
                # a blank line closes the current paragraph
                paragraphs.append(" ".join(buffer))
                buffer = []
        if buffer:
            paragraphs.append(" ".join(buffer))
        return paragraphs


def save_embeddings(filename, embeddings):
    # create dir if it doesn't exist
    os.makedirs("embeddings", exist_ok=True)
    # dump embeddings to json
    with open(f"embeddings/{filename}.json", "w") as f:
        json.dump(embeddings, f)


def load_embeddings(filename):
    # check if file exists
    if not os.path.exists(f"embeddings/{filename}.json"):
        return False
    # load embeddings from json
    with open(f"embeddings/{filename}.json", "r") as f:
        return json.load(f)


def get_embeddings(filename, modelname, chunks):
    # check if embeddings are already saved
    if (embeddings := load_embeddings(filename)) is not False:
        return embeddings
    # get embeddings from ollama
    embeddings = [
        ollama.embeddings(model=modelname, prompt=chunk)["embedding"]
        for chunk in chunks
    ]
    # save embeddings
    save_embeddings(filename, embeddings)
    return embeddings

def find_most_similar(needle, haystack):
    """Rank haystack vectors by cosine similarity to needle, best match first.

    Returns a list of (score, index) tuples.
    """
    needle_norm = norm(needle)

    similarity_scores = [
        np.dot(needle, item) / (needle_norm * norm(item)) for item in haystack
    ]

    return sorted(zip(similarity_scores, range(len(haystack))), reverse=True)


def rag_function(prompt):

    SYSTEM_PROMPT = """You are a helpful reading assistant who answers questions
        based on snippets of text provided in context. Answer only using the context provided,
        being as concise as possible. If you're unsure, just say that you don't know.
        Do not give answers that are outside the context given.
        Answer in about 150 words for each prompt.
        Context:
    """
    start_time = time.time()

    filename = "data.txt"
    paragraphs = parse_file(filename)

    # embed every paragraph (loaded from disk if already cached)
    embeddings = get_embeddings(filename, "nomic-embed-text", paragraphs)

    # embed the question and keep the five most similar paragraphs
    prompt_embedding = ollama.embeddings(model="nomic-embed-text", prompt=prompt)["embedding"]
    most_similar_chunks = find_most_similar(prompt_embedding, embeddings)[:5]

    # append the retrieved paragraphs to the system prompt as context
    response = ollama.chat(
        model="llama3.1:8b",
        messages=[
            {
                "role": "system",
                "content": SYSTEM_PROMPT
                + "\n".join(paragraphs[item[1]] for item in most_similar_chunks),
            },
            {"role": "user", "content": prompt},
        ],
    )

    end_time = time.time()
    print("\n")
    print(response['message']['content'])
    print("\n")
    print(f"Execution Time {end_time-start_time} ")
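
The ranking done by `find_most_similar` can also be computed in a single vectorized pass. This is a minimal sketch rather than the notebook's own method: `top_k_cosine` and its parameter `k` are hypothetical names, the vectors are toy values, and it assumes every embedding has the same length and a non-zero norm.

```python
import numpy as np


def top_k_cosine(needle, haystack, k=5):
    """Return (score, index) pairs for the k rows of haystack most similar to needle."""
    needle = np.asarray(needle, dtype=float)
    matrix = np.asarray(haystack, dtype=float)  # shape: (n_chunks, dim)
    # Cosine similarity = dot product divided by the product of the norms.
    scores = matrix @ needle / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(needle))
    # argsort is ascending, so reverse it and keep the first k indices.
    order = np.argsort(scores)[::-1][:k]
    return [(float(scores[i]), int(i)) for i in order]


# Toy 2-D "embeddings" standing in for real model output.
chunks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k_cosine([1.0, 0.1], chunks, k=2))
```

With these toy vectors the first and third chunks rank highest, because they point most nearly in the same direction as the query.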


In [5]:
# Straightforward question, with 27B model
prompt="Enlist the achievements of Sachin Tendulkar"
rag_function(prompt)



Based on the provided context, here are some of Sachin Tendulkar's notable achievements:

* Regarded as one of the greatest batsmen in the history of cricket
* Hailed as the world's most prolific batsman of all time
* All-time highest run-scorer in both ODI and Test cricket with more than 18,000 runs and 15,000 runs respectively
* Holds the record for receiving the most player of the match awards in international cricket
* Member of Parliament (Rajya Sabha) by presidential nomination from 2012 to 2018
* Received several awards from the government of India:
	+ Arjuna Award (1994)
	+ Khel Ratna Award (1997)
	+ Padma Shri (1998)
	+ Padma Vibhushan (2008)
	+ Bharat Ratna, India's highest civilian award (2013), making him the first sportsperson and youngest recipient
* Included in Time magazine's list of most influential people in the world (2010)
* Awarded the Sir Garfield Sobers Trophy for cricketer of the year at the 2010 International Cricket Council (ICC) Awards
* Part of the Indian 

In [6]:
#Reasoning RAG

prompt="Who is better batsman Sachin or Sourav?"
rag_function(prompt)



Based on the provided context, it's difficult to make a direct comparison between Sachin Tendulkar and Sourav Ganguly as batsmen, as they played in different eras and had different styles of play. However, according to the text, Sachin Tendulkar is "widely regarded as one of the greatest batsmen in the history of cricket" and holds numerous records, including being the all-time highest run-scorer in both ODI and Test cricket.

While Sourav Ganguly is also a successful cricketer and former captain of the Indian national team, there is no direct comparison or statement made that suggests he is better than Sachin Tendulkar as a batsman.


Execution Time 4.7582032680511475 
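
The execution time printed by `rag_function` covers file parsing, embedding lookup, query embedding and generation as one number. As a rough sketch of how the stages could be timed separately, the snippet below uses a hypothetical `timed` context manager; the `time.sleep` calls are placeholders for the real retrieval and `ollama.chat` steps, not measurements from this notebook.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("retrieval", timings):
    time.sleep(0.01)  # placeholder for embedding the prompt and ranking chunks
with timed("generation", timings):
    time.sleep(0.02)  # placeholder for the chat completion call

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

`time.perf_counter` is used here because it is a monotonic, high-resolution clock intended for measuring short intervals.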


In [7]:
#Reasoning RAG

prompt="Who is better batsman Sachin or Sourav? Present the comparsion as a table"
rag_function(prompt)



Based on the provided text, here's a comparison table between Sachin Tendulkar and Sourav Ganguly as batsmen:

| **Category** | **Sachin Tendulkar** | **Sourav Ganguly** |
| --- | --- | --- |
| All-time highest run-scorer (ODI) | More than 18,000 runs | Not mentioned |
| All-time highest run-scorer (Test) | More than 15,000 runs | Not mentioned |
| Awards as Player of the match in international cricket | Most awards received | Not mentioned |
| ICC Champions Trophy wins as captain | Part of winning team in 2002 and 2011 | Captain who led India to win in 2002 |

Please note that this comparison is limited to the information provided in the text. There may be other statistics or achievements not mentioned here that could further compare their skills as batsmen.

Note: The table focuses on a direct comparison of batting statistics between Sachin and Sourav, but does not attempt to establish who is "better".


Execution Time 6.635613679885864 


In [8]:
#Reasoning RAG

prompt="Who is better batsman Sachin or Sourav? Present the comparsion as heading followed by bullet points"
rag_function(prompt)



**Comparing Sachin Tendulkar and Sourav Ganguly's Batting Records**

* **International Runs:**
	+ Sachin Tendulkar: Over 18,000 runs in ODIs and over 15,000 runs in Tests
	+ Sourav Ganguly: No specific numbers mentioned in the context
* **Awards and Recognition:**
	+ Sachin Tendulkar: Arjuna Award (1994), Khel Ratna Award (1997), Padma Shri (1998), Padma Vibhushan (2008), Bharat Ratna (2020)
	+ Sourav Ganguly: No specific awards mentioned in the context, but considered one of India's most successful cricket captains
* **International Matches Won as Captain:**
	+ Sachin Tendulkar: Not specifically mentioned in the context
	+ Sourav Ganguly: Led Indian team to win the 2002 ICC Champions Trophy and reach the final of the 2003 Cricket World Cup

Note that the comparison is limited by the context provided, which does not contain comprehensive information about both players.


Execution Time 6.512469053268433 


In [9]:
#Reasoning RAG

prompt="Who is better batsman Sachin or Sourav? Present the comparsion as heading(Player) followed by bullet points(achievements) "
rag_function(prompt)



**Sachin Tendulkar**

• All-time highest run-scorer in both ODI and Test cricket with more than 18,000 runs and 15,000 runs, respectively
• Holds the record for receiving the most player of the match awards in international cricket
• Regarded as one of the greatest batsmen in the history of cricket

**Sourav Ganguly**

• Led Indian national team to win the 2002 ICC Champions Trophy
• Reached the final of the 2003 Cricket World Cup, the 2000 ICC Champions Trophy, and the 2004 Asia Cup as captain


Execution Time 4.060004711151123 


In [10]:
#Reasoning RAG

prompt="Who is better Captain Dhoni or Sourav? Answer in one word only"
rag_function(prompt)



Neither 

(The context doesn't provide a direct comparison to answer this question)


Execution Time 1.3850510120391846 


In [11]:
#Analytical RAG

prompt="Difference between total runs scored by Sachin and Dhoni? Just give the number, No explanation needed"
rag_function(prompt)



18000 - 17266 = 734


Execution Time 1.2433576583862305 


In [12]:
#Analytical RAG

prompt="Who has most accolades Dhoni or sachin?"
rag_function(prompt)



Based on the context provided, it appears that both Sachin Tendulkar and MS Dhoni have received numerous awards and accolades. However, according to the text, Dhoni holds an honorary rank of Lieutenant Colonel in the Parachute Regiment of the Indian Territorial Army, which is not mentioned as being held by Sachin Tendulkar.

On the other hand, Sachin Tendulkar is described as "the world's most prolific batsman of all time" and has received the most player of the match awards in international cricket. He also holds records for receiving the most Player of the Match awards and has scored more than 18,000 runs in ODIs.

It seems that both cricketers have impressive credentials, but I would not be able to definitively say who has "most accolades" as it is subjective and depends on how one weighs different types of achievements.


Execution Time 5.667861223220825 


In [13]:
#Analytical RAG

prompt="Amongst three, who is the best player in World cups? Answer in one word only"
rag_function(prompt)



Tendulkar.


Execution Time 1.2359111309051514 


In [14]:
#Analytical RAG

prompt="Amongst three, who has most awards?"
rag_function(prompt)



Based on the provided context, Sachin Tendulkar and MS Dhoni are the two cricketers mentioned, along with Sourav Ganguly. The text does not provide a comprehensive comparison of their awards, but it can be observed that:

* Sachin Tendulkar received the Bharat Ratna (India's highest civilian award), which is not mentioned for the other two.
* MS Dhoni was awarded the Major Dhyan Chand Khel Ratna Award and Padma Shri and Padma Bhushan in 2009 and 2018, respectively.
* Sourav Ganguly received the Padma Shri in 2004.

It's difficult to say who has the most awards based on this information alone. However, it seems that Dhoni has a few more awards listed, but Tendulkar's Bharat Ratna is an extremely high honor that might outweigh others.


Execution Time 5.5461366176605225 


In [15]:
#Analytical RAG

prompt="Amongst three, who has most awards? Give name of player and number only"
rag_function(prompt)



Sachin Tendulkar (5) 

1. Arjuna Award
2. Khel Ratna Award
3. Padma Shri
4. Padma Vibhushan
5. Bharat Ratna

MS Dhoni (3)

1. Major Dhyan Chand Khel Ratna Award
2. Padma Shri
3. Padma Bhushan


Execution Time 2.767158269882202 


In [16]:
#Summary RAG

prompt="Give me summary of Dhoni's career, in less than 200 words? Answer in bullet points"
rag_function(prompt)



Here is a summary of Dhoni's career in bullet points:

* **Early Career:**
	+ Made his first-class debut for Bihar in 1999
	+ Made his international debut for India on December 23, 2004 (ODI against Bangladesh)
* **Captaincy and Success:**
	+ Became captain of the ODI side in 2007
	+ Led India to victory in the 2007 ICC World Twenty20, 2011 Cricket World Cup, and 2013 ICC Champions Trophy
	+ Won three different limited overs ICC tournaments as a captain
* **Domestic Success:**
	+ Played for Chennai Super Kings (CSK) in the Indian Premier League (IPL)
	+ Led CSK to the final on ten occasions and won five times (2010, 2011, 2018, 2021, 2023)
	+ Won two Champions League T20 titles with CSK in 2010 and 2014
* **Awards and Honors:**
	+ Received Major Dhyan Chand Khel Ratna Award in 2008
	+ Received Padma Shri in 2009 and Padma Bhushan in 2018


Execution Time 7.551924705505371 


In [17]:
#Summary RAG

prompt="In about 150 words, give an overview of Sourav's career, answer in numbered bullet points"
rag_function(prompt)



Here is an overview of Sourav Ganguly's career:

1. **Captain of Indian national team**: Sourav was the captain of the Indian national cricket team and led them to several victories.
2. **Won ICC Champions Trophy (2002)**: As captain, he led India to win the 2002 ICC Champions Trophy.
3. **Reached Cricket World Cup final (2003)**: He also led India to reach the final of the 2003 Cricket World Cup.
4. **Led team to semi-finals in several tournaments**: Sourav led Indian teams to semi-finals in the 2000 ICC Champions Trophy, 2004 Asia Cup, and other tournaments.
5. **Regarded as one of India's most successful cricket captains**: He is widely regarded as one of India's most successful cricket captains.

Note: The provided context does not mention much about Sourav's individual achievements or playing career apart from his captaincy.


Execution Time 6.031343460083008 


In [18]:
#Analytical RAG

prompt="Enlist the total runs scored by all three players. Present the answer in ordered list in decreasing order"
rag_function(prompt)



Here is the list of total runs scored by the three players:

1. Sachin Tendulkar - 34,357 runs
2. Mahendra Singh Dhoni - No data available (only ODI career and IPL stats mentioned)
3. Sourav Ganguly (not MS Dhoni or Sachin) - I don't know his total runs since only ODI data is provided


Execution Time 3.061131477355957 
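
When an answer reports missing data, as in the output above, one possible cause is that the relevant paragraph was not among the five chunks passed to the model. A small sketch for inspecting what was retrieved follows; `describe_retrieval` is a hypothetical helper, and the paragraphs and scores are made up for illustration. It expects the `(score, index)` list that `find_most_similar` returns.

```python
def describe_retrieval(paragraphs, ranked, k=5, preview=60):
    """Format the top-k (score, index) pairs with a short preview of each chunk."""
    lines = []
    for score, index in ranked[:k]:
        text = paragraphs[index]
        snippet = text if len(text) <= preview else text[:preview] + "..."
        lines.append(f"{score:.3f}  #{index}  {snippet}")
    return lines


# Made-up paragraphs and scores, for illustration only.
sample_paragraphs = ["First paragraph.", "Second paragraph.", "Third paragraph."]
sample_ranked = [(0.91, 2), (0.75, 0), (0.40, 1)]
for line in describe_retrieval(sample_paragraphs, sample_ranked, k=2):
    print(line)
```

Printing these lines before the chat call would show whether the paragraph containing the needed figures made it into the context.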
