In [7]:
import ollama
import time
import os
import json
import numpy as np
from numpy.linalg import norm

In [12]:
def parse_file(filename):
    # Split the file into paragraphs: each run of consecutive non-blank lines becomes one chunk.
    with open(filename, encoding="utf-8-sig") as f:
        paragraphs = []
        buffer = []
        for line in f:
            line = line.strip()
            if line:
                buffer.append(line)
            elif buffer:
                paragraphs.append(" ".join(buffer))
                buffer = []
        if buffer:
            paragraphs.append(" ".join(buffer))
        return paragraphs


def save_embeddings(filename, embeddings):
    # Create the cache directory if it doesn't exist, then dump the embeddings to JSON.
    os.makedirs("embeddings", exist_ok=True)
    with open(f"embeddings/{filename}.json", "w") as f:
        json.dump(embeddings, f)


def load_embeddings(filename):
    # check if file exists
    if not os.path.exists(f"embeddings/{filename}.json"):
        return False
    # load embeddings from json
    with open(f"embeddings/{filename}.json", "r") as f:
        return json.load(f)



def get_embeddings(filename, modelname, chunks):
    # check if embeddings are already saved
    if (embeddings := load_embeddings(filename)) is not False:
        return embeddings
    # get embeddings from ollama
    embeddings = [
        ollama.embeddings(model=modelname, prompt=chunk)["embedding"]
        for chunk in chunks
    ]
    # save embeddings
    save_embeddings(filename, embeddings)
    return embeddings

def find_most_similar(needle, haystack):
    # Rank every stored embedding by cosine similarity to the query embedding.
    needle_norm = norm(needle)
    similarity_scores = [
        np.dot(needle, item) / (needle_norm * norm(item)) for item in haystack
    ]
    # (score, index) pairs, highest similarity first
    return sorted(zip(similarity_scores, range(len(haystack))), reverse=True)


def rag_function(prompt):
    SYSTEM_PROMPT = """You are a helpful reading assistant who answers questions
        based on snippets of text provided in context. Answer only using the context provided,
        being as concise as possible. If you're unsure, just say that you don't know.
        Do not give answers that are outside the context given.
        Answer in about 150 words for each prompt.
        Context:
    """
    start_time = time.time()

    # Chunk the source document and load (or compute and cache) its embeddings.
    filename = "data.txt"
    paragraphs = parse_file(filename)
    embeddings = get_embeddings(filename, "nomic-embed-text", paragraphs)

    # Embed the question with the same model and keep the five closest paragraphs.
    prompt_embedding = ollama.embeddings(model="nomic-embed-text", prompt=prompt)["embedding"]
    most_similar_chunks = find_most_similar(prompt_embedding, embeddings)[:5]

    # System message = instructions + retrieved paragraphs; user message = the question.
    response = ollama.chat(
        model="llama3.1:latest",
        messages=[
            {
                "role": "system",
                "content": SYSTEM_PROMPT
                + "\n".join(paragraphs[item[1]] for item in most_similar_chunks),
            },
            {"role": "user", "content": prompt},
        ],
    )

    end_time = time.time()
    print("\n")
    print(response["message"]["content"])
    print("\n")
    print(f"Execution Time {end_time-start_time} ")
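The cache in `get_embeddings` is keyed only on the source filename, so switching to a different embedding model would silently reuse vectors produced by the old one. A minimal sketch of a cache keyed on both filename and model name, with the embedding call passed in as a function so it runs without Ollama (`cached_embeddings`, `embed_fn`, and the `cache_dir` file layout are illustrative assumptions, not part of the notebook):

```python
import json
import os
import tempfile


def cached_embeddings(filename, modelname, chunks, embed_fn, cache_dir="embeddings"):
    """Return embeddings for chunks, reusing a JSON cache keyed on file *and* model."""
    os.makedirs(cache_dir, exist_ok=True)
    # Replace characters such as ':' or '/' that are awkward in filenames.
    safe_model = modelname.replace("/", "_").replace(":", "_")
    path = os.path.join(cache_dir, f"{filename}.{safe_model}.json")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    embeddings = [embed_fn(chunk) for chunk in chunks]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(embeddings, f)
    return embeddings


# Demo with a stand-in embedding function (character and space counts), not a real model.
calls = []


def fake_embed(text):
    calls.append(text)
    return [float(len(text)), float(text.count(" "))]


with tempfile.TemporaryDirectory() as tmp:
    first = cached_embeddings("data.txt", "nomic-embed-text", ["a b", "cde"], fake_embed, tmp)
    second = cached_embeddings("data.txt", "nomic-embed-text", ["a b", "cde"], fake_embed, tmp)

print(first == second, len(calls))  # the second call is served from the cache
```

In the notebook, `embed_fn` could be `lambda chunk: ollama.embeddings(model=modelname, prompt=chunk)["embedding"]`, the same call `get_embeddings` already makes. The key still ignores edits to `data.txt` itself; hashing the file contents into the cache filename would cover that case.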

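`find_most_similar` loops over every stored vector in Python and sorts all of them, even though `rag_function` keeps only the top five. A vectorized sketch computes the same cosine similarity for every row with one matrix product (`top_k_cosine` is a hypothetical helper; it assumes no embedding is an all-zero vector, which would divide by zero):

```python
import numpy as np


def top_k_cosine(needle, haystack, k=5):
    """Return (score, index) pairs for the k rows of haystack most similar to needle."""
    needle = np.asarray(needle, dtype=float)
    matrix = np.asarray(haystack, dtype=float)
    # Cosine similarity for all rows at once: dot products divided by products of norms.
    scores = matrix @ needle / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(needle))
    order = np.argsort(scores)[::-1][:k]  # row indices by descending score
    return [(float(scores[i]), int(i)) for i in order]


haystack = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = top_k_cosine([1.0, 0.2], haystack, k=2)
print(result)  # rows 0 and 2 are closest to the query
```

The return shape matches `find_most_similar(...)[:k]`, so `paragraphs[item[1]]` still picks out the retrieved text.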

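Inside `rag_function`, the system message is the fixed instructions followed by the five highest-scoring paragraphs joined with newlines, and the question travels as a separate user message. Pulling that assembly into a pure function (the name `build_messages` is hypothetical) makes the exact prompt sent to `ollama.chat` easy to inspect, and lets `parse_file` and `get_embeddings` run once outside the function instead of on every question:

```python
def build_messages(system_prompt, paragraphs, ranked, question, k=5):
    """Build chat messages: instructions plus top-k retrieved paragraphs, then the question."""
    # ranked holds (score, index) pairs sorted best-first, as find_most_similar returns.
    context = "\n".join(paragraphs[index] for _score, index in ranked[:k])
    return [
        {"role": "system", "content": system_prompt + context},
        {"role": "user", "content": question},
    ]


paragraphs = ["Para A", "Para B", "Para C"]
ranked = [(0.9, 2), (0.7, 0), (0.1, 1)]
messages = build_messages("Context:\n", paragraphs, ranked, "Who?", k=2)
print(messages[0]["content"])
```

The resulting list can be passed as `messages=messages` to `ollama.chat(model="llama3.1:latest", ...)`, exactly as the notebook does inline.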
In [14]:
# Straightforward question
prompt="Enlist the achievements of Sachin Tendulkar"
rag_function(prompt)



Here are the achievements of Sachin Tendulkar mentioned in the context:

* Hailed as one of the greatest batsmen in the history of cricket
* All-time highest run-scorer in both ODI and Test cricket with more than 18,000 runs and 15,000 runs respectively
* Holds the record for receiving the most player of the match awards in international cricket
* Member of Parliament, Rajya Sabha by presidential nomination from 2012 to 2018
* Received several awards from the government of India:
	+ Arjuna Award (1994)
	+ Khel Ratna Award (1997)
	+ Padma Shri (1998)
	+ Padma Vibhushan (2008)
* First sportsperson to receive the Bharat Ratna, India's highest civilian award
* Included in Time's annual list of the most influential people in the world in 2010
* Awarded the Sir Garfield Sobers Trophy for cricketer of the year at the 2010 International Cricket Council (ICC) Awards

He also achieved many milestones in his cricket career, including:

* Playing his 200th Test match in November 2013
* Represent

In [15]:
#Reasoning RAG

prompt="Who is better batsman Sachin or Sourav?"
rag_function(prompt)



This question cannot be definitively answered based on the provided context. The text mentions both Sachin Tendulkar and Sourav Ganguly as being considered among the greatest cricketers of all time, but it does not directly compare their batting abilities. 

However, it is worth noting that Sachin Tendulkar holds several records in cricket, including being the all-time highest run-scorer in both ODI and Test cricket.


Execution Time 56.934632778167725 
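The `Execution Time` printed after each answer (roughly 53 to 74 seconds across these runs) covers everything after `start_time`: re-reading `data.txt`, loading the cached embeddings, embedding the question, and generating the answer, so on its own it does not show which stage dominates. A small timing helper would separate them; this sketch uses stand-in workloads instead of the real Ollama calls, and `timed` and `timings` are illustrative names, not part of the notebook:

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


with timed("retrieval"):
    sum(i * i for i in range(100_000))  # stand-in for embedding the query and ranking chunks
with timed("generation"):
    time.sleep(0.05)  # stand-in for the ollama.chat(...) call

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

In `rag_function`, the `ollama.embeddings(...)` and `find_most_similar(...)` lines would sit under one `with timed(...)` block and the `ollama.chat(...)` call under another. `time.perf_counter` is used because it is the high-resolution clock Python provides for measuring short intervals.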


In [16]:
#Reasoning RAG

prompt="Who is better Captain Dhoni or Sourav?"
rag_function(prompt)




I don't know. The text does not compare the performances of MSDhoni and Sourav Ganguly as captains or provide a direct comparison of their leadership skills or records. However, it mentions that Dhoni has captained the most international matches and is the most successful Indian captain, having led India to victory in several ICC tournaments.


Execution Time 53.167704820632935 


In [17]:
#Analytical RAG

prompt="Difference between total runs scored by Sachin and Dhoni"
rag_function(prompt)



According to the context, Sachin Tendulkar has scored more than 18,000 runs in ODI cricket and over 15,000 runs in Test cricket.

Mahendra Singh Dhoni, on the other hand, has scored 17,266 runs in international cricket.

The difference between their total runs scored is: 

More than 18,000 (Sachin) - 17,266 (Dhoni) = approximately 734 runs


Execution Time 57.58685517311096 
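The answer above subtracts Dhoni's all-format international total from Sachin's ODI runs alone, so its 734-run figure compares unlike quantities. Using only the numbers quoted in that answer, and treating "more than 18,000" and "over 15,000" as lower bounds, a comparison that counts both of Sachin's quoted formats works out as follows (Dhoni's 17,266 covers all international formats, while Sachin's figure here is ODI plus Test only):

```python
# Figures as quoted in the model's answer; Sachin's are lower bounds ("more than").
sachin_odi_min = 18_000
sachin_test_min = 15_000
dhoni_international = 17_266

sachin_total_min = sachin_odi_min + sachin_test_min
difference_min = sachin_total_min - dhoni_international
print(f"Sachin (ODI + Test) >= {sachin_total_min:,}")  # Sachin (ODI + Test) >= 33,000
print(f"Difference >= {difference_min:,} runs")        # Difference >= 15,734 runs
```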


In [18]:
#Analytical RAG

prompt="Who has most accolades Dhoni or sachin?"
rag_function(prompt)



Based on the provided context, it appears that both Dhoni and Sachin have a significant number of accolades. However, Dhoni seems to have more honors and awards:

* Dhoni was awarded India's highest sports honor, Major Dhyan Chand Khel Ratna Award (2008)
* He received the Padma Shri (2009) and Padma Bhushan (2018), which are among the top civilian awards in India
* He holds an honorary rank of Lieutenant Colonel in the Parachute Regiment of the Indian Territorial Army (2011)

In contrast, Sachin is described as "one of the greatest batsmen in the history of cricket" and has achieved many impressive records, but there is no mention of him receiving similar high-level honors or awards.

Therefore, based on this context, it seems that Dhoni may have more accolades, but it's essential to note that Sachin's achievements are also incredible, making him one of the greatest cricketers of all time.


Execution Time 73.0515239238739 


In [19]:
#Analytical RAG

prompt="Amongst three, who is the best player in World cups?"
rag_function(prompt)



Based on the provided context, it can be inferred that Sachin Tendulkar and MS Dhoni have been part of several World Cup-winning teams for India. However, only one team led by Mahendra Singh Dhoni has won the Cricket World Cup (2011), while Sachin Tendulkar was a member of the Indian squad but the team did not win in his presence during six appearances.

Sourav Ganguly is also mentioned as being part of the team that reached the final of the 2003 Cricket World Cup under his captaincy, however, India lost the match.


Execution Time 66.65720200538635 


In [20]:
#Analytical RAG

prompt="Amongst three, who has most awards?"
rag_function(prompt)



Based on the context provided, Sachin Tendulkar and Dhoni are not mentioned together for award comparisons. However, Ganguly is not in this category either.

But considering Dhoni's achievements, he received the Major Dhyan Chand Khel Ratna Award (2008), Padma Shri (2009), and Padma Bhushan (2018). 

Tendulkar, on the other hand, has received several awards from the government of India: the Arjuna Award (1994), the Khel Ratna Award (1997), the Padma Shri (1998), and the Padma Vibhushan (2008), along with the Bharat Ratna (2013).

It's not clear how many or what awards each person has received. However, based on the information provided, Tendulkar seems to have received more individual awards from the government of India.


Execution Time 64.5874879360199 


In [21]:
#Summary RAG

prompt="Give me summary of Dhoni's career, in less than 200 words?"
rag_function(prompt)



Mahendra Singh Dhoni is an Indian professional cricketer who played as a right-handed batter and wicket-keeper. He made his first-class debut for Bihar in 1999 and debuted for the Indian cricket team on December 23, 2004. Dhoni became the captain of the ODI side in 2007 and all formats by 2008, leading India to several victories including the 2011 Cricket World Cup and the Asia Cup.

Dhoni played for Chennai Super Kings in the IPL, leading them to the final on ten occasions and winning it five times. He also led CSK to two Champions League T20 titles. Dhoni retired from test cricket in 2014 but continued playing in limited overs cricket till 2019. He holds several records, including being the first wicketkeeper to score over 5,000 runs in the IPL.

Throughout his career, Dhoni received numerous awards, including the Major Dhyan Chand Khel Ratna Award and Padma Shri, India's highest sport honors.


Execution Time 74.0443868637085 


In [22]:
#Summary RAG

prompt="In about 150 words, give an overview of Sourav's career"
rag_function(prompt)



Sourav Ganguly is a former Indian cricketer and current commentator. He is widely regarded as one of India's most successful cricket captains. As captain, he led the Indian national team to win the 2002 ICC Champions Trophy and reach the final of the 2003 Cricket World Cup, among other notable achievements. His leadership skills earned him the nickname "Dada" (meaning "elder brother" in Bengali) and "Maharaja of Indian Cricket." He was known for his aggressive batting style and ability to motivate his team. After retiring from international cricket, Ganguly went on to become a successful commentator and has been involved in various cricket-related endeavors. Despite being retired from the game, he remains a beloved figure in Indian cricketing circles, admired for his passion and dedication to the sport.


Execution Time 73.98977732658386 


In [23]:
#Analytical RAG

prompt="Enlist the total runs scored by all three players"
rag_function(prompt)



Based on the context provided, here are the total runs scored by each player:

1. Sachin Tendulkar: He is the all-time highest run-scorer in both ODI and Test cricket with more than 18,000 runs in ODI and 15,000 runs in Test cricket.
2. Ganguly: He scored 11,363 runs in his ODI career which stands at ninth position in the world for most runs scored in ODI matches.
3. Dhoni: He has scored 17,266 runs in international cricket including 10,000 plus runs at an average of more than 50 in ODIs.

Let me add them up for you:

18,000 (Sachin's ODI runs) + 11,363 (Ganguly's ODI runs) + 17,266 (Dhoni's international runs) = 46,629

Note: The context doesn't provide the exact split of Dhoni's runs between ODIs and Tests, so I've used his total international runs to calculate the combined total.


Execution Time 72.41770458221436 
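The combined figure of 46,629 adds only Sachin's ODI runs while taking Dhoni's total across all formats, and it leaves out the Sachin Test runs that the same answer lists. A sketch that keeps each player's quoted figures by format, sums what is available, and flags what the retrieved context did not supply (the figures are exactly those in the answer; the dictionary layout is an illustrative choice):

```python
# Per-player figures as quoted in the answer above (None = not given in the context).
runs = {
    "Sachin": {"odi": 18_000, "test": 15_000},  # lower bounds ("more than")
    "Ganguly": {"odi": 11_363, "test": None},   # Test runs not quoted
    "Dhoni": {"international": 17_266},         # all formats combined
}

known_total = sum(v for player in runs.values() for v in player.values() if v is not None)
missing = [
    f"{name} {fmt}" for name, player in runs.items() for fmt, v in player.items() if v is None
]
print(f"Combined (lower bound): {known_total:,}")  # Combined (lower bound): 61,629
print("Not available from context:", missing)     # ['Ganguly test']
```

Because Sachin's figures are lower bounds and Ganguly's Test runs are missing, 61,629 is itself only a lower bound on the three players' combined international runs.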
