##### Install necessary packages for working with Pinecone, sentence embeddings, and data manipulation

In [1]:
!pip install pinecone-client sentence-transformers pandas
!pip install groq






##### Import the libraries for Pinecone, sentence embeddings, data manipulation, and Groq

In [2]:
import os
from pinecone import Pinecone, ServerlessSpec
from sentence_transformers import SentenceTransformer
import pandas as pd
from groq import Groq






This cell connects to the Pinecone vector database and selects the index named `nyd`. If the index does not exist yet, it is created as a serverless index on AWS (`us-east-1`) with 384 dimensions and cosine similarity as the metric. The cell then opens a handle to the index and prints a confirmation message.

In short, the cell makes sure the index that stores and serves the embeddings is available.

In [3]:
# Initialize Pinecone; the API key is read from the PINECONE_API_KEY environment variable
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])  # Never hard-code your API key in the notebook

index_name = "nyd"
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=384,        # Must match the embedding size (384 for all-MiniLM-L6-v2)
        metric='cosine',      # Use cosine similarity
        spec=ServerlessSpec(
            cloud='aws',
            region='us-east-1'
        )
    )
index = pc.Index(index_name)
print(f"Connected to index: {index_name}")

Connected to index: nyd
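
To make the `metric='cosine'` setting more concrete, here is a minimal sketch of what cosine similarity computes for two vectors. It uses plain Python lists and a hypothetical helper named `cosine_similarity`; Pinecone computes the score on the server side, so this is only an illustration and not the service's implementation.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors; the notebook's real embeddings have 384 dimensions.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(round(same_direction, 6), round(orthogonal, 6))
```

Vectors that point in the same direction score close to 1 regardless of their length, which is why cosine similarity is a common choice for comparing sentence embeddings.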


In [4]:
# Load Sentence Transformer model
model = SentenceTransformer('all-MiniLM-L6-v2')

## This cell uploads the dataset to Pinecone again
### Please do not run this cell unless you intend to re-upload the data


In [5]:
# Load and preprocess the dataset
df = pd.read_csv("total_dataset.csv")
df['Combined_Questions'] = df['question'].apply(lambda x: " ".join(x.split("?")).strip())
embeddings = model.encode(df['Combined_Questions'].tolist(), convert_to_numpy=True)

# Upload embeddings and metadata to Pinecone
for i, row in df.iterrows():
    metadata = {
        "answer": row['answer'],
        "chapter": row['chapter'],
        "verse": row['verse'],
        "sanskrit": row['sanskrit']
    }
    index.upsert([(str(i), embeddings[i].tolist(), metadata)])
print("Embeddings with metadata uploaded to Pinecone!")

Embeddings with metadata uploaded to Pinecone!
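
The upload loop sends one request per row. A common alternative is to group the vectors into batches before calling `index.upsert`. The sketch below shows only the grouping step, using a hypothetical helper named `chunked`; the batch size of 100 is an assumption, and the upsert call is left as a comment because it needs a live index.

```python
from itertools import islice

def chunked(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# Stand-in records shaped like the (id, vector, metadata) tuples used in the upload loop.
records = [(str(i), [0.0, 0.0, 0.0], {"chapter": 1, "verse": i}) for i in range(250)]

batch_sizes = []
for batch in chunked(records, batch_size=100):
    # With a live index, this is where index.upsert(batch) would go.
    batch_sizes.append(len(batch))

print(batch_sizes)
```

Batching reduces the number of network round trips; the right batch size depends on vector dimension and metadata size, so treat the value here as a starting point.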


In [6]:
# Function to retrieve answers and metadata from Pinecone
def retrieve_answer(input_question, top_k=3):
    query_embedding = model.encode(input_question, convert_to_numpy=True)
    result = index.query(
        vector=query_embedding.tolist(),
        top_k=top_k,
        include_metadata=True
    )
    answers = []
    for match in result['matches']:
        metadata = match['metadata']
        answers.append({
            "score": match['score'],
            "answer": metadata['answer'],
            "chapter": metadata.get('chapter', 'N/A'),
            "verse": metadata.get('verse', 'N/A'),
            "sanskrit": metadata.get('sanskrit', 'N/A')
        })
    return answers
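
`retrieve_answer` returns the `top_k` matches regardless of how similar they are to the question. One possible refinement is to drop low-scoring matches before they are passed to the language model. The sketch below assumes the list-of-dicts shape returned by `retrieve_answer`; the helper `filter_by_score` and the 0.5 threshold are hypothetical choices, not part of the notebook.

```python
def filter_by_score(answers, min_score=0.5):
    """Keep only matches whose similarity score is at least min_score."""
    return [item for item in answers if item["score"] >= min_score]

# Sample matches in the same shape that retrieve_answer returns (values are made up).
sample_answers = [
    {"score": 0.53, "answer": "first", "chapter": 11, "verse": 2, "sanskrit": "N/A"},
    {"score": 0.52, "answer": "second", "chapter": 7, "verse": 1, "sanskrit": "N/A"},
    {"score": 0.31, "answer": "third", "chapter": 7, "verse": 2, "sanskrit": "N/A"},
]

kept = filter_by_score(sample_answers, min_score=0.5)
print([item["answer"] for item in kept])
```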

In [7]:
# Initialize Groq client for Llama; the API key is read from the GROQ_API_KEY environment variable
client = Groq(api_key=os.environ["GROQ_API_KEY"])  # Never hard-code your API key in the notebook

In [8]:
def answer_query_from_llama(query):
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": f"You are an assistant specialized in answering questions strictly based on the Bhagavad Gita and the Patanjali Yoga Sutra. Provide the book name, chapter, verse, Sanskrit text, and a detailed answer to the following question:\n{query}\nIf the query is not related to these texts, reply with 'none' and no extra words."
            }
        ],
        model="llama-3.3-70b-versatile",
    )
    return chat_completion.choices[0].message.content

In [9]:
# Llama Query Refinement
def refine_query_with_llama(query, retrieved_info):
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": f"You are an assistant specializing in refining queries for better retrieval only in the Bhagavad Gita and the Patanjali Yoga Sutra. "
                           f"Original query: '{query}'\n"
                           f"Retrieved information:\n{retrieved_info}\n"
                           "Refine the query to include specific details for improved results. "
                           "If the query is not related to the Bhagavad Gita or the Patanjali Yoga Sutra, return it unchanged. "
                           "If the query is already precise, return it unchanged. Refined query:"
            }
        ],
        model="llama-3.3-70b-versatile",
    )
    return chat_completion.choices[0].message.content.strip()

In [10]:
# Llama Final Response Generation
def generate_final_response_with_llama(query, retrieved_info, llm_retrieved):
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": f"You are an expert at combining information to generate detailed answers. "
                           f"Original query: '{query}'\n"
                           f"Retrieved information from semantic search:\n{retrieved_info}\n"
                           f"Retrieved information from Llama:\n{llm_retrieved}\n"
                           "Provide the book name, chapter, verse, Sanskrit text, and translation if the query relates directly to the Sanskrit text.\n"
                           "If the query is not directly related to the Bhagavad Gita or the Patanjali Yoga Sutra, return 'This question is not directly related to the Bhagavad Gita or the Patanjali Yoga Sutra' with no extra words.\n"
                           "Do not describe how you processed this context in the answer.\n"
                           "Using all the provided context, generate a complete, accurate, and concise answer.\n"
                           "Provide the output as a JSON object with double quotes around keys and string values:\n\n"
                           '{\n  "Book_name": <Book_name>,\n  "chapter": <chapter_number>,\n  "verse": <verse_number>,\n  "sanskrit": <sanskrit_text>,\n  "Translation": <Translation>,\n  "answer": <detailed_answer>\n}\n\n'

            }
        ],
        model="llama-3.3-70b-versatile",
    )
    return chat_completion.choices[0].message.content.strip()

## Generating answers in JSON

In [11]:
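def final_resp(query_test):
  """Run retrieval, the direct Llama answer, query refinement, and final generation for one question."""
  user_query = query_test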

  # Step 1: Retrieve answer from Pinecone
  semantic_results = retrieve_answer(user_query, top_k=3)

  retrieved_info = "\n".join([
      f"Score: {item['score']}, Answer: {item['answer']}, Chapter: {item['chapter']}, Verse: {item['verse']}, Sanskrit: {item['sanskrit']}"
      for item in semantic_results
  ])

  # Step 2: Retrieve answer from Llama
  llm_result = answer_query_from_llama(user_query)

  # Step 3: Refine the query with Llama
  refined_query = refine_query_with_llama(user_query, retrieved_info)

  # Step 4: Generate final response using Llama
  final_response = generate_final_response_with_llama(refined_query, retrieved_info, llm_result)

  # Display the results
  print("=====================================================")
  print("Semantic Search Results:")
  print(retrieved_info)
  print("-----------------------------------------------------")
  print(llm_result)
  print("=====================================================")
  print(f"Refined Query: {refined_query}")
  print("-----------------------------------------------------")
  print(f"Final Response:\n{final_response}")
  print("=====================================================")
  return final_response
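
The string returned by `final_resp` is expected to be a JSON object, but a language model may wrap the object in a code fence or add text around it, in which case `json.loads` raises an error. A hedged workaround is to extract the outermost `{...}` span before parsing. The helper name `extract_json_object` is hypothetical, and this simple approach assumes the reply contains at most one JSON object.

```python
import json

def extract_json_object(text):
    """Parse the span from the first '{' to the last '}' as JSON; return None on failure."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None

# A made-up reply that wraps the JSON object in extra text.
reply = 'Here is the answer:\n{"Book_name": "Bhagavad Gita", "chapter": 7, "verse": 1}\nHope this helps.'
parsed = extract_json_object(reply)
print(parsed)
print(extract_json_object("none"))
```

Returning `None` instead of raising lets the caller decide whether to retry the request or save the raw text for inspection.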


In [15]:
import json
import os

query_test = input("Enter the Question: ")
# Get the response string from the final_resp function
final_dic = final_resp(query_test)
output_folder = "output_folder"
# Create the output folder if it does not exist yet
os.makedirs(output_folder, exist_ok=True)

try:
  # Convert the JSON string to a Python dictionary
  parsed_dict = json.loads(final_dic)

  # Save the parsed dictionary as a JSON file
  json_filename = f"{output_folder}/output_.json"
  with open(json_filename, "w", encoding="utf-8") as json_file:
      json.dump(parsed_dict, json_file, ensure_ascii=False, indent=4)

  print(f"Saved JSON file for this question: {json_filename}")
except json.JSONDecodeError as e:
  print(f"Error decoding JSON for this question: {e}")


Semantic Search Results:
Score: 0.529354334, Answer: The origin and destruction of beings have been heard in detail from You, O lotus-eyed Lord, and also Your inexhaustible greatness., Chapter: 11.0, Verse: 2.0, Sanskrit: भवाप्ययौ हि भूतानां श्रुतौ विस्तरशो मया| त्वत्तः कमलपत्राक्ष माहात्म्यमपि चाव्ययम् || 11.2 || 
Score: 0.519790769, Answer: The Blessed Lord said, "O Arjuna, hear how you shall, without doubt, know Me fully, with your mind intent on Me, practicing Yoga and taking refuge in Me.", Chapter: 7.0, Verse: 1.0, Sanskrit: मय्यासक्तमनाः पार्थ योगं युञ्जन्मदाश्रयः| असंशयं समग्रं मां यथा ज्ञास्यसि तच्छृणु || 7.1 || 
Score: 0.513937712, Answer: I will declare to you in full this knowledge combined with realization, after knowing which nothing else remains to be known here., Chapter: 7.0, Verse: 2.0, Sanskrit: ज्ञानं तेऽहं सविज्ञानमिदं वक्ष्याम्यशेषतः| यज्ज्ञात्वा नेह भूयोऽन्यज्ज्ञातव्यमवशिष्यते || 7.2 || 
-----------------------------------------------------
Bookname: Bhagavad Git