# RAG Tutorial with Groclake
Description:
This complete, end-to-end tutorial demonstrates how to create an Agentic Retrieval-Augmented Generation (RAG) system using Groclake. The process involves managing documents in DataLake, generating vectors for documents, performing vector searches, enriching search results, and utilizing ModelLake to provide contextual, AI-assisted responses. Each step is designed to showcase the capabilities of Groclake in creating a fully functional Agentic RAG system.

Groclake Documentation: https://plotch-ai.gitbook.io/groclake-by-plotch.ai

VectorLake is a vector-centric infrastructure that lets developers create embedding vectors quickly, store them, and build RAG applications.

DataLake is a data warehouse for storing various types of structured and unstructured documents and records. Using DataLake, developers can store PDFs, Word documents, Excel sheets, Google Sheets, plain text, and similar files for RAG-based applications.

ModelLake is an infrastructure pipeline for LLM-based operations such as chat completion, language translation, automatic speech recognition, text-to-speech, speech-to-text, and speech-to-speech.
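As a rough illustration of what a ModelLake chat-completion request looks like, the sketch below builds the kind of payload that is passed to `chat_complete` later in this tutorial: a `messages` list of role/content pairs plus a `token_size` value. The helper name `build_chat_payload` and its default are assumptions made for illustration; they are not part of the groclake library.

```python
def build_chat_payload(system_prompt, user_prompt, token_size=7000):
    """Build a chat payload dict (hypothetical helper, not a groclake API)."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "token_size": token_size,
    }


payload = build_chat_payload(
    "You are a helpful assistant.",
    "Summarize the uploaded document.",
)
print(payload["messages"][1]["content"])
```

A dictionary of this shape is what the quiz bot further down hands to `ModelLake().chat_complete(payload)`.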

# Step 1: Install the Required Library
First, install the groclake library, which is used to manage data, vectors, and models.

In [None]:
!pip install groclake

Collecting groclake
  Downloading groclake-0.1.14-py3-none-any.whl.metadata (83 bytes)
Downloading groclake-0.1.14-py3-none-any.whl (10.0 kB)
Installing collected packages: groclake
Successfully installed groclake-0.1.14


# Step 2: Set Environment Variables
Set up the API key and account ID for authenticating with the Groclake service. These are stored as environment variables to simplify access throughout the script.

In [None]:
import os

# Set API key and account ID.
# Replace the placeholders with your own credentials; avoid sharing real keys in a notebook.
GROCLAKE_API_KEY = 'your-groclake-api-key'
GROCLAKE_ACCOUNT_ID = 'your-groclake-account-id'

# Set them as environment variables
os.environ['GROCLAKE_API_KEY'] = GROCLAKE_API_KEY
os.environ['GROCLAKE_ACCOUNT_ID'] = GROCLAKE_ACCOUNT_ID

print("Environment variables set successfully.")


Environment variables set successfully.
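Hard-coding credentials in a notebook makes them easy to leak. As a minimal sketch of an alternative, assuming the two variables were already exported in the shell or injected by a secrets manager before the notebook started, the credentials can be read and validated instead of assigned. The helper `load_groclake_credentials` is hypothetical and not part of the groclake library.

```python
import os


def load_groclake_credentials(env=None):
    """Return (api_key, account_id) from a mapping; raise if either is missing.

    Hypothetical helper for illustration, not a groclake API.
    """
    env = os.environ if env is None else env
    names = ("GROCLAKE_API_KEY", "GROCLAKE_ACCOUNT_ID")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["GROCLAKE_API_KEY"], env["GROCLAKE_ACCOUNT_ID"]


# Example with an explicit mapping of placeholder values instead of the real environment
api_key, account_id = load_groclake_credentials(
    {"GROCLAKE_API_KEY": "example-key", "GROCLAKE_ACCOUNT_ID": "example-account"}
)
print(account_id)
```

Calling `load_groclake_credentials()` with no argument reads from `os.environ`, so a missing variable fails early with a clear message rather than later inside an API call.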


In [None]:
import os
from groclake.vectorlake import VectorLake
from groclake.datalake import DataLake
from groclake.modellake import ModelLake
import random

class SarcasticQuizBot:
    def __init__(self):
        # Reuse the credentials stored as environment variables in Step 2
        self.GROCLAKE_API_KEY = os.environ['GROCLAKE_API_KEY']
        self.GROCLAKE_ACCOUNT_ID = os.environ['GROCLAKE_ACCOUNT_ID']
        self.setup_environment()
        self.initialize_lakes()

        # Sarcastic responses for wrong answers
        self.roasts = [
            "Wow, that's impressively wrong! Did you even read the question?",
            "Oh honey... Maybe try reading a book sometime?",
            "That's about as correct as a chocolate teapot is useful.",
            "Amazing! Every word in that answer was wrong.",
            "Did you pick that answer with your eyes closed?",
            "I'm not saying you're wrong, but... actually, yes, I am saying that.",
            "Congratulations! That's the wrongest answer I've seen all day!",
            "Even a random guess would have been better. Impressive!"
        ]

        # Add some sarcastic score comments
        self.score_comments = {
            0: "Wow, a perfect zero! That's actually harder than getting them all right!",
            1: "One right answer... did you get lucky or was that your one brain cell working overtime?",
            2: "Two correct! Your knowledge is as shallow as a puddle in the desert.",
            3: "Three right! You're climbing up from 'totally clueless' to just 'mostly clueless'.",
            4: "Four correct. Are you even trying, or is this your natural talent?",
            5: "Half right! Perfectly balanced between knowledge and ignorance.",
            6: "Six correct! Starting to show signs of intelligence... barely.",
            7: "Seven! Not bad... for a beginner who studied in their sleep.",
            8: "Eight right! Almost impressive, if I had lower standards.",
            9: "Nine correct! So close to perfection, yet so far.",
            10: "Perfect score! Who helped you? I refuse to believe you did this alone."
        }

    def setup_environment(self):
        os.environ['GROCLAKE_API_KEY'] = self.GROCLAKE_API_KEY
        os.environ['GROCLAKE_ACCOUNT_ID'] = self.GROCLAKE_ACCOUNT_ID

    def initialize_lakes(self):
        try:
            self.vectorlake = VectorLake()
            vector_create = self.vectorlake.create()
            self.vectorlake_id = vector_create["vectorlake_id"]

            self.datalake = DataLake()
            datalake_create = self.datalake.create()
            self.datalake_id = datalake_create["datalake_id"]

            print("Lakes initialized successfully!")
        except Exception as e:
            print(f"Failed to initialize lakes: {str(e)}")
            raise

    def push_document(self, document_url):
        try:
            payload_push = {
                "datalake_id": self.datalake_id,
                "document_type": "url",
                "document_data": document_url
            }
            data_push = self.datalake.push(payload_push)
            return data_push["document_id"]
        except Exception as e:
            print(f"Error pushing document: {str(e)}")
            raise

    def generate_question(self, topic, question_number):
        try:
            # Modified prompt to emphasize the need for exactly 4 options
            prompt = (
                f"Generate a challenging multiple choice question #{question_number} about {topic}. "
                "The question MUST have EXACTLY 4 options labeled A, B, C, and D. "
                "Make sure it's different from previous questions.\n\n"
                "Format your response exactly like this:\n"
                "Question: [Your question here]\n"
                "A) [First option]\n"
                "B) [Second option]\n"
                "C) [Third option]\n"
                "D) [Fourth option]\n"
                "Correct: [A, B, C, or D]"
            )

            payload = {
                "messages": [
                    {
                        "role": "system",
                        "content": "You are a quiz bot that generates multiple-choice questions. "
                                 "Always provide exactly 4 options (A, B, C, D) for each question."
                    },
                    {"role": "user", "content": prompt}
                ],
                "token_size": 7000
            }

            response = ModelLake().chat_complete(payload)
            question_data = self._parse_question(response["answer"])

            # Validate the parsed question has exactly 4 options
            if len(question_data["options"]) != 4:
                # If we don't get 4 options, generate default ones
                question_data["options"] = [
                    f"A) Option 1 for {topic}",
                    f"B) Option 2 for {topic}",
                    f"C) Option 3 for {topic}",
                    f"D) Option 4 for {topic}"
                ]
                question_data["correct_answer"] = "A"

            return question_data

        except Exception as e:
            print(f"Error generating question: {str(e)}")
            raise

    def _parse_question(self, response):
        # Split the response into lines and remove empty lines
        lines = [line.strip() for line in response.split("\n") if line.strip()]

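        # Extract question (handling both with and without "Question:" prefix);
        # an empty response yields an empty question instead of an IndexError
        question = lines[0] if lines else ""
        if question.startswith("Question:"):
            question = question[len("Question:"):].strip()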

        # Extract options ensuring we get exactly 4
        options = []
        for line in lines[1:5]:  # Look at next 4 lines after question
            if line.startswith(("A)", "B)", "C)", "D)")):
                options.append(line)

        # Extract correct answer, keeping only the letter
        # (e.g. "B" from "Correct: B) Second option")
        correct_answer = None
        for line in lines:
            if line.startswith("Correct:"):
                correct_answer = line.split(":", 1)[1].strip()[:1].upper()
                break

        # Validate correct answer is one of A, B, C, or D
        if correct_answer not in ["A", "B", "C", "D"]:
            correct_answer = "A"  # Default to A if invalid

        return {
            "question": question,
            "options": options,
            "correct_answer": correct_answer
        }

    def run_quiz(self, topic, num_questions=10):
        print(f"\nAlright, prepare to be humiliated in {num_questions} questions about {topic}!")
        print("=" * 50)

        correct_answers = 0

        for i in range(num_questions):
            print(f"\nQuestion {i+1} of {num_questions}")
            print("-" * 30)

            question_data = self.generate_question(topic, i+1)
            print(question_data["question"])
            for option in question_data["options"]:
                print(option)

            while True:
                answer = input("\nYour answer (A/B/C/D): ").upper()
                if answer in ['A', 'B', 'C', 'D']:
                    break
                print("Really? It's not that complicated. Just pick A, B, C, or D!")

            if answer == question_data["correct_answer"]:
                print("\nCorrect! Who would've thought you had it in you!")
                correct_answers += 1
            else:
                roast = random.choice(self.roasts)
                print(f"\n{roast}")
                print(f"The correct answer was {question_data['correct_answer']}.")

            print(f"\nCurrent score: {correct_answers}/{i+1}")

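        # Final score and comment
        final_score = correct_answers
        print("\n" + "=" * 50)
        print(f"\nFinal Score: {final_score}/{num_questions}")

        # The comments and thresholds assume a 10-point scale, so rescale the
        # score to avoid a KeyError when num_questions is not 10.
        scaled_score = round(10 * final_score / num_questions) if num_questions else 0
        print(self.score_comments[scaled_score])

        if scaled_score < 5:
            print("Maybe try reading a book... or at least the back of a cereal box?")
        elif scaled_score < 8: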
            print("Not terrible, but not good either. Story of your life?")
        else:
            print("I hate to admit it, but you might actually know something about this topic.")

def main():
    quiz_bot = SarcasticQuizBot()

    # Example usage
    document_url = "https://drive.google.com/uc?id=1PnGGUo9vpwyKpQe1lUW1N4An9l39xf9I"
    quiz_bot.push_document(document_url)

    while True:
        topic = input("\nWhat topic would you like to be roasted about? (or 'quit' to exit): ")
        if topic.lower() == 'quit':
            break

        # Run a 10-question quiz on the chosen topic
        quiz_bot.run_quiz(topic)

        play_again = input("\nWant another round of humiliation? (yes/no): ").lower()
        if play_again != 'yes':
            print("\nProbably for the best. Your ego couldn't take much more anyway.")
            break

if __name__ == "__main__":
    main()

Lakes initialized successfully!

What topic would you like to be roasted about? (or 'quit' to exit): kite

Alright, prepare to be humiliated in 10 questions about kite!

Question 1 of 10
------------------------------
In aerodynamics, what is the term used to describe the force that allows a kite to lift off and stay in the air?
A) Gravity
B) Friction
C) Lift
D) Thrust


KeyboardInterrupt: Interrupted by user
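The quiz bot depends on the model replying in the exact `Question:`, `A)` to `D)`, `Correct:` layout requested in the prompt. The standalone sketch below mirrors the parsing idea behind `_parse_question` on a hand-written sample reply, so the expected format can be checked without calling ModelLake. The function name `parse_quiz_reply` and the sample text are made up for illustration.

```python
def parse_quiz_reply(reply):
    """Parse a reply in the 'Question / A)-D) / Correct' layout into a dict."""
    lines = [line.strip() for line in reply.split("\n") if line.strip()]

    question = lines[0] if lines else ""
    if question.startswith("Question:"):
        question = question[len("Question:"):].strip()

    # Keep the lines right after the question that start with a recognised label
    labels = ("A)", "B)", "C)", "D)")
    options = [line for line in lines[1:5] if line.startswith(labels)]

    # Keep only the answer letter; fall back to "A" when it is missing or invalid
    correct = "A"
    for line in lines:
        if line.startswith("Correct:"):
            letter = line.split(":", 1)[1].strip()[:1].upper()
            if letter in ("A", "B", "C", "D"):
                correct = letter
            break

    return {"question": question, "options": options, "correct_answer": correct}


sample_reply = """Question: Which force keeps a kite in the air?
A) Gravity
B) Friction
C) Lift
D) Thrust
Correct: C"""

parsed = parse_quiz_reply(sample_reply)
print(parsed["question"])        # Which force keeps a kite in the air?
print(parsed["correct_answer"])  # C
```

Defaulting to "A" follows the behaviour of the class above; a stricter variant could instead signal the failure and ask the model to regenerate the question.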

# Step 3: Initialize VectorLake and DataLake
Create instances of VectorLake and DataLake. These are the core components for managing vectors and data.