## Preparing the Work Environment

In [None]:
!pip install pypdf langchain langchain_google_genai langchain_community

Collecting pypdf
  Downloading pypdf-5.0.0-py3-none-any.whl.metadata (7.4 kB)
Collecting langchain
  Downloading langchain-0.3.0-py3-none-any.whl.metadata (7.1 kB)
Collecting langchain_google_genai
  Downloading langchain_google_genai-2.0.0-py3-none-any.whl.metadata (3.9 kB)
Collecting langchain_community
  Downloading langchain_community-0.3.0-py3-none-any.whl.metadata (2.8 kB)
Collecting langchain-core<0.4.0,>=0.3.0 (from langchain)
  Downloading langchain_core-0.3.1-py3-none-any.whl.metadata (6.2 kB)
Collecting langchain-text-splitters<0.4.0,>=0.3.0 (from langchain)
  Downloading langchain_text_splitters-0.3.0-py3-none-any.whl.metadata (2.3 kB)
Collecting langsmith<0.2.0,>=0.1.17 (from langchain)
  Downloading langsmith-0.1.121-py3-none-any.whl.metadata (13 kB)
Collecting tenacity!=8.4.0,<9.0.0,>=8.1.0 (from langchain)
  Downloading tenacity-8.5.0-py3-none-any.whl.metadata (1.2 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain_community)
  Downloading dataclasses_json-0.6.

## Loading Libraries

In [None]:
import os
from typing import List, Union
from pypdf import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import PromptTemplate
from langchain_google_genai import GoogleGenerativeAI
from langchain.chains import LLMChain
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

In [None]:
# Set up the API key (read from Colab's user secrets)
from google.colab import userdata
os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY')

# Initialize the Google model
llm = GoogleGenerativeAI(model="gemini-pro", temperature=0.7)

## PDF Preprocessing

In [None]:
def get_pdf_text(pdf_docs):
    """Return the concatenated text of every page of every PDF in pdf_docs."""
    pdf_text = ""
    for pdf in pdf_docs:
        pdf_reader = PdfReader(pdf)
        for page in pdf_reader.pages:
            pdf_text += page.extract_text()
    return pdf_text

# Test
pdf_docs = ["/content/Introduction to Networking.pdf"]
get_pdf_text(pdf_docs)

"What\nis\nNetworking?\nDefinition\n:\nNetworking,\nin\nthe\ncontext\nof\ninformation\ntechnology ,\nrefers\nto\nthe\npractice\nof \nconnecting\ncomputers\nand\nother\ndevices\nto\nshare\nresources,\nexchange\ndata,\nand\ncommunicate \nwith\none\nanother .\nThese\nconnections\ncan\nbe\nmade\nvia\nwired\nor\nwireless\nmeans,\nand\nthey\ncan\nspan \nvarious\ngeographical\nareas,\nfrom\nsmall\nlocal\nnetworks\nto\nvast\nglobal\nsystems\nlike\nthe\nInternet.\nPurpose\n:\nThe\nprimary\npurpose\nof\nnetworking\nis\nto\nfacilitate\nresource\nsharing,\ncommunication, \nand\ndata\nmanagement\nacross\nmultiple\ndevices.\nNetworking\nenables\nusers\nto\naccess\nand\nshare \nresources\nsuch\nas\nfiles,\nprinters,\nand\ninternet\nconnections,\nimproving\nefficiency\nand \ncollaboration.\nKey\nComponents\n:\n1.\nHardwar e\n: \n○\nComputers\nand\nDevices\n:\nEndpoints\nin\nthe\nnetwork,\nsuch\nas\nPCs,\nservers, \nsmartphones,\nand\nIoT\ndevices. \n○\nNetworking\nDevices\n:\nRouters,\nswitches,\nmode

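The extracted text above has a newline after nearly every word, which is a common artifact of PDF extraction. Below is a minimal sketch of collapsing that whitespace before chunking. The helper name `normalize_whitespace` is hypothetical; the notebook itself does something similar later in `preprocess_text` with a regex.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    return " ".join(text.split())

# A short excerpt shaped like the extraction output shown above
raw = "What\nis\nNetworking?\nDefinition\n:\nNetworking,\nin\nthe\ncontext"
cleaned = normalize_whitespace(raw)
print(cleaned)
# prints: What is Networking? Definition : Networking, in the context
```
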
## Creating Text Chunks

In [None]:
def get_text_chunks(text):
    """Split text into chunks of up to 1000 characters with a 50-character overlap."""
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=50
    )
    chunks = text_splitter.split_text(text)
    return chunks

# Test
pdf_text = get_pdf_text(pdf_docs)
print(len(get_text_chunks(pdf_text)))

32
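To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and spaces). The helper name `sliding_chunks` is hypothetical and only illustrates how consecutive chunks share characters.

```python
from typing import List

def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
# prints:
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

Each chunk starts with the last three characters of the previous one, which is the role the 50-character overlap plays in the notebook: context that straddles a chunk boundary is not lost.
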


## Choosing a Base Output Parser
### Working with Pydantic

In [None]:
class QuizQuestion(BaseModel):
    question: str
    options: Union[List[str], None] = Field(default=None, description="List of answer options; ['True', 'False'] for true/false questions")
    correct_answer: str = Field(description="A, B, C, or D for multiple choice; A or B for true/false questions")
    explanation: str
    difficulty: str = Field(default="Medium", description="Easy, Medium, or Hard")

class Quiz(BaseModel):
    questions: List[QuizQuestion]
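The schema above types `correct_answer` as a plain string, so a letter that does not correspond to any option would still pass validation. Below is a minimal sketch of an extra consistency check, written against plain values so it runs without Pydantic. The helper name `answer_matches_options` is hypothetical and not part of the notebook.

```python
from typing import List, Optional

def answer_matches_options(correct_answer: str, options: Optional[List[str]]) -> bool:
    """Return True if the answer letter points at an existing option.

    Assumption: questions without options are True/False (A = True, B = False).
    """
    num_options = len(options) if options else 2
    valid_letters = [chr(ord("A") + i) for i in range(num_options)]
    return correct_answer.strip().upper() in valid_letters

print(answer_matches_options("B", ["O(n)", "O(log n)", "O(n^2)", "O(1)"]))  # True
print(answer_matches_options("C", ["True", "False"]))                        # False
print(answer_matches_options("a", None))                                     # True
```

The same logic could live in a validator on `QuizQuestion`; that wiring is left out here.
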

In [None]:
def generate_questions_from_pdf(pdf_text, num_questions, quiz_type, quiz_context):
    output_parser = PydanticOutputParser(pydantic_object=Quiz)

    prompt_template = PromptTemplate(
        template=f"""
You are an AI-powered quiz generator. Your task is to create a quiz based on the following parameters:

Number of questions: {{num_questions}}
Quiz type: {{quiz_type}}
Topic/Context: {{quiz_context}}

Guidelines:
1. Generate ONLY {{quiz_type}} questions based on the content of the uploaded PDF.
2. Ensure all questions are related to the specified topic/context: {{quiz_context}}
3. Provide a blend of difficulty levels: 20% Easy, 20% Medium, and 60% Hard questions.
4. For each question, provide a detailed explanation of the correct answer.

The format of the quiz should be as follows:

questions:
  - question: <Question text>
    options:
      - <Option A>
      - <Option B>
      - <Option C>
      - <Option D>
    correct_answer: <A, B, C, or D>
    explanation: <Detailed explanation for the correct answer>

For True/False questions, use this format:
  - question: <Question text>
    options:
      - <True>
      - <False>
    correct_answer: <A for True, B for False>
    explanation: <Detailed explanation for the correct answer>

Example for Multiple Choice:
questions:
  - question: "What is the time complexity of searching a balanced binary search tree?"
    options:
      - "O(n)"
      - "O(log n)"
      - "O(n^2)"
      - "O(1)"
    correct_answer: "B"
    explanation: "Searching a balanced binary search tree takes O(log n) time. Each comparison lets the search skip about half of the remaining tree, so it takes about log2 n comparisons to find an element or to insert a new one. This is much faster than the linear time (O(n)) required to find elements by key in an unsorted array, but slower than the average constant time (O(1)) of hash tables."

Example for True/False:
questions:
  - question: "A binary search tree is always balanced."
    options:
      - "True"
      - "False"
    correct_answer: "B"
    explanation: "This statement is false. A binary search tree is not always balanced. While balanced binary search trees (like AVL trees or Red-Black trees) exist, a standard binary search tree can become unbalanced depending on the order of insertions and deletions. An unbalanced tree can degrade to a linked list in the worst case, losing the logarithmic time complexity advantage for operations."


Difficulty Level Guidelines:
Hard (60% of questions):
Multiple Choice Questions:
   - Ensure distractors (wrong answers) are plausible and based on common misconceptions or errors in understanding.
   - Include answers that require higher-order thinking, such as application of concepts or analysis of information.
   - Consider using "All of the above" or "None of the above" options strategically.
   - For language or writing-related questions, include answers with subtle grammatical or stylistic differences.

True/False Questions:
   - Include statements that require deep understanding of nuances or exceptions to rules.
   - Use complex sentences that combine true and false elements to test careful reading and comprehension.
   - Incorporate statements that challenge common assumptions or misconceptions in the field.

Medium (20% of questions):
Multiple Choice Questions:
   - Include distractors that are plausible but distinguishable from the correct answer with careful thought.
   - Test application of concepts rather than just recall, but avoid overly complex scenarios.
   - Use clear, unambiguous language in both the question stem and answer choices.
   - Occasionally include "All of the above" or "None of the above" options, but not too frequently.

True/False Questions:
   - Create statements that require more than surface-level knowledge to evaluate.
   - Include some statements that have qualifiers (e.g., "always," "never," "sometimes") to test for exceptions.
   - Balance the number of true and false statements.

Easy (20% of questions):
Multiple Choice Questions:
   - Use straightforward language in both the question stem and answer choices.
   - Test basic recall of key concepts, definitions, or facts.
   - Make the correct answer clearly distinguishable from the distractors.
   - Limit the number of answer choices to 3-4 options.

True/False Questions:
   - Create clear, unambiguous statements about fundamental course concepts.
   - Avoid using absolutes like "always" or "never" unless they are definitively true or false.
   - Focus on testing recall of key facts or basic understanding of concepts.

Explanation Guidelines:
1. Provide a clear and concise explanation for why the correct answer is right.
2. If applicable, briefly explain why the other options are incorrect.
3. Include relevant facts, definitions, or concepts from the source material.
4. For harder questions, explain the reasoning or steps to arrive at the correct answer.
5. If the question involves calculations, show the key steps or formulas used.
6. Relate the explanation to the broader context or topic when appropriate.
7. Use simple language and avoid jargon unless it's essential to the subject matter.

Use the following text as context for generating questions, but only if it's relevant to {{quiz_context}}:
{{pdf_text}}

{{format_instructions}}
        """,
        input_variables=["num_questions", "quiz_type", "quiz_context", "pdf_text"],
        partial_variables={"format_instructions": output_parser.get_format_instructions()}
    )

    # Create the LLMChain
    llm_chain = LLMChain(llm=llm, prompt=prompt_template)

    try:
        # Run the chain with all required inputs
        result = llm_chain.invoke({
            "num_questions": num_questions,
            "quiz_type": quiz_type,
            "quiz_context": quiz_context,
            "pdf_text": pdf_text
        })

        # Check if 'text' key exists in the result
        if 'text' not in result:
            print("Error: 'text' key not found in the LLM response.")
            print("Full LLM response:", result)
            return None

        # Try to parse the output
        parsed_output = output_parser.parse(result['text'])
        return parsed_output

    except Exception as e:
        print(f"Error generating or parsing quiz: {str(e)}")
        if 'result' in locals():
            print("\nRaw output from the model:")
            print(result.get('text', 'No text output available'))
        return None
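Parsing can fail when the model wraps its JSON in a Markdown code fence or adds text around it. As a hedged sketch, the raw text could be reduced to its JSON object before parsing. The helper `extract_json_block` is hypothetical, and whether `PydanticOutputParser` already tolerates fenced output depends on the installed LangChain version.

```python
import json
from typing import Optional

def extract_json_block(raw: str) -> Optional[dict]:
    """Parse the span from the first '{' to the last '}' as JSON, or return None."""
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None

fence = "`" * 3  # built at runtime to avoid a literal fence inside this example
raw_output = f'Here is your quiz:\n{fence}json\n{{"questions": []}}\n{fence}'
print(extract_json_block(raw_output))      # {'questions': []}
print(extract_json_block("no JSON here"))  # None
```
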

In [None]:

def run_quiz(quiz: Quiz):
    score = 0
    total_questions = len(quiz.questions)

    for i, question in enumerate(quiz.questions, 1):
        print(f"\nQuestion {i} of {total_questions}:")
        print(question.question)

        # Fall back to True/False labels when the model returns no options
        options = question.options or ["True", "False"]
        valid_answers = [chr(97 + j) for j in range(len(options))]
        for letter, option in zip(valid_answers, options):
            print(f"{letter}. {option}")

        # Build the prompt from the actual number of options ("a or b", "a, b, c, or d")
        if len(valid_answers) <= 2:
            choices = " or ".join(valid_answers)
        else:
            choices = ", ".join(valid_answers[:-1]) + f", or {valid_answers[-1]}"

        user_answer = input(f"Your answer ({choices}): ").lower().strip()
        while user_answer not in valid_answers:
            user_answer = input(f"Invalid input. Please enter {choices}: ").lower().strip()

        if user_answer == question.correct_answer.strip().lower():
            print("Correct!")
            score += 1
        else:
            print(f"Incorrect. The correct answer is: {question.correct_answer}")

        print("\nExplanation:")
        print(question.explanation)

        input("\nPress Enter to continue...")
        # TODO: in the app, replace this prompt with a "Next" button

    print(f"\nQuiz completed! Your score: {score}/{total_questions}")
    # Guard against an empty quiz to avoid dividing by zero
    percentage = (score / total_questions) * 100 if total_questions else 0.0
    print(f"Percentage: {percentage:.2f}%")

    # Generate remarks on performance
    if percentage >= 90:
        print("Excellent job! You might just be ready after all 😜")
    elif percentage >= 70:
        print("Good work! Fewer mistakes... 🫡")
    elif percentage >= 50:
        print("Not bad, but there's room for improvement.")
    else:
        print("We're far from calling it a day. You might want to review the material and try again. Also review the correct answers and explanations for every question you got wrong, and try to work out where you made mistakes. 🙁")
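The feedback thresholds at the end of `run_quiz` can be factored into a pure function, which makes them testable without interactive input. Below is a minimal sketch. The name `performance_band` and the band labels are hypothetical; the cutoffs (90, 70, 50) mirror the code above.

```python
def performance_band(score: int, total_questions: int) -> str:
    """Map a raw score to a feedback band using the 90/70/50 percent cutoffs."""
    if total_questions <= 0:
        return "no questions"  # avoid dividing by zero on an empty quiz
    percentage = score / total_questions * 100
    if percentage >= 90:
        return "excellent"
    if percentage >= 70:
        return "good"
    if percentage >= 50:
        return "fair"
    return "needs review"

print(performance_band(4, 4))  # excellent
print(performance_band(3, 4))  # good
print(performance_band(2, 4))  # fair
print(performance_band(1, 4))  # needs review
```
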


In [None]:
import re

def preprocess_text(text):
    # Convert to lowercase
    text = text.lower()
    # Collapse runs of whitespace (spaces, tabs, newlines) into a single space
    text = re.sub(r'\s+', ' ', text)
    # Replace hyphens with spaces so "fine-tuning" and "fine tuning" match
    text = text.replace('-', ' ')
    return text

def validate_concepts_in_pdf(concepts, pdf_text):
    # Return True if any non-empty concept appears in the preprocessed PDF text.
    # An empty string is a substring of every text, so it must be skipped.
    return any(concept and concept in pdf_text for concept in concepts)

def main():
    while True:
        # Get PDF file path
        pdf_path = input("Enter the path to your PDF file: ")
        pdf_text = get_pdf_text([pdf_path])

        # Preprocess the PDF text
        pdf_text = preprocess_text(pdf_text)
        print(f"PDF text length: {len(pdf_text)} characters")

        while True:
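            # Get quiz parameters; re-prompt until the count is a positive whole number
            num_questions_input = input("Enter the number of questions for the quiz: ").strip()
            while not num_questions_input.isdigit() or int(num_questions_input) < 1:
                num_questions_input = input("Invalid input. Please enter a positive whole number: ").strip()
            num_questions = int(num_questions_input)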
            quiz_type = input("Enter the quiz type (multiple-choice or true-false): ").lower()

            # Validate quiz type
            while quiz_type not in ['multiple-choice', 'true-false']:
                quiz_type = input("Invalid input. Please enter 'multiple-choice' or 'true-false': ").lower()

            # Enter multiple quiz contexts separated by a comma
            quiz_context_input = input("Enter the topic(s) or context(s) for the quiz, separated by a comma: ")

            # Preprocess quiz context(s)
            quiz_context_preprocessed = [preprocess_text(concept.strip()) for concept in quiz_context_input.split(',')]

            # Validate if any context exists in the PDF
            while not validate_concepts_in_pdf(quiz_context_preprocessed, pdf_text):
                quiz_context_input = input("None of the provided concepts are in your uploaded PDF. Please enter valid topic(s)/context(s), separated by a comma: ")
                quiz_context_preprocessed = [preprocess_text(concept.strip()) for concept in quiz_context_input.split(',')]

            # Generate quiz
            print("\nGenerating quiz with the following parameters:")
            print(f"Quiz context: {quiz_context_input}")
            print(f"Quiz type: {quiz_type}")
            print(f"Number of questions: {num_questions}")

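            # Generate questions using the first non-empty concept found in the PDF
            selected_context = next(
                (concept for concept in quiz_context_preprocessed if concept and concept in pdf_text),
                quiz_context_preprocessed[0],
            )
            generated_quiz = generate_questions_from_pdf(pdf_text, num_questions, quiz_type, selected_context)

            # generate_questions_from_pdf returns None on failure; ask for new parameters
            if generated_quiz is None:
                print("Quiz generation failed. Please try again.")
                continue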

            # Run quiz in a loop
            while True:
                run_quiz(generated_quiz)

                print("\nWhat would you like to do next?")
                print("1. Retry this quiz")
                print("2. Generate a new quiz")
                print("3. Exit")

                choice = input("Enter your choice (1/2/3): ").strip()
                while choice not in ['1', '2', '3']:
                    choice = input("Invalid input. Please enter 1, 2, or 3: ").strip()

                if choice == '1':
                    print("Retrying the current quiz...")
                    continue
                elif choice == '2':
                    print("Generating a new quiz...")
                    break
                else:
                    # Exit and ask if the user wants to load another PDF
                    another_pdf = input("Would you like to load another PDF? (yes/no): ").lower().strip()
                    if another_pdf == 'yes':
                        print("Loading another PDF...")
                        break  # Go back to loading another PDF
                    else:
                        print("Thank you for using the LLM adaptive quiz generator to aid your study!")
                        return  # Exit the program

            if choice == '2':
                continue  # Generate a new quiz with the same PDF
            else:
                break  # Exit the inner loop to load another PDF

if __name__ == "__main__":
    main()


Enter the path to your PDF file: /content/Introduction to Networking.pdf
PDF text length: 29430 characters
Enter the number of questions for the quiz: 4
Enter the quiz type (multiple-choice or true-false): CC
Invalid input. Please enter 'multiple-choice' or 'true-false': MULTIPLE-CHOICE
Enter the topic(s) or context(s) for the quiz, separated by a comma: networking

Generating quiz with the following parameters:
Quiz context: networking
Quiz type: multiple-choice
Number of questions: 4



Question 1 of 4:
Which of the following is NOT a primary component of a network?
a. Hardware
b. Software
c. Data
d. Network protocols
Your answer (a, b, c, or d): d
Incorrect. The correct answer is: C

Explanation:
Data is not a primary component of a network. Networks are composed of hardware, software, and network protocols, which facilitate data transmission and communication.

Press Enter to continue...Enter

Question 2 of 4:
What type of network topology provides high redundancy and reliability?
a. Star topology
b. Bus topology
c. Ring topology
d. Mesh topology
Your answer (a, b, c, or d): d
Correct!

Explanation:
Mesh topology offers high redundancy and reliability because multiple paths exist for data to travel between any two points. This design ensures that data can be rerouted through other available paths if one link fails.

Press Enter to continue...Enter

Question 3 of 4:
Which network protocol is responsible for transferring web pages on the internet?
a. TCP/IP
b. HTTP/H