## Structured Learning Agent using LangGraph
In this notebook, we build a structured learning agent using LangGraph. The system guides learners through a sequence of defined but customizable checkpoints, verifying understanding at each step and providing Feynman-style teaching when needed.

## Motivation
- Access to personalized 1:1 tutoring is expensive and not accessible to everyone.
- Provide each learner with an individualized learning experience and feedback, available 24/7.
- Use the learner's own notes and web-retrieved content as context.
- Offer patient, simple explanations of complex topics.

## Key Components
1. Learning State Graph: Orchestrates the sequential learning workflow.
2. Checkpoint System: Defines structured learning milestones.
3. Web Search Integration: Dynamically retrieves relevant learning materials.
4. Context Processing: Chunks and processes learning materials.
5. Question Generation: Creates checkpoint-specific verification questions.
6. Understanding Verification: Evaluates learner comprehension with a clear threshold.
7. Feynman-style Teaching: Provides patient, simple explanations of complex topics.

## Method
The system follows a structured learning cycle.
1. Checkpoint Definition
* Generates sequential learning milestones with clear success criteria.

2. Context Building
* Processes student-provided materials or retrieves relevant web content.
3. Context Validation
* Validates context based on checkpoint criteria.
* Performs additional web searches if context doesn't meet criteria.
4. Embedding Storage
* Stores embeddings for retrieving only relevant chunks during verification.
5. Understanding Verification
* Generates checkpoint-specific questions.
* Evaluates learner's answers against correct answers.
* Provides clear feedback on understanding level.
6. Progressive Learning
* Advances to the next checkpoint when understanding is verified.
* Provides Feynman-style teaching when needed.
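
The cycle above can be sketched in plain Python, independent of LangGraph. This is a minimal illustration rather than the notebook's implementation: `PASS_THRESHOLD`, `run_cycle`, and the `grade` and `teach` callables are hypothetical names, and the 0.7 cut-off is an assumed value because the threshold is not stated in this section. The sketch also assumes the learner moves on after receiving an explanation, which is only one possible policy.

```python
from typing import Callable, List

PASS_THRESHOLD = 0.7  # assumed value, for illustration only


def run_cycle(
    checkpoints: List[str],
    grade: Callable[[str], float],
    teach: Callable[[str], None],
) -> List[str]:
    """Walk the checkpoints in order and return a log of what happened."""
    log = []
    for checkpoint in checkpoints:
        # Stands in for question generation plus answer verification.
        score = grade(checkpoint)
        if score < PASS_THRESHOLD:
            # Stands in for the Feynman-style explanation step.
            teach(checkpoint)
            log.append(f"taught: {checkpoint}")
        log.append(f"advanced past: {checkpoint}")
    return log
```

Passing the grader and teacher in as callables keeps the control flow separate from the LLM calls, which makes the progression logic easy to test on its own.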

## Conclusion
This structured learning agent provides a personalized learning experience that is available 24/7. It can be extended with additional features such as progress tracking and personalized recommendations.




## Requirements
#!pip install langchain-community langchain-openai langgraph pydantic python-dotenv semantic-chunkers semantic-router tavily-python

In [4]:
import os 
import operator
import uuid
from typing import Annotated, Dict, List, Optional, Tuple, TypedDict

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from IPython.display import Image, display
from langchain_community.utils.math import cosine_similarity
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.messages import HumanMessage, SystemMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.store.memory import InMemoryStore  # used by ContextStore below
from langgraph.graph import END, StateGraph, START
from pydantic import BaseModel, Field
from dotenv import load_dotenv
from semantic_chunkers import StatisticalChunker
from semantic_router.encoders import OpenAIEncoder



## Setup
This agent is implemented with OpenAI's models, but it can also be used with self-hosted LLM and embedding models.

In [5]:
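load_dotenv()  # loads OPENAI_API_KEY and TAVILY_API_KEY from a local .env file into os.environ

# Fail early with a clear message; assigning a missing (None) value to os.environ raises TypeError
for key in ("OPENAI_API_KEY", "TAVILY_API_KEY"):
    if not os.getenv(key):
        raise RuntimeError(f"{key} is not set; add it to your environment or .env file")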

tavily_search = TavilySearchResults(max_results=3)
llm = ChatOpenAI(model="gpt-4o", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")



## Data Models Definition
We define the data structures for our adaptive learning system using pydantic models. These models enforce type safety and provide a clear structure for:
* Learning goals and objectives
* Checkpoint definitions and tracking
* Search queries for dynamic content
* Verification of learning progress
* Feynman teaching output format
* Question generation
* Context validation

Each model is designed to capture specific aspects of the learning process while maintaining type safety.

In [6]:
class Goals(BaseModel):
    """ Structure for defining learning goals."""
    goals: Optional[str] = Field(None, description="Learning goals")

class LearningCheckpoint(BaseModel):
    """ Structure for a single checkpoint """
    description: str = Field(..., description="Main checkpoint description")
    criteria: List[str] = Field(...,description="List of success criteria")
    verification: str = Field(..., description="How to verify this checkpoint")

class Checkpoints(BaseModel):
    """ Main checkpoints container with index tracking """
    checkpoints: List[LearningCheckpoint] = Field(
        ...,
        description="List of checkpoints covering foundation, application, and mastery levels"
    )

class SearchQuery(BaseModel):
    """ Structure for search query collection"""
    search_queries: list = Field(None, description="Search queries for retrieval.")

class LearningVerification(BaseModel):
    """Structure for verification results"""
    understanding_level: float = Field(...,ge=0, le=1)
    feedback: str
    suggestions: List[str]
    context_alignment:bool

class FeynmanTeaching(BaseModel):
    """ Structure for Feynman teaching method output """
    simplified_explanation: str
    KeyConcepts: List[str]
    analogies: List[str]
    
class QuestionOutput(BaseModel):
    """ Structure for question output """
    question: str  # singular, matching question_output.question in generate_question

class InContext(BaseModel):
    """ Structure for context verification """
    is_in_context: str = Field(..., description="Yes or No")
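
To see what the `ge=0, le=1` bounds on `understanding_level` provide, here is a standard-library approximation of that constraint. `VerificationSketch` is a hypothetical stand-in and not one of the notebook's models; pydantic performs an equivalent range check automatically when a `LearningVerification` is constructed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VerificationSketch:
    """Hypothetical dataclass mirroring the fields of LearningVerification."""

    understanding_level: float
    feedback: str
    suggestions: List[str] = field(default_factory=list)
    context_alignment: bool = True

    def __post_init__(self) -> None:
        # Approximates Field(..., ge=0, le=1): reject scores outside [0, 1].
        if not 0 <= self.understanding_level <= 1:
            raise ValueError("understanding_level must be between 0 and 1")
```

Rejecting out-of-range scores at construction time means later routing logic can compare the score against a threshold without re-validating it.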

## Learning State Definition
The learning state tracks the following as the workflow runs:
* The learning topic and goals
* Context and search results
* Current progress through checkpoints
* Verification results and teaching outputs
* Current question-answer pair

In [14]:
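class LearningtState(TypedDict):
    # TypedDict rather than BaseModel: the node functions read the state with state['key'] and state.get('key')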
    topic: str
    goals: List[str]
    context: str
    context_chunks: Annotated[list, operator.add]
    context_key: str
    search_queries: SearchQuery
    checkpoints: Checkpoints
    verifications: LearningVerification
    teachings: FeynmanTeaching
    current_checkpoint: int
    current_question: QuestionOutput
    current_answer: str
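
The `Annotated[list, operator.add]` annotation on `context_chunks` tells LangGraph to merge a node's update into the existing value with the given reducer instead of overwriting it. The following standard-library sketch mimics that merge rule; `DemoState` and `apply_update` are hypothetical names used only for illustration and are not part of LangGraph.

```python
import operator
from typing import Annotated, TypedDict, get_args, get_origin, get_type_hints


class DemoState(TypedDict):
    context_chunks: Annotated[list, operator.add]  # merged with a reducer
    current_checkpoint: int                        # plain key, overwritten


def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state, honoring reducers."""
    hints = get_type_hints(DemoState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        hint = hints[key]
        if get_origin(hint) is Annotated:
            # The first metadata item is treated as the reducer function.
            reducer = get_args(hint)[1]
            merged[key] = reducer(state.get(key, []), value)
        else:
            merged[key] = value
    return merged
```

Under this rule, chunks returned by a later node are appended to the ones already in the state, while a key such as `current_checkpoint` simply takes the newest value.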

## Helper Functions
The system uses three utility functions:
1. extract_content_from_chunks: Processes and combines text chunks into coherent content.
2. format_checkpoints_as_message: Converts checkpoint data into prompt format.
3. generate_checkpoint_message: Creates a formatted message for context retrieval.

In [8]:
def extract_content_from_chunks(chunks):
    """Extract and combine content from chunks with splits attribute.
    
    Args:
        chunks: List of chunk objects that may contain splits attribute
        
    Returns:
        str: Combined content from all chunks joined with newlines
    """
    content = []
    
    for chunk in chunks:
        if hasattr(chunk, 'splits') and chunk.splits:
            chunk_content = ' '.join(chunk.splits)
            content.append(chunk_content)
    
    return '\n'.join(content)

def format_checkpoints_as_message(checkpoints: Checkpoints) -> str:
    """Convert Checkpoints object to a formatted string for the message.
    
    Args:
        checkpoints (Checkpoints): Checkpoints object containing learning checkpoints
        
    Returns:
        str: Formatted string containing numbered checkpoints with descriptions and criteria
    """
    message = "Here are the learning checkpoints:\n\n"
    for i, checkpoint in enumerate(checkpoints.checkpoints, 1):
        message += f"Checkpoint {i}:\n"
        message += f"Description: {checkpoint.description}\n"
        message += "Success Criteria:\n"
        for criterion in checkpoint.criteria:
            message += f"- {criterion}\n"
    return message

def generate_checkpoint_message(checks: List[LearningCheckpoint]) -> HumanMessage:
    """Generate a formatted message for learning checkpoints that need context.
    
    Args:
        checks (List[LearningCheckpoint]): List of learning checkpoint objects
        
    Returns:
        HumanMessage: Formatted message containing checkpoint descriptions, criteria and 
                     verification methods, ready for context search
    """
    formatted_checks = []
    
    for check in checks:
        checkpoint_text = f"""
        Description: {check.description}
        Success Criteria:
        {chr(10).join(f'- {criterion}' for criterion in check.criteria)}
        Verification Method: {check.verification}
        """
        formatted_checks.append(checkpoint_text)
    
    all_checks = "\n---\n".join(formatted_checks)
    
    checkpoints_message = HumanMessage(content=f"""The following learning checkpoints need additional context:
        {all_checks}
        
        Please generate search queries to find relevant information.""")
    
    return checkpoints_message
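
A note on the `chr(10).join(...)` idiom used in these helpers: before Python 3.12, a backslash such as `\n` is not allowed inside the expression part of an f-string, so `chr(10)` (the newline character) is used instead. A small self-contained illustration, with made-up criteria:

```python
criteria = ["Define a state graph", "Explain what a reducer does"]

# chr(10) is the newline character; it avoids a backslash inside the
# f-string expression, which older Python versions reject.
criteria_block = f"Success Criteria:\n{chr(10).join(f'- {c}' for c in criteria)}"
print(criteria_block)
```

An alternative is to build the joined string in a separate variable first and interpolate that, which some readers find easier to scan.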

## Prompt Configuration
Here we define the core instruction prompts for our LLM. Each message serves a specific purpose in the learning process.
1. learning_checkpoints_generator: Creates structured learning milestones with clear criteria.
2. checkpoint_based_query_generator: Generates targeted search queries for content retrieval.
3. validate_context: Checks whether the retrieved context can satisfy a checkpoint's criteria.
4. question_generator: Creates verification questions aligned with checkpoints.
5. answer_verifier: Evaluates learner responses against success criteria.
6. feynman_teacher: Crafts simplified explanations using the Feynman technique.

In [9]:
learning_checkpoints_generator = SystemMessage(content="""You will be given a learning topic title and learning objectives.
Your goal is to generate clear learning checkpoints that will help verify understanding and progress through the topic.
The output should be in the following dictionary structure:
checkpoint 
-> description (level checkpoint description)
-> criteria
-> verification (How to verify this checkpoint (Feynman Methods))
Requirements for each checkpoint:
- Description should be clear and concise
- Criteria should be specific and measurable (3-5 items)
- Verification method should be practical and appropriate for the level
- Verification will be checked by a language model, so it must be in natural language
- All elements should align with the learning objectives
- Use action verbs and clear language
Ensure all checkpoints progress logically from foundation to mastery.
IMPORTANT - ANSWER ONLY 3 CHECKPOINTS""")

checkpoint_based_query_generator = SystemMessage(content="""You will be given learning checkpoints for a topic.
Your goal is to generate search queries that will retrieve content matching each checkpoint's requirements from retrieval systems or web search.
Follow these steps:
1. Analyze each learning checkpoint carefully
2. For each checkpoint, generate ONE targeted search query that will retrieve:
   - Content for checkpoint verification""")

validate_context = SystemMessage(content="""You will be given a learning criteria and context.
Check if the criteria could be answered using the context.
Always answer YES or NO""")

question_generator = SystemMessage(content="""You will be given a checkpoint description, success criteria, and verification method.
Your goal is to generate an appropriate question that aligns with the checkpoint's verification requirements.
The question should:
1. Follow the specified verification method
2. Cover all success criteria
3. Encourage demonstration of understanding
4. Be clear and specific
Output should be a single, well-formulated question that effectively tests the checkpoint's learning objectives.""")

answer_verifier = SystemMessage(content="""You will be given a student's answer, question, checkpoint details, and relevant context.
Your goal is to analyze the answer against the checkpoint criteria and provided context.
Analyze considering:
1. Alignment with verification method specified
2. Coverage of all success criteria
3. Use of relevant concepts from context
4. Depth and accuracy of understanding
Output should include:
- understanding_level: float between 0 and 1
- feedback: detailed explanation of the assessment
- suggestions: list of specific improvements
- context_alignment: boolean indicating if the answer aligns with provided context""")

feynman_teacher = SystemMessage(content="""You will be given verification results, checkpoint criteria, and learning context.
Your goal is to create a Feynman-style teaching explanation for concepts that need reinforcement.
The explanation should include:
1. Simplified explanation without technical jargon
2. Concrete, relatable analogies
3. Key concepts to remember
Output should follow the Feynman technique:
- simplified_explanation: clear, jargon-free explanation
- key_concepts: list of essential points
- analogies: list of relevant, concrete comparisons
Focus on making complex ideas accessible and memorable.""")

## Context Storage
The ContextStore class manages context chunks and embeddings in memory, optimizing token usage by allowing access to only relevant context during answer verification.

In [10]:
class ContextStore:
    """Store for managing context chunks and their embeddings in memory.
    
    A class that provides storage and retrieval of context data using an in-memory store.
    Each context entry consists of context chunks and their corresponding embeddings.
    """
    
    def __init__(self):
        """Initialize ContextStore with an empty in-memory store."""
        self.store = InMemoryStore()
        
    def save_context(self, context_chunks: list, embeddings: list, key: str = None):
        """Save context chunks and their embeddings to the store.
        
        Args:
            context_chunks (list): List of context chunk objects
            embeddings (list): List of corresponding embeddings for the chunks
            key (str, optional): Custom key for storing the context. Defaults to None,
                               in which case a UUID is generated.
            
        Returns:
            str: The key used to store the context
        """
        namespace = ("context",)
        
        if key is None:
            key = str(uuid.uuid4())
            
        value = {
            "chunks": context_chunks,
            "embeddings": embeddings
        }
        
        self.store.put(namespace, key, value)
        return key
        
    def get_context(self, context_key: str):
        """Retrieve context data from the store using a key.
        
        Args:
            context_key (str): The key used to store the context
            
        Returns:
            dict: The stored context value containing chunks and embeddings
        """
        namespace = ("context",)
        memory = self.store.get(namespace, context_key)
        return memory.value
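
The save and retrieve round trip can be tried without LangGraph installed. The sketch below uses a plain dictionary in place of the in-memory store; `DictContextStore` is a hypothetical stand-in that only mirrors the two method names of `ContextStore`, and the two-dimensional vectors are toy values rather than real embeddings.

```python
import uuid
from typing import Dict, List, Optional, Tuple


class DictContextStore:
    """Hypothetical stand-in mirroring ContextStore's save/get interface."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[Tuple[str, ...], str], dict] = {}

    def save_context(
        self,
        context_chunks: List[str],
        embeddings: List[List[float]],
        key: Optional[str] = None,
    ) -> str:
        # Generate a key when none is given, as ContextStore does.
        key = key or str(uuid.uuid4())
        self._data[(("context",), key)] = {
            "chunks": context_chunks,
            "embeddings": embeddings,
        }
        return key

    def get_context(self, context_key: str) -> dict:
        return self._data[(("context",), context_key)]


store = DictContextStore()
key = store.save_context(["chunk A", "chunk B"], [[1.0, 0.0], [0.0, 1.0]])
saved = store.get_context(key)
```

In this stand-in, saving again under the same key replaces the earlier entry. If the real store behaves the same way, callers that want to keep earlier chunks would need to merge them before saving.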

## Core Learning System Functions
The learning system is powered by eight main functions that process and update the learning state:

### Content Generation and Processing
1. generate_checkpoints: Creates learning milestones from the topic and goals
2. generate_query: Formulates checkpoint-based search queries
3. search_web: Retrieves content via Tavily search
4. chunk_context: Segments learning materials
5. context_validation: Ensures context meets checkpoint requirements

### Learning Verification and Support
6. generate_question: Creates verification questions
7. verify_answer: Evaluates answers against checkpoint criteria
8. teach_concept: Provides Feynman-style explanations
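
Both context_validation and verify_answer select the three stored chunks most similar to a checkpoint's verification text. The selection step can be sketched with the standard library alone; `cosine` and `top_k_chunks` are hypothetical helper names, and the vectors in the usage example are toy values standing in for real embeddings.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(
    query_vec: Sequence[float],
    chunk_vecs: List[Sequence[float]],
    chunks: List[str],
    k: int = 3,
) -> List[str]:
    """Return the k chunks whose embeddings are most similar to the query."""
    scores = [cosine(query_vec, vec) for vec in chunk_vecs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [chunks[i] for i in ranked]


# Toy usage: the query points along the first axis.
chunks = ["a", "b", "c", "d"]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
best_two = top_k_chunks([1.0, 0.0], vectors, chunks, k=2)
```

Passing only the top-ranked chunks to the LLM keeps the prompt short, which is the token-saving motivation given for the context store.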

In [15]:
def generate_query(state: LearningtState):
    """Generates search queries based on learning checkpoints from current state."""
    structured_llm = llm.with_structured_output(SearchQuery) 
    checkpoints_message = HumanMessage(content=format_checkpoints_as_message(state['checkpoints']))  
    messages = [checkpoint_based_query_generator, checkpoints_message]
    search_queries = structured_llm.invoke(messages)
    return {"search_queries": search_queries}

def search_web(state: LearningtState):
    """Retrieves and processes web search results based on search queries."""
    search_queries = state["search_queries"].search_queries
    
    all_search_docs = []
    for query in search_queries:
        search_docs = tavily_search.invoke(query)
        all_search_docs.extend(search_docs)
    
    formatted_search_docs = [
        f'Context: {doc["content"]}\n Source: {doc["url"]}\n'
        for doc in all_search_docs
    ]

    chunk_embeddings = embeddings.embed_documents(formatted_search_docs)
    context_key = context_store.save_context(
        formatted_search_docs,
        chunk_embeddings,
        key=state.get('context_key')
    )
    
    # Return the key as well so later nodes can look up the stored chunks
    return {"context_chunks": formatted_search_docs, "context_key": context_key}

def generate_checkpoints(state: LearningtState):
    """Creates learning checkpoints based on given topic and goals."""
    structured_llm = llm.with_structured_output(Checkpoints)
    messages = [
        learning_checkpoints_generator,
        SystemMessage(content=f"Topic: {state['topic']}"),
        SystemMessage(content=f"Goals: {', '.join(str(goal) for goal in state['goals'])}")
    ]
    checkpoints = structured_llm.invoke(messages)
    return {"checkpoints": checkpoints}

def chunk_context(state: LearningtState):
    """Splits context into manageable chunks and generates their embeddings."""
    encoder = OpenAIEncoder(name="text-embedding-3-large")
    chunker = StatisticalChunker(
        encoder=encoder,
        min_split_tokens=128,
        max_split_tokens=512
    )
    
    chunks = chunker([state['context']])
    content = []
    for chunk in chunks:
        content.append(extract_content_from_chunks(chunk))

    chunk_embeddings = embeddings.embed_documents(content)
    context_key = context_store.save_context(
        content,
        chunk_embeddings,
        key=state.get('context_key')
    )
    return {"context_chunks": content, "context_key": context_key}

def context_validation(state: LearningtState):
    """Validates context coverage against checkpoint criteria using stored embeddings."""
    context = context_store.get_context(state['context_key'])
    chunks = context['chunks']
    chunk_embeddings = context['embeddings']
    
    checks = []
    structured_llm = llm.with_structured_output(InContext)
    
    for checkpoint in state['checkpoints'].checkpoints:
        query = embeddings.embed_query(checkpoint.verification)
        
        similarities = cosine_similarity([query], chunk_embeddings)[0]
        top_3_indices = sorted(range(len(similarities)), 
                             key=lambda i: similarities[i], 
                             reverse=True)[:3]
        relevant_chunks = [chunks[i] for i in top_3_indices]
        
        messages = [
            validate_context,
            HumanMessage(content=f"""
            Criteria:
            {chr(10).join(f"- {c}" for c in checkpoint.criteria)}
            
            Context:
            {chr(10).join(relevant_chunks)}
            """)
        ]
        
        response = structured_llm.invoke(messages)
        if response.is_in_context.lower() == "no":
            checks.append(checkpoint)
    
    if checks:
        structured_llm = llm.with_structured_output(SearchQuery)
        checkpoints_message = generate_checkpoint_message(checks)
        
        messages = [checkpoint_based_query_generator, checkpoints_message]
        search_queries = structured_llm.invoke(messages)
        return {"search_queries": search_queries}
    
    return {"search_queries": None}

def generate_question(state: LearningtState):
    """Generates assessment questions based on current checkpoint verification requirements."""
    structured_llm = llm.with_structured_output(QuestionOutput)
    current_checkpoint = state['current_checkpoint']
    checkpoint_info = state['checkpoints'].checkpoints[current_checkpoint]
    
    messages = [
        question_generator,
        HumanMessage(content=f"""
        Checkpoint Description: {checkpoint_info.description}
        Success Criteria:
        {chr(10).join(f"- {c}" for c in checkpoint_info.criteria)}
        Verification Method: {checkpoint_info.verification}
        
        Generate an appropriate verification question.""")
    ]
    
    question_output = structured_llm.invoke(messages)
    return {"current_question": question_output.question}

def verify_answer(state: LearningtState):
    """Evaluates user answers against checkpoint criteria using relevant context chunks."""
    structured_llm = llm.with_structured_output(LearningVerification)
    current_checkpoint = state['current_checkpoint']
    checkpoint_info = state['checkpoints'].checkpoints[current_checkpoint]
    
    context = context_store.get_context(state['context_key'])
    chunks = context['chunks']
    chunk_embeddings = context['embeddings']
    
    query = embeddings.embed_query(checkpoint_info.verification)
    
    similarities = cosine_similarity([query], chunk_embeddings)[0]
    top_3_indices = sorted(range(len(similarities)), 
                         key=lambda i: similarities[i], 
                         reverse=True)[:3]
    relevant_chunks = [chunks[i] for i in top_3_indices]
    
    messages = [
        answer_verifier,
        HumanMessage(content=f"""
        Question: {state['current_question']}
        Answer: {state['current_answer']}
        
        Checkpoint Description: {checkpoint_info.description}
        Success Criteria:
        {chr(10).join(f"- {c}" for c in checkpoint_info.criteria)}
        Verification Method: {checkpoint_info.verification}
        
        Context:
        {chr(10).join(relevant_chunks)}
        
        Assess the answer.""")
    ]
    
    verification = structured_llm.invoke(messages)
    return {"verifications": verification}
    
def teach_concept(state: LearningtState):
    """Creates simplified Feynman-style explanations for concepts that need reinforcement."""
    structured_llm = llm.with_structured_output(FeynmanTeaching)
    current_checkpoint = state['current_checkpoint']
    checkpoint_info = state['checkpoints'].checkpoints[current_checkpoint]
    
    messages = [
        feynman_teacher,
        HumanMessage(content=f"""
        Criteria: {checkpoint_info.criteria}
        Verification: {state['verifications']}
        
        Context:
        {state['context_chunks']}
        
        Create a Feynman teaching explanation.""")
    ]
    
    teaching = structured_llm.invoke(messages)
    return {"teachings": teaching}

## Helper State Management Functions
Here we define two auxiliary functions that manage the learning flow:

1. user_answer: Placeholder for collecting user responses to verification questions
2. next_checkpoint: Increments the checkpoint counter to progress through learning milestones

In [16]:
def user_answer(state: LearningtState):
    """Placeholder for handling user's answer input."""
    pass

def next_checkpoint(state: LearningtState):
    """Advances to the next checkpoint in the learning sequence."""
    current_checkpoint = state['current_checkpoint'] + 1
    return {'current_checkpoint': current_checkpoint}
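
Because next_checkpoint increments the counter unconditionally, whatever routes between nodes has to decide when the sequence is finished. A minimal sketch of such a bounds check follows; `has_more_checkpoints` is a hypothetical helper, and plain dicts stand in for the graph state.

```python
def next_checkpoint(state: dict) -> dict:
    """Same increment as above, written against a plain dict."""
    return {"current_checkpoint": state["current_checkpoint"] + 1}


def has_more_checkpoints(state: dict, total: int) -> bool:
    """True while advancing would still land on a valid checkpoint index."""
    return state["current_checkpoint"] + 1 < total


state = {"current_checkpoint": 0}
total = 3  # e.g. the three checkpoints requested in the generator prompt
visited = [state["current_checkpoint"]]
while has_more_checkpoints(state, total):
    state.update(next_checkpoint(state))
    visited.append(state["current_checkpoint"])
```

Checking the bound before advancing keeps `current_checkpoint` a valid index into the checkpoint list, so question generation never reads past the last checkpoint.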