# Memory: A Deep Dive: Part I (Semi-Autonomous Schema Generation)


## Overview
This code implements a semi-autonomous memory schema generation system that combines Large Language Models (LLMs) and graph-based workflows to create structured memory representations. The system is built on LangGraph and uses GPT-4o mini to generate memory schemas across four fundamental memory types: procedural, semantic, episodic, and prospective. Each generated schema follows a hierarchical structure (type → label → description) and can be iteratively refined through user interaction.

The implementation demonstrates a practical approach to automated knowledge structuring while maintaining human oversight. It showcases how modern AI tools can be used to systematically generate complex memory structures while allowing for human guidance and quality control in the generation process.

## Motivation
Several key factors motivate this implementation:

1. **Knowledge Structure Automation**
   - Manual creation of memory schemas is time-consuming and often inconsistent
   - Automated generation can significantly speed up the initial structuring of knowledge
   - LLMs can provide sophisticated, context-aware schema suggestions

2. **Flexible Control Flow**
   - The need for systems that balance automation with human oversight
   - Importance of iterative refinement in knowledge structure development
   - Value of maintaining schema history and evolution

3. **Standardized Memory Organization**
   - Need for consistent categorization across different types of memories
   - Importance of structured approaches to knowledge representation
   - Benefits of using established memory type categories (procedural, semantic, episodic, prospective)

4. **Interactive Development**
   - Recognition that initial automated outputs may need refinement
   - Value of human expertise in guiding schema development
   - Importance of flexible iteration in knowledge structure development

This implementation addresses these motivations by providing a framework that combines automated generation with human oversight, allowing for systematic development of memory schemas while maintaining quality control and flexibility.

## Key Components
1. **State Management System**: 
   - Maintains the workflow state using Pydantic models
   - Tracks the initial idea, generated memory schemas, and update instructions
   - Ensures type safety and data validation

2. **LLM Integration**: 
   - Leverages OpenAI's GPT-4o mini model (gpt-4o-mini)
   - Generates structured memory schemas based on four memory types:
     - Episodic: For experience-based memories
     - Semantic: For factual knowledge
     - Procedural: For process-related information
     - Prospective: For future-oriented planning

3. **Graph-based Workflow**: 
   - Uses LangGraph's StateGraph for process orchestration
   - Defines clear node functions for schema generation and updates
   - Implements conditional logic for workflow control

4. **Interactive Feedback Loop**:
   - Allows user intervention in the schema generation process
   - Supports iterative refinement through user instructions
   - Provides flexibility to end or continue the generation process


## Method
The system follows this iterative approach:

1. **Initialization**:
   - Takes an initial idea as input
   - Sets up the state management system
   - Initializes the LLM client

2. **Schema Generation**:
   - Generates structured memory schemas using the LLM
   - Follows a specific format: memory_type → memory_label → memory_description
   - Maintains previous iterations in state

3. **User Interaction**:
   - Presents generated schemas to the user
   - Collects feedback on whether to continue iteration
   - Accepts new instructions for schema refinement

4. **Iteration Control**:
   - Continues or terminates based on user input
   - Maintains schema history across iterations
   - Allows for incremental improvements
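
The four steps above can be sketched in plain Python, without an LLM or a graph library. This is only an illustration of the control flow: `fake_generate` stands in for the model call and the scripted `answers` iterator stands in for typed user input. Both names are hypothetical and do not appear in the notebook.

```python
from typing import Callable, Iterator, Optional


def fake_generate(idea: str, history: list[str], instructions: Optional[str]) -> str:
    """Hypothetical stand-in for the LLM call; returns one schema string."""
    note = f" ({instructions})" if instructions else ""
    return f"# semantic ## {idea}_facts ### Iteration {len(history) + 1}{note}"


def run_loop(
    idea: str,
    answers: Iterator[str],
    generate: Callable[[str, list[str], Optional[str]], str] = fake_generate,
) -> list[str]:
    """Generate, ask whether to iterate, collect instructions, repeat."""
    history: list[str] = []
    instructions: Optional[str] = None
    while True:
        # Schema generation: every iteration is appended to the history.
        history.append(generate(idea, history, instructions))
        # User interaction: a negative answer ends the loop.
        if next(answers).strip().lower() in {"no", "n"}:
            return history
        # Iteration control: the next answer becomes the new instructions.
        instructions = next(answers)


# Scripted session: iterate once with new instructions, then stop.
schemas = run_loop("tutoring", iter(["yes", "track curiosities", "no"]))
for schema in schemas:
    print(schema)
```

In the notebook itself, the same loop is expressed as two LangGraph nodes joined by a conditional edge rather than as a `while` loop.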

## Visual Overview
A flowchart representing the design and flow of the workflow.

<div style="max-width:400px;">
    
![image.png](../images/schema_generation.png)
    
</div>

## Conclusion
This implementation demonstrates a practical approach to semi-autonomous memory schema generation. The system combines the strengths of large language models with human oversight, allowing for iterative refinement of memory schemas. The graph-based workflow provides a structured yet flexible framework for schema generation, while the interactive component ensures quality control through human feedback.

Key advantages of this approach include:
- Structured schema generation following established memory types
- Flexible iteration based on user needs
- Maintainable state management
- Clear separation of concerns between generation and control flow

Future improvements could focus on:
- Enhanced schema validation
- Persistent storage of generated schemas
- More sophisticated iteration strategies
- Advanced conflict resolution in schema updates
- Integration with larger agent systems

This system provides a foundation for building more complex memory management systems in AI agents, particularly in applications requiring structured knowledge representation and human oversight.

# Dependencies and Imports
Install dependencies and import libraries.

In [1]:
%%capture

!pip install langgraph
!pip install langgraph-sdk
!pip install langgraph-checkpoint-sqlite
!pip install langchain-community
!pip install langchain-core
!pip install langchain-openai

In [2]:
from langchain_core.prompts import ChatPromptTemplate
from langgraph.graph import StateGraph, END
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

from pydantic import BaseModel
from typing import Optional

import os


## Clients
Import API keys and instantiate clients.

In [3]:
os.environ['OPENAI_API_KEY'] = 'YOUR-API-KEY'
llm = ChatOpenAI(model='gpt-4o-mini')

## Define Agent State
We'll define the state that our agent will maintain throughout its operation.


In [4]:
class State(BaseModel):
    idea: str
    memory_schema: list[str] = []
    update_instructions: Optional[str] = None
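
As a quick check of what the Pydantic model gives us, the sketch below repeats the same `State` definition so it can run on its own. It then shows that a missing `idea` is rejected and that each instance gets its own `memory_schema` list. It assumes `pydantic` is installed, and the sample ideas and schema string are made up for illustration.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class State(BaseModel):
    idea: str
    memory_schema: list[str] = []
    update_instructions: Optional[str] = None


# A required field that is missing fails validation.
try:
    State()
    missing_idea_rejected = False
except ValidationError:
    missing_idea_rejected = True

# Pydantic copies mutable defaults, so instances do not share one list.
first = State(idea="tutoring agents")
second = State(idea="travel planner")
first.memory_schema.append("# semantic ## example ### placeholder")

print(missing_idea_rejected)
print(first.memory_schema)
print(second.memory_schema)
```

Because defaults are copied per instance, the bare `[]` default is safe here, unlike a mutable default argument in a regular Python function.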

## Define Node Functions
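Now we'll define the two node functions that our agent will use: schema_generation_node, which asks the LLM for a new schema iteration, and update_instrctions_node, which collects new instructions from the user.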


In [5]:
def schema_generation_node(state: State):
    ''' Generate Memory Schema '''
    # Each literal ends with a newline so the concatenated prompt keeps its sentences apart.
    prompt = ChatPromptTemplate.from_template(
        'You are tasked with generating a memory schema for {idea} based on the previous iteration of the schema.\n'
        'If no previous iteration exists, create the first one.\n'
        'You must choose one of the following memory types: ["episodic", "semantic", "procedural", "prospective"]\n'
        'Previous Iteration: {memory_schema}\n'
        'Follow these instructions: {update_instructions}\n'
        'Response Format: # memory_type ## memory_label ### memory_description'
    )
    message = HumanMessage(content=prompt.format(idea=state.idea, memory_schema=state.memory_schema, update_instructions=state.update_instructions))
    memories = llm.invoke([message]).content.strip()

    # Append the new iteration so the full history stays in state.
    state.memory_schema.append(memories)
    return state


def update_instrctions_node(state: State):
    ''' Update Generation Instructions '''
    new_instructions = input('Please provide updated instructions')
    state.update_instructions = new_instructions
    return state
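
The node stores each LLM response as a raw string. One way to check that a response really follows the `# memory_type ## memory_label ### memory_description` format is to parse it into records. The sketch below is an assumption about how that could be done, not part of the notebook: `MemoryEntry`, `parse_schema`, and `MEMORY_TYPES` are hypothetical names, and the sample text abbreviates the descriptions from the run shown later.

```python
import re
from typing import NamedTuple

# The four memory types the prompt allows.
MEMORY_TYPES = {"episodic", "semantic", "procedural", "prospective"}
# One to three '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,3})\s+(.+?)\s*$")


class MemoryEntry(NamedTuple):
    memory_type: str
    label: str
    description: str


def parse_schema(text: str) -> list[MemoryEntry]:
    """Parse '# type', '## label', '### description' lines into entries."""
    entries: list[MemoryEntry] = []
    memory_type = label = None
    for line in text.splitlines():
        match = HEADING.match(line.strip())
        if not match:
            continue  # skip blank lines and anything outside the format
        level, value = len(match.group(1)), match.group(2)
        if level == 1:
            if value not in MEMORY_TYPES:
                raise ValueError(f"unknown memory type: {value!r}")
            memory_type, label = value, None
        elif level == 2:
            label = value
        elif memory_type and label:
            # A description only counts once a type and a label were seen.
            entries.append(MemoryEntry(memory_type, label, value))
    return entries


sample = """# episodic
## agent_interaction_history
### A record of individual interactions between tutors and students.

# semantic
## tutoring_knowledge_base
### A repository of educational resources and concepts.
"""

entries = parse_schema(sample)
for entry in entries:
    print(entry.memory_type, entry.label)
```

A parser like this could back the "enhanced schema validation" idea: a response that uses an unknown memory type raises an error instead of silently entering the history.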
    

## Define Edge Functions
Now we'll define the conditional edge function that our agent will use to control the workflow.

In [6]:
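def update_schema(state: State):
    ''' Show the schema history, then route to END or back to the instructions node '''
    print('Proposed Schema:')
    print('----------------')

    for element in state.memory_schema:
        print(element)
    # Normalize the answer so inputs such as 'No' or ' n ' also end the loop.
    instructions = input('Do you wish to iterate over this schema? (yes or no)').strip().lower()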
    
    if instructions in ['no', 'n', 'quit', 'q', 'exit', 'e']:
        print('Final Schema:')
        print('-------------')
        for element in state.memory_schema:
            print(element)
            
        return END
    else:
        return 'update_instrctions_node'
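
Because the edge function mixes printing, `input()`, and routing, it is hard to test in isolation. A possible refactoring, sketched here as an assumption rather than as the notebook's design, moves the routing decision into a pure function. `route` and `STOP_WORDS` are hypothetical names, and the local `END` constant is assumed to mirror the value of `langgraph.graph.END` so the example runs without LangGraph installed.

```python
# Answers that end the workflow, taken from the edge function above.
STOP_WORDS = {"no", "n", "quit", "q", "exit", "e"}
# Assumption: mirrors the sentinel exported as langgraph.graph.END.
END = "__end__"


def route(answer: str) -> str:
    """Return END for a stop word, otherwise the name of the next node."""
    if answer.strip().lower() in STOP_WORDS:
        return END
    return "update_instrctions_node"


print(route("  No "))
print(route("yes"))
```

With this split, the edge function would only read the answer and return `route(answer)`. Note that any input outside the stop list, including an empty string, continues the iteration.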
    

## Build Workflow
Now we'll create our workflow and compile it.


In [7]:
builder = StateGraph(State)

# Add nodes to the graph
builder.add_node('schema_generation_node', schema_generation_node)
builder.add_node('update_instrctions_node', update_instrctions_node)

# Add edges to the graph
builder.set_entry_point('schema_generation_node')
builder.add_conditional_edges('schema_generation_node', update_schema)
builder.add_edge('update_instrctions_node', 'schema_generation_node')

# Compile the graph
graph = builder.compile()

# Main Function
Define the function that instantiates the state and runs the workflow.

In [8]:
def run_schema_generator(idea: str):
    state = State(idea=idea)
    
    for output in graph.stream(state):
        pass

# Run Program
Call the main function and observe the outputs.

In [9]:
run_schema_generator(idea='a set of agents that provide private tutoring online')


Proposed Schema:
----------------
# episodic  
## agent_interaction_history  
### A record of individual interactions between tutors and students, including session dates, topics covered, student progress notes, and feedback from both students and tutors. This memory type allows agents to recall specific past tutoring experiences, enhancing personalized learning and rapport building.  

# semantic  
## tutoring_knowledge_base  
### A comprehensive repository of educational resources, concepts, and subject matter expertise that tutors can draw upon. This includes definitions, theories, and problem-solving techniques relevant to various subjects, enabling agents to provide accurate and informative responses during sessions.  

# procedural  
## tutoring_session_protocol  
### A set of established procedures and best practices for conducting effective tutoring sessions. This includes guidelines on session structure, engagement techniques, assessment methods, and communication strategies, 

Do you wish to iterate over this schema? (yes or no) yes
Please provide updated instructions Add something to keep track of curiosities.


Proposed Schema:
----------------
# episodic  
## agent_interaction_history  
### A record of individual interactions between tutors and students, including session dates, topics covered, student progress notes, and feedback from both students and tutors. This memory type allows agents to recall specific past tutoring experiences, enhancing personalized learning and rapport building.  

# semantic  
## tutoring_knowledge_base  
### A comprehensive repository of educational resources, concepts, and subject matter expertise that tutors can draw upon. This includes definitions, theories, and problem-solving techniques relevant to various subjects, enabling agents to provide accurate and informative responses during sessions.  

# procedural  
## tutoring_session_protocol  
### A set of established procedures and best practices for conducting effective tutoring sessions. This includes guidelines on session structure, engagement techniques, assessment methods, and communication strategies, 

Do you wish to iterate over this schema? (yes or no) no


Final Schema:
-------------
# episodic  
## agent_interaction_history  
### A record of individual interactions between tutors and students, including session dates, topics covered, student progress notes, and feedback from both students and tutors. This memory type allows agents to recall specific past tutoring experiences, enhancing personalized learning and rapport building.  

# semantic  
## tutoring_knowledge_base  
### A comprehensive repository of educational resources, concepts, and subject matter expertise that tutors can draw upon. This includes definitions, theories, and problem-solving techniques relevant to various subjects, enabling agents to provide accurate and informative responses during sessions.  

# procedural  
## tutoring_session_protocol  
### A set of established procedures and best practices for conducting effective tutoring sessions. This includes guidelines on session structure, engagement techniques, assessment methods, and communication strategies, ensuri

{'schema_generation_node': {'idea': 'a set of agents that provide private tutoring online', 'memory_schema': ["# episodic  \n## agent_tutoring_experience  \n### This memory schema stores specific experiences and interactions that agents have had with students during online tutoring sessions. It includes details such as memorable student questions, breakthroughs in understanding, emotional moments, and unique pedagogical approaches used during sessions. This episodic memory allows agents to recall personal experiences that can inform future tutoring strategies and foster a more personalized learning environment.  \n\n# semantic  \n## subject_knowledge  \n### This memory schema contains general knowledge and facts related to the subjects that agents tutor. It includes definitions, concepts, theories, and relevant examples across various disciplines. This semantic memory helps agents provide accurate information and explanations to students and enhances their ability to answer questions a