A lab project for experimenting with context window management, conversation visualization, and RAG operations for LLM interactions.
```
lab/
├── chatgpt_client.py      # Simple ChatGPT client (the "boss")
├── smithers.py            # Assistant LLM with context management tools
├── context_manager.py     # CRUD operations for context windows
├── visualizer.py          # Conversation graph visualizer
├── benchmarks/            # Long-context benchmarking suite
│   ├── base_benchmark.py      # Base class for benchmarks
│   ├── needle_in_haystack.py  # Needle in haystack benchmark
│   ├── oolong.py              # OOLONG benchmark
│   ├── oolong_pairs.py        # OOLONG PAIRS benchmark
│   ├── codeqa.py              # CodeQA benchmark
│   ├── browsecomp.py          # BrowseComp+ benchmark
│   └── benchmark_runner.py    # Benchmark execution engine
├── examples/              # Example scripts
│   ├── example_usage.py
│   └── demo_visualizer.py
├── tests/                 # Test files
│   └── test_all.py
├── requirements.txt       # Python dependencies
├── .env                   # API keys (create this file)
└── README.md              # This file
```
## Setup

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Create a `.env` file:

   ```bash
   echo "OPENAI_API_KEY=your_api_key_here" > .env
   ```
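For illustration, `python-dotenv` (listed in `requirements.txt`) does essentially the following at startup; this stdlib-only stand-in shows the mechanism, not the library's actual implementation:

```python
import os

def load_env_file(path=".env"):
    """Minimal stand-in for python-dotenv's load_dotenv: parse KEY=value lines."""
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    os.environ.setdefault(key.strip(), value.strip())
    except FileNotFoundError:
        pass  # no .env present; fall back to the existing environment

load_env_file()
api_key = os.environ.get("OPENAI_API_KEY")
```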
## chatgpt_client.py

Simple client for direct ChatGPT interactions.

Usage:

```bash
# Command line
python chatgpt_client.py "Hello, how are you?"

# Interactive mode
python chatgpt_client.py
```

## context_manager.py

Core library for managing context windows with full CRUD operations.
Features:
- Create, Read, Update, Delete context entries
- Search context by text
- Compact/summarize context ranges
- Save/load context to JSON
- Statistics and analytics
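The search and persistence features can be pictured in miniature as follows. This is a standalone toy class, not the actual `ContextManager` implementation; the method names mirror the CLI commands:

```python
import json

class MiniContextManager:
    """Toy stand-in for context_manager.ContextManager (illustration only)."""

    def __init__(self):
        self.entries = []  # each entry: {"role": ..., "content": ...}

    def create(self, role, content):
        self.entries.append({"role": role, "content": content})

    def search(self, query):
        # Case-insensitive substring match over entry content.
        return [e for e in self.entries if query.lower() in e["content"].lower()]

    def save(self, filepath):
        with open(filepath, "w") as f:
            json.dump(self.entries, f, indent=2)

    def load(self, filepath):
        with open(filepath) as f:
            self.entries = json.load(f)

cm = MiniContextManager()
cm.create("user", "Tell me about RAG")
cm.create("assistant", "RAG retrieves documents before generating.")
hits = cm.search("rag")  # matches both entries
```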
Usage:

```python
from context_manager import ContextManager

cm = ContextManager()
cm.create("user", "Hello")
cm.create("assistant", "Hi there!")
entry = cm.read(index=0)
cm.update(0, content="Hello world")
cm.delete(index=1)
```

## smithers.py

Assistant LLM that helps manage context windows and performs RAG operations.
Features:
- Chat with automatic context management
- RAG (Retrieval-Augmented Generation)
- Context compaction using AI
- Full CRUD operations via CLI
- Integration with visualizer
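The RAG step amounts to retrieving relevant stored context and prepending it to the prompt before calling the model. A hypothetical sketch of that flow (Smithers' actual retrieval may differ; the function names here are illustrative):

```python
def retrieve(entries, query, k=2):
    """Rank stored context entries by word overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    scored = sorted(
        entries,
        key=lambda e: len(q_words & set(e["content"].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(entries, query):
    """Prepend the top-k retrieved snippets to the user question."""
    context = "\n".join(e["content"] for e in retrieve(entries, query))
    return f"Context:\n{context}\n\nQuestion: {query}"

entries = [
    {"role": "user", "content": "context windows limit how much text a model sees"},
    {"role": "assistant", "content": "RAG retrieves relevant documents at query time"},
    {"role": "user", "content": "the weather is nice today"},
]
prompt = build_rag_prompt(entries, "how does RAG use context windows")
```

The augmented prompt would then be sent to the chat model in place of the raw question.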
Usage:

```bash
python smithers.py
```

Commands:

- `chat <message>` - Chat with Smithers
- `create <role> <content>` - Add context entry
- `read [index|start:end|role]` - Read context
- `update <index> <field> <value>` - Update entry
- `delete <index|start:end|role>` - Delete entries
- `search <query>` - Search context
- `compact [start:end]` - Compact context
- `stats` - Show context statistics
- `save <filepath>` - Save context to file
- `load <filepath>` - Load context from file
- `rag <query>` - RAG-enhanced chat
- `visualize [output.png] [start:end]` - Visualize conversation graph
- `exit`/`quit` - Exit
Programmatic Usage:

```python
from smithers import Smithers

smithers = Smithers()
response = smithers.chat("What is 2+2?")
cm = smithers.get_context_manager()
```

## visualizer.py

Creates graph visualizations of conversations with node sizes based on word count.
Features:
- Top-down hierarchical layout
- Node size = word count
- Color-coded by role (user/assistant/system)
- Save to PNG/PDF or display interactively
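The word-count sizing and chain layout can be sketched without the plotting libraries. A minimal illustration (the scaling constants are hypothetical, not the values `ConversationVisualizer` uses):

```python
def node_size(content, base=300, per_word=50):
    """Node area grows with the message's word count (hypothetical constants)."""
    return base + per_word * len(content.split())

def conversation_edges(messages):
    """Chain each message to its predecessor for a top-down layout."""
    return [(i - 1, i) for i in range(1, len(messages))]

messages = ["Hi", "Tell me about long context windows", "They cap the model's input."]
sizes = [node_size(m) for m in messages]       # [350, 600, 550]
edges = conversation_edges(messages)           # [(0, 1), (1, 2)]
```

Values like these would typically be passed to networkx/matplotlib drawing calls as node sizes and edge lists.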
Usage:

```bash
# From saved context file
python visualizer.py context.json output.png
```

```python
# Programmatically
from visualizer import ConversationVisualizer
from context_manager import ContextManager

cm = ContextManager()
# ... add context ...
viz = ConversationVisualizer(cm)
viz.visualize(output_file="graph.png")
```

From Smithers:
```bash
python smithers.py
> visualize conversation.png
> visualize output.png 0:10   # Specific range
```

## Quick Start
1. Set up environment:

   ```bash
   pip install -r requirements.txt
   echo "OPENAI_API_KEY=your_key" > .env
   ```

2. Try the simple client:

   ```bash
   python chatgpt_client.py "Hello!"
   ```

3. Use Smithers with context management:

   ```bash
   python smithers.py
   > chat Hello, I'm working on a project
   > chat Tell me about context windows
   > stats
   > visualize conversation.png
   ```

4. Run benchmarks:

   ```bash
   python benchmarks/benchmark_runner.py --benchmark needle_in_haystack --context-lengths 1000 5000
   ```

5. See examples:

   ```bash
   python examples/example_usage.py
   python examples/demo_visualizer.py
   ```
```bash
# 1. Start Smithers
python smithers.py

# 2. Have a conversation
> chat What are context windows?
> chat How do I manage large contexts?
> chat Explain RAG

# 3. Check your context
> stats
> read

# 4. Search for specific topics
> search RAG

# 5. Visualize the conversation
> visualize conversation_graph.png

# 6. Save for later
> save my_conversation.json

# 7. Load and continue
> load my_conversation.json
> chat Continue our discussion
```

## Dependencies

- `openai` - OpenAI API client
- `python-dotenv` - Environment variable management
- `networkx` - Graph creation (for visualizer)
- `matplotlib` - Plotting (for visualizer)
Note: Visualization features are optional. The rest of the system works without networkx and matplotlib installed.
## Benchmarks

Comprehensive benchmarking system for evaluating long-context window performance.
Available Benchmarks:
- Needle in Haystack: Find specific information in very long contexts
- OOLONG: Out-Of-LOng-context Needle - information at various positions
- OOLONG PAIRS: Find and relate pairs of information across long contexts
- CodeQA: Answer questions about code in long contexts
- BrowseComp+: Browse and comprehend information across long documents
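At its core, a needle-in-haystack test case buries one fact in filler text and checks the model's answer for it. A minimal sketch of the idea (the real `NeedleInHaystackBenchmark` generation and scoring are more elaborate; the needle text and sizing here are illustrative):

```python
def generate_test_case(context_length=1000, needle_position="middle"):
    """Bury one 'needle' fact in filler prose of roughly context_length words."""
    needle = "The secret number is 7481."
    filler = "The sky was a pale shade of grey that morning. "
    n_fill = max(context_length // len(filler.split()), 1)
    pos = {"start": 0, "middle": n_fill // 2, "end": n_fill}[needle_position]
    parts = [filler] * n_fill
    parts.insert(pos, needle + " ")
    return {
        "context": "".join(parts),
        "question": "What is the secret number?",
        "expected_answer": "7481",
    }

def evaluate(response, expected_answer):
    """Simplest scoring rule: does the reply contain the expected answer?"""
    return expected_answer in response

case = generate_test_case(context_length=500, needle_position="middle")
```

Accuracy at each context length is then just the fraction of runs where `evaluate` returns `True`.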
Usage:

```bash
# Run all benchmarks
python benchmarks/benchmark_runner.py --benchmark all --context-lengths 1000 5000 10000 20000

# Run specific benchmark
python benchmarks/benchmark_runner.py --benchmark needle_in_haystack --context-lengths 5000 10000 --runs 5

# Save results
python benchmarks/benchmark_runner.py --benchmark all --output results.json
```

Programmatic Usage:
```python
from benchmarks.benchmark_runner import BenchmarkRunner

runner = BenchmarkRunner(model="gpt-4o")
results = runner.run_benchmark(
    "needle_in_haystack",
    context_lengths=[1000, 5000, 10000],
    num_runs=3,
)
runner.save_results(results, "results.json")
```

Individual Benchmark Usage:
```python
from benchmarks.needle_in_haystack import NeedleInHaystackBenchmark

benchmark = NeedleInHaystackBenchmark()
test_case = benchmark.generate_test_case(context_length=5000, needle_position="middle")
# Use test_case["context"] and test_case["question"]
# Evaluate with: benchmark.evaluate(response, test_case["expected_answer"])
```

## Notes

- This is a lab project for experimentation with long-context windows
- Context is stored in memory by default (use `save`/`load` for persistence)
- All tools are designed to be simple and understandable
- The visualizer requires networkx and matplotlib (install separately if needed)
- Benchmarks are designed to test context window limits and retrieval capabilities
## File Overview

- `chatgpt_client.py`: Simple ChatGPT interface (the "boss")
- `smithers.py`: Assistant with context management tools
- `context_manager.py`: Core CRUD operations for context arrays
- `visualizer.py`: Graph visualization of conversations
- `benchmarks/base_benchmark.py`: Base class for all benchmarks
- `benchmarks/needle_in_haystack.py`: Needle in haystack benchmark
- `benchmarks/oolong.py`: OOLONG benchmark
- `benchmarks/oolong_pairs.py`: OOLONG PAIRS benchmark
- `benchmarks/codeqa.py`: CodeQA benchmark
- `benchmarks/browsecomp.py`: BrowseComp+ benchmark
- `benchmarks/benchmark_runner.py`: Benchmark execution engine
- `examples/example_usage.py`: Code examples for all features
- `examples/demo_visualizer.py`: Demo script for visualization
- `tests/test_all.py`: Comprehensive test suite