This project demonstrates a workflow pattern for building personalized LLM context through conversational data extraction. The system uses an AI agent to conduct interviews, extract contextual information, and format it for use in RAG pipelines.
This project models a workflow where an AI agent conducts structured interviews to build a personal context repository. The implementation:
- Extracts contextual information through conversational interviews
- Outputs structured markdown files suitable for vector database ingestion
- Maintains platform-agnostic data storage for use across different LLM systems
- Provides a reference implementation of agent-driven context collection
- Iteratively build a personal context collection through structured interviews
- Store context in portable markdown format for integration with any vector database
- Compatible with local RAG systems (LlamaIndex, ChromaDB) or cloud platforms (Pinecone, Weaviate)
- Expand context repository through repeated interview sessions
- Training data for personal AI assistants
- Structured documentation of domain expertise
- Team knowledge capture and onboarding materials
- Interview data collection for research purposes
- Vector databases: Pinecone, ChromaDB, Weaviate, Qdrant
- RAG frameworks: LangChain, LlamaIndex
- Custom retrieval pipelines
- Fine-tuning datasets
The agent generates contextually relevant questions based on responses:
The system parses interview transcripts and extracts structured context data suitable for vector database ingestion.
Context is exported as downloadable markdown files:
The markdown format provides a compact, portable structure compatible with most LLM and vector database systems.
The intended workflow involves conducting multiple interview sessions over time, with each session adding to the personal context repository. This incremental approach builds a comprehensive context dataset for RAG-enhanced LLM interactions.
This project was developed through collaboration between Daniel Rosehill and Claude (Anthropic). It demonstrates a workflow pattern for agent-driven context collection and RAG pipeline data preparation.
Built with Streamlit and OpenAI API, implementing an agent-driven interview workflow:
- Interview System: Conversational interface for conducting structured interviews via OpenAI API
- Question Generation: Context-aware follow-up questions based on prior responses
- API Key Management: Local storage and configuration for OpenAI credentials
- Session Management: State preservation across interview sessions
- Context Extraction: LLM-based processing to extract structured context from interview transcripts
- Frontend: Streamlit interface
- Backend Processing:
- OpenAI API integration for conversation and context extraction
- Local file system for configuration and session data
- Markdown generation pipeline
- Data Export: Markdown file generation for extracted context
- Configuration: Local storage for API keys and settings
- Chat interface with conversation history
- Session progress tracking
- Markdown file download
- API key configuration
- Error handling
- Context output formatted for vector database ingestion
- Platform-agnostic markdown storage
- Compatible with RAG frameworks (LangChain, LlamaIndex)
- Suitable for training data or fine-tuning datasets
Development: Claude (Anthropic) Project Direction and Implementation: Daniel Rosehill



