14 changes: 14 additions & 0 deletions .env.sample
@@ -0,0 +1,14 @@
# PostgreSQL Configuration
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=json_rag
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres

# ChromaDB Configuration
CHROMA_HOST=localhost
CHROMA_PORT=8000
CHROMA_COLLECTION=json_chunks

# Vector Store Selection
VECTOR_STORE=postgres # Options: postgres, chroma
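
As a rough illustration of how these variables might be consumed, here is a minimal sketch using only the standard library. The `Settings` dataclass, the `load_settings` helper, and the fallback defaults (copied from the sample values above) are assumptions for illustration and are not part of this change. Some env-file loaders keep a trailing `# ...` comment as part of the value, so the sketch strips it from `VECTOR_STORE` before validating.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Allowed values, taken from the "Options" comment in .env.sample.
VALID_STORES = {"postgres", "chroma"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables defined in .env.sample."""
    postgres_host: str
    postgres_port: int
    postgres_db: str
    postgres_user: str
    postgres_password: str
    chroma_host: str
    chroma_port: int
    chroma_collection: str
    vector_store: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, validating VECTOR_STORE."""
    # Drop an inline " # comment" in case the loader left it in the value.
    raw_store = env.get("VECTOR_STORE", "postgres")
    store = raw_store.split("#", 1)[0].strip().lower()
    if store not in VALID_STORES:
        raise ValueError(
            f"VECTOR_STORE must be one of {sorted(VALID_STORES)}, got {store!r}"
        )
    return Settings(
        postgres_host=env.get("POSTGRES_HOST", "localhost"),
        postgres_port=int(env.get("POSTGRES_PORT", "5432")),
        postgres_db=env.get("POSTGRES_DB", "json_rag"),
        postgres_user=env.get("POSTGRES_USER", "postgres"),
        postgres_password=env.get("POSTGRES_PASSWORD", "postgres"),
        chroma_host=env.get("CHROMA_HOST", "localhost"),
        chroma_port=int(env.get("CHROMA_PORT", "8000")),
        chroma_collection=env.get("CHROMA_COLLECTION", "json_chunks"),
        vector_store=store,
    )
```

Passing the mapping explicitly (rather than always reading `os.environ`) keeps the helper easy to test and lets callers layer their own `.env` parsing on top.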
13 changes: 13 additions & 0 deletions README.md
@@ -11,6 +11,13 @@ A tool for efficiently loading and integrating nested JSON data structures into
- State transitions and system conditions
- Hybrid search combining vector similarity, relationships, and filters

* **Intelligent JSON Chunking**:
- Structure-aware chunking preserving JSON hierarchy
- Semantic boundary detection for meaningful chunks
- Context-retaining chunk sizes optimized for embedding
- Overlap control to maintain continuity between chunks
- Metadata preservation across chunk boundaries

* **Smart Data Processing**:
- Automatic entity detection and relationship mapping
- Cross-file relationship detection and validation
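
The "overlap control" bullet above can be pictured as a sliding window over a sequence of items, where consecutive windows share a few elements. The following is only a sketch of that general idea; `window_with_overlap` and its parameters are hypothetical names, and the README does not say how `parsing.py` actually implements overlap.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def window_with_overlap(
    items: Sequence[T], size: int, overlap: int
) -> Iterator[List[T]]:
    """Yield windows of up to `size` items; neighbours share `overlap` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    for start in range(0, len(items), step):
        yield list(items[start:start + size])
        # Stop once a window has reached the end, so no window is
        # made up solely of items already covered by the previous one.
        if start + size >= len(items):
            break
```

For example, `list(window_with_overlap([1, 2, 3, 4, 5], size=3, overlap=1))` gives `[[1, 2, 3], [3, 4, 5]]`: the shared `3` is what carries continuity from one chunk to the next.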
@@ -98,6 +105,12 @@ The codebase is organized into logical modules:
- **analysis/**: Modules for analyzing data patterns, cross-file relationships, and user intent
- **core/**: Core system configuration and shared components
- **processing/**: Data processing and relationship detection modules
  - The `parsing.py` module implements intelligent chunking:
    - Recursively analyzes JSON structure to identify natural semantic boundaries
    - Preserves parent-child relationships during chunking
    - Maintains context by tracking paths and ancestry for each chunk
    - Applies size constraints while respecting semantic coherence
    - Assigns unique IDs and metadata to chunks for relationship tracking
- **retrieval/**: Relationship-aware search and context assembly
- **storage/**: Database interaction and relationship persistence
- **utils/**: Shared utility functions and helpers
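
To make the chunking behaviour described for `parsing.py` more concrete, here is a minimal sketch of a structure-aware chunker. It is not the module's actual code: the `Chunk` dataclass, the `chunk_json` function, the `max_chars` size limit, and the `$.key[0]` path notation are all assumptions chosen for illustration. A node that fits within the limit becomes a single chunk; a larger object or array is split along its own keys or elements, and each child records its path and its parent's ID.

```python
import json
import uuid
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Chunk:
    id: str                   # unique ID for relationship tracking
    path: str                 # location in the document, e.g. "$.events[0]"
    parent_id: Optional[str]  # ID of the enclosing chunk, None at the root
    content: Any              # the JSON fragment, or a summary stub


def chunk_json(
    node: Any,
    max_chars: int = 200,
    path: str = "$",
    parent_id: Optional[str] = None,
) -> List[Chunk]:
    """Recursively split `node` along object/array boundaries."""
    size = len(json.dumps(node, sort_keys=True))
    is_container = isinstance(node, (dict, list))
    # Small nodes, scalars, and empty containers are kept whole.
    # (An oversized string is also kept whole in this sketch.)
    if size <= max_chars or not is_container or not node:
        return [Chunk(uuid.uuid4().hex, path, parent_id, node)]

    # Too large: emit a stub for this node so its children can reference it.
    chunk_id = uuid.uuid4().hex
    if isinstance(node, dict):
        children = [(f"{path}.{key}", value) for key, value in node.items()]
        stub = {"keys": list(node.keys())}
    else:
        children = [(f"{path}[{i}]", value) for i, value in enumerate(node)]
        stub = {"length": len(node)}

    chunks = [Chunk(chunk_id, path, parent_id, stub)]
    for child_path, child in children:
        chunks.extend(chunk_json(child, max_chars, child_path, chunk_id))
    return chunks
```

Because every chunk carries its `path` and `parent_id`, a retrieval layer could walk back up the ancestry to reassemble context around a matching chunk.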