Feature/rag backend #521

Merged
bedanley merged 32 commits into rag20 from feature/rag-backend
Oct 28, 2025

Conversation

Contributor

@bedanley bedanley commented Oct 21, 2025

Upgrading RAG to support Collections

Overview

This PR adds collection management to LISA's RAG functionality. Users can organize documents into collections, each with its own chunking strategy, access controls, and metadata, without requiring infrastructure changes.

Key Features

1. Collection Management

  • Create/Read/Update/Delete collections within vector stores
  • Hierarchical organization with inheritance from parent repositories
  • Flexible chunking strategies per collection (FIXED_SIZE, with extensibility for SEMANTIC/RECURSIVE)
  • Access control with group-based permissions and private collections
  • Metadata tagging for organization and filtering

2. Enhanced Document Ingestion

  • Collection-aware ingestion with automatic routing
  • Chunking strategy override support during ingestion
  • Backward compatibility with existing model-based collections
  • Improved job tracking with detailed status information

3. Access Control Framework

  • Generic access control system with caching
  • Permission levels: READ, WRITE, ADMIN
  • Group-based access with inheritance
  • Private collections for user-specific data
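The permission model above can be illustrated with a minimal sketch. It is not the code in access_control.py: the function names, the ordering READ < WRITE < ADMIN, the rule that group members receive READ, and the fallback to the parent repository's groups are all assumptions made for illustration.

```python
from enum import IntEnum
from functools import lru_cache
from typing import FrozenSet, Optional


class Permission(IntEnum):
    """Levels named in the PR; the numeric ordering is an assumption."""
    READ = 1
    WRITE = 2
    ADMIN = 3


@lru_cache(maxsize=1024)  # stands in for the "caching" bullet; real cache is unknown
def granted_level(
    user_id: str,
    user_groups: FrozenSet[str],
    is_admin: bool,
    owner_id: str,
    collection_groups: FrozenSet[str],
    repository_groups: FrozenSet[str],
    private: bool,
) -> Optional[Permission]:
    """Return the highest level this user holds, or None for no access."""
    # Assumption: administrators and the owner get full control.
    if is_admin or user_id == owner_id:
        return Permission.ADMIN
    # Private collections are visible only to the owner (and admins).
    if private:
        return None
    # Assumed inheritance rule: an empty group list on the collection
    # falls back to the parent repository's groups.
    allowed = collection_groups or repository_groups
    if user_groups & allowed:
        return Permission.READ  # assumption: membership grants READ only
    return None


def has_permission(required: Permission, **context) -> bool:
    """True when the granted level is at least the required level."""
    level = granted_level(**context)
    return level is not None and level >= required
```

Frozen sets are used for the group arguments so that every argument is hashable and the result can be cached with `functools.lru_cache`.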

4. API Enhancements

  • Collection CRUD endpoints with pagination and filtering
  • Enhanced similarity search with collection support
  • Improved document management with collection context
  • Better error handling and validation

Technical Changes

Database Schema

  • New DynamoDB table: LisaRagCollectionsTable with GSIs for querying
  • Enhanced document table: Added CollectionIndex GSI
  • Collection fields: collectionId, repositoryId, chunkingStrategy, metadata, etc.
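As a rough illustration of a collection item, the sketch below models only the four fields named above. The class name, the helper methods, and the shape of the chunkingStrategy value are hypothetical; the table's key schema and GSI definitions are not described in this PR and are not modeled here.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class CollectionRecord:
    """Hypothetical in-memory view of one LisaRagCollectionsTable item."""
    collectionId: str
    repositoryId: str
    chunkingStrategy: Dict[str, Any]  # assumed shape, e.g. {"type": "FIXED_SIZE"}
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject empty identifiers before anything is written to the table.
        if not self.collectionId or not self.repositoryId:
            raise ValueError("collectionId and repositoryId must be non-empty")

    def to_item(self) -> Dict[str, Any]:
        """Plain dict suitable for passing to a DynamoDB put operation."""
        return asdict(self)

    @classmethod
    def from_item(cls, item: Dict[str, Any]) -> "CollectionRecord":
        """Build a record from a stored item, ignoring unknown attributes."""
        known = {"collectionId", "repositoryId", "chunkingStrategy", "metadata"}
        return cls(**{k: v for k, v in item.items() if k in known})
```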

Backend Components

  • collection_repo.py: DynamoDB operations for collections
  • collection_service.py: Business logic and orchestration
  • collection_validation.py: Validation rules and constraints
  • collection_access_control.py: Permission checking
  • chunking_strategy_factory.py: Extensible chunking strategy system
  • access_control.py: Generic access control framework
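One way an extensible chunking strategy factory could be structured is a registry keyed by strategy name. This is a sketch under stated assumptions, not the contents of chunking_strategy_factory.py: the registry, the decorator, the parameter names chunk_size and chunk_overlap, and character-based splitting are all hypothetical. Only FIXED_SIZE is registered, matching the PR's statement that SEMANTIC and RECURSIVE are future extensions.

```python
from typing import Callable, Dict, List

Chunker = Callable[[str], List[str]]

# Hypothetical registry: strategy name -> function that builds a chunker.
_FACTORIES: Dict[str, Callable[..., Chunker]] = {}


def register(name: str):
    """Decorator that adds a chunker builder to the registry under `name`."""
    def decorator(factory: Callable[..., Chunker]) -> Callable[..., Chunker]:
        _FACTORIES[name] = factory
        return factory
    return decorator


@register("FIXED_SIZE")
def fixed_size(chunk_size: int = 512, chunk_overlap: int = 51) -> Chunker:
    """Split text into windows of chunk_size characters (assumed unit)."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap

    def chunk(text: str) -> List[str]:
        # Each window starts `step` characters after the previous one,
        # so consecutive chunks share `chunk_overlap` characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), step)]

    return chunk


def create_chunker(strategy: Dict[str, object]) -> Chunker:
    """Look up the strategy's "type" and build a chunker from the other keys."""
    params = dict(strategy)
    kind = params.pop("type")
    try:
        factory = _FACTORIES[kind]
    except KeyError:
        raise ValueError(f"Unsupported chunking strategy: {kind}") from None
    return factory(**params)
```

With this layout, adding a new strategy means registering one more builder; callers of `create_chunker` do not change.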

API Endpoints

POST /repository/{repositoryId}/collection
GET /repository/{repositoryId}/collection/{collectionId}
PUT /repository/{repositoryId}/collection/{collectionId}
DELETE /repository/{repositoryId}/collection/{collectionId}
GET /repository/{repositoryId}/collections
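For client code, the routes above can be captured in small path-building helpers. The helper names and the operation labels are hypothetical, and the query parameter shown in the usage note (pageSize) is an assumed name, since the PR mentions pagination and filtering without listing parameters.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Operation label (hypothetical) -> HTTP method, as listed in the PR.
METHODS = {
    "create": "POST",
    "get": "GET",
    "update": "PUT",
    "delete": "DELETE",
    "list": "GET",
}


def collection_path(repository_id: str, collection_id: Optional[str] = None) -> str:
    """Path for creating a collection or addressing a single collection."""
    base = f"/repository/{quote(repository_id, safe='')}/collection"
    if collection_id is None:
        return base
    return f"{base}/{quote(collection_id, safe='')}"


def list_collections_path(repository_id: str, **query: object) -> str:
    """Path for listing collections, with optional query parameters."""
    path = f"/repository/{quote(repository_id, safe='')}/collections"
    return f"{path}?{urlencode(query)}" if query else path
```

For example, `list_collections_path("repo-1", pageSize=10)` yields `/repository/repo-1/collections?pageSize=10`. Identifiers are percent-encoded so that characters such as spaces or slashes cannot alter the route.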

SDK Updates

  • lisapy/collection.py: Collection management methods
  • Enhanced lisapy/rag.py: Collection-aware document operations

Testing

  • Unit tests: Collection validation, access control, chunking strategies
  • Integration tests: End-to-end collection lifecycle with document ingestion
  • Test utilities: Reusable authentication and resource management helpers

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@bedanley bedanley marked this pull request as ready for review October 22, 2025 03:48
Comment thread lambda/repository/collection_validation.py Outdated
Comment thread lambda/repository/ingestion_job_repo.py Outdated
Comment thread lambda/repository/ingestion_job_repo.py Outdated
Comment thread lambda/repository/job_status.py Outdated
Comment thread lambda/repository/job_status.py
Comment thread lambda/repository/job_status.py
Comment thread lambda/repository/lambda_functions.py Outdated
Comment thread lambda/repository/lambda_functions.py Outdated
@bedanley bedanley force-pushed the feature/rag-backend branch from f8ac200 to 8fa1d50 Compare October 23, 2025 21:04
Comment thread lambda/repository/lambda_functions.py Outdated
Comment thread lambda/repository/lambda_functions.py Outdated
Comment thread lambda/repository/lambda_functions.py
Comment thread lambda/repository/lambda_functions.py Outdated
Comment thread lambda/repository/lambda_functions.py
Comment thread lambda/repository/rag_document_repo.py Outdated
Comment thread lambda/repository/collection_service.py Outdated
Comment thread lambda/repository/repository_service.py Outdated
Comment thread lambda/utilities/access_control_helpers.py Outdated
Comment thread test/lambda/test_access_control.py Outdated
Comment thread lib/schema/ragSchema.ts Outdated
Comment thread lambda/utilities/access_control_helpers.py Outdated
Comment thread lambda/utilities/access_control.py Outdated
Comment thread lib/rag/api/repository.ts Outdated
@bedanley bedanley force-pushed the feature/rag-backend branch from 8c1c4f0 to ebc504b Compare October 28, 2025 14:18
@bedanley bedanley merged commit 4399a2a into rag20 Oct 28, 2025
1 check passed
@bedanley bedanley deleted the feature/rag-backend branch October 28, 2025 14:22
bedanley added a commit that referenced this pull request Oct 30, 2025
* Add collection rag schema
* Add collection repo
* Add collection service
* Add collections CRUD API
* Add Collections Table
* Update document ingestion using collections
* Update delete docs from collections
* Add sdk and collection tests
* Add RAG Collection API and Tests
bedanley added a commit that referenced this pull request Nov 11, 2025
* Add collection rag schema
* Add collection repo
* Add collection service
* Add collections CRUD API
* Add Collections Table
* Update document ingestion using collections
* Update delete docs from collections
* Add sdk and collection tests
* Add RAG Collection API and Tests
bedanley added a commit that referenced this pull request Nov 12, 2025
* Feature/rag backend (#521)
* Add collection rag schema
* Add collection repo
* Add collection service
* Add collections CRUD API
* Add Collections Table
* Update document ingestion using collections
* Update delete docs from collections
* Add sdk and collection tests
* Add RAG Collection API and Tests
* rag collection UI
* default collection deletion

3 participants