@Jakedismo
Owner

Pull Request

Description

Brief description of the changes and the motivation behind them.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring
  • CI/CD changes

Testing

  • Tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have added integration tests if applicable

Performance Impact

  • No performance impact expected
  • Performance improvement (include benchmarks if applicable)
  • Potential performance regression (explain and provide mitigation)

Security Considerations

  • No security implications
  • Security review completed
  • Dependency security audit passed

Documentation

  • Code is self-documenting with clear variable names and logic
  • Comments added for complex logic
  • README updated if applicable
  • API documentation updated if applicable

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

Related Issues

Fixes #(issue number)
Related to #(issue number)

Screenshots (if applicable)

Add screenshots to help explain the changes.

Additional Notes

Any additional information that would be helpful for reviewers.

…port

This commit adds Jina AI as a new embeddings provider option with the following features:

- Added JinaEmbeddingProvider implementing the EmbeddingProvider trait
- Support for Jina embeddings API (jina-embeddings-v4 model)
- Configurable task parameter (default: code.query for code embeddings)
- Late chunking support enabled by default for improved accuracy
- Integrated reranking functionality using Jina reranker-v3 model
- Reranking is enabled by default when Jina provider is selected
- Added Jina variant to EmbeddingProvider enum
- Created JinaEmbeddingConfig struct with all necessary configuration options
- Added 'jina' feature flag to Cargo.toml
- Registered jina_provider module in lib.rs
- Updated example_embedding.toml with Jina configuration example
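
The defaults listed above could be captured roughly as follows. This is a minimal sketch, not the struct from this PR: the field names and the exact reranker model string (`jina-reranker-v3`) are assumptions for illustration; only the model, task, late chunking, and reranking defaults come from the commit message.

```rust
/// Hypothetical shape of the Jina configuration; field names are illustrative.
#[derive(Debug, Clone, PartialEq)]
pub struct JinaEmbeddingConfig {
    pub model: String,
    pub task: String,
    pub late_chunking: bool,
    pub enable_reranking: bool,
    pub reranker_model: String,
}

impl Default for JinaEmbeddingConfig {
    fn default() -> Self {
        Self {
            model: "jina-embeddings-v4".to_string(),
            // Default task for code embeddings, per the commit message.
            task: "code.query".to_string(),
            // Late chunking and reranking are both on by default.
            late_chunking: true,
            enable_reranking: true,
            // Assumed identifier for the "reranker-v3" model.
            reranker_model: "jina-reranker-v3".to_string(),
        }
    }
}
```

A caller that wants different behaviour would override individual fields, e.g. `JinaEmbeddingConfig { enable_reranking: false, ..Default::default() }`.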

The implementation follows the same pattern as existing providers (OpenAI, Ollama)
with retry logic, batch processing, and proper error handling.

This commit adds comprehensive SurrealDB support as an alternative to RocksDB:

## Configuration System
- Added DatabaseBackend enum to select between RocksDB and SurrealDB
- Created SurrealDbConfig with support for:
  - Multiple connection types (file://, mem://, http://, ws://)
  - Namespace and database selection for multi-tenancy
  - Optional authentication (username/password)
  - Strict mode for schema enforcement
  - Auto-migration on startup
- Updated Settings struct to use new DatabaseConfig structure
- Maintained backward compatibility with legacy rocksdb config
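
To illustrate the connection types listed above, here is a small sketch of classifying a connection string by its scheme. `ConnectionKind`, `connection_kind`, and `requires_server` are hypothetical names, not the PR's API; the sketch recognises only the four schemes named in this commit.

```rust
/// Hypothetical classification of a SurrealDB connection string.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnectionKind {
    File,
    Memory,
    Http,
    WebSocket,
}

/// Maps the URL scheme to a connection kind; unknown or missing schemes yield `None`.
pub fn connection_kind(url: &str) -> Option<ConnectionKind> {
    let (scheme, _rest) = url.split_once("://")?;
    match scheme {
        "file" => Some(ConnectionKind::File),
        "mem" => Some(ConnectionKind::Memory),
        "http" => Some(ConnectionKind::Http),
        "ws" => Some(ConnectionKind::WebSocket),
        _ => None,
    }
}

/// Remote kinds need a running SurrealDB server; file and memory are embedded.
pub fn requires_server(kind: ConnectionKind) -> bool {
    matches!(kind, ConnectionKind::Http | ConnectionKind::WebSocket)
}
```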

## Storage Implementation (surrealdb_storage.rs)
- Implemented GraphStore trait for full CRUD operations
- Features:
  - In-memory caching with DashMap for performance
  - Flexible JSON-based node representation
  - Automatic schema initialization
  - Built-in migration runner
  - Conversion between CodeNode and SurrealDB format
  - Support for embeddings, metadata, and all node attributes

## Schema Manager (surrealdb_schema.rs)
- Flexible schema definition system with:
  - TableSchema, FieldDefinition, and IndexDefinition types
  - Type-safe field type mappings to SurrealDB types
  - Dynamic field and index addition without downtime
  - Schema export/import functionality (JSON format)
  - Helper functions for standard node/edge schemas
- Designed for easy schema evolution and modifications
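
As a rough illustration of how such schema types can translate into SurrealQL, the sketch below renders `DEFINE TABLE`, `DEFINE FIELD`, and `DEFINE INDEX` statements. The type names match the ones mentioned above, but their fields, the `to_statements` method, and the mapping of strict mode to `SCHEMAFULL` are assumptions rather than the code in surrealdb_schema.rs.

```rust
/// Field types in this sketch and the SurrealDB type names they map to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FieldType { String, Int, Float, Bool, Datetime, Object, Array }

impl FieldType {
    pub fn as_surql(self) -> &'static str {
        match self {
            FieldType::String => "string",
            FieldType::Int => "int",
            FieldType::Float => "float",
            FieldType::Bool => "bool",
            FieldType::Datetime => "datetime",
            FieldType::Object => "object",
            FieldType::Array => "array",
        }
    }
}

pub struct FieldDefinition { pub name: String, pub field_type: FieldType }
pub struct IndexDefinition { pub name: String, pub fields: Vec<String>, pub unique: bool }
pub struct TableSchema {
    pub name: String,
    pub schemafull: bool,
    pub fields: Vec<FieldDefinition>,
    pub indexes: Vec<IndexDefinition>,
}

impl TableSchema {
    /// Renders one statement per table, field, and index, in that order.
    pub fn to_statements(&self) -> Vec<String> {
        let mode = if self.schemafull { "SCHEMAFULL" } else { "SCHEMALESS" };
        let mut out = vec![format!("DEFINE TABLE {} {};", self.name, mode)];
        for f in &self.fields {
            out.push(format!(
                "DEFINE FIELD {} ON TABLE {} TYPE {};",
                f.name, self.name, f.field_type.as_surql()
            ));
        }
        for i in &self.indexes {
            let unique = if i.unique { " UNIQUE" } else { "" };
            out.push(format!(
                "DEFINE INDEX {} ON TABLE {} FIELDS {}{};",
                i.name, self.name, i.fields.join(", "), unique
            ));
        }
        out
    }
}
```

Adding a field or index at runtime then amounts to pushing a new definition and executing only the newly generated statement.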

## Migration System (surrealdb_migrations.rs)
- Versioned migration framework with:
  - UP/DOWN migration support for rollbacks
  - Migration checksum verification for integrity
  - Migration status tracking and reporting
  - Automatic migration application
  - Template generation for new migrations
- Includes default migrations for initial schema
- Migration files stored in migrations/ directory
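
The checksum verification and automatic application described above might work along these lines. This is a sketch under stated assumptions: the `Migration` struct, `checksum`, and `pending` are hypothetical, and `DefaultHasher` stands in for whatever hash the real runner uses (a stable cryptographic hash such as SHA-256 would be the safer choice for persisted checksums).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical versioned migration with UP and DOWN scripts.
pub struct Migration {
    pub version: u32,
    pub name: &'static str,
    pub up: &'static str,
    pub down: &'static str,
}

/// Stand-in checksum; not stable across Rust releases, so illustration only.
pub fn checksum(sql: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    sql.hash(&mut hasher);
    hasher.finish()
}

/// Returns unapplied migrations in version order.
/// `applied` maps version -> checksum recorded when that migration ran.
/// Fails if an already-applied migration's UP script has changed since.
pub fn pending<'a>(
    all: &'a [Migration],
    applied: &BTreeMap<u32, u64>,
) -> Result<Vec<&'a Migration>, String> {
    let mut out = Vec::new();
    for m in all {
        match applied.get(&m.version) {
            Some(&recorded) if recorded != checksum(m.up) => {
                return Err(format!("checksum mismatch for migration {} ({})", m.version, m.name));
            }
            Some(_) => {} // already applied and unchanged
            None => out.push(m),
        }
    }
    out.sort_by_key(|m| m.version);
    Ok(out)
}
```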

## Feature Flag & Dependencies
- Added 'surrealdb' feature flag to Cargo.toml
- Optional dependency on surrealdb v2.2
- Conditional compilation for zero overhead when not used

## Documentation & Examples
- Comprehensive SURREALDB_GUIDE.md covering:
  - Installation and configuration
  - Schema management best practices
  - Migration creation and management
  - Usage examples and advanced queries
  - Performance optimization tips
  - Migration path from RocksDB
  - Troubleshooting guide
- Example configuration file (surrealdb_example.toml)
- Updated default.toml with new database structure

## Migration SQL
- Initial schema migration (001_initial_schema.sql)
- Creates nodes, edges, schema_versions, and metadata tables
- Includes all necessary indexes for performance

This implementation is designed to be:
- Flexible: Easy to add new fields without migrations
- Maintainable: Clear migration system for schema evolution
- Production-ready: Strict mode, authentication, and validation
- Performant: Caching, indexes, and batch operations
- Well-documented: Comprehensive guide and examples

Changed default connection from file-based to WebSocket on standard SurrealDB port (8000):
- Updated SurrealDbConfig defaults in core config and storage
- Updated example configuration files
- Updated documentation to reflect WebSocket as primary connection method
- Added note about requiring SurrealDB server for WebSocket connections
- Maintained file:// examples for embedded mode as alternative

Changed configuration system to use a centralized user-level directory:

## Primary Changes
- Updated ConfigManager to prioritize ~/.codegraph/ as primary config directory
- Added fallback to ./config/ for backward compatibility
- Added fallback to current directory as last resort

## New Features
- `ConfigManager::init_user_config_dir()` - Initialize ~/.codegraph with defaults
- `ConfigManager::get_config_dir()` - Get config dir with custom override
- Automatic directory priority: ~/.codegraph → ./config → current dir
- Creates README.txt in ~/.codegraph explaining the directory structure
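
The directory priority above can be sketched as a pure function. `resolve_config_dir` is a hypothetical stand-in for `ConfigManager::get_config_dir()`, whose real signature is not shown in this PR; the `exists` predicate is injected here only to keep the example free of filesystem access.

```rust
use std::path::{Path, PathBuf};

/// Resolves the config directory: custom override, then ~/.codegraph,
/// then ./config, then the current directory as a last resort.
pub fn resolve_config_dir(
    custom: Option<&Path>,
    home: Option<&Path>,
    cwd: &Path,
    exists: impl Fn(&Path) -> bool,
) -> PathBuf {
    if let Some(dir) = custom {
        return dir.to_path_buf();
    }
    if let Some(home) = home {
        let user_dir = home.join(".codegraph");
        if exists(&user_dir) {
            return user_dir;
        }
    }
    let project_dir = cwd.join("config");
    if exists(&project_dir) {
        return project_dir;
    }
    cwd.to_path_buf()
}
```

In real use the predicate would typically be `|p| p.is_dir()`.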

## Benefits
- **Centralized**: All CodeGraph configs in one uniform location
- **User-level**: Config follows user across projects
- **Standard practice**: Follows the common Unix/Linux convention of a per-user dot-directory under the home directory (similar to the ~/.config pattern)
- **Backward compatible**: Existing ./config setups continue to work
- **Cleaner projects**: Keeps project directories focused on code

## Documentation
- Created comprehensive CONFIGURATION_GUIDE.md
- Updated README.md to reflect new config directory
- Added config/README.md explaining migration
- Updated SURREALDB_GUIDE.md with new paths

## Configuration Loading Order
1. ~/.codegraph/default.toml (base)
2. ~/.codegraph/{environment}.toml (dev/staging/prod)
3. ~/.codegraph/local.toml (machine-specific overrides)
4. ./config/ (backward compatibility)
5. Environment variables (CODEGRAPH__* prefix)
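
A minimal sketch of the layering, assuming later sources override earlier ones (the list implies this but does not state it) and that `__` inside a `CODEGRAPH__*` variable separates nested keys. `Layer`, `merge_layers`, and `env_layer` are hypothetical helpers over flat string maps, not the ConfigManager implementation.

```rust
use std::collections::BTreeMap;

/// Flattened configuration layer: dotted key -> value.
pub type Layer = BTreeMap<String, String>;

/// Merges layers in loading order; a later layer overrides earlier keys.
pub fn merge_layers(layers: &[Layer]) -> Layer {
    let mut merged = Layer::new();
    for layer in layers {
        for (key, value) in layer {
            merged.insert(key.clone(), value.clone());
        }
    }
    merged
}

/// Builds the environment layer from `CODEGRAPH__*` variables, e.g.
/// `CODEGRAPH__DATABASE__BACKEND` becomes `database.backend` (assumed mapping).
pub fn env_layer<I>(vars: I) -> Layer
where
    I: IntoIterator<Item = (String, String)>,
{
    vars.into_iter()
        .filter_map(|(key, value)| {
            let rest = key.strip_prefix("CODEGRAPH__")?;
            Some((rest.to_lowercase().replace("__", "."), value))
        })
        .collect()
}
```

In practice the last layer would be built with `env_layer(std::env::vars())`, after the TOML files have been parsed into the earlier layers.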

Users can initialize ~/.codegraph by:
- Calling ConfigManager::init_user_config_dir(true)
- Manually: mkdir -p ~/.codegraph && cp config/*.toml ~/.codegraph/

Existing projects using ./config will continue to work seamlessly.

@Jakedismo merged commit 45ede99 into main on Nov 6, 2025
@Jakedismo deleted the claude/add-jina-embeddings-provider-011CUrSj3ypAwj1HJ88ibMcC branch on November 6, 2025 14:50
2 participants