Add Jina as selectable embeddings provider #46
Merged: Jakedismo merged 4 commits into main from claude/add-jina-embeddings-provider-011CUrSj3ypAwj1HJ88ibMcC on Nov 6, 2025
…port
This commit adds Jina AI as a new embeddings provider option with the following features:
- Added JinaEmbeddingProvider implementing the EmbeddingProvider trait
- Support for Jina embeddings API (jina-embeddings-v4 model)
- Configurable task parameter (default: code.query for code embeddings)
- Late chunking support enabled by default for improved accuracy
- Integrated reranking functionality using Jina reranker-v3 model
- Reranking is enabled by default when Jina provider is selected
- Added Jina variant to EmbeddingProvider enum
- Created JinaEmbeddingConfig struct with all necessary configuration options
- Added 'jina' feature flag to Cargo.toml
- Registered jina_provider module in lib.rs
- Updated example_embedding.toml with Jina configuration example
The implementation follows the same pattern as existing providers (OpenAI, Ollama) with retry logic, batch processing, and proper error handling.
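The defaults listed above can be pictured with a small configuration struct. This is only a sketch: the PR names `JinaEmbeddingConfig` but does not show its fields, so every field name here is hypothetical, and the reranker identifier string `jina-reranker-v3` is assumed from the "Jina reranker-v3" wording.

```rust
/// Hypothetical shape of the Jina configuration. Field names are
/// illustrative assumptions, not taken from the PR.
#[derive(Debug, Clone, PartialEq)]
pub struct JinaEmbeddingConfig {
    pub model: String,
    pub task: String,
    pub late_chunking: bool,
    pub enable_reranking: bool,
    pub reranking_model: String,
}

impl Default for JinaEmbeddingConfig {
    fn default() -> Self {
        Self {
            // Model named in the commit message.
            model: "jina-embeddings-v4".to_string(),
            // Default task named in the commit message.
            task: "code.query".to_string(),
            // Late chunking and reranking are described as on by default.
            late_chunking: true,
            enable_reranking: true,
            // Assumed identifier string for the reranker model.
            reranking_model: "jina-reranker-v3".to_string(),
        }
    }
}

fn main() {
    let cfg = JinaEmbeddingConfig::default();
    println!("{:?}", cfg);
}
```

Keeping the defaults in a `Default` impl means a user who only selects the provider gets code-oriented embeddings with reranking, while any single field can still be overridden with struct update syntax.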
This commit adds comprehensive SurrealDB support as an alternative to RocksDB:
## Configuration System
- Added DatabaseBackend enum to select between RocksDB and SurrealDB
- Created SurrealDbConfig with support for:
  - Multiple connection types (file://, mem://, http://, ws://)
  - Namespace and database selection for multi-tenancy
  - Optional authentication (username/password)
  - Strict mode for schema enforcement
  - Auto-migration on startup
- Updated Settings struct to use new DatabaseConfig structure
- Maintained backward compatibility with legacy rocksdb config
## Storage Implementation (surrealdb_storage.rs)
- Implemented GraphStore trait for full CRUD operations
- Features:
  - In-memory caching with DashMap for performance
  - Flexible JSON-based node representation
  - Automatic schema initialization
  - Built-in migration runner
- Conversion between CodeNode and SurrealDB format
- Support for embeddings, metadata, and all node attributes
## Schema Manager (surrealdb_schema.rs)
- Flexible schema definition system with:
  - TableSchema, FieldDefinition, and IndexDefinition types
  - Type-safe field type mappings to SurrealDB types
  - Dynamic field and index addition without downtime
  - Schema export/import functionality (JSON format)
- Helper functions for standard node/edge schemas
- Designed for easy schema evolution and modifications
## Migration System (surrealdb_migrations.rs)
- Versioned migration framework with:
  - UP/DOWN migration support for rollbacks
  - Migration checksum verification for integrity
  - Migration status tracking and reporting
  - Automatic migration application
  - Template generation for new migrations
- Includes default migrations for initial schema
- Migration files stored in migrations/ directory
## Feature Flag & Dependencies
- Added 'surrealdb' feature flag to Cargo.toml
- Optional dependency on surrealdb v2.2
- Conditional compilation for zero overhead when not used
## Documentation & Examples
- Comprehensive SURREALDB_GUIDE.md covering:
  - Installation and configuration
  - Schema management best practices
  - Migration creation and management
  - Usage examples and advanced queries
  - Performance optimization tips
  - Migration path from RocksDB
  - Troubleshooting guide
- Example configuration file (surrealdb_example.toml)
- Updated default.toml with new database structure
## Migration SQL
- Initial schema migration (001_initial_schema.sql)
- Creates nodes, edges, schema_versions, and metadata tables
- Includes all necessary indexes for performance
This implementation is designed to be:
- Flexible: Easy to add new fields without migrations
- Maintainable: Clear migration system for schema evolution
- Production-ready: Strict mode, authentication, and validation
- Performant: Caching, indexes, and batch operations
- Well-documented: Comprehensive guide and examples
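The versioned migrations with checksum verification described above could work roughly as follows. This is a minimal sketch and not the code in surrealdb_migrations.rs: the `Migration` struct, the `pending` function, and the use of the standard library's `DefaultHasher` as a stand-in checksum are all assumptions made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical migration record; names are illustrative only.
#[derive(Debug, Clone)]
pub struct Migration {
    pub version: u32,
    pub name: String,
    pub up: String,   // statements that apply the change
    pub down: String, // statements that roll it back
}

impl Migration {
    /// Stand-in checksum over the UP and DOWN scripts. DefaultHasher is
    /// not guaranteed stable across Rust releases, so a real system
    /// would persist a fixed algorithm such as SHA-256 instead.
    pub fn checksum(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.up.hash(&mut hasher);
        self.down.hash(&mut hasher);
        hasher.finish()
    }
}

/// Returns the versions that still need to be applied, or an error if a
/// migration that was already applied no longer matches its recorded
/// checksum. `applied` holds (version, checksum) pairs.
pub fn pending(migrations: &[Migration], applied: &[(u32, u64)]) -> Result<Vec<u32>, String> {
    let mut todo = Vec::new();
    for m in migrations {
        match applied.iter().find(|(v, _)| *v == m.version) {
            Some((_, sum)) if *sum != m.checksum() => {
                return Err(format!("checksum mismatch for migration {}", m.version));
            }
            Some(_) => {} // already applied and unchanged
            None => todo.push(m.version),
        }
    }
    Ok(todo)
}

fn main() {
    let initial = Migration {
        version: 1,
        name: "initial_schema".to_string(),
        up: "DEFINE TABLE nodes SCHEMALESS;".to_string(),
        down: "REMOVE TABLE nodes;".to_string(),
    };
    println!("{:?}", pending(&[initial], &[]));
}
```

Recording a checksum at apply time lets the runner refuse to continue when a migration file has been edited after it ran, which is the integrity property the commit message refers to.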
Changed default connection from file-based to WebSocket on standard SurrealDB port (8000):
- Updated SurrealDbConfig defaults in core config and storage
- Updated example configuration files
- Updated documentation to reflect WebSocket as primary connection method
- Added note about requiring SurrealDB server for WebSocket connections
- Maintained file:// examples for embedded mode as alternative
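To make the connection types concrete, the sketch below classifies a connection string by its scheme and flags the ones that need a running server. The enum, function names, and the `ws://localhost:8000` default are assumptions: the PR states only that the default is WebSocket on port 8000 and does not give the host or the actual types.

```rust
/// Hypothetical classification of the connection schemes listed in the PR.
#[derive(Debug, PartialEq)]
pub enum ConnectionKind {
    File,
    Memory,
    Http,
    WebSocket,
}

/// Assumed default: the PR says WebSocket on port 8000; the host is a guess.
pub const DEFAULT_CONNECTION: &str = "ws://localhost:8000";

/// Maps a connection string to its kind based on the URL scheme.
/// Returns None for schemes that are not in the PR's list.
pub fn connection_kind(url: &str) -> Option<ConnectionKind> {
    let (scheme, _rest) = url.split_once("://")?;
    match scheme {
        "file" => Some(ConnectionKind::File),
        "mem" => Some(ConnectionKind::Memory),
        "http" => Some(ConnectionKind::Http),
        "ws" => Some(ConnectionKind::WebSocket),
        _ => None,
    }
}

/// Remote schemes require a running SurrealDB server; file and memory
/// modes run embedded.
pub fn needs_server(kind: &ConnectionKind) -> bool {
    matches!(kind, ConnectionKind::Http | ConnectionKind::WebSocket)
}

fn main() {
    let kind = connection_kind(DEFAULT_CONNECTION);
    println!("{:?}", kind);
}
```

With a WebSocket default, a fresh install fails fast if no server is listening, so the retained file:// examples remain the simpler route for single-machine use.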
Changed configuration system to use a centralized user-level directory:
## Primary Changes
- Updated ConfigManager to prioritize ~/.codegraph/ as primary config directory
- Added fallback to ./config/ for backward compatibility
- Added fallback to current directory as last resort
## New Features
- `ConfigManager::init_user_config_dir()` - Initialize ~/.codegraph with defaults
- `ConfigManager::get_config_dir()` - Get config dir with custom override
- Automatic directory priority: ~/.codegraph → ./config → current dir
- Creates README.txt in ~/.codegraph explaining the directory structure
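The directory priority above can be sketched as a pure function. This is not the `ConfigManager` implementation: `resolve_config_dir` and its parameters are hypothetical, and the existence check is passed in as a closure so the logic can be exercised without touching the filesystem.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical resolver mirroring the described priority:
/// custom override, then ~/.codegraph, then ./config, then the
/// current directory as a last resort.
pub fn resolve_config_dir(
    custom: Option<&Path>,
    home: Option<&Path>,
    cwd: &Path,
    exists: impl Fn(&Path) -> bool,
) -> PathBuf {
    // An explicit override always wins.
    if let Some(dir) = custom {
        return dir.to_path_buf();
    }
    // Primary location: the user-level directory.
    if let Some(home) = home {
        let user_dir = home.join(".codegraph");
        if exists(&user_dir) {
            return user_dir;
        }
    }
    // Backward-compatible project-level directory.
    let project_dir = cwd.join("config");
    if exists(&project_dir) {
        return project_dir;
    }
    // Last resort: the current directory itself.
    cwd.to_path_buf()
}

fn main() {
    let home = std::env::var_os("HOME").map(PathBuf::from);
    let cwd = std::env::current_dir().expect("current directory should be readable");
    let dir = resolve_config_dir(None, home.as_deref(), &cwd, |p| p.is_dir());
    println!("{}", dir.display());
}
```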
## Benefits
- **Centralized**: All CodeGraph configs in one uniform location
- **User-level**: Config follows user across projects
- **Standard practice**: Follows the common Unix/Linux practice of a per-user dot-directory (similar to the ~/.config pattern)
- **Backward compatible**: Existing ./config setups continue to work
- **Cleaner projects**: Keeps project directories focused on code
## Documentation
- Created comprehensive CONFIGURATION_GUIDE.md
- Updated README.md to reflect new config directory
- Added config/README.md explaining migration
- Updated SURREALDB_GUIDE.md with new paths
## Configuration Loading Order
1. ~/.codegraph/default.toml (base)
2. ~/.codegraph/{environment}.toml (dev/staging/prod)
3. ~/.codegraph/local.toml (machine-specific overrides)
4. ./config/ (backward compatibility)
5. Environment variables (CODEGRAPH__* prefix)
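The loading order can be illustrated as a layered merge. The sketch assumes that later sources override earlier ones and that `CODEGRAPH__A__B` maps to the key `a.b`. Both are common conventions but are not confirmed by the PR, and the key names used (`database.backend`, `server.port`) are hypothetical.

```rust
use std::collections::BTreeMap;

/// One configuration source flattened to dotted keys.
type Layer = BTreeMap<String, String>;

/// Merges layers in order; a key in a later layer replaces the same key
/// from an earlier one (assumed override semantics).
pub fn merge_layers(layers: &[Layer]) -> Layer {
    let mut merged = Layer::new();
    for layer in layers {
        for (key, value) in layer {
            merged.insert(key.clone(), value.clone());
        }
    }
    merged
}

/// Builds a layer from environment-style pairs, keeping only variables
/// with the CODEGRAPH__ prefix. Assumed mapping: the prefix is dropped,
/// `__` becomes `.`, and the key is lowercased.
pub fn env_layer<I>(vars: I) -> Layer
where
    I: IntoIterator<Item = (String, String)>,
{
    vars.into_iter()
        .filter_map(|(key, value)| {
            let rest = key.strip_prefix("CODEGRAPH__")?;
            Some((rest.to_lowercase().replace("__", "."), value))
        })
        .collect()
}

fn main() {
    // Sample data standing in for default.toml and the process environment.
    let base: Layer = [("database.backend".to_string(), "rocksdb".to_string())]
        .into_iter()
        .collect();
    let env = env_layer(vec![(
        "CODEGRAPH__DATABASE__BACKEND".to_string(),
        "surrealdb".to_string(),
    )]);
    for (key, value) in &merge_layers(&[base, env]) {
        println!("{} = {}", key, value);
    }
}
```

In a real program the environment layer would be built from `std::env::vars()` and placed last in the slice, so that a single exported variable can override any file-based setting.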
Users can initialize ~/.codegraph by:
- Calling ConfigManager::init_user_config_dir(true)
- Manually: mkdir -p ~/.codegraph && cp config/*.toml ~/.codegraph/
Existing projects using ./config will continue to work seamlessly.