A robust Python tool for migrating vectors between Pinecone indexes, with support for batch processing, namespace filtering, and automatic target index creation.
- Cross-environment migration: Migrate vectors between different Pinecone environments
- Namespace support: Migrate specific namespaces or all available namespaces
- Batch processing: Configurable batch sizes for optimal performance
- Auto-creation: Optionally create target indexes automatically
- Progress tracking: Real-time progress bars and detailed logging
- Error handling: Comprehensive error handling and recovery
- CLI interface: Easy-to-use command line interface
- Programmatic API: Use as a Python library in your own projects
# Install from source
git clone https://github.com/your-username/pinecone-vector-migration.git
cd pinecone-vector-migration
pip install -e .
# Or install the published package
pip install pinecone-vector-migration
Create a .env file in your project root:
# Source Pinecone Configuration
SOURCE_API_KEY=your-source-api-key
SOURCE_ENV=your-source-environment
SOURCE_INDEX=your-source-index
# Target Pinecone Configuration
TARGET_API_KEY=your-target-api-key
TARGET_ENV=your-target-environment
TARGET_INDEX=your-target-index
# Optional Configuration
NAMESPACE=specific-namespace # Single namespace (or use NAMESPACES for several)
NAMESPACES=namespace1,namespace2,namespace3 # Multiple namespaces
CREATE_TARGET=true # Auto-create target index
UPSERT_BATCH_SIZE=100 # Batch size for upserts
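How these variables combine is not spelled out above. The sketch below shows one plausible reading, assuming NAMESPACES takes precedence over NAMESPACE, that leaving both unset means "all namespaces", and that CREATE_TARGET accepts the usual true/false spellings. `MigrationConfig` and `load_config` are hypothetical names, not part of the package, and the API keys and environments are omitted for brevity.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class MigrationConfig:
    """Hypothetical container for settings read from the environment."""
    source_index: str
    target_index: str
    namespaces: Optional[List[str]]  # None means "migrate all namespaces"
    create_target: bool = False
    batch_size: int = 100


def _parse_bool(value: str) -> bool:
    # Treat common truthy spellings as True; anything else is False.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_config(env: Mapping[str, str] = os.environ) -> MigrationConfig:
    # Assumption: NAMESPACES (comma-separated) wins over NAMESPACE if both are set.
    raw_many = env.get("NAMESPACES", "")
    raw_one = env.get("NAMESPACE", "")
    if raw_many.strip():
        namespaces: Optional[List[str]] = [
            ns.strip() for ns in raw_many.split(",") if ns.strip()
        ]
    elif raw_one.strip():
        namespaces = [raw_one.strip()]
    else:
        namespaces = None
    return MigrationConfig(
        source_index=env["SOURCE_INDEX"],
        target_index=env["TARGET_INDEX"],
        namespaces=namespaces,
        create_target=_parse_bool(env.get("CREATE_TARGET", "false")),
        batch_size=int(env.get("UPSERT_BATCH_SIZE", "100")),
    )
```

For example, `load_config({"SOURCE_INDEX": "src", "TARGET_INDEX": "dst", "NAMESPACES": "prod, staging"})` yields `namespaces=["prod", "staging"]`, `create_target=False` and `batch_size=100`.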
# Use the settings from .env
python -m pinecone_migration.cli
# Or pass settings as command-line flags
python -m pinecone_migration.cli \
--source-api-key YOUR_SOURCE_KEY \
--source-env YOUR_SOURCE_ENV \
--target-api-key YOUR_TARGET_KEY \
--target-env YOUR_TARGET_ENV \
--source-index source-idx \
--target-index target-idx \
--create-target
# Single namespace
python -m pinecone_migration.cli --namespace production
# Multiple namespaces
python -m pinecone_migration.cli --namespaces "prod,staging,dev"
# Custom batch size with verbose logging
python -m pinecone_migration.cli \
--batch-size 50 \
--create-target \
--verbose
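`--batch-size` (or UPSERT_BATCH_SIZE) sets how many vectors go into each upsert request. A minimal sketch of that chunking step, assuming vectors are split into fixed-size lists in their original order; `batched` is a hypothetical helper here (Python 3.12 also ships `itertools.batched`, which yields tuples instead of lists).

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(items)
    # islice pulls up to batch_size items; an empty list means the input is exhausted.
    while batch := list(islice(it, batch_size)):
        yield batch
```

With `--batch-size 50`, 120 vectors would go out as three requests of 50, 50 and 20 vectors.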
from pinecone_migration import PineconeMigrator
# Initialize migrator
migrator = PineconeMigrator(
    source_api_key="your-source-key",
    source_environment="your-source-env",
    target_api_key="your-target-key",
    target_environment="your-target-env",
    batch_size=100
)
# Perform migration
stats = migrator.migrate(
    source_index="source-index",
    target_index="target-index",
    namespaces=["namespace1", "namespace2"],  # None for all namespaces
    create_target=True
)
print(f"Migrated {sum(stats.values())} vectors")
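The `namespaces` argument accepts `None` to mean every namespace. One way that resolution could work is sketched below, assuming the source index reports its namespaces in a dict shaped like `{"namespaces": {name: {"vector_count": n}}}`. `resolve_namespaces` is hypothetical, and rejecting unknown names is a choice of this sketch rather than documented behavior.

```python
from typing import List, Mapping, Optional


def resolve_namespaces(index_stats: Mapping, requested: Optional[List[str]]) -> List[str]:
    """Return the namespaces to migrate (None means all of them)."""
    available = list(index_stats.get("namespaces", {}))
    if requested is None:
        return available  # Migrate every namespace the source index reports.
    missing = [ns for ns in requested if ns not in available]
    if missing:
        raise ValueError(f"Namespaces not found in source index: {missing}")
    return list(requested)


source_stats = {
    "namespaces": {
        "namespace1": {"vector_count": 1200},
        "namespace2": {"vector_count": 300},
        "archive": {"vector_count": 50},
    }
}
print(resolve_namespaces(source_stats, None))
# ['namespace1', 'namespace2', 'archive']
print(resolve_namespaces(source_stats, ["namespace2"]))
# ['namespace2']
```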
| Option | Description | Default |
|---|---|---|
| --source-api-key | Source Pinecone API key | From env |
| --source-env | Source Pinecone environment | From env |
| --target-api-key | Target Pinecone API key | From env |
| --target-env | Target Pinecone environment | From env |
| --source-index | Source index name | From env |
| --target-index | Target index name | From env |
| --namespace | Single namespace to migrate | None |
| --namespaces | Comma-separated namespaces | None |
| --create-target | Create target index if missing | False |
| --batch-size | Batch size for upserts | 100 |
| --verbose, -v | Enable verbose logging | False |
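The options above map naturally onto Python's standard `argparse`. A minimal sketch, assuming "From env" means each flag falls back to the same-named variable from the .env example; `build_parser` is hypothetical and is not the tool's actual parser.

```python
import argparse
import os
from typing import List, Mapping


def _split_namespaces(value: str) -> List[str]:
    # "prod, staging,dev" -> ["prod", "staging", "dev"]
    return [ns.strip() for ns in value.split(",") if ns.strip()]


def build_parser(env: Mapping[str, str] = os.environ) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pinecone_migration.cli")
    # Options documented as "From env" default to the matching variable (or None).
    for flag, var in [
        ("--source-api-key", "SOURCE_API_KEY"),
        ("--source-env", "SOURCE_ENV"),
        ("--target-api-key", "TARGET_API_KEY"),
        ("--target-env", "TARGET_ENV"),
        ("--source-index", "SOURCE_INDEX"),
        ("--target-index", "TARGET_INDEX"),
    ]:
        parser.add_argument(flag, default=env.get(var))
    parser.add_argument("--namespace", default=None)
    parser.add_argument("--namespaces", type=_split_namespaces, default=None)
    parser.add_argument("--create-target", action="store_true")  # default False
    parser.add_argument("--batch-size", type=int, default=100)
    parser.add_argument("-v", "--verbose", action="store_true")
    return parser
```

Calling `build_parser().parse_args(["--namespaces", "prod,staging,dev", "--batch-size", "50"])` gives `args.namespaces == ["prod", "staging", "dev"]` and `args.batch_size == 50`.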
# Migrate all vectors from development to production
SOURCE_API_KEY=dev-key-123 \
SOURCE_ENV=us-west1-gcp \
SOURCE_INDEX=dev-vectors \
TARGET_API_KEY=prod-key-456 \
TARGET_ENV=us-east1-gcp \
TARGET_INDEX=prod-vectors \
CREATE_TARGET=true \
python -m pinecone_migration.cli
# Migrate only user and product namespaces
python -m pinecone_migration.cli \
--namespaces "users,products" \
--batch-size 50 \
--verbose
from pinecone_migration import PineconeMigrator
# Migrate from project A to project B
migrator = PineconeMigrator(
    source_api_key="project-a-key",
    source_environment="us-west1-gcp",
    target_api_key="project-b-key",
    target_environment="us-east1-gcp"
)
# Create target index and migrate
stats = migrator.migrate(
    source_index="embeddings-v1",
    target_index="embeddings-v2",
    create_target=True
)
The tool includes comprehensive error handling:
- Connection errors: Validates API keys and environments
- Index errors: Checks if indexes exist before migration
- Rate limiting: Sends vectors in batches to reduce the risk of hitting rate limits
- Data validation: Validates vector dimensions and metadata
- Recovery: Continues migration after temporary failures
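The list above says the tool continues after temporary failures but not how. A common technique is to retry each request with exponential backoff; the sketch below is a generic version of that, assuming transient errors surface as `ConnectionError` or `TimeoutError` (the Pinecone client's own exception types may differ). `with_retries` is a hypothetical helper, not part of the package.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying transient errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Wait base_delay, 2*base_delay, 4*base_delay, ... plus up to 10% jitter
            # so parallel workers don't all retry at the same instant.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Each batch upsert could then be wrapped as `with_retries(lambda: upsert_batch(batch))`, where `upsert_batch` is a stand-in for whatever call actually sends the batch.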
- Batch Size: Adjust based on your vector size and network conditions
  - Small vectors (< 1KB): Use batch sizes of 100-200
  - Large vectors (> 10KB): Use batch sizes of 10-50
- Concurrent Migrations: Run multiple migrations in parallel for different namespaces
- Network: Use the same cloud region for source and target when possible
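Running one migration per namespace in parallel can be done with a thread pool, since the work is mostly waiting on network calls. A minimal sketch, assuming each namespace can be migrated independently; `migrate_namespaces_concurrently` is hypothetical, and `migrate_one` is whatever callable migrates a single namespace and returns its vector count.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def migrate_namespaces_concurrently(
    namespaces: Iterable[str],
    migrate_one: Callable[[str], int],
    max_workers: int = 4,
) -> Dict[str, int]:
    """Run migrate_one(namespace) in parallel; return {namespace: vectors_migrated}."""
    results: Dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(migrate_one, ns): ns for ns in namespaces}
        for future in as_completed(futures):
            # .result() re-raises any exception from that namespace's migration.
            results[futures[future]] = future.result()
    return results
```

With the Python API, `migrate_one` could be `lambda ns: sum(migrator.migrate(source_index="source-index", target_index="target-index", namespaces=[ns]).values())`. Creating the target index once before fanning out keeps workers from racing to create it, and because it is not documented whether one `PineconeMigrator` can be shared across threads, one instance per worker is the cautious option. Keep `max_workers` small so the combined request rate stays within your plan's limits.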
Contributions are welcome! Please feel free to submit a Pull Request.
git clone https://github.com/your-username/pinecone-vector-migration.git
cd pinecone-vector-migration
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run linting
flake8 src/
This project is licensed under the MIT License - see the LICENSE file for details.
If you encounter any issues or need support:
- Check the Issues page
- Create a new issue with detailed information about your problem
- Include your environment details and error messages
- Backup: Always backup your data before migration
- Rate Limits: Be aware of Pinecone rate limits for your plan
- Costs: Monitor your Pinecone usage during migration
- Testing: Test with a small subset before full migration
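After a test run (or the full migration), comparing per-namespace vector counts between source and target is a quick sanity check. A minimal sketch, assuming both indexes report stats shaped like `{"namespaces": {name: {"vector_count": n}}}`; `count_mismatches` is hypothetical. Index stats may take a short while to reflect recent upserts, so a mismatch right after a run is worth re-checking before acting on it.

```python
from typing import Dict, Mapping, Tuple


def count_mismatches(
    source_stats: Mapping, target_stats: Mapping
) -> Dict[str, Tuple[int, int]]:
    """Return {namespace: (source_count, target_count)} where the counts differ."""
    src = {ns: v["vector_count"] for ns, v in source_stats.get("namespaces", {}).items()}
    dst = {ns: v["vector_count"] for ns, v in target_stats.get("namespaces", {}).items()}
    mismatches: Dict[str, Tuple[int, int]] = {}
    # Check the union so namespaces missing on either side are reported too.
    for ns in src.keys() | dst.keys():
        if src.get(ns, 0) != dst.get(ns, 0):
            mismatches[ns] = (src.get(ns, 0), dst.get(ns, 0))
    return mismatches
```

An empty result means every namespace has the same count on both sides.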
Made with ❤️ for the vector database community