🚀 A robust Python tool for migrating vectors between Pinecone indexes with batch processing, namespace filtering, and cross-environment support

braisdev/pinecone-vector-migration

Pinecone Vector Migration Tool

A robust Python tool for migrating vectors between Pinecone indexes, with support for batch processing, namespace filtering, and automatic target index creation.

🚀 Features

  • Cross-environment migration: Migrate vectors between different Pinecone environments
  • Namespace support: Migrate specific namespaces or all available namespaces
  • Batch processing: Configurable upsert batch size, tunable to your vector size and network conditions
  • Auto-creation: Optionally create target indexes automatically
  • Progress tracking: Real-time progress bars and detailed logging
  • Error handling: Comprehensive error handling and recovery
  • Command-line interface: Run migrations directly with python -m pinecone_migration.cli
  • Programmatic API: Use as a Python library in your own projects

📦 Installation

From Source

git clone https://github.com/braisdev/pinecone-vector-migration.git
cd pinecone-vector-migration
pip install -e .

Using pip (when published)

pip install pinecone-vector-migration

🔧 Configuration

Environment Variables (Recommended)

Create a .env file in your project root:

# Source Pinecone Configuration
SOURCE_API_KEY=your-source-api-key
SOURCE_ENV=your-source-environment
SOURCE_INDEX=your-source-index

# Target Pinecone Configuration  
TARGET_API_KEY=your-target-api-key
TARGET_ENV=your-target-environment
TARGET_INDEX=your-target-index

# Optional Configuration (if neither NAMESPACE nor NAMESPACES is set, all namespaces are migrated)
NAMESPACE=specific-namespace                    # Single namespace
NAMESPACES=namespace1,namespace2,namespace3    # Multiple namespaces (comma-separated)
CREATE_TARGET=true                             # Auto-create target index
UPSERT_BATCH_SIZE=100                         # Batch size for upserts
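All of these values arrive as strings, so whatever reads them has to split NAMESPACES into a list, turn CREATE_TARGET into a boolean, and convert UPSERT_BATCH_SIZE to an integer. The sketch below shows one way to do that with the standard library only. MigrationConfig and load_config are hypothetical names rather than this package's API, and the rule that NAMESPACES takes precedence over NAMESPACE when both are set is an assumption.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class MigrationConfig:
    """Hypothetical container for the settings in the .env example (not the tool's API)."""
    source_api_key: str
    source_env: str
    source_index: str
    target_api_key: str
    target_env: str
    target_index: str
    namespaces: Optional[List[str]]  # None means "migrate all namespaces"
    create_target: bool
    batch_size: int


def _parse_bool(value: Optional[str]) -> bool:
    # Common truthy spellings count as True; anything else (or unset) is False.
    return (value or "").strip().lower() in {"1", "true", "yes", "on"}


def load_config(env: Mapping[str, str] = os.environ) -> MigrationConfig:
    # Assumption: NAMESPACES (comma-separated) wins over NAMESPACE if both are set.
    many = env.get("NAMESPACES", "")
    one = env.get("NAMESPACE", "")
    if many.strip():
        namespaces: Optional[List[str]] = [ns.strip() for ns in many.split(",") if ns.strip()]
    elif one.strip():
        namespaces = [one.strip()]
    else:
        namespaces = None

    return MigrationConfig(
        source_api_key=env["SOURCE_API_KEY"],
        source_env=env["SOURCE_ENV"],
        source_index=env["SOURCE_INDEX"],
        target_api_key=env["TARGET_API_KEY"],
        target_env=env["TARGET_ENV"],
        target_index=env["TARGET_INDEX"],
        namespaces=namespaces,
        create_target=_parse_bool(env.get("CREATE_TARGET")),
        batch_size=int(env.get("UPSERT_BATCH_SIZE", "100")),
    )
```

In this sketch a missing required variable raises KeyError before any vectors are read. If you keep these values in a .env file, a loader such as python-dotenv's load_dotenv() can copy them into os.environ first.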

🖥️ Usage

Command Line Interface

Basic Migration (using .env file)

python -m pinecone_migration.cli

Migration with Command Line Arguments

python -m pinecone_migration.cli \
  --source-api-key YOUR_SOURCE_KEY \
  --source-env YOUR_SOURCE_ENV \
  --target-api-key YOUR_TARGET_KEY \
  --target-env YOUR_TARGET_ENV \
  --source-index source-idx \
  --target-index target-idx \
  --create-target

Migrate Specific Namespaces

# Single namespace
python -m pinecone_migration.cli --namespace production

# Multiple namespaces
python -m pinecone_migration.cli --namespaces "prod,staging,dev"

Advanced Options

python -m pinecone_migration.cli \
  --batch-size 50 \
  --create-target \
  --verbose

Programmatic Usage

from pinecone_migration import PineconeMigrator

# Initialize migrator
migrator = PineconeMigrator(
    source_api_key="your-source-key",
    source_environment="your-source-env",
    target_api_key="your-target-key",
    target_environment="your-target-env",
    batch_size=100
)

# Perform migration
stats = migrator.migrate(
    source_index="source-index",
    target_index="target-index",
    namespaces=["namespace1", "namespace2"],  # None for all namespaces
    create_target=True
)

print(f"Migrated {sum(stats.values())} vectors")
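Conceptually, a migration like the one above walks each selected namespace, upserts its records into the target in fixed-size batches, and keeps a per-namespace count, which is what makes sum(stats.values()) a total. The sketch below illustrates that loop against a tiny in-memory stand-in for an index. InMemoryIndex, batched, and migrate_namespaces are hypothetical and do not call the Pinecone SDK; a real implementation would read and write through the Pinecone client, and this package's internals may differ.

```python
from dataclasses import dataclass, field
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# (id, values, metadata): a simplified record shape assumed for this sketch.
Record = Tuple[str, List[float], dict]


@dataclass
class InMemoryIndex:
    """Hypothetical stand-in for an index: namespace -> {id: (values, metadata)}."""
    data: Dict[str, Dict[str, Tuple[List[float], dict]]] = field(default_factory=dict)
    upsert_calls: int = 0

    def namespaces(self) -> List[str]:
        return list(self.data)

    def iter_records(self, namespace: str) -> Iterator[Record]:
        for rid, (values, metadata) in self.data.get(namespace, {}).items():
            yield rid, values, metadata

    def upsert(self, records: List[Record], namespace: str) -> None:
        self.upsert_calls += 1
        bucket = self.data.setdefault(namespace, {})
        for rid, values, metadata in records:
            bucket[rid] = (values, metadata)


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield lists of at most `size` items without loading everything at once."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def migrate_namespaces(source: InMemoryIndex, target: InMemoryIndex,
                       namespaces: Optional[List[str]] = None,
                       batch_size: int = 100) -> Dict[str, int]:
    """Copy each namespace in batches and return {namespace: records_copied}."""
    selected = namespaces if namespaces is not None else source.namespaces()
    stats: Dict[str, int] = {}
    for ns in selected:
        copied = 0
        for chunk in batched(source.iter_records(ns), batch_size):
            target.upsert(chunk, namespace=ns)
            copied += len(chunk)
        stats[ns] = copied
    return stats


src = InMemoryIndex({
    "users": {f"u{i}": ([0.1, 0.2], {}) for i in range(250)},
    "products": {"p1": ([0.3, 0.4], {"sku": "A"})},
})
dst = InMemoryIndex()
stats = migrate_namespaces(src, dst, batch_size=100)
print(stats)             # {'users': 250, 'products': 1}
print(dst.upsert_calls)  # 4: batches of 100, 100, 50 for users, then 1 for products
print(f"Migrated {sum(stats.values())} vectors")  # Migrated 251 vectors
```

Because batched pulls records lazily with islice, memory use is bounded by the batch size rather than the namespace size, which is the main reason to stream instead of reading a whole namespace first.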

📋 CLI Options

Option            Description                     Default
--source-api-key  Source Pinecone API key         From env (SOURCE_API_KEY)
--source-env      Source Pinecone environment     From env (SOURCE_ENV)
--target-api-key  Target Pinecone API key         From env (TARGET_API_KEY)
--target-env      Target Pinecone environment     From env (TARGET_ENV)
--source-index    Source index name               From env (SOURCE_INDEX)
--target-index    Target index name               From env (TARGET_INDEX)
--namespace       Single namespace to migrate     None
--namespaces      Comma-separated namespaces      None
--create-target   Create target index if missing  False
--batch-size      Batch size for upserts          100
--verbose, -v     Enable verbose logging          False
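The options in this table map naturally onto Python's argparse, with each "From env" default read from the matching variable in the configuration section. The following is a minimal sketch under that assumption. build_parser and _split_csv are hypothetical helpers, not the contents of pinecone_migration.cli, and making --namespace and --namespaces mutually exclusive is a design choice of this sketch.

```python
import argparse
import os
from typing import List, Mapping


def _split_csv(value: str) -> List[str]:
    # "prod, staging,dev" -> ["prod", "staging", "dev"]
    return [part.strip() for part in value.split(",") if part.strip()]


def build_parser(env: Mapping[str, str] = os.environ) -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the CLI options table; not the tool's actual code."""
    parser = argparse.ArgumentParser(
        prog="python -m pinecone_migration.cli",
        description="Migrate vectors between Pinecone indexes.",
    )
    # Connection and index options: assumed to fall back to the matching env vars.
    for flag, var in [
        ("--source-api-key", "SOURCE_API_KEY"),
        ("--source-env", "SOURCE_ENV"),
        ("--target-api-key", "TARGET_API_KEY"),
        ("--target-env", "TARGET_ENV"),
        ("--source-index", "SOURCE_INDEX"),
        ("--target-index", "TARGET_INDEX"),
    ]:
        parser.add_argument(flag, default=env.get(var))

    # Sketch's design choice: accept a single namespace OR a list, never both.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--namespace", help="Single namespace to migrate")
    group.add_argument("--namespaces", type=_split_csv, help="Comma-separated namespaces")

    parser.add_argument("--create-target", action="store_true",
                        help="Create target index if missing")
    parser.add_argument("--batch-size", type=int, default=100,
                        help="Batch size for upserts")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Enable verbose logging")
    return parser


# Example: a selective namespace run, with SOURCE_INDEX supplied by the environment.
args = build_parser({"SOURCE_INDEX": "dev-vectors"}).parse_args(
    ["--namespaces", "users,products", "--batch-size", "50", "--verbose"]
)
print(args.namespaces, args.batch_size, args.source_index, args.verbose)
# ['users', 'products'] 50 dev-vectors True
```

Passing the environment in as a mapping, rather than reading os.environ inside the function, keeps the parser easy to test without touching real credentials.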

🔍 Examples

Example 1: Complete Environment Migration

# Migrate all vectors from development to production
SOURCE_API_KEY=dev-key-123 \
SOURCE_ENV=us-west1-gcp \
SOURCE_INDEX=dev-vectors \
TARGET_API_KEY=prod-key-456 \
TARGET_ENV=us-east1-gcp \
TARGET_INDEX=prod-vectors \
CREATE_TARGET=true \
python -m pinecone_migration.cli

Example 2: Selective Namespace Migration

# Migrate only user and product namespaces
python -m pinecone_migration.cli \
  --namespaces "users,products" \
  --batch-size 50 \
  --verbose

Example 3: Cross-Project Migration

from pinecone_migration import PineconeMigrator

# Migrate from project A to project B
migrator = PineconeMigrator(
    source_api_key="project-a-key",
    source_environment="us-west1-gcp",
    target_api_key="project-b-key",
    target_environment="us-east1-gcp"
)

# Create target index and migrate
stats = migrator.migrate(
    source_index="embeddings-v1",
    target_index="embeddings-v2",
)
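After a cross-project migration it is worth confirming that every namespace arrived intact. One way to do this, assuming you have already collected per-namespace vector counts for both indexes (for example from describe_index_stats() in the Pinecone Python client), is to compare them as plain dictionaries. compare_counts is a hypothetical helper, and counts read immediately after a large upsert may briefly lag behind the writes, so re-check before treating a small gap as data loss.

```python
from typing import Dict, List, Tuple


def compare_counts(source: Dict[str, int],
                   target: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Return (namespace, source_count, target_count) for each namespace that differs."""
    mismatches = []
    # Check the union so namespaces missing on either side are reported too.
    for ns in sorted(set(source) | set(target)):
        src_count, dst_count = source.get(ns, 0), target.get(ns, 0)
        if src_count != dst_count:
            mismatches.append((ns, src_count, dst_count))
    return mismatches


# Hypothetical counts standing in for values read from each index's stats.
source_counts = {"products": 1200, "users": 500}
target_counts = {"products": 1200, "users": 498}
print(compare_counts(source_counts, target_counts))  # [('users', 500, 498)]
```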

🛡️ Error Handling

The tool includes comprehensive error handling:

  • Connection errors: Validates API keys and environments
  • Index errors: Checks if indexes exist before migration
  • Rate limiting: Upserts in batches to reduce the risk of hitting rate limits
  • Data validation: Validates vector dimensions and metadata
  • Recovery: Continues migration after temporary failures
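Continuing after temporary failures is commonly done by retrying a failed batch with exponential backoff before giving up. The sketch below illustrates that general technique; it is not necessarily how this tool recovers. with_retries is a hypothetical helper, and treating ConnectionError and TimeoutError as the transient errors is an assumption you would adapt to whatever exceptions your Pinecone client actually raises. In a real run you might wrap each batch as with_retries(lambda: index.upsert(vectors=batch, namespace=ns)).

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], *, attempts: int = 5, base_delay: float = 0.5,
                 retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying errors listed in retry_on with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts - 1):
        try:
            return fn()
        except retry_on:
            # With the default base_delay: 0.5s, 1s, 2s, ... plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    return fn()  # final attempt: any error now propagates to the caller


# Usage: a fake upsert that fails twice with a transient error, then succeeds.
calls = {"n": 0}


def flaky_upsert() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary network error")
    return "ok"


delays = []
result = with_retries(flaky_upsert, sleep=delays.append)  # record delays instead of sleeping
print(result, calls["n"], len(delays))  # ok 3 2
```

Errors outside retry_on, such as a malformed vector, propagate on the first attempt, so permanent problems are not retried pointlessly.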

📊 Performance Tips

  1. Batch size: Adjust it to your vector size and network conditions.

    • Small vectors (< 1 KB): use batch sizes of 100-200
    • Large vectors (> 10 KB): use batch sizes of 10-50
  2. Concurrent migrations: Run separate migrations in parallel, one per namespace

  3. Network: Keep source and target in the same cloud region when possible to reduce transfer latency
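The batch-size guidance above follows from payload size: each record costs roughly 4 bytes per dimension for float32 values, plus its ID and metadata, and a single upsert request is capped. As a worked example, a 1536-dimensional vector takes 1536 × 4 = 6,144 bytes (about 6 KB) before metadata. The sketch below turns that arithmetic into a rough upper bound. max_batch_size is a hypothetical helper; the 2 MB request limit and 1,000-record cap are assumptions based on Pinecone's published upsert guidance, so check the current limits for your plan. The safety factor leaves room for encoding overhead (JSON, for instance, writes floats as text, which takes more than 4 bytes each), and the 36-byte ID default assumes UUID-length string IDs.

```python
def max_batch_size(dimension: int, metadata_bytes: int = 0, id_bytes: int = 36,
                   max_request_bytes: int = 2_000_000, max_records: int = 1000,
                   safety_factor: float = 0.5) -> int:
    """Rough upper bound on records per upsert request under the assumed limits."""
    # float32 values take 4 bytes per dimension; add the ID and metadata on top.
    per_record = dimension * 4 + id_bytes + metadata_bytes
    # Spend only part of the request budget to leave room for encoding overhead.
    budget = int(max_request_bytes * safety_factor)
    return max(1, min(max_records, budget // per_record))


print(max_batch_size(1536, metadata_bytes=500))   # 149
print(max_batch_size(3072, metadata_bytes=2000))  # 69
print(max_batch_size(64))                         # 1000 (capped by max_records)
```

Treat the result as a ceiling rather than a target: the ranges in the list above are more conservative starting points, and smaller batches make each retry cheaper when a request fails.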

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

git clone https://github.com/your-username/pinecone-vector-migration.git
cd pinecone-vector-migration

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
flake8 src/

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ› Issues and Support

If you encounter any issues or need support:

  1. Check the Issues page
  2. Create a new issue with detailed information about your problem
  3. Include your environment details and error messages

🔗 Related Projects

⚠️ Important Notes

  • Backup: Always back up your data before migrating
  • Rate Limits: Be aware of Pinecone rate limits for your plan
  • Costs: Monitor your Pinecone usage during migration
  • Testing: Test with a small subset before full migration

Made with ❤️ for the vector database community
