AI-native File System - A next-generation file system designed from the ground up for AI/ML workloads, featuring content addressing, vector-first metadata, versioned snapshots, and semantic search capabilities.
Version: 0.1.0-alpha
Test Coverage: 92.3% (170+ tests)
Implementation: Core functionality complete
Docker: Production-ready containerization
- Content Addressing: BLAKE3-based content addressing
- Vector Search: Semantic similarity search with FAISS
- Asset Kinds: Complete implementation of Blob, Tensor (Arrow2), Embed (FlatBuffers), Artifact (ZIP+MANIFEST)
- Strong Causality: Transaction system ensuring "Asset B SHALL NOT be visible until A is fully committed"
- Ed25519 Signatures: Complete snapshot root signing and verification with namespace key management
- Encryption: AES-256-GCM with KMS integration
- Versioning: Merkle tree-based snapshots with Ed25519 signatures
- gRPC API: High-performance RPC interface with reflection (dev mode)
- URI Schemes: Canonical aifs:// and aifs-snap:// identifiers
- Authorization: Macaroon-based capability tokens
- Compression: Gzip compression for transport
- Error Handling: Structured error responses with google.rpc.Status
- Docker Support: Production-ready containerization with Docker Compose
- Testing: Comprehensive test suite with 26+ asset kinds tests, 23+ strong causality tests, and 23+ Ed25519 signature tests
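The strong-causality guarantee above ("Asset B SHALL NOT be visible until A is fully committed") can be illustrated with a minimal staged-commit sketch. `AssetStore` and its methods are hypothetical stand-ins, not the AIFS transaction API:

```python
class AssetStore:
    """Toy store illustrating commit-gated visibility (not the AIFS API)."""

    def __init__(self):
        self._committed = {}   # asset_id -> data, visible to readers
        self._staged = {}      # txn_id -> {asset_id: data}, invisible
        self._next_txn = 0

    def begin(self):
        self._next_txn += 1
        self._staged[self._next_txn] = {}
        return self._next_txn

    def put(self, txn_id, asset_id, data):
        # Writes land in the transaction's staging area only.
        self._staged[txn_id][asset_id] = data

    def commit(self, txn_id):
        # All assets in the transaction become visible atomically.
        self._committed.update(self._staged.pop(txn_id))

    def get(self, asset_id):
        # Readers never observe uncommitted assets.
        return self._committed.get(asset_id)
```

A reader polling `get("B")` sees nothing until `commit()` publishes the whole transaction at once, which is the visibility rule the feature list describes.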
- Performance optimization and benchmarking
- Advanced monitoring and metrics
- FUSE layer for POSIX compatibility
- Pre-signed URLs for direct streaming
- Ingest operators for automatic embedding generation
- Strong causality for lineage tracking
- Advanced performance optimization
# Pull and run the latest version from Docker Hub
docker pull uriber/aifs:latest
docker run -p 50051:50051 -v aifs-data:/data/aifs uriber/aifs:latest
# Or use a specific version
docker pull uriber/aifs:v0.1.0-alpha
docker run -p 50051:50051 -v aifs-data:/data/aifs uriber/aifs:v0.1.0-alpha
# Test the API
grpcurl -plaintext localhost:50051 grpc.health.v1.Health/Check
cd local_implementation
# Build and run with Docker Compose
docker-compose up -d
# Or build and run manually
./docker-build.sh
docker run -p 50051:50051 -v aifs-data:/data/aifs aifs:latest
cd local_implementation
python install.py
cd local_implementation
pip install -r requirements.txt
If you encounter FAISS installation problems, the system will automatically fall back to scikit-learn for vector operations. However, for optimal performance, you should install FAISS.
# Try the dedicated FAISS installer
python install_faiss.py
# Install Anaconda/Miniconda first, then:
conda install -c conda-forge faiss-cpu
# macOS (with Homebrew)
brew install swig libomp openblas cmake
pip install faiss-cpu
# Ubuntu/Debian
sudo apt-get install swig libomp-dev openblas-dev cmake build-essential
pip install faiss-cpu
# CentOS/RHEL
sudo yum install swig libomp-devel openblas-devel cmake gcc-c++
pip install faiss-cpu
# Or prefer prebuilt wheels
pip install faiss-cpu --only-binary=all
- With FAISS: High-performance vector similarity search
- Without FAISS: Uses scikit-learn fallback (slower but functional)
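The fallback behaviour can be sketched as a try-import pattern. This is an illustrative sketch, not the AIFS `vector_db.py` code; the fallback path below uses plain Python instead of scikit-learn to stay dependency-free:

```python
try:
    import faiss  # high-performance ANN backend, optional
    HAVE_FAISS = True
except ImportError:
    HAVE_FAISS = False

def nearest(query, vectors, k=1):
    """Return indices of the k vectors closest to query (L2 distance)."""
    if HAVE_FAISS:
        import numpy as np
        index = faiss.IndexFlatL2(len(query))
        index.add(np.asarray(vectors, dtype="float32"))
        _, ids = index.search(np.asarray([query], dtype="float32"), k)
        return list(ids[0])
    # Fallback: exact brute-force scan, O(n * d). Correct but slower.
    dists = [
        (sum((q - v) ** 2 for q, v in zip(query, vec)), i)
        for i, vec in enumerate(vectors)
    ]
    return [i for _, i in sorted(dists)[:k]]
```

Both branches return the same exact-search results; FAISS simply does it much faster and also offers approximate indexes for large collections.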
The implementation follows the layered architecture specified in the AIFS RFC:
┌─────────────────────────────────────┐
│ Application Layer │
│ (CLI, FUSE, Client) │
├─────────────────────────────────────┤
│ gRPC API Layer │
│ (Built-in Services) │
├─────────────────────────────────────┤
│ Core AIFS Services │
│ (Asset, Storage, Vector, Auth) │
├─────────────────────────────────────┤
│ Storage Layer │
│ (Encrypted, Content-Addressed) │
└─────────────────────────────────────┘
- AES-256-GCM Encryption: All data chunks are encrypted at rest
- Ed25519 Signatures: Cryptographic verification of snapshots
- Macaroon Authorization: Capability-based access control
- Content Addressing: BLAKE3-based deduplication
- Proper Merkle Trees: Binary tree structure for efficient verification
- Merkle Proofs: Cryptographic proofs for asset inclusion
- Snapshot Signatures: Ed25519-signed snapshot roots
- Lineage Tracking: DAG-based transformation history
- FAISS Integration: High-performance similarity search (when available)
- scikit-learn Fallback: Functional vector search when FAISS unavailable
- Embedding Storage: Vector database for AI workloads
- Semantic Search: k-NN search over embeddings
- Metadata Indexing: Rich metadata querying
- SQLite Metadata Store: ACID-compliant metadata storage
- Encrypted Storage Backend: AES-256-GCM encrypted chunks
- Namespace Management: Multi-tenant isolation
- Lineage Graph: Parent-child relationship tracking
- zstd Compression: Efficient data compression
- Streaming I/O: Chunked data transfer
- Content Deduplication: Eliminates redundant storage
- Sharded Storage: Efficient file system organization
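The Merkle machinery listed above (binary tree, inclusion proofs, root verification) can be sketched in a few stdlib-only functions. SHA-256 stands in for BLAKE3 here to avoid the extra dependency; this is not the AIFS `merkle.py` implementation:

```python
import hashlib

def _h(data: bytes) -> bytes:
    # SHA-256 stands in for BLAKE3 to keep the sketch stdlib-only.
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root of a binary Merkle tree over the given leaf values."""
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                  # duplicate last node on odd levels
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Sibling hashes (with position flags) proving inclusion of one leaf."""
    level = [_h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sib = index ^ 1
        proof.append((level[sib], sib < index))  # True = sibling is on the left
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_proof(leaf, proof, root):
    """Recompute the path from leaf to root and compare."""
    node = _h(leaf)
    for sibling, is_left in proof:
        node = _h(sibling + node) if is_left else _h(node + sibling)
    return node == root
```

A verifier holding only the signed root can check any single asset's inclusion with a logarithmic number of hashes, which is what makes signed snapshot roots cheap to audit.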
local_implementation/
├── aifs/ # Core AIFS implementation
│ ├── __init__.py
│ ├── asset.py # Asset management
│ ├── auth.py # Authorization system
│ ├── client.py # gRPC client
│ ├── compression.py # Compression service
│ ├── crypto.py # Cryptographic operations
│ ├── errors.py # Structured error handling
│ ├── fuse.py # FUSE layer
│ ├── merkle.py # Merkle tree implementation
│ ├── metadata.py # Metadata store
│ ├── proto/ # Protocol definitions
│ ├── server.py # gRPC server
│ ├── storage.py # Storage backend
│ ├── uri.py # URI scheme handling
│ └── vector_db.py # Vector database (FAISS + fallback)
├── tests/ # Comprehensive test suite
│ ├── test_asset_manager.py # Asset manager tests
│ ├── test_auth.py # Authorization tests
│ ├── test_basic.py # Basic functionality tests
│ ├── test_builtin_services.py # Built-in services tests
│ ├── test_compression.py # Compression tests
│ ├── test_crypto.py # Cryptographic tests
│ ├── test_merkle_tree.py # Merkle tree tests
│ ├── test_storage.py # Storage tests
│ ├── test_blake3_uri.py # BLAKE3 and URI tests
│ ├── test_error_handling.py # Error handling tests
│ ├── test_encryption_kms.py # Encryption and KMS tests
│ ├── test_grpc_server.py # gRPC server tests
│ └── test_merkle_blake3.py # Merkle tree with BLAKE3 tests
├── examples/ # Usage examples
├── install.py # Automated installer
├── install_faiss.py # FAISS installation helper
├── run_tests.py # Test runner
├── start_server.py # Server startup script
├── aifs_cli.py # Command-line interface
├── requirements.txt # Dependencies
├── Dockerfile # Docker container definition
├── docker-compose.yml # Docker Compose orchestration
├── docker-build.sh # Docker build script
├── docker-run.sh # Docker run script
├── .dockerignore # Docker build exclusions
├── DOCKER.md # Docker documentation
├── Makefile # Development automation
└── README_IMPLEMENTATION.md # Detailed implementation guide
python run_tests.py
python run_tests.py merkle_tree # Merkle tree tests
python run_tests.py crypto # Cryptographic tests
python run_tests.py storage # Storage tests
python run_tests.py compression # Compression tests
python run_tests.py asset_manager # Asset manager tests
The test suite covers:
- ✅ All core components
- ✅ Cryptographic operations
- ✅ Authorization system
- ✅ Storage backend
- ✅ Merkle tree operations
- ✅ Vector search (FAISS + fallback)
- ✅ Error handling
- ✅ Edge cases
# Start with Docker Compose (recommended)
docker-compose up -d
# Or run manually
./docker-build.sh
docker run -p 50051:50051 -v aifs-data:/data/aifs aifs:latest
# Development mode with gRPC reflection
docker run -p 50051:50051 -v aifs-data:/data/aifs aifs:latest python start_server.py --dev
# Start the server
python start_server.py --port 50051 --storage-dir ~/.aifs
# Development mode with gRPC reflection
python start_server.py --dev --port 50051 --storage-dir ~/.aifs
# Store an asset
python aifs_cli.py put --kind blob ./tests/files/data.txt
# Search for assets
python aifs_cli.py search --query "test data"
# Create a snapshot
python aifs_cli.py snapshot --namespace test --assets asset1,asset2
# List assets
python aifs_cli.py list
from aifs.client import AIFSClient
# Connect to server
client = AIFSClient("localhost:50051")
# Store asset
asset_id = client.put_asset(
data=b"Hello, AIFS!",
kind="blob",
metadata={"description": "Test asset"}
)
# Retrieve asset
asset = client.get_asset(asset_id)
# Vector search
results = client.vector_search(query_embedding, k=10)
# Mount AIFS as a filesystem
python -c "
from aifs.fuse import AIFSFuse
from aifs.client import AIFSClient
import fuse
client = AIFSClient('localhost:50051')
fuse_ops = AIFSFuse(client, 'default')
fuse.FUSE(fuse_ops, '/mnt/aifs')
"
export AIFS_ROOT_DIR=~/.aifs # Data directory
export AIFS_SERVER_PORT=50051 # Server port
export AIFS_ENCRYPTION_KEY=your_key # Encryption key (32 bytes)
export AIFS_PRIVATE_KEY=your_priv_key # Ed25519 private key
# Custom configuration
from aifs.server import serve
serve(
root_dir="~/.aifs",
port=50051,
max_workers=20
)
- AES-256-GCM: Authenticated symmetric encryption for all stored data
- Key Derivation: HKDF-based key derivation
- Nonce Management: Secure random nonce generation
- Authenticated Encryption: Integrity and confidentiality
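The key-derivation step above can be illustrated with a minimal stdlib-only HKDF (RFC 5869, extract-then-expand). This is a sketch, not the AIFS `crypto.py` code, and the `info` label in the usage note is a made-up example:

```python
import hashlib
import hmac

def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF with SHA-256: extract a pseudorandom key, then expand it
    into `length` bytes of context-bound key material (RFC 5869)."""
    # Extract: PRK = HMAC(salt, input keying material)
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i)
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

# Example: derive a 32-byte AES-256 key bound to a purpose label
# (the label string is illustrative, not an AIFS constant).
chunk_key = hkdf_sha256(b"master-secret", b"per-namespace-salt",
                        b"aifs/chunk-encryption", 32)
```

Binding the `info` parameter to a purpose string ensures that keys derived for chunk encryption can never collide with keys derived for any other use of the same master secret.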
- Ed25519 Signatures: Fast, secure digital signatures
- Public Key Verification: Cryptographic proof of authenticity
- Timestamp Validation: Prevents replay attacks
- Namespace Isolation: Multi-tenant security
- Macaroon Tokens: Capability-based access control
- Method Restrictions: Fine-grained permission control
- Namespace Scoping: Resource isolation
- Expiry Management: Time-limited access tokens
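The macaroon properties listed above rest on HMAC chaining: each caveat folds into the signature, so holders can attenuate a token but never broaden it. A minimal stdlib sketch of the idea (not the macaroon library AIFS actually uses):

```python
import hashlib
import hmac

def mint(root_key: bytes, identifier: bytes) -> bytes:
    """Mint a token: the signature is HMAC(root_key, identifier)."""
    return hmac.new(root_key, identifier, hashlib.sha256).digest()

def add_caveat(signature: bytes, caveat: bytes) -> bytes:
    """Attenuate: chain the caveat into the signature. Anyone holding
    the token can add caveats, but no one can remove them."""
    return hmac.new(signature, caveat, hashlib.sha256).digest()

def verify(root_key: bytes, identifier: bytes, caveats, signature: bytes) -> bool:
    """Only the minter knows root_key, so only it can replay the chain."""
    sig = mint(root_key, identifier)
    for caveat in caveats:
        sig = add_caveat(sig, caveat)
    return hmac.compare_digest(sig, signature)
```

In AIFS terms, caveats would carry restrictions like the permitted namespace, allowed methods, and an expiry time; the verifier rejects any request outside those bounds.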
- Content Deduplication: Eliminates redundant storage
- Sharded Storage: Efficient file system organization
- Compression: zstd compression for space efficiency
- Streaming I/O: Efficient large file handling
- FAISS Integration: High-performance vector search (when available)
- scikit-learn Fallback: Functional vector search (slower)
- Index Optimization: Optimized for similarity queries
- Caching: Metadata and embedding caching
- Parallel Processing: Multi-threaded operations
- gRPC Streaming: Efficient data transfer
- Compression: zstd compression for network efficiency
- Connection Pooling: Reusable connections
- Load Balancing: Ready for horizontal scaling
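Streaming plus compression can be sketched as a generator that yields independently compressed chunks, roughly how a gRPC client would stream a large asset. This is illustrative only; stdlib `zlib` stands in for the zstd codec AIFS uses, and the chunk size is an assumption:

```python
import zlib

CHUNK_SIZE = 64 * 1024  # illustrative chunk size, not an AIFS constant

def compress_stream(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield independently decompressible chunks, suitable for a
    streaming RPC where the receiver processes chunks as they arrive."""
    for offset in range(0, len(data), chunk_size):
        yield zlib.compress(data[offset:offset + chunk_size])

def decompress_stream(chunks) -> bytes:
    """Reassemble the original payload on the receiving side."""
    return b"".join(zlib.decompress(c) for c in chunks)
```

Per-chunk compression trades a little ratio for the ability to start decompressing before the full transfer completes, which matters for large tensors and artifacts.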
- Hash Algorithm: Uses BLAKE3 for content addressing (Rust dependency included)
- Vector Search: Falls back to scikit-learn if FAISS unavailable
- Performance: Local implementation, not production-optimized
- Scalability: Single-node implementation
- Monitoring: Basic metrics only
- BLAKE3 Integration: Install Rust for full spec compliance
- FAISS Optimization: Ensure FAISS is always available
- Performance Optimization: Meet RFC performance targets
- Distributed Storage: Multi-node deployment
- Advanced Monitoring: OpenTelemetry integration
- Load Testing: Performance benchmarking
- Security Audit: Penetration testing
error: command 'swig' failed: No such file or directory
Solutions:
- Use the FAISS installer:
python install_faiss.py
- Install system dependencies:
- macOS:
brew install swig libomp openblas cmake
- Ubuntu:
sudo apt-get install swig libomp-dev openblas-dev cmake
- Use conda:
conda install -c conda-forge faiss-cpu
- Accept fallback: System will use scikit-learn automatically
error: Cargo, the Rust package manager, is not installed
Solution: BLAKE3 and its Rust build dependency are bundled in the Docker images
error: Permission denied
Solution: Check file permissions and run with appropriate user
error: Address already in use
Solution: Change port or stop existing service
# Enable debug logging
export AIFS_LOG_LEVEL=DEBUG
python start_server.py
# Run tests with verbose output
python run_tests.py --verbose
# Check vector database backend
python -c "from aifs.vector_db import VectorDB; vdb = VectorDB('/tmp'); print(vdb.get_stats())"
# Verify FAISS installation
python -c "import faiss; print('FAISS version:', faiss.__version__)"

class AssetManager:
def put_asset(data, kind, embedding=None, metadata=None, parents=None)
def get_asset(asset_id)
def vector_search(query_embedding, k=10)
def create_snapshot(namespace, asset_ids, metadata=None)
def verify_snapshot(snapshot_id, public_key)

class StorageBackend:
def put(data)
def get(hash_hex)
def exists(hash_hex)
def delete(hash_hex)
def get_chunk_info(hash_hex)

class CryptoManager:
def sign_snapshot(merkle_root, timestamp, namespace)
def verify_snapshot_signature(signature, merkle_root, timestamp, namespace, public_key)
def get_public_key()

class MerkleTree:
def get_root_hash()
def get_proof(asset_id)
def verify_proof(asset_id, proof, root_hash)

class VectorDB:
def add(asset_id, embedding)
def search(query_embedding, k=10)
def delete(asset_id)
def get_stats() # Shows backend (FAISS or scikit-learn)
- Architecture Specification
- Implementation Guide
- API Documentation
- Docker Documentation
- Changelog
- Client App Specification
# Clone repository
git clone https://github.com/UriBer/AIFS.git
cd local_implementation
# Install development dependencies
pip install -r requirements.txt
pip install pytest pytest-cov black flake8
# Run code formatting
black aifs/ tests/
# Run linting
flake8 aifs/ tests/
# Run tests with coverage
pytest --cov=aifs tests/
- Python: PEP 8 compliance
- Type Hints: Full type annotation
- Documentation: Comprehensive docstrings
- Testing: 90%+ test coverage target
This implementation is provided under the same license as the main project. See the LICENSE file for details.
- Open Source Community: For the excellent libraries used
- Architecture Specification: See docs/spec/rfc/0001-aifs-architecture.md
- Implementation Guide: See local_implementation/docs/implementation/README_IMPLEMENTATION.md
- Changelog: See docs/CHANGELOG.md
Note: This implementation prioritizes functionality and security over performance optimization. For production deployment, additional performance tuning and security hardening are recommended.
Vector Search Note: The system automatically falls back to scikit-learn if FAISS is unavailable, ensuring functionality while maintaining the option for high-performance vector search when FAISS is properly installed.