⚠️ ALPHA RELEASE: Spectra-NSA is under active development. Breaking changes may occur. Use for research and experimentation.
Advanced neural embedding model combining spectral-entropy attention, Matryoshka representations, and anomalous detection for state-of-the-art semantic similarity tasks.
- Spectral-Entropy Attention - Novel attention mechanism with learnable fractal enrichment
- Matryoshka Embeddings - Multi-scale semantic representations (768→512→256)
- Anomalous Detection - Built-in out-of-distribution detection via a learned anomalous basis (see the sketch below)
- Multiple Size Presets - M456 (458M), M600 (600M), M700 (700M), M1B (1B params)
- Component Toggles - Flexible ablation studies via CLI flags
- Real-time Monitoring - Integrated SOTA comparison and health checks
- Auto-backup to Google Drive - Automatic checkpoint backup every 6 saves
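The anomalous-detection idea can be pictured with a small, hypothetical scoring function: embeddings are compared against a set of learned prototype vectors and the distance is used as an out-of-distribution signal. The snippet below is only a generic illustration of that pattern, not the repository's implementation; the prototype count (16) comes from the architecture notes further down, and the function name, shapes, and scoring rule are assumptions.

```python
import torch
import torch.nn.functional as F

def ood_score(embeddings: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    """Hypothetical OOD score: 1 - max cosine similarity to any learned prototype.

    embeddings: [batch, dim] sentence embeddings
    prototypes: [16, dim]    learned anomalous basis (shape assumed)
    Higher values mean the input looks less like anything the basis covers.
    """
    emb = F.normalize(embeddings, dim=-1)
    protos = F.normalize(prototypes, dim=-1)
    sims = emb @ protos.T                    # cosine similarity to each prototype
    return 1.0 - sims.max(dim=-1).values     # [batch]
```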
| Model | Parameters | STS-B Target | VRAM | Training Time (A100) | Best For |
|---|---|---|---|---|---|
| M456 | 458M | >0.825 | ~35GB | 4-6h | Colab Pro ✅ |
| M600 | 600M | >0.835 | ~37GB | 6-8h | Balanced |
| M700 | 700M | >0.840 | ~38GB | 8-10h | SOTA target |
| M1B | 1.0B | >0.850 | ~55GB | 12-16h | Research |
Default: M456 - Optimized for Google Colab Pro (A100 40GB)
Install the dependencies:

```bash
pip install transformers datasets accelerate scipy torch
```

Train with the default preset (M456):

```bash
python anomalous_embedding_ultimate.py --mode train --epochs 3
```

Expected output:
- Training time: ~4-6 hours (A100, fp32 debug mode)
- STS-B score: >0.825
- Automatic checkpoints every 500 steps
- Drive backup every 6 checkpoints
Evaluate a checkpoint:

```bash
python anomalous_embedding_ultimate.py --mode eval --checkpoint checkpoints/best_sts.pt
```

Extract embeddings for arbitrary texts:

```bash
python anomalous_embedding_ultimate.py \
  --mode extract \
  --checkpoint checkpoints/best_sts.pt \
  --texts "sample text" "another text"
```

Train a larger preset:

```bash
# SOTA target (M700)
python anomalous_embedding_ultimate.py --size M700 --mode train --epochs 3

# Research scale (M1B) - requires 80GB GPU
python anomalous_embedding_ultimate.py --size M1B --mode train --epochs 3
```

Toggle components for ablation studies:

```bash
# Disable spectral attention
python anomalous_embedding_ultimate.py --no-spectral --mode train --epochs 1

# Disable anchor64 head
python anomalous_embedding_ultimate.py --no-anchor --mode train --epochs 1

# Minimal configuration
python anomalous_embedding_ultimate.py \
  --no-spectral --no-anchor --no-bridge --no-matryoshka \
  --mode train --epochs 1
```
The model has four main components:

1. CustomEncoder - Transformer backbone with spectral-entropy attention
   - Learnable fractal depth mixing
   - Learnable spectral fusion weights
   - Temperature-annealed contrastive learning
2. Matryoshka Heads - Multi-scale embeddings (see the sketch after this list)
   - Semantic: 768, 512, 256 dimensions
   - Entity: 384, 192, 96 dimensions
   - Progressive nesting for efficient inference
3. Anomalous Projection - OOD detection
   - Learned anomalous basis (16 prototypes)
   - Spectral regularization
   - Ranking head for retrieval tasks
4. LossStack - Multi-objective training
   - InfoNCE (semantic, anchor, fast retrieval)
   - Triplet margin loss
   - Matryoshka angular alignment
   - Bridge loss (semantic→entity)
   - Spectral entropy regularization
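At inference time, Matryoshka-style embeddings can be used by simple prefix truncation followed by re-normalization. The dimensions below come from the list above; the function itself is a minimal generic sketch, not the repository's API.

```python
import torch
import torch.nn.functional as F

def truncate_matryoshka(embedding: torch.Tensor, dim: int) -> torch.Tensor:
    """Keep the first `dim` coordinates of a nested embedding and re-normalize.

    With nested (Matryoshka) training, the 768-dim semantic embedding can be cut
    to 512 or 256 dims (384/192/96 for the entity head) at a modest quality cost.
    """
    return F.normalize(embedding[..., :dim], dim=-1)

# Example: a full 768-dim semantic embedding reduced for cheaper retrieval
full = F.normalize(torch.randn(1, 768), dim=-1)
small = truncate_matryoshka(full, 256)   # shape [1, 256]
```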
- Temperature Scheduling: Cosine decay (0.07→0.05) (see the sketch after this list)
- Gradient Accumulation: Effective batch size 64
- Mixed Precision: FP16 support (currently disabled for debugging)
- Early Stopping: Configurable patience (default: disabled)
- Auto-backup: Google Drive sync every 6 checkpoints
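The temperature numbers (cosine decay from 0.07 to 0.05) and the InfoNCE objective are stated above; the snippet below is a minimal, self-contained sketch of how they fit together. Function names, shapes, and the step count are assumptions for illustration, not taken from the codebase.

```python
import math
import torch
import torch.nn.functional as F

def temperature(step: int, total_steps: int,
                t_start: float = 0.07, t_end: float = 0.05) -> float:
    """Cosine decay of the contrastive temperature from t_start to t_end."""
    progress = min(step / max(total_steps, 1), 1.0)
    return t_end + 0.5 * (t_start - t_end) * (1.0 + math.cos(math.pi * progress))

def info_nce(anchors: torch.Tensor, positives: torch.Tensor, temp: float) -> torch.Tensor:
    """In-batch InfoNCE: each anchor's positive is the same row in `positives`."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.T / temp                            # [batch, batch] similarities
    labels = torch.arange(a.size(0), device=a.device)  # diagonal = positives
    return F.cross_entropy(logits, labels)

# Example: temperature is ~0.07 early in training and anneals toward 0.05
loss = info_nce(torch.randn(8, 768), torch.randn(8, 768), temperature(500, 36_750))
```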
```
NSA_2.0/
├── anomalous_embedding_ultimate.py   # Main training script
├── training_monitor.py               # Real-time monitoring & health checks
├── anomalous_eval_suite.py           # Comprehensive evaluation suite
├── colab_training.ipynb              # Google Colab training notebook
├── ULTIMATE_GUIDE.md                 # Detailed usage guide
├── DATASET_INFO.md                   # Dataset information
├── GOOGLE_DRIVE_SETUP.md             # Drive setup instructions
└── checkpoints/                      # Saved models (auto-created)
```
Key parameters in Config dataclass:
```python
# Model Architecture
hidden_size: int = 1024   # M456 default
num_layers: int = 24
num_heads: int = 16
spectral_dim: int = 192
max_length: int = 160     # VRAM optimized

# Training
batch_train: int = 8      # Physical batch size
grad_accum: int = 8       # Effective batch = 64
epochs: int = 3
lr: float = 2e-4
fp16: bool = False        # Debug mode (use fp32)

# Monitoring
save_every: int = 500     # Checkpoint frequency
eval_every: int = 500     # STS evaluation frequency
early_stop_patience: int = 9999  # Disabled (duration set by epochs)
```

Typical training timeline (default configuration, 3 epochs):

| Step | Epoch | Event |
|---|---|---|
| 500 | 0.04 | First STS evaluation (~0.68) |
| 3,000 | 0.24 | 1st Drive backup |
| 12,250 | 1.00 | End epoch 1 (STS ~0.78) |
| 24,500 | 2.00 | End epoch 2 (STS ~0.82) |
| 36,750 | 3.00 | Final (STS >0.825) |
Total training steps: ~36,750
Drive backups: ~12 (every 3,000 steps)
Checkpoints saved: ~73 (every 500 steps)
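These counts follow directly from the schedule above; a quick back-of-the-envelope check using the timeline table and the Config defaults:

```python
steps_per_epoch = 12_250                        # from the timeline table
epochs = 3
total_steps = steps_per_epoch * epochs          # 36,750
checkpoints = total_steps // 500                # ~73 (save_every = 500)
drive_backups = total_steps // (6 * 500)        # ~12 (every 6th checkpoint)
print(total_steps, checkpoints, drive_backups)  # 36750 73 12
```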
Success criteria for M456:
- ✅ STS-B > 0.825 (competitive with BGE-base)
- ✅ Fits in Colab Pro (40GB VRAM)
- ✅ Training completes in 4-6h (A100)
- ✅ No OOM errors

Success criteria for M700:
- ✅ STS-B > 0.840 (competitive with GTE-large)
- ✅ NDCG@10 > 0.420
- ✅ Recall@1 > 0.550
- ✅ Training completes in 8-10h (A100)
If you run out of VRAM with M456:

```python
# Edit Config in anomalous_embedding_ultimate.py
batch_train = 6   # Reduce from 8
max_length = 128  # Reduce from 160
```

If you run out of VRAM with M700:
- Switch to M456 or use an 80GB GPU
- Enable gradient checkpointing (planned, not yet available)
Check GPU type:
```bash
!nvidia-smi
```

It should show A100-SXM4-40GB or A100-80GB.
If STS scores are lower than expected:
- Verify that all components are enabled (no `--no-*` flags)
- Check that temperature annealing is active
- Monitor the loss components (they should decrease)
- Wait until warmup ends (8% of training)
Verify Drive is mounted:

```python
import os
print(os.path.exists("/content/drive/MyDrive"))  # Should be True
```
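If that prints False, remount Drive first. A minimal Colab cell using the standard `google.colab` API looks like this (sketch only; see GOOGLE_DRIVE_SETUP.md for the project's own instructions):

```python
# Mount Google Drive so checkpoint backups have somewhere to go
from google.colab import drive

drive.mount("/content/drive")  # Prompts for authorization on first run
```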
If you use this work in your research, please cite:

```bibtex
@software{spectra_nsa,
  title  = {Spectra-NSA: Neural Semantic Architecture - Advanced Embeddings},
  author = {Your Name},
  year   = {2025},
  url    = {https://github.com/yourusername/spectra-nsa},
  note   = {Alpha release - Architecture under active development}
}
```

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Copyright 2025
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
ALPHA SOFTWARE: This architecture is experimental and under active development.
- ✅ Use for: Research, experimentation, benchmarking
- ⚠️ Not recommended for: Production systems without extensive testing
- Breaking changes: May occur between releases
- Known issues: NaN debugging enabled (fp32 mode for stability)
Current Status:
- Core architecture: Stable
- Training pipeline: Stable
- Evaluation suite: Stable
- Multi-GPU support: In development
- Gradient checkpointing: Planned
- Mixed precision (fp16): Disabled for debugging
Contributions welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Areas of interest:
- Multi-GPU training optimization
- FP16 stability improvements
- Additional evaluation benchmarks
- Memory optimization techniques
- Documentation improvements
For questions, issues, or collaboration:
- Open an issue on GitHub
- Email: daniele.tl.project@gmail.com
- Discord: nexus_walker_dc
- Hugging Face - Transformers & Datasets libraries
- PyTorch Team - Deep learning framework
- Sentence-Transformers - MS MARCO dataset
- MTEB - Benchmark datasets
- Google Colab - Training infrastructure
- ULTIMATE_GUIDE.md - Detailed usage guide
- DATASET_INFO.md - Dataset information
- GOOGLE_DRIVE_SETUP.md - Colab setup guide
- colab_training.ipynb - Training notebook
Built with ❤️ for the NLP research community

Links:
- Repository: spectra-nsa
- Documentation: Coming Soon
- Paper: In preparation
Last updated: November 15, 2025