Not the neurons we want, but the neurons we need
Activation-free neural layers that learn non-linearity through geometric operations
📚 Documentation · 📄 Read the Paper · 📝 Read the Blog · 🐛 Report Bug · 🌐 Azetta.ai
NMN replaces traditional Linear + ReLU with a single geometric operation that learns non-linearity without activation functions:
```python
# Traditional approach
y = relu(linear(x))  # dot product → activation

# NMN approach
y = yat(x)  # geometric operation with built-in non-linearity
```

The Yat-Product (ⵟ) balances similarity and distance to create inherently non-linear transformations—no activations needed.
| Feature | Description |
|---|---|
| 🔥 Activation-Free | Learn complex non-linear relationships without ReLU, sigmoid, or tanh |
| 🌐 Multi-Framework | PyTorch, TensorFlow, Keras, Flax (Linen & NNX) |
| 🧮 Geometric Foundation | Based on distance-similarity tradeoff, not just correlations |
| ✅ Full Framework Parity | Dense, Conv, ConvTranspose, Attention, Embedding, and Squashers across all 5 frameworks |
| 🧠 Complete Layer Suite | Dense, Conv1D/2D/3D, ConvTranspose1D/2D/3D, Multi-Head Attention, Embeddings |
| ⚡ Production Ready | Comprehensive tests, CI/CD, high code coverage |
The core operation that powers NMN:

$$ ⵟ(\mathbf{w}, \mathbf{x}) = \frac{\langle\mathbf{w}, \mathbf{x}\rangle^2}{\|\mathbf{w} - \mathbf{x}\|^2 + \epsilon} $$
🔍 Geometric Interpretation (click to expand)
Rewriting in terms of norms and angles:
$$ ⵟ(\mathbf{w}, \mathbf{x}) = \frac{\|\mathbf{w}\|^2 \|\mathbf{x}\|^2 \cos^2\theta}{\|\mathbf{w}\|^2 - 2\langle\mathbf{w}, \mathbf{x}\rangle + \|\mathbf{x}\|^2 + \epsilon} $$
Output is maximized when:
- ✅ Vectors are aligned (small θ → large cos²θ)
- ✅ Vectors are close (small Euclidean distance)
- ✅ Vectors have large magnitude (amplifies the signal)
This creates a fundamentally different learning dynamic:
| Traditional Neuron | Yat Neuron |
|---|---|
| Measures correlation only | Balances similarity AND proximity |
| Requires activation for non-linearity | Non-linearity is intrinsic |
| Can fire for distant but aligned vectors | Penalizes distance between w and x |
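The contrast in the table can be checked numerically. Below is a minimal NumPy sketch of the Yat-Product (an illustration of the formula, not the library's implementation): a plain dot product scores a distant-but-aligned vector higher, while the Yat-Product penalizes its distance.

```python
import numpy as np

def yat_product(w, x, eps=1e-5):
    """Illustrative Yat-Product: squared similarity over squared distance."""
    num = np.dot(w, x) ** 2           # squared dot product (similarity)
    den = np.sum((w - x) ** 2) + eps  # squared Euclidean distance (proximity)
    return num / den

w = np.array([1.0, 0.0])
aligned_near = np.array([1.0, 0.0])   # same direction, zero distance
aligned_far = np.array([10.0, 0.0])   # same direction, far away

print(yat_product(w, aligned_near))   # large: similar AND close
print(yat_product(w, aligned_far))    # damped by the squared distance
```

A correlation-only neuron would prefer `aligned_far` (larger dot product); the Yat-Product reverses that preference because the denominator grows with the distance between `w` and `x`.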
The same principle applied to local patches:

$$ ⵟ(\mathbf{W}, \mathbf{X}) = \frac{\langle\mathbf{W}, \mathbf{X}\rangle^2}{\|\mathbf{W} - \mathbf{X}\|^2 + \epsilon} $$

where **W** is the kernel and **X** is the input patch.
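A naive NumPy sketch of this patch-wise operation (single input channel, single filter, "valid" padding; illustrative only — the library's convolutions are vectorized and multi-channel):

```python
import numpy as np

def yat_conv2d_single(X, W, eps=1e-5):
    """'Valid' 2D Yat-convolution for one channel and one filter.

    Each sliding patch P contributes <W, P>^2 / (||W - P||^2 + eps).
    """
    kh, kw = W.shape
    oh, ow = X.shape[0] - kh + 1, X.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            P = X[i:i + kh, j:j + kw]
            num = np.sum(W * P) ** 2          # squared correlation with the kernel
            den = np.sum((W - P) ** 2) + eps  # squared distance to the kernel
            out[i, j] = num / den
    return out

X = np.arange(16.0).reshape(4, 4)
W = np.ones((2, 2))
out = yat_conv2d_single(X, W)
print(out.shape)  # (3, 3)
```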
```bash
pip install nmn

# Framework-specific installations
pip install "nmn[torch]"   # PyTorch
pip install "nmn[keras]"   # Keras/TensorFlow
pip install "nmn[nnx]"     # Flax NNX (JAX)
pip install "nmn[linen]"   # Flax Linen (JAX)
pip install "nmn[all]"     # Everything
```
**PyTorch**

```python
import torch
from nmn.torch import YatNMN

layer = YatNMN(
    in_features=128,
    out_features=64,
    epsilon=1e-5
)
x = torch.randn(32, 128)
y = layer(x)  # (32, 64) — non-linear output!
```

**Keras**

```python
import keras
from nmn.keras import YatNMN

layer = YatNMN(
    features=64,
    epsilon=1e-5
)
x = keras.ops.zeros((32, 128))
y = layer(x)  # (32, 64)
```

**Flax NNX**

```python
import jax.numpy as jnp
from flax import nnx
from nmn.nnx import YatNMN

layer = YatNMN(
    in_features=128,
    out_features=64,
    rngs=nnx.Rngs(0)
)
x = jnp.zeros((32, 128))
y = layer(x)  # (32, 64)
```

**TensorFlow**

```python
import tensorflow as tf
from nmn.tf import YatNMN

layer = YatNMN(features=64)
x = tf.zeros((32, 128))
y = layer(x)  # (32, 64)
```
All layers are available across all 5 frameworks with verified numerical equivalence.
| Layer | PyTorch | TensorFlow | Keras | Flax NNX | Flax Linen |
|---|---|---|---|---|---|
| YatNMN (Dense) | ✅ | ✅ | ✅ | ✅ | ✅ |
| YatConv1D | ✅ | ✅ | ✅ | ✅ | ✅ |
| YatConv2D | ✅ | ✅ | ✅ | ✅ | ✅ |
| YatConv3D | ✅ | ✅ | ✅ | ✅ | ✅ |
| YatConvTranspose1D | ✅ | ✅ | ✅ | ✅ | ✅ |
| YatConvTranspose2D | ✅ | ✅ | ✅ | ✅ | ✅ |
| YatConvTranspose3D | ✅ | ✅ | ✅ | ✅ | ✅ |
| MultiHeadAttention | ✅ | ✅ | ✅ | ✅ | ✅ |
| YatEmbed | ✅ | ✅ | ✅ | ✅ | ✅ |
| Squashers | ✅ | ✅ | ✅ | ✅ | ✅ |
| Variant | Description | Complexity |
|---|---|---|
| RotaryYatAttention | YAT + Rotary Position Embeddings (RoPE) | O(n²) |
| Spherical YAT-Performer | YAT + FAVOR+ random features | O(n) |
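To see the idea behind Yat attention scoring, here is a hedged NumPy sketch that replaces the usual scaled dot-product `q·k` score with a pairwise Yat-Product (an illustration of the concept, not the library's attention code; softmax/normalization and multi-head plumbing are omitted):

```python
import numpy as np

def yat_scores(Q, K, eps=1e-5):
    """Pairwise Yat-Product between query rows and key rows."""
    dots = Q @ K.T                                  # <q_i, k_j>
    sq_dist = (np.sum(Q ** 2, axis=1)[:, None]
               - 2 * dots
               + np.sum(K ** 2, axis=1)[None, :])   # ||q_i - k_j||^2
    return dots ** 2 / (sq_dist + eps)

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 queries, dim 8
K = rng.normal(size=(5, 8))   # 5 keys, dim 8
S = yat_scores(Q, K)
print(S.shape)  # (4, 5)
```

Every score is non-negative and grows only when a query is both aligned with and close to a key, mirroring the dense Yat-Product above.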
All implementations are verified to produce numerically equivalent outputs given identical inputs and weights:
```text
┌─────────────────────────────────────────────────────────────┐
│                Cross-Framework Consistency Test             │
├─────────────────────────────────────────────────────────────┤
│ Framework Pair           │ Max Error    │ Status            │
├──────────────────────────┼──────────────┼───────────────────┤
│ PyTorch ↔ TensorFlow     │ < 1e-6       │ ✅ PASS           │
│ PyTorch ↔ Keras          │ < 1e-6       │ ✅ PASS           │
│ PyTorch ↔ Flax NNX       │ < 1e-6       │ ✅ PASS           │
│ PyTorch ↔ Flax Linen     │ < 1e-6       │ ✅ PASS           │
│ TensorFlow ↔ Keras       │ < 1e-7       │ ✅ PASS           │
│ Flax NNX ↔ Flax Linen    │ < 1e-7       │ ✅ PASS           │
└──────────────────────────┴──────────────┴───────────────────┘
```
```python
# PyTorch
from nmn.torch import MultiHeadYatAttention

attn = MultiHeadYatAttention(embed_dim=512, num_heads=8)
output = attn(query, key, value)

# Flax NNX — with Rotary Position Embeddings
from flax import nnx
from nmn.nnx import RotaryYatAttention

attn = RotaryYatAttention(
    num_heads=8,
    in_features=512,
    rngs=nnx.Rngs(0)
)
output = attn(x)

# Flax NNX — Spherical YAT-Performer (O(n) linear complexity)
from nmn.nnx import MultiHeadAttention

attn = MultiHeadAttention(
    num_heads=8,
    in_features=512,
    use_performer=True,
    rngs=nnx.Rngs(0)
)
output = attn(x)
```

```python
# PyTorch
from nmn.torch import YatEmbed

embed = YatEmbed(num_embeddings=10000, embedding_dim=128)
output = embed(token_ids)

# Flax NNX
from flax import nnx
from nmn.nnx import Embed

embed = Embed(
    num_embeddings=10000,
    features=128,
    constant_alpha=True,
    rngs=nnx.Rngs(0)
)
output = embed(token_ids)

# YAT attend for attention-based retrieval
scores = embed.attend(query)
```

Alternatives to standard activation functions, available in all frameworks:
```python
from nmn.nnx import softermax, softer_sigmoid, soft_tanh

y1 = softermax(x, n=2)               # Smoother softmax with power n
y2 = softer_sigmoid(x, sharpness=1)  # Smooth sigmoid variant
y3 = soft_tanh(x)                    # Smooth tanh variant
```

See EXAMPLES.md for comprehensive usage guides, including:
- Framework-specific quick starts (PyTorch, Keras, TensorFlow, Flax)
- Architecture examples (CNN, Transformer)
- Advanced features (custom squashers, attention)
Quick run:
```bash
# PyTorch Examples
python src/nmn/torch/examples/quick_example.py              # Quick demo
python src/nmn/torch/examples/vision/resnet_training.py     # ResNet training

# Flax NNX Examples
python src/nmn/nnx/examples/vision/aether_resnet50_tpu.py   # ResNet50 on TPU
python src/nmn/nnx/examples/language/m3za.py                # MiniBERT pre-training
python src/nmn/nnx/examples/language/m3za_perf.py           # Performance evaluation
```

Comprehensive test suite with cross-framework validation:
```bash
# Install test dependencies
pip install "nmn[test]"

# Run all tests
pytest tests/ -v

# Run specific framework tests
pytest tests/test_torch/ -v   # PyTorch
pytest tests/test_keras/ -v   # Keras
pytest tests/test_nnx/ -v     # Flax NNX

# Cross-framework consistency validation
pytest tests/integration/test_cross_framework_consistency.py -v

# With coverage report
pytest tests/ --cov=nmn --cov-report=html
```

Based on the research papers:
Deep Learning 2.0: Artificial Neurons that Matter — Reject Correlation, Embrace Orthogonality
Deep Learning 2.1: Mind and Cosmos — Towards Cosmos-Inspired Interpretable Neural Networks
Traditional neurons compute:

$$ y = \sigma(\langle\mathbf{w}, \mathbf{x}\rangle + b) $$
This has limitations:
- Correlation-based: Only measures alignment, ignores proximity
- Requires activation: Non-linearity is external
- Spurious activations: Can fire strongly for distant but aligned vectors
The Yat-Product addresses these by combining:
- Squared dot product (similarity) in the numerator
- Squared distance (proximity) in the denominator
- Epsilon for numerical stability
The result is a neuron that responds geometrically — activated when inputs are both similar AND close to weights.
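These ingredients can be verified in a few lines of NumPy (`yat` below is a local illustrative helper, not the library API): at an exact match the distance term vanishes and epsilon caps the output, while a distant-but-aligned input — which a plain dot product would score 10× higher — is damped by the squared distance.

```python
import numpy as np

def yat(w, x, eps=1e-5):
    return np.dot(w, x) ** 2 / (np.sum((w - x) ** 2) + eps)

w = np.array([3.0, 4.0])   # ||w||^2 = 25

# Exact match: denominator reduces to eps, so the peak is ||w||^4 / eps
peak = yat(w, w)
print(np.isclose(peak, 25.0 ** 2 / 1e-5))  # True

# Distant but perfectly aligned: squared distance in the denominator
# suppresses the response that a correlation-only neuron would amplify
far = yat(w, 10 * w)
print(far < peak)  # True
```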
We welcome contributions! See CONTRIBUTING.md for guidelines.
```bash
# Development setup
git clone https://github.com/azettaai/nmn.git
cd nmn
pip install -e ".[dev,test]"

# Run tests
pytest tests/ -v

# Format code
black src/ tests/
isort src/ tests/
```

Areas for contribution:
- 🐛 Bug fixes (open issues)
- ✨ New layer types (normalization, graph, etc.)
- 📚 Documentation and tutorials
- ⚡ Performance optimizations
- 🎨 Example applications
| Parameter | Type | Description |
|---|---|---|
| `in_features` | int | Input dimension (Dense) or channels (Conv) |
| `out_features` | int | Output dimension or filters |
| `kernel_size` | int \| tuple | Convolution kernel size |
| `epsilon` | float | Numerical stability constant (default: 1e-5) |
| `use_bias` | bool | Include bias term (default: True) |
| `constant_alpha` | bool | Use fixed √2 scaling (default: varies) |
| `spherical` | bool | Enable spherical mode (default: False) |
```python
# PyTorch
from nmn.torch import YatNMN, YatConv2D, MultiHeadYatAttention, YatEmbed
from nmn.torch import softermax, softer_sigmoid, soft_tanh

# Keras
from nmn.keras import YatNMN, YatConv2D, MultiHeadYatAttention, YatEmbed
from nmn.keras import softermax, softer_sigmoid, soft_tanh

# TensorFlow
from nmn.tf import YatNMN, YatConv2D, MultiHeadYatAttention, YatEmbed
from nmn.tf import softermax, softer_sigmoid, soft_tanh

# Flax NNX (includes advanced attention variants)
from nmn.nnx import YatNMN, YatConv, MultiHeadAttention, Embed
from nmn.nnx import RotaryYatAttention, softermax

# Flax Linen
from nmn.linen import YatNMN, YatConv2D, MultiHeadAttention, YatEmbed
from nmn.linen import softermax, softer_sigmoid, soft_tanh
```

📋 Full reference → EXAMPLES.md
If you use NMN in your research, please cite:
```bibtex
@software{nmn2024,
  author = {Bouhsine, Taha},
  title  = {NMN: Neural Matter Networks},
  year   = {2024},
  url    = {https://github.com/azettaai/nmn}
}

@article{bouhsine2024dl2,
  author = {Bouhsine, Taha},
  title  = {Deep Learning 2.0: Artificial Neurons that Matter --- Reject Correlation, Embrace Orthogonality},
  year   = {2024}
}
```

- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
- 🌐 Company: azetta.ai
- 📧 Contact: taha@azetta.ai
AGPL-3.0 — Free for personal, academic, and commercial use with attribution.
If you modify and deploy on a network, you must share the source code.
For alternative licensing, contact us at taha@azetta.ai.
This project was originally developed under the mlnomadpy organization and is now maintained by Azetta.ai.
The foundations of NMN were established through extensive research and community contributions. We're grateful to everyone who has contributed code, feedback, and ideas to make this project better.
