Not the neurons we want, but the neurons we need
Activation-free neural layers that learn non-linearity through geometric operations
📚 Documentation · 📄 Read the Paper · 📝 Read the Blog · 🐛 Report Bug · 🌐 Azetta.ai
NMN replaces traditional Linear + ReLU with a single geometric operation that learns non-linearity without activation functions:
```python
# Traditional approach
y = relu(linear(x))  # dot product → activation

# NMN approach
y = yat(x)  # geometric operation with built-in non-linearity
```

The Yat-Product (ⵟ) balances similarity and distance to create inherently non-linear transformations—no activations needed.
| Feature | Description |
|---|---|
| 🔥 Activation-Free | Learn complex non-linear relationships without ReLU, sigmoid, or tanh |
| 🌐 Multi-Framework | PyTorch, TensorFlow, Keras, Flax (Linen & NNX) |
| 🧮 Geometric Foundation | Based on distance-similarity tradeoff, not just correlations |
| ✅ Full Framework Parity | Dense, Conv, ConvTranspose, Attention, Embedding, and Squashers across all 5 frameworks |
| 🧠 Complete Layer Suite | Dense, Conv1D/2D/3D, ConvTranspose1D/2D/3D, Multi-Head Attention, Embeddings |
| ⚡ Production Ready | Comprehensive tests, CI/CD, high code coverage |
The core operation that powers NMN:

$$ ⵟ(\mathbf{w}, \mathbf{x}) = \frac{\langle\mathbf{w}, \mathbf{x}\rangle^2}{\|\mathbf{w} - \mathbf{x}\|^2 + \epsilon} $$
🔍 Geometric Interpretation (click to expand)
Rewriting in terms of norms and angles:
$$ ⵟ(\mathbf{w}, \mathbf{x}) = \frac{\|\mathbf{w}\|^2 \|\mathbf{x}\|^2 \cos^2\theta}{\|\mathbf{w}\|^2 - 2\langle\mathbf{w}, \mathbf{x}\rangle + \|\mathbf{x}\|^2 + \epsilon} $$
Output is maximized when:
- ✅ Vectors are aligned (small θ → large cos²θ)
- ✅ Vectors are close (small Euclidean distance)
- ✅ Vectors have large magnitude (amplifies the signal)
This creates a fundamentally different learning dynamic:
| Traditional Neuron | Yat Neuron |
|---|---|
| Measures correlation only | Balances similarity AND proximity |
| Requires activation for non-linearity | Non-linearity is intrinsic |
| Can fire for distant but aligned vectors | Penalizes distance between w and x |
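The contrast in the table can be checked numerically. Below is a minimal NumPy sketch of the Yat-Product (an illustration of the formula, not the library's implementation): a plain dot product scores a distant-but-aligned vector higher, while the Yat-Product penalizes its distance.

```python
import numpy as np

def yat_product(w, x, eps=1e-5):
    """Illustrative Yat-Product: squared similarity over squared distance."""
    num = np.dot(w, x) ** 2           # squared dot product (similarity)
    den = np.sum((w - x) ** 2) + eps  # squared Euclidean distance (proximity)
    return num / den

w = np.array([1.0, 0.0])
aligned_near = np.array([1.0, 0.0])   # same direction, zero distance
aligned_far = np.array([10.0, 0.0])   # same direction, far away

print(yat_product(w, aligned_near))   # large: similar AND close
print(yat_product(w, aligned_far))    # damped by the squared distance
```

A correlation-only neuron would prefer `aligned_far` (larger dot product); the Yat-Product reverses that preference because the denominator grows with the distance between `w` and `x`.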
The same principle applied to local patches:

$$ ⵟ(\mathbf{W}, \mathbf{X}) = \frac{\langle\mathbf{W}, \mathbf{X}\rangle^2}{\|\mathbf{W} - \mathbf{X}\|^2 + \epsilon} $$

where **W** is the kernel and **X** is the input patch.
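A naive NumPy sketch of this patch-wise operation (single input channel, single filter, "valid" padding; illustrative only — the library's convolutions are vectorized and multi-channel):

```python
import numpy as np

def yat_conv2d_single(X, W, eps=1e-5):
    """'Valid' 2D Yat-convolution for one channel and one filter.

    Each sliding patch P contributes <W, P>^2 / (||W - P||^2 + eps).
    """
    kh, kw = W.shape
    oh, ow = X.shape[0] - kh + 1, X.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            P = X[i:i + kh, j:j + kw]
            num = np.sum(W * P) ** 2          # squared correlation with the kernel
            den = np.sum((W - P) ** 2) + eps  # squared distance to the kernel
            out[i, j] = num / den
    return out

X = np.arange(16.0).reshape(4, 4)
W = np.ones((2, 2))
out = yat_conv2d_single(X, W)
print(out.shape)  # (3, 3)
```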
```bash
pip install nmn

# Framework-specific installations
pip install "nmn[torch]"   # PyTorch
pip install "nmn[keras]"   # Keras/TensorFlow
pip install "nmn[nnx]"     # Flax NNX (JAX)
pip install "nmn[linen]"   # Flax Linen (JAX)
pip install "nmn[all]"     # Everything
```
**PyTorch**

```python
import torch
from nmn.torch import YatNMN

layer = YatNMN(
    in_features=128,
    out_features=64,
    epsilon=1e-5
)
x = torch.randn(32, 128)
y = layer(x)  # (32, 64) — non-linear output!
```

**Keras**

```python
import keras
from nmn.keras import YatNMN

layer = YatNMN(
    features=64,
    epsilon=1e-5
)
x = keras.ops.zeros((32, 128))
y = layer(x)  # (32, 64)
```

**Flax NNX**

```python
import jax.numpy as jnp
from flax import nnx
from nmn.nnx import YatNMN

layer = YatNMN(
    in_features=128,
    out_features=64,
    rngs=nnx.Rngs(0)
)
x = jnp.zeros((32, 128))
y = layer(x)  # (32, 64)
```

**TensorFlow**

```python
import tensorflow as tf
from nmn.tf import YatNMN

layer = YatNMN(features=64)
x = tf.zeros((32, 128))
y = layer(x)  # (32, 64)
```
All layers are available across all 5 frameworks with verified numerical equivalence.
| Layer | PyTorch | TensorFlow | Keras | Flax NNX | Flax Linen |
|---|---|---|---|---|---|
| YatNMN (Dense) | ✅ | ✅ | ✅ | ✅ | ✅ |
| YatConv1D | ✅ | ✅ | ✅ | ✅ | ✅ |
| YatConv2D | ✅ | ✅ | ✅ | ✅ | ✅ |
| YatConv3D | ✅ | ✅ | ✅ | ✅ | ✅ |
| YatConvTranspose1D | ✅ | ✅ | ✅ | ✅ | ✅ |
| YatConvTranspose2D | ✅ | ✅ | ✅ | ✅ | ✅ |
| YatConvTranspose3D | ✅ | ✅ | ✅ | ✅ | ✅ |
| MultiHeadAttention | ✅ | ✅ | ✅ | ✅ | ✅ |
| YatEmbed | ✅ | ✅ | ✅ | ✅ | ✅ |
| Squashers | ✅ | ✅ | ✅ | ✅ | ✅ |
| Variant | Description | Complexity |
|---|---|---|
| RotaryYatAttention | YAT + Rotary Position Embeddings (RoPE) | O(n²) |
| Spherical YAT-Performer | YAT + FAVOR+ random features | O(n) |
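To see the idea behind Yat attention scoring, here is a hedged NumPy sketch that replaces the usual scaled dot-product `q·k` score with a pairwise Yat-Product (an illustration of the concept, not the library's attention code; softmax/normalization and multi-head plumbing are omitted):

```python
import numpy as np

def yat_scores(Q, K, eps=1e-5):
    """Pairwise Yat-Product between query rows and key rows."""
    dots = Q @ K.T                                  # <q_i, k_j>
    sq_dist = (np.sum(Q ** 2, axis=1)[:, None]
               - 2 * dots
               + np.sum(K ** 2, axis=1)[None, :])   # ||q_i - k_j||^2
    return dots ** 2 / (sq_dist + eps)

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 queries, dim 8
K = rng.normal(size=(5, 8))   # 5 keys, dim 8
S = yat_scores(Q, K)
print(S.shape)  # (4, 5)
```

Every score is non-negative and grows only when a query is both aligned with and close to a key, mirroring the dense Yat-Product above.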
All implementations are verified to produce numerically equivalent outputs given identical inputs and weights:
```text
┌─────────────────────────────────────────────────────────────┐
│                Cross-Framework Consistency Test             │
├─────────────────────────────────────────────────────────────┤
│ Framework Pair           │ Max Error    │ Status            │
├──────────────────────────┼──────────────┼───────────────────┤
│ PyTorch ↔ TensorFlow     │ < 1e-6       │ ✅ PASS           │
│ PyTorch ↔ Keras          │ < 1e-6       │ ✅ PASS           │
│ PyTorch ↔ Flax NNX       │ < 1e-6       │ ✅ PASS           │
│ PyTorch ↔ Flax Linen     │ < 1e-6       │ ✅ PASS           │
│ TensorFlow ↔ Keras       │ < 1e-7       │ ✅ PASS           │
│ Flax NNX ↔ Flax Linen    │ < 1e-7       │ ✅ PASS           │
└──────────────────────────┴──────────────┴───────────────────┘
```
```python
# PyTorch
from nmn.torch import MultiHeadYatAttention

attn = MultiHeadYatAttention(embed_dim=512, num_heads=8)
output = attn(query, key, value)

# Flax NNX — with Rotary Position Embeddings
from flax import nnx
from nmn.nnx import RotaryYatAttention

attn = RotaryYatAttention(
    num_heads=8,
    in_features=512,
    rngs=nnx.Rngs(0)
)
output = attn(x)

# Flax NNX — Spherical YAT-Performer (O(n) linear complexity)
from nmn.nnx import MultiHeadAttention

attn = MultiHeadAttention(
    num_heads=8,
    in_features=512,
    use_performer=True,
    rngs=nnx.Rngs(0)
)
output = attn(x)
```

```python
# PyTorch
from nmn.torch import YatEmbed

embed = YatEmbed(num_embeddings=10000, embedding_dim=128)
output = embed(token_ids)

# Flax NNX
from flax import nnx
from nmn.nnx import Embed

embed = Embed(
    num_embeddings=10000,
    features=128,
    constant_alpha=True,
    rngs=nnx.Rngs(0)
)
output = embed(token_ids)

# YAT attend for attention-based retrieval
scores = embed.attend(query)
```

Alternatives to standard activation functions, available in all frameworks:
```python
from nmn.nnx import softermax, softer_sigmoid, soft_tanh

y1 = softermax(x, n=2)               # Smoother softmax with power n
y2 = softer_sigmoid(x, sharpness=1)  # Smooth sigmoid variant
y3 = soft_tanh(x)                    # Smooth tanh variant
```

See EXAMPLES.md for comprehensive usage guides, including:
- Framework-specific quick starts (PyTorch, Keras, TensorFlow, Flax)
- Architecture examples (CNN, Transformer)
- Advanced features (custom squashers, attention)
Quick run:
```bash
# PyTorch Examples
python src/nmn/torch/examples/quick_example.py              # Quick demo
python src/nmn/torch/examples/vision/resnet_training.py     # ResNet training

# Flax NNX Examples
python src/nmn/nnx/examples/vision/aether_resnet50_tpu.py   # ResNet50 on TPU
python src/nmn/nnx/examples/language/m3za.py                # MiniBERT pre-training
python src/nmn/nnx/examples/language/m3za_perf.py           # Performance evaluation
```

Comprehensive test suite with cross-framework validation:
```bash
# Install test dependencies
pip install "nmn[test]"

# Run all tests
pytest tests/ -v

# Run specific framework tests
pytest tests/test_torch/ -v   # PyTorch
pytest tests/test_keras/ -v   # Keras
pytest tests/test_nnx/ -v     # Flax NNX

# Cross-framework consistency validation
pytest tests/integration/test_cross_framework_consistency.py -v

# With coverage report
pytest tests/ --cov=nmn --cov-report=html
```

Based on the research papers:
Deep Learning 2.0: Artificial Neurons that Matter — Reject Correlation, Embrace Orthogonality
Deep Learning 2.1: Mind and Cosmos — Towards Cosmos-Inspired Interpretable Neural Networks
Traditional neurons compute:

$$ y = \sigma(\langle\mathbf{w}, \mathbf{x}\rangle + b) $$
This has limitations:
- Correlation-based: Only measures alignment, ignores proximity
- Requires activation: Non-linearity is external
- Spurious activations: Can fire strongly for distant but aligned vectors
The Yat-Product addresses these by combining:
- Squared dot product (similarity) in the numerator
- Squared distance (proximity) in the denominator
- Epsilon for numerical stability
The result is a neuron that responds geometrically — activated when inputs are both similar AND close to weights.
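These ingredients can be verified in a few lines of NumPy (`yat` below is a local illustrative helper, not the library API): at an exact match the distance term vanishes and epsilon caps the output, while a distant-but-aligned input — which a plain dot product would score 10× higher — is damped by the squared distance.

```python
import numpy as np

def yat(w, x, eps=1e-5):
    return np.dot(w, x) ** 2 / (np.sum((w - x) ** 2) + eps)

w = np.array([3.0, 4.0])   # ||w||^2 = 25

# Exact match: denominator reduces to eps, so the peak is ||w||^4 / eps
peak = yat(w, w)
print(np.isclose(peak, 25.0 ** 2 / 1e-5))  # True

# Distant but perfectly aligned: squared distance in the denominator
# suppresses the response that a correlation-only neuron would amplify
far = yat(w, 10 * w)
print(far < peak)  # True
```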
We welcome contributions! See CONTRIBUTING.md for guidelines.
```bash
# Development setup
git clone https://github.com/azettaai/nmn.git
cd nmn
pip install -e ".[dev,test]"

# Run tests
pytest tests/ -v

# Format code
black src/ tests/
isort src/ tests/
```

Areas for contribution:
- 🐛 Bug fixes (open issues)
- ✨ New layer types (normalization, graph, etc.)
- 📚 Documentation and tutorials
- ⚡ Performance optimizations
- 🎨 Example applications
| Parameter | Type | Description |
|---|---|---|
| `in_features` | int | Input dimension (Dense) or channels (Conv) |
| `out_features` | int | Output dimension or filters |
| `kernel_size` | int \| tuple | Convolution kernel size |
| `epsilon` | float | Numerical stability constant (default: 1e-5) |
| `use_bias` | bool | Include bias term (default: True) |
| `constant_alpha` | bool | Use fixed √2 scaling (default: varies) |
| `spherical` | bool | Enable spherical mode (default: False) |
```python
# PyTorch
from nmn.torch import YatNMN, YatConv2D, MultiHeadYatAttention, YatEmbed
from nmn.torch import softermax, softer_sigmoid, soft_tanh

# Keras
from nmn.keras import YatNMN, YatConv2D, MultiHeadYatAttention, YatEmbed
from nmn.keras import softermax, softer_sigmoid, soft_tanh

# TensorFlow
from nmn.tf import YatNMN, YatConv2D, MultiHeadYatAttention, YatEmbed
from nmn.tf import softermax, softer_sigmoid, soft_tanh

# Flax NNX (includes advanced attention variants)
from nmn.nnx import YatNMN, YatConv, MultiHeadAttention, Embed
from nmn.nnx import RotaryYatAttention, softermax

# Flax Linen
from nmn.linen import YatNMN, YatConv2D, MultiHeadAttention, YatEmbed
from nmn.linen import softermax, softer_sigmoid, soft_tanh
```

📋 Full reference → EXAMPLES.md
If you use NMN in your research, please cite:
```bibtex
@software{nmn2024,
  author = {Bouhsine, Taha},
  title  = {NMN: Neural Matter Networks},
  year   = {2024},
  url    = {https://github.com/azettaai/nmn}
}

@article{bouhsine2024dl2,
  author = {Bouhsine, Taha},
  title  = {Deep Learning 2.0: Artificial Neurons that Matter --- Reject Correlation, Embrace Orthogonality},
  year   = {2024}
}
```

- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
- 🌐 Company: azetta.ai
- 📧 Contact: taha@azetta.ai
AGPL-3.0 — Free for personal, academic, and commercial use with attribution.
If you modify and deploy on a network, you must share the source code.
For alternative licensing, contact us at taha@azetta.ai.
This project was originally developed under the mlnomadpy organization and is now maintained by Azetta.ai.
The foundations of NMN were established through extensive research and community contributions. We're grateful to everyone who has contributed code, feedback, and ideas to make this project better.
