sparse-layers

Structured sparse layers for building memory-efficient neural networks on PyTorch. Drop-in replacements for standard layers using butterfly factorization, SSE attention, and other sparse primitives.

Install

pip install sparse-layers

Usage

import torch
from sparse_layers import ButterflyLinear, ButterflyMLP

# Drop-in replacement for nn.Linear with O(n log n) parameters
layer = ButterflyLinear(in_features=256, out_features=256)
x = torch.randn(32, 256)
y = layer(x)

# MLP with butterfly-factorized linear layers
mlp = ButterflyMLP(in_features=256, hidden_features=512, out_features=256)
y = mlp(x)
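To see where the O(n log n) figure comes from, here is a back-of-the-envelope parameter count. It is a sketch that assumes the classic butterfly structure: log2(n) factors, each made of n/2 independent 2x2 blocks, for 4 · (n/2) · log2(n) = 2n·log2(n) weights. It ignores bias terms and assumes non-power-of-2 sizes are padded up to the next power of 2, in the spirit of `PaddedButterflyLinear`. The library's exact parameterization may differ, and the helper names are hypothetical.

```python
def next_power_of_2(n: int) -> int:
    """Smallest power of 2 that is >= n (n must be >= 1)."""
    return 1 << (n - 1).bit_length()


def dense_param_count(in_features: int, out_features: int) -> int:
    """Weights in a dense nn.Linear, bias excluded."""
    return in_features * out_features


def butterfly_param_count(n: int) -> int:
    """Weights in a square n-by-n butterfly matrix, bias excluded.

    Hypothetical model: n is padded up to a power of 2 (size), and the
    matrix is log2(size) factors of size // 2 independent 2x2 blocks each.
    """
    size = next_power_of_2(n)
    num_factors = size.bit_length() - 1  # exact log2 for a power of 2
    return num_factors * (size // 2) * 4


for n in (100, 256, 1024):
    dense = dense_param_count(n, n)
    bfly = butterfly_param_count(n)
    print(f"n={n:>5}  dense={dense:>8}  butterfly={bfly:>6}  ratio={dense / bfly:.1f}x")
```

Under these assumptions, the 256-by-256 layer from the example stores 4,096 weights instead of 65,536, a 16x reduction. The gap widens with n because the ratio n / (2·log2(n)) keeps growing.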

Attention

from sparse_layers import ButterflyMultiHeadAttention

# Multi-head attention with butterfly-factorized Q/K/V projections
bf_attn = ButterflyMultiHeadAttention(d_model=256, num_heads=8)

seq = torch.randn(32, 128, 256)  # (batch, seq_len, d_model)
out = bf_attn(seq, seq, seq)
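The example passes the same tensor as query, key, and value (self-attention). The sketch below traces the shapes with textbook multi-head scaled dot-product attention in plain Python. d_model is split into num_heads slices of d_model // num_heads. Each head weights the values by softmax(q·k / √d_head), and the head outputs are concatenated. It deliberately omits the learned Q/K/V projections, which are the parts `ButterflyMultiHeadAttention` replaces with butterfly-factorized layers. The function names are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_heads(vec, num_heads):
    """Split one d_model-sized vector into num_heads chunks of d_model // num_heads."""
    d_model = len(vec)
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    return [vec[h * d_head:(h + 1) * d_head] for h in range(num_heads)]


def attention_head(q_seq, k_seq, v_seq):
    """One head: each query takes a softmax(q . k / sqrt(d_head))-weighted sum of values."""
    d_head = len(q_seq[0])
    out = []
    for q in q_seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_head) for k in k_seq]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, v_seq)) for i in range(d_head)])
    return out


def multi_head_attention(q_seq, k_seq, v_seq, num_heads):
    """Run each head on its own slice, then concatenate head outputs back to d_model."""
    q_heads = [split_heads(q, num_heads) for q in q_seq]  # [seq][head][d_head]
    k_heads = [split_heads(k, num_heads) for k in k_seq]
    v_heads = [split_heads(v, num_heads) for v in v_seq]
    per_head = [
        attention_head([q[h] for q in q_heads], [k[h] for k in k_heads], [v[h] for v in v_heads])
        for h in range(num_heads)
    ]  # [head][seq][d_head]
    return [sum((per_head[h][t] for h in range(num_heads)), []) for t in range(len(q_seq))]
```

With the README's d_model=256 and num_heads=8, each head attends over 32-dimensional slices. A d_model that is not divisible by num_heads is rejected in this sketch.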

SSE Attention

State Space Exploration modules for efficient sequence modeling with sparse attention patterns.

from sparse_layers.modules import SSEAttention, SSEAttentionConfig

config = SSEAttentionConfig(d_model=256, num_partitions=4)
sse = SSEAttention(config)

x = torch.randn(32, 128, 256)
out = sse(x)
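The partitioning idea can be illustrated with a toy example. This is not the library's algorithm; it assumes one plausible scheme that mirrors the roles of `SSEPartitionSelector` and `SSESparseSoftmax`. The key sequence is cut into `num_partitions` contiguous chunks, and each chunk is summarized by its mean key. Each query keeps only its `top_k` highest-scoring chunks, and softmax runs over the keys of those chunks alone. The names `partitioned_attention` and `top_k` are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def partitioned_attention(queries, keys, values, num_partitions, top_k):
    """Toy partition-sparse attention (hypothetical scheme, not the library's).

    1. Split the key/value sequence into num_partitions contiguous chunks.
    2. Score each chunk for a query by q . mean(chunk keys).
    3. Keep the top_k chunks and softmax only over their keys.
    """
    seq_len = len(keys)
    if seq_len % num_partitions != 0:
        raise ValueError("seq_len must be divisible by num_partitions in this sketch")
    size = seq_len // num_partitions
    chunks = [range(p * size, (p + 1) * size) for p in range(num_partitions)]
    summaries = [mean_vector([keys[i] for i in chunk]) for chunk in chunks]
    scale = 1.0 / math.sqrt(len(queries[0]))

    outputs, selections = [], []
    for q in queries:
        # Partition selection: rank chunks by how well their summary matches q.
        ranked = sorted(range(num_partitions), key=lambda p: dot(q, summaries[p]), reverse=True)
        chosen = sorted(ranked[:top_k])
        selections.append(chosen)
        # Sparse softmax: keys outside the chosen chunks get zero weight.
        idx = [i for p in chosen for i in chunks[p]]
        scores = [dot(q, keys[i]) * scale for i in idx]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        outputs.append([sum(e * values[i][d] for e, i in zip(exps, idx)) / total
                        for d in range(len(values[0]))])
    return outputs, selections
```

Setting `top_k` equal to `num_partitions` recovers ordinary dense scaled dot-product attention, which makes a convenient sanity check.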

Architecture

The package is split into three layers (ops, modules, models), following the layout of the Flash-Attention repository:

Ops (`sparse_layers.ops`)

Primitive operations and utility functions.

Module	Description
`butterfly`	Butterfly factor multiply, power-of-2 utilities
`SSEMaskingOps`	Masking utilities for variable-length SSE
`SSEVarlenOps`	Variable-length sequence operations
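A butterfly factor mixes pairs of entries that sit a fixed stride apart. Chaining log2(n) factors with strides 1, 2, 4, ..., n/2 connects every input to every output using 2n multiplies per factor, O(n log n) in total. The pure-Python example below illustrates that pattern. The function names and the twiddle layout (one 2x2 block per pair) are assumptions, not the actual signature of the `butterfly` op. With every block set to [[1, 1], [1, -1]], the product is the unnormalized Walsh-Hadamard transform, which gives an easy correctness check.

```python
def is_power_of_2(n: int) -> bool:
    """True for 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0


def butterfly_factor_multiply(x, blocks, stride):
    """Apply one butterfly factor to vector x.

    Within each group of 2 * stride entries, entry i is paired with i + stride
    and the pair is mixed by its own 2x2 block ((a, b), (c, d)).
    blocks holds n // 2 such blocks, one per pair.
    """
    n = len(x)
    if len(blocks) != n // 2:
        raise ValueError("expected one 2x2 block per pair")
    out = [0.0] * n
    k = 0
    for start in range(0, n, 2 * stride):
        for i in range(start, start + stride):
            j = i + stride
            (a, b), (c, d) = blocks[k]
            out[i] = a * x[i] + b * x[j]
            out[j] = c * x[i] + d * x[j]
            k += 1
    return out


def butterfly_multiply(x, factors):
    """Apply log2(n) butterfly factors with strides 1, 2, 4, ..., n // 2."""
    n = len(x)
    if not is_power_of_2(n) or len(factors) != n.bit_length() - 1:
        raise ValueError("need a power-of-2 length and log2(n) factors")
    stride = 1
    for blocks in factors:
        x = butterfly_factor_multiply(x, blocks, stride)
        stride *= 2
    return x


# Usage: Walsh-Hadamard twiddles for n = 4 (two factors, two blocks each).
hadamard = [[((1, 1), (1, -1))] * 2 for _ in range(2)]
print(butterfly_multiply([1, 2, 3, 4], hadamard))
```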

Modules (`sparse_layers.modules`)

Composable building blocks — single units of computation.

Module	Description
`ButterflyLinear`	Linear layer using butterfly matrix factorization — O(n log n) parameters instead of O(n²)
`PaddedButterflyLinear`	ButterflyLinear with automatic padding for non-power-of-2 dimensions
`SSEAttention`	Sparse attention with state-space-inspired partitioning
`SSEAttentionAdaptive`	SSE with adaptive implementation selection (naive/batched)
`SSEMultiPartitionState`	Manages partition states across sequence chunks
`SSEPartitionSelector`	Selects active partitions per query position
`SSESparseSoftmax`	Sparse softmax over selected partitions
`LinearAttention`	Linear attention baseline (O(n) complexity in sequence length)
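The O(n) cost of linear attention comes from reordering sums. This minimal example assumes the common non-causal kernelized form with feature map φ(x) = elu(x) + 1; the feature map `LinearAttention` actually uses is not documented here. Replacing softmax(q·k) with φ(q)·φ(k) lets S = Σ_j φ(k_j) v_jᵀ and z = Σ_j φ(k_j) be built in one pass over the sequence. Each query then costs O(d_k · d_v) regardless of length. A quadratic reference is included to check that both orderings agree.

```python
import math


def elu_plus_one(vec):
    """Feature map phi(x) = elu(x) + 1, which keeps every entry positive."""
    return [x + 1.0 if x > 0 else math.exp(x) for x in vec]


def linear_attention(queries, keys, values):
    """Non-causal linear attention in O(n * d_k * d_v) time."""
    d_k, d_v = len(keys[0]), len(values[0])
    # One pass over the sequence builds S (d_k x d_v) and z (d_k).
    S = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    for k, v in zip(keys, values):
        fk = elu_plus_one(k)
        for a in range(d_k):
            z[a] += fk[a]
            for b in range(d_v):
                S[a][b] += fk[a] * v[b]
    # Each query reads the fixed-size summaries instead of all n keys.
    out = []
    for q in queries:
        fq = elu_plus_one(q)
        denom = sum(fq[a] * z[a] for a in range(d_k))
        out.append([sum(fq[a] * S[a][b] for a in range(d_k)) / denom for b in range(d_v)])
    return out


def quadratic_reference(queries, keys, values):
    """Same result computed the O(n^2) way, for checking."""
    out = []
    for q in queries:
        fq = elu_plus_one(q)
        w = [sum(x * y for x, y in zip(fq, elu_plus_one(k))) for k in keys]
        total = sum(w)
        out.append([sum(wi * v[b] for wi, v in zip(w, values)) / total
                    for b in range(len(values[0]))])
    return out
```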

Models (`sparse_layers.models`)

Composed architectures built from modules.

Module	Description
`ButterflyMLP`	Two-layer MLP with butterfly-factorized linear layers
`ButterflyMultiHeadAttention`	Multi-head attention with butterfly-factorized Q/K/V projections
`SSEMultiHeadAttention`	Multi-head variant of SSE attention

The package also includes baseline implementations (`SimpleMLP`, `CustomMLP`, `CustomLinear`, `MultiHeadAttention`) used internally for validation and testing.

License

MIT
