This repository contains a clean, modular, and fully-interpretable implementation of Nested Learning (NL) and the HOPE architecture introduced by Google DeepMind. It includes standalone implementations of:
- CMS (Contextual Multi-Scale Memory)
- Nested Optimizers (GDMemory, MomentumMemory, AdamMemory, DMGD, PreconditionedMomentum)
- Self-Modifying MLP (rank-1 & rank-k)
- HOPE Block (CMS + Self-Modifying MLP + Linear Attention Fast Memory)
- Full HOPE architecture assembly
- Example training on Tiny Shakespeare
This repo aims to make the original research easy to understand and easy to build on.
✔ Modular PyTorch implementation of all major components of Nested Learning
✔ Clean code broken into separate notebooks
✔ Implementations match Google's paper structure
✔ Self-Modifying MLP with low-rank ΔW updates
✔ CMS with parallel multi-timescale memories
✔ Linear Attention with fast KV memory updates
✔ Comparison of different nested optimizers
✔ Ready for custom tasks and experiments
Nested-Learning/
│
├── HOPE-implementation.ipynb               # Full HOPE block + assembly
├── cms-implementation.ipynb                # CMS multi-level memory
├── nested-optimizer-implementations.ipynb  # GD, Momentum, Adam, DMGD, PCM
├── self-modifying-mlp.ipynb                # Rank-1 & Rank-k ΔW weight updates
├── tiny-shakespear.ipynb                   # Example training on Tiny Shakespeare
│
├── NL.pdf                                  # Original Google research paper
└── NL-Handwritten-Notes.pdf                # Handwritten notes for mathematical and theoretical reference
Each component is designed to be independently testable and can be imported into larger models.
Nested Learning introduces a new way for models to:
- learn multi-timescale memory
- perform context-dependent fast learning inside the forward pass
- update their own weights on the fly using low-rank modifications
- combine slow learning (SGD) + fast learning (inner-loop adaptation)
The HOPE architecture is the first fully-scalable implementation of these ideas.
This repo re-creates the core components in a simplified but faithful manner.
File: nested-optimizer-implementations.ipynb
- GDMemory
- MomentumMemory
- AdamMemory
- DeepMomentumMemory (DMGD)
- PreconditionedMomentumMemory
These treat optimizer state as differentiable memory.
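For intuition, here is a minimal sketch of momentum treated as differentiable memory; the class name `MomentumMemorySketch` and its interface are illustrative assumptions, not the notebook's API:

```python
import torch

class MomentumMemorySketch:
    """Momentum kept as a plain tensor so the optimizer's state is itself a
    memory that an outer computation graph can differentiate through."""

    def __init__(self, lr=1e-2, beta=0.9):
        self.lr, self.beta = lr, beta
        self.m = None  # momentum buffer = the "memory"

    def step(self, w, grad):
        if self.m is None:
            self.m = torch.zeros_like(grad)
        # No in-place ops, so gradients can flow through the update rule
        # when `grad` is produced with create_graph=True.
        self.m = self.beta * self.m + grad
        return w - self.lr * self.m
```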
File: cms-implementation.ipynb
A stack of memory levels updated at different speeds, from fast to slow.
Outputs the aggregated multi-scale context.
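A minimal sketch of this idea, assuming power-of-two update intervals and sum aggregation (both illustrative choices, not necessarily the notebook's):

```python
import torch
import torch.nn as nn

class CMSSketch(nn.Module):
    """Multi-level memory: each level refreshes at its own interval, and the
    per-level outputs are aggregated into one multi-scale context."""

    def __init__(self, dim, levels=3):
        super().__init__()
        self.levels = nn.ModuleList(nn.Linear(dim, dim) for _ in range(levels))
        self.update_every = [2 ** i for i in range(levels)]  # e.g. 1, 2, 4 steps
        self.state = [None] * levels                         # last output per level

    def forward(self, x, step):
        outs = []
        for i, level in enumerate(self.levels):
            if self.state[i] is None or step % self.update_every[i] == 0:
                self.state[i] = level(x)      # fast levels rewrite often
            outs.append(self.state[i])        # slow levels keep older context
        return torch.stack(outs).sum(dim=0)   # aggregated multi-scale context
```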
File: self-modifying-mlp.ipynb
The model predicts a low-rank update ΔW to its own weights and applies it on the fly (sketched below).
Implemented in:
- Rank-1
- Rank-k (paper-accurate)
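A rough sketch of the rank-k variant; the projection heads, batch pooling, and 1/k scaling are assumptions made for illustration:

```python
import torch
import torch.nn as nn

class SelfModifyingLinearSketch(nn.Module):
    """The layer predicts low-rank factors U, V from its own input and uses
    W_eff = W + U V^T for this forward pass (a transient rank-k self-edit)."""

    def __init__(self, dim, rank=4):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.to_u = nn.Linear(dim, dim * rank)
        self.to_v = nn.Linear(dim, dim * rank)
        self.rank = rank

    def forward(self, x):                            # x: (batch, dim)
        d, r = self.base.out_features, self.rank
        pooled = x.mean(dim=0)                       # crude context summary
        u = self.to_u(pooled).view(d, r)             # context-dependent factors
        v = self.to_v(pooled).view(d, r)
        delta_w = (u @ v.t()) / r                    # low-rank ΔW, shape (d, d)
        return x @ (self.base.weight + delta_w).t() + self.base.bias
```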
Part of HOPE-implementation.ipynb
A fast KV memory updated with an associative outer-product write at each step.
Used as a long-term associative memory.
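A sketch of the standard outer-product recurrence behind such a memory (feature maps and normalisation are omitted here; this is not the notebook's exact update):

```python
import torch

def linear_attention_step(q, k, v, S=None):
    """One step of an outer-product fast memory.
    q, k, v: (batch, dim); S: (batch, dim, dim) associative state."""
    if S is None:
        S = q.new_zeros(q.size(0), q.size(1), v.size(1))
    S = S + k.unsqueeze(-1) * v.unsqueeze(1)   # write: S <- S + outer(k, v)
    out = torch.einsum('bd,bde->be', q, S)     # read: query the accumulated memory
    return out, S
```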
File: HOPE-implementation.ipynb
Combines:
- CMS
- Linear Attention
- Self-Modifying MLP
- FFN + LayerNorm
- Memory dictionary (`cms`, `KV`) handling
This is the main unit used to build Nested Learning models.
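The sketch below illustrates one plausible wiring of these pieces in the forward pass; the sub-module interfaces (each returning `(output, updated_state)`) and the residual placement are assumptions, with HOPE-implementation.ipynb as the reference:

```python
import torch.nn as nn

class HOPEBlockSketch(nn.Module):
    """Plausible composition of the block; cms, attn, and self_mod_mlp are
    injected modules (e.g. the sketches above)."""

    def __init__(self, dim, cms, attn, self_mod_mlp):
        super().__init__()
        self.cms, self.attn, self.self_mod_mlp = cms, attn, self_mod_mlp
        self.norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x, memories=None):
        memories = memories or {"cms": None, "kv": None}
        ctx, memories["cms"] = self.cms(self.norm(x), memories["cms"])   # multi-scale context
        fast, memories["kv"] = self.attn(self.norm(x), memories["kv"])   # fast KV memory
        x = x + ctx + fast
        x = x + self.self_mod_mlp(self.norm(x))                          # low-rank ΔW path
        x = x + self.ffn(self.norm(x))
        return x, memories
```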
File: tiny-shakespear.ipynb
This notebook:
- Runs toy sequence modeling
- Shows how memories evolve over time
- Demonstrates the HOPE block in real inference
Install dependencies:

```bash
pip install torch numpy
```

Optional (Jupyter):

```bash
pip install notebook jupyterlab
```

Quick start:

```python
import torch
from hope_block import HOPEBlock

block = HOPEBlock(dim=64, cms_levels=3, rank=4)

x = torch.randn(8, 64)
memories = None
out, new_memories = block(x, memories)
```

Planned next steps:
- Add full training loop for HOPE-Transformer
- Add memory visualizations
- Add benchmark on character-level tasks
- Add support for multi-head CMS
- Add GPT-style stacked HOPE layers
Pull requests are welcome. If you extend the architecture (multi-head CMS, recurrent HOPE, etc.), feel free to submit!
MIT License
- Nested Learning: Scaling Learning with Nested Architectures (Google, 2024–2025)
- Original paper: included as `NL.pdf`