Nautilus is a learn-by-building reinforcement learning (RL) codebase. It’s designed to teach modern RL algorithms through clean, reproducible engineering — starting simple (bandits and tabular methods), then scaling to deep algorithms like DQN and PPO.
The goal isn’t just to run agents — it’s to understand why they work and to build the tooling right.
Set up with conda:

```bash
# Create and activate a new environment
conda create -n nautilus python=3.11
conda activate nautilus

# Optional: install PyTorch with CUDA if available
# (choose the correct CUDA build from https://pytorch.org/get-started/locally/)
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

# Then install the repo and dev tools
pip install -e ".[dev]"
pre-commit install
```
Alternatively, use a plain virtual environment instead of conda:

```bash
# Create and activate the environment
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# Install dependencies and dev tools
pip install -U pip
pip install -e ".[dev]"
pre-commit install
```
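Either way, a quick sanity check confirms the environment is wired up (this snippet assumes the optional PyTorch install above; the CUDA check simply reports `False` on CPU-only machines):

```python
# Sanity check: verify the interpreter sees PyTorch and whether CUDA is usable.
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```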
Run your first agent:

```bash
python scripts/train_dqn.py --env CartPole-v1
```

See progress with TensorBoard:

```bash
make tb
```

Logs, configs, and checkpoints are stored under:

```
runs/{algo}/{env}/{YYYYmmdd-HHMMSS}/
```
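Because run directories are timestamped, the most recent run is easy to locate programmatically. A minimal sketch (the `latest_run` helper is hypothetical, not part of the repo):

```python
# Hypothetical helper: find the newest run directory for an algo/env pair,
# relying on the runs/{algo}/{env}/{YYYYmmdd-HHMMSS}/ layout described above.
from pathlib import Path

def latest_run(algo: str, env: str, root: str = "runs") -> Path | None:
    candidates = sorted(Path(root, algo, env).glob("*"))  # timestamps sort lexicographically
    return candidates[-1] if candidates else None

print(latest_run("dqn", "CartPole-v1"))
```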
Project layout:

```
nautilus/
  core/        # buffers, networks, samplers, advantages
  algos/       # implementations (dqn/, ppo/, tabular/)
  envs/        # gym + dm-control wrappers
  utils/       # logging, seeding, config, checkpointing
  runners/     # train loops and CLI entrypoints
  configs/     # YAML configs per algorithm/env
  tests/       # pytest suites
  scripts/     # runnable scripts (train_dqn.py, train_ppo.py)
  notebooks/   # learning notebooks & experiments
```
| Stage | Concepts | Implementation Targets |
|---|---|---|
| M1 – Foundations | MDPs, returns, buffers, exploration | `utils/`, `buffers/`, `samplers/`, basic train loop |
| M2 – Bandits | ε-greedy, UCB, regret | `algos/bandits/` |
| M3 – Tabular Q-learning | DP vs. TD, off-policy updates | `algos/tabular/q_learning.py` |
| M4 – Deep Q-Network (DQN) | replay buffer, target net, ε-schedule | `algos/dqn/agent.py`, Atari wrappers |
| M5 – Policy Gradients → PPO | REINFORCE, GAE(λ) (sketched below), clipping, entropy bonus | `algos/ppo/agent.py` |
| M6 – Extras | prioritized replay, n-step returns, distributed eval | `envs/`, `utils/`, `runners/` |
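To give a flavor of these concepts, here is a standalone sketch of ε-greedy action selection with a linear ε-schedule (the M2 exploration rule and the M4 schedule). It uses plain NumPy and hypothetical function names; it is not the repo's implementation:

```python
# Standalone sketch: epsilon-greedy action selection with a linearly decaying schedule.
import numpy as np

def epsilon_by_step(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Linearly anneal epsilon from eps_start down to eps_end over decay_steps."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def select_action(q_values, step, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon_by_step(step):
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
print(select_action(np.array([0.1, 0.5, 0.2]), step=0, rng=rng))       # exploratory early on
print(select_action(np.array([0.1, 0.5, 0.2]), step=20_000, rng=rng))  # mostly greedy later
```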
Each milestone comes with:

- Concept notebook (`notebooks/`)
- Unit tests (`tests/`)
- Reproducible configs (`configs/`)
- TensorBoard plots (`runs/`)
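As a preview of the M5 row above, here is a standalone sketch of GAE(λ) advantage estimation. Again it is illustration only (plain NumPy, hypothetical names), not the repo's code:

```python
# Standalone sketch of Generalized Advantage Estimation, GAE(lambda).
import numpy as np

def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Backward pass: A_t = delta_t + gamma*lam*(1 - done_t)*A_{t+1},
    where delta_t = r_t + gamma*V(s_{t+1})*(1 - done_t) - V(s_t).
    `values` has one extra entry so V(s_T) can bootstrap the final step."""
    T = len(rewards)
    advantages = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        last = delta + gamma * lam * nonterminal * last
        advantages[t] = last
    return advantages

print(gae(rewards=[1.0, 1.0, 1.0], values=[0.5, 0.6, 0.7, 0.0], dones=[0, 0, 1]))
```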
Lint, format, and test:

```bash
make lint
make test
```

Run pre-commit hooks manually:

```bash
pre-commit run --all-files
```
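If you're wondering what the test suites aim for, here is a hedged, self-contained example in the same pytest style. The toy buffer is defined inline for illustration; it is not the repo's actual replay buffer:

```python
# Hypothetical pytest example: FIFO eviction and uniform sampling on a toy buffer.
import random
from collections import deque

def sample_batch(buffer, batch_size, rng):
    """Uniformly sample a batch of transitions from a bounded FIFO buffer."""
    return rng.sample(list(buffer), batch_size)

def test_fifo_eviction_and_sampling():
    buf = deque(maxlen=3)
    for t in range(5):
        buf.append(t)
    assert list(buf) == [2, 3, 4]           # oldest transitions were evicted
    batch = sample_batch(buf, 2, random.Random(0))
    assert all(x in buf for x in batch)     # samples come from the buffer
```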
All hyperparameters and environment settings live in `configs/`, e.g.:
```yaml
# configs/algos/dqn/cartpole.yaml
seed: 1
env: CartPole-v1
steps: 50000
batch_size: 64
gamma: 0.99
lr: 0.001
sync_interval: 500
```
CLI overrides work out of the box:

```bash
python scripts/train_dqn.py --env CartPole-v1 --steps 100000
```
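The mechanics behind such overrides are simple: load the YAML into a dict, then let any CLI flags that were actually passed replace matching keys. A rough sketch (PyYAML + argparse; hypothetical helper, not the repo's actual runner code):

```python
# Rough sketch of YAML config loading with CLI overrides; not the repo's runner.
import argparse
import yaml

def load_config(path, overrides):
    """Read a YAML config and apply non-None CLI overrides on top."""
    with open(path) as f:
        cfg = yaml.safe_load(f)
    cfg.update({k: v for k, v in overrides.items() if v is not None})
    return cfg

parser = argparse.ArgumentParser()
parser.add_argument("--config", default="configs/algos/dqn/cartpole.yaml")
parser.add_argument("--env", type=str)
parser.add_argument("--steps", type=int)
args = parser.parse_args()

cfg = load_config(args.config, {"env": args.env, "steps": args.steps})
print(cfg)
```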
These implementations are inspired by:
- Understanding Deep Learning — Simon Prince (Chapter 19)
- Sutton & Barto — Reinforcement Learning: An Introduction
- OpenAI Spinning Up and CleanRL
The idea is to re-implement, not copy, so each concept is fully understood and engineered cleanly.
Once DQN and PPO are solid, we’ll expand Nautilus to include:
- Distributional & Dueling DQN, Noisy Nets
- SAC / TD3 for continuous control
- Multi-agent RL experiments
- LLM-driven agentic policy optimization
- Mixed-precision + MPI training
Contributions, questions, and refactors are welcome. Open an issue or PR — especially for docs, configs, or new environments.
If you’re learning RL: fork the repo, add your own experiments, and share results!
MIT License © 2025 Alexander Braafladt