A competitive Riichi Mahjong AI trained via offline reinforcement learning (Conservative Q-Learning) on expert-level game records.
Architecture: 1D CNN with channel attention
- 192 channels, 40 residual blocks
- Based on the Mortal v4 observation encoder
- Auxiliary heads for multi-task learning
Training method: CQL (Conservative Q-Learning) with auxiliary supervision
- Primary: DQN action-value + CQL regularization + next-rank prediction
- Auxiliary heads: score prediction, rank prediction (4-player), score-gap prediction
- Curriculum: initial training on top ~250 players, then expanded to top ~750 players
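The primary objective can be illustrated for a single decision point. The pure-Python sketch below is not the code in `scripts/train_main.py`. It assumes the DQN term is a squared error against a return target (a Monte Carlo return when `gamma = 1.0`). For the CQL term it uses the standard discrete-action regularizer: log-sum-exp over legal actions minus the Q-value of the action taken in the game record. The next-rank term and batching are left out, and `cql_dqn_loss` is a hypothetical helper.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def cql_dqn_loss(q_values, action, target, min_q_weight=5.0):
    """DQN + CQL loss for one state (illustrative, not the repo's code).

    q_values: Q(s, a) for each legal action a
    action:   index of the action taken in the game record
    target:   return target for that action (assumed Monte Carlo, gamma = 1.0)
    """
    q_taken = q_values[action]
    # DQN term: regress the recorded action's value toward its target
    dqn_loss = 0.5 * (q_taken - target) ** 2
    # CQL term: lower the soft maximum over all actions, raise the recorded action
    cql_loss = logsumexp(q_values) - q_taken
    return dqn_loss + min_q_weight * cql_loss


print(cql_dqn_loss([0.2, -0.1, 0.5], action=2, target=0.4))
```

The CQL term is never negative and shrinks toward zero only when the recorded action dominates the others. A large `min_q_weight` (5.0 here) therefore keeps the values of actions experts rarely take from being overestimated, which is the main failure mode of Q-learning on a fixed dataset.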
Training data:
- ~1.38 million games from ~750 top-level 4-player Riichi Mahjong players
- East+South (半荘, hanchan) format, competitive lobby level

Hyperparameters:
| Parameter | Value |
|---|---|
| conv_channels | 192 |
| num_blocks | 40 |
| batch_size | 256 |
| lr_peak | 1e-4 |
| lr_final | 1e-5 |
| warmup_steps | 200 |
| weight_decay | 0.1 |
| max_grad_norm | 1.0 |
| gamma | 1.0 |
| min_q_weight (CQL) | 5.0 |
| next_rank_weight | 0.2 |
| score_weight | 1.0 |
| rank_weight | 0.5 |
| gap_weight | 0.3 |
| DDP | 2× GPU |
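One plausible reading of the loss weights above is a single weighted sum. The expression below assumes that combination (it is not confirmed against `train_main.py`) and writes the CQL term in its standard discrete-action form, with $\alpha$ = `min_q_weight` = 5.0, $w_{\mathrm{nr}}$ = 0.2, $w_{\mathrm{s}}$ = 1.0, $w_{\mathrm{r}}$ = 0.5 and $w_{\mathrm{g}}$ = 0.3:

```latex
\begin{aligned}
\mathcal{L} &= \mathcal{L}_{\mathrm{DQN}}
  + \alpha\,\mathcal{L}_{\mathrm{CQL}}
  + w_{\mathrm{nr}}\,\mathcal{L}_{\mathrm{next\text{-}rank}}
  + w_{\mathrm{s}}\,\mathcal{L}_{\mathrm{score}}
  + w_{\mathrm{r}}\,\mathcal{L}_{\mathrm{rank}}
  + w_{\mathrm{g}}\,\mathcal{L}_{\mathrm{gap}} \\
\mathcal{L}_{\mathrm{CQL}} &= \mathbb{E}_{s}\Big[\log \sum_{a} \exp Q(s,a) - Q(s, a_{\mathrm{data}})\Big]
\end{aligned}
```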
Evaluated over 4,000 games (random seating, East+South); a lower average rank is better:
| Model | Avg Rank | 1st % | 2nd % | 3rd % | 4th % |
|---|---|---|---|---|---|
| v4 (baseline) | 2.419 | 27.3 | 26.2 | 23.7 | 22.8 |
| c3 (ours) | 2.492 | 25.5 | 24.5 | 25.2 | 24.8 |
c3 does not yet match the Mortal v4 baseline: its average rank is 0.073 worse (2.492 vs. 2.419), with fewer 1st and 2nd places and more 3rd and 4th places. Training is ongoing.
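The average rank is the placement-weighted mean of the distribution, so it can be recomputed from the table as a sanity check. The sketch below does that and adds a rough standard error. It assumes `n = 4000` independent placements per model, which this README does not state: seats within one game are correlated, and a model may occupy more than one seat per game. Read the error bars as indicative only. From the rounded percentages it gives 2.420 and 2.493, which matches the table up to rounding. `rank_stats` is a hypothetical helper.

```python
import math


def rank_stats(placement_pct, n):
    """Mean rank and its standard error from 1st-4th place percentages.

    n is the number of placements behind the percentages; the standard
    error assumes they are independent (not verified for this evaluation).
    """
    p = [x / 100 for x in placement_pct]
    mean = sum(rank * pi for rank, pi in enumerate(p, start=1))
    second_moment = sum(rank ** 2 * pi for rank, pi in enumerate(p, start=1))
    var = max(second_moment - mean ** 2, 0.0)
    return mean, math.sqrt(var / n)


results = {
    "v4 (baseline)": [27.3, 26.2, 23.7, 22.8],
    "c3 (ours)": [25.5, 24.5, 25.2, 24.8],
}
for name, pct in results.items():
    mean, se = rank_stats(pct, n=4000)
    print(f"{name}: {mean:.3f} +/- {se:.3f}")
```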
```
weights/                     # Download from Releases
  model-c3-best.pth          # Main model checkpoint (132MB)
  grp-best.pth               # GRP (Game Result Predictor) network (2.2MB)
scripts/
  train_main.py              # Training script (CQL + auxiliary heads)
  dataloader.py              # Data loader (reads .mjson via libriichi)
  mortal_bot_server.py       # Inference bridge (connects model to game engine)
  verify_worker_http.sh      # Distributed verification worker
mahjong/                     # Game engine + AI controller (TypeScript)
  src/                       # Engine source (game logic, shanten, scoring, AI)
  scripts/verify.ts          # Verification orchestrator
  scripts/verify-worker.ts   # Per-worker game runner
  package.json
cf-verify/                   # Cloudflare Worker coordinator for distributed testing
```
Download from Releases:

```bash
mkdir -p weights
curl -L -o weights/model-c3-best.pth https://github.com/lynkas/c3/releases/download/v1.0/model-c3-best.pth
curl -L -o weights/grp-best.pth https://github.com/lynkas/c3/releases/download/v1.0/grp-best.pth
```

Load the checkpoint with Mortal's `Brain` encoder:

```python
import sys

import torch

# Make Mortal's model.py importable (adjust to your Mortal checkout)
sys.path.insert(0, "path/to/mortal/mortal")
from model import Brain

# Load checkpoint
ckpt = torch.load("weights/model-c3-best.pth", map_location="cpu")
model = Brain(version=4, conv_channels=192, num_blocks=40)
model.load_state_dict(ckpt["mortal"])
model.eval()
```

Training requires Mortal's libriichi (compiled), a Python venv with PyTorch, and game data in .mjson format.
```bash
# Single GPU
python scripts/train_main.py \
  --grp checkpoints/grp-best.pth \
  --train-glob "data/train/*.mjson" \
  --val-glob "data/val/*.mjson" \
  --save checkpoints/model.pth \
  --tensorboard runs/experiment \
  --device cuda \
  --conv-channels 192 --num-blocks 40 \
  --batch-size 256 \
  --lr-peak 1e-4 --lr-final 1e-5 \
  --warmup-steps 200 --max-steps 200000 \
  --save-every 400 --val-steps 50 --patience 100 \
  --weight-decay 0.1 --max-grad-norm 1.0 \
  --score-weight 1.0 --rank-weight 0.5 --gap-weight 0.3

# Multi-GPU (DDP)
CUDA_VISIBLE_DEVICES=0,1 torchrun --standalone --nproc_per_node=2 scripts/train_main.py \
  [same args as above]
```

Key arguments:
| Argument | Description |
|---|---|
| `--grp` | Path to GRP (Game Result Predictor) checkpoint |
| `--train-glob` | Glob pattern for training data files |
| `--val-glob` | Glob pattern for validation data files |
| `--save` | Output checkpoint path (also used for resume) |
| `--patience` | Early stopping patience (in validation cycles) |
| `--score-weight` | Weight for score prediction auxiliary loss |
| `--rank-weight` | Weight for rank prediction auxiliary loss |
| `--gap-weight` | Weight for gap prediction auxiliary loss |
To resume training, point `--save` at an existing checkpoint.
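`--lr-peak`, `--lr-final`, `--warmup-steps` and `--max-steps` in the command above describe a warmup-then-decay schedule. The decay shape is not documented here. The sketch below assumes linear warmup followed by cosine annealing to `lr_final` at `max_steps`, a common choice, so treat it as illustrative rather than as the schedule `train_main.py` implements. `lr_at` is a hypothetical helper.

```python
import math


def lr_at(step, lr_peak=1e-4, lr_final=1e-5, warmup_steps=200, max_steps=200_000):
    """Assumed schedule: linear warmup to lr_peak, then cosine decay to lr_final."""
    if step < warmup_steps:
        # Linear ramp that reaches lr_peak on the last warmup step
        return lr_peak * (step + 1) / warmup_steps
    if step >= max_steps:
        return lr_final
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return lr_final + 0.5 * (lr_peak - lr_final) * (1 + math.cos(math.pi * progress))


for s in (0, 199, 200, 100_000, 200_000):
    print(s, f"{lr_at(s):.2e}")
```

Under this assumption, early stopping (`--patience`) can end a run long before the schedule reaches `lr_final`.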
The verification system runs model-vs-model games across multiple machines, coordinated by a Cloudflare Worker.
Deploy the coordinator:

```bash
cd cf-verify
npm install
# Edit wrangler.toml with your account_id, KV namespace, and tokens
npx wrangler deploy
```

Create a verification job:

```bash
curl -X POST "https://your-worker.workers.dev/job" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ADMIN_TOKEN" \
  -d '{
    "total": 1000,
    "strategies": "mortal:model-a.pth,mortal:model-b.pth,mortal:model-c.pth,mortal:model-d.pth",
    "difficulties": "0,0,0,0",
    "end_round": 8,
    "shuffle_seats": true
  }'
```

Then start a worker on each machine. Each worker claims batches of games, runs them locally, and reports the results back:
```bash
COORDINATOR="https://your-worker.workers.dev" \
TOKEN="YOUR_WORKER_TOKEN" \
WORKER_NAME="my-machine" \
WORKERS=2 \
MODELS_DIR="./weights" \
DEVICE_MAP="cuda:0,cuda:0,cuda:0,cuda:0" \
MAHJONG_DIR="/path/to/mahjong" \
MORTAL_PYTHON="/path/to/venv/bin/python3" \
MORTAL_SERVER="/path/to/mortal_bot_server.py" \
bash scripts/verify_worker_http.sh
```

Worker environment variables:
| Variable | Required | Description |
|---|---|---|
| `COORDINATOR` | ✅ | Coordinator URL |
| `TOKEN` | ✅ | Worker auth token |
| `WORKER_NAME` | | Identifier shown on dashboard |
| `WORKERS` | | Parallel game workers (default: 4) |
| `MODELS_DIR` | | Directory containing model `.pth` files |
| `DEVICE_MAP` | | Comma-separated devices, one for each strategy (e.g. `cuda:0,cuda:0,cuda:1,cuda:1`) |
| `MAHJONG_DIR` | | Path to the mahjong project (containing `scripts/verify.ts`) |
| `MORTAL_PYTHON` | | Python interpreter with torch + libriichi |
| `MORTAL_SERVER` | | Path to `mortal_bot_server.py` |
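The sketch below shows how the worker settings might be read and how `DEVICE_MAP` might pair with a job's `strategies` list. It is not taken from `verify_worker_http.sh`. Positional pairing (device *i* serves strategy *i*) is an assumption based on the four-entry examples above, and `worker_config` and `pair_devices` are hypothetical helpers. It applies the documented default of 4 for `WORKERS`.

```python
import os


def worker_config(env=os.environ):
    """Read worker settings from the environment (hypothetical helper)."""
    for key in ("COORDINATOR", "TOKEN"):
        if not env.get(key):
            raise ValueError(f"{key} is required")
    return {
        "coordinator": env["COORDINATOR"].rstrip("/"),
        "worker_name": env.get("WORKER_NAME", ""),
        "workers": int(env.get("WORKERS", "4")),  # documented default: 4
        "devices": [d.strip() for d in env.get("DEVICE_MAP", "").split(",") if d.strip()],
    }


def pair_devices(strategies, devices):
    """Assign devices[i] to strategies[i]; assumes one device per strategy."""
    if len(strategies) != len(devices):
        raise ValueError(f"{len(strategies)} strategies but {len(devices)} devices")
    return list(zip(strategies, devices))


cfg = worker_config({
    "COORDINATOR": "https://your-worker.workers.dev/",
    "TOKEN": "YOUR_WORKER_TOKEN",
    "DEVICE_MAP": "cuda:0,cuda:0,cuda:1,cuda:1",
})
strategies = "mortal:model-a.pth,mortal:model-b.pth,mortal:model-c.pth,mortal:model-d.pth".split(",")
for strategy, device in pair_devices(strategies, cfg["devices"]):
    print(strategy, "->", device)
```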
Monitoring:

- Dashboard: visit the coordinator URL in a browser.
- API:

```bash
curl https://your-worker.workers.dev/status                          # Job progress + worker status
curl https://your-worker.workers.dev/aggregate                       # Overall rankings
curl "https://your-worker.workers.dev/aggregate?worker=my-machine"   # Per-worker stats
```
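The same coordinator endpoints can be called from Python with only the standard library. This minimal sketch mirrors the curl calls in this README. It only builds requests, so sending them and decoding the (assumed JSON) responses is left to the caller. The base URL and tokens are placeholders, and `aggregate_url` and `job_request` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request


def aggregate_url(base, worker=None):
    """URL for overall rankings, or per-worker stats when `worker` is given."""
    url = base.rstrip("/") + "/aggregate"
    if worker is not None:
        url += "?" + urllib.parse.urlencode({"worker": worker})
    return url


def job_request(base, admin_token, payload, method="POST"):
    """Authenticated /job request: POST creates a job, PATCH updates one."""
    return urllib.request.Request(
        base.rstrip("/") + "/job",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {admin_token}",
        },
        method=method,
    )


base = "https://your-worker.workers.dev"  # placeholder coordinator URL
print(aggregate_url(base, worker="my-machine"))
req = job_request(base, "YOUR_ADMIN_TOKEN",
                  {"job_id": "abc123", "total": 4000, "set_current": True},
                  method="PATCH")
print(req.get_method(), req.full_url)
# Sending is left to the caller, e.g.:
#   with urllib.request.urlopen(req) as resp: print(json.load(resp))
```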
```bash
# Extend a job's target
curl -X PATCH "https://your-worker.workers.dev/job" \
  -H "Authorization: Bearer YOUR_ADMIN_TOKEN" \
  -d '{"job_id": "abc123", "total": 4000, "set_current": true}'

# List all jobs
curl https://your-worker.workers.dev/jobs
```

The model uses multi-task learning with three auxiliary prediction heads that share the backbone:
- ScoreHead — Predicts current 4-player scores (MSE loss)
- RankHead — Predicts rank distribution of all 4 players (Cross-entropy)
- GapHead — Predicts log-scale score gap to top/bottom player (Huber loss)
These heads push the backbone to internalize game-state awareness (score positions, rank dynamics). This information is hard to learn from the raw observation encoding through the Q-learning (DQN + CQL) objective alone.
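The three auxiliary losses can be illustrated on toy values. This pure-Python sketch uses the loss types named above (MSE, cross-entropy, Huber) and, by default, the stage 3 weights. The target encodings are assumptions:

- scores are in arbitrary normalized units;
- each player's rank target is a class index 0-3;
- the "log-scale" gap target is taken as sign(g)·log1p(|g|), one plausible reading rather than the documented transform;
- the Huber threshold is 1.0.

All function names are hypothetical.

```python
import math


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits, label):
    """-log softmax(logits)[label] for one player's rank distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def huber(pred, target, delta=1.0):
    err = abs(pred - target)
    return 0.5 * err ** 2 if err <= delta else delta * (err - 0.5 * delta)


def log_gap(gap):
    """Assumed log-scale transform for a signed score gap (in points)."""
    return math.copysign(math.log1p(abs(gap)), gap)


def aux_loss(score_pred, score_true, rank_logits, ranks, gap_pred, gap_true,
             score_weight=1.0, rank_weight=0.5, gap_weight=0.3):
    score = mse(score_pred, score_true)
    rank = sum(cross_entropy(l, r) for l, r in zip(rank_logits, ranks)) / len(ranks)
    gap = sum(huber(p, log_gap(g)) for p, g in zip(gap_pred, gap_true)) / len(gap_true)
    return score_weight * score + rank_weight * rank + gap_weight * gap


loss = aux_loss(
    score_pred=[0.25, 0.30, 0.20, 0.25], score_true=[0.26, 0.28, 0.21, 0.25],
    rank_logits=[[2.0, 0.5, 0.0, -1.0], [0.5, 2.0, 0.0, -1.0],
                 [0.0, 0.5, 2.0, -1.0], [-1.0, 0.0, 0.5, 2.0]],
    ranks=[0, 1, 2, 3],
    gap_pred=[7.5, -7.0], gap_true=[2500, -1800],
)
print(round(loss, 4))
```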
The model is trained in stages (curriculum learning):
Stage 1: starting from Mortal v4 pre-trained weights, fine-tune with CQL on the top ~242 players:

```bash
python scripts/train_main.py \
  --grp checkpoints/grp-best.pth \
  --train-glob "data/top_players_242/*.mjson" \
  --val-glob "data/val/*.mjson" \
  --save checkpoints/model-c.pth \
  --lr-peak 3e-5 --lr-final 1e-6 \
  --weight-decay 0.2 --score-weight 1.0 \
  --rank-weight 0 --gap-weight 0
```

Stage 2: from model-c, continue training on the top ~750 players:
```bash
cp checkpoints/model-c-best.pth checkpoints/model-c2.pth
python scripts/train_main.py \
  --save checkpoints/model-c2.pth \
  --train-glob "data/top_players_750/*.mjson" \
  --lr-peak 3e-5 --lr-final 1e-6 \
  --weight-decay 0.2 --score-weight 1.0 \
  --rank-weight 0 --gap-weight 0
```

Stage 3: from model-c2, add the rank/gap prediction heads and raise the learning rate:
```bash
cp checkpoints/model-c2-best.pth checkpoints/model-c3.pth
python scripts/train_main.py \
  --save checkpoints/model-c3.pth \
  --train-glob "data/top_players_750/*.mjson" \
  --lr-peak 1e-4 --lr-final 1e-5 \
  --weight-decay 0.1 \
  --score-weight 1.0 --rank-weight 0.5 --gap-weight 0.3
```

The released `model-c3-best.pth` checkpoint was reached after ~14,400 steps of stage 3 training.
Training data uses .mjson format (gzipped JSON lines), where each file contains one game in mjai event format. The dataloader (scripts/dataloader.py) handles parsing via libriichi.
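To sanity-check data before training, a `.mjson` file can be inspected with the standard library alone. This sketch decompresses the file, parses one mjai event per line, and counts event types by the mjai `type` field. It does not replace libriichi's parser, and `read_mjson`/`summarize` are hypothetical helpers.

```python
import gzip
import json
from collections import Counter


def read_mjson(path):
    """Yield mjai events (dicts) from a gzipped JSON-lines game log."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize(path):
    """Count events, rounds (start_kyoku events) and event types in one game."""
    events = list(read_mjson(path))
    return {
        "events": len(events),
        "kyoku": sum(e["type"] == "start_kyoku" for e in events),
        "types": Counter(e["type"] for e in events),
    }


# Usage (path is a placeholder):
#   print(summarize("data/val/some_game.mjson"))
```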
Requirements:

- Python 3.10+
- PyTorch 2.0+
- libriichi (Rust-compiled Python extension)
- Node.js + tsx (for verification scripts)
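A quick way to check these requirements on a machine is sketched below. It assumes the libriichi extension is importable as `libriichi` and that `tsx` is run through `npx`. It only reports what it finds and installs nothing, and `check_requirements` is a hypothetical helper.

```python
import importlib.metadata
import importlib.util
import shutil
import sys


def check_requirements():
    """Report which of the listed requirements appear to be available."""
    report = {
        "python>=3.10": sys.version_info >= (3, 10),
        "libriichi": importlib.util.find_spec("libriichi") is not None,
        "node": shutil.which("node") is not None,
        "npx (runs tsx)": shutil.which("npx") is not None,
    }
    try:
        # e.g. "2.1.0+cu118" -> major version 2
        major = int(importlib.metadata.version("torch").split(".")[0])
        report["torch>=2.0"] = major >= 2
    except importlib.metadata.PackageNotFoundError:
        report["torch>=2.0"] = False
    return report


for name, ok in check_requirements().items():
    status = "ok" if ok else "MISSING"
    print(f"{status:8}{name}")
```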
License: model weights, training code, and verification tools are released under the MIT License.
Note: The runtime inference environment requires Mortal (AGPL-3.0) components (libriichi, observation encoder). This repository does not include Mortal source code.
Built on the Mortal framework by Equim.