[codex] Add original ChatBot-10B architecture #1

Merged
DoRmAmMu1997 merged 2 commits into main from codex/chatbot-original-10b on May 10, 2026

@DoRmAmMu1997
Owner

Summary

  • Adds an original untrained ChatBot-10B dense decoder config, not a fine-tune of external model weights.
  • Replaces the simple Transformer internals with RMSNorm, RoPE, grouped-query attention, SwiGLU, tied embeddings, and richer generation controls.
  • Adds BPE tokenizer tooling, dataset manifests/loaders, tiny configs, parameter estimation, tests, CI, and updated training docs.
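
Two of the components named above can be illustrated in a few lines. The following is a minimal pure-Python sketch, not the code added in this PR: `rms_norm` and `kv_head_for_query_head` are hypothetical names, and the `eps` default is an assumption. It shows how RMSNorm rescales a vector by its root mean square, and how grouped-query attention maps several query heads onto one shared key/value head.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then by a learned gain.

    Unlike LayerNorm, no mean is subtracted and no bias is added.
    The eps default is an assumed value, not taken from the PR.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def kv_head_for_query_head(q_head, n_heads, n_kv_heads):
    """Grouped-query attention: consecutive query heads share one key/value head."""
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be a multiple of n_kv_heads")
    group_size = n_heads // n_kv_heads
    return q_head // group_size


if __name__ == "__main__":
    print(rms_norm([3.0, 4.0], [1.0, 1.0]))
    # With 8 query heads and 2 key/value heads, each KV head serves 4 query heads.
    print([kv_head_for_query_head(h, 8, 2) for h in range(8)])
```

With `n_kv_heads` equal to `n_heads` the mapping reduces to standard multi-head attention, and with `n_kv_heads = 1` it reduces to multi-query attention.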

Test Plan

  • python -m compileall chatbot.py chat_llm.py train_llm.py train_tokenizer.py train_10b.py src scripts
  • pytest -q (11 passed locally)
  • python scripts\estimate_params.py --config configs\chatbot-10b.yaml (reports 9.999B params)
  • git diff --check
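
For readers who want to sanity-check a parameter estimate like the one above by hand, here is a rough sketch of how the count could be derived for a dense decoder with grouped-query attention, SwiGLU, RMSNorm, and tied embeddings. It is not `scripts/estimate_params.py`: the function name `estimate_params` is hypothetical, bias-free linear layers are assumed, and the dimensions in the example are small made-up values, not those in `configs/chatbot-10b.yaml`.

```python
def estimate_params(vocab_size, d_model, n_layers, n_heads, n_kv_heads, d_ff):
    """Approximate parameter count for a bias-free dense decoder (assumed layout)."""
    head_dim = d_model // n_heads
    kv_dim = n_kv_heads * head_dim

    # Attention: Q and output projections are d_model x d_model;
    # K and V are narrower because key/value heads are shared across query heads.
    attn = 2 * d_model * d_model + 2 * d_model * kv_dim

    # SwiGLU feed-forward: gate, up, and down projections.
    mlp = 3 * d_model * d_ff

    # Two RMSNorm gain vectors per layer (pre-attention and pre-MLP).
    norms = 2 * d_model

    per_layer = attn + mlp + norms

    # Tied embeddings: the output head reuses the input embedding matrix,
    # so it is counted once. A final RMSNorm adds d_model gains.
    embedding = vocab_size * d_model
    final_norm = d_model

    return embedding + n_layers * per_layer + final_norm


if __name__ == "__main__":
    # Hypothetical tiny dimensions chosen only to make the arithmetic easy to follow.
    total = estimate_params(
        vocab_size=1000, d_model=64, n_layers=2, n_heads=8, n_kv_heads=2, d_ff=256
    )
    print(f"{total:,} parameters")
```

Because the embeddings are tied, untying them in this sketch would add another `vocab_size * d_model` parameters for a separate output head.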

Notes

  • Full 10B weights are intentionally not committed.
  • Full 10B training requires external multi-GPU/cloud infrastructure; CI validates the tiny CPU path.

Co-authored-by: Codex <codex@openai.com>

DoRmAmMu1997 and others added 2 commits May 10, 2026 21:34
Co-authored-by: Codex <codex@openai.com>
Co-authored-by: Codex <codex@openai.com>
@DoRmAmMu1997 marked this pull request as ready for review May 10, 2026 16:20
@DoRmAmMu1997 merged commit aa95282 into main May 10, 2026
2 checks passed