Mistral and Mixtral implemented from scratch, with detailed documentation.

References for each building block (minimal illustrative sketches of the components follow this list):
- Mistral: https://huggingface.co/docs/transformers/en/model_doc/mistral, https://arxiv.org/abs/2310.06825
- Mixtral MoE (Mixtral of Experts): https://huggingface.co/docs/transformers/en/model_doc/mixtral, https://arxiv.org/abs/2401.04088
- RoPE (Rotary Position Embedding): https://arxiv.org/abs/2104.09864
- RMSNorm (Root Mean Square Layer Normalization): https://arxiv.org/abs/1910.07467
- MHA (Multi-Head Attention, one KV head per query head): https://arxiv.org/abs/1706.03762
- MQA (Multi-Query Attention, a single KV head shared by all query heads): https://arxiv.org/abs/1911.02150
- GQA (Grouped-Query Attention, fewer KV heads than query heads, each shared by a group of query heads; used by Mistral, see the attention sketch below): https://arxiv.org/abs/2305.13245
- KVCache: https://medium.com/@joaolages/kv-caching-explained-276520203249
- SiLU (Sigmoid Linear Unit activation): https://arxiv.org/abs/1702.03118
- SwiGLU activation: https://arxiv.org/abs/2002.05202
- Inference: https://medium.com/@javaid.nabi/all-you-need-to-know-about-llm-text-generation-03b138e0ed19
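
RoPE rotates each (query, key) pair of channels in the complex plane by an angle proportional to the token position, so relative positions show up directly in the dot products. A minimal sketch, assuming PyTorch and the usual complex-number formulation (function and parameter names are illustrative, not necessarily this repo's exact code):

```python
import torch

def precompute_rope_freqs(head_dim: int, max_seq_len: int, theta: float = 10000.0) -> torch.Tensor:
    # One rotation angle per position m and per frequency index k: m * theta^(-2k/head_dim).
    inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))  # (head_dim/2,)
    angles = torch.outer(torch.arange(max_seq_len).float(), inv_freq)              # (seq, head_dim/2)
    return torch.polar(torch.ones_like(angles), angles)                            # complex rotations e^{i*angle}

def apply_rope(x: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq, n_heads, head_dim); pair up the last dim and rotate each pair in the complex plane.
    x_complex = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    freqs = freqs[: x.shape[1]].unsqueeze(0).unsqueeze(2)                          # (1, seq, 1, head_dim/2)
    return torch.view_as_real(x_complex * freqs).flatten(-2).type_as(x)
```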
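RMSNorm only rescales by the root mean square of the activations: no mean subtraction and no bias, which makes it cheaper than LayerNorm. A minimal sketch, assuming PyTorch:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize over the last dimension by 1 / RMS(x), then apply the gain.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```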
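One attention module covers all three head layouts above: with n_kv_heads == n_heads it is MHA, with n_kv_heads == 1 it is MQA, and anything in between is GQA. A hedged sketch (the wq/wk/wv/wo names are illustrative; RoPE and the KV cache are omitted to keep it short):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    # Expand each KV head n_rep times so K/V line up with the query heads.
    b, s, n_kv, d = x.shape
    return x[:, :, :, None, :].expand(b, s, n_kv, n_rep, d).reshape(b, s, n_kv * n_rep, d)

class Attention(nn.Module):
    def __init__(self, dim: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.wq = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, _ = x.shape
        q = self.wq(x).view(b, s, self.n_heads, self.head_dim)
        k = self.wk(x).view(b, s, self.n_kv_heads, self.head_dim)
        v = self.wv(x).view(b, s, self.n_kv_heads, self.head_dim)
        # (RoPE would be applied to q and k here.)
        k = repeat_kv(k, self.n_heads // self.n_kv_heads)
        v = repeat_kv(v, self.n_heads // self.n_kv_heads)
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))  # (b, heads, s, head_dim)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(b, s, -1))
```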
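The KV cache stores every past key/value tensor so that each new token only has to compute its own K/V and attend over the cache, instead of re-running the whole prefix. A naive sketch (Mistral additionally uses a sliding-window rolling buffer on top of this idea):

```python
import torch

class KVCache:
    # Keys/values stored as (batch, n_kv_heads, seq_so_far, head_dim), grown once per decoding step.
    def __init__(self):
        self.k = None
        self.v = None

    def update(self, k_new: torch.Tensor, v_new: torch.Tensor):
        if self.k is None:
            self.k, self.v = k_new, v_new
        else:
            self.k = torch.cat([self.k, k_new], dim=2)  # append along the sequence axis
            self.v = torch.cat([self.v, v_new], dim=2)
        return self.k, self.v  # attention then runs over the full cached sequence
```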
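SiLU is silu(z) = z * sigmoid(z); SwiGLU uses it as the gate of a gated linear unit inside the feed-forward block. A minimal sketch with the usual three-projection layout (the w1/w3/w2 gate/up/down names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    # FFN(x) = W2( SiLU(W1 x) * W3 x )
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```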
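Mixtral replaces the dense feed-forward block with a sparse mixture of experts: a router scores all experts per token, keeps the top-k (8 experts, top-2 in the paper), and mixes their outputs with the softmaxed router scores. A hedged sketch that reuses the SwiGLUFeedForward class from the previous block as the expert network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEBlock(nn.Module):
    # Experts are SwiGLUFeedForward modules (defined in the sketch above).
    def __init__(self, dim: int, hidden_dim: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, n_experts, bias=False)  # router
        self.experts = nn.ModuleList([SwiGLUFeedForward(dim, hidden_dim) for _ in range(n_experts)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, d = x.shape
        flat = x.reshape(-1, d)                                     # (tokens, dim)
        scores, selected = torch.topk(self.gate(flat), self.top_k, dim=-1)
        weights = F.softmax(scores, dim=-1)                         # normalize over the chosen experts only
        out = torch.zeros_like(flat)
        for i, expert in enumerate(self.experts):
            token_idx, slot = torch.where(selected == i)            # tokens routed to expert i, and in which slot
            if token_idx.numel() > 0:
                out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(flat[token_idx])
        return out.view(b, s, d)
```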
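Inference is an autoregressive loop: run the model, take the logits at the last position, choose the next token (greedy or sampled), append it, and repeat; the KV cache above is what keeps each step cheap. A hedged sketch assuming `model(input_ids)` returns logits of shape (batch, seq, vocab):

```python
import torch

@torch.no_grad()
def generate(model, input_ids: torch.Tensor, max_new_tokens: int,
             temperature: float = 0.7, top_k: int = 50) -> torch.Tensor:
    for _ in range(max_new_tokens):
        logits = model(input_ids)[:, -1, :]                  # next-token logits: (batch, vocab)
        if temperature == 0.0:
            next_token = logits.argmax(dim=-1, keepdim=True)              # greedy decoding
        else:
            logits = logits / temperature
            if top_k > 0:
                kth = torch.topk(logits, top_k).values[:, -1, None]
                logits = logits.masked_fill(logits < kth, float("-inf"))  # keep only the top-k logits
            probs = torch.softmax(logits, dim=-1)
            next_token = torch.multinomial(probs, num_samples=1)          # sample one token per sequence
        input_ids = torch.cat([input_ids, next_token], dim=1)
    return input_ids
```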