
DeepSeek-V3


DeepSeek

Learn more: [original model announcement]

DeepSeek-V3 is a Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. It adopts the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance.
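
The auxiliary-loss-free load-balancing strategy can be pictured with a short sketch: each routed expert carries a bias that is added to its affinity score only when selecting the top-K experts for a token, and that bias is nudged down for overloaded experts and up for underloaded ones, so no auxiliary balancing loss is required. The NumPy sketch below is a minimal illustration of that idea, not DeepSeek's implementation; the expert count, top-K, and bias update speed GAMMA are placeholder values.

```python
# Minimal sketch of bias-based, auxiliary-loss-free MoE load balancing.
# Placeholder sizes: the real model uses many more experts per layer.
import numpy as np

NUM_EXPERTS = 16   # placeholder
TOP_K = 4          # placeholder
GAMMA = 0.001      # bias update speed (assumed value)

expert_bias = np.zeros(NUM_EXPERTS)  # per-expert routing bias, updated online


def route(affinity: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Select top-K experts per token.

    The bias influences only which experts are *selected*; the gating
    weights that scale each expert's output come from the unbiased scores.
    """
    biased = affinity + expert_bias                    # selection scores
    chosen = np.argsort(-biased, axis=-1)[:, :TOP_K]   # expert indices
    gates = np.take_along_axis(affinity, chosen, axis=-1)
    gates = gates / gates.sum(axis=-1, keepdims=True)  # normalize over chosen
    return chosen, gates


def update_bias(chosen: np.ndarray) -> None:
    """Nudge biases so overloaded experts are picked less often, and vice versa."""
    load = np.bincount(chosen.ravel(), minlength=NUM_EXPERTS)
    expert_bias[load > load.mean()] -= GAMMA
    expert_bias[load < load.mean()] += GAMMA


# Example: 32 tokens with sigmoid routing affinities.
rng = np.random.default_rng(0)
affinity = 1.0 / (1.0 + np.exp(-rng.normal(size=(32, NUM_EXPERTS))))
chosen, gates = route(affinity)
update_bias(chosen)
```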

DeepSeek-V3 was pre-trained on 14.8 trillion diverse, high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities.

About

A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.

Context: 128k input · 4k output
Training date: Undisclosed
Rate limit tier
Provider support

Languages (2)

English and Chinese
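
For reference, a request that stays within the context limits listed above might look like the following minimal sketch. It assumes the model is served behind an OpenAI-compatible chat completions endpoint; the base URL, API-key environment variable, and exact model ID are placeholders to replace with the values from your provider.

```python
# Minimal usage sketch assuming an OpenAI-compatible endpoint.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://example-inference-endpoint.example/v1",  # placeholder
    api_key=os.environ["API_KEY"],                              # placeholder
)

response = client.chat.completions.create(
    model="DeepSeek-V3",  # use the model ID as listed by your provider
    messages=[
        {"role": "user", "content": "What are some popular tourist attractions in Paris?"},
    ],
    max_tokens=2048,  # must stay within the 4k output limit noted above
)
print(response.choices[0].message.content)
```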