
DeepSeek-V3


Learn more: [original model announcement]

DeepSeek-V3 is a Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. DeepSeek-V3 adopts the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and uses a multi-token prediction training objective for stronger performance.
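
The auxiliary-loss-free load balancing can be pictured as top-k expert routing with a per-expert bias that is nudged after each step instead of adding a balancing loss term. Below is a minimal sketch of that idea, not DeepSeek's implementation: the expert count, top-k, gating function, and step size `gamma` are illustrative assumptions.

```python
# Minimal sketch of top-k MoE routing with bias-based, auxiliary-loss-free
# load balancing. Toy sizes and the update step `gamma` are assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_experts, top_k, gamma = 8, 2, 0.01     # assumed toy values
bias = np.zeros(n_experts)               # per-expert routing bias

def route(token_scores: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Pick top-k experts per token.

    Selection uses score + bias, but the mixing weights are computed from
    the raw scores only, so the bias steers load without distorting outputs.
    """
    biased = token_scores + bias                       # (tokens, experts)
    chosen = np.argsort(-biased, axis=-1)[:, :top_k]   # top-k expert ids
    raw = np.take_along_axis(token_scores, chosen, axis=-1)
    weights = np.exp(raw) / np.exp(raw).sum(axis=-1, keepdims=True)
    return chosen, weights

def update_bias(chosen: np.ndarray) -> None:
    """After a step, nudge biases: overloaded experts down, underloaded up."""
    load = np.bincount(chosen.ravel(), minlength=n_experts)
    target = chosen.size / n_experts
    bias[load > target] -= gamma
    bias[load < target] += gamma

# Toy usage: 16 tokens with random router scores.
scores = rng.normal(size=(16, n_experts))
chosen, weights = route(scores)
update_bias(chosen)
print(chosen[:2], weights[:2])
```

Because the mixing weights come from the raw scores, the bias only changes which experts are selected, not how their outputs are combined.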

DeepSeek-V3 was pre-trained on 14.8 trillion diverse, high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities.

About
A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.

Context
128k input · 4k output

Training date
Undisclosed

Rate limit tier

Provider support

Languages (2)
English and Chinese
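
The context limits listed above (128k input, 4k output) are the main constraint when calling the model. The sketch below assumes an OpenAI-compatible chat-completions endpoint; the base URL, environment variable names, and model identifier are placeholders, not values documented on this page.

```python
# Minimal usage sketch under the assumption of an OpenAI-compatible endpoint.
# Endpoint, API key variable, and model id are hypothetical placeholders.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["DEEPSEEK_V3_ENDPOINT"],  # hypothetical endpoint URL
    api_key=os.environ["API_KEY"],                # hypothetical credential
)

response = client.chat.completions.create(
    model="deepseek-v3",  # assumed model identifier
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Can you explain the basics of machine learning?"},
    ],
    max_tokens=4000,  # stay within the 4k output limit listed above
)
print(response.choices[0].message.content)
```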