-
Notifications
You must be signed in to change notification settings - Fork 9.1k
fix(route/huggingface): add huggingface group models detail #20646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Successfully generated as following: http://localhost:1200/huggingface/models/deepseek-ai - Success ✔️<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<title>Huggingface deepseek-ai Models</title>
<link>https://huggingface.co/deepseek-ai/models?sort=created</link>
<atom:link href="http://localhost:1200/huggingface/models/deepseek-ai" rel="self" type="application/rss+xml"></atom:link>
<description>Huggingface deepseek-ai Models - Powered by RSSHub</description>
<generator>RSSHub</generator>
<webMaster>contact@rsshub.app (RSSHub)</webMaster>
<language>en</language>
<lastBuildDate>Sat, 06 Dec 2025 11:14:50 GMT</lastBuildDate>
<ttl>5</ttl>
<item>
<title>deepseek-ai/DeepSeek-V3.2-Speciale</title>
<description>deepseek-ai/DeepSeek-V3.2-Speciale Text Generation • 685B • Updated 5 days ago • 4.72k • 516
---
license: mit
library_name: transformers
base_model:
- deepseek-ai/DeepSeek-V3.2-Exp-Base
base_model_relation: finetune
---
# DeepSeek-V3.2: Efficient Reasoning &amp; Agentic AI
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" referrerpolicy="no-referrer">
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%A4%96%20Chat-DeepSeek%20V3-536af5?color=536af5&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
<img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&amp;logoColor=white&amp;color=7289da" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
<img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/deepseek-ai/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&amp;color=f5de53" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<p align="center">
<a href="https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf"><b>Technical Report</b>👁️</a>
</p>
## Introduction
We introduce **DeepSeek-V3.2**, a model that harmonizes high computational efficiency with superior reasoning and agent performance. Our approach is built upon three key technical breakthroughs:
1. **DeepSeek Sparse Attention (DSA):** We introduce DSA, an efficient attention mechanism that substantially reduces computational complexity while preserving model performance, specifically optimized for long-context scenarios.
2. **Scalable Reinforcement Learning Framework:** By implementing a robust RL protocol and scaling post-training compute, *DeepSeek-V3.2* performs comparably to GPT-5. Notably, our high-compute variant, **DeepSeek-V3.2-Speciale**, **surpasses GPT-5** and exhibits reasoning proficiency on par with Gemini-3.0-Pro.
- *Achievement:* 🥇 **Gold-medal performance** in the 2025 International Mathematical Olympiad (IMO) and International Olympiad in Informatics (IOI).
3. **Large-Scale Agentic Task Synthesis Pipeline:** To integrate **reasoning into tool-use** scenarios, we developed a novel synthesis pipeline that systematically generates training data at scale. This facilitates scalable agentic post-training, improving compliance and generalization in complex interactive environments.
<div align="center">
<img src="https://huggingface.co/deepseek-ai/assets/benchmark.png" referrerpolicy="no-referrer">
</div>
We have also released the final submissions for IOI 2025, ICPC World Finals, IMO 2025 and CMO 2025, which were selected based on our designed pipeline. These materials are provided for the community to conduct secondary verification. The files can be accessed at `assets/olympiad_cases`.
## Chat Template
DeepSeek-V3.2 introduces significant updates to its chat template compared to prior versions. The primary changes involve a revised format for tool calling and the introduction of a "thinking with tools" capability.
To assist the community in understanding and adapting to this new template, we have provided a dedicated `encoding` folder, which contains Python scripts and test cases demonstrating how to encode messages in OpenAI-compatible format into input strings for the model and how to parse the model's text output.
A brief example is illustrated below:
```python
import transformers
# encoding/encoding_dsv32.py
from encoding_dsv32 import encode_messages, parse_message_from_completion_text
tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.2")
messages = [
{"role": "user", "content": "hello"},
{"role": "assistant", "content": "Hello! I am DeepSeek.", "reasoning_content": "thinking..."},
{"role": "user", "content": "1+1=?"}
]
encode_config = dict(thinking_mode="thinking", drop_thinking=True, add_default_bos_token=True)
# messages -&gt; string
prompt = encode_messages(messages, **encode_config)
# Output: "&lt;|begin▁of▁sentence|&gt;&lt;|User|&gt;hello&lt;|Assistant|&gt;Hello! I am DeepSeek.&lt;|end▁of▁sentence|&gt;&lt;|User|&gt;1+1=?&lt;|Assistant|&gt;<think>"
# string -&gt; tokens
tokens = tokenizer.encode(prompt)
# Output: [0, 128803, 33310, 128804, 128799, 19923, 3, 342, 1030, 22651, 4374, 1465, 16, 1, 128803, 19, 13, 19, 127252, 128804, 128798]
```
Important Notes:
1. This release does not include a Jinja-format chat template. Please refer to the Python code mentioned above.
2. The output parsing function included in the code is designed to handle well-formatted strings only. It does not attempt to correct or recover from malformed output that the model might occasionally generate. It is not suitable for production use without robust error handling.
3. A new role named `developer` has been introduced in the chat template. This role is dedicated exclusively to search agent scenarios and is designated for no other tasks. The official API does not accept messages assigned to `developer`.
## How to Run Locally
The model structure of DeepSeek-V3.2 and DeepSeek-V3.2-Speciale are the same as DeepSeek-V3.2-Exp. Please visit [DeepSeek-V3.2-Exp](https://github.com/deepseek-ai/DeepSeek-V3.2-Exp) repo for more information about running this model locally.
Usage Recommendations:
1. For local deployment, we recommend setting the sampling parameters to `temperature = 1.0, top_p = 0.95`.
2. Please note that the DeepSeek-V3.2-Speciale variant is designed exclusively for deep reasoning tasks and does not support the tool-calling functionality.
## License
This repository and the model weights are licensed under the [MIT License](LICENSE).
## Citation
```
@misc{deepseekai2025deepseekv32,
title={DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models},
author={DeepSeek-AI},
year={2025},
}
```
## Contact
If you have any questions, please raise an issue or contact us at [service@deepseek.com](service@deepseek.com).
</think></description>
<link>https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale</link>
<guid isPermaLink="false">https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale</guid>
<pubDate>Mon, 01 Dec 2025 03:06:03 GMT</pubDate>
</item>
<item>
<title>deepseek-ai/DeepSeek-V3.2</title>
<description>deepseek-ai/DeepSeek-V3.2 Text Generation • 685B • Updated 5 days ago • 18.1k • • 738
---
license: mit
library_name: transformers
base_model:
- deepseek-ai/DeepSeek-V3.2-Exp-Base
base_model_relation: finetune
---
# DeepSeek-V3.2: Efficient Reasoning &amp; Agentic AI
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" referrerpolicy="no-referrer">
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%A4%96%20Chat-DeepSeek%20V3-536af5?color=536af5&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
<img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&amp;logoColor=white&amp;color=7289da" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
<img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/deepseek-ai/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&amp;color=f5de53" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<p align="center">
<a href="https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf"><b>Technical Report</b>👁️</a>
</p>
## Introduction
We introduce **DeepSeek-V3.2**, a model that harmonizes high computational efficiency with superior reasoning and agent performance. Our approach is built upon three key technical breakthroughs:
1. **DeepSeek Sparse Attention (DSA):** We introduce DSA, an efficient attention mechanism that substantially reduces computational complexity while preserving model performance, specifically optimized for long-context scenarios.
2. **Scalable Reinforcement Learning Framework:** By implementing a robust RL protocol and scaling post-training compute, *DeepSeek-V3.2* performs comparably to GPT-5. Notably, our high-compute variant, **DeepSeek-V3.2-Speciale**, **surpasses GPT-5** and exhibits reasoning proficiency on par with Gemini-3.0-Pro.
- *Achievement:* 🥇 **Gold-medal performance** in the 2025 International Mathematical Olympiad (IMO) and International Olympiad in Informatics (IOI).
3. **Large-Scale Agentic Task Synthesis Pipeline:** To integrate **reasoning into tool-use** scenarios, we developed a novel synthesis pipeline that systematically generates training data at scale. This facilitates scalable agentic post-training, improving compliance and generalization in complex interactive environments.
<div align="center">
<img src="https://huggingface.co/deepseek-ai/assets/benchmark.png" referrerpolicy="no-referrer">
</div>
We have also released the final submissions for IOI 2025, ICPC World Finals, IMO 2025 and CMO 2025, which were selected based on our designed pipeline. These materials are provided for the community to conduct secondary verification. The files can be accessed at `assets/olympiad_cases`.
## Chat Template
DeepSeek-V3.2 introduces significant updates to its chat template compared to prior versions. The primary changes involve a revised format for tool calling and the introduction of a "thinking with tools" capability.
To assist the community in understanding and adapting to this new template, we have provided a dedicated `encoding` folder, which contains Python scripts and test cases demonstrating how to encode messages in OpenAI-compatible format into input strings for the model and how to parse the model's text output.
A brief example is illustrated below:
```python
import transformers
# encoding/encoding_dsv32.py
from encoding_dsv32 import encode_messages, parse_message_from_completion_text
tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.2")
messages = [
{"role": "user", "content": "hello"},
{"role": "assistant", "content": "Hello! I am DeepSeek.", "reasoning_content": "thinking..."},
{"role": "user", "content": "1+1=?"}
]
encode_config = dict(thinking_mode="thinking", drop_thinking=True, add_default_bos_token=True)
# messages -&gt; string
prompt = encode_messages(messages, **encode_config)
# Output: "&lt;|begin▁of▁sentence|&gt;&lt;|User|&gt;hello&lt;|Assistant|&gt;Hello! I am DeepSeek.&lt;|end▁of▁sentence|&gt;&lt;|User|&gt;1+1=?&lt;|Assistant|&gt;<think>"
# string -&gt; tokens
tokens = tokenizer.encode(prompt)
# Output: [0, 128803, 33310, 128804, 128799, 19923, 3, 342, 1030, 22651, 4374, 1465, 16, 1, 128803, 19, 13, 19, 127252, 128804, 128798]
```
Important Notes:
1. This release does not include a Jinja-format chat template. Please refer to the Python code mentioned above.
2. The output parsing function included in the code is designed to handle well-formatted strings only. It does not attempt to correct or recover from malformed output that the model might occasionally generate. It is not suitable for production use without robust error handling.
3. A new role named `developer` has been introduced in the chat template. This role is dedicated exclusively to search agent scenarios and is designated for no other tasks. The official API does not accept messages assigned to `developer`.
## How to Run Locally
The model structure of DeepSeek-V3.2 and DeepSeek-V3.2-Speciale are the same as DeepSeek-V3.2-Exp. Please visit [DeepSeek-V3.2-Exp](https://github.com/deepseek-ai/DeepSeek-V3.2-Exp) repo for more information about running this model locally.
Usage Recommendations:
1. For local deployment, we recommend setting the sampling parameters to `temperature = 1.0, top_p = 0.95`.
2. Please note that the DeepSeek-V3.2-Speciale variant is designed exclusively for deep reasoning tasks and does not support the tool-calling functionality.
## License
This repository and the model weights are licensed under the [MIT License](LICENSE).
## Citation
```
@misc{deepseekai2025deepseekv32,
title={DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models},
author={DeepSeek-AI},
year={2025},
}
```
## Contact
If you have any questions, please raise an issue or contact us at [service@deepseek.com](service@deepseek.com).
</think></description>
<link>https://huggingface.co/deepseek-ai/DeepSeek-V3.2</link>
<guid isPermaLink="false">https://huggingface.co/deepseek-ai/DeepSeek-V3.2</guid>
<pubDate>Mon, 01 Dec 2025 03:04:59 GMT</pubDate>
</item>
<item>
<title>deepseek-ai/DeepSeek-Math-V2</title>
<description>deepseek-ai/DeepSeek-Math-V2 Text Generation • 685B • Updated 9 days ago • 8.96k • 637
---
license: apache-2.0
library_name: transformers
base_model:
- deepseek-ai/DeepSeek-Math-V2
---
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" referrerpolicy="no-referrer">
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.deepseek.com/"><img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" referrerpolicy="no-referrer"></a>
<a href="https://chat.deepseek.com/"><img alt="Chat" src="https://img.shields.io/badge/%F0%9F%A4%96%20Chat-DeepSeek%20V3-536af5?color=536af5&amp;logoColor=white" referrerpolicy="no-referrer"></a>
<a href="https://huggingface.co/deepseek-ai"><img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&amp;logoColor=white" referrerpolicy="no-referrer"></a>
<br>
<a href="https://discord.gg/Tc7c45Zzu5"><img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&amp;logoColor=white&amp;color=7289da" referrerpolicy="no-referrer"></a>
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true"><img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&amp;logoColor=white" referrerpolicy="no-referrer"></a>
<a href="https://twitter.com/deepseek_ai"><img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&amp;logoColor=white" referrerpolicy="no-referrer"></a>
<br>
<a href="https://huggingface.co/deepseek-ai/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-Apache%202.0-f5de53?&amp;color=f5de53" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<br>
</div>
# DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
## 1. Introduction
Large language models have made significant progress in mathematical reasoning, which serves as an important testbed for AI and could impact scientific research if further advanced.
By scaling reasoning with reinforcement learning that rewards correct final answers, LLMs have improved from poor performance to saturating quantitative reasoning competitions like AIME and HMMT in one year.
However, this approach faces fundamental limitations.
Pursuing higher final answer accuracy doesn't address a key issue: correct answers don't guarantee correct reasoning.
Moreover, many mathematical tasks like theorem proving require rigorous step-by-step derivation rather than numerical answers, making final answer rewards inapplicable.
To push the limits of deep reasoning, we believe it is necessary to verify the comprehensiveness and rigor of mathematical reasoning.
Self-verification is particularly important for scaling test-time compute, especially for open problems without known solutions.
Towards self-verifiable mathematical reasoning, we investigate how to train an accurate and faithful LLM-based verifier for theorem proving.
We then train a proof generator using the verifier as the reward model, and incentivize the generator to identify and resolve as many issues as possible in their own proofs before finalizing them.
To maintain the generation-verification gap as the generator becomes stronger, we propose to scale verification compute to automatically label new hard-to-verify proofs, creating training data to further improve the verifier.
Our resulting model, DeepSeekMath-V2, demonstrates strong theorem-proving capabilities, achieving gold-level scores on IMO 2025 and CMO 2024 and a near-perfect 118/120 on Putnam 2024 with scaled test-time compute.
While much work remains, these results suggest that self-verifiable mathematical reasoning is a feasible research direction that may help develop more capable mathematical AI systems.
## 2. Evaluation Results
Below are evaluation results on [IMO-ProofBench](https://github.com/google-deepmind/superhuman/tree/main/imobench) (developed by the DeepMind team behind DeepThink IMO-Gold) and recent mathematics competitions including IMO 2025, CMO 2024, and Putnam 2024.
**IMO-ProofBench**
<p align="center">
<img width="100%" src="https://raw.githubusercontent.com/deepseek-ai/DeepSeek-Math-V2/refs/heads/main/figures/IMO-ProofBench.png" referrerpolicy="no-referrer">
</p>
---
**Mathematics Competitions**
<p align="center">
<img width="41%&quot;" src="https://raw.githubusercontent.com/deepseek-ai/DeepSeek-Math-V2/refs/heads/main/figures/Competitions.png" referrerpolicy="no-referrer">
</p>
## 4. Quick Start
DeepSeekMath-V2 is built on top of DeepSeek-V3.2-Exp-Base.
For inference support, please refer to [the DeepSeek-V3.2-Exp github repository](https://github.com/deepseek-ai/DeepSeek-V3.2-Exp).
## 6. License
This repository and the model weights are licensed under [the Apache License, Version 2.0 (Apache 2.0)](LICENSE).
## 7. Citation
```
@misc{deepseek-math-v2,
author = {Zhihong Shao, Yuxiang Luo, Chengda Lu, Z.Z. Ren, Jiewen Hu, Tian Ye, Zhibin Gou, Shirong Ma, Xiaokang Zhang},
title = {DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning},
year = {2025},
}
```
## 8. Contact
If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).
</description>
<link>https://huggingface.co/deepseek-ai/DeepSeek-Math-V2</link>
<guid isPermaLink="false">https://huggingface.co/deepseek-ai/DeepSeek-Math-V2</guid>
<pubDate>Thu, 27 Nov 2025 02:35:52 GMT</pubDate>
</item>
<item>
<title>deepseek-ai/DeepSeek-V3.2-Exp</title>
<description>deepseek-ai/DeepSeek-V3.2-Exp Text Generation • 685B • Updated 18 days ago • 66.2k • • 899
---
license: mit
library_name: transformers
base_model:
- deepseek-ai/DeepSeek-V3.2-Exp-Base
base_model_relation: finetune
---
# DeepSeek-V3.2-Exp
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" referrerpolicy="no-referrer">
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%A4%96%20Chat-DeepSeek%20V3-536af5?color=536af5&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
<img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&amp;logoColor=white&amp;color=7289da" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
<img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/deepseek-ai/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&amp;color=f5de53" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
## Introduction
We are excited to announce the official release of DeepSeek-V3.2-Exp, an experimental version of our model. As an intermediate step toward our next-generation architecture, V3.2-Exp builds upon V3.1-Terminus by introducing DeepSeek Sparse Attention—a sparse attention mechanism designed to explore and validate optimizations for training and inference efficiency in long-context scenarios.
This experimental release represents our ongoing research into more efficient transformer architectures, particularly focusing on improving computational efficiency when processing extended text sequences.
<div align="center">
<img src="https://huggingface.co/deepseek-ai/assets/cost.png" referrerpolicy="no-referrer">
</div>
- DeepSeek Sparse Attention (DSA) achieves fine-grained sparse attention for the first time, delivering substantial improvements in long-context training and inference efficiency while maintaining virtually identical model output quality.
- To rigorously evaluate the impact of introducing sparse attention, we deliberately aligned the training configurations of DeepSeek-V3.2-Exp with V3.1-Terminus. Across public benchmarks in various domains, DeepSeek-V3.2-Exp demonstrates performance on par with V3.1-Terminus.
| Benchmark | DeepSeek-V3.1-Terminus | DeepSeek-V3.2-Exp |
| :--- | :---: | :---: |
| **Reasoning Mode w/o Tool Use** | | |
| MMLU-Pro | 85.0 | 85.0 |
| GPQA-Diamond | 80.7 | 79.9 |
| Humanity's Last Exam | 21.7 | 19.8 |
| LiveCodeBench | 74.9 | 74.1 |
| AIME 2025 | 88.4 | 89.3 |
| HMMT 2025 | 86.1 | 83.6 |
| Codeforces | 2046 | 2121 |
| Aider-Polyglot | 76.1 | 74.5 |
| **Agentic Tool Use** | | |
| BrowseComp | 38.5 | 40.1 |
| BrowseComp-zh | 45.0 | 47.9 |
| SimpleQA | 96.8 | 97.1 |
| SWE Verified | 68.4 | 67.8 |
| SWE-bench Multilingual | 57.8 | 57.9 |
| Terminal-bench | 36.7 | 37.7 |
## Update
- 2025.11.17: **We have identified that previous versions of the inference demo code contained an implementation discrepancy in Rotary Position Embedding (RoPE) within the indexer module, potentially leading to degraded model performance.** Specifically, the input tensor to RoPE in the indexer module requires a non-interleaved layout, whereas RoPE in the MLA module expects an interleaved layout. This issue has now been resolved. Please refer to the updated version of the inference demo code and take note of this implementation detail.
## How to Run Locally
### HuggingFace
We provide an updated inference demo code in the [inference](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp/tree/main/inference) folder to help the community quickly get started with our model and understand its architectural details.
First convert huggingface model weights to the the format required by our inference demo. Set `MP` to match your available GPU count:
```bash
cd inference
export EXPERTS=256
python convert.py --hf-ckpt-path ${HF_CKPT_PATH} --save-path ${SAVE_PATH} --n-experts ${EXPERTS} --model-parallel ${MP}
```
Launch the interactive chat interface and start exploring DeepSeek's capabilities:
```bash
export CONFIG=config_671B_v3.2.json
torchrun --nproc-per-node ${MP} generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --interactive
```
### SGLang
#### Installation with Docker
```
# H200
docker pull lmsysorg/sglang:dsv32
# MI350
docker pull lmsysorg/sglang:dsv32-rocm
# NPUs
docker pull lmsysorg/sglang:dsv32-a2
docker pull lmsysorg/sglang:dsv32-a3
```
#### Launch Command
```bash
python -m sglang.launch_server --model deepseek-ai/DeepSeek-V3.2-Exp --tp 8 --dp 8 --enable-dp-attention
```
### vLLM
vLLM provides day-0 support of DeepSeek-V3.2-Exp. See the [recipes](https://docs.vllm.ai/projects/recipes/en/latest/DeepSeek/DeepSeek-V3_2-Exp.html) for up-to-date details.
## Open-Source Kernels
For TileLang kernels with **better readability and research-purpose design**, please refer to [TileLang](https://github.com/tile-ai/tilelang/tree/main/examples/deepseek_v32).
For **high-performance CUDA kernels**, indexer logit kernels (including paged versions) are available in [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM/pull/200). Sparse attention kernels are released in [FlashMLA](https://github.com/deepseek-ai/FlashMLA/pull/98).
## License
This repository and the model weights are licensed under the [MIT License](LICENSE).
## Citation
```
@misc{deepseekai2024deepseekv32,
title={DeepSeek-V3.2-Exp: Boosting Long-Context Efficiency with DeepSeek Sparse Attention},
author={DeepSeek-AI},
year={2025},
}
```
## Contact
If you have any questions, please raise an issue or contact us at [service@deepseek.com](service@deepseek.com).</description>
<link>https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp</link>
<guid isPermaLink="false">https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp</guid>
<pubDate>Mon, 17 Nov 2025 18:39:54 GMT</pubDate>
</item>
<item>
<title>deepseek-ai/DeepSeek-OCR</title>
<description>deepseek-ai/DeepSeek-OCR Image-Text-to-Text • 3B • Updated Nov 4 • 5.47M • 2.93k
---
pipeline_tag: image-text-to-text
language:
- multilingual
tags:
- deepseek
- vision-language
- ocr
- custom_code
license: mit
library_name: transformers
---
<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek AI" referrerpolicy="no-referrer">
</div>
<hr>
<div align="center">
<a href="https://www.deepseek.com/" target="_blank">
<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" referrerpolicy="no-referrer">
</a>
<a href="https://huggingface.co/deepseek-ai/DeepSeek-OCR" target="_blank">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&amp;logoColor=white" referrerpolicy="no-referrer">
</a>
</div>
<div align="center">
<a href="https://discord.gg/Tc7c45Zzu5" target="_blank">
<img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&amp;logoColor=white&amp;color=7289da" referrerpolicy="no-referrer">
</a>
<a href="https://twitter.com/deepseek_ai" target="_blank">
<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&amp;logoColor=white" referrerpolicy="no-referrer">
</a>
</div>
<p align="center">
<a href="https://github.com/deepseek-ai/DeepSeek-OCR"><b>🌟 Github</b></a> |
<a href="https://huggingface.co/deepseek-ai/DeepSeek-OCR"><b>📥 Model Download</b></a> |
<a href="https://github.com/deepseek-ai/DeepSeek-OCR/blob/main/DeepSeek_OCR_paper.pdf"><b>📄 Paper Link</b></a> |
<a href="https://arxiv.org/abs/2510.18234"><b>📄 Arxiv Paper Link</b></a> |
</p>
<h2>
<p align="center">
<a href="https://huggingface.co/papers/2510.18234">DeepSeek-OCR: Contexts Optical Compression</a>
</p>
</h2>
<p align="center">
<img src="https://huggingface.co/deepseek-ai/assets/fig1.png" style="width: 1000px" align="center" referrerpolicy="no-referrer">
</p>
<p align="center">
<a href="https://huggingface.co/papers/2510.18234">Explore the boundaries of visual-text compression.</a>
</p>
## Usage
Inference using Huggingface transformers on NVIDIA GPUs. Requirements tested on python 3.12.9 + CUDA11.8:
```
torch==2.6.0
transformers==4.46.3
tokenizers==0.20.3
einops
addict
easydict
pip install flash-attn==2.7.3 --no-build-isolation
```
```python
from transformers import AutoModel, AutoTokenizer
import torch
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '0'
model_name = 'deepseek-ai/DeepSeek-OCR'
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, _attn_implementation='flash_attention_2', trust_remote_code=True, use_safetensors=True)
model = model.eval().cuda().to(torch.bfloat16)
# prompt = "<img referrerpolicy="no-referrer">\nFree OCR. "
prompt = "<img referrerpolicy="no-referrer">\n&lt;|grounding|&gt;Convert the document to markdown. "
image_file = 'your_image.jpg'
output_path = 'your/output/dir'
# infer(self, tokenizer, prompt='', image_file='', output_path = ' ', base_size = 1024, image_size = 640, crop_mode = True, test_compress = False, save_results = False):
# Tiny: base_size = 512, image_size = 512, crop_mode = False
# Small: base_size = 640, image_size = 640, crop_mode = False
# Base: base_size = 1024, image_size = 1024, crop_mode = False
# Large: base_size = 1280, image_size = 1280, crop_mode = False
# Gundam: base_size = 1024, image_size = 640, crop_mode = True
res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path = output_path, base_size = 1024, image_size = 640, crop_mode=True, save_results = True, test_compress = True)
```
## vLLM
Refer to [🌟GitHub](https://github.com/deepseek-ai/DeepSeek-OCR/) for guidance on model inference acceleration and PDF processing, etc.<!-- -->
[2025/10/23] 🚀🚀🚀 DeepSeek-OCR is now officially supported in upstream [vLLM](https://docs.vllm.ai/projects/recipes/en/latest/DeepSeek/DeepSeek-OCR.html#installing-vllm).
```shell
uv venv
source .venv/bin/activate
# Until v0.11.1 release, you need to install vLLM from nightly build
uv pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
```
```python
from vllm import LLM, SamplingParams
from vllm.model_executor.models.deepseek_ocr import NGramPerReqLogitsProcessor
from PIL import Image
# Create model instance
llm = LLM(
model="deepseek-ai/DeepSeek-OCR",
enable_prefix_caching=False,
mm_processor_cache_gb=0,
logits_processors=[NGramPerReqLogitsProcessor]
)
# Prepare batched input with your image file
image_1 = Image.open("path/to/your/image_1.png").convert("RGB")
image_2 = Image.open("path/to/your/image_2.png").convert("RGB")
prompt = "<img referrerpolicy="no-referrer">\nFree OCR."
model_input = [
{
"prompt": prompt,
"multi_modal_data": {"image": image_1}
},
{
"prompt": prompt,
"multi_modal_data": {"image": image_2}
}
]
sampling_param = SamplingParams(
temperature=0.0,
max_tokens=8192,
# ngram logit processor args
extra_args=dict(
ngram_size=30,
window_size=90,
whitelist_token_ids={128821, 128822}, # whitelist: ,
),
skip_special_tokens=False,
)
# Generate output
model_outputs = llm.generate(model_input, sampling_param)
# Print output
for output in model_outputs:
print(output.outputs[0].text)
```
## Visualizations
<table>
<tbody><tr>
<td><img src="https://huggingface.co/deepseek-ai/assets/show1.jpg" style="width: 500px" referrerpolicy="no-referrer"></td>
<td><img src="https://huggingface.co/deepseek-ai/assets/show2.jpg" style="width: 500px" referrerpolicy="no-referrer"></td>
</tr>
<tr>
<td><img src="https://huggingface.co/deepseek-ai/assets/show3.jpg" style="width: 500px" referrerpolicy="no-referrer"></td>
<td><img src="https://huggingface.co/deepseek-ai/assets/show4.jpg" style="width: 500px" referrerpolicy="no-referrer"></td>
</tr>
</tbody></table>
## Acknowledgement
We would like to thank [Vary](https://github.com/Ucas-HaoranWei/Vary/), [GOT-OCR2.0](https://github.com/Ucas-HaoranWei/GOT-OCR2.0/), [MinerU](https://github.com/opendatalab/MinerU), [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR), [OneChart](https://github.com/LingyvKong/OneChart), [Slow Perception](https://github.com/Ucas-HaoranWei/Slow-Perception) for their valuable models and ideas.
We also appreciate the benchmarks: [Fox](https://github.com/ucaslcl/Fox), [OminiDocBench](https://github.com/opendatalab/OmniDocBench).
## Citation
```bibtex
@article{wei2025deepseek,
title={DeepSeek-OCR: Contexts Optical Compression},
author={Wei, Haoran and Sun, Yaofeng and Li, Yukun},
journal={arXiv preprint arXiv:2510.18234},
year={2025}
}</description>
<link>https://huggingface.co/deepseek-ai/DeepSeek-OCR</link>
<guid isPermaLink="false">https://huggingface.co/deepseek-ai/DeepSeek-OCR</guid>
<pubDate>Mon, 03 Nov 2025 18:36:12 GMT</pubDate>
</item>
<item>
<title>deepseek-ai/DeepSeek-V3.2-Exp-Base</title>
<description>deepseek-ai/DeepSeek-V3.2-Exp-Base Text Generation • 685B • Updated Oct 9 • 217 • 48
---
license: mit
library_name: transformers
---</description>
<link>https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp-Base</link>
<guid isPermaLink="false">https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp-Base</guid>
<pubDate>Wed, 08 Oct 2025 18:09:46 GMT</pubDate>
</item>
<item>
<title>deepseek-ai/DeepSeek-V3.1-Terminus</title>
<description>deepseek-ai/DeepSeek-V3.1-Terminus Text Generation • 685B • Updated Sep 29 • 81k • • 354
---
license: mit
library_name: transformers
base_model:
- deepseek-ai/DeepSeek-V3.1-Base
---
# DeepSeek-V3.1-Terminus
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" referrerpolicy="no-referrer">
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%A4%96%20Chat-DeepSeek%20V3-536af5?color=536af5&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
<img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&amp;logoColor=white&amp;color=7289da" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
<img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/deepseek-ai/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&amp;color=f5de53" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
## Introduction
This update maintains the model's original capabilities while addressing issues reported by users, including:
- Language consistency: Reducing instances of mixed Chinese-English text and occasional abnormal characters;
- Agent capabilities: Further optimizing the performance of the Code Agent and Search Agent.
| Benchmark | DeepSeek-V3.1 | DeepSeek-V3.1-Terminus |
| :--- | :---: | :---: |
| **Reasoning Mode w/o Tool Use** | | |
| MMLU-Pro | 84.8 | 85.0 |
| GPQA-Diamond | 80.1 | 80.7 |
| Humanity's Last Exam | 15.9 | 21.7 |
| LiveCodeBench | 74.8 | 74.9 |
| Codeforces | 2091 | 2046 |
| Aider-Polyglot | 76.3 | 76.1 |
| **Agentic Tool Use** | | |
| BrowseComp | 30.0 | 38.5 |
| BrowseComp-zh | 49.2 | 45.0 |
| SimpleQA | 93.4 | 96.8 |
| SWE Verified | 66.0 | 68.4 |
| SWE-bench Multilingual | 54.5 | 57.8 |
| Terminal-bench | 31.3 | 36.7 |
**The template and tool-set of search agent have been updated, which is shown in `assets/search_tool_trajectory.html`.**
## How to Run Locally
The model structure of DeepSeek-V3.1-Terminus is the same as DeepSeek-V3. Please visit [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repo for more information about running this model locally.
For the model's chat template other than search agent, please refer to the [DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1) repo.
**Here we also provide an updated inference demo code in the `inference` folder to help the community get started with running our model and understand the details of model architecture.**
**NOTE: In the current model checkpoint, the parameters of `self_attn.o_proj` do not conform to the UE8M0 FP8 scale data format. This is a known issue and will be corrected in future model releases.**
## License
This repository and the model weights are licensed under the [MIT License](LICENSE).
## Citation
```
@misc{deepseekai2024deepseekv3technicalreport,
title={DeepSeek-V3 Technical Report},
author={DeepSeek-AI},
year={2024},
eprint={2412.19437},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.19437},
}
```
## Contact
If you have any questions, please raise an issue or contact us at [service@deepseek.com](service@deepseek.com).
</description>
<link>https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus</link>
<guid isPermaLink="false">https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus</guid>
<pubDate>Sun, 28 Sep 2025 17:52:07 GMT</pubDate>
</item>
<item>
<title>deepseek-ai/DeepSeek-V3.1</title>
<description>deepseek-ai/DeepSeek-V3.1 Text Generation • 685B • Updated Sep 5 • 82.2k • • 808
---
license: mit
library_name: transformers
base_model:
- deepseek-ai/DeepSeek-V3.1-Base
---
# DeepSeek-V3.1
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" referrerpolicy="no-referrer">
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%A4%96%20Chat-DeepSeek%20V3-536af5?color=536af5&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
<img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&amp;logoColor=white&amp;color=7289da" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
<img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/deepseek-ai/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&amp;color=f5de53" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
## Introduction
DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Compared to the previous version, this upgrade brings improvements in multiple aspects:
- **Hybrid thinking mode**: One model supports both thinking mode and non-thinking mode by changing the chat template.
- **Smarter tool calling**: Through post-training optimization, the model's performance in tool usage and agent tasks has significantly improved.
- **Higher thinking efficiency**: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly.
DeepSeek-V3.1 is post-trained on the top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase long context extension approach, following the methodology outlined in the original DeepSeek-V3 report. We have expanded our dataset by collecting additional long documents and substantially extending both training phases. The 32K extension phase has been increased 10-fold to 630B tokens, while the 128K extension phase has been extended by 3.3x to 209B tokens.
Additionally, DeepSeek-V3.1 is trained using the **UE8M0 FP8 scale data format on both model weights and activations** to ensure compatibility with microscaling data formats. Please refer to [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM) for more details.
## Model Downloads
<div align="center">
| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** |
| :------------: | :------------: | :------------: | :------------: | :------------: |
| DeepSeek-V3.1-Base | 671B | 37B | 128K | [HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base) \| [ModelScope](https://modelscope.cn/models/deepseek-ai/DeepSeek-V3.1-Base) |
| DeepSeek-V3.1 | 671B | 37B | 128K | [HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V3.1) \| [ModelScope](https://modelscope.cn/models/deepseek-ai/DeepSeek-V3.1) |
</div>
## Chat Template
The details of our chat template is described in `tokenizer_config.json` and `assets/chat_template.jinja`. Here is a brief description.
### Non-Thinking
#### First-Turn
Prefix:
`&lt;|begin▁of▁sentence|&gt;{system prompt}&lt;|User|&gt;{query}&lt;|Assistant|&gt;`
With the given prefix, DeepSeek V3.1 generates responses to queries in non-thinking mode. Unlike DeepSeek V3, it introduces an additional token ``.
#### Multi-Turn
Context:
`&lt;|begin▁of▁sentence|&gt;{system prompt}&lt;|User|&gt;{query}&lt;|Assistant|&gt;{response}&lt;|end▁of▁sentence|&gt;...&lt;|User|&gt;{query}&lt;|Assistant|&gt;{response}&lt;|end▁of▁sentence|&gt;`
Prefix:
`&lt;|User|&gt;{query}&lt;|Assistant|&gt;`
By concatenating the context and the prefix, we obtain the correct prompt for the query.
### Thinking
#### First-Turn
Prefix:
`&lt;|begin▁of▁sentence|&gt;{system prompt}&lt;|User|&gt;{query}&lt;|Assistant|&gt;<think>`
The prefix of thinking mode is similar to DeepSeek-R1.
#### Multi-Turn
Context:
`&lt;|begin▁of▁sentence|&gt;{system prompt}&lt;|User|&gt;{query}&lt;|Assistant|&gt;</think>{response}&lt;|end▁of▁sentence|&gt;...&lt;|User|&gt;{query}&lt;|Assistant|&gt;{response}&lt;|end▁of▁sentence|&gt;`
Prefix:
`&lt;|User|&gt;{query}&lt;|Assistant|&gt;<think>`
The multi-turn template is the same with non-thinking multi-turn chat template. It means the thinking token in the last turn will be dropped but the `</think>` is retained in every turn of context.
### ToolCall
Toolcall is supported in non-thinking mode. The format is:
`&lt;|begin▁of▁sentence|&gt;{system prompt}\n\n{tool_description}&lt;|User|&gt;{query}&lt;|Assistant|&gt;` where the tool_description is
```
## Tools
You have access to the following tools:
### {tool_name1}
Description: {description}
Parameters: {json.dumps(parameters)}
IMPORTANT: ALWAYS adhere to this exact format for tool use:
&lt;|tool▁calls▁begin|&gt;&lt;|tool▁call▁begin|&gt;tool_call_name&lt;|tool▁sep|&gt;tool_call_arguments&lt;|tool▁call▁end|&gt;{additional_tool_calls}&lt;|tool▁calls▁end|&gt;
Where:
- `tool_call_name` must be an exact match to one of the available tools
- `tool_call_arguments` must be valid JSON that strictly follows the tool's Parameters Schema
- For multiple tool calls, chain them directly without separators or spaces
```
### Code-Agent
We support various code agent frameworks. Please refer to the above toolcall format to create your own code agents. An example is shown in `assets/code_agent_trajectory.html`.
### Search-Agent
We design a specific format for searching toolcall in thinking mode, to support search agent.
For complex questions that require accessing external or up-to-date information, DeepSeek-V3.1 can leverage a user-provided search tool through a multi-turn tool-calling process.
Please refer to the `assets/search_tool_trajectory.html` and `assets/search_python_tool_trajectory.html` for the detailed template.
## Evaluation
| Category | Benchmark (Metric) | DeepSeek V3.1-NonThinking | DeepSeek V3 0324 | DeepSeek V3.1-Thinking | DeepSeek R1 0528
|----------|----------------------------------|-----------------|---|---|---|
| General |
| | MMLU-Redux (EM) | 91.8 | 90.5 | 93.7 | 93.4
| | MMLU-Pro (EM) | 83.7 | 81.2 | 84.8 | 85.0
| | GPQA-Diamond (Pass@1) | 74.9 | 68.4 | 80.1 | 81.0
| | Humanity's Last Exam (Pass@1) | - | - | 15.9 | 17.7
|Search Agent|
| | BrowseComp | - | - | 30.0 | 8.9
| | BrowseComp_zh | - | - | 49.2 | 35.7
| | Humanity's Last Exam (Python + Search) |- | - | 29.8 | 24.8
| | SimpleQA | - | - | 93.4 | 92.3
| Code |
| | LiveCodeBench (2408-2505) (Pass@1) | 56.4 | 43.0 | 74.8 | 73.3
| | Codeforces-Div1 (Rating) | - | - | 2091 | 1930
| | Aider-Polyglot (Acc.) | 68.4 | 55.1 | 76.3 | 71.6
| Code Agent|
| | SWE Verified (Agent mode) | 66.0 | 45.4 | - | 44.6
| | SWE-bench Multilingual (Agent mode) | 54.5 | 29.3 | - | 30.5
| | Terminal-bench (Terminus 1 framework) | 31.3 | 13.3 | - | 5.7
| Math |
| | AIME 2024 (Pass@1) | 66.3 | 59.4 | 93.1 | 91.4
| | AIME 2025 (Pass@1) | 49.8 | 51.3 | 88.4 | 87.5
| | HMMT 2025 (Pass@1) | 33.5 | 29.2 | 84.2 | 79.4 |
Note:
- Search agents are evaluated with our internal search framework, which uses a commercial search API + webpage filter + 128K context window. Seach agent results of R1-0528 are eva... |
http://localhost:1200/huggingface/models/facebook - Success ✔️<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<title>Huggingface facebook Models</title>
<link>https://huggingface.co/facebook/models?sort=created</link>
<atom:link href="http://localhost:1200/huggingface/models/facebook" rel="self" type="application/rss+xml"></atom:link>
<description>Huggingface facebook Models - Powered by RSSHub</description>
<generator>RSSHub</generator>
<webMaster>contact@rsshub.app (RSSHub)</webMaster>
<language>en</language>
<lastBuildDate>Sat, 06 Dec 2025 11:14:55 GMT</lastBuildDate>
<ttl>5</ttl>
<item>
<title>facebook/testing_instructions</title>
<description>facebook/testing_instructions Updated 2 days ago • 2
---
extra_gated_fields:
First Name: text
Last Name: text
Date of birth: date_picker
Country: country
Affiliation: text
I accept the terms and conditions: checkbox
geo: ip_location
Test request?: checkbox
By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox
extra_gated_description: &gt;-
The information you provide will be collected, stored, processed and shared in
accordance with the [Meta Privacy
Policy](https://www.facebook.com/privacy/policy/).
extra_gated_button_content: Submit
extra_gated_heading: "Please be sure to provide your full legal name, date of birth, and full organization name with all corporate identifiers. Avoid the use of acronyms and special characters. Failure to follow these instructions may prevent you from accessing this model and others on Hugging Face. You will not have the ability to edit this form after submission, so please ensure all information is accurate."
language:
- en
tags:
- meta-ai
- meta-pytorch
license: fair-noncommercial-research-license
---</description>
<link>https://huggingface.co/facebook/testing_instructions</link>
<guid isPermaLink="false">https://huggingface.co/facebook/testing_instructions</guid>
<pubDate>Thu, 04 Dec 2025 00:10:57 GMT</pubDate>
</item>
<item>
<title>facebook/omniASR-LLM-7B-ZS</title>
<description>facebook/omniASR-LLM-7B-ZS Automatic Speech Recognition • Updated 8 days ago • 3
---
license: apache-2.0
datasets:
- facebook/omnilingual-asr-corpus
pipeline_tag: automatic-speech-recognition
---
# Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
<div align="center" style="lline-height: 1.2; font-size:16px; margin-bottom: 30px;">
<a href="https://huggingface.co/facebook" target="_blank" style="margin: 2px;">
🤗 Hugging Face
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr" target="_blank" style="margin: 2px;">
🐙 GitHub
</a> |
<a href="https://huggingface.co/spaces/facebook/omniasr-transcriptions" target="_blank" style="margin: 2px;">
🤖️ Demo
</a> |
<a href="https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/" target="_blank" style="margin: 2px;">
📃 Paper
</a> |
<a href="https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/" target="_blank" style="margin: 2px;">
📝 Blogpost
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr/blob/main/LICENSE" style="margin: 2px;">
📄 License: Apache 2.0
</a>
</div>
# Model Card for omniASR-LLM-7B-ZS
## Model Description
This model is part of the **Omnilingual ASR** family released by Meta AI. The original suite includes:
<!-- TODO : add new tokenizer, we'll get two tokenizer, add mssing speed numbers-->
| Model Name | Features | Parameters | Download Size (FP32) | Inference VRAM¹ | Real-Time Factor¹ (relative speed)² |
|---------------------|---------------|------------:|---------------:|---------------:|-----------:|
| [`omniASR_W2V_300M`](https://huggingface.co/facebook/omniASR-W2V-300M) | SSL | 317_390_592 | 1.2 GiB | | |
| [`omniASR_W2V_1B`](https://huggingface.co/facebook/omniASR-W2V-1B) | SSL | 965_514_752 | 3.6 GiB | | |
| [`omniASR_W2V_3B`](https://huggingface.co/facebook/omniASR-W2V-3B) | SSL | 3_064_124_672 | 12.0 GiB | | |
| [`omniASR_W2V_7B`](https://huggingface.co/facebook/omniASR-W2V-7B) | SSL | 6_488_487_168 | 25.0 GiB | | |
| [`omniASR_CTC_300M`](https://huggingface.co/facebook/omniASR-CTC-300M) | ASR | 325_494_996 | 1.3 GiB | ~2 GiB | 0.001 (96x) |
| [`omniASR_CTC_1B`](https://huggingface.co/facebook/omniASR-CTC-1B) | ASR | 975_065_300 | 3.7 GiB | ~3 GiB | 0.002 (48x) |
| [`omniASR_CTC_3B`](https://huggingface.co/facebook/omniASR-CTC-3B) | ASR | 3_080_423_636 | 12.0 GiB | ~8 GiB | 0.003 (32x) |
| [`omniASR_CTC_7B`](https://huggingface.co/facebook/omniASR-CTC-7B) | ASR | 6_504_786_132 | 25.0 GiB | ~15 GiB | 0.006 (16x) |
| [`omniASR_LLM_300M`](https://huggingface.co/facebook/omniASR-LLM-300M) | ASR with optional language conditioning | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.090 (~1x) |
| [`omniASR_LLM_1B`](https://huggingface.co/facebook/omniASR-LLM-1B) | ASR with optional language conditioning | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.091 (~1x) |
| [`omniASR_LLM_3B`](https://huggingface.co/facebook/omniASR-LLM-3B) | ASR with optional language conditioning | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.093 (~1x) |
| [`omniASR_LLM_7B`](https://huggingface.co/facebook/omniASR-LLM-7B) | ASR with optional language conditioning | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.092 (~1x) |
| [`omniASR_LLM_7B_ZS`](https://huggingface.co/facebook/omniASR-LLM-7B-ZS) | Zero-Shot ASR | 7_810_900_608 | 30.0 GiB | ~20 GiB |&nbsp;0.194 (~0.5x) |
¹ (batch=1, audio_len=30s, BF16, A100)
² Relative speed to `omniASR_LLM_7B`
---
## Installation
The models were developed using [fairseq2](https://github.com/facebookresearch/fairseq2), a research-focused sequence modeling toolkit. While we provide a **reference** inference pipeline that works across platforms, audio support requires [libsndfile](https://github.com/facebookresearch/fairseq2?tab=readme-ov-file#system-dependencies) (Mac: `brew install libsndfile`; Windows may need an additional [setup](https://github.com/facebookresearch/fairseq2?tab=readme-ov-file#installing-on-windows)).
```bash
# using pip
pip install omnilingual-asr
# using uv
uv add omnilingual-asr
```
## Inference
```python
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)
```
## Supported Languages
To view the full list of 1600+ supported languages, you can access the language list [programmatically](/src/omnilingual_asr/models/wav2vec2_llama/lang_ids.py):
```python
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)
# Check if a specific language is supported
if "eng_Latn" in supported_langs:
print("English (Latin script) is supported!")
```
Languages follow the format `{language_code}_{script}`, for example `eng_Latn` - English (Latin script), `cmn_Hans` - Mandarin Chinese (Simplified), ...
---
## Training
To further finetune the released checkpoints on your own data, use our [data preparation guide](/workflows/dataprep/README.md) followed by the [finetuning recipe guide](/workflows/recipes/wav2vec2/asr/README.md).
---
## Citation
**BibTeX:**
```bibtex
@misc{omnilingualasr2025,
title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Can, Balioglu and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
year={2025},
url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}
[1]: https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file "GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages"
|
| Model Name | Features | Parameters | Download Size (FP32) | Inference VRAM¹ | Real-Time Factor¹ (relative speed)² |
|---|---|---|---|---|---|
omniASR_W2V_300M |
SSL | 317_390_592 | 1.2 GiB | ||
omniASR_W2V_1B |
SSL | 965_514_752 | 3.6 GiB | ||
omniASR_W2V_3B |
SSL | 3_064_124_672 | 12.0 GiB | ||
omniASR_W2V_7B |
SSL | 6_488_487_168 | 25.0 GiB | ||
omniASR_CTC_300M |
ASR | 325_494_996 | 1.3 GiB | ~2 GiB | 0.001 (96x) |
omniASR_CTC_1B |
ASR | 975_065_300 | 3.7 GiB | ~3 GiB | 0.002 (48x) |
omniASR_CTC_3B |
ASR | 3_080_423_636 | 12.0 GiB | ~8 GiB | 0.003 (32x) |
omniASR_CTC_7B |
ASR | 6_504_786_132 | 25.0 GiB | ~15 GiB | 0.006 (16x) |
omniASR_LLM_300M |
ASR with optional language conditioning | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.090 (~1x) |
omniASR_LLM_1B |
ASR with optional language conditioning | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.091 (~1x) |
omniASR_LLM_3B |
ASR with optional language conditioning | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.093 (~1x) |
omniASR_LLM_7B |
ASR with optional language conditioning | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.092 (~1x) |
omniASR_LLM_7B_ZS |
Zero-Shot ASR | 7_810_900_608 | 30.0 GiB | ~20 GiB | 0.194 (~0.5x) |
| ¹ (batch=1, audio_len=30s, BF16, A100) | |||||
² Relative speed to omniASR_LLM_7B |
Installation
The models were developed using fairseq2, a research-focused sequence modeling toolkit. While we provide a reference inference pipeline that works across platforms, audio support requires libsndfile (Mac: brew install libsndfile; Windows may need an additional setup).
# using pip
pip install omnilingual-asr
# using uv
uv add omnilingual-asrInference
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)Supported Languages
To view the full list of 1600+ supported languages, you can access the language list programmatically:
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)
# Check if a specific language is supported
if "eng_Latn" in supported_langs:
print("English (Latin script) is supported!")Languages follow the format {language_code}_{script}, for example eng_Latn - English (Latin script), cmn_Hans - Mandarin Chinese (Simplified), ...
Training
To further finetune the released checkpoints on your own data, use our data preparation guide followed by the finetuning recipe guide.
Citation
BibTeX:
@misc{omnilingualasr2025,
title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Can, Balioglu and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
year={2025},
url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}- Developed by: Meta AI / Omnilingual ASR Team([GitHub][1])
- Model type: End-to-end automatic speech recognition model (wav2vec2-style encoder with CTC head / encoder-decoder, depending on checkpoint).
- Language(s) (NLP): 1,600+ languages overall in Omnilingual ASR; this corpus release specifically covers 348 under-served languages across many writing systems (Latin, Arabic, Devanagari, etc.).([GitHub][1])
- License: Apache-2.0 (for the model and code), CC-BY-4.0 for the
facebook/omnilingual-asr-corpusdataset.([GitHub][1])
[1]: https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file "GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages"
[2]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus/blob/main/README.md "README.md · facebook/omnilingual-asr-corpus at main"
[3]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus?utm_source=chatgpt.com "facebook/omnilingual-asr-corpus · Datasets at ..."
[4]: https://venturebeat.com/ai/meta-returns-to-open-source-ai-with-omnilingual-asr-models-that-can?utm_source=chatgpt.com "Meta returns to open source AI with Omnilingual ASR ..."
[5]: https://huggingface.co/spaces/facebook/omniasr-transcriptions?utm_source=chatgpt.com "Omnilingual ASR Media Transcription"
[6]: https://huggingface.co/collections/bezzam/omnilingual-asr-1-600-languages?utm_source=chatgpt.com "Omnilingual ASR (1600+ Languages) - a bezzam Collection"
https://huggingface.co/facebook/omniASR-LLM-7Bhttps://huggingface.co/facebook/omniASR-LLM-7BThu, 27 Nov 2025 23:26:06 GMT<title>facebook/omniASR-LLM-3B</title>facebook/omniASR-LLM-3B Automatic Speech Recognition • Updated 8 days ago • 1
license: apache-2.0
datasets:
- facebook/omnilingual-asr-corpus
pipeline_tag: automatic-speech-recognition
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
<div align="center" style="lline-height: 1.2; font-size:16px; margin-bottom: 30px;">
<a href="https://huggingface.co/facebook" target="_blank" style="margin: 2px;">
🤗 Hugging Face
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr" target="_blank" style="margin: 2px;">
🐙 GitHub
</a> |
<a href="https://huggingface.co/spaces/facebook/omniasr-transcriptions" target="_blank" style="margin: 2px;">
🤖️ Demo
</a> |
<a href="https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/" target="_blank" style="margin: 2px;">
📃 Paper
</a> |
<a href="https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/" target="_blank" style="margin: 2px;">
📝 Blogpost
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr/blob/main/LICENSE" style="margin: 2px;">
📄 License: Apache 2.0
</a>
</div>
Model Card for omniASR-LLM-3B
Model Description
This model is part of the Omnilingual ASR family released by Meta AI. The original suite includes:
<!-- TODO : add new tokenizer, we'll get two tokenizer, add mssing speed numbers-->
| Model Name | Features | Parameters | Download Size (FP32) | Inference VRAM¹ | Real-Time Factor¹ (relative speed)² |
|---|---|---|---|---|---|
omniASR_W2V_300M |
SSL | 317_390_592 | 1.2 GiB | ||
omniASR_W2V_1B |
SSL | 965_514_752 | 3.6 GiB | ||
omniASR_W2V_3B |
SSL | 3_064_124_672 | 12.0 GiB | ||
omniASR_W2V_7B |
SSL | 6_488_487_168 | 25.0 GiB | ||
omniASR_CTC_300M |
ASR | 325_494_996 | 1.3 GiB | ~2 GiB | 0.001 (96x) |
omniASR_CTC_1B |
ASR | 975_065_300 | 3.7 GiB | ~3 GiB | 0.002 (48x) |
omniASR_CTC_3B |
ASR | 3_080_423_636 | 12.0 GiB | ~8 GiB | 0.003 (32x) |
omniASR_CTC_7B |
ASR | 6_504_786_132 | 25.0 GiB | ~15 GiB | 0.006 (16x) |
omniASR_LLM_300M |
ASR with optional language conditioning | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.090 (~1x) |
omniASR_LLM_1B |
ASR with optional language conditioning | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.091 (~1x) |
omniASR_LLM_3B |
ASR with optional language conditioning | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.093 (~1x) |
omniASR_LLM_7B |
ASR with optional language conditioning | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.092 (~1x) |
omniASR_LLM_7B_ZS |
Zero-Shot ASR | 7_810_900_608 | 30.0 GiB | ~20 GiB | 0.194 (~0.5x) |
| ¹ (batch=1, audio_len=30s, BF16, A100) | |||||
² Relative speed to omniASR_LLM_7B |
Installation
The models were developed using fairseq2, a research-focused sequence modeling toolkit. While we provide a reference inference pipeline that works across platforms, audio support requires libsndfile (Mac: brew install libsndfile; Windows may need an additional setup).
# using pip
pip install omnilingual-asr
# using uv
uv add omnilingual-asrInference
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)Supported Languages
To view the full list of 1600+ supported languages, you can access the language list programmatically:
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)
# Check if a specific language is supported
if "eng_Latn" in supported_langs:
print("English (Latin script) is supported!")Languages follow the format {language_code}_{script}, for example eng_Latn - English (Latin script), cmn_Hans - Mandarin Chinese (Simplified), ...
Training
To further finetune the released checkpoints on your own data, use our data preparation guide followed by the finetuning recipe guide.
Citation
BibTeX:
@misc{omnilingualasr2025,
title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Can, Balioglu and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
year={2025},
url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}- Developed by: Meta AI / Omnilingual ASR Team([GitHub][1])
- Model type: End-to-end automatic speech recognition model (wav2vec2-style encoder with CTC head / encoder-decoder, depending on checkpoint).
- Language(s) (NLP): 1,600+ languages overall in Omnilingual ASR; this corpus release specifically covers 348 under-served languages across many writing systems (Latin, Arabic, Devanagari, etc.).([GitHub][1])
- License: Apache-2.0 (for the model and code), CC-BY-4.0 for the
facebook/omnilingual-asr-corpusdataset.([GitHub][1])
[1]: https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file "GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages"
[2]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus/blob/main/README.md "README.md · facebook/omnilingual-asr-corpus at main"
[3]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus?utm_source=chatgpt.com "facebook/omnilingual-asr-corpus · Datasets at ..."
[4]: https://venturebeat.com/ai/meta-returns-to-open-source-ai-with-omnilingual-asr-models-that-can?utm_source=chatgpt.com "Meta returns to open source AI with Omnilingual ASR ..."
[5]: https://huggingface.co/spaces/facebook/omniasr-transcriptions?utm_source=chatgpt.com "Omnilingual ASR Media Transcription"
[6]: https://huggingface.co/collections/bezzam/omnilingual-asr-1-600-languages?utm_source=chatgpt.com "Omnilingual ASR (1600+ Languages) - a bezzam Collection"
https://huggingface.co/facebook/omniASR-LLM-3Bhttps://huggingface.co/facebook/omniASR-LLM-3BThu, 27 Nov 2025 23:08:56 GMT<title>facebook/omniASR-LLM-1B</title>facebook/omniASR-LLM-1B Automatic Speech Recognition • Updated 8 days ago • 3
license: apache-2.0
datasets:
- facebook/omnilingual-asr-corpus
pipeline_tag: automatic-speech-recognition
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
<div align="center" style="lline-height: 1.2; font-size:16px; margin-bottom: 30px;">
<a href="https://huggingface.co/facebook" target="_blank" style="margin: 2px;">
🤗 Hugging Face
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr" target="_blank" style="margin: 2px;">
🐙 GitHub
</a> |
<a href="https://huggingface.co/spaces/facebook/omniasr-transcriptions" target="_blank" style="margin: 2px;">
🤖️ Demo
</a> |
<a href="https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/" target="_blank" style="margin: 2px;">
📃 Paper
</a> |
<a href="https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/" target="_blank" style="margin: 2px;">
📝 Blogpost
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr/blob/main/LICENSE" style="margin: 2px;">
📄 License: Apache 2.0
</a>
</div>
Model Card for omniASR-LLM-1B
Model Description
This model is part of the Omnilingual ASR family released by Meta AI. The original suite includes:
<!-- TODO : add new tokenizer, we'll get two tokenizer, add mssing speed numbers-->
| Model Name | Features | Parameters | Download Size (FP32) | Inference VRAM¹ | Real-Time Factor¹ (relative speed)² |
|---|---|---|---|---|---|
omniASR_W2V_300M |
SSL | 317_390_592 | 1.2 GiB | ||
omniASR_W2V_1B |
SSL | 965_514_752 | 3.6 GiB | ||
omniASR_W2V_3B |
SSL | 3_064_124_672 | 12.0 GiB | ||
omniASR_W2V_7B |
SSL | 6_488_487_168 | 25.0 GiB | ||
omniASR_CTC_300M |
ASR | 325_494_996 | 1.3 GiB | ~2 GiB | 0.001 (96x) |
omniASR_CTC_1B |
ASR | 975_065_300 | 3.7 GiB | ~3 GiB | 0.002 (48x) |
omniASR_CTC_3B |
ASR | 3_080_423_636 | 12.0 GiB | ~8 GiB | 0.003 (32x) |
omniASR_CTC_7B |
ASR | 6_504_786_132 | 25.0 GiB | ~15 GiB | 0.006 (16x) |
omniASR_LLM_300M |
ASR with optional language conditioning | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.090 (~1x) |
omniASR_LLM_1B |
ASR with optional language conditioning | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.091 (~1x) |
omniASR_LLM_3B |
ASR with optional language conditioning | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.093 (~1x) |
omniASR_LLM_7B |
ASR with optional language conditioning | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.092 (~1x) |
omniASR_LLM_7B_ZS |
Zero-Shot ASR | 7_810_900_608 | 30.0 GiB | ~20 GiB | 0.194 (~0.5x) |
| ¹ (batch=1, audio_len=30s, BF16, A100) | |||||
² Relative speed to omniASR_LLM_7B |
Installation
The models were developed using fairseq2, a research-focused sequence modeling toolkit. While we provide a reference inference pipeline that works across platforms, audio support requires libsndfile (Mac: brew install libsndfile; Windows may need an additional setup).
# using pip
pip install omnilingual-asr
# using uv
uv add omnilingual-asrInference
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)Supported Languages
To view the full list of 1600+ supported languages, you can access the language list programmatically:
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)
# Check if a specific language is supported
if "eng_Latn" in supported_langs:
print("English (Latin script) is supported!")Languages follow the format {language_code}_{script}, for example eng_Latn - English (Latin script), cmn_Hans - Mandarin Chinese (Simplified), ...
Training
To further finetune the released checkpoints on your own data, use our data preparation guide followed by the finetuning recipe guide.
Citation
BibTeX:
@misc{omnilingualasr2025,
title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Can, Balioglu and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
year={2025},
url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}- Developed by: Meta AI / Omnilingual ASR Team([GitHub][1])
- Model type: End-to-end automatic speech recognition model (wav2vec2-style encoder with CTC head / encoder-decoder, depending on checkpoint).
- Language(s) (NLP): 1,600+ languages overall in Omnilingual ASR; this corpus release specifically covers 348 under-served languages across many writing systems (Latin, Arabic, Devanagari, etc.).([GitHub][1])
- License: Apache-2.0 (for the model and code), CC-BY-4.0 for the
facebook/omnilingual-asr-corpusdataset.([GitHub][1])
[1]: https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file "GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages"
[2]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus/blob/main/README.md "README.md · facebook/omnilingual-asr-corpus at main"
[3]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus?utm_source=chatgpt.com "facebook/omnilingual-asr-corpus · Datasets at ..."
[4]: https://venturebeat.com/ai/meta-returns-to-open-source-ai-with-omnilingual-asr-models-that-can?utm_source=chatgpt.com "Meta returns to open source AI with Omnilingual ASR ..."
[5]: https://huggingface.co/spaces/facebook/omniasr-transcriptions?utm_source=chatgpt.com "Omnilingual ASR Media Transcription"
[6]: https://huggingface.co/collections/bezzam/omnilingual-asr-1-600-languages?utm_source=chatgpt.com "Omnilingual ASR (1600+ Languages) - a bezzam Collection"
https://huggingface.co/facebook/omniASR-LLM-1Bhttps://huggingface.co/facebook/omniASR-LLM-1BThu, 27 Nov 2025 22:37:27 GMT<title>facebook/omniASR-LLM-300M</title>facebook/omniASR-LLM-300M Automatic Speech Recognition • Updated 8 days ago • 3
license: apache-2.0
datasets:
- facebook/omnilingual-asr-corpus
pipeline_tag: automatic-speech-recognition
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
<div align="center" style="lline-height: 1.2; font-size:16px; margin-bottom: 30px;">
<a href="https://huggingface.co/facebook" target="_blank" style="margin: 2px;">
🤗 Hugging Face
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr" target="_blank" style="margin: 2px;">
🐙 GitHub
</a> |
<a href="https://huggingface.co/spaces/facebook/omniasr-transcriptions" target="_blank" style="margin: 2px;">
🤖️ Demo
</a> |
<a href="https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/" target="_blank" style="margin: 2px;">
📃 Paper
</a> |
<a href="https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/" target="_blank" style="margin: 2px;">
📝 Blogpost
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr/blob/main/LICENSE" style="margin: 2px;">
📄 License: Apache 2.0
</a>
</div>
Model Card for omniASR-LLM-300M
Model Description
This model is part of the Omnilingual ASR family released by Meta AI. The original suite includes:
<!-- TODO : add new tokenizer, we'll get two tokenizer, add mssing speed numbers-->
| Model Name | Features | Parameters | Download Size (FP32) | Inference VRAM¹ | Real-Time Factor¹ (relative speed)² |
|---|---|---|---|---|---|
omniASR_W2V_300M |
SSL | 317_390_592 | 1.2 GiB | ||
omniASR_W2V_1B |
SSL | 965_514_752 | 3.6 GiB | ||
omniASR_W2V_3B |
SSL | 3_064_124_672 | 12.0 GiB | ||
omniASR_W2V_7B |
SSL | 6_488_487_168 | 25.0 GiB | ||
omniASR_CTC_300M |
ASR | 325_494_996 | 1.3 GiB | ~2 GiB | 0.001 (96x) |
omniASR_CTC_1B |
ASR | 975_065_300 | 3.7 GiB | ~3 GiB | 0.002 (48x) |
omniASR_CTC_3B |
ASR | 3_080_423_636 | 12.0 GiB | ~8 GiB | 0.003 (32x) |
omniASR_CTC_7B |
ASR | 6_504_786_132 | 25.0 GiB | ~15 GiB | 0.006 (16x) |
omniASR_LLM_300M |
ASR with optional language conditioning | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.090 (~1x) |
omniASR_LLM_1B |
ASR with optional language conditioning | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.091 (~1x) |
omniASR_LLM_3B |
ASR with optional language conditioning | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.093 (~1x) |
omniASR_LLM_7B |
ASR with optional language conditioning | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.092 (~1x) |
omniASR_LLM_7B_ZS |
Zero-Shot ASR | 7_810_900_608 | 30.0 GiB | ~20 GiB | 0.194 (~0.5x) |
| ¹ (batch=1, audio_len=30s, BF16, A100) | |||||
² Relative speed to omniASR_LLM_7B |
Installation
The models were developed using fairseq2, a research-focused sequence modeling toolkit. While we provide a reference inference pipeline that works across platforms, audio support requires libsndfile (Mac: brew install libsndfile; Windows may need an additional setup).
# using pip
pip install omnilingual-asr
# using uv
uv add omnilingual-asrInference
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)Supported Languages
To view the full list of 1600+ supported languages, you can access the language list programmatically:
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)
# Check if a specific language is supported
if "eng_Latn" in supported_langs:
print("English (Latin script) is supported!")Languages follow the format {language_code}_{script}, for example eng_Latn - English (Latin script), cmn_Hans - Mandarin Chinese (Simplified), ...
Training
To further finetune the released checkpoints on your own data, use our data preparation guide followed by the finetuning recipe guide.
Citation
BibTeX:
@misc{omnilingualasr2025,
title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Can, Balioglu and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
year={2025},
url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}- Developed by: Meta AI / Omnilingual ASR Team([GitHub][1])
- Model type: End-to-end automatic speech recognition model (wav2vec2-style encoder with CTC head / encoder-decoder, depending on checkpoint).
- Language(s) (NLP): 1,600+ languages overall in Omnilingual ASR; this corpus release specifically covers 348 under-served languages across many writing systems (Latin, Arabic, Devanagari, etc.).([GitHub][1])
- License: Apache-2.0 (for the model and code), CC-BY-4.0 for the
facebook/omnilingual-asr-corpusdataset.([GitHub][1])
[1]: https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file "GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages"
[2]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus/blob/main/README.md "README.md · facebook/omnilingual-asr-corpus at main"
[3]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus?utm_source=chatgpt.com "facebook/omnilingual-asr-corpus · Datasets at ..."
[4]: https://venturebeat.com/ai/meta-returns-to-open-source-ai-with-omnilingual-asr-models-that-can?utm_source=chatgpt.com "Meta returns to open source AI with Omnilingual ASR ..."
[5]: https://huggingface.co/spaces/facebook/omniasr-transcriptions?utm_source=chatgpt.com "Omnilingual ASR Media Transcription"
[6]: https://huggingface.co/collections/bezzam/omnilingual-asr-1-600-languages?utm_source=chatgpt.com "Omnilingual ASR (1600+ Languages) - a bezzam Collection"
https://huggingface.co/facebook/omniASR-LLM-300Mhttps://huggingface.co/facebook/omniASR-LLM-300MThu, 27 Nov 2025 22:04:09 GMT<title>facebook/omniASR-CTC-7B</title>facebook/omniASR-CTC-7B Automatic Speech Recognition • Updated 8 days ago • 1
license: apache-2.0
datasets:
- facebook/omnilingual-asr-corpus
pipeline_tag: automatic-speech-recognition
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
<div align="center" style="lline-height: 1.2; font-size:16px; margin-bottom: 30px;">
<a href="https://huggingface.co/facebook" target="_blank" style="margin: 2px;">
🤗 Hugging Face
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr" target="_blank" style="margin: 2px;">
🐙 GitHub
</a> |
<a href="https://huggingface.co/spaces/facebook/omniasr-transcriptions" target="_blank" style="margin: 2px;">
🤖️ Demo
</a> |
<a href="https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/" target="_blank" style="margin: 2px;">
📃 Paper
</a> |
<a href="https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/" target="_blank" style="margin: 2px;">
📝 Blogpost
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr/blob/main/LICENSE" style="margin: 2px;">
📄 License: Apache 2.0
</a>
</div>
Model Card for omniASR-CTC-7B
Model Description
This model is part of the Omnilingual ASR family released by Meta AI. The original suite includes:
<!-- TODO : add new tokenizer, we'll get two tokenizer, add mssing speed numbers-->
| Model Name | Features | Parameters | Download Size (FP32) | Inference VRAM¹ | Real-Time Factor¹ (relative speed)² |
|---|---|---|---|---|---|
omniASR_W2V_300M |
SSL | 317_390_592 | 1.2 GiB | ||
omniASR_W2V_1B |
SSL | 965_514_752 | 3.6 GiB | ||
omniASR_W2V_3B |
SSL | 3_064_124_672 | 12.0 GiB | ||
omniASR_W2V_7B |
SSL | 6_488_487_168 | 25.0 GiB | ||
omniASR_CTC_300M |
ASR | 325_494_996 | 1.3 GiB | ~2 GiB | 0.001 (96x) |
omniASR_CTC_1B |
ASR | 975_065_300 | 3.7 GiB | ~3 GiB | 0.002 (48x) |
omniASR_CTC_3B |
ASR | 3_080_423_636 | 12.0 GiB | ~8 GiB | 0.003 (32x) |
omniASR_CTC_7B |
ASR | 6_504_786_132 | 25.0 GiB | ~15 GiB | 0.006 (16x) |
omniASR_LLM_300M |
ASR with optional language conditioning | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.090 (~1x) |
omniASR_LLM_1B |
ASR with optional language conditioning | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.091 (~1x) |
omniASR_LLM_3B |
ASR with optional language conditioning | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.093 (~1x) |
omniASR_LLM_7B |
ASR with optional language conditioning | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.092 (~1x) |
omniASR_LLM_7B_ZS |
Zero-Shot ASR | 7_810_900_608 | 30.0 GiB | ~20 GiB | 0.194 (~0.5x) |
| ¹ (batch=1, audio_len=30s, BF16, A100) | |||||
² Relative speed to omniASR_LLM_7B |
Installation
The models were developed using fairseq2, a research-focused sequence modeling toolkit. While we provide a reference inference pipeline that works across platforms, audio support requires libsndfile (Mac: brew install libsndfile; Windows may need an additional setup).
# using pip
pip install omnilingual-asr
# using uv
uv add omnilingual-asrInference
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)Supported Languages
To view the full list of 1600+ supported languages, you can access the language list programmatically:
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)
# Check if a specific language is supported
if "eng_Latn" in supported_langs:
print("English (Latin script) is supported!")Languages follow the format {language_code}_{script}, for example eng_Latn - English (Latin script), cmn_Hans - Mandarin Chinese (Simplified), ...
Training
To further finetune the released checkpoints on your own data, use our data preparation guide followed by the finetuning recipe guide.
Citation
BibTeX:
@misc{omnilingualasr2025,
title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Can, Balioglu and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
year={2025},
url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}- Developed by: Meta AI / Omnilingual ASR Team([GitHub][1])
- Model type: End-to-end automatic speech recognition model (wav2vec2-style encoder with CTC head / encoder-decoder, depending on checkpoint).
- Language(s) (NLP): 1,600+ languages overall in Omnilingual ASR; this corpus release specifically covers 348 under-served languages across many writing systems (Latin, Arabic, Devanagari, etc.).([GitHub][1])
- License: Apache-2.0 (for the model and code), CC-BY-4.0 for the
facebook/omnilingual-asr-corpusdataset.([GitHub][1])
[1]: https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file "GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages"
[2]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus/blob/main/README.md "README.md · facebook/omnilingual-asr-corpus at main"
[3]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus?utm_source=chatgpt.com "facebook/omnilingual-asr-corpus · Datasets at ..."
[4]: https://venturebeat.com/ai/meta-returns-to-open-source-ai-with-omnilingual-asr-models-that-can?utm_source=chatgpt.com "Meta returns to open source AI with Omnilingual ASR ..."
[5]: https://huggingface.co/spaces/facebook/omniasr-transcriptions?utm_source=chatgpt.com "Omnilingual ASR Media Transcription"
[6]: https://huggingface.co/collections/bezzam/omnilingual-asr-1-600-languages?utm_source=chatgpt.com "Omnilingual ASR (1600+ Languages) - a bezzam Collection"
https://huggingface.co/facebook/omniASR-CTC-7Bhttps://huggingface.co/facebook/omniASR-CTC-7BThu, 27 Nov 2025 21:31:20 GMT<title>facebook/omniASR-CTC-3B</title>facebook/omniASR-CTC-3B Automatic Speech Recognition • Updated 8 days ago • 2
license: apache-2.0
datasets:
- facebook/omnilingual-asr-corpus
pipeline_tag: automatic-speech-recognition
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
<div align="center" style="lline-height: 1.2; font-size:16px; margin-bottom: 30px;">
<a href="https://huggingface.co/facebook" target="_blank" style="margin: 2px;">
🤗 Hugging Face
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr" target="_blank" style="margin: 2px;">
🐙 GitHub
</a> |
<a href="https://huggingface.co/spaces/facebook/omniasr-transcriptions" target="_blank" style="margin: 2px;">
🤖️ Demo
</a> |
<a href="https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/" target="_blank" style="margin: 2px;">
📃 Paper
</a> |
<a href="https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/" target="_blank" style="margin: 2px;">
📝 Blogpost
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr/blob/main/LICENSE" style="margin: 2px;">
📄 License: Apache 2.0
</a>
</div>
Model Card for omniASR-CTC-3B
Model Description
This model is part of the Omnilingual ASR family released by Meta AI. The original suite includes:
<!-- TODO : add new tokenizer, we'll get two tokenizer, add mssing speed numbers-->
| Model Name | Features | Parameters | Download Size (FP32) | Inference VRAM¹ | Real-Time Factor¹ (relative speed)² |
|---|---|---|---|---|---|
omniASR_W2V_300M |
SSL | 317_390_592 | 1.2 GiB | ||
omniASR_W2V_1B |
SSL | 965_514_752 | 3.6 GiB | ||
omniASR_W2V_3B |
SSL | 3_064_124_672 | 12.0 GiB | ||
omniASR_W2V_7B |
SSL | 6_488_487_168 | 25.0 GiB | ||
omniASR_CTC_300M |
ASR | 325_494_996 | 1.3 GiB | ~2 GiB | 0.001 (96x) |
omniASR_CTC_1B |
ASR | 975_065_300 | 3.7 GiB | ~3 GiB | 0.002 (48x) |
omniASR_CTC_3B |
ASR | 3_080_423_636 | 12.0 GiB | ~8 GiB | 0.003 (32x) |
omniASR_CTC_7B |
ASR | 6_504_786_132 | 25.0 GiB | ~15 GiB | 0.006 (16x) |
omniASR_LLM_300M |
ASR with optional language conditioning | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.090 (~1x) |
omniASR_LLM_1B |
ASR with optional language conditioning | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.091 (~1x) |
omniASR_LLM_3B |
ASR with optional language conditioning | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.093 (~1x) |
omniASR_LLM_7B |
ASR with optional language conditioning | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.092 (~1x) |
omniASR_LLM_7B_ZS |
Zero-Shot ASR | 7_810_900_608 | 30.0 GiB | ~20 GiB | 0.194 (~0.5x) |
| ¹ (batch=1, audio_len=30s, BF16, A100) | |||||
² Relative speed to omniASR_LLM_7B |
Installation
The models were developed using fairseq2, a research-focused sequence modeling toolkit. While we provide a reference inference pipeline that works across platforms, audio support requires libsndfile (Mac: brew install libsndfile; Windows may need an additional setup).
# using pip
pip install omnilingual-asr
# using uv
uv add omnilingual-asrInference
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)Supported Languages
To view the full list of 1600+ supported languages, you can access the language list programmatically:
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)
# Check if a specific language is supported
if "eng_Latn" in supported_langs:
print("English (Latin script) is supported!")Languages follow the format {language_code}_{script}, for example eng_Latn - English (Latin script), cmn_Hans - Mandarin Chinese (Simplified), ...
Training
To further finetune the released checkpoints on your own data, use our data preparation guide followed by the finetuning recipe guide.
Citation
BibTeX:
@misc{omnilingualasr2025,
title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Can, Balioglu and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
year={2025},
url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}- Developed by: Meta AI / Omnilingual ASR Team([GitHub][1])
- Model type: End-to-end automatic speech recognition model (wav2vec2-style encoder with CTC head / encoder-decoder, depending on checkpoint).
- Language(s) (NLP): 1,600+ languages overall in Omnilingual ASR; this corpus release specifically covers 348 under-served languages across many writing systems (Latin, Arabic, Devanagari, etc.).([GitHub][1])
- License: Apache-2.0 (for the model and code), CC-BY-4.0 for the
facebook/omnilingual-asr-corpusdataset.([GitHub][1])
[1]: https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file "GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages"
[2]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus/blob/main/README.md "README.md · facebook/omnilingual-asr-corpus at main"
[3]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus?utm_source=chatgpt.com "facebook/omnilingual-asr-corpus · Datasets at ..."
[4]: https://venturebeat.com/ai/meta-returns-to-open-source-ai-with-omnilingual-asr-models-that-can?utm_source=chatgpt.com "Meta returns to open source AI with Omnilingual ASR ..."
[5]: https://huggingface.co/spaces/facebook/omniasr-transcriptions?utm_source=chatgpt.com "Omnilingual ASR Media Transcription"
[6]: https://huggingface.co/collections/bezzam/omnilingual-asr-1-600-languages?utm_source=chatgpt.com "Omnilingual ASR (1600+ Languages) - a bezzam Collection"
https://huggingface.co/facebook/omniASR-CTC-3Bhttps://huggingface.co/facebook/omniASR-CTC-3BThu, 27 Nov 2025 15:47:08 GMT<title>facebook/omniASR-CTC-1B</title>facebook/omniASR-CTC-1B Automatic Speech Recognition • Updated 8 days ago • 1
license: apache-2.0
datasets:
- facebook/omnilingual-asr-corpus
pipeline_tag: automatic-speech-recognition
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
<div align="center" style="lline-height: 1.2; font-size:16px; margin-bottom: 30px;">
<a href="https://huggingface.co/facebook" target="_blank" style="margin: 2px;">
🤗 Hugging Face
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr" target="_blank" style="margin: 2px;">
🐙 GitHub
</a> |
<a href="https://huggingface.co/spaces/facebook/omniasr-transcriptions" target="_blank" style="margin: 2px;">
🤖️ Demo
</a> |
<a href="https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/" target="_blank" style="margin: 2px;">
📃 Paper
</a> |
<a href="https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/" target="_blank" style="margin: 2px;">
📝 Blogpost
</a> |
<a href="https://github.com/faceb
</details>
...
http://localhost:1200/huggingface/models/ianyang02 - Success ✔️<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<title>Huggingface ianyang02 Models</title>
<link>https://huggingface.co/ianyang02/models?sort=created</link>
<atom:link href="http://localhost:1200/huggingface/models/ianyang02" rel="self" type="application/rss+xml"></atom:link>
<description>Huggingface ianyang02 Models - Powered by RSSHub</description>
<generator>RSSHub</generator>
<webMaster>contact@rsshub.app (RSSHub)</webMaster>
<language>en</language>
<lastBuildDate>Sat, 06 Dec 2025 11:14:56 GMT</lastBuildDate>
<ttl>5</ttl>
<item>
<title>ianyang02/aita_qwen3-30b</title>
<description>ianyang02/aita_qwen3-30b Updated about 13 hours ago</description>
<link>https://huggingface.co/ianyang02/aita_qwen3-30b</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_qwen3-30b</guid>
<pubDate>Fri, 05 Dec 2025 14:17:52 GMT</pubDate>
</item>
<item>
<title>ianyang02/aita_qwen3-4b_dpo</title>
<description>ianyang02/aita_qwen3-4b_dpo Updated 15 days ago
---
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: transformers
model_name: aita_qwen3-4b_dpo
tags:
- generated_from_trainer
- trl
- dpo
licence: license
---
# Model Card for aita_qwen3-4b_dpo
This model is a fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507).
It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start
```python
from transformers import pipeline
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="ianyang02/aita_qwen3-4b_dpo", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer">](https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/4xnikck8)
This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).
### Framework versions
- TRL: 0.24.0
- Transformers: 4.57.1
- Pytorch: 2.9.0
- Datasets: 4.3.0
- Tokenizers: 0.22.1
## Citations
Cite DPO as:
```bibtex
@inproceedings{rafailov2023direct,
title = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
year = 2023,
booktitle = {Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023},
url = {http://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html},
editor = {Alice Oh and Tristan Naumann and Amir Globerson and Kate Saenko and Moritz Hardt and Sergey Levine},
}
```
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```</description>
<link>https://huggingface.co/ianyang02/aita_qwen3-4b_dpo</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_qwen3-4b_dpo</guid>
<pubDate>Fri, 21 Nov 2025 15:09:21 GMT</pubDate>
</item>
<item>
<title>ianyang02/ppo_model_qwen3-4b_aita_h200_2</title>
<description>ianyang02/ppo_model_qwen3-4b_aita_h200_2 Text Generation • Updated 15 days ago • 66
---
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: peft
model_name: ppo_model_qwen3-4b_aita_h200_2
tags:
- base_model:adapter:Qwen/Qwen3-4B-Instruct-2507
- lora
- transformers
licence: license
pipeline_tag: text-generation
---
# Model Card for ppo_model_qwen3-4b_aita_h200_2
This model is a fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507).
It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start
```python
from transformers import pipeline
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="ianyang02/ppo_model_qwen3-4b_aita_h200_2", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer">](https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/x99g8ppd)
This model was trained with PPO, a method introduced in [Fine-Tuning Language Models from Human Preferences](https://huggingface.co/papers/1909.08593).
### Framework versions
- PEFT 0.18.0
- TRL: 0.25.1
- Transformers: 4.57.1
- Pytorch: 2.7.0.dev20250224+cu126
- Datasets: 4.4.1
- Tokenizers: 0.22.1
## Citations
Cite PPO as:
```bibtex
@article{mziegler2019fine-tuning,
title = {{Fine-Tuning Language Models from Human Preferences}},
author = {Daniel M. Ziegler and Nisan Stiennon and Jeffrey Wu and Tom B. Brown and Alec Radford and Dario Amodei and Paul F. Christiano and Geoffrey Irving},
year = 2019,
eprint = {arXiv:1909.08593}
}
```
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```</description>
<link>https://huggingface.co/ianyang02/ppo_model_qwen3-4b_aita_h200_2</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/ppo_model_qwen3-4b_aita_h200_2</guid>
<pubDate>Fri, 21 Nov 2025 08:19:14 GMT</pubDate>
</item>
<item>
<title>ianyang02/aita_qwen3-4b_2</title>
<description>ianyang02/aita_qwen3-4b_2 Text Classification • 4B • Updated 16 days ago • 23
---
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: transformers
model_name: aita_qwen3-4b_2
tags:
- generated_from_trainer
- trl
- reward-trainer
licence: license
---
# Model Card for aita_qwen3-4b_2
This model is a fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507).
It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start
```python
from transformers import pipeline
text = "The capital of France is Paris."
rewarder = pipeline(model="ianyang02/aita_qwen3-4b_2", device="cuda")
output = rewarder(text)[0]
print(output["score"])
```
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer">](https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/lacazue4)
This model was trained with Reward.
### Framework versions
- TRL: 0.25.1
- Transformers: 4.57.1
- Pytorch: 2.5.0.dev20240818+cu124
- Datasets: 4.4.1
- Tokenizers: 0.22.1
## Citations
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```</description>
<link>https://huggingface.co/ianyang02/aita_qwen3-4b_2</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_qwen3-4b_2</guid>
<pubDate>Wed, 19 Nov 2025 23:02:33 GMT</pubDate>
</item>
<item>
<title>ianyang02/aita_qwen3-4b_2_checkpoint_2</title>
<description>ianyang02/aita_qwen3-4b_2_checkpoint_2 Text Classification • 4B • Updated 16 days ago • 25
---
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: transformers
model_name: aita_qwen3-4b_2
tags:
- generated_from_trainer
- trl
- reward-trainer
licence: license
---
# Model Card for aita_qwen3-4b_2
This model is a fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507).
It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start
```python
from transformers import pipeline
text = "The capital of France is Paris."
rewarder = pipeline(model="ianyang02/aita_qwen3-4b_2", device="cuda")
output = rewarder(text)[0]
print(output["score"])
```
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer">](https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/lacazue4)
This model was trained with Reward.
### Framework versions
- TRL: 0.25.1
- Transformers: 4.57.1
- Pytorch: 2.5.0.dev20240818+cu124
- Datasets: 4.4.1
- Tokenizers: 0.22.1
## Citations
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```</description>
<link>https://huggingface.co/ianyang02/aita_qwen3-4b_2_checkpoint_2</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_qwen3-4b_2_checkpoint_2</guid>
<pubDate>Wed, 19 Nov 2025 19:52:00 GMT</pubDate>
</item>
<item>
<title>ianyang02/aita_qwen3-4b_2_checkpoint_1</title>
<description>ianyang02/aita_qwen3-4b_2_checkpoint_1 Text Classification • 4B • Updated 16 days ago • 4
---
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: transformers
model_name: aita_qwen3-4b_2
tags:
- generated_from_trainer
- trl
- reward-trainer
licence: license
---
# Model Card for aita_qwen3-4b_2
This model is a fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507).
It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start
```python
from transformers import pipeline
text = "The capital of France is Paris."
rewarder = pipeline(model="ianyang02/aita_qwen3-4b_2", device="cuda")
output = rewarder(text)[0]
print(output["score"])
```
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer">](https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/lacazue4)
This model was trained with Reward.
### Framework versions
- TRL: 0.25.1
- Transformers: 4.57.1
- Pytorch: 2.5.0.dev20240818+cu124
- Datasets: 4.4.1
- Tokenizers: 0.22.1
## Citations
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```</description>
<link>https://huggingface.co/ianyang02/aita_qwen3-4b_2_checkpoint_1</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_qwen3-4b_2_checkpoint_1</guid>
<pubDate>Wed, 19 Nov 2025 16:40:34 GMT</pubDate>
</item>
<item>
<title>ianyang02/ppo_model_qwen3-4b_aita_h200</title>
<description>ianyang02/ppo_model_qwen3-4b_aita_h200 Updated 17 days ago
---
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: transformers
model_name: ppo_model_qwen3-4b_aita_h200
tags:
- generated_from_trainer
- trl
- ppo
licence: license
---
# Model Card for ppo_model_qwen3-4b_aita_h200
This model is a fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507).
It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start
```python
from transformers import pipeline
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="ianyang02/ppo_model_qwen3-4b_aita_h200", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer">](https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/ht40cr4d)
This model was trained with PPO, a method introduced in [Fine-Tuning Language Models from Human Preferences](https://huggingface.co/papers/1909.08593).
### Framework versions
- TRL: 0.24.0
- Transformers: 4.57.1
- Pytorch: 2.9.0
- Datasets: 4.3.0
- Tokenizers: 0.22.1
## Citations
Cite PPO as:
```bibtex
@article{mziegler2019fine-tuning,
title = {{Fine-Tuning Language Models from Human Preferences}},
author = {Daniel M. Ziegler and Nisan Stiennon and Jeffrey Wu and Tom B. Brown and Alec Radford and Dario Amodei and Paul F. Christiano and Geoffrey Irving},
year = 2019,
eprint = {arXiv:1909.08593}
}
```
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```</description>
<link>https://huggingface.co/ianyang02/ppo_model_qwen3-4b_aita_h200</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/ppo_model_qwen3-4b_aita_h200</guid>
<pubDate>Tue, 18 Nov 2025 16:21:51 GMT</pubDate>
</item>
<item>
<title>ianyang02/aita_qwen3-4b</title>
<description>ianyang02/aita_qwen3-4b Text Classification • 4B • Updated 20 days ago • 93
---
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: transformers
model_name: aita_qwen3-4b
tags:
- generated_from_trainer
- trl
- reward-trainer
licence: license
---
# Model Card for aita_qwen3-4b
This model is a fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507).
It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start
```python
from transformers import pipeline
text = "The capital of France is Paris."
rewarder = pipeline(model="ianyang02/aita_qwen3-4b", device="cuda")
output = rewarder(text)[0]
print(output["score"])
```
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer">](https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/ofxhxi45)
This model was trained with Reward.
### Framework versions
- TRL: 0.24.0
- Transformers: 4.57.1
- Pytorch: 2.9.0
- Datasets: 4.3.0
- Tokenizers: 0.22.1
## Citations
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```</description>
<link>https://huggingface.co/ianyang02/aita_qwen3-4b</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_qwen3-4b</guid>
<pubDate>Sun, 16 Nov 2025 01:32:06 GMT</pubDate>
</item>
<item>
<title>ianyang02/aita_fine_tuned_Qwen3_0.6B</title>
<description>ianyang02/aita_fine_tuned_Qwen3_0.6B Updated 24 days ago</description>
<link>https://huggingface.co/ianyang02/aita_fine_tuned_Qwen3_0.6B</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_fine_tuned_Qwen3_0.6B</guid>
<pubDate>Tue, 11 Nov 2025 16:18:49 GMT</pubDate>
</item>
<item>
<title>ianyang02/aita_Qwen3-0.6B</title>
<description>ianyang02/aita_Qwen3-0.6B Text Classification • 0.6B • Updated 28 days ago • 50
---
base_model: Qwen/Qwen3-0.6B
library_name: transformers
model_name: aita_Qwen3-0.6B
tags:
- generated_from_trainer
- reward-trainer
- trl
licence: license
---
# Model Card for aita_Qwen3-0.6B
This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B).
It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start
```python
from transformers import pipeline
text = "The capital of France is Paris."
rewarder = pipeline(model="ianyang02/aita_Qwen3-0.6B", device="cuda")
output = rewarder(text)[0]
print(output["score"])
```
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer">](https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/7xwpy737)
This model was trained with Reward.
### Framework versions
- TRL: 0.24.0
- Transformers: 4.57.1
- Pytorch: 2.9.0
- Datasets: 4.3.0
- Tokenizers: 0.22.1
## Citations
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```</description>
<link>https://huggingface.co/ianyang02/aita_Qwen3-0.6B</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_Qwen3-0.6B</guid>
<pubDate>Sat, 08 Nov 2025 07:48:49 GMT</pubDate>
</item>
</channel>
</rss> |
…proper state !!!!!
|
Successfully generated as following: http://localhost:1200/huggingface/models/deepseek-ai - Success ✔️<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<title>Huggingface deepseek-ai Models</title>
<link>https://huggingface.co/deepseek-ai/models?sort=created</link>
<atom:link href="http://localhost:1200/huggingface/models/deepseek-ai" rel="self" type="application/rss+xml"></atom:link>
<description>Huggingface deepseek-ai Models - Powered by RSSHub</description>
<generator>RSSHub</generator>
<webMaster>contact@rsshub.app (RSSHub)</webMaster>
<language>en</language>
<lastBuildDate>Sat, 06 Dec 2025 11:25:05 GMT</lastBuildDate>
<ttl>5</ttl>
<item>
<title>deepseek-ai/DeepSeek-V3.2-Speciale</title>
<description>deepseek-ai/DeepSeek-V3.2-Speciale Text Generation • 685B • Updated 5 days ago • 4.72k • 517
---
license: mit
library_name: transformers
base_model:
- deepseek-ai/DeepSeek-V3.2-Exp-Base
base_model_relation: finetune
---
# DeepSeek-V3.2: Efficient Reasoning &amp; Agentic AI
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" referrerpolicy="no-referrer">
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%A4%96%20Chat-DeepSeek%20V3-536af5?color=536af5&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
<img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&amp;logoColor=white&amp;color=7289da" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
<img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/deepseek-ai/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&amp;color=f5de53" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<p align="center">
<a href="https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf"><b>Technical Report</b>👁️</a>
</p>
## Introduction
We introduce **DeepSeek-V3.2**, a model that harmonizes high computational efficiency with superior reasoning and agent performance. Our approach is built upon three key technical breakthroughs:
1. **DeepSeek Sparse Attention (DSA):** We introduce DSA, an efficient attention mechanism that substantially reduces computational complexity while preserving model performance, specifically optimized for long-context scenarios.
2. **Scalable Reinforcement Learning Framework:** By implementing a robust RL protocol and scaling post-training compute, *DeepSeek-V3.2* performs comparably to GPT-5. Notably, our high-compute variant, **DeepSeek-V3.2-Speciale**, **surpasses GPT-5** and exhibits reasoning proficiency on par with Gemini-3.0-Pro.
- *Achievement:* 🥇 **Gold-medal performance** in the 2025 International Mathematical Olympiad (IMO) and International Olympiad in Informatics (IOI).
3. **Large-Scale Agentic Task Synthesis Pipeline:** To integrate **reasoning into tool-use** scenarios, we developed a novel synthesis pipeline that systematically generates training data at scale. This facilitates scalable agentic post-training, improving compliance and generalization in complex interactive environments.
<div align="center">
<img src="https://huggingface.co/deepseek-ai/assets/benchmark.png" referrerpolicy="no-referrer">
</div>
We have also released the final submissions for IOI 2025, ICPC World Finals, IMO 2025 and CMO 2025, which were selected based on our designed pipeline. These materials are provided for the community to conduct secondary verification. The files can be accessed at `assets/olympiad_cases`.
## Chat Template
DeepSeek-V3.2 introduces significant updates to its chat template compared to prior versions. The primary changes involve a revised format for tool calling and the introduction of a "thinking with tools" capability.
To assist the community in understanding and adapting to this new template, we have provided a dedicated `encoding` folder, which contains Python scripts and test cases demonstrating how to encode messages in OpenAI-compatible format into input strings for the model and how to parse the model's text output.
A brief example is illustrated below:
```python
import transformers
# encoding/encoding_dsv32.py
from encoding_dsv32 import encode_messages, parse_message_from_completion_text
tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.2")
messages = [
{"role": "user", "content": "hello"},
{"role": "assistant", "content": "Hello! I am DeepSeek.", "reasoning_content": "thinking..."},
{"role": "user", "content": "1+1=?"}
]
encode_config = dict(thinking_mode="thinking", drop_thinking=True, add_default_bos_token=True)
# messages -&gt; string
prompt = encode_messages(messages, **encode_config)
# Output: "&lt;|begin▁of▁sentence|&gt;&lt;|User|&gt;hello&lt;|Assistant|&gt;Hello! I am DeepSeek.&lt;|end▁of▁sentence|&gt;&lt;|User|&gt;1+1=?&lt;|Assistant|&gt;<think>"
# string -&gt; tokens
tokens = tokenizer.encode(prompt)
# Output: [0, 128803, 33310, 128804, 128799, 19923, 3, 342, 1030, 22651, 4374, 1465, 16, 1, 128803, 19, 13, 19, 127252, 128804, 128798]
```
Important Notes:
1. This release does not include a Jinja-format chat template. Please refer to the Python code mentioned above.
2. The output parsing function included in the code is designed to handle well-formatted strings only. It does not attempt to correct or recover from malformed output that the model might occasionally generate. It is not suitable for production use without robust error handling.
3. A new role named `developer` has been introduced in the chat template. This role is dedicated exclusively to search agent scenarios and is designated for no other tasks. The official API does not accept messages assigned to `developer`.
## How to Run Locally
The model structure of DeepSeek-V3.2 and DeepSeek-V3.2-Speciale are the same as DeepSeek-V3.2-Exp. Please visit [DeepSeek-V3.2-Exp](https://github.com/deepseek-ai/DeepSeek-V3.2-Exp) repo for more information about running this model locally.
Usage Recommendations:
1. For local deployment, we recommend setting the sampling parameters to `temperature = 1.0, top_p = 0.95`.
2. Please note that the DeepSeek-V3.2-Speciale variant is designed exclusively for deep reasoning tasks and does not support the tool-calling functionality.
## License
This repository and the model weights are licensed under the [MIT License](LICENSE).
## Citation
```
@misc{deepseekai2025deepseekv32,
title={DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models},
author={DeepSeek-AI},
year={2025},
}
```
## Contact
If you have any questions, please raise an issue or contact us at [service@deepseek.com](service@deepseek.com).
</think></description>
<link>https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale</link>
<guid isPermaLink="false">https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale</guid>
<pubDate>Mon, 01 Dec 2025 03:06:03 GMT</pubDate>
</item>
<item>
<title>deepseek-ai/DeepSeek-V3.2</title>
<description>deepseek-ai/DeepSeek-V3.2 Text Generation • 685B • Updated 5 days ago • 18.1k • • 738
---
license: mit
library_name: transformers
base_model:
- deepseek-ai/DeepSeek-V3.2-Exp-Base
base_model_relation: finetune
---
# DeepSeek-V3.2: Efficient Reasoning &amp; Agentic AI
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" referrerpolicy="no-referrer">
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%A4%96%20Chat-DeepSeek%20V3-536af5?color=536af5&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
<img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&amp;logoColor=white&amp;color=7289da" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
<img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/deepseek-ai/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&amp;color=f5de53" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<p align="center">
<a href="https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf"><b>Technical Report</b>👁️</a>
</p>
## Introduction
We introduce **DeepSeek-V3.2**, a model that harmonizes high computational efficiency with superior reasoning and agent performance. Our approach is built upon three key technical breakthroughs:
1. **DeepSeek Sparse Attention (DSA):** We introduce DSA, an efficient attention mechanism that substantially reduces computational complexity while preserving model performance, specifically optimized for long-context scenarios.
2. **Scalable Reinforcement Learning Framework:** By implementing a robust RL protocol and scaling post-training compute, *DeepSeek-V3.2* performs comparably to GPT-5. Notably, our high-compute variant, **DeepSeek-V3.2-Speciale**, **surpasses GPT-5** and exhibits reasoning proficiency on par with Gemini-3.0-Pro.
- *Achievement:* 🥇 **Gold-medal performance** in the 2025 International Mathematical Olympiad (IMO) and International Olympiad in Informatics (IOI).
3. **Large-Scale Agentic Task Synthesis Pipeline:** To integrate **reasoning into tool-use** scenarios, we developed a novel synthesis pipeline that systematically generates training data at scale. This facilitates scalable agentic post-training, improving compliance and generalization in complex interactive environments.
<div align="center">
<img src="https://huggingface.co/deepseek-ai/assets/benchmark.png" referrerpolicy="no-referrer">
</div>
We have also released the final submissions for IOI 2025, ICPC World Finals, IMO 2025 and CMO 2025, which were selected based on our designed pipeline. These materials are provided for the community to conduct secondary verification. The files can be accessed at `assets/olympiad_cases`.
## Chat Template
DeepSeek-V3.2 introduces significant updates to its chat template compared to prior versions. The primary changes involve a revised format for tool calling and the introduction of a "thinking with tools" capability.
To assist the community in understanding and adapting to this new template, we have provided a dedicated `encoding` folder, which contains Python scripts and test cases demonstrating how to encode messages in OpenAI-compatible format into input strings for the model and how to parse the model's text output.
A brief example is illustrated below:
```python
import transformers
# encoding/encoding_dsv32.py
from encoding_dsv32 import encode_messages, parse_message_from_completion_text
tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.2")
messages = [
{"role": "user", "content": "hello"},
{"role": "assistant", "content": "Hello! I am DeepSeek.", "reasoning_content": "thinking..."},
{"role": "user", "content": "1+1=?"}
]
encode_config = dict(thinking_mode="thinking", drop_thinking=True, add_default_bos_token=True)
# messages -&gt; string
prompt = encode_messages(messages, **encode_config)
# Output: "&lt;|begin▁of▁sentence|&gt;&lt;|User|&gt;hello&lt;|Assistant|&gt;Hello! I am DeepSeek.&lt;|end▁of▁sentence|&gt;&lt;|User|&gt;1+1=?&lt;|Assistant|&gt;<think>"
# string -&gt; tokens
tokens = tokenizer.encode(prompt)
# Output: [0, 128803, 33310, 128804, 128799, 19923, 3, 342, 1030, 22651, 4374, 1465, 16, 1, 128803, 19, 13, 19, 127252, 128804, 128798]
```
Important Notes:
1. This release does not include a Jinja-format chat template. Please refer to the Python code mentioned above.
2. The output parsing function included in the code is designed to handle well-formatted strings only. It does not attempt to correct or recover from malformed output that the model might occasionally generate. It is not suitable for production use without robust error handling.
3. A new role named `developer` has been introduced in the chat template. This role is dedicated exclusively to search agent scenarios and is designated for no other tasks. The official API does not accept messages assigned to `developer`.
## How to Run Locally
The model structure of DeepSeek-V3.2 and DeepSeek-V3.2-Speciale are the same as DeepSeek-V3.2-Exp. Please visit [DeepSeek-V3.2-Exp](https://github.com/deepseek-ai/DeepSeek-V3.2-Exp) repo for more information about running this model locally.
Usage Recommendations:
1. For local deployment, we recommend setting the sampling parameters to `temperature = 1.0, top_p = 0.95`.
2. Please note that the DeepSeek-V3.2-Speciale variant is designed exclusively for deep reasoning tasks and does not support the tool-calling functionality.
## License
This repository and the model weights are licensed under the [MIT License](LICENSE).
## Citation
```
@misc{deepseekai2025deepseekv32,
title={DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models},
author={DeepSeek-AI},
year={2025},
}
```
## Contact
If you have any questions, please raise an issue or contact us at [service@deepseek.com](service@deepseek.com).
</think></description>
<link>https://huggingface.co/deepseek-ai/DeepSeek-V3.2</link>
<guid isPermaLink="false">https://huggingface.co/deepseek-ai/DeepSeek-V3.2</guid>
<pubDate>Mon, 01 Dec 2025 03:04:59 GMT</pubDate>
</item>
<item>
<title>deepseek-ai/DeepSeek-Math-V2</title>
<description>deepseek-ai/DeepSeek-Math-V2 Text Generation • 685B • Updated 9 days ago • 8.96k • 637
---
license: apache-2.0
library_name: transformers
base_model:
- deepseek-ai/DeepSeek-Math-V2
---
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" referrerpolicy="no-referrer">
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.deepseek.com/"><img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" referrerpolicy="no-referrer"></a>
<a href="https://chat.deepseek.com/"><img alt="Chat" src="https://img.shields.io/badge/%F0%9F%A4%96%20Chat-DeepSeek%20V3-536af5?color=536af5&amp;logoColor=white" referrerpolicy="no-referrer"></a>
<a href="https://huggingface.co/deepseek-ai"><img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&amp;logoColor=white" referrerpolicy="no-referrer"></a>
<br>
<a href="https://discord.gg/Tc7c45Zzu5"><img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&amp;logoColor=white&amp;color=7289da" referrerpolicy="no-referrer"></a>
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true"><img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&amp;logoColor=white" referrerpolicy="no-referrer"></a>
<a href="https://twitter.com/deepseek_ai"><img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&amp;logoColor=white" referrerpolicy="no-referrer"></a>
<br>
<a href="https://huggingface.co/deepseek-ai/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-Apache%202.0-f5de53?&amp;color=f5de53" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<br>
</div>
# DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
## 1. Introduction
Large language models have made significant progress in mathematical reasoning, which serves as an important testbed for AI and could impact scientific research if further advanced.
By scaling reasoning with reinforcement learning that rewards correct final answers, LLMs have improved from poor performance to saturating quantitative reasoning competitions like AIME and HMMT in one year.
However, this approach faces fundamental limitations.
Pursuing higher final answer accuracy doesn't address a key issue: correct answers don't guarantee correct reasoning.
Moreover, many mathematical tasks like theorem proving require rigorous step-by-step derivation rather than numerical answers, making final answer rewards inapplicable.
To push the limits of deep reasoning, we believe it is necessary to verify the comprehensiveness and rigor of mathematical reasoning.
Self-verification is particularly important for scaling test-time compute, especially for open problems without known solutions.
Towards self-verifiable mathematical reasoning, we investigate how to train an accurate and faithful LLM-based verifier for theorem proving.
We then train a proof generator using the verifier as the reward model, and incentivize the generator to identify and resolve as many issues as possible in their own proofs before finalizing them.
To maintain the generation-verification gap as the generator becomes stronger, we propose to scale verification compute to automatically label new hard-to-verify proofs, creating training data to further improve the verifier.
Our resulting model, DeepSeekMath-V2, demonstrates strong theorem-proving capabilities, achieving gold-level scores on IMO 2025 and CMO 2024 and a near-perfect 118/120 on Putnam 2024 with scaled test-time compute.
While much work remains, these results suggest that self-verifiable mathematical reasoning is a feasible research direction that may help develop more capable mathematical AI systems.
## 2. Evaluation Results
Below are evaluation results on [IMO-ProofBench](https://github.com/google-deepmind/superhuman/tree/main/imobench) (developed by the DeepMind team behind DeepThink IMO-Gold) and recent mathematics competitions including IMO 2025, CMO 2024, and Putnam 2024.
**IMO-ProofBench**
<p align="center">
<img width="100%" src="https://raw.githubusercontent.com/deepseek-ai/DeepSeek-Math-V2/refs/heads/main/figures/IMO-ProofBench.png" referrerpolicy="no-referrer">
</p>
---
**Mathematics Competitions**
<p align="center">
<img width="41%&quot;" src="https://raw.githubusercontent.com/deepseek-ai/DeepSeek-Math-V2/refs/heads/main/figures/Competitions.png" referrerpolicy="no-referrer">
</p>
## 4. Quick Start
DeepSeekMath-V2 is built on top of DeepSeek-V3.2-Exp-Base.
For inference support, please refer to [the DeepSeek-V3.2-Exp github repository](https://github.com/deepseek-ai/DeepSeek-V3.2-Exp).
## 6. License
This repository and the model weights are licensed under [the Apache License, Version 2.0 (Apache 2.0)](LICENSE).
## 7. Citation
```
@misc{deepseek-math-v2,
author = {Zhihong Shao, Yuxiang Luo, Chengda Lu, Z.Z. Ren, Jiewen Hu, Tian Ye, Zhibin Gou, Shirong Ma, Xiaokang Zhang},
title = {DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning},
year = {2025},
}
```
## 8. Contact
If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).
</description>
<link>https://huggingface.co/deepseek-ai/DeepSeek-Math-V2</link>
<guid isPermaLink="false">https://huggingface.co/deepseek-ai/DeepSeek-Math-V2</guid>
<pubDate>Thu, 27 Nov 2025 02:35:52 GMT</pubDate>
</item>
<item>
<title>deepseek-ai/DeepSeek-V3.2-Exp</title>
<description>deepseek-ai/DeepSeek-V3.2-Exp Text Generation • 685B • Updated 18 days ago • 66.2k • • 899
---
license: mit
library_name: transformers
base_model:
- deepseek-ai/DeepSeek-V3.2-Exp-Base
base_model_relation: finetune
---
# DeepSeek-V3.2-Exp
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" referrerpolicy="no-referrer">
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%A4%96%20Chat-DeepSeek%20V3-536af5?color=536af5&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
<img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&amp;logoColor=white&amp;color=7289da" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
<img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/deepseek-ai/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&amp;color=f5de53" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
## Introduction
We are excited to announce the official release of DeepSeek-V3.2-Exp, an experimental version of our model. As an intermediate step toward our next-generation architecture, V3.2-Exp builds upon V3.1-Terminus by introducing DeepSeek Sparse Attention—a sparse attention mechanism designed to explore and validate optimizations for training and inference efficiency in long-context scenarios.
This experimental release represents our ongoing research into more efficient transformer architectures, particularly focusing on improving computational efficiency when processing extended text sequences.
<div align="center">
<img src="https://huggingface.co/deepseek-ai/assets/cost.png" referrerpolicy="no-referrer">
</div>
- DeepSeek Sparse Attention (DSA) achieves fine-grained sparse attention for the first time, delivering substantial improvements in long-context training and inference efficiency while maintaining virtually identical model output quality.
- To rigorously evaluate the impact of introducing sparse attention, we deliberately aligned the training configurations of DeepSeek-V3.2-Exp with V3.1-Terminus. Across public benchmarks in various domains, DeepSeek-V3.2-Exp demonstrates performance on par with V3.1-Terminus.
| Benchmark | DeepSeek-V3.1-Terminus | DeepSeek-V3.2-Exp |
| :--- | :---: | :---: |
| **Reasoning Mode w/o Tool Use** | | |
| MMLU-Pro | 85.0 | 85.0 |
| GPQA-Diamond | 80.7 | 79.9 |
| Humanity's Last Exam | 21.7 | 19.8 |
| LiveCodeBench | 74.9 | 74.1 |
| AIME 2025 | 88.4 | 89.3 |
| HMMT 2025 | 86.1 | 83.6 |
| Codeforces | 2046 | 2121 |
| Aider-Polyglot | 76.1 | 74.5 |
| **Agentic Tool Use** | | |
| BrowseComp | 38.5 | 40.1 |
| BrowseComp-zh | 45.0 | 47.9 |
| SimpleQA | 96.8 | 97.1 |
| SWE Verified | 68.4 | 67.8 |
| SWE-bench Multilingual | 57.8 | 57.9 |
| Terminal-bench | 36.7 | 37.7 |
## Update
- 2025.11.17: **We have identified that previous versions of the inference demo code contained an implementation discrepancy in Rotary Position Embedding (RoPE) within the indexer module, potentially leading to degraded model performance.** Specifically, the input tensor to RoPE in the indexer module requires a non-interleaved layout, whereas RoPE in the MLA module expects an interleaved layout. This issue has now been resolved. Please refer to the updated version of the inference demo code and take note of this implementation detail.
## How to Run Locally
### HuggingFace
We provide an updated inference demo code in the [inference](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp/tree/main/inference) folder to help the community quickly get started with our model and understand its architectural details.
First convert huggingface model weights to the the format required by our inference demo. Set `MP` to match your available GPU count:
```bash
cd inference
export EXPERTS=256
python convert.py --hf-ckpt-path ${HF_CKPT_PATH} --save-path ${SAVE_PATH} --n-experts ${EXPERTS} --model-parallel ${MP}
```
Launch the interactive chat interface and start exploring DeepSeek's capabilities:
```bash
export CONFIG=config_671B_v3.2.json
torchrun --nproc-per-node ${MP} generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --interactive
```
### SGLang
#### Installation with Docker
```
# H200
docker pull lmsysorg/sglang:dsv32
# MI350
docker pull lmsysorg/sglang:dsv32-rocm
# NPUs
docker pull lmsysorg/sglang:dsv32-a2
docker pull lmsysorg/sglang:dsv32-a3
```
#### Launch Command
```bash
python -m sglang.launch_server --model deepseek-ai/DeepSeek-V3.2-Exp --tp 8 --dp 8 --enable-dp-attention
```
### vLLM
vLLM provides day-0 support of DeepSeek-V3.2-Exp. See the [recipes](https://docs.vllm.ai/projects/recipes/en/latest/DeepSeek/DeepSeek-V3_2-Exp.html) for up-to-date details.
## Open-Source Kernels
For TileLang kernels with **better readability and research-purpose design**, please refer to [TileLang](https://github.com/tile-ai/tilelang/tree/main/examples/deepseek_v32).
For **high-performance CUDA kernels**, indexer logit kernels (including paged versions) are available in [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM/pull/200). Sparse attention kernels are released in [FlashMLA](https://github.com/deepseek-ai/FlashMLA/pull/98).
## License
This repository and the model weights are licensed under the [MIT License](LICENSE).
## Citation
```
@misc{deepseekai2024deepseekv32,
title={DeepSeek-V3.2-Exp: Boosting Long-Context Efficiency with DeepSeek Sparse Attention},
author={DeepSeek-AI},
year={2025},
}
```
## Contact
If you have any questions, please raise an issue or contact us at [service@deepseek.com](service@deepseek.com).</description>
<link>https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp</link>
<guid isPermaLink="false">https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp</guid>
<pubDate>Mon, 17 Nov 2025 18:39:54 GMT</pubDate>
</item>
<item>
<title>deepseek-ai/DeepSeek-OCR</title>
<description>deepseek-ai/DeepSeek-OCR Image-Text-to-Text • 3B • Updated Nov 4 • 5.47M • 2.93k
---
pipeline_tag: image-text-to-text
language:
- multilingual
tags:
- deepseek
- vision-language
- ocr
- custom_code
license: mit
library_name: transformers
---
<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek AI" referrerpolicy="no-referrer">
</div>
<hr>
<div align="center">
<a href="https://www.deepseek.com/" target="_blank">
<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" referrerpolicy="no-referrer">
</a>
<a href="https://huggingface.co/deepseek-ai/DeepSeek-OCR" target="_blank">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&amp;logoColor=white" referrerpolicy="no-referrer">
</a>
</div>
<div align="center">
<a href="https://discord.gg/Tc7c45Zzu5" target="_blank">
<img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&amp;logoColor=white&amp;color=7289da" referrerpolicy="no-referrer">
</a>
<a href="https://twitter.com/deepseek_ai" target="_blank">
<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&amp;logoColor=white" referrerpolicy="no-referrer">
</a>
</div>
<p align="center">
<a href="https://github.com/deepseek-ai/DeepSeek-OCR"><b>🌟 Github</b></a> |
<a href="https://huggingface.co/deepseek-ai/DeepSeek-OCR"><b>📥 Model Download</b></a> |
<a href="https://github.com/deepseek-ai/DeepSeek-OCR/blob/main/DeepSeek_OCR_paper.pdf"><b>📄 Paper Link</b></a> |
<a href="https://arxiv.org/abs/2510.18234"><b>📄 Arxiv Paper Link</b></a> |
</p>
<h2>
<p align="center">
<a href="https://huggingface.co/papers/2510.18234">DeepSeek-OCR: Contexts Optical Compression</a>
</p>
</h2>
<p align="center">
<img src="https://huggingface.co/deepseek-ai/assets/fig1.png" style="width: 1000px" align="center" referrerpolicy="no-referrer">
</p>
<p align="center">
<a href="https://huggingface.co/papers/2510.18234">Explore the boundaries of visual-text compression.</a>
</p>
## Usage
Inference using Huggingface transformers on NVIDIA GPUs. Requirements tested on python 3.12.9 + CUDA11.8:
```
torch==2.6.0
transformers==4.46.3
tokenizers==0.20.3
einops
addict
easydict
pip install flash-attn==2.7.3 --no-build-isolation
```
```python
from transformers import AutoModel, AutoTokenizer
import torch
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '0'
model_name = 'deepseek-ai/DeepSeek-OCR'
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, _attn_implementation='flash_attention_2', trust_remote_code=True, use_safetensors=True)
model = model.eval().cuda().to(torch.bfloat16)
# prompt = "<img referrerpolicy="no-referrer">\nFree OCR. "
prompt = "<img referrerpolicy="no-referrer">\n&lt;|grounding|&gt;Convert the document to markdown. "
image_file = 'your_image.jpg'
output_path = 'your/output/dir'
# infer(self, tokenizer, prompt='', image_file='', output_path = ' ', base_size = 1024, image_size = 640, crop_mode = True, test_compress = False, save_results = False):
# Tiny: base_size = 512, image_size = 512, crop_mode = False
# Small: base_size = 640, image_size = 640, crop_mode = False
# Base: base_size = 1024, image_size = 1024, crop_mode = False
# Large: base_size = 1280, image_size = 1280, crop_mode = False
# Gundam: base_size = 1024, image_size = 640, crop_mode = True
res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path = output_path, base_size = 1024, image_size = 640, crop_mode=True, save_results = True, test_compress = True)
```
## vLLM
Refer to [🌟GitHub](https://github.com/deepseek-ai/DeepSeek-OCR/) for guidance on model inference acceleration and PDF processing, etc.<!-- -->
[2025/10/23] 🚀🚀🚀 DeepSeek-OCR is now officially supported in upstream [vLLM](https://docs.vllm.ai/projects/recipes/en/latest/DeepSeek/DeepSeek-OCR.html#installing-vllm).
```shell
uv venv
source .venv/bin/activate
# Until v0.11.1 release, you need to install vLLM from nightly build
uv pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
```
```python
from vllm import LLM, SamplingParams
from vllm.model_executor.models.deepseek_ocr import NGramPerReqLogitsProcessor
from PIL import Image
# Create model instance
llm = LLM(
model="deepseek-ai/DeepSeek-OCR",
enable_prefix_caching=False,
mm_processor_cache_gb=0,
logits_processors=[NGramPerReqLogitsProcessor]
)
# Prepare batched input with your image file
image_1 = Image.open("path/to/your/image_1.png").convert("RGB")
image_2 = Image.open("path/to/your/image_2.png").convert("RGB")
prompt = "<img referrerpolicy="no-referrer">\nFree OCR."
model_input = [
{
"prompt": prompt,
"multi_modal_data": {"image": image_1}
},
{
"prompt": prompt,
"multi_modal_data": {"image": image_2}
}
]
sampling_param = SamplingParams(
temperature=0.0,
max_tokens=8192,
# ngram logit processor args
extra_args=dict(
ngram_size=30,
window_size=90,
whitelist_token_ids={128821, 128822}, # whitelist: ,
),
skip_special_tokens=False,
)
# Generate output
model_outputs = llm.generate(model_input, sampling_param)
# Print output
for output in model_outputs:
print(output.outputs[0].text)
```
## Visualizations
<table>
<tbody><tr>
<td><img src="https://huggingface.co/deepseek-ai/assets/show1.jpg" style="width: 500px" referrerpolicy="no-referrer"></td>
<td><img src="https://huggingface.co/deepseek-ai/assets/show2.jpg" style="width: 500px" referrerpolicy="no-referrer"></td>
</tr>
<tr>
<td><img src="https://huggingface.co/deepseek-ai/assets/show3.jpg" style="width: 500px" referrerpolicy="no-referrer"></td>
<td><img src="https://huggingface.co/deepseek-ai/assets/show4.jpg" style="width: 500px" referrerpolicy="no-referrer"></td>
</tr>
</tbody></table>
## Acknowledgement
We would like to thank [Vary](https://github.com/Ucas-HaoranWei/Vary/), [GOT-OCR2.0](https://github.com/Ucas-HaoranWei/GOT-OCR2.0/), [MinerU](https://github.com/opendatalab/MinerU), [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR), [OneChart](https://github.com/LingyvKong/OneChart), [Slow Perception](https://github.com/Ucas-HaoranWei/Slow-Perception) for their valuable models and ideas.
We also appreciate the benchmarks: [Fox](https://github.com/ucaslcl/Fox), [OminiDocBench](https://github.com/opendatalab/OmniDocBench).
## Citation
```bibtex
@article{wei2025deepseek,
title={DeepSeek-OCR: Contexts Optical Compression},
author={Wei, Haoran and Sun, Yaofeng and Li, Yukun},
journal={arXiv preprint arXiv:2510.18234},
year={2025}
}</description>
<link>https://huggingface.co/deepseek-ai/DeepSeek-OCR</link>
<guid isPermaLink="false">https://huggingface.co/deepseek-ai/DeepSeek-OCR</guid>
<pubDate>Mon, 03 Nov 2025 18:36:12 GMT</pubDate>
</item>
<item>
<title>deepseek-ai/DeepSeek-V3.2-Exp-Base</title>
<description>deepseek-ai/DeepSeek-V3.2-Exp-Base Text Generation • 685B • Updated Oct 9 • 217 • 48
---
license: mit
library_name: transformers
---</description>
<link>https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp-Base</link>
<guid isPermaLink="false">https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp-Base</guid>
<pubDate>Wed, 08 Oct 2025 18:09:46 GMT</pubDate>
</item>
<item>
<title>deepseek-ai/DeepSeek-V3.1-Terminus</title>
<description>deepseek-ai/DeepSeek-V3.1-Terminus Text Generation • 685B • Updated Sep 29 • 81k • • 354
---
license: mit
library_name: transformers
base_model:
- deepseek-ai/DeepSeek-V3.1-Base
---
# DeepSeek-V3.1-Terminus
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" referrerpolicy="no-referrer">
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%A4%96%20Chat-DeepSeek%20V3-536af5?color=536af5&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
<img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&amp;logoColor=white&amp;color=7289da" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
<img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/deepseek-ai/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&amp;color=f5de53" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
## Introduction
This update maintains the model's original capabilities while addressing issues reported by users, including:
- Language consistency: Reducing instances of mixed Chinese-English text and occasional abnormal characters;
- Agent capabilities: Further optimizing the performance of the Code Agent and Search Agent.
| Benchmark | DeepSeek-V3.1 | DeepSeek-V3.1-Terminus |
| :--- | :---: | :---: |
| **Reasoning Mode w/o Tool Use** | | |
| MMLU-Pro | 84.8 | 85.0 |
| GPQA-Diamond | 80.1 | 80.7 |
| Humanity's Last Exam | 15.9 | 21.7 |
| LiveCodeBench | 74.8 | 74.9 |
| Codeforces | 2091 | 2046 |
| Aider-Polyglot | 76.3 | 76.1 |
| **Agentic Tool Use** | | |
| BrowseComp | 30.0 | 38.5 |
| BrowseComp-zh | 49.2 | 45.0 |
| SimpleQA | 93.4 | 96.8 |
| SWE Verified | 66.0 | 68.4 |
| SWE-bench Multilingual | 54.5 | 57.8 |
| Terminal-bench | 31.3 | 36.7 |
**The template and tool-set of search agent have been updated, which is shown in `assets/search_tool_trajectory.html`.**
## How to Run Locally
The model structure of DeepSeek-V3.1-Terminus is the same as DeepSeek-V3. Please visit [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repo for more information about running this model locally.
For the model's chat template other than search agent, please refer to the [DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1) repo.
**Here we also provide an updated inference demo code in the `inference` folder to help the community get started with running our model and understand the details of model architecture.**
**NOTE: In the current model checkpoint, the parameters of `self_attn.o_proj` do not conform to the UE8M0 FP8 scale data format. This is a known issue and will be corrected in future model releases.**
## License
This repository and the model weights are licensed under the [MIT License](LICENSE).
## Citation
```
@misc{deepseekai2024deepseekv3technicalreport,
title={DeepSeek-V3 Technical Report},
author={DeepSeek-AI},
year={2024},
eprint={2412.19437},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.19437},
}
```
## Contact
If you have any questions, please raise an issue or contact us at [service@deepseek.com](service@deepseek.com).
</description>
<link>https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus</link>
<guid isPermaLink="false">https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus</guid>
<pubDate>Sun, 28 Sep 2025 17:52:07 GMT</pubDate>
</item>
<item>
<title>deepseek-ai/DeepSeek-V3.1</title>
<description>deepseek-ai/DeepSeek-V3.1 Text Generation • 685B • Updated Sep 5 • 82.2k • • 808
---
license: mit
library_name: transformers
base_model:
- deepseek-ai/DeepSeek-V3.1-Base
---
# DeepSeek-V3.1
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" referrerpolicy="no-referrer">
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%A4%96%20Chat-DeepSeek%20V3-536af5?color=536af5&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
<img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&amp;logoColor=white&amp;color=7289da" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
<img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/deepseek-ai/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&amp;color=f5de53" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
## Introduction
DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Compared to the previous version, this upgrade brings improvements in multiple aspects:
- **Hybrid thinking mode**: One model supports both thinking mode and non-thinking mode by changing the chat template.
- **Smarter tool calling**: Through post-training optimization, the model's performance in tool usage and agent tasks has significantly improved.
- **Higher thinking efficiency**: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly.
DeepSeek-V3.1 is post-trained on the top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase long context extension approach, following the methodology outlined in the original DeepSeek-V3 report. We have expanded our dataset by collecting additional long documents and substantially extending both training phases. The 32K extension phase has been increased 10-fold to 630B tokens, while the 128K extension phase has been extended by 3.3x to 209B tokens.
Additionally, DeepSeek-V3.1 is trained using the **UE8M0 FP8 scale data format on both model weights and activations** to ensure compatibility with microscaling data formats. Please refer to [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM) for more details.
## Model Downloads
<div align="center">
| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** |
| :------------: | :------------: | :------------: | :------------: | :------------: |
| DeepSeek-V3.1-Base | 671B | 37B | 128K | [HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base) \| [ModelScope](https://modelscope.cn/models/deepseek-ai/DeepSeek-V3.1-Base) |
| DeepSeek-V3.1 | 671B | 37B | 128K | [HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V3.1) \| [ModelScope](https://modelscope.cn/models/deepseek-ai/DeepSeek-V3.1) |
</div>
## Chat Template
The details of our chat template is described in `tokenizer_config.json` and `assets/chat_template.jinja`. Here is a brief description.
### Non-Thinking
#### First-Turn
Prefix:
`&lt;|begin▁of▁sentence|&gt;{system prompt}&lt;|User|&gt;{query}&lt;|Assistant|&gt;`
With the given prefix, DeepSeek V3.1 generates responses to queries in non-thinking mode. Unlike DeepSeek V3, it introduces an additional token ``.
#### Multi-Turn
Context:
`&lt;|begin▁of▁sentence|&gt;{system prompt}&lt;|User|&gt;{query}&lt;|Assistant|&gt;{response}&lt;|end▁of▁sentence|&gt;...&lt;|User|&gt;{query}&lt;|Assistant|&gt;{response}&lt;|end▁of▁sentence|&gt;`
Prefix:
`&lt;|User|&gt;{query}&lt;|Assistant|&gt;`
By concatenating the context and the prefix, we obtain the correct prompt for the query.
### Thinking
#### First-Turn
Prefix:
`&lt;|begin▁of▁sentence|&gt;{system prompt}&lt;|User|&gt;{query}&lt;|Assistant|&gt;<think>`
The prefix of thinking mode is similar to DeepSeek-R1.
#### Multi-Turn
Context:
`&lt;|begin▁of▁sentence|&gt;{system prompt}&lt;|User|&gt;{query}&lt;|Assistant|&gt;</think>{response}&lt;|end▁of▁sentence|&gt;...&lt;|User|&gt;{query}&lt;|Assistant|&gt;{response}&lt;|end▁of▁sentence|&gt;`
Prefix:
`&lt;|User|&gt;{query}&lt;|Assistant|&gt;<think>`
The multi-turn template is the same with non-thinking multi-turn chat template. It means the thinking token in the last turn will be dropped but the `</think>` is retained in every turn of context.
### ToolCall
Toolcall is supported in non-thinking mode. The format is:
`&lt;|begin▁of▁sentence|&gt;{system prompt}\n\n{tool_description}&lt;|User|&gt;{query}&lt;|Assistant|&gt;` where the tool_description is
```
## Tools
You have access to the following tools:
### {tool_name1}
Description: {description}
Parameters: {json.dumps(parameters)}
IMPORTANT: ALWAYS adhere to this exact format for tool use:
&lt;|tool▁calls▁begin|&gt;&lt;|tool▁call▁begin|&gt;tool_call_name&lt;|tool▁sep|&gt;tool_call_arguments&lt;|tool▁call▁end|&gt;{additional_tool_calls}&lt;|tool▁calls▁end|&gt;
Where:
- `tool_call_name` must be an exact match to one of the available tools
- `tool_call_arguments` must be valid JSON that strictly follows the tool's Parameters Schema
- For multiple tool calls, chain them directly without separators or spaces
```
### Code-Agent
We support various code agent frameworks. Please refer to the above toolcall format to create your own code agents. An example is shown in `assets/code_agent_trajectory.html`.
### Search-Agent
We design a specific format for searching toolcall in thinking mode, to support search agent.
For complex questions that require accessing external or up-to-date information, DeepSeek-V3.1 can leverage a user-provided search tool through a multi-turn tool-calling process.
Please refer to the `assets/search_tool_trajectory.html` and `assets/search_python_tool_trajectory.html` for the detailed template.
## Evaluation
| Category | Benchmark (Metric) | DeepSeek V3.1-NonThinking | DeepSeek V3 0324 | DeepSeek V3.1-Thinking | DeepSeek R1 0528
|----------|----------------------------------|-----------------|---|---|---|
| General |
| | MMLU-Redux (EM) | 91.8 | 90.5 | 93.7 | 93.4
| | MMLU-Pro (EM) | 83.7 | 81.2 | 84.8 | 85.0
| | GPQA-Diamond (Pass@1) | 74.9 | 68.4 | 80.1 | 81.0
| | Humanity's Last Exam (Pass@1) | - | - | 15.9 | 17.7
|Search Agent|
| | BrowseComp | - | - | 30.0 | 8.9
| | BrowseComp_zh | - | - | 49.2 | 35.7
| | Humanity's Last Exam (Python + Search) |- | - | 29.8 | 24.8
| | SimpleQA | - | - | 93.4 | 92.3
| Code |
| | LiveCodeBench (2408-2505) (Pass@1) | 56.4 | 43.0 | 74.8 | 73.3
| | Codeforces-Div1 (Rating) | - | - | 2091 | 1930
| | Aider-Polyglot (Acc.) | 68.4 | 55.1 | 76.3 | 71.6
| Code Agent|
| | SWE Verified (Agent mode) | 66.0 | 45.4 | - | 44.6
| | SWE-bench Multilingual (Agent mode) | 54.5 | 29.3 | - | 30.5
| | Terminal-bench (Terminus 1 framework) | 31.3 | 13.3 | - | 5.7
| Math |
| | AIME 2024 (Pass@1) | 66.3 | 59.4 | 93.1 | 91.4
| | AIME 2025 (Pass@1) | 49.8 | 51.3 | 88.4 | 87.5
| | HMMT 2025 (Pass@1) | 33.5 | 29.2 | 84.2 | 79.4 |
Note:
- Search agents are evaluated with our internal search framework, which uses a commercial search API + webpage filter + 128K context window. Seach agent results of R1-0528 are eva... |
http://localhost:1200/huggingface/models/facebook - Success ✔️<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<title>Huggingface facebook Models</title>
<link>https://huggingface.co/facebook/models?sort=created</link>
<atom:link href="http://localhost:1200/huggingface/models/facebook" rel="self" type="application/rss+xml"></atom:link>
<description>Huggingface facebook Models - Powered by RSSHub</description>
<generator>RSSHub</generator>
<webMaster>contact@rsshub.app (RSSHub)</webMaster>
<language>en</language>
<lastBuildDate>Sat, 06 Dec 2025 11:25:11 GMT</lastBuildDate>
<ttl>5</ttl>
<item>
<title>facebook/testing_instructions</title>
<description>facebook/testing_instructions Updated 2 days ago • 2
---
extra_gated_fields:
First Name: text
Last Name: text
Date of birth: date_picker
Country: country
Affiliation: text
I accept the terms and conditions: checkbox
geo: ip_location
Test request?: checkbox
By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox
extra_gated_description: &gt;-
The information you provide will be collected, stored, processed and shared in
accordance with the [Meta Privacy
Policy](https://www.facebook.com/privacy/policy/).
extra_gated_button_content: Submit
extra_gated_heading: "Please be sure to provide your full legal name, date of birth, and full organization name with all corporate identifiers. Avoid the use of acronyms and special characters. Failure to follow these instructions may prevent you from accessing this model and others on Hugging Face. You will not have the ability to edit this form after submission, so please ensure all information is accurate."
language:
- en
tags:
- meta-ai
- meta-pytorch
license: fair-noncommercial-research-license
---</description>
<link>https://huggingface.co/facebook/testing_instructions</link>
<guid isPermaLink="false">https://huggingface.co/facebook/testing_instructions</guid>
<pubDate>Thu, 04 Dec 2025 00:10:57 GMT</pubDate>
</item>
<item>
<title>facebook/omniASR-LLM-7B-ZS</title>
<description>facebook/omniASR-LLM-7B-ZS Automatic Speech Recognition • Updated 8 days ago • 3
---
license: apache-2.0
datasets:
- facebook/omnilingual-asr-corpus
pipeline_tag: automatic-speech-recognition
---
# Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
<div align="center" style="lline-height: 1.2; font-size:16px; margin-bottom: 30px;">
<a href="https://huggingface.co/facebook" target="_blank" style="margin: 2px;">
🤗 Hugging Face
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr" target="_blank" style="margin: 2px;">
🐙 GitHub
</a> |
<a href="https://huggingface.co/spaces/facebook/omniasr-transcriptions" target="_blank" style="margin: 2px;">
🤖️ Demo
</a> |
<a href="https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/" target="_blank" style="margin: 2px;">
📃 Paper
</a> |
<a href="https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/" target="_blank" style="margin: 2px;">
📝 Blogpost
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr/blob/main/LICENSE" style="margin: 2px;">
📄 License: Apache 2.0
</a>
</div>
# Model Card for omniASR-LLM-7B-ZS
## Model Description
This model is part of the **Omnilingual ASR** family released by Meta AI. The original suite includes:
<!-- TODO : add new tokenizer, we'll get two tokenizer, add mssing speed numbers-->
| Model Name | Features | Parameters | Download Size (FP32) | Inference VRAM¹ | Real-Time Factor¹ (relative speed)² |
|---------------------|---------------|------------:|---------------:|---------------:|-----------:|
| [`omniASR_W2V_300M`](https://huggingface.co/facebook/omniASR-W2V-300M) | SSL | 317_390_592 | 1.2 GiB | | |
| [`omniASR_W2V_1B`](https://huggingface.co/facebook/omniASR-W2V-1B) | SSL | 965_514_752 | 3.6 GiB | | |
| [`omniASR_W2V_3B`](https://huggingface.co/facebook/omniASR-W2V-3B) | SSL | 3_064_124_672 | 12.0 GiB | | |
| [`omniASR_W2V_7B`](https://huggingface.co/facebook/omniASR-W2V-7B) | SSL | 6_488_487_168 | 25.0 GiB | | |
| [`omniASR_CTC_300M`](https://huggingface.co/facebook/omniASR-CTC-300M) | ASR | 325_494_996 | 1.3 GiB | ~2 GiB | 0.001 (96x) |
| [`omniASR_CTC_1B`](https://huggingface.co/facebook/omniASR-CTC-1B) | ASR | 975_065_300 | 3.7 GiB | ~3 GiB | 0.002 (48x) |
| [`omniASR_CTC_3B`](https://huggingface.co/facebook/omniASR-CTC-3B) | ASR | 3_080_423_636 | 12.0 GiB | ~8 GiB | 0.003 (32x) |
| [`omniASR_CTC_7B`](https://huggingface.co/facebook/omniASR-CTC-7B) | ASR | 6_504_786_132 | 25.0 GiB | ~15 GiB | 0.006 (16x) |
| [`omniASR_LLM_300M`](https://huggingface.co/facebook/omniASR-LLM-300M) | ASR with optional language conditioning | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.090 (~1x) |
| [`omniASR_LLM_1B`](https://huggingface.co/facebook/omniASR-LLM-1B) | ASR with optional language conditioning | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.091 (~1x) |
| [`omniASR_LLM_3B`](https://huggingface.co/facebook/omniASR-LLM-3B) | ASR with optional language conditioning | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.093 (~1x) |
| [`omniASR_LLM_7B`](https://huggingface.co/facebook/omniASR-LLM-7B) | ASR with optional language conditioning | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.092 (~1x) |
| [`omniASR_LLM_7B_ZS`](https://huggingface.co/facebook/omniASR-LLM-7B-ZS) | Zero-Shot ASR | 7_810_900_608 | 30.0 GiB | ~20 GiB |&nbsp;0.194 (~0.5x) |
¹ (batch=1, audio_len=30s, BF16, A100)
² Relative speed to `omniASR_LLM_7B`
---
## Installation
The models were developed using [fairseq2](https://github.com/facebookresearch/fairseq2), a research-focused sequence modeling toolkit. While we provide a **reference** inference pipeline that works across platforms, audio support requires [libsndfile](https://github.com/facebookresearch/fairseq2?tab=readme-ov-file#system-dependencies) (Mac: `brew install libsndfile`; Windows may need an additional [setup](https://github.com/facebookresearch/fairseq2?tab=readme-ov-file#installing-on-windows)).
```bash
# using pip
pip install omnilingual-asr
# using uv
uv add omnilingual-asr
```
## Inference
```python
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)
```
## Supported Languages
To view the full list of 1600+ supported languages, you can access the language list [programmatically](/src/omnilingual_asr/models/wav2vec2_llama/lang_ids.py):
```python
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)
# Check if a specific language is supported
if "eng_Latn" in supported_langs:
print("English (Latin script) is supported!")
```
Languages follow the format `{language_code}_{script}`, for example `eng_Latn` - English (Latin script), `cmn_Hans` - Mandarin Chinese (Simplified), ...
---
## Training
To further finetune the released checkpoints on your own data, use our [data preparation guide](/workflows/dataprep/README.md) followed by the [finetuning recipe guide](/workflows/recipes/wav2vec2/asr/README.md).
---
## Citation
**BibTeX:**
```bibtex
@misc{omnilingualasr2025,
title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Can, Balioglu and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
year={2025},
url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}
[1]: https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file "GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages"
|
| Model Name | Features | Parameters | Download Size (FP32) | Inference VRAM¹ | Real-Time Factor¹ (relative speed)² |
|---|---|---|---|---|---|
omniASR_W2V_300M |
SSL | 317_390_592 | 1.2 GiB | ||
omniASR_W2V_1B |
SSL | 965_514_752 | 3.6 GiB | ||
omniASR_W2V_3B |
SSL | 3_064_124_672 | 12.0 GiB | ||
omniASR_W2V_7B |
SSL | 6_488_487_168 | 25.0 GiB | ||
omniASR_CTC_300M |
ASR | 325_494_996 | 1.3 GiB | ~2 GiB | 0.001 (96x) |
omniASR_CTC_1B |
ASR | 975_065_300 | 3.7 GiB | ~3 GiB | 0.002 (48x) |
omniASR_CTC_3B |
ASR | 3_080_423_636 | 12.0 GiB | ~8 GiB | 0.003 (32x) |
omniASR_CTC_7B |
ASR | 6_504_786_132 | 25.0 GiB | ~15 GiB | 0.006 (16x) |
omniASR_LLM_300M |
ASR with optional language conditioning | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.090 (~1x) |
omniASR_LLM_1B |
ASR with optional language conditioning | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.091 (~1x) |
omniASR_LLM_3B |
ASR with optional language conditioning | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.093 (~1x) |
omniASR_LLM_7B |
ASR with optional language conditioning | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.092 (~1x) |
omniASR_LLM_7B_ZS |
Zero-Shot ASR | 7_810_900_608 | 30.0 GiB | ~20 GiB | 0.194 (~0.5x) |
| ¹ (batch=1, audio_len=30s, BF16, A100) | |||||
² Relative speed to omniASR_LLM_7B |
Installation
The models were developed using fairseq2, a research-focused sequence modeling toolkit. While we provide a reference inference pipeline that works across platforms, audio support requires libsndfile (Mac: brew install libsndfile; Windows may need an additional setup).
# using pip
pip install omnilingual-asr
# using uv
uv add omnilingual-asrInference
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)Supported Languages
To view the full list of 1600+ supported languages, you can access the language list programmatically:
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)
# Check if a specific language is supported
if "eng_Latn" in supported_langs:
print("English (Latin script) is supported!")Languages follow the format {language_code}_{script}, for example eng_Latn - English (Latin script), cmn_Hans - Mandarin Chinese (Simplified), ...
Training
To further finetune the released checkpoints on your own data, use our data preparation guide followed by the finetuning recipe guide.
Citation
BibTeX:
@misc{omnilingualasr2025,
title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Can, Balioglu and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
year={2025},
url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}- Developed by: Meta AI / Omnilingual ASR Team([GitHub][1])
- Model type: End-to-end automatic speech recognition model (wav2vec2-style encoder with CTC head / encoder-decoder, depending on checkpoint).
- Language(s) (NLP): 1,600+ languages overall in Omnilingual ASR; this corpus release specifically covers 348 under-served languages across many writing systems (Latin, Arabic, Devanagari, etc.).([GitHub][1])
- License: Apache-2.0 (for the model and code), CC-BY-4.0 for the
facebook/omnilingual-asr-corpusdataset.([GitHub][1])
[1]: https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file "GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages"
[2]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus/blob/main/README.md "README.md · facebook/omnilingual-asr-corpus at main"
[3]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus?utm_source=chatgpt.com "facebook/omnilingual-asr-corpus · Datasets at ..."
[4]: https://venturebeat.com/ai/meta-returns-to-open-source-ai-with-omnilingual-asr-models-that-can?utm_source=chatgpt.com "Meta returns to open source AI with Omnilingual ASR ..."
[5]: https://huggingface.co/spaces/facebook/omniasr-transcriptions?utm_source=chatgpt.com "Omnilingual ASR Media Transcription"
[6]: https://huggingface.co/collections/bezzam/omnilingual-asr-1-600-languages?utm_source=chatgpt.com "Omnilingual ASR (1600+ Languages) - a bezzam Collection"
https://huggingface.co/facebook/omniASR-LLM-7Bhttps://huggingface.co/facebook/omniASR-LLM-7BThu, 27 Nov 2025 23:26:06 GMT<title>facebook/omniASR-LLM-3B</title>facebook/omniASR-LLM-3B Automatic Speech Recognition • Updated 8 days ago • 1
license: apache-2.0
datasets:
- facebook/omnilingual-asr-corpus
pipeline_tag: automatic-speech-recognition
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
<div align="center" style="lline-height: 1.2; font-size:16px; margin-bottom: 30px;">
<a href="https://huggingface.co/facebook" target="_blank" style="margin: 2px;">
🤗 Hugging Face
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr" target="_blank" style="margin: 2px;">
🐙 GitHub
</a> |
<a href="https://huggingface.co/spaces/facebook/omniasr-transcriptions" target="_blank" style="margin: 2px;">
🤖️ Demo
</a> |
<a href="https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/" target="_blank" style="margin: 2px;">
📃 Paper
</a> |
<a href="https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/" target="_blank" style="margin: 2px;">
📝 Blogpost
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr/blob/main/LICENSE" style="margin: 2px;">
📄 License: Apache 2.0
</a>
</div>
Model Card for omniASR-LLM-3B
Model Description
This model is part of the Omnilingual ASR family released by Meta AI. The original suite includes:
<!-- TODO : add new tokenizer, we'll get two tokenizer, add mssing speed numbers-->
| Model Name | Features | Parameters | Download Size (FP32) | Inference VRAM¹ | Real-Time Factor¹ (relative speed)² |
|---|---|---|---|---|---|
omniASR_W2V_300M |
SSL | 317_390_592 | 1.2 GiB | ||
omniASR_W2V_1B |
SSL | 965_514_752 | 3.6 GiB | ||
omniASR_W2V_3B |
SSL | 3_064_124_672 | 12.0 GiB | ||
omniASR_W2V_7B |
SSL | 6_488_487_168 | 25.0 GiB | ||
omniASR_CTC_300M |
ASR | 325_494_996 | 1.3 GiB | ~2 GiB | 0.001 (96x) |
omniASR_CTC_1B |
ASR | 975_065_300 | 3.7 GiB | ~3 GiB | 0.002 (48x) |
omniASR_CTC_3B |
ASR | 3_080_423_636 | 12.0 GiB | ~8 GiB | 0.003 (32x) |
omniASR_CTC_7B |
ASR | 6_504_786_132 | 25.0 GiB | ~15 GiB | 0.006 (16x) |
omniASR_LLM_300M |
ASR with optional language conditioning | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.090 (~1x) |
omniASR_LLM_1B |
ASR with optional language conditioning | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.091 (~1x) |
omniASR_LLM_3B |
ASR with optional language conditioning | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.093 (~1x) |
omniASR_LLM_7B |
ASR with optional language conditioning | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.092 (~1x) |
omniASR_LLM_7B_ZS |
Zero-Shot ASR | 7_810_900_608 | 30.0 GiB | ~20 GiB | 0.194 (~0.5x) |
| ¹ (batch=1, audio_len=30s, BF16, A100) | |||||
² Relative speed to omniASR_LLM_7B |
Installation
The models were developed using fairseq2, a research-focused sequence modeling toolkit. While we provide a reference inference pipeline that works across platforms, audio support requires libsndfile (Mac: brew install libsndfile; Windows may need an additional setup).
# using pip
pip install omnilingual-asr
# using uv
uv add omnilingual-asrInference
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)Supported Languages
To view the full list of 1600+ supported languages, you can access the language list programmatically:
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)
# Check if a specific language is supported
if "eng_Latn" in supported_langs:
print("English (Latin script) is supported!")Languages follow the format {language_code}_{script}, for example eng_Latn - English (Latin script), cmn_Hans - Mandarin Chinese (Simplified), ...
Training
To further finetune the released checkpoints on your own data, use our data preparation guide followed by the finetuning recipe guide.
Citation
BibTeX:
@misc{omnilingualasr2025,
title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Can, Balioglu and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
year={2025},
url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}- Developed by: Meta AI / Omnilingual ASR Team([GitHub][1])
- Model type: End-to-end automatic speech recognition model (wav2vec2-style encoder with CTC head / encoder-decoder, depending on checkpoint).
- Language(s) (NLP): 1,600+ languages overall in Omnilingual ASR; this corpus release specifically covers 348 under-served languages across many writing systems (Latin, Arabic, Devanagari, etc.).([GitHub][1])
- License: Apache-2.0 (for the model and code), CC-BY-4.0 for the
facebook/omnilingual-asr-corpusdataset.([GitHub][1])
[1]: https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file "GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages"
[2]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus/blob/main/README.md "README.md · facebook/omnilingual-asr-corpus at main"
[3]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus?utm_source=chatgpt.com "facebook/omnilingual-asr-corpus · Datasets at ..."
[4]: https://venturebeat.com/ai/meta-returns-to-open-source-ai-with-omnilingual-asr-models-that-can?utm_source=chatgpt.com "Meta returns to open source AI with Omnilingual ASR ..."
[5]: https://huggingface.co/spaces/facebook/omniasr-transcriptions?utm_source=chatgpt.com "Omnilingual ASR Media Transcription"
[6]: https://huggingface.co/collections/bezzam/omnilingual-asr-1-600-languages?utm_source=chatgpt.com "Omnilingual ASR (1600+ Languages) - a bezzam Collection"
https://huggingface.co/facebook/omniASR-LLM-3Bhttps://huggingface.co/facebook/omniASR-LLM-3BThu, 27 Nov 2025 23:08:56 GMT<title>facebook/omniASR-LLM-1B</title>facebook/omniASR-LLM-1B Automatic Speech Recognition • Updated 8 days ago • 3
license: apache-2.0
datasets:
- facebook/omnilingual-asr-corpus
pipeline_tag: automatic-speech-recognition
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
<div align="center" style="lline-height: 1.2; font-size:16px; margin-bottom: 30px;">
<a href="https://huggingface.co/facebook" target="_blank" style="margin: 2px;">
🤗 Hugging Face
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr" target="_blank" style="margin: 2px;">
🐙 GitHub
</a> |
<a href="https://huggingface.co/spaces/facebook/omniasr-transcriptions" target="_blank" style="margin: 2px;">
🤖️ Demo
</a> |
<a href="https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/" target="_blank" style="margin: 2px;">
📃 Paper
</a> |
<a href="https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/" target="_blank" style="margin: 2px;">
📝 Blogpost
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr/blob/main/LICENSE" style="margin: 2px;">
📄 License: Apache 2.0
</a>
</div>
Model Card for omniASR-LLM-1B
Model Description
This model is part of the Omnilingual ASR family released by Meta AI. The original suite includes:
<!-- TODO : add new tokenizer, we'll get two tokenizer, add mssing speed numbers-->
| Model Name | Features | Parameters | Download Size (FP32) | Inference VRAM¹ | Real-Time Factor¹ (relative speed)² |
|---|---|---|---|---|---|
omniASR_W2V_300M |
SSL | 317_390_592 | 1.2 GiB | ||
omniASR_W2V_1B |
SSL | 965_514_752 | 3.6 GiB | ||
omniASR_W2V_3B |
SSL | 3_064_124_672 | 12.0 GiB | ||
omniASR_W2V_7B |
SSL | 6_488_487_168 | 25.0 GiB | ||
omniASR_CTC_300M |
ASR | 325_494_996 | 1.3 GiB | ~2 GiB | 0.001 (96x) |
omniASR_CTC_1B |
ASR | 975_065_300 | 3.7 GiB | ~3 GiB | 0.002 (48x) |
omniASR_CTC_3B |
ASR | 3_080_423_636 | 12.0 GiB | ~8 GiB | 0.003 (32x) |
omniASR_CTC_7B |
ASR | 6_504_786_132 | 25.0 GiB | ~15 GiB | 0.006 (16x) |
omniASR_LLM_300M |
ASR with optional language conditioning | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.090 (~1x) |
omniASR_LLM_1B |
ASR with optional language conditioning | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.091 (~1x) |
omniASR_LLM_3B |
ASR with optional language conditioning | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.093 (~1x) |
omniASR_LLM_7B |
ASR with optional language conditioning | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.092 (~1x) |
omniASR_LLM_7B_ZS |
Zero-Shot ASR | 7_810_900_608 | 30.0 GiB | ~20 GiB | 0.194 (~0.5x) |
| ¹ (batch=1, audio_len=30s, BF16, A100) | |||||
² Relative speed to omniASR_LLM_7B |
Installation
The models were developed using fairseq2, a research-focused sequence modeling toolkit. While we provide a reference inference pipeline that works across platforms, audio support requires libsndfile (Mac: brew install libsndfile; Windows may need an additional setup).
# using pip
pip install omnilingual-asr
# using uv
uv add omnilingual-asrInference
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)Supported Languages
To view the full list of 1600+ supported languages, you can access the language list programmatically:
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)
# Check if a specific language is supported
if "eng_Latn" in supported_langs:
print("English (Latin script) is supported!")Languages follow the format {language_code}_{script}, for example eng_Latn - English (Latin script), cmn_Hans - Mandarin Chinese (Simplified), ...
Training
To further finetune the released checkpoints on your own data, use our data preparation guide followed by the finetuning recipe guide.
Citation
BibTeX:
@misc{omnilingualasr2025,
title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Can, Balioglu and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
year={2025},
url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}- Developed by: Meta AI / Omnilingual ASR Team([GitHub][1])
- Model type: End-to-end automatic speech recognition model (wav2vec2-style encoder with CTC head / encoder-decoder, depending on checkpoint).
- Language(s) (NLP): 1,600+ languages overall in Omnilingual ASR; this corpus release specifically covers 348 under-served languages across many writing systems (Latin, Arabic, Devanagari, etc.).([GitHub][1])
- License: Apache-2.0 (for the model and code), CC-BY-4.0 for the
facebook/omnilingual-asr-corpusdataset.([GitHub][1])
[1]: https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file "GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages"
[2]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus/blob/main/README.md "README.md · facebook/omnilingual-asr-corpus at main"
[3]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus?utm_source=chatgpt.com "facebook/omnilingual-asr-corpus · Datasets at ..."
[4]: https://venturebeat.com/ai/meta-returns-to-open-source-ai-with-omnilingual-asr-models-that-can?utm_source=chatgpt.com "Meta returns to open source AI with Omnilingual ASR ..."
[5]: https://huggingface.co/spaces/facebook/omniasr-transcriptions?utm_source=chatgpt.com "Omnilingual ASR Media Transcription"
[6]: https://huggingface.co/collections/bezzam/omnilingual-asr-1-600-languages?utm_source=chatgpt.com "Omnilingual ASR (1600+ Languages) - a bezzam Collection"
https://huggingface.co/facebook/omniASR-LLM-1Bhttps://huggingface.co/facebook/omniASR-LLM-1BThu, 27 Nov 2025 22:37:27 GMT<title>facebook/omniASR-LLM-300M</title>facebook/omniASR-LLM-300M Automatic Speech Recognition • Updated 8 days ago • 3
license: apache-2.0
datasets:
- facebook/omnilingual-asr-corpus
pipeline_tag: automatic-speech-recognition
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
<div align="center" style="lline-height: 1.2; font-size:16px; margin-bottom: 30px;">
<a href="https://huggingface.co/facebook" target="_blank" style="margin: 2px;">
🤗 Hugging Face
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr" target="_blank" style="margin: 2px;">
🐙 GitHub
</a> |
<a href="https://huggingface.co/spaces/facebook/omniasr-transcriptions" target="_blank" style="margin: 2px;">
🤖️ Demo
</a> |
<a href="https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/" target="_blank" style="margin: 2px;">
📃 Paper
</a> |
<a href="https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/" target="_blank" style="margin: 2px;">
📝 Blogpost
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr/blob/main/LICENSE" style="margin: 2px;">
📄 License: Apache 2.0
</a>
</div>
Model Card for omniASR-LLM-300M
Model Description
This model is part of the Omnilingual ASR family released by Meta AI. The original suite includes:
<!-- TODO : add new tokenizer, we'll get two tokenizer, add mssing speed numbers-->
| Model Name | Features | Parameters | Download Size (FP32) | Inference VRAM¹ | Real-Time Factor¹ (relative speed)² |
|---|---|---|---|---|---|
omniASR_W2V_300M |
SSL | 317_390_592 | 1.2 GiB | ||
omniASR_W2V_1B |
SSL | 965_514_752 | 3.6 GiB | ||
omniASR_W2V_3B |
SSL | 3_064_124_672 | 12.0 GiB | ||
omniASR_W2V_7B |
SSL | 6_488_487_168 | 25.0 GiB | ||
omniASR_CTC_300M |
ASR | 325_494_996 | 1.3 GiB | ~2 GiB | 0.001 (96x) |
omniASR_CTC_1B |
ASR | 975_065_300 | 3.7 GiB | ~3 GiB | 0.002 (48x) |
omniASR_CTC_3B |
ASR | 3_080_423_636 | 12.0 GiB | ~8 GiB | 0.003 (32x) |
omniASR_CTC_7B |
ASR | 6_504_786_132 | 25.0 GiB | ~15 GiB | 0.006 (16x) |
omniASR_LLM_300M |
ASR with optional language conditioning | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.090 (~1x) |
omniASR_LLM_1B |
ASR with optional language conditioning | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.091 (~1x) |
omniASR_LLM_3B |
ASR with optional language conditioning | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.093 (~1x) |
omniASR_LLM_7B |
ASR with optional language conditioning | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.092 (~1x) |
omniASR_LLM_7B_ZS |
Zero-Shot ASR | 7_810_900_608 | 30.0 GiB | ~20 GiB | 0.194 (~0.5x) |
| ¹ (batch=1, audio_len=30s, BF16, A100) | |||||
² Relative speed to omniASR_LLM_7B |
Installation
The models were developed using fairseq2, a research-focused sequence modeling toolkit. While we provide a reference inference pipeline that works across platforms, audio support requires libsndfile (Mac: brew install libsndfile; Windows may need an additional setup).
# using pip
pip install omnilingual-asr
# using uv
uv add omnilingual-asrInference
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)Supported Languages
To view the full list of 1600+ supported languages, you can access the language list programmatically:
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)
# Check if a specific language is supported
if "eng_Latn" in supported_langs:
print("English (Latin script) is supported!")Languages follow the format {language_code}_{script}, for example eng_Latn - English (Latin script), cmn_Hans - Mandarin Chinese (Simplified), ...
Training
To further finetune the released checkpoints on your own data, use our data preparation guide followed by the finetuning recipe guide.
Citation
BibTeX:
@misc{omnilingualasr2025,
title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Can, Balioglu and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
year={2025},
url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}- Developed by: Meta AI / Omnilingual ASR Team([GitHub][1])
- Model type: End-to-end automatic speech recognition model (wav2vec2-style encoder with CTC head / encoder-decoder, depending on checkpoint).
- Language(s) (NLP): 1,600+ languages overall in Omnilingual ASR; this corpus release specifically covers 348 under-served languages across many writing systems (Latin, Arabic, Devanagari, etc.).([GitHub][1])
- License: Apache-2.0 (for the model and code), CC-BY-4.0 for the
facebook/omnilingual-asr-corpusdataset.([GitHub][1])
[1]: https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file "GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages"
[2]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus/blob/main/README.md "README.md · facebook/omnilingual-asr-corpus at main"
[3]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus?utm_source=chatgpt.com "facebook/omnilingual-asr-corpus · Datasets at ..."
[4]: https://venturebeat.com/ai/meta-returns-to-open-source-ai-with-omnilingual-asr-models-that-can?utm_source=chatgpt.com "Meta returns to open source AI with Omnilingual ASR ..."
[5]: https://huggingface.co/spaces/facebook/omniasr-transcriptions?utm_source=chatgpt.com "Omnilingual ASR Media Transcription"
[6]: https://huggingface.co/collections/bezzam/omnilingual-asr-1-600-languages?utm_source=chatgpt.com "Omnilingual ASR (1600+ Languages) - a bezzam Collection"
https://huggingface.co/facebook/omniASR-LLM-300Mhttps://huggingface.co/facebook/omniASR-LLM-300MThu, 27 Nov 2025 22:04:09 GMT<title>facebook/omniASR-CTC-7B</title>facebook/omniASR-CTC-7B Automatic Speech Recognition • Updated 8 days ago • 1
license: apache-2.0
datasets:
- facebook/omnilingual-asr-corpus
pipeline_tag: automatic-speech-recognition
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
<div align="center" style="lline-height: 1.2; font-size:16px; margin-bottom: 30px;">
<a href="https://huggingface.co/facebook" target="_blank" style="margin: 2px;">
🤗 Hugging Face
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr" target="_blank" style="margin: 2px;">
🐙 GitHub
</a> |
<a href="https://huggingface.co/spaces/facebook/omniasr-transcriptions" target="_blank" style="margin: 2px;">
🤖️ Demo
</a> |
<a href="https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/" target="_blank" style="margin: 2px;">
📃 Paper
</a> |
<a href="https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/" target="_blank" style="margin: 2px;">
📝 Blogpost
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr/blob/main/LICENSE" style="margin: 2px;">
📄 License: Apache 2.0
</a>
</div>
Model Card for omniASR-CTC-7B
Model Description
This model is part of the Omnilingual ASR family released by Meta AI. The original suite includes:
<!-- TODO : add new tokenizer, we'll get two tokenizer, add mssing speed numbers-->
| Model Name | Features | Parameters | Download Size (FP32) | Inference VRAM¹ | Real-Time Factor¹ (relative speed)² |
|---|---|---|---|---|---|
omniASR_W2V_300M |
SSL | 317_390_592 | 1.2 GiB | ||
omniASR_W2V_1B |
SSL | 965_514_752 | 3.6 GiB | ||
omniASR_W2V_3B |
SSL | 3_064_124_672 | 12.0 GiB | ||
omniASR_W2V_7B |
SSL | 6_488_487_168 | 25.0 GiB | ||
omniASR_CTC_300M |
ASR | 325_494_996 | 1.3 GiB | ~2 GiB | 0.001 (96x) |
omniASR_CTC_1B |
ASR | 975_065_300 | 3.7 GiB | ~3 GiB | 0.002 (48x) |
omniASR_CTC_3B |
ASR | 3_080_423_636 | 12.0 GiB | ~8 GiB | 0.003 (32x) |
omniASR_CTC_7B |
ASR | 6_504_786_132 | 25.0 GiB | ~15 GiB | 0.006 (16x) |
omniASR_LLM_300M |
ASR with optional language conditioning | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.090 (~1x) |
omniASR_LLM_1B |
ASR with optional language conditioning | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.091 (~1x) |
omniASR_LLM_3B |
ASR with optional language conditioning | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.093 (~1x) |
omniASR_LLM_7B |
ASR with optional language conditioning | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.092 (~1x) |
omniASR_LLM_7B_ZS |
Zero-Shot ASR | 7_810_900_608 | 30.0 GiB | ~20 GiB | 0.194 (~0.5x) |
| ¹ (batch=1, audio_len=30s, BF16, A100) | |||||
² Relative speed to omniASR_LLM_7B |
Installation
The models were developed using fairseq2, a research-focused sequence modeling toolkit. While we provide a reference inference pipeline that works across platforms, audio support requires libsndfile (Mac: brew install libsndfile; Windows may need an additional setup).
# using pip
pip install omnilingual-asr
# using uv
uv add omnilingual-asrInference
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)Supported Languages
To view the full list of 1600+ supported languages, you can access the language list programmatically:
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)
# Check if a specific language is supported
if "eng_Latn" in supported_langs:
print("English (Latin script) is supported!")Languages follow the format {language_code}_{script}, for example eng_Latn - English (Latin script), cmn_Hans - Mandarin Chinese (Simplified), ...
Training
To further finetune the released checkpoints on your own data, use our data preparation guide followed by the finetuning recipe guide.
Citation
BibTeX:
@misc{omnilingualasr2025,
title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Can, Balioglu and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
year={2025},
url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}- Developed by: Meta AI / Omnilingual ASR Team([GitHub][1])
- Model type: End-to-end automatic speech recognition model (wav2vec2-style encoder with CTC head / encoder-decoder, depending on checkpoint).
- Language(s) (NLP): 1,600+ languages overall in Omnilingual ASR; this corpus release specifically covers 348 under-served languages across many writing systems (Latin, Arabic, Devanagari, etc.).([GitHub][1])
- License: Apache-2.0 (for the model and code), CC-BY-4.0 for the
facebook/omnilingual-asr-corpusdataset.([GitHub][1])
[1]: https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file "GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages"
[2]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus/blob/main/README.md "README.md · facebook/omnilingual-asr-corpus at main"
[3]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus?utm_source=chatgpt.com "facebook/omnilingual-asr-corpus · Datasets at ..."
[4]: https://venturebeat.com/ai/meta-returns-to-open-source-ai-with-omnilingual-asr-models-that-can?utm_source=chatgpt.com "Meta returns to open source AI with Omnilingual ASR ..."
[5]: https://huggingface.co/spaces/facebook/omniasr-transcriptions?utm_source=chatgpt.com "Omnilingual ASR Media Transcription"
[6]: https://huggingface.co/collections/bezzam/omnilingual-asr-1-600-languages?utm_source=chatgpt.com "Omnilingual ASR (1600+ Languages) - a bezzam Collection"
https://huggingface.co/facebook/omniASR-CTC-7Bhttps://huggingface.co/facebook/omniASR-CTC-7BThu, 27 Nov 2025 21:31:20 GMT<title>facebook/omniASR-CTC-3B</title>facebook/omniASR-CTC-3B Automatic Speech Recognition • Updated 8 days ago • 2
license: apache-2.0
datasets:
- facebook/omnilingual-asr-corpus
pipeline_tag: automatic-speech-recognition
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
<div align="center" style="lline-height: 1.2; font-size:16px; margin-bottom: 30px;">
<a href="https://huggingface.co/facebook" target="_blank" style="margin: 2px;">
🤗 Hugging Face
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr" target="_blank" style="margin: 2px;">
🐙 GitHub
</a> |
<a href="https://huggingface.co/spaces/facebook/omniasr-transcriptions" target="_blank" style="margin: 2px;">
🤖️ Demo
</a> |
<a href="https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/" target="_blank" style="margin: 2px;">
📃 Paper
</a> |
<a href="https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/" target="_blank" style="margin: 2px;">
📝 Blogpost
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr/blob/main/LICENSE" style="margin: 2px;">
📄 License: Apache 2.0
</a>
</div>
Model Card for omniASR-CTC-3B
Model Description
This model is part of the Omnilingual ASR family released by Meta AI. The original suite includes:
<!-- TODO : add new tokenizer, we'll get two tokenizer, add mssing speed numbers-->
| Model Name | Features | Parameters | Download Size (FP32) | Inference VRAM¹ | Real-Time Factor¹ (relative speed)² |
|---|---|---|---|---|---|
omniASR_W2V_300M |
SSL | 317_390_592 | 1.2 GiB | ||
omniASR_W2V_1B |
SSL | 965_514_752 | 3.6 GiB | ||
omniASR_W2V_3B |
SSL | 3_064_124_672 | 12.0 GiB | ||
omniASR_W2V_7B |
SSL | 6_488_487_168 | 25.0 GiB | ||
omniASR_CTC_300M |
ASR | 325_494_996 | 1.3 GiB | ~2 GiB | 0.001 (96x) |
omniASR_CTC_1B |
ASR | 975_065_300 | 3.7 GiB | ~3 GiB | 0.002 (48x) |
omniASR_CTC_3B |
ASR | 3_080_423_636 | 12.0 GiB | ~8 GiB | 0.003 (32x) |
omniASR_CTC_7B |
ASR | 6_504_786_132 | 25.0 GiB | ~15 GiB | 0.006 (16x) |
omniASR_LLM_300M |
ASR with optional language conditioning | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.090 (~1x) |
omniASR_LLM_1B |
ASR with optional language conditioning | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.091 (~1x) |
omniASR_LLM_3B |
ASR with optional language conditioning | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.093 (~1x) |
omniASR_LLM_7B |
ASR with optional language conditioning | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.092 (~1x) |
omniASR_LLM_7B_ZS |
Zero-Shot ASR | 7_810_900_608 | 30.0 GiB | ~20 GiB | 0.194 (~0.5x) |
| ¹ (batch=1, audio_len=30s, BF16, A100) | |||||
² Relative speed to omniASR_LLM_7B |
Installation
The models were developed using fairseq2, a research-focused sequence modeling toolkit. While we provide a reference inference pipeline that works across platforms, audio support requires libsndfile (Mac: brew install libsndfile; Windows may need an additional setup).
# using pip
pip install omnilingual-asr
# using uv
uv add omnilingual-asrInference
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)Supported Languages
To view the full list of 1600+ supported languages, you can access the language list programmatically:
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)
# Check if a specific language is supported
if "eng_Latn" in supported_langs:
print("English (Latin script) is supported!")Languages follow the format {language_code}_{script}, for example eng_Latn - English (Latin script), cmn_Hans - Mandarin Chinese (Simplified), ...
Training
To further finetune the released checkpoints on your own data, use our data preparation guide followed by the finetuning recipe guide.
Citation
BibTeX:
@misc{omnilingualasr2025,
title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Can, Balioglu and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
year={2025},
url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}- Developed by: Meta AI / Omnilingual ASR Team([GitHub][1])
- Model type: End-to-end automatic speech recognition model (wav2vec2-style encoder with CTC head / encoder-decoder, depending on checkpoint).
- Language(s) (NLP): 1,600+ languages overall in Omnilingual ASR; this corpus release specifically covers 348 under-served languages across many writing systems (Latin, Arabic, Devanagari, etc.).([GitHub][1])
- License: Apache-2.0 (for the model and code), CC-BY-4.0 for the
facebook/omnilingual-asr-corpusdataset.([GitHub][1])
[1]: https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file "GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages"
[2]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus/blob/main/README.md "README.md · facebook/omnilingual-asr-corpus at main"
[3]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus?utm_source=chatgpt.com "facebook/omnilingual-asr-corpus · Datasets at ..."
[4]: https://venturebeat.com/ai/meta-returns-to-open-source-ai-with-omnilingual-asr-models-that-can?utm_source=chatgpt.com "Meta returns to open source AI with Omnilingual ASR ..."
[5]: https://huggingface.co/spaces/facebook/omniasr-transcriptions?utm_source=chatgpt.com "Omnilingual ASR Media Transcription"
[6]: https://huggingface.co/collections/bezzam/omnilingual-asr-1-600-languages?utm_source=chatgpt.com "Omnilingual ASR (1600+ Languages) - a bezzam Collection"
https://huggingface.co/facebook/omniASR-CTC-3Bhttps://huggingface.co/facebook/omniASR-CTC-3BThu, 27 Nov 2025 15:47:08 GMT<title>facebook/omniASR-CTC-1B</title>facebook/omniASR-CTC-1B Automatic Speech Recognition • Updated 8 days ago • 1
license: apache-2.0
datasets:
- facebook/omnilingual-asr-corpus
pipeline_tag: automatic-speech-recognition
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
<div align="center" style="lline-height: 1.2; font-size:16px; margin-bottom: 30px;">
<a href="https://huggingface.co/facebook" target="_blank" style="margin: 2px;">
🤗 Hugging Face
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr" target="_blank" style="margin: 2px;">
🐙 GitHub
</a> |
<a href="https://huggingface.co/spaces/facebook/omniasr-transcriptions" target="_blank" style="margin: 2px;">
🤖️ Demo
</a> |
<a href="https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/" target="_blank" style="margin: 2px;">
📃 Paper
</a> |
<a href="https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/" target="_blank" style="margin: 2px;">
📝 Blogpost
</a> |
<a href="https://github.com/faceb
</details>
...
http://localhost:1200/huggingface/models/ianyang02 - Success ✔️<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<title>Huggingface ianyang02 Models</title>
<link>https://huggingface.co/ianyang02/models?sort=created</link>
<atom:link href="http://localhost:1200/huggingface/models/ianyang02" rel="self" type="application/rss+xml"></atom:link>
<description>Huggingface ianyang02 Models - Powered by RSSHub</description>
<generator>RSSHub</generator>
<webMaster>contact@rsshub.app (RSSHub)</webMaster>
<language>en</language>
<lastBuildDate>Sat, 06 Dec 2025 11:25:13 GMT</lastBuildDate>
<ttl>5</ttl>
<item>
<title>ianyang02/aita_qwen3-30b</title>
<description>ianyang02/aita_qwen3-30b Updated about 13 hours ago</description>
<link>https://huggingface.co/ianyang02/aita_qwen3-30b</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_qwen3-30b</guid>
<pubDate>Fri, 05 Dec 2025 14:17:52 GMT</pubDate>
</item>
<item>
<title>ianyang02/aita_qwen3-4b_dpo</title>
<description>ianyang02/aita_qwen3-4b_dpo Updated 15 days ago
---
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: transformers
model_name: aita_qwen3-4b_dpo
tags:
- generated_from_trainer
- trl
- dpo
licence: license
---
# Model Card for aita_qwen3-4b_dpo
This model is a fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507).
It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start
```python
from transformers import pipeline
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="ianyang02/aita_qwen3-4b_dpo", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer">](https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/4xnikck8)
This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).
### Framework versions
- TRL: 0.24.0
- Transformers: 4.57.1
- Pytorch: 2.9.0
- Datasets: 4.3.0
- Tokenizers: 0.22.1
## Citations
Cite DPO as:
```bibtex
@inproceedings{rafailov2023direct,
title = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
year = 2023,
booktitle = {Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023},
url = {http://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html},
editor = {Alice Oh and Tristan Naumann and Amir Globerson and Kate Saenko and Moritz Hardt and Sergey Levine},
}
```
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```</description>
<link>https://huggingface.co/ianyang02/aita_qwen3-4b_dpo</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_qwen3-4b_dpo</guid>
<pubDate>Fri, 21 Nov 2025 15:09:21 GMT</pubDate>
</item>
<item>
<title>ianyang02/ppo_model_qwen3-4b_aita_h200_2</title>
<description>ianyang02/ppo_model_qwen3-4b_aita_h200_2 Text Generation • Updated 15 days ago • 66
---
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: peft
model_name: ppo_model_qwen3-4b_aita_h200_2
tags:
- base_model:adapter:Qwen/Qwen3-4B-Instruct-2507
- lora
- transformers
licence: license
pipeline_tag: text-generation
---
# Model Card for ppo_model_qwen3-4b_aita_h200_2
This model is a fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507).
It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start
```python
from transformers import pipeline
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="ianyang02/ppo_model_qwen3-4b_aita_h200_2", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer">](https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/x99g8ppd)
This model was trained with PPO, a method introduced in [Fine-Tuning Language Models from Human Preferences](https://huggingface.co/papers/1909.08593).
### Framework versions
- PEFT 0.18.0
- TRL: 0.25.1
- Transformers: 4.57.1
- Pytorch: 2.7.0.dev20250224+cu126
- Datasets: 4.4.1
- Tokenizers: 0.22.1
## Citations
Cite PPO as:
```bibtex
@article{mziegler2019fine-tuning,
title = {{Fine-Tuning Language Models from Human Preferences}},
author = {Daniel M. Ziegler and Nisan Stiennon and Jeffrey Wu and Tom B. Brown and Alec Radford and Dario Amodei and Paul F. Christiano and Geoffrey Irving},
year = 2019,
eprint = {arXiv:1909.08593}
}
```
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```</description>
<link>https://huggingface.co/ianyang02/ppo_model_qwen3-4b_aita_h200_2</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/ppo_model_qwen3-4b_aita_h200_2</guid>
<pubDate>Fri, 21 Nov 2025 08:19:14 GMT</pubDate>
</item>
<item>
<title>ianyang02/aita_qwen3-4b_2</title>
<description>ianyang02/aita_qwen3-4b_2 Text Classification • 4B • Updated 16 days ago • 23
---
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: transformers
model_name: aita_qwen3-4b_2
tags:
- generated_from_trainer
- trl
- reward-trainer
licence: license
---
# Model Card for aita_qwen3-4b_2
This model is a fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507).
It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start
```python
from transformers import pipeline
text = "The capital of France is Paris."
rewarder = pipeline(model="ianyang02/aita_qwen3-4b_2", device="cuda")
output = rewarder(text)[0]
print(output["score"])
```
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer">](https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/lacazue4)
This model was trained with Reward.
### Framework versions
- TRL: 0.25.1
- Transformers: 4.57.1
- Pytorch: 2.5.0.dev20240818+cu124
- Datasets: 4.4.1
- Tokenizers: 0.22.1
## Citations
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```</description>
<link>https://huggingface.co/ianyang02/aita_qwen3-4b_2</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_qwen3-4b_2</guid>
<pubDate>Wed, 19 Nov 2025 23:02:33 GMT</pubDate>
</item>
<item>
<title>ianyang02/aita_qwen3-4b_2_checkpoint_2</title>
<description>ianyang02/aita_qwen3-4b_2_checkpoint_2 Text Classification • 4B • Updated 16 days ago • 25
---
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: transformers
model_name: aita_qwen3-4b_2
tags:
- generated_from_trainer
- trl
- reward-trainer
licence: license
---
# Model Card for aita_qwen3-4b_2
This model is a fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507).
It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start
```python
from transformers import pipeline
text = "The capital of France is Paris."
rewarder = pipeline(model="ianyang02/aita_qwen3-4b_2", device="cuda")
output = rewarder(text)[0]
print(output["score"])
```
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer">](https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/lacazue4)
This model was trained with Reward.
### Framework versions
- TRL: 0.25.1
- Transformers: 4.57.1
- Pytorch: 2.5.0.dev20240818+cu124
- Datasets: 4.4.1
- Tokenizers: 0.22.1
## Citations
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```</description>
<link>https://huggingface.co/ianyang02/aita_qwen3-4b_2_checkpoint_2</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_qwen3-4b_2_checkpoint_2</guid>
<pubDate>Wed, 19 Nov 2025 19:52:00 GMT</pubDate>
</item>
<item>
<title>ianyang02/aita_qwen3-4b_2_checkpoint_1</title>
<description>ianyang02/aita_qwen3-4b_2_checkpoint_1 Text Classification • 4B • Updated 16 days ago • 4
---
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: transformers
model_name: aita_qwen3-4b_2
tags:
- generated_from_trainer
- trl
- reward-trainer
licence: license
---
# Model Card for aita_qwen3-4b_2
This model is a fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507).
It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start
```python
from transformers import pipeline
text = "The capital of France is Paris."
rewarder = pipeline(model="ianyang02/aita_qwen3-4b_2", device="cuda")
output = rewarder(text)[0]
print(output["score"])
```
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer">](https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/lacazue4)
This model was trained with Reward.
### Framework versions
- TRL: 0.25.1
- Transformers: 4.57.1
- Pytorch: 2.5.0.dev20240818+cu124
- Datasets: 4.4.1
- Tokenizers: 0.22.1
## Citations
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```</description>
<link>https://huggingface.co/ianyang02/aita_qwen3-4b_2_checkpoint_1</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_qwen3-4b_2_checkpoint_1</guid>
<pubDate>Wed, 19 Nov 2025 16:40:34 GMT</pubDate>
</item>
<item>
<title>ianyang02/ppo_model_qwen3-4b_aita_h200</title>
<description>ianyang02/ppo_model_qwen3-4b_aita_h200 Updated 17 days ago
---
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: transformers
model_name: ppo_model_qwen3-4b_aita_h200
tags:
- generated_from_trainer
- trl
- ppo
licence: license
---
# Model Card for ppo_model_qwen3-4b_aita_h200
This model is a fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507).
It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start
```python
from transformers import pipeline
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="ianyang02/ppo_model_qwen3-4b_aita_h200", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer">](https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/ht40cr4d)
This model was trained with PPO, a method introduced in [Fine-Tuning Language Models from Human Preferences](https://huggingface.co/papers/1909.08593).
### Framework versions
- TRL: 0.24.0
- Transformers: 4.57.1
- Pytorch: 2.9.0
- Datasets: 4.3.0
- Tokenizers: 0.22.1
## Citations
Cite PPO as:
```bibtex
@article{mziegler2019fine-tuning,
title = {{Fine-Tuning Language Models from Human Preferences}},
author = {Daniel M. Ziegler and Nisan Stiennon and Jeffrey Wu and Tom B. Brown and Alec Radford and Dario Amodei and Paul F. Christiano and Geoffrey Irving},
year = 2019,
eprint = {arXiv:1909.08593}
}
```
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```</description>
<link>https://huggingface.co/ianyang02/ppo_model_qwen3-4b_aita_h200</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/ppo_model_qwen3-4b_aita_h200</guid>
<pubDate>Tue, 18 Nov 2025 16:21:51 GMT</pubDate>
</item>
<item>
<title>ianyang02/aita_qwen3-4b</title>
<description>ianyang02/aita_qwen3-4b Text Classification • 4B • Updated 20 days ago • 93
---
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: transformers
model_name: aita_qwen3-4b
tags:
- generated_from_trainer
- trl
- reward-trainer
licence: license
---
# Model Card for aita_qwen3-4b
This model is a fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507).
It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start
```python
from transformers import pipeline
text = "The capital of France is Paris."
rewarder = pipeline(model="ianyang02/aita_qwen3-4b", device="cuda")
output = rewarder(text)[0]
print(output["score"])
```
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer">](https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/ofxhxi45)
This model was trained with Reward.
### Framework versions
- TRL: 0.24.0
- Transformers: 4.57.1
- Pytorch: 2.9.0
- Datasets: 4.3.0
- Tokenizers: 0.22.1
## Citations
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```</description>
<link>https://huggingface.co/ianyang02/aita_qwen3-4b</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_qwen3-4b</guid>
<pubDate>Sun, 16 Nov 2025 01:32:06 GMT</pubDate>
</item>
<item>
<title>ianyang02/aita_fine_tuned_Qwen3_0.6B</title>
<description>ianyang02/aita_fine_tuned_Qwen3_0.6B Updated 24 days ago</description>
<link>https://huggingface.co/ianyang02/aita_fine_tuned_Qwen3_0.6B</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_fine_tuned_Qwen3_0.6B</guid>
<pubDate>Tue, 11 Nov 2025 16:18:49 GMT</pubDate>
</item>
<item>
<title>ianyang02/aita_Qwen3-0.6B</title>
<description>ianyang02/aita_Qwen3-0.6B Text Classification • 0.6B • Updated 28 days ago • 50
---
base_model: Qwen/Qwen3-0.6B
library_name: transformers
model_name: aita_Qwen3-0.6B
tags:
- generated_from_trainer
- reward-trainer
- trl
licence: license
---
# Model Card for aita_Qwen3-0.6B
This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B).
It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start
```python
from transformers import pipeline
text = "The capital of France is Paris."
rewarder = pipeline(model="ianyang02/aita_Qwen3-0.6B", device="cuda")
output = rewarder(text)[0]
print(output["score"])
```
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer">](https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/7xwpy737)
This model was trained with Reward.
### Framework versions
- TRL: 0.24.0
- Transformers: 4.57.1
- Pytorch: 2.9.0
- Datasets: 4.3.0
- Tokenizers: 0.22.1
## Citations
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```</description>
<link>https://huggingface.co/ianyang02/aita_Qwen3-0.6B</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_Qwen3-0.6B</guid>
<pubDate>Sat, 08 Nov 2025 07:48:49 GMT</pubDate>
</item>
</channel>
</rss> |
|
Successfully generated as following: http://localhost:1200/huggingface/models/deepseek-ai - Success ✔️<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<title>Huggingface deepseek-ai Models</title>
<link>https://huggingface.co/deepseek-ai/models?sort=created</link>
<atom:link href="http://localhost:1200/huggingface/models/deepseek-ai" rel="self" type="application/rss+xml"></atom:link>
<description>Huggingface deepseek-ai Models - Powered by RSSHub</description>
<generator>RSSHub</generator>
<webMaster>contact@rsshub.app (RSSHub)</webMaster>
<language>en</language>
<lastBuildDate>Sun, 07 Dec 2025 03:02:24 GMT</lastBuildDate>
<ttl>5</ttl>
<item>
<title>deepseek-ai/DeepSeek-V3.2-Speciale</title>
<description>deepseek-ai/DeepSeek-V3.2-Speciale Text Generation • 685B • Updated 6 days ago • 4.72k • 521<hr>
<p>license: mit
library_name: transformers
base_model:</p>
<ul>
<li>deepseek-ai/DeepSeek-V3.2-Exp-Base
base_model_relation: finetune</li>
</ul>
<hr>
<h1>DeepSeek-V3.2: Efficient Reasoning &amp; Agentic AI</h1>
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" referrerpolicy="no-referrer">
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%A4%96%20Chat-DeepSeek%20V3-536af5?color=536af5&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
<img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&amp;logoColor=white&amp;color=7289da" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
<img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/deepseek-ai/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&amp;color=f5de53" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<p align="center">
<a href="https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf"><b>Technical Report</b>👁️</a>
</p>
<h2>Introduction</h2>
<p>We introduce <strong>DeepSeek-V3.2</strong>, a model that harmonizes high computational efficiency with superior reasoning and agent performance. Our approach is built upon three key technical breakthroughs:</p>
<ol>
<li><strong>DeepSeek Sparse Attention (DSA):</strong> We introduce DSA, an efficient attention mechanism that substantially reduces computational complexity while preserving model performance, specifically optimized for long-context scenarios.</li>
<li><strong>Scalable Reinforcement Learning Framework:</strong> By implementing a robust RL protocol and scaling post-training compute, <em>DeepSeek-V3.2</em> performs comparably to GPT-5. Notably, our high-compute variant, <strong>DeepSeek-V3.2-Speciale</strong>, <strong>surpasses GPT-5</strong> and exhibits reasoning proficiency on par with Gemini-3.0-Pro.
<ul>
<li><em>Achievement:</em> 🥇 <strong>Gold-medal performance</strong> in the 2025 International Mathematical Olympiad (IMO) and International Olympiad in Informatics (IOI).</li>
</ul>
</li>
<li><strong>Large-Scale Agentic Task Synthesis Pipeline:</strong> To integrate <strong>reasoning into tool-use</strong> scenarios, we developed a novel synthesis pipeline that systematically generates training data at scale. This facilitates scalable agentic post-training, improving compliance and generalization in complex interactive environments.</li>
</ol>
<div align="center">
<img src="https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale/resolve/main/assets/benchmark.png" referrerpolicy="no-referrer">
</div>
<p>We have also released the final submissions for IOI 2025, ICPC World Finals, IMO 2025 and CMO 2025, which were selected based on our designed pipeline. These materials are provided for the community to conduct secondary verification. The files can be accessed at <code>assets/olympiad_cases</code>.</p>
<h2>Chat Template</h2>
<p>DeepSeek-V3.2 introduces significant updates to its chat template compared to prior versions. The primary changes involve a revised format for tool calling and the introduction of a "thinking with tools" capability.</p>
<p>To assist the community in understanding and adapting to this new template, we have provided a dedicated <code>encoding</code> folder, which contains Python scripts and test cases demonstrating how to encode messages in OpenAI-compatible format into input strings for the model and how to parse the model's text output.</p>
<p>A brief example is illustrated below:</p>
<pre><code class="language-python">import transformers
# encoding/encoding_dsv32.py
from encoding_dsv32 import encode_messages, parse_message_from_completion_text
tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.2")
messages = [
{"role": "user", "content": "hello"},
{"role": "assistant", "content": "Hello! I am DeepSeek.", "reasoning_content": "thinking..."},
{"role": "user", "content": "1+1=?"}
]
encode_config = dict(thinking_mode="thinking", drop_thinking=True, add_default_bos_token=True)
# messages -&gt; string
prompt = encode_messages(messages, **encode_config)
# Output: "&lt;|begin▁of▁sentence|&gt;&lt;|User|&gt;hello&lt;|Assistant|&gt;&lt;/think&gt;Hello! I am DeepSeek.&lt;|end▁of▁sentence|&gt;&lt;|User|&gt;1+1=?&lt;|Assistant|&gt;&lt;think&gt;"
# string -&gt; tokens
tokens = tokenizer.encode(prompt)
# Output: [0, 128803, 33310, 128804, 128799, 19923, 3, 342, 1030, 22651, 4374, 1465, 16, 1, 128803, 19, 13, 19, 127252, 128804, 128798]
</code></pre>
<p>Important Notes:</p>
<ol>
<li>This release does not include a Jinja-format chat template. Please refer to the Python code mentioned above.</li>
<li>The output parsing function included in the code is designed to handle well-formatted strings only. It does not attempt to correct or recover from malformed output that the model might occasionally generate. It is not suitable for production use without robust error handling.</li>
<li>A new role named <code>developer</code> has been introduced in the chat template. This role is dedicated exclusively to search agent scenarios and is designated for no other tasks. The official API does not accept messages assigned to <code>developer</code>.</li>
</ol>
<h2>How to Run Locally</h2>
<p>The model structure of DeepSeek-V3.2 and DeepSeek-V3.2-Speciale are the same as DeepSeek-V3.2-Exp. Please visit <a href="https://github.com/deepseek-ai/DeepSeek-V3.2-Exp">DeepSeek-V3.2-Exp</a> repo for more information about running this model locally.</p>
<p>Usage Recommendations:</p>
<ol>
<li>For local deployment, we recommend setting the sampling parameters to <code>temperature = 1.0, top_p = 0.95</code>.</li>
<li>Please note that the DeepSeek-V3.2-Speciale variant is designed exclusively for deep reasoning tasks and does not support the tool-calling functionality.</li>
</ol>
<h2>License</h2>
<p>This repository and the model weights are licensed under the <a href="https://huggingface.co/deepseek-ai/LICENSE">MIT License</a>.</p>
<h2>Citation</h2>
<pre><code>@misc{deepseekai2025deepseekv32,
title={DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models},
author={DeepSeek-AI},
year={2025},
}
</code></pre>
<h2>Contact</h2>
<p>If you have any questions, please raise an issue or contact us at <a href="https://huggingface.co/deepseek-ai/service@deepseek.com">service@deepseek.com</a>.</p>
</description>
<link>https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale</link>
<guid isPermaLink="false">https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale</guid>
<pubDate>Mon, 01 Dec 2025 03:06:03 GMT</pubDate>
</item>
<item>
<title>deepseek-ai/DeepSeek-V3.2</title>
<description>deepseek-ai/DeepSeek-V3.2 Text Generation • 685B • Updated 6 days ago • 18.1k • • 752<hr>
<p>license: mit
library_name: transformers
base_model:</p>
<ul>
<li>deepseek-ai/DeepSeek-V3.2-Exp-Base
base_model_relation: finetune</li>
</ul>
<hr>
<h1>DeepSeek-V3.2: Efficient Reasoning &amp; Agentic AI</h1>
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" referrerpolicy="no-referrer">
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%A4%96%20Chat-DeepSeek%20V3-536af5?color=536af5&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
<img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&amp;logoColor=white&amp;color=7289da" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
<img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/deepseek-ai/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&amp;color=f5de53" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<p align="center">
<a href="https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf"><b>Technical Report</b>👁️</a>
</p>
<h2>Introduction</h2>
<p>We introduce <strong>DeepSeek-V3.2</strong>, a model that harmonizes high computational efficiency with superior reasoning and agent performance. Our approach is built upon three key technical breakthroughs:</p>
<ol>
<li><strong>DeepSeek Sparse Attention (DSA):</strong> We introduce DSA, an efficient attention mechanism that substantially reduces computational complexity while preserving model performance, specifically optimized for long-context scenarios.</li>
<li><strong>Scalable Reinforcement Learning Framework:</strong> By implementing a robust RL protocol and scaling post-training compute, <em>DeepSeek-V3.2</em> performs comparably to GPT-5. Notably, our high-compute variant, <strong>DeepSeek-V3.2-Speciale</strong>, <strong>surpasses GPT-5</strong> and exhibits reasoning proficiency on par with Gemini-3.0-Pro.
<ul>
<li><em>Achievement:</em> 🥇 <strong>Gold-medal performance</strong> in the 2025 International Mathematical Olympiad (IMO) and International Olympiad in Informatics (IOI).</li>
</ul>
</li>
<li><strong>Large-Scale Agentic Task Synthesis Pipeline:</strong> To integrate <strong>reasoning into tool-use</strong> scenarios, we developed a novel synthesis pipeline that systematically generates training data at scale. This facilitates scalable agentic post-training, improving compliance and generalization in complex interactive environments.</li>
</ol>
<div align="center">
<img src="https://huggingface.co/deepseek-ai/DeepSeek-V3.2/resolve/main/assets/benchmark.png" referrerpolicy="no-referrer">
</div>
<p>We have also released the final submissions for IOI 2025, ICPC World Finals, IMO 2025 and CMO 2025, which were selected based on our designed pipeline. These materials are provided for the community to conduct secondary verification. The files can be accessed at <code>assets/olympiad_cases</code>.</p>
<h2>Chat Template</h2>
<p>DeepSeek-V3.2 introduces significant updates to its chat template compared to prior versions. The primary changes involve a revised format for tool calling and the introduction of a "thinking with tools" capability.</p>
<p>To assist the community in understanding and adapting to this new template, we have provided a dedicated <code>encoding</code> folder, which contains Python scripts and test cases demonstrating how to encode messages in OpenAI-compatible format into input strings for the model and how to parse the model's text output.</p>
<p>A brief example is illustrated below:</p>
<pre><code class="language-python">import transformers
# encoding/encoding_dsv32.py
from encoding_dsv32 import encode_messages, parse_message_from_completion_text
tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.2")
messages = [
{"role": "user", "content": "hello"},
{"role": "assistant", "content": "Hello! I am DeepSeek.", "reasoning_content": "thinking..."},
{"role": "user", "content": "1+1=?"}
]
encode_config = dict(thinking_mode="thinking", drop_thinking=True, add_default_bos_token=True)
# messages -&gt; string
prompt = encode_messages(messages, **encode_config)
# Output: "&lt;|begin▁of▁sentence|&gt;&lt;|User|&gt;hello&lt;|Assistant|&gt;&lt;/think&gt;Hello! I am DeepSeek.&lt;|end▁of▁sentence|&gt;&lt;|User|&gt;1+1=?&lt;|Assistant|&gt;&lt;think&gt;"
# string -&gt; tokens
tokens = tokenizer.encode(prompt)
# Output: [0, 128803, 33310, 128804, 128799, 19923, 3, 342, 1030, 22651, 4374, 1465, 16, 1, 128803, 19, 13, 19, 127252, 128804, 128798]
</code></pre>
<p>Important Notes:</p>
<ol>
<li>This release does not include a Jinja-format chat template. Please refer to the Python code mentioned above.</li>
<li>The output parsing function included in the code is designed to handle well-formatted strings only. It does not attempt to correct or recover from malformed output that the model might occasionally generate. It is not suitable for production use without robust error handling.</li>
<li>A new role named <code>developer</code> has been introduced in the chat template. This role is dedicated exclusively to search agent scenarios and is designated for no other tasks. The official API does not accept messages assigned to <code>developer</code>.</li>
</ol>
<h2>How to Run Locally</h2>
<p>The model structure of DeepSeek-V3.2 and DeepSeek-V3.2-Speciale are the same as DeepSeek-V3.2-Exp. Please visit <a href="https://github.com/deepseek-ai/DeepSeek-V3.2-Exp">DeepSeek-V3.2-Exp</a> repo for more information about running this model locally.</p>
<p>Usage Recommendations:</p>
<ol>
<li>For local deployment, we recommend setting the sampling parameters to <code>temperature = 1.0, top_p = 0.95</code>.</li>
<li>Please note that the DeepSeek-V3.2-Speciale variant is designed exclusively for deep reasoning tasks and does not support the tool-calling functionality.</li>
</ol>
<h2>License</h2>
<p>This repository and the model weights are licensed under the <a href="https://huggingface.co/deepseek-ai/LICENSE">MIT License</a>.</p>
<h2>Citation</h2>
<pre><code>@misc{deepseekai2025deepseekv32,
title={DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models},
author={DeepSeek-AI},
year={2025},
}
</code></pre>
<h2>Contact</h2>
<p>If you have any questions, please raise an issue or contact us at <a href="https://huggingface.co/deepseek-ai/service@deepseek.com">service@deepseek.com</a>.</p>
</description>
<link>https://huggingface.co/deepseek-ai/DeepSeek-V3.2</link>
<guid isPermaLink="false">https://huggingface.co/deepseek-ai/DeepSeek-V3.2</guid>
<pubDate>Mon, 01 Dec 2025 03:04:59 GMT</pubDate>
</item>
<item>
<title>deepseek-ai/DeepSeek-Math-V2</title>
<description>deepseek-ai/DeepSeek-Math-V2 Text Generation • 685B • Updated 10 days ago • 8.96k • 637<hr>
<p>license: apache-2.0
library_name: transformers
base_model:</p>
<ul>
<li>deepseek-ai/DeepSeek-Math-V2</li>
</ul>
<hr>
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" referrerpolicy="no-referrer">
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.deepseek.com/"><img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" referrerpolicy="no-referrer"></a>
<a href="https://chat.deepseek.com/"><img alt="Chat" src="https://img.shields.io/badge/%F0%9F%A4%96%20Chat-DeepSeek%20V3-536af5?color=536af5&amp;logoColor=white" referrerpolicy="no-referrer"></a>
<a href="https://huggingface.co/deepseek-ai"><img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&amp;logoColor=white" referrerpolicy="no-referrer"></a>
<br>
<a href="https://discord.gg/Tc7c45Zzu5"><img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&amp;logoColor=white&amp;color=7289da" referrerpolicy="no-referrer"></a>
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true"><img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&amp;logoColor=white" referrerpolicy="no-referrer"></a>
<a href="https://twitter.com/deepseek_ai"><img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&amp;logoColor=white" referrerpolicy="no-referrer"></a>
<br>
<a href="https://huggingface.co/deepseek-ai/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-Apache%202.0-f5de53?&amp;color=f5de53" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<br>
</div>
<h1>DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning</h1>
<h2>1. Introduction</h2>
<p>Large language models have made significant progress in mathematical reasoning, which serves as an important testbed for AI and could impact scientific research if further advanced.
By scaling reasoning with reinforcement learning that rewards correct final answers, LLMs have improved from poor performance to saturating quantitative reasoning competitions like AIME and HMMT in one year.
However, this approach faces fundamental limitations.
Pursuing higher final answer accuracy doesn't address a key issue: correct answers don't guarantee correct reasoning.
Moreover, many mathematical tasks like theorem proving require rigorous step-by-step derivation rather than numerical answers, making final answer rewards inapplicable.
To push the limits of deep reasoning, we believe it is necessary to verify the comprehensiveness and rigor of mathematical reasoning.
Self-verification is particularly important for scaling test-time compute, especially for open problems without known solutions.
Towards self-verifiable mathematical reasoning, we investigate how to train an accurate and faithful LLM-based verifier for theorem proving.
We then train a proof generator using the verifier as the reward model, and incentivize the generator to identify and resolve as many issues as possible in their own proofs before finalizing them.
To maintain the generation-verification gap as the generator becomes stronger, we propose to scale verification compute to automatically label new hard-to-verify proofs, creating training data to further improve the verifier.
Our resulting model, DeepSeekMath-V2, demonstrates strong theorem-proving capabilities, achieving gold-level scores on IMO 2025 and CMO 2024 and a near-perfect 118/120 on Putnam 2024 with scaled test-time compute.
While much work remains, these results suggest that self-verifiable mathematical reasoning is a feasible research direction that may help develop more capable mathematical AI systems.</p>
<h2>2. Evaluation Results</h2>
<p>Below are evaluation results on <a href="https://github.com/google-deepmind/superhuman/tree/main/imobench">IMO-ProofBench</a> (developed by the DeepMind team behind DeepThink IMO-Gold) and recent mathematics competitions including IMO 2025, CMO 2024, and Putnam 2024.</p>
<p><strong>IMO-ProofBench</strong></p>
<p align="center">
<img width="100%" src="https://raw.githubusercontent.com/deepseek-ai/DeepSeek-Math-V2/refs/heads/main/figures/IMO-ProofBench.png" referrerpolicy="no-referrer">
</p>
<hr>
<p><strong>Mathematics Competitions</strong></p>
<p align="center">
<img width="41%&quot;" src="https://raw.githubusercontent.com/deepseek-ai/DeepSeek-Math-V2/refs/heads/main/figures/Competitions.png" referrerpolicy="no-referrer">
</p>
<h2>4. Quick Start</h2>
<p>DeepSeekMath-V2 is built on top of DeepSeek-V3.2-Exp-Base.
For inference support, please refer to <a href="https://github.com/deepseek-ai/DeepSeek-V3.2-Exp">the DeepSeek-V3.2-Exp github repository</a>.</p>
<h2>6. License</h2>
<p>This repository and the model weights are licensed under <a href="https://huggingface.co/deepseek-ai/LICENSE">the Apache License, Version 2.0 (Apache 2.0)</a>.</p>
<h2>7. Citation</h2>
<pre><code>@misc{deepseek-math-v2,
author = {Zhihong Shao, Yuxiang Luo, Chengda Lu, Z.Z. Ren, Jiewen Hu, Tian Ye, Zhibin Gou, Shirong Ma, Xiaokang Zhang},
title = {DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning},
year = {2025},
}
</code></pre>
<h2>8. Contact</h2>
<p>If you have any questions, please raise an issue or contact us at <a href="mailto:service@deepseek.com">service@deepseek.com</a>.</p>
</description>
<link>https://huggingface.co/deepseek-ai/DeepSeek-Math-V2</link>
<guid isPermaLink="false">https://huggingface.co/deepseek-ai/DeepSeek-Math-V2</guid>
<pubDate>Thu, 27 Nov 2025 02:35:52 GMT</pubDate>
</item>
<item>
<title>deepseek-ai/DeepSeek-V3.2-Exp</title>
<description>deepseek-ai/DeepSeek-V3.2-Exp Text Generation • 685B • Updated 19 days ago • 66.2k • • 900<hr>
<p>license: mit
library_name: transformers
base_model:</p>
<ul>
<li>deepseek-ai/DeepSeek-V3.2-Exp-Base
base_model_relation: finetune</li>
</ul>
<hr>
<h1>DeepSeek-V3.2-Exp</h1>
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" referrerpolicy="no-referrer">
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%A4%96%20Chat-DeepSeek%20V3-536af5?color=536af5&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
<img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&amp;logoColor=white&amp;color=7289da" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
<img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/deepseek-ai/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&amp;color=f5de53" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<h2>Introduction</h2>
<p>We are excited to announce the official release of DeepSeek-V3.2-Exp, an experimental version of our model. As an intermediate step toward our next-generation architecture, V3.2-Exp builds upon V3.1-Terminus by introducing DeepSeek Sparse Attention—a sparse attention mechanism designed to explore and validate optimizations for training and inference efficiency in long-context scenarios.</p>
<p>This experimental release represents our ongoing research into more efficient transformer architectures, particularly focusing on improving computational efficiency when processing extended text sequences.</p>
<div align="center">
<img src="https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp/resolve/main/assets/cost.png" referrerpolicy="no-referrer">
</div>
<ul>
<li>
<p>DeepSeek Sparse Attention (DSA) achieves fine-grained sparse attention for the first time, delivering substantial improvements in long-context training and inference efficiency while maintaining virtually identical model output quality.</p>
</li>
<li>
<p>To rigorously evaluate the impact of introducing sparse attention, we deliberately aligned the training configurations of DeepSeek-V3.2-Exp with V3.1-Terminus. Across public benchmarks in various domains, DeepSeek-V3.2-Exp demonstrates performance on par with V3.1-Terminus.</p>
</li>
</ul>
<table>
<thead>
<tr>
<th style="text-align:left">Benchmark</th>
<th style="text-align:center">DeepSeek-V3.1-Terminus</th>
<th style="text-align:center">DeepSeek-V3.2-Exp</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left"><strong>Reasoning Mode w/o Tool Use</strong></td>
<td style="text-align:center"></td>
<td style="text-align:center"></td>
</tr>
<tr>
<td style="text-align:left">MMLU-Pro</td>
<td style="text-align:center">85.0</td>
<td style="text-align:center">85.0</td>
</tr>
<tr>
<td style="text-align:left">GPQA-Diamond</td>
<td style="text-align:center">80.7</td>
<td style="text-align:center">79.9</td>
</tr>
<tr>
<td style="text-align:left">Humanity's Last Exam</td>
<td style="text-align:center">21.7</td>
<td style="text-align:center">19.8</td>
</tr>
<tr>
<td style="text-align:left">LiveCodeBench</td>
<td style="text-align:center">74.9</td>
<td style="text-align:center">74.1</td>
</tr>
<tr>
<td style="text-align:left">AIME 2025</td>
<td style="text-align:center">88.4</td>
<td style="text-align:center">89.3</td>
</tr>
<tr>
<td style="text-align:left">HMMT 2025</td>
<td style="text-align:center">86.1</td>
<td style="text-align:center">83.6</td>
</tr>
<tr>
<td style="text-align:left">Codeforces</td>
<td style="text-align:center">2046</td>
<td style="text-align:center">2121</td>
</tr>
<tr>
<td style="text-align:left">Aider-Polyglot</td>
<td style="text-align:center">76.1</td>
<td style="text-align:center">74.5</td>
</tr>
<tr>
<td style="text-align:left"><strong>Agentic Tool Use</strong></td>
<td style="text-align:center"></td>
<td style="text-align:center"></td>
</tr>
<tr>
<td style="text-align:left">BrowseComp</td>
<td style="text-align:center">38.5</td>
<td style="text-align:center">40.1</td>
</tr>
<tr>
<td style="text-align:left">BrowseComp-zh</td>
<td style="text-align:center">45.0</td>
<td style="text-align:center">47.9</td>
</tr>
<tr>
<td style="text-align:left">SimpleQA</td>
<td style="text-align:center">96.8</td>
<td style="text-align:center">97.1</td>
</tr>
<tr>
<td style="text-align:left">SWE Verified</td>
<td style="text-align:center">68.4</td>
<td style="text-align:center">67.8</td>
</tr>
<tr>
<td style="text-align:left">SWE-bench Multilingual</td>
<td style="text-align:center">57.8</td>
<td style="text-align:center">57.9</td>
</tr>
<tr>
<td style="text-align:left">Terminal-bench</td>
<td style="text-align:center">36.7</td>
<td style="text-align:center">37.7</td>
</tr>
</tbody>
</table>
<h2>Update</h2>
<ul>
<li>2025.11.17: <strong>We have identified that previous versions of the inference demo code contained an implementation discrepancy in Rotary Position Embedding (RoPE) within the indexer module, potentially leading to degraded model performance.</strong> Specifically, the input tensor to RoPE in the indexer module requires a non-interleaved layout, whereas RoPE in the MLA module expects an interleaved layout. This issue has now been resolved. Please refer to the updated version of the inference demo code and take note of this implementation detail.</li>
</ul>
<h2>How to Run Locally</h2>
<h3>HuggingFace</h3>
<p>We provide an updated inference demo code in the <a href="https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp/tree/main/inference">inference</a> folder to help the community quickly get started with our model and understand its architectural details.</p>
<p>First convert huggingface model weights to the the format required by our inference demo. Set <code>MP</code> to match your available GPU count:</p>
<pre><code class="language-bash">cd inference
export EXPERTS=256
python convert.py --hf-ckpt-path ${HF_CKPT_PATH} --save-path ${SAVE_PATH} --n-experts ${EXPERTS} --model-parallel ${MP}
</code></pre>
<p>Launch the interactive chat interface and start exploring DeepSeek's capabilities:</p>
<pre><code class="language-bash">export CONFIG=config_671B_v3.2.json
torchrun --nproc-per-node ${MP} generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --interactive
</code></pre>
<h3>SGLang</h3>
<h4>Installation with Docker</h4>
<pre><code># H200
docker pull lmsysorg/sglang:dsv32
# MI350
docker pull lmsysorg/sglang:dsv32-rocm
# NPUs
docker pull lmsysorg/sglang:dsv32-a2
docker pull lmsysorg/sglang:dsv32-a3
</code></pre>
<h4>Launch Command</h4>
<pre><code class="language-bash">python -m sglang.launch_server --model deepseek-ai/DeepSeek-V3.2-Exp --tp 8 --dp 8 --enable-dp-attention
</code></pre>
<h3>vLLM</h3>
<p>vLLM provides day-0 support of DeepSeek-V3.2-Exp. See the <a href="https://docs.vllm.ai/projects/recipes/en/latest/DeepSeek/DeepSeek-V3_2-Exp.html">recipes</a> for up-to-date details.</p>
<h2>Open-Source Kernels</h2>
<p>For TileLang kernels with <strong>better readability and research-purpose design</strong>, please refer to <a href="https://github.com/tile-ai/tilelang/tree/main/examples/deepseek_v32">TileLang</a>.</p>
<p>For <strong>high-performance CUDA kernels</strong>, indexer logit kernels (including paged versions) are available in <a href="https://github.com/deepseek-ai/DeepGEMM/pull/200">DeepGEMM</a>. Sparse attention kernels are released in <a href="https://github.com/deepseek-ai/FlashMLA/pull/98">FlashMLA</a>.</p>
<h2>License</h2>
<p>This repository and the model weights are licensed under the <a href="https://huggingface.co/deepseek-ai/LICENSE">MIT License</a>.</p>
<h2>Citation</h2>
<pre><code>@misc{deepseekai2024deepseekv32,
title={DeepSeek-V3.2-Exp: Boosting Long-Context Efficiency with DeepSeek Sparse Attention},
author={DeepSeek-AI},
year={2025},
}
</code></pre>
<h2>Contact</h2>
<p>If you have any questions, please raise an issue or contact us at <a href="https://huggingface.co/deepseek-ai/service@deepseek.com">service@deepseek.com</a>.</p>
</description>
<link>https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp</link>
<guid isPermaLink="false">https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp</guid>
<pubDate>Mon, 17 Nov 2025 18:39:54 GMT</pubDate>
</item>
<item>
<title>deepseek-ai/DeepSeek-OCR</title>
<description>deepseek-ai/DeepSeek-OCR Image-Text-to-Text • 3B • Updated Nov 4 • 5.47M • 2.93k<hr>
<p>pipeline_tag: image-text-to-text
language:</p>
<ul>
<li>multilingual
tags:</li>
<li>deepseek</li>
<li>vision-language</li>
<li>ocr</li>
<li>custom_code
license: mit
library_name: transformers</li>
</ul>
<hr>
<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek AI" referrerpolicy="no-referrer">
</div>
<hr>
<div align="center">
<a href="https://www.deepseek.com/" target="_blank">
<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" referrerpolicy="no-referrer">
</a>
<a href="https://huggingface.co/deepseek-ai/DeepSeek-OCR" target="_blank">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&amp;logoColor=white" referrerpolicy="no-referrer">
</a>
</div>
<div align="center">
<a href="https://discord.gg/Tc7c45Zzu5" target="_blank">
<img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&amp;logoColor=white&amp;color=7289da" referrerpolicy="no-referrer">
</a>
<a href="https://twitter.com/deepseek_ai" target="_blank">
<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&amp;logoColor=white" referrerpolicy="no-referrer">
</a>
</div>
<p align="center">
<a href="https://github.com/deepseek-ai/DeepSeek-OCR"><b>🌟 Github</b></a> |
<a href="https://huggingface.co/deepseek-ai/DeepSeek-OCR"><b>📥 Model Download</b></a> |
<a href="https://github.com/deepseek-ai/DeepSeek-OCR/blob/main/DeepSeek_OCR_paper.pdf"><b>📄 Paper Link</b></a> |
<a href="https://arxiv.org/abs/2510.18234"><b>📄 Arxiv Paper Link</b></a> |
</p>
<h2>
<p align="center">
<a href="https://huggingface.co/papers/2510.18234">DeepSeek-OCR: Contexts Optical Compression</a>
</p>
</h2>
<p align="center">
<img src="https://huggingface.co/deepseek-ai/DeepSeek-OCR/resolve/main/assets/fig1.png" style="width: 1000px" align="center" referrerpolicy="no-referrer">
</p>
<p align="center">
<a href="https://huggingface.co/papers/2510.18234">Explore the boundaries of visual-text compression.</a>
</p>
<h2>Usage</h2>
<p>Inference using Huggingface transformers on NVIDIA GPUs. Requirements tested on python 3.12.9 + CUDA11.8:</p>
<pre><code>torch==2.6.0
transformers==4.46.3
tokenizers==0.20.3
einops
addict
easydict
pip install flash-attn==2.7.3 --no-build-isolation
</code></pre>
<pre><code class="language-python">from transformers import AutoModel, AutoTokenizer
import torch
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '0'
model_name = 'deepseek-ai/DeepSeek-OCR'
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, _attn_implementation='flash_attention_2', trust_remote_code=True, use_safetensors=True)
model = model.eval().cuda().to(torch.bfloat16)
# prompt = "&lt;image&gt;\nFree OCR. "
prompt = "&lt;image&gt;\n&lt;|grounding|&gt;Convert the document to markdown. "
image_file = 'your_image.jpg'
output_path = 'your/output/dir'
# infer(self, tokenizer, prompt='', image_file='', output_path = ' ', base_size = 1024, image_size = 640, crop_mode = True, test_compress = False, save_results = False):
# Tiny: base_size = 512, image_size = 512, crop_mode = False
# Small: base_size = 640, image_size = 640, crop_mode = False
# Base: base_size = 1024, image_size = 1024, crop_mode = False
# Large: base_size = 1280, image_size = 1280, crop_mode = False
# Gundam: base_size = 1024, image_size = 640, crop_mode = True
res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path = output_path, base_size = 1024, image_size = 640, crop_mode=True, save_results = True, test_compress = True)
</code></pre>
<h2>vLLM</h2>
<p>Refer to <a href="https://github.com/deepseek-ai/DeepSeek-OCR/">🌟GitHub</a> for guidance on model inference acceleration and PDF processing, etc.<!-- --></p>
<p>[2025/10/23] 🚀🚀🚀 DeepSeek-OCR is now officially supported in upstream <a href="https://docs.vllm.ai/projects/recipes/en/latest/DeepSeek/DeepSeek-OCR.html#installing-vllm">vLLM</a>.</p>
<pre><code class="language-shell">uv venv
source .venv/bin/activate
# Until v0.11.1 release, you need to install vLLM from nightly build
uv pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
</code></pre>
<pre><code class="language-python">from vllm import LLM, SamplingParams
from vllm.model_executor.models.deepseek_ocr import NGramPerReqLogitsProcessor
from PIL import Image
# Create model instance
llm = LLM(
model="deepseek-ai/DeepSeek-OCR",
enable_prefix_caching=False,
mm_processor_cache_gb=0,
logits_processors=[NGramPerReqLogitsProcessor]
)
# Prepare batched input with your image file
image_1 = Image.open("path/to/your/image_1.png").convert("RGB")
image_2 = Image.open("path/to/your/image_2.png").convert("RGB")
prompt = "&lt;image&gt;\nFree OCR."
model_input = [
{
"prompt": prompt,
"multi_modal_data": {"image": image_1}
},
{
"prompt": prompt,
"multi_modal_data": {"image": image_2}
}
]
sampling_param = SamplingParams(
temperature=0.0,
max_tokens=8192,
# ngram logit processor args
extra_args=dict(
ngram_size=30,
window_size=90,
whitelist_token_ids={128821, 128822}, # whitelist: &lt;td&gt;, &lt;/td&gt;
),
skip_special_tokens=False,
)
# Generate output
model_outputs = llm.generate(model_input, sampling_param)
# Print output
for output in model_outputs:
print(output.outputs[0].text)
</code></pre>
<h2>Visualizations</h2>
<table>
<tbody><tr>
<td><img src="https://huggingface.co/deepseek-ai/DeepSeek-OCR/resolve/main/assets/show1.jpg" style="width: 500px" referrerpolicy="no-referrer"></td>
<td><img src="https://huggingface.co/deepseek-ai/DeepSeek-OCR/resolve/main/assets/show2.jpg" style="width: 500px" referrerpolicy="no-referrer"></td>
</tr>
<tr>
<td><img src="https://huggingface.co/deepseek-ai/DeepSeek-OCR/resolve/main/assets/show3.jpg" style="width: 500px" referrerpolicy="no-referrer"></td>
<td><img src="https://huggingface.co/deepseek-ai/DeepSeek-OCR/resolve/main/assets/show4.jpg" style="width: 500px" referrerpolicy="no-referrer"></td>
</tr>
</tbody></table>
<h2>Acknowledgement</h2>
<p>We would like to thank <a href="https://github.com/Ucas-HaoranWei/Vary/">Vary</a>, <a href="https://github.com/Ucas-HaoranWei/GOT-OCR2.0/">GOT-OCR2.0</a>, <a href="https://github.com/opendatalab/MinerU">MinerU</a>, <a href="https://github.com/PaddlePaddle/PaddleOCR">PaddleOCR</a>, <a href="https://github.com/LingyvKong/OneChart">OneChart</a>, <a href="https://github.com/Ucas-HaoranWei/Slow-Perception">Slow Perception</a> for their valuable models and ideas.</p>
<p>We also appreciate the benchmarks: <a href="https://github.com/ucaslcl/Fox">Fox</a>, <a href="https://github.com/opendatalab/OmniDocBench">OminiDocBench</a>.</p>
<h2>Citation</h2>
<pre><code class="language-bibtex">@article{wei2025deepseek,
title={DeepSeek-OCR: Contexts Optical Compression},
author={Wei, Haoran and Sun, Yaofeng and Li, Yukun},
journal={arXiv preprint arXiv:2510.18234},
year={2025}
}</code></pre>
</description>
<link>https://huggingface.co/deepseek-ai/DeepSeek-OCR</link>
<guid isPermaLink="false">https://huggingface.co/deepseek-ai/DeepSeek-OCR</guid>
<pubDate>Mon, 03 Nov 2025 18:36:12 GMT</pubDate>
</item>
<item>
<title>deepseek-ai/DeepSeek-V3.2-Exp-Base</title>
<description>deepseek-ai/DeepSeek-V3.2-Exp-Base Text Generation • 685B • Updated Oct 9 • 217 • 48<hr>
<h2>license: mit
library_name: transformers</h2>
</description>
<link>https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp-Base</link>
<guid isPermaLink="false">https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp-Base</guid>
<pubDate>Wed, 08 Oct 2025 18:09:46 GMT</pubDate>
</item>
<item>
<title>deepseek-ai/DeepSeek-V3.1-Terminus</title>
<description>deepseek-ai/DeepSeek-V3.1-Terminus Text Generation • 685B • Updated Sep 29 • 81k • • 354<hr>
<p>license: mit
library_name: transformers
base_model:</p>
<ul>
<li>deepseek-ai/DeepSeek-V3.1-Base</li>
</ul>
<hr>
<h1>DeepSeek-V3.1-Terminus</h1>
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" referrerpolicy="no-referrer">
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%A4%96%20Chat-DeepSeek%20V3-536af5?color=536af5&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
<img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&amp;logoColor=white&amp;color=7289da" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
<img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
<a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&amp;logoColor=white" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/deepseek-ai/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&amp;color=f5de53" style="display: inline-block; vertical-align: middle;" referrerpolicy="no-referrer">
</a>
</div>
<h2>Introduction</h2>
<p>This update maintains the model's original capabilities while addressing issues reported by users, including:</p>
<ul>
<li>Language consistency: Reducing instances of mixed Chinese-English text and occasional abnormal characters;</li>
<li>Agent capabilities: Further optimizing the performance of the Code Agent and Search Agent.</li>
</ul>
<table>
<thead>
<tr>
<th style="text-align:left">Benchmark</th>
<th style="text-align:center">DeepSeek-V3.1</th>
<th style="text-align:center">DeepSeek-V3.1-Terminus</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left"><strong>Reasoning Mode w/o Tool Use</strong></td>
<td style="text-align:center"></td>
<td style="text-align:center"></td>
</tr>
<tr>
<td style="text-align:left">MMLU-Pro</td>
<td style="text-align:center">84.8</td>
<td style="text-align:center">85.0</td>
</tr>
<tr>
<td style="text-align:left">GPQA-Diamond</td>
<td style="text-align:center">80.1</td>
<td style="text-align:center">80.7</td>
</tr>
<tr>
<td style="text-align:left">Humanity's Last Exam</td>
<td style="text-align:center">15.9</td>
<td style="text-align:center">21.7</td>
</tr>
<tr>
<td style="text-align:left">LiveCodeBench</td>
<td style="text-align:center">74.8</td>
<td style="text-align:center">74.9</td>
</tr>
<tr>
<td style="text-align:left">Codeforces</td>
<td style="text-align:center">2091</td>
<td style="text-align:center">2046</td>
</tr>
<tr>
<td style="text-align:left">Aider-Polyglot</td>
<td style="text-align:center">76.3</td>
<td style="text-align:center">76.1</td>
</tr>
<tr>
<td style="text-align:left"><strong>Agentic Tool Use</strong></td>
<td style="text-align:center"></td>
<td style="text-align:center"></td>
</tr>
<tr>
<td style="text-align:left">BrowseComp</td>
<td style="text-align:center">30.0</td>
<td style="text-align:center">38.5</td>
</tr>
<tr>
<td style="text-align:left">BrowseComp-zh</td>
<td style="text-align:center">49.2</td>
<td style="text-align:center">45.0</td>
</tr>
<tr>
<td style="text-align:left">SimpleQA</td>
<td style="text-align:center">93.4</td>
<td style="text-align:center">96.8</td>
</tr>
<tr>
<td style="text-align:left">SWE Verified</td>
<td style="text-align:center">66.0</td>
<td style="text-align:center">68.4</td>
</tr>
<tr>
<td style="text-align:left">SWE-bench Multilingual</td>
<td style="text-align:center">54.5</td>
<td style="text-align:center">57.8</td>
... |
http://localhost:1200/huggingface/models/facebook - Success ✔️<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<title>Huggingface facebook Models</title>
<link>https://huggingface.co/facebook/models?sort=created</link>
<atom:link href="http://localhost:1200/huggingface/models/facebook" rel="self" type="application/rss+xml"></atom:link>
<description>Huggingface facebook Models - Powered by RSSHub</description>
<generator>RSSHub</generator>
<webMaster>contact@rsshub.app (RSSHub)</webMaster>
<language>en</language>
<lastBuildDate>Sun, 07 Dec 2025 03:02:30 GMT</lastBuildDate>
<ttl>5</ttl>
<item>
<title>facebook/testing_instructions</title>
<description>facebook/testing_instructions Updated 3 days ago • 2<hr>
<pre><code>extra_gated_fields:
First Name: text
Last Name: text
Date of birth: date_picker
Country: country
Affiliation: text
I accept the terms and conditions: checkbox
geo: ip_location
Test request?: checkbox
By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox
extra_gated_description: &gt;-
The information you provide will be collected, stored, processed and shared in
accordance with the [Meta Privacy
Policy](https://www.facebook.com/privacy/policy/).
extra_gated_button_content: Submit
extra_gated_heading: "Please be sure to provide your full legal name, date of birth, and full organization name with all corporate identifiers. Avoid the use of acronyms and special characters. Failure to follow these instructions may prevent you from accessing this model and others on Hugging Face. You will not have the ability to edit this form after submission, so please ensure all information is accurate."
language:
- en
tags:
- meta-ai
- meta-pytorch
license: fair-noncommercial-research-license
---
</code></pre>
</description>
<link>https://huggingface.co/facebook/testing_instructions</link>
<guid isPermaLink="false">https://huggingface.co/facebook/testing_instructions</guid>
<pubDate>Thu, 04 Dec 2025 00:10:57 GMT</pubDate>
</item>
<item>
<title>facebook/omniASR-LLM-7B-ZS</title>
<description>facebook/omniASR-LLM-7B-ZS Automatic Speech Recognition • Updated 9 days ago • 3<hr>
<p>license: apache-2.0
datasets:</p>
<ul>
<li>facebook/omnilingual-asr-corpus
pipeline_tag: automatic-speech-recognition</li>
</ul>
<hr>
<h1>Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages</h1>
<div align="center" style="lline-height: 1.2; font-size:16px; margin-bottom: 30px;">
<a href="https://huggingface.co/facebook" target="_blank" style="margin: 2px;">
🤗 Hugging Face
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr" target="_blank" style="margin: 2px;">
🐙 GitHub
</a> |
<a href="https://huggingface.co/spaces/facebook/omniasr-transcriptions" target="_blank" style="margin: 2px;">
🤖️ Demo
</a> |
<a href="https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/" target="_blank" style="margin: 2px;">
📃 Paper
</a> |
<a href="https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/" target="_blank" style="margin: 2px;">
📝 Blogpost
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr/blob/main/LICENSE" style="margin: 2px;">
📄 License: Apache 2.0
</a>
</div>
<h1>Model Card for omniASR-LLM-7B-ZS</h1>
<h2>Model Description</h2>
<p>This model is part of the <strong>Omnilingual ASR</strong> family released by Meta AI. The original suite includes:</p>
<!-- TODO : add new tokenizer, we'll get two tokenizer, add mssing speed numbers-->
<table>
<thead>
<tr>
<th>Model Name</th>
<th>Features</th>
<th style="text-align:right">Parameters</th>
<th style="text-align:right">Download Size (FP32)</th>
<th style="text-align:right">Inference VRAM¹</th>
<th style="text-align:right">Real-Time Factor¹ (relative speed)²</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-W2V-300M"><code>omniASR_W2V_300M</code></a></td>
<td>SSL</td>
<td style="text-align:right">317_390_592</td>
<td style="text-align:right">1.2 GiB</td>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-W2V-1B"><code>omniASR_W2V_1B</code></a></td>
<td>SSL</td>
<td style="text-align:right">965_514_752</td>
<td style="text-align:right">3.6 GiB</td>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-W2V-3B"><code>omniASR_W2V_3B</code></a></td>
<td>SSL</td>
<td style="text-align:right">3_064_124_672</td>
<td style="text-align:right">12.0 GiB</td>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-W2V-7B"><code>omniASR_W2V_7B</code></a></td>
<td>SSL</td>
<td style="text-align:right">6_488_487_168</td>
<td style="text-align:right">25.0 GiB</td>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-CTC-300M"><code>omniASR_CTC_300M</code></a></td>
<td>ASR</td>
<td style="text-align:right">325_494_996</td>
<td style="text-align:right">1.3 GiB</td>
<td style="text-align:right">~2 GiB</td>
<td style="text-align:right">0.001 (96x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-CTC-1B"><code>omniASR_CTC_1B</code></a></td>
<td>ASR</td>
<td style="text-align:right">975_065_300</td>
<td style="text-align:right">3.7 GiB</td>
<td style="text-align:right">~3 GiB</td>
<td style="text-align:right">0.002 (48x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-CTC-3B"><code>omniASR_CTC_3B</code></a></td>
<td>ASR</td>
<td style="text-align:right">3_080_423_636</td>
<td style="text-align:right">12.0 GiB</td>
<td style="text-align:right">~8 GiB</td>
<td style="text-align:right">0.003 (32x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-CTC-7B"><code>omniASR_CTC_7B</code></a></td>
<td>ASR</td>
<td style="text-align:right">6_504_786_132</td>
<td style="text-align:right">25.0 GiB</td>
<td style="text-align:right">~15 GiB</td>
<td style="text-align:right">0.006 (16x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-LLM-300M"><code>omniASR_LLM_300M</code></a></td>
<td>ASR with optional language conditioning</td>
<td style="text-align:right">1_627_603_584</td>
<td style="text-align:right">6.1 GiB</td>
<td style="text-align:right">~5 GiB</td>
<td style="text-align:right">0.090 (~1x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-LLM-1B"><code>omniASR_LLM_1B</code></a></td>
<td>ASR with optional language conditioning</td>
<td style="text-align:right">2_275_710_592</td>
<td style="text-align:right">8.5 GiB</td>
<td style="text-align:right">~6 GiB</td>
<td style="text-align:right">0.091 (~1x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-LLM-3B"><code>omniASR_LLM_3B</code></a></td>
<td>ASR with optional language conditioning</td>
<td style="text-align:right">4_376_679_040</td>
<td style="text-align:right">17.0 GiB</td>
<td style="text-align:right">~10 GiB</td>
<td style="text-align:right">0.093 (~1x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-LLM-7B"><code>omniASR_LLM_7B</code></a></td>
<td>ASR with optional language conditioning</td>
<td style="text-align:right">7_801_041_536</td>
<td style="text-align:right">30.0 GiB</td>
<td style="text-align:right">~17 GiB</td>
<td style="text-align:right">0.092 (~1x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-LLM-7B-ZS"><code>omniASR_LLM_7B_ZS</code></a></td>
<td>Zero-Shot ASR</td>
<td style="text-align:right">7_810_900_608</td>
<td style="text-align:right">30.0 GiB</td>
<td style="text-align:right">~20 GiB</td>
<td style="text-align:right">0.194 (~0.5x)</td>
</tr>
</tbody>
</table>
<p>¹ (batch=1, audio_len=30s, BF16, A100)</p>
<p>² Relative speed to <code>omniASR_LLM_7B</code></p>
<hr>
<h2>Installation</h2>
<p>The models were developed using <a href="https://github.com/facebookresearch/fairseq2">fairseq2</a>, a research-focused sequence modeling toolkit. While we provide a <strong>reference</strong> inference pipeline that works across platforms, audio support requires <a href="https://github.com/facebookresearch/fairseq2?tab=readme-ov-file#system-dependencies">libsndfile</a> (Mac: <code>brew install libsndfile</code>; Windows may need an additional <a href="https://github.com/facebookresearch/fairseq2?tab=readme-ov-file#installing-on-windows">setup</a>).</p>
<pre><code class="language-bash"># using pip
pip install omnilingual-asr
# using uv
uv add omnilingual-asr
</code></pre>
<h2>Inference</h2>
<pre><code class="language-python">from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)
</code></pre>
<h2>Supported Languages</h2>
<p>To view the full list of 1600+ supported languages, you can access the language list <a href="https://huggingface.co/src/omnilingual_asr/models/wav2vec2_llama/lang_ids.py">programmatically</a>:</p>
<pre><code class="language-python">from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)
# Check if a specific language is supported
if "eng_Latn" in supported_langs:
print("English (Latin script) is supported!")
</code></pre>
<p>Languages follow the format <code>{language_code}_{script}</code>, for example <code>eng_Latn</code> - English (Latin script), <code>cmn_Hans</code> - Mandarin Chinese (Simplified), ...</p>
<hr>
<h2>Training</h2>
<p>To further finetune the released checkpoints on your own data, use our <a href="https://huggingface.co/workflows/dataprep/README.md">data preparation guide</a> followed by the <a href="https://huggingface.co/workflows/recipes/wav2vec2/asr/README.md">finetuning recipe guide</a>.</p>
<hr>
<h2>Citation</h2>
<p><strong>BibTeX:</strong></p>
<pre><code class="language-bibtex">@misc{omnilingualasr2025,
title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Can, Balioglu and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
year={2025},
url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}
</code></pre>
<ul>
<li><strong>Developed by:</strong> Meta AI / Omnilingual ASR Team(<a href="https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file" title="GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages">GitHub</a>)</li>
<li><strong>Model type:</strong> End-to-end automatic speech recognition model (wav2vec2-style encoder with CTC head / encoder-decoder, depending on checkpoint).</li>
<li><strong>Language(s) (NLP):</strong> 1,600+ languages overall in Omnilingual ASR; this corpus release specifically covers <strong>348 under-served languages</strong> across many writing systems (Latin, Arabic, Devanagari, etc.).(<a href="https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file" title="GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages">GitHub</a>)</li>
<li><strong>License:</strong> Apache-2.0 (for the model and code), CC-BY-4.0 for the <code>facebook/omnilingual-asr-corpus</code> dataset.(<a href="https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file" title="GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages">GitHub</a>)</li>
</ul>
<hr>
</description><link>https://huggingface.co/facebook/omniASR-LLM-7B-ZS</link><guid isPermaLink="false">https://huggingface.co/facebook/omniASR-LLM-7B-ZS</guid><pubDate>Thu, 27 Nov 2025 23:41:34 GMT</pubDate></item><item><title>facebook/omniASR-LLM-7B</title><description>facebook/omniASR-LLM-7B Automatic Speech Recognition • Updated 9 days ago • 12<hr>
<p>license: apache-2.0
datasets:</p>
<ul>
<li>facebook/omnilingual-asr-corpus
pipeline_tag: automatic-speech-recognition</li>
</ul>
<hr>
<h1>Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages</h1>
<div align="center" style="lline-height: 1.2; font-size:16px; margin-bottom: 30px;">
<a href="https://huggingface.co/facebook" target="_blank" style="margin: 2px;">
🤗 Hugging Face
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr" target="_blank" style="margin: 2px;">
🐙 GitHub
</a> |
<a href="https://huggingface.co/spaces/facebook/omniasr-transcriptions" target="_blank" style="margin: 2px;">
🤖️ Demo
</a> |
<a href="https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/" target="_blank" style="margin: 2px;">
📃 Paper
</a> |
<a href="https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/" target="_blank" style="margin: 2px;">
📝 Blogpost
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr/blob/main/LICENSE" style="margin: 2px;">
📄 License: Apache 2.0
</a>
</div>
<h1>Model Card for omniASR-LLM-7B</h1>
<h2>Model Description</h2>
<p>This model is part of the <strong>Omnilingual ASR</strong> family released by Meta AI. The original suite includes:</p>
<!-- TODO : add new tokenizer, we'll get two tokenizer, add mssing speed numbers-->
<table>
<thead>
<tr>
<th>Model Name</th>
<th>Features</th>
<th style="text-align:right">Parameters</th>
<th style="text-align:right">Download Size (FP32)</th>
<th style="text-align:right">Inference VRAM¹</th>
<th style="text-align:right">Real-Time Factor¹ (relative speed)²</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-W2V-300M"><code>omniASR_W2V_300M</code></a></td>
<td>SSL</td>
<td style="text-align:right">317_390_592</td>
<td style="text-align:right">1.2 GiB</td>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-W2V-1B"><code>omniASR_W2V_1B</code></a></td>
<td>SSL</td>
<td style="text-align:right">965_514_752</td>
<td style="text-align:right">3.6 GiB</td>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-W2V-3B"><code>omniASR_W2V_3B</code></a></td>
<td>SSL</td>
<td style="text-align:right">3_064_124_672</td>
<td style="text-align:right">12.0 GiB</td>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-W2V-7B"><code>omniASR_W2V_7B</code></a></td>
<td>SSL</td>
<td style="text-align:right">6_488_487_168</td>
<td style="text-align:right">25.0 GiB</td>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-CTC-300M"><code>omniASR_CTC_300M</code></a></td>
<td>ASR</td>
<td style="text-align:right">325_494_996</td>
<td style="text-align:right">1.3 GiB</td>
<td style="text-align:right">~2 GiB</td>
<td style="text-align:right">0.001 (96x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-CTC-1B"><code>omniASR_CTC_1B</code></a></td>
<td>ASR</td>
<td style="text-align:right">975_065_300</td>
<td style="text-align:right">3.7 GiB</td>
<td style="text-align:right">~3 GiB</td>
<td style="text-align:right">0.002 (48x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-CTC-3B"><code>omniASR_CTC_3B</code></a></td>
<td>ASR</td>
<td style="text-align:right">3_080_423_636</td>
<td style="text-align:right">12.0 GiB</td>
<td style="text-align:right">~8 GiB</td>
<td style="text-align:right">0.003 (32x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-CTC-7B"><code>omniASR_CTC_7B</code></a></td>
<td>ASR</td>
<td style="text-align:right">6_504_786_132</td>
<td style="text-align:right">25.0 GiB</td>
<td style="text-align:right">~15 GiB</td>
<td style="text-align:right">0.006 (16x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-LLM-300M"><code>omniASR_LLM_300M</code></a></td>
<td>ASR with optional language conditioning</td>
<td style="text-align:right">1_627_603_584</td>
<td style="text-align:right">6.1 GiB</td>
<td style="text-align:right">~5 GiB</td>
<td style="text-align:right">0.090 (~1x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-LLM-1B"><code>omniASR_LLM_1B</code></a></td>
<td>ASR with optional language conditioning</td>
<td style="text-align:right">2_275_710_592</td>
<td style="text-align:right">8.5 GiB</td>
<td style="text-align:right">~6 GiB</td>
<td style="text-align:right">0.091 (~1x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-LLM-3B"><code>omniASR_LLM_3B</code></a></td>
<td>ASR with optional language conditioning</td>
<td style="text-align:right">4_376_679_040</td>
<td style="text-align:right">17.0 GiB</td>
<td style="text-align:right">~10 GiB</td>
<td style="text-align:right">0.093 (~1x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-LLM-7B"><code>omniASR_LLM_7B</code></a></td>
<td>ASR with optional language conditioning</td>
<td style="text-align:right">7_801_041_536</td>
<td style="text-align:right">30.0 GiB</td>
<td style="text-align:right">~17 GiB</td>
<td style="text-align:right">0.092 (~1x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-LLM-7B-ZS"><code>omniASR_LLM_7B_ZS</code></a></td>
<td>Zero-Shot ASR</td>
<td style="text-align:right">7_810_900_608</td>
<td style="text-align:right">30.0 GiB</td>
<td style="text-align:right">~20 GiB</td>
<td style="text-align:right">0.194 (~0.5x)</td>
</tr>
</tbody>
</table>
<p>¹ (batch=1, audio_len=30s, BF16, A100)</p>
<p>² Relative speed to <code>omniASR_LLM_7B</code></p>
<hr>
<h2>Installation</h2>
<p>The models were developed using <a href="https://github.com/facebookresearch/fairseq2">fairseq2</a>, a research-focused sequence modeling toolkit. While we provide a <strong>reference</strong> inference pipeline that works across platforms, audio support requires <a href="https://github.com/facebookresearch/fairseq2?tab=readme-ov-file#system-dependencies">libsndfile</a> (Mac: <code>brew install libsndfile</code>; Windows may need an additional <a href="https://github.com/facebookresearch/fairseq2?tab=readme-ov-file#installing-on-windows">setup</a>).</p>
<pre><code class="language-bash"># using pip
pip install omnilingual-asr
# using uv
uv add omnilingual-asr
</code></pre>
<h2>Inference</h2>
<pre><code class="language-python">from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)
</code></pre>
<h2>Supported Languages</h2>
<p>To view the full list of 1600+ supported languages, you can access the language list <a href="https://huggingface.co/src/omnilingual_asr/models/wav2vec2_llama/lang_ids.py">programmatically</a>:</p>
<pre><code class="language-python">from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)
# Check if a specific language is supported
if "eng_Latn" in supported_langs:
print("English (Latin script) is supported!")
</code></pre>
<p>Languages follow the format <code>{language_code}_{script}</code>, for example <code>eng_Latn</code> - English (Latin script), <code>cmn_Hans</code> - Mandarin Chinese (Simplified), ...</p>
<hr>
<h2>Training</h2>
<p>To further finetune the released checkpoints on your own data, use our <a href="https://huggingface.co/workflows/dataprep/README.md">data preparation guide</a> followed by the <a href="https://huggingface.co/workflows/recipes/wav2vec2/asr/README.md">finetuning recipe guide</a>.</p>
<hr>
<h2>Citation</h2>
<p><strong>BibTeX:</strong></p>
<pre><code class="language-bibtex">@misc{omnilingualasr2025,
title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Can, Balioglu and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
year={2025},
url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}
</code></pre>
<ul>
<li><strong>Developed by:</strong> Meta AI / Omnilingual ASR Team(<a href="https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file" title="GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages">GitHub</a>)</li>
<li><strong>Model type:</strong> End-to-end automatic speech recognition model (wav2vec2-style encoder with CTC head / encoder-decoder, depending on checkpoint).</li>
<li><strong>Language(s) (NLP):</strong> 1,600+ languages overall in Omnilingual ASR; this corpus release specifically covers <strong>348 under-served languages</strong> across many writing systems (Latin, Arabic, Devanagari, etc.).(<a href="https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file" title="GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages">GitHub</a>)</li>
<li><strong>License:</strong> Apache-2.0 (for the model and code), CC-BY-4.0 for the <code>facebook/omnilingual-asr-corpus</code> dataset.(<a href="https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file" title="GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages">GitHub</a>)</li>
</ul>
<hr>
</description><link>https://huggingface.co/facebook/omniASR-LLM-7B</link><guid isPermaLink="false">https://huggingface.co/facebook/omniASR-LLM-7B</guid><pubDate>Thu, 27 Nov 2025 23:26:06 GMT</pubDate></item><item><title>facebook/omniASR-LLM-3B</title><description>facebook/omniASR-LLM-3B Automatic Speech Recognition • Updated 9 days ago • 1<hr>
<p>license: apache-2.0
datasets:</p>
<ul>
<li>facebook/omnilingual-asr-corpus
pipeline_tag: automatic-speech-recognition</li>
</ul>
<hr>
<h1>Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages</h1>
<div align="center" style="lline-height: 1.2; font-size:16px; margin-bottom: 30px;">
<a href="https://huggingface.co/facebook" target="_blank" style="margin: 2px;">
🤗 Hugging Face
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr" target="_blank" style="margin: 2px;">
🐙 GitHub
</a> |
<a href="https://huggingface.co/spaces/facebook/omniasr-transcriptions" target="_blank" style="margin: 2px;">
🤖️ Demo
</a> |
<a href="https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/" target="_blank" style="margin: 2px;">
📃 Paper
</a> |
<a href="https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/" target="_blank" style="margin: 2px;">
📝 Blogpost
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr/blob/main/LICENSE" style="margin: 2px;">
📄 License: Apache 2.0
</a>
</div>
<h1>Model Card for omniASR-LLM-3B</h1>
<h2>Model Description</h2>
<p>This model is part of the <strong>Omnilingual ASR</strong> family released by Meta AI. The original suite includes:</p>
<!-- TODO : add new tokenizer, we'll get two tokenizer, add mssing speed numbers-->
<table>
<thead>
<tr>
<th>Model Name</th>
<th>Features</th>
<th style="text-align:right">Parameters</th>
<th style="text-align:right">Download Size (FP32)</th>
<th style="text-align:right">Inference VRAM¹</th>
<th style="text-align:right">Real-Time Factor¹ (relative speed)²</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-W2V-300M"><code>omniASR_W2V_300M</code></a></td>
<td>SSL</td>
<td style="text-align:right">317_390_592</td>
<td style="text-align:right">1.2 GiB</td>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-W2V-1B"><code>omniASR_W2V_1B</code></a></td>
<td>SSL</td>
<td style="text-align:right">965_514_752</td>
<td style="text-align:right">3.6 GiB</td>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-W2V-3B"><code>omniASR_W2V_3B</code></a></td>
<td>SSL</td>
<td style="text-align:right">3_064_124_672</td>
<td style="text-align:right">12.0 GiB</td>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-W2V-7B"><code>omniASR_W2V_7B</code></a></td>
<td>SSL</td>
<td style="text-align:right">6_488_487_168</td>
<td style="text-align:right">25.0 GiB</td>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-CTC-300M"><code>omniASR_CTC_300M</code></a></td>
<td>ASR</td>
<td style="text-align:right">325_494_996</td>
<td style="text-align:right">1.3 GiB</td>
<td style="text-align:right">~2 GiB</td>
<td style="text-align:right">0.001 (96x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-CTC-1B"><code>omniASR_CTC_1B</code></a></td>
<td>ASR</td>
<td style="text-align:right">975_065_300</td>
<td style="text-align:right">3.7 GiB</td>
<td style="text-align:right">~3 GiB</td>
<td style="text-align:right">0.002 (48x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-CTC-3B"><code>omniASR_CTC_3B</code></a></td>
<td>ASR</td>
<td style="text-align:right">3_080_423_636</td>
<td style="text-align:right">12.0 GiB</td>
<td style="text-align:right">~8 GiB</td>
<td style="text-align:right">0.003 (32x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-CTC-7B"><code>omniASR_CTC_7B</code></a></td>
<td>ASR</td>
<td style="text-align:right">6_504_786_132</td>
<td style="text-align:right">25.0 GiB</td>
<td style="text-align:right">~15 GiB</td>
<td style="text-align:right">0.006 (16x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-LLM-300M"><code>omniASR_LLM_300M</code></a></td>
<td>ASR with optional language conditioning</td>
<td style="text-align:right">1_627_603_584</td>
<td style="text-align:right">6.1 GiB</td>
<td style="text-align:right">~5 GiB</td>
<td style="text-align:right">0.090 (~1x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-LLM-1B"><code>omniASR_LLM_1B</code></a></td>
<td>ASR with optional language conditioning</td>
<td style="text-align:right">2_275_710_592</td>
<td style="text-align:right">8.5 GiB</td>
<td style="text-align:right">~6 GiB</td>
<td style="text-align:right">0.091 (~1x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-LLM-3B"><code>omniASR_LLM_3B</code></a></td>
<td>ASR with optional language conditioning</td>
<td style="text-align:right">4_376_679_040</td>
<td style="text-align:right">17.0 GiB</td>
<td style="text-align:right">~10 GiB</td>
<td style="text-align:right">0.093 (~1x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-LLM-7B"><code>omniASR_LLM_7B</code></a></td>
<td>ASR with optional language conditioning</td>
<td style="text-align:right">7_801_041_536</td>
<td style="text-align:right">30.0 GiB</td>
<td style="text-align:right">~17 GiB</td>
<td style="text-align:right">0.092 (~1x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-LLM-7B-ZS"><code>omniASR_LLM_7B_ZS</code></a></td>
<td>Zero-Shot ASR</td>
<td style="text-align:right">7_810_900_608</td>
<td style="text-align:right">30.0 GiB</td>
<td style="text-align:right">~20 GiB</td>
<td style="text-align:right">0.194 (~0.5x)</td>
</tr>
</tbody>
</table>
<p>¹ (batch=1, audio_len=30s, BF16, A100)</p>
<p>² Relative speed to <code>omniASR_LLM_7B</code></p>
<hr>
<h2>Installation</h2>
<p>The models were developed using <a href="https://github.com/facebookresearch/fairseq2">fairseq2</a>, a research-focused sequence modeling toolkit. While we provide a <strong>reference</strong> inference pipeline that works across platforms, audio support requires <a href="https://github.com/facebookresearch/fairseq2?tab=readme-ov-file#system-dependencies">libsndfile</a> (Mac: <code>brew install libsndfile</code>; Windows may need an additional <a href="https://github.com/facebookresearch/fairseq2?tab=readme-ov-file#installing-on-windows">setup</a>).</p>
<pre><code class="language-bash"># using pip
pip install omnilingual-asr
# using uv
uv add omnilingual-asr
</code></pre>
<h2>Inference</h2>
<pre><code class="language-python">from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)
</code></pre>
<h2>Supported Languages</h2>
<p>To view the full list of 1600+ supported languages, you can access the language list <a href="https://huggingface.co/src/omnilingual_asr/models/wav2vec2_llama/lang_ids.py">programmatically</a>:</p>
<pre><code class="language-python">from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)
# Check if a specific language is supported
if "eng_Latn" in supported_langs:
print("English (Latin script) is supported!")
</code></pre>
<p>Languages follow the format <code>{language_code}_{script}</code>, for example <code>eng_Latn</code> - English (Latin script), <code>cmn_Hans</code> - Mandarin Chinese (Simplified), ...</p>
<hr>
<h2>Training</h2>
<p>To further finetune the released checkpoints on your own data, use our <a href="https://huggingface.co/workflows/dataprep/README.md">data preparation guide</a> followed by the <a href="https://huggingface.co/workflows/recipes/wav2vec2/asr/README.md">finetuning recipe guide</a>.</p>
<hr>
<h2>Citation</h2>
<p><strong>BibTeX:</strong></p>
<pre><code class="language-bibtex">@misc{omnilingualasr2025,
title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Can, Balioglu and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
year={2025},
url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}
</code></pre>
<ul>
<li><strong>Developed by:</strong> Meta AI / Omnilingual ASR Team(<a href="https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file" title="GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages">GitHub</a>)</li>
<li><strong>Model type:</strong> End-to-end automatic speech recognition model (wav2vec2-style encoder with CTC head / encoder-decoder, depending on checkpoint).</li>
<li><strong>Language(s) (NLP):</strong> 1,600+ languages overall in Omnilingual ASR; this corpus release specifically covers <strong>348 under-served languages</strong> across many writing systems (Latin, Arabic, Devanagari, etc.).(<a href="https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file" title="GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages">GitHub</a>)</li>
<li><strong>License:</strong> Apache-2.0 (for the model and code), CC-BY-4.0 for the <code>facebook/omnilingual-asr-corpus</code> dataset.(<a href="https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file" title="GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages">GitHub</a>)</li>
</ul>
<hr>
</description><link>https://huggingface.co/facebook/omniASR-LLM-3B</link><guid isPermaLink="false">https://huggingface.co/facebook/omniASR-LLM-3B</guid><pubDate>Thu, 27 Nov 2025 23:08:56 GMT</pubDate></item><item><title>facebook/omniASR-LLM-1B</title><description>facebook/omniASR-LLM-1B Automatic Speech Recognition • Updated 9 days ago • 3<hr>
<p>license: apache-2.0
datasets:</p>
<ul>
<li>facebook/omnilingual-asr-corpus
pipeline_tag: automatic-speech-recognition</li>
</ul>
<hr>
<h1>Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages</h1>
<div align="center" style="lline-height: 1.2; font-size:16px; margin-bottom: 30px;">
<a href="https://huggingface.co/facebook" target="_blank" style="margin: 2px;">
🤗 Hugging Face
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr" target="_blank" style="margin: 2px;">
🐙 GitHub
</a> |
<a href="https://huggingface.co/spaces/facebook/omniasr-transcriptions" target="_blank" style="margin: 2px;">
🤖️ Demo
</a> |
<a href="https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/" target="_blank" style="margin: 2px;">
📃 Paper
</a> |
<a href="https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/" target="_blank" style="margin: 2px;">
📝 Blogpost
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr/blob/main/LICENSE" style="margin: 2px;">
📄 License: Apache 2.0
</a>
</div>
<h1>Model Card for omniASR-LLM-1B</h1>
<h2>Model Description</h2>
<p>This model is part of the <strong>Omnilingual ASR</strong> family released by Meta AI. The original suite includes:</p>
<!-- TODO : add new tokenizer, we'll get two tokenizer, add mssing speed numbers-->
<table>
<thead>
<tr>
<th>Model Name</th>
<th>Features</th>
<th style="text-align:right">Parameters</th>
<th style="text-align:right">Download Size (FP32)</th>
<th style="text-align:right">Inference VRAM¹</th>
<th style="text-align:right">Real-Time Factor¹ (relative speed)²</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-W2V-300M"><code>omniASR_W2V_300M</code></a></td>
<td>SSL</td>
<td style="text-align:right">317_390_592</td>
<td style="text-align:right">1.2 GiB</td>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-W2V-1B"><code>omniASR_W2V_1B</code></a></td>
<td>SSL</td>
<td style="text-align:right">965_514_752</td>
<td style="text-align:right">3.6 GiB</td>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-W2V-3B"><code>omniASR_W2V_3B</code></a></td>
<td>SSL</td>
<td style="text-align:right">3_064_124_672</td>
<td style="text-align:right">12.0 GiB</td>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-W2V-7B"><code>omniASR_W2V_7B</code></a></td>
<td>SSL</td>
<td style="text-align:right">6_488_487_168</td>
<td style="text-align:right">25.0 GiB</td>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-CTC-300M"><code>omniASR_CTC_300M</code></a></td>
<td>ASR</td>
<td style="text-align:right">325_494_996</td>
<td style="text-align:right">1.3 GiB</td>
<td style="text-align:right">~2 GiB</td>
<td style="text-align:right">0.001 (96x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-CTC-1B"><code>omniASR_CTC_1B</code></a></td>
<td>ASR</td>
<td style="text-align:right">975_065_300</td>
<td style="text-align:right">3.7 GiB</td>
<td style="text-align:right">~3 GiB</td>
<td style="text-align:right">0.002 (48x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-CTC-3B"><code>omniASR_CTC_3B</code></a></td>
<td>ASR</td>
<td style="text-align:right">3_080_423_636</td>
<td style="text-align:right">12.0 GiB</td>
<td style="text-align:right">~8 GiB</td>
<td style="text-align:right">0.003 (32x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-CTC-7B"><code>omniASR_CTC_7B</code></a></td>
<td>ASR</td>
<td style="text-align:right">6_504_786_132</td>
<td style="text-align:right">25.0 GiB</td>
<td style="text-align:right">~15 GiB</td>
<td style="text-align:right">0.006 (16x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-LLM-300M"><code>omniASR_LLM_300M</code></a></td>
<td>ASR with optional language conditioning</td>
<td style="text-align:right">1_627_603_584</td>
<td style="text-align:right">6.1 GiB</td>
<td style="text-align:right">~5 GiB</td>
<td style="text-align:right">0.090 (~1x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-LLM-1B"><code>omniASR_LLM_1B</code></a></td>
<td>ASR with optional language conditioning</td>
<td style="text-align:right">2_275_710_592</td>
<td style="text-align:right">8.5 GiB</td>
<td style="text-align:right">~6 GiB</td>
<td style="text-align:right">0.091 (~1x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-LLM-3B"><code>omniASR_LLM_3B</code></a></td>
<td>ASR with optional language conditioning</td>
<td style="text-align:right">4_376_679_040</td>
<td style="text-align:right">17.0 GiB</td>
<td style="text-align:right">~10 GiB</td>
<td style="text-align:right">0.093 (~1x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-LLM-7B"><code>omniASR_LLM_7B</code></a></td>
<td>ASR with optional language conditioning</td>
<td style="text-align:right">7_801_041_536</td>
<td style="text-align:right">30.0 GiB</td>
<td style="text-align:right">~17 GiB</td>
<td style="text-align:right">0.092 (~1x)</td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-LLM-7B-ZS"><code>omniASR_LLM_7B_ZS</code></a></td>
<td>Zero-Shot ASR</td>
<td style="text-align:right">7_810_900_608</td>
<td style="text-align:right">30.0 GiB</td>
<td style="text-align:right">~20 GiB</td>
<td style="text-align:right">0.194 (~0.5x)</td>
</tr>
</tbody>
</table>
<p>¹ (batch=1, audio_len=30s, BF16, A100)</p>
<p>² Relative speed to <code>omniASR_LLM_7B</code></p>
<hr>
<h2>Installation</h2>
<p>The models were developed using <a href="https://github.com/facebookresearch/fairseq2">fairseq2</a>, a research-focused sequence modeling toolkit. While we provide a <strong>reference</strong> inference pipeline that works across platforms, audio support requires <a href="https://github.com/facebookresearch/fairseq2?tab=readme-ov-file#system-dependencies">libsndfile</a> (Mac: <code>brew install libsndfile</code>; Windows may need an additional <a href="https://github.com/facebookresearch/fairseq2?tab=readme-ov-file#installing-on-windows">setup</a>).</p>
<pre><code class="language-bash"># using pip
pip install omnilingual-asr
# using uv
uv add omnilingual-asr
</code></pre>
<h2>Inference</h2>
<pre><code class="language-python">from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)
</code></pre>
<h2>Supported Languages</h2>
<p>To view the full list of 1600+ supported languages, you can access the language list <a href="https://huggingface.co/src/omnilingual_asr/models/wav2vec2_llama/lang_ids.py">programmatically</a>:</p>
<pre><code class="language-python">from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)
# Check if a specific language is supported
if "eng_Latn" in supported_langs:
print("English (Latin script) is supported!")
</code></pre>
<p>Languages follow the format <code>{language_code}_{script}</code>, for example <code>eng_Latn</code> - English (Latin script), <code>cmn_Hans</code> - Mandarin Chinese (Simplified), ...</p>
<hr>
<h2>Training</h2>
<p>To further finetune the released checkpoints on your own data, use our <a href="https://huggingface.co/workflows/dataprep/README.md">data preparation guide</a> followed by the <a href="https://huggingface.co/workflows/recipes/wav2vec2/asr/README.md">finetuning recipe guide</a>.</p>
<hr>
<h2>Citation</h2>
<p><strong>BibTeX:</strong></p>
<pre><code class="language-bibtex">@misc{omnilingualasr2025,
title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Can, Balioglu and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
year={2025},
url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}
</code></pre>
<ul>
<li><strong>Developed by:</strong> Meta AI / Omnilingual ASR Team(<a href="https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file" title="GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages">GitHub</a>)</li>
<li><strong>Model type:</strong> End-to-end automatic speech recognition model (wav2vec2-style encoder with CTC head / encoder-decoder, depending on checkpoint).</li>
<li><strong>Language(s) (NLP):</strong> 1,600+ languages overall in Omnilingual ASR; this corpus release specifically covers <strong>348 under-served languages</strong> across many writing systems (Latin, Arabic, Devanagari, etc.).(<a href="https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file" title="GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages">GitHub</a>)</li>
<li><strong>License:</strong> Apache-2.0 (for the model and code), CC-BY-4.0 for the <code>facebook/omnilingual-asr-corpus</code> dataset.(<a href="https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file" title="GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages">GitHub</a>)</li>
</ul>
<hr>
</description><link>https://huggingface.co/facebook/omniASR-LLM-1B</link><guid isPermaLink="false">https://huggingface.co/facebook/omniASR-LLM-1B</guid><pubDate>Thu, 27 Nov 2025 22:37:27 GMT</pubDate></item><item><title>facebook/omniASR-LLM-300M</title><description>facebook/omniASR-LLM-300M Automatic Speech Recognition • Updated 9 days ago • 3<hr>
<p>license: apache-2.0
datasets:</p>
<ul>
<li>facebook/omnilingual-asr-corpus
pipeline_tag: automatic-speech-recognition</li>
</ul>
<hr>
<h1>Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages</h1>
<div align="center" style="lline-height: 1.2; font-size:16px; margin-bottom: 30px;">
<a href="https://huggingface.co/facebook" target="_blank" style="margin: 2px;">
🤗 Hugging Face
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr" target="_blank" style="margin: 2px;">
🐙 GitHub
</a> |
<a href="https://huggingface.co/spaces/facebook/omniasr-transcriptions" target="_blank" style="margin: 2px;">
🤖️ Demo
</a> |
<a href="https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/" target="_blank" style="margin: 2px;">
📃 Paper
</a> |
<a href="https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/" target="_blank" style="margin: 2px;">
📝 Blogpost
</a> |
<a href="https://github.com/facebookresearch/omnilingual-asr/blob/main/LICENSE" style="margin: 2px;">
📄 License: Apache 2.0
</a>
</div>
<h1>Model Card for omniASR-LLM-300M</h1>
<h2>Model Description</h2>
<p>This model is part of the <strong>Omnilingual ASR</strong> family released by Meta AI. The original suite includes:</p>
<!-- TODO : add new tokenizer, we'll get two tokenizer, add mssing speed numbers-->
<table>
<thead>
<tr>
<th>Model Name</th>
<th>Features</th>
<th style="text-align:right">Parameters</th>
<th style="text-align:right">Download Size (FP32)</th>
<th style="text-align:right">Inference VRAM¹</th>
<th style="text-align:right">Real-Time Factor¹ (relative speed)²</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-W2V-300M"><code>omniASR_W2V_300M</code></a></td>
<td>SSL</td>
<td style="text-align:right">317_390_592</td>
<td style="text-align:right">1.2 GiB</td>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-W2V-1B"><code>omniASR_W2V_1B</code></a></td>
<td>SSL</td>
<td style="text-align:right">965_514_752</td>
<td style="text-align:right">3.6 GiB</td>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-W2V-3B"><code>omniASR_W2V_3B</code></a></td>
<td>SSL</td>
<td style="text-align:right">3_064_124_672</td>
<td style="text-align:right">12.0 GiB</td>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-W2V-7B"><code>omniASR_W2V_7B</code></a></td>
<td>SSL</td>
<td style="text-align:right">6_488_487_168</td>
<td style="text-align:right">25.0 GiB</td>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
</tr>
<tr>
<td><a href="https://huggingface.co/facebook/omniASR-CTC-300M"><code>omniASR_CTC_300M</code></a></td>
<td>ASR</td>
<td style="text-align:right">325_494_996</td>
<td style="text-align:rig... |
http://localhost:1200/huggingface/models/ianyang02 - Success ✔️<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<title>Huggingface ianyang02 Models</title>
<link>https://huggingface.co/ianyang02/models?sort=created</link>
<atom:link href="http://localhost:1200/huggingface/models/ianyang02" rel="self" type="application/rss+xml"></atom:link>
<description>Huggingface ianyang02 Models - Powered by RSSHub</description>
<generator>RSSHub</generator>
<webMaster>contact@rsshub.app (RSSHub)</webMaster>
<language>en</language>
<lastBuildDate>Sun, 07 Dec 2025 03:02:32 GMT</lastBuildDate>
<ttl>5</ttl>
<item>
<title>ianyang02/aita_qwen3-30b</title>
<description>ianyang02/aita_qwen3-30b Updated 1 day ago</description>
<link>https://huggingface.co/ianyang02/aita_qwen3-30b</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_qwen3-30b</guid>
<pubDate>Fri, 05 Dec 2025 14:17:52 GMT</pubDate>
</item>
<item>
<title>ianyang02/aita_qwen3-4b_dpo</title>
<description>ianyang02/aita_qwen3-4b_dpo Updated 15 days ago<hr>
<p>base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: transformers
model_name: aita_qwen3-4b_dpo
tags:</p>
<ul>
<li>generated_from_trainer</li>
<li>trl</li>
<li>dpo
licence: license</li>
</ul>
<hr>
<h1>Model Card for aita_qwen3-4b_dpo</h1>
<p>This model is a fine-tuned version of <a href="https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507">Qwen/Qwen3-4B-Instruct-2507</a>.
It has been trained using <a href="https://github.com/huggingface/trl">TRL</a>.</p>
<h2>Quick start</h2>
<pre><code class="language-python">from transformers import pipeline
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="ianyang02/aita_qwen3-4b_dpo", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
</code></pre>
<h2>Training procedure</h2>
<p><a href="https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/4xnikck8"><img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer"></a></p>
<p>This model was trained with DPO, a method introduced in <a href="https://huggingface.co/papers/2305.18290">Direct Preference Optimization: Your Language Model is Secretly a Reward Model</a>.</p>
<h3>Framework versions</h3>
<ul>
<li>TRL: 0.24.0</li>
<li>Transformers: 4.57.1</li>
<li>Pytorch: 2.9.0</li>
<li>Datasets: 4.3.0</li>
<li>Tokenizers: 0.22.1</li>
</ul>
<h2>Citations</h2>
<p>Cite DPO as:</p>
<pre><code class="language-bibtex">@inproceedings{rafailov2023direct,
title = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
year = 2023,
booktitle = {Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023},
url = {http://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html},
editor = {Alice Oh and Tristan Naumann and Amir Globerson and Kate Saenko and Moritz Hardt and Sergey Levine},
}
</code></pre>
<p>Cite TRL as:</p>
<pre><code class="language-bibtex">@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
</code></pre>
</description>
<link>https://huggingface.co/ianyang02/aita_qwen3-4b_dpo</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_qwen3-4b_dpo</guid>
<pubDate>Fri, 21 Nov 2025 15:09:21 GMT</pubDate>
</item>
<item>
<title>ianyang02/ppo_model_qwen3-4b_aita_h200_2</title>
<description>ianyang02/ppo_model_qwen3-4b_aita_h200_2 Text Generation • Updated 15 days ago • 66<hr>
<p>base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: peft
model_name: ppo_model_qwen3-4b_aita_h200_2
tags:</p>
<ul>
<li>base_model:adapter:Qwen/Qwen3-4B-Instruct-2507</li>
<li>lora</li>
<li>transformers
licence: license
pipeline_tag: text-generation</li>
</ul>
<hr>
<h1>Model Card for ppo_model_qwen3-4b_aita_h200_2</h1>
<p>This model is a fine-tuned version of <a href="https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507">Qwen/Qwen3-4B-Instruct-2507</a>.
It has been trained using <a href="https://github.com/huggingface/trl">TRL</a>.</p>
<h2>Quick start</h2>
<pre><code class="language-python">from transformers import pipeline
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="ianyang02/ppo_model_qwen3-4b_aita_h200_2", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
</code></pre>
<h2>Training procedure</h2>
<p><a href="https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/x99g8ppd"><img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer"></a></p>
<p>This model was trained with PPO, a method introduced in <a href="https://huggingface.co/papers/1909.08593">Fine-Tuning Language Models from Human Preferences</a>.</p>
<h3>Framework versions</h3>
<ul>
<li>PEFT 0.18.0</li>
<li>TRL: 0.25.1</li>
<li>Transformers: 4.57.1</li>
<li>Pytorch: 2.7.0.dev20250224+cu126</li>
<li>Datasets: 4.4.1</li>
<li>Tokenizers: 0.22.1</li>
</ul>
<h2>Citations</h2>
<p>Cite PPO as:</p>
<pre><code class="language-bibtex">@article{mziegler2019fine-tuning,
title = {{Fine-Tuning Language Models from Human Preferences}},
author = {Daniel M. Ziegler and Nisan Stiennon and Jeffrey Wu and Tom B. Brown and Alec Radford and Dario Amodei and Paul F. Christiano and Geoffrey Irving},
year = 2019,
eprint = {arXiv:1909.08593}
}
</code></pre>
<p>Cite TRL as:</p>
<pre><code class="language-bibtex">@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
</code></pre>
</description>
<link>https://huggingface.co/ianyang02/ppo_model_qwen3-4b_aita_h200_2</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/ppo_model_qwen3-4b_aita_h200_2</guid>
<pubDate>Fri, 21 Nov 2025 08:19:14 GMT</pubDate>
</item>
<item>
<title>ianyang02/aita_qwen3-4b_2</title>
<description>ianyang02/aita_qwen3-4b_2 Text Classification • 4B • Updated 17 days ago • 23<hr>
<p>base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: transformers
model_name: aita_qwen3-4b_2
tags:</p>
<ul>
<li>generated_from_trainer</li>
<li>trl</li>
<li>reward-trainer
licence: license</li>
</ul>
<hr>
<h1>Model Card for aita_qwen3-4b_2</h1>
<p>This model is a fine-tuned version of <a href="https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507">Qwen/Qwen3-4B-Instruct-2507</a>.
It has been trained using <a href="https://github.com/huggingface/trl">TRL</a>.</p>
<h2>Quick start</h2>
<pre><code class="language-python">from transformers import pipeline
text = "The capital of France is Paris."
rewarder = pipeline(model="ianyang02/aita_qwen3-4b_2", device="cuda")
output = rewarder(text)[0]
print(output["score"])
</code></pre>
<h2>Training procedure</h2>
<p><a href="https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/lacazue4"><img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer"></a></p>
<p>This model was trained with Reward.</p>
<h3>Framework versions</h3>
<ul>
<li>TRL: 0.25.1</li>
<li>Transformers: 4.57.1</li>
<li>Pytorch: 2.5.0.dev20240818+cu124</li>
<li>Datasets: 4.4.1</li>
<li>Tokenizers: 0.22.1</li>
</ul>
<h2>Citations</h2>
<p>Cite TRL as:</p>
<pre><code class="language-bibtex">@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
</code></pre>
</description>
<link>https://huggingface.co/ianyang02/aita_qwen3-4b_2</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_qwen3-4b_2</guid>
<pubDate>Wed, 19 Nov 2025 23:02:33 GMT</pubDate>
</item>
<item>
<title>ianyang02/aita_qwen3-4b_2_checkpoint_2</title>
<description>ianyang02/aita_qwen3-4b_2_checkpoint_2 Text Classification • 4B • Updated 17 days ago • 25<hr>
<p>base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: transformers
model_name: aita_qwen3-4b_2
tags:</p>
<ul>
<li>generated_from_trainer</li>
<li>trl</li>
<li>reward-trainer
licence: license</li>
</ul>
<hr>
<h1>Model Card for aita_qwen3-4b_2</h1>
<p>This model is a fine-tuned version of <a href="https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507">Qwen/Qwen3-4B-Instruct-2507</a>.
It has been trained using <a href="https://github.com/huggingface/trl">TRL</a>.</p>
<h2>Quick start</h2>
<pre><code class="language-python">from transformers import pipeline
text = "The capital of France is Paris."
rewarder = pipeline(model="ianyang02/aita_qwen3-4b_2", device="cuda")
output = rewarder(text)[0]
print(output["score"])
</code></pre>
<h2>Training procedure</h2>
<p><a href="https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/lacazue4"><img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer"></a></p>
<p>This model was trained with Reward.</p>
<h3>Framework versions</h3>
<ul>
<li>TRL: 0.25.1</li>
<li>Transformers: 4.57.1</li>
<li>Pytorch: 2.5.0.dev20240818+cu124</li>
<li>Datasets: 4.4.1</li>
<li>Tokenizers: 0.22.1</li>
</ul>
<h2>Citations</h2>
<p>Cite TRL as:</p>
<pre><code class="language-bibtex">@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
</code></pre>
</description>
<link>https://huggingface.co/ianyang02/aita_qwen3-4b_2_checkpoint_2</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_qwen3-4b_2_checkpoint_2</guid>
<pubDate>Wed, 19 Nov 2025 19:52:00 GMT</pubDate>
</item>
<item>
<title>ianyang02/aita_qwen3-4b_2_checkpoint_1</title>
<description>ianyang02/aita_qwen3-4b_2_checkpoint_1 Text Classification • 4B • Updated 17 days ago • 4<hr>
<p>base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: transformers
model_name: aita_qwen3-4b_2
tags:</p>
<ul>
<li>generated_from_trainer</li>
<li>trl</li>
<li>reward-trainer
licence: license</li>
</ul>
<hr>
<h1>Model Card for aita_qwen3-4b_2</h1>
<p>This model is a fine-tuned version of <a href="https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507">Qwen/Qwen3-4B-Instruct-2507</a>.
It has been trained using <a href="https://github.com/huggingface/trl">TRL</a>.</p>
<h2>Quick start</h2>
<pre><code class="language-python">from transformers import pipeline
text = "The capital of France is Paris."
rewarder = pipeline(model="ianyang02/aita_qwen3-4b_2", device="cuda")
output = rewarder(text)[0]
print(output["score"])
</code></pre>
<h2>Training procedure</h2>
<p><a href="https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/lacazue4"><img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer"></a></p>
<p>This model was trained with Reward.</p>
<h3>Framework versions</h3>
<ul>
<li>TRL: 0.25.1</li>
<li>Transformers: 4.57.1</li>
<li>Pytorch: 2.5.0.dev20240818+cu124</li>
<li>Datasets: 4.4.1</li>
<li>Tokenizers: 0.22.1</li>
</ul>
<h2>Citations</h2>
<p>Cite TRL as:</p>
<pre><code class="language-bibtex">@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
</code></pre>
</description>
<link>https://huggingface.co/ianyang02/aita_qwen3-4b_2_checkpoint_1</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_qwen3-4b_2_checkpoint_1</guid>
<pubDate>Wed, 19 Nov 2025 16:40:34 GMT</pubDate>
</item>
<item>
<title>ianyang02/ppo_model_qwen3-4b_aita_h200</title>
<description>ianyang02/ppo_model_qwen3-4b_aita_h200 Updated 18 days ago<hr>
<p>base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: transformers
model_name: ppo_model_qwen3-4b_aita_h200
tags:</p>
<ul>
<li>generated_from_trainer</li>
<li>trl</li>
<li>ppo
licence: license</li>
</ul>
<hr>
<h1>Model Card for ppo_model_qwen3-4b_aita_h200</h1>
<p>This model is a fine-tuned version of <a href="https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507">Qwen/Qwen3-4B-Instruct-2507</a>.
It has been trained using <a href="https://github.com/huggingface/trl">TRL</a>.</p>
<h2>Quick start</h2>
<pre><code class="language-python">from transformers import pipeline
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="ianyang02/ppo_model_qwen3-4b_aita_h200", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
</code></pre>
<h2>Training procedure</h2>
<p><a href="https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/ht40cr4d"><img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer"></a></p>
<p>This model was trained with PPO, a method introduced in <a href="https://huggingface.co/papers/1909.08593">Fine-Tuning Language Models from Human Preferences</a>.</p>
<h3>Framework versions</h3>
<ul>
<li>TRL: 0.24.0</li>
<li>Transformers: 4.57.1</li>
<li>Pytorch: 2.9.0</li>
<li>Datasets: 4.3.0</li>
<li>Tokenizers: 0.22.1</li>
</ul>
<h2>Citations</h2>
<p>Cite PPO as:</p>
<pre><code class="language-bibtex">@article{mziegler2019fine-tuning,
title = {{Fine-Tuning Language Models from Human Preferences}},
author = {Daniel M. Ziegler and Nisan Stiennon and Jeffrey Wu and Tom B. Brown and Alec Radford and Dario Amodei and Paul F. Christiano and Geoffrey Irving},
year = 2019,
eprint = {arXiv:1909.08593}
}
</code></pre>
<p>Cite TRL as:</p>
<pre><code class="language-bibtex">@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
</code></pre>
</description>
<link>https://huggingface.co/ianyang02/ppo_model_qwen3-4b_aita_h200</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/ppo_model_qwen3-4b_aita_h200</guid>
<pubDate>Tue, 18 Nov 2025 16:21:51 GMT</pubDate>
</item>
<item>
<title>ianyang02/aita_qwen3-4b</title>
<description>ianyang02/aita_qwen3-4b Text Classification • 4B • Updated 21 days ago • 93<hr>
<p>base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: transformers
model_name: aita_qwen3-4b
tags:</p>
<ul>
<li>generated_from_trainer</li>
<li>trl</li>
<li>reward-trainer
licence: license</li>
</ul>
<hr>
<h1>Model Card for aita_qwen3-4b</h1>
<p>This model is a fine-tuned version of <a href="https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507">Qwen/Qwen3-4B-Instruct-2507</a>.
It has been trained using <a href="https://github.com/huggingface/trl">TRL</a>.</p>
<h2>Quick start</h2>
<pre><code class="language-python">from transformers import pipeline
text = "The capital of France is Paris."
rewarder = pipeline(model="ianyang02/aita_qwen3-4b", device="cuda")
output = rewarder(text)[0]
print(output["score"])
</code></pre>
<h2>Training procedure</h2>
<p><a href="https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/ofxhxi45"><img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer"></a></p>
<p>This model was trained with Reward.</p>
<h3>Framework versions</h3>
<ul>
<li>TRL: 0.24.0</li>
<li>Transformers: 4.57.1</li>
<li>Pytorch: 2.9.0</li>
<li>Datasets: 4.3.0</li>
<li>Tokenizers: 0.22.1</li>
</ul>
<h2>Citations</h2>
<p>Cite TRL as:</p>
<pre><code class="language-bibtex">@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
</code></pre>
</description>
<link>https://huggingface.co/ianyang02/aita_qwen3-4b</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_qwen3-4b</guid>
<pubDate>Sun, 16 Nov 2025 01:32:06 GMT</pubDate>
</item>
<item>
<title>ianyang02/aita_fine_tuned_Qwen3_0.6B</title>
<description>ianyang02/aita_fine_tuned_Qwen3_0.6B Updated 25 days ago</description>
<link>https://huggingface.co/ianyang02/aita_fine_tuned_Qwen3_0.6B</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_fine_tuned_Qwen3_0.6B</guid>
<pubDate>Tue, 11 Nov 2025 16:18:49 GMT</pubDate>
</item>
<item>
<title>ianyang02/aita_Qwen3-0.6B</title>
<description>ianyang02/aita_Qwen3-0.6B Text Classification • 0.6B • Updated 28 days ago • 50<hr>
<p>base_model: Qwen/Qwen3-0.6B
library_name: transformers
model_name: aita_Qwen3-0.6B
tags:</p>
<ul>
<li>generated_from_trainer</li>
<li>reward-trainer</li>
<li>trl
licence: license</li>
</ul>
<hr>
<h1>Model Card for aita_Qwen3-0.6B</h1>
<p>This model is a fine-tuned version of <a href="https://huggingface.co/Qwen/Qwen3-0.6B">Qwen/Qwen3-0.6B</a>.
It has been trained using <a href="https://github.com/huggingface/trl">TRL</a>.</p>
<h2>Quick start</h2>
<pre><code class="language-python">from transformers import pipeline
text = "The capital of France is Paris."
rewarder = pipeline(model="ianyang02/aita_Qwen3-0.6B", device="cuda")
output = rewarder(text)[0]
print(output["score"])
</code></pre>
<h2>Training procedure</h2>
<p><a href="https://wandb.ai/ianyang02-university-of-washington/huggingface/runs/7xwpy737"><img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights &amp; Biases" width="150" height="24" referrerpolicy="no-referrer"></a></p>
<p>This model was trained with Reward.</p>
<h3>Framework versions</h3>
<ul>
<li>TRL: 0.24.0</li>
<li>Transformers: 4.57.1</li>
<li>Pytorch: 2.9.0</li>
<li>Datasets: 4.3.0</li>
<li>Tokenizers: 0.22.1</li>
</ul>
<h2>Citations</h2>
<p>Cite TRL as:</p>
<pre><code class="language-bibtex">@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
</code></pre>
</description>
<link>https://huggingface.co/ianyang02/aita_Qwen3-0.6B</link>
<guid isPermaLink="false">https://huggingface.co/ianyang02/aita_Qwen3-0.6B</guid>
<pubDate>Sat, 08 Nov 2025 07:48:49 GMT</pubDate>
</item>
</channel>
</rss> |
|
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Involved Issue / 该 PR 相关 Issue
Close #
Example for the Proposed Route(s) / 路由地址示例
New RSS Route Checklist / 新 RSS 路由检查表
PuppeteerNote / 说明
在今天#20631 merge的基础上增加了模型的详细内容
Adding model detail on the base of #20631