# BitNet 

Paper: https://arxiv.org/pdf/2504.12285

## Model configuration

Parameters: ~2 billion  
Training tokens: 4 trillion  
Maximum context length: 4096 tokens  
Tokenizer: LLaMA 3 tokenizer (vocab size: 128,256)
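
As a rough back-of-the-envelope illustration of what the parameter count implies, the sketch below estimates weight storage at two precisions. It assumes exactly 2e9 parameters and that every weight is ternary (log2(3) ≈ 1.58 bits, the "b1.58" in the model name); neither assumption comes from the configuration above, and real checkpoints also carry higher-precision embeddings and packing overhead. The helper name `gigabytes` is illustrative.

```python
import math

params = 2e9                  # "~2 billion" from the configuration above (rounded)
bits_ternary = math.log2(3)   # assumption: each weight takes one of three values
bits_bf16 = 16                # bfloat16 baseline for comparison


def gigabytes(n_params: float, bits_per_param: float) -> float:
    """Storage in GB (10**9 bytes) for n_params weights at the given bit width."""
    return n_params * bits_per_param / 8 / 1e9


print(f"ternary: {gigabytes(params, bits_ternary):.2f} GB")
print(f"bf16:    {gigabytes(params, bits_bf16):.2f} GB")
```

Under these assumptions the ternary weights need roughly a tenth of the bf16 footprint.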

## Architecture: Transformer-based, modified with BitLinear layers (BitNet framework)
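* Uses Rotary Position Embeddings (RoPE)   <br/><br/> 

    RoPE is a positional-encoding method for Transformer models, an alternative to the traditional schemes (sin/cos encodings, learned position vectors, and so on).<br/>
    Traditional schemes add or concatenate position information directly onto the token embedding; RoPE instead builds the position encoding into attention,<br/>
    introducing it through rotation:<br/> Query and Key receive a position-dependent rotation, so the attention score naturally reflects the difference between the two positions.<br/><br/>

    **Properties**: relative-position aware; defined for any position, which helps length extrapolation (LLaMA-family models rely on it when extending to longer inputs), although quality beyond the training length is not guaranteed; a fixed function with no learned parameters; already used in several state-of-the-art large models.  

    RoFormer: Enhanced Transformer with Rotary Position Embedding (paper: https://arxiv.org/abs/2104.09864) <br/><br/> 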

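* Squared ReLU (ReLU²) activation in FFN layers <br/><br/> 
    ReLU² is a variant that replaces the plain ReLU activation in a Transformer, commonly used in the FFN (feed-forward network) layers. The idea is to square the ReLU output, ReLU²(x) = max(0, x)², which produces a sparser, more concentrated activation distribution.  <br/><br/> 
    
    **Properties**:  <br/>
    Sparser activations (values below 1 shrink when squared and move toward 0);  
    Better representational capacity (differences between activations are amplified).  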
    
* Uses subln (Sub-Layer Normalization): a way of applying LayerNorm inside each Transformer sub-layer.  

* No bias terms in the linear or normalization layers.  
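
To make the RoPE item in the list above concrete, here is a minimal pure-Python sketch of the rotation and of the relative-position property. It assumes RoFormer's consecutive-pair layout and a base of 10000; the names `rope` and `dot` are illustrative, and real implementations (possibly including the one this model uses) may pair dimensions differently.

```python
import math


def rope(x, pos, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) by the angle pos * base**(-2i/d)."""
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even vector dimension"
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out += [a * c - b * s, a * s + b * c]  # 2-D rotation of the pair
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Same offset (query position minus key position = 3) at two absolute positions.
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 105), rope(k, 102))
print(abs(score_a - score_b) < 1e-9)  # True: the score depends only on the offset
```

Because each pair is only rotated, vector norms are unchanged, and no parameters are learned.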
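
The squared-ReLU and no-bias items in the list above can be sketched the same way. The plain two-projection FFN here is an assumption made for illustration: the actual model may use a gated variant, BitLinear weights, and a normalization step that this sketch leaves out. The names `relu2`, `linear_no_bias`, and `ffn` are hypothetical.

```python
def relu2(x: float) -> float:
    """Squared ReLU: max(0, x) ** 2."""
    r = max(x, 0.0)
    return r * r


def linear_no_bias(weight, x):
    """y = W x with no bias term; weight is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def ffn(x, w_in, w_out):
    """Bias-free feed-forward: project up, apply ReLU², project back down."""
    hidden = [relu2(h) for h in linear_no_bias(w_in, x)]
    return linear_no_bias(w_out, hidden)


pre_activations = [-2.0, -0.5, 0.0, 0.25, 0.5, 1.0, 2.0, 3.0]
activations = [relu2(v) for v in pre_activations]
print(activations)  # [0.0, 0.0, 0.0, 0.0625, 0.25, 1.0, 4.0, 9.0]

w_in = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]  # 2 -> 3 projection
w_out = [[1.0, 0.0, -1.0], [0.0, 2.0, 1.0]]    # 3 -> 2 projection
zero_out = ffn([0.0, 0.0], w_in, w_out)
print(zero_out)  # [0.0, 0.0]
```

Values between 0 and 1 shrink (0.25 becomes 0.0625) while values above 1 grow, which is the sparsifying and contrast-amplifying effect described above. With no bias anywhere, a zero input maps to a zero output.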

## Usage example

Install `transformers` from the shumingma fork (prefix the command with `!` inside a notebook cell):

```shell
pip install git+https://github.com/shumingma/transformers.git
```

In the recorded run, pip resolved the fork to commit `eb28a5caa9853d45266b1ea6548e8fb2cf2fb855` and built `transformers-4.52.0.dev0`.

Install a CPU-only PyTorch build:

```shell
# pip uninstall torch -y
# pip install torch==2.2.1 --force-reinstall
# pip show transformers
# pip uninstall torch torchvision torchaudio -y
pip install torch==2.2.2 --index-url https://download.pytorch.org/whl/cpu
```

In the recorded run, this downloaded `torch-2.2.2-cp310-none-macosx_10_9_x86_64.whl` (151.0 MB) and reported `Successfully installed torch-2.2.2`.


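Load the model and generate a reply:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/bitnet-b1.58-2B-4T"

# Environment diagnostics: which torch build is loaded, and whether it
# exposes torch.get_default_device.
print(torch.__version__)
print(torch.__file__)
print(hasattr(torch, "get_default_device"))

# Load the tokenizer and the model weights.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

message = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "How are you?"},
]

# Render the chat template as text. add_generation_prompt=True appends the
# assistant header so the model answers instead of continuing the user turn.
prompt = tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=True)
chat_input = tokenizer(prompt, return_tensors="pt").to(model.device)

chat_outputs = model.generate(**chat_input, max_new_tokens=50)
# Decode only the newly generated tokens and drop special tokens.
response = tokenizer.decode(
    chat_outputs[0][chat_input["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print("\nAssistant Response:", response)
```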


In the recorded environment this cell did not complete. NumPy first printed a compatibility warning:

```text
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.2.4 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
```

The accompanying traceback (through `runpy.py`, `ipykernel_launcher.py`, and `traitlets/config/application.py`) is truncated in this capture. The three diagnostic `print` calls then produced:

```text
2.2.2
/Users/admin/miniconda3/envs/py310/lib/python3.10/site-packages/torch/__init__.py
False
```

and the run ended with:

```text
AttributeError: module 'torch' has no attribute 'get_default_device'
```

The installed torch 2.2.2 does not provide `torch.get_default_device` (the `False` above), which something in the load path, presumably the installed `transformers` build, calls. A newer PyTorch release is therefore likely needed to run this example.
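
The failure above could be caught before loading the model with a small pre-flight check. The sketch below works on version strings only, so it runs without torch or NumPy installed. It assumes that `torch.get_default_device` first shipped in PyTorch 2.3, which is not stated in the log, and treats NumPy 2.x as a problem only because of the warning seen in this environment. `parse_version` and `environment_problems` are hypothetical helper names.

```python
import re


def parse_version(version: str) -> tuple:
    """Turn '2.2.2' or '2.3.0+cpu' into a tuple of its leading integers."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def environment_problems(torch_version: str, numpy_version: str) -> list:
    """Return human-readable problems for the given torch/NumPy versions."""
    problems = []
    # Assumption: torch.get_default_device is available from PyTorch 2.3 on.
    if parse_version(torch_version) < (2, 3):
        problems.append("torch is older than 2.3: torch.get_default_device is missing")
    # The warning in the log says a module built against NumPy 1.x was loaded
    # under NumPy 2.x; which module it was is not shown.
    if parse_version(numpy_version) >= (2,):
        problems.append("NumPy 2.x with a module compiled against NumPy 1.x")
    return problems


# Versions taken from the recorded run.
for problem in environment_problems("2.2.2", "2.2.4"):
    print("problem:", problem)
```

In a live session the two arguments would come from `torch.__version__` and `numpy.__version__`; checking `hasattr(torch, "get_default_device")` directly, as the original cell does, is the more direct test for the first condition.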