This project provides a testing environment for Microsoft's BitNet b1.58 2B4T model - the first open-source, native 1-bit Large Language Model at the 2-billion parameter scale.
- Official BitNet Repository - Official inference framework for 1-bit LLMs
- Hugging Face Model - BitNet b1.58 2B4T model on Hugging Face
- Model: BitNet b1.58 2B4T
- Parameters: ~2 Billion
- Training Tokens: 4 Trillion
- Context Length: 4096 tokens
- Quantization: Native 1.58-bit weights, 8-bit activations
- Architecture: Transformer with BitLinear layers
- License: MIT
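The "native 1.58-bit" format means each weight takes one of three values {-1, 0, +1} (log2 3 ≈ 1.58 bits). As a rough sketch, the absmean quantization described in the BitNet b1.58 paper scales a weight tensor by its mean absolute value, then rounds and clips to the ternary set (the helper name below is illustrative, not from the official code):

```python
import torch

def quantize_weights_ternary(w: torch.Tensor):
    """Absmean quantization sketch: scale by the mean absolute value,
    then round and clip each weight to {-1, 0, +1}."""
    scale = w.abs().mean().clamp(min=1e-5)   # per-tensor scaling factor
    w_q = (w / scale).round().clamp(-1, 1)   # ternary weights
    return w_q, scale

w = torch.tensor([[0.9, -0.04, -1.3], [0.2, 0.0, 2.1]])
w_q, scale = quantize_weights_ternary(w)
# w_q now contains only values from {-1.0, 0.0, 1.0}
```

Activations are separately quantized to 8 bits; the sketch above only covers the weight side.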
- Python 3.9+
- uv package manager
Clone and navigate to the project:
```bash
git clone https://github.com/amithegde/bitnet-test
cd bitnet-test
```
Install dependencies using uv:
```bash
# Create a virtual environment and install all dependencies
uv sync
```
Run your first test:
```bash
# Basic functionality test
uv run python test_basic.py

# Comprehensive test with multiple scenarios
uv run python test_comprehensive.py
```
For Windows users, you can also use the provided batch script:
```bash
# Run the installation script
install.bat

# Then run tests
uv run python test_basic.py
```

Run the basic functionality test:

```bash
uv run python test_basic.py
```

Run the full test suite with memory monitoring and multiple scenarios:

```bash
uv run python test_comprehensive.py
```

Minimal usage example with the transformers library:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model_id = "microsoft/bitnet-b1.58-2B-4T"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cpu",
    trust_remote_code=False,
)

# Build a chat-style conversation
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello! How are you?"},
]

# Generate a response
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```

This project uses the `transformers` library for testing purposes only. For production use and optimal performance, you should use the dedicated bitnet.cpp implementation.
The current transformers implementation:
- Does NOT provide the efficiency benefits of BitNet
- May have similar or worse performance than full-precision models
- Is intended for testing and experimentation only
For optimal performance, use the official C++ implementation:
- Repository: microsoft/BitNet
- Provides the actual efficiency benefits (memory, energy, latency)
- Optimized for BitNet's 1.58-bit quantization
Several model variants are available:
- microsoft/bitnet-b1.58-2B-4T: Packed 1.58-bit weights (use for deployment)
- microsoft/bitnet-b1.58-2B-4T-bf16: Master weights in BF16 (use for training/fine-tuning)
- microsoft/bitnet-b1.58-2B-4T-gguf: GGUF format (compatible with bitnet.cpp)
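As an illustrative helper (not part of the official repo), the choice of variant amounts to picking the matching checkpoint ID; the first two load through transformers, while the GGUF variant is consumed by bitnet.cpp:

```python
# Illustrative mapping from use case to the matching BitNet checkpoint
# on Hugging Face. The dict and function names are hypothetical.
VARIANTS = {
    "deployment": "microsoft/bitnet-b1.58-2B-4T",        # packed 1.58-bit weights
    "fine-tuning": "microsoft/bitnet-b1.58-2B-4T-bf16",  # BF16 master weights
    "bitnet.cpp": "microsoft/bitnet-b1.58-2B-4T-gguf",   # GGUF for the C++ runtime
}

def pick_variant(use_case: str) -> str:
    try:
        return VARIANTS[use_case]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None

print(pick_variant("fine-tuning"))
```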
According to the official evaluation, BitNet b1.58 2B4T shows:
- Memory Usage: 0.4GB (vs 2-4.8GB for comparable models)
- CPU Latency: 29ms (vs 41-124ms for comparable models)
- Energy Usage: 0.028J (vs 0.186-0.649J for comparable models)
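Plugging in the reported figures, the savings work out to roughly 5-12x on memory, 1.4-4.3x on latency, and 6.6-23x on energy (a quick back-of-the-envelope calculation using the ranges above):

```python
# Reported BitNet b1.58 2B4T figures vs. ranges for comparable models
memory_gb = (0.4, (2.0, 4.8))
latency_ms = (29, (41, 124))
energy_j = (0.028, (0.186, 0.649))

def savings(ours, others):
    """Return the (low, high) improvement ratio over the comparison range."""
    lo, hi = others
    return lo / ours, hi / ours

print("memory:  %.1fx-%.1fx less" % savings(*memory_gb))
print("latency: %.1fx-%.1fx faster" % savings(*latency_ms))
print("energy:  %.1fx-%.1fx less" % savings(*energy_j))
```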
- CUDA out of memory: Use `device_map="auto"` or run on CPU
- Slow inference: Expected with transformers; use bitnet.cpp for speed
- Installation issues: Ensure you're using the correct transformers version
- RAM: At least 8GB recommended (model uses ~5GB)
- GPU: Optional, but recommended for faster inference
- Storage: ~2GB for model weights
- Python: 3.9+ required
This project now includes a complete RAG (Retrieval-Augmented Generation) system that can automatically generate educational flashcards from Wikipedia articles using BitNet!
```bash
# Run the interactive RAG system
uv run python rag_system.py

# Or try the demo
uv run python demo_rag.py

# Test individual components
uv run python test_rag_system.py wikipedia
uv run python test_rag_system.py flashcards
uv run python test_rag_system.py full
```

- Wikipedia Processing: Automatically fetches and processes Wikipedia articles
- Intelligent Chunking: Breaks down long articles into manageable chunks
- BitNet Integration: Uses BitNet for efficient flashcard generation
- Multiple Card Types: Content-specific and summary flashcards
- Export Options: Save flashcards to JSON format
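The chunking step above can be sketched like this (a simplified stand-in for `wikipedia_processor.py`; the actual implementation may differ):

```python
def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> list[str]:
    """Split a long article into overlapping chunks, preferring
    paragraph boundaries so flashcard prompts stay coherent."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = current[-overlap:]  # carry over a little trailing context
        current = (current + "\n\n" + para).strip()
    if current:
        chunks.append(current)
    return chunks

# Example: a synthetic 10-paragraph "article"
article = "\n\n".join(f"Paragraph {i}: " + "x" * 200 for i in range(10))
chunks = chunk_text(article, max_chars=500)
```

Each chunk is then fed to BitNet individually, which keeps prompts well under the model's 4096-token context length.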
See RAG_README.md for complete documentation.
```
bitnet-test/
├── test_basic.py            # Basic functionality test
├── test_comprehensive.py    # Comprehensive testing suite
├── rag_system.py            # Main RAG application
├── wikipedia_processor.py   # Wikipedia content processing
├── flashcard_generator.py   # BitNet flashcard generation
├── test_rag_system.py       # RAG system tests
├── demo_rag.py              # RAG system demo
├── RAG_README.md            # RAG system documentation
├── pyproject.toml           # Project configuration & dependencies
├── install.bat              # Windows installation script (optional)
├── README.md                # This documentation
├── agents.md                # Testing agents documentation
└── uv.lock                  # Dependency lock file
```
This project is licensed under the MIT License, same as the BitNet model.