## Installing Dependencies

In [None]:
pip install -r requirements.txt

## Code Demonstration

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan2-13B-Chat", use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan2-13B-Chat", device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
model.generation_config = GenerationConfig.from_pretrained("baichuan-inc/Baichuan2-13B-Chat")
messages = []
messages.append({"role": "user", "content": "解释一下“温故而知新”"})
response = model.chat(tokenizer, messages)
print(response)
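
For multi-turn conversations, the history can be kept in the same `messages` list. The sketch below is a minimal illustration and not part of the Baichuan2 repository: `chat_turn` and `echo_model` are hypothetical names, and it assumes that earlier model replies are stored under the `"assistant"` role. A stub stands in for the model so the bookkeeping can be run without a GPU.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(chat_fn: Callable[[List[Message]], str],
              messages: List[Message],
              user_text: str) -> str:
    """Append the user turn, get a reply, and record it in the history."""
    messages.append({"role": "user", "content": user_text})
    reply = chat_fn(messages)
    # Assumption: earlier model replies are stored under the "assistant" role.
    messages.append({"role": "assistant", "content": reply})
    return reply


def echo_model(messages: List[Message]) -> str:
    """Hypothetical stand-in for the real model; echoes the last user turn."""
    return "echo: " + messages[-1]["content"]


history: List[Message] = []
first_reply = chat_turn(echo_model, history, "hello")
second_reply = chat_turn(echo_model, history, "and again")
print(len(history))  # two user turns plus two recorded replies
```

With the real model, `chat_fn` could be something like `lambda msgs: model.chat(tokenizer, msgs)`, reusing the `model` and `tokenizer` objects created above.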


## Base Model Inference


In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan2-13B-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan2-13B-Base", device_map="auto", trust_remote_code=True)

inputs = tokenizer('Climbing the Stork Tower->Wang Zhihuan\n A Night Rain Sent North->', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)

print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))


The model is loaded with `device_map='auto'`, which spreads it across all available GPUs. To restrict which devices are used, set `CUDA_VISIBLE_DEVICES` before launching, for example `export CUDA_VISIBLE_DEVICES=0,1` to use GPUs 0 and 1.
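
The same restriction can also be applied from inside Python, as long as it happens before CUDA is initialized (in practice, before `torch` is imported). This is a minimal sketch; `restrict_visible_gpus` is a hypothetical helper name, not something provided by the repository.

```python
import os
from typing import Sequence


def restrict_visible_gpus(gpu_ids: Sequence[int]) -> str:
    """Set CUDA_VISIBLE_DEVICES for this process and return the value."""
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Call this before importing torch or loading the model.
visible = restrict_visible_gpus([0, 1])
print(visible)
```

Inside the process the selected GPUs are renumbered from zero, so `cuda:0` then refers to the first GPU in the list.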

## Command Line Tool

In [None]:
python cli_demo.py


This command line tool is designed for the Chat scenario, so using this tool with the Base model is not supported.

## Web Demo

In [None]:
streamlit run web_demo.py


Running the above command with Streamlit will start a local web service. Enter the address provided by the console into a browser to access it. This web demo tool is designed for the Chat scenario, so it does not support calling the Base model.

## Model Fine-tuning

### Dependency Installation

In [None]:
git clone https://github.com/baichuan-inc/Baichuan2.git
cd Baichuan2/fine-tune
pip install -r requirements.txt


To use lightweight fine-tuning methods such as LoRA, also install [peft].
To use xFormers for training acceleration, also install [xFormers].

### Single Machine Training

Here is an example of fine-tuning Baichuan2-7B-Base on a single machine:

Training data: data/belle_chat_ramdon_10k.json. This sample consists of 10,000 entries drawn from [multiturn_chat_0.8M] on huggingface.co and converted to the format the training script expects. It is included mainly to show how to train on multi-turn data; the quality of the resulting model is not guaranteed.

In [None]:
hostfile=""
deepspeed --hostfile=$hostfile fine-tune.py  \
    --report_to "none" \
    --data_path "data/belle_chat_ramdon_10k.json" \
    --model_name_or_path "baichuan-inc/Baichuan2-7B-Base" \
    --output_dir "output" \
    --model_max_length 512 \
    --num_train_epochs 4 \
    --per_device_train_batch_size 16 \
    --gradient_accumulation_steps 1 \
    --save_strategy epoch \
    --learning_rate 2e-5 \
    --lr_scheduler_type constant \
    --adam_beta1 0.9 \
    --adam_beta2 0.98 \
    --adam_epsilon 1e-8 \
    --max_grad_norm 1.0 \
    --weight_decay 1e-4 \
    --warmup_ratio 0.0 \
    --logging_steps 1 \
    --gradient_checkpointing True \
    --deepspeed ds_config.json \
    --bf16 True \
    --tf32 True


### Multi-Machine Training

For multi-machine training, provide a hostfile like the following:

In [None]:
ip1 slots=8
ip2 slots=8
ip3 slots=8
ip4 slots=8
...

Then point the training script at the hostfile:

In [None]:
hostfile="/path/to/hostfile"
deepspeed --hostfile=$hostfile fine-tune.py  \
    --report_to "none" \
    --data_path "data/belle_chat_ramdon_10k.json" \
    --model_name_or_path "baichuan-inc/Baichuan2-7B-Base" \
    --output_dir "output" \
    --model_max_length 512 \
    --num_train_epochs 4 \
    --per_device_train_batch_size 16 \
    --gradient_accumulation_steps 1 \
    --save_strategy epoch \
    --learning_rate 2e-5 \
    --lr_scheduler_type constant \
    --adam_beta1 0.9 \
    --adam_beta2 0.98 \
    --adam_epsilon 1e-8 \
    --max_grad_norm 1.0 \
    --weight_decay 1e-4 \
    --warmup_ratio 0.0 \
    --logging_steps 1 \
    --gradient_checkpointing True \
    --deepspeed ds_config.json \
    --bf16 True \
    --tf32 True
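
As a rough check on the effective batch size of such a run, the sketch below counts the slots in a hostfile and multiplies by the per-device batch size and the gradient accumulation steps. The helper names `total_slots` and `global_batch_size` are hypothetical, and the calculation assumes that every slot is one GPU and that all GPUs act as data-parallel workers.

```python
def total_slots(hostfile_text: str) -> int:
    """Sum the slots=N entries of a DeepSpeed-style hostfile."""
    total = 0
    for line in hostfile_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Each remaining line is expected to look like "<host> slots=<N>".
        _host, slots_field = line.split()[:2]
        total += int(slots_field.split("=")[1])
    return total


def global_batch_size(num_gpus: int, per_device_batch: int,
                      grad_accum_steps: int) -> int:
    """Samples consumed per optimizer step across all GPUs."""
    return num_gpus * per_device_batch * grad_accum_steps


hostfile_text = """
ip1 slots=8
ip2 slots=8
ip3 slots=8
ip4 slots=8
"""
gpus = total_slots(hostfile_text)
batch = global_batch_size(gpus, per_device_batch=16, grad_accum_steps=1)
print(gpus, batch)
```

With the four 8-slot hosts shown above and the flags from the command, this gives 32 GPUs and 512 samples per optimizer step, so the learning rate may deserve a second look when the number of machines changes.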


### Lightweight Fine-Tuning
The code already supports lightweight fine-tuning methods such as LoRA. To use LoRA, add the following argument to the training command above:

In [None]:
--use_lora True
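
After LoRA training, the saved parameters can be loaded with peft.AutoPeftModelForCausalLM:

In [None]:
from peft import AutoPeftModelForCausalLM
# "output" is the --output_dir used in the training command above.
# trust_remote_code=True is needed because Baichuan2 ships custom model code.
model = AutoPeftModelForCausalLM.from_pretrained("output", trust_remote_code=True)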
