# 2. A Walkthrough of CoLLiE Usage

> 2.1 &ensp; Case Study: Instruction-Tuning MOSS
> 
> 2.2 &ensp; CoLLiE's Config Module
> 
> 2.3 &ensp; CoLLiE's Monitor Module
> 
> 2.4 &ensp; CoLLiE's Evaluator Module
> 
> 2.5 &ensp; CoLLiE's Trainer Module

## 2.1 &ensp; Case Study: Instruction-Tuning MOSS

&ensp; &ensp; The previous chapter introduced the background behind CoLLiE, the features it implements, and the modules it contains.

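&ensp; &ensp; This case study covers fine-tuning MOSS with LOMO and ZeRO-3. The example script is launched on four GPUs with `torchrun`: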

```sh
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --standalone example-tutorial-2-one-sentence-overfitting.py
```
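&ensp; &ensp; For context, `torchrun` starts one process per GPU and tells each process who it is through the environment variables `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`. The sketch below is not part of the tutorial script. `read_dist_env` and `DistEnv` are hypothetical names used only for illustration, and the fallback values are an assumption that lets the snippet also run as a single ordinary process.

```python
import os
from typing import Mapping, NamedTuple


class DistEnv(NamedTuple):
    rank: int        # global index of this process
    local_rank: int  # index of this process on its own node
    world_size: int  # total number of processes in the job


def read_dist_env(env: Mapping[str, str] = os.environ) -> DistEnv:
    """Read the variables torchrun exports; fall back to a one-process setup."""
    return DistEnv(
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
    )


if __name__ == "__main__":
    # Under the command above, world_size would be 4 in every process.
    print(read_dist_env())
```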

## 2.2 &ensp; CoLLiE's Config Module

&ensp; &ensp; The `CollieConfig` parameters are listed below.

| Name | Type | Default | Description |
|:----|:----|:----|:----|
| seed | `int` | `42` | Random seed set at the beginning of training. |
| pp_size | `int` | `1` | Pipeline parallelism degree. |
| tp_size | `int` | `1` | Tensor parallelism degree. |
| dp_size | `int` | `1` | Data parallelism degree. |
| pp_partition_method | `str` | `'parameters'` | Partition method for pipeline parallelism. |
| train_epochs | `int` | `100` | Number of training epochs. |
| eval_per_n_steps | `int` | `0` | Evaluate every n steps. |
| eval_per_n_epochs | `int` | `0` | Evaluate every n epochs. |
| train_micro_batch_size | `int` | `1` | Batch size for one training step. |
| gradient_accumulation_steps | `int` | `1` | Number of gradient accumulation steps. |
| eval_batch_size | `int` | `1` | Batch size for evaluation. |
| checkpointing | `bool` | `True` | Whether to use activation checkpointing. |
| use_flash | `bool` | `True` | Whether to use flash attention. |
| dropout | `float` | `0.0` | Dropout probability. |
| initization_method | `str` | `"none"` | Initialization method. Possible values are 'none', 'normal', 'xavier_normal', 'xavier_uniform', 'kaiming_normal', 'kaiming_uniform', 'orthogonal', 'sparse', 'eye', 'dirac'. |
| initization_method_params | `dict` | `None` | Parameters for the initialization method. |
| low_cpu_mem_usage | `bool` | `True` | Tries to use no more than 1x the model size in CPU memory (including peak memory) while loading the model. |
| ds_config | `Union[str, dict]` | `""` | DeepSpeed configuration file. |
| model_config | `PretrainedConfig` | `PretrainedConfig()` | Model configuration. |
| peft_config | `PeftConfig` | `PeftConfig()` | PEFT configuration. |
| quantization_config | `BitsAndBytesConfig` | `BitsAndBytesConfig()` | Configuration parameters for the `bitsandbytes` library. |
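&ensp; &ensp; Several of these parameters interact. As a rough sketch, assuming the usual conventions of 3D parallelism and gradient accumulation rather than anything read from CoLLiE's source, the three parallelism degrees multiply to the number of launched processes, and (without pipeline parallelism) the number of samples consumed per optimizer update is the micro-batch size times the accumulation steps times the data-parallel degree. `ParallelLayout` is a hypothetical helper, not a CoLLiE class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical container mirroring a few CollieConfig field names."""
    dp_size: int = 1
    tp_size: int = 1
    pp_size: int = 1
    train_micro_batch_size: int = 1
    gradient_accumulation_steps: int = 1

    @property
    def world_size(self) -> int:
        # Assumption: every process belongs to exactly one (dp, tp, pp) coordinate.
        return self.dp_size * self.tp_size * self.pp_size

    @property
    def samples_per_update(self) -> int:
        # Assumption: each data-parallel replica accumulates its own micro-batches.
        return (self.train_micro_batch_size
                * self.gradient_accumulation_steps
                * self.dp_size)


layout = ParallelLayout(dp_size=2, tp_size=2, pp_size=1,
                        train_micro_batch_size=2,
                        gradient_accumulation_steps=4)
print(layout.world_size)          # 4
print(layout.samples_per_update)  # 16
```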

&ensp; &ensp; The `example-tutorial-2-one-sentence-overfitting.py` script sets up the configuration as follows:

``` python
from collie.config import CollieConfig

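pretrained_model = "fnlp/moss-moon-003-sft"

# Build a CollieConfig from the pretrained model's configuration
config = CollieConfig.from_pretrained(pretrained_model, trust_remote_code=True)
config.tp_size = 1  # tensor parallelism degree
config.dp_size = 1  # data parallelism degree
config.pp_size = 1  # pipeline parallelism degree
config.train_epochs = 1
config.eval_per_n_steps = 0
config.eval_per_n_epochs = 1  # evaluate every epoch
config.train_micro_batch_size = 2
config.eval_batch_size = 1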
```
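&ensp; &ensp; To make the two evaluation intervals concrete, here is a minimal sketch of when evaluation would fire. It assumes that a value of `0` disables the corresponding trigger and that steps are counted globally across epochs; neither detail is confirmed by the text above. `eval_points` is a hypothetical function, not part of CoLLiE.

```python
def eval_points(steps_per_epoch, train_epochs,
                eval_per_n_steps=0, eval_per_n_epochs=0):
    """Return the (epoch, step) positions, 1-based, after which evaluation runs.

    An epoch-level evaluation is recorded as (epoch, "end").
    """
    points = []
    global_step = 0
    for epoch in range(1, train_epochs + 1):
        for step in range(1, steps_per_epoch + 1):
            global_step += 1
            # Step trigger: assumed to be off when the interval is 0.
            if eval_per_n_steps > 0 and global_step % eval_per_n_steps == 0:
                points.append((epoch, step))
        # Epoch trigger: assumed to be off when the interval is 0.
        if eval_per_n_epochs > 0 and epoch % eval_per_n_epochs == 0:
            points.append((epoch, "end"))
    return points


# Settings from the script: one epoch, evaluate once per epoch.
print(eval_points(steps_per_epoch=3, train_epochs=1,
                  eval_per_n_steps=0, eval_per_n_epochs=1))  # [(1, 'end')]
```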

## 2.3 &ensp; CoLLiE's Monitor Module

&ensp; &ensp; The available monitors are listed below.

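| Name | Description |
|:----|:----|
| BaseMonitor | Records statistics during model training |
| StepTimeMonitor | Records the time taken by each step |
| NetworkIOMonitor | Records network bandwidth usage for each step |
| DiskIOMonitor | Records disk read/write activity for each step |
| TGSMonitor | Records the number of tokens trained per second per GPU (tokens / s / GPU) |
| CPUMemoryMonitor | Records CPU memory usage for each step |
| MemoryMonitor | Records memory usage for each step |
| LossMonitor | Records the loss for each step |
| EvalMonitor | Records the evaluation results for each step; only `int` and `float` results are supported |
| LRMonitor | Records the learning rate for each step |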
&ensp; &ensp; The `example-tutorial-2-one-sentence-overfitting.py` script creates the following monitors:

``` python
from collie.utils.monitor import StepTimeMonitor, TGSMonitor, MemoryMonitor, LossMonitor, EvalMonitor

monitors = [
    StepTimeMonitor(config),
    TGSMonitor(config),
    MemoryMonitor(config),
    LossMonitor(config),
    EvalMonitor(config)
]
``` 
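&ensp; &ensp; The TGS figure (tokens / s / GPU) can be worked out by hand from a step's token count and duration. The sketch below shows that arithmetic only; how `TGSMonitor` gathers these numbers internally is not described here, and `tokens_per_gpu_per_second` is a hypothetical helper.

```python
def tokens_per_gpu_per_second(batch_size, seq_len, step_seconds, num_gpus):
    """Throughput of one training step, normalised by the number of GPUs.

    batch_size   -- sequences processed in the step across all GPUs
    seq_len      -- tokens per sequence (assumed equal for every sequence)
    step_seconds -- wall-clock duration of the step
    num_gpus     -- GPUs taking part in the step
    """
    if step_seconds <= 0 or num_gpus <= 0:
        raise ValueError("step_seconds and num_gpus must be positive")
    return batch_size * seq_len / (step_seconds * num_gpus)


# 8 sequences of 2048 tokens, processed in 4 seconds on 4 GPUs
print(tokens_per_gpu_per_second(8, 2048, 4.0, 4))  # 1024.0
```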

## 2.4 &ensp; CoLLiE's Evaluator Module

&ensp; &ensp; The available evaluators are listed below.

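| Name | Description |
|:----|:----|
| Evaluator | Base evaluator |
| EvaluatorForPerplexity | Evaluator for perplexity |
| EvaluatorForClassfication | Evaluator for classification tasks |
| EvaluatorForGeneration | Evaluator for generation tasks |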

&ensp; &ensp; The available metrics are listed below.

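| Name | Description |
|:----|:----|
| BaseMetric | Base class for metrics |
| DecodeMetric | Decodes generated output into text |
| AccuracyMetric | Accuracy |
| PPLMetric | Perplexity |
| BleuMetric | BLEU score |
| RougeMetric | ROUGE score |
| ClassifyFPreRecMetric | F-score, precision, and recall for classification |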

&ensp; &ensp; The `example-tutorial-2-one-sentence-overfitting.py` script creates one evaluator for perplexity and one for generation:

``` python
from collie.controller.evaluator import EvaluatorForPerplexity, EvaluatorForGeneration

from collie.metrics import PPLMetric, DecodeMetric

evaluator_ppl = EvaluatorForPerplexity(
    model=model, config=config, dataset=eval_dataset,
    monitors=[EvalMonitor(config), ], metrics={'ppl': PPLMetric(), }
)

evaluator_decode = EvaluatorForGeneration(
    model=model, config=config, tokenizer=tokenizer, dataset=eval_dataset,
    monitors=[EvalMonitor(config), ], metrics={'decode': DecodeMetric(), }
)
``` 
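&ensp; &ensp; As a reminder of what a perplexity score measures, the sketch below computes it from per-token log-probabilities using the standard definition, the exponential of the mean negative log-likelihood. It is a standalone illustration and does not reproduce the internals of `PPLMetric`; `perplexity` is a hypothetical helper.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood over the tokens).

    token_log_probs -- natural-log probabilities the model assigned
                       to each reference token.
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that spreads probability uniformly over 4 candidates at every
# position has a perplexity of approximately 4.
print(perplexity([math.log(0.25)] * 3))
```

&ensp; &ensp; Lower values mean the model assigns higher probability to the reference text; a perplexity of 1 corresponds to predicting every token with certainty.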

## 2.5 &ensp; CoLLiE's Trainer Module

&ensp; &ensp; The `Trainer` parameters are listed below.

| Name | Type | Default |
|:----|:----|:----|
| model | `torch.nn.Module` | (required) |
| config | `CollieConfig` | (required) |
| tokenizer | `PreTrainedTokenizerBase` | `None` |
| loss_fn | `Callable` | `GPTLMLoss()` |
| train_fn | `Callable` | `None` |
| eval_fn | `Callable` | `None` |
| optimizer | `torch.optim.Optimizer` | `None` |
| lr_scheduler | `LRScheduler` or `DeepSpeedSchedulerCallable` | `None` |
| train_dataset | `torch.utils.data.Dataset` | `None` |
| eval_dataset | `torch.utils.data.Dataset` | `None` |
| callbacks | `Callback` or `List[Callback]` | `None` |
| train_dataset_collate_fn | `Callable` | `ColliePadder()` |
| eval_dataset_collate_fn | `Callable` | `ColliePadder(padding_left=True)` |
| data_provider | `BaseProvider` | `None` |
| monitors | `BaseMonitor` | `[]` |
| metrics | `Dict` | `None` |
| evaluators | `List` | `None` |

&ensp; &ensp; The `example-tutorial-2-one-sentence-overfitting.py` script builds the trainer and starts training as follows:

``` python
from collie.controller.trainer import Trainer

trainer = Trainer(
    # model, optimizer, train_dataset, and GPTLMLoss come from earlier in the script
    model=model, config=config, train_dataset=train_dataset,
    loss_fn=GPTLMLoss(-100), optimizer=optimizer,
    monitors=monitors, evaluators=[evaluator_ppl, evaluator_decode]
)

trainer.train()
```
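&ensp; &ensp; The `-100` passed to `GPTLMLoss` looks like the ignore index familiar from PyTorch's cross-entropy loss, where positions labelled `-100` contribute nothing to the loss. Under that assumption, and assuming the usual next-token objective (the logits at position t predict the label at position t+1), the computation can be sketched in plain Python as below. `causal_lm_loss` is a hypothetical stand-in and is not CoLLiE's implementation.

```python
import math

IGNORE_INDEX = -100  # assumed meaning of the argument in GPTLMLoss(-100)


def causal_lm_loss(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy for next-token prediction on one sequence.

    logits -- list of per-position score lists, one score per vocabulary id
    labels -- list of target token ids, same length as logits
    """
    total, count = 0.0, 0
    # Shift by one: logits[t] is scored against labels[t + 1].
    for step_logits, target in zip(logits[:-1], labels[1:]):
        if target == ignore_index:
            continue  # masked position, e.g. padding or prompt tokens
        # Numerically stable log-sum-exp of the scores.
        m = max(step_logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in step_logits))
        total += log_z - step_logits[target]
        count += 1
    # This sketch returns 0.0 when every position is masked.
    return total / count if count else 0.0


# Two-token vocabulary with uniform scores; the last label is masked,
# so only one position is scored and the loss is log(2).
loss = causal_lm_loss([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]], [0, 1, -100])
print(loss)
```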