update vlm docs #435
New configuration file (HoliTom token reduction for Llava OneVision):

```yaml
base:
    seed: &seed 42
model:
    type: Llava OneVision
    path: model path
    torch_dtype: auto
eval:
    eval_pos: [pretrain, transformed]
    type: vqa
    name: [mme]
    download: False
    path: MME dataset path
    bs: 1
    inference_per_block: False
sparse:
    method: TokenReduction
    special:
        method: HoliTom
        RETAIN_RATIO: 0.20
        T: 0.65
        HOLITOM_k: 18
        HOLITOM_r: 0.5
save:
    save_trans: False
    save_fake: False
    save_path: /path/to/save/
```
New documentation file (English token reduction guide):
# Token Reduction

LightCompress currently supports token reduction for mainstream multimodal large language models. Configuration is very simple: plug and play.

Here is an example configuration:

```yaml
base:
    seed: &seed 42
model:
    type: Llava
    path: model path
    torch_dtype: auto
eval:
    eval_pos: [pretrain, transformed]
    type: vqa
    name: [gqa, mmbench_en_dev, mme]
    bs: 1
    inference_per_block: False
sparse:
    method: TokenReduction
    special:
        method: FastV
        pruning_loc: 3
        rate: 0.778
save:
    save_trans: False
    save_fake: False
    save_path: /path/to/save/
```
The configuration file contains three core sections:

1. **`model`**
   For model selection, you can choose LLaVA, LLaVA-NeXT, Qwen2.5VL, LLaVA OneVision, and other models covering both image and video tasks. For the detailed list of supported models, see the supported-models file in the repository. LightCompress will support more models in the future.

2. **`eval`**
   For the `eval_pos` parameter:
   - `pretrain` denotes the original model that keeps all visual tokens.
   - `transformed` denotes the model with token reduction applied.
   LightCompress integrates lmms-eval to evaluate various downstream datasets. Set `type` to `vqa`, and specify the datasets in `name` following the naming conventions in the lmms-eval documentation.

3. **`sparse`**
   Set `method` to `TokenReduction` first, and then specify the concrete algorithm and its hyperparameters under `special`. Since each algorithm has different hyperparameters, refer to the configuration files for details; a second example follows this list.
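As a concrete illustration of how the `special` block changes per algorithm, here is the `sparse` section from the HoliTom configuration added in this PR (the Llava OneVision example at the top of the diff); the keys and values below are copied from that file:

```yaml
# HoliTom exposes more hyperparameters than FastV; these values come from
# the HoliTom config for Llava OneVision included in this PR.
sparse:
    method: TokenReduction
    special:
        method: HoliTom
        RETAIN_RATIO: 0.20
        T: 0.65
        HOLITOM_k: 18
        HOLITOM_r: 0.5
```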
## Combining with Quantization

LightCompress also supports an extreme compression scheme that combines token reduction with quantization. First, choose a quantization algorithm and save a `fake_quant` model (see the quantization section of the docs). Then load this model and add the `token_reduction` field under `quant`.
```yaml
quant:
    method: RTN
    weight:
        bit: 4
        symmetric: False
        granularity: per_group
        group_size: 128
    special:
        actorder: True
        static_groups: True
        percdamp: 0.01
        blocksize: 128
        true_sequential: True
        quant_out: True
    token_reduction:
        method: FastV
        special:
            pruning_loc: 3
            rate: 0.778
```
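To make the two-step flow explicit: after the quantization run has produced a fake_quant checkpoint (with `save_fake: True`), the token reduction pass loads that checkpoint through the usual `model` section. The snippet below is only a sketch; the exact directory name under the earlier `save_path` is an assumption and depends on how the quantization run saved the model.

```yaml
# Hypothetical second-stage config (sketch): load the previously saved
# fake_quant model and keep the quant + token_reduction block shown above.
model:
    type: Llava
    path: /path/to/save/fake_quant_model   # assumed location of the fake_quant checkpoint
    torch_dtype: auto
```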
English configuration reference:

@@ -360,6 +360,26 @@ quant:
## sparse

<font color=792ee5> sparse.method </font>

The name of the sparsification algorithm to use. This covers both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.py) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in the corresponding files.

It is worth noting that for model sparsification you need to specify the exact algorithm name, whereas for token reduction you only set `method` to `TokenReduction` and then specify the concrete algorithm under `special`.
```yaml
sparse:
    method: Wanda
```

```yaml
sparse:
    method: TokenReduction
    special:
        method: FastV
```
## save

<font color=792ee5> save.save_vllm </font>
New documentation file (Chinese version of the token reduction guide):
# Token Reduction

LightCompress currently supports token reduction for mainstream multimodal large language models. Configuration is very simple: plug and play.

Here is an example configuration:

```yaml
base:
    seed: &seed 42
model:
    type: Llava
    path: model path
    torch_dtype: auto
eval:
    eval_pos: [pretrain, transformed]
    type: vqa
    name: [gqa, mmbench_en_dev, mme]
    bs: 1
    inference_per_block: False
sparse:
    method: TokenReduction
    special:
        method: FastV
        pruning_loc: 3
        rate: 0.778
save:
    save_trans: False
    save_fake: False
    save_path: /path/to/save/
```
The configuration file contains three core sections:
1. `model`
   For model selection, you can choose LLaVA, LLaVA-NeXT, Qwen2.5VL, LLaVA OneVision, and other models covering both image and video tasks. The detailed list of supported models can be found in this [file](https://github.com/ModelTC/LightCompress/blob/main/llmc/models/__init__.py). LightCompress will support more models in the future.

2. `eval`
   For the `eval_pos` parameter, `pretrain` denotes the original model that keeps all visual tokens, while `transformed` denotes the model with the corresponding token reduction algorithm applied. LightCompress integrates [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval) to evaluate various downstream datasets: set `type` to `vqa`, and name the datasets in `name` following the naming conventions in the lmms-eval [documentation](https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/docs/current_tasks.md).

3. `sparse`
   `method` must first be set to `TokenReduction`; the concrete algorithm and its hyperparameters are then specified under `special`. Since the hyperparameters differ per algorithm, see the [configuration files](https://github.com/ModelTC/LightCompress/tree/main/configs/sparsification/methods) for details.

## Combining with Quantization

LightCompress also supports an extreme compression scheme that uses token reduction and quantization at the same time. First, choose a quantization algorithm and save a `fake_quant` model (see the quantization documentation). Then load this model and add a `token_reduction` field under `quant`.
```yaml
quant:
    method: RTN
    weight:
        bit: 4
        symmetric: False
        granularity: per_group
        group_size: 128
    special:
        actorder: True
        static_groups: True
        percdamp: 0.01
        blocksize: 128
        true_sequential: True
        quant_out: True
    token_reduction:
        method: FastV
        special:
            pruning_loc: 3
            rate: 0.778
```
Chinese version of the configuration reference:

@@ -401,6 +401,26 @@ quant:
## sparse

<font color=792ee5> sparse.method </font>

The name of the sparsification algorithm to use. This covers both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.py) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens; all supported algorithms can be found in these files.

It is worth noting that for model sparsification you need to specify the exact algorithm name, whereas for token reduction you only set `method` to `TokenReduction` and then specify the concrete algorithm under `special`.
```yaml
sparse:
    method: Wanda
```

```yaml
sparse:
    method: TokenReduction
    special:
        method: FastV
```
## save

<font color=792ee5> save.save_vllm </font>
Review comment: This section is missing a few helpful hyperlinks that are present in the Chinese version of the documentation. Adding them would improve the user experience by making it easier to navigate to related resources.