
Merge branch 'main' into 'docs' #463

Merged: 60 commits merged into InternLM:docs from main on Mar 11, 2024
Conversation

LZHgrla (Collaborator) commented Mar 11, 2024

No description provided.

LZHgrla and others added 30 commits January 11, 2024 23:29
* update

* update cfgs

* update

* fix bugs

* upload docs

* rename

* update

* Revert "update cfgs"

This reverts commit 93966aa.

* update cfgs

* update

* rename

* rename

* fix backward compatibility (bc)

* fix stop_word

* fix

* fix

* Update prompt_template.md
* fix bugs

* Update mmbench.py
* support deepseek moe

* update docs

* update

* update
* update examples

* add examples

* add json template config

* rename

* update

* update

* update
* add cfgs

* add internlm2 template

* add dispatch

* add docs

* update readme

* update
* accelerate cli

* Update entry_point.py

* Update entry_point.py

---------

Co-authored-by: Zhihao Lin <36994684+LZHgrla@users.noreply.github.com>
* fix

* update

* Update README.md

* Update README_zh-CN.md
* update

* Update README.md

* Update README.md

* Update README.md

* Update README_zh-CN.md

* update

* update

* fix pre-commit

* update
* add new loop

* rename

* fix pre-commit

* add max_keep_ckpts

* fix

* update cfgs

* update examples

* fix

* update

* update llava

* update

* update

* update

* update
* support petrelfs

* fix deepspeed save/load/resume

* add ENV to toggle petrelfs

* support hf save_pretrained

* patch deepspeed engine
* support ddp mmbench evaluate

* Update xtuner/tools/mmbench.py

Co-authored-by: Zhihao Lin <36994684+LZHgrla@users.noreply.github.com>

* Update xtuner/tools/mmbench.py

Co-authored-by: Zhihao Lin <36994684+LZHgrla@users.noreply.github.com>

* update minimum version of mmengine

* Update runtime.txt

---------

Co-authored-by: Zhihao Lin <36994684+LZHgrla@users.noreply.github.com>
* add local_attn_args_to_messagehub_hook

* add internlm repo sampler

* add internlm repo dataset and collate_fn

* dispatch internlm1 and internlm2 local attn

* add internlm2 config

* add internlm1 and internlm2 config

* add internlm2 template

* fix replace_internlm1_rote bugs

* add internlm1 and internlm2 config templates

* change priority of EvaluateChatHook

* fix docs

* fix config

* fix bug

* set rotary_base according to the latest internlm2 config

* add llama local attn

* add llama local attn

* update intern_repo_dataset docs when using aliyun

* support using both hf load_dataset and intern_repo packed_dataset

* add configs

* add opencompass doc

* update opencompass doc

* use T data order

* use T data order

* add config

* add a tool to get data order

* support offline processing of untokenized datasets

* add docs

* add doc about only saving model weights

* add doc about only saving model weights

* dispatch mistral

* add mistral template

* add mistral template

* fix torch_dtype

* reset pre-commit-config

* fix config

* fix internlm_7b_full_intern_repo_dataset_template

* update local_attn to varlen_attn

* rename local_attn

* fix InternlmRepoSampler and train.py to support resume

* modify Packer to support varlen attn

* support varlen attn in default pipeline

* update mmengine version requirement to 0.10.3

* Update ceph.md

* delete intern_repo_collate_fn

* delete intern_repo_collate_fn

* delete useless files

* assert pack_to_max_length=True if use_varlen_attn=True

* add varlen attn doc

* add varlen attn to configs

* delete useless codes

* update

* update

* update configs

* fix priority of ThroughputHook and flake8 ignore W504

* using map_fn to set length attr to dataset

* support split=None in process_hf_dataset

* add dataset_format_mapping

* support preprocessing ftdp and normal datasets

* refactor process_hf_dataset

* support pack dataset in process_untokenized_datasets

* add xtuner_dataset_timeout

* using gloo backend for monitored barrier

* set gloo timeout

* fix bugs

* fix configs

* refactor intern repo dataset docs

* fix doc

* fix lint

---------

Co-authored-by: pppppM <67539920+pppppM@users.noreply.github.com>
Co-authored-by: pppppM <gjf_mail@126.com>
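Several commits above wire up variable-length (varlen) attention, which operates on packed sequences; hence the "assert pack_to_max_length=True if use_varlen_attn=True" commit. A minimal sketch of that constraint in an XTuner-style Python config — the variable names come from the commit messages, everything else is an assumption, not code from this PR:

```python
# Hedged sketch of the constraint named in the commits above.
use_varlen_attn = True      # enable variable-length (packed) attention
pack_to_max_length = True   # varlen attention requires packed sequences

if use_varlen_attn:
    # Assumption: varlen attention derives cumulative sequence lengths
    # from packed samples, so packing must be enabled.
    assert pack_to_max_length, (
        'use_varlen_attn=True requires pack_to_max_length=True')
```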
HIT-cwh and others added 27 commits January 30, 2024 15:29
…` and rename 'internlm_repo' to 'intern_repo' (InternLM#372)

* fix

* rename internlm_repo to intern_repo

* add InternlmRepoSampler to prevent a backward-compatibility break

* add how to install flash_attn to doc
…nLM#379)

* delete useless codes

* refactor process_untokenized_datasets: add ftdp to dataset-format

* fix lint
…ernLM#381)

support flash attn 2 in internlm1, internlm2 and llama
…#385)

* support saving eval output before saving checkpoint

* refactor
* fix lr scheduler setting

* fix more

---------

Co-authored-by: zilong.guo <zilong.guo@zeron.ai>
Co-authored-by: LZHgrla <linzhihao@pjlab.org.cn>
* rename

* update docs

* update template

* update

* add cfgs

* update

* update
…ternLM#404)

* [Fix] Fix no space in chat output using InternLM2. (InternLM#357)

* Update chat.py

* Update utils.py

* Update utils.py

* fix pre-commit

---------

Co-authored-by: Zhihao Lin <36994684+LZHgrla@users.noreply.github.com>
Co-authored-by: LZHgrla <linzhihao@pjlab.org.cn>
…NEL environment variable (InternLM#411)

* dispatch support transformers>=4.36

* add USE_TRITON_KERNEL environment variable

* raise RuntimeError when using triton kernels on CPU

* fix lint
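The commits above add a `USE_TRITON_KERNEL` environment variable and a CPU guard. A hedged sketch of how such a toggle typically looks — the variable name comes from the commit messages; the exact check is an assumption, not this PR's code:

```python
import os

import torch

# Assumed convention: set USE_TRITON_KERNEL=1 to opt in to triton kernels.
USE_TRITON_KERNEL = os.getenv('USE_TRITON_KERNEL', '0') == '1'

if USE_TRITON_KERNEL and not torch.cuda.is_available():
    # Triton kernels are GPU-only, so fail fast on CPU-only machines.
    raise RuntimeError('Triton kernels require a CUDA-capable GPU; '
                       'unset USE_TRITON_KERNEL to run on CPU.')
```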
* [Feature]Add InternLM2-Chat-1_8b full config

* [Feature]Add InternLM2-Chat-1_8b full config

* update

---------

Co-authored-by: LZHgrla <linzhihao@pjlab.org.cn>
Co-authored-by: Zhihao Lin <36994684+LZHgrla@users.noreply.github.com>
* added gemma config and template

* check config and ensure consistency

* Update xtuner/configs/gemma/gemma_2b_base/gemma_2b_base_qlora_alpaca_e3.py

Co-authored-by: Zhihao Lin <36994684+LZHgrla@users.noreply.github.com>

* Update xtuner/configs/gemma/gemma_2b_base/gemma_2b_base_full_alpaca_e3.py

Co-authored-by: Zhihao Lin <36994684+LZHgrla@users.noreply.github.com>

* Update xtuner/configs/gemma/gemma_7b_base/gemma_7b_base_full_alpaca_e3.py

Co-authored-by: Zhihao Lin <36994684+LZHgrla@users.noreply.github.com>

* Update xtuner/configs/gemma/gemma_7b_base/gemma_7b_base_qlora_alpaca_e3.py

Co-authored-by: Zhihao Lin <36994684+LZHgrla@users.noreply.github.com>

* Update xtuner/utils/templates.py

Co-authored-by: Zhihao Lin <36994684+LZHgrla@users.noreply.github.com>

* update

* added required version

* update

* update

---------

Co-authored-by: Zhihao Lin <36994684+LZHgrla@users.noreply.github.com>
Co-authored-by: LZHgrla <linzhihao@pjlab.org.cn>
* add base dataset

* update dataset generation

* update refcoco

* add convert refcoco

* add eval_refcoco

* add config

* update dataset

* fix bug

* fix bug

* update data prepare

* fix error

* refactor eval_refcoco

* fix bug

* fix error

* update readme

* add entry_point

* update config

* update config

* update entry point

* update

* update doc

* update

---------

Co-authored-by: jacky <jacky@xx.com>
* Update version.py

* Update version.py
…nternLM#410)

* support smart_tokenizer_and_embedding_resize

* replace ast with json.loads

* support list_dataset_format cli

* add doc about ftdp and custom dataset

* add custom dataset template

* add args name to process_hf_dataset

* use new process_untokenized_datasets

* support tokenize_ftdp_datasets

* add mistral_7b_w_tokenized_dataset config

* update doc

* update doc

* add comments

* fix data save path

* smart_tokenizer_and_embedding_resize support zero3

* fix lint

* add data format to internlm2_7b_full_finetune_custom_dataset_e1.py

* add a data format example to configs associated with finetuning custom dataset

* add a data format example to configs associated with finetuning custom dataset

* fix lint
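For context, `smart_tokenizer_and_embedding_resize` is a widely used pattern (popularized by Stanford Alpaca) for adding special tokens without leaving the new embedding rows randomly initialized. A hedged sketch under that assumption; the PR's actual implementation (including the ZeRO-3 support noted above) is not shown here:

```python
# Hedged sketch of the common smart_tokenizer_and_embedding_resize pattern;
# not this PR's code.
def smart_tokenizer_and_embedding_resize(special_tokens_dict, tokenizer, model):
    """Add special tokens and resize embeddings, initializing the new
    rows to the mean of the pre-existing embedding rows."""
    num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict)
    model.resize_token_embeddings(len(tokenizer))
    if num_new_tokens > 0:
        input_emb = model.get_input_embeddings().weight.data
        output_emb = model.get_output_embeddings().weight.data
        # New rows start as the mean of the old ones instead of random noise.
        input_emb[-num_new_tokens:] = input_emb[:-num_new_tokens].mean(
            dim=0, keepdim=True)
        output_emb[-num_new_tokens:] = output_emb[:-num_new_tokens].mean(
            dim=0, keepdim=True)
```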
* split finetune_custom_dataset.md to 6 parts

* refactor custom_dataset and ftdp_dataset related docs

* fix comments
LZHgrla changed the base branch from main to docs on March 11, 2024 at 11:57
LZHgrla merged commit c360cb8 into InternLM:docs on Mar 11, 2024
1 check passed