[Improve] Use map_fn and collate_fn to manage dataset and dataloader by LZHgrla · Pull Request #8 · InternLM/xtuner

LZHgrla · 2023-07-21T07:00:13Z

This PR is based on open-mmlab/mmengine#1262

LZHgrla added 7 commits July 18, 2023 17:38

3b47ba4 use global constants
4491c14 refactor dataset map_fn
b3e964a refactor collate_fn
af26ec2 fix bugs
fc80956 add mmlu collator
f617ee0 add default pad_token_id for tokenizer
fc56a8f use print_log
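The PR title describes splitting data handling into a per-sample `map_fn` (applied once when the dataset is built) and a `collate_fn` (applied by the dataloader when it forms a batch). The pattern can be sketched as below. This is a minimal illustration under stated assumptions, not xtuner's implementation: every name here (`alpaca_map_fn`, `encode_fn`, `default_collate_fn`, `ToyTokenizer`, `IGNORE_INDEX`) is hypothetical, the Alpaca-style field names are assumed, and plain Python lists stand in for tensors so the sketch runs without PyTorch. Falling back to the EOS id when the tokenizer has no pad token is one common way to provide a default `pad_token_id`; the PR's actual default is not shown here.

```python
from typing import Dict, List

# Hypothetical constant: -100 is PyTorch CrossEntropyLoss's default ignore_index,
# so positions labelled with it do not contribute to the loss.
IGNORE_INDEX = -100


class ToyTokenizer:
    """Stand-in for a real tokenizer: assigns each new whitespace-separated word an id."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {"</s>": 0}
        self.eos_token_id = 0
        self.pad_token_id = None  # many LLM tokenizers ship without a pad token

    def encode(self, text: str) -> List[int]:
        ids = []
        for word in text.split():
            if word not in self.vocab:
                self.vocab[word] = len(self.vocab)
            ids.append(self.vocab[word])
        return ids


def alpaca_map_fn(example: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical map_fn: turn one raw record (assumed Alpaca-style) into input/output text."""
    prompt = example["instruction"]
    if example.get("input"):
        prompt += "\n" + example["input"]
    return {"input": prompt, "output": example["output"]}


def encode_fn(sample: Dict[str, str], tokenizer: ToyTokenizer) -> Dict[str, List[int]]:
    """Tokenize one mapped sample; mask the prompt so only the response is supervised."""
    input_ids = tokenizer.encode(sample["input"])
    output_ids = tokenizer.encode(sample["output"]) + [tokenizer.eos_token_id]
    return {
        "input_ids": input_ids + output_ids,
        "labels": [IGNORE_INDEX] * len(input_ids) + output_ids,
    }


def default_collate_fn(
    instances: List[Dict[str, List[int]]], pad_token_id: int
) -> Dict[str, List[List[int]]]:
    """Hypothetical collate_fn: right-pad every sample in the batch to the longest one."""
    max_len = max(len(x["input_ids"]) for x in instances)
    batch: Dict[str, List[List[int]]] = {"input_ids": [], "labels": [], "attention_mask": []}
    for x in instances:
        n_pad = max_len - len(x["input_ids"])
        batch["input_ids"].append(x["input_ids"] + [pad_token_id] * n_pad)
        batch["labels"].append(x["labels"] + [IGNORE_INDEX] * n_pad)
        batch["attention_mask"].append([1] * len(x["input_ids"]) + [0] * n_pad)
    return batch


# Assumed default: fall back to the EOS id when the tokenizer has no pad token.
tokenizer = ToyTokenizer()
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id

raw_records = [
    {"instruction": "say hi", "input": "", "output": "hi"},
    {"instruction": "translate", "input": "bonjour le monde", "output": "hello world"},
]
# map_fn runs once per record when the dataset is built ...
dataset = [encode_fn(alpaca_map_fn(r), tokenizer) for r in raw_records]
# ... while collate_fn runs per batch inside the dataloader.
batch = default_collate_fn(dataset, tokenizer.pad_token_id)
print(batch["input_ids"])  # [[1, 2, 2, 0, 0, 0, 0], [3, 4, 5, 6, 7, 8, 0]]
```

In a PyTorch setup, the collate function would typically be bound with `functools.partial` and passed as `torch.utils.data.DataLoader(..., collate_fn=...)`, returning tensors instead of lists. Keeping the per-record conversion in `map_fn` and the padding in `collate_fn` means a new dataset format only needs a new `map_fn`, while batching logic stays shared.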

LZHgrla merged commit c9e59bb into InternLM:main Jul 21, 2023

LZHgrla deleted the lzh/data branch July 21, 2023 07:10

HIT-cwh added a commit to HIT-cwh/xtuner that referenced this pull request Aug 5, 2024

Lite support llama 3.1 (InternLM#8)

add3b0b

* support llama3.1
* fix load jsonl
* fix build_llm_model: set attn_implementation and torch_dtype

LZHgrla merged 7 commits into InternLM:main from LZHgrla:lzh/data

LZHgrla commented Jul 21, 2023

1 participant
