refactor of datautils #20

poedator · 2023-07-01T17:46:36Z

fixing bos/eos token ids when loading decapoda-research LLaMA models: the wrong ids came from an incorrect tokenizer config and caused errors with transformers >= 4.29.0
combining old and new eval functions
separate loading of train and eval data via the eval_mode parameter (saves time)
combining args.dataset and args.custom_data_path into one option
loading pajama and refinedweb via the dataset option (both are included in this repo at a fixed location)
tests available in the private chat
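The changes listed above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code in datautils.py: the helper names `fix_llama_special_tokens` and `resolve_dataset`, the file paths under `data/`, the benchmark names, and the `load_train`/`load_eval` callables are all assumptions. It relies on the original LLaMA sentencepiece vocabulary using id 1 for `<s>` and id 2 for `</s>`, and it assumes the tokenizer object allows assigning `bos_token_id`/`eos_token_id`.

```python
import os
from types import SimpleNamespace

# Original LLaMA sentencepiece vocabulary: <s> has id 1, </s> has id 2.
LLAMA_BOS_ID, LLAMA_EOS_ID = 1, 2

# Hypothetical locations of the calibration sets bundled with the repo.
BUNDLED_DATASETS = {"pajama": "data/pajama.pth", "refinedweb": "data/refinedweb.pth"}
# Hypothetical set of standard benchmarks that are fetched by name.
BENCHMARKS = {"wikitext2", "ptb", "c4"}


def fix_llama_special_tokens(tokenizer):
    """Overwrite bos/eos ids that a broken tokenizer config got wrong.

    Assumption: the tokenizer allows assigning bos_token_id / eos_token_id.
    """
    if tokenizer.bos_token_id != LLAMA_BOS_ID or tokenizer.eos_token_id != LLAMA_EOS_ID:
        tokenizer.bos_token_id = LLAMA_BOS_ID
        tokenizer.eos_token_id = LLAMA_EOS_ID
    return tokenizer


def resolve_dataset(name):
    """Map the single `dataset` option to a (kind, location) pair."""
    if name in BUNDLED_DATASETS:
        return "bundled", BUNDLED_DATASETS[name]
    if name in BENCHMARKS:
        return "benchmark", name
    if os.path.exists(name):
        # Anything else that exists on disk is treated as a custom data path.
        return "custom", name
    raise ValueError(f"unknown dataset or missing path: {name!r}")


def get_loaders(name, eval_mode=False, *, load_train, load_eval):
    """Load only the split that is needed.

    In this sketch, eval_mode=True never builds the calibration (train)
    samples, and eval_mode=False never builds the eval data.
    """
    kind, location = resolve_dataset(name)
    loader = load_eval if eval_mode else load_train
    return loader(kind, location)


if __name__ == "__main__":
    # Stand-in tokenizer whose config carries wrong special-token ids.
    tok = SimpleNamespace(bos_token_id=0, eos_token_id=0)
    fix_llama_special_tokens(tok)
    print(tok.bos_token_id, tok.eos_token_id)  # 1 2
```

Passing the loaders in as callables keeps the sketch self-contained; in a real module they would be the per-dataset loading functions.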

datautils.py

poedator · 2023-07-03T18:42:50Z

lmeval.py

@@ -125,7 +124,7 @@ def main():
        num_fewshot=args.num_fewshot,
        batch_size=args.batch_size,
        device=args.device,
-        no_cache=args.no_cache,
+        no_cache=True,


Setting no_cache=True is recommended for the tests. In addition, no_cache=False is incompatible with evaluating pre-loaded models (due to incorrect code in lm-eval). It is now hardcoded to True for convenience, and the option has been removed.

poedator · 2023-07-21T12:32:49Z

reworked in #29

Godofnothing reviewed Jul 1, 2023

datautils.py (outdated)

Godofnothing reviewed Jul 1, 2023

datautils.py

poedator force-pushed the datautils_upd branch from 89b5610 to 0fdf8ed on July 3, 2023 18:38

poedator commented Jul 3, 2023

poedator added 3 commits July 6, 2023 01:43

refactor of datautils

ab5bf4a

get_loaders() docstring updated

985386a

removed args.no_cache option, hardcoded True

d437062

poedator force-pushed the datautils_upd branch from 0fdf8ed to d437062 on July 5, 2023 22:47

poedator added 2 commits July 6, 2023 01:54

added printout of wbits_avg after quantiz

822a35d

restored custom_data_path option with deprecation warning

ee7daed
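A minimal sketch of how a deprecated `--custom_data_path` flag can keep working while steering users to `--dataset`, assuming argparse; the message text and the handling of conflicting values are hypothetical, not taken from the commit. `FutureWarning` is used instead of `DeprecationWarning` because Python ignores the latter by default unless it is triggered from code in `__main__`, so CLI users might never see it.

```python
import argparse
import warnings


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", type=str, default=None,
                        help="dataset name (e.g. pajama) or path to custom data")
    parser.add_argument("--custom_data_path", type=str, default=None,
                        help="deprecated: pass the path via --dataset instead")
    args = parser.parse_args(argv)

    if args.custom_data_path is not None:
        # FutureWarning is shown to end users by default.
        warnings.warn(
            "--custom_data_path is deprecated; use --dataset <path> instead",
            FutureWarning,
        )
        if args.dataset is not None and args.dataset != args.custom_data_path:
            # Refuse ambiguous input rather than silently picking one value.
            parser.error("pass the data location via --dataset only")
        # Fold the deprecated option into the single dataset option.
        args.dataset = args.custom_data_path
    return args
```

Under this sketch, `parse_args(["--custom_data_path", "my_data.pth"])` yields `args.dataset == "my_data.pth"` plus a warning, so downstream code only ever reads `args.dataset`.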

Vahe1994 mentioned this pull request Jul 17, 2023

[ptb perplexity is different from paper] #16

Open

poedator mentioned this pull request Jul 21, 2023

Datautils upd 2 #29

Merged

poedator closed this Jul 21, 2023

poedator deleted the datautils_upd branch August 5, 2023 16:08
