Add needle in haystack test #121
Conversation
maxjeblick
left a comment
Thanks a lot for the PR!
I have one question/comment:
In your opinion, would it make sense to create the needle datasets beforehand (like ruler, etc.), upload them and reuse them?
Uploading the datasets beforehand wouldn't allow on-the-fly benchmarking with arbitrary context_length/needle_depth combinations.
IMO, for kvpress, having ~40 = 5 * 8 combinations of context_length/needle_depth should probably be enough. I'm not concerned about having a common tokenizer for the dataset, but please feel free to chime in on this.
The change would improve the code quality of the PR, as evaluate.py is now also responsible for dataset creation. With precomputed datasets, no changes to evaluate.py are needed.
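To make the "~40 = 5 * 8" idea concrete, here is a minimal sketch of enumerating such a grid before building the datasets. The specific context lengths and depths are illustrative assumptions, not values from the PR, and `build_grid` is a hypothetical helper.

```python
from itertools import product

# Illustrative values only (assumed, not taken from the PR):
# 5 context lengths x 8 needle depths = 40 combinations.
CONTEXT_LENGTHS = [4_000, 8_000, 16_000, 32_000, 64_000]
NEEDLE_DEPTHS = [0, 14, 28, 42, 57, 71, 85, 100]  # percent of the context


def build_grid(context_lengths, needle_depths):
    """Return every (context_length, needle_depth) pair to precompute."""
    return list(product(context_lengths, needle_depths))


grid = build_grid(CONTEXT_LENGTHS, NEEDLE_DEPTHS)
print(len(grid))  # 40
```

Each pair would correspond to one precomputed dataset split; anything outside the grid could no longer be benchmarked on the fly, which is the trade-off mentioned above.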
Alternatively, one could
- add a create-dataset script in the needle_in_a_haystack folder.
- encode the `needle_depth` as `dataset = needle_in_haystack_needle_depth`
- in `load_dataset`, have an `if-else` block that loads the needle dataset if dataset starts with `needle_in_haystack`
- remove the `_insert_needle_in_haystack` method
By this,
- the code changes in `evaluate.py` are slim
- no new parameter is introduced
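The alternative outlined above could look roughly like the sketch below. It assumes a naming scheme of `needle_in_haystack_<depth>` with an integer depth; `parse_needle_depth` and `select_loader` are hypothetical names used for illustration and are not part of kvpress. The second function only returns a label so that the branching can be shown without loading real data.

```python
from typing import Optional

NEEDLE_PREFIX = "needle_in_haystack"  # prefix suggested in the review comment


def parse_needle_depth(dataset: str) -> Optional[int]:
    """Return the depth encoded in the dataset name, or None for other datasets.

    Assumes names of the form "needle_in_haystack_<depth>" (hypothetical scheme).
    """
    if not dataset.startswith(NEEDLE_PREFIX):
        return None
    suffix = dataset[len(NEEDLE_PREFIX):].lstrip("_")
    if not suffix.isdigit():
        raise ValueError(f"Expected an integer depth in dataset name: {dataset!r}")
    return int(suffix)


def select_loader(dataset: str) -> str:
    """Mimic the proposed if-else branch; returns a label instead of real data."""
    depth = parse_needle_depth(dataset)
    if depth is not None:
        return f"needle dataset at depth {depth}"
    return f"regular dataset {dataset}"


print(select_loader("needle_in_haystack_50"))
print(select_loader("ruler"))
```

With this shape, the depth travels inside the existing dataset argument, so no extra command-line parameter is needed.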
I see your point, and I also thought of this, but decided to do it this way for 2 main reasons:
OK, makes sense.
On second thought, it might make sense to move the
Done :)
Thanks @maxjeblick for your feedback, updates:
Sorry @maxjeblick, had to fix a typo, should be OK now 😄
PR description
This small PR adds the standard NIAH test to the benchmarks. This test makes it possible to stress-test the model at different context lengths and needle depths.
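For readers unfamiliar with the test, the core idea can be sketched as follows. This is a simplified, word-level illustration under stated assumptions: `insert_needle` is a hypothetical helper, the context is truncated by whitespace-separated words rather than by tokenizer tokens, and it is not the PR's actual implementation.

```python
def insert_needle(haystack: str, needle: str, context_length: int, depth_percent: float) -> str:
    """Truncate the haystack to context_length words and insert the needle.

    depth_percent = 0 places the needle at the start, 100 at the end.
    Word-level truncation is an assumption made to keep the sketch dependency-free.
    """
    words = haystack.split()[:context_length]
    position = int(len(words) * depth_percent / 100)
    return " ".join(words[:position] + [needle] + words[position:])


haystack = "a b c d e f g h i j"
print(insert_needle(haystack, "NEEDLE", context_length=10, depth_percent=50))
# a b c d e NEEDLE f g h i j
```

The model is then asked a question whose answer is the needle, and accuracy is recorded per (context length, depth) pair.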
Checklist
- (`make test`)
- (`make style`, on errors try fix with `make format`)
- `git commit -s`
- `mypress_press.py` is in the `presses` directory
- `MyPress` is in `__init__.py`
- `README.md` is updated with a 1 liner about the new press in the Available presses section
- `default_presses` list in `tests/default_presses.py`