This repo includes the official code for the paper "Critical Data Size of Language Models from a Grokking Perspective".
torch >= 2.0
transformers
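Before running the scripts, it can help to confirm that the environment satisfies the requirements above. The following is a minimal sketch, not part of the repo: the helper names (`version_tuple`, `meets_minimum`, `check_requirements`) are hypothetical, and the version comparison is deliberately simple (numeric components only, ignoring build suffixes such as `+cu118`).

```python
import re
from importlib import metadata

# Requirements as listed above; None means no minimum version is stated.
REQUIREMENTS = (("torch", "2.0"), ("transformers", None))


def version_tuple(version):
    """Turn a string like '2.1.0+cu118' into (2, 1, 0), keeping leading digits only."""
    public = version.split("+")[0]  # drop a local build tag such as '+cu118'
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.0' and '2.0.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(requirements=REQUIREMENTS):
    """Map each required package name to a short status string."""
    report = {}
    for name, minimum in requirements:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum is not None and not meets_minimum(installed, minimum):
            report[name] = f"{installed} (needs >= {minimum})"
        else:
            report[name] = f"{installed} ok"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```

Running the file prints one status line per package; anything reported as missing or too old should be installed or upgraded before launching the experiments.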
Execute the following commands to reproduce our results:
sh run_grokking_on_imdb.sh
sh run_grokking_on_yelp.sh