URL

Affiliations

Abstract

One of the most surprising puzzles in neural network generalisation is grokking: a network with perfect training accuracy but poor generalisation will, upon further training, transition to perfect generalisation. We propose that grokking occurs when the task admits a generalising solution and a memorising solution, where the generalising solution is slower to learn but more efficient, producing larger logits with the same parameter norm. We hypothesise that memorising circuits become more inefficient with larger training datasets while generalising circuits do not, suggesting there is a critical dataset size at which memorisation and generalisation are equally efficient. We make and confirm four novel predictions about grokking, providing significant evidence in favour of our explanation. Most strikingly, we demonstrate two novel and surprising behaviours: ungrokking, in which a network regresses from perfect to low test accuracy, and semi-grokking, in which a network shows delayed generalisation to partial rather than perfect test accuracy.
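The core quantitative notion in the abstract is circuit efficiency: at the same parameter norm, a more efficient solution produces larger logits. Below is a minimal sketch of one way to operationalise that comparison, assuming a PyTorch setting; the networks, data, and the `efficiency_proxy` helper are all illustrative, not the paper's code or methodology.

```python
# A minimal sketch, not the paper's implementation: read "efficiency" as
# the correct-class logit a network produces once its parameters are
# rescaled to a fixed (unit) L2 norm. Networks and data are illustrative.
import copy

import torch


def efficiency_proxy(net: torch.nn.Module, x: torch.Tensor, y: torch.Tensor) -> float:
    """Mean correct-class logit after rescaling all parameters to unit
    L2 norm -- a crude proxy for 'logit magnitude per unit parameter norm'."""
    net = copy.deepcopy(net)  # leave the caller's network untouched
    flat = torch.nn.utils.parameters_to_vector(net.parameters())
    torch.nn.utils.vector_to_parameters(flat / flat.norm(), net.parameters())
    with torch.no_grad():
        logits = net(x)  # shape: (batch, classes)
    return logits.gather(1, y.unsqueeze(1)).mean().item()


# Toy comparison of two randomly initialised classifiers on fake data.
# In an actual grokking experiment these would instead be checkpoints of
# the memorising and generalising solutions found during training.
torch.manual_seed(0)
x = torch.randn(64, 10)
y = torch.randint(0, 5, (64,))
for name in ("net_a", "net_b"):
    net = torch.nn.Sequential(
        torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 5)
    )
    print(name, efficiency_proxy(net, x, y))
```

Under the abstract's hypothesis, such a fixed-norm comparison should score the later, generalising solution higher than the memorising one, and the memorising solution's score should drop as the training set grows.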
Translation (by gpt-3.5-turbo)
Summary (by gpt-3.5-turbo)