# Omnigrok
This is the code repo for the paper ["Omnigrok: Grokking Beyond Algorithmic Data"](https://openreview.net/forum?id=zDiHoIWa0q1), accepted at ICLR 2023 as a spotlight. We elucidate [the grokking phenomenon](https://arxiv.org/abs/2201.02177) from the perspective of loss landscapes, and show that grokking can happen not only on algorithmic datasets and toy teacher-student setups, but also on standard machine learning datasets (e.g., MNIST handwritten digits, IMDb movie reviews, and QM9 molecular property prediction).

The examples used in this paper are relatively small-scale. We keep the code as minimal as possible: each example is self-contained in a single folder.

| Example | Figure in [paper](https://openreview.net/forum?id=zDiHoIWa0q1) | Folder |
|--|--|--|
| Teacher-student | Figure 2 | ./teacher-student |
| MNIST handwritten digits | Figure 3 | ./mnist |
| IMDb movie reviews | Figure 4 | ./imdb |
| QM9 molecule properties | Figure 5 | ./qm9 |
| Modular addition | Figures 6 & 8 | ./mod-addition |
| MNIST representation | Figure 7 | ./mnist-repr |

For each example, we conduct two kinds of experiments:
* (1) Reduced landscape analysis: the weight norm is held fixed during training.
* (2) Grokking experiments: the weight norm is free to change during training (standard training).
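
The two experiment types differ only in whether the weight norm is constrained. The sketch below illustrates that difference on a hypothetical toy linear-regression model in pure Python. It assumes the norm is held fixed by rescaling the weights back onto a sphere of radius `c` after every gradient step; the scripts in the "landscape" folders train neural networks and may enforce the constraint differently. The names `train`, `rescale_to_norm`, `mse`, and the toy data are illustrative only, not part of this repo.

```python
import math
import random


def l2_norm(w):
    """Euclidean norm of a weight vector."""
    return math.sqrt(sum(x * x for x in w))


def rescale_to_norm(w, target):
    """Project w back onto the sphere ||w|| == target (w must be nonzero)."""
    n = l2_norm(w)
    return [x * target / n for x in w]


def mse(w, xs, ys):
    """Mean squared error of the linear model y_hat = w . x."""
    return sum((sum(a * b for a, b in zip(w, x)) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, w0, lr=0.05, steps=300, fix_norm=None):
    """Full-batch gradient descent on mse (hypothetical helper).

    fix_norm=None -> standard training: the weight norm is free to change.
    fix_norm=c    -> after every step, rescale w so that ||w|| == c.
    """
    w = list(w0)
    if fix_norm is not None:
        w = rescale_to_norm(w, fix_norm)
    m = len(xs)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sum(a * b for a, b in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / m
        w = [a - lr * g for a, g in zip(w, grad)]
        if fix_norm is not None:
            w = rescale_to_norm(w, fix_norm)
    return w


# Toy data (hypothetical): a linear teacher with ||w_true|| = sqrt(5.25), about 2.29.
random.seed(0)
w_true = [1.0, -2.0, 0.5]
xs = [[random.gauss(0, 1) for _ in range(3)] for _ in range(64)]
ys = [sum(a * b for a, b in zip(w_true, x)) for x in xs]
w0 = [random.gauss(0, 1) for _ in range(3)]

# (2) Standard training: the norm goes wherever the gradient takes it.
w_free = train(xs, ys, w0)

# (1) Reduced landscape: final training loss as a function of the fixed weight norm.
landscape = {c: mse(train(xs, ys, w0, fix_norm=c), xs, ys)
             for c in (0.5, l2_norm(w_true), 5.0)}
for c, loss in landscape.items():
    print(f"norm={c:.2f}  loss={loss:.4f}")
```

Sweeping `fix_norm` over a grid of values and plotting the resulting losses gives a one-dimensional "reduced" landscape; in this toy case the loss can only reach zero at the teacher's norm, since every other sphere excludes `w_true`.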

Each folder except the MNIST representation one (./mnist-repr) contains two subfolders, one per experiment type: (1) "landscape" and (2) "grokking".