# CompactNet

This repo is a slight modification of DeepMind's *Advancing mathematics by guiding human intuition with AI*.

We noticed some issues when the KAN paper cited this work: the comparisons contained errors. We found that we could match KAN's 81.6% accuracy on this dataset with as few as 122 parameters, without any major modifications to the DeepMind code. To achieve this result we only decreased the network size, changed the random seed, and increased the training time. Keeping the original seed and the original training cutoff, we could get a matching result with a network of 204 parameters.

The table below shows some of our results. There is some run-to-run variance, so your numbers may differ slightly, but over several runs they should be close to ours. These results keep the same random seed and the same training limit.

| Network | Number of Hidden Neurons | Number of Parameters | Accuracy Pre Salient | Accuracy Post Salient |
|---|---|---|---|---|
| [300, 300, 300] | 900 | 190,214 | 81.38% | 80.14% |
| [100, 100, 100] | 300 | 23,414 | 82.79% | 82.04% |
| [50, 50, 50] | 150 | 6,714 | 85.13% | 81.65% |
| [10, 10, 10] | 30 | 554 | 84.45% | 82.30% |
| [5, 5, 5] | 15 | 234 | 83.06% | 80.42% |
| [4, 4, 4] | 12 | 182 | 76.73% | 65.19% |
| [3, 3, 3] | 9 | 134 | 66.33% | 74.93% |
| [50, 50] | 100 | 4,164 | 87.15% | 82.65% |
| [10, 10] | 20 | 444 | 83.02% | 81.50% |
| [5, 5] | 10 | 204 | 82.19% | 81.33% |
| [4, 4] | 8 | 162 | 81.89% | 81.03% |
| [3, 3] | 6 | 122 | 77.72% | 76.24% |
| Baseline (direct calculation) | 0 | 0 | 73.82% | 73.82% |
| DeepMind's 4-layer (reported) | 900 | 190,214 | 78% | 78% |
| KAN | N/A | 200 | 81.6% | 78.2% |
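The parameter counts above are consistent with plain fully connected MLPs. Here is a minimal sketch of that bookkeeping; note that the input width of 17 and output width of 14 are inferred from the counts in the table rather than read from the code, so treat them as assumptions.

```python
# Minimal sketch: parameter count of a fully connected MLP.
# ASSUMPTION: 17 input features and 14 output classes; these values are
# inferred from the parameter counts in the table, not taken from the code.

def mlp_param_count(hidden, n_in=17, n_out=14):
    """Weights + biases for a dense MLP with the given hidden widths."""
    total = 0
    prev = n_in
    for width in hidden + [n_out]:
        total += prev * width + width  # weight matrix + bias vector
        prev = width
    return total

print(mlp_param_count([3, 3]))  # 122, matching the table
print(mlp_param_count([5, 5]))  # 204
```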

Here are some results where we changed the random seed and the training length. We set `num_training_steps` to 50k for an arbitrarily long run and report the number of steps before the network early stopped (Steps).

| Network | Number of Hidden Neurons | Number of Parameters | Seed | Accuracy Pre Salient | Steps | Accuracy Post Salient | Steps |
|---|---|---|---|---|---|---|---|
| [3, 3] | 6 | 122 | 552 | 81.60% | 20,700 | 81.69% | 22,100 |
| [2, 2] | 4 | 84 | 8110 | 81.33% | 22,700 | 80.44% | 23,300 |
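For readers unfamiliar with the setup: early stopping means training is cut off once the validation metric stops improving, which is what produces the Steps columns above. The sketch below illustrates one common patience-based criterion; the exact rule in the DeepMind code may differ, and `train_one_step()`, `validate()`, and `patience` are hypothetical stand-ins.

```python
# Minimal sketch of patience-based early stopping. The actual stopping rule
# in the DeepMind code may differ; train_one_step() and validate() are
# hypothetical stand-ins for the real training and evaluation routines.
import random

def train_one_step():
    pass  # stand-in for one optimizer update

def validate():
    return random.random()  # stand-in for validation accuracy

num_training_steps = 50_000  # arbitrarily long run, as in the table above
patience = 1_000             # hypothetical: steps allowed without improvement

best_acc, best_step = 0.0, 0
for step in range(num_training_steps):
    train_one_step()
    acc = validate()
    if acc > best_acc:
        best_acc, best_step = acc, step
    elif step - best_step > patience:
        print(f"early stopped at step {step}")  # the 'Steps' columns above
        break
```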

We also have advanced methods for training an extremely small two-layer MLP in 1k steps that achieves an average accuracy of more than 80% (averaged over 10 runs). See the models in the `ckpt` folder for details. Run the following commands for inference:

```
python test_ckpt.py -hn 3
python test_ckpt.py -hn 2
```
| Network | Number of Neurons | Number of Parameters | Accuracy Pre Salient | Steps | Accuracy Post Salient | Steps |
|---|---|---|---|---|---|---|
| [3] | 3 | 110 | 82.42% | ~1,000 | 80.33% | ~1,000 |
| [2] | 2 | 78 | 80.85% | ~1,000 | 80.23% | ~1,000 |

## Running

```
pip install -r requirements.txt
```

You will also need the dataset, which requires gsutil to be installed. If gsutil is available, the notebook will download the dataset for you automatically. Make sure gsutil is on your PATH before opening the notebook, or the download may fail.

Line 1 of the file contains the network definition, where you define how many hidden neurons you want per hidden layer. The length of the list determines the number of hidden layers; for example, [2, 2] means two hidden layers with 2 neurons each. A sketch of how such a list expands into layers follows below.
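As an illustration of how such a list maps onto layers, here is a minimal sketch in plain PyTorch; the notebook itself may use a different framework, and the input/output widths of 17 and 14 are the same inferred assumptions as above.

```python
# Minimal sketch of building an MLP from a hidden-layer list such as [2, 2].
# Plain PyTorch is used for illustration only; the notebook may use a
# different framework. Input width 17 and output width 14 are assumptions
# inferred from the parameter counts in the tables above.
import torch.nn as nn

def build_mlp(hidden, n_in=17, n_out=14):
    layers, prev = [], n_in
    for width in hidden:               # one Linear + ReLU per list entry
        layers += [nn.Linear(prev, width), nn.ReLU()]
        prev = width
    layers.append(nn.Linear(prev, n_out))  # classification head
    return nn.Sequential(*layers)

net = build_mlp([2, 2])  # two hidden layers with 2 neurons each
print(sum(p.numel() for p in net.parameters()))  # 84, matching the table
```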