
Top-5%/top-64 computation #11

Open
egracheva opened this issue Sep 20, 2022 · 3 comments

Comments

@egracheva

Hello,

Thanks for the great paper.

When you compute the top-5%/top-64 score (Tables 4 and 11), how many architectures are considered in total?
Is it 3,000 architectures (the warmup pool only) or the entire dataset?

Cheers,
Ekaterina

@vaenyr (Contributor) commented Sep 20, 2022

Off the top of my head, I would say all models from the search space were included.
This is also what the notebook seems to suggest, although I wasn't the one who ran these experiments.
Perhaps @mohsaied can verify.

@mohsaied (Collaborator)

Correct. They are the top 64 models in the entire search space. The idea is to quantify the degree to which zero-cost warmup improves the sampled architectures. If you took 64 random models, the expected number of top-5% models would simply be 5% of 64 ≈ 3 models. However, when we use a zero-cost metric like synflow and take the top 64 models in the search space, we increase that number significantly, as shown in the tables.

So this comparison shows the best-case scenario of zero-cost warmup. It would be interesting to also try it with smaller warmup sizes, as you suggested, and that should be fairly straightforward to do. If you end up doing this experiment, we'd love a pull request :)
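
For concreteness, here is a minimal sketch of that computation (assuming NumPy arrays `proxy_scores` and `val_accuracies` covering the whole search space; these names are illustrative, not from the repo):

```python
import numpy as np

def top5_top64(proxy_scores, val_accuracies, k=64, top_frac=0.05):
    """Count how many of the k models ranked best by the zero-cost proxy
    land within the top `top_frac` of the search space by accuracy."""
    # Indices of the k models with the highest proxy score.
    top_k_by_proxy = np.argsort(proxy_scores)[-k:]
    # Accuracy threshold separating the top 5% of all models.
    threshold = np.quantile(val_accuracies, 1.0 - top_frac)
    return int(np.sum(val_accuracies[top_k_by_proxy] >= threshold))
```

With 64 randomly sampled models the expected count is 0.05 × 64 ≈ 3, which is the random baseline described above.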

@egracheva (Author)

Thanks for your replies!

I was confused by the fact that these tables appear in the "Warmup" section. Actually, I am still inclined to believe that the numbers are computed for a random warmup of 3,000 architectures.

Some time ago I plotted the synflow metric vs. accuracy, and the numbers in the tables did not seem to fit the shape of the cloud. Now I have double-checked and recomputed the value for the whole set (using the provided nasbench101_correlations notebook). My result with synflow for the top-5%/top-64 is 4 for the whole search space of NAS-Bench-101 (compared to 12 given in the paper).

This is confirmed by the plots below:
[screenshot: synflow score vs. validation accuracy over the full search space]

Zoom:
[screenshot: zoomed view of the same plot]

My ultimate aim is to compare my zero-cost metric against your results. Even though I obtain higher overall and top-10% correlations, my top-5%/top-64 for NAS-Bench-101 is also very low. (I'd guess this is in the nature of the benchmark: perhaps a suboptimal set of training hyperparameters, or not enough epochs.)

I think I can run the multiple-warmup-size test you suggested sometime later (soon); a rough sketch of that sweep follows below.
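
A rough sketch of what that sweep might look like (reusing the same hypothetical `proxy_scores` and `val_accuracies` arrays as in the snippet above; the pool sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = len(val_accuracies)
# Global top-5% accuracy threshold over the whole search space.
threshold = np.quantile(val_accuracies, 0.95)

for pool_size in (1000, 3000, 10000, n):
    # Rank only a random warmup pool by the proxy and keep its best 64.
    pool = rng.choice(n, size=pool_size, replace=False)
    top64 = pool[np.argsort(proxy_scores[pool])[-64:]]
    hits = int(np.sum(val_accuracies[top64] >= threshold))
    print(f"pool={pool_size}: {hits}/64 in the global top 5%")
```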
