
quirks that hold the model back #11

Open
murpen opened this issue Jun 15, 2019 · 4 comments

Comments

@murpen commented Jun 15, 2019

In "Addendum: Evaluation of My Model" you mention:

> Although I used the same amount of hardware (or more), the differences in my training setup and hyperparameters made a significant difference, which is an unfortunate reality to anyone familiar with reproducing deep learning papers. I don’t think my model in its current state is even as dangerous as 117M in its text generating abilities. But I believe I have found the quirks in my setup that have held the model back, and they are easy to fix.

Are you willing to elaborate on this and describe or fix the quirks? I think it would be really interesting, informative, and useful for students of deep learning as a case study showing how small, non-obvious changes can make a big difference. Please consider doing so :) Thank you.

@ConnorJL (Owner)

In fact, I'm currently investigating these quirks! I'll talk about this more if my hunches are confirmed, but it might take a while.

@ConnorJL reopened this Jun 16, 2019
@Lerbytech

Any updates regarding the quirks? I'm really interested in this topic.

@ConnorJL (Owner)

Unfortunately, not much of interest to report so far. I've tried several tweaks, to no avail. I'll continue experimenting for a while before I compile my results.

@ConnorJL (Owner)

One of the main suspects for my model's worse performance is weight initialization. I just pushed some new code that allows for different kinds of weight initialization and should bring it closer to the original work (though we can't know for sure, since there is no public code showing how the original's weights were initialized).
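For anyone who wants to experiment with this themselves, here is a minimal sketch of what switching between initialization schemes can look like in TF 1.x. This is an illustration only, not the repo's actual code; the `get_initializer` helper and the scheme names are hypothetical:

```python
import tensorflow as tf  # TF 1.x style API

def get_initializer(scheme="normal", stddev=0.02):
    """Return a variable initializer for the chosen scheme (hypothetical helper)."""
    if scheme == "normal":
        # Small-stddev normal init, as described for GPT-2-style models.
        return tf.random_normal_initializer(stddev=stddev)
    if scheme == "glorot":
        # Xavier/Glorot uniform, a common framework default.
        return tf.glorot_uniform_initializer()
    if scheme == "scaled":
        # Variance scaling (He-style, fan-in).
        return tf.variance_scaling_initializer(scale=2.0, mode="fan_in")
    raise ValueError("unknown init scheme: %s" % scheme)

# Example: a projection matrix whose init scheme is configurable.
w = tf.get_variable("w", shape=[768, 768],
                    initializer=get_initializer("normal", stddev=0.02))
```

Even seemingly minor differences like these change the variance of activations at the start of training, and that effect can compound across many layers.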
