
quirks that hold the model back #11

Open
murpen opened this issue Jun 15, 2019 · 4 comments

Comments

@murpen commented Jun 15, 2019

In "Addendum: Evaluation of My Model" you mention:

> Although I used the same amount of hardware (or more), the differences in my training setup and hyperparameters made a significant difference, which is an unfortunate reality to anyone familiar with reproducing deep learning papers. I don’t think my model in its current state is even as dangerous as 117M in its text generating abilities. But I believe I have found the quirks in my setup that have held the model back, and they are easy to fix.

Are you willing to elaborate on this and describe or fix the quirks? I think it would be really interesting, informative, and useful for students of deep learning as a case study showing how small, non-obvious changes can make a big difference. Please consider doing so :) Thank you.

@ConnorJL (Owner)

In fact, I'm currently investigating these quirks! I'll talk about this more if my hunches are confirmed, but it might take a while.

@ConnorJL reopened this Jun 16, 2019
@Lerbytech

Any updates regarding the quirks? I'm really interested in this topic.

@ConnorJL (Owner)

Unfortunately, not much of interest to report so far. I've tried several tweaks, to no avail. I'll continue experimenting for a while before I compile my results.

@ConnorJL (Owner)

One of the main suspects for my model's worse performance is weight initialization. I just pushed some new code that allows for different kinds of weight initialization and should bring it closer to the original work (though we can't know for sure, since there is no public code showing how the original's weights were initialized).
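For anyone who wants to experiment with this themselves, here is a minimal sketch of what switching between initialization schemes can look like in TF 1.x. This is an illustration only, not the repo's actual code; the `get_initializer` helper and the scheme names are hypothetical:

```python
import tensorflow as tf  # TF 1.x style API

def get_initializer(scheme="normal", stddev=0.02):
    """Return a variable initializer for the chosen scheme (hypothetical helper)."""
    if scheme == "normal":
        # Small-stddev normal init, as described for GPT-2-style models.
        return tf.random_normal_initializer(stddev=stddev)
    if scheme == "glorot":
        # Xavier/Glorot uniform, a common framework default.
        return tf.glorot_uniform_initializer()
    if scheme == "scaled":
        # Variance scaling (He-style, fan-in).
        return tf.variance_scaling_initializer(scale=2.0, mode="fan_in")
    raise ValueError("unknown init scheme: %s" % scheme)

# Example: a projection matrix whose init scheme is configurable.
w = tf.get_variable("w", shape=[768, 768],
                    initializer=get_initializer("normal", stddev=0.02))
```

Even seemingly minor differences like these change the variance of activations at the start of training, and that effect can compound across many layers.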
