quirks that hold the model back #11
I'm currently investigating these quirks in fact! I'll talk about this more if my hunches are confirmed, but it might take a while.
Any updates regarding quirks? Really interested in this topic.
Unfortunately, not much interesting to report so far. I've tried several tweaks, to no avail. I'll continue experimenting for a while before I compile my results.
One of the main suspects for my model's worse performance is weight initialization. I just pushed some new code that supports several initialization schemes and should bring the setup closer to the original work (though we can't know for sure, since the original's initialization code was never made public).
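For reference, switching between initialization schemes can be sketched roughly like this. This is a minimal NumPy illustration, not the repository's actual code; `init_weights` and the scheme names are hypothetical:

```python
import numpy as np

def init_weights(shape, scheme="xavier", rng=None):
    """Return a weight matrix of `shape` initialized per `scheme` (illustrative)."""
    rng = rng or np.random.default_rng(0)
    fan_in, fan_out = shape[0], shape[1]
    if scheme == "xavier":
        # Glorot/Xavier uniform: bound scales with fan-in + fan-out.
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return rng.uniform(-limit, limit, size=shape)
    if scheme == "he":
        # He/Kaiming normal: std scales with fan-in (common for ReLU nets).
        return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=shape)
    if scheme == "normal":
        # Plain Gaussian with a fixed small std.
        return rng.normal(0.0, 0.02, size=shape)
    raise ValueError(f"unknown scheme: {scheme}")

w = init_weights((256, 128), scheme="he")
print(w.shape)
```

Swapping the `scheme` argument lets one compare runs under identical conditions otherwise, which is the kind of controlled experiment needed to confirm whether initialization is really the culprit.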
In Addendum: Evaluation of My Model you mention:
Are you willing to elaborate on this, and describe or fix the quirks? I think it would be really interesting/informative/useful for students of deep learning as a case study, showing how small non-obvious changes can make a big difference. Please consider doing so :) Thank you.