
Fix to save the state of FTRL models #1912

Merged
merged 16 commits into VowpalWabbit:master on Jun 5, 2019

Conversation

bremen79 (Contributor) commented Jun 5, 2019

I have made a fix to save the state of FTRL models. Currently the state is not saved at all, because saving goes through the gd save function, which only saves the state if it finds a gd structure, and that structure is absent in FTRL models.

I have added an integer to the vw structure that indicates the size of the FTRL model state to save, and changed the read/write code in gd accordingly.

Note that saving still does not work for Pistol and Coin Betting: the state is saved correctly, but the test fails. I am not sure why; I suspect it is the loss of precision caused by saving, combined with the exponential nature of these algorithms. On the other hand, saving the state of FTRL Proximal now works perfectly.
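For readers unfamiliar with the reductions involved, here is a minimal, self-contained sketch of the idea behind the fix. It is illustrative only: ftrl_entry, save_state, and load_state are hypothetical names, not VW's actual save_load interface, and the sketch hardcodes the state layout that the PR instead conveys via an integer size. The point is that resuming FTRL training requires persisting the per-feature accumulators in addition to the weights.

```cpp
// Hypothetical sketch (not VW code): a resumable FTRL save must persist the
// per-feature accumulators, not just the weights that a plain-gd save writes out.
#include <fstream>
#include <vector>

// Illustrative per-feature FTRL-Proximal state: the weight plus two accumulators.
struct ftrl_entry
{
  float w = 0.f;  // current weight
  float z = 0.f;  // accumulated adjusted gradient
  float n = 0.f;  // accumulated squared gradient
};

void save_state(const std::vector<ftrl_entry>& model, const char* path)
{
  std::ofstream out(path, std::ios::binary);
  for (const auto& e : model)
  {
    out.write(reinterpret_cast<const char*>(&e.w), sizeof e.w);
    out.write(reinterpret_cast<const char*>(&e.z), sizeof e.z);  // lost if only weights are saved
    out.write(reinterpret_cast<const char*>(&e.n), sizeof e.n);  // lost if only weights are saved
  }
}

void load_state(std::vector<ftrl_entry>& model, const char* path)
{
  std::ifstream in(path, std::ios::binary);
  for (auto& e : model)
  {
    in.read(reinterpret_cast<char*>(&e.w), sizeof e.w);
    in.read(reinterpret_cast<char*>(&e.z), sizeof e.z);
    in.read(reinterpret_cast<char*>(&e.n), sizeof e.n);
  }
}
```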

@@ -153,8 +153,9 @@ def do_test(filename, args, verbose=None, repeat_args=None, known_failure=False)
errors += do_test(filename, '--loss_function logistic --link logistic')
errors += do_test(filename, '--nn 2')
errors += do_test(filename, '--binary')
errors += do_test(filename, '--ftrl', known_failure=True)
errors += do_test(filename, '--ftrl')
errors += do_test(filename, '--pistol', known_failure=True)
Member:

Are pistol and coin still in known failure?

bremen79 (Contributor Author):

Yes. As I wrote above, the data structure is saved correctly, but the weights end up different depending on whether the model is trained continuously or trained, saved, and resumed. The difference does not appear immediately, only after some samples, which is why I suspect numerical issues.
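A toy C++ illustration of this hypothesis (not VW code; the value and the update factor are made up) shows how a tiny representation error introduced at save time stays invisible at first and then grows under repeated multiplicative updates:

```cpp
// Toy example of the precision hypothesis: round-tripping optimizer state through a
// lower-precision representation leaves a tiny error that multiplicative updates amplify.
#include <cstdio>

int main()
{
  double continuous = 0.1234567890123456;           // state kept in memory the whole time
  double resumed = static_cast<float>(continuous);  // simulate a save/resume through 32-bit floats

  // Apply the same multiplicative update many times to both copies.
  for (int i = 0; i < 1000; ++i)
  {
    continuous *= 1.01;
    resumed *= 1.01;
  }
  std::printf("continuous: %.12f\nresumed:    %.12f\ndelta:      %.3e\n",
              continuous, resumed, continuous - resumed);
  return 0;
}
```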

Member:

This violates my understanding of computers, so I suspect there is something we're missing. I probably won't be able to debug it before the next release, but it should get done...

bremen79 (Contributor Author):

I am trying to debug this issue a bit more.

@@ -565,6 +565,7 @@ struct vw
bool adaptive; // Should I use adaptive individual learning rates?
bool normalized_updates; // Should every feature be normalized
bool invariant_updates; // Should we use importance aware/safe updates
uint32_t ftrl_size;  // Size of the ftrl model state to save/load
Member:

Could we pass this as an argument to save_load instead? That seems more elegant than sticking it in the global data structure.

bremen79 (Contributor Author):

Done.
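For context, a hedged sketch of the two designs being discussed; all names below are illustrative placeholders, not VW's actual types or functions:

```cpp
// Option A (original patch): a reduction-specific size field lives in the global
// structure, so everything that touches the global sees FTRL-specific state.
#include <cstdint>
#include <vector>

struct global_data_a
{
  std::vector<float> weights;
  uint32_t ftrl_size = 0;
};

// Option B (what the review asked for): the global structure stays generic, and the
// reduction that knows its own state layout passes the size to save/load directly.
struct global_data_b
{
  std::vector<float> weights;
};

void save_load_state(global_data_b& all, bool read, uint32_t per_feature_size)
{
  // ... serialize `per_feature_size` values per feature from/to the model file ...
  (void)all; (void)read; (void)per_feature_size;
}
```

Passing the size as an argument keeps reduction-specific knowledge inside the reduction that owns it, which is the elegance point raised above.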

Member:

Oops, I merged. Should I revert, or do you want to do a separate pull request?

bremen79 (Contributor Author):

Found the bug. Not sure what the best way to fix it is; see mail.

JohnLangford (Member) commented Jun 5, 2019

The test 67 failure is fixed in the current patch that I'm working on.

@JohnLangford JohnLangford merged commit 538e9bb into VowpalWabbit:master Jun 5, 2019
JohnLangford added a commit that referenced this pull request Jun 5, 2019
JohnLangford added a commit that referenced this pull request Jun 6, 2019
yarikoptic added a commit to yarikoptic/vowpal_wabbit that referenced this pull request Jun 9, 2019
* tag '8.7.0': (354 commits)
  Update version to 8.7.0 (VowpalWabbit#1926)
  Fix misconfiguration (VowpalWabbit#1925)
  Ataymano ataymano/warnings fixes (VowpalWabbit#1924)
  Update new version script (VowpalWabbit#1922)
  Run clang-format over codebase (VowpalWabbit#1921)
  change semantics of lambda (VowpalWabbit#1920)
  Bremen79 fix save ftrl (VowpalWabbit#1919)
  fix for daemon race condition (VowpalWabbit#1918)
  Revert "Fix to save the state of FTRL models (VowpalWabbit#1912)" (VowpalWabbit#1916)
  fix static library build (VowpalWabbit#1913)
  more warnings (VowpalWabbit#1915)
  Fix to save the state of FTRL models (VowpalWabbit#1912)
  remove warnings (VowpalWabbit#1911)
  fix closing invalid file descriptor with memory_io_buf (VowpalWabbit#1910)
  Optional exception (VowpalWabbit#1906)
  Contextual Memory Tree (VowpalWabbit#1799)
  Coin betting (VowpalWabbit#1903)
  Ataymano/c wrapper fix2 (VowpalWabbit#1859)
  Use Appveyor MSBuildLogger (VowpalWabbit#1904)
  fix for no label confidence (VowpalWabbit#1901)
  ...
yarikoptic added a commit to yarikoptic/vowpal_wabbit that referenced this pull request Jun 9, 2019
yarikoptic added a commit to yarikoptic/vowpal_wabbit that referenced this pull request Jun 9, 2019