quick patch __code__ #1352

Merged
merged 14 commits into master from fix on Apr 3, 2020

Conversation

williamFalcon
Contributor

fixing pickle error from earlier PR
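
For context, a minimal, hypothetical sketch of the kind of failure being patched (not the PR's actual code): pickling breaks when an object holds a raw __code__ object.

import pickle

def objective(x):
    return x * 2

class Wrapper:
    """Hypothetical class that stashes a function's raw code object."""
    def __init__(self, fn):
        self.fn_code = fn.__code__  # code objects cannot be pickled

try:
    pickle.dumps(Wrapper(objective))
except TypeError as err:
    print(err)  # e.g. "cannot pickle code objects"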

@mergify mergify bot requested a review from a team April 2, 2020 22:42
@Borda
Member

Borda commented Apr 2, 2020

@bmartinn with the line above removed, we are getting a Trains issue:

Traceback (most recent call last):
  File "/home/software/miniconda3/envs/pl10/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/home/software/miniconda3/envs/pl10/lib/python3.7/site-packages/trains/backend_interface/logger.py", line 197, in run
    while not self._exit_event.wait(period or 1.0):
  File "/home/software/miniconda3/envs/pl10/lib/python3.7/site-packages/trains/logger.py", line 522, in flush
    return self._task.flush()
  File "/home/software/miniconda3/envs/pl10/lib/python3.7/site-packages/trains/task.py", line 751, in flush
    LoggerRoot.flush()
  File "/home/software/miniconda3/envs/pl10/lib/python3.7/site-packages/trains/debugging/log.py", line 92, in flush
    h.flush()
  File "/home/software/miniconda3/envs/pl10/lib/python3.7/logging/__init__.py", line 1009, in flush
    self.stream.flush()
ValueError: I/O operation on closed file.
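
The error itself just means a logging handler's stream was flushed after it had already been closed; a minimal stdlib-only illustration (nothing Trains-specific):

import logging

handler = logging.StreamHandler(open("tmp.log", "w"))
handler.stream.close()  # the underlying file gets closed elsewhere
handler.flush()         # ValueError: I/O operation on closed file.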

@Borda Borda added the bug Something isn't working label Apr 2, 2020
@Borda Borda added this to the 0.7.2 milestone Apr 2, 2020
@pep8speaks

pep8speaks commented Apr 3, 2020

Hello @williamFalcon! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-04-03 11:33:59 UTC

@bmartinn
Contributor

bmartinn commented Apr 3, 2020

@Borda is this related to the TrainsLogger? How do I reproduce this bug?

@Borda Borda added the priority: 0 High priority task label Apr 3, 2020
@williamFalcon williamFalcon merged commit 2eca8a9 into master Apr 3, 2020
@Borda Borda deleted the fix branch April 3, 2020 12:44
@bmartinn
Contributor

bmartinn commented Apr 3, 2020

@williamFalcon @Borda I just tested the merged master; it seems to pass test_trains.py.
Is the bug still there?

@Borda
Member

Borda commented Apr 3, 2020

According to CI, everything is passing now - http://35.192.60.23/PyTorchLightning/pytorch-lightning/1049

@bmartinn
Contributor

bmartinn commented Apr 3, 2020

@Borda Can I assume this is not relevant anymore?
Out of curiosity, what caused the issue in the first place, and what exactly was executed to trigger the trace above? I could not locate any execution path leading to it...

@williamFalcon
Contributor Author

williamFalcon commented Apr 3, 2020

@bmartinn

=================================== FAILURES ===================================
___________ [doctest] pytorch_lightning.loggers.trains.TrainsLogger ____________
059 
060     Examples:
061         >>> logger = TrainsLogger("lightning_log", "my-test", output_uri=".")  # doctest: +ELLIPSIS
062         TRAINS Task: ...
063         TRAINS results page: https://demoapp.trains.allegro.ai/.../log
064         >>> logger.log_metrics({"val_loss": 1.23}, step=0)
065         >>> logger.log_text("sample test")
066         sample test
067         >>> import numpy as np
068         >>> logger.log_artifact("confusion matrix", np.ones((2, 3)))
Expected nothing
Got:
    TRAINS Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring

Still very relevant. We get these sporadic test failures, which are caused by Trains.

@Borda maybe it's best to wait until Trains is stable and well tested? Otherwise we're going to spend way too much time blocked by these failures.

@bmartinn
Contributor

bmartinn commented Apr 3, 2020

@williamFalcon @Borda, for the CI tests you have to use TrainsLogger.set_bypass_mode(True); otherwise it needs a backend server to communicate with...
This is why we added it to the test_trains.py tests.

Should TrainsLogger.set_bypass_mode(True) be added to the docstring as well?
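
A minimal sketch of what that looks like in a test or CI script (assuming the TrainsLogger import path used in this repo; the real setup lives in test_trains.py):

from pytorch_lightning.loggers import TrainsLogger

# Bypass mode skips all communication with a Trains backend server,
# which is what headless CI runs need.
TrainsLogger.set_bypass_mode(True)

logger = TrainsLogger("lightning_log", "ci-test")
logger.log_metrics({"val_loss": 1.23}, step=0)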

@Borda
Member

Borda commented Apr 3, 2020

but bypass mode is not how it is supposed to be used, right? Meaning we do not want to have it in the example...

@bmartinn
Contributor

bmartinn commented Apr 3, 2020

@Borda True, but Trains does a lot of work in the background, mostly monitoring, and this will break CI because the output changes from run to run. Also, we cannot move everything to stderr, because that would break projects that use Trains and monitor stderr...

A few ideas on a solution:

  1. Add a flag (environment variable) controlling bypass mode (to be used in CI), or use an environment variable that CI already sets to switch to bypass_mode automatically.
  2. Use set_bypass_mode in the doctest and add a remark that it should be used for "offline/CI mode".
  3. Add a flag (environment variable) piping all Trains messages to stderr instead of stdout.

I think (1) makes the most sense; it should be a very quick fix.
What do you think?

EDIT:
Is there an environment variable automatically set in your CI process?

EDIT2:
I can push a PR with fix (1), automatically switching to bypass_mode if GITHUB_ACTIONS is set.
Suggested solution here
@Borda what do you think?
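
A rough sketch of idea (1), not the actual PR, just one way the automatic switch could look (GITHUB_ACTIONS is set to "true" on GitHub Actions runners):

import os

from pytorch_lightning.loggers import TrainsLogger

# When running under GitHub Actions, fall back to bypass mode so the
# logger never tries to reach a Trains backend server.
if os.environ.get("GITHUB_ACTIONS"):
    TrainsLogger.set_bypass_mode(True)

logger = TrainsLogger("lightning_log", "ci-test")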

@bmartinn
Contributor

bmartinn commented Apr 3, 2020

@Borda master can now pass all tests:
https://github.com/bmartinn/pytorch-lightning/actions/runs/70097526
Should I push a PR?
p.s.
My apologies, I did not realize tests were still failing on the master branch. I would have fixed it sooner had I realized this was not a single instance :(

alexeykarnachev pushed a commit to alexeykarnachev/pytorch-lightning that referenced this pull request Apr 4, 2020
* quick patch

* testing fix

* testing fix

* testing fix

* testing fix

* testing fix

* testing fix

* testing fix

* testing fix

* testing fix

* testing fix

* testing fix

* testing fix

* testing fix
@Borda Borda modified the milestones: v0.7., v0.7.x Apr 18, 2021