quick patch __code__ #1352

Merged
merged 14 commits into master from fix on Apr 3, 2020

Conversation

williamFalcon
Contributor

fixing pickle error from earlier PR
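
For context, a minimal, hypothetical sketch of the kind of failure being patched (not the PR's actual code): pickling breaks when an object holds a raw __code__ object.

import pickle

def objective(x):
    return x * 2

class Wrapper:
    """Hypothetical class that stashes a function's raw code object."""
    def __init__(self, fn):
        self.fn_code = fn.__code__  # code objects cannot be pickled

try:
    pickle.dumps(Wrapper(objective))
except TypeError as err:
    print(err)  # e.g. "cannot pickle code objects"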

@mergify mergify bot requested a review from a team April 2, 2020 22:42
@Borda
Member

Borda commented Apr 2, 2020

@bmartinn with the line above removed, we are getting a Trains issue:

Traceback (most recent call last):
  File "/home/software/miniconda3/envs/pl10/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/home/software/miniconda3/envs/pl10/lib/python3.7/site-packages/trains/backend_interface/logger.py", line 197, in run
    while not self._exit_event.wait(period or 1.0):
  File "/home/software/miniconda3/envs/pl10/lib/python3.7/site-packages/trains/logger.py", line 522, in flush
    return self._task.flush()
  File "/home/software/miniconda3/envs/pl10/lib/python3.7/site-packages/trains/task.py", line 751, in flush
    LoggerRoot.flush()
  File "/home/software/miniconda3/envs/pl10/lib/python3.7/site-packages/trains/debugging/log.py", line 92, in flush
    h.flush()
  File "/home/software/miniconda3/envs/pl10/lib/python3.7/logging/__init__.py", line 1009, in flush
    self.stream.flush()
ValueError: I/O operation on closed file.
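
The error itself just means a logging handler's stream was flushed after it had already been closed; a minimal stdlib-only illustration (nothing Trains-specific):

import logging

handler = logging.StreamHandler(open("tmp.log", "w"))
handler.stream.close()  # the underlying file gets closed elsewhere
handler.flush()         # ValueError: I/O operation on closed file.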

@Borda Borda added the bug Something isn't working label Apr 2, 2020
@Borda Borda added this to the 0.7.2 milestone Apr 2, 2020
@pep8speaks

pep8speaks commented Apr 3, 2020

Hello @williamFalcon! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-04-03 11:33:59 UTC

@bmartinn
Contributor

bmartinn commented Apr 3, 2020

@Borda is this related to the TrainsLogger? How do I reproduce this bug?

@Borda Borda added the priority: 0 High priority task label Apr 3, 2020
@williamFalcon williamFalcon merged commit 2eca8a9 into master Apr 3, 2020
@Borda Borda deleted the fix branch April 3, 2020 12:44
@bmartinn
Contributor

bmartinn commented Apr 3, 2020

@williamFalcon @Borda I just tested the merged master; it seems to pass test_trains.py.
Is the bug still there?

@Borda
Member

Borda commented Apr 3, 2020

According to CI, everything is passing now - http://35.192.60.23/PyTorchLightning/pytorch-lightning/1049

@bmartinn
Contributor

bmartinn commented Apr 3, 2020

@Borda Can I assume this is not relevant anymore?
Out of curiosity, what caused the issue in the first place, and what exactly was executed to trigger the trace above? I could not locate any execution path leading to it...

@williamFalcon
Contributor Author

williamFalcon commented Apr 3, 2020

@bmartinn

=================================== FAILURES ===================================
___________ [doctest] pytorch_lightning.loggers.trains.TrainsLogger ____________
059 
060     Examples:
061         >>> logger = TrainsLogger("lightning_log", "my-test", output_uri=".")  # doctest: +ELLIPSIS
062         TRAINS Task: ...
063         TRAINS results page: https://demoapp.trains.allegro.ai/.../log
064         >>> logger.log_metrics({"val_loss": 1.23}, step=0)
065         >>> logger.log_text("sample test")
066         sample test
067         >>> import numpy as np
068         >>> logger.log_artifact("confusion matrix", np.ones((2, 3)))
Expected nothing
Got:
    TRAINS Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring

Still very relevant. We get these sporadic test failures, which are caused by Trains.

@Borda maybe it's best to wait until Trains is stable and well tested? Otherwise we're going to spend way too much time blocked by these failures.

@bmartinn
Contributor

bmartinn commented Apr 3, 2020

@williamFalcon @Borda, for the CI tests you have to use TrainsLogger.set_bypass_mode(True); otherwise it needs a backend server to communicate with...
This is why we added it to the test_trains.py tests.

Should TrainsLogger.set_bypass_mode(True) be added to the docstring as well?
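
A minimal sketch of what that looks like in a test or CI script (assuming the TrainsLogger import path used in this repo; the real setup lives in test_trains.py):

from pytorch_lightning.loggers import TrainsLogger

# Bypass mode skips all communication with a Trains backend server,
# which is what headless CI runs need.
TrainsLogger.set_bypass_mode(True)

logger = TrainsLogger("lightning_log", "ci-test")
logger.log_metrics({"val_loss": 1.23}, step=0)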

@Borda
Member

Borda commented Apr 3, 2020

but bypass mode is not how it is supposed to be used, right? Meaning we do not want to have it in the example...

@bmartinn
Contributor

bmartinn commented Apr 3, 2020

@Borda True, but Trains does a lot of work in the background, mostly monitoring, and this will break CI because the output changes from run to run. Also, we cannot move everything to stderr, because that would break projects that use Trains and monitor stderr...

A few ideas on a solution:

  1. Add a flag (environment variable) controlling bypass mode (to be used in CI), or use an environment variable that CI already sets to switch to bypass_mode automatically.
  2. Use set_bypass_mode in the doctest and add a remark that it should be used for "offline/CI mode".
  3. Add a flag (environment variable) piping all Trains messages to stderr instead of stdout.

I think (1) makes the most sense; it should be a very quick fix.
What do you think?

EDIT:
Is there an environment variable automatically set in your CI process?

EDIT2:
I can push a PR with fix (1), automatically switching to bypass_mode if GITHUB_ACTIONS is set.
Suggested solution here
@Borda what do you think?
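
A rough sketch of idea (1), not the actual PR, just one way the automatic switch could look (GITHUB_ACTIONS is set to "true" on GitHub Actions runners):

import os

from pytorch_lightning.loggers import TrainsLogger

# When running under GitHub Actions, fall back to bypass mode so the
# logger never tries to reach a Trains backend server.
if os.environ.get("GITHUB_ACTIONS"):
    TrainsLogger.set_bypass_mode(True)

logger = TrainsLogger("lightning_log", "ci-test")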

@bmartinn
Contributor

bmartinn commented Apr 3, 2020

@Borda master can now pass all tests:
https://github.com/bmartinn/pytorch-lightning/actions/runs/70097526
Should I push a PR?
p.s.
My apologies, I did not realize tests were still failing on the master branch. I would have fixed it sooner had I realized this was not a single instance :(

alexeykarnachev pushed a commit to alexeykarnachev/pytorch-lightning that referenced this pull request Apr 4, 2020
* quick patch

* testing fix

* testing fix

* testing fix

* testing fix

* testing fix

* testing fix

* testing fix

* testing fix

* testing fix

* testing fix

* testing fix

* testing fix

* testing fix
@Borda Borda modified the milestones: v0.7., v0.7.x Apr 18, 2021