Feature request: Display errors encountered by the experiment in the UI #54

Open
ranrubin opened this issue Jul 21, 2020 · 2 comments

@ranrubin

**Feature request**

While running experiment code, the UI displayed a server error (500) with no details about the cause. After digging through the logs, I found that the fileserver had crashed because a file name was too long.

I would love for the UI to clearly display any error encountered by the experiment (including, but not limited to, file names being too long).

The solution I would like

I would rather get a message in the UI saying that the file name is too long than have to hunt for the issue in the logs.

Additional context

Logs from /opt/trains/logs/fileserver.log

[2020-07-12 14:22:04,977] [7] [ERROR] [fileserver] Exception on / [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/site-packages/flask_cors/extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "fileserver.py", line 32, in upload
    file.save(str(target))
  File "/usr/local/lib/python3.6/site-packages/werkzeug/datastructures.py", line 3066, in save
    dst = open(dst, "wb")
OSError: [Errno 36] File name too long: '/mnt/fileserver/trains/14-trains.bd72a5a2afdhsy2aa0acc3dca21b9b5f/metrics/Evaluator CV_no_my_real_name_no_my_real_name_no_my_real_name_len_512_Jul12_14-20-42_merge__no_my_real_nameh_sz_8__no_my_real_name_sz_8_lr_1e-06_w_decay_0.0_warm_up_50_in_sz_768_hid_sz_256_word_aug_p_0.0_no_my_real_name_1__no_my_real_name/_no_my_real_name _no_my_real_name_no_my_real_namele layer__no_my_real_namelen_512_Jul12_14-20-42__no_my_real_name_batch_sz_8_test_batch_sz_8_lr_1e-06_w_d_no_my_real_name0_in_sz_768_hid__no_my_real_name_0.0_word_aug_min_1_imba_no_my_real_name__no_my_real_name_00000000.jpeg'
@bmartinn
Member

Hi @ranrubin

The original bug (#49) is actually a bug in Trains, even though it manifests in the trains-server.
The bug is that Trains will try to create links that the file storage might not support (there is a filename length limit; e.g. S3 object storage has its own limits, and so do shared filesystems). A fix will be deployed in the next RC (due in a few days).
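
To illustrate the length limit described above, here is a minimal sketch (not the actual Trains fix) of one way a client could shorten an over-long file name before uploading. The 255-byte limit is an assumption (a common per-component limit on Linux filesystems; object stores have their own limits), and `shorten_file_name` and `MAX_NAME_BYTES` are hypothetical names.

```python
import hashlib

# Assumption: 255 bytes per path component, common on Linux filesystems.
MAX_NAME_BYTES = 255


def shorten_file_name(name: str, max_bytes: int = MAX_NAME_BYTES) -> str:
    """Return `name` unchanged if it fits, otherwise a truncated name.

    The truncated name keeps a short extension (if any) and appends a hash
    of the full original name, so different long names stay distinct.
    """
    if len(name.encode("utf-8")) <= max_bytes:
        return name

    stem, dot, ext = name.rpartition(".")
    if not dot or len(ext) > 10:
        # No extension, or the "extension" is implausibly long: treat the
        # whole name as the stem.
        stem, suffix = name, ""
    else:
        suffix = "." + ext

    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()[:8]
    tail = "_" + digest + suffix
    budget = max_bytes - len(tail.encode("utf-8"))
    if budget < 0:
        raise ValueError("max_bytes is too small to hold the hash suffix")

    # Cut on a byte boundary and drop any partial multi-byte character.
    head = stem.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    return head + tail
```

For example, `shorten_file_name("a" * 300 + ".jpeg")` yields a 255-byte name that still ends in `.jpeg`, while names that already fit are returned as-is.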

But regardless of the original bug, are you suggesting a per-Task section that captures stderr, for easier readability?
Or are you saying it would be nice to see the trains-server log in the UI?

@ranrubin
Author

Hi @bmartinn, thanks for commenting.
I'm not familiar enough with Trains to define what I mean as precisely as you described the situation in your comment.
Put simply: as a user of the UI, when an error occurs, I want to see the exact reason for the failure rather than a generic "500" message.
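
As a rough illustration of what I mean, here is a minimal sketch of how a server could turn a storage error into a descriptive response instead of a bare 500. This is not taken from the trains-server code: `describe_os_error`, the chosen status codes, and the payload keys are all hypothetical, and wiring it into the fileserver (for example through a Flask error handler) would be a separate step.

```python
import errno


def describe_os_error(exc: OSError):
    """Map an OSError to (http_status, payload) with a readable reason.

    Hypothetical helper: the status codes and payload keys are assumptions
    made for illustration only.
    """
    if exc.errno == errno.ENAMETOOLONG:
        return 400, {
            "error": "file_name_too_long",
            "detail": "The target file name is too long for the server's storage.",
        }
    if exc.errno == errno.ENOSPC:
        return 507, {
            "error": "storage_full",
            "detail": "The server has no space left to store the file.",
        }
    # Fall back to the operating system's own message for anything else.
    return 500, {
        "error": "storage_error",
        "detail": exc.strerror or str(exc),
    }
```

With something like this, the upload failure from the log above would reach the UI as a 400 with a "file name too long" explanation rather than a generic "500".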
