When running experiment code, the UI displayed a server error (500) with no details about the cause. After digging through the logs, I found that the fileserver had crashed because a file name was too long.
I would love a way for the UI to clearly display all kinds of errors encountered by the experiment (including, but not limited to, file names being too long).
The solution I would like
I would prefer a message in the UI saying that the file name is too long, instead of having to look for the issue in the logs.
Additional context
Logs from /opt/trains/logs/fileserver.log:
[2020-07-12 14:22:04,977] [7] [ERROR] [fileserver] Exception on / [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/site-packages/flask_cors/extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "fileserver.py", line 32, in upload
    file.save(str(target))
  File "/usr/local/lib/python3.6/site-packages/werkzeug/datastructures.py", line 3066, in save
    dst = open(dst, "wb")
OSError: [Errno 36] File name too long: '/mnt/fileserver/trains/14-trains.bd72a5a2afdhsy2aa0acc3dca21b9b5f/metrics/Evaluator CV_no_my_real_name_no_my_real_name_no_my_real_name_len_512_Jul12_14-20-42_merge__no_my_real_nameh_sz_8__no_my_real_name_sz_8_lr_1e-06_w_decay_0.0_warm_up_50_in_sz_768_hid_sz_256_word_aug_p_0.0_no_my_real_name_1__no_my_real_name/_no_my_real_name _no_my_real_name_no_my_real_namele layer__no_my_real_namelen_512_Jul12_14-20-42__no_my_real_name_batch_sz_8_test_batch_sz_8_lr_1e-06_w_d_no_my_real_name0_in_sz_768_hid__no_my_real_name_0.0_word_aug_min_1_imba_no_my_real_name__no_my_real_name_00000000.jpeg'
The original bug (#49) is actually a bug in Trains (even though it manifests in the trains-server).
The bug is that Trains tries to create links that the file storage might not support (there is a file name length limit; for example, S3 object storage has its own limits, as do shared filesystems). A fix will be deployed in the next RC (due in a few days).
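To illustrate the length limit mentioned above, here is a minimal sketch of how a client could shorten an over-long path component before uploading. It is not the actual Trains fix. The names `shorten_component` and `shorten_path` are hypothetical, and the 255-byte limit is an assumption that holds for many filesystems but not all.

```python
import hashlib

# Assumed per-component limit in bytes; common on Linux filesystems, not universal.
MAX_COMPONENT_BYTES = 255


def shorten_component(name: str, limit: int = MAX_COMPONENT_BYTES) -> str:
    """Return `name` unchanged if it fits, else a truncated name plus a short hash."""
    raw = name.encode("utf-8")
    if len(raw) <= limit:
        return name
    # Keep a short extension so the file type stays recognisable.
    stem, dot, ext = name.rpartition(".")
    if not dot or len(ext) > 10:
        stem, ext = name, ""
    # The hash of the full name keeps distinct long names from colliding.
    suffix = "_" + hashlib.sha1(raw).hexdigest()[:8] + ("." + ext if ext else "")
    budget = limit - len(suffix.encode("utf-8"))
    if budget < 0:
        raise ValueError("limit is too small to hold the hash suffix")
    # Truncate on a byte budget, dropping any partial multi-byte character.
    short_stem = stem.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    return short_stem + suffix


def shorten_path(path: str, limit: int = MAX_COMPONENT_BYTES) -> str:
    """Apply shorten_component to every '/'-separated part of `path`."""
    return "/".join(shorten_component(part, limit) for part in path.split("/"))
```

Short names pass through untouched, so only the offending components (such as the long metric and file names in the log above) would change.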
But regardless of the original bug, are you suggesting a per-Task section capturing the stderr, for easier readability?
Or are you saying it would be nice to get the trains-server log in the UI?
Hi @bmartinn, thanks for commenting.
I'm not familiar enough with trains to define what I mean as well as you defined the situation in your comment.
Put simply: as a user of the UI, when an error occurs, I want to see the exact reason for the failure rather than a generic "500" message.
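As a rough illustration of the request above, a server could translate a storage `OSError` into a status code and a readable message instead of letting it surface as a bare 500. This is only a sketch under stated assumptions: `describe_upload_error`, the `_UPLOAD_ERRORS` table, and the chosen status codes are hypothetical and are not part of trains-server.

```python
import errno
import os

# Hypothetical mapping from errno values to (HTTP status, user-facing message).
_UPLOAD_ERRORS = {
    errno.ENAMETOOLONG: (400, "File name too long for the server's file storage"),
    errno.ENOSPC: (507, "No space left on the server's file storage"),
    errno.EACCES: (500, "The server is not permitted to write to its file storage"),
}


def describe_upload_error(exc: OSError) -> tuple:
    """Return (status, body) describing a failed upload in terms a UI can show."""
    status, message = _UPLOAD_ERRORS.get(
        exc.errno, (500, "Unexpected storage error")
    )
    # Fall back to the exception text when no errno is attached.
    detail = os.strerror(exc.errno) if exc.errno else str(exc)
    return status, {"error": message, "detail": detail}
```

In a Flask application, a function like this could be called from a handler registered with `app.errorhandler(OSError)`, so that the UI receives the message rather than a generic failure.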