Sorting issues / DataTables warning: table id=runs - Ajax error. #70

Open
pinae opened this issue Jul 21, 2017 · 19 comments

Comments

@pinae

pinae commented Jul 21, 2017

Sacredboard shows this error if I try to edit the filters:

DataTables warning: table id=runs - Ajax error. For more information about this error, please see http://datatables.net/tn/7

This seems to be a problem with missing indexes in MongoDB (as far as I know). I originally got this error when starting Sacredboard, but after I created an index for the start and end dates it only shows up when I change the filters.

I'm using the default settings for a MongoDB installation on Ubuntu 17.04. Memory usage for sorting without an index seems to be limited to 32MB in this configuration.

If this is not a bug in Sacredboard, please add some documentation for the correct settings.

@chovanecm
Owner

Hi Johannes,
thanks for your report.
I've never encountered such a problem, so just to make sure:

  • Before you created the indices, the error was showing every time you ran sacredboard -m name_of_the_db?
  • After you created the indices, it only occurs when modifying the filter. What kind of filters are you trying to use? (Currently, filtering by date is unfortunately not yet supported.)
  • Have you tried a different web browser? Ubuntu 17.04 should ship up-to-date browsers with modern JavaScript capabilities, though, so a JavaScript version incompatibility shouldn't be the cause here.
  • Does sacredboard print any error message in the command line?

Thanks a lot.

@pinae
Author

pinae commented Jul 24, 2017

Hi,
before I created the indices the error occurred every time. But I did not test that systematically because I thought I had done something wrong during the installation.

I can reproduce the error every time I change the sorting by clicking on "Experiment name", "Command" or "Hostname". I remember having the error when deactivating some of the statuses on Friday but I could not reproduce that today.

I tested with Firefox 54.0 and Chromium 59.0.3071.109 on Ubuntu 17.04.

The JavaScript console shows only errors for missing files and a server error. Here are some screenshots:

[Screenshot from 2017-07-24 11-20-25]
[Screenshot from 2017-07-24 11-22-23]
[Screenshot from 2017-07-24 11-23-21]

@chovanecm
Owner

chovanecm commented Jul 24, 2017

Thanks to your observation, I discovered another minor issue, but it was probably not what caused your problem, which I was unable to reproduce.
Even if you upgrade to the latest Sacredboard version (0.3.1), I think the issue will persist.
Nevertheless, when you run sacredboard -m your_db, the program should produce some output to the console, and I'm pretty sure there is a stack trace describing the cause of the problem.
Could you please copy it for me?

Sorry for the inconvenience.

@black-puppydog

black-puppydog commented Jul 28, 2017

Might this be related to stdout/stderr logging? For me, this happened when I had only a handful of runs stored, but with long outputs.

I get something like this:

pymongo.errors.OperationFailure: database error: Runner error: Overflow sort stage buffered data usage of 33836427 bytes exceeds internal limit of 33554432 bytes

Which looks to be related to this
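
For reference, the limit quoted in that message is exactly 32 MiB, which matches the 32MB figure mentioned at the top of the thread. A quick check, using only the two numbers from the error message (nothing here talks to a database):

```python
# Both numbers are copied from the OperationFailure message above.
buffered_bytes = 33836427
limit_bytes = 33554432

MIB = 1024 * 1024

limit_mib = limit_bytes / MIB
overflow_bytes = buffered_bytes - limit_bytes

print("limit: {:.0f} MiB".format(limit_mib))
print("sort buffer exceeded the limit by {} bytes".format(overflow_bytes))
```

So even a handful of runs can overflow the buffer if each stored document is several megabytes large.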

@pinae
Author

pinae commented Aug 24, 2017

Sorry for my late reply. I get this error on the console:

[2017-08-24 15:49:10,529] ERROR in app: Exception on /api/run [GET]
Traceback (most recent call last):
  File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/sacredboard/app/webapi/routes.py", line 41, in api_runs
    return get_runs()
  File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/sacredboard/app/webapi/runs.py", line 53, in get_runs
    recordsFiltered=records_filtered),
  File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/flask/templating.py", line 134, in render_template
    context, ctx.app)
  File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/flask/templating.py", line 116, in _render 
    rv = template.render(context)
  File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/jinja2/environment.py", line 1008, in render
    return self.environment.handle_exception(exc_info, True)
  File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/jinja2/environment.py", line 780, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/jinja2/_compat.py", line 37, in reraise
    raise value.with_traceback(tb)
  File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/sacredboard/templates/api/runs.js", line 7, in top-level template code
    {%- for run in runs -%}
  File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/jinja2/runtime.py", line 410, in __init__  
    self._after = self._safe_next()
  File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/jinja2/runtime.py", line 430, in _safe_next
    return next(self._iterator)
  File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/pymongo/cursor.py", line 1132, in next
    if len(self.__data) or self._refresh():
  File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/pymongo/cursor.py", line 1055, in _refresh 
    self.__collation))
  File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/pymongo/cursor.py", line 947, in __send_message
    helpers._check_command_response(doc['data'][0])
  File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/pymongo/helpers.py", line 210, in _check_command_response
    raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: Executor error during find command: OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.

My outputs are also pretty long because my code displays progress bars.

@chovanecm
Owner

Thanks for posting the output! It really seems to be related to the issue that @black-puppydog posted.
This needs further analysis to see how to handle the problem: whether Sacredboard should automatically add indices on the table's columns up front, or only after the exception is thrown.
I'll have a look at it after I finish the feature I have been working on recently (deleting experiments). I'm sorry for the inconvenience until then.
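
The second option (adding an index only after the sort fails) could look roughly like the sketch below. This is not Sacredboard's code: `find_sorted` is a hypothetical helper, `collection` is assumed to behave like a pymongo `Collection`, and the exception class is passed in so the sketch stays importable without pymongo (in a real setup it would be `pymongo.errors.OperationFailure`).

```python
DESCENDING = -1  # same value as pymongo.DESCENDING


def find_sorted(collection, sort_field, operation_failure, direction=DESCENDING):
    """Return all documents sorted by sort_field, indexing on demand.

    Hypothetical helper: `collection` must offer find() and create_index()
    like a pymongo Collection; `operation_failure` is the exception class
    raised when the server refuses the in-memory sort.
    """
    sort_spec = [(sort_field, direction)]
    try:
        # The query only runs when the cursor is consumed by list().
        return list(collection.find().sort(sort_spec))
    except operation_failure:
        # The in-memory sort failed: build an index and try exactly once more.
        collection.create_index(sort_spec)
        return list(collection.find().sort(sort_spec))
```

Catching every `OperationFailure` is broader than necessary; a real implementation would probably inspect the error before deciding to create an index.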

@trickmeyer

trickmeyer commented Sep 21, 2017

@pinae I'm pretty sure it's a CORS error. Check out the datatables.net link they provide and read more about it for different solutions. A quick way to test whether this is the case is to open the web app from a different computer on the same network (swapping 127.0.0.1 for xxx.x.x.xx, whatever your local IP is).

@enricoschroeder

Hey there, I'm frequently stumbling upon this issue as well. I assume that it happens when the stored log output of the experiment is very long (e.g. training a model with Tensorflow for a couple of days).

Any updates on a potential fix?

@pinae
Author

pinae commented Oct 18, 2017

@schroederen Try adding some indices. I added some and it fixed the problem. I neglected to write down exactly what I did, and only realized after reporting the issue that it would have been useful.

@enricoschroeder

@pinae Thanks for the reply. For which keys did you create the indices? I.e., which parameters did you pass to db.collection.createIndex()?

@enricoschroeder

In the meantime, one can increase the limit of the sort buffer, as described here. I increased it to 50MB (from the default 32MB) and this fixes the issues I'm having. However, I expect to run out of buffer again eventually, so creating indices might be the more elegant solution.
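
The linked instructions are not preserved in this thread. On MongoDB 3.x the in-memory sort limit is governed by the server parameter `internalQueryExecMaxBlockingSortBytes`; assuming that is the setting that was changed, the command document could be built as in this sketch (`build_sort_limit_command` is a hypothetical helper name):

```python
from collections import OrderedDict


def build_sort_limit_command(megabytes):
    """Build the admin command that raises the in-memory sort limit.

    Assumption: the server is a MongoDB 3.x mongod, where the limit is the
    `internalQueryExecMaxBlockingSortBytes` parameter.
    """
    if megabytes <= 0:
        raise ValueError("megabytes must be positive")
    # The command name must be the first key, hence the ordered mapping.
    return OrderedDict([
        ("setParameter", 1),
        ("internalQueryExecMaxBlockingSortBytes", megabytes * 1024 * 1024),
    ])


command = build_sort_limit_command(50)
# With pymongo installed, this could be sent to a local server with:
#   from pymongo import MongoClient
#   MongoClient().admin.command(command)
```

A parameter set this way at runtime does not survive a mongod restart, so it is a stopgap rather than a fix.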

@thomwolf

I created an index for each column in the board and it fixed the problem for me:

  • experiment.name
  • command
  • start_time
  • heartbeat
  • host.hostname
  • result

To create an index see https://docs.mongodb.com/manual/indexes/

@enricoschroeder

I did this and it fixed the issue for a while, but it has returned. Additionally, I'm getting other weird issues now: some experiments no longer show up, and sorting by ID does not work correctly anymore. Maybe it wasn't such a good idea to create indices for all columns, or I didn't do it correctly? :D

@chovanecm Is there any "official" fix incoming? Would be greatly appreciated!

@anibali

anibali commented Feb 12, 2018

For those that don't know how to create an index (like I didn't), you can use createIndex in the Mongo CLI. So to add a heartbeat index I did the following:

> use sacred
switched to db sacred
> db.runs.createIndex({ "heartbeat": -1 });
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}

@chovanecm
Owner

I am considering letting Sacredboard automatically create indices for the displayed columns. I am just afraid of what happens if I implement #24 (adding custom columns).

chovanecm changed the title from "DataTables warning: table id=runs - Ajax error." to "Sorting issues / DataTables warning: table id=runs - Ajax error." on May 21, 2018

@enricoschroeder

I added indices for all possible entries, but I'm still getting this error on some columns. I've managed to get a trace from the console:

[2018-06-08 09:20:09,991] ERROR in app: Exception on /api/run [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.5/dist-packages/sacredboard/app/webapi/runs.py", line 16, in api_runs
    return get_runs()
  File "/usr/local/lib/python3.5/dist-packages/sacredboard/app/webapi/runs.py", line 94, in get_runs
    recordsFiltered=records_filtered),
  File "/usr/local/lib/python3.5/dist-packages/flask/templating.py", line 135, in render_template
    context, ctx.app)
  File "/usr/local/lib/python3.5/dist-packages/flask/templating.py", line 117, in _render
    rv = template.render(context)
  File "/usr/local/lib/python3.5/dist-packages/jinja2/environment.py", line 1008, in render
    return self.environment.handle_exception(exc_info, True)
  File "/usr/local/lib/python3.5/dist-packages/jinja2/environment.py", line 780, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.5/dist-packages/jinja2/_compat.py", line 37, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.5/dist-packages/sacredboard/templates/api/runs.js", line 13, in top-level template code
    "is_alive": {{run.heartbeat | default | timediff | detect_alive_experiment | tojson }},
  File "/usr/local/lib/python3.5/dist-packages/sacredboard/app/config/jinja_filters.py", line 28, in timediff
    diff = now - time
TypeError: unsupported operand type(s) for -: 'datetime.datetime' and 'NoneType'

This looks like a different error, but it happens when trying to sort entries by some of the columns.
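
The TypeError suggests that the `timediff` filter receives `None` for runs without a recorded heartbeat. A None-tolerant variant could look like the sketch below; `safe_timediff` is a hypothetical name, not the filter in `jinja_filters.py`, and it assumes heartbeats are stored as naive UTC datetimes.

```python
from datetime import datetime, timedelta, timezone


def safe_timediff(time, now=None):
    """Return now - time, or None when no heartbeat was recorded.

    Hypothetical replacement for a filter that computes `now - time`;
    assumes `time` is a naive UTC datetime or None.
    """
    if time is None:
        # No heartbeat stored for this run, so there is nothing to subtract.
        return None
    if now is None:
        # Naive UTC "now", comparable with naive UTC timestamps.
        now = datetime.now(timezone.utc).replace(tzinfo=None)
    return now - time


heartbeat = datetime(2018, 6, 8, 9, 0, 0)
diff = safe_timediff(heartbeat, now=datetime(2018, 6, 8, 9, 20, 0))
print(diff)                 # a timedelta
print(safe_timediff(None))  # None instead of a TypeError
```

Any filter further down the template pipeline would then also need to accept `None`.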

@JarnoRFB

@anibali I had the same problem. Adding the index you described immediately resolved the problem. So probably letting sacredboard do this automatically is a good idea.

@Leinadj

Leinadj commented Jun 30, 2018

Same here, adding the indices worked like magic!
So just to make it easier for copy pasting:

  1. Open mongo shell (if added to PATH, just type: mongo into the shell)
  2. Switch to your database: use <databasename>
  3. Issue the following commands to create the indices:
    db.runs.createIndex({ "result": -1 });
    db.runs.createIndex({ "experiment.name": -1 });
    db.runs.createIndex({ "command": -1 });
    db.runs.createIndex({ "host.hostname": -1 });
    db.runs.createIndex({ "start_time": -1 });
    db.runs.createIndex({ "heartbeat": -1 });
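
The same indices can also be created from Python. The sketch below assumes a pymongo-style collection object; `create_sort_indexes` is a hypothetical helper, and the database name "sacred" in the usage comment is taken from the earlier shell example, so adjust it to your setup.

```python
DESCENDING = -1  # same value as pymongo.DESCENDING

# Field names taken from the shell commands above.
SORT_FIELDS = [
    "result",
    "experiment.name",
    "command",
    "host.hostname",
    "start_time",
    "heartbeat",
]


def create_sort_indexes(collection, fields=SORT_FIELDS):
    """Create one descending single-field index per sortable column.

    Returns whatever create_index() returns for each field
    (the index name when `collection` is a pymongo Collection).
    """
    return [collection.create_index([(field, DESCENDING)]) for field in fields]


# Usage against a real server (assumes pymongo is installed):
#   from pymongo import MongoClient
#   create_sort_indexes(MongoClient()["sacred"]["runs"])
```

Creating an index that already exists with the same definition is a no-op, so the helper can be run repeatedly.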

@SumNeuron

SumNeuron commented Mar 1, 2019

Hi I am getting this issue when using the FileObserver...

me:~/Projects/sacred_test$ sacredboard -F experiments/tests/
[2019-03-01 18:57:28,825] ERROR in app: Exception on /api/run [GET]
Traceback (most recent call last):
  File "/anaconda3/lib/python3.6/site-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/anaconda3/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/anaconda3/lib/python3.6/site-packages/sacredboard/app/webapi/runs.py", line 16, in api_runs
    return get_runs()
  File "/anaconda3/lib/python3.6/site-packages/sacredboard/app/webapi/runs.py", line 94, in get_runs
    recordsFiltered=records_filtered),
  File "/anaconda3/lib/python3.6/site-packages/flask/templating.py", line 135, in render_template
    context, ctx.app)
  File "/anaconda3/lib/python3.6/site-packages/flask/templating.py", line 117, in _render
    rv = template.render(context)
  File "/anaconda3/lib/python3.6/site-packages/jinja2/asyncsupport.py", line 76, in render
    return original_render(self, *args, **kwargs)
  File "/anaconda3/lib/python3.6/site-packages/jinja2/environment.py", line 1008, in render
    return self.environment.handle_exception(exc_info, True)
  File "/anaconda3/lib/python3.6/site-packages/jinja2/environment.py", line 780, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/anaconda3/lib/python3.6/site-packages/jinja2/_compat.py", line 37, in reraise
    raise value.with_traceback(tb)
  File "/anaconda3/lib/python3.6/site-packages/sacredboard/templates/api/runs.js", line 7, in top-level template code
    {%- for run in runs -%}
  File "/anaconda3/lib/python3.6/site-packages/jinja2/runtime.py", line 435, in __init__
    self._after = self._safe_next()
  File "/anaconda3/lib/python3.6/site-packages/jinja2/runtime.py", line 455, in _safe_next
    return next(self._iterator)
  File "/anaconda3/lib/python3.6/site-packages/sacredboard/app/data/filestorage/rundao.py", line 41, in run_iterator
    yield self.get(id)
  File "/anaconda3/lib/python3.6/site-packages/sacredboard/app/data/filestorage/rundao.py", line 60, in get
    run = _read_json(_path_to_run(self.directory, run_id))
  File "/anaconda3/lib/python3.6/site-packages/sacredboard/app/data/filestorage/rundao.py", line 101, in _read_json
    return json.load(f)
  File "/anaconda3/lib/python3.6/json/__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/anaconda3/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/anaconda3/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/anaconda3/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

SacredBoard shows:

DataTables warning: table id=runs - Ajax error. For more information about this error, please see http://datatables.net/tn/7
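
"Expecting value: line 1 column 1 (char 0)" is what `json.load` reports for an empty or non-JSON file, so one of the run files in the directory is probably empty (for example, from a run that was interrupted while writing). A small sketch for locating such files follows; both helper names are hypothetical, and the default file name `run.json` is an assumption about the file storage layout.

```python
import json
from pathlib import Path


def is_readable_json(path):
    """Return True if the file at `path` parses as JSON, else False."""
    try:
        with open(str(path), encoding="utf-8") as f:
            json.load(f)
        return True
    except json.JSONDecodeError:
        # Raised for empty files as well as malformed content.
        return False


def find_unreadable_runs(directory, filename="run.json"):
    """List files named `filename` below `directory` that fail to parse."""
    return sorted(
        p for p in Path(directory).rglob(filename) if not is_readable_json(p)
    )


# Example: print(find_unreadable_runs("experiments/tests/"))
```

Moving the reported run directories out of the observed folder should show whether they are what triggers the Ajax error.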
