Crashes after a few minutes with 100 entries of csv data. #17

Closed
sirisian opened this issue Sep 16, 2021 · 1 comment

@sirisian

My data has 9 input variables and one result, so each sample in my CSV file looks like:

x1,x2,x3,x4,x5,x6,x7,x8,x9,y

Using a simple config:

{
  "task" : {
    "task_type" : "regression",
    "dataset" : "data.csv",
    "function_set" : ["add", "sub", "mul", "div", "neg", "n2", "n3"]
  }
}

This is running on an RTX 3090. It sits on "Running DSO for 1 seeds" for a while, then memory usage spikes to over 24 GB and it crashes.

python -m dso.run config.json
Running DSO for 1 seeds
-- BUILDING PRIOR -------------------
WARNING: Skipping invalid 'RelationalConstraint' with arguments {'targets': [], 'effectors': [], 'relationship': None}. Reason: Prior disabled.
WARNING: Skipping invalid 'RepeatConstraint' with arguments {'tokens': 'const', 'min_': None, 'max_': 3}. Reason: Uses Tokens not in the Library.
WARNING: Skipping invalid 'TrigConstraint' with arguments {}. Reason: There are no target Tokens.
WARNING: Skipping invalid 'ConstConstraint' with arguments {}. Reason: Uses Tokens not in the Library.
WARNING: Skipping invalid 'NoInputsConstraint' with arguments {}. Reason: All terminal tokens are input variables, so allsequences will have an input variable.
WARNING: Skipping invalid 'LanguageModelPrior' with arguments {'weight': None}. Reason: Prior disabled.
LengthConstraint: Sequences have minimum length 4.
LengthConstraint: Sequences have maximum length 30.
RelationalConstraint: [neg] cannot be a child of [neg].
UniformArityPrior: Activated.
SoftLengthPrior: No description available.
-------------------------------------
2021-09-16 02:49:06.352759: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
    return fn(*args)
  File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(1000, 57), b.shape=(57, 128), m=1000, n=128, k=57
         [[{{node controller/policy/rnn/while/LinearWrapper/LinearWrapper/multi_rnn_cell/cell_0/lstm_cell/MatMul}}]]
         [[controller/policy/rnn/while/Exit_6/_39]]
  (1) Internal: Blas GEMM launch failed : a.shape=(1000, 57), b.shape=(57, 128), m=1000, n=128, k=57
         [[{{node controller/policy/rnn/while/LinearWrapper/LinearWrapper/multi_rnn_cell/cell_0/lstm_cell/MatMul}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "...\deep-symbolic-optimization\dso\dso\run.py", line 124, in <module>
    main()
  File "C:\Python37\lib\site-packages\click\core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "C:\Python37\lib\site-packages\click\core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "C:\Python37\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Python37\lib\site-packages\click\core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "...\deep-symbolic-optimization\dso\dso\run.py", line 109, in main
    result, summary_path = train_dso(config)
  File "...\deep-symbolic-optimization\dso\dso\run.py", line 31, in train_dso
    result = model.train()
  File "...\deep-symbolic-optimization\dso\dso\core.py", line 90, in train
    **self.config_training))
  File "...\deep-symbolic-optimization\dso\dso\train.py", line 259, in learn
    actions, obs, priors = controller.sample(batch_size)
  File "...\deep-symbolic-optimization\dso\dso\controller.py", line 626, in sample
    actions, obs, priors = self.sess.run([self.actions, self.obs, self.priors], feed_dict=feed_dict)
  File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 950, in run
    run_metadata_ptr)
  File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _do_run
    run_metadata)
  File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(1000, 57), b.shape=(57, 128), m=1000, n=128, k=57
         [[node controller/policy/rnn/while/LinearWrapper/LinearWrapper/multi_rnn_cell/cell_0/lstm_cell/MatMul (defined at ...\deep-symbolic-optimization\dso\dso\controller.py:25) ]]
         [[controller/policy/rnn/while/Exit_6/_39]]
  (1) Internal: Blas GEMM launch failed : a.shape=(1000, 57), b.shape=(57, 128), m=1000, n=128, k=57
         [[node controller/policy/rnn/while/LinearWrapper/LinearWrapper/multi_rnn_cell/cell_0/lstm_cell/MatMul (defined at ...\deep-symbolic-optimization\dso\dso\controller.py:25) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'controller/policy/rnn/while/LinearWrapper/LinearWrapper/multi_rnn_cell/cell_0/lstm_cell/MatMul':
  File "C:\Python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "...\deep-symbolic-optimization\dso\dso\run.py", line 124, in <module>
    main()
  File "C:\Python37\lib\site-packages\click\core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "C:\Python37\lib\site-packages\click\core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "C:\Python37\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Python37\lib\site-packages\click\core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "...\deep-symbolic-optimization\dso\dso\run.py", line 109, in main
    result, summary_path = train_dso(config)
  File "...\deep-symbolic-optimization\dso\dso\run.py", line 31, in train_dso
    result = model.train()
  File "...\deep-symbolic-optimization\dso\dso\core.py", line 82, in train
    self.setup()
  File "...\deep-symbolic-optimization\dso\dso\core.py", line 62, in setup
    self.controller = self.make_controller()
  File "...\deep-symbolic-optimization\dso\dso\core.py", line 134, in make_controller
    **self.config_controller)
  File "...\deep-symbolic-optimization\dso\dso\controller.py", line 438, in __init__
    _, _, loop_state = tf.nn.raw_rnn(cell=cell, loop_fn=loop_fn)
  File "C:\Python37\lib\site-packages\tensorflow\python\ops\rnn.py", line 1252, in raw_rnn
    swap_memory=swap_memory)
  File "C:\Python37\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3501, in while_loop
    return_same_structure)
  File "C:\Python37\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3012, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "C:\Python37\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2937, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "C:\Python37\lib\site-packages\tensorflow\python\ops\rnn.py", line 1201, in body
    (next_output, cell_state) = cell(current_input, state)
  File "...\deep-symbolic-optimization\dso\dso\controller.py", line 25, in __call__
    outputs, state = self.cell(inputs, state, scope=scope)
  File "C:\Python37\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 248, in __call__
    return super(RNNCell, self).__call__(inputs, state)
  File "C:\Python37\lib\site-packages\tensorflow\python\layers\base.py", line 537, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "C:\Python37\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 634, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "C:\Python37\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 146, in wrapper
    ), args, kwargs)
  File "C:\Python37\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 446, in converted_call
    return _call_unconverted(f, args, kwargs)
  File "C:\Python37\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 253, in _call_unconverted
    return f(*args, **kwargs)
  File "C:\Python37\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 1719, in call
    cur_inp, new_state = cell(cur_inp, cur_state)
  File "C:\Python37\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 385, in __call__
    self, inputs, state, scope=scope, *args, **kwargs)
  File "C:\Python37\lib\site-packages\tensorflow\python\layers\base.py", line 537, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "C:\Python37\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 634, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "C:\Python37\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 146, in wrapper
    ), args, kwargs)
  File "C:\Python37\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 446, in converted_call
    return _call_unconverted(f, args, kwargs)
  File "C:\Python37\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 253, in _call_unconverted
    return f(*args, **kwargs)
  File "C:\Python37\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 1027, in call
    array_ops.concat([inputs, m_prev], 1), self._kernel)
  File "C:\Python37\lib\site-packages\tensorflow\python\util\dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "C:\Python37\lib\site-packages\tensorflow\python\ops\math_ops.py", line 2647, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "C:\Python37\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 6295, in mat_mul
    name=name)
  File "C:\Python37\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "C:\Python37\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "C:\Python37\lib\site-packages\tensorflow\python\framework\ops.py", line 3616, in create_op
    op_def=op_def)
  File "C:\Python37\lib\site-packages\tensorflow\python\framework\ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

The data for reference:

10,10,10,13.33749,10.16646,19.20067,11.72338,20,11,10
11,10,10,13.33749,10.16646,19.20067,11.72338,20,11,10.08056
12,10,10,13.33749,10.16646,19.20067,11.72338,20,11,10.20316
13,10,10,13.33749,10.16646,19.20067,11.72338,20,11,10.35009
14,10,10,13.33749,10.16646,19.20067,11.72338,20,11,10.51071
15,10,10,13.33749,10.16646,19.20067,11.72338,20,11,10.67702
16,10,10,13.33749,10.16646,19.20067,11.72338,20,11,10.84145
17,10,10,13.33749,10.16646,19.20067,11.72338,20,11,10.99478
18,10,10,13.33749,10.16646,19.20067,11.72338,20,11,11.12228
19,10,10,13.33749,10.16646,19.20067,11.72338,20,11,11.19015
20,10,10,13.33749,10.16646,19.20067,11.72338,20,11,11
10,10,10,13.34984,10.3317,18.97691,12.83873,20,12,10
11,10,10,13.34984,10.3317,18.97691,12.83873,20,12,10.14675
12,10,10,13.34984,10.3317,18.97691,12.83873,20,12,10.3607
13,10,10,13.34984,10.3317,18.97691,12.83873,20,12,10.61526
14,10,10,13.34984,10.3317,18.97691,12.83873,20,12,10.89402
15,10,10,13.34984,10.3317,18.97691,12.83873,20,12,11.18464
16,10,10,13.34984,10.3317,18.97691,12.83873,20,12,11.47558
17,10,10,13.34984,10.3317,18.97691,12.83873,20,12,11.75313
18,10,10,13.34984,10.3317,18.97691,12.83873,20,12,11.9959
19,10,10,13.34984,10.3317,18.97691,12.83873,20,12,12.15667
20,10,10,13.34984,10.3317,18.97691,12.83873,20,12,12
10,10,10,13.37002,10.49461,18.83674,13.8655,20,13,10
11,10,10,13.37002,10.49461,18.83674,13.8655,20,13,10.20966
12,10,10,13.37002,10.49461,18.83674,13.8655,20,13,10.50974
13,10,10,13.37002,10.49461,18.83674,13.8655,20,13,10.86614
14,10,10,13.37002,10.49461,18.83674,13.8655,20,13,11.25744
15,10,10,13.37002,10.49461,18.83674,13.8655,20,13,11.66739
16,10,10,13.37002,10.49461,18.83674,13.8655,20,13,12.08092
17,10,10,13.37002,10.49461,18.83674,13.8655,20,13,12.4804
18,10,10,13.37002,10.49461,18.83674,13.8655,20,13,12.8389
19,10,10,13.37002,10.49461,18.83674,13.8655,20,13,13.09863
20,10,10,13.37002,10.49461,18.83674,13.8655,20,13,13
10,10,10,13.39751,10.6543,18.73781,14.85455,20,14,10
11,10,10,13.39751,10.6543,18.73781,14.85455,20,14,10.26982
12,10,10,13.39751,10.6543,18.73781,14.85455,20,14,10.65275
13,10,10,13.39751,10.6543,18.73781,14.85455,20,14,11.10799
14,10,10,13.39751,10.6543,18.73781,14.85455,20,14,11.60937
15,10,10,13.39751,10.6543,18.73781,14.85455,20,14,12.13692
16,10,10,13.39751,10.6543,18.73781,14.85455,20,14,12.67214
17,10,10,13.39751,10.6543,18.73781,14.85455,20,14,13.19363
18,10,10,13.39751,10.6543,18.73781,14.85455,20,14,13.66927
19,10,10,13.39751,10.6543,18.73781,14.85455,20,14,14.03163
20,10,10,13.39751,10.6543,18.73781,14.85455,20,14,14
10,10,10,13.43162,10.8101,18.66418,15.82558,20,15,10
11,10,10,13.43162,10.8101,18.66418,15.82558,20,15,10.32723
12,10,10,13.43162,10.8101,18.66418,15.82558,20,15,10.79024
13,10,10,13.43162,10.8101,18.66418,15.82558,20,15,11.34215
14,10,10,13.43162,10.8101,18.66418,15.82558,20,15,11.95223
15,10,10,13.43162,10.8101,18.66418,15.82558,20,15,12.59681
16,10,10,13.43162,10.8101,18.66418,15.82558,20,15,13.25395
17,10,10,13.43162,10.8101,18.66418,15.82558,20,15,13.89849
18,10,10,13.43162,10.8101,18.66418,15.82558,20,15,14.49307
19,10,10,13.43162,10.8101,18.66418,15.82558,20,15,14.96088
20,10,10,13.43162,10.8101,18.66418,15.82558,20,15,15
10,10,10,13.47162,10.96158,18.60772,16.7883,20,16,10
11,10,10,13.47162,10.96158,18.60772,16.7883,20,16,10.38178
12,10,10,13.47162,10.96158,18.60772,16.7883,20,16,10.92224
13,10,10,13.47162,10.96158,18.60772,16.7883,20,16,11.56893
14,10,10,13.47162,10.96158,18.60772,16.7883,20,16,12.28676
15,10,10,13.47162,10.96158,18.60772,16.7883,20,16,13.04832
16,10,10,13.47162,10.96158,18.60772,16.7883,20,16,13.82817
17,10,10,13.47162,10.96158,18.60772,16.7883,20,16,14.59724
18,10,10,13.47162,10.96158,18.60772,16.7883,20,16,15.31285
19,10,10,13.47162,10.96158,18.60772,16.7883,20,16,15.88854
20,10,10,13.47162,10.96158,18.60772,16.7883,20,16,16
10,10,10,13.51676,11.10856,18.56355,17.7479,20,17,10
11,10,10,13.51676,11.10856,18.56355,17.7479,20,17,10.4334
12,10,10,13.51676,11.10856,18.56355,17.7479,20,17,11.04867
13,10,10,13.51676,11.10856,18.56355,17.7479,20,17,11.78835
14,10,10,13.51676,11.10856,18.56355,17.7479,20,17,12.61312
15,10,10,13.51676,11.10856,18.56355,17.7479,20,17,13.49183
16,10,10,13.51676,11.10856,18.56355,17.7479,20,17,14.39545
17,10,10,13.51676,11.10856,18.56355,17.7479,20,17,15.29083
18,10,10,13.51676,11.10856,18.56355,17.7479,20,17,16.12971
19,10,10,13.51676,11.10856,18.56355,17.7479,20,17,16.81559
20,10,10,13.51676,11.10856,18.56355,17.7479,20,17,17
10,10,10,13.5663,11.25099,18.52847,18.70725,20,18,10
11,10,10,13.5663,11.25099,18.52847,18.70725,20,18,10.4821
12,10,10,13.5663,11.25099,18.52847,18.70725,20,18,11.1695
13,10,10,13.5663,11.25099,18.52847,18.70725,20,18,12.00034
14,10,10,13.5663,11.25099,18.52847,18.70725,20,18,12.93125
15,10,10,13.5663,11.25099,18.52847,18.70725,20,18,13.92736
16,10,10,13.5663,11.25099,18.52847,18.70725,20,18,14.95593
17,10,10,13.5663,11.25099,18.52847,18.70725,20,18,15.97958
18,10,10,13.5663,11.25099,18.52847,18.70725,20,18,16.94411
19,10,10,13.5663,11.25099,18.52847,18.70725,20,18,17.74245
20,10,10,13.5663,11.25099,18.52847,18.70725,20,18,18
10,10,10,13.61956,11.38895,18.50028,19.66792,20,19,10
11,10,10,13.61956,11.38895,18.50028,19.66792,20,19,10.52792
12,10,10,13.61956,11.38895,18.50028,19.66792,20,19,11.28475
13,10,10,13.61956,11.38895,18.50028,19.66792,20,19,12.20486
14,10,10,13.61956,11.38895,18.50028,19.66792,20,19,13.24107
15,10,10,13.61956,11.38895,18.50028,19.66792,20,19,14.35482
16,10,10,13.61956,11.38895,18.50028,19.66792,20,19,15.50958
17,10,10,13.61956,11.38895,18.50028,19.66792,20,19,16.6635
18,10,10,13.61956,11.38895,18.50028,19.66792,20,19,17.75617
19,10,10,13.61956,11.38895,18.50028,19.66792,20,19,18.66927
20,10,10,13.61956,11.38895,18.50028,19.66792,20,19,19
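A quick sanity check of the samples can rule out a malformed file before suspecting the GPU. The following is a minimal sketch, not part of DSO: `summarize_samples` is a hypothetical helper that assumes header-less numeric rows with 9 inputs followed by the result, and it reports how many rows were parsed and which input columns never change.

```python
import csv
import io


def summarize_samples(text, n_inputs=9):
    """Parse comma-separated numeric rows; return (row count, constant input columns)."""
    rows = [[float(v) for v in row] for row in csv.reader(io.StringIO(text)) if row]
    for i, row in enumerate(rows, start=1):
        # Each row must hold n_inputs variables plus one result value.
        if len(row) != n_inputs + 1:
            raise ValueError(f"row {i} has {len(row)} values, expected {n_inputs + 1}")
    # An input column is constant when it takes exactly one distinct value.
    constant = [
        f"x{j + 1}"
        for j in range(n_inputs)
        if len({row[j] for row in rows}) == 1
    ]
    return len(rows), constant


# Three rows copied from the data above, used here only as a small example.
sample = """\
10,10,10,13.33749,10.16646,19.20067,11.72338,20,11,10
11,10,10,13.33749,10.16646,19.20067,11.72338,20,11,10.08056
10,10,10,13.34984,10.3317,18.97691,12.83873,20,12,10
"""

print(summarize_samples(sample))
```

On these three rows the helper flags x2, x3, and x8 as constant. To check the whole file, pass the contents of data.csv instead of `sample`.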

Earlier I had it running on my CPU with a lot more data (it was extremely slow, so I stopped it), so I assume there's a bug somewhere, though it's possible I'm doing something wrong. Googling that error leads me to answers like: https://stackoverflow.com/a/52132342/254381
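One way to confirm whether the crash is GPU-specific is to repeat the CPU run deliberately by hiding the GPU from CUDA. This is a sketch under assumptions: `cpu_only_env` and `dso_command` are hypothetical helper names, and the only external behavior relied on is the standard `CUDA_VISIBLE_DEVICES` environment variable, which CUDA applications read at startup.

```python
import os
import sys


def cpu_only_env(base_env=None):
    """Return a copy of the environment in which no CUDA device is visible."""
    env = dict(os.environ if base_env is None else base_env)
    # "-1" matches no device index, so CUDA libraries see zero GPUs.
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env


def dso_command(config_path="config.json"):
    """Build the same command line used above: python -m dso.run config.json."""
    return [sys.executable, "-m", "dso.run", config_path]


print(dso_command())
print(cpu_only_env({"EXAMPLE": "1"}))
```

Passing both to `subprocess.run(dso_command(), env=cpu_only_env())` would launch the run on the CPU only. If that run gets past the first sampling step, the failure is likely in the GPU stack rather than in the config or the data.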

Maybe you're familiar with this already. I'll investigate more tomorrow.

Also, for future reference, since I can't find an FAQ: does this project work with large datasets? I have some problems I want to try it on where my data ranges from 330 MB to upwards of 15 TB. I assume that is outside the scope of this project?

@sirisian
Author

I should have looked into this more first. The RTX 3090 isn't compatible with CUDA 10.0 and thus can't run TensorFlow 1.14 projects. Can you update this to work with CUDA 11 or the latest version? Switching to tensorflow-gpu 1.15 would be required. I'll close this and open a new issue.
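The incompatibility comes down to a version comparison between the CUDA toolkit a TensorFlow build uses and the minimum toolkit that knows the GPU's architecture. The sketch below illustrates that check. The lookup table is an assumption written from memory of NVIDIA's release notes (compute capability 8.6, used by the RTX 3090, first supported in CUDA 11.1), and the function names are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version string such as "10.0" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


# Assumption: minimum CUDA toolkit version per GPU compute capability.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 5): (10, 0),  # Turing, e.g. RTX 20 series
    (8, 0): (11, 0),  # Ampere, A100
    (8, 6): (11, 1),  # Ampere, RTX 30 series including the RTX 3090
}


def cuda_supports_gpu(cuda_version, capability):
    """Return True when the toolkit version is new enough for the capability."""
    required = MIN_CUDA_FOR_CAPABILITY[capability]
    return parse_version(cuda_version) >= required


# CUDA 10.0 is the toolkit named above for TensorFlow 1.14.
print(cuda_supports_gpu("10.0", (8, 6)))
print(cuda_supports_gpu("11.1", (8, 6)))
```

With this table, CUDA 10.0 fails the check for an RTX 3090 while CUDA 11.1 passes, which matches the diagnosis above.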
