
ResourceExhaustedError after several iterations in a grid search #482

Closed · 4 tasks done
bjtho08 opened this issue Apr 19, 2020 · 33 comments

Labels: priority: MEDIUM (medium priority), topic: tensorflow (relates with tensorflow backend), value: ⭐⭐⭐ (high value)

Comments
@bjtho08 commented Apr 19, 2020

First off, make sure to check your support options.

The preferred way to resolve usage-related matters is through the docs, which are kept up to date with the latest version of Talos.

If you do end up asking for support in a new issue, make sure to follow the below steps carefully.

1) Confirm the below

  • I have looked for an answer in the Docs
  • My Python version is 3.5 or higher
  • I have searched through the issues for a duplicate
  • I've tested that my Keras model works as a stand-alone

2) Include the output of:

talos.__version__ == 0.6.7

3) Explain clearly what you are trying to achieve

I am running a grid search that comes out to 36 rounds.
After about 4 or 5 rounds, I suddenly get hit by a ResourceExhaustedError during model.fit. I find this very odd given that I am able to complete at least 3 rounds of fitting on the GPU (with a model and batch size that take up pretty much all of the GPU memory), so it seems there is a small but significant memory leak somewhere. Any ideas what it could be?
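
One way to narrow this down would be to watch GPU memory between rounds. A minimal sketch (the callback name is illustrative, not part of Talos or my code) that logs usage via nvidia-smi after every epoch:

import subprocess
from keras.callbacks import Callback  # or tensorflow.keras.callbacks, depending on the setup

class GPUMemoryLogger(Callback):
    # Logs GPU memory after every epoch, to see whether the baseline
    # usage creeps up from one Talos round to the next.
    def on_epoch_end(self, epoch, logs=None):
        used = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"]
        ).decode().strip()
        print("epoch {}: GPU memory used = {} MiB".format(epoch, used))

Passing an instance of this in the callbacks list given to fit would show whether memory accumulates across rounds.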

@bjtho08 (Author) commented Apr 19, 2020

My parameter dictionary is:

p = {
    "sigma_noise": [0, 0.01],
    "nb_filters_0": [16, 32, 64],
    "loss_func": ["cat_CE", "tversky_loss", "cat_FL"],
    "arch": ["U-Net"],
    "act": [Swish, ReLU],
}

And I'm running a U-Net with 34 million trainable parameters (for nb_filters_0 == 64), input dimensions of (208, 208, 3), a batch size of 12, and 400 epochs.
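
(For reference, the dictionary above gives 2 × 3 × 3 × 1 × 2 = 36 permutations, which matches the 36 rounds mentioned earlier.)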

@bjtho08 (Author) commented Apr 20, 2020

UPDATE: I did a "quick" test where I ran each model for only 50 epochs, and I got a ResourceExhaustedError again in round 4, during the 5th epoch. I think that was actually the exact same spot as before, when each of the 3 previous models had run for 100+ epochs. This tells me that the models are not properly cleaned out of GPU memory, and on top of that I might have a memory leak in my generator. @mikkokotila, what do you think?

@mikkokotila (Contributor) commented:

Very interesting. Can you post your full trace?

@bjtho08 (Author) commented Apr 21, 2020

Of course! See below. I also added the output leading up to it, because I think it gives some idea of how the exception occurs.

[screenshot "oom-crash": console output leading up to the exception]

---------------------------------------------------------------------------
ResourceExhaustedError                    Traceback (most recent call last)
<ipython-input-10-d1427f7c3b24> in <module>
    104     params=p,
    105     experiment_name="talos/" + date_string,
--> 106     reduction_method='gamify',
    107 )
    108 

~/.pyenv/versions/miniconda3-4.3.30/envs/tf_gpu/lib/python3.6/site-packages/talos/scan/Scan.py in __init__(self, x, y, params, model, experiment_name, x_val, y_val, val_split, random_method, seed, performance_target, fraction_limit, round_limit, time_limit, boolean_limit, reduction_method, reduction_interval, reduction_window, reduction_threshold, reduction_metric, minimize_loss, disable_progress_bar, print_params, clear_session, save_weights)
    194         # start runtime
    195         from .scan_run import scan_run
--> 196         scan_run(self)

~/.pyenv/versions/miniconda3-4.3.30/envs/tf_gpu/lib/python3.6/site-packages/talos/scan/scan_run.py in scan_run(self)
     24         # otherwise proceed with next permutation
     25         from .scan_round import scan_round
---> 26         self = scan_round(self)
     27         self.pbar.update(1)
     28 

~/.pyenv/versions/miniconda3-4.3.30/envs/tf_gpu/lib/python3.6/site-packages/talos/scan/scan_round.py in scan_round(self)
     17     # fit the model
     18     from ..model.ingest_model import ingest_model
---> 19     self.model_history, self.round_model = ingest_model(self)
     20     self.round_history.append(self.model_history.history)
     21 

~/.pyenv/versions/miniconda3-4.3.30/envs/tf_gpu/lib/python3.6/site-packages/talos/model/ingest_model.py in ingest_model(self)
      8                       self.x_val,
      9                       self.y_val,
---> 10                       self.round_params)

~/myapps/mmciad/src/mmciad/utils/hyper.py in talos_model(x, y, val_x, val_y, talos_params)
    301                 class_weight=class_weights,
    302                 verbose=internal_params["verbose"],
--> 303                 callbacks=model_callbacks + opti_callbacks,
    304             )
    305         return history, model

~/.pyenv/versions/miniconda3-4.3.30/envs/tf_gpu/lib/python3.6/site-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name + '` call to the ' +
     90                               'Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

~/.pyenv/versions/miniconda3-4.3.30/envs/tf_gpu/lib/python3.6/site-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1730             use_multiprocessing=use_multiprocessing,
   1731             shuffle=shuffle,
-> 1732             initial_epoch=initial_epoch)
   1733 
   1734     @interfaces.legacy_generator_methods_support

~/.pyenv/versions/miniconda3-4.3.30/envs/tf_gpu/lib/python3.6/site-packages/keras/engine/training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    218                                             sample_weight=sample_weight,
    219                                             class_weight=class_weight,
--> 220                                             reset_metrics=False)
    221 
    222                 outs = to_list(outs)

~/.pyenv/versions/miniconda3-4.3.30/envs/tf_gpu/lib/python3.6/site-packages/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight, reset_metrics)
   1512             ins = x + y + sample_weights
   1513         self._make_train_function()
-> 1514         outputs = self.train_function(ins)
   1515 
   1516         if reset_metrics:

~/.pyenv/versions/miniconda3-4.3.30/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/keras/backend.py in __call__(self, inputs)
   3290 
   3291     fetched = self._callable_fn(*array_vals,
-> 3292                                 run_metadata=self.run_metadata)
   3293     self._call_fetch_callbacks(fetched[-len(self._fetches):])
   3294     output_structure = nest.pack_sequence_as(

~/.pyenv/versions/miniconda3-4.3.30/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py in __call__(self, *args, **kwargs)
   1456         ret = tf_session.TF_SessionRunCallable(self._session._session,
   1457                                                self._handle, args,
-> 1458                                                run_metadata_ptr)
   1459         if run_metadata:
   1460           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

ResourceExhaustedError: OOM when allocating tensor with shape[16,192,208,208] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node training/Adam/gradients/block1_u_conv1/convolution_grad/Conv2DBackpropInput}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
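
For scale, the failed allocation itself is modest (a rough back-of-the-envelope calculation, assuming 4-byte float32):

# tensor shape [16, 192, 208, 208], float32 (4 bytes per element)
16 * 192 * 208 * 208 * 4 / 2**30  # ≈ 0.5 GiB

So this looks less like one oversized tensor and more like the last allocation on an already nearly full 11 GB card, consistent with memory accumulating across rounds.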

@mikkokotila (Contributor) commented:

Have you looked at this SO post?

@mikkokotila (Contributor) commented:

How much memory does your GPU have?

@bjtho08 (Author) commented Apr 22, 2020

I just checked out your link, and it does not appear to describe the issue I am having, though at first glance it did look similar. I am running an Nvidia GeForce GTX 1080 Ti with 11 GB of RAM.

@mikkokotila (Contributor) commented:

Can you do this:

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

...and share the output you get.
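
For what it's worth, here is a sketch of how that hint is usually applied with multi-backend Keras on the TensorFlow backend (extra keyword arguments to compile() are forwarded to tf.Session.run, so the exact wiring may differ in your setup; the optimizer and loss below are placeholders):

import tensorflow as tf

# Ask TF to report the allocated tensors when an OOM occurs.
run_opts = tf.RunOptions(report_tensor_allocations_upon_oom=True)

model.compile(
    optimizer="adam",                  # placeholder
    loss="categorical_crossentropy",   # placeholder
    options=run_opts,
)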

@bjtho08 (Author) commented Apr 23, 2020

I would love to, but that option crashes my Python kernel, so it's not really possible. This is a long-standing Keras bug, I believe.

@mikkokotila (Contributor) commented:

Yes, it most certainly is an upstream bug in Keras or TensorFlow.

To avoid doubt, can you share your Scan() command?

Also, how about giving Talos 1.0 a shot? It uses a different backend, so you might have better luck.

@bjtho08 (Author) commented Apr 23, 2020

Sure! I use custom keras.utils.Sequence data generators, so I have two dummy variables in my Scan() command, as shown below:

dummy_x = np.empty((1, BATCH_SIZE, 208, 208))
dummy_y = np.empty((1, BATCH_SIZE))

scan_object = ta.Scan(
    x=dummy_x,
    y=dummy_y,
    disable_progress_bar=False,
    print_params=True,
    model=talos_model,
    params=p,
    experiment_name="talos/" + date_string,
    reduction_method='gamify',
)

I will take a look at talos 1.0 right away!
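
One knob visible in the Scan() signature from the traceback above is clear_session (along with save_weights). A variant worth trying, as a sketch rather than a confirmed fix:

scan_object = ta.Scan(
    x=dummy_x,
    y=dummy_y,
    model=talos_model,
    params=p,
    experiment_name="talos/" + date_string,
    reduction_method='gamify',
    clear_session=True,    # clear the Keras session between rounds
    save_weights=False,    # do not keep every round's weights on the Scan object
)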

@bjtho08 (Author) commented Apr 23, 2020

So running Talos 1.0 had the same outcome, but with a slightly different error message at the end:

 14% |█▌        | 5/36 [1:52:10<11:43:52, 1362.35s/it]
{'act': <class 'keras_contrib.layers.advanced_activations.swish.Swish'>, 'arch': 'U-Net', 'loss_func': 'cat_CE', 'nb_filters_0': 64, 'sigma_noise': 0.01}
tracking <tf.Variable 'block1_d_Swish1/scaling_factor:0' shape=() dtype=float32> scaling_factor
tracking <tf.Variable 'block1_d_Swish2/scaling_factor:0' shape=() dtype=float32> scaling_factor
tracking <tf.Variable 'block2_d_Swish1/scaling_factor:0' shape=() dtype=float32> scaling_factor
tracking <tf.Variable 'block2_d_Swish2/scaling_factor:0' shape=() dtype=float32> scaling_factor
tracking <tf.Variable 'block3_d_Swish1/scaling_factor:0' shape=() dtype=float32> scaling_factor
tracking <tf.Variable 'block3_d_Swish2/scaling_factor:0' shape=() dtype=float32> scaling_factor
tracking <tf.Variable 'block4_d_Swish1/scaling_factor:0' shape=() dtype=float32> scaling_factor
tracking <tf.Variable 'block4_d_Swish2/scaling_factor:0' shape=() dtype=float32> scaling_factor
tracking <tf.Variable 'block5_bottom_Swish1/scaling_factor:0' shape=() dtype=float32> scaling_factor
tracking <tf.Variable 'block5_bottom_Swish2/scaling_factor:0' shape=() dtype=float32> scaling_factor
tracking <tf.Variable 'block4_u_Swish1/scaling_factor:0' shape=() dtype=float32> scaling_factor
tracking <tf.Variable 'block4_u_Swish2/scaling_factor:0' shape=() dtype=float32> scaling_factor
tracking <tf.Variable 'block3_u_Swish1/scaling_factor:0' shape=() dtype=float32> scaling_factor
tracking <tf.Variable 'block3_u_Swish2/scaling_factor:0' shape=() dtype=float32> scaling_factor
tracking <tf.Variable 'block2_u_Swish1/scaling_factor:0' shape=() dtype=float32> scaling_factor
tracking <tf.Variable 'block2_u_Swish2/scaling_factor:0' shape=() dtype=float32> scaling_factor
tracking <tf.Variable 'block1_u_Swish1/scaling_factor:0' shape=() dtype=float32> scaling_factor
tracking <tf.Variable 'block1_u_Swish2/scaling_factor:0' shape=() dtype=float32> scaling_factor
Training |          | 0% 0/5 [00:00<?, ?it/s]
Epoch 0  |██▌        | [loss: 2.2988, acc: 0.1848, jaccard1_coef: 0.0575] : 25% 113/451 [01:19<03:12, 1.76it/s]
---------------------------------------------------------------------------
ResourceExhaustedError                    Traceback (most recent call last)
<ipython-input-10-050e8f7c8199> in <module>
    104         params=p,
    105         experiment_name="talos/" + date_string,
--> 106         reduction_method='gamify',
    107     )
    108 

~/.pyenv/versions/miniconda3-4.3.30/envs/tf_gpu/lib/python3.6/site-packages/talos/scan/Scan.py in __init__(self, x, y, params, model, experiment_name, x_val, y_val, val_split, random_method, seed, performance_target, fraction_limit, round_limit, time_limit, boolean_limit, reduction_method, reduction_interval, reduction_window, reduction_threshold, reduction_metric, minimize_loss, disable_progress_bar, print_params, clear_session, save_weights)
    194         # start runtime
    195         from .scan_run import scan_run
--> 196         scan_run(self)

~/.pyenv/versions/miniconda3-4.3.30/envs/tf_gpu/lib/python3.6/site-packages/talos/scan/scan_run.py in scan_run(self)
     24         # otherwise proceed with next permutation
     25         from .scan_round import scan_round
---> 26         self = scan_round(self)
     27         self.pbar.update(1)
     28 

~/.pyenv/versions/miniconda3-4.3.30/envs/tf_gpu/lib/python3.6/site-packages/talos/scan/scan_round.py in scan_round(self)
     17     # fit the model
     18     from ..model.ingest_model import ingest_model
---> 19     self.model_history, self.round_model = ingest_model(self)
     20     self.round_history.append(self.model_history.history)
     21 

~/.pyenv/versions/miniconda3-4.3.30/envs/tf_gpu/lib/python3.6/site-packages/talos/model/ingest_model.py in ingest_model(self)
      8                       self.x_val,
      9                       self.y_val,
---> 10                       self.round_params)

~/myapps/mmciad/src/mmciad/utils/hyper.py in talos_model(x, y, val_x, val_y, talos_params)
    303                 class_weight=class_weights,
    304                 verbose=internal_params["verbose"],
--> 305                 callbacks=model_callbacks + opti_callbacks,
    306             )
    307         return history, model

~/.pyenv/versions/miniconda3-4.3.30/envs/tf_gpu/lib/python3.6/site-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name + '` call to the ' +
     90                               'Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

~/.pyenv/versions/miniconda3-4.3.30/envs/tf_gpu/lib/python3.6/site-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1730             use_multiprocessing=use_multiprocessing,
   1731             shuffle=shuffle,
-> 1732             initial_epoch=initial_epoch)
   1733 
   1734     @interfaces.legacy_generator_methods_support

~/.pyenv/versions/miniconda3-4.3.30/envs/tf_gpu/lib/python3.6/site-packages/keras/engine/training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    218                                             sample_weight=sample_weight,
    219                                             class_weight=class_weight,
--> 220                                             reset_metrics=False)
    221 
    222                 outs = to_list(outs)

~/.pyenv/versions/miniconda3-4.3.30/envs/tf_gpu/lib/python3.6/site-packages/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight, reset_metrics)
   1512             ins = x + y + sample_weights
   1513         self._make_train_function()
-> 1514         outputs = self.train_function(ins)
   1515 
   1516         if reset_metrics:

~/.pyenv/versions/miniconda3-4.3.30/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/keras/backend.py in __call__(self, inputs)
   3290 
   3291     fetched = self._callable_fn(*array_vals,
-> 3292                                 run_metadata=self.run_metadata)
   3293     self._call_fetch_callbacks(fetched[-len(self._fetches):])
   3294     output_structure = nest.pack_sequence_as(

~/.pyenv/versions/miniconda3-4.3.30/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py in __call__(self, *args, **kwargs)
   1456         ret = tf_session.TF_SessionRunCallable(self._session._session,
   1457                                                self._handle, args,
-> 1458                                                run_metadata_ptr)
   1459         if run_metadata:
   1460           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[16,64,208,208] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node GaussianNoise_preout/cond/add-1-TransposeNHWCToNCHW-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[metrics/acc/Identity/_1095]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[16,64,208,208] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node GaussianNoise_preout/cond/add-1-TransposeNHWCToNCHW-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

@mikkokotila (Contributor) commented:

Can you run the input model in a plain loop a few times and see if you get the same result? If yes, I suggest posting this directly to TensorFlow.

@mikkokotila mikkokotila self-assigned this Apr 27, 2020
@mikkokotila mikkokotila added the topic: tensorflow relates with tensorflow backend label Apr 27, 2020
@bjtho08 (Author) commented Apr 27, 2020

Do you mean a simple loop like this:

model = create_model(*args, **kwargs)

for _ in range(6):
    model.compile(**kwargs)
    model.fit(train_generator, epochs=10)

Or should I try to add some sort of garbage collection to this?
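
For completeness, the garbage-collection variant I have in mind would look roughly like this (a sketch; create_model, train_generator, and the *_kwargs placeholders stand in for my own code):

import gc
from keras import backend as K  # or tensorflow.keras.backend, depending on the setup

for _ in range(6):
    model = create_model(*args, **kwargs)   # rebuild the model every iteration
    model.compile(**compile_kwargs)
    model.fit(train_generator, epochs=10)
    del model
    K.clear_session()   # drop the TF graph/session backing the old model
    gc.collect()        # release lingering Python-side references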

@mikkokotila (Contributor) commented:

No, just the simplest possible loop.

@bjtho08 (Author) commented Apr 28, 2020

Is the above code example simple enough, or can it be even simpler?

EDIT: BTW, can you maybe elaborate a bit on what was changed in Talos 1.0? I tried upgrading to tf-2.2.0rc3 because it fixed a memory leak in the fit method related to the Keras Sequence class.

@bjtho08 (Author) commented Apr 28, 2020

So far, when running a loop like the one I wrote earlier, I am not getting any ResourceExhaustedError. I have almost completed 5 iterations of the loop with 50 epochs per iteration. With Talos, it crashed at the beginning of the fifth iteration.

@bjtho08 (Author) commented Apr 29, 2020

Okay, so the loop was set to do ten training sessions of 50 epochs each, since I knew that 50 epochs was enough to trigger the ResourceExhaustedError after 5 iterations in the Talos Scan(). It has now completed all 10 passes of the loop without any errors whatsoever. I assume this rules out it being a TensorFlow bug?

@bjtho08 (Author) commented Apr 29, 2020

For good measure, I redid the Scan() just to confirm that updating some of the packages did not alter the outcome. Rather than getting the ResourceExhaustedError, my kernel crashed completely (though it may still be due to a ResourceExhaustedError). Any ideas on how to proceed?

@bjtho08 (Author) commented May 1, 2020

I have now tested it on a different machine with a larger GPU (an NVIDIA Quadro RTX 6000 with 24 GB of RAM) and the same thing happens.

@bjtho08 (Author) commented May 3, 2020

To summarize

The bug(?) appears on two systems with the below configuration(s):

  • Nvidia GTX 1080 Ti or Nvidia Quadro RTX 6000
  • Nvidia driver 418.87.00
  • CUDA 10.1
  • CuDNN 7.6.5
  • Python >=3.7.6
  • TensorFlow 1.13, 2.1, and >=2.2.0rc2
  • Talos >= 0.6.0

It does not appear to happen if a model is compiled and fitted several times in a simple loop, which seems to rule out this being a TensorFlow problem.

@mikkokotila (Contributor) commented May 5, 2020

Is it possible for you to share a self-contained Jupyter notebook or Colab, so I can just run it and reproduce the problem?

Also, is create_model identical in both cases?

@bjtho08 (Author) commented May 7, 2020

Yes, create_model is identical. I will try and see if I can make a self-contained notebook. For now, my solution has been to modify Talos to accept a new boolean parameter allow_resume. If True, it saves the ParamSpace, the list of keys/metrics, and the various stores to files on disk; in the event of a crash (or interrupt) it reads these files back and restores the important parts of the Scan object before executing scan_run(). It might sacrifice some efficiency, but it sure beats never getting to the finish line ;)

BTW, are there any special considerations behind doing method-level imports rather than module/top-level imports?

EDIT: If you want, you can have a look at my fork of Talos and see what I changed. I haven't committed the latest additions yet, but the primary pieces are in place.

@mikkokotila (Contributor) commented:

Sorry, I totally missed this.

BTW, are there any special considerations behind doing method-level imports rather than module/top-level imports?

Yes. Chunks of code stay self-contained, readability improves, imports only happen when they are needed, etc.

How about we implement the feature described above in v1.1?

@bjtho08 (Author) commented Nov 11, 2020

We could do that, but I'm not sure my hack is the best way to go at this point. It makes sense to me, but it could be a lot cleaner, I think. Perhaps storing everything in one file rather than having about three different files to read from :) I will be happy to show you the changes I made, though, and you can decide for yourself what you think of it.

@bjtho08 (Author) commented Nov 11, 2020

You can look at the changes and additions here: https://github.com/bjtho08/talos/tree/1.0.1-dev

@mikkokotila (Contributor) commented:

Thanks. Do I understand correctly that the feature is simply to:

  • allow storing a "restore point" as an option of Scan()
  • be able to point to the file where the "restore point" is stored

Is there anything I'm missing?

@bjtho08 (Author) commented Nov 12, 2020

Yep, that pretty much sums it up.

In the project directory (where the logging CSV file is stored), three additional files are created: a pickle that contains the various stores from each run; a YAML file that lists the remaining permutations in the parameter space (dumped from self.param_object); and a YAML file containing self._all_keys, self._metric_keys, and self._val_keys.

As I said, it can most likely be done in a cleaner fashion. I just hacked this together in a few days to work around my issue with constant crashes after a few iterations :)
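
To make it concrete, the dump step does roughly the following (a simplified sketch; the helper name, file names, and exact attributes are illustrative rather than the committed code):

import pickle
import yaml

def save_restore_point(scan_object, prefix):
    # Per-round result stores, so completed rounds survive a crash.
    with open(prefix + "_stores.pkl", "wb") as f:
        pickle.dump(scan_object.round_history, f)
    # Keys/metrics needed to rebuild the logging structures on resume.
    with open(prefix + "_keys.yaml", "w") as f:
        yaml.safe_dump(
            {"all_keys": scan_object._all_keys,
             "metric_keys": scan_object._metric_keys,
             "val_keys": scan_object._val_keys},
            f,
        )
    # A third file (not shown) dumps the remaining permutations from
    # scan_object.param_object so a resumed Scan() can skip finished rounds.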

@Kristin-Schwarzmuller commented:

Hello,
have you found a working solution or a workaround for this problem yet? I am currently facing the same issue.

@mikkokotila (Contributor) commented:

I will try to work on this next week.

@mikkokotila mikkokotila added value: ⭐⭐⭐ high value priority: MEDIUM medium priority labels Nov 15, 2020
@Kristin-Schwarzmuller commented:

Disabling eager execution solved the problem for me:
tf.compat.v1.disable_eager_execution()
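
For anyone else hitting this: the call has to happen before any models or layers are built, e.g. right after importing TensorFlow at the top of the script:

import tensorflow as tf

# Must run before any graph/model construction for it to take effect.
tf.compat.v1.disable_eager_execution()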

@mikkokotila (Contributor) commented:

@MolineraNegra wonderful.

@bjtho08 can you confirm if this works for you?

@bjtho08 (Author) commented Dec 23, 2020

@mikkokotila
I'm confused; I thought Keras disabled eager execution by default, even on TF 2.x?
