slicetca.grid_search returned nan #1
Open · 441YSK441 opened this issue Jun 10, 2024 · 4 comments
Labels: bug (Something isn't working)


@441YSK441 commented Jun 10, 2024

Hi. Thank you for developing this great tool for analysis.

When I run the code below from sliceTCA_notebook_1.ipynb, the value of loss_grid is NaN, whereas if I delete the mask_train and mask_test arguments, loss_grid returns finite values. Do you know how to solve this problem? I did not modify any part except sample_size.

# this will take a while to run; with sample_size=1 it fits 3*3*3*1 = 27 models (the original notebook uses sample_size=4, i.e. 108 models)
loss_grid, seed_grid = slicetca.grid_search(reconstructed_noisy_tensor,
                                            min_ranks = [3, 0, 0],
                                            max_ranks = [5, 2, 2],
                                            sample_size=1,
                                            mask_train=train_mask,
                                            mask_test=test_mask,
                                            processes_grid=4,
                                            seed=1,
                                            min_std=10**-4,
                                            learning_rate=5*10**-3,
                                            max_iter=10**4,
                                            positive=True)

Another request
Could you share additional code describing the flow of the analysis in Figure 3 of the Pellegrino et al. paper?

@arthur-pe added the bug label on Jun 11, 2024
@arthur-pe (Owner) commented

Hi, when I run the grid_search with a mask I don't get NaNs in loss_grid, so I'd need a bit more information to reproduce the bug:

  • Could you let us know your Python and PyTorch versions (or whether you are running the notebook on Colab)?
  • Can you share the arguments you are passing to slicetca.block_mask? One way I could see the loss being NaN only when a mask is used is if your train or test mask is False for all entries (see the quick check sketched below).
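For example, a quick check along these lines (assuming train_mask and test_mask are the boolean torch tensors produced by slicetca.block_mask in the notebook) would catch an all-False mask:

# both masks should select at least one entry of the tensor
assert bool(train_mask.any()), 'train_mask is False for all entries'
assert bool(test_mask.any()), 'test_mask is False for all entries'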

The pipeline schematized in Fig. 3 is roughly what is done in the notebook.

@441YSK441 (Author) commented

Thank you for the reply.

  1. I used Python 3.10.12 and torch 2.3.0+cu121.
  2. I did not change any part of the code except sample_size (4 to 1) in slicetca.grid_search, for faster calculation.
    When I used Colab, I got a correct loss_grid value.

One thing I noticed is that after running slicetca.grid_search in my environment, reconstructed_noisy_tensor is changed to a tensor containing only zeros, and train_mask and test_mask are changed to tensors containing only False. (Before running slicetca.grid_search, the values of these tensors are normal.)

Another thing is that the number of True and False entries in train_mask and test_mask differs between my environment and the Colab environment (as shown below). Do you have any idea about the cause of these problems?

The number of true in train_mask: 6024557
The number of false in train_mask: 1209943
The number of true in test_mask: 672370
The number of false in test_mask: 6562130

Colab: The number of true in train_mask: 6027016
Colab: The number of false in train_mask: 1207484
Colab: The number of true in test_mask: 672298
Colab: The number of false in test_mask: 6562202
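(Counts like these can be printed with something along the following lines, assuming the masks are boolean torch tensors:)

# count True/False entries in each boolean mask
print('The number of true in train_mask:', int(train_mask.sum()))
print('The number of false in train_mask:', int((~train_mask).sum()))
print('The number of true in test_mask:', int(test_mask.sum()))
print('The number of false in test_mask:', int((~test_mask).sum()))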

@arthur-pe (Owner) commented

I tried running the example notebook with Python 3.10.12 and torch 2.3.0, but I still can't reproduce the issue.

Perhaps you could run the following to check that this is not an issue with the notebook:

import torch
import slicetca

device = ('cuda' if torch.cuda.is_available() else 'cpu')

T = torch.randn((10, 10, 10), device=device)

mask_train, mask_test = slicetca.block_mask(list(T.shape), [1, 0, 1], [1, 0, 0], fraction_test=0.1, device=device)

loss_grid, seed_grid = slicetca.grid_search(T, mask_train=mask_train, mask_test=mask_test, min_ranks=[0, 0, 0], max_ranks=[1, 0, 1], max_iter=2)

print(loss_grid)

I indeed get a non-NaN loss_grid, and the test_mask doesn't get modified. Note that to check the proportion of masked entries you can do print(test_mask.float().mean()).
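If you want to verify this on your side, a rough check along these lines (reusing T, mask_train, and mask_test from the snippet above) should work:

# keep copies of the inputs, run grid_search again, and compare
T_before = T.clone()
train_before = mask_train.clone()
test_before = mask_test.clone()

loss_grid, seed_grid = slicetca.grid_search(T, mask_train=mask_train, mask_test=mask_test, min_ranks=[0, 0, 0], max_ranks=[1, 0, 1], max_iter=2)

print('tensor unchanged:', torch.equal(T, T_before))
print('mask_train unchanged:', torch.equal(mask_train, train_before))
print('mask_test unchanged:', torch.equal(mask_test, test_before))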

Regarding the different numbers of masked entries, I believe this is just a difference in the RNG seeds.

@441YSK441 (Author) commented

When I ran the given code once in my setup, loss_grid was NaN; however, when I ran it a second time, I got a non-NaN loss_grid.
In addition, when I ran the code in a Mac environment (I had been running it on Windows), every run gave a non-NaN loss_grid.

I'm sorry for the ambiguous comments; these are just case reports.
I don't know what is causing the problem, but I can get values by using the Mac, so I will probably use the Mac to calculate loss_grid. Thank you.
