slicetca.grid_search returned nan #1
Open · 441YSK441 opened this issue Jun 10, 2024 · 4 comments
Labels: bug (Something isn't working)


@441YSK441 commented Jun 10, 2024

Hi. Thank you for developing this great tool for analysis.

When I run the code below from sliceTCA_notebook_1.ipynb, the value of loss_grid is NaN, whereas if I delete the mask_train and mask_test arguments, loss_grid returns finite values. Do you know how to solve this problem? I did not modify any part except sample_size.

# this will take a while to run; with sample_size=1 it fits 3*3*3*1 = 27 models (the original notebook uses sample_size=4, i.e. 108 models)
loss_grid, seed_grid = slicetca.grid_search(reconstructed_noisy_tensor,
                                            min_ranks = [3, 0, 0],
                                            max_ranks = [5, 2, 2],
                                            sample_size=1,
                                            mask_train=train_mask,
                                            mask_test=test_mask,
                                            processes_grid=4,
                                            seed=1,
                                            min_std=10**-4,
                                            learning_rate=5*10**-3,
                                            max_iter=10**4,
                                            positive=True)

Another request
Could you share additional code describing the flow of the analysis in Figure 3 of the Pellegrino et al. paper?

@arthur-pe added the bug label on Jun 11, 2024
@arthur-pe (Owner) commented

Hi, when I run the grid_search with a mask I don't get NaNs in loss_grid, so I'd need a bit more information to reproduce the bug:

  • Could you let us know your Python and PyTorch versions (or whether you are running the notebook on Colab)?
  • Can you share the arguments you are passing to slicetca.block_mask? One way I could see the loss being NaN only when a mask is used is if your train or test mask is False for all entries (see the quick check sketched below).
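For example, a quick check along these lines (assuming train_mask and test_mask are the boolean torch tensors produced by slicetca.block_mask in the notebook) would catch an all-False mask:

# both masks should select at least one entry of the tensor
assert bool(train_mask.any()), 'train_mask is False for all entries'
assert bool(test_mask.any()), 'test_mask is False for all entries'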

The pipeline schematized in Fig. 3 is roughly what is done in the notebook.

@441YSK441 (Author) commented

Thank you for the reply.

  1. I used Python 3.10.12 and torch 2.3.0+cu121.
  2. I did not change any part of the code except sample_size (4 to 1) in slicetca.grid_search, for faster calculation.
    When I used Colab, I got a correct loss_grid value.

One thing I noticed is that after running slicetca.grid_search in my environment, reconstructed_noisy_tensor is changed to a tensor containing only zeros, and train_mask and test_mask are changed to tensors containing only False. (Before running slicetca.grid_search, the values of these tensors are normal.)

Another thing is that the number of True and False entries in train_mask and test_mask differs between my environment and the Colab environment (as shown below). Do you have any idea about the cause of these problems?

The number of true in train_mask: 6024557
The number of false in train_mask: 1209943
The number of true in test_mask: 672370
The number of false in test_mask: 6562130

Colab: The number of true in train_mask: 6027016
Colab: The number of false in train_mask: 1207484
Colab: The number of true in test_mask: 672298
Colab: The number of false in test_mask: 6562202
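(Counts like these can be printed with something along the following lines, assuming the masks are boolean torch tensors:)

# count True/False entries in each boolean mask
print('The number of true in train_mask:', int(train_mask.sum()))
print('The number of false in train_mask:', int((~train_mask).sum()))
print('The number of true in test_mask:', int(test_mask.sum()))
print('The number of false in test_mask:', int((~test_mask).sum()))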

@arthur-pe (Owner) commented

I tried running the example notebook with Python 3.10.12 and torch 2.3.0, but I still can't reproduce the issue.

Perhaps you could run the following to check that this is not an issue with the notebook:

import torch
import slicetca

device = ('cuda' if torch.cuda.is_available() else 'cpu')

T = torch.randn((10, 10, 10), device=device)

mask_train, mask_test = slicetca.block_mask(list(T.shape), [1, 0, 1], [1, 0, 0], fraction_test=0.1, device=device)

loss_grid, seed_grid = slicetca.grid_search(T, mask_train=mask_train, mask_test=mask_test, min_ranks=[0, 0, 0], max_ranks=[1, 0, 1], max_iter=2)

print(loss_grid)

I indeed get a non-NaN loss_grid, and the test_mask doesn't get modified. Note that to check the proportion of masked entries you can do print(test_mask.float().mean()).
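If you want to verify this on your side, a rough check along these lines (reusing T, mask_train, and mask_test from the snippet above) should work:

# keep copies of the inputs, run grid_search again, and compare
T_before = T.clone()
train_before = mask_train.clone()
test_before = mask_test.clone()

loss_grid, seed_grid = slicetca.grid_search(T, mask_train=mask_train, mask_test=mask_test, min_ranks=[0, 0, 0], max_ranks=[1, 0, 1], max_iter=2)

print('tensor unchanged:', torch.equal(T, T_before))
print('mask_train unchanged:', torch.equal(mask_train, train_before))
print('mask_test unchanged:', torch.equal(mask_test, test_before))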

Regarding the different numbers of masked entries, I believe this is just a difference in the RNG seeds.

@441YSK441 (Author) commented

When I ran the given code once in my setup, loss_grid was NaN; however, when I ran it a second time, I got a non-NaN loss_grid.
In addition, when I ran the code in a Mac environment (I had been running it on Windows), every run gave a non-NaN loss_grid.

I'm sorry for the ambiguous comments; these are just case reports.
I don't know what is causing the problem, but I can get values by using the Mac, so I will probably use the Mac to calculate loss_grid. Thank you.
