RuntimeError: expected scalar type Long but found Int in [cd_spot_the_diff_mnist_wine.ipynb] #411
Comments
@tomaszek0 what's your |
@jklaise !pip install torch gave this output: |
I can't reproduce this on a Linux machine. Have you had any problems training regular PyTorch models, specifically when evaluating loss functions (which is where the error comes from)? This is most likely a PyTorch issue, possibly related to your setup (Windows + AMD CPU); a few similar issues have been reported: https://github.com/pytorch/pytorch/search?q=expected+scalar+type+long+but+found+int&type=issues I would suggest training some vanilla PyTorch models first to see whether the same or similar issues occur. |
@jklaise, Thank you for your suggestions. The output of the code (import torch // torch.cuda.is_available()): False
W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
Testing on CPU Intel Core i5 and GPU NVIDIA GTX460 (336 CUDA cores, 32 ROPs, and 56 texture units; driver no. 23.21.13.8813 _2017):
The output of the code (import torch // torch.cuda.is_available()): False
Traceback (most recent call last): preds_h0 = cd.predict(x_h0) RuntimeError: expected scalar type Long but found Int
Testing on CPU Intel Core i5 and GPU NVIDIA GTX460 (336 CUDA cores, 32 ROPs, and 56 texture units; updated driver no. 23.21.13.9135 _2018):
The output of the code (import torch // torch.cuda.is_available()): False
As mentioned on https://discuss.pytorch.org/t/pytorch-nvidia-gtx460-version/62461, the GTX 460 is too weak for PyTorch. |
@tomaszek0 thanks for that, this gives something to work with. It's encouraging that the basic PyTorch training tutorial works. On the first CUDA-free system you say that the first few lines failed, but it looks like it's just the usual TensorFlow behaviour of looking for CUDA, not finding it, and reverting to CPU. Can you confirm that no actual runtime error is raised? It looks like in all 3 scenarios the execution was attempted on CPU (due to the older unsupported GPU, as you mention). The code should be executable in the same way on either CPU or GPU, so I think we can disregard any GPU presence for further debugging. I think the best thing to do is to check with a debugger what the types are just before hitting the error. The steps would be to make a script with the code that causes the error and set a breakpoint at the line where the error happens. For reference, running this on my environment I have the following dtypes: Would you be able to run through these steps? EDIT: Even more simply, can you evaluate |
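For anyone who would rather not step through pdb, the dtype check described above can also be approximated by wrapping the loss function. This is only a sketch: `with_dtype_report` is a hypothetical helper (not part of alibi-detect or PyTorch), and it assumes nothing more than that the arguments expose a `dtype` attribute, which holds for both NumPy arrays and PyTorch tensors. A NumPy stand-in loss is used here so the snippet runs without PyTorch.

```python
import numpy as np


def with_dtype_report(loss_fn, log=print):
    """Return a wrapper that reports argument dtypes before calling loss_fn.

    Hypothetical debugging helper; works with any objects that have a
    `.dtype` attribute (NumPy arrays, PyTorch tensors).
    """
    def wrapped(y_hat, y, *args, **kwargs):
        log(f"y_hat dtype: {getattr(y_hat, 'dtype', type(y_hat))}, "
            f"y dtype: {getattr(y, 'dtype', type(y))}")
        return loss_fn(y_hat, y, *args, **kwargs)
    return wrapped


# Stand-in loss on NumPy arrays, only to show the wrapper in action.
def mean_abs_error(y_hat, y):
    return float(np.mean(np.abs(y_hat - y)))


messages = []
checked_loss = with_dtype_report(mean_abs_error, log=messages.append)
value = checked_loss(np.array([0.2, 0.9]), np.array([0, 1], dtype=np.int32))
print(messages[0])
print(value)
```

The same wrapper could be placed around whichever loss function is called at the failing line, so the dtypes are printed immediately before the error is raised.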
@jklaise, So far here is what I have found: It is a bit strange. So for comparison, I executed the tutorial entitled “Online detection with MMD (Maximum Mean Discrepancy) and Pytorch” (https://docs.seldon.io/projects/alibi-detect/en/latest/examples/cd_online_wine.html). The whole tutorial code gave the expected output on both the AMD and Intel-based systems (Jupyter Notebook). Running the code on the AMD-based system in Visual Studio gave the expected log: “W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found” and yielded the expected result (chart). After executing the next code segment, which defines the online drift detector ("from alibi_detect.cd import MMDDriftOnline"), I got this log and the expected output: “W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found” Running the subsequent code segments also yields the expected output (although some computations run without displaying the charts). I also checked the outputs of the code: print(torch.rand(3,3).cuda()) conda install cuda -c nvidia It seems that installing TensorFlow and PyTorch in the same environment causes some issues. |
@tomaszek0 thanks for that. I'm not sure I fully follow, it seems you have both |
@jklaise, |
@tomaszek0 another way of doing it is writing |
@jklaise, thanks for the lead. I found the following (in pdb).
|
@tomaszek0 can you try evaluating Curiously, on my machine |
Oh dear, I think I've found the issue and it's a Here we create labels to distinguish the reference and test dataset which should just be integers: numpy/numpy#17640 According to the above, in Windows I think a fix from our end would be to change from |
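To illustrate the cause described above: before NumPy 2.0, `int` used as a NumPy dtype maps to the C `long`, which is 32-bit on Windows and 64-bit on most Linux builds, so `.astype(int)` can produce different label dtypes on different platforms. A minimal sketch, assuming the labels are built by concatenating zeros and ones; the variable names are illustrative and this is not claimed to be the library's exact code:

```python
import numpy as np

n_ref, n_test = 4, 3  # illustrative sample counts

# Platform-dependent width: int32 on Windows with NumPy < 2.0,
# int64 on most other setups.
y_platform = np.concatenate([np.zeros(n_ref), np.ones(n_test)]).astype(int)

# Explicit width: int64 on every platform, which converts to a
# Long tensor when passed to PyTorch.
y_fixed = np.concatenate([np.zeros(n_ref), np.ones(n_test)]).astype(np.int64)

print("astype(int):     ", y_platform.dtype)
print("astype(np.int64):", y_fixed.dtype)
```

Spelling out `np.int64` removes the dependence on the platform's default integer width, which is why the problem did not show up on Linux.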
@jklaise, I obtained the same debugging output on Intel and AMD-based machines. (pdb) p loss_fn(y_hat.detach(), y) |
@jklaise, |
@tomaszek0 great to hear that worked. We will be submitting a fix for this soon as well as executing our test suite on a Windows platform to catch these types of bugs early. For the "no bars" issue, are they completely absent from the figure or just close to zero? Perhaps @ojcobb can comment. |
@jklaise, |
@tomaszek0 we're aware of this (see #390); there is still some uncontrolled randomness that requires further investigation. That said, for these use cases there is some merit in not relying on reproducibility, as one may fall into the trap of being lucky with the randomness and getting results that are not fully representative (although I'm aware it raises some questions about the robustness of the method @ojcobb @ascillitoe). |
I am getting the following error when trying to execute code (in [10] section "Interpretable Drift Detection on the Wine Quality Dataset"):
RuntimeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_3564/2414863533.py in <module>
10 )
11
---> 12 preds_h0 = cd.predict(x_h0)
13 preds_corr = cd.predict(x_corr)
~\AppData\Roaming\Python\Python39\site-packages\alibi_detect\cd\spot_the_diff.py in predict(self, x, return_p_val, return_distance, return_probs, return_model)
173 data, and the trained model.
174 """
--> 175 return self._detector.predict(x, return_p_val, return_distance, return_probs, return_model)
~\AppData\Roaming\Python\Python39\site-packages\alibi_detect\cd\pytorch\spot_the_diff.py in predict(self, x, return_p_val, return_distance, return_probs, return_model)
212 data, and the trained model.
213 """
--> 214 preds = self._detector.predict(x, return_p_val, return_distance, return_probs, return_model=True)
215 preds['data']['diffs'] = preds['data']['model'].diffs.detach().cpu().numpy() # type: ignore
216 preds['data']['diff_coeffs'] = preds['data']['model'].coeffs.detach().cpu().numpy() # type: ignore
~\AppData\Roaming\Python\Python39\site-packages\alibi_detect\cd\base.py in predict(self, x, return_p_val, return_distance, return_probs, return_model)
241 """
242 # compute drift scores
--> 243 p_val, dist, probs_ref, probs_test = self.score(x)
244 drift_pred = int(p_val < self.p_val)
245
~\AppData\Roaming\Python\Python39\site-packages\alibi_detect\cd\pytorch\classifier.py in score(self, x)
182 self.model = self.model.to(self.device)
183 train_args = [self.model, self.loss_fn, dl_tr, self.device]
--> 184 trainer(*train_args, **self.train_kwargs) # type: ignore
185 preds = self.predict_fn(x_te, self.model.eval())
186 preds_oof_list.append(preds)
~\AppData\Roaming\Python\Python39\site-packages\alibi_detect\models\pytorch\trainer.py in trainer(model, loss_fn, dataloader, device, optimizer, learning_rate, preprocess_fn, epochs, reg_loss_fn, verbose)
53 y_hat = model(x)
54 optimizer.zero_grad() # type: ignore
---> 55 loss = loss_fn(y_hat, y) + reg_loss_fn(model)
56 loss.backward()
57 optimizer.step() # type: ignore
~\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []
~\anaconda3\lib\site-packages\torch\nn\modules\loss.py in forward(self, input, target)
1148
1149 def forward(self, input: Tensor, target: Tensor) -> Tensor:
-> 1150 return F.cross_entropy(input, target, weight=self.weight,
1151 ignore_index=self.ignore_index, reduction=self.reduction,
1152 label_smoothing=self.label_smoothing)
~\anaconda3\lib\site-packages\torch\nn\functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
2844 if size_average is not None or reduce is not None:
2845 reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2846 return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
2847
2848
RuntimeError: expected scalar type Long but found Int
I use Python 3.8.8 on Windows 10, running on an AMD Ryzen with integrated AMD graphics.
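The traceback ends in `F.cross_entropy`, so the dtype mismatch can be explored without alibi-detect. Below is a minimal sketch, assuming a CPU-only PyTorch install; the tensors are made up for illustration, and whether the first call raises may depend on the PyTorch version, so the snippet only reports what happens rather than assuming it.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 2)  # made-up model outputs: 4 samples, 2 classes
target_int32 = torch.tensor([0, 1, 1, 0], dtype=torch.int32)

try:
    # Class-index targets are expected to be int64 (Long).
    F.cross_entropy(logits, target_int32)
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# Casting the targets to Long avoids the dtype mismatch.
loss = F.cross_entropy(logits, target_int32.long())
print("raised with int32 targets:", raised)
print("loss with Long targets:", loss.item())
```

Casting with `.long()` at the call site is a local workaround; creating the labels as 64-bit integers in the first place addresses the problem at its source.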