This repository was archived by the owner on Nov 26, 2025. It is now read-only.

fix: stratified train/test split in notebook#60

Merged
micedre merged 1 commit into main from
54-failure-in-trying-to-reproduce-the-example-notebook-provided-by-the-repository
Jun 25, 2025

Conversation

@meilame-tayebjee
Member

avoid errors due to smaller num_classes than the max label
@meilame-tayebjee meilame-tayebjee requested a review from micedre May 19, 2025 11:33
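To illustrate the fix this PR describes (the identifiers below are illustrative, not the notebook's actual variables): with a plain random split, a rare class can land entirely in the test set, so the label set seen at training time is smaller than the maximum label and out-of-range targets later cause errors. Passing `stratify=y` to scikit-learn's `train_test_split` keeps every class in both partitions. A minimal sketch:

```python
# Minimal sketch of a stratified split, assuming integer class labels.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 8 + [1] * 8 + [2] * 4)  # class 2 is rare

# stratify=y preserves class proportions in both train and test sets,
# so no class is absent from the training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

assert set(y_train) == set(y)  # every class appears in the training set
```

Without `stratify`, the same split can drop class 2 from `y_train`, after which `num_classes` inferred from the training labels no longer covers `max(y_test)`.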
@micedre
Contributor

micedre commented May 19, 2025

Still have this error when training on GPU (Nvidia T4 15G) :

2025-05-19 12:26:12 - torchFastText.torchFastText - Checking inputs...
2025-05-19 12:26:12 - torchFastText.torchFastText - Inputs successfully checked. Starting the training process..
2025-05-19 12:26:12 - torchFastText.torchFastText - Running on: cuda
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[31], line 1
----> 1 model.train(
      2     X_train,
      3     y_train,
      4     X_test,
      5     y_test,
      6     num_epochs=parameters_train['num_epochs'],
      7     batch_size=parameters_train['batch_size'],
      8     patience_scheduler=parameters_train['patience'],
      9     patience_train=parameters_train['patience'],
     10     lr=parameters_train['lr'],
     11     verbose = True
     12 )

File /usr/local/lib/python3.12/site-packages/torchFastText/torchFastText.py:589, in torchFastText.train(self, X_train, y_train, X_val, y_val, num_epochs, batch_size, cpu_run, num_workers, optimizer, optimizer_params, lr, scheduler, patience_scheduler, loss, patience_train, verbose, trainer_params)
    586         end = time.time()
    587         logger.info("Model successfully built in {:.2f} seconds.".format(end - start))
--> 589 self.pytorch_model = self.pytorch_model.to(self.device)
    591 # Dataloaders
    592 train_dataloader, val_dataloader = self.__build_data_loaders(
    593     train_categorical_variables=train_categorical_variables,
    594     training_text=training_text,
   (...)    600     num_workers=num_workers,
    601 )

File /usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py:1355, in Module.to(self, *args, **kwargs)
   1352         else:
   1353             raise
-> 1355 return self._apply(convert)

File /usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py:915, in Module._apply(self, fn, recurse)
    913 if recurse:
    914     for module in self.children():
--> 915         module._apply(fn)
    917 def compute_should_use_set_data(tensor, tensor_applied):
    918     if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
    919         # If the new tensor has compatible tensor type as the existing tensor,
    920         # the current behavior is to change the tensor in-place using `.data =`,
   (...)    925         # global flag to let the user control whether they want the future
    926         # behavior of overwriting the existing tensor or not.

File /usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py:942, in Module._apply(self, fn, recurse)
    938 # Tensors stored in modules are graph leaves, and we don't want to
    939 # track autograd history of `param_applied`, so we have to use
    940 # `with torch.no_grad():`
    941 with torch.no_grad():
--> 942     param_applied = fn(param)
    943 p_should_use_set_data = compute_should_use_set_data(param, param_applied)
    945 # subclasses may have multiple child tensors so we need to use swap_tensors

File /usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py:1341, in Module.to.<locals>.convert(t)
   1334     if convert_to_format is not None and t.dim() in (4, 5):
   1335         return t.to(
   1336             device,
   1337             dtype if t.is_floating_point() or t.is_complex() else None,
   1338             non_blocking,
   1339             memory_format=convert_to_format,
   1340         )
-> 1341     return t.to(
   1342         device,
   1343         dtype if t.is_floating_point() or t.is_complex() else None,
   1344         non_blocking,
   1345     )
   1346 except NotImplementedError as e:
   1347     if str(e) == "Cannot copy out of meta tensor; no data!":

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
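A "device-side assert triggered" in a classification setup is frequently caused by a target label outside `[0, num_classes)`, which is exactly the label/`num_classes` mismatch this PR's commit message mentions; the GPU kernel asserts, and the error surfaces later at an unrelated call (hence the suggestion to rerun with `CUDA_LAUNCH_BLOCKING=1`). A cheap CPU-side sanity check before training, using a hypothetical helper (not part of torchFastText's API):

```python
# Hypothetical pre-flight check: verify labels fit the model's output range
# before moving anything to the GPU, where range violations only surface as
# opaque device-side asserts.
import numpy as np

def check_labels(y, num_classes):
    y = np.asarray(y)
    if y.min() < 0 or y.max() >= num_classes:
        raise ValueError(
            f"labels must lie in [0, {num_classes - 1}], "
            f"got range [{y.min()}, {y.max()}]"
        )

check_labels([0, 1, 2], num_classes=3)  # passes silently
```

Running the check on CPU gives an immediate, readable error instead of an asynchronous CUDA assert.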

Comment thread notebooks/example.ipynb
],
"source": [
"# Stable version\n",
"pip install torchFastText \n",
Contributor


Suggested change
"pip install torchFastText \n",
"!pip install torchFastText \n",

Comment thread notebooks/example.ipynb
"metadata": {},
"outputs": [],
"source": [
"model.load_from_checkpoint(model.best_model_path) # or any other checkpoint path (string)"
Contributor


This line causes an error when predicting (when training was done on GPU).
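A checkpoint saved during GPU training stores CUDA tensors, so loading it on a CPU-only machine fails unless the storages are remapped. `torch.load` takes a `map_location` argument for this (PyTorch Lightning's `load_from_checkpoint` accepts it as well). A self-contained sketch, using an in-memory buffer in place of a real checkpoint file:

```python
# Sketch: remap a saved checkpoint's tensors to CPU at load time.
# The buffer stands in for a checkpoint file written during GPU training.
import io
import torch

buf = io.BytesIO()
torch.save({"w": torch.zeros(2)}, buf)
buf.seek(0)

# map_location forces all storages onto the CPU, so a checkpoint written
# on a CUDA machine can still be loaded where no GPU is available.
state = torch.load(buf, map_location=torch.device("cpu"))
assert state["w"].device.type == "cpu"
```

The same `map_location="cpu"` argument would apply when loading the best-model checkpoint for prediction on a CPU-only machine.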

@micedre
Contributor

micedre commented May 20, 2025


Could not reproduce it today... Either way, it works on CPU.

@micedre micedre merged commit 156d374 into main Jun 25, 2025
3 checks passed

2 participants