
Ensemble for PyTorch Geometric #105

Open
ParasKoundal opened this issue Jan 1, 2022 · 8 comments
Labels
enhancement New feature or request

Comments

@ParasKoundal

Hi,
I want to use Ensemble-PyTorch with PyTorch Geometric. However, it doesn't recognize PyTorch Geometric's dataloaders.

Is this still under development, or is it a bug?

@xuyxu
Member

xuyxu commented Jan 3, 2022

Hi @ParasKoundal, could you provide the code snippet where you use dataloaders with graph data, so that we can take a closer look?

@ParasKoundal
Author

ParasKoundal commented Jan 3, 2022

@xuyxu
It is simple.

.....
from torch_geometric.loader import DataLoader
.......

train_loader = DataLoader(train_dataset, batch_size=batch_s, shuffle=True, drop_last=True)
val_loader = DataLoader(val_dataset, batch_size=2, drop_last=True)
test_loader = DataLoader(test_dataset, batch_size=2, drop_last=True)
.......

I have created a custom class to preprocess the dataset before loading it into the dataloader.

After that I followed the steps given in https://ensemble-pytorch.readthedocs.io/en/latest/quick_start.html. For regression I initially tried VotingRegressor, which doesn't work (the error is given in the initial issue report). The same happens with the other ensembles too.
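Roughly, the setup follows the quick start guide; here is a sketch (the base estimator and hyperparameters below are placeholders, not the exact ones I use; train_loader and test_loader are the PyTorch Geometric loaders from above):

import torch.nn as nn
from torchensemble import VotingRegressor

# Placeholder base estimator; the real model is a GNN, this is only to
# illustrate the call sequence from the quick start guide.
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x)

model = VotingRegressor(estimator=MLP, n_estimators=5, cuda=True)
model.set_optimizer("Adam", lr=1e-3)

# The PyTorch Geometric loaders are passed in directly.
model.fit(train_loader, epochs=50, test_loader=test_loader)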

@xuyxu
Member

xuyxu commented Jan 3, 2022

Could you also provide the full exception traceback? Thanks!

@ParasKoundal
Author

@xuyxu

Here it is:

test_loader=test_loader
  File "/cr/data02/koundal/applications/gpu-project/lib/python3.7/site-packages/torchensemble/bagging.py", line 329, in fit
    self.n_outputs = self._decide_n_outputs(train_loader)
  File "/cr/data02/koundal/applications/gpu-project/lib/python3.7/site-packages/torchensemble/_base.py", line 267, in _decide_n_outputs
    _, target = split_data_target(elem, self.device)
  File "/cr/data02/koundal/applications/gpu-project/lib/python3.7/site-packages/torchensemble/utils/io.py", line 84, in split_data_target
    raise ValueError(msg)
ValueError: Invalid dataloader, please check if the input dataloder is valid.

@xuyxu
Member

xuyxu commented Jan 6, 2022

This could possibly be a side effect of the commit from issue #75. I will see if this can be fixed in a few days; thanks for reporting, @ParasKoundal!

@xuyxu added the enhancement (New feature or request) label Jan 6, 2022
@ParasKoundal
Author

@xuyxu Any update on this?

@xuyxu
Member

xuyxu commented Jan 12, 2022

Hi @ParasKoundal, sorry, I am kind of busy these days, and will take a look during the next weekend.

@xuyxu
Member

xuyxu commented Jan 16, 2022

In torchensemble, at each iteration the input loader is expected to return a list in one of the following forms:

  • [data_tensor, target_tensor]
  • [data_tensor_1, data_tensor_2, ..., target_tensor]

The first form is the most widely used dataloader convention (i.e., for batch_idx, (data, target) in enumerate(loader)), while the second one comes from the feature request in #75 to support multiple input tensors.
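For example, a plain loader built from a TensorDataset already yields batches of the first form:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy example of the first form: each batch is [data_tensor, target_tensor].
X = torch.randn(100, 16)
y = torch.randn(100, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

for batch_idx, (data, target) in enumerate(loader):
    print(batch_idx, data.shape, target.shape)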

However, the dataloader in PyTorch Geometric conforms to neither of them:

  • (positive_batch, negative_batch)

which does not contain a target tensor, since the label is simply the position of the batch in the returned tuple.
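For instance, the usual MetaPath2Vec training loop (a sketch adapted from the PyTorch Geometric example; model is assumed to be an existing torch_geometric.nn.MetaPath2Vec instance) consumes the loader like this:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
loader = model.loader(batch_size=128, shuffle=True)

# Each batch is a (positive_batch, negative_batch) tuple without a target tensor.
for pos_rw, neg_rw in loader:
    loss = model.loss(pos_rw.to(device), neg_rw.to(device))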

Here is a simple solution; please let me know if it solves your problem of using torchensemble models with PyTorch Geometric. The general idea is to override the _sample method. Taking MetaPath2Vec as an example, we could declare a new class like:

from typing import List

import torch
from torch import Tensor
from torch_geometric.nn import MetaPath2Vec


class CustomMetaPath2Vec(MetaPath2Vec):
    """MetaPath2Vec whose loader yields [data, target] batches."""

    def _sample(self, batch: List[int]) -> List[Tensor]:
        if not isinstance(batch, Tensor):
            batch = torch.tensor(batch, dtype=torch.long)

        # Draw the positive and negative random walks as usual.
        pos_sample = self._pos_sample(batch)
        neg_sample = self._neg_sample(batch)

        # Concatenate both batches into a single data tensor, and encode the
        # positive/negative split as a binary target tensor.
        data = torch.cat((pos_sample, neg_sample), dim=0)
        target = torch.cat(
            (torch.ones(pos_sample.size(0)), torch.zeros(neg_sample.size(0))),
            dim=0,
        )

        return [data, target]

Using this new class, positive_batch and negative_batch will be concatenated into a single data tensor, and you can distinguish them via the target tensor.
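For illustration, a hypothetical usage sketch (the constructor arguments are placeholders, and hetero_data / metapath are assumed to be defined elsewhere):

# Build the patched model, then a loader whose batches conform to
# torchensemble's [data, target] convention.
model = CustomMetaPath2Vec(
    hetero_data.edge_index_dict,   # assumed HeteroData object
    embedding_dim=128,
    metapath=metapath,             # placeholder list of (src, rel, dst) edge types
    walk_length=50,
    context_size=7,
    walks_per_node=5,
    num_negative_samples=5,
)
loader = model.loader(batch_size=128, shuffle=True)

data, target = next(iter(loader))
print(data.shape, target.shape)    # concatenated pos/neg walks and their 1/0 labels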

In addition, some extra steps are required in the forward function of downstream base estimators.

Looking forward to your kind reply @ParasKoundal
