[Docs] Making sense of batch processing and tasks #2503

Open
EvenStrangest opened this issue Apr 1, 2024 · 0 comments

I'm trying to correctly express a single GP model applied independently along the batch dimension (as if by a for-loop with traditional_batch_length iterations), but so far without success. Reading the code, documentation, and examples, I'm finding myself confused about the meaning of batch_shape in GPyTorch (and how it relates to the traditional meaning of "batch" in deep learning), about how it relates to "multitask", and about the different variational strategies. I'd appreciate any help. Please excuse the venue.
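
To make the intended semantics concrete, here is a rough sketch of the for-loop I have in mind; my_gp and inputs are just placeholder names, not code from DUE or GPyTorch:

# Intended semantics: each input point gets its own marginal predictive
# distribution, with no cross-covariances between batch elements.
marginal_means, marginal_vars = [], []
for x_i in inputs:                    # traditional_batch_length iterations
    dist_i = my_gp(x_i.unsqueeze(0))  # the GP evaluated on a single point
    marginal_means.append(dist_i.mean)
    marginal_vars.append(dist_i.variance)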

More specifically, I'm doing so in the context of DUE. The following is a snippet from within DUE, slightly adapted for brevity.

import torch
from gpytorch.distributions import MultivariateNormal
from gpytorch.kernels import RBFKernel, ScaleKernel
from gpytorch.means import ConstantMean
from gpytorch.models import ApproximateGP
from gpytorch.variational import (
    CholeskyVariationalDistribution,
    IndependentMultitaskVariationalStrategy,
    VariationalStrategy,
)


class GP(ApproximateGP):
    def __init__(
        self,
        num_outputs,
        initial_lengthscale,
        initial_inducing_points,
    ):
        n_inducing_points = initial_inducing_points.shape[0]

        # With more than one output, keep a separate batch of variational
        # parameters and kernel/mean hyperparameters per output.
        if num_outputs > 1:
            batch_shape = torch.Size([num_outputs])
        else:
            batch_shape = torch.Size([])

        variational_distribution = CholeskyVariationalDistribution(
            n_inducing_points, batch_shape=batch_shape
        )

        variational_strategy = VariationalStrategy(
            self, initial_inducing_points, variational_distribution
        )

        # Reinterpret the leading batch dimension as independent tasks/outputs.
        if num_outputs > 1:
            variational_strategy = IndependentMultitaskVariationalStrategy(
                variational_strategy, num_tasks=num_outputs
            )

        super().__init__(variational_strategy)

        kernel = RBFKernel(batch_shape=batch_shape)
        kernel.lengthscale = initial_lengthscale * torch.ones_like(kernel.lengthscale)

        self.mean_module = ConstantMean(batch_shape=batch_shape)
        self.covar_module = ScaleKernel(kernel, batch_shape=batch_shape)

    def forward(self, x):
        mean = self.mean_module(x)
        covar = self.covar_module(x)

        return MultivariateNormal(mean, covar)

Applying this as follows (eventually, once the lazy evaluation here and there actually materializes things) produces a CUDA OOM error, due to a dense covariance matrix whose side length is traditional_batch_length.

feature_length = 128
traditional_batch_length = 100_000  # a "traditional" minibatch of 100k points
initial_inducing_points = torch.rand(20, feature_length)
initial_lengthscale = 1.0
features = torch.rand(traditional_batch_length, feature_length)
num_outputs = 1

gp = GP(num_outputs, initial_lengthscale, initial_inducing_points)
gp(features)  # (eventually) produces CUDA OOM

Here, by traditional_batch_length I mean a batch in the sense common in deep learning: a minibatch of independent data points.
Needless to say, this fails the same way with num_outputs > 1.
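
For scale, here is my back-of-the-envelope arithmetic for the dense covariance matrix that a joint MultivariateNormal over the whole batch would materialize (assuming float32 entries; the numbers are mine, not from DUE or GPyTorch):

n = traditional_batch_length   # 100_000
covariance_bytes = n * n * 4   # float32 entries of an n-by-n matrix
print(covariance_bytes / 1e9)  # ~40 GB, far beyond what my GPU holds

What I'm after instead is the marginal, per-point behaviour sketched at the top, which, as far as I can tell, should only need memory linear in traditional_batch_length.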
