[Docs] Making sense of batch processing and tasks #2503

Open
EvenStrangest opened this issue Apr 1, 2024 · 0 comments

I'm trying to correctly express a single GP model applied independently along the batch dimension (as if by a for-loop with traditional_batch_length iterations), but so far without success. Reading the code, documentation, and examples, I'm finding myself confused about the meaning of batch_shape in GPyTorch (and how it relates to the traditional meaning of "batch" in deep learning), about how it relates to "multitask", and about the different variational strategies. I'd appreciate any help. Please excuse the venue.
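
To make the intended semantics concrete, here is a rough sketch of the for-loop I have in mind; my_gp and inputs are just placeholder names, not code from DUE or GPyTorch:

# Intended semantics: each input point gets its own marginal predictive
# distribution, with no cross-covariances between batch elements.
marginal_means, marginal_vars = [], []
for x_i in inputs:                    # traditional_batch_length iterations
    dist_i = my_gp(x_i.unsqueeze(0))  # the GP evaluated on a single point
    marginal_means.append(dist_i.mean)
    marginal_vars.append(dist_i.variance)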

More specifically, I'm doing so in the context of DUE. The following is a snippet from within DUE, slightly adapted for brevity.

import torch
from gpytorch.distributions import MultivariateNormal
from gpytorch.kernels import RBFKernel, ScaleKernel
from gpytorch.means import ConstantMean
from gpytorch.models import ApproximateGP
from gpytorch.variational import (
    CholeskyVariationalDistribution,
    IndependentMultitaskVariationalStrategy,
    VariationalStrategy,
)


class GP(ApproximateGP):
    def __init__(
        self,
        num_outputs,
        initial_lengthscale,
        initial_inducing_points,
    ):
        n_inducing_points = initial_inducing_points.shape[0]

        # With more than one output, keep a separate batch of variational
        # parameters and kernel/mean hyperparameters per output.
        if num_outputs > 1:
            batch_shape = torch.Size([num_outputs])
        else:
            batch_shape = torch.Size([])

        variational_distribution = CholeskyVariationalDistribution(
            n_inducing_points, batch_shape=batch_shape
        )

        variational_strategy = VariationalStrategy(
            self, initial_inducing_points, variational_distribution
        )

        # Reinterpret the leading batch dimension as independent tasks/outputs.
        if num_outputs > 1:
            variational_strategy = IndependentMultitaskVariationalStrategy(
                variational_strategy, num_tasks=num_outputs
            )

        super().__init__(variational_strategy)

        kernel = RBFKernel(batch_shape=batch_shape)
        kernel.lengthscale = initial_lengthscale * torch.ones_like(kernel.lengthscale)

        self.mean_module = ConstantMean(batch_shape=batch_shape)
        self.covar_module = ScaleKernel(kernel, batch_shape=batch_shape)

    def forward(self, x):
        mean = self.mean_module(x)
        covar = self.covar_module(x)

        return MultivariateNormal(mean, covar)

Applying this as follows (eventually, once the lazy evaluation here and there actually materializes things) produces a CUDA OOM error, due to a dense covariance matrix whose side length is traditional_batch_length.

feature_length = 128
traditional_batch_length = 100_000  # a "traditional" minibatch of 100k points
initial_inducing_points = torch.rand(20, feature_length)
initial_lengthscale = 1.0
features = torch.rand(traditional_batch_length, feature_length)
num_outputs = 1

gp = GP(num_outputs, initial_lengthscale, initial_inducing_points)
gp(features)  # (eventually) produces CUDA OOM

Here, by traditional_batch_length I mean a batch in the sense common in deep learning: a minibatch of independent data points.
Needless to say, this fails the same way with num_outputs > 1.
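
For scale, here is my back-of-the-envelope arithmetic for the dense covariance matrix that a joint MultivariateNormal over the whole batch would materialize (assuming float32 entries; the numbers are mine, not from DUE or GPyTorch):

n = traditional_batch_length   # 100_000
covariance_bytes = n * n * 4   # float32 entries of an n-by-n matrix
print(covariance_bytes / 1e9)  # ~40 GB, far beyond what my GPU holds

What I'm after instead is the marginal, per-point behaviour sketched at the top, which, as far as I can tell, should only need memory linear in traditional_batch_length.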
