[Bug] Extreme loss oscillation during training #2484

Open
sanaamouzahir opened this issue Feb 27, 2024 · 0 comments

sanaamouzahir commented Feb 27, 2024

🐛 Bug

Hi, I've been using the multitask sparse variational Gaussian process framework of GPyTorch to model velocity on a 2D grid of (150, 50) points. The training data is a time-series evolution of this velocity field (4800 snapshots). After training, I've noticed extreme oscillations in the training loss (variational ELBO), and I have not been able to figure out where they come from. For preprocessing, I reshaped the time series into a (4800, 150*50) matrix and standardized it to match the zero-mean prior assumption. The number of tasks corresponds to the second dimension, i.e. 7500.
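
Concretely, the preprocessing is roughly the following (a minimal sketch with placeholder variable names, not the exact code I run):

import numpy as np

# Illustrative only: flatten each (150, 50) velocity snapshot into a
# 7500-dimensional vector and standardize every column (task) to zero mean
# and unit variance, matching the zero-mean prior assumption.
velocity = np.random.randn(4800, 150, 50)   # stands in for the real snapshots
Y = velocity.reshape(4800, 150 * 50)        # (num_snapshots, num_tasks) = (4800, 7500)
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)    # zero mean, unit variance per task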

To reproduce

**Code snippet to reproduce**

import numpy as np
import torch
import gpytorch
import tqdm.notebook
import matplotlib.pyplot as plt

# X_reshaped, Y_closure_u_reshaped, Y_closure_u_test_reshaped,
# inducing_points_centers and device are defined earlier during data loading
# and preprocessing.

fig, ax = plt.subplots(1, 1, figsize=(10, 5))  # figure used to plot the loss afterwards
losses = []

class MultitaskGPModel(gpytorch.models.ApproximateGP):
    def __init__(self, num_latents, num_tasks, n_features, inducing_points_centers):
        # One copy of the inducing point locations per latent GP.
        inducing_points = np.repeat(inducing_points_centers[np.newaxis, :, :], num_latents, axis=0)
        inducing_points = torch.tensor(inducing_points, dtype=torch.float)
        variational_distribution = gpytorch.variational.CholeskyVariationalDistribution(
            inducing_points.size(-2), batch_shape=torch.Size([num_latents])
        )

        # Wrap the VariationalStrategy in an LMCVariationalStrategy so that the
        # output is a MultitaskMultivariateNormal rather than a batch output.
        variational_strategy = gpytorch.variational.LMCVariationalStrategy(
            gpytorch.variational.VariationalStrategy(
                self, inducing_points, variational_distribution, learn_inducing_locations=True
            ),
            num_tasks=num_tasks,
            num_latents=num_latents,
            latent_dim=-1
        )

        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean(batch_shape=torch.Size([num_latents]))
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(batch_shape=torch.Size([num_latents])),
            batch_shape=torch.Size([num_latents])
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)


num_tasks = Y_closure_u_test_reshaped.shape[1]  # 150 * 50 = 7500 grid points
n_features = X_reshaped.size(-1)
num_latents = 20  # 20 worked best (10 was also tried)
num_epochs = 10000

model = MultitaskGPModel(num_latents, num_tasks, n_features, inducing_points_centers).to(device)
likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=num_tasks).to(device)
model.train()
likelihood.train()
optimizer = torch.optim.Adam([
    {'params': model.parameters()},
    {'params': likelihood.parameters()},
], lr=0.01)

mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=Y_closure_u_reshaped.size(0))

for epoch in tqdm.notebook.tqdm(range(num_epochs), desc="Epoch (LR=0.01)"):
    optimizer.zero_grad()
    output = model(X_reshaped)
    loss = -mll(output, Y_closure_u_reshaped)
    if loss.item() <= -11000:  # early stop once the negative ELBO is low enough
        break
    losses.append(loss.item())
    loss.backward()
    optimizer.step()
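
For reference, the recorded losses are then plotted on the figure created at the top of the snippet, roughly as below (not part of the original script); this is what produces the training-loss figure attached further down.

# Plot the recorded negative ELBO per epoch on the figure created earlier.
ax.plot(losses)
ax.set_xlabel("Epoch")
ax.set_ylabel("Negative ELBO (training loss)")
fig.tight_layout()
plt.show()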

**Stack trace/error message**


Expected Behavior

System information

Please complete the following information:

  • GPyTorch version: 1.11
  • PyTorch version: 1.13.1
  • OS: Windows

Additional context

I've attached the loss curve below. I don't understand why the oscillations are so large. Does anyone know what I could do to regularize the training process and get rid of them?
[Figure: training loss (negative ELBO) per epoch, showing extreme oscillations]

sanaamouzahir changed the title from "[Bug]" to "[Bug] Extreme loss oscillation during training" on Feb 27, 2024