Misconfiguration Exception: The provided lr scheduler 'StepLR' doesn't follow PyTorch's LRScheduler API. And Leaked semaphore objects to clean up at shutdown. #524

Closed
cash-mckeeman opened this issue Apr 15, 2023 · 4 comments

Comments

@cash-mckeeman

Greetings

I have been trying to go through your tutorials and keep getting this MisconfigurationException (below). I have tried running it locally and on Google Colab. Please let me know if there is an easy fix.

Thanks!
Ryan

import numpy as np
import pandas as pd
import pytorch_lightning as pl
import matplotlib.pyplot as plt

from neuralforecast import NeuralForecast
from neuralforecast.models import MLP
from neuralforecast.losses.pytorch import DistributionLoss, Accuracy
from neuralforecast.tsdataset import TimeSeriesDataset
from neuralforecast.utils import AirPassengers, AirPassengersPanel, AirPassengersStatic

AirPassengersPanel['y'] = 1 * (AirPassengersPanel['trend'] % 12) < 2
Y_train_df = AirPassengersPanel[AirPassengersPanel.ds<AirPassengersPanel['ds'].values[-12]].reset_index(drop=True) # 132 train
Y_test_df = AirPassengersPanel[AirPassengersPanel.ds>=AirPassengersPanel['ds'].values[-12]].reset_index(drop=True) # 12 test

model = MLP(h=12,
            input_size=24,
            loss=DistributionLoss(distribution='Bernoulli', level=[80, 90], return_params=True),
            valid_loss=Accuracy(),
            stat_exog_list=['airline1'],
            scaler_type='robust',
            max_steps=200,
            early_stop_patience_steps=2,
            val_check_steps=10,
            learning_rate=1e-3)

fcst = NeuralForecast(models=[model], freq='M')
fcst.fit(df=Y_train_df, static_df=AirPassengersStatic, val_size=12)
forecasts = fcst.predict(futr_df=Y_test_df)

# Plot quantile predictions

Y_hat_df = forecasts.reset_index(drop=False).drop(columns=['unique_id','ds'])
plot_df = pd.concat([Y_test_df, Y_hat_df], axis=1)
plot_df = pd.concat([Y_train_df, plot_df])

plot_df = plot_df[plot_df.unique_id=='Airline1'].drop('unique_id', axis=1)
plt.plot(plot_df['ds'], plot_df['y'], c='black', label='True')
plt.plot(plot_df['ds'], plot_df['MLP-median'], c='blue', label='median')
plt.fill_between(x=plot_df['ds'][-12:],
                 y1=plot_df['MLP-lo-90'][-12:].values,
                 y2=plot_df['MLP-hi-90'][-12:].values,
                 alpha=0.4, label='level 90')
plt.legend()
plt.grid()
plt.plot()

Error:

MisconfigurationException
Traceback (most recent call last)
in <cell line: 28>()
26
27 fcst = NeuralForecast(models=[model], freq='M')
---> 28 fcst.fit(df=Y_train_df, static_df=AirPassengersStatic, val_size=12)
29 forecasts = fcst.predict(futr_df=Y_test_df)
30

10 frames
/usr/local/lib/python3.9/dist-packages/pytorch_lightning/core/optimizer.py in _validate_scheduler_api(lr_scheduler_configs, model)
348
349 if not isinstance(scheduler, LRSchedulerTypeTuple) and not is_overridden("lr_scheduler_step", model):
--> 350 raise MisconfigurationException(
351 f"The provided lr scheduler {scheduler.__class__.__name__} doesn't follow PyTorch's LRScheduler"
352 " API. You should override the LightningModule.lr_scheduler_step hook with your own logic if"

MisconfigurationException: The provided lr scheduler StepLR doesn't follow PyTorch's LRScheduler API. You should override the LightningModule.lr_scheduler_step hook with your own logic if you are using a custom LR scheduler.

@kdgutier
Collaborator

kdgutier commented Apr 15, 2023

Hey @cash-mckeeman,

Is your bug related to past issue #509?
I think they have a solution there.
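
The traceback above shows Lightning rejecting StepLR in an isinstance check against LRSchedulerTypeTuple. One plausible cause, which is an assumption here and not confirmed in this thread, is a pytorch_lightning release that predates the PyTorch version installed next to it. A minimal sketch for listing the installed versions before comparing them with the ones discussed in #509 (the package list is my own choice):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("torch", "pytorch-lightning", "neuralforecast")):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per package so the output can be pasted into an issue report.
for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

On Google Colab the preinstalled torch may be newer than what a pinned pytorch_lightning expects, so it is worth running this in the same runtime that raised the exception.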

@cash-mckeeman
Author

I tried the solution listed in Issue #509. After installing the latest from the GitHub main branch, I got a new error:

NotImplementedError: Support for validation_epoch_end has been removed in v2.0.0. NHITS implements this method. You can use the on_validation_epoch_end hook instead. To access outputs, save them in-memory as instance attributes. You can find migration examples in Lightning-AI/pytorch-lightning#16520.
2023-04-15 19:54:13,782 ERROR tune.py:794 -- Trials did not complete: [train_tune_f5a97_00000, train_tune_f5a97_00001, train_tune_f5a97_00002, train_tune_f5a97_00003, train_tune_f5a97_00004]
2023-04-15 19:54:13,789 WARNING experiment_analysis.py:621 -- Could not find best trial. Did you pass the correct metric parameter?
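
The NotImplementedError above describes a hook migration: Lightning 2.0 no longer passes step outputs to validation_epoch_end, so a module has to store them itself and aggregate them in on_validation_epoch_end. The sketch below illustrates only that pattern. It is framework-free so that it runs without Lightning installed, the class name and the loss are hypothetical, and it is not how neuralforecast's NHITS is actually implemented. In real code the class would subclass pytorch_lightning.LightningModule and log the result rather than return it.

```python
class ValidationOutputsSketch:
    """Plain-Python stand-in for a LightningModule (hypothetical example)."""

    def __init__(self):
        # Step outputs are kept in memory as an instance attribute,
        # as the error message suggests.
        self.validation_step_outputs = []

    def validation_step(self, batch, batch_idx):
        # Hypothetical loss: mean absolute value of the batch.
        loss = sum(abs(x) for x in batch) / len(batch)
        self.validation_step_outputs.append(loss)
        return loss

    def on_validation_epoch_end(self):
        # Replaces the removed validation_epoch_end(self, outputs) hook.
        outputs = self.validation_step_outputs
        avg_loss = sum(outputs) / len(outputs) if outputs else float("nan")
        # Clear the stored outputs so they do not accumulate across epochs.
        self.validation_step_outputs.clear()
        return avg_loss


module = ValidationOutputsSketch()
module.validation_step([1.0, -1.0], batch_idx=0)
module.validation_step([3.0, -3.0], batch_idx=1)
print(module.on_validation_epoch_end())
```

For a library model such as NHITS this change belongs in the library, so on the user side the practical fix is a matching pair of neuralforecast and pytorch_lightning versions.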

@cash-mckeeman
Author

Hey @kdgutier - I tried the rest of the solution on Issue #509 and I think I got the dependencies fixed. I am now getting a new error:

UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown.

Based on my research, this could mean that my local machine ran out of memory trying to run the nf.fit() method for AutoNHITS and AutoTFT models. I see in your documentation that you have run neuralforecast on ECR in AWS. Is there a link to review the configuration of the ECR and the Docker image that was used?

@kdgutier
Collaborator

kdgutier commented Apr 20, 2023

Hey @cash-mckeeman,

The semaphore objects bug arises when the RAM is saturated. See #513.
My belief is that the new PyTorch 2.0 uses multiprocessing, which leaks memory when RAM is exceeded.

What solved the issue for me was decreasing the memory usage. You can do this through valid_batch_size, windows_batch_size, and by decreasing the model's hidden units. Can you confirm whether this solves the issue?
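
As a rough illustration of the suggestion above, the sketch below collects smaller settings in one dict and passes them to the MLP model from the original report. The numbers are illustrative, not tuned values, and hidden_size is my assumption for the keyword that controls MLP's hidden units. Other models may name it differently, so check the signature of the model you are using.

```python
# Hypothetical reduced-memory settings; the values are illustrative only.
low_memory_kwargs = {
    "valid_batch_size": 8,      # fewer series per validation batch
    "windows_batch_size": 128,  # fewer sampled windows per training step
    "hidden_size": 256,         # assumed name of MLP's hidden-unit argument
}


def build_low_memory_mlp():
    """Build the MLP from the report with the smaller settings applied."""
    # Imported lazily so that the dict above can be inspected
    # even where neuralforecast is not installed.
    from neuralforecast.models import MLP

    return MLP(h=12, input_size=24, max_steps=200, **low_memory_kwargs)
```

For the Auto models the same keys would go into the search-space config instead of the constructor. Halving one value at a time makes it easier to see which setting relieves the memory pressure.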

On another note, I would recommend using a GPU for computationally intensive methods like TFT.

@kdgutier changed the title from "Misconfiguration Exception: The provided lr scheduler 'StepLR' doesn't follow PyTorch's LRScheduler API." to "Misconfiguration Exception: The provided lr scheduler 'StepLR' doesn't follow PyTorch's LRScheduler API. 1 leaked semaphore objects to clean up at shutdown." on Apr 20, 2023
@kdgutier changed the title from "Misconfiguration Exception: The provided lr scheduler 'StepLR' doesn't follow PyTorch's LRScheduler API. 1 leaked semaphore objects to clean up at shutdown." to "Misconfiguration Exception: The provided lr scheduler 'StepLR' doesn't follow PyTorch's LRScheduler API. And Leaked semaphore objects to clean up at shutdown." on Apr 20, 2023