
IndexError When lags is greater than number of steps skforecast==0.4.3 #151

Closed
JoaquinAmatRodrigo opened this issue Apr 27, 2022 · 9 comments
Labels
bug Something isn't working

Comments

@JoaquinAmatRodrigo
Owner

Another beginner question: what are the conditions for refit = True?

I get the following error:

d:\programy\miniconda3\lib\site-packages\skforecast\ForecasterAutoreg\ForecasterAutoreg.py in _recursive_predict(self, steps, last_window, exog)
405
406 for i in range(steps):
--> 407 X = last_window[-self.lags].reshape(1, -1)
408 if exog is not None:
409 X = np.column_stack((X, exog[i, ].reshape(1, -1)))

IndexError: index -6 is out of bounds for axis 0 with size 4

In case it matters, these are my inputs:

data.shape (50,)
data_train.shape (37,)
data_test.shape (13,)
steps = 13
initial lags: lags = int(data_train.shape[0]*0.4) = 14

The whole grid search looks like this:

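import numpy as np
from sklearn.metrics import mean_squared_log_error
from xgboost import XGBRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.model_selection import grid_search_forecaster

forecaster_rf = ForecasterAutoreg(
                    regressor = XGBRegressor(verbosity=1),
                    lags      = lags
                )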
param_grid = {
            'gamma': [0.5, 1, 1.5, 2, 5],
            'subsample': [0.6, 0.8, 1.0],
            'colsample_bytree': [0.6, 0.8, 1.0],
            'max_depth': np.arange(2, 22, 2)
            }

lags_grid = [6, 12, lags, [1, 3, 6, 12, lags]]

The following lags_grid values throw an error too:
lags_grid = np.arange(1, 3, 1)
lags_grid = [1]

metric = mean_squared_log_error

results_grid = grid_search_forecaster(
                        forecaster         = forecaster_rf,
                        y                  = data_train,
                        param_grid         = param_grid,
                        lags_grid          = lags_grid,
                        steps              = steps,
                        metric             = metric,
                        refit              = True,
                        initial_train_size = int(len(data_train)*0.5),
                        return_best        = True,
                        verbose            = True
                   )

Originally posted by @spike8888 in #137 (comment)

@spike8888

Hi!

Has anyone had time to look at this problem?

@JoaquinAmatRodrigo
Owner Author

Hi @spike8888,
This error is probably due to a bug in the piece of code that stores the values of the last window. We are trying to identify and fix it.

@JavierEscobarOrtiz
Collaborator

JavierEscobarOrtiz commented May 9, 2022

Hi @spike8888,

The error occurs when max_lag is greater than the number of observations used for training. In your example:

max_lag = 12
initial_train_size = 18

Therefore, the number of observations used to fit the model is 18 - 12 = 6.

Since last_window only stored as many values as there were observations used in fit (6 in this case), the function raised an error, because it needs the last 12 values to predict step n+1.
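To make the failure concrete, here is a minimal NumPy sketch of the indexing involved, assuming (as described above) that the stored window held only len(y) - max_lag values. The names `lag_features`, `short_window` and `full_window` are illustrative only, not skforecast internals; the indexing mirrors the `last_window[-self.lags]` line from the traceback.

```python
import numpy as np

def lag_features(last_window: np.ndarray, lags: np.ndarray) -> np.ndarray:
    """Select the lagged values needed to predict step n+1 (simplified sketch)."""
    # lag 1 is the last value, lag 2 the one before it, and so on
    return last_window[-lags].reshape(1, -1)

y_train = np.arange(18, dtype=float)  # initial_train_size = 18
lags = np.arange(1, 13)               # lags 1..12, so max_lag = 12
max_lag = int(lags.max())

# Buggy behaviour (sketch): the stored window only has as many values
# as there were rows in the training matrix, 18 - 12 = 6
n_fit = len(y_train) - max_lag
short_window = y_train[-n_fit:]

try:
    lag_features(short_window, lags)
    raised = False
except IndexError as err:
    # indices -7 .. -12 do not exist in a window of size 6
    raised = True
    print("IndexError:", err)

# Expected behaviour (sketch): keep the last max_lag values of the series
full_window = y_train[-max_lag:]
X = lag_features(full_window, lags)
print(X.shape)  # (1, 12)
```

In this sketch, keeping at least max_lag values in the window is enough for every lag to be available when predicting the next step.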

We have fixed it in version 0.5.0. This version is still under development, but you can install it from GitHub by running the following in the shell:

pip install git+https://github.com/JoaquinAmatRodrigo/skforecast@0.5.x 

Please note that some features of this release, such as bayesian_search_forecaster, are still under development. However, anything you do with previous versions should also work in the new one.

@spike8888

Thank you very much for the answer. I will check it out soon.

@spike8888

I checked it out. The error is gone, but there seems to be a stopping rule in the code, which is somewhat dangerous: in my case the grid search stopped after only 2 models had been computed.
Please consider displaying a warning that, depending on the combination of lags and steps, not all combinations will be evaluated.
Is there a function that can return max_lag based on the data?

@JavierEscobarOrtiz
Collaborator

Hello @spike8888, could you show an example of your grid search? I don't quite understand the problem.

Regarding max_lag, the training matrix will have a length equal to len(y) - max_lag. So, in an extreme case, if your series y has 50 data points and you use max_lag = 48, you will only have 2 rows to train your model.
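As a rough answer to the max_lag question, the relationship above can be turned into two small helpers. These are hypothetical functions written for illustration, not part of skforecast. The grid below reuses the lag candidates from the original report (with lags = 14) and evaluates them on the 18 observations of the first training fold (initial_train_size = 18).

```python
import numpy as np

def n_training_rows(n_obs: int, lags) -> int:
    """Rows in the training matrix: n_obs - max_lag (hypothetical helper)."""
    # `lags` may be an int (lags 1..n) or a list of specific lags
    max_lag = int(np.max(lags))
    return n_obs - max_lag

def max_feasible_lag(n_obs: int, min_rows: int = 1) -> int:
    """Largest max_lag that still leaves `min_rows` training rows (hypothetical helper)."""
    return n_obs - min_rows

# Extreme case from the comment: 50 points and max_lag = 48 -> 2 rows
print(n_training_rows(50, 48))

# Lag candidates from the report, evaluated on the first 18 observations
lags_grid = [6, 12, 14, [1, 3, 6, 12, 14]]
rows = [n_training_rows(18, lags) for lags in lags_grid]
print(rows)  # [12, 6, 4, 4]
```

Choosing `min_rows` is a judgment call: a handful of rows is technically enough to fit, but rarely enough to fit well.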

@spike8888

It seems I do not understand the whole concept of lags. Are they used to predict the next step (the next value I want to predict)? If so, why do we use the whole history for training when it is much longer than the lags?

@JavierEscobarOrtiz
Collaborator

Hello @spike8888,

You can find a good explanation of lags and the training matrix in the documentation, or simply by searching online.

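To summarize, an autoregressive model is trained on the series' own past values. If you use, for example, lags=3, the 3 values preceding each point are used as predictors for that point. Every point that has 3 previous values therefore becomes one row of the training matrix, which is why the whole history is used: a longer history gives more training rows (len(y) - max_lag). The method create_train_X_y can help you understand this: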

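import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg

# Create a forecaster with lags=3
# ==============================================================================
forecaster = ForecasterAutoreg(
                    regressor = RandomForestRegressor(random_state=123),
                    lags      = 3
             )

# Create a series with 10 points
# ==============================================================================
y = pd.Series(np.arange(10))

# create_train_X_y returns the training matrix (X) and the target (y)
X_train, y_train = forecaster.create_train_X_y(y=y)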

Then we can print the training matrix.

X:

forecaster.create_train_X_y(y=y)[0]

   lag_1  lag_2  lag_3
3      2      1      0
4      3      2      1
5      4      3      2
6      5      4      3
7      6      5      4
8      7      6      5
9      8      7      6

y:

forecaster.create_train_X_y(y=y)[1]

   y
3  3
4  4
5  5
6  6
7  7
8  8
9  9
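For comparison, the same matrix can be reproduced with plain pandas by shifting the series. This is only a sketch of the idea using `Series.shift`; `make_lag_matrix` is a hypothetical helper, not how skforecast builds the matrix internally. It also shows why the whole history is used: every observation after the first `lags` values contributes one training row.

```python
import numpy as np
import pandas as pd

def make_lag_matrix(y: pd.Series, lags: int):
    """Build a lag matrix and target with pandas.shift (hypothetical helper)."""
    # Column lag_k holds the value observed k steps before each point
    X = pd.concat({f"lag_{k}": y.shift(k) for k in range(1, lags + 1)}, axis=1)
    # The first `lags` rows contain NaN (not enough history), so drop them
    X = X.iloc[lags:].astype(int)
    target = y.iloc[lags:].rename("y")
    return X, target

y = pd.Series(np.arange(10))
X, target = make_lag_matrix(y, lags=3)
print(X)
print(target)
# 10 observations - 3 lags = 7 training rows
print(len(X))
```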

@JavierEscobarOrtiz JavierEscobarOrtiz changed the title refit = True gridsearch forecaster IndexError When lags is greater than number of steps skforecast==0.4.3 Jun 20, 2022
@JavierEscobarOrtiz JavierEscobarOrtiz added the bug Something isn't working label Jun 20, 2022
@JoaquinAmatRodrigo
Owner Author

Fixed in version 0.5.0.


3 participants