IndexError When lags is greater than number of steps skforecast==0.4.3 #151

JoaquinAmatRodrigo · 2022-04-27T08:39:09Z

Another beginner question - what are the conditions for
refit = True?

I get the error below:

d:\programy\miniconda3\lib\site-packages\skforecast\ForecasterAutoreg\ForecasterAutoreg.py in _recursive_predict(self, steps, last_window, exog)
405
406 for i in range(steps):
--> 407 X = last_window[-self.lags].reshape(1, -1)
408 if exog is not None:
409 X = np.column_stack((X, exog[i, ].reshape(1, -1)))

IndexError: index -6 is out of bounds for axis 0 with size 4

In case it matters, this is my input data:

data.shape (50,)
data_train.shape (37,)
data_test.shape (13,)
steps = 13
initial lags: lags = int(data_train.shape[0]*0.4) = 14

The whole grid search looks like this:

forecaster_rf = ForecasterAutoreg(
                    regressor = XGBRegressor(verbosity=1),
                    lags = lags
             )

param_grid = {
            'gamma': [0.5, 1, 1.5, 2, 5],
            'subsample': [0.6, 0.8, 1.0],
            'colsample_bytree': [0.6, 0.8, 1.0],
            'max_depth': np.arange(2, 22, 2)
            }

lags_grid = [6, 12, lags, [1, 3, 6, 12, lags]]

The lags below throw an error too:
lags_grid = np.arange(1, 3, 1)
lags_grid = [1]

metric = mean_squared_log_error

results_grid = grid_search_forecaster(
                        forecaster         = forecaster_rf,
                        y                  = data_train,
                        param_grid         = param_grid,
                        steps              = steps,
                        metric             = metric,
                        refit              = True,
                        initial_train_size = int(len(data_train)*0.5),
                        return_best        = True,
                        verbose            = True
                   )

Originally posted by @spike8888 in #137 (comment)

spike8888 · 2022-05-05T11:54:40Z

Hi!

Has anyone had the time and a chance to look at this problem?

JoaquinAmatRodrigo · 2022-05-06T10:51:12Z

Hi @spike8888,
This error is probably due to a bug in the piece of code that stores the values of the last window. We are trying to identify and solve it.

JavierEscobarOrtiz · 2022-05-09T19:16:23Z

Hi @spike8888,

The error occurs when max_lag > observations used for training. In your example:

max_lag = 12
initial_train_size = 18

Therefore, the number of observations used in fit is 18 - 12 = 6.

Since last_window only stored the number of observations used in fit, 6 in this case, the function returns an error because it needs the last 12 values to predict the step n+1.
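The arithmetic above can be reproduced with plain NumPy. This is only a minimal sketch, not the skforecast code: the names short_window and full_window are made up for illustration, and the indexing with -lags mirrors the line shown in the traceback.

```python
import numpy as np

initial_train_size = 18
max_lag = 12
lags = np.arange(1, max_lag + 1)           # lags 1..12

y_train = np.arange(initial_train_size, dtype=float)

# Rows available for fitting once the first max_lag values are used up as lags
n_fit_rows = initial_train_size - max_lag  # 18 - 12 = 6

# Behaviour described above: only the observations used in fit are kept
short_window = y_train[-n_fit_rows:]
try:
    short_window[-lags]                    # asks for 12 values, only 6 stored
    error_message = None
except IndexError as exc:
    error_message = str(exc)

# Keeping the last max_lag values instead makes the same indexing valid
full_window = y_train[-max_lag:]
X = full_window[-lags].reshape(1, -1)      # one row: lag_1, lag_2, ..., lag_12

print(n_fit_rows, error_message, X.shape)
```

The first lookup fails because the stored window is shorter than the largest lag; the second succeeds because the window holds at least max_lag values.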

We fixed it in version 0.5.0. This version is still under development, but you can install it from GitHub by running the following in the shell:

pip install git+https://github.com/JoaquinAmatRodrigo/skforecast@0.5.x

Please note that some features in this release, such as bayesian_search_forecaster, are still under development. However, whatever you do with the previous versions should work in the new one.

spike8888 · 2022-05-19T20:54:12Z

Thank you very much for an answer. I will check it out soon.

spike8888 · 2022-05-24T11:12:58Z

I checked it out. The error is gone, but it seems there is a stop rule in the code, which is somewhat dangerous because in my case the grid search stopped after 2 models had been calculated.
Please consider displaying a warning that, given the mix of lags and steps, not all combinations will be calculated.
Is there a function that can return max_lag based on the data?

JavierEscobarOrtiz · 2022-05-25T11:25:34Z

Hello @spike8888, Could you show an example of your grid_search? I didn't understand your problem.

Regarding max_lag, the training matrix will have a length equal to len(y) - max_lag. So, in an extreme case, if your series y has 50 data points and you use max_lag = 48, you will only have 2 rows to train your model.
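To make the len(y) - max_lag relationship concrete, here is a small sketch in plain NumPy. build_lag_matrix is a hypothetical helper written for this illustration, not a skforecast function; it only assumes that each training row holds the max_lag values preceding its target.

```python
import numpy as np

def build_lag_matrix(y, max_lag):
    """Hypothetical helper: return (X, target) where each row of X holds
    lag_1 ... lag_max_lag for the matching entry of target."""
    y = np.asarray(y)
    n_rows = len(y) - max_lag
    if n_rows <= 0:
        raise ValueError("max_lag must be smaller than len(y)")
    # Column k holds the value observed k steps before each target
    X = np.column_stack(
        [y[max_lag - k : len(y) - k] for k in range(1, max_lag + 1)]
    )
    target = y[max_lag:]
    return X, target

y = np.arange(50)

X_small, target_small = build_lag_matrix(y, max_lag=3)
X_extreme, target_extreme = build_lag_matrix(y, max_lag=48)

print(X_small.shape)    # 47 rows remain with 3 lags
print(X_extreme.shape)  # only 2 rows remain with 48 lags
```

Every extra lag removes one row from the top of the training matrix, so a large max_lag on a short series leaves very little data to fit on.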

spike8888 · 2022-06-11T22:36:55Z

It seems I do not understand the whole concept of lags. Are they used to predict the next step (the next value I want to predict)? If so, why do we pass the whole history as training data, which is much longer than the lags?

JavierEscobarOrtiz · 2022-06-13T08:28:00Z

Hello @spike8888,

You can find a good explanation of lags and the training matrix in the documentation, or by searching online.

To summarize, an autoregressive model is trained on its own past behavior. If you use, for example, lags=3, it will take the 3 steps before each point to train the model. The function create_train_X_y can help you understand this:

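# Imports (display is available by default in Jupyter/IPython)
# ==============================================================================
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg

# Create a forecaster with lags=3
# ==============================================================================
forecaster = ForecasterAutoreg(
                    regressor = RandomForestRegressor(random_state=123),
                    lags      = 3
             )

# Create a series with 10 points
# ==============================================================================
y = pd.Series(np.arange(10))

display(forecaster.create_train_X_y(y=y)[1])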

Then we can print the training matrix.

X:

forecaster.create_train_X_y(y=y)[0]

	lag_1	lag_2	lag_3
3	2	1	0
4	3	2	1
5	4	3	2
6	5	4	3
7	6	5	4
8	7	6	5
9	8	7	6

y:

forecaster.create_train_X_y(y=y)[1]

	y
3	3
4	4
5	5
6	6
7	7
8	8
9	9
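The following sketch may help with the question of why the whole history is used for training while only the lags are used for prediction. It is an illustration in plain NumPy under stated assumptions, not the skforecast implementation: a least-squares fit stands in for the regressor, and the variable names are made up for this example.

```python
import numpy as np

lags = 3
y = np.arange(10, dtype=float)

# Training: every point with `lags` predecessors becomes one row,
# so the whole history contributes examples (7 rows here)
X = np.column_stack([y[lags - k : len(y) - k] for k in range(1, lags + 1)])
target = y[lags:]

# Least squares with an intercept stands in for the regressor (assumption)
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, target, rcond=None)

# Prediction: only the last `lags` values are needed as input
last_window = y[-lags:]
predictions = []
for _ in range(2):
    features = last_window[::-1]                    # lag_1 first
    pred = coef[0] + features @ coef[1:]
    predictions.append(pred)
    last_window = np.append(last_window[1:], pred)  # slide the window forward

print(X.shape, predictions)
```

The history supplies many (lags, next value) pairs to learn from, while each individual prediction consumes only the most recent 3 values. For this simple increasing series the two predictions should come out close to 10 and 11.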

JoaquinAmatRodrigo · 2022-09-24T09:23:31Z

Fixed it in version 0.5.0.

JavierEscobarOrtiz mentioned this issue Jun 11, 2022

IndexError When lags is greater than number of steps #164

Closed

JavierEscobarOrtiz changed the title ~~refit = True gridsearch forecaster~~ IndexError When lags is greater than number of steps skforecast==0.4.3 Jun 20, 2022

JavierEscobarOrtiz added the "bug" label (Something isn't working) Jun 20, 2022

JoaquinAmatRodrigo closed this as completed Sep 24, 2022
