Hello Petronio, I noticed that Chen's conventional result looks "suspiciously" good, as @wangtieqiao shared in #25. It appears that the plot is comparing the predicted t+1 values against the current time series at time t.
Interestingly, I encountered a similar issue while working with the library and a PWFTS model of order 1. My validation plot (see image below) shows a great fit of my predicted values on the actual values (too great, I'd say), but the RMSE calculated using Measures.get_point_statistics turned out to be unexpectedly high at 2.32 compared to my other fits.
Here is my code:
# Imports assumed from context (they were not part of my original snippet);
# `diff` is presumably the first-order differential transformation.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pyFTS.models import pwfts
from pyFTS.partitioners import Grid
from pyFTS.benchmarks import Measures
from pyFTS.common import Transformations

diff = Transformations.Differential(1)

def Cash_in(train_set, valid_set):
    rows = []
    fig, ax = plt.subplots(nrows=1, ncols=1, figsize=[12, 8])
    y_val = pd.Series(valid_set['Scale_Montant'])
    y_train = pd.Series(train_set['Scale_Montant'])
    ax.plot(y_val.values, label='Validation', color='black')
    for method in [pwfts.ProbabilisticWeightedFTS]:
        for partitions in [Grid.GridPartitioner]:
            for npart in [4]:
                for order in [1]:
                    part = partitions(data=y_train.values, npart=npart, transformation=diff)
                    model = method(order=order, partitioner=part)
                    model.append_transformation(diff)
                    model.name = model.shortname + str(partitions).replace('>', '').replace('<', '').replace('class', '') + str(npart) + str(order)
                    model.fit(y_train.values)
                    # Validation forecast
                    forecasted_values_valid = model.predict(y_val.values)
                    # Plot the forecasted values against the actual validation series
                    ax.plot(np.array(forecasted_values_valid),
                            label=str(model.shortname) + str(partitions) + str(npart) + ' partitions ' + str(order) + ' order',
                            color='blue')
                    ax.set_title('Validation')
                    # Performance measures on the validation set
                    rmse_v, mape_v, u_v = Measures.get_point_statistics(y_val.values, model)
                    rows.append([model.shortname, str(partitions).replace('>', '').replace('<', '').replace('class', ''), npart, order, rmse_v, mape_v, u_v])
    handles, labels = ax.get_legend_handles_labels()
    lgd = ax.legend(handles, labels, loc=1, bbox_to_anchor=(1, 1))
    plt.show()
    result_cash_in = pd.DataFrame(rows, columns=['Model', 'partitions_techniques', '#_partitions', 'order', 'RMSE_Valid', 'MAPE_Valid', 'U_Valid'])
    pd.set_option('max_colwidth', None)
    return result_cash_in, forecasted_values_valid

Weekly_Cash_in_models, forecasts_df_valid = Cash_in(train_set, valid_set)
To investigate further, I manually computed the RMSE on my validation set using this formula from #25:
Surprisingly, computing it by hand with my forecast array as-is gave me a different RMSE of 1.04. Trying to figure out what was going on, I decided to prepend None to the first observations of my forecast using:
for k in np.arange(order):
    forecasted_values_valid.insert(0, None)
which effectively shifted the forecast array one position to the right. After doing this, I recalculated the RMSE and got 2.34, much closer to the 2.32 returned by get_point_statistics.
It turns out that the issue was caused by me not inserting None values at the start of the forecast array so that it aligns with the actuals from position [model.order:] onward (in my case, from index 1). I didn't insert None for order 1 because I was following an example from one of your notebooks: [Link to the notebook].
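For clarity, here is a minimal sketch of that manual check (my own code, not the library's internals), assuming the convention that the i-th forecast targets the actual value at position i + order; the helper names `aligned_rmse` and `naive_rmse` are just illustrative:

```python
import numpy as np

def aligned_rmse(actual, forecasts, order):
    # forecasts[i] is compared with actual[i + order]
    actual = np.asarray(actual, dtype=float)
    target = actual[order:]
    predicted = np.asarray(forecasts[:len(target)], dtype=float)
    return np.sqrt(np.nanmean((target - predicted) ** 2))

def naive_rmse(actual, forecasts):
    # what I did first: forecasts[i] is compared with actual[i]
    n = min(len(actual), len(forecasts))
    errors = np.asarray(actual[:n], dtype=float) - np.asarray(forecasts[:n], dtype=float)
    return np.sqrt(np.nanmean(errors ** 2))

# On my data, the naive version gave ~1.04 while the aligned version gave ~2.34,
# which matches the 2.32 reported by Measures.get_point_statistics.
```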
for order in np.arange(1, 4):
    part = Grid.GridPartitioner(data=y, npart=10)
    model = hofts.HighOrderFTS(order=order, partitioner=part)
    model.fit(y)
    forecasts = model.predict(y)
    if order > 1:
        for k in np.arange(order):
            forecasts.insert(0, None)
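For comparison, a version that is consistent with the alignment argument above would pad for every order, including order 1 (this is my own sketch, not taken from a notebook; `y` is the same series used in the example above):

```python
import numpy as np
from pyFTS.partitioners import Grid
from pyFTS.models import hofts

for order in np.arange(1, 4):
    part = Grid.GridPartitioner(data=y, npart=10)
    model = hofts.HighOrderFTS(order=order, partitioner=part)
    model.fit(y)
    forecasts = model.predict(y)
    # Pad unconditionally: the first `order` observations have no forecast,
    # so the forecast curve lines up with the actual series when plotted or scored.
    for k in np.arange(order):
        forecasts.insert(0, None)
```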
I've noticed some "contradictory" information while going through various notebooks and the pyFTS tutorial ([Link to the tutorial](https://sbic.org.br/lnlm/wp-content/uploads/2021/12/vol19-no2-art3.pdf)). Some sources suggest that we need to insert None so that the forecasts align with the actuals at [model.order:], even at order 1.
As I understand it, the parameter "order" represents the number of lags used to predict the next observation. With an order of 1, you cannot have a predicted value for the first observation of the array, since the first observation is what is used to predict the second one.
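A toy illustration of the indexing I have in mind for order 1 (my own assumption about the alignment, not library code):

```python
import numpy as np

y = [10.0, 12.0, 11.0, 13.0]    # toy actual series
forecasts = [11.8, 11.2, 12.9]  # hypothetical order-1 forecasts: forecasts[i] targets y[i + 1]

# There is no forecast for y[0], so either prepend one None to `forecasts`
# or compare it against y[1:] when measuring error or plotting.
errors = np.array(y[1:]) - np.array(forecasts)
print(np.sqrt(np.mean(errors ** 2)))  # RMSE over the aligned pairs
```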
When examining one of the notebooks (picture below), it seems that not assigning None to the first [model.order] positions causes the plot of fitted values to shift to the left by the order (by 2 in that example), rendering the graph invalid. I believe this could be what is happening in the Chen Conventional notebook.
I would greatly appreciate some clarification on why some notebooks recommend inserting None so that the forecasts start at position [model.order:], while others do not. It's a bit confusing, especially when different examples apply different manipulations to the same model.
Thank you for your time, and I'm looking forward to resolving this confusion of mine.