Dividing the dataset into train and test #3

mamadpierre · 2019-03-18T04:27:34Z

Two issues related to each other:

First, as far as I can see in the code, self.beta is directly responsible for prediction. However, this matrix is updated recursively until the end of the process (to be more exact, up to len(sequence)-predictionStep).
For instance, in FOS_ELM we have:
    self.beta = self.beta + np.dot(self.M, np.dot(Ht, targets - np.dot(H, self.beta)))
and this training step sits in a for loop together with the prediction of the next sample:
    for i in range(numLags, len(sequence)-predictionStep-1):
        net.train(X[[i], :], T[[i], :])
        Y = net.predict(X[[i+1], :])
        predictedInput[i+1] = Y[-1]
I believe this update should run only up to the end of the training data; after that, the test data should be fed to the prediction function only.
Second, the whole dataset is normalized at once. If you divide the dataset into train and test sets, you cannot normalize the test set with its own statistics (you can use the mean and variance of the train set to normalize the test set).

Based on these two issues, the NRMSE mentioned at the end of the process is not reliable.
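
To make the normalization point concrete, here is a minimal sketch of a chronological split where both parts are scaled with the train statistics only. The function name split_and_normalize and the train_fraction parameter are made up for illustration and are not part of the repository.

```python
import numpy as np

def split_and_normalize(sequence, train_fraction=0.7):
    """Split a 1-D series chronologically and scale both parts
    with the mean and standard deviation of the train part only."""
    sequence = np.asarray(sequence, dtype=float)
    n_train = int(len(sequence) * train_fraction)
    train, test = sequence[:n_train], sequence[n_train:]
    # Statistics come from the train part; the test part never contributes.
    mean, std = train.mean(), train.std()
    return (train - mean) / std, (test - mean) / std, mean, std

# Toy series, just to show the call.
series = np.arange(10, dtype=float)
train_n, test_n, mean, std = split_and_normalize(series, train_fraction=0.5)
```

The normalized test values are generally not zero-mean or unit-variance, which is expected: they are expressed in the scale learned from the train set.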


chickenbestlover · 2019-03-18T05:01:43Z

In the online learning setting, we don't need to divide the dataset into test and train.
Given a pair of an input and a target, the model first predicts an output from the input, and only then is it trained on that input and the corresponding target. (Training does not affect the prediction, because for every sample the prediction is performed before the model is trained on it.)
In other words, we can say that new training samples are obtained as time progresses.
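
The predict-then-train order described above could be sketched like this. LastValueModel is a hypothetical stand-in with the same predict/train interface, not the network from this repository, and the loop is simplified compared to the script's indexing.

```python
import numpy as np

class LastValueModel:
    """Hypothetical stand-in for an online learner:
    it simply predicts the last target it was trained on."""
    def __init__(self):
        self.last = 0.0

    def predict(self, x):
        return self.last

    def train(self, x, t):
        self.last = float(t)

def prequential_run(model, X, T):
    """Predict each sample before the model is trained on it."""
    preds = np.empty(len(T))
    for i in range(len(T)):
        preds[i] = model.predict(X[i])  # model has seen only samples 0..i-1
        model.train(X[i], T[i])         # the true target arrives afterwards
    return preds

X = np.arange(5.0).reshape(-1, 1)
T = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
preds = prequential_run(LastValueModel(), X, T)
```

Every prediction is made on a sample the model has not been trained on yet, so the error over preds is an out-of-sample error even without an explicit split.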
You're right. Strictly speaking, the mean and variance of the dataset should also be calculated in an online manner. In other words, we should update the mean and variance whenever a new training sample is obtained. Instead, I used the mean and variance calculated from the whole dataset, which is my mistake.

In my defense, since the dataset (NYC Taxi Demands) is a stationary time series, the mean and variance can be expected to change only slightly after a certain length, even if they are obtained in an online manner. Therefore, even if the mean and variance were calculated online, the difference in prediction performance should not be large.
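
For reference, an online mean and variance can be kept with Welford's algorithm. This is only a sketch of the idea; the class RunningStats is hypothetical and not part of the repository.

```python
class RunningStats:
    """Running mean and population variance (Welford's algorithm)."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / self.n if self.n > 0 else 0.0

    def normalize(self, x):
        # Falls back to 0.0 while the variance is still zero.
        std = self.variance ** 0.5
        return (x - self.mean) / std if std > 0 else 0.0

stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)  # in an online run: update, then normalize new samples
```

Each new sample would then be normalized with the statistics available at that moment, so no future values leak into the scaling.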

mamadpierre · 2019-03-18T05:27:42Z

Thanks for your explanation. I understand what you said. This makes sense in the context of online prediction.
But if I want to use the method in an offline manner, then I should follow the procedure I explained.
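
A rough sketch of that offline procedure: update the weights on the train part only, freeze them, and call predict on the test part. RLSLinear below is a hypothetical stand-in (a plain linear model updated by recursive least squares, with no hidden layer and no forgetting factor), not the FOS_ELM class from this repository; the helpers lagged and nrmse and the synthetic sine series are also assumptions for illustration.

```python
import numpy as np

class RLSLinear:
    """Hypothetical stand-in: linear model with a recursive least squares
    update of the same form as the beta update quoted above."""
    def __init__(self, n_inputs, n_outputs=1, ridge=1e-3):
        self.beta = np.zeros((n_inputs, n_outputs))
        self.M = np.eye(n_inputs) / ridge

    def train(self, H, targets):
        Ht = H.T
        gain = np.linalg.inv(np.eye(H.shape[0]) + H @ self.M @ Ht)
        self.M = self.M - self.M @ Ht @ gain @ H @ self.M
        self.beta = self.beta + self.M @ Ht @ (targets - H @ self.beta)

    def predict(self, H):
        return H @ self.beta

def lagged(series, num_lags):
    """Row i holds series[i:i+num_lags]; its target is series[i+num_lags]."""
    X = np.array([series[i:i + num_lags] for i in range(len(series) - num_lags)])
    T = series[num_lags:].reshape(-1, 1)
    return X, T

def nrmse(pred, target):
    return np.sqrt(np.mean((pred - target) ** 2)) / np.std(target)

# Synthetic data, only to make the sketch runnable.
rng = np.random.default_rng(0)
series = np.sin(np.arange(400) * 0.1) + 0.01 * rng.standard_normal(400)

num_lags, n_train = 5, 300
mean, std = series[:n_train].mean(), series[:n_train].std()  # train statistics only
normed = (series - mean) / std
X, T = lagged(normed, num_lags)
split = n_train - num_lags  # rows before `split` have targets inside the train part

net = RLSLinear(n_inputs=num_lags)
for i in range(split):                    # training phase: beta is updated
    net.train(X[[i], :], T[[i], :])

test_pred = net.predict(X[split:])        # test phase: beta stays frozen
test_error = nrmse(test_pred, T[split:])
```

With the real network, the same pattern would apply: call train only inside the first loop and compute the NRMSE on the test predictions alone.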

athammad · 2020-11-17T03:55:07Z

Hi,

Is there any example of how to run the algorithm offline, with a train and test set?

Best wishes

ericleonardo mentioned this issue May 27, 2021

A question about the training process #5

Open
