Dividing the dataset into train and test #3

Open
mamadpierre opened this issue Mar 18, 2019 · 3 comments

Comments

@mamadpierre

mamadpierre commented Mar 18, 2019

Two issues related to each other:

  1. As far as I can see in the code, self.beta is directly responsible for prediction. However, this matrix is updated recursively until the end of the process (to be more exact, up to len(sequence)-predictionStep).
    For instance, in FOS_ELM we have:
    ```python
    self.beta = self.beta + np.dot(self.M, np.dot(Ht, targets - np.dot(H, self.beta)))
    ```
    and this training step runs inside a for loop, together with the prediction of the next sample:
    ```python
    for i in range(numLags, len(sequence)-predictionStep-1):
        net.train(X[[i], :], T[[i], :])   # update beta on sample i
        Y = net.predict(X[[i+1], :])      # then predict sample i+1
        predictedInput[i+1] = Y[-1]
    ```
    I believe these updates should continue only until the end of the training data. After that, the test data should be fed only to the prediction function.

  2. The whole dataset is normalized. If you split the dataset into training and test sets, however, you cannot use the test set's own statistics to normalize it. Instead, you can use the mean and variance of the training set to normalize the test set.
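The second point can be illustrated with made-up numbers (not the NYC taxi data). The sketch splits the series chronologically, estimates the mean and standard deviation on the training part only, and reuses those values for the test part. It is a minimal sketch under assumptions: `chronological_split`, `fit_normalizer`, and `normalize` are hypothetical helpers, not functions from the repository, and the 75/25 split is an arbitrary choice.

```python
import statistics

def chronological_split(series, train_fraction=0.75):
    """Split a time series without shuffling: the test part is strictly later."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

def fit_normalizer(train):
    """Estimate mean and (population) standard deviation from training data only."""
    mean = statistics.fmean(train)
    std = statistics.pstdev(train)
    return mean, (std if std > 0 else 1.0)  # guard against a constant series

def normalize(values, mean, std):
    return [(v - mean) / std for v in values]

# Toy data with a level shift in the test period (made-up values).
series = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 30.0, 32.0]
train, test = chronological_split(series)
mean, std = fit_normalizer(train)          # statistics from the training part only
z_train = normalize(train, mean, std)
z_test = normalize(test, mean, std)        # test scaled with training statistics

# For comparison: statistics computed on the whole series include the test period.
leaky_mean, leaky_std = fit_normalizer(series)
print(mean, leaky_mean)                    # prints: 12.0 16.75
```

Computing the statistics on the whole series pulls the mean toward the test period (16.75 instead of 12.0 here). This is the leakage the second point describes.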

Because of these two issues, the NRMSE reported at the end of the process is not reliable.

@chickenbestlover
Owner

  1. In online learning, we don't need to divide the dataset into training and test sets.
    Given an input-target pair, the model first predicts an output from the input, and only then is it trained using that output and the corresponding target. Training does not affect the prediction, because for every sample the prediction is made before the training step.
    In other words, new training samples become available as time progresses.

  2. You're right. Strictly speaking, the mean and variance of the dataset should also be computed in an online manner, so they should be updated whenever a new training sample is obtained. However, I used the mean and variance calculated from the whole dataset, which was my mistake.

In my defense, the dataset (NYC Taxi Demands) is a stationary time series. One can therefore expect the mean and variance to change little after a certain number of samples, even if they are computed online. So even if they were computed with the online method instead, the difference in prediction performance would not be large.
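The procedure described in this reply can be sketched as a test-then-train loop: predict each sample first, then train on it, and update the mean and variance as each new sample arrives. This is a minimal sketch that rests on several assumptions:

- Running statistics use Welford's algorithm with population variance.
- The normalizer is fed only the targets. In one-step-ahead forecasting, the inputs are earlier values of the same series.
- `RunningNormalizer`, `ZeroModel`, and `prequential_run` are hypothetical names.

`ZeroModel` is only a placeholder for an online learner such as FOS_ELM. It always predicts 0 in normalized units, which means its prediction is the running mean.

```python
import math

class RunningNormalizer:
    """Online mean/variance via Welford's algorithm (population variance)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def std(self):
        if self.count < 2:
            return 1.0  # too few samples yet: fall back to unit scale
        std = math.sqrt(self._m2 / self.count)
        return std if std > 0 else 1.0  # guard against a constant history

    def normalize(self, value):
        return (value - self.mean) / self.std


class ZeroModel:
    """Hypothetical stand-in for an online learner (e.g. FOS_ELM).
    Predicts 0 in normalized units and logs the call order."""

    def __init__(self):
        self.log = []

    def predict(self, z_x):
        self.log.append("predict")
        return 0.0

    def train(self, z_x, z_t):
        self.log.append("train")


def prequential_run(model, inputs, targets):
    """Test-then-train over a stream of (input, target) pairs.

    Each sample is normalized with statistics of past targets only,
    predicted, and only then used for training and for updating the statistics.
    """
    norm = RunningNormalizer()
    predictions = []
    for x, t in zip(inputs, targets):
        z_x = norm.normalize(x)
        z_y = model.predict(z_x)                        # made before t is seen
        predictions.append(z_y * norm.std + norm.mean)  # back to original scale
        model.train(z_x, norm.normalize(t))             # learn from this sample
        norm.update(t)                                  # then update statistics
    return predictions
```

Every prediction is made before `train` and `update` see the current target. As a result, neither the model nor the normalization statistics use information from the sample being predicted.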

@mamadpierre
Author

mamadpierre commented Mar 18, 2019

Thanks for your explanation; I understand what you said. This makes sense for online prediction.
But if I want to use the method in an offline manner, I should follow the procedure I described.

@athammad

Hi,

is there any example of how to run the algorithm offline, with a train and test set?

Best wishes
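The offline setting asked about here can be sketched with NumPy, under several assumptions:

- The model is a hand-written OS-ELM with tanh hidden units. `TinyOSELM`, `make_lagged`, and `run_offline` are hypothetical names; this is not the repository's FOS_ELM class.
- NRMSE is taken as RMSE divided by the standard deviation of the test targets. The repository's definition may differ.

The recursive update has the same form as the `self.beta` update quoted in the first comment, with `P` in the role of `self.M`. The original OS-ELM starts with a batch initialization phase. Here, `P` instead starts as a scaled identity, the usual recursive-least-squares initialization, which amounts to a small ridge penalty.

Normalization uses training statistics only. `beta` is updated only over the training set, and the test set is passed only to `predict`.

```python
import numpy as np

class TinyOSELM:
    """Hypothetical minimal OS-ELM; not the repository's FOS_ELM class."""

    def __init__(self, n_inputs, n_hidden, n_outputs, reg=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_inputs, n_hidden))  # fixed random input weights
        self.b = rng.standard_normal(n_hidden)               # fixed random hidden biases
        # Assumed RLS-style start: P = I / reg, beta = 0 (a small ridge penalty).
        self.P = np.eye(n_hidden) / reg
        self.beta = np.zeros((n_hidden, n_outputs))

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def train(self, X, T):
        """One sequential update on a chunk of rows (X, T)."""
        H = self._hidden(X)
        S = np.eye(H.shape[0]) + H @ self.P @ H.T
        self.P = self.P - self.P @ H.T @ np.linalg.solve(S, H @ self.P)
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self._hidden(X) @ self.beta  # read-only: beta is not changed


def make_lagged(z, num_lags):
    """Rows of num_lags past values as inputs, the next value as target."""
    X = np.array([z[i - num_lags:i] for i in range(num_lags, len(z))])
    T = z[num_lags:].reshape(-1, 1)
    return X, T


def run_offline(series, num_lags=4, train_fraction=0.75, n_hidden=20):
    series = np.asarray(series, dtype=float)
    cut = int(len(series) * train_fraction)
    train, test = series[:cut], series[cut:]
    mean, std = train.mean(), train.std()        # training statistics only

    X_tr, T_tr = make_lagged((train - mean) / std, num_lags)
    # Test inputs may look back into the training period; targets are test values only.
    window = np.concatenate([train[-num_lags:], test])
    X_te, T_te = make_lagged((window - mean) / std, num_lags)

    net = TinyOSELM(num_lags, n_hidden, 1)
    for i in range(len(X_tr)):                   # sequential updates, training set only
        net.train(X_tr[[i]], T_tr[[i]])

    Y = net.predict(X_te)                        # test set: prediction only, no updates
    nrmse = np.sqrt(np.mean((Y - T_te) ** 2)) / np.std(T_te)
    return Y.ravel() * std + mean, nrmse


series = np.sin(0.3 * np.arange(200))            # synthetic data, not the NYC taxi series
predictions, nrmse = run_offline(series)
print(predictions.shape, nrmse)
```

Because `P` starts at `I / reg` and `beta` at zero, the sequential updates reproduce the ridge solution $(H^\top H + \text{reg}\, I)^{-1} H^\top T$ on the training rows. Feeding the training set one row at a time therefore gives the same `beta` as a batch fit.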
