Questions #21

I've been following along with your wonderful book for quite a while, and it's a great treasure for the whole community. But there are a couple of questions I've run into recently:

1. In the Execution Phase section of Chapter 10, it says: "Next, at the end of each epoch, the code evaluates the model on the last mini-batch and on the full training set, and it prints out the result. Finally, the model parameters are saved to disk." However, the code evaluating the training set is only fed the last `batch_size` training instances, which confuses me a lot. Why do you evaluate on only the last batch instead of the whole training set (`X_train`, `y_train`)?
2. It always baffles me how to make sure that all the batches in one epoch make up the whole training set. I don't know how `mnist.train.next_batch()` works internally, but could you explain a little about `fetch_batch()`: what do the batches you sample with the seeds look like? To be more specific, do the batches have overlapping instances? And is it correct to assume that all the batches in one epoch cover all the samples?

Comments
Hi @jingw222, thanks a lot, I'm really glad you find my book useful. :) Regarding your first question, there's an error in the text, sorry about that: it should read "on the last mini-batch and on the full test set" (it's listed in the errata page). You could evaluate the training error on the full training set rather than just on the last mini-batch; the only reason I did not do that is that it was very slow, but it would of course be more accurate.
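For instance, here is a minimal sketch of what that evaluation could look like. It assumes the Chapter 10 setup (an `accuracy` op and placeholders `X` and `y`, with a session open), so the names are illustrative rather than the book's exact code:

```python
# Inside the epoch loop, with a tf.Session open as the default session.
# `accuracy`, `X` and `y` are assumed to be defined as in Chapter 10.
acc_batch = accuracy.eval(feed_dict={X: X_batch, y: y_batch})  # last mini-batch (fast)
acc_train = accuracy.eval(feed_dict={X: X_train, y: y_train})  # full training set (slower, more accurate)
print(epoch, "Last batch accuracy:", acc_batch, "Full training set accuracy:", acc_train)
```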
Regarding your second question, you have two options when implementing Mini-batch Gradient Descent:

1. Shuffle the training set at the beginning of each epoch and split it into mini-batches, so that every instance is used exactly once per epoch.
2. Simply sample each mini-batch randomly from the training set, in which case some instances may be sampled several times within an epoch while others are not sampled at all.

Option 1 seems better, but in practice option 2 is actually just as good and simpler to implement (see the sketch below). Hope this helps!
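In case it helps, here is a minimal, self-contained sketch of option 2; the dummy data and the `random_batch` helper are illustrative, not the book's exact code:

```python
import numpy as np

# Dummy training set, just for illustration
m, n = 50, 2
X_train = np.arange(m * n).reshape(m, n)
y_train = np.arange(m)

def random_batch(X_train, y_train, batch_size):
    # Sample indices uniformly at random: within an epoch an instance
    # can end up in several batches, or in none at all.
    rnd_indices = np.random.randint(0, len(X_train), size=batch_size)
    return X_train[rnd_indices], y_train[rnd_indices]

X_batch, y_batch = random_batch(X_train, y_train, batch_size=5)
```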
Thanks a lot for your detailed explanation. I couldn't be more grateful for that. Here's my dummy example: let's just assume we have a training set with indices […]. So, then […]
You're welcome. You're doing it right; this is just one possible technique (option 2), which indeed means that a particular batch may contain the same instance multiple times (but that will be very rare for large datasets and small batches). If you want to implement option 1 instead, here's one approach:

```python
import numpy as np

# Create a dummy training set
m = 50  # number of instances
n = 2   # number of features
X_train = np.arange(m * n).reshape(m, n)
y_train = np.arange(m)

# Start training
n_epochs = 3
n_batches = 10
for epoch in range(n_epochs):
    # Shuffle the indices at the start of each epoch, then split them
    # into batches: every instance is used exactly once per epoch.
    indices = np.random.permutation(m)
    for batch_indices in np.array_split(indices, n_batches):
        X_batch = X_train[batch_indices]
        y_batch = y_train[batch_indices]
        print(X_batch, y_batch)  # train on batch
```
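A couple of notes on this sketch: `np.array_split()` is convenient because it still works when `m` is not divisible by `n_batches` (some batches simply get one extra instance), and since all the indices come from a single permutation, every instance is used exactly once per epoch. In a real training loop you would replace the `print()` call with your training step, e.g. something like `sess.run(training_op, feed_dict={X: X_batch, y: y_batch})` in the book's TensorFlow code (assuming the usual `training_op` and placeholders `X` and `y`).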
Great help! I really learned a lot. Thank you very much indeed. And I'm going to continue to dive into the book. :D
You're welcome! Closing this issue.