When the network backpropagates the error, it should pass the learning rate divided by the epoch size. Passing the bare learning rate instead means each example is effectively treated as its own epoch, producing outsized swings in the weights and biases. As a result, running the code with different epoch sizes does not change the outcome, because every example still gets a full-size update.
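A minimal sketch of the intended fix, assuming "epoch size" here means the number of examples accumulated per update (i.e., the batch size). The names `update_batch` and `backprop`, and the toy linear model, are illustrative, not from the original code:

```python
import numpy as np

def backprop(w, b, x, y):
    """Per-example gradient of squared error for a linear model w @ x + b."""
    err = w @ x + b - y
    return np.outer(err, x), err

def update_batch(w, b, batch, lr):
    """One gradient step averaged over the batch.

    Scaling by lr / len(batch) is the fix: without the division, every
    example applies a full learning-rate step, so the epoch/batch size
    has no effect on the step taken.
    """
    gw = np.zeros_like(w)
    gb = np.zeros_like(b)
    for x, y in batch:                 # accumulate per-example gradients
        dw, db = backprop(w, b, x, y)
        gw += dw
        gb += db
    scale = lr / len(batch)            # lr divided by the batch size, not bare lr
    return w - scale * gw, b - scale * gb

# Example usage on random data:
rng = np.random.default_rng(0)
w, b = rng.normal(size=(1, 3)), np.zeros(1)
batch = [(rng.normal(size=3), rng.normal(size=1)) for _ in range(32)]
w, b = update_batch(w, b, batch, lr=0.1)
```

With the division in place, the accumulated gradient is averaged rather than summed, so doubling the batch size no longer doubles the size of each weight update.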