When we shuffle the training set, DeepClean fails to converge. This seems to go against most DL training intuition.
Think of a batch as a kind of "meta-sample" used for optimization: it's composed of smaller individual samples whose information contributions are averaged when the gradient is computed for backpropagation. When we randomly shuffle the dataset, we create combinatorially many of these meta-samples, each averaging different information and producing diverse gradient updates, which helps combat overfitting. When we batch sequentially, the batch compositions are identical every epoch, so we're essentially downsizing our dataset by a factor of the batch size and forcing the network to learn from the same information over and over again.
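As a minimal sketch of that difference (using a toy `DataLoader`, not DeepClean's actual data pipeline), this prints which sample indices land in each batch under sequential vs. shuffled loading:

```python
# Toy comparison: batch compositions under sequential vs. shuffled loading.
import torch
from torch.utils.data import DataLoader, TensorDataset

n_samples, batch_size = 16, 4
# Use the sample index itself as the "data" so batches are easy to inspect.
dataset = TensorDataset(torch.arange(n_samples))

for shuffle in (False, True):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)
    print(f"\nshuffle={shuffle}")
    for epoch in range(2):
        # Each batch is a tuple containing one tensor of sample indices.
        batches = [batch[0].tolist() for batch in loader]
        print(f"  epoch {epoch}: {batches}")
```

With `shuffle=False`, every epoch produces the exact same `n_samples / batch_size` batch compositions, so the gradient only ever sees those few averaged "meta-samples"; with `shuffle=True`, the compositions change every epoch.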
It would be really great to understand why we're observing this phenomenon, because it feels like there's performance we're leaving on the table by not understanding it.