
Shuffling produces empirically worse results #28


Description

@alecgunny

When we shuffle the training set, DeepClean fails to converge. This seems to go against most DL training intuition.

Think of the batch as its own sort of "meta-sample" used for optimization, composed of smaller individual samples whose individual information contributions are averaged during the computation of the gradient for backpropagation. When we randomly shuffle the dataset, we create combinatorially many meta-samples that each average different information and produce diverse gradient updates, helping to combat overfitting. When we batch things sequentially, we're essentially downsizing our dataset by a factor of the batch size, forcing the network to learn from the same information over and over again.
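For concreteness, here's a minimal sketch of the two batching strategies using a generic PyTorch `DataLoader`. The tensor shapes and names (`witnesses`, `strain`, `kernel_size`) are illustrative placeholders rather than the actual DeepClean data pipeline; the only difference between the two loaders is the `shuffle` flag.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Illustrative timeseries: 8 witness channels and one strain target,
# pre-sliced into kernels along the time axis. Shapes and names here
# are placeholders, not the DeepClean data pipeline.
num_kernels, num_witnesses, kernel_size = 1024, 8, 256
witnesses = torch.randn(num_kernels, num_witnesses, kernel_size)
strain = torch.randn(num_kernels, kernel_size)
dataset = TensorDataset(witnesses, strain)

# Sequential batching: kernels stay in time order, so every epoch
# averages gradients over the same contiguous stretches of data.
sequential_loader = DataLoader(dataset, batch_size=32, shuffle=False)

# Shuffled batching: each epoch draws a new random composition of
# kernels per batch -- the "combinatorially many meta-samples"
# intuition described above.
shuffled_loader = DataLoader(dataset, batch_size=32, shuffle=True)

for X, y in shuffled_loader:
    # forward / backward pass would go here
    pass
```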

It would be really great to understand why we're observing this phenomenon, because it feels like there's performance we're leaving on the table by not understanding it.


    Labels

    data (Research topic about data used to train DeepClean), research topic (Question about DeepClean optimization and interpretation)
