info.json
{
"abstract": "In this work, we provide a characterization of the feature-learning process in two-layer ReLU networks trained by gradient descent on the logistic loss following random initialization. We consider data with binary labels that are generated by an XOR-like function of the input features. We permit a constant fraction of the training labels to be corrupted by an adversary. We show that, although linear classifiers are no better than random guessing for the distribution we consider, two-layer ReLU networks trained by gradient descent achieve generalization error close to the label noise rate. We develop a novel proof technique that shows that at initialization, the vast majority of neurons function as random features that are only weakly correlated with useful features, and the gradient descent dynamics `amplify\u2019 these weak, random features to strong, useful features.",
"authors": [
"Spencer Frei",
"Niladri S. Chatterji",
"Peter L. Bartlett"
],
"emails": [
"frei@berkeley.edu",
"niladri@cs.stanford.edu",
"peter@berkeley.edu"
],
"id": "22-1132",
"issue": 303,
"pages": [
1,
49
],
"title": "Random Feature Amplification: Feature Learning and Generalization in Neural Networks",
"volume": 24,
"year": 2023
}
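
The abstract describes a concrete setting: XOR-like binary labels, a constant fraction of corrupted training labels, and a two-layer ReLU network trained by gradient descent on the logistic loss from random initialization. The following is a minimal NumPy sketch of that setting, for illustration only. The dimensions, noise rate, learning rate, the choice of random label flips as the corruption model, and the frozen second layer are all assumptions here, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative XOR-like distribution: the label is the sign of the product
# of the first two coordinates; remaining coordinates are noise.
# d, n, and noise_rate are illustrative choices, not values from the paper.
d, n, noise_rate = 20, 1000, 0.1
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0] * X[:, 1])        # XOR-like sign structure
flip = rng.random(n) < noise_rate     # corruption modeled as random flips here
y[flip] *= -1

# Two-layer ReLU network f(x) = sum_j a_j * relu(w_j . x), trained by
# plain gradient descent on the logistic loss from random initialization.
m, lr, steps = 512, 0.1, 500
W = rng.standard_normal((m, d)) / np.sqrt(d)  # at init, most neurons act as
a = rng.choice([-1.0, 1.0], m) / m            # weak random features

def forward(X):
    H = np.maximum(X @ W.T, 0.0)  # (n, m) ReLU activations
    return H, H @ a

for _ in range(steps):
    H, out = forward(X)
    g = -y / (1.0 + np.exp(y * out)) / n     # dLoss/d(output) for logistic loss
    dW = ((g[:, None] * (H > 0)) * a).T @ X  # chain rule through the ReLU
    W -= lr * dW                             # second layer kept fixed here

_, out = forward(X)
print("train error:", np.mean(np.sign(out) != y))
```

Under this sketch, a linear classifier cannot separate the XOR-like labels, while the trained network's error should approach the label noise rate, mirroring the amplification story in the abstract: training strengthens the initially weak correlations of random neurons with the useful features.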