Address potential overfitting wrt matching data #55

Closed
Radonirinaunimi opened this issue Oct 5, 2022 · 8 comments · Fixed by #56

@Radonirinaunimi
Member

It seems that the models are over-fitting on the yadism matching data; see the following report, for example. We might want to do something about this.

@RoyStegeman
Member

RoyStegeman commented Oct 5, 2022

Why do you think they are overfitted? The training chi2 is ~1 for the yadism data and ~2 for the experimental data, which is what one would expect, I think. Namely, the yadism data has only one level of statistical fluctuations (the one corresponding to pseudodata generation), while the experimental data has the fluctuations from pseudodata generation on top of the fluctuations already present because the experimental central values are themselves randomly sampled (accompanied by possible experimental or theoretical inconsistencies that may further inflate the chi2).

If you want to really test for overfitting you could of course check how the agreement with the fitted yadism data compares to the agreement with some other predictions that are not in the matching dataset, but for now these results don't really worry me too much.
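For concreteness, here is a toy Monte Carlo of this counting argument (a sketch, not the actual fit code: it assumes an idealised fit that exactly recovers the underlying truth, and uncorrelated unit-variance uncertainties):

```python
import numpy as np

rng = np.random.default_rng(0)
ndat, nrep, sigma = 1000, 100, 1.0
truth = np.zeros(ndat)  # underlying true predictions (zero without loss of generality)

# yadism matching data: central values coincide with the truth
match_central = truth
# experimental data: central values are already one fluctuation away from the truth
exp_central = truth + rng.normal(0.0, sigma, ndat)

def avg_chi2(central):
    """Average over replicas of the chi2 between each replica's pseudodata
    and an idealised fit that recovers the truth exactly."""
    chi2 = []
    for _ in range(nrep):
        pseudodata = central + rng.normal(0.0, sigma, ndat)
        chi2.append(np.mean(((pseudodata - truth) / sigma) ** 2))
    return np.mean(chi2)

print("matching chi2/ndat     ~", round(avg_chi2(match_central), 2))  # ~1
print("experimental chi2/ndat ~", round(avg_chi2(exp_central), 2))    # ~2
```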

@Radonirinaunimi
Member Author

I was indeed expecting the $\chi^2$ of the matching data to be better than that of the real experimental data. What I was slightly worried about were the very small values of $\chi^{2, \rm exp}_{\rm match}$. It could be that these values are what one would expect, in the sense that this situation is similar to a level-0 closure test (?).

@RoyStegeman
Member

Is the exp chi2 defined wrt the central value PDF, or is it calculated for each PDF and then averaged? In the first case I would indeed expect it to vanish as 1/sqrt(Nrep). In the latter case I am not entirely sure what to expect, but note that also for a regular NNPDF fit the average experimental chi2 is quite a bit lower than the average training/validation losses (with a tiny contribution coming from the t0 prescription being used for the exp loss and not for the tr/vl losses).
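To make the distinction concrete, a minimal numpy sketch of the two definitions (toy inputs: replica predictions scattered around central values that coincide with the truth), showing how each scales with the number of replicas:

```python
import numpy as np

rng = np.random.default_rng(1)
ndat, sigma = 500, 1.0
truth = np.zeros(ndat)
data = truth  # matching central values, here taken equal to the truth

for nrep in (10, 100, 1000):
    # replica predictions scatter around the truth with the data uncertainty
    reps = truth + rng.normal(0.0, sigma, (nrep, ndat))
    # definition 1: chi2 of the replica-averaged (central) prediction
    chi2_central = np.mean(((reps.mean(axis=0) - data) / sigma) ** 2)
    # definition 2: chi2 per replica, then averaged over replicas
    chi2_avg = np.mean(((reps - data) / sigma) ** 2)
    print(nrep, round(chi2_central, 4), round(chi2_avg, 3))  # ~1/nrep vs ~1
```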

@Radonirinaunimi
Member Author

> Is the exp chi2 defined wrt the central value PDF, or is it calculated for each PDF and then averaged? [...]

Currently, the experimental $\chi^2$ is calculated as the latter, i.e., computed for each PDF replica and then averaged.

@RoyStegeman
Member

So the pseudodata by construction has chi2 = 1 (within statistical fluctuations). In the report it seems that the chi2 defined wrt the central data is close to 0, while the chi2 defined wrt the pseudodata is close to 1. This almost gives the impression that the NN doesn't really fit the fluctuations but is rather unaffected by the level-1 noise we introduce when generating pseudodata. I'm not sure whether this is a problem or not (maybe it is, since the uncertainties of the matching data come from NNPDF4.0, and we may want to reproduce them exactly rather than obtain smaller uncertainties), but if anything I would call it underfitting rather than overfitting. If it is a problem, adding an additional layer of noise (i.e., level-2 data) should place the matching data on the same footing as the real data and change the posterior distribution in the matching region.
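A minimal sketch of what that extra layer of noise could look like (hedged: `level0`, `covmat`, and the function name are placeholders for illustration, not identifiers from this repo):

```python
import numpy as np

def make_level2_replicas(level0, covmat, nrep, seed=0):
    """Fluctuate the level-0 yadism predictions twice: once to obtain a
    level-1 'central value' shared by all replicas, then once per replica,
    exactly as is done for the real experimental data."""
    rng = np.random.default_rng(seed)
    level1 = rng.multivariate_normal(level0, covmat)           # shared level-1 shift
    return rng.multivariate_normal(level1, covmat, size=nrep)  # per-replica noise
```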

@juanrojochacon
Contributor

Hi @RoyStegeman @Radonirinaunimi, I would treat yadism and real data on exactly the same footing, so the yadism pseudo-data should also be fluctuated twice wrt the true values (basically, like in a level-2 closure test).

If we do this, we should find that after the fit, for individual replicas $\chi^2/n_{\rm dat} \sim 2$, while for the central prediction averaged over replicas $\chi^2/n_{\rm dat} \sim 1$, both for real data and for yadism pseudo-data.

I think this is the correct approach conceptually, and it is also easier to explain.

Does it make sense?
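For reference, a toy Monte Carlo of these two numbers, under the idealisation that the fitted replicas are unbiased around the truth with a spread equal to the (uncorrelated, unit-variance) data uncertainty:

```python
import numpy as np

rng = np.random.default_rng(2)
ndat, nrep, sigma = 1000, 100, 1.0
truth = np.zeros(ndat)

# level-1 central data: truth plus one layer of noise (real or yadism alike)
central_data = truth + rng.normal(0.0, sigma, ndat)
# idealised fit: replicas unbiased around the truth, spread = data uncertainty
reps = truth + rng.normal(0.0, sigma, (nrep, ndat))

chi2_per_replica = np.mean(((reps - central_data) / sigma) ** 2)            # ~2
chi2_central = np.mean(((reps.mean(axis=0) - central_data) / sigma) ** 2)   # ~1
print(round(chi2_per_replica, 2), round(chi2_central, 2))
```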

@RoyStegeman
Member

I agree, that seems to be the way to go.

This is indeed what we want to achieve:

> the central prediction averaged over replicas $\chi^2/n_{\rm dat} \sim 1$

@juanrojochacon
Contributor

Perfect, then let's get this done ;)
