Address potential overfitting wrt matching data #55

Closed
Radonirinaunimi opened this issue Oct 5, 2022 · 8 comments · Fixed by #56

@Radonirinaunimi
Member

It seems that the models are over-fitting on the yadism matching data; see the following report, for example. We might want to do something about this.

@RoyStegeman
Member

RoyStegeman commented Oct 5, 2022

Why do you think they are overfitted? The training chi2 is ~1 for the yadism data and ~2 for the experimental data, which is what one would expect, I think. Namely, the yadism data has only one level of statistical fluctuations (the one corresponding to pseudodata generation), while the experimental data has the fluctuations from pseudodata generation on top of the fluctuations already present because the experimental central values are themselves randomly sampled (accompanied by possible experimental or theoretical inconsistencies that may further inflate the chi2).

If you want to really test for overfitting you could of course check how the agreement with the fitted yadism data compares to the agreement with some other predictions that are not in the matching dataset, but for now these results don't really worry me too much.
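For concreteness, here is a toy Monte Carlo of this counting argument (a sketch, not the actual fit code: it assumes an idealised fit that exactly recovers the underlying truth, and uncorrelated unit-variance uncertainties):

```python
import numpy as np

rng = np.random.default_rng(0)
ndat, nrep, sigma = 1000, 100, 1.0
truth = np.zeros(ndat)  # underlying true predictions (zero without loss of generality)

# yadism matching data: central values coincide with the truth
match_central = truth
# experimental data: central values are already one fluctuation away from the truth
exp_central = truth + rng.normal(0.0, sigma, ndat)

def avg_chi2(central):
    """Average over replicas of the chi2 between each replica's pseudodata
    and an idealised fit that recovers the truth exactly."""
    chi2 = []
    for _ in range(nrep):
        pseudodata = central + rng.normal(0.0, sigma, ndat)
        chi2.append(np.mean(((pseudodata - truth) / sigma) ** 2))
    return np.mean(chi2)

print("matching chi2/ndat     ~", round(avg_chi2(match_central), 2))  # ~1
print("experimental chi2/ndat ~", round(avg_chi2(exp_central), 2))    # ~2
```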

@Radonirinaunimi
Member Author

I was indeed expecting the $\chi^2$ of the matching data to be better than that of the real experimental data. What I was slightly worried about were the very small values of $\chi^{2, \rm exp}_{\rm match}$. It could be that these values are what one would expect, in the sense that this situation is similar to a level-0 closure test (?).

@RoyStegeman
Member

Is the exp chi2 defined wrt the central value PDF, or is it calculated for each PDF and then averaged? In the first case I would indeed expect it to vanish as 1/sqrt(Nrep). In the latter case I am not entirely sure what to expect, but note that also for a regular NNPDF fit the average experimental chi2 is quite a bit lower than the average training/validation losses (with a tiny contribution coming from the t0 prescription being used for the exp loss and not for the tr/vl losses).
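To make the distinction concrete, a minimal numpy sketch of the two definitions (toy inputs: replica predictions scattered around central values that coincide with the truth), showing how each scales with the number of replicas:

```python
import numpy as np

rng = np.random.default_rng(1)
ndat, sigma = 500, 1.0
truth = np.zeros(ndat)
data = truth  # matching central values, here taken equal to the truth

for nrep in (10, 100, 1000):
    # replica predictions scatter around the truth with the data uncertainty
    reps = truth + rng.normal(0.0, sigma, (nrep, ndat))
    # definition 1: chi2 of the replica-averaged (central) prediction
    chi2_central = np.mean(((reps.mean(axis=0) - data) / sigma) ** 2)
    # definition 2: chi2 per replica, then averaged over replicas
    chi2_avg = np.mean(((reps - data) / sigma) ** 2)
    print(nrep, round(chi2_central, 4), round(chi2_avg, 3))  # ~1/nrep vs ~1
```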

@Radonirinaunimi
Member Author

> Is the exp chi2 defined wrt the central value PDF, or is it calculated for each PDF and then averaged? [...]

Currently, the experimental $\chi^2$ is calculated as the latter, i.e., computed for each PDF replica and then averaged.

@RoyStegeman
Member

So the pseudodata by construction has chi2 = 1 (within statistical fluctuations). In the report it seems that the chi2 defined wrt the central data is close to 0, while the chi2 defined wrt the pseudodata is close to 1. This almost gives the impression that the NN doesn't really fit the fluctuations but is rather unaffected by the level-1 noise we introduce when generating pseudodata. I'm not sure whether this is a problem or not (maybe it is, since the uncertainties of the matching data come from NNPDF4.0, and we may want to reproduce them exactly rather than obtain smaller uncertainties), but if anything I would call it underfitting rather than overfitting. If it is a problem, adding an additional layer of noise (i.e., level-2 data) should place the matching data on the same footing as the real data and change the posterior distribution in the matching region.
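A minimal sketch of what that extra layer of noise could look like (hedged: `level0`, `covmat`, and the function name are placeholders for illustration, not identifiers from this repo):

```python
import numpy as np

def make_level2_replicas(level0, covmat, nrep, seed=0):
    """Fluctuate the level-0 yadism predictions twice: once to obtain a
    level-1 'central value' shared by all replicas, then once per replica,
    exactly as is done for the real experimental data."""
    rng = np.random.default_rng(seed)
    level1 = rng.multivariate_normal(level0, covmat)           # shared level-1 shift
    return rng.multivariate_normal(level1, covmat, size=nrep)  # per-replica noise
```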

@juanrojochacon
Contributor

Hi @RoyStegeman @Radonirinaunimi, I would treat yadism and real data on exactly the same footing, so the yadism pseudo-data should also be fluctuated twice wrt the true values (basically, like in a level-2 closure test).

If we do this, we should find that after the fit, for individual replicas $\chi^2/n_{\rm dat} \sim 2$, while for the central prediction averaged over replicas $\chi^2/n_{\rm dat} \sim 1$, both for real data and for yadism pseudo-data.

I think this is the correct approach conceptually, and it is also easier to explain.

Does it make sense?
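For reference, a toy Monte Carlo of these two numbers, under the idealisation that the fitted replicas are unbiased around the truth with a spread equal to the (uncorrelated, unit-variance) data uncertainty:

```python
import numpy as np

rng = np.random.default_rng(2)
ndat, nrep, sigma = 1000, 100, 1.0
truth = np.zeros(ndat)

# level-1 central data: truth plus one layer of noise (real or yadism alike)
central_data = truth + rng.normal(0.0, sigma, ndat)
# idealised fit: replicas unbiased around the truth, spread = data uncertainty
reps = truth + rng.normal(0.0, sigma, (nrep, ndat))

chi2_per_replica = np.mean(((reps - central_data) / sigma) ** 2)            # ~2
chi2_central = np.mean(((reps.mean(axis=0) - central_data) / sigma) ** 2)   # ~1
print(round(chi2_per_replica, 2), round(chi2_central, 2))
```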

@RoyStegeman
Member

I agree, that seems to be the way to go.

This is indeed what we want to achieve:

> the central prediction averaged over replicas $\chi^2/n_{\rm dat} \sim 1$

@juanrojochacon
Contributor

Perfect, then let's get this done ;)
