
Treat yadism matching data as level-2 closure test data #56

Merged 6 commits into main on Oct 18, 2022

Conversation

RoyStegeman
Member

@RoyStegeman RoyStegeman commented Oct 7, 2022

This accounts for the fact that unlike experimental data, matching data doesn't contain sampling fluctuations. With this implemented the spread of both matching and experimental data is of similar scale/order.

Resolves #55
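For illustration, a minimal sketch of the kind of level-2 fluctuation this PR adds to the matching pseudodata, assuming the noise is drawn from the dataset covariance matrix via its Cholesky factor (the function and argument names here are hypothetical, not the repo's actual API):

```python
import numpy as np

def add_level2_noise(central_values, covmat, rng=None):
    """Fluctuate pseudodata according to the dataset covariance matrix.

    Hypothetical helper: mirrors for matching data what is already done
    for experimental data, so both acquire sampling fluctuations.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Lower-triangular L with L @ L.T == covmat
    cholesky = np.linalg.cholesky(covmat)
    noise = cholesky @ rng.standard_normal(len(central_values))
    return central_values + noise
```

With an identity covariance this reduces to adding unit-variance Gaussian noise to each point, which is why the spread of matching and experimental data ends up on a similar scale.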

@Radonirinaunimi
Member

@RoyStegeman Thanks for this! I will run a fit a bit later to check how much this changes the results.

@Radonirinaunimi
Member

The fit results are available here https://data.nnpdf.science/NNUSF/reports/addnoise-matching-1c526f9-221010/output/

PS: There are now new entries in the summary table: expr, the $\chi^2_{\rm exp}$ for the real data, and expt, the $\chi^2_{\rm exp}$ for the combined real and matching data.

@RoyStegeman
Member Author

RoyStegeman commented Oct 10, 2022

Thanks @Radonirinaunimi. I get the impression that we're underfitting a bit (at least in some region): 1) because the <chi2>/N of the matching data is still so small, suggesting that the fit is too smooth and therefore the data uncertainties are not properly propagated to the fit, and 2) because the expr chi2 is rather a lot larger than 1, suggesting the data is not properly reproduced.

In particular, point 2 is not a super strong argument since I don't know what we can reasonably obtain, but it may still be interesting to see if a more aggressive optimization could improve things. Do you maybe have plots of the SF predictions? Those might give us some intuition as to whether we are indeed underfitting.
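For reference, the <chi2>/N figure being discussed is of this generic form (a sketch under assumed names, not the repo's implementation):

```python
import numpy as np

def chi2_per_point(predictions, data, covmat):
    """Generic chi2 per datapoint: (T - D)^T C^{-1} (T - D) / N."""
    diff = np.asarray(predictions) - np.asarray(data)
    # Solve C x = diff rather than inverting C explicitly
    return float(diff @ np.linalg.solve(covmat, diff)) / len(diff)
```

A value well below 1 means the central predictions sit much closer to the data than its assigned uncertainties, which is the symptom being flagged for the matching data here.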

@Radonirinaunimi
Member

My guess on what might be happening is that the fit is more biased towards the Yadism datasets; that is, if one were to add weights to the real data, the $\chi^2_{\rm real}$ would improve.

We do indeed have plots of the SF predictions; I will post them in the Slack, but they look very similar to before.

@RoyStegeman
Member Author

RoyStegeman commented Oct 11, 2022

My guess on what might be happening is that the fit is more biased towards the Yadism datasets; that is, if one were to add weights to the real data, the $\chi^2_{\rm real}$ would improve.

This could also be the case, though while the exp chi2 of the matching data is much better than that of the real data, this is not the case (at least to the same extent) for the training chi2. I think that to answer the question of whether the fit is biased towards yadism data, the training chi2 is more relevant than the exp chi2.

We do indeed have plots of the SF predictions; I will post them in the Slack, but they look very similar to before.

Okay perfect. Curious to see what they look like. P.S. do you think we should start including SF replica plots as well?

@Radonirinaunimi
Member

This could also be the case, though while the exp chi2 of the matching data is much better than that of the real data, this is not the case (at least to the same extent) for the training chi2. I think that to answer the question of whether the fit is biased towards yadism data, the training chi2 is more relevant than the exp chi2.

True! This then indicates that the fluctuation is somehow the main problem here.

@RoyStegeman
Member Author

RoyStegeman commented Oct 11, 2022

True! This then indicates that the fluctuation is somehow the main problem here.

Exactly, that's why I would be interested to know if we are underfitting or not. As said before, in principle underfitting could explain both the poor chi2 of the experimental data (for obvious reasons), while on the yadism side a lack of fluctuations in the fitted pseudodata replicas might also explain the low chi2 of the central data.

Just a hypothesis at this stage of course, but I think it's plausible.

@Radonirinaunimi
Member

True! This then indicates that the fluctuation is somehow the main problem here.

Exactly, that's why I would be interested to know if we are underfitting or not. As said before, in principle underfitting could explain both the poor chi2 of the experimental data (for obvious reasons), while on the yadism side a lack of fluctuations in the fitted pseudodata replicas might also explain the low chi2 of the central data.

Just a hypothesis at this stage of course, but I think it's plausible.

I will explore this and run various fits; so far we don't have any other plausible causes or solutions anyway.

Previously the level-1 shift was changed replica-by-replica. This should not be done because it corresponds to a covmat different from the experimental one. While freezing the level-1 fluctuations produces a central value that differs from the experimental value, this is not a problem as our methodology accounts for it (since it also occurs in experimental measurements).
@Radonirinaunimi
Member

```python
# If needed, generate the level-1 shift
shift_data = 0
if dataset_name.endswith('_MATCHING') and shift:
    np_rng_state = np.random.get_state()
    np.random.seed(matching_seed)
    random_samples = np.random.randn(dataset.n_data)
    np.random.set_state(np_rng_state)
    shift_data = cholesky @ random_samples
```

Thanks! This should now treat the pseudodata on exactly the same footing as the real datasets. I will run a fit and check (unless our cluster is again polluted by the ATLAS guy...).
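As a quick sanity check, the save/seed/restore pattern in the snippet above can be exercised in isolation to confirm that the shift is frozen (identical on every call) while the surrounding global NumPy RNG stream is left untouched. This is a standalone re-creation for illustration, not the PR's exact code:

```python
import numpy as np

def level1_shift(cholesky, n_data, matching_seed=1234):
    """Frozen level-1 shift: same output on every call."""
    np_rng_state = np.random.get_state()   # save the global stream
    np.random.seed(matching_seed)          # fixed seed -> frozen shift
    random_samples = np.random.randn(n_data)
    np.random.set_state(np_rng_state)      # restore the global stream
    return cholesky @ random_samples
```

Because the seed is fixed, every replica sees the same level-1 shifted central values, which is what keeps the effective covmat equal to the experimental one, as the commit message above explains.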

@Radonirinaunimi
Member

In 5b3bd29, I just moved the addition of the L1 noise to the data.loader module for ease of computing the $\chi^{2, \rm match}_{\rm exp}$ later, as we now need to compute the $\chi^2$ wrt the L1-fluctuated data.

@RoyStegeman
Member Author

If we're confident this is what we want to do, I guess this can be merged?

Co-authored-by: Roy Stegeman <r.stgmn@gmail.com>
@Radonirinaunimi
Member

If we're confident this is what we want to do, I guess this can be merged?

With the couple of replicas already done (the fit is not fully complete yet), the results are what we'd expect: $\chi^{2, \rm match}_{\rm exp} \sim 1$. So yes, this is exactly what we wanted to do.

@RoyStegeman
Member Author

Okay, good. Well, my reason for merging is that, from a purely methodological point of view, this is what we agreed to do, whether we get chi2 ~ 1 or not. If it turns out that there are other problems and this does not produce the desired results, we can address those in a separate PR (and keep the discussions organized).

@RoyStegeman RoyStegeman changed the title Incrase the variance of matching pseudodata Treat yadism matching data as level-2 closure test data Oct 17, 2022
@Radonirinaunimi
Member

Finally, here is the result of the fit: https://data.nnpdf.science/NNUSF/reports/addl1noise-matching-e774de7-221018/output/. The results are now as we'd expect. On the data comparison plots, the Yadism datapoints are now the fluctuated ones.

Yes, I agree that we should merge this now; any minor changes will be addressed separately.

@Radonirinaunimi Radonirinaunimi merged commit 5d3ad49 into main Oct 18, 2022
@Radonirinaunimi Radonirinaunimi deleted the increase_matching_variance branch October 18, 2022 07:00