
Misc changes#6

Merged
scarlehoff merged 7 commits into first_pass_jcm from speed_and_clean
Jan 14, 2026

Conversation

@scarlehoff
Collaborator

I've moved things around a bit; the structure I'm trying to achieve is more or less the following:

  • A read-only class holding all the information that needs to be passed around (to avoid modifying a global state, which makes things complicated) and which takes care of the nnpdf side (preparing the covariance matrix, data, replicas, training/validation split and so on)
  • A PDF class where all that needs to be given is a function with the signature:
    `(flavour, parameters, xgrid) -> pdf_i(parameters; xgrid)`

which fwiw doesn't even need to be python.
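A minimal sketch of such an interface (all names here are hypothetical, not the classes in this repository):

```python
import numpy as np

def toy_pdf(flavour, parameters, xgrid):
    """Hypothetical PDF function with the signature
    (flavour, parameters, xgrid) -> pdf_i(parameters; xgrid).
    Here a simple x^a * (1-x)^b parametrisation per flavour."""
    a, b = parameters[flavour]
    x = np.asarray(xgrid)
    return x**a * (1.0 - x) ** b

class FixedPDF:
    """Sketch of a PDF class that only needs to be handed such a function."""

    def __init__(self, pdf_function, parameters):
        self.pdf_function = pdf_function
        self.parameters = parameters

    def __call__(self, flavour, xgrid):
        return self.pdf_function(flavour, self.parameters, xgrid)

pdf = FixedPDF(toy_pdf, {"gluon": (0.5, 3.0)})
values = pdf("gluon", [0.1, 0.5])
```

Since only the callable's signature matters, the function behind it could equally be a wrapper around non-Python code.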

The rest (minimization algorithm, hessian part, glue) will remain more or less the same, other than what needed to change for the two points above.

The idea is that one can give an nnpdf runcard + a config file from here, and then get a fixpar version of what you would get from nnpdf.

To run with this branch you need this PR from NNPDF: NNPDF/nnpdf#2320 (it includes the speed-up for the convolutions I mentioned the other day, @tgiani). The long minimization takes about 1 hour on my PC with 8 cores.

@LucianHL
Collaborator

Thanks @scarlehoff ! I will give this a go soon myself and let you know

@scarlehoff
Collaborator Author

scarlehoff commented May 21, 2025

One question @LucianHL this diff_2 variable https://github.com/LucianHL/fixpar_nnpdf/blob/ef15a43c594cce182a67abc97c5e78059768fc23/fixpar_nnpdf.py#L431
can be set always to False?

I see that it is set to True for the direct fit to pseudodata but why would the derivative need to be different in that case?

(I'm also wondering about the pos_const flag, which as far as I can see is set to False, but the positivity is still being computed?

Removing both of them would simplify the code quite a bit.)

@LucianHL
Collaborator

Hi, yes, the diff_2 flag is not used for anything anymore - it existed for earlier incarnations of the code, where the finite differences were done at the level of the theory and, if True, more steps were done for this. Your update does this automatically, so it is not needed.

I am not sure about pos_const though - I am pretty sure that positivity is imposed if it is True, but not otherwise? For example, it is not imposed in the test runs I provided most recently.

@scarlehoff
Collaborator Author

The output of e.g. the longmin config file is:

chi2(exp) in: 325255.73895760323 70.32556518002232
chi2(t0) in: 319530.5202672612 69.0876800577862
pos penalty in =  95039.3317379255 20.549044700092
chi2(exp) out: 63546.98114224037 13.73988781453846
chi2(exp) out (no pos pen): 5328.013632216906 1.1520029475063582
chi2(t0) out: 63966.03248814855 13.830493510951039
chi2(t0) out (no pos pen): 5747.06497812508 1.2426086439189363
pos penalty out =  58218.96751002347 12.5878848670321
time=  14207.790558556

does that mean that the constraint is computed but not utilized?

@LucianHL
Collaborator

Ah I see what you mean - no it is not imposed in the minimisation by default at the moment. The key line is:

https://github.com/LucianHL/fixpar_nnpdf/blob/ef15a43c594cce182a67abc97c5e78059768fc23/fixpar_nnpdf.py#L417

and so nnpdf_pos is the overall flag that has to be set to true in the config file for this to happen. What you are seeing in the log at the end is the output of the fit, which does show the positivity penalty - even if it is not imposed during the minimisation (which you can also tell from the huge positivity penalty!). See around l. 561, where that happens.
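The gating described here can be sketched as follows (a minimal illustration; `nnpdf_pos` and `lampos` are names taken from the discussion and the function itself is hypothetical, not the actual code):

```python
def total_loss(chi2, pos_penalty, nnpdf_pos, lampos=1.0):
    """Hypothetical sketch: the positivity penalty is always computed
    (so it can be reported in the fit log), but it only enters the
    minimised objective when the config flag is set."""
    if nnpdf_pos:
        return chi2 + lampos * pos_penalty
    return chi2

# The log shows the penalty either way, even when it is not imposed:
loss_off = total_loss(chi2=5328.0, pos_penalty=58219.0, nnpdf_pos=False)  # 5328.0
loss_on = total_loss(chi2=5328.0, pos_penalty=58219.0, nnpdf_pos=True)    # 63547.0
```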

I have to head out to give a talk now so will be offline for the rest of the afternoon but can pick up any further queries after that.

@LucianHL
Collaborator

Hi, sorry, a rather basic query - I have so far been using conda to install the nnpdf code and am not certain of the easiest way to get that to work with your PR?

@scarlehoff
Collaborator Author

> What you are seeing in the log at the end is the output of the fit, which does show the positivity penalty - even if this is not imposed during the minimisation

I see, then that's still missing here. I'll add it, thanks!

> Hi, sorry a rather basic query - I have so far been using conda to install the nnpdf code so far and am not certain of the easiest way to get that to work with your PR?

You can install the specific branch with pip: either clone the repository, check out the branch, and run `pip install -e .`,

or just `pip install git+https://github.com/NNPDF/nnpdf.git@speed_up_convolutions`

(in the conda environment).

What I usually do is to install just lhapdf with conda and the rest with pip.

@scarlehoff
Collaborator Author

So, @LucianHL

I'm looking at this,
https://github.com/LucianHL/fixpar_nnpdf/blob/ef15a43c594cce182a67abc97c5e78059768fc23/src/chi2s.py#L1897

I see that when pos_const is active there are these two loops in which the chiarrm and chiarr arrays get filled, but then only chiarr is used (so, if I understand correctly, the jacobian d(positivity)/d(parameters) is added to d(chi2)/d(parameters)).

Then chiarrm is simply discarded, and thus the pdflabel_marr array here, https://github.com/LucianHL/fixpar_nnpdf/blob/ef15a43c594cce182a67abc97c5e78059768fc23/src/chi2s.py#L164C17-L164C30, is not necessary.

Is that correct? (I guess this is part of jaccalc_"oldmin" only?)

(Just to say, no rush at all!! As you can see, I can only work intermittently...)

@LucianHL
Collaborator

No problem - you are doing all the work at the moment, so I'm happy to answer questions as quickly as I can! Yes, that is right about chiarrm. But unfortunately things are a bit messy here...

  1. For the 'oldmin' it seems I have deleted a few necessary lines of code which actually use chiarrm in the original old jaccalc(). That should be the positivity constraint with "PDF pars - eps", with the difference then taken to give the derivative. As things change very quickly wrt the positivity constraints (if you are in a negative region), by default for the old minimisation this was always done with both a plus and a minus eps - and also, I think, because this part was not so time consuming anyway, it was easier to be a bit more accurate. I have committed the change to jaccalc() in the main branch for future reference, even if this 'oldmin' is not actually used anymore.

  2. For the new minimisation, when ip is greater than zero at

https://github.com/LucianHL/fixpar_nnpdf/blob/ef15a43c594cce182a67abc97c5e78059768fc23/src/chi2s.py#L1898

this should correspond to the PDF being fed in being dPDF/dpar. So by default that should, I think, give the correct derivative for the positivity if the PDF is negative (in which case the penalty is just ~ the PDF anyway), but not when it is positive (where the - small - penalty is not as simple). The upshot is that, having updated the manner in which the PDF derivatives were taken in my code, I had not so far got round to updating this for the case where positivity is imposed in the minimisation. It should be pretty straightforward to do that though, so I can certainly look at it.

Of course that would be for the sake of cross checking alone, as you are always taking numerical derivatives wrt the whole theory (not just the PDFs) in the new code. So nothing - real theory vs. positivity - should be treated differently in your case.
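The plus-and-minus-eps derivative described in point 1 is a central finite difference; a generic sketch (toy function, not the repository's jaccalc code) of why it is more accurate than a one-sided step:

```python
import numpy as np

def forward_diff(f, pars, i, eps=1e-6):
    """One-sided difference: one extra evaluation, O(eps) error."""
    step = np.zeros_like(pars)
    step[i] = eps
    return (f(pars + step) - f(pars)) / eps

def central_diff(f, pars, i, eps=1e-6):
    """Central difference with both +eps and -eps, as described for
    the positivity part of the old minimisation: more accurate
    (O(eps^2) error) at the cost of an extra evaluation - useful
    when the function changes quickly, e.g. near a negative region."""
    step = np.zeros_like(pars)
    step[i] = eps
    return (f(pars + step) - f(pars - step)) / (2.0 * eps)

f = lambda p: p[0] ** 3          # toy "chi2" with exact derivative 3*p0^2
pars = np.array([2.0])
d_forward = forward_diff(f, pars, 0)
d_central = central_diff(f, pars, 0)  # closer to the exact value 12.0
```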

@scarlehoff
Collaborator Author

> you are doing all the work at the moment

I'm doing a lot of bureaucracy and meetings these days; this is the fun part :__

> but won't when it is positive (where the - small - penalty is not as simple)

So, the reason we use elu in the NNPDF code is that we hit a few infinities for some optimizers, and that fixed it (instead of just having a rectifier). So if the positive part is problematic here, we can just set it to 0.

Ok, knowing this I'll try to add the positivity "naively" without looking at what was done before.

@LucianHL
Collaborator

Sounds good, thanks!

> so, the reason we use elu in the NNPDF code is that we hit a few infinities for some optimizers and that fixed it (instead of just having a rectifier). So if the positive part is problematic here we can just set it to 0.

Yes, actually, on reflection, if you are feeding simply dPDF/dpar to the positivity then it should be fine to just have this as 0 if the PDF is positive and ~ PDF if it is negative. I guess the issue comes in when you take finite differences, if these happen to lie across the point where the PDF is zero.

Anyway yes as you say you don't really need to know any of this for what you are doing. From my point of view though I will update the newmin version with this so at least we still have a version to compare against.

@scarlehoff
Collaborator Author

scarlehoff commented May 26, 2025

chi2/N_dat=1.1595
chi2tot (no pos)=5346.1
pos pen =16.632
chi2tot (no pos)/N_dat=1.1559
chi2(exp) in: 230348.46342234776 49.80507317239952
chi2(t0) in: 224623.24473213058 48.567188050190396
pos penalty in =  132.05620257661212 0.028552692448997215
chi2(exp) out: 5362.703986617387 1.1595035646740297
chi2(exp) out (no pos pen): 5346.071558353817 1.1559073639683928
chi2(t0) out: 5773.800627504976 1.2483893248659408
chi2(t0) out (no pos pen): 5757.168199241405 1.2447931241603039
pos penalty out =  16.632428263570546 0.003596200705636875

So the naive implementation (just 0 when it is positive) seems to do ok. This was with a smaller positivity multiplier with respect to the default, although the moment the positivity is added it takes much longer to converge. I'm not sure whether just running for longer (or with a bigger multiplier) will get it to 0, or whether it makes sense to do like in NNPDF and have a dynamic value for lambda that grows with the number of trials.
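The dynamic-lambda idea mentioned above can be sketched as follows (a hypothetical `minimise(lampos)` callable stands in for one full minimisation; all names and values are illustrative, not this repository's code):

```python
def fit_with_growing_lambda(minimise, lampos0=10.0, growth=10.0,
                            max_rounds=4, tol=1e-3):
    """Sketch of a penalty-multiplier schedule: run a minimisation
    for each value of the positivity multiplier, increasing it until
    the remaining penalty is negligible or rounds run out.
    `minimise(lampos)` is assumed to return (parameters, pos_penalty)."""
    lampos = lampos0
    params, penalty = None, float("inf")
    for _ in range(max_rounds):
        params, penalty = minimise(lampos)
        if penalty < tol:
            break
        lampos *= growth
    return params, lampos

# Toy minimiser whose residual penalty shrinks as 1/lampos:
params, final_lam = fit_with_growing_lambda(lambda lam: ("pars", 1.0 / lam))
```

Starting with a small multiplier and growing it avoids the convergence problem of imposing a large penalty from the start.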

Now the only things missing (in the sense that I'm not sure I understand what they do) are:

  1. The branches with fit_pars.deld_const. It says "if true impose weight to prefer delta_d > 0.25", but I'm not sure about the practical effect.

  2. The closure test part. I've seen there are several places with a specific branch for these, but in principle everything should be equal, just with different data loaded?

Everything else I think it is for plotting or reporting which I think can be ignored for now

(Getting the final error set, evolved and everything: @tgiani was looking into it, so I shouldn't have touched that part; if I did and broke something, it was by mistake!)

@LucianHL
Collaborator

Thanks a lot, and sorry for the delay!

> So the naive (just 0 when it is positive) implementation seems to do ok. This was with a smaller positivity multiplier with respect to the default. Although the moment the positivity is added it takes much longer to converge. Not sure whether just running for longer will get it to 0 (or with a bigger multiplier) or whether it makes sense to do like in NNPDF and have a dynamic value for lambda that grows with the number of trials.

Indeed, when I implemented positivity constraints it was done in a rather similar way to NNPDF - basically just running a set of fits, with lambda set to e.g. 10, then increased, and so on. This is actually still there in fixpar_nnpdf.py, so I would suspect you have been using that in your version? That would probably explain the slowness. I will in any case now finally get round to checking through the latest code version, so can comment more after that.

> "if true impose weight to prefer delta_d > 0.25"
> but not sure about the practical effect

That was very much an intermediary check for a particular fit - it can just be removed (i.e. assumed to be false).

> The closure test part. Which I've seen there are several parts where there's a specific branch for these. But in principle everything should be equal just different data is loaded?

Correct - possibly what you are seeing is different flags to output pseudodata? Alternatively, in the code there is the possibility of fitting PDF-level pseudodata (which possibly we don't want to keep), so it might be that?

@LucianHL
Collaborator

That runs nicely for me with a quick test - I will come back to this soon, but a quick observation is that, by the look of it, you will indeed be doing the multiple runs with increasing lampos - see around l. 487 of fixpar_nnpdf.py. I'm sure one could write the code in a better/more flexible way, but for sure the basic point is that if you just try to impose positivity with too large a penalty term, things will not converge. From a procedural point of view I'm not sure there is any better way that would be completely stable than doing a full LM minimisation for each value of lampos, but you might be able to vary it within the LM algorithm, I suppose.

I will try some of the longer runs to test them as well.

@scarlehoff
Collaborator Author

I've pushed my changes to the positivity. I just activate it when the chi2 is about ~2.5 (or when the fit is about to exit because it converged), and it seems to work ok without a huge penalty in time:

chi2(exp) out: 5438.363845796493 1.1758624531451876
chi2(exp) out (no pos pen): 5426.94490517244 1.1733934930102572
chi2(t0) out: 5869.093663916722 1.2689932246306426
chi2(t0) out (no pos pen): 5857.674723292669 1.2665242644957122
pos penalty out =  11.418940624052993 0.002468960134930377
time=  8393.40132706

It can probably be improved, but I think I'll now change it into 3 scripts - fit, evolution, hessian errors - so that it can be made public.
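The activation strategy described above can be sketched with a hypothetical helper (the names and the exact threshold are illustrative, not this repository's code):

```python
def should_activate_positivity(chi2_per_dof, converged, threshold=2.5):
    """Sketch: switch the positivity penalty on only once the fit is
    already reasonable (chi2/N below a threshold), or when the fit is
    about to exit because it converged - avoiding the slow convergence
    of imposing positivity from the very first iterations."""
    return chi2_per_dof < threshold or converged
```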

@LucianHL
Collaborator

Thank you! I will take a look.

@scarlehoff scarlehoff mentioned this pull request Jun 18, 2025
@scarlehoff scarlehoff merged commit 4b6726f into first_pass_jcm Jan 14, 2026
@scarlehoff scarlehoff deleted the speed_and_clean branch February 4, 2026 10:06