
Misc changes#6

Merged
scarlehoff merged 7 commits into first_pass_jcm from speed_and_clean
Jan 14, 2026

Conversation

@scarlehoff
Collaborator

I've moved things around a bit; the structure I'm trying to achieve is more or less the following:

  • A read-only class holding all the information that needs to be passed around (to avoid modifying a global state, which makes things complicated) and which takes care of the nnpdf side (preparing the covariance matrix, data, replicas, training/validation split and so on)
  • A PDF class where all that needs to be given is a function with the signature:
    `(flavour, parameters, xgrid) -> pdf_i(parameters; xgrid)`

which fwiw doesn't even need to be python.
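A minimal sketch of such an interface (all names here are hypothetical, not the classes in this repository):

```python
import numpy as np

def toy_pdf(flavour, parameters, xgrid):
    """Hypothetical PDF function with the signature
    (flavour, parameters, xgrid) -> pdf_i(parameters; xgrid).
    Here a simple x^a * (1-x)^b parametrisation per flavour."""
    a, b = parameters[flavour]
    x = np.asarray(xgrid)
    return x**a * (1.0 - x) ** b

class FixedPDF:
    """Sketch of a PDF class that only needs to be handed such a function."""

    def __init__(self, pdf_function, parameters):
        self.pdf_function = pdf_function
        self.parameters = parameters

    def __call__(self, flavour, xgrid):
        return self.pdf_function(flavour, self.parameters, xgrid)

pdf = FixedPDF(toy_pdf, {"gluon": (0.5, 3.0)})
values = pdf("gluon", [0.1, 0.5])
```

Since only the callable's signature matters, the function behind it could equally be a wrapper around non-Python code.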

The rest (minimization algorithm, hessian part, glue) will remain more or less the same, other than what needed to change for the two points above.

The idea is that one can give an nnpdf runcard + a config file from here, and then get a fixpar version of what you would get from nnpdf.

To run with this branch you need this PR from NNPDF: NNPDF/nnpdf#2320 (it includes the speed-up for the convolutions I mentioned the other day, @tgiani). The long minimization takes about 1 hour on my PC with 8 cores.

@LucianHL
Collaborator

Thanks @scarlehoff ! I will give this a go soon myself and let you know

@scarlehoff
Collaborator Author

scarlehoff commented May 21, 2025

One question @LucianHL this diff_2 variable https://github.com/LucianHL/fixpar_nnpdf/blob/ef15a43c594cce182a67abc97c5e78059768fc23/fixpar_nnpdf.py#L431
can be set always to False?

I see that it is set to True for the direct fit to pseudodata but why would the derivative need to be different in that case?

(I'm also wondering about the pos_const flag, which as far as I can see is set to False, but the positivity is still being computed?

Removing both of them would simplify the code quite a bit.)

@LucianHL
Collaborator

Hi, yes, the diff_2 flag is not used for anything anymore - it existed for earlier incarnations of the code, where the finite differences were done at the level of the theory and, if True, more steps were done for this. Your update does this automatically, so it is not needed.

I am not sure about pos_const though - I am pretty sure that positivity is imposed if it is True, but not otherwise? For example, it is not imposed in the test runs I provided most recently.

@scarlehoff
Collaborator Author

The output of e.g. the longmin config file is:

chi2(exp) in: 325255.73895760323 70.32556518002232
chi2(t0) in: 319530.5202672612 69.0876800577862
pos penalty in =  95039.3317379255 20.549044700092
chi2(exp) out: 63546.98114224037 13.73988781453846
chi2(exp) out (no pos pen): 5328.013632216906 1.1520029475063582
chi2(t0) out: 63966.03248814855 13.830493510951039
chi2(t0) out (no pos pen): 5747.06497812508 1.2426086439189363
pos penalty out =  58218.96751002347 12.5878848670321
time=  14207.790558556

does that mean that the constraint is computed but not utilized?

@LucianHL
Collaborator

Ah I see what you mean - no it is not imposed in the minimisation by default at the moment. The key line is:

https://github.com/LucianHL/fixpar_nnpdf/blob/ef15a43c594cce182a67abc97c5e78059768fc23/fixpar_nnpdf.py#L417

and so nnpdf_pos is the overall flag that has to be set to true in the config file for this to happen. What you are seeing in the log at the end is the output of the fit, which does show the positivity penalty - even if it is not imposed during the minimisation (which you can also tell from the huge positivity penalty!). See around l. 561, where that happens.
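The gating described here can be sketched as follows (a minimal illustration; `nnpdf_pos` and `lampos` are names taken from the discussion and the function itself is hypothetical, not the actual code):

```python
def total_loss(chi2, pos_penalty, nnpdf_pos, lampos=1.0):
    """Hypothetical sketch: the positivity penalty is always computed
    (so it can be reported in the fit log), but it only enters the
    minimised objective when the config flag is set."""
    if nnpdf_pos:
        return chi2 + lampos * pos_penalty
    return chi2

# The log shows the penalty either way, even when it is not imposed:
loss_off = total_loss(chi2=5328.0, pos_penalty=58219.0, nnpdf_pos=False)  # 5328.0
loss_on = total_loss(chi2=5328.0, pos_penalty=58219.0, nnpdf_pos=True)    # 63547.0
```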

I have to head out to give a talk now so will be offline for the rest of the afternoon but can pick up any further queries after that.

@LucianHL
Collaborator

Hi, sorry, a rather basic query - I have so far been using conda to install the nnpdf code and am not certain of the easiest way to get that to work with your PR?

@scarlehoff
Collaborator Author

> What you are seeing in the log at the end is the output of the fit, which does show the positivity penalty - even if this is not imposed during the minimisation

I see, then that's still missing here. I'll add it, thanks!

> Hi, sorry a rather basic query - I have so far been using conda to install the nnpdf code so far and am not certain of the easiest way to get that to work with your PR?

You can install the specific branch with pip: either clone the repository, check out the branch, and run `pip install -e .`,

or just `pip install git+https://github.com/NNPDF/nnpdf.git@speed_up_convolutions`

(in the conda environment).

What I usually do is to install just lhapdf with conda and the rest with pip.

@scarlehoff
Collaborator Author

So, @LucianHL

I'm looking at this,
https://github.com/LucianHL/fixpar_nnpdf/blob/ef15a43c594cce182a67abc97c5e78059768fc23/src/chi2s.py#L1897

I see that when pos_const is active there are these two loops in which the chiarrm and chiarr arrays get filled, but then only chiarr is used (so, if I understand correctly, the jacobian d(positivity)/d(parameters) is added to d(chi2)/d(parameters)).

Then chiarrm is simply discarded, and thus the pdflabel_marr array here, https://github.com/LucianHL/fixpar_nnpdf/blob/ef15a43c594cce182a67abc97c5e78059768fc23/src/chi2s.py#L164C17-L164C30, is not necessary.

Is that correct? (I guess this is part of jaccalc_"oldmin" only?)

(Just to say, no rush at all!! As you can see, I can only work intermittently...)

@LucianHL
Collaborator

No problem - you are doing all the work at the moment, so I'm happy to answer questions as quickly as I can! Yes, that is right about chiarrm. But unfortunately things are a bit messy here...

  1. For the 'oldmin' it seems I have deleted a few necessary lines of code which actually use chiarrm in the original old jaccalc(). That should be the positivity constraint with "PDF pars - eps", with the difference then taken to give the derivative. As things change very quickly wrt the positivity constraints (if you are in a negative region), by default for the old minimisation this was always done with both a plus and a minus eps - and also, I think, because this part was not so time consuming anyway, it was easier to be a bit more accurate. I have committed the change to jaccalc() in the main branch for future reference, even if this 'oldmin' is not actually used anymore.

  2. For the new minimisation, when ip is greater than zero at

https://github.com/LucianHL/fixpar_nnpdf/blob/ef15a43c594cce182a67abc97c5e78059768fc23/src/chi2s.py#L1898

this should correspond to the PDF being fed in being dPDF/dpar. So by default that should, I think, give the correct derivative for the positivity if the PDF is negative (in which case the penalty is just ~ the PDF anyway), but not when it is positive (where the - small - penalty is not as simple). The upshot is that, having updated the manner in which the PDF derivatives were taken in my code, I had not so far got round to updating this for the case where positivity is imposed in the minimisation. It should be pretty straightforward to do that though, so I can certainly look at it.

Of course that would be for the sake of cross checking alone, as you are always taking numerical derivatives wrt the whole theory (not just the PDFs) in the new code. So nothing - real theory vs. positivity - should be treated differently in your case.
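The plus-and-minus-eps derivative described in point 1 is a central finite difference; a generic sketch (toy function, not the repository's jaccalc code) of why it is more accurate than a one-sided step:

```python
import numpy as np

def forward_diff(f, pars, i, eps=1e-6):
    """One-sided difference: one extra evaluation, O(eps) error."""
    step = np.zeros_like(pars)
    step[i] = eps
    return (f(pars + step) - f(pars)) / eps

def central_diff(f, pars, i, eps=1e-6):
    """Central difference with both +eps and -eps, as described for
    the positivity part of the old minimisation: more accurate
    (O(eps^2) error) at the cost of an extra evaluation - useful
    when the function changes quickly, e.g. near a negative region."""
    step = np.zeros_like(pars)
    step[i] = eps
    return (f(pars + step) - f(pars - step)) / (2.0 * eps)

f = lambda p: p[0] ** 3          # toy "chi2" with exact derivative 3*p0^2
pars = np.array([2.0])
d_forward = forward_diff(f, pars, 0)
d_central = central_diff(f, pars, 0)  # closer to the exact value 12.0
```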

@scarlehoff
Collaborator Author

> you are doing all the work at the moment

I'm doing a lot of bureaucracy and meetings these days; this is the fun part :__

> but won't when it is positive (where the - small - penalty is not as simple)

So, the reason we use elu in the NNPDF code is that we hit a few infinities for some optimizers, and that fixed it (instead of just having a rectifier). So if the positive part is problematic here, we can just set it to 0.

Ok, knowing this I'll try to add the positivity "naively" without looking at what was done before.

@LucianHL
Collaborator

Sounds good, thanks!

> so, the reason we use elu in the NNPDF code is that we hit a few infinities for some optimizers and that fixed it (instead of just having a rectifier). So if the positive part is problematic here we can just set it to 0.

Yes, actually, on reflection, if you are feeding simply dPDF/dpar to the positivity then it should be fine to just have this as 0 if the PDF is positive and ~ PDF if it is negative. I guess the issue comes in when you take finite differences, if these happen to lie across the point where the PDF is zero.

Anyway yes as you say you don't really need to know any of this for what you are doing. From my point of view though I will update the newmin version with this so at least we still have a version to compare against.

@scarlehoff
Collaborator Author

scarlehoff commented May 26, 2025

chi2/N_dat=1.1595
chi2tot (no pos)=5346.1
pos pen =16.632
chi2tot (no pos)/N_dat=1.1559
chi2(exp) in: 230348.46342234776 49.80507317239952
chi2(t0) in: 224623.24473213058 48.567188050190396
pos penalty in =  132.05620257661212 0.028552692448997215
chi2(exp) out: 5362.703986617387 1.1595035646740297
chi2(exp) out (no pos pen): 5346.071558353817 1.1559073639683928
chi2(t0) out: 5773.800627504976 1.2483893248659408
chi2(t0) out (no pos pen): 5757.168199241405 1.2447931241603039
pos penalty out =  16.632428263570546 0.003596200705636875

So the naive implementation (just 0 when it is positive) seems to do ok. This was with a smaller positivity multiplier with respect to the default, although the moment the positivity is added it takes much longer to converge. I'm not sure whether just running for longer (or with a bigger multiplier) will get it to 0, or whether it makes sense to do like in NNPDF and have a dynamic value for lambda that grows with the number of trials.
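The dynamic-lambda idea mentioned above can be sketched as follows (a hypothetical `minimise(lampos)` callable stands in for one full minimisation; all names and values are illustrative, not this repository's code):

```python
def fit_with_growing_lambda(minimise, lampos0=10.0, growth=10.0,
                            max_rounds=4, tol=1e-3):
    """Sketch of a penalty-multiplier schedule: run a minimisation
    for each value of the positivity multiplier, increasing it until
    the remaining penalty is negligible or rounds run out.
    `minimise(lampos)` is assumed to return (parameters, pos_penalty)."""
    lampos = lampos0
    params, penalty = None, float("inf")
    for _ in range(max_rounds):
        params, penalty = minimise(lampos)
        if penalty < tol:
            break
        lampos *= growth
    return params, lampos

# Toy minimiser whose residual penalty shrinks as 1/lampos:
params, final_lam = fit_with_growing_lambda(lambda lam: ("pars", 1.0 / lam))
```

Starting with a small multiplier and growing it avoids the convergence problem of imposing a large penalty from the start.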

Now the only things missing (in the sense that I'm not sure I understand what they do) are:

  1. The branches with fit_pars.deld_const. It says "if true impose weight to prefer delta_d > 0.25", but I'm not sure about the practical effect.

  2. The closure test part. I've seen there are several places with a specific branch for these, but in principle everything should be equal, just with different data loaded?

Everything else I think it is for plotting or reporting which I think can be ignored for now

(Getting the final error set, evolved and everything: @tgiani was looking into it, so I shouldn't have touched that part; if I did and broke something, it was by mistake!)

@LucianHL
Collaborator

Thanks a lot, and sorry for the delay!

> So the naive (just 0 when it is positive) implementation seems to do ok. This was with a smaller positivity multiplier with respect to the default. Although the moment the positivity is added it takes much longer to converge. Not sure whether just running for longer will get it to 0 (or with a bigger multiplier) or whether it makes sense to do like in NNPDF and have a dynamic value for lambda that grows with the number of trials.

Indeed, when I implemented positivity constraints it was done in a rather similar way to NNPDF - basically just running a set of fits, with lambda set to e.g. 10, then increased, and so on. This is actually still there in fixpar_nnpdf.py, so I would suspect you have been using that in your version? That would probably explain the slowness. I will in any case now finally get round to checking through the latest code version, so can comment more after that.

> "if true impose weight to prefer delta_d > 0.25"
> but not sure about the practical effect

That was very much an intermediary check for a particular fit - it can just be removed (i.e. assumed to be false).

> The closure test part. Which I've seen there are several parts where there's a specific branch for these. But in principle everything should be equal just different data is loaded?

Correct - possibly what you are seeing is different flags to output pseudodata? Alternatively, in the code there is the possibility of fitting PDF-level pseudodata (which possibly we don't want to keep), so it might be that?

@LucianHL
Collaborator

That runs nicely for me with a quick test - I will come back to this soon, but a quick observation is that, by the look of it, you will indeed be doing the multiple runs with increasing lampos - see around l. 487 of fixpar_nnpdf.py. I'm sure one could write the code in a better/more flexible way, but for sure the basic point is that if you just try to impose positivity with too large a penalty term, things will not converge. From a procedural point of view I'm not sure there is any better way that would be completely stable than doing a full LM minimisation for each value of lampos, but you might be able to vary it within the LM algorithm, I suppose.

I will try some of the longer runs to test them as well.

@scarlehoff
Collaborator Author

I've pushed my changes to the positivity. I just activate it when the chi2 is about ~2.5 (or when the fit is about to exit because it converged), and it seems to work ok without a huge penalty in time:

chi2(exp) out: 5438.363845796493 1.1758624531451876
chi2(exp) out (no pos pen): 5426.94490517244 1.1733934930102572
chi2(t0) out: 5869.093663916722 1.2689932246306426
chi2(t0) out (no pos pen): 5857.674723292669 1.2665242644957122
pos penalty out =  11.418940624052993 0.002468960134930377
time=  8393.40132706

It can probably be improved, but I think I'll now change it into 3 scripts - fit, evolution, hessian errors - so that it can be made public.
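The activation strategy described above can be sketched with a hypothetical helper (the names and the exact threshold are illustrative, not this repository's code):

```python
def should_activate_positivity(chi2_per_dof, converged, threshold=2.5):
    """Sketch: switch the positivity penalty on only once the fit is
    already reasonable (chi2/N below a threshold), or when the fit is
    about to exit because it converged - avoiding the slow convergence
    of imposing positivity from the very first iterations."""
    return chi2_per_dof < threshold or converged
```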

@LucianHL
Collaborator

Thank you! I will take a look.

@scarlehoff scarlehoff mentioned this pull request Jun 18, 2025
@scarlehoff scarlehoff merged commit 4b6726f into first_pass_jcm Jan 14, 2026
@scarlehoff scarlehoff deleted the speed_and_clean branch February 4, 2026 10:06