Reconstructed energy is written in logscale while true energy in linear scale #139

Closed
HealthyPear opened this issue May 16, 2021 · 2 comments · Fixed by #141
Labels
wrong behaviour: The code works but produces clearly wrong results

Comments

@HealthyPear
Member

This happens because we write to file the direct output of the energy regressor, whose scale is determined by the target variable set in the model (log10_true_energy by default).

This creates two problems:

  • classifier features related to the reconstructed energy then have to be declared as

log10_reco_energy: reco_energy # Averaged-estimated energy of the shower

which is horrible, since the column named reco_energy actually holds log10 values

  • the benchmarking code is not flexible enough, so results can look wrong simply because cuts are applied in the wrong scale.
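To make the second point concrete, here is a minimal stdlib-only sketch of how a cut intended in TeV behaves when the reco_energy column actually contains log10(E/TeV) values written straight from a regressor trained on log10_true_energy. The energies, the perfect regressor, the cut value `E_MIN_TEV` and the helper `select` are all hypothetical; this is not the project's benchmarking code.

```python
import math

# Hypothetical true energies in TeV.
true_energy = [0.05, 0.5, 2.0, 20.0]

# Assume a perfect regressor trained on log10_true_energy: its raw output
# is log10(E/TeV), and that raw output is what ends up in "reco_energy".
reco_energy = [math.log10(e) for e in true_energy]

E_MIN_TEV = 1.0  # lower cut intended in linear scale (TeV)

def select(energies, e_min):
    """Return the indices of events passing a simple lower energy cut."""
    return [i for i, e in enumerate(energies) if e > e_min]

# Cut on the linear true energy: keeps the 2 TeV and 20 TeV events.
print(select(true_energy, E_MIN_TEV))              # [2, 3]
# Same cut on the log10 column means log10(E) > 1, i.e. E > 10 TeV.
print(select(reco_energy, E_MIN_TEV))              # [3]
# Expressing the cut in the matching scale restores agreement.
print(select(reco_energy, math.log10(E_MIN_TEV)))  # [2, 3]
```

Nothing fails loudly here: the wrong-scale cut just silently selects a different event sample.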
HealthyPear added the "wrong behaviour" label May 16, 2021
HealthyPear added this to Needs triage in Bugs and wrong behaviours via automation May 16, 2021
@kosack
Contributor

kosack commented May 17, 2021

I think the solution is to allow a transformation that normalizes/re-scales the predicted variable. E.g. the predicted value should always be "energy" (not "log10_energy"), but you should have an option like

transform: np.log10
inverse_transform: lambda p: 10**p

Then, during training, you call the transform so that all computations are done in log10_energy, and after predicting you call the inverse transform to go back to energy. You could also include the conversion to TeV there, unless it is simply assumed that energies are in TeV.
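A minimal sketch of that suggestion, with stdlib `math.log10` standing in for `np.log10`. The GeV input unit and the `UNIT_TO_TEV` factor are hypothetical assumptions: here the unit conversion sits inside `transform`, so `inverse_transform` only undoes the log step and predictions come back directly in TeV.

```python
import math

# Hypothetical unit factor: assumes the stored energy column is in GeV.
# Use 1.0 instead if energies are already assumed to be in TeV.
UNIT_TO_TEV = 1e-3

def transform(energy):
    """Training side: convert to TeV, then take log10, giving log10(E/TeV)."""
    return math.log10(energy * UNIT_TO_TEV)

def inverse_transform(p):
    """Prediction side: undo only the log10 step, so the result is in TeV."""
    return 10 ** p

# A 500 GeV event is trained on as log10(0.5) and comes back as 0.5 TeV.
p = transform(500.0)
energy_tev = inverse_transform(p)
```

With this choice the pair is not a strict round trip in the input units. The alternative is to divide by `UNIT_TO_TEV` in `inverse_transform` and return predictions in the original units.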

So the sequence of steps is:

  • input training data → transform → train
  • input testing data → predict → inverse_transform → prediction
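The two sequences above can be sketched as a small wrapper. Everything here is hypothetical: `TransformedTargetModel` and the toy least-squares `LinearFit` stand in for the project's real regressor, and the feature values are made up. Only the ordering (transform → train, predict → inverse_transform) follows the comment.

```python
import math

class LinearFit:
    """Toy 1-D least-squares regressor standing in for the real model."""

    def fit(self, x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        self.slope = sxy / sxx
        self.intercept = my - self.slope * mx
        return self

    def predict(self, x):
        return [self.slope * xi + self.intercept for xi in x]


class TransformedTargetModel:
    """Hypothetical wrapper: the target is always plain energy."""

    def __init__(self, model, transform=math.log10,
                 inverse_transform=lambda p: 10 ** p):
        self.model = model
        self.transform = transform
        self.inverse_transform = inverse_transform

    def fit(self, x, energy):
        # input training data -> transform -> train
        self.model.fit(x, [self.transform(e) for e in energy])
        return self

    def predict(self, x):
        # input testing data -> predict -> inverse_transform -> prediction
        return [self.inverse_transform(p) for p in self.model.predict(x)]


# Hypothetical feature that scales with log10(energy).
x_train = [-1.0, 0.0, 1.0, 2.0]
e_train = [0.1, 1.0, 10.0, 100.0]  # true energies in TeV
model = TransformedTargetModel(LinearFit()).fit(x_train, e_train)
predictions = model.predict([0.5, 1.5])  # linear scale (TeV)
```

Because the wrapper owns both transforms, whatever gets written to file is always in linear scale, regardless of the scale the regressor was trained in. For scikit-learn models, `sklearn.compose.TransformedTargetRegressor` (with its `func` and `inverse_func` parameters) implements the same pattern.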

The same could even be used for the training input data, if you really want to be general, i.e. you could allow a column name + transform + inverse_transform for every variable (e.g. intensity → log10(intensity) → training).
However, I guess the user-defined features already solve that problem, so it's probably only needed for the input/output parameter.
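A minimal sketch of that generalization, assuming hypothetical column names and a plain dict per event (the real pipeline presumably works on tables). Each listed column carries its own transform and inverse_transform, and unlisted columns pass through unchanged. `COLUMN_TRANSFORMS` and `apply_transforms` are illustrative names, not existing project code.

```python
import math

# Hypothetical per-column spec: column name -> (transform, inverse_transform).
COLUMN_TRANSFORMS = {
    "intensity": (math.log10, lambda v: 10 ** v),
    "true_energy": (math.log10, lambda v: 10 ** v),
}

def apply_transforms(row, spec, inverse=False):
    """Return a copy of `row` with each listed column transformed (or inverted)."""
    out = dict(row)
    for column, (fwd, inv) in spec.items():
        if column in out:
            out[column] = (inv if inverse else fwd)(out[column])
    return out

event = {"intensity": 1000.0, "true_energy": 0.5, "n_pixels": 42}
train_row = apply_transforms(event, COLUMN_TRANSFORMS)          # log10 space
restored = apply_transforms(train_row, COLUMN_TRANSFORMS, True)  # back to linear
```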

@HealthyPear
Member Author

Just a small clarification: I only found this problem now because in the previous AdaBoost configuration the target was true_energy rather than log10_true_energy, so the estimated value was always in linear scale (I'm not sure whether this was also one of the reasons the resolution was bad before).

Bugs and wrong behaviours automation moved this from Needs triage to Closed May 19, 2021
2 participants