Error in building classifier model #17

Closed
adonini opened this issue Oct 30, 2019 · 7 comments
Labels
bug Something isn't working divergent pointing pipeline applications Application of the pipeline to specific studies

Comments

@adonini

adonini commented Oct 30, 2019

Using the script build_model.py with classifier.yaml I get the following error:

Traceback (most recent call last):
  File "/storage/gpfs_data/ctalocal/adonini/protopipe/protopipe/scripts/build_model.py", line 248, in <module>
    main()
  File "/storage/gpfs_data/ctalocal/adonini/protopipe/protopipe/scripts/build_model.py", line 169, in main
    force_same_nsig_nbkg=use_same_number_of_sig_and_bkg_for_training,
  File "/storage/gpfs_data/ctalocal/adonini/protopipe/protopipe/mva/train_model.py", line 66, in split_data
    target_name=self.target_name
  File "/storage/gpfs_data/ctalocal/adonini/protopipe/protopipe/mva/utils.py", line 54, in split_train_test
    run_max_train = obs_ids[max_train_obs_idx]
IndexError: index 0 is out of bounds for axis 0 with size 0

The files I use should be right:

filename_sig: DL1/for_classification/dl1_tail_gamma_merged.h5
filename_bkg: DL1/for_classification/dl1_tail_proton_merged.h5

and the data are loaded correctly in data_sig and data_bkg:

[Screenshot 2019-10-30 at 14.22.44]

But if I print them after the "add label" step (lines 149-150 in build_model.py), I get empty DataFrames:

data_sig: Empty DataFrame
data_bkg: Empty DataFrame

I suspect the problem is in the function prepare_data in utils.py, but I cannot find it.
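The traceback is what an empty selection produces further down the chain. The following NumPy-only sketch reproduces the same IndexError with made-up events: if a cut removes every row, the array of unique run numbers is empty and indexing its first element fails. The cut expression and max_train_obs_idx = 0 are hypothetical stand-ins; the actual logic of split_train_test in protopipe may differ.

```python
import numpy as np

# Made-up example events: every offset is ~20 deg, as with the
# default zenith pointing discussed later in this issue.
obs_id = np.array([101, 101, 102, 103])
offset = np.array([20.1, 19.8, 20.3, 19.9])

# Assumed form of the fiducial cut on offset (hypothetical, not protopipe's code).
mask = (offset >= 0.1) & (offset <= 0.5)

# Unique run numbers surviving the cut: an empty array here.
obs_ids = np.unique(obs_id[mask])

# Hypothetical stand-in for the index computed in split_train_test.
max_train_obs_idx = 0
try:
    run_max_train = obs_ids[max_train_obs_idx]
    error_message = None
except IndexError as err:
    error_message = str(err)

print(obs_ids.size)   # 0
print(error_message)  # index 0 is out of bounds for axis 0 with size 0
```

A pandas boolean selection that matches no rows behaves the same way: it returns an empty DataFrame, which is consistent with the "Empty DataFrame" output above.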

@HealthyPear
Member

Running a mini-test with 2 gamma runs for energy estimation and 2 gamma runs + 2 proton runs for classification, I managed to get to the end.
Of course, I got warnings related to the low statistics, but I didn't get the same error as you.

I guess the energy model was produced without any problem.

As far as I can see, there are three places where something could go wrong:

  • the use of the merge script,
  • the commands launched,
  • the configuration file.

I propose we go through each of these step by step.

@adonini
Author

adonini commented Oct 30, 2019

I guess the energy model was produced without any problem.

Yes, I had no problems there, and after merging, both files (gamma and proton) look fine. All the columns have values, except for err_est_pos and err_est_dir, which are NaN, but I don't think this is the problem.
I launched the script with the following command:

python adonini/protopipe/protopipe/scripts/build_model.py --config_file adonini/proto_output/configs/protopipe/classifier.yaml --tail

For writing DL1, instead, I used:

python /adonini/protopipe/protopipe/scripts/write_dl1.py \
    --config_file adonini/proto_output/configs/protopipe/analysis.yaml \
    -o adonini/proto_output/data/DL1/for_classification/${output_name}.h5 \
    -i adonini/proto_output/data/DL0/for_classification/ -f "$filename" --tail \
    --estimate_energy True --regressor_dir adonini/proto_output/estimators/energy_regressor

Regarding the configuration file, I only added my directories and left the parameters as you had them.

HealthyPear mentioned this issue on Nov 8, 2019
HealthyPear added the "pipeline applications" and "bug" labels on Nov 8, 2019
@adonini
Author

adonini commented Nov 12, 2019

I found the problem: the value of the offset parameter for divergent data. This parameter is calculated as
offset = angular_separation(run_array_direction[0], run_array_direction[1], reco_result.az, reco_result.alt)
but in divergent mode there is no "array_direction", so by default it is set to 90 deg (i.e. pointing at the zenith). Thus the values I obtained for the offset parameter are around 20 deg, since in the MCs the source is simulated at a zenith angle of 20 deg.
In the configuration file classifier.yaml, the cut on this parameter selects values between 0.1 and 0.5, so in my case no events can pass it.
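To make these numbers concrete, here is a minimal sketch in plain Python. The angular_separation below is a standalone re-implementation of the Vincenty great-circle formula (the same formula astropy uses), not the function protopipe actually calls; the pointing and reconstructed directions are made-up values, and the offset cut is assumed to be an inclusive range in degrees.

```python
import math


def angular_separation(lon1, lat1, lon2, lat2):
    """Great-circle distance in radians between two (lon, lat) points in radians.

    Standalone Vincenty-formula sketch; stands in for the real library function.
    """
    sdlon = math.sin(lon2 - lon1)
    cdlon = math.cos(lon2 - lon1)
    slat1, slat2 = math.sin(lat1), math.sin(lat2)
    clat1, clat2 = math.cos(lat1), math.cos(lat2)
    num1 = clat2 * sdlon
    num2 = clat1 * slat2 - slat1 * clat2 * cdlon
    denominator = slat1 * slat2 + clat1 * clat2 * cdlon
    return math.atan2(math.hypot(num1, num2), denominator)


def offset_deg(pointing_az_deg, pointing_alt_deg, reco_az_deg, reco_alt_deg):
    """Offset of the reconstructed direction from the array pointing, in degrees."""
    return math.degrees(
        angular_separation(
            math.radians(pointing_az_deg), math.radians(pointing_alt_deg),
            math.radians(reco_az_deg), math.radians(reco_alt_deg),
        )
    )


def passes_offset_cut(offset, low=0.1, high=0.5):
    """Assumed form of the classifier.yaml cut: low <= offset <= high (deg)."""
    return low <= offset <= high


# Hypothetical event reconstructed close to a source at Zd = 20 deg (alt = 70 deg).
reco_az, reco_alt = 180.0, 69.8

# Default pointing at the zenith, as when array_direction is missing:
# the offset is ~20 deg for any azimuth, so the event fails the cut.
bad = offset_deg(0.0, 90.0, reco_az, reco_alt)

# Pointing at the simulated source direction: the offset is ~0.2 deg and passes.
good = offset_deg(180.0, 70.0, reco_az, reco_alt)

print(f"{bad:.2f} {passes_offset_cut(bad)}")    # 20.20 False
print(f"{good:.2f} {passes_offset_cut(good)}")  # 0.20 True
```

With the zenith default, the offset simply equals each event's reconstructed zenith angle, which is why every event in the divergent MC ends up near 20 deg and outside the cut.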

HealthyPear added this to Low priority in Bugs and wrong behaviours on Jan 31, 2020
HealthyPear moved this from Low priority to Needs triage in Bugs and wrong behaviours on Jan 31, 2020
HealthyPear moved this from Needs triage to Low priority in Bugs and wrong behaviours on Feb 19, 2020
@HealthyPear
Member

Hi Alice,

Can you confirm that this is fixed in your analysis just by modifying the model-building configuration (i.e. only the corresponding YAML file)?

If so, this is not really a bug, and we can simply add a section to the docs to let people know.

@adonini
Author

adonini commented Jun 8, 2020

Hi, yes, it's not a bug; it's just the missing array_direction in divergent mode. Actually, modifying the YAML file is not a good idea, because it causes problems later in the performance calculation. For now I hard-coded the value of the array pointing in the code, but simtel makes it possible to define an array_direction when the telescopes point in different directions, so new simulations should not have this problem. In any case, we opened PR #38 about this.

@HealthyPear
Member

Perfect! This helps clear things up.

So (I'm also adding @vuillaut, who opened that PR, to the discussion), if I understand correctly:

  • the PR should solve this issue, even if indirectly
  • it is needed anyway for the new simulations (I assume you mean simulations dedicated to divergent pointing; if that is not already the case, should the same PR take this into account?)

@HealthyPear
Member

FINAL UPDATE

This is now solved by simtel for newer divergent simulations.

Pipeline applications automation moved this from To do to Done Oct 22, 2020
Bugs and wrong behaviours automation moved this from Low priority to Closed Oct 22, 2020

2 participants