Random updates and refactoring of the combined analysis pipeline #91
Thank you very much @jsitarek for your checks and comments. We already talked via email, but just in case, let me also write here: I implemented all of his comments in this pull request and resolved them. I would like to push a few more commits with refactoring that I came up with while implementing his comments, and also with the implementation of a functionality to create the full-enclosure IRFs. I will let you know once I'm done, and then it will finally be ready to be merged to the master branch. @aleberti, if you have any comments or suggestions on this branch, please let me know.
Thinking more about this, I think it is best if the remaining refactoring goes into this PR, and the full-enclosure IRF, as a new development, goes into a separate PR.
Cheers,
Julian
Hi @jsitarek, OK, thanks. Actually, I was also thinking about that, so I will create another pull request later.
OK, it is done now. There are really only minor changes, so I don't think they need to be reviewed thoroughly. If there is no objection, let me merge it to the master branch.
Hi @YoshikiOhtani, yes, please go ahead and merge.
OK, I will merge it now.
This pull request includes some random updates and refactoring of the combined analysis pipeline. I tested the updated pipeline with a few data samples and it worked. However, I will keep this pull request open until I confirm that the pipeline produces results consistent with the whole Crab sample that I usually analyze. Meanwhile, any comments or feedback would be very much appreciated. Let me assign @aleberti and @jsitarek as reviewers.
Here let me summarize the changes as follows:
### General changes

- Formatted the code with black using the `max-line-length = 88` setting (72 for docstrings, following PEP 8). Constants are now defined in capital letters. Please note that the most recent version of black considers the magic trailing comma by default, but in this pull request I disabled this behavior with the `-C` (or `--skip-magic-trailing-comma`) option to reduce the user flexibility as much as possible.
- Created the `magicctapipe/io` directory with the modules `container.py`, `gadf.py` and `io.py`, and moved some functions into them from `magicctapipe/scripts/lst1_magic/*.py` and `magicctapipe/utils/functions.py`.

### Important changes for users
- Changed the configuration file so that values with units are written as single strings, since `Quantity` can read the unit directly from the string. See the following example: `magic-cta-pipe/magicctapipe/scripts/lst1_magic/config.yaml`, line 79 in 95f311e.
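  For example, a time setting can then be written as a single string that `Quantity` parses on load (a hypothetical entry for illustration; see the referenced line of `config.yaml` for the actual keys):

  ```yaml
  # illustrative only: the unit is part of the value string,
  # e.g. parsed with astropy as Quantity("300 ns")
  window_half_width: "300 ns"
  ```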
- Renamed the option `--train-direction` to `--train-disp`, and also the name of the output RF file from `direction_regressors_*.joblib` to `disp_regressors_*.joblib`.
- Renamed `dist_sum` to `disp_diff_sum`, which is the sum of the angular distances of the DISP candidates.
- Renamed `software_with_any2` to `software`, and `software` to `software_only_3tel`, since the analysis with any2 events would be more standard for the software coincidence.
- Added `disp_diff_mean`, which is the mean of the angular distances of the DISP candidates.
- Changed the options of `lst1_magic_train_rfs.py` from `--input-file-gamma` and `--input-file-bkg` to `--input-dir-gamma` and `--input-dir-proton` so that multiple input MC files produced with different pointing directions can be read.

From here let me describe the other noticeable changes for each script or module:
#### `functions.py`

- Renamed `calculate_angular_distance` to `calculate_off_coordinates` and simplified the functionality to return only the OFF coordinates. Please note that the required arguments have also changed.
- Renamed `check_tel_combination` to `get_stereo_events`, and changed the functionality to apply the quality cuts and return a new data frame of stereo events with the `combo_type` and `multiplicity` parameters.
- Renamed `save_pandas_to_table` to `save_pandas_data_in_table`.
- Moved `get_stereo_events`, `get_dl2_mean` and `save_pandas_data_in_table` to `magicctapipe/io/io.py`.
- Moved `create_gh_cuts_hdu` to `magicctapipe/io/gadf.py`.

#### `lst1_magic_mc_dl0_to_dl1.py`
- Moved the container classes to `magicctapipe/io/containers.py` and adapted the script to import them from the module.

#### `magic_calib_to_dl1.py`
- Applied the same changes as in `lst1_magic_mc_dl0_to_dl1.py`.

#### `lst1_magic_event_coincidence.py`
- Renamed `load_lst_data_file` and `load_magic_data_file` to `load_lst_dl1_data_file` and `load_magic_dl1_data_files`, moved them to `magicctapipe/io/io.py`, and adapted the script to import them from the module.
- Renamed `type_lst_time` to `timestamp_type_lst`, which is the type of the LST timestamp.
- Changed `window_width (= 600 ns)` to `window_half_width (= 300 ns)`, which is the coincidence window half width.
- Changed the script to handle the time parameters (`time_lst`, `time_magic`, `time_offsets`, `window_half_width`) in nanoseconds with the `int64` format, instead of subtracting the UNIX time of an observation day and using the `float64` type. I think this is safer in terms of rounding issues, and now we don't need to round the parameters, which previously happened everywhere in the script. I also believe it improves the readability of the code.

#### `lst1_magic_stereo_reco.py`
- Used the `loc` method instead of `query` to extract a shower event, since it speeds up the process by around 20%.
- Added `core_uncert`, `h_max_uncert` and `is_valid`. See the definition of the reconstructor.

#### `lst1_magic_train_rfs.py`
- Moved `load_train_data_file` to `magicctapipe/io/io.py` and adapted the script to import it from the module.
- Removed `check_feature_importance` and added the functionality to the training functions.
- Removed `get_events_at_random` and added the functionality directly inside `train_event_classifier`.
- Renamed `train_direction_regressor` to `train_disp_regressor`, following the change of the regressor module from `DirectionRegressor` to `DispRegressor`.
- Renamed `load_train_data_file` to `load_train_data_files` and the argument from `input_file` to `input_dir` so that multiple MC files can be read.

#### `lst1_magic_dl1_stereo_to_dl2.py`
- Changed `DirectionRegressor.predict` to the `DispRegressor.predict` and `reconstruct_arrival_direction` functions, to make it clear that `DispRegressor` only trains on and predicts the DISP parameter, while the reconstruction of the arrival direction is done separately with the DISP method. Also, `DirectionRegressor` assumed that the input data was already separated per telescope combination type, but `DispRegressor` does not assume it, so we can easily test training/applying RFs with the `combo_type` parameter without separating the RFs per telescope combination type.

#### `lst1_magic_create_irf.py`
- Renamed `load_dl2_data_file` to `load_mc_dl2_data_file`, moved it to `magicctapipe/io/io.py`, and adapted the script to import it from the module.
- Removed `apply_dynamic_gammaness_cut` and `apply_dynamic_theta_cut`, and added the functionality to `create_irf`.

#### `lst1_magic_dl2_to_dl3.py`
- Removed `calculate_deadc` and added the functionality inside `load_dl2_data_file`.
- Moved `load_dl2_data_file` and `load_irf_files` to `magicctapipe/io/io.py`, and adapted the script to import them from the module.
- Renamed `create_event_list`, `create_gti_table` and `create_pointing_table` to `create_event_hdu`, `create_gti_hdu` and `create_pointing_hdu`, changed the functionality to return the HDUs, moved them to `magicctapipe/io/gadf.py`, and adapted the script to import them from the module.
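As a closing note on the timestamp change in `lst1_magic_event_coincidence.py`, here is a minimal, self-contained sketch (plain Python with illustrative numbers, not pipeline code) of the rounding issue that keeping the times as `int64` nanoseconds avoids:

```python
# A float64 mantissa has 52 bits, so at the scale of a UNIX timestamp
# (~1.6e9 s) its resolution is only a few hundred nanoseconds.
unix_ns = 1_664_377_380_123_456_789   # an int64-style timestamp in ns

as_float_s = unix_ns / 1e9            # converted to float64 seconds
back_to_ns = round(as_float_s * 1e9)  # round-trip through float64

print(back_to_ns != unix_ns)          # True: nanosecond precision was lost

# Integer arithmetic, by contrast, stays exact at any offset:
offset_ns = unix_ns + 300             # e.g. shift by the 300 ns half width
print(offset_ns - unix_ns)            # 300
```

This is why, with the float64 representation, the parameters had to be rounded everywhere in the script, while the integer representation needs no rounding at all.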