Better handling of model features #90

HealthyPear · 2021-01-20T15:09:46Z

Currently the model features

- are defined through the configuration files (either regressor.yaml or classifier.yaml);
- pass through protopipe.mva.utils.prepare_data when the appropriate classes in protopipe.mva read them; this function adds modified versions of the basic DL1/DL2a variables to the dataframes (e.g. log10 of variables or more complex analytical combinations);
- most importantly, are hardcoded into write_dl2.py.

As they stand, these 3 steps make it difficult, annoying, and error-prone to experiment with different features.

My current idea is to create a dictionary, exposed to the user through the documentation, in which all available features are mapped to integers. The dictionary would be open-ended, so new features could be added to it.

So something like,

1: hillas_width
2: hillas_intensity
....
14: some crazy function

With this, the user would specify the features in the configuration files as a list of integers. The DL2 script would then read the same list, which maps the features unambiguously to the estimation section.
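As a minimal sketch of the proposal (not protopipe's actual code), the registry could be a plain dictionary and the configuration a list of integer IDs. The names `FEATURE_NAMES`, `resolve_features`, the `"features"` config key, and feature ID 3 are hypothetical and only serve the illustration; IDs 1 and 2 follow the example above.

```python
# Hypothetical registry: integer ID -> feature name.
# IDs 1 and 2 follow the example in this issue; ID 3 is invented here.
FEATURE_NAMES = {
    1: "hillas_width",
    2: "hillas_intensity",
    3: "log10_hillas_intensity",
}


def resolve_features(feature_ids):
    """Translate a list of integer IDs from a config file into feature names."""
    unknown = [i for i in feature_ids if i not in FEATURE_NAMES]
    if unknown:
        # Fail early instead of silently training on the wrong columns.
        raise KeyError(f"Unknown feature IDs: {unknown}")
    return [FEATURE_NAMES[i] for i in feature_ids]


# What the parsed configuration might contain (assumed key name).
config = {"features": [2, 1]}
print(resolve_features(config["features"]))
```

Both the training step and the DL2 script could call the same lookup, so the order and identity of the features would be defined in one place only.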


kosack · 2021-01-20T15:15:13Z

May be a good reason to look at aict-tools for that part. It already has the input features fully configurable.

https://github.com/fact-project/aict-tools/blob/master/examples/config_energy.yaml

For the output of features generated by write_dl2, that will be replaced by the ctapipe DL2Writer or whatever we call it, and the philosophy will be similar to the DL1 files: compute and store all parameters always, so no configuration should be needed.

HealthyPear · 2021-01-20T15:19:08Z

> May be a good reason to look at aict-tools for that part. It already has the input features fully configurable.
>
> https://github.com/fact-project/aict-tools/blob/master/examples/config_energy.yaml

Yes, this issue concerns the current implementation in protopipe.mva; the goal is to make protopipe easier to use from 0.5.0 onwards.

My initial intention is to allow the pipeline to host a number of libraries for ML.
The only requirement for this would be a common configuration system and at least 1 common data format (for example, the pickled scikit-learn files we use now).

> For the output of features generated by write_dl2, that will be replaced by the ctapipe DL2Writer or whatever we call it, and the philosophy will be similar to the DL1 files: compute and store all parameters always, so no configuration should be needed.

Yes, those are no problem. Here I am referring to the model features, i.e. the parameters used to train the model(s).

HealthyPear · 2021-01-20T15:22:04Z

I saw the aict-tools config, but it uses simple, unique DL1/DL2a variables. I have no idea how it handles more complex choices, and I would need time to experiment with it, which is why I want to provide an easy first solution with what we currently have.

What I am talking about is handling features "anonymously", so that I do not have to worry about reading more complex analytical functions such as atan2(cog_y - dir_y, cog_x - dir_x) or log10(Width*Length/Size).
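One way to read "anonymous" handling, sketched here under assumptions: each integer ID points to a callable, so the code that consumes the features never has to read or parse the formula itself. The registry name `DERIVED_FEATURES`, the helper `compute_features`, the IDs 13 and 14, and the event keys (`cog_x`, `dir_x`, `width`, ...) are all hypothetical; only the two formulas come from the comment above.

```python
import math

# Hypothetical registry: integer ID -> (label, function of one event).
# An "event" is a plain dict of already-computed parameters; the key
# names used below are assumptions made for this sketch.
DERIVED_FEATURES = {
    13: (
        "atan2(cog_y - dir_y, cog_x - dir_x)",
        lambda ev: math.atan2(ev["cog_y"] - ev["dir_y"], ev["cog_x"] - ev["dir_x"]),
    ),
    14: (
        "log10(width * length / size)",
        lambda ev: math.log10(ev["width"] * ev["length"] / ev["size"]),
    ),
}


def compute_features(event, feature_ids):
    """Return the feature values for one event, in the order of feature_ids."""
    return [DERIVED_FEATURES[i][1](event) for i in feature_ids]


# Made-up numbers, only to show the call.
event = {
    "cog_x": 1.0, "cog_y": 1.0, "dir_x": 0.0, "dir_y": 0.0,
    "width": 0.1, "length": 0.5, "size": 50.0,
}
print(compute_features(event, [13, 14]))
```

The label stored next to each callable is only for documentation and plots; the caller works purely with the IDs.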

HealthyPear added the enhancement, good first issue, and machine learning labels Jan 20, 2021

HealthyPear added this to the v0.5.0 milestone Jan 20, 2021

HealthyPear mentioned this issue Jan 22, 2021

Comparison with CTA-MARS: Energy estimation #92

Closed

12 tasks

HealthyPear linked a pull request Feb 23, 2021 that will close this issue

Improve models generation #96

Merged

9 tasks

HealthyPear closed this as completed in #96 Apr 15, 2021
