Better handling of model features #90
Comments
This may be a good reason to look at aict-tools for that part. It already has the input features fully configurable: https://github.com/fact-project/aict-tools/blob/master/examples/config_energy.yaml As for the output of features generated by write_dl2, that will be replaced by the ctapipe DL2Writer (or whatever we call it), and the philosophy will be similar to the DL1 files: always compute and store all parameters, so no configuration should be needed.
Yes, this issue is of course related to the current implementation provided by protopipe.mva, and is meant to make protopipe easier to use from 0.5.0 onwards. My initial intention is to allow the pipeline to host a number of libraries for ML.
Yes, these are no problem; here I refer to the model features, i.e. the parameters used to train the model(s).
I saw the aict-tools config, but there they use simple, unique DL1/DL2a variables (I have no idea about more complex choices and I would need time to play with it, which is why I first want to provide an easy solution with what we currently have). What I am talking about is handling features "anonymously", so that I do not have to worry about reading more complex analytical functions like e.g.
Currently the modeling features,
- are defined through the configuration files (either regressor.yaml or classifier.yaml)
- when the appropriate classes in protopipe.mva read them, they pass through protopipe.mva.utils.prepare_data, which adds modified versions of the basic DL1/DL2a variables into the dataframes (such as e.g. log10 of variables or more complex analytical combinations)
- most importantly, they are hardcoded into write_dl2.py
As they stand, these 3 steps make it difficult, if not annoying, and error-prone to experiment with different features.
My current idea is to make a dictionary - open to the user through the documentation - where all possible features and new ones (so this dictionary would be open-ended) are mapped to integers.
So something like,
With this in place, the user would specify the features in the configuration files as a list of integers. The DL2 script would then read that list and use the dictionary to map the features unambiguously in the estimation section.
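To make the idea concrete, here is a minimal sketch of such an integer-to-feature dictionary. Everything in it is hypothetical: the name FEATURE_REGISTRY, the helper build_feature_matrix, the integer IDs and the feature formulas are placeholders for illustration and not what protopipe currently implements. It also uses plain Python dictionaries instead of dataframes to stay self-contained.

```python
import math

# Hypothetical registry: integer ID -> (feature name, function of one event).
# IDs, names and formulas are illustrative only, not protopipe's actual features.
FEATURE_REGISTRY = {
    0: ("log10_intensity", lambda ev: math.log10(ev["intensity"])),
    1: ("width", lambda ev: ev["width"]),
    2: ("length", lambda ev: ev["length"]),
    3: ("width_over_length", lambda ev: ev["width"] / ev["length"]),
}


def build_feature_matrix(events, feature_ids):
    """Return (names, rows) for the requested feature IDs, preserving their order."""
    unknown = [i for i in feature_ids if i not in FEATURE_REGISTRY]
    if unknown:
        raise KeyError(f"Unknown feature IDs: {unknown}")
    names = [FEATURE_REGISTRY[i][0] for i in feature_ids]
    rows = [[FEATURE_REGISTRY[i][1](ev) for i in feature_ids] for ev in events]
    return names, rows


# In this sketch the list of integers stands in for what would be read
# from regressor.yaml or classifier.yaml.
feature_ids = [0, 3]
events = [
    {"intensity": 100.0, "width": 0.1, "length": 0.4},
    {"intensity": 1000.0, "width": 0.2, "length": 0.5},
]
names, rows = build_feature_matrix(events, feature_ids)
```

Since both the training step and the DL2 script would call the same lookup with the same list of integers, the feature definitions would live in one place only, and adding a new feature would mean adding one entry to the dictionary.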