Better handling of model features #90

Closed
HealthyPear opened this issue Jan 20, 2021 · 3 comments · Fixed by #96
Labels
enhancement New feature or request good first issue Good for newcomers machine learning
Comments

@HealthyPear
Member

Currently, the model features:

  1. are defined in the configuration files (either regressor.yaml or classifier.yaml);

  2. are read by the appropriate classes in protopipe.mva and passed through protopipe.mva.utils.prepare_data, which adds modified versions of the basic DL1/DL2a variables to the dataframes (e.g. the log10 of a variable, or more complex analytical combinations);

  3. most importantly, are hardcoded in write_dl2.py.

As they stand, these 3 steps make experimenting with different features difficult, if not annoying, and error-prone.

My current idea is to make a dictionary, exposed to the user through the documentation, in which every available feature is mapped to an integer. The dictionary would be open-ended, so new features could be added to it.

So something like,

1: hillas_width
2: hillas_intensity
....
14: some crazy function

The user would then specify the features in the configuration files as a list of integers, and the DL2 script would read that same list, since it maps the features unambiguously to its estimation section.
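The integer mapping described above could be sketched roughly as follows. This is only an illustration of the idea, not protopipe code: the `FEATURES` registry, the helper names, the event keys, and the third entry are all hypothetical.

```python
import math

# Hypothetical registry: integer ID -> (label, function of one event record).
# The labels and the record keys are illustrative, not protopipe's actual names.
FEATURES = {
    1: ("hillas_width", lambda ev: ev["hillas_width"]),
    2: ("hillas_intensity", lambda ev: ev["hillas_intensity"]),
    3: ("log10_hillas_intensity", lambda ev: math.log10(ev["hillas_intensity"])),
}


def resolve_features(feature_ids):
    """Return the (label, function) pairs for the requested IDs, in order."""
    unknown = [i for i in feature_ids if i not in FEATURES]
    if unknown:
        raise KeyError(f"Unknown feature IDs: {unknown}")
    return [FEATURES[i] for i in feature_ids]


def build_feature_vector(event, feature_ids):
    """Compute the model input for one event from a list of feature IDs."""
    return [func(event) for _, func in resolve_features(feature_ids)]


# Stand-in for the list of integers read from regressor.yaml or classifier.yaml.
configured_ids = [1, 3]
event = {"hillas_width": 0.05, "hillas_intensity": 1000.0}

labels = [label for label, _ in resolve_features(configured_ids)]
vector = build_feature_vector(event, configured_ids)
```

Because both the training code and the DL2 script would go through the same registry, a feature would only need to be defined once; the integers in the configuration file then select and order the features.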

@HealthyPear HealthyPear added enhancement New feature or request good first issue Good for newcomers machine learning labels Jan 20, 2021
@HealthyPear HealthyPear added this to the v0.5.0 milestone Jan 20, 2021
@kosack
Contributor

kosack commented Jan 20, 2021

May be a good reason to look at aict-tools for that part. It already has the input features fully configurable.

https://github.com/fact-project/aict-tools/blob/master/examples/config_energy.yaml

For the output of features generated by write_dl2, that will be replaced by the ctapipe DL2Writer or whatever we call it, and the philosophy will be similar to the DL1 files: compute and store all parameters always, so no configuration should be needed.

@HealthyPear
Member Author

> May be a good reason to look at aict-tools for that part. It already has the input features fully configurable.
>
> https://github.com/fact-project/aict-tools/blob/master/examples/config_energy.yaml

Yes, this issue of course concerns the current implementation in protopipe.mva; the goal is to make protopipe easier to use from 0.5.0 onwards.

My initial intention is to allow the pipeline to host a number of ML libraries.
The only requirements for this would be a common configuration system and at least 1 common data format (at the moment we use the pickled files from scikit-learn).

> For the output of features generated by write_dl2, that will be replaced by the ctapipe DL2Writer or whatever we call it, and the philosophy will be similar to the DL1 files: compute and store all parameters always, so no configuration should be needed.

Yes, those are not a problem. Here I am referring to the model features, i.e. the parameters used to train the model(s).

@HealthyPear
Member Author

I saw the aict-tools config, but there they use only simple, individual DL1/DL2a variables. I have no idea how it handles more complex choices, and I would need time to play with it, which is why I first want to provide an easy solution with what we currently have.

What I am talking about is handling features "anonymously", so that I do not have to worry about reading more complex analytical functions such as atan2(cog_y - dir_y, cog_x - dir_x) or log10(Width*Length/Size).
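For comparison, one alternative to integer IDs would be to keep such analytical features as expression strings and evaluate them against each event's variables. The sketch below is an assumption-laden illustration, not how protopipe or aict-tools do it: the feature labels `phi` and `log10_WLS`, the `evaluate_feature` helper, and the event values are hypothetical, and it assumes the configuration file is trusted.

```python
import math

# Only these function names are visible to feature expressions.
ALLOWED_FUNCTIONS = {"atan2": math.atan2, "log10": math.log10}

# Hypothetical feature definitions, as they might appear in a configuration file.
FEATURE_EXPRESSIONS = {
    "phi": "atan2(cog_y - dir_y, cog_x - dir_x)",
    "log10_WLS": "log10(Width * Length / Size)",
}


def evaluate_feature(expression, event):
    """Evaluate one feature expression against the variables of one event.

    Note: eval with a restricted namespace is NOT a security sandbox; this
    sketch assumes the expressions come from a trusted configuration file.
    """
    namespace = {"__builtins__": {}, **ALLOWED_FUNCTIONS, **event}
    return eval(expression, namespace)


# Made-up event values, chosen only to make the results easy to check.
event = {
    "cog_x": 1.0, "cog_y": 1.0, "dir_x": 0.0, "dir_y": 0.0,
    "Width": 0.1, "Length": 0.2, "Size": 200.0,
}

features = {
    name: evaluate_feature(expr, event)
    for name, expr in FEATURE_EXPRESSIONS.items()
}
```

With this approach the DL2 script would not need to know what each expression means; it would only evaluate whatever the configuration lists, which is one way to read the "anonymous" handling described above.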

@HealthyPear HealthyPear linked a pull request Feb 23, 2021 that will close this issue
9 tasks