Find a better solution for configuration #100

Open
brenmous opened this issue Jul 2, 2020 · 2 comments

Comments

@brenmous
Collaborator

brenmous commented Jul 2, 2020

Currently UncoverML is controlled by a YAML file that gets read into a Config object, with the various key: value pairs set as attributes.

This object has gotten pretty complex and there are a lot of dependencies between the attributes. It's also the biggest cause of tests breaking - new attributes get added or existing ones get modified, and then they no longer exist on the config object in certain execution paths where they were previously being read/checked. It would be great to streamline this.

YAML is also an issue. It's really easy to make mistakes with: it's very syntax sensitive and small typos can lead to confusing errors. It also makes it hard to verify that the user has provided the correct values for the desired workflow. And the biggest issue (in my opinion) is that parameter name typos aren't handled. The user might think they've provided an optional parameter to activate a feature, but the key is misspelled. Because the YAML file is parsed by looking up parameters by key, that parameter never gets set and the related processing doesn't occur. If that doesn't cause an error (often the case for optional features/parameters), the user won't realise.
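
One way to catch misspelled keys, whatever the file format ends up being, is to reject any key that is not in a known set before the lookups happen. The sketch below is only an illustration: `KNOWN_KEYS`, `check_keys`, and the key names are made up for the example and are not UncoverML's actual parameters. It takes the dict that a YAML loader would return and uses the standard library's `difflib` to suggest the closest valid name.

```python
import difflib

# Hypothetical parameter names, for illustration only.
KNOWN_KEYS = {"features", "learning", "output", "validation"}


def check_keys(raw: dict) -> None:
    """Raise ValueError if `raw` has any top-level key not in KNOWN_KEYS.

    Assumes all keys are strings.
    """
    problems = []
    for key in sorted(set(raw) - KNOWN_KEYS):
        # Suggest the closest known name, if one is reasonably similar.
        close = difflib.get_close_matches(key, sorted(KNOWN_KEYS), n=1)
        hint = f" (did you mean '{close[0]}'?)" if close else ""
        problems.append(f"unknown parameter '{key}'{hint}")
    if problems:
        raise ValueError("; ".join(problems))
```

With this in place, a file containing `learnign:` instead of `learning:` fails loudly at load time rather than silently skipping the feature.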

Another concern is that the Config object contains state - it owns the FeatureSet and TransformSet objects. These contain the paths to covariate data and covariate statistics that are used for applying transforms.

I've been considering the Python module route. That is, have a config.py module the user is expected to modify. The parameter names are baked in as attributes so there's no concern about parameter name typos. It also gets around a lot of YAML's annoying syntax issues. However, I'm open to any solutions that keep things simple and solve the mentioned issues.
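
To make the config.py idea concrete, here is a minimal sketch. It assumes the parameters are declared on a frozen dataclass rather than as bare module-level variables, since a misspelled module-level assignment would just create a new variable without complaint. The `Config` fields shown (`covariate_paths`, `output_dir`, `n_jobs`) are hypothetical and are not UncoverML's real parameters.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Config:
    # Hypothetical parameters, for illustration only.
    covariate_paths: Tuple[str, ...]
    output_dir: str = "out"
    n_jobs: int = 1

    def __post_init__(self):
        # Validate values once, at construction time.
        if not self.covariate_paths:
            raise ValueError("at least one covariate path is required")
        if self.n_jobs < 1:
            raise ValueError("n_jobs must be >= 1")


# What a user-edited config.py might contain:
config = Config(covariate_paths=("covariates/a.tif",), n_jobs=4)
```

A misspelled keyword such as `n_jobz=4` raises a `TypeError` when the object is constructed, and because the dataclass is frozen, assigning to an attribute afterwards raises as well.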

This is a laborious task as just about everything in UncoverML touches the Config object. It also means extracting the stateful FeatureSet and TransformSet.
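
As a rough picture of what extracting the state could look like: the settings become an immutable value, and the objects that accumulate data during a run live in a separate container built from it. Everything here is a stand-in. `Settings`, `RunState`, `build_state`, and the two `...Sketch` classes are hypothetical names, and the real FeatureSet and TransformSet are more involved than these placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Settings:
    """Immutable user settings (hypothetical fields)."""
    covariate_paths: Tuple[str, ...]


@dataclass
class FeatureSetSketch:
    """Placeholder for FeatureSet: holds paths to covariate data."""
    paths: Tuple[str, ...]


@dataclass
class TransformSetSketch:
    """Placeholder for TransformSet: holds statistics used by transforms."""
    stats: Dict[str, float] = field(default_factory=dict)


@dataclass
class RunState:
    """Mutable state for one run, kept apart from the settings."""
    features: FeatureSetSketch
    transforms: TransformSetSketch


def build_state(settings: Settings) -> RunState:
    # State is derived from the settings but never stored on them.
    return RunState(
        features=FeatureSetSketch(paths=settings.covariate_paths),
        transforms=TransformSetSketch(),
    )
```

Code that only needs parameters would take a `Settings`, and code that applies transforms would take the `RunState`, so tests could construct either one without the other.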

@bluetyson
Contributor

Yes, this one is hard. As we have seen, it's very easy to make mistakes, even for the experienced. :)

@bluetyson
Contributor

A thought would be a GUI, or a website, that generates config files, or at least the important skeletons. Anything in config.py that can be automated would of course be good, too.
