Scientific notation for numbers in parameter grids not parsed correctly #318

Closed
desilinguist opened this issue May 13, 2016 · 4 comments

@desilinguist
Member

desilinguist commented May 13, 2016

Currently, SKLL does not correctly handle numbers written in scientific notation in parameter grids. The main problem is how PyYAML recognizes floating-point numbers by default: we use yaml.load to read in the parameter grids when parsing the config file, and its default float resolver only accepts certain spellings. To wit, consider the following example, which expresses the same parameter grid with and without scientific notation:

```python
>>> import yaml
>>> yaml.load("[{'C': [0.01, 0.1, 1.0, 10, 100]}]")
[{'C': [0.01, 0.1, 1.0, 10, 100]}]
>>> yaml.load("[{'C': [1e-2, 1e-1, 1.0, 1e1, 1e2]}]")
[{'C': ['1e-2', '1e-1', 1.0, '1e1', '1e2']}]
```

Basically, you have to write the list above in one particular form of scientific notation, with a decimal point in the mantissa and an explicit sign on the exponent:

```python
>>> yaml.load("[{'C': [1.0e-2, 1.0e-1, 1.0, 1.0e+1, 1.0e+2]}]")
[{'C': [0.01, 0.1, 1.0, 10.0, 100.0]}]
```
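
To pin down what that particular form is: PyYAML's default resolver uses the YAML 1.1 float pattern, which needs both a decimal point in the mantissa and an explicit + or - on the exponent. The quick check below runs a few spellings through yaml.safe_load, used instead of bare yaml.load because current PyYAML releases require an explicit Loader argument for yaml.load (both share the same implicit resolvers). It assumes a recent PyYAML and Python 3.6+ for the f-strings.

```python
import yaml

# Spellings of 0.01 and 100 to run through PyYAML's default (YAML 1.1) resolver.
SPELLINGS = ["1e-2", "1.0e-2", "1e2", "1.0e2", "1.0e+2"]

# safe_load uses the same implicit resolvers as yaml.Loader, so the way each
# scalar gets typed is the same as with the yaml.load call in the issue.
results = {text: yaml.safe_load(text) for text in SPELLINGS}

for text, value in results.items():
    # Only spellings with a '.' AND a signed exponent come back as floats;
    # the rest fall through to plain strings.
    print(f"{text:>7} -> {value!r} ({type(value).__name__})")
```

With PyYAML, only 1.0e-2 and 1.0e+2 come back as floats; 1e-2, 1e2 and even 1.0e2 stay strings, which matches the transcript above.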

So we need to make PyYAML do the right thing by explicitly patching its floating-point resolver (e.g., as shown here), since it's clear that PyYAML won't fix the problem itself.
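
The recipe referenced above isn't reproduced in this issue, so here is a minimal sketch of the patching approach, assuming a current PyYAML release. SciFloatLoader, SCI_FLOAT and load_param_grids are hypothetical names, not SKLL code: the loader subclasses yaml.SafeLoader, gives itself a private copy of the resolver table, and registers one extra implicit resolver for the exponent forms the YAML 1.1 pattern rejects.

```python
import re

import yaml


class SciFloatLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader variant that also resolves 1e-2, 1e1, 1.0e2 as floats."""


# Give the subclass its own copy of the resolver table so that registering a
# new resolver below does not leak into yaml.SafeLoader itself.
SciFloatLoader.yaml_implicit_resolvers = {
    first_char: list(resolvers)
    for first_char, resolvers in yaml.SafeLoader.yaml_implicit_resolvers.items()
}

# Scientific-notation forms the YAML 1.1 float pattern misses: an exponent
# with no decimal point and/or no explicit sign on the exponent.
SCI_FLOAT = re.compile(
    r"""^[-+]?(?:
        [0-9][0-9_]*(?:\.[0-9_]*)?[eE][-+]?[0-9]+   # 1e-2, 1e1, 1.0e2
      | \.[0-9_]+[eE][-+]?[0-9]+                    # .5e3
    )$""",
    re.VERBOSE,
)

# Appended after the built-in resolvers, so plain ints such as 10 still
# resolve to int and already-valid floats keep their original handling.
SciFloatLoader.add_implicit_resolver(
    "tag:yaml.org,2002:float", SCI_FLOAT, list("-+0123456789.")
)


def load_param_grids(text):
    """Hypothetical helper: parse a parameter-grid string with the patched loader."""
    return yaml.load(text, Loader=SciFloatLoader)
```

Copying yaml_implicit_resolvers before calling add_implicit_resolver keeps the patch local to the subclass, so other code using yaml.SafeLoader is unaffected; appending a resolver rather than replacing the built-in one means integers like 10 still load as int.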

@desilinguist
Member Author

desilinguist commented May 13, 2016

Actually, maybe we can just switch to ruamel.yaml, which is an up-to-date, actively developed replacement for PyYAML that also has a conda package in the defaults channel.

@dan-blanchard
Contributor

Wow, how have I not heard of that? Yes, we should definitely switch away from the abandonware that is PyYAML.

Hmm... people probably consider chardet abandonware too at this point, since I haven't made a release since late 2014...

@desilinguist
Member Author

desilinguist commented May 13, 2016

I can confirm that ruamel_yaml works just fine with scientific notation:

```python
>>> import ruamel_yaml as yaml
>>> yaml.load("[{'C': [1e-2, 1e-1, 1.0, 1e1, 1e2]}]")
[{'C': [0.01, 0.1, 1.0, 10.0, 100.0]}]
```

The only annoying thing is that the conda package is called ruamel_yaml, whereas the PyPI package is called ruamel.yaml.
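
Since the import name differs between the two channels, one way to tolerate both is to try each name in turn. This is a sketch; import_first is a hypothetical helper, not part of ruamel.yaml or SKLL.

```python
import importlib


def import_first(*module_names):
    """Return the first module in module_names that can be imported.

    Hypothetical helper for packages whose import name differs by channel,
    e.g. "ruamel.yaml" (PyPI) versus "ruamel_yaml" (conda).
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not installed under this name; try the next candidate.
            continue
    raise ImportError(
        "none of these modules could be imported: " + ", ".join(module_names)
    )
```

For example, yaml = import_first("ruamel.yaml", "ruamel_yaml") picks up whichever package is installed. Recent ruamel.yaml releases steer users toward the YAML() object API rather than module-level load, so check the installed version's documentation before relying on yaml.load.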

desilinguist self-assigned this on May 13, 2016
desilinguist added this to the 1.2.1 milestone on May 13, 2016
@desilinguist
Member Author

Addressed by #320.
