
support for truncated normal distribution #188

Merged
merged 11 commits on Oct 18, 2021

Conversation

@pbalapra (Contributor)

Adds support for a truncated normal distribution for NormalIntegerHyperparameter and NormalFloatHyperparameter with lower and upper bounds.
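
For illustration, a hedged usage sketch of the feature, based on the constructor calls that appear in the tests later in this thread (argument order and names are assumed and may differ in the released API):

    from ConfigSpace.hyperparameters import (
        NormalFloatHyperparameter,
        NormalIntegerHyperparameter,
    )

    # Normal(mu=0, sigma=1), truncated to the interval [-3, 3]
    nf = NormalFloatHyperparameter("nf", 0, 1, lower=-3, upper=3)
    ni = NormalIntegerHyperparameter("ni", 0, 1, lower=-3, upper=3)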

@Deathn0t (Contributor) commented Sep 1, 2021

Hello @mfeurer, what do you think about this PR? We found it extremely useful to have hyperparameters with a normal distribution and boundary constraints.

@eddiebergman (Contributor) left a comment

Hi @pbalapra, @Deathn0t,

These changes look good and don't intrude on any current functionality. However, I think tests are needed to ensure this works correctly in all cases. Some interesting cases that should be tested:

  • How do both of these act when upper == lower? Are the bounds inclusive or exclusive?
  • I assume from your usage that the quantization and the transformation to uniform work, but we would need tests to confirm that this functionality will always work as intended.

I think the main docstring for both classes could do with a simple line describing that bounds can be set. You can see how the docstrings are rendered here.

@Deathn0t (Contributor)

Hello @eddiebergman,

I think I took all of your comments into consideration. I added the tests in the test_hyperparameters.py file. All tests pass for me with pytest. Let me know if everything is OK.

@codecov (bot) commented Sep 14, 2021

Codecov Report

Merging #188 (f4494e0) into master (5e8acfd) will decrease coverage by 1.88%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #188      +/-   ##
==========================================
- Coverage   68.22%   66.33%   -1.89%     
==========================================
  Files          18       17       -1     
  Lines        1775     1613     -162     
==========================================
- Hits         1211     1070     -141     
+ Misses        564      543      -21     
Impacted Files (Coverage Δ):
ConfigSpace/read_and_write/irace.py


@eddiebergman (Contributor)

Hi @Deathn0t,

All looks good to me, thanks for adding the tests! The only remaining small change is that the pre-commit check failed. This is just a code-style checker, for which we use flake8.

You can either fix it manually and push (I'll rerun the tests and you can see if anything else needs fixing), or run the check locally:

# Using pre-commit
pip install pre-commit
pre-commit run --all-files

You can also just use a flake8 linter if it's readily available in your editor; the results should be the same.

@mfeurer if you want to have a final look, please do, otherwise I'm happy with the changes.

@mfeurer (Contributor) left a comment

Hey, thanks a lot for the PR. I also had a look and think there are a few things missing:

  • a test for sampling from the truncated normal distribution
  • the get_neighbors functions need to be adapted to take the bounds into account as well
  • please also see my inline comments

    self.assertEqual(
        "param, Type: NormalFloat, Mu: 5.0 Sigma: 10.0, Range: [0.1, 10.0], Default: 5.0, on log-scale, Q: 0.1", str(f6))

    self.assertNotEqual(f1, f2)

Contributor:

This is now duplicated.

Contributor:

I'm sorry, I don't see the duplication? I followed the previous test cases with f4 and f4_, which were already there.

Deathn0t and others added 2 commits September 15, 2021 09:13
Co-authored-by: Matthias Feurer <lists@matthiasfeurer.de>
@Deathn0t (Contributor)

I added this test for the sampling:

    def test_sample_NormalFloatHyperparameter_with_bounds(self):
        hp = NormalFloatHyperparameter("nfhp", 0, 1, lower=-3, upper=3)

        def actual_test():
            rs = np.random.RandomState(1)
            counts_per_bin = [0 for i in range(11)]
            for i in range(100000):
                value = hp.sample(rs)
                index = min(max(int((np.round(value + 0.5)) + 5), 0), 9)
                counts_per_bin[index] += 1

            self.assertEqual([0, 0, 0, 2184, 13752, 34078, 34139, 13669,
                              2178, 0, 0], counts_per_bin)

            self.assertIsInstance(value, float)
            return counts_per_bin

        self.assertEqual(actual_test(), actual_test())

@Deathn0t (Contributor) commented Sep 15, 2021

Hello @mfeurer and @eddiebergman, I think I have addressed all the comments.

While editing the get_neighbors function I found it quite inconsistent, especially when comparing it to other types of hyperparameters. In the case of normal hyperparameters, isn't it strange to have a "neighbor" based on the variance of the distribution? For example, the neighbor can be a value outside the bounds returned by the to_uniform function.

I was not able to find any documentation of the expected behaviour of this function.

@Deathn0t (Contributor)

Hello, any update on this @eddiebergman @mfeurer?

@eddiebergman (Contributor)

> Hello, any update on this @eddiebergman @mfeurer?

Hi @Deathn0t,

Sorry for the delay, and thanks for pinging us again on it. I reviewed the PR and I am happy with the changes. @mfeurer will be available later this week or next, and I don't want to merge without a final look-over by him. Apologies again for the delay.

In the meantime I dug a little deeper, and there are some discussion points if you'd like to have a say in how we handle the inconsistencies you brought to our attention.

> While editing the get_neighbors function I found it quite inconsistent, especially when comparing it to other types of hyperparameters. In the case of normal hyperparameters, isn't it strange to have a "neighbor" based on the variance of the distribution? For example, the neighbor can be a value outside the bounds returned by the to_uniform function.
>
> I was not able to find any documentation of the expected behaviour of this function.

  • get_neighbors - Seems to return n parameter values that are neighbors of the current value, by some definition of neighbor or closeness.
    • This function for NormalFloatHyperparameter, as it stands, has an odd definition of neighbors: it just creates a normal distribution centered on whatever value is passed and then draws n neighbors from that distribution:
      neighbours = [rs.normal(value, self.sigma) for _ in range(n)]
    • I'm not sure of my own definition, but to me it would make more sense to sample from some reduced sigma, i.e.
      neighbours = [rs.normal(value, self.sigma / 3) for _ in range(n)], where 3 is arbitrary and a more reasonable value should be used.
    • What would your definition of a neighbor be?
    • The bounding method you use creates a non-normal distribution. For example, consider the case where lower and upper are close to value (the mean) but self.sigma is quite wide. This bounding would result in a large number of neighbors with the same value as lower or upper. I don't really know how to tackle this, to be honest. Would users still expect neighbors to be normally distributed within these bounds, or should they be aware that with tight bounds most of the neighbors will land on the boundary?
                new_value = rs.normal(value, self.sigma)
                if self.lower is not None and self.upper is not None:
                    new_value = min(max(new_value, self.lower), self.upper)

  • to_uniform - Well, converts a normally distributed parameter to a uniform one.
    • You're right, NormalFloatHyperparameter.get_neighbors could return a value outside of the UniformFloatHyperparameter that is created by to_uniform. While I don't think this is an issue for our own use cases (maybe?), I can see how it might cause unexpected behaviors.
    • I'm not sure there's a fully sound solution to this, but please correct me and make suggestions if I am wrong here.
      • A normal distribution is technically unbounded in the values it can return (unless bounds are specified via parameters, as you did).
      • A uniform distribution is bounded (or at least it is in the way we have it implemented).
      • There is no fully principled way to derive a boundary from the normal distribution, so how should the bounds of the uniform distribution be set?
      • Without a formally correct answer, in practice we have to pick some value. I believe mu +- 3*sigma is reasonable (a small sketch of this fallback follows below), but if you have other suggestions, do let us know.
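
A minimal sketch of that fallback (illustrative only, not ConfigSpace's actual implementation; the function name is made up): reuse explicit bounds when they were given, otherwise fall back to mu +/- 3*sigma.

    def uniform_bounds_from_normal(mu, sigma, lower=None, upper=None):
        # Illustrative helper, not part of ConfigSpace.
        if lower is not None and upper is not None:
            # Explicit truncation bounds were given: reuse them directly.
            return lower, upper
        # Unbounded normal: fall back to the (admittedly arbitrary) mu +/- 3*sigma.
        return mu - 3 * sigma, mu + 3 * sigma

    # Example: an unbounded Normal(mu=5, sigma=10) maps to the range (-25, 35).
    print(uniform_bounds_from_normal(5.0, 10.0))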

@Deathn0t (Contributor)

Hello @eddiebergman @mfeurer, sorry to ping you again about this PR, but do you know when it could be integrated? I am wondering whether I should fork and build a separate wheel or wait for the integration.

@eddiebergman I did not have time to think very deeply about the neighbour generation, but for the truncated normal we could also sample directly from the truncated distribution instead of applying the min(max(...)) clipping, to perhaps avoid duplicate values at the bounds.
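
A hedged sketch of that idea (the helper name and signature are illustrative, not ConfigSpace API): drawing neighbours directly from scipy.stats.truncnorm keeps every sample strictly within the bounds instead of clamping many of them onto the boundary.

    import numpy as np
    from scipy.stats import truncnorm

    def sample_truncated_neighbours(value, sigma, lower, upper, n, rs):
        # truncnorm takes its bounds in units of standard deviations from loc
        a = (lower - value) / sigma
        b = (upper - value) / sigma
        return truncnorm.rvs(a, b, loc=value, scale=sigma, size=n, random_state=rs)

    rs = np.random.RandomState(1)
    # Neighbours of value=2.5 with sigma=1 on [-3, 3]: none are clamped to exactly 3.
    print(sample_truncated_neighbours(2.5, 1.0, -3, 3, n=5, rs=rs))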

@eddiebergman (Contributor)

Hi @Deathn0t,

Apologies once again, @mfeurer has been quite busy and will not be available for some time.

I'm going to merge this, seeing as the tests pass. I will also raise an issue with the remaining points, as I think there may be some unexpected outputs from get_neighbors and to_uniform.
