Fixed DT's nondeterministic behavior

Pre-release

@detsutut detsutut released this 21 Oct 14:21
· 19 commits to master since this release

As described in scikit-learn/scikit-learn#8443, the scikit-learn implementation of the decision tree algorithm is not deterministic by default, contrary to what one might expect. This is due to a design choice [here](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_splitter.pyx#L381-L401): even if max_features = n_features, the splitter still randomly samples up to max_features features, so the order in which features are evaluated can change from run to run and ties between equally good splits may be broken differently.
To address this unexpected behavior, the internal random state of the DecisionTreeClassifier in Araucana has been fixed to 1. The global randomness of Araucana (e.g., during oversampling) can still be controlled with the 'seed' parameter.
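
The effect of fixing the random state can be illustrated with a small standard-library sketch. It is not scikit-learn's or Araucana's actual code: the function `pick_split_feature` and the `gains` values are hypothetical, and the sketch only assumes that features are visited in a shuffled order and that a later feature replaces the current best only when it is strictly better.

```python
import random


def pick_split_feature(gains, rng):
    """Visit features in a shuffled order; keep the first one with the best gain."""
    order = list(range(len(gains)))
    rng.shuffle(order)  # random visiting order, even though every feature is considered
    best_feature, best_gain = None, float("-inf")
    for feature in order:
        # Strict comparison: among tied features, the one visited first wins.
        if gains[feature] > best_gain:
            best_feature, best_gain = feature, gains[feature]
    return best_feature


# Hypothetical split gains: features 0 and 2 are tied for the best split.
gains = [0.5, 0.1, 0.5]

# Unseeded generators: the chosen feature may differ between calls.
unseeded = {pick_split_feature(gains, random.Random()) for _ in range(200)}

# Generator seeded with 1 (the value Araucana fixes): same choice every time.
seeded = {pick_split_feature(gains, random.Random(1)) for _ in range(200)}

print("unseeded choices:", sorted(unseeded))
print("seeded choices:", sorted(seeded))
```

In scikit-learn itself, the equivalent step is passing `random_state=1` when constructing `DecisionTreeClassifier`, which pins the feature visiting order and therefore the fitted tree for a given training set.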