This repository was archived by the owner on Jan 19, 2025. It is now read-only.
feat(parser): differentiate between required and optional parameters using statistical hypothesis testing #913
Closes #912.
Closes #842.
Closes #828.
Closes #822.
Summary of Changes
By default we want to make a parameter constant, provided there is only one value and it's a literal. This part is unchanged.
If we cannot make the parameter constant, we next try to make it required. If the most common value is not a literal, we always do so. Otherwise, we compare how often the most common and the second most common values occur. Our null hypothesis is that the two values are equally likely to be chosen. Unless this hypothesis is rejected, we make the parameter required; if it is rejected, we make the parameter optional. We reject the hypothesis if the probability of observing the actual counts, or an even more extreme difference, under the null hypothesis is at most 5%.
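The test described above is a two-sided binomial test with H0: p = 0.5 on the counts of the two most common values. A minimal sketch of that decision (using only the standard library; the function names `binomial_two_sided_p` and `should_be_optional` are illustrative, not the identifiers used in this PR):

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Two-sided p-value for observing k successes in n trials
    under H0: p = 0.5 (both values equally likely)."""
    # Under p = 0.5 the distribution is symmetric around n / 2,
    # so "at least as extreme" means at least as far from n / 2.
    dist = abs(k - n / 2)
    total = 0.0
    for i in range(n + 1):
        if abs(i - n / 2) >= dist - 1e-12:
            total += comb(n, i) * 0.5 ** n
    return min(total, 1.0)


def should_be_optional(most_common: int, second_most_common: int,
                       significance_level: float = 0.05) -> bool:
    """Reject H0 ("the two most common values occur equally often")
    when the p-value is at most the significance level; only then do
    we give the parameter a default and make it optional."""
    n = most_common + second_most_common
    return binomial_two_sided_p(most_common, n) <= significance_level
```

For example, 60 vs. 40 occurrences is not quite significant at the 5% level (p ≈ 0.057), so the parameter stays required, while 65 vs. 35 is significant and the parameter becomes optional.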
On the `sklearn` data this produces only slightly different results than the previous approach: we now make a few very rarely used parameters (on very rarely used functions) required.
The new approach is quite a bit slower than the old one: generating annotations for `sklearn` now takes around 17s compared to the previous 8s. However, the decision now rests on a statistical basis rather than the previous ad-hoc heuristic. The significance level (currently 5%) might be adjusted after more practical testing.