This repository was archived by the owner on Jan 19, 2025. It is now read-only.
feat(parser): differentiate between required and optional parameters using statistical hypothesis testing #913
Closes #912.
Closes #842.
Closes #828.
Closes #822.
Summary of Changes
By default we want to make a parameter constant, provided there is only one value and it's a literal. This part is unchanged.
If we cannot make the parameter constant, we next try to make it required. If the most common value is not a literal, we always do so. Otherwise, we compare how often the most common and the second most common values occur. Our null hypothesis is that the two values are equally likely to be chosen. Unless this hypothesis is rejected, we make the parameter required; if it is rejected, we make the parameter optional. We reject the hypothesis if the probability of observing the actual counts, or an even more extreme difference, under the null hypothesis is at most 5%.
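The test described above is a two-sided binomial test with H0: p = 0.5 on the counts of the two most common values. A minimal sketch of that decision (using only the standard library; the function names `binomial_two_sided_p` and `should_be_optional` are illustrative, not the identifiers used in this PR):

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Two-sided p-value for observing k successes in n trials
    under H0: p = 0.5 (both values equally likely)."""
    # Under p = 0.5 the distribution is symmetric around n / 2,
    # so "at least as extreme" means at least as far from n / 2.
    dist = abs(k - n / 2)
    total = 0.0
    for i in range(n + 1):
        if abs(i - n / 2) >= dist - 1e-12:
            total += comb(n, i) * 0.5 ** n
    return min(total, 1.0)


def should_be_optional(most_common: int, second_most_common: int,
                       significance_level: float = 0.05) -> bool:
    """Reject H0 ("the two most common values occur equally often")
    when the p-value is at most the significance level; only then do
    we give the parameter a default and make it optional."""
    n = most_common + second_most_common
    return binomial_two_sided_p(most_common, n) <= significance_level
```

For example, 60 vs. 40 occurrences is not quite significant at the 5% level (p ≈ 0.057), so the parameter stays required, while 65 vs. 35 is significant and the parameter becomes optional.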
On the `sklearn` data this produces only slightly different results than the previous approach: we now make a few very rarely used parameters (on very rarely used functions) required.
The new approach is quite a bit slower than the old one: generating annotations for `sklearn` now takes around 17s compared to the previous 8s. However, the decision now rests on a statistical basis rather than the previous ad-hoc heuristic. The significance level (currently 5%) might be adjusted after more practical testing.