feat: auto-fetching #186

Closed
wants to merge 30 commits into from

Berkmann18
Member

@Berkmann18 Berkmann18 commented May 21, 2019

What:
Adds an auto-fetching mechanism accessible via the `fetch` command (which requires a `PRIVATE_TOKEN` environment variable to be set).
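
As a rough illustration of the token requirement, here is a minimal sketch of a guard that could run before any fetching starts. The function name `getPrivateToken` and the error wording are hypothetical and not taken from this PR; the only detail from the description is the `PRIVATE_TOKEN` variable name.

```javascript
// Hypothetical helper: not part of this PR's code.
// Assumption: the token is read from an env-like object such as process.env.
function getPrivateToken(env) {
  const token = (env.PRIVATE_TOKEN || '').trim()
  if (!token) {
    throw new Error(
      'PRIVATE_TOKEN is not set; export it before running the fetch command',
    )
  }
  return token
}
```

Calling `getPrivateToken(process.env)` at the start of the command would fail fast with a readable message rather than partway through a request.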

As noted in all-contributors/all-contributors#18, not all 27 categories can be detected from a GitHub repo alone, so here's what this PR can handle and (roughly) how well:

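| Category | Status |
| --- | --- |
| blog | N/A |
| bug | WIP |
| business | N/A |
| code | TBD |
| content | N/A |
| design | N/A |
| doc | N/A |
| eventOrganizing | N/A |
| example | N/A |
| financial | N/A |
| fundingFinding | N/A |
| ideas | TBD |
| infra | N/A |
| maintenance | N/A |
| platform | N/A |
| plugin | N/A |
| projectManagement | N/A |
| question | TBD |
| review | Nearly done |
| security | WIP |
| talk | N/A |
| test | N/A |
| tool | N/A |
| translation | N/A |
| tutorial | N/A |
| userTesting | N/A |
| video | N/A |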

Why:
To resolve #117 and partly all-contributors/all-contributors#18 (TL;DR: auto adding contributors from a repo).
Re mntnr/name-your-contributors#45

How:
Using name-your-contributors and ac-learn.

Checklist:

  • Documentation
  • Tests
  • Ready to be merged

Added `name-your-contributors` for the upcoming auto-fetching feature
And started improving the auto-fetching
Went for a string comparison approach because regular expressions seemed ineffective and NLP seemed overkill for
matching repo labels to categories.
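
To make the string comparison idea concrete, here is a minimal sketch that scores a label against a few keywords per category using Sorensen-Dice bigram similarity. This is an illustration under assumptions, not the PR's implementation: the keyword lists, the `guessCategory` name, the choice of similarity measure and the `0.5` threshold are all hypothetical.

```javascript
// Hypothetical keyword lists; the PR's real label dataset is not shown here.
const CATEGORY_KEYWORDS = {
  bug: ['bug', 'defect', 'regression'],
  doc: ['docs', 'documentation'],
  security: ['security', 'vulnerability'],
  question: ['question', 'help wanted'],
}

// Split a string into overlapping two-character chunks, ignoring case
// and anything that is not a letter or digit.
function bigrams(text) {
  const s = text.toLowerCase().replace(/[^a-z0-9]/g, '')
  const grams = []
  for (let i = 0; i < s.length - 1; i++) grams.push(s.slice(i, i + 2))
  return grams
}

// Sorensen-Dice similarity in [0, 1] between two strings.
function similarity(a, b) {
  const ga = bigrams(a)
  const gb = bigrams(b)
  if (ga.length === 0 || gb.length === 0) return 0
  const pool = gb.slice()
  let shared = 0
  for (const g of ga) {
    const idx = pool.indexOf(g)
    if (idx !== -1) {
      shared++
      pool.splice(idx, 1) // each bigram in b can only be matched once
    }
  }
  return (2 * shared) / (ga.length + gb.length)
}

// Return the best-scoring category, or null when nothing is close enough.
function guessCategory(label, threshold = 0.5) {
  let best = {category: null, score: 0}
  for (const [category, keywords] of Object.entries(CATEGORY_KEYWORDS)) {
    for (const keyword of keywords) {
      const score = similarity(label, keyword)
      if (score > best.score) best = {category, score}
    }
  }
  return best.score >= threshold ? best.category : null
}
```

Returning `null` below the threshold leaves room for the "not categorisable" case, which a caller can then route to a list of exceptions or skip.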
Re-arranged some label lists/dicts and added more edge case handling
(when the label is not categorisable or when it's similar to one of the
exceptions).
Added a dataset of labels with their categories, along with the files to test and use it.
The `labels` dataset is more robust (although a long way from being
robust enough), and there's **finally** a test case checking how well
`findBestCategory` does against the dataset
- Added a tokenizer
- Improved the dataset (might not be the last time)
- Refactored the category finder and the tests (to be done again)
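
For readers unfamiliar with the tokenizing step, here is a minimal sketch of how a repo label could be split into lowercase tokens before comparison. It is an assumption-based illustration, not the tokenizer added in this PR; the `tokenize` name and the splitting rules are hypothetical.

```javascript
// Hypothetical label tokenizer; the PR's own tokenizer may behave differently.
function tokenize(label) {
  return label
    .replace(/([a-z0-9])([A-Z])/g, '$1 $2') // split camelCase boundaries
    .toLowerCase()
    .split(/[^a-z0-9]+/) // break on spaces, slashes, colons, dashes, etc.
    .filter(Boolean) // drop empty strings left by leading/trailing separators
}

console.log(tokenize('Type: Bug-Fix')) // [ 'type', 'bug', 'fix' ]
console.log(tokenize('userTesting')) // [ 'user', 'testing' ]
```

Splitting first means prefixes such as `type:` or `kind/` become separate tokens that can be ignored or weighted independently.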
Improved `findCategory` and `labels`.
More labels to work with, a more correct **category finder** (from ~55% accuracy to ~79.5%).
And improved the `tokenizer`
Refactored `token`, tweaked labels and added exceptions and commented out some code (82.025%
accuracy :party_hat:)
The accuracy of `findCategory` is now at 95.696% (:hooray:).
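
The accuracy figures above are presumably the share of dataset labels that get the expected category. A minimal sketch of such a measurement follows; the `accuracy` helper, the four-entry dataset and the naive classifier are hypothetical stand-ins, not the PR's test code or data.

```javascript
// Fraction of dataset entries for which classify(label) returns the
// expected category. Returns 0 for an empty dataset.
function accuracy(classify, dataset) {
  if (dataset.length === 0) return 0
  const correct = dataset.filter(
    ({label, category}) => classify(label) === category,
  ).length
  return correct / dataset.length
}

// Hypothetical miniature dataset and a deliberately naive classifier.
const sample = [
  {label: 'bug', category: 'bug'},
  {label: 'docs', category: 'doc'},
  {label: 'security issue', category: 'security'},
  {label: 'help wanted', category: 'question'},
]
const naive = label => (label.includes('bug') ? 'bug' : 'doc')

console.log(`${(accuracy(naive, sample) * 100).toFixed(3)}%`) // 50.000%
```

Tracking this number in a test makes it easy to see whether a change to the labels or the tokenizer helps or hurts.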
Added a learner component utilising `ac-learn` with a saved classifier (`learner.json`). Added the
missing `fetch` option in the prompt choices and improved the fetching process
@Berkmann18
Member Author

Berkmann18 commented May 21, 2019

For some reason, the data read by `fs.readFileSync(configPath, 'utf-8')` in `./util/config-file.js` sometimes ends up being 0, which breaks the adding process.
Cf. #187
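
One possible way to stop a bad read from corrupting the adding process is to validate the raw content before using it. The sketch below is only an illustration under assumptions: it treats "0" as meaning empty or non-object content, and the `parseConfig` helper is hypothetical rather than the fix discussed in #187.

```javascript
// Hypothetical guard: parse raw config text and fail loudly when the
// content is empty, invalid JSON, or not a JSON object.
function parseConfig(raw, configPath) {
  if (typeof raw !== 'string' || raw.trim() === '') {
    throw new Error(`Config file ${configPath} was read as empty`)
  }
  let parsed
  try {
    parsed = JSON.parse(raw)
  } catch (err) {
    throw new Error(`Config file ${configPath} is not valid JSON: ${err.message}`)
  }
  // JSON.parse('0') succeeds and yields a number, so check the shape too.
  if (parsed === null || typeof parsed !== 'object') {
    throw new Error(`Config file ${configPath} does not contain a JSON object`)
  }
  return parsed
}
```

The result of `fs.readFileSync(configPath, 'utf-8')` could be passed in as `raw`, so a truncated read surfaces as an error instead of being written back over the config.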

Changed the contributor adding process within the `fetch` command function and added rejection
handlers where appropriate
... to make it work with `name-your-contributors` which requires `node >= 10.0`
Removed the files that are used for the `nyc` branch (which were in fact useless) and updated the fetching steps in `cli`
Berkmann18 added a commit that referenced this pull request Jul 17, 2019
Berkmann18 added a commit that referenced this pull request Jul 17, 2019
@Berkmann18
Member Author

Closing this in favour of #196

@Berkmann18 Berkmann18 closed this Jul 17, 2019
All Contributors Kanban automation moved this from In progress to Done Jul 17, 2019
@Berkmann18 Berkmann18 removed this from Done in All Contributors Kanban Jul 17, 2019
Berkmann18 added a commit to Berkmann18/all-contributors-cli that referenced this pull request Oct 3, 2019
@Berkmann18 Berkmann18 deleted the ml branch May 24, 2020 01:23
Berkmann18 pushed a commit that referenced this pull request May 24, 2020
* added function to overwrite incorrect data with known data

* updated test snapshots

* update snaps

* lockfile
Berkmann18 added a commit that referenced this pull request Jul 23, 2023
Berkmann18 added a commit that referenced this pull request Jul 23, 2023
Development

Successfully merging this pull request may close these issues.

Fetch or auto-discover contributors (auto-generate)
1 participant