feat: auto-fetching (initial iteration) #196

Berkmann18 · 2019-07-17T10:47:50Z

What:
Adds an auto-fetching mechanism, accessible via the fetch command (which requires the PRIVATE_TOKEN environment variable to be set). It does the same as #186, but as a cleaner PR that hasn't drifted far from the master branch (that drift is what made the other PR hard to rebase).
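
A minimal sketch of the token requirement, assuming the command reads the token from `process.env.PRIVATE_TOKEN`. The helper name `getPrivateToken` and the error message are hypothetical and not taken from this PR's code.

```javascript
// Hypothetical helper: illustrates the PRIVATE_TOKEN requirement only.
// Fails early with a clear message instead of letting a later API call fail.
function getPrivateToken(env = process.env) {
  const token = env.PRIVATE_TOKEN
  if (typeof token !== 'string' || token.trim() === '') {
    throw new Error('PRIVATE_TOKEN must be set to use the fetch command')
  }
  return token.trim()
}
```

Passing `env` as a parameter keeps the check easy to exercise without touching the real environment.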

As noted in all-contributors/all-contributors#18, not all 27 categories can be picked from a GH repo alone (cf. #186 for a table listing what it was able to detect so far).

Why:
To resolve #117 and partly all-contributors/all-contributors#18 (TL;DR: auto adding contributors from a repo).
Re mntnr/name-your-contributors#45

How:
Using name-your-contributors and ac-learn.

Checklist:

Documentation
Tests
Ready to be merged

For some reason, the data read by fs.readFileSync(configPath, 'utf-8') in ./util/config-file.js sometimes ends up being 0, which breaks the adding process.
Cf. #187
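
One way to make this failure easier to spot while debugging is sketched below. It assumes that "being 0" means the file was read as empty, which the description does not confirm. `readConfigSafely` is a hypothetical helper, not the code in ./util/config-file.js.

```javascript
const fs = require('fs')

// Hypothetical helper (not the actual ./util/config-file.js code):
// reads a JSON config and reports an empty read explicitly, which is
// easier to diagnose than a bare JSON.parse error further down the line.
function readConfigSafely(configPath) {
  const raw = fs.readFileSync(configPath, 'utf-8')
  if (raw.trim().length === 0) {
    throw new Error(`${configPath} was read as empty; is a write still in progress?`)
  }
  return JSON.parse(raw)
}
```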

TODO:

1. Use name-your-contributors@1.9.0 (or later, if I find a way to get it to return a list of PR titles and issue titles for each of prCreators and issueCreators respectively, without having to get the full output and do some of the summarisation here).
2. Look at the labels array of the prCreators objects (only keeping those with state = 'MERGED' if looking at the full output with pullRequests in it instead of the summarised one), then feed that to the learner (similarly to, or better than, how it's done for issues).
3. Look at the title field of the pullRequests objects (or whatever comes out of 1.) and try to parse it to get info on the PR (which will be assumed to be code by default).
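
The TODO items above could be sketched roughly as follows. The input shape (`login`, `title`, `state`, `labels`) is an assumption for illustration, not name-your-contributors' documented output. `summariseMergedPrs` and `categoriesFor` are hypothetical names, and `classify` stands in for the learner.

```javascript
// Assumed input shape (illustration only):
// [{login: 'alice', title: 'docs: fix typo', state: 'MERGED', labels: ['documentation']}]
function summariseMergedPrs(pullRequests) {
  const byAuthor = new Map()
  for (const pr of pullRequests) {
    if (pr.state !== 'MERGED') continue // only merged PRs are kept
    const entry = byAuthor.get(pr.login) || {labels: [], titles: []}
    for (const label of pr.labels || []) {
      if (!entry.labels.includes(label)) entry.labels.push(label)
    }
    entry.titles.push(pr.title)
    byAuthor.set(pr.login, entry)
  }
  return byAuthor
}

// Feeds labels (or, when there are none, titles) to `classify`, a stand-in
// for the learner. Anything left unclassified is assumed to be `code`.
function categoriesFor(entry, classify) {
  const inputs = entry.labels.length > 0 ? entry.labels : entry.titles
  const found = inputs.map(text => classify(text)).filter(Boolean)
  return found.length > 0 ? [...new Set(found)] : ['code']
}
```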

jdalrymple · 2020-04-20T16:58:39Z

For the error, do you have a specific testing procedure you would use to get the weird behaviour?

Berkmann18 · 2020-04-20T17:32:55Z

@jdalrymple No, other than what I wrote on #187, it's basically just running the fetch command and watching the output and debugging the relevant files/values.

jdalrymple · 2020-04-20T23:47:49Z

I think I've fixed this up; check out the latest PR!

* fix: Adjusting file access that was clashing with uncontrolled promises
* review: Adding code review changing and linting
* fix(cli): rectify an async call

Co-authored-by: Maximilian Berkmann <maxieberkmann@gmail.com>
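
A sketch of the general idea behind "file access that was clashing with uncontrolled promises": awaiting each read-modify-write before starting the next one. This is an illustration under assumptions, not the code from #259. `addAllSequentially` and the `contributors` array in the config are hypothetical.

```javascript
const fsp = require('fs/promises')

// Hypothetical sketch: contributors are added one at a time, so every
// read-modify-write of the config file completes before the next begins.
// Firing these off without awaiting them could let one call read the file
// while another is still writing it.
async function addAllSequentially(configPath, contributors) {
  for (const contributor of contributors) {
    const config = JSON.parse(await fsp.readFile(configPath, 'utf-8'))
    config.contributors.push(contributor)
    await fsp.writeFile(configPath, JSON.stringify(config, null, 2))
  }
}
```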

Berkmann18 · 2020-05-26T01:17:35Z

Note (to anyone stumbling upon this PR): the ML model isn't well equipped, as the dataset used is still quite imbalanced (fortunately, not as much as before) so results may not be amazing from the get-go.
I'll try to see if I can improve it while making progress on this feature and ship it.
So if you happen to have datasets with GitHub/Bitbucket/GitLab/... labels then check out all-contributors/ac-learn#37 and help.
If you have better models or feature extractors (or anything that could help) in mind then feel free to submit a PR at https://github.com/all-contributors/ac-learn.
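
For anyone bringing a dataset, a quick way to gauge how imbalanced it is could look like the sketch below. The dataset shape (`label`, `category`) is an assumption for illustration and may not match ac-learn's actual format. Both function names are hypothetical.

```javascript
// Assumed dataset shape (hypothetical): [{label: 'typo', category: 'doc'}, ...]
function categoryCounts(dataset) {
  const counts = {}
  for (const {category} of dataset) {
    counts[category] = (counts[category] || 0) + 1
  }
  return counts
}

// Ratio between the most and least frequent categories.
// 1 means perfectly balanced; larger values mean more imbalance.
function imbalanceRatio(dataset) {
  const values = Object.values(categoryCounts(dataset))
  if (values.length === 0) return 1
  return Math.max(...values) / Math.min(...values)
}
```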

tenshiAMD · 2022-10-06T19:18:20Z

@Berkmann18 @gr2m this one is the last stale milestone. Can you guide me on what are the missing details? so we can complete this one. Thanks! 🎉

Berkmann18 · 2022-11-01T22:18:40Z

> @Berkmann18 @gr2m this one is the last stale milestone. Can you guide me on what are the missing details? so we can complete this one. Thanks! 🎉

Well, other than solidifying the ML model, it would be to test the command (to see whether #187 occurs for you) and to check, on repos (dummy or real ones), that the contributors/type list is fetched properly and looks correct.
I've got two busy weekends ahead and don't know how busy my evenings will be, but I'll see what's left to do.

For more info, please check #186

and some visibly forgotten DidYouMean stuff from the person who added that feature

JoshuaKGoldberg · 2023-07-25T19:48:09Z

src/discover/learner.js

@@ -0,0 +1,28 @@
+const {existsSync} = require('fs')
+const Learner = require('ac-learn')


I'm -1 on directly including a machine learning approach such as ac-learn for a couple reasons:

* It's not 100% predictable, only high 90s at best. For dev tooling I personally prefer simpler, predictable patterns that can be debugged & changed more directly & transparently.
* It requires more compute & storage per-repository (see the large learner.json file here).
* It feels ... unusual? to me to start off by including the more heavyweight, targeted approach directly in the CLI (or at least its repository).

From my (admittedly not very informed) perspective, it would be nice to see a bit more trying it out before fully onboarding. Could we add it to the docs first as a third party approach? See existing repos try it out successfully?

By the way, sorry for not posting serious thoughts on this till now 😞. I'd been meaning to say something and it slipped my mind. But I'm very up for discussing more - and am not very high confidence that what I'm saying is at all reasonable!

Berkmann18 · 2023-07-28

I see where you're coming from. I initially started that feature with some non-ML pattern matching, which was... okay, but not scalable enough to cover enough of the contribution types.
I guess the ML-driven categorisation could be put behind a command-line flag (like a feature flag), and when it's off it would only categorise the minimal set of contribution types (like review, code, bug and docs).
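
A rough sketch of what such a flag could look like. Everything here is hypothetical: `useLearner`, `classify`, the shape of `activity`, and the rule mapping issues to `bug` are illustrations, not options or data that the CLI exposes.

```javascript
// Hypothetical names throughout; nothing here is part of the CLI.
function categorise(activity, {useLearner = false, classify = null} = {}) {
  if (useLearner && typeof classify === 'function') {
    return classify(activity) // e.g. a wrapper around an ac-learn model
  }
  // Flag off: deterministic rules covering only the minimal set of types.
  const types = []
  if (activity.reviews > 0) types.push('review')
  if (activity.mergedPrs > 0) types.push('code')
  if (activity.issues > 0) types.push('bug') // assumption: opened issues count as bug reports
  if (activity.docFilesChanged > 0) types.push('doc')
  return types
}
```

With the flag off, the output depends only on counts that can be inspected directly, which speaks to the predictability concern raised in the review.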

Berkmann18 changed the title ~~Dl~~ feat: auto-fetching Jul 17, 2019

Berkmann18 mentioned this pull request Jul 17, 2019

feat: auto-fetching #186

Closed

3 tasks

Berkmann18 added the enhancement, priority: high, and status: in progress labels Jul 17, 2019

Berkmann18 added this to In progress in All Contributors Kanban via automation Jul 17, 2019

mrchief mentioned this pull request Oct 3, 2019

Fetch or auto-discover contributors (auto-generate) #117

Open

Berkmann18 marked this pull request as ready for review October 3, 2019 11:46

Berkmann18 added a commit to Berkmann18/all-contributors-cli that referenced this pull request Oct 3, 2019

chore(package): added packages needed in all-contributors#196

af22235

Berkmann18 mentioned this pull request Nov 22, 2019

Add option to directly add all contributors with reason "code" all-contributors/all-contributors#311

Closed

Andre601 mentioned this pull request Nov 23, 2019

Only have one major branch for contributor-PRs all-contributors/app#261

Closed

jdalrymple mentioned this pull request Apr 20, 2020

feat: auto-fetching - continuation of #196 #259

Merged

Berkmann18 mentioned this pull request Nov 3, 2020

There has to be a way to automate this all-contributors/all-contributors#18

Open

Berkmann18 mentioned this pull request Mar 22, 2021

Path to the new All Contributors all-contributors/architecture#3

Open

13 tasks

Berkmann18 mentioned this pull request Apr 8, 2021

[RFC] Webapp to add/manage contributor table all-contributors/all-contributors#243

Open

Berkmann18 mentioned this pull request Jul 25, 2021

We want to recognized all old contributors all-contributors/all-contributors#539

Open

tenshiAMD added the pinned label Sep 6, 2022

tenshiAMD changed the title ~~feat: auto-fetching~~ WIP feat: auto-fetching Sep 22, 2022

Berkmann18 requested a review from a team as a code owner November 1, 2022 22:10

Berkmann18 force-pushed the dl branch from b33585b to e117cc7 Compare July 15, 2023 15:09

Berkmann18 added 3 commits July 23, 2023 15:50

chore(package): added packages needed in #186

cdd00de

feat: added the discover module + fetch command

06cebab

For more info, please check #186

chore(discover): re-add missing model

feefb2a

jdalrymple and others added 5 commits July 23, 2023 15:54

feat: auto-fetching - continuation of #196 (#259)

f54703d

* fix: Adjusting file access that was clashing with uncontrolled promises
* review: Adding code review changing and linting
* fix(cli): rectify an async call

Co-authored-by: Maximilian Berkmann <maxieberkmann@gmail.com>

chore(deps): update ac-learn, NYC and clui

a6c8b0d

chore(deps): Bump NYC and babel/runtime

e21f46e

feat(discover): updated the learner + look at issue titles

ace4970

and some visibly forgotten DidYouMean stuff from the person who added that feature

feat(cli): include pr based contributions

b0e1213

Berkmann18 force-pushed the dl branch from 2f3e4dd to b0e1213 Compare July 23, 2023 14:54

Berkmann18 changed the title ~~WIP feat: auto-fetching~~ feat: auto-fetching (initial iteration) Jul 23, 2023

Berkmann18 mentioned this pull request Jul 23, 2023

Adding contributors wreck #187

Closed

refactor: clean-up

fb14296

JoshuaKGoldberg reviewed Jul 25, 2023

Merge branch 'master' into dl

b36ac3c

JoshuaKGoldberg Jul 25, 2023

Berkmann18 Jul 28, 2023