Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: auto-fetching (initial iteration) #196

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open

feat: auto-fetching (initial iteration) #196

wants to merge 10 commits into from

Conversation

Berkmann18
Copy link
Member

@Berkmann18 Berkmann18 commented Jul 17, 2019

What:
Adds an auto-fetching mechanism accessible via the fetch command (which requires a PRIVATE_TOKEN env. variable to be set) like #186 but with a cleaner PR that isn't too far from the master branch (which made the other PR hard to rebase).

As noted in all-contributors/all-contributors#18, not all 27 categories can be picked from a GH repo alone (cf. #186 for a table listing what it was able to detect so far).

Why:
To resolve #117 and partly all-contributors/all-contributors#18 (TL;DR: auto adding contributors from a repo).
Re mntnr/name-your-contributors#45

How:
Using name-your-contributors and ac-learn.

Checklist:

  • Documentation
  • Tests
  • Ready to be merged

For some reason the data read by fs.readFileSync(configPath, 'utf-8') in ./util/config-file.js sometimes ends up being 0 which screws up the adding process.
Cf. #187

TODO:

  • Use name-your-contributors@1.9.0 (or later if I find a way to get it to return a list of PR titles and issue titles for each prCreators and issueCreators respectively without having to get the full output and doing some of the summarisation here).
  • Look at the labels array for prCreators (only keeping those with state = 'MERGED if looking at the full output with pullRequests in it instead of the summarised one) objects, then feed that to the learner (similar or better than how it's done for issues).
  • Look at the title field of pullRequests objects (or what comes out of 1.) and try to parse it to get info on the PR (it will be assumed to be code by default).

@jdalrymple
Copy link
Contributor

For the error, do you have a specific testing procedure you would use to get the weird behaviour?

@Berkmann18
Copy link
Member Author

@jdalrymple No, other than what I wrote on #187, it's basically just running the fetch command and watching the output and debugging the relevant files/values.

@jdalrymple
Copy link
Contributor

I think ive fixed this up, check out the latest PR!

Berkmann18 added a commit that referenced this pull request May 24, 2020
* fix: Adjusting file access that was clashing with uncontrolled promises

* review: Adding code review changing and linting

* fix(cli): rectify an async call

Co-authored-by: Maximilian Berkmann <maxieberkmann@gmail.com>
@Berkmann18
Copy link
Member Author

Note (to anyone stumbling upon this PR): the ML model isn't well equipped, as the dataset used is still quite imbalanced (fortunately, not as much as before) so results may not be amazing from the get-go.
I'll try to see if I can improve it while making progress on this feature and ship it.
So if you happen to have datasets with GitHub/Bitbucket/GitLab/... labels then check out all-contributors/ac-learn#37 and help.
If you have better models or feature extractors (or anything that could help) in mind then feel free to submit a PR at https://github.com/all-contributors/ac-learn.

@tenshiAMD
Copy link
Member

@Berkmann18 @gr2m this one is the last stale milestone. Can you guide me on what are the missing details? so we can complete this one. Thanks! 🎉

@Berkmann18 Berkmann18 requested a review from a team as a code owner November 1, 2022 22:10
@Berkmann18
Copy link
Member Author

@Berkmann18 @gr2m this one is the last stale milestone. Can you guide me on what are the missing details? so we can complete this one. Thanks! tada

Well, other than solidifying the ML model, it would be to test the command (see if #187 doesn't occur for you) and see if on repos (dummy or real ones), the contributors/type list is fetched properly and look correct.
I've got two busy weekends ahead and don't know how much my evenings will be, but I'll see what's left to do.

jdalrymple and others added 5 commits July 23, 2023 15:54
* fix: Adjusting file access that was clashing with uncontrolled promises

* review: Adding code review changing and linting

* fix(cli): rectify an async call

Co-authored-by: Maximilian Berkmann <maxieberkmann@gmail.com>
and some visibly forgotten DidYouMean stuff from the person who added that feature
@Berkmann18 Berkmann18 changed the title WIP feat: auto-fetching feat: auto-fetching (initial iteration) Jul 23, 2023
@@ -0,0 +1,28 @@
const {existsSync} = require('fs')
const Learner = require('ac-learn')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm -1 on directly including a machine learning approach such as ac-learn for a couple reasons:

  • It's not 100% predictable, only high 90s at best. For dev tooling I personally prefer simpler, predictable patterns that can be debugged & changed more directly & transparently
  • It requires more compute & storage per-repository (see the large learner.json file here)

It feels ... unusual? to me to start off by including the more heavyweight, targeted approach directly in the CLI (or at least its repository).

From my (admittedly not very informed) perspective, it would be nice to see a bit more trying it out before fully onboarding. Could we add it to the docs first as a third party approach? See existing repos try it out successfully?

By the way, sorry for not posting serious thoughts on this till now 😞. I'd been meaning to say something and it slipped my mind. But I'm very up for discussing more - and am not very high confidence that what I'm saying is at all reasonable!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see where you're coming from and initially started that feature with some non-ML pattern matching and stuff, which were... okay but not scalable enough to support enough of the contributions.
I guess the ML-driven categorisation could be put behind a command-line flag (like a feature flag) and when it's off it will only categorise the minimal set of contribution types (like review, code, bug and docs).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
All Contributors Kanban
  
In progress
Development

Successfully merging this pull request may close these issues.

Fetch or auto-discover contributors (auto-generate)
4 participants