Fetch or auto-discover contributors (auto-generate) #117

Open
mrchief opened this issue Oct 29, 2018 · 40 comments · Fixed by #259 · May be fixed by #196

Comments

@mrchief

mrchief commented Oct 29, 2018

  • all-contributors-cli version: 5.4.1

Problem description:

Currently, add lets you manually add contributors. However, for existing projects, this manual approach can be tedious and error-prone, which creates a high barrier to entry for anyone who wants to start implementing the spec.

Suggested solution:

It'd be great to have a way of auto-discovering this information. E.g.

  • committers -> code
  • issue creators -> bug
  • pull requests -> code

GitHub's contributors API can help discover the first item, while the issues API can take care of the remaining two.
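
A minimal sketch of the mapping described above, assuming the three lists have already been fetched as plain arrays of logins. The function name and input shape are hypothetical and not part of all-contributors-cli:

```javascript
// Hypothetical helper (not part of all-contributors-cli): merge per-source
// login lists into { login: [contribution types] }.
function discoverContributions({ committers = [], issueCreators = [], prAuthors = [] }) {
  // Each source is paired with the contribution type suggested in the issue.
  const sources = [
    [committers, 'code'],
    [issueCreators, 'bug'],
    [prAuthors, 'code'],
  ];
  const byLogin = new Map();
  for (const [logins, type] of sources) {
    for (const login of logins) {
      if (!byLogin.has(login)) byLogin.set(login, new Set());
      byLogin.get(login).add(type); // a Set keeps each type once per user
    }
  }
  // Convert the Sets to sorted arrays for stable output.
  return Object.fromEntries(
    [...byLogin].map(([login, types]) => [login, [...types].sort()])
  );
}
```

Someone who both commits and opens pull requests still ends up with a single code entry, because types are collected in a Set.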

If there is interest in adding this, I can help send a PR.

@kentcdodds
Collaborator

Unless I'm mistaken, you can use check for this

@mrchief
Author

mrchief commented Nov 7, 2018

Can check discover the ones not listed in the rc file?

@Berkmann18
Member

Berkmann18 commented Jan 15, 2019

@mrchief check will compare what you have in your .all-contributorsrc file with the list of contributors on the repo and then it will tell you who's missing.
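
Roughly, that comparison could look like the sketch below. This is an illustration only, not the actual check implementation: the function name is hypothetical, and the case-insensitive login matching is an assumption. The only detail taken from the rc file format is that each entry in its contributors array has a login field.

```javascript
// Hypothetical sketch of the comparison; the real `check` command in
// all-contributors-cli may work differently.
// configContributors: the `contributors` array from .all-contributorsrc
// repoLogins: logins reported as contributors by the repository host
function findMissing(configContributors, repoLogins) {
  // Assumption: logins are compared case-insensitively.
  const known = new Set(configContributors.map((c) => c.login.toLowerCase()));
  return repoLogins.filter((login) => !known.has(login.toLowerCase()));
}
```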

As far as I'm aware there's no way to add multiple people in one go (@jakebolam ?) but it's an interesting feature to have.

@mrchief
Author

mrchief commented Jan 15, 2019

@Berkmann18 Right, that was my understanding too. Adding multiple people (either manually or via discovery) lowers the bar of entry for existing packages. In a fairly active repo, adding everyone one by one manually can be daunting (first time setup).

@Berkmann18
Member

@mrchief It's perfectly understandable, would you like to submit a PR for this?

@mrchief
Author

mrchief commented Jan 15, 2019

Sure I'll see what I can do. Progress may be a bit slow as I'm swamped at work right now.

@Berkmann18
Member

@mrchief No problem.

@mrchief
Author

mrchief commented Jan 16, 2019

@Berkmann18 Any preference on how to structure this? I was thinking either:

  • add new methods fetchCodeContributors and fetchBugContributors to repo/github.js
  • or add a new folder, say, discover, and then add the new methods in github.js (and later gitlab.js).

And then surface them like other commands. Sound about right?

@mrchief
Author

mrchief commented Jan 16, 2019

Also, I see that you already have getContributors which is used in check flow. I think this is what @kentcdodds was alluding to?

If so, I can use it in the new discover command to get the list of code committers. And then all that is left is to fetch issue list, filter out PR authors and you'll have your bug contributors.
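
One possible reading of that filtering step, as a sketch: GitHub's issues endpoint returns pull requests alongside issues, and pull requests carry a pull_request key, so dropping those entries leaves only real issues. The function name is hypothetical, and the input is assumed to be an already-fetched array of issue objects.

```javascript
// Hypothetical helper: given issue objects as returned by GitHub's issues
// endpoint, return the unique logins of people who opened real issues.
function bugReporters(issues) {
  const logins = issues
    .filter((issue) => !issue.pull_request) // PRs include a `pull_request` key
    .map((issue) => issue.user.login);
  return [...new Set(logins)]; // de-duplicate, keeping first-seen order
}
```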

Am I missing anything?

@jakebolam
Collaborator

Yes I believe so!

You may also start running into rate limiting (#121, #53). We may want to consider something like #69 (and then recommending #122) for a large project's first setup.

@jakebolam
Collaborator

Any ideas on how the flow would work for the bot? https://github.com/all-contributors/all-contributors-bot

@mrchief
Author

mrchief commented Jan 16, 2019

@jakebolam Yeah, I'm well aware of rate limiting as I have run into it in the past. I'm using the same approach as getContributorsPage so I guess that should handle it. In fact, it's almost the same function for the most part (and so there is an opportunity to DRY it up a bit). We can discuss more after I push my code, it'll be easier that way.
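
To illustrate the kind of DRY-up meant here, a generic pager could take the page-fetching function as a parameter so the contributors and issues endpoints share one loop. This is a sketch under assumptions, not the code in repo/github.js: collectAllPages, the fetchPage contract ({ items, hasNext }), and the maxPages cap are all hypothetical.

```javascript
// Hypothetical generic pager. `fetchPage(page)` is an injected async function
// that resolves to { items: [...], hasNext: boolean } for a 1-based page number.
async function collectAllPages(fetchPage, maxPages = 100) {
  const all = [];
  for (let page = 1; page <= maxPages; page++) {
    const { items, hasNext } = await fetchPage(page);
    all.push(...items);
    if (!hasNext) break; // stop as soon as the source reports no further page
  }
  return all;
}
```

The maxPages cap is a simple guard against runaway requests; it does not replace proper rate-limit handling.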

As far as the bot goes, I think it can leverage the same discovery features and can be activated with something like please add all new contributors. Frankly, I wasn't aware of the bot until today so I'm not all too familiar with it.

@jakebolam
Collaborator

Sounds good to me!

Yes that's a great idea for the bot 👍

Definitely challenging to auto-setup for all contributors. But this would be a great base where projects can start from and branch out into.

@Berkmann18
Member

Berkmann18 commented Jan 16, 2019

@mrchief

or add a new folder, say, discover, and then add the new methods in github.js (and later gitlab.js)

I think that would be better but I'm open to the other option.

@jakebolam We could always try this out on the CLI for packages with existing contributors and when it works fine, the bot could get this functionality so we could add this in a regression style (considering the bot might have more factors in play).

Related all-contributors#18.

@mrchief
Author

mrchief commented Jan 18, 2019

or add a new folder, say, discover, and then add the new methods in github.js (and later gitlab.js)

@Berkmann18, that was my first approach too. But in that case, I see a lot of code duplication between different files (basically request setup and paging logic will be duped).

I settled on repo/github.js as it felt like that should be the single place to do anything with the repo. In the future, it could be refactored into its own folder with different files (for various API endpoints) and all. Just thinking out loud here.

@Berkmann18
Member

@Berkmann18, that was my first approach too. But in that case, I see a lot of code duplication between different files (basically request setup and paging logic will be duped).

That can be dealt with refactoring and a well-structured codebase so it shouldn't be an issue.

@jakebolam changed the title from "Fetch or auto-discover contributors" to "Fetch or auto-discover contributors (auto-generate)" on Jan 22, 2019

@mrchief
Author

mrchief commented Feb 27, 2019

If only GitHub had a way to keep track of issues I'm contributing to... This dropped off my radar :)

I've been swamped lately and haven't made any progress on this beyond the first few iterations. I'll try to get back to this as soon as possible (hopefully that'll be weeks and not months).

@Berkmann18
Member

@mrchief You can still get access to the ones you read you know?
Anyway, nice to see you back at it.

@mrchief
Author

mrchief commented Feb 27, 2019

Yes, it's a tedious process checking them and making sure I didn't miss any commitments. I didn't have this problem (of keeping track) until now so didn't put much thought into it. I've started a todo list now. :)

@mrchief
Author

mrchief commented May 12, 2019

@Berkmann18 @jakebolam I got something working. I ran a crude test against all-contributors/all-contributors-bot repo (which has these nice labels):

image

and I got this:

image

I'm gonna tidy things up a bit and send a PR soon.

@Berkmann18 mentioned this issue May 21, 2019
@Berkmann18 linked a pull request Jul 17, 2019 that will close this issue

@protoEvangelion

@mrchief can you commit the code you used to achieve this so we can see 😻

@protoEvangelion

For those who would like a work around, I hacked a workflow together:

To add everyone who committed code in one fell swoop:

npx name-your-contributors --wipe-cache --full -u user -r repo > combined-out.json

Grab that list and save it to a file, file.txt, in this format:

username1 code
username2 code

Then run: cat file.txt | xargs -I % sh -c 'all-contributors add %;'

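To get users who opened bug reports, I used Octokit:

    // `GitHub` is an authenticated Octokit instance; `payload` holds the
    // search parameters (e.g. a `q` query string). Both are defined elsewhere.
    const data = await GitHub.paginate(
        GitHub.search.issuesAndPullRequests.endpoint(payload)
    )

    // Collect the author of each result, dropping duplicates.
    const users = data
        .map(({ user }) => user.login)
        .filter((login, i, all) => all.indexOf(login) === i)

    console.log('users', users)

and added them to a text file like this, then ran it as above: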

username1 bug
username2 bug

Add multiple users and contribution types at once

Add them to the text file with comma-separated contribution types and run as above:

username1 bug,code,security
username2 bug,doc
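
If the users and types are already in a JavaScript object, the file contents above can be generated rather than typed by hand. A small sketch, where toAddFileLines and its input shape are hypothetical:

```javascript
// Hypothetical helper: turn { login: [types] } into the "login type1,type2"
// lines used in the text file above, one user per line.
function toAddFileLines(contributions) {
  return Object.entries(contributions)
    .map(([login, types]) => `${login} ${types.join(',')}`)
    .join('\n');
}
```

Writing the returned string to file.txt gives input for the xargs command shown earlier, which passes each line to all-contributors add.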

@Berkmann18
Member

@protoEvangelion That looks nice. The problem is that there's no auto-fetching and it's limited to bug/code contributions.

@protoEvangelion

True! It's definitely not ideal ;) Works decent if you label issues well though.

@mrchief
Author

mrchief commented Sep 24, 2019

@protoEvangelion This hasn't escaped my mind. I've been awfully busy lately. I'll try to send it across soon.

@ericclemmons

I ended up copy/pasting the output of yarn all-contributors check into a file (e.g. add-contributors.sh), split it on ", ", then made each line read like:

# add-contributors.sh
yarn all-contributors add name1 code
yarn all-contributors add name2 code

Then running:

bash add-contributors.sh

Rate limits apply, so using PRIVATE_TOKEN is necessary:

https://allcontributors.org/docs/en/cli/usage#github-users
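
The manual split-and-edit step could also be scripted. The sketch below assumes the pasted text is nothing more than a comma-separated list of logins (the exact output format of check is not shown in this thread) and that everyone gets the same contribution type. toAddScript is a hypothetical name.

```javascript
// Hypothetical helper: turn "name1, name2" into one
// `yarn all-contributors add <name> <type>` line per login.
// Assumption: the input is only a comma-separated list of logins.
function toAddScript(missingList, type = 'code') {
  return missingList
    .split(',')
    .map((name) => name.trim())
    .filter(Boolean) // skip empty pieces, e.g. from a trailing comma
    .map((name) => `yarn all-contributors add ${name} ${type}`)
    .join('\n');
}
```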

@mrchief
Author

mrchief commented Oct 3, 2019

@protoEvangelion It's been a while, so it took me some time to piece things together. It seems I did create a PR for this (#184), which got superseded by the work @Berkmann18 was doing in #196.

In case you're interested, my code lives here: https://github.com/mrchief/all-contributors-cli/tree/feat/discover.

@Berkmann18 it seems like it's been a while since this got any movement. Do you want me to pick this up again?

@Berkmann18
Member

@mrchief Yes, it's been a while due to a variety of stuff including one issue (#187) which blocked progress and I haven't got around to resolving that (hopefully the break would have been helpful).
As I mentioned to @protoEvangelion, any help is welcome as I was pretty much the only one looking after most of the AC repos lately (+ other projects).

I'll go through the code again tomorrow (or on the weekend) and try to get the PR moving.

In the meantime, I'll be more than happy if anyone would look into what we have so far.

@mrchief
Author

mrchief commented Oct 3, 2019

@Berkmann18 Sounds good. I can look it over this weekend. Could you add me to dl branch?

@Berkmann18
Member

Berkmann18 commented Oct 3, 2019

@jakebolam @kentcdodds Could either of you do that? I don't seem to have access to the settings for this repo.

@mrchief In the meantime, I've added you to a fork so you can work on it as soon as you can.

@jdalrymple
Contributor

For those who would like a work around, I hacked a workflow together:

Would be nice if we could convert the output from name-your-contributors to the all-contributors format with some basic config to map contribution types :'(
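
A rough sketch of what such a conversion might look like. Everything here is an assumption for illustration: the category keys (commitAuthors, prCreators, issueCreators, reviewers), the { login } shape of each entry, and the function name are not confirmed by this thread, so check them against real name-your-contributors output before relying on this.

```javascript
// Hypothetical config: maps assumed name-your-contributors category keys
// to all-contributors contribution types.
const DEFAULT_TYPE_MAP = {
  commitAuthors: 'code',
  prCreators: 'code',
  issueCreators: 'bug',
  reviewers: 'review',
};

// Convert { category: [{ login }, ...] } into entries shaped like the
// `contributors` array of .all-contributorsrc: { login, contributions }.
// Categories missing from the type map are ignored.
function toAllContributors(nycOutput, typeMap = DEFAULT_TYPE_MAP) {
  const byLogin = new Map();
  for (const [category, type] of Object.entries(typeMap)) {
    for (const { login } of nycOutput[category] || []) {
      if (!byLogin.has(login)) byLogin.set(login, new Set());
      byLogin.get(login).add(type);
    }
  }
  return [...byLogin].map(([login, types]) => ({
    login,
    contributions: [...types],
  }));
}
```

Passing a different typeMap is the "basic config" part: a project that treats reviewers differently, for example, only has to change the map.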

@Berkmann18
Member

@jdalrymple That's essentially automatically done in #196 but I haven't had the time to troubleshoot some issues.

@jdalrymple
Contributor

jdalrymple commented Apr 20, 2020

@Berkmann18 Could you add the issues you are still having to the PR? I'd be happy to take a look 😄

EDIT: is this the problem?

For some reason the data read by fs.readFileSync(configPath, 'utf-8') in ./util/config-file.js sometimes ends up being 0 which screws up the adding process.

@Berkmann18
Member

@jdalrymple

EDIT: is this the problem?

For some reason the data read by fs.readFileSync(configPath, 'utf-8') in ./util/config-file.js sometimes ends up being 0 which screws up the adding process.

Yup.

@jdalrymple
Contributor

I'll see what I can figure out 👼

@JoshuaKGoldberg
Contributor

If this project is still maintained, I'd be happy to donate the code in https://github.com/JoshuaKGoldberg/all-contributors-for-repository for this feature. ❤️

@Berkmann18
Member

@JoshuaKGoldberg It still is, although most of the maintainers (at least myself and Greg) have been busy with life commitments (families, work and such).
I don't know if you read all of the comments and saw the WIP PR I made (which was improved by fellow coders). It's a shame I've not had the chance to retest the solution, but hopefully I'll get around to doing that and see if it's decent enough to merge (I'd rather make sure the model is reliable enough than ship the feature with assigned categories that aren't always correct).
If you want to join forces or have a better solution than what I came up with then I'll be happy to consider it and get this long-overdue feature released.

@JoshuaKGoldberg
Contributor

Heh, such is life in open source... if you all would like some help, I'd be happy to pitch in on this over the summer! This project is an excellent point for the industry and I really appreciate all the work you all have put into it! ❤️

To be honest I mostly skimmed the PR and then forgot about it 😅. I'm also about to go off for part-work, part-vacation travel for 3 weeks so probably can't effectively collaborate on it till mid-June... but if you're still free then, I'd love to pitch in however would be helpful!

@Berkmann18
Member

Berkmann18 commented May 19, 2023

I recently got back.
I went through https://github.com/all-contributors/cli/pull/196/files (the PR I mentioned) and the diagram I made outlining how all the contribution categories could be recognised:
[Diagram: AllContributorsCategoryClassification]

Although I'm currently more active due to an injury that has prevented me from doing what sponsored players would do (like competing and practising), I'll try to block time every week for AC repos (or GH projects in general) so I should be more active from now on.

Here are the contribution categories found by the PR (marked "AI" where they come from the AI model and so aren't fully accurate yet) and those your repo handles. Note: the package I used in the PR for the non-AI stuff doesn't get labels from issues/PRs, so more of the AI-suggested categories could be fetched accurately if we get labels.

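| Category          | Handled by the PR | Handled by all-contributors-for-repository |
|-------------------|-------------------|--------------------------------------------|
| audio             |                   |                                            |
| a11y              | Not quite         |                                            |
| bug               | Yes (AI)          | Yes                                        |
| blog              |                   |                                            |
| business          |                   |                                            |
| code              | Yes               | Yes                                        |
| content           |                   |                                            |
| data              | Not quite         |                                            |
| doc               | Yes (AI)          | Yes                                        |
| design            | Yes (AI)          |                                            |
| example           | Not quite         |                                            |
| eventOrganizing   |                   |                                            |
| financial         |                   |                                            |
| fundingFinding    |                   |                                            |
| ideas             | Yes (AI)          |                                            |
| infra             | Yes (AI)          | Yes                                        |
| maintenance       | Yes (AI)          | Yes                                        |
| mentoring         |                   |                                            |
| platform          | Yes (AI)          |                                            |
| plugin            | Yes (AI)          |                                            |
| projectManagement |                   |                                            |
| promotion         |                   |                                            |
| question          | Yes (AI)          |                                            |
| research          |                   |                                            |
| review            | Yes               | Yes                                        |
| security          | Yes (AI)          |                                            |
| tool              | Yes (AI)          | Yes                                        |
| translation       |                   |                                            |
| test              | Yes (AI)          | Yes                                        |
| tutorial          |                   |                                            |
| talk              |                   |                                            |
| userTesting       |                   |                                            |
| video             |                   |                                            |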

Another note: all the categories found by the AI model may not be based on enough data fed to the model (cf. all-contributors/ac-learn#37 for more info) or ones that could reliably be assigned based on repo data alone. And the PR I have doesn't fully take advantage of what NYC returns.

@Berkmann18
Member

Just an update on the feature. I've got something that works (there's room for improvement, especially with the contributions assigned to users, but I don't want to delay this feature more just because of that) on #196.

@jdalrymple @JoshuaKGoldberg @mrchief Feel free to review the PR and test locally with some repos and let me know if you see any glaring issues. Apologies for the massive delay on this.
