New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Populate co-authors from mentionable users from the GitHub API #1476

Merged
merged 68 commits into from May 30, 2018

Conversation

Projects
None yet
4 participants
@smashwilson
Member

smashwilson commented May 18, 2018

If the current repository has one or more remotes hosted on a GitHub instance and you've authenticated to that instance on the GitHub tab, fetch the "mentionable" users from the GraphQL API rather than the local git history. This will give us GitHub usernames which we can use for autocompletion, and, importantly, it will omit users who have been blocked by the current user.

Along the way, I'm building out some infrastructure we can use to test components that query GraphQL.

  • Acquire fetch query functions for GraphQL endpoints that have already been authenticated within RelayNetworkLayerManager.
  • When the current repository has at least one GitHub remote with an authentication token:
    • Retrieve mentionable users from that repository via GraphQL instead of pulling them from git history
    • Iterate over multiple pages
    • Handle users without a public email account
  • Use an Author model now that we have enough data to warrant one
  • Restructure UserStore storage to pre-sort authors
  • Use a ModelObserver within the UserStore to watch the Repository and reload authors when it broadcasts a change
  • Pass the UserStore through the component tree as a prop. Extract authors with an <ObserveModel> component closer to the co-author field.
  • Observe the login model and ensure that co-authors are fetched if a token becomes available later
  • Clear authors entirely when the Repository changes or when a GitHub remote is added or removed.
  • Throttle GraphQL requests so we don't query over and over again when the local Repository changes
  • Render and autocomplete GitHub handles when they are available
  • Verify OAuth scopes and treat insufficient scopes as "logged out"
  • Remove users from the list with shift-delete

smashwilson added some commits May 18, 2018

smashwilson added some commits May 21, 2018

@smashwilson smashwilson added this to In Progress 💻 in Short-Term Roadmap May 21, 2018

smashwilson added some commits May 21, 2018

@smashwilson

This comment has been minimized.

Member

smashwilson commented May 29, 2018

Is there any way to remove any co-author suggestions from the list?

Ah, there is not! That should be easy enough. What I'm thinking now:

  • Highlight the entry and shift-delete to add an email address to a list in your Atom config
  • The UserStore omits entries with email addresses matching those in the config list

The nice thing here is that this list will be respected across all of your Atom projects, even those that aren't backed by .com.

An alternative would be to actually block users on .com when doing this. Would that be better? It feels a bit too aggressive for a keybinding you can trigger by mistake...

let instance = null;
export default class GithubLoginModel {
// Be sure that we're requesting at least this many scopes on the token we grant through github.atom.io or we'll
// give everyone a really frustrating experience ;-)
static REQUIRED_SCOPES = ['repo', 'read:org', 'user:email']

This comment has been minimized.

@annthurium

annthurium May 29, 2018

Contributor

how hard would it be to leverage what we've built here into eventually having a unified Atom login experience? For Teletype, the GitHub package, and any other future packages that might want it?

This comment has been minimized.

@smashwilson

smashwilson May 29, 2018

Member

unified Atom login experience

I would love to have this so much. We'd need to find a way to make "GitHub identity" a part of the core API somehow, so that packages could just consume it. Maybe have a separate package that exists purely to handle the UI for authentication...

I'm not sure how much effort it would be. We'd need to scope it out first in an Atom RFC and get buy-in from the core team. Maybe we could wait to see how @shana can improve our authentication story first, and then generalize from there.

}
const response = await fetch(host, {
method: 'HEAD',

This comment has been minimized.

@annthurium

annthurium May 29, 2018

Contributor

TIL -- I didn't realize HEAD is a http method. (tho I guess there are all kindsa other weird ones like PURGE)

if (a.name > b.name) { return 1; }
return 0;
});
return this.users;

This comment has been minimized.

@annthurium

annthurium May 29, 2018

Contributor

question...if users haven't created a GitHub access token, do we fall back on local git history? Or do we not populate the list of users at all?

This comment has been minimized.

@smashwilson

smashwilson May 29, 2018

Member

We fall back on local git history. We also clear the UserStore and fetch exclusively from GraphQL if a token does become available, or if a GitHub remote is added.

return function fetchQuery(operation, variables, cacheConfig, uploadables) {
const currentToken = tokenPerEnvironmentUrl.get(url);
return fetch(url, {
if (atom.inSpecMode()) {

This comment has been minimized.

@annthurium

annthurium May 29, 2018

Contributor

ohh, I did not know about inSpecMode but that's a good thing to know for telemetry

@@ -616,8 +626,9 @@ export default class Present extends State {
// For now we'll do the naive thing and invalidate anytime HEAD moves. This ensures that we get new authors
// introduced by newly created commits or pulled commits.
// This means that we are constantly re-fetching data. If performance becomes a concern we can optimize

This comment has been minimized.

@annthurium

annthurium May 29, 2018

Contributor

nit: is this comment still relevant since we have implemented some level of graphql caching?

This comment has been minimized.

@smashwilson

smashwilson May 29, 2018

Member

Ah, this particular comment is because it's used to pull authors from git history. Optimizing here would involve remembering the previous HEAD and only scanning the log of the difference, so we only need to parse a small number of commits each time... which would be tricky.

@annthurium

This comment has been minimized.

Contributor

annthurium commented May 29, 2018

@smashwilson looks good to me! All my comments are basically "I can haz more context plz" not "suggestion on how to do this differently.

bitmoji

@dmleong

This comment has been minimized.

dmleong commented May 29, 2018

One quick question: If an email address isn't found in the mentionable list, what happens?

I agree, blocking a user from the Atom interface sounds much too easy, as you mentioned!

@smashwilson

This comment has been minimized.

Member

smashwilson commented May 29, 2018

One quick question: If an email address isn't found in the mentionable list, what happens?

Do you mean if a user has no public email address on their profile? In that case I'm currently inferring a noreply address from the login, which should let us populate the co-author trailer and pull the avatar properly.

Edit: like so:

if (node.email === '') {
  node.email = `${node.login}@users.noreply.github.com`;
}

Is this the preferred way to generate noreplies? Or should I be using an ID-based one instead?

@dmleong

This comment has been minimized.

dmleong commented May 29, 2018

@smashwilson our preference is to only use publicly verified emails for this! This way, we ensure that there are no leaks to private emails or accounts

@smashwilson

This comment has been minimized.

Member

smashwilson commented May 29, 2018

@dmleong Hmm am I doing that wrong?

When I was investigating the GraphQL API, it looked like mentionable users with email address privacy enabled were returned as empty strings:

query {
  user(login: "simurai") {
    name
    email
  }
}
{
  "data": {
    "user": {
      "name": "simurai",
      "email": ""
    }
  }
}

Re-reading the commit email address docs, it looks like I should be prepending the user ID, to correlate across username changes... although I can't find a way to pull the right user ID from the GraphQL API yet. But ID-based anonymized email addresses still leak the user login, right... ?

Note: If you created your GitHub account after July 18, 2017, your GitHub-provided no-reply email address is a seven-digit ID number and your username in the form of ID+username@users.noreply.github.com.

I suppose I'm not sure what you mean by "private account" and the best way to prevent leaking them 😅

@dmleong

This comment has been minimized.

dmleong commented May 29, 2018

@smashwilson I think we are discussing two different things regarding email privacy. Is the intention of using email addresses to find others who aren't in the mentionable users GraphQL API or is it to verify that they are mentionable users?

If the users are already returned via GraphQL as mentionable, I think we are okay to proceed without email verification.

Regarding a fallback using email addresses, I would prefer we only use public, verified email addresses to look up the user.

I hope that clarifies things!

@smashwilson

This comment has been minimized.

Member

smashwilson commented May 29, 2018

Is the intention of using email addresses to find others who aren't in the mentionable users GraphQL API or is it to verify that they are mentionable users?

We're using email addresses to:

If we query the GraphQL API, the co-author dropdown will show no users who are not returned by the GraphQL query - we use the mentionable user list as the source of truth. We will only include users from your git history if we don't have a way to get to GraphQL (and we clear them out if that changes, by logging in for example).

If the users are already returned via GraphQL as mentionable, I think we are okay to proceed without email verification.

Okay, good 👍 Especially because I don't see a way to distinguish between verified and non-verified addresses in the API, hah. I thought that an account's primary email had to be verified regardless?

Regarding a fallback using email addresses, I would prefer we only use public, verified email addresses to look up the user.

The only looking up we do is for avatar images, which uses whatever email is present in the git data. In the case of commits containing co-author trailers built with this UI, that's done with emails returned from GraphQL's user object, which are either public and verified or login@users.noreply.github.com.

We don't stop users from manually adding private email addresses here:

screen shot 2018-05-29 at 3 17 06 pm

... or by setting git's user.name and user.email, although at least the push will be rejected then 🤔

If we do have private email addresses in your git history, we'll leak them to someone sniffing your network traffic in the form of HTTP requests to https://avatars.githubusercontent.com/u/e?email=.... I'm afraid I don't know if we can do anything about that because we don't have a good way to identify the address as private.

I think you could also expose a co-author's private email address within git history by mistake if you enter it yourself. Does the "block pushes" setting apply to co-author trailers, too... ? Otherwise I don't know how to detect that, either. The help docs around co-authors imply that this is the committer's responsibility when composing the message:

If a person chooses to keep their email address private, you should use their GitHub-provided no-reply email to protect their privacy. Otherwise, the co-author's email will be available to the public in the commit message. If you want to keep your email private, you can choose to use a GitHub-provided no-reply email for Git operations and ask other co-authors to list your no-reply email in commit trailers.

smashwilson added some commits May 29, 2018

@dmleong

This comment has been minimized.

dmleong commented May 29, 2018

@smashwilson thank you for the thorough explanation!

Let's go ahead and proceed with what you had. It looks like since it's git data, there's not a whole lot of protection we can use (since git is inherently built this way) and we can use commit signing as a form of security on the end user's side.

smashwilson added some commits May 30, 2018

@smashwilson

This comment has been minimized.

Member

smashwilson commented May 30, 2018

@dmleong 🙇Thanks for your input and for bearing with my long-windedness 😄 Co-author exclusion is in; authors can be excluded by pressing shift-delete while an author is highlighted in the select list, or by adding email addresses to the package configuration.

@smashwilson smashwilson merged commit 39c676c into master May 30, 2018

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

Short-Term Roadmap automation moved this from In Progress 💻 to Complete ✅ May 30, 2018

@smashwilson smashwilson deleted the aw/mentionable-users branch May 30, 2018

@smashwilson smashwilson removed this from Complete ✅ in Short-Term Roadmap May 30, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment