Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add `MentionsAuthor` edges to the graph #808

Merged
merged 1 commit into from Sep 12, 2018
Merged

Add `MentionsAuthor` edges to the graph #808

merged 1 commit into from Sep 12, 2018

Conversation

@decentralion
Copy link
Member

decentralion commented Sep 12, 2018

This commit builds on the work in #806, adding the
MentionsAuthorReferences to the graph. It thus resolves #804.

Empirically, the addition of these edges does not change the users' cred
distribution much. Consider the results with the following 3 forward
weights for the edge (results for ipfs/go-ipfs):

User w=1/32 w=1/2 w=2
whyrusleeping 228.04 225.69 223.86
jbenet 102.04 100.26 99.53
kubuxu 66.60 67.80 69.36
...
btc 22.69 22.29 21.38

The small effect on users' cred is not that surprising: the
MentionsAuthor references always "shadow" a direct comment->user
reference. In principle, the overall cred going to the user should be
similar; the difference is that now some more cred flows in between the
various comments authored by that user, on the way to the user. (And if
those other comments had references, then it flows out from them, etc.)

Empirically, the variance on comments' scores seems to increase as a
result of having this heuristic, which is great—the fact that all
comments had about the same score was a bug, not a feature.

Sadly, we don't have good tooling for proper statistical analysis of the
effect this is having. We'll want to study the effect of this heuristic
more later, as we build tooling and canonical datasets that makes that
analysis feasible.

We choose to add this heuristic, despite the ambiguous effect on users'
cred, because we think it is principled, and adds meaningful structure
to the graph.

Test plan:
The commit is a pretty straightforward generalization of our existing
GitHub edge logic. All of the interesting logic was thoroughly tested in
the preceding pull, so this commit just tests the integration. Observe
that standard (de)serialization of the edge works, that the snapshot is
updated with a MentionsAuthor reference edge, and that the graph
invariant checker, after update, does not throw errors. Also, I manually
tested this change on the ipfs/go-ipfs repo. (It does not require
regenerating data.)

@decentralion decentralion requested a review from wchargin Sep 12, 2018
src/plugins/github/pluginAdapter.js Outdated Show resolved Hide resolved
@@ -47,12 +50,17 @@ export type ReferencesAddress = {|
+referrer: GithubNode.TextContentAddress,
+referent: GithubNode.ReferentAddress,
|};
export type MentionsAuthorAddress = {|
+type: typeof MENTIONS_AUTHOR_TYPE,
+reference: MentionsAuthorReference,

This comment has been minimized.

Copy link
@wchargin

wchargin Sep 12, 2018

Member

Interesting decision to tie this to the reference type, as opposed to
transcluding the fields directly. If the property names on the reference
type are changed/added/removed, then Flow will catch an error on line
216 of edges.js, right? If so, this is fine with me.

This comment has been minimized.

Copy link
@decentralion

decentralion Sep 12, 2018

Author Member

Correct.

This commit builds on the work in #806, adding the
`MentionsAuthorReference`s to the graph. It thus resolves #804.

Empirically, the addition of these edges does not change the users' cred
distribution much.  Consider the results with the following 3 forward
weights for the edge (results for ipfs/go-ipfs):

| User          | w=1/32 | w=1/2  | w=2    |
|---------------|-------:|-------:|-------:|
| whyrusleeping | 228.04 | 225.69 | 223.86 |
| jbenet        | 102.04 | 100.26 | 99.53  |
| kubuxu        | 66.60  | 67.80  | 69.36  |
| ...           | —      | —      | —      |
| btc           | 22.69  | 22.29  | 21.38  |

The small effect on users' cred is not that surprising: the
MentionsAuthor references always "shadow" a direct comment->user
reference. In principle, the overall cred going to the user should be
similar; the difference is that now some more cred flows in between the
various comments authored by that user, on the way to the user. (And if
those other comments had references, then it flows out from them, etc.)

Empirically, the variance on comments' scores seems to increase as a
result of having this heuristic, which is great—the fact that all
comments had about the same score was a bug, not a feature.

Sadly, we don't have good tooling for proper statistical analysis of the
effect this is having. We'll want to study the effect of this heuristic
more later, as we build tooling and canonical datasets that makes that
analysis feasible.

We choose to add this heuristic, despite the ambiguous effect on users'
cred, because we think it is principled, and adds meaningful structure
to the graph.

Test plan:
The commit is a pretty straightforward generalization of our existing
GitHub edge logic. All of the interesting logic was thoroughly tested in
the preceding pull, so this commit just tests the integration. Observe
that standard (de)serialization of the edge works, that the snapshot is
updated with a MentionsAuthor reference edge, and that the graph
invariant checker, after update, does not throw errors. Also, I manually
tested this change on the ipfs/go-ipfs repo. (It does not require
regenerating data.)
@decentralion decentralion force-pushed the heuristic-edge branch from 4d1850d to 293bfb3 Sep 12, 2018
@decentralion decentralion merged commit 737ed4d into master Sep 12, 2018
2 checks passed
2 checks passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
@decentralion decentralion deleted the heuristic-edge branch Sep 12, 2018
decentralion added a commit that referenced this pull request Sep 13, 2018
It now mentions that we added `MentionsAuthor` edges to the GitHub
graph in #808.

Thanks @whyrusleeping for suggesting this heuristic.

Test plan: n/a
@decentralion decentralion mentioned this pull request Sep 13, 2018
decentralion added a commit that referenced this pull request Sep 13, 2018
It now mentions that we added `MentionsAuthor` edges to the GitHub
graph in #808.

Thanks @whyrusleeping for suggesting this heuristic.

Test plan: n/a
decentralion added a commit that referenced this pull request Sep 13, 2018
It now mentions that we added `MentionsAuthor` edges to the GitHub
graph in #808.

Thanks @whyrusleeping for suggesting this heuristic.

Test plan: n/a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

2 participants
You can’t perform that action at this time.