Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize mentions for merge requests in Gitlab #3437

Merged
merged 2 commits into from
Sep 16, 2022
Merged

Sanitize mentions for merge requests in Gitlab #3437

merged 2 commits into from
Sep 16, 2022

Conversation

andrcuns
Copy link
Contributor

@andrcuns andrcuns commented Apr 2, 2021

I have encountered an issue that quite often handles in gitlab will be the same as on github. This creates an issue where PR descriptions in various projects can send notifications to people mentioned in release notes.

I'm not entirely sure if same issue could exist on bitbucket as well and maybe this check should be removed all together. For now it seemed safer to explicitly just add gitlab to conditional.

@andrcuns andrcuns requested a review from a team as a code owner April 2, 2021 22:26
Copy link
Contributor

@feelepxyz feelepxyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is what you want, this will just sanitize links to github on pull requests created on GitLab. GitLab MergeRequests shouldn't create issue/PR references on github.

Are you looking at sanitizing links to GitLab MergeRequests and Issues?

@andrcuns
Copy link
Contributor Author

andrcuns commented Apr 9, 2021

Yes, exactly, you need to sanitize mentions in Gitlab Merge Requests as well.

There is currently an issue, that if dependabot creates some merge request where release notes contain a mention of @andrcuns for example, it would send a notification to this user because it is also a valid handle in Gitlab which also uses @ to mention specific users. This results in quite a lot of spam to active contributors who have same username on Github and Gitlab.

I have had this change as monkey patch for quite a while already, but I think this should be added in core since I imagine quite a few users run dependabot as a script on Gitlab.

@andrcuns andrcuns changed the title sanitize links and mentions for gitlab Sanitize links and mentions for gitlab Apr 9, 2021
@andrcuns andrcuns changed the title Sanitize links and mentions for gitlab Sanitize mentions for merge requests in Gitlab Apr 9, 2021
Copy link
Contributor

@feelepxyz feelepxyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrcuns yeah makes sense 👍 we won't be able to rely on the LinkAndMentionSanitizer for this though as this only works for github users/links/PRs and re-writes @userhandles to github.com links. I think the way to go here would be adding a GitLabLinkAndMentionSanitizer and using this instead when the source is gitlab.

We're love a contribution to fix this and happy to review it.

@andrcuns
Copy link
Contributor Author

andrcuns commented Apr 9, 2021

@andrcuns yeah makes sense 👍 we won't be able to rely on the LinkAndMentionSanitizer for this though as this only works for github users/links/PRs and re-writes @userhandles to github.com links. I think the way to go here would be adding a GitLabLinkAndMentionSanitizer and using this instead when the source is gitlab.

We're love a contribution to fix this and happy to review it.

Well, technically it works as expected, because all users/links/PR's always reference Github, I don't think dependabot fetches any info from non Github sources, does it?

Here is an example: https://gitlab.com/dependabot-gitlab/dependabot/-/merge_requests/567. The changelog actually points to Github user profiles, which is who actually contributed the changes. I don't think it would make much sense to sanitize these user profiles and create links that point to Gitlab profile counter parts.

Do You think it makes sense to implement GitLabLinkAndMentionSanitizer which pretty much inherit's the LinkAndMentionSanitizer. At the moment I don't see what custom logic it might have given PR descriptions contain references only to Github related objects (users/PR's).

@feelepxyz
Copy link
Contributor

Well, technically it works as expected, because all users/links/PR's always reference Github, I don't think dependabot fetches any info from non Github sources, does it?

It depends on the dependency, if the package is hosted on gitlab it will.

I'm still not following exactly what the problem is. The MergeRequest you link to has two user mentions, @​honnix and @​jerbob92, these look like github usernames afaikt, but those users should not receive any notifications on github from these merge requests. They should only receive notifications on gitlab if they also exist there. The existing LinkAndMentionSanitizer won't help with this as it will start linking these users to github instead of gitlab.

Is the problem not with gitlab usernames getting mentions on gitlab?

@andrcuns
Copy link
Contributor Author

andrcuns commented Apr 9, 2021

Well, technically it works as expected, because all users/links/PR's always reference Github, I don't think dependabot fetches any info from non Github sources, does it?

It depends on the dependency, if the package is hosted on gitlab it will.

I'm still not following exactly what the problem is. The MergeRequest you link to has two user mentions, @​honnix and @​jerbob92, these look like github usernames afaikt, but those users should not receive any notifications on github from these merge requests. They should only receive notifications on gitlab if they also exist there. The existing LinkAndMentionSanitizer won't help with this as it will start linking these users to github instead of gitlab.

Is the problem not with gitlab usernames getting mentions on gitlab?

The problem has nothing to do with Github, it's Gitlab only :) I must be bad at explaining it.

The problem is exactly that these 2 users would receive a notification on Gitlab about a merge request in a project that has nothing to do with them simply because they are mentioned in the changelog.

The example MR works correctly and did not ping those 2 users, because it was created by a script that has the proposed changes as monkey patch and it transformed the usernames to urls to github.

I need to dig deeper in the code then, if description can reference release notes or changelog that comes from gitlab, then indeed it would need it's own sanitizer.

@feelepxyz
Copy link
Contributor

Ah gotcha, think I follow now. So it will work as expected if the mentioned users are always github users. I was thinking of the case where these users are only on gitlab, in which case this change would suddenly link them to non-existent github users.

Hm not entirely sure what the best approach is here. Will have a poke at the code to see if we know where we are fetching changelogs/release notes from and see if we can toggle based on this instead of the updated repo.

@andrcuns
Copy link
Contributor Author

andrcuns commented Apr 9, 2021

Yes, exactly what I was thinking now. If it's possible to know where release notes come from, it should be pretty straight forward to implement the proper sanitation. I was not aware that release notes can even come from Gitlab so didn't think that linking to non existent Github accounts might be an issue. I guess similar problem can happen with referencing issues and PR's/MR's in changelog since it could reference to incorrect platform.

Let me know what you find. I'm willing to try and implement it.

@@ -244,7 +244,7 @@ def source_provider_supports_html?
end

def sanitize_links_and_mentions(text, unsafe: false)
return text unless source.provider == "github"
return text unless source.provider == "github" || source.provider == "gitlab"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly we need something like...

return text if running_on?(:github) && source.provider == 'github'
return text if running_on?(:gitlab) && source.provider == 'gitlab'

Does that capture the reason that this PR is blocked?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand, the issue is determining the source of Release notes and Changelog so we do not sanitize links pointing natively to Gitlab as Github links (I have not encountered such a case myself though).

If I understand correctly, source.provider is not that, it is basically a pointer where the dependabot-core runs and not where changelog comes from.

@jeffwidman
Copy link
Member

👋 Sorry for the delay here. Both @xlgmokha and @feelepxyz have transitioned off the core team, so this got dropped for a bit.

I was not aware that release notes can even come from Gitlab so didn't think that linking to non existent Github accounts might be an issue.

I haven't dug deeply into that code, but if we don't support querying non-GitHub sources today, I'm sure we'll support it in the future. Having correct/useful metadata all in one place in the PR description is one of the value-adds of Dependabot--I absolutely loved that about Dependabot when using it at my previous employer. So if we query a registry like PyPI for that data and it points us at GitLab, then we should be grabbing this data from GitLab. Again, not sure if we do this today, but given the importance we'll do it in the future so I'm a bit hesitant to extend the current "hack" of assuming that if we're running on GitHub that all links point to GitHub.

I guess similar problem can happen with referencing issues and PR's/MR's in changelog since it could reference to incorrect platform.

Exactly.

If it's possible to know where release notes come from, it should be pretty straight forward to implement the proper sanitation.

Yeah, that's a much more sustainable / better solution IMO. I opened #5729 to track that.

This probably isn't something we can get to anytime soon, but if you want to take a crack at it I'll try to be responsive on the PR reviews to keep it moving.

Given all this, I'm going to close this PR as "the right problem but wrong solution". And again, my apologies for the delay here!

@jeffwidman jeffwidman closed this Sep 15, 2022
@jeffwidman
Copy link
Member

jeffwidman commented Sep 15, 2022

Actually, I think I jumped the gun on closing this.

The reality is that most open source projects are hosted on GitHub, and most dependency bumps are open-source so extending the hack to include doing the sanitization on GitLab is probably fine in the short term.

Long term we'd still want to address this through looking at the metadata source (#5729), but in the meantime this PR probably improves the user experience for anyone running on GitLab. Alternatively we could remove the check altogether and improve the user experience for all the other platforms.

The one user experience that would be negatively impacted is a team working at GitLab across a bunch of internal repos... suddenly their internal links will start pointing at GitHub rather than their internal repos. But given that @andrcuns already applies this as a monkey patch, I doubt many folks will be affected.

I'm going to get a second opinion internally before merging, just to be sure I'm not missing something.

@jeffwidman jeffwidman reopened this Sep 15, 2022
Copy link
Member

@jeffwidman jeffwidman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After noodling on it and getting a second opinon, going to merge this!

I'm going to also open a follow-up PR to remove the guard altogether, as we probably should sanitize on all platforms. And then address the underlying bug in #5729.

@jeffwidman jeffwidman merged commit ba9665a into dependabot:main Sep 16, 2022
jeffwidman added a commit that referenced this pull request Sep 16, 2022
As pointed out in #5729, we should be using the "source" of the metadata
for deciding how to sanitize, not the "place where Dependabot is hosted".

This hack was annoying the folks who use GitLab to the point that [they monkey-patched this code to also run the sanitizer on GitLab](#3437 (comment)), and then contributed that back in #3437.

If it's true on GitLab, it's even more true on other platforms as well for the reasons explained in #3437 (comment).

So remove this guard altogether.
jeffwidman added a commit that referenced this pull request Sep 21, 2022
As pointed out in #5729, we should be using the "source" of the metadata
for deciding how to sanitize, not the "place where Dependabot is hosted".

This hack was annoying the folks who use GitLab to the point that [they monkey-patched this code to also run the sanitizer on GitLab](#3437 (comment)), and then contributed that back in #3437.

If it's true on GitLab, it's even more true on other platforms as well for the reasons explained in #3437 (comment).

So remove this guard altogether.
@pavera pavera mentioned this pull request Oct 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants