Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backstage doesn't recognize branches that contain forward slashes #2815

Closed
taras opened this issue Oct 8, 2020 · 48 comments
Closed

Backstage doesn't recognize branches that contain forward slashes #2815

taras opened this issue Oct 8, 2020 · 48 comments
Labels
bug Something isn't working help wanted Help/Contributions wanted from community members stale

Comments

@taras
Copy link
Member

taras commented Oct 8, 2020

Backstage relies on git-url-parse to parse URLs. This library has a bug that causes it to incorrectly parse URLs that point to branches with forward slashes in them.

Expected Behavior

I expect to be able to use branches with / because it's a useful way to organized branches into groups.

Current Behavior

If I attempt to create a template from a branch that uses a / in its name, the operation will fail because Backstage (via git-url-parse) will assume that I'm trying to get code from branch called tm. It will ignore everything past first /.

Possible Solution

🤷‍♀️

Steps to Reproduce

  1. Try to create a template from https://github.com/spotify/backstage/blob/dependabot/npm_and_yarn/testing-library/jest-dom-5.11.4/plugins/scaffolder-backend/sample-templates/create-react-app/template.yaml

Context

Some of the projects I'm involved in require that each team member prefixes their branch with their initials. It's a habit at this point to start every branch name with tm/.

@freben
Copy link
Member

freben commented Oct 8, 2020

Oof - thanks for reporting, of course this should work and I always do the same.

That library has been a bit less than stellar, I wonder if there are alternatives. Then again, just looking at the url in your example, it looks hard to deduce where the ref ends and the path begins...

@taras
Copy link
Member Author

taras commented Oct 9, 2020

I created a runkit that has every GitHub URL parsing library that I could find. None of them support properly detecting the branch name. We might need a completely different strategy for doing this.

@Rugvip
Copy link
Member

Rugvip commented Oct 9, 2020

https://github.com/spotify/backstage/blob/blob/blob/blob/blob/blob/blob.txt

There is actually a bit of method to this madness though. Git itself has a limitation that you can only create branches that can be represented in the filesystem. For example, you can't have both the branch a/b and a/b/c, because when you create a/b b is a text file on disk, and therefore you can't create a/b/c since b would have to be a folder for that.

With that in mind it means there's at least a deterministic way to interpret the URL, the annoying bit is that it includes using the GitHub API to discover the branches that are present in the repo.

@taras
Copy link
Member Author

taras commented Oct 9, 2020

the annoying bit is that it includes using the GitHub API to discover the branches that are present in the repo

I was thinking the same thing.

There could be another approach that could work but I don't know how universal it wold be.

GitHub website doesn't seem to care about extra forward slashes. We could for example use // forward slash to represent beginning and end of a ref. It would also be annoying, but could be better than auto detecting because the user is expressing their intention by adding the extra forward slash.

For example, 'https://github.com/spotify/backstage/blob//dependabot/npm_and_yarn/testing-library/jest-dom-5.11.4//package.json' still leads to the same file on the GitHub website.

@Rugvip
Copy link
Member

Rugvip commented Oct 9, 2020

@taras tbh we want users to be able to just copy the address from the address bar. Otherwise we could just go with our own shorter format too

@taras
Copy link
Member Author

taras commented Oct 9, 2020

true true. I'm going to email a friend at Microsoft to see if he can connect us with someone at GitHub. I want to find out from them if there is a way that they'd recommend.

@IonicaBizau
Copy link

Like I commented in IonicaBizau/git-url-parse#114 (comment), I don't think there is a way to split the url itself into branch name and filename... Pinging @izuzak here just in case he has any ideas how this can be improved. ❇️

@stale
Copy link

stale bot commented Mar 29, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@timbonicus
Copy link
Contributor

@mtlewis and I spent a hot second looking at the code in question. As mentioned, there's not enough information in the URL to tell branch name versus path when the branch has a slash. I didn't find any alternative URL formats that work with GitHub where this might be possible.

This code could have some additional logic, though. I think we could either:

  1. List all branches for the repo to match a substring
  2. Catch the 404 and retry, expanding the branch name to the next /

Since (1) could suffer on a repo with a ton of branches, and could have false positives, the second seems preferable to me. We could retry N times, expanding to the next slash each time, where N is a fairly low number.

@freben
Copy link
Member

freben commented Jul 8, 2021

One issue iirc is that there's a lot of code that just uses git-url-parse or things like resolveUrl that aren't even async in the first place so there's very little room for doing smart or retrying or remote-querying things. Feels to me like it unfortunately would take a big refactoring to make it good.

@eperdeme
Copy link

eperdeme commented Jul 8, 2021

Evening,
I'm not a developer but doing loops to attempt to work out if its a branch or a path seems inelegant.

In other projects I've seen the URL is normally constructed for git repos with parameters that get extracted to let the client libs know what the branch is;

For example;

Terraform
git::https://example.com/vpc.git?ref=v1.2.0
Helmfile
name: remote-repo url: git+https://github.com/repo/test@deploy/helm?ref=master

kustomize
github.com/kubernetes-sigs/kustomize/examples/multibases?ref=v1.0.6

@freben
Copy link
Member

freben commented Jul 8, 2021

@eperper thanks for those pointers - interesting. One thing to take into account here is convenience of copy+paste as mentioned by @Rugvip above. Having to form a custom url might be error prone. That being said, I think that approach can be worth considering nonetheless, especially if it's an "alternative form" that you can use only when necessary or if it's easily available somehow in the web interface besides the address bar of the browser.

@FilipPyrek
Copy link
Contributor

Hi guys, what's a status on this one? Unfortunately it's a TechDocs show stopper for us.. 😢

@freben
Copy link
Member

freben commented Sep 1, 2021

Sorry but this is still a current issue. There's some discussion in the comments above. Having unencoded slashes in the URL makes it super problematic to parse and understand GitHub URLs - there's no surefire way to actually know where the branch ends and the path begins, without resorting to hammering the GitHub API with queries to resolve the situation - and this needs to be done in a lot of places where the code isn't even async ready, but rather expects to be able to do that parsing instantly and locally.

@FilipPyrek
Copy link
Contributor

@freben I like the idea of "alternative form" which would solve the problem but didn't require any groundbreaking changes. I like format where ref is supplied as a query string. That solution would make it extensible also for other information in future - if needed.
Do you guys at @backstage prefer to implement this yourself? Or is this okay to be done by a community? Do you have any requirements & tips how you would like this to be implemented (and where)?

@freben
Copy link
Member

freben commented Sep 2, 2021

@FilipPyrek Yeah that's probably the most viable way forward that we have seen so far, at least. We could make sure to implement support for the alternative form as well as the original native form - we do want to keep the convenience of copy+paste from the browser.

There's TONS of direct usages of git-url-parse across the code base. If we went for this strategy, we would have to rewrite them all to use some wrapper function e.g. in @backstage/integration instead. There's also probably a fair bit of code that does direct string manipulation, or does new URL(str).pathname.split('/') and similar trickery. So yeah, there will be a bunch of chasing down code that isn't up to par with the new requirement.

@freben
Copy link
Member

freben commented Sep 2, 2021

And to answer your question, we'd appreciate help with it.

@freben freben added the help wanted Help/Contributions wanted from community members label Sep 2, 2021
@FilipPyrek
Copy link
Contributor

Understood and totally agree @freben.

I will see what available capacity we will have and if somebody from us will start working on that, I will let you guys know. 👍

@github-actions
Copy link
Contributor

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jun 19, 2023
@Rugvip
Copy link
Member

Rugvip commented Jun 22, 2023

Way forward with this issue is what was started in #16407 but unfortunately abandoned

@github-actions github-actions bot removed the stale label Jun 22, 2023
@VladMasarik
Copy link
Contributor

yello, I started looking at the abandoned PR, and would like to continue from there.

So far I think identified that issue to be here
https://github.com/backstage/backstage/blob/master/packages/integration/src/helpers.ts#L98

Specifically, we pass in url="../a.yaml", and the base path is changed, but the query should be actually changed.

Expected: "https://github.com/backstage/backstage/tree/branchName/withSlashes?path=a.yaml#L17"
Received: "https://github.com/backstage/backstage/tree/a.yaml?path=test/README.md#L17"

I would like to edit this part, but do you think this is the correct place? Or would you actually know of a better / proffered place for that fix?

@freben
Copy link
Member

freben commented Aug 3, 2023

@VladMasarik That's the default resolver, and each individual integration may or may not choose to use it. Each integration class' resolveUrl method is what's actually being consumed, and for example the Azure one does something completely different and more complex. So this will have to be resolved on a per-provider level since they all have different idiosyncrasies. In the case of your example URLs, it'd be here.

Now, to make things worse, URLs are used in a huge number of places, and they are acted on in various ways. Not just resolveUrl, but for example parsing them and taking out their component parts. A lot of code uses the git-url-parse library directly for this since the integrations package doesn't offer this yet. See here for example. We need to make sure that these various loose usages of URLs also work with the secondary format that has the query parameter. If you can double check that it'd be great.

GIven these circumstances I think we'll have a hard time supporting the URL format that has the path inside the URL path itself rather than in a path query parameter. We just can't know for sure where the branch name ends and the path starts...

@VladMasarik
Copy link
Contributor

@freben I am still assuming that for the Default and Github resolver it could be useful, and it wouldn't be perfect, but at least something.

However, do I understand correctly that although we might resolve the issue of parsing branches with / in them in one place, it won't fix all of the issues because in other places the library cannot parse the /? As in, lets say we would actually fix the issue for github resolver, but when manipulating the same URL in other place everything would suddenly stop working?

I looked at the library directly, there is an open issue as well, but it has been seemingly abandoned because it is tricky for the same issue you mentioned. If I were still interested in fixing this issue, do you think fixing the library first would be preferred?

Also, I am getting the impression that you would very much welcome this fix, however you don't expect it to be actually implemented because of the mentioned issues, and as such you are not very excited to draw this out even further? :D

Just as an extra context, I did not read all the messages that were discussed in the last two years that this PR was open. My main reason for trying to revive this is the last comment

Way forward with this issue is what was started in #16407 but unfortunately abandoned

@freben
Copy link
Member

freben commented Aug 10, 2023

Yeah I think the core of my worry is: This looks like a point fix in a single place but there's probably a lot more places that still won't work with URLs on the proposed form. We need help validating that things actually work throughout Backstage with URLs on a new form and it might be a bit of an undertaking. What I need is a bit of confidence in merging whatever gets proposed. Because if we "officially" say that "yeah if your branch contains slashes, just register the URL on this form instead" and that makes e.g. the scaffolder not work at all for those URLs ... then we've introduced new problems :) Does that make sense? On the surface I see a proposed change in the resolve method that looks like it makes sense in isolation, but I don't know the impact that the change has on a larger scale, and have limited time to dig through and validate things by hand on my own end.

@VladMasarik
Copy link
Contributor

What I need is a bit of confidence in merging whatever gets proposed

Ahh yeah, that is understandable. In that case, I would like to first take care of the issue in the git-url-parse library first, as it seems that there might not even be a simple resolution to it. Since it turned out to be a bit larger undertaking, I won't promise any progress (as a note for anyone else reading this), but I will try to add any new findings in here. Glad I first asked before resurrecting the old PR right away :D

Thanks for sharing your perspective Frederik 👌

Just some references for the PRs / issues I was talking about in the git-url-parse library
IonicaBizau/git-url-parse#114
IonicaBizau/git-url-parse#104

@github-actions
Copy link
Contributor

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Oct 13, 2023
@camilaibs camilaibs removed the stale label Oct 16, 2023
Copy link
Contributor

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Dec 15, 2023
@vinzscam vinzscam removed the stale label Dec 18, 2023
Copy link
Contributor

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Feb 16, 2024
@camilaibs camilaibs removed the stale label Feb 19, 2024
@piatkiewicz
Copy link
Contributor

piatkiewicz commented Feb 23, 2024

Maybe a message could be a little more descriptive. I can't reproduce it now for new repo and branch when creating some simple catalog-info, but in some old project I have this message when trying to import catalog-info.yaml from branch with a slash in the name
image
For branches without a slash it imports fine.

{"error":{"name":"InputError","message":"NotFoundError: Unable to read url, NotFoundError: https://github.com/myorg/myrepo/tree/preslash/postslash-branchname/catalog-info.yaml could not be read as https://api.github.com/repos/myorg/myrepo/contents/postslash-branchname/catalog-info.yaml?ref=preslash, 404 Not Found"},"request":{"method":"POST","url":"/locations?dryRun=true"},"response":{"statusCode":400}}

Copy link
Contributor

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Apr 23, 2024
@VladMasarik
Copy link
Contributor

ping, still a problem AFAIK. the last time i was looking at this, I think it was possible to fix, however when I was searching for the package, I found only a few matches so i was confused because someone mentioned it is used in a lot of places. However, later I found out that I was looking at the newest package, and not an older version, and then I found like 50 places where it was used, and never got to fixing all of them.

@github-actions github-actions bot removed the stale label Apr 23, 2024
Copy link
Contributor

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jun 22, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jun 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Help/Contributions wanted from community members stale
Projects
None yet
Development

No branches or pull requests