Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: treat github sources as case insensitive #471

Merged
merged 11 commits into from
Feb 19, 2021

Conversation

stakach
Copy link
Contributor

@stakach stakach commented Jan 28, 2021

fixes #470 #275

@bcardiff
Copy link
Member

And I noticed the PR after mi comment in the issue


Would it be enough to normalize when github: is used, or should it also be lowercased when git: https://... is used?

If so, we should not transform user/password to avoid breakinig https://<token>@github.com/owner/repo.git or https://<token>:x-oauth-basic@github.com/owner/repo.git private repos.

@bcardiff
Copy link
Member

Could you add some tests git_resolver_spec.cr? 🙈

@stakach
Copy link
Contributor Author

stakach commented Jan 29, 2021

@bcardiff added code to deal with generic git: urls when they match github, bitbucket or gitlab

spec/unit/git_resolver_spec.cr Outdated Show resolved Hide resolved
src/resolvers/git.cr Outdated Show resolved Hide resolved
src/resolvers/git.cr Show resolved Hide resolved
@asterite
Copy link
Member

Just my two cents, but maybe it would have been better to fix this issue just for github. The original PR was a one line change.

@stakach
Copy link
Contributor Author

stakach commented Jan 29, 2021

@asterite I would have agreed with you before implementing @bcardiff suggestion
it feels more complete / robust this way

src/resolvers/git.cr Outdated Show resolved Hide resolved
src/resolvers/git.cr Outdated Show resolved Hide resolved
src/resolvers/git.cr Show resolved Hide resolved
Copy link
Contributor

@Sija Sija left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the changes here should be scoped only to the 3 resolvers: github, gitlab and bitbucket. All of the logic for git resolver is a "magic" in charge of fixing user mistakes - which IMO shouldn't be the case at all - user is free to choose the resolver, and with git it will simply have to be correct, as that's the nature of generic git - or for that matter http - paths.

src/resolvers/git.cr Outdated Show resolved Hide resolved
src/resolvers/git.cr Outdated Show resolved Hide resolved
src/resolvers/git.cr Show resolved Hide resolved
src/resolvers/git.cr Show resolved Hide resolved
src/resolvers/git.cr Outdated Show resolved Hide resolved
@stakach
Copy link
Contributor Author

stakach commented Feb 1, 2021

@Sija we're only normalising github, bitbucket and gitlab paths, these services and their behaviour are well known and the path is being normalised to the one they recommend when you click the clone button on their webpage.

Not only will this improve performance, as a HTTP to HTTPS redirect will occur if you are cloning from the service
it also improves security, preventing man in the middle in case someone entered HTTP instead of HTTPS (thinking of the recent SolarWinds attack where a build service was hijacked)

@Sija
Copy link
Contributor

Sija commented Feb 1, 2021

[...] it also improves security, preventing man in the middle in case someone entered HTTP instead of HTTPS (thinking of the recent SolarWinds attack where a build service was hijacked)

  1. We're talking about the git resolver, not the known provider specific ones.
  2. No it won't, this kind of mistakes are made and will be made even more if you "magically" start fixing the insecure URLs. I'd rather display big fat warning instead so the user is made aware about the situation.

@stakach
Copy link
Contributor Author

stakach commented Feb 1, 2021

I personally think it's fine as we're only fixing up well known services.
That said, I don't particularly care either way as I assume most people will always use the specific service resolver, at least based on what I've seen in the community

@Sija
Copy link
Contributor

Sija commented Feb 1, 2021

@stakach I'm also confused about the SolarWinds remarks - what does that have to do with any of this?

@stakach
Copy link
Contributor Author

stakach commented Feb 1, 2021

https://www.zdnet.com/article/third-malware-strain-discovered-in-solarwinds-supply-chain-attack/

For example, lets say you have a popular app that might be a target of hackers

  • I add a HTTP github URL in my shard.yml for an application I'm building
  • a hacker breaks into my build server and notices that they can man-in-the-middle the build
  • they edit the servers hosts file so they can inject whatever code they want without me knowing, as long as the builds succeed

was using the solarwinds example as a reference to show that this kind of attack does happen, getting around code signing

@Sija
Copy link
Contributor

Sija commented Feb 1, 2021

  • I add a HTTP github URL in my shard.yml for an application I'm building

How would you do it, if the copy-pasta from every website you mentioned uses https? Would you remove the s by hand? How else would you end up with http protocol?

  • they edit the servers hosts file so they can inject whatever code they want without me knowing, as long as the builds succeed

What's the difference here between http and https? Why wouldn't the attacker use a https endpoint with Let's Encrypt certificate for instance?

@straight-shoota
Copy link
Member

I think treating git URLs with well-known hosts the same way as the respective named resolver is good. That automatically removes any potential issues with non-matching sources. If some shard adds a dependency with git: and a github URL we don't want that to conflict with some other shard referencing the same dependdency with github: plus path. They should just be equivalent. That means if github.com resolves different request URLs to the same repository, shards should normalize them. I don't think there's a downside to that.

shardbox.org normalizes resolvers that way, too: https://github.com/shardbox/shardbox-core/blob/19fb582433bbd914ba31295c76cf411d322c2a71/src/repo/ref.cr#L22 (there's no downcasing the path, but it still works because the DB field is case-insensitive; should probably normalize that explicitly, though).

To address security concerns with unencrypted connections, we could consider to make http protocol an error for services known to only serve on HTTPS. But that's really a different topic.

Co-authored-by: Johannes Müller <straightshoota@gmail.com>
src/resolvers/git.cr Outdated Show resolved Hide resolved
@stakach
Copy link
Contributor Author

stakach commented Feb 1, 2021

@Sija

What's the difference here between http and https? Why wouldn't the attacker use a https endpoint with Let's Encrypt certificate for instance?

Let's Encrypt will only generate a certificate for a domain if you can prove you control it, they just make the proving process super simple.
Because of this proof requirement HTTPS prevents man in the middle attacks and is the main reason the HSTS header exists and is present on all the major repository hosting platforms

@bcardiff
Copy link
Member

bcardiff commented Feb 1, 2021

I think I would leave out git:// from the normalization. There are some private repo that use git: git@github.owner/repo.git sources to use the SSH identity directly. Using https will force them to do workarounds like https://stackoverflow.com/a/64587959/30948 which seems more indirect than using the SSH identity.

I guess that using github: or git: with http[s] is the most common thing. @straight-shoota can you confirm from shardbox that? If we leave out the git:// from the normalization the case sensitivity issues on those pattern will still be solved and it will not break existing workflows with private shards.

Other than that the PR looks good to me. I think it's fine to rewrite http to https for these known hosts.

@stakach don't you have private shards that would be affected by this change for example?

cc: @bararchy

@stakach
Copy link
Contributor Author

stakach commented Feb 1, 2021

@bcardiff the git protocol itself doesn't support authentication at all.
So no private repos will work using git and git protocol requires the user to verify the server identity, unlike HTTPS where this is automated

The accepted answer in that stackoverflow post mentions:

git:// URLs are read only, and it looks like private repos do not allow this form of access.

to use ssh with authentication you need to specify the URL in the form of:
git@github.com or ssh://git@github.com/ both of which will continue to work

@straight-shoota
Copy link
Member

I guess that using github: or git: with http[s] is the most common thing. @straight-shoota can you confirm from shardbox that?

Yes. All dependencies known to shardbox with git: source actually use https scheme. Not even a single http URL.

@bcardiff bcardiff merged commit 8caa6c5 into crystal-lang:master Feb 19, 2021
@bcardiff bcardiff added this to the v0.14.0 milestone Feb 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

github ambiguous sources
5 participants