Does the projects endpoint need to be case sensitive #2063
Comments
Hmmm. In the general case the URL pathname can be case-sensitive, so it's best to treat it that way. Git is also normally case-sensitive (depending on the underlying filesystem). Some specific systems have pathnames that are case-insensitive, but there's no obvious way to determine which is which. We support arbitrary repos, not just GitHub and GitLab. It's true that the domain name is not case sensitive when it's ASCII, per IETF RFC 4343. E.g., "I" and "i" are considered the same (apologies to those who speak Turkish). Handling this in the general case is hard. Here's one idea:
What do you think?
Wouldn't URI.parse allow you to grab the path separately from the FQDN?
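As a minimal sketch of what this question suggests, Ruby's standard `uri` library can split a repo URL into host and path, so that only the host (case-insensitive per RFC 4343) is lowercased. The helper name `normalize_host` is hypothetical and is not taken from the best-practices-badge code.

```ruby
require 'uri'

# Hypothetical helper: lowercase only the host of a URL and leave the
# path untouched, since the path may be case-sensitive.
def normalize_host(url)
  uri = URI.parse(url)
  uri.host = uri.host.downcase if uri.host
  uri.to_s
end

uri = URI.parse('https://GitHub.com/kubearmor/KubeArmor')
puts uri.host # the domain part, available separately
puts uri.path # the path part, available separately
puts normalize_host('https://GitHub.com/kubearmor/KubeArmor')
```

This only settles the host. Whether the path may also be folded still depends on the system being queried.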
To clarify, is this your proposal for scorecard or for the best practices API?
Yes, it's definitely possible. The problem is "what to do with the information". Whether or not the path is case-sensitive depends on the details of the specific system being queried. It can even change over time for a given system being queried. I think "case-sensitive first, then case-insensitive" covers all cases and is simpler to implement.
I'm thinking of this as a proposal for the best practices badge, as this is an issue against the best practices badge. It might make sense to do this in Scorecard as well, but I think that should be a separate issue.
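The "case-sensitive first, then case-insensitive" idea could be sketched as follows. This is an illustration only: `PROJECTS` is a hypothetical in-memory stand-in for the projects table, and `find_by_repo_url` is an assumed name, not the actual method in `project.rb`.

```ruby
# Hypothetical in-memory stand-in for the projects table; the real
# application queries a database, which is not modeled here.
PROJECTS = [
  { id: 1, repo_url: 'https://github.com/kubearmor/KubeArmor' }
].freeze

# Try an exact (case-sensitive) match first. Only if that finds
# nothing, fall back to a case-insensitive comparison.
def find_by_repo_url(url, projects = PROJECTS)
  exact = projects.select { |p| p[:repo_url] == url }
  return exact unless exact.empty?

  projects.select { |p| p[:repo_url].casecmp?(url) }
end

# The lowercased URL from the Scorecard report now finds the project.
p find_by_repo_url('https://github.com/kubearmor/kubearmor')
```

Because the exact match is tried first, systems with case-sensitive paths keep their current behavior whenever the caller supplies the correct capitalization.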
I think the only case this doesn't cover is a false match. Consider a host where path is case sensitive, and there are two projects, but only one is in the best practices dataset:
A request for |
@spencerschrock - you're right, this approach does risk a false match. I think the risk is low, but it does give pause. I can't think of another approach though, so I think we end up with two possibilities:
Anyone have a third way?
I say: accept the risk and go with case-sensitive then case-insensitive.
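For reference, the false-match risk being accepted here can be reproduced with a small sketch. The host `example.com`, the project names, and the `lookup` helper are all hypothetical. The sketch assumes a host whose paths are case-sensitive and where only one of two similarly named projects has a badge entry.

```ruby
# Hypothetical host with case-sensitive paths: "Foo" and "foo" are
# different projects, but only "Foo" has a badge entry.
BADGE_ENTRIES = ['https://example.com/org/Foo'].freeze

# Same two-step lookup: exact match first, then case-insensitive.
def lookup(url, entries = BADGE_ENTRIES)
  exact = entries.select { |e| e == url }
  return exact unless exact.empty?

  entries.select { |e| e.casecmp?(url) }
end

# A query for the distinct project "foo" falls through to the
# case-insensitive step and returns the entry that belongs to "Foo".
p lookup('https://example.com/org/foo')
```

The false match only occurs when the exact lookup misses, so it requires both a case-sensitive host and two projects whose URLs differ only by case.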
Scorecard uses the /projects.json?url= endpoint when checking a project's best practices badge status. We have an open issue (ossf/scorecard#3466) where some projects call scorecard with a different capitalization than their official repo name. For example, github.com/kubearmor/KubeArmor was effectively running scorecard with:
That results in an HTTP call to /projects.json?url=https://github.com/kubearmor/kubearmor, which returns an empty response, compared to the expected call of /projects.json?url=https://github.com/kubearmor/KubeArmor, which is a hit.
GitHub has an API call which returns the "official" capitalization of a repo, so Scorecard can likely satisfy this requirement when we make the request, but I'm opening an issue in case this was unintentional.
I know the code in question is here, but I'm not much of a Ruby guy. There are also some comments about indices and efficiency, so feel free to close this if it's not feasible.
best-practices-badge/app/models/project.rb
Lines 123 to 132 in 91b3474