Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

label: trim non-word chars for import path to repo #688

Merged
merged 1 commit into from
Jan 8, 2020
Merged

label: trim non-word chars for import path to repo #688

merged 1 commit into from
Jan 8, 2020

Conversation

carnott-snap
Copy link
Contributor

@carnott-snap carnott-snap commented Jan 7, 2020

Currently ImportPathToBazelRepoName converts -s and .s to _s. This does
not appear to be required by Bazel's repo name conventions, but may
follow best practices. Unfortunately, this means characters valid for
import paths cause build failures, like git.sr.ht/~sircmpwn/go-example.
This change replaces all non-word, \W or [^A-Za-z0-9_] characters, not
just - and ., with _.

This change also preallocates reversed, since its length is known.

Copy link
Contributor

@jayconrod jayconrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. Couple comments below.

label/label.go Outdated
@@ -191,11 +191,11 @@ func ImportPathToBazelRepoName(importpath string) string {
importpath = strings.ToLower(importpath)
components := strings.Split(importpath, "/")
labels := strings.Split(components[0], ".")
var reversed []string
var reversed = make([]string, 0, len(labels))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're trying to avoid extra allocation here, we should use len(labels)+len(components)-1, since we append again on L199.

Also, let's use reversed := ... instead of var reversed = ... since we're initializing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to prefer var when defining anything that is semantically empty, especially when len(slice) == 0, but I will defer to your preference/style guide.

label/label.go Outdated
for i := range labels {
l := labels[len(labels)-i-1]
reversed = append(reversed, l)
}
repo := strings.Join(append(reversed, components[1:]...), "_")
return strings.NewReplacer("-", "_", ".", "_").Replace(repo)
return regexp.MustCompile(`\W`).ReplaceAllString(repo, "_")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's lift the regexp.MustCompile out of the function to avoid compiling it every time.

Let's also match \W+. Two or more adjacent non-word characters should be replaced by one underscore. So git.sr.ht/~sircmpwn/go-example would become ht_sr_git_sircmpwn_go_example.

This is probably complicated enough that we should have a small unit test in label_test.go to check it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What name would you suggest for the lifted regexp.Regexp?

This is not backwards compatible, currently github.com/user/go-.repo -> com_github_user_go__repo. Is that OK?

I will add a unit test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the lifted regexp, maybe nonWordRe? It'll be close, so it doesn't have to be too verbose.

About compatibility, I think it's okay. A change here should only affect newly added rules or labels without corresponding go_repository. We try to match on existing go_repository rules' importpath attributes before we generate a new name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was also required to change strings.Join, to a character that would be replaced, to prevent ht_sr_git_~sircmpwn_go-example -> ht_sr_git__sircmpwn_go_example. Feel free to suggest a better character than ..

Currently ImportPathToBazelRepoName converts -s and .s to _s. This does
not appear to be required by Bazel's repo name conventions, but may
follow best practices. Unfortunately, this means characters valid for
import paths cause build failures, like git.sr.ht/~sircmpwn/go-example.
This change replaces all consecutive non-word, \W+ or [^A-Za-z0-9_]+
characters, not just - and ., with _. This change also preallocates
reversed, since its length is known.
Copy link
Contributor

@jayconrod jayconrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

@jayconrod jayconrod merged commit 33ccc50 into bazel-contrib:master Jan 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants