
lenient utf8 parser #34

Merged
merged 1 commit into from
Sep 16, 2022

@jhump (Member) commented Sep 16, 2022

As of today, protoc accepts source files that contain invalid UTF-8. As a result, proto sources that are mainly compiled with protoc (such as the googleapis module) may contain badly encoded bytes, so protocompile needs to accept them too, at least for now.

This change makes protocompile behave the same way as protoparse: invalid byte sequences are silently replaced with the Unicode replacement character (U+FFFD), which is how lenient UTF-8 decoders are expected to work. This does not match protoc's behavior, but it is an acceptable variance for now.
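As a rough illustration of the lenient decoding described above, here is a minimal Go sketch. The `decodeLenient` helper is hypothetical and is not protocompile's or protoparse's actual code; it assumes each invalid byte is replaced individually with U+FFFD and uses only the standard library's `unicode/utf8` package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeLenient is a hypothetical helper (not protocompile's real API).
// It decodes b as UTF-8, replacing every invalid byte with U+FFFD
// instead of failing.
func decodeLenient(b []byte) string {
	out := make([]rune, 0, len(b))
	for len(b) > 0 {
		// For an invalid encoding, DecodeRune returns (utf8.RuneError, 1),
		// and utf8.RuneError is U+FFFD, so appending r substitutes the
		// replacement character for the bad byte and skips just that byte.
		r, size := utf8.DecodeRune(b)
		out = append(out, r)
		b = b[size:]
	}
	return string(out)
}

func main() {
	// 0xE9 is a Latin-1 "é", which is not valid UTF-8 on its own.
	src := []byte("// caf\xe9 comment")
	fmt.Println(decodeLenient(src))
}
```

If replacing a whole run of consecutive invalid bytes with a single U+FFFD is preferable, the standard library's `strings.ToValidUTF8(s, "\uFFFD")` does that in one call.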

This addresses an old bug filed by @amckinney: jhump/protocompile-old#5


2 participants