Lossy utf-8 string support #314
Yeah, currently

    struct Message2 {
      stringField @0 :Data;
    }

and read the message at that type. I think the real fix will be to make
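A minimal sketch of the Rust side of this `Data` workaround, assuming the field's raw bytes are already available as a `&[u8]` (how you get them out of the generated reader is not shown here). Both helper names are hypothetical; the decoding is just the standard library: `str::from_utf8` gives the strict "valid &str or error" behaviour, while `String::from_utf8_lossy` substitutes U+FFFD:

```rust
use std::borrow::Cow;

/// Hypothetical helper: `raw` stands for the bytes of `stringField`
/// after reading the message through the `Data`-typed `Message2` schema.
pub fn string_field_lossy(raw: &[u8]) -> Cow<'_, str> {
    // Borrows `raw` when it is already valid UTF-8; otherwise allocates a
    // new String with each invalid sequence replaced by U+FFFD.
    String::from_utf8_lossy(raw)
}

/// Hypothetical strict variant for comparison: valid `&str` or an error.
pub fn string_field_strict(raw: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(raw)
}
```

Because `from_utf8_lossy` returns a `Cow`, valid input costs no allocation; only messages that actually contain bad bytes pay for a copy.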
Thank you for the reply.
That would immensely help my use case too, which involves calling
Hello,
This should allow for no breaking change on the user's side once the API is generated, as the … Thoughts? I'm happy to make a PR once I've finished implementing it, if that sounds suitable.
Sounds good to me!
@antoinecordelle, could you say more about why you need this feature? In particular, is it for performance reasons, or just to be able to recover the data when encountering some non-utf8 bytes? If it's the latter, we might be able to get away with a much smaller change than #429. I'd also be curious to hear what @osiewicz is doing. Why does … In any case, having more context will help justify whatever decision we make.
#429 is going to require a lot of people to update their code and will make things more verbose. I want to make sure we have a solid justification for doing that.
Hey, I've moved on from the project since making that comment. @Shaddy should be able to give more perspective.
Sure. In this case, it's mostly about being able to recover the data from Text fields that may contain some non-utf8 bytes. Performance would be a nice-to-have, but the main reason is non-utf8 text. One point that differs from the initial discussion in this issue, though: I need to be able to recover the data in a non-lossy way, i.e. without replacing invalid bytes with placeholders. I mostly implemented it that way in #429 because that seemed like the change that made the most sense conceptually. But I'm happy to hear about a simpler way of achieving this if it limits the breaking change and doesn't require people to change their code.
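For the non-lossy case, one possible shape is a view that keeps the original bytes untouched and only records where the valid UTF-8 prefix ends. `TextView` and `view_text` are hypothetical names, not the API proposed in #429; the sketch only relies on `std::str::from_utf8` and `Utf8Error::valid_up_to`, and assumes the raw bytes of the field are available as a `&[u8]`:

```rust
/// Hypothetical read-only view of a Text field that never drops or rewrites bytes.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TextView<'a> {
    /// The bytes were valid UTF-8.
    Utf8(&'a str),
    /// The bytes were not valid UTF-8; they are kept as-is, together with
    /// the length of the longest valid UTF-8 prefix.
    NotUtf8 { bytes: &'a [u8], valid_up_to: usize },
}

pub fn view_text(bytes: &[u8]) -> TextView<'_> {
    match std::str::from_utf8(bytes) {
        Ok(s) => TextView::Utf8(s),
        Err(e) => TextView::NotUtf8 { bytes, valid_up_to: e.valid_up_to() },
    }
}

impl<'a> TextView<'a> {
    /// The original bytes, unchanged, whichever variant this is.
    pub fn as_bytes(&self) -> &'a [u8] {
        match *self {
            TextView::Utf8(s) => s.as_bytes(),
            TextView::NotUtf8 { bytes, .. } => bytes,
        }
    }

    /// The longest prefix that decodes as UTF-8 (the whole text if it is valid).
    pub fn valid_prefix(&self) -> &'a str {
        match *self {
            TextView::Utf8(s) => s,
            TextView::NotUtf8 { bytes, valid_up_to } => {
                // `valid_up_to` comes from `Utf8Error`, so this prefix is valid UTF-8.
                std::str::from_utf8(&bytes[..valid_up_to]).expect("valid prefix")
            }
        }
    }
}
```

Callers that only want a best-effort string can still run `String::from_utf8_lossy(view.as_bytes())`, so a lossy mode could be layered on top of a byte-preserving one rather than the other way around.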
The reference C++ library lets users decide how to handle invalid UTF-8 encoded text. For example:
Here we have a Message with one Text field, stringField. If stringField contains invalid UTF-8 data, the replace_invalid() method replaces the invalid characters with the replacement character U+FFFD (0xfffd), the same approach the Rust standard library takes in String::from_utf8_lossy(). We lose some information, but this is still better than throwing out the Message entirely (depending on the use case, of course). I am having a hard time replicating something similar in Rust with this library. From what I have tried, I can only get either a valid UTF-8 &str or an error, and I cannot find a way to replace the invalid characters with Unicode replacement characters.
Is there any way to achieve something similar to the C++ example above?
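A minimal sketch of the C++-style replacement in Rust, assuming you can get at the field's raw bytes as a `&[u8]` (for example through the `Data` workaround). `replace_invalid_utf8` is a hypothetical name, not part of the Cap'n Proto Rust API. It walks the input with `std::str::from_utf8`, copies each valid run, and emits one U+FFFD per invalid sequence as reported by `Utf8Error::error_len`, which is the same policy `String::from_utf8_lossy` follows:

```rust
/// Hypothetical helper: replaces each invalid UTF-8 sequence with U+FFFD,
/// the same policy as `String::from_utf8_lossy`.
pub fn replace_invalid_utf8(mut input: &[u8]) -> String {
    let mut out = String::with_capacity(input.len());
    loop {
        match std::str::from_utf8(input) {
            // The remaining bytes are all valid: copy them and stop.
            Ok(valid) => {
                out.push_str(valid);
                break;
            }
            Err(err) => {
                let (valid, rest) = input.split_at(err.valid_up_to());
                // `valid_up_to` marks the longest valid prefix, so this cannot fail.
                out.push_str(std::str::from_utf8(valid).expect("valid prefix"));
                out.push('\u{FFFD}');
                match err.error_len() {
                    // Skip the invalid sequence and keep decoding after it.
                    Some(len) => input = &rest[len..],
                    // The input ended in the middle of a sequence: nothing left to decode.
                    None => break,
                }
            }
        }
    }
    out
}
```

In practice `String::from_utf8_lossy` does all of this in one call; the explicit loop only makes visible where each replacement character is inserted.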