Fixes the check for the valid UTF8 symbols #2135

Merged
crsib merged 1 commit into audacity:release-3.1.2 from 2132_utf8_handling on Nov 15, 2021

Conversation

crsib
Member

@crsib crsib commented Nov 15, 2021

Resolves: #2132
Resolves: #2134

(short description of the changes and the motivation to make the changes)

  • I signed CLA
  • The title of the pull request describes an issue it addresses
  • If changes are extensive, then there is a sequence of easily reviewable commits
  • Each commit's message describes its purpose and effects
  • There are no behavior changes unnecessary for the stated purpose of the PR

Recommended:

  • Each commit compiles and runs on my machine without known undesirable changes of behavior

@crsib crsib added the P1 Highest level priority bugs (ship blocker / must fix) label Nov 15, 2021
@crsib crsib requested a review from Paul-Licameli Nov 15, 2021
@crsib crsib added this to In progress in Sprint 9 - 3.2 release needed R&D via automation Nov 15, 2021
@crsib crsib moved this from In progress to Review in progress in Sprint 9 - 3.2 release needed R&D Nov 15, 2021
@Paul-Licameli
Member

Paul-Licameli commented Nov 15, 2021

This looks harmless. I don't immediately understand why this fixes the bug. Is it because char is signed? But the signed-unsigned comparison would have coerced the negative to a large unsigned value.

More seriously, I should ask why all the logic in XMLWriter::XMLEsc was not reproduced in XMLUtf8BufferWriter::WriteEscaped. I should have asked that question with the earlier PR.

@crsib
Member Author

crsib commented Nov 15, 2021

But the signed-unsigned comparison would have coerced the negative to a large unsigned value.

This is a signed-signed conversion, unfortunately.

More seriously, I should ask why all the logic in XMLWriter::XMLEsc was not reproduced in

The problem is that it was mostly reproduced, just with the exception of surrogate pairs handling.

@crsib crsib merged commit d707721 into audacity:release-3.1.2 Nov 15, 2021
4 checks passed
Sprint 9 - 3.2 release needed R&D automation moved this from Review in progress to Ready for QA Nov 15, 2021
@crsib crsib deleted the 2132_utf8_handling branch Nov 15, 2021
@Paul-Licameli
Member

Paul-Licameli commented Nov 15, 2021

But the signed-unsigned comparison would have coerced the negative to a large unsigned value.

This is a signed-signed conversion, unfortunately.

DUH! I see 0x... and assume unsigned, but there wasn't ...u after it. Of course.
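To illustrate the signedness point discussed above, here is a minimal sketch. It is not the code from this PR: `IsControlNaive` and `IsControlFixed` are hypothetical helpers, and `signed char` is used explicitly because the signedness of plain `char` is implementation-defined. A hex literal such as `0x20` without a `u` suffix has type `int`, so the byte is promoted to a negative `int` and the comparison is signed-signed.

```cpp
#include <cassert>

// Hypothetical helper (not from the PR): compares the byte as a signed value.
// 0x20 is an int literal, so c is promoted to int; any byte with the high bit
// set is negative and is wrongly reported as being below 0x20.
bool IsControlNaive(signed char c)
{
   return c < 0x20;
}

// Hypothetical helper (not from the PR): converting to unsigned char first
// maps the byte into 0..255, so only real ASCII control codes are below 0x20.
bool IsControlFixed(char c)
{
   return static_cast<unsigned char>(c) < 0x20;
}
```

With these definitions, the UTF-8 lead byte 0xC3 (which is -61 as a `signed char`) is misclassified by the naive check and handled correctly by the cast-based one.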

More seriously, I should ask why all the logic in XMLWriter::XMLEsc was not reproduced in

The problem is that it was mostly reproduced, just with the exception of surrogate pairs handling.

@Paul-Licameli
Member

Paul-Licameli commented Nov 15, 2021

But the signed-unsigned comparison would have coerced the negative to a large unsigned value.

This is a signed-signed conversion, unfortunately.

More seriously, I should ask why all the logic in XMLWriter::XMLEsc was not reproduced in

The problem is that it was mostly reproduced, just with the exception of surrogate pairs handling.

And we know the string has been converted to utf8 here

void XMLUtf8BufferWriter::WriteAttr(const std::string_view& name, const Identifier& value) 
{
   const wxScopedCharBuffer utf8Value = value.GET().utf8_str();

   WriteAttr(name, { utf8Value.data(), utf8Value.length() });
}

And that makes the surrogate handling unnecessary? It is ok to leave all the utf8 encodings un-escaped?

@crsib
Member Author

crsib commented Nov 15, 2021

It is ok to leave all the utf8 encodings un-escaped

Surrogates are easier in UTF-8. All the symbols we wanted to keep out of XML (although I do not really understand why) or to escape are in the lower 7 bits. If the most significant bit is set, we definitely have a multi-byte sequence, which we considered to be "safe" for XML.
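The idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual `XMLUtf8BufferWriter::WriteEscaped` implementation: `EscapeXmlUtf8` is a hypothetical name, and the choice to silently drop control characters other than tab, newline, and carriage return is an assumption made for the example.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical sketch (not Audacity's code): escape a UTF-8 string for XML.
std::string EscapeXmlUtf8(std::string_view input)
{
   std::string out;
   out.reserve(input.size());

   for (char ch : input)
   {
      // Work on the unsigned value so bytes >= 0x80 are not seen as negative.
      const unsigned char c = static_cast<unsigned char>(ch);

      // High bit set: part of a multi-byte UTF-8 sequence, copied unchanged.
      if (c >= 0x80)
      {
         out.push_back(ch);
         continue;
      }

      switch (ch)
      {
      case '&':  out += "&amp;";  break;
      case '<':  out += "&lt;";   break;
      case '>':  out += "&gt;";   break;
      case '"':  out += "&quot;"; break;
      case '\'': out += "&apos;"; break;
      default:
         // Assumption: keep printable ASCII plus tab, LF, and CR;
         // drop all other control characters.
         if (c >= 0x20 || ch == '\t' || ch == '\n' || ch == '\r')
            out.push_back(ch);
         break;
      }
   }

   return out;
}
```

Because every character that needs escaping or filtering is a single byte below 0x80, the loop never has to decode a multi-byte sequence, which is why no surrogate-style handling appears here.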

And we know the string has been converted to utf8 here

This is an easy point for further improvement of the XMLWriter class, though. Cases where we really need to convert the value from a wxString (or even construct one!) are rare. At least the name argument should always be a std::string_view or even a const char*. This will likely improve serialization performance by about a factor of two.

@Penikov Penikov moved this from Ready for QA to Done in Sprint 9 - 3.2 release needed R&D Nov 16, 2021
@AnitaBats AnitaBats added this to the Audacity 3.1.2 milestone Nov 16, 2021
@AnitaBats AnitaBats added this to In progress in Sprint 8 - 3.1 Stabilisation via automation Nov 16, 2021
@AnitaBats AnitaBats moved this from In progress to In QA in Sprint 8 - 3.1 Stabilisation Nov 16, 2021
@AnitaBats AnitaBats moved this from In QA to Done in Sprint 8 - 3.1 Stabilisation Nov 16, 2021
hugofloresgarcia pushed a commit to audacitorch/audacity that referenced this pull request Dec 2, 2021
Labels
P1 Highest level priority bugs (ship blocker / must fix)
Projects
No open projects
3 participants