[C++][MATLAB] Add utility to convert UTF-8 strings to UTF-16 and UTF-16 strings to UTF-8 #36166

Closed
sgilmore10 opened this issue Jun 19, 2023 · 1 comment · Fixed by #36167

Comments

@sgilmore10
Member

sgilmore10 commented Jun 19, 2023

Describe the enhancement requested

MATLAB strings are UTF-16 encoded on all platforms. We need to write a utility that converts UTF-8 strings to UTF-16. We will also need a utility that converts UTF-16 strings to UTF-8.

Component(s)

C++, MATLAB

@sgilmore10
Member Author

take

@sgilmore10 sgilmore10 changed the title [MATLAB] Create utility to convert UTF-8 strings to UTF-16 strings [MATLAB] Add utility to convert UTF-8 strings to UTF-16 and UTF-16 strings to UTF-8 Jun 19, 2023
@sgilmore10 sgilmore10 changed the title [MATLAB] Add utility to convert UTF-8 strings to UTF-16 and UTF-16 strings to UTF-8 [C++] [MATLAB] Add utility to convert UTF-8 strings to UTF-16 and UTF-16 strings to UTF-8 Jun 19, 2023
@kou kou changed the title [C++] [MATLAB] Add utility to convert UTF-8 strings to UTF-16 and UTF-16 strings to UTF-8 [C++][MATLAB] Add utility to convert UTF-8 strings to UTF-16 and UTF-16 strings to UTF-8 Jun 19, 2023
kou pushed a commit that referenced this issue Jun 19, 2023
…6 and UTF-16 strings to UTF-8 (#36167)

### Rationale for this change

MATLAB uses UTF-16 encoded strings, but Arrow uses UTF-8. We need a way to convert between the two encodings.

### What changes are included in this PR?

Added two new utility functions:

1. `std::string UTF16StringToUTF8(const std::basic_string<char16_t>& source)`
2. `std::basic_string<char16_t> UTF8StringToUTF16(const std::string& source)` 
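
For readers who want a feel for what such a pair of functions has to do, here is a minimal, hand-rolled sketch. It reuses the two signatures listed above, but the bodies are an illustration only and are not the implementation merged in this PR. The error handling (throwing `std::invalid_argument` on malformed input) is an assumption made for the example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Illustrative sketch only: decodes UTF-8 into code points and re-encodes
// them as UTF-16 code units. Throws on malformed input (an assumption).
std::basic_string<char16_t> UTF8StringToUTF16(const std::string& source) {
  // Smallest code point that legitimately needs a sequence of each length.
  static const char32_t kMinForLength[] = {0, 0, 0x80, 0x800, 0x10000};
  std::basic_string<char16_t> out;
  std::size_t i = 0;
  while (i < source.size()) {
    const unsigned char lead = static_cast<unsigned char>(source[i]);
    char32_t cp = 0;
    std::size_t len = 0;
    if (lead < 0x80) { cp = lead; len = 1; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
    else throw std::invalid_argument("invalid UTF-8 lead byte");
    if (i + len > source.size()) {
      throw std::invalid_argument("truncated UTF-8 sequence");
    }
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char cont = static_cast<unsigned char>(source[i + k]);
      if ((cont & 0xC0) != 0x80) {
        throw std::invalid_argument("invalid UTF-8 continuation byte");
      }
      cp = (cp << 6) | (cont & 0x3F);
    }
    // Reject overlong encodings, surrogates, and out-of-range values.
    if (cp < kMinForLength[len] || cp > 0x10FFFF ||
        (cp >= 0xD800 && cp <= 0xDFFF)) {
      throw std::invalid_argument("invalid UTF-8 code point");
    }
    if (cp < 0x10000) {
      out.push_back(static_cast<char16_t>(cp));
    } else {
      // Code points above the BMP become a surrogate pair.
      cp -= 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    i += len;
  }
  return out;
}

// Illustrative sketch only: combines surrogate pairs into code points and
// encodes each one as 1 to 4 UTF-8 bytes. Throws on unpaired surrogates.
std::string UTF16StringToUTF8(const std::basic_string<char16_t>& source) {
  std::string out;
  for (std::size_t i = 0; i < source.size(); ++i) {
    char32_t cp = source[i];
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      if (i + 1 >= source.size()) {
        throw std::invalid_argument("unpaired high surrogate");
      }
      const char32_t low = source[i + 1];
      if (low < 0xDC00 || low > 0xDFFF) {
        throw std::invalid_argument("unpaired high surrogate");
      }
      cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
      ++i;  // The low surrogate has been consumed as well.
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      throw std::invalid_argument("unpaired low surrogate");
    }
    if (cp < 0x80) {
      out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
      out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
      out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
      out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
  }
  return out;
}
```

With these definitions, a round trip such as `UTF16StringToUTF8(UTF8StringToUTF16(s))` should return `s` unchanged for any well-formed UTF-8 input, including characters outside the Basic Multilingual Plane that need a surrogate pair in UTF-16.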

### Are these changes tested?

Added two test cases to `utf8_util_test.cc`:

1. `UTF16StringToUTF8`
2. `UTF8StringToUTF16`

### Are there any user-facing changes?
No, these APIs are intended for developers.

### Future Directions

In a follow-up PR, we will update the MATLAB Interface source code to use these utilities when converting between UTF-16 and UTF-8 encoded strings.
* Closes: #36166

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@kou kou added this to the 13.0.0 milestone Jun 19, 2023