Allow Int#chr to accept an optional encoding, much like String.new(bytes : Bytes, encoding : String) #11258

Open
postmodern opened this issue Sep 29, 2021 · 3 comments


@postmodern
Contributor

postmodern commented Sep 29, 2021

Discussion

Allow Int#chr to accept an optional encoding, so that a byte value can be converted to the character that the given encoding defines for it. String.new(bytes : Bytes, encoding : String) already allows specifying a custom encoding, so I feel like Int#chr should as well.
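
For context, here is a minimal sketch of how the two existing methods behave today. The Windows-1252 byte and the encoding name `WINDOWS-1252` are illustrative assumptions, not taken from the issue; which encoding names are accepted depends on the iconv implementation Crystal is linked against.

```crystal
# Int#chr treats the integer as a Unicode code point.
a = 65.chr   # 'A'
c = 0x80.chr # U+0080, a control character

# String.new(bytes, encoding) decodes bytes from the named encoding.
# In Windows-1252, byte 0x80 is the euro sign (assumed example).
euro = String.new(Bytes[0x80], "WINDOWS-1252")

puts a                    # prints A
puts c.ord                # prints 128
puts euro                 # prints €
puts euro[0].ord.to_s(16) # prints 20ac
```

The same integer, 0x80, yields two different characters depending on whether it is read as a code point or as an encoded byte, which is the gap the proposal is about.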

@straight-shoota
Member

straight-shoota commented Sep 29, 2021

The idea does not sound very convincing. It seems like a very rare use case. String.new should be reasonably easy to use for this. Or, if you need to, you can implement a custom method that decodes a single character by going through an IO. But I don't see this as a common problem.

Can you explain why you think this would be generally useful as a standard lib feature?

@HertzDevil
Contributor

HertzDevil commented Sep 29, 2021

The two methods are not related; String.new deals with UTF-8 (or arbitrarily encoded) byte sequences, whereas Int#chr deals with Unicode code points. Not all encodings define their characters in terms of code points, so such a generalization isn't possible here. For example, "あ" in Shift JIS is the byte sequence 0x82 0xA0; it is not the result of mapping 0xA082 or 0x82A0 to a code point, and it most certainly does not imply that 0x82.chr(encoding: "Shift-JIS") + 0xA0.chr(encoding: "Shift-JIS") is a well-formed way to produce a Shift JIS-encoded character.
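
A small sketch of the Shift JIS case described above, assuming the linked iconv accepts the encoding name `SHIFT_JIS` (the name is an assumption and may differ between iconv implementations):

```crystal
# The two bytes form ONE Shift JIS character and are decoded together.
bytes = Bytes[0x82, 0xA0]
str = String.new(bytes, "SHIFT_JIS") # encoding name is an assumption

puts str                 # prints あ
puts str.size            # prints 1
puts str[0].ord.to_s(16) # prints 3042

# Reading the same two bytes as a single integer and calling Int#chr
# gives unrelated code points, in either byte order.
puts 0x82A0.chr == str[0] # prints false
puts 0xA082.chr == str[0] # prints false
```

The decoded character is U+3042, which has no arithmetic relationship to 0x82, 0xA0, or their concatenation.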

@asterite
Member

Right. It seems you can pass an encoding to Ruby's chr, but in my mind that doesn't make much sense. Maybe it only works for one byte, but more bytes aren't supported? But I don't see that as very useful.

Maybe a method Char.new(byte, encoding) and Char.new(bytes, encoding) would make sense, where a single char is decoded from a byte or a byte sequence?
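
As a rough illustration of that idea, here is a sketch built only on the existing String.new(bytes, encoding). The helper name `decode_char` is hypothetical and not part of the standard library, and the encoding names are assumptions that depend on the linked iconv.

```crystal
# Hypothetical helper: decode exactly one character from a byte sequence.
def decode_char(bytes : Bytes, encoding : String) : Char
  str = String.new(bytes, encoding)
  unless str.size == 1
    raise ArgumentError.new("expected exactly one character, got #{str.size}")
  end
  str[0]
end

# Hypothetical single-byte overload that wraps the byte in a slice.
def decode_char(byte : UInt8, encoding : String) : Char
  decode_char(Bytes[byte], encoding)
end

puts decode_char(Bytes[0x82, 0xA0], "SHIFT_JIS") # prints あ
puts decode_char(0x80_u8, "WINDOWS-1252")        # prints €
```

Taking a byte sequence rather than an integer sidesteps the question of how a multi-byte character would be packed into a single Int.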


4 participants