Skip to content

Commit

Permalink
reword faq to remove reference to indexing strings
Browse files Browse the repository at this point in the history
Fixes #19344
  • Loading branch information
steveklabnik committed Nov 28, 2014
1 parent f33d879 commit 4d1cb78
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/doc/complement-lang-faq.md
Expand Up @@ -108,7 +108,7 @@ The `str` type is UTF-8 because we observe more text in the wild in this encodin

This does mean that indexed access to a Unicode codepoint inside a `str` value is an O(n) operation. On the one hand, this is clearly undesirable; on the other hand, this problem is full of trade-offs and we'd like to point a few important qualifications:

* Scanning a `str` for ASCII-range codepoints can still be done safely octet-at-a-time, with each indexing operation pulling out a `u8` costing only O(1) and producing a value that can be cast and compared to an ASCII-range `char`. So if you're (say) line-breaking on `'\n'`, octet-based treatment still works. UTF8 was well-designed this way.
* Scanning a `str` for ASCII-range codepoints can still be done safely octet-at-a-time. If you use `.as_bytes()`, pulling out a `u8` costs only O(1) and produces a value that can be cast and compared to an ASCII-range `char`. So if you're (say) line-breaking on `'\n'`, octet-based treatment still works. UTF8 was well-designed this way.
* Most "character oriented" operations on text only work under very restricted language assumptions sets such as "ASCII-range codepoints only". Outside ASCII-range, you tend to have to use a complex (non-constant-time) algorithm for determining linguistic-unit (glyph, word, paragraph) boundaries anyways. We recommend using an "honest" linguistically-aware, Unicode-approved algorithm.
* The `char` type is UCS4. If you honestly need to do a codepoint-at-a-time algorithm, it's trivial to write a `type wstr = [char]`, and unpack a `str` into it in a single pass, then work with the `wstr`. In other words: the fact that the language is not "decoding to UCS4 by default" shouldn't stop you from decoding (or re-encoding any other way) if you need to work with that encoding.

Expand Down

5 comments on commit 4d1cb78

@bors
Copy link
Contributor

@bors bors commented on 4d1cb78 Nov 28, 2014

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

saw approval from alexcrichton
at steveklabnik@4d1cb78

@bors
Copy link
Contributor

@bors bors commented on 4d1cb78 Nov 28, 2014

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merging steveklabnik/rust/gh19344 = 4d1cb78 into auto

@bors
Copy link
Contributor

@bors bors commented on 4d1cb78 Nov 28, 2014

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

steveklabnik/rust/gh19344 = 4d1cb78 merged ok, testing candidate = 29e928f

@bors
Copy link
Contributor

@bors bors commented on 4d1cb78 Nov 28, 2014

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bors
Copy link
Contributor

@bors bors commented on 4d1cb78 Nov 28, 2014

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fast-forwarding master to auto = 29e928f

Please sign in to comment.