Remove ancient string types
This bids farewell to the C<AnyChar>, C<Char>, C<CharLingua>,
C<Grapheme>, C<Codepoint>, and C<Byte> types. Note that S32::Str needs
some more involved editing to be more in line with how strings are
handled nowadays.
ShimmerFairy committed Jul 24, 2015
1 parent 9fc6bb7 commit af82a6f
Showing 3 changed files with 13 additions and 75 deletions.
4 changes: 1 addition & 3 deletions S03-operators.pod
@@ -3804,9 +3804,8 @@ the table assumes the following types will behave similarly:
named values created with
Class, Enum, or Role,
or generic type binding Type
-Char Cat Str
+Cat Str
Int UInt etc. Num
-Byte Str or Int
Buf Str or Array of Int

(Note, however, that these mappings can be overridden by explicit
@@ -3898,7 +3897,6 @@ Various proposed-but-deprecated smartmatch behaviors may be easily
Hash Num hash element truth .{X}
Hash Num hash key existence .{X}:exists
Buf Int buffer contains int .match(X)
-Str Char string contains char .match(X)
Str Str string contains string .match(X)
Array Scalar array contains item .any === X
Str Array array contains string X.any
45 changes: 1 addition & 44 deletions S29-functions.pod
@@ -56,49 +56,6 @@ The following type declarations are assumed:

=over

-=item AnyChar
-
-The root class of all "character" types, regardless of level.
-
-This is a subtype of C<Str>, limited to a length of 1 at its highest
-supported Unicode level.
-
-The type name C<Char> is aliased to the maximum supported Unicode level
-in the current lexical scope (where "current" is taken to mean the
-eventual lexical scope for generic code (roles and macros), not the
-scope in which the generic code is defined). In other words, use C<Char>
-when you don't care which level you're writing for.
-
-Subclasses (things that are C<isa AnyChar>):
-
-=over
-
-=item CharLingua (language-defined characters)
-
-=item Grapheme (language-independent graphemes)
-
-=item Codepoint
-
-=item Byte
-
-Yes, Byte is both a string and a number.
-
-=back
-
-The short name for C<Grapheme> is typically C<Char> since that's the
-default Unicode level. A grapheme is defined as a base codepoint plus
-any subsequent "combining" codepoints that apply to that base codepoint.
-Graphemes are always assigned a unique integer id which, in the case of
-a grapheme that has a precomposed codepoint, happens to be the same as
-that codepoint.
-
-There is no short name for C<CharLingua> because the type is meaningless
-outside the scope of a particular language declaration. In fact,
-C<CharLingua> is itself an abstract type that cannot be instantiated.
-Instead you have names like C<CharFrench>, C<CharJapanese>,
-C<CharTurkish>, etc. for instantiated C<CharLingua> types.
-(Plus the corresponding C<StrLingua> types, presumably.)
-
=item Matcher

subset Matcher of Mu where * !=== any(Bool,Match,Nil)
@@ -341,7 +298,7 @@ detailed information on object creation.

multi sub chrs( Int *@grid --> Str )
multi method ords( Str $string: --> List of Int ) is export
-multi method chr( Int $grid: --> Char ) is export
+multi method chr( Int $grid: --> Str ) is export
multi method ord( Str $string: --> Int ) is export

C<chrs> takes zero or more integer grapheme ids and returns the
39 changes: 11 additions & 28 deletions S32-setting-library/Str.pod
@@ -18,29 +18,12 @@ The document is a draft.

General notes about strings:

-A Str can exist at several Unicode levels at once. Which level you
-interact with typically depends on what your current lexical context has
-declared the "working Unicode level to be". Default is C<Grapheme>.
-[Default can't be C<CharLingua> because we don't go into "language"
-mode unless there's a specific language declaration saying either
-exactly what language we're going into or, in the absence of that, how to
-find the exact language somewhere in the environment.]
-
-Attempting to use a string at a level higher it can support is handled
-without warning. The current highest supported level of the string
-is simply mapped Char for Char to the new higher level. However,
-attempting to stuff something of a higher level a lower-level string
-is an error (for example, attempting to store Kanji in a Byte string).
-An explicit conversion function must be used to tell it how you want it
-encoded.
-
-Attempting to use a string at a level lower than what it supports is not
-allowed.
-
-If a function takes a C<Str> and returns a C<Str>, the returned C<Str>
-will support the same levels as the input, unless specified otherwise.
-
-The following are all provided by the C<Str> role:
+The C<Str> class contains strings encoded at the NFG level. Other standard
+Unicode normalizations can be found in their appropriately-named types: C<NFC>,
+C<NFD>, C<NFKC>, and C<NFKD>. The C<Uni> type contains a string in a mixture of
+normalizations (i.e. not normalized). S15 describes these in more detail.
+
+The following are all provided by the C<Str> class, as well as related classes:

=over

@@ -296,11 +279,11 @@ is equivalent to

map {.Str}, $string.match(rx:global:x(0..$n):c/pat/)

-You may also comb lists and filehandles. C<+$*IN.comb> counts the characters
-on standard input, for instance. C<comb(/./, $thing)> returns a list of single
-C<Char> strings from anything that can give you a C<Str>. Lists and
-filehandles are automatically fed through C<cat> in order to pretend to
-be string. This C<Cat> is also lazy.
+You may also comb lists and filehandles. C<+$*IN.comb> counts the characters on
+standard input, for instance. C<comb(/./, $thing)> returns a list of single
+character strings from anything that can give you a C<Str>. Lists and
+filehandles are automatically fed through C<cat> in order to pretend to be
+string. This C<Cat> is also lazy.

If the C<:match> adverb is applied, a list of C<Match> objects (one
per match) is returned instead of strings. This can be used to
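
Reader's note, not part of the commit: the S32 hunk above says that C<Str> holds text at the NFG level and that other normalizations live in types such as C<NFC> and C<NFD>, while the S29 hunk makes C<chr> return a C<Str>. The following is a minimal sketch of how that might look in practice. It assumes a Rakudo release that implements these types as S15 describes them; the variable name `$s` is purely illustrative.

```raku
# Sketch only: assumes a Rakudo release implementing the S15 string types.
my $s = "e\x[301]";     # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

say $s.chars;           # 1: one grapheme in the NFG Str
say $s.NFC.codes;       # 1: composed to U+00E9
say $s.NFD.codes;       # 2: base letter plus combining mark

# With the Char type gone, chr yields an ordinary Str.
say 0xE9.chr.WHAT;      # (Str)
say 0xE9.chr eq $s;     # True: both hold the same single grapheme
```

The point of the sketch is that a "character" is simply a C<Str> of one grapheme, so no dedicated character type is needed to express the result of C<chr> or C<comb>.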