/
structures.pod6
475 lines (373 loc) · 20.2 KB
/
structures.pod6
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
=begin pod :tag<perl6>
=TITLE Data structures
=SUBTITLE How Perl 6 deals with data structures and what we can expect from them.
=head1 Scalar structures
Some classes do not have any I<internal> structure, and to access parts
of them specific methods have to be used. Numbers, strings, and some
other monolithic classes are included in that class. They use the C<$>
sigil, although complex data structures can also use it.
my $just-a-number = 7;
my $just-a-string = "8";
There is a L<Scalar> class, which is used internally to assign a default
value to variables declared with the C<$> sigil.
my $just-a-number = 333;
say $just-a-number.VAR.^name; # OUTPUT: «Scalar»
Any complex data structure can be I<scalarized> by using the L<item
contextualizer C<$>|/type/Any#index-entry-%24_(item_contextualizer)>:
(1,2,3, $(4,5))[3].VAR.^name.say; # OUTPUT: «Scalar»
However, this means that it will be treated as such in the context they are. You
can still access its internal structure.
(1,2,3, $(4,5))[3][0].say; # OUTPUT: «4»
An interesting side effect, or maybe intended feature, is that scalarization
conserves identity of complex structures.
for ^2 {
my @list =(1,1);
say @list.WHICH;
} # OUTPUT: «Array|93947995146096Array|93947995700032»
Every time C<(1,1)> is assigned, the variable created is going to be different
in the sense that C<===> will say it is; as it is shown, different values of the
internal pointer representation are printed. However
for ^2 {
my $list =(1,1);
say $list.WHICH
} # OUTPUT: «List|94674814008432List|94674814008432»
In this case, C<$list> is using the Scalar sigil and thus will be an C<Scalar>. Any scalar will the same value will be exactly the same, as shown when printing the pointers.
=head1 Complex data structures
Complex data structures fall in two different broad categories
L<Positional|/type/Positional>, or list-like and
L<Associative|/type/Associative>, or key-value pair like, according to how you
access its first-level elements. In general, complex data structures, including
objects, will be a combination of both, with object properties assimilated to
key-value pairs. While all objects subclass L<Mu>, in general complex objects are instances of subclasses of L<Any>. While it is theoretically possible to mix in C<Positional> or C<Associative> without doing so, most methods who apply to complex data structures are implemented in C<Any>.
Navigating these complex data structures is a challenge, but Perl 6 provides a couple of functions that can be used on them: L<C<deepmap>|/routine/deepmap> and L<C<duckmap>|/routine/duckmap>. While the former will go to every single element, in order, and do whatever the block passed requires,
say [[1,2,[3,4]],[[5,6,[7,8]]]].deepmap( *.elems );
# OUTPUT: «[[1 1 [1 1]] [1 1 [1 1]]]»
which returns 1 because it goes to the deeper level and applies C<elems> to
them, C<deepmap> can perform more complicated operations:
say [[1,2,[3,4]],[[5,6,[7,8]]]].duckmap:
-> $array where .elems == 2 { $array.elems }
# OUTPUT: «[[1 2 2] [5 6 2]]»
In this case, it dives into the structure, but returns the element itself if it
does not meet the condition in the block (C<1,2>), returning the number of
elements of the array if it does (the two C<2>s at the end of each subarray).
Since C<d(eep|uck)map> are C<Any> methods, they also apply to Associative arrays:
say %( first => [1,2], second => [3,4]).deepmap( *.elems )
# OUTPUT: «{first => [1 1], second => [1 1]}»
Only in this case, they will be applied to every list or array that is a value, leaving the keys alone.
C<Positional> and C<Associative> can be turned into each other.
say %( first => [1,2], second => [3,4]).list[0]# OUTPUT: «second => [3 4]»
However, in this case, and for Rakudo >= 2018.05, it will return a different
value every time you run. A hash will be turned into a list of the key-value
pairs, but it is guaranteed to be disordered. You can also do the operation in
the opposite direction, as long as the list has an even number of elements (odd
number will result in an error):
say <a b c d>.Hash # OUTPUT: «{a => b, c => d}»
But
say <a b c d>.Hash.kv # OUTPUT: «(c d a b)»
will obtain a different value every time you run it;
L<C<kv>|/type/Pair#method_kv> turns every C<Pair> into a list.
Complex data structures are also L<Iterable>. Generating an L<iterator> out of
them will allow the program to visit the first level of the structure, one by
one:
.say for 'א'..'ס' # OUTPUT: «אבגדהוזחטיךכלםמןנס»
C<'א'..'ס'> is a L<Range>, a complex data structure, and with C<for> in front it
will iterate until the list is exhausted. You can use C<for> on your complex
data structures by overriding the L<iterator> method (from role C<Iterable>):
class SortedArray is Array {
method iterator() {
self.sort.iterator
}
};
my @thing := SortedArray.new([3,2,1,4]);
.say for @thing; # OUTPUT: «1234»
C<for> calls directly the C<iterator> method on C<@thing> making it return the
elements of the array in order. Much more on L<iterating on the page devoted to it|/language/iterating>.
=head1 Functional structures
Perl 6 is a functional language and, as such, functions are first-class I<data>
structures. Functions follow the L<Callable> role, which is the 4th element in
the quartet of fundamental roles. L<Callable> goes with the C<&> sigil, although
in most cases it is elided for the sake of simplicity; this sigil elimination is
always allowed in the case of C<Callables>.
my &a-func= { (^($^þ)).Seq };
say a-func(3), a-func(7)# OUTPUT: «(0 1 2)(0 1 2 3 4 5 6)»
L<Block>s are the simplest callable structures, since C<Callable>s cannot be
instantiated. In this case we implement a block that logs events and can
retrieve them
my $logger = -> $event, $key = Nil {
state %store;
if ( $event ) {
%store{ DateTime.new( now ) } = $event;
} else {
%store.keys.grep( /$key/ )
}
}
$logger( "Stuff" );
$logger( "More stuff");
say $logger( Nil, "2018-05-28"); # OUTPUT: «(Stuff More stuff)»
A C<Block> has a L<Signature>, in this case two arguments, the first of
which is the event that is going ot be logged, and the second is the key
to retrieve the events. They will be used in an independent way, but its
intention is to showcase the use of a L<state variable|/syntax/state>
that is kept from every invocation to the next. This state variable is
encapsulated within the block, and cannot be accessed from outside
except by using the simple API the block provides: calling the block
with a second argument. The two first invocations log two events, the
third invocation at the bottom of the example use this second type of
call to retrieve the stored values. C<Block>s can be cloned
=begin code :preamble<my $logger = sub ( $a, $b ){ return $a, $b }>
my $clogger = $logger.clone;
$clogger( "Clone stuff" );
$clogger( "More clone stuff");
say $clogger( Nil, "2018-05-28");
# OUTPUT: «(Clone stuff More clone stuff)»
=end code
And cloning will reset the state variable; instead of cloning, we can create
I<façades> that change the API. For instance, eliminate the need to use C<Nil>
as first argument to retrieve the log for a certain date:
=begin code :preamble<my $logger = sub ( $a, $b ){ return $a, $b }>
my $gets-logs = $logger.assuming( Nil, * );
$logger( %(changing => "Logs") );
say $gets-logs( "2018-05-28" );
# OUTPUT: «({changing => Logs} Stuff More stuff)»
=end code
L<C<assuming>|/type/Block#(Code)_method_assuming> wraps around a block
call, giving a value (in this case, C<Nil>) to the arguments we need,
and passing on the arguments to the other arguments we represent using
C<*>. In fact, this corresponds to the natural language statement "We
are calling C<$logger> I<assuming> the first argument is C<Nil>". We can
slightly change the appearance of these two Blocks to clarify they are
actually acting on the same block:
=begin code :preamble<my $logger = sub ( $a, $b ){ return $a, $b }>
my $Logger = $logger.clone;
my $Logger::logs = $Logger.assuming( *, Nil );
my $Logger::get = $Logger.assuming( Nil, * );
$Logger::logs( <an array> );
$Logger::logs( %(key => 42 ) );
say $Logger::get( "2018-05-28");
=end code
Although C<::> is generally used for invocation of class methods, it is
actually a valid part of the name of a variable. In this case we use
them conventionally to simply indicate C<$Logger::logs> and
C<$Logger::get> are actually calling C<$Logger>, which we have
capitalized to use a class-like appearance. The point of this tutorial
is that using functions as first-class citizens, together with the use
of state variables, allows the use of certain interesting design
patterns such as this one.
As such first class data structures, callables can be used anywhere another type of data can.
my @regex-check = ( /<alnum>/, /<alpha>/, /<punct>/ );
say @regex-check.map: "33af" ~~ *;
# OUTPUT: «(「3」 alnum => 「3」 「a」 alpha => 「a」 Nil)»
Regexes are actually a type of callable:
say /regex/.does( Callable );#OUTPUT: «True»
And in the example above we are calling regexes stored in an array, and
applying them to a string literal.
Callables are composed by using the L<function composition operator ∘|/language/operators#infix_%E2%88%98>:
=begin code :preamble<my $Logger::logs = sub ( $a ) { $a }>
my $typer = -> $thing { $thing.^name ~ ' → ' ~ $thing };
my $Logger::withtype = $Logger::logs ∘ $typer;
$Logger::withtype( Pair.new( 'left', 'right' ));
$Logger::withtype( ¾ );
say $Logger::get( "2018-05-28" );
# OUTPUT: «(Pair → left right Rat → 0.75)»
=end code
We are composing C<$typer> with the C<$Logger::logs> function defined
above, obtaining a function that logs an object prececed by its type,
which can be useful for filtering, for instance. C<$Logger::withtype>
is, in fact, a complex data structure composed of two functions which
are applied in a serial way, but every one of the callables composed can
keep state, thus creating complex transformative callables, in a design
pattern that is similar to object composition in the object oriented
realm. You will have to choose, in every particular case, what is the programming style which is most suitable for your problem.
=head1 Defining and constraining data structures
Perl 6 has different ways of defining data structures, but also many
ways to constrain them so that you can create the most adequate data
structure for every problem domain. L<C<but>|/routine/but>, for example,
mixes roles or values into a value or a variable:
=begin code
my %not-scalar := %(2 => 3) but Associative[Int,Int];
say %not-scalar.^name; # OUTPUT: «Hash+{Associative[Int,Int]}»
say %not-scalar.of; # Associative[Int,Int]»
%not-scalar{3}=4;
%not-scalar<thing>=3;
say %not-scalar; # OUTPUT: «{2 => 3, 3 => 4, thing => 3}»
=end code
In this case, C<but> is mixing in the C<Associative[Int,Int]> role;
please note that we are using binding so that the type of the variable
is the one defined, and not the one imposed by the C<%> sigil; this
mixed-in role shows in the C<name> surrounded by brackets. What does
that really mean? That role includes two methods, C<of> and C<keyof>; by
mixing the role in, the new C<of> will be called (the old C<of> would
return C<Mu>, which is the default value type for Hashes). However, that
is all it does. It is not really changing the type of the variable, as
you can see since we are using any kind of key and values in the next
few statements.
However, we can provide new functionality to a variable using this type
of mixin:
=begin code
role Lastable {
method last() {
self.sort.reverse[0]
}
}
my %hash-plus := %( 3 => 33, 4 => 44) but Lastable;
say %hash-plus.sort[0]; # OUTPUT: «3 => 33»
say %hash-plus.last; # OUTPUT: «4 => 44»
=end code
In C<Lastable> we use the universal C<self> variable to refer to
whatever object this particular role is mixed in; in this case it will
contain the hash it is mixed in with; it will contain something else
(and possibly work some other way) in other case. This role will provide
the C<last> method to any variable it's mixed with, providing new,
L<Subsets|/language/typesystem#subset> can also be used to constrain the
possible values a variable might hold; they are Perl 6 attempt at
L<gradual typing|https://en.wikipedia.org/wiki/Gradual_typing>; it is
not a full attempt, because subsets are not really types in a strict
sense, but they allow runtime type checking. It adds type-checking
functionality to regular types, so it helps create a richer type system,
allowing things like the one shown in this code:
=begin code
subset OneOver where (1/$_).Int == 1/$_;
my OneOver $one-fraction = ⅓;
say $one-fraction; # OUTPUT: «0.333333»
=end code
On the other hand, C<my OneOver $ = ⅔;> will cause a type-check error.
Subsets can use C<Whatever>, that is, C<*>, to refer to the argument;
but this will be instantiated every time you use it to a different
argument, so if we used it twice in the definition we would get an
error. In this case we are using the topic single variable, C<$_>, to
check the instantiation. Subsetting can be done directly, without the
need of declaring it, in L<signatures|/language/typesystem#subset>.
=head1 Infinite structures and laziness
It might be assumed that all the data contained in a data structure is actually
I<there>. That is not necessarily the case: in many cases, for efficiency
reasons or simply because it is not possible, the elements contained in a data
structure only jump into existence when they are actually needed. This
computation of items as they are needed is called
L<reification|/language/glossary#Reify>.
# A list containing infinite number of un-reified Fibonacci numbers:
my @fibonacci = 1, 1, * + * … ∞;
# We reify 10 of them, looking up the first 10 of them with array index:
say @fibonacci[^10]; # OUTPUT: «(1 1 2 3 5 8 13 21 34 55)»
# We reify 5 more: 10 we already reified on previous line, and we need to
# reify 5 more to get the 15th element at index 14. Even though we need only
# the 15th element, the original Seq still has to reify all previous elements:
say @fibonacci[14]; # OUTPUT: «987»
Above we were reifying a L<Seq> we created with the L<sequence
operator|/language/operators#index-entry-%E2%80%A6_operators>, but other data
structures use the concept as well. For example, an un-reified L<Range> is just
the two end points. In some languages, calculating the sum of a huge range is a
lengthy and memory-consuming process, but Perl 6 calculates it instantly:
say sum 1 .. 9_999_999_999_999; # OUTPUT: «49999999999995000000000000»
Why? Because the sum can be calculated I<without> reifying the Range; that is,
without figuring out all the elements it contains. This is why this feature
exists. You can even make your own things reify-on-demand, using
L«C<gather> and C<take>|/syntax/gather%20take»:
my $seq = gather {
say "About to make 1st element"; take 1;
say "About to make 2nd element"; take 2;
}
say "Let's reify an element!";
say $seq[0];
say "Let's reify more!";
say $seq[1];
say "Both are reified now!";
say $seq[^2];
# OUTPUT:
# Let's reify an element!
# About to make 1st element
# 1
# Let's reify more!
# About to make 2nd element
# 2
# Both are reified now!
# (1 2)
Following the output above, you can see the print statements I<inside> the
C<gather> got executed only when we reified the individual elements while looking
up an element. Also note that the elements got reified just once. When we printed
the same elements again on the last line of the example, the messages inside
C<gather> was no longer printed. This is because the construct used
already-reified elements from the L<Seq>'s cache.
Note that above we assigned the C<gather> to a L<Scalar> container (the C<$>
sigil), not the L<Positional> one (the C<@> sigil). The reason is that the
C<@>-sigiled variables are I<mostly eager>. What this means is they
I<reify the stuff assigned to them> right away I<most of the time>. The only
time they don't do it is when the items are known to be
L«C<is-lazy>|/routine/is-lazy», like our sequence generated with infinity as the
end point. Were we to assign the C<gather> to a C<@>-variable, the C<say>
statements inside of it would've been printed right away.
Another way to fully-reify a list, is by calling L«C<.elems>|/routine/elems» on
it. This is the reason why checking whether a list contains any items is best
done by using C<.Bool> method (or just using C<if @array { … }>), since you
don't need to reify I<all> the elements to find out if there are C<any> of them.
There are times where you I<do> want to fully-reify a list before doing
something. For example, the L«C<IO::Handle.lines>|/type/IO::Handle#method_lines»
returns a L<Seq>. The following code contains a bug; keeping reification in
mind, try to spot it:
=for code
my $fh = "/tmp/bar".IO.open;
my $lines = $fh.lines;
close $fh;
say $lines[0];
We open a L<filehandle|/type/IO::Handle>, then assign return of
L«C<.lines>|/type/IO::Handle#method_lines» to a L<Scalar> variable, so the
returned L<Seq> does not get reified right away. We then
L«C<close>|/routine/close» the filehandle, and try to print an element from
C<$lines>.
The bug in the code is by the time we reify the C<$lines> L<Seq> on the last
line, we've I<already closed> the filehandle. When the C<Seq's> iterator tries
to generate the item we've requested, it results in the error about attempting
to read from a closed handle. So, to fix the bug we can either assign to
a C<@>-sigiled variable or call L«C<.elems>|/routine/elems» on C<$lines> before
closing the handle:
=for code
my $fh = "/tmp/bar".IO.open;
my @lines = $fh.lines;
close $fh;
say @lines[0]; # no problem!
We can also use any function whose side effect is reification, like C<.elems>
mentioned above:
=for code
my $fh = "/tmp/bar".IO.open;
my $lines = $fh.lines;
say "Read $lines.elems() lines"; #reifying before closing handle
close $fh;
say $lines[0]; # no problem!
Using L<eager|/routine/eager> will also reify the whole sequence:
=for code
my $fh = "/tmp/bar".IO.open;
my $lines = eager $fh.lines; # Uses eager for reification.
close $fh;
say $lines[0];
=head1 Introspection
Languages that allow
L<introspection|https://en.wikipedia.org/wiki/Introspection> like Perl 6
have functionalities attached to the type system that let the developer
access container and value metadata. This metadata can be used in a
program to carry out different actions depending on their value. As is
obvious from the name, metadata are extracted from a value or container
via the metaclass.
my $any-object = "random object";
my $metadata = $any-object.HOW;
say $metadata.^mro; # OUTPUT: «((ClassHOW) (Any) (Mu))
say $metadata.can( $metadata, "uc" ); # OUTPUT: «(uc uc)»
With the first C<say> we show the class hierarchy of the metamodel class, which
in this case is L<Metamodel::ClassHOW>. It inherits directly from C<Any>,
meaning any method there can be used; it also mixes in several roles which can
give you information about the class structure and functions. But one of the
methods of that particular class is L<C<can>|/Metamodel::ClassHOW#method_can>,
which we can use to look up whether the object can use the C<uc> (uppercase)
method, which it obviously can. However, it might not be so obvious in some
other cases, when roles are mixed in directly into a variable. For instance, in
the L<case of C<%hash-plus> defined above|#Defining_and_constraining_data_structures>:
=for code :preamble<my %hash-plus>
say %hash-plus.^can("last"); # OUTPUT «(last)»
In this case we are using the I<syntactic sugar> for C<HOW.method>, C<^method>,
to check if your data structure responds to that method; the output, which shows
the name of the methods that match, certifies that we can use it.
See also L<this article on class
introspection|https://perl6advent.wordpress.com/2015/12/19/day-19-introspection/>
on how to access class properties and methods, and use it to generate test data
for a class; this L<Advent Calendar article describes the meta-object protocol>
extensively.
=end pod
# vim: expandtab softtabstop=4 shiftwidth=4 ft=perl6