@@ -11,7 +11,7 @@ The de facto standard package in the Haskell ecosystem for integer-indexed
 array data is the [vector package](http://www.stackage.org/package/vector).
 This corresponds at a high level to arrays in C, or the vector class in C++'s
 STL. However, the vector package offers quite a bit of functionality not
-familiar to those used to options in imperative and mutable languages.
+familiar to those used to the options in imperative and mutable languages.
 
 While the interface for vector is relatively straightforward, the abundance of
 different modules can be daunting. This article will start off with an overview
@@ -20,9 +20,9 @@ examples of using the package.
 
 ## Example
 
-Since we're about to jump into a few section of descriptive text, let's kick
+Since we're about to jump into a few sections of descriptive text, let's kick
 this off with a concrete example to whet your appetite. We're going to count
-the frequency of different bytes that appear on standard output, and then
+the frequency of different bytes that appear on standard input, and then
 display this content.
 
 Note that this example is purposely written in a very generic form. We'll build
@@ -133,8 +133,8 @@ create a pointer to the value in the current cell. This takes up a lot of
 memory for holding pointers, and makes it inefficient to index or traverse the
 list (indexing to position N requires N pointer dereferences).
 
-By contract, vectors are stored in a packed format in memory, meaning indexing
-is an O(1) operation, and the memory overhead per additional item in the vector
+In contrast, vectors are stored in a contiguous set of memory locations, meaning random access
+is a constant time operation, and the memory overhead per additional item in the vector
 is much smaller (depending on the type of vector, which we'll cover in a
 moment). However, compared to lists, prepending an item to a vector is
 relatively expensive: it requires creating a new buffer in memory, copying the
@@ -143,8 +143,9 @@ old values, and then adding the new value.
 There are other data structures that can be considered for list-like data, such
 as `Seq` from containers, or in some cases a `Set`, `IntMap`, or `Map`.
 Figuring out the best choice for each use case can only be reliably determined
-via profiling. But as a general rule: densely populated lists with integral
-access to the values will be best served by vector.
+via profiling and benchmarking. As a general rule though, a densely populated
+collection requiring integral or random access to the values will be best served by
+a vector.
 
 Now let's talk about some of the other things that make vector so efficient.
 
@@ -204,9 +205,9 @@ memory?
 The vector package has a powerful technique: stream fusion. Using GHC rewrite
 rules, it's able to find many cases where creating a vector is unnecessary, and
 instead create a tight inner loop. In our case, GHC will end up generating code
-that can avoid touching system memory, and instead work on just the registers,
+that can avoid touching system memory, and instead work on just the [registers](https://en.wikipedia.org/wiki/Processor_register),
 yielding not only a tiny memory footprint, but performance close to a for-loop
-in C. This is one of the beauties of this library: you get to write high-level
+in C. This is one of the beauties of this library: you can write high-level
 code, and optimizations can churn out something much more CPU-friendly.
 
 ### Slicing
@@ -262,7 +263,7 @@ prefix your function calls with `V.`.
 
 * Exercise 3: Use an unboxed (or storable) vector instead of the boxed vectors
 we were using above. What code did you have to change from the original
-example? Do your examples from exercise 2 all work still?
+example? Do all of your examples from exercise 2 still work?
 
 There are also a number of functions in the `Data.Vector` module with no
 corresponding function in `Prelude`. Many of these are related to mutable
@@ -333,15 +334,16 @@ random index, read the old value at that index, increment it, and write it
 back.
 
 After we're finished, we _freeze_ the vector (more on that in the next section)
-and print it. The results are the same (or close - we are dealing with random
-numbers here) to the previous immutable one. But instead of 48MB and 1.968s,
+and print it. The resulting distribution of values is the same (or close - we are dealing with random
+numbers here) as the previous calculation using an immutable vector. But instead of 48MB and 1.968s,
 this program has a maximum residency of 44KB and runs in 0.247s! That's a
 significant improvement!
 
 If we feel like being even more adventurous, we can replace our `read` and
 `write` calls with `unsafeRead` and `unsafeWrite`. That will disable some
 bounds checks before reading and writing. This can be a nice performance boost
-in very tight loops, but has the potential to segfault your program, so caveat
+in very tight loops, but has the potential to [segfault](https://en.wikipedia.org/wiki/Segmentation_fault)
+your program, so caveat
 emptor! For example, try replacing `replicate 10` with `replicate 9`, change
 the `read` for an `unsafeRead`, and run your program. You'll see something
 like:
@@ -394,8 +396,8 @@ Why not just freeze it in place? Two reasons, actually:
 `write` call would modify our `ivector` value, meaning that the first and
 second call to `print ivector` would have different results!
 
-2. When you freeze a mutable vector, its memory is marked different for
-garbage collection purposes. Later trying to write to that same memory can
+2. When you freeze a mutable vector, its memory is marked differently for
+garbage collection purposes. Later attempts to write to that same memory can
 lead to a segfault.
 
 However, if you really want to avoid that extra buffer copy, and are certain