Commit d3bf0bc

Merge pull request #16 from gittywithexcitement/typos
Update vector.md: wording
2 parents e5e32fc + 0ab82f6 commit d3bf0bc

1 file changed: +17 -15 lines changed

content/vector.md

Lines changed: 17 additions & 15 deletions
@@ -11,7 +11,7 @@ The de facto standard package in the Haskell ecosystem for integer-indexed
 array data is the [vector package](http://www.stackage.org/package/vector).
 This corresponds at a high level to arrays in C, or the vector class in C++'s
 STL. However, the vector package offers quite a bit of functionality not
-familiar to those used to options in imperative and mutable languages.
+familiar to those used to the options in imperative and mutable languages.
 
 While the interface for vector is relatively straightforward, the abundance of
 different modules can be daunting. This article will start off with an overview
@@ -20,9 +20,9 @@ examples of using the package.
 
 ## Example
 
-Since we're about to jump into a few section of descriptive text, let's kick
+Since we're about to jump into a few sections of descriptive text, let's kick
 this off with a concrete example to whet your appetite. We're going to count
-the frequency of different bytes that appear on standard output, and then
+the frequency of different bytes that appear on standard input, and then
 display this content.
 
 Note that this example is purposely written in a very generic form. We'll build
@@ -133,8 +133,8 @@ create a pointer to the value in the current cell. This takes up a lot of
 memory for holding pointers, and makes it inefficient to index or traverse the
 list (indexing to position N requires N pointer dereferences).
 
-By contract, vectors are stored in a packed format in memory, meaning indexing
-is an O(1) operation, and the memory overhead per additional item in the vector
+In contrast, vectors are stored in a contiguous set of memory locations, meaning random access
+is a constant time operation, and the memory overhead per additional item in the vector
 is much smaller (depending on the type of vector, which we'll cover in a
 moment). However, compared to lists, prepending an item to a vector is
 relatively expensive: it requires creating a new buffer in memory, copying the
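
The trade-off described in this hunk (cheap indexing on vectors, cheap prepending on lists) can be illustrated with a minimal sketch. The helper names `nthList`, `nthVector`, and `prepend` are hypothetical and chosen only for this illustration; `V.!` and `V.cons` come from `Data.Vector`.

```haskell
import qualified Data.Vector as V

-- Hypothetical helper: list indexing walks the spine, so it costs
-- O(i) pointer dereferences to reach position i.
nthList :: Int -> [a] -> a
nthList i xs = xs !! i

-- Hypothetical helper: vector indexing is a constant-time offset
-- into a contiguous buffer.
nthVector :: Int -> V.Vector a -> a
nthVector i v = v V.! i

-- Hypothetical helper: prepending allocates a new buffer and copies
-- every existing element, so it is O(n) in the vector length.
prepend :: a -> V.Vector a -> V.Vector a
prepend = V.cons

main :: IO ()
main = do
  let xs = [10, 20, 30, 40] :: [Int]
      v  = V.fromList xs
  print (nthList 2 xs)            -- element at position 2 of the list
  print (nthVector 2 v)           -- same element, read from the vector
  print (V.toList (prepend 5 v))  -- new vector with 5 in front
```

Both lookups return the same element; only the cost model differs. Which structure wins for a given workload is still a question for profiling, as the surrounding text says.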
@@ -143,8 +143,9 @@ old values, and then adding the new value.
 There are other data structures that can be considered for list-like data, such
 as `Seq` from containers, or in some cases a `Set`, `IntMap`, or `Map`.
 Figuring out the best choice for each use case can only be reliably determined
-via profiling. But as a general rule: densely populated lists with integral
-access to the values will be best served by vector.
+via profiling and benchmarking. As a general rule though, a densely populated
+collection requiring integral or random access to the values will be best served by
+a vector.
 
 Now let's talk about some of the other things that make vector so efficient.
 
@@ -204,9 +205,9 @@ memory?
 The vector package has a powerful technique: stream fusion. Using GHC rewrite
 rules, it's able to find many cases where creating a vector is unnecessary, and
 instead create a tight inner loop. In our case, GHC will end up generating code
-that can avoid touching system memory, and instead work on just the registers,
+that can avoid touching system memory, and instead work on just the [registers](https://en.wikipedia.org/wiki/Processor_register),
 yielding not only a tiny memory footprint, but performance close to a for-loop
-in C. This is one of the beauties of this library: you get to write high-level
+in C. This is one of the beauties of this library: you can write high-level
 code, and optimizations can churn out something much more CPU-friendly.
 
 ### Slicing
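
The stream fusion passage in the hunk above refers to an example that falls outside the diff context. As a hedged sketch of the kind of pipeline it describes, the following assumes a sum over an enumerated unboxed vector; the function name `sumTo` is hypothetical. When compiled with optimizations (for example `-O2`), the intermediate vector is a candidate for being fused away, though whether fusion actually fires depends on the GHC and vector versions in use.

```haskell
import qualified Data.Vector.Unboxed as VU

-- Hypothetical helper: written as "build a vector, then sum it".
-- With optimizations enabled, the rewrite rules in the vector package
-- may turn this into a single loop that never allocates the vector.
sumTo :: Int -> Int
sumTo n = VU.sum (VU.enumFromTo 1 n)

main :: IO ()
main = print (sumTo 1000000)
```

Running the program with `+RTS -s` is one way to check the allocation figures and see whether the intermediate vector was eliminated.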
@@ -262,7 +263,7 @@ prefix your function calls with `V.`.
 
 * Exercise 3: Use an unboxed (or storable) vector instead of the boxed vectors
 we were using above. What code did you have to change from the original
-example? Do your examples from exercise 2 all work still?
+example? Do all of your examples from exercise 2 still work?
 
 There are also a number of functions in the `Data.Vector` module with no
 corresponding function in `Prelude`. Many of these are related to mutable
@@ -333,15 +334,16 @@ random index, read the old value at that index, increment it, and write it
 back.
 
 After we're finished, we _freeze_ the vector (more on that in the next section)
-and print it. The results are the same (or close - we are dealing with random
-numbers here) to the previous immutable one. But instead of 48MB and 1.968s,
+and print it. The resulting distribution of values is the same (or close - we are dealing with random
+numbers here) as the previous calculation using an immutable vector. But instead of 48MB and 1.968s,
 this program has a maximum residency of 44KB and runs in 0.247s! That's a
 significant improvement!
 
 If we feel like being even more adventurous, we can replace our `read` and
 `write` calls with `unsafeRead` and `unsafeWrite`. That will disable some
 bounds checks before reading and writing. This can be a nice performance boost
-in very tight loops, but has the potential to segfault your program, so caveat
+in very tight loops, but has the potential to [segfault](https://en.wikipedia.org/wiki/Segmentation_fault)
+your program, so caveat
 emptor! For example, try replacing `replicate 10` with `replicate 9`, change
 the `read` for an `unsafeRead`, and run your program. You'll see something
 like:
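
The program this hunk discusses is not included in the diff context, so the following is only a minimal sketch of the read, increment, write, and freeze pattern it describes. It assumes an unboxed mutable vector and uses a fixed list of indices in place of random numbers; the helper name `countIndices` is hypothetical.

```haskell
import Control.Monad (forM_)
import qualified Data.Vector.Unboxed as VU
import qualified Data.Vector.Unboxed.Mutable as VUM

-- Hypothetical helper: counts how often each index in [0, size) occurs
-- in the input list, updating one mutable buffer in place.
countIndices :: Int -> [Int] -> IO (VU.Vector Int)
countIndices size idxs = do
  mv <- VUM.replicate size 0
  forM_ idxs $ \i -> do
    old <- VUM.read mv i        -- bounds-checked read
    VUM.write mv i (old + 1)    -- bounds-checked write
  VU.freeze mv                  -- copy into an immutable vector

main :: IO ()
main = do
  counts <- countIndices 3 [0, 2, 2, 1, 2]
  print counts
```

`VUM.unsafeRead` and `VUM.unsafeWrite` have the same argument order as `read` and `write`, so they can be swapped in directly. Doing so is only sound when every index is known to be in range.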
@@ -394,8 +396,8 @@ Why not just freeze it in place? Two reasons, actually:
 `write` call would modify our `ivector` value, meaning that the first and
 second call to `print ivector` would have different results!
 
-2. When you freeze a mutable vector, its memory is marked different for
-garbage collection purposes. Later trying to write to that same memory can
+2. When you freeze a mutable vector, its memory is marked differently for
+garbage collection purposes. Later attempts to write to that same memory can
 lead to a segfault.
 
 However, if you really want to avoid that extra buffer copy, and are certain
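
The first reason given in this hunk (a copying `freeze` keeps the immutable result independent of later writes) can be checked with a small sketch. The helper name `snapshotThenWrite` is hypothetical, and the example assumes an unboxed vector of `Int`.

```haskell
import qualified Data.Vector.Unboxed as VU
import qualified Data.Vector.Unboxed.Mutable as VUM

-- Hypothetical helper: freezes a mutable vector, writes to the mutable
-- original, then freezes again so the two snapshots can be compared.
snapshotThenWrite :: IO (VU.Vector Int, VU.Vector Int)
snapshotThenWrite = do
  mv <- VUM.replicate 3 0
  before <- VU.freeze mv   -- copies the buffer
  VUM.write mv 0 42        -- touches only the mutable original
  after <- VU.freeze mv    -- second, independent copy
  pure (before, after)

main :: IO ()
main = do
  (before, after) <- snapshotThenWrite
  print before
  print after
```

The first snapshot still holds the initial zeros after the write. `VU.unsafeFreeze` skips the copy, so it is only appropriate when the mutable vector is never used again afterwards.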
