Skip to content

ProposedChanges

Albert Graef edited this page Mar 26, 2017 · 1 revision

ag's Notes: These are the notes that dubiousjim sent me about his proposed changes to the standard Pure library, which can be found at https://github.com/dubiousjim/unspoiled. (Please note that the commits in dubiousjim's repo are relative to a revision of the Pure sources which is several months old, so they won't apply cleanly to the current trunk. But I plan to merge at least some of these changes back into the current version.)

There are some nice additions in there, such as a coarse-grained equality predicate ==? implementing Baker's EGAL, an alternative implementation of the dictionary and set data types based on 2-3 trees, an implementation of weak references, optimizations of existing library functions, as well as a bunch of useful new functions operating on lists and other containers. So integrating this stuff into the standard library should be a worthwhile endeavour.

On the other hand, many of the changes are also fairly disruptive, and while dubiousjim has put considerable effort into maintaining backward compatibility, there might be bugs and incompatibilities with existing code. This means that integrating the changes will take time and effort, in order not to break existing functionality in the standard library which is fairly mature and well-tested at this point. The purpose of this page is to provide some materials so that we can begin discussing and implementing these changes.

Proposed Changes to the Standard Library

by dubiousjim

1.

Consider the two lists:

let cell = ref 0;
[set[1,2,3,4],cell], [set[4,3,2,1],cell];

Perhaps an odd combination, but this demonstrates the issue in a concise way. There's a natural sense in which these lists are equivalent, since the two sets are equivalent and the two lists contain physically the same reference cell. However, none of Pure's existing predicates express that equivalence. Comparing the lists with == will not fully reduce, since == is not defined on reference cells (not even when a reference cell is compared to itself). Comparing the lists with === will return false, since although the sets are ==, their backend implementing trees have different structure.

We are adding to Pure a new pair of predicates, ==? and ~=?, that express the kind of equivalence exhibited by these lists. You can think of === as expressing fine-grained structural equivalence, and ==? as expressing an equivalence that may be coarser-grained in ways the user specifies, as we do for sets. Roughly speaking, ==? tests whether == is defined on its operands, and if so, returns that result; else it falls back to whatever === returns. (This is only "roughly speaking" because ==? tries to recurse intelligently.) Unlike ==, ==? is totally defined. Unlike ===, you can override ==? when introducing new types: you do so by implementing a definition of == for those types.

It's natural to wonder, why not just make this the behavior of ==? We could define reference cells to be == when they are ===, and we could work out a way to make == totally defined while still providing hooks for the user to override it for new types. That would also be a possible strategy, but Albert presented some good reasons for not doing things that way. In ordinary cases, though, you needn't confront any three-way choice between ===, ==? and ==. If you want fine-grained structural equality, use ===. If you want a coarser-grained equality that counts set[1,2,3,4] and set[4,3,2,1] as equivalent, use ==?. You can use == instead if you know that == will be defined for all of your object's components. But ==? should agree with == in all cases where they're both defined.

Pure has a member function for its collections like sets. But for some time it has omitted any definition of member for lists, largely because it wasn't clear what equality predicate would be appropriate. If you wanted to use ===, you could write any (value===) xs; if you wanted to use ==, you'd write any (value==) xs, or piggyback onto index, which was defined using ==. Now that ==? is available, we think it does now make sense to provide member for lists, defined in terms of ==?.

We're also redefining index in terms of ==? as well. (This should not break any code unless it relied on index failing to reduce in some cases.)

If you hate the identifier ==? and want to advocate using equal or something else as an infix operator, now is the time to say so, before this new predicate gets widely deployed.

2.

We're also providing a delete function for lists, similar to the existing delete function for collections like sets. The signature of this function is:

delete xs value

and it will return a version of xs that omits its first element (if any) that satisfies (value==?). This behavior is also expressible in terms of other list functions, for example, using the new rmfirstby function defined below it's:

rmfirstby (value==?) xs

But we thought it'd be useful to provide the specific behavior of delete as a separate function, especially since this is already provided for other collections.

We scratched our heads for a while over an apparent conflict in the signature patterns for lists and collections: in list functions, the list tends to come last, whereas in collection function, the collection tends to come first. Then we realized that there is in fact a consistent pattern to Pure's existing functions, which all the new functions we're now adding can also conform to. Here is what the pattern looks like, abstractly:

operation function_or_predicate seed_or_base aggregate key_or_value

No Pure function exhibits all of these elements at the same time, but here are how several functions exhibit pieces of it:

foldl     f                     a            xs
filter    p                                  xs
index                                        xs        x
delete                                       xs        x

It's just an accident of which functions we've had available for lists, and which functions for collections, that made it look like lists would always come last and collections always come first.

3.

Pure has the functions take n xs and drop n xs. Sometimes it's useful to have both of these results at once, and it's natural to retrieve that pair of results using only a single traversal of the list. We offer this function as split n xs. Other languages use names like splitAt, split-at, and stream-split here.

This pattern is useful in some other cases as well. So we offer the function span p xs to return the pair of takewhile p xs and dropwhile p xs, more efficiently than if you called both of those functions explicitly. And we offer the function partition p xs to return the pair of filter p xs and filter ((~).p) xs.

4.

filter p xs gives you all the members of the list that satisfy p. Sometimes it's useful to retrieve only the first member of the list that does so. One way to achieve that is by using dropwhile ((~).p) xs, and then check whether the resulting list has a head. When the predicate p is just checking for equality to a specific value, you could also use index or member. But it is useful to have a version of this that works for general predicates. On occasion, it's also useful to retrieve the last member of the list that satisfies the predicate (and to do so more efficiently than by reversing the list and retrieving the first). So we provide two new groups of functions:

  • firstby p default xs returns the first x that satisfies p, or else default

  • rmfirstby p xs returns xs with the first x that satisfies p (if any) removed

  • pickfirstby p xs returns the pair of the previous two expressions, or else () if no x satisfies p

  • lastby p default xs returns the last x that satisfies p, or else default

  • rmlastby p xs returns xs with the last x that satisfies p (if any) removed

  • picklastby p xs returns the pair of the previous two expressions, or else () if no x satisfies p

You can think of firstby and rmfirstby as being versions of the first and rmfirst function for ordered collections, that operate with a predicate rather than a specific position in the collection. Haskell also uses by with something roughly akin to this meaning.

A variety of different signatures are possible for these functions. Here are some alternatives:

  • Instead of (), pickfirstby p xs might return the original xs in the case that no x satisfies p
  • Or it might take a default argument like firstby does
  • Or both of these functions might omit the default argument and throw an exception
  • Or firstby might omit the default argument, and return either {| x |} or {}, where {| |} is the recently-introduced nonsplicing vector constructor. In this case, pickfirstby p xs would return either {| x, newxs |} or {}. Here {} plays the role of Haskell's Nothing, and any non-empty vector plays the role of Haskell's Just.

If we chose the last strategy, that would naturally invite returning multiple values from split and span and partition using vectors instead of (,)-separated tuples, as well. That has some advantages and some disadvantages. For the time being, we're favoring the more conservative approach of using (,)-separated tuples throughout. But we welcome your feedback on this, and also on the question of whether any of the other alternative signatures listed for firstby and pickfirstby look more attractive to you.

We surveyed a broad range of names for these functions, including alternatives like find, find_last, pop, remove, and so on. Many of the alternatives look better in some uses but are hard to fit into a satisfying pattern, while also respecting other names already present in Pure. If you have opinions about the names proposed here, we'd be glad to hear them. We don't think the names proposed here are far-and-away superior to all natural alternatives; but we have considered many such, and we do have reasons for preferring these.

One easy switch, though, would be to use findfirstby instead of firstby.

5.

We provide a few curried nonsplicing vector and nonsplicing tuple constructors:

vector1 x = {| x |};
vector2 x y = {| x, y |};
vector3 x y z = {| x, y, z |};
tuple2 x y = '(x, y);
tuple3 x y z = '(x, y, z);

Observe that x,() reduces to x, and (x,y),z reduces to x,(y,z); but tuple2 x () is x,() and tuple2 (x,y) z is (x,y),z. Suppressing the natural splicing behavior of (,) in this way is useful for some purposes, such as the defintions of zip (see item 6, below) and the pick* functions.

6.

Many list functions have been moved from prelude.pure into a new file lists.pure, which the prelude automatically includes. In addition to the functions listed above (member, delete, split, span, partition, the firstby/rmfirstby/pickfirstby triad and the corresponding triad for last), we have also added or modified the following list functions:

  • foldl2, foldl3, foldr2, foldr3 fold 2 or 3 lists in parallel, stopping with the shortest

  • catstream [xs, ys, ...] efficient version of cat [stream xs, stream ys, ...]: you might prefer this to xs+ys or cat [xs, ys] when xs isn't already a stream Python calls this function chain

  • rotate xs efficient version of last xs:init xs, providing both results from a single traversal

  • zip xs ys Now joins the head of the two lists using (nonsplicing) tuple2, rather than (,). Formerly: zip [(1,2),()] [3,4] would return [(1,(2,3)), 4] which couldn't be unzipped without error. And disregarding the final element, even if we could unzip it we wouldn't get our original lists back, since (1,2),3 has been flattened into 1,(2,3). Now zipping those two lists will instead return [((1,2),3), ((),4)], and unzipping that result will return exactly the original lists.

The same changes have been made for zip3. If you want a version of zip that uses vectors instead of (,)-tuples, you can use: zipwith vector2 xs ys or zipwith3 vector3 xs ys zs`.

  • reverse_onto, map_onto, catmap_onto,
  • zip_onto, zip3_onto, zipwith_onto, zipwith3_onto

These are most easily explained with some examples:

reverse               [1,2,3] // evaluates to [3,2,1]
reverse_onto  []      [1,2,3] // evaluates to [3,2,1]
reverse_onto  [40,50] [1,2,3] // evaluates to [3,2,1,40,50]

map      succ         [1,2,3] // evaluates to [2,3,4]
map_onto succ []      [1,2,3] // evaluates to [2,3,4]
map_onto succ [40,50] [1,2,3] // evaluates to [2,3,4,40,50]

As per Pure's standing policy, all of these functions are also implemented for strings.

7.

Some new conditional compilation flags have been introduced.

If you run Pure with --enable=list-opt, then many of the existing and new list functions will switch to the skip-ahead implementations Dubiousjim described in an earlier message to the mailing list. For longer lists, these implementations can be 15% or so faster, and use substantially less memory, which still being safe from stack overflow. By default, this flag is disabled, and Pure uses the existing implementations, or implementations in the same style, which achieve stack safety by accumulating the result in an auxiliary list that gets reversed at the end.

If you run Pure with --enable=trees23, then Dubiousjim's balanced tree implementation is used instead of the older avltrees. Again, this can be modestly faster and use less memory. By default, this flag is disabled, and Pure still uses avltrees.

These flags should only affect performance. The semantics of your programs shouldn't be affected. If you find evidence to the contrary, please let us know about it.

8.

Some new functions have been added to the collections libraries.

It was already the case that list (myset) would convert a set (or a bag or dict) into a list. Now one can also do stream (myset) to convert the collection directly into a stream. (No list intermediary needs to be built.)

Additionally, foldl, foldl1, foldr, and foldr1 have been implemented for all collections.

Additionally the following have been implemented for various collections:

  • get, get_last For bags and mdicts, where (!k) currently returns all values associated with that key, these will instead return the first-added value (which is also listed first in the enumeration) and the last-added value (listed last in the enumeration)

  • delete_last For bags and mdicts, where delete ... k and delete_all ... k currently delete the first-added value, or all values, associated with that key

  • pick This complements the existing functions (!) and delete. For bags and mdicts it complements get rather than (!)

  • pick_all, pick_last These complement (!) and get_last for bags and mdicts

  • pick_val This complements the existing function delete_val for dicts

  • pickfirst/picklast Unlike pick and pickfirstby, these don't take a key or predicate, but just return the first member of an ordered collection, paired with the rest of the collection. They complement the existing functions first/last, rmfirst/rmlast

  • filter

  • partition

  • any/all

We would eventually like to port other functionality from the lists library to arrays, including:

  • firstby/rmfirstby/pickfirstby, and likewise for last
  • take/drop/split
  • takewhile/dropwhile/span
  • zipwith*, zipwith3*, unzip, unzip3
  • map*, and do/dowith/dowith3
  • cat/(+)/catmap*
  • reverse*
  • the fold2 and fold3 functions
  • rotn n::int _::array, where n>0 rotates to right, n<0 to left
  • perhaps index/member/delete
  • [scan and sort omitted for now]