Simple sequence abstract datatype, intended to iterate efficiently on collections while performing some transformations.
Clone or download
Latest commit ef45c8a Jun 2, 2018

README.adoc

Sequence

Simple sequence abstract datatype, intended to iterate efficiently on collections while performing some transformations.

Common operations supported by Sequence include filter, map, take, drop, append, flat_map, etc. Sequence is not designed to be as general-purpose or flexible as, say, Batteries' 'a Enum.t. Rather, it aims at providing a very simple and efficient way of iterating on a finite number of values, only allocating (most of the time) one intermediate closure to do so. For instance, iterating on keys, or values, of a Hashtbl.t, without creating a list.

Build Status

Documentation

There is only one important type, 'a Sequence.t, and lots of functions built around this type. To get an overview of sequence, its origins and why it was created, you can start with the slides of a talk I (@c-cube) made at some OCaml meeting.

See the online API for more details on the set of available functions.

Build

  1. via opam opam install sequence

  2. manually (need OCaml >= 4.02.0): make all install

If you have qtest installed, you can build and run tests with

$ ./configure --enable-tests
$ make test

If you have benchmarks installed, you can build and run benchmarks with

$ make benchs
$ ./benchs.native

To see how to use the library, check the examples directory. tests.ml has a few examples of how to convert basic data structures into sequences, and conversely.

Short Tutorial

Transferring Data

Conversion between n container types would take n² functions. In practice, for a given collection we can at best hope for to_list and of_list. With sequence, if the source structure provides a iter function (or a to_seq wrapper), it becomes:

# let q = Queue.create();;
# Sequence.( 1 -- 10 |> to_queue q);;
- : unit = ()
# Sequence.of_queue q |> Sequence.to_list ;;
- : int list = [1; 2; 3; 4; 5; 6; 7; 8; 9; 10]

# let s = Stack.create();;
# Sequence.(of_queue q |> to_stack s);;
- : unit = ()
# Sequence.of_stack s |> Sequence.to_list ;;
- : int list = [10; 9; 8; 7; 6; 5; 4; 3; 2; 1]

Note how the list of elements is reversed when we transfer them from the queue to the stack.

Another example is extracting the list of values of a hashtable (in an undefined order that depends on the underlying hash function):

# let h = Hashtbl.create 16;;
# for i = 0 to 10 do
     Hashtbl.add h i (string_of_int i)
  done;;
- : unit = ()

# Hashtbl.length h;;
- : int = 11

(* now to get the values *)
# Sequence.of_hashtbl h |> Sequence.map snd |> Sequence.to_list;;
- : string list = ["6"; "2"; "8"; "7"; "3"; "5"; "4"; "9"; "0"; "10"; "1"]

Replacing for loops

The for loop is a bit limited, and lacks compositionality. Instead, it can be more convenient and readable to use Sequence.(--) : int → int → int Sequence.t.

# Sequence.(1 -- 10_000_000 |> fold (+) 0);;
- : int = 50000005000000

# let p x = x mod 5 = 0 in
  Sequence.(1 -- 5_000
    |> filter p
    |> map (fun x -> x * x)
    |> fold (+) 0
  );;
- : int = 8345837500
Note
with flambda under sufficiently strong optimization flags, such compositions of operators should be compiled to an actual loop with no overhead!

Iterating on sub-trees

A small λ-calculus AST, and some operations on it.

# type term =
  | Var of string
  | App of term * term
  | Lambda of term ;;

# let rec subterms : term -> term Sequence.t =
  fun t ->
    let open Sequence.Infix in
    Sequence.cons t
      (match t with
      | Var _ -> Sequence.empty
      | Lambda u -> subterms u
      | App (a,b) ->
        Sequence.append (subterms a) (subterms b))
  ;;

(* Now we can define many other functions easily! *)
# let vars t =
    Sequence.filter_map
      (function Var s -> Some s | _ -> None)
      (subterms t) ;;
val vars : term -> string sequence = <fun >

# let size t = Sequence.length (subterms t) ;;
val size : term -> int = <fun >

# let vars_list l = Sequence.(of_list l |> flat_map vars);;
val vars_list : term list -> string sequence = <fun >

Permutations

Makes it easy to write backtracking code (a non-deterministic function returning several 'a will just return a 'a Sequence.t). Here, we generate all permutations of a list by enumerating the ways we can insert an element in a list.

# open Sequence.Infix;;
# module S = Sequence ;;
# let rec insert x l = match l with
  | [] -> S.return [x]
  | y :: tl ->
    S.append
      S.(insert x tl >|= fun tl' -> y :: tl')
      (S.return (x :: l)) ;;

# let rec permute l = match l with
  | [] -> S.return []
  | x :: tl -> permute tl >>= insert x ;;

# permute [1;2;3;4] |> S.take 2 |> S.to_list ;;
- : int list list = [[4; 3; 2; 1]; [4; 3; 1; 2]]

Advanced example

The module examples/sexpr.mli exposes the interface of the S-expression example library. It requires OCaml>=4.0 to compile, because of the GADT structure used in the monadic parser combinators part of examples/sexpr.ml. Be careful that this is quite obscure.

Comparison with gen

  • Gen is an external iterator. It means that the code which consumes some iterator of type 'a Gen.t is the one which decides when to go to the next element. This gives a lot of flexibility, for example when iterating on several iterators at the same time:

    let zip (g1: 'a Gen.t) (g2:'b Gen.t) : ('a * 'b) Gen.t =
      let x1 = ref (g1 ()) in
      let x2 = ref (g2 ()) in
      fun () -> match !x1, !x2 with
        | None, _ | _, None -> None
        | Some x, Some y ->
          (* fetch next elements from g1 and g2 *)
          x1 := g1 ();
          x2 := g2 ();
          Some (x,y)
  • Sequence is an internal iterator. When one wishes to iterate over an 'a Sequence.t, one has to give a callback f : 'a → unit that is called in succession over every element of the sequence. Control is not handed back to the caller before the whole iteration is over. This makes zip impossible to implement. However, the type 'a Sequence.t is general enough that it can be extracted from any classic iter function, including from data structures such as Map.S.t or Set.S.t or Hashtbl.t; one cannot obtain a 'a Gen.t from these without having access to the internal data structure.

In short, 'a Gen.t is more expressive than 'a Sequence.t, but it also requires more knowledge of the underlying source of items. For some operations such as map or flat_map, Sequence is also extremely efficient and will, if flambda permits, be totally removed at compile time (e.g. Sequence.(--) becomes a for loop, and Sequence.filter becomes a if test).

License

Sequence is available under the BSD license.