A description of a universal implementation of lazy sequence manipulation via coroutines (or generators)
This repository proposes a universal implementation description of lazy sequence manipulation using coroutines/generators in any programming language that supports them.
This description will use a custom pseudocode language that is easy to understand and will avoid to tie the description to any programing language. The language may share similarities with C++, JavaScript and PHP.
It will also favor a statically typed description so that it may be implemented in statically typed languages without any hassle.
It will propose an object oriented representation, it is up to you to adapt it to how the target language works.
Implementation details are in the impl
folder.
- C++'s Range V3
- JavaScript's sequency
- Java 8's Stream API
- Kotlin's sequences
- Rust's iter
- Writing my own stuff for lolz
A Stream is a (maybe infinite) sequence of data of one particular type (e.g. Stream<int>
). It may hold the minimum amount of state to be evaluated. A Stream must have a parent Stream or Coroutine from which it gets data.
An Operation is a morphism that transforms values of one type into values of another type (which may be the same).
A Creator is an Operation that transforms data that is not a Stream into a Stream. For instance: streamFromArray([])
.
An Operator is an Operation that transforms a Stream of a given type into a Stream of another type (which may be the same). For instance: stream.map(identity)
. These operations are lazy evaluated, meaning they are not evaluated until a Terminator is applied. An Operator may be Stateful or Stateless. These operations are lazy evaluated, meaning they are not evaluated until a Terminator is applied.
Internal State designates state that is not directly derived from the input of the Operation. For instance: a set a already processed values.
A Stateful Operator is an Operator that needs to maintain Internal State for its evaluation. For instance stream.unique()
or stream.take(5)
. If it does not introduce a change in complexity, then it can be considered Stateless.
A Stateless Operator is an Operator that does not need to maintain internal state for its evaluation. For instance stream.filter(predicate)
.
A Terminator is an Operation that transforms a Stream of a given type into values of data of another type (which may be the same but must not be a Stream). For instance: stream.forEach(print)
.
A Coroutine or Generator describes a mechanism that allows to "lazily" provide values one at a time. For the sake of implementation, they must be monadic: a Coroutine must be able to provide the values of another Coroutine; put simple they must be Iterable.
An Iterable provides an Iterator to access a sequence of data.
An Iterator is a view-like mechanism to traverse a sequence of data that holds info like its validity (are there more?) and the current value.
A Coroutine/Generator factory is a special kind of function which, once called, returns a Coroutine/Generator.
A Sequence Manipulation is a sequence of Operations that start with one Creator, may contain several Operators, and ends with a Terminator.
An Evaluation is the process of evaluating an Operation, i.e. giving it its input and getting its output.
A Sequence manipulation that only uses Stateless Operators is guaranteed to be O(n) which means it is strictly equivalent to a single for loop.
A Sequence Manipulation that uses k Stateful Operators is guaranteed to be at worst O(n^(k+1)) which means it is equivalent of k+1 nested for loops.
Languages that do not propose generators/coroutines may not directly implement as is. Although, if you have access to iterators, you may implement everything using those.
See the iterator-hack
directory for examples.
If you cannot have access to the validity (whether there are more elements or not) and a way to access the current then the following cannot be implemented as is:
- zipWith
This language does not take into account memory management, lifetime and so on. It is your duty to make the right choices in your implementation.
template <T>
provides the type T
as a type argument for generic implementations of classes or functions.
@StaticMethod
denotes that the decorated function may be used as a static method of the return type. As such, the decorated function cannot be used as a free function unless it is also decorated with @FreeFunction
. They also allow the use of the static
keyword to refer to the type for which they are a static method. For instance:
template <T>
@StaticMethod
MyType<T> stuff(){
// return MyType<T>{};
return static{};
}
MyType::stuff<int>();
MyType<int>::stuff();
@FreeFunction
denotes that the decorated function may be used as a free function.
@ExtensionMethod
denotes that the decorated function is an (extension) method of the type of the this
argument of the function which must be present and must be the first. As such, the decorated function cannot be used as a free function. As they are extensions, they are public and also have access to private and protected members. Like static methods, they have access to the static
keyword to refer to the current type.
@ExtensionMethod
int zero(int[] this){
return 0;
}
[1,2,3].zero() == 0;
class Stuff{
constructor(){}
}
@ExtensionMethod
Stuff create(){
return static{};
}
For simplicity, the auto
type may be used, it behaves exactly as the C++ keyword.
Anonymous functions are literal functions that may be assigned to variables. They do not use external state unless explicitly specified. Note that lambda functions are a variation of anonymous functions that cannot contain a body (i.e. it directly returns a single expression) and automatically "capture" external state:
auto fn = function(x){ return x * 2; };
auto fn = x => x * 2;
auto y = 32;
auto life = function(x = 10) using(y) { return x + y };
auto life = (x = 10) => x + y;
Coroutine/Generator factories are declared as special anonymous functions that can yield data:
auto factory = function* (my_args){
yield stuff; // yield a single value
yield* [1,2,3];
// yield all from an iterable, as if using a for loop and yielding each element individually
};
auto gen = factory(args);
Callable types may have contravariance/covariance:
template <Arg, R>
type MyCallable<Arg, R> = (Arg) => R;
MyCallable<String, int> a = str => str.length;
MyCallable<String, int> b = () => 0; // also valid, just ignores the argument
a("str");
b("str");
b(); // invalid as it requires an argument even if the implementation doesn't
template <T>
type MyCallback<T> = (T) => void;
MyCallback<String> a = function(String str){
print(str);
};
MyCallback<String> b = str => str; // also valid, just ignores the return value
Classes encapsulate data with behavior. By default members are public, unless an encapsulation guard is specified. Classes may implement/extend other classes using the :
notation (as in C++). The use of the this
keyword is mandatory. Classes may only have one constructor (to allow for maximum interoperability). The static
keyword may be used to refer to the current type for inheritance.
class MyClass : MyBaseClass, MyInterface, MyCrtp<static> {
constructor(int stuff){
this.stuff = 0;
}
protected:
int stuff;
public:
int getStuff(){ return this.stuff; }
private:
float stuff2 = 42.05;
}
Objects are instantiated using C++'s uniform initialization syntax: MyClass obj{}
or MyClass obj = {}
or auto obj = MyClass{}
. Methods on objects are accessed using dot notation: obj.method()
. Methods on classes/types are accessed using namespace notation: MyType::myStaticMethod()
.
Classes may nest type alias declarations:
class MyClass {
type MyAlias = int;
template <U>
type MyArray<U> = U[];
}
Type aliases may also be global. There is a special notation to declare callable type aliases:
type MyGlobalAlias = String;
template <T>
type MyCallable<T> = (int) => T; // anonymous function, lambda, coroutine factory, callable object, etc...
This language also uses several types for simplicity:
Either<L, R>
which is eitherL
orR
and is contravariant : we can assign aT
to aEither<T, null>
. It is convertible toVariant<L, R>
.Nullable<T>
which isEither<T, null>
Variant<A, B, ...Types>
which isA
,B
or any of the providedTypes
. It is convertible to a tree ofEither
(e.g.Variant<A, B, C, D>
is alsoEither<A, Either<B, Either<C, D>>>
).Convertible<A, B>
which means that it is anA
but can be converted to aB
(not necessarily vice-versa)T[]
means an array ofT
Iterable<T>
is any iterable type whose values are of typeT
. It has a propertyiterator
which is anIterator<T>
. It is a property as some iterable may only be traversed forward and once (e.g. seeded sequence)Iterator<T>
is used to check validity and traverse its associated iterable[A, B, C]
is a tuple whose first value is of typeA
, second value of typeB
and third value of typeC
. It is convertible toEither<A, Either<B, C>>
.Class<T>
a way to hold information of the typeT
so it can be used dynamically withinstanceof
. It also offers a method calledtemplate <U> T cast(Convertible<U, T> value)
to cast to that type.MyClass.cast
is also that method.