In [1]:
import { requireCytoscape, requireCarbon } from "./lib/draw";

requireCarbon();
requireCytoscape();

# Lazy Programming, Streams, and Generators

# Where Were We?

1. Language primitives (i.e., building blocks of languages)
2. **Language paradigms** (i.e., combinations of language primitives)
    - Last time: first-class concurrency in Go
    - This time: **lazy programming, streams, and generators**
3. Building a language (i.e., designing your own language)

## Non-Eager Evaluation: Short-Circuit Evaluation

- We will look at our first example of non-eager evaluation: *short-circuit* evaluation.
- *Eager* evaluation is what most programming languages implement: evaluate every subexpression as soon as it is encountered, whether or not its value ends up being used.

### Short-Circuit Evaluation and Exceptions

- Pretend that our language throws an exception when dividing by zero.
- Many languages do this, but JavaScript does not, so we must fake it.

In [2]:
5/0

Infinity


In [3]:
function divide(a: number, b: number): number {
    if (b === 0) {
        throw new Error("Can't divide by zero");
    }
    
    return a/b;
}

This will throw:

In [4]:
// divide(5, 0);
try {
    divide(5, 0);
} catch(err) {
    console.log("Wrong", err);
}

Wrong Error: Can't divide by zero
    at Proxy.divide (evalmachine.<anonymous>:5:15)
    at evalmachine.<anonymous>:4:13
    at evalmachine.<anonymous>:10:3
    at sigintHandlersWrap (node:vm:268:12)
    at Script.runInThisContext (node:vm:127:14)
    at Object.runInThisContext (node:vm:305:38)
    at Object.execute (/Users/dehuang/Documents/teaching/csc600/f22/lectures/node_modules/tslab/dist/executor.js:162:38)
    at JupyterHandlerImpl.handleExecuteImpl (/Users/dehuang/Documents/teaching/csc600/f22/lectures/node_modules/tslab/dist/jupyter.js:219:38)
    at /Users/dehuang/Documents/teaching/csc600/f22/lectures/node_modules/tslab/dist/jupyter.js:177:57


Normally operators **eagerly** evaluate their operands:

In [5]:
try {
    15 + divide(5, 0);
} catch (err) {
    console.log("Wrong", err);
}

Wrong Error: Can't divide by zero
    at Proxy.divide (evalmachine.<anonymous>:5:15)
    at evalmachine.<anonymous>:3:18
    at evalmachine.<anonymous>:9:3
    at sigintHandlersWrap (node:vm:268:12)
    at Script.runInThisContext (node:vm:127:14)
    at Object.runInThisContext (node:vm:305:38)
    at Object.execute (/Users/dehuang/Documents/teaching/csc600/f22/lectures/node_modules/tslab/dist/executor.js:162:38)
    at JupyterHandlerImpl.handleExecuteImpl (/Users/dehuang/Documents/teaching/csc600/f22/lectures/node_modules/tslab/dist/jupyter.js:219:38)
    at /Users/dehuang/Documents/teaching/csc600/f22/lectures/node_modules/tslab/dist/jupyter.js:177:57


But the **ternary operator** does not:

In [6]:
const a: number = 5;
const b: number = 0;

console.log(b === 0 ? "invalid" : divide(a, b));

invalid


- It first evaluates the condition, and then evaluates only one of the "then" and "else" branches.
- This is called **short-circuit evaluation**.
- It evaluates only the operands needed to determine the result and skips the rest.
- Consequently, the error caused by the division is avoided.
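
JavaScript's logical operators `&&` and `||` short-circuit in the same way. The following is a minimal sketch, assuming the `divide` function from above (repeated so the snippet stands alone); `safeQuotient` and `isZeroOrSmallQuotient` are names invented for this illustration.

```typescript
function divide(a: number, b: number): number {
    if (b === 0) {
        throw new Error("Can't divide by zero");
    }
    return a / b;
}

// `&&` evaluates its right operand only when the left operand is truthy.
function safeQuotient(a: number, b: number): number | false {
    return b !== 0 && divide(a, b);
}

// `||` evaluates its right operand only when the left operand is falsy.
function isZeroOrSmallQuotient(a: number, b: number): boolean {
    return b === 0 || divide(a, b) < 1;
}

console.log(safeQuotient(5, 0));          // false; divide() is never called
console.log(safeQuotient(5, 2));          // 2.5
console.log(isZeroOrSmallQuotient(5, 0)); // true; divide() is never called
```

In both cases the call to `divide` sits in a position that is skipped when `b` is zero, so no exception is thrown.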

### Short-Circuit Evaluation and Infinite Loops

In [7]:
function forever() {
    while (true) {
        // Nothing.
    }
}

In [8]:
const a = 4;
const b = 5;

console.log(a < b ? "nope" : forever());

nope


### How do we encode the ternary operator?

- Imagine that we didn't have the ternary operator, only `if` statements, and wanted a conditional expression.
- Indeed, a language like **Go** does not have the ternary operator.
- Would we be able to implement the ternary operator?

In [9]:
function ternaryWrong(condition: boolean, trueValue: any, falseValue: any): any {
    if (condition) {
        return trueValue;
    } else {
        return falseValue;
    }
}

In [10]:
const a: number = 5;
const b: number = 0;

try {
    // console.log(b === 0 ? "invalid" : divide(a, b));
    console.log(ternaryWrong(b === 0, "invalid", divide(a, b)));
} catch (err) {
    console.log("Wrong", err);
}

Wrong Error: Can't divide by zero
    at Proxy.divide (evalmachine.<anonymous>:5:15)
    at evalmachine.<anonymous>:9:66
    at evalmachine.<anonymous>:15:3
    at sigintHandlersWrap (node:vm:268:12)
    at Script.runInThisContext (node:vm:127:14)
    at Object.runInThisContext (node:vm:305:38)
    at Object.execute (/Users/dehuang/Documents/teaching/csc600/f22/lectures/node_modules/tslab/dist/executor.js:162:38)
    at JupyterHandlerImpl.handleExecuteImpl (/Users/dehuang/Documents/teaching/csc600/f22/lectures/node_modules/tslab/dist/jupyter.js:219:38)
    at /Users/dehuang/Documents/teaching/csc600/f22/lectures/node_modules/tslab/dist/jupyter.js:177:57


### What went wrong?

- This didn't work because function arguments are eagerly evaluated, as you might expect.
- Let's rewrite the code to show this more explicitly.

In [11]:
// Equivalent code
const a: number = 5;
const b: number = 0;

try {
    const arg1 = b === 0;
    const arg2 = "invalid";
    const arg3 = divide(a, b); // <- this will throw the error
    console.log(ternaryWrong(arg1, arg2, arg3));
} catch (err) {
    console.log("Wrong", err);
}

Wrong Error: Can't divide by zero
    at Proxy.divide (evalmachine.<anonymous>:5:15)
    at evalmachine.<anonymous>:11:26
    at evalmachine.<anonymous>:18:3
    at sigintHandlersWrap (node:vm:268:12)
    at Script.runInThisContext (node:vm:127:14)
    at Object.runInThisContext (node:vm:305:38)
    at Object.execute (/Users/dehuang/Documents/teaching/csc600/f22/lectures/node_modules/tslab/dist/executor.js:162:38)
    at JupyterHandlerImpl.handleExecuteImpl (/Users/dehuang/Documents/teaching/csc600/f22/lectures/node_modules/tslab/dist/jupyter.js:219:38)
    at /Users/dehuang/Documents/teaching/csc600/f22/lectures/node_modules/tslab/dist/jupyter.js:177:57


### First-Class Functions to the Rescue (Again)!

- But ... what would it take to make this work?
- We need to **delay the evaluation** of the parameter.
- Put it into a first-class function!

In [12]:
// Higher-order function: accepts functions (thunks) as arguments
function ternary(condition: boolean, trueThunk: () => any, falseThunk: () => any): any {
    if (condition) {
        return trueThunk(); // Apply function with no arguments
    } else {
        return falseThunk(); // Apply function with no arguments
    }
}

In [13]:
const a: number = 5;
const b: number = 0;

// Question: what is () => "invalid"?
console.log(ternary(b === 0, () => "invalid", () => divide(a, b)));

invalid


- A zero-argument function that wraps an expression to delay its evaluation (e.g., the arguments bound to `trueThunk` and `falseThunk`) is called a **thunk**.
- A thunk embodies **computation to be done in the future**.
- Thunks should remind you of callbacks (e.g., `onClick`).
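
To see that a thunk really does delay evaluation, we can record which thunks run. This is a minimal sketch: the `ternary` function has the same shape as the one above (made generic and repeated so the snippet stands alone), and the `calls` log is a hypothetical piece of instrumentation added only for this illustration.

```typescript
function ternary<T>(condition: boolean, trueThunk: () => T, falseThunk: () => T): T {
    if (condition) {
        return trueThunk();
    } else {
        return falseThunk();
    }
}

// Hypothetical log of which thunks actually ran.
const calls: string[] = [];

const result = ternary(
    true,
    () => { calls.push("then"); return "invalid"; },
    () => { calls.push("else"); return "unreachable"; },
);

console.log(result); // invalid
console.log(calls);  // [ 'then' ]
```

The body of the second thunk is never executed, which is exactly the behavior of the built-in ternary operator.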

## Memoization

- What if the thunks we passed in were **pure functions**?
- We could **memoize** the computation then by saving the results of earlier computations.

### Pure Functions, Again

- If a function is pure, it gives the same output for the same input and has no side effects.

```ts
console.log((() => "invalid")());
console.log((() => "invalid")());
console.log((() => "invalid")());
```

- Thus we can evaluate `() => "invalid"` once and store its result in a table. This is known as **memoization**.
- This would save computation if `() => "invalid"` is expensive to calculate.
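
The "table" can be sketched as a map from arguments to results. This is an illustrative sketch only: `memoize`, `slowSquare`, and `squareCalls` are hypothetical names, it handles one-argument functions, and the call counter exists purely to show when the wrapped function really runs.

```typescript
// Wrap a one-argument pure function with a table of previously computed results.
function memoize<A, R>(fn: (arg: A) => R): (arg: A) => R {
    const table = new Map<A, R>();
    return (arg: A): R => {
        if (table.has(arg)) {
            return table.get(arg) as R; // cache hit: skip the computation
        }
        const result = fn(arg);
        table.set(arg, result);
        return result;
    };
}

// Stand-in for an expensive function. The counter is instrumentation only;
// the returned value depends on nothing but `n`.
let squareCalls = 0;
function slowSquare(n: number): number {
    squareCalls += 1;
    return n * n;
}

const fastSquare = memoize(slowSquare);
console.log(fastSquare(12)); // 144 (computed)
console.log(fastSquare(12)); // 144 (looked up in the table)
console.log(squareCalls);    // 1
```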

### Memoization example

In [14]:
// Exponential complexity fibonacci again
function fib(n: number): number {
    switch (n) {
        case 0: return 0;
        case 1: return 1;
        default: return fib(n - 1) + fib(n - 2);
    }
}

fib(40);

102334155


In [15]:
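function thunkAndMemoize<A extends unknown[], R>(fn: (...args: A) => R, ...args: A): () => R {
    // Track "already computed" separately so a result of `undefined` is cached too.
    let computed = false;
    let cachedResult: R | undefined = undefined;
    
    // Question: what is this? This is a *closure*
    return () => {
        // What does cachedResult refer to?
        if (!computed) {
            cachedResult = fn(...args);
            computed = true;
        }
        
        return cachedResult as R;
    };
}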

In [16]:
const fib2 = thunkAndMemoize(fib, 40);
console.log(fib2()); // Takes a while to compute
console.log(fib2());
console.log(fib2());


102334155
102334155
102334155


### Memoization and Dynamic Programming

- Here's an example of a really fast fibonacci function that uses **memoization**.

In [17]:
// Linear complexity fibonacci again
function fibFast(n: number): number {
    // Difference 1: have a table that maps arguments to results
    const table: {[arg: number]: number} = {
        0: 0,
        1: 1,
    };
    
    // Difference 2: recursive function that populates the table
    function go(n: number): number {
        if (n in table) {
            return table[n];
        } else {
            const result = go(n-1) + go(n-2);  // originally: fib(n-1) + fib(n-2)
            table[n] = result;  // memoizing here
            return result;
        }
    }
    return go(n);
}

In [18]:
// Each call now has linear complexity
console.log(fibFast(40));
console.log(fibFast(40));
console.log(fibFast(40));

102334155
102334155
102334155


In [19]:
// First call has linear complexity, every other call has constant complexity
const fibFast2 = thunkAndMemoize(fibFast, 40);
console.log(fibFast2());
console.log(fibFast2());
console.log(fibFast2());

102334155
102334155
102334155


### Reminder: Memoization only works with Pure Functions

- **This only works with pure functions!**
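
A minimal sketch of what goes wrong when a memoized thunk is reused after the state it depends on has changed. `memoizeThunk` is a simplified, zero-argument stand-in for `thunkAndMemoize`, and `rate` and `doubled` are hypothetical names used only here.

```typescript
function memoizeThunk<R>(fn: () => R): () => R {
    let computed = false;
    let cached: R | undefined;
    return () => {
        if (!computed) {
            cached = fn();
            computed = true;
        }
        return cached as R;
    };
}

let rate = 2; // mutable state that the thunk closes over

// Impure: the result depends on `rate`, which is not an argument.
const doubled = memoizeThunk(() => 10 * rate);

console.log(doubled()); // 20
rate = 3;               // mutation after the first call
console.log(doubled()); // still 20: the cached result is stale
console.log(10 * rate); // 30: what a fresh evaluation would give
```

The cache has no way of knowing that `rate` changed, so it keeps returning the first result.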

In [20]:
let a = 5;
let b = 0;

function impureDivide() {
    // Closes over a and b
    return divide(a, b);
}

// Input (), get out "invalid"
console.log(ternary(b === 0, thunkAndMemoize(() => "invalid"), thunkAndMemoize(impureDivide)));

// Mutation
a = 5; b = 2;

// Input (), get out 2.5
console.log(ternary(b === 0, thunkAndMemoize(() => "invalid"), thunkAndMemoize(impureDivide)));

invalid
2.5


## Lazy Evaluation and Streams

- We have seen how thunks can be used to delay evaluation until the value is needed, which can be used to implement short-circuit evaluation.
- When thunks are pure, we can use memoization to speed up computation.
- We will now see how thunks can be used to implement **lazy evaluation**: evaluate a value only when it is needed.
- This enables us to encode *infinite* data structures in memory.

### Stream

A **stream** is like a lazy array:

* Can only be read front to back.
* Can't do anything else (no random access to an element, no length).
* Can represent an infinite sequence (like all natural numbers).
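
One way to picture the connection to thunks is a list cell whose tail is a thunk. This is a sketch under that assumption, not the representation used in the rest of these notes; `LazyList`, `integersFrom`, and `firstN` are hypothetical names.

```typescript
// A lazy list cell: the head is known, the rest is a thunk computed on demand.
interface LazyList {
    head: number;
    tail: () => LazyList;
}

// All integers starting at `n`. This terminates because the recursive call
// sits inside a thunk and only runs when someone asks for the tail.
function integersFrom(n: number): LazyList {
    return { head: n, tail: () => integersFrom(n + 1) };
}

// Read the first `count` elements, forcing one tail thunk per step.
function firstN(list: LazyList, count: number): number[] {
    const out: number[] = [];
    let cell = list;
    for (let i = 0; i < count; i++) {
        out.push(cell.head);
        cell = cell.tail();
    }
    return out;
}

console.log(firstN(integersFrom(0), 5)); // [ 0, 1, 2, 3, 4 ]
```

Only as many cells exist in memory as have been asked for, even though the list is conceptually infinite.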

In [21]:
interface Stream {
    next(): number;
}

function makeFibStream(): Stream {
    let i = 0;
    
    return {
        next: () => {
            const value = fibFast(i);
            i += 1;
            return value;
        }
    };
}

const stream = makeFibStream();

In [22]:
// Get a few Fibonacci numbers
console.log(stream.next());
console.log(stream.next());
console.log(stream.next());

0
1
1


In [23]:
// And then some more
console.log(stream.next());
console.log(stream.next());
console.log(stream.next());
console.log(stream.next());

2
3
5
8


In [24]:
// And we can go to infinity ... (but not beyond)
for (let i = 0; i < 20; i++) {
    stream.next();
}
console.log(stream.next());

196418


## Generators

- It turns out that you can encode a surprising number of language features with first-class functions.
   * Laziness (just now)
   * Concurrency (message passing)
   * Recursion (Y-combinator)
   * Numbers and Booleans (Church encodings)
- But sometimes it's helpful to have a language abstraction that makes it more natural to program with.
- JavaScript and TypeScript make this easier by providing **generators**.

In [25]:
// Under the hood they use a `next()` method similar to what we did above.
// Note the asterisk after the keyword `function`:
function* makeFibStream(): Generator<number> {
    let i = 0;
    while (true) {
        // The yield keyword produces the next element of the stream and pauses until another is requested.
        yield fibFast(i);
        i ++;
    }
}

const stream = makeFibStream();
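
A generator object can also be driven by hand through its `next()` method, which returns an object with `value` and `done` fields. A small sketch, using a hypothetical finite generator `countTo` so that the end of the stream is visible:

```typescript
function* countTo(limit: number): Generator<number> {
    for (let i = 1; i <= limit; i++) {
        yield i; // pause here until the consumer calls next() again
    }
}

const counter = countTo(2);
console.log(counter.next()); // { value: 1, done: false }
console.log(counter.next()); // { value: 2, done: false }
console.log(counter.next()); // { value: undefined, done: true }
```

Constructs such as `for ... of` and `Array.from` call `next()` repeatedly and stop when `done` is true.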

### `take()`

Now we can make functions that manipulate streams directly. Here's one that just passes along the first `n` items:

In [26]:
function* take(n: number, sequence: IterableIterator<number> | number[]): Generator<number> {
    for (const x of sequence) {
        if (n === 0) {
            break;
        }
        yield x;
        n -= 1;
    }
}

In [27]:
Array.from(take(3, [1,2,3,4,5,6,7]));

[ 1, 2, 3 ]


In [28]:
Array.from(take(10, makeFibStream()));

[
  0, 1,  1,  2,  3,
  5, 8, 13, 21, 34
]


### `skip()`

This one skips the first `n` terms:

In [29]:
function *skip(n: number, sequence: IterableIterator<number> | number[]): Generator<number> {
    for (const x of sequence) {
        if (n > 0) {
            n -= 1;
            continue;
        }
        yield x;
    }
}

In [30]:
Array.from(skip(3, [1,2,3,4,5,6,7]));

[ 4, 5, 6, 7 ]


In [31]:
Array.from(take(10, skip(30, makeFibStream())));

[
    832040,  1346269,
   2178309,  3524578,
   5702887,  9227465,
  14930352, 24157817,
  39088169, 63245986
]


In [32]:
Array.from(take(10, skip(15, skip(15, makeFibStream()))));

[
    832040,  1346269,
   2178309,  3524578,
   5702887,  9227465,
  14930352, 24157817,
  39088169, 63245986
]


In [33]:
Array.from(take(10, skip(10, skip(10, skip(10, makeFibStream())))));

[
    832040,  1346269,
   2178309,  3524578,
   5702887,  9227465,
  14930352, 24157817,
  39088169, 63245986
]


### `filter()`

Our old friend the `filter()` function, but on a stream:

In [34]:
function* filter<T>(sequence: IterableIterator<T>, f: (x: T) => boolean): Generator<T> {
    for (const x of sequence) {
        if (f(x)) {
            yield x;
        }
    }
}

In [35]:
// First 10 even fibonacci numbers
Array.from(take(10, filter(makeFibStream(), x => x % 2 === 0)));

[
       0,     2,     8,
      34,   144,   610,
    2584, 10946, 46368,
  196418
]


### `map()`

And another old friend the `map()` function, but on a stream:

In [36]:
function* map<S, T>(sequence: IterableIterator<S>, f: (x: S) => T): Generator<T> {
    for (const x of sequence) {
        yield f(x);
    }
}

### Composition with Streams

- The abstraction of streams makes it easy to write compositional functions on infinite data structures.

In [37]:
Array.from(take(10, filter(makeFibStream(), x => x % 2 === 0)));

[
       0,     2,     8,
      34,   144,   610,
    2584, 10946, 46368,
  196418
]


In [38]:
Array.from(take(10, filter(skip(10, makeFibStream()), x => x % 2 === 0)));

[
       144,      610,
      2584,    10946,
     46368,   196418,
    832040,  3524578,
  14930352, 63245986
]


In [39]:
Array.from(take(10, map(filter(skip(10, makeFibStream()), x => x % 2 === 0), x => x * 2)));

[
       288,      1220,
      5168,     21892,
     92736,    392836,
   1664080,   7049156,
  29860704, 126491972
]


### Optional: Primes (sieve)

- A recursive stream!
- Note the asterisk after `yield` in the recursive _call_ to `sieve`: `yield*` delegates to another generator.

In [40]:
function *sieve(sequence: IterableIterator<number>): Generator<number> {
    // We're called with the next lowest prime we know.
    const first = sequence.next().value;
    
    // Generate that prime.
    yield first;
    
    // And continue the stream, but filter out all multiples of our prime
    yield *sieve(filter(sequence, x => x % first !== 0));
}

In [41]:
function* naturals(): Generator<number> {
    let i = 1;
    while (true) {
        yield i;
        i ++;
    }
}

// Skip the first natural (1), it's not a prime.
Array.from(take(100, sieve(skip(1, naturals()))));

[
    2,   3,   5,   7,  11,  13,  17,  19,  23,  29,  31,  37,
   41,  43,  47,  53,  59,  61,  67,  71,  73,  79,  83,  89,
   97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151,
  157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223,
  227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281,
  283, 293, 307, 311, 313, 317

### Primes (factors)

Another approach. First, a function that generates all factors of a number (not including 1 or itself), as a stream:

In [42]:
function* factors(n: number): Generator<number> {
    for (let i = 2; i < n; i++) {
        if (n % i === 0) {
            // Uncomment this to see how many factors are actually being generated:
            // console.log("Generating factor", i);
            yield i;
        }
    }
}

In [43]:
Array.from(factors(60));

[
   2,  3,  4,  5,  6,
  10, 12, 15, 20, 30
]


In [44]:
// Helper function to see if a stream is empty.
// Note: this consumes the stream's first element, if there is one.
function isEmpty(sequence: IterableIterator<number>): boolean {
    return sequence.next().done === true;
}

In [45]:
// A number n >= 2 is prime if it has no factors other than 1 and itself.
function isPrime(n: number): boolean {
    return isEmpty(factors(n));
}

In [46]:
for (const n of [2, 10, 15, 19, 60, 61, 1000000]) {
    console.log(n, isPrime(n));
}

2 true
10 false
15 false
19 true
60 false
61 true
1000000 false


Note that for that last large number, we didn't need to generate all of its factors, only the first one.
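
To make this concrete, we can count how many candidate divisors are actually tested. This is a sketch that repeats `factors` with a hypothetical counter `candidatesTried` added as instrumentation, and inlines the emptiness check into `isPrime`.

```typescript
let candidatesTried = 0; // instrumentation only, not part of the algorithm

function* factors(n: number): Generator<number> {
    for (let i = 2; i < n; i++) {
        candidatesTried += 1;
        if (n % i === 0) {
            yield i;
        }
    }
}

function isPrime(n: number): boolean {
    // Ask for just the first factor; `done` is true when there is none.
    return factors(n).next().done === true;
}

console.log(isPrime(1000000)); // false
console.log(candidatesTried);  // 1: only i = 2 was tested

candidatesTried = 0;
console.log(isPrime(61));      // true
console.log(candidatesTried);  // 59: every i from 2 to 60 was tested
```

For a composite number the generator is paused at its first `yield` and never resumed, so the rest of the loop never runs.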

# Composition

Lazy evaluation of streams makes it easier to compose functions.

For example, say we have:

* A `map()` function on arrays.
* An `or()` function on boolean arrays that returns whether _any_ element is true.

We want:

* An `any()` function that takes a predicate and an array and returns whether the predicate is true for any item.

Eager version:

```ts
function any<T>(f: (e: T) => boolean, arr: T[]): boolean {
    return or(map(f, arr));
}
```

But this is inefficient! It'll run `f()` on every item, even if the first returns `true`.

So we must write our own version of `any()` that copies the code of both `map()` and `or()`:

```ts
function any<T>(f: (e: T) => boolean, arr: T[]): boolean {
    for (const e of arr) {
        if (f(e)) {
            return true;
        }
    }
    
    return false;
}
```

Had the original array been a stream, the two would have been equally efficient. (**Why?**)
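
One possible sketch of the stream version, using generators. The names `mapLazy`, `orLazy`, and `anyLazy` are hypothetical, and the `seen` array is there only to show which items the predicate was applied to.

```typescript
function* mapLazy<S, T>(f: (x: S) => T, sequence: Iterable<S>): Generator<T> {
    for (const x of sequence) {
        yield f(x);
    }
}

// True as soon as any element is true; stops pulling from the stream at that point.
function orLazy(sequence: Iterable<boolean>): boolean {
    for (const b of sequence) {
        if (b) {
            return true;
        }
    }
    return false;
}

// Composed from the two pieces above, with no copied loop.
function anyLazy<T>(f: (e: T) => boolean, sequence: Iterable<T>): boolean {
    return orLazy(mapLazy(f, sequence));
}

const seen: number[] = [];
const found = anyLazy((n: number) => { seen.push(n); return n > 1; }, [1, 2, 3, 4]);
console.log(found); // true
console.log(seen);  // [ 1, 2 ]: the predicate never ran on 3 or 4
```

Because `mapLazy` only applies `f` when `orLazy` asks for the next element, the composed version does the same amount of work as the hand-written loop.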


# Java streams

In Java, `map()` and `filter()` are defined on streams rather than on arrays or collections:

```java
Arrays.stream(myArray)
    .map(site -> site.domainName)
    .filter(domainName -> domainName.startsWith("www"))
    .toList();
```

# Conclusion

What we learned:

1. **Eager evaluation** can make some things impossible to implement as a function, such as a conditional.
2. **Lazy evaluation** can make these possible.
3. **Thunks** encapsulate "computation to be done"; when they are pure, we can **memoize** (cache) the result.
4. **Streams** are the lazy version of an array.
5. **Generators** in JavaScript and TypeScript (and Python) make generating streams easy.
6. Streams can represent **infinite lists** without infinite loops.
7. Streams can make **composition** easier.
