# The Semipredicate Problem

The [semipredicate problem](https://en.wikipedia.org/wiki/Semipredicate_problem) is the problem of how to overcome ambiguity in distinguishing a real result from an indication that there is no result and/or that an error occurred.

A language-agnostic example of the semipredicate problem, and of solving it using *out-of-band signalling*, is having a primary output stream `stdout` and a secondary output / error output stream `stderr`.
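In Python, that separation might look like the following sketch. The function `parse_all` and its `out` and `err` parameters are made up for illustration, and in-memory streams stand in for the real `stdout` and `stderr` so the split is easy to inspect.

```python
import io
import sys


def parse_all(tokens, out=sys.stdout, err=sys.stderr):
    """Write each parsed integer to out; report unparseable tokens on err."""
    for token in tokens:
        try:
            number = int(token)
        except ValueError:
            # Out-of-band: problems never appear among the results.
            print(f"cannot parse {token!r}", file=err)
        else:
            print(number, file=out)


# Hypothetical stand-ins for stdout and stderr.
results, problems = io.StringIO(), io.StringIO()
parse_all(["3", "x", "-1"], out=results, err=problems)
```

Because `-1` is a legitimate result here, it could not safely double as an error code on the primary stream.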

A specific example from C is [ERR02-C. Avoid in-band error indicators](https://wiki.sei.cmu.edu/confluence/display/c/ERR02-C.+Avoid+in-band+error+indicators).
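Python's own standard library has a comparable contrast: `str.find` reports "not found" in-band by returning `-1`, while `str.index` reports it out-of-band by raising `ValueError`. A minimal sketch (the variable names are just for illustration):

```python
text = "semipredicate"

# In-band: find returns -1 for "not found", but -1 is also a valid index.
position = text.find("z")
wrong_char = text[position]  # No error: silently picks the last character.

# Out-of-band: index raises ValueError instead of returning a special value.
try:
    text.index("z")
except ValueError:
    found = False
else:
    found = True
```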

In [1]:
'The horse said, "It\'s no good."'  # Quoting is an example of in-band signaling.

'The horse said, "It\'s no good."'

Suppose we have a dict `d` and a key `k`, and we want to do one thing if `k` is in `d` (using the associated value), and something else if it is not.

Provided we don't need to worry about the dict being mutated between checking and subscripting, we can use LBYL in a way that avoids the semipredicate problem arising:

```python
if k in d:
    v = d[k]
    ...  # Use the value v.
else:
    ...  # Deal with k not being mapped.
```

Often (but not always) a better approach is EAFP, which solves the semipredicate problem by using *returns* as the main source of information and *exceptions* as a secondary, out-of-band source of information:

```python
try:
    v = d[k]
except KeyError:
    ...  # Deal with k not being mapped.
else:
    ...  # Use the value v.
```

`dict` has a `get` method, which can be useful when neither of those patterns is convenient, but which brings back the semipredicate problem, since it uses in-band signaling.

In [2]:
help(dict.get)

Help on method_descriptor:

get(self, key, default=None, /)
    Return the value for key if key is in the dictionary, else default.



In [3]:
d = {7: "car", 3: "boat", 5: "airplane"}

In [4]:
d.get(5)

'airplane'

In [5]:
d.get(10)

In [6]:
d.get(10, "ITS NOT HERE!")

'ITS NOT HERE!'
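To see the ambiguity concretely, here is a small sketch in which `None` is itself a stored value. The dict `settings` and its keys are hypothetical.

```python
settings = {"timeout": None}  # None is a legitimate stored value here.

present = settings.get("timeout")  # Key exists; its value is None.
absent = settings.get("retries")   # Key is missing; the default is None.

# The two results are the same object, so get alone can't tell us which
# of the two situations we are in.
ambiguous = present is absent
```

Passing an explicit default only moves the problem, unless that default is guaranteed never to be stored in the dict.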

Suppose you are writing code in a function, and the situation happens to be one where it is
much more convenient to use `dict.get` than to use `try`-`except`. In *some* situations, you
will know for sure that a particular value will never be present in the dict, so that value is
safe to pass as the default. There are also situations where no existing value is totally safe
as the default. It is nonetheless possible to solve the semipredicate problem and use `dict.get`.

In [7]:
# We need to make sure that o is not accessible to anything that might
# mutate d. So, in practice, we usually need o to be a local variable,
# thus this technique is, in practice, mostly useful in a function (but
# that's okay since most of our Python code is in a function anyway).
o = object()
if d.get(10, o) is o:
    print('Not here')    

Not here


In [8]:
help(d.get)

Help on built-in function get:

get(key, default=None, /) method of builtins.dict instance
    Return the value for key if key is in the dictionary, else default.



In [9]:
v = d.get(10, object())

In [10]:
w = d.get(10, object())

In [11]:
v == w

False

In [12]:
type(object)

type

In [13]:
d[10] = o

In [14]:
def initial_practice(dictionary, key, value):
    try:
        value = dictionary[key]
    except KeyError:
        print("the key is not mapped")  # Deal with key not being mapped.
    else:
        print(f'The value for {key} is {value}')  # Use the value.

In [15]:
d

{7: 'car', 3: 'boat', 5: 'airplane', 10: <object at 0x1c6a1f7fea0>}

In [16]:
initial_practice(d, 12, 'X wing')

the key is not mapped


In [17]:
initial_practice(d, 7,'X wing')

The value for 7 is car


In [18]:
def has_entry_eafp(dictionary, key, value):
    """
    Tell if a dictionary maps the given key to the given value.

    This implementation uses EAFP, catching KeyError.
    """
    try: 
        return dictionary[key] == value
    except KeyError:
        return False

In [19]:
d

{7: 'car', 3: 'boat', 5: 'airplane', 10: <object at 0x1c6a1f7fea0>}

In [20]:
has_entry_eafp(d, 12, 'X wing')

False

In [21]:
has_entry_eafp(d, 7, 'car')

True

In [22]:
has_entry_eafp(d, 7, 'Tie Fighter')

False

In [23]:
def has_entry_lbyl(dictionary, key, value):
    """
    Tell if a dictionary maps the given key to the given value.

    This implementation uses LBYL, checking with the "in" operator.
    """
    return (key in dictionary) and (dictionary[key] == value)

In [24]:
has_entry_lbyl(d, 12, 'X wing')

False

In [25]:
has_entry_lbyl(d, 7, 'car')

True

In [26]:
has_entry_lbyl(d, 7, 'Tie')

False

In [27]:
def has_entry_get(dictionary, key, value):
    """
    Tell if a dictionary maps the given key to the given value.

    This implementation uses the get method (in a safe way).
    """
    o = object()
    return dictionary.get(key, o) == value

In [28]:
has_entry_get(d, 12, 'X wing')

False

In [29]:
has_entry_get(d, 7, 'car')

True

In [30]:
has_entry_get(d, 7, 'Tie')

False

In [31]:
has_entry_get(d, 10, object())

False

In [32]:
has_entry_get(d, 10, d[10])

True

In [33]:
d[12] = 'Tie Fighter' 

In [34]:
d

{7: 'car',
 3: 'boat',
 5: 'airplane',
 10: <object at 0x1c6a1f7fea0>,
 12: 'Tie Fighter'}

## 2-argument forms of `next` and `iter`

Calling `next` with a second argument causes that argument to be returned instead of propagating `StopIteration` to the caller:

In [35]:
help(next)

Help on built-in function next in module builtins:

next(...)
    next(iterator[, default])
    
    Return the next item from the iterator. If default is given and the iterator
    is exhausted, it is returned instead of raising StopIteration.



In [36]:
it = iter([10, 20])

In [37]:
next(it, 'Good bye!')

10

In [38]:
next(it, 'Good bye!')

20

In [39]:
next(it, 'Good bye!')

'Good bye!'

This is sometimes useful. **The usual considerations, regarding the semipredicate problem, that apply to `dict.get`, apply to calling `next` with two arguments.** Personally, I use two-argument `next` less often than I use `dict.get`, and I don&rsquo;t use either one regularly.

The two-argument `next` is a little bit unusual&mdash;and different from `dict.get`&mdash;because calling `next` with two arguments is very different from calling it with one argument, *no matter what second argument you pass*.

- Calling `dict.get` with no second argument is like calling it with `None`.

- Calling `next` with no second argument is not like calling it with any value as the second argument. When called with one argument, `next` always propagates `StopIteration` to the caller if the iterator has run out. But calling it with two arguments *never* does that&mdash;it returns the second argument instead.

What&rsquo;s going on here is that:

- The usual, one-argument form of `next` behaves in a manner analogous to *subscripting* a `dict`&mdash;when there is no value to be returned, calling `next` with one argument raises `StopIteration`, much as subscripting a `dict` raises `KeyError`.

- The less commonly used two-argument form of `next` behaves in a manner analogous to `dict.get` with an explicit second argument&mdash;when there is no value to be returned, calling `next` with two arguments returns the second argument, just as calling `dict.get` returns its second argument.
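The analogy extends to the fix as well. The sketch below, with illustrative variable names, shows how a default of `None` is ambiguous when `None` can be an element, and how a freshly created sentinel object avoids that.

```python
values = iter([None])

# With None as the default, a real None element looks like exhaustion.
looks_exhausted = next(values, None) is None

# A freshly created object cannot already be an element of the iterator.
values = iter([None])
sentinel = object()

item = next(values, sentinel)
got_element = item is not sentinel  # A real element (None) was returned.

after = next(values, sentinel)
exhausted = after is sentinel  # Now the iterator really has run out.
```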

`iter` can also be called with two arguments, which is totally different from calling it with one.

- Calling it with one argument returns an iterator to the iterable passed as the argument.

- Calling it with two arguments returns an iterator that repeatedly calls the first argument&mdash;which is expected to be a function or otherwise callable&mdash;and yields the values returned, until the value returned is equal to the second argument.

In [40]:
help(iter)

Help on built-in function iter in module builtins:

iter(...)
    iter(iterable) -> iterator
    iter(callable, sentinel) -> iterator
    
    Get an iterator from an object.  In the first form, the argument must
    supply its own iterator, or be a sequence.
    In the second form, the callable is called until it returns the sentinel.



The two-argument form of `iter` is rarely used, but it can be helpful in some situations.

In [41]:
a = ['ham', 'spam', 'eggs', 'foo', 'bar', 'baz', 'foobar', 'quux']
it = iter(a.pop, 'eggs')
it

<callable_iterator at 0x1c6a36af820>

Notice the type&mdash;`callable_iterator`:

- The type of iterator `iter` returns when called with one argument is determined by what type of thing is being iterated, and the logic for doing so is supplied by that type.

- But when called with two arguments, `iter` synthesizes an iterator of type `callable_iterator`.

In [42]:
next(it)

'quux'

In [43]:
a  # a.pop has been called once so far.

['ham', 'spam', 'eggs', 'foo', 'bar', 'baz', 'foobar']

In [44]:
list(it)

['foobar', 'baz', 'bar', 'foo']

In [45]:
a

['ham', 'spam']
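Another situation where two-argument `iter` can help is reading fixed-size chunks from a file-like object until `read` returns an empty result. This is a sketch only: the in-memory `stream` and the chunk size of 4 are arbitrary choices for illustration.

```python
import io
from functools import partial

stream = io.BytesIO(b"abcdefghij")

# partial(stream.read, 4) is a zero-argument callable that reads 4 bytes.
# At end of stream, read returns b"", which is the sentinel that ends iteration.
chunks = list(iter(partial(stream.read, 4), b""))
```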

There is a conceptual connection between two-argument `next` and two-argument `iter`&mdash;they both involve end sentinels. **But they need not be used together and there is no special reason to use them together.** Whether one calls `iter` with one argument or two typically has no bearing on whether or not one calls `next` on the returned iterator with a second argument.

The two-argument form of `iter` also suffers from the semipredicate problem. Do you see how?

## Semipredicate problem with `StopIteration`

*Why `StopIteration` turns into `RuntimeError` when it propagates out of a generator.*

As a simple example, consider the situation of a singleton generator that yields the first element of an iterable:

In [46]:
def first(iterable):
    """Return an iterator to exactly one element, the first one of iterable."""
    yield next(iter(iterable))

This works fine on all kinds of iterables. It works fine on sequences:

In [47]:
it = first([10, 20, 30])
it

<generator object first at 0x000001C6A46A4120>

In [48]:
list(it)

[10]

It also works on iterators, such as the generator object produced by evaluating a generator expression (since calling `iter` on an iterator returns an equivalent iterator, nearly always the very same iterator object):

In [49]:
it = first(word.upper() for word in ['foo', 'bar', 'baz'])
it

<generator object first at 0x000001C6A46A4430>

In [50]:
list(it)

['FOO']

But it raises `RuntimeError` when it attempts to process an empty iterable:

In [51]:
it = first([])
it

<generator object first at 0x000001C6A46A4510>

In [52]:
list(it)

RuntimeError: generator raised StopIteration

Although it is possible to catch `RuntimeError`, you shouldn't (unless, for example, you're building a REPL that implements custom logic for showing even fatal errors to the user).

What's happening here is that **`StopIteration` is prohibited from propagating outside of a generator**. When a `StopIteration` exception tries to do so, it is turned into a `RuntimeError`.

This is with good reason. `first` is supposed to always be able to generate exactly one element&mdash;if it can't, that should be an error. But even in the absence of any error, `StopIteration` is the way iterators (including generator objects) indicate that they are exhausted.

Consider the following code:

In [53]:
def generate(early_return):
    yield 'A'
    if early_return:
        return
    yield 'B'

In [54]:
it = generate(False)
it

<generator object generate at 0x000001C6A4F9F220>

In [55]:
list(it)

['A', 'B']

In [56]:
it = generate(True)
it

<generator object generate at 0x000001C6A4F9F990>

In [57]:
list(it)

['A']

When `it` is iterated over to produce a list, the `list` constructor knows that all elements have been consumed when it calls `next(it)` and, rather than a value being returned, `StopIteration` is raised:

In [58]:
it = generate(True)

In [59]:
next(it)

'A'

In [60]:
next(it)

StopIteration: 

Nonetheless, you may not raise `StopIteration` in a generator, unless you are going to catch it before it propagates out of the generator. Consider this broken attempt to implement `generate` by raising `StopIteration` directly rather than returning:

In [61]:
def generate_broken(early_return):
    yield 'A'
    if early_return:
        raise StopIteration()  # Bad.
    yield 'B'

In [62]:
list(generate_broken(False))  # Fine, the raise statement is never run.

['A', 'B']

In [63]:
list(generate_broken(True))  # Not fine.

RuntimeError: generator raised StopIteration

This is because, most of the time, when `StopIteration` would be raised but not handled in a generator, it would not be due to an explicit, deliberate `raise` (as above), but instead due to `next` being called (directly or indirectly) on an iterator that may have been expected to be able to yield another value. The intent of the programmer is ambiguous in this situation: sometimes the programmer may intend that `StopIteration` be passed to the caller, while other times the programmer may not intend this situation to occur at all, or may wish that it be treated as an error.
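When the intent is that running out of elements be an error, one option is to state that explicitly by catching `StopIteration` and raising something else. The following is only a sketch: `first_checked` is a hypothetical variant of `first`, and the choice of `ValueError` is an assumption rather than anything the language prescribes.

```python
def first_checked(iterable):
    """Yield the first element of iterable; raise ValueError if it is empty."""
    try:
        element = next(iter(iterable))
    except StopIteration:
        # Exhaustion here is an error, so say so with a different exception.
        raise ValueError("iterable is empty") from None
    yield element


nonempty_result = list(first_checked([10, 20, 30]))

try:
    list(first_checked([]))
except ValueError as error:
    empty_message = str(error)
```

Since `first_checked` is a generator function, the `ValueError` is raised when the generator is first advanced, not when `first_checked` is called.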

`StopIteration` turning into `RuntimeError` when propagating out of a generator is a rule of the Python language: you don't have to write code to make that happen, nor can you prevent it. But other kinds of iterators (that is, besides generators) do not automatically turn `StopIteration` into `RuntimeError`. Sometimes this makes them brittle. For example, suppose we have an iterable of iterators and we want to map it to the first items obtained from each of these iterators:

In [64]:
it1 = iter([10, 20, 30])
it2 = iter([])
it3 = iter([11, 22])
firsts = map(next, [it1, it2, it3])
firsts

<map at 0x1c6a4777fd0>

Initially, this `map` object seems to work okay. Calling `next` on `firsts` causes `firsts` to call `next` on `it1` and return the result:

In [65]:
next(firsts)

10

Calling `next` on `firsts` the second time causes `firsts` to call `next` on `it2`, but that `next` call cannot be completed. It raises `StopIteration`. Since `firsts` doesn't catch that exception, and since it is not automatically converted to `RuntimeError` because `firsts` is a `map` (not a generator), it looks like `firsts` has itself been exhausted:

In [66]:
next(firsts)

StopIteration: 

That might be okay... except, `firsts` *hasn't* been exhausted. Calling `next` on `firsts` a third time causes it to call `next` on `it3`, which returns a value:

In [67]:
next(firsts)

11

This is *very bad*, because when `next(firsts)` raises `StopIteration`, this was supposed to indicate unambiguously to us that `firsts` was exhausted. It is a requirement of iterator design in Python that, if calling `next` on an iterator object raises `StopIteration`, then immediately calling `next` on it again must again raise `StopIteration`. Yet the built-in `map` type violates this, due to the semipredicate problem of *which iterator `StopIteration` came from*. (When using `map`, you should take responsibility to ensure this will not happen.)
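One way to take that responsibility is to make sure the function passed to `map` never raises `StopIteration` itself. The sketch below does this with two-argument `next` and a sentinel. The names `MISSING` and `next_or_missing` are hypothetical, and whether a placeholder value is acceptable depends on the caller.

```python
MISSING = object()  # Sentinel that cannot be an element of any input.


def next_or_missing(iterator):
    """Return the next item from iterator, or MISSING if it is exhausted."""
    return next(iterator, MISSING)


it1 = iter([10, 20, 30])
it2 = iter([])
it3 = iter([11, 22])

# The only StopIteration that can now come out of firsts is the one
# signaling that firsts itself is exhausted.
firsts = map(next_or_missing, [it1, it2, it3])
collected = list(firsts)
```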

#### **Exercise 1**

`map` is important and useful. It maps lazily. But it is not the only way to map lazily, nor even the most commonly used.

**(a)** A generator expression will do it, and that is the most common way. Write the code above (including copying the assignments to `it1`, `it2`, and `it3`, so they are new iterators) but use an appropriate generator expression in place of the call to `map`. Observe how, in your second call to `next`, the `StopIteration` is converted to a `RuntimeError`.

**(b)** A generator function will do it. Write the code above (including copying the assignments to `it1`, `it2`, and `it3`, so they are new iterators) but, before assigning to `firsts`, write a generator function that does the mapping. Call that function, assigning the result to `firsts`. Observe how, just as with a generator expression, your second call to `next` gets a `RuntimeError`.

If you can, I recommend doing both part (a) and part (b) without referring to anything else in this project. However, if you need a reminder about how to do mapping with a generator expression or a generator function, your `my_map` and `my_map_alt` functions in `gencomp1.py` demonstrate this. If you really need help, or if you want to check your solutions, you can check the code on this topic that we did in `gencomp2.ipynb` (it is at the end of that notebook, as of this writing).

In [68]:
it1 = iter([10, 20, 30])
it2 = iter([])
it3 = iter([11, 22])

firsts = (next(it) for it in [it1, it2, it3])
firsts

<generator object <genexpr> at 0x000001C6A519D3F0>

In [69]:
next(firsts)

10

In [70]:
next(firsts)

RuntimeError: generator raised StopIteration

In [71]:
def gen_func(it1, it2, it3):
    for it in [it1, it2, it3]: 
        yield next(it)

In [72]:
it1 = iter([10, 20, 30])
it2 = iter([])
it3 = iter([11, 22])

firsts = gen_func(it1, it2, it3)
firsts

<generator object gen_func at 0x000001C6A519D930>

In [73]:
next(firsts)

10

In [74]:
next(firsts)

RuntimeError: generator raised StopIteration

#### **Exercise 2**

There's another place `StopIteration` is raised. For the purpose of this exercise, let's assume `map` iterates through its input iterable (above, this is `[it1, it2, it3]`) using a `for` loop. (Actually, `map` is implemented in C, so it is not using any Python language construct directly.) This involves calling `iter` on the iterable to get an iterator. `next` is then called on that iterator a total of four times. The first three calls to `next` return values (`it1`, `it2`, and `it3`). The fourth raises `StopIteration`.

**(a)** Given the above assumption that `map` uses a `for` loop, this `StopIteration` is always caught by `map` and can never make it to the caller. Why is that?

**(b)** It is permitted to raise `StopIteration` from the body of a `for` loop. This is never confused with the `StopIteration` that the `for` loop catches after exhausting its iterator. Why is that? Why is there no semipredicate problem here?

Parts (a) and (b) can be answered either separately or together, but to answer part (b), you should show code representing the general functionality of a `for` loop, in terms of a `while` loop and `try`-`except`. Feel free to write such code anew here (which I suggest), or to copy it from where you have written it before. Annotate it with comments that explain why the `for` loop's own logic can never catch a `StopIteration` raised in its body, and also why it can never *fail* to catch a `StopIteration` that is raised by the for loop's own inbuilt call to `next`. (Make sure your representation of how a `for` loop works is compatible both with iterator and non-iterator iterables, and with arbitrary flow-control statements in its body&mdash;but don't worry about accommodating an `else` clause on the loop itself.)

In [75]:
# Code for context
it1 = iter([10, 20, 30])
it2 = iter([42])
it3 = iter([11, 22])
firsts = map(next, [it1, it2, it3])
firsts
next(firsts)
next(firsts)
next(firsts)
next(firsts) # THIS raises StopIteration

StopIteration: 

#### Answer to (a)

That `StopIteration` is always caught and can never make it to the caller because it is raised by the `for` loop's own implicit call to `next`, and that call is exactly what the loop's internal exception handling covers.

#### Answer to (b)

Yes, it is permitted to raise `StopIteration` from the body of a `for` loop. It can't be confused with the `StopIteration` that the `for` loop catches after exhausting its iterator, because the loop's exception handling covers only its own call to `next`, not the loop body. So a `StopIteration` raised in the body propagates out of the loop like any other exception, while the `StopIteration` that the loop catches is handled by the loop's internal logic and is never **propagated** out of the loop. The code below will demonstrate this.

In [76]:
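def my_for_wrong(iterable, action):
    it = iter(iterable)
    # Wrong: the try block covers action(...) as well as next(it), so a
    # StopIteration raised by action is swallowed as if it were exhaustion.
    try:
        while True:
            action(next(it))
    except StopIteration:
        return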

In [77]:
def act(x): 
    raise StopIteration
my_for_wrong([1, 2, 3], act)

In [78]:
for element in [1, 2, 3]: 
    raise StopIteration

StopIteration: 

In [85]:
def my_for_right(iterable, action): 
    it = iter(iterable)
    while True:
        try:
            # StopIteration will always be caught because next(it) is the only code being tried. 
            element = next(it)
        except StopIteration:
            break
        
        # will NOT catch StopIteration raised from this
        action(element)

In [86]:
def act(x): 
    raise StopIteration
my_for_right([1, 2, 3], act)

StopIteration: 

In [87]:
my_for_right([1, 2, 3], print)

1
2
3


#### **Exercise 3**

Assign `it1`, `it2`, and `it3` again, and this time assign a list comprehension to `firsts`. In all the previous examples, `firsts` was some kind of iterator. Now it is not. Observe that `StopIteration` is raised, but that there is no ambiguity here. To further clarify why this `StopIteration` can only be from attempting to evaluate your expression of the form `next(it)` in the list comprehension, consider what you just demonstrated in exercise 2 about `for` loops, and also consider what `for` loop a list comprehension is like.

Then write that `for` loop. That is, assign `it1`, `it2`, and `it3` once more, assign an empty list to a variable, and write a `for` loop that populates that list. Run the code and observe where `StopIteration` is raised and why. Add a comment to your code that connects the insights of exercise 2 to this code.
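One possible shape for this exercise is sketched below. The `try`-`except` wrappers and the two flag variables are additions so that the sketch runs to completion. Without them, the `StopIteration` would simply propagate and be displayed.

```python
it1 = iter([10, 20, 30])
it2 = iter([])
it3 = iter([11, 22])

comprehension_raised = False
try:
    firsts = [next(it) for it in [it1, it2, it3]]
except StopIteration:
    # firsts is a list being built, not an iterator, so nothing could
    # mistake this for "firsts is exhausted".
    comprehension_raised = True

it1 = iter([10, 20, 30])
it2 = iter([])
it3 = iter([11, 22])

firsts = []
loop_raised = False
try:
    for it in [it1, it2, it3]:
        # As in exercise 2, the for loop only catches StopIteration from its
        # own call to next on the list's iterator. This one is raised in the
        # body, by next(it2), so it propagates out of the loop.
        firsts.append(next(it))
except StopIteration:
    loop_raised = True
```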

#### **Exercise 4**

In this project, we first encountered the `StopIteration` semipredicate problem in `gencomp1.py`, in the `my_zip` function.

**(a)** Review the code of that function. Copy it to a cell in this notebook and remove the doctests. (I recommend you remove the whole docstring, or at least condense it down to a single line.) Verify that `my_zip([10, 20, 30], [], [11, 22])` behaves properly.

**(b)** Make a copy of this (doctest-stripped) code in a new cell, and modify the `yield` statement to use a generator expression instead of a list comprehension. Show how iterating through `my_zip([10, 20, 30], [], [11, 22])` now raises `RuntimeError`.

**(c)** Make another copy and modify it again so the `yield` statement uses `map` instead of any kind of comprehension. Attempting to iterate all the way through `my_zip([10, 20, 30], [], [11, 22])` would now enter an infinite loop. Call `next` on the result of `my_zip([10, 20, 30], [], [11, 22])` a number of times to see what happens.

**(d)** Figure out exactly why the behavior in (c) is happening. Feel free to write code to experiment with variations or simplifications to figure it out. Write a brief description of what is going on here. Your description need not be framed explicitly in terms of the semipredicate problem&mdash;the most important characteristic of your description is that it be clear to *you*&mdash;but you should make sure you understand how this is a case of the semipredicate problem.

#### **Exercise 5**

Try to come up with a clearer and more informative way of writing the explanatory comment we currently have in `gencomp1.my_zip`. It is okay if it is several lines long. If you succeed, replace the comment there with the new description. (If you want to write any scratchwork in this notebook, feel free, but that is in no way a requirement of this exercise.)