In [None]:
# Initialization cell
try:  # for CS1302 JupyterLite pyodide kernel
    import piplite

    with open("requirements.txt") as f:
        for package in f:
            package = package.strip()
            print("Installing", package)
            await piplite.install(package)
except ModuleNotFoundError:
    pass

# Sequence Types

**CS1302 Introduction to Computer Programming**
___

In [None]:
import random
%reload_ext divewidgets

## Motivation of composite data type

The following code calculates the average of five numbers:

In [None]:
def average_five_numbers(n1, n2, n3, n4, n5):
    return (n1 + n2 + n3 + n4 + n5) / 5


average_five_numbers(1, 2, 3, 4, 5)

What about using the above function to compute the average household income in Hong Kong?  
The labor force in Hong Kong is close to [4 million](https://www.gov.hk/en/about/abouthk/factsheets/docs/employment.pdf).
- Should we create a variable to store the income of each individual?
- Should we recursively apply the function to groups of five numbers?

What we need is
- a *composite data type* that can keep a variable number of items, so that  
- we can then define a function that takes an object of the *composite data type*,
- and returns the average of all items in the object.

**How to store a sequence of items in Python?**

We learned a composite data type that stores a sequence of characters. What is it?

`tuple` and `list` are two other built-in sequence types for ordered collections of objects. Unlike `str`, they can store items of possibly different types.

Indeed, we have already used tuples and lists before.

In [None]:
%%optlite -h 400
a_list = "1 2 3".split()
a_tuple = (lambda *args: args)(1, 2, 3)
a_list[0] = 0
a_tuple[0] = 0

**What is the difference between tuple and list?**

```{important}

- List is [*mutable*](https://docs.python.org/3/library/stdtypes.html#index-21) so programmers can change its items.
- Tuple is [*immutable*](https://docs.python.org/3/glossary.html#term-immutable) like `int`, `float`, and `str`, so
   - programmers can be certain the content stays unchanged, and
   - Python can preallocate a fixed amount of memory to store its content.
```
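
As a minimal sketch of the distinction above (the variable names are illustrative and not from the notebook), assigning to an item succeeds for a list but raises a `TypeError` for a tuple:

```python
numbers_list = [1, 2, 3]
numbers_list[0] = 0  # lists are mutable: the item is replaced in place

numbers_tuple = (1, 2, 3)
try:
    numbers_tuple[0] = 0  # tuples are immutable: item assignment is not supported
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(numbers_list, numbers_tuple, tuple_changed)
```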

## Constructing sequences

**How to create tuple/list?**

Mathematicians often represent a set of items in two different ways:
1. [Roster notation](https://en.wikipedia.org/wiki/Set_(mathematics)#Roster_notation), which enumerates the elements of the set, e.g.,

$$ \{0, 1, 4, 9, 16, 25, 36, 49, 64, 81\} $$

2. [Set-builder notation](https://en.wikipedia.org/wiki/Set-builder_notation), which describes the content using a rule for constructing the elements, e.g.,

$$ \{x^2| x\in \mathbb{N}, x< 10 \}, $$

namely the set of perfect squares less than 100.

```{important}

Python also provides two corresponding ways to create a tuple/list:  
1. [Enclosure](https://docs.python.org/3/reference/expressions.html?highlight=literals#grammar-token-enclosure)
2. [Comprehension](https://docs.python.org/3/reference/expressions.html#index-12)
```
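
For instance, the squares in the set above can be written as a list in both ways. This is only a small sketch; the names `squares_enclosed` and `squares_comprehended` are made up for illustration:

```python
# Enclosure: enumerate the items explicitly.
squares_enclosed = [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# Comprehension: describe the items by a rule.
squares_comprehended = [x ** 2 for x in range(10)]

print(squares_enclosed == squares_comprehended)
```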

**How to create a tuple/list by enumerating its items?**

To create a tuple, we enclose a comma-separated sequence of objects in parentheses:

In [None]:
%%optlite -h 450
empty_tuple = ()
singleton_tuple = (0,)   # why not (0)?
heterogeneous_tuple = (singleton_tuple, (1, 2.0), print)
enclosed_starred_tuple = (*range(2), *"23")

```{note}

- If the enclosed sequence has one term, there must be a comma after the term.
- The elements of a tuple can have different types.
- The unpacking operator `*` can unpack an iterable into a sequence in an enclosure.
```
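
The notes above can be checked with `type` and `print`. The following is a sketch with illustrative variable names only:

```python
not_a_tuple = (0)  # parentheses only group the expression, so this is the int 0
a_tuple = (0,)  # the trailing comma makes a tuple with one item

unpacked = (*range(2), *"23")  # items of both iterables, in order

print(type(not_a_tuple).__name__, type(a_tuple).__name__)
print(unpacked)
```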

To create a list, we use square brackets to enclose a comma-separated sequence of objects.

In [None]:
%%optlite -h 400
empty_list = []
singleton_list = [0]  # no need to write [0,]
heterogeneous_list = [singleton_list, (1, 2.0), print]
enclosed_starred_list = [*range(2), *"23"]

We can also create a tuple/list from other iterables using the constructors `tuple`/`list`, and from existing tuples/lists using addition and multiplication, as with `str`.

In [None]:
%%optlite -l -h 900
str2list = list("Hello")
str2tuple = tuple("Hello")
range2list = list(range(5))
range2tuple = tuple(range(5))
tuple2list = list((1, 2, 3))
list2tuple = tuple([1, 2, 3])
concatenated_tuple = (1,) + (2, 3)
concatenated_list = [1, 2] + [3]
duplicated_tuple = (1,) * 2
duplicated_list = 2 * [1]

**Exercise** 

Explain the difference between the following two expressions. Why must a singleton tuple have a comma after the item?

In [None]:
print((1 + 2) * 2, (1 + 2,) * 2, sep="\n")

YOUR ANSWER HERE

**How to use a rule to construct a tuple/list?**

We can specify the rule using a [comprehension](https://docs.python.org/3/reference/expressions.html#index-12),  
which we have used in a [generator expression](https://docs.python.org/3/reference/expressions.html#index-22).  
E.g., the following is a Python one-liner that returns a generator of prime numbers.

In [None]:
all?
prime_sequence = lambda stop: (
    x for x in range(2, stop) if all(x % divisor for divisor in range(2, x))
)
print(*prime_sequence(100))

There are two comprehensions used:
- In `all(x % divisor for divisor in range(2, x))`, the comprehension passes a generator of remainders to the function `all`, which returns `True` if all the remainders are non-zero and `False` otherwise.
- In the return value `(x for x in range(2, stop) if ...)` of the anonymous function, the comprehension creates a generator of numbers from 2 to `stop-1` that satisfy the condition of the `if` clause. 
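
To see how `all` treats the remainders, here is a sketch for a few specific numbers. The helper name `remainders` is hypothetical and not part of the notebook; it builds a list rather than a generator so that the remainders can be printed.

```python
def remainders(x):
    """Return the list of remainders of x divided by 2, ..., x - 1."""
    return [x % divisor for divisor in range(2, x)]


print(remainders(7), all(remainders(7)))  # no zero remainder, so 7 is prime
print(remainders(9), all(remainders(9)))  # 9 % 3 == 0, so 9 is not prime
print(remainders(2), all(remainders(2)))  # no divisors to try: all returns True
```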

**Exercise** 

Use comprehension to define a function `composite_sequence` that takes a non-negative integer `stop` and returns a generator of composite numbers strictly smaller than `stop`. Use `any` instead of `all` to check if a number is composite.

In [None]:
any?
# YOUR CODE HERE
raise NotImplementedError()

print(*composite_sequence(100))

We can construct a list instead of a generator using [list comprehension](https://docs.python.org/3/glossary.html#term-list-comprehension):

In [None]:
[x ** 2 for x in range(10)]  # Enclose comprehension by brackets

**Is the list comprehension the same as applying `list` to a generator expression?**

In [None]:
list(x ** 2 for x in range(10))  # Pass a generator expression to list

List comprehension is more efficient as it does not need to create a generator first:

In [None]:
%%timeit
[x ** 2 for x in range(10)]

In [None]:
%%timeit
list(x ** 2 for x in range(10))

**Exercise** 

The following are two different ways to use comprehension to construct a tuple. Which one is faster? Try predicting the results before running them.

In [None]:
%%timeit
tuple(x for x in range(100))

In [None]:
%%timeit
tuple([x for x in range(100)])

YOUR ANSWER HERE

With list comprehension, we can simulate a sequence of biased coin flips.

In [None]:
from random import random as rand

p = rand()  # unknown bias
coin_flips = ["H" if rand() <= p else "T" for i in range(1000)]
print("Chance of head:", p)
print("Coin flips:", *coin_flips)

We can then estimate the bias by the fraction of heads coming up.

In [None]:
def average(seq):
    return sum(seq) / len(seq)


head_indicators = [1 if outcome == "H" else 0 for outcome in coin_flips]
fraction_of_heads = average(head_indicators)
print("Fraction of heads:", fraction_of_heads)

```{note}

`sum` and `len` return the sum and the length of the sequence, respectively.
```

**Exercise** 

Define a function `variance` that takes in a sequence `seq` and returns the [variance](https://en.wikipedia.org/wiki/Variance) of the sequence.

In [None]:
def variance(seq):
    # YOUR CODE HERE
    raise NotImplementedError()


delta = (variance(head_indicators) / len(head_indicators)) ** 0.5
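# Center the interval at the estimate rather than at the unknown bias p.
print(
    "95% confidence interval: [{:.2f},{:.2f}]".format(
        fraction_of_heads - 2 * delta, fraction_of_heads + 2 * delta
    )
)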

## Selecting items in a sequence

**How to traverse a tuple/list?**

Instead of calling dunder methods directly, we can use a `for` loop to iterate over all the items in order.

In [None]:
a = (*range(5),)
for item in a:
    print(item, end=" ")

To do it in reverse, we can use the `reversed` function.

In [None]:
reversed?
a = [*range(5)]
for item in reversed(a):
    print(item, end=" ")

We can also traverse multiple tuples/lists simultaneously by `zip`ping them.

In [None]:
zip?
a = (*range(5),)
b = reversed(a)
for item1, item2 in zip(a, b):
    print(item1, item2)

**How to select an item in a sequence?**

```{important}

Sequence objects such as `str`/`tuple`/`list` implement the [*getter method* `__getitem__`](https://docs.python.org/3/reference/datamodel.html#object.__getitem__) to return their items.
```

We can select an item of a sequence `a` by [subscription](https://docs.python.org/3/reference/expressions.html#subscriptions) 
```Python
a[i]
``` 
where `a` is a sequence and `i` is an integer index.
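
As a rough illustration, subscription with an integer index gives the same result as calling `__getitem__` explicitly. The sample values below are arbitrary:

```python
samples = ["abc", (1, 2, 3), [1, 2, 3]]  # a str, a tuple, and a list

for seq in samples:
    # seq[1] is the usual way to write seq.__getitem__(1)
    print(type(seq).__name__, seq[1], seq.__getitem__(1))
```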

A non-negative index indicates the distance from the beginning.

$$\boldsymbol{a} = (a_0, ... , a_{n-1})$$

In [None]:
%%optlite -h 500
a = (*range(10),)
print(a)
print("Length:", len(a))
print("First element:", a[0])
print("Second element:", a[1])
print("Last element:", a[len(a) - 1])
print(a[len(a)])  # IndexError

```{caution}
`a[i]` with `i >= len(a)` results in an `IndexError`. 
```

A negative index represents a negative offset from an imaginary element one past the end of the sequence.

$$\begin{aligned} \boldsymbol{a} &= (a_0, ... , a_{n-1})\\
& = (a_{-n}, ..., a_{-1})
\end{aligned}$$

In [None]:
%%optlite -h 500
a = [*range(10)]
print(a)
print("Last element:", a[-1])
print("Second last element:", a[-2])
print("First element:", a[-len(a)])
print(a[-len(a) - 1])  # IndexError

```{caution}
`a[i]` with `i < -len(a)` results in an `IndexError`. 
```
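
Putting the two cautions together, the valid indexes of a sequence `a` satisfy `-len(a) <= i < len(a)`, and `a[-k]` refers to the same item as `a[len(a) - k]`. A small sketch to check this, where the helper `is_valid_index` is hypothetical:

```python
def is_valid_index(seq, i):
    """Return True if seq[i] does not raise an IndexError."""
    try:
        seq[i]
        return True
    except IndexError:
        return False


a = [*range(10)]
n = len(a)

# Negative index -k selects the same item as non-negative index n - k.
same_item = all(a[-k] == a[n - k] for k in range(1, n + 1))
print(same_item)

# Only the indexes from -n to n - 1 are valid.
invalid = [i for i in range(-n - 2, n + 2) if not is_valid_index(a, i)]
print(invalid)
```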

**How to select multiple items?**

We can use [slicing](https://docs.python.org/3/reference/expressions.html#slicings) to select a range of items as follows:
```Python
a[start:stop]
a[start:stop:step]
```

The selected items correspond to those indexed using `range`:

```Python
(a[i] for i in range(start, stop))
(a[i] for i in range(start, stop, step))
```
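
For indexes that lie within the sequence, this correspondence can be checked directly. The sketch below uses arbitrary values for `start`, `stop`, and `step`:

```python
a = (*range(10, 20),)
start, stop, step = 1, 8, 3

sliced = a[start:stop:step]
indexed = tuple(a[i] for i in range(start, stop, step))

print(sliced, indexed, sliced == indexed)
```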

In [None]:
a = (*range(10),)
print(a[1:4])
print(a[1:4:2])

Unlike `range`, the parameters for slicing take their default values if missing or equal to None:

In [None]:
a = [*range(10)]
print(a[:4])  # start defaults to 0
print(a[1:])  # stop defaults to len(a)
print(a[1:4:])  # step defaults to 1
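
The claim about `None` can be checked by writing `None` explicitly in place of a missing parameter. This is only a sketch; by contrast, `range` rejects `None` with a `TypeError`:

```python
a = [*range(10)]

print(a[None:4] == a[:4])  # start=None behaves like a missing start
print(a[1:None] == a[1:])  # stop=None behaves like a missing stop
print(a[1:4:None] == a[1:4])  # step=None behaves like a missing step

try:
    range(None, 4)
    range_accepts_none = True
except TypeError:
    range_accepts_none = False
print(range_accepts_none)
```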

The parameters can also take negative values:

In [None]:
print(a[-1:])
print(a[:-1])
print(a[::-1])  # What are the default values used here?

A mixture of negative and positive values is also okay:

In [None]:
print(a[-1:1])      # equal [a[-1], a[0]]?
print(a[1:-1])      # equal []?
print(a[1:-1:-1])   # equal [a[1], a[0]]?
print(a[-100:100])  # result in IndexError like subscription?

**Exercise** (Challenge) 

Complete the following function to return a tuple `(start, stop, step)` such that `range(start, stop, step)` gives the non-negative indexes of the sequence of elements selected by `a[i:j:k]`.

```{hint}

See [notes 3 to 5 in the Python documentation](https://docs.python.org/3/library/stdtypes.html#common-sequence-operations).
```

In [None]:
def sss(a, i=None, j=None, k=None):
    # YOUR CODE HERE
    raise NotImplementedError()
    return start, stop, step


a = [*range(10)]
assert sss(a, -1, 1) == (9, 1, 1)
assert sss(a, 1, -1) == (1, 9, 1)
assert sss(a, 1, -1, -1) == (1, 9, -1)
assert sss(a, -100, 100) == (0, 10, 1)

**Exercise** 

With slicing, we can now implement a practical sorting algorithm called [quicksort](https://en.wikipedia.org/wiki/Quicksort) to sort a sequence. Explain how the code works:

In [None]:
def quicksort(seq):
    """Return a sorted list of items from seq."""
    if len(seq) <= 1:
        return list(seq)
    i = random.randint(0, len(seq) - 1)
    pivot, others = seq[i], [*seq[:i], *seq[i + 1 :]]
    left = quicksort([x for x in others if x < pivot])
    right = quicksort([x for x in others if x >= pivot])
    return [*left, pivot, *right]


seq = [random.randint(0, 99) for i in range(10)]
print(seq, quicksort(seq), sep="\n")

YOUR ANSWER HERE