# Compound Data Types

Our purpose here is to discuss the basic properties of compound data types in Python. In Python, it is helpful to distinguish what is "built-in" from what must be brought in with an `import` statement. It is commonly accepted that Python's place in contemporary scientific computing is due in large part to `numpy` and its associated libraries -- the Python Numerical Stack -- none of which are part of "pure" Python; they must be imported. For the moment, however, let us set these aside and consider only that which is "built-in".

Note: throughout we will make use of the `assert` statement. `assert` statements are written using the form 

```
assert expression
```

or

```
assert expression, message
```

If `expression` evaluates to `True`, `assert` will do nothing. 

In [1]:
assert True

If `expression` evaluates to `False`, `assert` will raise an `AssertionError` with `message`. Where we expect that to happen, it is best to "catch" this `AssertionError` and its associated error message using a `try`/`except` block.

In [2]:
try:
    assert False, "This is False!"
except AssertionError as error_message:
    print(error_message)

This is False!


## Constants

There are a handful of values which simply exist as constants. For now, we need only these three:

- `False`
- `True`
- `None`

There is a strong intuition for the meaning of each of these. For example, it should be clear that `False` and `True` are opposites of each other. 

In [3]:
assert False != True

It is also clear that `None` and `False` are not the same thing.

In [4]:
assert False != None

In [5]:
assert None is not False

Less intuitive is the way `None` is interpreted in a conditional check.

In [6]:
try:
    assert None, "This is False!"
except AssertionError as error_message:
    print(error_message)

This is False!


In [7]:
assert not None

So while `None` is not `False`, it is treated as false wherever Python needs a truth value, such as in an `assert` or an `if` statement. This may seem a bit arbitrary, but the rule is applied consistently: `None` is always treated as false in these contexts, and it never compares equal to `False`.
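A minimal sketch of the distinction, using only built-ins: `None` is neither identical nor equal to `False`, yet `bool()` maps it to `False`. The other values in the list are standard examples of objects Python also treats as false; they are included for comparison and are not discussed in the text above.

```python
# None is a distinct object from False ...
assert None is not False
assert None != False

# ... but its truth value is False.
assert bool(None) is False

# Other built-in values that are treated as false in a conditional check.
falsy_values = [None, False, 0, 0.0, 0j, "", set()]
truth_values = [bool(value) for value in falsy_values]
print(truth_values)
```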

## Numeric Types

These constants only ever have a single value. Numeric types, by contrast, can take on a range of values. Individual numbers are themselves immutable, that is, non-changing, but a variable holding a number can be reassigned to a different value at any time.

Let us assign **values** to a few variables using the assignment operator `=`.

In [8]:
a = 1
b = 2.4
c = 3 + 3j

Note that we have not specified the type of each variable prior to assignment. Unlike many other programming languages, Python does not require this.

In [9]:
a, b, c

(1, 2.4, (3+3j))

Although we did not specify a type for each variable, Python determined each type from the value assigned.

In [10]:
type(a), type(b), type(c)

(int, float, complex)

It is worth considering for a moment that these variables are listed from most restrictive to least restrictive data type. In other words, we can turn `a` into a `float` without losing information, but we cannot turn `b` into an `int` without losing its fractional part.

In [11]:
float(a), int(b)

(1.0, 2)
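To make the loss of information concrete, here is a small sketch. The names `original` and `round_trip` are illustrative and do not appear elsewhere in this text.

```python
original = 2.4

# int() discards the fractional part, truncating toward zero.
assert int(original) == 2
assert int(-2.4) == -2

# Converting back to float does not recover the original value.
round_trip = float(int(original))
assert round_trip == 2.0
assert round_trip != original

# A small int survives the trip through float unchanged.
assert int(float(1)) == 1
```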

Python will raise a `TypeError` if we try to convert `c` to a `float`.

In [12]:
try:
    float(c)
except TypeError as error_message:
    print(error_message)

can't convert complex to float
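Although a `complex` value cannot be passed to `float()`, its components are available as floats. A brief sketch, with `z` as an illustrative name chosen so that the magnitude is a round number:

```python
z = 3 + 4j

real_part = z.real        # the real component, as a float
imaginary_part = z.imag   # the imaginary component, as a float
magnitude = abs(z)        # the distance from zero in the complex plane

assert isinstance(real_part, float)
assert isinstance(imaginary_part, float)
```

Which of these is the "right" real number depends on the application, which is presumably why Python declines to choose one automatically.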


## The Set

We might think of the `set` as the simplest **compound** datatype in Python. By compound we mean a single object that contains many values. In Python, a `set` is a single object containing many *unique* values.

The `set` type was not always a part of Python. The Python Enhancement Proposal or PEP proposing to add the `set` type can be read here: https://www.python.org/dev/peps/pep-0218/

In [13]:
type(set)

type

Mathematically speaking, a set is a collection of distinct (unique) objects. This is essentially a statement of the two basic properties of a `set`: 

1. a `set` is composed of elements
1. these elements are unique

In [14]:
A = {a, b, 2.4, c}

Thus, the following boolean expression is `True`

In [15]:
assert 1 in A

Another way to say `1 in A` is to say that 1 is an **element** of `A`

$$1\in A$$

Of note is the $\LaTeX$ expression used to display this

```
$$1 \in A$$
```

as this connects the idea of membership across multiple languages (Python, $\LaTeX$) and paradigms (computational, mathematical). Regardless of which perspective we take, the critical idea is that 1 is **in** $A$. 

This is the main idea of the `set`: **membership**. Elements are members of a `set` and a `set` contains elements.

### Cardinality

The number of members of a `set` is called its **cardinality**. Note that in defining the `set A` we listed four values, `a`, `b`, `2.4`, and `c`. But the cardinality of `A` is 3.

In [16]:
len(A)

3

Recall that `b` has the value `2.4`.

In [17]:
b

2.4

Thus, in displaying the contents of the set `A`, we note that the repeated value `2.4` (which is also the value associated with `b`) is included but once. An element can only be a member of a set once.

In [18]:
A

{(3+3j), 1, 2.4}
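Uniqueness in a `set` is decided by equality (together with hashing), not by type. The following sketch goes slightly beyond the examples above: it shows that values of different numeric types that compare equal count as a single element. The name `D` is introduced only for this illustration.

```python
# 1, 1.0, and True compare equal and hash alike, so they collapse to one element.
D = {1, 1.0, True}
assert len(D) == 1

# Membership tests follow the same rule.
assert 1.0 in {1, 2, 3}
assert (1 + 0j) in {1, 2, 3}
```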

### Set Equality

We can also compare sets for equality.

In [19]:
B = {1, 1, 1, 2.4, 3+3.0j}

We note that even though the definitions of `A` and `B` are different, these two sets are in fact equal.

In [20]:
assert A == B

In [21]:
C = {a, c}

`C` is not equal to `A`.

In [22]:
assert A != C

Certainly this is because `C` does not contain the element `2.4`.

In [23]:
assert b not in C

We might make note of the fact that every element in `C` is in `A`.

In [24]:
for element in C:
    print(element, element in A)

1 True
(3+3j) True


Because every element in `C` is also an element of `A`, we can say that `C` is a subset of `A`.

In [25]:
assert C.issubset(A)

We can also write this as 

In [26]:
assert C <= A

It is useful to note that `A` is a subset of itself.

In [27]:
assert A <= A

This actually leads to a definition of **set equality**. 

> Two sets are equal to each other if and only if each is a subset of the other.

In [28]:
assert A <= B
assert B <= A

In [29]:
def is_equal(set_1, set_2):
    # Equal if and only if each set is a subset of the other.
    return (set_1 <= set_2) and (set_2 <= set_1)

In [30]:
assert is_equal(A, B)

In [31]:
assert not is_equal(A, C)

Another way to say this is that two sets are equal, if and only if they have the exact same members. The only important consideration in their equality is membership.
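The `<=` operator used above allows the two sets to be equal. Python also provides a strict version, `<`, which tests for a *proper* subset. A minimal sketch, with `A` and `C` redefined locally so that it runs on its own:

```python
A = {1, 2.4, 3 + 3j}
C = {1, 3 + 3j}

assert C <= A        # C is a subset of A
assert C < A         # ... and a proper subset, since A has an extra element
assert A <= A        # every set is a subset of itself
assert not (A < A)   # ... but never a proper subset of itself
assert A >= C        # equivalently, A is a superset of C
```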

## Homogenous Sets

The sets we have been using have been of heterogenous type. Both `A` and `B` contain elements of type `int`, `float`, and `complex`. We might be interested in a set that can only contain elements of the same type, a homogenous `set`. We can extend the `set` class in order to create a new compound type called `HomogenousSet`.

We start by defining `HomogenousSet` so that it [inherits](https://docs.python.org/3/tutorial/classes.html#inheritance) from `set`. In Python, this is done with the following syntax:

```
class DerivedClassName(BaseClassName):
    <statement-1>
    .
    .
    .
    <statement-N>
```

Thus, in the following definition, `HomogenousSet` inherits from `set`.


In [32]:
class HomogenousSet(set):
    def __init__(self, *args):
        set.__init__(self, *args)

The `__init__` (called "dunder-init") method is called during the creation of a new object. In other words, when we create a new `HomogenousSet`, Python calls the object's `__init__` method to initialize it.

Because we have inherited from the `set` class, we have all of the `set` class's methods available to use, including its `__init__()` method. We use this in our class definition, so that `HomogenousSet` initializes itself by calling the initialization method of the `set` class. 
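As an aside, here is a sketch of the same inheritance pattern written with `super()`, a common alternative spelling in Python 3. The class name `PlainSubSet` is hypothetical and used only for this illustration.

```python
class PlainSubSet(set):
    def __init__(self, *args):
        # Delegate to set's initializer; equivalent to set.__init__(self, *args).
        super().__init__(*args)

example = PlainSubSet((1, 2, 2, 3))

assert isinstance(example, set)   # an instance of the subclass is also a set
assert example == {1, 2, 3}       # duplicates collapse, as in a plain set
assert 2 in example               # inherited membership test
assert len(example) == 3          # inherited len()
```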

In [33]:
homogenous_set_1 = HomogenousSet((1,2,3))
homogenous_set_2 = HomogenousSet((1,2.4,3+3j))

In [34]:
homogenous_set_1

{1, 2, 3}

In [35]:
homogenous_set_2

{(3+3j), 1, 2.4}

At this point, `HomogenousSet` behaves exactly as `set` does. In order to add an additional restriction, we must add to the `__init__()` method. The restriction is that all elements must have the same type. This means that we must check the type of each element.

For `homogenous_set_1`, we print the type of each element and indeed they are all the same.

In [36]:
for element in homogenous_set_1:
    print(type(element))

<class 'int'>
<class 'int'>
<class 'int'>


For `homogenous_set_2`, we print the type of each element, and as expected they are not the same. 

In [37]:
for element in homogenous_set_2:
    print(type(element))

<class 'int'>
<class 'float'>
<class 'complex'>


Next, we add to the loop to verify that the type of each element is the same.

In [38]:
base_type = None

for element in homogenous_set_1:
    if base_type is None:
        base_type = type(element)
    elif base_type is not type(element):
        print("This set is heterogenous in type.")
print("This set is homogenous in type.")        

This set is homogenous in type.


In [39]:
base_type = None

for element in homogenous_set_2:
    if base_type is None:
        base_type = type(element)
    elif base_type is not type(element):
        print("This set is heterogenous in type.")
print("This set is homogenous in type.")        

This set is heterogenous in type.
This set is heterogenous in type.
This set is homogenous in type.


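This is not exactly what we want. The heterogenous message is printed once for every mismatched element, and the final message claims the set is homogenous regardless of what the loop found. We need to stop at the first mismatch and print the final message only if no mismatch occurred.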

In [40]:
base_type = None
homogenous = True
for element in homogenous_set_2:
    if base_type is None:
        base_type = type(element)
    elif base_type is not type(element):
        homogenous = False
        print("This set is heterogenous in type.")
        break
if homogenous:
    print("This set is homogenous in type.")

This set is heterogenous in type.


We can use this logic to create a function, `type_is_homogenous` that does this check. As we are wrapping this in a function, we can simply `return` when we find that the set is heterogenous.

In [42]:
def type_is_homogenous(defined_set):
    base_type = None
    
    for element in defined_set:
        if base_type is None:
            base_type = type(element)
        elif base_type is not type(element):
            return False
    return True

In [43]:
assert type_is_homogenous(homogenous_set_1)

In [44]:
assert not type_is_homogenous(homogenous_set_2)
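Since a `set` keeps only unique values, the same check can also be sketched by collecting the element types into a set and counting them. The function name `type_is_homogenous_via_set` is hypothetical; like the loop version, it treats an empty collection as homogenous.

```python
def type_is_homogenous_via_set(defined_set):
    # Collect the distinct element types; at most one means homogenous.
    element_types = {type(element) for element in defined_set}
    return len(element_types) <= 1

assert type_is_homogenous_via_set({1, 2, 3})
assert not type_is_homogenous_via_set({1, 2.4, 3 + 3j})
assert type_is_homogenous_via_set(set())
```

Unlike the loop version, this variant always visits every element rather than returning at the first mismatch, so the loop may be preferable for large collections.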

### Add Condition to `HomogenousSet`

We can now add this function as a method and use it as a condition check when the initialization method is called. As this is called during the creation of a `HomogenousSet` object, we cannot simply `return` a `False`. We actually have to raise an error during the creation. If the new `HomogenousSet` object fails the `type_is_homogenous` check, we will raise a `TypeError` with the message:

```
"All elements of the set must have the same type."
```

In [45]:
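class HomogenousSet(set):
    def __init__(self, *args):
        # Build the underlying set first, then validate its contents.
        set.__init__(self, *args)
        self.type_is_homogenous()

    def type_is_homogenous(self):
        """Raise a TypeError unless every element has the same type."""
        base_type = None

        for element in self:
            if base_type is None:
                # The first element fixes the type all others must match.
                base_type = type(element)
            elif base_type is not type(element):
                raise TypeError("All elements of the set must have the same type.")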


In [46]:
homogenous_set_1 = HomogenousSet((1,2,3))

In [47]:
homogenous_set_2 = HomogenousSet((1., 2., 10.))

In [48]:
homogenous_set_3 = HomogenousSet((1,2,10.))

TypeError: All elements of the set must have the same type.
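As with `AssertionError` earlier, the `TypeError` can be caught with a `try`/`except` block. The sketch below repeats the class definition so that it runs on its own. It also illustrates one caveat of this design: the check runs only in `__init__`, so inherited methods such as `add` are not guarded. The names `caught_message`, `ints_only`, and `types_after_add` are illustrative.

```python
class HomogenousSet(set):
    def __init__(self, *args):
        set.__init__(self, *args)
        self.type_is_homogenous()

    def type_is_homogenous(self):
        base_type = None
        for element in self:
            if base_type is None:
                base_type = type(element)
            elif base_type is not type(element):
                raise TypeError("All elements of the set must have the same type.")

# Catch the error raised when the elements have mixed types.
try:
    HomogenousSet((1, 2, 10.))
except TypeError as error_message:
    caught_message = str(error_message)
    print(caught_message)

# Caveat: add() is inherited unchanged from set, so no type check happens here.
ints_only = HomogenousSet((1, 2, 3))
ints_only.add(2.5)
types_after_add = {type(element) for element in ints_only}
```

Enforcing the restriction after construction would require overriding the mutating methods as well, which is beyond what the class above attempts.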