## 6.1 Defining data types

Before looking at how to implement the sequence ADT,
let's start with a simpler example that illustrates the difference between
a data structure and a data type, and how to define a data type in Python.

Let's suppose we want to define and implement an ADT for fractions $\frac{x}{y}$,
where $x$ and $y$ are integers, with $x$ being the numerator and
$y ≠ 0$ being the denominator.

<div class="alert alert-info">
<strong>Info:</strong> MU123 Unit&nbsp;3 Section&nbsp;2 introduces fractions and their operations.
</div>

Programming languages have special syntax for literals of built-in
data types, like integers and strings and other sequence types.
However, for our own types we need to define constructor operations
that create values of the new type from values of other types, e.g.
like Python's constructor to create strings from integers.
To highlight such operations,
we shall use 'Constructor' instead of 'Function' in the template.
Here's how two of the fraction ADT's operations could be defined.

**ADT**: fraction

**Constructor**: new fraction\
**Inputs**: _numerator_, an integer; _denominator_, an integer\
**Preconditions**: _denominator_ ≠ 0\
**Output**: _ratio_, a fraction\
**Postconditions**: _ratio_ = $\frac{numerator}{denominator}$

**Function**: multiplication\
**Inputs**: _left_, a fraction; _right_, a fraction\
**Preconditions**: true\
**Output**: _product_, a fraction\
**Postconditions**: if _left_ = $\frac{ln}{ld}$ and _right_ = $\frac{rn}{rd}$,
then _product_ = $\frac{ln \times rn}{ld \times rd}$

Let's see how to implement the fraction ADT in Python.

### 6.1.1 Data structure

The first step in implementing an ADT is to choose how to
structure the data. A fraction is represented by two integers,
so the obvious choice is to use a tuple or list with a pair of integers.
A tuple is a better choice, to prevent changes to the numerator or denominator.
Furthermore, we must state what each integer represents. It's probably
more intuitive for the first integer of the tuple to be the numerator and
the second to be the denominator.

Data structures vary widely and we can't fit their description into a template,
like we do for operations.
During your professional life you may need to explain what data structures you
use, verbally to a colleague, or in writing documentation, and so it's
important you communicate the structure of data in plain but clear English,
or whatever language you may use at work.
For this example, a good description would be:

> The data structure to represent a fraction is a tuple of two integers,
> the first being the numerator and the second the non-zero denominator.

### 6.1.2 Functions

Having decided the data structure, we can implement the operations.
Until now we implemented each operation with a Python function,
like the following. (I omit docstrings to keep this temporary solution short.)

In [1]:
def fraction(numerator: int, denominator: int) -> tuple:
    return (numerator, denominator)

def multiplication(left: tuple, right: tuple) -> tuple:
    return (left[0] * right[0], left[1] * right[1])

We use the constructor operation to create new values,
which are used by the other operations.

In [2]:
half = fraction(1, 2)
multiplication(half, half)  # one half times one half is a quarter

(1, 4)

This way of implementing ADTs is unsatisfactory because
it exposes the data structure to the user.
This allows the user to bypass the constructor and make (by mistake) calls like

In [3]:
multiplication((1, 2), (1, 3, 5))

(1, 6)

where the second argument doesn't represent a fraction.
Moreover, if we change the data structure then
the user will likely have to change their code. We need a better approach.

### 6.1.3 Classes

A data type is a collection of values and operations on those values.
Python and other languages have a construct to bundle together
data and functionality: **classes**. Each class implements a data type:
the values are the **instances** of the class and
the operations are the class's **methods**.
An **object** is an instance of some class, and that's why I used
the word 'object' in templates to denote any value.
For example, `5` and `[]` are objects: `5` is an instance of class `int` and
`[]` is an instance of class `list`, which has methods like `pop` and `append`.
Methods are usually directly called using the dot notation, but some are
indirectly called via operators, like `+` for addition or concatenation.

<div class="alert alert-info">
<strong>Info:</strong> Object-oriented programming is an approach that models a software system as a
collection of objects calling each other's methods.
M250 explains this approach at length.
</div>

Every instance has some variables, called **instance variables**,
to hold the data for that instance.
When defining a class we define the variables for all instances of that class.
Different instances typically have different values for their variables.
Let me show you how to define a class in Python. I explain the code afterwards.

In [4]:
class Fraction:
    """A ratio represented as a pair of integers:
    a numerator and a non-zero denominator.
    """

    def __init__(self, numerator: int, denominator: int) -> None:
        """Initialise the fraction.

        Preconditions: denominator != 0
        """
        self.value = (numerator, denominator)

    def multiplication(self, right: 'Fraction') -> 'Fraction':
        """Return the product of self and right.

        Postconditions: if self is the fraction sn/sd and
        right is the fraction rn/rd, then the output is fraction
        (sn*rn) / (sd*rd)
        """
        numerator = self.value[0] * right.value[0]
        denominator = self.value[1] * right.value[1]
        return Fraction(numerator, denominator)

The definition of a class `C` starts with `class C:` and
a docstring describing the data type being defined, followed by the methods,
which are defined like any Python function.
All methods are indented: they are 'within' the class.
The name of a class reflects what each instance represents, so
it's usually in the singular.
The names of built-in classes are usually in lowercase,
but the names of classes we define should use capitalised words
without underscores to separate them, e.g. `FractionalNumber`.

In M269, each class must have a method named `__init__`,
with two underscores at the start and at the end.
This method takes as first argument an instance of the class,
conventionally called `self`, and possibly additional arguments with data to
initialise that instance: here, the numerator and denominator of the fraction.
The body of the method creates the instance variables,
using dot notation to indicate that these variables 'belong' to the instance.
For every instance variable `x` there's an assignment to `self.x`.
In this example, I create an instance variable `value` that holds the
tuple with the two integers passed to the method.

Each built-in class, like `range` and `list`, has a constructor: a function
with the same name as the class to create instances of that class.
The Python interpreter automatically defines a constructor for every class
we define, with the same arguments as method `__init__`, except for `self`.
For this example, the interpreter creates a constructor `Function`
with two arguments, which we use to create new fractions.

In [5]:
one_half = Fraction(1, 2)

When we call the constructor, the interpreter creates an 'empty' instance,
without any data, and passes it as the first argument to `__init__`,
which creates the instance variables and assigns values to them.
The constructor (not the `__init__` method!) then returns the instance:
that's why there are no return statements in the `__init__` method.

We can use the `help` function to obtain information about any class.
Some of the information is of no relevance to M269.

In [6]:
help(Fraction)

Help on class Fraction in module __main__:

class Fraction(builtins.object)
 |  Fraction(numerator: int, denominator: int) -> None
 |  
 |  A ratio represented as a pair of integers:
 |  a numerator and a non-zero denominator.
 |  
 |  Methods defined here:
 |  
 |  __init__(self, numerator: int, denominator: int) -> None
 |      Initialise the fraction.
 |      
 |      Preconditions: denominator != 0
 |  
 |  multiplication(self, right: 'Fraction') -> 'Fraction'
 |      Return the product of self and right.
 |      
 |      Postconditions: if self is the fraction sn/sd and
 |      right is the fraction rn/rd, then the output is fraction
 |      (sn*rn) / (sd*rd)
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)



The `help` function copied the header of the `__init__` method
for the constructor. This gives the erroneous impression that
the constructor doesn't return anything. As an exception to the rule,
from now on we won't indicate the return type of the `__init__` method.

Methods are called with dot notation.
The interpreter looks up the class of the object to the
left of the dot and calls the method defined in that class.
The first argument of every method is therefore
an instance of the class being defined.
We conventionally call it `self` and don't indicate explicitly its type,
which is `Fraction` in this example.
Here's an example of calling a method.

In [7]:
one_half.multiplication(one_half)

<__main__.Fraction at 0x7ff4ddf4b070>

Unfortunately, the output message is not very useful.
The interpreter shows the class and unique id of the resulting object.
The id is an integer, shown here in hexadecimal notation.
Printing the fraction doesn't help either.

In [8]:
print(one_half.multiplication(one_half))

<__main__.Fraction object at 0x7ff4ddf4bca0>


If we want to see the value of an instance in a meaningful way,
we have to implement a method `__str__`
(again, two underscores before and after) that returns a string.
The `print` function calls the `str` constructor on each object it prints,
which in turn calls the `__str__` method on that object.

Here's the class again, with the additional method and without indicating
the return type for the initialisation method.

In [9]:
class Fraction:
    """A number represented as a pair of integers:
    a numerator and a non-zero denominator.
    """

    def __init__(self, numerator: int, denominator: int):
        """Initialise the fraction.

        Preconditions: denominator != 0
        """
        self.value = (numerator, denominator)

    def multiplication(self, right: 'Fraction') -> 'Fraction':
        """Return the product of self and right.

        Postconditions: if self is the fraction sn/sd and
        right is the fraction rn/rd, then the output is fraction
        (sn*rn) / (sd * rd)
        """
        numerator = self.value[0] * right.value[0]
        denominator = self.value[1] * right.value[1]
        return Fraction(numerator, denominator)

    def __str__(self) -> str:
        """Return a string representation of the fraction."""
        return str(self.value[0]) + ' / ' + str(self.value[1])

Now we can see the value of a fraction:

In [10]:
ONE_HALF = Fraction(1, 2)
ONE_QUARTER = ONE_HALF.multiplication(ONE_HALF)
ONE_QUARTER         # display class and unique id

<__main__.Fraction at 0x7ff4ddf6f4c0>

In [11]:
print(ONE_QUARTER)
str(ONE_QUARTER)

1 / 4


'1 / 4'

### 6.1.4 Mistakes

Whenever you change a class you must rerun code cells that create instances:
otherwise they remain instances of the old version of the class. Consider this:

In [12]:
print(half)         # created with fraction(1, 2); instance of tuple
print(one_half)     # created with first version of Fraction class
print(ONE_HALF)     # created with second version of Fraction class

(1, 2)
<__main__.Fraction object at 0x7ff4ddf4b8e0>
1 / 2


Unless you have rerun cells in a different order,
the middle output of the cell above isn't '1 / 2', because `one_half`
was created with the constructor for the class without the `__str__` method.

The name of a class becomes known only _after_ processing the class definition.
If we need to use the class name in the header of a method, then we must write it
as a string, as I've done for the `multiplication` method.
This irritating workaround will become unnecessary in Python&nbsp;3.10.
If you forget the string quotes you get a name error.

In [13]:
class Date:
    # docstrings and __init__ omitted to focus on the issue at hand

    def difference(self, other: Date) -> int:
        """Return number of days between two dates."""
        return 0    # dummy code

NameError: name 'Date' is not defined

As explained in [Section&nbsp;4.6.4](../04_Iteration/04_6_lists.ipynb#4.6.4-Modifying-lists),
method names are only known in the context of their class. Calling a method
as if it were a standalone function usually raises a name error.

In [14]:
__str__(ONE_QUARTER)

NameError: name '__str__' is not defined

In [15]:
ONE_QUARTER.__str__()

'1 / 4'

However, if a standalone function of the same name exists,
the interpreter will call that one.

In [16]:
multiplication(ONE_HALF, ONE_HALF)

TypeError: 'Fraction' object is not subscriptable

We get a type error because the indexing operation is not defined on fractions.
The standalone function expects two tuples, not two fractions.
`Fraction` and `tuple` are different types,
even though `Fraction` has an instance variable of type `tuple`.

Likewise, I cannot pass a tuple to a method expecting a fraction.

In [17]:
one_half.multiplication( (1, 2) )

AttributeError: 'tuple' object has no attribute 'value'

I get a special case of a name error: the interpreter is complaining that
tuples don't have an instance variable named `value`.
In Python, the instance variables and methods are a class's **attributes**.
If you have two attributes with the same name, e.g.
an instance variable and a method, you will get errors.
In the following incomplete example of a class for dates (day, month, year),
the name `day` refers both to an integer instance variable and a method.

In [18]:
class Date:

    def __init__(self):
        self.day = 1

    def day(self) -> int:
        return self.day

We can access the instance variable...

In [19]:
Date().day

1

...but not call the method.

In [20]:
Date().day()

TypeError: 'int' object is not callable

As the above example shows, Python doesn't prevent users of a class from
accessing its instance variables. Here's another example.

In [21]:
if 0 < ONE_HALF.value[0] < ONE_HALF.value[1]:
    print(ONE_HALF, 'is positive and smaller than 1')

1 / 2 is positive and smaller than 1


This code relies on a particular instance variable name and data structure
for representing fractions. If either changes, the code won't work.
For example, imagine we replace the tuple with two integer instance variables:

In [22]:
class Fraction:
    # docstrings omitted to focus on data structure changes

    def __init__(self, numerator: int, denominator: int):
        self.numerator = numerator
        self.denominator = denominator

    def multiplication(self, right: 'Fraction') -> 'Fraction':
        numerator = self.numerator * right.numerator
        denominator = self.denominator * right.denominator
        return Fraction(numerator, denominator)

    def __str__(self) -> str:
        return str(self.numerator) + ' / ' + str(self.denominator)

print(Fraction(1, 2).multiplication(Fraction(1, 3)))

1 / 6


Any code that uses the previous version of the class without
accessing the instance variables also works with this version,
because the **interface** of the class,
i.e. its methods and their headers, hasn't changed.

<div class="alert alert-warning">
<strong>Note:</strong> Only the class's methods should access the instance variables.
</div>

<div class="alert alert-info">
<strong>Info:</strong> Unlike Java, Python doesn't have access modifiers
to make instance variables private or protected.
</div>

Accessing instance variables from outside a class is poor programming practice.
If you need access to instance variables to solve a problem,
you may not have defined enough methods. For example, the `Fraction` class
should provide methods to return the numerator and denominator of a fraction.
I'm adding only one of them, to illustrate how methods can call each other.
Compare the following version to the previous one.

In [23]:
class Fraction:

    def __init__(self, numerator: int, denominator: int):
        self.top = numerator
        self.denominator = denominator

    def numerator(self) -> int:
        return self.top

    def multiplication(self, right: 'Fraction') -> 'Fraction':
        numerator = self.numerator() * right.numerator()
        denominator = self.denominator * right.denominator
        return Fraction(numerator, denominator)

    def __str__(self) -> str:
        return str(self.numerator()) + ' / ' + str(self.denominator)

print(Fraction(1, 2).multiplication(Fraction(1, 3)))

1 / 6


I renamed one instance variable, so that it doesn't have the same name as
the new method. Note that `x.y()` calls method `y` on instance `x`,
whereas `x.y` accesses variable `y` of instance `x`.

Assume that a second method to obtain the denominator is added. Can you think
of an advantage for the multiplication, string conversion and other methods to
be added to call the auxiliary methods instead of
directly accessing the instance variables?

____

If the data structure further changes, only the `__init_`, `numerator` and
`denominator` methods must change; all other methods on fractions
(multiplication, addition, string conversion, rounding, etc.) remain the same.

⟵ [Previous section](06-introduction.ipynb) | [Up](06-introduction.ipynb) | [Next section](06_2_static_array.ipynb) ⟶