# Session 6: OOP and data structures

*Data Structures and Algorithms*

*Achyuthuni Sri Harsha*

------------------------------------------------------------------------

Everything in Python is an "object". But what are objects? We will
introduce the fundamental concept of object-oriented programming, and
talk about how this links to the design of data structures. We will
introduce key linear data structures, and learn to define our own data
structures.

We will also introduce some essential Python libraries: numpy for
numerical computing and matplotlib for plotting. Finally, we will learn
to work with Jupyter Notebook, a browser-based Python interface that
combines code with reporting features.

------------------------------------------------------------------------

## Preparation:

**Readings:**

Guttag: Chapter 8.

OR

Chithul, Swaroop. A Byte of Python.

-   Object Oriented Programming

-   <https://python.swaroopch.com/oop.html>

AND (for numpy and matplotlib):

VanderPlas, Jake. Python Data Science Handbook.

-   <https://github.com/jakevdp/PythonDataScienceHandbook>

-   Chapter 2

***Optional readings:***

Rougier, Nicolas. Numpy tutorial.

-   <http://www.labri.fr/perso/nrougier/teaching/numpy/numpy.html>

**Questions:**

Please read the material above, and think about how you would explain to
your classmates:

-   What is object-oriented programming, and what are its main benefits?

-   What are classes, objects, and instances?

-   What do leading underscores signify in Python?

------------------------------------------------------------------------

## Recap

## What is an object?

## Defining objects: monsters

We've started building a data structure for pocket monsters in a
hypothetical game where you collect monsters and use them to fight other
monsters. A pocket monster has a number of attributes, like a name, a
type, hit points and combat points. We can represent a pocket monster by
creating variables like this:

In [1]:
monster_name = 'Pikachu'
combat_points = 30
hit_points = 13
monster_type = 'lightning'

However, if we want to have many monsters, we would need too many
variables. We could then try storing the monsters in a dictionary:

In [2]:
monsters = {'Pikachu':[20,53,'electric'], 'Squirtle':[82,90,'water'], 'Mew':[1940,599,'cat']}

But this may also become unwieldy. What if we want to add different
kinds of attacks for each of the monsters? These attacks would have
their own characteristics (name, damage, etc.). We would likely end up
with dictionaries of attack attributes nested within lists of monster
details within dictionaries of monsters. This may get tricky, but
further, we would also like to use these attacks to have the monsters
battle each other in the game. So, we need to create actions on the
monster data: for example, a monster may become hurt in a battle, so we
need functionality to reduce its health. We'd like to have a way to
organize monster data and *actions* on these data together in a
consistent and convenient manner.

So far, we've organized our programs around functions which operate on
data. This is called *procedural programming*, and works well for most
medium-sized projects. Now, with monsters, we have a problem where we'd
like to combine both monster *data* and *functionality* together. In
Python, we can do this inside an **object**. Instead of storing monster
data and functions separately, we'll define monsters as a **class**.

Classes and objects are key components of object-oriented programming. A
class defines a new *type* of object for Python, and objects are
*instances* of this type. We have already seen many types of built-in
Python objects, such as strings, floats, and dictionaries. The type
`str`, for example, is a class, and `'Hello'` is an object that is an
instance of the `str` type.
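As a quick check of the class/instance relationship described above, we can ask Python for the type of a few built-in objects. The variable names `greeting` and `numbers` are just illustrative.

```python
greeting = 'Hello'
numbers = [1, 2, 3]

# type() tells us which class an object is an instance of
print(type(greeting))   # <class 'str'>
print(type(numbers))    # <class 'list'>

# isinstance() checks whether an object is an instance of a given class
print(isinstance(greeting, str))    # True
print(isinstance(greeting, list))   # False
```

Two different strings are different objects, but both are instances of the same class `str`.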

Objects combine a representation of data with functionality related to
that data. A string object, for example, includes data consisting of a
sequence of characters, and methods like `str.upper()` to manipulate the
data. More generally, an object stores its data in variables that belong
to the object; these are often called attributes or fields. Similarly,
functions defined inside the object belong to it and are called methods.
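Built-in objects already show this split between attributes and methods. As a small illustration (using a complex number, which is not part of the monster example), attributes are read without parentheses while methods are called with them:

```python
z = 3 + 4j

# Attributes: data stored in the object
print(z.real)   # 3.0
print(z.imag)   # 4.0

# Methods: functions that belong to the object
print(z.conjugate())      # (3-4j)
print('Hello'.upper())    # HELLO
```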

Now we're adding our own object type: a pocket monster. The syntax for
defining a new class `Monster` is as follows.

In [3]:
class Monster(object):
    pass # block of code, no content yet

pika = Monster()
print(pika)

<__main__.Monster object at 0x0000027E2C0A7370>


The class is defined using the `class` keyword, followed by its name.
Here we specify that the class inherits from `object`, the most general
Python type. (In Python 3 this is the default, so `class Monster:` would
be equivalent.) This is followed by a colon and an indented block of
code which contains class details, though nothing yet. We'll define all
classes in this way.

When we define the `Monster` class, we make this type of objects
available for our use, just like strings or lists.

We've then created an instance of the class using the class name
followed by parentheses, and assigned it to the variable `pika` (we
don't need `object` here). The print function shows that the result is
indeed a `Monster` object, which is stored at a specific memory
location.

Let's add functionality to the class by creating a method.

In [4]:
class Monster(object):
    def print_monster(self):
        print('A monster.')

pika = Monster()
pika.print_monster()

A monster.


A method is a function defined just like before, but within the class.
There is one evident difference to our earlier functions: the `self`
parameter. We use `self` as the first parameter of all functions defined
inside a class. However, when calling the function, we omit this
parameter.

Why do we use `self`? The answer is that it refers to the object making
the function call. Above, we have defined the class `Monster` and
created the instance `pika`, and then call `pika.print_monster()`. When
we do this, Python sees that the `pika` object is of type `Monster`. It
finds that class, and automatically converts the call to
`Monster.print_monster(pika)`. In brief, whenever you see `self` in the
class definition, you can mentally replace it with the instance itself,
in this case `pika`.
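To see this conversion in action, here is a minimal sketch that calls the method both ways. The explicit form is shown only for illustration; in practice we always use the shorter `pika.print_monster()`.

```python
class Monster(object):
    def print_monster(self):
        print('A monster.')

pika = Monster()

# The usual way: Python passes pika as self for us
pika.print_monster()

# The explicit equivalent: we pass the instance ourselves
Monster.print_monster(pika)
```

Both calls print `A monster.`, because in both cases `self` refers to `pika`.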

This use of `self` becomes clearer when we start storing data within
objects. Let's start by giving our monster a name.

In [5]:
class Monster(object):
    def __init__(self, name):
        self.name = name
    def print_monster(self):
        print('A monster called {}.'.format(self.name))

Here we define another method: the `__init__` function (with two
underscores on each side) is a special function used in classes. It is
called when we create a new object of the `Monster` type. Here we give
the function a `name` parameter, which it attaches to the new monster
we create as a data field `self.name`.

With data fields, we use the notation `self.name` to refer to the
specific object we're dealing with. That is, `name` is a local variable
defined as the function parameter, and `self.name` is something called
`name` inside the object called `self`.

Let's define our monster again, this time giving it a name.

In [6]:
pika = Monster('Pikachu')
print(pika.name)
pika.print_monster()

Pikachu
A monster called Pikachu.


The monster now has a name as an attribute. Note how we can mentally
replace `self` in the class definition with `pika` when referring to the
object's attributes.
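Because `self` stands for whichever instance we are working with, each monster keeps its own copy of the data. A small sketch with two monsters (the second one, `squirtle`, is added here just for illustration):

```python
class Monster(object):
    def __init__(self, name):
        self.name = name  # stored separately in each instance

pika = Monster('Pikachu')
squirtle = Monster('Squirtle')

print(pika.name)       # Pikachu
print(squirtle.name)   # Squirtle

# Changing one object's attribute does not affect the other
pika.name = 'Raichu'
print(pika.name)       # Raichu
print(squirtle.name)   # Squirtle
```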

Creating our monster works just like creating any new object. Compare
the above to creating a list.

In [7]:
a_list = list() # creates empty list
print(a_list)
a_list.append(1)
print(a_list)

[]
[1]


Let's add more attributes to our monster class.

In [8]:
class Monster(object):
    def __init__(self, name, monster_type, combat_points, hit_points):
        self.name = name
        self.monster_type = monster_type
        self.combat_points = combat_points # strength in combat
        self.hit_points = hit_points # max health
        self.health = hit_points # current health

    def print_monster(self):
        print('A monster called {}.'.format(self.name))

pika = Monster('Pikachu', 'electric', 100, 80)
print(pika.hit_points)

80


Now whenever we create a new Monster, it will have these data attributes
attached to it. Next, we will build up ways to manipulate these data. We
will do this by adding more methods to the class.

In [9]:
class Monster(object):
    def __init__(self, name, monster_type, combat_points, hit_points):
        self.name = name
        self.monster_type = monster_type
        self.combat_points = combat_points # strength in combat
        self.hit_points = hit_points # max health
        self.health = hit_points # current health

    def print_monster(self):
        print('A monster called {}.'.format(self.name))

    def get_health(self):
        return self.health

    def get_hit_points(self):
        return self.hit_points

    def print_health_status(self):
        if self.health <= 0:
            print('{} is knocked out!'.format(self.name))
        else:
            print('{0} has {1} health left.'.format(self.name, self.health))

    def hurt(self, damage):
        """
        Reduce monster health by damage to a minimum of zero
        """
        self.health = max(self.health - damage, 0)
        print('{} is hurt!'.format(self.name))
        self.print_health_status()

    def __str__(self):
        return self.name


pika = Monster('Pikachu', 'electric', 100, 80)
pika.hurt(5)

Pikachu is hurt!
Pikachu has 75 health left.


The class now includes a host of other functions. Many of these simply
return the different attributes of the monster. The `__str__` function
is a special function that gets called whenever we request a string
representation of a monster, for example with `str(pika)` or
`print(pika)`.
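Here is a minimal sketch of `__str__` at work, using a cut-down version of the class with only a name. Without `__str__`, printing the object would show the default `<__main__.Monster object at ...>` text we saw earlier.

```python
class Monster(object):
    def __init__(self, name):
        self.name = name

    def __str__(self):
        # Called by str(), print() and string formatting
        return self.name

pika = Monster('Pikachu')
print(pika)                      # Pikachu
print(str(pika) == 'Pikachu')    # True
print('Go, {}!'.format(pika))    # Go, Pikachu!
```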

It is a safe programming practice to use functions like this instead of
directly referring to a monster's attributes:

In [10]:
pika = Monster('Pikachu', 'electric', 100, 80) 
pika.get_health() == 80 # safe practice
pika.health == 80 # not as safe practice - we might accidentally change value if we omit one equals sign

True

In more complex projects, using such functions would help make sure that
these values are not accidentally changed. Indeed, some programming
languages erect such walls around object attributes by default.
Python instead gives you direct access to the attributes. We are,
however, not going to worry too much about these practices in this
module.

### Healing monsters

Let's add more methods to Monsters. We've already created a function
called `hurt`, which reduces a monster's health. Now let's implement the
method `heal`, which similarly increases its health. Note however that
*a monster's health cannot exceed its hit points*. Implement the method
`heal` in the `Monster` class in `ses06.py`. Your method should increase
the health as desired and also print out the monster's health status
similarly to the `hurt` function.

> **Important.** Whenever you update the code in the Monster class,
> you'll need to run it again (e.g. `F5` in Spyder) so that Python knows
> to load up your changes. You'll similarly need to recreate all your
> Monster objects to reflect the changes you've made.

## More monsters

### Attack!

Next, let's add attacks for the monsters so they can battle. Following
the design of the Monster class, we'll design a class `Attack` which
defines attacks in `ses06.py`.

An attack has the following attributes:

-   `name`, a string
-   `attack_type`, a string
-   `damage`, a float

The stub code has the following functions to be completed:

-   `__init__` is used when a new `Attack` object instance is created.
    It stores the input attributes.
-   `get_attack_type` returns the attack type.
-   `get_damage` returns the attack damage.
-   `__str__` returns the name (this is already implemented and will be
    called if you use `print(attack)`)

Now you have designed a data structure to capture the attributes of
attacks. But the monsters don't know how to use them yet. Next, update
your implementation of the `Monster` class so that the monsters can use
these attacks to fight each other. Implement the function
`use_attack(self, attack, target_monster)` within the `Monster` class
that uses an attack on another monster. This function should `hurt` the
other monster with the attack's `damage`, modified by the monsters'
relative strength in combat. See the function docstring for details on
how to calculate the exact damage.

## Data structures

## Queues

Let's move from pocket monsters back to efficient algorithm design. In
the lecture, we looked briefly into the mechanics of implementing linear
data structures, specifically lists. There are various ways of doing
this, the simplest of which are an array and a linked list. We saw that
the efficiency of list operations depends on the implementation. The
difference is that an array keeps track of the absolute positions of
elements while a linked list only tracks their relative positions.
Adding an element to a linked list at a position we already hold a
reference to (such as the head) is therefore constant time, but in an
array the complexity depends on the position of the element. If the
element is appended to the end of the array, this is still constant
time, but adding to the beginning is $O(n)$ as all other elements need
to be moved forward by one position. However, accessing a linked list by
index is $O(n)$ while any position in an array can be accessed in
$O(1)$.
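We can get a rough feel for the array case using Python's own `list`, which is implemented as an array. The sketch below builds a list by appending to the end and by inserting at the beginning. The helper names `append_items` and `prepend_items` and the size `n` are made up for this illustration, and the exact timings will vary between machines.

```python
from timeit import timeit

def append_items(n):
    """Build a list by adding each item to the end."""
    items = []
    for i in range(n):
        items.append(i)      # constant time per item (amortised)
    return items

def prepend_items(n):
    """Build a list by adding each item to the beginning."""
    items = []
    for i in range(n):
        items.insert(0, i)   # O(n) per item: existing elements shift
    return items

n = 20000
print('append :', timeit(lambda: append_items(n), number=1))
print('prepend:', timeit(lambda: prepend_items(n), number=1))
```

On most machines the second timing should be noticeably larger, and the gap grows as `n` increases.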

In this exercise, you'll implement another linear data structure: a
queue. Queues work much as you would expect from your experience in the
college cafe: you add items to one end of the queue, and remove them
from the other end. Queues are not only everywhere in the world
(especially in London), but their design is crucial to operations in
many service industries. Furthermore, we will see in the following
sessions that a queue is a useful abstraction in algorithm design. Its
advantage is that when efficiently implemented, its operations (adding
and removing items) both run in constant $O(1)$ time.

### Queue using list (optional exercise)

The Python file `ses06_extra.py` asks you to complete two
implementations of a queue. The first, `ListQueue`, uses the Python
`list` to build a queue. It has four methods.

-   `__init__` is used when a new `ListQueue` object is created. It
    creates an empty list of items in the queue.
-   `is_empty` checks whether the queue is empty.
-   `enqueue` adds an item to the queue (at the end of the list)
-   `dequeue` removes an item from the queue (from the beginning of the
    list)

Note how the methods again refer to the `ListQueue` object itself using
the `self` parameter.

In [11]:
class ListQueue(object):
    """ 
    Queue using list

    Supports inserting and deleting nodes
    """
    def __init__(self):
        """Initialise queue as list"""
        self.items = []

    def is_empty(self):
        """
        Checks if queue is empty
        """
        pass # replace with code

    def enqueue(self, item): 
        """ 
        Insert an element into the end of the queue (list)
        """
        pass # replace with code

    def dequeue(self):
        """
        Remove the first element of the queue, return it
        """
        return self.items.pop(0) 

Complete the class `ListQueue` to work correctly.

### Question 7: Queue of Nodes

The problem with the above implementation is that removing items from
the beginning of a Python list is computationally expensive: every
remaining element has to shift by one position. Your task is to
implement a more efficient queue class without using a Python list.
You'll do this by creating a queue of *nodes*. A `Node` class
implementation is given below. In brief, a node is a simple data
structure that stores unspecified data in the variable `stuff`, as well
as information on what other node (if any) it is connected to.

In [12]:
class Node(object):
    """ 
    Node: contains unspecified data in stuff and link to next Node
    """
    def __init__(self, stuff=None, next_node=None): 
        # this will run when you create a new Node
        self.stuff = stuff
        self.next_node  = next_node

    def __str__(self): # this will run when you use print() on a node
        return str(self.stuff)

Try creating a few nodes with string inputs.

The Python file contains a skeleton implementation of a `Queue` class.
Your task is to complete it so that it performs the same operations as
the `ListQueue` class but creating the queue from `Node` objects by
linking them together. In order to quickly add and remove elements, your
class should keep track of which node is first and last in the queue,
the length of the queue, as well as the next node for each node. Here's
how it works.

-   Suppose we start with an empty queue, and we add a node $a$. Now
    $a$ is both the first and the last item in the queue, but since
    there are no other items, $a$ does not have a next node. The
    length of our queue is one.
-   Now let's add another node $b$. The node $a$ now needs to point
    to $b$ as the next node. Node $a$ is still first in the queue,
    but the last item needs to be updated to be $b$. The last node
    does not point to another node. The length of the queue is increased
    to two.
-   Let's remove a node. This should return the `stuff` inside the first
    node, $a$. Now $b$ becomes the first node, and the length of the
    queue is decreased to one.
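The three steps above can be traced by hand with plain variables before wrapping them in a class. This is only a sketch of the bookkeeping: the names `first`, `last` and `length` are chosen for this illustration and need not match the skeleton in the Python file.

```python
class Node(object):
    def __init__(self, stuff=None, next_node=None):
        self.stuff = stuff
        self.next_node = next_node

    def __str__(self):
        return str(self.stuff)

# Step 1: add node a to an empty queue
a = Node('a')
first, last, length = a, a, 1

# Step 2: add node b behind a
b = Node('b')
last.next_node = b   # a now points to b
last = b             # b is the new last node
length += 1

# Step 3: remove the first node and return its contents
removed = first.stuff
first = first.next_node   # b becomes the first node
length -= 1

print(removed, first, length)   # a b 1
```

Note that no step loops over the nodes, which is why both adding and removing can run in constant time.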

Note you can pass `stuff` directly as input when adding to the queue, as
the `Node` is created within the queue. This kind of abstraction is
often useful: the user does not have to worry about the internal
operations of your data structure.

The class also contains a `__str__` method for you to complete. This is
useful for printing out your queue status.

Complete the class `Queue` to work correctly.

After you have completed the class, try out the queue and compare the
speed of the enqueueing and dequeueing operations of the two classes.
Python's standard library also provides `collections.deque`, which
implements a slightly more complicated data type called a *double-ended
queue*, giving an efficient way to use queues. Look for more information
on double-ended queues on e.g. Wikipedia.
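As a brief illustration, a `deque` from the standard `collections` module can be used as a queue by adding with `append` and removing with `popleft`. The variable name `queue` is just an example.

```python
from collections import deque

queue = deque()

# Enqueue: add items to the right-hand end
queue.append('a')
queue.append('b')
queue.append('c')

# Dequeue: remove items from the left-hand end
first = queue.popleft()
print(first)        # a
print(queue)        # deque(['b', 'c'])
print(len(queue))   # 2
```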

## Introduction to Numpy

Please complete the compulsory exercises in the notebook
`ses06_numpy.ipynb`.

## All done!

## Review

Object-oriented programming provides a new way of organising our data -
for example pocket monsters - together with the procedures associated
with them.

Check your understanding of the concepts with these questions:

-   What do we mean by a class and an instance?
-   How do we add attributes and methods to an object?
-   What do the `__init__` and `__str__` methods do, and when are they
    called?
-   How is the `self` parameter used?

There is a lot more to object-oriented programming than what we have
covered here. Here are some more resources about OOP:

-   [Python-course.eu](https://www.python-course.eu/python3_object_oriented_programming.php)
    is quite comprehensive on OOP
-   [Real Python's
    tutorial](https://realpython.com/python3-object-oriented-programming/#how-to-define-a-class-in-python)
    on OOP
-   [Jeff
    Knupp](https://jeffknupp.com/blog/2014/06/18/improve-your-python-python-classes-and-object-oriented-programming/)
    on OOP

If you end up doing more OOP, one (advanced) thing to note is that the
way we have implemented the "get" methods is not the ideal or "pythonic"
way of providing an interface to object attributes. If we'd like a class
to have private attributes, we would typically make these attributes
"properties". To understand how properties work, we first need to know
what "decorators" are in Python.

-   [Real Python's tutorial on
    decorators](https://realpython.com/primer-on-python-decorators/)
-   [Python-course.eu](https://www.python-course.eu/python3_properties.php)
    on properties
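To give a flavour of what properties look like, here is a minimal sketch of a read-only `health` attribute. It is a simplified stand-in for the `Monster` class above, not the version used in this session. The leading underscore in `_health` is the Python convention for marking an attribute as internal.

```python
class Monster(object):
    def __init__(self, name, hit_points):
        self.name = name
        self._health = hit_points   # internal by convention

    @property
    def health(self):
        # Read access looks like a plain attribute: pika.health
        return self._health

pika = Monster('Pikachu', 80)
print(pika.health)   # 80

# No setter is defined, so accidental assignment raises an error
try:
    pika.health = 0
except AttributeError:
    print('health is read-only')
```

This gives the protection of a "get" method while keeping the simpler attribute syntax.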