Make sure you fill in any place that says `YOUR CODE HERE`. 

---

# Homework 4

*This* is a Python Notebook homework.  It consists of various types of cells: 

* Text: you can read them :-) 
* Code: you should run them, as they may set up the problems that you are asked to solve.
* **Solution:** These are cells where you should enter a solution.  You will see a marker in these cells that indicates where your work should be inserted.  

```
    # YOUR CODE HERE
```    

* Test: These cells contain some tests, and are worth some points.  You should run the cells as a way to debug your code, to see if you understood the question, and to check whether the output of your code is produced in the correct format.  The notebook contains both the tests you see, and some secret ones that you cannot see.  This prevents you from using the simple trick of hard-coding the desired output. 

### Questions

There are two groups of questions: 
* Implementing addition and subtraction for sparse arrays
* Implementing equality for sparse arrays

### Working on Your Notebook

To work on your notebook, you can just work on `colab.research.google.com`.  Please don't download it and work directly on your laptop.  Working on Colab has two key features: 

* The notebook is shared with the TAs, tutors, and with the instructor.  So when you report that you have difficulties, they can open your notebook and help you. 
* The notebook preserves the revision history, which is useful for many reasons; among them, we can see how you reached the solution.

### Submitting Your Notebook

Submit your work as follows: 

* Download the notebook from Colab, clicking on "File > Download .ipynb".
* Upload the resulting file to [this Google form](https://docs.google.com/forms/d/e/1FAIpQLSewTzgk0xHAYLlDgh391kRthRvDnmBg1b-_cQibxSVt3_MuVg/viewform?usp=sf_link).
* **Deadline: Thursday October 8, 7pm.**

You can submit multiple times, and the last submission before the deadline will be used to assign you a grade. 

In scientific computing and data science, it is often the case that arrays (and matrices) are sparsely populated: most elements have a default value, typically 0, except for a few. 
For instance, assume we wish to represent sentences (or portions of text) as vectors in an $N$-dimensional space, where $N$ is the number of words (including inflected forms such as verb conjugations) in English.  In the vector $x$ representing a sentence, the component $x_i$ indicates how many times the word $i$ occurs in the sentence.
Since the number $N$ is large for English (we will be doing this with a word list containing over 60,000 words), and since every piece of text contains only a small subset of all words, the vector $x$ representing the text would be mostly filled with 0s.  This would clearly be an inefficient representation! 
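
To make the waste concrete, here is a small sketch contrasting a dense count vector with a dictionary of counts.  The tiny `vocabulary`, the example `sentence`, and the names `dense` and `sparse` are made up for illustration; a real word list would have tens of thousands of entries.

```python
from collections import Counter

# Hypothetical toy vocabulary; a real word list would be far larger.
vocabulary = ["a", "cat", "dog", "mat", "on", "sat", "the", "zebra"]
word_index = {w: i for i, w in enumerate(vocabulary)}

sentence = "the cat sat on the mat"

# Dense representation: one slot per vocabulary word, mostly zeros.
dense = [0] * len(vocabulary)
for w in sentence.split():
    dense[word_index[w]] += 1

# Sparse representation: store only the non-zero counts, keyed by index.
sparse = dict(Counter(word_index[w] for w in sentence.split()))

print(dense)                      # [0, 1, 0, 1, 1, 1, 2, 0]
print(len(dense), len(sparse))    # 8 5
```

Both hold the same information: `dense[i]` equals `sparse.get(i, 0)` for every index `i`.  The dense list grows with the vocabulary, while the dictionary grows only with the number of distinct words in the sentence.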

Here, we will develop a couple of implementations for sparse arrays.  If you are interested in using sparse arrays, the Python package SciPy contains an [excellent implementation of sparse matrices](https://docs.scipy.org/doc/scipy/reference/sparse.html). For now, we will have more fun re-implementing some of these for the case of arrays. 

## Non-Sparse Arrays
Before we implement sparse arrays, let's begin from the beginning, and implement non-sparse arrays.  This is of course a waste of time, because numpy has excellent array implementations.  However, it will be useful for us to understand what we need to implement in the sparse case.  Here is an implementation.

In [0]:
class UndefinedSizeArray(Exception):
    pass

class Array(object):
    
    def __init__(self, *args, initial_value=0., size=None):
        """If args are specified, they form the initial values for the array.  
        Otherwise, we need to specify a size."""
        if len(args) > 0:
            self.a = list(args)
        elif size is not None:
            # The * below is a trick to obtain a list of replicated elements.
            self.a = [initial_value] * size 
        else:
            # We have no idea how big the array should be.
            raise UndefinedSizeArray
    
    def __setitem__(self, i, x):
        """This implements the a[3] = method"""
        self.a[i] = x
    
    def __getitem__(self, i):
        """This implements the a[]"""
        return self.a[i]
    
    def __len__(self):
        """Implements the len() operator."""
        return len(self.a)
                    

In [2]:
a = Array(1, 2, 3, 5)
print(a[3])
a[2] = 23
print(a[2])

5
23
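
A side note on the `[initial_value] * size` idiom used in `__init__`: it is safe for immutable values such as numbers, but it replicates references, not objects.  The sketch below (variable names are illustrative only) shows what would happen if the initial value were a mutable object.

```python
# Replicating an immutable value is safe: each slot is rebound independently.
zeros = [0.0] * 4
zeros[1] = 7.0
print(zeros)        # [0.0, 7.0, 0.0, 0.0]

# Replicating a mutable value copies the reference, not the object,
# so all three slots point at the same inner list.
shared = [[]] * 3
shared[0].append("x")
print(shared)       # [['x'], ['x'], ['x']]

# A list comprehension creates a fresh object for every slot.
independent = [[] for _ in range(3)]
independent[0].append("x")
print(independent)  # [['x'], [], []]
```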


This is a beginning, but it's not very useful yet.  We would like to be able to do more with an array, such as iterating over it, adding and subtracting arrays, and applying other mathematical operators.  You can refer to the [Python data model](https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types) for the list of methods that need to be implemented.  Here, for brevity, we will implement only addition and subtraction.  

In [0]:
class Array(object):
    
    def __init__(self, *args, initial_value=0., size=None):
        """If args are specified, they form the initial values for the array.  
        Otherwise, we need to specify a size."""
        if len(args) > 0:
            self.a = list(args)
        elif size is not None:
            # The * below is a trick to obtain a list of replicated elements.
            self.a = [initial_value] * size 
        else:
            # We have no idea how big the array should be.
            raise UndefinedSizeArray
            
    def __repr__(self):
        return repr(self.a)
    
    def __setitem__(self, i, x):
        """This implements the a[3] = method"""
        self.a[i] = x
    
    def __getitem__(self, i):
        """This implements the a[]"""
        return self.a[i]
    
    def __len__(self):
        """Implements the len() operator."""
        return len(self.a)
    
    def __iter__(self):
        for x in self.a:
            yield x
            
    def __add__(self, other):
        # This is just one way to implement add.  This implementation uses
        # the __len__, __getitem__, and __setitem__ methods above.
        r = Array(size=len(self))
        for i in range(len(self)):
            r[i] = self[i] + other[i]
        return r
            
    def __sub__(self, other):
        # Just for the sake of variety, we use a different implementation here.
        r = []
        for i, x in enumerate(self.a):
            r.append(x - other[i])
        return Array(*r)
            

In [4]:
a = Array(1, 2, 3, 4)
b = Array(0, 1, 0, 4)
print("a+b:", a + b)
print("a-b:", a - b)
for i, x in enumerate(a):
    print("Element", i, "is", x)

a+b: [1, 3, 3, 8]
a-b: [1, 1, 3, 0]
Element 0 is 1
Element 1 is 2
Element 2 is 3
Element 3 is 4


This gives us a taste of the task ahead: we need to implement at least the methods `__repr__`, `__len__`, `__getitem__`, `__setitem__`, `__iter__`, `__add__`, and `__sub__`. 
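
As a reminder of how these special methods are triggered, here is a minimal sketch.  `Traced` is a hypothetical class written only for this illustration: it wraps a list and records the name of each special method Python calls on it.

```python
class Traced:
    """Hypothetical list wrapper that logs each special method call."""

    def __init__(self, *args):
        self.a = list(args)
        self.calls = []

    def __len__(self):
        self.calls.append("__len__")
        return len(self.a)

    def __getitem__(self, i):
        self.calls.append("__getitem__")
        return self.a[i]

    def __setitem__(self, i, x):
        self.calls.append("__setitem__")
        self.a[i] = x

    def __add__(self, other):
        self.calls.append("__add__")
        # Works on the underlying lists directly, so nothing else is logged.
        return Traced(*[x + y for x, y in zip(self.a, other.a)])


t = Traced(1, 2, 3)
len(t)      # calls t.__len__()
t[0]        # calls t.__getitem__(0)
t[1] = 5    # calls t.__setitem__(1, 5)
t + t       # calls t.__add__(t)
print(t.calls)  # ['__len__', '__getitem__', '__setitem__', '__add__']
```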

## Sparse Arrays Using Dictionaries



How can we implement sparse arrays? One possible idea is to just store the elements that are different from the default, along with their indices.  And since elements are usually accessed by their index, we can store the "exceptions" in a mapping from indices to values.  This is _not_ as efficient in terms of space as it could be (a dictionary implementation has overhead), but if the elements that are non-default are few, it will work nevertheless. 
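
The trade-off can be observed with a rough measurement.  The sketch below assumes CPython, where `sys.getsizeof` reports the size of the container itself (not of the elements it refers to); the exact byte counts vary between Python versions, so none are quoted here.  The names `dense`, `sparse`, and `full_dict` are illustrative.

```python
import sys

n = 100_000
dense = [0.0] * n           # one slot per element, default or not
sparse = {2: 4.0, 7: 1.0}   # only the two non-default elements

print("dense list :", sys.getsizeof(dense), "bytes")
print("sparse dict:", sys.getsizeof(sparse), "bytes")

# The overhead shows up when nothing is default: a dict entry costs
# more than a list slot, so a dict holding every index is larger
# than the equivalent list.
full_dict = {i: float(i) for i in range(n)}
print("full dict  :", sys.getsizeof(full_dict), "bytes")
```

With only a few exceptions the dictionary is far smaller than the list; when most elements are non-default, the list wins.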

In [0]:
class UndefinedSizeArray(Exception):
    pass

class SparseArrayDict(object):
    
    def __init__(self, *args, default=0., size=None):
        """If args are specified, they form the initial values for the array.  
        Otherwise, we need to specify a size."""
        self.d = {}
        self.default = default
        if len(args) > 0:
            # We build a representation of the arguments args. 
            self.length = len(args)
            for i, x in enumerate(args):
                if x != default:
                    self.d[i] = x
        if size is not None:
            self.length = size
        elif len(args) > 0:
            self.length = len(args)
        else:
            raise UndefinedSizeArray
            
    def __repr__(self):
        """We try to build a nice representation."""
        if len(self) <= 10:
            # The list() function uses the iterator, which is 
            # defined below.
            return repr(list(self))
        else:
            s = "The array is a {}-long array of {},".format(
                self.length, self.default
            )
            s += " with the following exceptions:\n"
            ks = list(self.d.keys())
            ks.sort()
            s += "\n".join(["{}: {}".format(k, self.d[k]) for k in ks])
            return s
    
    def __setitem__(self, i, x):
        """This implements the a[3] = method"""
        assert isinstance(i, int) and i >= 0
        if x == self.default:
            # We simply remove any exceptions.
            if i in self.d:
                del self.d[i]
        else:
            self.d[i] = x
        # Adjusts the length: index i requires a length of at least i + 1.
        self.length = max(self.length, i + 1)
    
    def __getitem__(self, i):
        """This implements the a[]"""
        if i >= self.length:
            raise IndexError()
        return self.d.get(i, self.default)
    
    def __len__(self):
        return self.length
    
    def __iter__(self):
        # You may think this is a crazy way to iterate. 
        # But in fact, it's quite efficient; there is no
        # markedly better way.
        for i in range(len(self)):
            yield self[i]
    
    def storage_len(self):
        """This returns a measure of the amount of space used for the array."""
        return len(self.d)
                    

In [6]:
a = SparseArrayDict(size=10)
a[2] = 4
a[7] = 1
print(a)


[0.0, 0.0, 4, 0.0, 0.0, 0.0, 0.0, 1, 0.0, 0.0]


In [7]:
a = SparseArrayDict(size=100)
a[77] = 3
a[23] = 1
print(a)
print("len(a):", len(a))
print("storage_len(a):", a.storage_len())

The array is a 100-long array of 0.0, with the following exceptions:
23: 1
77: 3
len(a): 100
storage_len(a): 2


### Exercise: Implementation of arithmetic operations

Implement the `__add__` and `__sub__` methods for `SparseArrayDict`, avoiding looping over all the elements.  The efficient way to do this is: 
* First, create a new array for the result, with the _appropriate_ default value (which is not always 0!). 
* Then, figure out which elements are different from the default in one (or more) of the arrays being combined. 
* Lastly, for these non-default elements, set their value appropriately in the result array.

The refined way of solving this problem consists in factoring the common code of `__add__` and `__sub__` into a common method, since the two methods are so similar; otherwise, you can implement `__add__` first, and once you get it to work, cut and paste it, and do the few changes required to obtain `__sub__`. 

In [8]:
#@title Importing `nose`

# We write code by nose, mostly.
try:
    from nose.tools import assert_equal, assert_true, assert_false
    from nose.tools import assert_not_equal, assert_almost_equal
except ImportError:
    !pip install nose
    from nose.tools import assert_equal, assert_true, assert_false
    from nose.tools import assert_not_equal, assert_almost_equal


In [0]:
### Exercise: Implement add and sub for `SparseArrayDict`

# YOUR CODE HERE
def sparse_array_dict_add(self, other):
  s = SparseArrayDict(default = self.default + other.default, size = max(len(self), len(other)))
  for i in self.d:
    if i < len(other):
      s[i] = self[i] + other[i]
    else:
      s[i] = self[i] + other.default
  for j in other.d:
    if j < len(self):
      s[j] = self[j] + other[j]
    else:
      s[j] = self.default + other[j]
  return s

def sparse_array_dict_sub(self, other):
  dif = SparseArrayDict(default = self.default - other.default, size = max(len(self), len(other)))
  for i in self.d:
    if i < len(other):
      dif[i] = self[i] - other[i]
    else:
      dif[i] = self[i] - other.default
  for j in other.d:
    if j < len(self):
      dif[j] = self[j] - other[j]
    else:
      dif[j] = self.default - other[j]
  return dif

SparseArrayDict.__add__ = sparse_array_dict_add
SparseArrayDict.__sub__ = sparse_array_dict_sub

In [0]:
### Tests for arrays of the same length
# Let us test this with arrays of the same length first. 
a = SparseArrayDict(1, 3, 4, 5)
b = SparseArrayDict(5, 4, 3, 2)
c = a + b
assert isinstance(c, SparseArrayDict)
assert_equal(c[0], 6)
assert_equal(c[1], 7)
assert_equal(c[3], 7)


In [0]:
### Tests for arrays of the same length, different default
a = SparseArrayDict(default=1, size=10)
b = SparseArrayDict(default=2, size=10)
a[1] = 3
a[4] = 5
b[4] = 6
b[5] = 8
c = a + b
assert_equal(c[0], 3) # This is due to the defaults.
assert_equal(c[1], 5)
assert_equal(c[2], 3)
assert_equal(c[4], 11)
assert_equal(c[5], 9)
assert isinstance(c, SparseArrayDict)

In [0]:
### Tests for arrays of different length and default

a = SparseArrayDict(default=1, size=10)
b = SparseArrayDict(default=2, size=20)
a[1] = 3
a[4] = 5
b[4] = 6
b[15] = 2
c = a + b
assert_equal(len(c), 20)
assert_equal(c[0], 3)
assert_equal(c[1], 5)
assert_equal(c[2], 3)
assert_equal(c[4], 11)
assert_equal(c[15], 3)
assert isinstance(c, SparseArrayDict)

In [0]:
### Some tests for subtraction.

a = SparseArrayDict(default=1, size=10)
b = SparseArrayDict(default=2, size=20)
a[1] = 3
a[4] = 7
b[4] = 6
b[15] = -2
c = a - b
assert_equal(len(c), 20)
assert_equal(c[0], -1)
assert_equal(c[1], 1)
assert_equal(c[2], -1)
assert_equal(c[4], 1)
assert_equal(c[15], 3)
assert isinstance(c, SparseArrayDict)

### Exercise: Implement equality for `SparseArrayDict`

If we have two `SparseArrayDict` objects with the same content, they are not considered equal:

In [14]:
a = SparseArrayDict(3, 4)
b = SparseArrayDict(3, 4)
a == b

False

This happens because, in Python, by default objects are considered equal iff they are the same object, not if the object's content is the same.  
If we want a content-based definition of equality, we must define it ourselves, by defining a method

    def __eq__(self, other):
        ...

that returns True for objects we wish to consider equal, and False for objects that we wish to consider different. 
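
The difference between the two notions of equality can be seen on a toy example.  `PlainPoint` and `ValuePoint` are hypothetical classes introduced only for this sketch; they are unrelated to the sparse arrays.

```python
class PlainPoint:
    """No __eq__ defined: == falls back to object identity."""

    def __init__(self, x, y):
        self.x, self.y = x, y


class ValuePoint:
    """Defines __eq__: == compares content."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, ValuePoint):
            # Lets Python try the other operand, then fall back to identity.
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


p = PlainPoint(1, 2)
print(p == p)                                # True  (same object)
print(PlainPoint(1, 2) == PlainPoint(1, 2))  # False (different objects)
print(ValuePoint(1, 2) == ValuePoint(1, 2))  # True  (same content)
print(ValuePoint(1, 2) == ValuePoint(1, 3))  # False (different content)
```

One side effect worth knowing: a class that defines `__eq__` without also defining `__hash__` becomes unhashable, so its instances cannot be used as dictionary keys or set members.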

We ask you to implement a notion of equality that yields True if and only if the two arrays behave the same when considered as numerical arrays.  This is a slightly difficult question: you have to think very carefully about what _all_ the operations we have defined do, including the addition operation above, and including the `__setitem__` operation.  Think about how `SparseArrayDict` behaves in operations involving arrays of different lengths... 

In [0]:
### Exercise: Define `SparseArrayDict.__eq__`

def sparse_array_dict_eq(self, other):
    """Definition of equality for SparseArrayDict.
    It can be done in 8 lines of code."""
    # YOUR CODE HERE
    # Values are compared with != throughout: "is not" tests object
    # identity, which is unreliable for numbers.
    if len(self) != len(other):
        return False
    elif len(self.d) == len(other.d):
        return self.d == other.d and (self.default == other.default or len(self) == len(self.d))
    elif (len(self.d) > len(other.d)) and (self.default == other.default or len(self.d) == len(self)):
        for i in self.d:
            if self.d[i] != other[i]:
                return False
        return True
    elif (len(other.d) > len(self.d)) and (self.default == other.default or len(other.d) == len(other)):
        for i in other.d:
            if other.d[i] != self[i]:
                return False
        return True
    else:
        return False

SparseArrayDict.__eq__ = sparse_array_dict_eq

In [16]:
### Here, you are encouraged to write your own tests, to help you debug. 

# YOUR CODE HERE
a = SparseArrayDict(1, 4, 2, 2, 2)
b = SparseArrayDict(1, 4, default = 2, size = 5)
a == b



True

In [0]:
### First, the obvious tests.

a = SparseArrayDict(3, 4, 5)
b = SparseArrayDict(3, 4, 5)
c = SparseArrayDict(3, 5, 5)
d = SparseArrayDict(3, 4, 5, 6)
assert_equal(a, b)
assert_not_equal(a, c)
assert_not_equal(a, d)

In [0]:
### Advanced tests

# Here, hidden from you, are some advanced tests.


In [0]:
### Advanced tests

# Here, hidden from you, are some advanced tests.


In [0]:
### Advanced tests

# Here, hidden from you, are some advanced tests.


In [0]:
### Advanced tests

# Here, hidden from you, are some advanced tests.
