<center><img src="img/dsa-logo.JPG" width="400"/>

***

<center>Lecture 4</center>

***

<center>Array Based Sequences</center>  

***

<center>26 September 2023</center>
<center>Rahman Peimankar</center>

# Agenda

1. Low-Level Arrays
2. Dynamic Arrays
3. Efficiency of Python’s Sequence Types
4. Array-Based Sequences
5. Exercises

# Recap of Last Week

## 1. Big-O Notation

<center>
<img src="img/Qimage-1-lecture3.JPG" width="1100"/>

**Ideally**

1. We would like data structure operations to run in times proportional to the *constant* or *logarithm* function.
2. We would like our algorithms to run in *linear* or *n-log-n* time.

### Worst vs Best Case Complexity

* Usually, when someone asks about the complexity of an algorithm, they are asking about its worst-case complexity.

In [2]:
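def search_algo(num, items):
    """Linear search: return True if num occurs in items."""
    for item in items:
        if item == num:
            return True   # best case: match at the first position
    return False          # worst case: every item was checked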
nums = [2, 4, 6, 8, 10]

print(search_algo(2, nums))

True
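To make the difference between the best case and the worst case concrete, the following sketch counts the comparisons made by a linear search. The helper name `count_comparisons` is hypothetical and is introduced only for illustration.

```python
def count_comparisons(num, items):
    """Return (found, comparisons) for a linear search of num in items."""
    comparisons = 0
    for item in items:
        comparisons += 1
        if item == num:
            return True, comparisons
    return False, comparisons

nums = [2, 4, 6, 8, 10]
print(count_comparisons(2, nums))   # best case: the first element matches
print(count_comparisons(10, nums))  # match at the last position
print(count_comparisons(7, nums))   # worst case: no match, every element checked
```

The worst case needs one comparison per element, which is why linear search is described as *O(n)*.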


### Space Complexity

* In addition to time complexity, which counts the number of steps an algorithm needs to finish, you can also measure its space complexity.
* Space complexity refers to the amount of memory an algorithm needs to allocate during its execution, as a function of the input size.

In [None]:
def return_squares(n):
    square_list = []
    for num in n:
        square_list.append(num * num)

    return square_list

nums = [2, 4, 6, 8, 10]
print(return_squares(nums))
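For contrast, here is a minimal sketch that assumes we only need the sum of the squares, not the squares themselves. The hypothetical function `sum_of_squares` keeps a single running total, so its extra space is constant, whereas a function that returns a list of all squares needs extra space that grows linearly with the input.

```python
def sum_of_squares(nums):
    """Return the sum of the squares of nums using O(1) extra space."""
    total = 0                  # one accumulator, regardless of len(nums)
    for num in nums:
        total += num * num
    return total

nums = [2, 4, 6, 8, 10]
print(sum_of_squares(nums))
```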

## 2. Recursion

The Factorial Function:

<center>
<img src="img/Qimage-4-lecture3.JPG" width="600"/>

In [24]:
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)
    
factorial(4)


24

<center>
<img src="img/Qimage-6-lecture3.JPG" width="900"/>

https://pythontutor.com/

<center>
    
# 1. Low-Level Arrays

* Here, we explore Python’s various **_sequence_** classes, namely the built-in **_list_**, **_tuple_**, and **_str_** classes.
    1. Each of these classes supports indexing to access an individual element of a sequence, using a syntax such as ``seq[k]``
    2. Each uses a low-level concept known as an **_array_** to represent the sequence.
    3. However, there are significant differences in the abstractions that these classes represent.

We learn both the public behavior and inner workings of these classes, because

* these classes are used so widely in Python programs.
* they will become building blocks upon which we will develop more complex data structures.

### Memory address

* The primary memory of a computer is composed of bits of information.
* These bits are typically grouped into larger units called **_bytes_**. (1 byte = 8 bits)

<center>
<img src="img/Qimage-1.JPG" width="900"/>

* In general, a programming language keeps track of the association between an identifier and the memory address in which the associated value is stored.
* A group of related variables can be stored one after another in a contiguous portion of the computer’s memory.
* We will denote such a representation as an **_array_**.

* In Python, each character is represented using the Unicode character set.
* As a working assumption, each Unicode character occupies 16 bits (i.e., 2 bytes). In practice, CPython 3.3 and later use 1, 2, or 4 bytes per character, depending on the contents of the string.


<center>
<img src="img/Qimage-2.JPG" width="1000"/>

### Referential Arrays

A list of names: 

<!-- <center>
<img src="img/Qimage-3.JPG" width="900"/> -->

In [5]:
lst = ['Rene', 'Joseph', 'Janet', 'Jonas', 'Helen', 'Virginia']
lst

['Rene', 'Joseph', 'Janet', 'Jonas', 'Helen', 'Virginia']

* To represent such a list with an array, Python must adhere to the requirement that each cell of the array use the same number of bytes.
* Strings naturally have different lengths!

Python could attempt to reserve enough space for each cell to hold the maximum length string, **but that would be wasteful.**

**Can you guess how Python stores a list of elements?**

Python represents a list instance using an internal storage mechanism of an array of object **_references_**.

<center>
<img src="img/Qimage-5.JPG" width="600"/>

* A single list instance may include multiple references to the same object as elements of the list.
* It is possible for a single object to be an element of two or more lists.

``temp = primes[3:6]``

<center>
<img src="img/Qimage-6.JPG" width="600"/>

``temp[2] = 15``

<center>
<img src="img/Qimage-7.JPG" width="600"/>

* This does not change the existing integer object.
* It changes the reference in cell 2 of the ``temp`` list to reference a different object.
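The behavior can be checked with a short sketch. The contents of `primes` are an assumption made for this example.

```python
primes = [2, 3, 5, 7, 11, 13, 17, 19]   # assumed contents for illustration
temp = primes[3:6]                      # new list that references the same three objects

shared_before = temp[2] is primes[5]    # both cells reference the same integer object
temp[2] = 15                            # rebinds one cell of temp only

print(shared_before)
print(temp)
print(primes)                           # primes is unaffected
```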

``data = [0] * 8``

``data[2] += 1``

<center>
<img src="img/Qimage-8.JPG" width="600"/>
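A small sketch of the same idea: after ``[0] * 8`` all eight cells reference the same integer object. Because integers are immutable, ``data[2] += 1`` computes a new integer and rebinds only that one cell.

```python
data = [0] * 8
same_object = all(cell is data[0] for cell in data)  # every cell references one object

data[2] += 1                                         # rebinds cell 2 to a different integer
print(same_object)
print(data)
```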

``primes.extend(extras)``

<center>
<img src="img/Qimage-9.JPG" width="700"/>

* The extended list does not receive copies of those elements; it receives references to those elements.
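The sharing of references becomes visible when the elements are mutable. In this sketch the names `base` and `extras` and their contents are assumptions chosen for illustration.

```python
base = [[2], [3]]
extras = [[23], [29]]        # one-element lists, used because they are mutable

base.extend(extras)          # base receives references, not copies
shared = base[-1] is extras[-1]

extras[-1].append(31)        # mutate the object through extras
print(shared)
print(base[-1])              # the change is visible through base as well
```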

<center>
    
# 2. Dynamic Arrays


<center>
<img src="img/Qimage-4.JPG" width="900"/>

* The capacity of an array cannot trivially be increased by expanding into subsequent cells. **Why?**

1. For a Python ``tuple`` or ``str`` instance, this constraint is not a problem.
2. Instances of those classes are immutable, so the size of the underlying array can be fixed when the object is created.

**Important**: Although a list has a particular length when constructed, the class allows us to add elements to the list, with no apparent limit on the overall capacity of the list.

* To provide this abstraction, Python relies on an algorithmic technique known as a **_dynamic array_**.

In [4]:
import sys                  # provides getsizeof function
data = []
for k in range(10):         # 10 appends; raise this to observe more resizes
    a = len(data)           # number of elements
    b = sys.getsizeof(data) # actual size in bytes
    print('Length: {0:3d}; Size in bytes: {1:4d}'.format(a, b))
    data.append(None)       # increase length by one
    

Length:   0; Size in bytes:   56
Length:   1; Size in bytes:   88
Length:   2; Size in bytes:   88
Length:   3; Size in bytes:   88
Length:   4; Size in bytes:   88
Length:   5; Size in bytes:  120
Length:   6; Size in bytes:  120
Length:   7; Size in bytes:  120
Length:   8; Size in bytes:  120
Length:   9; Size in bytes:  184


* We see that an empty list instance already requires a certain number of bytes of memory (56 on our system).
* Each object in Python maintains some state, for example, a reference to denote the class to which it belongs.

* As soon as the first element is inserted into the list, we detect a change in the underlying size of the structure (**jump from 56 to 88, i.e. 32 bytes**).

* This code was run on a 64-bit machine, meaning that each memory address is a 64-bit number (i.e., 8 bytes).
* This means that the increase of 32 bytes reflects the allocation of an underlying array capable of storing four object references.
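The same reasoning can be automated. The sketch below records every point at which the size in bytes changes and converts the jump into a number of reference slots. The helper name `growth_points` is hypothetical, and the exact numbers depend on the Python version and platform.

```python
import struct
import sys

POINTER_SIZE = struct.calcsize('P')     # bytes per memory address on this machine

def growth_points(n):
    """Return (length, extra_bytes) for each append that enlarged the list object."""
    data = []
    last = sys.getsizeof(data)
    points = []
    for _ in range(n):
        data.append(None)
        size = sys.getsizeof(data)
        if size != last:                # the underlying array was reallocated
            points.append((len(data), size - last))
            last = size
    return points

for length, extra in growth_points(20):
    print('Length {0:3d}: +{1} bytes = {2} reference slots'.format(
        length, extra, extra // POINTER_SIZE))
```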

* If an element is appended to a list at a time when the underlying array is full, we perform the following steps:

    1. Allocate a new array B with larger capacity.
    2. Set ``B[i] = A[i]`` for *i = 0, ..., n−1*, where *n* denotes the current number of items.
    3. Set ``A = B``, that is, we henceforth use *B* as the array supporting the list.
    4. Insert the new element in the new array.

<center>
<img src="img/Qimage-10.JPG" width="800"/>

(a) create new array B 

(b) store elements of A in B

(c) reassign reference A to the new array.
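To get a feeling for how much copying these steps cause, the following sketch simulates a sequence of appends and counts the elements copied during resizes. The starting capacity of 1 and the doubling rule are assumptions of this sketch, not a description of the growth policy that CPython actually uses.

```python
def count_copies(n, growth=2):
    """Simulate n appends; return the total number of elements copied during resizes."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:      # array is full: allocate a bigger one
            copies += size        # every existing element is copied once
            capacity *= growth
        size += 1                 # store the new element
    return copies

for n in (10, 100, 1000):
    print(n, count_copies(n))
```

With doubling, the total number of copies stays below *2n*, so the average cost per append is bounded by a constant.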

<center>
    
# 3. Efficiency of Python’s Sequence Types

### Python’s List and Tuple Classes

* Tuples are typically more memory efficient than lists because they are immutable: there is no need for an underlying dynamic array with surplus capacity.

Efficiency of the **_nonmutating_** behaviors of the ``list`` and ``tuple`` classes:

<img src="img/Qimage-11.JPG" width="700"/>

    
* *n*, *n1*, and *n2* are the respective lengths of *data*, *data1*, and *data2*.
* *k* represents the index of the leftmost occurrence (with *k = n* if there is no occurrence).

Efficiency of the **_mutating_** behaviors of the ``list`` class:

<img src="img/Qimage-12.JPG" width="700"/>


### Adding Elements to a List

In [7]:
def insert(self, k, value):
    """Insert value at index k, shifting subsequent values rightward."""
    # (for simplicity, we assume 0 <= k <= n in this version)
    if self._n == self._capacity:         # not enough room
        self._resize(2 * self._capacity)  # so double capacity
    for j in range(self._n, k, -1):       # shift rightmost first
        self._A[j] = self._A[j-1]
    self._A[k] = value                    # store newest element
    self._n += 1
    

* The addition of one element may require a resizing of the dynamic array.
* The other expense for insert is the shifting of elements to make room for the new item.

**Note:** The time for that process depends upon the index of the new element, and thus the number of other elements that must be shifted.

Overall, this leads to an amortized ``O(n−k+1)`` performance for inserting at index *k*.
<center>
<img src="img/Qimage-13.JPG" width="700"/>

Average running time of insert(k, val), measured in microseconds, as observed over a sequence of N calls, starting with an empty list. 

**NOTE:** You may get different results on your machine if you repeat the same experiment!
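A minimal sketch of how such an experiment could be repeated with the standard `timeit` module. The function name `average_insert_time` and the choice of three insert positions are assumptions made for illustration, and the measured values will differ between machines.

```python
from timeit import timeit

def average_insert_time(n, where):
    """Average seconds per insert over n inserts into an initially empty list."""
    def build():
        data = []
        for _ in range(n):
            if where == 'start':
                k = 0                  # shifts every existing element
            elif where == 'middle':
                k = len(data) // 2     # shifts about half of the elements
            else:
                k = len(data)          # shifts nothing
            data.insert(k, None)
    return timeit(build, number=1) / n

for where in ('start', 'middle', 'end'):
    micros = average_insert_time(10000, where) * 1e6
    print('{0:>6}: {1:.3f} microseconds per insert'.format(where, micros))
```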

<center>
<img src="img/Qimage-14.JPG" width="700"/>

### Removing Elements from a List

Python’s list class offers several ways to remove an element from a list.
* A call to ``pop()`` removes the last element from a list.
* The parameterized version, ``pop(k)``, removes the element that is at index *k < n* of a list, shifting all subsequent elements leftward to fill the gap that results from the removal.

<center>
<img src="img/Qimage-15.JPG" width="700"/>

* The ``list`` class offers another method, named **_remove_**, that allows the caller to specify the **_value_** that should be removed (*not the index* at which it resides).
* Formally, it removes only the first occurrence of such a value from a list, or raises a ``ValueError`` if no such value is found.

In [57]:
def remove(self, value):
    """Remove first occurrence of value (or raise ValueError)."""
    # note: we do not consider shrinking the dynamic array in this version
    for k in range(self._n):
        if self._A[k] == value:              # found a match!
            for j in range(k, self._n - 1):  # shift others to fill gap
                self._A[j] = self._A[j+1]
            self._A[self._n - 1] = None      # help garbage collection
            self._n -= 1                     # we have one less item
            return                           # exit immediately
    raise ValueError('value not found')      # only reached if no match
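The same semantics can be observed with Python's built-in ``list``. The sample values are chosen only for illustration.

```python
data = ['a', 'b', 'a', 'c']
data.remove('a')             # removes only the first occurrence
print(data)

raised = False
try:
    data.remove('z')         # no such value in the list
except ValueError:
    raised = True
print(raised)
```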

### Extending a List

Python provides a method named ``extend`` that is used to add all elements of one list to the end of a second list.

In [None]:
# data.extend(other) produces the same outcome as this loop:
#     for element in other:
#         data.append(element)

# create a list
prime_numbers = [2, 3, 5]
# create another list
numbers = [1, 4]
# add all elements of prime_numbers to numbers
numbers.extend(prime_numbers)

print('List after extend():', numbers)

The greater efficiency of ``extend`` is threefold:
1. There is always some advantage to using an appropriate Python method, because those methods are often implemented natively in a compiled language.
2. There is less overhead to a single function call that accomplishes all the work, versus many individual function calls.
3. The resulting size of the updated list can be calculated in advance, so the underlying dynamic array needs to be resized at most once.

### Constructing New Lists

* There are several syntaxes for constructing new lists. 
* In almost all cases, the efficiency of the behavior is linear in the length of the list that is created.
* However, there are significant differences in the practical efficiency.

In [5]:
import time
start = time.time()
n = 10000
squares_comp = [k*k for k in range(1, n+1)]
end = time.time()
print(end-start)
# squares_comp


0.0008680820465087891


In [1]:
import time
start = time.time()
squares_app = []
for k in range(1, 10000+1):
    squares_app.append(k*k)
end = time.time()              # stop the clock once, after the loop
print(end-start)
# squares_app




Experiments should show that the list comprehension syntax is significantly faster than building the list by repeatedly appending.
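A single run timed with ``time.time()`` can be noisy. As a hedged alternative, the sketch below repeats each construction 100 times with the standard `timeit` module. The function names are hypothetical, and the timings depend on your machine.

```python
from timeit import timeit

def squares_comprehension(n):
    """Build the list of squares with a list comprehension."""
    return [k * k for k in range(1, n + 1)]

def squares_append(n):
    """Build the same list by repeated calls to append."""
    result = []
    for k in range(1, n + 1):
        result.append(k * k)
    return result

n = 10000
t_comp = timeit(lambda: squares_comprehension(n), number=100)
t_app = timeit(lambda: squares_append(n), number=100)
print('comprehension: {0:.4f} s; append loop: {1:.4f} s'.format(t_comp, t_app))
```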

Please study **_Python’s String Class_** from Chapter 5 of the textbook!

**Quiz 1**

What will be the output of the following code snippet?

``a=[1,2,3,4,5]``

``print(a[3:0:-1])``

1) Syntax error

2) [4, 3, 2]

3) [4, 3]

4) [4, 3, 2, 1]

Please answer here: https://PollEv.com/multiple_choice_polls/svPCPIeByRHYQBncUeEX6/respond

<center>
    
# 4. Array-Based Sequences

### Storing High Scores for a Game

* The first application we study is storing a sequence of high score entries for a video game.
* This is representative of many applications in which a sequence of objects must be stored.

In [2]:
class GameEntry:
    """Represents one entry of a list of high scores."""

    def __init__(self, name, score):
        self._name = name
        self._score = score

    def get_name(self):
        return self._name

    def get_score(self):
        return self._score

    def __str__(self):
        return '({0}, {1})'.format(self._name, self._score) # e.g., (Bob, 98)
    
ent1=GameEntry('rahman', 10)
print(ent1)

(rahman, 10)


* To maintain a sequence of high scores, we develop a class named ``Scoreboard``.
* A scoreboard is limited to a certain number of high scores that can be saved.
* Once that limit is reached, a new score only qualifies for the scoreboard if it is strictly higher than the lowest *high score* on the board.

* The length of the desired scoreboard may depend on the game, perhaps 10, 50, or 500.
* We allow the length to be specified as a parameter to our ``Scoreboard`` constructor.

<center>
<img src="img/Qimage-16.JPG" width="900"/>

In [82]:
class Scoreboard:
    """Fixed-length sequence of high scores in nonincreasing order (highest first)."""

    def __init__(self, capacity=10):
        """Initialize scoreboard with given maximum capacity.
           All entries are initially None.
        """
        self._board = [None] * capacity # reserve space for future scores
        self._n = 0                     # number of actual entries

    def __getitem__(self, k):
        """Return entry at index k."""
        return self._board[k]

    def __str__(self):
        """Return string representation of the high score list."""
        return '\n'.join(str(self._board[j]) for j in range(self._n))

    def add(self, entry):
        """Consider adding entry to high scores."""
        score = entry.get_score()

        # Does new entry qualify as a high score?
        # answer is yes if board not full or score is higher than last entry
        good = self._n < len(self._board) or score > self._board[-1].get_score()

        if good:
            if self._n < len(self._board): # no score drops from list
                self._n += 1               # so overall number increases

            # shift lower scores rightward to make room for new entry
            j = self._n - 1
            while j > 0 and self._board[j-1].get_score() < score:
                self._board[j] = self._board[j-1] # shift entry from j-1 to j
                j -= 1                            # and decrement j
            self._board[j] = entry                # when done, add new entry
        
board = Scoreboard()
board.add(ent1)
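To trace the insertion logic without the class machinery, here is a simplified sketch that stores entries as ``(name, score)`` tuples in a plain list. The function `add_score` and the sample entries are hypothetical and are not part of the ``Scoreboard`` class.

```python
def add_score(board, capacity, entry):
    """Insert (name, score) into board, kept highest-first with at most capacity entries.

    Return True if the entry made it onto the board.
    """
    score = entry[1]
    if len(board) < capacity:
        board.append(entry)              # room left: the board grows by one
    elif score > board[-1][1]:
        board[-1] = entry                # full: the lowest score drops off
    else:
        return False                     # does not qualify
    j = len(board) - 1
    while j > 0 and board[j-1][1] < score:
        board[j] = board[j-1]            # shift lower score rightward
        j -= 1
    board[j] = entry
    return True

board = []
for entry in [('Ann', 10), ('Bob', 30), ('Cy', 20), ('Dee', 5), ('Eve', 25)]:
    add_score(board, 3, entry)
print(board)
```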
        

### Sorting a Sequence

We now turn to the sorting problem: starting with an unordered sequence of elements, rearrange them into nondecreasing order.

### The Insertion-Sort Algorithm

<center>
<img src="img/Qimage-17.JPG" width="900"/>

In [21]:
def insertion_sort(A):
    """Sort list of comparable elements into nondecreasing order."""
    for k in range(1, len(A)):        # from 1 to n-1
        cur = A[k]                    # current element to be inserted
        j = k                         # find correct index j for current
        while j > 0 and A[j-1] > cur: # element A[j-1] must be after current
            A[j] = A[j-1]
            j -= 1
        A[j] = cur                    # cur is now in the right place
        

<center>
<img src="img/Qimage-18.JPG" width="900"/>
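To see how the input order affects the amount of work, the following sketch runs the same algorithm and counts how many elements are shifted. The function name `insertion_sort_shifts` and the counter are additions made for illustration.

```python
def insertion_sort_shifts(A):
    """Sort A in place into nondecreasing order and return the number of shifts."""
    shifts = 0
    for k in range(1, len(A)):
        cur = A[k]
        j = k
        while j > 0 and A[j-1] > cur:
            A[j] = A[j-1]         # shift a larger element one cell to the right
            j -= 1
            shifts += 1
        A[j] = cur
    return shifts

already_sorted = [0, 1, 2, 3, 4, 5]
reverse_sorted = [5, 4, 3, 2, 1, 0]
print(insertion_sort_shifts(already_sorted))   # best case: nothing to shift
print(insertion_sort_shifts(reverse_sorted))   # worst case: n(n-1)/2 shifts
```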

<center>
    
# 5. Exercises

**Ex1.**

Implement a ``pop`` method for the ``DynamicArray`` class that removes the last element of the array and shrinks the capacity, *N*, of the array by half any time the number of elements in the array drops below *N/4*.

In [None]:
import ctypes # provides low-level arrays

class DynamicArray:
    """A dynamic array class akin to a simplified Python list."""

    def __init__(self):
        """Create an empty array."""
        self._n = 0 # count actual elements
        self._capacity = 1 # default array capacity
        self._A = self._make_array(self._capacity) # low-level array

    def __len__(self):
        """Return number of elements stored in the array."""
        return self._n

    def __getitem__(self, k):
        """Return element at index k."""
        if not 0 <= k < self._n:
            raise IndexError('invalid index')
        return self._A[k] # retrieve from array

    def append(self, obj):
        """Add object to end of the array."""
        if self._n == self._capacity: # not enough room
            self._resize(2 * self._capacity) # so double capacity
        self._A[self._n] = obj
        self._n += 1

    def _resize(self, c): # nonpublic utility
        """Resize internal array to capacity c."""
        B = self._make_array(c) # new (bigger) array
        for k in range(self._n): # for each existing value
            B[k] = self._A[k]
        self._A = B # use the bigger array
        self._capacity = c

    def _make_array(self, c): # nonpublic utility
        """Return new array with capacity c."""
        return (c * ctypes.py_object)() # see ctypes documentation