# Array implementation



`Array` is one of the most basic data structures. In Python, `list` is similar to an array; a `list` is an array on steroids. Next, we construct an `Array` class with the following functionality:

| Method in the Array Class      | Example |
| ----------- | ----------- |
| `__init__(capacity, fillValue = None)`      | `a = Array(10)`       |
| `__len__()`   | `len(a)`        |
| `__str__()` | `str(a)`|
| `__iter__()` | `for i in a:`|
| `__getitem__()` | `a[0]`|
| `__setitem__()` | `a[0] = 100`|


All of this functionality is implemented in the class behind the scenes. The double-underscore methods such as `__len__()` are commonly called *dunder methods* (double underscore methods). Python calls them implicitly: `len(a)`, for example, calls `a.__len__()`.
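To see a dunder method being called implicitly, here is a minimal sketch with a hypothetical `Countdown` class (not part of the `Array` class below) that defines only `__len__` and `__getitem__`:

```python
class Countdown:
    """Hypothetical example class: counts down from `start` to 1."""

    def __init__(self, start):
        self.start = start

    def __len__(self):
        # called implicitly by len(c)
        return self.start

    def __getitem__(self, index):
        # called implicitly by c[index]
        if not 0 <= index < self.start:
            raise IndexError(index)
        return self.start - index


c = Countdown(3)
print(len(c))   # 3
print(c[0])     # 3
print(list(c))  # [3, 2, 1]
```

There is no `__iter__` here, so `list(c)` falls back to calling `__getitem__` with 0, 1, 2, ... until `IndexError` is raised.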

> Note: Python's `None` plays the role of `null` in other programming languages.

```python
class Array(object):
    """A fixed-capacity (static) array backed by a Python list."""

    def __init__(self, capacity, fillValue=None):
        """
        Args:
            capacity (int): capacity of the static array
            fillValue (object, optional): placeholder stored in every slot
                when the static array is created. Defaults to None.
        """
        self.items = list()
        for count in range(capacity):
            self.items.append(fillValue)

    def __len__(self):
        """The capacity of the array. This method is what len(a) calls."""
        return len(self.items)

    def __str__(self):
        """String representation of the array; str(a) and print(a) call this."""
        return str(self.items)

    def __iter__(self):
        """Support traversal with a for loop."""
        return iter(self.items)

    def __getitem__(self, index):
        """
        Subscript operator for access at index, e.g. a[0].
        Args:
            index (int): index into the array
        Returns:
            object: the value stored at index
        """
        return self.items[index]

    def __setitem__(self, index, newItem):
        """
        Subscript operator for replacement at index, e.g. a[0] = 100.
        Args:
            index (int): index into the array
            newItem (object): the value to store at index
        """
        self.items[index] = newItem
```

You can test this by commenting out some methods in the `Array` class: the corresponding functionality disappears.

```python
# create an Array of capacity 10
a = Array(capacity=10)

# test __str__
print(a)     # [None, None, None, None, None, None, None, None, None, None]

# test __getitem__
print(a[1])  # None
```
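To see what "commenting out" a method does, here is a sketch with a hypothetical stripped-down class, `ArrayWithoutLen`, that keeps only `__init__` and `__getitem__`:

```python
class ArrayWithoutLen:
    """Hypothetical stripped-down Array: __len__ and __str__ are 'commented out'."""

    def __init__(self, capacity, fillValue=None):
        self.items = [fillValue] * capacity

    def __getitem__(self, index):
        return self.items[index]


b = ArrayWithoutLen(3)
print(b[0])  # None: __getitem__ still works

try:
    len(b)
except TypeError as error:
    print(error)  # object of type 'ArrayWithoutLen' has no len()

# without __str__, Python falls back to the default "<... object at 0x...>" form
print(str(b).startswith("<"))  # True
```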


## Index operation
An index operation such as `a[2]` runs in $O(1)$ time in Python. Indexing belongs to the family of **random access** operations, which is made possible by storing the array in a block of **contiguous memory**.

Suppose you have an array of length 5 whose elements each occupy one memory cell. Its layout is:

|Machine address|Array index|
|-|-|
|10011101|0|
|10011110|1|
|10011111|2|
|10100000|3|
|10100001|4|

Because the block is allocated contiguously, the machine addresses are consecutive, so an index operation only needs to add the offset (the index) to the base address. For `a[2]`:
$$
\begin{equation}
    10011101_2 + 2_{10} = 10011111_2
\end{equation}
$$

Breaking down an index operation on an array:
1. fetch the base address of the array's contiguous memory block. $O(1)$
2. return the result of adding the index to this address. $O(1)$

So the overall time complexity is $O(1)$.
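The base-plus-offset arithmetic can be sketched in Python. Real elements are usually wider than one memory cell, so the offset is `index * element size`; the table above assumes an element size of 1. This CPython-specific sketch uses the standard-library `array` module, whose `buffer_info()` returns the base address of its contiguous buffer, and `ctypes` to read memory at a computed address. `element_address` is a hypothetical helper, for illustration only.

```python
import array
import ctypes


def element_address(base_address, index, element_size):
    """Address of element `index`: base address + index * element size."""
    return base_address + index * element_size


# the worked example above: one-cell elements, base address 10011101 (binary)
assert element_address(0b10011101, 2, 1) == 0b10011111

# a real contiguous buffer of C ints
arr = array.array('i', [10, 20, 30, 40, 50])
base, length = arr.buffer_info()  # (base address, number of elements)

# read arr[2] straight from memory using only base + offset
address = element_address(base, 2, arr.itemsize)
print(ctypes.c_int.from_address(address).value)  # 30
```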

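The counterpart of random access (also called direct access) is sequential access, which takes $O(n)$ time to reach an arbitrary element, because every element before it must be visited first.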
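A minimal sketch of sequential access, using a hypothetical `sequential_get` helper that walks an iterator from the start (like reading a tape), so reaching index `k` takes `k + 1` steps, compared with a single index operation:

```python
def sequential_get(iterable, k):
    """Reach element k by visiting elements 0, 1, ..., k in order.

    Returns (element, number of elements visited).
    """
    visited = 0
    for item in iterable:
        visited += 1
        if visited == k + 1:
            return item, visited
    raise IndexError(k)


data = [10, 20, 30, 40, 50]
print(sequential_get(iter(data), 3))  # (40, 4): four steps to reach index 3
print(data[3])                        # 40: random access, one step
```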

<img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Random_vs_sequential_access.svg/440px-Random_vs_sequential_access.svg.png">

## Static and Dynamic Memory
So far we have discussed the `static array`, whose size is fixed once it is set at creation. This is inconvenient: when programmers start writing a program, they usually don't know how much space they will need. Allocating too much space wastes memory, and allocating too little means the array has to grow. The `dynamic array` was introduced to solve this problem. In Python, the dynamic array is called `list`; both `list.append()` and `list.insert()` resize it dynamically when needed. The process works as follows:
- create an array with a reasonable default size at program start-up (an initial guess)
- if the array cannot hold more data:
  - create a new, larger array
  - transfer data from the old array to the new, larger one
- if the array is wasting memory:
  - create a smaller array
  - transfer data from the old array to the new, smaller one

All of this happens behind the scenes.
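You can observe this over-allocation in CPython with `sys.getsizeof`, which reports the memory a list object occupies. A sketch, assuming CPython: the exact byte counts and growth pattern are implementation details that vary between versions, so none are shown here.

```python
import sys

lst = []
sizes = []
for i in range(64):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# the reported size only jumps at some appends: the list grows its
# physical size in chunks instead of by one slot per append
resize_points = [i for i in range(1, len(sizes)) if sizes[i] != sizes[i - 1]]
print(len(resize_points), "resizes during the last 63 appends")
print(resize_points)  # values whose append changed the reported size
```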

## Physical size and logical size

So how do we decide when to grow or shrink the array? By comparing how much of the allocated memory is actually in use.

![Physical size vs. logical size of an array](./imgs/array_logical_size.png)

- `physical size`: the total number of cells the array occupies in memory (its capacity)
- `logical size`: the number of cells that hold meaningful data (for example, non-null items)

The ratio of the two, also called the load factor, tells us whether the array needs to grow or shrink:
$$
\begin{align}
ratio = \frac{\mathrm{logical\,size}}{\mathrm{physical\,size}}
\end{align}
$$
If the ratio is 1, the array is full and must grow. Exactly when to grow or shrink depends on the criteria you define.
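A minimal sketch of such criteria. The thresholds are assumptions that roughly mirror the shrinking code later in this section (grow when full; shrink when at most a quarter full and the physical size is at least twice a default capacity); `DEFAULT_CAPACITY = 10`, `load_factor`, and `resize_decision` are hypothetical.

```python
DEFAULT_CAPACITY = 10  # hypothetical default physical size


def load_factor(logical_size, physical_size):
    """Ratio of cells in use to cells allocated."""
    return logical_size / physical_size


def resize_decision(logical_size, physical_size):
    """Return 'grow', 'shrink', or 'keep' under the assumed criteria."""
    ratio = load_factor(logical_size, physical_size)
    if ratio == 1:
        return "grow"
    if ratio <= 0.25 and physical_size >= 2 * DEFAULT_CAPACITY:
        return "shrink"
    return "keep"


print(resize_decision(10, 10))  # grow
print(resize_decision(5, 40))   # shrink
print(resize_decision(7, 10))   # keep
```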

## Operations on Arrays
We have already discussed the cost of indexing. Next, we discuss how to:
- increase the size of an array
- decrease the size of an array
- insert an item into an array
- remove an item from an array
- removing an item from an array


### Growing an array
When an array's logical size equals its physical size, the array must grow dynamically. In a Python list, this happens implicitly when `insert()` or `append()` is called. The steps are:
- create a new array
- copy data from the old array to the new array
- reset the old array variable to the new array object.

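Consider the following code:
```python
# setup (hypothetical): a full Array of capacity 5, using the Array class above
a = Array(5)
logicalSize = 5

if logicalSize == len(a):
    # create a new array with physical size + 1
    temp = Array(len(a) + 1)
    # copy data from the old array to the new array: O(n)
    for i in range(logicalSize):
        temp[i] = a[i]
    # reset the old array variable to the new array object: O(1)
    a = temp
```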

The old array's memory is then reclaimed by the garbage collector. In the code above, the physical size grows by 1 at each resize, and each resize costs $O(n)$. Growing an array from size 1 to size n in increments of 1 therefore costs:
$$
\begin{align}
1+2+3+\dots +n = \frac{n(n+1)}{2} = O(n^2)
\end{align}
$$
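The space complexity is $O(n)$. To do better, increase the growth increment. Suppose we double the capacity each time:
```python
# setup (hypothetical): a full Array of capacity 5, using the Array class above
a = Array(5)
logicalSize = 5

if logicalSize == len(a):
    # create a new array with physical size x 2
    temp = Array(len(a) * 2)
    # copy data from the old array to the new array: O(n)
    for i in range(logicalSize):
        temp[i] = a[i]
    # reset the old array variable to the new array object: O(1)
    a = temp
```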

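In this case, growing an array from size 1 to size n by doubling its physical size at each step takes only $k = \log_2 n$ resizes, and the total copying cost is
$$
\begin{align}
1 + 2 + 4 + 8 + \dots + n &= 2^0 + 2^1 + 2^2 + 2^3 + \dots + 2^k \quad (n = 2^k)\\
&= \frac{a_1(1-q^{k+1})}{1-q}\\
&= \frac{1-2^{k+1}}{1-2}\\
&= 2^{k+1} - 1 = 2n - 1\\
&= O(n)
\end{align}
$$
where $a_1 = 1$ is the first term and $q = 2$ the common ratio of the geometric series. Only $O(\log n)$ resizes happen, and the total work is linear, so each append costs $O(1)$ amortized.

Comparing the two growth strategies (total cost of growing from size 1 to size n):

|Growth strategy|Time complexity|Space complexity|
|-|-|-|
|Grow by 1 each time|$O(n^2)$|$O(n)$|
|Double each time|$O(n)$ ($O(1)$ amortized per append)|$O(n)$|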
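The two totals can be checked by counting element copies. This is a minimal simulation with hypothetical helpers (`copies_needed`, `grow_by_one`, `double`): start with physical size 1, append `n` items, and whenever the array is full, copy every existing item into a new array of the grown size.

```python
def grow_by_one(capacity):
    return capacity + 1


def double(capacity):
    return capacity * 2


def copies_needed(n, grow):
    """Count element copies made while appending n items, starting at capacity 1."""
    capacity, logical_size, copies = 1, 0, 0
    for _ in range(n):
        if logical_size == capacity:   # full: logical size == physical size
            copies += logical_size     # copy every existing item to the new array
            capacity = grow(capacity)
        logical_size += 1              # store the new item
    return copies


print(copies_needed(1024, grow_by_one))  # 523776 = 1023 * 1024 / 2, quadratic
print(copies_needed(1024, double))       # 1023 = n - 1, linear
```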




### Shrinking an array
Similarly, when an array's logical size falls below its physical size by a certain margin, the array should shrink. In a Python list, this happens implicitly when `pop()` is called. The steps are:
- create a new, smaller array
- copy data from the old array to the new array
- reset the old array variable to the new array object.

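The code:
```python
# setup (hypothetical): an Array of capacity 40 holding 8 items, using the Array class above
DEFAULT_CAPACITY = 10
a = Array(40)
logicalSize = 8

# shrink only when both conditions hold:
# 1. the logical size is at most a quarter of the physical size
# 2. the physical size is at least twice the default capacity
if logicalSize <= len(a) // 4 and len(a) >= DEFAULT_CAPACITY * 2:
    temp = Array(len(a) // 2)
    for i in range(logicalSize):
        temp[i] = a[i]
    a = temp
```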

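Similarly, a single shrink costs $O(n)$ to copy the items, but because a shrink only happens after many removals, the cost is $O(1)$ amortized per removal. The space complexity is $O(n)$.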
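Putting growing and shrinking together, here is a minimal sketch of a dynamic array. It is hypothetical: `DynamicArray`, `DEFAULT_CAPACITY = 4`, and the method names are illustrative, a fixed-length Python list stands in for the static array, and it applies the doubling and quarter-full criteria from this section rather than CPython's actual `list` growth policy.

```python
class DynamicArray:
    """Hypothetical dynamic array: doubles when full, halves when a quarter full."""

    DEFAULT_CAPACITY = 4  # hypothetical default physical size

    def __init__(self):
        self.items = [None] * self.DEFAULT_CAPACITY  # fixed-size storage
        self.logical_size = 0

    def physical_size(self):
        return len(self.items)

    def _resize(self, new_capacity):
        temp = [None] * new_capacity
        for i in range(self.logical_size):  # O(n) copy
            temp[i] = self.items[i]
        self.items = temp

    def append(self, item):
        if self.logical_size == len(self.items):  # full: grow by doubling
            self._resize(len(self.items) * 2)
        self.items[self.logical_size] = item
        self.logical_size += 1

    def pop(self):
        if self.logical_size == 0:
            raise IndexError("pop from empty array")
        self.logical_size -= 1
        item = self.items[self.logical_size]
        self.items[self.logical_size] = None
        # shrink when at most a quarter full and at least twice the default size
        if (self.logical_size <= len(self.items) // 4
                and len(self.items) >= self.DEFAULT_CAPACITY * 2):
            self._resize(len(self.items) // 2)
        return item


d = DynamicArray()
for i in range(16):
    d.append(i)
print(d.physical_size())                    # 16
for _ in range(13):
    d.pop()
print(d.logical_size, d.physical_size())   # 3 8
```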

### Insert into an array
Insertion has two cases:
- when logical size < physical size:
  - shuffle process
- when logical size = physical size:
  - grow the array
  - shuffle process

Here we discuss only the first case (logical size < physical size). The shuffle process is:
- check for available space before insertion; if there is not enough space, grow the array
- shift the items from the logical end of the array down to the target index position by one
- assign the new item to the target index position
- increment the logical size by one

![](./imgs/insert_array.png)

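The code:
```python
# setup (hypothetical): capacity 5, logical size 3, using the Array class above
a = Array(5)
for i, value in enumerate([10, 20, 40]):
    a[i] = value
logicalSize = 3
targetIndex, newItem = 2, 30

# shift items down by one position, starting from the logical end
for i in range(logicalSize, targetIndex, -1):
    a[i] = a[i - 1]

# add new item and increment logical size
a[targetIndex] = newItem
logicalSize += 1
```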
|Operation|Time complexity|Space complexity|
|-|-|-|
|Insert only (no resize)|$O(n)$|$O(n)$|
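The snippet above covers only the first case. Here is a minimal sketch of a hypothetical `insert` helper that also handles the second case by doubling the physical size first (as in the growth section), again using a fixed-length Python list as the static array:

```python
def insert(items, logical_size, target_index, new_item):
    """Insert new_item at target_index; return (items, new logical size).

    `items` is a fixed-length list standing in for a static array.
    """
    if not 0 <= target_index <= logical_size:
        raise IndexError(target_index)
    # case 2: the array is full, so grow (double) before shuffling
    if logical_size == len(items):
        temp = [None] * max(1, len(items) * 2)
        for i in range(logical_size):
            temp[i] = items[i]
        items = temp
    # shift items from the logical end down to target_index, last item first
    for i in range(logical_size, target_index, -1):
        items[i] = items[i - 1]
    items[target_index] = new_item
    return items, logical_size + 1


arr, size = [1, 2, 4, None], 3
arr, size = insert(arr, size, 2, 3)  # case 1: room available
print(arr, size)  # [1, 2, 3, 4] 4
arr, size = insert(arr, size, 0, 0)  # case 2: full, grows to 8 first
print(arr, size)  # [0, 1, 2, 3, 4, None, None, None] 5
```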


### Remove in an array
Removal is simpler than insertion, because it usually does not require shrinking the array:
- shift the items from the one following the target index position to the logical end of the array up by one.
- decrement the logical size by one
- check for wasted space and decrease the physical size of the array if necessary

![](./imgs/remove_array.png)

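```python
# setup (hypothetical): capacity 5, logical size 4, using the Array class above
a = Array(5)
for i, value in enumerate([10, 20, 30, 40]):
    a[i] = value
logicalSize = 4
targetIndex = 1

# shift items up by one position
for i in range(targetIndex, logicalSize - 1):
    a[i] = a[i + 1]
# decrement logical size
logicalSize -= 1
# decrease size of array, if necessary (see "Shrinking an array")
```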
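The snippet above leaves the final shrink step as a comment. Here is a minimal sketch of a hypothetical `remove` helper that performs it, using the quarter-full criterion from the shrinking section; `DEFAULT_CAPACITY = 4` is an assumed value and a fixed-length Python list stands in for the static array.

```python
DEFAULT_CAPACITY = 4  # hypothetical default physical size


def remove(items, logical_size, target_index):
    """Remove the item at target_index; return (removed item, items, new logical size)."""
    if not 0 <= target_index < logical_size:
        raise IndexError(target_index)
    removed = items[target_index]
    # shift items after target_index up by one, front to back
    for i in range(target_index, logical_size - 1):
        items[i] = items[i + 1]
    logical_size -= 1
    items[logical_size] = None  # clear the now-unused last slot
    # shrink when at most a quarter full and at least twice the default size
    if logical_size <= len(items) // 4 and len(items) >= DEFAULT_CAPACITY * 2:
        temp = [None] * (len(items) // 2)
        for i in range(logical_size):
            temp[i] = items[i]
        items = temp
    return removed, items, logical_size


arr, size = [10, 20, 30, None, None, None, None, None], 3
removed, arr, size = remove(arr, size, 0)
print(removed, arr, size)  # 10 [20, 30, None, None] 2
```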

### Complexity Trade-off in array

|data structure|Access|Search|Insertion|Deletion|
|-|-|-|-|-|
|array best|$O(1)$|$O(1)$|$O(1)$|$O(1)$|
|array worst|$O(1)$|$O(n)$|$O(n)$|$O(n)$|
|array average|$O(1)$|$O(n)$|$O(n)$|$O(n)$|

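The time complexities above assume that no growing or shrinking is needed; coding-practice problems usually do not take resizing into account.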
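The Search column refers to finding a value in an unsorted array. A minimal sketch with a hypothetical `linear_search` that counts comparisons: the best case finds the target at index 0, the worst case scans all n items.

```python
def linear_search(items, logical_size, target):
    """Return (index of target or -1, number of comparisons made)."""
    comparisons = 0
    for i in range(logical_size):
        comparisons += 1
        if items[i] == target:
            return i, comparisons
    return -1, comparisons


data = [7, 3, 9, 1, 5]
print(linear_search(data, len(data), 7))   # (0, 1): best case, O(1)
print(linear_search(data, len(data), 42))  # (-1, 5): worst case, O(n)
```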

## Conceptual exercises
- Explain how random access works and why it is so fast.
  - When an array is created, a block of contiguous memory is assigned to it. You only need the `base memory location` of this block and the `offset` (how far the desired index is from index 0) to access the data directly, so the time complexity is $O(1)$.
- What are the differences between an array and a Python list?
  - An array usually means a static array with a fixed capacity: when it is underutilized it must be shrunk, and when it runs out of room it must be grown. A Python list is an array on steroids: an implementation of a dynamic array that handles increasing and decreasing capacity behind the scenes.
- Explain the difference between the physical size and logical size of an array.
  - The physical size is the amount of memory you assign to the array when you create it. The logical size is the number of items in the array that hold meaningful data.
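- Explain why some items in an array might have to be shifted when a given item is inserted or removed.
  - Items after the insertion point must shift one position toward the end to make room for the new item, which costs $O(n)$. Similarly, after a removal the items following the removed one shift one position toward the front to close the gap.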
- When the programmer shifts array items during an insertion, which item is moved first, the one at the insertion point or the last item? Why?
  - The last item. During insertion, you shift from the logical end backwards until the insertion position. That way, no item is overwritten before it has been moved.
- State the run-time complexity for inserting an item when the insertion point is the logical size of the array.
  - If the insertion point is the logical size of the array and space is available (physical size > logical size), insertion is $O(1)$, the best-case scenario.
- An array currently contains 14 items, and its load factor is 0.70. What is its physical capacity?
  - $\frac{14}{0.7} = 20$

# Reference
For more details, type `help(list)` to read the documentation of Python's built-in `list` class.