# Array

An array is a container that holds a sequence of items, all of the same type. In Python, arrays come from the standard `array` module and are created as follows:

```python
arrayName = array(typecode, [initializers])
```

The typecode specifies the type of value the array will hold. Some common typecodes are:

| Typecode | Meaning |
|---|---|
| b | Signed integer of size 1 byte |
| B | Unsigned integer of size 1 byte |
| u | Unicode character (`wchar_t`, deprecated; Python 2's `c` code no longer exists) |
| i | Signed integer of at least 2 bytes (4 bytes on most platforms) |
| I | Unsigned integer of at least 2 bytes (4 bytes on most platforms) |
| f | Floating point of size 4 bytes |
| d | Floating point of size 8 bytes |

```python
# Create an array of signed integers named array1
from array import array

array1 = array('i', [10, 20, 30, 40, 50])
for x in array1:
    print(x)
```

```
10
20
30
40
50
```


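Because every item must match the typecode, an array rejects values it cannot store. A minimal sketch of what that looks like (the variable names `ints`, `small` and `errors` are illustrative only):

```python
from array import array

ints = array('i', [10, 20, 30])
print(ints.typecode, ints.itemsize)  # e.g. "i 4"; itemsize depends on the platform

errors = []
try:
    ints.append(1.5)          # a float cannot be stored in an 'i' array
except TypeError as e:
    errors.append(type(e).__name__)

small = array('b')            # 'b' holds signed bytes: -128..127
try:
    small.append(200)         # out of range for a signed byte
except OverflowError as e:
    errors.append(type(e).__name__)

print(errors)  # ['TypeError', 'OverflowError']
```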
```python
# Accessing array elements by index and by slice
print(f"The first element: {array1[0]}")
print(f"The first two elements: {array1[:2]}")
print(f"The middle three elements: {array1[1:4]}")
print(f"The last element: {array1[-1]}")
print(f"The last four elements: {array1[-4:]}")
```

```
The first element: 10
The first two elements: array('i', [10, 20])
The middle three elements: array('i', [20, 30, 40])
The last element: 50
The last four elements: array('i', [20, 30, 40, 50])
```


```python
# Insertion operation
# Insert one or more elements into an array. Depending on the requirement,
# a new element can be added at the beginning, the end, or any given index.
print(f"Original array: {array1}")
array1.insert(1, 60)
print(f"Revised array: {array1}")
```

```
Original array: array('i', [10, 20, 30, 40, 50])
Revised array: array('i', [10, 60, 20, 30, 40, 50])
```

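The insertion cell only inserts at index 1. A short sketch covering the other positions mentioned in its comment, using the `insert()`, `append()` and `extend()` methods of `array.array` on a fresh array (`arr` is an illustrative name):

```python
from array import array

arr = array('i', [10, 20, 30])
arr.insert(0, 5)       # at the beginning
arr.append(40)         # at the end
arr.insert(2, 15)      # at index 2
arr.extend([50, 60])   # several elements at the end
print(arr)  # array('i', [5, 10, 15, 20, 30, 40, 50, 60])
```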

## Deletion Operation

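Deletion removes an existing element from the array and shifts the remaining elements to close the gap. `remove(x)` deletes the first occurrence of the value `x`.

```python
print(f"Original array: {array1}")
array1.remove(60)
print(f"Revised array: {array1}")
```

```
Original array: array('i', [10, 60, 20, 30, 40, 50])
Revised array: array('i', [10, 20, 30, 40, 50])
```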

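`remove()` works by value. To delete by position instead, `array.array` also supports `pop()` and the `del` statement. A small sketch on a fresh array:

```python
from array import array

arr = array('i', [10, 20, 30, 20, 40])
arr.remove(20)        # removes only the first 20
print(arr)            # array('i', [10, 30, 20, 40])

last = arr.pop()      # removes and returns the last element
first = arr.pop(0)    # removes and returns the element at index 0
print(first, last, arr)  # 10 40 array('i', [30, 20])

del arr[0]            # deletes by index without returning the value
print(arr)            # array('i', [20])
```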

```python
# Search operation
# index() searches by value and returns the position of its first occurrence.
print(array1.index(50))
```

```
4
```

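`index()` raises `ValueError` when the value is absent, so a membership test with `in` is a common guard. A short sketch; `find` is a hypothetical helper, not part of the `array` module:

```python
from array import array

arr = array('i', [10, 20, 30, 40, 50])
print(30 in arr)        # True: membership test by value
print(arr.index(30))    # 2: position of the first occurrence


def find(a, value):
    """Return the index of value in a, or -1 if it is absent (hypothetical helper)."""
    return a.index(value) if value in a else -1


print(find(arr, 99))    # -1
```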

```python
# Update operation
# Replace the element at a given index with a new value.
print(f"Original array: {array1}")
array1[2] = 80
print(f"Revised array: {array1}")
```

```
Original array: array('i', [10, 20, 30, 40, 50])
Revised array: array('i', [10, 20, 80, 40, 50])
```


# List

The list is the most versatile sequence type in Python. It is written as comma-separated values (items) between square brackets.

**Unlike an array, whose items must all be of the same type, a list can hold items of different types.**


```python
# Create lists
list1 = ['physics', 'chemistry', 1997, 2000]
list2 = [1, 2, 3, 4, 5]
list3 = ["a", "b", "c", "d"]

print(f"list1: {list1}")
print(f"list2: {list2}")
print(f"list3: {list3}")
```

```
list1: ['physics', 'chemistry', 1997, 2000]
list2: [1, 2, 3, 4, 5]
list3: ['a', 'b', 'c', 'd']
```


```python
# Accessing values
# Use square brackets with an index to get one item, or with a slice to get several.
print("list1[0]: ", list1[0])
print("list2[1:5]: ", list2[1:5])
```

```
list1[0]:  physics
list2[1:5]:  [2, 3, 4, 5]
```


```python
# Updating lists
# Update a single element by indexing, or several elements by putting a slice
# on the left-hand side of the assignment operator.
print("Value available at index 2 : ", list1[2])
list1[2] = 2001
print("New value available at index 2 : ", list1[2])
```

```
Value available at index 2 :  1997
New value available at index 2 :  2001
```

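The update cell changes a single element. A short sketch of slice assignment, which its comment also mentions, on a fresh list (`nums` is an illustrative name):

```python
nums = [1, 2, 3, 4, 5]
nums[1:3] = [20, 30]   # replace a slice with the same number of items
nums[3:] = [40]        # a slice may also be replaced by fewer (or more) items
nums.append(50)        # add one item at the end
print(nums)  # [1, 20, 30, 40, 50]
```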

# Tuple

A tuple is an immutable sequence of Python objects. Like lists, tuples are sequences. The differences between tuples and lists are:
- tuples cannot be changed after they are created, unlike lists;
- tuples use parentheses, (a, b, c), whereas lists use square brackets, [a, b, c].

```python
# Updating tuples
# Tuples are immutable: you cannot update or change the values of tuple elements.
# You can, however, combine portions of existing tuples into new tuples.
tup1 = (12, 34.56)
tup2 = ('abc', 'xyz')

# Create a new tuple by concatenation
tup3 = tup1 + tup2
print(f"New tuple: {tup3}")

# Item assignment is not valid for tuples and raises TypeError
try:
    tup1[0] = 100
except TypeError as e:
    print(f"TypeError: {e}")
```

```
New tuple: (12, 34.56, 'abc', 'xyz')
TypeError: 'tuple' object does not support item assignment
```

Deleting tuple elements: removing individual tuple elements is not possible, but nothing stops you from building another tuple with the undesired elements left out. To explicitly remove an entire tuple, use the `del` statement.

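A minimal sketch of both approaches: slicing to build a tuple without one element, and `del` to remove the variable (the names are illustrative):

```python
tup = (12, 34.56, 'abc', 'xyz')

# "Delete" the element at index 1 by building a new tuple without it
without_second = tup[:1] + tup[2:]
print(without_second)  # (12, 'abc', 'xyz')

# del removes the variable; the tuple object is freed once nothing refers to it
del tup
try:
    tup
except NameError:
    print("tup is no longer defined")
```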
# Dictionary

- In a dictionary, each key is separated from its value by a colon (`:`).
- The items are separated by commas.
- The whole thing is enclosed in curly braces.
- An empty dictionary without any items is written with just two curly braces, like this: `{}`.

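```python
# Avoid naming a variable "dict": it would shadow the built-in dict type
student = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}
print("student['Name']: ", student['Name'])
print("student['Age']: ", student['Age'])
```

```
student['Name']:  Zara
student['Age']:  7
```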

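Dictionaries are mutable, so entries can be updated, added and deleted in place. A short sketch on a fresh dictionary (the `'School'` entry is just an illustrative value):

```python
student = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}
student['Age'] = 8                 # update an existing entry
student['School'] = 'DPS School'   # add a new entry
del student['Class']               # remove an entry by key
print(student)  # {'Name': 'Zara', 'Age': 8, 'School': 'DPS School'}

# get() returns a default instead of raising KeyError for a missing key
print(student.get('Grade', 'n/a'))  # n/a
```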

# ChainMap

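`collections.ChainMap` groups multiple dictionaries together so they can be managed as one unit.
- Duplicate keys: if several dictionaries contain the same key, a lookup returns the value from the first dictionary in the chain that has it. The underlying dictionaries are not modified.
- The best use of a ChainMap is searching through several dictionaries at once to get the right key-value mapping.
- A ChainMap can also behave like a stack of dictionaries: `new_child()` pushes a new dictionary onto the front of the chain, and `parents` returns the chain without the front dictionary.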

```python
# Example
import collections

dict1 = {'day1': 'Mon', 'day2': 'Tue'}
dict2 = {'day3': 'Wed', 'day1': 'Thu'}

res = collections.ChainMap(dict1, dict2)
print(res)
```

```
ChainMap({'day1': 'Mon', 'day2': 'Tue'}, {'day3': 'Wed', 'day1': 'Thu'})
```


```python
# View the underlying dictionaries, then the merged keys and values.
# 'day1' resolves to 'Mon' because dict1 comes first in the chain.
print(res.maps, '\n')

print('Keys = {}'.format(list(res.keys())))
print('Values = {}'.format(list(res.values())))
print()
```

```
[{'day1': 'Mon', 'day2': 'Tue'}, {'day3': 'Wed', 'day1': 'Thu'}]

Keys = ['day3', 'day1', 'day2']
Values = ['Wed', 'Mon', 'Tue']
```



```python
# Print all the elements of the ChainMap
print('elements:')
for key, val in res.items():
    print('{} = {}'.format(key, val))
print()
```

```
elements:
day3 = Wed
day1 = Mon
day2 = Tue
```



```python
# Check whether a key is present in any of the dictionaries
print('day3 in res: {}'.format('day3' in res))
print('day4 in res: {}'.format('day4' in res))
```

```
day3 in res: True
day4 in res: False
```


```python
# Updating the map
# A ChainMap stores references to the original dictionaries, not copies, so a
# change to an underlying dictionary is reflected in the ChainMap immediately,
# without building a new ChainMap.
print(res.maps, '\n')

dict2['day4'] = 'Fri'

print(res.maps, '\n')
```

```
[{'day1': 'Mon', 'day2': 'Tue'}, {'day3': 'Wed', 'day1': 'Thu'}]

[{'day1': 'Mon', 'day2': 'Tue'}, {'day3': 'Wed', 'day1': 'Thu', 'day4': 'Fri'}]
```

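The first-match lookup rule and the stack-like behavior described at the start of this section can be sketched with `new_child()` and `parents`; the dictionary names `defaults` and `overrides` are illustrative:

```python
from collections import ChainMap

defaults = {'color': 'red', 'user': 'guest'}
overrides = {'user': 'admin'}
cm = ChainMap(overrides, defaults)       # lookups search overrides first, then defaults
print(cm['user'], cm['color'])           # admin red

child = cm.new_child({'color': 'blue'})  # push a new dict onto the front of the chain
print(child['color'])                    # blue
print(child.parents['color'])            # red: parents skips the front dict

cm['theme'] = 'dark'                     # writes always go to the first mapping
print(overrides)                         # {'user': 'admin', 'theme': 'dark'}
print('theme' in defaults)               # False
```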


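# Linked List

1) Definition: A linked list is a common, fundamental data structure. It is a linear list, but it does not store its data in sequential order; instead, each node stores a pointer to the next node. Because the nodes need not be stored contiguously, inserting into a linked list takes O(1) time once the insertion point is known, which is much faster than for the other kind of linear list, the sequential (array-based) list. However, finding a node or accessing the node at a given position takes O(n) time, whereas the corresponding costs for a sequential list are O(log n) (binary search on sorted data) and O(1).

2) Characteristics: A linked list overcomes the array's drawback of having to know the data size in advance. It makes full use of memory and allows flexible, dynamic memory management. However, it loses the array's advantage of random access, and because every node carries an extra pointer field, it uses more space.

3) Operations:
1. is_empty(): whether the list is empty
2. length(): the length of the list
3. travel(): traverse the list
4. add(item): add an item at the head
5. append(item): add an item at the tail
6. insert(pos, item): add an item at a given position
7. remove(item): delete a node
8. search(item): check whether a node exists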

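```python
class LinkNode(object):
    """A node of a doubly linked list."""

    def __init__(self, item, prev=None, next=None):
        self.item = item
        self.prev = prev
        self.next = next


class DLinkList(object):
    """Circular doubly linked list: head.prev is the tail and tail.next is the head."""

    def __init__(self):
        self.head = None
        self.count = 0

    def append(self, item):
        """Add item at the tail, i.e. just before the head in the circle."""
        node = LinkNode(item)
        if self.head is None:
            node.prev = node.next = node  # a single node points to itself
            self.head = node
        else:
            tail = self.head.prev
            node.prev, node.next = tail, self.head
            tail.next = node
            self.head.prev = node
        self.count += 1

    def add(self, item):
        """Add item at the head."""
        self.append(item)           # link the new node in between the tail and the head...
        self.head = self.head.prev  # ...then make it the new head

    def is_empty(self):
        return self.head is None

    def length(self):
        return self.count

    def travel(self):
        """Return the items from head to tail."""
        items, node = [], self.head
        for _ in range(self.count):
            items.append(node.item)
            node = node.next
        return items
```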

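The doubly linked class covers only part of the operation list. As a separate sketch, here is a singly linked list implementing all eight listed operations. The class names `Node` and `SingleLinkList` are hypothetical, and clamping out-of-range positions in `insert()` to the head or tail is an assumption, not something the list specifies:

```python
class Node:
    """Hypothetical node of a singly linked list: a value and a pointer to the next node."""

    def __init__(self, item):
        self.item = item
        self.next = None


class SingleLinkList:
    """Hypothetical singly linked list implementing the eight operations listed above."""

    def __init__(self):
        self.head = None

    def is_empty(self):
        return self.head is None

    def length(self):
        count, cur = 0, self.head
        while cur is not None:
            count += 1
            cur = cur.next
        return count

    def travel(self):
        """Return the items from head to tail."""
        items, cur = [], self.head
        while cur is not None:
            items.append(cur.item)
            cur = cur.next
        return items

    def add(self, item):
        """Insert at the head in O(1)."""
        node = Node(item)
        node.next = self.head
        self.head = node

    def append(self, item):
        """Insert at the tail; O(n) because we walk to the last node."""
        node = Node(item)
        if self.head is None:
            self.head = node
            return
        cur = self.head
        while cur.next is not None:
            cur = cur.next
        cur.next = node

    def insert(self, pos, item):
        """Insert so item lands at index pos; pos <= 0 means head, pos >= length means tail."""
        if pos <= 0 or self.head is None:
            self.add(item)
            return
        cur = self.head
        for _ in range(pos - 1):  # stop at the node just before pos, or at the tail
            if cur.next is None:
                break
            cur = cur.next
        node = Node(item)
        node.next = cur.next
        cur.next = node

    def remove(self, item):
        """Delete the first node holding item; return True if a node was removed."""
        prev, cur = None, self.head
        while cur is not None:
            if cur.item == item:
                if prev is None:
                    self.head = cur.next
                else:
                    prev.next = cur.next
                return True
            prev, cur = cur, cur.next
        return False

    def search(self, item):
        """Return True if some node holds item."""
        cur = self.head
        while cur is not None:
            if cur.item == item:
                return True
            cur = cur.next
        return False


ll = SingleLinkList()
ll.append(2)
ll.append(3)
ll.add(1)
ll.insert(1, 9)
print(ll.travel())                # [1, 9, 2, 3]
ll.remove(9)
print(ll.travel(), ll.search(3))  # [1, 2, 3] True
```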
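# Stack

1. Definition: A stack is a special list-like data structure in computer science. What makes it special is that items can only be added (push) and removed (pop) at one end of the linked list or array, called the top. A stack can be implemented with either a one-dimensional array or a linked list. The counterpart of the stack is the queue. Because a stack only allows operations at one end, it works on the Last In, First Out (LIFO) principle.

2. Characteristics: first in, last out (equivalently, last in, first out); every element except the first and last has one predecessor and one successor.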

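```python
class Stack(object):
    def __init__(self):
        self.data = []

    def length(self):
        return len(self.data)

    def is_empty(self):
        # call length(); comparing the method object itself to 0 is always False
        return self.length() == 0

    def push(self, item):
        """Add item on top of the stack."""
        self.data.append(item)

    def pop(self):
        """Remove and return the top item; raises IndexError if the stack is empty."""
        return self.data.pop()
```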

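A classic use of LIFO order is checking that brackets are properly nested: each closing bracket must match the most recently opened one, which is exactly the top of the stack. A minimal sketch using a plain list as the stack (`is_balanced` is a hypothetical helper name):

```python
def is_balanced(text):
    """Return True if every (, [ and { in text is closed in the right order."""
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for ch in text:
        if ch in '([{':
            stack.append(ch)  # push an opening bracket
        elif ch in pairs:
            # the closing bracket must match the most recent opener (the top)
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack  # leftover openers were never closed


print(is_balanced("{[()()]}"))  # True
print(is_balanced("([)]"))      # False
```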
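# Queue

1) Definition: A queue is similar to a stack; the difference is that items are added at the rear and removed (dequeued) only at the front, so a queue is a First-In-First-Out (FIFO) linear list.

2) Characteristics: first in, first out (equivalently, last in, last out); every node except the tail has a successor; (optionally) every node except the head has a predecessor.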

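```python
from collections import deque


class Queue(object):
    def __init__(self):
        # deque.popleft() is O(1), whereas list.pop(0) shifts every element: O(n)
        self.data = deque()

    def enqueue(self, item):
        """Add item at the rear of the queue."""
        self.data.append(item)

    def dequeue(self):
        """Remove and return the front item, or None if the queue is empty."""
        return self.data.popleft() if self.data else None
```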

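To see the difference between the two orders, push the same items into a stack (a list) and a queue (a `collections.deque`) and drain both:

```python
from collections import deque

items = ['a', 'b', 'c']
stack, queue = [], deque()
for x in items:
    stack.append(x)  # push onto the top
    queue.append(x)  # enqueue at the rear

lifo = [stack.pop() for _ in items]      # pop from the top
fifo = [queue.popleft() for _ in items]  # dequeue from the front
print(lifo)  # ['c', 'b', 'a']
print(fifo)  # ['a', 'b', 'c']
```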
# Deque

A double-ended queue, or deque, supports adding and removing elements at either end. The more commonly used stacks and queues are degenerate forms of deques, in which the inputs and outputs are restricted to a single end.

```python
import collections

dq = collections.deque(["Monday", "Tuesday", "Wednesday", "Thursday"])
print(f"Double ended queue: {dq}")

dq.append("Friday")
print(f"Appended at right: {dq}")
dq.appendleft("Sunday")
print(f"Appended at left: {dq}")
dq.pop()
print(f"Deleting from right: {dq}")
dq.popleft()
print(f"Deleting from left: {dq}")
```

```
Double ended queue: deque(['Monday', 'Tuesday', 'Wednesday', 'Thursday'])
Appended at right: deque(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'])
Appended at left: deque(['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'])
Deleting from right: deque(['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday'])
Deleting from left: deque(['Monday', 'Tuesday', 'Wednesday', 'Thursday'])
```

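Two other `collections.deque` features are handy in practice: a bounded deque (`maxlen`) discards items from the opposite end once it is full, and `rotate()` moves items from one end to the other. A short sketch:

```python
from collections import deque

recent = deque(maxlen=3)  # keeps at most 3 items
for n in range(1, 6):
    recent.append(n)      # once full, each append drops an item from the left
print(recent)             # deque([3, 4, 5], maxlen=3)

d = deque([1, 2, 3, 4, 5])
d.rotate(2)               # rotate 2 steps to the right
rotated = list(d)
print(rotated)            # [4, 5, 1, 2, 3]
d.rotate(-2)              # rotate 2 steps to the left, undoing the first rotation
print(list(d))            # [1, 2, 3, 4, 5]
```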

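# Binary Tree

1) Definition: A binary tree is a tree in which each node has at most two subtrees. It has five basic forms: the empty tree; a root only (both subtrees empty); a root with only a left subtree; a root with only a right subtree; and a root with both a left and a right subtree.

2) Properties:
   1. Property 1: level $i$ of a binary tree has at most $2^{i-1}$ nodes ($i \ge 1$);
   2. Property 2: a binary tree of depth $k$ has at most $2^{k}-1$ nodes ($k \ge 1$);
   3. Property 3: a binary tree with $n$ nodes has height at least $\lfloor \log_{2} n \rfloor + 1$;
   4. Property 4: in any binary tree, if $n_0$ is the number of leaf nodes and $n_2$ is the number of nodes of degree 2, then $n_0 = n_2 + 1$.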

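Properties 2 and 4 follow from short counting arguments. For property 2, sum the bound of property 1 over all $k$ levels. For property 4, count nodes by degree ($n_1$ here denotes the number of nodes with exactly one child) and count edges, using the fact that every node except the root has exactly one parent; subtracting the two equations gives the result:

$$
\begin{aligned}
\sum_{i=1}^{k} 2^{i-1} &= 2^{k} - 1 \\[4pt]
n &= n_0 + n_1 + n_2 \\
n - 1 &= n_1 + 2n_2 \\
\Rightarrow\quad n_0 &= n_2 + 1
\end{aligned}
$$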
```python
class TreeNode(object):
    def __init__(self, item):
        self.item = item
        self.left_child = None
        self.right_child = None
```

```python
from collections import deque


class Tree(object):
    def __init__(self):
        self.root = None

    def add(self, item):
        """Insert item at the first free child slot in level order, so the tree stays complete."""
        node = TreeNode(item)
        if self.root is None:
            self.root = node
            return
        q = deque([self.root])  # deque.popleft() is O(1); list.pop(0) is O(n)
        while q:
            cur = q.popleft()
            if cur.left_child is None:
                cur.left_child = node
                return
            if cur.right_child is None:
                cur.right_child = node
                return
            q.append(cur.left_child)
            q.append(cur.right_child)

    def traverse(self):  # level-order (breadth-first) traversal
        if self.root is None:
            return None
        q = deque([self.root])
        res = []
        while q:
            cur = q.popleft()
            res.append(cur.item)
            if cur.left_child is not None:
                q.append(cur.left_child)
            if cur.right_child is not None:
                q.append(cur.right_child)
        return res

    def traverse_preorder(self, root, res=None):  # pre-order: root, left, right
        if res is None:
            res = []
        if root is not None:
            res.append(root.item)
            self.traverse_preorder(root.left_child, res)
            self.traverse_preorder(root.right_child, res)
        return res

    def traverse_inorder(self, root, res=None):  # in-order: left, root, right
        if res is None:
            res = []
        if root is not None:
            self.traverse_inorder(root.left_child, res)
            res.append(root.item)
            self.traverse_inorder(root.right_child, res)
        return res

    def traverse_postorder(self, root, res=None):  # post-order: left, right, root
        if res is None:
            res = []
        if root is not None:
            self.traverse_postorder(root.left_child, res)
            self.traverse_postorder(root.right_child, res)
            res.append(root.item)
        return res
```

```python
t = Tree()
for i in range(10):
    t.add(i)
```

```python
# Level-order, pre-order, in-order and post-order traversals
print(t.traverse())
print(t.traverse_preorder(t.root))
print(t.traverse_inorder(t.root))
print(t.traverse_postorder(t.root))
```

```
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 3, 7, 8, 4, 9, 2, 5, 6]
[7, 3, 8, 1, 9, 4, 0, 5, 2, 6]
[7, 8, 3, 9, 4, 1, 5, 6, 2, 0]
```

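The recursive traversals can also be written iteratively with an explicit stack, which avoids Python's recursion limit on very deep trees. A minimal sketch of in-order traversal on a self-contained node class (the names `Node` and `inorder_iterative` are hypothetical), rebuilt for the same 10-node tree, in which node $i$ has children $2i+1$ and $2i+2$; it also counts $n_0$ and $n_2$ to check property 4:

```python
class Node:
    """Hypothetical minimal tree node used only in this sketch."""

    def __init__(self, item):
        self.item = item
        self.left = None
        self.right = None


def inorder_iterative(root):
    """In-order traversal (left, root, right) using an explicit stack instead of recursion."""
    res, stack, cur = [], [], root
    while stack or cur is not None:
        while cur is not None:  # walk as far left as possible, remembering the path
            stack.append(cur)
            cur = cur.left
        cur = stack.pop()       # the leftmost node not yet visited
        res.append(cur.item)
        cur = cur.right         # then traverse its right subtree
    return res


# Rebuild the 10-node tree: node i has children 2i+1 and 2i+2 when they exist
nodes = [Node(i) for i in range(10)]
for i, node in enumerate(nodes):
    if 2 * i + 1 < len(nodes):
        node.left = nodes[2 * i + 1]
    if 2 * i + 2 < len(nodes):
        node.right = nodes[2 * i + 2]

print(inorder_iterative(nodes[0]))  # [7, 3, 8, 1, 9, 4, 0, 5, 2, 6]

# Property 4: count leaves (n0) and nodes with two children (n2)
n0 = sum(1 for n in nodes if n.left is None and n.right is None)
n2 = sum(1 for n in nodes if n.left is not None and n.right is not None)
print(n0, n2)  # 5 4, so n0 == n2 + 1
```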

# Container datatypes --- collections

## Counter

A Counter is a dict subclass for counting hashable objects. It is a collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts are allowed to be any integer value including zero or negative counts. The Counter class is similar to bags or multisets in other languages.

```python
from collections import Counter

# 1. Two ways to count: increment entries one by one, or pass the iterable directly
cnt = Counter()
for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
    cnt[word] += 1
print(f"After counting: {cnt}")
print(f"Counting directly from a list: {Counter(['red', 'blue', 'red', 'green', 'blue', 'blue'])}")
```

```
After counting: Counter({'blue': 3, 'red': 2, 'green': 1})
Counting directly from a list: Counter({'blue': 3, 'red': 2, 'green': 1})
```


```python
# Other ways to create a Counter
print(f"a new, empty counter {Counter()}")
print(f"a new counter from an iterable {Counter('gallahad')}")
print(f"a new counter from a mapping {Counter({'red': 4, 'blue': 2})}")
print(f"a new counter from keyword args {Counter(cats=4, dogs=8)}")
```

```
a new, empty counter Counter()
a new counter from an iterable Counter({'a': 3, 'l': 2, 'g': 1, 'h': 1, 'd': 1})
a new counter from a mapping Counter({'red': 4, 'blue': 2})
a new counter from keyword args Counter({'dogs': 8, 'cats': 4})
```


```python
# Useful APIs
c = Counter(a=4, b=2, c=0, d=-2)
print(c)
# elements() repeats each key as many times as its count; counts below 1 are skipped
print(f"elements: {list(c.elements())}")
```

```
Counter({'a': 4, 'b': 2, 'c': 0, 'd': -2})
elements: ['a', 'a', 'a', 'a', 'b', 'b']
```


```python
# most_common(n) gives the n most common elements; reversing the full list gives the least common
c = Counter('abracadabra')

top3 = c.most_common(3)
n = 2
least_n = c.most_common()[:-n-1:-1]  # n least common elements
print(f"Original count: {c}")
print(f"Top 3 most common: {top3}")
print(f"{n} least common elements: {least_n}")
```

```
Original count: Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})
Top 3 most common: [('a', 5), ('b', 2), ('r', 2)]
2 least common elements: [('d', 1), ('c', 1)]
```

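Counters also support multiset arithmetic. A short sketch of the operators and of `subtract()`, which, unlike `-`, keeps zero and negative counts (the variable names are illustrative):

```python
from collections import Counter

c = Counter(a=3, b=1)
d = Counter(a=1, b=2)
total = c + d   # add counts:                     Counter({'a': 4, 'b': 3})
diff = c - d    # subtract, keep positive counts: Counter({'a': 2})
both = c & d    # intersection, min of counts:    Counter({'a': 1, 'b': 1})
either = c | d  # union, max of counts:           Counter({'a': 3, 'b': 2})
print(total, diff, both, either)

c.subtract(d)   # in place; zero and negative counts are kept
print(c)        # Counter({'a': 2, 'b': -1})
```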

# Heap

A heap is a binary tree stored in an array. The order of its nodes is constrained by the "heap property" (defined below); unlike a binary search tree, a heap is not fully sorted.

**Definition (Heap).** Suppose that $arr = [k_0, k_1, k_2, \ldots, k_{n-1}]$. For every $i$ with $0 \le i \le \lfloor \frac{n - 2}{2} \rfloor$ (that is, every node that has at least one child), and for each child index that is less than $n$:

- **Max-Heap:** $k_i \geq k_{2i + 1}$ and $k_i \geq k_{2i + 2}$
- **Min-Heap:** $k_i \leq k_{2i + 1}$ and $k_i \leq k_{2i + 2}$

If $i$ is the index of a node, the following formulas give the array indices of its parent and child nodes:

$$
\begin{aligned}
\mathrm{parent}(i) &= \left\lfloor \frac{i - 1}{2} \right\rfloor \\
\mathrm{left}(i) &= 2i + 1 \\
\mathrm{right}(i) &= \mathrm{left}(i) + 1 = 2i + 2
\end{aligned}
$$

A node's left and right children are always stored next to each other.

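The definition and the index formulas translate directly into a validity check. A minimal sketch for the min-heap case (the helper name `is_min_heap` is hypothetical), compared against the standard-library `heapq.heapify()`, which maintains this same min-heap invariant:

```python
import heapq


def is_min_heap(arr):
    """Check k_i <= k_{2i+1} and k_i <= k_{2i+2} for every child index that exists."""
    n = len(arr)
    for i in range(n):
        for child in (2 * i + 1, 2 * i + 2):  # left(i) and right(i)
            if child < n and arr[i] > arr[child]:
                return False
    return True


data = [9, 4, 7, 1, 8, 2]
print(is_min_heap(data))  # False: 9 > 4
heapq.heapify(data)       # rearrange in place into a min-heap
print(is_min_heap(data))  # True
```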
## Heap Property

```python
# Check the max-heap property: every node must be <= its parent
arr = [10, 7, 2, 5, 1]


def parent(index):
    return (index - 1) // 2


def left(index):
    return 2 * index + 1


def right(index):
    return 2 * index + 2


hold_props = [arr[parent(i)] >= arr[i] for i in range(1, len(arr))]
print(hold_props)
```

```
[True, True, True, True]
```


- In a max-heap, every parent node is greater than or equal to its children, so the largest item is at the root of the tree.
- In a min-heap, every parent node is less than or equal to its children, so the smallest item is at the root of the tree.
- The root of the heap holds the maximum (or minimum) element, but the relative order of the other elements is not fully determined.

## Heap vs Regular Tree (Binary search tree)

- **Order of the nodes.** In a binary search tree (BST), the left child must be smaller than its parent, and the right child must be greater. This is not true for a heap: in a max-heap both children must be smaller than or equal to the parent, while in a min-heap they must both be greater than or equal to it.
- **Memory.** Traditional trees take up more memory than just the data they store: you need to allocate additional storage for the node objects and for the pointers to the left/right child nodes. A heap uses only a plain array for storage and no pointers.
- **Balancing.** A binary search tree must be "balanced" for most operations to have O(log n) performance. You can either insert and delete your data in a random order or use a self-balancing tree such as an AVL tree or a red-black tree. A heap does not need the entire tree to be sorted, only the heap property to hold, so balancing is not an issue: because a heap is always a complete binary tree, inserting an element or removing the root takes O(log n) time.
- **Searching.** Whereas searching is fast in a binary search tree, it is slow (O(n)) in a heap. Searching is not a priority for a heap, since its purpose is to put the largest (or smallest) node at the front and to allow relatively fast inserts and deletes.

```python
class Heap:
    """Binary min-heap stored in a plain Python list; the smallest item is data[0].

    The comparisons in shiftUp() and shiftDown() make this a min-heap;
    reversing them would give a max-heap.
    """

    def __init__(self):
        self.data = []

    @property
    def size(self):
        return len(self.data)

    def insert(self, value):
        """
        Adds the new element to the end of the heap
        and then uses shiftUp() to fix the heap.
        """
        self.data.append(value)
        self.shiftUp(self.size - 1)

    def remove(self):
        """
        Removes and returns the minimum value (min-heap) or
        the maximum value (max-heap), or None if the heap is empty.
        To fill up the hole left by removing the element, the very last
        element is moved to the root position and then shiftDown() fixes
        up the heap. (This is sometimes called "extract min" or "extract max".)
        """
        if self.size == 0:
            return None
        return self.removeAtIndex(0)

    def removeAtIndex(self, index):
        """
        Just like remove() with the exception that it allows you to
        remove any item from the heap, not just the root. This calls
        both shiftDown(), in case the new element is out-of-order with
        its children, and shiftUp(), in case the element is out-of-order
        with its parents.
        """
        if not 0 <= index < self.size:
            raise IndexError("heap index out of range")
        last = self.data.pop()
        if index == self.size:  # the removed slot was the last one
            return last
        removed = self.data[index]
        self.data[index] = last
        self.shiftDown(index)
        self.shiftUp(index)
        return removed

    def replace(self, index, value):
        """
        Assigns a smaller (min-heap) or larger (max-heap) value to a node.
        Because this invalidates the heap property, it uses shiftUp() to
        patch things up. (Also called "decrease key" and "increase key".)
        """
        if value > self.data[index]:
            raise ValueError("a min-heap can only decrease a key")
        self.data[index] = value
        self.shiftUp(index)

    def search(self, value):
        """
        Heaps are not built for efficient searches, but the replace() and
        removeAtIndex() operations require the array index of the node,
        so you need to find that index. Returns None if value is absent.
        Time: O(n).
        """
        for index, item in enumerate(self.data):
            if item == value:
                return index
        return None

    def buildHeap(self, array):
        """
        Converts an (unsorted) array into a heap. Calling insert() for every
        element would take O(n log n); heapify() does it in O(n).
        """
        self.data = list(array)
        self.heapify()

    def peek(self):
        """
        Returns the maximum (max-heap) or minimum (min-heap) element
        without removing it from the heap, or None if the heap is empty.
        Time: O(1).
        """
        return self.data[0] if self.size > 0 else None

    def shiftUp(self, index):
        """
        If the element is greater (max-heap) or smaller (min-heap)
        than its parent, it needs to be swapped with the parent.
        This makes it move up the tree. Takes O(log n) time.
        """
        parent_index = self.parent(index)
        while index > 0 and self.data[parent_index] > self.data[index]:
            self.data[parent_index], self.data[index] = self.data[index], self.data[parent_index]
            index = parent_index
            parent_index = self.parent(index)

    def shiftDown(self, index):
        """
        If the element is smaller (max-heap) or greater (min-heap)
        than its children, it needs to move down the tree.
        This operation is also called "heapify". Takes O(log n) time.
        """
        while True:
            smallest = index
            left, right = self.left(index), self.right(index)
            if left < self.size and self.data[left] < self.data[smallest]:
                smallest = left
            if right < self.size and self.data[right] < self.data[smallest]:
                smallest = right
            if smallest == index:
                return
            self.data[index], self.data[smallest] = self.data[smallest], self.data[index]
            index = smallest

    def heapify(self):
        """Restores the heap property for the whole list by sifting down every parent, bottom-up."""
        for index in range(self.size // 2 - 1, -1, -1):
            self.shiftDown(index)

    @staticmethod
    def parent(index):
        return (index - 1) // 2

    @staticmethod
    def left(index):
        return 2 * index + 1

    @staticmethod
    def right(index):
        return 2 * index + 2


if __name__ == "__main__":
    h = Heap()
    h.buildHeap([5, 3, 8, 1, 9, 2])
    print(h.peek())                             # 1
    h.insert(0)
    print([h.remove() for _ in range(h.size)])  # [0, 1, 2, 3, 5, 8, 9]
```

## Heap queue algorithm in Python

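The standard-library `heapq` module provides an implementation of the heap queue algorithm, also known as the priority queue algorithm. Its heaps are min-heaps stored in ordinary Python lists: `heap[0]` is always the smallest item.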

```python
import heapq

# 1. Create a heap
data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]

heapq.heapify(data)  # transform the list into a heap, in place, in linear time
print(data)
```

```
[0, 1, 2, 6, 3, 5, 4, 7, 8, 9]
```


```python
# Push the value item onto the heap, maintaining the heap invariant.
heapq.heappush(data, 3)
print(data)
```

```
[0, 1, 2, 6, 3, 5, 4, 7, 8, 9, 3]
```


```python
# Pop and return the smallest item from the heap, maintaining the heap
# invariant. If the heap is empty, IndexError is raised. To access the
# smallest item without popping it, use heap[0].
result = heapq.heappop(data)
print(data)
print(result)
```

```
[1, 3, 2, 6, 3, 5, 4, 7, 8, 9]
0
```


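```python
import heapq


def heapsort(iterable):
    h = []

    # build a min-heap
    for value in iterable:
        heapq.heappush(h, value)

    # pop the smallest remaining item until the heap is empty
    # (len(h) rather than len(iterable), so generators work too)
    return [heapq.heappop(h) for _ in range(len(h))]
```

```python
heapsort([1, 3, 2, 6, 3, 5, 4, 7, 8, 9])
```

```
[1, 2, 3, 3, 4, 5, 6, 7, 8, 9]
```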

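Beyond sorting, `heapq` is typically used as a priority queue. A short sketch with (priority, task) tuples, plus `nsmallest()`/`nlargest()` and the common trick of negating keys to simulate a max-heap; the task names are illustrative:

```python
import heapq

# Priority queue: tuples compare by their first element (the priority);
# on a tie, the second element (here the task name) is compared
tasks = []
heapq.heappush(tasks, (2, 'write code'))
heapq.heappush(tasks, (1, 'write spec'))
heapq.heappush(tasks, (3, 'release'))
order = [heapq.heappop(tasks)[1] for _ in range(len(tasks))]
print(order)  # ['write spec', 'write code', 'release']

nums = [5, 1, 8, 3, 9, 2]
print(heapq.nsmallest(2, nums))  # [1, 2]
print(heapq.nlargest(2, nums))   # [9, 8]

# heapq only provides min-heaps; storing negated keys simulates a max-heap
max_heap = [-x for x in nums]
heapq.heapify(max_heap)
biggest = -heapq.heappop(max_heap)
print(biggest)  # 9
```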