# VIII. Linear Time Selection 
In this notebook we will look at two selection algorithms: RSelect, a randomized selection algorithm, and a deterministic version called DSelect. But before we proceed, the question is: what is selection? The selection problem asks for the $i^{th}$ smallest number (the $i^{th}$ order statistic) among n given numbers. The simplest approach is to sort the input array and then return the number at position i, which does selection in $O(n \log n)$ time. This, however, is not optimal. Selection is an easier problem than sorting, so we should be able to solve it in less time than sorting takes. To select a number from n given numbers, we have to look at every number at least once, so $\Omega(n)$ is a lower bound. Since sorting can be done in $O(n \log n)$ time, that is an upper bound for our problem.
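As a point of comparison, the sort-then-index baseline fits in a few lines. `select_by_sorting` is a hypothetical helper name used only for this sketch. It relies on Python's built-in `sorted`, which runs in $O(n \log n)$ time.

```python
def select_by_sorting(numbers, i):
    # Baseline selection: sort a copy (O(n log n)) and read off the i-th smallest (1-indexed)
    if not 1 <= i <= len(numbers):
        raise ValueError("i must be between 1 and len(numbers)")
    return sorted(numbers)[i - 1]

print(select_by_sorting([3, 2, 1, 5, 0, 7, 8], 3))  # sorted: 0, 1, 2, 3, 5, 7, 8 -> prints 2
```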

Can we perform selection in $O(n)$ time? Let us implement selection using the partition subroutine we used in QuickSort.

In [147]:
import random

def swap(element1, element2):
    # Returns the two elements in swapped order
    return element2, element1

def partition(array, l, r):
    # Partitions array[l:r] in place around the pivot array[l]: smaller elements end up
    # to its left, the rest to its right. Returns the array and the pivot's final index.
    i = l + 1
    for j in range(l + 1, r):
        if array[l] > array[j]:
            if j != i:
                array[i], array[j] = swap(array[i], array[j])
            i += 1
    array[l], array[i-1] = swap(array[l], array[i-1])
    return array, i-1


def choose_pivot(array):
    # Swaps a uniformly random element to the front so that partition() uses it as the pivot
    randomidx = random.randint(0,len(array)-1)
    array[0], array[randomidx] = swap(array[0], array[randomidx])
    return array

def Randomized_Selection(array , order_statistic, init_ncomparisons = 0 ):
    # Returns the order_statistic-th smallest element (1-indexed) and the number of comparisons made.
    # init_ncomparisons = 0 marks the top-level call, which resets the global comparison counter.
    global ncomparisons
    if init_ncomparisons == 0:
        ncomparisons = 0
    n = len(array)
    if n == 1:
        return array[0], ncomparisons
    if n > 1:
        array = choose_pivot(array)   # Randomized algorithm: choosing pivot
        array, pvtidx = partition(array, 0, n)
        ncomparisons += n-1
        if pvtidx == order_statistic - 1:
            return array[pvtidx], ncomparisons
        elif pvtidx > order_statistic - 1:
            return Randomized_Selection(array[:pvtidx], order_statistic, init_ncomparisons = 1)
        if pvtidx < order_statistic - 1:
            return Randomized_Selection(array[pvtidx+1:], order_statistic - pvtidx - 1, init_ncomparisons = 1)

In [54]:
print("Randomized_Selection & Number of comparisons made using randomized algorithm:")
print(Randomized_Selection([3, 2, 1, 5, 0, 7, 8], 3))
print('1st Order stat is {}'.format(Randomized_Selection([3, 8, 2, 5, 1, 4, 7, 6], 1)))
print('2nd Order stat is {}'.format(Randomized_Selection([3, 8, 2, 5, 1, 4, 7, 6], 2)))
print('3rd Order stat is {}'.format(Randomized_Selection([3, 8, 2, 5, 1, 4, 7, 6], 3)))
print('4th Order stat is {}'.format(Randomized_Selection([3, 8, 2, 5, 1, 4, 7, 6], 4)))
print('5th Order stat is {}'.format(Randomized_Selection([3, 8, 2, 5, 1, 4, 7, 6], 5)))
print('6th Order stat is {}'.format(Randomized_Selection([3, 8, 2, 5, 1, 4, 7, 6], 6)))
print('7th Order stat is {}'.format(Randomized_Selection([3, 8, 2, 5, 1, 4, 7, 6], 7)))
print('8th Order stat is {}'.format(Randomized_Selection([3, 8, 2, 5, 1, 4, 7, 6], 8)))

Randomized_Selection & Number of comparisons made using randomized algorithm:
(2, 9)
1st Order stat is (1, 12)
2nd Order stat is (2, 12)
3rd Order stat is (3, 11)
4th Order stat is (4, 7)
5th Order stat is (5, 14)
6th Order stat is (6, 16)
7th Order stat is (7, 7)
8th Order stat is (8, 7)


We will test the implementation with some of the test data given [here](http://algorithmsilluminated.org/)

In [144]:
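import urllib3
http = urllib3.PoolManager()

def test_case(url):
    # Downloads a test file (one integer per line) and returns it as a list of ints
    r1 = http.request('GET', url)
    # r1.data is bytes in Python 3: decode it, then split on any whitespace (this also handles '\r\n')
    IntegerArrayString = r1.data.decode('utf-8').split()
    IntegerArray = [int(i) for i in IntegerArrayString]
    return IntegerArray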

print("Median and Number of comparisons  using Randomized Selection")
print("5th Order statistic of the content expected to be 5469, got : {}".format(Randomized_Selection(test_case("http://algorithmsilluminated.org/datasets/problem6.5test1.txt"), 5)))
print("50th Order statistic of the content expected to be 4715, got: {}".format(Randomized_Selection(test_case("http://algorithmsilluminated.org/datasets/problem6.5test2.txt"), 50)))

Median and Number of comparisons  using Randomized Selection
5th Order statistic of the content expected to be 5469, got : (5469, 29)
50th Order statistic of the content expected to be 4715, got: (4715, 153)


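The tests above show that the randomized implementation returns the correct order statistics. It works in a similar way to QuickSort. The only difference is that after partitioning, we recurse on either the left or the right side of the pivot. Sorting, in contrast, recurses on both sides. Note that this implementation slices the array (`array[:pvtidx]`, `array[pvtidx+1:]`) for each recursive call. It therefore copies subarrays rather than working in place with constant extra memory. An in-place version would recurse on index bounds instead. We will prove that the algorithm runs in linear time in expectation.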
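`Randomized_Selection` above passes slices such as `array[:pvtidx]` to its recursive calls, and Python slices are copies. A minimal in-place sketch tracks index bounds `l` and `r` instead, and replaces the tail recursion with a loop. This assumes it is acceptable to reorder the caller's list. `rselect_inplace` is a hypothetical name, and the partition step mirrors the notebook's `partition` rather than any library routine.

```python
import random

def rselect_inplace(array, order_statistic):
    # Hypothetical in-place RSelect: narrows the index range [l, r) instead of slicing.
    # Returns the order_statistic-th smallest element (1-indexed); reorders `array` as a side effect.
    target = order_statistic - 1          # 0-indexed position of the answer in sorted order
    l, r = 0, len(array)
    while r - l > 1:
        # Move a uniformly random pivot to the front of the current range
        p = random.randrange(l, r)
        array[l], array[p] = array[p], array[l]
        # Partition array[l:r] around array[l], same scheme as the notebook's partition()
        boundary = l + 1
        for j in range(l + 1, r):
            if array[j] < array[l]:
                array[boundary], array[j] = array[j], array[boundary]
                boundary += 1
        pivot_pos = boundary - 1
        array[l], array[pivot_pos] = array[pivot_pos], array[l]
        # Keep only the side of the pivot that contains the target position
        if pivot_pos == target:
            return array[pivot_pos]
        elif pivot_pos > target:
            r = pivot_pos
        else:
            l = pivot_pos + 1
    return array[l]

data = [3, 8, 2, 5, 1, 4, 7, 6]
print([rselect_inplace(list(data), k) for k in range(1, 9)])   # [1, 2, 3, 4, 5, 6, 7, 8]
```

The invariant is that every element left of `l` is no larger than the elements in `array[l:r]`, which in turn are no larger than everything right of `r`. The target position therefore always stays inside the current range.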

***

**Intuition**

The partition subroutine does exactly what it does in QuickSort. Once we choose a pivot, partitioning tells us in linear time the position the pivot ends up in, that is, its order statistic. Knowing that position, we recurse only on the left or the right side of the pivot, whichever contains the order statistic we are looking for. In the best case we pick the median of the input array as the pivot, and the array is split into two equal halves. Since we always do linear work outside the recursive call, we can express the recurrence as follows $$T(n)\:=\:T(\frac{n}{2}) + O(n) $$

By the master method with a = 1, b = 2 and d = 1, we have $a < b^d$, which is case 2 of the master method. In this case the running time is dominated by the work done outside the recursive calls, giving $O(n^d) = O(n)$. This shows that if we pick an approximate median and split the array into two roughly equal pieces, we can expect selection to run in linear time. The following proof shows how a random pivot achieves linear time on average, that is, in expectation.
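To see why the halving recurrence is linear, one can unroll it. This sketch assumes the constant in the $O(n)$ term is 1 and uses integer halving. `best_case_work` is a hypothetical helper that sums the work over all levels.

```python
def best_case_work(n):
    # Unrolls T(n) = T(n/2) + n with unit cost per element: n + n/2 + n/4 + ... + 1
    total = 0
    while n >= 1:
        total += n
        n //= 2
    return total

for n in [10, 1000, 10**6]:
    print(n, best_case_work(n), best_case_work(n) / n)   # the ratio stays below 2
```

The total is a geometric series dominated by its first term, which matches the master method's case 2.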

***

**Proof**

We track the progress of the selection algorithm in phases. We say the algorithm is in phase j if it is operating on an array whose length is between $(\frac{3}{4})^{j + 1}\cdot n$ and $(\frac{3}{4})^{j}\cdot n$. For example, it is in phase 0 if the size of the array is between 0.75n and n, and in phase 1 if the size is between 0.56n and 0.75n, and so on. Since the array size never drops below 1, the largest phase index j is at most about $\log_{4/3}n$.
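The phase definition can be written as a small function. `phase` is a hypothetical helper. It treats phase j as the half-open range $(\frac{3}{4})^{j+1} n < \text{size} \leq (\frac{3}{4})^{j} n$, which is an assumption about how the boundaries are assigned.

```python
import math

def phase(size, n):
    # Phase j covers subproblem sizes with (3/4)^(j+1) * n < size <= (3/4)^j * n
    j = 0
    while size <= (0.75 ** (j + 1)) * n:
        j += 1
    return j

n = 1000
print(phase(1000, n), phase(751, n), phase(750, n), phase(562, n))   # 0 0 1 2
print(phase(1, n), math.log(n, 4 / 3))   # the last phase index is about log_{4/3} n
```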

We now define a random variable $X_j$ that gives the number of recursive calls the selection algorithm makes in phase j. The minimum value of this variable is 0, since a phase can be skipped entirely.

The size of the array in phase j is at most $(\frac{3}{4})^j\cdot n$. RSelect does linear work outside its recursive calls, so each recursive call in this phase does no more than $c\cdot (\frac{3}{4})^j \cdot n$ work, for some constant c. Thus

$$\text{Running time of RSelect} \leq \sum_{j \geq 0} c\cdot \left(\frac{3}{4}\right)^j\cdot n \cdot X_j$$
By linearity of expectation,

$$E[\text{Running time of RSelect}] \leq cn \sum_{j \geq 0} \left(\frac{3}{4}\right)^j E[X_j]$$
How do we find $E[X_j]$?

Suppose we choose an approximate median as the pivot, that is, an element that gives a 25-75 split of the input array or better. Then we are guaranteed to proceed to the next phase, because the subarray we recurse on has at most 75% of the elements. We stay in phase j for another recursive call only when the split leaves more than 75% of the elements in the subarray we recurse on.

Second, a randomly chosen pivot is an approximate median with probability at least 50%. Suppose the n numbers are 1 to 100. Then picking anything from 26 to 75 gives an approximate median, and these numbers already make up 50% of the input.
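The "at least 50%" claim can be checked by brute force. This sketch counts pivot ranks whose larger side holds at most 75% of the n elements. `approximate_median_fraction` is a hypothetical helper. With this criterion, ranks 25 to 76 qualify for n = 100, slightly more than the 26 to 75 quoted above.

```python
def approximate_median_fraction(n):
    # Fraction of pivot ranks 1..n whose larger side holds at most 75% of the n elements
    good = 0
    for p in range(1, n + 1):
        left, right = p - 1, n - p          # elements smaller / larger than the pivot
        if max(left, right) <= 0.75 * n:
            good += 1
    return good / n

print(approximate_median_fraction(100))   # ranks 25..76 qualify -> 0.52
```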

The random variable $X_j$ is similar to a coin-flipping experiment. Let N be a random variable that counts the number of times we need to flip a fair coin to get heads.

$X_j$ and $N$ behave similarly, but $E[X_j] \leq E[N]$:

- The coin-flipping experiment has to flip the coin at least once to get heads, whereas $X_j$ can be 0 if we skip phase j completely.
- Each recursive call in phase j has a probability of at most 0.5 of staying in phase j. This is like the 0.5 probability of getting tails, in which case we need to flip the coin again.
The expected value of the coin flip experiment is

$E(N)\: = \: 1\: +\: \frac{1}{2}E(N)$

The only value of E(N) that satisfies this equation is 2. The equation reads as follows. The 1 accounts for the first flip, which is always needed. With probability $\frac{1}{2}$ that flip is tails, and then the experiment starts over and needs another $E(N)$ flips in expectation. We can thus say that $E(X_j) \leq 2$.
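The equation $E(N) = 1 + \frac{1}{2}E(N)$ can be sanity-checked by simulation. This is a sketch with an arbitrary seed and trial count. `flips_until_heads` is a hypothetical helper.

```python
import random

def flips_until_heads(rng):
    # Flip a fair coin until the first heads; return the number of flips used
    flips = 1
    while rng.random() < 0.5:   # tails with probability 1/2, so flip again
        flips += 1
    return flips

rng = random.Random(42)
trials = 100_000
average = sum(flips_until_heads(rng) for _ in range(trials)) / trials
print(average)   # close to E[N] = 2
```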

Recall the geometric series $1 + r + r^2 + \dots + r^k = \frac{1 - r^{k + 1}}{1 - r}$, which for $0 < r < 1$ is at most $\frac{1}{1 - r}$.

Thus $\sum_{j \geq 0} (\frac{3}{4})^j\: \leq\: \frac{1}{1 - \frac{3}{4}} = 4$

Therefore, $E[\text{Running time of RSelect}]\:\leq cn \sum_{j \geq 0} (\frac{3}{4})^jE[X_j] \leq 2cn \sum_{j \geq 0} (\frac{3}{4})^j \leq 4\cdot 2cn = 8cn$

Hence the expected running time of the randomized selection algorithm is $O(n)$ for every input array. The expectation is over the random choice of pivots.
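A quick experiment can make the bound concrete. This sketch is not the notebook's `Randomized_Selection`. `rselect_trace` is a hypothetical, non-in-place variant that assumes distinct values and records the size of every subproblem. This lets us count calls per phase (an empirical $E[X_j]$) and total comparisons (n - 1 per partition, counted the same way as in the notebook). The seed, n and number of trials are arbitrary choices.

```python
import random
from collections import Counter

def phase(size, n):
    # Phase j covers sizes with (3/4)^(j+1) * n < size <= (3/4)^j * n
    j = 0
    while size <= (0.75 ** (j + 1)) * n:
        j += 1
    return j

def rselect_trace(array, k):
    # RSelect on distinct values (assumption); returns the k-th smallest and the subproblem sizes visited
    sizes = []
    while True:
        sizes.append(len(array))
        if len(array) == 1:
            return array[0], sizes
        pivot = random.choice(array)
        left = [x for x in array if x < pivot]
        right = [x for x in array if x > pivot]
        if k <= len(left):
            array = left
        elif k == len(left) + 1:
            return pivot, sizes
        else:
            k -= len(left) + 1
            array = right

random.seed(7)
n, trials = 1000, 500
calls_per_phase = Counter()
comparisons = 0
for _ in range(trials):
    data = random.sample(range(100 * n), n)
    value, sizes = rselect_trace(data, n // 2)
    assert value == sorted(data)[n // 2 - 1]
    calls_per_phase.update(phase(s, n) for s in sizes)
    comparisons += sum(s - 1 for s in sizes)    # n - 1 comparisons per partition

avg_X = {j: calls_per_phase[j] / trials for j in sorted(calls_per_phase)}
print(max(avg_X.values()))            # empirical max over j of E[X_j], expected to be at most 2
print(comparisons / (trials * n))     # average comparisons per element, well below 8
```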

***

Next we will look at an algorithm that does selection in linear time deterministically. We call it DSelect in this notebook.

We will analyze the running time of the algorithm. For pseudocode of the algorithm, refer to page 169 of the book.

The first time-consuming step of DSelect is finding the median of medians. We break the input array into groups of 5 and find the median of each group, which gives a total of about $\frac{n}{5}$ medians. We then find the median of these medians by calling DSelect recursively on them. The recursion bottoms out when the input array has only one element. To illustrate this, consider the following input array

11, 6, 10, 2, 15, 8, 1, 7, 14, 3, 9, 12, 4, 5, 13

We find the medians of the groups of 5, giving us 10 (median of 11, 6, 10, 2, 15), 7 (median of 8, 1, 7, 14, 3) and 9 (median of 9, 12, 4, 5, 13). The median of these three medians is 9, so 9 is our pivot.
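The grouping step for this example can be reproduced directly. In this sketch the final median-of-medians step sorts the medians instead of calling DSelect recursively, which is a simplification for illustration. `median_of_group` and `median_of_medians_pivot` are hypothetical helpers.

```python
def median_of_group(group):
    # Median of a small group: sort it and take the middle element
    return sorted(group)[len(group) // 2]

def median_of_medians_pivot(array):
    # Split into consecutive groups of 5, take each group's median, then the median of those medians
    medians = [median_of_group(array[i:i + 5]) for i in range(0, len(array), 5)]
    return medians, median_of_group(medians)

medians, pivot = median_of_medians_pivot([11, 6, 10, 2, 15, 8, 1, 7, 14, 3, 9, 12, 4, 5, 13])
print(medians, pivot)   # [10, 7, 9] 9
```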

Finding the median of 5 numbers takes constant time; let that constant be c. There are about $\frac{n}{5}$ groups, so computing all the group medians takes $\frac{n}{5}\cdot c$ = $O(n)$ time.

Thus computing the group medians is a linear-time operation. The median of these medians is then found by a recursive call.

The median of medians is found by calling DSelect recursively on an array of size $\frac{n}{5}$. The partition subroutine, which runs in linear time, then determines the order statistic of the pivot element (the median of medians). Depending on the $i^{th}$ order statistic we are interested in, we recurse either on the portion of the input to the left of the pivot or on the portion to its right. We therefore have two subproblems:

- The DSelect call on the $\frac{n}{5}$ group medians, which finds the median of medians.
- The second DSelect call on the reduced array (one side of the partition).

Outside the recursive calls, DSelect does linear work to find the group medians and to partition the numbers around the pivot. Therefore the recurrence is

$T(n)\: \leq \: T(\frac{n}{5})\:+\:T(?)\:+ O(n)\:$

We now need a deterministic bound on $T(?)$, that is, on the size of the subarray passed to the second recursive call.

For this purpose we introduce the 30-70 Lemma (proved below). The lemma states that the median of medians is at least as large as 60% of the elements (3 out of 5) in at least 50% of the groups of 5. It is also at least as small as 60% of the elements in at least 50% of the groups. Since 60% of 50% is 0.6 × 0.5 = 30%, this says that, ignoring rounding, at least 30% of the numbers in the input array are smaller than the pivot (the median of medians) and at least 30% are larger. As a result, the subarray passed to the second recursive call never contains more than 70% of the original input. Therefore $T(?) \leq T(\frac{7n}{10})$, and the running time of DSelect satisfies

$T(n)\: \leq \: T(\frac{n}{5})\:+\:T(\frac{7n}{10})\:+ O(n)\:$

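We prove by (strong) induction that DSelect runs in linear time. The inductive hypothesis is that $T(k) \leq l\cdot k$ for every $k < n$, where $l$ is a constant chosen below.

We choose l to be a constant independent of n. For the derivation we set l = 10c, where c is the constant in the O(n) term of the recurrence, so the non-recursive work is at most cn. For the base case we take T(1) = 1. Assuming $c\geq 1$, we get $T(1) \leq 10c$, so the hypothesis holds for n = 1.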

Therefore for $n\geq 2$

$T(n)\: \leq \: l\frac{n}{5}\:+\:l\frac{7n}{10}\:+ cn\: = \frac{9ln}{10}\:+ cn\: = 9cn + cn = 10cn = l\cdot n$

This proves the inductive step, so $T(n)\: \leq l\cdot n\:=\:O(n)$.
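The induction can also be checked numerically. This sketch evaluates the recurrence bottom-up. It assumes c = 1, integer floors for $\frac{n}{5}$ and $\frac{7n}{10}$, and $T(0) = T(1) = 1$. `dselect_recurrence` is a hypothetical helper.

```python
def dselect_recurrence(limit):
    # Bottom-up evaluation of T(n) = T(n // 5) + T(7n // 10) + n with T(0) = T(1) = 1 (constant c = 1)
    T = [1, 1]
    for n in range(2, limit + 1):
        T.append(T[n // 5] + T[7 * n // 10] + n)
    return T

T = dselect_recurrence(100_000)
print(max(T[n] / n for n in range(1, len(T))))   # stays at or below l / c = 10
```

The two subproblem fractions add up to $\frac{1}{5} + \frac{7}{10} = \frac{9}{10} < 1$. That is why the total work behaves like a geometric series instead of growing by a log factor.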

It remains to prove the 30-70 Lemma. A simple explanation follows.

Consider k groups of 5 numbers each, so $n = 5k$. For simplicity, let k = 5 (n = 25). Then two group medians are smaller and two are larger than the median of medians (mom). We call them $s_1$, $s_2$ and $l_1$, $l_2$, respectively.
In each of the groups whose medians are $s_1$ and $s_2$, at least 3 of the 5 numbers (60%) are smaller than mom: the median and the two numbers below it. In mom's own group, the 2 numbers below mom are also smaller than mom.
Similarly, in each of the groups whose medians are $l_1$ and $l_2$, at least 3 of the 5 numbers are larger than mom: the median and the two numbers above it. The 2 numbers above mom in its own group are larger too.
Thus, for this example with k = 5 (n = 25), at least 3 + 3 + 2 = 8 numbers are smaller than mom and at least 8 numbers are larger than mom. The next recursive call excludes the pivot and at least one of these two sets of 8 numbers, which makes at least 9 numbers. Hence the array passed to the recursive call has at most 25 - 9 = 16 elements, which is 64% of the input array.
The same argument works for any larger value of k. Ignoring rounding, the subsequent recursive call is guaranteed to receive no more than 70% of the original input.
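The lemma can be spot-checked on random inputs. This sketch computes the median of medians for n = 5k with k odd, and compares the number of elements on each side with the guaranteed $3\lceil k/2 \rceil - 1$ from the argument above. Sorting stands in for the recursive DSelect call, and distinct values are assumed. `mom_split_sizes` is a hypothetical helper.

```python
import random

def mom_split_sizes(array):
    # Median of medians over groups of 5, then the number of elements on each side of it.
    # Sorting stands in for the recursive DSelect call here; assumes distinct values.
    groups = [sorted(array[i:i + 5]) for i in range(0, len(array), 5)]
    medians = [g[len(g) // 2] for g in groups]
    mom = sorted(medians)[len(medians) // 2]
    smaller = sum(1 for x in array if x < mom)
    larger = sum(1 for x in array if x > mom)
    return smaller, larger

random.seed(3)
for n in [25, 95, 995]:                     # n = 5k with k odd
    k = n // 5
    guaranteed = 3 * ((k + 1) // 2) - 1     # 3 per group whose median is <= mom, minus mom itself
    smaller, larger = mom_split_sizes(random.sample(range(10 * n), n))
    print(n, guaranteed, smaller, larger, round(max(smaller, larger) / n, 3))
```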



In [139]:
def swap(element1, element2):
    # Returns the two elements in swapped order
    return element2, element1

def partition(array, l, r):
    # Partitions array[l:r] in place around the pivot array[l]: smaller elements end up
    # to its left, the rest to its right. Returns the array and the pivot's final index.
    i = l + 1
    for j in range(l + 1, r):
        if array[l] > array[j]:
            if j != i:
                array[i], array[j] = swap(array[i], array[j])
            i += 1
    array[l], array[i-1] = swap(array[l], array[i-1])
    return array, i-1

def quicksort_naivepivot(array , init_ncomparisons = 0):
    # Sorts a (small) array with QuickSort using the first element as pivot.
    # init_ncomparisons = 0 resets the global comparison counter; pass 1 to keep accumulating.
    global ncomparisons
    if init_ncomparisons == 0:
        ncomparisons = 0
    n = len(array)
    if n == 1:
        return array, ncomparisons
    if n > 1:
        array, pvtidx = partition(array, 0, n)   # 1st element is the pivot
        ncomparisons += n-1
        if pvtidx != 0:
            array[:pvtidx], _ = quicksort_naivepivot(array[:pvtidx], init_ncomparisons = 1)
        if pvtidx != n-1:
            array[pvtidx+1:], _ = quicksort_naivepivot(array[pvtidx+1:], init_ncomparisons = 1)
    return array, ncomparisons

def choose_pivot_median_of_medians(array):
    # Moves the median of medians to array[0] so that partition() uses it as the pivot.
    # Step 1: sort each group of (at most) 5 elements and collect the group medians in C.
    C = []
    for idx in range(0, len(array), 5):
        group = quicksort_naivepivot(array[idx:idx+5], init_ncomparisons = 1)[0]
        C.append(group[len(group) // 2])
    # Step 2: recursively select the median of the group medians (the pivot value).
    mom = Deterministic_Selection(C, (len(C) // 2) + 1, init_ncomparisons = 1)[0]
    # Step 3: locate the median of medians in the array and swap it to the front.
    momidx = array.index(mom)
    array[0], array[momidx] = swap(array[0], array[momidx])
    return array

def Deterministic_Selection(array , order_statistic, init_ncomparisons = 0 ):
    # Returns the order_statistic-th smallest element (1-indexed) and the number of comparisons made.
    # init_ncomparisons = 0 marks the top-level call, which resets the global comparison counter.
    global ncomparisons
    if init_ncomparisons == 0:
        ncomparisons = 0
    n = len(array)
    if n == 1:
        return array[0], ncomparisons
    array = choose_pivot_median_of_medians(array)   # Deterministic: median-of-medians pivot
    array, pvtidx = partition(array, 0, n)
    ncomparisons += n-1
    if pvtidx == order_statistic - 1:
        return array[pvtidx], ncomparisons
    elif pvtidx > order_statistic - 1:
        return Deterministic_Selection(array[:pvtidx], order_statistic, init_ncomparisons = 1)
    else:
        return Deterministic_Selection(array[pvtidx+1:], order_statistic - pvtidx - 1, init_ncomparisons = 1)

In [143]:
print('2nd Order stat is', Deterministic_Selection([3, 6], 2)[0])
print('1st Order stat is', Deterministic_Selection([3, 8, 2, 5, 1, 4, 7, 6], 1)[0])
print('2nd Order stat is', Deterministic_Selection([3, 8, 2, 5, 1, 4, 7, 6], 2)[0])
print('3rd Order stat is', Deterministic_Selection([3, 8, 2, 5, 1, 4, 7, 6], 3)[0])
print('4th Order stat is', Deterministic_Selection([3, 8, 2, 5, 1, 4, 7, 6], 4)[0])
print('5th Order stat is', Deterministic_Selection([3, 8, 2, 5, 1, 4, 7, 6], 5)[0])
print('6th Order stat is', Deterministic_Selection([3, 8, 2, 5, 1, 4, 7, 6], 6)[0])
print('7th Order stat is', Deterministic_Selection([3, 8, 2, 5, 1, 4, 7, 6], 7)[0])
print('8th Order stat is', Deterministic_Selection([3, 8, 2, 5, 1, 4, 7, 6], 8)[0])
print('11th Order stat is', Deterministic_Selection([56, 765, 34,242,3, 8, 542, 789, 2, 5,321, 1, 4, 7, 87, 90, 6, 9, 77, 55, 18,10,66, 79,15,131,12, 134], 11)[0])



('2nd Order stat is', 6)
('1st Order stat is', 1)
('2nd Order stat is', 2)
('3rd Order stat is', 3)
('4th Order stat is', 4)
('5th Order stat is', 5)
('6th Order stat is', 6)
('7th Order stat is', 7)
('8th Order stat is', 8)
('11th Order stat is', 12)


In [146]:
print("Median using Deterministic Selection: ")
print("5th Order statistic of the content expected to be 5469, got : {}".format(Deterministic_Selection(test_case("http://algorithmsilluminated.org/datasets/problem6.5test1.txt"), 5)[0]))
print("50th Order statistic of the content expected to be 4715, got: {}".format(Deterministic_Selection(test_case("http://algorithmsilluminated.org/datasets/problem6.5test2.txt"), 50)[0]))


Median using Deterministic Selection: 
5th Order statistic of the content expected to be 5469, got : 5469
50th Order statistic of the content expected to be 4715, got: 4715


#  IX. GRAPHS AND THE CONTRACTION ALGORITHM
Graphs represent relationships between pairs of objects. The objects are called vertices (singular: vertex) or nodes, and the relationship between two of them is represented by an edge. A graph G is represented by its vertices (V) and its edges (E).

***

Two flavors of graph are directed and undirected.

- In an undirected graph, each edge is an unordered pair {v, w} of vertices. This merely means that the two vertices are connected and the edge has no direction, so the edge (v, w) is the same as the edge (w, v). Friendship on a social networking site like Facebook is an example of an undirected graph: if Tom is a friend of Jerry, then Jerry is also a friend of Tom.
- Directed graphs, on the other hand, have edges directed from one vertex to another. For an edge from v to w, v is the tail (where the edge originates) and w is the head (where it points). The edge (v, w) is not the same as (w, v), and in fact one of them may not even exist. Twitter followers can be represented by directed edges: if Tom follows Jerry on Twitter, Jerry may or may not follow Tom. Following on Twitter is not mutual, unlike friendship on Facebook.

Some examples of graphs are
- Road networks: We may represent road networks with undirected graphs for simplicity, but directed graphs are more apt. Most roads are two-way, in which case we have both edges (v, w) and (w, v), but that is not true for one-way streets.
- World Wide Web: This is a directed graph in which the page containing a hyperlink to another page is the tail of the directed edge, and the page it references is the head.
- Precedence constraints: We use these to form dependency graphs, which tell us which tasks are prerequisites for performing a task A.

**Notation of a graph**<br>
A graph G is represented using its Vertices and Edges as

G = (V, E)

where

n = $\vert V \vert$, number of vertices and

m = $\vert E \vert$, number of edges

**Quiz 7.1**<br>

Let's consider an undirected, connected graph G with n vertices and no parallel edges (no two vertices are joined by more than one edge). What are the minimum and maximum numbers of edges?

To start with the minimum number of edges: the graph stays in one piece (connected) only if we can travel from any vertex to any other. It's not too difficult to see that n - 1 edges are both necessary and sufficient. For example, connect a vertex $v_1$ to each of the remaining n - 1 vertices $v_2, \dots, v_n$ using n - 1 edges. This is like a hub and spokes, where one vertex is the hub and the other vertices form the ends of the spokes.

Now for the maximum number of edges, we draw n - 1 edges from the first vertex to the remaining vertices. From the second vertex, we can draw n - 2 new edges to the remaining vertices. Why n - 2? Because there already is an edge between the first and the second vertex, and we don't allow parallel edges. Continuing this way, we get (n - 1) + (n - 2) + (n - 3) + ... + 1 + 0 edges, which equals $\frac{n(n-1)}{2}$.
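Both extremes can be generated and counted. In this sketch `star_edges` builds the hub-and-spoke graph and `complete_edges` builds every unordered pair once via `itertools.combinations`. Both are hypothetical helpers with vertices labeled 1 to n.

```python
from itertools import combinations

def star_edges(n):
    # Hub and spokes: vertex 1 connected to every other vertex (connected, n - 1 edges)
    return [(1, v) for v in range(2, n + 1)]

def complete_edges(n):
    # Every unordered pair {v, w} exactly once (no parallel edges, no self-loops)
    return list(combinations(range(1, n + 1), 2))

for n in [2, 5, 10]:
    print(n, len(star_edges(n)), len(complete_edges(n)), n * (n - 1) // 2)
```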

***

**Sparse and Dense graphs**<br>
We saw above that, for connected graphs without parallel edges, the number of edges m is $\Omega(n)$ and $O(n^2)$. Thus the number of edges ranges from linear to quadratic in the number of vertices. There is no strict cutoff for considering a graph sparse or dense, but we can call a graph sparse if the number of edges is close to linear, and dense if it is close to quadratic.

Thus something like $\Theta(n\log n)$ is considered sparse, whereas $\Theta(\frac{n^2}{\log n})$ is dense. Something subquadratic like $\Theta(n^{1.5})$ may count as either sparse or dense, depending on the application.

**Graph Representation**<br>
We will look at a couple of ways to represent graphs

***

**Adjacency List**<br>
In this representation we store the vertices in one list and the edges in another. Each edge holds two pointers, one to each of its endpoints. Each vertex holds a list of pointers to the edges incident to it.

For directed graphs, we will store two lists for each vertex, one list for incoming and one list for outgoing edges.

For all Quiz and Test Your Understanding questions, refer to the book, as I don't want to type out the question text here.

**Quiz 7.2**<br>


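In the adjacency list representation, we have one list whose size equals the number of vertices (n) and one list whose size equals the number of edges (m). The per-vertex lists of incident edges add up to 2m entries for an undirected graph, since each edge appears in the lists of both its endpoints. Thus the adjacency list has a space requirement of $O(m + n)$.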
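The following is a minimal sketch of this representation. It assumes vertices labeled 1 to n and edges given as (v, w) pairs, and uses list indices in place of pointers. `build_adjacency_list` is a hypothetical helper, separate from the `Graph_UndirectedAdjList` class implemented later in this notebook.

```python
def build_adjacency_list(n, edges):
    # Vertices 1..n; `edges` is a list of (v, w) pairs for an undirected graph.
    # Each edge stores its two endpoints; each vertex stores the indices of its incident edges.
    incident = {v: [] for v in range(1, n + 1)}
    for idx, (v, w) in enumerate(edges):
        incident[v].append(idx)
        incident[w].append(idx)
    return incident

edges = [(1, 2), (2, 3), (2, 4), (3, 4)]
incident = build_adjacency_list(4, edges)
print(incident)   # {1: [0], 2: [0, 1, 2], 3: [1, 3], 4: [2, 3]}
# Storage: n vertex entries + m edge entries + 2m incident-edge references = O(m + n)
print(len(incident) + len(edges) + sum(len(lst) for lst in incident.values()))   # 4 + 4 + 8 = 16
```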

***

**Adjacency Matrix**<br>
Consider a graph G = (V, E). We have n vertices and m edges. Each of the m edges will have both their endpoints as two vertices from n possible vertices.

$
A_{ij} = 
\begin{cases}
1 \text{ if edge (i,j) belongs to G}\\
0 \text{ otherwise}
\end{cases}
$

The adjacency matrix for the graph with following edges

$
1 \leftrightarrow 2 \\
2 \leftrightarrow 3 \\
2 \leftrightarrow 4 \\
3 \leftrightarrow 4 \\
$

becomes

$
\begin{bmatrix}
0 & 1 & 0 & 0\\
1 & 0 & 1 & 1\\
0 & 1 & 0 & 1\\
0 & 1 & 1 & 0\\
\end{bmatrix}
$

Each row represents one endpoint of an edge, and each column represents the other endpoint. Notice how the matrix is symmetric across the diagonal for an undirected graph.

The adjacency matrix also works for directed graphs. Set $A_{ij} = 1$ only if there is an edge directed from i to j, so the matrix need not be symmetric.
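The matrix above can be rebuilt from its edge list, and a row scan shows why neighbor lookup costs $O(n)$. `adjacency_matrix` and `neighbors` are hypothetical helpers, and vertices are assumed to be labeled 1 to n.

```python
def adjacency_matrix(n, edges, directed=False):
    # n x n matrix of 0s and 1s for vertices 1..n; A[i-1][j-1] = 1 iff edge (i, j) is present
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i - 1][j - 1] = 1
        if not directed:
            A[j - 1][i - 1] = 1     # undirected: store both directions, so A is symmetric
    return A

def neighbors(A, v):
    # Scans the whole row of vertex v: O(n) regardless of v's degree
    return [j + 1 for j, bit in enumerate(A[v - 1]) if bit]

A = adjacency_matrix(4, [(1, 2), (2, 3), (2, 4), (3, 4)])
for row in A:
    print(row)
print(neighbors(A, 2))   # [1, 3, 4]
```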

***

**Quiz 7.3**<br>

The adjacency matrix is an $n \times n$ matrix regardless of the number of edges, so its space requirement is $O(n^2)$.

***

Generally, we may choose an adjacency matrix if the graph is dense, or if the number of vertices is not huge (unlike, for example, the web graph, where every page is a vertex). However, an adjacency list is better for graph traversal, where we repeatedly need all the vertices adjacent to a given vertex. With a matrix, this operation takes $O(n)$ time, because we go through the entire row of 0s and 1s to find the vertices adjacent to the current vertex. With an adjacency list, the adjacent vertices are available directly, in time proportional to the vertex's degree. Also, for sparse graphs the adjacency list is the way to go.

We will also build an Adjacency List implementation in Python which we will use in other notebooks.

***

**Test Your Understanding**<br>

***

**Problem 7.1**<br>

- At least one vertex of G has degree at most 10.
    - This condition can be satisfied by both sparse and dense graphs, because it constrains only a single vertex. A graph with 10,000+ vertices in which every vertex has degree at most 10 satisfies it and is sparse. Now take a graph in which n - 1 of the vertices form a complete graph and the remaining vertex has degree 1. It also satisfies the condition, but it has about $\frac{n^2}{2}$ edges, so it is dense.
- Every vertex of G has degree at most 10.
    - This is a much stronger condition. Every edge contributes to the degree of two vertices, so $m \leq \frac{10n}{2} = 5n$. The number of edges is at most a small constant times the number of vertices, and the graph is sparse.
- At least one vertex of G has degree n - 1.
    - This can be either sparse or dense. At the very minimum, only one vertex connects to all the other n - 1 vertices, and each of those n - 1 vertices has degree 1 (connecting only to the vertex with degree n - 1). This makes the graph sparse, since it has the minimum number of edges needed to be connected. However, the condition says at least one vertex. In the extreme case, all vertices can have degree n - 1, which makes the graph as dense as it can be, with each vertex connected to all the other vertices.
- Every vertex has degree n - 1.
    - This one is easy: the graph is dense, with exactly $\frac{n(n-1)}{2}$ edges. The explanation is the same as for the extreme case in the previous scenario.
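The first condition deserves a concrete example, since it allows dense graphs. This sketch builds a complete graph on vertices 1 to n - 1 and attaches vertex n to vertex 1 only. The choice n = 200 (instead of 10,000+) just keeps the example fast; the construction works for any n. `dense_graph_with_low_degree_vertex` and `degrees` are hypothetical helpers.

```python
from collections import Counter

def dense_graph_with_low_degree_vertex(n):
    # Vertices 1..n-1 form a complete graph; vertex n is attached only to vertex 1
    edges = [(v, w) for v in range(1, n) for w in range(v + 1, n)]
    edges.append((1, n))
    return edges

def degrees(edges):
    # Number of edges incident to each vertex
    deg = Counter()
    for v, w in edges:
        deg[v] += 1
        deg[w] += 1
    return deg

n = 200
edges = dense_graph_with_low_degree_vertex(n)
deg = degrees(edges)
print(len(edges), deg[n], min(deg.values()))   # 19702 1 1
```

The graph has $\frac{(n-1)(n-2)}{2} + 1$ edges, which is quadratic in n, even though one vertex has degree 1.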
    
***
    
**Problem 7.2**<br>

For a graph stored as an adjacency matrix, finding the edges incident to a vertex takes time of the order of n (the number of vertices). To see why, consider the matrix, which has size $n \times n$. Given a vertex v, we take the row corresponding to v from this matrix. This row (vector) of 0s and 1s has size n, and finding the adjacent vertices requires iterating through it to find all entries with value 1. Thus the complexity of finding the adjacent vertices is $O(n)$.

***

**Problem 7.3**<br>

Suppose a directed graph stores with each vertex only its outgoing edges (the edges for which the vertex is the tail). To find all incoming edges of a vertex v, we need to look for v among the outgoing edges of the other n - 1 vertices. The choice of data structure for storing the outgoing edges is very important here. If each vertex stores its outgoing edges in a plain list (array), we have to scan every list. The operation then takes time proportional to the total number of edges plus the number of vertices, $O(m + n)$. If instead each vertex uses a set, like a hash set, we can expect the membership test for v in each set to take constant time ($O(1)$) on average. Combining the iteration through all n - 1 vertices with a constant-time membership test for each of them, we can expect the overall operation to take $O(n)$ time.
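Both variants can be sketched on a tiny graph. The representation is an assumption: a dict mapping each vertex to the heads of its outgoing edges, stored either as lists or as Python sets. `incoming_by_scanning` and `incoming_with_sets` are hypothetical helpers.

```python
def incoming_by_scanning(out_lists, v):
    # out_lists: vertex -> list of heads of its outgoing edges.
    # Every list is scanned, so this takes O(m + n) time.
    return [u for u, heads in out_lists.items() for w in heads if w == v]

def incoming_with_sets(out_sets, v):
    # out_sets: vertex -> set of heads; one expected O(1) membership test per vertex, O(n) overall
    return [u for u, heads in out_sets.items() if v in heads]

out_lists = {1: [2, 3], 2: [3], 3: [1], 4: [3]}
out_sets = {u: set(heads) for u, heads in out_lists.items()}
print(incoming_by_scanning(out_lists, 3))   # [1, 2, 4]
print(incoming_with_sets(out_sets, 3))      # [1, 2, 4]
```

The set version trades extra memory per vertex for faster membership tests. A common alternative is to store incoming edges explicitly as well, as described earlier for directed graphs.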

***

We will now look at a simple adjacency list implementation in Python, which we will also use for storing graphs in subsequent chapters.

In [346]:
from copy import deepcopy

class Graph_UndirectedAdjList:
    def __init__(self):
        self._vertices = {}
        self._edges = []
        
    def addEdge(self, v1, v2):
        # Adds an undirected edge between vertices v1 and v2 (ignored if it already exists)
        if not [v1, v2] in self._edges and  not [v2, v1] in self._edges:
            self._edges.append([v1, v2])
            if v1 in self._vertices:
                self._vertices[v1].append(v2)
            else:
                self._vertices[v1] = [v2]  #adjacent vertex
            if v2 in self._vertices:
                self._vertices[v2].append(v1)
            else:
                self._vertices[v2] = [v1]  #adjacent vertex
            
    def adjacentEdges(self, vertex):
        return self._vertices[vertex] if vertex in self._vertices else None
    
    def vertices(self):
        return list(self._vertices.keys())
    
    def edges(self):
        return self._edges
    
    def addMultipleEdges(self, vertex, adjacentvertices):
        for a_vertices in adjacentvertices:
            self.addEdge(vertex, a_vertices)
        
    def contractEdge(self, edge):
        # remove edge
        if edge in self._edges:
            self._edges.remove(edge)
        # change all vertices of edges to contracted vertex
        for i in range(len(self._edges)):
            for j in range(len(self._edges[i])):
                for k in range(len(edge)):
                    #print(self._edges[i][j])
                    if self._edges[i][j] == edge[k]:
                        if isinstance(edge[0], int) and isinstance(edge[1], int):
                            self._edges[i][j] = [edge[0], edge[1]]
                        elif isinstance(edge[0], list) and not isinstance(edge[1], list) :
                            newedge = deepcopy(edge[0])
                            newedge.append(edge[1])
                            self._edges[i][j] = newedge
                        elif isinstance(edge[1], list) and not isinstance(edge[0], list):
                            newedge = deepcopy(edge[1])
                            newedge.append(edge[0])
                            self._edges[i][j] = newedge
                        else:
                            self._edges[i][j] = list(edge[0]) + list(edge[1])
        # contract vertices: remove the two endpoints and add one merged vertex,
        # keyed by the tuple of all original vertices it contains
        e0 = (edge[0], ) if isinstance(edge[0], int) else tuple(edge[0])
        e1 = (edge[1], ) if isinstance(edge[1], int) else tuple(edge[1])
        # single original vertices are stored under their int key, merged vertices under a tuple
        key0 = edge[0] if isinstance(edge[0], int) else e0
        key1 = edge[1] if isinstance(edge[1], int) else e1
        newval0 = self._vertices.pop(key0, [])
        newval1 = self._vertices.pop(key1, [])
        self._vertices[e0 + e1] = newval0 + newval1
        # remove self-loops (rebuild the list instead of removing items while iterating over it)
        self._edges = [e for e in self._edges if e[0] != e[1]]
            

In [347]:
graph = Graph_UndirectedAdjList()
#graph.addEdge(1, 2)
#graph.addEdge(3, 4)
#graph.addEdge(5, 2)
graph.addMultipleEdges(6, [7, 8, 9, 2, 5, 3])
graph.addMultipleEdges(7, [5, 6, 9, 4, 3])
print("Edges: {}".format(graph.edges()))
print("Vertices: {}".format(graph.vertices()))
print("Adjacent Edges: {}".format(graph.adjacentEdges(2)))
print("\nAfter Contraction 1:")
graph.contractEdge([6, 9])
print("Edges: {}".format(graph.edges()))
print("Vertices: {}".format(graph.vertices()))
print("\nAfter Contraction 2:")
graph.contractEdge([[6, 9], 7])
print("Edges: {}".format(graph.edges()))
print("Vertices: {}".format(graph.vertices()))

Edges: [[6, 7], [6, 8], [6, 9], [6, 2], [6, 5], [6, 3], [7, 5], [7, 9], [7, 4], [7, 3]]
Vertices: [2, 3, 4, 5, 6, 7, 8, 9]
Adjacent Edges: [6]

After Contraction 1:
Edges: [[[6, 9], 7], [[6, 9], 8], [[6, 9], 2], [[6, 9], 5], [[6, 9], 3], [7, 5], [7, [6, 9]], [7, 4], [7, 3]]
Vertices: [2, 3, 4, 5, 7, 8, (6, 9)]

After Contraction 2:
Edges: [[[6, 9, 7], 8], [[6, 9, 7], 2], [[6, 9, 7], 5], [[6, 9, 7], 3], [[6, 9, 7], 5], [[6, 9, 7], 4], [[6, 9, 7], 3]]
Vertices: [2, 3, 4, 5, 8, (6, 9, 7)]


In [348]:
graph = Graph_UndirectedAdjList()
#graph.addEdge(1, 2)
#graph.addEdge(3, 4)
#graph.addEdge(5, 2)
graph.addMultipleEdges(6, [7, 8, 9, 2, 5, 3])
graph.addMultipleEdges(7, [5, 6, 9, 4, 3])
print("Edges: {}".format(graph.edges()))
print("Vertices: {}".format(graph.vertices()))
print("Adjacent Edges: {}".format(graph.adjacentEdges(2)))
print("\nAfter Contraction 1:")
graph.contractEdge([6, 9])
print("Edges: {}".format(graph.edges()))
print("Vertices: {}".format(graph.vertices()))
print("\nAfter Contraction 2:")
graph.contractEdge([7, 5])
print("Edges: {}".format(graph.edges()))
print("Vertices: {}".format(graph.vertices()))
print("\nAfter Contraction 3:")
graph.contractEdge([[6, 9], [7, 5]])
print("Edges: {}".format(graph.edges()))
print("Vertices: {}".format(graph.vertices()))

Edges: [[6, 7], [6, 8], [6, 9], [6, 2], [6, 5], [6, 3], [7, 5], [7, 9], [7, 4], [7, 3]]
Vertices: [2, 3, 4, 5, 6, 7, 8, 9]
Adjacent Edges: [6]

After Contraction 1:
Edges: [[[6, 9], 7], [[6, 9], 8], [[6, 9], 2], [[6, 9], 5], [[6, 9], 3], [7, 5], [7, [6, 9]], [7, 4], [7, 3]]
Vertices: [2, 3, 4, 5, 7, 8, (6, 9)]

After Contraction 2:
Edges: [[[6, 9], [7, 5]], [[6, 9], 8], [[6, 9], 2], [[6, 9], [7, 5]], [[6, 9], 3], [[7, 5], [6, 9]], [[7, 5], 4], [[7, 5], 3]]
Vertices: [2, 3, 4, 8, (6, 9), (7, 5)]

After Contraction 3:
Edges: [[[6, 9, 7, 5], 8], [[6, 9, 7, 5], 2], [[6, 9, 7, 5], 3], [[6, 9, 7, 5], 4], [[6, 9, 7, 5], 3]]
Vertices: [2, 3, 4, 8, (6, 9, 7, 5)]


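What we see above is a simple adjacency-list representation of an undirected graph, with operations to:

- Add edges
- Get the list of vertices
- Get the list of edges
- Get the vertices adjacent to a given vertex
- Contract an edge, merging its two endpoints into a single super-vertex labelled by the tuple of the original vertices (for example (6, 9)) and dropping any self-loops this creates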
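For comparison, here is a minimal sketch of the same idea in plain Python 3. It is not the notebook's `Graph_UndirectedAdjList`; the class `MultiGraph` and its methods are hypothetical. One design choice differs: instead of labelling a merged vertex with a tuple such as (6, 9), contraction keeps the label of the first endpoint and redirects the second endpoint's edges to it, so every key stays a plain integer. Parallel edges are kept as repeated neighbours, and self-loops created by a contraction are dropped.

```python
from collections import defaultdict


class MultiGraph:
    """Hypothetical undirected multigraph stored as an adjacency list.

    Every vertex maps to a list of neighbours; a parallel edge shows up as a
    repeated neighbour, which random contraction needs to keep.
    """

    def __init__(self):
        self.adj = defaultdict(list)

    def add_edge(self, u, v):
        if u == v:
            return  # self-loops are ignored
        self.adj[u].append(v)
        self.adj[v].append(u)

    def vertices(self):
        return list(self.adj)

    def edges(self):
        # Each undirected edge is stored twice; report it once via u < w
        return [(u, w) for u in self.adj for w in self.adj[u] if u < w]

    def adjacent(self, u):
        return list(self.adj.get(u, []))

    def contract(self, u, v):
        """Merge vertex v into vertex u (u keeps its label)."""
        for w in self.adj.pop(v):
            # Redirect w's copies of the edge (w, v) so they point at u instead
            self.adj[w] = [u if x == v else x for x in self.adj[w]]
            self.adj[u].append(w)
        # Former u-v edges are now self-loops at u; drop them
        self.adj[u] = [x for x in self.adj[u] if x != u]


edges = [(6, 7), (6, 8), (6, 9), (6, 2), (6, 5), (6, 3),
         (7, 5), (7, 9), (7, 4), (7, 3)]
g = MultiGraph()
for u, v in edges:
    g.add_edge(u, v)
g.contract(6, 9)
g.contract(6, 7)
print(len(g.vertices()), len(g.edges()))  # 6 7
```

Contracting (6, 9) and then merging 7 into the result leaves 7 edges on 6 vertices, the same counts printed after the second contraction in the first test cell above; only the label of the merged vertex differs.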

In [179]:
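from copy import deepcopy
import random

def RandomContractionAlgorithm(objectGraph):
    # Work on a copy; contract uniformly random edges until two super-vertices remain
    objGraph = deepcopy(objectGraph)
    while len(objGraph.vertices()) > 2:
        # Copy the chosen edge defensively in case contractEdge relabels edges in place
        edge = deepcopy(random.choice(objGraph.edges()))
        objGraph.contractEdge(edge)
    # The remaining (possibly parallel) edges all cross the final cut
    return len(objGraph.edges())

graph = Graph_UndirectedAdjList()
graph.addMultipleEdges(6, [7, 8, 9, 2, 5, 3])
graph.addMultipleEdges(7, [5, 6, 9, 4, 3])
RandomContractionAlgorithm(graph)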



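`RandomContractionAlgorithm` is meant to perform one run of random contraction (Karger's minimum-cut algorithm): contract uniformly random edges until two super-vertices remain, then count the edges between them. A single run finds a particular minimum cut only with probability at least $2/(n(n-1))$, so the run is normally repeated and the smallest cut kept. The sketch below shows that outer loop in self-contained Python 3. It is an illustration under stated assumptions, not the notebook's method: `karger_min_cut` is a hypothetical name, the graph is a plain list of `(u, v)` pairs, and a union-find map of super-vertices stands in for the relabelled adjacency lists used by `contractEdge`. The trial count defaults to about $n^2 \ln n$, which bounds the failure probability by roughly $1/n^2$.

```python
import math
import random


def _find(parent, x):
    """Return the super-vertex containing x (union-find with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def karger_min_cut(edges, trials=None, seed=None):
    """Estimate the minimum cut of a connected undirected multigraph.

    edges: list of (u, v) pairs; parallel edges are allowed, self-loops are not.
    Each trial contracts uniformly random edges until two super-vertices remain
    and counts the edges crossing between them; the smallest count is returned.
    """
    rng = random.Random(seed)
    vertices = {v for edge in edges for v in edge}
    n = len(vertices)
    if n < 2:
        return 0
    if trials is None:
        # About n^2 ln n trials push the failure probability to roughly 1/n^2
        trials = max(1, int(n * n * math.log(n)))
    best = len(edges)
    for _ in range(trials):
        parent = {v: v for v in vertices}
        live = list(edges)  # edges that may still join two different super-vertices
        remaining = n
        while remaining > 2 and live:
            i = rng.randrange(len(live))
            ru = _find(parent, live[i][0])
            rv = _find(parent, live[i][1])
            if ru == rv:
                # The edge has become a self-loop; discard it (swap with last, then pop)
                live[i] = live[-1]
                live.pop()
                continue
            parent[rv] = ru  # contract: merge super-vertex rv into ru
            remaining -= 1
        cut = sum(1 for u, v in edges if _find(parent, u) != _find(parent, v))
        best = min(best, cut)
    return best
```

Rejecting self-loops as they are drawn keeps each contraction uniform over the edges that still join two distinct super-vertices. On the notebook's example graph the minimum cut is 1, because vertices 2, 4 and 8 each hang off a single edge.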


In [338]:
#titbit
print("##list")
x = [1, 2]
y = [3, 4]
z = 5
w = [6, 7]
x= x+y
print(x)
if isinstance(z, int):
    x.append(z)
if isinstance(w, int):
    x.append(w)
print(x)
u = [2, 3]
x.append(u)
print(x)
x.remove(u)
print(x)

x.append(u)
print(type(u))
x.append([3,4])
x.append([4,5])
print(x)
for i in range(len(x)):
    if isinstance(x[i], int) and x[i] == 2:
        x[i] = [7, 9]
    if isinstance(x[i], list):
        for j in range(len(x[i])):
            if x[i][j] == 2:
                x[i][j] = [7, 9] 
print(x)
x = []
x.append([[3,4], 7])
x.append([[3,4], 7])
print(x)
x.remove([[3,4], 7])
print(x)
print(list([1,2,3]))
print([[1,2,3]])
print([[1]])



print("##dict")
thisdict ={
  "brand": "Ford",
  "model": "Mustang",
  "year": 1964
}
print(thisdict)
thisdict["year"] = 2018
print(thisdict)
x = {2:[3,4,5], 4:[5,6,7]}
print(x)
print(x[2])
x[2].append(3)
print(x)
for i in list(x.keys()):
    x.pop(i)
    if(i == 2):
        print("2 found")
    if(i == 3):
        print("3 not found")
print(x)
x = {2:[3,4,5], 4:[5,6,7]}
print(x)
a = x[2]
b = x[4]
c = a+b
print(c)
x.pop(2)
x.pop(4)
x[(2,4)] = c
print(x)

print("##set")
x = set([1, 2])
u = (2, 3)
for i in y:
    x.add(i) 
print(x)
if isinstance(z, int):
    x.add(z)
if isinstance(w, list):
    for i in w:
        x.add(i)
print(x)
x.add(u)
print(x)
x.remove(u)
print(x)
print(" 'set' object does not support indexing")

print("##tuple")
x = 2
y = [3, 4]
print((x, ))
print(tuple(y))
print(tuple(y) + (x, ))

print(tuple((x, )))
print(tuple(tuple(y)))
y = tuple(y)
print(y)
y = tuple(y)
print(y)



##list
[1, 2, 3, 4]
[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5, [2, 3]]
[1, 2, 3, 4, 5]
<type 'list'>
[1, 2, 3, 4, 5, [2, 3], [3, 4], [4, 5]]
[1, [7, 9], 3, 4, 5, [[7, 9], 3], [3, 4], [4, 5]]
[[[3, 4], 7], [[3, 4], 7]]
[[[3, 4], 7]]
[1, 2, 3]
[[1, 2, 3]]
[[1]]
##dict
{'brand': 'Ford', 'model': 'Mustang', 'year': 1964}
{'brand': 'Ford', 'model': 'Mustang', 'year': 2018}
{2: [3, 4, 5], 4: [5, 6, 7]}
[3, 4, 5]
{2: [3, 4, 5, 3], 4: [5, 6, 7]}
2 found
{}
{2: [3, 4, 5], 4: [5, 6, 7]}
[3, 4, 5, 5, 6, 7]
{(2, 4): [3, 4, 5, 5, 6, 7]}
##set
set([1, 2, 3, 4])
set([1, 2, 3, 4, 5, 6, 7])
set([1, 2, 3, 4, 5, 6, 7, (2, 3)])
set([1, 2, 3, 4, 5, 6, 7])
 'set' object does not support indexing
##tuple
(2,)
(3, 4)
(3, 4, 2)
(2,)
(3, 4)
(3, 4)
(3, 4)
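Popping keys from a dict while looping over it, as the `dict` experiment above does, is safe only when the loop runs over a snapshot of the keys: in Python 2 `keys()` returns a list, but in Python 3 it is a live view and changing the dict's size mid-loop raises `RuntimeError`. Lists have a quieter pitfall: calling `remove()` inside a `for` loop over the same list shifts the later items left, so the element right after each removed one is skipped. A small illustration, where `drop_self_loops_naive` and `drop_self_loops` are hypothetical helpers:

```python
def drop_self_loops_naive(edges):
    """Buggy: removes items from the list it is iterating over."""
    for e in edges:
        if e[0] == e[1]:
            edges.remove(e)  # shifts later items left, so the next one is skipped
    return edges


def drop_self_loops(edges):
    """Safe: build a new list that keeps only the non-loop edges."""
    return [e for e in edges if e[0] != e[1]]


edges = [[1, 1], [2, 2], [1, 2]]
print(drop_self_loops_naive(list(edges)))  # [[2, 2], [1, 2]] -- a self-loop survives
print(drop_self_loops(edges))              # [[1, 2]]

d = {2: [3, 4, 5], 4: [5, 6, 7]}
for k in list(d):  # iterate over a snapshot of the keys, then pop freely
    d.pop(k)
print(d)  # {}
```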


In [339]:
#titbit
print([1, 2])
x = [3, 4]
x.append(1)
print(x+[5,6])
x.append([5, 6])
print(x)

[1, 2]
[3, 4, 1, 5, 6]
[3, 4, 1, [5, 6]]
