# CH13 Sorting

In [2]:
# Some important notes:
# A number of sorting algorithms run in O(nlogn) time; heapsort, merge sort, and quicksort are examples. Each has its advantages and disadvantages:
# for example, heapsort is in-place but not stable; merge sort is stable but not in-place; quicksort runs in O(n^2) time in the worst case.
# (An in-place sort is one which uses O(1) space; a stable sort is one where entries which are equal appear in their original order.)
# A well implemented quick sort is usually the best choice for sorting.
# How to decide which sorting algorithm to use given a particular situation:
# - For short arrays, e.g., 10 or fewer elements, insertion sort is easier to code and faster than asymptotically superior sorting algorithms. 
# - If every element is known to be at most k places from its final location, a min-heap can be used to get an O(nlogk) algorithm (Solution 10.3 on Page 137).
# - If there are a small number of distinct keys, e.g., integers in the range [0..255], counting sort, which records for each element, the number of elements less than it, works well. 
# -- This count can be kept in an array (if the largest number is comparable in value to the size of the set being sorted) or a BST where the keys are the numbers and the values are their frequencies. 
# -- If there are many duplicate keys we can add the keys to a BST, with linked lists for elements which have the same key; the sorted result can be derived from an in-order traversal of the BST.
# Sorting problems occur in the following two types:
# - use sorting to make subsequent steps simpler => use a library sort function, possibly with a custom comparator
# - design a custom sorting routine => use a data structure like a BST, heap or array indexed by values.
# The following are the properties of Python sorting libraries:
# - to sort list in-place use the sort() method
# -- it returns None, the calling list itself is updated
# -- it takes two optional input parameters:
# --- key: a function which maps list elements to objects which are comparable, e.g. a = [1, 2, 4, 5, 0]; a.sort(key=lambda x: str(x))
# --- reverse: True => descending order; otherwise ascending order (default)
# - to sort an iterable, use the sorted() function
# -- it leaves the input unchanged and returns a new list; the optional parameters are the same as for sort()
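
The difference between `sort()` and `sorted()` described above can be checked with a short sketch. The list `a` and its values are made up for illustration only.

```python
# Illustrative values only; `a` is a hypothetical list.
a = [1, 2, 4, 5, 0, 10]

# sorted() returns a new list and leaves `a` untouched.
by_string = sorted(a, key=str)        # compares '0', '1', '10', '2', ...
descending = sorted(a, reverse=True)

# list.sort() sorts in place and returns None.
result = a.sort()

print(by_string)    # [0, 1, 10, 2, 4, 5]
print(descending)   # [10, 5, 4, 2, 1, 0]
print(a, result)    # [0, 1, 2, 4, 5, 10] None
```

Because `key=str` compares the elements as strings, 10 lands between 1 and 2 in `by_string`.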

In [9]:
# Sort students by their names
class Student(object):
    def __init__(self, name, grade_point_average):
        self.name = name
        self.grade_point_average = grade_point_average
    
    def __lt__(self, other):
        return self.name < other.name

def print_students_list(students):
    for student in students:
        print(f'({student.name}, {student.grade_point_average})', end=" ")
        
students = [Student('A', 4.0), Student('C', 3.0), Student('B', 2.0), Student('D', 3.2)]

# sort students by their name - using sorted() func, the input array remains unchanged
students_sorted_by_name = sorted(students)
print(f'Students sorted by name:')
print_students_list(students_sorted_by_name)

# Sort students by GPA in-place => the input array will be modified
students.sort(key = lambda student: student.grade_point_average)
print(f'Students sorted by GPA:')
print_students_list(students)

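# Python's built-in sort() and sorted() use Timsort, a stable hybrid of merge sort and insertion sort: O(nlogn) worst-case time and up to O(n) extra space.
# Library sorts in many other languages are based on quicksort, which averages O(nlogn) time and uses O(logn) stack space.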

Students sorted by name:
(A, 4.0) (B, 2.0) (C, 3.0) (D, 3.2) Students sorted by GPA:
(B, 2.0) (C, 3.0) (D, 3.2) (A, 4.0) 
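
Stability, as defined in the notes at the start of the chapter, is easy to observe with the built-in `sorted()`. The sketch below uses hypothetical `(name, GPA)` tuples rather than the `Student` class so that it stands on its own.

```python
# Hypothetical (name, GPA) records; two pairs share a GPA.
records = [('D', 3.0), ('A', 4.0), ('C', 3.0), ('B', 4.0)]

by_name = sorted(records, key=lambda r: r[0])
# Stable: among equal GPAs, the name order from the previous sort survives.
by_gpa_then_name = sorted(by_name, key=lambda r: r[1])

print(by_gpa_then_name)
# [('C', 3.0), ('D', 3.0), ('A', 4.0), ('B', 4.0)]
```

Sorting by the secondary key first and the primary key second is a common way to get a multi-key ordering out of a stable sort.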

## 13.1 Compute the intersection of two sorted arrays

In [13]:
# Task: Write a program which takes as input two sorted arrays, and returns a new array containing elements that are present in both of the input arrays.
# The input arrays may have duplicate entries, but the returned array should be free of duplicates. 
# For example, if the inputs are <2,3,3,5,5,6,7,7,8,12> and <5,5,6,8,8,9,10,10>, your output should be <5,6,8>.

# Brute Force: For each element of one array, scan the other array for a match. Time Complexity: O(MN)
# Optimized: Both arrays are sorted, so walk them in tandem: advance the pointer at the smaller element; when the two elements are equal, add the value to the output (unless it repeats the previous one) and advance both pointers.

# Time Complexity: O(M+N)
def intersect_two_sorted_arrays(A, B):
    i, j, intersection_A_B = 0, 0, []
    while i < len(A) and j < len(B):
        #print(i,j)
        if A[i] < B[j]:
            i += 1
        elif A[i] == B[j]:
            if i == 0 or A[i] != A[i-1]:
                intersection_A_B.append(A[i])
            i, j = i + 1, j + 1
        else:
            j += 1
    return intersection_A_B

A = [2, 3, 3, 5, 5, 6, 7, 7, 8, 12]
B = [5, 5, 6, 8, 8, 9, 10, 10]
print(f'Intersection of sorted lists: {intersect_two_sorted_arrays(A, B)}')

Intersection of sorted lists: [5, 6, 8]
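
When one array is much shorter than the other, a binary search over the longer array can beat the tandem walk. The sketch below is one possible variant, not part of the original solution; the function name `intersect_with_binary_search` is made up here. It runs in O(m log n) time for a short array of length m and a long array of length n.

```python
import bisect

def intersect_with_binary_search(A, B):
    """Return the sorted, duplicate-free intersection of sorted lists A and B."""
    if len(A) > len(B):
        A, B = B, A  # iterate over the shorter list, search the longer one
    result = []
    for i, a in enumerate(A):
        if i > 0 and a == A[i - 1]:
            continue  # skip duplicates in the shorter list
        k = bisect.bisect_left(B, a)  # first index in B with B[k] >= a
        if k < len(B) and B[k] == a:
            result.append(a)
    return result

A = [2, 3, 3, 5, 5, 6, 7, 7, 8, 12]
B = [5, 5, 6, 8, 8, 9, 10, 10]
print(intersect_with_binary_search(A, B))  # [5, 6, 8]
```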


## 13.2 Merge two sorted arrays

In [3]:
# Write a program which takes as input two sorted arrays of integers, and updates the first to the
# combined entries of the two arrays in sorted order. Assume the first array has enough empty entries
# at its end to hold the result.

# Brute Force: If we fill from the beginning, we may have to shift the entries of the first array many times. Time Complexity: O(mn)
# Approach: Instead, fill from the end. The last element of the result goes at index m+n-1. Traverse both arrays from the back and write the larger element each time.
# Time Complexity: O(m+n) Space Complexity:O(1) => space used by pointers to track indices
def merge_two_sorted_arrays(A, m, B, n):
    a, b, write_idx = m - 1, n - 1, m + n - 1
    while a>=0 and b>=0:
        if B[b] > A[a]:
            A[write_idx] = B[b]
            b -= 1
        else:
            A[write_idx] = A[a]
            a -= 1
        write_idx -= 1
    while b >= 0: # B array still has some entries unprocessed
        A[write_idx] = B[b]
        write_idx -= 1
        b -= 1

A = [5, 13, 17, 0, 0, 0, 0, 0]
B = [3,7,11,19]
merge_two_sorted_arrays(A, 3, B, 4)
print(f'The merged sorted list is {A}')

The merged sorted list is [3, 5, 7, 11, 13, 17, 19, 0]


## 13.3 Remove first-name duplicates

In [6]:
# Design an efficient algorithm for removing all first-name duplicates from an array. For example,
# if the input is ((Ian,Botham),(David,Gower),(Ian,Bell),(Ian,Chappell)), one result could be
# ((Ian, Bell), (David, Gower)); ((David, Gower), (Ian, Botham)) would also be acceptable.

# Hash table approach: Record each first name in a hash table; if a first name is already present, remove that entry from the array. Time Complexity: O(n) Space Complexity: O(n)
# Another approach: Sort the array based on first names; if first_name[i] == first_name[i-1], remove entry i.
# Time Complexity: O(nlogn) for sorting + O(n) to remove duplicates Space Complexity: O(1) beyond the space used by the sort
class Name:
    def __init__(self, first_name, last_name):
        self.first_name, self.last_name = first_name, last_name
    
    def __eq__(self, other):
        return self.first_name == other.first_name
    
    def __lt__(self, other):
        return (self.first_name < other.first_name if self.first_name != other.first_name else self.last_name < other.last_name)

def print_array(A):
    for cand in A:
        print(f'({cand.first_name},{cand.last_name})', end = " ")
    print("\n")
    
def eliminate_duplicates(A):
    A.sort()
    print('Array after sorting:')
    print_array(A)
    write_idx = 1
    for cand in A[1:]:
        if cand != A[write_idx - 1]: # the eq function we wrote in Name class checks the first names of both elements and returns true or false accordingly
            A[write_idx] = cand
            write_idx += 1
    del A[write_idx:] # all distinct entries now occupy A[:write_idx], so the entries from write_idx to the end are no longer required

A = []
A.append(Name('Ian', 'Botham'))
A.append(Name('David', 'Gower'))
A.append(Name('Ian', 'Bell'))
A.append(Name('Ian', 'Chappell'))
print('Input Array:')
print_array(A)
eliminate_duplicates(A)
print('Array after removing duplicates:')
print_array(A)

Input Array:
(Ian,Botham) (David,Gower) (Ian,Bell) (Ian,Chappell) 

Array after sorting:
(David,Gower) (Ian,Bell) (Ian,Botham) (Ian,Chappell) 

Array after removing duplicates:
(David,Gower) (Ian,Bell) 
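
The hash table approach mentioned in the comments above can be sketched as follows. To keep the example self-contained it uses plain `(first, last)` tuples instead of the `Name` class, and the function name `eliminate_duplicates_hashed` is hypothetical. It keeps the first entry seen for each first name, which is one of the acceptable results.

```python
def eliminate_duplicates_hashed(names):
    """Keep the first entry seen for each first name; names are (first, last) tuples."""
    seen = {}
    for first, last in names:
        # setdefault stores the entry only if this first name is new
        seen.setdefault(first, (first, last))
    # dicts preserve insertion order (Python 3.7+), so input order is kept
    return list(seen.values())

names = [('Ian', 'Botham'), ('David', 'Gower'), ('Ian', 'Bell'), ('Ian', 'Chappell')]
print(eliminate_duplicates_hashed(names))
# [('Ian', 'Botham'), ('David', 'Gower')]
```

This trades the O(nlogn) sort for O(n) expected time, at the cost of O(n) extra space for the dictionary.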



## 13.4 Smallest nonconstructible value

In [12]:
# Write a program which takes an array of positive integers and returns the smallest number which is not equal to the sum of a subset of elements of the array.
# For example: if your coins are <1,1,1,1,1,5,10,25> then the smallest value of change which cannot be made is 21.

# Brute Force: Get all the subsets -> compute sums -> sort them and iterate through from 1 till max of sums to find the smallest values that cannot be computed.
# Time Complexity: Exponential O(2^n)

# Approach: Suppose a collection of elements can produce all values from 1 to V but not V+1. Adding a new element u to the collection gives two possibilities:
# - if u <= (V+1), then we can produce all values up to V+u but cannot produce V+u+1
# - if u > (V+1), then even after including u in the collection we cannot produce V+1, so break and return V+1
# Time Complexity: O(nlogn) to sort the input array + O(n) to iterate
def smallest_nonconstructible_value(A):
    max_constructible_value = 0
    for a in sorted(A):
        if a > max_constructible_value + 1:
            break
        max_constructible_value += a
    return max_constructible_value + 1

A = [2,3,4]
print(f'Smallest nonconstructible value for input array {A} is {smallest_nonconstructible_value(A)}')
A = [1,2]
print(f'Smallest nonconstructible value for input array {A} is {smallest_nonconstructible_value(A)}')
A = [1,3]
print(f'Smallest nonconstructible value for input array {A} is {smallest_nonconstructible_value(A)}')
A = [1,2,4]
print(f'Smallest nonconstructible value for input array {A} is {smallest_nonconstructible_value(A)}')
A = [1,2,5]
print(f'Smallest nonconstructible value for input array {A} is {smallest_nonconstructible_value(A)}')

Smallest nonconstructible value for input array [2, 3, 4] is 1
Smallest nonconstructible value for input array [1, 2] is 4
Smallest nonconstructible value for input array [1, 3] is 2
Smallest nonconstructible value for input array [1, 2, 4] is 8
Smallest nonconstructible value for input array [1, 2, 5] is 4
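
The brute-force idea described in the comments (enumerate every subset sum) is useful as a cross-check on small inputs. The following is a minimal sketch of it; the function name is made up for this example, and it is exponential in the array length, so it is only meant for testing.

```python
import itertools

def smallest_nonconstructible_brute_force(A):
    """Enumerate every non-empty subset sum, then scan upward from 1."""
    sums = {
        sum(combo)
        for r in range(1, len(A) + 1)
        for combo in itertools.combinations(A, r)
    }
    value = 1
    while value in sums:
        value += 1
    return value

print(smallest_nonconstructible_brute_force([1, 1, 1, 1, 1, 5, 10, 25]))  # 21
print(smallest_nonconstructible_brute_force([1, 2, 5]))                   # 4
```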


## 13.5 Render a calendar

In [16]:
# Write a program that takes a set of events, and determines the maximum number of events that take place concurrently.
# The x-axis denotes time, and each event has a start and end time. Ref Fig. 13.1

# Brute Force: There are two points for an interval:start and end=>For n events, we have 2n points. 
# For each point, find out the number of intervals it is contained in=>O(n). Max of this gives us the maximum number of concurrent events.
# Time Complexity:O(2n*n) = O(n^2)

# Optimized Approach: Sort the set of all points in ascending order(If two points have same time and if one is start and other is end, then start comes first).
# Then keep a counter, increment counter for every start point and decrement it for every end point => the max value attained by counter is the max number of concurrent events.
# Time Complexity:O(nlogn) for sorting + O(n) to find max value of counter Space Complexity:O(n) for point array
import collections
Event = collections.namedtuple('Event',('start','finish'))
Endpoint = collections.namedtuple('Endpoint',('time','is_start'))

def find_max_simultaneous_events(A):
    E = ([Endpoint(event.start, True) for event in A] + [Endpoint(event.finish, False) for event in A]) # creating endpoints array from input events array
    # sort the endpoint array by breaking ties by putting start time before end time
    E.sort(key = lambda e:(e.time, not e.is_start))
    
    max_num_simultaneous_events, num_simultaneous_events = 0, 0
    for e in E:
        if e.is_start:
            num_simultaneous_events += 1
            max_num_simultaneous_events = max(max_num_simultaneous_events, num_simultaneous_events)
        else:
            num_simultaneous_events -= 1 # this is required; otherwise finished events would still be counted as concurrent
    return max_num_simultaneous_events

A = []
A.append(Event(1,5))
A.append(Event(6,10))
A.append(Event(11,13))
A.append(Event(14,15))
A.append(Event(2,7))
A.append(Event(8,9))
A.append(Event(12,15))
A.append(Event(4,5))
A.append(Event(9,17))
print(f'Max num of simultaneous events = {find_max_simultaneous_events(A)}')

Max num of simultaneous events = 3


In [17]:
# Variant: Users 1, 2, ..., n share an Internet connection. User i uses b_i bandwidth from time s_i to f_i, inclusive. What is the peak bandwidth usage?
# Use the same method, with the counter incremented by b_i at each start point and decremented by b_i at each finish point. The max of the counter gives the peak bandwidth usage.
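
A minimal sketch of this variant follows. The `Usage` record, the function name `find_peak_bandwidth`, and the sample numbers are all assumptions made for illustration. Since the intervals are inclusive, start points are ordered before finish points at equal times, as in the calendar solution.

```python
import collections

# Hypothetical record type: one user's usage window and bandwidth.
Usage = collections.namedtuple('Usage', ('start', 'finish', 'bandwidth'))

def find_peak_bandwidth(usages):
    # Each point is (time, kind, delta); kind 0 = start, 1 = finish,
    # so at equal times the starts sort before the finishes.
    points = []
    for u in usages:
        points.append((u.start, 0, u.bandwidth))
        points.append((u.finish, 1, -u.bandwidth))
    points.sort()

    peak = current = 0
    for _, _, delta in points:
        current += delta
        peak = max(peak, current)
    return peak

usages = [Usage(1, 5, 10), Usage(2, 7, 20), Usage(5, 9, 5), Usage(8, 10, 15)]
print(find_peak_bandwidth(usages))  # 35
```

In the sample data the peak occurs at time 5, when the first three users overlap (10 + 20 + 5).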

## 13.6 Merging intervals

In [25]:
# Write a program which takes as input an array of disjoint closed intervals with integer endpoints,
# sorted by increasing order of left endpoint, and an interval to be added, and returns the union of
# the intervals in the array and the added interval. Your result should be expressed as a union of
# disjoint intervals sorted by left endpoint.
# For example: if the initial set of intervals is [-4,-1],[0,2],[3,6],[7,9],[11,12],[14,17], and the added interval is [1,8], the result is
# [-4,-1], [0,9], [11,12], [14,17].

# Brute Force: Find the smallest left endpoint and the largest right endpoint. Then test every integer between these two values for membership in an interval.
# Time Complexity: O(Dn) where D is the difference between the two extreme values and n is the number of intervals.

# Optimized Approach: When a new interval comes in, then three cases are possible:
# - all the intervals before this interval can be directly added to the result
# - find the union for the intervals with which the new interval intersects
# - all the intervals after this interval can be directly added to the result
# Time Complexity: O(N)
Interval = collections.namedtuple('Interval',('left', 'right'))
def add_interval(disjoint_intervals, new_interval):
    i, result = 0, []
    # case 1: skip the intervals that lie entirely before new_interval (they are copied to the result by the slice in the return statement)
    while((i < len(disjoint_intervals)) and (new_interval.left > disjoint_intervals[i].right)):
        i += 1
    interval_idx_before_new = i
    
    # case 2: Finding union of new interval with intersecting intervals
    while((i < len(disjoint_intervals)) and (new_interval.right >= disjoint_intervals[i].left)):
        # union of (a,b) and (c,d) = [min(a,c), max(b,d)]
        new_interval = Interval(min(new_interval.left, disjoint_intervals[i].left), max(new_interval.right, disjoint_intervals[i].right))
        i += 1
    
    return disjoint_intervals[:interval_idx_before_new] + [new_interval] + disjoint_intervals[i:]

def print_intervals(A):
    for cand in A:
        print(f'[{cand.left},{cand.right}]', end=" ")
    print("\n")

A = []
A.append(Interval(-4, -1))
A.append(Interval(0, 2))
A.append(Interval(3, 6))
A.append(Interval(7, 9))
A.append(Interval(11, 12))
A.append(Interval(14, 17))
new_interval = Interval(1,8)

result = add_interval(A, new_interval)
print('Disjoint Intervals Result:')
print_intervals(result)

Disjoint Intervals Result:
[-4,-1] [0,9] [11,12] [14,17] 



## 13.7 Compute the union of intervals

In [35]:
# Design an algorithm that takes as input a set of intervals, and outputs their union expressed as a set of disjoint intervals.
# The intervals can be open or closed at either end points. Ref Fig 13.2

# Brute Force: Consider every number from the min left endpoint to the max right endpoint. Time Complexity: O(DN)
# Another Approach: Pick an interval arbitrarily, find all the intervals it intersects with => compute their union and place it in the result.
# If the selected interval does not intersect with any other interval, then add it to the result directly. Time Complexity: O(N^2)

# Approach using Sort: Sort the intervals on their left endpoints => break the ties by first placing left closed intervals if two intervals have same left endpoint.
# As we iterate through the sorted array of intervals, we get the following cases:
# - If the curr interval intersects with the most recently added interval in the result, find union and update the result
# - If the curr interval does not intersect with the most recently added interval in the result, add curr interval to the result
# Time Complexity: O(nlogn)
Endpoint = collections.namedtuple('Endpoint', ('is_closed', 'val'))
class Interval:
    def __init__(self, left, right):
        self.left = left
        self.right = right
    
    def __lt__(self, other):
        if self.left.val != other.left.val:
            return self.left.val < other.left.val
        return self.left.is_closed and not other.left.is_closed # self < other if self.left.is_closed == True and other.left.is_closed == False

def union_of_intervals(intervals):
    if not intervals:
        return []
    
    intervals.sort() # sort intervals according to left endpoints of intervals
    result = [intervals[0]]
    for i in intervals[1:]: # intervals[0] already seeds the result
        if ((i.left.val < result[-1].right.val) or
                ((i.left.val == result[-1].right.val) and (i.left.is_closed or result[-1].right.is_closed))):
            if(i.right.val > result[-1].right.val or (i.right.val == result[-1].right.val and i.right.is_closed)):
                #print_intervals(result)
                result[-1].right = i.right
        else:
            result.append(i)
            #print(f'else:{print_intervals(result)}')
    return result

def print_intervals(intervals):
    for i in intervals:
        interval_str = '[' if i.left.is_closed else '('
        interval_str += str(i.left.val) + ',' + str(i.right.val)
        interval_str += ']' if i.right.is_closed else ')'
        print(interval_str, end = " ")
    print("\n")

A = []
A.append(Interval(Endpoint(False,0), Endpoint(True, 4)))
A.append(Interval(Endpoint(True,5), Endpoint(True, 11)))
A.append(Interval(Endpoint(True,12), Endpoint(False, 17)))
A.append(Interval(Endpoint(True,1), Endpoint(True, 1)))
A.append(Interval(Endpoint(True,3), Endpoint(False, 4)))
A.append(Interval(Endpoint(True,7), Endpoint(False, 8)))
A.append(Interval(Endpoint(False,12), Endpoint(True, 16)))
A.append(Interval(Endpoint(True,2), Endpoint(True, 4)))
A.append(Interval(Endpoint(True,8), Endpoint(False, 11)))
A.append(Interval(Endpoint(False,13), Endpoint(False, 15)))
A.append(Interval(Endpoint(False,16), Endpoint(False, 17)))
result = union_of_intervals(A)
print('Output:')
print_intervals(result)

Output:
(0,4] [5,11] [12,17) 



## 13.8 Partitioning and sorting an array with many repeated entries

In [None]:
# You are given an array of student objects. Each student has an integer-valued age field that is to be
# treated as a key. Rearrange the elements of the array so that students of equal age appear together.
# The order in which different ages appear is not important. How would your solution change if ages
# have to appear in sorted order?

# Brute Force: Sort the array based on age. Time Complexity: O(nlogn) for an array of length n.
# Another approach: Create a hash table with age as key and the list of students with that age as value. Time Complexity: O(n) Space Complexity: O(n)
# In-place approach (below): count the students of each age, derive the offset where each age's block starts, then swap every student into its block.
# Time Complexity: O(n) Space Complexity: O(m) for m distinct ages
Person = collections.namedtuple('Person', ('age', 'name'))
def group_by_age(people):
    age_to_count = collections.Counter([person.age for person in people])
    age_to_offset, offset = {}, 0
    for age, count in age_to_count.items():
        age_to_offset[age] = offset
        offset += count
    
    while age_to_offset:
        from_age = next(iter(age_to_offset))
        from_idx = age_to_offset[from_age]
        to_age = people[from_idx].age
        to_idx = age_to_offset[to_age]
        # move the student at from_idx to the next free slot of its own age block
        people[from_idx], people[to_idx] = people[to_idx], people[from_idx]
        # age_to_count tracks how many slots of each age block are still unfilled
        age_to_count[to_age] -= 1
        if age_to_count[to_age]:
            age_to_offset[to_age] = to_idx + 1
        else:
            del age_to_offset[to_age]
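
The hash table approach from the comments, and the follow-up question about sorted ages, can be sketched together. The function name `group_by_age_with_buckets`, the `sort_ages` flag, and the sample people are assumptions for illustration. Unlike the in-place routine, this version uses O(n) extra space and returns a new list.

```python
import collections

Person = collections.namedtuple('Person', ('age', 'name'))

def group_by_age_with_buckets(people, sort_ages=False):
    # Bucket people by age; each bucket keeps input order.
    buckets = collections.defaultdict(list)
    for person in people:
        buckets[person.age].append(person)
    # Sorting only the m distinct ages costs O(m log m), not O(n log n).
    ages = sorted(buckets) if sort_ages else list(buckets)
    return [person for age in ages for person in buckets[age]]

people = [Person(14, 'Greg'), Person(12, 'John'), Person(11, 'Andy'),
          Person(13, 'Jim'), Person(12, 'Phil'), Person(13, 'Bob')]
print([p.age for p in group_by_age_with_buckets(people)])
# [14, 12, 12, 11, 13, 13]
print([p.age for p in group_by_age_with_buckets(people, sort_ages=True)])
# [11, 12, 12, 13, 13, 14]
```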

## 13.9 Team Photo Day

In [None]:
# Design an algorithm that takes as input two teams and the heights of the players in the teams and
# checks if it is possible to place players to take the photo subject to the placement constraint.
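
The notes do not spell out the placement constraint. The sketch below assumes the usual one: each team occupies one row, and every player in the back row must be strictly taller than the player directly in front. Under that assumption it is enough to sort both teams and compare them position by position. The function names are hypothetical.

```python
def can_place_behind(front_heights, back_heights):
    """True if the back team can stand behind the front team, one-to-one."""
    if len(front_heights) != len(back_heights):
        return False
    # Pair the i-th shortest front player with the i-th shortest back player.
    return all(f < b for f, b in zip(sorted(front_heights), sorted(back_heights)))

def can_take_photo(team_a, team_b):
    # Either team may be the front row.
    return can_place_behind(team_a, team_b) or can_place_behind(team_b, team_a)

print(can_take_photo([1, 5, 4], [2, 3, 4]))  # False
print(can_take_photo([3, 1, 2], [4, 2, 3]))  # True
```

The running time is dominated by the two sorts, O(nlogn) for teams of n players.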

## 13.10 Implement a fast sorting algorithm for Linked List

In [None]:
# Implement a routine which sorts lists efficiently. It should be a stable sort, i.e., the relative positions of equal elements must remain unchanged.

# Brute Force: Repeatedly delete the smallest element of the list and append it to the end of a new list.
# Time Complexity: O(n^2) Space Complexity: O(n)
# We can reorder the nodes of the list instead of creating new ones; the insertion sort below does this in O(n^2) time and O(1) space.
class ListNode:
    def __init__(self, data = 0, next_node = None):
        self.data = data
        self.next_node = next_node
        
def insertion_sort(L):
    dummy_head = ListNode(0, L)
    # invariant: the nodes from dummy_head.next_node up to and including L are sorted
    while L and L.next_node:
        if L.data > L.next_node.data:
            target, pre = L.next_node, dummy_head
            # advance past every node <= target; using <= keeps equal elements in their original order (stable)
            while pre.next_node.data <= target.data:
                pre = pre.next_node
            temp, pre.next_node, L.next_node = pre.next_node, target, target.next_node
            target.next_node = temp
        else:
            L = L.next_node
    return dummy_head.next_node
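
For a faster stable sort on a linked list, merge sort is a natural fit, since merging two sorted lists only relinks nodes. The following is a sketch of that idea, not the notebook author's code. The names `merge_sorted_lists`, `stable_sort_list`, `from_values`, and `to_values` are made up here. It runs in O(nlogn) time and uses O(logn) stack space for the recursion.

```python
class ListNode:
    def __init__(self, data=0, next_node=None):
        self.data = data
        self.next_node = next_node

def merge_sorted_lists(L1, L2):
    dummy_head = tail = ListNode()
    while L1 and L2:
        if L1.data <= L2.data:  # <= takes from the left list on ties, keeping the sort stable
            tail.next_node, L1 = L1, L1.next_node
        else:
            tail.next_node, L2 = L2, L2.next_node
        tail = tail.next_node
    tail.next_node = L1 or L2  # append whichever list still has nodes
    return dummy_head.next_node

def stable_sort_list(L):
    if not L or not L.next_node:
        return L
    # Find the midpoint with slow/fast pointers, then cut the list in two.
    pre_slow, slow, fast = None, L, L
    while fast and fast.next_node:
        pre_slow = slow
        fast, slow = fast.next_node.next_node, slow.next_node
    pre_slow.next_node = None
    return merge_sorted_lists(stable_sort_list(L), stable_sort_list(slow))

def from_values(values):
    # Hypothetical helper: build a linked list from a Python list.
    dummy_head = tail = ListNode()
    for value in values:
        tail.next_node = ListNode(value)
        tail = tail.next_node
    return dummy_head.next_node

def to_values(L):
    # Hypothetical helper: collect node data into a Python list.
    values = []
    while L:
        values.append(L.data)
        L = L.next_node
    return values

print(to_values(stable_sort_list(from_values([5, 13, 3, 17, 3, 1]))))
# [1, 3, 3, 5, 13, 17]
```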

## 13.11 Compute a salary threshold

In [41]:
# Design an algorithm for computing the salary cap, given existing salaries and the target payroll.
# Every employee who earned more than the cap last year will be paid the cap this year; employees who earned no more than the cap will see no change in their salary.
# For example: salaries last year were $90, $30, $100, $40, and $20, and the target payroll this year is $210, then 60 is a suitable salary cap, since 60+30 +60+40+20 = 210.

# Brute Force: There are infinitely many candidate caps between 0 and the maximum current salary, so we cannot try them all.
# Analytic Approach: Sort the salary array, then compute the payroll that results from using each salary as the cap.
# => The payrolls for caps equal to the salaries in A = <20,30,40,90,100> are <100,140,170,270,280>. The target is 210, so the cap lies between 40 and 90.
# For a cap c between 40 and 90, the payroll is 20+30+40 + 2c = 210 => c = 60
# Time Complexity: O(nlogn), dominated by sort()
def find_salary_cap(target_payroll, current_salaries):
    current_salaries.sort()
    unadjusted_salary_sum = 0
    for i, current_salary in enumerate(current_salaries):
        cap = current_salary
        adjusted_people = len(current_salaries) - i # number of people whose salary would be set to the cap, which is the current salary in this iteration
        adjusted_people_salary = adjusted_people * cap
        if unadjusted_salary_sum + adjusted_people_salary >= target_payroll:
            return (target_payroll - unadjusted_salary_sum)/(adjusted_people)
        unadjusted_salary_sum += cap
    return -1 # No solution if target payroll > existing payroll

salary = [90, 30, 100, 40, 20]
target_payroll = 210
print(f'Cap is {find_salary_cap(target_payroll , salary)}')

Cap is 60.0
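
A small helper makes it easy to check both the payroll table in the comments and the final answer. The function name `payroll_under_cap` is hypothetical. It simply applies a given cap to every salary and sums the result.

```python
def payroll_under_cap(salaries, cap):
    """Total payroll if every salary above `cap` is reduced to `cap`."""
    return sum(min(salary, cap) for salary in salaries)

salaries = [90, 30, 100, 40, 20]
print([payroll_under_cap(salaries, cap) for cap in sorted(salaries)])
# [100, 140, 170, 270, 280]
print(payroll_under_cap(salaries, 60))
# 210
```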
