## Tier 1. Module 3: Basic Algorithms and Data Structures

## Topic 8 - Heaps

## Homework

### Task 1

There are several network cables of different lengths. They must be joined two at a time with connectors, in the order that yields the lowest total cost. The cost of joining two cables equals the sum of their lengths, and the total cost is the sum of the costs of all joins.

The task is to find the joining order that minimizes the total cost.

In [2]:
import heapq


def connect_cables(cables: list[int]) -> tuple[int, int]:
    """Join the two shortest cables repeatedly until a single cable remains.

    Returns (final cable length, total cost); an empty input gives (0, 0).
    """
    if not cables:
        return 0, 0

    heap = list(cables)  # work on a copy so the caller's list is not modified
    heapq.heapify(heap)  # build a min-heap from the cable lengths
    total_cost = 0

    while len(heap) > 1:
        # pop the two shortest cables from the heap
        shortest_cable_1 = heapq.heappop(heap)
        shortest_cable_2 = heapq.heappop(heap)

        connected_length = shortest_cable_1 + shortest_cable_2  # join them
        total_cost += connected_length

        heapq.heappush(heap, connected_length)  # put the joined cable back on the heap

    final_cable = heap[0]  # the single cable left on the heap

    return final_cable, total_cost


# testing
cable_lengths = [4, 2, 6, 8, 3]
final_cable, total_cost = connect_cables(cable_lengths)
print("Final joined cable length:", final_cable)
print("Total cost:", total_cost)

Final joined cable length: 23
Total cost: 51
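
The task asks for the joining order, while the output above reports only the final length and the cost. A minimal sketch of how the order itself could be recorded is given below; `cable_join_order` is a hypothetical helper name, and the greedy rule (always join the two shortest cables) is the same one used above.

```python
import heapq


def cable_join_order(lengths):
    """Return (steps, total_cost); each step is (first, second, joined)."""
    heap = list(lengths)  # copy, so the caller's list stays untouched
    heapq.heapify(heap)

    steps, total_cost = [], 0
    while len(heap) > 1:
        a = heapq.heappop(heap)  # shortest cable
        b = heapq.heappop(heap)  # second shortest cable
        joined = a + b
        total_cost += joined
        steps.append((a, b, joined))
        heapq.heappush(heap, joined)
    return steps, total_cost


steps, total = cable_join_order([4, 2, 6, 8, 3])
for a, b, joined in steps:
    print(f"{a} + {b} = {joined}")
print("Total cost:", total)
```

For the sample input the recorded joins are 2 + 3, 4 + 5, 6 + 8 and 9 + 14, whose costs 5 + 9 + 14 + 23 add up to the total of 51 printed above.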


### Task 2

You are given `k` sorted lists of integers. Merge them into one sorted list, using a min-heap so that the merge is efficient. Implement a `merge_k_lists` function that takes a list of sorted lists as input and returns a single sorted list.

#### 2.1 - Step-by-step implementation

In [35]:
import heapq


def merge_k_lists(lists: list[list[int]]) -> list[int]:
    heap = []

    # seed the heap with the first element of each non-empty list,
    # together with the list's index and the element's position in that list
    for i, lst in enumerate(lists):
        if lst:
            heapq.heappush(heap, (lst[0], i, 0))  # (value, list index, index in list)

    merged_list = []

    while heap:
        # pop the smallest element from the heap
        value, list_index, index_in_list = heapq.heappop(heap)

        merged_list.append(value)  # add the popped value to the merged list

        # move to the next element in the corresponding list
        if index_in_list + 1 < len(lists[list_index]):
            next_value = lists[list_index][index_in_list + 1]
            heapq.heappush(heap, (next_value, list_index, index_in_list + 1))

    return merged_list


# testing
lists = [[1, 4, 5], [1, 3, 4], [2, 6]]
merged_list = merge_k_lists(lists)
print("Sorted list:", merged_list)

Sorted list: [1, 1, 2, 3, 4, 4, 5, 6]
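
For comparison, the standard library offers a heap-based k-way merge of sorted inputs as `heapq.merge`, which returns a lazy iterator rather than a list. A minimal sketch on the same sample input (the variable names here are only illustrative):

```python
import heapq

sample_lists = [[1, 4, 5], [1, 3, 4], [2, 6]]

# heapq.merge takes the sorted inputs as separate arguments and yields
# their elements in sorted order, one at a time
merged_with_stdlib = list(heapq.merge(*sample_lists))
print("Sorted list:", merged_with_stdlib)
```

It can serve as a reference result when checking a hand-written implementation.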


#### 2.2 - Straightforward implementation

In [36]:
import heapq


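def merge_k_lists_alt(lists: list[list[int]]) -> list[int]:
    # start from a copy of the first list (a sorted list is already a valid
    # min-heap), so the caller's data is not modified
    heap = list(lists[0]) if lists else []

    # push every element of the remaining lists onto the heap
    for lst in lists[1:]:
        for element in lst:
            heapq.heappush(heap, element)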

    merged_list = []
    
    while heap:
        merged_list.append(heapq.heappop(heap))

    return merged_list


# testing
lists = [[1, 4, 5], [1, 3, 4], [2, 6]]
merged_list = merge_k_lists_alt(lists)
print("Sorted list:", merged_list)

Sorted list: [1, 1, 2, 3, 4, 4, 5, 6]
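
Instead of pushing elements one by one, all elements could be collected into a new list and turned into a heap with a single `heapq.heapify` call, which builds the heap in linear time. The sketch below assumes that variant; `merge_by_heapify` is a hypothetical name, and popping all N elements still costs O(N log N).

```python
import heapq
from itertools import chain


def merge_by_heapify(lists):
    # flatten into a new list, so the input lists are left unchanged
    heap = list(chain.from_iterable(lists))
    heapq.heapify(heap)  # linear-time heap construction

    # pop the smallest remaining element until the heap is empty
    return [heapq.heappop(heap) for _ in range(len(heap))]


heapify_input = [[1, 4, 5], [1, 3, 4], [2, 6]]
heapify_result = merge_by_heapify(heapify_input)
print("Sorted list:", heapify_result)
```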


#### 2.3 - Merge sort algorithm

In [37]:
def merge(left: list, right: list) -> list:
    """
    Merge two sorted lists into one sorted list

    :param left: first sorted list
    :param right: second sorted list
    :return: merged and sorted list
    """
    merged = []
    left_idx, right_idx = 0, 0

    while left_idx < len(left) and right_idx < len(right):
        # "<=" takes equal elements from the left list first, keeping the merge stable
        if left[left_idx] <= right[right_idx]:
            merged.append(left[left_idx])
            left_idx += 1
        else:
            merged.append(right[right_idx])
            right_idx += 1

    merged.extend(left[left_idx:])
    merged.extend(right[right_idx:])

    return merged


def merge_k_lists_msa(lists: list[list[int]]) -> list[int]:
    """
    Merge sorted lists pairwise, round by round, until a single list remains

    :param lists: a list of sorted lists (or tuples) to be merged
    using the `merge` function
    :return: merged and sorted list
    """
    if not lists:
        return []

    while len(lists) > 1:
        merged = []
        for i in range(0, len(lists), 2):
            if i + 1 < len(lists):
                merged.append(merge(lists[i], lists[i + 1]))
            else:
                merged.append(lists[i])
        lists = merged

    return lists[0]


# testing
lists = [[1, 4, 5], [1, 3, 4], [2, 6]]
merged_list = merge_k_lists_msa(lists)
print("Sorted list:", merged_list)

Sorted list: [1, 1, 2, 3, 4, 4, 5, 6]


#### 2.4 - Testing conditions

In [29]:
import timeit
import random


def generate_10_sorted_lists(size: int) -> list:
    """
    Function to generate 10 random sorted lists

    :param size: size of the lists to be generated
    :return: list of sorted lists
    """
    result = []
    for _ in range(10):
        random_list = [random.randint(0, 1000000) for _ in range(size)]
        result.append(sorted(random_list))
    return result


def test_algorithms(algorithms: dict, data_sizes: list) -> dict:
    """
    Function for timing merge algorithms on randomly generated sorted lists

    :param algorithms: a dictionary, where the keys are algorithm names, and
    the values are functions that merge a list of sorted lists
    :param data_sizes: a list of integer values representing the length of
    each random sorted list to be generated for the test
    :return: a dictionary with algorithm names as keys and, as values,
    dictionaries mapping each data size to the total time of 10 runs
    """
    results = {}
    for algo_name, algo_func in algorithms.items():
        results[algo_name] = {}
        for size in data_sizes:
            data = generate_10_sorted_lists(size)
            time_taken = timeit.timeit(lambda: algo_func(data.copy()), number=10)
            results[algo_name][size] = time_taken
    return results
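
Note that `data.copy()` is a shallow copy: the outer list is new, but the inner lists are shared, so a merge function that modifies an inner list would see changed input on later repetitions. One possible way to guard against this is sketched below; `time_merge`, `repeats` and `flatten_and_sort` are hypothetical names introduced only for this illustration.

```python
import copy
import timeit


def time_merge(merge_func, data, repeats=5):
    """Return the best single-run time of merge_func(data), in seconds."""
    timings = []
    for _ in range(repeats):
        fresh = copy.deepcopy(data)  # deep copy made outside the timed call
        timings.append(timeit.timeit(lambda: merge_func(fresh), number=1))
    return min(timings)


def flatten_and_sort(lists):
    # stand-in merge function so that the sketch runs on its own
    return sorted(x for lst in lists for x in lst)


sample = [[1, 4, 5], [1, 3, 4], [2, 6]]
best = time_merge(flatten_and_sort, sample)
print(f"Best of 5 runs: {best:.6f} seconds")
```

Taking the minimum of several runs is a common choice because it is the measurement least disturbed by other activity on the machine.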

#### 2.5 - Testing

In [34]:
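algorithms = {
    "Step-by-Step Heap": merge_k_lists,
    "Straightforward Heap": merge_k_lists_alt,
    "Merge Sort": merge_k_lists_msa,
    # caution: sorted() on a list of lists only orders the inner lists,
    # it does not flatten and merge their elements
    "Timsort (Python built-in)": sorted,
}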
data_sizes = [1000, 10000, 100000]

results = test_algorithms(algorithms, data_sizes)
for algo, timings in results.items():
    print(f"Algorithm: {algo}")
    for size, time_taken in timings.items():
        print(f"Data size: {size:<7} Time taken: {time_taken:.6f} seconds")
    print()

Algorithm: Step-by-Step Heap
Data size: 1000    Time taken: 0.051670 seconds
Data size: 10000   Time taken: 0.483353 seconds
Data size: 100000  Time taken: 5.695681 seconds

Algorithm: Straightforward Heap
Data size: 1000    Time taken: 0.024670 seconds
Data size: 10000   Time taken: 0.392984 seconds
Data size: 100000  Time taken: 8.390274 seconds

Algorithm: Merge Sort
Data size: 1000    Time taken: 0.058170 seconds
Data size: 10000   Time taken: 0.657850 seconds
Data size: 100000  Time taken: 10.604611 seconds

Algorithm: Timsort (Python built-in)
Data size: 1000    Time taken: 0.000015 seconds
Data size: 10000   Time taken: 0.000011 seconds
Data size: 100000  Time taken: 0.000016 seconds



#### Conclusion:

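The timings for Python's built-in `sorted` (Timsort) are not comparable with the others. In this benchmark `sorted` receives the list of ten lists and only orders those ten inner lists lexicographically; it never flattens or merges their elements. This is why its time stays between 0.00001 and 0.00002 seconds for 10 calls regardless of the list size. A fair Timsort baseline would have to flatten the lists before sorting them.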
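
To illustrate what `sorted` returns when it is handed a list of lists, and what a flattening baseline might look like, here is a minimal sketch; `merge_with_sorted` is a hypothetical name and was not part of the benchmark above.

```python
from itertools import chain

nested = [[1, 4, 5], [1, 3, 4], [2, 6]]

# sorted() compares the inner lists with each other; nothing is merged
ordered_lists = sorted(nested)
print(ordered_lists)  # [[1, 3, 4], [1, 4, 5], [2, 6]]


def merge_with_sorted(lists):
    # flatten first, then let Timsort order the individual elements
    return sorted(chain.from_iterable(lists))


flat_sorted = merge_with_sorted(nested)
print(flat_sorted)  # [1, 1, 2, 3, 4, 4, 5, 6]
```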

Pushing all elements into one heap and then popping the smallest element one at a time (the straightforward implementation) is the faster of the two heap approaches on small inputs: 0.025 seconds versus 0.052 seconds at size 1000. As the lists grow, the step-by-step implementation, which keeps only one element per list on the heap, overtakes it: 5.70 seconds versus 8.39 seconds at size 100000.

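The step-by-step implementation runs in O(N log k) time, where N is the total number of elements and k is the number of lists, because its heap never holds more than k entries. With k fixed at 10, its running time grows close to linearly with N. The straightforward implementation runs in O(N log N) time, because its heap holds up to N elements. On small arrays the step-by-step implementation still loses, most likely because every heap entry is a tuple that also carries the list index and the position of the element. The pairwise merge sort approach also does O(N log k) work, but it was the slowest of the three merge implementations at every size measured here.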
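
The difference in heap size between the two heap approaches can be made visible with a small sketch. The function names below are hypothetical, and the first function repeats the step-by-step logic only in order to track the largest number of entries the heap ever holds.

```python
import heapq


def peak_heap_size_stepwise(lists):
    """Largest heap size reached when merging one element per list at a time."""
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    peak = len(heap)
    while heap:
        _, list_index, pos = heapq.heappop(heap)
        if pos + 1 < len(lists[list_index]):
            next_value = lists[list_index][pos + 1]
            heapq.heappush(heap, (next_value, list_index, pos + 1))
        peak = max(peak, len(heap))
    return peak


def peak_heap_size_all_in_one(lists):
    """Heap size when every element is placed on the heap before popping."""
    return sum(len(lst) for lst in lists)


demo = [[1, 4, 5], [1, 3, 4], [2, 6]]
print(peak_heap_size_stepwise(demo))    # 3, the number of lists
print(peak_heap_size_all_in_one(demo))  # 8, the total number of elements
```

Each push or pop costs time logarithmic in the heap size, so a heap bounded by the number of lists keeps every operation cheap no matter how long the lists are.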