## Sorting algorithms

- Generate a list of random integers (the code below uses 5000).
- Implement the sorting algorithms: bubble sort, insertion sort, selection sort, mergesort, heapsort, and quicksort.
- Implement a function to measure the elapsed time for each sorting function.
- Execute the sorting functions on the generated list and display the resulting times.

ChatGPT Prompt: Write a python script for presenting different sorting algorithms. First, generate a list of 1000 random integers. Then, write functions for bubble sort,  insertion sort, selection sort, mergesort, heapsort, and quicksort.  Include a function to measure elapsed time for each sorting function and display the resulting times.

* In addition: shell sort.

In [1]:
import random

# Generate a list of 5000 random integers between 1 and 10000
random_integers = [random.randint(1, 10000) for _ in range(5000)]
random_integers[:10]  # Display the first 10 integers for reference

[4966, 9445, 1145, 8610, 2749, 4084, 1396, 6408, 3999, 7597]

In [2]:
def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        swapped = False
        for j in range(0, n-i-1):
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]
                swapped = True
        if not swapped:
            break
    return arr

In [3]:
def insertion_sort(arr):
    for i in range(1, len(arr)):
        key = arr[i]
        j = i-1
        while j >=0 and key < arr[j]:
            arr[j+1] = arr[j]
            j -= 1
        arr[j+1] = key
    return arr

In [4]:
def selection_sort(arr):
    for i in range(len(arr)):
        min_idx = i
        for j in range(i+1, len(arr)):
            if arr[j] < arr[min_idx]:
                min_idx = j
        arr[i], arr[min_idx] = arr[min_idx], arr[i]
    return arr

In [5]:
def merge_sort(arr):
    if len(arr) <= 1:
        return arr

    mid = len(arr) // 2
    left = arr[:mid]
    right = arr[mid:]

    left = merge_sort(left)
    right = merge_sort(right)

    return merge(left, right)

def merge(left, right):
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= takes the left element on ties, which keeps the sort stable
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result

In [6]:
def heapify(arr, n, i):
    largest = i
    l = 2 * i + 1
    r = 2 * i + 2

    if l < n and arr[l] > arr[largest]:
        largest = l

    if r < n and arr[r] > arr[largest]:
        largest = r

    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, n, largest)

def heap_sort(arr):
    n = len(arr)

    # Build a max-heap
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)

    # Extract elements one by one
    for i in range(n-1, 0, -1):
        arr[i], arr[0] = arr[0], arr[i]
        heapify(arr, i, 0)
    
    return arr

In [7]:
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)

## Shell Sort

Shell sort, named after its inventor Donald Shell, is essentially an extension of insertion sort. It improves on insertion sort by comparing elements separated by a gap of several positions, which lets elements move toward their correct position in the array more rapidly. As the algorithm progresses, the gap is reduced until it becomes 1, at which point the algorithm is an ordinary insertion sort. By then, however, most of the elements are already in the right (or nearly right) positions, which makes the insertion sort phase efficient.

Basic Functionality:

Gap Sequence: Start with a gap, typically half the size of the list. The implementation below halves the gap at each step until it reaches 1. Other sequences can be used as well, such as Knuth's sequence (1, 4, 13, 40, ...).

Gapped Insertion Sort: For each gap, perform a gapped insertion sort. This works like insertion sort, except that elements are compared and shifted `gap` positions apart instead of 1 position apart.

Reduce Gap: After sorting the entire array for a particular gap, reduce the gap and repeat the process.

Final Phase: The final phase (gap=1) is a regular insertion sort, but by this time, the array is "almost" sorted, making the procedure efficient.

Key Points:

The performance of shell sort depends on the gap sequence chosen. The original sequence proposed by Shell (which halves the gap size at each step) may not be the most efficient for many cases.

Shell sort is an in-place sorting algorithm.

It's not stable by default because of the long jumps (or gaps) that can reorder equal elements.

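Its time complexity depends on the gap sequence. With Shell's original halving sequence the worst case is still O(n^2), while better sequences lower it (Knuth's sequence gives O(n^(3/2)), for example). In practice it is usually faster than the simple O(n^2) algorithms but slower than the O(n log n) algorithms on large inputs.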

Shell sort is an intermediate sorting algorithm that offers a good trade-off between coding complexity and performance, especially for smaller datasets or when memory usage is a concern.
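
To make the dependence on the gap sequence concrete, here is a minimal sketch that generates Shell's halving sequence and Knuth's sequence and feeds either one into a gapped insertion sort. The function names `shell_gaps`, `knuth_gaps`, and `shell_sort_with_gaps` are hypothetical, and the cut-off used to pick the largest Knuth gap (`h < n // 3`) is one common convention rather than the only choice.

```python
import random

def shell_gaps(n):
    """Shell's original sequence: n//2, n//4, ..., 1."""
    gaps = []
    gap = n // 2
    while gap > 0:
        gaps.append(gap)
        gap //= 2
    return gaps

def knuth_gaps(n):
    """Knuth's sequence 1, 4, 13, 40, ... in descending order.

    Assumption: grow h while h < n // 3 (a common cut-off convention).
    """
    h = 1
    while h < n // 3:
        h = 3 * h + 1
    gaps = []
    while h > 0:
        gaps.append(h)
        h //= 3
    return gaps

def shell_sort_with_gaps(arr, gaps):
    """Gapped insertion sort for each gap in the given (descending) sequence."""
    n = len(arr)
    for gap in gaps:
        for i in range(gap, n):
            temp = arr[i]
            j = i
            # Shift larger elements that are `gap` positions apart
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = temp
    return arr

print(shell_gaps(100))  # [50, 25, 12, 6, 3, 1]
print(knuth_gaps(100))  # [40, 13, 4, 1]

sample = [random.randint(1, 10000) for _ in range(1000)]
print(shell_sort_with_gaps(sample.copy(), knuth_gaps(len(sample))) == sorted(sample))
```

Both sequences end with a gap of 1, so the final pass is always a plain insertion sort and the result is sorted regardless of which sequence is used; only the amount of work differs.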

In [8]:
def shell_sort(arr):
    n = len(arr)
    gap = n // 2
    
    # Start with a big gap, then reduce the gap
    while gap > 0:
        for i in range(gap, n):
            temp = arr[i]
            j = i
            # Shift elements by gap
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = temp
        gap //= 2

    return arr
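
The earlier point that shell sort is not stable can be checked on a small example. The sketch below is a keyed variant of the halving-gap implementation (the name `shell_sort_by_key` and the sample records are made up for illustration) and compares its output with Python's built-in `sorted`, which is guaranteed to be stable.

```python
def shell_sort_by_key(records, key):
    """Shell sort (halving gaps) that compares records by key(record)."""
    arr = list(records)  # work on a copy
    n = len(arr)
    gap = n // 2
    while gap > 0:
        for i in range(gap, n):
            temp = arr[i]
            j = i
            while j >= gap and key(arr[j - gap]) > key(temp):
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = temp
        gap //= 2
    return arr

# Hypothetical records: (sort key, label). 'a' comes before 'b' in the input.
records = [(1, "a"), (1, "b"), (0, "c"), (2, "d")]

by_shell = shell_sort_by_key(records, key=lambda r: r[0])
by_builtin = sorted(records, key=lambda r: r[0])

print(by_shell)    # [(0, 'c'), (1, 'b'), (1, 'a'), (2, 'd')]
print(by_builtin)  # [(0, 'c'), (1, 'a'), (1, 'b'), (2, 'd')]
```

During the gap-2 pass, `(0, 'c')` trades places with `(1, 'a')`, which carries `(1, 'a')` past `(1, 'b')`. The final gap-1 pass never moves equal keys past each other, so the reversed order remains.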

In [9]:
import time

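def measure_time(sort_function, data):
    data_copy = data.copy()  # Copy outside the timed region so the original list isn't sorted in-place
    start_time = time.perf_counter()  # High-resolution clock intended for measuring intervals
    sort_function(data_copy)
    end_time = time.perf_counter()
    return end_time - start_time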
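
A single timing run can be noisy, especially for the fast algorithms that finish in a few milliseconds. As a possible refinement, the sketch below repeats the measurement and keeps the best run. The name `measure_time_best_of` and the default of 5 repeats are assumptions for illustration, not part of the original notebook.

```python
import random
import time

def measure_time_best_of(sort_function, data, repeats=5):
    """Return the shortest elapsed time over `repeats` runs of sort_function."""
    best = float("inf")
    for _ in range(repeats):
        work = list(data)  # fresh unsorted copy for every run, made outside the timed region
        start = time.perf_counter()
        sort_function(work)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best

# Usage with the built-in in-place sort as a stand-in for any of the sorting functions
rng = random.Random(0)
data = [rng.randint(1, 10000) for _ in range(5000)]
print(measure_time_best_of(list.sort, data))
```

Taking the minimum rather than the mean is a common choice for micro-benchmarks, since background activity on the machine can only make a run slower, never faster.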

## Execute the sorting functions on the generated list and display the resulting times

In [10]:
# Create a dictionary of sorting algorithms to execute and measure
sorting_algorithms = {
    "Bubble Sort": bubble_sort,
    "Insertion Sort": insertion_sort,
    "Selection Sort": selection_sort,
    "Merge Sort": merge_sort,
    "Heap Sort": heap_sort,
    "Quick Sort": quick_sort,
    "Shell Sort": shell_sort
}

# Measure the time taken for each sorting algorithm
time_results = {}
for name, function in sorting_algorithms.items():
    elapsed_time = measure_time(function, random_integers)
    time_results[name] = elapsed_time

time_results

{'Bubble Sort': 1.0618140697479248,
 'Insertion Sort': 0.467984676361084,
 'Selection Sort': 0.38185548782348633,
 'Merge Sort': 0.008005857467651367,
 'Heap Sort': 0.009836196899414062,
 'Quick Sort': 0.009948015213012695,
 'Shell Sort': 0.010050296783447266}

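From the results, we can observe that the O(n log n) algorithms (Merge Sort, Heap Sort, and Quick Sort) and Shell Sort are roughly 40 to 130 times faster than the quadratic algorithms (Bubble, Insertion, and Selection Sort) for this dataset size. Note that only Merge Sort and Quick Sort are divide-and-conquer algorithms; Heap Sort relies on a heap data structure instead.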

Keep in mind that the actual times can vary based on the specific data, machine specifications, and other factors. However, the relative ordering of the algorithms is consistent with their known time complexities:

Bubble, Insertion, and Selection Sort have a time complexity of O(n^2) in the worst case.

Merge Sort and Heap Sort have a time complexity of O(n log n) in the worst case.

Quick Sort has an average time complexity of O(n log n), but a worst-case time complexity of O(n^2) (though with proper optimizations and on average data, it tends to perform well).
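
The gap between Quick Sort's average and worst case can be illustrated by counting comparisons. The sketch below deliberately uses the first element as the pivot (unlike the middle-element pivot in the notebook's `quick_sort`), so that already-sorted input triggers the quadratic case. The function name and the counter argument are hypothetical.

```python
import random

def quick_sort_first_pivot(arr, counter):
    """Quick sort with the first element as pivot; counter[0] tallies pivot comparisons."""
    if len(arr) <= 1:
        return arr
    pivot, rest = arr[0], arr[1:]
    counter[0] += len(rest)  # each remaining element is compared with the pivot once
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return (quick_sort_first_pivot(left, counter)
            + [pivot]
            + quick_sort_first_pivot(right, counter))

n = 200  # kept small so the recursion depth stays well below Python's default limit

sorted_input = list(range(n))
shuffled_input = sorted_input.copy()
random.Random(0).shuffle(shuffled_input)

sorted_count = [0]
quick_sort_first_pivot(sorted_input, sorted_count)

shuffled_count = [0]
quick_sort_first_pivot(shuffled_input, shuffled_count)

print(sorted_count[0])    # 19900, i.e. n * (n - 1) / 2
print(shuffled_count[0])  # far fewer comparisons on shuffled input
```

On sorted input every partition is maximally unbalanced (nothing goes left, everything goes right), so the comparison count is 199 + 198 + ... + 1. Choosing the middle element or a random pivot avoids this particular input, although other adversarial inputs still exist.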

Sorting algorithms that preserve the relative order of equal elements are called stable sorts. Stability in sorting is important when you have a secondary criterion for sorting.

For example, imagine you have a list of books sorted by author, and you want to sort it by title. Using a stable sort ensures that books with the same title will still be sorted by author within that title. An unstable sort may mix up the order of authors for books with the same title.
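
The book scenario can be reproduced with Python's built-in `sorted`, which is guaranteed to be stable. The titles and author names below are made up for illustration.

```python
# Hypothetical (title, author) records
books = [
    ("Databases", "Young"),
    ("Algorithms", "Zhang"),
    ("Databases", "Adams"),
    ("Algorithms", "Baker"),
]

# First pass: sort by author
by_author = sorted(books, key=lambda book: book[1])

# Second pass: sort by title; stability keeps authors ordered within each title
by_title = sorted(by_author, key=lambda book: book[0])

for title, author in by_title:
    print(title, author)
# Algorithms Baker
# Algorithms Zhang
# Databases Adams
# Databases Young
```

With an unstable algorithm in the second pass, the two "Algorithms" entries (or the two "Databases" entries) could come out in either order.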

From the sorting algorithms we discussed:

## Stable sorts:

Insertion Sort: It's inherently stable. When it encounters equal elements, it doesn't swap them, so their relative order is preserved.

Bubble Sort: It's stable because when two elements are equal, they are not swapped.

Merge Sort: It's stable, provided that when merging two lists, if two elements are equal, the element from the left list is always picked before the one from the right list.
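
The tie-breaking condition for Merge Sort comes down to a single comparison operator. Here is a minimal sketch of a keyed merge sort that takes the left element on ties; the names `merge_by_key` and `merge_sort_by_key` and the sample records are hypothetical.

```python
def merge_by_key(left, right, key):
    """Merge two sorted lists; on equal keys the left element goes first."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if key(left[i]) <= key(right[j]):  # <= (not <) is what preserves stability
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result

def merge_sort_by_key(items, key):
    """Stable merge sort that compares items by key(item)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort_by_key(items[:mid], key)
    right = merge_sort_by_key(items[mid:], key)
    return merge_by_key(left, right, key)

# Hypothetical records: (sort key, label)
records = [(3, "a"), (1, "b"), (3, "c"), (1, "d"), (2, "e")]
print(merge_sort_by_key(records, key=lambda r: r[0]))
# [(1, 'b'), (1, 'd'), (2, 'e'), (3, 'a'), (3, 'c')]
```

Replacing `<=` with `<` would still produce a correctly sorted list, but equal keys would be taken from the right list first, so their original order would no longer be guaranteed.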

## Unstable sorts:

Selection Sort: By default, it's unstable. This is because it scans the entire unsorted segment and swaps the minimum (or maximum) element with the first unsorted element, potentially changing the relative order of equal elements.

Heap Sort: It's inherently unstable. The process of building and reconstructing the heap can change the relative order of equal elements.

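Quick Sort: In its usual in-place form, it's unstable. The relative order can change based on the pivot selection and the partitioning process. (The list-comprehension version above happens to preserve the order of equal elements, because each of the three partitions keeps the input order, but it pays for that with extra memory.)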

However, note that some of these unstable sorts can be made stable with modifications, but doing so might add complexity or reduce efficiency. For instance, a stable version of selection sort can be implemented by pushing elements down rather than swapping them, but this usually isn't done because selection sort isn't typically chosen for scenarios where stability is crucial.
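
The difference between swapping and pushing elements down can be seen side by side. The sketch below is one possible way to write both variants; the function names and the sample records are hypothetical.

```python
def selection_sort_swap(records, key):
    """Classic selection sort: swaps the minimum into place (unstable)."""
    a = list(records)
    for i in range(len(a)):
        min_idx = i
        for j in range(i + 1, len(a)):
            if key(a[j]) < key(a[min_idx]):
                min_idx = j
        a[i], a[min_idx] = a[min_idx], a[i]
    return a

def selection_sort_stable(records, key):
    """Stable variant: removes the minimum and inserts it at position i,
    which pushes the elements in between down by one instead of swapping."""
    a = list(records)
    for i in range(len(a)):
        min_idx = i
        for j in range(i + 1, len(a)):
            if key(a[j]) < key(a[min_idx]):  # strict < picks the first minimum
                min_idx = j
        a.insert(i, a.pop(min_idx))
    return a

# Hypothetical records: (sort key, label). 'a' comes before 'b' in the input.
records = [(2, "a"), (2, "b"), (1, "c")]

print(selection_sort_swap(records, key=lambda r: r[0]))
# [(1, 'c'), (2, 'b'), (2, 'a')]
print(selection_sort_stable(records, key=lambda r: r[0]))
# [(1, 'c'), (2, 'a'), (2, 'b')]
```

In the swapping version, `(2, 'a')` is thrown to the end of the list when it trades places with `(1, 'c')`, landing behind `(2, 'b')`. The stable version costs extra element moves for each `insert`/`pop`, which is part of why it is rarely used in practice.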

https://www.youtube.com/watch?v=ZZuD6iUe3Pc

https://www.youtube.com/watch?v=kPRA0W1kECg

https://www.youtube.com/watch?v=AAwYzYkjNTg

## Homework

Include:

* radix sort
* cocktail shaker sort