# ShellSort

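ShellSort sorts elements that are a fixed gap apart. For a gap $h$, the elements at positions $i, i+h, i+2h, \dots$ form a subarray, and each of these subarrays is sorted with InsertionSort. In each subsequent iteration the gap shrinks, until the final pass uses a gap of 1, which is an ordinary InsertionSort. Because the earlier passes move elements long distances at once, that final pass has relatively little work left to do.

When the number of gaps grows logarithmically with $n$, the best case (an already sorted input) is $O(n \log n)$, since every pass makes only one comparison per element. The worst case depends on the gap sequence; for several common sequences (e.g. Knuth's $1, 4, 13, 40, \dots$) it is $O(n^{3/2})$. ShellSort is not stable: two equal elements can fall into different gap groups and lose their relative order. It sorts in place, so its auxiliary memory is $O(1)$.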
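To make the stability claim concrete, here is a minimal sketch, not the notebook's implementation: `gapped_insertion_pass` and `shell_sort_by_key` are hypothetical helpers that sort records by a key using an assumed gap list of 2 and 1. The two records with key 1 start in the order "a", "b". The gap-2 pass swaps "a" with the smaller record two positions later, which lands "a" behind "b", and the final gap-1 pass never swaps equal keys back.

```python
def gapped_insertion_pass(items, gap, key):
    """One gapped InsertionSort pass: sorts every subsequence items[i::gap] in place."""
    for j in range(gap, len(items)):
        k = j
        # Swap backwards within the gap group while the key is strictly smaller
        while k >= gap and key(items[k]) < key(items[k - gap]):
            items[k], items[k - gap] = items[k - gap], items[k]
            k -= gap
    return items


def shell_sort_by_key(items, gaps, key):
    """Apply gapped passes from the largest gap down; the gaps must include 1."""
    for gap in sorted(gaps, reverse=True):
        gapped_insertion_pass(items, gap, key)
    return items


records = [(1, "a"), (1, "b"), (0, "c")]
shell_result = shell_sort_by_key(list(records), gaps=[2, 1], key=lambda r: r[0])
stable_result = sorted(records, key=lambda r: r[0])  # Python's built-in sort is stable

print(shell_result)   # [(0, 'c'), (1, 'b'), (1, 'a')]
print(stable_result)  # [(0, 'c'), (1, 'a'), (1, 'b')]
```

Both results are correctly ordered by key, but only the built-in sort keeps "a" ahead of "b".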

## Generating a random sequence

```python
import numpy as np
```

```python
seed = 17
np.random.seed(seed)
n = 10
high = 100
low = 0
randseq = np.random.randint(low, high, n).tolist()
print(randseq)
```

```
[15, 6, 22, 57, 45, 22, 31, 68, 39, 84]
```


# The algorithm step by step

## Step 0: Modifying InsertionSort

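Before designing the algorithm for ShellSort, we need an InsertionSort that works on gapped subarrays. We take the algorithm from the InsertionSort file and modify it to accept a gap size: each element is compared with the element `gapsize` positions before it rather than with its immediate neighbour.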

```python
def InsertionSortShell(seq, gapsize):
    array_length = len(seq)
    # Run one InsertionSort per gap group: group i holds the elements at i, i+gapsize, i+2*gapsize, ...
    for i in range(gapsize):
        # Traverse the group in steps of gapsize, skipping its first element (a one-element subarray is already sorted)
        for j in range(i + gapsize, array_length, gapsize):
            # Index of the element currently being inserted
            inserterindex = j
            # Keep swapping backwards until the previous element of the group is smaller than or equal to it
            while seq[inserterindex] < seq[inserterindex - gapsize]:
                # Swap the element being inserted with the previous element of its group
                seq[inserterindex - gapsize], seq[inserterindex] = seq[inserterindex], seq[inserterindex - gapsize]
                # Follow the element to its new position
                inserterindex -= gapsize
                # Stop once the element has reached the front of its group (index i)
                if inserterindex == i:
                    break
    # seq is sorted in place; it is also returned for convenience
    return seq
```
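Each swap step above writes two list positions. A common alternative, sketched below as `gapped_insertion_sort_shift` (a hypothetical helper, not taken from the InsertionSort file), holds the element being inserted and shifts larger members of its gap group one gap to the right, writing one position per step. It also walks `j` over every position from `gap` onwards, so the gap groups are processed interleaved rather than one after another; each group still ends up sorted, so the result matches `InsertionSortShell`. With `gap=1` it is exactly an ordinary InsertionSort.

```python
def gapped_insertion_sort_shift(seq, gap):
    """Sort every subsequence seq[i::gap] in place by shifting instead of swapping."""
    for j in range(gap, len(seq)):
        value = seq[j]  # element being inserted into its gap group
        k = j
        # Shift larger members of the same gap group one gap to the right
        while k >= gap and seq[k - gap] > value:
            seq[k] = seq[k - gap]
            k -= gap
        seq[k] = value  # drop the held element into the hole left by the shifts
    return seq


example = [15, 6, 22, 57, 45, 22, 31, 68, 39, 84]
print(gapped_insertion_sort_shift(list(example), 1))  # [6, 15, 22, 22, 31, 39, 45, 57, 68, 84]
```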

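With a test gap size of 3, the elements at indices 0, 3, 6, 9 (15, 57, 31, 84), at indices 1, 4, 7 (6, 45, 68) and at indices 2, 5, 8 (22, 22, 39) are each sorted independently, so we expect an output of [15, 6, 22, 31, 45, 22, 57, 68, 39, 84].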
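The expected list can also be derived mechanically. The sketch below (`h_sort_by_slices` is a hypothetical helper) sorts each extended slice `seq[i::gap]` with Python's built-in `sorted` and writes it back with slice assignment. It is only a cross-check for the gapped pass, not a ShellSort implementation.

```python
def h_sort_by_slices(seq, gap):
    """Return a copy of seq in which every gap-separated subsequence seq[i::gap] is sorted."""
    out = list(seq)
    for i in range(gap):
        # Extended-slice assignment: the sorted slice has the same length as the target slice
        out[i::gap] = sorted(seq[i::gap])
    return out


print(h_sort_by_slices([15, 6, 22, 57, 45, 22, 31, 68, 39, 84], 3))
# [15, 6, 22, 31, 45, 22, 57, 68, 39, 84]
```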

```python
InsertionSortShell(randseq, 3)
```

```
[15, 6, 22, 31, 45, 22, 57, 68, 39, 84]
```

## Step 1: Perform the sort with the largest suitable gap size from the Ciura sequence

```python
# InsertionSortShell sorted randseq in place above, so regenerate the original sequence
np.random.seed(seed)
n = 10
high = 100
low = 0
randseq = np.random.randint(low, high, n).tolist()
print(randseq)
```

```
[15, 6, 22, 57, 45, 22, 31, 68, 39, 84]
```


Ciura (2001) proposed a gap sequence that was found empirically to perform well:
$$1, 4, 10, 23, 57, 132, 301, 701$$
We use this sequence wherever it applies. Only gaps smaller than the array length have any effect, so for our 10-element sequence the first pass uses a gap of 4.

```python
InsertionSortShell(randseq, 4)
```

```
[15, 6, 22, 57, 39, 22, 31, 68, 45, 84]
```
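After the gap-4 pass the list is *4-sorted*: every element is at most the element four positions to its right, even though the list as a whole is not sorted yet. A small check, using a hypothetical `is_h_sorted` helper:

```python
def is_h_sorted(seq, h):
    """True if every element is <= the element h positions to its right."""
    return all(seq[i] <= seq[i + h] for i in range(len(seq) - h))


original = [15, 6, 22, 57, 45, 22, 31, 68, 39, 84]
after_gap_4 = [15, 6, 22, 57, 39, 22, 31, 68, 45, 84]
print(is_h_sorted(original, 4))     # False: 45 sits four places before 39
print(is_h_sorted(after_gap_4, 4))  # True
print(is_h_sorted(after_gap_4, 1))  # False: 15 > 6, so a gap-1 pass is still needed
```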

## Step 2: Perform the sort with the gapsize of 1

```python
InsertionSortShell(randseq, 1)
```

```
[6, 15, 22, 22, 31, 39, 45, 57, 68, 84]
```

## Combining everything

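```python
# Trial run of the gap-size selection, using 1000 as an example array length
ciura = [1, 4, 10, 23, 57, 132, 301, 701]
i = 0
# Advance i while the gap at position i is still smaller than the array length
while 1000 > ciura[i]:
    i += 1
    # All gaps are smaller than the array length: stop before indexing past the end
    if i == len(ciura):
        break
gap_sizes = ciura[:i]
print(gap_sizes)
```

```
[1, 4, 10, 23, 57, 132, 301, 701]
```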
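The same selection can be written as a list comprehension. The sketch below, with a hypothetical `ciura_gaps_for` helper, keeps every gap strictly smaller than the array length $n$ (a gap of $n$ or more leaves every gap group with a single element, so that pass would do nothing) and returns the gaps largest first. The boundary cases are worth checking: for $n = 1000$ all eight gaps, including 701, should be used.

```python
CIURA_GAPS = [1, 4, 10, 23, 57, 132, 301, 701]


def ciura_gaps_for(n):
    """Ciura gaps strictly smaller than n, largest first."""
    return [g for g in reversed(CIURA_GAPS) if g < n]


print(ciura_gaps_for(10))    # [4, 1]
print(ciura_gaps_for(701))   # [301, 132, 57, 23, 10, 4, 1]
print(ciura_gaps_for(1000))  # [701, 301, 132, 57, 23, 10, 4, 1]
```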


```python
def ShellSort(seq):
    array_length = len(seq)
    # Initialising the list of gap sizes (Ciura, 2001)
    ciura = [1, 4, 10, 23, 57, 132, 301, 701]
    # Count how many gaps are strictly smaller than the array length
    i = 0
    while array_length > ciura[i]:
        i += 1
        # Every gap is smaller than the array length (so 701 is kept): stop before indexing past the end
        if i == len(ciura):
            break
    gap_sizes = ciura[:i]

    # Perform the sort for each selected gap size, starting from the largest
    for gap in reversed(gap_sizes):
        seq = InsertionSortShell(seq, gap)

    return seq
```

```python
# Generating the random sequence again
np.random.seed(seed)
n = 10
high = 100
low = 0
randseq = np.random.randint(low, high, n).tolist()
print(randseq)
```

```
[15, 6, 22, 57, 45, 22, 31, 68, 39, 84]
```

```python
ShellSort(randseq)
```

```
[6, 15, 22, 22, 31, 39, 45, 57, 68, 84]
```

```python
# Generating the random sequence with more numbers
np.random.seed(seed)
n = 100
high = 100
low = 0
randseq = np.random.randint(low, high, n).tolist()
print(randseq)
```

```
[15, 6, 22, 57, 45, 22, 31, 68, 39, 84, 44, 7, 1, 17, 41, 56, 10, 98, 3, 63, 6, 91, 38, 91, 41, 57, 30, 17, 49, 32, 61, 5, 38, 2, 44, 54, 13, 56, 27, 83, 50, 49, 60, 8, 51, 27, 87, 63, 26, 91, 56, 57, 79, 74, 2, 49, 88, 49, 72, 64, 76, 20, 15, 90, 36, 93, 27, 35, 51, 83, 13, 7, 21, 12, 17, 25, 84, 89, 3, 74, 61, 99, 59, 33, 47, 13, 6, 45, 65, 46, 43, 35, 62, 14, 24, 67, 10, 18, 56, 50]
```

```python
sortedseq = sorted(randseq)
print(ShellSort(randseq) == sortedseq)
```

```
True
```
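A single random list is a thin test. The sketch below is a compact re-implementation under hypothetical names (`shell_sort`, `CIURA_GAPS`) that selects the Ciura gaps strictly smaller than the list length and uses a shift-based gapped pass. It compares the result against `sorted` on many random lists, including the boundary lengths 0, 1, 2, 701, 702 and 1000 where gap selection is easiest to get wrong. It draws numbers with the standard-library `random` module instead of NumPy, which is an assumption of the sketch.

```python
import random

CIURA_GAPS = [1, 4, 10, 23, 57, 132, 301, 701]


def shell_sort(seq):
    """ShellSort sketch: gapped insertion passes over the Ciura gaps smaller than len(seq)."""
    n = len(seq)
    for gap in reversed([g for g in CIURA_GAPS if g < n]):
        for j in range(gap, n):
            value, k = seq[j], j
            # Shift larger elements of the same gap group one gap to the right
            while k >= gap and seq[k - gap] > value:
                seq[k] = seq[k - gap]
                k -= gap
            seq[k] = value
    return seq


random.seed(17)
for n in [0, 1, 2, 3, 10, 100, 701, 702, 1000, 2500]:
    for _ in range(20):
        data = [random.randint(0, n) for _ in range(n)]
        assert shell_sort(list(data)) == sorted(data), f"mismatch for n={n}"
print("all random checks passed")
```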


# Timing the algorithm

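```python
def ShellSortTester(n, high, low=0):
    # high + 1 because NumPy's randint excludes its upper bound, so values are drawn from [low, high]
    randseq = np.random.randint(low, high + 1, n).tolist()
    return ShellSort(randseq)
```

```python
ShellSortTester(10, 100)
```

```
[1, 4, 40, 47, 52, 66, 66, 67, 96, 100]
```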

## Standard conditions

```python
%timeit ShellSortTester(10, 10)
```

```
4.72 μs ± 72.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```

```python
%timeit ShellSortTester(100, 100)
```

```
54.2 μs ± 64.5 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
```

```python
%timeit ShellSortTester(10_000, 10_000)
```

```
13.4 ms ± 66.2 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
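`%timeit` is an IPython magic, so it only works inside Jupyter or IPython. In a plain Python script, the standard library's `timeit.repeat` gives comparable measurements. The sketch below re-implements the sort compactly under hypothetical names (`shell_sort`, `shell_sort_tester`) and draws the random integers with the `random` module rather than NumPy, so its absolute timings are not directly comparable to the ones above. Like `%timeit`, it reports the mean and standard deviation over 7 repeats.

```python
import random
import statistics
import timeit

CIURA_GAPS = [1, 4, 10, 23, 57, 132, 301, 701]


def shell_sort(seq):
    """Compact ShellSort sketch over the Ciura gaps smaller than len(seq)."""
    n = len(seq)
    for gap in reversed([g for g in CIURA_GAPS if g < n]):
        for j in range(gap, n):
            value, k = seq[j], j
            while k >= gap and seq[k - gap] > value:
                seq[k] = seq[k - gap]
                k -= gap
            seq[k] = value
    return seq


def shell_sort_tester(n, high, low=0):
    # random.randint includes both ends, like np.random.randint(low, high + 1, n)
    return shell_sort([random.randint(low, high) for _ in range(n)])


loops = 1_000
# Each of the 7 repeats times `loops` calls; divide to get the time per call
totals = timeit.repeat(lambda: shell_sort_tester(100, 100), number=loops, repeat=7)
per_call_us = [t / loops * 1e6 for t in totals]
print(f"{statistics.mean(per_call_us):.1f} μs ± {statistics.stdev(per_call_us):.1f} μs per loop")
```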


## Case when all elements are sorted

```python
def sorted_case(n):
    # Sequence of already sorted elements (named sorted_seq to avoid shadowing the built-in sorted)
    sorted_seq = list(range(n))
    return ShellSort(sorted_seq)
```

```python
%timeit sorted_case(10_000)
```

```
2.01 ms ± 5.7 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```


## Case when all elements are sorted in reverse

```python
def rev_case(n):
    # Sequence of elements sorted in reverse
    rev_sorted = list(range(n - 1, -1, -1))
    return ShellSort(rev_sorted)
```

```python
%timeit rev_case(10_000)
```

```
13.4 ms ± 24.7 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```


## Case with multiple duplicates

```python
def duplicates(n):
    # Many duplicates: the largest value first, n - 2 copies of n // 2, and the smallest value last
    dup = [n] + [n // 2] * (n - 2) + [0]
    return ShellSort(dup)
```

```python
%timeit duplicates(10_000)
```

```
2.37 ms ± 17.9 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

```python
def duplicates2(n):
    # Many duplicates, already in non-decreasing order: 0, then n - 2 copies of n // 2, then n
    dup = [0] + [n // 2] * (n - 2) + [n]
    return ShellSort(dup)
```

```python
%timeit duplicates2(10_000)
```

```
2.4 ms ± 37.7 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
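One way to see why these inputs behave so differently: the number of swaps a plain InsertionSort makes equals the number of *inversions*, i.e. pairs $i < j$ with $a_i > a_j$. The sketch below counts inversions during a merge sort (`count_inversions` is a hypothetical helper) for the four test inputs at `n = 10_000`. The sorted input and `duplicates2` (already in non-decreasing order) have none, `duplicates` has only $2n - 3$, and the reversed input has the maximum $n(n-1)/2$.

```python
def count_inversions(seq):
    """Number of pairs i < j with seq[i] > seq[j], counted during a merge sort."""
    def sort_count(a):
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_left = sort_count(a[:mid])
        right, inv_right = sort_count(a[mid:])
        merged, inversions = [], inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:  # equal values are not inversions
                merged.append(left[i])
                i += 1
            else:  # right[j] is smaller than every remaining element of left
                merged.append(right[j])
                j += 1
                inversions += len(left) - i
        merged += left[i:] + right[j:]
        return merged, inversions
    return sort_count(list(seq))[1]


n = 10_000
cases = {
    "sorted": list(range(n)),
    "reversed": list(range(n - 1, -1, -1)),
    "duplicates": [n] + [n // 2] * (n - 2) + [0],
    "duplicates2": [0] + [n // 2] * (n - 2) + [n],
}
for name, data in cases.items():
    print(name, count_inversions(data))
# sorted 0
# reversed 49995000
# duplicates 19997
# duplicates2 0
```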


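In terms of runtime, ShellSort performs better than InsertionSort on random and reverse-sorted input, as expected: by the time the gap reaches 1, the earlier passes have already partly sorted the elements, so the final InsertionSort pass has little left to do. It performs worse on already sorted input, because it still runs a full pass for every gap size even though no element needs to move. The same holds for the inputs with many duplicates, which are already sorted or nearly sorted.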
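The comparison above can also be checked without a stopwatch by counting key comparisons, which is only a proxy for the measured runtimes. The sketch below uses a hypothetical `count_comparisons` helper with swaps, as `InsertionSortShell` does, and runs plain InsertionSort (gaps `[1]`) and ShellSort (Ciura gaps below $n$) on sorted and reversed lists of length 1000. On sorted input InsertionSort makes $n - 1 = 999$ comparisons, while ShellSort makes one comparison per element in every pass, $\sum_g (n - g) = 8 \cdot 1000 - 1229 = 6771$. On reversed input InsertionSort makes $n(n-1)/2 = 499500$ comparisons and ShellSort considerably fewer.

```python
CIURA_GAPS = [1, 4, 10, 23, 57, 132, 301, 701]


def count_comparisons(seq, gaps):
    """Sort a copy of seq with gapped InsertionSort passes and return the number of key comparisons."""
    seq = list(seq)
    comparisons = 0
    for gap in sorted(gaps, reverse=True):
        for j in range(gap, len(seq)):
            k = j
            while k >= gap:
                comparisons += 1
                if seq[k] < seq[k - gap]:
                    seq[k], seq[k - gap] = seq[k - gap], seq[k]
                    k -= gap
                else:
                    break
    assert seq == sorted(seq), "the last gap must be 1 for a full sort"
    return comparisons


n = 1_000
shell_gaps = [g for g in CIURA_GAPS if g < n]
inputs = {"sorted": list(range(n)), "reversed": list(range(n - 1, -1, -1))}
results = {}
for name, data in inputs.items():
    results[name] = (count_comparisons(data, [1]), count_comparisons(data, shell_gaps))
    print(f"{name}: InsertionSort {results[name][0]}, ShellSort {results[name][1]}")
```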