# RadixSort

RadixSort is useful for integers and for fixed-length strings, but it slows down as the number of digits grows. In general the algorithm runs in O(d*n) time, where d is the number of digits in the largest element and n is the number of elements. This is O(n) whenever d is bounded by a constant.

# Linked list implementation

For such a sort, a lookup table whose entries each reference a double-ended queue is helpful. We use deque from the collections module.

In [5]:
from collections import deque

In [2]:
q = deque([1,2,3])

In [3]:
q.append(23)

In [4]:
q.popleft()

1

In [5]:
q

deque([2, 3, 23])

In [6]:
len(q)

3
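The property that matters for the sort is that a bucket releases its elements in the order they arrived (first in, first out). The short sketch below illustrates this with a single bucket; the variable names `bucket` and `drained` are only illustrative. A deque is a natural fit because `popleft()` takes constant time, whereas `list.pop(0)` has to shift every remaining element.

```python
from collections import deque

# Fill one bucket in arrival order
bucket = deque()
for value in [230, 40, 90]:
    bucket.append(value)

# Drain it from the left: elements come out in the order they went in
drained = []
while bucket:
    drained.append(bucket.popleft())

print(drained)  # [230, 40, 90]
```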

## Additionally, the math module provides the floor function

In [8]:
from math import floor

# Generating a Random Sequence

A random sequence of integers is generated as input for the sort. The seed is fixed so that the same sequence is produced on every run.

In [8]:
import numpy as np

In [9]:
np.random.seed(23)
n=10
high = 300
low = 0

randseq = np.random.randint(low, high+1, n).tolist()
print(randseq)

[83, 230, 40, 31, 237, 39, 90, 153, 6, 12]


# Step 1: Find the number of digits in the largest element

This then becomes the number of iterations the sort has to perform

In [10]:
# Getting the maximum from a sequence, then converting it to a string, then getting its length
largest_length = len(str(max(randseq)))
print(largest_length)

3
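Converting to a string is the shortest way to count digits. As an alternative sketch, the same count can be obtained with integer arithmetic only. The helper name `count_digits` is hypothetical and it assumes a non-negative integer (for a negative number, `len(str(...))` would also count the minus sign).

```python
def count_digits(value):
    """Number of decimal digits in a non-negative integer (hypothetical helper)."""
    digits = 1
    # Each integer division by 10 removes the last digit
    while value >= 10:
        value //= 10
        digits += 1
    return digits

print(count_digits(237))  # 3
```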


# Step 2: Bucketing the elements with the same last digit

In [11]:
# First, a bucketing table needs to be created, with numbers 0 to 9. Each entry of the dict leads to a queue
buckets = {i:deque([]) for i in range(10)}

# For each element in the sequence
for i in randseq:
    # The bucket key is i%10, the last (least significant) digit of the element
    digit = i%10
    # Append to the appropriate queue
    buckets[digit].append(i)

print(buckets)

{0: deque([230, 40, 90]), 1: deque([31]), 2: deque([12]), 3: deque([83, 153]), 4: deque([]), 5: deque([]), 6: deque([6]), 7: deque([237]), 8: deque([]), 9: deque([39])}


# Step 3: Put the elements from the buckets into a new array, in order of bucket key and, within each bucket, in the order in which they arrived

In [12]:
newarray = []
# For each bucket key, in ascending order (dicts keep insertion order in Python 3.7+)
for key in buckets:
    # While the bucket still has elements
    while len(buckets[key])!=0:
        # pop the first element from the queue, and append that to the new array
        newarray.append(buckets[key].popleft())
newarray

[230, 40, 90, 31, 12, 83, 153, 6, 237, 39]

Note that buckets is now filled with only empty queues

In [13]:
buckets

{0: deque([]),
 1: deque([]),
 2: deque([]),
 3: deque([]),
 4: deque([]),
 5: deque([]),
 6: deque([]),
 7: deque([]),
 8: deque([]),
 9: deque([])}
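Popping element by element makes the queue behaviour explicit. A possible shorthand, shown here only as a sketch, copies each bucket in one call with `extend` and then empties it with `clear`. It relies on the dict iterating its keys in insertion order (0 to 9), which is guaranteed from Python 3.7 onwards.

```python
from collections import deque

buckets = {i: deque() for i in range(10)}
for value in [83, 230, 40, 31, 237, 39, 90, 153, 6, 12]:
    buckets[value % 10].append(value)

newarray = []
for key in buckets:
    newarray.extend(buckets[key])  # copies the bucket in arrival order
    buckets[key].clear()           # leaves the bucket empty for the next pass

print(newarray)  # [230, 40, 90, 31, 12, 83, 153, 6, 237, 39]
```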

# Step 4: Repeat the above for the number of digits in the largest element

The above steps should now be put in a loop that runs as many times as the number of digits in the largest element, with each loop focusing on a different digit. For easier reference we generate the random number sequence again here.

In [14]:
np.random.seed(23)
n=10
high = 300
low = 0

randseq = np.random.randint(low, high+1, n).tolist()
print(randseq)

[83, 230, 40, 31, 237, 39, 90, 153, 6, 12]


In [15]:
# Getting the maximum from a sequence, then converting it to a string, then getting its length
largest_length = len(str(max(randseq)))
print(largest_length)

3


In [16]:
# For each digit in the largest element
for i in range(largest_length):
    # Creating an empty new array
    newarray = []
    # Go through the elements of the list
    for j in randseq:
        # Hashing by the appropriate digit. Shift the decimal point i places to the left, then round the number down.
        # The modulo 10 of the result then gives the digit in that place, e.g. for i=1: 237 -> 23.7 -> 23 -> 23%10 = 3
        hash = floor(j/(10**i))%10
        # Append the element to the appropriate bucket
        buckets[hash].append(j)
        
    # Go through the buckets in buckets
    for key in buckets:
        # while the bucket is not empty
        while len(buckets[key])!=0:
            # Append the elements in order to the new array by popping the first element in the bucket's queue
            newarray.append(buckets[key].popleft())

    # Assigning this array to a variable that can be used in the next loop
    randseq = newarray

print(newarray)

[6, 12, 31, 39, 40, 83, 90, 153, 230, 237]
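To see how the order emerges, it helps to look at the sequence after every pass. The sketch below records a snapshot per pass; `radix_passes` is a hypothetical helper written for this illustration, and it uses integer division (`//`) in place of `floor`, which yields the same digit for these values.

```python
from collections import deque

def radix_passes(seq):
    """Return the sequence as it looks after each pass (illustrative helper)."""
    buckets = {d: deque() for d in range(10)}
    snapshots = []
    for i in range(len(str(max(seq)))):
        # Distribute by the i-th digit, counted from the right
        for value in seq:
            buckets[(value // 10**i) % 10].append(value)
        # Collect the buckets in key order into a fresh list
        seq = []
        for d in buckets:
            while buckets[d]:
                seq.append(buckets[d].popleft())
        snapshots.append(seq)
    return snapshots

passes = radix_passes([83, 230, 40, 31, 237, 39, 90, 153, 6, 12])
for number, snapshot in enumerate(passes):
    print(number, snapshot)
# 0 [230, 40, 90, 31, 12, 83, 153, 6, 237, 39]
# 1 [6, 12, 230, 31, 237, 39, 40, 153, 83, 90]
# 2 [6, 12, 31, 39, 40, 83, 90, 153, 230, 237]
```

After the second pass the elements are ordered by their last two digits only, which is why 230 still precedes 31. The third pass moves it into place without disturbing the order established earlier, because elements leave each bucket in the order they entered it.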


# Combining everything into a single function

In [3]:
def RadixSort(seq):
    # Getting the maximum from a sequence, then converting it to a string, then getting its length
    largest_length = len(str(max(seq)))

    # Creating buckets
    buckets = {index:deque([]) for index in range(10)}

    # For each digit in the largest element
    for i in range(largest_length):
        # Creating an empty new array
        newarray = []
        # Go through the elements of the list
        for j in seq:
            # Hashing by the appropriate digit. Shift the decimal point i places to the left, then round down.
            # The modulo 10 of the result then gives the digit in that place, e.g. for i=1: 237 -> 23.7 -> 23 -> 23%10 = 3
            hash = floor(j/(10**i))%10
            # Append the element to the appropriate bucket
            buckets[hash].append(j)
        
        # Go through the buckets in buckets
        for key in buckets:
            # while the bucket is not empty
            while len(buckets[key])!=0:
                # Append the elements in order to the new array by popping the first element in the bucket's queue
                newarray.append(buckets[key].popleft())

        # Assigning this array to a variable that can be used in the next loop
        seq = newarray
    return newarray
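One caveat about the digit extraction: `j/(10**i)` is a floating-point division, so integers beyond what a float can represent exactly (above 2**53) are rounded before `floor` is applied. The sketch below shows the effect on a single value and compares it with integer division, which stays exact. The variable name `big` is illustrative, and none of the inputs used in this notebook are large enough to be affected.

```python
from math import floor

big = 10**17 + 1  # the last digit is 1

# Float division rounds big to the nearest representable float first
print(floor(big / 10**0) % 10)  # 0

# Integer division never leaves the integers, so the digit is exact
print((big // 10**0) % 10)      # 1
```

Replacing `floor(j/(10**i))%10` with `(j//10**i)%10` would be one way to make the function safe for arbitrarily large integers.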

In [18]:
np.random.seed(23)
n=10
high = 300
low = 0

randseq = np.random.randint(low, high+1, n).tolist()
print(randseq)

[83, 230, 40, 31, 237, 39, 90, 153, 6, 12]


In [19]:
RadixSort(randseq)

[6, 12, 31, 39, 40, 83, 90, 153, 230, 237]

In [20]:
n=100
high = 999
low = 0

randseq = np.random.randint(low, high+1, n).tolist()
print(randseq)

[49, 706, 459, 981, 709, 192, 12, 533, 560, 297, 591, 858, 958, 57, 403, 417, 187, 551, 911, 548, 577, 897, 884, 683, 15, 214, 103, 723, 698, 630, 537, 194, 970, 637, 35, 480, 56, 269, 316, 534, 678, 543, 862, 462, 0, 917, 518, 32, 354, 544, 142, 543, 590, 467, 336, 582, 834, 978, 981, 439, 283, 830, 249, 244, 299, 714, 71, 247, 426, 467, 277, 997, 518, 68, 150, 299, 136, 469, 446, 672, 828, 356, 523, 513, 553, 244, 794, 549, 425, 765, 824, 295, 465, 762, 672, 693, 291, 279, 106, 797]


In [21]:
print(RadixSort(randseq))

[0, 12, 15, 32, 35, 49, 56, 57, 68, 71, 103, 106, 136, 142, 150, 187, 192, 194, 214, 244, 244, 247, 249, 269, 277, 279, 283, 291, 295, 297, 299, 299, 316, 336, 354, 356, 403, 417, 425, 426, 439, 446, 459, 462, 465, 467, 467, 469, 480, 513, 518, 518, 523, 533, 534, 537, 543, 543, 544, 548, 549, 551, 553, 560, 577, 582, 590, 591, 630, 637, 672, 672, 678, 683, 693, 698, 706, 709, 714, 723, 762, 765, 794, 797, 824, 828, 830, 834, 858, 862, 884, 897, 911, 917, 958, 970, 978, 981, 981, 997]


In [22]:
np.random.seed(2390)
n=10
high = 100_000
low = 0

randseq = np.random.randint(low, high+1, n).tolist()
print(randseq)

[69360, 41927, 19356, 5427, 93930, 75216, 68650, 60709, 65643, 37042]


In [23]:
print(RadixSort(randseq))

[5427, 19356, 37042, 41927, 60709, 65643, 68650, 69360, 75216, 93930]
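Checking a few outputs by eye is a start, but comparing against Python's built-in `sorted` on many random inputs gives more confidence. The following is a minimal sketch: `radix_sort` is a compact stand-in for the function above (an assumption made so that the block runs on its own), and it uses the standard `random` module rather than numpy.

```python
import random
from collections import deque

def radix_sort(seq):
    """Compact stand-in for RadixSort above (non-negative integers only)."""
    if not seq:
        return []
    buckets = [deque() for _ in range(10)]
    for i in range(len(str(max(seq)))):
        for value in seq:
            buckets[(value // 10**i) % 10].append(value)
        seq = []
        for bucket in buckets:
            seq.extend(bucket)
            bucket.clear()
    return seq

rng = random.Random(23)  # fixed seed so the check is repeatable
for _ in range(200):
    n = rng.randint(0, 50)
    data = [rng.randint(0, 10**6) for _ in range(n)]
    assert radix_sort(data) == sorted(data)
print("all checks passed")
```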


# Testing the timings of the sorting algorithm under different conditions

In [24]:
def RadixSortTester(n, high, low=0):
    randseq = np.random.randint(low, high+1, n).tolist()
    return randseq, RadixSort(randseq)

In [25]:
RadixSortTester(10, 100)

([78, 67, 81, 90, 61, 1, 83, 94, 88, 40],
 [1, 40, 61, 67, 78, 81, 83, 88, 90, 94])

In [26]:
%timeit RadixSortTester(10,10)

6.69 μs ± 155 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [27]:
%timeit RadixSortTester(100,100)

35.2 μs ± 3.53 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [28]:
%timeit RadixSortTester(10_000, 10_000)

4.74 ms ± 31.2 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)


So far, it behaves similarly to CountingSort, albeit a little slower. Note that these timings include generating the random sequence, since RadixSortTester does both.

## If the range of values does not match the number of elements

In [29]:
%timeit RadixSortTester(100, 100_000_000, 99_999_990)

109 μs ± 2.77 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [9]:
outlier = [50_000_000+i for i in range(98)] + [100_000_000, 0]
print(outlier)

[50000000, 50000001, 50000002, 50000003, 50000004, 50000005, 50000006, 50000007, 50000008, 50000009, 50000010, 50000011, 50000012, 50000013, 50000014, 50000015, 50000016, 50000017, 50000018, 50000019, 50000020, 50000021, 50000022, 50000023, 50000024, 50000025, 50000026, 50000027, 50000028, 50000029, 50000030, 50000031, 50000032, 50000033, 50000034, 50000035, 50000036, 50000037, 50000038, 50000039, 50000040, 50000041, 50000042, 50000043, 50000044, 50000045, 50000046, 50000047, 50000048, 50000049, 50000050, 50000051, 50000052, 50000053, 50000054, 50000055, 50000056, 50000057, 50000058, 50000059, 50000060, 50000061, 50000062, 50000063, 50000064, 50000065, 50000066, 50000067, 50000068, 50000069, 50000070, 50000071, 50000072, 50000073, 50000074, 50000075, 50000076, 50000077, 50000078, 50000079, 50000080, 50000081, 50000082, 50000083, 50000084, 50000085, 50000086, 50000087, 50000088, 50000089, 50000090, 50000091, 50000092, 50000093, 50000094, 50000095, 50000096, 50000097, 100000000, 0]


In [10]:
%timeit RadixSort(outlier)

96.4 μs ± 792 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


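We see that RadixSort performs much better than CountingSort here, because its running time depends on the number of digits rather than on the range of the values.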
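A rough back-of-the-envelope comparison makes the difference concrete. The sketch assumes a textbook counting sort that allocates one counter per possible value between the minimum and the maximum, which may differ in detail from the CountingSort referred to here. The names `counting_table_size` and `radix_bucket_appends` are illustrative.

```python
outlier = [50_000_000 + i for i in range(98)] + [100_000_000, 0]

# Counting sort (assumed textbook form): one counter per possible value
counting_table_size = max(outlier) - min(outlier) + 1

# Radix sort: one bucket append per element per digit of the largest value
radix_bucket_appends = len(outlier) * len(str(max(outlier)))

print(counting_table_size)   # 100000001
print(radix_bucket_appends)  # 900
```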

In [32]:
%timeit RadixSortTester(100, int(1e15))

196 μs ± 1.31 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


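Even for values up to 10^15 (16 digits) the sort remains fast. The time roughly doubles compared with the 9-digit cases above, in line with the O(d*n) running time.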

# RadixSort can also work for Strings

We start with the following string:

In [68]:
strseq = ['It', 'is', 'dangerous', 'to', 'go', 'alone', 'take', 'this']

## Modifying the algorithm

In [34]:
len(max(strseq, key=len))

9

In [70]:
def RadixSortStr(seq):
    # Converting all strings to lowercase
    seq = [word.lower() for word in seq]
    # Getting the longest string, then getting its length
    largest_length = len(max(seq, key=len))

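    # Creating a bucket for each letter a-z. ord converts a character to its integer code, then chr turns it back into a character.
    # Any character outside a-z (digits, punctuation, accented letters) has no bucket and raises a KeyError.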
    buckets = {chr(letter):deque([]) for letter in range(ord('a'), ord('a')+26)}

    # For each character position in the longest string, starting from the last position
    for i in range(largest_length):
        # Creating an empty new array
        newarray = []
        # Go through the elements of the list
        for j in seq:
            # If the string is too short to have a character at the current position, append it to the list
            # right away. It thereby ends up before every string that does have a character there.
            if len(j)<(largest_length-i):
                newarray.append(j)
                continue
                
            # Hashing by looking at the (largest_length-i-1)th letter
            hash = j[largest_length-i-1]
            # Append the element to the appropriate bucket
            buckets[hash].append(j)
        
        # Go through the buckets in buckets
        for key in buckets:
            # while the bucket is not empty
            while len(buckets[key])!=0:
                # Append the elements in order to the new array by popping the first element in the bucket's queue
                newarray.append(buckets[key].popleft())

        # Assigning this array to a variable that can be used in the next loop
        seq = newarray
    return newarray

In [71]:
RadixSortStr(strseq)

['alone', 'dangerous', 'go', 'is', 'it', 'take', 'this', 'to']

In [76]:
strseq = ['My', 'name', 'is', 'Maximus', 'Decimus', 'Meridius', 'commander', 'of', 'the', 'Armies', 'of', 'the', 'North', 'General', 'of', 'the', 'Felix', 'Legions', 'and', 'loyal', 'servant', 'to', 'the', 'true', 'emperor', 'Marcus', 'Aurelius', 'Father', 'to', 'a', 'murdered', 'son', 'Husband', 'to', 'a', 'murdered', 'wife', 'And', 'I', 'will', 'have', 'my', 'vengeance', 'in', 'this', 'life', 'or', 'the', 'next']

In [77]:
print(RadixSortStr(strseq))

['a', 'a', 'and', 'and', 'armies', 'aurelius', 'commander', 'decimus', 'emperor', 'father', 'felix', 'general', 'have', 'husband', 'i', 'in', 'is', 'legions', 'life', 'loyal', 'marcus', 'maximus', 'meridius', 'murdered', 'murdered', 'my', 'my', 'name', 'next', 'north', 'of', 'of', 'of', 'or', 'servant', 'son', 'the', 'the', 'the', 'the', 'the', 'this', 'to', 'to', 'to', 'true', 'vengeance', 'wife', 'will']


In [78]:
%timeit RadixSortStr(strseq)

38.6 μs ± 172 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
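The fixed a-z bucket table means that words containing apostrophes, hyphens or digits cannot be sorted. One possible way around this, sketched below, is to create buckets on demand with a `defaultdict` and to visit them in sorted character order. `radix_sort_str_any` is a hypothetical variant, not part of the notebook above. Sorting the bucket keys is itself a comparison sort, but it only touches the distinct characters at one position, not the words.

```python
from collections import defaultdict, deque

def radix_sort_str_any(seq):
    """Hypothetical variant of RadixSortStr that accepts any characters."""
    seq = [word.lower() for word in seq]
    if not seq:
        return []
    longest = len(max(seq, key=len))
    # Walk the character positions from last to first
    for pos in range(longest - 1, -1, -1):
        buckets = defaultdict(deque)  # a bucket is created on first use
        newarray = []
        for word in seq:
            if len(word) <= pos:
                newarray.append(word)  # too short: goes before longer words
            else:
                buckets[word[pos]].append(word)
        # Visit the buckets in character (code point) order
        for char in sorted(buckets):
            newarray.extend(buckets[char])
        seq = newarray
    return seq

print(radix_sort_str_any(["it's", 'e-mail', 'Email', 'a1', 'A']))
# ['a', 'a1', 'e-mail', 'email', "it's"]
```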
