It was in 1998 that Donald Knuth observed, in The Art of Computer Programming, Volume 3: Sorting and Searching, that "virtually every important aspect of programming arises somewhere in the context of sorting or searching." Indeed, from even the simplest collections of data comes the demanding task of organization. In that spirit, this analysis briefly examines the power and necessity of several sorting algorithms, so that the reader can see, with contextual evidence, that every sorting algorithm carries both positive and negative consequences. We begin with a fundamental question: what is an efficient sorting algorithm? To answer it, we must first define the basis on which the elegance and effectiveness of an algorithm is measured. The rules of the game are as follows: (1) Running time: the performance of an algorithm is measured in its basic operations, comparisons and exchanges, and any hypothesis about an algorithm must be checked against the time it takes to complete a sort; (2) Extra memory: the amount of auxiliary space an algorithm requires also factors into its total cost, and smaller memory demands generally mean greater performance; (3) Type of data: our sorting algorithms can only be as effective as the items they handle, which here implement Java's Comparable interface. From these three principles we reach a basic generalization: the essential measure of a sorting algorithm lies in its running time and the number of exchanges that take place. Because memory and data type both have a substantial impact on time and compares, the examination that follows considers both time efficiency and exchange counts.
By programming in Java, we have the natural ability to implement simple object-oriented algorithms that display the underlying logic of sorting. Furthermore, by giving two of the algorithms recursive definitions, we can recursively divide and conquer the data into general cases. Overall, testing allows us to truly see the importance of sorting as well as searching.
To create a valid testing range, our tester class uses three text files with distinct characteristics. The first file contains pseudo-random data arranged in ascending order; the second contains pseudo-random data arranged in descending order; the last contains completely random data. Each file contains random numbers ranging from 1 to 100,000. We will discuss the execution and performance of each algorithm individually, and then draw conclusions from our Tester.java class.
The straightforward approach to implementing merging is to design a method that merges two disjoint ordered arrays. Our algorithm uses a recursive divide-and-conquer scheme that splits the data down to single elements. Once the recursive method reaches its base case, it merges each ordered list into the next level of the hierarchy; when the final merge completes, the list is fully ordered. The mathematical appeal lies in the recurrence that governs this division: it transforms sorting performance from O(n²) to O(n log n), which is the difference between uselessness and panacea. To see this, let C(N) be the number of compares needed to sort an array of length N. Solving the recurrence for N = 2ⁿ shows that C(N) = C(2ⁿ) = n · 2ⁿ = N lg N. For pseudo-random ascending data, we find that Merge Sort takes on average 5.89 ms to complete the sort, with an average of 2,126,230 swaps. For pseudo-random descending data, Merge Sort takes on average 5.97 ms, with an average of 7,679,978 swaps. Finally, for random data, Merge Sort takes on average 11.38 ms and an average of 4,825,370,727 exchanges. It can be concluded that Merge Sort performs best when handling pseudo-random data that is either ascending or descending; completely random data, with no particular ordering, fares poorly in comparison to Quick Sort. Because Merge Sort requires an auxiliary array proportional to the input, it is not optimal with respect to space usage. The worst-case scenario demonstrated here may not be likely in practice. Thus, it appears that Merge Sort, although elegant in its recursion, has negative consequences and should be treated as an option rather than a default.
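The recursive division and merge described above can be sketched as follows. This is a minimal illustration, not the author's Tester.java implementation; the class and method names are ours.

```java
import java.util.Arrays;

// A minimal top-down merge sort sketch: divide to single elements,
// then merge ordered halves back together. Names are illustrative.
class MergeSortSketch {
    public static void sort(int[] a) { sort(a, 0, a.length - 1); }

    private static void sort(int[] a, int lo, int hi) {
        if (hi <= lo) return;              // base case: one element
        int mid = lo + (hi - lo) / 2;
        sort(a, lo, mid);                  // sort the left half
        sort(a, mid + 1, hi);              // sort the right half
        merge(a, lo, mid, hi);             // merge the two ordered halves
    }

    // Merge a[lo..mid] with a[mid+1..hi] using an auxiliary copy;
    // this copy is the extra memory cost mentioned above.
    private static void merge(int[] a, int lo, int mid, int hi) {
        int[] aux = Arrays.copyOfRange(a, lo, hi + 1);
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if (i > mid)                        a[k] = aux[j++ - lo];
            else if (j > hi)                    a[k] = aux[i++ - lo];
            else if (aux[j - lo] < aux[i - lo]) a[k] = aux[j++ - lo];
            else                                a[k] = aux[i++ - lo];
        }
    }
}
```

Note how the auxiliary array in `merge` makes the space cost visible: every merge allocates a copy of the range being merged.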
Quick Sort, like Merge Sort, is a divide-and-conquer method. It works by partitioning an array into two subarrays and rearranging the elements so that, when the two subarrays are sorted, the whole array is sorted. Quick Sort is very popular because it is not difficult to implement and works well for a variety of different kinds of input data. The mathematics behind Quick Sort resembles that of Merge Sort: on average, we can expect O(n log n) performance. As we will see, Quick Sort's partitioning logic rearranges small portions of the array at a time, which yields a faster implementation in both theory and practice. For pseudo-random ascending data, Quick Sort takes on average 2.09 ms to complete a sort, with 1,037,401 swaps. For pseudo-random descending data, Quick Sort takes on average 2.25 ms, with 3,454,251 swaps. Finally, for purely random data, Quick Sort takes 3.81 ms and only 1,610,334,560 swaps. In comparison with Merge Sort, Quick Sort has the advantage, owing to the partitioning logic applied to the two subarrays of the original data. It follows that a carefully tuned version of Quick Sort can run significantly faster than Merge Sort, making Quick Sort the prime method of choice among our divide-and-conquer candidates.
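The partition-then-recurse structure can be sketched as below, using the first element as the pivot. This is an illustrative sketch, not the tuned implementation measured above.

```java
// A minimal quicksort sketch: partition around the first element,
// then recursively sort each side. Names are illustrative.
class QuickSortSketch {
    public static void sort(int[] a) { sort(a, 0, a.length - 1); }

    private static void sort(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int j = partition(a, lo, hi);   // a[j] is now in its final place
        sort(a, lo, j - 1);             // sort the left subarray
        sort(a, j + 1, hi);             // sort the right subarray
    }

    // Rearrange a[lo..hi] so entries left of the returned index are
    // <= the pivot and entries right of it are >= the pivot.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int i = lo, j = hi + 1;
        while (true) {
            while (a[++i] < pivot) if (i == hi) break;   // scan right
            while (pivot < a[--j]) if (j == lo) break;   // scan left
            if (i >= j) break;                           // pointers crossed
            swap(a, i, j);
        }
        swap(a, lo, j);                 // put the pivot into position j
        return j;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

Unlike Merge Sort, the partitioning happens in place: no auxiliary array is allocated, which is part of why a tuned Quick Sort tends to run faster in practice.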
The graph is among the most common data structures in computer science, and it is unsurprising that a staggering amount of time has been devoted to developing graph algorithms. The idea of depth-first search was first described by Pierre Trémaux, and breadth-first search later by Edward F. Moore. It is from these mathematical foundations that we arrive at priority queues organized as heaps; in other words, Heap Sort relies on priority-queue machinery. Heap Sort can be broken into two phases: construction and sortdown. Sink-based construction uses fewer than 2N compares and fewer than N exchanges to build a heap from N items. As a result, Heap Sort runs in O(n log n) even in the worst case, while the construction phase itself completes in linear O(n) time. Taking a closer look at our tester results: a data set of pseudo-random ascending data takes on average 0.71 ms to sort, with an average of only 545,798 swaps. A data set of pseudo-random descending data takes on average 0.76 ms, with an average of 1,651,412 swaps. Applying Heap Sort to a random data set, we find it takes on average 1.29 ms, with an average of 537,578,178 swaps. These figures show a huge improvement in both swaps and time over our divide-and-conquer algorithms. This may be due to the priority queue built immediately before the actual sort: because the array is already heap-ordered, the sorting phase itself proceeds quickly. Overall, a reliable Heap Sort improves both performance and practicality.
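The two phases, sink-based construction and sortdown, can be sketched as below. This is a minimal illustration with our own names; heap positions are 1-based and mapped onto the 0-based array.

```java
// A sink-based heapsort sketch: build a max-heap, then repeatedly
// exchange the maximum with the last entry and re-sink the root.
class HeapSortSketch {
    public static void sort(int[] a) {
        int n = a.length;
        // Construction phase: sink every non-leaf node, building a max-heap.
        for (int k = n / 2; k >= 1; k--) sink(a, k, n);
        // Sortdown phase: move the max to the end, shrink the heap, re-sink.
        while (n > 1) {
            swap(a, 1, n--);
            sink(a, 1, n);
        }
    }

    // Restore heap order below position k in a heap of size n.
    private static void sink(int[] a, int k, int n) {
        while (2 * k <= n) {
            int j = 2 * k;
            if (j < n && get(a, j) < get(a, j + 1)) j++;  // pick larger child
            if (get(a, k) >= get(a, j)) break;            // heap order holds
            swap(a, k, j);
            k = j;
        }
    }

    // Translate 1-based heap positions to 0-based array indices.
    private static int get(int[] a, int i) { return a[i - 1]; }
    private static void swap(int[] a, int i, int j) {
        int t = a[i - 1]; a[i - 1] = a[j - 1]; a[j - 1] = t;
    }
}
```

The construction loop starting at `n / 2` is what keeps the build phase under 2N compares: leaves need no sinking at all.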
Although elementary in its properties, Shell Sort offers considerably fast execution on large amounts of data. Building on the insertion-sort methodology, Shell Sort considers every h-th entry of the array. Each gap defines a subsequence whose entries are compared and swapped whenever a comparison finds them out of order. This yields h-sorted subsequences and performs extraordinarily faster than plain Insertion Sort, improving running time with less damage to overhead. The core of the algorithm is the gap sequence used to partition the data. Mathematically, Shell Sort's worst case depends on the gap sequence chosen by the author of the implementation; Donald Knuth's 3x+1 increment sequence gives roughly O(N^(3/2)). Performance is incredibly fast in comparison with the algorithms we have developed thus far. In our results, Shell Sort takes on average 0.26 ms and 995,332 swaps on pseudo-random ascending data. On pseudo-random descending data we find similar results: 0.27 ms and 3,333,164 swaps. Finally, on random data, Shell Sort takes an incredible 0.48 ms and only 188,818,389 swaps. In both time and swaps, Shell Sort demonstrates the best overall averages. Shell Sort gains efficiency by trading off subsequence size against partial order; for this reason it is sometimes the choice of experienced programmers, since it handles moderately large arrays and requires only a small amount of code.
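The gap-based comparison scheme can be sketched as below, using Knuth's 3x+1 increments (1, 4, 13, 40, ...). The names are ours, for illustration only.

```java
// A shellsort sketch with Knuth's 3x+1 gap sequence: h-sort the array
// with shrinking gaps until a final gap-1 (insertion sort) pass.
class ShellSortSketch {
    public static void sort(int[] a) {
        int n = a.length;
        int h = 1;
        while (h < n / 3) h = 3 * h + 1;   // largest increment below n
        while (h >= 1) {
            // h-sort: insertion sort restricted to every h-th entry.
            for (int i = h; i < n; i++) {
                for (int j = i; j >= h && a[j] < a[j - h]; j -= h) {
                    int t = a[j]; a[j] = a[j - h]; a[j - h] = t;
                }
            }
            h /= 3;                        // shrink the gap
        }
    }
}
```

By the time the gap reaches 1, the array is already nearly sorted from the larger gaps, so the final insertion-sort pass does very little work.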
Although simplistic in nature, Cocktail Sort offers a bidirectional variation of the well-known Bubble Sort. By sweeping in one direction and then working its way back in the other, Cocktail Sort brings the largest remaining value to the back of the array and the smallest to the front on each round. The result is a surprising increase in performance from a small artifact of code. Mathematically speaking, Cocktail Sort averages O(n²); because of this, it is perhaps among the simpler algorithms to understand. With that in mind, let us take a closer look at our testing data. On pseudo-random ascending data, Cocktail Sort finishes on average in 0.11 ms with an average of 5,227,905 swaps, whereas pseudo-random descending data takes on average 5.97 ms and 7,679,978 swaps. Finally, on completely random data, Cocktail Sort averages 16.08 ms and 19,878,453,231 swaps. Judging from these results, we can attest that an improvement has been made over single-direction sorting. Overall, it can be concluded that Cocktail Sort, although possessing a cool name, still falls below average in comparison with far more elegant solutions.
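The bidirectional sweep can be sketched as below; names are illustrative, not the measured implementation.

```java
// A cocktail (bidirectional bubble) sort sketch: the forward pass bubbles
// the largest remaining value to the back, the backward pass carries the
// smallest remaining value to the front.
class CocktailSortSketch {
    public static void sort(int[] a) {
        int lo = 0, hi = a.length - 1;
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            for (int i = lo; i < hi; i++)           // forward pass
                if (a[i] > a[i + 1]) { swap(a, i, i + 1); swapped = true; }
            hi--;                                   // largest now in place
            for (int i = hi; i > lo; i--)           // backward pass
                if (a[i - 1] > a[i]) { swap(a, i - 1, i); swapped = true; }
            lo++;                                   // smallest now in place
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

The shrinking `lo`/`hi` window is what distinguishes this from plain Bubble Sort: both ends of the array settle simultaneously.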
Expanding on the performance of Insertion Sort, we turn our attention to Gnome Sort, an algorithm proposed in 2000 that finds the first place where two adjacent elements are in the wrong order and swaps them. It takes advantage of positional swapping, tracking its index and swap condition as it goes, which makes it a candidate for data whose disorder is concentrated in the middle. Mathematically, Gnome Sort's performance ranges between O(n) in the best case and O(n²) in the worst. Given those bounds, we might expect the algorithm to perform quite poorly in practice. However, thanks to its relative positioning and its swap condition, it processes the pseudo-random ascending data set in 0.04 ms, our best time thus far. This is particularly striking given its average total of 6,638,763 swaps. On pseudo-random descending data, the result is 15.93 ms and 13,336,669,647 average swaps; on purely random data, 10.25 ms and an average of 26,441,664,845 swaps. Although simple in form, the slowdown is evident. It can be said that Gnome Sort executes extraordinarily well on ascending data; with regard to descending and random data, however, it falls short of the other algorithms.
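The swap-and-step-back behavior can be sketched in a few lines; as before, the names are ours and the sketch is for illustration.

```java
// A gnome sort sketch: walk forward until two adjacent elements are out
// of order, swap them, then step back to re-check the previous pair.
class GnomeSortSketch {
    public static void sort(int[] a) {
        int i = 0;
        while (i < a.length) {
            if (i == 0 || a[i - 1] <= a[i]) {
                i++;                        // in order: move forward
            } else {
                int t = a[i]; a[i] = a[i - 1]; a[i - 1] = t;
                i--;                        // out of order: swap, step back
            }
        }
    }
}
```

On already-ascending input the `else` branch never fires, so the loop is a single O(n) scan, which matches the strong ascending result above.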
We finish our testing with a simple but well-known sorting algorithm. Although less efficient at handling large amounts of data, Insertion Sort offers simplicity as well as efficiency on small inputs. It works by taking each new entry and swapping it backward past the larger values before it. A simple check lets the sort continue only while there is more data to examine and the current entry is still out of place. Insertion Sort's performance ranges between O(n) in the best case and O(n²) in the worst. This is reflected in our average results. On pseudo-random ascending data, Insertion Sort posts a fast 0.02 ms with 7,109,049 swaps. On pseudo-random descending data, however, it posts a slow 16.38 ms with 14,447,781,021 swaps. Finally, on random data sets, Insertion Sort takes on average 8.74 ms and a total of 28,629,402,049 swaps. It is quite evident that Insertion Sort's strength lies in ascending data.
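The backward-swapping behavior with its guard condition can be sketched as below; the names are illustrative.

```java
// An insertion sort sketch: each new entry is swapped leftward past the
// larger entries already sorted behind it. The inner-loop condition is
// the "checker": stop when out of data or when the entry is in place.
class InsertionSortSketch {
    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            for (int j = i; j > 0 && a[j] < a[j - 1]; j--) {
                int t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;
            }
        }
    }
}
```

On ascending input the inner loop exits immediately for every `i`, giving the O(n) best case reflected in the 0.02 ms result.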
The question we raised at the beginning of this analysis can now be answered with more context and fluidity. Indeed, every algorithm has the potential to be engineered into practice; but, as with any method, there comes the matter of choice. If we take a closer look at our illustrations, we can see clearly when to use a particular sorting algorithm.
We begin with sorting times on pseudo-random ascending data. As we can see, performance time favors the simpler algorithms, those whose construction requires neither extra memory nor complicated code. It is clear that both Insertion Sort and Gnome Sort (a variation of Insertion Sort) dominate the field. It may come as a surprise to the reader that the complicated divide-and-conquer algorithms fare the worst, yet this is to be expected both in theory and in practice: the logic behind Merge Sort lies in recursively reorganizing out-of-order data, so when the data is already ascending, its overhead and extra memory are pure cost. It can be concluded, to a certain extent, that simple algorithms may be the choice of experienced programmers when dealing with ascending data. Let us now compare the average time with the average swaps.
It is quite humorous to see that the tables have turned. Although Insertion Sort executed the fastest, it swapped values the most. In the end, Heap, Shell, Quick, and Merge Sort have the upper hand with the lowest average swap counts. This raises the question of whether these results affect the final decision of use. It depends. As our other comparisons will show, an algorithm may perform extraordinarily fast yet require an extensive amount of comparison overhead. Because of this, we would also need to consider the conditions of memory and data type. As stated before, these contingencies affect the decision of an experienced programmer.
Let us now analyze our sorting algorithms in light of descending sorting times.
As we can see, the more complex and elegant algorithms fare far better in execution than the simplistic ones. It should be highlighted that Shell Sort is considered a simple algorithm that deals with calculated gap sequencing, and by this method it sorts a descending data set faster than Heap Sort. This is not to say that Shell Sort is better than Heap Sort or the two divide-and-conquer algorithms, but it does demonstrate a powerful implication: not all sorting algorithms need to be complex to sort large sets of data. It is obvious, though, that the complex algorithms handle descending data far better than ascending. If we take a look at our next illustration, the ranking of average swaps matches the ranking of average sorting times; in other words, average swaps and time follow the same pattern. It is interesting to note that Heap Sort records the fewest swaps. This is of course due to building a low-cost priority queue before sorting the heap itself, which yields not only a faster time but huge improvements in the overhead required for swaps. Overall, the heap-based, divide-and-conquer, and Shell Sort approaches prove to be the prime choices for organizing descending data.
Let us now analyze our last data set: random. Random data is a particularly honest test that truly measures the efficiency of an algorithm. Keep in mind that the data presented is completely random; it has no ascending or descending origin. As a result, it is easy to see which algorithm handles a truly random set of data. In our tester, it was evident that Shell Sort and Heap Sort were the clear winners. It is no surprise that Heap Sort would be considered one of the fastest, given its priority-queue advantage; the same idea applies to Shell Sort, though through gap sequencing instead. Furthermore, it is notable that the only divide-and-conquer algorithm to show favorable results is Quick Sort. This may be because Merge Sort's recursive base case is reached only when a subarray is reduced to a single element and then rejoined further up the stack; in theory, this overhead and extra memory force the CPU to process the stack and add substantial time to the sorting process. Among our simpler algorithms, Insertion Sort finishes in exceptional time. Ultimately, Shell Sort and Heap Sort are the clear winners.
So where does this leave us? In many ways, the analysis leaves us with the understanding that every algorithm has significant advantages as well as disadvantages; no sorting algorithm is the prime choice in every situation. Indeed, by sorting three differently ordered sets of data, it was evident that the results do not match. For that reason, we can draw a valid direction for deciding which sorting algorithm to implement in our projects. When handling ascending data, a simple approach was quicker than a more complicated solution; when handling descending data, complexity was the prime solution, and the same held for random data. In conclusion, any algorithm is only as good as it is written and deployed. By keeping our data in mind, the reader can truly come to understand which circumstances call for which sorting algorithm. Nevertheless, no matter which algorithm an experienced programmer chooses, it is evident that all sorting algorithms play a major role in the art of computer programming.