# Which requirements favour which implementation?

Abstract—This paper will dive in four different implementation of the insertion sort algorithm. The four implementations are FSMD (Finite State Machine with Data path), ASIP (Application-Specific Instruction-set Processor), C (software-based implementation) and Custom IP (hardware-based implementation). The implementation will be described in the first part. The second part will look further and focus on the question which requirements favour which implementation. Here different requirements will be presented and evaluated. In the end we present in which situation developing and deploying of a certain implementation makes most sense.

## I. INTRODUCTION

#### II. INSERTIONSORT

We decided to implement the InsertionSort Algorithm for the project asignment. It is a well-known and simple sorting algorithm. The idea of InsertionSort is similar to "sorting playing cards in your hands. You split the cards into two groups: the sorted cards and the unsorted cards. Then, you pick a card from the unsorted group and put it in the right place in the sorted group." [1]

It is maily used for small or nearly sorted lists. For a nearly sorted list, the time complexity of insertionsort is a O(n) which is the best case. However the average and worst case have both a time complexity of  $O(n^2)$ . Other algorithms like Mergesort or Quicksort are more prevelant. This is due to their lower average time complexity.

InsertionSort is a in-place sorting algorithm which means that its space complexity is at O(n). It requires as much space as the input requires and only fixed additional inputs for some values as

#### III. IMPLEMENTATIONS

This section will focus on our four different implementations of the In. We performed the implementation in the same order as they are listed in the next sections. The decision was made due to feeling most comfortable with designing and implemented an FSMD. This decision turned out to be very beneficial. In the ASIP implementation we based our work on the ASMD diagram created in the FSMD implementation. In the creation of our custom IP block, we started of by using the same code which we used in our C-program.

All implementations were done by the whole group and strictly after one another. We did not work on the next implementation before having a working version of the previous implementation.

## A. FSMD

We started with a finite state machine with data path (FSMD) diagram as well as an algorithm state machine with

datapath (ASMD) diagram before we started the implementation. We started with designing the FSMD. There, we choose the parts that are needed such as a RAM, receiving and transmission unit as well as comparators and registers. Based on that we defined our data path. In order to keep the implementation simple and strict to the concept, we avoided sending any data to the control path. We exchange only control signals. In the end we have nine input signals for our control path and twelve output signals that handle the dataflow in the data path.

After coming up with a FSMD we defined an ASMD. Here we devided our data flow into different states. Each state is one clock cycle long and therefore in- and output singals can only have represent value in each state. In total there are ten states. State S0 resets the FSMD to start the sorting anew. S1 receives and saves the integers from putty via a serial port to the RAM. The Enter character will trigger the transition to start states S2 to S7. Here the received numbers are sorted iteratively. Unsorted numbers will be compared to the already sorted number sequence and then inserted at the right place. This will be repeated until there are no unsorted numbers left and every number is at their right place. Again the transition to the next states is triggered by finding an Enter character. State 8 and 9 are responsible to send out the sorted numbers one by one via the UART\_tx component.

After designing our diagrams we started the implementation in Vivado. We used the VHDL files of the selectionSort as a base project and adapted the fsm.vhd and insort\_top.vhd file. The fsm.vhd file is defining the states and also controling the flow between the states. The insort\_top.vhd file defines all individual components (like registers, RAM, uart\_rx, uart\_tx), their input and output signals as well as the routing of these signals to other components. In each state the signals are set acording to the state box in the ASMD diagram. To implement the decision boxes we included new functionalities for the five comparators within the insort\_top.vhd file. Further the address mux and the data mux are also implemented in the insort top.vhd file.

After the initial implementation, we tested our functionality by using the waveform. Errors, that were found while debugging, were corrected in the code as well as in the diagram.

After we had our first working version, we refactored both the code and the diagrams to ensure that both are not overcomplicated.

#### B. ASIP

Second implementation was the application-specific instruction set processor, which we implemented using the states described in the ASMD chart. This allowed us to stay consistent



Fig. 1. FSMD diagram for the insertion sort

with our design choices as well as keep a structred overview of the ASIP implementation. We used the one-cycle cpu design as a template for the ASIP design and modified it by changing some core features. This included adding additional components and control signals.

In this implementation we start by initilizing registers. We then enter a loop until the condition rx\_done is met. Afterwards data is stored into memory as we look through the data we recieved. We use intermediary conditional branch statements for the main logic. This is achieved by using store, load and clear statements to sort the data as we gradually make our way through the input data. Once the end of the input data is detected tx\_done is set which signals that the end of data transmission.

Since the design of the ASIP was built upon the logic of the ASMD we needed to modify the opcodes to include the instructions we required for our implementation. The required opcode instructions were more than originally designed, and therefore the instruction address register had to have its size increased.

The new opcodes we required were used for branching, writing to the UART, and looping. For the conditional branching, all we required was Branch if Equal, Branch if Not Equal, and Branch if Greater than. In our implementation, we use these conditional operations to create loops. We also required the use of jumps, as in our ASMD diagram we needed to access earlier parts of our states frequently. We use our LOP\_UART\_DONE to loop until we meet the condition for our signal indicator rx\_done.

The UART itself is a component in our structure that is used for recieving and transmission of data. The main functionality of our UART can be summerized in the states idle, start, transmit and stop. The signal indicator rx was key when working with the logic running the UART.

## C. C coding in Vitis

Next up, we definied the C program. We setup the project - as we have done it in the lecture week - based on the roadmap of Eric Peronnin. This was done to start of a working version which can always go back to when encoutering errors. It



Fig. 2. ASMD diagram for the insertion sort

proved to be valuable since the debugger output is sometimes not easily understandble. We starte by removing the logic for the LEDs both in Vitis and afterwards in Vivado by removing the Blocks for the LEDs in the design. We then copied and basic insertion sort algorithm implementation in c from geeksforgeeks[1]. Next we changed the input and output behavior. After some debugging with "print" statements, we managed to get the code working quite fast. Most of the debugging time was spend on issues with pointers. Since no one in the group has comprehensive experience in c, some errors like that were to be expected. We verified that our final version behaved on the Zybo board exactly the same as the FSMD and ASIP implementation did on the Basys board. Once we made sure it behaved as expected we moved on to the next implementation.

## D. Create custom IP block

Lastly, we implemented the IP block. We expected this to be very similar to the C porgram. As we have do it in the project before, we wanted to start based on some working version. Therefore, we implemented the FIR Filter Project again.

We encountered the most issues in this implementation. The first issue we encountered was in the udpating of the platform and application components in Vitis. They seem to have a cache which is not cleared automatically when another component is deleted. So when we made changes in our hls component, created a new xsa file and updated the platform, we encountered the same behaviour as before. Only deleting the platform and application component in Vitis, fully seemed to clear any cached information.

Further, issues were all based on the connection between our application component and our hls/IP component. As described we started of with the fir filter implementation. Since the fir filter implementation exchanges only single integer values, we wanted to modify it into sending arrays. An array cannot be set as an input for a function, so we used pointers. The values of the pointers were exchanged between these two components. Any further data was not copied. Next, we tried a different angle by exchanging a struct between these two components. The issues with this idea were in the application. We did not manage to find out how to use custom data types with the TODO API library from AMD. We also saw the option that it might be possible to share ram between these two parts. We opted to switch back to the fir

implementation and instead of using an array to exchange the data, we send the data one by one via the existing working exchange of integer values. We managed to build a working version around that idea. After confirming that we have a working version, we refactored the code and finished that part of the implementation.

Most of the time for this implementation was spend on the issues that we had. The implementation of the sorting function and the implementation in the end were quite similar to the c-code implementation. We estimate that we spend 75% of our work in this implementation with dealing with the issues. This can be atributed to the both the lack of knowledge of the tools and libraries as well as a worse debugging experience which explain later in (TODO link to subsection). Due to these issues, we tried a lot of different approaches as well as using ChatGP(TODO footnote). The solutions from ChatGPT did not prove to be valuable for us. Either it referred us to solution which worked in c but not with the interface between the two components or it created solutions which were overly complicated.

#### IV. COMPARISON

- A. Design and Implementation Complexity
- B. Debugging and Verification Complexity

The debugging strategies and the complexity of debugging varied greatly in the different implementations. In both the FSMD and the ASIP, we used the simulation in Vivado. In both cases we had to customize the signals that we want to simulate. The difficulty with this debugging lies in finding the first unexpected or abnormal change. In both cases we used both our diagrams to go through the flow and compare our expected values to the simulated values until we found the outlier. This results in two or even three people doing the debugging. Since we worked as a team on this task, this did not present any resource issue. Nevertheless this form of debugging felt slower and more complicated compared to bugging python code in a modern IDE in which you can mark the line of code, to which you want to jump. Although being a slow process, it still proved to be an effective tool to find any mistakes that we made. We managed to find all our mistakes with this form of debugging. One benefit of this debugging method is that you do not have to know where you want to debug. Since you get many singals at once, the unnormal behaving signal is very likely included. This reduces the amount of restarting the debugging procedure itselfs.

For the c-program we used print-statements for debugging. We viewed the output directly via the serial connection in putty. This form of debugging was quite familiar to all team members. It was intiutive and faster compared to the method above, since we did not have to search for the data. However the print statements have to be included in various places and also have to be deleted afterwards. The code is therefore changed for debugging. Despide this downside we managed to debug quite a bit faster than the first method.

Even though the IP block is also code which is written in c, the

debugging experience is quite different and more complex than in the method before. Since there are seperated components, the complexity increases and there are more things which can work or not work and need therefore debugging. The building of the hls component and afterwards the c-simulation could be debugged as normal c-code could be debugged. The difficulties start at the RTL simulation. The RTL simulation could fail without giving good indications of the reasons for it. This could be due to issues in transfering data or because of not sending data or for running into a loop which was not cought in the c-simulation. As far as we could see, there is not even a debugging tool for it. Our solution was to uncomment sections of the code until we find the section which causes the issue. This is complex and prone to errors since the code logic is heavily effected by the debugging. When you RTL simulation worked, it produced a waveform file. However this waveform file was more complex and difficult to understand since. There are many signals in the waveform file to represent the logic of the c code. However they are auto generated and therefore have unintuitive names. We tried debugging with this but could not make any use of it. The next point of debugging is in the application component. However at this point there are many possible areas in which the bug itself could be located. This means that we can print the values which we send and receive of the hls component but depending on the results, we can not located the bug clearly. Due to the lack of knowledge of this interface between the components, we made many mistakes with the interface itself which were neither located in one or the other component and were therefore difficult to debug. The different implementations result in various different debugging experiences and required skills for debugging. We spend the most time for debugging on the last implementation which was unexpected. The complexity of the interaction between the various different parts resulted in a complex debugging process. Further the lack of tools for debugging this complex situation, resulted in a slow process. So the result is that the third implementation was by far the fasted to bebug. Both the first and the second implementation took some time to debug but it was still manageable. The last implementation was very difficult to debug and it was unpredictable how fast an error could be found. This should be taken into the cost

C. Scalability, Flexibility and Reusability

consideration for the implementation.

- D. Other comparing methods
- E. Cost (Development and Production)

.gniggubed fo mrof siht htiihw sekdedcivorp siht htiw sekatsim ruo lla dnif ot deg geanamma eW .edam ew tahtw sekatsim yna dnif ot loot evitceffe na eb ob ot devorp llits ti ,ssecor

from IP block: Additionally, the testbench simulation was not provide the same results as the real tests. Some of our implementations worked within the testbench but not with the real implementation. Lastly, debugging of the file was very difficult. Vitis did perform well as a debugging software and as such many bugs and errors proved to be difficult to identify leading to different iterations of the implementation. The waveform file was difficult to interpret as it did not keep our variable names and print statements could not always be displayed on the terminal.

#### ACKNOWLEDGMENT

The preferred spelling of the word "acknowledgment" in America is without an "e" after the "g". Avoid the stilted expression "one of us (R. B. G.) thanks ...". Instead, try "R. B. G. thanks...". Put sponsor acknowledgments in the unnumbered footnote on the first page.

## REFERENCES

## REFERENCES

[1] "Insertion sort algorithm," *GeeksforGeeks*, 7.03.2013. [Online]. Available: https://www.geeksforgeeks.org/insertion-sort-algorithm/.