Commit
report - add most of embedded and sopc
Showing 9 changed files with 3,512 additions and 29 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,25 +1,371 @@ | ||
\chapter{SoPC} | ||
------------------------ | ||
|
||
|
||
\section{Avalon Memory Mapped Interface} | ||
The Avalon Memory Mapped (Avalon MM) interface is used throughout the project to connect | ||
blocks to memory elements and to the SoPC interconnect. An example Avalon MM waveform | ||
is shown in Figure \ref{figure:avalonmm}. | ||
\begin{figure}[h!] | ||
\begin{center} | ||
\includegraphics[width=\textwidth]{avalonmm} | ||
\caption{Example waveforms of an Avalon MM bus} | ||
\label{figure:avalonmm} | ||
\end{center} | ||
\end{figure} | ||
|
||
|
||
The master can issue both reads and writes, to which a slave then responds. A read | ||
is initiated by setting the desired address and byte enables and asserting the \texttt{read} | ||
signal. The master can continue issuing reads to different addresses as long as the \texttt{waitrequest} | ||
signal stays low. If the Avalon MM slave asserts the \texttt{waitrequest} signal, the master must | ||
hold the current signals steady for as long as \texttt{waitrequest} remains asserted. | ||
The Avalon MM slave replies to the read by asserting the \texttt{readdatavalid} signal | ||
and providing the requested read data on the \texttt{readdata} bus. | ||
\\ | ||
|
||
Similarly, a write is initiated by the Avalon MM master by setting the desired address, byte enables | ||
and write data, and asserting the \texttt{write} signal. The master can assume that the write has completed | ||
without waiting any further unless the \texttt{waitrequest} signal is asserted, in which case, | ||
as with reads, the master must hold the signals steady until \texttt{waitrequest} is deasserted. | ||
\\ | ||
|
||
% XXX: some simulation waveforms? | ||
|
||
|
||
\newpage | ||
\section{Overview} | ||
A hybrid software and hardware approach was chosen | ||
to simplify the required hardware. To achieve this, a system on a programmable chip | ||
(SoPC) generated by the Altera QSys software was used. | ||
\\ | ||
|
||
Figure \ref{figure:sopc_overview} shows an overview of the SoPC module. The system consists of a Nios II/f core | ||
and a number of peripherals interconnected via the QSys (Merlin) Network-on-Chip | ||
interconnect. | ||
|
||
\begin{figure}[h!] | ||
\begin{center} | ||
\includegraphics[width=\textwidth]{sopc-fig} | ||
\caption{Overview of the SoPC} | ||
\label{figure:sopc_overview} | ||
\end{center} | ||
\end{figure} | ||
|
||
The system is clocked by a 100 MHz clock generated by a PLL. A 10 MHz clock | ||
is also generated, which clocks the GPIO controller and the LCD controller. The | ||
system consists of both third-party and custom IP cores. | ||
\\ | ||
|
||
The GPIO controller provides a General Purpose I/O interface to the operating system. It | ||
is connected to the configuration-related pins of the slave FPGA: \texttt{nCE}, \texttt{nSTATUS}, | ||
\texttt{CONF\_DONE} and \texttt{nCONFIG}. | ||
\\ | ||
|
||
Two SPI master controllers are also included in the system. Both run the SPI | ||
interface at a conservative 10 MHz clock frequency. SPI0 is connected to the SD card, while SPI1 | ||
connects to the SPI Flash on the slave FPGA board. | ||
\\ | ||
|
||
Serial connectivity is provided by the JTAG UART and UART modules. The JTAG UART | ||
provides a serial UART interface over the USB JTAG connection for use with the custom Nios II software | ||
tools on a host PC. The UART module provides a regular serial interface to the on-board | ||
RS-232 connector. The controller implements the RX and TX signals as well as the flow-control | ||
signals CTS and RTS. | ||
\\ | ||
|
||
The two main memory interfaces are an SDRAM controller interfacing to the on-board 128 MB | ||
SDRAM, and a flash controller interfacing to the on-board 8 MB CFI Flash. | ||
\\ | ||
|
||
An Ethernet interface is provided by the Altera Triple-Speed Ethernet (TSE) MAC module. The | ||
CPU interfaces to the TSE MAC via two Scatter-Gather DMA controllers to maximize throughput. The | ||
MAC uses an RGMII interface to connect to the on-board Ethernet PHY. The RX clock to the MAC | ||
initially used a 0 degree phase shift. This resulted in a large number of dropped packets and | ||
hence TCP retransmissions. Increasing the phase shift to 90 degrees significantly improved | ||
the receive performance. Another large improvement in transmit and receive reliability and | ||
throughput was achieved by disabling the statistics counters in the MAC. | ||
\\ | ||
|
||
The system also includes a timer module for use with the operating system, and a sysid module | ||
which provides a programmable ID and the synthesis date of the system to the operating system. | ||
\\ | ||
|
||
A custom module (SRAM Bridge) is used to interconnect the CPU with the internal SRAM synchronizing | ||
arbiter (\texttt{sram\_arb\_sync}). It is only a bridge exposing a memory range as an external | ||
Avalon MM master interface. | ||
\\ | ||
|
||
Further custom modules interface with other on-chip peripherals. The Test Runner module | ||
interfaces the tester and test controller with the system. The ADC interface provides a control | ||
interface for the on-chip ADC controller. | ||
\\ | ||
|
||
A custom frequency counter module takes an external signal bus and can measure the frequency | ||
of any of the incoming signals. | ||
\\ | ||
|
||
Table \ref{table:memorymap_cpu} shows the memory map as seen from the CPU. | ||
|
||
\begin{table}[h!] | ||
\centering | ||
\begin{tabular}{ | l | l | } | ||
\hline | ||
Address Range & Peripheral\\ | ||
\hline | ||
\texttt{0x00000000 - 0x07ffffff} & SDRAM \\ | ||
\hline | ||
\texttt{0x08001000 - 0x08001fff} & TSE MAC SGDMA Descriptor Memory \\ | ||
\hline | ||
\texttt{0x08002800 - 0x08002fff} & JTAG Controller \\ | ||
\hline | ||
\texttt{0x08003000 - 0x080033ff} & Triple-Speed Ethernet MAC \\ | ||
\hline | ||
\texttt{0x08003400 - 0x0800343f} & TSE MAC SGDMA (TX) \\ | ||
\hline | ||
\texttt{0x08003440 - 0x0800347f} & TSE MAC SGDMA (RX) \\ | ||
\hline | ||
\texttt{0x08003480 - 0x0800349f} & Timer \\ | ||
\hline | ||
\texttt{0x080034a0 - 0x080034af} & PLL \\ | ||
\hline | ||
\texttt{0x080034b0 - 0x080034b7} & JTAG UART \\ | ||
\hline | ||
\texttt{0x0a000000 - 0x0a7fffff} & CFI Flash \\ | ||
\hline | ||
\texttt{0x0b000010 - 0x0b00001f} & LCD Controller \\ | ||
\hline | ||
\texttt{0x0b000200 - 0x0b00020f} & GPIO (nSTATUS) \\ | ||
\hline | ||
\texttt{0x0b000210 - 0x0b00021f} & GPIO (CONF\_DONE) \\ | ||
\hline | ||
\texttt{0x0b000220 - 0x0b00022f} & GPIO (nCONFIG) \\ | ||
\hline | ||
\texttt{0x0b000230 - 0x0b00023f} & GPIO (nCE) \\ | ||
\hline | ||
\texttt{0x0ba00000 - 0x0ba00007} & SYS ID \\ | ||
\hline | ||
\texttt{0x0ba10000 - 0x0ba1001f} & SPI 0 \\ | ||
\hline | ||
\texttt{0x0ba20000 - 0x0ba2001f} & SPI 1 \\ | ||
\hline | ||
\texttt{0x0c000000 - 0x0c1fffff} & SRAM (SRAM Bridge) \\ | ||
\hline | ||
\texttt{0x0d000000 - 0x0d0000ff} & Test Runner \\ | ||
\hline | ||
\texttt{0x0d100000 - 0x0d10001f} & UART \\ | ||
\hline | ||
\texttt{0x0d200000 - 0x0d2003ff} & Frequency Counter \\ | ||
\hline | ||
\texttt{0x0d300000 - 0x0d3000ff} & ADC Interface \\ | ||
\hline | ||
\end{tabular} | ||
\caption{Memory map of the SoPC as seen from the CPU data master} | ||
\label{table:memorymap_cpu} | ||
\end{table} | ||
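As a sanity check on the memory map, the following C sketch decodes an address into the window that contains it and computes window sizes. It is only an illustration: the type \texttt{region} and the functions \texttt{region\_size} and \texttt{decode} are hypothetical names, and only a subset of the rows of Table \ref{table:memorymap_cpu} is copied in. The address values themselves are taken from the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One window of the CPU data master's address space (inclusive bounds). */
typedef struct {
    uint32_t first;
    uint32_t last;
    const char *name;
} region;

/* Subset of the memory map table; values copied from the table. */
static const region memory_map[] = {
    { 0x00000000u, 0x07ffffffu, "SDRAM" },
    { 0x08003000u, 0x080033ffu, "Triple-Speed Ethernet MAC" },
    { 0x0a000000u, 0x0a7fffffu, "CFI Flash" },
    { 0x0c000000u, 0x0c1fffffu, "SRAM (SRAM Bridge)" },
    { 0x0d000000u, 0x0d0000ffu, "Test Runner" },
    { 0x0d100000u, 0x0d10001fu, "UART" },
};

/* Size of a window in bytes. */
static uint32_t region_size(const region *r)
{
    assert(r->last >= r->first);
    return r->last - r->first + 1u;
}

/* Returns the window containing addr, or NULL if the address is unmapped
 * (with respect to the subset listed above). */
static const region *decode(uint32_t addr)
{
    for (size_t i = 0; i < sizeof memory_map / sizeof memory_map[0]; i++)
        if (addr >= memory_map[i].first && addr <= memory_map[i].last)
            return &memory_map[i];
    return NULL;
}
```

The window sizes agree with the capacities given in the text: the SDRAM window spans 128 MB and the CFI Flash window spans 8 MB.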
|
||
|
||
\newpage | ||
\section{Nios II} | ||
A Nios II/f is the application processor in the SoPC. The Nios II/f is a 32-bit | ||
RISC processor (with a MIPS-like instruction set) with a branch predictor, barrel shifter, hardware multiplication and division, | ||
instruction and data caches and an MMU. | ||
\\ | ||
|
||
The Nios II/f was chosen in particular for its MMU. The lower-end Nios II/e and | ||
Nios II/s cores do not support an MMU. With an MMU, and hence virtual memory, it is possible | ||
to allocate large buffers of which only the pages actually used are backed by physical memory. For example, | ||
a 1 MB buffer is not backed by physical memory immediately; each page is backed only when it is | ||
first used. Another important advantage of an MMU is memory protection: faults in userland, such as | ||
segmentation faults, can be caught and recovered from. | ||
\\ | ||
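The lazy backing of large buffers can be illustrated from userland with an anonymous memory mapping. The sketch below is not taken from the project; it assumes a Linux or other POSIX system that provides \texttt{mmap} with \texttt{MAP\_ANONYMOUS}, and the helper names \texttt{reserve\_buffer} and \texttt{touch\_pages} are hypothetical. The mapping reserves address space only; the kernel typically assigns physical pages when a page is first written.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve `bytes` of zero-filled virtual memory without touching it.
 * Returns NULL on failure. */
static unsigned char *reserve_buffer(size_t bytes)
{
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (unsigned char *)p;
}

/* Write one byte into each of the first `count` pages of the buffer,
 * so that those pages (and only those) need backing memory.
 * Returns the number of pages actually touched. */
static size_t touch_pages(unsigned char *buf, size_t bytes, size_t count)
{
    assert(buf != NULL);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t touched = 0;
    for (size_t off = 0; off < bytes && touched < count; off += page) {
        buf[off] = 1;
        touched++;
    }
    return touched;
}
```

For a 1 MB buffer of which only the first four pages are written, the remaining pages stay untouched and still read back as zero if they are accessed later.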
|
||
To improve the performance of the system with an MMU, a Translation Lookaside Buffer (TLB) of 128 entries was | ||
also included. | ||
\\ | ||
|
||
During the initial stages of development, the CPU was used with a Level 3 debug module, which includes a number | ||
of hardware breakpoints and watchpoints. This was later reduced to a Level 1 debug module which only includes | ||
a small JTAG controller. | ||
\\ | ||
|
||
The reset vector of the CPU points to the first location of the CFI Flash, so that the CPU boots up from | ||
the non-volatile Flash memory. The exception vectors are located in the SDRAM, where the OS will be located | ||
as soon as the bootloader has loaded it. | ||
\\ | ||
|
||
Initially, small caches of just 4 kB (data) and 8 kB (instruction), with a line size of 32 bytes, were chosen. | ||
Given the large amount of unused resources on the FPGA, the | ||
cache sizes were increased to 32 kB and 64 kB respectively. Table \ref{table:benchmark} shows benchmark results | ||
with both cache sizes. The scores improve by roughly 26\% to 33\% for Dhrystone, Numeric Sort and String Sort, and by about 3\% for Bitfield. In the case of Dhrystone, the complete | ||
benchmark fits into the larger caches. | ||
|
||
|
||
\begin{table}[h!] | ||
\centering | ||
\begin{tabular}{ | l | r | r | } | ||
\hline | ||
Benchmark & Score (small caches) & Score (large caches)\\ | ||
\hline | ||
Dhrystone & 92165.9 & 119617.2 \\ | ||
\hline | ||
BYTEmark Numeric Sort & 26.574 & 33.36 \\ | ||
\hline | ||
BYTEmark String Sort & 1.3667 & 1.8123 \\ | ||
\hline | ||
BYTEmark Bitfield & 5.6997e+06 & 5.8613e+06 \\ | ||
\hline | ||
\end{tabular} | ||
\caption{Benchmark results for small and large caches respectively} | ||
\label{table:benchmark} | ||
\end{table} | ||
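The relative improvement for each benchmark follows directly from the two score columns. The short C sketch below computes it from the values in Table \ref{table:benchmark}, assuming that a higher score is better; the names \texttt{result}, \texttt{improvement\_percent} and \texttt{print\_improvements} are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One row of the benchmark table. */
typedef struct {
    const char *name;
    double small_caches;   /* score with the 4 kB / 8 kB caches   */
    double large_caches;   /* score with the 32 kB / 64 kB caches */
} result;

/* Scores copied from the benchmark table. */
static const result results[] = {
    { "Dhrystone",             92165.9,   119617.2  },
    { "BYTEmark Numeric Sort", 26.574,    33.36     },
    { "BYTEmark String Sort",  1.3667,    1.8123    },
    { "BYTEmark Bitfield",     5.6997e+06, 5.8613e+06 },
};

/* Relative improvement of `large` over `small`, in percent. */
static double improvement_percent(double small, double large)
{
    assert(small > 0.0);
    return (large / small - 1.0) * 100.0;
}

/* Print one line per benchmark with its relative improvement. */
static void print_improvements(void)
{
    for (size_t i = 0; i < sizeof results / sizeof results[0]; i++)
        printf("%-22s %+.1f%%\n", results[i].name,
               improvement_percent(results[i].small_caches,
                                   results[i].large_caches));
}
```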
|
||
|
||
|
||
\newpage | ||
\section{Test Runner} | ||
The test runner module provides a memory-mapped interface to control the external | ||
tester module. Figure \ref{figure:tr_blackbox} shows an overview of the signals of this core. | ||
\begin{figure}[h!] | ||
\begin{center} | ||
\includegraphics[width=0.4\textwidth]{tr} | ||
\caption{Black box overview of Test Runner} | ||
\label{figure:tr_blackbox} | ||
\end{center} | ||
\end{figure} | ||
|
||
|
||
The Avalon MM Slave interface connects to the SoPC and provides a convenient | ||
memory-mapped interface to its internal registers. The interrupt sender interface | ||
connects directly to the interrupt controller of the CPU. | ||
\\ | ||
|
||
The peripheral bus consists of three signals: a \texttt{busy} signal, which feeds into | ||
the SRAM Arbiter (\texttt{sram\_arb\_sync}) as one of its select signals, and an \texttt{enable} | ||
signal and a \texttt{done} signal, which connect to the \texttt{tester} module. | ||
\\ | ||
|
||
Table \ref{table:trunner_memorymap} shows the memory map of the core. The system can read an ID from the | ||
ID register (which always contains the hexadecimal value 0x0a) to verify that | ||
the device is responding. A write to the enable register will assert the peripheral | ||
\texttt{enable} line for one cycle. A read of the done register returns the value | ||
of the \texttt{done} peripheral signal. | ||
\\ | ||
|
||
When a rising edge occurs on the \texttt{done} signal, the IRQ register is written | ||
and the IRQ line to the CPU is asserted. As soon as the IRQ register is read, it is | ||
automatically cleared and the IRQ line is deasserted. | ||
\\ | ||
|
||
The \texttt{busy} signal is asserted between the rising edge of the \texttt{enable} signal | ||
and the rising edge of the \texttt{done} signal. | ||
|
||
|
||
\begin{table}[h!] | ||
\centering | ||
\begin{tabular}{ | l | l | } | ||
\hline | ||
Address & Description \\ | ||
\hline | ||
\texttt{0x0a} & IRQ Register \\ | ||
\hline | ||
\texttt{0x7f} & ID Register \\ | ||
\hline | ||
\texttt{0x80} & Done Register \\ | ||
\hline | ||
\texttt{0x81} & Enable Register \\ | ||
\hline | ||
\end{tabular} | ||
\caption{Relative memory map of the test runner module} | ||
\label{table:trunner_memorymap} | ||
\end{table} | ||
|
||
|
||
\newpage | ||
\section{ADC Interface} | ||
The ADC interface has exactly the same interfaces as the Test Runner module, as shown | ||
in Figure \ref{figure:adc_if_blackbox}. The only difference is its memory map, | ||
which is shown in Table \ref{table:adc_memorymap}. | ||
|
||
\begin{figure}[h!] | ||
\begin{center} | ||
\includegraphics[width=0.4\textwidth]{adc_if} | ||
\caption{Black box overview of the ADC Interface} | ||
\label{figure:adc_if_blackbox} | ||
\end{center} | ||
\end{figure} | ||
|
||
\begin{table}[h!] | ||
\centering | ||
\begin{tabular}{ | l | l | } | ||
\hline | ||
Address & Description \\ | ||
\hline | ||
\texttt{0x0a} & IRQ Register \\ | ||
\hline | ||
\texttt{0x0f} & ID Register \\ | ||
\hline | ||
\texttt{0xa0} & Done Register \\ | ||
\hline | ||
\texttt{0xb0} & Enable Register \\ | ||
\hline | ||
\end{tabular} | ||
\caption{Relative memory map of the ADC Interface module} | ||
\label{table:adc_memorymap} | ||
\end{table} | ||
|
||
|
||
\newpage | ||
\section{Frequency counter} | ||
%XXX: Pavlos part goes here | ||
|
||
% Pavlos \\ | ||
% mention: synchronizer | ||
|
||
The frequency counter is good. | ||
\newpage | ||
\section{Resource usage} | ||
Table \ref{table:linuxsys_resusage} shows a summary of the FPGA resource usage of the SoPC. The table includes only | ||
the largest modules, and a total of all modules. In total the SoPC uses 17,453 of the | ||
115,200 logic cells available on the FPGA (about 15\%). | ||
\\ | ||
|
||
Particularly interesting is the large number of cells used by the interconnect. About half of these | ||
come from the clock crossing adapters needed to cross between the 100 MHz and 10 MHz clock domains. | ||
|
||
\begin{table}[h!] | ||
\centering | ||
\begin{tabular}{ | p{2cm} | r | r | r | r | } | ||
\hline | ||
Module & Logic cells (comb) & Logic cells (reg) & DSP Elements & Memory bits \\ | ||
\hline | ||
CPU & 3852 & 2822 & 4 & 877,056 \\ | ||
\hline | ||
Interconnect (estimate) & 2566 & 2144 & 0 & 0\\ | ||
\hline | ||
TSE MAC & 2198 & 2701 & 0 & 298,416 \\ | ||
\hline | ||
SG DMAs & 1202 & 1542 & 0 & 2,745 \\ | ||
\hline | ||
SDRAM Controller & 331 & 338 & 0 & 0 \\ | ||
\hline | ||
SPI0 & 111 & 117 & 0 & 0 \\ | ||
\hline | ||
SPI1 & 110 & 115 & 0 & 0 \\ | ||
\hline | ||
Timer & 130 & 120 & 0 & 0 \\ | ||
\hline | ||
UART & 129 & 101 & 0 & 0 \\ | ||
\hline | ||
JTAG UART & 142 & 112 & 0 & 1024 \\ | ||
\hline | ||
Test Runner & 863 & 1036 & 0 & 0 \\ | ||
\hline | ||
ADC Interface & 142 & 140 & 0 & 0 \\ | ||
\hline | ||
Frequency counter & 334 & 298 & 0 & 0 \\ | ||
\hline | ||
\hline | ||
Total & 13754 & 11692 & 4 & 1,218,729 \\ | ||
\hline | ||
\end{tabular} | ||
\caption{Resource consumption by SoPC module (only the largest modules are shown), and total (including all modules). Note that the total logic cell usage is not a sum of the combinational and register logic cells, since some logic cells contain both combinational logic and registers. The total number of logic cells used by the SoPC is 17,453} | ||
\label{table:linuxsys_resusage} | ||
\end{table} |