# Multi-Port Memory with Bi-directional Ports for FPGAs Using **XOR and LVT Methods**

XXX XXX@XXX

#### Abstract

We propose a simple extension to the XOR memories presented in previous work. In this paper we generalize the XOR memory to allow any number of full, read-only and write-only ports. This paper also presents a novel and efficient architecture for creating live value table memory using an XOR-based scheme, emphasizing its bidirectional capabilities.

This is achieved by exploiting the properties of XOR, allowing any data entry to be reconstructed by XORing the corresponding entries from all memory banks. The result is a high-throughput, coherent multi-ported memory that is particularly well-suited for implementation on FPGAs.

We use an XOR memory with full ports to implement a live value table (LVT) design. We evaluate the architecture's performance and resource utilization, showing that it uses significantly less logic and can achieve higher frequencies for deep memory configurations compared to LVT-based designs. This makes the XOR-based bidirectional live value table a compelling alternative for applications requiring high-performance, flexible memory access.

#### **ACM Reference Format:**

XXX. 2025. Multi-Port Memory with Bi-directional Ports for FPGAs Using XOR and LVT Methods. In Proceedings of International Symposium on Field Programmable Gate Arrays (FPGA '26). ACM, New York, NY, USA, 4 pages. https://doi.org/XXXXXXXXXXXXXX

## 1 Motivation

As computation needs keep increasing, one way to keep up has been specialized architectures. FPGAs provide a way to implement architectures without taping out an ASIC. However, the limitations of FPGA resources mean some creativity is needed to map designs to FPGAs. This paper addresses one such limitation: FPGA memories have a limited number of ports. Specifically, we look at creating memories with more than 2 ports.

The major FPGA vendors (AMD[11], Intel[5], Lattice[8], Microchip[9], Achronix[2]) implement distributed memory (small memories) and block memory (large memories) differently; however, they share some characteristics. All vendors support distributed memory configurations with 1 full or write port and between 1 and 3 read ports. All vendors support block memory with 2 full ports. None of the vendors support memories with more than 2 full ports. Although this limitation is problematic for designs requiring multiple ports, particularly write or full ports, we show that these resources make it possible to achieve high-throughput quad and octal full-port memories.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

FPGA '26, Seaside, CA © 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-XXXX-X/2018/06

https://doi.org/XXXXXXXXXXXXXXX

## 2 Source Code

We provide all of the source code used to implement and test our design at https://anonymous.4open.science/r/mpm-7666/. We tested our design with Verilator[10] and implemented the design with Vivado[3].

## 3 Related Work

Several solutions to the port limit on FPGAs exist other than what is presented here, for example multi-pumping and banking. Multi-pumping trades clock speed for ports: the memory is clocked faster than the logic that uses it, so each port runs at a fraction of the memory's frequency. For example, a 300 MHz single-port memory can serve two 150 MHz ports. Banking requires stalls and routing logic due to the segmented memory. Our design is most similar to replication, which ties the write ports of multiple memories together to create additional read ports.

## 4 XOR memory

We propose a simple generalization of the XOR memories presented in previous work[6]. XOR memories rely on the property $a \oplus b \oplus b = a$. We add bidirectional ports and analyze the performance of distributed-memory and block-memory versions of this design. We also present applications for these memories.

The number of RAMs needed is:

$$(W+F)(W+F+R)-W\tag{1}$$

Which expands to:

$$W^2 + 2WF + F^2 + WR + FR - W\tag{2}$$

Where W is the number of write ports, R is the number of read ports and *F* is the number of full ports.
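As a quick sanity check on Equations 1 and 2, the count can be computed both ways and compared. The sketch below is only illustrative; the function names `xor_ram_count` and `xor_ram_count_expanded` are hypothetical and do not come from the released source code.

```python
def xor_ram_count(w: int, f: int, r: int) -> int:
    """Number of RAMs per Equation 1: (W + F)(W + F + R) - W."""
    return (w + f) * (w + f + r) - w


def xor_ram_count_expanded(w: int, f: int, r: int) -> int:
    """Number of RAMs per Equation 2 (the expanded form of Equation 1)."""
    return w * w + 2 * w * f + f * f + w * r + f * r - w


# With only full ports (W = R = 0), the count reduces to F squared.
for ports in (2, 4, 8):
    print(ports, "full ports ->", xor_ram_count(0, ports, 0), "RAMs")

# A mixed configuration: 2 write-only, 2 full and 2 read-only ports.
print("W=2, F=2, R=2 ->", xor_ram_count(2, 2, 2), "RAMs")
```

With only full ports the formula collapses to $F^2$, so each doubling of the port count quadruples the number of RAMs.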

Reading data from an XOR memory simply involves XORing the data at the same address from all of the RAMs in one column. For example, if the RAMs in a column hold the values *A*, *B*, *C* and *D* at address *x*, the data read is $A \oplus B \oplus C \oplus D$.

Writing to the memory involves reading the same address from every row except the writing port's own row (say row/port 2, which holds *C* in the example), XORing those values with the incoming data *E* (in the example this results in $A \oplus B \oplus D \oplus E$), and storing the result in all the RAMs in that row.

The next time that data is read the result will be $A \oplus B \oplus (A \oplus B \oplus D \oplus E) \oplus D$, which equals *E*.

You may notice that writing to a port involves XORing all but one stored value and reading involves XORing all values. This enables full ports to be created just by adding one RAM to what would otherwise be just a write port.
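The read and write rules above can be checked with a small behavioral model. This is a minimal sketch, not the RTL from the released source: the class `XorMemory` and its method names are hypothetical, each row is modeled as a single array (all RAMs in a row hold identical data), and it assumes that no two ports write the same address in the same cycle.

```python
from functools import reduce
from operator import xor


class XorMemory:
    """Behavioral model of an XOR memory with one row per writing port."""

    def __init__(self, num_rows: int, depth: int):
        # All rows start at 0, so every address initially reads as 0.
        self.rows = [[0] * depth for _ in range(num_rows)]

    def read(self, addr: int) -> int:
        # A read XORs the entries at `addr` from every row.
        return reduce(xor, (row[addr] for row in self.rows), 0)

    def write_cycle(self, writes: dict) -> None:
        """Apply one clock cycle of writes: {port: (addr, data)}."""
        addrs = [addr for addr, _ in writes.values()]
        if len(set(addrs)) != len(addrs):
            raise ValueError("two ports wrote the same address in one cycle")
        # Compute every new entry from the old state, then commit together.
        updates = {}
        for port, (addr, data) in writes.items():
            others = reduce(
                xor,
                (row[addr] for i, row in enumerate(self.rows) if i != port),
                0,
            )
            updates[port] = (addr, others ^ data)
        for port, (addr, value) in updates.items():
            self.rows[port][addr] = value


mem = XorMemory(num_rows=4, depth=16)
mem.write_cycle({0: (5, 0xA), 1: (6, 0xB)})  # two ports write in one cycle
mem.write_cycle({2: (5, 0xE)})               # port 2 overwrites address 5
print(mem.read(5), mem.read(6))              # 14 11
```

Because each port only ever modifies its own row, simultaneous writes to different addresses never disturb each other, which is what allows every write port to operate in the same cycle.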

Note that this memory requires that all of the RAMs in a row hold the same data. Initializing the RAMs to the same data (e.g. all 0s) is required for the memory to operate properly. This is not an issue in FPGAs since the memory can be initialized to 0. Because all of the RAMs in a row are written at the same time, they remain identical as long as they start out identical.

Figure 1: XOR Multiport memory. (a) Clock cycle 0.

Table 1: Synthesis results of XOR memory for different port counts

| Ports    | LUTs    | LUTs configured as memory | FF | BRAM | Max Frequency |
|----------|---------|---------------------------|----|------|---------------|
| 2        | 1,492   | 1,216                     | 0  | 0    | 370 MHz       |
| 4        | 6,312   | 5,248                     | 0  | 0    | 317 MHz       |
| 8        | 26,576  | 21,760                    | 0  | 0    | 241 MHz       |
| $16^{1}$ | 107,424 | 88,576                    | 0  | 0    | X MHz         |
| $32^{2}$ | 435,008 | 357,376                   | 0  | 0    | 0 MHz         |

Table 2: Synthesis results of XOR memory for different widths

| Width | LUTs   | LUTs configured as memory | FF | BRAM | Max Frequency |
|-------|--------|---------------------------|----|------|---------------|
| 1     | 1,492  | 1,216                     | 0  | 0    | 370 MHz       |
| 2     | 6,312  | 5,248                     | 0  | 0    | 317 MHz       |
| 4     | 26,576 | 21,760                    | 0  | 0    | X MHz         |
| 8     | 16,544 | 0                         | 0  | 0    | X MHz         |
| 16    | 85,728 | 0                         | 0  | 0    | X MHz         |
| 32    | 85,728 | 0                         | 0  | 0    | X MHz         |

## 5 Analysis of XOR memory

We analyze several configurations of XOR memories. In particular, we vary the number of ports and the width of the memory.

TODO: Analysis of varying the ports.

TODO: Analysis of varying the width.

## 6 Live Value Table Memory

XOR memories can be used by themselves; however, a live value table (LVT) may be more efficient.

We present an LVT memory whose live value table is an XOR memory built from distributed memory.

In [7] the live value table was implemented with registers. Previous work used distributed memory [1]. However, it did not use bidirectional XOR ports in its implementation.

We create an LVT memory using the technique described in [4]. This live value memory is composed of 2-port (full-port) memories. Each pair of ports shares one RAM, so $N(N-1)/2$ RAMs are needed for $N$ ports. See figure (todo cite figure).
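To make the pairwise sharing concrete, the short sketch below enumerates one RAM per unordered pair of ports and compares the total against the all-full-port XOR count from Equation 1. The function name `lvt_ram_pairs` is hypothetical and chosen only for this illustration.

```python
from itertools import combinations


def lvt_ram_pairs(num_ports: int) -> list:
    """One dual-port RAM per unordered pair of ports: N(N-1)/2 in total."""
    return list(combinations(range(num_ports), 2))


for n in (2, 4, 8, 16, 32):
    lvt_rams = len(lvt_ram_pairs(n))
    xor_rams = n * n  # Equation 1 with W = R = 0 and F = n
    print(f"{n:2d} ports: {lvt_rams:3d} LVT data RAMs, {xor_rams:4d} XOR RAMs")
```

For 2, 4, 8, 16 and 32 ports this gives 1, 6, 28, 120 and 496 RAMs, which matches the BRAM column of Table 3.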

The memory gets its name from the live value table: a multi-port memory that tracks, for each address, where the most recently stored value resides. The point of building a multi-port memory on top of another multi-port memory is that wide data (e.g. 32-bit words) can be stored more efficiently this way.

Instead of using a register-based live value table as in [4], we use an XOR memory similar to [1].
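The following behavioral sketch illustrates how such a memory can stay coherent. It rests on assumptions that the text does not spell out: a write is assumed to go to every RAM attached to the writing port, the table is assumed to record which port wrote each address last, and each port performs one access at a time. The class `LvtMemory` is hypothetical, and the table is modeled as a plain list rather than the XOR memory used in our design.

```python
from itertools import combinations


class LvtMemory:
    """Behavioral sketch of an N-port LVT memory built from shared 2-port RAMs."""

    def __init__(self, num_ports: int, depth: int):
        if num_ports < 2:
            raise ValueError("at least two ports are required")
        self.num_ports = num_ports
        # One RAM per unordered pair of ports.
        self.rams = {
            pair: [0] * depth for pair in combinations(range(num_ports), 2)
        }
        # Live value table (assumption): last port to write each address.
        self.last_writer = [0] * depth

    def _ram(self, a: int, b: int) -> list:
        return self.rams[(min(a, b), max(a, b))]

    def write(self, port: int, addr: int, data: int) -> None:
        # Assumption: the writer updates every RAM it shares with another port.
        for other in range(self.num_ports):
            if other != port:
                self._ram(port, other)[addr] = data
        self.last_writer[addr] = port

    def read(self, port: int, addr: int) -> int:
        writer = self.last_writer[addr]
        if writer == port:
            # The reader wrote last, so any RAM attached to it is up to date.
            writer = (port + 1) % self.num_ports
        return self._ram(port, writer)[addr]


mem = LvtMemory(num_ports=4, depth=8)
mem.write(port=1, addr=3, data=0xABCD)
mem.write(port=2, addr=3, data=0x1234)
print([hex(mem.read(p, 3)) for p in range(4)])
```

In this model only the small table needs a true multi-port implementation, while the wide data words live in ordinary dual-port RAMs.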



Figure 2: A multi-port memory with 2 full ports, 2 write-only ports and 2 read-only ports.



Figure 3: Multiport memory created with dual-port memories.



Figure 4: LVT Multiport memory created with dual-port memories.

We show that our design uses x% fewer resources than LVT and I-LVT.

## 7 Analysis of LVT Memory

We explore LVT designs with 2 to 32 ports. Although 16- and 32-port designs fit on large FPGAs, we believe smaller 4- and 8-port designs are more practical because of the high resource usage of XOR and LVT memories at high port counts: the number of RAMs grows as $N^2$ for XOR and $N(N-1)/2$ for LVT. However, we were able to synthesize a 32-port memory.

Figure 5: Frequency of design.

Without write delay, an 8-port memory runs at X MHz (x% of max). With write delay and pipelining, the design runs at X MHz (x% of max).

## 8 Conclusion

Several solutions to the port limit on FPGAs exist other than what is presented here, for example multi-pumping and banking. Multi-pumping trades clock speed for ports: the memory is clocked faster than the logic that uses it, so each port runs at a fraction of the memory's frequency. For example, a 300 MHz single-port memory can serve two 150 MHz ports. Banking requires stalls and routing logic due to the segmented memory. Our design uses replication and some creativity (XOR and LVT techniques) to create multiple ports.

Table 3: Synthesis results for different port counts

| Ports    | LUTs   | FF | BRAM | Max Frequency |
|----------|--------|----|------|---------------|
| 2        | 127    | 0  | 1    | 455 MHz       |
| 4        | 640    | 0  | 6    | 364 MHz       |
| 8        | 3,012  | 0  | 28   | 278 MHz       |
| $16^{1}$ | 16,544 | 0  | 120  | 146 MHz       |
| $32^{1}$ | 85,728 | 0  | 496  | 67 MHz        |

Table 4: Synthesis results for different port counts for pipelined design

| Ports    | LUTs   | FF    | BRAM | Max Frequency |
|----------|--------|-------|------|---------------|
| 2        | 111    | 26    | 1    | 476 MHz       |
| 4        | 708    | 64    | 6    | 500 MHz       |
| 8        | 3,092  | 152   | 28   | 451 MHz       |
| $16^{1}$ | 16,800 | 448   | 120  | 284 MHz       |
| $32^{1}$ | 86,608 | 1,502 | 496  | X MHz         |

<sup>1</sup> To reduce the number of IO ports and fit the design on the FPGA, we used a wrapper for the multi-port memory for designs with 16 and 32 ports.

## References

- [1] Ameer M. S. Abdelhadi and Guy G. F. Lemieux. 2016. Modular Switched Multiported SRAM-Based Memories. ACM Trans. Reconfigurable Technol. Syst. 9, 3, Article 22 (July 2016), 26 pages. doi:10.1145/2851506
- [2] Achronix Semiconductor Corporation 2022. Speedster7t FPGA Datasheet (DS015). Achronix Semiconductor Corporation. https://www.achronix.com/sites/default/files/docs/Speedster7t\_FPGA\_Datasheet\_DS015\_8.pdf This document contains preliminary information and is subject to change without notice.
- [3] AMD. 2025. Vivado Design Suite User Guide: Using the Vivado IDE (2025.1 english ed.). AMD, San Jose, CA, USA. https://docs.amd.com/r/en-US/ug893-vivado-ide
- [4] Jongsok Choi, Kevin Nam, Andrew Canis, Jason Anderson, Stephen Brown, and Tomasz Czajkowski. 2012. Impact of Cache Architecture and Interface on Performance and Area of FPGA-Based Processor/Parallel-Accelerator Systems. In 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines. 17–24. doi:10.1109/FCCM.2012.13
- [5] Intel Corporation 2024. Intel Agilex 7 FPGA and SoC FPGA Datasheet. Intel Corporation. https://www.intel.com/content/www/us/en/programmable/support/literature/lit-agilex.html Version 2024.09.03.
- [6] Charles Eric Laforest, Zimo Li, Tristan O'Rourke, Ming G. Liu, and J. Gregory Steffan. 2014. Composing Multi-Ported Memories on FPGAs. ACM Trans. Reconfigurable Technol. Syst. 7, 3, Article 16 (Sept. 2014), 23 pages. doi:10.1145/2629629
- [7] Charles Eric Laforest, Ming G. Liu, Emma Rae Rapati, and J. Gregory Steffan. 2012. Multi-ported memories for FPGAs via XOR. In Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays (Monterey, California, USA) (FPGA '12). Association for Computing Machinery, New York, NY, USA, 209–218. doi:10.1145/2145694.2145730
- [8] Lattice Semiconductor 2021. FPGA-DS-02000 Lattice iCE40 Family Data Sheet. Lattice Semiconductor. https://www.latticesemi.com/view\_document?document\_id=52424 Accessed on 2025-09-08.
- [9] Microchip Technology Inc. 2025. PolarFire and PolarFire SoC FPGA Fabric User Guide. Microchip Technology Inc. https://www.microchip.com/content/dam/mchp/documents/FPGA/ProductDocuments/UserGuides/PolarFire\_PolarFire\_SoC\_FPGA\_Fabric\_User\_Guide\_VB.pdf Document Number: UG0680.
- [10] Veripool, Inc. 2024. Verilator Reference Manual. Veripool, Inc. https://verilator.org/guide/latest/ Version 5.041.
- [11] Xilinx. 2020. 7 Series FPGAs Data Sheet: Overview. Xilinx. https://docs.amd.com/api/khub/documents/2LByHkO-nSZXcei2D55fTg/content v2.6.1.

Received 1 October 2025; revised XX XXXX XXXX; accepted XX XXX XXXX