diff --git a/README.md b/README.md index 6823d44a77..afa28c500c 100644 --- a/README.md +++ b/README.md @@ -69,10 +69,11 @@ BlackParrot heavily leverages the [BaseJump STL](https://github.com/bespoke-sili BlackParrot is an aggressively modular design: communication between the components is performed over a set of narrow, latency-insensitive interfaces. The interfaces are designed to allow implementations of the various system components to change independently of one another, without worrying about cascading functional or timing effects. Read more about BlackParrot's standardized interfaces here: [Interface Specification](docs/interface_specification.md) -## BedRock Coherence System Guide -The BedRock coherence system maintains cache coherence between the BlackParrot processor cores and attached -coherent accelerators in a BlackParrot multicore system. Please see the [BedRock Coherence Protocol](docs/bedrock_coherence_protocol.md) -page for more details on the coherence protocol and system. +## BedRock Cache Coherence System Guide +The BedRock coherence system maintains cache coherence between the BlackParrot processor cores and +attached coherent accelerators in a BlackParrot multicore system. Please see the +[BedRock Cache Coherence System Guide](docs/bedrock_guide.md) page for more details on the cache +coherence protocol and its implementation in BlackParrot. 
## Microarchitecture Guide [Microarchitecture Guide](docs/microarchitecture_guide.md) diff --git a/docs/bedrock.png b/docs/bedrock.png deleted file mode 100644 index d3fc9c8ce5..0000000000 Binary files a/docs/bedrock.png and /dev/null differ diff --git a/docs/bedrock_coherence_deep_dive.pdf b/docs/bedrock_coherence_deep_dive.pdf deleted file mode 100644 index 61d65f5237..0000000000 Binary files a/docs/bedrock_coherence_deep_dive.pdf and /dev/null differ diff --git a/docs/bedrock_coherence_protocol.md b/docs/bedrock_coherence_protocol.md deleted file mode 100644 index b93b7e4ff4..0000000000 --- a/docs/bedrock_coherence_protocol.md +++ /dev/null @@ -1,212 +0,0 @@ -# BedRock Cache Coherence Protocol - -BlackParrot's BedRock-based cache coherence system comprises three major components: a Cache -Coherence Engine (CCE) that contains and manages the coherence directory, a Local Cache Engine -(LCE) that manages entities such as data and instruction caches participating in coherence, and the -three LCE-CCE networks that carry the coherence protocol messages. BedRock currently has two CCE -implementations, which are described below. - -The current implementation of BlackParrot uses point-to-point ordered networks for the -coherence networks, however the coherence protocol is designed and verified correct -for unordered or ordered networks. - -![BedRock System diagram](bedrock.png) - -## Coherence Networks - -BedRock sends and receives messages on three networks, called Request, Command, and Response. -The message formats of these three BedRock channels are fully defined in the -[BedRock Interface](../bp_common/src/include/bp_common_bedrock_if.svh) file. -The specific message type, coherence states, and message size enums are defined in the -[BedRock Package Defines](../bp_common/src/include/bp_common_bedrock_pkgdef.svh). -The [BedRock Network Specification](bedrock_guide.md) provides an overview of the generic -BedRock message format and supported communication protocols. 
The three coherence networks -are implemented using the standard BedRock message format and differ only in the payload -fields within the messages. - -The Request network carries coherence requests from the cache controllers (LCE) to the directory -(CCE). A request may be a Read or Write request. Requests may be cached or uncached, and -atomic operation (AMO) requests are supported. - -The Command network carries coherence commands to the cache controllers (LCE). Most commands are -issued by the directory (CCE), except for cache to cache transfers that occur when a CCE commands -an LCE to send a cache block to another LCE. - -The Response network carries coherence responses from the cache controllers (LCE) to the coherence -directory (CCE). Common responses include cache block data writebacks, invalidation acknowledgements, -and coherence transaction acknowledgements. - -### Network Priorities - -The three coherence networks are related by a priority ordering scheme. The Response network is the -highest priority, followed by the Command network, and lastly the Request network with the lowest -priority. Processing a message on a lower priority network may cause a message to be sent on a -higher priority network, but not the other way around. For example, a Request message can cause -a Command message to be sent, or a Command message can cause a Response message to be sent, but -a Response message can not cause any message to be sent since it is the highest priority network. -It is also possible for a Command network message to cause a single extra message to be sent on the -Command network, when performing a cache to cache data transfer. Preserving the priority ordering -of the networks helps prevent deadlock-free protocol operation and prevent the presence of -message cycles across the three networks. - -## Coherence Protocol - -The BedRock cache coherence protocol supports the MOESIF family of protocols using a directory-based -coherence system. 
The coherence directory, managed by the CCE, is effectively a duplicate tags or -shadow tags directory design. The key difference between BedRock and a canonical shadow tags -directory is that in BedRock it is the directory (CCE), not the local caches (LCE), that maintains -and manages the golden copy of the tags. The local caches (LCEs) hold shadow tags and are only -allowed to modify their coherence state when instructed to do so by the directory (CCE). The -directory full controls all coherence state changes and cache block replacements in all of the -LCEs. This design decision eliminates a number of races from the coherence protocol, greatly -simplifying the implementation of the LCEs and CCE. Cache requests for the same block from -different LCEs may race to the directory, but are serialized by the network and are processed -in the order they arrive by the CCE. No other races exist in the protocol due to the CCE fully -controlling all coherence state changes and cache block replacements. - -At the LCE, every cache block has a small amount of associated metadata comprising the coherence -state of the block and a dirty bit. The LCE or cache also tracks, per cache set, any replacement -information required to implement the desired replacement algorithm (e.g., LRU way tracking). The -collection of cache tag, coherence state, and dirty bit for each block in a cache set plus the -LRU/replacement information per cache set is called a Tag Set. - -The coherence directory collects one or more tag sets into a Way Group. A Way Group adds a pending -bit to the collection of Tag Set information. The pending bit is used to effectively lock the Tag -Sets of the Way Group, allowing only a single coherence transaction per Way Group at a time. The -mapping from address to Tag Set and Way Group is such that all addresses (cache blocks) that map -to the same Tag Set (i.e., cache set in normal cache indexing and lookup) map to the same -Way Group. 
- -### Requests - -Request Message Types: -* Read miss - cache miss on a load / read operation -* Write miss - cache miss on a store / write operation -* Uncached Read - uncached load from memory -* Uncached Write - uncached store to memory -* Atomic - AMO operations - -A Request message has the following payload fields: -* Destination ID - destination CCE ID -* Source ID - requesting LCE ID -* Subop - store or atomic subop type -* Non-Exclusive Request Bit - 1 if LCE does not want Exclusive rights to cache block -* LRU Way ID - cache way within cache set that LCE wants miss filled to - -Uncached store and atomic requests contain data while all other request are header-only messages. - -### Commands - -Command Message Types: -* Sync - synchronization command during system initialization -* Set Clear - clear entire cache set (invalidate all blocks in set) -* Invalidate - invalidate specified cache block -* Set State - set coherence state for specified cache block -* Data and Tag - fill data, tag, and coherence state for specified cache block -* Set Tag and Wakeup - set tag and coherence state for specified block and wake up LCE (miss resolved) -* Writeback - command LCE to writeback a (potentially) dirty cache block -* Transfer - command LCE to send cache block and tag to another LCE -* Set State & Writeback - set coherence state then writeback cache block -* Set State & Transfer - set coherence state then transfer cache block to specified target LCE -* Set State & Transfer & Writeback - set coherence state, transfer cache block to target LCE, and writeback cache block -* Uncached Data - send uncached load data to an LCE -* Uncached Store Done - inform LCE that an uncached store was completed by memory - -A Command message has the following payload fields: -* Destination ID - destination LCE ID -* Source ID - sending CCE ID -* Way ID - cache way within LCE's cache set (given by address) to operate on -* State - coherence state -* Target LCE - LCE ID that receiving 
LCE will send cache block data and tag to for Transfer Command -* Target Way ID - cache way within target LCE's cache set (determined by address) to fill data in -* Target State - coherence state for target (to be implemented in future) - -Cache fill commands (Data and Tag) and Uncached Data commands contain data. All other commands -are header-only messages. - -### Responses - -Response Message Types: -* Sync Ack - synchronization acknowledgement during system initialization -* Inv Ack - invalidation ack to acknowledge invalidation command has been processed -* Coh Ack - coherence ack to acknowledge end of coherence transaction -* Resp WB - cache block writeback response, with cache block data -* Resp Null WB - cache block writeback response, without cache block data - -A Response message has the following payload fields: -* Destination ID - destination CCE ID -* Source ID - sending LCE ID - -## Coherence Protocol - -The BedRock coherence system supports variants of the standard MOESIF coherence protocol family. -BlackParrot's current implementation of the instruction and data caches and LCE support the full -set of MOESIF states. The specific protocol implemented in a system is determined by the CCE. -The FSM-based CCE implements a MESI protocol while the microcoded CCE can be programmed for -EI, MSI, MESI, or MOESIF protocols. - -This section provides an overview of the coherence protocol operation. Please view -the following documents for detailed descriptions of the coherence protocol operation -and to view the protocol tables: -* [BedRock Protocol](bedrock_coherence_deep_dive.pdf) -* [LCE Protocol Table](bedrock_coherence_protocol_lce_table.pdf) -* [CCE Protocol Table](bedrock_coherence_protocol_cce_table.pdf) - -### Request Processing - -Each request is processed by a single CCE, and requests are processed in the order they arrive at -the CCE. 
When a new request arrives, the CCE performs a sequence of operations including checking -the associated pending bit, reading the coherence directory, invalidating or downgrading other LCEs -if required, and sourcing the cache block from memory/LLC or another LCE. A cache request may also -trigger a replacement in the requesting LCE if the target cache set has no free entries to fill -the request to. - -In order to preserve correctness of the coherence protocol, the CCE must perform certain operations -before others. Primarily, this includes checking the pending bit first, performing replacement -in the requesting LCE if required to make room for the request fill, and then invalidating any -copy of the block from other LCEs prior to granting permissions to the requesting LCE. - -At a high level, a coherence request is processed as described by the following list of steps. Each step -may include multiple substeps, and it may be possible to overlap actions of certain steps. The -amount of concurrency between independent requests is dependent on the complexity of the CCE -implementation. In the simplest form, all requests to a single CCE are processed in the order -received. In a complex implementation, it is only necessary to serialize requests to each way group, -while requests to independent way groups may be processed concurrently. - -1. Check Pending Bit for way group associated with request address. If the bit is cleared, - the request can be processed, otherwise stall this request. - -2. Read coherence directory to determine which LCEs have block cached and in which states and to - determine the new coherence state for the block in the requesting LCE. - -3. Invalidate block from other LCEs, if required. - -4. Perform a writeback of the LRU block from requesting LCE, if required. - -5. Determine how request will be satisfied, which may be an Upgrade, LCE to LCE Transfer, - or read from next level of memory (e.g., L2 cache). - -6. 
If an LCE to LCE Transfer is used, optionally write back the cache block if it was dirty in the - sending LCE's cache. This writeback may be deferred until the block is evicted from the last LCE. - -7. Receive coherence acknowledgement to close transaction. - -## BedRock Fixed-Function CCE - -The BedRock Fixed-Function CCE (FSM CCE) is a hardware implementation of the cache coherence engine -that relies on fixed-function FSM logic to implement the MESI coherence protocol. It is designed -to be performant and efficient, but lacks programmability or flexibility. BlackParrot users -interested in a cache coherent system, but without the need to modify the coherence protocol or -exploit programmability in the CCE should use the FSM CCE. - -## BedRock Programmable CCE - -The BedRock Programmable CCE (ucode CCE) is a hardware implementationm of the cache coherence engine -employing a microcode programmed coherence engine for coherence protocol processing. The ucode CCE -executes a custom microcode ISA and is a two-stage fetch-execute machine. Programmability allows the -ucode CCE to easily switch between variants of the MOESIF protocol (MSI, MESI, MOSI, etc.) and -allows system designers to incorporate custom logic into the protocol processing routines. The ucode -CCE is under constant development and is actively used as a research platform. We encourage those -interested in the ucode CCE to read the [BedRock CCE Microarchitecture](bedrock_uarch_guide.md) -document to learn more about its design and programming. 
- diff --git a/docs/bedrock_coherence_protocol_cce_table.pdf b/docs/bedrock_coherence_protocol_cce_table.pdf deleted file mode 100644 index 186acf2531..0000000000 Binary files a/docs/bedrock_coherence_protocol_cce_table.pdf and /dev/null differ diff --git a/docs/bedrock_coherence_protocol_lce_table.pdf b/docs/bedrock_coherence_protocol_lce_table.pdf deleted file mode 100644 index 6b746ac806..0000000000 Binary files a/docs/bedrock_coherence_protocol_lce_table.pdf and /dev/null differ diff --git a/docs/bedrock_guide.md b/docs/bedrock_guide.md index 583bbfbcd5..10f529f52f 100644 --- a/docs/bedrock_guide.md +++ b/docs/bedrock_guide.md @@ -1,127 +1,178 @@ -# BedRock - Network Specification - -The BedRock Network Specification defines the on-chip networks used by the cache coherence -and memory system in the BlackParrot system. The cache coherence system keeps the data and -instruction caches of core coherent with eachother for shared-memory multicore designs. The protocols -also support integration of cache coherent accelerators. - -The BedRock Interface is defined in [bp\_common\_bedrock\_if.svh](../bp_common/src/include/bp_common_bedrock_if.svh) -and [bp\_common\_bedrock\_pkgdef.svh](../bp_common/src/include/bp_common_bedrock_pkgdef.svh). -These files are the authoritative definitions for the interface in the event that this -document and the code are out-of-sync. +# BedRock Cache Coherence and Memory System Guide + +BedRock encompasses both the cache coherence and memory systems used in BlackParrot. The principal +component of BedRock is the specification of a cache coherence protocol and its required networks, +which are collectively named BedRock. The BlackParrot implementation of BedRock, called BP-BedRock, +specifies the network message formats and implements the required coherence system components. +BP-BedRock further defines a memory interface and system that is compatible with and complementary +to the coherence system and network interfaces. 
+ +## BedRock Cache Coherence Protocol + +BedRock defines a family of directory-based invalidate cache coherence protocols based on the standard +MOESIF coherence states. Protocol variants are defined for the MI, MSI, MESI, MOSI, MOESI, MESIF, +and MOESIF subsets of states. The protocol relies on a duplicate tag, fully inclusive, standalone +coherence directory to precisely track the coherence state of every block cached within the +coherence system. A full description of the BedRock cache coherence protocol and system is available +[here](bedrock_protocol_specification.pdf). This description is system-agnostic; however, its design +has been influenced by its implementation within BlackParrot. + +## BlackParrot BedRock Cache Coherence and Memory Systems + +BlackParrot implements BedRock to provide cache coherence between the processor cores and +coherent accelerators in a multicore BlackParrot system. This system is called BlackParrot BedRock +(BP-BedRock). BP-BedRock also defines a BedRock-compatible memory interface. The text below +provides a brief overview of BP-BedRock. + +### BP-BedRock Network Interface Specifications + +The BlackParrot BedRock Interfaces are defined in the following files: +- [bp\_common\_bedrock\_if.svh](../bp_common/src/include/bp_common_bedrock_if.svh) +- [bp\_common\_bedrock\_pkgdef.svh](../bp_common/src/include/bp_common_bedrock_pkgdef.svh) +- [bp\_common\_bedrock\_wormhole_defines.svh](../bp_common/src/include/bp_common_bedrock_wormhole_defines.svh) + +BP-BedRock defines a common message format with a unified header and parameterizable payload. +The header includes message type, operation sub-type, address, and size fields, as well as +the parameterizable payload. The payload is network-specific and carries metadata required to +process messages on the selected network. 
The current implementation defines message formats +for the four BedRock coherence protocol networks and a memory command/response network +(discussed in the [interface\_specification](interface_specification.md)). + +The files above are the authoritative definitions for the BP-BedRock interface implementation. +In the event that the code differs from any documentation on or referenced by this page, the code +shall be considered as the current and authoritative specification. + +### BP-BedRock Coherence Interface + +The BP-BedRock coherence interface (also called the LCE-CCE interface) carries messages between the +BlackParrot LCEs (cache controllers) and CCEs (coherence directories). This interface implements +the four BedRock coherence networks: Request, Command, Fill, and Response. Each network utilizes +the BedRock message formats. For brevity, we outline the fields that differ for each network below. +Fields not listed (e.g., message size, address) have common meanings across all message types. + +The Request network has the following message types and payload fields: +- Message type + - Read miss + - Write miss + - Uncached read/load (1, 2, 4, or 8 bytes) + - Uncached write/store (1, 2, 4, or 8 bytes) + - Uncached Atomic +- Payload + - Destination CCE + - Requesting LCE + - Requesting Way ID + - Non-exclusive request hint (request block in read-only state without write permissions) -BedRock defines a common message format that is specialized to support both the on-chip -cache coherence system and the memory interface networks from a BlackParrot processor to memory. -The protocol can be easily transduced to standard protocols such as AXI, AXI-Lite, or WishBone. -BedRock messages are designed for use as a latency-insensitive interface. Although a particular -handshake is not required, ready&valid handshaking should be used whenever possible. 
+The Command network has the following message types and payload fields: +- Message type + - Sync + - Invalidate + - Set State + - Data (cache block data, tag, and state) + - Set State and Wakeup (cache block permission upgrade, no data) + - Writeback + - Set State and Writeback + - Transfer + - Set State and Transfer + - Set State, Transfer, and Writeback + - Uncached Data (uncached load request data from memory) + - Uncached Store Done (uncached store request has been completed to memory) +- Payload + - Destination LCE + - CCE sending command + - Cache Way ID + - Coherence State + - Target cache, state, and way ID for cache to cache transfer -## BedRock Message Format +The Fill network has the following message types and payload fields: +- Message type + - Data (cache to cache block transfer) +- Payload + - Destination LCE + - CCE managing block + - Cache Way ID + - Coherence State -A BedRock message has the following fields: +The Response network has the following message types and payload fields: - Message type -- Write Subop type (store, amoswap, amolr, amosc, amoadd, amoxor, amoand, amoor, amomin, amomax, amominu, amomaxu) -- Physical address -- Message Size + - Sync Ack + - Invalidation Ack + - Coherence Transaction Ack + - Writeback + - Null Writeback - Payload -- Data - -The message type is a network specific message type. The write subop type specifies the type of\ -write or atomic operation, which is required for those operations. The physical address is the -address of the requested data, aligned according to the message size field. -The message size field specifies the size of request or accompanying data as log2(size) bytes. -The payload is a network specific field used to communicate additional information between sender -and receiver, or used by the sender to attach information to the message that should be returned -unmodified by the receiver in the response. The data field contains (1 << message size) bytes for -messages that contain valid data. 
- -## BedRock Protocols - -BedRock defines three closely related protocols: Stream, Lite, and Burst. Each protocol carries -the same message information. They differ only in the specific header and data signals used -for protocol communication. - -All three protocols support critical word first behavior, where a request for a specific word -in a cache or memory block is returned in the least-significant bits of the response message. -The critical data is provided first with the remaining words provided in sequential ordering, -wrapping around as required. THe following example requests illustrate this behavior: - -Request: 0x0 [d c b a]
-Request: 0x2 [b a d c] - -## BedRock Stream - -The BedRock Stream protocol comprises the following signals: - -* Header -* Data (64\*-bits) -* Valid -* Ready\_and -* Last - -Each message is sent as one or more header plus data beats using a shared ready&valid handshake. -The last signal is raised along with valid when the sender is transmitting the last header plus data beat. -Last must not be raised if there is no valid data available. -The data field is typically 64-bits, but may be any 512/N-bits wide that is at least 64-bits. - -When sending multiple beat messages, the sender must increment the address in the header by -data-width bits for each beat. Critical-word first behavior is easily supported by issuing the -first beat for the critical word, followed by successive data words in sequential order with wrap -around (e.g., [1, 0, 3, 2], left to right MSB to LSB, LSB arrives first). If the requested data size -is smaller than the data channel size, -the requested data is repeated to fill the channel. For example the data response for a 16-bit -request using a 64-bit channel for some data value A has a 64-bit data response of [A, A, A, A]. - -## BedRock Lite (Deprecated) - -BedRock Lite is a wide variant of BedRock Stream. BedRock Lite does not use the Last signal as -every message is a single header plus data beat. The data channel width is equal to the cache or -memory block width used by the sender and receiver. Critical word first is supported by the sender -issuing the request with the desired address and the receiver responding with memory block rotated -so the critical word is placed in the least significant bits. - -Requests for data that smaller than the data channel width result in responses where the returned -data is replicated to fill the data channel width. - -BedRock Lite is deprecated and should be replaced by Stream interfaces wherever found. 
- -## BedRock Burst - -BedRock Burst is similar to BedRock Stream, but sends only a single header message followed by -zero or more data beats. The BedRock Burst protocol has the following signals: - -* Header -* Header\_valid -* Header\_ready\_and -* Has\_data -* Data (64\*-bits) -* Data\_valid -* Data\_ready\_and -* Last - -In this protocol, the header and data channels have independent ready&valid handshakes. The header -is accompanied by a has\_data signal that is raised if the message has at least one data beats. -The data channel is accompanied by a last signal that is raised with data\_valid on the last data -beat. Last must not be raised if there is no valid data available. -As with BedRock Stream, the data channel may is typically 64-bits wide, but may be any -512/N-bits wide that is at least 64-bits. + - Destination CCE + - Responding LCE + +#### Address and Data Alignment + +All four LCE-CCE networks have the same address and data alignment properties. Uncached accesses are +naturally aligned to the size of the request, and behavior of a misaligned request is undefined. + +Cacheable accesses are block-based and support critical word first behavior. Data is returned +to the cache beginning with the byte at the LCE Request address, then wrapping around at the natural +cache block boundary. In other words, data is returned as found in the cache block, from LSB to MSB, +but left rotated to place the requested byte at the LSB of the message data field. This +behavior naturally supports networks that serialize the cache block data and send the block in +multiple data beats, as well as conversion between different serialization widths without requiring +re-alignment of message data. + +The BlackParrot LCEs and CCEs expect that cacheable requests are issued aligned to the BedRock +network data channel width (which is currently the same as the cache fill width) at the LCE. 
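The critical-word-first behavior described in the Address and Data Alignment text above can be illustrated with a small software model. This is a minimal sketch in Python, not the BlackParrot RTL: the function names `critical_word_first` and `split_into_beats` are hypothetical, and the block is modeled as a byte string whose index 0 is the least-significant byte.

```python
def critical_word_first(block: bytes, addr: int) -> bytes:
    """Rotate a cache block so the byte at `addr` lands at the LSB.

    Index 0 of `block` models the least-significant byte of the block.
    The data wraps around at the natural cache block boundary.
    """
    offset = addr % len(block)          # byte offset of the request within its block
    return block[offset:] + block[:offset]


def split_into_beats(data: bytes, beat_bytes: int) -> list:
    """Serialize message data into fixed-width beats, LSB beat first."""
    if len(data) % beat_bytes != 0:
        raise ValueError("data length must be a multiple of the beat width")
    return [data[i:i + beat_bytes] for i in range(0, len(data), beat_bytes)]


# Example: an 8-byte "block" requested at byte offset 2, sent as two 4-byte beats.
block = bytes(range(8))
rotated = critical_word_first(block, 0x2)
beats = split_into_beats(rotated, 4)
```

Because the rotation is applied to the whole block before serialization, the same rotated data can be re-split at a different beat width without any further re-alignment, which is the property the text above points out.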
+ +### BedRock Burst Network Protocol + +BP-BedRock defines the BedRock Burst network protocol to exchange BedRock messages between +modules. BedRock Burst has independent header and data channels with ready-and-valid handshaking +on each channel. The BedRock Burst protocol comprises the following signals: + +- header +- header\_valid +- has\_data +- header\_ready\_and +- data +- data\_valid +- last +- data\_ready\_and + +The has\_data signal is raised with header\_valid when the message being sent includes at least +one data beat. The last signal is raised with data\_valid when the last data beat of the message +is being sent. The width of the data channel must be a power-of-two number of bits, in the inclusive +range of 64 to 1024 bits. The data channel should not be wider than the size of a cache block. The sender contract is: -* Data may be sent before, with, or after header. -* Minimal implementations may wait to send data until after header sends. -* Header and data channels must conform to ready&valid handshaking. +* Header and data channels must conform to ready&valid handshaking +* Data may be sent before, with, or after header +* header\_valid must not depend on data\_ready\_and +* All data beats for the current message must be sent before any data beats of future messages are sent The receiver contract is: -* May consume data before, with, or after header. -* Minimal implementations may wait to receive data (i.e., wait to raise data\_ready\_and) until -after the header arrives and is processed. -* Header and data channels must conform to ready&valid handshaking. 
+* Header and data channels must conform to ready&valid handshaking +* May consume data before, with, or after header +* header\_ready\_and must not depend on data\_valid +* has\_data must not be used in the header channel handshake +* last must not be used in the data channel handshake Sophisticated implementations of BedRock Burst channels may support overlapping transactions where the sender may send a second header prior to sending all data associated with the first header. The receiver must also support this behavior. If either send or receiver does not support overlapping transactions, then transactions will necessarily be non-overlapping. -As with BedRock Stream, requests for data smaller than the data channel width result in a response -with data replicated to fill the data channel width. +#### Minimal BedRock Burst Implementations + +Minimal implementations of BedRock Burst producers and consumers may further restrict +the producer and consumer contracts above. For example, implementations commonly require the header +handshake to occur prior to any data channel handshake for the current message, or disallow +an additional header handshake from occurring until all data beats from the current message +have been transmitted. + +### BP-BedRock Local Cache Engine (LCE) Microarchitecture + +Coming Soon! + +### BP-BedRock Cache Coherence Engine (CCE) Microarchitecture + +Refer to the [BedRock Microarchitecture Guide](bedrock_uarch_guide.md) for an overview of the cache +coherence directory designs employed in BlackParrot. 
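For readers who want a concrete picture of the BedRock Burst framing described earlier in this guide, the following is a rough software model of how one message maps onto the header and data channels. It is a sketch under stated assumptions, not the BlackParrot implementation: `burst_frames` is a hypothetical name, the header is treated as an opaque value, and the ready&valid handshakes themselves are not modeled. Only the `has_data` and `last` flags and the data-width rule (power of two, 64 to 1024 bits inclusive) come from the text.

```python
def burst_frames(header, data: bytes, data_width_bits: int = 64):
    """Model one BedRock Burst message as a header beat plus data beats.

    Returns ((header, has_data), [(beat_bytes, last), ...]).
    """
    w = data_width_bits
    # Data channel width must be a power of two in the range [64, 1024] bits.
    if w < 64 or w > 1024 or (w & (w - 1)) != 0:
        raise ValueError("data width must be a power of two from 64 to 1024 bits")

    beat_bytes = w // 8
    chunks = [data[i:i + beat_bytes] for i in range(0, len(data), beat_bytes)]

    # has_data accompanies the header when at least one data beat follows.
    has_data = len(chunks) > 0
    # last accompanies the final data beat of the message.
    data_beats = [(chunk, i == len(chunks) - 1) for i, chunk in enumerate(chunks)]
    return (header, has_data), data_beats


# A 16-byte payload on a 64-bit data channel yields two data beats.
hdr_beat, data_beats = burst_frames("hdr", bytes(16), 64)
```

A header-only message (for example, a command that carries no data) produces an empty list of data beats with `has_data` false, so a receiver in this model never waits on the data channel for it.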
+ diff --git a/docs/bedrock_protocol_specification.pdf b/docs/bedrock_protocol_specification.pdf new file mode 100644 index 0000000000..63fe6c74df Binary files /dev/null and b/docs/bedrock_protocol_specification.pdf differ diff --git a/docs/bedrock_uarch_guide.md b/docs/bedrock_uarch_guide.md index 8d9621b441..bdcbd3fc40 100644 --- a/docs/bedrock_uarch_guide.md +++ b/docs/bedrock_uarch_guide.md @@ -2,7 +2,7 @@ This document details the microarchitecture of the BedRock Programmable CCE (ucode CCE). For a general overview of the BedRock coherence protocol and system, please see the general -[BedRock guide](bedrock_guide.md). +[BedRock Cache Coherence and Memory System Guide](bedrock_guide.md). ## Ucode CCE @@ -213,9 +213,10 @@ a single cycle to execute. Microcode instructions are 32-bits wide and tagged wi bits to enable fast branch detection and prediction in the fetch state, which are the branch and predict taken bit. -The microcode instructions and formats are defined in -[bp\_me\_cce\_inst\_defines.svh](../bp_me/src/include/bp_cce_inst_defines.svh) and -[bp\_me\_cce\_inst\_pkgdef.svh](../bp_me/src/include/bp_cce_inst_pkgdef.svh). +The microcode instructions and formats are defined in: +- [bp\_me\_cce\_inst\_defines.svh](../bp_me/src/include/bp_me_cce_inst_defines.svh) +- [bp\_me\_cce\_inst\_pkgdef.svh](../bp_me/src/include/bp_me_cce_inst_pkgdef.svh) +- [bp\_me\_cce\_pkgdef.svh](../bp_me/src/include/bp_me_cce_pkgdef.svh) ### Microcode Instruction Classes There are six different classes of instructions in BedRock's ISA. diff --git a/docs/interface_specification.md b/docs/interface_specification.md index d4f479bdf3..f0b3570da0 100644 --- a/docs/interface_specification.md +++ b/docs/interface_specification.md @@ -240,14 +240,35 @@ There are additional signals for available credits in the engine, used for fenci signify all downstream transactions have completed, whereas full credits signify no more transactions may be sent to the network. 
-## Memory Interface +## BedRock Interface + +The BlackParrot memory and cache coherence networks rely on a common message format that can +be easily specialized for the specific network interface. A BedRock message includes a +header and zero or more bytes of data. + +The BlackParrot BedRock Interfaces are defined in the following files: +- [bp\_common\_bedrock\_if.svh](../bp_common/src/include/bp_common_bedrock_if.svh) +- [bp\_common\_bedrock\_pkgdef.svh](../bp_common/src/include/bp_common_bedrock_pkgdef.svh) +- [bp\_common\_bedrock\_wormhole_defines.svh](../bp_common/src/include/bp_common_bedrock_wormhole_defines.svh) + +A BedRock message header is composed of: +- Message type (available types depend on the specific network) +- Subop type (store, amolr, amosc, amoswap, amoadd, amoxor, amoand, amoor, amomin, amomax, amominu, amomaxu) +- Physical address +- Message Size (1 to 128 bytes, in powers of two; specifies request size or size of attached data) +- Payload (a black-box to the command receiver, this is returned as-is along with the memory response) + +A BedRock message may also include data. The amount of data is specified by the message size field +in the message header. Alignment of the message address and data is specific to the network +implementation. + +### Memory and I/O Interface The BlackParrot Memory Interface is a simple command / response interface used for communicating with memory or I/O devices. The goal is to provide a simple and understandable way to access any type of memory system, be it a shared bus or a more sophisticated network-on-chip scheme. -The Memory Interface can easily be transduced to standard protocols such as AXI, AXI-lite or WishBone. -Components connecting to the Memory Interface should implement one of the -[BedRock Interfaces](bedrock_guide.md). +The Memory Interface can easily be transduced to standard protocols such as AXI, AXI-lite or WishBone, +and is implemented using the BedRock network interfaces. 
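The BedRock message header fields listed above can be pictured with a small software model. This is an illustrative sketch only: the class and field names (`BedRockHeader`, `size_log2`, and so on) are hypothetical, field widths are not modeled, and storing the size as a log2 value is an assumption made here for convenience. The `.svh` files linked above remain the authoritative definitions.

```python
from dataclasses import dataclass
from enum import Enum


class Subop(Enum):
    """Subop types named in the header description."""
    STORE = "store"
    AMOLR = "amolr"
    AMOSC = "amosc"
    AMOSWAP = "amoswap"
    AMOADD = "amoadd"
    AMOXOR = "amoxor"
    AMOAND = "amoand"
    AMOOR = "amoor"
    AMOMIN = "amomin"
    AMOMAX = "amomax"
    AMOMINU = "amominu"
    AMOMAXU = "amomaxu"


@dataclass(frozen=True)
class BedRockHeader:
    msg_type: int          # network-specific message type (encoding not modeled)
    subop: Subop
    addr: int              # physical address
    size_log2: int         # assumed encoding: message size is 2**size_log2 bytes
    payload: bytes = b""   # opaque to the receiver, returned as-is in the response

    def __post_init__(self):
        # Message sizes range from 1 to 128 bytes, in powers of two.
        if not 0 <= self.size_log2 <= 7:
            raise ValueError("message size must be between 1 and 128 bytes")

    @property
    def size_bytes(self) -> int:
        return 1 << self.size_log2


# A 64-byte (cache-block-sized) message at address 0x80.
hdr = BedRockHeader(msg_type=0, subop=Subop.STORE, addr=0x80, size_log2=6)
```

Keeping the payload as an opaque byte string mirrors the "black-box" description: the receiver in this model never inspects it and simply copies it into the response.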
A memory command or response packet is composed of: - Message type @@ -259,85 +280,27 @@ A memory command or response packet is composed of: - Atomic operation - Subop type (amoswap, amolr, amosc, amoadd, amoxor, amoand, amoor, amomin, amomax, amominu, amomaxu) - Physical address -- Request Size +- Message/Request Size - Payload (A black-box to the command receiver, this is returned as-is along with the memory response) - An example payload for the CCE is: - Requesting LCE - Requesting way id - Coherence state - Whether this is a speculative request -- Data - -Misaligned addresses return data wrapped around the request size using the following scheme: - -Request: 0x0 [d c b a] -Request: 0x2 [b a d c] - -## LCE-CCE Interface - -The LCE-CCE Interface comprises the connections between the BlackParrot caches and the -memory system in a cache-coherent BlackParrot multicore processor. The interface is implemented with -three networks: Request, Command, and Response. These networks carry memory access requests and -cache coherence management traffic between the Local Cache Engines (LCE) and Cache Coherence -Engines (CCE). All components participating in cache coherence and communicating on the LCE-CCE -Interface must implement the [BedRock Interface](bedrock_guide.md) cache coherence channels. -The BedRock Cache Coherence Protocol is described in detail on [this page](bedrock_coherence_protocol.md). -The remainder of this section provides a high-level overview of the LCE-CCE Interface components. - -A Local Cache Engine (LCE) is a coherence controller attached to each entity in the system -participating in coherence. The most common entities are the instruction and data caches -in the Front End and Back End, respectively, of a BlackParrot processor. The LCE is responsible -for initiating coherence requests and responding to coherence commands. - -A Cache Coherence Engine (CCE) is a coherence directory that manages the coherence state of blocks -cached in any of the LCEs. 
The CCEs have full control over the coherence state of all cache blocks. -Each CCE manages the coherence state of a subset of the physical address space, and there may be -many LCEs and CCEs in a multicore BlackParrot processor. - -The LCE-CCE Interface comprises three networks: Request, Command, and Response. An -LCE initiates a coherence request using the Request network. The CCEs issue commands, such as -invalidations, transfers, or fills to satisfy coherence requests, on the Command network while -processing a request. The LCEs respond to commands issued by the CCEs by sending messages -on the Response network. The current implementation of BlackParrot uses point-to-point -ordered networks for all of the LCE-CCE Interface networks, however, the coherence protocol -is designed to be correct on unordered networks. A CCE implementation must obey the following -network priorities from high to low: Response, Command, Request. A lower priority message may -cause a higher priority message to be sent, but a higher priority message may not cause a lower -priority message to send. This priority ordering helps guarantee the correctness of the cache -coherence protocol. - -The LCE-CCE Interface is defined in [bp\_common\_bedrock\_if.vh](../bp_common/src/include/bp_common_bedrock_if.svh) -and [bp\_common\_bedrock\_pkgdef.svh](../bp_common/src/include/bp_common_bedrock_pkgdef.svh). All -LCE-CCE messages use the BedRock message format and differ only in the contents of the payload field in the -header. These files are the authoritative definitions for the interface in the event that this -document and the code are out-of-sync. - -### Request Network - -The Request network carries coherence requests from the LCEs to the CCEs. Requests are initiated -when an LCE encounters a cache or coherence miss. Cache misses occur when the LCE does not contain -the desired cache block. 
A coherence miss occurs when the LCE contains a valid copy of the desired -cache block, but has insufficient permissions to perform the desired operation (e.g., trying to write -a cache block with read-only permissions). Requests may also be issued for uncached loads and stores -or for atomic operations. Issuing a request initiates a new coherence -transaction, which is handled by one of the CCEs in the system. Uncached requests may result in -coherence transactions when targeting cacheable memory to guarantee memory correctness across all -cores. - -### Command Network - -The Command network carries commands and data to the LCEs. Most messages on this network originate -at the CCEs. LCEs may be commanded to send a Command message by a CCE to perform a LCE to LCE -cache block transfer when required by the coherence protocol, but otherwise may not initiate -Command messages. Common Commands include cache block invalidation and writeback commands, cache -block fills, and LCE to LCE transfer commands. - -### Response Network - -The Response network carries acknowledgement and data writeback messages from the LCEs to the -CCEs. The CCE must be able to sink any potential response that could be generated in the system -in order to prevent deadlock in the system. Sinking a message can be accomplished by processing -the message when it arrives or placing it into a buffer to consume it from the network. The CCE -must be able to sink or buffer all possible response messages generated by a single coherence -transaction in-order to avoid blocking the coherence networks. +- Data (for memory write or uncached write operations) + +Uncached accesses must be naturally aligned with the request size. Cached accesses are block-based +and return the cache block containing the requested address. 
Cached accesses return the critical +data word first (at LSB of data) and wrap around the requested block as follows: + +- Request: 0x00, size=32B [D C B A] +- Request: 0x10, size=32B [B A D C] + +### LCE-CCE Interface + +The LCE-CCE Interface comprises the connections between the BlackParrot caches and coherence +directories in a cache-coherent BlackParrot multicore processor. These networks support the +BlackParrot BedRock coherence protocol implementation and utilize the common BedRock message +format described above. A full description of the LCE-CCE Interface and its implementation +can be found in the [BedRock Cache Coherence and Memory System Guide](bedrock_guide.md). diff --git a/docs/microarchitecture_guide.md b/docs/microarchitecture_guide.md index 03f6a7038d..2289f89e54 100644 --- a/docs/microarchitecture_guide.md +++ b/docs/microarchitecture_guide.md @@ -60,6 +60,6 @@ RISC-V instructions: The data cache is a VIPT (Virtually-Indexed Physical-Tagged) cache with three pipeline stages: Tag Lookup (TL) and Tag Verify (TV). There are 3 hardened memories in the D$: the data mem, tag mem and stat mem. They are implemented as 1RW synchronous RAMs to be amenable to most commercial SRAM generators. In TL, the data memory and tag memory are accessed. In TV, the data from these caches is selected based on the result of the tag comparison. Additionally, data is written to a 2-entry writebuffer, which is used to prevent data memory structural hazards. Data mux (DM) stage sign extends, recodes floating point loads and selects subword loads. ## Memory End -Refer to the [BedRock Microarchitecture Guide](bedrock_uarch_guide.md) for an overview of the cache -coherence directory designs employed in BlackParrot. +Refer to the [BedRock Microarchitecture Guide](bedrock_uarch_guide.md) for an overview of the +coherence directory (CCE) microarchitectures as implemented in BlackParrot.