New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PATCH CATERPILLAR v5] Caterpillar mdev auxiliary #359
[PATCH CATERPILLAR v5] Caterpillar mdev auxiliary #359
Conversation
5ae0d10
to
2faaf33
Compare
2faaf33
to
dce14ca
Compare
This is false positive in checkpatch (the same is done in the kernel), it's impossible to enclose this in parentheses: _ERROR: Macros with complex values should be enclosed in parentheses ERROR: Macros with complex values should be enclosed in parentheses ERROR: Macros with complex values should be enclosed in parentheses |
dce14ca
to
0d30853
Compare
include/odp/drv/spec/mmio.h
Outdated
* @param value cpu native 8-bit value to write to MMIO | ||
* @param addr MMIO address to write at | ||
*/ | ||
void odpdrv_mmio_u8be_write(uint8_t value, volatile void *addr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both odpdrv_mmio_u8le_write()
and odpdrv_mmio_u8be_write()
are documented the same. I assume the conversions are from host-endian to le or be explicitly? That should be documented as such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether to have u8le + u8be or a single u8 is just a matter of taste: anyone understands that they do the same thing, so the only question is whether you want this identifier to look similar to other in the same group. I'll change to u8, OK with me.
I'll update doxygen comments to explicitly say 'little endian' and 'big endian'.
include/odp/drv/spec/mmio.h
Outdated
* @param value cpu native 64-bit value to write to MMIO | ||
* @param addr MMIO address to write at | ||
*/ | ||
void odpdrv_mmio_u64be_write(uint64_t value, volatile void *addr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar documentation comments for the u16, u32, and u64 variants of these routines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK (see first comment)
include/odp/drv/spec/mmio.h
Outdated
* @param addr MMIO address to read at | ||
* @return cpu native 64-bit value | ||
*/ | ||
uint64_t odpdrv_mmio_u64be_read(volatile void *addr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I'd make it explicit that these take a le or be source and convert it into native host-endian as part of the read operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK (see first comment)
odpdrv_mmio_u8be_write(uint8_t value, volatile void __odpdrv_mmio *addr) | ||
{ | ||
odpdrv_mmio_u8le_write(value, addr); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have two routines if they are just synonyms for each other? I thought the idea was to combine the write with an endian conversion? If there's no actual conversion going on then wouldn't odpdrv_mmio_u8_write()
, etc., be sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK (see first comment)
odpdrv_mmio_u16le_write(uint16_t value, volatile void __odpdrv_mmio *addr) | ||
{ | ||
odpdrv_io_wmb(); | ||
*(__odp_force volatile odp_u16le_t *)addr = odp_cpu_to_le_16(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these routines are likely to be performance critical, you're probably going to want to invoke the byteorder inline forms directly rather than incur the cost of a function call. These simply expand into the GCC builtins. Whether this is done via the inlines or directly via southbound equivalent converters is something we'll need to consider. For now this is fine since it's just basic scaffolding, but something that will need to be revisited.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't odp_xxx_to_yyy_NN() expand to inline function which uses cc builtin ? I guess cc is smart enough to optimize out all wrapper code and use builtin (or unmodified value) directly. Is there really a reason to duplicate byteorder_inlines.h here ?
Side note on performance impact:
I wouldn't consider MMIO accesses performance critical (in a sense that you can optimize it and get performance boost). What is basically happening when you access MMIO is that CPU starts read/write transaction which ends up at the chipset (PCIe root complex in fact), the chipset sends MMIO transaction (64 bytes + overhead) to the peripheral and then awaits response (completion) from the peripheral. All in all it takes hundreds of CPU cycles to complete, and CPU can't do anything during this time. In short: it's way more expensive than even cache miss on host RAM, and register size (8-64 bits) has no impact on the total transaction cost.
So, any code which is accessing MMIO too often is poorly designed, basically. In mediated device drivers we try to access it as little as possible, usually 1-2 times per TX/RX burst.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ODP APIs can only expand inline when --enable-abi-compat=no
is used, which means the application is targeting an embedded environment. Take a look at odp_packet.c
which supports inline forms of APIs. The idea is that the external API may be inlined or not when called by applications, but the implementation itself always uses the inline form since it's self-contained for a specific target.
For these conversions, ODP itself is going to be compiled for a specific endianness so you can use the GCC __builtin_bswapNN()
routines directly as needed. This avoids any function calls. For example:
/* Set up the conversion functions we need for this host's endianness */
#if ODP_BYTE_ORDER == ODP_LITTLE_ENDIAN
#define _odp_u16le(x) (x)
#define _odp_u16be(x) __builtin_bswap16(x)
#define _odp_u32le(x) (x)
#define _odp_u32be(x) __builtin_bswap32(x)
#define _odp_u64le(x) (x)
#define _odp_u64be(x) __builtin_bswap64(x)
#else
#define _odp_u16le(x) __builtin_bswap16(x)
#define _odp_u16be(x) (x)
#define _odp_u32le(x) __builtin_bswap32(x)
#define _odp_u32be(x) (x)
#define _odp_u64le(x) __builtin_bswap64(x)
#define _odp_u64be(x) (x)
#endif
static inline void
odpdrv_mmio_u16le_write(uint16_t valie, volatile void __odpdrv_mmio *addr)
{
odpdrv_io_wmb();
*(__odp_force volatile odp_u16le_t *)addr = _odp_u16le(value);
}
...etc.
Since these routines are on performance-critical paths, eliminating the calls to the external APIs is likely to be measurably beneficial.
odpdrv_mmio_u8be_read(volatile void __odpdrv_mmio *addr) | ||
{ | ||
return odpdrv_mmio_u8le_read(addr); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comments w.r.t. the read variants. Since there is no endian conversion for u8 variants is odpdrv_mmio_u8_read()
sufficient here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK (see first comment)
odp_be_to_cpu_64(*(__odp_force volatile odp_u64be_t *)addr); | ||
|
||
odpdrv_io_rmb(); | ||
return value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thought, since host bytes are maintained in host byte order, do we need explicit be/le variants of all of these routines? Is it the case that different I/O devices expect bytes to be presented in different endian modes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the case, different devices expect MMIO registers and dma coherent data in host RAM in either le or be, no one cares about host endianness. Some funny devices even mix le and be in the same peripheral-host communication, like Chelsio being be uC but expecting TX/RX doorbells in MMIO as le :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that's good to know. Thanks.
({ \ | ||
__typeof__(a) tmp_a = (a); \ | ||
__typeof__(b) tmp_b = (b); \ | ||
tmp_a < tmp_b ? tmp_a : tmp_b; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how this is different from the previous definition. Won't this still produce a warning if a and b have different signedness? Shouldn't the second line be:
__typeof__(a) tmp_b = (__typeof__(a))(b);
although maybe an error/warning is preferable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the idea is to generate compilation error if signedness doesn't match.
odpdrv_mmio_u16le_write(uint16_t value, volatile void __odpdrv_mmio *addr) | ||
{ | ||
odpdrv_io_wmb(); | ||
*(__odp_force volatile odp_u16le_t *)addr = odp_cpu_to_le_16(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ODP APIs can only expand inline when --enable-abi-compat=no
is used, which means the application is targeting an embedded environment. Take a look at odp_packet.c
which supports inline forms of APIs. The idea is that the external API may be inlined or not when called by applications, but the implementation itself always uses the inline form since it's self-contained for a specific target.
For these conversions, ODP itself is going to be compiled for a specific endianness so you can use the GCC __builtin_bswapNN()
routines directly as needed. This avoids any function calls. For example:
/* Set up the conversion functions we need for this host's endianness */
#if ODP_BYTE_ORDER == ODP_LITTLE_ENDIAN
#define _odp_u16le(x) (x)
#define _odp_u16be(x) __builtin_bswap16(x)
#define _odp_u32le(x) (x)
#define _odp_u32be(x) __builtin_bswap32(x)
#define _odp_u64le(x) (x)
#define _odp_u64be(x) __builtin_bswap64(x)
#else
#define _odp_u16le(x) __builtin_bswap16(x)
#define _odp_u16be(x) (x)
#define _odp_u32le(x) __builtin_bswap32(x)
#define _odp_u32be(x) (x)
#define _odp_u64le(x) __builtin_bswap64(x)
#define _odp_u64be(x) (x)
#endif
static inline void
odpdrv_mmio_u16le_write(uint16_t valie, volatile void __odpdrv_mmio *addr)
{
odpdrv_io_wmb();
*(__odp_force volatile odp_u16le_t *)addr = _odp_u16le(value);
}
...etc.
Since these routines are on performance-critical paths, eliminating the calls to the external APIs is likely to be measurably beneficial.
odp_be_to_cpu_64(*(__odp_force volatile odp_u64be_t *)addr); | ||
|
||
odpdrv_io_rmb(); | ||
return value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that's good to know. Thanks.
(couldn't reply to "abi-compat" request for changes from Bill directly for some reason, replying here instead.) I would try to avoid adding 3rd endianness API here if at all possible. Are we going to support "abi-compat" for odpdrv APIs ? If not it could make sense to make odpdrv_ endianness conversion functions always inline. All mdev core and drivers code will use it then as it's really more performance critical than MMIO access. |
Signed-off-by: Mykyta Iziumtsev <mykyta.iziumtsev@linaro.org>
Add helper function to query RX/TX queue lengths on network interface to facilitate upcoming native and mediated device drivers. Signed-off-by: Mykyta Iziumtsev <mykyta.iziumtsev@linaro.org>
Added char* and uint64_t sysfs attribute read functions to support upcoming native and mediated device drivers. Updated sysfs_netif_stats() to take interface name as argument instead of pktio_entry, because pktio_entry->s.name is not necessarily recognisable Linux network interface name. Signed-off-by: Mykyta Iziumtsev <mykyta.iziumtsev@linaro.org>
Gather macros needed for upcoming mediated devices in one location. Fix signed vs. unsigned comparisons caused by incorrect MIN / MAX usage. Signed-off-by: Mykyta Iziumtsev <mykyta.iziumtsev@linaro.org>
0d30853
to
756dc7c
Compare
Checkpatch issue confirmed to be a false positive. Ok to ignore. |
Merged and close. |
Add miscellaneous patches required for mediated device drivers support.
"Work in progress" branch with mdev common code and i40e driver can be found here:
https://github.com/MykytaI/odp/tree/caterpillar_mdev_i40e_demo
Last two patches aren't yet ready for upstreaming (i40e works, but code needs more documentation & cleanup) and cxgb4+e1000e drivers aren't merged on the top of the branch either.