New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PATCH v2] Misc fast path optimizations #281
Conversation
@@ -21,6 +21,9 @@ | |||
/** @internal Inline function offsets */ | |||
extern const _odp_packet_inline_offset_t _odp_packet_inline; | |||
|
|||
/** @internal Pool inline function offsets */ | |||
extern const _odp_pool_inline_offset_t _odp_pool_inline; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should go to pool_inlines.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pool_inlines.h would only include these two lines and looking at pool API there aren't any additional functions which should be inlined. Do you have a particular use case in mind for this header?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to have pool functions in pool header, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there are currently no functions in sight to put into pool_inlines.h I would suggest not to add a new header file. If/when some pool inline function are added later on the header can be created them. Is this OK for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MatiasElo ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API being inlined is odp_packet_pool()
so this is an artifact of the implementation of that API. So I'm OK with it being here since this is supposed to be fastpath code.
uint16_t uarea_size; | ||
|
||
} _odp_pool_inline_offset_t; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this to pool_inline_types.h.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed the same pattern as the packet inlines. _odp_packet_inline_offset_t is defined in packet_types.h.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MatiasElo I agree this is correctly placed here.
@MatiasElo first patches look good, Reviewed-by: Dmitry Eremin-Solenikov Last one might clash with #250 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any performance measurement results with these changes?
@@ -104,17 +104,10 @@ struct odp_buffer_hdr_t { | |||
/* User area pointer */ | |||
void *uarea_addr; | |||
|
|||
/* User area size */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is done already in 2.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nagarahalli Any reason why we shouldn't have it in master?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is about 2.0 merge to master, it will happen sometime in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not planned untill TigerMoth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2.0 is based on api-next, not master. This will be one of several conflicts that will need to be resolved at merge time.
@@ -1079,7 +1079,7 @@ static int dpdk_input_queues_config(pktio_entry_t *pktio_entry, | |||
const odp_pktin_queue_param_t *p) | |||
{ | |||
odp_pktin_mode_t mode = pktio_entry->s.param.in_mode; | |||
odp_bool_t lockless; | |||
uint8_t lockless; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is that needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While packing the pkt_dpdk_t struct I changed pkt_dpdk_t.lockless_rx and pkt_dpdk_t.lockless_tx to uint8_t. I'm using the same type here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MatiasElo yes, I see. While reading patches in that web interface sometime I miss some pieces of code. Just above you changed that.
@nagarahalli In l2fwd this patch set provides ~2-4% improvement (mainly in scheduled mode). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed-by: Bill Fischofer bill.fischofer@linaro.org
@@ -104,17 +104,10 @@ struct odp_buffer_hdr_t { | |||
/* User area pointer */ | |||
void *uarea_addr; | |||
|
|||
/* User area size */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2.0 is based on api-next, not master. This will be one of several conflicts that will need to be resolved at merge time.
uint16_t uarea_size; | ||
|
||
} _odp_pool_inline_offset_t; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MatiasElo I agree this is correctly placed here.
@@ -21,6 +21,9 @@ | |||
/** @internal Inline function offsets */ | |||
extern const _odp_packet_inline_offset_t _odp_packet_inline; | |||
|
|||
/** @internal Pool inline function offsets */ | |||
extern const _odp_pool_inline_offset_t _odp_pool_inline; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API being inlined is odp_packet_pool()
so this is an artifact of the implementation of that API. So I'm OK with it being here since this is supposed to be fastpath code.
@@ -21,6 +21,9 @@ | |||
/** @internal Inline function offsets */ | |||
extern const _odp_packet_inline_offset_t _odp_packet_inline; | |||
|
|||
/** @internal Pool inline function offsets */ | |||
extern const _odp_pool_inline_offset_t _odp_pool_inline; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extern is not needed in .h file, also extern 2 lines above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case it is needed. _odp_pool_inline is defined in odp_pool.c and _odp_packet_inline in odp_packet.c. ODP build with '--disable-abi-compat' will fail without extern declarations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@muvarov extern is not needed for function prototypes.
dsthdr->buf_hdr.uarea_size : | ||
srchdr->buf_hdr.uarea_size); | ||
srchdr->buf_hdr.uarea_addr != NULL) { | ||
memcpy(dsthdr->buf_hdr.uarea_addr, srchdr->buf_hdr.uarea_addr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
odp_memcpy() might be good fit here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
odp_memcpy() is just a wrapper for memcpy() and I would avoid using api functions in the implementation unless there is a clear benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. I just thought maybe some other implementations already speedup this call and we can get this benefits for free. But I don't might if it will be simple memcpy().
uint8_t lockless_rx; /**< no locking for rx */ | ||
uint8_t lockless_tx; /**< no locking for tx */ | ||
uint8_t port_id; /**< DPDK port identifier */ | ||
/* --- 34 bytes --- */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is propose of sayng 34 bytes here? Cache line is 64 so maybe comments for cache lines borders are more interesting here? vdev_sysc_promisc, lockless_rx, lockless_tx can be packet to bit field. That will also reduce structure size. Is there any reason to do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The next member in the struct named 'capa' is 40 bytes long, so it will cross a cache line boundary. I was thinking this comment was more useful than an ambiguous cache line comment.
There is still space in the first cache line for additional fast path variables, so I wouldn't use a bitmask before it is actually needed.
uint8_t lockless_tx; /**< no locking for tx */ | ||
uint8_t port_id; /**< DPDK port identifier */ | ||
/* --- 34 bytes --- */ | ||
odp_pktio_capability_t capa; /**< interface capabilities */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pktio capability is big struct and it's same for all packets. Is it really needs to be here? as well as uint16_t mtu? why not in pktio_entry_t ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A valid point, at least I don't see any reason why capability struct is stored inside each pktio implementation. MTU is used in the fast path so I wouldn't move that. Should I move the capability in this PR or create a new one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if it will be in this request it will make easy to review all changes at once.
@MatiasElo few questions in that PR. |
Disable debug checks by default for optimal performance. Signed-off-by: Matias Elo <matias.elo@nokia.com>
Reorder pkt_dpdk_t members so that all struct members used in fast path are located on the same cache line. Signed-off-by: Matias Elo <matias.elo@nokia.com>
Data size is constant for all buffers from the same pool, so there is no need to store the value in the buffer header. Signed-off-by: Matias Elo <matias.elo@nokia.com>
Pool entry pointer is already stored in the odp_buffer_hdr_t struct. Signed-off-by: Matias Elo <matias.elo@nokia.com>
Add inlined accessors for pool_t members pool_hdl and uarea_size. This enables removing the matching members from odp_buffer_hdr_t, and thus reduces per buffer overhead. Signed-off-by: Matias Elo <matias.elo@nokia.com>
Since odp_pktio_capability_t is common for all pktio implementations move it inside pktio_entry_t. Signed-off-by: Matias Elo <matias.elo@nokia.com>
Merged. |
Improve fast path performance by reorganizing and cleaning buffer headers.
V2: