Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PATCH v2] Misc fast path optimizations #281

Closed
wants to merge 6 commits into from
Closed

[PATCH v2] Misc fast path optimizations #281

wants to merge 6 commits into from

Conversation

MatiasElo
Copy link
Collaborator

@MatiasElo MatiasElo commented Nov 7, 2017

Improve fast path performance by reorganizing and cleaning buffer headers.

V2:

  • Add pktio capability to pktio_entry_t (Maxim)

@muvarov muvarov changed the title Misc fast path optimizations [PATCH v1] Misc fast path optimizations Nov 7, 2017
@@ -21,6 +21,9 @@
/** @internal Inline function offsets */
extern const _odp_packet_inline_offset_t _odp_packet_inline;

/** @internal Pool inline function offsets */
extern const _odp_pool_inline_offset_t _odp_pool_inline;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should go to pool_inlines.h

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pool_inlines.h would only include these two lines and looking at pool API there aren't any additional functions which should be inlined. Do you have a particular use case in mind for this header?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to have pool functions in pool header, etc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there are currently no functions in sight to put into pool_inlines.h I would suggest not to add a new header file. If/when some pool inline function are added later on the header can be created them. Is this OK for you?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API being inlined is odp_packet_pool() so this is an artifact of the implementation of that API. So I'm OK with it being here since this is supposed to be fastpath code.

uint16_t uarea_size;

} _odp_pool_inline_offset_t;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this to pool_inline_types.h.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed the same pattern as the packet inlines. _odp_packet_inline_offset_t is defined in packet_types.h.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MatiasElo I agree this is correctly placed here.

@lumag
Copy link

lumag commented Nov 7, 2017

@MatiasElo first patches look good, Reviewed-by: Dmitry Eremin-Solenikov

Last one might clash with #250

Copy link
Contributor

@nagarahalli nagarahalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any performance measurement results with these changes?

@@ -104,17 +104,10 @@ struct odp_buffer_hdr_t {
/* User area pointer */
void *uarea_addr;

/* User area size */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is done already in 2.0.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nagarahalli Any reason why we shouldn't have it in master?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is about 2.0 merge to master, it will happen sometime in the future.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not planned untill TigerMoth.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2.0 is based on api-next, not master. This will be one of several conflicts that will need to be resolved at merge time.

@@ -1079,7 +1079,7 @@ static int dpdk_input_queues_config(pktio_entry_t *pktio_entry,
const odp_pktin_queue_param_t *p)
{
odp_pktin_mode_t mode = pktio_entry->s.param.in_mode;
odp_bool_t lockless;
uint8_t lockless;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is that needed?

Copy link
Collaborator Author

@MatiasElo MatiasElo Nov 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While packing the pkt_dpdk_t struct I changed pkt_dpdk_t.lockless_rx and pkt_dpdk_t.lockless_tx to uint8_t. I'm using the same type here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MatiasElo yes, I see. While reading patches in that web interface sometime I miss some pieces of code. Just above you changed that.

@MatiasElo
Copy link
Collaborator Author

@nagarahalli In l2fwd this patch set provides ~2-4% improvement (mainly in scheduled mode).

Copy link
Contributor

@Bill-Fischofer-Linaro Bill-Fischofer-Linaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed-by: Bill Fischofer bill.fischofer@linaro.org

@@ -104,17 +104,10 @@ struct odp_buffer_hdr_t {
/* User area pointer */
void *uarea_addr;

/* User area size */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2.0 is based on api-next, not master. This will be one of several conflicts that will need to be resolved at merge time.

uint16_t uarea_size;

} _odp_pool_inline_offset_t;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MatiasElo I agree this is correctly placed here.

@@ -21,6 +21,9 @@
/** @internal Inline function offsets */
extern const _odp_packet_inline_offset_t _odp_packet_inline;

/** @internal Pool inline function offsets */
extern const _odp_pool_inline_offset_t _odp_pool_inline;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API being inlined is odp_packet_pool() so this is an artifact of the implementation of that API. So I'm OK with it being here since this is supposed to be fastpath code.

@@ -21,6 +21,9 @@
/** @internal Inline function offsets */
extern const _odp_packet_inline_offset_t _odp_packet_inline;

/** @internal Pool inline function offsets */
extern const _odp_pool_inline_offset_t _odp_pool_inline;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extern is not needed in .h file, also extern 2 lines above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case it is needed. _odp_pool_inline is defined in odp_pool.c and _odp_packet_inline in odp_packet.c. ODP build with '--disable-abi-compat' will fail without extern declarations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@muvarov extern is not needed for function prototypes.

dsthdr->buf_hdr.uarea_size :
srchdr->buf_hdr.uarea_size);
srchdr->buf_hdr.uarea_addr != NULL) {
memcpy(dsthdr->buf_hdr.uarea_addr, srchdr->buf_hdr.uarea_addr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

odp_memcpy() might be good fit here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

odp_memcpy() is just a wrapper for memcpy() and I would avoid using api functions in the implementation unless there is a clear benefit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I just thought maybe some other implementations already speedup this call and we can get this benefits for free. But I don't might if it will be simple memcpy().

uint8_t lockless_rx; /**< no locking for rx */
uint8_t lockless_tx; /**< no locking for tx */
uint8_t port_id; /**< DPDK port identifier */
/* --- 34 bytes --- */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is propose of sayng 34 bytes here? Cache line is 64 so maybe comments for cache lines borders are more interesting here? vdev_sysc_promisc, lockless_rx, lockless_tx can be packet to bit field. That will also reduce structure size. Is there any reason to do that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The next member in the struct named 'capa' is 40 bytes long, so it will cross a cache line boundary. I was thinking this comment was more useful than an ambiguous cache line comment.

There is still space in the first cache line for additional fast path variables, so I wouldn't use a bitmask before it is actually needed.

uint8_t lockless_tx; /**< no locking for tx */
uint8_t port_id; /**< DPDK port identifier */
/* --- 34 bytes --- */
odp_pktio_capability_t capa; /**< interface capabilities */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pktio capability is big struct and it's same for all packets. Is it really needs to be here? as well as uint16_t mtu? why not in pktio_entry_t ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A valid point, at least I don't see any reason why capability struct is stored inside each pktio implementation. MTU is used in the fast path so I wouldn't move that. Should I move the capability in this PR or create a new one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if it will be in this request it will make easy to review all changes at once.

@muvarov
Copy link
Contributor

muvarov commented Nov 14, 2017

@MatiasElo few questions in that PR.

Disable debug checks by default for optimal performance.

Signed-off-by: Matias Elo <matias.elo@nokia.com>
Reorder pkt_dpdk_t members so that all struct members used in fast path
are located on the same cache line.

Signed-off-by: Matias Elo <matias.elo@nokia.com>
Data size is constant for all buffers from the same pool, so there is no
need to store the value in the buffer header.

Signed-off-by: Matias Elo <matias.elo@nokia.com>
Pool entry pointer is already stored in the odp_buffer_hdr_t struct.

Signed-off-by: Matias Elo <matias.elo@nokia.com>
Add inlined accessors for pool_t members pool_hdl and uarea_size. This
enables removing the matching members from odp_buffer_hdr_t, and thus
reduces per buffer overhead.

Signed-off-by: Matias Elo <matias.elo@nokia.com>
Since odp_pktio_capability_t is common for all pktio implementations move
it inside pktio_entry_t.

Signed-off-by: Matias Elo <matias.elo@nokia.com>
@muvarov muvarov changed the title [PATCH v1] Misc fast path optimizations [PATCH v2] Misc fast path optimizations Nov 15, 2017
@muvarov
Copy link
Contributor

muvarov commented Nov 16, 2017

Merged.

@muvarov muvarov closed this Nov 16, 2017
@MatiasElo MatiasElo deleted the dev/misc_perf_opt branch November 17, 2017 08:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants