Per-command priority: Priority logging and libaio/io_uring cmdprio_percentage

Add cmdprio_percentage option to libaio and io_uring engines to set
ioprio on a per-command basis. Add tracking of high priority
commands to be displayed separately in human readable and JSON
outputs.
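
The separate tracking described above can be pictured as two latency accumulators, with each completion routed by a priority flag. This is only a minimal sketch under assumptions: `lat_summary`, `prio_lat_stats`, `record_clat` and `lat_summary_mean` are hypothetical names for illustration and are not fio's actual statistics structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical running summary of completion latencies, in nanoseconds. */
struct lat_summary {
	uint64_t samples;
	uint64_t total_ns;
	uint64_t max_ns;
};

/* Hypothetical per-job container: one summary per priority level. */
struct prio_lat_stats {
	struct lat_summary high_prio;
	struct lat_summary low_prio;
};

static void lat_summary_add(struct lat_summary *s, uint64_t nsec)
{
	s->samples++;
	s->total_ns += nsec;
	if (nsec > s->max_ns)
		s->max_ns = nsec;
}

/* Route one completion latency to the bucket matching its priority. */
static void record_clat(struct prio_lat_stats *st, uint64_t nsec, bool high_prio)
{
	assert(st != NULL);
	lat_summary_add(high_prio ? &st->high_prio : &st->low_prio, nsec);
}

/* Mean latency in nanoseconds, or 0 when no samples were recorded. */
static uint64_t lat_summary_mean(const struct lat_summary *s)
{
	return s->samples ? s->total_ns / s->samples : 0;
}
```

Keeping the two buckets apart means each can be reported on its own line, which is what lets a reader compare the latency of prioritized commands against the rest.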
trinof committed Jan 22, 2020
1 parent 7a4e480 commit b2a432b
Showing 19 changed files with 392 additions and 72 deletions.
38 changes: 25 additions & 13 deletions HOWTO
@@ -2034,21 +2034,29 @@ In addition, there are some parameters which are only valid when a specific
with the caveat that when used on the command line, they must come after the
:option:`ioengine` that defines them is selected.

.. option:: hipri : [io_uring]
.. option:: cmdprio_percentage=int : [io_uring] [libaio]

If this option is set, fio will attempt to use polled IO completions.
Normal IO completions generate interrupts to signal the completion of
IO, polled completions do not. Hence they require active reaping
by the application. The benefits are more efficient IO for high IOPS
scenarios, and lower latencies for low queue depth IO.
Set the percentage of I/O that will be issued with higher priority by setting
the priority bit. Non-read I/O is likely unaffected by ``cmdprio_percentage``.
This option cannot be used with the `prio` or `prioclass` options. For this
option to set the priority bit properly, NCQ priority must be supported and
enabled and :option:`direct`\=1 option must be used.

.. option:: fixedbufs : [io_uring]

If fio is asked to do direct IO, then Linux will map pages for each
IO call, and release them when IO is done. If this option is set, the
pages are pre-mapped before IO is started. This eliminates the need to
map and release for each IO. This is more efficient, and reduces the
IO latency as well.
If fio is asked to do direct IO, then Linux will map pages for each
IO call, and release them when IO is done. If this option is set, the
pages are pre-mapped before IO is started. This eliminates the need to
map and release for each IO. This is more efficient, and reduces the
IO latency as well.

.. option:: hipri : [io_uring]

If this option is set, fio will attempt to use polled IO completions.
Normal IO completions generate interrupts to signal the completion of
IO, polled completions do not. Hence they require active reaping
by the application. The benefits are more efficient IO for high IOPS
scenarios, and lower latencies for low queue depth IO.

.. option:: registerfiles : [io_uring]

@@ -2692,11 +2700,15 @@ Threads, processes and job synchronization
Set the I/O priority value of this job. Linux limits us to a positive value
between 0 and 7, with 0 being the highest. See man
:manpage:`ionice(1)`. Refer to an appropriate manpage for other operating
systems since meaning of priority may differ.
systems since meaning of priority may differ. For per-command priority
setting, see I/O engine specific `cmdprio_percentage` and `hipri_percentage`
options.

.. option:: prioclass=int

Set the I/O priority class. See man :manpage:`ionice(1)`.
Set the I/O priority class. See man :manpage:`ionice(1)`. For per-command
priority setting, see I/O engine specific `cmdprio_percentage` and
`hipri_percentage` options.

.. option:: cpus_allowed=str

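The HOWTO text in this file states that `cmdprio_percentage` cannot be combined with `prio` or `prioclass`. A minimal sketch of that rule as a standalone check follows. The struct and function names are hypothetical, and the real check in this commit lives in each engine's init function and uses fio's own option helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the options involved in the conflict. */
struct demo_prio_opts {
	bool prio_set;                   /* job-wide prio= was given */
	bool prioclass_set;              /* job-wide prioclass= was given */
	unsigned int cmdprio_percentage; /* 0 means the option is unused */
};

/*
 * Return 0 when the combination is acceptable, -EINVAL when the
 * percentage is out of range or per-command and job-wide priority
 * settings are mixed.
 */
static int demo_check_prio_opts(const struct demo_prio_opts *o)
{
	assert(o != NULL);
	if (o->cmdprio_percentage > 100)
		return -EINVAL;
	if (o->cmdprio_percentage && (o->prio_set || o->prioclass_set))
		return -EINVAL;
	return 0;
}
```

Rejecting the combination up front avoids an ambiguous run in which it is unclear whether the job-wide or the per-command priority applied to a given I/O.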
10 changes: 9 additions & 1 deletion client.c
@@ -1032,6 +1032,14 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
dst->nr_block_infos = le64_to_cpu(src->nr_block_infos);
for (i = 0; i < dst->nr_block_infos; i++)
dst->block_infos[i] = le32_to_cpu(src->block_infos[i]);
for (i = 0; i < DDIR_RWDIR_CNT; i++) {
for (j = 0; j < FIO_IO_U_PLAT_NR; j++) {
dst->io_u_plat_high_prio[i][j] = le64_to_cpu(src->io_u_plat_high_prio[i][j]);
dst->io_u_plat_prio[i][j] = le64_to_cpu(src->io_u_plat_prio[i][j]);
}
convert_io_stat(&dst->clat_high_prio_stat[i], &src->clat_high_prio_stat[i]);
convert_io_stat(&dst->clat_prio_stat[i], &src->clat_prio_stat[i]);
}

dst->ss_dur = le64_to_cpu(src->ss_dur);
dst->ss_state = le32_to_cpu(src->ss_state);
@@ -1693,7 +1701,7 @@ static struct cmd_iolog_pdu *convert_iolog(struct fio_net_cmd *cmd,

s->time = le64_to_cpu(s->time);
s->data.val = le64_to_cpu(s->data.val);
s->__ddir = le32_to_cpu(s->__ddir);
s->__ddir = __le32_to_cpu(s->__ddir);
s->bs = le64_to_cpu(s->bs);

if (ret->log_offset) {
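
The `convert_ts()` hunk in this file walks a two-dimensional table of little-endian counters and converts each entry to host byte order. A self-contained sketch of the same idea is shown here, assuming a raw byte buffer as input. `load_le64`, `convert_table` and the small `DEMO_*` dimensions are hypothetical stand-ins, not fio's `le64_to_cpu` or its real array sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ROWS 3 /* stands in for DDIR_RWDIR_CNT */
#define DEMO_COLS 4 /* stands in for FIO_IO_U_PLAT_NR */

/* Decode one 64-bit little-endian value, independent of host byte order. */
static uint64_t load_le64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Convert a row-major table of little-endian counters into host order. */
static void convert_table(uint64_t dst[DEMO_ROWS][DEMO_COLS],
			  const unsigned char *wire, size_t wire_len)
{
	assert(wire_len >= DEMO_ROWS * DEMO_COLS * sizeof(uint64_t));

	for (size_t i = 0; i < DEMO_ROWS; i++)
		for (size_t j = 0; j < DEMO_COLS; j++)
			dst[i][j] = load_le64(wire + (i * DEMO_COLS + j) * 8);
}
```

Assembling the value byte by byte sidesteps any dependence on the host's endianness, at the cost of being slower than a byte-swap intrinsic.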
2 changes: 1 addition & 1 deletion engines/filecreate.c
@@ -49,7 +49,7 @@ static int open_file(struct thread_data *td, struct fio_file *f)
uint64_t nsec;

nsec = ntime_since_now(&start);
add_clat_sample(td, data->stat_ddir, nsec, 0, 0);
add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0);
}

return 0;
2 changes: 1 addition & 1 deletion engines/filestat.c
@@ -53,7 +53,7 @@ static int stat_file(struct thread_data *td, struct fio_file *f)
uint64_t nsec;

nsec = ntime_since_now(&start);
add_clat_sample(td, data->stat_ddir, nsec, 0, 0);
add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0);
}

return 0;
47 changes: 47 additions & 0 deletions engines/io_uring.c
@@ -70,6 +70,7 @@ struct ioring_data {
struct ioring_options {
void *pad;
unsigned int hipri;
unsigned int cmdprio_percentage;
unsigned int fixedbufs;
unsigned int registerfiles;
unsigned int sqpoll_thread;
@@ -108,6 +109,26 @@ static struct fio_option options[] = {
.category = FIO_OPT_C_ENGINE,
.group = FIO_OPT_G_IOURING,
},
#ifdef FIO_HAVE_IOPRIO_CLASS
{
.name = "cmdprio_percentage",
.lname = "high priority percentage",
.type = FIO_OPT_INT,
.off1 = offsetof(struct ioring_options, cmdprio_percentage),
.minval = 1,
.maxval = 100,
.help = "Send high priority I/O this percentage of the time",
.category = FIO_OPT_C_ENGINE,
.group = FIO_OPT_G_IOURING,
},
#else
{
.name = "cmdprio_percentage",
.lname = "high priority percentage",
.type = FIO_OPT_UNSUPPORTED,
.help = "Your platform does not support I/O priority classes",
},
#endif
{
.name = "fixedbufs",
.lname = "Fixed (pre-mapped) IO buffers",
@@ -313,11 +334,23 @@ static int fio_ioring_getevents(struct thread_data *td, unsigned int min,
return r < 0 ? r : events;
}

static void fio_ioring_prio_prep(struct thread_data *td, struct io_u *io_u)
{
struct ioring_options *o = td->eo;
struct ioring_data *ld = td->io_ops_data;
if (rand_between(&td->prio_state, 0, 99) < o->cmdprio_percentage) {
ld->sqes[io_u->index].ioprio = IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT;
io_u->flags |= IO_U_F_PRIORITY;
}
return;
}

static enum fio_q_status fio_ioring_queue(struct thread_data *td,
struct io_u *io_u)
{
struct ioring_data *ld = td->io_ops_data;
struct io_sq_ring *ring = &ld->sq_ring;
struct ioring_options *o = td->eo;
unsigned tail, next_tail;

fio_ro_check(td, io_u);
@@ -343,6 +376,8 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,

/* ensure sqe stores are ordered with tail update */
write_barrier();
if (o->cmdprio_percentage)
fio_ioring_prio_prep(td, io_u);
ring->array[tail & ld->sq_ring_mask] = io_u->index;
*ring->tail = next_tail;
write_barrier();
@@ -618,6 +653,7 @@ static int fio_ioring_init(struct thread_data *td)
{
struct ioring_options *o = td->eo;
struct ioring_data *ld;
struct thread_options *to = &td->o;

/* sqthread submission requires registered files */
if (o->sqpoll_thread)
@@ -640,6 +676,17 @@ static int fio_ioring_init(struct thread_data *td)
ld->iovecs = calloc(td->o.iodepth, sizeof(struct iovec));

td->io_ops_data = ld;

/*
* Check for option conflicts
*/
if ((fio_option_is_set(to, ioprio) || fio_option_is_set(to, ioprio_class)) &&
o->cmdprio_percentage != 0) {
log_err("%s: cmdprio_percentage option and mutually exclusive "
"prio or prioclass option is set, exiting\n", to->name);
td_verror(td, EINVAL, "fio_io_uring_init");
return 1;
}
return 0;
}

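`fio_ioring_prio_prep()` above draws a number in the range 0 to 99 and flags the command as high priority when the draw is below `cmdprio_percentage`. The sketch below reproduces only that selection rule in isolation. The xorshift generator and the `demo_*`, `pick_high_prio` and `count_high_prio` names are assumptions for illustration, since fio uses its own `frand_state` and `rand_between()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tiny PRNG state (xorshift32); the seed must be non-zero. */
struct demo_rng {
	uint32_t s;
};

static uint32_t demo_rng_next(struct demo_rng *r)
{
	uint32_t x = r->s;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	r->s = x;
	return x;
}

/* Return 1 when a draw in [0, 99] falls below the requested percentage. */
static int pick_high_prio(struct demo_rng *r, unsigned int percentage)
{
	assert(percentage <= 100);
	return demo_rng_next(r) % 100 < percentage;
}

/* Count how many of n simulated commands would be flagged high priority. */
static unsigned int count_high_prio(struct demo_rng *r,
				    unsigned int percentage, unsigned int n)
{
	unsigned int hits = 0;

	for (unsigned int i = 0; i < n; i++)
		hits += pick_high_prio(r, percentage);
	return hits;
}
```

Because the decision is made independently per command, the achieved share of high-priority I/O only approaches the configured percentage over many commands. A percentage of 100 flags every command, since every draw is below 100.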
54 changes: 54 additions & 0 deletions engines/libaio.c
@@ -16,7 +16,13 @@
#include "../optgroup.h"
#include "../lib/memalign.h"

/* Should be defined in newest aio_abi.h */
#ifndef IOCB_FLAG_IOPRIO
#define IOCB_FLAG_IOPRIO (1 << 1)
#endif

static int fio_libaio_commit(struct thread_data *td);
static int fio_libaio_init(struct thread_data *td);

struct libaio_data {
io_context_t aio_ctx;
@@ -44,6 +50,7 @@ struct libaio_data {
struct libaio_options {
void *pad;
unsigned int userspace_reap;
unsigned int cmdprio_percentage;
};

static struct fio_option options[] = {
@@ -56,6 +63,26 @@ static struct fio_option options[] = {
.category = FIO_OPT_C_ENGINE,
.group = FIO_OPT_G_LIBAIO,
},
#ifdef FIO_HAVE_IOPRIO_CLASS
{
.name = "cmdprio_percentage",
.lname = "high priority percentage",
.type = FIO_OPT_INT,
.off1 = offsetof(struct libaio_options, cmdprio_percentage),
.minval = 1,
.maxval = 100,
.help = "Send high priority I/O this percentage of the time",
.category = FIO_OPT_C_ENGINE,
.group = FIO_OPT_G_LIBAIO,
},
#else
{
.name = "cmdprio_percentage",
.lname = "high priority percentage",
.type = FIO_OPT_UNSUPPORTED,
.help = "Your platform does not support I/O priority classes",
},
#endif
{
.name = NULL,
},
@@ -85,6 +112,17 @@ static int fio_libaio_prep(struct thread_data fio_unused *td, struct io_u *io_u)
return 0;
}

static void fio_libaio_prio_prep(struct thread_data *td, struct io_u *io_u)
{
struct libaio_options *o = td->eo;
if (rand_between(&td->prio_state, 0, 99) < o->cmdprio_percentage) {
io_u->iocb.aio_reqprio = IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT;
io_u->iocb.u.c.flags |= IOCB_FLAG_IOPRIO;
io_u->flags |= IO_U_F_PRIORITY;
}
return;
}

static struct io_u *fio_libaio_event(struct thread_data *td, int event)
{
struct libaio_data *ld = td->io_ops_data;
@@ -188,6 +226,7 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
struct io_u *io_u)
{
struct libaio_data *ld = td->io_ops_data;
struct libaio_options *o = td->eo;

fio_ro_check(td, io_u);

@@ -218,6 +257,9 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
return FIO_Q_COMPLETED;
}

if (o->cmdprio_percentage)
fio_libaio_prio_prep(td, io_u);

ld->iocbs[ld->head] = &io_u->iocb;
ld->io_us[ld->head] = io_u;
ring_inc(ld, &ld->head, 1);
@@ -358,6 +400,8 @@ static int fio_libaio_post_init(struct thread_data *td)
static int fio_libaio_init(struct thread_data *td)
{
struct libaio_data *ld;
struct thread_options *to = &td->o;
struct libaio_options *o = td->eo;

ld = calloc(1, sizeof(*ld));

@@ -368,6 +412,16 @@ static int fio_libaio_init(struct thread_data *td)
ld->io_us = calloc(ld->entries, sizeof(struct io_u *));

td->io_ops_data = ld;
/*
* Check for option conflicts
*/
if ((fio_option_is_set(to, ioprio) || fio_option_is_set(to, ioprio_class)) &&
o->cmdprio_percentage != 0) {
log_err("%s: cmdprio_percentage option and mutually exclusive "
"prio or prioclass option is set, exiting\n", to->name);
td_verror(td, EINVAL, "fio_libaio_init");
return 1;
}
return 0;
}

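Both engines set the per-command priority to `IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT`, which is the real-time class with priority level 0. The sketch below shows how such a value is packed and unpacked. It assumes the Linux layout of a 13-bit shift with class numbers 0 to 3, and the `DEMO_*` and `demo_*` names are local stand-ins rather than the kernel's own macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror Linux's ioprio layout: class above a 13-bit level field. */
#define DEMO_IOPRIO_CLASS_SHIFT 13
#define DEMO_IOPRIO_PRIO_MASK ((1u << DEMO_IOPRIO_CLASS_SHIFT) - 1)

enum demo_ioprio_class {
	DEMO_IOPRIO_CLASS_NONE = 0,
	DEMO_IOPRIO_CLASS_RT = 1,
	DEMO_IOPRIO_CLASS_BE = 2,
	DEMO_IOPRIO_CLASS_IDLE = 3,
};

/* Pack a class and a level (0 is highest, 7 is lowest) into one value. */
static uint16_t demo_ioprio_value(enum demo_ioprio_class cls, unsigned int level)
{
	assert(level <= 7);
	return (uint16_t)(((unsigned int)cls << DEMO_IOPRIO_CLASS_SHIFT) | level);
}

/* Extract the class from a packed value. */
static unsigned int demo_ioprio_class_of(uint16_t v)
{
	return v >> DEMO_IOPRIO_CLASS_SHIFT;
}

/* Extract the level from a packed value. */
static unsigned int demo_ioprio_level_of(uint16_t v)
{
	return v & DEMO_IOPRIO_PRIO_MASK;
}
```

Under these assumptions the value the engines write is `0x2000`, so a command flagged by `cmdprio_percentage` always carries the real-time class at its highest level.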
6 changes: 3 additions & 3 deletions eta.c
@@ -509,9 +509,9 @@ bool calc_thread_status(struct jobs_eta *je, int force)
calc_rate(unified_rw_rep, rate_time, io_bytes, rate_io_bytes,
je->rate);
memcpy(&rate_prev_time, &now, sizeof(now));
add_agg_sample(sample_val(je->rate[DDIR_READ]), DDIR_READ, 0);
add_agg_sample(sample_val(je->rate[DDIR_WRITE]), DDIR_WRITE, 0);
add_agg_sample(sample_val(je->rate[DDIR_TRIM]), DDIR_TRIM, 0);
add_agg_sample(sample_val(je->rate[DDIR_READ]), DDIR_READ, 0, 0);
add_agg_sample(sample_val(je->rate[DDIR_WRITE]), DDIR_WRITE, 0, 0);
add_agg_sample(sample_val(je->rate[DDIR_TRIM]), DDIR_TRIM, 0, 0);
}

disp_time = mtime_since(&disp_prev_time, &now);
27 changes: 19 additions & 8 deletions fio.1
@@ -1795,19 +1795,26 @@ In addition, there are some parameters which are only valid when a specific
with the caveat that when used on the command line, they must come after the
\fBioengine\fR that defines them is selected.
.TP
.BI (io_uring)hipri
If this option is set, fio will attempt to use polled IO completions. Normal IO
completions generate interrupts to signal the completion of IO, polled
completions do not. Hence they require active reaping by the application.
The benefits are more efficient IO for high IOPS scenarios, and lower latencies
for low queue depth IO.
.BI (io_uring, libaio)cmdprio_percentage \fR=\fPint
Set the percentage of I/O that will be issued with higher priority by setting
the priority bit. Non-read I/O is likely unaffected by ``cmdprio_percentage``.
This option cannot be used with the `prio` or `prioclass` options. For this
option to set the priority bit properly, NCQ priority must be supported and
enabled and `direct=1' option must be used.
.TP
.BI (io_uring)fixedbufs
If fio is asked to do direct IO, then Linux will map pages for each IO call, and
release them when IO is done. If this option is set, the pages are pre-mapped
before IO is started. This eliminates the need to map and release for each IO.
This is more efficient, and reduces the IO latency as well.
.TP
.BI (io_uring)hipri
If this option is set, fio will attempt to use polled IO completions. Normal IO
completions generate interrupts to signal the completion of IO, polled
completions do not. Hence they require active reaping by the application.
The benefits are more efficient IO for high IOPS scenarios, and lower latencies
for low queue depth IO.
.TP
.BI (io_uring)registerfiles
With this option, fio registers the set of files being used with the kernel.
This avoids the overhead of managing file counts in the kernel, making the
@@ -2386,10 +2393,14 @@ priority class.
Set the I/O priority value of this job. Linux limits us to a positive value
between 0 and 7, with 0 being the highest. See man
\fBionice\fR\|(1). Refer to an appropriate manpage for other operating
systems since meaning of priority may differ.
systems since meaning of priority may differ. For per-command priority
setting, see I/O engine specific `cmdprio_percentage` and `hipri_percentage`
options.
.TP
.BI prioclass \fR=\fPint
Set the I/O priority class. See man \fBionice\fR\|(1).
Set the I/O priority class. See man \fBionice\fR\|(1). For per-command
priority setting, see I/O engine specific `cmdprio_percentage` and `hipri_percent`
options.
.TP
.BI cpus_allowed \fR=\fPstr
Controls the same options as \fBcpumask\fR, but accepts a textual
2 changes: 2 additions & 0 deletions fio.h
@@ -139,6 +139,7 @@ enum {
FIO_RAND_ZONE_OFF,
FIO_RAND_POISSON2_OFF,
FIO_RAND_POISSON3_OFF,
FIO_RAND_PRIO_CMDS,
FIO_RAND_NR_OFFS,
};

@@ -258,6 +259,7 @@ struct thread_data {
struct frand_state buf_state_prev;
struct frand_state dedupe_state;
struct frand_state zone_state;
struct frand_state prio_state;

struct zone_split_index **zone_state_index;
