Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replay of blktrace containing Trim operations causes assert #771

Closed
shimrot opened this issue Apr 18, 2019 · 6 comments
Closed

Replay of blktrace containing Trim operations causes assert #771

shimrot opened this issue Apr 18, 2019 · 6 comments
Labels
needreporterinfo Waiting on information from the issue reporter

Comments

@shimrot
Copy link
Contributor

shimrot commented Apr 18, 2019

Replay of blktrace containing trim operations gets assert failure:

> fio: fio.h:552: fio_ro_check: Assertion `!(io_u->ddir == DDIR_WRITE && !((td)->o.td_ddir & TD_DDIR_WRITE)) && !(io_u->ddir == DDIR_TRIM && !((td)->o.td_ddir & TD_DDIR_TRIM))' f 

To reproduce:

  1. Start blktrace:

    > blktrace -d /dev/nvme0n1
    
  2. Run fio with some trim operations:

    > fio --output="fio.out" --direct=1 --filename="/dev/nvme0n1" --ioengine=libaio --norandommap=1 --random_generator=tausworthe64 --randrepeat=0 --runtime=1m --time_based=1 --name="Measure_RW_256KB_QD1" --bs=256KB1 --rate=64000KB --rw=randwrite --name="Measure_Trim_256MB_QD1" --bs=256MB --rate=64000KB --rw=randtrim
    
  3. Create blkparse bin file:

    > blkparse nvme0n1 -o /dev/null -d nvme0n1.blkparse.bin
    
  4. Replay gives fio_ro_check assertion:

    > fio  --direct=1 --filename="/dev/nvme0n1" --replay_redirect="/dev/nvme0n1" --ioengine=libaio --randrepeat=0 --name="Replay" --ramp_time=0s --read_iolog=nvme0n1.blkparse.bin Replay: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
    fio-3.13-27-gef32d
    Starting 1 process
    fio: fio.h:552: fio_ro_check: Assertion `!(io_u->ddir == DDIR_WRITE && !((td)->o.td_ddir & TD_DDIR_WRITE)) && !(io_u->ddir == DDIR_TRIM && !((td)->o.td_ddir & TD_DDIR_TRIM))' failed.
    fio: pid=44785, got signal=6eta 18h:50m:20s]
    
    Replay: (groupid=0, jobs=1): err= 0: pid=44785: Thu Apr 18 13:19:07 2019
      lat (usec)   : 100=2.44%, 250=7.32%, 500=68.29%, 750=4.88%, 1000=2.44%
      lat (msec)   : 2=2.44%, 4=4.88%
      cpu          : usr=0.00%, sys=0.00%, ctx=0, majf=0, minf=0
      IO depths    : 1=2.4%, 2=4.9%, 4=92.7%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued rwts: total=41,0,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=4
    
    Run status group 0 (all jobs):
    
    Disk stats (read/write):
      nvme0n1: ios=58/0, merge=0/0, ticks=3/0, in_queue=4, util=0.04%
    fio: file hash not empty on exit
    

Debugging, I found that the assertion is not finding an active TD_DDIR_TRIM flag in td_ddir.
The following TD_DDIR_TRIM addition does seem to fix, but I'm unclear if there are other implications. Perhaps it's better to be another else if leg, or perhaps a change to the assertion check.

diff --git a/blktrace.c b/blktrace.c
index efe9ce2..ad0c786 100644
--- a/blktrace.c
+++ b/blktrace.c
@@ -551,7 +551,7 @@
 		td->o.td_ddir = TD_DDIR_WRITE;
 		td->o.max_bs[DDIR_WRITE] = rw_bs[DDIR_WRITE];
 	} else {
-		td->o.td_ddir = TD_DDIR_RW;
+		td->o.td_ddir = TD_DDIR_RW | TD_DDIR_TRIM;
 		td->o.max_bs[DDIR_READ] = rw_bs[DDIR_READ];
 		td->o.max_bs[DDIR_WRITE] = rw_bs[DDIR_WRITE];
 		td->o.max_bs[DDIR_TRIM] = rw_bs[DDIR_TRIM]; 

Thanks

@shimrot
Copy link
Contributor Author

shimrot commented Apr 20, 2019

I'm sorry - that prior patch was a hack - and doesn't work in all cases.
May I suggest the following instead:

diff --git a/blktrace.c b/blktrace.c
index efe9ce2..daf0689 100644
--- a/blktrace.c
+++ b/blktrace.c
@@ -544,16 +544,19 @@
 	    !ios[DDIR_SYNC]) {
 		log_err("fio: found no ios in blktrace data\n");
 		return false;
-	} else if (ios[DDIR_READ] && !ios[DDIR_WRITE]) {
-		td->o.td_ddir = TD_DDIR_READ;
+	}
+
+	td->o.td_ddir = 0;
+	if (ios[DDIR_READ] && !ios[DDIR_WRITE]) {
+		td->o.td_ddir |= TD_DDIR_READ;
 		td->o.max_bs[DDIR_READ] = rw_bs[DDIR_READ];
-	} else if (!ios[DDIR_READ] && ios[DDIR_WRITE]) {
-		td->o.td_ddir = TD_DDIR_WRITE;
+	}
+	if (ios[DDIR_WRITE]) {
+		td->o.td_ddir |= TD_DDIR_WRITE;
 		td->o.max_bs[DDIR_WRITE] = rw_bs[DDIR_WRITE];
-	} else {
-		td->o.td_ddir = TD_DDIR_RW;
-		td->o.max_bs[DDIR_READ] = rw_bs[DDIR_READ];
-		td->o.max_bs[DDIR_WRITE] = rw_bs[DDIR_WRITE];
+	}
+	if (ios[DDIR_TRIM]) {
+		td->o.td_ddir |= TD_DDIR_TRIM;
 		td->o.max_bs[DDIR_TRIM] = rw_bs[DDIR_TRIM];
 	}

That is, replace this:

*** 541,562 ****
  						td->o.name, skipped_writes);
  
  	if (!ios[DDIR_READ] && !ios[DDIR_WRITE] && !ios[DDIR_TRIM] &&
  	    !ios[DDIR_SYNC]) {
  		log_err("fio: found no ios in blktrace data\n");
  		return false;
! 	} else if (ios[DDIR_READ] && !ios[DDIR_WRITE]) {
! 		td->o.td_ddir = TD_DDIR_READ;
  		td->o.max_bs[DDIR_READ] = rw_bs[DDIR_READ];
! 	} else if (!ios[DDIR_READ] && ios[DDIR_WRITE]) {
! 		td->o.td_ddir = TD_DDIR_WRITE;
  		td->o.max_bs[DDIR_WRITE] = rw_bs[DDIR_WRITE];
! 	} else {
! 		td->o.td_ddir = TD_DDIR_RW;
! 		td->o.max_bs[DDIR_READ] = rw_bs[DDIR_READ];
! 		td->o.max_bs[DDIR_WRITE] = rw_bs[DDIR_WRITE];
  		td->o.max_bs[DDIR_TRIM] = rw_bs[DDIR_TRIM];
  	}

with this:

--- 541,565 ----
  						td->o.name, skipped_writes);
  
  	if (!ios[DDIR_READ] && !ios[DDIR_WRITE] && !ios[DDIR_TRIM] &&
  	    !ios[DDIR_SYNC]) {
  		log_err("fio: found no ios in blktrace data\n");
  		return false;
! 	}
! 
! 	td->o.td_ddir = 0;
! 	if (ios[DDIR_READ] && !ios[DDIR_WRITE]) {
! 		td->o.td_ddir |= TD_DDIR_READ;
  		td->o.max_bs[DDIR_READ] = rw_bs[DDIR_READ];
! 	}
! 	if (ios[DDIR_WRITE]) {
! 		td->o.td_ddir |= TD_DDIR_WRITE;
  		td->o.max_bs[DDIR_WRITE] = rw_bs[DDIR_WRITE];
! 	}
! 	if (ios[DDIR_TRIM]) {
! 		td->o.td_ddir |= TD_DDIR_TRIM;
  		td->o.max_bs[DDIR_TRIM] = rw_bs[DDIR_TRIM];
  	}

Thanks

@sitsofe
Copy link
Collaborator

sitsofe commented May 31, 2019

Might be a dup of #769 but this issue (771) has more information in it.

@shimrot
Copy link
Contributor Author

shimrot commented May 31, 2019

#769 might be a similar problem, but that occurs using the 'legacy' replay. This issue, #771, is the same as the pull request: #777, which I opened created after there was not attention on this for a month. Maybe one or the other should be closed - how would you prefer to get issues?
(I see few push requests, instead most complex patches and issue reports seem to go through the mailing list. What do you prefer? I personally just like github because descriptions can support some formatting)

@sitsofe
Copy link
Collaborator

sitsofe commented May 31, 2019

@shimrot I follow both but I'm more likely to do a drive by review on GitHub (as you've found '-). So long as you clearly link both together I think we'll be fine continuing with the approaches you've taken so far...

@sitsofe
Copy link
Collaborator

sitsofe commented Oct 10, 2019

@shimrot has this been fixed by #777 being merged?

@sitsofe sitsofe added needreporterinfo Waiting on information from the issue reporter and removed needsattention labels Oct 16, 2019
@sitsofe
Copy link
Collaborator

sitsofe commented Oct 28, 2019

Closing due to lack of reply from reporter. If this issue is still happening with the latest fio (see https://github.com/axboe/fio/releases to find out which version that is) please reopen. Thanks!

@sitsofe sitsofe closed this as completed Oct 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needreporterinfo Waiting on information from the issue reporter
Projects
None yet
Development

No branches or pull requests

2 participants