row size limit? getting "Error: disk I/O error" from dqlite-demo #432

Closed
nickchen opened this issue Nov 16, 2022 · 15 comments · Fixed by #445

@nickchen

Is there a row byte size limit?

libuv=1.44.2, libraft=0.15.0, sqlite3=3.39.4, dqlite=1.11.1

Running the dqlite-demo with the following:

server

CGO_LDFLAGS_ALLOW="-Wl,-z,now" CGO_ENABLED=1 go run  dqlite-demo.go --api 127.0.0.1:8001 --db 127.0.0.1:9001 -v

client

# base64 /dev/urandom | head -c 20000000 > file.txt
#  curl -X PUT -T "file.txt" http://127.0.0.1:8001/my-key-1
Error: disk I/O error
@MathieuBordere
Contributor

MathieuBordere commented Nov 16, 2022

Thanks for your submission, we'll have a look.

I can reproduce it reliably too, so this definitely looks like a bug. The first PUT request to a key seems to succeed; successive writes all fail.

@cole-miller cole-miller self-assigned this Nov 16, 2022
@cole-miller
Contributor

I can also reproduce this; now trying to figure out which sqlite3 API call is failing.

@cole-miller
Contributor

It's this one, I think:

req->status = sqlite3_step(req->stmt);

Trying to step into that call but GDB is being difficult...

@MathieuBordere
Contributor

It's this one, I think:

req->status = sqlite3_step(req->stmt);

Trying to step into that call but GDB is being difficult...

The error is SQLITE_IOERR_SHORT_READ; try looking for that in dqlite's codebase, it might give a clue.

@cole-miller
Contributor

Indeed, seems like we are hitting the wal->n_frames == 0 condition in vfsWalRead.

@nickchen
Author

Just checking in to see if there is any progress on this?

@cole-miller
Contributor

@nickchen, I haven't identified the root cause of the issue yet; I'll try to work on this some more today.

@cole-miller
Contributor

Just an update @nickchen: I'm focusing on #435 and the possibly-related #419 for now, so I likely won't be able to address this issue until next week at the earliest.

@cole-miller
Contributor

cole-miller commented Nov 30, 2022

I have time to work on this after all. Here is a backtrace:

0x7fde77e51c32 vfsWalRead
	src/vfs.c:844
0x7fde77e523a2 vfsFileRead
	src/vfs.c:932
0x7fde77fc4ee7 readDbPage
	./src/pager.c:2978
0x7fde77fc7fce getPageNormal
	./src/pager.c:5551
0x7fde77f49df4 btreeGetPage
	./src/btree.c:2143
0x7fde77f4ba22 btreeGetUnusedPage
	./src/btree.c:2286
0x7fde77f52015 allocateBtreePage
	./src/btree.c:6282
0x7fde77f55baf fillInCell
	./src/btree.c:6750
0x7fde77f55baf sqlite3BtreeInsert
	./src/btree.c:8984
0x7fde78000491 sqlite3VdbeExec
	./src/vdbe.c:5306
0x7fde7800c80e sqlite3Step
	./src/vdbeapi.c:716
0x7fde7800c80e sqlite3_step
	./src/vdbeapi.c:773
0x7fde7800c80e sqlite3_step
	./src/vdbeapi.c:761
0x7fde77e45193 leaderExecV2
	src/leader.c:355
0x7fde77e454ef execBarrierCb
	src/leader.c:393
0x7fde77e4590a raftBarrierCb
	src/leader.c:441
0x7fde77ecc970 applyBarrier
	src/replication.c:1368
0x7fde77ecc970 replicationApply
	src/replication.c:1605
0x7fde77ecded4 appendLeaderCb
	src/replication.c:523
0x7fde77ed463e uvAppendFinishRequestsInQueue
	src/uv_append.c:124
0x7fde77ed4efb uvAppendFinishWritingRequests
	src/uv_append.c:132
0x7fde77ed4efb uvAliveSegmentWriteCb
	src/uv_append.c:256
0x7fde77ee32ce uvWriterPollCb
	src/uv_writer.c:215
0x7fde77ae211d uv__io_poll
	./src/unix/epoll.c:374
0x7fde77acbc87 uv__io_poll
	./src/unix/udp.c:122
0x7fde77acbc87 uv_run
	./src/unix/core.c:389
0x7fde77e4bb2c taskRun
	src/server.c:632
0x7fde77e4bc63 taskStart
	src/server.c:658

What seems to happen is that a large enough value triggers the "overflow" condition in SQLite's fillInCell -- an extra page is needed to hold the row data. On the second attempt to insert, SQLite goes to allocate this overflow page, and wants to recycle a page that it's already allocated (for the previous insert, I assume). That leads to trying to read the existing page's content (not sure why), and SQLite thinks that the latest version of the page in question lives in the WAL -- which is incorrect, because the previous insert caused a checkpoint, so the WAL is empty. So it seems like a case of the WAL index file getting out of sync with the WAL following a checkpoint.
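
To make the overflow condition concrete, here is a small standalone sketch. The U - 35 threshold for table b-tree leaf cells comes from the SQLite file-format documentation; the page size and the 20000000-byte payload are just the values from the repro above, not anything read out of dqlite.

#include <stdio.h>

/* Standalone sketch: why a value this large spills onto overflow pages.
 * Per the SQLite file-format docs, a table b-tree leaf cell can keep at
 * most X = U - 35 bytes of payload in-page, where U is the usable page
 * size; anything beyond that goes to overflow pages, which is what
 * fillInCell/allocateBtreePage are handling in the backtrace above. */
int main(void)
{
        unsigned usable_size = 4096;               /* default page size, no reserved bytes */
        unsigned max_in_page = usable_size - 35;   /* X = U - 35 for table leaf cells */
        unsigned long long payload = 20000000ULL;  /* size of file.txt in the repro */

        printf("max in-page payload: %u bytes\n", max_in_page);
        printf("needs overflow pages: %s\n", payload > max_in_page ? "yes" : "no");
        return 0;
}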

@cole-miller
Contributor

Apparently I was wrong -- the WAL index does get updated following the checkpoint, and if you inspect it just before the sqlite3_step that causes the error, you can see that mxFrame = 0, reflecting that there are no frames in the WAL. I'm not sure why SQLite is trying to read from the WAL in that case -- need to read over the code again.

@cole-miller
Contributor

cole-miller commented Dec 6, 2022

Actually, it seems a bunch of frames are written to the WAL between the end of the checkpoint and the point where the error occurs, as part of the second EXEC, so it shouldn't be empty! But somehow when that vfsWalRead call happens, it doesn't find any frames.

Edit: Ahhh, the frames that are written to the WAL after the checkpoint are all part of a transaction, so they go into the tx field of struct vfsWal. But the "short read" check in vfsWalRead only looks at n_frames, not n_tx, even though the logic later on in that function does allow reading tx frames (which makes sense, because a transaction should be able to read the frames that it's just written). So I think that check should be changed to look at n_tx as well -- @freeekanayaka does that make sense to you?

dqlite/src/vfs.c

Lines 840 to 842 in ee7a3cd

if (w->n_frames == 0) {
        return SQLITE_IOERR_SHORT_READ;
}
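
For concreteness, a minimal sketch of the check adjusted along those lines. The n_tx field name is taken from the description above; the actual struct vfsWal layout and the final fix in #445 may differ.

/* Sketch only: report a short read only when there are neither
 * checkpointed frames nor frames written by the in-progress transaction,
 * so a transaction can read back frames it has just written even after a
 * checkpoint emptied the WAL. Field names follow the discussion above and
 * may not match dqlite's actual code. */
if (w->n_frames == 0 && w->n_tx == 0) {
        return SQLITE_IOERR_SHORT_READ;
}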

@nickchen
Author

nickchen commented Dec 7, 2022

Nice! When's the planned release for this?

@cole-miller
Contributor

@MathieuBordere are we ready to tag v1.13.0, or would you like to give disk mode some more time to cook?

@MathieuBordere
Contributor

I would have liked it to simmer a bit longer; we could do a release early next week?
