System does not cleanly shut down when system drive is an external USB HDD #2245

Open
andrewdavidwong opened this Issue Aug 13, 2016 · 1 comment

Comments

@andrewdavidwong
Member

andrewdavidwong commented Aug 13, 2016

On 2016-08-12 15:31, johnyjukya@[...].org wrote:

I realize USB drives (or USB anything) are a stupid, stupid idea when it
comes to being security-conscious, but while trying out Qubes, I do have
my root drive on an external USB HD.

(And there's something to be said for taking your drive with you.)

It works great in general, is fast enough, and seems very reliable.

Until shutdown time.

Things seem to shut down okay, but on the following boot, I see complaints
about disk errors in the journal; it runs an fsck but fails and drops
into read-only mode, which prevents proper system startup, so most of the
system just sits there doing nothing, presumably waiting for the root
drive to go read-write, which it never does.

Doing an fsck from a console terminal on the read-only partition notes that
journal repair is required, does its work, and says everything is ducky.

Then I reboot, and get the same problem.

If I reboot into Tails, do the cryptsetup, lvmchange, fsck to tidy up the
drive, and then reboot, the system will start up okay.
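
(Roughly, the repair sequence from the Tails side looks like this; the LVM
activation is what I loosely called "lvmchange" above, and the device node,
mapping name, and volume-group name are placeholders rather than my exact
ones:)

    cryptsetup luksOpen /dev/sdb2 qubes-root   # unlock the LUKS container on the USB drive
    vgchange -ay qubes_dom0                    # activate the LVM volume group inside it
    fsck -fy /dev/qubes_dom0/root              # replay/repair the ext4 journal on the root LV
    vgchange -an qubes_dom0                    # deactivate the VG again
    cryptsetup luksClose qubes-root            # close the LUKS mapping before rebooting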

So every time I reboot the system, I need to first boot into Tails to
repair the drive in order to get back into Qubes. That's more than a minor
inconvenience, and booting other OSes always adds to the risk of
compromise.

I'll try moving the drive onto the SATA bus and see if the problem goes
away, just to verify that it's a USB-only thing.

It's also a bit weird that it gives a disk error (an I/O buffer error, I
think). I've run badblocks on the drive repeatedly, and it's healthy and
fine, not a bad sector to be found. But something in the Qubes
shutdown/startup is making Qubes think there are bad sectors (maybe a
problem at the LUKS or LVM level? The ext4/LVM/LUKS/USB layering might not
be shutting down cleanly.)

(At boot, things fly by pretty quickly, and at the point where the system
crashes, I don't have the logs stored in a file, but it almost appears to me
that more than one fsck gets run [perhaps stepping on each other]. That's
just a hunch, though; I'll try to narrow it down more. The fact that the
journal needs recovering at all after a normal shutdown remains a problem.)

The system is an AMD64 with 4 GB of memory and a JMicron SATA-to-USB
controller on a 2.5" 500 GB Samsung drive.

JJ

@andrewdavidwong
Member

andrewdavidwong commented Aug 19, 2016

Update:

On 2016-08-19 06:18, johnyjukya@[...].org wrote:

This problem persists in 3.2rc2.

(And I get 0 errors on the same USB drive under Tails. When I can find
the SATA power connector around here somewhere, I'll try moving the drive
directly onto the SATA bus.)

I think the problem may be that systemd has a default 90-second timeout
on jobs, including unmounting root.

On an external USB drive, due to slower transfer speeds, the shutdown
process of all the VMs, killing processes, flushing buffers, etc.,
happens to take long enough that a clean unmount of the drive doesn't get
a chance to occur, leading to a corrupted filesystem.
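
(If that's right, one crude way to test it in dom0 would be to check the
stock default and then raise it; this is untested on my side, and the 300s
value below is arbitrary:)

    systemctl show -p DefaultTimeoutStopUSec   # what the manager currently uses

    # /etc/systemd/system.conf
    [Manager]
    DefaultTimeoutStopSec=300s

followed by a reboot, or "systemctl daemon-reexec", for it to take effect.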

I am very new to systemd, but I believe the cause of my corruption is that
there may be a typo bug in one of the directives for systemd's
umount.target.

"systemctl show umount.target" reveals:

JobTimeoutUSec=0

"man systemd.directives" and "man system.unit" do not show any such
directive; however, they do show "JobTimeoutSec" which I believe was
likely the intended directive, and which would set no limit on waiting for
that shutdown filesystem unmount, and I believe would prevent the
corruption I was seeing.

A zgrep of all the man pages shows no indication of JobTimeoutUSec being a
legit property.
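
(If anyone wants to experiment with the directive as the man pages spell it,
a drop-in along these lines would be the place to try; the file location,
the target, and the "infinity" value are my guesses, and I haven't verified
that it actually prevents the corruption:)

    # /etc/systemd/system/umount.target.d/timeout.conf
    [Unit]
    JobTimeoutSec=infinity

followed by "systemctl daemon-reload" so systemd picks up the drop-in.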

Cheers.

JJ
