Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Signed LineageOS build fails to boot on OnePlus One (bacon) #78

Closed
hmenke opened this issue Dec 31, 2020 · 6 comments
Closed

Signed LineageOS build fails to boot on OnePlus One (bacon) #78

hmenke opened this issue Dec 31, 2020 · 6 comments
Labels
bug Something isn't working

Comments

@hmenke
Copy link
Contributor

hmenke commented Dec 31, 2020

When building a self-signed LineageOS image the resulting product does not boot on the OnePlus One (bacon). The device starts but remains stuck at the vendor image screen for several minutes. Then it restarts automatically into recovery saying “Cannot load Android system. Your data may be corrupt.”

The configuration is similar to

{
  device = "bacon";
  flavor = "lineagos";
  variant = "eng";
  signing.enable = true;
  keyStorePath = "...";
}

The setting variant = "eng"; was chosen to be able to attach ADB during boot and inspect the logs for indication as to why boot is failing. From the dmesg it looks like there is some kind of filesystem corruption which results in the system UI crashing with an I/O error:

[    6.247181] EXT4-fs error (device mmcblk0p14): ext4_xattr_block_get:307: inode #726: comm init: bad block 256947
[    6.247314] init: cannot execve('/system/bin/surfaceflinger'): I/O error

At first sight this looks like a hardware defect, but curiously an unsigned Robotnix build (signing.enable = false;), the stock ROM, and the official LineageOS ROM all boot fine. The filesystem image inside the signedTargetFiles also reports clean with e2fsck, so the corruption is not due to the image. In the attachment there are the dmesg outputs for an unsigned build that boots fine and a signed build that fails to boot.

Attachments

dmesg_signed.log
dmesg_unsigned.log

@hmenke
Copy link
Contributor Author

hmenke commented Dec 31, 2020

This might actually be a kernel bug, but it's still surprising that signing the Robotnix build triggers this.
https://bugzilla.kernel.org/show_bug.cgi?id=193661

@danielfullmer
Copy link
Collaborator

It might be worth trying to apply the patch referenced in that bug. You can easily do this in robotnix with

source.dirs."kernel/oppo/msm8974".patches = [ ./example.patch ] ;

@hmenke
Copy link
Contributor Author

hmenke commented Jan 1, 2021

I quickly checked and it appears that this patch is indeed not part of the kernel tree. The image is rebuilding now and I'll report back when I tested it.

@hmenke
Copy link
Contributor Author

hmenke commented Jan 2, 2021

The upstream patch is torvalds/linux@dac7a4b but that didn't apply cleanly, so I had to cherry-pick it by hand :/

Patch
From 29cf3a89ce8d6c17a66e8128fce06b36e3e7c62f Mon Sep 17 00:00:00 2001
From: Henri Menke <henri@henrimenke.de>
Date: Fri, 1 Jan 2021 22:32:44 +0100
Subject: [PATCH] ext4: lock the xattr block before checksuming it

We must lock the xattr block before calculating or verifying the
checksum in order to avoid spurious checksum failures.

https://bugzilla.kernel.org/show_bug.cgi?id=193661

Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
(cherry picked from commit dac7a4b4b1f664934e8b713f529b629f67db313c)
---
 fs/ext4/xattr.c | 60 ++++++++++++++++++++++++-------------------------
 1 file changed, 29 insertions(+), 31 deletions(-)

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 890bfe137f24..820cbda82e86 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -145,33 +145,27 @@ static __le32 ext4_xattr_block_csum(struct inode *inode,
 }
 
 static int ext4_xattr_block_csum_verify(struct inode *inode,
-					sector_t block_nr,
-					struct ext4_xattr_header *hdr)
+					struct buffer_head *bh)
 {
+	struct ext4_xattr_header *hdr = BHDR(bh);
+	int ret = 1;
 	if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
-		EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) &&
-	    (hdr->h_checksum != ext4_xattr_block_csum(inode, block_nr, hdr)))
-		return 0;
-	return 1;
+		    EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
+		lock_buffer(bh);
+		ret = (hdr->h_checksum == ext4_xattr_block_csum(inode,
+							bh->b_blocknr, hdr));
+		unlock_buffer(bh);
+	}
+	return ret;
 }
 
 static void ext4_xattr_block_csum_set(struct inode *inode,
-				      sector_t block_nr,
-				      struct ext4_xattr_header *hdr)
+				      struct buffer_head *bh)
 {
-	if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
-		EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
-		return;
-
-	hdr->h_checksum = ext4_xattr_block_csum(inode, block_nr, hdr);
-}
-
-static inline int ext4_handle_dirty_xattr_block(handle_t *handle,
-						struct inode *inode,
-						struct buffer_head *bh)
-{
-	ext4_xattr_block_csum_set(inode, bh->b_blocknr, BHDR(bh));
-	return ext4_handle_dirty_metadata(handle, inode, bh);
+	if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+		    EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		BHDR(bh)->h_checksum = ext4_xattr_block_csum(inode,
+						bh->b_blocknr, BHDR(bh));
 }
 
 static inline const struct xattr_handler *
@@ -232,7 +226,7 @@ ext4_xattr_check_block(struct inode *inode, struct buffer_head *bh)
 	if (buffer_verified(bh))
 		return 0;
 
-	if (!ext4_xattr_block_csum_verify(inode, bh->b_blocknr, BHDR(bh)))
+	if (!ext4_xattr_block_csum_verify(inode, bh))
 		return -EIO;
 	error = ext4_xattr_check_names(BFIRST(bh), bh->b_data + bh->b_size,
 				      bh->b_data);
@@ -568,8 +562,9 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
 		le32_add_cpu(&BHDR(bh)->h_refcount, -1);
 		if (ce)
 			mb_cache_entry_release(ce);
+		ext4_xattr_block_csum_set(inode, bh);
 		unlock_buffer(bh);
-		error = ext4_handle_dirty_xattr_block(handle, inode, bh);
+		error = ext4_handle_dirty_metadata(handle, inode, bh);
 		if (IS_SYNC(inode))
 			ext4_handle_sync(handle);
 		dquot_free_block(inode, EXT4_C2B(EXT4_SB(inode->i_sb), 1));
@@ -804,13 +799,14 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 							  s->here);
 				ext4_xattr_cache_insert(bs->bh);
 			}
+			ext4_xattr_block_csum_set(inode, bs->bh);
 			unlock_buffer(bs->bh);
 			if (error == -EIO)
 				goto bad_block;
 			if (!error)
-				error = ext4_handle_dirty_xattr_block(handle,
-								      inode,
-								      bs->bh);
+				error = ext4_handle_dirty_metadata(handle,
+								   inode,
+								   bs->bh);
 			if (error)
 				goto cleanup;
 			goto inserted;
@@ -879,10 +875,11 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 				le32_add_cpu(&BHDR(new_bh)->h_refcount, 1);
 				ea_bdebug(new_bh, "reusing; refcount now=%d",
 					le32_to_cpu(BHDR(new_bh)->h_refcount));
+				ext4_xattr_block_csum_set(inode, new_bh);
 				unlock_buffer(new_bh);
-				error = ext4_handle_dirty_xattr_block(handle,
-								      inode,
-								      new_bh);
+				error = ext4_handle_dirty_metadata(handle,
+								   inode,
+								   new_bh);
 				if (error)
 					goto cleanup_dquot;
 			}
@@ -937,11 +934,12 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 				goto getblk_failed;
 			}
 			memcpy(new_bh->b_data, s->base, new_bh->b_size);
+			ext4_xattr_block_csum_set(inode, new_bh);
 			set_buffer_uptodate(new_bh);
 			unlock_buffer(new_bh);
 			ext4_xattr_cache_insert(new_bh);
-			error = ext4_handle_dirty_xattr_block(handle,
-							      inode, new_bh);
+			error = ext4_handle_dirty_metadata(handle, inode,
+							   new_bh);
 			if (error)
 				goto cleanup;
 		}
-- 
2.29.2

Unfortunately, the error is still there (see attached log) but when modifying the kernel I also noticed that there are tons of fixes missing for different xattr deadlocks. Maybe I should just open a bug upstream at https://gitlab.com/LineageOS/issues/android.

Attachments

dmesg_patched.log

@danielfullmer danielfullmer added the bug Something isn't working label Jan 28, 2021
@danielfullmer
Copy link
Collaborator

I'm curious if you can still reproduce this issue using a recent robotnix commit. We're on lineage-18.1 now, and I wonder if anything has changed in the meantime.

@hmenke
Copy link
Contributor Author

hmenke commented Aug 17, 2021

I don't have a device currently. Postponed until further notice.

@hmenke hmenke closed this as completed Aug 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants