Null Pointer Exception Crashes Installer When Selecting Default Encryption or Using Manual Partitioning #749

Open
mandersendev opened this issue Jun 25, 2020 · 2 comments
Labels: bug_triage (Newly report and needs review), to be investigated (need to try to reproduce this local or investigate)

Comments

@mandersendev

Hello,

Last night I was attempting to deploy Clear on a bare metal test box in the hopes of transitioning infrastructure. I have been following Clear's work and am very impressed by its focus on being lightweight, performant, and secure. Clear seemed an obvious choice as a base OS for both native and containerized apps in our COTS architecture.

The bug/failure occurred immediately upon initiating installation, at which point the installer crashed citing a null pointer exception without making one inch of progress. Working backward, I believe it has something to do with installs using anything other than default unencrypted partitioning.

  1. An attempt to enable full disk encryption using default partitioning and parameters was met with an installer crash citing a null pointer exception. All other settings were left at their defaults, apart from time zone and host name. The crash occurred immediately upon beginning installation, with the installer making no progress.

  2. Attempts at manual unencrypted partitioning with the most basic scheme (boot, root, and swap) were equally unsuccessful, in both legacy BIOS and EFI modes. The result was the same null pointer exception crash as in (1). I used the default EXT4 for root and no volume groups, mappers, RAID, or anything else which might complicate diagnosis. This manual partitioning could not have been more vanilla - my goal at this point was to get to a booted system for evaluation.

  3. Accepting the default wipe and autopartition did work and got me to a bootable system with everything functioning normally. However, this setup would have been unsuitable for deployment because it only allocated 64MB of swap. No option was available to configure a deployable amount of swap without reverting to manual partitioning, which crashed per attempt (2). Additionally, the lack of encryption would have prevented deployment. Formatting root as XFS would also have been nice as a selectable option, though it was not essential for deployment.

I repeated installation attempts (1) and (2) several times and I could not get past the immediate null pointer exception upon installation initiation no matter what I did.

The installation medium was a brand new USB3 drive loaded with the latest Clear ISO downloaded immediately prior to installation. Prior to etching, checksums were verified. After etching, the written image was checked for integrity again. I re-etched the image following the first failure, which made no difference.

Is this a known issue with the installer? If not, what sort of diagnostic information could I provide which might help Clear developers trace what's going on here?

The testbox was running a Ryzen 3900X on an A320 chipset with 32GB ECC RAM. Prior to this installation, the testbox had run other distributions without issue and had been thoroughly burnt in and verified free of hardware errors in all components, particularly in memory, disks, and controllers. The disk used for the test deployment was a single SATA drive connected to the motherboard controller in AHCI mode. There was no RAID which might have contributed to disk issues. Network connectivity was via integrated GbE, though I must say I'm impressed that Clear supported the attached Mellanox ConnectX-3 out of the box. Not needing to build/install Mellanox drivers will save us a great deal of time when we deploy Clear.

In short, hardware shouldn't have been an issue, whether through errors or unsupported/buggy configurations. The testbox's chipset is not bleeding edge and has been well supported by the kernel for years.

Thanks for all the work you do, and please let me know if I can help in diagnosing the cause of this problem further.

mdhorn transferred this issue from clearlinux/distribution on Jun 26, 2020
mdhorn added the bug_triage (Newly report and needs review) and to be investigated (need to try to reproduce this local or investigate) labels on Jun 26, 2020
@mdhorn
Contributor

mdhorn commented Jun 26, 2020

@mandersendev In order to attempt to diagnose what is going on here, I'd like to get the log file from the failing installation.

When the installation fails (or crashes), there will be a /root/clr-installer.log in the running OS image (from the USB). If the installation started before the crash, there will also be a configuration file /root/pre-install-clr-installer.yaml.
If you are able to reproduce the error noted above, please share these files in this GitHub Issue.
You may need to remove any personal data of concern from the attachments.

The two suggested methods for collecting the data are to either

  1. Copy the files from the running OS image to a second USB device
  2. sftp or scp the files from the running OS image to a second computer
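
For option 1, the copy step could be scripted roughly as follows. This is only a minimal sketch, assuming Python 3 is available on the live image and the second USB device is already mounted. The two file paths are the ones named above; `collect_logs` is an illustrative helper, not part of clr-installer, and it skips any file that does not exist (the YAML is only written if the installation actually started).

```python
import shutil
from pathlib import Path

# File locations as described in the comment above.
LOG_FILES = [
    Path("/root/clr-installer.log"),
    Path("/root/pre-install-clr-installer.yaml"),
]


def collect_logs(dest_dir, sources=LOG_FILES):
    """Copy each existing source file into dest_dir and return the copied paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in map(Path, sources):
        if src.is_file():
            # copy2 also preserves timestamps, which can help when reading the log
            copied.append(Path(shutil.copy2(src, dest / src.name)))
        else:
            print(f"skipping missing file: {src}")
    return copied
```

Calling `collect_logs("/mnt/usb")` would then place whichever of the two files exist on the second device; `/mnt/usb` is a hypothetical mount point, so substitute wherever the device is actually mounted.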

@mandersendev
Author

I'm pulling the board it happened on out of a cluster tonight and will repeat the exact same installation procedure on the exact same hardware, then report back. Thanks.
