Exos X16 fails to change sector size on a Supermicro server #118

Open
danderson opened this issue Jul 11, 2023 · 5 comments

@danderson

danderson commented Jul 11, 2023

I'm doing initial setup on some new ST16000NM003G drives (Exos X16 16TB SATA). openSeaChest_Format -d /dev/sdi --showSupportedFormats reports that the drives support 4096-byte sectors and are currently configured with 512-byte sectors. However, attempting to change the sector size fails, with Set Sector Configuration Ext returning: ABORTED.

Hardware-wise, the drive is connected to a Supermicro SSG-5028R-E1CR12LA-CE010 server. The device chain from CPU to drive is:

  • X10SRH-CLN4F motherboard
  • Supermicro AOC-S3008L-L8e SAS3 HBA (based on LSI/BCM 3008 IC)
  • BPN-SAS3-826EL1 SAS expander backplane (based on LSISASx28 expander IC)

Searching the issue tracker, I believe I'm seeing exactly the same symptoms as #79, although possibly with slightly different hardware (an X10 motherboard instead of X11, but also an LSI/BCM 3008 HBA, and also a Supermicro server, so likely a similar backplane SAS expander).

I've attached the output of openSeaChest_Info -d /dev/sdi -i, openSeaChest_Format -d /dev/sdi --showSupportedFormats, and openSeaChest_Format -d /dev/sdi --setSectorSize=4096 --confirm this-will-erase-data-and-may-render-the-drive-inoperable.

sdi-info.txt
sdi-supportedformats.txt
sdi-format.txt
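
For anyone who wants to capture the same three outputs on their own system, here is a minimal sketch. The openSeaChest invocations are exactly the ones quoted above; the run and collect helpers and the DEV and DRY_RUN variables are hypothetical names made up for this sketch, not part of openSeaChest. DRY_RUN defaults to 1, so the script only prints what it would run, because the last command erases the drive.

```shell
#!/bin/sh
# Sketch only: DEV, DRY_RUN, run and collect are illustrative names,
# not openSeaChest features. The commands themselves come from the report.
DEV="${DEV:-/dev/sdi}"
NAME="$(basename "$DEV")"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # $1 is the output file; the remaining arguments are the command.
    out="$1"; shift
    if [ "$DRY_RUN" = "1" ]; then
        # Print the command instead of executing it.
        printf '%s\n' "$* > $out"
    else
        # Capture stdout and stderr so error text ends up in the log.
        "$@" > "$out" 2>&1
    fi
}

collect() {
    run "$NAME-info.txt" openSeaChest_Info -d "$DEV" -i
    run "$NAME-supportedformats.txt" openSeaChest_Format -d "$DEV" --showSupportedFormats
    # DESTRUCTIVE: this erases the drive and may render it inoperable.
    run "$NAME-format.txt" openSeaChest_Format -d "$DEV" --setSectorSize=4096 \
        --confirm this-will-erase-data-and-may-render-the-drive-inoperable
}

collect
```

Setting DRY_RUN=0 runs the commands for real and writes the three text files to the current directory; only do that on a drive whose contents you can afford to lose.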

The linked issue has a workaround (execute the sector reconfiguration from a different system without all the LSI, Supermicro and SAS-to-SATA stuff in the chain), so really I'm filing this issue to ask: is there any more data I could provide to help you get more insight into this issue? Given that I can apparently reproduce it, and I'm going to be doing destructive burn-in on these drives for a few days, I can run debug commands and make invasive drive changes without harming data.

@danderson
Author

Reproducing relevant info from #79, so people don't have to go digging: in that bug the reporter had a Supermicro X11DPH-T motherboard, and the same Supermicro AOC-S3008L-L8e HBA as me. No info on the backplane in that bug, but given Supermicro's product lineup, it seems likely that it's the same expander backplane as my system, since those boards don't change much even between different server models.

@danderson
Author

One more datapoint: I moved one of the drives to an older Supermicro server with a SAS2 storage chain, and I was able to change the sector size there successfully. Listing the hardware in that server too, just in case the A/B datapoints help:

  • Motherboard: Supermicro X10SLM+-LN4F
  • HBA: Broadcom / LSI 9211-8i
  • Backplane: Supermicro BPN-SAS2-826EL1 (based on LSI SAS2X28 expander IC)

This server is a franken-machine assembled from a used chassis+backplane, motherboard and HBA. This is not a configuration sold by Supermicro directly (whereas the one in my original report, afaik, is).

@danderson changed the title from "Exos X16 fails to change sector size" to "Exos X16 fails to change sector size on a Supermicro server" on Jul 12, 2023
@vonericsen
Contributor

Hi @danderson,
Thanks for the logs, I will take a look and see if I find something else that might help track this down.
While debugging #79, I asked Seagate's engineer who works with Supermicro to test the Supermicro hardware we have and he could not repeat it. Seagate's engineer asked Supermicro's lab to also see if they could repeat this issue, but we never got it to repeat with the same hardware that was reported in that issue...so we really do not know what the issue is.

vonericsen added a commit to Seagate/opensea-operations that referenced this issue Jul 21, 2023
…to fast format

The interpretation of command results was wrong for when to detect an error when performing an erase of the boot sectors ahead of fast format.
This was causing the error message to show up in all cases except when both commands failed.

[Seagate/openSeaChest#118]

Signed-off-by: Tyler Erickson <tyler.erickson@seagate.com>
@danderson
Author

Thanks for taking a look! I don't envy having to track this through all the layers to find where things are going wrong.

I filed this purely in case it provides additional clues, or if I can provide further data about the configuration that wasn't working. If that's not the case, then I'm happy to close this bug as there's only so much digging that's possible across multiple vendors like this.

@vonericsen
Contributor

vonericsen commented Jul 21, 2023

I reviewed the logs and I cannot figure out what would be wrong right now.
Everything is being populated in the command correctly according to the specifications.

I've asked to see if someone in Seagate's firmware group can help me understand the spec's abort reason "the device is unable to complete processing of the command" to see if that can help me track it back to a feature interaction or something else in the firmware that I may be able to control.
The other cases for the command abort from the spec are not the issue since the fields are all being filled in properly (unless for some reason the HBA firmware is filtering them out on the bus, but you would need a bus trace to see this).

The only other thing I can think of while I dig backwards: have you tried updating the HBA firmware at all?
I'm not sure if it will fix it, but sometimes updating HBA firmware resolves odd things like this.
In #111, updating the HBA firmware resolved a strange bug where the drive was not going into the idle or standby modes like it should. Maybe there is something similar going on here and causing the drive to think it cannot do the fast format right now because of some other bus activity from the HBA.
