Error on calculation of "Last Test Age" in smart_report.sh #25

Open
flederohr opened this issue Feb 9, 2023 · 1 comment

flederohr commented Feb 9, 2023

The Last Test Age display worked for years without any issues.
In the latest SMART report I got this output:

+-------+------------------------+----+------+-----+-----+-------+-------+--------+------+----------+------+-----------+----+
|Device |Serial                  |Temp| Power|Start|Spin |ReAlloc|Current|Offline |Seek  |Total     |High  |    Command|Last|
|       |Number                  |    | On   |Stop |Retry|Sectors|Pending|Uncorrec|Errors|Seeks     |Fly   |    Timeout|Test|
|       |                        |    | Hours|Count|Count|       |Sectors|Sectors |      |          |Writes|    Count  |Age |
+-------+------------------------+----+------+-----+-----+-------+-------+--------+------+----------+------+-----------+----+
|ada0 ? |WD-************         |39  | 65620|  186|    0|      0|      0|       0|   N/A|       N/A|   N/A|        N/A|2732*|

...


########## SATA drive /dev/ada0 Serial: WD-************
########## Western Digital Red (WDC ************)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   182   173   021    Pre-fail  Always       -       3900
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       188
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   011   011   000    Old_age   Always       -       65620
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       186
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       154
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3283
194 Temperature_Celsius     0x0022   108   094   000    Old_age   Always       -       39
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

No Errors Logged

Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
Short offline       Completed without error       00%        61         -

On further analysis I found that the S.M.A.R.T. LifeTime(hours) counter in the self-test log seems to have reset itself:

 /usr/local/sbin/smartctl -l selftest /dev/ada0
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p2 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%        61         -
# 2  Extended offline    Completed without error       00%     65483         -
# 3  Short offline       Completed without error       00%     65334         -
# 4  Short offline       Completed without error       00%     65214         -
# 5  Extended offline    Completed without error       00%     65099         -
# 6  Short offline       Completed without error       00%     64974         -
# 7  Short offline       Completed without error       00%     64854         -
# 8  Extended offline    Completed without error       00%     64739         -
# 9  Short offline       Completed without error       00%     64590         -

According to this Server Fault question, the counter is normally stored in a 16-bit field, although this can differ between HDD vendors: https://serverfault.com/questions/1041661/s-m-a-r-t-lifetime-hours-resetting-to-zero
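
As a quick sanity check (a sketch, not part of smart_report.sh), the numbers from the report reproduce the bogus age of 2732 days. It assumes the script computes the age as `(onHours - lastTestHours) / 24` and that the self-test log field wraps at 65536, which may not hold for every vendor:

```shell
# Values taken from the report above.
onHours=65620        # attribute 9, Power_On_Hours
lastTestHours=61     # LifeTime(hours) of self-test log entry #1

# Unpatched formula: treats 61 as an absolute hour count.
awk -v onHours="$onHours" -v lastTestHours="$lastTestHours" \
  'BEGIN { printf "%.0f\n", (onHours - lastTestHours) / 24 }'   # prints 2732

# Power_On_Hours reduced to a 16-bit range (assumption: wrap at 65536).
echo $(( onHours % 65536 ))                                      # prints 84
```

With the wrap taken into account, the last short test ran at hour 61 and the drive is now at hour 84 of the same cycle, so the test is about a day old rather than 2732 days.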

I was able to fix the issue for myself by adding a modulo operation to the calculation:
testAge=sprintf("%.0f", ((onHours % 65535) - lastTestHours) / 24);

instead of the original:
testAge=sprintf("%.0f", (onHours - lastTestHours) / 24);
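
A possible variant, sketched under the assumption that the log field is an unsigned 16-bit value wrapping at 65536 (not 65535): applying the modulo to the difference rather than only to `onHours` also covers the case where the newest log entry was written before the rollover, such as entry #2 at hour 65483. The function name `test_age_days` is hypothetical and not part of smart_report.sh.

```shell
# Hypothetical helper: age in days of the last self-test.
# Assumption: LifeTime(hours) in the self-test log wraps at 65536.
test_age_days() {
  awk -v onHours="$1" -v lastTestHours="$2" 'BEGIN {
    wrap = 65536
    # Reduce Power_On_Hours to the 16-bit range, then wrap the difference
    # so an entry logged before the rollover still gives a positive age.
    diff = ((onHours % wrap) - lastTestHours + wrap) % wrap
    printf "%.0f\n", diff / 24
  }'
}

test_age_days 65620 61      # last test after the rollover,  prints 1
test_age_days 65620 65483   # last test before the rollover, prints 6
```

This cannot tell apart tests that are more than 65536 hours old, and drives that store the value in a wider field would need a different `wrap`.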

@SavageCore

Oh my, thank you! Thought the tests hadn't been running and I was panicking. Will PR this change.

SavageCore added a commit to SavageCore/FreeNAS-scripts that referenced this issue Jun 28, 2023