[Feature] Add support for additional arguments when smartctl is executed - Seagate drives use 48 bit raw values and only the first 16 bits are the error data #255

Closed
Parlane opened this issue May 17, 2022 · 25 comments · Fixed by #280 or #308
Labels
documentation Improvements or additions to documentation enhancement New feature or request

Comments

@Parlane

Parlane commented May 17, 2022

Describe the bug
Seagate IronWolf drives show as FAILED, with high seek and read error counts reported.

Expected behavior

Some way to configure, per drive, extra arguments for the smartctl calls.

Seagate IronWolf drives use a 48-bit raw value that is made up of a 16-bit error count and a 32-bit total count of read or seek events.

With smartctl I have to specify manually which bytes of the raw value to read:
smartctl /dev/sdb -a -v 1,raw48:54 -v 7,raw48:54

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   067   044    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   085   080   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       112
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   071   060   045    Pre-fail  Always       -       0

And the smartctl output without those options:

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   067   044    Pre-fail  Always       -       200450784
  3 Spin_Up_Time            0x0003   085   080   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       112
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   071   060   045    Pre-fail  Always       -       12399940

The 200450784 value above is 0xBF2A2E0, which is only 28 bits of data, so it is just part of the total count and contains none of the error count. The full 48-bit hex value is 00000BF2A2E0, which splits as [0000][0BF2A2E0]; the upper 16 bits, 0, are the actual value of Raw_Read_Error_Rate.
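
As a rough illustration of the split described above, this Python sketch separates a 48-bit raw value into its upper 16 bits (error count) and lower 32 bits (total count). The function name `split_seagate_raw48` is made up for this example and is not part of smartctl or Scrutiny; the bit layout is the one assumed in this issue.

```python
def split_seagate_raw48(raw: int) -> tuple[int, int]:
    """Split a 48-bit raw value into (error_count, total_count).

    Assumes the layout described in this issue: the upper 16 bits hold
    the error count and the lower 32 bits hold the total operation count.
    """
    if not 0 <= raw < (1 << 48):
        raise ValueError("raw value does not fit in 48 bits")
    error_count = raw >> 32          # bytes 5 and 4 of the raw value
    total_count = raw & 0xFFFFFFFF   # bytes 3 down to 0
    return error_count, total_count


if __name__ == "__main__":
    # The two raw values reported by smartctl without the -v options.
    for raw in (200450784, 12399940):
        errors, total = split_seagate_raw48(raw)
        print(f"{raw:012X} -> errors={errors} total={total}")
```

For both values from the unmodified smartctl output, the upper 16 bits are zero, which matches the 0 that smartctl prints once `-v 1,raw48:54 -v 7,raw48:54` is given.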

@Parlane Parlane added the bug Something isn't working label May 17, 2022
@AnalogJ AnalogJ added enhancement New feature or request and removed bug Something isn't working labels May 17, 2022
@AnalogJ AnalogJ changed the title [BUG] Seagate drives use 48 bit raw values and only the first 16 bits are the error data [Feature] Add support for additional arguments when smartctl is executed - Seagate drives use 48 bit raw values and only the first 16 bits are the error data May 17, 2022
@AnalogJ
Owner

AnalogJ commented May 17, 2022

Sounds like a worthwhile enhancement. I'll need to take a closer look at the smartctl documentation, but this should be easy enough to implement by expanding the collector.yaml config file.

@somebody-somewhere-over-the-rainbow

This is also true for Seagate Exos X18 (18TB) drives. I would assume that this is true for most - if not all - modern Seagate drives...

AnalogJ added a commit that referenced this issue May 28, 2022
…or device scanning, device identification and smart data retrieval.

adding tests for command overrides.

rename GetScanOverrides() to GetDeviceOverrides()

fixes #255
@AnalogJ
Owner

AnalogJ commented May 28, 2022

Hey @Parlane @alexw1982
I made some changes to the collector and the collector config file so that they support overriding the smartctl --info and smartctl --xall commands that Scrutiny uses for data collection.

Once the beta branch finishes building, can you pull the Docker images and test out the changes?

Here are the relevant config file changes:

https://github.com/AnalogJ/scrutiny/blob/beta/example.collector.yaml#L57-L61

https://github.com/AnalogJ/scrutiny/blob/beta/example.collector.yaml#L74-L78

@Parlane
Author

Parlane commented May 29, 2022

I had to specify type 'ata' for each device, otherwise it did not try to find them?

# Commented Scrutiny Configuration File
#
# The default location for this file is /opt/scrutiny/config/collector.yaml.
# In some cases to improve clarity default values are specified,
# uncommented. Other example values are commented out.
#
# When this file is parsed by Scrutiny, all configuration file keys are
# lowercased automatically. As such, Configuration keys are case-insensitive,
# and should be lowercase in this file to be consistent with usage.


######################################################################
# Version
#
# version specifies the version of this configuration file schema, not
# the scrutiny binary. There is only 1 version available at the moment
version: 1

# The host id is a label used for identifying groups of disks running on the same host
# Primarily used for hub/spoke deployments (can be left empty if using all-in-one image).
host:
  id: ""


# This block allows you to override/customize the settings for devices detected by
# Scrutiny via `smartctl --scan`
# See the "--device=TYPE" section of https://linux.die.net/man/8/smartctl
# type can be a 'string' or a 'list'
devices:
  # example to show how to override the smartctl command args (per device), see below for how to override these globally.
  - device: /dev/sda
    type: 'ata'
    commands:
      metrics_info_args: '--info --json -T permissive' # used to determine device unique ID & register device with Scrutiny
      metrics_smart_args: '-v 1,raw48:54 -v 7,raw48:54 --xall --json -T permissive' # used to retrieve smart data for each device.
  - device: /dev/sdb
    type: 'ata'
    commands:
      metrics_info_args: '--info --json -T permissive' # used to determine device unique ID & register device with Scrutiny
      metrics_smart_args: '-v 1,raw48:54 -v 7,raw48:54 --xall --json -T permissive' # used to retrieve smart data for each device.
  - device: /dev/sdc
    type: 'ata'
    commands:
      metrics_info_args: '--info --json -T permissive' # used to determine device unique ID & register device with Scrutiny
      metrics_smart_args: '-v 1,raw48:54 -v 7,raw48:54 --xall --json -T permissive' # used to retrieve smart data for each device.
  - device: /dev/sdd
    type: 'ata'
    commands:
      metrics_info_args: '--info --json -T permissive' # used to determine device unique ID & register device with Scrutiny
      metrics_smart_args: '-v 1,raw48:54 -v 7,raw48:54 --xall --json -T permissive' # used to retrieve smart data for each device.
  - device: /dev/sde
    type: 'ata'
    commands:
      metrics_info_args: '--info --json -T permissive' # used to determine device unique ID & register device with Scrutiny
      metrics_smart_args: '-v 1,raw48:54 -v 7,raw48:54 --xall --json -T permissive' # used to retrieve smart data for each device.
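
Because the per-device entries above are identical apart from the device path, the block can be generated rather than typed by hand. The following Python sketch does that with plain string formatting; `render_devices_block` is a hypothetical helper, not part of Scrutiny, and the keys and argument strings are copied from the config above. The comment in the example file mentions a global override as well, which may be simpler; this sketch only covers the per-device form.

```python
# Argument strings copied from the config above.
INFO_ARGS = "--info --json -T permissive"
SMART_ARGS = "-v 1,raw48:54 -v 7,raw48:54 --xall --json -T permissive"


def render_devices_block(devices, device_type="ata"):
    """Return the text of a `devices:` block with one entry per device path.

    Hypothetical helper for illustration only; it emits the same keys
    that appear in the hand-written config above.
    """
    lines = ["devices:"]
    for dev in devices:
        lines += [
            f"  - device: {dev}",
            f"    type: '{device_type}'",
            "    commands:",
            f"      metrics_info_args: '{INFO_ARGS}'",
            f"      metrics_smart_args: '{SMART_ARGS}'",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # /dev/sda through /dev/sde, as in the config above.
    print(render_devices_block([f"/dev/sd{c}" for c in "abcde"]), end="")
```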

Still shows as failed. I guess you are using the normalized "value" field and not the raw value:


      {
        "id": 1,
        "name": "Raw_Read_Error_Rate",
        "value": 82,
        "worst": 64,
        "thresh": 44,
        "raw": {
          "value": 0,
          "string": "0"
        }
      },
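
To make the difference between the normalized and raw fields concrete, here is a small Python sketch that reads attributes from smartctl-style JSON and lists both side by side. The sample holds only attributes 1 and 7 with the numbers from this drive's output; `summarize` is a hypothetical helper, and the threshold check is the conventional SMART rule (normalized value at or below the threshold), not a statement about how Scrutiny decides on a FAILED status.

```python
import json

# Trimmed sample using the field names from smartctl's --json output.
SAMPLE = """
{
  "ata_smart_attributes": {
    "table": [
      {"id": 1, "name": "Raw_Read_Error_Rate", "value": 82, "worst": 64,
       "thresh": 44, "raw": {"value": 0, "string": "0"}},
      {"id": 7, "name": "Seek_Error_Rate", "value": 100, "worst": 253,
       "thresh": 45, "raw": {"value": 0, "string": "0"}}
    ]
  }
}
"""


def summarize(report: dict) -> list[dict]:
    """Return one row per attribute with normalized and raw values."""
    rows = []
    for attr in report["ata_smart_attributes"]["table"]:
        rows.append({
            "id": attr["id"],
            "name": attr["name"],
            "normalized": attr["value"],
            "thresh": attr["thresh"],
            "raw": attr["raw"]["value"],
            # Conventional rule: failed when normalized <= threshold.
            "at_or_below_thresh": attr["value"] <= attr["thresh"],
        })
    return rows


if __name__ == "__main__":
    for row in summarize(json.loads(SAMPLE)):
        print(row)
```

With these numbers, neither attribute is at or below its threshold and both raw values are 0, so a FAILED status would have to come from some other rule or from previously stored data.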

~# smartctl /dev/sda -v 1,raw48:54 -v 7,raw48:54 --xall --json -T permissive
{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      3
    ],
    "svn_revision": "5338",
    "platform_info": "x86_64-linux-5.17.0-2-amd64",
    "build_info": "(local build)",
    "argv": [
      "smartctl",
      "-v",
      "1,raw48:54",
      "-v",
      "7,raw48:54",
      "--xall",
      "/dev/sda",
      "--json",
      "-T",
      "permissive"
    ],
    "drive_database_version": {
      "string": "7.3/5319"
    },
    "exit_status": 0
  },
  "local_time": {
    "time_t": 1653787546,
    "asctime": "Sun May 29 13:25:46 2022 NZST"
  },
  "device": {
    "name": "/dev/sda",
    "info_name": "/dev/sda [SAT]",
    "type": "sat",
    "protocol": "ATA"
  },
  "model_family": "Seagate IronWolf",
  "model_name": "ST8000VN004-3CP101",
  "serial_number": "WP00C97D",
  "wwn": {
    "naa": 5,
    "oui": 3152,
    "id": 3773008640
  },
  "firmware_version": "SC60",
  "user_capacity": {
    "blocks": 15628053168,
    "bytes": 8001563222016
  },
  "logical_block_size": 512,
  "physical_block_size": 4096,
  "rotation_rate": 7200,
  "form_factor": {
    "ata_value": 2,
    "name": "3.5 inches"
  },
  "trim": {
    "supported": false
  },
  "in_smartctl_database": true,
  "ata_version": {
    "string": "ACS-4 (minor revision not indicated)",
    "major_value": 4064,
    "minor_value": 65535
  },
  "sata_version": {
    "string": "SATA 3.3",
    "value": 511
  },
  "interface_speed": {
    "max": {
      "sata_value": 14,
      "string": "6.0 Gb/s",
      "units_per_second": 60,
      "bits_per_unit": 100000000
    },
    "current": {
      "sata_value": 3,
      "string": "6.0 Gb/s",
      "units_per_second": 60,
      "bits_per_unit": 100000000
    }
  },
  "smart_support": {
    "available": true,
    "enabled": true
  },
  "read_lookahead": {
    "enabled": true
  },
  "write_cache": {
    "enabled": true
  },
  "ata_dsn": {
    "enabled": false
  },
  "ata_security": {
    "state": 41,
    "string": "Disabled, frozen [SEC2]",
    "enabled": false,
    "frozen": true
  },
  "smart_status": {
    "passed": true
  },
  "ata_smart_data": {
    "offline_data_collection": {
      "status": {
        "value": 130,
        "string": "was completed without error",
        "passed": true
      },
      "completion_seconds": 567
    },
    "self_test": {
      "status": {
        "value": 0,
        "string": "completed without error",
        "passed": true
      },
      "polling_minutes": {
        "short": 1,
        "extended": 728,
        "conveyance": 2
      }
    },
    "capabilities": {
      "values": [
        123,
        3
      ],
      "exec_offline_immediate_supported": true,
      "offline_is_aborted_upon_new_cmd": false,
      "offline_surface_scan_supported": true,
      "self_tests_supported": true,
      "conveyance_self_test_supported": true,
      "selective_self_test_supported": true,
      "attribute_autosave_enabled": true,
      "error_logging_supported": true,
      "gp_logging_supported": true
    }
  },
  "ata_sct_capabilities": {
    "value": 20669,
    "error_recovery_control_supported": true,
    "feature_control_supported": true,
    "data_table_supported": true
  },
  "ata_smart_attributes": {
    "revision": 10,
    "table": [
      {
        "id": 1,
        "name": "Raw_Read_Error_Rate",
        "value": 82,
        "worst": 64,
        "thresh": 44,
        "when_failed": "",
        "flags": {
          "value": 15,
          "string": "POSR-- ",
          "prefailure": true,
          "updated_online": true,
          "performance": true,
          "error_rate": true,
          "event_count": false,
          "auto_keep": false
        },
        "raw": {
          "value": 0,
          "string": "0"
        }
      },
      {
        "id": 3,
        "name": "Spin_Up_Time",
        "value": 99,
        "worst": 99,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 3,
          "string": "PO---- ",
          "prefailure": true,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": false,
          "auto_keep": false
        },
        "raw": {
          "value": 0,
          "string": "0"
        }
      },
      {
        "id": 4,
        "name": "Start_Stop_Count",
        "value": 100,
        "worst": 100,
        "thresh": 20,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 1,
          "string": "1"
        }
      },
      {
        "id": 5,
        "name": "Reallocated_Sector_Ct",
        "value": 100,
        "worst": 100,
        "thresh": 10,
        "when_failed": "",
        "flags": {
          "value": 51,
          "string": "PO--CK ",
          "prefailure": true,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 0,
          "string": "0"
        }
      },
      {
        "id": 7,
        "name": "Seek_Error_Rate",
        "value": 100,
        "worst": 253,
        "thresh": 45,
        "when_failed": "",
        "flags": {
          "value": 15,
          "string": "POSR-- ",
          "prefailure": true,
          "updated_online": true,
          "performance": true,
          "error_rate": true,
          "event_count": false,
          "auto_keep": false
        },
        "raw": {
          "value": 0,
          "string": "0"
        }
      },
      {
        "id": 9,
        "name": "Power_On_Hours",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 21,
          "string": "21"
        }
      },
      {
        "id": 10,
        "name": "Spin_Retry_Count",
        "value": 100,
        "worst": 100,
        "thresh": 97,
        "when_failed": "",
        "flags": {
          "value": 19,
          "string": "PO--C- ",
          "prefailure": true,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": false
        },
        "raw": {
          "value": 0,
          "string": "0"
        }
      },
      {
        "id": 12,
        "name": "Power_Cycle_Count",
        "value": 100,
        "worst": 100,
        "thresh": 20,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 1,
          "string": "1"
        }
      },
      {
        "id": 18,
        "name": "Head_Health",
        "value": 100,
        "worst": 100,
        "thresh": 50,
        "when_failed": "",
        "flags": {
          "value": 11,
          "string": "PO-R-- ",
          "prefailure": true,
          "updated_online": true,
          "performance": false,
          "error_rate": true,
          "event_count": false,
          "auto_keep": false
        },
        "raw": {
          "value": 0,
          "string": "0"
        }
      },
      {
        "id": 187,
        "name": "Reported_Uncorrect",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 0,
          "string": "0"
        }
      },
      {
        "id": 188,
        "name": "Command_Timeout",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 0,
          "string": "0"
        }
      },
      {
        "id": 190,
        "name": "Airflow_Temperature_Cel",
        "value": 61,
        "worst": 51,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 34,
          "string": "-O---K ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": false,
          "auto_keep": true
        },
        "raw": {
          "value": 689438759,
          "string": "39 (Min/Max 24/41)"
        }
      },
      {
        "id": 192,
        "name": "Power-Off_Retract_Count",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 1,
          "string": "1"
        }
      },
      {
        "id": 193,
        "name": "Load_Cycle_Count",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 2,
          "string": "2"
        }
      },
      {
        "id": 194,
        "name": "Temperature_Celsius",
        "value": 39,
        "worst": 41,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 34,
          "string": "-O---K ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": false,
          "auto_keep": true
        },
        "raw": {
          "value": 103079215143,
          "string": "39 (0 24 0 0 0)"
        }
      },
      {
        "id": 197,
        "name": "Current_Pending_Sector",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 18,
          "string": "-O--C- ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": false
        },
        "raw": {
          "value": 0,
          "string": "0"
        }
      },
      {
        "id": 198,
        "name": "Offline_Uncorrectable",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 16,
          "string": "----C- ",
          "prefailure": false,
          "updated_online": false,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": false
        },
        "raw": {
          "value": 0,
          "string": "0"
        }
      },
      {
        "id": 199,
        "name": "UDMA_CRC_Error_Count",
        "value": 200,
        "worst": 200,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 62,
          "string": "-OSRCK ",
          "prefailure": false,
          "updated_online": true,
          "performance": true,
          "error_rate": true,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 0,
          "string": "0"
        }
      },
      {
        "id": 240,
        "name": "Head_Flying_Hours",
        "value": 100,
        "worst": 253,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 0,
          "string": "------ ",
          "prefailure": false,
          "updated_online": false,
          "performance": false,
          "error_rate": false,
          "event_count": false,
          "auto_keep": false
        },
        "raw": {
          "value": 10408848946888725,
          "string": "21h+40m+23.499s"
        }
      },
      {
        "id": 241,
        "name": "Total_LBAs_Written",
        "value": 100,
        "worst": 253,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 0,
          "string": "------ ",
          "prefailure": false,
          "updated_online": false,
          "performance": false,
          "error_rate": false,
          "event_count": false,
          "auto_keep": false
        },
        "raw": {
          "value": 861495235,
          "string": "861495235"
        }
      },
      {
        "id": 242,
        "name": "Total_LBAs_Read",
        "value": 100,
        "worst": 253,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 0,
          "string": "------ ",
          "prefailure": false,
          "updated_online": false,
          "performance": false,
          "error_rate": false,
          "event_count": false,
          "auto_keep": false
        },
        "raw": {
          "value": 14887785,
          "string": "14887785"
        }
      }
    ]
  },
  "power_on_time": {
    "hours": 21
  },
  "power_cycle_count": 1,
  "temperature": {
    "current": 39,
    "power_cycle_min": 24,
    "power_cycle_max": 41,
    "lifetime_min": 31,
    "lifetime_max": 41,
    "op_limit_min": 5,
    "op_limit_max": 70,
    "limit_min": 5,
    "limit_max": 70,
    "lifetime_over_limit_minutes": 0,
    "lifetime_under_limit_minutes": 0
  },
  "ata_log_directory": {
    "gp_dir_version": 1,
    "smart_dir_version": 1,
    "smart_dir_multi_sector": true,
    "table": [
      {
        "address": 0,
        "name": "Log Directory",
        "read": true,
        "write": false,
        "gp_sectors": 1,
        "smart_sectors": 1
      },
      {
        "address": 1,
        "name": "Summary SMART error log",
        "read": true,
        "write": false,
        "smart_sectors": 1
      },
      {
        "address": 2,
        "name": "Comprehensive SMART error log",
        "read": true,
        "write": false,
        "smart_sectors": 5
      },
      {
        "address": 3,
        "name": "Ext. Comprehensive SMART error log",
        "read": true,
        "write": false,
        "gp_sectors": 5
      },
      {
        "address": 4,
        "name": "Device Statistics log",
        "read": true,
        "write": false,
        "gp_sectors": 256,
        "smart_sectors": 8
      },
      {
        "address": 6,
        "name": "SMART self-test log",
        "read": true,
        "write": false,
        "smart_sectors": 1
      },
      {
        "address": 7,
        "name": "Extended self-test log",
        "read": true,
        "write": false,
        "gp_sectors": 1
      },
      {
        "address": 8,
        "name": "Power Conditions log",
        "read": true,
        "write": false,
        "gp_sectors": 2
      },
      {
        "address": 9,
        "name": "Selective self-test log",
        "read": true,
        "write": true,
        "smart_sectors": 1
      },
      {
        "address": 10,
        "name": "Device Statistics Notification",
        "read": true,
        "write": true,
        "gp_sectors": 8
      },
      {
        "address": 12,
        "name": "Pending Defects log",
        "read": true,
        "write": false,
        "gp_sectors": 2048
      },
      {
        "address": 16,
        "name": "NCQ Command Error log",
        "read": true,
        "write": false,
        "gp_sectors": 1
      },
      {
        "address": 17,
        "name": "SATA Phy Event Counters log",
        "read": true,
        "write": false,
        "gp_sectors": 1
      },
      {
        "address": 19,
        "name": "SATA NCQ Send and Receive log",
        "read": true,
        "write": false,
        "gp_sectors": 1
      },
      {
        "address": 33,
        "name": "Write stream error log",
        "read": true,
        "write": false,
        "gp_sectors": 1
      },
      {
        "address": 34,
        "name": "Read stream error log",
        "read": true,
        "write": false,
        "gp_sectors": 1
      },
      {
        "address": 36,
        "name": "Current Device Internal Status Data log",
        "read": true,
        "write": false,
        "gp_sectors": 768
      },
      {
        "address": 47,
        "name": "Set Sector Configuration",
        "gp_sectors": 1
      },
      {
        "address": 48,
        "name": "IDENTIFY DEVICE data log",
        "read": true,
        "write": false,
        "gp_sectors": 9,
        "smart_sectors": 9
      },
      {
        "address": 128,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 129,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 130,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 131,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 132,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 133,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 134,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 135,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 136,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 137,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 138,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 139,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 140,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 141,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 142,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 143,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 144,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 145,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 146,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 147,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 148,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 149,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 150,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 151,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 152,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 153,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 154,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 155,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 156,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 157,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 158,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 159,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 161,
        "name": "Device vendor specific log",
        "gp_sectors": 160,
        "smart_sectors": 160
      },
      {
        "address": 162,
        "name": "Device vendor specific log",
        "gp_sectors": 16320
      },
      {
        "address": 164,
        "name": "Device vendor specific log",
        "gp_sectors": 160,
        "smart_sectors": 160
      },
      {
        "address": 166,
        "name": "Device vendor specific log",
        "gp_sectors": 192
      },
      {
        "address": 168,
        "name": "Device vendor specific log",
        "gp_sectors": 136,
        "smart_sectors": 136
      },
      {
        "address": 169,
        "name": "Device vendor specific log",
        "gp_sectors": 136,
        "smart_sectors": 136
      },
      {
        "address": 171,
        "name": "Device vendor specific log",
        "gp_sectors": 1
      },
      {
        "address": 173,
        "name": "Device vendor specific log",
        "gp_sectors": 16
      },
      {
        "address": 177,
        "name": "Device vendor specific log",
        "gp_sectors": 160,
        "smart_sectors": 160
      },
      {
        "address": 182,
        "name": "Device vendor specific log",
        "gp_sectors": 1920
      },
      {
        "address": 190,
        "name": "Device vendor specific log",
        "gp_sectors": 65535
      },
      {
        "address": 191,
        "name": "Device vendor specific log",
        "gp_sectors": 65535
      },
      {
        "address": 193,
        "name": "Device vendor specific log",
        "gp_sectors": 8,
        "smart_sectors": 8
      },
      {
        "address": 195,
        "name": "Device vendor specific log",
        "gp_sectors": 24,
        "smart_sectors": 24
      },
      {
        "address": 198,
        "name": "Device vendor specific log",
        "gp_sectors": 5184
      },
      {
        "address": 199,
        "name": "Device vendor specific log",
        "gp_sectors": 8,
        "smart_sectors": 8
      },
      {
        "address": 201,
        "name": "Device vendor specific log",
        "gp_sectors": 8,
        "smart_sectors": 8
      },
      {
        "address": 202,
        "name": "Device vendor specific log",
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 205,
        "name": "Device vendor specific log",
        "gp_sectors": 1,
        "smart_sectors": 1
      },
      {
        "address": 206,
        "name": "Device vendor specific log",
        "gp_sectors": 1
      },
      {
        "address": 207,
        "name": "Device vendor specific log",
        "gp_sectors": 512
      },
      {
        "address": 209,
        "name": "Device vendor specific log",
        "gp_sectors": 656
      },
      {
        "address": 210,
        "name": "Device vendor specific log",
        "gp_sectors": 10256
      },
      {
        "address": 212,
        "name": "Device vendor specific log",
        "gp_sectors": 2048
      },
      {
        "address": 218,
        "name": "Device vendor specific log",
        "gp_sectors": 1,
        "smart_sectors": 1
      },
      {
        "address": 224,
        "name": "SCT Command/Status",
        "read": true,
        "write": true,
        "gp_sectors": 1,
        "smart_sectors": 1
      },
      {
        "address": 225,
        "name": "SCT Data Transfer",
        "read": true,
        "write": true,
        "gp_sectors": 1,
        "smart_sectors": 1
      }
    ]
  },
  "ata_smart_error_log": {
    "extended": {
      "revision": 1,
      "sectors": 5,
      "count": 0
    }
  },
  "ata_smart_self_test_log": {
    "extended": {
      "revision": 1,
      "sectors": 1,
      "count": 0
    }
  },
  "ata_smart_selective_self_test_log": {
    "revision": 1,
    "table": [
      {
        "lba_min": 0,
        "lba_max": 0,
        "status": {
          "value": 0,
          "string": "Not_testing"
        }
      },
      {
        "lba_min": 0,
        "lba_max": 0,
        "status": {
          "value": 0,
          "string": "Not_testing"
        }
      },
      {
        "lba_min": 0,
        "lba_max": 0,
        "status": {
          "value": 0,
          "string": "Not_testing"
        }
      },
      {
        "lba_min": 0,
        "lba_max": 0,
        "status": {
          "value": 0,
          "string": "Not_testing"
        }
      },
      {
        "lba_min": 0,
        "lba_max": 0,
        "status": {
          "value": 0,
          "string": "Not_testing"
        }
      }
    ],
    "flags": {
      "value": 0,
      "remainder_scan_enabled": false
    },
    "power_up_scan_resume_minutes": 0
  },
  "ata_sct_status": {
    "format_version": 3,
    "sct_version": 522,
    "device_state": {
      "value": 0,
      "string": "Active"
    },
    "temperature": {
      "current": 38,
      "power_cycle_min": 24,
      "power_cycle_max": 41,
      "lifetime_min": 24,
      "lifetime_max": 49,
      "under_limit_count": 0,
      "over_limit_count": 22
    },
    "smart_status": {
      "passed": true
    },
    "vendor_specific": [
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      3,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0
    ]
  },
  "ata_sct_temperature_history": {
    "version": 2,
    "sampling_period_minutes": 4,
    "logging_interval_minutes": 59,
    "temperature": {
      "op_limit_min": 10,
      "op_limit_max": 25,
      "limit_min": 5,
      "limit_max": 70
    },
    "size": 128,
    "index": 24,
    "table": [
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      49,
      null,
      24,
      39,
      38,
      38,
      38,
      38,
      38,
      38,
      37,
      37,
      36,
      36,
      36,
      36,
      35,
      35,
      35,
      35,
      35,
      35,
      37,
      38,
      38
    ]
  },
  "ata_sct_erc": {
    "read": {
      "enabled": true,
      "deciseconds": 70
    },
    "write": {
      "enabled": true,
      "deciseconds": 70
    }
  },
  "ata_device_statistics": {
    "pages": [
      {
        "number": 1,
        "name": "General Statistics",
        "revision": 1,
        "table": [
          {
            "offset": 8,
            "name": "Lifetime Power-On Resets",
            "size": 4,
            "value": 1,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 16,
            "name": "Power-on Hours",
            "size": 4,
            "value": 21,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 24,
            "name": "Logical Sectors Written",
            "size": 6,
            "value": 861495235,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 32,
            "name": "Number of Write Commands",
            "size": 6,
            "value": 3228990,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 40,
            "name": "Logical Sectors Read",
            "size": 6,
            "value": 14887785,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 48,
            "name": "Number of Read Commands",
            "size": 6,
            "value": 159012,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 56,
            "name": "Date and Time TimeStamp",
            "size": 6,
            "flags": {
              "value": 128,
              "string": "---- ",
              "valid": false,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          }
        ]
      },
      {
        "number": 3,
        "name": "Rotating Media Statistics",
        "revision": 1,
        "table": [
          {
            "offset": 8,
            "name": "Spindle Motor Power-on Hours",
            "size": 4,
            "value": 21,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 16,
            "name": "Head Flying Hours",
            "size": 4,
            "value": 21,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 24,
            "name": "Head Load Events",
            "size": 4,
            "value": 2,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 32,
            "name": "Number of Reallocated Logical Sectors",
            "size": 4,
            "value": 0,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 40,
            "name": "Read Recovery Attempts",
            "size": 4,
            "value": 0,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 48,
            "name": "Number of Mechanical Start Failures",
            "size": 4,
            "value": 0,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 56,
            "name": "Number of Realloc. Candidate Logical Sectors",
            "size": 4,
            "value": 0,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 64,
            "name": "Number of High Priority Unload Events",
            "size": 4,
            "value": 1,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          }
        ]
      },
      {
        "number": 4,
        "name": "General Errors Statistics",
        "revision": 1,
        "table": [
          {
            "offset": 8,
            "name": "Number of Reported Uncorrectable Errors",
            "size": 4,
            "value": 0,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 16,
            "name": "Resets Between Cmd Acceptance and Completion",
            "size": 4,
            "value": 0,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 24,
            "name": "Physical Element Status Changed",
            "size": 4,
            "value": 0,
            "flags": {
              "value": 208,
              "string": "V-D- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": true,
              "monitored_condition_met": false
            }
          }
        ]
      },
      {
        "number": 5,
        "name": "Temperature Statistics",
        "revision": 1,
        "table": [
          {
            "offset": 8,
            "name": "Current Temperature",
            "size": 1,
            "value": 39,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 16,
            "name": "Average Short Term Temperature",
            "size": 1,
            "flags": {
              "value": 128,
              "string": "---- ",
              "valid": false,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 24,
            "name": "Average Long Term Temperature",
            "size": 1,
            "flags": {
              "value": 128,
              "string": "---- ",
              "valid": false,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 32,
            "name": "Highest Temperature",
            "size": 1,
            "value": 41,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 40,
            "name": "Lowest Temperature",
            "size": 1,
            "value": 31,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 48,
            "name": "Highest Average Short Term Temperature",
            "size": 1,
            "flags": {
              "value": 128,
              "string": "---- ",
              "valid": false,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 56,
            "name": "Lowest Average Short Term Temperature",
            "size": 1,
            "flags": {
              "value": 128,
              "string": "---- ",
              "valid": false,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 64,
            "name": "Highest Average Long Term Temperature",
            "size": 1,
            "flags": {
              "value": 128,
              "string": "---- ",
              "valid": false,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 72,
            "name": "Lowest Average Long Term Temperature",
            "size": 1,
            "flags": {
              "value": 128,
              "string": "---- ",
              "valid": false,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 80,
            "name": "Time in Over-Temperature",
            "size": 4,
            "value": 0,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 88,
            "name": "Specified Maximum Operating Temperature",
            "size": 1,
            "value": 70,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 96,
            "name": "Time in Under-Temperature",
            "size": 4,
            "value": 0,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 104,
            "name": "Specified Minimum Operating Temperature",
            "size": 1,
            "value": 5,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          }
        ]
      },
      {
        "number": 6,
        "name": "Transport Statistics",
        "revision": 1,
        "table": [
          {
            "offset": 8,
            "name": "Number of Hardware Resets",
            "size": 4,
            "value": 2,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 16,
            "name": "Number of ASR Events",
            "size": 4,
            "value": 0,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 24,
            "name": "Number of Interface CRC Errors",
            "size": 4,
            "value": 0,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          }
        ]
      },
      {
        "number": 255,
        "name": "Vendor Specific Statistics",
        "revision": 1,
        "table": [
          {
            "offset": 16,
            "name": "Vendor Specific",
            "size": 7,
            "value": 0,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 24,
            "name": "Vendor Specific",
            "size": 7,
            "value": 0,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          }
        ]
      }
    ]
  },
  "ata_pending_defects_log": {
    "size": 65535,
    "count": 0
  },
  "sata_phy_event_counters": {
    "table": [
      {
        "id": 10,
        "name": "Device-to-host register FISes sent due to a COMRESET",
        "size": 2,
        "value": 3,
        "overflow": false
      },
      {
        "id": 1,
        "name": "Command failed due to ICRC error",
        "size": 2,
        "value": 0,
        "overflow": false
      },
      {
        "id": 3,
        "name": "R_ERR response for device-to-host data FIS",
        "size": 2,
        "value": 0,
        "overflow": false
      },
      {
        "id": 4,
        "name": "R_ERR response for host-to-device data FIS",
        "size": 2,
        "value": 0,
        "overflow": false
      },
      {
        "id": 6,
        "name": "R_ERR response for device-to-host non-data FIS",
        "size": 2,
        "value": 0,
        "overflow": false
      },
      {
        "id": 7,
        "name": "R_ERR response for host-to-device non-data FIS",
        "size": 2,
        "value": 0,
        "overflow": false
      }
    ],
    "reset": false
  }
}

@AnalogJ
Owner

AnalogJ commented May 29, 2022

Yeah, in some cases the normalized data is what is used for the Backblaze reliability comparison:

https://www.backblaze.com/blog-smart-stats-2014-8.html#S1R

Though in this case, while the raw graph has "nicer" failure thresholds, the normalized data was chosen:

(Vendor specific raw value.) Stores data related to the rate of hardware read errors that occurred when reading data from a disk surface. The raw value has different structure for different vendors and is often not meaningful as a decimal number.

https://github.com/AnalogJ/scrutiny/blob/master/webapp/backend/pkg/thresholds/ata_attribute_metadata.go#L38

@somebody-somewhere-over-the-rainbow

so, how do I make scrutiny use the raw value to get the drive shown as healthy? or is it simply not possible at this time?

@ChromoX

ChromoX commented Jun 12, 2022

@Parlane I followed your example and that fixed most of the fields except for the 0x07/Seek Error Rate field. Therefore Scrutiny still reports the drive as "Failed".

It seems there might be a bug where it's using the VALUE column returned by smartctl instead of the RAW_VALUE column?

@Parlane
Author

Parlane commented Jun 12, 2022

Yes, sorry, mine also still shows as failed. I think the real problem might now be in how smartctl calculates the VALUE column. Alternatively, we could ask @AnalogJ to allow a config option to use the raw value; for raw read or seek errors, any value above 0 is bad IMO. I would be happy to mark a drive as bad simply because the raw value was not 0 in the case of my Seagate Ironwolfs.

@NemesisRE

That is not completely true. Values above 0 can be bad but don't have to be; there is also a temporal component. If the values are not too high and are stable (they do not increase), there is no need to replace the drive, because it has not failed and is fully operational. That would mean it is only in a warning or error state.

I am speaking from experience with ~9000 drives.

@AnalogJ AnalogJ reopened this Jun 13, 2022
@AnalogJ
Owner

AnalogJ commented Jun 13, 2022

I think this is working correctly now, but it requires some steps & an explanation (and possibly a Seagate specific troubleshooting guide).

I don't have time to write that up right now, but I'll re-open this issue so I don't forget.

@AnalogJ AnalogJ added the documentation Improvements or additions to documentation label Jun 13, 2022
@adhawkins

I have just installed scrutiny, and am seeing the same issue with similar drives. Will follow this issue in the hope the documentation arrives.

@AnalogJ
Owner

AnalogJ commented Jun 14, 2022

TL;DR;

  1. Upgrade to v0.4.13+
  2. Reset your drive status using the SQLite script in #device-failed-but-smart--scrutiny-passed
  3. Wait for (or manually start) the collector.

Please try these steps and comment below if they work for you. Thanks! 🙏


The following explanation is documented here

As thoroughly discussed in #255, Seagate (Ironwolf & others) drives are almost always marked as failed by Scrutiny.

The Seek Error Rate & Read Error Rate attribute raw values are typically very high, and the
normalised values (Current / Worst / Threshold) are usually quite low. Despite this, the numbers in most cases are perfectly OK.

The anxiety arises because we intuitively expect that the normalised values should reflect a "health" score, with
100 being the ideal value. Similarly, we would expect that the raw values should reflect an error count, in
which case a value of 0 would be most desirable. However, Seagate calculates and applies these attribute values
in a counterintuitive way.

http://www.users.on.net/~fzabkar/HDD/Seagate_SER_RRER_HEC.html

Some analysis has been done which shows that Seagate drives break the common SMART conventions, which also causes Scrutiny's
comparison against BackBlaze data to detect these drives as failed.

So what's the Solution?

After taking a look at the BackBlaze data for the relevant attributes (Seek Error Rate & Read Error Rate), I've decided
to disable Scrutiny analysis for them. Both are non-critical and have low correlation with failure.

Please note: SMART failures for these attributes will still cause the drive to be marked as failed. Only BackBlaze analysis has been disabled.

If this is affecting your drives, you'll need to do the following:

  1. Upgrade to v0.4.13+
  2. Reset your drive status using the SQLite script in #device-failed-but-smart--scrutiny-passed
  3. Wait for (or manually start) the collector.

If you'd like to learn more about how the Seagate Ironwolf SMART attributes work under the hood, and how they differ from
other drives, please read the following:

@fightforlife

Is there maybe a very similar issue with the "Command timeout" attribute on Seagate drives?
Both my Seagate drives report unrealistic values, while the other drives report normal values.

It seems the solution is reported here: https://forums.tomshardware.com/threads/very-high-command-timeout.648978/

In fact the actual value is 4, not 262148.
The 48-bit raw value is often composed of three 16-bit components, ie ...
0x000000040004 = 0x0000 0x0004 0x0004
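
The arithmetic in that quote can be checked with a short sketch. It only splits a 48-bit raw value into three 16-bit words; `split_raw48` is a hypothetical helper name, and which word actually carries the count for a given attribute is vendor-specific and not asserted here.

```python
def split_raw48(raw):
    """Split a 48-bit SMART raw value into three 16-bit words, most significant first."""
    if not 0 <= raw < (1 << 48):
        raise ValueError("raw value does not fit in 48 bits")
    high = (raw >> 32) & 0xFFFF  # bits 47..32
    mid = (raw >> 16) & 0xFFFF   # bits 31..16
    low = raw & 0xFFFF           # bits 15..0
    return high, mid, low


# 262148 is the decimal value reported for Command Timeout above.
print(hex(262148), split_raw48(262148))  # prints: 0x40004 (0, 4, 4)
```

Read as a single decimal number, the value is 4 * 65536 + 4 = 262148, which is why the reported figure looks so large even though each 16-bit word is only 4.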

@Parlane
Author

Parlane commented Jun 14, 2022

TL;DR;

  1. Upgrade to v0.4.13+
  2. Reset your drive status using the SQLite script in #device-failed-but-smart--scrutiny-passed
  3. Wait for (or manually start) the collector.

I am running: beta#7a6c94a (docker ghcr.io/analogj/scrutiny:beta-omnibus)

I used sqlite to update the status to not failed, then reran "scrutiny-collector-metrics run" manually. All Seagate drives are still marked as failed.


@Parlane
Author

Parlane commented Jun 14, 2022

Oops I see beta is actually behind master now... I will try master instead.

@Parlane
Author

Parlane commented Jun 14, 2022

Yay it works :) With master#145c819

Thank you @AnalogJ

@adhawkins

Mine appears to work correctly too, thanks.

@AnalogJ
Owner

AnalogJ commented Jun 15, 2022

fantastic, closing this out as fixed in v0.4.13 (#301)

Thanks for all your help everyone!! 🥳


@fightforlife similar issue, but not quite as dire. For the command timeout attribute, Scrutiny is checking the RAW value, and since the number is so absurdly high, it doesn't even fit into any of the buckets that we're looking for, so Scrutiny just marks it as warn. If you want to fix that attribute, I'm guessing you can add the following line to the collector config file for your Seagate drive.

      metrics_smart_args: '-v 188,raw48:54 --xall --json -T permissive' 

@Parlane yeah I developed this change on a different branch, glad you figured it out!

@AnalogJ AnalogJ closed this as completed Jun 15, 2022
@tadly

tadly commented Jun 15, 2022

one last question @AnalogJ

Is the Backblaze comparison for Seagate coming back at all, or will this stay disabled now?
My understanding is that the raw value could be used to compare against Backblaze, or am I misunderstanding this?
When I say raw I mean like in the OP: smartctl /dev/sdb -a -v 1,raw48:54 -v 7,raw48:54

Thanks a lot for the quick fix though. Really appreciate how much time and effort you put into this project.

@AnalogJ
Owner

AnalogJ commented Jun 15, 2022

@tadly Backblaze comparison is still enabled for Seagate; it's just disabled for these 2 attributes.
I'm working on a larger project to allow users to customize how Scrutiny (not SMART) analysis is done on a drive-by-drive basis. Unfortunately it's going to take a bit of time to roll out.

Regarding the RAW attribute values, the issue is that the relevant attributes are vendor specific, so I decided to use the normalized value in the hope that I wouldn't need to worry about how the vendor decided to encode the data. Unfortunately Seagate decided to muck around with the normalized data as well (100 & 60 are both healthy values).
Using the RAW value for those attributes would require a lot more data analysis, and I'd probably need to complete #10 or something similar first.

Glad everything is working for you.

@MattKobayashi
Sponsor Contributor

Hi @AnalogJ, sorry to bring this one back up, but I seem to be having issues passing the command timeout value override as a smartctl argument in collector.yaml. I add the following to collector.yaml below devices:

  - device: /dev/sdb
    commands:
      metrics_smart_args: "--vendorattribute=188,raw48:54 --xall --json -T permissive"
  - device: /dev/sdc
    commands:
      metrics_smart_args: "--vendorattribute=188,raw48:54 --xall --json -T permissive"
  - device: /dev/sdd
    commands:
      metrics_smart_args: "--vendorattribute=188,raw48:54 --xall --json -T permissive"
  - device: /dev/sde
    commands:
      metrics_smart_args: "--vendorattribute=188,raw48:54 --xall --json -T permissive"
  - device: /dev/sdf
    commands:
      metrics_smart_args: "--vendorattribute=188,raw48:54 --xall --json -T permissive"
  - device: /dev/sdg
    commands:
      metrics_smart_args: "--vendorattribute=188,raw48:54 --xall --json -T permissive"

When I do, all that seems to happen is that Scrutiny skips those drives entirely during a scan run. Do you have any ideas on what might be going wrong here?

@Parlane
Author

Parlane commented Jun 23, 2022

Specify the device type like this; you may also need to specify both the smart args and the info args:

  - device: /dev/sdb
    type: 'ata'
    commands:
      metrics_info_args: '--info --json -T permissive' # used to determine device unique ID & register device with Scrutiny
      metrics_smart_args: '--vendorattribute=188,raw48:54 --xall --json -T permissive' # used to retrieve smart data for each device.
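
Since the same block has to be repeated for every drive, a small script can generate the entries. This is only a sketch: the key names (device, type, commands, metrics_info_args, metrics_smart_args) and the argument strings are copied from the snippet above, while the helper name device_entry and the device list are hypothetical. It prints a fragment to paste under devices in collector.yaml and does not validate it against Scrutiny.

```python
def device_entry(
    device: str,
    smart_args: str,
    info_args: str = "--info --json -T permissive",
    device_type: str = "ata",
) -> str:
    """Render one devices entry, mirroring the YAML snippet above."""
    return (
        f"  - device: {device}\n"
        f"    type: '{device_type}'\n"
        f"    commands:\n"
        f"      metrics_info_args: '{info_args}'\n"
        f"      metrics_smart_args: '{smart_args}'\n"
    )


# Argument string taken from the config posted earlier in this thread.
SMART_ARGS = "--vendorattribute=188,raw48:54 --xall --json -T permissive"

# Hypothetical device list: /dev/sdb through /dev/sdg.
devices = [f"/dev/sd{letter}" for letter in "bcdefg"]

config_fragment = "".join(device_entry(d, SMART_ARGS) for d in devices)
print(config_fragment, end="")
```

Setting type explicitly on every entry matches the workaround described above.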

@MattKobayashi
Sponsor Contributor

That fixed it, thank you @Parlane!

@AnalogJ AnalogJ reopened this Jun 23, 2022
AnalogJ added a commit that referenced this issue Jun 25, 2022
…re the device. We need to make sure we correctly override the device.

fixes #255
@AnalogJ
Owner

AnalogJ commented Jun 25, 2022

@Parlane @MattKobayashi that's definitely a bug (a missing deviceType should not cause the device to be skipped).

I've made a fix (with associated tests) in the beta branch. Sorry about that.
I'm going to close this issue again.

@Parlane
Author

Parlane commented Jun 25, 2022

Haha, when I added the device type to fix my config, I assumed I was the one who had done it wrong 🤣
