[BUG] Some drive types do not display value for Power on Hours #43

Closed
AnalogJ opened this issue Sep 26, 2020 · 8 comments · Fixed by #51
Labels
bug Something isn't working waiting for response

Comments

@AnalogJ
Owner

AnalogJ commented Sep 26, 2020

Describe the bug
NVMe drives do not display value for Power on Hours

Screenshots
See #37 for more information + logs.

Log Files

time="2020-09-25T17:41:29Z" level=info msg="Executing command: smartctl -a -j /dev/sdi" type=metrics
{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      0
    ],
    "svn_revision": "4883",
    "platform_info": "x86_64-linux-4.19.107-Unraid",
    "build_info": "(local build)",
    "argv": [
      "smartctl",
      "-a",
      "-j",
      "/dev/sdi"
    ],
    "exit_status": 0
  },
  "device": {
    "name": "/dev/sdi",
    "info_name": "/dev/sdi",
    "type": "scsi",
    "protocol": "SCSI"
  },
  "vendor": "TOSHIBA",
  "product": "MG03SCA300",
  "model_name": "TOSHIBA MG03SCA300",
  "revision": "DG04",
  "scsi_version": "SPC-4",
  "user_capacity": {
    "blocks": 5860533168,
    "bytes": 3000592982016
  },
  "logical_block_size": 512,
  "rotation_rate": 7200,
  "form_factor": {
    "scsi_value": 2,
    "name": "3.5 inches"
  },
  "serial_number": "Z4K0AXXXXXX",
  "device_type": {
    "scsi_value": 0,
    "name": "disk"
  },
  "local_time": {
    "time_t": 1601055689,
    "asctime": "Fri Sep 25 17:41:29 2020 America"
  },
  "smart_status": {
    "passed": true
  },
  "temperature": {
    "current": 33,
    "drive_trip": 65
  },
  "scsi_grown_defect_list": 0,
  "scsi_error_counter_log": {
    "read": {
      "errors_corrected_by_eccfast": 0,
      "errors_corrected_by_eccdelayed": 9,
      "errors_corrected_by_rereads_rewrites": 9,
      "total_errors_corrected": 9,
      "correction_algorithm_invocations": 15,
      "gigabytes_processed": "5670384.023",
      "total_uncorrected_errors": 0
    },
    "write": {
      "errors_corrected_by_eccfast": 0,
      "errors_corrected_by_eccdelayed": 0,
      "errors_corrected_by_rereads_rewrites": 0,
      "total_errors_corrected": 0,
      "correction_algorithm_invocations": 0,
      "gigabytes_processed": "102090.851",
      "total_uncorrected_errors": 0
    },
    "verify": {
      "errors_corrected_by_eccfast": 0,
      "errors_corrected_by_eccdelayed": 1,
      "errors_corrected_by_rereads_rewrites": 1,
      "total_errors_corrected": 1,
      "correction_algorithm_invocations": 1,
      "gigabytes_processed": "669629.192",
      "total_uncorrected_errors": 0
    }
  }
}
AnalogJ added the "bug" (Something isn't working) label Sep 26, 2020
@AnalogJ
Owner Author

AnalogJ commented Sep 26, 2020

Just looking at the data from the smartctl logs, it looks like NVMe drives do not provide power-on hours. Here's an example from a SCSI drive that does include the power_on_time field:

SCSI output
time="2020-09-25T17:41:30Z" level=info msg="Executing command: smartctl -a -j /dev/sdk" type=metrics
{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      0
    ],
    "svn_revision": "4883",
    "platform_info": "x86_64-linux-4.19.107-Unraid",
    "build_info": "(local build)",
    "argv": [
      "smartctl",
      "-a",
      "-j",
      "/dev/sdk"
    ],
    "exit_status": 0
  },
  "device": {
    "name": "/dev/sdk",
    "info_name": "/dev/sdk",
    "type": "scsi",
    "protocol": "SCSI"
  },
  "vendor": "SEAGATE",
  "product": "ST9300503SS",
  "model_name": "SEAGATE ST9300503SS",
  "revision": "MS01",
  "scsi_version": "SPC-3",
  "user_capacity": {
    "blocks": 585937500,
    "bytes": 300000000000
  },
  "logical_block_size": 512,
  "rotation_rate": 10000,
  "form_factor": {
    "scsi_value": 3,
    "name": "2.5 inches"
  },
  "serial_number": "6SE0N87PXXXXXXX",
  "device_type": {
    "scsi_value": 0,
    "name": "disk"
  },
  "local_time": {
    "time_t": 1601055690,
    "asctime": "Fri Sep 25 17:41:30 2020 America"
  },
  "smart_status": {
    "passed": true
  },
  "temperature": {
    "current": 47,
    "drive_trip": 68
  },
  "scsi_grown_defect_list": 0,
  "power_on_time": {
    "hours": 168,
    "minutes": 42
  },
  "scsi_error_counter_log": {
    "read": {
      "errors_corrected_by_eccfast": 748616,
      "errors_corrected_by_eccdelayed": 0,
      "errors_corrected_by_rereads_rewrites": 0,
      "total_errors_corrected": 748616,
      "correction_algorithm_invocations": 748616,
      "gigabytes_processed": "505.278",
      "total_uncorrected_errors": 0
    },
    "write": {
      "errors_corrected_by_eccfast": 0,
      "errors_corrected_by_eccdelayed": 0,
      "errors_corrected_by_rereads_rewrites": 0,
      "total_errors_corrected": 0,
      "correction_algorithm_invocations": 0,
      "gigabytes_processed": "2763.970",
      "total_uncorrected_errors": 0
    }
  }
}
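
A quick way to check whether a given smartctl JSON payload carries the field is to look for the top-level `power_on_time` key. The snippet below is only a sketch: `power_on_hours` is a hypothetical helper name (not part of scrutiny), and the two payloads are trimmed down from the /dev/sdk and /dev/sdi outputs above.

```python
import json
from typing import Optional


def power_on_hours(smartctl_json: str) -> Optional[int]:
    """Return power-on hours from a `smartctl -j` payload, or None if absent."""
    data = json.loads(smartctl_json)
    power_on_time = data.get("power_on_time")
    if not isinstance(power_on_time, dict):
        return None
    return power_on_time.get("hours")


# Trimmed from the /dev/sdk output above (field present).
sdk = '{"device": {"name": "/dev/sdk"}, "power_on_time": {"hours": 168, "minutes": 42}}'
# Trimmed from the /dev/sdi output collected with -a (field absent).
sdi = '{"device": {"name": "/dev/sdi"}, "scsi_grown_defect_list": 0}'

print(power_on_hours(sdk))  # 168
print(power_on_hours(sdi))  # None
```

Returning None rather than 0 keeps "field missing" distinguishable from a brand-new drive with zero hours.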

@AnalogJ
Owner Author

AnalogJ commented Sep 26, 2020

We should investigate the results using the -x flag, which provides additional device data (see #9).

@teambvd can you try running the following command and paste the results below?
docker exec scrutiny smartctl -x -j /dev/sdi

Thanks!

@teambvd
Contributor

teambvd commented Sep 26, 2020

That drive is a SAS drive (not NVMe) - the above command does give the power on hours at the very end:
"power_on_time": {
"hours": 39403,
"minutes": 32

NVMe drives display power on hours properly as is:
[Screenshot: Screen Shot 2020-09-26 at 3 27 08 PM]

Full output of the docker exec command:
docker exec scrutiny smartctl -x -j /dev/sdi

{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      0
    ],
    "svn_revision": "4883",
    "platform_info": "x86_64-linux-4.19.107-Unraid",
    "build_info": "(local build)",
    "argv": [
      "smartctl",
      "-x",
      "-j",
      "/dev/sdi"
    ],
    "exit_status": 0
  },
  "device": {
    "name": "/dev/sdi",
    "info_name": "/dev/sdi",
    "type": "scsi",
    "protocol": "SCSI"
  },
  "vendor": "TOSHIBA",
  "product": "MG03SCA300",
  "model_name": "TOSHIBA MG03SCA300",
  "revision": "DG04",
  "scsi_version": "SPC-4",
  "user_capacity": {
    "blocks": 5860533168,
    "bytes": 3000592982016
  },
  "logical_block_size": 512,
  "rotation_rate": 7200,
  "form_factor": {
    "scsi_value": 2,
    "name": "3.5 inches"
  },
  "serial_number": "Z4K0A12YFVL9",
  "device_type": {
    "scsi_value": 0,
    "name": "disk"
  },
  "local_time": {
    "time_t": 1601151939,
    "asctime": "Sat Sep 26 20:25:39 2020 America"
  },
  "smart_status": {
    "passed": true
  },
  "temperature": {
    "current": 28,
    "drive_trip": 65
  },
  "scsi_grown_defect_list": 0,
  "scsi_error_counter_log": {
    "read": {
      "errors_corrected_by_eccfast": 0,
      "errors_corrected_by_eccdelayed": 9,
      "errors_corrected_by_rereads_rewrites": 9,
      "total_errors_corrected": 9,
      "correction_algorithm_invocations": 15,
      "gigabytes_processed": "5674097.674",
      "total_uncorrected_errors": 0
    },
    "write": {
      "errors_corrected_by_eccfast": 0,
      "errors_corrected_by_eccdelayed": 0,
      "errors_corrected_by_rereads_rewrites": 0,
      "total_errors_corrected": 0,
      "correction_algorithm_invocations": 0,
      "gigabytes_processed": "102102.605",
      "total_uncorrected_errors": 0
    },
    "verify": {
      "errors_corrected_by_eccfast": 0,
      "errors_corrected_by_eccdelayed": 1,
      "errors_corrected_by_rereads_rewrites": 1,
      "total_errors_corrected": 1,
      "correction_algorithm_invocations": 1,
      "gigabytes_processed": "669629.192",
      "total_uncorrected_errors": 0
    }
  },
  "power_on_time": {
    "hours": 39403,
    "minutes": 32
  }
}
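
To see what `-x` adds over `-a` for the same drive, one option is to diff the top-level keys of the two JSON payloads. This is a sketch under assumptions: `added_keys` is a hypothetical helper, and the dictionaries are heavily abbreviated versions of the two /dev/sdi outputs in this thread.

```python
def added_keys(a_output: dict, x_output: dict) -> list:
    """Top-level keys present in the -x payload but missing from the -a payload."""
    return sorted(set(x_output) - set(a_output))


# Abbreviated from the `smartctl -a -j /dev/sdi` output.
a_flag = {
    "device": {"name": "/dev/sdi", "type": "scsi", "protocol": "SCSI"},
    "temperature": {"current": 33, "drive_trip": 65},
    "scsi_grown_defect_list": 0,
}

# Abbreviated from the `smartctl -x -j /dev/sdi` output.
x_flag = {
    "device": {"name": "/dev/sdi", "type": "scsi", "protocol": "SCSI"},
    "temperature": {"current": 28, "drive_trip": 65},
    "scsi_grown_defect_list": 0,
    "power_on_time": {"hours": 39403, "minutes": 32},
}

print(added_keys(a_flag, x_flag))  # ['power_on_time']
```

Only key names are compared, so values that naturally differ between runs (temperature, timestamps) do not show up as noise.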

AnalogJ changed the title from "[BUG] NVMe drives do not display value for Power on Hours" to "[BUG] Some drive types do not display value for Power on Hours" Sep 26, 2020
@AnalogJ
Owner Author

AnalogJ commented Sep 26, 2020

Oh apologies. I should have taken a closer look at the drive type.
That's fantastic though, I'm glad the -x flag works. Judging by its description, it should return both the device data and the smart data for all drive types, but I've been bitten by weird inconsistencies in the past.

Let me add the flag to the collector in a new branch and spin up a new docker image for you to try out.
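
For illustration only, the switch from `-a` to `-x` could look roughly like the sketch below. This is not the actual collector code (which is not shown in this thread); `smartctl_argv` and `collect` are hypothetical names, and the only things taken from the thread are the `-a`, `-x` and `-j` flags.

```python
import json
import subprocess


def smartctl_argv(device: str, extended: bool = True) -> list:
    """Build the smartctl command line; -x replaces -a when extended is True."""
    flag = "-x" if extended else "-a"
    return ["smartctl", flag, "-j", device]


def collect(device: str) -> dict:
    """Run smartctl and parse its JSON output (requires smartctl to be installed)."""
    # check=False: smartctl's exit status is a bitmask and can be non-zero
    # even when it printed a usable JSON document.
    result = subprocess.run(
        smartctl_argv(device), capture_output=True, text=True, check=False
    )
    return json.loads(result.stdout)


print(smartctl_argv("/dev/sdi"))
# ['smartctl', '-x', '-j', '/dev/sdi']
```

Calling `collect("/dev/sdi")` would need to run somewhere with access to the device, for example inside the scrutiny container.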

AnalogJ added a commit that referenced this issue Sep 26, 2020
…en using `-a` flag (unlike other device types).

fixes #43
fixes #9
@AnalogJ
Owner Author

AnalogJ commented Sep 26, 2020

Alright, I got the image built.

Can you run the following image? analogj/scrutiny:x_flag

docker run -it --rm -p 8080:8080 \
-v /run/udev:/run/udev:ro \
--cap-add SYS_RAWIO \
--device=/dev/sda \
--device=/dev/sdb \
--name scrutiny \
analogj/scrutiny:x_flag

@mglubber

Not the original poster, but I had a similar issue with an Intel 520 SSD: for some reason smartctl -a reports an absurd power-on value for the drive. It's a bit older, but I don't think it's been powered on for 105 years. Using the 'x_flag' docker image returns a much saner value of 3 years.
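
The hours-to-years arithmetic behind that kind of sanity check can be sketched as follows. The raw hour counts from the Intel 520 are not in this thread, so the 105-year figure is converted back to hours purely for illustration; `looks_plausible` and its 30-year cutoff are assumptions, not anything scrutiny does.

```python
HOURS_PER_YEAR = 24 * 365.25  # 8766 hours, averaging in leap years


def hours_to_years(hours: float) -> float:
    """Convert a power-on hour count to years."""
    return hours / HOURS_PER_YEAR


def looks_plausible(hours: float, max_years: float = 30) -> bool:
    """Flag counts that imply a drive older than max_years (arbitrary cutoff)."""
    return 0 <= hours_to_years(hours) <= max_years


# 39403 hours is the value reported for /dev/sdi earlier in this thread.
print(round(hours_to_years(39403), 1))  # 4.5

# Roughly 105 years expressed in hours (illustrative value only).
absurd_hours = int(105 * HOURS_PER_YEAR)
print(looks_plausible(absurd_hours))  # False
```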

@teambvd
Contributor

teambvd commented Sep 28, 2020

@AnalogJ I think that cleared up the last of my issues! To summarize the configuration, in case it helps with your documentation:

  • Poweredge r720xd
    • 2x 2.5" 10k SAS FDE drives on secondary internal midplane (separate bus, same chipset)
    • 10x 3.5" SAS and 2x 3.5" SATA on primary internal backplane
    • 1x Intel AIC NVMe (PCIe add-in-card)
  • 12x external 3.5" SAS FDE drives connected via LSI 9207-8e in an EMC KTN-STL3 disk shelf

Basically, this is about the biggest mix and match one could expect to have, at least as far as protocols are concerned. The only way I could see this getting confused now is if there were some host multipathing issues, but that's outside the scope of this product (and wouldn't be your problem anyway).

Great friggin work! Do you need the logs for anything, or...? I didn't figure you would if it all worked as designed, but perhaps you've other needs for em I don't know about...?

@AnalogJ
Owner Author

AnalogJ commented Sep 29, 2020

Hey @teambvd
That's great to hear. Thanks for all the additional information about your setup.
I don't need any log files, actually; it sounds like everything is working as intended :)
