
[BUG] Cannot override incorrect device type from smartctl --scan, breaks Synology #48

Closed
arnauos opened this issue Sep 28, 2020 · 5 comments · Fixed by #88
Labels
bug Something isn't working

Comments

@arnauos

arnauos commented Sep 28, 2020

Hi, I'm trying Scrutiny on a Synology NAS (using Docker), and I notice that it does not correctly get the SMART details for the internal disks.

Running the collector, I see that it's trying to query the disk as SCSI, judging by the results of "smartctl -a -j /dev/sdb":

  "device": {
    "name": "/dev/sdb",
    "info_name": "/dev/sdb",
    "type": "scsi",
    "protocol": "SCSI"
  },

However, on a Synology, the query must be done in "sat" mode (SCSI to ATA Translation): "scsi" does not return SMART values and "ata" does not work.
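A minimal Python sketch of forcing the device type instead of relying on smartctl's guess: the type is passed explicitly with -d, just like the "smartctl -d sat --all /dev/sdb" command below. The helper name smartctl_command is hypothetical and is not part of Scrutiny.

```python
import shlex
from typing import List, Optional


def smartctl_command(device: str, device_type: Optional[str] = None) -> List[str]:
    """Build a smartctl call that prints all SMART information as JSON.

    smartctl_command is a hypothetical helper for illustration. When
    device_type is given (e.g. "sat"), it is passed with -d so smartctl
    does not fall back to its own guess (which was "scsi" on this NAS).
    """
    cmd = ["smartctl", "-a", "-j"]
    if device_type:
        cmd += ["-d", device_type]
    cmd.append(device)
    return cmd


# The argument list can be handed to subprocess.run(...) as-is.
print(shlex.join(smartctl_command("/dev/sdb", "sat")))
# prints: smartctl -a -j -d sat /dev/sdb
```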

Here are the outputs of the three modes:

scsi results:

root@a26318c5f903:/scrutiny/bin# smartctl -d scsi --all /dev/sdb                                        
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.105] (local build)                                     
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org                             
                                                                                                        
=== START OF INFORMATION SECTION ===                                                                    
Vendor:               HGST                                                                              
Product:              HDN726040ALE614                                                                   
Revision:             APGN                                                                              
Compliance:           SPC-3                                                                             
User Capacity:        4,000,787,030,016 bytes [4.00 TB]                                                 
Logical block size:   512 bytes                                                                         
Physical block size:  4096 bytes                                                                        
LU is fully provisioned                                                                                 
Rotation Rate:        7200 rpm                                                                          
Form Factor:          3.5 inches                                                                        
Logical Unit id:      0x5000cca25def1764                                                                
Serial number:        K4KXXXXX                                                                          
Device type:          disk                                                                              
Local Time is:        Mon Sep 28 21:00:47 2020 UTC                                                      
SMART support is:     Unavailable - device lacks SMART capability.                                      
                                                                                                        
=== START OF READ SMART DATA SECTION ===                                                                
Current Drive Temperature:     0 C                                                                      
Drive Trip Temperature:        0 C                                                                      
                                                                                                        
Error Counter logging not supported                                                                     
                                                                                                        
                                                                                                        
[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']                              
Device does not support Self Test logging    

ata results:

root@a26318c5f903:/scrutiny/bin# smartctl -d ata --all /dev/sdb                                         
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.105] (local build)                                     
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org                             
                                                                                                        
Read Device Identity failed: Permission denied                                                          
                                                                                                        
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.       

sat results:

root@a26318c5f903:/scrutiny/bin# smartctl -d sat --all /dev/sdb                                         
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.105] (local build)                                     
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org                             
                                                                                                        
=== START OF INFORMATION SECTION ===                                                                    
Model Family:     HGST Deskstar NAS                                                                     
Device Model:     HGST HDN726040ALE614                                                                  
Serial Number:    K4KXXXXX                                                                              
LU WWN Device Id: 5 000cca 25def1764                                                                    
Firmware Version: APGNW7JH                                                                              
User Capacity:    4,000,787,030,016 bytes [4.00 TB]                                                     
Sector Sizes:     512 bytes logical, 4096 bytes physical                                                
Rotation Rate:    7200 rpm                                                                              
Form Factor:      3.5 inches                                                                            
Device is:        In smartctl database [for details use: -P show]                                       
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4                                                 
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)                                                
Local Time is:    Mon Sep 28 20:57:22 2020 UTC                                                          
SMART support is: Available - device has SMART capability.                                              
SMART support is: Enabled                                                                               
                                                                                                        
=== START OF READ SMART DATA SECTION ===                                                                
SMART overall-health self-assessment test result: PASSED              

                                                                                                        
General SMART Values:                                                                                   
Offline data collection status:  (0x82) Offline data collection activity                                
                                        was completed without error.                                    
                                        Auto Offline Data Collection: Enabled.                          
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  113) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 571) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   138   138   054    Pre-fail  Offline      -       100
  3 Spin_Up_Time            0x0007   149   149   024    Pre-fail  Always       -       361 (Average 357)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       50
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   097   097   000    Old_age   Always       -       24956
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       38
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1072
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       1072
194 Temperature_Celsius     0x0002   162   162   000    Old_age   Always       -       37 (Min/Max 22/52)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0                
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0                
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0                
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0                
                                                                                                        
SMART Error Log Version: 1                                                                              
No Errors Logged                                                                                        
                              
SMART Self-test log structure revision number 1                                                         
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error         
# 1  Extended offline    Completed without error       00%     24307         -                          
# 2  Extended offline    Completed without error       00%     23563         -                          
# 3  Extended offline    Completed without error       00%     22820         -                          
# 4  Extended offline    Completed without error       00%     22098         -                          
# 5  Extended offline    Completed without error       00%     21354         -                          
# 6  Extended offline    Completed without error       00%     20636         -                          
# 7  Extended offline    Completed without error       00%     19894         -                          
# 8  Extended offline    Completed without error       00%     19195         -                          
# 9  Extended offline    Completed without error       00%     18453         -                          
#10  Extended offline    Completed without error       00%     17707         -                          
#11  Extended offline    Completed without error       00%     16987         -                          
#12  Extended offline    Completed without error       00%     16244         -                          
#13  Extended offline    Completed without error       00%     15523         -                          
#14  Extended offline    Completed without error       00%     14780         -                          
#15  Extended offline    Completed without error       00%     14036         -                          
#16  Extended offline    Completed without error       00%     13319         -                          
#17  Extended offline    Completed without error       00%     12576         -                          
#18  Extended offline    Completed without error       00%     11852         -                          
#19  Extended offline    Completed without error       00%     11113         -                          
#20  Extended offline    Completed without error       00%     10441         -                          
#21  Extended offline    Completed without error       00%      9698         -                          
                                                                                                        
SMART Selective self-test log data structure revision number 1                                          
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS                                                            
    1        0        0  Not_testing                                                                    
    2        0        0  Not_testing                                                                    
    3        0        0  Not_testing                                                                    
    4        0        0  Not_testing                                                                    
    5        0        0  Not_testing                                                                    
Selective self-test flags (0x0):                                                                        
  After scanning selected spans, do NOT read-scan remainder of disk.                                    
If Selective self-test is pending on power-up, resume after 0 minute delay.                    
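The three outputs above point to a simple check: only the sat run reports "SMART support is: Available". Below is a hypothetical Python sketch that picks the first device type whose text output reports SMART as available. It illustrates the comparison only; it is not how Scrutiny detects devices, and it parses smartctl's human-readable text, whose layout may differ between smartctl versions.

```python
from typing import Dict, Optional


def smart_available(output: str) -> bool:
    """True if smartctl text output says "SMART support is: Available ..."."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "SMART support is":
            words = value.split()
            # Compare the first word exactly so "Unavailable" does not match.
            return bool(words) and words[0] == "Available"
    return False  # no such line at all, e.g. the failed "-d ata" run


def pick_device_type(outputs: Dict[str, str]) -> Optional[str]:
    """Return the first device type (in insertion order) whose output reports SMART."""
    for device_type, output in outputs.items():
        if smart_available(output):
            return device_type
    return None


# Excerpts of the three runs shown above.
outputs = {
    "scsi": "SMART support is:     Unavailable - device lacks SMART capability.",
    "ata": "Read Device Identity failed: Permission denied",
    "sat": "SMART support is: Available - device has SMART capability.\nSMART support is: Enabled",
}
print(pick_device_type(outputs))  # prints: sat
```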
@arnauos arnauos added the bug Something isn't working label Sep 28, 2020
@AnalogJ
Owner

AnalogJ commented Sep 28, 2020

Can you paste the output of smartctl --scan -j here?

@arnauos
Author

arnauos commented Sep 28, 2020

Sure:

root@a26318c5f903:/scrutiny# smartctl --scan -j                                                                          
{                                                                                                                        
  "json_format_version": [                                                                                               
    1,                                                                                                                   
    0                                                                                                                    
  ],                                                                                                                     
  "smartctl": {                                                                                                          
    "version": [                                                                                                         
      7,                                                                                                                 
      0                                                                                                                  
    ],                                                                                                                   
    "svn_revision": "4883",                                                                                              
    "platform_info": "x86_64-linux-3.10.105",                                                                            
    "build_info": "(local build)",                                                                                       
    "argv": [                                                                                                            
      "smartctl",                                                                                                        
      "--scan",                                                                                                          
      "-j"                                                                                                               
    ],                                                                                                                   
    "exit_status": 0                                                                                                     
  },                                                                                                                     
  "devices": [                                                                                                           
    {                                                                                                                    
      "name": "/dev/sda",                                                                                                
      "info_name": "/dev/sda",                                                                                           
      "type": "scsi",                                                                                                    
      "protocol": "SCSI"                                                                                                 
    },                                                                                                                   
    {                                                                                                                    
      "name": "/dev/sdb",                                                                                                
      "info_name": "/dev/sdb",                                                                                           
      "type": "scsi",                                                                                                    
      "protocol": "SCSI"                                                                                                 
    }                                                                                                                    
  ]                                                                                                                      
} 
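The scan output is where the wrong type originates: both disks are reported with "type": "scsi". A minimal Python sketch of reading that JSON, using only the fields visible above (the "devices" array and each entry's "name" and "type"); scanned_types is a hypothetical helper, not Scrutiny code.

```python
import json
from typing import Dict


def scanned_types(scan_json: str) -> Dict[str, str]:
    """Map each device name in `smartctl --scan -j` output to smartctl's guessed type."""
    data = json.loads(scan_json)
    return {dev["name"]: dev["type"] for dev in data.get("devices", [])}


# Trimmed copy of the scan output above.
scan = """
{
  "devices": [
    {"name": "/dev/sda", "info_name": "/dev/sda", "type": "scsi", "protocol": "SCSI"},
    {"name": "/dev/sdb", "info_name": "/dev/sdb", "type": "scsi", "protocol": "SCSI"}
  ]
}
"""
print(scanned_types(scan))  # prints: {'/dev/sda': 'scsi', '/dev/sdb': 'scsi'}
```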

@AnalogJ
Owner

AnalogJ commented Sep 29, 2020

Ok, so my problem is that Scrutiny offloads its device detection to smartctl, mostly so I don't have to deal with it. It seems like that's not returning the correct results for your system, so I'm going to need to add a way to override the device type on a device by device basis.

I have been thinking about that a bit, as you can see in the example.scrutiny.yaml file, but it's still just an idea.
Give me a couple of days to see what I can do here.
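A device-by-device override comes down to a merge in which the user's setting takes precedence over what smartctl detected. A minimal Python sketch, assuming a plain device-name-to-type mapping; the function and the shape of the overrides are hypothetical and do not reflect the config format that was eventually added to Scrutiny.

```python
from typing import Dict


def resolve_types(detected: Dict[str, str], overrides: Dict[str, str]) -> Dict[str, str]:
    """Combine smartctl's guesses with per-device overrides; an override always wins.

    Hypothetical helper. Overrides for devices the scan missed are added too,
    so a user could also list a disk that `smartctl --scan` did not report.
    """
    return {**detected, **overrides}


detected = {"/dev/sda": "scsi", "/dev/sdb": "scsi"}  # from `smartctl --scan -j`
overrides = {"/dev/sda": "sat", "/dev/sdb": "sat"}   # hypothetical user config
print(resolve_types(detected, overrides))  # prints: {'/dev/sda': 'sat', '/dev/sdb': 'sat'}
```

Merging rather than replacing keeps automatic detection for any disk the user does not mention.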

@AnalogJ AnalogJ changed the title [BUG] Synology support [BUG] Cannot override incorrect device type from smartctl --scan, breaks Synology Oct 4, 2020
@AnalogJ
Owner

AnalogJ commented Oct 8, 2020

Hey everyone,

I just released a beta version of the Scrutiny docker image with support for overriding the collector device detection.

The instructions for how to create the collector config file, and the new docker image tag are available in the PR description:

#88

All feedback (success & failure) is appreciated :)

@arnauos
Author

arnauos commented Oct 8, 2020

It works perfectly now! 👌


2 participants