Determine drive formatting #7

Closed
ctengel opened this issue May 28, 2022 · 13 comments
Owner

ctengel commented May 28, 2022

  • XFS
  • ext4
  • btrfs

Also need to address SMR and 4K-sector (Advanced Format) issues:

  • https://www.seagate.com/internal-hard-drives/cmr-smr-list/
    
  • https://dropbox.tech/infrastructure/smr-what-we-learned-in-our-first-year
    
  • https://en.wikipedia.org/wiki/Shingled_magnetic_recording
    
  • https://en.wikipedia.org/wiki/Advanced_Format
    

split from #2

@ctengel ctengel mentioned this issue May 28, 2022
5 tasks
@ctengel ctengel self-assigned this May 28, 2022

ctengel commented May 30, 2022

  • f2fs ?


ctengel commented May 30, 2022

Starting with ext4, since Raspbian supports it best natively and it may be safer on a cheap disk.


ctengel commented Jun 6, 2022

Reopening: we seem to be having issues even under minimal load.

https://unix.stackexchange.com/questions/541463/how-to-prevent-disk-i-o-timeouts-which-cause-disks-to-disconnect-and-data-corrup

[569725.161362] INFO: task kcompactd0:43 blocked for more than 120 seconds.
[569725.161404]       Tainted: G         C        5.15.32-v8+ #1538
[569725.161425] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[569725.161437] task:kcompactd0      state:D stack:    0 pid:   43 ppid:     2 flags:0x00000008
[569725.162117] INFO: task minio:2023 blocked for more than 120 seconds.
[569725.162131]       Tainted: G         C        5.15.32-v8+ #1538
[569725.162142] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[569725.162152] task:minio           state:D stack:    0 pid: 2023 ppid:  2007 flags:0x00000000
[569845.994043] INFO: task minio:2023 blocked for more than 241 seconds.
[569845.994053]       Tainted: G         C        5.15.32-v8+ #1538
[569845.994061] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[569845.994068] task:minio           state:D stack:    0 pid: 2023 ppid:  2007 flags:0x00000000
[569942.319122] sd 0:0:0:0: [sda] tag#0 timing out command, waited 360s
[569942.319154] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=360s
[569942.319168] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x2 [current] 
[569942.319180] sd 0:0:0:0: [sda] tag#0 ASC=0x4 ASCQ=0x7 
[569942.319194] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00
[569942.319216] blk_update_request: I/O error, dev sda, sector 4882841784 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[569942.319304] Aborting journal on device sda1-8.
[569942.742001] EXT4-fs error (device sda1): ext4_journal_check_start:83: comm kworker/u8:2: Detected aborted journal
[569942.780277] EXT4-fs (sda1): Remounting filesystem read-only
[569942.780340] EXT4-fs (sda1): failed to convert unwritten extents to written extents -- potential data loss!  (inode 57671710, error -30)
API: SYSTEM()
Time: 12:44:23 UTC 06/06/2022
DeploymentID: bf704ef6-ddf4-4f56-a424-6a825e57a021
Error: remove /mnt/obj1data/.minio.sys/tmp/0024d34a-d4df-4cfd-a7b8-45c125581015/7832be02-2c55-409b-88f9-b00138cc6c19: read-only file system (*fs.PathError)
       8: internal/logger/logger.go:278:logger.LogIf()
       7: cmd/fs-v1-helpers.go:50:cmd.fsRemoveFile()
       6: cmd/fs-v1.go:1125:cmd.(*FSObjects).putObject()
       5: cmd/fs-v1.go:1012:cmd.(*FSObjects).PutObject()
       4: cmd/data-usage-cache.go:940:cmd.(*dataUsageCache).save()
       3: cmd/fs-v1.go:322:cmd.(*FSObjects).NSScanner()
       2: cmd/data-scanner.go:221:cmd.runDataScanner()
       1: cmd/data-scanner.go:80:cmd.initDataScanner.func1()

Note: there are no errors in the minio log until the filesystem goes read-only, so changing the timeout may be OK.
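
For picking the relevant lines out of a long kernel log like the one above, a small filter can help. This is only a sketch: the function names `hung_tasks` and `disk_errors` are made up for illustration, and the patterns are based solely on the message formats shown in this log, so other kernels or drivers may word things differently.

```shell
#!/bin/sh
# Read dmesg-style text on stdin and print one "task:pid seconds" line
# for every hung-task report found.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\) blocked for more than \([0-9][0-9]*\) seconds\..*/\1 \2/p'
}

# Read dmesg-style text on stdin and keep only the lines that point at
# the disk itself: command timeouts, I/O errors, journal aborts and
# read-only remounts.
disk_errors() {
    grep -E 'I/O error|timing out command|Aborting journal|Remounting filesystem read-only'
}
```

Typical use would be `dmesg | hung_tasks` and `dmesg | disk_errors` (some systems need `sudo` for `dmesg`).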

@ctengel ctengel reopened this Jun 6, 2022
@ctengel ctengel added this to the Basic DB with POSIX client milestone Jun 6, 2022
@ctengel ctengel pinned this issue Jun 6, 2022

ctengel commented Jun 6, 2022

Related to active USB SSD?


ctengel commented Jun 8, 2022

test fsck and minio restart?

look at SMART
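
A rough sketch of a SMART health check, assuming smartmontools is installed. The helper name `smart_passed` is hypothetical. It only looks for the two overall-health summary lines that `smartctl -H` prints for ATA and SCSI devices, and it does not inspect individual attributes.

```shell
#!/bin/sh
# Read `smartctl -H` output on stdin and succeed only if the drive
# reports an overall healthy status (ATA or SCSI wording).
smart_passed() {
    grep -Eq 'test result: PASSED|SMART Health Status: OK'
}

# Usage (needs root; USB-SATA bridges may need the `-d sat` option):
#   sudo smartctl -H /dev/sda | smart_passed && echo "sda looks healthy"
```

`sudo smartctl -a /dev/sda` prints the full attribute table when more detail is needed.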


ctengel commented Jun 11, 2022

$ echo 1440 | sudo tee /sys/block/sda/device/timeout
$ echo 720 | sudo tee /sys/block/sda/device/eh_timeout
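
Values written to sysfs like this are lost on reboot. One way to reapply them is a small helper called from a root boot-time script. The sketch below wraps the same two writes; the function name `set_disk_timeouts` and the optional fourth argument (an alternate sysfs root, there only so the function can be tried against a scratch directory) are assumptions for illustration.

```shell
#!/bin/sh
# Write the SCSI command timeout and error-handler timeout (seconds)
# for one disk. Arguments: device name, timeout, eh_timeout, and an
# optional sysfs root (defaults to /sys).
set_disk_timeouts() {
    dev=$1 timeout=$2 eh_timeout=$3 sysroot=${4:-/sys}
    devdir="$sysroot/block/$dev/device"
    # Refuse to continue if the device directory does not exist.
    [ -d "$devdir" ] || { echo "no such device: $dev" >&2; return 1; }
    echo "$timeout" > "$devdir/timeout" &&
    echo "$eh_timeout" > "$devdir/eh_timeout"
}

# Usage (as root): set_disk_timeouts sda 1440 720
```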


ctengel commented Jun 11, 2022

Uh oh, it still seems to be a mess, and ctengel/linkmeddle#74 may have made it worse...


ctengel commented Jun 12, 2022

NOTE - not permanent yet

Splitting these seems to be the way to do it


ctengel commented Jun 16, 2022

Power draw seems to be a constant issue!

$ sudo parted /dev/sdb
GNU Parted 3.4
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) help                                                             
  align-check TYPE N                       check partition N for TYPE(min|opt) alignment
  help [COMMAND]                           print general help, or help on COMMAND
  mklabel,mktable LABEL-TYPE               create a new disklabel (partition table)
  mkpart PART-TYPE [FS-TYPE] START END     make a partition
  name NUMBER NAME                         name partition NUMBER as NAME
  print [devices|free|list,all|NUMBER]     display the partition table, available devices, free space, all found partitions, or a particular partition
  quit                                     exit program
  rescue START END                         rescue a lost partition near START and END
  resizepart NUMBER END                    resize partition NUMBER
  rm NUMBER                                delete partition NUMBER
  select DEVICE                            choose the device to edit
  disk_set FLAG STATE                      change the FLAG on selected device
  disk_toggle [FLAG]                       toggle the state of FLAG on selected device
  set NUMBER FLAG STATE                    change the FLAG on partition NUMBER
  toggle [NUMBER [FLAG]]                   toggle the state of FLAG on partition NUMBER
  unit UNIT                                set the default unit to UNIT
  version                                  display the version number and copyright information of GNU Parted
(parted) mklabel                                                          
New disk label type? gpt
Warning: The existing disk label on /dev/sdb will be destroyed and all data on this disk will be lost. Do you want to continue?
Yes/No? y                                                                 
(parted) mkpart                                                           
Partition name?  []? 
File system type?  [ext2]? ext4                                           
Start? 1                                                                  
End? 100%                                                                 
(parted) print                                                            
Model: ...
Disk /dev/sdb: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name          Flags
 1      1049kB  2000GB  2000GB  ext4         ...

(parted) quit                                         
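
Since 4K sectors were one of the original concerns, here is a quick sketch for checking that the new partition is aligned. The session above shows the partition starting at 1049kB (1 MiB), which is a multiple of 4096 bytes, so it should be fine even if the enclosure is wrong in reporting 512B physical sectors. The helper name `is_aligned` is made up. It relies on sysfs reporting partition start offsets in 512-byte units.

```shell
#!/bin/sh
# Succeed if a partition start is a multiple of the physical block size.
# $1: start offset in 512-byte sectors (as in /sys/block/<disk>/<part>/start)
# $2: physical block size in bytes
is_aligned() {
    start_sectors=$1 phys_bytes=$2
    [ $(( start_sectors * 512 % phys_bytes )) -eq 0 ]
}

# Usage against the real disk:
#   is_aligned "$(cat /sys/block/sdb/sdb1/start)" \
#              "$(cat /sys/block/sdb/queue/physical_block_size)" && echo aligned
# Worst-case check for a 4K drive behind a bridge that reports 512B:
#   is_aligned "$(cat /sys/block/sdb/sdb1/start)" 4096 && echo "aligned for 4K"
```

parted's own `align-check opt 1`, listed in the help output above, gives a similar answer based on what the device reports.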


ctengel commented Jun 22, 2022

SMART is handy!


ctengel commented Nov 6, 2022

More problems with minio....

Upon upgrading...

$ ./mc update

 You are running an older version of mc released 4 months ago
 Update: https://dl.min.io/client/mc/release/linux-arm64/archive/mc.RELEASE.2022-10-29T10-09-23Z


mc 22.94 MiB / 22.94 MiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 100.00% 304.35 KiB/s 1m17s
mc updated to version RELEASE.2022-10-29T10-09-23Z successfully.
$ ./mc admin update xyz/
Server `xyz/` updated successfully from 2022-06-11T19:55:32Z to 2022-10-29T06-21-33Z
$ ./start.sh

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ You are running an older version of MinIO released 2 weeks ago ┃
┃ Update: Run `mc admin update`                                  ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

API: http://0.0.0.0:9000 
RootUser: minio 
RootPass: xyz 

Console: http://0.0.0.0:9001 
RootUser: minio 
RootPass: xyz 

Command-line: https://docs.min.io/docs/minio-client-quickstart-guide
   $ mc alias set myminio http://0.0.0.0:9000 minio xyz

Documentation: https://docs.min.io
Finished loading IAM sub-system (took 0.1s of 0.1s to load data).
Restarting on service signal
Automatically configured API requests per node based on available memory on the system: 16
API: http://0.0.0.0:9000 
RootUser: minio 
RootPass: xyz 

Console: http://0.0.0.0:9001 
RootUser: minio 
RootPass: xyz 

Command-line: https://docs.min.io/docs/minio-client-quickstart-guide
   $ mc alias set myminio http://0.0.0.0:9000 minio xyz

Documentation: https://docs.min.io
Finished loading IAM sub-system (took 0.1s of 0.4s to load data).
Restarting on service signal
ERROR Unable to use the drive /mnt/xyz: Drive /mnt/xyz: found backend type fs, expected xl or xl-single: Invalid arguments specified
$ ./start.sh
ERROR Unable to use the drive /mnt/xyz: Drive /mnt/xyz: found backend type fs, expected xl or xl-single: Invalid arguments specified

This seems related to minio/minio#14331, which is confusing at a glance because the title mentions gateway and the single-drive change is buried deeper in the discussion. minio/docs#624 may add more info.

See also minio/minio#15967. We don't have much data yet, but it raises some questions:

  • Will our ext4 strategy still work, or do we need to move to XFS for the new backend type?
  • Does this add complexity in a data recovery scenario?
  • This is the second major issue where an intuitive operation caused major problems: the first was go get/install not working, the second is a regular upgrade. Is this sustainable?

The fix is likely to reformat and restart
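
Before reformatting, it may be worth confirming which backend type the drive actually records. The sketch below assumes MinIO keeps this in a `format` field inside `.minio.sys/format.json` at the drive root. That location and field name are from memory and not confirmed anywhere in this thread, so treat them as an assumption. The function name `minio_backend_type` is also made up.

```shell
#!/bin/sh
# Print the backend type recorded for a MinIO drive directory,
# e.g. "fs", "xl" or "xl-single".
# Assumes the metadata lives at <drive>/.minio.sys/format.json.
minio_backend_type() {
    fmt="$1/.minio.sys/format.json"
    [ -r "$fmt" ] || { echo "cannot read $fmt" >&2; return 1; }
    # Pull out the value of the first "format" key without needing jq.
    sed -n 's/.*"format"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$fmt" | head -n 1
}

# Usage: minio_backend_type /mnt/xyz
```

If this prints `fs`, it matches the error above, and the data would need to be copied off before the drive is reinitialized.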

In other news

  • need to document upgrade resolution
  • need to document use of SMART
  • need to document partitioning and formatting
  • need to document power issues with Raspberry Pi
  • need to document Linux tuning


ctengel commented Nov 7, 2022

Also, here is start.sh:

#!/bin/bash
MINIO_ROOT_USER=minio MINIO_ROOT_PASSWORD=xyz exec /home/minio/minio server /mnt/xyz --address 0.0.0.0:9000 --console-address 0.0.0.0:9001

And the commands from linkmeddle#74:

$ sudo mkfs.ext4 /dev/sda1
$ chown minio:minio /mnt/obj1data

Update system how?

@ctengel ctengel closed this as completed in 4506d53 Nov 7, 2022
@ctengel ctengel mentioned this issue Nov 7, 2022
3 tasks
@ctengel ctengel unpinned this issue Jun 10, 2023