Fix/ub-600 : sg_inq is skipped if device is faulty #250

olgashtivelman · 2018-10-15T13:13:26Z

in case a device is faulty we identify the situation and skip sg_inq + sontinue to to do the rest of the umount flow as needed.
this is done to prevent a situation where if a device has been faulty for a while and we try to umount it (~6 minutes) the sg_inq command usually gets stuck.
this should work for both ubuntu + red hat (the multipath output is a bit differente between them)

This change is

coveralls · 2018-10-16T21:32:23Z

Coverage increased (+1.3%) to 57.742% when pulling 9b2f077 on fix/UB-600_remove_faulty_device_with_no_sginq into 1c0104b on dev.

shay-berman

Reviewable status: 0 of 7 files reviewed, 12 unresolved discussions (waiting on @olgashtivelman and @shay-berman)

remote/mounter/block_device_utils/block_device_utils_test.go, line 436 at r2 (raw file):

		It("should return error if device is faulty", func() {
			dev := "mpathhe"
			mpath := `mpathhe (36001738cfc9035eb0000000000cea5f6) dm-3 IBM     ,2810XIV

please add more UTs for additional output options like:

with ubuntu output as well
with different output for "inactive faulty offline"
with all physical devices like this "33:0:0:1 sdb 8:16 ## ## ##"
Note: i am not sure you have the right indintation of multipathing output (e.g: the physical devices line should goes one indintation to the policy line I think. I am not saying its a problem but just validate that unit testing using the current output)

remote/mounter/block_device_utils/block_device_utils_test.go, line 446 at r2 (raw file):

			Expect(fakeExec.ExecuteWithTimeoutCallCount()).To(Equal(0))
		})
		It("should not return an error if check for faulty device failed", func() {

i am not sure how the test description related to the test it self.
it looks like the test check that Getwwn function return error.
so please explain or fix the test description

remote/mounter/block_device_utils/block_device_utils_test.go, line 454 at r2 (raw file):

							- 34:0:0:1 sdc 8:32 fault faulty running`
			_, err := bdUtils.GetWwnByScsiInq(mpath, dev)
			Expect(err).To(HaveOccurred())

what about adding more unit testing that the multipath output contain few faulty but at least 1 in running state - lets make sure it works ok after your fix.

remote/mounter/block_device_utils/errors.go, line 89 at r2 (raw file):

}

func (e *DeviceNotFoundError) Error() string {

i think there is VolumeNotFoundError as well in this file. i wonder if its different and how?
if u really need additional error that talk about device not found in multipath, maybe ass the word Multipath tot eh error struct.

remote/mounter/block_device_utils/mpath.go, line 105 at r2 (raw file):

				return mpath, b.logger.ErrorRet(err, "failed")
			default:
				b.logger.Error("Failed to run multipath command while executing sg_inq.", logs.Args{{"err", err}})

maybe if u can please fix this text. "Failed to run sg_inq on multipath device " + mpath

remote/mounter/block_device_utils/mpath.go, line 158 at r2 (raw file):

}

func (b *blockDeviceUtils) GetWwnByScsiInq(mpathOutput string, dev string) (string, error) {

Olga, a suggestion
its not really mandatory to pass the mpathoputput to this function, instead you can had a dedicated function that check if device is faulty(which u already have) and then just call it before calling GetWwnByScsiInq.
up to you if you want to change it

remote/mounter/block_device_utils/mpath.go, line 201 at r2 (raw file):

		// we should not get here since we get the device from the multipath output so there is not reason for it to be missing
		// but in case something weird occurs we need to continue to not hurt the current flow.
		b.logger.Warning("an error occured while trying to check if device is faulty.", logs.Args{{"err", err}, {"device", dev}})

so what does it mean?
that if its happened, then isFaulty == true? does it make sense? can we just return error here - and since we idempotent then it will retry it again. what do you think?

remote/mounter/block_device_utils/mpath.go, line 204 at r2 (raw file):

	}
	if isFaulty {
		return "", b.logger.ErrorRet(&FaultyDeviceError{dev}, "Device is faulty. sg_inq will not run on a faulty device.")

text fix : just change the text a bit - ....("The device is faulty. Therefore skipping sg_inq check on this device " + dev)

remote/mounter/block_device_utils/utils.go, line 16 at r2 (raw file):

 * limitations under the License.
 */
package block_device_utils

we should carefull about adding utils.go file, because it just a generic name.
Instead call it multipath_utils.go
you can also consider to add it inside the mpath.go (i know its not part of the interface, but its related to multipath. anyway if not inside mpath.go then rename it to multipath_utils.go)

remote/mounter/block_device_utils/utils.go, line 26 at r2 (raw file):

)

func checkIsFaulty(mpath string, logger logs.Logger) bool {

if its the multipath output then please rename the param to multipathOutput so it will be self explanatory

about the regex "(\d)+:(\d)+:(\d)+:(\d)+.[failed|active]+."
ok i guess the \d\d\d\d is to catch HCTL (which is the [host|connectivity|target|lun] of the scsi device). But why double \? is it ok?

please if u can add a comment with the multiapth output so it will be easy to understand what it looking for.
and add a command that your regex catch all the physical lines of the multipath device by using HCTL as ancore to catch the device. so it will be clearer.

remote/mounter/block_device_utils/utils.go, line 28 at r2 (raw file):

func checkIsFaulty(mpath string, logger logs.Logger) bool {
	re := regexp.MustCompile("(\\d)+:(\\d)+:(\\d)+:(\\d)+.*[failed|active]+.*")
	paths := re.FindAllString(mpath, -1)

BTW is this works only on a specific multipath device output? just double checking.

remote/mounter/block_device_utils/utils.go, line 35 at r2 (raw file):

	logger.Debug(fmt.Sprintf("Device paths are : [%s]", paths))
	isFaulty := true
	activeRe := regexp.MustCompile("active.*ready.*running.*")

you can change .* to .+ because for sure there is a char between them.
also add a comment that say - that this regex is to find a real and working physical devices of a multipath device. and "active ready running" indicate about it

remote/mounter/block_device_utils/utils.go, line 37 at r2 (raw file):

	activeRe := regexp.MustCompile("active.*ready.*running.*")
	for _, path := range paths {
		logger.Debug(fmt.Sprintf("path : [%s] contains faulty? %t", path, strings.Contains(path, "faulty")))

maybe you can change it to warning - since its more then debug level.

shay-berman

Reviewable status: 0 of 7 files reviewed, 17 unresolved discussions (waiting on @olgashtivelman)

remote/mounter/block_device_utils/utils_test.go, line 50 at r2 (raw file):

			Expect(isFaulty).To(Equal(true))
		})
		It("returns true on a device with no paths", func() {

please add another UT with stange output like
|- 33:0:0:1 sdb 8:16 ## ## ##
- 34:0:0:1 sdc 8:32 ## ## ##`

remote/mounter/block_device_utils/utils_test.go, line 78 at r2 (raw file):

			})
			It("return error for one device which does not exists", func() {
				mapth := `mpathb (36001738cfc9035eb0000000000d0ec0e) dm-3 IBM     ,2810XIV

please add to the output another device (so u will have 2 devices in the output) like the above one.
so it will cover more.

remote/mounter/block_device_utils/utils_test.go, line 126 at r2 (raw file):

					Expect(result).To(Equal(output))
				})
				It("return right mpath for second device in multiple device scenario", func() {

what do you mean second device? clarify the test case here please.

remote/mounter/block_device_utils/utils_test.go, line 134 at r2 (raw file):

					device := "mpathc"
					result, err := findDeviceMpathOutput(mpathOutput, device, logs.GetLogger())
					fmt.Println(output)

is it correct to write directly to stdout? not sure

remote/mounter/block_device_utils/utils_test.go, line 156 at r2 (raw file):

					Expect(result).To(Equal(output))
				})
				It("return error fif device does not exist", func() {

fif

remote/mounter/block_device_utils/utils_test.go, line 310 at r2 (raw file):

})

func TestBlockDeviceUtilsUtils(t *testing.T) {

VERY well done
great unit testing file!

olgashtivelman

Reviewable status: 0 of 7 files reviewed, 17 unresolved discussions (waiting on @shay-berman and @olgashtivelman)

remote/mounter/block_device_utils/block_device_utils_test.go, line 436 at r2 (raw file):