Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use table to accelerate bsr calculate #34

Open
zdyj3170101136 opened this issue May 19, 2021 · 12 comments
Open

use table to accelerate bsr calculate #34

zdyj3170101136 opened this issue May 19, 2021 · 12 comments
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@zdyj3170101136
Copy link

在计算bsr的时候,某些操作系统上使用一种for循环的方式计算,时间复杂度是O(n),n是这个数第从小到大最大的为1的位。
经过golang sdk库math/bits/bits.go/Len函数的启发,我想出了一种算法,具有O(1)的复杂度, 并且计算时间稳定。
并且针对64位操作系统和32位操作系统有不同的优化。
几乎减少了百分之九十的计算时间。

@zdyj3170101136
Copy link
Author

性能测试的代码

// Exported (global) variable serving as input for some
// of the benchmarks to ensure side-effect free calls
// are not optimized away.
var Input uint64 = 285870213051353865

// Exported (global) variable to store function results
// during benchmarking to ensure side-effect free calls
// are not optimized away.
var Output int

func BenchmarkLeadingZeros(b *testing.B) {
   var s int
   for i := 0; i < b.N; i++ {
      s += bsr2(uint(Input) >> (uint(i) % bits.UintSize))
   }
   Output = s
}

@zdyj3170101136
Copy link
Author

用于对比的代码

func bsr(x uint) int {
	r := 0
	for x != 0 {
		x = x >> 1
		r += 1
	}
	return r - 1
}

@zdyj3170101136
Copy link
Author

我写的代码


func bsr2(x uint) int {
	if bits.UintSize == 32 {
		return Len32(uint32(x))
	}
	return Len64(uint64(x))
}

// Len32 returns the minimum number of bits required to represent x; the result is 0 for x == 0.
func Len32(x uint32) (n int) {
	if x >= 1<<16 {
		x >>= 16
		n = 16
	}
	if x >= 1<<8 {
		x >>= 8
		n += 8
	}
	return n + int(bsr8tab[x])
}

// Len64 returns the minimum number of bits required to represent x; the result is 0 for x == 0.
func Len64(x uint64) (n int) {
	if x >= 1<<32 {
		x >>= 32
		n = 32
	}
	if x >= 1<<16 {
		x >>= 16
		n += 16
	}
	if x >= 1<<8 {
		x >>= 8
		n += 8
	}
	return n + int(bsr8tab[x])
}

var bsr8tab = [256]uint8{
	0x00, 0x00, 0x01, 0x01, 0x02, 0x02, 0x02, 0x02, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
	0x03, 0x03, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
	0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
	0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
	0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
	0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
	0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
	0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
	0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
	0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
	0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
	0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
	0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
	0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
	0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
	0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
}

@zdyj3170101136
Copy link
Author

性能测试结果对比

goos: darwin
goarch: amd64
pkg: code.byted.org/middleware/mcache
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkLeadingZeros
BenchmarkLeadingZeros-12    	610674840	         1.927 ns/op
goos: darwin
goarch: amd64
pkg: code.byted.org/middleware/mcache
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkLeadingZeros
BenchmarkLeadingZeros-12    	98333178	        12.16 ns/op

@zdyj3170101136
Copy link
Author

证明结果无误的例子

func TestBsr(t *testing.T) {
	assert.Equal(t, bsr(4), bsr2(4), 2)
	assert.Equal(t, bsr(24), bsr2(24), 4)
	assert.Equal(t, bsr((1<<10)-1), bsr2((1<<10)-1), 9)
	assert.Equal(t, bsr((1<<30)+(1<<19)+(1<<16)+(1<<1)), bsr2((1<<30)+(1<<19)+(1<<16)+(1<<1)), 30)
}

/usr/local/go/bin/go tool test2json -t /private/var/folders/41/3lxydnv150369kfnhqjbr2j40000gn/T/___TestBsr_in_code_byted_org_middleware_mcache.test -test.v -test.paniconexit0 -test.run ^\QTestBsr\E$
=== RUN TestBsr
--- PASS: TestBsr (0.00s)
PASS

@PureWhiteWu PureWhiteWu added enhancement New feature or request good first issue Good for newcomers labels May 19, 2021
@PureWhiteWu
Copy link
Collaborator

要不提个 pr,顺便,Len32 和 Len64 我觉得可以不用暴露出去,改为小写?

@jxskiss
Copy link

jxskiss commented Jun 6, 2021

Len32 和 Len64 合并成单个函数是不是就可以呀,
if bits.UintSize == 32 是常量比较,我理解编译器可以优化掉,所以单个函数也不会增加额外的计算成本吧。

@jxskiss
Copy link

jxskiss commented Jun 7, 2021

个人看法,bits.Len32 和 bits.Len64 这些标准库的函数没必要复制一份?
bits.Len64(x) - 1 跟上面的代码是等价的吧。

@zdyj3170101136
Copy link
Author

个人看法,bits.Len32 和 bits.Len64 这些标准库的函数没必要复制一份?
bits.Len64(x) - 1 跟上面的代码是等价的吧。

是的。
我是写完之后才意识到这一点。

@luchsh
Copy link
Collaborator

luchsh commented Jun 24, 2021

并且针对64位操作系统和32位操作系统有不同的优化。

能贴一下你发现的是哪个库会用到吗?

@zdyj3170101136
Copy link
Author

zdyj3170101136 commented Jun 30, 2021 via email

@zdyj3170101136
Copy link
Author

zdyj3170101136 commented Jul 1, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

5 participants