rtlamr seems single threaded #30

kevarh · 2015-05-02T18:26:06Z

I'm running rtlamr on a four-core Odroid C1. It has just enough horsepower to work correctly, but I do get a fair number of ll+ messages from rtl_tcp, and a single core is pegged at 100% CPU utilization.

Is it possible to split decoding into goroutines to make better use of multiple cores?

$ uname -a
Linux odroid 3.10.75-84 #1 SMP PREEMPT Sat Apr 25 18:33:08 BRT 2015 armv7l armv7l armv7l GNU/Linux

$ go version
go version go1.4.2 linux/arm


$ go tool pprof `which rtlamr` rtlamr.prof 
Entering interactive mode (type "help" for commands)
(pprof) top20   
35.12s of 35.88s total (97.88%)
Dropped 70 nodes (cum <= 0.18s)
      flat  flat%   sum%        cum   cum%
     7.20s 20.07% 20.07%      7.20s 20.07%  github.com/bemasher/rtlamr/decode.Decoder.Filter
     6.78s 18.90% 38.96%      6.78s 18.90%  github.com/bemasher/rtlamr/decode.Decoder.Search
     6.78s 18.90% 57.86%      6.78s 18.90%  runtime.memmove
     5.40s 15.05% 72.91%      7.63s 21.27%  github.com/bemasher/rtlamr/decode.MagLUT.Execute
     3.65s 10.17% 83.08%      3.65s 10.17%  github.com/bemasher/rtlamr/decode.Decoder.Pack
     2.23s  6.22% 89.30%      2.23s  6.22%  math.Sqrt
     2.13s  5.94% 95.23%      2.13s  5.94%  github.com/bemasher/rtlamr/decode.Quantize
     0.78s  2.17% 97.41%      0.85s  2.37%  syscall.Syscall
     0.03s 0.084% 97.49%      7.66s 21.35%  github.com/bemasher/rtlamr/decode.(*MagLUT).Execute
     0.03s 0.084% 97.58%      0.24s  0.67%  github.com/bemasher/rtlamr/scm.(*Parser).Parse
     0.03s 0.084% 97.66%      1.02s  2.84%  net.(*conn).Read
     0.02s 0.056% 97.71%     34.20s 95.32%  github.com/bemasher/rtlamr/decode.Decoder.Decode
     0.02s 0.056% 97.77%     35.72s 99.55%  main.(*Receiver).Run
     0.02s 0.056% 97.83%      0.87s  2.42%  syscall.Read
     0.01s 0.028% 97.85%      0.21s  0.59%  github.com/bemasher/rtlamr/scm.Parser.Parse
     0.01s 0.028% 97.88%      0.85s  2.37%  syscall.read


bemasher · 2015-05-02T22:21:52Z

The main problem with building concurrency into rtlamr is that each stage depends on the result of the previous one, so the best we can do is pipelining. That doesn't necessarily help anyway, because each block of data must be processed entirely by one section of code before it can be handed to the next.

Splitting the blocks into smaller pieces would increase the granularity of execution and allow each section to spend more time processing and less time waiting. However, this would likely sacrifice throughput: most of the DSP-related tasks are optimized to the point that main memory access, rather than the CPU, is the bottleneck on most systems.

I'll do some testing to see if this helps on lower-power systems. I'm betting the only place we can add concurrency is splitting Pack and Search off from the rest of the execution.

bemasher · 2016-04-22T01:28:27Z

I've done some investigation into this, and there isn't much opportunity for performance improvement by separating portions of the signal chain into goroutines. There will be some benefit once multiple protocols per invocation are supported, but until then I'm closing this issue.

bemasher closed this as completed Apr 22, 2016
