rtlamr seems single threaded #30

Closed
kevarh opened this issue May 2, 2015 · 2 comments

@kevarh

kevarh commented May 2, 2015

I'm running rtlamr on a four-core Odroid C1. It has just enough horsepower to work correctly, but I do get a fair number of ll+ messages from rtl_tcp, and a single core is pegged at 100% CPU utilization.

Is it possible to split decode into goroutines to make better use of multiple cores?

$ uname -a
Linux odroid 3.10.75-84 #1 SMP PREEMPT Sat Apr 25 18:33:08 BRT 2015 armv7l armv7l armv7l GNU/Linux

$ go version
go version go1.4.2 linux/arm


$ go tool pprof `which rtlamr` rtlamr.prof 
Entering interactive mode (type "help" for commands)
(pprof) top20   
35.12s of 35.88s total (97.88%)
Dropped 70 nodes (cum <= 0.18s)
      flat  flat%   sum%        cum   cum%
     7.20s 20.07% 20.07%      7.20s 20.07%  github.com/bemasher/rtlamr/decode.Decoder.Filter
     6.78s 18.90% 38.96%      6.78s 18.90%  github.com/bemasher/rtlamr/decode.Decoder.Search
     6.78s 18.90% 57.86%      6.78s 18.90%  runtime.memmove
     5.40s 15.05% 72.91%      7.63s 21.27%  github.com/bemasher/rtlamr/decode.MagLUT.Execute
     3.65s 10.17% 83.08%      3.65s 10.17%  github.com/bemasher/rtlamr/decode.Decoder.Pack
     2.23s  6.22% 89.30%      2.23s  6.22%  math.Sqrt
     2.13s  5.94% 95.23%      2.13s  5.94%  github.com/bemasher/rtlamr/decode.Quantize
     0.78s  2.17% 97.41%      0.85s  2.37%  syscall.Syscall
     0.03s 0.084% 97.49%      7.66s 21.35%  github.com/bemasher/rtlamr/decode.(*MagLUT).Execute
     0.03s 0.084% 97.58%      0.24s  0.67%  github.com/bemasher/rtlamr/scm.(*Parser).Parse
     0.03s 0.084% 97.66%      1.02s  2.84%  net.(*conn).Read
     0.02s 0.056% 97.71%     34.20s 95.32%  github.com/bemasher/rtlamr/decode.Decoder.Decode
     0.02s 0.056% 97.77%     35.72s 99.55%  main.(*Receiver).Run
     0.02s 0.056% 97.83%      0.87s  2.42%  syscall.Read
     0.01s 0.028% 97.85%      0.21s  0.59%  github.com/bemasher/rtlamr/scm.Parser.Parse
     0.01s 0.028% 97.88%      0.85s  2.37%  syscall.read
@bemasher
Owner

bemasher commented May 2, 2015

The main problem with building concurrency into rtlamr is that each stage depends on the result of the previous one, so the best we can do is pipelining. That doesn't necessarily help anyway, because each block of data must be processed entirely by one stage before it can be handed to the next.

Splitting the blocks into smaller pieces would make execution finer-grained and let each stage spend more time processing and less time waiting, but it would likely sacrifice throughput: most of the DSP-related tasks are already optimized to the point that, on most systems, main memory access is the bottleneck rather than the CPU.

I'll do some testing to see whether this helps on lower-power systems. I'm betting the only place we can add concurrency is splitting Pack and Search off from the rest of the execution.

@bemasher
Owner

I've done some investigation into this, and there isn't much opportunity for performance improvement by separating portions of the signal chain into goroutines. There will be some benefit once multiple protocols per invocation are supported, but until then I'm closing this issue.
