Reduce CPU load #21

Open
mihaigalos opened this issue Jun 10, 2022 · 26 comments

Comments

@mihaigalos

mihaigalos commented Jun 10, 2022

Hi,
Awesome stuff here!

Did you consider an option for reducing the CPU load?
I experimented a bit with removing the rng calls and noticed performance improvements.
How about creating a vector of possible values (everything that is currently random) and looping through it?

The user doesn't really care whether a parameter is truly random; they won't notice. They do care about CPU consumption, however. 😅
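A minimal sketch of the "vector of possible values" idea, assuming the values are generated once at startup and then cycled through on every drop reset. ValuePool and prefill are hypothetical names, not part of the project, and a small xorshift generator stands in for rand's thread_rng so the example needs only the standard library:

```rust
/// Hypothetical pool of pre-generated values that is cycled through
/// instead of calling the RNG for every drop reset.
pub struct ValuePool<T> {
    values: Vec<T>,
    next: usize,
}

impl<T: Copy> ValuePool<T> {
    pub fn new(values: Vec<T>) -> Self {
        assert!(!values.is_empty(), "pool needs at least one value");
        Self { values, next: 0 }
    }

    /// Return the next value, wrapping around at the end of the pool.
    pub fn next_value(&mut self) -> T {
        let v = self.values[self.next];
        self.next = (self.next + 1) % self.values.len();
        v
    }
}

/// Fill a pool once at startup with values in lo..hi. An xorshift64
/// generator stands in for the real RNG (assumption for this sketch).
pub fn prefill(len: usize, lo: u64, hi: u64, mut seed: u64) -> Vec<u64> {
    assert!(lo < hi && seed != 0, "need a non-empty range and a non-zero seed");
    (0..len)
        .map(|_| {
            seed ^= seed << 13;
            seed ^= seed >> 7;
            seed ^= seed << 17;
            lo + seed % (hi - lo)
        })
        .collect()
}
```

In reset, something like Duration::from_millis(speed_pool.next_value()) (speed_pool being a hypothetical pool filled from us.speed) could replace the per-drop gen_range call. As the profiling later in this thread shows, though, the RNG is probably not where most of the time goes.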

Here's what I tried; let me know if you can reproduce it:

diff --git a/src/update.rs b/src/update.rs
index 45e68eb..e39b3e7 100644
--- a/src/update.rs
+++ b/src/update.rs
@@ -18,15 +18,14 @@ pub fn reset<F>(create_color: F, rain: &mut Rain, us: &UserSettings)
 where
     F: Fn(style::Color, style::Color, u8) -> Vec<style::Color>,
 {
-    let mut rng = thread_rng();
     let h16 = rain.height;
     let hsize = rain.height as usize;
     let now = Instant::now();
     for i in rain.queue.iter() {
         if rain.locations[*i] > hsize + rain.length[*i] {
-            rain.charaters[*i] = gen::create_drop_chars(h16, &us.group);
+            rain.charaters[*i] = vec!['a', 'b', 'c'];
             rain.locations[*i] = 0;
-            rain.length[*i] = rng.gen_range(4..hsize - 10);
+            rain.length[*i] = 10;
             rain.colors[*i] = create_color(
                 us.rain_color.into(),
                 us.head_color.into(),
@@ -34,7 +33,7 @@ where
             );
             rain.time[*i] = (
                 now,
-                Duration::from_millis(rng.gen_range(us.speed.0..us.speed.1)),
+                Duration::from_millis(10),
             );
         }
     }
@cowboy8625
Owner

Thanks for the interest in the project!

I haven't done any CPU usage tests in a while, but the last time I did, it used 1-5% CPU on a 200-character-wide terminal.

I seem to have misplaced the notes on my testing setup, since that was over a year and a few machines ago.
How are you testing this?

Regardless, I do see your point, and this would probably be a nice change.

@mihaigalos
Author

mihaigalos commented Jun 10, 2022

> How are you testing this?

Just a plain cargo run --; not sure how relevant that is. I see 6-9% CPU usage on my old i5 (details below).
Maybe I'm biased, but I find that a bit much for something that prints characters to the terminal.

CPU stats:
$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              2
Core(s) per socket:              2
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           60
Model name:                      Intel(R) Core(TM) i5-4570T CPU @ 2.90GHz
Stepping:                        3
CPU MHz:                         1200.000
CPU max MHz:                     2900.0000
CPU min MHz:                     800.0000
BogoMIPS:                        5786.64
Virtualization:                  VT-x
L1d cache:                       64 KiB
L1i cache:                       64 KiB
L2 cache:                        512 KiB
L3 cache:                        4 MiB
NUMA node0 CPU(s):               0-3
Vulnerability Itlb multihit:     KVM: Mitigation: VMX disabled
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds:             Mitigation; Microcode
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs
                                 bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt
                                 tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1
                                 avx2 smep bmi2 erms invpcid xsaveopt dtherm arat pln pts md_clear flush_l1d

@mihaigalos
Author

I was curious, so I profiled the release binary. I don't see anything very strange, and certainly nothing related to rng:
[Screenshot: callgrind profile of the release binary]
callgrind_traces.tar.gz

@mihaigalos
Author

I see you were already on the right track here; draw is taking 93% of all CPU cycles:
[Screenshot: profile showing draw at 93% of CPU cycles]

@cowboy8625
Owner

Yeah, I have been meaning to work on the IO part more.

@cowboy8625
Owner

I would have to do more research to be sure, but if I remember correctly, when queueing up write calls for the terminal, flush is called prematurely if the queue exceeds a certain size. I believe there is a way to extend the buffer size, but I am not sure that would help even if I knew how to do it.
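For what it's worth, Rust's std::io::Stdout is line-buffered, so every newline written to it can become a separate write to the terminal. One way to "extend the buffer size" is to wrap the output in std::io::BufWriter::with_capacity, sized to hold a whole frame, and flush once per frame. The sketch below assumes that approach; draw_frame and CountingSink are hypothetical, and it writes raw ANSI bytes into a counting writer instead of using crossterm so it runs with only the standard library. In the real program the sink would be something like stdout().lock(), and crossterm's queue! macro can write into the BufWriter because it accepts any std::io::Write:

```rust
use std::io::{self, BufWriter, Write};

/// Stand-in for the terminal: records how many write calls reach it.
#[derive(Default)]
pub struct CountingSink {
    pub write_calls: usize,
    pub bytes: Vec<u8>,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.write_calls += 1;
        self.bytes.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Queue every character of a frame into a buffer large enough for the
/// whole frame, then flush once so the sink sees a single large write.
pub fn draw_frame<W: Write>(sink: W, rows: &[String]) -> io::Result<W> {
    // Rough capacity guess: frame bytes plus slack for escape codes.
    let capacity = rows.iter().map(|r| r.len() + 16).sum::<usize>() + 16;
    let mut out = BufWriter::with_capacity(capacity, sink);
    write!(out, "\x1b[H")?; // ANSI: move cursor to the top-left corner
    for row in rows {
        for ch in row.chars() {
            write!(out, "{}", ch)?; // many tiny writes, all absorbed by the buffer
        }
        write!(out, "\r\n")?;
    }
    out.flush()?; // one write of the whole frame reaches the sink
    out.into_inner().map_err(io::Error::from)
}
```

With a 213x57 frame of one-byte glyphs (the terminal size reported later in this thread), the sink in this sketch sees one write call instead of one per write! call; whether that translates into a measurable CPU drop in the terminal is something to confirm by profiling.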

@cowboy8625
Owner

> I see you were already on the right track here; draw is taking 93% of all CPU cycles:

I used flame to profile last time, but ultimately came to the same conclusion.

@mihaigalos
Author

> I would have to do more research to be sure, but if I remember correctly, when queueing up write calls for the terminal, flush is called prematurely if the queue exceeds a certain size.

Yes:

> The terminal will flush automatically if the buffer is full.

@mihaigalos
Author

Potentially relevant: this.

@mihaigalos
Author

Would you consider using a different crate for outputting to the screen?
termatrix, which uses termion, is nowhere near as cool as your take, but it uses <2% CPU (a 4-5x improvement!).

@cowboy8625
Copy link
Owner

cowboy8625 commented Jun 10, 2022

I don't believe termion is cross-platform, but even if it is, I do not want to swap out crates; there are other ways of getting performance out of crossterm.

Rather than printing each character, which can be a lot if you're using the shading flag, printing each row or the whole screen at once could improve performance.
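A rough sketch of printing a row (or the whole screen) as one string. It assumes each cell is a (glyph, 256-color index) pair, a hypothetical representation rather than the project's own type, and emits a color escape only when the color changes from the previous cell, which saves bytes whenever neighboring cells share a color. build_row and build_screen are illustrative names:

```rust
use std::fmt::Write as _;

/// Build one terminal row as a single string, emitting a 256-color SGR
/// foreground escape only when the color differs from the previous cell.
pub fn build_row(cells: &[(char, u8)]) -> String {
    let mut row = String::with_capacity(cells.len() * 2);
    let mut current: Option<u8> = None;
    for &(ch, color) in cells {
        if current != Some(color) {
            // ESC [ 38 ; 5 ; n m selects foreground color n from the 256-color palette
            let _ = write!(row, "\x1b[38;5;{}m", color);
            current = Some(color);
        }
        row.push(ch);
    }
    row.push_str("\x1b[0m"); // reset attributes at the end of the row
    row
}

/// Join all rows into one screen-sized string, starting at the top-left corner.
pub fn build_screen(rows: &[Vec<(char, u8)>]) -> String {
    let mut screen = String::from("\x1b[H"); // cursor to top-left
    for (i, r) in rows.iter().enumerate() {
        if i > 0 {
            screen.push_str("\r\n");
        }
        screen.push_str(&build_row(r));
    }
    screen
}
```

The returned string could then be written with a single write_all and flush per frame. Queueing crossterm's SetForegroundColor and Print commands into a buffered writer would be the equivalent without hand-written escape codes.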

@cowboy8625
Copy link
Owner

I will work on reducing RNG calls as well; even though it's probably not a major contributor to the performance issue, every little bit helps.

@mihaigalos
Copy link
Author

> Rather than printing each character, which can be a lot if you're using the shading flag, printing each row or the whole screen at once could improve performance.

That would be great. I noticed the X server is also very busy redrawing the screen (>50% CPU), so perhaps redrawing everything at once would reduce events to it as well!
[Screenshot: CPU usage of the X server]

@cowboy8625
Owner

I can't get my CPU usage that high. How big is your screen?

[Screenshot: CPU usage]

@cowboy8625
Copy link
Owner

Making the screen pretty large does spike CPU usage up to 17%.
[Screenshot: CPU usage with a large terminal]

@mihaigalos
Copy link
Author

> I can't get my CPU usage that high. How big is your screen?

A very generic 1920x1080 with the following terminal settings:

~ » tput cols
213
-------------------
~ » tput lines
57

I guess the behavior is more pronounced in my case because the CPU is older.
But I originally noticed it in a Linux VM on Windows, where it is even more pronounced.

@cowboy8625
Copy link
Owner

> But I originally noticed it in a Linux VM on Windows, where it is even more pronounced.

Yeah, the native Windows and Mac terminals do not handle escape codes well, but using the Alacritty terminal helps a lot with this.

> I guess the behavior is more pronounced in my case because the CPU is older.

Yeah, that makes sense; I'm sure this will help. Sometime today I should be done with a basic implementation, though not all flags will work yet. It will give a good indication of whether printing the whole screen helps. (If it doesn't, I will be shocked.)

@cowboy8625
Copy link
Owner

Sorry for the delay. Work has taken over this week. I will have something up on Saturday, or sooner if I get the time.

@cowboy8625
Copy link
Owner

cowboy8625 commented Jun 18, 2022

So I have been working on the new draw improvements. As usual, this has turned out to be a bit more complicated than I originally thought. Formatting the screen for characters that are wider than a single column can throw a wrench in things. LOL
I have ideas for fixing it, but just thought I'd post what I have so far.

One odd bug I managed to introduce: when adding color to the rain, I get a weird solid row of characters flashing at the top. Just uncomment the Some here if you want to take a look.
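One common way to handle wide characters is to give each glyph a display width and let a double-width glyph consume the column to its right. The sketch below assumes a row is a slice of optional glyphs, one per terminal column; approx_width is a deliberately simplified width table (a real implementation would more likely rely on a proper Unicode width table such as the unicode-width crate), and layout_row is a hypothetical helper, not the project's code:

```rust
/// Rough display width: 2 for common East Asian wide/fullwidth ranges, else 1.
/// Simplified stand-in for a real Unicode width table.
pub fn approx_width(c: char) -> usize {
    match c as u32 {
        0x1100..=0x115F          // Hangul Jamo leading consonants
        | 0x2E80..=0xA4CF        // CJK radicals, kana, CJK ideographs, Yi
        | 0xAC00..=0xD7A3        // Hangul syllables
        | 0xF900..=0xFAFF        // CJK compatibility ideographs
        | 0xFE30..=0xFE4F        // CJK compatibility forms
        | 0xFF00..=0xFF60        // fullwidth forms (halfwidth katakana after U+FF60 stay 1)
        | 0xFFE0..=0xFFE6 => 2,  // fullwidth signs
        _ => 1,
    }
}

/// Lay out one row of terminal columns so its display width equals the
/// number of columns: a wide glyph covers the next column, and a wide glyph
/// that does not fit in the last column is replaced by a space.
pub fn layout_row(cells: &[Option<char>]) -> String {
    let cols = cells.len();
    let mut out = String::with_capacity(cols);
    let mut col = 0;
    while col < cols {
        match cells[col] {
            Some(c) if approx_width(c) == 2 => {
                if col + 1 < cols {
                    out.push(c);
                    col += 2; // skip the column the wide glyph covers
                } else {
                    out.push(' '); // no room for a wide glyph at the edge
                    col += 1;
                }
            }
            Some(c) => {
                out.push(c);
                col += 1;
            }
            None => {
                out.push(' ');
                col += 1;
            }
        }
    }
    out
}
```

Dropping the glyph that a wide character covers is only one policy; shifting it over or reserving whole column pairs for wide glyphs would be alternatives.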

Just keep in mind this is highly unfinished work.

@mihaigalos
Copy link
Author

Looks great. I cannot reproduce the bug; can you perhaps try in a Docker container? Let me know if you need help.
Here's what I see (-s doesn't work yet):

cargo run -- -c jap
[Screenshot: output of cargo run -- -c jap]

@cowboy8625
Copy link
Owner

Yeah pretty much no flags work yet.

@mihaigalos
Copy link
Author

Hi @cowboy8625, anything I can do to help here?

@cowboy8625
Copy link
Owner

No sorry man its been a crazy couple months. Just got married and other life things have been taking all my free time. Ill work on it this weekend for sure.

@mihaigalos
Copy link
Author

Wow, congratulations! I wish you all the best. 👍

@cowboy8625
Copy link
Owner

Thanks!

cowboy8625 added a commit that referenced this issue Aug 2, 2022
@cowboy8625
Copy link
Owner

Body and head colors work now. Some small bugs still to address but slowly getting there when I have time.
