Skip to content

hal_parport: Do not write outputs, if not required#2513

Merged
rene-dev merged 1 commit intoLinuxCNC:masterfrom
mbuesch:hal-parport-writeopt
Jun 18, 2023
Merged

hal_parport: Do not write outputs, if not required#2513
rene-dev merged 1 commit intoLinuxCNC:masterfrom
mbuesch:hal-parport-writeopt

Conversation

@mbuesch
Copy link
Copy Markdown
Contributor

@mbuesch mbuesch commented Jun 2, 2023

This optimizes hal_parport in that outputs are only written, if the output data actually changed.

The idea behind this is that the outb is a pretty expensive operation. It takes in the order of microseconds to execute. In contrast, the check if data has changed takes basically no time.

This saves a lot of runtime in case outputs on a port are written rarely (e.g. initialization only) or never. Either of outdata_ctrl or outdata may be unused. Especially in case of multiple parports chances are that not all outdata/ctrl "channels" are used. In that case these outbs only have to be performed once to give the port a known state.

That enables the use of older machines with fast(er) base thread periods.

@mbuesch mbuesch force-pushed the hal-parport-writeopt branch from 3a03863 to 067f3f8 Compare June 5, 2023 15:56
hal_u32_t debug1, debug2;
long long write_time;
unsigned char outdata;
unsigned short outdata;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about fixed size types like uint8_t here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would be fine with uint16_t. But the whole file uses variable size types, so I didn't want to deviate.
I changed it to be bigger than 8 bits to detect the first-write case.

@hansu
Copy link
Copy Markdown
Member

hansu commented Jun 5, 2023

This sounds really good. Do you have some numbers how much CPU load is saved with this?

@mbuesch
Copy link
Copy Markdown
Contributor Author

mbuesch commented Jun 6, 2023

Yes, I had done measurements with halscope, but I can't currently find the results. :(
I will redo the measurements once I get my machine working again (It's currently broken due to other reasons). That will take me a week or so.

The speedup was significant. The time it takes to access an I/O port is in the order of a couple of microseconds. That can quickly add up (depending on the setup and especially for small thread periods). I currently have a core load of less than 80 %, whereas I had a core load of >95 % (not safe to operate) before.

But I'll get you better numbers...

@hansu
Copy link
Copy Markdown
Member

hansu commented Jun 6, 2023

Yes, I had done measurements with halscope, but I can't currently find the results. :(

With halscope? I think measuring just the CPU load (with htop or similar) would be sufficient.

@mbuesch
Copy link
Copy Markdown
Contributor Author

mbuesch commented Jun 6, 2023

I don't think that is sufficient, because what matters are the worst case execution times for single task cycles.
A single task cycle filling the time slice by over 100 % will still break the system, even if the average load is much less.

This change does really only help, if at least one of the I/O ports is either never written or maybe only written once during init or at times where we can tolerate a task deadline miss.
But if we have an average load of say 90% and every 1 s we have one cycle with 110% that means it will probably not work. But that won't be visible in htop.

That said, this change does basically not increase the worst case latency. The time for the additional compare only takes nanoseconds and can be ignored. (If you were that close to deadline misses, the system was not really stable anyway).
But it can decrease the worst case latency under the above conditions. Therefore, it will either do good or nothing. depending on your setup.

@mbuesch
Copy link
Copy Markdown
Contributor Author

mbuesch commented Jun 16, 2023

Ok, so I repaired my machine. The numbers (prev = without this change, opt = with this change) are:

average RT CPU load (htop):
prev: ~86%
opt: ~73%

prev parport write runtime floor: 7.5 µs
prev_01

prev parport write runtime maximum: 23.4 µs
prev_02

opt parport write runtime floor: 2.5 µs
opt_01

opt parport write runtime maximum: 15.7 µs
opt_02

As you can see, in my setup all runtimes go significantly down.

@rene-dev
Copy link
Copy Markdown
Member

the linuxcnc norway meeting approves this PR.

@rene-dev rene-dev merged commit b884f78 into LinuxCNC:master Jun 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants