-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Message generation performance #344
Comments
A good start might be the caching of function results: https://docs.python.org/3/library/functools.html#functools.lru_cache. |
Unfortunatly, |
We can't simply annotate all our functions and properties with |
I think a good first approach would be to replace all unnecessary inline calls by variables. This does't require complex state management as it is done at method scope but it might reduce the load significantly. I tested this in a single example and even with #!/usr/bin/env -S python3 -O
def count() -> int:
x = 0
for i in range(1, 1000):
for j in range(i, i * 10):
x += i + j
return x
def compare_twice() -> bool:
return count() > 10 and count() < 100
def compare_once() -> bool:
cnt = count()
return cnt > 10 and cnt < 100
if __name__ == "__main__":
print(compare_once())
print(compare_twice())
As one can see, |
I'm not aware of any tool which tries to do that. I suppose it is not that easy to detect valid cases reliably, as you would have to check if a called function has any side effect. |
I thought this should be already the case.
I suppose that shouldn't be that hard. Or am I wrong? |
Have we ever through about PyPy? Do you have opinions on that? It only supports Python 3.6.9, though. |
I have no experience with PyPy. The performance improvement looks nice. I don't know how hard it would be for us to keep compatibility to PyPy. It seems the issues caused by PyPy could be quite annoying. So I'm not sure if it would be worth the effort, but it might be worth a try. |
This reads like PyPy shouldn't be our first choice, but we may consider it later should we run out of ideas. |
I think even if PyPy would give us the advertised 4-5 times better performance this isn't enough. Also I've taken a deeper look into our code. The main problem isn't that we're doing overly heavy work at some point but that we've implemented many properties to be calculated dynamically and that we've taken the existence of these properties for granted. This leads to two problems: we can't really transfer those properties to state because we never paid attention to where we change them because they're recalculated anyway. And we're using them everywhere because we already have them. Especially the last point is a problem because they're connected to each other so calling one property might actually call 3 or 4 more. Also we used them for checks. E.g. we check if a property has a certain value, and then use it. Since checking and using are done in two different functions the value is actually calculated twice. |
I also think so.
In what way would the architecture need to be changed to alleviate this problem? Do you have concrete actions in mind? |
The best example for this problem are the if self._has_length(field):
length = self._get_length(field) While this looks unproblematic at first it calculates the unchecked value twice. The calculation itself is not cheap either, if it is a non-scalar value it might iterate through the whole packet first. |
That makes sense. Thanks for the explanation. |
2521b25 implements #344 (comment) and prevents the recalculation of already known accessible fields. Comparison: Baseline:
Optimized
Optimized
|
Would there be any significant drawback in precalculating (presetting length etc, calculating accessible fields) only the next field? This would require setting all fields in order and changing a single field would require setting all following fields again (assuming the change would not change the path). |
Would this be compatible to the current checksum functionality? In ICMP the checksum can only be calculated when also the fields behind the checksum field are set. So at least internally a field must be settable without invalidating other fields. The idea of the current design was that it should be possible to do minor modifications to a message without the need the recreate the whole message (e.g. just changing address fields). I don't know how often such a situation will occur, but they would be more complicated to handle. |
In this case the performance highly depends on the use case. I'm not really sure if we can do anything about it. Setting a single field and validating the rest of the message should be much faster than setting all fields (assuming the message stays valid). We could also introduce a flag that can be set not to prepare the whole message. The default would be the current behavior but by setting this flag in the |
What would happen if I try to set another (maybe valid, but not precalculated field)? This optimization sounds sounds sensible for tools like the Fuzzer, which generates valid messages with the fields in the right order. OTOH, what's the use-case of the old interface then? IIRC, we originally designed it for the Fuzzer to explore possible paths in the message. It's not used in that way, though, as the Fuzzer uses it's own model. |
Only the next accessible field is calculated so setting any other field would cause an error.
Well one could create a complete message with the new interface and then change a single field in the message with the old interface without loosing the complete message. This would be the most efficient use of both interfaces if one wanted to change only the address in e.g. IPv4. |
I see, that makes sense. |
If it is solved and there are no plans to further work on this issue in the near future, it should be closed. |
PyRFLX has performance problems when generating messages. See attached profiling of the generation of 100 icmp messages.
rf_1.zip
The text was updated successfully, but these errors were encountered: