[OPTIMISATION] Optimisations for DEBUG information (easily done, big performance gains) #2270
Thanks for the suggestions. I don't get the idea for A : the value of DEBUG[1] and DEBUG[2] should change before every call to
I am more convinced by B. We should test how much space and execution time this would bring. For C and D :
Okay... for optimization B, the performance gain will be HUGE. Optimisation B seems 3.8x faster than the current Brython implementation... BUT there is an even better way... that is 36x faster :

$B.LID = 0x000000000007;
$B.IID = 0x000000000003; let R = $B.$getattr_pep657(locals___main__.a, 'replace')
$B.IID = 0x00020002000E; call(R)('e');
// instead of :
// $B.set_lineno(frame, 7);
// $B.$call($B.$getattr_pep657(locals___main__.a, 'replace', [0,0,3]), [2,2,14])('e')

Note it is possible to offer 3 output modes :
We use an intermediate variable (R), but the browser is clever and optimizes it away (so it has no cost). $B.IID (Instruction ID) is global, and can be accessed from anywhere... This has several advantages :
$B.LID (Line ID) is global, and can be accessed from anywhere... This has several advantages :
And for a production output... Just don't include the
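The global-ID idea above can be sketched as follows. This is a minimal sketch: the `$B` object, `risky` and `run_with_positions` are illustrative stand-ins, not the actual Brython API.

```javascript
// Minimal sketch of the $B.LID / $B.IID proposal: the transpiled code sets
// two globals just before each call, and the error handler reads them back
// instead of receiving a per-call position array.
const $B = { LID: 0, IID: 0 };

function risky() {
    throw new Error("boom");
}

function run_with_positions() {
    try {
        $B.LID = 0x7;            // line id, set just before the call
        $B.IID = 0x20002000E;    // packed instruction/position id
        risky();
    } catch (err) {
        // no per-call position array: the handler reads the globals
        return { line: $B.LID, instr: $B.IID, msg: err.message };
    }
}

const report = run_with_positions();
```

The point of the scheme is that the assignments are plain stores, while the lookup cost is only paid on the error path.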
The idea was to increment this number each time the transpiler needs to write one.
Ouch... Yeah, that hurts... Possible solutions :
function LID_base(frame, lid) {
$B.LID = lid // if using the last solution. Else, for solution C, just do nothing.
}
function LID_trace(frame, lid) {
// do your stuff
}
$B.L = LID_base
// .....
$B.L(frame, 7)
// do stuff
$B.L(frame, 8)

Then, when
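The swap pattern above can be sketched like this (all names are illustrative, only the idea of re-pointing `$B.L` is taken from the comment):

```javascript
// Sketch of the swappable line tracker: $B.L is a cheap recorder by default,
// and is swapped for a tracing version only when needed.
const $B = {};
const trace = [];

function LID_base(frame, lid) { $B.LID = lid; }                   // default: just record the id
function LID_trace(frame, lid) { $B.LID = lid; trace.push(lid); } // tracing: also log it

$B.L = LID_base;
$B.L(null, 7);        // normal execution: nothing logged
$B.L = LID_trace;     // a debugger attaches: swap the implementation
$B.L(null, 8);
$B.L(null, 9);
```

The generated code never changes; only the function `$B.L` points to does.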
Okay, last night I took a better look at the transpiled code from the online editor.
@PierreQuentel At the top of the issue, I will put the optimizations you find interesting, so it'll be easier for you to browse this issue. I found an easy way to get a VERY HUGE performance gain in the current system :
Indeed, using an array is costly :
Using a uni-directional tree would instead be waaay quicker :

let ptr = { previous: null, /* all your frame attributes here */ }
function enter_frame(ptr) {
// do your stuff
return {previous: ptr, /* all your frame attribute here */ }
}
function leave_frame(ptr) {
// do your stuff
return ptr.previous
}
// usage :
ptr = enter_frame(ptr);
// frame stuff
ptr = leave_frame(ptr);
// printing the stack :
let cur = ptr
while( cur !== null ) {
// do your stuff
cur = cur.previous
}

Printing the stack would be slower, but we don't care.
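For what it's worth, the two frame-stack shapes can be compared with a rough micro-benchmark like the one below. Timings vary wildly by engine, so treat it as a measurement harness sketch, not as evidence:

```javascript
// Compare the two frame-stack shapes discussed above: an array with
// push/pop vs. a linked node with a "previous" pointer.
function bench(fn, n) {
    const t0 = Date.now();
    for (let i = 0; i < n; i++) fn();
    return Date.now() - t0;
}

const N = 1000000;

let stack = [];
const t_array = bench(() => {
    stack.push({ name: "f", line: 1 });   // enter_frame
    stack.pop();                          // leave_frame
}, N);

let ptr = null;
const t_linked = bench(() => {
    ptr = { previous: ptr, name: "f", line: 1 };  // enter_frame
    ptr = ptr.previous;                           // leave_frame
}, N);

console.log({ t_array, t_linked });
```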
… encode_position() in ast_to_js.js. It is decoded by the functions that use it with function $B.decode_position. Related to issue #2270.
Hi Denis, with the commit above, encoding and decoding the position information is delegated to functions in
Okay, let's do better :
Here is what the code should look like (not tested) :

function encode_position(a, b, c, d = c){
    // assuming a <= b <= c <= d : store deltas, they need fewer bits
    d -= c;
    c -= b;
    b -= a;
    // number of bits needed for the largest value (at least 1)
    let nbbits = Math.floor(Math.log2((a | b | c | d) || 1)) + 1;
    let has_b = +(b !== 0);
    let has_d = +(d !== 0);
    let indic = has_b | (has_d << 1);
    // note: JS bitwise operators truncate to 32 bits, so this version is in
    // fact limited to 31 bits, not 53 (use multiplications by powers of 2 to
    // exploit the full 53 bits)
    let pos = 1; // our flag, marks the top of the encoding
    pos = (pos << (nbbits * has_d)) | d; // optional d slot
    pos = (pos << (nbbits * has_b)) | b; // optional b slot
    pos = (pos << nbbits) | c;
    pos = (pos << nbbits) | a;
    pos = (pos << 2) | indic; // 2 low bits say which optional slots exist
    return "0x" + pos.toString(16); // condensed format
}
$B.decode_position = function(pos){
    let has_b = pos & 0x1
    let has_d = (pos & 0x2) >> 1
    let nb_values = 2 + has_b + has_d
    pos >>= 2
    // the highest set bit is the flag, sitting just above nb_values slots
    let nbbits = Math.floor(Math.log2(pos)) / nb_values
    let mask = (1 << nbbits) - 1
    let a = pos & mask
    pos >>= nbbits
    let dc = pos & mask // c - b
    pos >>= nbbits
    // multiplication prevents a condition - should be faster ?
    let db = (pos & mask) * has_b // b - a
    pos >>= nbbits * has_b
    let dd = (pos & mask) * has_d // d - c
    // undo the deltas
    let b = a + db
    let c = b + dc
    let d = c + dd
    if( ! has_d )
        return [a, b, c]
    return [a, b, c, d];
}
About :

function f(){
try{
x
}catch(err){
err.message
}finally{
3 + 4
}
}
function g(){
try{
x
3 + 4
}catch(err){
err.message
3 + 4
}
}
g()

The link you put on "it doesn't add any overcost." seems broken.
About "2. Benchmarks for frame cost": we really can't do without frame management. It is not used only to handle exceptions: if you remove it, built-in functions like
More generally: one of the design principles in Brython is that it should be as close to Python as possible; between performance and compliance with the language definition, I always favour compliance. This is why I am opposed to options at the user's choice to determine which parts of Brython should or should not work as expected by a Python developer. It's too difficult to explain (for us) and to remember (for users) which options unplug which feature, and it would make debugging difficult if unplugging a feature has side effects we had not thought of.
About the replacement of position information: are you sure there will be such an increase in performance ? I tried 2 different versions, one with an array and one with a hex number; there is no difference on Firefox or Chrome :

N = 10000000
function ignore(){}
var t0 = Date.now()
for(var i=0; i<N; i++){
ignore([1,2,3])
}
console.log(Date.now() - t0)
var t0 = Date.now()
for(var i=0; i<N; i++){
ignore(0x999)
}
console.log(Date.now() - t0)

Also tested on jsPerf, with even a 0.3% improvement with arrays...
Well, it seems
Thanks for your answer.
Yes : https://jsperf.app/mitogo Of course, the increase is only for this part of the code, so you won't have an overall 6x speed increase.
Your function
Else, on "style"/"readability" considerations, if it doesn't cause performance reductions (of course), wouldn't it be better to put the instruction ID as the first parameter (but I guess putting it last is better as you can do

$B.$call([2,2,14], $B.$getattr_pep657([0,0,3], locals___main__.a, 'replace') )('e')
// or
$B.$call(0x22E, $B.$getattr_pep657(0x3, locals___main__.a, 'replace') )('e')

I'm grasping at straws here, but it is easier to visually associate the position with the called function (and, with many lines, to align the code better). In the current implementation the code for
But yeah, I'm really grasping at straws here xD. I guess the only optimizations we could do on the transpiler output are :
(*)

$B.$call($B.$getattr_pep657(locals_exec.x, 'foo', [0, 0, 5]), [2, 2, 7])()

Which seems strange to me, as I'd expect to see either :

$B.$call(locals_exec.x, $B.$getattr_pep657(locals_exec.x, 'foo', [0, 0, 5]), [2, 2, 7])()
// or
$B.$call($B.$getattr_pep657(locals_exec.x, 'foo', [0, 0, 5]), [2, 2, 7])(locals_exec.x)

Making me think that

return function(obj, attr) {
    let fct = obj[attr]
    // some checks
    return function(...args) { return fct.call(obj, ...args) }
}

Then there could be ways to optimize it a little, I guess. I'll try to hide messages that are not relevant anymore to keep this thread clean.
Your test https://jsperf.app/mitogo modifies a global variable, which is never the case in Brython code (or only unintentionally...). If you set a local variable, there is no difference, though it's likely that the data is initialized : https://jsperf.app/xofeta.
This isn't a global variable (it isn't in window) : it is a local variable.
The browser sees that the variable is assigned but never used. Browsers can be very tricky with small pieces of code.
I made a test to prevent any aggressive browser optimisation; the conclusions are :
The explanation is quite simple: the array needs to be rebuilt at each iteration, which is costly and requires dynamic memory allocation, whereas the number is constant and can be stored on the stack. Using arrays also increases memory usage if lots of functions are called without interruption, as the garbage collector might not have time to run and free the memory. - Test 1 : https://jsperf.app/xofeta/2 - Test 2 : doing nothing but a check ( https://jsperf.app/xofeta/3 )
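One way to keep such a benchmark honest, per the discussion above, is to make the benchmarked function actually consume its argument so the engine cannot discard the array literal as dead code. A sketch (not the jsPerf tests themselves):

```javascript
// The sink forces the engine to keep both the allocation and the call alive.
let sink = 0;
function use_array(pos) { sink += pos[0]; }
function use_number(pos) { sink += pos & 0x3; }

function bench(fn, make_arg, n) {
    const t0 = Date.now();
    for (let i = 0; i < n; i++) fn(make_arg());
    return Date.now() - t0;
}

const N = 1000000;
const t_array = bench(use_array, () => [1, 2, 3], N);  // fresh array every call
const t_number = bench(use_number, () => 0x999, N);    // constant, no allocation
console.log({ t_array, t_number, sink });
```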
…. Denis Migdal's suggestion in issue #2270
Hi Denis,
Thanks for the suggestion. I have implemented it in commit 44ba0d1 (many things to change in many places !). The performance gain is not huge; test results vary from one run to another, but there is nothing really noticeable. If you have tools to compare this version to the previous one, it will help.
Could you also test various ways of encoding position info (compared to the current implementation with an array) and see if it actually improves the overall performance ? Not only unit tests as you did, but tests with complete Brython code, as in the integrated speed measurements made by speed/make_report.html.
Hi, I'll have to take some time to write my answer, as there is a lot to say and some tricky things to explain. Also, during this week I worked on the proposal for asynchronous imports, and on its write-up. It is almost done, and I think I'll post it this weekend.
Optimisation in Brython is quite tricky. Not only do we have the constraint of mimicking CPython, which requires some "costly" ways of doing things... but we are also in JS, in a browser, which is quite tricky... and I discover new nuances every day.
Maybe you can also point me to some Brython scripts that you think are quite slow to execute (e.g. compared to CPython), and I'll investigate where the cost comes from ?
It seems the directory
Correct, I have removed the Ace scripts from
I'll have to take a deeper look at the documentation. I'm downloading the Brython archive from the git; I set up a webserver on
Things are improving, thanks to your suggestions over the past days (thanks again !) and an optimization in the creation of class instances (commit cbaccc0). For the first time, I think, none of the tests in
Welcome to my world ;-)
As you noticed, parsing function arguments is complex because of the flexibility offered by Python. I see you had ideas to improve the algorithm; if you can focus on this topic and provide speed improvements, even limited ones, it would be great. The regular expression module (in
You must run the script
Hmm, the following error is printed when executing
I'm happy to hear that ^^
It seems that even if Python has some flexibility, there are still some rules we can exploit to speed up the process. I think the way to proceed would be to rewrite it from scratch, starting with the easiest case (no arguments), and to benchmark it. Then add a new case (e.g. "normal arguments") and re-benchmark both cases, to see whether a significant cost is added to the first one, and so on. That would show when the addition of a new parameter costs a lot, and whether we need to split things up. In order to test that, I'll have to find a way to extract the function so I can execute it "alone", without the whole function call. That would improve reproducibility and make JS testing easier. Maybe I should create a small git repository to do the benchmarks and adaptations little by little.
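The incremental approach described above could start from something like this hypothetical fast-path parser. `parse_args` and its spec format are made up for illustration; the real entry point in Brython is `$B.args0`:

```javascript
// Start with the easiest case (no arguments) as a fast path, and fall back
// to a generic path for everything else; benchmark after each added case.
function parse_args(spec, args) {
    if (spec.params.length === 0) {
        // fast path: nothing to match, only validate the count
        if (args.length > 0) {
            throw new TypeError(
                `${spec.name}() takes 0 positional arguments but ${args.length} were given`);
        }
        return {};
    }
    // generic path: positional matching only here; defaults, *args,
    // keyword-only parameters etc. would be added case by case,
    // re-benchmarking the earlier cases each time
    const result = {};
    spec.params.forEach((p, i) => { result[p] = args[i]; });
    return result;
}

const no_args = { name: "run", params: [] };
const two_args = { name: "add", params: ["a", "b"] };
```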
Aren't CPython regular expressions the "same" as JS regular expressions ?
The bug when starting server.py should be fixed by the commit above.
How I wish that were the case ! But no, the Python engine has a slightly different syntax, and features that the JS engine doesn't have... See this page for the ever-evolving list of differences. [EDIT] this page is 7 years old; the differences have changed a lot since then. As always, the JS engine is available in Brython as
Hmm... I'd like to test argument parsing like this :

from browser import window

def run():
    print('ok')

window.run = run

setTimeout( () => {
    function test() {
        let foo = __BRYTHON__.jsobj2pyobj(window.run);
        console.log(foo);
        console.log( __BRYTHON__.args0( foo, arguments ) );
    }
    test('ok')
}, 1000)

The issue is that
Thanks
Yeah, so it's 99% the same, but as always it's the 1% we almost never use that messes everything up xD. So the solution is either to have some "conversion" from re regexes to JS regexes, which may be a little hard to manage (and would introduce an additional cost), or to implement the whole re regex engine in JS... Yeah, it's a mess whatever you choose. xD
Yeah, 4k lines for re... I could take a look at what takes the most time... but I'm afraid that wouldn't work miracles... I assume Python's "re" is implemented in C++ (or something like that) in some Python implementations. If you can somehow compile it to WASM, maybe you could get the full implementation while having some performance increase ???
It is written in C, the files are here. I have no idea how to convert it to WASM or JavaScript.
For the current implementation, did you use this file, which you rewrote by hand, or did you implement your own regex engine from scratch ? As it is in C, there is almost no way a JS version would be faster than CPython. To convert it to WASM, I think there are tools like Emscripten. But tbf I am not sure what the results would be.
Yes, I wrote everything by hand. Adapting the C code (as I did for other scripts,
I see, you are quite courageous xD. Then it may be that you have an algorithmic issue, or something of the kind. Else, could it be possible to get
If it's only for your tests, the best is to hack the function to make it return what you need
I think it would be best practice that :

let Y = jsobj2pyobj( pyobj2jsobj( X ) )
X === Y // they are the same object

I think such symmetry might prevent issues. Else, I found a little workaround :

from browser import window

def run(i):
    print('ok')

def getRun():
    window.run = run

window.getRun = getRun

setTimeout( () => {
    let orig = __BRYTHON__.pyobj2jsobj;
    __BRYTHON__.pyobj2jsobj = (args) => {
        let ret = orig(args);
        ret.$orig = args;
        __BRYTHON__.pyobj2jsobj = orig;
        return ret;
    }
    window.getRun();
    let foo = window.run.$orig;
    console.log('run is', foo);
    console.log( __BRYTHON__.args0( foo, ["ok"] ) );
}, 1000)

So now I can test it a little better (maybe more tomorrow). Maybe more during the weekend.
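The symmetry `X === jsobj2pyobj(pyobj2jsobj(X))` could, for instance, be obtained with a WeakMap cache that remembers the inverse of each conversion. This is a sketch with made-up converters; `to_js` / `to_py` stand in for Brython's real functions:

```javascript
// Each conversion records its inverse, so a round trip returns the very
// same object instead of a fresh wrapper.
const js_of_py = new WeakMap();
const py_of_js = new WeakMap();

function to_js(pyobj) {
    let js = js_of_py.get(pyobj);
    if (js === undefined) {
        js = { wrapped: pyobj };       // placeholder for the real conversion
        js_of_py.set(pyobj, js);
        py_of_js.set(js, pyobj);       // remember the inverse
    }
    return js;
}

function to_py(jsobj) {
    const py = py_of_js.get(jsobj);
    if (py !== undefined) return py;   // round trip: give back the original
    return { wrapped: jsobj };         // placeholder for the real conversion
}

const X = { kind: "py_function" };
const Y = to_py(to_js(X));
```

A WeakMap keeps identity without preventing garbage collection of objects that are no longer referenced elsewhere.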
I still have some JS errors related to "ace" on the speed pages. I think I'll start to do some micro-optimizations on the files (array preallocation, using
The goal would be to clean the code a little, reduce memory operations a little (e.g. when preallocating arrays) and help browser optimizations (e.g. using
There are ~50k lines to look at, so I'll do it little by little.
Found an interesting link for it : Gotta check it out one day.
Hi,
I suggest several small optimizations for the debug information that Brython inserts into the generated JS code :
Interesting optimisations
Please also look at
A. Use constants as an index into an array, instead of arrays [NO]
Please also look at :
Currently, an array describing the position of the "executed" Python code is given as the last argument of the underlying JS functions, e.g. :
This has the disadvantage of potentially building an array each time the line is executed, and it takes more space in the generated JS code.
Therefore, I suggest using globally defined constants, defined at transpilation time. This would increase execution performance while reducing the size of the generated code :
The code would then be written :
This would add an insignificant cost when printing error messages, i.e. to do the operation DEBUG_SYMBOLS[the_constant].
But it doesn't matter, as :
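Suggestion A boils down to something like this. `DEBUG_SYMBOLS` is the name proposed above; the table contents and `position_of` are made up for illustration:

```javascript
// The transpiler would emit a small constant per call site, plus one shared
// table mapping constants to position arrays, built at transpilation time.
const DEBUG_SYMBOLS = [
    [0, 0, 3],    // constant 0: position of the attribute access
    [2, 2, 14],   // constant 1: position of the call
];

// Only paid when an error message is actually printed:
function position_of(constant) {
    return DEBUG_SYMBOLS[constant];
}
```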
B. Build the constant as a mathematical expression from the 3 numbers [a,b,c].
In the previous suggestion, we used an array to convert a constant into an array containing the information required for DEBUG.
But a JS number has 53 exploitable bits.
This can be split into 3x16 bits, i.e. 3 numbers in the range [0 - 65535]. I'd argue that no Python line should ever be that long (PEP 8 indicates lines should be < 79 characters).
At transpilation time, we would simply do :
code :
Giving the following code :
Then, when handling the error :
This would be quite quick when handling errors, while using less RAM (no need for a DEBUG_SYMBOLS structure). The current version takes at least 7 characters for the [a,b,c] array; this version takes at most 8 characters, so one more (but would generally take at most 7 characters). Of course, we can reduce the value of NB_BITS if needed, and even add the line number if we have enough bits. This makes the code more readable, as we understand that 0x09 is a special value.
More opti :
With more knowledge about the 3 [a,b,c] values, we can reduce the number of bits.
E.g. if a <= b <= c, instead of storing [a,b,c] in 53 bits, we can store [a, b-a, c-a] in 53 bits.
Then we can allocate fewer bits to e.g. b-a and c-a.
This could free bits to store the line number, if that is of interest.
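A self-contained sketch of the fixed-width variant of suggestion B: pack [a, b, c] (each < 2**16) into a single number. Multiplication is used instead of << because JS bitwise operators truncate to 32 bits, while a number can hold 53 bits exactly:

```javascript
// Pack three 16-bit values into one 48-bit number (fits in the 53 bits of a
// JS float), and unpack them again.
function encode_abc(a, b, c) {
    return a * 0x100000000 + b * 0x10000 + c;
}

function decode_abc(n) {
    const c = n % 0x10000;
    const b = Math.floor(n / 0x10000) % 0x10000;
    const a = Math.floor(n / 0x100000000);
    return [a, b, c];
}
```

A round trip such as `decode_abc(encode_abc(2, 2, 14))` gives back `[2, 2, 14]`, and `[0, 0, 3]` encodes to the very short literal `0x3`.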
C. Replace all the $B.set_lineno(frame, 7); by a map ? (NO)
Instead of keeping track of the line number, which likely kills execution performance, create a structure enabling the conversion of the JS stacktrace into the Python stacktrace (this would also enable better error printing, I think).
The structure would be globally defined as :
I am not sure about the value we need to put in FILE_A/FILE_B; we'll have to look at JS stacktraces in more depth when raising exceptions inside Brython code. Maybe there are also ways to cheat a little.
Getting the line numbers would then be done as :
When transpiling the Python code, the filemap would be built as :
This may make transpilation slightly slower, but I'll argue that :
This will also make error printing slower (but we don't care).
Also, with that, would we still need the frame system ? Are there other line numbers hidden in the generated JS code (e.g. for Brython class definitions) ?
EDIT: if the generated JS file is minified, the line numbers will not match anymore.
I think the minifier can also build a sourcemap, which will give you the right line numbers in the stacktrace.
Else, a library might be able to do the trick. Else, I'd argue that if you minify, you want performance.
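The filemap of suggestion C could look roughly like this. The table layout and the lookup function are hypothetical; a real version would also have to parse browser-specific stack trace formats:

```javascript
// Map a (js_file, js_line) pair taken from a JS stack trace back to the
// original Python (file, line).
const filemap = {
    "main.py.js": {
        // js_line -> [python_file, python_line]
        12: ["main.py", 7],
        15: ["main.py", 8],
    },
};

function py_position(js_file, js_line) {
    const lines = filemap[js_file];
    if (lines === undefined) return null;
    return lines[js_line] !== undefined ? lines[js_line] : null;
}
```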
D. Option execution-mode="production" to remove debug information and make the output cleaner/shorter (NO)
Currently, because Brython doesn't implement sourcemaps, the generated JS code is bloated with arrays and $B.set_lineno(frame, 7); calls.
This makes the generated code harder for a human to read, increases the generated code size, and has a cost in performance.
This is really useful when developing, to better understand errors and fix them. However, in production, we might want to go faster and might not be interested in debugging information.
For this usage, sourcemaps are generally used (cf. more info about sourcemap structures), but I can understand that they might not be easy to implement.
Hence, I suggest implementing an option execution-mode that could take 3 values :
$B.set_lineno in the generated code (as well as other functions with the same goal).
Regards,