Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OPTIMISATION] Implements str as string with prototype replacement. #2269

Open
Tracked by #2250
denis-migdal opened this issue Oct 9, 2023 · 15 comments
Open
Tracked by #2250

Comments

@denis-migdal
Copy link
Contributor

denis-migdal commented Oct 9, 2023

Hi,

Currently, Brython converts all string into his own structure thanks to $B.String(jsobj) (cf code).

I suggest to not make conversions between str and string, but instead doing some kind of "hot prototype replacement" when a method is called.

Currently, Brython functions call are converted into JS :

"eee".replace()

becomes :

$B.$call($B.$getattr_pep657('eee', 'replace', [0, 0, 13]), [6, 6, 15])()

I suggest, in the function $B.$getattr_pep657 function, to do the following :

if( obj instanceof String ) // converts String into string.
    obj = obj.toString()
if( typeof obj === 'string') { // if string, don't look for the method in obj prototype, but instead in e.g. $B.__string__
    let fct = $B.__string__[method_name];
    // do stuff
}

With $B.__string__ being a global variable :

$B.__string__ = {
    replace: function(...) {  /* do brython stuff */  },
    join: function(...) { /* .... */ },
    // ....  etc. etc.
}

The cost would be two additionals conditions that you are very likely already doing (so 0 cost in fact).
The gain, would be avoiding str <=> string conversions while making the code/process cleaner/easier.

@PierreQuentel What do you think ?

Cordially,

@denis-migdal
Copy link
Contributor Author

denis-migdal commented Oct 10, 2023

Okay, in the same fashion, we can also do it for arrays.

  • list are array, we use the same system as proposed for str and string to change its API.

The issue relies in the JS <=> Brython interactions.

  • For Brython => JS...

    • if the Array is a wrapper, return the internal object.
    • else, we have no other choice but to use a proxy, even if it is slow(*), BUT :
      • users are not expected to use the JS/DOM API anyways.
      • if performances are needed, users would need to explicitly copy the array before giving it to JS.
      • internally, in Brython libraries, we can skip the proxy creation if we know the JS function won't do bad interactions with the array (so we would have native JS performances).
  • For JS => Brython

    • If the Array is a proxy, return the internal object (to get the native perfs).
    • else, we can use a wrapper, with another set of prototype replacement to handle lazy JS<=>Brython type conversions when assessing the array element.
      • if performances are needed, idem, need to explicitly copy the array.

(*) the alternative would requires the array to be a wrapper in the first place, with synchronization issues, that'd add a little cost on the Brython side. I think we could argue that perfs on Brython side are more important than perfs on JS side.

I think this would boost execution speed on array operations in Brython, while handling good JS <=> Brython conversions, without sacrificing perfs in the Brython side.

@denis-migdal
Copy link
Contributor Author

denis-migdal commented Oct 10, 2023

For tupple, we could have a class inheriting Array (enables using typeof on it, and likely Array.isArray()), whose constructor would take its initial values.
But its replaced prototype would be different, and its proxy (when Brython => JS conversion) would throw exceptions when trying to modify it (maybe by default throwing exception for everything except for a list of authorized method).

@PierreQuentel
Copy link
Contributor

These ideas are worth considering, but the main question remains, will they improve the overall performance ? We need testing before answering this question.

If you add one, or two, or three tests inside getattr_pep567, then you will lose some time for all the objects that don't pass the test. Is this cost balanced by a clear benefit on those that pass one of them ?

I did a quick try with this code in py_utils.js:

$B.__string__ = {
    lower: function(){
        var $ = $B.args('lower', 0, {}, [], arguments, {}, null, null)
        return this.toLowerCase()
    }
}

$B.$getattr_pep657 = function(obj, attr, position){
    if(typeof obj == 'string' || obj instanceof String){
        if($B.__string__.hasOwnProperty(attr)){
            return $B.__string__[attr].bind(obj)
        }
    }
    try{
        return $B.$getattr(obj, attr)
    }catch(err){
        $B.set_exception_offsets(err, $B.decode_position(position))
        throw err
    }
}

I measured the time taken by these tests

from time import perf_counter as timer

t0 = timer()
for i in range(1_000_000):
    'ABcD'.lower()
print('__string__', timer() - t0)

t0 = timer()
x = 0
for i in range(1_000_000):
    x.bit_count()
print('int.bit_count', timer() - t0)

The result is a 13% improvement for str.lower() for the version with $B.__string__ compare to the present version, but a loss of 6% pour int.bit_count()

Side note : I don't coerce String to string, because for some str methods (__len__ for instance) it's mandatory to keep the distinction because the result would be incorrect if coerced (unicode code points in the "astral plane" count for 2 characters in a JS string, instead of 1 in Python str)

@denis-migdal

This comment was marked as outdated.

@denis-migdal
Copy link
Contributor Author

denis-migdal commented Oct 11, 2023

Hi ;)

I might be forgetting some details, but I think I found a way to avoid doing tests for each supported type of object (0(n)) for a small overcost ((O(1)). I don't know if this small overcost is greater than doing a test. Depending on the context I think this small overcost would be either :

  • lookup in prototype (generally slow).
  • lookup in object properties (generally quick).

I think I can also prevent some of the function calls.

The rational is to put our own getattr/setattr brython function into supported object prototype.

// INITIALIZATION
function addBrythonSupportFor(target, bry_obj) {
     target.prototype.$BRY_getattr = function(name){  // equiv to $B.getattr()
          return bry_obj[name];
     }
      target.prototype.$BRY_setattr = function(name, value) {} // etc....
}

addBrythonSupportFor( String, {
    replace: function foo() {}
} );
addBrythonSupportFor( Object, {
} );
//... etc.

//NOTE: we may also use Symbol, it'll be cleaner, and maybe faster (but I have doubt).
// const $BRY_getattr = Symbol();
// target.prototype[$BRY_getattr] = function(name){  // equiv to $B.getattr()
//          return bry_obj[name];
//     }


// define $callM to call methods // equiv to $B.getattr_pepXXXX() + $B.call + ()
$B.$callM(target, method, ...args) {
       try {
        
             let m = target.$BRY_getattr(method);
                       // [OPTI???] target.__proto__.$BRY_getattr(method)
                       // [SYMBOL?] target[$BRY_getattr] // target.__proto__[$BRY_getattr](method)
             // do brython stuff
             let ret = m.call(target, ...args)
             // do brython stuff
             return ret
       } catch(e) {
            // do things
       }
}
         // OR
$B.$callM(target, method) { // equiv to $B.getattr_pepXXXX() + $B.call
       try {
        
             let m = target.$BRY_getattr(method);
             // do brython stuff
             let fct = m.bind(target)
             // do brython stuff
             return fct
       } catch(e) {
            // do things
       }
}

// call of a method :
$B.$callM( obj, "replace", ...args )
        // OR
$B.$callM( obj, "replace")(  ...args ) 

This is the "simple version". In fact, $B.$callM should looks like that :

$B.$callM( [0,1,2], [3,4,5], obj, "replace", ...args);
// or
$B.$callM(obj, [0,1,2], "replace", [3,4,5], ...args);
// or
$B.$callM([0,1,2], obj, [3,4,5], "replace", ...args);

@denis-migdal
Copy link
Contributor Author

denis-migdal commented Oct 11, 2023

Wouldn't this system also solves the issue for arrays JS <=> Brython conversions and synchronizations as the content would always be the same being in JS or Brython, only the API to access and use the stored object changes ?

This would avoid wrappings, and copies. Maybe it could even help for JS and Brython functions and classes ?

EDIT: this would also enables Brython users to redefine the API of some JS classes that are not currently redefined.
Using Object, we can also create a "strict" mode to forbid any DOM/JS API use for users that wants security and stick to Python.

So even if it would maybe cost a little more, the cost may be acceptable regarding to the advantages ?

@denis-migdal
Copy link
Contributor Author

denis-migdal commented Oct 11, 2023

The behavior of this in JS works like that :

function foo() { console.log(this) }
foo()               //Window
foo.call(null) //Window
let x = {}
x.foo = foo
x.foo()           // Object { foo: foo() }
z = x.foo
z()                  // Window
(function(){return x.foo})() //function foo()

The behavior of self in Python works like that :

class Z:
      def foo(self):
           print(self)
foo = Z.foo
foo()
# Traceback (most recent call last):
#  File "<stdin>", line 1, in <module>
# TypeError: Z.foo() missing 1 required positional argument: 'self'

z = Z()
z.foo() # <__main__.Z object at 0x7f978caa8400>

x = z.foo
x() # <__main__.Z object at 0x7f978caa8400>

a = A()
a.foo = z.foo
a.foo() # <__main__.Z object at 0x7f978caa8400>

I now understand a little better why there is a bind() in getattr().
I think all functions/methods declared in Brython should be binded at creation (if it isn't already the case) to avoid strange interations with JS.

In the solution I proposed, it'll force you to bind at each call of $B.call( $B.getattr_pepXXX("eee", "replace") )(...).
I suggest that such pattern could be detected and rewritten as :

$B.callM( "eee", "replace", ...);
$B.callM = function(obj, method, ...args) {
      getattr( obj, method ).call(obj, ...args) // "this" in python method isn't modified as binded at creation, "this" in JS method is modified.
}

With "getattr" not doing any bind, and getattr_pepXXX doing a bind for JS functions/methods ?
Then, foo = window.foo would give a foo binded to window if JS function/method, else would be already to null for Python functions, or X if it was a method from the class X (a bind on a bind doesn't change the original bind).

This could be a way to not have strange surprises with JS "this" and Brython "self" bindings while avoiding doing binds at each function/method access ?

@denis-migdal
Copy link
Contributor Author

denis-migdal commented Oct 11, 2023

Did some tests.
I'd advise to use Symbol() as it doesn't seem to be slower and would be safer/cleaner.

Screenshot 2023-10-11 at 15-08-41 Test calls perfs
https://jsperf.app/wucime

The consequences of that is that if we define getattr() in the Object prototype, it shouldn't be slower than current implementation :

// INITIALIZATION
const $BRY_GETATTR = Symbol();

function addBrythonSupportFor(target, bry_obj) {
     target.prototype[$BRY_GETATTR] = function(name){
          return bry_obj[name];
     }
}

addBrythonSupportFor( String, {
    replace: function foo() {}
} );
addBrythonSupportFor( Object, {} );

// in $B.getattr_pep or $B.callM :
let m = target[$BRY_GETATTR];

Note: this also enable to :

  • avoid using conditions.
  • easily implement a costless "strict" Brython mode :
const $BRY_GETATTR = Symbol();
const $BRY_STRICT_MODE = true;

function addBrythonSupportFor(target, bry_obj, enforce) {
     target.prototype[$BRY_GETATTR] = ( enforce === undefined ? enforce : $BRY_STRICT_MODE )
                                                                           ? function(name){
                                                                           
                                                                                if( bry_obj.hasOwnProperty[name] ) // maybe there are ways to optimise it.
                                                                                     return bry_obj[name]
                                                                                
                                                                                throw new Error('Property ... doesn't exists (you are in strict mode');
                                                                           }
                                                                           : function(name){
                                                                           
                                                                                if( bry_obj.hasOwnProperty[name] ) // maybe there are ways to optimise it.
                                                                                     return bry_obj[name]
                                                                                
                                                                                if( ! this.hasOwnProperty[name] )
                                                                                     throw new Error('Property ... doesn't exists')
                                                                                return this[name];
                                                                           }
}

If the property is a JS property, it costs an additional check in non-strict mode.
I think this is an acceptable cost :

  • It enables a strict mode that could be safer for Brython users that wants to limit themselves to the Brython API.
  • I don't think the fact the JS API would be slightly slower is an issue as the Brython should be used.

If addBrythonSupportFor() is exposed, it enables users to easily declare an API for an existing JS object that isn't supported by Brython.

@denis-migdal

This comment was marked as outdated.

@denis-migdal

This comment was marked as outdated.

@denis-migdal
Copy link
Contributor Author

denis-migdal commented Oct 11, 2023

I'm hiding previous comments to keep the conversation clean.

It seems that Symbol() are faster than accessing by name, but the browser has harder time to optimize it in short examples ( https://jsperf.app/wucime/6 ) :

  • Chromium : b[FOO]() is faster, b.foo() is 16% slower.
  • Firefox : b[FOO]() and b.foo() are roughly the same, __proto__ is slower,

TL:DR; So in conclusion, Symbol() should be used as it'll be safer and faster in real life use cases, even if in short example, browsers aggressive optimizations tends to favor the alternative (https://jsperf.app/wucime/5).

@denis-migdal
Copy link
Contributor Author

Just though of it, but we also could force users to explicitly indicate when they want to use the JS API ?
That'd likely make the code safer and more readable ?

Then I'd assume a strict mode would be unnecessary as JSAPI calls would be explicit ?

// not optimized, not fully written, it is just for the example
function addBrythonSupportFor(target, bry_obj, enforce) {

     for(let elem in target.prototype)
          bry_obj['$JSAPI_' + elem] = target.property[elem];

     target.prototype[$BRY_GETATTR] = ...
}

addBrythonSupportFor( String, {
    replace: function foo() {}
} );

//usage :
"eee".replace("e", "f"); // function foo
"eee".$JSAPI_replace("e", "f"); // original JS replace function

@denis-migdal
Copy link
Contributor Author

mmm... now that's funny.
I wanted to test defineProperty() and it seems that non-enumerable non-configurable non-writable Symbol property is the fastest on both Chromium and Firefox (by a very slightly advance) : https://jsperf.app/wucime/7

The advantage compared to the standard way of assigning values, is that we can configure whether :

  • the property is enumerable (by default symbols will not be listed when doing for i in o / o.keys() etc.
  • the property is writable (we can assign it or not).
  • the property is configurable : we can delete it, or modify it with defineProperty().

@denis-migdal
Copy link
Contributor Author

denis-migdal commented Oct 23, 2023

@PierreQuentel What should be the next step on this issue ?

Knowing that I think prototype substitution has several advantages (cf this message and the following) :

  • Reduce code complexity : remove issues of object JS <=> Brython conversions, wrappers duplications, retrieve original object when unwrapping, infinite loops while wrapping, synchronisations/concurrent access issues, etc. Therefore making the code cleaner (which I argue is more important than raw performances).
  • Increase performances (?): at term, would allow to remove some checks/conditions by using the delegated prototype. So it should be overall faster (would need to declare the prototype substitution for Object and others builtin types that could fallback to current behavior in a first time, for a small overcost). Removing some issues of JS <=> Brython conversions would also increase performances I think.
  • Delegate responsibilities: currently, some functions needs to do checks/conditions depending on the type. Prototype substitution would enable to delegate responsibilities to the prototype substitution, so the whole object behavior is defined in one place, in one place only. Which might also better respect encapsultion. (I argue this is more important than raw performances).
  • Extensible : would allow Brython users to easily declare alternate prototypes for classes not currently implemented by Brython-only API. Else, it would requires to modify themselves pyobj2jsobj() and jsobj2pyobj() which isn't best practice (which I argue is more important than raw performances).
  • Security: enables easy implementation of a strict mode (non-Brython API functions are forbidden), or an explicit mode (JS API functions names are e.g. prefixed by JS_).

I think there are ways to redefine little by little Brython functions in order to add this new feature little by little for an overcost, until the time we are ready to make the full switch :

  1. Add a new [BRY_JS2PY]() function to all JS types prototypes (not enumerable ofc, using a Symbol to prevents collisions) that'd be called by jsobj2pyobj() and other functions that creates wrapper/convert objects. This should remove some conditions, and therefore increase performances at the cost of one method call. However, once done, jsobj2pyobj(X) calls could be directly replaced by X[BRY_JS2PY]() calls, so should remove a big part (and even all) of the method call over cost.
  2. Add a new [BRY_PY2JS]() function to some wrappers. Idem, that should allows to optimise pyobj2jsobj().
  3. Inside the wrapper produced by [BRY_JS2PY](), always store the original JS obj, that'd be returned when calling [BRY_PY2JS]() (would ensure Brython <=> JS symmetry ).

This would make JS <=> Brython conversions more extensibles (enable Brython users to declare a conversion for one JS class), with delegation of responsibilities, and should induce an increase in performances (remove a function call and conditions at the cost of a method call).

  1. Replace calls of getattr(), setattr(), etc. by a method call on the wrapper.

Once done, we should be able to control whether we wants to use a wrapper or the original object with substitute prototype, simply by changing the return value of the [BRY_JS2PY]() function for the type we want.

  1. Now we can start replacing some wrappers by the raw JS obj with some prototype substitution. With the possibility to easily revert back (just one line to modify inside [BRY_JS2PY]() function).

In the eventuality a small over cost is induced I'd argue that extensibility and clean code are more important than raw execution speed, and that should reduce also some memory usage. I'd argue that an eventual over cost should be so small (function call vs method call) that it shouldn't matter, other parts of code are doing way more costly operations and can be optimized. I'd also argue that normally we should get performances increase due to the removal of some checks/conditions.

@denis-migdal
Copy link
Contributor Author

I think we really NEED to do that for lists/tuples as seen in #2298. Brython <=> JS conversions are unsafe as they are asymmetrical, as well as a bit complex and chaotic.
Once implemented for one object, doing it for other objects should not add an additional cost. It will also enables users to add their own Python wrappers to JS classes.

Tuple would use Object.freeze(object1); to make the array read-only.
Prototype substitution will be used to provide the correct API interface:

  • JS arrays Object.isFrozen(array) === false would be converted as a list.
  • JS arrays Object.isFrozen(array) === true would be converted as a tuple.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants