Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable built-ins compile for all boards #8687

Merged
merged 5 commits into from
Dec 5, 2023
Merged

Conversation

bill88t
Copy link

@bill88t bill88t commented Dec 3, 2023

This enables builtins.compile() for all espressif boards.

This is currently in draft as I want to run more tests and see how it affects storage space.

I have already implemented some basic compilation support within ljinux which heavily relies on exec and the results are pretty impressive.
However since I still haven't made a benchmark, I have no numbers to present.
I suggest as you try for yourselves.

I want to enable this on more and more ports, but I plan on doing it in different PRs, one step at a time.
If we end up enabling it on everything, then we can just revert the per-port PRs and enable it globally.

@bill88t
Copy link
Author

bill88t commented Dec 3, 2023

@RetiredWizard pinging for interest.

global crun
try:
    from builtins import compile
    def crun(code: str) -> None:
        exec(compile(code, "", "exec"))

except ImportError:
    def crun(code: str) -> None:
        exec(code)

@bill88t
Copy link
Author

bill88t commented Dec 3, 2023

I fully finished the code to use compile() in ljinux for all programs. If there is compiler functionality on the board it compiles and unloads the raw code (source currently on the dev branch).
I tested on esp32, s3, c3.

image

The speedup is most apparent on the slower boards. c3 and regular esp32 became much more responsive.
Additionally since I no longer need to keep the source code loaded, ram usage is decreased.
It's perfectly stable. Marking as ready for review.

@bill88t bill88t marked this pull request as ready for review December 3, 2023 16:03
@jepler
Copy link
Member

jepler commented Dec 3, 2023

I would never have predicted a performance improvement from this. I can see why if you can cache the result of compile() and use it multiple times it could make a difference.

I made some benchmark code and ran it on the unix port on my desktop computer (amd64)

import time

n = 100_000

def timeit(title, f):
    t0 = time.time()
    for _ in range(n): f()
    t1 = time.time()
    dt = t1 - t0
    print(f"{title:<20}: {dt:8.3f}")

def crun(s):
    return exec(compile(s, "", "exec"))

codestr = "sum(range(100))"
code = compile(codestr, "", "exec")
timeit("exec", lambda: exec(codestr))
timeit("compile", lambda: compile(codestr, "", "exec"))
timeit("crun", lambda: crun(codestr))
timeit("code", lambda: exec(code))

typical results:

exec                :    0.946
compile             :    0.706
crun                :    1.344
code                :    0.383

@RetiredWizard
Copy link

I swapped the exec with exec(compile()) in PyDOS. I haven't run any benchmarks but PyDOS doesn't seem to run much different. I have been testing on an S3 so I'll give the C3/ESP32 chips a try as well but PyDOS actually ran pretty well on those chips.

I don't think I fully understand how the compile command should be used. PyDOS only executes a single python line at a time so I'm not sure I see where the speedup comes from.

@RetiredWizard
Copy link

RetiredWizard commented Dec 3, 2023

Still on the ESP32-S3, I decided to try timing a program:

Running with exec(compile())

(8155376)/>bench 
time
The current time is: 13:01:20
turtle
Turtle time! Lets draw a rainbow benzene
Press Enter to continue... 
time
The current time is: 13:01:53

Running with just exec()

(8160288)/>bench 
time
The current time is: 00:01:33
turtle
Turtle time! Lets draw a rainbow benzene
Press Enter to continue... 
time
The current time is: 00:01:41

The turtle program ran much faster without the compile modification, however if I execute exec(compile('import turtle','','exec')) it matches the faster speed of 8 seconds.

After re-testing both the exec(compile()) and exec() methods ran in 8 seconds.

@bill88t
Copy link
Author

bill88t commented Dec 3, 2023

First of all, applications like ljinux are python scripts for the most part, not shell scripts.
So the kernel does not get back control at all while the app runs.
And for long running apps that have plenty of loops and/or user interactions, like nano for example, where we open it once and spend minutes inside, not having to re-interpret anything is pretty massive.
In the backround, the code is just looping over and over. If the loops are in ASM already, it's instant.

Let's spice up the code example and run it on C3.

import time

n = 1000

def timeit(title, f):
    t0 = time.time()
    for _ in range(n): f()
    t1 = time.time()
    dt = t1 - t0
    print(f"{title:<20}: {dt:8.3f}")
    
codestr = """
for i in range(1000):
    a = sum(range(100)) ^ i
    b = int(a / 3)
"""

timeit("compile", lambda: compile(codestr, "", "exec"))
timeit("code", lambda: exec(code))
timeit("exec", lambda: exec(codestr))

C3 Results:

compile             :    3.000
code                :    0.000
exec                :  149.000

This is because of the interpreter.
Having repetitive logic in bytecode, even if we use it only a few times really makes a difference.

@RetiredWizard
Copy link

After some Discord discussions will @bill88t I was really confused by my earlier benchmark results and re-tested. Upon re-testing the compile option ran in 8 seconds rather than the 33 seconds I got earlier.

I was using web workflow to move some files back and forth and was seeing issues with the web workflow apparently locking up the serial REPL and possibly crashing the WiFi interface. I haven't been able to isolate for a reproducible issue yet but I suspect it was WiFi issues that slowed down the earlier Compile test.

@bill88t bill88t marked this pull request as draft December 4, 2023 02:05
@bill88t
Copy link
Author

bill88t commented Dec 4, 2023

I too made an oopsie with my testing (today it's borked tests day apparently).
I had removed my code setter and since I didn't reload I was using an earlier value on my testing above.

Old code (fixed):

import time

n = 1000

def timeit(title, f):
    t0 = time.time()
    for _ in range(n): f()
    t1 = time.time()
    dt = t1 - t0
    print(f"{title:<20}: {dt:8.3f}")
    

codestr = """
for i in range(1000):
    a = sum(range(100)) ^ i
    b = int(a / 3)
"""

code = compile(codestr, "", "exec")
timeit("compile", lambda: compile(codestr, "", "exec"))
timeit("code", lambda: exec(code))
timeit("exec", lambda: exec(codestr))

Results (S3):

compile             :    1.000
code                :   82.000
exec                :   84.000

Newer tests:

from time import monotonic
n = 2
def timeit(title, f):
    t0 = monotonic()
    for _ in range(n): f()
    dt = monotonic() - t0
    print(f"{title:<10}: {dt}s")

codestr = """
a = 0
for i in range(100):
    b = 0
    for i in range(100000):
        b += i
"""
code = compile(codestr, "", "exec")
timeit("compile", lambda: compile(codestr, "", "exec"))
timeit("code", lambda: exec(code))
timeit("exec", lambda: exec(codestr))

Results (S3):

compile   : 0.00390625s
code      : 101.547s
exec      : 102.008s

I actually have no idea what is going on.
The speedup is apparent in use, but I cannot reproduce it in sample code.

I drafted and undrafted cause I was attempting to enable native emitters and stuff.
But they yielded no benefits whatsoever.

In any case, the function works.
I have nothing more to add or test, it's good to go.

@bill88t bill88t marked this pull request as ready for review December 4, 2023 05:14
Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do this for all ports or none. There's no reason to limit it to one port.

@bill88t bill88t changed the title Enable built-ins compile for all espressif boards Enable built-ins compile for all boards Dec 4, 2023
@bill88t
Copy link
Author

bill88t commented Dec 4, 2023

Briefly tested on nrf and rp2. I can't test the others, since I don't have any.

@bill88t bill88t requested a review from tannewt December 4, 2023 20:36
@RetiredWizard
Copy link

RetiredWizard commented Dec 4, 2023

I ran a couple quick tests on the SAMD51 and mimxrt10xx. I didn't run any benchmarks and didn't notice any speed differences but the build seemed to work fine.

Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok with me!

@tannewt tannewt merged commit 156f417 into adafruit:main Dec 5, 2023
474 checks passed
@bill88t bill88t deleted the espcompile branch December 8, 2023 23:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants