Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance: There seems to be a runtime related overhead with using ext-php-rs extensions in PHP #303

Open
bgrande opened this issue Dec 8, 2023 · 11 comments

Comments

@bgrande
Copy link

bgrande commented Dec 8, 2023

First of all: Thank you for the great work! It's very interesting and promising.

While testing a simple extension's runtime (measuring time in php script) and running the script multiple times I noticed a constant overhead of around >400ms (at my system) related to my extension build with ext-php-rs.
The comparison for the overhead is a comparable native PHP function and a pure Rust version of it.

Here are the basics:

Rust ext-php-rs

use ext_php_rs::prelude::*;

#[php_function]
pub fn find_mapping(needle: String, haystack: Vec<String>) -> Option<String> {
    match haystack.iter().position(|elem| elem == &needle) {
        None => None,
        Some(_ind) => Some(needle)
    }
}

#[php_module]
pub fn module(module: ModuleBuilder) -> ModuleBuilder {
    module
}

This got built via cargo build --release.

Rust pure

fn find(needle: String, haystack: &Vec<String>) -> Option<String> {
    match haystack.iter().position(|elem| elem == &needle) {
        None => None,
        Some(_ind) => Some(needle)
    }
}

Also built with cargo build --release.

PHP with ext-php-rs

function find(string $testString, array $randomStrings): string {
  return find_mapping($testString, $randomStrings);
}

using sth. like /usr/bin/php -d extension=./target/release/librs_php_ext2.so test.php

PHP native

function find(string $testString, array $randomStrings): string {
  if ($index = array_search($testString, $randomStrings)) {
    return $testString;
  }
  return '';
}

$randomStrings looks like this:

$randomStrings = [];
for ($i = 0; $i < 9999999; $i++) {
    $randomStrings[] = hash('sha1', random_bytes(5));
}

Now, measuring the times with $start = microtime(true); and $end = microtime(true) in the script gives me similar results (depending on the values' position and my machine, of course) like this (for the same string):

time taken with Rust: '413.46788406372ms'
time taken with PHP internal: '8.3808898925781ms'

While Rust pure is quite what you'd expect:

Time elapsed in find() is: 4.85131ms

Are there any known issues or did anybody have a similar experience?

@bartvanhoutte
Copy link

Having similar experience, it seems like converting data from PHP to rust and vice versa is the culprit?

@bgrande
Copy link
Author

bgrande commented Dec 18, 2023

@bartvanhoutte I'm not entirely sure. I haven't had time to dig through the whole library code so far.
But yes, either it's related to data conversion or some C binding mapping.

@danog
Copy link
Collaborator

danog commented Dec 18, 2023

Yeah data conversion is not the direct culprit here, as no conversion is being done here, most of the library just wraps PHP's zend datastructures.
I've encountered similar performance issues and I'm planning to debug them once I have some more time...

@bgrande
Copy link
Author

bgrande commented Dec 18, 2023

Thank you @danog!

I just tried running the performance test with gradually increasing the size of the array with random data.
The smaller the array the closer Rust's and PHP native's performance are. The bigger the array, the longer the Rust extension takes. Compared to PHP native, of course.

Therefore, my conclusion: It might be related to the memory usage somehow.

@bartvanhoutte
Copy link

Thanks @danog.

@bgrande I've noticed similar behavior when writing a function and increasing the amount of arguments passed on to the function.

@bgrande
Copy link
Author

bgrande commented Dec 20, 2023

@bartvanhoutte that's interesting. Might also be memory related.

@ju1ius
Copy link
Contributor

ju1ius commented Jan 10, 2024

@bgrande Conversion between PHP and Rust types is not free. The structures have to be allocated and the bytes copied. The conversion from ZendHashTable to Vec<String> is likely to be the bottleneck here.

Try accepting Zend types as input instead and you should get comparable performance (untested but you get the idea):

#[php_function]
pub fn find_mapping(needle: &ZendStr, haystack: &ZendHashTable) -> Option<&ZendStr> {
    todo!()
}

@bgrande
Copy link
Author

bgrande commented Jan 15, 2024

@ju1ius Thank you a lot. That makes a lot of sense. I will try this and report back with the results here.

@bgrande
Copy link
Author

bgrande commented Mar 4, 2024

Ok, finally an update here:
I tested this with some minimal versions and the culprit definitely is the haystack parameter. Which makes sense because this is the data intense part with a huge array of strings.

However, I tested @ju1ius suggestions and saw some real improvements here:

#[php_function]
pub fn find_mapping(needle: &Zval, haystack: &ZendHashTable) -> Option<String> {
    match haystack.iter().position(|elem| elem.1.str() == needle.str()) {
        None => None,
        Some(_ind) => needle.string()
    }
}

With that I'm down to avg for a bit more than double the native PHP version. Which is way less than before but still a relatively huge difference.

It might work even better with a Zval reference return value but that didn't work for me due to having to define lifetime specifiers. Which the php_function macro doesn't seem to support right now.

@ju1ius
Copy link
Contributor

ju1ius commented Mar 4, 2024

It might work even better with a Zval reference return value but that didn't work for me due to having to define lifetime specifiers. Which the php_function macro doesn't seem to support right now.

Returning references from Rust functions «exported» to PHP cannot be supported. The Zend Engine is a C API and thus cannot «understand» Rust's generics and lifetimes.

Even if it were supported, it would be the wrong thing to do in your case. Whenever you return a reference-counted value from a PHP function, you must increment it's reference count. If your function were implemented in C, it would return a copy of the zval needle after incrementing the reference count of the underlying zend_string, using the RETURN_COPY(needle) macro.

Also, remember that Zend Strings are just bytes without an encoding. They're conceptually equivalent to a Box<[u8]> for persistent strings, or to an Rc<[u8]> for non-persistent strings. So going back-and-forth between ZendString and String requires not only copying the bytes, but also potentially performing UTF-8 validation.

So an even more performant implementation should look like this:

#[php_function]
pub fn find_mapping(needle: &Zval, haystack: &ZendHashTable) -> Option<ZVal> {
    let Some(needle) = needle.zend_str() else {
        return None; // should rather be a type error, but oh well...
    };
    haystack.iter().find_map(|(_, value)| value.zend_str()
        .filter(|zs| zs == needle)
        .map(|_| value.shallow_clone())
    )
}

@bgrande
Copy link
Author

bgrande commented Mar 5, 2024

Thx @ju1ius! I haven't thought about the internals much but it makes sense, of course.

Your solution is a bit faster, indeed. It went down from a bit more than double to almost double the PHP native functions.

I guess, for tasks like this it's probably the best we can do.

Apart from my example being a bit constructed we can at least derive a conclusion here, I think:
When your task does include a lot of data to be (type) converted, especially for strings, performance won't be as good as you might initially expect. As for everything else using ext-php-rs is a great solution.

Therefore, this isn't really a bug so closing it would be ok for everyone?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants