Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extremely Poor Performance Parsing PHP #1873

Open
wspeirs opened this issue May 16, 2017 · 9 comments
Open

Extremely Poor Performance Parsing PHP #1873

wspeirs opened this issue May 16, 2017 · 9 comments

Comments

@wspeirs
Copy link

wspeirs commented May 16, 2017

I built the lexer and parser from the grammars repo for PHP and I get extremely poor performance (~3s to lex, and 24s to parse) with a simple 80 line PHP program (taken from the Wordpress repo). A simple example of this performance issue can be found here: https://github.com/wspeirs/php_linter

I read the previous posts (#944, #1243) about performance issues and have:

  • Set the error handler to BailErrorStrategy
  • Set the prediction mode to PredictionMode.SLL

I'm using Python 3.5 on Linux, and ANTLR4.7:

Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609] on linux
ANTLR Parser Generator  Version 4.7

I'm hoping there is just a "make go faster" flag I'm missing somewhere, and I don't have to re-work the entire grammar (especially seeing as how I didn't make it) or that I just found a nasty bug in ANTLR4.

Thanks!

@KvanTTT
Copy link
Member

KvanTTT commented May 16, 2017

Could you add a file or link of your simple 80-line program? I'll take a look. C# version of grammar works fairly fast.

@wspeirs
Copy link
Author

wspeirs commented May 16, 2017

It is in the main.py file as a string in the repo I linked: https://github.com/wspeirs/php_linter/blob/master/main.py

@wspeirs
Copy link
Author

wspeirs commented May 16, 2017

<?php
ignore_user_abort(true);
if ( !empty($_POST) || defined('DOING_AJAX') || defined('DOING_CRON') )
	die();
define('DOING_CRON', true);
if ( !defined('ABSPATH') ) {
	/** Set up WordPress environment */
	require_once( dirname( __FILE__ ) . '/wp-load.php' );
}
function _get_cron_lock() {
	global $wpdb;
	$value = 0;
	if ( wp_using_ext_object_cache() ) {
		$value = wp_cache_get( 'doing_cron', 'transient', true );
	} else {
		$row = $wpdb->get_row( $wpdb->prepare( "SELECT option_value FROM $wpdb->options WHERE option_name = %s LIMIT 1", '_transient_doing_cron' ) );
		if ( is_object( $row ) )
			$value = $row->option_value;
	}
	return $value;
}
if ( false === $crons = _get_cron_array() )
	die();
$keys = array_keys( $crons );
$gmt_time = microtime( true );
if ( isset($keys[0]) && $keys[0] > $gmt_time )
	die();
$doing_cron_transient = get_transient( 'doing_cron' );
if ( empty( $doing_wp_cron ) ) {
	if ( empty( $_GET[ 'doing_wp_cron' ] ) ) {
		// Called from external script/job. Try setting a lock.
		if ( $doing_cron_transient && ( $doing_cron_transient + WP_CRON_LOCK_TIMEOUT > $gmt_time ) )
			return;
		$doing_cron_transient = $doing_wp_cron = sprintf( '%.22F', microtime( true ) );
		set_transient( 'doing_cron', $doing_wp_cron );
	} else {
		$doing_wp_cron = $_GET[ 'doing_wp_cron' ];
	}
}
if ( $doing_cron_transient != $doing_wp_cron )
	return;
foreach ( $crons as $timestamp => $cronhooks ) {
	if ( $timestamp > $gmt_time )
		break;
	foreach ( $cronhooks as $hook => $keys ) {
		foreach ( $keys as $k => $v ) {
			$schedule = $v['schedule'];
			if ( $schedule != false ) {
				$new_args = array($timestamp, $schedule, $hook, $v['args']);
				call_user_func_array('wp_reschedule_event', $new_args);
			}
			wp_unschedule_event( $timestamp, $hook, $v['args'] );
 			do_action_ref_array( $hook, $v['args'] );
			if ( _get_cron_lock() != $doing_wp_cron )
				return;
		}
	}
}
if ( _get_cron_lock() == $doing_wp_cron )
	delete_transient( 'doing_cron' );
die();

@KvanTTT
Copy link
Member

KvanTTT commented May 16, 2017

It parsed less than a 0.5 sec on C# SHarwerll runtime. Unfortunately, I'm not working with Python runtime and can not help you. Maybe it's related to bugs and defects in Python runtime.

@ericvergnaud
Copy link
Contributor

Hi,
Python is 30x slower than Java/C#.
If you need performance simply don't use Python.
Eric

@wspeirs
Copy link
Author

wspeirs commented May 16, 2017

@ericvergnaud is there something inherently different between the way Python runtime works and the C# runtime? I struggle to believe that everything is working as expected when it takes 0.5s with C# and 24s with Python. What inside the runtime would cause it to be so slow in Python, but not in C#?

@ericvergnaud
Copy link
Contributor

The tests demonstrate that everything works as expected.
That said, the 30x slower is an average i.e. the performance ratio varies from one area to another.
Antlr relies enormously on maps, and it seems the JVM and the CLR are very good at optimizing these.

@wspeirs
Copy link
Author

wspeirs commented May 16, 2017

I started digging into the profile a bit:

ncalls   tottime    percall    cumtime    percall    filename:lineno(function)
1254773  2.98       0          19.84      0.007      ParserATNSimulator.py:1135(closure_)
5696573  1.509      0          2.349      0          {built-in method builtins.hash}
1535239  1.174      0          1.174      0          ATNConfig.py:25(__init__)
1747037  1.149      0          1.516      0          PredictionContext.py:134(__eq__)
295265   1.105      0          5.102      0          PredictionContext.py:444(mergeArrays)
1595582  1.062      0          4.481      0          ParserATNSimulator.py:1365(getEpsilonTarget)
1270322  1.006      0          19.843     0.007      ParserATNSimulator.py:1093(closureCheckingStopState)

builtins.hash is clearly called a lot, but only accounts for 1.5s. Looking at cum time, it seems that something to do with the closure in ParserANTSimulator.py is taking up all the time:

ncalls   tottime    percall    cumtime    percall    filename:lineno(function)
97       0.002      0          24.52      24.52      {built-in method builtins.exec}
1544     0.206      0          21.744      0.014     ParserATNSimulator.py:290(adaptivePredict)
1544     0.011      0          20.476      0.013     ParserATNSimulator.py:382(execATN)
1126     0.013      0          20.448      0.018     ParserATNSimulator.py:498(computeTargetState)
2936     0.002      0          19.845      0.007     ParserATNSimulator.py:1087(closure)
1270322  1.006      0          19.843      0.007     ParserATNSimulator.py:1093(closureCheckingStopState)
1254773  2.98       0          19.84       0.007     ParserATNSimulator.py:1135(closure_)
1126     0.08       0          19.059      0.017     ParserATNSimulator.py:659(computeReachSet)
56       0.001      0          13.494      0.75      PHPParser.py:3930(nonEmptyStatement)
462817   0.78       0          11.063      0         ATNConfigSet.py:71(add)

This spark any ideas with anyone?

@ericvergnaud
Copy link
Contributor

You will see a very similar profile in Java and C# i.e. most of the time is spent in ParserATNSimulator.
If you look at the last line, you'll see that 11s are spent in ATNConfigSet.add, which is in line with what I wrote earlier (should have wrote maps and sets).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants