Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize inserting keywords into tl_search_index #1277

Closed
leofeyer opened this issue Dec 31, 2017 · 12 comments
Closed

Optimize inserting keywords into tl_search_index #1277

leofeyer opened this issue Dec 31, 2017 · 12 comments
Assignees
Labels
Milestone

Comments

@leofeyer
Copy link
Member

Using a combined insert query to add the keywords to tl_search_index will greatly improve the performance.

/** @var Connection $objConnection */
$objConnection = \System::getContainer()->get('database_connection');

// Remove existing index
$objDatabase->prepare("DELETE FROM tl_search_index WHERE pid=?")
			->execute($intInsertId);

$arrValues = array();

foreach ($arrIndex as $k=>$v)
{
	$arrValues[] = sprintf('(%s, %s, %s, %s)', (int) $intInsertId, $objConnection->quote($k), $objConnection->quote($v), $objConnection->quote($arrData['language']));
}

// Create the new index
$objDatabase->query("INSERT INTO tl_search_index (pid, word, relevance, language) VALUES " . implode(', ', $arrValues));

I was able to rebuild the search index on contao.org in about 2 minutes compared to over 10 minutes before.

@contao/developers Shall we change this in Contao 4.4 as well?

@leofeyer leofeyer added the bug label Dec 31, 2017
@leofeyer leofeyer added this to the 4.5.1 milestone Dec 31, 2017
@aschempp
Copy link
Member

Are you sure about the database quoting? Also, why mix the two connections and not just use Doctrine?

@aschempp
Copy link
Member

On a similar note, this can be done just with the legacy class.

// Remove existing index
$objDatabase->prepare("DELETE FROM tl_search_index WHERE pid=?")
			->execute($intInsertId);

$arrQuery = array();
$arrValues = array();

foreach ($arrIndex as $k=>$v)
{
        $arrQuery[] = '(%s, %s, %s, %s)';
	$arrValues[] = (int) $intInsertId;
	$arrValues[] = $k;
	$arrValues[] = $v;
	$arrValues[] = $arrData['language'];
}

// Create the new index
$objDatabase->prepare("INSERT INTO tl_search_index (pid, word, relevance, language) VALUES " . implode(', ', $arrQuery))->execute($arrValues);

@leofeyer
Copy link
Member Author

I am sure that we should quote the values if that is what you mean. 😄

We might as well use only Doctrine of course. Or add a quote() method to our Database class. Those implementation details need to be discussed.

The code was just a quick fix for contao.org.

@leofeyer
Copy link
Member Author

I don't like your second solution. It works for sure, but we are building two possibly giant arrays that consume a lot of memory for nothing.

@aschempp
Copy link
Member

aschempp commented Dec 31, 2017

Really? Counting the characters, I think ('foo', 'bar') is exactly the same amount as (%s, %s) and foo, bar? 😁

@leofeyer
Copy link
Member Author

Except your approach requires a lot of vsprintf() to build the query string.

@aschempp
Copy link
Member

It requires one, instead of multiple sprintf?

@leofeyer
Copy link
Member Author

Sorry, I meant it requires a giant vsprintf() instead of multiple small sprintf(). No idea if there is any performance difference though.

@m-vo
Copy link
Member

m-vo commented Jan 1, 2018

If the sprintf() / vsprintf() performance is the bottleneck then plain old string concatenation would probably be the way to go. :-)

(I'd be more concerned about possible injections. )

@leofeyer
Copy link
Member Author

leofeyer commented Jan 2, 2018

It seems that one giant vsprintf() is more performant: https://3v4l.org/Up3jQ

@leofeyer leofeyer modified the milestones: 4.5.1, 4.4.12 Jan 2, 2018
@leofeyer leofeyer changed the title Optimize inserting keywords in tl_search_index Optimize inserting keywords into tl_search_index Jan 2, 2018
@leofeyer
Copy link
Member Author

leofeyer commented Jan 2, 2018

Implemented in 6f8a020.

@leofeyer leofeyer closed this as completed Jan 2, 2018
@m-vo
Copy link
Member

m-vo commented Jan 2, 2018

First of all: I'm fine with the changes as this really makes things faster. :D

And I really don't want to nitpick but rather raise awareness that those tests should probably better be profiled inside the application. If you for instance swap the execution order of the two tested versions (https://3v4l.org/lvnqS) the already small differences (overall same magnittude) gets even smaller showing that these tests are in no way free of side effects.

@leofeyer leofeyer self-assigned this Jan 2, 2018
@leofeyer leofeyer modified the milestones: 4.4.12, 4.4 May 14, 2019
leofeyer pushed a commit that referenced this issue Feb 11, 2020
Description
-----------

See #1277

Commits
-------

c999f8e5 Roll back route name
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants