-
Notifications
You must be signed in to change notification settings - Fork 1
Remove custom parser in favor of AST parser #13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/Import.php
Outdated
|
||
fclose( $handle ); | ||
protected function remove_comments( $text ) { | ||
return preg_replace( '/\/\*.*?\*\/(;)?/s', '', $text ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to remove all the comments with a regex because the AST parser was identifying them as queries to execute, which caused it to fail. For example Error: SQLite import could not execute statement: SET @saved_cs_client = @@character_set_client */;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sejas Ahh, this is because the /*!<number> ...*/
comments are special MySQL comments that can execute queries conditionally based on the MySQL version. Therefore, /*!40101 SET character_set_client = @saved_cs_client */;
means execute this on all versions >= 4.1.1
. So the part with the query being executed is correct.
But somehow, the execution fails... so let's keep a quick fix. Regexes are tricky because they can match any random string in the dump, etc. What about catching the error instead, and if it starts with SQLite import could not execute statement: SET @
, then we would skip it? I would then check the root cause of the failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I just noticed is that the statement doesn't include the leading /*
for some reason 🤔 Trying just $this->assertQuery( '/*!40101 SET character_set_client = @saved_cs_client */;' );
— and this passes. Anyway, we can have a hotfix for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I noticed that too. The leading /*
exists in the file, but not when using the AST parser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the regex and used a basic check: ca2f090#diff-aea1542aa0e46981e70c6bfb53a15a583242580d74dfb5df25dbfe71d098757bR159-R162
I'm checking that the query that failed starts with SET
and it contains */
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice solution! This way it will target exactly these "wrongly parsed" comments.
$this->driver->query( $statement ); | ||
} catch ( Exception $e ) { | ||
// Skip errors when executing SET comment statements | ||
if ( 0 === strpos( $statement, 'SET ' ) && false !== strpos( $statement, '*/' ) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot use str_starts_with or str_contains because we support PHP 7.4
// Skip errors when executing SET comment statements | ||
if ( 0 === strpos( $statement, 'SET ' ) && false !== strpos( $statement, '*/' ) ) { | ||
WP_CLI::warning( 'SQLite import SET comment statement: ' . $statement ); | ||
continue; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #13 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this!
@sejas I couldn't make it work for either query: ![]() ![]() |
… noise in Studio alert error messages
@wojtekn , thanks for testing it. About the first error, it seems the WordPress/sqlite-database-integration#250 had another error related to the encoding. I fixed it on d897a9e |
Currently I found an error when importing a whole site with 185MB+ in multiple tables and it seems it exhausts the Wasm/Node memory.
|
@sejas I wonder if there's a memory leak of some sorts, or if there's a single query whose AST doesn't fit twice into memory. Does any of these help?
|
@sejas Oh, now I see there is
(It would mean that the limit for a single query is 10MB, or whatever we set it to). |
@sejas One last question—wasn't the encoding perhaps the original issue? The original query-parsing code seems quite solid—tracking both escaping and quotes. And it seems to process the queries from WordPress/sqlite-database-integration#250 correctly. What if we only add the encoding fix and keep the old query parsing code? Would that work? |
Do we need the new parser for this? It seems like we want to iterate over queries from a potentially large files – maybe the tokenizer alone would suffice? We'd just split at the query boundary and presto? |
@adamziel Yes! That's a great point. We still need to make the tokenizer support streaming, but it's likely much easier than with the parser. |
Fixes:
As suggested in WordPress/sqlite-database-integration#263 (comment) , we'll use the AST parser to split the queries from a SQL dump file. The only downside is that we are loading the whole dump into memory.
Testing Instructions
~/Library/Application Support/Studio/server-files/sqlite-command/src/Import.php
with the changes from this PR.