
@sejas (Member) commented Oct 8, 2025

Fixes:

As suggested in WordPress/sqlite-database-integration#263 (comment), we'll use the AST parser to split the queries in a SQL dump file. The only downside is that the whole dump is loaded into memory.

Testing Instructions

src/Import.php Outdated

fclose( $handle );
protected function remove_comments( $text ) {
return preg_replace( '/\/\*.*?\*\/(;)?/s', '', $text );
Member Author

I had to remove all the comments with a regex because the AST parser was identifying them as queries to execute, which caused the import to fail. For example: Error: SQLite import could not execute statement: SET @saved_cs_client = @@character_set_client */;

Contributor

@sejas Ah, this is because /*!<number> ... */ comments are special MySQL comments whose contents are executed conditionally, based on the server version. So /*!40101 SET character_set_client = @saved_cs_client */; means "execute this on all versions >= 4.1.1". In other words, executing the query inside the comment is the correct behavior.

But somehow the execution fails, so let's go with a quick fix for now. Regexes are tricky here because they can match arbitrary strings inside the dump data. What about catching the error instead and skipping the statement if the message starts with "SQLite import could not execute statement: SET @"? I would then investigate the root cause of the failure.
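For reference, the version gate in the /*!<number> ... */ comments described above can be decoded mechanically. The following is a minimal sketch, assuming the five-digit Mmmrr version format (major, two-digit minor, two-digit release) described in the MySQL manual. The helpers decode_mysql_version() and unwrap_versioned_comment() are hypothetical and are not part of the plugin or WP-CLI.

```php
<?php
// Hypothetical helpers, not part of the plugin or WP-CLI.
// Assumes the five-digit Mmmrr version format: major, two-digit minor,
// two-digit release (40101 => 4.1.1).

function decode_mysql_version( int $id ): string {
	$major   = intdiv( $id, 10000 );
	$minor   = intdiv( $id % 10000, 100 );
	$release = $id % 100;
	return "$major.$minor.$release";
}

// Unwraps a versioned comment such as "/*!40101 SET ... */;".
// Returns the inner statement if a server with $server_id (same Mmmrr
// format) would run it, '' if the version gate excludes that server,
// or null if $sql is not a versioned comment at all.
function unwrap_versioned_comment( string $sql, int $server_id ): ?string {
	if ( ! preg_match( '~^/\*!(\d{5})?\s*(.*?)\s*\*/\s*;?\s*$~s', $sql, $m ) ) {
		return null;
	}
	// A bare "/*! ... */" without a number runs on every MySQL version.
	if ( '' !== $m[1] && (int) $m[1] > $server_id ) {
		return '';
	}
	return $m[2];
}
```

For example, unwrap_versioned_comment( '/*!40101 SET character_set_client = @saved_cs_client */;', 80036 ) yields the bare SET statement, because 8.0.36 is at least 4.1.1.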

Contributor

One thing I just noticed is that the failing statement doesn't include the leading /* for some reason 🤔. Running just $this->assertQuery( '/*!40101 SET character_set_client = @saved_cs_client */;' ); passes. Anyway, we can go with a hotfix for now.

Member Author

Yeah, I noticed that too. The leading /* is present in the file, but it's missing from the statement returned by the AST parser.

Member Author

I removed the regex and used a basic check: ca2f090#diff-aea1542aa0e46981e70c6bfb53a15a583242580d74dfb5df25dbfe71d098757bR159-R162
It checks that the failed query starts with SET and contains */.

Contributor

Nice solution! This way it will target exactly these "wrongly parsed" comments.

$this->driver->query( $statement );
} catch ( Exception $e ) {
// Skip errors when executing SET comment statements
if ( 0 === strpos( $statement, 'SET ' ) && false !== strpos( $statement, '*/' ) ) {
Member Author

I can't use str_starts_with() or str_contains() because they were introduced in PHP 8.0, and we still support PHP 7.4.
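Here is a standalone version of that check, for illustration only. The name is_wrongly_parsed_set_comment() is hypothetical. The sketch swaps 0 === strpos( $statement, 'SET ' ) for strncmp(), which is also available on PHP 7.4.

```php
<?php
// Hypothetical standalone version of the check in this PR: a failed statement
// is treated as the tail of a mis-split versioned comment when it starts with
// "SET " and contains "*/". Works on PHP 7.4, which lacks str_starts_with()
// and str_contains().
function is_wrongly_parsed_set_comment( string $statement ): bool {
	// strncmp() compares only the first 4 bytes; strpos() would keep scanning
	// a (possibly multi-megabyte) statement that doesn't start with "SET ".
	return 0 === strncmp( $statement, 'SET ', 4 )
		&& false !== strpos( $statement, '*/' );
}
```

The behavior matches the strpos() version; the difference only matters on the error path for very large INSERT statements.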

Comment on lines 158 to 162
// Skip errors when executing SET comment statements
if ( 0 === strpos( $statement, 'SET ' ) && false !== strpos( $statement, '*/' ) ) {
WP_CLI::warning( 'SQLite import SET comment statement: ' . $statement );
continue;
}
@sejas (Member Author), Oct 8, 2025

@sejas requested a review from a team on October 8, 2025 12:02
@JanJakes (Contributor) left a comment

Thanks for working on this!

@wojtekn (Contributor) commented Oct 8, 2025

@sejas I couldn't make it work for either query:

[Screenshots: 2025-10-08 at 14:26:31 and 14:27:22]

@sejas (Member Author) commented Oct 8, 2025

@wojtekn, thanks for testing it.
The second error occurred because I provided only the INSERT query, without the CREATE TABLE statement. With the full SQL dump, it runs correctly.

As for the first error, it seems WordPress/sqlite-database-integration#250 also involved an encoding problem. I fixed it in d897a9e.

@sejas (Member Author) commented Oct 8, 2025

I've now found an error when importing a whole site with 185MB+ of data across multiple tables: it seems to exhaust the Wasm/Node memory (the limit in the log below, 268435456 bytes, is 256 MB).

Success: Imported from 'studio-backup-sql-2025-10-08-15-27-38.sql'.
Imported /var/folders/_x/rbv26n3925q_01dbs4jzd8t80000gn/T/studio_backupECIZ4S/sql/wp_actionscheduler_logs.sql in 0.729s
Importing /var/folders/_x/rbv26n3925q_01dbs4jzd8t80000gn/T/studio_backupECIZ4S/sql/wp_analytics_wp_events.sql...
Error during import of /var/folders/_x/rbv26n3925q_01dbs4jzd8t80000gn/T/studio_backupECIZ4S/sql/wp_analytics_wp_events.sql: PHP.run() failed with exit code 255. 

=== Stdout ===
 #!/usr/bin/env php
Warning - SQLite import skipped SET comment statement: SET @saved_cs_client     = @@character_set_client */;
Warning - SQLite import skipped SET comment statement: SET character_set_client = utf8mb4 */;
Warning - SQLite import skipped SET comment statement: SET character_set_client = @saved_cs_client */;
<br />
<b>Fatal error</b>:  Allowed memory size of 268435456 bytes exhausted (tried to allocate 4096 bytes) in <b>/wordpress/wp-content/mu-plugins/sqlite-database-integration/wp-includes/sqlite-ast/class-wp-sqlite-driver.php</b> on line <b>3176</b><br />


=== Stderr ===
 PHP Fatal error:  Allowed memory size of 268435456 bytes exhausted (tried to allocate 4096 bytes) in /wordpress/wp-content/mu-plugins/sqlite-database-integration/wp-includes/sqlite-ast/class-wp-sqlite-driver.php on line 3176

Would have bumped stat: studio-app-import=failure
Sentry Logger [log]: Captured error event `Database import failed: PHP.run() failed with exit code 255. 

@JanJakes (Contributor) commented Oct 8, 2025

@sejas I wonder if there's a memory leak of some sort, or if there's a single query whose AST doesn't fit into memory twice. Does either of these help?

  1. Call unset( $ast ); just after $statement = substr( ... );.
  2. Call gc_collect_cycles() at the end of each loop, or just after $statement = substr( ... );.

@JanJakes (Contributor) commented Oct 8, 2025

@sejas Oh, now I see the new code uses file_get_contents(), unlike the old code, so the entire dump is loaded into memory. Doing this correctly means supporting streaming all the way through. A quick fix with a basic approach could be something like:

  1. Read up to, e.g., 10MB of content.
  2. Parse and execute all the queries.
  3. When encountering a parse error, read a new 10MB chunk starting at the beginning of the failing query.
  4. If it fails again, it's a real error. If it passes, the input was just incomplete (the query was cut off at the chunk boundary).

(This would cap a single query at 10MB, or whatever limit we choose.)
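Below is a rough sketch of the four steps above. It assumes a seekable stream and a caller-supplied $execute callback. The names import_in_chunks() and split_statements() are hypothetical. split_statements() is a naive stand-in for the AST parser: it only tracks quotes and backslash escapes and does not understand comments. The 10MB default is just the example value from the comment.

```php
<?php
// Hypothetical names throughout. split_statements() is a naive stand-in for
// the real parser: it tracks quotes and backslash escapes, not comments.
// Returns [complete statements, byte offset where the unfinished tail begins].
function split_statements( string $sql ): array {
	$statements = [];
	$start      = 0;
	$quote      = null;
	$length     = strlen( $sql );
	for ( $i = 0; $i < $length; $i++ ) {
		$c = $sql[ $i ];
		if ( null !== $quote ) {
			if ( '\\' === $c ) {
				$i++; // Skip the escaped character.
			} elseif ( $c === $quote ) {
				$quote = null;
			}
		} elseif ( "'" === $c || '"' === $c || '`' === $c ) {
			$quote = $c;
		} elseif ( ';' === $c ) {
			$statement = trim( substr( $sql, $start, $i - $start ) );
			if ( '' !== $statement ) {
				$statements[] = $statement;
			}
			$start = $i + 1;
		}
	}
	return [ $statements, $start ];
}

// Executes every statement in $handle, reading at most $limit bytes at a time.
function import_in_chunks( $handle, callable $execute, int $limit = 10 * 1024 * 1024 ): void {
	$offset = 0; // File offset of the first statement not executed yet.
	while ( true ) {
		// Steps 1 and 3: read up to $limit bytes starting at $offset.
		$chunk = stream_get_contents( $handle, $limit, $offset );
		if ( false === $chunk || '' === $chunk ) {
			return;
		}
		// Peek one byte to tell "chunk is full" apart from "input ended".
		$at_eof = strlen( $chunk ) < $limit || '' === (string) fread( $handle, 1 );
		list( $statements, $consumed ) = split_statements( $chunk );

		if ( $at_eof ) {
			// The final statement may lack a trailing semicolon.
			$tail = trim( substr( $chunk, $consumed ) );
			if ( '' !== $tail ) {
				$statements[] = $tail;
			}
		} elseif ( 0 === $consumed ) {
			// Step 4: a fresh chunk starting at this statement still can't hold it.
			throw new RuntimeException( "Statement at byte $offset exceeds $limit bytes." );
		}
		// Step 2: execute everything that is complete.
		foreach ( $statements as $statement ) {
			$execute( $statement );
		}
		if ( $at_eof ) {
			return;
		}
		// The next chunk starts at the beginning of the unfinished statement.
		$offset += $consumed;
	}
}
```

Instead of waiting for a parse error, this sketch treats "no complete statement in a fresh chunk" as the failure from step 4. A statement longer than $limit is then reported as an error, not silently truncated.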

@JanJakes (Contributor) commented Oct 8, 2025

@sejas One last question: wasn't the encoding perhaps the original issue? The original query-parsing code seems quite solid, tracking both escapes and quotes, and it seems to process the queries from WordPress/sqlite-database-integration#250 correctly.

What if we only add the encoding fix and keep the old query parsing code? Would that work?

@sejas (Member Author) commented Oct 9, 2025

We have two different issues:

  • 250 is an encoding error
  • 263 is a parse error

We can create two separate PRs for those 👍.

As discussed on Slack, let's keep the WP-CLI parser because adding support for file streams in the AST parser seems more complex.

@sejas closed this on Oct 9, 2025
@adamziel commented Oct 9, 2025

Do we need the new parser for this? It seems like we want to iterate over queries from potentially large files; maybe the tokenizer alone would suffice? We'd just split at query boundaries and presto?

@JanJakes (Contributor) commented Oct 9, 2025

"maybe the tokenizer alone would suffice"

@adamziel Yes! That's a great point. We still need to make the tokenizer support streaming, but it's likely much easier than with the parser.
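The tokenizer-only idea can be illustrated as follows: splitting needs only lexical state (quotes, escapes, comments), not a full AST. The sketch below is a hypothetical stream_statements() generator and does not use the plugin's actual tokenizer API. It reads the dump in small pieces and carries state across reads, so memory is bounded by the largest statement rather than the whole file. It assumes MySQL's lexical rules for "-- " (followed by whitespace), "#" and /* */ comments. It keeps the leading /* of versioned comments, which the AST-based split discussed above lost.

```php
<?php
// Hypothetical sketch, not the plugin's tokenizer API: splits a MySQL dump
// into statements using lexical state only (quotes, escapes, comments).
// Input is read in $read_size pieces, so memory is bounded by the largest
// statement. Leading plain comments are dropped; versioned comments such as
// "/*!40101 SET ... */" are kept whole, including the leading "/*".
function stream_statements( $handle, int $read_size = 8192 ): Generator {
	$buffer   = '';    // Current, not yet terminated statement.
	$i        = 0;     // Scan position inside $buffer.
	$state    = null;  // null, a quote character, 'line' or 'block'.
	$has_code = false; // Whether $buffer holds more than whitespace/comments.
	$eof      = false;

	while ( ! $eof ) {
		$data    = fread( $handle, $read_size );
		$eof     = ( false === $data || '' === $data );
		$buffer .= (string) $data;

		// Until EOF, leave 2 bytes unscanned so lookahead never crosses a read.
		while ( $i < ( $eof ? strlen( $buffer ) : strlen( $buffer ) - 2 ) ) {
			$c  = $buffer[ $i ];
			$c1 = $buffer[ $i + 1 ] ?? '';
			$c2 = $buffer[ $i + 2 ] ?? '';

			if ( "'" === $state || '"' === $state || '`' === $state ) {
				if ( '\\' === $c && '`' !== $state ) {
					$i++; // Skip the escaped character.
				} elseif ( $c === $state ) {
					$state = null;
				}
			} elseif ( 'line' === $state ) {
				if ( "\n" === $c ) {
					$state = null;
				}
			} elseif ( 'block' === $state ) {
				if ( '*' === $c && '/' === $c1 ) {
					$state = null;
					$i++;
				}
			} elseif ( ';' === $c ) {
				if ( $has_code ) {
					yield trim( substr( $buffer, 0, $i ) );
				}
				$buffer   = (string) substr( $buffer, $i + 1 );
				$i        = -1; // Incremented to 0 below.
				$has_code = false;
			} elseif ( '#' === $c || ( '-' === $c && '-' === $c1 && ctype_space( $c2 ) ) ) {
				$state = 'line';
			} elseif ( '/' === $c && '*' === $c1 && '!' !== $c2 ) {
				$state = 'block'; // Plain comment: skipped over.
				$i++;
			} elseif ( ! ctype_space( $c ) ) {
				if ( ! $has_code ) {
					$buffer   = substr( $buffer, $i ); // Drop leading whitespace/comments.
					$i        = 0;
					$has_code = true;
				}
				if ( "'" === $c || '"' === $c || '`' === $c ) {
					$state = $c;
				} elseif ( '/' === $c && '*' === $c1 ) {
					$state = 'block'; // "/*!" versioned comment: kept as code.
					$i++;
				}
			}
			$i++;
		}
	}
	if ( $has_code ) {
		yield trim( $buffer ); // Last statement had no trailing semicolon.
	}
}
```

Because versioned comments come out whole, e.g. /*!40101 SET ... */, they take the form that passed in the assertQuery() test mentioned earlier in this thread.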

@JanJakes mentioned this pull request on Oct 10, 2025
Labels: None yet
Projects: None yet
4 participants