Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retroactive transliteration of file names broken with MySQL 8.0.22+ #4931

Open
jlfranklin opened this issue Feb 3, 2021 · 39 comments
Open

Comments

@jlfranklin
Copy link
Member

Description of the bug

One test fails under Ubuntu 20.04 that does not fail under Debian 9 (php 7.0, MariaDB 10.1.47).

It may be a configuration or database issue. Database is using default char set 'utf8', not utf8mb4, and utf8mb4 is not enabled in settings.php.

Steps To Reproduce

To reproduce the behavior:

  1. Install Backdrop in an Ubuntu 20.04 server with MySQL 8.0, PHP 7.4 FPM, and Apache 2.4.41)
  2. Run the full suite of tests.

Actual behavior

---- File API: File uploading transliteration (FileUploadTransliterationTest) ----


Status    Group      Filename          Line Function                            
--------------------------------------------------------------------------------
Exception PDOExcepti system.admin.inc  2679 system_transliteration_retroactive(
    SQLSTATE[HY000]: General error: 3995 Character set 'utf8_general_ci' cannot
    be used in conjunction with 'binary' in call to regexp_like.: SELECT
    COUNT(*) AS expression
    FROM 
    (SELECT 1 AS expression
    FROM 
    {file_managed} file_managed
    WHERE  (uri NOT REGEXP BINARY :db_condition_placeholder_0) ) subquery; Array
    (
        [:db_condition_placeholder_0] => /[a-z0-9_.-]+$
    )
Fail      Other      file.test         1073 FileUploadTransliterationTest->test
    "File transliteration is not yet enabled." found
Exception PDOExcepti system.admin.inc  2679 system_transliteration_retroactive(
    SQLSTATE[HY000]: General error: 3995 Character set 'utf8_general_ci' cannot
    be used in conjunction with 'binary' in call to regexp_like.: SELECT
    COUNT(*) AS expression
    FROM 
    (SELECT 1 AS expression
    FROM 
    {file_managed} file_managed
    WHERE  (uri NOT REGEXP BINARY :db_condition_placeholder_0) ) subquery; Array
    (
        [:db_condition_placeholder_0] => /[a-z0-9_.-]+$
    )
Fail      Other      file.test         1103 FileUploadTransliterationTest->test
    Two files available for transliteration.
Fail      Other      file.test         1104 FileUploadTransliterationTest->test
    Found the Transliterate button
Fail      Other      file.test         1104 FileUploadTransliterationTest->test
    Found the requested form fields at
Fail      Other      file.test         1105 FileUploadTransliterationTest->test
    "2 file names have been successfully transliterated." found
Fail      Other      file.test         1111 FileUploadTransliterationTest->test
    File name transliterated and lower case after bulk updating.

Additional information

Add any other information that could help, such as:

  • Backdrop 1.18.1
  • Apache 2.4.41, PHP 7.4.3
  • Ubuntu 20.04
@indigoxela
Copy link
Member

indigoxela commented Feb 4, 2021

@jlfranklin Many thanks for reporting this issue.

My suspicion is that it has something to do with some stricter query handling in MySQL 8 (and eventually recent MariaDB). I don't think it's related to utf8-general vs. utf8-mb4.

The query looks a bit strange anyway...

SELECT
    COUNT(*) AS expression
    FROM 
    (SELECT 1 AS expression
    FROM 
    {file_managed} file_managed
    WHERE  (uri NOT REGEXP BINARY :db_condition_placeholder_0) )

I mean the NOT REGEXP BINARY part.
UPDATE: which turned out to be totally OK, according to the MySQL docs, at least in select queries, but maybe not anymore in subqueries?

First of all: I suspect, not only the test, but also the functionality is broken. I can't actually test locally, as I don't have MySQL 8 here on my dev.

But you can. Steps to verify:

  • Go to admin/config/media/file-system
  • Keep "Transliterate file names during upload" turned on, but
  • turn off "Lowercase transliterated file names"
  • Create a file with a name like "mYfAnCyFiLe.txt" and upload
  • Back to admin/config/media/file-system
  • Now turn on "Lowercase transliterated file names"
  • Go to admin/config/media/file-system/transliteration (retroactive transliteration)

Is the file with camel-case name listed there? (I doubt it.)

@indigoxela
Copy link
Member

Although I'm currently not able to reproduce (no MySQL 8 here), I created a PR with a slightly different syntax.

@jlfranklin does that patch change anything for your Ubuntu setup?

@indigoxela
Copy link
Member

indigoxela commented Feb 4, 2021

Ah, there it is: https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-22.html

"
Regular expression functions such as REGEXP_LIKE() yielded inconsistent results with binary string arguments. These functions now reject binary strings with an error. (Bug #31031886, Bug #98951, Bug #31031888, Bug #98950)
"

A solution for both, MySQL 5 and 8 will be tricky. The older does not support the case sensitive switch within the regex (?-i), the newer refuses to compare to BINARY.

But maybe CAST or CONVERT could be used to get a case sensitive charset?

@jlfranklin can you try the latest variant from my PR, please?

@klonos
Copy link
Member

klonos commented Feb 4, 2021

I'm currently not able to reproduce (no MySQL 8 here)

I believe I can use MySQL 8 with lando on my local. I'll try this soon as I can get to it.

@jlfranklin
Copy link
Member Author

The PR passes on MySQL 8. (Also passes on PostgreSQL, FWIW. SQLite never passed that test, so I have no expectation of it working with this PR.)

@indigoxela
Copy link
Member

The PR passes on MySQL 8. (Also passes on PostgreSQL...

@jlfranklin good to know. I was a bit concerned regarding PostgreSQL.

I believe I can use MySQL 8 with lando on my local. I'll try this soon as I can get to it.

@klonos cool, many thanks! We still need to verify my suspicion that not only the test, but also the functionality is currently broken with MySQL 8 and gets fixed by the PR.

@klonos
Copy link
Member

klonos commented Feb 5, 2021

OK, I've tested this:

  • provisioned a new lando instance with:
    • vanilla Backdrop 1.x (1.19.x-dev)
    • database: mysql:8.0 (which resulted in 8.0.19)
  • tested using the steps provided by @indigoxela ->

Is the file with camel-case name listed there? (I doubt it.)

Yes it was 😅

image

@klonos
Copy link
Member

klonos commented Feb 5, 2021

...transliteration works w/o any error btw:

image

...and the file does actually get transliterated to myfancyfile.txt

@indigoxela
Copy link
Member

database: mysql:8.0 (which resulted in 8.0.19)

Many thanks for testing. 👍 Unfortunately... according to their changelog the breaking change came with MySQL 8.0.22.

@klonos I suppose, lando doesn't ship with that version yet?

@klonos
Copy link
Member

klonos commented Feb 5, 2021

They say that they do not officially support specifying patch versions, but it is possible: https://docs.lando.dev/config/mysql.html#supported-versions

Gimme a few mins to try that...

@klonos
Copy link
Member

klonos commented Feb 5, 2021

...nope. That didn't work 😅 ...I need to rest, and will return to it later today/tonight.

@indigoxela
Copy link
Member

indigoxela commented Feb 5, 2021

...nope. That didn't work ...I need to rest, and will return to it later today/tonight.

Hey, no need to rush!

@jlfranklin should be able to verify, as he obviously has an affected version.

(The version seems to be 8.0.23-0ubuntu0.20.04.1.)

@jlfranklin
Copy link
Member Author

jlfranklin commented Feb 5, 2021

Is the file with camel-case name listed there? (I doubt it.)

You called it. The list is empty on my Ubuntu 20.04 VM:

For reference:

  • Ubuntu 20.04.2 LTS
  • MySQL version 8.0.23-0ubuntu0.20.04.1.
  • PHP 7.4.3-4ubuntu2.4
  • Apache 2.4.41-4ubuntu3.1
  • /etc/mysql/mysql.conf.d/99-silkscreen.cnf

@indigoxela indigoxela changed the title FileUploadTransliterationTest fails under Ubuntu 20.04 Retroactive transliteration of file names broken with MySQL 8.0.22+ Feb 5, 2021
@jlfranklin
Copy link
Member Author

With the patch from the PR, the list is empty.

Without the patch, the transliteration page returns this:

image

@indigoxela
Copy link
Member

indigoxela commented Feb 5, 2021

@jlfranklin many thanks for verifying! I'll need to get a dev setup with MySQL 8.0.22+. That's not difficult, just a little time consuming - I need to setup an extra VM for that.

Yes, that Exception is the same as in the failing test "...cannot be used in conjunction with "binary" in call to regexp_like...".

@jenlampton jenlampton added this to the 1.20.3 milestone Nov 18, 2021
@quicksketch quicksketch modified the milestones: 1.20.3, 1.20.4 Dec 3, 2021
@herbdool
Copy link

I think the code looks fine. Doesn't look like it's stopping us from doing future abstraction if necessary.

By the way, I was confused about the mention of testing with Postgres and Sqlite. Backdrop officially only supports MySQL (and MariaDB) so it's officially ok if it breaks other dbs.

@klonos
Copy link
Member

klonos commented Jan 1, 2022

I can finally test this with Lando now!...

For anyone else that wants to know how, please see: lando/lando#2948 (comment)

@klonos
Copy link
Member

klonos commented Jan 1, 2022

I've installed a vanilla Backdrop instance, using:

  • latest 1.x
  • php 7.4.26
  • apache 2.4.51
  • mysql 8.0.22

I then tried the steps to reproduce provided by @indigoxela, but when I navigated to admin/config/media/file-system/transliteration (retroactive transliteration) I got the same error as @jlfranklin in #4931 (comment):

PDOException: SQLSTATE[HY000]: General error: 3995 Character set 'utf8mb4_0900_ai_ci'
cannot be used in conjunction with 'binary' in call to regexp_like.:
SELECT COUNT(*) AS `expression` FROM (SELECT 1 AS `expression` FROM {file_managed} file_managed
WHERE (`uri` NOT REGEXP BINARY :db_condition_placeholder_0) ) subquery;
Array ( [:db_condition_placeholder_0] => /[a-z0-9_.-]+$ ) in system_transliteration_retroactive()
(line 2683 of /app/docroot/core/modules/system/system.admin.inc).

I then destroyed the instance, and reinstalled with the changes in the PR. When I tried the steps to reproduce, I did not hit that PDOException, but the list in admin/config/media/file-system/transliteration was empty:

image

I then destroyed the instance once again, and installed again, this time with mysql 8.0.23 + the changes from the PR. This time when visiting admin/config/media/file-system/transliteration the file was listed there. Transliteration worked as expected:

image

[edit] please ignore my previous report re upload errors after deleting and trying to re-upload the transliterated file. This seems to be a pre-existing issue, and mot specific to mysql: #5424.

So to summarize:

  • with mysql 8.0.22 and vanilla Backdrop 1.x -> PDOException 👎🏼
  • with mysql 8.0.22 and the changes from the PR applied -> no PDOException 👍🏼 but empty list of to-be-transliterated files 👎🏼
  • with mysql 8.0.23+ (or mariadb) and the changes from the PR applied -> no PDOException 👍🏼 and the file listed in the to-be-transliterated list 👍🏼

As such, I'm setting this back to "needs work". If the empty list in admin/config/media/file-system/transliteration with mysql v8.0.22 is expected (due to the bug in mysql), then please feel free to set this back to RTBC.

@indigoxela
Copy link
Member

As such, I'm setting this back to "needs work".

I'm a bit confused. @klonos you already realized that the problem you encountered is unrelated to the problem we try to fix here. And still you reject this fix? Mind to explain, why? Or am I missing the point?

Anyway, rebased. I didn't change the lables and leave it to someone else to decide.

@klonos
Copy link
Member

klonos commented Jan 1, 2022

Sorry for the confusion @indigoxela ...I've set it back to "needs work" because of this:

  • with mysql 8.0.22 and the changes from the PR applied -> no PDOException 👍🏼 but empty list of to-be-transliterated files 👎🏼

...but as I explained:

If the empty list in admin/config/media/file-system/transliteration with mysql v8.0.22 is expected (due to the bug in mysql), then please feel free to set this back to RTBC.

@indigoxela
Copy link
Member

indigoxela commented Jan 2, 2022

I tried to switch the GHA workflow to the exact version (8.0.22), but that's not supported. The currently installed 8 version with that action is 8.0.27, so I run the functional tests with that one. (Cool GHA!)
Note that php 5.3 is supposed to completely fail with MySQL 8 - that's nothing to fix or even consider.

The transliteration test passes with version 8.0.27, but there's another test failing, which points to another problem with either the test or the functionality behind it: DatabaseTemporaryQueryTestCase

TBH, currently my motivation to keep on working on this PR is rather low. Even lower if I have to hunt some possible bug in an outdated MySQL version (8.0.22). The effort to install specific mysql versions is rather high for me and personally, I don't have any websites out there running on version 8.

If anyone wants to take over, please do!

I guess, DatabaseTemporaryQueryTestCase needs a new issue - that problem is unrelated here.

@herbdool
Copy link

herbdool commented Jan 2, 2022

@indigoxela do you agree that this issue still needs work or is it ready (and testing results as expected)? I had trouble following some of this.

I agree about making a new issue for anything else.

@klonos
Copy link
Member

klonos commented Jan 2, 2022

TBH, currently my motivation to keep on working on this PR is rather low. Even lower if I have to hunt some possible bug in an outdated MySQL version (8.0.22). The effort to install specific mysql versions is rather high for me and personally, I don't have any websites out there running on version 8.

@indigoxela it is now (relatively) easy to install any patch-level version of mysql v8.0.x with Lando, but I'm assuming that your local tooling/workflow may be different. In any case, if you do get motivated to try that, then:

  1. Try lando init and choose the backdrop recipe.
  2. Edit the resulting .lando.yml file to look like this:
    name: 4931
    # Use existing Backdrop recipe, then customise below.
    recipe: backdrop
    config:
      # Where Backdrop resides, relative to this file.
      webroot: docroot
      database: mysql:8.0.22
      php: '7.4'
      backdrush: 1.x-1.x
      config:
        database: custom-mysql.cnf
  3. Download the default mysql.cnf file for lando/backdrop from https://github.com/lando/cli/blob/main/plugins/lando-recipes/recipes/backdrop/mysql.cnf and save it as custom-mysql.cnf in the same folder as your .lando.yml file (if you choose a different name or path to save it, then edit your .lando.yml file to reflect that).
  4. Edit custom-mysql.cnf, and comment-out the lines starting with query_cache_limit and query_cache_size.
  5. Run lando destroy -y followed by lando start.

You should now have an instance with the version of mysql specified in step 2 above. To test with another version:

  1. Edit the database: mysql:8.0.22 line in your .lando.yml file.
  2. Run lando destroy -y followed by lando start.

@indigoxela
Copy link
Member

@klonos many thanks for the detailed instructions, but I never used Lando, I don't even do local development (on localhost) - I use remote webservers. So still someone else should carry it over the finish line.

do you agree that this issue still needs work

@herbdool I can't actually tell, as I didn't verify the problem with version 8.0.22 (that exact version).

@klonos
Copy link
Member

klonos commented Jan 3, 2022

So still someone else should carry it over the finish line.

Fair enough 👍🏼

...I didn't verify the problem with version 8.0.22 (that exact version).

It was initially mentioned by @jlfranklin back in Feb.:

With the patch from the PR, the list is empty.

And I was able to confirm now.

@indigoxela
Copy link
Member

I think, we should remove the milestone here. The existing PR unsuccessfully waited for review for months, and it's now obsolete anyway.
Should I also close my PR?

@indigoxela indigoxela removed this from the 1.20.4 milestone Jan 3, 2022
@herbdool
Copy link

herbdool commented Jan 3, 2022

Fine by me.

@indigoxela
Copy link
Member

Fine by me.

@herbdool Which of it? 😏 Removing milestone or closing PR, or something else?

@herbdool
Copy link

herbdool commented Jan 3, 2022

Removing milestone and closing PR. Someone can always find the PR later if needed.

@indigoxela
Copy link
Member

By chance I had a VM with PHP 8.1 and MySQL 8.0.31 available and did a quick test run. Still a problem.

See the verbose test result
Tests to be run:
 - File uploading transliteration (FileUploadTransliterationTest)

Test run started:
 Tuesday, 8 November, 2022 - 4:09pm

Test summary
------------

File API: File uploading transliteration (FileUploadTransliterationTest) 54 passes, 6 fails, 2 exceptions, and 17 debug messages. [4.399s]

Test run duration: 4 sec

Detailed test results
---------------------


---- File API: File uploading transliteration (FileUploadTransliterationTest) ----


Status    Group      Filename          Line    Function                            
------------------------------------------------------------------------------------------------------------------------
Exception PDOExcepti system.admin.inc  2690                                                                            
    SQLSTATE[HY000]: General error: 3995 Character set 'utf8mb4_0900_ai_ci' cannot be used in conjunction with 'binary'
    in call to regexp_like.: SELECT COUNT(*) AS `expression`
    FROM 
    (SELECT 1 AS `expression`
    FROM 
    {file_managed} file_managed
    WHERE  (`uri` NOT REGEXP BINARY :db_condition_placeholder_0) ) subquery; Array
    (
        [:db_condition_placeholder_0] => /[a-z0-9_.-]+$
    )
Fail      Other      file.test         1073    testTransliteration()                                                   
    "File transliteration is not yet enabled." found
Exception PDOExcepti system.admin.inc  2690                                                                            
    SQLSTATE[HY000]: General error: 3995 Character set 'utf8mb4_0900_ai_ci' cannot be used in conjunction with 'binary'
    in call to regexp_like.: SELECT COUNT(*) AS `expression`
    FROM 
    (SELECT 1 AS `expression`
    FROM 
    {file_managed} file_managed
    WHERE  (`uri` NOT REGEXP BINARY :db_condition_placeholder_0) ) subquery; Array
    (
        [:db_condition_placeholder_0] => /[a-z0-9_.-]+$
    )
Fail      Other      file.test         1103    testTransliteration()                                                   
    Two files available for transliteration.
Fail      Other      file.test         1104    testTransliteration()                                                   
    Found the Transliterate button
Fail      Other      file.test         1104    testTransliteration()                                                   
    Found the requested form fields at
Fail      Other      file.test         1105    testTransliteration()                                                   
    "2 file names have been successfully transliterated." found
Fail      Other      file.test         1111    testTransliteration()                                                   
    File name transliterated and lower case after bulk updating.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants