
test1560: skip some URLs if UTF-8 is not supported #17933

Closed
sergiodj wants to merge 1 commit into curl:master from sergiodj:lib1560-fix-locale-non-utf8

Conversation

@sergiodj

@sergiodj sergiodj commented Jul 15, 2025

Debian CI found that lib1560 implements tests that will fail when
UTF-8 isn't supported. We can detect that with nl_langinfo and skip
the specific URLs that fail (i.e., those whose getflags are either
CURLU_PUNYCODE or CURLU_PUNY2IDN).

Co-authored-by: Viktor Szakats

@github-actions github-actions bot added the tests label Jul 15, 2025
@sergiodj
Author

Cc @samueloph @charles2910

@sergiodj sergiodj force-pushed the lib1560-fix-locale-non-utf8 branch from 3d43e7a to 0291d16 Compare July 15, 2025 12:44
@sergiodj
Author

@guilherme-puida

@sergiodj sergiodj force-pushed the lib1560-fix-locale-non-utf8 branch from 0291d16 to 5140200 Compare July 15, 2025 12:55
@vszakats
Member

Where is HAVE_LANGINFO_CODESET defined? I wonder if this'd need some logic for CMake for example.

@sergiodj sergiodj force-pushed the lib1560-fix-locale-non-utf8 branch from 5140200 to 5069d03 Compare July 15, 2025 13:12
@sergiodj
Author

Where is HAVE_LANGINFO_CODESET defined? I wonder if this'd need some logic for CMake for example.

Sorry, you're correct, I made a mistake here. This macro needs to be defined; I was first considering using USE_LIBIDN2 but I don't think that's a good idea because using libidn2 isn't necessarily an indication that UTF-8 will be used.

Let me do some work to define the HAVE_LANGINFO_CODESET macro.

@sergiodj sergiodj marked this pull request as draft July 15, 2025 13:27
@vszakats
Member

If there is a way to detect this via existing macros and perhaps envs, it may simplify things.

@bagder
Member

bagder commented Jul 15, 2025

We probably need checks for nl_langinfo and langinfo.h added, and then add an include of the header if present.

@sergiodj sergiodj force-pushed the lib1560-fix-locale-non-utf8 branch 3 times, most recently from 3de3ad0 to 91d3739 Compare July 15, 2025 15:31
@sergiodj
Author

OK, I force-pushed new changes which implement autotools and cmake checks for langinfo.h, nl_langinfo and CODESET. I'm not a cmake expert so I'd appreciate a review.

@sergiodj sergiodj marked this pull request as ready for review July 15, 2025 15:50
@sergiodj sergiodj requested a review from vszakats July 15, 2025 15:50
@vszakats
Member

vszakats commented Jul 15, 2025

It's many build-time checks, feeling a little excessive to run in every build, just to use it in a single test.

Is there a way to avoid some or all of them? Is there perhaps a runtime check that doesn't involve a call to nl_langinfo()?

Worst case, I'm thinking of reducing it to a single build-time check for nl_langinfo(CODESET), and pre-filling that result (or skipping the check) in CMake for platforms that don't need it or always have it (Windows, macOS).

@testclutch

Analysis of PR #17933 at 91d37396:

Test 870 failed, which has NOT been flaky recently, so there could be a real issue in this PR. Note that this test has failed in 6 different CI jobs (the link just goes to one of them).

Generated by Testclutch

@vszakats
Member

vszakats commented Jul 15, 2025

Would it be possible to detect the lack of UTF-8 support via runtests.pl in Perl?

Then pass the information to lib1560 via an env, or the test command line?

According to the man page, locale charmap returns this information:
https://linux.die.net/man/3/nl_langinfo

@vszakats
Member

Or, would it work to set LC_ALL=en_US.UTF-8 for 1560, as is done already for other tests, for example 1448?

@bagder
Member

bagder commented Jul 15, 2025

Or, would it work to set LC_ALL=en_US.UTF-8 for 1560, as is done already for other tests, for example 1448?

Test 1560 already sets such a thing. I believe this attempt, if I understand things correctly, is for systems without a functional UTF-8 locale, so setting one doesn't make it work.

@vszakats
Member

vszakats commented Jul 15, 2025

Or, would it work to set LC_ALL=en_US.UTF-8 for 1560, as is done already for other tests, for example 1448?

Test 1560 already sets such a thing. I believe this attempt, if I understand things correctly, is for systems without a functional UTF-8 locale, so setting one doesn't make it work.

Oh, right, I was grepping for LC_ALL, while test1560 sets LANG. It's the only test doing that.

I can't test on a system with no UTF-8 support at all. But if the default isn't UTF-8, LC_ALL is able to change it to UTF-8, while LANG isn't. As seen via locale charmap (the command-line equivalent of nl_langinfo(CODESET)):

$ export LC_ALL=C
$ locale charmap
ANSI_X3.4-1968
$ LC_ALL=en_US.UTF-8 locale charmap
UTF-8
$ LANG=en_US.UTF-8 locale charmap
ANSI_X3.4-1968

(For systems with no UTF-8 at all, I wonder how the other tests pass, that require LC_ALL=*UTF8.)

@vszakats
Member

vszakats commented Jul 15, 2025

I could replicate 1560 failing by setting LC_ALL=C before running the tests, on macOS. (on Linux I couldn't)

This patch fixes it:

--- a/tests/data/test1560
+++ b/tests/data/test1560
@@ -13,7 +13,7 @@ urlapi
 none
 </server>
 <setenv>
-LANG=en_US.UTF-8
+LC_ALL=en_US.UTF-8
 </setenv>
 <features>
 file

edit: Also replicates on Linux. (libidn2 wasn't enabled in my initial try)

@bagder
Member

bagder commented Jul 15, 2025

This patch fixes it:

I just suspect it is not the same case as @sergiodj is trying to address here...

@vszakats

This comment was marked as outdated.

@vszakats
Member

runtests provides the codeset-utf8 feature to detect if the env supports UTF-8.

This information could be propagated via an env and used as a filter:

--- a/tests/libtest/lib1560.c
+++ b/tests/libtest/lib1560.c
@@ -1428,7 +1428,7 @@ static int set_url(void)
    2. Set one or more parts
    3. Extract and compare all parts - not the URL
 */
-static int setget_parts(void)
+static int setget_parts(int has_utf8)
 {
   int i;
   int error = 0;
@@ -1440,6 +1440,10 @@ static int setget_parts(void)
       error++;
       break;
     }
+    if((setget_parts_list[i].getflags == CURLU_PUNYCODE ||
+        setget_parts_list[i].getflags == CURLU_PUNY2IDN) && !has_utf8) {
+      continue;
+    }
     if(setget_parts_list[i].in)
       rc = curl_url_set(urlp, CURLUPART_URL, setget_parts_list[i].in,
                         setget_parts_list[i].urlflags);
@@ -1525,7 +1529,7 @@ static int set_parts(void)
   return error;
 }
 
-static int get_url(void)
+static int get_url(int has_utf8)
 {
   int i;
   int error = 0;
@@ -1536,6 +1540,10 @@ static int get_url(void)
       error++;
       break;
     }
+    if((get_url_list[i].getflags == CURLU_PUNYCODE ||
+        get_url_list[i].getflags == CURLU_PUNY2IDN) && !has_utf8) {
+      continue;
+    }
     rc = curl_url_set(urlp, CURLUPART_URL, get_url_list[i].in,
                       get_url_list[i].urlflags);
     if(!rc) {
@@ -1565,7 +1573,7 @@ static int get_url(void)
   return error;
 }
 
-static int get_parts(void)
+static int get_parts(int has_utf8)
 {
   int i;
   int error = 0;
@@ -1576,6 +1584,10 @@ static int get_parts(void)
       error++;
       break;
     }
+    if((get_parts_list[i].getflags == CURLU_PUNYCODE ||
+        get_parts_list[i].getflags == CURLU_PUNY2IDN) && !has_utf8) {
+      continue;
+    }
     rc = curl_url_set(urlp, CURLUPART_URL,
                       get_parts_list[i].in,
                       get_parts_list[i].urlflags);
@@ -2018,15 +2030,17 @@ err:
 
 static CURLcode test_lib1560(char *URL)
 {
+  int has_utf8 = !!getenv("CURL_TEST_HAVE_CODESET_UTF8");
+
   (void)URL; /* not used */
 
   if(urldup())
     return (CURLcode)11;
 
-  if(setget_parts())
+  if(setget_parts(has_utf8))
     return (CURLcode)10;
 
-  if(get_url())
+  if(get_url(has_utf8))
     return (CURLcode)3;
 
   if(huge())
@@ -2047,7 +2061,7 @@ static CURLcode test_lib1560(char *URL)
   if(set_parts())
     return (CURLcode)2;
 
-  if(get_parts())
+  if(get_parts(has_utf8))
     return (CURLcode)4;
 
   if(clear_url())
--- a/tests/runtests.pl
+++ b/tests/runtests.pl
@@ -809,6 +809,9 @@ sub checksystemfeatures {
     $feature{"crypto"} = $feature{"NTLM"} || $feature{"Kerberos"} || $feature{"SPNEGO"};
     $feature{"local-http"} = servers::localhttp();
     $feature{"codeset-utf8"} = lc(langinfo(CODESET())) eq "utf-8";
+    if($feature{"codeset-utf8"}) {
+        $ENV{'CURL_TEST_HAVE_CODESET_UTF8'} = 1;
+    }
 
     # make each protocol an enabled "feature"
     for my $p (@protocols) {

@sergiodj
Author

It's many build-time checks, feeling a little excessive to run in every build, just to use it in a single test.

Actually, nl_langinfo is also called (unchecked) in iconv_to_utf8, so we have precedent and could use these checks elsewhere too.

Is there a way to avoid some or all of them? Is there perhaps a runtime check that doesn't involve a call to nl_langinfo()?

What we could do is simplify the checks. For example, assume that if nl_langinfo is available then CODESET will be too.

@vszakats
Member

vszakats commented Jul 17, 2025

It's many build-time checks, feeling a little excessive to run in every build, just to use it in a single test.

Actually, nl_langinfo is also called (unchecked) in iconv_to_utf8, so we have precedent and could use these checks elsewhere too.

It's for Apple platforms only.

Is there a way to avoid some or all of them? Is there perhaps a runtime check that doesn't involve a call to nl_langinfo()?

What we could do is simplify the checks. For example, assume that if nl_langinfo is available then CODESET will be too.

Moving platform-dependent feature detection to C comes with great overhead (for every build session on every platform), and avoiding it would be best, if there is a way.

The simplest would be using the detection logic already present in runtests.pl:

$feature{"codeset-utf8"} = lc(langinfo(CODESET())) eq "utf-8";

If this doesn't fit this case, another one could be added in a similar vein for this particular purpose, then communicated via an env to the test. Tests are always run through runtests.pl, making it safe.

@sergiodj
Author

Sure, I'm OK with the patch proposed in #17933 (comment), btw.

@vszakats
Member

Sure, I'm OK with the patch proposed in #17933 (comment), btw.

If you say it works, nice! Would you prefer applying it, or shall I do it?

@sergiodj
Author

Sure, I'm OK with the patch proposed in #17933 (comment), btw.

If you say it works, nice! Would you prefer applying it, or shall I do it?

I can do it.

Debian CI found that lib1560 implements tests that will fail when
UTF-8 isn't supported.  We can detect that with "nl_langinfo" and skip
the specific URLs that fail (i.e., those whose "getflags" are either
CURLU_PUNYCODE or CURLU_PUNY2IDN).

Signed-off-by: Sergio Durigan Junior <sergiodj@sergiodj.net>
Signed-off-by: Viktor Szakats <commit@vsz.me>
@sergiodj sergiodj force-pushed the lib1560-fix-locale-non-utf8 branch from 91d3739 to 66435e1 Compare July 17, 2025 09:40
@sergiodj
Author

FTR, I force-pushed the new changes a few days ago.

@vszakats vszakats changed the title tests: Skip some lib1560 URLs if UTF-8 isn't supported test1560: skip some URLs if UTF-8 is not supported Jul 19, 2025
@vszakats vszakats closed this in 7d1ca2e Jul 19, 2025
@vszakats
Member

Thank you @sergiodj! Merged now.

vszakats added a commit that referenced this pull request Jul 21, 2025
To fix running test 1560 when `LC_ALL` is set to something unexpected (e.g. `C`). Also syncing it with the rest of the tests.

Also:
- GHA/linux: enable `libidn2` in more jobs.
  Also to enable test 1560 reproducing this issue in more jobs.
- GHA/linux: run tests with `LC_ALL=C` in one of the jobs.
- GHA/linux: switch to the non-deprecated package name for libidn2.
- GHA/macos: run tests with non-default locale settings in one job.
- GHA/macos: enable AppleIDN in that job.

Ref: #17933 (comment)
Follow-up to f27262b #10196

Closes #17938
vszakats added a commit to vszakats/curl that referenced this pull request Jul 22, 2025
```
test 1560...[URL API]
 valgrind ERROR ==13362== 104 bytes in 1 blocks are definitely lost in loss record 1 of 1
==13362==    at 0x484D953: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==13362==    by 0x48E1302: curl_dbg_calloc (in /home/runner/work/curl/curl/bld/lib/libcurl.so.4.8.0)
==13362==    by 0x4931D12: curl_url (in /home/runner/work/curl/curl/bld/lib/libcurl.so.4.8.0)
==13362==    by 0x14F658: get_parts (in /home/runner/work/curl/curl/bld/tests/libtest/libtests)
==13362==    by 0x150AC6: test_lib1560 (in /home/runner/work/curl/curl/bld/tests/libtest/libtests)
==13362==    by 0x17F5D5: main (in /home/runner/work/curl/curl/bld/tests/libtest/libtests)
==13362==
```
Ref: https://github.com/curl/curl/actions/runs/16446352985/job/46479298080?pr=17988#step:41:3007

Follow-up to 7d1ca2e curl#17933
vszakats added a commit that referenced this pull request Jul 22, 2025
The issue was missed in CI because the valgrind jobs all run with UTF-8 support.

Fixing:
```
test 1560...[URL API]
 valgrind ERROR ==13362== 104 bytes in 1 blocks are definitely lost in loss record 1 of 1
==13362==    at 0x484D953: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==13362==    by 0x48E1302: curl_dbg_calloc (in /curl/bld/lib/libcurl.so.4.8.0)
==13362==    by 0x4931D12: curl_url (in /curl/bld/lib/libcurl.so.4.8.0)
==13362==    by 0x14F658: get_parts (in /curl/bld/tests/libtest/libtests)
==13362==    by 0x150AC6: test_lib1560 (in /curl/bld/tests/libtest/libtests)
==13362==    by 0x17F5D5: main (in /curl/bld/tests/libtest/libtests)
```
Ref: https://github.com/curl/curl/actions/runs/16446352985/job/46479298080?pr=17988#step:41:3007

Follow-up to 7d1ca2e #17933

Closes #17998
vszakats added a commit to vszakats/curl that referenced this pull request Jul 28, 2025
vszakats added a commit that referenced this pull request Jul 28, 2025