Stop using gh9946_utf_16_replacement.patch #135

mingwandroid · 2020-11-14T17:19:06Z

I am not sure what this breaks though, ping @hmaarrfk

On linux-aarch64, I am seeing this when attempting to use a conda
that includes this patch:

Downloading and Extracting Packages
conda-4.9.2          | 3.1 MB    | ################################################################################################# | 100%
Preparing transaction: done
Executing transaction: done
ERROR conda.core.link:_execute(698): An error occurred while installing package 'conda-forge::ncurses-6.2-h7fd3ca4_3'.
Rolling back transaction: done

LookupError('unknown encoding: utf-16-le')

IMHO the job of figuring out what encodings need replacing is a
job for conda-build. not an install-time one for conda (where this
involves potentially significant computation and introduces
variability - due e.g. to future bugs in conda - into what gets
installed). The replacements can be pre-calculated and therefore
should be.

Checklist

Used a personal fork of the feedstock to propose changes
Bumped the build number (if the version is unchanged)
Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
Ensured the license file is being packaged.

conda-forge-linter · 2020-11-14T17:19:12Z

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@hmaarrfk

I am not sure what this breaks though, ping @hmaarrfk On linux-aarch64, I am seeing this when attempting to use a conda that includes this patch: ``` Downloading and Extracting Packages conda-4.9.2 | 3.1 MB | ################################################################################################# | 100% Preparing transaction: done Executing transaction: done ERROR conda.core.link:_execute(698): An error occurred while installing package 'conda-forge::ncurses-6.2-h7fd3ca4_3'. Rolling back transaction: done LookupError('unknown encoding: utf-16-le') ``` IMHO the job of figuring out what encodings need replacing is a job for conda-build. not an install-time one for conda (where this involves potentially significant computation and introduces variability - due e.g. to future bugs in conda - into what gets installed). The replacements can be pre-calculated and therefore should be.

hmaarrfk · 2020-11-14T17:24:47Z

Is there any alternative to this?

I understand you want this to be done the "correct" way, but this is going to break wxWidgets which uses non UTF8 encoding.

What platform doesn't have utf-16-le encoding installed by default? Can we depend on something, or skip that encoding?

mingwandroid · 2020-11-14T17:59:14Z

Is there any alternative to this?

Yes, pre-calculating the replacements inside conda-build. The package format does not need any changes for this either.

mingwandroid · 2020-11-14T17:59:54Z

I tested with linux-aarch64 on a raspberry pi 4 using conda 4.9.2 build 0.

hmaarrfk · 2020-11-14T18:46:30Z

I tested with linux-aarch64 on a raspberry pi 4 using conda 4.9.2 build 0.

Are you saying that I should rebuild packages that have UTF 16 with conda build 4.9.2?

mingwandroid · 2020-11-14T21:04:20Z

No, I am only saying that this seems to be breaking linux-aarch64 (for reasons that aren't 100% clear, for sure, there could be a build issue here and the encoding is just plain missing rather than not supposed to be there). That should be investigated separately.

The fixes I see, in my personal order of preference are:

Add a new feature to conda-build (try-various-encodings-for-prefix-replacement-detection), then revert this patch and rebuild wx with the new conda-build.
Revert this patch. Modify bld.bat of wx so that it converts the affected files using e.g. iconv to utf-8 where conda-build will happily process them (and hopefully wx will too, I expect it would).
Add a try: except: to the the patch.

I doubt Anaconda will merge this patch though due to the concerns I've identified.

hmaarrfk · 2020-11-14T21:35:43Z

Add a new feature to conda-build

You and I have been working on this together for quite some time (we identified use-cases, wrote tests, ....)

I don't know where to start looking to implement the feature in conda-build. Does the feature you speak speak of (i.e. identifying the location of replacements) already exist for UTF-8?

We have a test ready, but I guess we still haven't had the time to truely get it running in conda-build.

The missing encoding is rather troubling

mingwandroid · 2020-11-14T21:45:07Z

UTF-8 can be, by and large, treated as ASCII by most code, so conda-build needs no special handling for UTF-8. This is because UTF-8 was designed to be compatible with standard C string functions and leads naturally to the core beauty of UTF-8.

As to where the code lives, just search for 'prefix' and 'replace' in conda-build's source code. It shouldn't be too bad to work with that stuff. I have no time personally to look into this and until someone reports that wx on defaults is broken and that's made a priority issue for me, I doubt that I'll have time to look at it.

Anyway, for me, the best fix is not to carry files that are not UTF-8 when a UTF-8 version would work just fine (whether it would or not for wx, I do not know, but it's easily tested).

hmaarrfk · 2020-11-14T22:35:12Z

can you try the following lines of code on yor machine:

"foo".encode("utf-16-le")
"bar".encode("utf-16le")

I understand you don't like the current solution for conda-forge, but we do have packages that use utf-16 and so for now, this is the best thing that conda-forge has to support this.

mingwandroid · 2020-11-14T22:36:47Z

You have one package, the fixes are very equivalent in terms of workload but very different in terms of cleanliness.

mingwandroid · 2020-11-14T22:38:07Z

These packages should be declared broken too IMHO. If you want changes in conda like this that affect whether a package 'works' or not, I urge you to go via Anaconda instead of doing your own thing.

hmaarrfk · 2020-11-14T22:40:33Z

This issue has been raised to the conda issue tracker since 2017.

conda/conda#4298

Nobody wants to do backdoor things like this.

Is there a better place to ask for the feature?

You and I are both (for various different reasons) not in a position to open the conda-build source code and submit "the cleaner" fix for this.

mingwandroid · 2020-11-14T22:40:43Z

We must leave it for others to decide the correct approach here I think. Please chime in @conda-forge/core

mingwandroid · 2020-11-14T22:41:41Z

Thing is, it was never a conda issue, it was always a conda-build issue! If it had been filed there I would've fixed it years ago.

hmaarrfk · 2020-11-14T22:43:29Z

It was filed for conda-build as well, and your answer at the time was the same as it is here:
conda/conda-build#3420

mingwandroid · 2020-11-14T22:46:03Z

It was filed for conda-build as well, and your answer at the time was the same as it is here:

It absolutely is not! It is simply a snippet of (pseudo, almost!) code with no indication of where I think it should go! My comments to your PR to conda are actual warnings and objections, laid out in detail with alternatives. Still the patch ended up here anyway.

With all the respect possible. the thing I don't get is why fixing it in conda was considered the thing to do instead of in conda-build! It's the same effort basically and I said that from the moment you proposed the patch to conda (which I have consistently stated that I was against).

mingwandroid · 2020-11-14T22:47:11Z

I won't respond further to this tonight Mark, have a good evening.

hmaarrfk · 2020-11-14T22:48:02Z

I personally have a feeling that this is due to an alias missing in your particular version of python as installed on your RPi.

I'm not sure where in python the "standard" encoding is stored so your help in understanding the correct spelling would be appreciated

Is the most correct spelling utf_16_le, utf-16-le or utf-16le?

This is a question that would be helpful to answer even when we submit the patch to conda-build.

Have a good evening Ray.

mingwandroid · 2020-11-14T22:50:05Z

But I'm using conda-forge exclusively here! Anyway, I'll dig into the specifics you've asked me about tomorrow and maybe try to figure out why the encoding is missing. Hopefully someone will find time to add this feature to conda-build, it is a feature I've never been opposed to either, but my attempts at guidance seemed futile!

Over and out, for real!

mingwandroid · 2020-11-14T22:51:18Z

Gaah! Can't stop myself! I would approve of just adding a try except for now. That's good enough.

hmaarrfk · 2020-11-14T22:52:21Z

I'm honestly trying to find where to implement this in conda-build while setting up other systems.

It just isn't as easy as you think (and I often do this mistake too) to look through a large codebase...

mingwandroid · 2020-11-14T23:01:37Z

Here you go Mark: https://github.com/conda/conda-build/blob/master/conda_build/build.py#L167

If you want some suggestions for the implementation I would convert all files to utf-8, recording any that changed. For any that did change, determine the original encoding (there are packages that will help with this, for example https://pypi.org/project/chardet/).

Then for all (utf-8) files in which the prefix was found, for those which were utf-8 originally, proceed as normal. For those that were not utf-8 originally, figure out the replacement locations in the original file, loaded in binary mode, by searching for the binary string of the prefix replacement value encoded into that original encoding.

isuruf · 2020-11-14T23:04:01Z

How can I reproduce the failure with wxwidgets?

I tried the following change and it printed nothing,

def replace_prefix(mode, data, placeholder, new_prefix):
     popular_encodings = [
         'utf-8',
         # Make sure to specify -le and -be so that the UTF endian prefix
         # doesn't show up in the string
         'utf-16-le', 'utf-16-be',
         'utf-32-le', 'utf-32-be'
     ]
     for encoding in popular_encodings:
         data_orig = data
         if mode == FileMode.text:
             data = data.replace(placeholder.encode(encoding),
                                 new_prefix.encode(encoding))
             if data_orig != data and encoding != 'utf-8':
                 print(encoding)
         elif mode == FileMode.binary:
             data = binary_replace(data,
                                   placeholder.encode(encoding),
                                   new_prefix.encode(encoding),
                                   encoding=encoding)
             if data_orig != data and encoding != 'utf-8':
                 print(encoding)
         else:
             raise CondaIOError("Invalid mode: %r" % mode)
     return data

hmaarrfk · 2020-11-14T23:06:03Z

I guess it was in the Hugin package created that was interacting with wxWidgets

See screenshot. Platform shows Ubuntu being used.
conda-forge/staged-recipes#11664 (comment)

hmaarrfk · 2020-11-14T23:10:10Z

conda create --name hugin hugin --yes
[...]
Preparing transaction: done
Verifying transaction: | utf-32-le
utf-32-le
/ utf-32-le
utf-32-le
done
Executing transaction: - 
-

is the output when trying to install hugin.

mingwandroid · 2020-11-15T12:37:57Z

Trying to reproduce this today, it does not fail with the newly released (I guess) conda-forge/linux-aarch64::ncurses-6.2-h7fd3ca4_4 while I did happen yesterday with conda-forge/linux-aarch64::ncurses-6.2-h7fd3ca4_3.

Now I cannot get it to go wrong even when trying to force that package (conda create -n py39 conda=4.9.2 python=3.9 ncurses=6.2=h7fd3ca4_3 -c conda-forge), so I must apologize to you for the fuss @hmaarrfk!

Still it'd be nice to make progress on this. I'll try to raise it with the rest of the conda team to get their opinions too.

Cheers.

hmaarrfk · 2020-11-15T18:11:19Z

I have a feeling it was due to updating conda at the same time as ncurses, some pyc files likely got changed, and as such, the "upgrade" wasn't really smooth.

No issue about the fuss, though, I think I could have probably used a little more delay in some replies and that would have helped both of us better understand the other's point.

I really wish I had more time to sit and think about the problem the correct way in earnest, but work has been stretching me thin.

mingwandroid · 2020-11-15T18:41:41Z

Yes. I think that analysis is correct. I think, if conda is to be updated it should happen in 3 stages:

Update conda deps, update conda, update anything else (with the new conda).

ocefpaf · 2021-09-07T17:03:14Z

@hmaarrfk I'm experiencing

ERROR conda.core.link:_execute(698): An error occurred while installing package 'conda-forge::<random pkg here>.
Rolling back transaction: done

LookupError('unknown encoding: utf-16-le')

with both latest mambaforge and miniforge. Not sure what is the best approach here but that made the transition to mini/mamba-forge alternatives impossible at my day job. Do we still need that patch? Is that something we can work with upstream conda?

hmaarrfk · 2021-09-08T15:49:05Z

I'm ok removing it.... but... I would ask that we try one more thing before we do:

It seems that the correct encoding is utf_16_le and not utf-16-le. Are we OK trying that patch?

The reason I think that will help is that according to python's documentation

Python comes with a number of codecs built-in, either implemented as C functions or with dictionaries as mapping tables. The following table lists the codecs by name, together with a few common aliases, and the languages for which the encoding is likely used. Neither the list of aliases nor the list of languages is meant to be exhaustive. Notice that spelling alternatives that only differ in case or use a hyphen instead of an underscore are also valid aliases; therefore, e.g. 'utf-8' is a valid alias for the 'utf_8' codec.

The feature I needed this for is for the wxWidgets https://github.com/conda-forge/wxwidgets-feedstock part of https://github.com/conda-forge/hugin-feedstock/
however, I've since removed the GUI from the functionality I require, so I don't have a stake in maintaining "utf-16-le" anymore.

Documentation: https://docs.python.org/3/library/codecs.html#standard-encodings

hmaarrfk · 2021-09-08T15:50:26Z

@ocefpaf are you able to give us more information about your environment that is causing things to break for you for miniforge? If so, we do have capabilities ot create test docker images for them. let me know.

ocefpaf · 2021-09-08T16:14:38Z

@ocefpaf are you able to give us more information about your environment that is causing things to break for you for miniforge?

The conda environment? It is a clean mini/mamba-forge installation and a conda update --all --yes right after it is installed. All with only conda-forge as the main channel. The problem does not happen with miniconda b/c the starting point is a non-patch conda but right after the update any new updates will produce the same error.

Let me know if you need any specific details.

If so, we do have capabilities or create test docker images for them. let me know.

Probably not. I don't see that error in CIs I'm setting :-/, only locally and on some colleagues laptops.

PS: what do you think about an if-clause like @isuruf suggested above?

hmaarrfk · 2021-09-08T17:26:01Z

I'm open to it, but I find it VERY curious that things are failing in the first place. These encodings to my knowledge are just part of python.

So if I can't recreate the issues, I'm not sure if we will actually be able to "fix" anything.

ocefpaf · 2021-09-08T17:41:06Z

These encodings to my knowledge are just part of python.

Makes sense. Not sure what is going on locally though. Is there any specific info from my system that could help us debug this?

hmaarrfk · 2021-09-08T17:57:35Z

ultimately, it might be because conda is trying to update itself or python, and there is a missing encoding file during this process.

So in between updating python, the encoding file isn't in the cache, therefore the module can't be found in the short period between removing the old python, and patching the new one.

This caching capability is available for quite some time (since at least 3.7)
https://github.com/python/cpython/blob/3.7/Lib/encodings/__init__.py#L74

It seems that this might have been the cause with @mingwandroid trying this once, then it not working, then when he tried it again, he got past it somehow, and it started working.

It might be worth putting a try except and moving on, ore removing this entirely and doing it "property" as mingwandroid suggested initially.

I'm not in a position to do it "properly" any time soon, but for me, I'm ok with removing this patch entirely

Releasing my own conda if I find it necessary.
Simply dropping support for wxWidget packages for my own day job.

ocefpaf · 2021-09-08T18:31:56Z

I'm not in a position to do it "properly" any time soon, but for me, I'm ok with removing this patch entirely

Releasing my own conda if I find it necessary.

Simply dropping support for wxWidget packages for my own day job.

Why not the try/except? Seems to be the path of least resistance, no? I mean, both 1 and 2 seems to drop the issue on you only. A "proper" solution may take too long due to conda's development times.

hmaarrfk · 2021-09-08T19:24:33Z

try except is fine with me. it just kinda feels like more of a hack. i would really appreciate it though

ocefpaf · 2021-09-08T19:26:10Z

it just kinda feels like more of a hack.

Indeed but I don't want to have you maintaining your own conda just for the sake of a corner case when updating things 😄

hmaarrfk · 2021-09-08T19:49:55Z

See #149

ocefpaf · 2021-09-09T19:04:29Z

Closing this b/c we need a better solution that will make everybody happy. Thanks all for the discussion!

mingwandroid requested review from isuruf, jakirkham, jjhelmus, kalefranz, mbargull, mcg1969, msarahan, mwcraig, ocefpaf, patricksnape, pelson and scopatz as code owners November 14, 2020 17:19

mingwandroid force-pushed the cf.master branch from 967f457 to e2c190d Compare November 14, 2020 17:19

mingwandroid force-pushed the cf.master branch from e2c190d to 9a70663 Compare November 14, 2020 17:21

hmaarrfk mentioned this pull request Nov 14, 2020

Binary files with prefix in wchar_t (or other encoding) conda/conda-build#3420

Closed

hmaarrfk mentioned this pull request Sep 8, 2021

WIP: Test update script again conda-forge/miniforge#210

Closed

5 tasks

hmaarrfk mentioned this pull request Sep 8, 2021

Use underscores instead of hyphens #148

Closed

5 tasks

ocefpaf mentioned this pull request Sep 9, 2021

WIP: Try except patch #149

Closed

5 tasks

ocefpaf closed this Sep 9, 2021

Stop using gh9946_utf_16_replacement.patch #135

Stop using gh9946_utf_16_replacement.patch #135

Conversation

mingwandroid commented Nov 14, 2020 • edited

conda-forge-linter commented Nov 14, 2020

hmaarrfk commented Nov 14, 2020

mingwandroid commented Nov 14, 2020

mingwandroid commented Nov 14, 2020 • edited

hmaarrfk commented Nov 14, 2020

mingwandroid commented Nov 14, 2020 • edited

hmaarrfk commented Nov 14, 2020

mingwandroid commented Nov 14, 2020

hmaarrfk commented Nov 14, 2020

mingwandroid commented Nov 14, 2020

mingwandroid commented Nov 14, 2020

hmaarrfk commented Nov 14, 2020

mingwandroid commented Nov 14, 2020

mingwandroid commented Nov 14, 2020

hmaarrfk commented Nov 14, 2020

mingwandroid commented Nov 14, 2020 • edited

mingwandroid commented Nov 14, 2020 • edited

hmaarrfk commented Nov 14, 2020

mingwandroid commented Nov 14, 2020

mingwandroid commented Nov 14, 2020

hmaarrfk commented Nov 14, 2020

mingwandroid commented Nov 14, 2020 • edited

isuruf commented Nov 14, 2020

hmaarrfk commented Nov 14, 2020

hmaarrfk commented Nov 14, 2020 • edited

mingwandroid commented Nov 15, 2020

hmaarrfk commented Nov 15, 2020

mingwandroid commented Nov 15, 2020

ocefpaf commented Sep 7, 2021

hmaarrfk commented Sep 8, 2021

hmaarrfk commented Sep 8, 2021

ocefpaf commented Sep 8, 2021

hmaarrfk commented Sep 8, 2021

ocefpaf commented Sep 8, 2021

hmaarrfk commented Sep 8, 2021

ocefpaf commented Sep 8, 2021

hmaarrfk commented Sep 8, 2021

ocefpaf commented Sep 8, 2021

hmaarrfk commented Sep 8, 2021

ocefpaf commented Sep 9, 2021

mingwandroid commented Nov 14, 2020 •

edited

mingwandroid commented Nov 14, 2020 •

edited

mingwandroid commented Nov 14, 2020 •

edited

mingwandroid commented Nov 14, 2020 •

edited

mingwandroid commented Nov 14, 2020 •

edited

mingwandroid commented Nov 14, 2020 •

edited

hmaarrfk commented Nov 14, 2020 •

edited