Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop using gh9946_utf_16_replacement.patch #135

Closed
wants to merge 1 commit into from

Conversation

mingwandroid
Copy link
Contributor

@mingwandroid mingwandroid commented Nov 14, 2020

I am not sure what this breaks though, ping @hmaarrfk

On linux-aarch64, I am seeing this when attempting to use a conda
that includes this patch:

Downloading and Extracting Packages
conda-4.9.2          | 3.1 MB    | ################################################################################################# | 100%
Preparing transaction: done
Executing transaction: done
ERROR conda.core.link:_execute(698): An error occurred while installing package 'conda-forge::ncurses-6.2-h7fd3ca4_3'.
Rolling back transaction: done

LookupError('unknown encoding: utf-16-le')

IMHO the job of figuring out what encodings need replacing is a
job for conda-build. not an install-time one for conda (where this
involves potentially significant computation and introduces
variability - due e.g. to future bugs in conda - into what gets
installed). The replacements can be pre-calculated and therefore
should be.

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

@conda-forge-linter
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

I am not sure what this breaks though, ping @hmaarrfk

On linux-aarch64, I am seeing this when attempting to use a conda
that includes this patch:

```
Downloading and Extracting Packages
conda-4.9.2          | 3.1 MB    | ################################################################################################# | 100%
Preparing transaction: done
Executing transaction: done
ERROR conda.core.link:_execute(698): An error occurred while installing package 'conda-forge::ncurses-6.2-h7fd3ca4_3'.
Rolling back transaction: done

LookupError('unknown encoding: utf-16-le')
```

IMHO the job of figuring out what encodings need replacing is a
job for conda-build. not an install-time one for conda (where this
involves potentially significant computation and introduces
variability - due e.g. to future bugs in conda - into what gets
installed). The replacements can be pre-calculated and therefore
should be.
@hmaarrfk
Copy link
Contributor

Is there any alternative to this?

I understand you want this to be done the "correct" way, but this is going to break wxWidgets which uses non UTF8 encoding.

What platform doesn't have utf-16-le encoding installed by default? Can we depend on something, or skip that encoding?

@mingwandroid
Copy link
Contributor Author

Is there any alternative to this?

Yes, pre-calculating the replacements inside conda-build. The package format does not need any changes for this either.

@mingwandroid
Copy link
Contributor Author

mingwandroid commented Nov 14, 2020

I tested with linux-aarch64 on a raspberry pi 4 using conda 4.9.2 build 0.

@hmaarrfk
Copy link
Contributor

I tested with linux-aarch64 on a raspberry pi 4 using conda 4.9.2 build 0.

Are you saying that I should rebuild packages that have UTF 16 with conda build 4.9.2?

@mingwandroid
Copy link
Contributor Author

mingwandroid commented Nov 14, 2020

No, I am only saying that this seems to be breaking linux-aarch64 (for reasons that aren't 100% clear, for sure, there could be a build issue here and the encoding is just plain missing rather than not supposed to be there). That should be investigated separately.

The fixes I see, in my personal order of preference are:

  1. Add a new feature to conda-build (try-various-encodings-for-prefix-replacement-detection), then revert this patch and rebuild wx with the new conda-build.
  2. Revert this patch. Modify bld.bat of wx so that it converts the affected files using e.g. iconv to utf-8 where conda-build will happily process them (and hopefully wx will too, I expect it would).
  3. Add a try: except: to the the patch.

I doubt Anaconda will merge this patch though due to the concerns I've identified.

@hmaarrfk
Copy link
Contributor

Add a new feature to conda-build

You and I have been working on this together for quite some time (we identified use-cases, wrote tests, ....)

I don't know where to start looking to implement the feature in conda-build. Does the feature you speak speak of (i.e. identifying the location of replacements) already exist for UTF-8?

We have a test ready, but I guess we still haven't had the time to truely get it running in conda-build.

The missing encoding is rather troubling

@mingwandroid
Copy link
Contributor Author

UTF-8 can be, by and large, treated as ASCII by most code, so conda-build needs no special handling for UTF-8. This is because UTF-8 was designed to be compatible with standard C string functions and leads naturally to the core beauty of UTF-8.

As to where the code lives, just search for 'prefix' and 'replace' in conda-build's source code. It shouldn't be too bad to work with that stuff. I have no time personally to look into this and until someone reports that wx on defaults is broken and that's made a priority issue for me, I doubt that I'll have time to look at it.

Anyway, for me, the best fix is not to carry files that are not UTF-8 when a UTF-8 version would work just fine (whether it would or not for wx, I do not know, but it's easily tested).

@hmaarrfk
Copy link
Contributor

can you try the following lines of code on yor machine:

"foo".encode("utf-16-le")
"bar".encode("utf-16le")

I understand you don't like the current solution for conda-forge, but we do have packages that use utf-16 and so for now, this is the best thing that conda-forge has to support this.

@mingwandroid
Copy link
Contributor Author

You have one package, the fixes are very equivalent in terms of workload but very different in terms of cleanliness.

@mingwandroid
Copy link
Contributor Author

These packages should be declared broken too IMHO. If you want changes in conda like this that affect whether a package 'works' or not, I urge you to go via Anaconda instead of doing your own thing.

@hmaarrfk
Copy link
Contributor

This issue has been raised to the conda issue tracker since 2017.

conda/conda#4298

Nobody wants to do backdoor things like this.

Is there a better place to ask for the feature?

You and I are both (for various different reasons) not in a position to open the conda-build source code and submit "the cleaner" fix for this.

@mingwandroid
Copy link
Contributor Author

We must leave it for others to decide the correct approach here I think. Please chime in @conda-forge/core

@mingwandroid
Copy link
Contributor Author

Thing is, it was never a conda issue, it was always a conda-build issue! If it had been filed there I would've fixed it years ago.

@hmaarrfk
Copy link
Contributor

It was filed for conda-build as well, and your answer at the time was the same as it is here:
conda/conda-build#3420

@mingwandroid
Copy link
Contributor Author

mingwandroid commented Nov 14, 2020

It was filed for conda-build as well, and your answer at the time was the same as it is here:

It absolutely is not! It is simply a snippet of (pseudo, almost!) code with no indication of where I think it should go! My comments to your PR to conda are actual warnings and objections, laid out in detail with alternatives. Still the patch ended up here anyway.

With all the respect possible. the thing I don't get is why fixing it in conda was considered the thing to do instead of in conda-build! It's the same effort basically and I said that from the moment you proposed the patch to conda (which I have consistently stated that I was against).

@mingwandroid
Copy link
Contributor Author

mingwandroid commented Nov 14, 2020

I won't respond further to this tonight Mark, have a good evening.

@hmaarrfk
Copy link
Contributor

I personally have a feeling that this is due to an alias missing in your particular version of python as installed on your RPi.

I'm not sure where in python the "standard" encoding is stored so your help in understanding the correct spelling would be appreciated

Is the most correct spelling utf_16_le, utf-16-le or utf-16le?

This is a question that would be helpful to answer even when we submit the patch to conda-build.

Have a good evening Ray.

@mingwandroid
Copy link
Contributor Author

But I'm using conda-forge exclusively here! Anyway, I'll dig into the specifics you've asked me about tomorrow and maybe try to figure out why the encoding is missing. Hopefully someone will find time to add this feature to conda-build, it is a feature I've never been opposed to either, but my attempts at guidance seemed futile!

Over and out, for real!

@mingwandroid
Copy link
Contributor Author

Gaah! Can't stop myself! I would approve of just adding a try except for now. That's good enough.

@hmaarrfk
Copy link
Contributor

I'm honestly trying to find where to implement this in conda-build while setting up other systems.

It just isn't as easy as you think (and I often do this mistake too) to look through a large codebase...

@mingwandroid
Copy link
Contributor Author

mingwandroid commented Nov 14, 2020

Here you go Mark: https://github.com/conda/conda-build/blob/master/conda_build/build.py#L167

If you want some suggestions for the implementation I would convert all files to utf-8, recording any that changed. For any that did change, determine the original encoding (there are packages that will help with this, for example https://pypi.org/project/chardet/).

Then for all (utf-8) files in which the prefix was found, for those which were utf-8 originally, proceed as normal. For those that were not utf-8 originally, figure out the replacement locations in the original file, loaded in binary mode, by searching for the binary string of the prefix replacement value encoded into that original encoding.

@isuruf
Copy link
Member

isuruf commented Nov 14, 2020

How can I reproduce the failure with wxwidgets?

I tried the following change and it printed nothing,

def replace_prefix(mode, data, placeholder, new_prefix):
     popular_encodings = [
         'utf-8',
         # Make sure to specify -le and -be so that the UTF endian prefix
         # doesn't show up in the string
         'utf-16-le', 'utf-16-be',
         'utf-32-le', 'utf-32-be'
     ]
     for encoding in popular_encodings:
         data_orig = data
         if mode == FileMode.text:
             data = data.replace(placeholder.encode(encoding),
                                 new_prefix.encode(encoding))
             if data_orig != data and encoding != 'utf-8':
                 print(encoding)
         elif mode == FileMode.binary:
             data = binary_replace(data,
                                   placeholder.encode(encoding),
                                   new_prefix.encode(encoding),
                                   encoding=encoding)
             if data_orig != data and encoding != 'utf-8':
                 print(encoding)
         else:
             raise CondaIOError("Invalid mode: %r" % mode)
     return data

@hmaarrfk
Copy link
Contributor

I guess it was in the Hugin package created that was interacting with wxWidgets

See screenshot. Platform shows Ubuntu being used.
conda-forge/staged-recipes#11664 (comment)

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Nov 14, 2020

conda create --name hugin hugin --yes
[...]
Preparing transaction: done
Verifying transaction: | utf-32-le
utf-32-le
/ utf-32-le
utf-32-le
done
Executing transaction: - 
- 

is the output when trying to install hugin.

@mingwandroid
Copy link
Contributor Author

Trying to reproduce this today, it does not fail with the newly released (I guess) conda-forge/linux-aarch64::ncurses-6.2-h7fd3ca4_4 while I did happen yesterday with conda-forge/linux-aarch64::ncurses-6.2-h7fd3ca4_3.

Now I cannot get it to go wrong even when trying to force that package (conda create -n py39 conda=4.9.2 python=3.9 ncurses=6.2=h7fd3ca4_3 -c conda-forge), so I must apologize to you for the fuss @hmaarrfk!

Still it'd be nice to make progress on this. I'll try to raise it with the rest of the conda team to get their opinions too.

Cheers.

@hmaarrfk
Copy link
Contributor

I have a feeling it was due to updating conda at the same time as ncurses, some pyc files likely got changed, and as such, the "upgrade" wasn't really smooth.

No issue about the fuss, though, I think I could have probably used a little more delay in some replies and that would have helped both of us better understand the other's point.

I really wish I had more time to sit and think about the problem the correct way in earnest, but work has been stretching me thin.

@mingwandroid
Copy link
Contributor Author

Yes. I think that analysis is correct. I think, if conda is to be updated it should happen in 3 stages:

Update conda deps, update conda, update anything else (with the new conda).

@ocefpaf
Copy link
Member

ocefpaf commented Sep 7, 2021

@hmaarrfk I'm experiencing

ERROR conda.core.link:_execute(698): An error occurred while installing package 'conda-forge::<random pkg here>.
Rolling back transaction: done

LookupError('unknown encoding: utf-16-le')

with both latest mambaforge and miniforge. Not sure what is the best approach here but that made the transition to mini/mamba-forge alternatives impossible at my day job. Do we still need that patch? Is that something we can work with upstream conda?

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Sep 8, 2021

I'm ok removing it.... but... I would ask that we try one more thing before we do:

It seems that the correct encoding is utf_16_le and not utf-16-le. Are we OK trying that patch?

The reason I think that will help is that according to python's documentation

Python comes with a number of codecs built-in, either implemented as C functions or with dictionaries as mapping tables. The following table lists the codecs by name, together with a few common aliases, and the languages for which the encoding is likely used. Neither the list of aliases nor the list of languages is meant to be exhaustive. Notice that spelling alternatives that only differ in case or use a hyphen instead of an underscore are also valid aliases; therefore, e.g. 'utf-8' is a valid alias for the 'utf_8' codec.

The feature I needed this for is for the wxWidgets https://github.com/conda-forge/wxwidgets-feedstock part of https://github.com/conda-forge/hugin-feedstock/
however, I've since removed the GUI from the functionality I require, so I don't have a stake in maintaining "utf-16-le" anymore.

Documentation: https://docs.python.org/3/library/codecs.html#standard-encodings

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Sep 8, 2021

@ocefpaf are you able to give us more information about your environment that is causing things to break for you for miniforge? If so, we do have capabilities ot create test docker images for them. let me know.

@ocefpaf
Copy link
Member

ocefpaf commented Sep 8, 2021

@ocefpaf are you able to give us more information about your environment that is causing things to break for you for miniforge?

The conda environment? It is a clean mini/mamba-forge installation and a conda update --all --yes right after it is installed. All with only conda-forge as the main channel. The problem does not happen with miniconda b/c the starting point is a non-patch conda but right after the update any new updates will produce the same error.

Let me know if you need any specific details.

If so, we do have capabilities or create test docker images for them. let me know.

Probably not. I don't see that error in CIs I'm setting :-/, only locally and on some colleagues laptops.

PS: what do you think about an if-clause like @isuruf suggested above?

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Sep 8, 2021

I'm open to it, but I find it VERY curious that things are failing in the first place. These encodings to my knowledge are just part of python.

So if I can't recreate the issues, I'm not sure if we will actually be able to "fix" anything.

@ocefpaf
Copy link
Member

ocefpaf commented Sep 8, 2021

These encodings to my knowledge are just part of python.

Makes sense. Not sure what is going on locally though. Is there any specific info from my system that could help us debug this?

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Sep 8, 2021

ultimately, it might be because conda is trying to update itself or python, and there is a missing encoding file during this process.

So in between updating python, the encoding file isn't in the cache, therefore the module can't be found in the short period between removing the old python, and patching the new one.

This caching capability is available for quite some time (since at least 3.7)
https://github.com/python/cpython/blob/3.7/Lib/encodings/__init__.py#L74

It seems that this might have been the cause with @mingwandroid trying this once, then it not working, then when he tried it again, he got past it somehow, and it started working.

It might be worth putting a try except and moving on, ore removing this entirely and doing it "property" as mingwandroid suggested initially.

I'm not in a position to do it "properly" any time soon, but for me, I'm ok with removing this patch entirely

  1. Releasing my own conda if I find it necessary.
  2. Simply dropping support for wxWidget packages for my own day job.

@hmaarrfk hmaarrfk mentioned this pull request Sep 8, 2021
5 tasks
@ocefpaf
Copy link
Member

ocefpaf commented Sep 8, 2021

I'm not in a position to do it "properly" any time soon, but for me, I'm ok with removing this patch entirely

  1. Releasing my own conda if I find it necessary.
  2. Simply dropping support for wxWidget packages for my own day job.

Why not the try/except? Seems to be the path of least resistance, no? I mean, both 1 and 2 seems to drop the issue on you only. A "proper" solution may take too long due to conda's development times.

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Sep 8, 2021

try except is fine with me. it just kinda feels like more of a hack. i would really appreciate it though

@ocefpaf
Copy link
Member

ocefpaf commented Sep 8, 2021

it just kinda feels like more of a hack.

Indeed but I don't want to have you maintaining your own conda just for the sake of a corner case when updating things 😄

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Sep 8, 2021

See #149

@ocefpaf ocefpaf mentioned this pull request Sep 9, 2021
5 tasks
@ocefpaf
Copy link
Member

ocefpaf commented Sep 9, 2021

Closing this b/c we need a better solution that will make everybody happy. Thanks all for the discussion!

@ocefpaf ocefpaf closed this Sep 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants