[Windows] Encoding problem in password and extra lines #412

Open
4llan opened this issue Aug 21, 2018 · 9 comments

Comments

@4llan

4llan commented Aug 21, 2018

Hello! I have some passwords that have extra lines, and QtPass doesn't handle special characters in the panel or in the edit box.

E.g. "configurações" turns into "configuraÃ§Ãµes" in QtPass.

When I use gpg directly from PowerShell or the command line to save the decrypted file, I get the expected text in UTF-8.

  1. Is there something QtPass could do to read and show the decrypted output as UTF-8?
  2. If I have a password that uses special UTF-8 characters, will QtPass not handle it?
    --
    I created a new password using QtPass. This password contains special chars and I put "configurações" in the extra line. Result:
    Pass: "áéíóúãããããããã"
    Extra line: "configurações"
    --
    Saving "§" in the extra line gives "Â§" in the panel, the edit box and the clipboard, just like #91 (saved password '§' turns to 'Â§' when copied to clipboard or shown when editing).

I'm using Windows 10 1803 with QtPass 1.2.3 (and installed gnupg-w32-2.2.9_20180712)

Thanks

@4llan
Author

4llan commented Aug 22, 2018

I installed QtPass on Arch Linux and the encoding is fine there (both when creating a new password with special characters and when decrypting the password with "configurações" in it).

@rdoeffinger
Contributor

I have not debugged, but my best guess is that the problem is with these lines in executor.cpp:
QTextCodec *codec = QTextCodec::codecForLocale();
QString pout = codec->toUnicode(internal.readAllStandardOutput());
QString perr = codec->toUnicode(internal.readAllStandardError());
Here the output from the program is converted from the local code page to Unicode.
However, if the output isn't in the local code page (and when decrypting, the file data will be in whatever encoding it was originally stored in), this results in exactly this kind of breakage.
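To make the breakage concrete, here is a minimal C++ sketch (standard library only, not Qt) of what happens when UTF-8 bytes are decoded one byte per character as Latin-1, which for bytes 0xA0-0xFF is the same mapping as the cp1252 code page a Western-European Windows install typically uses. The function name `decodeAsLatin1` is hypothetical, and treating the local code page as Latin-1 is an assumption for illustration.

```cpp
#include <string>

// Hypothetical helper: interpret every byte as one Latin-1 character (the
// same mapping cp1252 uses for bytes 0xA0-0xFF) and re-encode the result as
// UTF-8 so it can be displayed. This mimics decoding UTF-8 data with the
// wrong, locale-dependent codec.
std::string decodeAsLatin1(const std::string& bytes) {
    std::string out;
    out.reserve(bytes.size() * 2);
    for (char ch : bytes) {
        const unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) {
            out += ch;  // ASCII maps to itself
        } else {
            // U+0080..U+00FF takes two bytes in UTF-8: 110000xx 10xxxxxx
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return out;
}
```

Passing it the UTF-8 bytes of "configurações" (`"configura\xC3\xA7\xC3\xB5" "es"`) yields "configuraÃ§Ãµes", and "§" (`"\xC2\xA7"`) becomes "Â§": every two-byte UTF-8 sequence shows up as two separate characters.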
This is inconsistent with what we do when encrypting, since there we always convert the input to UTF-8!
This issue should be reproducible on Linux by setting the locale to a non-UTF-8 code page. It's just that Windows is nowadays probably the only system so stuck in the past that it doesn't use UTF-8 by default...
I think a reasonable, though rather ugly, solution is to always try to interpret stdout as UTF-8 first, and only treat it as the local code page if it isn't valid UTF-8.
The likelihood of non-UTF-8 data happening to be valid UTF-8 should be low enough that this seems a reasonable risk to take.
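The "UTF-8 first, local code page as fallback" idea could look roughly like the following. This is a standard-library-only C++ sketch, not the actual QtPass patch: `isValidUtf8`, `latin1ToUtf8` and `decodeProcessOutput` are hypothetical names, and Latin-1 stands in for whatever `QTextCodec::codecForLocale()` would return on a given system.

```cpp
#include <cstddef>
#include <string>

// Returns true if `bytes` is well-formed UTF-8: correct lead/continuation
// structure, no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF.
bool isValidUtf8(const std::string& bytes) {
    static const char32_t kMinForLength[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        if (lead < 0x80) {  // ASCII is always valid
            ++i;
            continue;
        }
        std::size_t len = 0;
        char32_t cp = 0;
        if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte
        if (bytes.size() - i < len) return false;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80) return false;  // expected 10xxxxxx
            cp = (cp << 6) | (cont & 0x3F);
        }
        // Reject overlong encodings, surrogates and out-of-range code points.
        if (cp < kMinForLength[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}

// Hypothetical stand-in for the locale codec: Latin-1 bytes -> UTF-8.
std::string latin1ToUtf8(const std::string& bytes) {
    std::string out;
    for (char ch : bytes) {
        const unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) {
            out += ch;
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return out;
}

// Try UTF-8 first; only if that fails, assume the local code page.
// Returns the text as UTF-8 either way.
std::string decodeProcessOutput(const std::string& rawOutput) {
    if (isValidUtf8(rawOutput)) return rawOutput;
    return latin1ToUtf8(rawOutput);
}
```

With this, the UTF-8 bytes of "configurações" pass through unchanged, while the same word stored as Latin-1 (`"configura\xE7\xF5" "es"`) fails validation, because 0xE7 announces a three-byte sequence but 0xF5 is not a continuation byte, so it goes through the fallback instead.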

@bjmgeek

bjmgeek commented Sep 21, 2018

In that respect, UTF-8 has some nice properties. Specifically, it's possible to completely verify whether a given sequence of bytes is valid UTF-8. Not only that, but any ASCII (7-bit) text is valid UTF-8, while a lone upper-ASCII byte (above 127) is not. So if you get something that's not valid UTF-8, it's probably in a local encoding. The converse also holds in practice: text in a local encoding that uses characters above 127 is almost never valid UTF-8. For example, according to https://www.w3.org/International/questions/qa-validator-charset-check

If the encoding selected or detected is US-ASCII, UTF-8, UTF-16, or iso-2022-jp (Japanese JIS), and the validator does not complain about encoding problems, there is an extremely high probability that the selected encoding is correct. Note that US-ASCII is a strict subset of UTF-8, and so if US-ASCII works, UTF-8 will work, too.

This tells me that your proposed solution is in fact not that ugly.

@bjmgeek

bjmgeek commented Sep 21, 2018

That being said, QString::fromUtf8 parses the input as UTF-8, but if it isn't valid it still returns something (invalid sequences are replaced or suppressed) rather than reporting an error. That seems sub-optimal. https://stackoverflow.com/a/18228382/2660408 provides an example of how to validate UTF-8 in Qt.

@Kishi85

Kishi85 commented Nov 13, 2018

To quickly debug this, I tried decrypting a file from the password store manually using gpg on the command line. This already shows broken encoding on decrypt (gpg -d) for any special characters. Decrypting the file with gpg on Linux shows the correctly encoded password (even if this password was previously encrypted on a Windows machine).

This leads me to believe that the problem might not lie within QtPass but rather in Gpg4win, since QtPass just parses the output of the gpg command in native mode. It could be related to this task on the Gpg4win tracker: https://dev.gnupg.org/T2281

@4llan
Author

4llan commented Nov 13, 2018 via email

rdoeffinger added a commit to rdoeffinger/qtpass that referenced this issue Nov 17, 2018
Since we (most sensibly) encode text as UTF-8 before
encrypting we should assume that the password files
contain UTF-8 when decrypting, instead of the current
locale encoding.
This is the biggest issue on Windows, since it doesn't
even officially support locales with UTF-8 encoding.
For compatibility, detect if the data is not valid UTF-8
and fall back to Qt's BOM based approach, which provides
support for UTF-16 and falls back to current locale encoding.

Fixes issue IJHack#412
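The fallback described in the commit message relies on byte order mark (BOM) detection. The sketch below shows only that detection idea in plain C++; it is an illustration of the general technique, not Qt's implementation, and the enum and function names are hypothetical. Because a UTF-8 BOM is itself valid UTF-8, with the commit's order of checks it would normally be handled by the UTF-8 path already, so the practical gain of the fallback is UTF-16 support.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: which decoder a BOM-based fallback would choose.
enum class DetectedEncoding { Utf8WithBom, Utf16LE, Utf16BE, LocalCodePage };

// Looks only at the first bytes for a byte order mark. UTF-32 BOMs are not
// handled here, so a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE.
DetectedEncoding detectByBom(const std::string& bytes) {
    auto startsWith = [&bytes](const char* bom, std::size_t len) {
        return bytes.size() >= len && bytes.compare(0, len, bom, len) == 0;
    };
    if (startsWith("\xEF\xBB\xBF", 3)) return DetectedEncoding::Utf8WithBom;
    if (startsWith("\xFF\xFE", 2)) return DetectedEncoding::Utf16LE;
    if (startsWith("\xFE\xFF", 2)) return DetectedEncoding::Utf16BE;
    return DetectedEncoding::LocalCodePage;  // no BOM: use the locale codec
}
```

Data without any BOM, such as Latin-1 text, ends up in the `LocalCodePage` branch, which corresponds to the "falls back to current locale encoding" step in the commit message.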
@rdoeffinger
Contributor

gpg just decrypts the data in the file; it doesn't deal with text encoding, and it neither can nor should (the linked GnuPG ticket is only about things like status/error messages, help text etc., not the actual file data we care about for this issue).
All you are testing by running gpg is whether the terminal program you are using uses UTF-8 encoding or not.
The pull request fixes the issue; I tested it with a few öäü characters which were broken before and fine after.

@rdoeffinger
Contributor

And another reminder: this is NOT a Windows-specific issue. It will happen with ANY non-UTF-8 locale, which a lot of old Red Hat 6 and similar installs still use. So some Linux users will be affected as well.

@tajnymag

tajnymag commented Aug 17, 2019

I'd like to report that this issue is still happening on a clean (sandbox) Windows 10 with Gpg4win.
