Broken encoding when pasting Cyrillic text #618

Closed
T-igra opened this issue Sep 27, 2023 · 45 comments

T-igra commented Sep 27, 2023

Broken encoding when pasting Cyrillic text copied from a web page.
1.8.0 Beta 6, Windows 10 (64)

image

Owner

dpradov commented Oct 6, 2023

Hello, could you give me a URL where I can test it? Please also indicate your regional configuration and the options of the note where you pasted the text.

Author

T-igra commented Oct 9, 2023

https://habr.com/ru/articles/691192/

I simply selected and copied (Ctrl+C) the text in Russian and then pasted (Ctrl+V) it into the note.

Owner

dpradov commented Nov 13, 2023

Hello @T-igra, I cannot reproduce the problem.
Could you show me the regional configuration of your computer (default codepage and language settings)?

Something like this (in my case):

image

image

Please also show me the content of 'Note properties':
image

Regards
Daniel

dpradov added a commit that referenced this issue Jan 29, 2024
- Right-click on the node -> Export
  - Exporting from plain text to RTF and HTML failed

- File | Export...
  - Exporting from plain text to RTF and HTML failed
  - Exporting from RTF was not successful if the system codepage was UTF8 (65001)
   (In W10 that codepage is enabled with the option "Beta: Use Unicode UTF-8 for worldwide language support")

  The current codepage of the system can be checked by looking in the Windows registry, for example with the
  following instruction from the command line (CMD):
   REG QUERY "HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage" -v ACP | find /I "ACP"

  The modification of the codepage can be done from the Windows configuration:
  #618 (comment)

  Ref:
  https://learn.microsoft.com/en-us/cpp/text/locales-and-code-pages?view=msvc-170
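
The active code page can also be read programmatically. The following is a minimal Python sketch, not part of KeyNote NF (which is written in Delphi); the helper name `active_code_page` is hypothetical. On Windows it calls the Win32 `GetACP` function, which reports the ANSI code page of the current process. On other systems it falls back to Python's preferred locale encoding, which is only an approximation of the same idea.

```python
import locale
import sys


def active_code_page() -> str:
    """Return a label such as 'cp1251' or 'cp65001' for the active ANSI code page."""
    if sys.platform == "win32":
        import ctypes  # imported lazily; ctypes.windll only exists on Windows

        # GetACP returns the ANSI code page identifier of the current process.
        return f"cp{ctypes.windll.kernel32.GetACP()}"
    # Non-Windows fallback (assumption: the locale encoding is the closest analogue).
    return locale.getpreferredencoding(False)


print(active_code_page())
```

A result of `cp65001` would correspond to the UTF-8 option mentioned above, and `cp1251` to a Cyrillic system locale.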

gregta commented May 8, 2024

Dear Daniel Prado Velasco!
In version 1.9.3.1 the encoding problem is still present. KeyNote NF, when capturing text in the original Unicode encoding to the clipboard (for example, on the page https://ru.wikipedia.org/wiki/Вега,_Лопе_де ), inserts the captured text in ANSI encoding. Your recommendations, unfortunately, do not help. Windows 8.1, 32-bit, Russian.
knt-encoding
So far I am using version 1.7.8.1, and there were no such problems there.
Additionally, when you try to change the font of a note, you receive an error warning.
knt-Text change


gregta commented May 8, 2024

Clipboard settings.
knt-options-clipboard

Owner

dpradov commented May 9, 2024

I'll try to see what could be happening. The problem is that you are using a very old (and unsupported) version of Windows. At the very least, that makes it difficult for me to reproduce, since I use W11 and I develop on a virtual machine with W10...

(Support for Windows 8.1 ended on January 10, 2023. After that date, no further security updates or bug fixes are provided for Windows 8.1)

Could you at least check whether it works for you in W10 or W11, in your language? That would confirm whether it is related to the version of Windows and not just the language.

> Additionally, when you try to change the font of a note, you receive an error warning.

It would also be of great help to me if you could verify whether this error occurs on a computer with W10 or W11. Could you check it on a computer with those versions of Windows? (and your language and regional configuration, of course)


gregta commented May 10, 2024 via email

Owner

dpradov commented May 10, 2024

Sorry, but it's not as simple as you indicate. Many things can cause problems when pasting text from a browser. First of all, if you look at the "News" note in the help file you will see that version 1.8.0 incorporated very important changes, starting with the use of a new, much more modern version of the compiler, at least adapted for the new versions of Windows. When you paste from a browser, KeyNote needs to convert the HTML text into RTF, and to do this it has to use a system library. There could be incompatibilities between that library and the changes that the application incorporated. Also, the version of the RichEdit control available in Windows 8.1 is much older than the one used in W10 and W11, and the subsequent adjustments in the application, tested against the new versions, could (perhaps) cause problems in some cases with earlier versions.
On the other hand, I needed to make many changes to the code to adapt it to the new version of the compiler, starting with the handling of the "string" type, and an endless number of other things.
That said, the application includes many corrections and improvements compared to version 1.7.8.

image

> My unprofessional logic when comparing versions tells me that it is this new part that is the deterioration of the function, because version 1.7.8.1 coped quite well with encodings when inserting text.

Well, it has nothing to do with these new features, but with the conversion from HTML and the way in which ANSI or Unicode is interpreted, which of course may be due to some error introduced by the new versions (but in something more internal).

> As for my Windows 8.1, I'm used to it and I like it better. I don't need updates, and security is provided by the antivirus. Therefore, I will not update version 8.1. I don't have versions 10 and 11.

It's your decision. I can assure you that there are vulnerabilities that an antivirus cannot prevent.

> At the bottom of my comment on the site there is an error (Unexpected Error) when trying to change the default font, which you did not mention in the letter.

Yes, I did answer you:

> > Additionally, when you try to change the font of a note, you receive an error warning.
>
> It would also be of great help to me if you could verify if this error occurs on a computer with W10 or W11. Could you check it on a computer with those versions of Windows? (and your language and regional configuration, of course)

I need you to try the same actions that give you these problems on a computer with W10 or W11 (from a friend or family member [*]), to confirm whether it is something related to the code page and the type of font you use, or to the Windows version (or perhaps both).

[*] Remember that it is not necessary to install Keynote NF. It can be run from a folder, portable.

Owner

dpradov commented May 10, 2024

> In addition to the above, I would like to offer you help in localizing
> KeyNote NF in Russian if you explain to me how to do this. I'm not a
> programmer, so the details of this are unclear to me.

There is information here:
image

But it is not yet possible, because I must first generate the file to be translated, and I have not done so because I still need to make many adjustments and I do not have time for it. For the moment I have preferred to make progress on other issues that I consider more important or urgent.

Thanks for the offer. When it is possible to do the translation I will let you know.

Owner

dpradov commented May 10, 2024

> Additionally, when you try to change the font of a note, you receive an error warning.

Does it happen after trying to paste text from a web page (with the encoding problem you mention)? Or at any time and in any folder? For example, if you create a new file and right at that moment you press F6 and change the font, do you get the error?

Owner

dpradov commented May 10, 2024

> when capturing text in the original Unicode encoding to the clipboard (for example, on the page https://ru.wikipedia.org/wiki/Вега,_Лопе_де

What browser (and version) do you use?

In the screenshot you show with information from that URL (data from Félix Lope de Vega) there is part that looks good and part that looks bad. How did you incorporate the part that looks good? Is it an image, a screenshot?

Owner

dpradov commented May 10, 2024

@Stefanoko, could you please check, with the latest version (1.9.3) running on your computer with Windows 7 (if you still have it), whether pasting text from the web page indicated by @gregta works for you (the text is in Russian, using UTF8 encoding as I see in the source code of the page)?
It works perfectly for me, but I am using W10 and W11, and I suspect that it may be something related to making the BOM explicit in the character string.
If I remember correctly (I would have to review the code), in current versions of RichEdit I needed to add the characters (BOM) that identify the text as UTF8 (in hexadecimal: EF, BB, BF).
Perhaps the problem is that the version of the RichEdit control available in W8.1 does not correctly handle those BOM characters. And the RichEdit version of W7 should be very similar to that of W8.1
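
For readers unfamiliar with the BOM mentioned above, here is a minimal Python sketch of what "adding the BOM" means at the byte level. It is only an illustration: the helper names are hypothetical, and it does not reproduce how KeyNote NF passes text to RichEdit.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def add_utf8_bom(data: bytes) -> bytes:
    """Prefix UTF-8 data with the BOM unless it is already there."""
    return data if data.startswith(UTF8_BOM) else UTF8_BOM + data


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8, dropping a leading BOM if one is present."""
    return data.decode("utf-8-sig")


payload = add_utf8_bom("Вега, Лопе де".encode("utf-8"))
print(payload[:3].hex())     # efbbbf
print(decode_utf8(payload))  # the original text, without the BOM
```

A consumer that does not recognise the BOM would treat those three bytes as ordinary characters, which is the kind of incompatibility suspected here.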

Thanks

image

Owner

dpradov commented May 10, 2024

@gregta
Can you do the following test?
Copy the paragraph you used as an example from the browser. Then paste it into WordPad. It should appear exclusively as plain text, without the various hyperlinks it contains.
Now copy the paragraph again from the browser and paste it into KeyNote NF. It doesn't matter right now that it looks bad. This operation should have forced the conversion from HTML to RTF, so on the clipboard you should now see, from KeyNote NF:

image

Or from WordPad:
image

Then try pasting into WordPad again. It should use the RTF format. You can also force it through Paste Special.
Does it appear as shown?
If the conversion was correct, it should paste the paragraph including the hyperlinks.

The aim is to find out whether the problem occurs in the conversion process or when copying the converted text into the note.

Owner

dpradov commented May 11, 2024

Good news. I have managed to reproduce the problem on W10, as @T-igra also pointed out.
On my virtual machine with W10 I installed the Russian language and also set it in the region/country and regional format settings. With these changes the application continued to work completely normally.
It was only after setting Russian as the system locale, from the following window:

image

that the behavior you describe was reproduced and the copied text stopped pasting correctly.
That setting is the one that "will be used when displaying text in programs that do not support Unicode".

This is great news because it allows me to analyze what is causing it and correct it.
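
As general background, one common way for text to end up garbled when the system locale changes is that bytes written in one encoding are read back under another. The Python sketch below only illustrates that mechanism with a hypothetical helper `misread`; whether this is exactly what happened inside KeyNote NF is not established in this thread.

```python
def misread(text: str, actual: str, assumed: str) -> str:
    """Encode text as `actual`, then decode the bytes as if they were `assumed`."""
    return text.encode(actual).decode(assumed, errors="replace")


original = "Вега"
garbled = misread(original, "utf-8", "cp1251")
print(garbled)  # no longer readable Russian

# The damage is reversible only while no byte was replaced during decoding.
restored = garbled.encode("cp1251").decode("utf-8")
print(restored == original)
```

Each Cyrillic letter takes two bytes in UTF-8, so the misread string has twice as many characters as the original.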


gregta commented May 11, 2024 via email

Owner

dpradov commented May 11, 2024

> > Does it happen after trying to paste text from a web page (with the encoding problem you mention)? Or at any time and in any folder? For example, if you create a new file and right at that moment you press F6 and change the font, do you get the error?
>
> NO - in all cases, i.e. it was never possible to reach the font change dialog.

And you also can't see the font dialog for any text selection using the toolbar button or the "Format | Font" menu option?
image


gregta commented May 11, 2024 via email

Owner

dpradov commented May 11, 2024

Sorry, I don't know if I understand you correctly. Do you mean that it also fails you using the menu and the toolbar button on the text?


gregta commented May 11, 2024 via email

Owner

dpradov commented May 11, 2024

I'm trying to solve this problem with pasting from the browser and it's not proving easy...
The same code, compiled from Delphi using Russian as the system locale, works perfectly on that computer, with that configuration. Pasted text is inserted perfectly as RTF.
However, the application compiled in this way, which works on the computer with the regional settings in Russian, does not work on my computer with the regional settings in Spanish. And vice versa.
I'm doing a lot of tests and so far nothing.
By the way, I had a question about how version 1.7.8.1 worked in this respect, since I remembered that the conversion from HTML to RTF did not go well and that it was only in more recent versions that I managed to do it (at least for languages that don't use Cyrillic...)

The thing is, I have tried to paste the paragraph from that page in Russian, with version 1.7.8.1, from my computer in Spanish and from the other in Russian, and in both it works fine but without converting to RTF, just offering it as plain text. And I think that it was not until version 1.7.9.4, and especially 1.8.0, that HTML was converted to RTF, so before that all the text formatting was lost (hyperlinks, font formatting, paragraph formatting, etc.)

Can you confirm that this is how 1.7.8 behaves on your computer, and that you really only paste plain text from the browser? Because if so, for now you would get the same result from the current version by pasting plain text explicitly (Ctrl+Shift+V or Ctrl+Shift+Insert)

Owner

dpradov commented May 11, 2024

> Yes, it fails, and it doesn't depend on using the menu or the toolbar.

This is stranger, and I cannot reproduce it on the computer (W10) where I have set Russian as the system locale


gregta commented May 11, 2024 via email

dpradov added a commit that referenced this issue May 11, 2024
…ti-byte character set

This affected, for example, Cyrillic text
It will now work regardless of whether UTF8 is the default codepage in your local/regional settings.

Ref: #618, #629, #609
Owner

dpradov commented May 12, 2024

Hello @gregta
One question, out of curiosity. When you browse a page whose address uses Russian characters, like the one you showed me, isn't there an easy way to get the address copied using the same Cyrillic characters shown? When I copy and paste the URL from the browser, I always get the encoded characters:

https://ru.wikipedia.org/wiki/%D0%92%D0%B5%D0%B3%D0%B0,_%D0%9B%D0%BE%D0%BF%D0%B5_%D0%B4%D0%B5

image

Likewise, if I capture a fragment of the page from KeyNote with Web Clip I get:

image

This happens to me whether I do it from my machine configured in Spanish or from the virtual machine with the system encoding in Russian (using Firefox and Edge)


gregta commented May 13, 2024 via email


gregta commented May 13, 2024 via email

dpradov added a commit that referenced this issue May 14, 2024
If True (1) it will manage %XX in URL as UTF8, finally converting the whole URL to ANSI or UTF8
depending on current codepage

Example:
https://ru.wikipedia.org/wiki/%D0%92%D0%B5%D0%B3%D0%B0,_%D0%9B%D0%BE%D0%BF%D0%B5_%D0%B4%D0%B5
->
https://ru.wikipedia.org/wiki/Вега,_Лопе_де

Certain characters that could be encoded with %XX won't be converted:

  ' ', '/', '?','!','''', '&', '%', '#', '$', '[', ']', '(',')', ',', ';', '*', ':', '@', '=', '+'

Ex: http://www.example.com/space%20here.html won't be modified

On "Insert URL" and "Choose Action for URL" dialogs, although URLWebDecode=1, you can force the URL not to be
modified pressing Shift when exiting URL field, or when clicking on OK (Insert URL) or Modify (Choose Action for URL).
Even if URL is not modified (because of Shift or URLWebDecode=0), when text URL is empty or equal to URL it will be set
with the decoded version of the URL field.
So, for example, if you pressed Shift while changing focus from "URL" to "Text", the URL field will maintain the %XX
characters, but the "Text" field will be set decoded. In the example, this way you can automatically have:
   URL: https://ru.wikipedia.org/wiki/%D0%92%D0%B5%D0%B3%D0%B0,_%D0%9B%D0%BE%D0%BF%D0%B5_%D0%B4%D0%B5
  Text: https://ru.wikipedia.org/wiki/Вега,_Лопе_де

When using ClipCap or Web Copy (Ctrl+W or Ctrl+Shift+W), the clip URL will be automatically adapted if URLWebDecode = 1.
Any other pasted hyperlinks, interspersed in the text, will be adapted only by opening and modifying via the
"Choose Action for URL" dialog box.

Ref: #618
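
The decoding rule described in this commit message can be sketched in a few lines of Python. This is an illustration of the rule as stated (decode %XX as UTF-8, but leave the listed characters encoded), not KeyNote NF's Delphi implementation. The names `url_web_decode` and `KEEP_ENCODED` are hypothetical, and the conversion to ANSI depending on the current codepage is left out.

```python
import re

# Characters that stay percent-encoded (the list given in the commit message).
KEEP_ENCODED = set(" /?!'&%#$[](),;*:@=+")

_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _decode_run(match: re.Match) -> str:
    text = match.group(0)
    raw = bytes.fromhex(text.replace("%", ""))
    pieces = []
    pending = bytearray()  # bytes waiting to be decoded together as UTF-8
    pending_text = []      # their original %XX spelling, kept as a fallback

    def flush() -> None:
        if pending:
            try:
                pieces.append(pending.decode("utf-8"))
            except UnicodeDecodeError:
                pieces.append("".join(pending_text))  # not valid UTF-8: leave as-is
            pending.clear()
            pending_text.clear()

    for i, byte in enumerate(raw):
        escape = text[3 * i : 3 * i + 3]
        if byte < 0x80 and chr(byte) in KEEP_ENCODED:
            flush()
            pieces.append(escape)  # protected character: keep the %XX form
        else:
            pending.append(byte)
            pending_text.append(escape)
    flush()
    return "".join(pieces)


def url_web_decode(url: str) -> str:
    """Decode %XX sequences as UTF-8, except for the protected characters."""
    return _ESCAPE_RUN.sub(_decode_run, url)


print(url_web_decode(
    "https://ru.wikipedia.org/wiki/"
    "%D0%92%D0%B5%D0%B3%D0%B0,_%D0%9B%D0%BE%D0%BF%D0%B5_%D0%B4%D0%B5"
))
print(url_web_decode("http://www.example.com/space%20here.html"))
```

The first call prints the URL with Cyrillic characters, and the second prints its input unchanged because `%20` encodes a protected character.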
Owner

dpradov commented May 14, 2024

> Additionally, when you try to change the font of a note, you receive an error warning.

@gregta, could you try version 1.9.0 and tell me whether that error still appears when opening the Fonts dialog window?
It occurs to me that the only thing that may be affecting you is a change I made in version 1.9.1, in response to issue #613:

Fixed: Setting the font size with the Font dialog when using scaling settings other than 100% results in a larger font than expected
There is an issue with TFontDialog in Delphi when using scaling settings other than 100%. This issue can cause the selected font to appear larger than expected.
The problem lies in how TFontDialog (and, in fact, the underlying Win32 ChooseFont API) handles DPI awareness.

I suspect that the change made, which works perfectly in W10 and W11, might not be compatible with W8.1
Can you try it and tell me? Please also confirm whether at any time you see the Font dialog window or whether, on the contrary,
the exception is raised before it is displayed.
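
As background on why DPI matters for font sizes: Win32 stores a font's height in device units, commonly computed as `-MulDiv(PointSize, DPI, 72)`. If a height computed for one DPI is read back assuming another, the resulting point size changes. The Python sketch below only illustrates that arithmetic. It is an assumption about the general mechanism, not a description of the change made in KeyNote NF, and the function names are hypothetical.

```python
def mul_div(a: int, b: int, c: int) -> int:
    """Integer a*b/c rounded to nearest, for non-negative inputs (like Win32 MulDiv)."""
    return (a * b + c // 2) // c


def points_to_height(point_size: int, dpi: int) -> int:
    """Logical font height (negative means character height) at a given DPI."""
    return -mul_div(point_size, dpi, 72)


def height_to_points(height: int, dpi: int) -> int:
    """Point size that a logical height corresponds to at a given DPI."""
    return mul_div(abs(height), 72, dpi)


height = points_to_height(12, 120)    # 12 pt at 125% scaling (120 DPI)
print(height)                         # -20
print(height_to_points(height, 120))  # 12
print(height_to_points(height, 96))   # 15
```

Under these assumptions, a 12 pt selection interpreted at the wrong DPI comes back as 15 pt, which matches the "larger font than expected" symptom in general terms.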

Owner

dpradov commented May 14, 2024

Another thing. Could you do the following simple test? I would need you to write "Вега,_Лопе_де" in WordPad. Once done, save it and open the .RTF file with Notepad. Please copy its content into a comment here, as I do below. I have tested it in the virtual machine with the system settings in Russian, and on my PC, in Spanish. In all cases the result is the same.
I would like to check whether my virtual machine is configured incorrectly (since I have several languages installed) or whether RTF by default records any non-ASCII character this way.

"Вега,_Лопе_де" ->

{\rtf1\ansi\ansicpg1251\deff0\nouicompat\deflang1049{\fonttbl{\f0\fnil\fcharset204 Calibri;}{\f1\fnil\fcharset0 Calibri;}}
{\*\generator Riched20 10.0.19041}\viewkind4\uc1 
\pard\sa200\sl276\slmult1\f0\fs22\'c2\'e5\'e3\'e0,_\'cb\'ee\'ef\'e5_\'e4\'e5\f1\lang10\par
}

In my own language, Spanish, the same happens with any non-ASCII character, such as an accented letter or "ñ", which is shown in a similar way: \'f1
In my language that is rare, limited mainly to accented letters and "ñ". But from the tests I've done, in Russian (and I assume many other MBCS languages) almost every character is saved in RTF that way.

"España" ->

{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang3082{\fonttbl{\f0\fnil\fcharset0 Calibri;}}
{\*\generator Riched20 10.0.22621}\viewkind4\uc1 
\pard\sa200\sl276\slmult1\f0\fs22\lang10 Espa\'f1a\par
}
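
The escaping seen in both RTF samples can be reproduced with a short Python sketch. It assumes a single-byte code page such as 1251 or 1252 and a hypothetical helper name `rtf_escape`. It mimics only the `\'xy` escaping, not everything WordPad writes.

```python
def rtf_escape(text: str, codepage: str) -> str:
    """Escape text for an RTF body using \\'xy for bytes outside ASCII 32..126."""
    out = []
    for byte in text.encode(codepage):
        char = chr(byte)
        if char in "\\{}":
            out.append("\\" + char)       # RTF control characters must be escaped
        elif 32 <= byte <= 126:
            out.append(char)              # printable ASCII passes through
        else:
            out.append(f"\\'{byte:02x}")  # everything else becomes \'xy
    return "".join(out)


print(rtf_escape("Вега,_Лопе_де", "cp1251"))  # \'c2\'e5\'e3\'e0,_\'cb\'ee\'ef\'e5_\'e4\'e5
print(rtf_escape("España", "cp1252"))         # Espa\'f1a
```

Both outputs match the text runs in the WordPad files above: every Cyrillic letter needs an escape in code page 1251, while in code page 1252 only "ñ" does.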

Owner

dpradov commented May 14, 2024

Curiously, if I write the following from Notepad, WordPad also recognizes it, but as soon as I make any modification, everything is converted to the previous format.
I'll have to look into it a little more.

{\rtf1\ansi\ansicpg1251\deff0\nouicompat\deflang1049{\fonttbl{\f0\fnil\fcharset204 Calibri;}{\f1\fnil\fcharset0 Calibri;}}
{\*\generator Riched20 10.0.19041}\viewkind4\uc1 
\pard\sa200\sl276\slmult1\f0\fs22\Вега,_Лопе_де\f1\lang10\par
}


gregta commented May 14, 2024 via email


gregta commented May 14, 2024 via email


gregta commented May 14, 2024 via email

Owner

dpradov commented May 14, 2024

> Yes, Daniel, in version 1.9.0.1 the font change dialog opens normally
> both in the toolbar and in the menu.

Perfect, thanks

dpradov added a commit that referenced this issue May 14, 2024
…#613

A change made in version 1.9.1, in response to issue #613, could be negatively affecting KeyNote when
running on Windows 8.1:

  Fixed: Setting the font size with the Font dialog when using scaling settings other than 100% results in a larger font than expected
  --
  There is an issue with TFontDialog in Delphi when using scaling settings other than 100%. This issue can cause the selected font to
  appear larger than expected.
  The problem lies in how TFontDialog (and, in fact, the underlying Win32 ChooseFont API) handles DPI awareness.

I suspect that maybe the change made, which works perfectly in W10 and W11, might not be compatible with W8.1

On Issue #618 it is confirmed that the exception that appears when opening the Fonts dialog (in W8.1) does not occur
in version 1.9.0 (last version before the change)

Ref: #618, #618 (comment)
Owner

dpradov commented May 14, 2024

As I have checked, by-the-book RTF is limited to newline plus the characters between ASCII 32 (space) and ASCII 126 (the “~” character).
I am lucky because the Spanish language (like English and others) uses ANSI codepage 1252 (which is basically Latin-1 with some characters added
between 128 and 159), and the main characters I use are in the range 32 - 126.

As I see it, the Russian characters in ANSI 1251 are above that range, and that is why they are written as \'xy

In RTF the keyword \ansicpgN (ex: \ansicpg1251) represents the default ANSI code page used to perform the Unicode to ANSI conversion when writing RTF text,
and to interpret the escape sequence \'xy, where xy is the two-digit hexadecimal representation of the character’s number in the corresponding codepage.

But it seems it is not possible to write characters above 127 directly, without an escape sequence. Although some RTF readers can understand them, after any modification the format
is reverted to the usual one, as I tested before.

RTF is already quite verbose, and using \'xy makes it even more so. It's nothing terrible either. In any case, it is always possible to use the compressed format in KeyNote to significantly reduce the file size. The Default or even the Fast compression level is more than enough
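
The interpretation of `\ansicpgN` described above can be sketched in Python as follows. This is a simplified illustration with a hypothetical function name. A real RTF reader also honours per-font `\fcharset` values and `\u` Unicode escapes, which are ignored here.

```python
import re

_HEX_RUN = re.compile(r"(?:\\'[0-9a-fA-F]{2})+")


def decode_rtf_hex_escapes(rtf: str, default_codepage: int = 1252) -> str:
    """Replace runs of \\'xy with the characters they denote in the \\ansicpg code page."""
    declared = re.search(r"\\ansicpg(\d+)", rtf)
    codepage = int(declared.group(1)) if declared else default_codepage

    def decode_run(match: re.Match) -> str:
        raw = bytes.fromhex(match.group(0).replace("\\'", ""))
        return raw.decode(f"cp{codepage}")

    return _HEX_RUN.sub(decode_run, rtf)


sample = r"{\rtf1\ansi\ansicpg1251 \'c2\'e5\'e3\'e0,_\'cb\'ee\'ef\'e5_\'e4\'e5}"
print(decode_rtf_hex_escapes(sample))
```

The same escape bytes decoded under a different code page would yield different characters, which is why the declared code page matters.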


gregta commented May 15, 2024 via email


gregta commented May 15, 2024 via email


gregta commented May 18, 2024 via email

Owner

dpradov commented May 18, 2024

> I created a new file in the new version and was surprised why I didn't see the tree panel (see screenshot).

I don't see any screenshot.
Could you attach the new test file you created? You can add it to a comment by including it in a .zip


gregta commented May 18, 2024 via email

Owner

dpradov commented May 18, 2024

Sorry, but the images or files that you attach by email do not seem to be included, because I do not see them.
Anyway, you don't need to send me anything. I just saw that it's happening to me too... I had not noticed.
It must be related to "*Fixed: tree panel width could be reduced on restart (if it were wide enough)"

I'll look into it and correct it.
I will upload a version that fixes it in a while


gregta commented May 18, 2024 via email

Owner

dpradov commented May 18, 2024

While I upload the patch, I recommend that you take a look at the improvements that the new version includes. If you have been working with version 1.7.8, you will see that there are many new features, for example with image management.
The help file is completely updated and has a lot of information.

dpradov added a commit that referenced this issue May 18, 2024
 Correction over commit f028bb2 :
 (Fixed: Tree panel width could be reduced on restart (if it was wide enough))

Ref: #618 (comment)

gregta commented May 18, 2024 via email

Owner

dpradov commented May 18, 2024

It is now resolved and the new version (patch) uploaded.
Thanks for letting me know


gregta commented May 18, 2024 via email


gregta commented May 19, 2024 via email
