utf-8 encoding problem, ???????? instead of Thai letters #8

Open
skytizens opened this issue Nov 12, 2020 · 6 comments

Comments

@skytizens

Hi, I have a problem decoding a QR code that contains Thai characters. Example below:

qr code decoded by zxing wrapper: "ABCC051 ?????????????????????????"
qr code decoded by zbar wrapper: "ABCC051 เอกสารประกอบการขอสินเชื่อ"

Could you please recommend a way to fix this issue?

Best regards and thank you

@dlenski
Owner

dlenski commented Nov 12, 2020

Give more information about how you're decoding and encoding.

It's quite likely that the problem is on the encoding side; the encoder may not be correctly signalling, via an ECI (Extended Channel Interpretation) designator, the character set used for the Thai text.

Because many QR encoders (mis)behave by failing to mark the character set appropriately, ZXing includes logic to try to guess it, but it only knows how to distinguish UTF-8, ShiftJIS, and ISO-8859-1.
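To illustrate why a guess limited to those three charsets goes wrong for Thai, here is a deliberately simplified Python sketch. It is a stand-in, not ZXing's actual heuristic (which is statistical): the hypothetical `naive_guess` helper just tries each charset strictly, in order, and keeps the first one that decodes without error.

```python
from typing import Tuple

# Simplified stand-in for a charset guesser that only knows three charsets.
# This is NOT ZXing's real algorithm; it only shows the failure mode.
CANDIDATES = ("utf-8", "shift_jis", "iso-8859-1")


def naive_guess(raw: bytes) -> Tuple[str, str]:
    """Return (charset, text) for the first candidate that decodes strictly."""
    for charset in CANDIDATES:
        try:
            return charset, raw.decode(charset)
        except UnicodeDecodeError:
            continue
    # Not reached in practice: ISO-8859-1 maps every byte to some character.
    raise ValueError("no candidate charset could decode the payload")


thai = "เอกสารประกอบการขอสินเชื่อ"

# Payload encoded as UTF-8: the first candidate succeeds and the text survives.
print(naive_guess(thai.encode("utf-8")))

# Payload encoded as ISO-8859-11 (Thai) with no ECI: UTF-8 fails, and the
# guess falls through to a non-Thai charset, producing garbage text.
print(naive_guess(thai.encode("iso8859_11")))
```

Whichever non-Thai charset wins, the Thai text is lost; only an ECI in the barcode (or an explicit hint to the decoder) avoids the guess.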

@skytizens
Author

Hello Daniel, sorry for the late answer.
It's hard for me to answer your question. I'm just using Python 3, calling your method to get TEXT output from zxing and using zbar at the same time.
From what I see on Google, someone else had a similar problem:
zxing/zxing#708

Is it possible to force/set encoding via API?

Best regards,
David

@dlenski
Owner

dlenski commented Nov 14, 2020

> It's hard for me to answer your question. I'm just using Python 3, calling your method to get TEXT output from zxing and using zbar at the same time.

Please show a complete working example of the code including how the barcode is encoded.

> From what I see on google someone else had a similar problem: zxing/zxing#708

Yes. And as described in zxing/zxing#708 (comment), the problem is almost certainly with the encoding of the barcode.

In that case, the QR code was encoded in a Chinese character set, but without including an ECI code to inform the decoder of this, which contravenes the standard for how QR codes are supposed to be encoded.

My educated guess is that your case is similar: perhaps your barcode's contents are encoded in ISO-8859-11 (Thai charset) but also without including a corresponding ECI code.
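A quick way to see why the missing ECI matters: the same Thai string produces entirely different bytes in UTF-8 and in ISO-8859-11 (Python codec name `iso8859_11`). The sample word below is an arbitrary choice, not taken from the reporter's barcode.

```python
# Same Thai text, two encodings. A decoder that is not told (via ECI) which
# one was used sees completely different byte sequences.
text = "สวัสดี"  # arbitrary sample word, not from the issue

utf8_bytes = text.encode("utf-8")       # 3 bytes per Thai character
tis_bytes = text.encode("iso8859_11")   # 1 byte per Thai character

print(len(text), len(utf8_bytes), len(tis_bytes))  # 6 18 6
print(utf8_bytes.hex(" "))
print(tis_bytes.hex(" "))  # ca c7 d1 ca b4 d5

# Reading the ISO-8859-11 bytes as UTF-8 fails outright.
try:
    tis_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```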

> Is it possible to force/set encoding via API?

In the encoder? I have no idea… it depends on what encoder you are using. 🤷‍♂️ (If the encoder is also ZXing, then yes it is certainly possible to set the correct ECI code.)

If you are asking whether there is a way to band-aid the ZXing decoder so that it uses the intended character set of the barcode, despite the absence of the required ECI code, then yes, you can do this in the Java API by passing a DecodeHintType.CHARACTER_SET to the reader.

There is no way to do this via the Python module currently. Patching the ZXing CLI to accept a charset hint would make it possible.
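The Python module has no equivalent of `DecodeHintType.CHARACTER_SET`, but when a decoder has read ISO-8859-11 bytes as ISO-8859-1 (Latin-1), the text can sometimes be repaired after the fact by re-encoding and re-decoding. A minimal sketch with a hypothetical helper name, `reinterpret`; it assumes the mis-decode was Latin-1 and lossless, so text that already contains `?` replacement characters, as in the original report, cannot be recovered this way.

```python
def reinterpret(text: str, wrong: str = "latin-1", right: str = "iso8859_11") -> str:
    """Re-decode text that was decoded with the wrong single-byte charset.

    Hypothetical helper: it recovers the original bytes by encoding with the
    charset the decoder wrongly used, then decodes them with the intended one.
    """
    try:
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not a clean round trip: leave the text untouched.
        return text


thai = "เอกสารประกอบการขอสินเชื่อ"
mojibake = thai.encode("iso8859_11").decode("latin-1")  # simulate the mis-decode
print(reinterpret(mojibake) == thai)  # True
print(reinterpret("ABCC051 ????"))    # '?' placeholders stay as they are
```

This is a band-aid on the reading side; the proper fix is still an ECI in the barcode.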

@skytizens
Author

skytizens commented Nov 15, 2020

Hello Daniel, please find an example of the source code and a QR code example below:

```python
import pyzbar.pyzbar as pyzbar
import zxing
from PIL import Image

reader = zxing.BarCodeReader()


def readBarcodeZ(im):
    """Decode all barcodes in a PIL image with zbar (via pyzbar)."""
    decodedObjects = pyzbar.decode(im)
    allbarcodes = []
    for obj in decodedObjects:
        # obj.data is raw bytes; decode it as UTF-8 text.
        allbarcodes.append(dict(type=obj.type, data=obj.data.decode('utf-8'), rect=obj.rect))
    return allbarcodes


def readBarcodeX(im):
    """Decode a barcode with ZXing (via the zxing module)."""
    decodedObject = reader.decode(im)
    return decodedObject


imageFile = "qrcode-example-thai.png"
im = Image.open(imageFile)
im = im.convert('LA')  # grayscale + alpha, passed to pyzbar
imBarcodeZ = readBarcodeZ(im)
imBarcodeX = readBarcodeX(imageFile)  # ZXing is given the file name here

print("Zbar:", imBarcodeZ)
print("Zxing:", imBarcodeX)
```

[Image: qrcode-example-thai]

We use UTF-8 as the code page for encoding Thai characters. It looks like the easiest way is to just add an ECI code when creating the QR code.
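At the bit level, "adding an ECI code" means prefixing the data with an ECI segment. Below is a minimal, stdlib-only Python sketch of that header as laid out in ISO/IEC 18004: mode indicator `0111`, the ECI assignment number (26 for UTF-8, 13 for ISO-8859-11), then an ordinary byte-mode segment. The helper names `eci_designator` and `eci_byte_segment` are hypothetical, not from any library, and the sketch assumes byte mode and stops before padding and error correction, so it illustrates the layout rather than replacing a real encoder.

```python
# Hypothetical sketch: data bits for an ECI header followed by a byte-mode
# segment. NOT a full encoder: terminator, padding, error correction and
# masking are omitted.

ECI_ISO_8859_11 = 13  # ECI assignment number for ISO-8859-11 (Thai)
ECI_UTF8 = 26         # ECI assignment number for UTF-8


def eci_designator(assignment: int) -> str:
    """Encode an ECI assignment number as 8, 16 or 24 bits."""
    if not 0 <= assignment <= 999999:
        raise ValueError("ECI assignment number out of range")
    if assignment <= 127:
        return "0" + format(assignment, "07b")     # 0xxxxxxx
    if assignment <= 16383:
        return "10" + format(assignment, "014b")   # 10xxxxxx xxxxxxxx
    return "110" + format(assignment, "021b")      # 110xxxxx + 2 more bytes


def eci_byte_segment(text: str, charset: str, assignment: int, version: int = 1) -> str:
    """Bits for: ECI mode, designator, byte mode, byte count, payload."""
    data = text.encode(charset)
    count_bits = 8 if version <= 9 else 16  # byte-mode count width by version
    if len(data) >= 1 << count_bits:
        raise ValueError("payload too long for this version's count field")
    bits = "0111" + eci_designator(assignment)             # ECI mode indicator
    bits += "0100" + format(len(data), f"0{count_bits}b")  # byte mode + count
    bits += "".join(format(b, "08b") for b in data)        # payload bytes
    return bits


# Announce a UTF-8 payload with ECI 26.
segment = eci_byte_segment("สวัสดี", "utf-8", ECI_UTF8)
print(segment[:24])  # 011100011010010000010010
```

In practice you would enable the equivalent ECI option in whatever encoder library produces the QR image rather than assemble bits by hand.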

@skytizens
Author

Daniel, just an update to my previous comment: I ran a test on this site:
https://zxing.org/w/decode.jspx

and the decoded result looks good. Please find a screenshot below.
[Screenshot: 2020-11-16_08h44_08]

@dlenski
Owner

dlenski commented Nov 16, 2020

> Hello Daniel, Please find example of the source code and qrcode example below:

When I run this code, with this example image, it works as intended… that's with 5bc2e3f, Python 3.6.9 on Linux with locale/charset of en_US.UTF-8, and ZXing java libraries v3.4.1.

Zxing: BarCode(raw='TEST-THAI2-วันนี้คุณเป็นอย่างไรบ้าง?', parsed='TEST-THAI2-วันนี้คุณเป็นอย่างไรบ้าง?', format='QR_CODE', type='TEXT', points=[(231.5, 522.0), (231.5, 308.5), (445.0, 308.5), (423.5, 500.5)])

I'm not sure what you're expecting me to be able to tell you. You haven't given me enough information.

  1. You didn't include the output of your example code. It works for me. Does it not work for you? I have no idea. 🤷‍♂️
  2. The code and barcode image you showed are clearly not the same as the ones from the initial post. I can't read barcodes or interpret source code which I've never seen. 🤷‍♂️
