
Add an optional 'ignore', 'replace' or 'backslashreplace' error-handling parameter to the string decoding function #258

@rudaporto

Description


I've run into problems with some records stored in a MySQL database.
When I fetch data from one specific table, a UnicodeDecodeError is raised at this line:

https://github.com/PyMySQL/mysqlclient-python/blob/3de469db63c5da47d61ac79bba5dadc0d22bde5c/MySQLdb/connections.py#L231

I know the problem is caused by malformed data stored in MySQL: it cannot be decoded as UTF-8 because it was incorrectly encoded in the first place.
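A minimal reproduction, assuming the bad rows were written as Latin-1 bytes but are read back as UTF-8 (the sample value is made up for illustration, not taken from the affected table):

```python
# Simulate a value that was stored as Latin-1 but is decoded as UTF-8.
raw = "café".encode("latin-1")  # b'caf\xe9'

error = None
try:
    raw.decode("utf-8")  # default errors='strict', as in the current driver
except UnicodeDecodeError as exc:
    # 0xE9 is the lead byte of a 3-byte UTF-8 sequence, but no continuation
    # bytes follow it, so strict decoding fails.
    error = exc

print(f"{error.reason} at byte {error.start}")
```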

It would be useful if the driver offered a better way to handle these cases, for example a connection option that sets the error handler used when decoding strings: 'strict', 'ignore', 'replace' or 'backslashreplace'.
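For reference, this is how Python's built-in error handlers treat the same invalid bytes. It is a sketch using plain `bytes.decode`, not driver code; note that 'backslashreplace' is only supported for decoding on Python 3.5 and later:

```python
raw = b"caf\xe9"  # Latin-1 bytes for "café", invalid as UTF-8

results = {}
for handler in ("ignore", "replace", "backslashreplace"):
    # 'ignore' drops the bad byte, 'replace' inserts U+FFFD,
    # 'backslashreplace' keeps it visible as a \xNN escape.
    results[handler] = raw.decode("utf-8", errors=handler)
    print(f"{handler:>16}: {results[handler]!r}")
```

'replace' keeps the row readable while still marking where data was lost; 'backslashreplace' preserves the original byte values, which can help when repairing the data later.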

Specifically, I propose adding the following option to the connection class:

    :param str decode_errors='strict':
        Error handler used when decoding strings. The default is 'strict';
        'ignore' and 'replace' are also accepted in PY2, and in PY3 (3.5+)
        it can also be set to 'backslashreplace'.
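A rough sketch of how the option could be validated up front. `SketchConnection` is a hypothetical stand-in, not the real MySQLdb connection class, and checking the name with `codecs.lookup_error` is an assumed approach for rejecting typos early:

```python
import codecs


class SketchConnection:
    """Hypothetical stand-in for a connection that accepts decode_errors."""

    def __init__(self, encoding="utf-8", decode_errors="strict"):
        # codecs.lookup_error raises LookupError for an unknown handler name,
        # so a typo fails at connect time instead of on the first fetch.
        # It accepts any registered handler, including encode-only ones such
        # as 'xmlcharrefreplace', so a stricter whitelist may be preferable.
        codecs.lookup_error(decode_errors)
        self.encoding = encoding
        self.decode_errors = decode_errors


conn = SketchConnection(decode_errors="replace")

rejected = False
try:
    SketchConnection(decode_errors="skip")  # not a registered handler
except LookupError:
    rejected = True

print(conn.decode_errors, rejected)
```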

And the string_decoder implementation would be:

    # db and decode_errors are assumed to be captured from the enclosing
    # Connection.__init__ scope.
    def string_decoder(s):
        return s.decode(db.encoding, errors=decode_errors)
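The closure can be exercised outside the driver with a small factory. `make_string_decoder` is a hypothetical helper; unlike the proposed code, it takes the encoding and handler as arguments instead of reading them from `db`:

```python
def make_string_decoder(encoding, decode_errors="strict"):
    """Return a converter that decodes bytes with the chosen error handler.

    Hypothetical helper for illustration; not part of mysqlclient.
    """

    def string_decoder(s):
        return s.decode(encoding, errors=decode_errors)

    return string_decoder


strict = make_string_decoder("utf-8")
lenient = make_string_decoder("utf-8", decode_errors="replace")

print(strict("café".encode("utf-8")))  # valid UTF-8 decodes normally
print(lenient(b"caf\xe9"))             # invalid byte becomes U+FFFD
```

One design difference: the proposed closure reads `db.encoding` on every call, which presumably lets a later `set_character_set()` take effect, whereas this sketch fixes the encoding when the decoder is created.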

