This repository was archived by the owner on May 17, 2024. It is now read-only.


@dlawin
Contributor

@dlawin dlawin commented Sep 23, 2023

For some MySQL users, `SELECT DATA_TYPE FROM information_schema.columns` is returning `b'str'` (bytes) instead of `str`.

It seems that collation can affect the `QueryResult` types returned by mysql-connector-python: strings may come back as `bytes` (or possibly `bytearray`) when a `_bin` collation is in use. In addition, `charset="utf8"` is actually an alias (for `utf8mb3`, the 3-byte subset of UTF-8), which can cause further inconsistencies.
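A client-side workaround (not what this PR does; the PR addresses the problem at the connection level) would be to decode any `bytes`/`bytearray` values before using them. A minimal sketch, assuming UTF-8 as the encoding; `decode_row` is a hypothetical helper, and it should only be applied to text columns such as `DATA_TYPE`, since genuinely binary columns (e.g. `BLOB`) also arrive as bytes.

```python
from typing import Any, Sequence, Tuple


def decode_row(row: Sequence[Any], encoding: str = "utf-8") -> Tuple[Any, ...]:
    """Decode bytes/bytearray values in a result row to str; leave other values untouched."""
    return tuple(
        value.decode(encoding) if isinstance(value, (bytes, bytearray)) else value
        for value in row
    )


# DATA_TYPE values as they might arrive when a _bin collation is in effect.
rows = [(b"varchar",), (bytearray(b"int"),), ("datetime",)]
print([decode_row(row)[0] for row in rows])  # ['varchar', 'int', 'datetime']
```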

See:

This PR explicitly sets the encoding and collation used by the client, which will hopefully resolve #603. I haven't been able to reproduce that exact scenario, but I can reproduce a similar one by changing the collation locally.
[Screenshot: 2023-09-22, 8:02:56 PM]

Note the column-level collation setting for `DATA_TYPE`. Setting the collation in the client overrides this and any other such settings.
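The PR applies the collation with `set_charset_collation()` after connecting. Connector/Python also accepts `charset` and `collation` as connection arguments, so a variant could pin both up front. A minimal sketch; `pinned_connect_kwargs` and `connect` are hypothetical names, and letting the pinned values override user-supplied ones is a design assumption.

```python
from typing import Any, Dict


def pinned_connect_kwargs(user_args: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: merge user args with a pinned client charset/collation."""
    pinned = {
        "charset": "utf8mb4",               # real 4-byte UTF-8 rather than the utf8 (utf8mb3) alias
        "collation": "utf8mb4_0900_ai_ci",  # only exists on MySQL 8.0.1 and later
        "use_unicode": True,                # decode text results to str where the connector can
    }
    # Later keys win, so pinned values replace any user-supplied duplicates.
    return {**user_args, **pinned}


def connect(user_args: Dict[str, Any]):
    import mysql.connector  # lazy import: only needed when a real connection is opened
    return mysql.connector.connect(**pinned_connect_kwargs(user_args))
```

Merging into a single dict also avoids the `TypeError` Python raises when the same keyword (say, `charset`) is passed both explicitly and through `**self._args`.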

@dlawin dlawin requested review from nolar and vvkh September 23, 2023 02:17
@dlawin dlawin self-assigned this Sep 23, 2023
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
```diff
 try:
-    return mysql.connect(charset="utf8", use_unicode=True, **self._args)
+    conn = mysql.connect(charset="utf8mb4", use_unicode=True, **self._args)
+    conn.set_charset_collation(charset="utf8mb4", collation="utf8mb4_0900_ai_ci")
```
Contributor


This collation was added in MySQL 8.0.1 (the 8.0 series became generally available in 2018). In the typical migration/replication use cases that data-diff serves, we should expect much older versions, since the goal is often to move off them. Should we make sure that the absence of this collation does not raise an error a few lines below?
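One way to avoid that error is to ask the server which `utf8mb4` collations it has and fall back when `utf8mb4_0900_ai_ci` is missing. A minimal sketch under assumptions: `pick_collation` and `connect_with_fallback` are hypothetical names, and the preference order (falling back to `utf8mb4_general_ci`) is an illustrative choice, not something settled in this PR.

```python
from typing import Iterable, Sequence

# Assumed preference order: the MySQL 8.0.1+ UCA collation first, then
# utf8mb4_general_ci, which utf8mb4-capable MySQL and MariaDB servers also provide.
PREFERRED_UTF8MB4_COLLATIONS = ("utf8mb4_0900_ai_ci", "utf8mb4_general_ci")


def pick_collation(available: Iterable[str],
                   preferred: Sequence[str] = PREFERRED_UTF8MB4_COLLATIONS) -> str:
    """Return the first preferred collation that the server reports as available."""
    available_set = set(available)
    for name in preferred:
        if name in available_set:
            return name
    raise LookupError(f"server supports none of {list(preferred)}")


def connect_with_fallback(**args):
    """Hypothetical replacement for the patched connect code; not data-diff's API."""
    import mysql.connector  # lazy import: only needed when a real connection is opened

    conn = mysql.connector.connect(charset="utf8mb4", use_unicode=True, **args)
    cur = conn.cursor()
    cur.execute(
        "SELECT COLLATION_NAME FROM information_schema.COLLATIONS "
        "WHERE CHARACTER_SET_NAME = 'utf8mb4'"
    )
    # Decode defensively: this is the same bytes-vs-str issue the PR is about.
    names = [
        row[0].decode() if isinstance(row[0], (bytes, bytearray)) else row[0]
        for row in cur.fetchall()
    ]
    cur.close()
    conn.set_charset_collation(charset="utf8mb4", collation=pick_collation(names))
    return conn
```

Querying the server's own list avoids guessing from version strings; MariaDB's 10.x numbering, for example, says nothing about whether this MySQL-specific collation exists.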

And a second question: `ci` means "case-insensitive" (and `ai` means "accent-insensitive"). Does this make bisection by textual PKs, as well as `WHERE` filtering, case-insensitive by default? That might come as a surprise to users.
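A rough illustration of the concern, using Python's `str.casefold` as a stand-in for a `_ci` collation. This is only an approximation: `utf8mb4_0900_ai_ci` uses Unicode Collation Algorithm weights and also ignores accents. Keys that differ only in case sort into different positions and can collapse into one, which would shift bisection boundaries and range filters.

```python
# Binary (case-sensitive) vs. approximate case-insensitive handling of text keys.
keys = ["apple", "Banana", "APPLE", "cherry"]

binary_order = sorted(keys)                  # code-point order, similar to a _bin collation
ci_order = sorted(keys, key=str.casefold)    # rough stand-in for a _ci collation
ci_distinct = {k.casefold() for k in keys}   # keys a _ci comparison would treat as equal

print(binary_order)            # ['APPLE', 'Banana', 'apple', 'cherry']
print(ci_order)                # ['apple', 'APPLE', 'Banana', 'cherry']
print(len(keys), len(ci_distinct))  # 4 3
```

Where case-sensitive behavior is needed on a `utf8mb4` column, MySQL accepts an explicit `COLLATE utf8mb4_bin` on the comparison expression.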

Contributor Author


@nolar I need to look into this more deeply, but from what I can tell, all of the default collations are `_ci`.

@github-actions
Contributor

This pull request has been marked as stale because it has been open for 60 days with no activity. If you would like the pull request to remain open, please comment on the pull request and it will be added to the triage queue. Otherwise, it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues/PRs that have gone stale label Nov 28, 2023
@github-actions
Contributor

github-actions bot commented Dec 5, 2023

Although we are closing this pull request as stale, it's not gone forever. PRs can be reopened if there is renewed community interest. Just add a comment and it will be reopened for triage.

@github-actions github-actions bot closed this Dec 5, 2023

Labels

stale Issues/PRs that have gone stale

Development

Successfully merging this pull request may close these issues.

TypeError when running for MySQL

3 participants