aaronsteers changed the title from "🐛 Bug: Warnings looks like errors" to "🐛 Bug: Warnings looks like errors: Could not determine airbyte type from JSON schema" on May 1, 2024
We've received this a few times now - when PyAirbyte cannot determine a type with confidence, it prints a warning message, but users are often led to believe it is fatal. We should improve the messaging here so it is clearer that this is a warning - or we should remove the messaging altogether.
Raised (most recently) here:
Background:
Whenever PyAirbyte determines a column type, it uses a heuristic to try to determine the "best" column type given the input data type. However, PyAirbyte always applies a "failsafe" data type when none can be determined otherwise. This is normally a string, JSON, or Variant column type, and in most cases the sync will succeed regardless - with no real loss of fidelity.
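The fallback behavior described above can be sketched roughly as follows. This is a simplified illustration, not PyAirbyte's actual implementation: the function name `resolve_column_type`, the `SIMPLE_TYPE_MAP` table, and the `FALLBACK_TYPE` constant are all hypothetical, and the real heuristic is likely more involved.

```python
from typing import Any, Dict, Tuple

# Hypothetical mapping from simple JSON schema types to SQL column types.
SIMPLE_TYPE_MAP: Dict[str, str] = {
    "string": "VARCHAR",
    "integer": "BIGINT",
    "number": "DOUBLE",
    "boolean": "BOOLEAN",
}

# Hypothetical fail-safe type, applied when no better match is found.
FALLBACK_TYPE = "VARCHAR"


def resolve_column_type(json_schema: Dict[str, Any]) -> Tuple[str, bool]:
    """Return (sql_type, used_fallback) for a JSON schema property definition."""
    declared = json_schema.get("type")
    if isinstance(declared, str) and declared in SIMPLE_TYPE_MAP:
        return SIMPLE_TYPE_MAP[declared], False
    # Ambiguous or unrecognized definitions get the fail-safe type
    # instead of raising, so the sync can continue.
    return FALLBACK_TYPE, True


print(resolve_column_type({"type": "integer"}))
print(resolve_column_type({"anyOf": [{"type": "string"}, {"type": "object"}]}))
```

The second return value records whether the fail-safe was applied, which is the point at which a warning (rather than an error) would be emitted.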
Example cases:
A classic example of this is a data type that is defined as `anyOf(string, object)`. Since databases generally don't support a column type that would equally support either a string or an object (aka dictionary/struct), the cache will determine the best-available failover type, which would probably be `VARCHAR` - meaning object-type inputs would be converted to string. This works fine in practice; while we might lose some "data type fidelity" we generally have zero loss of "data fidelity".
Clarified messaging:
The right thing to do here (apart from adding more advanced heuristics) would be to improve messaging so that the user knows that this is not a failure condition, and that we do have a valid typing fail-safe that we are applying for this condition. Alternatively, we could eliminate this messaging entirely.
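One possible shape for the clearer messaging, sketched with Python's standard `warnings` module. The `TypeFallbackWarning` category, the helper names, and the message wording are all hypothetical suggestions, not existing PyAirbyte code.

```python
import warnings


class TypeFallbackWarning(UserWarning):
    """Hypothetical warning category for non-fatal type fallbacks."""


def fallback_message(column_name: str, fallback_type: str) -> str:
    """Build a message that states the fallback and that it is not an error."""
    return (
        f"Warning (non-fatal): could not determine a specific type for column "
        f"'{column_name}' from its JSON schema. Falling back to {fallback_type}. "
        "The sync will continue using this fail-safe type."
    )


def warn_type_fallback(column_name: str, fallback_type: str) -> None:
    """Emit the fallback notice as a warning rather than raising."""
    warnings.warn(
        fallback_message(column_name, fallback_type),
        TypeFallbackWarning,
        stacklevel=2,
    )


warn_type_fallback("metadata", "VARCHAR")
```

Using a dedicated warning category would also let users who find the message noisy silence it with `warnings.filterwarnings("ignore", category=TypeFallbackWarning)`, which covers the "eliminate this messaging" alternative without removing it for everyone.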
cc @marcosmarxm , @bindipankhudi