You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
It is possible to "corrupt" data coming out of the tap by inputting incorrectly cased values in the config file. While it does not directly affect the tap itself, it can lead to problems downstream (we've just hit this problem).
I am not sure what the correct solution is, so I'm opening this ticket for discussion and awareness.
In config, set "repositories": ["meltanolabs/tap-github"] (lowercase M and L on purpose) and run the tap deselecting the repositories stream, and keeping at least one child stream, issues for instance.
The tap will run fine, since the github api will handle the case inconsistencies in the request and return data containing MeltanoLabs instead of meltanolabs.
The obvious solution to this problem is to avoid incorrectly cased input in the config, but this is fairly fragile. Another option would be to change the way the child streams get their org/repo values by parsing the url field for instance (which is received from github and matches the "official" repo name casing), but this seems fairly costly performance-wise.
Are there any best practices around this in other taps?
The text was updated successfully, but these errors were encountered:
Hi, @laurentS. Thanks for raising. I definitely think this is worth fixing but I'm not sure as of yet what the best fix would be.
This is happening, I think, because MeltanoLabs/tap-github is both a config value and also a data element. While the source API may not be case aware or case sensitive, we certainly should expect the output data to be.
A less than ideal solution would be to just throw a hard failure when casing doesn't match up. But perhaps more gracious would be to defer/coerce to the datasets' casing rules if possible.
It is possible to "corrupt" data coming out of the tap by inputting incorrectly cased values in the config file. While it does not directly affect the tap itself, it can lead to problems downstream (we've just hit this problem).
I am not sure what the correct solution is, so I'm opening this ticket for discussion and awareness.
In config, set
"repositories": ["meltanolabs/tap-github"]
(lowercaseM
andL
on purpose) and run the tap deselecting therepositories
stream, and keeping at least one child stream,issues
for instance.The tap will run fine, since the github api will handle the case inconsistencies in the request and return data containing
MeltanoLabs
instead ofmeltanolabs
.But because of these lines https://github.com/MeltanoLabs/tap-github/blob/0851e80578bbb17f69faa4c6b2220d30ed1802c5/tap_github/repository_streams.py#L103-107 the output from the tap will contain
org=meltanolabs
butfull_name=MeltanoLabs/tap-github
(note the difference in upper/lowercase chars). If you use theorg
andrepo
columns to join records from different streams together, you can end up with inconsistent values which are hard to reconcile.The obvious solution to this problem is to avoid incorrectly cased input in the config, but this is fairly fragile. Another option would be to change the way the child streams get their
org/repo
values by parsing theurl
field for instance (which is received from github and matches the "official" repo name casing), but this seems fairly costly performance-wise.Are there any best practices around this in other taps?
The text was updated successfully, but these errors were encountered: