|
Thanks for adding this! I merged it and will redeploy the docs. What do you think the solution for this is? Should we have a callback to rewrite file paths to match? |
|
I guess that may be the correct answer, if you change authorities you should probably re-write all old path information? We were tossing around other ideas of just comparing scheme + path, but I keep imagining HDFS-like implementations which put metadata in other fields. Like one could imagine file://path#someOtherMetadataThatIsUserSpecific screwing everything up again ... |
|
@rdblue @RussellSpitzer @manishmalhotrawork It sounds reasonable to me to just look at scheme and path parts of URI. That way, we should be safe w.r.t. changing authorities or query params (is it even possible?). I think we definitely need a fix for this. |
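A minimal sketch of the "scheme and path only" comparison suggested above, written in plain Python with the standard library. The function name `same_location` and the sample locations are hypothetical and are not taken from the actual change; this only illustrates which URI components would be ignored.

```python
from urllib.parse import urlsplit


def same_location(actual: str, referenced: str) -> bool:
    """Return True when two locations share the same scheme and path.

    Authority, query, and fragment are deliberately ignored, so a change
    of authority or extra metadata in those parts does not break the match.
    """
    a = urlsplit(actual)
    b = urlsplit(referenced)
    return (a.scheme, a.path) == (b.scheme, b.path)


# Hypothetical locations that differ only in authority.
print(same_location(
    "hdfs://nn1:8020/warehouse/t/data/f.parquet",
    "hdfs://nn2:8020/warehouse/t/data/f.parquet",
))
```

One caveat with this approach: for schemes where the authority identifies the storage container (for example the bucket in `s3://bucket/key`), dropping the authority would also treat files in different containers as the same file.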
|
@rdblue @RussellSpitzer @manishmalhotrawork Right now, we have a UDF to produce file names that we use in the join condition. What about having a UDF that would take 2 strings (one actual location and one location referenced in the metadata) and produce a boolean to indicate whether they match or not. Inside the UDF, we can construct |
|
Having a UDF that accepts columns from two relations does not eliminate the cross join. I guess we have two options:
|
@aokolnychyi thanks for adding details. Yeah, our internal discussion and the internal PR I have are similar to this. I believe not considering So, either we can avoid checking the scheme as well, or have a flag to control whether it is considered. |
@aokolnychyi yeah, this is almost the same as the change I made; I will be raising a PR shortly. |
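The flag idea could look roughly like the sketch below. It is an assumption for illustration only: `location_key`, `locations_match`, and the `include_scheme` parameter are hypothetical names, not the API of the PR mentioned here. Each location is reduced to a key independently, so the match stays a plain equality on one value per side.

```python
from urllib.parse import urlsplit


def location_key(location: str, include_scheme: bool = True) -> str:
    """Reduce a location to a comparable key, dropping the authority.

    With include_scheme=False only the path is kept (hypothetical flag).
    """
    parts = urlsplit(location)
    if include_scheme and parts.scheme:
        return f"{parts.scheme}:{parts.path}"
    return parts.path


def locations_match(actual: str, referenced: str, include_scheme: bool = True) -> bool:
    """Compare two locations by their keys."""
    return location_key(actual, include_scheme) == location_key(referenced, include_scheme)


# Hypothetical locations that differ only in scheme.
print(locations_match("s3a://b/w/f.parquet", "s3n://b/w/f.parquet", include_scheme=False))
```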
@rdblue here is the warning we discussed in the ASF Slack