Closed
@@ -359,7 +359,18 @@ package object expressions {

case ambiguousReferences =>
// More than one match.
- val referenceNames = ambiguousReferences.map(_.qualifiedName).mkString(", ")
+ var referenceNames = ""
+ if (ambiguousReferences.map(_.qualifiedName).toSet.size == 1) {
Contributor

Can you add a test?

Contributor

Also, SQLQueryTestSuite is failing, so you may want to check.

Contributor Author

I have updated the relevant out files under sql/core/src/test/resources/sql-tests/results/postgreSQL.

+ val sz = ambiguousReferences.size
+ var i = 0
+ for (ref <- ambiguousReferences) {
+ i = i + 1
+ referenceNames += ref.qualifiedName + "#" + ref.exprId.id
+ if (i < sz) referenceNames += ", "
+ }
+ } else {
+ referenceNames = ambiguousReferences.map(_.qualifiedName).mkString(", ")
+ }
throw new AnalysisException(s"Reference '$name' is ambiguous, could be: $referenceNames.")
}
}
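
For reference, the same message-building logic can be written without the mutable `var` and the manual index counter. This is only a sketch: `Ref` is a hypothetical stand-in for the Catalyst attribute type, with `exprId` flattened to a plain `Long` in place of `ref.exprId.id`, and `AmbiguityMessage.build` is an illustrative name, not something in the PR.

```scala
object AmbiguityMessage {
  // Hypothetical stand-in for the attribute type; only the two fields used here are modeled.
  final case class Ref(qualifiedName: String, exprId: Long)

  def build(name: String, refs: Seq[Ref]): String = {
    // Append the expression ID only when every candidate prints the same qualified name,
    // mirroring the `toSet.size == 1` check in the diff.
    val allSameName = refs.map(_.qualifiedName).distinct.size == 1
    val referenceNames = refs.map { r =>
      if (allSameName) s"${r.qualifiedName}#${r.exprId}" else r.qualifiedName
    }.mkString(", ")
    s"Reference '$name' is ambiguous, could be: $referenceNames."
  }
}
```

Using `map` with `mkString` places the separators itself, so the `i < sz` bookkeeping is no longer needed.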
@@ -3225,7 +3225,7 @@ select * from
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
- Reference 'f1' is ambiguous, could be: j.f1, j.f1.; line 2 pos 63
+ Reference 'f1' is ambiguous, could be: j.f1#x, j.f1#x.; line 2 pos 63
Contributor

Adding the attr ID doesn't seem to help much. Users still don't know how to fix the query (is it un-fixable?).

Contributor Author

For the JSON path expression case, one expression comes from the filter and the other from the projection.

      String query = "SELECT id, address, get_json_string(phone, '$.key[1].m[2].b') as key " +
                    "FROM mycatalog.test.person " +
                    "WHERE get_json_string(phone, '$.key[1].m[2].b') >= '100' order by id limit 2";

get_json_string produces the json path expression.

See innerResolve() of sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:

2021-02-21 04:04:00,467 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] inner Resolving 'phone->'key'->1->'m'->2->>'b' to phone->'key'->1->'m'->2->>'b'#25

I would welcome input from people who are familiar with the analyzer.

Member

I have the same feeling as @cloud-fan. I think other databases show a message with the same granularity in this case, e.g.,

postgres=# create table t1 (id int);
CREATE TABLE
postgres=# create table t2 (id int);
CREATE TABLE
postgres=# select * from t1, t2 where id = id;
ERROR:  column reference "id" is ambiguous
LINE 1: select * from t1, t2 where id = id;

Contributor Author

The above example from postgres doesn't apply to the json path case because there is only one table.

Member

Does https://github.com/apache/spark/pull/31613/files#r581342459 answer @cloud-fan's question?

So with the attr ID added, how does it help in the case you show? I think that is the point we care about.

BTW, what is get_json_string? Do you mean get_json_object?

Contributor Author (@tedyu, Feb 24, 2021)

Considering this snippet of the physical plan:

      +- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->>'b'#10, phone->'key'->1->'m'->2->'b'#12] Cassandra Scan: test.person

multiple JSON path expressions would each be accompanied by their ExprId.id, so it would be easier to match the reference (with exprId.id) given in the AnalysisException to the corresponding expression.

W.r.t. get_json_string: it is a function interpreted by a Spark extension, which translates its arguments into a JSON path expression.

Contributor

It doesn't help because end-users can't specify attr id when referring to the column. If a table/relation has duplicated column names, I think the only way out is to get the column by position, e.g. df.select(Column(df.logicalPlan.output(2))), and attr id doesn't matter.
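
To make the name-versus-position point concrete, here is a toy model in plain Scala. It does not use Spark's API: `Attr`, `byName`, and `byPosition` are hypothetical names chosen for illustration, and the IDs are arbitrary.

```scala
object PositionalLookup {
  // Hypothetical stand-in for an output attribute: a display name plus an internal ID.
  final case class Attr(name: String, exprId: Long)

  // Name-based lookup returns every attribute with a matching name,
  // so duplicated names yield more than one candidate.
  def byName(output: Seq[Attr], name: String): Seq[Attr] =
    output.filter(_.name == name)

  // Position-based lookup identifies at most one attribute, whatever its name or ID.
  def byPosition(output: Seq[Attr], index: Int): Option[Attr] =
    output.lift(index)
}
```

With an output of `Seq(Attr("id", 6L), Attr("f1", 10L), Attr("f1", 12L))`, `byName(output, "f1")` yields two candidates, while `byPosition(output, 2)` picks exactly one. The internal ID plays no part in either lookup, which is why printing it gives the user nothing to act on.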



-- !query
@@ -3253,7 +3253,7 @@ select * from
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
- Reference 'f1' is ambiguous, could be: j.f1, j.f1.; line 2 pos 72
+ Reference 'f1' is ambiguous, could be: j.f1#x, j.f1#x.; line 2 pos 72


-- !query
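
The `#x` suffixes in the updated result files suggest that the test harness masks expression IDs before comparing output. The sketch below assumes the masking is a regex substitution of this shape; the exact rule used by SQLQueryTestSuite is not shown in this diff, and `NormalizeExprIds` is an illustrative name.

```scala
object NormalizeExprIds {
  // Assumed normalization: replace every "#<digits>" suffix with "#x"
  // so that run-dependent IDs do not cause spurious golden-file diffs.
  def normalize(line: String): String =
    line.replaceAll("#\\d+", "#x")
}
```

Under that assumption, two distinct IDs such as `j.f1#12` and `j.f1#34` both collapse to `j.f1#x`, so the golden files cannot check which IDs the message actually reports.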