Current Behavior
Exception: AnalysisException: Reference 'ID' is ambiguous, could be: ID, ID.
Steps to Reproduce (for bugs)
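import dbldatagen as dg
from pyspark.sql import SparkSession, types as T

# Get or create the Spark session used by the generator.
spark = SparkSession.builder.getOrCreate()
# Adding a column named "ID" triggers the AnalysisException on build().
dg.DataGenerator(spark, rows=100, partitions=2).withColumn("ID", T.StringType()).build()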
Context
I see that the problem is here
It can be worked around by renaming my column "ID" to "ID_" before generation and renaming it back afterwards, but that looks a little creepy for production... Why not use a less common name for the internal ID column? Something like datagen__technical__inner__id, for example?
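For anyone hitting the same thing, the rename workaround could look roughly like the sketch below. `plan_renames` and `build_with_workaround` are hypothetical helper names, not part of dbldatagen. The sketch assumes the generator's internal column is called `id`, that Spark resolves column names case-insensitively (its default), and that the built DataFrame no longer contains the internal column, so renaming back is safe.

```python
def plan_renames(column_names, reserved=("id",)):
    """Map each column that collides with a reserved name to a temporary name.

    `reserved` is an assumption about the generator's internal column name.
    Comparison is case-insensitive, matching Spark's default resolution.
    """
    reserved_lower = {r.lower() for r in reserved}
    used = {c.lower() for c in column_names} | reserved_lower
    renames = {}
    for name in column_names:
        if name.lower() in reserved_lower:
            temp = name + "_"
            # Keep appending underscores until the temporary name is unique.
            while temp.lower() in used:
                temp += "_"
            used.add(temp.lower())
            renames[name] = temp
    return renames


def build_with_workaround(spark, column_types, rows=100, partitions=2):
    """Generate data under temporary names, then restore the requested names.

    `column_types` maps column name -> Spark data type, e.g. {"ID": StringType()}.
    """
    # Imported lazily so plan_renames can be used without Spark installed.
    import dbldatagen as dg

    renames = plan_renames(list(column_types))
    gen = dg.DataGenerator(spark, rows=rows, partitions=partitions)
    for name, col_type in column_types.items():
        gen = gen.withColumn(renames.get(name, name), col_type)
    df = gen.build()
    # Rename the temporary columns back to the names the caller asked for.
    for original, temp in renames.items():
        df = df.withColumnRenamed(temp, original)
    return df
```

With this, `build_with_workaround(spark, {"ID": T.StringType()})` would generate the column as "ID_" and return it as "ID". It still feels like something the library should handle itself.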
Your Environment
dbldatagen version used: 0.2.0rc1
Databricks Runtime version: 10.4 LTS
Cloud environment used: AWS
Best to make it configurable. Or at least raise an exception at the point where the column is added... Because it is really not obvious what is wrong when you get AnalysisException: Reference 'ID' is ambiguous, could be: ID, ID. ID is a very commonly used name. We have such a column in each GDWH table, for example. Thank you!
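To illustrate the kind of early check I mean, here is a minimal sketch in plain Python. `ReservedColumnNameError` and `validate_column_name` are hypothetical names, not existing dbldatagen API, and the default internal name `id` is an assumption about the current implementation. Making `internal_id_column` a parameter also shows how the name could be made configurable.

```python
class ReservedColumnNameError(ValueError):
    """Raised when a user column would clash with the internal ID column."""


def validate_column_name(name, internal_id_column="id", case_sensitive=False):
    """Fail fast if `name` collides with the generator's internal column.

    Hypothetical helper: `internal_id_column` defaults to "id" as an
    assumption. Spark compares column names case-insensitively unless
    spark.sql.caseSensitive is enabled, hence the `case_sensitive` flag.
    """
    if case_sensitive:
        clash = name == internal_id_column
    else:
        clash = name.lower() == internal_id_column.lower()
    if clash:
        raise ReservedColumnNameError(
            f"Column name {name!r} conflicts with the internal column "
            f"{internal_id_column!r}; rename it or configure a different "
            f"internal column name."
        )
    return name
```

Called when a column is added, this would reject "ID" immediately with a readable message instead of failing later in build(). With `internal_id_column="datagen__technical__inner__id"`, a user column called "ID" would pass.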
Expected Behavior
Generation of column with name "ID".