Problem with column which is named "ID" #107

SemyonSinchenko · 2022-07-22T15:36:40Z

Expected Behavior

Generation of column with name "ID".

Current Behavior

Exception:
AnalysisException: Reference 'ID' is ambiguous, could be: ID, ID.

Steps to Reproduce (for bugs)

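import dbldatagen as dg
from pyspark.sql import SparkSession
from pyspark.sql import types as T

spark = SparkSession.builder.getOrCreate()

# Fails with AnalysisException: the generator's internal seed column "id"
# clashes with "ID" because Spark resolves column names case-insensitively by default.
dg.DataGenerator(spark, rows=100, partitions=2).withColumn("ID", T.StringType()).build()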

Context

I see that the problem is here
It can be worked around by renaming my "ID" column to "ID_" before generation and renaming it back afterwards, but that looks a little hacky for production. Why not use a less commonly used name for the internal ID column, for example datagen__technical__inner__id?
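The rename workaround can be sketched in plain Python as a planning step. The helper name `plan_renames` and the `_` suffix are hypothetical and not part of dbldatagen; the sketch assumes Spark's default case-insensitive name resolution, so any name equal to `id` ignoring case is treated as a clash.

```python
# Hypothetical helper, not part of dbldatagen: choose temporary names for
# columns that clash with the internal seed column, and record how to
# restore the original names after generation.
from typing import Dict, List, Tuple


def plan_renames(columns: List[str], seed_column: str = "id",
                 suffix: str = "_") -> Tuple[List[str], Dict[str, str]]:
    """Return (names_to_generate, restore_map).

    Spark resolves column names case-insensitively by default, so "ID",
    "Id" and "id" are all treated as clashing with the seed column "id".
    """
    taken = {c.lower() for c in columns}
    names_to_generate: List[str] = []
    restore_map: Dict[str, str] = {}
    for name in columns:
        if name.lower() == seed_column.lower():
            temp = name + suffix
            # Keep appending the suffix until the temporary name is unused.
            while temp.lower() in taken:
                temp += suffix
            taken.add(temp.lower())
            names_to_generate.append(temp)
            restore_map[temp] = name
        else:
            names_to_generate.append(name)
    return names_to_generate, restore_map
```

After `build()`, each entry of `restore_map` would be applied with PySpark's `df.withColumnRenamed(temp, original)`. This only automates the workaround; it does not remove the underlying clash.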

Your Environment

dbldatagen version used: 0.2.0rc1
Databricks Runtime version: 10.4 LTS
Cloud environment used: AWS

ronanstokes-db · 2022-08-25T10:42:33Z

ID and id are reserved column names. We can look at making this configurable if needed, but in the current release these are reserved for system use.

SemyonSinchenko · 2022-08-27T20:55:38Z

Best to make it configurable, or at least raise an exception at the stage where the column is added. Getting AnalysisException: Reference 'ID' is ambiguous, could be: ID, ID. is really not obvious, and ID is a very commonly used name; for example, we have such a column in every GDWH table. Thank you!
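The "raise at the stage of column adding" suggestion amounts to an eager name check. A minimal sketch follows; `check_column_name`, `ReservedColumnNameError` and `RESERVED_SEED_COLUMN` are hypothetical names, not dbldatagen's API, and the case-insensitive comparison assumes Spark's default `spark.sql.caseSensitive=false`.

```python
# Hypothetical sketch, not dbldatagen's API: fail fast with a readable error
# when a user column would collide with the reserved seed column, instead of
# surfacing a Spark AnalysisException only at build time.
RESERVED_SEED_COLUMN = "id"  # assumption: reserved seed column name


class ReservedColumnNameError(ValueError):
    """Raised when a requested column name collides with the seed column."""


def check_column_name(name: str, reserved: str = RESERVED_SEED_COLUMN) -> str:
    # With Spark's default case-insensitive resolution, "ID" and "id"
    # refer to the same column, so compare lower-cased names.
    if name.lower() == reserved.lower():
        raise ReservedColumnNameError(
            f"Column name {name!r} clashes with the reserved seed column "
            f"{reserved!r}; choose a different name."
        )
    return name
```

Calling a check like this inside the column-adding method would turn the late, ambiguous Spark error into an immediate one that names the conflict.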

ronanstokes-db · 2022-10-04T00:41:38Z

Will add a fix in two phases:

phase 1: warn when a column named id is added
phase 2: allow renaming of the seed column
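The two-phase plan can be modelled with a toy builder. `ToyGenerator`, `seed_column_name` and `with_column` are hypothetical and do not describe how dbldatagen actually implemented the fix; the sketch only shows a warning on a case-insensitive clash (phase 1) plus a configurable seed name (phase 2).

```python
# Toy model of the two-phase plan; ToyGenerator and its parameters are
# hypothetical and do not reflect dbldatagen's real implementation.
import warnings
from typing import List


class ToyGenerator:
    def __init__(self, seed_column_name: str = "id") -> None:
        # Phase 2: the internal seed column name is configurable.
        self.seed_column_name = seed_column_name
        self.columns: List[str] = []

    def with_column(self, name: str) -> "ToyGenerator":
        # Phase 1: warn when a user column matches the seed column name,
        # ignoring case as Spark does by default.
        if name.lower() == self.seed_column_name.lower():
            warnings.warn(
                f"Column {name!r} matches the seed column "
                f"{self.seed_column_name!r}; consider renaming the seed column.",
                UserWarning,
                stacklevel=2,
            )
        self.columns.append(name)
        return self
```

For example, `ToyGenerator(seed_column_name="_seed_id").with_column("ID")` adds "ID" without a warning, since the seed column no longer shares its name; `_seed_id` is just an illustrative choice, in the spirit of the datagen__technical__inner__id suggestion above.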

ronanstokes-db self-assigned this Oct 4, 2022

ronanstokes-db linked a pull request Oct 4, 2022 that will close this issue

Feature id fixes #114 (merged)

ronanstokes-db added the enhancement (New feature or request) and bug (Something isn't working) labels Oct 4, 2022

ronanstokes-db added this to the v0.2.2 milestone Oct 22, 2022

ronanstokes-db modified the milestones: v0.2.2, v0.3.1 Dec 1, 2022

ronanstokes-db closed this as completed in #114 Feb 2, 2023