Problem with column which is named "ID" #107

Closed
SemyonSinchenko opened this issue Jul 22, 2022 · 3 comments · Fixed by #114

@SemyonSinchenko

Expected Behavior

Generation of a column named "ID".

Current Behavior

Exception:
AnalysisException: Reference 'ID' is ambiguous, could be: ID, ID.

Steps to Reproduce (for bugs)

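import dbldatagen as dg
from pyspark.sql import SparkSession, types as T

# Reuse the active session (or create one) and bind it to a name
spark = SparkSession.builder.getOrCreate()

# Fails with AnalysisException: Reference 'ID' is ambiguous
dg.DataGenerator(spark, rows=100, partitions=2).withColumn("ID", T.StringType()).build()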

Context

I see that the problem is here
It can be worked around by renaming my column "ID" to "ID_" before generation and renaming it back afterwards, but that looks a little creepy for production... Why not use a less commonly used name for the internal ID column? Something like datagen__technical__inner__id, for example?
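
The rename workaround described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `temp_name` and `build_with_reserved_column` are hypothetical, the `_` suffix is the one suggested in this comment, and the sketch assumes the internal id column is not part of the generated output, so renaming back does not recreate the clash.

```python
def temp_name(name, reserved=("id",), suffix="_"):
    """Return a temporary column name that avoids the reserved names.

    The comparison is case-insensitive because the reported clash
    involves an upper-case "ID".
    """
    reserved_lower = {r.lower() for r in reserved}
    candidate = name
    while candidate.lower() in reserved_lower:
        candidate += suffix
    return candidate


def build_with_reserved_column(spark, name="ID", rows=100, partitions=2):
    """Generate data under a temporary name, then rename the column back."""
    # Imported lazily so temp_name can be used without Spark installed.
    import dbldatagen as dg
    from pyspark.sql import types as T

    tmp = temp_name(name)  # "ID" becomes "ID_"
    df = (
        dg.DataGenerator(spark, rows=rows, partitions=partitions)
        .withColumn(tmp, T.StringType())
        .build()
    )
    # Rename the temporary column back to the requested name.
    return df.withColumnRenamed(tmp, name)
```

Calling `build_with_reserved_column(spark)` on a cluster with dbldatagen installed would be expected to return a DataFrame with an "ID" column, which is the behavior the workaround aims for.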

Your Environment

  • dbldatagen version used: 0.2.0rc1
  • Databricks Runtime version: 10.4 LTS
  • Cloud environment used: AWS
@ronanstokes-db
Contributor

ID and id are reserved column names. We can look at making this configurable if needed, but in the current release they are reserved for system use.

@SemyonSinchenko
Author

SemyonSinchenko commented Aug 27, 2022

It would be best to make it configurable, or at least to raise an exception at the point where the column is added, because getting AnalysisException: Reference 'ID' is ambiguous, could be: ID, ID. is really non-obvious. ID is a very commonly used column name; we have such a column in every GDWH table, for example. Thank you!
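
A check at column-adding time, as requested here, might look roughly like the sketch below. It is not dbldatagen code: `RESERVED_COLUMN_NAMES` and `check_column_name` are hypothetical names, and the case-insensitive comparison reflects Spark's default behavior of resolving column names without regard to case.

```python
# Assumed reserved set, based on the maintainer's comment in this thread.
RESERVED_COLUMN_NAMES = frozenset({"id"})


def check_column_name(name, existing=()):
    """Raise a descriptive ValueError instead of a late AnalysisException."""
    lowered = name.lower()
    if lowered in RESERVED_COLUMN_NAMES:
        raise ValueError(
            f"Column name {name!r} is reserved for internal use; "
            "choose another name or rename the column after generation."
        )
    if lowered in {e.lower() for e in existing}:
        raise ValueError(f"Column name {name!r} duplicates an existing column.")
    return name
```

With such a check, adding a column named "ID" would fail immediately with a message that names the reserved column, rather than failing later during the build.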

@ronanstokes-db
Contributor

We will add a fix in two phases:

  • phase 1: warn when a column named id is added
  • phase 2: allow renaming of the seed column
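
The two phases could be sketched as below. This is a toy model, not the actual fix: `MiniSpec`, `seed_column`, and `with_column` are hypothetical names chosen for illustration and are not taken from dbldatagen.

```python
import warnings


class MiniSpec:
    """Toy stand-in for a data generation spec (not the dbldatagen class)."""

    def __init__(self, seed_column="id"):
        # Phase 2: the internal seed column name is configurable.
        self.seed_column = seed_column
        self.columns = []

    def with_column(self, name):
        # Phase 1: warn when a user column clashes with the seed column.
        if name.lower() == self.seed_column.lower():
            warnings.warn(
                f"Column {name!r} clashes with the seed column "
                f"{self.seed_column!r}; consider renaming the seed column.",
                UserWarning,
                stacklevel=2,
            )
        self.columns.append(name)
        return self
```

In this sketch, `MiniSpec().with_column("ID")` emits a warning, while `MiniSpec(seed_column="_seed_id").with_column("ID")` does not, because the seed column no longer shares a name with the user column.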

ronanstokes-db self-assigned this Oct 4, 2022
ronanstokes-db linked a pull request Oct 4, 2022 that will close this issue
ronanstokes-db added the enhancement and bug labels Oct 4, 2022
ronanstokes-db added this to the v0.2.2 milestone Oct 22, 2022
ronanstokes-db modified the milestones: v0.2.2, v0.3.1 Dec 1, 2022