PHOENIX-2196 - auto capitalize field names in DF->Phoenix table save method #114

Closed
wants to merge 2 commits into from


randerzander

No description provided.

@randerzander
Author

For example, when reading a CSV file whose headers aren't capitalized, the following is necessary before calling df.save:

var df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("mode", "DROPMALFORMED").load(input)
val columns = df.columns.map(x => x.toUpperCase)
df = df.toDF(columns:_*)

It would be nice for end users if the .save implementation handled this detail.

@JamesRTaylor
Contributor

Thanks for the pull request. Phoenix, like other RDBMSs, treats a column identifier as case sensitive if it is double quoted; otherwise the identifier is upper-cased, as you've mentioned. It would be good if the phoenix-spark integration could accommodate this too. If it can't, we may need to keep column names case sensitive, since otherwise the case-sensitive use case can't be supported.

@randerzander
Author

If I understand correctly, this change should work:

Field names that start and end with double quotes are left untouched. Unquoted field names are auto-capitalized.

Does this work better?
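
The rule described above can be sketched roughly as follows. This is only an illustration, not the code from the patch: `FieldNames` and `normalizeFieldName` are hypothetical names, and the sketch assumes "quoted" means the name both starts and ends with a double quote.

```scala
import java.util.Locale

object FieldNames {
  // Hypothetical helper, not the actual patch.
  // A name wrapped in double quotes is returned unchanged;
  // any other name is upper-cased.
  def normalizeFieldName(name: String): String = {
    val quoted = name.length >= 2 && name.startsWith("\"") && name.endsWith("\"")
    if (quoted) name else name.toUpperCase(Locale.ROOT)
  }
}
```

Under these assumptions, mapping the helper over the column names `id`, `"v"` and `Col1` would yield `ID`, `"v"` and `COL1`, and the result could be passed to `df.toDF(columns: _*)` as in the earlier workaround.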

@JamesRTaylor
Contributor

Yes, I think that would work. We'd need unit tests around this too, please. The function we go through to normalize identifier/column references is SchemaUtil.normalizeIdentifier(String identifier). Here's a SQL example (which would work fine):
{code}
CREATE TABLE "t" (id VARCHAR PRIMARY KEY, "v" VARCHAR);
UPSERT INTO "t" (ID, "v") VALUES ('a','b');
SELECT ID, "v" FROM "t";
SELECT id, "v" FROM "t";
{code}
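
To make the SQL example concrete, here is a small model of how the identifiers in it resolve. It is a sketch of the general quoting rule only and does not call Phoenix: `IdentifierResolution`, `resolve`, `storedColumns` and `exists` are hypothetical names, and the behaviour of `SchemaUtil.normalizeIdentifier` itself should be checked against the Phoenix source.

```scala
import java.util.Locale

object IdentifierResolution {
  // Hypothetical model of identifier resolution, not Phoenix code.
  // A double-quoted identifier keeps its exact case (quotes dropped);
  // an unquoted identifier is folded to upper case.
  def resolve(identifier: String): String = {
    val quoted = identifier.length >= 2 &&
      identifier.startsWith("\"") && identifier.endsWith("\"")
    if (quoted) identifier.substring(1, identifier.length - 1)
    else identifier.toUpperCase(Locale.ROOT)
  }

  // Columns as declared in: CREATE TABLE "t" (id VARCHAR PRIMARY KEY, "v" VARCHAR)
  val storedColumns: Set[String] = Set("id", "\"v\"").map(resolve)

  // True when a column reference in a query matches a declared column.
  def exists(reference: String): Boolean =
    storedColumns.contains(resolve(reference))
}
```

In this model `ID`, `id` and `"v"` all match a declared column, which is why both SELECT statements work, while an unquoted `v` or a quoted `"id"` would not match.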

Would you have some cycles to review this pull, @jmahonin?

@jmahonin
Contributor

Sure thing. At a quick glance on mobile this looks good, but I'll try to spend some time with it tomorrow.

@jmahonin
Contributor

@randerzander Not sure if you saw, but I've got a new version of your patch up at https://issues.apache.org/jira/browse/PHOENIX-2196

If you could take a quick look and let me know if that works for you, I'll get it in ASAP.

@taoshide

When saving data from a Spark DataFrame, how can a SEQUENCE be used with the saveToPhoenix method?

@stoty
Contributor

stoty commented Aug 1, 2023

Already merged.

@stoty closed this on Aug 1, 2023