PHOENIX-2196 - auto capitalize field names in DF->Phoenix table save method #114

randerzander · 2015-08-23T18:20:04Z

No description provided.

randerzander · 2015-08-23T18:25:40Z

For example, when reading a CSV file whose headers aren't capitalized, the following is necessary before calling df.save:

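// Read the CSV with a header row; spark-csv drops malformed lines via the "mode" option
var df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("mode", "DROPMALFORMED").load(input)
// Upper-case every column name so it matches Phoenix's unquoted identifiers
val columns = df.columns.map(x => x.toUpperCase)
df = df.toDF(columns: _*)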

It would be nice for end users if the .save implementation handled this detail.

JamesRTaylor · 2015-08-23T18:59:12Z

Thanks for the pull request. One feature of Phoenix (and other RDBMSs) is that a double-quoted column identifier is treated as case sensitive; otherwise it's upper-cased, as you've mentioned. It would be good if we could accommodate this in the phoenix-spark integration too. If we can't, we may need to keep it case sensitive, since otherwise we can't support the case-sensitive use case.

randerzander · 2015-08-23T19:20:06Z

If I understand correctly, this change should work:

Any field names with starting and ending quotes are left untouched. If they aren't quoted, they're auto-capitalized.

Does this work better?
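
The rule described above (quoted names untouched, unquoted names upper-cased) can be sketched roughly as follows. This is only an illustration: `FieldNames`, `isQuoted`, `normalize`, and `normalizeAll` are hypothetical names, not the code in this pull request or the version later attached to PHOENIX-2196.

```scala
import java.util.Locale

// Hypothetical helper illustrating the proposed rule; not the merged patch.
object FieldNames {
  // A name counts as quoted when it both starts and ends with a double quote.
  def isQuoted(name: String): Boolean =
    name.length >= 2 && name.startsWith("\"") && name.endsWith("\"")

  // Leave quoted names untouched; upper-case everything else.
  def normalize(name: String): String =
    if (isQuoted(name)) name else name.toUpperCase(Locale.ROOT)

  // Apply the rule to a whole list of column names.
  def normalizeAll(names: Seq[String]): Seq[String] = names.map(normalize)
}
```

With a helper like this, the manual workaround would become something like `df.toDF(FieldNames.normalizeAll(df.columns): _*)`. Using `Locale.ROOT` is a choice made here so the result does not depend on the JVM's default locale.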

JamesRTaylor · 2015-08-23T19:55:00Z

Yes, I think that would work. We'd need unit tests around this too, please. The function we go through to normalize identifier/column references is SchemaUtil.normalizeIdentifier(String identifier). Here's a SQL example (which would work fine):
{code}
CREATE TABLE "t" (id VARCHAR PRIMARY KEY, "v" VARCHAR);
UPSERT INTO "t" (ID, "v") VALUES ('a','b');
SELECT ID, "v" FROM "t";
SELECT id, "v" FROM "t";
{code}

Would you have some cycles to review this pull, @jmahonin?

jmahonin · 2015-08-23T21:30:12Z

Sure thing. On a quick glance on mobile this looks good, but I'll try to spend some time with it tomorrow.

jmahonin · 2015-08-25T15:34:07Z

@randerzander Not sure if you saw, but I've got a new version of your patch up at https://issues.apache.org/jira/browse/PHOENIX-2196

If you could take a quick look and let me know if that works for you, I'll get it in ASAP.

taoshide · 2016-02-24T02:07:42Z

When saving data from a Spark DataFrame, how can I use a SEQUENCE when calling the saveToPhoenix method?

stoty · 2023-08-01T12:41:08Z

Already merged.

auto capitalize DF field names in DataFrameFunctions save method

c32d65c

Preserved quoted field names, else capitalize field names

6053879

stoty closed this Aug 1, 2023
