[CARBONDATA-4325] Update dataframe supported options in documentation and fix partition table creation with dataframe spatial property

Why is this PR needed?
1. Only specific properties are supported through dataframe options, and the documentation needs to be updated to list them.
2. Creating a partition table fails with the spatial index property for a carbon table created with a dataframe in spark-shell.

What changes were proposed in this PR?
1. Added the dataframe-supported properties to the documentation.
2. When the table is created through spark-shell with a carbon session, catalogTable.properties is empty, so the spatial index property is now read from catalogTable.storage.properties instead.
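For reference, a minimal spark-shell sketch of the affected scenario (the table, column, and index names are illustrative and not taken from the actual test):

```scala
// Hypothetical reproduction: write a partitioned carbon table from a dataframe
// with spatial index options set through the DataFrameWriter.
// (spark-shell auto-imports spark.implicits._ for toDF)
import org.apache.spark.sql.SaveMode

val df = Seq((116285807L, 40084087L, "2016-02-04"))
  .toDF("longitude", "latitude", "event_date")

df.write.format("carbon")
  .option("SPATIAL_INDEX", "mygeohash")
  .option("SPATIAL_INDEX.mygeohash.type", "geohash")
  .option("SPATIAL_INDEX.mygeohash.sourcecolumns", "longitude, latitude")
  .option("SPATIAL_INDEX.mygeohash.originLatitude", "39.832277")
  .option("SPATIAL_INDEX.mygeohash.gridSize", "50")
  .option("SPATIAL_INDEX.mygeohash.conversionRatio", "1000000")
  .partitionBy("event_date")
  .mode(SaveMode.Overwrite)
  .saveAsTable("spatial_partition_table")
```

Before this fix, loading into such a table failed because the spatial index property was looked up in catalogTable.properties, which is empty for tables created this way.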

Does this PR introduce any user interface change?
No

Is any new testcase added?
No, tested in a cluster.

This closes #4250
ShreelekhyaG authored and Indhumathi27 committed Mar 4, 2022
1 parent 59f23c0 commit c840b5f30b15df54778b2a83608c727d25553d7c
Showing 2 changed files with 20 additions and 1 deletion.
@@ -96,6 +96,24 @@ df.write.format("carbon").save("/user/person_table")
val dfread = spark.read.format("carbon").load("/user/person_table")
dfread.show()
```
## Supported OPTIONS using dataframe

In addition to the above [Supported Options](#supported-options), the following properties are supported when writing with a dataframe.

| Property | Default Value | Description |
|-----------------------------------|-------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| bucket_number | NA | Number of buckets to be created. For more details, see [Bucketing](./ddl-of-carbondata.md#bucketing). |
| bucket_columns | NA | Columns which are to be placed in buckets. For more details, see [Bucketing](./ddl-of-carbondata.md#bucketing). |
| streaming | false | Whether the table is a streaming table. For more details, see [Streaming](./ddl-of-carbondata.md#streaming). |
| timestampformat | yyyy-MM-dd HH:mm:ss | For specifying the format of TIMESTAMP data type column. For more details, see [TimestampFormat](./ddl-of-carbondata.md#dateformattimestampformat). |
| dateformat | yyyy-MM-dd | For specifying the format of DATE data type column. For more details, see [DateFormat](./ddl-of-carbondata.md#dateformattimestampformat). |
| SPATIAL_INDEX | NA | Used to configure the spatial index name. This name is appended to `SPATIAL_INDEX` in the subsequent sub-property configurations. `xxx` in the sub-properties below refers to the index name. The generated spatial index column is not allowed in any table property except `SORT_COLUMNS`. For more details, see [Spatial Index](./spatial-index-guide). |
| SPATIAL_INDEX.xxx.type | NA | Type of algorithm for processing spatial data. Currently, supports 'geohash' and 'geosot'. |
| SPATIAL_INDEX.xxx.sourcecolumns | NA | Longitude and latitude column names as in the table. These columns are used to generate the index value for each row. |
| SPATIAL_INDEX.xxx.originLatitude | NA | Latitude of origin. |
| SPATIAL_INDEX.xxx.gridSize | NA | Grid size of raster data in metres. Currently, spatial index supports raster data. |
| SPATIAL_INDEX.xxx.conversionRatio | NA | Conversion factor. It allows the user to translate longitude and latitude to long. For example, if the data to load is longitude = 13.123456, latitude = 101.12356, the user can configure the conversion ratio sub-property value as 1000000 and change the data to load as longitude = 13123456 and latitude = 10112356. Operations on long are much faster than on floating-point numbers. |
| SPATIAL_INDEX.xxx.class | NA | Optional user custom implementation class. Value is fully qualified class name. |
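For example, several of these options can be passed through the DataFrameWriter when saving (a sketch reusing the earlier person_table example; the bucket column name and option values are illustrative):

```
df.write.format("carbon")
  .option("bucket_number", "4")
  .option("bucket_columns", "name")
  .option("dateformat", "yyyy-MM-dd")
  .option("timestampformat", "yyyy-MM-dd HH:mm:ss")
  .save("/user/person_table")
```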

Reference: [list of carbon properties](./configuration-parameters.md)

@@ -928,7 +928,8 @@ object CommonLoadUtils {
.map(columnName => columnName.toLowerCase())
attributes.filterNot(a => staticPartCols.contains(a.name.toLowerCase))
}
- val spatialProperty = catalogTable.properties.get(CarbonCommonConstants.SPATIAL_INDEX)
+ val spatialProperty = catalogTable.storage
+   .properties.get(CarbonCommonConstants.SPATIAL_INDEX)
// For spatial table, dataframe attributes will not contain geoId column.
val isSpatialTable = spatialProperty.isDefined && spatialProperty.nonEmpty &&
dfAttributes.length + 1 == expectedColumns.size
