Closed
Description
The DataFrame returned by JavaEsSparkSQL.esDF wraps every column value in a Scala Buffer when a query string is specified, but not when the simpler overload (without a query string) is used. The code snippet and its log output are below.
DataFrame rdd = null;
SourceLoaderES.logger.info(" loading " + indexName + "/" + docType + " using REST to get column list ");
String query = getQueryString(indexName, docType);
SourceLoaderES.logger.debug("ES query string" + query);
// rdd = JavaEsSparkSQL.esDF(_handler._sc, indexName + "/" + docType, query);
rdd = JavaEsSparkSQL.esDF(_handler._sc, indexName + "/" + docType);
DataFrame rdd2 = JavaEsSparkSQL.esDF(_handler._sc, indexName + "/" + docType, query);
SourceLoaderES.logger.debug("inferred SCHEMA " + rdd.schema().toString());
if (AdminMgr.getConfig(SyncRunner.DEBUG_ENABLE_DF_COUNTS) != null) {
    if (AdminMgr.getConfig(SyncRunner.DEBUG_ENABLE_DF_COUNTS).equals("true")) {
        logger.warn(" DataFrame count: " + rdd.count());
        logger.warn(" DataFrame first row: " + rdd.showString(1));
    }
}
logger.warn(" DataFrame first row: " + rdd.showString(1));
logger.warn(" DataFrame2 first row: " + rdd2.showString(1));
//SourceLoaderES.logger.info("ES SOURCE " + sourceName + " SCHEMA " + rdd.schema().toString());
Log output from this snippet:
2015-07-10 17:09:16,281 [ (Sync) 15 - Sync 1] INFO com.dg.data.sync.SourceLoaderES - loading g5b778bb6-faa0-41a0-bb36-8a4c54b13774_dg_dim1/dim_lookups using REST to get column list
2015-07-10 17:09:16,297 [ (Sync) 15 - Sync 1] DEBUG com.dg.data.sync.SourceLoaderES - returned mapping for query string: {"properties":{"xid":{"type":"string"},"ApplicationLanguageId":{"type":"long"},"LookupKeyActive":{"type":"string"},"LookupKey":{"type":"long"},"LookupLevel":{"type":"long"},"LookupModule":{"type":"long"},"LookupKeyName":{"type":"string"}}}
2015-07-10 17:09:16,297 [ (Sync) 15 - Sync 1] DEBUG com.dg.data.sync.SourceLoaderES - ES query string{"query":{"match_all":{}},"fields":["xid","ApplicationLanguageId","LookupKeyActive","LookupKey","LookupLevel","LookupModule","LookupKeyName"]}
2015-07-10 17:09:16,359 [ (Sync) 15 - Sync 1] DEBUG com.dg.data.sync.SourceLoaderES - inferred SCHEMA StructType(StructField(ApplicationLanguageId,LongType,true), StructField(LookupKey,LongType,true), StructField(LookupKeyActive,StringType,true), StructField(LookupKeyName,StringType,true), StructField(LookupLevel,LongType,true), StructField(LookupModule,LongType,true), StructField(xid,StringType,true))
2015-07-10 17:09:16,641 [ (Sync) 15 - Sync 1] WARN com.dg.data.sync.SourceLoaderES - DataFrame first row: ApplicationLanguageId LookupKey LookupKeyActive LookupKeyName LookupLevel LookupModule xid
1 11 true 11 am 2 61 61-2-true-1-11
2015-07-10 17:09:16,797 [ (Sync) 15 - Sync 1] WARN com.dg.data.sync.SourceLoaderES - DataFrame2 first row: ApplicationLanguageId LookupKey LookupKeyActive LookupKeyName LookupLevel LookupModule xid
Buffer(1) Buffer(11) Buffer(true) Buffer(11 am) Buffer(2) Buffer(61) Buffer(61-2-true-...
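A possible cause, offered as an assumption rather than a confirmed diagnosis: the query string in the log selects columns through "fields", and Elasticsearch 1.x returns values requested that way as arrays, which could surface as Buffer(...) in the DataFrame. The sketch below builds an equivalent query that selects the same columns through "_source" filtering instead. SourceFilterQuery and build are hypothetical names, not part of the original code, and whether esDF then returns scalar values has not been verified here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SourceFilterQuery {

    // Hypothetical helper: builds a match_all query that restricts the returned
    // columns with "_source" filtering rather than "fields".
    static String build(List<String> columns) {
        String quoted = columns.stream()
                // Escape backslashes first, then double quotes, so names stay valid JSON strings.
                .map(c -> "\"" + c.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(","));
        return "{\"query\":{\"match_all\":{}},\"_source\":[" + quoted + "]}";
    }

    public static void main(String[] args) {
        // Column names taken from the mapping shown in the log output above.
        List<String> columns = Arrays.asList(
                "xid", "ApplicationLanguageId", "LookupKeyActive", "LookupKey",
                "LookupLevel", "LookupModule", "LookupKeyName");
        System.out.println(build(columns));
    }
}
```

The resulting string could be passed as the third argument to JavaEsSparkSQL.esDF in place of the value returned by getQueryString, to check whether the Buffer wrapping disappears.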