Cannot restrict DataFrame to certain mapping field #497

Closed
@analyticswarescott

The DataFrame returned by JavaEsSparkSQL.esDF wraps every column value in a Scala Buffer when a query string is specified, but not when the simpler overload (resource only, no query) is used. The code snippet and log output are below.

    SourceLoaderES.logger.info(" loading " + indexName + "/" + docType + " using REST to get column list ");

    // Query string that restricts the result to the mapped fields.
    String query = getQueryString(indexName, docType);
    SourceLoaderES.logger.debug("ES query string" + query);

    // Simpler overload: resource only, no query string.
    DataFrame rdd = JavaEsSparkSQL.esDF(_handler._sc, indexName + "/" + docType);
    // Overload that also takes the query string.
    DataFrame rdd2 = JavaEsSparkSQL.esDF(_handler._sc, indexName + "/" + docType, query);

    SourceLoaderES.logger.debug("inferred SCHEMA " + rdd.schema().toString());

    // Optional debug output, enabled through configuration.
    if (AdminMgr.getConfig(SyncRunner.DEBUG_ENABLE_DF_COUNTS) != null) {
        if (AdminMgr.getConfig(SyncRunner.DEBUG_ENABLE_DF_COUNTS).equals("true")) {
            logger.warn(" DataFrame count: " + rdd.count());
            logger.warn(" DataFrame first row: " + rdd.showString(1));
        }
    }

    // Compare the first row of both DataFrames.
    logger.warn(" DataFrame first row: " + rdd.showString(1));
    logger.warn(" DataFrame2 first row: " + rdd2.showString(1));

Log output from this snippet:

    2015-07-10 17:09:16,281 [ (Sync) 15 - Sync 1] INFO  com.dg.data.sync.SourceLoaderES -  loading g5b778bb6-faa0-41a0-bb36-8a4c54b13774_dg_dim1/dim_lookups using REST to get column list 
    2015-07-10 17:09:16,297 [ (Sync) 15 - Sync 1] DEBUG com.dg.data.sync.SourceLoaderES -  returned mapping for query string: {"properties":{"xid":{"type":"string"},"ApplicationLanguageId":{"type":"long"},"LookupKeyActive":{"type":"string"},"LookupKey":{"type":"long"},"LookupLevel":{"type":"long"},"LookupModule":{"type":"long"},"LookupKeyName":{"type":"string"}}}
    2015-07-10 17:09:16,297 [ (Sync) 15 - Sync 1] DEBUG com.dg.data.sync.SourceLoaderES - ES query string{"query":{"match_all":{}},"fields":["xid","ApplicationLanguageId","LookupKeyActive","LookupKey","LookupLevel","LookupModule","LookupKeyName"]}
    2015-07-10 17:09:16,359 [ (Sync) 15 - Sync 1] DEBUG com.dg.data.sync.SourceLoaderES - inferred SCHEMA StructType(StructField(ApplicationLanguageId,LongType,true), StructField(LookupKey,LongType,true), StructField(LookupKeyActive,StringType,true), StructField(LookupKeyName,StringType,true), StructField(LookupLevel,LongType,true), StructField(LookupModule,LongType,true), StructField(xid,StringType,true))
    2015-07-10 17:09:16,641 [ (Sync) 15 - Sync 1] WARN  com.dg.data.sync.SourceLoaderES -  DataFrame first row: ApplicationLanguageId LookupKey LookupKeyActive LookupKeyName LookupLevel LookupModule xid           
    1                     11        true            11 am         2           61           61-2-true-1-11
    2015-07-10 17:09:16,797 [ (Sync) 15 - Sync 1] WARN  com.dg.data.sync.SourceLoaderES -  DataFrame2 first row: ApplicationLanguageId LookupKey  LookupKeyActive LookupKeyName LookupLevel LookupModule xid                 
    Buffer(1)             Buffer(11) Buffer(true)    Buffer(11 am) Buffer(2)   Buffer(61)   Buffer(61-2-true-...
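
For reference, the query string in the log has the shape `{"query":{"match_all":{}},"fields":[...]}`. The real `getQueryString` is not shown in this report, so the following is only a minimal sketch of how such a string could be assembled from a list of column names. The class `FieldsQuery` and the method `buildFieldsQuery` are hypothetical names used for illustration, and the escaping handles only backslashes and double quotes.

```java
import java.util.Arrays;
import java.util.List;

public class FieldsQuery {

    // Hypothetical stand-in for getQueryString: builds a match_all query
    // whose "fields" array lists the given column names in order.
    static String buildFieldsQuery(List<String> fieldNames) {
        StringBuilder sb = new StringBuilder("{\"query\":{\"match_all\":{}},\"fields\":[");
        for (int i = 0; i < fieldNames.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(escape(fieldNames.get(i))).append('"');
        }
        return sb.append("]}").toString();
    }

    // Escapes backslashes and double quotes only (an assumption that is
    // sufficient for plain field names, not for arbitrary strings).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // Prints: {"query":{"match_all":{}},"fields":["xid","LookupKey"]}
        System.out.println(buildFieldsQuery(Arrays.asList("xid", "LookupKey")));
    }
}
```

Passing a string of this shape as the third argument to `esDF` is what produces the `Buffer(...)` values shown in the second DataFrame above.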
