
Index datasource from Hadoop 3.1.1 hdfs failed in kerberized cluster #10456

@arifpratama398


Affected Version

0.18.1

Description

I am trying to index data from HDFS into Druid, but the task fails.

Command:

curl --negotiate -u:druid-XXX@XXX.COM -b /tmp/krb5cc_1008 -X 'POST' -H 'Content-Type:application/json' -d @/home/druid/wikipedia-index-hadoop.json http://XXX.XXX.com:8390/druid/indexer/v1/task

JSON spec:

{
  "type" : "index_hadoop",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "wikipedia_hadoop_29092020",
      "parser" : {
        "type" : "hadoopyString",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [
              "channel",
              "cityName",
              "comment",
              "countryIsoCode",
              "countryName",
              "isAnonymous",
              "isMinor",
              "isNew",
              "isRobot",
              "isUnpatrolled",
              "metroCode",
              "namespace",
              "page",
              "regionIsoCode",
              "regionName",
              "user",
              { "name": "added", "type": "long" },
              { "name": "deleted", "type": "long" },
              { "name": "delta", "type": "long" }
            ]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "time"
          }
        }
      },
      "metricsSpec" : [],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2015-09-12/2015-09-13"],
        "rollup" : false
      }
    },
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "/user/druid/quickstart/wikiticker-2015-09-12-sampled.json.gz"
      }
    },
    "tuningConfig" : {
      "type" : "hadoop",
      "partitionsSpec" : {
        "type" : "hashed",
        "targetPartitionSize" : 5000000
      },
      "forceExtendableShardSpecs" : true,
      "jobProperties" : {
        "fs.default.name" : "hdfs://nn",
        "fs.defaultFS" : "hdfs://nn/user/druid",
        "dfs.datanode.address" : "0.0.0.0:50010",
        "dfs.client.use.datanode.hostname" : "true",
        "dfs.datanode.use.datanode.hostname" : "true",
        "yarn.resourcemanager.hostname" : "xxx.xxx.com",
        "yarn.nodemanager.vmem-check-enabled" : "false",
        "mapreduce.map.java.opts" : "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
        "mapreduce.job.user.classpath.first" : "true",
        "mapreduce.reduce.java.opts" : "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
        "mapreduce.map.memory.mb" : 1024,
        "mapreduce.reduce.memory.mb" : 1024
      }
    }
  },
  "hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:3.1.1"]
}
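One thing I noticed while debugging: the jobProperties above never set Hadoop's security mode explicitly, and the "Client cannot authenticate via:[TOKEN, KERBEROS]" error can occur when the remote MapReduce containers do not pick it up from the cluster's core-site.xml. As a guess (not a confirmed fix), hadoop.security.authentication and hadoop.security.authorization are standard Hadoop keys that could be merged into the spec, e.g.:

```python
import json

# Standard Hadoop security keys; setting them in jobProperties is an
# assumption about this cluster, since they normally come from core-site.xml.
EXTRA_KERBEROS_PROPS = {
    "hadoop.security.authentication": "kerberos",
    "hadoop.security.authorization": "true",
}

def patch_job_properties(spec_text: str) -> str:
    """Merge the extra Kerberos properties into tuningConfig.jobProperties."""
    spec = json.loads(spec_text)
    props = spec["spec"]["tuningConfig"]["jobProperties"]
    props.update(EXTRA_KERBEROS_PROPS)
    return json.dumps(spec, indent=2)

if __name__ == "__main__":
    # Minimal stand-in for the full spec above.
    minimal = json.dumps(
        {"spec": {"tuningConfig": {"jobProperties": {"fs.defaultFS": "hdfs://nn"}}}}
    )
    print(patch_job_properties(minimal))
```

This only rewrites the task JSON before submission; whether the containers honor it still depends on how the cluster forwards configuration.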

Errors while processing the index task, from the task log:

org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
2020-09-30T03:27:20,417 WARN [task-runner-0-priority-0] org.apache.hadoop.ipc.Client - Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
2020-09-30T03:27:20,531 WARN [task-runner-0-priority-0] org.apache.hadoop.ipc.Client - Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
Error: com.google.inject.internal.Errors.checkNotNull(Ljava/lang/Object;Ljava/lang/String;)Ljava/lang/Object;
Error: com.google.inject.internal.Errors.checkNotNull(Ljava/lang/Object;Ljava/lang/String;)Ljava/lang/Object;

I have already configured Druid for the kerberized Hadoop cluster by setting the following in _common:

druid.security.extensions.loadList=["druid-kerberos"]
druid.hadoop.security.kerberos.keytab=/etc/security/keytabs/druid.headless.keytab
druid.hadoop.security.kerberos.principal=druid-XXX@XXX.COM

I am also following the doc at https://druid.apache.org/docs/0.18.1/tutorials/tutorial-kerberos-hadoop.html and copied the Hadoop *-site.xml configuration files to the Druid conf dir, but I am still facing the same error.
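Copying the *-site.xml files is easy to get slightly wrong (wrong directory, missing file), so a quick sanity check is to confirm all of them are actually present where Druid loads its classpath config. A minimal sketch; the directory path in the demo is an assumption, point it at your own _common dir:

```python
from pathlib import Path

# Hadoop client config files Druid needs on its classpath for HDFS/YARN
# access; mapred-site.xml matters for index_hadoop tasks specifically.
REQUIRED_SITE_FILES = ("core-site.xml", "hdfs-site.xml",
                       "yarn-site.xml", "mapred-site.xml")

def missing_site_files(conf_dir: str) -> list[str]:
    """Return the required *-site.xml files absent from conf_dir."""
    d = Path(conf_dir)
    return [name for name in REQUIRED_SITE_FILES if not (d / name).is_file()]

if __name__ == "__main__":
    # Hypothetical path; adjust to where your Druid _common config lives.
    print(missing_site_files("/opt/druid/conf/druid/_common"))
```

An empty list means all four files are visible to the process; anything else names what to copy over.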

Am I missing something?

Thanks in advance.
