Use the official aws-sdk instead of jet3t #5382
Conversation
@jihoonson does this mean we are dropping support for Hadoop < 2.9.0?
@b-slim yes. I tested with 2.7.5, but it didn't work because of the mismatched version of the aws-sdk library.
Is it possible to keep the same version of aws-java-sdk as we have now, and do this patch without changing the Hadoop version? Or is there a reason they both need to be upgraded? I am just asking because people struggle with upgrades every time we change the Hadoop version, so I was hoping it wouldn't be necessary (I was hoping the next upgrade after 2.7.3 would be to 3.x, sometime in the future after its deployment becomes more widespread).
@gianm yes, there is a reason. To use aws-java-sdk, we need to match its version to that used by Hadoop as well as other libraries' versions like jackson, httpclient, and so on. (Unfortunately, I lost my note listing the libraries needing version change.) From my testing, 2.9.0 was the oldest working version.
I don't understand why we need to use aws-java-sdk from Hadoop at all. I thought that when we run Hadoop jobs, we use its FileSystem implementations both for reading (possibly from S3) and writing (to a possibly S3 deep storage) and do not call jets3t directly. Could we solve this by excluding the java sdk from the main classloader when we are running Hadoop jobs, meaning we only load the one provided in
This should say: I don't understand why we need to use Druid's version of aws-java-sdk from Hadoop at all. |
FYI, I am all in to move to 3.0, but I really think we should be able to support the Hadoop 2.7 line if possible. 2.9 is not really popular and I don't think it will be; most likely, users will stick with the 2.7 line or take the leap and move to 3.0.
Yes, this is correct. What I meant is that the old implementation of S3AFileSystem uses some deprecated APIs of aws-sdk which are not in the recent versions. This is not just an issue of aws-sdk, but also of some other libraries like jackson.
@b-slim when I tested with Hadoop 2.7.5, I saw the below error.
This error means that S3AFileSystem of 2.7.5 is using a constructor of TransferManager which is not in the recent version of aws-sdk. So, I added an older version (1.7.4) of aws-sdk to the classpath, and changed the dependency order in the classpath for aws-sdk 1.7.4 to appear earlier. This caused another error, shown below.
This error came from the mismatched version of Jackson, so I added an older version of Jackson to the classpath as well and adjusted the order. This caused the below error.
I guess the last error is because of the mismatched Jackson version when the peon tries to initialize some variables using Guice injection, which means a newer version of Jackson is required. I'm not sure how the second and the third errors can be fixed (maybe by shading?). I understand what @gianm and @b-slim are concerned about. I'm doing further testing to figure out the best way. I'll post the result here once my testing is done.
@jihoonson does this deprecate https://github.com/druid-io/druid/blob/druid-0.12.0-rc1/pom.xml#L1170-L1176 the
Related: #5288
<groupId>net.java.dev.jets3t</groupId>
<artifactId>jets3t</artifactId>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-s3</artifactId>
is this still needed since aws-common pulls in the bundle?
I'll check this.
Please add the removal of
final S3ObjectSummary objectSummary = listResult.getObjectSummaries().get(0);
if (objectSummary.getStorageClass() != null &&
    StorageClass.fromValue(StringUtils.toUpperCase(objectSummary.getStorageClass())).equals(StorageClass.Glacier)) {
  throw new ISE(
This is to keep it from being caught in a retry loop? Can a comment be added to the code here to that effect?
Looks like my fault. I don't want to change any logic in this PR. Changed to AmazonServiceException.
@Override
public void close() throws IOException
{
  delegate.close();
s3Object should still be closed even if delegate close has an error.
Thanks, fixed.
}
return new InputStream() |
Would FilterInputStream with a close override work here?
Done.
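As a side note for readers, the FilterInputStream approach discussed in this thread can be sketched roughly as below. This is a hypothetical standalone version, not the PR's actual code: the class and variable names are mine, and a plain Closeable stands in for the S3 object backing the stream.

```java
import java.io.Closeable;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: a FilterInputStream whose close() also releases a second
// resource (e.g. the object backing the stream), even if the delegate's close() throws.
class ResourceClosingInputStream extends FilterInputStream
{
  private final Closeable resource; // stand-in for the s3Object in the review thread

  ResourceClosingInputStream(InputStream delegate, Closeable resource)
  {
    super(delegate);
    this.resource = resource;
  }

  @Override
  public void close() throws IOException
  {
    try {
      super.close(); // closes the delegate stream
    }
    finally {
      resource.close(); // runs even when super.close() throws
    }
  }
}
```

The try/finally shape also covers the earlier review comment that the underlying object should still be closed even if closing the delegate fails.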
} else {
  s3Client.putObject(bucketName, object);
  log.info("Pushing [%s] to bucket[%s] and key[%s].", file, bucket, key);
(nit-pick) Would this be better if the s3client had a logger? We're not really adding any information here that couldn't be gathered from reasonable logging in the s3client, I don't think?
I'm not sure. I just moved this log, which was outside of the if clause, to inside it.
}
log.info("Deleting file [%s]", file.getAbsolutePath()); |
Suggest adding info as to why, like Deleting temporary cached file [%s].
Done.
}
log.info("Deleting file [%s]", file.getAbsolutePath());
file.delete();
Do we care if it was successful?
Not sure. The original code didn't care about it.
while (objectSummaryIterator.hasNext()) {
  final S3ObjectSummary objectSummary = objectSummaryIterator.next();
  // TODO: what is going on here?
  String keyString = objectSummary.getKey().substring(coords.path.length());
It is taking the path beyond the "search" prefix, and checking the regex to see if it matches the remainder (after the prefix).
Thanks.
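To make the explanation above concrete, here is a hypothetical, self-contained sketch of that matching logic (the class and method names are illustrative, not from the PR): the listing prefix is stripped from each key and the regex is then tested against the remainder.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch of the logic described above: take the path beyond the
// "search" prefix and keep only keys whose remainder matches the pattern.
class PrefixRegexMatcher
{
  static List<String> matchKeys(List<String> keys, String prefix, Pattern pattern)
  {
    return keys.stream()
               .map(key -> key.substring(prefix.length()))        // path beyond the prefix
               .filter(rest -> pattern.matcher(rest).matches())   // regex on the remainder
               .collect(Collectors.toList());
  }
}
```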
};
}
public static String constructSegmentPath(String baseKey, String storageDir)
static String constructSegmentPath(String baseKey, String storageDir)
We don't know if these were used in extensions; can the things that were public remain so?
Reverted.
if (key.endsWith("/") && objectMetadata.getContentLength() == 0) {
  return true;
}
// Recognize s3sync.rb directory placeholders by MD5/ETag value.
is this string in source somewhere?
Yes, it was in the original code.
}
final S3ObjectSummary objectSummary = result.getObjectSummaries().get(0);
if (!objectSummary.getBucketName().equals(bucket) || !objectSummary.getKey().equals(key)) {
  throw new ISE("Wrong object[%s] for bucket[%s] and key[%s]", objectSummary, bucket, key);
Does this mean that if there are multiple keys that match the prefix, and the returned value isn't the first one in the result, this will fail? That seems kind of fragile.
Yes, I added javadoc for this method.
Yes, maybe. I haven't tested it yet.
Any configs of plain text should be compatible with a PasswordProvider once it goes in, so I'm ok with that. Please file the issue before merging
@drcrallen thanks. Raised #5507.
<artifactId>jets3t</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
indentation looks off here
Fixed.
}
log.info("Deleting temporary cached file [%s]", file.getAbsolutePath());
file.delete();
Would it be better to clean this file up in a finally block?
Good point. Fixed.
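The finally-block suggestion can be sketched like this. It is a hypothetical standalone example (the class name and the fake "work" are mine, using a temp file instead of the real cached segment file), showing that the temporary file is deleted whether or not the work throws:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical sketch of the cleanup pattern suggested above: do the work with
// the temporary file, and delete it in a finally block so it is removed even
// when the work fails.
class TempFileCleanup
{
  static long sizeThenCleanup() throws IOException
  {
    final File file = File.createTempFile("segment", ".tmp");
    try {
      Files.write(file.toPath(), new byte[]{1, 2, 3}); // stand-in for the real work
      return file.length();
    }
    finally {
      // log.info("Deleting temporary cached file [%s]", file.getAbsolutePath());
      file.delete(); // runs whether the try block returned or threw
    }
  }
}
```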
if (result.getKeyCount() == 0) {
  throw new ISE("Cannot find object for bucket[%s] and key[%s]", bucket, key);
}
final S3ObjectSummary objectSummary = result.getObjectSummaries().get(0);
Is it possible for result.getKeyCount() to be > 0 but result.getObjectSummaries() to have size 0?
From the javadoc of ListObjectsV2Result, getKeyCount() returns the number of keys returned with this response, and getObjectSummaries() returns a list of the object summaries describing the objects stored in the S3 bucket. Since each object is associated with a key in S3, it shouldn't happen.
{
  String filename = key.substring(key.lastIndexOf("/") + 1); // characters after last '/'
  filename = filename.substring(0, filename.length() - suffix.length()); // remove the suffix from the end
  return filename;
}
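For reference, the snippet above can be written as a self-contained method like this (the class name is hypothetical); it drops everything up to the last '/' and then strips the suffix:

```java
// Hypothetical standalone version of the snippet above: extract the base
// filename from an S3 key by taking the characters after the last '/' and
// removing the given suffix from the end.
class KeyNames
{
  static String toFilename(String key, String suffix)
  {
    String filename = key.substring(key.lastIndexOf('/') + 1); // characters after last '/'
    return filename.substring(0, filename.length() - suffix.length()); // drop the suffix
  }
}
```

Note that for a key without any '/', lastIndexOf returns -1, so the whole key (minus the suffix) is returned.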
static AccessControlList grantFullControlToBucketOwver(AmazonS3 s3Client, String bucket) |
Owver -> Owner
Fixed.
/**
 * Gets a single {@link S3ObjectSummary} from s3. Since this method might throw an exception if there are multiple
 * objets that match the given key, this method should be used only when it's guaranteed that the given key is unique
objets -> objects
Fixed.
@jon-wei thanks for the review. Addressed your comments.
I'm going to merge this PR shortly. @b-slim @drcrallen @jon-wei thanks for the review!
@jihoonson I see this PR is merged, so what is the story now? Are we dropping support for Hadoop < 2.8.3? If that is the case, did we have a round of votes on the mailing list? Is this supposed to be in 0.13.0?
@b-slim we still support the Hadoop 2.7 line. I tested Hadoop 2.7.3 and 2.7.5. When Hadoop 2.7 is used together with S3 deep storage, S3N still works but S3A doesn't, which is the same as master before this PR. To use S3A, they need a custom
@jihoonson,
I tested Apache Hadoop 2.6.0, 2.7.1, 2.7.3, 2.7.5, 2.8.3, 2.9.0, 3.0.0, Cloudera 5.7.0, and HDP 5.7. But it would be great if you could test more Apache Hadoop and HDP versions as well.
I've tested four cases for deep storage: HDFS, S3A via HDFS, S3N via HDFS, and direct S3.
Well, at first, I wanted to use Hadoop 2.9 to use more recent libraries, especially Jackson. But it broke the support for Hadoop 2.7, so I needed to change it again. IMO, it doesn't make sense to use Hadoop 2.7.3 as our default Hadoop version because we don't support all features for that version, like S3A.
Deprecated due to apache#5382
* Update defaultHadoopCoordinates in documentation. To match changes applied in #5382. * Remove a parameter with defaults from example configuration file. If it has reasonable defaults, then why would it be in an example config file? Also, it is yet another place that has been forgotten to be updated and will be forgotten in the future. Also, if someone is running different hadoop version, then there's much more work to be done than just changing this property, so why give users false hopes? * Fix typo in documentation.
Fixes #4289.

In this PR, the versions of the below libraries are changed:
- 2.9.0 to match library versions
- 2.8.3
- 2.8.10
- 2.6.7

Also, I removed AWSCredentialsProvider which is not used anymore.

I've done the below tests:
- (hadoopDependencyCoordinates was set to 2.9.0)
- insert-segment-to-db tool: this tool finds segments under the given directory of deep storage