
fix for "java.io.IOException: No FileSystem for scheme: hdfs" error #1721

Merged: 1 commit merged into apache:master on Sep 13, 2015

Conversation

@himanshug
Contributor Author

Tested it manually; it looks like this fixes the issue.

@drcrallen
Contributor

👍

@himanshug
Contributor Author

@fjy rebased to latest master

fjy added a commit that referenced this pull request Sep 13, 2015
fix for "java.io.IOException: No FileSystem for scheme: hdfs" error
@fjy fjy merged commit 1548fa8 into apache:master Sep 13, 2015
@himanshug himanshug deleted the hdfs_fs_init_fix branch September 22, 2015 13:57
@mark1900
Contributor
commented Oct 8, 2015

I merged the changes from #1721 (https://github.com/druid-io/druid/pull/1721/files) into the Druid release tagged "druid-0.8.1", and this issue still seems to occur. See my previous comments on #1022.

2015-10-08T17:39:01,106 INFO [druid_datasource_01-2015-10-08T17:30:00.000Z-persist-n-merge] io.druid.storage.hdfs.HdfsDataSegmentPusher - Copying segment[druid_datasource_01_2015-10-08T17:30:00.000Z_2015-10-08T17:31:00.000Z_2015-10-08T17:32:20.971Z] to HDFS at location[hdfs://server1:9000/druid/druid-hdfs-storage/druid_datasource_01/20151008T173000.000Z_20151008T173100.000Z/2015-10-08T17_32_20.971Z/0]
2015-10-08T17:39:01,109 ERROR [druid_datasource_01-2015-10-08T17:30:00.000Z-persist-n-merge] io.druid.segment.realtime.plumber.RealtimePlumber - Failed to persist merged index[druid_datasource_01]: {class=io.druid.segment.realtime.plumber.RealtimePlumber, exceptionType=class java.io.IOException, exceptionMessage=No FileSystem for scheme: hdfs, interval=2015-10-08T17:30:00.000Z/2015-10-08T17:31:00.000Z}
java.io.IOException: No FileSystem for scheme: hdfs
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2304) ~[?:?]
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2311) ~[?:?]
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:90) ~[?:?]
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2350) ~[?:?]
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2332) ~[?:?]
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369) ~[?:?]
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) ~[?:?]
        at io.druid.storage.hdfs.HdfsDataSegmentPusher.push(HdfsDataSegmentPusher.java:83) ~[?:?]
        at io.druid.segment.realtime.plumber.RealtimePlumber$4.doRun(RealtimePlumber.java:454) [druid-server-0.8.1.jar:0.8.1]
        at io.druid.common.guava.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:40) [druid-common-0.8.1.jar:0.8.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_60]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_60]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_60]
2015-10-08T17:39:01,133 INFO [druid_datasource_01-2015-10-08T17:30:00.000Z-persist-n-merge] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"alerts","timestamp":"2015-10-08T17:39:01.131Z","service":"overlord","host":"server1:8100","severity":"component-failure","description":"Failed to persist merged index[druid_datasource_01]","data":{"class":"io.druid.segment.realtime.plumber.RealtimePlumber","exceptionType":"java.io.IOException","exceptionMessage":"No FileSystem for scheme: hdfs","exceptionStackTrace":"java.io.IOException: No FileSystem for scheme: hdfs\n\tat org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2304)\n\tat org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2311)\n\tat org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:90)\n\tat org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2350)\n\tat org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2332)\n\tat org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369)\n\tat org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)\n\tat io.druid.storage.hdfs.HdfsDataSegmentPusher.push(HdfsDataSegmentPusher.java:83)\n\tat io.druid.segment.realtime.plumber.RealtimePlumber$4.doRun(RealtimePlumber.java:454)\n\tat io.druid.common.guava.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:40)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)\n\tat java.lang.Thread.run(Thread.java:745)\n","interval":"2015-10-08T17:30:00.000Z/2015-10-08T17:31:00.000Z"}}]

@himanshug
Contributor Author

I tried to reproduce this with Apache Hadoop 2.6.0 and druid-0.8.1, and it worked fine for me, so I am not sure what is wrong here.
Basically, we need to ensure that the hadoop-hdfs jar is present in the classpath of the thread context class loader when FileSystem.xxx is called for the very first time. Try to find out why that might not be happening in your 0.8.1 setup.
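
(As an illustration, not part of the PR: a minimal standalone check, assuming hadoop-common and hadoop-hdfs are on the classpath. The HdfsSchemeCheck class name is hypothetical, and the URI reuses server1:9000 from the logs above. Setting fs.hdfs.impl explicitly bypasses the service-loader lookup through the context class loader, which is what fails with "No FileSystem for scheme: hdfs".)

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsSchemeCheck
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        // Name the implementation directly instead of relying on
        // META-INF/services discovery through the context class loader.
        conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        // Throws a RuntimeException if DistributedFileSystem cannot be
        // loaded from this thread's class loader.
        FileSystem fs = FileSystem.get(URI.create("hdfs://server1:9000/"), conf);
        System.out.println("hdfs scheme resolved to " + fs.getClass().getName());
    }
}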

@mark1900
Contributor

It might be worth mentioning that when I set the following in "druid-0.8.1/config/_common/common.runtime.properties":

druid.indexer.task.defaultHadoopCoordinates=["org.apache.hadoop:hadoop-client:2.7.1","org.apache.hadoop.hadoop-hdfs:2.7.1"]

it resulted in the following exception in my logs:

2015-10-14T21:18:20,024 ERROR [task-runner-0] io.druid.initialization.Initialization - Unable to resolve artifacts for [org.apache.hadoop.hadoop-hdfs:2.7.1:jar:0.8.1 (runtime) -> [] < [ (https://repo1.maven.org/maven2/, releases+snapshots),  (https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local, releases+snapshots)]].
org.eclipse.aether.resolution.DependencyResolutionException: Could not find artifact org.apache.hadoop.hadoop-hdfs:2.7.1:jar:0.8.1 in  (https://repo1.maven.org/maven2/)
    at org.eclipse.aether.internal.impl.DefaultRepositorySystem.resolveDependencies(DefaultRepositorySystem.java:384) ~[aether-impl-0.9.0.M2.jar:?]
    at io.tesla.aether.internal.DefaultTeslaAether.resolveArtifacts(DefaultTeslaAether.java:289) ~[tesla-aether-0.0.5.jar:0.0.5]
    at io.druid.initialization.Initialization.getClassLoaderForCoordinates(Initialization.java:253) [druid-server-0.8.1.jar:0.8.1]
    at io.druid.indexing.common.task.HadoopTask.buildClassLoader(HadoopTask.java:90) [druid-indexing-service-0.8.1.jar:0.8.1]
    at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:165) [druid-indexing-service-0.8.1.jar:0.8.1]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:235) [druid-indexing-service-0.8.1.jar:0.8.1]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:214) [druid-indexing-service-0.8.1.jar:0.8.1]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_60]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_60]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_60]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_60]
Caused by: org.eclipse.aether.resolution.ArtifactResolutionException: Could not find artifact org.apache.hadoop.hadoop-hdfs:2.7.1:jar:0.8.1 in  (https://repo1.maven.org/maven2/)
    at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolve(DefaultArtifactResolver.java:459) ~[aether-impl-0.9.0.M2.jar:?]
    at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolveArtifacts(DefaultArtifactResolver.java:262) ~[aether-impl-0.9.0.M2.jar:?]
    at org.eclipse.aether.internal.impl.DefaultRepositorySystem.resolveDependencies(DefaultRepositorySystem.java:367) ~[aether-impl-0.9.0.M2.jar:?]
    ... 10 more
Caused by: org.eclipse.aether.transfer.ArtifactNotFoundException: Could not find artifact org.apache.hadoop.hadoop-hdfs:2.7.1:jar:0.8.1 in  (https://repo1.maven.org/maven2/)
    at io.tesla.aether.connector.AetherRepositoryConnector$2.wrap(AetherRepositoryConnector.java:828) ~[aether-connector-okhttp-0.0.9.jar:0.0.9]
    at io.tesla.aether.connector.AetherRepositoryConnector$2.wrap(AetherRepositoryConnector.java:824) ~[aether-connector-okhttp-0.0.9.jar:0.0.9]
    at io.tesla.aether.connector.AetherRepositoryConnector$GetTask.flush(AetherRepositoryConnector.java:619) ~[aether-connector-okhttp-0.0.9.jar:0.0.9]
    at io.tesla.aether.connector.AetherRepositoryConnector.get(AetherRepositoryConnector.java:238) ~[aether-connector-okhttp-0.0.9.jar:0.0.9]
    at org.eclipse.aether.internal.impl.DefaultArtifactResolver.performDownloads(DefaultArtifactResolver.java:535) ~[aether-impl-0.9.0.M2.jar:?]
    at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolve(DefaultArtifactResolver.java:436) ~[aether-impl-0.9.0.M2.jar:?]
    at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolveArtifacts(DefaultArtifactResolver.java:262) ~[aether-impl-0.9.0.M2.jar:?]
    at org.eclipse.aether.internal.impl.DefaultRepositorySystem.resolveDependencies(DefaultRepositorySystem.java:367) ~[aether-impl-0.9.0.M2.jar:?]
    ... 10 more
2015-10-14T21:18:20,035 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_project_name_lab_rc2_2015-10-14T21:11:57.658Z, type=index_hadoop, dataSource=project_name_lab_rc2}]
java.lang.RuntimeException: org.eclipse.aether.resolution.DependencyResolutionException: Could not find artifact org.apache.hadoop.hadoop-hdfs:2.7.1:jar:0.8.1 in  (https://repo1.maven.org/maven2/)
    at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
    at io.druid.initialization.Initialization.getClassLoaderForCoordinates(Initialization.java:273) ~[druid-server-0.8.1.jar:0.8.1]
    at io.druid.indexing.common.task.HadoopTask.buildClassLoader(HadoopTask.java:90) ~[druid-indexing-service-0.8.1.jar:0.8.1]
    at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:165) ~[druid-indexing-service-0.8.1.jar:0.8.1]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:235) [druid-indexing-service-0.8.1.jar:0.8.1]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:214) [druid-indexing-service-0.8.1.jar:0.8.1]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_60]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_60]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_60]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_60]
Caused by: org.eclipse.aether.resolution.DependencyResolutionException: Could not find artifact org.apache.hadoop.hadoop-hdfs:2.7.1:jar:0.8.1 in  (https://repo1.maven.org/maven2/)
    at org.eclipse.aether.internal.impl.DefaultRepositorySystem.resolveDependencies(DefaultRepositorySystem.java:384) ~[aether-impl-0.9.0.M2.jar:?]
    at io.tesla.aether.internal.DefaultTeslaAether.resolveArtifacts(DefaultTeslaAether.java:289) ~[tesla-aether-0.0.5.jar:0.0.5]
    at io.druid.initialization.Initialization.getClassLoaderForCoordinates(Initialization.java:253) ~[druid-server-0.8.1.jar:0.8.1]
    ... 8 more
Caused by: org.eclipse.aether.resolution.ArtifactResolutionException: Could not find artifact org.apache.hadoop.hadoop-hdfs:2.7.1:jar:0.8.1 in  (https://repo1.maven.org/maven2/)
    at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolve(DefaultArtifactResolver.java:459) ~[aether-impl-0.9.0.M2.jar:?]
    at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolveArtifacts(DefaultArtifactResolver.java:262) ~[aether-impl-0.9.0.M2.jar:?]
    at org.eclipse.aether.internal.impl.DefaultRepositorySystem.resolveDependencies(DefaultRepositorySystem.java:367) ~[aether-impl-0.9.0.M2.jar:?]
    at io.tesla.aether.internal.DefaultTeslaAether.resolveArtifacts(DefaultTeslaAether.java:289) ~[tesla-aether-0.0.5.jar:0.0.5]
    at io.druid.initialization.Initialization.getClassLoaderForCoordinates(Initialization.java:253) ~[druid-server-0.8.1.jar:0.8.1]
    ... 8 more
Caused by: org.eclipse.aether.transfer.ArtifactNotFoundException: Could not find artifact org.apache.hadoop.hadoop-hdfs:2.7.1:jar:0.8.1 in  (https://repo1.maven.org/maven2/)
    at io.tesla.aether.connector.AetherRepositoryConnector$2.wrap(AetherRepositoryConnector.java:828) ~[aether-connector-okhttp-0.0.9.jar:0.0.9]
    at io.tesla.aether.connector.AetherRepositoryConnector$2.wrap(AetherRepositoryConnector.java:824) ~[aether-connector-okhttp-0.0.9.jar:0.0.9]
    at io.tesla.aether.connector.AetherRepositoryConnector$GetTask.flush(AetherRepositoryConnector.java:619) ~[aether-connector-okhttp-0.0.9.jar:0.0.9]
    at io.tesla.aether.connector.AetherRepositoryConnector.get(AetherRepositoryConnector.java:238) ~[aether-connector-okhttp-0.0.9.jar:0.0.9]
    at org.eclipse.aether.internal.impl.DefaultArtifactResolver.performDownloads(DefaultArtifactResolver.java:535) ~[aether-impl-0.9.0.M2.jar:?]
    at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolve(DefaultArtifactResolver.java:436) ~[aether-impl-0.9.0.M2.jar:?]
    at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolveArtifacts(DefaultArtifactResolver.java:262) ~[aether-impl-0.9.0.M2.jar:?]
    at org.eclipse.aether.internal.impl.DefaultRepositorySystem.resolveDependencies(DefaultRepositorySystem.java:367) ~[aether-impl-0.9.0.M2.jar:?]
    at io.tesla.aether.internal.DefaultTeslaAether.resolveArtifacts(DefaultTeslaAether.java:289) ~[tesla-aether-0.0.5.jar:0.0.5]
    at io.druid.initialization.Initialization.getClassLoaderForCoordinates(Initialization.java:253) ~[druid-server-0.8.1.jar:0.8.1]
    ... 8 more
2015-10-14T21:18:20,042 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_project_name_lab_rc2_2015-10-14T21:11:57.658Z",
  "status" : "FAILED",
  "duration" : 374402
}

Even though the following dependencies existed:

ls -al druid-0.8.1/extensions-repo/org/apache/hadoop/hadoop-hdfs/2.7.1/
8260573 Oct 14 17:17 hadoop-hdfs-2.7.1.jar
23707 Oct 14 17:13 hadoop-hdfs-2.7.1.pom

@gianm
Contributor
commented Oct 15, 2015

Does it work if you use "org.apache.hadoop:hadoop-hdfs:2.7.1" rather than "org.apache.hadoop.hadoop-hdfs:2.7.1"? (Note the colon between the group and artifact.)
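
(Maven coordinates take the form groupId:artifactId:version, so with that fix the property would read:)

druid.indexer.task.defaultHadoopCoordinates=["org.apache.hadoop:hadoop-client:2.7.1","org.apache.hadoop:hadoop-hdfs:2.7.1"]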

@mark1900
Contributor

Nice catch, @gianm. One thing I did notice was that, other than the one exception log shown above, I don't see any further exceptions related to this, even when I had the invalid hadoop-hdfs value.

For instance, I changed my current value to:

druid.indexer.task.defaultHadoopCoordinates=["org.apache.hadoop:hadoop-client:2.7.1","org.apache.hadoop:hadoop-hdfs:2.7.1","org.apache.invalid:invalid-client:0.9.0"]

I have yet to see any exceptions related to this "invalid-client", even after Druid segments have been created successfully.

@mark1900
Contributor

The issue seems to be resolved in the latest Druid 0.8.2 release: http://static.druid.io/artifacts/releases/druid-0.8.2-bin.tar.gz
