[FLINK-5129] make the BlobServer use a distributed file system #2891

NicoK · 2016-11-28T18:16:00Z

Previously, the BlobServer held a local copy and, if high availability (HA)
was configured, also copied jar files to a distributed file system. Upon restore,
these files were copied back to the local store, from which they were used.

This PR abstracts the BlobServer's backing file system and makes it use the
distributed file system directly in HA mode, i.e. without the local file system
copy. Other than that the behaviour should not change.
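
To illustrate the kind of abstraction meant here, a minimal sketch follows. It is not code from this PR: `BlobBackend` and `LocalBlobBackend` are hypothetical names, and the sketch uses plain `java.nio.file` rather than Flink's file system classes. A distributed implementation would implement the same interface against, for example, HDFS.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical storage abstraction (assumption for illustration, not a Flink interface). */
interface BlobBackend {
    void put(String key, byte[] data) throws IOException;

    byte[] get(String key) throws IOException;
}

/** Backs blobs by a local directory; a distributed variant would share the interface. */
final class LocalBlobBackend implements BlobBackend {
    private final Path root;

    LocalBlobBackend(Path root) {
        this.root = root;
    }

    @Override
    public void put(String key, byte[] data) throws IOException {
        // Create the storage directory lazily on first write.
        Files.createDirectories(root);
        Files.write(root.resolve(key), data);
    }

    @Override
    public byte[] get(String key) throws IOException {
        return Files.readAllBytes(root.resolve(key));
    }
}
```

With such an interface, the server code only talks to `BlobBackend` and does not need to know whether the bytes live locally or on a distributed file system.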

Secondly, BlobCache instances at the task managers also make use of this
distributed file system and download files from there instead of bothering
the blob server. As before, however, distributed files may only be deleted
by the blob server. If the distributed file system is not accessible at the blob
caches, the old behaviour is used.
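
The lookup order described above could be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `BlobSource` and `FallbackBlobFetcher` are hypothetical names, and an `IOException` stands in for "the distributed file system is not accessible".

```java
import java.io.IOException;

/** Hypothetical read-only blob source (assumption for illustration, not a Flink interface). */
interface BlobSource {
    byte[] fetch(String key) throws IOException;
}

/** Tries the distributed store first and falls back to the blob server. */
final class FallbackBlobFetcher {
    private final BlobSource distributedStore; // null when not in HA mode
    private final BlobSource blobServer;

    FallbackBlobFetcher(BlobSource distributedStore, BlobSource blobServer) {
        this.distributedStore = distributedStore;
        this.blobServer = blobServer;
    }

    byte[] fetch(String key) throws IOException {
        if (distributedStore != null) {
            try {
                return distributedStore.fetch(key);
            } catch (IOException e) {
                // Distributed file system not reachable from this node:
                // fall through to the old behaviour below.
            }
        }
        return blobServer.fetch(key);
    }
}
```

Note that the fetcher only reads; deletion stays with the blob server, as stated above.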

BlobServer: include the cluster id in the HA storage path for blobs
make the BlobServer use the HA filesystem back-end properly:
make the BlobCache also use a distributed file system in HA mode
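
As a rough sketch of why the cluster id belongs in the storage path: two clusters that share the same HA storage root then end up in separate blob directories. The helper name `blobBasePath` and the exact layout (`<root>/<clusterId>/blob`) are assumptions for illustration and may differ from what the commits actually do.

```java
/** Hypothetical helper (assumption for illustration, not part of Flink). */
final class HaBlobPaths {
    private HaBlobPaths() {}

    /** Builds "<haStorageRoot>/<clusterId>/blob", tolerating a trailing slash on the root. */
    static String blobBasePath(String haStorageRoot, String clusterId) {
        if (clusterId == null || clusterId.isEmpty()) {
            throw new IllegalArgumentException("cluster id must not be empty");
        }
        String root = haStorageRoot.endsWith("/")
                ? haStorageRoot.substring(0, haStorageRoot.length() - 1)
                : haStorageRoot;
        return root + "/" + clusterId + "/blob";
    }
}
```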

@uce can you have a look?

NicoK · 2016-11-30T14:12:27Z

Sorry for the hassle, found a regression and added a fix plus an appropriate test for it. Should be fine now.

This was actually the same implementation as FileSystemBlobStore#get(java.lang.String, java.io.File), and either of the two could have been removed, but the implementation makes most sense at the concrete file system abstraction layer, i.e. in FileSystemBlobStore.

Previously, the BlobServer held a local copy and, if high availability (HA) was set, also copied jar files to a distributed file system. Upon restore, these files were copied to the local store, from which they were used. This commit abstracts the BlobServer's backing file system and makes it use the distributed file system directly in HA mode, i.e. without the local file system copy. Other than that, the behaviour does not change.

… HA mode

* re-factor the file system abstraction in FileSystemBlobStore so that it can be used by the task managers, too, which should not be able to delete files in a distributed file system shared among different nodes
* only download blobs from the blob server if not in HA mode or the distributed file system is not accessible by the BlobCache, e.g. at the task managers

…erver and cache

If not in high availability mode, local (and now also distributed) file systems again try to set up a unique directory structure so that other instances with the same configuration file or storage path do not interfere. This was lost in 8b9c7d9.

Instead, the return value indicates whether a delete operation was successful. This is a result of the FileSystem abstraction layer in FileSystemBlobStore and follows the idiom that a failing delete operation is not that grave and the program can still continue.
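
The idiom referred to here can be sketched as below. This is an illustration using `java.nio.file`, not the FileSystemBlobStore code itself, and `QuietDelete` is a hypothetical name: a failed delete is reported through the return value so that the caller can log it and carry on.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (assumption for illustration, not part of Flink). */
final class QuietDelete {
    private QuietDelete() {}

    /**
     * Returns true if the file existed and was deleted, false if it did not
     * exist or could not be deleted. I/O failures are not propagated.
     */
    static boolean delete(Path path) {
        try {
            return Files.deleteIfExists(path);
        } catch (IOException e) {
            // A failing delete is not fatal; report it via the return value.
            return false;
        }
    }
}
```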

NicoK · 2016-12-22T09:30:21Z

Despite the tests completing successfully, I still need to check a few things:

BlobService#getURL() may now return a URL for a distributed file system, however:
related code, e.g. java.io.File, may not know how to handle HDFS URLs, for example :(
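
A small sketch of the problem, for illustration only: the `java.io.File(URI)` constructor accepts only `file:` URIs and throws `IllegalArgumentException` for anything else. The class name `FileUriCheck` and the example URIs are made up; the sketch uses `java.net.URI` rather than `java.net.URL` so that no `hdfs` protocol handler needs to be registered.

```java
import java.io.File;
import java.net.URI;

/** Hypothetical helper (assumption for illustration, not part of Flink). */
final class FileUriCheck {
    private FileUriCheck() {}

    /** True only for URIs that java.io.File can represent. */
    static boolean isLocalFile(URI uri) {
        return "file".equalsIgnoreCase(uri.getScheme());
    }

    /** Converts a file: URI to a File, or returns null for other schemes such as hdfs. */
    static File toLocalFileOrNull(URI uri) {
        return isLocalFile(uri) ? new File(uri) : null;
    }

    public static void main(String[] args) {
        URI hdfs = URI.create("hdfs://namenode:8020/flink/blob/example.jar");
        try {
            new File(hdfs); // rejected: the scheme is not "file"
        } catch (IllegalArgumentException e) {
            System.out.println("java.io.File rejected the URI: " + e.getMessage());
        }
    }
}
```

Callers that assume a local file would need a check like `isLocalFile` before handing the location to `java.io.File`-based code.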

NicoK · 2017-01-05T17:35:38Z

I need to adapt a few things and choose a different approach - I'll re-open later

NicoK force-pushed the FLINK-5129 branch from 771c271 to 660eba5 on November 30, 2016 14:01

NicoK mentioned this pull request Nov 30, 2016

[FLINK-5178] allow BLOB_STORAGE_DIRECTORY_KEY to point to a distributed file system #2911

Closed

Nico Kruber added 13 commits December 16, 2016 14:25

[hotfix] do not create intermediate strings inside String.format in B…

a878197

…lobUtils

[hotfix] properly shut down the BlobServer in BlobServerRangeTest

a2fd0c3

[FLINK-5129] BlobServer: include the cluster id in the HA storage pat…

f0e00bf

…h for blobs

Also use JUnit's TemporaryFolder in BlobRecoveryITCase. This makes cleaning up simpler.

[hotfix] add a missing "'" to FileSystemBlobStore

4edef41

[FLINK-5129] move path-related methods from BlobUtils to FileSystemBl…

37200b8

…obStore and cleanup unused methods

[FLINK-5129] fix wrongly set isGlobal flag in BlobCache

bbce47b

This was set in 249b2ea.

[FLINK-5129] add a unit test for the fix of fe4c1c3

f0052b8

[FLINK-5129] add some more documentation

a0819eb

NicoK force-pushed the FLINK-5129 branch from 660eba5 to a0819eb on December 16, 2016 13:32

NicoK closed this Jan 5, 2017

rmetzger added the component=Runtime/Network label Mar 14, 2019

souo mentioned this pull request Dec 5, 2022

[Snyk] Security upgrade hapi from 8.8.1 to 11.0.4 souo/flink#118

Open
