[FLINK-5129] make the BlobServer use a distributed file system #2891

Closed
wants to merge 13 commits into from


NicoK (Contributor) commented Nov 28, 2016

Previously, the BlobServer held a local copy and, if high availability (HA)
was enabled, also copied jar files to a distributed file system. Upon restore,
these files were copied back to the local store, from which they were used.

This PR abstracts the BlobServer's backing file system and, in HA mode, makes it
use the distributed file system directly, i.e. without the local file system
copy. Otherwise, the behaviour should not change.

Secondly, BlobCache instances at the task managers also use this distributed
file system and download files from there instead of requesting them from the
blob server. As before, however, distributed files may only be deleted by the
blob server. If the distributed file system is not accessible from a blob
cache, the old behaviour (downloading from the blob server) is used.
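A minimal sketch of the read path described above, assuming a hypothetical `BlobSource` interface (not Flink's actual API): the cache first tries the shared store and falls back to the blob server only if that fails. `DirectoryBlobSource` is also hypothetical and stands in for the distributed file system (e.g. a mounted directory); the real code would go through Flink's own FileSystem abstraction instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical: anything a blob can be fetched from into a local file. */
interface BlobSource {
    void get(String key, Path localFile) throws IOException;
}

/** Hypothetical stand-in for the distributed file system: a plain directory. */
final class DirectoryBlobSource implements BlobSource {
    private final Path root;

    DirectoryBlobSource(Path root) {
        this.root = root;
    }

    @Override
    public void get(String key, Path localFile) throws IOException {
        // Throws NoSuchFileException (an IOException) if the blob is missing.
        Files.copy(root.resolve(key), localFile, StandardCopyOption.REPLACE_EXISTING);
    }
}

/** Hypothetical cache: prefer the shared store, fall back to the blob server. */
final class FallbackBlobCache {
    private final BlobSource sharedStore; // null when HA is not enabled
    private final BlobSource blobServer;  // always available

    FallbackBlobCache(BlobSource sharedStore, BlobSource blobServer) {
        this.sharedStore = sharedStore;
        this.blobServer = blobServer;
    }

    /** Fetches a blob and returns which source served it: "shared" or "server". */
    String fetch(String key, Path localFile) throws IOException {
        if (sharedStore != null) {
            try {
                sharedStore.get(key, localFile);
                return "shared";
            } catch (IOException e) {
                // Shared store not reachable from here (or blob missing): use the old path.
            }
        }
        blobServer.get(key, localFile);
        return "server";
    }
}
```

Swallowing the exception in the first branch is what makes the fallback transparent to callers; a real implementation would presumably at least log it.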

  • BlobServer: include the cluster id in the HA storage path for blobs
  • make the BlobServer use the HA file system back-end properly
  • make the BlobCache also use a distributed file system in HA mode
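Including the cluster id keeps clusters that share one HA storage directory from overwriting each other's blobs. A sketch of the path construction, with `java.nio.file` standing in for the distributed file system; the `blob` sub-directory name and the `HaBlobPaths` helper are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper building per-cluster blob paths under a shared HA root. */
final class HaBlobPaths {
    private HaBlobPaths() {}

    /** Root directory for one cluster's blobs, e.g. <haStorageDir>/<clusterId>/blob. */
    static Path blobStoreRoot(String haStorageDir, String clusterId) {
        if (clusterId == null || clusterId.isEmpty()) {
            // Without a cluster id, two clusters would share the same directory.
            throw new IllegalArgumentException("cluster id must be set in HA mode");
        }
        return Paths.get(haStorageDir, clusterId, "blob");
    }

    /** Location of a single blob belonging to a job. */
    static Path blobPath(String haStorageDir, String clusterId, String jobId, String blobKey) {
        return blobStoreRoot(haStorageDir, clusterId).resolve(jobId).resolve(blobKey);
    }
}
```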

@uce can you have a look?

NicoK (Contributor Author) commented Nov 30, 2016

Sorry for the hassle: I found a regression and added a fix plus an appropriate test for it. It should be fine now.

Nico Kruber added 13 commits December 16, 2016 14:25
This was actually the same implementation as
FileSystemBlobStore#get(java.lang.String, java.io.File), and either of the two
could have been removed, but the implementation makes the most sense at the
concrete file system abstraction layer, i.e. in FileSystemBlobStore.
…h for blobs

Also use JUnit's TemporaryFolder in BlobRecoveryITCase. This makes cleaning up
simpler.
Previously, the BlobServer held a local copy and, if high availability (HA)
was enabled, also copied jar files to a distributed file system. Upon restore,
these files were copied back to the local store, from which they were used.

This commit abstracts the BlobServer's backing file system and, in HA mode,
makes it use the distributed file system directly, i.e. without the local file
system copy. Otherwise, the behaviour does not change.
… HA mode

* refactor the file system abstraction in FileSystemBlobStore so that it can
  also be used by the task managers, which should not be able to delete files
  in a distributed file system shared among different nodes
* only download blobs from the blob server if not in HA mode or if the
  distributed file system is not accessible by the BlobCache, e.g. at the task
  managers
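One way to express "task managers may read but not delete" in the type system, as a sketch: the blob cache is only handed a read-only interface, while the blob server keeps the read-write one. `ReadOnlyBlobStore`, `ReadWriteBlobStore` and `DirectoryBlobStore` are hypothetical names, and `java.nio.file` replaces Flink's FileSystem abstraction here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical read-only view: what a task manager's blob cache receives. */
interface ReadOnlyBlobStore {
    void get(String blobPath, Path localFile) throws IOException;
}

/** Hypothetical read-write store: only the blob server should hold this type. */
interface ReadWriteBlobStore extends ReadOnlyBlobStore {
    void put(Path localFile, String blobPath) throws IOException;

    boolean delete(String blobPath);
}

/** Directory-backed implementation standing in for the shared file system. */
final class DirectoryBlobStore implements ReadWriteBlobStore {
    private final Path basePath;

    DirectoryBlobStore(Path basePath) {
        this.basePath = basePath;
    }

    @Override
    public void get(String blobPath, Path localFile) throws IOException {
        Files.copy(basePath.resolve(blobPath), localFile, StandardCopyOption.REPLACE_EXISTING);
    }

    @Override
    public void put(Path localFile, String blobPath) throws IOException {
        Path target = basePath.resolve(blobPath);
        Files.createDirectories(target.getParent());
        Files.copy(localFile, target, StandardCopyOption.REPLACE_EXISTING);
    }

    @Override
    public boolean delete(String blobPath) {
        try {
            return Files.deleteIfExists(basePath.resolve(blobPath));
        } catch (IOException e) {
            return false;
        }
    }
}
```

This only restricts the API surface handed to the caches; a caller could still downcast, so it documents intent rather than enforcing permissions on the file system itself.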
…erver and cache

If not in high availability mode, the blob server and cache again set up a
unique directory structure on local (and now also distributed) file systems so
that other instances with the same configuration file or storage path do not
interfere.

This was lost in 8b9c7d9.
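A sketch of the per-instance directory idea, assuming a random UUID suffix below the configured storage path; the `blobStore-` prefix and the `InstanceStorageDirs` class are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical helper: one private sub-directory per BlobServer/BlobCache instance. */
final class InstanceStorageDirs {
    private InstanceStorageDirs() {}

    static Path createUniqueDir(Path configuredStorageDir) throws IOException {
        // The shared parent may already exist; that is expected.
        Files.createDirectories(configuredStorageDir);
        // createDirectory (not createDirectories) fails with FileAlreadyExistsException
        // instead of silently sharing a directory with another instance.
        return Files.createDirectory(configuredStorageDir.resolve("blobStore-" + UUID.randomUUID()));
    }
}
```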
Instead, the return value indicates whether a delete operation was successful.
This is a result of the FileSystem abstraction layer in FileSystemBlobStore
and follows the idiom that a failing delete operation is not critical and the
program can still continue.
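The delete idiom described above, sketched with hypothetical helper names: a failed delete is logged and reported via the return value, and cleanup continues with the remaining blobs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical cleanup helper illustrating "report, don't throw" deletes. */
final class BlobCleanup {
    private static final Logger LOG = Logger.getLogger(BlobCleanup.class.getName());

    private BlobCleanup() {}

    /** Tries to delete one blob; returns false (and logs) instead of throwing. */
    static boolean deleteQuietly(Path blob) {
        try {
            Files.delete(blob);
            return true;
        } catch (IOException e) {
            LOG.warning("Could not delete " + blob + ": " + e);
            return false;
        }
    }

    /** Deletes all given blobs, continuing past failures; returns the failure count. */
    static int deleteAll(List<Path> blobs) {
        int failures = 0;
        for (Path blob : blobs) {
            if (!deleteQuietly(blob)) {
                failures++;
            }
        }
        return failures;
    }
}
```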
NicoK (Contributor Author) commented Dec 22, 2016

Despite the tests completing successfully, I still need to check a few things:

  • BlobService#getURL() may now return a URL for a distributed file system, but related code, e.g. java.io.File, may not know how to handle such URLs (HDFS URLs, for example) :(
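The concern can be reproduced with plain JDK classes: `new File(URI)` only accepts absolute, hierarchical `file:` URIs, and `URI#toURL()` throws `MalformedURLException` for a scheme such as `hdfs` unless a URL stream handler for it has been registered. The `BlobUriChecks` helper and the example URIs are hypothetical; in a plain JVM without such a handler, the `hdfs://` URI is rejected by both.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;

/** Hypothetical checks for what plain JDK classes can do with a blob URI. */
final class BlobUriChecks {
    private BlobUriChecks() {}

    /** Returns a java.io.File for local file: URIs, or null if File cannot represent the URI. */
    static File toLocalFileOrNull(URI uri) {
        try {
            return new File(uri); // only accepts absolute, hierarchical file: URIs
        } catch (IllegalArgumentException e) {
            return null; // e.g. hdfs://namenode:8020/... is rejected
        }
    }

    /** Returns true if the running JVM has a URL stream handler for the URI's scheme. */
    static boolean jvmHasUrlHandler(URI uri) {
        try {
            uri.toURL();
            return true;
        } catch (MalformedURLException | IllegalArgumentException e) {
            return false;
        }
    }
}
```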

NicoK (Contributor Author) commented Jan 5, 2017

I need to adapt a few things and choose a different approach; I'll reopen later.
