Consider storing the root tablet's list of files in ZooKeeper #936

Closed
keith-turner opened this issue Feb 4, 2019 · 0 comments

Comments

@keith-turner
Contributor

commented Feb 4, 2019

User tablets store their list of files as absolute URIs in Accumulo's metadata table. Metadata tablets store their list of files in the root tablet. The root tablet does not store its list of files anywhere and instead relies on HDFS. This requires specialized code for the root tablet. It also makes some changes, like #642, more difficult and makes running on S3 more complicated.

Storing the root tablet's list of files in ZooKeeper may clean up a lot of this. If this is done, it would make sense to consider the Accumulo metadata abstractions in #816 and #798.

keith-turner added a commit that referenced this issue Jun 6, 2019
There are three major changes in this commit:

 * An abstraction layer for interacting with Accumulo's persisted
   metadata called Ample was introduced.  The goal is to eventually make
   all metadata read and write operations use Ample.
 * How the root tablet's metadata is stored in ZooKeeper was changed. It
   now uses a single ZooKeeper node (which is good for making
   atomic updates to multiple fields). This single ZooKeeper node stores
   a JSON value.  This JSON value has the same schema as all
   other metadata tablets: it uses the same column families and
   qualifiers.  This makes it easy to update the JSON using Accumulo
   mutations and to read the JSON as Accumulo key/value pairs.
 * A lot of the root tablet code that used to interact directly with
   ZooKeeper was updated to use Ample.  In follow-on changes, a lot of
   specialized root tablet code can be completely removed.  Those
   changes were not made in this commit in order to keep it from
   becoming too large and hard to review.

This change is a starting point for many other changes, such as #816, #817, #936, and #1121.
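
The single-node JSON layout described in the second bullet can be pictured with a small sketch. The class below is hypothetical and is not Accumulo's or Ample's actual code: `RootMetadataSketch`, its methods, and the example path and values in the usage note are assumptions made for illustration. It keeps the root tablet's columns as a family-to-qualifier map and serializes them all into one JSON string, which is the kind of value that could be written to a single ZooKeeper node in one atomic update.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only; not the class used by Accumulo.
final class RootMetadataSketch {
  // family -> (qualifier -> value); TreeMaps keep the JSON output deterministic.
  private final TreeMap<String, TreeMap<String, String>> columns = new TreeMap<>();

  // Mimics a mutation "put": sets one column of the root tablet's row.
  void put(String family, String qualifier, String value) {
    columns.computeIfAbsent(family, f -> new TreeMap<>()).put(qualifier, value);
  }

  // Mimics a mutation "delete": removes one column, dropping empty families.
  void delete(String family, String qualifier) {
    TreeMap<String, String> qualifiers = columns.get(family);
    if (qualifiers != null) {
      qualifiers.remove(qualifier);
      if (qualifiers.isEmpty()) {
        columns.remove(family);
      }
    }
  }

  // Serializes every column into one JSON object, so several fields can be
  // replaced together by writing a single value.
  String toJson() {
    StringBuilder sb = new StringBuilder("{");
    String familySep = "";
    for (Map.Entry<String, TreeMap<String, String>> family : columns.entrySet()) {
      sb.append(familySep).append(quote(family.getKey())).append(":{");
      String columnSep = "";
      for (Map.Entry<String, String> column : family.getValue().entrySet()) {
        sb.append(columnSep).append(quote(column.getKey())).append(':')
            .append(quote(column.getValue()));
        columnSep = ",";
      }
      sb.append('}');
      familySep = ",";
    }
    return sb.append('}').toString();
  }

  // Minimal JSON string quoting: escapes backslashes and double quotes only.
  private static String quote(String s) {
    return '"' + s.replace("\\", "\\\\").replace("\"", "\\\"") + '"';
  }
}
```

For example, calling `put("srv", "dir", "root_tablet")` and then `put("file", "hdfs://nn1/accumulo/tables/+r/root_tablet/A0000001.rf", "1024,10")` would leave both columns in the same JSON document, so a reader never sees one updated without the other.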
@ctubbsii ctubbsii added this to To do in 2.1.0 via automation Jun 14, 2019
@ctubbsii ctubbsii added v2.1.0 and removed v2.1.0 labels Jun 14, 2019
@keith-turner keith-turner self-assigned this Aug 2, 2019
keith-turner added a commit to keith-turner/accumulo that referenced this issue Aug 7, 2019
keith-turner added a commit to keith-turner/accumulo that referenced this issue Aug 7, 2019
2.1.0 automation moved this from To do to Done Aug 14, 2019
keith-turner added a commit to keith-turner/accumulo that referenced this issue Oct 17, 2019
This commit changes Accumulo to call the volume chooser every time a
tablet creates a new file.  It also changes the interpretation of the srv:dir
column in the metadata table.  This column used to contain a URI to a
directory on a specific volume that was used for all new tablet files. Now the
srv:dir column only contains a directory name.  This directory name will be
used for new tablet files across all volumes.

This change necessitated changes to the ~del markers in the metadata table used
for garbage collection.  When a table is cloned or tablets are merged out of
existence, it can result in ~del markers for tablet dirs being placed in
the metadata table.  These ~del markers used to reference a specific volume.
With this change, the ~del markers now use a special URI of the form

  accumulo://allVolumes/accumulo/tables/<tableId>/<dir name>

When the Accumulo GC sees this, it will delete the dir on all configured
volumes when it's no longer used.

This change supersedes apache#642.  These changes are possible because of the
changes made in apache#936.
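
As a rough illustration of the `allVolumes` URI form described above, the sketch below builds such a URI and expands it into one directory per configured volume. It is not Accumulo's GC code: the class and method names are hypothetical, and the assumption that each volume is a base URI with tablet dirs under `<volume>/tables/<tableId>/<dir name>` is made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; not the garbage collector's implementation.
final class AllVolumesSketch {
  // Prefix taken from the URI form given in the commit message.
  static final String PREFIX = "accumulo://allVolumes/accumulo/tables/";

  // Builds the volume-independent URI for a tablet dir.
  static String allVolumesUri(String tableId, String dirName) {
    return PREFIX + tableId + "/" + dirName;
  }

  // Expands an allVolumes URI into one candidate path per configured volume.
  // Assumes (for illustration) that tablet dirs live under <volume>/tables/.
  static List<String> expand(String uri, List<String> volumes) {
    if (!uri.startsWith(PREFIX)) {
      throw new IllegalArgumentException("Not an allVolumes URI: " + uri);
    }
    String suffix = uri.substring(PREFIX.length()); // "<tableId>/<dir name>"
    List<String> paths = new ArrayList<>();
    for (String volume : volumes) {
      // Strip one trailing slash so the joined path has no double slash.
      String base = volume.endsWith("/")
          ? volume.substring(0, volume.length() - 1)
          : volume;
      paths.add(base + "/tables/" + suffix);
    }
    return paths;
  }
}
```

The point of the design is that the marker names only the table ID and directory name, so the set of volumes is resolved when the marker is processed rather than when it is written.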