HADOOP-16085. S3Guard: use object version or etags to protect against inconsistent read after replace/overwrite.

Contributed by Ben Roling.

S3Guard will now track the etag of uploaded files and, if an S3
bucket is versioned, the object version.

You can then control how to react to a mismatch between the data
in the DynamoDB table and that in the store: warn, fail, or, when
using versions, return the original value.

This adds two new columns to the table: etag and version.
This is transparent to older S3A clients, but when such clients
add/update data to the S3Guard table, they will not add these values.
As a result, the etag/version checks will not work with files uploaded by older clients.

For a consistent experience, upgrade all clients to use the latest Hadoop version.
Ben Roling authored and steveloughran committed May 19, 2019
1 parent 729ccb2 commit a36274d
Showing 56 changed files with 3,333 additions and 465 deletions.
@@ -1904,15 +1904,15 @@
   <name>fs.s3a.change.detection.mode</name>
   <value>server</value>
   <description>
-    Determines how change detection is applied to alert to S3 objects
-    rewritten while being read. Value 'server' indicates to apply the attribute
-    constraint directly on GetObject requests to S3. Value 'client' means to do a
-    client-side comparison of the attribute value returned in the response. Value
-    'server' would not work with third-party S3 implementations that do not
-    support these constraints on GetObject. Values 'server' and 'client' generate
-    RemoteObjectChangedException when a mismatch is detected. Value 'warn' works
-    like 'client' but generates only a warning. Value 'none' will ignore change
-    detection completely.
+    Determines how change detection is applied to alert to inconsistent S3
+    objects read during or after an overwrite. Value 'server' indicates to apply
+    the attribute constraint directly on GetObject requests to S3. Value 'client'
+    means to do a client-side comparison of the attribute value returned in the
+    response. Value 'server' would not work with third-party S3 implementations
+    that do not support these constraints on GetObject. Values 'server' and
+    'client' generate RemoteObjectChangedException when a mismatch is detected.
+    Value 'warn' works like 'client' but generates only a warning. Value 'none'
+    will ignore change detection completely.
   </description>
 </property>
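
As a rough illustration of how the four values of `fs.s3a.change.detection.mode` differ, the sketch below models the client-side decision in plain Java. The names `ChangeDetectionSketch`, `Mode`, and `check` are hypothetical, a plain `IOException` stands in for `RemoteObjectChangedException`, and treating a missing recorded value as "nothing to check" is an assumption based on the commit message's note about older clients. It is not the S3A implementation. For `server`, the store itself enforces the constraint on GetObject, so the sketch performs no local comparison in that mode.

```java
import java.io.IOException;

/** Hypothetical model of the change detection modes; not the S3A code. */
class ChangeDetectionSketch {

  /** The four documented values of fs.s3a.change.detection.mode. */
  enum Mode { SERVER, CLIENT, WARN, NONE }

  /**
   * Compare the attribute (etag or version ID) recorded earlier with the
   * one returned in a response.
   * @param mode configured detection mode
   * @param expected value recorded earlier; null if none was recorded
   * @param actual value returned with the object
   * @return true if a mismatch was detected but tolerated (WARN mode)
   * @throws IOException in CLIENT mode when the values differ
   */
  static boolean check(Mode mode, String expected, String actual)
      throws IOException {
    if (mode == Mode.NONE || mode == Mode.SERVER) {
      // NONE ignores changes; SERVER leaves enforcement to the store.
      return false;
    }
    if (expected == null || expected.equals(actual)) {
      // Assumption: with no recorded value there is nothing to compare.
      return false;
    }
    String message = "Object changed: expected " + expected
        + " but found " + actual;
    if (mode == Mode.WARN) {
      System.err.println("WARN: " + message);
      return true;
    }
    throw new IOException(message);
  }
}
```

With this model, `check(Mode.WARN, "v1", "v2")` logs and returns `true`, while the same call with `Mode.CLIENT` throws.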

5 changes: 5 additions & 0 deletions hadoop-tools/hadoop-aws/pom.xml
@@ -406,6 +406,11 @@
       <artifactId>hadoop-common</artifactId>
       <scope>provided</scope>
     </dependency>
+    <dependency>
+      <groupId>org.apache.httpcomponents</groupId>
+      <artifactId>httpcore</artifactId>
+      <scope>provided</scope>
+    </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-common</artifactId>
@@ -197,6 +197,33 @@ public void retry(String action,
         });
   }

+  /**
+   * Execute a void operation with retry processing when doRetry=true, else
+   * just once.
+   * @param doRetry true if retries should be performed
+   * @param action action to execute (used in error messages)
+   * @param path path of work (used in error messages)
+   * @param idempotent does the operation have semantics
+   which mean that it can be retried even if was already executed?
+   * @param retrying callback on retries
+   * @param operation operation to execute
+   * @throws IOException any IOE raised, or translated exception
+   */
+  @Retries.RetryTranslated
+  public void maybeRetry(boolean doRetry,
+      String action,
+      String path,
+      boolean idempotent,
+      Retried retrying,
+      VoidOperation operation)
+      throws IOException {
+    maybeRetry(doRetry, action, path, idempotent, retrying,
+        () -> {
+          operation.execute();
+          return null;
+        });
+  }
+
   /**
    * Execute a void operation with the default retry callback invoked.
    * @param action action to execute (used in error messages)
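
The void overload of `maybeRetry` in the hunk above delegates to the value-returning overload by wrapping the operation in a lambda that returns `null`. A minimal standalone sketch of that adapter pattern follows. The class `VoidAdapterSketch` and the method names `call` and `run` are hypothetical, and the two functional interfaces are local stand-ins, since the real definitions of `Operation` and `VoidOperation` are not shown in this diff.

```java
import java.io.IOException;

/** Hypothetical sketch of adapting a void operation to a typed one. */
class VoidAdapterSketch {

  /** Stand-in for an operation that returns a value. */
  @FunctionalInterface
  interface Operation<T> {
    T execute() throws IOException;
  }

  /** Stand-in for an operation that returns nothing. */
  @FunctionalInterface
  interface VoidOperation {
    void execute() throws IOException;
  }

  /** Single code path that does the real work. */
  static <T> T call(Operation<T> operation) throws IOException {
    return operation.execute();
  }

  /** Void variant: wrap the operation so it can reuse call(). */
  static void run(VoidOperation operation) throws IOException {
    call(() -> {
      operation.execute();
      return null;
    });
  }
}
```

Routing the void variant through the typed one keeps all invocation logic in one place, so any retry or error handling added to `call` applies to both.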
@@ -215,6 +242,28 @@ public void retry(String action,
     retry(action, path, idempotent, retryCallback, operation);
   }

+  /**
+   * Execute a void operation with the default retry callback invoked when
+   * doRetry=true, else just once.
+   * @param doRetry true if retries should be performed
+   * @param action action to execute (used in error messages)
+   * @param path path of work (used in error messages)
+   * @param idempotent does the operation have semantics
+   which mean that it can be retried even if was already executed?
+   * @param operation operation to execute
+   * @throws IOException any IOE raised, or translated exception
+   */
+  @Retries.RetryTranslated
+  public void maybeRetry(
+      boolean doRetry,
+      String action,
+      String path,
+      boolean idempotent,
+      VoidOperation operation)
+      throws IOException {
+    maybeRetry(doRetry, action, path, idempotent, retryCallback, operation);
+  }
+
   /**
    * Execute a function with the default retry callback invoked.
    * @param action action to execute (used in error messages)
@@ -265,6 +314,41 @@ public <T> T retry(
         () -> once(action, path, operation));
   }

+  /**
+   * Execute a function with retry processing when doRetry=true, else just once.
+   * Uses {@link #once(String, String, Operation)} as the inner
+   * invocation mechanism before retry logic is performed.
+   * @param <T> type of return value
+   * @param doRetry true if retries should be performed
+   * @param action action to execute (used in error messages)
+   * @param path path of work (used in error messages)
+   * @param idempotent does the operation have semantics
+   which mean that it can be retried even if was already executed?
+   * @param retrying callback on retries
+   * @param operation operation to execute
+   * @return the result of the call
+   * @throws IOException any IOE raised, or translated exception
+   */
+  @Retries.RetryTranslated
+  public <T> T maybeRetry(
+      boolean doRetry,
+      String action,
+      @Nullable String path,
+      boolean idempotent,
+      Retried retrying,
+      Operation<T> operation)
+      throws IOException {
+    if (doRetry) {
+      return retryUntranslated(
+          toDescription(action, path),
+          idempotent,
+          retrying,
+          () -> once(action, path, operation));
+    } else {
+      return once(action, path, operation);
+    }
+  }
+
   /**
    * Execute a function with retry processing and no translation.
    * and the default retry callback.
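
To make the `doRetry` branch of the generic `maybeRetry` concrete, here is a minimal standalone sketch. `MaybeRetrySketch`, its local `Operation` interface, and the fixed `maxAttempts` loop are assumptions made for illustration. The real method delegates to `retryUntranslated` and `once`, whose retry policy, idempotency handling, and exception translation are not reproduced here.

```java
import java.io.IOException;

/** Hypothetical sketch of "retry if asked, otherwise run once". */
class MaybeRetrySketch {

  /** Stand-in for an operation that returns a value. */
  @FunctionalInterface
  interface Operation<T> {
    T execute() throws IOException;
  }

  /**
   * Run the operation once, or up to maxAttempts times when doRetry is true.
   * @param doRetry true if failures should trigger another attempt
   * @param maxAttempts upper bound on attempts; assumed policy, must be >= 1
   * @param operation operation to execute
   * @return the result of the first successful attempt
   * @throws IOException the last failure, or the only failure if not retrying
   */
  static <T> T maybeRetry(boolean doRetry, int maxAttempts,
      Operation<T> operation) throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    if (!doRetry) {
      // Single attempt: any failure goes straight to the caller.
      return operation.execute();
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return operation.execute();
      } catch (IOException e) {
        last = e; // remember the failure and try again
      }
    }
    throw last;
  }
}
```

A caller can then pick the behavior with one flag instead of duplicating the call site, which is the convenience the `doRetry` parameter provides.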