Remove tempStorageDirectory and rely on task dir instead #16416
.../s3-extensions/src/main/java/org/apache/druid/storage/s3/output/RetryableS3OutputStream.java
}

@Override
public StorageConnector get()
{
  return new AzureStorageConnector(this, azureStorage);
  final File tempDir = injector.getInstance(Key.get(File.class, Names.named(DataSourceTaskIdHolder.TMP_DIR_BINDING)));
Thanks so much for this change @adarshsanjeev 👍
That said, it might be worth checking whether you can inject the File object directly with JacksonInject. One way to try it:
@JacksonInject
@Named(DataSourceTaskIdHolder.TMP_DIR_BINDING)
File tempDir;
You could also consider adding the JacksonInject in the constructor itself, to keep all the relevant things in one place.
Lastly, it could be worth setting up the temp dir instance for other processes as well, including indexers.
I tried the above initially, but injecting tempDir as a variable fails (perhaps because it is a Jackson inject and that binding has not been added?).
Does it fail for processes that are not peons? If that's the case, you can also try marking this Nullable.
Also, if you have the error available, please feel free to paste it here.
The above code fails for peons with the error:
Guice configuration errors:
1) No implementation for java.io.File annotated with @com.google.inject.name.Named(value="druidTempDirectory") was bound.
  while locating java.io.File annotated with @com.google.inject.name.Named(value="druidTempDirectory")
1 error
...re-extensions/src/main/java/org/apache/druid/storage/azure/output/AzureStorageConnector.java
Thanks for the change. A note asserting that the changes are backward compatible would help!
List<String> exportedFiles = (List<String>) queryKernel.getResultObjectForStage(finalStageId);
log.info("Query [%s] exported %d files.", queryDef.getQueryId(), exportedFiles.size());
exportMetadataManager.writeMetadata(exportedFiles);
} else if (MSQControllerTask.isExport(querySpec)) {
Curious: why were there originally two else branches with the same condition, MSQControllerTask.isExport(querySpec)?
This is due to a merge from master. Git might have created a new copy of the if condition during the merge. It was added here: https://github.com/apache/druid/pull/16168/files#diff-b9c30db60c6f71d4ccabd6e9ebc8dc1fbeee4db9fca63d5fbaa0ef5833e8e4acL1796.
@@ -64,14 +57,12 @@ public class AzureOutputConfig
public AzureOutputConfig(
    @JsonProperty(value = "container", required = true) String container,
    @JsonProperty(value = "prefix", required = true) String prefix,
    @JsonProperty(value = "tempDir", required = true) File tempDir,
Is this exchanged between the Controller and Workers? Can this change cause any failure during upgrade?
This is created from the runtime properties and should not be passed between workers. Temp directories should not have been passed anywhere, since they are "per peon" properties and passing the value doesn't make sense.
The only changed classes that should be passed between peons are the exportStorageProvider classes, and these do not contain the tmpDirectory values.
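To make the distinction concrete, here is a minimal sketch of the split described above, assuming a provider that is exchanged between tasks and a connector that is built locally. The class names `ExportProviderSketch` and `ConnectorSketch` and the `createConnector` method are hypothetical and are not Druid's actual API.

```java
import java.io.File;

// Hypothetical stand-in for the part that is exchanged between tasks.
// It carries only storage coordinates and no local file system paths.
final class ExportProviderSketch {
    private final String container;
    private final String prefix;

    ExportProviderSketch(String container, String prefix) {
        this.container = container;
        this.prefix = prefix;
    }

    // Each task passes in its own local temp directory when it builds the connector.
    ConnectorSketch createConnector(File localTempDir) {
        return new ConnectorSketch(container, prefix, localTempDir);
    }
}

// Hypothetical stand-in for the connector, which lives inside a single task
// and is therefore free to hold task-local state such as the temp directory.
final class ConnectorSketch {
    private final String container;
    private final String prefix;
    private final File tempDir;

    ConnectorSketch(String container, String prefix, File tempDir) {
        this.container = container;
        this.prefix = prefix;
        this.tempDir = tempDir;
    }

    String getContainer() { return container; }
    String getPrefix() { return prefix; }
    File getTempDir() { return tempDir; }
}

class PerTaskTempDirSketch {
    public static void main(String[] args) {
        ExportProviderSketch provider = new ExportProviderSketch("exports", "msq");
        // The same provider yields connectors with different temp dirs on different tasks.
        ConnectorSketch onTaskA = provider.createConnector(new File("task-a-tmp"));
        ConnectorSketch onTaskB = provider.createConnector(new File("task-b-tmp"));
        System.out.println(onTaskA.getTempDir() + " vs " + onTaskB.getTempDir());
    }
}
```

Because the provider in this sketch has no temp directory field, nothing about one task's local directory layout can leak to another task through it.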
Changes LGTM. I am wondering if there's a way to inject the tempDir directly into the connector using Guice while constructing the instance, instead of relying on the callers to pass it.
Can you please add a release note for the PR stating that druid.export.storage.s3.tempLocalDir and related properties are now defunct, and that the storage connector will use the task's temp directory to store temporary data?

@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
public interface StorageConnectorProvider extends Provider<StorageConnector>
public interface StorageConnectorProvider
I am curious why this change is needed.
binder.bind(Key.get(StorageConnector.class, MultiStageQuery.class))
      .toProvider(Key.get(StorageConnectorProvider.class, MultiStageQuery.class))
      .in(LazySingleton.class);

Why is this change required? Do we still preserve the singleton nature of the storage connector? Also, we should still preserve the binding annotation.
Removes the user-configured temporary directory used by durable storage and export. This PR instead reuses the temporary directory configured for the task.
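As a rough illustration of the idea, the sketch below derives a connector scratch directory from a task's temp directory instead of reading it from configuration. The helper name `connectorTempDir` and the `storage-connector/<name>` layout are assumptions made for this example only; the PR does not specify them.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class TaskTempDirSketch {
    // Resolves a scratch directory under the task's temp dir and creates it if missing.
    // The "storage-connector" subdirectory name is an assumption for illustration.
    static File connectorTempDir(File taskTempDir, String connectorName) throws IOException {
        Path dir = taskTempDir.toPath().resolve("storage-connector").resolve(connectorName);
        Files.createDirectories(dir);
        return dir.toFile();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the temp directory that the task runtime would provide.
        File taskTempDir = Files.createTempDirectory("task-tmp").toFile();
        File scratch = connectorTempDir(taskTempDir, "s3");
        System.out.println(scratch.getAbsolutePath());
    }
}
```

With this shape, no separate temp directory property is needed: the scratch space sits under the task's own temp directory.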