Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add query results directory and prevent the auto cleaner from cleaning it #14446

Merged
merged 10 commits into from
Jun 28, 2023

Conversation

adarshsanjeev
Copy link
Contributor

@adarshsanjeev adarshsanjeev commented Jun 19, 2023

This PR makes some changes to the MSQ auto cleaner to add support for a "query results" directory in durable storage. This change will allow us to store query results as well on durable storage in the future.

The new structure of durable storage now contains a new special directory

  • query-results/<controller_id>/<file>: This will contain query results in the future. Files in this directory would be cleaned by the auto cleaner once the task controller_id has been dropped from the list of known tasks.

This should not break backward compatibility as the structure is the same as before, with an additional directory. An earlier version of overlord should simply delete the new query results directory as well.

Tested out the changes are working with an S3StorageConnector. Query directories for task Ids which were returned by knownTasks were not deleted. Other files were deleted as before.

Release notes:

  • Adds support for automatic cleaning of a "query-results" directory in durable storage. This directory will be cleaned up only if the task id is not known to the overlord. This will allow storage of query results after the task has finished running.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link
Contributor

@LakshSingla LakshSingla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR LGTM along with a few comments.

Can you please test it out as well and confirm if it's working locally? You can set up a LocalFileStorageConnector and then manually create a couple of files under the paths and run the durable storage cleaner and check if its execution is as expected.

Also, please update the docs with this behavior and add a release note for the same.

@@ -150,4 +154,23 @@ public static String getControllerTaskIdWithPrefixFromPath(String path)
return null;
}
}

/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the doc with an example of the query results path as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with examples

@@ -138,7 +142,7 @@ public static String getOutputsFileNameForPath(
* </ul>
*/
@Nullable
public static String getControllerTaskIdWithPrefixFromPath(String path)
public static String getNextDirNameWithPrefixFromPath(String path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the Javadoc for this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

if (nextDirName != null && !nextDirName.isEmpty()) {
if (runningTaskIds.contains(nextDirName)) {
// do nothing
} else if (DurableStorageUtils.QUERY_RESULTS_DIR.equals(nextDirName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please confirm if this logic works? Let's say __query_results has a known id and an unknown id, then we would execute files.remove(currentFile) when we iterate over the unknown id.
However the currentFile would be the whole results directory or am I wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not understand what you meant here, current file would be something inside "unknown id", so it would add that file to the list to delete. (We only delete individual files in the cleaner, the storage connector does not allow us to delete directories with the function called)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point! Just revisited the Javadocs for listDir. This LGTM, thanks for clarfying

Copy link
Contributor

@LakshSingla LakshSingla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes! LGTM 🚀

@LakshSingla LakshSingla merged commit 0335aaa into apache:master Jun 28, 2023
@abhishekagarwal87 abhishekagarwal87 added this to the 27.0 milestone Jul 19, 2023
sergioferragut pushed a commit to sergioferragut/druid that referenced this pull request Jul 21, 2023
…g it (apache#14446)

Adds support for automatic cleaning of a "query-results" directory in durable storage. This directory will be cleaned up only if the task id is not known to the overlord. This will allow the storage of query results after the task has finished running.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants