-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[QTL] Make URI Exctraction Namespace take more sane arguments #2738
Conversation
drcrallen
commented
Mar 25, 2016
- Fixes Better URI namespaced lookups behavior with single files #2669
This is incompatible but the static config is probably going away anyways. |
@@ -119,6 +119,7 @@ protected boolean swapAndClearCache(String namespaceKey, String cacheKey) | |||
|
|||
final String priorCache = currentNamespaceCache.put(namespaceKey, swapCacheKey); | |||
if (priorCache != null) { | |||
// TODO: resolve what happens here if query is actively going on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please file a github issue for this if there is none already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added #2863
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drcrallen @nishantmonu51 you could use a reference counting mechanism similar to segments to avoid closing the resources in query is in flight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually, yes. Unless you have some sort of insight, though, this is not straight forward. The issue is that queries do not have a concept of resources associated with them. There are some runners which wrap resources, but there isn't a general concept of query resources that need to be accounted for.
There are quite a few resources that get tied to a particular query though, so it might be worthwhile to add such a convention.
👍 |
|Property|Description|Required|Default| | ||
|--------|-----------|--------|-------| | ||
|`namespace`|The namespace to define|Yes|| | ||
|`pollPeriod`|Period between polling for updates|No|0 (only once)| | ||
|`versionRegex`|Regex to help find newer versions of the namespace data|Yes|| | ||
|`uri`|URI for the file of interest|No|Use `uriPrefix`| | ||
|`uriPrefix`|A URI which specifies a directory (or other searchable resource) in which to search for files|No|Use `uri`| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this read the most recent file under a prefix ? imagine i have an HDFS dir where prefix is /year/day/hour/lookupDir/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's part of io.druid.storage.hdfs.HdfsFileTimestampVersionFinder
which is used to discriminate among multiple files matching the pattern in uriPrefix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we document this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added some documentation
@b-slim this should be fixed now. Any other comments? |
|`versionRegex`|Regex to help find newer versions of the namespace data|Yes|| | ||
|`uri`|URI for the file of interest|No|Use `uriPrefix`| | ||
|`uriPrefix`|A URI which specifies a directory (or other searchable resource) in which to search for files|No|Use `uri`| | ||
|`fileRegex`|Optional regex for matching the file name under `uriPrefix`. Only used if `uriPrefix` is used|No|`".*"`| | ||
|`namespaceParseSpec`|How to interpret the data at the URI|Yes|| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call it lookupParseSpec
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of scope here
@drcrallen please see the comments. My main concern i am not sure why this PR is not implementing the new interfaces ? |
@b-slim because the move to the new interface is in a single PR with no other modifications. I'm not going to modify a bunch of functionality in a PR as big as the lookup framework migration one. |
@drcrallen can we have an issue to track this by outlining the plan ? |
@b-slim I'm good for that. Give me 15 mins to get it written up |
@b-slim any more comments here? |
|
||
The `versionRegex` value specifies a regex to use to determine if a filename in the parent path of the uri should be considered when trying to find the latest version. Omitting this setting or setting it equal to `null` will match to all files it can find (equivalent to using `".*"`). The search occurs in the most significant "directory" of the uri. | ||
The `pollPeriod` value specifies the period in ISO 8601 format between checks for updates. If the source of the lookup is capable of providing a timestamp, the lookup will only be updated if it has changed since the prior tick of `pollPeriod`. A value of 0, an absent parameter, or `null` all mean populate once and do not attempt to update. Whenever an update occurs, the updating system will look for a file with the most recent timestamp and assume that one with the most recent data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reading this i can not tell if the update is incremental or is swapping the entire cache. Can we make it more clear please ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explained more here, please look.
LGTM 👍 , i would recommend one more ut where we have new lookup file creation to test the get new version path, but is is not a blocker. it is up to the author to add it or not. (please squash) |
@b-slim IMPLs for io.druid.data.SearchableVersionedDataFinder do have tests for finding new versions. If there are things not covered by those tests, or if you're looking for more integrations with |
Thanks @b-slim ! |
@drcrallen thought this can go in one commit ? |
@drcrallen my bad but in the top of this pr i can see 5 commits and thought that's what is going to be in the master. |
versionRegex = null; | ||
} | ||
} else { | ||
final Path filePath = Paths.get(extractionNamespace.getUri()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With an s3 lookup, this Paths.get call throws this exception for me,
java.nio.file.FileSystemNotFoundException: Provider "s3" not installed
at java.nio.file.Paths.get(Paths.java:147) ~[?:1.8.0_66]
at io.druid.server.namespace.URIExtractionNamespaceFunctionFactory$3.call(URIExtractionNamespaceFunctionFactory.java:154) ~[druid-namespace-lookup-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
at io.druid.server.namespace.URIExtractionNamespaceFunctionFactory$3.call(URIExtractionNamespaceFunctionFactory.java:121) ~[druid-namespace-lookup-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
at io.druid.server.namespace.cache.NamespaceExtractionCacheManager$5.run(NamespaceExtractionCacheManager.java:364) [druid-namespace-lookup-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
at com.google.common.util.concurrent.MoreExecutors$ScheduledListeningDecorator$NeverSuccessfulListenableFutureTask.run(MoreExecutors.java:582) [guava-16.0.1.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_66]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_66]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_66]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_66]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_66]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_66]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_66]