I've already read #12, but I could not figure out how to check for files that match more complex patterns, e.g. `/something/a-b-[cdef]-*/part-*`.
In my case, I cannot determine in advance where the wildcards appear in the pattern.
Do I have to walk all paths from the root of HDFS? I would rather not do that, because there are too many files in my HDFS.
If you need more complex pattern matching than fnmatch can offer, you probably need to use regular expressions. In any event, I don't see how you can avoid walking the whole tree where the files you need to match might be. There is a walk tool for this: you can apply your matching pattern (with fnmatch or re) to every item yielded by walk.
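The suggested approach can be sketched as follows. This is a minimal, self-contained example that demonstrates the walk-and-filter idea on a local temporary directory with `os.walk`; the same filter would be applied to the paths yielded by Pydoop's HDFS walk utility. The helper name `find_matches` is just for illustration.

```python
import fnmatch
import os
import tempfile

def find_matches(top, pattern):
    """Walk the tree under `top` and yield full paths matching a glob pattern.

    Note: fnmatch does not treat '/' specially, so '*' in the pattern can
    span directory separators when matching a full path.
    """
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if fnmatch.fnmatch(path, pattern):
                yield path

# Demo on a local temp tree (stand-in for an HDFS subtree).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a-b-c-001"))
os.makedirs(os.path.join(root, "a-b-x-002"))
open(os.path.join(root, "a-b-c-001", "part-00000"), "w").close()
open(os.path.join(root, "a-b-x-002", "part-00000"), "w").close()

# Only the directory whose third component is in [cdef] should match.
matches = list(find_matches(root, "*/a-b-[cdef]-*/part-*"))
```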
Thank you for your quick response!
I've already tried the walk tool in pydoop, but it took approximately 10-50 times longer than the bare hadoop command...
However, as you suggest, it seems there is no way to search more efficiently without walking the full tree.
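One possible mitigation, sketched below as an assumption rather than a Pydoop feature, is to avoid descending into subtrees that cannot match: when the pattern's directory components are known per level, `os.walk` lets you prune `dirnames` in place so whole branches are skipped. The helper `pruned_walk` and its arguments are hypothetical names for illustration.

```python
import fnmatch
import os
import tempfile

def pruned_walk(top, dir_patterns, file_pattern):
    """Walk `top`, descending only into directories whose names match the
    per-level glob in `dir_patterns`; yield files matching `file_pattern`."""
    top = top.rstrip(os.sep)
    for dirpath, dirnames, filenames in os.walk(top):
        depth = dirpath[len(top):].count(os.sep)
        if depth < len(dir_patterns):
            # Prune in place: os.walk will not descend into removed entries.
            dirnames[:] = [d for d in dirnames
                           if fnmatch.fnmatch(d, dir_patterns[depth])]
        else:
            dirnames[:] = []  # deeper than the pattern: stop descending
        if depth == len(dir_patterns):
            for name in filenames:
                if fnmatch.fnmatch(name, file_pattern):
                    yield os.path.join(dirpath, name)

# Demo: the "zzz" branch is pruned and never visited.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a-b-c-001"))
os.makedirs(os.path.join(root, "zzz", "deep"))
open(os.path.join(root, "a-b-c-001", "part-00000"), "w").close()

hits = list(pruned_walk(root, ["a-b-[cdef]-*"], "part-*"))
```

Whether this helps in practice depends on how much of the tree the literal parts of the pattern rule out; if the wildcards sit near the root, most of the walk still happens.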