ARROW-2575: [Python] Exclude hidden files when reading Parquet dataset #2027
ukaratay wants to merge 1 commit into apache:master
On Unix systems, hidden files are picked up when reading a Parquet dataset because os.walk does not skip them. This is especially a problem on macOS, where .DS_Store files are created automatically.
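To illustrate the behaviour described above, here is a minimal standard-library sketch. It is not the code from this PR: `list_visible_files` and `demo` are hypothetical names, and treating every dot-prefixed name as hidden is an assumption made for the example.

```python
import os
import tempfile


def list_visible_files(root):
    """Walk `root` and return relative file paths, skipping dot-prefixed entries.

    Hypothetical helper for illustration; not the function used in the PR.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if name.startswith("."):
                continue  # e.g. .DS_Store
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)


def demo():
    """Compare a plain os.walk listing with the filtered one."""
    with tempfile.TemporaryDirectory() as root:
        # Create one data file and one hidden file (both empty placeholders).
        for name in ("part-0.parquet", ".DS_Store"):
            with open(os.path.join(root, name), "wb"):
                pass
        unfiltered = sorted(f for _, _, files in os.walk(root) for f in files)
        return unfiltered, list_visible_files(root)
```

Calling `demo()` returns both listings, so the unfiltered walk can be compared with the filtered one. Assigning to `dirnames[:]` rather than rebinding the name is what makes os.walk skip hidden subdirectories as well.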
Codecov Report
@@ Coverage Diff @@
## master #2027 +/- ##
==========================================
+ Coverage 87.42% 87.45% +0.03%
==========================================
Files 189 178 -11
Lines 29289 28516 -773
==========================================
- Hits 25607 24940 -667
+ Misses 3682 3576 -106
Thanks @ukaratay. Can you:
@pitrou I have created a JIRA ticket for it. However, there seems to be no test for the ParquetDataset class in the Python code, so adding a test is going to take some time, unless I just wasn't able to find the existing ones.
@ukaratay A simple unit test for this could be:
Added this to 0.10.0 as it's a nuisance and not too difficult to test. @ukaratay can you write a test? Otherwise someone else may be able to get to it before 0.10 goes out.
I don't have the source downloaded, as I've been working from Conda, but I figured I'd do what I could to help this along. Here's what @xhochy mentioned, put into code using some of the examples in the pyarrow docs. I ran it and it seems to work. If someone (perhaps @ukaratay) adds this as a test function, it should work. The script's steps, in order: imports, make table, write table, read directory, add hidden file, try reading again, test.
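The script from that comment was not preserved on this page, only its step labels. A rough sketch following the same steps might look like the one below. It is an assumption, not the commenter's code: the function name, column names, and file names are made up, and it expects a pyarrow version whose `ParquetDataset` already ignores dot-prefixed files.

```python
import os
import tempfile


def check_hidden_files_are_ignored():
    # Imports (kept local so this module still loads where pyarrow is missing)
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Make table (illustrative data only)
    table = pa.table({"a": [1, 2, 3], "b": ["x", "y", "z"]})

    with tempfile.TemporaryDirectory() as root:
        # Write table
        pq.write_table(table, os.path.join(root, "part-0.parquet"))

        # Read directory
        before = pq.ParquetDataset(root).read()

        # Add hidden file (stand-in for the .DS_Store that macOS creates)
        with open(os.path.join(root, ".DS_Store"), "wb") as f:
            f.write(b"not a parquet file")

        # Try reading again
        after = pq.ParquetDataset(root).read()

    # Test: the hidden file must not change what the dataset returns
    return before.equals(after) and after.equals(table)
```

On a pyarrow build without the hidden-file filtering, the second read would be expected to fail or to return different data, which is the regression such a test is meant to catch.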
Superseded by PR #2312.