Parquet engine-core refactoring #4995

Merged
merged 62 commits into from Jul 19, 2019
Changes from 27 commits
Commits
62 commits
8b734ba
rip out bytes.utils
Jul 2, 2019
f98fdfb
remove flaky stash
Jul 2, 2019
3b4d4b9
stop point
Jul 2, 2019
8b96b74
Refactoring parquet api using core-engine approach
rjzamora Jun 20, 2019
c1c4ffc
All pass in bytes
Jul 3, 2019
84f02de
Add fsspec to travis requirements and fix broader
Jul 4, 2019
64adb5f
Fix zarr mappers
Jul 4, 2019
8199c22
Simplify bag text loading
Jul 4, 2019
f0e3a48
cleanup
rjzamora Jul 8, 2019
1d7f738
Merge branch 'master' into fsspec
Jul 9, 2019
c57319f
remove old modules
Jul 9, 2019
1536d86
pathlib support doc and cleanup
rjzamora Jul 9, 2019
c2dcc7b
Merge branch 'master' into parquet-cleanup
rjzamora Jul 9, 2019
d60cc78
fixing some CI testing issues
rjzamora Jul 9, 2019
f8a43d2
flake8 and more test cleanup
rjzamora Jul 9, 2019
cf4d3c4
black
rjzamora Jul 9, 2019
21a724c
adding check_pyarrow to test_to_parquet_default_writes_nulls
rjzamora Jul 10, 2019
45ffeae
Small changes
Jul 10, 2019
fa32838
possible doctest ci failure fix
rjzamora Jul 11, 2019
1171615
black formatting
rjzamora Jul 11, 2019
d6d185d
removing problematic logic from dask/dataframe/core.py - empty datafr…
rjzamora Jul 12, 2019
2f44e9a
Add master s3fs to a couple of the test builds (temporary)
Jul 12, 2019
80b445a
trivial change to dask/dataframe/io/parquet/__init__.py
rjzamora Jul 12, 2019
5f67d94
Disallow read_bytes with compression and blocksize
Jul 12, 2019
c1d4776
fix url
Jul 13, 2019
820d5f3
path work [skip ci]
Jul 14, 2019
c121240
correct [skip ci]
Jul 14, 2019
9aaddaf
fix the fix [skip ci]
Jul 14, 2019
d4d4068
delimiter none case [skip ci]
Jul 14, 2019
2a64ed3
simple blocks [skip ci]
Jul 14, 2019
0e38f61
update test [skip ci]
Jul 14, 2019
6060b34
fix tests [skip ci]
Jul 14, 2019
7881f46
fix read_block for sample and no delimiter [skip ci]
Jul 14, 2019
5eaf25b
apply black
Jul 14, 2019
bd558e5
Update env and use simpler path tests
Jul 14, 2019
5d8a857
appease flake8
Jul 14, 2019
907b0f0
using read_parquet_part for fuse opt
rjzamora Jul 15, 2019
ec7574f
fastparquet version correction and addressing a few smaller code-revi…
rjzamora Jul 15, 2019
bfd1bc9
moving index-setting step to engine
rjzamora Jul 15, 2019
01cd90f
simplifying meta-index history etc
rjzamora Jul 15, 2019
cde42e6
black formatting, and adding index to read_partition positional argum…
rjzamora Jul 16, 2019
963588f
making metadata-file write optional - Beginning kwarg overhaul
rjzamora Jul 16, 2019
5941cbe
some read-phase cleanup
rjzamora Jul 16, 2019
b390d26
Revert "some read-phase cleanup"
Jul 17, 2019
6125f82
Revert "making metadata-file write optional - Beginning kwarg overhaul"
Jul 17, 2019
68f333b
minor tweaks on top of rolled-back kwarg changes
Jul 17, 2019
73219ac
correct flake exemptions
Jul 17, 2019
d988329
mostly doc cleanup
Jul 17, 2019
2f1430c
mostly doc cleanup
rjzamora Jul 17, 2019
30a4f0b
Merge branch 'parquet-cleanup' of https://github.com/rjzamora/dask in…
rjzamora Jul 17, 2019
b689194
adding required/expected kwargs to read_partitions signature
rjzamora Jul 17, 2019
821f7a1
flake8 retry
rjzamora Jul 17, 2019
71cb6f9
skip zipfile write on py35
Jul 17, 2019
6b2f65d
addressing small doc and mutable arg changes
rjzamora Jul 17, 2019
bbf399b
apply black
Jul 17, 2019
94f8e87
removing signature change for _meta_from_dtypes
rjzamora Jul 17, 2019
402f76d
merging martindurant/fsspec
rjzamora Jul 18, 2019
4219443
syncing necessary changes coming in pr#5064
rjzamora Jul 18, 2019
fd171ca
bug fix to address behavior raised in issue #5112
rjzamora Jul 18, 2019
ee0006c
Merge remote-tracking branch 'upstream/master' into parquet-cleanup
rjzamora Jul 19, 2019
93b116d
Merge remote-tracking branch 'upstream/master' into parquet-cleanup
rjzamora Jul 19, 2019
7c0e150
small code review tweak and correction of 0.13.01 to 0.13.1 for versi…
rjzamora Jul 19, 2019
3 changes: 2 additions & 1 deletion dask/bytes/core.py
@@ -16,6 +16,7 @@
     infer_storage_options,
     build_name_function,
     update_storage_options,
+    stringify_path,
 )
 from .. import config
 from ..compatibility import unicode
@@ -327,7 +328,7 @@ def get_compression(urlpath, compression):
 def infer_options(urlpath):
     if hasattr(urlpath, "name"):
         # deal with pathlib.Path objects - must be local
-        urlpath = str(urlpath)
+        urlpath = stringify_path(urlpath)
         ispath = True
     else:
         ispath = False
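
The hunk above swaps a bare `str()` call for `stringify_path`. As a rough illustration of why a dedicated helper can be preferable, here is a minimal sketch, assuming the helper converts path-like objects to strings and passes everything else through unchanged. `stringify_path_sketch` is a hypothetical name used for illustration only; it is not dask's implementation.

```python
import os
import pathlib


def stringify_path_sketch(filepath):
    """Hypothetical stand-in: return a str for path-like input, else the input itself."""
    if isinstance(filepath, os.PathLike):
        # os.fspath calls the object's __fspath__ method (pathlib paths define it).
        return os.fspath(filepath)
    # Plain strings, open files, and other objects are returned untouched,
    # whereas str() would turn them into their repr-like string form.
    return filepath


local = stringify_path_sketch(pathlib.PurePosixPath("/tmp/data.csv"))
remote = stringify_path_sketch("s3://bucket/key.parquet")
```

Unlike `str()`, a helper of this shape does not silently stringify objects that are not paths, so a later type check can still reject them.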
4 changes: 2 additions & 2 deletions dask/bytes/tests/test_s3.py
@@ -388,8 +388,8 @@ def test_read_text_passes_through_options():
 def test_parquet(s3, engine):
     dd = pytest.importorskip("dask.dataframe")
     lib = pytest.importorskip(engine)
-    if engine == "pyarrow" and LooseVersion(lib.__version__) == "0.13.0":
-        pytest.skip("pyarrow 0.13.0 not supported for parquet")
+    if engine == "pyarrow" and LooseVersion(lib.__version__) < "0.13.1":
+        pytest.skip("pyarrow < 0.13.1 not supported for parquet")
     import pandas as pd
     import numpy as np

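
The test change above moves from skipping one exact pyarrow release to skipping everything below a minimum version. The sketch below illustrates that difference with a simple tuple-based comparison. It assumes purely numeric dotted versions; `version_tuple` and `should_skip` are hypothetical helpers, not the `LooseVersion` class the test itself uses.

```python
def version_tuple(version):
    """Parse a dotted numeric version such as "0.13.1" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# Assumed minimum, taken from the version string in the hunk above.
MINIMUM = version_tuple("0.13.1")


def should_skip(installed):
    """Return True when the installed version is older than the minimum."""
    return version_tuple(installed) < MINIMUM


# Comparing tuples of ints avoids the string-ordering trap where "0.9.0"
# sorts after "0.13.1" because the character "9" is greater than "1".
older_releases_skipped = all(should_skip(v) for v in ("0.9.0", "0.12.1", "0.13.0"))
```

An equality check against `"0.13.0"` would let every older release through, which is the gap the `<` comparison closes.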
2 changes: 2 additions & 0 deletions dask/bytes/utils.py
@@ -37,6 +37,8 @@ def infer_storage_options(urlpath, inherit_storage_options=None):
     "host": "node", "port": 123, "path": "/mnt/datasets/test.csv",
     "url_query": "q=1", "extra": "value"}
     """
+    urlpath = stringify_path(urlpath)  # re, urllib don't support pathlib.Path objects
+
     # Handle Windows paths including disk name in this special case
     if re.match(r"^[a-zA-Z]:[\\/]", urlpath):
         return {"protocol": "file", "path": urlpath}
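
To show why the path is stringified before the Windows check, here is a simplified sketch of the flow in the hunk above: convert to a string, short-circuit on a drive-letter prefix, then fall back to URL parsing. `infer_protocol_sketch` is a hypothetical, reduced version that returns only `protocol` and `path`; the real `infer_storage_options` returns more keys and handles more cases.

```python
import os
import re
from pathlib import PureWindowsPath
from urllib.parse import urlsplit

# Same pattern as in the hunk: a drive letter, a colon, then a slash or backslash.
WINDOWS_DRIVE = re.compile(r"^[a-zA-Z]:[\\/]")


def infer_protocol_sketch(urlpath):
    """Hypothetical reduced version: return only the protocol and the path."""
    # re and urllib expect str, so convert pathlib objects first.
    urlpath = os.fspath(urlpath)

    # Without this special case the drive letter would be parsed as a URL scheme.
    if WINDOWS_DRIVE.match(urlpath):
        return {"protocol": "file", "path": urlpath}

    parsed = urlsplit(urlpath)
    return {"protocol": parsed.scheme or "file", "path": parsed.path}


from_string = infer_protocol_sketch(r"C:\data\test.csv")
from_pathlib = infer_protocol_sketch(PureWindowsPath("C:/data/test.csv"))
from_url = infer_protocol_sketch("s3://bucket/key.parquet")
```

Passing a `pathlib` object straight to `re.match` raises a `TypeError`, which is why the conversion has to come first.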