Hello @igorborgest, thanks a lot for developing this package!
When a DataFrame has a RangeIndex, it can be written to Parquet, but reading it back raises a KeyError:
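import pandas as pd
import awswrangler as wr

# PATH is the target Parquet location (left undefined in the original report)
d = pd.date_range('1990-01-01', freq='D', periods=10000)
vals = pd.np.random.randn(len(d), 4)
# reset_index() moves the dates into a column and leaves a RangeIndex
x = pd.DataFrame(vals, index=d, columns=['A','B','C','D']).reset_index()
wr.pandas.to_parquet(dataframe=x, path=PATH)
wr.pandas.read_parquet(path=PATH)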
Raises:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/miniconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2896             try:
-> 2897                 return self._engine.get_loc(key)
   2898             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: '__index_level_0__'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-67-9e66c4b6764b> in <module>
----> 1 wr.pandas.read_parquet(path=PATH)

~/miniconda3/lib/python3.7/site-packages/awswrangler/pandas.py in read_parquet(self, path, columns, filters, procs_cpu_bound, wait_objects, wait_objects_timeout)
   1373                 procs_cpu_bound=procs_cpu_bound,
   1374                 wait_objects=wait_objects,
-> 1375                 wait_objects_timeout=wait_objects_timeout)
   1376         else:
   1377             procs = []

~/miniconda3/lib/python3.7/site-packages/awswrangler/pandas.py in _read_parquet_paths(session_primitives, path, columns, filters, procs_cpu_bound, wait_objects, wait_objects_timeout)
   1460                 procs_cpu_bound=procs_cpu_bound,
   1461                 wait_objects=wait_objects,
-> 1462                 wait_objects_timeout=wait_objects_timeout)
   1463             return [df]
   1464         else:

~/miniconda3/lib/python3.7/site-packages/awswrangler/pandas.py in _read_parquet_path(session_primitives, path, columns, filters, procs_cpu_bound, wait_objects, wait_objects_timeout)
   1524         df = table.to_pandas(use_threads=use_threads, integer_object_nulls=True)
   1525         for c in integers:
-> 1526             if not str(df[c].dtype).startswith("int"):
   1527                 df[c] = df[c].astype("Int64")
   1528         logger.debug(f"Done: {path}")

~/miniconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2993             if self.columns.nlevels > 1:
   2994                 return self._getitem_multilevel(key)
-> 2995             indexer = self.columns.get_loc(key)
   2996             if is_integer(indexer):
   2997                 indexer = [indexer]

~/miniconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897                 return self._engine.get_loc(key)
   2898             except KeyError:
-> 2899                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2900         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2901         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: '__index_level_0__'
When the index holds a series of values (e.g. when x is created as x = pd.DataFrame(vals, index=d, columns=['A','B','C','D']), so the dates stay in the index), there is no such issue.
Versions:
aws-data-wrangler: 0.2.5
pandas: 0.25.3
pyarrow: 0.15.1
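For what it's worth, the traceback shows the failure inside the `for c in integers:` loop, where `df[c]` is looked up for a name (`'__index_level_0__'`) that is not a column of the DataFrame returned by `to_pandas`. Below is a rough sketch of one way such a loop could be guarded. It uses plain pandas only; the helper name `cast_integer_columns` and the sample data are made up for illustration, and this is not necessarily how the library fixed it.

```python
import pandas as pd


def cast_integer_columns(df, integers):
    """Cast the named columns to nullable Int64, skipping names absent from df.

    Hypothetical helper: `integers` stands in for the list of column names
    that the loop in the traceback iterates over.
    """
    for c in integers:
        if c not in df.columns:
            # e.g. '__index_level_0__', which is not a column of this frame
            continue
        if not str(df[c].dtype).startswith("int"):
            df[c] = df[c].astype("Int64")
    return df


# A float column holding whole numbers plus a missing value
df = pd.DataFrame({"A": [1.0, None, 3.0], "B": [0.5, 1.5, 2.5]})
out = cast_integer_columns(df, ["A", "__index_level_0__"])
print(out.dtypes)
```

Without the membership check, the same call would hit `df['__index_level_0__']` and raise the KeyError shown above.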
Thanks @vfilimonov, another great contribution.
Already fixed with the PR above. It will be released in the new version this weekend.
P.S. Test case also added on our test bench!
Thanks a lot, Igor! 👍