Cannot read pyarrow RangeIndex #414
This is real, and indeed due to the change in pyarrow. In the related Dask case we actually know the number of rows in each partition, and we ought to use that information to set the range in each partition, and the divisions globally, wherever there is a range index or no index at all.
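The per-partition bookkeeping described above can be sketched as follows. This is a minimal illustration, not fastparquet or Dask API: `range_index_partitions` is a hypothetical helper that, given known row counts, derives each partition's range-index bounds and the global divisions (Dask divisions list the first index value of every partition plus the last index value overall).

```python
def range_index_partitions(row_counts, start=0, step=1):
    """Hypothetical helper: per-partition (start, stop) bounds and
    global divisions for a RangeIndex over known partition row counts."""
    bounds = []
    divisions = [start]
    pos = start
    for n in row_counts:
        stop = pos + n * step
        bounds.append((pos, stop))  # half-open range for this partition
        divisions.append(stop)
        pos = stop
    # The final division is the last index value, not one past it.
    divisions[-1] = divisions[-1] - step
    return bounds, divisions

bounds, divisions = range_index_partitions([3, 4, 2])
# bounds -> [(0, 3), (3, 7), (7, 9)], divisions -> [0, 3, 7, 8]
```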
@martindurant would you mind elaborating a bit? I'd be more than happy to contribute and create a fix for it, but I am not sure I understand 100% what the actual problem is. It would be awesome if you could explain this error in a little more detail. Thanks a lot!
Exactly what you get will now depend on which version of pyarrow you used, as well as which version of fastparquet. In the past (pyarrow <0.13), pyarrow would write real columns of data for the index, with cryptic names like the one you show. When you load with fastparquet and say "I don't want to set an index", it becomes an ordinary column. If you do allow it to be set as an index, the name should be reconstituted.

In the most recent version of pyarrow, there is no column data at all, but a range-index metadata marker instead. It takes up no space, and there is no reason not to have it populate the index. In this case, if you said you wanted to ignore the index, or use another, the range should be ignored.
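The metadata marker mentioned above lives in the pandas metadata embedded in the parquet footer. A minimal sketch, assuming the `index_columns` shape used by the pandas parquet metadata convention, showing how such an entry can be turned back into a `pandas.RangeIndex` (the literal JSON here is illustrative, not read from a real file):

```python
import json

import pandas as pd

# Illustrative pandas metadata: newer pyarrow records the index as a
# "range" descriptor instead of a data column.
pandas_meta = json.loads(
    '{"index_columns": [{"kind": "range", "name": null,'
    ' "start": 0, "stop": 5, "step": 1}]}'
)

entry = pandas_meta["index_columns"][0]
if isinstance(entry, dict) and entry.get("kind") == "range":
    # Rebuild the index from metadata alone; no column data is needed.
    index = pd.RangeIndex(
        start=entry["start"],
        stop=entry["stop"],
        step=entry["step"],
        name=entry["name"],
    )
# index -> RangeIndex(start=0, stop=5, step=1)
```

A reader that wants to ignore the index, or set a different one, can simply skip this reconstruction, since no on-disk column is involved.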
Why is this ticket closed? This breaks backwards compatibility, so ideally there should be a fix for it.
Are you saying that current fastparquet can't read older pyarrow-written data? That would indeed be a problem. |
Reading the file raises an exception.
This is most likely the result of: pandas-dev/pandas#25672 and apache/arrow#3868