
ValueError: Can't infer object conversion type: 0 (6.0, 1.0, 1.0, 1.0, 1.0) #458

Open
chrinide opened this issue Aug 22, 2019 · 5 comments

chrinide commented Aug 22, 2019

Hi all,

I am trying to write a pandas DataFrame to a Parquet file.

Some of the DataFrame's columns contain lists or tuples as objects.

When I try, I get the error below:

```
ValueError: Can't infer object conversion type: 0    (6.0, 1.0, 1.0, 1.0, 1.0)
1    (6.0, 1.0, 1.0, 1.0, 1.0)
2    (6.0, 1.0, 1.0, 1.0, 1.0)
3    (6.0, 1.0, 1.0, 1.0, 1.0)
4    (6.0, 1.0, 1.0, 1.0, 1.0)
5    (7.0, 1.0, 1.0, 1.0)
6    (7.0, 1.0, 1.0, 1.0)
7    (7.0, 1.0, 1.0, 1.0)
8    (7.0, 1.0, 1.0, 1.0)
9    (8.0, 1.0, 1.0)
Name: nuclear_qs, dtype: object
```

@martindurant (Member) commented:

You should use the object_encoding= keyword argument; I would suggest "json".


auyer commented Oct 21, 2020

This kind of issue makes it hard to use this library in situations where the incoming data is undefined.

For instance, in this real-world example, running the following code with parquet_engine = 'fastparquet' raises an error, while the same code with parquet_engine = 'pyarrow' works fine.

```python
from decimal import Decimal
import pandas as pd

d = [{'CAD': Decimal('1.3126054674'), 'HKD': Decimal('7.7500843739'), 'ISK': Decimal('138.8795140061'), 'PHP': Decimal('48.5479244009'), 'DKK': Decimal('6.2801214985'), 'HUF': Decimal('307.12959838'), 'CZK': Decimal('22.9370570368')}]
df = pd.DataFrame(d)

filepath = "test.parquet"
parquet_engine = "fastparquet"
compression = "snappy"

df.to_parquet(filepath, engine=parquet_engine, compression=compression)
```

I plan on looking into how PyArrow deals with this, and maybe try to port this to fastparquet if possible. I guess this file is where it's done in fastparquet, correct?

Appendix: the error

```
Traceback (most recent call last):
  File ".../target-parquet/venv/lib/python3.8/site-packages/pandas/util/_decorators.py", line 199, in wrapper
    return func(*args, **kwargs)
  File ".../venv/lib/python3.8/site-packages/pandas/core/frame.py", line 2365, in to_parquet
    to_parquet(
  File ".../venv/lib/python3.8/site-packages/pandas/io/parquet.py", line 270, in to_parquet
    return impl.write(
  File ".../venv/lib/python3.8/site-packages/pandas/io/parquet.py", line 193, in write
    self.api.write(
  File ".../venv/lib/python3.8/site-packages/fastparquet/writer.py", line 875, in write
    fmd = make_metadata(data, has_nulls=has_nulls, ignore_columns=ignore,
  File ".../venv/lib/python3.8/site-packages/fastparquet/writer.py", line 706, in make_metadata
    se, type = find_type(data[column], fixed_text=fixed,
  File ".../venv/lib/python3.8/site-packages/fastparquet/writer.py", line 95, in find_type
    object_encoding = infer_object_encoding(data)
  File ".../venv/lib/python3.8/site-packages/fastparquet/writer.py", line 238, in infer_object_encoding
    raise ValueError("Can't infer object conversion type: %s" % head)
ValueError: Can't infer object conversion type: 0    1.3126054674
Name: CAD, dtype: object
```

@martindurant (Member) commented:

Yes, you have exactly the right location. It seems we don't explicitly check for decimals. I don't know whether the right thing to do would be to convert to float or make use of parquet's decimal type.
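Until fastparquet recognizes decimals, one user-side workaround (a sketch; it trades exact decimal semantics for float64 precision) is to cast Decimal columns to float before writing:

```python
from decimal import Decimal

import pandas as pd

df = pd.DataFrame([{"CAD": Decimal("1.3126054674"), "HKD": Decimal("7.7500843739")}])

# Find object columns whose values are all Decimal, then cast them to float64
# so fastparquet can infer a plain numeric type instead of failing
decimal_cols = [
    c for c in df.columns
    if df[c].dtype == object and df[c].map(lambda v: isinstance(v, Decimal)).all()
]
df[decimal_cols] = df[decimal_cols].astype("float64")
```

After the cast, `df.to_parquet(filepath, engine="fastparquet")` succeeds, since there are no object columns left to infer.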


auyer commented Oct 21, 2020

So if I understood correctly, the simplest approach would be to add decimal handling to that file (at least to solve my case), and later try to figure out the rest? If so, I could create a PR for the decimal case shortly.

This would not solve the issue pointed out by @chrinide, though; I think JSON-serializable objects should be encoded as JSON by default.
Another good approach would be to flatten the JSON objects, but that's more likely the responsibility of the user, not the library.
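Flattening on the user side can be done with pandas itself; a sketch using pd.json_normalize (the record below is invented, loosely following the currency-rate example above):

```python
import pandas as pd

# Nested records, e.g. one row per day with a dict of exchange rates
records = [{"date": "2020-10-21", "rates": {"CAD": 1.3126, "HKD": 7.7500}}]

# json_normalize expands nested dicts into dotted scalar columns
# ("rates.CAD", "rates.HKD"), which fastparquet can type without
# any object encoding
flat = pd.json_normalize(records)
```

The flattened frame can then be written with `flat.to_parquet(..., engine="fastparquet")` as usual.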

@martindurant (Member) commented:

Yes, check for decimal in that list of types. If you convert to float, that's easy, but you'll have to do some digging to figure out true decimal storage. JSON would not be able to store decimals either.

> I think Json serializable objects should be parsed to Json by default

We wouldn't want to, for example, JSON-encode nulled floats and strings, but it could be a reasonable fallback. On the other hand, you can always request JSON encoding anyway. I'm not sure what it would do with decimals; probably fail.
