Java arrays passed to python script can't be passed to pandas.DataFrame #306

Closed
sayan-sen opened this issue Jun 5, 2020 · 8 comments

sayan-sen commented Jun 5, 2020

I have a double array in Java. When I print the array, it looks like this:
[6.390933837890625, 6.412708740234375, 6.383675537109375, 6.398192138671875, 6.40907958984375, 6.40907958984375, 6.416337890625, 6.4054504394531255, 6.40907958984375.....]

I pass it as an argument to a function in a Python script:

Python py = Python.getInstance();
PyObject pyObject = py.getModule("test");
pyObject.callAttr("showArr", darr);

My test.py is:

def showArr(arr):
	print(arr)

In the Python output it shows:
jarray('D')([6.390933837890625, 6.412708740234375, 6.383675537109375, 6.398192138671875, .......])

Because of this, I am not able to call pandas.DataFrame(arr).
When I print a simple array in Python, it prints fine:

>>> arr=[1,2,3]
>>> print(arr)
[1, 2, 3]

Why does jarray('D') get prepended when I pass the double array to the Python script?

mhsmith changed the title from "Java arrays passed to python script as arg gets a prefix "jarray('D')"" to "Java arrays passed to python script can't be passed to pandas.DataFrame" on Jun 5, 2020
Member

mhsmith commented Jun 5, 2020

The jarray prefix is printed to indicate the object type, just like a NumPy array is printed as array(...). It is part of the object's printed representation, not a string prepended to the data.

I assume the error you got when passing the array to pandas.DataFrame was TypeError: __getbuffer__ is not implemented for double[]. This will be fixed in the next version of Chaquopy.

Meanwhile, you can work around it by writing pandas.DataFrame(list(arr)).
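
As an illustration of both points, here is a minimal sketch that uses the standard library's array module as a stand-in for the Java array. It is not a Chaquopy jarray, and the two values are simply the first two from the report. The type name in the printed output comes from the object's repr, and list() yields a plain list, which is what the workaround passes to pandas.DataFrame.

```python
from array import array

# Stand-in for the Java double[]; array("d") is NOT a Chaquopy jarray.
arr = array("d", [6.390933837890625, 6.412708740234375])

# The printed form names the type, much as jarray('D')(...) does.
print(arr)

# Converting to a list gives a plain Python list with no type prefix.
plain = list(arr)
print(plain)
```

On an affected Chaquopy version, the equivalent step would be pandas.DataFrame(list(arr)) with the real jarray.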


neur1n commented Jun 8, 2020

This also works for a more general case:

numpy_array = numpy.array(  list( jarray(jfloat)(java_data) )  )

Member

mhsmith commented Jun 8, 2020

In case other people have the same question, let's leave the issue open until the simple syntax has been fixed.

mhsmith reopened this Jun 8, 2020
Author

sayan-sen commented Jun 9, 2020

The error I got was:
DataFrame constructor not properly called!
I thought some string was getting prepended, because when I tried passing the string "jarray([1,2,3...])" to pandas.DataFrame, I got the same error.
But I will definitely try the options suggested on this page.

Member

mhsmith commented Jun 15, 2020

This issue is fixed in Chaquopy 8.0.0. To upgrade, open your app's top-level build.gradle file and change the version number of com.chaquo.python:gradle.

mhsmith closed this as completed Jun 15, 2020
mhsmith added the bug label Aug 30, 2021
Member

mhsmith commented Aug 30, 2021

We've just released a build of Pandas version 1.3.2. However, because of a change in the way Pandas detects iterable objects (pandas-dev/pandas#39852), passing a Java array to the DataFrame constructor now gives the following error:

  File "pandas/core/frame.py", line 614, in __init__
  File "pandas/core/internals/construction.py", line 464, in dict_to_mgr
  File "pandas/core/internals/construction.py", line 119, in arrays_to_mgr
  File "pandas/core/internals/construction.py", line 625, in _extract_index
ValueError: If using all scalar values, you must pass an index

We'll fix this in the next version of Chaquopy. Meanwhile, you can work around it either by converting the array to a list as shown above, or by staying on the old version, pandas==0.25.3.

Member

mhsmith commented Aug 30, 2021

Even after we fix this, passing a 1-dimensional array to the DataFrame constructor probably still won't work:

>>> a = jarray(jshort)([1,2,3])
>>> DataFrame(a)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "pandas/core/frame.py", line 711, in __init__
  File "pandas/core/internals/construction.py", line 304, in ndarray_to_mgr
  File "pandas/core/internals/construction.py", line 545, in _prep_ndarray
  File "pandas/core/internals/construction.py", line 533, in convert
  File "pandas/core/dtypes/cast.py", line 124, in maybe_convert_platform
  File "class.pxi", line 222, in java.chaquopy.setup_object_class.JavaObject.__getattribute__
  File "class.pxi", line 216, in java.chaquopy.setup_object_class.JavaObject.__getattribute__
AttributeError: 'jarray('S')' object has no attribute 'dtype'

DataFrame is a 2-dimensional structure, and I can't see anywhere in the documentation (either here or here) which says the constructor accepts a 1-dimensional sequence. In practice, passing a standard list returns a DataFrame with a single column:

>>> DataFrame([1,2,3])
   0
0  1
1  2
2  3

But with Pandas 1.3.2, this doesn't work with any non-standard sequence classes. For example, we get the same error with the DummyContainer class from the Pandas unit tests (after removing the incorrect extra n argument from __len__):

>>> DataFrame(DummyContainer([1,2,3]))
...
  File "pandas/core/dtypes/cast.py", line 124, in maybe_convert_platform
    if arr.dtype == object:
AttributeError: 'DummyContainer' object has no attribute 'dtype'
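
To make the failure mode concrete, here is a minimal sketch of a non-standard sequence class. PlainSequence is a hypothetical stand-in written for this illustration; it is not the DummyContainer from the Pandas tests, and it is not a jarray. Like them, it supports len() and indexing but has no NumPy-style dtype attribute, which is the attribute the traceback above trips over.

```python
from collections.abc import Sequence


class PlainSequence(Sequence):
    """Hypothetical minimal sequence: supports len() and indexing only."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]


seq = PlainSequence([1, 2, 3])

# No NumPy-style attributes, so code that reads .dtype would fail on it.
has_dtype = hasattr(seq, "dtype")

# Converting to a standard list is always possible for such a sequence.
as_list = list(seq)
```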

The solution is to explicitly specify whether to treat the array as a row or a column, by wrapping it in a list or dict respectively:

>>> DataFrame([a])
   0  1  2
0  1  2  3
>>> DataFrame({"a": a})
   a
0  1
1  2
2  3
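
The row and column wrapping can be sketched with two small helpers. The names as_row and as_column are hypothetical, and array("h") from the standard library stands in for jarray(jshort). The helpers also convert the values to a plain list first. That extra step is an assumption made here to sidestep the sequence-type problem entirely, and it goes beyond the [a] and {"a": a} forms shown above.

```python
from array import array


def as_row(values):
    """Wrap a 1-D sequence so DataFrame treats it as a single row."""
    return [list(values)]


def as_column(name, values):
    """Wrap a 1-D sequence so DataFrame treats it as one named column."""
    return {name: list(values)}


# Stand-in for jarray(jshort)([1, 2, 3]); not a real Java array.
a = array("h", [1, 2, 3])

row_input = as_row(a)
column_input = as_column("a", a)

try:
    import pandas as pd
except ImportError:
    pd = None  # pandas is optional for this sketch

if pd is not None:
    print(pd.DataFrame(row_input))     # one row, three columns
    print(pd.DataFrame(column_input))  # three rows, one column named "a"
```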

Member

mhsmith commented Sep 22, 2021

This issue is fixed in Chaquopy 10.0.1, except for the 1-dimensional case mentioned in the previous comment.

To upgrade, edit your app's top-level build.gradle file and change the version number of com.chaquo.python:gradle.

mhsmith closed this as completed Sep 22, 2021