[Python] wrong conversion of DataFrame with boolean values #16858

Closed
asfimport opened this issue Aug 22, 2019 · 3 comments

@asfimport

asfimport commented Aug 22, 2019

From pandas-dev/pandas#28090

In [19]: df = pd.DataFrame(np.ones((3, 2), dtype=bool), columns=['a', 'b']) 

In [20]: df  
Out[20]: 
      a     b
0  True  True
1  True  True
2  True  True

In [21]: table = pa.table(df) 

In [23]: table.column(0)
Out[23]: 
<pyarrow.lib.ChunkedArray object at 0x7fd08a96e090>
[
  [
    true,
    false,
    false,
  ]
]

The resulting table contains false values, while the original DataFrame had only True values.
This seems to be related to the DataFrame having multiple columns: with a single column, the conversion is correct.

Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: Joris Van den Bossche / @jorisvandenbossche


Note: This issue was originally created as ARROW-6325. Please see the migration documentation for further details.

@asfimport

Joris Van den Bossche / @jorisvandenbossche:
Converting the column's values directly as an array also gives wrong values, but converting a copy of those values does not:

In [39]: pa.array(np.asarray(df['a']))                                                                                                                                                                             
Out[39]: 
<pyarrow.lib.BooleanArray object at 0x7fd08a8a2c50>
[
  true,
  false,
  false
]

In [40]: pa.array(np.array(df['a']))                                                                                                                                                                               
Out[40]: 
<pyarrow.lib.BooleanArray object at 0x7fd08a88c990>
[
  true,
  true,
  true
]

So it probably has to do with the values in the first case being a view on a 2D array?
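The difference between the two calls can be checked without pyarrow. The following is a minimal NumPy-only sketch, not taken from the thread: `col` is a stand-in (an assumption) for the column values that pandas backs with a 2D block. `np.asarray` returns the existing array without copying, so it keeps the strides of the 2D parent, while `np.array` copies into a fresh contiguous buffer.

```python
import numpy as np

a = np.ones((3, 2), dtype=bool)
col = a[:, 0]  # stand-in for a column backed by a 2D block (assumption)

view = np.asarray(col)  # no copy: same memory and strides as the 2D parent
copy = np.array(col)    # copy: new, contiguous 1D buffer

# The view still points into `a` and steps over 2 bytes per element.
print(np.shares_memory(view, a), view.strides, view.flags['C_CONTIGUOUS'])
# The copy owns its memory and steps over 1 byte per element.
print(np.shares_memory(copy, a), copy.strides, copy.flags['C_CONTIGUOUS'])
```

If the hypothesis holds, the incorrect result should only appear for inputs like `view`, whose stride is larger than the item size.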

@asfimport

Joris Van den Bossche / @jorisvandenbossche:
A NumPy-only reproducer. Starting from a 2D array, slicing a row works fine, but slicing a column gives the wrong result:

In [64]: a = np.ones((3, 2), dtype=bool)                                                                                                                                                                           

In [65]: pa.array(a[0, :])                                                                                                                                                                                         
Out[65]: 
<pyarrow.lib.BooleanArray object at 0x7fd093368d00>
[
  true,
  true
]

In [66]: pa.array(a[:, 0])                                                                                                                                                                                         
Out[66]: 
<pyarrow.lib.BooleanArray object at 0x7fd093368bf8>
[
  true,
  false,
  false
]
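What separates the two slices in the reproducer is their memory layout: the row slice is contiguous, while the column slice is a strided view. A possible workaround on affected versions, sketched here with NumPy only, is to make the input contiguous before conversion. The helper name `ensure_contiguous` is hypothetical, and the idea that a contiguous copy converts correctly rests on the `np.array` result earlier in this thread rather than on anything confirmed by the fix.

```python
import numpy as np

def ensure_contiguous(values):
    """Hypothetical helper: return `values` as a C-contiguous array.

    np.ascontiguousarray only copies when the input is not already
    C-contiguous, so contiguous inputs are passed through without a copy.
    """
    return np.ascontiguousarray(values)

a = np.ones((3, 2), dtype=bool)
row = a[0, :]  # contiguous: elements sit next to each other in memory
col = a[:, 0]  # strided: every second byte of the underlying buffer

print(row.strides, row.flags['C_CONTIGUOUS'])
print(col.strides, col.flags['C_CONTIGUOUS'])

safe_row = ensure_contiguous(row)  # no copy needed
safe_col = ensure_contiguous(col)  # contiguous copy of the column
print(safe_col.strides, safe_col.flags['C_CONTIGUOUS'])
```

The result of `ensure_contiguous(col)` could then be passed to `pa.array` in place of the strided view.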

@asfimport

Wes McKinney / @wesm:
Issue resolved by pull request 5176
#5176
