Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closes #1165 drop on axis #1177

Merged
merged 7 commits into from
Mar 8, 2022

Conversation

Ethan-DeBandi99
Copy link
Contributor

@Ethan-DeBandi99 Ethan-DeBandi99 commented Mar 4, 2022

This PR (closes #1165):

Pandas DataFrame.drop() utilizes an axis parameter to trigger drops on rows or columns. This functionality has been added to Arkouda here.

  • axis should be 0 or 1. 0 = rows, 1 = columns.
  • Tests have been added to: validate column drop using axis param, validate row drop, validate exception cases are caught as expected.
  • Documentation has been updated to include .drop() for DataFrame.

For testing I added functions to build an ak.DataFrame and a pd.DataFrame. I only implemented this for drop because there is an existing issue in to handle this for all tests.

This functionality is currently only available in-place at this time.

…lity for dropping rows in new function called by drop. Moved column drop to new function called by drop.

Updated the drop row algorithm to set the new column values in place using the superclass's __setitem__.

Updated docstrings to match new functionality.

Set default axis to 0 to conform to pandas.
@Ethan-DeBandi99 Ethan-DeBandi99 marked this pull request as ready for review March 4, 2022 19:07
Copy link
Contributor

@mhmerrill mhmerrill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks ok to me.

Copy link
Member

@stress-tess stress-tess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things I'd like you to take a look at but I think the overall functionality looks good

arkouda/dataframe.py Outdated Show resolved Hide resolved
arkouda/dataframe.py Outdated Show resolved Hide resolved
arkouda/dataframe.py Outdated Show resolved Hide resolved
arkouda/dataframe.py Show resolved Hide resolved
arkouda/dataframe.py Outdated Show resolved Hide resolved
arkouda/dataframe.py Outdated Show resolved Hide resolved
Copy link
Member

@stress-tess stress-tess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@mhmerrill mhmerrill merged commit 0da9681 into Bears-R-Us:master Mar 8, 2022
@Ethan-DeBandi99 Ethan-DeBandi99 deleted the 1165_Drop_on_Axis branch March 10, 2022 13:45
mhmerrill pushed a commit that referenced this pull request Mar 10, 2022
* Add UsedModules.cfg to .gitignore (#1179)

The UsedModules.cfg file, which is generated from --saveUsedModules
was not being ignored by git, and I don't think we would ever want
a UsedModules.cfg pushed into the repo, so moving it into the
gitignore will resolve any annoyance from having to selectively not
include that file.

* Add line to pass Arrow float to Chapel side
This PR adds a check for the Arrow float type on the Chapel side,
which was left out previously, allowing float data types to be
read with `ak.read_parquet()`. With this addition, the file
`alltypes_plain.parquet` from the Apache website now runs through
`ak.read_parquet()` without fail.

* Closes #1165 drop on axis (#1177)

* Modified drop() to allow for row and column dropping. Added functionality for dropping rows in new function called by drop. Moved column drop to new function called by drop.

Updated the drop row algorithm to set the new column values in place using the superclass's __setitem__.

Updated docstrings to match new functionality.

Set default axis to 0 to conform to pandas.

* Added ak.DataFrame.drop() documentation.

* Resolving Merge Conflicts

* Added test to validate axis can only be 1 or 0.

* Updated code based on PR comments. Removed unneeded if blocks and a few other style changes.

* Updating unknown axis error.

* Correcting typo

* Update Arkouda to work with CTypes (#1185)

In Chapel 1.26, various C types from 'SysCTypes', 'SysBasic' and
'CPtr' are being brought together into a single (new) module,
'CTypes'.  In this PR, I'm adding a new user-level 'CTypes'
compatibility module that brings the pieces that Arkouda relies upon
together so that it can use the new names and organization yet still
be compiled with older compilers.

---
Signed-off-by: Brad Chamberlain <bradcray@users.noreply.github.com>

* Catch all Parquet errors and report them to the client (#1188)

* Catch all Parquet errors and report them to the client
In the initial Parquet error handling code, the only errors that
were being reported were errors that came from status codes of
calls to the Parquet API, but all errors in the C++ code went
unhandled and would result in a server crash.

This PR wraps all the C++ functions in a try/catch to catch all
errors to be reported back to the client rather than crashing the
server. This approach seems to align with the philosophy of
"The server should never crash, errors should just be reported to
the client".

Additionally, a bug was identified with throwing errors in `forall`
loops within a function, which would cause a try/catch wrapping
the throwing function to not catch the resulting `TaskErrors` error
throw, but a workaround for that is to wrap `forall` loops in a
try/catch which just throws the error, which then allows the
overarching try/catch to catch and report the error, where previously
it would crash the server.

* Add unsupported type error message to getType function

* Remove the ARROWUNDEFINED variable and add some throws

* add parquet test files with license and readme

* added and skipped test against parquet standard files

* add option for running parquet file tests

* Update marker name to match decorator

* updated error message string in ParquetMsg with wrong format specifier and updated columns1 to pass assertion against ans

Co-authored-by: Ben McDonald <46734217+bmcdonald3@users.noreply.github.com>
Co-authored-by: Ben McDonald <mcdobe100@gmail.com>
Co-authored-by: reuster986 <reuster986@gmail.com>
Co-authored-by: Ethan-DeBandi99 <16845933+Ethan-DeBandi99@users.noreply.github.com>
Co-authored-by: Brad Chamberlain <bradcray@users.noreply.github.com>
Co-authored-by: Pierce Hayes <pierce314159@users.noreply.github.com>
mhmerrill pushed a commit that referenced this pull request Mar 10, 2022
* add parquet test files with license and readme

* added and skipped test against parquet standard files

* add option for running parquet file tests

* Update marker name to match decorator (#1193)

Co-authored-by: Pierce Hayes <pierce314159@users.noreply.github.com>

* Changes to PR#1183: use apache/parquet-testing files (#1194)

* Add UsedModules.cfg to .gitignore (#1179)

The UsedModules.cfg file, which is generated from --saveUsedModules
was not being ignored by git, and I don't think we would ever want
a UsedModules.cfg pushed into the repo, so moving it into the
gitignore will resolve any annoyance from having to selectively not
include that file.

* Add line to pass Arrow float to Chapel side
This PR adds a check for the Arrow float type on the Chapel side,
which was left out previously, allowing float data types to be
read with `ak.read_parquet()`. With this addition, the file
`alltypes_plain.parquet` from the Apache website now runs through
`ak.read_parquet()` without fail.

* Closes #1165 drop on axis (#1177)

* Modified drop() to allow for row and column dropping. Added functionality for dropping rows in new function called by drop. Moved column drop to new function called by drop.

Updated the drop row algorithm to set the new column values in place using the superclass's __setitem__.

Updated docstrings to match new functionality.

Set default axis to 0 to conform to pandas.

* Added ak.DataFrame.drop() documentation.

* Resolving Merge Conflicts

* Added test to validate axis can only be 1 or 0.

* Updated code based on PR comments. Removed unneeded if blocks and a few other style changes.

* Updating unknown axis error.

* Correcting typo

* Update Arkouda to work with CTypes (#1185)

In Chapel 1.26, various C types from 'SysCTypes', 'SysBasic' and
'CPtr' are being brought together into a single (new) module,
'CTypes'.  In this PR, I'm adding a new user-level 'CTypes'
compatibility module that brings the pieces that Arkouda relies upon
together so that it can use the new names and organization yet still
be compiled with older compilers.

---
Signed-off-by: Brad Chamberlain <bradcray@users.noreply.github.com>

* Catch all Parquet errors and report them to the client (#1188)

* Catch all Parquet errors and report them to the client
In the initial Parquet error handling code, the only errors that
were being reported were errors that came from status codes of
calls to the Parquet API, but all errors in the C++ code went
unhandled and would result in a server crash.

This PR wraps all the C++ functions in a try/catch to catch all
errors to be reported back to the client rather than crashing the
server. This approach seems to align with the philosophy of
"The server should never crash, errors should just be reported to
the client".

Additionally, a bug was identified with throwing errors in `forall`
loops within a function, which would cause a try/catch wrapping
the throwing function to not catch the resulting `TaskErrors` error
throw, but a workaround for that is to wrap `forall` loops in a
try/catch which just throws the error, which then allows the
overarching try/catch to catch and report the error, where previously
it would crash the server.

* Add unsupported type error message to getType function

* Remove the ARROWUNDEFINED variable and add some throws

* add parquet test files with license and readme

* added and skipped test against parquet standard files

* add option for running parquet file tests

* Update marker name to match decorator

* updated error message string in ParquetMsg with wrong format specifier and updated columns1 to pass assertion against ans

Co-authored-by: Ben McDonald <46734217+bmcdonald3@users.noreply.github.com>
Co-authored-by: Ben McDonald <mcdobe100@gmail.com>
Co-authored-by: reuster986 <reuster986@gmail.com>
Co-authored-by: Ethan-DeBandi99 <16845933+Ethan-DeBandi99@users.noreply.github.com>
Co-authored-by: Brad Chamberlain <bradcray@users.noreply.github.com>
Co-authored-by: Pierce Hayes <pierce314159@users.noreply.github.com>

Co-authored-by: pierce314159 <48131946+pierce314159@users.noreply.github.com>
Co-authored-by: Pierce Hayes <pierce314159@users.noreply.github.com>
Co-authored-by: Ben McDonald <46734217+bmcdonald3@users.noreply.github.com>
Co-authored-by: Ben McDonald <mcdobe100@gmail.com>
Co-authored-by: Ethan-DeBandi99 <16845933+Ethan-DeBandi99@users.noreply.github.com>
Co-authored-by: Brad Chamberlain <bradcray@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ak.DataFrame Drop on Axis
3 participants