Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add private function _validate_data_input to validate input data #2595

Merged
merged 4 commits into from Jul 5, 2023

Conversation

seisman
Copy link
Member

@seisman seisman commented Jul 5, 2023

Description of proposed changes

Includes validation logic from #1523, #1478 and #93.

Address comment #2493 (comment).

@seisman
Copy link
Member Author

seisman commented Jul 5, 2023

It makes more sense to also move the following lines to the _validate_data_input function, but it's not easy.

if required_z and (
getattr(data, "shape", (3, 3))[1] < 3 # np.array, pd.DataFrame
or len(getattr(data, "data_vars", (0, 1, 2))) < 3 # xr.Dataset
):
raise GMTInvalidInput("data must provide x, y, and z columns.")

@weiji14 weiji14 added the maintenance Boring but important stuff for the core devs label Jul 5, 2023
Comment on lines +68 to +70
else: # data is not None
if x is not None or y is not None or z is not None:
raise GMTInvalidInput("Too much data. Use either data or x/y/z.")
Copy link
Member

@weiji14 weiji14 Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes more sense to also move the following lines to the _validate_data_input function, but it's not easy.

if required_z and (
getattr(data, "shape", (3, 3))[1] < 3 # np.array, pd.DataFrame
or len(getattr(data, "data_vars", (0, 1, 2))) < 3 # xr.Dataset
):
raise GMTInvalidInput("data must provide x, y, and z columns.")

Hmm, can't we just add it here?

Suggested change
else: # data is not None
if x is not None or y is not None or z is not None:
raise GMTInvalidInput("Too much data. Use either data or x/y/z.")
else: # data is not None
if x is not None or y is not None or z is not None:
raise GMTInvalidInput("Too much data. Use either data or x/y/z.")
if required_z and (
getattr(data, "shape", (3, 3))[1] < 3 # np.array, pd.DataFrame
or len(getattr(data, "data_vars", (0, 1, 2))) < 3 # xr.Dataset
):
raise GMTInvalidInput("data must provide x, y, and z columns.")

Edit: Oh wait, I see, we also need to check that data is not str/pathlib.PurePath/xr.DataArray/GeoJSON... But maybe we can do hasattr(data, "shape") or hasattr(data, "data_vars") first?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But maybe we can do hasattr(data, "shape") or hasattr(data, "data_vars") first?

Sounds promising.

Copy link
Member

@weiji14 weiji14 Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't tested this, but maybe something like this:

Suggested change
else: # data is not None
if x is not None or y is not None or z is not None:
raise GMTInvalidInput("Too much data. Use either data or x/y/z.")
else: # data is not None
if x is not None or y is not None or z is not None:
raise GMTInvalidInput("Too much data. Use either data or x/y/z.")
if (
hasattr(data, "shape") # np.array, pd.DataFrame
and getattr(data, "shape", (3, 3))[1] < 3
) or (
hasattr(data, "data_vars") # xr.Dataset
and len(getattr(data, "data_vars", (0, 1, 2))) < 3
):
raise GMTInvalidInput("data must provide x, y, and z columns.")

Can't remember why I added a default value with getattr at #1523, might be able to remove it if not needed, but test first.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't remember why I added a default value with getattr at #1523,

getattr(data, "shape", (3, 3)) returns (3,3) if data doesn't has the shape attribute, so it won't raise the exception.

With the above changes, test_geopandas_plot3d_default_cube will fail, because in this test:

>>> multipoint = shapely.geometry.MultiPoint([(0.5, 0.5, 0.5), (1.5, 1.5, 1.5)])
>>> gdf = gpd.GeoDataFrame(geometry=[multipoint])
>>> gdf.shape
(1, 1)

gdf has a shape (1,1).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a not hasattr(data, "__geo_interface__") check perhaps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ca2d49f, I changed the order of some codes to determine the data kind first and then check if the data input is valid. It works well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, that works too. Can you fix the pylint error? Should be good after that.

@seisman seisman added this to the 0.10.0 milestone Jul 5, 2023
@seisman seisman added the needs review This PR has higher priority and needs review. label Jul 5, 2023
@seisman seisman merged commit 3616f8f into main Jul 5, 2023
16 checks passed
@seisman seisman deleted the datakind branch July 5, 2023 11:47
@seisman seisman removed the needs review This PR has higher priority and needs review. label Jul 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Boring but important stuff for the core devs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants