Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor info and grdinfo to use virtualfile_from_data #961

Merged
merged 10 commits into from
Mar 2, 2021

Conversation

weiji14
Copy link
Member

@weiji14 weiji14 commented Feb 24, 2021

Description of proposed changes

Create a universal virtualfile_from_data function that can handle any raster or vector data input. This allows us to centralize the data validation logic in a single place, resulting in a cleaner API for PyGMT modules to handle different PyData types (e.g. numpy/pandas/xarray/etc) seamlessly. This is the beginning of resolving #949.

Preview API docs at https://pygmt-git-virtualfilefromdata-gmt.vercel.app/api/generated/pygmt.clib.Session.virtualfile_from_data.html

Before:

kind = data_kind(grid, None, None)
with Session() as lib:
    if kind == "file":
        file_context = dummy_context(grid)
    elif kind == "grid":
        file_context = lib.virtualfile_from_grid(grid)
    else:
        raise GMTInvalidInput("Unrecognized data type: {}".format(type(grid)))
    with file_context as infile:
        ...

After

with Session() as lib:
    file_context = lib.virtualfile_from_data(check_kind="raster", data=grid)

As a start, both info and grdinfo have been refactored to use this new convenience function. Other modules will be refactored in separate PRs later.

Addresses #949.

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If adding new functionality, add an example to docstrings or tutorials.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash commands are:

  • /format: automatically format and lint the code
  • /test-gmt-dev: run full tests on the latest GMT development version

Create a universal `virtualfile_from_data` function
that can handle any raster or vector data input.
This allows us to centralize the data validation
logic in a single place, resulting in a cleaner API
for PyGMT modules to handle different PyData
types (e.g. numpy/pandas/xarray/etc) seamlessly.
As a start, both `info` and `grdinfo` have been
refactored to use this new convenience function.
@weiji14 weiji14 added the enhancement Improving an existing feature label Feb 24, 2021
@@ -1359,6 +1360,62 @@ def virtualfile_from_grid(self, grid):
with self.open_virtual_file(*args) as vfile:
yield vfile

def virtualfile_from_data(self, data, x=None, y=None, z=None, check_kind=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick question: does it work for modules that requires more than three columns? For example, in Figure.plot(), we need to pass extra_arrays to GMT API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent point. Perhaps put the check_kind parameter in front, so that it becomes virtualfile_from_data(self, check_kind, data, x, y, z, *extra_arrays)?

Copy link
Member

@seisman seisman Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking if having two functions can make it easier:

  • virtualfile_from_table_data: data can be file or matrix, or x, y, z, extra_arrays can be vector.
  • virtualfile_from_grid_data: data can be file, or xr.DataArray

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily easier for the developer coding this up! I prefer one universal data I/O function, but if we must split it, then virtualfile_from_vector and virtualfile_from_raster sounds better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with one universal function, but you may want to look at some more complicated modules (e.g., grdtrack and plot). In these modules, it seems the data kind is needed outside the virtualfile_from_data function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, grdtrack needs a good refactor, and so does plot. I think we can refactor this virtualfile_from_data function when we refactor those complicated modules, so need to make sure the implementation here is ok to be extended to those functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK to me.

pygmt/clib/session.py Outdated Show resolved Hide resolved
pygmt/clib/session.py Outdated Show resolved Hide resolved
@@ -1359,6 +1360,62 @@ def virtualfile_from_grid(self, grid):
with self.open_virtual_file(*args) as vfile:
yield vfile

def virtualfile_from_data(self, data, x=None, y=None, z=None, check_kind=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with one universal function, but you may want to look at some more complicated modules (e.g., grdtrack and plot). In these modules, it seems the data kind is needed outside the virtualfile_from_data function.

Pure dictionary based lookup to convert data to a virtualfile.
Fixes all the tests which expect first input into data_kind to be the 'data' argument.
Revert e558834 somewhat, because the dictionary actually creates four virtualfiles. Also pushed up 'file' and 'grid' kinds up the if-then block chain.
@weiji14 weiji14 added this to the 0.3.1 milestone Mar 1, 2021
@weiji14 weiji14 requested a review from a team March 1, 2021 21:52
@weiji14 weiji14 marked this pull request as ready for review March 2, 2021 21:38
@seisman
Copy link
Member

seisman commented Mar 2, 2021

@weiji14
Copy link
Member Author

weiji14 commented Mar 2, 2021

There are maybe two lines not hit by the tests (see https://codecov.io/gh/GenericMappingTools/pygmt/src/7c60cb304950aacb0ed0e1b3c76fac24a242b65d/pygmt/clib/session.py).

Yes I'm aware of them. They will be covered when I refactor other functions (e.g. plot, grdtrack) to use virtualfile_from_data, so just a temporary thing.

Copy link
Member

@seisman seisman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK to merge.

@weiji14
Copy link
Member Author

weiji14 commented Mar 2, 2021

Cool, just want to make sure the full tests pass and I'll merge after that.

@weiji14 weiji14 merged commit 6d57b64 into master Mar 2, 2021
@weiji14 weiji14 deleted the virtualfile_from_data branch March 2, 2021 22:10
@seisman seisman added maintenance Boring but important stuff for the core devs and removed enhancement Improving an existing feature labels Mar 9, 2021
weiji14 added a commit that referenced this pull request Apr 6, 2021
The virtualfile_from_data function wrapped in #961
allows for a standardized way for inputting data to
PyGMT modules. This change is for standardizing
some of the docstrings, specifically for 'table-like'
vector inputs which can be of several different
PyData object types.
weiji14 added a commit that referenced this pull request May 16, 2021
The virtualfile_from_data function wrapped in #961
allows for a standardized way for inputting data to
PyGMT modules. This change is for standardizing
some of the docstrings, specifically for 'table-like'
vector inputs which can be of several different
PyData object types.

* Reformat pygmt.info docstring to use table-like filler
* Reformat plot and plot3d docstring to use table-like filler
* Reformat data_kind and virtualfile_from_data docstring to use table-like filler
* Reformat rose docstring to use table-like filler
* Reformat grdtrack docstring to use table-like filler
* Reformat velo docstring to use table-like filler
* Reformat wiggle docstring to use table-like filler
* Reformat histogram docstring to use table-like filler
* Add Python list to histogram's table param description
* Use table-classes filler in docstring description
* Use fmt_docstring decorator on virtualfile_from_data function

Also had to remove curly brackets from doctest so that
they are not treated as placeholders to be substituted.

* Add a doctest for the table-classes filler text

Co-authored-by: Meghan Jones <meghanrjones@users.noreply.github.com>
Co-authored-by: Dongdong Tian <seisman.info@gmail.com>
sixy6e pushed a commit to sixy6e/pygmt that referenced this pull request Dec 21, 2022
…gTools#961)

Create a universal `virtualfile_from_data` function
that can handle any raster or vector data input.
This allows us to centralize the data validation
logic in a single place, resulting in a cleaner API
for PyGMT modules to handle different PyData
types (e.g. numpy/pandas/xarray/etc) seamlessly.
As a start, both `info` and `grdinfo` have been
refactored to use this new convenience function.

* Move check_kind to be the first parameter
* Move check_kind out of data_kind function and into virtualfile_from_data

Fixes all the tests which expect first input into data_kind to be the 'data' argument.

* Add doctest example usage of virtualfile_from_data with xarray.Dataset
* Add an entry to doc/api/index.rst for clib.Session.virtualfile_from_data
sixy6e pushed a commit to sixy6e/pygmt that referenced this pull request Dec 21, 2022
The virtualfile_from_data function wrapped in GenericMappingTools#961
allows for a standardized way for inputting data to
PyGMT modules. This change is for standardizing
some of the docstrings, specifically for 'table-like'
vector inputs which can be of several different
PyData object types.

* Reformat pygmt.info docstring to use table-like filler
* Reformat plot and plot3d docstring to use table-like filler
* Reformat data_kind and virtualfile_from_data docstring to use table-like filler
* Reformat rose docstring to use table-like filler
* Reformat grdtrack docstring to use table-like filler
* Reformat velo docstring to use table-like filler
* Reformat wiggle docstring to use table-like filler
* Reformat histogram docstring to use table-like filler
* Add Python list to histogram's table param description
* Use table-classes filler in docstring description
* Use fmt_docstring decorator on virtualfile_from_data function

Also had to remove curly brackets from doctest so that
they are not treated as placeholders to be substituted.

* Add a doctest for the table-classes filler text

Co-authored-by: Meghan Jones <meghanrjones@users.noreply.github.com>
Co-authored-by: Dongdong Tian <seisman.info@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Boring but important stuff for the core devs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants