Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: add filter option for the get-metadata command #130

Merged
merged 1 commit into from
Oct 30, 2023

Conversation

joudmas
Copy link
Member

@joudmas joudmas commented Aug 9, 2023

cli: add --filter option in get-metadata command
modified get-metadata to support multiple fields

@codecov-commenter
Copy link

codecov-commenter commented Aug 9, 2023

Codecov Report

Merging #130 (a95f805) into master (eead01e) will decrease coverage by 0.14%.
The diff coverage is 80.48%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #130      +/-   ##
==========================================
- Coverage   79.84%   79.71%   -0.14%     
==========================================
  Files          11       11              
  Lines         665      700      +35     
==========================================
+ Hits          531      558      +27     
- Misses        134      142       +8     
Files Coverage Δ
cernopendata_client/cli.py 91.37% <80.48%> (-3.08%) ⬇️

@joudmas joudmas force-pushed the add-filter-option branch 2 times, most recently from 2f62075 to 8eacb67 Compare August 16, 2023 13:58
"--filter",
"filters",
multiple=False,
help="Filter workflow that contains certain filtering criteria. "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "workflow" here is a remainder from the REANA client?

The help should rather say here something like:

  • Filter only certain output values matching filtering criteria. Use --filter somefield=somevalue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also add a full example to the Examples section of the docstring:

$ cernopendata-client get-metadata --recid 329 --output-value authors.orcid --filter name="Rousseau, David"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moreover, you can enrich the user guide usage.rst the section "Getting metadata" so that the David Rousseau example will also appear here: https://cernopendata-client.readthedocs.io/en/latest/usage.html#getting-metadata

"--output-value",
"authors.orcid",
"--filter",
"authors.url=Rousseau, David",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to quote "Rousseau, David" as an argument.

BTW this works nicely:

$ cernopendata-client get-metadata --recid 329 --output-value authors.orcid --filter authors.name="Rousseau, David"
0000-0001-7613-8063
$ cernopendata-client get-metadata --recid 329 --output-value authors.orcid --filter name="Rousseau, David"
0000-0001-7613-8063

But the following ones do not fully work:

$ cernopendata-client get-metadata --recid 329 --output-value authors --filter name="Rousseau, David"

(All output values are printed, even though user might expect only the values matching the filtering criteria.)

Can you please have a look?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works well with the PR update 👍 You may want to also add a test case for this, i.e. get all authors, then with the filter, and then compare the two lists whether the second one has one less item as expected

@click.option(
"--filter",
"filters",
multiple=False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can do with multiple=False for now, but in the future we may want to extend it to match more than one. E.g. in our canonical example:

$ cernopendata-client get-metadata --recid 329 --output-value authors.orcid --filter name="Rousseau, David"
0000-0001-7613-8063

There could be more people with the same name, so one would like to write:

$ cernopendata-client get-metadata --recid 329 --output-value authors.orcid --filter name="Smith, John" --filter affiliation="Harvard"

to get John Smith from Harvard, an not from Cornell. (Just an example.)

If it it easy to implement, you can have a look at it. Otherwise we can open an issue for later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will create a new issue for this and link it back to this discussion for reference.

docs/usage.rst Outdated
@@ -87,6 +87,30 @@ datasets, you can use **--output-value** command-line option:
$ cernopendata-client get-metadata --recid 1 --output-value system_details.global_tag
FT_R_42_V10A::All

If you would like to extract parts of the metadata by filtering some
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can rather say:

  • If the output field produces a list of values, you may want to filter only certain field values of interest. For example, you may want to ask not to output all authors of a dataset record, but only the authors matching certain particular author name. You can use the --filter command-line option to achieve this:

docs/usage.rst Outdated
0000-0002-9266-1783
0000-0001-5610-1060
0000-0001-7613-8063
$ cernopendata-client get-metadata --recid 329 --output-value authors --filter name='Rousseau, David'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one comes "in the middle" of the ORCID example. I think we have two options:

  1. First show example getting all authors, and filtering on one. Then show another example how to get only ORCID identifiers.
  2. Alternatively, we don't show full author JSON output, only the ORCID filtering.

@joudmas joudmas force-pushed the add-filter-option branch 6 times, most recently from f380499 to d13e519 Compare October 30, 2023 13:00
# noqa: D301
"""Filter metadata objects based on specified criteria.

:param field: Name of the array containing objects to access
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the first argument is named output_field, then we should also update the docstring of the function, see the lines:

:param field:

...

:type field:

"""Filter metadata objects based on specified criteria.

:param output_field: Name of the array containing objects to access
:param filters: Argument of the --filter option in the format field_name=value
Copy link
Member

@tiborsimko tiborsimko Oct 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

field_name=value could be misunderstood: is value supposed to be the field name? Or the value of the field? Therefore it would be better to keep somefield=somevalue speaking abstractly, or fieldname=fieldvalue or somefieldname=somefieldvalue or some_field_name=some_field_value speaking more precisely.

This in all places:

$ rg 'field_name='
• tests/test_cli_get_metadata.py
144:    assert "Invalid filter format. Use --filter field_name=value" in test_result.output

• cernopendata_client/cli.py
76:    :param filters: Argument of the --filter option in the format field_name=value
89:            msg="Invalid filter format. Use --filter field_name=value",
145:    help="Filter only certain output values matching filtering criteria. [Use --filter field_name=value]",

@joudmas joudmas force-pushed the add-filter-option branch 2 times, most recently from db76186 to a95f805 Compare October 30, 2023 14:55
Copy link
Member

@tiborsimko tiborsimko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, works nicely 👍

We can merge this initial version, and please create that additional issue for multiple filters.

@tiborsimko tiborsimko merged commit a95f805 into master Oct 30, 2023
17 checks passed
@tiborsimko tiborsimko deleted the add-filter-option branch October 30, 2023 17:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants