-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cli: add filter option for the get-metadata command #130
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #130 +/- ##
==========================================
- Coverage 79.84% 79.71% -0.14%
==========================================
Files 11 11
Lines 665 700 +35
==========================================
+ Hits 531 558 +27
- Misses 134 142 +8
|
2f62075
to
8eacb67
Compare
cernopendata_client/cli.py
Outdated
"--filter", | ||
"filters", | ||
multiple=False, | ||
help="Filter workflow that contains certain filtering criteria. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "workflow" here is a remainder from the REANA client?
The help should rather say here something like:
- Filter only certain output values matching filtering criteria. Use
--filter somefield=somevalue
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also add a full example to the Examples section of the docstring:
$ cernopendata-client get-metadata --recid 329 --output-value authors.orcid --filter name="Rousseau, David"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moreover, you can enrich the user guide usage.rst
the section "Getting metadata" so that the David Rousseau example will also appear here: https://cernopendata-client.readthedocs.io/en/latest/usage.html#getting-metadata
tests/test_cli_get_metadata.py
Outdated
"--output-value", | ||
"authors.orcid", | ||
"--filter", | ||
"authors.url=Rousseau, David", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want to quote "Rousseau, David" as an argument.
BTW this works nicely:
$ cernopendata-client get-metadata --recid 329 --output-value authors.orcid --filter authors.name="Rousseau, David"
0000-0001-7613-8063
$ cernopendata-client get-metadata --recid 329 --output-value authors.orcid --filter name="Rousseau, David"
0000-0001-7613-8063
But the following ones do not fully work:
$ cernopendata-client get-metadata --recid 329 --output-value authors --filter name="Rousseau, David"
(All output values are printed, even though user might expect only the values matching the filtering criteria.)
Can you please have a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works well with the PR update 👍 You may want to also add a test case for this, i.e. get all authors, then with the filter, and then compare the two lists whether the second one has one less item as expected
@click.option( | ||
"--filter", | ||
"filters", | ||
multiple=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we can do with multiple=False
for now, but in the future we may want to extend it to match more than one. E.g. in our canonical example:
$ cernopendata-client get-metadata --recid 329 --output-value authors.orcid --filter name="Rousseau, David"
0000-0001-7613-8063
There could be more people with the same name, so one would like to write:
$ cernopendata-client get-metadata --recid 329 --output-value authors.orcid --filter name="Smith, John" --filter affiliation="Harvard"
to get John Smith from Harvard, an not from Cornell. (Just an example.)
If it it easy to implement, you can have a look at it. Otherwise we can open an issue for later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will create a new issue for this and link it back to this discussion for reference.
docs/usage.rst
Outdated
@@ -87,6 +87,30 @@ datasets, you can use **--output-value** command-line option: | |||
$ cernopendata-client get-metadata --recid 1 --output-value system_details.global_tag | |||
FT_R_42_V10A::All | |||
|
|||
If you would like to extract parts of the metadata by filtering some |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can rather say:
- If the output field produces a list of values, you may want to filter only certain field values of interest. For example, you may want to ask not to output all authors of a dataset record, but only the authors matching certain particular author name. You can use the
--filter
command-line option to achieve this:
docs/usage.rst
Outdated
0000-0002-9266-1783 | ||
0000-0001-5610-1060 | ||
0000-0001-7613-8063 | ||
$ cernopendata-client get-metadata --recid 329 --output-value authors --filter name='Rousseau, David' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one comes "in the middle" of the ORCID example. I think we have two options:
- First show example getting all authors, and filtering on one. Then show another example how to get only ORCID identifiers.
- Alternatively, we don't show full author JSON output, only the ORCID filtering.
f380499
to
d13e519
Compare
cernopendata_client/cli.py
Outdated
# noqa: D301 | ||
"""Filter metadata objects based on specified criteria. | ||
|
||
:param field: Name of the array containing objects to access |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the first argument is named output_field
, then we should also update the docstring of the function, see the lines:
:param field:
...
:type field:
d13e519
to
8bb1811
Compare
cernopendata_client/cli.py
Outdated
"""Filter metadata objects based on specified criteria. | ||
|
||
:param output_field: Name of the array containing objects to access | ||
:param filters: Argument of the --filter option in the format field_name=value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
field_name=value
could be misunderstood: is value supposed to be the field name? Or the value of the field? Therefore it would be better to keep somefield=somevalue
speaking abstractly, or fieldname=fieldvalue
or somefieldname=somefieldvalue
or some_field_name=some_field_value
speaking more precisely.
This in all places:
$ rg 'field_name='
• tests/test_cli_get_metadata.py
144: assert "Invalid filter format. Use --filter field_name=value" in test_result.output
• cernopendata_client/cli.py
76: :param filters: Argument of the --filter option in the format field_name=value
89: msg="Invalid filter format. Use --filter field_name=value",
145: help="Filter only certain output values matching filtering criteria. [Use --filter field_name=value]",
8bb1811
to
6201af4
Compare
db76186
to
a95f805
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, works nicely 👍
We can merge this initial version, and please create that additional issue for multiple filters.
cli: add --filter option in get-metadata command
modified get-metadata to support multiple fields