Final tweaks to client from code review:
- Bump httpx & pytest-httpx
- Add --pretty-print flag and disable by default for performance
- Add a simple async CLI test in CI
- Documentation tweaks
- Added docs section about endpoints
- Setting `max_results_per_provider` to -1 or 0 will remove the limit
- Allow `extensions/<example>` endpoints to be queried
- Final code review changes
- Added feature list to top of docs

Co-authored-by: Johan Bergsma <29785380+JPBergsma@users.noreply.github.com>
ml-evs and JPBergsma committed May 25, 2022
1 parent c6c6633 commit 94c9b6a
Showing 8 changed files with 145 additions and 19 deletions.
28 changes: 28 additions & 0 deletions .github/workflows/ci.yml
@@ -248,6 +248,34 @@ jobs:
OPTIMADE_DATABASE_BACKEND: 'mongodb'
OPTIMADE_INSERT_TEST_DATA: false # Must be specified as previous steps will have already inserted the test data

- name: Run the OPTIMADE Client CLI
if: matrix.python-version == 3.8
run: |
coverage run --append --source optimade optimade/client/cli.py \
--filter 'nsites = 1' \
--output-file test_get_async.json \
https://optimade.herokuapp.com
test test_get_async.json
coverage run --append --source optimade optimade/client/cli.py \
--filter 'nsites = 1' \
--count \
--output-file test_count.json \
https://optimade.herokuapp.com
test test_count.json
coverage run --append --source optimade optimade/client/cli.py \
--no-async \
--filter 'nsites = 1' \
--count \
--output-file test_count_no_async.json \
https://optimade.herokuapp.com
test test_count_no_async.json
diff test_count_no_async.json test_count.json
coverage xml
- name: Upload coverage to Codecov
if: matrix.python-version == 3.8 && github.repository == 'Materials-Consortia/optimade-python-tools'
uses: codecov/codecov-action@v3
93 changes: 82 additions & 11 deletions docs/getting_started/client.md
@@ -1,8 +1,25 @@
# Using the OPTIMADE client

This package includes a Python client that can be used to query multiple OPTIMADE APIs simultaneously.
This package includes a Python client that can be used to query multiple OPTIMADE APIs simultaneously, whilst automatically paginating through the results of each query.
The client can be used from the command-line (`optimade-get`), or called in Python code.

The client does not currently validate the data returned by the databases and, as some OPTIMADE APIs do not implement all features, it is worth paying attention to any error messages emitted by each database.

## Features

This list outlines the current and planned features for the client:

- [x] Query multiple OPTIMADE APIs simultaneously and asynchronously, with support for different endpoints, filters, sorting and response fields.
- [x] Automatically paginate through the results for each query.
- [x] Validate filters against the OPTIMADE grammar before they are sent to each database.
- [x] Count the number of results for a query without downloading them all.
- [ ] Validate the results against the optimade-python-tools models and export into other supported formats (ASE, pymatgen, CIF, AiiDA).
- [ ] Enable asynchronous use in cases where there is already a running event loop (e.g., inside a Jupyter notebook).
- [ ] Cache the results for queries to disk, and use them in future sessions without making new requests.
- [ ] Support for querying databases indirectly via an [OPTIMADE gateway server](https://github.com/Materials-Consortia/optimade-gateway).

## Installation

The client requires some extra dependencies, which can be installed alongside the PyPI package with

```shell
@@ -13,6 +30,8 @@ or from a local copy of the repository with
pip install -e .[http_client]
```

## Usage

By default, the client will query all OPTIMADE API URLs that it can find via the [Providers list](https://providers.optimade.org):


@@ -25,14 +44,23 @@ By default, the client will query all OPTIMADE API URLs that it can find via the
```python
from optimade.client import OptimadeClient
client = OptimadeClient()
results = client.get()
results = client.get('elements HAS "Ag"')
```

At the command line, it may be immediately useful to redirect or save these results to a file:

```shell
# Save the results to a JSON file directly
optimade-get --filter 'elements HAS "Ag"' --output-file results.json
# or redirect the results (in a POSIX shell)
optimade-get --filter 'elements HAS "Ag"' > results.json
```
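
Once saved, the file can be read back in Python with the standard library. The sketch below is a minimal illustration that assumes only that the output file holds a single JSON object; the helper names `load_results` and `summarise` are hypothetical and are not part of the client.

```python
import json
from pathlib import Path


def load_results(path):
    """Load a JSON results file such as one written with `--output-file`.

    Hypothetical helper: it assumes only that the file holds one JSON object.
    """
    with Path(path).open(encoding="utf-8") as handle:
        results = json.load(handle)
    if not isinstance(results, dict):
        raise ValueError(
            f"Expected a JSON object in {path}, got {type(results).__name__}"
        )
    return results


def summarise(results):
    """Return the top-level keys of the results, sorted for stable output."""
    return sorted(results)
```

Calling `summarise(load_results("results.json"))` would then list the top-level keys without making any assumption about what is nested beneath them.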

We can refine the search by manually specifying some URLs:

=== "Command line"
```shell
optimade-get https://optimade.herokuapp.com https://optimade.odbx.science
optimade-get --output-file results.json https://optimade.herokuapp.com https://optimade.odbx.science
```

=== "Python"
@@ -44,12 +72,14 @@ We can refine the search by manually specifying some URLs:
client.get()
```

By default, the command-line interface will use an example filter, and the Python interface will use an empty filter.
### Filtering

By default, an empty filter will be used (which will return all entries in a database).
You can specify your desired filter as follows (note the quotation marks):

=== "Command line"
```shell
optimade-get --filter 'elements HAS "Ag" AND nsites < 2'
optimade-get --filter 'elements HAS "Ag" AND nsites < 2' --output-file results.json
```

=== "Python"
@@ -61,13 +91,14 @@ You can specify your desired filter as follows (note the quotation marks):

The filter will be validated against the `optimade-python-tools` reference grammar before it is sent to the underlying servers.
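
The quotation marks noted above are needed because the filter contains spaces and double quotes that a shell would otherwise interpret. When launching the CLI from Python, passing the arguments as a list sidesteps shell quoting entirely. The following is a sketch only: `build_optimade_get_command` is a hypothetical helper, and it uses just the `--filter` and `--output-file` options and positional base URLs that appear in this guide.

```python
import shlex


def build_optimade_get_command(filter_string, output_file=None, base_urls=()):
    """Build an argument list for `optimade-get` (hypothetical helper)."""
    command = ["optimade-get", "--filter", filter_string]
    if output_file:
        command += ["--output-file", output_file]
    command += list(base_urls)
    return command


command = build_optimade_get_command(
    'elements HAS "Ag" AND nsites < 2',
    output_file="results.json",
    base_urls=["https://optimade.herokuapp.com"],
)
# shlex.join shows an equivalent, safely quoted, shell command line.
print(shlex.join(command))
```

If the CLI is installed, the list could be handed to `subprocess.run(command, check=True)` as it stands.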

## Accessing the results
### Accessing the results

At the command-line, the results of the query will be printed to `stdout` to be redirected to a file or piped into another program.
At the command-line, the results of the query will be printed to `stdout`, ready to be redirected to a file or piped into another program.
For example:

```shell
optimade-get --filter 'nsites = 1' https://optimade.herokuapp.com
optimade-get --filter 'nsites = 1' --output-file results.json https://optimade.herokuapp.com
cat results.json
```

has the following (truncated) output:
@@ -120,7 +151,7 @@ import json
client = OptimadeClient(base_urls="https://optimade.herokuapp.com")
client.get('nsites = 1')
client.get('nsites = 2')
print(json.dumps(client.all_results, indent=2))
print(client.all_results)
```

will return a dictionary with top-level keys:
@@ -142,13 +173,53 @@ For a given session, this cache can be written and reloaded into an OPTIMADE cli
!!! info
In a future release, this cache will be automatically restored from disk and will obey defined cache lifetimes.

### Counting entries and limiting results
### Querying other endpoints

The client can also query other endpoints, rather than just the default `/structures` endpoint.
This includes any provider-specific `extensions/<example>` endpoints that may be implemented at a given base URL, which can be found listed at the corresponding `/info` endpoint for that database.

In the CLI, this is done with the `--endpoint` flag.
In the Python interface, the different endpoints can be queried as attributes of the client class or equivalently as a parameter to `client.get()` or `client.count()` (see below).

=== "Command line"
```shell
optimade-get --endpoint "structures"
optimade-get --endpoint "references"
optimade-get --endpoint "info"
optimade-get --endpoint "info/structures"
optimade-get --endpoint "extensions/properties"
```

=== "Python"
```python
from optimade.client import OptimadeClient
client = OptimadeClient()

client.references.count()
client.count(endpoint="references")

client.info.get()
client.get(endpoint="info")

client.info.structures.get()
client.get(endpoint="info/structures")

client.extensions.properties.get()
client.get(endpoint="extensions/properties")
```
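
The attribute-style access shown above can be pictured as each attribute lookup extending an endpoint path. The toy class below is a simplified sketch of that idea, not the client's own code: `EndpointBuilder` and its `path()` method are hypothetical, and it uses `__getattr__` for brevity where the client itself overrides `__getattribute__`.

```python
ENDPOINTS = ("structures", "references", "calculations", "info", "extensions")


class EndpointBuilder:
    """Toy stand-in for the client that only tracks the selected endpoint."""

    def __init__(self):
        self._current = None

    def __getattr__(self, name):
        # __getattr__ is only called for names that are not real attributes,
        # so `_current` and `path` are looked up normally.
        if name.startswith("_"):
            raise AttributeError(name)
        if self._current in ("info", "extensions"):
            # Second level, e.g. `.info.structures` or `.extensions.properties`.
            self._current = f"{self._current}/{name}"
        elif name in ENDPOINTS:
            self._current = name
        else:
            raise AttributeError(name)
        return self

    def path(self):
        """Return the endpoint path and reset the builder for the next query."""
        endpoint, self._current = self._current or "structures", None
        return endpoint


print(EndpointBuilder().info.structures.path())
print(EndpointBuilder().extensions.properties.path())
```

Returning `self` from each lookup is what allows the calls to be chained; resetting the stored endpoint after use keeps one query from leaking into the next.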

### Limiting the number of responses

Querying all OPTIMADE APIs without limiting the number of entries can result in a lot of data.
The client will limit the number of results returned per database to the value of `max_results_per_provider` (defaults: 1000 for Python, 100 for CLI).
The client will limit the number of results returned per database to the value of `max_results_per_provider` (defaults: 1000 for Python, 10 for CLI).
This limit is enforced only to within one page of results: the client always uses the default page limit of each underlying OPTIMADE API, so the final page may take the total slightly over the limit.
This parameter can be set with `--max-results-per-provider 10` at the CLI, or with the `max_results_per_provider` argument in Python, e.g., `OptimadeClient(max_results_per_provider=10)`.

Setting this to a value of `-1` or `0` (or additionally `None`, if using the Python interface) will remove the limit on the number of results per provider.
In the CLI, this setting should be used alongside `--output-file` or redirection to avoid overflowing your terminal!
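
The two behaviours described above, enforcement of the limit in whole pages and the `-1`/`0`/`None` values that disable it, can be sketched as follows. This is a minimal illustration rather than the client's implementation: `normalise_limit` and `paginate` are hypothetical names, and the pages are simulated lists instead of real API responses.

```python
def normalise_limit(max_results_per_provider):
    """Map the "no limit" values (-1, 0 and None) onto None."""
    if max_results_per_provider in (-1, 0, None):
        return None
    return max_results_per_provider


def paginate(pages, max_results_per_provider=None):
    """Collect entries page by page, stopping once the limit is reached.

    `pages` is any iterable of lists standing in for successive responses.
    Whole pages are kept, so the total can exceed the limit by up to one page.
    """
    limit = normalise_limit(max_results_per_provider)
    collected = []
    for page in pages:
        collected.extend(page)
        if limit is not None and len(collected) >= limit:
            break
    return collected


# Simulate a database holding 100 entries with a page limit of 20.
pages = [list(range(start, start + 20)) for start in range(0, 100, 20)]
print(len(paginate(pages, 10)))  # 20: one full page
print(len(paginate(pages, 30)))  # 40: two full pages
print(len(paginate(pages, -1)))  # 100: no limit
```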

### Counting the number of responses without downloading

Downloading all the results for a given query can require hundreds or thousands of requests, depending on the number of results and the database's page limit.
It is possible to just count the number of results before downloading the entries themselves, which only requires 1 request per database.
This is achieved via the `--count` flag in the CLI, or the `.count()` method in the Python interface.
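
A single request can be enough because the OPTIMADE specification defines a `data_returned` field in the response `meta` object, giving the total number of matches independent of pagination. The sketch below assumes the count is read from that field; whether the client does exactly this is an assumption, `count_from_response` is a hypothetical helper, and the response is a hand-written stand-in rather than real data.

```python
def count_from_response(response):
    """Return the total number of matches reported by one OPTIMADE response.

    Assumes the response `meta` object carries the `data_returned` field;
    returns None when a database omits it.
    """
    return response.get("meta", {}).get("data_returned")


# A hand-written stand-in for a single-page response (not real data).
example_response = {
    "data": [{"id": "example-1", "type": "structures"}],
    "meta": {"data_returned": 42, "more_data_available": True},
}
print(count_from_response(example_response))  # 42
```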
2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -38,6 +38,8 @@ markdown_extensions:
- pymdownx.inlinehilite
- pymdownx.tabbed:
alternate_style: true
- pymdownx.tasklist:
custom_checkbox: true
- pymdownx.snippets
- toc:
permalink: true
21 changes: 18 additions & 3 deletions optimade/client/cli.py
@@ -20,8 +20,8 @@
@click.option("--use-async/--no-async", default=True, help="Use asyncio or not")
@click.option(
"--max-results-per-provider",
default=100,
help="Set the maximum number of results to download from any single provider",
default=10,
help="Set the maximum number of results to download from any single provider, where -1 or 0 indicate unlimited results.",
)
@click.option(
"--output-file",
@@ -48,6 +48,11 @@
default=None,
help="A string of comma-separated response fields to request.",
)
@click.option(
"--pretty-print",
is_flag=True,
help="Pretty print the JSON results.",
)
@click.argument("base-url", default=None, nargs=-1)
def get(
use_async,
@@ -59,6 +64,7 @@ def get(
response_fields,
sort,
endpoint,
pretty_print,
):
return _get(
use_async,
@@ -70,6 +76,7 @@ def get(
response_fields,
sort,
endpoint,
pretty_print,
)


@@ -83,6 +90,7 @@ def _get(
response_fields,
sort,
endpoint,
pretty_print,
):

if output_file:
@@ -116,8 +124,15 @@ def _get(
sys.exit(1)

if not output_file:
rich.print_json(data=results, default=lambda _: _.dict())
if pretty_print:
rich.print_json(data=results, indent=2, default=lambda _: _.dict())
else:
sys.stdout.write(json.dumps(results, indent=2, default=lambda _: _.dict()))

if output_file:
with open(output_file, "w") as f:
json.dump(results, f, indent=2, default=lambda _: _.dict())


if __name__ == "__main__":
get()
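
The `default=lambda _: _.dict()` argument used in the hunk above tells the JSON encoder how to handle objects it cannot serialise natively, by calling their `.dict()` method. A minimal standard-library sketch of that mechanism follows; the `Entry` class is a hypothetical stand-in for whatever model objects the results contain.

```python
import json


class Entry:
    """Hypothetical stand-in for a model object exposing a `.dict()` method."""

    def __init__(self, id, nsites):
        self.id = id
        self.nsites = nsites

    def dict(self):
        return {"id": self.id, "nsites": self.nsites}


results = {"structures": [Entry("example-1", 1)]}

# `default` is only called for objects that json cannot serialise natively.
text = json.dumps(results, indent=2, default=lambda obj: obj.dict())
print(text)
```

Plain dictionaries, lists, strings and numbers never reach the `default` hook, so it adds no work for results that are already JSON-compatible.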
13 changes: 11 additions & 2 deletions optimade/client/client.py
@@ -30,7 +30,7 @@
)
from optimade.server.exceptions import BadRequest

ENDPOINTS = ("structures", "references", "calculations", "info")
ENDPOINTS = ("structures", "references", "calculations", "info", "extensions")

__all__ = ("OptimadeClient",)

@@ -107,6 +107,8 @@ def __init__(
base_urls = get_all_databases()

self.max_results_per_provider = max_results_per_provider
if self.max_results_per_provider in (-1, 0):
self.max_results_per_provider = None

self.base_urls = base_urls
if isinstance(self.base_urls, str):
@@ -127,7 +129,10 @@ def __init__(
self.use_async = use_async

def __getattribute__(self, name):
"""Allows entry endpoints to be queried via attribute access.
"""Allows entry endpoints to be queried via attribute access, using the
allowed list for this module.
Should also pass through any `extensions/<example>` endpoints.
Any non-entry-endpoint name requested will be passed to the
original `__getattribute__`.
@@ -138,15 +143,19 @@ def __getattribute__(self, name):
cli = OptimadeClient()
structures = cli.structures.get()
references = cli.references.get()
info_structures = cli.info.structures.get()
```
"""
if name in ENDPOINTS:
if self.__current_endpoint == "info":
self.__current_endpoint = f"info/{name}"
elif self.__current_endpoint == "extensions":
self.__current_endpoint = f"extensions/{name}"
else:
self.__current_endpoint = name
return self

return super().__getattribute__(name)

def get(
2 changes: 1 addition & 1 deletion requirements-http-client.txt
@@ -1,3 +1,3 @@
click==8.1.3
httpx==0.22.0
httpx==0.23.0
rich==12.4.1
4 changes: 2 additions & 2 deletions setup.py
@@ -31,7 +31,7 @@
"aiida-core~=2.0;python_version>='3.8'",
]
http_client_deps = [
"httpx~=0.22",
"httpx~=0.23",
"rich~=12.4",
"click~=8.1",
]
@@ -57,7 +57,7 @@
"jsondiff~=2.0",
"pytest~=7.1",
"pytest-cov~=3.0",
"pytest-httpx~=0.20.0",
"pytest-httpx~=0.21",
] + server_deps
dev_deps = (
["pylint~=2.13", "pre-commit~=2.19", "invoke~=1.7"]
1 change: 1 addition & 0 deletions tests/server/test_client.py
@@ -132,6 +132,7 @@ def test_command_line_client(httpx_mocked_response, use_async, capsys):
response_fields=None,
sort=None,
endpoint="structures",
pretty_print=False,
)

# Test multi-provider query