[refactor] Tests/update fixtures #1046
Changes from 13 commits
```
pip3 install -r requirements/plugins.txt
pip3 install -r requirements/tests.txt
```

## Generating API Docs

Hub uses pdoc3 to generate its API docs: https://pdoc3.github.io/pdoc/
The API docs are hosted at: https://api-docs.activeloop.ai/

Run the following command to generate the API documentation:
```
pdoc3 --html --output-dir api_docs --template-dir pdoc/templates hub
```

### Running Tests

- To run memory-only tests, use: `python -m pytest .`.
- To run cache-chain-only tests, use: `python -m pytest . --local --s3 --cache-chains-only`. Note: you can opt out of `--local` or `--s3`; the cache chains produced will only contain the enabled storage providers.
- To run ALL tests, use: `python -m pytest . --local --s3 --cache-chains`.
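
A cache chain is a sequence of storage providers in which the last provider is the actual storage. A minimal sketch of that idea, assuming plain in-memory providers: `DictProvider` and `CacheChain` are hypothetical names, not Hub's classes, and the write-through policy (every write goes to every layer) is an assumption made to keep the example short.

```python
from collections.abc import MutableMapping
from typing import Dict, Iterator, List


class DictProvider(MutableMapping):
    """Hypothetical in-memory provider: a plain dict of str -> bytes."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def __getitem__(self, key: str) -> bytes:
        return self._data[key]

    def __setitem__(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def __delitem__(self, key: str) -> None:
        del self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


class CacheChain(MutableMapping):
    """Hypothetical cache chain with the same mapping interface as one provider.

    providers[0] is the first cache checked; providers[-1] is the actual storage.
    Writes go through to every layer (an assumption of this sketch).
    """

    def __init__(self, providers: List[MutableMapping]) -> None:
        self.providers = providers

    def __getitem__(self, key: str) -> bytes:
        for i, provider in enumerate(self.providers):
            if key in provider:
                value = provider[key]
                # Populate the earlier layers that missed, so the next read is a cache hit.
                for cache in self.providers[:i]:
                    cache[key] = value
                return value
        raise KeyError(key)

    def __setitem__(self, key: str, value: bytes) -> None:
        for provider in self.providers:
            provider[key] = value

    def __delitem__(self, key: str) -> None:
        found = False
        for provider in self.providers:
            if key in provider:
                del provider[key]
                found = True
        if not found:
            raise KeyError(key)

    def __iter__(self) -> Iterator[str]:
        # The last provider is the source of truth for which keys exist.
        return iter(self.providers[-1])

    def __len__(self) -> int:
        return len(self.providers[-1])


storage, cache = DictProvider(), DictProvider()
storage["key"] = b"1234"  # data already in the actual storage
chain = CacheChain([cache, storage])
print(chain["key"])    # b'1234', read through from storage
print("key" in cache)  # True: the read populated the cache layer
```

Because `CacheChain` exposes the same mapping interface as a single provider, a test that only does `storage["key"] = b"1234"` works unchanged whether it receives one provider or a chain.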

### Extra Resources
- Understand how to write [pytest](https://docs.pytest.org/en/6.2.x/) tests.
- Understand what a [pytest fixture](https://docs.pytest.org/en/6.2.x/fixture.html) is.
- Understand what [pytest parametrizations](https://docs.pytest.org/en/6.2.x/parametrize.html) are.

### Fixtures
You can find more information on pytest fixtures [here](https://docs.pytest.org/en/6.2.x/fixture.html).

- `memory_storage`: If `--memory-skip` is provided, tests with this fixture are skipped. Otherwise, the test runs with only a `MemoryProvider`.
- `local_storage`: If `--local` is **not** provided, tests with this fixture are skipped. Otherwise, the test runs with only a `LocalProvider`.
- `s3_storage`: If `--s3` is **not** provided, tests with this fixture are skipped. Otherwise, the test runs with only an `S3Provider`.
- `storage`: Every test that uses the `storage` fixture is parametrized with the enabled `StorageProvider`s (enabled via the options defined below). If `--cache-chains` is provided, `storage` may also be a cache chain. A cache chain has the same interface as a `StorageProvider`, but instead of a single provider it chains multiple providers in sequence; the last provider in the chain is treated as the actual storage.
- `ds`: The same as the `storage` fixture, but each parametrized storage is wrapped in a `Dataset`.

Each `StorageProvider`/`Dataset` created for a test via a fixture automatically has its root created before the test and destroyed after it. To keep this data after the test run, use the `--keep-storage` option.

### Options
To see a list of our custom pytest options, run: `pytest -h | sed -En '/custom options:/,/\[pytest\] ini\-options/p'`.
Combine any of the following options to suit your test cases.

- `--local`: Enable local tests.
- `--s3`: Enable S3 tests.
- `--hub-cloud`: Enable hub cloud tests.
- `--memory-skip`: Disable memory tests.
- `--s3-path`: Specify an S3 path if you don't have access to our internal testing bucket.
- `--keep-storage`: By default, all storages are cleaned up after the tests run. Enable this option if you need to inspect the storage contents.

#### Options Examples
- `pytest .`: Run all tests with memory only.
- `pytest . --local`: Run all tests with memory and local.
- `pytest . --s3`: Run all tests with memory and S3.
- `pytest . --memory-skip --hub-cloud`: Run all tests with hub cloud only.
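
The examples above follow one rule: memory is on unless `--memory-skip` is given, and every other provider is off unless its flag is given. A minimal sketch of that rule in plain Python; `RunOptions` and `enabled_providers` are hypothetical names for illustration, not part of the project's pytest configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunOptions:
    """Hypothetical mirror of the custom pytest flags."""
    memory_skip: bool = False  # --memory-skip
    local: bool = False        # --local
    s3: bool = False           # --s3
    hub_cloud: bool = False    # --hub-cloud


def enabled_providers(opts: RunOptions) -> List[str]:
    """Return the storage providers a test run would cover, in a fixed order."""
    providers = []
    if not opts.memory_skip:
        providers.append("memory")  # memory runs by default
    if opts.local:
        providers.append("local")
    if opts.s3:
        providers.append("s3")
    if opts.hub_cloud:
        providers.append("hub_cloud")
    return providers


print(enabled_providers(RunOptions()))                                  # ['memory']
print(enabled_providers(RunOptions(local=True)))                        # ['memory', 'local']
print(enabled_providers(RunOptions(memory_skip=True, hub_cloud=True)))  # ['hub_cloud']
```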

#### Fixture Examples

Single storage provider fixture
```python
def test_memory(memory_storage):
    # test will skip if `--memory-skip` is provided
    memory_storage["key"] = b"1234"  # this data will only be stored in memory


def test_local(local_storage):
    # test will skip if `--local` is not provided
    local_storage["key"] = b"1234"  # this data will only be stored locally


def test_s3(s3_storage):
    # test will skip if `--s3` is not provided
    # test will fail if credentials are not provided
    s3_storage["key"] = b"1234"  # this data will only be stored in s3
```

Datasets
```python
@enabled_datasets
def test_dataset(ds: Dataset):
    # this test will run once per enabled storage provider. if no providers are explicitly enabled,
    # only memory will be used.
    pass


def test_local_dataset(local_ds: Dataset):
    # this test will run only once with a local dataset. if the `--local` option is not provided,
    # this test will be skipped.
    pass
```

Review comment on `@enabled_datasets`: is there a good way now for testing a subset of all datasets? For example, I might want to just test local and s3 datasets (and not memory datasets) for transforms.
Reply: yeah, you just need to write a parametrization, see

Multiple storage providers/cache chains
```python
from hub.core.tests.common import (
    parametrize_all_caches,
    parametrize_all_storages,
    parametrize_all_storages_and_caches,
)


@parametrize_all_storages
def test_storage(storage):
    # storage will be parametrized with all enabled `StorageProvider`s
    pass


@parametrize_all_caches
def test_caches(storage):
    # storage will be parametrized with all common caches containing enabled `StorageProvider`s
    pass


@parametrize_all_storages_and_caches
def test_storages_and_caches(storage):
    # storage will be parametrized with all enabled `StorageProvider`s and common caches
    # containing enabled `StorageProvider`s
    pass
```

Dataset storage providers/cache chains
```python
from hub.core.tests.common import (
    parametrize_all_dataset_storages,
    parametrize_all_dataset_storages_and_caches,
)


@parametrize_all_dataset_storages
def test_dataset(ds):
    # `ds` will be parametrized with 1 `Dataset` object per enabled `StorageProvider`
    pass


@parametrize_all_dataset_storages_and_caches
def test_dataset_with_caches(ds):
    # `ds` will be parametrized with 1 `Dataset` object per enabled `StorageProvider`
    # and all cache chains containing enabled `StorageProvider`s
    pass
```

Storages
```python
@enabled_storages
def test_storage(storage: StorageProvider):
    # this test will run once per enabled storage provider. if no providers are explicitly enabled,
    # only memory will be used.
    pass


def test_local_storage(local_storage: StorageProvider):
    # this test will run only once with a local storage provider. if the `--local` option is not provided,
    # this test will be skipped.
    pass
```
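
Each storage a fixture creates gets a fresh root before the test and has it removed afterwards unless `--keep-storage` is passed. A minimal sketch of that lifecycle, assuming a hypothetical `storage_root` helper that models a local directory only; the documented behaviour covers every provider, including memory and S3, which this sketch does not model. In pytest the same shape is usually written as a `yield` fixture; a context manager keeps the sketch runnable on its own.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def storage_root(keep_storage: bool = False) -> Iterator[str]:
    """Hypothetical helper: create a fresh root directory for one test.

    The directory is removed afterwards unless keep_storage is True,
    mirroring what `--keep-storage` is documented to do.
    """
    root = tempfile.mkdtemp(prefix="hub_test_")  # created before the test
    try:
        yield root
    finally:
        if not keep_storage:
            shutil.rmtree(root, ignore_errors=True)  # destroyed after the test


with storage_root() as root:
    with open(os.path.join(root, "key"), "wb") as f:
        f.write(b"1234")  # the test writes into its own fresh root
print(os.path.exists(root))  # False: removed after the "test"
```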

Caches
```python
@enabled_cache_chains
def test_cache(cache: StorageProvider):  # note: caches are provided as `StorageProvider`s
    # this test runs for every cache chain that contains all enabled storage providers.
    # if only memory is enabled (no providers are explicitly enabled), this test will be skipped.
    pass
```

Review comment: removed `--cache-chains` param, it's always on because we aren't doing benchmarking anymore with just cache chains

## Benchmarks
Our benchmarks use [pytest-benchmark](https://pytest-benchmark.readthedocs.io/en/latest/usage.html), a plugin for [pytest](https://docs.pytest.org/en/6.2.x/).

### Running Benchmarks
- To run benchmarks for memory only, use: `python -m pytest . --benchmark-only`.
- To run ALL **fast** benchmarks, use: `python -m pytest . --local --s3 --cache-chains --benchmark-only`. Note: this only runs the subset of benchmarks that finish quickly.
- To run ALL **fast AND slow** benchmarks, use: `python -m pytest . --local --s3 --full-benchmarks --benchmark-only`. Note: this will take a while (cache chains are also implicitly enabled by `--full-benchmarks`).
- You can opt out of `--local` and `--s3` in all of these commands, or add `--memory-skip`. `--cache-chains-only` also works.
- Optionally, remove the `--benchmark-only` flag from any of these commands to run the normal tests alongside the benchmarks.