[refactor] Tests/update fixtures #1046

Merged on Jul 15, 2021 (36 commits). Changes from 32 commits.

Commits
5e85452
remove unused options/fixtures + separate ds/storage fixtures
verbose-void Jul 12, 2021
a23e28f
local ds generator
verbose-void Jul 13, 2021
63a5056
start setting up hub cloud credentials
verbose-void Jul 13, 2021
a1a4f05
hub cloud fixtures work
verbose-void Jul 13, 2021
669dab6
persistent dataset tests
verbose-void Jul 13, 2021
69ed6ec
remove commented out code
verbose-void Jul 13, 2021
3eb50ed
remove skipif and benchmarks
verbose-void Jul 13, 2021
03afe2a
add cache chain fixtures
verbose-void Jul 13, 2021
f6b3115
update circleci to remove --cache-chains param
verbose-void Jul 13, 2021
09fe56a
rename local test
verbose-void Jul 13, 2021
7907535
update contributing.md and rename parametrizes
verbose-void Jul 13, 2021
8157ac8
fix mypy
verbose-void Jul 13, 2021
1846ef1
Merge branch 'main' of github.com:activeloopai/Hub into tests/update-…
verbose-void Jul 13, 2021
659b0ab
remove old test explanation
verbose-void Jul 13, 2021
7f20ce5
move api docs back down
verbose-void Jul 13, 2021
844b1bb
add memory showcase & explain where to find fixtures
verbose-void Jul 13, 2021
99a4a24
reformat contributing
verbose-void Jul 13, 2021
4fea794
update section formatting
verbose-void Jul 13, 2021
a5bda79
Merge branch 'main' of github.com:activeloopai/Hub into tests/update-…
verbose-void Jul 14, 2021
741f7ef
fix mypy
verbose-void Jul 14, 2021
f8126b1
use proper hub dev token fixture
verbose-void Jul 14, 2021
315e675
remove todos and use cache_chain param for contributing doc
verbose-void Jul 14, 2021
20ba0ab
get storage provider from hub path
verbose-void Jul 14, 2021
5e90d31
session ID is 4 chars
verbose-void Jul 14, 2021
bad7d3d
removed delete calls
verbose-void Jul 14, 2021
ce276a7
for hub cloud use storage
verbose-void Jul 14, 2021
7356a46
remove ds =
verbose-void Jul 14, 2021
e03efdf
Merge branch 'main' of github.com:activeloopai/Hub into tests/update-…
verbose-void Jul 14, 2021
415462c
better error messaging
verbose-void Jul 14, 2021
1b4892b
raise more proper error
verbose-void Jul 14, 2021
09f0f4d
update client docstring
verbose-void Jul 14, 2021
9cc4fdc
delete dataset after suffix bug test
verbose-void Jul 14, 2021
2815c4e
fix path / token issue
verbose-void Jul 15, 2021
67b5ca6
remove extra readonly test
verbose-void Jul 15, 2021
70dfb75
mention --keep-storage with memory doesn't work
verbose-void Jul 15, 2021
65071f8
remove old assertion
verbose-void Jul 15, 2021
4 changes: 2 additions & 2 deletions .circleci/config.yml
@@ -169,15 +169,15 @@ commands:
command: |
$Env:GOOGLE_APPLICATION_CREDENTIALS = $Env:CI_GCS_PATH
setx /m GOOGLE_APPLICATION_CREDENTIALS "$Env:GOOGLE_APPLICATION_CREDENTIALS"
-python3 -m pytest --cov-report=xml --cov=./ --local --s3 --cache-chains
+python3 -m pytest --cov-report=xml --cov=./ --local --s3 --hub-cloud
- when:
condition: << parameters.unix-like >>
steps:
- run:
name: "Running tests - Unix"
command: |
export GOOGLE_APPLICATION_CREDENTIALS=$HOME/.secrets/gcs.json
-python3 -m pytest --cov-report=xml --cov=./ --local --s3 --cache-chains
+python3 -m pytest --cov-report=xml --cov=./ --local --s3 --hub-cloud
Contributor Author: Removed the `--cache-chains` param. Cache chains are always on now because we no longer benchmark with just cache chains.


parallelism: 10
codecov-upload:
136 changes: 49 additions & 87 deletions CONTRIBUTING.md
@@ -16,120 +16,82 @@ We also use static typing for our function arguments/variables for better code r
## Testing
We use [pytest](https://docs.pytest.org/en/6.2.x/) for our tests. In order to make it easier, we also have a set of custom options defined in [conftest.py](conftest.py).

## To install all dependencies run:
### To install all dependencies run:

```
pip3 install -r requirements/common.txt
pip3 install -r requirements/plugins.txt
pip3 install -r requirements/tests.txt
```

### Running Tests

- To run memory only tests, use: `python -m pytest .`.
- To run local only tests, use: `python -m pytest . --memory-skip --local`.
- To run s3 only tests, use: `python -m pytest . --memory-skip --s3`.
- To run cache chain only tests, use: `python -m pytest . --local --s3 --cache-chains-only`. Note: you can opt out of `--local` or `--s3`, the cache chains produced will only contain enabled storage providers.
- To run ALL tests, use: `python -m pytest . --local --s3 --cache-chains`.

### Prerequisites
- Understand how to write [pytest](https://docs.pytest.org/en/6.2.x/) tests.
- Understand what a [pytest fixture](https://docs.pytest.org/en/6.2.x/fixture.html) is.
- Understand what [pytest parametrizations](https://docs.pytest.org/en/6.2.x/parametrize.html) are.
### Running Tests
- `pytest .`: Run all tests with memory only.
- `pytest . --local`: Run all tests with memory and local.
- `pytest . --s3`: Run all tests with memory and s3.
- `pytest . --memory-skip --hub-cloud`: Run all tests with hub cloud only.
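
The commands above follow one rule: memory runs unless `--memory-skip` is passed, and every other provider stays off unless its flag is passed. A minimal sketch of that rule in plain Python. The function name `enabled_providers` is hypothetical and is not part of the Hub test suite:

```python
def enabled_providers(local=False, s3=False, hub_cloud=False, memory_skip=False):
    """Return the storage providers a test run would cover for the given flags."""
    providers = []
    if not memory_skip:
        providers.append("memory")  # memory is on by default
    if local:
        providers.append("local")
    if s3:
        providers.append("s3")
    if hub_cloud:
        providers.append("hub_cloud")
    return providers


print(enabled_providers())                                  # pytest .
print(enabled_providers(local=True))                        # pytest . --local
print(enabled_providers(memory_skip=True, hub_cloud=True))  # pytest . --memory-skip --hub-cloud
```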


### Options
To see a list of our custom pytest options, run this command: `pytest -h | sed -En '/custom options:/,/\[pytest\] ini\-options/p'`.
Combine any of the following options to suit your test cases.

### Fixtures
You can find more information on pytest fixtures [here](https://docs.pytest.org/en/6.2.x/fixture.html).
- `--local`: Enable local tests.
- `--s3`: Enable S3 tests.
- `--hub-cloud`: Enable hub cloud tests.
- `--memory-skip`: Disable memory tests.
- `--s3-path`: Specify an s3 path if you don't have access to our internal testing bucket.
- `--keep-storage`: By default all storages are cleaned up after tests run. Enable this option if you need to check the storage contents.
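
For readers unfamiliar with custom pytest options, here is a rough sketch of how flags like these could be registered through pytest's `pytest_addoption` hook. The option names come from the list above; the help texts paraphrase it, and the `None` default for `--s3-path` is a placeholder assumption. The project's actual `conftest.py` is not shown in this diff and may differ:

```python
# conftest.py (sketch, not the project's actual file)
def pytest_addoption(parser):
    # Boolean switches: passing the flag on the command line sets it to True.
    switches = [
        ("--local", "Enable local tests."),
        ("--s3", "Enable S3 tests."),
        ("--hub-cloud", "Enable hub cloud tests."),
        ("--memory-skip", "Disable memory tests."),
        ("--keep-storage", "Do not clean up storages after tests run."),
    ]
    for flag, help_text in switches:
        parser.addoption(flag, action="store_true", help=help_text)

    # Value option: takes a path argument. The default here is an assumption.
    parser.addoption(
        "--s3-path",
        type=str,
        default=None,
        help="S3 path to use instead of the internal testing bucket.",
    )
```

A fixture can then read a flag with `request.config.getoption("--local")`.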

- `memory_storage`: If `--memory-skip` is provided, tests with this fixture will be skipped. Otherwise, the test will run with only a `MemoryProvider`.
- `local_storage`: If `--local` is **not** provided, tests with this fixture will be skipped. Otherwise, the test will run with only a `LocalProvider`.
- `s3_storage`: If `--s3` is **not** provided, tests with this fixture will be skipped. Otherwise, the test will run with only an `S3Provider`.
- `storage`: All tests that use the `storage` fixture will be parametrized with the enabled `StorageProvider`s (enabled via options defined below). If `--cache-chains` is provided, `storage` may also be a cache chain. Cache chains have the same interface as `StorageProvider`, but instead of just a single provider, it is multiple chained in a sequence, where the last provider in the chain is considered the actual storage.
- `ds`: The same as the `storage` fixture, but the storages that are parametrized are wrapped with a `Dataset`.
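
The `storage` bullet above describes a cache chain as several providers chained in sequence, with the last one acting as the actual storage. A toy sketch of that idea follows, using plain dicts in place of real providers. The class name `ToyCacheChain` and the write-through and backfill behaviour are assumptions for illustration, not Hub's implementation:

```python
class ToyCacheChain:
    """Toy cache chain: dict-like layers, where the last layer is the actual storage."""

    def __init__(self, *layers):
        if not layers:
            raise ValueError("a chain needs at least one layer")
        self.layers = layers

    def __setitem__(self, key, value):
        # Write-through (assumption): every layer, including the final storage, gets the value.
        for layer in self.layers:
            layer[key] = value

    def __getitem__(self, key):
        # Check the fastest layer first, then fall through towards the final storage.
        for index, layer in enumerate(self.layers):
            if key in layer:
                value = layer[key]
                # Backfill the faster layers that missed, so the next read is cheaper.
                for faster in self.layers[:index]:
                    faster[key] = value
                return value
        raise KeyError(key)


memory, local = {}, {}  # plain dicts stand in for real providers
chain = ToyCacheChain(memory, local)
chain["key"] = b"1234"
print(chain["key"])
```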

Each `StorageProvider`/`Dataset` that is created for a test via a fixture will automatically have a root created before and destroyed after the test. If you want to keep this data after the test run, you can use the `--keep-storage` option.
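
The create-before, destroy-after behaviour described above can be sketched with a context manager. This is only an illustration using a temporary local directory; the name `storage_root` and the `keep_storage` parameter are hypothetical stand-ins for whatever the fixtures do internally:

```python
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def storage_root(keep_storage=False):
    """Create a root directory for one test and remove it afterwards."""
    root = tempfile.mkdtemp(prefix="hub_test_")
    try:
        yield root
    finally:
        # Mirrors the idea of `--keep-storage`: skip cleanup when asked to.
        if not keep_storage:
            shutil.rmtree(root, ignore_errors=True)


with storage_root() as root:
    print(root)  # the directory exists here and is removed on exit
```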
### Extra Resources
If you feel lost with any of these sections, try reading up on some of these topics.

- Understand how to write [pytest](https://docs.pytest.org/en/6.2.x/) tests.
- Understand what a [pytest fixture](https://docs.pytest.org/en/6.2.x/fixture.html) is.
- Understand what [pytest parametrizations](https://docs.pytest.org/en/6.2.x/parametrize.html) are.

#### Fixture Examples

### Fixture Usage Examples
These are not all of the available fixtures. You can see all of them [here](hub/tests/).

Single storage provider fixture
Datasets
```python
def test_memory(memory_storage):
    # test will skip if `--memory-skip` is provided
    memory_storage["key"] = b"1234"  # this data will only be stored in memory

def test_local(local_storage):
    # test will skip if `--local` is not provided
    local_storage["key"] = b"1234"  # this data will only be stored locally

def test_s3(s3_storage):
    # test will skip if `--s3` is not provided
    # test will fail if credentials are not provided
    s3_storage["key"] = b"1234"  # this data will only be stored in s3
```
@enabled_datasets
Contributor: Is there a good way now to test a subset of all datasets? For example, I might want to test only local and s3 datasets (not memory datasets) for transforms.

Contributor Author: Yes, you just need to write a parametrization; see the `enabled_datasets` definition.

def test_dataset(ds: Dataset):
    # this test will run once per enabled storage provider. if no providers are explicitly enabled,
    # only memory will be used.
    pass

Multiple storage providers/cache chains
```python
from hub.core.tests.common import parametrize_all_storages, parametrize_all_caches, parametrize_all_storages_and_caches

@parametrize_all_storages
def test_storage(storage):
    # storage will be parametrized with all enabled `StorageProvider`s
    pass

@parametrize_all_caches
def test_caches(storage):
    # storage will be parametrized with all common caches containing enabled `StorageProvider`s
    pass

@parametrize_all_storages_and_caches
def test_storages_and_caches(storage):
    # storage will be parametrized with all enabled `StorageProvider`s and common caches containing enabled `StorageProvider`s
    pass
```

def test_local_dataset(local_ds: Dataset):
    # this test will run only once with a local dataset. if the `--local` option is not provided,
    # this test will be skipped.
    pass
```

Dataset storage providers/cache chains
Storages
```python
from hub.core.tests.common import parametrize_all_dataset_storages, parametrize_all_dataset_storages_and_caches
@enabled_storages
def test_storage(storage: StorageProvider):
    # this test will run once per enabled storage provider. if no providers are explicitly enabled,
    # only memory will be used.
    pass

@parametrize_all_dataset_storages
def test_dataset(ds):
    # `ds` will be parametrized with 1 `Dataset` object per enabled `StorageProvider`
    pass

@parametrize_all_dataset_storages_and_caches
def test_dataset(ds):
    # `ds` will be parametrized with 1 `Dataset` object per enabled `StorageProvider` and all cache chains containing enabled `StorageProvider`s
    pass
def test_memory_storage(memory_storage: StorageProvider):
    # this test will run only once with a memory storage provider. if the `--memory-skip` option is provided,
    # this test will be skipped.
    pass
```

## Benchmarks
We use [pytest-benchmark](https://pytest-benchmark.readthedocs.io/en/latest/usage.html) for our benchmark code which is a plugin for [pytest](https://docs.pytest.org/en/6.2.x/).

### Running Benchmarks
- To run benchmarks for memory only, use:

```python -m pytest . --benchmark-only```

- To run ALL **fast** benchmarks, use:

```python -m pytest . --local --s3 --cache-chains --benchmark-only```.

Note: this only runs the subset of benchmarks that finish quickly.

- To run ALL **fast AND slow** benchmarks, use:

```python -m pytest . --local --s3 --full-benchmarks --benchmark-only```

Note: this will take a while... (also cache chains are implicitly enabled from `--full-benchmarks`.)

- You can opt out of `--local` and `--s3` for all commands, or add `--memory-skip`. Also `--cache-chains-only` works.
- Optionally, you can remove the `--benchmark-only` flag in any of these commands to run normal tests alongside the benchmarks.
Caches
```python
@enabled_cache_chains
def test_cache(cache_chain: StorageProvider):  # note: caches are provided as `StorageProvider`s
    # this test runs for every cache chain that contains all enabled storage providers.
    # if only memory is enabled (no providers are explicitly enabled), this test will be skipped.
    pass
```

## Generating API Docs
