This repository has been archived by the owner on Jan 3, 2024. It is now read-only.

asv benchmarks for imports and tools modules #184

Merged: 36 commits, merged Jul 24, 2023
- 2c5a65c: notebook w typical pipeline (sfmig, Jun 22, 2023)
- 2a4c729: notes on writing benchmarks (sfmig, Jun 26, 2023)
- 9879ca6: pipeline notebook (sfmig, Jun 26, 2023)
- fad8b8a: update gitignore (sfmig, Jun 26, 2023)
- 4603bcb: add imports benchmarks (sfmig, Jun 26, 2023)
- 67faa02: basic benchmark for reading with dask (sfmig, Jun 26, 2023)
- fb8ebe9: parametrise dask benchmark (sfmig, Jun 26, 2023)
- a00131e: add tiff benchmark and refactor (sfmig, Jun 28, 2023)
- 9bc0046: fix build command to use pyproject correctly (sfmig, Jun 28, 2023)
- 2bf4c5a: black formatting to IO benchmarks (sfmig, Jun 29, 2023)
- 336be4c: prep benchmarks pending teardown (sfmig, Jun 29, 2023)
- a2b200e: remove voxel_size from IO and refactor. change precommit config to sk… (sfmig, Jun 29, 2023)
- 3f3a344: remove list comprehension (sfmig, Jun 29, 2023)
- d20a97b: add benchmarks imports (sfmig, Jun 29, 2023)
- 80f12a0: remove initial templates (sfmig, Jun 29, 2023)
- bfc7383: add init to benchmarks (sfmig, Jun 29, 2023)
- 589ffce: add readme and comments to asv config (sfmig, Jun 30, 2023)
- febb776: add teardown function to prep benchmarks (sfmig, Jun 30, 2023)
- 2e35a8a: add comment for review (sfmig, Jun 30, 2023)
- ebd57fb: Merge branch 'main' into smg/basic-asv-benchmark (sfmig, Jun 30, 2023)
- 382a285: add cellfinder_core.tools.prep mypy fix (sfmig, Jun 30, 2023)
- 357725f: replace imlib by brainglobe_utils (sfmig, Jun 30, 2023)
- 43fe398: small additions to readme (sfmig, Jun 30, 2023)
- f6b8f08: move cellfinder_core.tool.prep to ignore imports section (sfmig, Jun 30, 2023)
- 2051c4a: remove notebook (sfmig, Jun 30, 2023)
- d0ce539: increase timeout (sfmig, Jun 30, 2023)
- 70dffb3: small additions and format edits to the readme (sfmig, Jul 20, 2023)
- 701a135: exclude benchmarks from manifest (sfmig, Jul 20, 2023)
- 8d3ed4b: small additions to the readme (sfmig, Jul 20, 2023)
- 150eda7: reduce readme to basic commands (sfmig, Jul 20, 2023)
- 333155e: fixes to IO benchmarks from review discussions (sfmig, Jul 21, 2023)
- 45510da: fix typo (sfmig, Jul 21, 2023)
- 397aeb3: Apply Will's suggestions from code review (sfmig, Jul 21, 2023)
- 0ddd6b7: Merge branch 'smg/basic-asv-benchmark' of https://github.com/brainglo… (sfmig, Jul 21, 2023)
- a5fcee1: change install path. remove TODOs. increase default timeout further (sfmig, Jul 21, 2023)
- 9e1d4c1: Merge branch 'main' into smg/basic-asv-benchmark (sfmig, Jul 21, 2023)
6 changes: 6 additions & 0 deletions .gitignore
@@ -124,3 +124,9 @@ pip-wheel-metadata/
mprofile*.dat

*.DS_Store

# asv
.asv
benchmarks/results
benchmarks/html
benchmarks/env
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -24,6 +24,7 @@ repos:
hooks:
- id: mypy
args: [--config-file, pyproject.toml]
exclude: benchmarks/benchmarks/tools/IO.py
additional_dependencies:
- types-setuptools
- types-requests
16 changes: 15 additions & 1 deletion MANIFEST.in
@@ -1 +1,15 @@
prune tests/data
include README.md
include LICENSE
include pyproject.toml

exclude *.yml
exclude *.yaml
exclude tox.ini
exclude CHANGELOG.md

graft src

prune benchmarks
prune tests

exclude cellfinder-core/benchmarks/*
51 changes: 39 additions & 12 deletions benchmarks/README.md
@@ -1,12 +1,39 @@
# Benchmarks
`detect_and_classify.py` contains a simple script that runs
detection and classification with the small test dataset.

## Memory
[memory_profiler](https://github.com/pythonprofilers/memory_profiler)
can be used to profile memory usage. Install it, and then run
`mprof run --include-children --multiprocess detect_and_classify.py`. It is **very**
important to use these two flags to capture memory usage by the additional
processes that cellfinder_core uses.

To show the results of the latest profile run, run `mprof plot`.
# Benchmarking with asv
[Install asv](https://asv.readthedocs.io/en/stable/installing.html) by running:
```
pip install asv
```

`asv` works roughly as follows:
1. It creates an isolated environment (as defined in the config; here a conda environment)
2. It builds and installs the package as it was at a specific commit (taken from the upstream or the local repository, depending on the config)
3. It runs and times the benchmarks and saves the results to JSON files
4. The JSON files are 'published' into an HTML directory
5. The HTML directory can be viewed as a static website

## Running benchmarks
To run benchmarks on a specific commit:
```
$ asv run 88fbbc33^!
```

To run them on all commits up to and including a specific commit:
```
$ asv run 88fbbc33
```

To run them on a range of commits:
```
$ asv run 827f322b..729abcf3
```
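The three `asv run` forms above differ only in the git revision range passed to asv: `<commit>^!` is git's notation for that single commit, a bare commit also selects its ancestors, and `A..B` selects the commits reachable from `B` but not from `A`. A minimal sketch, assuming you want to assemble these arguments from a script (for example in CI); `asv_run_args` is a hypothetical helper, not part of asv:

```python
from typing import List, Optional


def asv_run_args(
    commit: str, until: Optional[str] = None, only: bool = False
) -> List[str]:
    """Return the argv for `asv run` using git revision-range syntax.

    only=True       -> "<commit>^!"        : just that commit
    until is given  -> "<commit>..<until>" : commits reachable from `until`
                                             but not from `commit`
    otherwise       -> "<commit>"          : the commit and its ancestors
    """
    if only and until is not None:
        raise ValueError("use either `only` or `until`, not both")
    if only:
        spec = f"{commit}^!"
    elif until is not None:
        spec = f"{commit}..{until}"
    else:
        spec = commit
    return ["asv", "run", spec]


print(asv_run_args("88fbbc33", only=True))  # ['asv', 'run', '88fbbc33^!']
print(asv_run_args("88fbbc33"))  # ['asv', 'run', '88fbbc33']
print(asv_run_args("827f322b", until="729abcf3"))  # ['asv', 'run', '827f322b..729abcf3']
```

To execute one of these, pass the list to `subprocess.run(args, cwd="benchmarks", check=True)` from the repository root; by default asv reads `asv.conf.json` from the current working directory.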

To collate the benchmarks' results into a viewable website:
```
$ asv publish
```
This creates a tree of files in the `html` directory. These files cannot be viewed directly from the local filesystem, so they need to be served as a static site. `asv publish` also detects performance regressions (relative changes above a threshold, 5% by default); these can be inspected in the 'Regressions' tab of the static site.

To visualise the results in a static site:
```
$ asv preview
```
188 changes: 188 additions & 0 deletions benchmarks/asv.conf.json
@@ -0,0 +1,188 @@
{
// The version of the config file format. Do not change, unless
// you know what you are doing.
"version": 1,

// The name of the project being benchmarked
"project": "cellfinder-core",

// The project's homepage
"project_url": "https://brainglobe.info/documentation/cellfinder/index.html",

// The URL or local path of the source code repository for the
// project being benchmarked
// To use the upstream repository: uncomment the 1st line (and comment the 2nd)
// To use the local repository: comment the 1st line (and uncomment the 2nd)
//"repo": "https://github.com/brainglobe/cellfinder-core.git",
"repo": "..",

// The Python project's subdirectory in your repo. If missing or
// the empty string, the project is assumed to be located at the root
// of the repository (where setup.py is located)
// "repo_subdir": "",

// Customizable commands for building, installing, and
// uninstalling the project. See asv.conf.json documentation.
//
"install_command": ["in-dir={env_dir} python -mpip install {wheel_file}"],
"uninstall_command": ["return-code=any python -mpip uninstall -y {project}"],
"build_command": [
"python -m pip install build",
"python -m build",
"PIP_NO_BUILD_ISOLATION=false python -mpip wheel --no-deps --no-index -w {build_cache_dir} {build_dir}"
],

// List of branches to benchmark. If not provided, defaults to "master"
// (for git) or "default" (for mercurial).
"branches": ["main"], // for git
// "branches": ["default"], // for mercurial

// The DVCS being used. If not set, it will be automatically
// determined from "repo" by looking at the protocol in the URL
// (if remote), or by looking for special directories, such as
// ".git" (if local).
// "dvcs": "git",

// The tool to use to create environments. May be "conda",
// "virtualenv" or other value depending on the plugins in use.
// If missing or the empty string, the tool will be automatically
// determined by looking for tools on the PATH environment
// variable.
"environment_type": "conda",

// timeout in seconds for installing any dependencies in environment
// defaults to 10 min
//"install_timeout": 600,

// the base URL to show a commit for the project.
"show_commit_url": "http://github.com/brainglobe/cellfinder-core/commit/",

// The Pythons you'd like to test against. If not provided, defaults
// to the current version of Python used to run `asv`.
"pythons": ["3.10"], // same as pyproject.toml? ["3.8", "3.9", "3.10"]

// The list of conda channel names to be searched for benchmark
// dependency packages in the specified order
"conda_channels": ["conda-forge", "defaults"],

// A conda environment file that is used for environment creation.
// "conda_environment_file": "environment.yml",

// The matrix of dependencies to test. Each key of the "req"
// requirements dictionary is the name of a package (in PyPI) and
// the values are version numbers. An empty list or empty string
// indicates to just test against the default (latest)
// version. null indicates that the package is to not be
// installed. If the package to be tested is only available from
// PyPi, and the 'environment_type' is conda, then you can preface
// the package name by 'pip+', and the package will be installed
// via pip (with all the conda available packages installed first,
// followed by the pip installed packages).
//
// The ``@env`` and ``@env_nobuild`` keys contain the matrix of
// environment variables to pass to build and benchmark commands.
// An environment will be created for every combination of the
// cartesian product of the "@env" variables in this matrix.
// Variables in "@env_nobuild" will be passed to every environment
// during the benchmark phase, but will not trigger creation of
// new environments. A value of ``null`` means that the variable
// will not be set for the current combination.
//
"matrix": {
"req": {},
// "napari": ["", null], // test with and without
// // "six": ["", null], // test with and without six installed
// // "pip+emcee": [""] // emcee is only available for install with pip.
// },
// "env": {"ENV_VAR_1": ["val1", "val2"]},
// "env_nobuild": {"ENV_VAR_2": ["val3", null]},
},

// Combinations of libraries/python versions can be excluded/included
// from the set to test. Each entry is a dictionary containing additional
// key-value pairs to include/exclude.
//
// An exclude entry excludes entries where all values match. The
// values are regexps that should match the whole string.
//
// An include entry adds an environment. Only the packages listed
// are installed. The 'python' key is required. The exclude rules
// do not apply to includes.
//
// In addition to package names, the following keys are available:
//
// - python
// Python version, as in the *pythons* variable above.
// - environment_type
// Environment type, as above.
// - sys_platform
// Platform, as in sys.platform. Possible values for the common
// cases: 'linux', 'win32', 'cygwin', 'darwin'.
// - req
// Required packages
// - env
// Environment variables
// - env_nobuild
// Non-build environment variables
//
// "exclude": [
// {"python": "3.2", "sys_platform": "win32"}, // skip py3.2 on windows
// {"environment_type": "conda", "req": {"six": null}}, // don't run without six on conda
// {"env": {"ENV_VAR_1": "val2"}}, // skip val2 for ENV_VAR_1
// ],
//
// "include": [
// // additional env for python2.7
// {"python": "2.7", "req": {"numpy": "1.8"}, "env_nobuild": {"FOO": "123"}},
// // additional env if run on windows+conda
// {"platform": "win32", "environment_type": "conda", "python": "2.7", "req": {"libpython": ""}},
// ],

// The directory (relative to the current directory) that benchmarks are
// stored in. If not provided, defaults to "benchmarks"
"benchmark_dir": "benchmarks",

// The directory (relative to the current directory) to cache the Python
// environments in. If not provided, defaults to "env"
"env_dir": "env",

// The directory (relative to the current directory) that raw benchmark
// results are stored in. If not provided, defaults to "results".
"results_dir": "results",

// The directory (relative to the current directory) that the html tree
// should be written to. If not provided, defaults to "html".
"html_dir": "html",

// The number of characters to retain in the commit hashes.
// "hash_length": 8,

// `asv` will cache results of the recent builds in each
// environment, making them faster to install next time. This is
// the number of builds to keep, per environment.
"build_cache_size": 2,

// The commits after which the regression search in `asv publish`
// should start looking for regressions. Dictionary whose keys are
// regexps matching to benchmark names, and values corresponding to
// the commit (exclusive) after which to start looking for
// regressions. The default is to start from the first commit
// with results. If the commit is `null`, regression detection is
// skipped for the matching benchmark.
//
// "regressions_first_commits": {
// "some_benchmark": "352cdf", // Consider regressions only after this commit
// "another_benchmark": null, // Skip regression detection altogether
// },

// The thresholds for relative change in results, after which `asv
// publish` starts reporting regressions. Dictionary of the same
// form as in ``regressions_first_commits``, with values
// indicating the thresholds. If multiple entries match, the
// maximum is taken. If no entry matches, the default is 5%.
//
// "regressions_thresholds": {
// "some_benchmark": 0.01, // Threshold of 1%
// "another_benchmark": 0.5, // Threshold of 50%
// },
}
Empty file.
43 changes: 43 additions & 0 deletions benchmarks/benchmarks/imports.py
@@ -0,0 +1,43 @@
# ------------------------------------
# Runtime benchmarks
# ------------------------------------
def timeraw_import_main():
    return """
    from cellfinder_core.main import main
    """


def timeraw_import_io_dask():
    return """
    from cellfinder_core.tools.IO import read_with_dask
    """


def timeraw_import_io_tiff_meta():
    return """
    from cellfinder_core.tools.IO import get_tiff_meta
    """


def timeraw_import_prep_tensorflow():
    return """
    from cellfinder_core.tools.prep import prep_tensorflow
    """


def timeraw_import_prep_models():
    return """
    from cellfinder_core.tools.prep import prep_models
    """


def timeraw_import_prep_classification():
    return """
    from cellfinder_core.tools.prep import prep_classification
    """


def timeraw_import_prep_training():
    return """
    from cellfinder_core.tools.prep import prep_training
    """
68 changes: 68 additions & 0 deletions benchmarks/benchmarks/tools/IO.py
@@ -0,0 +1,68 @@
import os
from pathlib import Path

from cellfinder_core.tools.IO import get_tiff_meta, read_with_dask

p = Path(os.path.dirname(__file__)).absolute()
CELLFINDER_CORE_PATH = p.parents[2]
TESTS_DATA_INTEGRATION_PATH = (
    Path(CELLFINDER_CORE_PATH) / "tests" / "data" / "integration"
)
# Q for review: is there a nice way to get cellfinder-core path?


class Read:
    # ---------------------------------------------
    # Setup & teardown functions
    # --------------------------------------------
    def setup(self, subdir):
        self.data_dir = str(subdir)

    def teardown(self, subdir):
        del self.data_dir
        # Q for review: do I need this?
        # maybe only relevant if it is the parameter we sweep across?
        # from https://github.com/astropy/astropy-benchmarks/blob/
        # 8758dabf84001903ea00c31a001809708969a3e4/benchmarks/cosmology.py#L24
        # (they only use teardown function in that case)

    # ---------------------------------------------
    # Benchmarks for reading 3d arrays with dask
    # --------------------------------------------
    def time_read_with_dask(self, subdir):
        read_with_dask(self.data_dir)

    time_read_with_dask.param_names = [
        "tests_data_integration_subdir",
    ]
    time_read_with_dask.params = (
        [
            TESTS_DATA_INTEGRATION_PATH
            / Path("detection", "crop_planes", "ch0"),
            TESTS_DATA_INTEGRATION_PATH
            / Path("detection", "crop_planes", "ch1"),
        ],
    )

    # -----------------------------------------------
    # Benchmarks for reading metadata from tif files
    # -------------------------------------------------
    def time_get_tiff_meta(
        self,
        subdir,
    ):
        get_tiff_meta(self.data_dir)

    time_get_tiff_meta.param_names = [
        "tests_data_integration_tiffile",
    ]

    cells_tif_files = list(
        Path(TESTS_DATA_INTEGRATION_PATH, "training", "cells").glob("*.tif")
    )
    non_cells_tif_files = list(
        Path(TESTS_DATA_INTEGRATION_PATH, "training", "non_cells").glob(
            "*.tif"
        )
    )
    time_get_tiff_meta.params = cells_tif_files + non_cells_tif_files
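When a benchmark method has `params`, asv calls `setup`, the benchmark and `teardown` with each parameter value (and, for several parameter lists, with every combination of their Cartesian product), which is why `setup`, `teardown` and both `time_*` methods in `Read` take a `subdir` argument. The loop below is a simplified, hypothetical stand-in for asv's runner (a single call per combination, no warm-up, repeats or statistics), useful for checking a parametrised class locally before a full `asv run`. `run_parametrised` and `ToyBench` are illustrative names, and the rule for telling one parameter from several is an assumption that mirrors the two forms used above.

```python
import itertools
import time
from typing import Any, Dict, Tuple


def run_parametrised(
    bench_cls: type, method_name: str
) -> Dict[Tuple[Any, ...], float]:
    """Time `bench_cls.<method_name>` once per parameter combination."""
    params = getattr(getattr(bench_cls, method_name), "params", [])
    if not params:
        grids = []  # no parameters: product() yields one empty combination
    elif all(isinstance(p, (list, tuple)) for p in params):
        grids = [list(p) for p in params]  # several parameter lists
    else:
        grids = [list(params)]  # a single parameter, values listed directly
    results: Dict[Tuple[Any, ...], float] = {}
    for combo in itertools.product(*grids):
        bench = bench_cls()
        if hasattr(bench, "setup"):
            bench.setup(*combo)
        start = time.perf_counter()
        getattr(bench, method_name)(*combo)
        results[combo] = time.perf_counter() - start
        if hasattr(bench, "teardown"):
            bench.teardown(*combo)
    return results


class ToyBench:
    def setup(self, n):
        self.data = list(range(n))

    def teardown(self, n):
        del self.data

    def time_sum(self, n):
        sum(self.data)

    time_sum.params = ([10, 1000],)
    time_sum.param_names = ["n"]


timings = run_parametrised(ToyBench, "time_sum")
print(sorted(timings))  # [(10,), (1000,)]
```

With `cellfinder_core` and the test data available, `run_parametrised(Read, "time_read_with_dask")` would call `setup`, the benchmark and `teardown` once for each of the two `crop_planes` channels.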
Empty file.