
pull assemblies into benchmarks; add versioning; add public benchmarks #175

Merged: mschrimpf merged 23 commits into brain-score:master on Dec 9, 2019
Conversation

@mschrimpf (Member) commented on Nov 18, 2019

Other changes resulting from this PR:

  • add more unit tests for consistent scores with precomputed model features
  • remove all temporal predictivity benchmarks (since they were barely used and are being re-implemented by @anayebi following their paper)
  • examples use public assemblies and benchmarks
  • new method to list all public assemblies (see the sketch below)

rationale: we don't use this benchmark much anyway and I'm not sure when we last tested this score
rationale: these benchmarks are WIP and until we agree on the right metric and ceiling, let's not put them in the Brain-Score standards
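
For orientation, a minimal usage sketch of the new listing method, assuming it lives in `brainscore/public_benchmarks.py` (the file under review below); the function name `list_public_assemblies` is an assumption based on the description above, not verbatim API:

```
# hedged sketch: the entry-point name is assumed, not verbatim from this PR
from brainscore.public_benchmarks import list_public_assemblies

# the new method to list all public assemblies
for identifier in list_public_assemblies():
    print(identifier)
```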
@mschrimpf (Member, Author) commented:

all 87 tests pass locally (2 are ignored)

@mschrimpf marked this pull request as ready for review on November 20, 2019 21:40
Review threads (all resolved):
  • brainscore/public_benchmarks.py (5 threads, 1 outdated)
  • brainscore/benchmarks/freemanziemba2013.py
  • brainscore/benchmarks/majaj2015.py
  • brainscore/benchmarks/README.md
  • brainscore/benchmarks/__init__.py
  • brainscore/metrics/regression.py
@mschrimpf merged commit d03e50f into brain-score:master on Dec 9, 2019
mschrimpf added a commit to brain-score/model-tools that referenced this pull request Dec 10, 2019
mschrimpf added a commit that referenced this pull request Jul 30, 2023
* initial commit

* add unit tests

* add README, LICENSE, .travis

* move activations-related functions to this repo

* use conda to install frameworks; remove python 3.7 due to pytorch incompatibility

* source activate instead of conda

* ignore tf-slim for testing

* remove framework inference; fix keras and tensorflow pipeline

* test grayscale and alpha images

* add from_stimulus_set

* use immutable tuples for normalization reference

* fix stimuli_identifier default for storing activations

* test for explicit activations

* do not store StimulusSet activations by default

* enable logits retrieval

* require PCA to be hooked manually; add method to insert all attributes

* store StimulusSets by default

* add multilayer_mapping

* use regression from brain-score

* run tests on cpu

* disable tf caching in order to cut down on memory usage

travis tests fail due to OOM

* attempt to obtain more memory by requiring sudo

* treat logits as layer; create model directly from test provider

* CenterCrop instead of Resize by default in pytorch

* add option to disable multithreading

* skip memory intense (>7.5 GB) tests in travis

* download imagenet before travis script run

* use MT_ environment variables for imagenet path

* infer model class identifier rather than module

* add brain commitment utility (LayerModel, ModelCommitment); remove regression

* rename multilayer_mapping -> brain_transformation

* allow multi-layer to region map; remove redundant data

* re-use LayerModel in ModelCommitment

* remove @staticmethod to allow sub-classing

* remove unused variables

* separate LayerScores from LayerSelection

* add pixel-degree translation

* store converted stimuli in consistent path; hook onto activations extractor

also rename register_batch_hook -> register_batch_activations_hook

* revert erroneously committed device assignment to cpu

* keep awscli at 1.11.18 due to PyYAML dependency error

* install libpython-dev to deal with awscli dependency error

* --yes install libpython

* property-forward identifier

* attach PCA for layer selection; lazy layer commitment

* add channel metadata

* check for is_hooked before hooking

* ignore six in awscli installation to avoid PyYAML error

* fix filepath

* fix merging of convolutional and fully-connected activations

* remove ceiler stratification

* add layer packaging status updates

* separate layer-mapping and pixel-degrees

* remove out-dated wrapper logits assignment

* also separate unit tests for neural and stimuli

* add behavioral mapping to ImageNet synsets

* fix expected layer

* add timeout multi-layer test; combine layer assemblies manually

resolve #4

* update to public assemblies

* add TemporalIgnore mapping

* add ProbabilitiesMapping using logistic classifier

from mschrimpf/brain-score@244f9c3

* tie LogitsBehavior to imagenet specifically since no fitting is done

* use packaged behavioral data

* use `approx` to avoid floating-point arithmetic mismatches

* separate pytest flags; add private-access flag; add AWS access key

* set AWS environment keys as global

* pass time_bins for `brain_model.start_recording`

* list installed package versions for diagnostics

if the code doesn't work for the user, they can check travis for which versions did work
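
A minimal sketch of that diagnostic in plain Python (the CI itself presumably calls pip; this only illustrates the idea):

```
# print installed package versions, e.g. to compare against a known-good CI run
import importlib.metadata as metadata

for dist in sorted(metadata.distributions(), key=lambda d: d.metadata['Name'].lower()):
    print(dist.metadata['Name'], dist.version)
```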

* update neural benchmarks import

* use pytest.mark instead of pytest.config

https://docs.pytest.org/en/latest/deprecations.html#pytest-config-global
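
Roughly, the migration follows this pytest idiom (a sketch, not the exact diff; the `private_access` marker matches the flag added above):

```
# before (deprecated): querying the global pytest.config inside tests
#   if not pytest.config.getoption("--private-access"):
#       pytest.skip("no private access")

# after: declare a marker on the test and select/deselect it via `-m` on the command line
import pytest

@pytest.mark.private_access  # marker registered in conftest.py / pytest.ini
def test_private_assembly():
    ...
```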

* when possible, ignore local part of stimuli paths to align across machines

* expand LayerMappedModel to multiple layers for single region (#10)

* resize to target image size instead of center-crop (#11)

* separate _build_extractor method to allow CORnet's temporal interjection (#12)

* add tests that the package can be properly imported (#20)

these tests can always run and do not require e.g. special memory

* Add manifest file (#21)

* Fix little bugs

* Add manifest file to also install imagenet_classes.txt with pip

* Revert old changes

* use new public benchmarks from Brain-Score instead of self-built ones (#19)

depends on #175

* allow custom benchmarks for mapping (#22)

* add travis slack notifications (#23)

* allow changing normalization params for torch preprocessing; allow multiple probabilities readout layers (#28)

* allow changing normalize_mean/std for torchvision preprocessing; add ProbabilitiesMapping docs

resolves https://github.com/brain-score/model-tools/pull/27/files

* allow passing list of behavioral readout layers

* fix kwargs name

* Unhook methods and test fix (#26)

* Fix hook problem

* Fix failing test

* Move submission check module to model-tools project

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Update tensorflow to V2 (#24)

Change tensorflow packages

* move stimuli-degree-resizing to brain-score; add BrainModel.visual_degrees; pytorch resize instead of center-crop (#9)

* move stimuli-degree-resizing to brain-score; add BrainModel.visual_degrees

* resize to target image size instead of center-crop

center-cropping would e.g. take only 224x224 pixels from a 1800x1800 px image

* fix benchmark import

* update resize parameter passing to tuple

otherwise, it would be resized to width only and maintain aspect ratio
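
The two preprocessing choices, sketched with standard torchvision transforms (the surrounding pipeline is assumed):

```
from torchvision import transforms

image_size = 224
# center-crop keeps only the central 224x224 pixels, e.g. out of a 1800x1800 px image
center_crop = transforms.CenterCrop(image_size)
# passing a tuple resizes both dimensions; a bare int would only fix the smaller
# edge and keep the aspect ratio
resize = transforms.Resize((image_size, image_size))
```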

* update public_benchmarks import

* default to 8 visual degrees instead of 10

* update layer selection with visual_degrees

* update test to place stimuli on screen

* add test for default visual degrees commitment

* Update setup.py with missing dependencies (#29)

* Add missing dependencies

* Change the submission check modules (#30)

* Change structure for submission checks.

* Improve model checking

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* update for brain-score/brainio_collection#32 (#32)

* reduce stimuli paths to unique set to avoid duplicate compute overhead (#33)

* reduce stimuli paths to unique set to avoid duplicate compute overhead

* output assembly for ImageNet task instead of synset list

* Bugfixing (#35)

* Fix hook problem

* Fix failing test

* Remove old test class

* Tiny change for reloading.

* Revert unhook functionality

* Revert change

* Move submission check module to model-tools project

* Add missing dependencies

* Add missing dependencies

* Add missing dependencies

* Change dependencies

* Some test fixes

* Some test fixes

* Change structure for submission checks.

* Improve model checking

* add database tests

* change tensorflow version

* Update check model, it was wrong

* Revert something

* Revert

* Change tf version

* stimulus set identifier is now name

* revert

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* fix coordinates on logits behavior (#36)

* accept number_of_trials in look_at (and ignore) (#38)

* accept number_of_trials in look_at (and ignore)

* add default number_of_trials=1

* add number_of_trials to PreRunLayers.look_at (#39)

* accept number_of_trials in look_at (and ignore)

* add default number_of_trials=1

* add number_of_trials to PreRunLayers.look_at

* Add fix for palettized images (#34)

* Add fix for palettized images

* Add tests for palettized image

* Fix typo (.__.)

* Add fix for keras version

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* do not include benchmark in storage identifier (#31)

the selection_identifier is the correct identifier, benchmark is already the benchmark implementation

* add check_submission/images; remove repeat_trials test (#43)

* add check_submission/images to MANIFEST

* remove repeat_trials

* Visual Transformer compatibility (#44)

* Adjusted to accommodate Transformer with 1D embedding

* added transformer model test

* added transformer to model_layers

* added transformer meta test

* changed transformer tests to contain dummy model

* added 1k output layer (s. logit) to transformer dummy

* revert FC coord_names to having channel, channel_x, channel_y where the latter are filled with nan values

* corrected layer naming

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* added comment explaining KeyError when changing flatten_coord_names (#45)

* remove redundant pandas dependency

already covered through brainio_base dependencies

* upgrade to python 3.7 (#48)

discontinue python 3.6, together with brain-score/brainio_base#16

* pass region-layer mapping in ModelCommitment constructor (#51)

* pass region-layer mapping in ModelCommitment constructor

this simplifies the commitment of layers and prepares for the implementation of stochastic models

* fix unit tests and submission check

* fix unit tests

* re-compute activations.test___init__.test_exact_activations[alexnet-rgb-{None,1000}] and save to netcdf
* migrate brain_transformation.test_behavior[alexnet,resnet34,resnet18].pkl to netcdf

migrated using
```
import pickle
import xarray as xr
from pathlib import Path

from brainio_base.assemblies import walk_coords

for path in [f'brain_transformation/identifier={model},stimuli_identifier=objectome-240.pkl'
             for model in ('alexnet', 'resnet34', 'resnet18')]:
    path = Path(path)
    with open(path, 'rb') as f:  # close the file handle after loading
        d = pickle.load(f)
    a = d['activations'] if 'activations' in d else d['data']
    path_nc = path.parent / (path.stem + '.nc')
    a = xr.DataArray(a)
    # reset MultiIndex dimensions so the assembly can be serialized to netcdf
    a = a.reset_index([dim for dim in a.dims if len(list(walk_coords(a[dim]))) > 1])
    a.to_netcdf(path_nc)
    print(f"saved {path_nc}")
```

* fix precomputed activations

migrate from pkl files directly instead of recomputing like before

* fix forwarding the `number_of_trials` parameter (#52)

* fix time_bin naming (level_0/1 -> start/end) (#53)

* fix time_bin naming (level_0/1 -> start/end)

* note xarray bug leading to merge mis-naming
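
A toy illustration of the rename (the real fix lives in the assembly-merging code; the coordinate values are made up):

```
import numpy as np
import xarray as xr

# after a merge, xarray can emit generic level_0/level_1 names
# for what used to be the time_bin start/end coordinates
data = xr.DataArray(np.zeros(2), dims=['time_bin'],
                    coords={'level_0': ('time_bin', [70, 170]),
                            'level_1': ('time_bin', [170, 270])})
data = data.rename({'level_0': 'time_bin_start', 'level_1': 'time_bin_end'})
print(list(data.coords))  # ['time_bin_start', 'time_bin_end']
```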

* link result_caching directly to brain-score org

* provide BrainModel identifier from neural and behavioral components; do away with trials (#40)

* provide identifier as BrainModel

8e13735

* delete repeat_trials unit test

following #235

* Fixed model-tools dependency error (#54)

* Added AlexNet from examples into base_model.py to practice submission

* Fixed model-template dependency error (now points to Brain-Score repo instead of Martin's)

* Removed Dead code, rolled back tensorflow version to ==1.15

* Use BrainIO (#55)

* Update to use brainio_core package.

* Name change.

* Remove brainio-core.

* Force Jenkins re-run.

* Trigger re-run on Jenkins.

* remove check for ImageNet task (#60)

* remove check for ImageNet task

this would otherwise throw errors when the last layer has != 1000 neuroids (treated as logits). This has no effect on brain benchmarks and, imo, should thus not be a relevant check

* convert path to str, more instructive logs

* fix TestI2N.test_model sub-classing

* Wordnet decoder for Geirhos2021 benchmarks (#61)

counterpart to #323

* add wordnet_functions from https://github.com/bethgelab/model-vs-human/blob/745046c4d82ff884af618756bd6a5f47b6f36c45/modelvshuman/helper/wordnet_functions.py

* clean up code

but realized that the `is_hypernym` function is undefined

* implement logits to label for Geirhos et al. 2021

* use named axes for softmax

* add unit test for choice labels

* seed model

* seed custom model in init

* handle dimensions more flexibly

* use `.get_stimulus(stimulus_id)` (#62)

* use `.get_stimulus(stimulus_id)`

instead of `.get_image(image_id)`

* downgrade protobuf for keras version

attempting to fix keras import errors in http://braintree.mit.edu:8080/job/unittest_model_tools/132/

* rename image_id -> stimulus_id

* rename image_paths -> stimulus_paths

* add legacy support for LogitsBehavior class (#63)

* add legacy support for LogitsBehavior class

* fix `mock_stimulus_set` call

* make flatten coordinates more generic; add forward_kwargs option; image tensors (#64)

* fix tutorial link (#347)

* add documentation for testing on precomputed features (#355)

* add documentation for testing on precomputed features

* re-add erroneously deleted content

* fix retrieval of model predictions

* make files public using predefined ACL

* Add Imagenet index mappings for Zhu 2019 and Baker 2022 (#69)

* Updated baker 2022

* Accuracy metric works, Engineering benchmark added

* Finalized Baker2022 Benchmark

* Added Zhu/Baker imagenet indices that do not exist.

* Update model_tools/brain_transformation/behavior.py

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Update model_tools/brain_transformation/behavior.py

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Update model_tools/brain_transformation/behavior.py

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Update model_tools/brain_transformation/behavior.py

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

---------

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* include engineering benchmarks in standard tests again (#361)

* include engineering benchmarks in standard tests again

also fix typo

* fix pool keys access

* use new jenkins_id instead of id for submissions (#359)

* added support for jenkins_id write

* removed redundant comment

* added DB models specification changes

* Make sure directories created use jenkins_id and not id (#364)

* Hotfix: add jenkins_id to test_submission.py's tests (#366)

* Hotfix: add jenkins_id to test_submission.py's tests

* Fixed some tests

* Fixed more tests, another submission issue

* Islam2021 (#360)

* add Islam2021 packaging file

* add Dimensionality metric

* add Islam2021Dimensionality benchmark

* add islam2021 benchmarks to benchmark pool

* clean packaging file

* add Islam2021 benchmark tests

* add islam2021 stimuli test

* Fix typo in test_islam2021.py

* Add lookup.csv entry for neil.Islam2021

* Correct benchmark name in  brainscore/benchmarks/__init__.py

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Change recorded bins to standard ones

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Correct identifier of benchmark in islam2021.py

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Fix parent benchmark name islam2021.py

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Fix stimulus name in lookup.csv

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Fix islam2021 test names in tests/test_benchmarks/test___init__.py

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* fix islam2021 stimuli test name in tests/test_stimuli.py

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Fix islam2021 stimuli name in tests/test_stimuli.py

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Fix benchmark_pool keys in test_islam2021.py

* Fix stimulus name

* Add private_access to islam tests

* Fix Islam2021 stimuli name in test_stimuli

* add private_access to test_islam2021 in test_stimuli

---------

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Removed faulty model tools import line, removed other unused lines as well (#367)

* Add domain to new benchmark creation code for vision. (#368)

* Hotfix: Moved Islam to experimental pool, added domain to models.py (#369)

* Moved Islam to experimental pool, added domain to models.py

* Removed Islam2021 from engineering test pool

* let travis run private and public tests separately (#371)

previously, when private tests were run, travis ran private _and_ public tests, and then in a separate run only the public tests. This changes it so that _only_ private and _only_ public tests are run separately.

* remove outdated lookup_source (#372)

per @jjpr's advice

* add instructions for adding users to AWS (#373)

* document Python = 3.7 (instead of >= 3.7) for TF compatibility (#375)

* more extensively describe tasks and intended outputs (#374)

following the detailed description in [language](https://github.com/brain-score/language/blob/main/brainscore_language/artificial_subject.py)

* reorganize contents into subdirectories

* reorganize model_helpers for integration

* update imports to brainscore_vision and model_helpers

* fix loading of public benchmarks

---------

Co-authored-by: franzigeiger <32977549+franzigeiger@users.noreply.github.com>
Co-authored-by: Sachi Sanghavi <4976443+stothe2@users.noreply.github.com>
Co-authored-by: pmcgrath249 <pmcgrath249@gmail.com>
Co-authored-by: Michael Ferguson <38020092+mike-ferguson@users.noreply.github.com>
Co-authored-by: jjpr-mit <jjpr@mit.edu>
Co-authored-by: Michael Ferguson <mferg@mit.edu>
Co-authored-by: Tiago Gaspar Oliveira <tiagojgroliveira@tecnico.ulisboa.pt>
Co-authored-by: SusanWYS <127456911+SusanWYS@users.noreply.github.com>
mschrimpf added a commit that referenced this pull request Aug 4, 2023
(commit log nearly identical to the Jul 30, 2023 commit above, same co-authors; it differs only in the final commit:)

* add missing access parameter for MajajHong2015
mschrimpf added a commit that referenced this pull request Jan 4, 2024
Releases Brain-Score 2.0 which uses a plugin system to manage data, metrics, benchmarks, and models.

- rename package to `brainscore_vision`
- use `pyproject.toml` instead of `setup.py` (#383)
- refactor data packaging and benchmarks to plugin format (#353, #397); see the registry sketch below
- refactor metrics to plugin format; keep only overall result in `Score` object and `error` in attributes (#391)
- integrate model_tools as model_helpers (#381)
- add alexnet and pixel models (#408, #412) 
- automatically merge and score plugins after tests pass (#394, #414, #428) 
- validate tests and merge `master` (#393, #403, #424, #429)
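
In the plugin format, each plugin registers itself in a registry at import time and is loaded by identifier; a hedged sketch of the pattern (plugin, class, and identifier names here are made up):

```
# brainscore_vision/benchmarks/mybenchmark/__init__.py -- sketch, names made up
from brainscore_vision import benchmark_registry
from .benchmark import MyBenchmark

benchmark_registry['MyBenchmark'] = MyBenchmark  # identifier -> no-arg constructor

# elsewhere, plugins are loaded by identifier, e.g.:
#   import brainscore_vision
#   benchmark = brainscore_vision.load_benchmark('MyBenchmark')
```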

-----------------------------

detailed commits:

* rename package brainscore -> brainscore_vision

* setup plugin registries (#349)

* add brain-score core dependency

* remove top-level `get_stimulus_set` import

* move temporary wontfix tests into todo package

* add registry import; remove lab prefix

* move `test_submission` to `todotests`

* add pytest configuration to ignore `todotests`

* move majajhong2015 ceiling test inside plugin

* move tests for assemblies, stimuli, and examples into todo

* fix assembly/stimulus_set loading

use brainio.get_{assembly,stimulus_set} instead of brainscore_vision

* use brainscore_vision entrypoint

* unify import

(nit)

Co-authored-by: kvfairchild <kvg0@mit.edu>

Co-authored-by: kvfairchild <kvg0@mit.edu>

* Sw/Restructuring vision (#353)

* Added benchmark and data folders (with corresponding init, benchmark/data_packaging, test) for sanghavi benchmarks, began to add others (geirhos2021, hermann2020, kar2019). Created test_helper to reduce code duplication within tests.

* Added data_packaging.py to sanghavi, sanghavijozwik, and sanghavimurty. Moved environment.yml (removed the name and prefix) and requirements.txt (need to see what brainio url should be changed to) as well.

* removed old sanghavi benchmarks

* removed lazy load from sanghavi benchmarks, added marques2020_cavanaugh benchmarks, added test_benchmark_registry to sanghavi benchmarks

* added marques2020_devalois1982a benchmark and data packaging

* added marques2020 benchmarks and data packaging, removed sanghavi and devalois1982 a and b from test___init__.py, removed finished (kind of) benchmark scripts, removed benchmarks from benchmark __init__.py

* Reformatted current data directories to combine inits, separate packagings, and combine tests

* completed kar2019 benchmark, updated all tests of past benchmarks

* reformatted rajalingham2018 and rajalingham2020 benchmarks, created their data folders and contents

* completed geirhos, cadena benchmarks and data packaging

* reformatted sanghavi and marques benchmarks

* created benchmarks for imagenet, imagenet_c, objectnet,

created data packaging for those as well as bashivankar2019 and kuzovkin2018,

updated test_helper with parameter types

* created benchmark helpers, updated tests of several benchmarks, moved several data packaging files to their respective directories

* Created benchmark and data packaging folders for each benchmark, moved corresponding benchmarks and data packaging to each

* Created benchmark and data packaging folders for each benchmark, moved corresponding benchmarks and data packaging to each

* updated imports of benchmark helpers, renamed benchmarks, updated imports of benchmark inits

* created david2004 data packaging, barbumay2019, majaj2015, deng2009, imagenetslim15000, seibert2019, rust2012,

* Added an s3 util file that allows for assemblies and stimuli sets to be loaded into data registry, began reformatting inits with new functionality, began filling in parameters from lookup csv

* reformatted inits of deng, imagenet, kars, kuzovkin, marques, rajalinghams, rust, sanghavi, seibert

* updated inits to with load_assembly_from_s3() and load_stimulus_set_from_s3() functions, filled these out with corresponding sha1's, updated tests, moved files from packaging and test directories into correct new directories, continued reformatting

* updated buckets of all assemblies and stimulus sets, removed for loop of geirhos to allow for string parsing, cleared out test directory, cleared out data packaging directory

* created data helper, added all version ids to all assembly/stimulus set inits, changed to stimulus_set_registry in inits,

* Created BIBTEXs for all missing in data packaging, went through and fixed import errors/ other errors

* Deleted packaging notebooks, deleted other notebooks

* Remove benchmark pools from brainscore_vision/__init__.py
* take benchmark pools out of evaluation.py (although some left for Martin to decide)
* changed path name in helper.py

* remove .idea

* move data helpers out of data directory

* unify benchmark and metric definition; delete mask benchmarks

* include stimulus set in assembly

* move s3 into data_helpers/

* name lookup helpers as legacy

* updated all version IDs manually

* removed public_benchmark_helper.py
* removed unnecessary step in geirhos benchmark __init__.py

* added stimulus set loaders to all data registries

* reverted buckets in data packaging notebooks/.pys back to old version

* make merging stimulus set meta optional

* import stimulus_set plugin

* type hint stimulus_set_registry

* specify `stimulus_set` registry prefix

following brain-score/core#25

---------

Co-authored-by: Martin Schrimpf <m4rtinsch@gmail.com>

* use pyproject.toml instead of setup.py (#383)

* use pyproject instead of setup
* explicitly set setuptools py-modules
* add networkx dependency again
* add scikit-learn dependency again
* update screen gray tests: lossless png and more flexible amount of gray

* integrate model helpers (formerly model_tools) (#381)

* initial commit

* add unit tests

* add README, LICENSE, .travis

* move activations-related functions to this repo

* use conda to install frameworks; remove python 3.7 due to pytorch incompatibility

* source activate instead of conda

* ignore tf-slim for testing

* remove framework inference; fix keras and tensorflow pipeline

* test grayscale and alpha images

* add from_stimulus_set

* use immutable tuples for normalization reference

* fix stimuli_identifier default for storing activations

* test for explicit activations

* do not store StimulusSet activations by default

* enable logits retrieval

* require PCA to be hooked manually; add method to insert all attributes

* store StimulusSets by default

* add multilayer_mapping

* use regression from brain-score

* run tests on cpu

* disable tf caching in order to cut down on memory usage

travis tests fail due to OOM

* attempt to obtain more memory by requiring sudo

* treat logits as layer; create model directly from test provider

* CenterCrop instead of Resize by default in pytorch

* add option to disable multithreading

* skip memory intense (>7.5 GB) tests in travis

* download imagenet before travis script run

* use MT_ environment variables for imagenet path

* infer model class identifier rather than module

* add brain commitment utility (LayerModel, ModelCommitment); remove regression

* rename multilayer_mapping -> brain_transformation

* allow multi-layer to region map; remove redundant data

* re-use LayerModel in ModelCommitment

* remove @staticmethod to allow sub-classing

* remove unused variables

* separate LayerScores from LayerSelection

* add pixel-degree translation

* store converted stimuli in consistent path; hook onto activations extractor

also rename register_batch_hook -> register_batch_activations_hook

* revert erroneously committed device assignment to cpu

* keep awscli at 1.11.18 due to PyYAML dependency error

* install libpython-dev to deal with awscli dependency error

* --yes install libpython

* property-forward identifier

* attach PCA for layer selection; lazy layer commitment

* add channel metadata

* check for is_hooked before hooking

* ignore six in awscli installation to avoid PyYAML error

* fix filepath

* fix merging of convolutional and fully-connected activations

* remove ceiler stratification

* add layer packaging status updates

* separate layer-mapping and pixel-degrees

* remove out-dated wrapper logits assignment

* also separate unit tests for neural and stimuli

* add behavioral mapping to ImageNet synsets

* fix expected layer

* add timeout multi-layer test; combine layer assemblies manually

resolve #4

* update to public assemblies

* add TemporalIgnore mapping

* add ProbabilitiesMapping using logistic classifier

from mschrimpf/brain-score@244f9c3

* tie LogitsBehavior to imagenet specifically since no fitting is done

* use packaged behavioral data

* use `approx` to avoid floating-point arithmetic mismatches

* separate pytest flags; add private-access flag; add AWS access key

* set AWS environment keys as global

* pass time_bins for `brain_model.start_recording`

* list installed package versions for diagnostics

if the code doesn't work for the user, s/he can check travis for which versions did work

* update neural benchmarks import

* use pytest.mark instead of pytest.config

https://docs.pytest.org/en/latest/deprecations.html#pytest-config-global

* when possible, ignore local part of stimuli paths to align across machines

* expand LayerMappedModel to multiple layers for single region (#10)

* resize to target image size instead of center-crop (#11)

* separate _build_extractor method to allow CORnet's temporal interjection (#12)

* add tests that the package can be properly imported (#20)

these tests can always run and do not require e.g. special memory

* Add manifest file (#21)

* FIx little bugs

* Add manifest file to also install imagenet_classes.txt with pip

* Revert old changes

* use new public benchmarks from Brain-Score instead of self-built ones (#19)

depends on #175

* allow custom benchmarks for mapping (#22)

* add travis slack notifications (#23)

* allow changing normalization params for torch preprocessing; allow multiple probabilities readout layers (#28)

* allow changing normalize_mean/std for torchvision preprocessing; add ProbabilitiesMapping docs

resolves https://github.com/brain-score/model-tools/pull/27/files

* allow passing list of behavioral readout layers

* fix kwargs name

* Unhook methods and test fix (#26)

* Fix hook problem

* Fix failing test

* Move submission check module to model-tools project

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Update tensorflow to V2 (#24)

Change tensorflow packages

* move stimuli-degree-resizing to brain-score; add BrainModel.visual_degrees; pytorch resize instead of center-crop (#9)

* move stimuli-degree-resizing to brain-score; add BrainModel.visual_degrees

* resize to target image size instead of center-crop

center-cropping would e.g. take only 224x224 pixels from a 1800x1800 px image

* fix benchmark import

* update resize parameter passing to tuple

otherwise, it would be resized to width only and maintain aspect ratio

* update public_benchmarks import

* default to 8 visual degrees instead of 10

* update layer selection with visual_degrees

* update test to place stimuli on screen

* add test for default visual degrees commitment

* Update setup.py with missing dependencies (#29)

* Add missing dependencies

* Change the submission check modules (#30)

* Change structure for submission checks.

* Improve model checking

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* update for brain-score/brainio_collection#32 (#32)

* reduce stimuli paths to unique set to avoid duplicate compute overhead (#33)

* reduce stimuli paths to unique set to avoid duplicate compute overhead

* output assembly for ImageNet task instead of synset list

* Bugfixing (#35)

* Fix hook problem

* Fix failing test

* Remove old test class

* Tiny change for reloading.

* Revert unhook functionality

* Revert change

* Move submission check module to model-tools project

* Add missing dependencies

* Add missing dependencies

* Add missing dependencies

* Change dependencies

* Some test fixes

* Some test fixes

* Change structure for submission checks.

* Improve model checking

* add database tests

* change tensorflow version

* Update check model, it was wrong

* Revert something

* Revert

* Change tf version

* stimulus set identifier is now name

* revert

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* fix coordinates on logits behavior (#36)

* accept number_of_trials in look_at (and ignore) (#38)

* accept number_of_trials in look_at (and ignore)

* add default number_of_trials=1

* add number_of_trials to PreRunLayers.look_at (#39)

* accept number_of_trials in look_at (and ignore)

* add default number_of_trials=1

* add number_of_trials to PreRunLayers.look_at

* Add fix for palletized images (#34)

* Add fix for palletized images

* Add tests for palletized image

* Fix typo (.__.)

* Add fix for keras version

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* do not include benchmark in storage identifier (#31)

the selection_identifier is the correct identifier, benchmark is already the benchmark implementation

* add check_submission/images; remove repeat_trials test (#43)

* add check_submission/images to MANIFEST

* remove repeat_trials

* Visual Transformer compatibility (#44)

* Adjusted to accomodate Transformer with 1D embedding

* added transformer model test

* added transformer to model_layers

* added transformer meta test

* changed transformer tests to contain dummy model

* added 1k output layer (s. logit) to transformer dummy

* revert FC coord_names to having channel, channel_x, channel_y where the latter are filled with nan values

* corrected layer naming

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* added comment explaining KeyError when changing flatten_coord_names (#45)

* remove redundant pandas dependency

already covered through brainio_base dependencies

* upgrade to python 3.7 (#48)

discontinue python 3.6, together with brain-score/brainio_base#16

* pass region-layer mapping in ModelCommitment constructor (#51)

* pass region-layer mapping in ModelCommitment constructor

this simplifies the commitment of layers and prepares for the implementation of stochastic models

* fix unit tests and submission check

* fix unit tests

* re-compute activations.test___init__.test_exact_activations[alexnet-rgb-{None,1000}] and save to netcdf
* migrate brain_transformation.test_behavior[alexnet,resnet34,resnet18].pkl to netcdf

migrated using
```
import pickle
import xarray as xr
from pathlib import Path

from brainio_base.assemblies import walk_coords

for path in [f'brain_transformation/identifier={model},stimuli_identifier=objectome-240.pkl'
             for model in ('alexnet', 'resnet34', 'resnet18')]:
    path = Path(path)
    f = open(path, 'rb')
    d = pickle.load(f)
    a = d['activations'] if 'activations' in d else d['data']
    path_nc = path.parent / (path.stem + '.nc')
    a = xr.DataArray(a)
    a = a.reset_index([dim for dim in a.dims if len(list(walk_coords(a[dim]))) > 1])
    a.to_netcdf(path_nc)
    print(f"saved {path_nc}")
```

* fix precomputed activations

migrate from pkl files directly instead of recomputing like before

* fix forwarding the `number_of_trials parameter` (#52)

* fix time_bin naming (level_0/1 -> start/end) (#53)

* fix time_bin naming (level_0/1 -> start/end)

* note xarray bug leading to merge mis-naming

* link result_caching directly to brain-score org

* provide BrainModel identifier from neural and behavioral components; do away with trials (#40)

* provide identifier as BrainModel

8e13735

* delete repeat_trials unit test

following #235

* Fixed model-tools dependency error (#54)

* Added AlexNet from examples into base_model.py to practice submission

* Fixed model-template dependency error (now points to Brain-Score repo instead of Martin's)

* Removed Dead code, rolled back tensorflow version to ==1.15

* Use BrainIO (#55)

* Update to use brainio_core package.

* Name change.

* Remove brainio-core.

* Force Jenkins re-run.

* Trigger re-run on Jenkins.

* remove check for ImageNet task (#60)

* remove check for ImageNet task

this will otherwise throw errors where the last layer has != 1000 neuroids (treated as logits). This has no effect on brain benchmarks and should imo thus not be a relevant check

* convert path to str, more instructive logs

* fix TestI2N.test_model sub-classing

* Wordnet decoder for Geirhos2021 benchmarks (#61)

counterpart to #323

* add wordnet_functions from https://github.com/bethgelab/model-vs-human/blob/745046c4d82ff884af618756bd6a5f47b6f36c45/modelvshuman/helper/wordnet_functions.py

* clean up code

but realizing that `is_hypernym` function is undefined

* implement logits to label for Geirhos et al. 2021

* use named axes for softmax

* add unit test for choice labels

* seed model

* seed custom model in init

* handle dimensions more flexibly

* use `.get_stimulus(stimulus_id)` (#62)

* use `.get_stimulus(stimulus_id)`

instead of `.get_image(image_id)`

* downgrade protobuf for keras version

attempting to fix keras import errors in http://braintree.mit.edu:8080/job/unittest_model_tools/132/

* rename image_id -> stimulus_id

* rename image_paths -> stimulus_paths

* add legacy support for LogitsBehavior class (#63)

* add legacy support for LogitsBehavior class

* fix `mock_stimulus_set` call

* make flatten coordinates more generic; add forward_kwargs option; image tensors (#64)

* fix tutorial link (#347)

* add documentation for testing on precomputed features (#355)

* add documentation for testing on precomputed features

* re-add erroneously deleted content

* fix retrieval of model predictions

* make files public using predefined ACL

* Add Imagenet index mappings for Zhu 2019 and Baker 2022 (#69)

* Updated baker 2022

* Accuracy metric works, Engineering benchmark added

* Finalized Baker2022 Benchmark

* Added Zhu/Baker imagenet indices that do not exist.

* Update model_tools/brain_transformation/behavior.py

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Update model_tools/brain_transformation/behavior.py

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Update model_tools/brain_transformation/behavior.py

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Update model_tools/brain_transformation/behavior.py

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

---------

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* include engineering benchmarks in standard tests again (#361)

* include engineering benchmarks in standard tests again

also fix typo

* fix pool keys access

* use new jenkins_id instead of id for submissions (#359)

* added support for jenkins_id write

* removed redundant comment

* added DB models specification changes

* Make sure directories created use jenkins_id and not id (#364)

* Hotfix: add jenkins_id to test_submission.py's tests (#366)

* Hotfix: add jenkins_id to test_submission.py's tests

* Fixed some tests

* Fixed more tests, another submission issue

* Islam2021 (#360)

* add Islam2021 packaging file

* add Dimensionality metric

* add Islam2021Dimensionality benchmark

* add islam2021 benchmarks to benchmark pool

* clean packaging file

* add Islam2021 benchmark tests

* add islam2021 stimuli test

* Fix typo in test_islam2021.py

* Add lookup.csv entry for neil.Islam2021

* Correct benchmark name in  brainscore/benchmarks/__init__.py

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Change recorded bins to standard ones

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Correct identifier of benchmark in islam2021.py

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Fix parent benchmark name islam2021.py

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Fix stimulus name in lookup.csv

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Fix islam2021 test names in tests/test_benchmarks/test___init__.py

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* fix islam2021 stimuli test name in tests/test_stimuli.py

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Fix islam2021 stimuli name in tests/test_stimuli.py

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Fix benchmark_pool keys in test_islam2021.py

* Fix stimulus name

* Add private_access to islam tests

* Fix Islam2021 stimuli name in test_stimuli

* add private_access to test_islam2021 in test_stimuli

---------

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* Removed faulty model tools import line, removed other unused lines as well (#367)

* Add domain to new benchmark creation code for vision. (#368)

* Hotfix: Moved Islam to experimental pool, added domain to models.py (#369)

* Moved Islam to experimental pool, added domain to models.py

* Removed Islam2021 from engineering test pool

* let travis run private and public tests separately (#371)

Previously, when private tests were run, Travis ran private _and_ public tests, and then in a separate run only the public tests. This changes it so that _only_ private and _only_ public tests are run separately.
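
As an illustration of the split, a minimal sketch using the `private_access` pytest marker that appears throughout these commits; the test names and exact CI invocations are placeholders, not taken from this PR.

```python
import pytest


@pytest.mark.private_access  # requires S3 credentials for private assemblies
def test_private_benchmark():
    ...


def test_public_benchmark():  # runs without any credentials
    ...

# the two CI runs then select disjoint sets of tests, e.g.:
#   pytest -m "private_access"      # private-only run
#   pytest -m "not private_access"  # public-only run
```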

* remove outdated lookup_source (#372)

per @jjpr's advice

* add instructions for adding users to AWS (#373)

* document Python = 3.7 (instead of >= 3.7) for TF compatibility (#375)

* more extensively describe tasks and intended outputs (#374)

following the detailed description in [language](https://github.com/brain-score/language/blob/main/brainscore_language/artificial_subject.py)
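
For context, a rough sketch of the task protocol those descriptions cover, assuming the pre-2.0 `brainscore.model_interface.BrainModel` interface; the function name and stimuli are placeholders.

```python
from brainscore.model_interface import BrainModel


def predict_labels(model: BrainModel, stimuli):
    # configure the model for the labeling task, using the ImageNet label space
    model.start_task(BrainModel.Task.label, 'imagenet')
    # present the stimuli; for this task the model returns predicted labels
    return model.look_at(stimuli)
```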

* reorganize contents into subdirectories

* reorganize model_helpers for integration

* add missing access parameter for MajajHong2015

---------

Co-authored-by: franzigeiger <32977549+franzigeiger@users.noreply.github.com>
Co-authored-by: Sachi Sanghavi <4976443+stothe2@users.noreply.github.com>
Co-authored-by: pmcgrath249 <pmcgrath249@gmail.com>
Co-authored-by: Michael Ferguson <38020092+mike-ferguson@users.noreply.github.com>
Co-authored-by: jjpr-mit <jjpr@mit.edu>
Co-authored-by: Michael Ferguson <mferg@mit.edu>
Co-authored-by: Tiago Gaspar Oliveira <tiagojgroliveira@tecnico.ulisboa.pt>
Co-authored-by: SusanWYS <127456911+SusanWYS@users.noreply.github.com>

* First model added (hopefully many more to come!)

* Revert "First model added (hopefully many more to come!)"

This reverts commit 0d74c72.

* Metrics plugin format (#391)

* move all metrics / metric_helpers into plugin directories

* define ceiling class in `brainscore_vision/metrics`

* register metrics to make them loadable

* use loader methods for metrics and ceilings

* scalar score instead of aggregation; move/add tests

* refactor from `aggregation=['center', 'error']` to a scalar Score (a before/after sketch follows this list)

* tests continued

* continue cleaning up scalar score instead of aggregation

* monkey-patch readthedocs

following #388

* add metric dependencies

numpy, scipy, scikit-learn

* remove tensorflow and keras from test dependencies

* monkey-patch readthedocs part 2

following #390

* fix previous commit's typo

* delete out-dated anatomy; fix metric loading

* delete outdated references

* fix ceiling values
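
To make the `aggregation` refactor above concrete, a before/after sketch using plain xarray (which `Score` builds on); the values and the `error` attribute key are illustrative assumptions based on these commits.

```python
import xarray as xr

# before: a Score carried an `aggregation` dimension
old_score = xr.DataArray([0.42, 0.03],
                         coords={'aggregation': ['center', 'error']},
                         dims=['aggregation'])
center = old_score.sel(aggregation='center').item()
error = old_score.sel(aggregation='error').item()

# after: a Score is a scalar, with uncertainty kept in attrs
new_score = xr.DataArray(0.42, attrs={'error': 0.03})
center = new_score.item()
error = new_score.attrs['error']
```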

* split up `test_setup.sh` into benchmark-specific s3 download (#393)

* split up `test_setup.sh` into benchmark-specific s3 download

* delete tests redundant with benchmark plugin

* add torch test dependency

* add torchvision dependency

* delete redundant setup.py

* move test into plugin

* mark `TestLayerSelection` as memory_intense

following OSError in https://app.travis-ci.com/github/brain-score/brain-score/jobs/612644470

* run plugin tests

* target TRAVIS_BRANCH for git diff

* checkout & diff on one line

* DEBUG: diff with FETCH_HEAD

* DEBUG integrate_core not found

* git diff against HEAD

* git diff against HEAD

* add pytest_check to test dependencies

* typo

* echo changed files

* config before fetch

* add conda

* remove outdated fixtures like `brainio_home`

* delete rdm tests

* continued test fixing

private access etc

* simplify more tests

* remove rdm/single benchmarks from registry

* fix metric suffix

* do not use credentials for `data_helpers` s3 download

* fix identifiers

* fall back to files in `brainio.contrib` bucket

* fix region parameter

* fix bucket for precomputed features download

* fix registry use; re-arrange Cadena data tests

---------

Co-authored-by: Katherine Fairchild <kvfairchild@gmail.com>

* Prevent triggering Travis plugin tests for empty `git diff` results (#399)

* validate and fix tests (#403)

* remove redundant newline

* add missing imagenet2012.csv

* fix download on demand; unify testing

* optimize imports

* fix package name

* add models/__init__.py

* remove redundant/dead code

* use approx for cka equal 1

* Add models for testing model conversion helpers (based off #395) (#398)

* add s3 helpers for model download

* add integration tests and generic model test

* add migration scripts for converting zip submissions to 2.0

* add models (alexnet and pixels)

* add alexnet and pixels

* Fixed benchmark id in test_integration

* removed unnecessary migration csv files

---------

Co-authored-by: Martin Schrimpf <m4rtinsch@gmail.com>
Co-authored-by: Katherine Fairchild <kvfairchild@gmail.com>
Co-authored-by: Khaled K Shehada <shehadak@mit.edu>

* Revert "Add models for testing model conversion helpers (based off #395) (#398)"

This reverts commit 2f51df0.

* delete `lookup.csv` and entrypoint (#397)

* Add models for testing model conversion helpers (based off #395) (#408)

* add s3 helpers for model download

* add integration tests and generic model test

* add migration scripts for converting zip submissions to 2.0

* add models (alexnet and pixels)

* add alexnet and pixels

* Fixed benchmark id in test_integration

* removed unnecessary migration csv files

* Used a public benchmark id for integration testing

* Updated integration test expected scores for new benchmark id

* Marked integration tests memory-intense, otherwise not enough memory on Travis

* Updated test_models to test all existing models

---------

Co-authored-by: Martin Schrimpf <m4rtinsch@gmail.com>
Co-authored-by: Katherine Fairchild <kvfairchild@gmail.com>

* integrate submission handling (#394)

* add vision endpoints

modeled after language

* add config and readme

same as language

* delete outdated components

* fix `conda_active` parameter

* write `comment` with layer commitment into `score.attrs`

* updated endpoints to be compatible with profiler functionality

* Added submission unit tests for endpoints and DB interaction

* Changed endpoint tests to use a public benchmark

---------

Co-authored-by: Khaled Shehada <45083797+shehadak@users.noreply.github.com>
Co-authored-by: Khaled K Shehada <shehadak@mit.edu>

* integrate remaining submission tests (#414)

* integrate submission tests

tests adapted:
* test_competition_field_set (from test_integration.test_competition_field)
* test_competition_field_not_set (from test_integration.test_competition_field_none)
* test_one_model_multiple_benchmarks (from test_integration.test_rerun_evaluation)
* add assertion for score comment (from test_integration.test_evaluation)

tests deleted because already present:
* test_integration.test_failure_evaluation
* all tests in test_submission.py

tests not incorporated:
* test_model_failure_evaluation -- this could be interesting to add in the future, for the case where scoring fails partway through and leaves an in-progress database state

* format indent

* add `test_two_models_two_benchmarks` test

* fix querying

* use `None` instead of string `'None'` for competition field

following brain-score/core#63

* merge main into 2.0 integrate_core (#424)

* Add version to `.readthedocs.yml` (required) (#388)

* Add `build.os` to `readthedocs.yml` (#390)

* add build.os

* ubuntu 18.04

* ubuntu 20.04

* add submission creation to test_competition_field()

---------

Co-authored-by: Katherine Fairchild <kvfairchild@gmail.com>

* merge main with Islam2021 benchmark

* add contributor code of conduct and badge (#405)

* Add odd_one_out task documentation (#404)

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>
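
A sketch of what using that task might look like, inferred from the task name and the `BrainModel` interface; the triplet format and return value are assumptions, not taken from these docs.

```python
from brainscore.model_interface import BrainModel


def choose_odd_ones(model: BrainModel, triplet_stimuli):
    # configure the model to pick the odd one out among stimulus triplets
    model.start_task(BrainModel.Task.odd_one_out)
    # stimuli are presented as consecutive triplets; for each triplet the
    # model reports which of the three stimuli it judges most different
    return model.look_at(triplet_stimuli)
```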

* Domain-Transfer Benchmarks  (#416)

* preliminary analysis

* removed data assembly from exploration

* added dependencies for creating merged data assembly - no .nc files

* investigation on the validity of the merged assembly

* neural benchmark for domain-transfer

* finalized the merged assembly creation and added lines in lookup table

* added hook to cross regressed correlation to take care of background_id

* benchmarks and scoring files

* added results for analysis benchmark

* cleaned the analysis benchmark script

* added bibliography

* added the benchmarks to the pool

* added test-related script comments

* corrected typos

* Delete score-model-analysis.py 

not needed in the PR

* Delete score-model.py

not needed in the PR

* corrected names in the basic checks

* added a simple unit test for the neural assembly

* minor changes to fix Travis run

* corrected according to PR comments

* clean up

* add self test

---------

Co-authored-by: Ernesto Bocini <bocini@jed.jed.cluster>
Co-authored-by: Martin Schrimpf <m4rtinsch@gmail.com>

* remove space (#421)

* fix engineering/analysis benchmark for Igustibagus2024 (#423)

* autoformat

* autoimport

* code cosmetics

* fix ceiling

* add test for Igustibagus analysis

* aggregate score over domains

* add private_access flag

* finalize merging domain-transfer benchmarks

* mark as private

* update to use plugins

---------

Co-authored-by: Michael Ferguson <mferg@mit.edu>
Co-authored-by: Tiago Gaspar Oliveira <tiagojgroliveira@tecnico.ulisboa.pt>
Co-authored-by: SusanWYS <127456911+SusanWYS@users.noreply.github.com>
Co-authored-by: kvfairchild <kvg0@mit.edu>
Co-authored-by: Katherine Fairchild <kvfairchild@gmail.com>
Co-authored-by: Linus <95619282+linus-md@users.noreply.github.com>
Co-authored-by: Ernesto Bocini <111696102+ernestoBocini@users.noreply.github.com>
Co-authored-by: Ernesto Bocini <bocini@jed.jed.cluster>

* First pass of new submission docs (#396)

* First pass of new submission docs

* First round of Martin's PR comments

* Moved location of deb_schema.uml

* changed path for uml photo

* Update docs/source/modules/submission.rst

Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>

* 2.0 updates to developer documentation (#418)

* 2.0 updates

* updated AWS env count from 3 -> 2

* removed scoring process block

---------

Co-authored-by: Katherine Fairchild <kvfairchild@gmail.com>
Co-authored-by: Mike Ferguson <mef8dd@virginia.edu>

* fix links

* update model tutorial

quickstart only

* update benchmark tutorial

* fix links

---------

Co-authored-by: kvfairchild <kvg0@mit.edu>
Co-authored-by: Martin Schrimpf <mschrimpf@users.noreply.github.com>
Co-authored-by: Katherine Fairchild <kvfairchild@gmail.com>
Co-authored-by: Martin Schrimpf <m4rtinsch@gmail.com>

* simplify alexnet and pixel models (#412)

* simplify alexnet and pixel models

from #408

* test jenkins testing for new plugins

* retrigger checks

* Trigger CI after Core update

* reset core path

---------

Co-authored-by: Katherine Fairchild <kvfairchild@gmail.com>
Co-authored-by: Khaled K Shehada <shehadak@mit.edu>

* update examples for 2.0 (#425)

* update data example

* update metrics example

* update benchmarks example

* combine data, metrics, and benchmarks notebooks

* add models example notebook

* add example for scoring

* fix README links (#426)

* register pytest markers in pyproject (#413)

* register pytest markers in pyproject

* remove duplicate markers definition

* americanize

Co-authored-by: kvfairchild <kvg0@mit.edu>

* import script from core (#427)

Co-authored-by: Katherine Fairchild <kvfairchild@gmail.com>

* remove lab identifiers from plugin identifiers (#402)

* remove lab identifiers from data plugins

* rename `BarbuMayo2019` -> `ObjectNet`

* remove lab prefix from standard region benchmarks

---------

Co-authored-by: kvfairchild <kvg0@mit.edu>

* Setup GitHub Actions and Travis for automated submissions (#428)

* import script from core

* add action workflows

* trigger automerge for plugin-only web submissions

* model_type=Brain_Model

* python -> 3.7

* update repo to vision

* cleanup

* clarify imported scripts

---------

Co-authored-by: Katherine Fairchild <kvfairchild@gmail.com>

---------

Co-authored-by: kvfairchild <kvg0@mit.edu>
Co-authored-by: samwinebrake <85908068+samwinebrake@users.noreply.github.com>
Co-authored-by: franzigeiger <32977549+franzigeiger@users.noreply.github.com>
Co-authored-by: Sachi Sanghavi <4976443+stothe2@users.noreply.github.com>
Co-authored-by: pmcgrath249 <pmcgrath249@gmail.com>
Co-authored-by: Michael Ferguson <38020092+mike-ferguson@users.noreply.github.com>
Co-authored-by: jjpr-mit <jjpr@mit.edu>
Co-authored-by: Michael Ferguson <mferg@mit.edu>
Co-authored-by: Tiago Gaspar Oliveira <tiagojgroliveira@tecnico.ulisboa.pt>
Co-authored-by: SusanWYS <127456911+SusanWYS@users.noreply.github.com>
Co-authored-by: Mike Ferguson <mef8dd@virginia.edu>
Co-authored-by: Katherine Fairchild <kvfairchild@gmail.com>
Co-authored-by: Khaled K Shehada <shehadak@mit.edu>
Co-authored-by: Khaled Shehada <45083797+shehadak@users.noreply.github.com>
Co-authored-by: Linus <95619282+linus-md@users.noreply.github.com>
Co-authored-by: Ernesto Bocini <111696102+ernestoBocini@users.noreply.github.com>
Co-authored-by: Ernesto Bocini <bocini@jed.jed.cluster>
Successfully merging this pull request may close these issues.

get rid of assemblies package; structure benchmarks and assemblies into datasets