## Replicating the Meta AI model

Meta AI repository (`brainmagick`): https://github.com/facebookresearch/brainmagick/tree/main

Table of contents:
1. Requirements
2. Data and studies
3. Preprocessing and cache
4. Training
5. Evaluations
6. Tests
7. Visualization of metrics


### Requirements

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
%cd /content/drive/MyDrive/Final_project/Baseline_model/brainmagick

/content/drive/.shortcut-targets-by-id/1vmzRyDN2H7gN5AacMSG-aJKQHDWqIxiB/Final_project/Baseline_model/brainmagick


In [3]:
!ls

bm		    MANIFEST.in				  Miniconda3-latest-Linux-x86_64.sh.7
bm.egg-info	    Miniconda3-latest-Linux-x86_64.sh	  Miniconda3-latest-Linux-x86_64.sh.8
brainmagick.png     Miniconda3-latest-Linux-x86_64.sh.1   Miniconda3-latest-Linux-x86_64.sh.9
cache		    Miniconda3-latest-Linux-x86_64.sh.10  mypy.ini
CHANGELOG.md	    Miniconda3-latest-Linux-x86_64.sh.11  notebook_templates
CODE_OF_CONDUCT.md  Miniconda3-latest-Linux-x86_64.sh.12  outputs
CONTRIBUTING.md     Miniconda3-latest-Linux-x86_64.sh.13  pyproject.toml
data		    Miniconda3-latest-Linux-x86_64.sh.2   README.md
doc		    Miniconda3-latest-Linux-x86_64.sh.3   requirements.txt
LICENSE		    Miniconda3-latest-Linux-x86_64.sh.4   scripts
logo.png	    Miniconda3-latest-Linux-x86_64.sh.5   setup.cfg
Makefile	    Miniconda3-latest-Linux-x86_64.sh.6   setup.py
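
The listing above contains numbered copies of the Miniconda installer (`.sh.1` to `.sh.13`). `wget` creates one of these each time it is run while the target file already exists. Below is a minimal sketch for locating those copies so they can be reviewed and deleted. The helper name `find_installer_copies` is hypothetical and not part of the repository.

```python
import re
from pathlib import Path

INSTALLER = "Miniconda3-latest-Linux-x86_64.sh"

def find_installer_copies(directory):
    """Return numbered duplicates (INSTALLER.1, INSTALLER.2, ...) sorted by number."""
    pattern = re.compile(re.escape(INSTALLER) + r"\.(\d+)$")
    copies = []
    for path in Path(directory).iterdir():
        match = pattern.match(path.name)
        if match:
            # Keep the numeric suffix so that .10 sorts after .2.
            copies.append((int(match.group(1)), path))
    return [path for _, path in sorted(copies)]
```

Calling `find_installer_copies(".")` from the repository root returns the duplicate paths; each can then be removed with `path.unlink()` after checking the list.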


In [4]:
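# Note: rerunning this cell saves a new numbered copy (.1, .2, ...) of the installer each time; `wget -nc` would skip the download if the file already exists.
!wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh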
!chmod +x Miniconda3-latest-Linux-x86_64.sh
!bash ./Miniconda3-latest-Linux-x86_64.sh -b -f -p /usr/local

--2023-12-10 15:22:25--  https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.130.3, 104.16.131.3, 2606:4700::6810:8203, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.130.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 120986213 (115M) [application/x-sh]
Saving to: ‘Miniconda3-latest-Linux-x86_64.sh.14’


2023-12-10 15:22:27 (78.0 MB/s) - ‘Miniconda3-latest-Linux-x86_64.sh.14’ saved [120986213/120986213]

PREFIX=/usr/local
Unpacking payload ...
Installing base environment...

Downloading and Extracting Packages

Preparing transaction: done
Executing transaction: done
installation finished.
    You currently have a PYTHONPATH envi

In [5]:
import sys
sys.path.append('/usr/local/lib/python3.8/site-packages')

In [6]:
# The original Meta AI instructions use the line below. It did not work here, so the next cell pins explicit package versions instead.
# !conda install pytorch torchaudio cudatoolkit=11.3 -c pytorch -y

In [7]:
!conda create -n bm ipython python=3.8 -y
# Note: each `!` line runs in its own subshell, so this activation does not persist for the commands below.
!conda activate bm
!conda install pytorch=1.11.0 torchvision=0.12.0 torchaudio=0.11.0 cudatoolkit=11.3 -c pytorch -y
!pip install -U -r requirements.txt
!pip install -e .
!python -m spacy download en_core_web_md

Collecting package metadata (current_repodata.json): done
Solving environment: done


  current version: 23.9.0
  latest version: 23.11.0

Please update conda by running

    $ conda update -n base -c defaults conda

Or to minimize the number of packages updated during conda update use

     conda install conda=23.11.0



## Package Plan ##

  environment location: /usr/local/envs/bm

  added / updated specs:
    - ipython
    - python=3.8


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    asttokens-2.0.5            |     pyhd3eb1b0_0          20 KB
    backcall-0.2.0             |     pyhd3eb1b0_0          13 KB
    decorator-5.1.1            |     pyhd3eb1b0_0          12 KB
    executing-0.8.3            |     pyhd3eb1b0_0          18 KB
    ipython-8.12.2             |   py38h06

Obtaining file:///content/drive/.shortcut-targets-by-id/1vmzRyDN2H7gN5AacMSG-aJKQHDWqIxiB/Final_project/Baseline_model/brainmagick
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: bm
  Building editable for bm (pyproject.toml) ... done
  Created wheel for bm: filename=bm-0.1.0-0.editable-py3-none-any.whl size=14588 sha256=6f8ffb479cc2a6637d006da32d49b29869aaff5714f721b23ee9d94a208a8e14
  Stored in directory: /tmp/pip-ephem-wheel-cache-thom_all/wheels/7e/a7/59/3db9f2829ff6d232b82eba0db4942aa099acf465f430726524
Successfully built bm
Installing collected packages: bm
Successfully installed bm-0.1.0
DEPRECATION: https://github.com/explosion/spacy-models/releases/download/en_core_web_md-2.3.1/en_core_web_md-2.3.1.tar.gz#egg=en_

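In a notebook, every `!` line runs in a separate subshell, so `!conda activate bm` has no effect on the `!conda install` and `!pip install` lines that follow it. One possible workaround is to prefix each command with `conda run -n bm`. The sketch below only builds those command strings. The helper name `in_env` is hypothetical, and whether this resolves the installation in this Colab setup is an assumption that has not been verified here.

```python
import shlex

ENV_NAME = "bm"  # environment name from `conda create -n bm` in the cell above

def in_env(command, env=ENV_NAME):
    """Prefix a shell command so that conda runs it inside the given environment."""
    return shlex.join(["conda", "run", "-n", env, *shlex.split(command)])

print(in_env("pip install -U -r requirements.txt"))
print(in_env("pip install -e ."))
print(in_env("python -m spacy download en_core_web_md"))
```

Each printed line can be pasted into a cell after a leading `!`.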
### Data and studies

* `broderick2019`: EEG
* `brennan2019`: EEG
  * Version 1 (2019): used in the Meta AI study, available at: `./data/brennan2019/`
  * Version 2 (2023): updated release, available at: https://deepblue.lib.umich.edu/data/concern/data_sets/bn999738r?locale=en
* `audio_mous`: MEG
* `gwilliams2022`: MEG, available at: `./data/gwilliams2022/`. This is the default dataset used by `dora run ...`. To use a different dataset, override the default with `dset.selections`, for example `'dset.selections=[brennan2019]'`.
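
The list above can be kept as a small lookup table that also guards against typos in study names. This is a minimal sketch: the names `STUDIES`, `DEFAULT_STUDY` and `selection_override` are hypothetical helpers for this notebook, not part of `brainmagick`.

```python
# Study name -> recording modality, as listed above.
STUDIES = {
    "broderick2019": "EEG",
    "brennan2019": "EEG",
    "audio_mous": "MEG",
    "gwilliams2022": "MEG",
}
DEFAULT_STUDY = "gwilliams2022"  # used by `dora run` when nothing is overridden

def selection_override(study):
    """Return the override string that selects one study ("" for the default)."""
    if study not in STUDIES:
        raise ValueError(f"unknown study {study!r}; expected one of {sorted(STUDIES)}")
    if study == DEFAULT_STUDY:
        return ""
    return f"dset.selections=[{study}]"

print(selection_override("brennan2019"))
```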

### Preprocessing and cache

Key code:
* `!dora run download_only=true` - downloads and preprocesses the default dataset (gwilliams2022).
* `!dora run download_only=true 'dset.selections=[brennan2019]'` - downloads and preprocesses a selected dataset.
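
The second command needs single quotes because the square brackets would otherwise be interpreted by the shell. As a sketch of how the quoting can be generated rather than typed by hand, the hypothetical helper `dora_run_command` below assembles the command line with `shlex.join`, which quotes only the arguments that need it.

```python
import shlex

def dora_run_command(*overrides):
    """Build a shell-safe `dora run` command line from override strings."""
    return shlex.join(["dora", "run", *overrides])

print(dora_run_command("download_only=true"))
print(dora_run_command("download_only=true", "dset.selections=[brennan2019]"))
```

The second call reproduces the quoted form shown in the list above.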



In [8]:
#!dora run download_only=true

In [9]:
!dora run download_only=true 'dset.selections=[brennan2019]'

Hostname 0b267f0dbfc4 not defined in /conf/study_paths/study_paths.yaml. Using default paths.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
[2023-12-10 15:29:10,413][bm.train][INFO] - For logs, checkpoints and samples, check /content/drive/.shortcut-targets-by-id/1vmzRyDN2H7gN5AacMSG-aJKQHDWqIxiB/Final_project/Baseline_model/brainmagick/outputs/xps/fa7ea8f3.
[2023-12-10 15:29:10,413][bm.train][INFO] - Caching intermediate data under /content/drive/.shortcut-targets-by-id/1vmzRyDN2H7gN5AacMSG-aJKQHDWqIxiB/Final_project/Baseline_model/brainmagick/cache.
[12-10 15:29:10][dora.distrib][INFO] - world_size is 1, skipping init.
[12-10 15:29:53][bm.dataset][INFO] - Loading Subjects | 6/33 | 6.40 it/sec
[12-10 15:29:53][bm.dataset][INFO] - Loading Subjects | 12/33 | 8.23 it/sec
[12-10 15:29:54][bm.dataset][IN

In [None]:
!dora run 'dset.selections=[brennan2019]' dset.n_recordings=30 simpleconv.subject_layers=true

Hostname 0b267f0dbfc4 not defined in /conf/study_paths/study_paths.yaml. Using default paths.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
[2023-12-10 15:41:38,931][bm.train][INFO] - For logs, checkpoints and samples, check /content/drive/.shortcut-targets-by-id/1vmzRyDN2H7gN5AacMSG-aJKQHDWqIxiB/Final_project/Baseline_model/brainmagick/outputs/xps/ae68a716.
[2023-12-10 15:41:38,931][bm.train][INFO] - Caching intermediate data under /content/drive/.shortcut-targets-by-id/1vmzRyDN2H7gN5AacMSG-aJKQHDWqIxiB/Final_project/Baseline_model/brainmagick/cache.
[12-10 15:41:38][dora.distrib][INFO] - world_size is 1, skipping init.
[12-10 15:41:41][bm.dataset][INFO] - Loading Subjects | 6/30 | 9.64 it/sec
[12-10 15:41:41][bm.dataset][INFO] - Loading Subjects | 12/30 | 10.63 it/sec
[12-10 15:41:42][bm.dataset][I

### Training

Key code:
* `!dora run dset.n_recordings=1` - trains on the default dataset (gwilliams2022) using only 1 recording.
* `!dora run 'dset.selections=[brennan2019]'` - trains on a selected dataset using all recordings. We ran this with all subjects, and it got stuck at the normalization step for 6 hours.
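
Since the full run stalled for hours, it can help to put an upper bound on how long a command may run. The sketch below wraps `subprocess.run` with a timeout. The helper name `run_with_timeout` is hypothetical, and the two calls use harmless stand-in commands rather than an actual `dora run`.

```python
import subprocess
import sys

def run_with_timeout(argv, timeout_s):
    """Run a command; return its exit code, or None if it exceeded the time limit."""
    try:
        return subprocess.run(argv, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return None

# Stand-in commands: one finishes immediately, one sleeps past its limit.
quick = run_with_timeout([sys.executable, "-c", "pass"], timeout_s=30)
slow = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.5)
print(quick, slow)
```

For a training run, `argv` would be a list such as `["dora", "run", "dset.selections=[brennan2019]"]` with a limit chosen for the session. A `None` result then signals that the run was cut off.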

In [11]:
!dora run 'dset.selections=[brennan2019]' dset.n_recordings=20

Hostname 0b267f0dbfc4 not defined in /conf/study_paths/study_paths.yaml. Using default paths.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
[2023-12-10 15:31:49,397][bm.train][INFO] - For logs, checkpoints and samples, check /content/drive/.shortcut-targets-by-id/1vmzRyDN2H7gN5AacMSG-aJKQHDWqIxiB/Final_project/Baseline_model/brainmagick/outputs/xps/1f75090d.
[2023-12-10 15:31:49,397][bm.train][INFO] - Caching intermediate data under /content/drive/.shortcut-targets-by-id/1vmzRyDN2H7gN5AacMSG-aJKQHDWqIxiB/Final_project/Baseline_model/brainmagick/cache.
[12-10 15:31:49][dora.distrib][INFO] - world_size is 1, skipping init.
[12-10 15:31:51][bm.dataset][INFO] - Loading Subjects | 4/20 | 6.91 it/sec
[12-10 15:31:51][bm.dataset][INFO] - Loading Subjects | 8/20 | 8.54 it/sec
[12-10 15:31:51][bm.dataset][INF

In [12]:
!dora run 'dset.selections=[brennan2019]' dset.n_recordings=20 simpleconv.subject_layers=true

Hostname 0b267f0dbfc4 not defined in /conf/study_paths/study_paths.yaml. Using default paths.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
[2023-12-10 15:37:45,384][bm.train][INFO] - For logs, checkpoints and samples, check /content/drive/.shortcut-targets-by-id/1vmzRyDN2H7gN5AacMSG-aJKQHDWqIxiB/Final_project/Baseline_model/brainmagick/outputs/xps/1f75090d.
[2023-12-10 15:37:45,384][bm.train][INFO] - Caching intermediate data under /content/drive/.shortcut-targets-by-id/1vmzRyDN2H7gN5AacMSG-aJKQHDWqIxiB/Final_project/Baseline_model/brainmagick/cache.
[12-10 15:37:45][dora.distrib][INFO] - world_size is 1, skipping init.
[12-10 15:37:47][bm.dataset][INFO] - Loading Subjects | 4/20 | 6.96 it/sec
[12-10 15:37:47][bm.dataset][INFO] - Loading Subjects | 8/20 | 8.60 it/sec
Traceback (most recent call last):
  File "/content/dri

In [13]:
# Changed config values for this run:
# dset.n_recordings = 20
# norm.max_scale = 100
# all other settings remain the same
!dora run 'dset.selections=[brennan2019]' dset.n_recordings=20 norm.max_scale=100

Hostname 0b267f0dbfc4 not defined in /conf/study_paths/study_paths.yaml. Using default paths.
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.11/site-packages/wordfreq/__init__.py", line 25, in <module>
    DATA_PATH = data_path()
                ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/wordfreq/util.py", line 13, in data_path
    return Path(locate.this_dir(), "data")
                ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/locate/_locate.py", line 39, in this_dir
    return _this_dir(inspect.stack())
                     ^^^^^^^^^^^^^^^
  File "/usr/local/lib/py
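
The traceback above goes through `python3.11/site-packages`, while the `bm` environment was created with `python=3.8`. That suggests the command ran in the base installation rather than in `bm`. As a small diagnostic sketch, the hypothetical helper `site_packages_versions` extracts the Python versions mentioned in a traceback so they can be compared with the version that was requested.

```python
import re
import sys

def site_packages_versions(traceback_text):
    """Return the set of Python versions that appear in site-packages paths."""
    return set(re.findall(r"/python(\d+\.\d+)/site-packages", traceback_text))

# One line copied from the traceback above.
sample = 'File "/usr/local/lib/python3.11/site-packages/wordfreq/__init__.py", line 25, in <module>'
found = site_packages_versions(sample)
expected = "3.8"  # version requested in `conda create -n bm ... python=3.8`

print("versions in traceback:", found)
print("matches requested environment:", found == {expected})
print("this interpreter:", f"{sys.version_info.major}.{sys.version_info.minor}")
```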

### Evaluations

Key code:
* `!dora grid nmi.main_table --dry_run --init` - lists every experiment defined for each study. Each experiment has its own signature, which is passed to `dora run -f` to start that experiment.
* `!dora grid nmi.main_table '!seed' '!features' '!wer_random' --dry_run --init` - shows one signature per study.
* `!dora run -f 6e3bf7d7 -d` - runs the experiment with signature `6e3bf7d7`.

In [14]:
!dora grid nmi.main_table --dry_run --init

Hostname 0b267f0dbfc4 not defined in /conf/study_paths/study_paths.yaml. Using default paths.
Traceback (most recent call last):
  File "/usr/local/bin/dora", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dora/__main__.py", line 170, in main
    args.action(args, main)
  File "/usr/local/lib/python3.11/site-packages/dora/grid.py", line 138, in grid_action
    run_grid(main, explorer, args.grid, rules, slurm, grid_args)
  File "/usr/local/lib/python3.11/site-packages/dora/grid.py", line 271, in run_grid
    main.init_xp(sheep.xp)
  File "/usr/local/lib/python3.11/site-packages/dora/main.py", line 118, in init_xp
    json.dump(xp.argv, open(xp._argv_cache, 'w'))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen codecs>", line 186, in __init__
KeyboardInterrupt
^C


In [15]:
!dora grid nmi.main_table '!seed' '!features' '!wer_random' --dry_run --init

Hostname 0b267f0dbfc4 not defined in /conf/study_paths/study_paths.yaml. Using default paths.
Monitoring Grid nmi.main_table
Base name:  model=clip_conv
                                     Meta                                      | trai | vali |    test
i  name                                                     sta       sig  sid | e  l | l  b | wer  wer_
0  dse.force_uid_assignement dse.selections=['audio_mous']  N/A  34219380      | 0    |      |
1  dse.selections=['gwilliams2022']                         N/A  52345878      | 0    |      |
2  dse.selections=['broderick2019'] tes.wer_recordings=100  N/A  557f5f8a      | 0    |      |
3                                                           N/A  6e3bf7d7      | 0    |      |

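The table above lists one 8-character signature per row, and that signature is what `dora run -f` expects. The sketch below pulls the signatures out of such a table with a regular expression. The helper name `grid_signatures` is hypothetical, and the pattern is a heuristic that assumes no other 8-character hexadecimal token appears on a line.

```python
import re

SIG = re.compile(r"\b[0-9a-f]{8}\b")  # heuristic: a standalone 8-character hex token

def grid_signatures(table_text):
    """Return the first signature-like token of each line, in row order."""
    sigs = []
    for line in table_text.splitlines():
        match = SIG.search(line)
        if match:
            sigs.append(match.group())
    return sigs

# Rows copied from the grid table above.
sample = """\
i  name                                                     sta       sig  sid
0  dse.force_uid_assignement dse.selections=['audio_mous']  N/A  34219380
1  dse.selections=['gwilliams2022']                         N/A  52345878
2  dse.selections=['broderick2019'] tes.wer_recordings=100  N/A  557f5f8a
3                                                           N/A  6e3bf7d7
"""
for sig in grid_signatures(sample):
    print(f"dora run -f {sig}")
```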

In [16]:
!dora run -f 6e3bf7d7

Hostname 0b267f0dbfc4 not defined in /conf/study_paths/study_paths.yaml. Using default paths.
Parser Injecting argv ['model=clip_conv', 'dset.selections=["brennan2019"]', 'seed=2036'] from sig 6e3bf7d7
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
[2023-12-10 15:38:36,408][bm.train][INFO] - For logs, checkpoints and samples, check /content/drive/.shortcut-targets-by-id/1vmzRyDN2H7gN5AacMSG-aJKQHDWqIxiB/Final_project/Baseline_model/brainmagick/outputs/xps/6e3bf7d7.
[2023-12-10 15:38:36,408][bm.train][INFO] - Caching intermediate data under /content/drive/.shortcut-targets-by-id/1vmzRyDN2H7gN5AacMSG-aJKQHDWqIxiB/Final_project/Baseline_model/brainmagick/cache.
[12-10 15:38:36][dora.distrib][INFO] - world_size is 1, skipping init.
[12-10 15:38:39][bm.dataset][INFO] - Loading Subjects | 6/33 | 9.62 it/sec
[12-10 15:38:39][bm.dataset]

In [17]:
!dora run -f 6e3bf7d7 -d

Hostname 0b267f0dbfc4 not defined in /conf/study_paths/study_paths.yaml. Using default paths.
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 936, in exec_module
  File "<frozen importlib._bootstrap_external>", line 1032, in get_code
  File "<frozen importlib._bootstrap_external>", line 1130, in get_data
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/dora", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dora/__main__.py", line 158, in main
    main = get_main(args.main_module, args.package)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dora/_utils.py", line 48, in get_main
    module = impor

In [18]:
!python -m scripts.run_eval_probs grid_name="main_table"

Hostname 0b267f0dbfc4 not defined in /conf/study_paths/study_paths.yaml. Using default paths.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/content/drive/.shortcut-targets-by-id/1vmzRyDN2H7gN5AacMSG-aJKQHDWqIxiB/Final_project/Baseline_model/brainmagick/scripts/run_eval_probs.py", line 471, in <module>
    assert grid_dir.exists(), f"{grid_dir} does not exists"
AssertionError: /content/drive/.shortcut-targets-by-id/1vmzRyDN2H7gN5AacMSG-aJKQHDWqIxiB/Final_project/Baseline_model/brainmagick/outputs/grids/main_table does not exists
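
The evaluation script stops because `outputs/grids/main_table` does not exist. Checking for that directory before launching the script gives a clearer signal than the traceback. This is a minimal sketch: the helper names `grid_dir` and `grid_ready` are hypothetical, and the directory layout is taken only from the path in the error message.

```python
from pathlib import Path

def grid_dir(repo_root, grid_name):
    """Path where the evaluation script looked for grid outputs in the error above."""
    return Path(repo_root) / "outputs" / "grids" / grid_name

def grid_ready(repo_root, grid_name):
    """True if the grid output directory exists, so the evaluation can be started."""
    return grid_dir(repo_root, grid_name).is_dir()

# Run from the repository root (the directory set with %cd earlier).
if grid_ready(".", "main_table"):
    print("grid directory found; the evaluation script can be started")
else:
    print(f"missing: {grid_dir('.', 'main_table')}")
```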


### Tests

In [19]:
#!pytest bm

### Visualization of metrics

In [20]:
#!pip install hiplot

In [21]:
#!python -m hiplot dora.hiplot.load --port=5005