## Replicating the Meta AI model

Meta AI: https://github.com/facebookresearch/brainmagick/tree/main

Table of contents:
1. Requirements
2. Data and studies
3. Preprocessing and cache
4. Training
5. Evaluations
6. Tests
7. Visualization of metrics


### Requirements

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
cd /content/drive/MyDrive/11785 _DL/Final_project/Baseline_model/brainmagick

/content/drive/MyDrive/11785 _DL/Final_project/Baseline_model/brainmagick


In [None]:
ls

[0m[01;34mbm[0m/                 logo.png                             [01;34mnotebook_templates[0m/
[01;34mbm.egg-info[0m/        Makefile                             [01;34moutputs[0m/
brainmagick.png     MANIFEST.in                          pyproject.toml
[01;34mcache[0m/              Miniconda3-latest-Linux-x86_64.sh    README.md
CHANGELOG.md        Miniconda3-latest-Linux-x86_64.sh.1  requirements.txt
CODE_OF_CONDUCT.md  Miniconda3-latest-Linux-x86_64.sh.2  [01;34mscripts[0m/
CONTRIBUTING.md     Miniconda3-latest-Linux-x86_64.sh.3  setup.cfg
[01;34mdata[0m/               Miniconda3-latest-Linux-x86_64.sh.4  setup.py
[01;34mdoc[0m/                Miniconda3-latest-Linux-x86_64.sh.5
LICENSE             mypy.ini


In [None]:
!wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
!chmod +x Miniconda3-latest-Linux-x86_64.sh
!bash ./Miniconda3-latest-Linux-x86_64.sh -b -f -p /usr/local

In [None]:
import sys
sys.path.append('/usr/local/lib/python3.8/site-packages')

In [None]:
# Meta AI's original instructions used the line below, but it did not work here, so the next cell pins explicit package versions instead.
# !conda install pytorch torchaudio cudatoolkit=11.3 -c pytorch -y

In [None]:
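# Note: in Colab each `!` line runs in its own subshell, so `conda activate bm`
# does not carry over to the lines that follow it.
!conda create -n bm ipython python=3.8 -y
!conda activate bm
# -y skips the confirmation prompt, which cannot be answered from a notebook cell
!conda install pytorch=1.11.0 torchvision=0.12.0 torchaudio=0.11.0 cudatoolkit=11.3 -c pytorch -y
!pip install -U -r requirements.txt
!pip install -e .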
!python -m spacy download en_core_web_md
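
Since `conda activate` does not persist between `!` lines, one alternative is to target the environment explicitly with `conda install -n` and `conda run -n`. The sketch below only builds and prints the commands; the helper `in_env` is a hypothetical name introduced here, and whether this layout suits the rest of the notebook (later `dora` calls would need the same wrapper) is an assumption.

```python
import shlex
from typing import List

ENV_NAME = "bm"  # environment name used in the setup cell


def in_env(command: str, env: str = ENV_NAME) -> List[str]:
    """Wrap a command so it runs inside a named conda environment (hypothetical helper)."""
    return ["conda", "run", "-n", env, *shlex.split(command)]


setup_commands = [
    ["conda", "create", "-n", ENV_NAME, "ipython", "python=3.8", "-y"],
    ["conda", "install", "-n", ENV_NAME, "pytorch=1.11.0", "torchvision=0.12.0",
     "torchaudio=0.11.0", "cudatoolkit=11.3", "-c", "pytorch", "-y"],
    in_env("pip install -U -r requirements.txt"),
    in_env("pip install -e ."),
    in_env("python -m spacy download en_core_web_md"),
]

# Print the commands for review; pass each list to subprocess.run(cmd, check=True) to execute.
for cmd in setup_commands:
    print(shlex.join(cmd))
```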

### Data and studies

* `broderick2019`: EEG
* `brennan2019`: EEG
  * Version 1 (2019): used in Meta AI study, available at: `./data/brennan2019/`
  * Version 2 (2023): updated version: https://deepblue.lib.umich.edu/data/concern/data_sets/bn999738r?locale=en
* `audio_mous`: MEG
* `gwilliams2022`: MEG, available at: `./data/gwilliams2022/`. This is the default dataset used by `dora run ...`. To use a different dataset, override the default with `dset.selections`.

### Preprocessing and cache

Key code:
* `!dora run download_only=true` - downloads and preprocesses the default dataset (gwilliams2022).
* `!dora run download_only=true 'dset.selections=[brennan2019]'` - downloads and preprocesses a selected dataset.
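
The overrides above can also be assembled programmatically, which avoids shell-quoting mistakes around the square brackets. This is a minimal sketch: `dora_run_args` is a hypothetical helper, and it only covers the three overrides that appear in this notebook (`download_only`, `dset.selections`, `dset.n_recordings`).

```python
from typing import List, Optional, Sequence


def dora_run_args(selections: Optional[Sequence[str]] = None,
                  n_recordings: Optional[int] = None,
                  download_only: bool = False) -> List[str]:
    """Build the argument list for `dora run` with the overrides used in this notebook."""
    args = ["dora", "run"]
    if download_only:
        args.append("download_only=true")
    if selections:
        # Same syntax as the shell form: dset.selections=[brennan2019]
        args.append("dset.selections=[" + ",".join(selections) + "]")
    if n_recordings is not None:
        args.append(f"dset.n_recordings={n_recordings}")
    return args


print(dora_run_args(download_only=True))
print(dora_run_args(selections=["brennan2019"], download_only=True))
```

Passing the resulting list to `subprocess.run(args, check=True)` from the repository root would launch the same command as the `!dora run ...` cells, with no extra quoting needed.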



In [None]:
#!dora run download_only=true

In [None]:
!dora run download_only=true 'dset.selections=[brennan2019]'

Hostname 2e465021a4a5 not defined in /conf/study_paths/study_paths.yaml. Using default paths.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
[2023-11-14 09:01:34,984][bm.train][INFO] - For logs, checkpoints and samples, check /content/drive/MyDrive/11785 _DL/Final_project/Baseline_model/brainmagick/outputs/xps/fa7ea8f3.
[2023-11-14 09:01:34,985][bm.train][INFO] - Caching intermediate data under /content/drive/MyDrive/11785 _DL/Final_project/Baseline_model/brainmagick/cache.
[11-14 09:01:34][dora.distrib][INFO] - world_size is 1, skipping init.
[11-14 09:02:00][bm.dataset][INFO] - Loading Subjects | 6/33 | 6.72 it/sec
[11-14 09:02:01][bm.dataset][INFO] - Loading Subjects | 12/33 | 8.00 it/sec
[11-14 09:02:02][bm.dataset][INFO] - Loading Subjects | 18/33 | 8.93 it/sec
[11-14 09:02:02][

### Training

Key code:
* `!dora run dset.n_recordings=1` - trains on the default dataset (gwilliams2022) using only 1 recording.
* `!dora run 'dset.selections=[brennan2019]'` - trains on a selected dataset using all recordings. When run with all subjects, this got stuck on the normalization step for 6 hours.
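
Given that the full run stalled for hours, one option is to cap the wall-clock time of each attempt so a stuck run is stopped automatically. The sketch below uses `subprocess.run` with its `timeout` argument; `run_with_timeout` is a hypothetical helper, and the stand-in commands are harmless Python one-liners rather than real training runs.

```python
import subprocess
import sys
from typing import List


def run_with_timeout(args: List[str], timeout_s: float) -> str:
    """Run a command and report 'ok', 'failed', or 'timeout'."""
    try:
        # subprocess.run kills the child process when the timeout expires
        result = subprocess.run(args, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


# Stand-in commands: one finishes quickly, one sleeps past its limit.
fast = run_with_timeout([sys.executable, "-c", "print('done')"], timeout_s=30)
slow = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.5)
print(fast, slow)
```

For a real run, the argument list would be something like `["dora", "run", "dset.selections=[brennan2019]", "dset.n_recordings=10"]` with a limit chosen from the smaller runs. Note that only the direct child process is killed, so worker processes started by a distributed run may need separate cleanup.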

In [None]:
!dora run 'dset.selections=[brennan2019]' dset.n_recordings=10

Hostname 2e465021a4a5 not defined in /conf/study_paths/study_paths.yaml. Using default paths.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
[2023-11-14 11:28:57,320][bm.train][INFO] - For logs, checkpoints and samples, check /content/drive/MyDrive/11785 _DL/Final_project/Baseline_model/brainmagick/outputs/xps/54af1a5f.
[2023-11-14 11:28:57,320][bm.train][INFO] - Caching intermediate data under /content/drive/MyDrive/11785 _DL/Final_project/Baseline_model/brainmagick/cache.
[11-14 11:28:57][dora.distrib][INFO] - world_size is 1, skipping init.
[11-14 11:28:59][bm.dataset][INFO] - Loading Subjects | 2/10 | 4.78 it/sec
[11-14 11:28:59][bm.dataset][INFO] - Loading Subjects | 4/10 | 6.21 it/sec
[11-14 11:28:59][bm.dataset][INFO] - Loading Subjects | 6/10 | 7.13 it/sec
[11-14 11:29:00][

In [None]:
!dora run 'dset.selections=[brennan2019]' dset.n_recordings=20

Hostname 48d7219ef932 not defined in /conf/study_paths/study_paths.yaml. Using default paths.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
[2023-11-14 19:10:26,258][bm.train][INFO] - For logs, checkpoints and samples, check /content/drive/MyDrive/11785 _DL/Final_project/Baseline_model/brainmagick/outputs/xps/1f75090d.
[2023-11-14 19:10:26,259][bm.train][INFO] - Caching intermediate data under /content/drive/MyDrive/11785 _DL/Final_project/Baseline_model/brainmagick/cache.
[11-14 19:10:26][dora.distrib][INFO] - world_size is 1, skipping init.
[11-14 19:10:48][bm.dataset][INFO] - Loading Subjects | 4/20 | 4.89 it/sec
[11-14 19:10:49][bm.dataset][INFO] - Loading Subjects | 8/20 | 6.56 it/sec
[11-14 19:10:49][bm.dataset][INFO] - Loading Subjects | 12/20 | 7.56 it/sec
[11-14 19:10:49][

### Evaluations

Key code:
* `!dora grid nmi.main_table --dry_run --init` - lists all experiments defined for each study. Each experiment has its own signature, which is passed to `dora run -f` to start that experiment.
* `!dora grid nmi.main_table '!seed' '!features' '!wer_random' --dry_run --init` - shows one signature per study.
* `!dora run -f 6e3bf7d7 -d` - runs the experiment with the given signature (`-f`); `-d` launches it in distributed (DDP) mode.
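
To avoid copying signatures by hand, the `sig` column can be pulled out of the captured grid output. This is a sketch under stated assumptions: it treats the signature as the last 8-character hexadecimal token before the first `|` on each row, which matches the tables printed in this notebook but is not a documented output format. `extract_signatures` is a hypothetical helper.

```python
import re
from typing import List

ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")   # terminal colour codes
SIGNATURE = re.compile(r"\b[0-9a-f]{8}\b")    # assumed shape of a signature


def extract_signatures(grid_output: str) -> List[str]:
    """Return the signature found on each table row of a `dora grid` dry run."""
    signatures = []
    for line in grid_output.splitlines():
        clean = ANSI_ESCAPE.sub("", line)
        if "|" not in clean:
            continue  # not a table row
        meta = clean.split("|", 1)[0]
        matches = SIGNATURE.findall(meta)
        if matches:
            signatures.append(matches[-1])  # sig sits at the end of the Meta columns
    return signatures


sample = (
    "i  name                                sta       sig  sid | e  l | l  b | wer  wer_\n"
    "\x1b[0m0  dse.selections=['audio_mous']     N/A  34219380      | 0    |      |\n"
    "1  dse.selections=['gwilliams2022']  N/A  52345878      | 0    |      |\n"
    "3                                    N/A  6e3bf7d7      | 0    |      |\n"
)
print(extract_signatures(sample))
```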

In [None]:
!dora grid nmi.main_table --dry_run --init

Hostname 2e465021a4a5 not defined in /conf/study_paths/study_paths.yaml. Using default paths.
Monitoring Grid nmi.main_table
Base name:  model=clip_conv
[1m[0m                                                                     Meta                                                                     | trai | vali |    test  [0m[0m
[1m[38;5;245min  name                                                                                                                   sta       sig  sid | e  l | l  b | wer  wer_[0m[0m
[0m 0  dse.force_uid_assignement dse.selections=['audio_mous']                                                                N/A  34219380      | 0    |      |          [0m
[38;5;245m 1  dse.force_uid_assignement dse.selections=['audio_mous'] opt.epochs=1 opt.max_batches=1 tes.wer_random                  N/A  bcd967bc      | 0    |      |          [0m
[0m 2  dse.features=['MelSpectrum'] dse.force_uid_assignement dse.selections=['audio_mous']                   

In [None]:
!dora grid nmi.main_table '!seed' '!features' '!wer_random' --dry_run --init

Hostname 2e465021a4a5 not defined in /conf/study_paths/study_paths.yaml. Using default paths.
Monitoring Grid nmi.main_table
Base name:  model=clip_conv
[1m[0m                                     Meta                                      | trai | vali |    test  [0m[0m
[1m[38;5;245mi  name                                                     sta       sig  sid | e  l | l  b | wer  wer_[0m[0m
[0m0  dse.force_uid_assignement dse.selections=['audio_mous']  N/A  34219380      | 0    |      |          [0m
[38;5;245m1  dse.selections=['gwilliams2022']                         N/A  52345878      | 0    |      |          [0m
[0m2  dse.selections=['broderick2019'] tes.wer_recordings=100  N/A  557f5f8a      | 0    |      |          [0m
[38;5;245m3                                                           N/A  6e3bf7d7      | 0    |      |          [0m


In [None]:
!dora run -f 6e3bf7d7

Hostname 2e465021a4a5 not defined in /conf/study_paths/study_paths.yaml. Using default paths.
Parser Injecting argv ['model=clip_conv', 'dset.selections=["brennan2019"]', 'seed=2036'] from sig 6e3bf7d7
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
[2023-11-14 09:21:47,196][bm.train][INFO] - For logs, checkpoints and samples, check /content/drive/MyDrive/11785 _DL/Final_project/Baseline_model/brainmagick/outputs/xps/6e3bf7d7.
[2023-11-14 09:21:47,197][bm.train][INFO] - Caching intermediate data under /content/drive/MyDrive/11785 _DL/Final_project/Baseline_model/brainmagick/cache.
[11-14 09:21:47][dora.distrib][INFO] - world_size is 1, skipping init.
[11-14 09:21:50][bm.dataset][INFO] - Loading Subjects | 6/33 | 9.14 it/sec
[11-14 09:21:50][bm.dataset][INFO] - Loading Subjects | 12/33 | 10.14 it/sec
[11-14 09:21

In [None]:
!dora run -f 6e3bf7d7 -d

Hostname d236674f7fcb not defined in /conf/study_paths/study_paths.yaml. Using default paths.
Parser Injecting argv ['model=clip_conv', 'dset.selections=["gwilliams2022"]', 'seed=2036'] from sig 6e3bf7d7
Executor: Starting 1 worker processes for DDP.
Hostname d236674f7fcb not defined in /conf/study_paths/study_paths.yaml. Using default paths.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
[2023-11-11 22:00:56,662][bm.train][INFO] - For logs, checkpoints and samples, check /content/drive/MyDrive/11785 _DL/Final_project/Baseline_model/brainmagick/outputs/xps/6e3bf7d7.
[2023-11-11 22:00:56,662][bm.train][INFO] - Caching intermediate data under /content/drive/MyDrive/11785 _DL/Final_project/Baseline_model/brainmagick/cache.
[11-11 22:00:56][dora.distrib][INFO] - world_size is 1, skipping init.
  raw = read_raw_bids(bids_path)  # FIXME this is NOT a lazy read
  raw = read

KeyboardInterrupt: ignored

In [None]:
!python -m scripts.run_eval_probs grid_name="main_table"

Hostname d236674f7fcb not defined in /conf/study_paths/study_paths.yaml. Using default paths.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/content/drive/MyDrive/11785 _DL/Final_project/Baseline_model/brainmagick/scripts/run_eval_probs.py", line 471, in <module>
    assert grid_dir.exists(), f"{grid_dir} does not exists"
AssertionError: /content/drive/MyDrive/11785 _DL/Final_project/Baseline_model/brainmagick/outputs/grids/main_table does not exists
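
The assertion above fails because the grid directory does not exist yet. A small pre-check makes that condition visible before launching the evaluation script. This sketch assumes the `outputs/grids/<grid_name>` layout inferred from the error message; `grid_dir_for` and `grid_is_ready` are hypothetical helpers, not part of the repository.

```python
from pathlib import Path


def grid_dir_for(grid_name: str, outputs_root: Path = Path("outputs")) -> Path:
    """Directory the evaluation script appears to expect (layout inferred from the error)."""
    return outputs_root / "grids" / grid_name


def grid_is_ready(grid_name: str, outputs_root: Path = Path("outputs")) -> bool:
    """True if the grid directory already exists under the outputs folder."""
    return grid_dir_for(grid_name, outputs_root).is_dir()


if grid_is_ready("main_table"):
    print("Grid directory found; the evaluation script can be started.")
else:
    print(f"{grid_dir_for('main_table')} is missing; the grid has not produced outputs yet.")
```

Run from the repository root, this reports the same missing path as the traceback without raising.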


### Tests

In [None]:
#!pytest bm

### Visualization of metrics

In [None]:
#!pip install hiplot

In [None]:
#!python -m hiplot dora.hiplot.load --port=5005