# Instructions
Please refer to the [infer_quickstart.ipynb](https://github.com/bowang-lab/ECG-FM/blob/main/notebooks/infer_quickstart.ipynb) notebook if you haven't already. This tutorial assumes you have already gone through the installation and model setup.

This tutorial focuses on performing inference through the [fairseq_signals](https://github.com/Jwoo5/fairseq-signals) command-line functionality, which is useful for the large-scale computation and storage of results.

In [None]:
import os
import pandas as pd

root = os.path.dirname(os.getcwd())

fairseq_signals_root = # TODO
fairseq_signals_root = fairseq_signals_root.rstrip('/')

In [None]:
label_def = pd.read_csv(
    os.path.join(root, 'data/mimic_iv_ecg/labels/label_def.csv'),
    index_col='name',
)
label_names = label_def.index
label_names

## Data manifest

The segmented split must be saved with absolute file paths, so we will update the current relative file paths accordingly.

In [None]:
segmented_split = pd.read_csv(
    os.path.join(root, 'data/code_15/segmented_split_incomplete.csv'),
    index_col='idx',
)
segmented_split['path'] = (root + '/data/code_15/segmented/') + segmented_split['path']
segmented_split.to_csv(os.path.join(root, 'data/code_15/segmented_split.csv'))

In [None]:
assert os.path.isfile(os.path.join(root, 'data/code_15/segmented_split.csv'))
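
Before generating the manifest, it can be worth confirming that every path in the split really is absolute. The sketch below uses a hypothetical helper, `find_relative_paths`, which is not part of ECG-FM or fairseq-signals, and made-up example paths on a POSIX-style filesystem.

```python
import os

def find_relative_paths(paths):
    """Return the entries of `paths` that are not absolute file paths."""
    return [p for p in paths if not os.path.isabs(p)]

# Made-up example paths; in this notebook, pass segmented_split['path'] instead
example_paths = ['/data/code_15/segmented/a_0.mat', 'b_0.mat']
relative = find_relative_paths(example_paths)
print(relative)
```

An empty list means all paths are absolute and the split is ready for `manifests.py`.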

Run the following commands to generate the `test.tsv` file used for inference.

In [None]:
print(f"""cd {fairseq_signals_root}/scripts/preprocess
python manifests.py \\
    --split_file_paths "{root}/data/code_15/segmented_split.csv" \\
    --save_dir "{root}/data/manifests/code_15_subset10/"
""")

In [None]:
assert os.path.isfile(os.path.join(root, 'data/manifests/code_15_subset10/test.tsv'))

# Inference

Inside our environment, we can run the following command using Hydra's command-line interface to extract the logits/targets, as well as the precursor results needed to obtain the embeddings and saliency maps.

The [embs.py](https://github.com/bowang-lab/ECG-FM/blob/main/scripts/embs.py) and [saliency.py](https://github.com/bowang-lab/ECG-FM/blob/main/scripts/saliency.py) scripts can then be used to convert the result precursors into a more final form. See `infer_quickstart.ipynb` for visualization.

In [None]:
# Backslashes are doubled so that every printed line ends in a shell line continuation
print(f"""fairseq-hydra-inference \\
    task.data="{root}/data/manifests/code_15_subset10/" \\
    common_eval.path="{root}/ckpts/mimic_iv_ecg_finetuned.pt" \\
    common_eval.extract=[output,encoder_out,saliency] \\
    common_eval.results_path="{root}/outputs" \\
    model.num_labels={len(label_names)} \\
    dataset.valid_subset=test \\
    dataset.batch_size=10 \\
    dataset.num_workers=3 \\
    dataset.disable_validation=false \\
    distributed_training.distributed_world_size=1 \\
    distributed_training.find_unused_parameters=True \\
    --config-dir "{root}/ckpts" \\
    --config-name mimic_iv_ecg_finetuned
""")

In [None]:
assert os.path.isfile(os.path.join(root, 'outputs/outputs_test.npy'))
assert os.path.isfile(os.path.join(root, 'outputs/outputs_test_header.pkl'))
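
Long commands with many `key=value` overrides are easy to break with a missing line continuation or an unquoted path. As a hedged alternative to the printed command above, the sketch below assembles the argument list in Python and lets `shlex.join` handle the quoting. `build_inference_command` is a hypothetical helper, not part of fairseq-signals, `example_root` is a placeholder path, and only a few of the overrides are shown.

```python
import shlex

def build_inference_command(overrides, config_dir, config_name):
    """Assemble a shell-safe fairseq-hydra-inference command string.

    Hypothetical helper for illustration; not part of fairseq-signals.
    """
    args = ['fairseq-hydra-inference']
    # Each override becomes a single key=value argument
    args += [f'{key}={value}' for key, value in overrides.items()]
    args += ['--config-dir', config_dir, '--config-name', config_name]
    # shlex.join quotes any argument containing shell-special characters
    return shlex.join(args)

example_root = '/path/to/ECG-FM'  # placeholder; use `root` in this notebook
command = build_inference_command(
    {
        'task.data': f'{example_root}/data/manifests/code_15_subset10/',
        'common_eval.extract': '[output,encoder_out,saliency]',
        'dataset.valid_subset': 'test',
    },
    config_dir=f'{example_root}/ckpts',
    config_name='mimic_iv_ecg_finetuned',
)
print(command)
```

The result is a single-line command that can be pasted into a shell. The bracketed override is quoted, so the shell passes it through to Hydra unchanged.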

## Loading logits

In [None]:
import torch

from fairseq_signals.utils.store import MemmapReader

In [None]:
# Load the array of computed logits
logits = MemmapReader.from_header(
    os.path.join(root, 'outputs/outputs_test.npy')
)[:]
logits.shape

In [None]:
# Construct predictions from logits
pred = pd.DataFrame(
    torch.sigmoid(torch.tensor(logits)).numpy(),
    columns=label_names,
)

# Join in sample information
pred = segmented_split.reset_index().join(pred, how='left').set_index('idx')
pred
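
The probabilities in `pred` can be turned into binary label calls by thresholding. The sketch below shows the sigmoid-then-threshold step on a small made-up logit array using NumPy. `binarize_logits` is a hypothetical helper, and the threshold of 0.5 is an assumption for illustration only; in practice, per-label thresholds are usually tuned on a validation set.

```python
import numpy as np

def binarize_logits(logits, threshold=0.5):
    """Apply a sigmoid to logits and threshold the resulting probabilities."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=np.float64)))
    # 1 where the probability reaches the threshold, 0 otherwise
    labels = (probs >= threshold).astype(int)
    return probs, labels

# Made-up logits for 2 samples and 2 labels
example_logits = np.array([[-1.0, 2.0], [-3.0, 0.5]])
probs, labels = binarize_logits(example_logits)
print(labels)
```

In this notebook, the same comparison could be applied directly to the probability columns, e.g. `pred[label_names] >= 0.5`.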

# Pretrained embeddings
If you want to obtain pretrained embeddings (e.g., for use as a feature set or for linear probing), the simplest way is to run `fairseq-hydra-train` to convert a pretrained model into the finetuning model format, which can then be run through `fairseq-hydra-validate` with `common_eval.extract=[encoder_out]`. Include the following arguments to ensure that effectively no training occurs:
```
    optimization.lr=[1e-25] \
    optimization.max_update=1 \
    checkpoint.save_interval_updates=1 \
```
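
To see why a learning rate of `1e-25` amounts to no training, consider a single parameter update in float32. The sketch below assumes a plain SGD-style step (`weight - lr * grad`) with made-up values; the optimizer actually configured for the model may compute its step differently, but the step size is still scaled by the learning rate.

```python
import numpy as np

# Made-up float32 parameter and gradient values
weight = np.float32(0.5)
grad = np.float32(10.0)
lr = np.float32(1e-25)

# The step (1e-24) is far below float32 resolution near 0.5,
# so the subtraction rounds back to the original weight
updated = weight - lr * grad
print(updated == weight)
```

With `optimization.max_update=1`, only one such step is taken before the checkpoint is saved.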