## Overview

This notebook downloads the artifacts produced by `CAFA5Pipeline.ipynb`. The section numbering here matches the numbering in `CAFA5Pipeline.ipynb` exactly, so running a step here is equivalent to running the corresponding pipeline cells to compute the results from scratch. Sections that are missing here correspond to computations that are easy and fast, so their results are not kept in cloud storage.

### 1.4. Get external data


In [None]:
# -p creates parent directories and does not fail if they already exist
!mkdir -p temporal/labels

!wget https://storage.yandexcloud.net/cafa5embeds/temporal/cafa-terms-diff.tsv -O temporal/cafa-terms-diff.tsv
!wget https://storage.yandexcloud.net/cafa5embeds/temporal/prop_quickgo51.tsv -O temporal/prop_quickgo51.tsv

!wget https://storage.yandexcloud.net/cafa5embeds/temporal/labels/prop_test_leak_no_dup.tsv -O temporal/labels/prop_test_leak_no_dup.tsv
!wget https://storage.yandexcloud.net/cafa5embeds/temporal/labels/prop_test_no_kaggle.tsv -O temporal/labels/prop_test_no_kaggle.tsv
!wget https://storage.yandexcloud.net/cafa5embeds/temporal/labels/prop_train_no_kaggle.tsv -O temporal/labels/prop_train_no_kaggle.tsv
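Re-running the cell above downloads every file again. A minimal Python sketch of a download helper that skips files already on disk is shown below. The names `artifact_url` and `fetch` are hypothetical and not part of the pipeline; the bucket root is the one used in the `wget` commands, and the `download` parameter exists only so the helper can be exercised without network access.

```python
import os
import urllib.request

# Bucket root taken from the wget commands in this notebook
BASE_URL = "https://storage.yandexcloud.net/cafa5embeds"


def artifact_url(rel_path, base_url=BASE_URL):
    """Join the bucket root and a relative path such as 'temporal/prop_quickgo51.tsv'."""
    return f"{base_url}/{rel_path.lstrip('/')}"


def fetch(rel_path, dest=None, download=urllib.request.urlretrieve):
    """Download rel_path to dest unless a non-empty file is already there.

    dest defaults to the same relative path, mirroring the layout used above.
    Returns True if a download was attempted, False if the file was skipped.
    """
    dest = dest or rel_path
    if os.path.exists(dest) and os.path.getsize(dest) > 0:
        return False  # already present, nothing to do
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    download(artifact_url(rel_path), dest)
    return True
```

For example, `fetch("temporal/labels/prop_test_no_kaggle.tsv")` would place the file at the same relative path as the corresponding `wget` line.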

## 2. Embeddings

### 2.1. T5 pretrained inference

In [None]:
!mkdir -p embeds/t5

for file in ['train_embeds.npy', 'test_embeds.npy', 'train_ids.npy', 'test_ids.npy']:
    !wget https://storage.yandexcloud.net/cafa5embeds/embeds/t5/{file} -O embeds/t5/{file}

### 2.2. ESM pretrained inference

In [None]:
# -p avoids the "File exists" error when embeds/ was already created in 2.1
!mkdir -p embeds/esm_small

for file in ['train_embeds.npy', 'test_embeds.npy', 'train_ids.npy', 'test_ids.npy']:
    !wget https://storage.yandexcloud.net/cafa5embeds/embeds/esm_small/{file} -O embeds/esm_small/{file}
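Because `wget -O` opens the output file before the transfer starts, a failed download can leave a zero-byte file behind. The sketch below checks that the eight embedding files from sections 2.1 and 2.2 exist and are non-empty. The function name `missing_or_empty` is hypothetical; the directory and file names are copied from the cells above.

```python
from pathlib import Path

# File and directory names as used in sections 2.1 and 2.2
EMBED_FILES = ["train_embeds.npy", "test_embeds.npy", "train_ids.npy", "test_ids.npy"]
EMBED_DIRS = ["embeds/t5", "embeds/esm_small"]


def missing_or_empty(root="."):
    """Return the expected embedding files that are absent or zero bytes under root."""
    problems = []
    for folder in EMBED_DIRS:
        for name in EMBED_FILES:
            path = Path(root) / folder / name
            if not path.is_file() or path.stat().st_size == 0:
                problems.append((Path(folder) / name).as_posix())
    return problems
```

An empty list from `missing_or_empty()` means all eight files are present. It does not verify that their contents are valid arrays.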

## 3. Base models

In [None]:
!mkdir -p models
# Files stored for every base model: model_0.pkl ... model_4.pkl plus OOF and test predictions
files = [f'model_{x}.pkl' for x in range(5)] + ['oof_pred.pkl', 'test_pred.pkl']

### 3.1. Train and inference py-boost models

In [None]:
for model in ['pb_t54500_cond', 'pb_t54500_raw', 'pb_t5esm4500_cond', 'pb_t5esm4500_raw']:
    !mkdir -p models/{model}
    for file in files:
        !wget https://storage.yandexcloud.net/cafa5embeds/boostpreds/{model}/{file} -O models/{model}/{file}

### 3.2. Train and inference logreg models

In [None]:
for model in ['lin_t5_cond', 'lin_t5_raw']:
    !mkdir -p models/{model}
    for file in files:
        !wget https://storage.yandexcloud.net/cafa5embeds/linpreds/{model}/{file} -O models/{model}/{file}
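The loops in 3.1 and 3.2 differ only in the bucket prefix (`boostpreds` or `linpreds`). As an illustration, the sketch below lists every source URL and local destination for the base models from one table, which makes it easy to see what will be fetched before downloading anything. The names `MODEL_GROUPS` and `base_model_plan` are hypothetical; the prefixes, model names, and file names are copied from the cells above.

```python
BASE_URL = "https://storage.yandexcloud.net/cafa5embeds"
FILES = [f"model_{x}.pkl" for x in range(5)] + ["oof_pred.pkl", "test_pred.pkl"]

# Bucket prefix -> model directories stored under it (from sections 3.1 and 3.2)
MODEL_GROUPS = {
    "boostpreds": ["pb_t54500_cond", "pb_t54500_raw", "pb_t5esm4500_cond", "pb_t5esm4500_raw"],
    "linpreds": ["lin_t5_cond", "lin_t5_raw"],
}


def base_model_plan():
    """Return (url, local_path) pairs for every base-model file."""
    plan = []
    for prefix, models in MODEL_GROUPS.items():
        for model in models:
            for name in FILES:
                url = f"{BASE_URL}/{prefix}/{model}/{name}"
                plan.append((url, f"models/{model}/{name}"))
    return plan
```

With six models and seven files each, the plan contains 42 entries.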

### 3.3. Train and inference NN models

In [None]:
!mkdir -p models/nn_serg
# 12 x 5 = 60 checkpoints, named model_{i}_{j}.pt
for i in range(12):
    for j in range(5):
        !wget https://storage.yandexcloud.net/cafa5embeds/nn_models_upd/model_{i}_{j}.pt -O models/nn_serg/model_{i}_{j}.pt

!wget https://storage.yandexcloud.net/cafa5embeds/nn_models_upd/pytorch-keras-etc-3-blend-cafa-metric-etc.pkl -O models/nn_serg/pytorch-keras-etc-3-blend-cafa-metric-etc.pkl
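With 60 separate downloads, an interrupted run can leave gaps that are hard to spot. The sketch below reports which `(i, j)` index pairs have no checkpoint file in the target directory. The function name `missing_checkpoints` and its parameters are hypothetical; the file naming pattern and the 12 x 5 ranges come from the loop above.

```python
import re
from itertools import product
from pathlib import Path

# Matches names such as model_3_2.pt and captures both indices
CKPT_PATTERN = re.compile(r"^model_(\d+)_(\d+)\.pt$")


def missing_checkpoints(folder="models/nn_serg", n_outer=12, n_inner=5):
    """Return the sorted (i, j) pairs with no model_{i}_{j}.pt file in folder."""
    found = set()
    folder = Path(folder)
    if folder.is_dir():
        for path in folder.iterdir():
            match = CKPT_PATTERN.match(path.name)
            if match:
                found.add((int(match.group(1)), int(match.group(2))))
    expected = set(product(range(n_outer), range(n_inner)))
    return sorted(expected - found)
```

The returned pairs can be fed back into the same `wget` command to fetch only the files that are still missing.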

## 4. Final model

### 4.1. Train GCN models

In [None]:
!mkdir -p models/gcn

# One checkpoint per namespace directory: bp, mf, cc
for ns in ['bp', 'mf', 'cc']:
    !mkdir -p models/gcn/{ns}
    !wget https://storage.yandexcloud.net/cafa5embeds/gcn/{ns}/checkpoint.pth -O models/gcn/{ns}/checkpoint.pth

### 4.2. Inference GCN models and TTA

In [None]:
# models/gcn is created in 4.1; -p keeps this cell safe to run on its own
!mkdir -p models/gcn
for i in range(4):
    !wget https://storage.yandexcloud.net/cafa5embeds/gcn/pred_tta_{i}.tsv -O models/gcn/pred_tta_{i}.tsv