# Push final models to Hugging Face

**Purpose:** This script pushes the final models to the Hugging Face Model Hub. The upload can also be done manually.

**Dependency:** Before running this script, the final models must be trained (`Train_final_models.ipynb`), downloaded from Weights & Biases, and placed in the `TRIDENT` folder.

**Consecutive scripts:** After running this script, the following script may be executed: `download_wandb_artifacts.ipynb`.

In [1]:
import transformers
from transformers import AutoModel, AutoTokenizer

import torch

from development_utils.training.Build_Pytorch_model import TRIDENT, DNN_module

In [2]:
model_path = '../TRIDENT/'
version = 'EC50_fish'
name = f'final_model_{version}'

In [3]:
onehotencodinglengths = {
    'EC50_algae': 1,
    'EC10_algae': 1,
    'EC50EC10_algae': 2, 
    'EC50_invertebrates': 2,
    'EC10_invertebrates': 6,
    'EC50EC10_invertebrates': 8,
    'EC50_fish': 1,
    'EC10_fish': 7,
    'EC50EC10_fish': 9
}

In [4]:
chemberta = AutoModel.from_pretrained('seyonec/PubChem10M_SMILES_BPE_450k')
tokenizer = AutoTokenizer.from_pretrained('seyonec/PubChem10M_SMILES_BPE_450k')

Some weights of the model checkpoint at seyonec/PubChem10M_SMILES_BPE_450k were not used when initializing RobertaModel: ['lm_head.layer_norm.bias', 'lm_head.dense.weight', 'lm_head.decoder.bias', 'lm_head.decoder.weight', 'lm_head.layer_norm.weight', 'lm_head.dense.bias', 'lm_head.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [6]:
dnn = DNN_module(one_hot_enc_len=onehotencodinglengths[version],
                n_hidden_layers=3,
                layer_sizes=[700,500,300],
                dropout=0.2)

model = TRIDENT(chemberta, dnn)

In [7]:
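def load_ckp(checkpoint_dir, model):
    """Load saved DNN and RoBERTa weights into `model`.

    `checkpoint_dir` is a path prefix, not a directory; suffixes are appended to it.
    """
    # map_location='cpu' allows GPU-saved checkpoints to load without a GPU
    checkpoint_dnn = torch.load(checkpoint_dir+'_dnn_saved_weights.pt', map_location='cpu')
    checkpoint_roberta = torch.load(checkpoint_dir+'_roberta_saved_weights.pt', map_location='cpu')
    model.dnn.load_state_dict(checkpoint_dnn)
    model.roberta.load_state_dict(checkpoint_roberta)
    return model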
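
Because the suffixes are joined by plain string concatenation, `load_ckp` expects two files per model that share one prefix. The sketch below lists those files and reports any that are missing, which can be checked before a model is built. The helper names `checkpoint_paths` and `missing_checkpoints` are hypothetical and not part of the repository; the `final_model_{version}` naming is taken from the cells above.

```python
from pathlib import Path

# Suffixes taken from load_ckp above.
SUFFIXES = ("_dnn_saved_weights.pt", "_roberta_saved_weights.pt")


def checkpoint_paths(prefix):
    """Return the two files load_ckp would read for a given prefix."""
    return [Path(prefix + suffix) for suffix in SUFFIXES]


def missing_checkpoints(model_path, versions):
    """List the expected checkpoint files that do not exist on disk."""
    missing = []
    for version in versions:
        # Same prefix construction as model_path + name in the notebook.
        prefix = f"{model_path}final_model_{version}"
        missing += [p for p in checkpoint_paths(prefix) if not p.is_file()]
    return missing
```

Calling `missing_checkpoints('../TRIDENT/', ['EC50_fish'])` returns an empty list when both weight files for that version are in place.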

In [8]:
model = load_ckp(model_path+name, model)
model.eval()

TRIDENT(
  (roberta): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(52000, 768, padding_idx=1)
      (position_embeddings): Embedding(512, 768, padding_idx=1)
      (token_type_embeddings): Embedding(1, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0): RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), e

## Save to hub

In [5]:
from huggingface_hub import notebook_login

notebook_login()

Token is valid.
Your token has been saved in your configured git credential helpers (manager-core).
Your token has been saved to C:\Users\Styrbjörn Käll\.huggingface\token
Login successful


In [12]:
model.roberta.push_to_hub(version)

CommitInfo(commit_url='https://huggingface.co/StyrbjornKall/EC50_fish/commit/bfb85c5996e9eb17aec22c12b424dd725456607f', commit_message='Upload model', commit_description='', oid='bfb85c5996e9eb17aec22c12b424dd725456607f', pr_url=None, pr_revision=None, pr_num=None)

In [13]:
tokenizer.push_to_hub(version)

CommitInfo(commit_url='https://huggingface.co/StyrbjornKall/EC50_fish/commit/dbb4bd954205db9ec3cf06c61f17267cdf60217c', commit_message='Upload tokenizer', commit_description='', oid='dbb4bd954205db9ec3cf06c61f17267cdf60217c', pr_url=None, pr_revision=None, pr_num=None)
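
Note that the two calls above upload only the fine-tuned encoder (`model.roberta`) and the tokenizer; the DNN weights are not pushed by these cells. As a sketch of how the uploaded encoder could be loaded back, assuming the repository id has the form `<user>/<version>` as in the commit URLs above (the helper names `hub_repo_id` and `load_pushed_encoder` are hypothetical):

```python
def hub_repo_id(user, version):
    """Build a Hub repository id such as 'StyrbjornKall/EC50_fish'."""
    return f"{user}/{version}"


def load_pushed_encoder(user, version):
    """Download the encoder and tokenizer that were pushed for `version`."""
    # Imported lazily so that hub_repo_id can be used without transformers.
    from transformers import AutoModel, AutoTokenizer

    repo_id = hub_repo_id(user, version)
    encoder = AutoModel.from_pretrained(repo_id)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    return encoder, tokenizer
```

`load_pushed_encoder` requires network access and, for private repositories, a valid login.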

## Push all

In [8]:
model_path = '../TRIDENT/'

onehotencodinglengths = {
    'EC50_algae': 1,
    'EC10_algae': 1,
    'EC50EC10_algae': 2,
    'EC50_invertebrates': 2,
    'EC10_invertebrates': 6,
    'EC50EC10_invertebrates': 8,
    'EC50_fish': 1,
    'EC10_fish': 7,
    'EC50EC10_fish': 9
}

versions = ['EC50_algae', 'EC10_algae', 'EC50EC10_algae',
            'EC50_fish', 'EC10_fish', 'EC50EC10_fish',
            'EC50_invertebrates', 'EC10_invertebrates', 'EC50EC10_invertebrates']

for version in versions:
    print(f'Pushing: {version}\n')
    name = f'final_model_{version}'

    # Reload the pretrained encoder and tokenizer for every version
    chemberta = AutoModel.from_pretrained('seyonec/PubChem10M_SMILES_BPE_450k')
    tokenizer = AutoTokenizer.from_pretrained('seyonec/PubChem10M_SMILES_BPE_450k')

    dnn = DNN_module(one_hot_enc_len=onehotencodinglengths[version],
                    n_hidden_layers=3,
                    layer_sizes=[700,500,300],
                    dropout=0.2)

    model = TRIDENT(chemberta, dnn)

    model = load_ckp(model_path+name, model)
    model.eval()

    # Upload the fine-tuned encoder and the tokenizer
    model.roberta.push_to_hub(version)
    tokenizer.push_to_hub(version)

Pushing: EC50_algae

Pushing: EC10_algae

Pushing: EC50EC10_algae

Pushing: EC50_fish

Pushing: EC10_fish

Pushing: EC50EC10_fish

Pushing: EC50_invertebrates

Pushing: EC10_invertebrates

Pushing: EC50EC10_invertebrates

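
A loop like the one above stops at the first upload that raises an exception, leaving the remaining versions unpushed. As a hedged sketch, each iteration could be wrapped so that a failure is recorded and the other versions are still attempted. `push_all` and `push_one` are hypothetical names; `push_one` stands in for the loop body (build the model, load the checkpoint, call `push_to_hub`).

```python
def push_all(versions, push_one):
    """Call push_one(version) for each version and collect the outcome."""
    results = {}
    for version in versions:
        print(f"Pushing: {version}")
        try:
            push_one(version)
            results[version] = "ok"
        except Exception as err:
            # Record the failure and continue with the next version.
            results[version] = f"failed: {err}"
    return results
```

The returned dictionary maps each version to either `"ok"` or a short failure message, so failed uploads can be retried individually.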