# Upload Hugging Face Models to the Hub

In this notebook, a set of models is uploaded to the Hugging Face Hub. Because `push_to_hub` fails with a `config_class` error, each model is uploaded as a folder instead.

**Note**: The "UPC-HLE/fc-{path}-{model_name}" repositories have a model card only because they were previously uploaded using `push_to_hub`.

This is the error raised by `push_to_hub`:

```text
ValueError: The model class you are passing has a `config_class` attribute that is not consistent with the config class you passed (model has <class 'transformers_modules.jinaai.jina-reranker-v2-base-multilingual.126747772a932960028d9f4dc93bd5d9c4869be4.configuration_xlm_roberta.XLMRobertaFlashConfig'> and you passed <class 'transformers_modules.UPC-HLE.fc-monolingual_spa_reranker-jina-v2-base-multilingual.fe03bc8de2620aa865b72e3e84a00a91dddca6a6.configuration_xlm_roberta.XLMRobertaFlashConfig'>. Fix one of those so they match!
```
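The helper `upload_model_to_hub` from `src.utils` is not shown in this notebook. As a rough sketch only, a folder-based upload could wrap `HfApi.create_repo` and `HfApi.upload_folder` from `huggingface_hub`. The function name `upload_folder_as_model` and its `api` parameter are hypothetical and chosen for illustration; the real helper may differ.

```python
def upload_folder_as_model(local_dir, repo_id, commit_message,
                           private=True, exist_ok=True, api=None):
    """Hypothetical sketch: push a saved model directory to the Hub as plain files."""
    if api is None:
        # Imported lazily so the function can be exercised with a stand-in client.
        from huggingface_hub import HfApi
        api = HfApi()

    # Create the repository; with exist_ok=True an existing repo is not an error.
    api.create_repo(repo_id=repo_id, repo_type="model",
                    private=private, exist_ok=exist_ok)

    # Upload every file in the directory as a single commit.
    api.upload_folder(repo_id=repo_id, folder_path=local_dir,
                      repo_type="model", commit_message=commit_message)
    return repo_id
```

Because the files are copied as they are on disk, no model class is instantiated during the upload, so the `config_class` consistency check should not come into play.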


In [1]:
from huggingface_hub import notebook_login
from src.utils import upload_model_to_hub

notebook_login()


In [None]:

import os
from sentence_transformers import SentenceTransformer  # used in the test cell below

models_path = "output/official/contrastive/monolingual/20241102-171126"
model_name = "jina-v2-base-multilingual"

# The run directory doubles as the commit message.
commit_message = models_path

# Each subdirectory of models_path holds one fine-tuned model.
for path in os.listdir(models_path):
    local_model_path = os.path.join(models_path, path)
    if os.path.isdir(local_model_path):
        hf_model_path = f"UPC-HLE/fc-{path}-{model_name}"

        upload_model_to_hub(local_model_path, hf_model_path, commit_message, private=True, exist_ok=True)
        print(f"Uploaded {local_model_path} to {hf_model_path}\n")




2024-12-08 21:06:13,544 - INFO - Model uploaded to UPC-HLE/fc-monolingual_eng_reranker-jina-v2-base-multilingual


Uploaded output/official/contrastive/monolingual/20241102-171126/monolingual_eng_reranker to UPC-HLE/fc-monolingual_eng_reranker-jina-v2-base-multilingual




2024-12-08 21:06:35,011 - INFO - Model uploaded to UPC-HLE/fc-monolingual_deu_reranker-jina-v2-base-multilingual


Uploaded output/official/contrastive/monolingual/20241102-171126/monolingual_deu_reranker to UPC-HLE/fc-monolingual_deu_reranker-jina-v2-base-multilingual




2024-12-08 21:06:52,330 - INFO - Model uploaded to UPC-HLE/fc-monolingual_msa_reranker-jina-v2-base-multilingual


Uploaded output/official/contrastive/monolingual/20241102-171126/monolingual_msa_reranker to UPC-HLE/fc-monolingual_msa_reranker-jina-v2-base-multilingual




2024-12-08 21:07:08,705 - INFO - Model uploaded to UPC-HLE/fc-monolingual_ara_reranker-jina-v2-base-multilingual


Uploaded output/official/contrastive/monolingual/20241102-171126/monolingual_ara_reranker to UPC-HLE/fc-monolingual_ara_reranker-jina-v2-base-multilingual




2024-12-08 21:07:29,008 - INFO - Model uploaded to UPC-HLE/fc-monolingual_spa_reranker-jina-v2-base-multilingual


Uploaded output/official/contrastive/monolingual/20241102-171126/monolingual_spa_reranker to UPC-HLE/fc-monolingual_spa_reranker-jina-v2-base-multilingual




2024-12-08 21:07:45,152 - INFO - Model uploaded to UPC-HLE/fc-monolingual_fra_reranker-jina-v2-base-multilingual


Uploaded output/official/contrastive/monolingual/20241102-171126/monolingual_fra_reranker to UPC-HLE/fc-monolingual_fra_reranker-jina-v2-base-multilingual




2024-12-08 21:08:00,775 - INFO - Model uploaded to UPC-HLE/fc-monolingual_tha_reranker-jina-v2-base-multilingual


Uploaded output/official/contrastive/monolingual/20241102-171126/monolingual_tha_reranker to UPC-HLE/fc-monolingual_tha_reranker-jina-v2-base-multilingual




2024-12-08 21:08:17,063 - INFO - Model uploaded to UPC-HLE/fc-monolingual_por_reranker-jina-v2-base-multilingual


Uploaded output/official/contrastive/monolingual/20241102-171126/monolingual_por_reranker to UPC-HLE/fc-monolingual_por_reranker-jina-v2-base-multilingual
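Before running the upload loop, it can help to preview which repositories it would create. The sketch below mirrors the `UPC-HLE/fc-{path}-{model_name}` naming used above without contacting the Hub; the helper name `plan_uploads` and its `org` parameter are hypothetical.

```python
import os

def plan_uploads(models_path, model_name, org="UPC-HLE"):
    """Return (local_dir, repo_id) pairs for each model subdirectory, without uploading."""
    plan = []
    # Sorted so the preview order is stable across runs.
    for name in sorted(os.listdir(models_path)):
        local_dir = os.path.join(models_path, name)
        if os.path.isdir(local_dir):  # skip loose files such as logs
            plan.append((local_dir, f"{org}/fc-{name}-{model_name}"))
    return plan
```

Calling `plan_uploads(models_path, model_name)` and printing the pairs gives a dry run of the loop.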



# Test the model you choose

Select a model and check that the copy loaded from the Hub produces the same similarity score as the local checkpoint.

In [13]:
import torch

hf_model_path = "UPC-HLE/fc-monolingual_eng_reranker-jina-v2-base-multilingual"
local_model_path = "output/official/contrastive/monolingual/20241102-171126/monolingual_eng_reranker"

# Load the same model from the local checkpoint and from the Hub.
st0 = SentenceTransformer(local_model_path, trust_remote_code=True)
st1 = SentenceTransformer(hf_model_path, trust_remote_code=True)

sen1 = "Donald Trump is the president of the United States of America."
sen2 = "The president of the United States of America is DT."

sim1 = st0.similarity(st0.encode(sen1), st0.encode(sen2))
sim2 = st1.similarity(st1.encode(sen1), st1.encode(sen2))

print(sim1)
print(sim2)

# Compare with a tolerance: exact float equality can fail on harmless numerical noise.
assert torch.allclose(sim1, sim2), "The models are not equal"

2024-12-08 21:30:47,748 - INFO - Use pytorch device_name: cuda
2024-12-08 21:30:47,749 - INFO - Load pretrained SentenceTransformer: output/official/contrastive/monolingual/20241102-171126/monolingual_eng_reranker
Some weights of XLMRobertaModel were not initialized from the model checkpoint at output/official/contrastive/monolingual/20241102-171126/monolingual_eng_reranker and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
2024-12-08 21:30:48,965 - INFO - Use pytorch device_name: cuda
2024-12-08 21:30:48,966 - INFO - Load pretrained SentenceTransformer: UPC-HLE/fc-monolingual_eng_reranker-jina-v2-base-multilingual
Some weights of XLMRobertaModel were not initialized from the model checkpoint at UPC-HLE/fc-monolingual_eng_reranker-jina-v2-base-multilingual and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight


tensor([[0.9377]])
tensor([[0.9377]])
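A single similarity score is a fairly weak equality check, since two different models could agree on one sentence pair. A slightly stricter sketch compares the embedding vectors themselves over several sentences. The helper `embeddings_match` is hypothetical; it only assumes that each encoder returns a one-dimensional sequence of floats per sentence, as `SentenceTransformer.encode` does for a single string.

```python
import math

def embeddings_match(encode_a, encode_b, sentences, abs_tol=1e-5):
    """Return True if both encoders give element-wise close vectors for every sentence."""
    for sentence in sentences:
        vec_a = [float(x) for x in encode_a(sentence)]
        vec_b = [float(x) for x in encode_b(sentence)]
        # Different dimensionality means the models cannot be the same.
        if len(vec_a) != len(vec_b):
            return False
        # Allow a small absolute tolerance for numerical noise.
        if not all(math.isclose(a, b, abs_tol=abs_tol) for a, b in zip(vec_a, vec_b)):
            return False
    return True
```

With the models loaded above, this could be called as `embeddings_match(st0.encode, st1.encode, [sen1, sen2])`.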
