# Sharing pretrained models (PyTorch)

Install the Transformers, Datasets, and Evaluate libraries to run this notebook.

In [42]:
!pip install "accelerate>=0.21.0"

In [43]:
!pip install transformers[torch]



In [44]:
!pip install datasets evaluate transformers[sentencepiece]
!apt install git-lfs

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
git-lfs is already the newest version (3.0.2-1ubuntu0.2).
0 upgraded, 0 newly installed, 0 to remove and 45 not upgraded.


You will need to set up Git. Adapt your email and name in the following cell.

In [60]:
!git config --global user.email "arifmuhammadladuni4@gmail.com"
!git config --global user.name "ariipp"

You will also need to be logged in to the Hugging Face Hub. Execute the following and enter your credentials.

In [61]:
from huggingface_hub import notebook_login

notebook_login()

In [63]:
from transformers import TrainingArguments

training_args = TrainingArguments(
    "bert-finetuned-mrpc", save_strategy="epoch", push_to_hub=True
)

In [64]:
from transformers import AutoModelForMaskedLM, AutoTokenizer

checkpoint = "camembert-base"

model = AutoModelForMaskedLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

Some weights of the model checkpoint at camembert-base were not used when initializing CamembertForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing CamembertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CamembertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [65]:
model.push_to_hub("dummy-model")

CommitInfo(commit_url='https://huggingface.co/ariipp/dummy-model/commit/4f29609c001f38b5751f2b26ad856bc81632e740', commit_message='Upload CamembertForMaskedLM', commit_description='', oid='4f29609c001f38b5751f2b26ad856bc81632e740', pr_url=None, pr_revision=None, pr_num=None)

In [66]:
tokenizer.push_to_hub("dummy-model")

CommitInfo(commit_url='https://huggingface.co/ariipp/dummy-model/commit/409eca1499b507bfae9bffa984c7e7c06cd277e0', commit_message='Upload tokenizer', commit_description='', oid='409eca1499b507bfae9bffa984c7e7c06cd277e0', pr_url=None, pr_revision=None, pr_num=None)

In [68]:
tokenizer.push_to_hub("dummy-model", organization="ariipp")

CommitInfo(commit_url='https://huggingface.co/ariipp/dummy-model/commit/bc390add61f2e0881aa5d0b4cf9051c6a1765bbd', commit_message='Upload tokenizer', commit_description='', oid='bc390add61f2e0881aa5d0b4cf9051c6a1765bbd', pr_url=None, pr_revision=None, pr_num=None)
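
The call above selects a namespace through the `organization` argument. Recent Transformers releases appear to deprecate that argument in favor of a fully qualified `repo_id` of the form `namespace/name`, which is the form already visible in the commit URLs. A minimal sketch of building such an identifier follows; the helper `qualified_repo_id` is hypothetical and not part of any library.

```python
def qualified_repo_id(name, namespace=None):
    """Return 'namespace/name', or just 'name' when no namespace is given.

    Hypothetical helper for illustration; not part of transformers
    or huggingface_hub.
    """
    if "/" in name:
        raise ValueError(f"{name!r} already contains a namespace")
    return f"{namespace}/{name}" if namespace else name


repo_id = qualified_repo_id("dummy-model", namespace="ariipp")
print(repo_id)  # ariipp/dummy-model

# With a logged-in session, the identifier could then be passed directly:
# tokenizer.push_to_hub(repo_id)
```

The push itself is left commented out because it needs network access and a valid login.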

In [71]:
# Replace <TOKEN> with your own access token; never paste a real token into a shared notebook.
tokenizer.push_to_hub("dummy-model", organization="ariipp", use_auth_token="<TOKEN>")

CommitInfo(commit_url='https://huggingface.co/ariipp/dummy-model/commit/30e7e8d0674017f696d16094eec175497077e9b4', commit_message='Upload tokenizer', commit_description='', oid='30e7e8d0674017f696d16094eec175497077e9b4', pr_url=None, pr_revision=None, pr_num=None)
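
The cell above passes the access token as an argument. One way to keep the token out of the notebook is to read it from the environment instead. The sketch below assumes the token was exported as `HF_TOKEN` before the notebook started; the helper `read_hub_token` is hypothetical.

```python
import os


def read_hub_token(env_var="HF_TOKEN"):
    """Return the token stored in env_var, or None if it is unset or blank.

    Hypothetical helper; the variable name is an assumption about how
    the token was exported.
    """
    token = os.environ.get(env_var, "").strip()
    return token or None


token = read_hub_token()
if token is None:
    print("HF_TOKEN is not set; a cached login would be needed instead.")

# The value could then replace the string literal, for example:
# tokenizer.push_to_hub("dummy-model", use_auth_token=token)
```

When `notebook_login()` has already succeeded, the cached credentials are normally enough and no token argument is needed.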

In [72]:
from huggingface_hub import (
    # User management
    login,
    logout,
    whoami,

    # Repository creation and management
    create_repo,
    delete_repo,
    update_repo_visibility,

    # And some methods to retrieve/change information about the content
    list_models,
    list_datasets,
    list_metrics,
    list_repo_files,
    upload_file,
    delete_file,
)

In [75]:
from huggingface_hub import create_repo

create_repo("dummy-model1")

RepoUrl('https://huggingface.co/ariipp/dummy-model1', endpoint='https://huggingface.co', repo_type='model', repo_id='ariipp/dummy-model1')

In [86]:
from huggingface_hub import create_repo

create_repo("dummy-model3")

RepoUrl('https://huggingface.co/ariipp/dummy-model3', endpoint='https://huggingface.co', repo_type='model', repo_id='ariipp/dummy-model3')

In [99]:
from huggingface_hub import upload_file

upload_file(
    # path_or_fileobj is keyword-only; passing it positionally raises a TypeError
    path_or_fileobj="dummy-model3/config.json",
    path_in_repo="config.json",
    repo_id="ariipp/dummy-model3",
)
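
`upload_file` takes its arguments by keyword only, so the local path has to be named as `path_or_fileobj`. The sketch below reproduces that behavior with a stand-in function; `upload_file_sketch` is hypothetical, does not contact the Hub, and is not the real `huggingface_hub` implementation.

```python
def upload_file_sketch(*, path_or_fileobj, path_in_repo, repo_id):
    """Stand-in with keyword-only parameters (the bare * enforces this)."""
    return f"{path_or_fileobj} -> {repo_id}/{path_in_repo}"


positional_error = None
try:
    # The first argument is positional, so Python rejects the call.
    upload_file_sketch(
        "dummy-model3/config.json",
        path_in_repo="config.json",
        repo_id="ariipp/dummy-model3",
    )
except TypeError as err:
    positional_error = str(err)
    print("Positional call failed:", err)

# Naming every argument works.
result = upload_file_sketch(
    path_or_fileobj="dummy-model3/config.json",
    path_in_repo="config.json",
    repo_id="ariipp/dummy-model3",
)
print(result)
```

The same rule applies to the real function: name every argument and the `TypeError` goes away.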

In [102]:
from huggingface_hub import Repository

repo = Repository("dummy-model", clone_from="ariipp/dummy-model")

Cloning https://huggingface.co/ariipp/dummy-model into local empty directory.

In [105]:
repo.git_pull()
repo.git_add()
#repo.git_commit()
repo.git_push()
#repo.git_tag()

Everything up-to-date

'https://huggingface.co/ariipp/dummy-model/commit/30e7e8d0674017f696d16094eec175497077e9b4'

In [106]:
repo.git_pull()

In [107]:
model.save_pretrained("<path_to_dummy_folder>")
tokenizer.save_pretrained("<path_to_dummy_folder>")

('<path_to_dummy_folder>/tokenizer_config.json',
 '<path_to_dummy_folder>/special_tokens_map.json',
 '<path_to_dummy_folder>/sentencepiece.bpe.model',
 '<path_to_dummy_folder>/added_tokens.json',
 '<path_to_dummy_folder>/tokenizer.json')

In [110]:
repo.git_add()
#repo.git_commit("Add model and tokenizer files")
repo.git_push()

Everything up-to-date

'https://huggingface.co/ariipp/dummy-model/commit/30e7e8d0674017f696d16094eec175497077e9b4'
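
Git prints `Everything up-to-date` when there is nothing new to push. Here the commit call is commented out, and the files may also have been saved outside the cloned folder. A quick check that the expected files are in the clone can help before committing. The sketch below is illustrative only: `missing_files` is a hypothetical helper, the file names come from the `save_pretrained` output above, and a temporary directory stands in for the clone.

```python
import tempfile
from pathlib import Path


def missing_files(folder, expected):
    """Return the names in expected that are not regular files in folder."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).is_file()]


expected = [
    "tokenizer_config.json",
    "special_tokens_map.json",
    "sentencepiece.bpe.model",
    "tokenizer.json",
]

# Demo on a temporary directory that contains only one of the files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "tokenizer_config.json").write_text("{}")
    still_missing = missing_files(tmp, expected)

print(still_missing)
# ['special_tokens_map.json', 'sentencepiece.bpe.model', 'tokenizer.json']
```

In the notebook, the folder to check would be the local clone, for example `missing_files("dummy-model", expected)`.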

In [109]:
from transformers import AutoModelForMaskedLM, AutoTokenizer

checkpoint = "camembert-base"

model = AutoModelForMaskedLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Do whatever with the model, train it, fine-tune it...

model.save_pretrained("<path_to_dummy_folder>")
tokenizer.save_pretrained("<path_to_dummy_folder>")

('<path_to_dummy_folder>/tokenizer_config.json',
 '<path_to_dummy_folder>/special_tokens_map.json',
 '<path_to_dummy_folder>/sentencepiece.bpe.model',
 '<path_to_dummy_folder>/added_tokens.json',
 '<path_to_dummy_folder>/tokenizer.json')