# Sharing pretrained models (PyTorch)

Install the Transformers and Datasets libraries to run this notebook.

In [1]:
# Install datasets and transformers, plus Git Large File Storage (git-lfs)
!pip install datasets transformers[sentencepiece]
!apt install git-lfs

Collecting datasets
  Downloading datasets-1.11.0-py3-none-any.whl (264 kB)
Collecting transformers[sentencepiece]
  Downloading transformers-4.9.2-py3-none-any.whl (2.6 MB)
Collecting fsspec>=2021.05.0
  Downloading fsspec-2021.7.0-py3-none-any.whl (118 kB)
Collecting huggingface-hub<0.1.0
  Downloading huggingface_hub-0.0.15-py3-none-any.whl (43 kB)
Collecting xxhash
  Downloading xxhash-2.0.2-cp37-cp37m-manylinux2010_x86_64.whl (243 kB)
Collecting sacremoses
  Downloading sacremoses-0.0.45-py3-none-any.whl (895 kB)
Collecting huggingface-hub<0.1.0
  Downloading huggingface_hub-0.0.12-py3-none-any.whl (37 kB)


You will need to set up Git. Replace the email and name in the following cell with your own.

In [2]:
# Set the Git identity (email and user name) used for commits pushed to the Hugging Face Hub
!git config --global user.email "miesner.jacob@gmail.com"
!git config --global user.name "miesnerjacob"

You will also need to be logged in to the Hugging Face Hub. Execute the following and enter your credentials.

In [3]:
# Log in to the Hugging Face Hub via the CLI
!huggingface-cli login


        _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
        _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
        _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
        _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
        _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

        
Username: miesnerjacob
Password: 
Login successful
Your token: <TOKEN>

Your token has been saved to /root/.huggingface/token
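
To confirm that the login step left a usable token behind, a small check like the one below can help. It is only a sketch: the default path is copied from the login output above, `token_is_saved` is a hypothetical helper (not part of `huggingface_hub`), and other library versions may store the token somewhere else.

```python
import os

# Path copied from the login output above. This is an assumption for this
# environment; other huggingface_hub versions may use a different location.
DEFAULT_TOKEN_PATH = "/root/.huggingface/token"


def token_is_saved(token_path=DEFAULT_TOKEN_PATH):
    """Return True if token_path is an existing file with non-blank content."""
    if not os.path.isfile(token_path):
        return False
    with open(token_path, encoding="utf-8") as handle:
        return handle.read().strip() != ""
```

Calling `token_is_saved()` with no arguments checks the default location. Pass another path if your setup stores the token elsewhere.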


In [9]:
# Load pretrained model
from transformers import AutoModelForMaskedLM, AutoTokenizer

checkpoint = "camembert-base"

model = AutoModelForMaskedLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

In [5]:
# Push dummy model to the HF Hub
model.push_to_hub("dummy-model")

'https://huggingface.co/miesnerjacob/dummy-model/commit/227a7906b4941a7b943d916c166d72cd4d6b72e8'

In [6]:
# Push dummy tokenizer to the HF Hub
tokenizer.push_to_hub("dummy-model")

# Push the dummy tokenizer to the HF Hub under a specific organization
# tokenizer.push_to_hub("dummy-model", organization="sightly")

# Push the dummy tokenizer to the HF Hub with both an organization and an auth token specified
# tokenizer.push_to_hub(
#     "dummy-model", organization="huggingface", use_auth_token="<TOKEN>"
# )

'https://huggingface.co/miesnerjacob/dummy-model/commit/e7167e3ca413048adf230b8001b365fe4d382a24'
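
Each `push_to_hub` call above returned a commit URL. If you want the repository id and commit hash as separate values, the URL can be split apart as sketched below. The `<namespace>/<repo_name>/commit/<hash>` layout is inferred from the two URLs printed in this notebook rather than from a documented guarantee, and `parse_commit_url` is a hypothetical helper.

```python
from urllib.parse import urlparse


def parse_commit_url(url):
    """Split a Hub commit URL into (repo_id, commit_hash)."""
    parts = urlparse(url).path.strip("/").split("/")
    # Assumed path layout: <namespace>/<repo_name>/commit/<hash>
    if len(parts) != 4 or parts[2] != "commit":
        raise ValueError(f"Not a commit URL: {url}")
    namespace, repo_name, _, commit_hash = parts
    return f"{namespace}/{repo_name}", commit_hash


# URL returned by model.push_to_hub("dummy-model") earlier in this notebook
model_url = (
    "https://huggingface.co/miesnerjacob/dummy-model/commit/"
    "227a7906b4941a7b943d916c166d72cd4d6b72e8"
)
repo_id, commit_hash = parse_commit_url(model_url)
print(repo_id)      # miesnerjacob/dummy-model
print(commit_hash)  # 227a7906b4941a7b943d916c166d72cd4d6b72e8
```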

In [11]:
# Save the model and tokenizer on the local machine
from transformers import AutoModelForMaskedLM, AutoTokenizer

checkpoint = "camembert-base"

model = AutoModelForMaskedLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Train or fine-tune the model here as needed...

model.save_pretrained("<path_to_dummy_folder>")
tokenizer.save_pretrained("<path_to_dummy_folder>")

('<path_to_dummy_folder>/tokenizer_config.json',
 '<path_to_dummy_folder>/special_tokens_map.json',
 '<path_to_dummy_folder>/sentencepiece.bpe.model',
 '<path_to_dummy_folder>/added_tokens.json',
 '<path_to_dummy_folder>/tokenizer.json')
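
`tokenizer.save_pretrained` returns a tuple of file paths, as shown in the output above. One way to double-check the result is to see which of those paths actually exist on disk. The sketch below assumes nothing beyond that tuple, and `missing_paths` is a hypothetical helper rather than a Transformers function.

```python
import os


def missing_paths(saved_paths):
    """Return the entries of saved_paths that are not existing files."""
    return [str(path) for path in saved_paths if not os.path.isfile(path)]
```

For example, `missing_paths(tokenizer.save_pretrained("<path_to_dummy_folder>"))` returns an empty list when every reported file is present. A non-empty result is a prompt to inspect the folder, not necessarily an error. Once the files are in place, the same folder path can be passed to `from_pretrained` to reload the model and tokenizer.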