### There are three ways to go about creating new model repositories:
- Using the 'push_to_hub' API
- Using the 'huggingface_hub' Python library
- Using the web interface

In [1]:
# The simplest way to upload files to the hub is by leveraging the 'push_to_hub' API
# Notebook login
from huggingface_hub import notebook_login

notebook_login()


In [1]:
# Terminal login
!huggingface-cli login --token ""

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


### The easiest way to upload to the Hub is to set 'push_to_hub = True' when you define your 'TrainingArguments'

In [2]:
from transformers import TrainingArguments

training_args = TrainingArguments(
    "bert-finetuned-mrpc", save_strategy = 'epoch', push_to_hub = True
)

- When you call 'trainer.train()' the 'Trainer' will then upload your model to the Hub each time it is saved (here every epoch)
- That repository will be named after the output directory you picked (here bert-finetuned-mrpc), but you can choose a different name with 'hub_model_id = "a_different_name"'
- To upload your model to an organization you are a member of, you pass it with 'hub_model_id = "my_organization/my_repo_name"'

- Once your training is finished, you should do a final 'trainer.push_to_hub()' to upload the last version of your model.
    - It will also generate a model card with all the relevant metadata, reporting the hyperparameters used and the evaluation results
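
- The naming rules above can be sketched in plain Python. This is only an illustration: 'resolve_hub_repo_id' and the 'username' argument are hypothetical and not part of 'transformers', and the sketch assumes POSIX-style paths for the output directory.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_hub_repo_id(output_dir: str, username: str,
                        hub_model_id: Optional[str] = None) -> str:
    """Return the '<namespace>/<name>' repo id implied by the rules above.

    Hypothetical helper for illustration only; it does not call the Hub.
    """
    if hub_model_id is None:
        # Default: the repository is named after the output directory.
        return f"{username}/{PurePosixPath(output_dir).name}"
    if "/" in hub_model_id:
        # An explicit namespace (for example an organization) is kept as given.
        return hub_model_id
    # A bare name lands in the user's own namespace.
    return f"{username}/{hub_model_id}"


default_id = resolve_hub_repo_id("bert-finetuned-mrpc", "my_username")
renamed_id = resolve_hub_repo_id("bert-finetuned-mrpc", "my_username", "a_different_name")
org_id = resolve_hub_repo_id("bert-finetuned-mrpc", "my_username", "my_organization/my_repo_name")
print(default_id, renamed_id, org_id, sep="\n")
```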

In [4]:
# import image module
from IPython.display import Image
  
# get the image
Image(url="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter4/model_card.png", width=600, height=600)

- At a lower level, the Model Hub can be accessed directly from models, tokenizers, and configuration objects via their 'push_to_hub()' method.
    - This method takes care of both creating the repository and pushing the model and tokenizer files directly to it.
    - No manual handling is required, unlike with the API we'll see below.

In [5]:
from transformers import AutoModelForMaskedLM, AutoTokenizer

checkpoint = 'camembert-base'

model = AutoModelForMaskedLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

#### You're free to do whatever you want with these: *add tokens to the tokenizer, train the model, fine-tune it*
- Once you're happy with the resulting model, weights, and tokenizer, you can leverage the 'push_to_hub()' method directly available on the model object

In [4]:
model.push_to_hub('dummy-model')

pytorch_model.bin:   0%|          | 0.00/443M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/MindNetML/dummy-model/commit/e1e47f493e3f4836f8165a235486c06a23375eb0', commit_message='Upload CamembertForMaskedLM', commit_description='', oid='e1e47f493e3f4836f8165a235486c06a23375eb0', pr_url=None, pr_revision=None, pr_num=None)

- The above will create a new repository 'dummy-model' in your profile, and populate it with your model files.
- Do the same with the tokenizer, so that all the files are available in this repository
- If you belong to an organization, simply specify the organization argument to upload to the organization's namespace:

In [5]:
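# Push the tokenizer to the same repository as the model
tokenizer.push_to_hub('dummy-model')

# To push to an organization you belong to, with a specific Hugging Face token:
# tokenizer.push_to_hub("dummy-model", organization = "huggingface", use_auth_token = "<TOKEN>")
# (recent transformers versions deprecate these two arguments in favor of a
#  namespaced repo id such as "huggingface/dummy-model" and 'token')

# Don't run it, or you'll get two more dummy-models pushed to the Hub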

sentencepiece.bpe.model:   0%|          | 0.00/811k [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/MindNetML/dummy-model/commit/5950520263d482ee6a982cc643673b59fc4e13eb', commit_message='Upload tokenizer', commit_description='', oid='5950520263d482ee6a982cc643673b59fc4e13eb', pr_url=None, pr_revision=None, pr_num=None)

In [6]:
# Exercise

# name of model we'll use
checkpoint = 'bert-base-cased'

# download model and tokenizer for the checkpoint
model = AutoModelForMaskedLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'bert.pooler.dense.bias', 'cls.seq_relationship.weight', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [7]:
model.push_to_hub('bertDummy-model')
tokenizer.push_to_hub('bertDummy-model')

pytorch_model.bin:   0%|          | 0.00/433M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/MindNetML/bertDummy-model/commit/df51935a418bf04496d07dc9e69b37100bf0d64f', commit_message='Upload tokenizer', commit_description='', oid='df51935a418bf04496d07dc9e69b37100bf0d64f', pr_url=None, pr_revision=None, pr_num=None)

### Using the huggingface_hub Python library
- The 'huggingface_hub' Python library offers tools for the model and dataset hubs.
- It provides methods and classes for common tasks, such as getting information about repositories on the Hub and managing them.
- It provides simple APIs that work on top of git to manage the content of those repositories and to integrate the Hub into your projects and libraries.
- Like the 'push_to_hub' API, this requires your API token to be saved in your cache.
    - Log in using the CLI (huggingface-cli login --token 'token')

- The 'huggingface_hub' library offers methods and classes that are useful for our purpose.
- First, there are a few methods to manage repository creation, deletion, and more:

```python
from huggingface_hub import (
    # User management
    login,
    logout,
    whoami,

    # Repository creation and management
    create_repo,
    delete_repo,
    update_repo_visibility,

    # And some methods to retrieve/change information about the content
    list_models,
    list_datasets,
    list_metrics,
    list_repo_files,
    upload_file,
    delete_file,
)
```

- Additionally, it offers the very powerful 'Repository' class to manage a local repository. We will explore these methods and that class in the next few sections to understand how to leverage them.

In [11]:
# The 'create_repo' method creates a new repository on the Hub
from huggingface_hub import create_repo

create_repo("dummy-model2")

RepoUrl('https://huggingface.co/MindNetML/dummy-model2', endpoint='https://huggingface.co', repo_type='model', repo_id='MindNetML/dummy-model2')

- If you choose an organization as the owner, the model will be featured on the organization's page and every member of the organization will be able to contribute to the repository.

- Next, enter your model's name; this will also be the name of the repository.
- Finally, you can specify whether you want your model to be public or private.
- Private models are hidden from public view.

- The newly created repo will be blank.
- This is where your model will be hosted. To start populating it, you can add a README file directly from the web interface.
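
- The same choices (owner, name, visibility) can also be made programmatically. The sketch below wraps 'create_repo' in a hypothetical helper, 'create_model_repo'. The 'create_fn' parameter is an assumption added so the example can be dry-run without touching the Hub; when it is omitted, the real 'huggingface_hub.create_repo' is used with its 'private' and 'exist_ok' arguments.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def create_model_repo(name: str, organization: Optional[str] = None,
                      private: bool = False,
                      create_fn: Optional[Callable[..., Any]] = None) -> Any:
    """Create (or reuse) a model repository under your profile or an organization."""
    if create_fn is None:
        # Imported lazily so the dry run below works without the library or network.
        from huggingface_hub import create_repo as create_fn
    repo_id = f"{organization}/{name}" if organization else name
    # exist_ok=True avoids an error if the repository already exists.
    return create_fn(repo_id, private=private, exist_ok=True)


# Dry run: record the call instead of contacting the Hub.
recorded: List[Tuple[str, Dict[str, Any]]] = []


def _dry_run_create(repo_id: str, **kwargs: Any) -> str:
    recorded.append((repo_id, kwargs))
    return f"https://huggingface.co/{repo_id}"


url = create_model_repo("dummy-model2", organization="my_organization",
                        private=True, create_fn=_dry_run_create)
print(url)
```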

### The upload_file approach
- Using 'upload_file' does not require 'git' and 'git-lfs' to be installed on your system.
- It pushes files directly to the 🤗 Hub using HTTP POST requests.
- A limitation of this approach is that it doesn't handle files that are larger than 5GB in size
- If your files are larger than 5GB, please follow the two other methods detailed below
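
- The size rule above can be checked before choosing an upload method. This is a rough sketch: 'files_over_limit' and 'choose_upload_method' are hypothetical helpers, and it assumes "5GB" means 5 * 1024**3 bytes.

```python
import os
import tempfile
from typing import Iterable, List

# Size limit quoted above, in bytes (assumption: binary gigabytes).
FIVE_GB = 5 * 1024 ** 3


def files_over_limit(paths: Iterable[str], limit_bytes: int = FIVE_GB) -> List[str]:
    """Return the files that are too large for the HTTP-based upload."""
    return [path for path in paths if os.path.getsize(path) > limit_bytes]


def choose_upload_method(paths: Iterable[str], limit_bytes: int = FIVE_GB) -> str:
    """Pick 'upload_file' when every file fits, otherwise a git-based method."""
    return "git-based" if files_over_limit(paths, limit_bytes) else "upload_file"


# Demonstration with a tiny temporary file and an artificially small limit.
with tempfile.TemporaryDirectory() as tmp_dir:
    small_file = os.path.join(tmp_dir, "config.json")
    with open(small_file, "wb") as handle:
        handle.write(b"{}")  # 2 bytes
    default_choice = choose_upload_method([small_file])
    tiny_limit_choice = choose_upload_method([small_file], limit_bytes=1)

print(default_choice, tiny_limit_choice)
```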

In [14]:
# The API may be used as follows.
# 'upload_file' accepts keyword arguments only: passing the local path positionally
# raises "TypeError: upload_file() takes 1 positional argument but 2 positional
# arguments (and 2 keyword-only arguments) were given".
from huggingface_hub import upload_file

upload_file(
    path_or_fileobj = "<path_to_file>/config.json",
    path_in_repo = "config.json",
    repo_id = "MindNetML/dummy-model2",
)

- The above will upload the file 'config.json' available at <path_to_file> to the root of the 'MindNetML/dummy-model2' repository as config.json.
- Other arguments which may be useful are
    - 'token', if you would like to override the token stored in your cache by a given token
    - 'repo_type', if you would like to upload to a dataset or a space instead of a model. Accepted values are "dataset" and "space"
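
- A small sketch of how these optional arguments combine with the required ones. 'build_upload_kwargs' is a hypothetical helper that only assembles a dictionary. The keys are the keyword arguments that 'upload_file' accepts, and the actual upload call is left commented out so that nothing is sent.

```python
import os
from typing import Any, Dict, Optional


def build_upload_kwargs(local_path: str, repo_id: str,
                        repo_type: Optional[str] = None,
                        token: Optional[str] = None) -> Dict[str, Any]:
    """Assemble keyword arguments for huggingface_hub.upload_file."""
    kwargs: Dict[str, Any] = {
        "path_or_fileobj": local_path,
        # Keep the file name and place it at the root of the repository.
        "path_in_repo": os.path.basename(local_path),
        "repo_id": repo_id,
    }
    if repo_type is not None:
        # "dataset" or "space"; leave unset to target a model repository.
        kwargs["repo_type"] = repo_type
    if token is not None:
        # Overrides the token stored in your cache.
        kwargs["token"] = token
    return kwargs


kwargs = build_upload_kwargs("<path_to_file>/config.json",
                             "MindNetML/dummy-model2", repo_type="dataset")
print(kwargs)

# from huggingface_hub import upload_file
# upload_file(**kwargs)
```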

### The Repository Class

- The Repository class manages a local repository in a git-like manner.
- It abstracts most of the pain points one may have with git to provide all features that we require.
- Using this class requires git and git-lfs, so make sure git-lfs is installed and set up before you begin (see the git-lfs installation instructions).
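
- One way to check this prerequisite from Python before cloning. This sketch assumes both tools are exposed on your PATH as executables named 'git' and 'git-lfs'; 'find_missing_tools' is a hypothetical helper.

```python
import shutil
from typing import List, Sequence


def find_missing_tools(tools: Sequence[str] = ("git", "git-lfs")) -> List[str]:
    """Return the command-line tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Install these before using Repository:", ", ".join(missing))
else:
    print("git and git-lfs were found on PATH")
```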

- To start playing around with the repository we have just created, we can initialize it in a local folder by cloning the remote repository.

In [2]:
from huggingface_hub import Repository

repo = Repository("<path_to_dummy_folder>", clone_from = "MindNetML/dummy-model2")

Cloning https://huggingface.co/MindNetML/dummy-model2 into local empty directory.


In [3]:
# We make sure that our local clone is up to date by pulling the latest changes
repo.git_pull()

In [7]:
# We save the model and tokenizer files
model.save_pretrained("<path_to_dummy_folder>")
tokenizer.save_pretrained("<path_to_dummy_folder>")

('<path_to_dummy_folder>/tokenizer_config.json',
 '<path_to_dummy_folder>/special_tokens_map.json',
 '<path_to_dummy_folder>/vocab.txt',
 '<path_to_dummy_folder>/added_tokens.json',
 '<path_to_dummy_folder>/tokenizer.json')

In [8]:
# Now the <path_to_dummy_folder> contains all the model and tokenizer files. 

# We'll follow the usual git workflow by adding files to the staging area, committing them and pushing them to the Hub:
repo.git_add()
repo.git_commit("Add model and tokenizer files")
repo.git_push()

Upload file pytorch_model.bin:   0%|          | 1.00/413M [00:00<?, ?B/s]

To https://huggingface.co/MindNetML/dummy-model2
   f13c2fe..e8b8ebd  main -> main



'https://huggingface.co/MindNetML/dummy-model2/commit/e8b8ebdac3a0cc8576e07a866ebe84c542ad36bb'