# Model sharing

## What does it mean to share a model?

Model sharing can mean multiple things:

```mermaid
mindmap
    root((Shared model))
        Model weights
            Specifies model's parameters
        Model code
            Specifies how the model can be run
        Model card
            Describes the model to users
        Training code
            Specifies how the model can be trained
        Training data
            The data used to train the model
```

Let's look at these in detail:

### Model weights

Model weights are the simplest to understand: they are just the values of the model's parameters.

These weights are usually [serialized versions](https://en.wikipedia.org/wiki/Serialization) of the numbers stored in the training framework's memory. During training, the weights are often saved by pickling Python objects ([PyTorch's serialization routines](https://docs.pytorch.org/docs/stable/notes/serialization.html) work this way). However, Python's [pickle](https://docs.python.org/3/library/pickle.html) module is not secure: loading a pickle can execute arbitrary code. For this reason, pickle-based formats are increasingly avoided when models are shared.
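The following minimal sketch, using only the standard library, shows why this matters. `Payload` and `side_effect` are hypothetical names: a real attacker would call something harmful instead of a function that only records a message.

```python
import pickle

# Collects evidence that code ran while the data was being loaded.
EXECUTED = []


def side_effect(message):
    """Hypothetical stand-in for an attacker's code; it only records a message."""
    EXECUTED.append(message)
    return "not a tensor at all"


class Payload:
    def __reduce__(self):
        # pickle stores (callable, args); on load it calls side_effect(*args)
        # instead of rebuilding a Payload object.
        return (side_effect, ("code ran during unpickling",))


# The "weights file" someone could upload.
data = pickle.dumps({"weights": Payload()})

# Simply loading the file runs side_effect.
loaded = pickle.loads(data)
print(EXECUTED)           # ['code ran during unpickling']
print(loaded["weights"])  # not a tensor at all
```

PyTorch's `torch.load` offers a `weights_only=True` option (the default since PyTorch 2.6) that limits unpickling to tensors and a small set of allowed types. Formats that never execute code on load avoid the problem altogether.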

Because we do not want to run arbitrary code provided by strangers on the internet, several formats have been designed to avoid this problem.

[ONNX](https://onnx.ai/) is an open standard for sharing machine learning models, and [PyTorch supports it natively](https://docs.pytorch.org/tutorials/beginner/onnx/export_simple_model_to_onnx_tutorial.html). It is widely used, especially in industry, where trained models run inference on a variety of hardware devices.

ONNX converts the model's whole computation graph into standard operations that are stored in its serialization format, together with the model's parameters.
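The idea can be illustrated with a toy format. This sketch is not ONNX (real ONNX files are Protocol Buffers messages using a standardized operator set); the JSON layout below is made up, and only the operator names `MatMul`, `Add`, and `Relu` echo ONNX ones. What it mimics is the principle: the graph is a list of named operations, the parameters are stored next to it as plain numbers, and the runtime, not code shipped inside the file, decides what each operation does.

```python
import json

# A made-up, ONNX-inspired model file: the graph is data, not executable code.
model_file = json.dumps({
    "inputs": ["x"],
    "outputs": ["y"],
    "initializers": {            # parameters stored next to the graph
        "W": [[1.0, -1.0], [0.5, 2.0]],
        "b": [0.0, -3.0],
    },
    "nodes": [                   # operations in execution order
        {"op": "MatMul", "inputs": ["x", "W"], "output": "h"},
        {"op": "Add", "inputs": ["h", "b"], "output": "z"},
        {"op": "Relu", "inputs": ["z"], "output": "y"},
    ],
})


def matmul(vec, mat):
    # Row vector (length n) times matrix (n x m) -> vector of length m.
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


# The runtime, not the file, defines what each operator means.
OPERATORS = {
    "MatMul": matmul,
    "Add": lambda a, b: [x + y for x, y in zip(a, b)],
    "Relu": lambda a: [max(0.0, x) for x in a],
}


def run(serialized, **inputs):
    graph = json.loads(serialized)
    values = {**graph["initializers"], **inputs}
    for node in graph["nodes"]:
        # Unknown op names raise KeyError instead of executing anything.
        fn = OPERATORS[node["op"]]
        values[node["output"]] = fn(*(values[name] for name in node["inputs"]))
    return {name: values[name] for name in graph["outputs"]}


print(run(model_file, x=[2.0, 1.0]))  # {'y': [2.5, 0.0]}
```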

Another popular format, especially among researchers and ML practitioners, is [safetensors](https://huggingface.co/docs/safetensors). It was created by Hugging Face and is used by many repositories on the Hugging Face Hub. Safetensors stores only the weights, so getting a working model from a safetensors file requires access to the model code that defines where the parameters are placed.
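The safetensors layout is simple: an 8-byte little-endian unsigned integer giving the header size, a JSON header mapping each tensor name to its `dtype`, `shape`, and `data_offsets`, and then the raw tensor bytes. The sketch below writes and reads that layout with the standard library, for `F32` tensors only, to show why loading is safe: it only parses JSON and copies bytes. The helper names `save_f32` and `load_f32` are hypothetical; for real use, the `safetensors` library (e.g. `safetensors.torch.save_file` and `load_file`) is the way to go.

```python
import json
import struct


def save_f32(tensors):
    """Serialize {name: (shape, flat list of floats)} into safetensors-style bytes."""
    header, chunks, offset = {}, [], 0
    for name, (shape, values) in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {"dtype": "F32", "shape": shape,
                        "data_offsets": [offset, offset + len(raw)]}
        chunks.append(raw)
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    # Pad the header with spaces so the tensor data starts 8-byte aligned.
    header_bytes += b" " * (-len(header_bytes) % 8)
    return struct.pack("<Q", len(header_bytes)) + header_bytes + b"".join(chunks)


def load_f32(blob):
    """Parse the bytes back: JSON parsing and byte copying only, no code execution."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len])
    data = blob[8 + header_len:]
    result = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional string metadata, not a tensor
            continue
        begin, end = info["data_offsets"]
        count = (end - begin) // 4
        result[name] = (info["shape"], list(struct.unpack(f"<{count}f", data[begin:end])))
    return result


blob = save_f32({"layer.weight": ([2, 2], [1.0, 2.0, 3.0, 4.0]),
                 "layer.bias": ([2], [0.5, -0.5])})
print(load_f32(blob)["layer.bias"])  # ([2], [0.5, -0.5])
```

Note what comes back: just names, shapes, and numbers. Deciding that `layer.weight` belongs to a particular layer is the job of the model code.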

### Model code

As mentioned in the previous section, constructing a working model sometimes requires access to the model's specification.

Sometimes this specification is shared as code, e.g. in GitHub repositories, and sometimes as a configuration file that a library turns into a model.

For example, Hugging Face uses a concept called [AutoModel](https://huggingface.co/docs/transformers/main/en/model_doc/auto#transformers.AutoModel) that picks a model class from a pre-existing list of model implementations. The chosen model is then initialized based on the configuration stored in the model's repository, which is loaded with [AutoConfig](https://huggingface.co/docs/transformers/main/en/model_doc/auto#transformers.AutoConfig). So when you call `AutoModelForCausalLM.from_pretrained`, the following happens:

```mermaid
flowchart TD
    R[User asks for model] --> F["AutoModelForCausalLM.from_pretrained(...)"];
    F --> H[Hugging Face Hub checks the repository];
    H --> C[Repository contains a config.json that AutoConfig reads];
    C --> M[AutoModel is initialized with layer configuration from AutoConfig];
    M --> W[Weights from the repository are loaded into the model];
    W --> U[Model is returned to the user];
```
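The dispatch step can be sketched with plain dictionaries. This is a heavily simplified illustration, not the `transformers` implementation: `TinyGptConfig`, `TinyGptForCausalLM`, the `tiny_gpt` model type, and the two registries are hypothetical, and the real library also downloads files and loads weights. What carries over is the idea that the `model_type` field in `config.json` selects a model class that already exists in the library, and the remaining fields parametrize it.

```python
import json
from dataclasses import dataclass


@dataclass
class TinyGptConfig:
    # Hypothetical config class; field names mirror common config.json keys.
    hidden_size: int
    num_hidden_layers: int
    vocab_size: int


class TinyGptForCausalLM:
    """Hypothetical model class: the code lives in the library, not in the repository."""

    def __init__(self, config):
        self.config = config
        self.layers = [f"decoder_layer_{i}" for i in range(config.num_hidden_layers)]


# Hypothetical library-side registries.
CONFIG_REGISTRY = {"tiny_gpt": TinyGptConfig}             # model_type -> config class
CAUSAL_LM_REGISTRY = {TinyGptConfig: TinyGptForCausalLM}  # config class -> model class


def auto_config_from_json(text):
    fields = json.loads(text)
    config_cls = CONFIG_REGISTRY[fields.pop("model_type")]  # pick class by model_type
    return config_cls(**fields)


def auto_model_from_config(config):
    return CAUSAL_LM_REGISTRY[type(config)](config)


# Stand-in for a config.json downloaded from a model repository.
config_json = ('{"model_type": "tiny_gpt", "hidden_size": 64, '
               '"num_hidden_layers": 2, "vocab_size": 1000}')
model = auto_model_from_config(auto_config_from_json(config_json))
print(type(model).__name__, model.layers)
# TinyGptForCausalLM ['decoder_layer_0', 'decoder_layer_1']
```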

A good example of this is [gpt-oss-120b's model configuration](https://huggingface.co/openai/gpt-oss-120b/blob/main/config.json).
It uses [GptOssForCausalLM](https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt_oss/modeling_gpt_oss.py#L637), which is a subclass of [GptOssPreTrainedModel](https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt_oss/modeling_gpt_oss.py#L422), which in turn is a subclass of [PreTrainedModel](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L1644).

In this way, a complex model structure can be described by a simple JSON file, as long as the library already implements the corresponding architecture.
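To make this concrete, the sketch below derives the name and shape of every parameter of a simplified GPT-like decoder from a handful of configuration values. The architecture and parameter names are hypothetical and much simpler than gpt-oss (no biases, no mixture-of-experts, one norm per layer); the point is only that a few numbers in a JSON file pin down the whole structure, which the weights file must then match.

```python
import json
import math


def parameter_shapes(config):
    """Map parameter names to shapes for a simplified, hypothetical GPT-like decoder."""
    h = config["hidden_size"]
    ff = config["intermediate_size"]
    vocab = config["vocab_size"]
    # Linear weights follow PyTorch's (out_features, in_features) convention.
    shapes = {"embed_tokens.weight": (vocab, h)}
    for i in range(config["num_hidden_layers"]):
        prefix = f"layers.{i}"
        for proj in ("q_proj", "k_proj", "v_proj", "o_proj"):
            shapes[f"{prefix}.attn.{proj}.weight"] = (h, h)  # attention projections
        shapes[f"{prefix}.mlp.up_proj.weight"] = (ff, h)     # hidden -> intermediate
        shapes[f"{prefix}.mlp.down_proj.weight"] = (h, ff)   # intermediate -> hidden
        shapes[f"{prefix}.norm.weight"] = (h,)
    shapes["lm_head.weight"] = (vocab, h)
    return shapes


# A made-up config.json containing only the fields this toy architecture needs.
config = json.loads(
    '{"hidden_size": 8, "intermediate_size": 32, "num_hidden_layers": 2, "vocab_size": 100}'
)
shapes = parameter_shapes(config)
total = sum(math.prod(shape) for shape in shapes.values())
print(len(shapes), total)  # 16 3152
```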

### Model card

The idea of a machine learning model card was introduced in 2018 by [Mitchell et al.](https://arxiv.org/abs/1810.03993). The basic idea is that a model card should describe the intended use cases of the model and its possible limitations.

Since then, the concept has been used by all of the major players in the field: [Hugging Face](https://huggingface.co/docs/hub/model-cards), [OpenAI](https://platform.openai.com/docs/models/system-cards/), [Google](https://modelcards.withgoogle.com/), [Meta](https://www.llama.com/docs/model-cards-and-prompt-formats/) to name a few.

Writing a model card is therefore an important part of sharing a model.

### Training code

Training code is shared less often than the components above. Especially for huge models, such as large foundation language models, training can be so expensive that keeping the training code private gives AI developers a major competitive advantage.

In academic research, sharing training code is much more common, and GitHub is the most common place to share it.

### Training data

Sharing training data is another complicated topic. As with training code, having more and higher-quality training data gives companies a competitive advantage, so many training datasets are not shared. Questions of licensing and data ownership further limit what training data can be shared.

[Hugging Face Datasets](https://huggingface.co/datasets) hosts many datasets that are commonly used for various tasks.

## Where models are shared

Nowadays models are shared on various sites, but the [Hugging Face Hub](https://huggingface.co/models) is one of the most popular.

[Zenodo](https://zenodo.org/) and similar publicly funded repositories also host models and datasets, but they are often less convenient to use than Hugging Face.