In [None]:
!pip install safetensors huggingface_hub

In [4]:
import os
import datetime
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
import torch

In [5]:
sf_filename = hf_hub_download("Hemanth-thunder/english-tamil-mt", filename="model.safetensors")

Downloading model.safetensors:   0%|          | 0.00/1.94G [00:00<?, ?B/s]

In [7]:
%%time
#CPU
weights = load_file(sf_filename, device="cpu")

CPU times: user 43.2 ms, sys: 1.51 ms, total: 44.7 ms
Wall time: 46.8 ms


In [None]:
# GPU (uncomment to load directly onto the first CUDA device)
# import os
# os.environ["SAFETENSORS_FAST_GPU"] = "1"
# weights = load_file(sf_filename, device="cuda:0")

**What are shared tensors?**

PyTorch lets several parameters share the same underlying tensor. Sharing is a useful way to reduce memory usage.

  A classic use case is in transformers, where the embeddings are shared with lm_head. Using the same matrix means the model has fewer parameters, and gradients reach the embeddings much more effectively. The embeddings sit at the start of the model, where gradients do not flow easily, whereas lm_head sits at the tail, where gradients are strong. Because the two are the same tensor, both benefit.
  
  [**Reference**](https://huggingface.co/docs/safetensors/torch_shared_tensors)
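
To make the embedding/lm_head case concrete, here is a minimal sketch of weight tying. The class name `TinyLM` and its sizes are hypothetical and chosen only for illustration; they are not taken from the model downloaded above.

```python
import torch
from torch import nn


class TinyLM(nn.Module):
    """Toy model (hypothetical) whose output head reuses the embedding matrix."""

    def __init__(self, vocab_size=10, hidden=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)             # weight: (vocab_size, hidden)
        self.lm_head = nn.Linear(hidden, vocab_size, bias=False)  # weight: (vocab_size, hidden)
        # Tie the weights: both modules now point at the same Parameter.
        self.lm_head.weight = self.embed.weight

    def forward(self, token_ids):
        return self.lm_head(self.embed(token_ids))


tiny = TinyLM()

# Both names refer to the same memory.
same_storage = tiny.embed.weight.data_ptr() == tiny.lm_head.weight.data_ptr()

# parameters() yields each shared Parameter only once.
n_unique_params = sum(p.numel() for p in tiny.parameters())

print(same_storage, n_unique_params)
print(list(tiny.state_dict().keys()))
```

The state dict still lists both names even though only one matrix is stored in memory, which is why serialization formats have to decide how to handle shared tensors.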

**Why use safetensors?**


There are several reasons for using safetensors:

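Safety is the number one reason for using safetensors. As open-source model distribution grows, it is important to be able to trust that the model weights you download do not contain malicious code. Unlike pickle-based checkpoints, which can execute arbitrary code when loaded, a safetensors file holds only a JSON header and raw tensor bytes. The header size is also capped, which prevents a file from forcing the loader to parse an extremely large JSON document.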
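
The header mentioned above can be inspected with the standard library alone. The sketch below builds a tiny file image by hand, following the published layout (an 8-byte little-endian header length, the JSON header, then the raw tensor bytes), and reads it back. The function names and the 100 MB cap in `MAX_HEADER_BYTES` are assumptions made for illustration, not the library's actual code.

```python
import json
import struct

# Assumed cap for this sketch; the real loader enforces its own limit.
MAX_HEADER_BYTES = 100 * 1024 * 1024


def build_safetensors_bytes():
    """Hand-build the bytes of a file holding one float32 tensor 'x' = [1.0, 2.0]."""
    payload = struct.pack("<2f", 1.0, 2.0)
    header = {"x": {"dtype": "F32", "shape": [2], "data_offsets": [0, len(payload)]}}
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    # Layout: u64 little-endian header length, JSON header, raw tensor data.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload


def read_header(blob, max_header_bytes=MAX_HEADER_BYTES):
    """Return (header dict, offset where tensor data starts); reject oversized headers."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    if header_len > max_header_bytes:
        raise ValueError(f"header of {header_len} bytes exceeds the allowed size")
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    return header, 8 + header_len


blob = build_safetensors_bytes()
header, data_start = read_header(blob)

# data_offsets are relative to the start of the tensor data region.
start, end = header["x"]["data_offsets"]
values = struct.unpack("<2f", blob[data_start + start:data_start + end])

print(header)
print(values)
```

Nothing in this path evaluates code from the file: the loader reads a length, parses JSON, and slices bytes.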

**Loading speed when switching between models is another reason to use safetensors, which loads tensors with zero-copy. It is especially fast compared to pickle when loading the weights to CPU (the default case), and just as fast, if not faster, when loading the weights directly to GPU. You will only notice the difference when the weights are already on disk, not when you are downloading them or loading the model for the first time.**

In [8]:
from torch import nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = nn.Linear(100, 100)
        self.b = self.a

    def forward(self, x):
        return self.b(self.a(x))


model = Model()
print(model.state_dict())
# The state dict has four keys: a.weight, a.bias, b.weight, b.bias.
# b is the same module as a, so the b.* entries share memory with the a.* entries.
torch.save(model.state_dict(), "model.bin")

OrderedDict([('a.weight', tensor([[-0.0862,  0.0058,  0.0389,  ...,  0.0844, -0.0018,  0.0467],
        [ 0.0200,  0.0528,  0.0032,  ...,  0.0813,  0.0509, -0.0989],
        [-0.0105, -0.0430, -0.0419,  ..., -0.0306,  0.0349, -0.0247],
        ...,
        [-0.0941, -0.0816, -0.0313,  ...,  0.0551, -0.0587, -0.0344],
        [ 0.0577,  0.0479, -0.0511,  ..., -0.0925,  0.0016,  0.0836],
        [ 0.0156, -0.0355, -0.0717,  ..., -0.0229,  0.0640, -0.0159]])), ('a.bias', tensor([-0.0718, -0.0711,  0.0794, -0.0062, -0.0094, -0.0452,  0.0605,  0.0963,
         0.0775,  0.0407, -0.0050,  0.0241, -0.0531,  0.0522,  0.0532,  0.0058,
        -0.0876, -0.0511,  0.0277, -0.0252,  0.0430,  0.0438, -0.0970,  0.0951,
        -0.0373,  0.0141, -0.0117, -0.0667,  0.0976,  0.0041,  0.0248,  0.0287,
         0.0127,  0.0681,  0.0041,  0.0093,  0.0413,  0.0350, -0.0622, -0.0340,
        -0.0622, -0.0421, -0.0385,  0.0329, -0.0830,  0.0395, -0.0651, -0.0237,
         0.0333,  0.0803, -0.0546, -0.0240, -0.