### Model Compression

* Pruning
  * Remove connections (weights) that contribute little to the model's output
* Knowledge Distillation
  * Train a smaller "student" model to reproduce the outputs of the original "teacher" model, transferring its knowledge
* Quantization
  * Represent values (weights and/or activations) in lower precision
  * We can quantize the weights (w), the activations (a), or both
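The three techniques can be sketched in a few lines of PyTorch on toy tensors. This is a minimal illustration under stated assumptions, not the notebook's own method: the helper names (`magnitude_prune`, `quantize_int8_symmetric`, `dequantize`, `distillation_loss`), the 50% pruning ratio, the symmetric per-tensor int8 scheme and the temperature `T=2.0` are all hypothetical choices made for the example.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)


def magnitude_prune(weight: torch.Tensor, amount: float) -> torch.Tensor:
    """Hypothetical helper: zero the `amount` fraction of entries with the smallest |weight|."""
    k = int(amount * weight.numel())
    if k == 0:
        return weight.clone()
    # The k-th smallest absolute value serves as the pruning threshold.
    threshold = weight.abs().flatten().kthvalue(k).values
    return torch.where(weight.abs() > threshold, weight, torch.zeros_like(weight))


def quantize_int8_symmetric(x: torch.Tensor):
    """Hypothetical helper: symmetric per-tensor linear quantization to int8."""
    scale = x.abs().max() / 127  # map the largest |x| onto 127
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale


def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Map int8 codes back to approximate float32 values."""
    return q.to(torch.float32) * scale


def distillation_loss(student_logits, teacher_logits, T: float = 2.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)


w = torch.randn(4, 8)
w_pruned = magnitude_prune(w, amount=0.5)
q, scale = quantize_int8_symmetric(w)
w_restored = dequantize(q, scale)

teacher_logits = torch.randn(3, 10)
student_logits = torch.randn(3, 10)

print("zeros after pruning:", int((w_pruned == 0).sum()), "of", w.numel())
print("max quantization error:", float((w - w_restored).abs().max()), "scale:", float(scale))
print("distillation loss:", float(distillation_loss(student_logits, teacher_logits)))
```

With round-to-nearest, the dequantization error of every weight stays within `scale / 2`, which is the price paid for storing each value in 8 bits instead of 32.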

### Data Types and Sizes

In [2]:
import torch

#### Integers

In [3]:
# Range of the 8-bit unsigned integer type

torch.iinfo(torch.uint8)

iinfo(min=0, max=255, dtype=uint8)

In [4]:
# 8-bit signed integer

torch.iinfo(torch.int8)

iinfo(min=-128, max=127, dtype=int8)

In [7]:
# 32-bit signed integer

torch.iinfo(torch.int32)

iinfo(min=-2.14748e+09, max=2.14748e+09, dtype=int32)
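These ranges follow directly from the bit width: an unsigned n-bit integer covers [0, 2^n - 1], while a two's-complement signed one covers [-2^(n-1), 2^(n-1) - 1]. A small sketch (the helper `int_range` is hypothetical) checks that rule against `torch.iinfo`, whose `min` and `max` attributes hold the exact bounds even though the repr above rounds the int32 limits to `2.14748e+09`.

```python
import torch


def int_range(bits: int, signed: bool) -> tuple[int, int]:
    """Hypothetical helper: smallest and largest n-bit integer (two's complement if signed)."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


for dtype, signed in [(torch.uint8, False), (torch.int8, True), (torch.int32, True)]:
    info = torch.iinfo(dtype)
    # info.bits is the storage width; compare the derived range with torch's own limits.
    print(dtype, int_range(info.bits, signed), (info.min, info.max))
```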

#### Floating-Point Numbers

In [8]:
value = 1/3

In [9]:
value

0.3333333333333333

In [11]:
format(value, '0.60f')

'0.333333333333333314829616256247390992939472198486328125000000'
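The trailing digits are not noise: 1/3 has no finite binary expansion, so the float stores the nearest fraction whose denominator is a power of two (a 64-bit double keeps a 53-bit significand). Python's standard `decimal` and `fractions` modules can display that stored value exactly; this sketch uses only the standard library.

```python
from decimal import Decimal
from fractions import Fraction

value = 1 / 3

# Decimal(float) converts the stored binary value exactly, without rounding.
exact = Decimal(value)
print(exact)

# Fraction(float) exposes the stored value as numerator / 2**k.
stored = Fraction(value)
print(stored.numerator, stored.denominator)

# The stored double is slightly smaller than the true 1/3.
print(stored < Fraction(1, 3))
```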

In [16]:
# 64-bit floating point

tensor_fp64 = torch.tensor(value, dtype=torch.float64)

In [20]:
format(tensor_fp64.item(), '0.60f')

'0.333333333333333314829616256247390992939472198486328125000000'
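The float64 tensor prints exactly the same digits as the plain Python value because CPython's `float` is itself an IEEE 754 double on mainstream platforms, so converting it to `torch.float64` loses nothing. A quick check, assuming such a platform:

```python
import sys

import torch

value = 1 / 3

# Python floats carry 53 significand bits, the same as torch.float64.
print(sys.float_info.mant_dig)
print(torch.finfo(torch.float64).eps == sys.float_info.epsilon)

# The round trip Python float -> float64 tensor -> Python float is exact.
tensor_fp64 = torch.tensor(value, dtype=torch.float64)
print(tensor_fp64.item() == value)
```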

In [21]:
# 16-bit brain floating point (bfloat16)

tensor_bf16 = torch.tensor(value, dtype=torch.bfloat16)
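bfloat16 keeps only 8 significand bits (7 stored plus the implicit leading 1), so 1/3 is rounded far more coarsely than in float32 or float64. The sketch below prints the value actually stored for each dtype; the choice of dtypes and the 20-digit formatting are arbitrary.

```python
import torch

value = 1 / 3

for dtype in (torch.float64, torch.float32, torch.bfloat16):
    # .item() returns the stored value as a Python float, so no further rounding occurs.
    stored = torch.tensor(value, dtype=dtype).item()
    print(f"{str(dtype):<15} {stored:.20f}  abs error = {abs(stored - value):.2e}")
```

For bfloat16 the nearest representable value is 0.333984375, roughly 6.5e-4 away from 1/3.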

In [24]:
torch.finfo(torch.bfloat16)

finfo(resolution=0.01, min=-3.38953e+38, max=3.38953e+38, eps=0.0078125, smallest_normal=1.17549e-38, tiny=1.17549e-38, dtype=bfloat16)

In [25]:
torch.finfo(torch.float32)

finfo(resolution=1e-06, min=-3.40282e+38, max=3.40282e+38, eps=1.19209e-07, smallest_normal=1.17549e-38, tiny=1.17549e-38, dtype=float32)
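bfloat16 uses the same 8 exponent bits as float32 but only 7 of its 23 fraction bits. That is why the two `finfo` outputs share nearly the same `max` and identical `smallest_normal`, while `eps` differs by a factor of 2^16. The hypothetical helper `float_limits` below derives these fields from the bit layout, assuming IEEE-754-style encoding (biased exponent, implicit leading 1), and compares them with `torch.finfo`.

```python
import math

import torch


def float_limits(exponent_bits: int, mantissa_bits: int) -> dict:
    """Hypothetical helper: finfo-style limits of an IEEE-754-like binary format.

    mantissa_bits counts only the stored fraction bits (the leading 1 is implicit).
    """
    bias = 2 ** (exponent_bits - 1) - 1
    return {
        "eps": 2.0 ** -mantissa_bits,  # gap between 1.0 and the next larger value
        "max": (2 - 2.0 ** -mantissa_bits) * 2.0 ** bias,  # largest finite value
        "smallest_normal": 2.0 ** (1 - bias),  # smallest positive normal value
    }


for dtype, exponent_bits, mantissa_bits in [(torch.bfloat16, 8, 7), (torch.float32, 8, 23)]:
    info = torch.finfo(dtype)
    for name, derived in float_limits(exponent_bits, mantissa_bits).items():
        print(dtype, name, derived, math.isclose(derived, getattr(info, name)))
```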

### Downcasting

In [39]:
tensor_fp32 = torch.rand(1000, dtype=torch.float32)

In [40]:
tensor_fp32.shape

torch.Size([1000])

In [42]:
tensor_fp32[:5]

tensor([0.9580, 0.7454, 0.7565, 0.3951, 0.8871])

In [41]:
tensor_fp32_to_bf16 = tensor_fp32.to(dtype=torch.bfloat16)

In [43]:
tensor_fp32_to_bf16[:5]

tensor([0.9570, 0.7461, 0.7578, 0.3945, 0.8867], dtype=torch.bfloat16)
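Because a bfloat16 value has the same layout as the upper 16 bits of a float32 (sign, 8 exponent bits, 7 fraction bits), the downcast can be reproduced with integer arithmetic: round the float32 bit pattern to nearest, ties to even, at bit 16, then drop the lower half. The sketch below does this with the standard `struct` module and compares it with PyTorch's `.to(torch.bfloat16)`. The helper name `fp32_to_bf16` is hypothetical, and NaN inputs are deliberately not handled.

```python
import struct

import torch


def fp32_to_bf16(x: float) -> float:
    """Hypothetical helper: round x (as float32) to bfloat16, ties to even; NaN not handled."""
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Add just under half of the dropped range, plus 1 if the kept LSB is odd (ties to even).
    bits += 0x7FFF + ((bits >> 16) & 1)
    # Keep the upper 16 bits and zero the rest.
    bits &= 0xFFFF0000
    (rounded,) = struct.unpack("<f", struct.pack("<I", bits))
    return rounded


torch.manual_seed(0)
sample = torch.rand(1000, dtype=torch.float32)

manual = [fp32_to_bf16(v) for v in sample.tolist()]
reference = sample.to(dtype=torch.bfloat16).float().tolist()

print(manual[:5])
print(manual == reference)
```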

In [44]:
# Dot product of tensor_fp32 with itself, computed in float32
m_float32 = torch.dot(tensor_fp32, tensor_fp32)

In [45]:
m_float32

tensor(337.0033)

In [46]:
m_bfloat16 = torch.dot(tensor_fp32_to_bf16, tensor_fp32_to_bf16)

In [47]:
m_bfloat16

tensor(338., dtype=torch.bfloat16)
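The bfloat16 result drifts from the float32 one for two reasons: each input is rounded with a relative error of up to 2^-8 (half of `eps`), and the final result is stored in a format whose spacing near 338 is 2. How much the computation itself adds depends on how the backend accumulates. The sketch below measures each variant against a float64 reference; the seed and tensor size are arbitrary, and the third variant (bf16-rounded inputs with float32 arithmetic) is an assumption-free way to isolate the input-rounding part.

```python
import torch

torch.manual_seed(0)
x_fp32 = torch.rand(1000, dtype=torch.float32)
x_bf16 = x_fp32.to(torch.bfloat16)

# float64 reference computed from the original float32 inputs.
reference = torch.dot(x_fp32.double(), x_fp32.double()).item()
dot_fp32 = torch.dot(x_fp32, x_fp32).item()
dot_bf16 = torch.dot(x_bf16, x_bf16).item()
# Upcast the already-rounded bf16 inputs: only the input rounding error remains.
dot_bf16_inputs = torch.dot(x_bf16.float(), x_bf16.float()).item()

for name, result in [
    ("float32", dot_fp32),
    ("bfloat16", dot_bf16),
    ("bf16 inputs, fp32 math", dot_bf16_inputs),
]:
    rel_err = abs(result - reference) / reference
    print(f"{name:<24} {result:10.4f}  relative error = {rel_err:.2e}")
```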