# Preliminaries

In [1]:
import torch
import numpy as np
import time 
from gpu1 import test0, get_device, test1

# For large arrays, GPU multiplication is faster than CPU multiplication

Print out CPU, PyTorch, and GPU information.

In [2]:
test0()

--- CPU Information ---
Processor: arm
Physical cores: 8
Total cores: 8
Current CPU frequency: 4056.00 MHz

--- PyTorch and GPU Information ---
PyTorch version: 2.8.0
It is False that CUDA is available
It is True that MPS is available
It is True that MPS is built


Select the device to use for tensor computations. \
Try CUDA/NVIDIA, then MPS, then CPU.

In [3]:
_ = get_device(verbose=True)

Using MPS (Apple Silicon GPU)
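
The selection order described above (CUDA, then MPS, then CPU) can be sketched as follows. This is a minimal sketch, not the actual `get_device` from `gpu1.py`, whose source is not shown in this notebook; the name `pick_device` is hypothetical.

```python
import torch

def pick_device(verbose=False):
    """Hypothetical stand-in for get_device: prefer CUDA, then MPS, then CPU."""
    if torch.cuda.is_available():
        device = torch.device("cuda")
    elif torch.backends.mps.is_available():
        device = torch.device("mps")
    else:
        device = torch.device("cpu")
    if verbose:
        print(f"Using {device.type}")
    return device

device = pick_device(verbose=True)
```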


In [4]:
n_list = [1, 10, 20, 50, 70, 100, 200, 500, 700, 1000, 2000, 5000, 7000, 8000]

For each matrix size `n`, print the device, `n`, the two durations, their ratio, the difference between the two results, and whether the results match. Judging from the heading above, `duration1` is presumably the GPU time and `duration2` the CPU time.

In [5]:
for n in n_list:

    (device, n, duration1, duration2, diff, same) = test1(n, verbose=False)
    
    print((f'{device.type}:{device.index}',
           f'{n:5d}',
           f'{1e3 * duration1:8.3f} ms',
           f'{1e3 * duration2:8.3f} ms',
           f'{duration2/duration1:7.1f}x',
           f'{diff:0.1e}',
           same))

('mps:0', '    1', '   1.915 ms', '   0.017 ms', '    0.0x', '0.0e+00', True)
('mps:0', '   10', '   1.985 ms', '   0.038 ms', '    0.0x', '9.5e-07', True)
('mps:0', '   20', '   1.577 ms', '   0.008 ms', '    0.0x', '0.0e+00', True)
('mps:0', '   50', '   1.506 ms', '   0.009 ms', '    0.0x', '0.0e+00', True)
('mps:0', '   70', '   1.431 ms', '   0.090 ms', '    0.1x', '0.0e+00', True)
('mps:0', '  100', '   3.705 ms', '   0.081 ms', '    0.0x', '0.0e+00', True)
('mps:0', '  200', '   3.124 ms', '   0.209 ms', '    0.1x', '0.0e+00', True)
('mps:0', '  500', '   3.152 ms', '   1.291 ms', '    0.4x', '9.9e-05', False)
('mps:0', '  700', '   2.427 ms', '   2.793 ms', '    1.2x', '1.4e-04', False)
('mps:0', ' 1000', '   3.290 ms', '  10.903 ms', '    3.3x', '2.2e-04', False)
('mps:0', ' 2000', '   3.897 ms', '  91.919 ms', '   23.6x', '4.6e-04', False)
('mps:0', ' 5000', '   4.119 ms', ' 543.172 ms', '  131.9x', '1.3e-03', False)
('mps:0', ' 7000', '   4.003 ms', '1426.832 ms', '  356.4x'
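
The source of `test1` is not shown here, so the following is only a sketch of how such a timing might be taken, assuming single-precision square matrices. The helper names `sync` and `time_matmul` are hypothetical. GPU work is queued asynchronously, so the sketch waits for the device to finish before stopping the clock.

```python
import time
import torch

def sync(device):
    """Block until queued work on `device` has finished (no-op on CPU)."""
    if device.type == "cuda":
        torch.cuda.synchronize()
    elif device.type == "mps":
        torch.mps.synchronize()

def time_matmul(n, device, repeats=3):
    """Best wall-clock time in seconds for one n x n float32 matmul."""
    a = torch.randn(n, n, dtype=torch.float32, device=device)
    b = torch.randn(n, n, dtype=torch.float32, device=device)
    torch.matmul(a, b)          # warm-up run, excluded from the timing
    sync(device)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        torch.matmul(a, b)
        sync(device)            # wait for the result before reading the clock
        best = min(best, time.perf_counter() - start)
    return best

print(f"{1e3 * time_matmul(200, torch.device('cpu')):8.3f} ms")
```

Taking the best of a few repeats after a warm-up run reduces the influence of one-time setup costs, which may explain part of the roughly constant few-millisecond floor seen in the first duration column above.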

# PyTorch using MPS (Apple Silicon GPU) is limited to single precision

## Set the default torch dtype to single precision

In [6]:
torch.set_default_dtype(torch.float32)

In [7]:
(torch.tensor([1.2, 3]).dtype,
 torch.tensor(np.array([1.2, 3])).dtype,
 torch.tensor(np.array([1.2, 3]).astype(np.float32)).dtype)

(torch.float32, torch.float64, torch.float32)

## Set the default torch dtype to double precision

In [8]:
torch.set_default_dtype(torch.float64)

In [9]:
(torch.tensor([1.2, 3]).dtype,
 torch.tensor(np.array([1.2, 3])).dtype,
 torch.tensor(np.array([1.2, 3]).astype(np.float32)).dtype)

(torch.float64, torch.float64, torch.float32)
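
The outputs above suggest that `torch.set_default_dtype` affects tensors built from Python floats, while tensors built from NumPy arrays keep the array's own dtype. A minimal sketch of requesting the dtype explicitly, so the result does not depend on the global default:

```python
import numpy as np
import torch

x_np = np.array([1.2, 3.0])      # NumPy floats default to float64

# Pass dtype explicitly instead of relying on the global default.
x32 = torch.tensor(x_np, dtype=torch.float32)
x64 = torch.tensor([1.2, 3.0], dtype=torch.float64)

print(x32.dtype, x64.dtype)      # torch.float32 torch.float64
```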

## Puzzling behavior

With the precision now set to double, `torch` appears to happily multiply double-precision arrays.

In [10]:
a = torch.tensor(np.array([1.2, 3]))
b = torch.tensor(np.array([2.4, 5]))
c = torch.matmul(a, b)

(a.dtype, b.dtype, c.dtype)

(torch.float64, torch.float64, torch.float64)

But if we try to call `torch.randn` on the MPS device and specify double precision, we get an error.

In [11]:
x = torch.randn(size=(n, n), dtype=torch.float64, device=device)

TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.

Use numpy to create two large double-precision arrays filled with random numbers.

In [12]:
a_np = np.random.randn(100, 100).astype(np.float64)
b_np = np.random.randn(100, 100).astype(np.float64)
(a_np.dtype, b_np.dtype)

(dtype('float64'), dtype('float64'))

Convert these arrays to torch tensors using the `torch.tensor` function. \
Multiply them using `torch.matmul`. \
This works.

In [13]:
a_torch = torch.tensor(a_np)
b_torch = torch.tensor(b_np)
c_torch = torch.matmul(a_torch, b_torch)
(a_torch.dtype, b_torch.dtype, c_torch.dtype)

(torch.float64, torch.float64, torch.float64)

If we instead convert these arrays to torch tensors using `torch.from_numpy` and move them to the device, we get an error.

In [14]:
a_torch = torch.from_numpy(a_np).to(device)

TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.

What the heck?
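
A likely explanation: `torch.tensor(...)` and `torch.from_numpy(...)` create tensors on the CPU unless a device is given, and the CPU supports `float64`. The multiplications that worked never touched the MPS device. Only the calls that pass `device=device` or call `.to(device)` ask MPS to hold `float64` data, and those are the calls that fail. A minimal sketch of a workaround, assuming single precision is acceptable; the helper name `to_device_safe` is hypothetical.

```python
import numpy as np
import torch

def to_device_safe(array, device):
    """Move a NumPy array to `device`, downcasting float64 to float32 on MPS."""
    t = torch.from_numpy(array)          # CPU tensor, keeps the NumPy dtype
    if device.type == "mps" and t.dtype == torch.float64:
        t = t.to(torch.float32)          # MPS has no float64 support
    return t.to(device)

a_cpu = to_device_safe(np.random.randn(4, 4), torch.device("cpu"))
print(a_cpu.device, a_cpu.dtype)
```

Checking `tensor.device` alongside `tensor.dtype` in the cells above would show which tensors actually live on the GPU.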

# See also

I have written stand-alone Python programs to speed-test computations on the GPU using `torch`, on the GPU using `jax`, and on the CPU with parallel processing via `concurrent.futures`.

    gpu1.py
    jax1.py
    multi1.py
    multi2.py

::: {.content-hidden when-format="html"}

# Formatting notes

The header at the top of this file is for creating a nicely formatted `.html` document using the program `quarto` ([link](https://quarto.org/)). To create nicely formatted `.html` versions of this notebook, run `quarto` from the command line as follows:

    quarto render dissipation-theory--Study-66.ipynb && open dissipation-theory--Study-66.html
    
:::