## PyTorch Fundamentals
- Tensor: Core data structure
    - Multidimensional array
    - Has shape and a data type
    - Can live on the GPU
    - Supports automatic differentiation

- Inputs and outputs are tensors

In [1]:
import torch

In [None]:
# Creating a tensor
X = torch.tensor([[1.0,4.0,7.0],[2.0,3.0,6.0]])
print(X)

# Properties
print("Shape: ", X.shape)
print("Dtype: ", X.dtype)

tensor([[1., 4., 7.],
        [2., 3., 6.]])
Shape:  torch.Size([2, 3])
Dtype:  torch.float32


In [8]:
# Indexing
print(X[0,1])
print(X[:,1])

tensor(4.)
tensor([4., 3.])


In [15]:
''' 
APIs for computations
'''
# Element-wise addition and multiplication
print("Arithmetic: \n", 10 * (X + 1.0))
print("Element-wise exponential: \n", X.exp())
print("Mean: \n", X.mean())
print("Max per column: \n", X.max(dim=0))
print("Matrix transpose + multiplication\n",
      X @ X.T)


Arithmetic: 
 tensor([[20., 50., 80.],
        [30., 40., 70.]])
Element-wise exponential: 
 tensor([[   2.7183,   54.5982, 1096.6332],
        [   7.3891,   20.0855,  403.4288]])
Mean: 
 tensor(3.8333)
Max per column: 
 torch.return_types.max(
values=tensor([2., 4., 7.]),
indices=tensor([1, 0, 0]))
Matrix transpose + multiplication
 tensor([[66., 56.],
        [56., 49.]])


In [None]:
''' 
Converting tensor to np array and vice versa

Note:
    The default float precision is 32 bits in PyTorch and
    64 bits in NumPy.
    Deep learning generally prefers 32 bits: it halves memory use,
    speeds up computation, and neural nets rarely need more precision.
'''
import numpy as np
print(X.numpy(), X.numpy().dtype)

torch.tensor(np.array([[1., 4., 7.], [2., 3., 6.]]))

[[1. 4. 7.]
 [2. 3. 6.]] float32


tensor([[1., 4., 7.],
        [2., 3., 6.]], dtype=torch.float64)
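
The conversion above keeps NumPy's 64-bit precision. The following is a minimal sketch of two ways to bring a NumPy array into PyTorch, assuming you want 32-bit floats in one case and shared memory in the other; the names `arr`, `t_copy`, and `t_shared` are illustrative only.

```python
import numpy as np
import torch

arr = np.array([[1., 4., 7.], [2., 3., 6.]])  # NumPy defaults to float64

# Option 1: copy the data and cast to 32 bits in one step
t_copy = torch.tensor(arr, dtype=torch.float32)

# Option 2: share memory with the NumPy array (no copy, dtype stays float64)
t_shared = torch.from_numpy(arr)
arr[0, 0] = 100.0  # this write is visible through the shared tensor

print(t_copy.dtype, t_copy[0, 0])      # the float32 copy is unaffected
print(t_shared.dtype, t_shared[0, 0])  # the float64 view reflects the write
```

`torch.tensor()` always copies, so it is the safer choice when the NumPy array may be modified later.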

In [23]:
''' 
Do you have GPU acceleration?
'''
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

print(device)

mps


In [26]:
''' 
Set up GPU runtime
'''
M = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
M = M.to(device)

M.device

# OR do this:
M = torch.tensor([[1., 2., 3.], [4., 5., 6.]], device=device)

In [27]:
''' 
Once the tensor is on the selected device, operations on it run on the GPU
'''
R = M @ M.T 
R

tensor([[14., 32.],
        [32., 77.]], device='mps:0')

In [28]:
''' 
What is the actual speed difference?
'''
M = torch.rand((1000, 1000))  # on the CPU

%timeit M @ M.T

M = torch.rand((1000, 1000), device=device)
%timeit M @ M.T

984 μs ± 9.8 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
944 μs ± 2.54 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
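
GPU work is usually queued asynchronously, so a timing loop may measure only the time needed to launch the operation unless it waits for the device to finish. The sketch below is one possible way to time a matrix product with explicit synchronization. The helpers `sync` and `time_matmul` are hypothetical names, and `torch.mps.synchronize()` is assumed to be available (PyTorch 2.0 or later).

```python
import time
import torch

def sync(device):
    """Wait for pending work on the device (does nothing on the CPU)."""
    if device == "cuda":
        torch.cuda.synchronize()
    elif device == "mps":
        torch.mps.synchronize()  # assumes PyTorch 2.0 or later

def time_matmul(device, size=1000, repeats=10):
    """Return the average time in seconds of one size x size matmul."""
    M = torch.rand((size, size), device=device)
    sync(device)  # make sure the tensor is ready before timing
    start = time.perf_counter()
    for _ in range(repeats):
        R = M @ M.T
    sync(device)  # wait until all queued matmuls have finished
    return (time.perf_counter() - start) / repeats

cpu_time = time_matmul("cpu")
print(f"cpu: {cpu_time * 1e6:.0f} μs per matmul")
```

Calling `time_matmul(device)` with the device selected earlier gives a figure that can be compared with the CPU result. The gap between the two also tends to depend on the matrix size.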


### Autograd
PyTorch comes with reverse-mode automatic differentiation (autograd).
As a check, we know that the derivative of f(x) = x^2 is f'(x) = 2x.
Evaluating both at x = 5 gives f(5) = 25 and f'(5) = 10.

In [40]:
''' 
1. Create the tensor x = 5. requires_grad=True means that it is a
    variable and not a constant
    PyTorch will automatically keep track of all operations involving x

2. We compute f = x**2, resulting tensor is 25.0, square of 5.0
    f also carries a grad_fn attribute, represents operation that
        created this tensor

3. Then call f.backward() which backpropagates the gradients
    through the computation graph, starting at f and all the way back to
    leaf nodes (just x in this case)


4. Lastly, read the x tensor's grad attribute, computed during backprop,
    which gives us the derivative of f with respect to x
'''

x = torch.tensor(5.0, requires_grad=True)
f = x ** 2
print(f)

f.backward()
x.grad

tensor(25., grad_fn=<PowBackward0>)


tensor(10.)

In [41]:
''' 
After computing gradients, you would typically perform a gradient
    descent step: subtract a fraction of the gradients from the
    model variables, as is done when training a neural net.

In this example, running gradient descent gradually pushes x toward 0.

The descent step itself should not be recorded in the computation
    graph, so gradient tracking is temporarily disabled while it runs.

This is done with the torch.no_grad() context manager.
'''

learning_rate = 0.1
with torch.no_grad():
    x -= learning_rate * x.grad
x

tensor(4., requires_grad=True)

In [None]:
# Alternatively, use the tensor's detach() method, which creates a new tensor
# that points to the same data in memory but is detached from the
# computation graph. This is useful when you want fine-grained control over
# which operations contribute to the gradient computation.
# However, torch.no_grad() is generally preferred when performing
# inference or a gradient descent step.
x_detached = x.detach()
x_detached -= learning_rate * x.grad
print(x_detached)



tensor(3.)
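
Because the detached tensor shares memory with the original, an in-place update through it also changes the original. A minimal sketch of that behavior, using a fresh `x` so it runs on its own:

```python
import torch

x = torch.tensor(4.0, requires_grad=True)
x_detached = x.detach()

# Same underlying storage, but no link to the computation graph
print(x_detached.requires_grad)               # False
print(x_detached.data_ptr() == x.data_ptr())  # True

x_detached -= 1.0  # in-place update through the detached tensor
print(x)           # tensor(3., requires_grad=True)
```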


In [44]:
''' 
Finally, before repeating the whole process
(forward and backward pass, then gradient descent), it is essential to
zero out the gradients of every model parameter
'''
x.grad.zero_()

tensor(0.)
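
Zeroing matters because `backward()` adds new gradients to whatever is already stored in `grad` rather than overwriting it. A small sketch of the accumulation, using the same f(x) = x^2 at x = 5 (the `grad_*` names are illustrative only):

```python
import torch

x = torch.tensor(5.0, requires_grad=True)

(x ** 2).backward()
grad_first = x.grad.item()
print(grad_first)   # 10.0

(x ** 2).backward()  # second backward pass without zeroing
grad_second = x.grad.item()
print(grad_second)  # 20.0, the new gradient was added to the old one

x.grad.zero_()
(x ** 2).backward()
grad_after_reset = x.grad.item()
print(grad_after_reset)  # 10.0 again
```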

In [45]:
''' 
Putting everything together, the whole training loop looks like:
'''
learning_rate = 0.1
x = torch.tensor(5.0, requires_grad=True)
for iteration in range(100):
    f = x ** 2  # forward pass
    f.backward()  # backward pass
    with torch.no_grad():
        x -= learning_rate * x.grad  # gradient descent step

    x.grad.zero_()  # reset the gradients
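
As a sanity check on the loop, each step computes x - 0.1 * 2x = 0.8x, so after 100 steps x should equal 5 * 0.8^100, which is on the order of 1e-9. The sketch below reruns the loop and compares the result with that closed form; `expected` is a name introduced here for illustration.

```python
import torch

learning_rate = 0.1
x = torch.tensor(5.0, requires_grad=True)
for iteration in range(100):
    f = x ** 2                       # forward pass
    f.backward()                     # backward pass
    with torch.no_grad():
        x -= learning_rate * x.grad  # gradient descent step
    x.grad.zero_()                   # reset the gradients

# Closed form: every step multiplies x by (1 - 2 * learning_rate) = 0.8
expected = 5.0 * (1 - 2 * learning_rate) ** 100
print(x.item(), expected)
```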

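**IMPORTANT**:
In-place operations don't always play nicely with autograd. PyTorch raises a RuntimeError if you update a leaf tensor that requires gradients in place outside `torch.no_grad()`, or if you overwrite a value that the backward pass still needs.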
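
Two common failure cases can be reproduced in a few lines. This is an illustrative sketch, not an exhaustive list, and the variable names `leaf_error` and `backward_error` are introduced here only to capture the messages.

```python
import torch

# Case 1: in-place update of a leaf tensor outside torch.no_grad()
x = torch.tensor(5.0, requires_grad=True)
try:
    x -= 0.1
    leaf_error = None
except RuntimeError as err:
    leaf_error = str(err)
print("Leaf update:", leaf_error)

# Case 2: overwriting a value that the backward pass still needs
t = torch.tensor(2.0, requires_grad=True)
z = t.exp()  # exp() saves its output to compute the gradient later
z += 1.0     # the in-place change invalidates that saved output
try:
    z.backward()
    backward_error = None
except RuntimeError as err:
    backward_error = str(err)
print("Backward:", backward_error)
```

Replacing `z += 1.0` with `z = z + 1.0` creates a new tensor instead of modifying the saved one, which avoids the second error.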

## Implementing Linear Regression

In [47]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

housing = fetch_california_housing()

X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target, random_state=42)

X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full, random_state=42)


# Convert to tensors and standardize the features
X_train = torch.FloatTensor(X_train)
X_valid = torch.FloatTensor(X_valid)
X_test = torch.FloatTensor(X_test)

means = X_train.mean(dim=0, keepdims=True)
stds = X_train.std(dim=0, keepdims=True)

X_train = (X_train - means) / stds
X_valid = (X_valid - means) / stds 
X_test = (X_test - means) / stds
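
The validation and test sets are scaled with the training set's statistics so that no information from them leaks into preprocessing. The sketch below checks what standardization does, using random data as a hypothetical stand-in for the housing features (`X_demo` and `X_scaled` are illustrative names).

```python
import torch

torch.manual_seed(0)
# Hypothetical stand-in for the housing features: 1,000 rows, 8 columns
X_demo = torch.randn(1000, 8) * 50.0 + 10.0

means = X_demo.mean(dim=0, keepdim=True)  # shape (1, 8)
stds = X_demo.std(dim=0, keepdim=True)    # shape (1, 8)
X_scaled = (X_demo - means) / stds        # broadcasts across the rows

print(X_scaled.mean(dim=0))  # every entry close to 0
print(X_scaled.std(dim=0))   # every entry close to 1
```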

In [None]:
# Convert the targets to tensors too.
# The predictions will be column vectors, so the targets must be
# column vectors as well.
# The NumPy arrays holding the targets are one-dimensional, so we
# reshape the tensors into column vectors by adding a second dimension of size 1.

y_train = torch.FloatTensor(y_train).reshape(-1, 1)
y_valid = torch.FloatTensor(y_valid).reshape(-1, 1)
y_test = torch.FloatTensor(y_test).reshape(-1, 1)
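
The reshape matters because subtracting a one-dimensional target from a column of predictions does not fail. It silently broadcasts to a square matrix, which would corrupt any loss computed from it. A small sketch with made-up shapes (`y_pred`, `y_flat`, and `y_col` are illustrative names):

```python
import torch

y_pred = torch.zeros(5, 1)     # predictions as a column vector
y_flat = torch.arange(5.0)     # targets left one-dimensional, shape (5,)
y_col = y_flat.reshape(-1, 1)  # targets as a column vector, shape (5, 1)

print((y_pred - y_flat).shape)  # torch.Size([5, 5]): silent broadcasting
print((y_pred - y_col).shape)   # torch.Size([5, 1]): element by element
```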