## Introduction to PyTorch

### PyTorch Tensors

In [1]:
import torch
import math # provides constants such as math.pi, used later

### Creating Tensors

In [5]:
x = torch.tensor([[5.5, 3],[1, 2]])
print(type(x))
print(x)

x = torch.empty(3, 4) # only allocates memory without initializing it, so the values are arbitrary
print(type(x))
print(x)

<class 'torch.Tensor'>
tensor([[5.5000, 3.0000],
        [1.0000, 2.0000]])
<class 'torch.Tensor'>
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])


Let's unpack what we just did:
* We created a tensor using one of the numerous factory methods attached to the torch module.
* The tensor itself is 2-dimensional, having 3 rows and 4 columns.
* The type of the object returned is `torch.Tensor`, which is an alias for `torch.FloatTensor`; by default, PyTorch tensors are populated with 32-bit floating point numbers. (More on data types below.)
* You will probably see some random-looking values when printing your tensor. The `torch.empty()` call allocates memory for the tensor, but does not initialize it with any values - so what you're seeing is whatever was in memory at the time of allocation.
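
The points above can be checked directly on the tensor. This is a minimal sketch that assumes the default floating point type has not been changed; `ndim`, `shape`, and `dtype` are standard tensor attributes.

```python
import torch

x = torch.empty(3, 4)  # uninitialized memory: contents are arbitrary

print(x.ndim)    # number of dimensions: 2
print(x.shape)   # torch.Size([3, 4])
print(x.dtype)   # torch.float32, the default floating point type
```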

A brief note about tensors and their number of dimensions, and terminology:
* You will sometimes see a 1-dimensional tensor called a vector.
* Likewise, a 2-dimensional tensor is often referred to as a matrix.
* Anything with more than two dimensions is generally just called a tensor.

More often than not, you'll want to initialize your tensor with some value. Common cases are all zeros, all ones, or random values, and the torch module provides factory methods for all of these:

In [6]:
zeros = torch.zeros(2, 3)
print(zeros)

ones = torch.ones(2, 3)
print(ones)

torch.manual_seed(1729)
random = torch.rand(2, 3)
print(random)


tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([[1., 1., 1.],
        [1., 1., 1.]])
tensor([[0.3126, 0.3791, 0.3087],
        [0.0736, 0.4216, 0.0691]])


The factory methods all do just what you'd expect - we have a tensor full of zeros, another full of ones, and another with random values between 0 and 1.  



**Random Tensors and Seeding**  
Speaking of the random tensor, did you notice the call to `torch.manual_seed()` immediately preceding it? Initializing tensors, such as a model's learning weights, with random values is common, but there are times - especially in research settings - where you'll want some assurance of the reproducibility of your results. Manually setting your random number generator's seed is the way to do this. Let's look more closely:

In [21]:
torch.manual_seed(1729)
random1 = torch.rand(2, 3)
print(random1)

random2 = torch.randn(2, 3)
print(random2)

torch.manual_seed(1729)
random3 = torch.randn(2, 3)
print(random3)


random4 = torch.rand(2, 3)
print(random4)

tensor([[0.3126, 0.3791, 0.3087],
        [0.0736, 0.4216, 0.0691]])
tensor([[ 1.0757, -1.2086, -0.6922],
        [ 2.0419, -1.8508,  2.1626]])
tensor([[-1.1257, -0.0057, -1.3975],
        [ 1.4364, -0.1068, -0.8413]])
tensor([[0.6128, 0.1519, 0.0453],
        [0.5035, 0.9978, 0.3884]])


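What you should see above is that `random1` carries the same values as the `random` tensor from the earlier cell: both come from the first `torch.rand(2, 3)` call after `torch.manual_seed(1729)`. The other tensors differ from one another because this cell makes different calls after each reseed (`rand` then `randn` the first time, `randn` then `rand` the second). Manually setting the RNG's seed resets it, so that identical sequences of computations depending on random numbers should, in most settings, provide identical results.  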
For more information, see the PyTorch documentation on reproducibility.
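
To see reseeding reproduce values within a single cell, the same calls have to be repeated in the same order. The following is a minimal sketch of that idea; the variable names are illustrative and the seed value is the one used above.

```python
import torch

torch.manual_seed(1729)
first_a = torch.rand(2, 3)
first_b = torch.rand(2, 3)

torch.manual_seed(1729)    # reset the RNG to the same starting state
second_a = torch.rand(2, 3)
second_b = torch.rand(2, 3)

# The same sequence of calls after the same seed gives matching tensors.
print(torch.equal(first_a, second_a))
print(torch.equal(first_b, second_b))
```

Both comparisons are expected to print `True`, while `first_a` and `first_b` differ because the second draw continues the random sequence rather than restarting it.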



**Tensor Shapes**  
Often, when you're performing operations on two or more tensors, they will need to be of the same shape - that is, having the same number of dimensions and the same number of cells in each dimension. For that, we have the `torch.*_like()` methods:

In [22]:
x = torch.empty(2, 2, 3)
print(x.shape)
print(x)

empty_like_x = torch.empty_like(x)
print(empty_like_x.shape)
print(empty_like_x)

zeros_like_x = torch.zeros_like(x)
print(zeros_like_x.shape)
print(zeros_like_x)

ones_like_x = torch.ones_like(x)
print(ones_like_x.shape)
print(ones_like_x)

rand_like_x = torch.rand_like(x)
print(rand_like_x.shape)
print(rand_like_x)

torch.Size([2, 2, 3])
tensor([[[ 2.5041e-32,  9.3467e-43, -0.0000e+00],
         [ 1.6895e+00, -2.0000e+00,  1.6543e+00]],

        [[ 0.0000e+00,  1.3972e+00,  2.0000e+00],
         [ 1.7108e+00,  0.0000e+00,  1.3881e+00]]])
torch.Size([2, 2, 3])
tensor([[[ 2.5039e-32,  9.3467e-43,  1.0842e-19],
         [ 9.6553e-01,  1.0842e-19,  1.9247e+00]],

        [[-3.6893e+19,  1.9296e+00,  1.0842e-19],
         [ 1.4636e+00,  2.0000e+00,  1.8353e+00]]])
torch.Size([2, 2, 3])
tensor([[[0., 0., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 0., 0.]]])
torch.Size([2, 2, 3])
tensor([[[1., 1., 1.],
         [1., 1., 1.]],

        [[1., 1., 1.],
         [1., 1., 1.]]])
torch.Size([2, 2, 3])
tensor([[[0.6929, 0.1703, 0.1384],
         [0.4759, 0.7481, 0.0361]],

        [[0.5062, 0.8469, 0.2588],
         [0.2707, 0.4115, 0.6839]]])


The first new thing in the code cell above is the use of the `.shape` property on a tensor. This property contains a list of the extent of each dimension of a tensor - in our case, `x` is a three-dimensional tensor with shape 2x2x3.  
Below that, we call the `.empty_like()`, `.zeros_like()`, `.ones_like()`, and `.rand_like()` methods. Using the `.shape` property, we can verify that each of these methods returns a tensor of identical dimensionality and extent.  
The last way to create a tensor that we will cover is to specify its data directly from a Python collection:

In [23]:
some_constants = torch.tensor([[3.1415926, 2.71828],[1.61803, 0.0072897]])
print(some_constants)

some_integers = torch.tensor((2, 3, 5, 7, 11, 13, 17, 19))
print(some_integers)

more_integers = torch.tensor(([2, 4, 6], [3, 6, 9]))
print(more_integers)

tensor([[3.1416, 2.7183],
        [1.6180, 0.0073]])
tensor([ 2,  3,  5,  7, 11, 13, 17, 19])
tensor([[2, 4, 6],
        [3, 6, 9]])


Using `torch.tensor()` is the most straightforward way to create a tensor if you already have data in a Python tuple or list. As shown above, nesting the collections will result in a multi-dimensional tensor.  
Note: `torch.tensor()` creates a copy of the data.
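
The copy behavior can be illustrated with a short sketch: changing the source list after the tensor is created should leave the tensor untouched. The names `data` and `t` are illustrative.

```python
import torch

data = [[1, 2], [3, 4]]
t = torch.tensor(data)   # copies the values out of the nested list

data[0][0] = 99          # modify the original Python list
print(data)              # the list reflects the change
print(t)                 # the tensor still holds the original values
```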



**Tensor Data Types**  
Setting the datatype of a tensor is possible in a couple of ways:

In [24]:
a = torch.ones((2, 3), dtype=torch.int16)
print(a)

b = torch.rand((2, 3), dtype=torch.float32) * 20
print(b)

c = b.to(torch.int64)
print(c)

tensor([[1, 1, 1],
        [1, 1, 1]], dtype=torch.int16)
tensor([[ 1.4051, 10.2103, 18.9010],
        [ 4.7172,  3.9587,  6.6549]])
tensor([[ 1, 10, 18],
        [ 4,  3,  6]])


The simplest way to set the underlying data type of a tensor is with an optional argument at creation time. In the first line of the cell above, we set `dtype=torch.int16` for the tensor `a`. When we print `a`, we can see that it's full of `1` rather than `1.` - Python's subtle cue that this is an integer type rather than floating point.  
Another thing to notice about printing `a` is that, unlike when we left `dtype` as the default (32-bit floating point), printing the tensor also specifies its `dtype`.  
You may have also spotted that we went from specifying the tensor's shape as a series of integer arguments, to grouping those arguments in a tuple. This is not strictly necessary - PyTorch will take a series of initial, unlabeled integer arguments as a tensor shape - but when adding the optional arguments, it can make your intent more readable.  
The other way to set the datatype is with the `.to()` method. In the cell above, we create a random floating point tensor `b` in the usual way. Following that, we create `c` by converting `b` to a 64-bit integer with the `.to()` method. Note that `c` contains all the same values as `b`, but truncated to integers.  
Available data types include:  
* torch.bool
* torch.int8
* torch.uint8
* torch.int16
* torch.int32
* torch.int64
* torch.half
* torch.float
* torch.double
* torch.bfloat16
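
Some of the names in this list are short aliases for explicitly sized types. The sketch below checks a few of those aliases and shows the truncation that happens when converting floating point values to an integer type; the sample values are illustrative.

```python
import torch

# Short names refer to the same dtypes as the explicitly sized names.
print(torch.half == torch.float16)
print(torch.float == torch.float32)
print(torch.double == torch.float64)

values = torch.tensor([1.9, -1.9, 0.5])
as_int = values.to(torch.int32)   # conversion drops the fractional part
print(as_int.dtype)               # torch.int32
print(as_int.tolist())            # [1, -1, 0]
```

Note that the conversion truncates toward zero rather than rounding, which is why `1.9` becomes `1` and `-1.9` becomes `-1`.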

**Math & Logic with PyTorch Tensors**  
Now that you know some of the ways to create a tensor... what can you do with them?

Let's look at basic arithmetic first, and how tensors interact with simple scalars:



In [25]:
ones = torch.zeros(2, 2) + 1
print(ones)

twos = torch.ones(2, 2) * 2
print(twos)

threes = (torch.ones(2, 2) * 7 - 1) / 2
print(threes)

fours = twos ** 2
print(fours)

sqrt2s = twos ** 0.5
print(sqrt2s)

tensor([[1., 1.],
        [1., 1.]])
tensor([[2., 2.],
        [2., 2.]])
tensor([[3., 3.],
        [3., 3.]])
tensor([[4., 4.],
        [4., 4.]])
tensor([[1.4142, 1.4142],
        [1.4142, 1.4142]])


As you can see above, arithmetic operations between tensors and scalars, such as addition, subtraction, multiplication, division, and exponentiation are distributed over every element of the tensor. Because the output of such an operation will be a tensor, you can chain them together with the usual operator precedence rules, as in the line where we create `threes`.  
Similar operations between two tensors also behave like you'd intuitively expect:

In [26]:
powers2 = twos ** torch.Tensor(([1, 2], [3, 4]))
print(powers2)

fives = ones + fours
print(fives)

dozens = threes * fours
print(dozens)

tensor([[ 2.,  4.],
        [ 8., 16.]])
tensor([[5., 5.],
        [5., 5.]])
tensor([[12., 12.],
        [12., 12.]])


It's important to note here that all of the tensors in the previous code cell were of identical shape. What happens when we try to perform a binary operation on tensors of dissimilar shape?  

**Note: The following cell throws a run-time error. This is intentional.**

In [30]:
a = torch.rand(2, 3)
b = torch.rand(3, 2)

print(a)
print(b)
# print(a @ b)
print(a * b)

tensor([[0.1332, 0.0023, 0.4945],
        [0.3857, 0.9883, 0.4762]])
tensor([[0.7242, 0.0776],
        [0.4004, 0.9877],
        [0.0352, 0.0905]])


RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1

In [31]:
print(a.T * b)


tensor([[9.6436e-02, 2.9950e-02],
        [9.0789e-04, 9.7623e-01],
        [1.7419e-02, 4.3096e-02]])


In the general case, you cannot operate on tensors of different shape this way, even in a case like `a * b` above, where the tensors have an identical number of elements. Transposing `a` so that the shapes match, as in `a.T * b`, makes the element-wise operation valid.
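
The difference between the failing and the working cases can be sketched in one place. This example catches the error instead of letting it stop the cell; the flag name `elementwise_failed` is illustrative, and the matrix product is included only to contrast it with element-wise multiplication.

```python
import torch

a = torch.rand(2, 3)
b = torch.rand(3, 2)

elementwise_failed = False
try:
    a * b                       # element-wise: shapes (2, 3) and (3, 2) do not line up
except RuntimeError as err:
    elementwise_failed = True
    print('element-wise failed:', err)

print((a.T * b).shape)          # torch.Size([3, 2]): shapes match after transposing
print((a @ b).shape)            # torch.Size([2, 2]): matrix product, a different operation
```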

**In Brief: Tensor Broadcasting**  
(Note: if you are familiar with broadcasting semantics in NumPy ndarrays, you'll find the same rules apply here.)  
The exception to the same-shapes rule is tensor broadcasting. Here's an example:

In [32]:
rand = torch.rand(2, 4)
doubled = rand * (torch.ones(1, 4) * 2)

print(rand)
print(doubled)

tensor([[0.4485, 0.8740, 0.2526, 0.6923],
        [0.7545, 0.7746, 0.2330, 0.8441]])
tensor([[0.8971, 1.7480, 0.5051, 1.3846],
        [1.5091, 1.5491, 0.4660, 1.6881]])


What's the trick here? How is it we got to multiply a 2x4 tensor by a 1x4 tensor?

Broadcasting is a way to perform an operation between tensors that have similarities in their shapes. In the example above, the one-row, four-column tensor is multiplied by both rows of the two-row, four-column tensor.

This is an important operation in Deep Learning. The common example is multiplying a tensor of learning weights by a batch of input tensors, applying the operation to each instance in the batch separately, and returning a tensor of identical shape - just like our (2, 4) * (1, 4) example above returned a tensor of shape (2, 4).

The rules for broadcasting are:
* Each tensor must have at least one dimension - no empty tensors.
* Comparing the dimension sizes of the two tensors, going from last to first:
  * Each dimension must be equal, or
  * One of the dimensions must be of size 1, or
  * The dimension does not exist in one of the tensors

Tensors of identical shape, of course, are trivially "broadcastable", as you saw earlier.
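
The dimension-comparison rule can be sketched as a small pure-Python helper. `broadcast_shape` is a hypothetical function written for illustration, not part of PyTorch, and it covers only the last-to-first comparison, not the rule about empty tensors.

```python
import torch

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or None if the shapes are incompatible."""
    result = []
    # Walk the dimensions from last to first; a missing dimension counts as size 1.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        dim_a = shape_a[-i] if i <= len(shape_a) else 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            return None
    return tuple(reversed(result))

print(broadcast_shape((4, 3, 2), (3, 1)))   # (4, 3, 2)
print(broadcast_shape((4, 3, 2), (4, 3)))   # None

# Cross-check the first case against an actual broadcast operation.
product = torch.ones(4, 3, 2) * torch.rand(3, 1)
print(tuple(product.shape))                 # (4, 3, 2)
```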

Here are some examples of situations that honor the above rules and allow broadcasting:

In [33]:
a =     torch.ones(4, 3, 2)

b = a * torch.rand(   3, 2) # 3rd & 2nd dims identical to a, dim 1 absent
print(b)

c = a * torch.rand(   3, 1) # 3rd dim = 1, 2nd dim identical to a
print(c)

d = a * torch.rand(   1, 2) # 3rd dim identical to a, 2nd dim = 1
print(d)

tensor([[[0.9004, 0.3995],
         [0.6324, 0.9464],
         [0.0113, 0.5183]],

        [[0.9004, 0.3995],
         [0.6324, 0.9464],
         [0.0113, 0.5183]],

        [[0.9004, 0.3995],
         [0.6324, 0.9464],
         [0.0113, 0.5183]],

        [[0.9004, 0.3995],
         [0.6324, 0.9464],
         [0.0113, 0.5183]]])
tensor([[[0.9807, 0.9807],
         [0.6545, 0.6545],
         [0.4144, 0.4144]],

        [[0.9807, 0.9807],
         [0.6545, 0.6545],
         [0.4144, 0.4144]],

        [[0.9807, 0.9807],
         [0.6545, 0.6545],
         [0.4144, 0.4144]],

        [[0.9807, 0.9807],
         [0.6545, 0.6545],
         [0.4144, 0.4144]]])
tensor([[[0.0696, 0.4648],
         [0.0696, 0.4648],
         [0.0696, 0.4648]],

        [[0.0696, 0.4648],
         [0.0696, 0.4648],
         [0.0696, 0.4648]],

        [[0.0696, 0.4648],
         [0.0696, 0.4648],
         [0.0696, 0.4648]],

        [[0.0696, 0.4648],
         [0.0696, 0.4648],
         [0.0696, 0.4648]]])


Look closely at the values of each tensor above:  
* The multiplication operation that created `b` was broadcast over every "layer" of `a`.
* For `c`, the operation was broadcast over every layer and row of `a` - every 3-element column is identical.
* For `d`, we switched it around - now every row is identical, across layers and columns.

For more information on broadcasting, see the PyTorch documentation on the topic.  
Here are some examples of attempts at broadcasting that will fail:  

**Note: The following cell throws a run-time error. This is intentional.**

In [34]:
a =     torch.ones(4, 3, 2)

b = a * torch.rand(4, 3)    # dimensions must match last-to-first
print(b)

c = a * torch.rand(   2, 3) # both 3rd & 2nd dims different
print(c)

d = a * torch.rand((0, ))   # can't broadcast with an empty tensor
print(d)

RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 2

**More Math with Tensors**  
PyTorch tensors have over three hundred operations that can be performed on them.

Here is a small sample from some of the major categories of operations:

In [39]:
# common functions
a = torch.rand(2, 4) * 2 - 1
print('Common functions:')
print(a)
print(torch.abs(a))
print(torch.ceil(a))
print(torch.floor(a))
print(torch.clamp(a, -0.5, 0.5)) # clamp values to the range [-0.5, 0.5]

Common functions:
tensor([[ 0.0503, -0.6737,  0.4782,  0.8509],
        [-0.4908,  0.8738,  0.4975,  0.8212]])
tensor([[0.0503, 0.6737, 0.4782, 0.8509],
        [0.4908, 0.8738, 0.4975, 0.8212]])
tensor([[1., -0., 1., 1.],
        [-0., 1., 1., 1.]])
tensor([[ 0., -1.,  0.,  0.],
        [-1.,  0.,  0.,  0.]])
tensor([[ 0.0503, -0.5000,  0.4782,  0.5000],
        [-0.4908,  0.5000,  0.4975,  0.5000]])


In [40]:
# trigonometric functions and their inverses
angles = torch.tensor([0, math.pi/4, math.pi/2, math.pi/4])
sines = torch.sin(angles)
inverses = torch.asin(sines)
print('\nSine and arcsine:')
print(angles)
print(sines)
print(inverses)


Sine and arcsine:
tensor([0.0000, 0.7854, 1.5708, 0.7854])
tensor([0.0000, 0.7071, 1.0000, 0.7071])
tensor([0.0000, 0.7854, 1.5708, 0.7854])


In [44]:
# bitwise operations
print('\nBitwise XOR:')
b = torch.tensor([1, 5, 11])
c = torch.tensor([3, 7, 10])
print(b)
print(c)
print(torch.bitwise_xor(b, c))


Bitwise XOR:
tensor([ 1,  5, 11])
tensor([ 3,  7, 10])
tensor([2, 2, 1])


In [54]:
# comparison:
print('\nBroadcasted, element-wise equality comparison:')
d = torch.tensor([[1., 2.], [3., 4.]])
# e = torch.ones(1, 2)  # many comparison ops support broadcasting!
e = torch.tensor([[1., 4.]])  # many comparison ops support broadcasting!

print(d)
print(e)
print(torch.eq(d, e)) # returns a tensor of type bool


Broadcasted, element-wise equality comparison:
tensor([[1., 2.],
        [3., 4.]])
tensor([[1., 4.]])
tensor([[ True, False],
        [False,  True]])


In [55]:
# reductions:
print('\nReduction ops:')
print(torch.max(d))        # returns a single-element tensor
print(torch.max(d).item()) # extracts the value from the returned tensor
print(torch.mean(d))       # average
print(torch.std(d))        # standard deviation
print(torch.prod(d))       # product of all numbers
print(torch.unique(torch.tensor([1, 2, 1, 2, 1, 2, 3]))) # filter unique elements


Reduction ops:
tensor(4.)
4.0
tensor(2.5000)
tensor(1.2910)
tensor(24.)
tensor([1, 2, 3])


In [49]:
# vector and linear algebra operations
v1 = torch.tensor([1., 0., 0.])        # x unit vector
v2 = torch.tensor([0., 1., 0.])        # y unit vector
m1 = torch.rand(2, 2)                  # random matrix
m2 = torch.tensor([[3., 0.],[0., 3.]]) # three times identity matrix

print('\nVector & Matrices:')
print(torch.cross(v1, v2))             # z unit vector (v1 x v2 == -(v2 x v1))
print(m1)
m3 = torch.tensor([[1., 2.],[3., 4.]]) # fixed example matrix
print(m3)
print(torch.svd(m3))                   # singular value decomposition


Vector & Matrices:
tensor([0., 0., 1.])
tensor([[0.9130, 0.9833],
        [0.7706, 0.2174]])
tensor([[1., 2.],
        [3., 4.]])
torch.return_types.svd(
U=tensor([[-0.4046, -0.9145],
        [-0.9145,  0.4046]]),
S=tensor([5.4650, 0.3660]),
V=tensor([[-0.5760,  0.8174],
        [-0.8174, -0.5760]]))


This is a small sample of the available operations. For more details and the full inventory of math functions, have a look at the documentation.

**Altering Tensors in Place**  
Most binary operations on tensors will return a third, new tensor. When we say `c = a * b` (where `a` and `b` are tensors), the new tensor `c` will occupy a region of memory distinct from the other tensors.

There are times, though, that you may wish to alter a tensor in place - for example, if you're doing an element-wise computation where you can discard intermediate values. For this, most of the math functions have a version with an appended underscore (`_`) that will alter a tensor in place.

For example:

In [56]:
a = torch.tensor([0, math.pi/4, math.pi/2, 3 * math.pi/4])
print('a:')
print(a)
print(torch.sin(a))    # this operation creates a new tensor in memory
print(a)               # a has not changed

b = torch.tensor([0, math.pi/4, math.pi/2, 3 * math.pi/4])
print('\nb:')
print(b)
print(torch.sin_(b))   # note the underscore
print(b)               # b has changed

a:
tensor([0.0000, 0.7854, 1.5708, 2.3562])
tensor([0.0000, 0.7071, 1.0000, 0.7071])
tensor([0.0000, 0.7854, 1.5708, 2.3562])

b:
tensor([0.0000, 0.7854, 1.5708, 2.3562])
tensor([0.0000, 0.7071, 1.0000, 0.7071])
tensor([0.0000, 0.7071, 1.0000, 0.7071])


For arithmetic operations, there are functions that behave similarly:

In [57]:
a = torch.ones(2, 2)
b = torch.rand(2, 2)

print('Before:')
print(a)
print(b)
print('\nAfter adding:')
print(a.add_(b))
print(a)
print(b)
print('\nAfter multiplying:')
print(b.mul_(b))
print(b)

Before:
tensor([[1., 1.],
        [1., 1.]])
tensor([[0.2038, 0.2881],
        [0.2677, 0.3067]])

After adding:
tensor([[1.2038, 1.2881],
        [1.2677, 1.3067]])
tensor([[1.2038, 1.2881],
        [1.2677, 1.3067]])
tensor([[0.2038, 0.2881],
        [0.2677, 0.3067]])

After multiplying:
tensor([[0.0415, 0.0830],
        [0.0717, 0.0940]])
tensor([[0.0415, 0.0830],
        [0.0717, 0.0940]])


Note that these in-place arithmetic functions are methods on the `torch.Tensor` object, not attached to the `torch` module like many other functions (e.g., `torch.sin()`). As you can see from `a.add_(b)`, the calling tensor is the one that gets changed in place.
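
One way to observe the difference between the two variants is to compare the memory address of the underlying data before and after each call. This is a minimal sketch using the tensor method `data_ptr()`; the variable names are illustrative.

```python
import torch

a = torch.ones(2, 2)
b = torch.rand(2, 2)

address_before = a.data_ptr()        # address of a's underlying data

result = a.add(b)                    # out-of-place: allocates a new tensor
print(result.data_ptr() == address_before)   # False: result lives elsewhere
print(torch.equal(a, torch.ones(2, 2)))      # True: a is untouched

a.add_(b)                            # in-place: writes into a's existing memory
print(a.data_ptr() == address_before)        # True: no new allocation for a
print(torch.equal(a, result))                # True: same values as the out-of-place result
```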

There is another option for placing the result of a computation in an existing, allocated tensor. Many of the methods and functions we've seen so far - including creation methods - have an `out` argument that lets you specify a tensor to receive the output. If the `out` tensor is the correct shape and `dtype`, this can happen without a new memory allocation:

In [60]:
a = torch.rand(2, 2)
b = torch.rand(2, 2)
c = torch.zeros(2, 2)
old_id = id(c) # record the identity of c

print('c:', c)
print('old_id:', old_id)
d = torch.matmul(a, b, out=c) # matrix multiplication, with the result stored in c
print(c)                 # c has changed

assert c is d            # test c & d are same object, not just containing equal values
assert id(c) == old_id   # make sure the new c is the same object as the old one

torch.rand(2, 2, out=c)  # works for creation too!
print(c)                 # c has changed
assert id(c) == old_id   # still the same object

c: tensor([[0., 0.],
        [0., 0.]])
old_id: 2139738519312
tensor([[0.5221, 0.4058],
        [0.5952, 0.7076]])
tensor([[0.9713, 0.3280],
        [0.5546, 0.6437]])


**Copying Tensors**  
As with any object in Python, assigning a tensor to a variable makes the variable a label of the tensor, and does not copy it. For example:

In [61]:
a = torch.ones(2, 2)
b = a

a[0][1] = 561 # change a
print(b)      # b has changed too

tensor([[  1., 561.],
        [  1.,   1.]])


But what if you want a separate copy of the data to work on? The `clone()` method is there for you:

In [62]:
a = torch.ones(2, 2)
b = a.clone() # b is a copy of a

assert b is not a     # different objects in memory
print(torch.eq(a, b)) # ...but still with the same contents!

tensor([[True, True],
        [True, True]])
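
To contrast the two behaviors side by side, the sketch below modifies the original tensor after creating both an alias and a clone. The names `alias` and `copy` are illustrative.

```python
import torch

a = torch.ones(2, 2)
alias = a            # another label for the same tensor
copy = a.clone()     # a separate tensor with its own memory

a[0][1] = 561        # change a

print(alias)         # the alias sees the change
print(copy)          # the clone keeps the original values
```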
