## PyTorch 1.0 tutorials
_Assumes basic familiarity with PyTorch 0.3; based on https://pytorch.org/tutorials/_

### 1. Set up the CUDA or non-CUDA version

We use Anaconda3 and conda to set up PyTorch.  
Downloading PyTorch from pytorch.org is very slow, so we use the mirror at https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/linux-64/ instead.  
Select and download the matching version from that URL, then install it with `conda install`.

### 2. What is PyTorch?

#### You can use standard NumPy-like indexing with all bells and whistles!

In [1]:
import torch

test = torch.empty(3, 2, dtype=torch.long)  # uninitialized memory: the values are arbitrary
print(test)
print(test[:, 0])  # column 0: the first entry along the second dimension
print(test[0, :])  # row 0: the first entry along the first dimension

tensor([[    140143715670560,                  24],
        [    140144782868479, 8171062582517395298],
        [8243662592152856949, 7310305785198503009]])
tensor([    140143715670560,     140144782868479, 8243662592152856949])
tensor([140143715670560,              24])
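
Because `torch.empty` returns uninitialized memory, the values printed above are arbitrary and will differ between runs. A minimal sketch of the same indexing on a tensor with known contents makes the result easier to check. The use of `torch.arange` here is an assumption for illustration and is not part of the original cell.

```python
import torch

# 3x2 tensor with known values: [[0, 1], [2, 3], [4, 5]]
t = torch.arange(6).reshape(3, 2)

first_col = t[:, 0]  # all rows, column 0
first_row = t[0, :]  # row 0, all columns

print(first_col)  # tensor([0, 2, 4])
print(first_row)  # tensor([0, 1])
```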


#### If you have a one element tensor, use .item() to get the value as a Python number

In [2]:
x = torch.randn(1)
print(x)
print(x.item())

tensor([-1.1207])
-1.120747447013855


#### Tensors can be moved onto any device using the .to method.

In [4]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    y = torch.ones_like(x, device=device)  # create a tensor directly on the GPU
    x = x.to(device)

    z = x + y
    print(z)
    print(z.to("cpu", torch.double))  # .to can also change the dtype in the same call

tensor([-0.1207], device='cuda:0')
tensor([-0.1207], dtype=torch.float64)
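
The cell above prints nothing on a machine without CUDA. A device-agnostic variant, sketched here under the assumption that falling back to the CPU is acceptable, runs the same steps everywhere. The input value `0.5` is chosen only to make the result predictable.

```python
import torch

# Fall back to the CPU when no CUDA device is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.tensor([0.5])
y = torch.ones_like(x, device=device)  # created directly on `device`
z = x.to(device) + y                   # both operands now live on `device`

z_cpu = z.to("cpu", torch.double)      # move back and change dtype in one call
print(z_cpu)  # tensor([1.5000], dtype=torch.float64)
```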


### 3. Automatic differentiation

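Unlike PyTorch 0.3, there is no need for a separate `Variable` wrapper: tensors track gradients through autograd directly.  
`.backward()`: backpropagates from this result through the graph.  
`.grad`: holds the accumulated gradient of a leaf tensor.  
Calling `a.backward()` and then reading `b.grad` gives $\frac{d(a)}{d(b)}$.  
Use `.grad_fn` to inspect the computation graph.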

In [78]:
# dataset
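x = torch.Tensor([1, 2])  # 1-D vector
# unsqueeze_ turns it into a 2-D row vector of shape (1, 2)
x.unsqueeze_(0)
x.resize_(2, 2)  # grow to 2x2 in place; the new second row is uninitialized
x[1, :] = 1      # fill the second row, so x = [[1, 2], [1, 1]]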

# init parameter
w = torch.zeros(2,1).t_() # 2-D row vector of shape (1, 2)
w.requires_grad_(True) # gradients are stored only for leaf tensors that require grad; the parameters of an NN are leaves

for i in range(2):
    items = x[:, i]
    items.unsqueeze_(1)
    result = w.mm(items)
    
    result.backward() # result = w.mm(items), so \frac{d(result)}{d(w)} = items^T
    print(w.grad) # gradients accumulate: w.grad = previous grad + new grad = [1,1] + [2,1]; zero w.grad between steps to see each gradient on its own

tensor([[1., 1.]])
tensor([[3., 2.]])
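
The second line of output is the sum of both gradients, because `backward()` accumulates into `.grad`. A minimal sketch of the same loop with the gradient cleared before each step shows the per-step values. It builds `x` directly with `torch.tensor` and keeps the results in a list named `grads`; both are choices made for this illustration.

```python
import torch

x = torch.tensor([[1.0, 2.0],
                  [1.0, 1.0]])             # same data as the cell above
w = torch.zeros(1, 2, requires_grad=True)  # leaf tensor, shape (1, 2)

grads = []
for i in range(2):
    if w.grad is not None:
        w.grad.zero_()                     # discard the gradient from the previous step
    item = x[:, i].unsqueeze(1)            # column i as a (2, 1) matrix
    result = w.mm(item)                    # shape (1, 1)
    result.backward()                      # d(result)/d(w) = item transposed
    grads.append(w.grad.clone())
    print(w.grad)

# prints tensor([[1., 1.]]) and then tensor([[2., 1.]])
```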


### 3. Neural network

You can use any of the "Tensor operations" in the forward function.
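
As a rough illustration of that point, the sketch below mixes a layer call with plain tensor operations inside `forward`. The class name `TinyNet` and the layer sizes are hypothetical and chosen only for this example.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        x = x.view(x.size(0), -1)   # plain tensor op: flatten to (batch, 4)
        x = torch.relu(self.fc(x))  # layer call followed by a tensor function
        return x * 2                # ordinary arithmetic is tracked by autograd too

net = TinyNet()
out = net(torch.randn(3, 2, 2))
print(out.shape)  # torch.Size([3, 2])
```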

#### Training on GPU

In [81]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
# transfer the network, the inputs and the targets to the GPU
net.to(device)
inputs, labels = inputs.to(device), labels.to(device)

cuda:0
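
The cell above assumes `net`, `inputs` and `labels` already exist. A self-contained sketch of one training step on the selected device might look like the following. The model, the random batch, the loss and the optimizer are all stand-ins picked for illustration, not the ones used in the tutorial.

```python
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

net = nn.Linear(10, 3).to(device)              # stand-in model
inputs = torch.randn(8, 10).to(device)         # stand-in batch of 8 samples
labels = torch.randint(0, 3, (8,)).to(device)  # stand-in class labels

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.1)

optimizer.zero_grad()              # clear accumulated gradients
outputs = net(inputs)              # forward pass runs on `device`
loss = criterion(outputs, labels)
loss.backward()                    # gradients are computed on `device`
optimizer.step()
print(loss.item())
```

The model and every tensor it consumes must be on the same device; mixing CPU and GPU tensors in one operation raises an error.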


#### Training on multiple GPUs

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
  print("Let's use", torch.cuda.device_count(), "GPUs!")
  # the batch is split along dim 0: [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
  model = nn.DataParallel(model) # replicates the model on each GPU and splits each input batch among them

model.to(device)
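
The cell above leaves `Model`, `input_size` and `output_size` undefined. The sketch below fills them in with assumed values (a single linear layer, sizes 5 and 2) so the batch splitting can be observed. On a machine with zero or one GPU the `DataParallel` branch is skipped and `forward` sees the whole batch.

```python
import torch
import torch.nn as nn

class Model(nn.Module):  # assumed definition: one linear layer
    def __init__(self, input_size, output_size):
        super().__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, x):
        # With DataParallel, each replica sees only its slice of the batch.
        print("  in forward, batch slice:", tuple(x.size()))
        return self.fc(x)

input_size, output_size = 5, 2  # assumed sizes
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # splits dim 0 of the input across GPUs
model.to(device)

batch = torch.randn(30, input_size).to(device)
output = model(batch)               # per-GPU results are gathered back together
print("outside, full batch:", tuple(output.size()))  # (30, 2)
```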

### 4. Data loading and processing