In [None]:
system("/usr/bin/wget https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.6.0%2Bcpu.zip -O libtorch.zip")

In [None]:
system("/usr/bin/unzip libtorch.zip")

In [1]:
#pragma cling add_include_path("./libtorch/include")
#pragma cling add_include_path("./libtorch/include/torch/csrc/api/include")
#pragma cling add_library_path("./libtorch/lib")
#pragma cling load("libtorch")

In [2]:
#include <iostream>
#include <torch/torch.h>
#include <ATen/ATen.h>
#include <torch/csrc/autograd/variable.h>
#include <torch/csrc/autograd/function.h>

# Basic Operations

In [3]:
torch::Tensor tensor = torch::eye(3);
std::cout << tensor << std::endl;

 1  0  0
 0  1  0
 0  0  1
[ CPUFloatType{3,3} ]


In [4]:
at::Tensor a = at::ones({2, 2}, at::kInt);
at::Tensor b = at::randn({2, 2});
auto c = a + b.to(at::kInt);

std::cout << "a: \n" << a << std::endl;
std::cout << std::endl;
std::cout << "b: \n" << b << std::endl;
std::cout << std::endl;
std::cout << "c: \n" << c << std::endl;

a: 
 1  1
 1  1
[ CPUIntType{2,2} ]

b: 
 0.1033  0.8785
-1.4594 -2.3636
[ CPUFloatType{2,2} ]

c: 
 1  1
 0 -1
[ CPUIntType{2,2} ]


In [5]:
std::vector<double> w = {1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.0};
auto tensor_w = at::tensor(w);
std::cout << "tensor_w: \n" << tensor_w << std::endl;

tensor_w: 
  1.1000
  2.2000
  3.3000
  4.4000
  5.5000
  6.6000
  7.7000
  8.8000
  9.9000
 10.0000
[ CPUDoubleType{10} ]


In [6]:
auto a = torch::rand({10, 3});
auto b = torch::rand({10, 3});
// expected
auto exp = torch::empty({10, 3});
for (int j = 0; j < 10; j++) {
  auto u1 = a[j][0], u2 = a[j][1], u3 = a[j][2];
  auto v1 = b[j][0], v2 = b[j][1], v3 = b[j][2];
  exp[j][0] = u2 * v3 - v2 * u3;
  exp[j][1] = v1 * u3 - u1 * v3;
  exp[j][2] = u1 * v2 - v1 * u2;
}
// actual
auto out = torch::cross(a, b);

std::cout << "croos(a,b): \n" << out << std::endl;

croos(a,b): 
 0.6706 -0.1580 -0.4023
 0.1039 -0.0234 -0.2546
-0.1587 -0.1712  0.2032
 0.1164 -0.2145  0.0571
 0.1953 -0.2828  0.0443
 0.1154 -0.0222 -0.2360
 0.3060 -0.6522  0.3887
-0.1257  0.0475  0.1047
-0.5939  0.3177 -0.0905
-0.0237  0.0163  0.1551
[ CPUFloatType{10,3} ]


In [7]:
std::vector<double> x = { 1, 2, 3, 4, 5, 6};
auto options = torch::TensorOptions().dtype(torch::kFloat64);
auto t1 = torch::tensor(x);
std::cout << "t1: \n" << t1 << std::endl;
auto t2 = torch::tensor(x, options).reshape({1, 6});
std::cout << "t2: \n" << t2 << std::endl;
auto t3 = torch::tensor(x, options).reshape({2, 3});
std::cout << "t3: \n" << t3 << std::endl;

t1: 
 1
 2
 3
 4
 5
 6
[ CPUFloatType{6} ]
t2: 
 1  2  3  4  5  6
[ CPUDoubleType{1,6} ]
t3: 
 1  2  3
 4  5  6
[ CPUDoubleType{2,3} ]


In [8]:
auto opt = torch::TensorOptions().dtype(torch::kFloat64).requires_grad(false);
auto tensor1 = torch::tensor({1, 2, 3, 4, 5}, opt).reshape({5, 1, 1});
auto tensor2 = torch::tensor({5, 4, 3, 2, 1}, opt).reshape({5, 1, 1});
std::cout << "tensor1: \n" << tensor1 << std::endl;
std::cout << "tensor2: \n" << tensor2 << std::endl;

tensor1: 
(1,.,.) = 
  1

(2,.,.) = 
  2

(3,.,.) = 
  3

(4,.,.) = 
  4

(5,.,.) = 
  5
[ CPUDoubleType{5,1,1} ]
tensor2: 
(1,.,.) = 
  5

(2,.,.) = 
  4

(3,.,.) = 
  3

(4,.,.) = 
  2

(5,.,.) = 
  1
[ CPUDoubleType{5,1,1} ]


# Autograd in C++ Frontend

The autograd package is crucial for building highly flexible and dynamic neural networks in PyTorch. Most of the autograd APIs in the PyTorch Python frontend are also available in the C++ frontend, allowing easy translation of autograd code from Python to C++.

In this tutorial we’ll look at several examples of doing autograd in the PyTorch C++ frontend. Note that this tutorial assumes that you already have a basic understanding of autograd in the Python frontend. If that’s not the case, please first read [Autograd: Automatic Differentiation](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html).

## Basic autograd operations

(Adapted from [this tutorial](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#autograd-automatic-differentiation))

Create a tensor and pass `torch::requires_grad()` to track computation with it:

In [9]:
auto x = torch::ones({2, 2}, torch::requires_grad());
std::cout << x << std::endl;

 1  1
 1  1
[ CPUFloatType{2,2} ]


Do a tensor operation:

In [10]:
auto y = x + 2;
std::cout << y << std::endl;

 3  3
 3  3
[ CPUFloatType{2,2} ]


`y` was created as a result of an operation, so it has a `grad_fn`.

In [11]:
std::cout << y.grad_fn()->name() << std::endl;

AddBackward1


Do more operations on `y`:

In [12]:
auto z = y * y * 3;
auto out = z.mean();

std::cout << z << std::endl;
std::cout << z.grad_fn()->name() << std::endl;
std::cout << out << std::endl;
std::cout << out.grad_fn()->name() << std::endl;

 27  27
 27  27
[ CPUFloatType{2,2} ]
MulBackward1
27
[ CPUFloatType{} ]
MeanBackward0


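The first cell below adds a tensor created with `torch::requires_grad()` to an ordinary one; the second shows that `.requires_grad_( ... )` changes an existing tensor’s `requires_grad` flag in-place.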

In [13]:
torch::Tensor a_tensor = torch::ones({2, 2}, torch::requires_grad());
torch::Tensor b_tensor = torch::randn({2, 2});

std::cout << a_tensor << std::endl;
std::cout << b_tensor << std::endl;

auto c_tensor = a_tensor + b_tensor;

std::cout << c_tensor << std::endl;

 1  1
 1  1
[ CPUFloatType{2,2} ]
-0.1077 -1.3659
-1.7583 -0.9559
[ CPUFloatType{2,2} ]
 0.8923 -0.3659
-0.7583  0.0441
[ CPUFloatType{2,2} ]


In [14]:
auto a = torch::randn({2, 2});
a = ((a * 3) / (a - 1));
std::cout << a.requires_grad() << std::endl;

a.requires_grad_(true);
std::cout << a.requires_grad() << std::endl;

auto b = (a * a).sum();
std::cout << b.grad_fn()->name() << std::endl;

0
1
SumBackward0


Let’s backprop now. Because `out` contains a single scalar, `out.backward()` is equivalent to `out.backward(torch::tensor(1.))`.

In [15]:
out.backward();

Print the gradients d(out)/dx:

In [16]:
std::cout << x.grad() << std::endl;

 4.5000  4.5000
 4.5000  4.5000
[ CPUFloatType{2,2} ]


You should have got a matrix of `4.5`. For explanations on how we arrive at this value, please see [the corresponding section in this tutorial](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#gradients).
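For reference, here is that derivation written out for this notebook's tensors, where $x_i$ are the four entries of `x`, $z_i$ the entries of `z`, and $o$ is `out`:

$$
\begin{aligned}
o &= \frac{1}{4}\sum_{i} z_i, \qquad z_i = 3(x_i + 2)^2, \\
\frac{\partial o}{\partial x_i} &= \frac{3}{2}(x_i + 2), \\
\left.\frac{\partial o}{\partial x_i}\right|_{x_i = 1} &= \frac{3}{2}\cdot 3 = \frac{9}{2} = 4.5 .
\end{aligned}
$$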

Now let’s look at an example of a vector-Jacobian product:

In [17]:
x = torch::randn(3, torch::requires_grad());

y = x * 2;
while (y.norm().item<double>() < 1000) {
  y = y * 2;
}

std::cout << y << std::endl;
std::cout << y.grad_fn()->name() << std::endl;

-302.7886
-1573.0356
-1041.9695
[ CPUFloatType{3} ]
MulBackward1


If we want the vector-Jacobian product, we pass the vector to `backward` as an argument:

In [18]:
auto v = torch::tensor({0.1, 1.0, 0.0001}, torch::kFloat);
y.backward(v);

std::cout << x.grad() << std::endl;

  102.4000
 1024.0000
    0.1024
[ CPUFloatType{3} ]


You can also stop autograd from tracking history on tensors that require gradients, either by placing a `torch::NoGradGuard` in a code block:

In [19]:
std::cout << x.requires_grad() << std::endl;
std::cout << x.pow(2).requires_grad() << std::endl;

{
  torch::NoGradGuard no_grad;
  std::cout << x.pow(2).requires_grad() << std::endl;
}

1
1
0


Or by using `.detach()` to get a new tensor with the same content that does not require gradients:

In [20]:
std::cout << x.requires_grad() << std::endl;
y = x.detach();
std::cout << y.requires_grad() << std::endl;
std::cout << x.eq(y).all().item<bool>() << std::endl;

1
0
1


For more information on C++ tensor autograd APIs such as `grad` / `requires_grad` / `is_leaf` / `backward` / `detach` / `detach_` / `register_hook` / `retain_grad`, please see [the corresponding C++ API docs](https://pytorch.org/cppdocs/api/classat_1_1_tensor.html).

## Computing higher-order gradients in C++

One of the applications of higher-order gradients is calculating gradient penalty. Let’s see an example of it using `torch::autograd::grad`:

In [21]:
#include <torch/torch.h>

auto model = torch::nn::Linear(4, 3);

auto input = torch::randn({3, 4}).requires_grad_(true);
auto output = model(input);

// Calculate loss
auto target = torch::randn({3, 3});
auto loss = torch::nn::MSELoss()(output, target);

// Use norm of gradients as penalty
auto grad_output = torch::ones_like(output);
auto gradient = torch::autograd::grad({output}, {input}, /*grad_outputs=*/{grad_output}, /*create_graph=*/true)[0];
auto gradient_penalty = torch::pow((gradient.norm(2, /*dim=*/1) - 1), 2).mean();

// Add gradient penalty to loss
auto combined_loss = loss + gradient_penalty;
combined_loss.backward();

std::cout << input.grad() << std::endl;

 0.0206 -0.0837  0.0977 -0.0048
 0.0206 -0.0705  0.1354 -0.0293
-0.1433 -0.0648 -0.1352 -0.0565
[ CPUFloatType{3,4} ]


Please see the documentation for `torch::autograd::backward` ([link](https://pytorch.org/cppdocs/api/function_namespacetorch_1_1autograd_1afa9b5d4329085df4b6b3d4b4be48914b.html)) and `torch::autograd::grad` ([link](https://pytorch.org/cppdocs/api/function_namespacetorch_1_1autograd_1a1e03c42b14b40c306f9eb947ef842d9c.html)) for more information on how to use them.

## Using custom autograd function in C++

(Adapted from this [tutorial](https://pytorch.org/docs/stable/notes/extending.html#extending-torch-autograd))

Adding a new elementary operation to `torch::autograd` requires implementing a new `torch::autograd::Function` subclass for each operation. `torch::autograd::Function`s are what `torch::autograd` uses to compute results and gradients and to encode the operation history. Every new function requires you to implement two methods, `forward` and `backward`; see [the `Function` documentation](https://pytorch.org/cppdocs/api/structtorch_1_1autograd_1_1_function.html) for the detailed requirements.

Below you can find code for a `Linear` function like the one in `torch::nn`:

In [22]:
#include <torch/torch.h>

using namespace torch::autograd;

// Inherit from Function
class LinearFunction : public Function<LinearFunction> {
 public:
  // Note that both forward and backward are static functions

  // bias is an optional argument
  static torch::Tensor forward(
      AutogradContext *ctx, torch::Tensor input, torch::Tensor weight, torch::Tensor bias = torch::Tensor()) {
    ctx->save_for_backward({input, weight, bias});
    auto output = input.mm(weight.t());
    if (bias.defined()) {
      output += bias.unsqueeze(0).expand_as(output);
    }
    return output;
  }

  static tensor_list backward(AutogradContext *ctx, tensor_list grad_outputs) {
    auto saved = ctx->get_saved_variables();
    auto input = saved[0];
    auto weight = saved[1];
    auto bias = saved[2];

    auto grad_output = grad_outputs[0];
    auto grad_input = grad_output.mm(weight);
    auto grad_weight = grad_output.t().mm(input);
    auto grad_bias = torch::Tensor();
    if (bias.defined()) {
      grad_bias = grad_output.sum(0);
    }

    return {grad_input, grad_weight, grad_bias};
  }
};

Then, we can use the `LinearFunction` in the following way:

In [23]:
auto x = torch::randn({2, 3}).requires_grad_();
auto weight = torch::randn({4, 3}).requires_grad_();
auto y = LinearFunction::apply(x, weight);
y.sum().backward();

std::cout << x.grad() << std::endl;
std::cout << weight.grad() << std::endl;

-0.0600  1.8425  0.0863
-0.0600  1.8425  0.0863
[ CPUFloatType{2,3} ]
 1.3493 -2.8672  0.5722
 1.3493 -2.8672  0.5722
 1.3493 -2.8672  0.5722
 1.3493 -2.8672  0.5722
[ CPUFloatType{4,3} ]


Here, we give an additional example of a function that is parametrized by non-tensor arguments:

In [24]:
#include <torch/torch.h>

using namespace torch::autograd;

class MulConstant : public Function<MulConstant> {
 public:
  static torch::Tensor forward(AutogradContext *ctx, torch::Tensor tensor, double constant) {
    // ctx is a context object that can be used to stash information
    // for backward computation
    ctx->saved_data["constant"] = constant;
    return tensor * constant;
  }

  static tensor_list backward(AutogradContext *ctx, tensor_list grad_outputs) {
    // We return as many input gradients as there were arguments.
    // Gradients of non-tensor arguments to forward must be `torch::Tensor()`.
    return {grad_outputs[0] * ctx->saved_data["constant"].toDouble(), torch::Tensor()};
  }
};

Then, we can use the `MulConstant` in the following way:

In [25]:
auto x = torch::randn({2}).requires_grad_();
auto y = MulConstant::apply(x, 5.5);
y.sum().backward();

std::cout << x.grad() << std::endl;

 5.5000
 5.5000
[ CPUFloatType{2} ]


For more information on `torch::autograd::Function`, please see [its documentation](https://pytorch.org/cppdocs/api/structtorch_1_1autograd_1_1_function.html).