Hacking the codebase


Resources

  1. https://pytorch.org/blog/a-tour-of-pytorch-internals-1/
  2. https://pytorch.org/blog/a-tour-of-pytorch-internals-2/
  3. http://blog.ezyang.com/2019/05/pytorch-internals/
  4. https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md
  5. LibTorch => https://github.com/pytorch/pytorch/blob/master/docs/libtorch.rst Note: Method 2 described there is the more viable one.

Building the Extension

setup.py ==>

  1. CMake (Link1, Link2)
  2. Defines and loads the C extension torch._C (Link)

Torch C extension (bindings)

Initialization of torch._C (Link). Notice the method list defined there and how the methods are appended to the module (Link).
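
As a rough illustration of that pattern, here is a minimal, self-contained sketch using plain CPython APIs (the names are hypothetical; the real torch._C init spreads this across several files):

#include <Python.h>

// Hypothetical module function.
static PyObject* example_hello(PyObject* self, PyObject* args) {
  Py_RETURN_NONE;
}

// The method list: each entry maps a Python-visible name to a C function.
static PyMethodDef module_methods[] = {
  {"hello", example_hello, METH_VARARGS, "example method"},
  {nullptr, nullptr, 0, nullptr}  // sentinel terminates the list
};

static struct PyModuleDef module_def = {
  PyModuleDef_HEAD_INIT, "_C", nullptr, -1, module_methods
};

// Python calls this when the extension is imported.
PyMODINIT_FUNC PyInit__C(void) {
  return PyModule_Create(&module_def);
}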

Other modules/objects from C extension:

  1. torch._C._functions
  2. torch._C._EngineBase
  3. torch._C._FunctionBase
  4. torch._C._LegacyVariableBase
  5. torch._C._CudaEventBase
  6. torch._C._CudaStreamBase
  7. torch._C.Generator
  8. "torch._C." THPStorageBaseStr // Note the ""
  9. torch._C._PtrWrapper
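
A minimal sketch of that trick (the macro value below is assumed, purely for illustration):

// Adjacent C string literals are concatenated at compile time, so after the
// macro expands this reads "torch._C.FloatStorageBase".
#define THPStorageBaseStr "FloatStorageBase"  // hypothetical value
static const char* qualified_name = "torch._C." THPStorageBaseStr;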

Implementation of torch.Tensor

Check the implementation of torch.Tensor (i.e. its __init__()):

  1. Tensor https://github.com/pytorch/pytorch/blob/e8ad167211e09b1939dcb4f462d3f03aa6a6f08a/torch/tensor.py#L20
  2. _TensorBase: note this is an object added via PyModule_AddObject (see the sketch below this list) https://github.com/pytorch/pytorch/blob/e8ad167211e09b1939dcb4f462d3f03aa6a6f08a/torch/csrc/autograd/python_variable.cpp#L588
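
Roughly what that registration looks like (a sketch following the usual CPython pattern, not the exact sources):

// Ready the statically defined type, then expose it on the torch._C module
// under the name Python sees as torch._C._TensorBase.
bool THPVariable_initModule(PyObject* module) {
  if (PyType_Ready(&THPVariableType) < 0)
    return false;
  Py_INCREF(&THPVariableType);
  PyModule_AddObject(module, "_TensorBase", (PyObject*)&THPVariableType);
  return true;
}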

Note: the torch.autograd.Variable class was used before PyTorch v0.4.0. The Variable class has since been deprecated: torch.autograd.Variable and torch.Tensor are the same now. https://pytorch.org/blog/pytorch-0_4_0-migration-guide/

Implementation of torch.Tensor operators

See the section on torch._C.VariableFunctions.add (THPVariable_add) in Edward's post.

In addition, take a look at https://github.com/pytorch/pytorch/tree/master/torch/csrc/autograd. During the build, a folder called generated is created inside torch/csrc/autograd; it contains all the Python methods associated with torch.Tensor.

  1. Import TH/TH.h link
  2. Import ATen/ATen.h link

Torch Random Number Generators

  1. https://github.com/pytorch/pytorch/blob/14ecf92d4212996937a9a1ceadd2202bd828636e/torch/csrc/Generator.cpp#L46

Autograd

https://github.com/pytorch/pytorch/blob/master/docs/source/notes/autograd.rst

Module.cpp

THPModule_initNames, THPModule_initExtension => callbacks for the Python side, used for additional initialization of Python classes.

void THPAutograd_initFunctions()

What is in copy_utils.h? Check THPInsertStorageCopyFunction.

Python Types (PyTypeObject)

  1. THPDtypeType
  2. THPDeviceType
  3. THPMemoryFormatType
  4. THPLayoutType
  5. THPGeneratorType
  6. THPWrapperType
  7. THPQSchemeType
  8. THPSizeType
  9. THPFInfoType

Note: THPWrapperType differs from THPVariableType in that THPVariableType is used for recording autograd properties on Tensors, whereas THPWrapperType is just a way to access Tensors in the distributed case. More to be clarified later. (Someone please verify.)
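
All of these follow the same CPython pattern: a C struct whose first member is PyObject_HEAD, plus a statically defined PyTypeObject. A minimal skeleton (names assumed; the real definitions fill in many more slots such as tp_methods, tp_getset, and tp_new):

#include <Python.h>

// Hypothetical THP*-style type.
struct THPExample {
  PyObject_HEAD
  int payload;  // per-instance C-side state
};

static PyTypeObject THPExampleType = {
  PyVarObject_HEAD_INIT(nullptr, 0)
  "torch._C._Example",   // tp_name, the Python-visible name
  sizeof(THPExample),    // tp_basicsize
  // remaining slots are zero-initialized in this sketch
};

// Module init then calls PyType_Ready(&THPExampleType) and exposes the
// type via PyModule_AddObject, as shown earlier.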

Tools Directory

The tools directory is the most important one if you want to hack the PyTorch codebase. A lot of the magic, i.e. code generation, happens here.

Module.cpp lists the functions to be injected (link). Note that the method lists are marked extern in the file csrc/autograd/python_variable.cpp.

Also look at the template file:

  • This line, ${py_methods}, is replaced by the generated code (see the sketch after this list).
  • The code above ${py_methods} dictates how the code is generated.
  • The code below ${py_methods} dictates the variable_methods[] list that is injected into the torch module and the _tensorImpl type.
  • Note that all the code in the tools/autograd/templates directory gets placed into the csrc/autograd/generated directory. So whenever you are not sure where some code in the csrc directory comes from, check the headers for generated; if it is there, you need to look in the tools directory.
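
A rough illustration of the substitution (the method name is hypothetical; variable_methods[] is the list named above):

// In the template, the placeholder
//   ${py_methods}
// is replaced by the code generator with concrete bindings such as:
static PyObject* THPVariable_example(PyObject* self_, PyObject* args, PyObject* kwargs);

// The code below the placeholder then lists them in the table to inject:
static PyMethodDef variable_methods[] = {
  {"example", (PyCFunction)THPVariable_example, METH_VARARGS | METH_KEYWORDS, NULL},
  {NULL}  // sentinel
};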

Torch variable methods implementation

Torch uses C++ to check the function signature (Link). This is where we add our torch function.

We generate the PyDefs in the torch/csrc/autograd/generated directory. Each generated binding checks the function signature: it builds a parser that stores the accepted signatures, dispatches to the matched function, then wraps the result and returns it.

The parser works as follows:
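
A minimal sketch of the parsing step (hypothetical signature; the PythonArgParser / ParsedArgs usage mirrors the generated bindings shown below):

// The parser is static, so the signature strings are parsed once and
// reused on every subsequent call.
static PythonArgParser parser({
  "example(Tensor input, *, ScalarType? dtype=None)",
}, /*traceable=*/true);

ParsedArgs<2> parsed_args;
// parse() matches args/kwargs against the declared signatures; r.idx says
// which overload matched, and r.tensor(0) / r.scalartypeOptional(1) give
// typed access to the parsed arguments.
auto r = parser.parse(args, kwargs, parsed_args);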

Dispatch works as follows:

Release the GIL, then call the Tensor APIs. Note: here it is not a torch.Tensor but an at::Tensor; now you are in C++ land. You can use ezyang's blog to explore C++ land. For torch_function, I am currently not bothered about C++ land, i.e. ATen, legacy functions, and generic functions.
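
A minimal sketch of a dispatch_* helper (the method is hypothetical; AutoNoGIL is the RAII guard the generated code of this era uses, stated here as an assumption):

// Release the GIL before the heavy lifting, then call the at::Tensor API.
static Tensor dispatch_example(const Tensor & self) {
  AutoNoGIL no_gil;       // GIL released for the duration of this scope
  return self.example();  // hypothetical at::Tensor method; C++ land from here
}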

Difference between torch functions and variable methods

Torch function

static PyObject * THPVariable_mean(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  std::cout << "hello world! from function mean" << std::endl;  // debug print added while hacking
  // Accepted Python signatures; r.idx below tells which overload matched.
  static PythonArgParser parser({
    "mean(Tensor input, *, ScalarType? dtype=None)",
    "mean(Tensor input, IntArrayRef[1] dim, bool keepdim=False, *, ScalarType? dtype=None, Tensor out=None)",
  }, /*traceable=*/true);

  ParsedArgs<5> parsed_args;
  auto r = parser.parse(args, kwargs, parsed_args);

  if (r.idx == 0) {
    return wrap(dispatch_mean(r.tensor(0), r.scalartypeOptional(1)));
  } else if (r.idx == 1) {
    if (r.isNone(4)) {
      // No out= tensor supplied.
      return wrap(dispatch_mean(r.tensor(0), r.intlist(1), r.toBool(2), r.scalartypeOptional(3)));
    } else {
      return wrap(dispatch_mean(r.tensor(0), r.intlist(1), r.toBool(2), r.scalartypeOptional(3), r.tensor(4)));
    }
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}

Variable Methods

static PyObject * THPVariable_mean(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  std::cout << "hello world! from mean" << std::endl;  // debug print added while hacking
  // Note: no "Tensor input" in the signatures; self comes from self_ below.
  static PythonArgParser parser({
    "mean(*, ScalarType? dtype=None)",
    "mean(IntArrayRef[1] dim, bool keepdim=False, *, ScalarType? dtype=None)",
  }, /*traceable=*/true);
  auto& self = reinterpret_cast<THPVariable*>(self_)->cdata;  // unwrap the Python object into the underlying Variable
  ParsedArgs<4> parsed_args;
  auto r = parser.parse(args, kwargs, parsed_args);

  if (r.idx == 0) {
    return wrap(dispatch_mean(self, r.scalartypeOptional(0)));
  } else if (r.idx == 1) {
    return wrap(dispatch_mean(self, r.intlist(0), r.toBool(1), r.scalartypeOptional(2)));
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}

In short, the difference: the torch function version takes the Tensor as an explicit input argument in its signatures, while the variable method version extracts self from the Python object it is called on.

NOTE:

The parsing code (Link) is not hit when the same function is called again. I believe the signatures are already stored, since the PythonArgParser is declared static.