Hacking the codebase


Resources

  1. https://pytorch.org/blog/a-tour-of-pytorch-internals-1/
  2. https://pytorch.org/blog/a-tour-of-pytorch-internals-2/
  3. http://blog.ezyang.com/2019/05/pytorch-internals/
  4. https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md
  5. LibTorch => https://github.com/pytorch/pytorch/blob/master/docs/libtorch.rst Note: Method 2 described there is the more viable one.

Building the Extension

setup.py ==>

  1. CMake (Link1, Link2)
  2. Defines and loads the C extension torch._C (Link)

Torch C extension (bindings)

Initialization of torch._C (Link). Notice the method list defined there and how the methods are appended to the module (Link).
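
As a rough illustration of that pattern, here is a minimal, self-contained sketch using plain CPython APIs (the names are hypothetical; the real torch._C init spreads this across several files):

#include <Python.h>

// Hypothetical module function.
static PyObject* example_hello(PyObject* self, PyObject* args) {
  Py_RETURN_NONE;
}

// The method list: each entry maps a Python-visible name to a C function.
static PyMethodDef module_methods[] = {
  {"hello", example_hello, METH_VARARGS, "example method"},
  {nullptr, nullptr, 0, nullptr}  // sentinel terminates the list
};

static struct PyModuleDef module_def = {
  PyModuleDef_HEAD_INIT, "_C", nullptr, -1, module_methods
};

// Python calls this when the extension is imported.
PyMODINIT_FUNC PyInit__C(void) {
  return PyModule_Create(&module_def);
}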

Other modules/objects from C extension:

  1. torch._C._functions
  2. torch._C._EngineBase
  3. torch._C._FunctionBase
  4. torch._C._LegacyVariableBase
  5. torch._C._CudaEventBase
  6. torch._C._CudaStreamBase
  7. torch._C.Generator
  8. "torch._C." THPStorageBaseStr // Note the ""
  9. torch._C._PtrWrapper
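
A minimal sketch of that trick (the macro value below is assumed, purely for illustration):

// Adjacent C string literals are concatenated at compile time, so after the
// macro expands this reads "torch._C.FloatStorageBase".
#define THPStorageBaseStr "FloatStorageBase"  // hypothetical value
static const char* qualified_name = "torch._C." THPStorageBaseStr;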

Implementation of torch.Tensor

Check the implementation of torch.Tensor (i.e. its __init__()):

  1. Tensor https://github.com/pytorch/pytorch/blob/e8ad167211e09b1939dcb4f462d3f03aa6a6f08a/torch/tensor.py#L20
  2. _TensorBase: note this is an object added via PyModule_AddObject (see the sketch below this list) https://github.com/pytorch/pytorch/blob/e8ad167211e09b1939dcb4f462d3f03aa6a6f08a/torch/csrc/autograd/python_variable.cpp#L588
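
Roughly what that registration looks like (a sketch following the usual CPython pattern, not the exact sources):

// Ready the statically defined type, then expose it on the torch._C module
// under the name Python sees as torch._C._TensorBase.
bool THPVariable_initModule(PyObject* module) {
  if (PyType_Ready(&THPVariableType) < 0)
    return false;
  Py_INCREF(&THPVariableType);
  PyModule_AddObject(module, "_TensorBase", (PyObject*)&THPVariableType);
  return true;
}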

Note: the torch.autograd.Variable class was used before PyTorch v0.4.0. The Variable class has since been deprecated: torch.autograd.Variable and torch.Tensor are the same now. https://pytorch.org/blog/pytorch-0_4_0-migration-guide/

Implementation of torch.Tensor operators

See the section on torch._C.VariableFunctions.add (THPVariable_add) in Edward's post.

In addition, take a look at https://github.com/pytorch/pytorch/tree/master/torch/csrc/autograd. During the build, a folder called generated is created inside torch/csrc/autograd; it contains all the Python methods associated with torch.Tensor.

  1. Import TH/TH.h link
  2. Import ATen/ATen.h link

Torch Random Number Generators

  1. https://github.com/pytorch/pytorch/blob/14ecf92d4212996937a9a1ceadd2202bd828636e/torch/csrc/Generator.cpp#L46

Autograd

https://github.com/pytorch/pytorch/blob/master/docs/source/notes/autograd.rst

Module.cpp

THPModule_initNames, THPModule_initExtension => callbacks for the Python side, used for additional initialization of Python classes.

void THPAutograd_initFunctions()

What is in copy_utils.h? Check THPInsertStorageCopyFunction.

Python Types (PyTypeObject)

  1. THPDtypeType
  2. THPDeviceType
  3. THPMemoryFormatType
  4. THPLayoutType
  5. THPGeneratorType
  6. THPWrapperType
  7. THPQSchemeType
  8. THPSizeType
  9. THPFInfoType

Note: THPWrapperType differs from THPVariableType in that THPVariableType is used for recording autograd properties on Tensors, whereas THPWrapperType is just a way to access Tensors in the distributed case. More to be clarified later. (Someone please verify.)
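
All of these follow the same CPython pattern: a C struct whose first member is PyObject_HEAD, plus a statically defined PyTypeObject. A minimal skeleton (names assumed; the real definitions fill in many more slots such as tp_methods, tp_getset, and tp_new):

#include <Python.h>

// Hypothetical THP*-style type.
struct THPExample {
  PyObject_HEAD
  int payload;  // per-instance C-side state
};

static PyTypeObject THPExampleType = {
  PyVarObject_HEAD_INIT(nullptr, 0)
  "torch._C._Example",   // tp_name, the Python-visible name
  sizeof(THPExample),    // tp_basicsize
  // remaining slots are zero-initialized in this sketch
};

// Module init then calls PyType_Ready(&THPExampleType) and exposes the
// type via PyModule_AddObject, as shown earlier.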

Tools Directory

The tools directory is the most important one if you want to hack the PyTorch codebase. A lot of the magic, i.e. code generation, happens here.

Module.cpp lists the functions to be injected (link). Note that the method lists are marked extern in the file csrc/autograd/python_variable.cpp.

Also look at the template file:

  • This line, ${py_methods}, is replaced by the generated code (see the sketch after this list).
  • The code above ${py_methods} dictates how the code is generated.
  • The code below ${py_methods} dictates the variable_methods[] list that is injected into the torch module and the _tensorImpl type.
  • Note that all the code in the tools/autograd/templates directory gets placed into the csrc/autograd/generated directory. So whenever you are not sure where some code in the csrc directory comes from, check the headers for generated; if it is there, you need to look in the tools directory.
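
A rough illustration of the substitution (the method name is hypothetical; variable_methods[] is the list named above):

// In the template, the placeholder
//   ${py_methods}
// is replaced by the code generator with concrete bindings such as:
static PyObject* THPVariable_example(PyObject* self_, PyObject* args, PyObject* kwargs);

// The code below the placeholder then lists them in the table to inject:
static PyMethodDef variable_methods[] = {
  {"example", (PyCFunction)THPVariable_example, METH_VARARGS | METH_KEYWORDS, NULL},
  {NULL}  // sentinel
};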

Torch variable methods implementation

Torch uses C++ to check the function signature (Link). This is where we add our torch function.

We generate the PyDefs in the torch/csrc/autograd/generated directory. Each generated binding checks the function signature: it builds a parser that stores the accepted signatures, dispatches to the matched function, then wraps the result and returns it.

The parser works as follows:
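
A minimal sketch of the parsing step (hypothetical signature; the PythonArgParser / ParsedArgs usage mirrors the generated bindings shown below):

// The parser is static, so the signature strings are parsed once and
// reused on every subsequent call.
static PythonArgParser parser({
  "example(Tensor input, *, ScalarType? dtype=None)",
}, /*traceable=*/true);

ParsedArgs<2> parsed_args;
// parse() matches args/kwargs against the declared signatures; r.idx says
// which overload matched, and r.tensor(0) / r.scalartypeOptional(1) give
// typed access to the parsed arguments.
auto r = parser.parse(args, kwargs, parsed_args);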

Dispatch works as follows:

Release the GIL, then call the Tensor APIs. Note: here it is not a torch.Tensor but an at::Tensor; now you are in C++ land. You can use ezyang's blog to explore C++ land. For torch_function, I am currently not bothered about C++ land, i.e. ATen, legacy functions, and generic functions.
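
A minimal sketch of a dispatch_* helper (the method is hypothetical; AutoNoGIL is the RAII guard the generated code of this era uses, stated here as an assumption):

// Release the GIL before the heavy lifting, then call the at::Tensor API.
static Tensor dispatch_example(const Tensor & self) {
  AutoNoGIL no_gil;       // GIL released for the duration of this scope
  return self.example();  // hypothetical at::Tensor method; C++ land from here
}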

Difference between torch functions and variable methods

Torch function

static PyObject * THPVariable_mean(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  std::cout << "hello world! from function mean" << std::endl;  // debug print added while hacking
  // Accepted Python signatures; r.idx below tells which overload matched.
  static PythonArgParser parser({
    "mean(Tensor input, *, ScalarType? dtype=None)",
    "mean(Tensor input, IntArrayRef[1] dim, bool keepdim=False, *, ScalarType? dtype=None, Tensor out=None)",
  }, /*traceable=*/true);

  ParsedArgs<5> parsed_args;
  auto r = parser.parse(args, kwargs, parsed_args);

  if (r.idx == 0) {
    return wrap(dispatch_mean(r.tensor(0), r.scalartypeOptional(1)));
  } else if (r.idx == 1) {
    if (r.isNone(4)) {
      // No out= tensor supplied.
      return wrap(dispatch_mean(r.tensor(0), r.intlist(1), r.toBool(2), r.scalartypeOptional(3)));
    } else {
      return wrap(dispatch_mean(r.tensor(0), r.intlist(1), r.toBool(2), r.scalartypeOptional(3), r.tensor(4)));
    }
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}

Variable Methods

static PyObject * THPVariable_mean(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  std::cout << "hello world! from mean" << std::endl;  // debug print added while hacking
  // Note: no "Tensor input" in the signatures; self comes from self_ below.
  static PythonArgParser parser({
    "mean(*, ScalarType? dtype=None)",
    "mean(IntArrayRef[1] dim, bool keepdim=False, *, ScalarType? dtype=None)",
  }, /*traceable=*/true);
  auto& self = reinterpret_cast<THPVariable*>(self_)->cdata;  // unwrap the Python object into the underlying Variable
  ParsedArgs<4> parsed_args;
  auto r = parser.parse(args, kwargs, parsed_args);

  if (r.idx == 0) {
    return wrap(dispatch_mean(self, r.scalartypeOptional(0)));
  } else if (r.idx == 1) {
    return wrap(dispatch_mean(self, r.intlist(0), r.toBool(1), r.scalartypeOptional(2)));
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}

In short, the difference: the torch function version takes the Tensor as an explicit input argument in its signatures, while the variable method version extracts self from the Python object it is called on.

NOTE:

The parsing code (Link) is not hit when the same function is called again. I believe the signatures are already stored, since the PythonArgParser is declared static.