
[FEA] Tensor contractions #23

Closed
DavidAce opened this issue Nov 8, 2021 · 12 comments
Labels
enhancement New feature or request

Comments

@DavidAce

DavidAce commented Nov 8, 2021

Great work so far on MatX!

I wonder if tensor contractions (aka tensordot or einsum) are in the roadmap for MatX. Until now this has existed in cuTENSOR but it is quite verbose, so it would be great to write tensor contractions using MatX high-level syntax.

@cliffburdick
Collaborator

Hi @DavidAce, are you only looking for the contraction feature, or everything that einsum can do?

@DavidAce
Author

DavidAce commented Nov 8, 2021

I was mostly thinking of contractions, as in numpy's tensordot or Eigen::Tensor's contract, since, as far as I can tell, einsum operations can be composed from contractions plus index permutations, where the latter is already supported. However, in a sense einsum is more "high-level", so that could be a better fit for this library.
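The point that an einsum can be composed from a pairwise contraction plus an index permutation can be sketched in NumPy (the subscripts and shapes here are arbitrary illustrations, not from the issue):

```python
import numpy as np

# Sketch: an einsum such as "ijk,kl->lij" decomposes into a pairwise
# contraction (tensordot) followed by an index permutation (transpose).
A = np.random.rand(2, 3, 4)
B = np.random.rand(4, 5)

# Contract A's last index with B's first index: result has axes (i, j, l)
C = np.tensordot(A, B, axes=([2], [0]))

# Permute to (l, i, j) to match the einsum output order
C = C.transpose(2, 0, 1)

# The same operation expressed directly as einsum
D = np.einsum("ijk,kl->lij", A, B)
assert np.allclose(C, D)
```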

@cliffburdick
Collaborator

Hi David, I think einsum might be a little overkill at this point, given its flexibility and overlap with other functions such as permute. However, adding contractions for the general case is probably a useful feature. The limitations would be whatever cuTENSOR's limitations are (strides, data types, etc.), so we'll investigate these and get back to you.

@cliffburdick
Collaborator

Hi @DavidAce, I remember now that the reason we didn't include cuTENSOR was simply that it isn't bundled with CUDA and would create an extra download for users. We'll discuss the options for this internally.

@oscarbg

oscarbg commented Nov 9, 2021

+1

@cliffburdick
Collaborator

@DavidAce @oscarbg can you describe some scenarios you'd use this feature in? Things like the ranks, sizes, and dimensions you want to contract over. Are you also looking for a list of tensors to be contracted, or just two?

@DavidAce
Author

DavidAce commented Nov 20, 2021

@cliffburdick Sure! I use tensor contractions to implement algorithms based on matrix product states (MPS) for studying quantum many-body systems in 1D.

Typically my contractions involve 2 to 5 dense tensors of ranks 1 to 8, of type double or std::complex<double> depending on the model. Any of these tensor networks can be contracted two tensors at a time, so it would be sufficient to support an operation in the style of this cuTENSOR example:

[image: mps-cutensor — cuTENSOR contraction example]

Syntactically, I find that Eigen's approach is nice, with chained contractions using the dot-operator and lazy-evaluation. E.g.

A = B.contract(C, ...).contract(D, ...).contract(E, ...) 

where ... are the corresponding indices to contract.

As the number of tensors grows, it becomes non-trivial to determine the optimal order of contractions. There are several algorithms for finding good orderings. I would not expect order optimization to be part of a contraction library, and indeed all libraries I'm aware of leave this as an exercise for the user. Still, I suspect fellow practitioners would consider it "nice to have", in particular people studying 2D systems (so-called PEPS).

At the moment, the contractions I deal with are fairly simple. For instance, the following contraction takes most of the time in my simulations (in tensor diagram notation) :

[image: mps-contraction — tensor diagram of the contraction]

This tensor contraction expresses the matrix-vector product in an iterative eigenvalue solver, and runs millions of times during a simulation. I would normally contract in the order

result = ψ.contract(L, ...).contract(M, ...).contract(R, ...) 

Depending on the model, the indices can have the following dimensions:

  • p and s: from 1 to 2048 (possibly more with GPU acceleration)
  • q and t: from 4 to 25
  • r and i: from 2 to 64
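For illustration, here is a minimal NumPy sketch of such a pairwise-contraction chain. The index wiring (which legs of ψ, L, M, and R connect) and the small stand-in dimensions are assumptions for the sketch; the actual network in the diagram may differ:

```python
import numpy as np

# Hypothetical sketch of the MPS matrix-vector product above, contracted
# two tensors at a time in the order psi -> L -> M -> R.
p, q, r, s = 8, 4, 2, 8          # small stand-ins for the dimensions listed
psi = np.random.rand(p, r, s)    # (left bond, physical, right bond)
L = np.random.rand(p, q, p)      # left environment block
M = np.random.rand(q, q, r, r)   # MPO tensor
R = np.random.rand(s, q, s)      # right environment block

# Pairwise contractions, analogous to psi.contract(L,...).contract(M,...).contract(R,...)
t1 = np.tensordot(psi, L, axes=([0], [0]))       # axes (r, s, q, p')
t2 = np.tensordot(t1, M, axes=([0, 2], [2, 0]))  # axes (s, p', t, r')
out = np.tensordot(t2, R, axes=([0, 2], [0, 1])) # axes (p', r', s')

# The same product as a single einsum, as a cross-check
ref = np.einsum("prs,pqa,qtrb,stc->abc", psi, L, M, R)
assert np.allclose(out, ref)
```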

A while ago I considered GPU acceleration using cuTENSOR on an RTX 2080 Ti, but decided against it despite promising performance: mostly because I had access to far more CPU power than GPU power, but also because the contraction above resulted in quite a lot of code. I felt it would be hard to maintain, prone to human error, and a lot of effort to port.

However, more GPUs have become available since (also with native fp64 support), so a high-level library to handle contractions on GPU would definitely make things interesting again by lowering the barrier. For instance, it would be cool to detect if an HPC node has a GPU accelerator available at runtime and use it.

@cliffburdick cliffburdick added the enhancement New feature or request label Nov 24, 2021
@leofang
Member

leofang commented Nov 25, 2021

Hi @DavidAce, in case you don't know already, we are working on a new cuTensorNet library (part of the cuQuantum SDK), planned for release this December, that might be exactly what you need. There is a recent GTC talk on the cuQuantum SDK (not sure whether login/registration is required).

cuTensorNet allows you to create an arbitrary tensor network (be it MPS, PEPS, MERA, or whatever) with the network topology specified by pairwise contractions among the tensors, for which it can

  1. search for the optimal contraction order using various algorithms we implemented (we call it a pathfinder)
  2. perform the actual contraction based on the given path (either from the pathfinder or supplied by users)

Roughly speaking, cuTensorNet's capabilities map nicely to NumPy's 1. einsum_path() and 2. einsum(). A lot of effort has been invested in performance optimization (of both steps) to fully utilize the compute power of CPU and GPU. Both C and Python APIs will be provided for users' convenience.
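For reference, NumPy's own split between path search and execution looks like this (a small illustrative example of the einsum_path/einsum pattern, not cuTensorNet's API):

```python
import numpy as np

# Step 1 (pathfinder analogue): search for a good pairwise contraction
# order. Step 2: execute the contraction along that precomputed path.
a = np.random.rand(16, 8)
b = np.random.rand(8, 32)
c = np.random.rand(32, 4)

# Find a contraction path with the 'greedy' strategy
path, info = np.einsum_path("ij,jk,kl->il", a, b, c, optimize="greedy")

# Contract along the precomputed path (reusable across many calls)
out = np.einsum("ij,jk,kl->il", a, b, c, optimize=path)
assert out.shape == (16, 4)
```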

Now, with regard to the nice introduction you wrote above, I have two questions:

  • For structured TNs, especially MPS for 1D systems, isn't the best contraction order already known? Practically how performant are the greedy/annealing algorithms discussed in the Schindler & Jermyn paper for contracting MPS/PEPS in your experience (compared to the paths we learned in introductory many-body physics)?
  • In the Schrödinger equation example you discussed, why are the p and s bonds so much larger than the other bonds? If you have a large quantum system with many sites, single-GPU tensor contraction would not be feasible anyway. Is this a particularly pathological case (say, close to critical points)? Would you be able to point us to such examples so that we can try and evaluate them internally?

@leofang
Member

leofang commented Dec 21, 2021

Update: cuTensorNet is out, available with both C and Python APIs. See https://docs.nvidia.com/cuda/cuquantum/index.html.

@cliffburdick
Collaborator

@DavidAce and @oscarbg, can you please evaluate the cuTensorNet library above? If it suits your needs and you'd like us to consider using it in MatX, let us know.

@cliffburdick
Collaborator

cliffburdick commented Jan 19, 2022

Hi @DavidAce and @oscarbg, we're happy to report that we've added support for contractions via an einsum function thanks to the help of @leofang. This change uses both cuTENSOR and cuTensorNet as the backends. The documentation can be found here: https://nvidia.github.io/MatX/api/einsum.html

To be clear, this is a subset of NumPy's einsum at this point, but we do support contractions of any rank/mode, and any number of tensors can be contracted. If you have a specific size/shape/number you want to try, please let us know, or just play around with the library.

Using your example above, a 3-way contraction would be something like:

matx::cutensor::einsum(out, "ijk,ijl,lmn->kmn", stream, a, b, c);
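For a cross-check of what that subscript expression computes, here is the same contraction written with NumPy's einsum (illustrative sizes, not part of MatX):

```python
import numpy as np

# out[k,m,n] = sum over i, j, l of a[i,j,k] * b[i,j,l] * c[l,m,n]
i, j, k, l, m, n = 3, 4, 5, 6, 2, 7
a = np.random.rand(i, j, k)
b = np.random.rand(i, j, l)
c = np.random.rand(l, m, n)

out = np.einsum("ijk,ijl,lmn->kmn", a, b, c)
assert out.shape == (k, m, n)
```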

@cliffburdick
Collaborator

Closing this one for now. Please open a new issue if you find problems.
