[TVM] Automatic differentiation for tensor expressions #2498

sgrechanik-h · 2019-01-23T13:56:52Z

This PR adds automatic differentiation for tensor expressions (#1996). The implementation can be
naturally divided into two parts:

The automatic differentiation itself (autodiff.cc, autodiff.h, autodiff.py, test_pass_autodiff.py). It contains an implementation of reverse-mode automatic differentiation (the function Differentiate) and standard differentiation rules (the class JacobianMutator).
Transformations for optimizing the code generated by the autodiff (zero_elimination.cc, zero_elimination.h, test_pass_zero_elimination.py). This part is called zero elimination because its goal is to eliminate generation of zero values in intermediate computations by fusing tensors and transforming domains of iteration. Its central part is Fourier-Motzkin elimination, currently we use our own implementation (the function SolveSystemOfInequalities), however in the future we might want to consider using some libraries for manipulating polyhedra (which may be useful for other parts of tvm as well).

There are also several helpers: functions for printing out tensors recursively in a human-readable form (dump_tensor.cc) and functions for applying transformations to compute bodies (in op_util.cc).

Testing of the automatic differentiation is done in two ways:

The correctness is tested by comparing to numerically computed gradients.
We try to catch performance regressions by statically estimating some metrics (the number of operations, iterations, etc). An exception is thrown if some metric is more than 1.5x worse than the reference value.

Currently there are several thing missing from this PR:

There is no solver for systems of linear integer equations. Instead, some ad-hoc transformations are used which don't work in many cases and are very sensitive to things like the order of variables. For this reason gradients for some operations like dilated and strided convolutions are not as performant as they could be. This is the main deficiency which should be fixed in subsequent PRs.
No tutorials and documentation, I'm planning to add them in a separate PR.
Some operations (e.g. scan) are not supported.
No integration with Relay and NNVM, we plan to do it in the future.

masahi · 2019-01-24T01:00:10Z

This looks super cool :) just wondering what is the plan to schedule the generated gradient expression (for CPU and GPU) ? Is it intended to be used with auto scheduler @merrymercy is working on?

sgrechanik-h · 2019-01-24T09:09:28Z

@masahi Yes, it is supposed to be used with an autoscheduler. Scheduling the generated code manually is possible, but I don't think it's practical.

kaitingwang · 2019-01-24T19:39:40Z

Would using a linear equation solver (e.g. IBM CPLEX, that's proprietary, or open source GLPK) help ? Another question, will control flow be supported? e.g. https://arxiv.org/abs/1810.07951

Ravenwater · 2019-01-24T21:04:59Z

@kaitingwang I would advise against GLPK as it wouldn't be able to take advantage of different arithmetics. I would be looking for a templated C++ code that provides the 'specific' algorithm you are going to select for the scheduling. No need to pull in a general linear programming library and weight down this part of TVM.

reminisce · 2019-01-25T06:15:54Z

python/tvm/autodiff.py

+        adjoints and populate the corresponding dict, but the list of results
+        will be empty).
+
+    head : Tensor


Can this be understood as the gradient of the output tensor w.r.t. the final output of a math expression? Could you explain why its shape is output.shape+output.shape when None is passed? My understanding is that it should default to ones(output.shape). Thanks.

Please disregard my question above. After reading your implementation of DiffBuildingBlock, I think I got the meaning of defaulting head to shape output.shape+output.shape.

sgrechanik-h · 2019-01-25T15:59:37Z

Would using a linear equation solver (e.g. IBM CPLEX, that's proprietary, or open source GLPK) help ?

Using some third party libraries for doing well-known stuff like solving systems of linear integer equations is a good idea, however we should think through what particular libraries we should use in tvm.

Another question, will control flow be supported? e.g. https://arxiv.org/abs/1810.07951

No, differentiating through control flow structures is done separately on the Relay level.

reminisce · 2019-01-25T18:34:12Z

@sgrechanik-h I have a question regarding the performance of auto diff, specifically about DiffBuildingBlock.

Taking the final loss as a scalar value for example, suppose we have an input tensor of shape (i1, i2, ..., i_m), an output tensor of shape (j1, j2, ..., j_n), the adjoint of the output with shape (j1, j2, ..., j_n), and the Jacobian of the output w.r.t. the input is (j1, j2, ..., j_n, i1, i2, ..., i_m). In DiffBuildingBlock, it performs a tensor dot operation on the head and Jacobian to get the adjoint of the input. While this is a generic way of realizing auto diff, a tensor dot operation can actually be avoided for many operators whose forward algorithms are known and Jacobian is sparse.

Maybe you have taken the situation like this into the following optimization passes. Just want ask

If you have compared the performance of autodiff with the gradient kernels manually devised in MXNet/Pytorch/TF?
Any guideline on when should users provide their own fdiff functions to tvm.autodiff.differentiate?

sgrechanik-h · 2019-01-28T11:29:37Z

@reminisce Yes, DiffBuildingBlock performs several transformations to fuse the tensor dot product with the Jacobian computation.

If you have compared the performance of autodiff with the gradient kernels manually devised in MXNet/Pytorch/TF?

I haven't yet. The main problem of such a comparison is that we need to schedule the generated gradients somehow. So far I've only tried to compare the overall performance of a couple of simple neural networks with Keras/TF on CPU. I used a very naive scheduling method (just fuse axes, and then split the resulting axis to parallelize), so the forward pass was about 1.7x slower on tvm, and the backward pass was about 3x slower. So I would currently estimate the generated gradients to be about 2x "worse" than the manually written ones. However, this is very operation-dependent, so I should compare performance for separate operations in the future.

Any guideline on when should users provide their own fdiff functions to tvm.autodiff.differentiate?

I was thinking of two use cases:

When the autodiff does a poor job and we have to perform differentiation manually for certain tensors.
When the user wants to try a custom set or order of optimizations (different from the ones used in DiffBuildingBlock).

Currently there is no way to find out if the autodiff did a poor job, other than run and measure the performance, or look at the generated code, but I'm thinking about writing a simple analyzer which will issue a warning if the generated code looks bad.

kaitingwang · 2019-01-28T14:53:02Z

I've gone over the paper: "Automatic Differentiation for Tensor Algebras" where your work is based on. (I have not dived into your code yet) Please correct me if I've misunderstood. Your work is based on solving A * alpha = beta where 1<= alpha <= Nf (equation 39), with a fallback to equation 40 (if cannot be solved, one must sum over the sparsity of zeros). The math is quite involved so I'm wondering why you've picked this paper and if the mathematically soundness has been checked... Thanks.

sgrechanik-h · 2019-01-28T16:28:52Z

Your work is based on solving A * alpha = beta where 1<= alpha <= Nf (equation 39), with a fallback to equation 40 (if cannot be solved, one must sum over the sparsity of zeros).

Essentially, yes. More precisely, I first differentiate naively, generating something similar to the equation (40), and then optimize this expression by generating sets of the form (39) for each loop and trying to simplify them.

The math is quite involved so I'm wondering why you've picked this paper and if the mathematically soundness has been checked... Thanks.

It's quite overcomplicated in the paper, but the overall procedure is quite modular, so the soundness should follow from the soundness of every small step (the naive differentiation and every optimizing tranformation). At the same time the paper contains all the necessary details, so that must be why I've chosen it.

reminisce · 2019-01-29T00:23:06Z

@sgrechanik-h Thanks for the reply. This is really nice work. IMHO, the value of this auto diff feature lies in providing a generic way of generating gradient expression for operators, so that developers can avoid tedious implementation of gradient expression. To achieve reasonable performance in BP, we can attach schedules at operator level.

Two more questions:

How is race condition handled in BP in this PR? For example, topi.take uses a input indices tensor to get elements from the input data. When there are duplicate indices provided, there is a race condition at the time of back-propagating output gradient to the same location of the input gradient tensor. How is this problem resolved in this PR?
I have tried differentiate on several operators in topi. Most of them can return correct results, but there are a few errors. The following is one of them. Any idea where went wrong?

import tvm
import topi

m, n = 40, 50
data = tvm.placeholder((m, n), name='data')
indices = tvm.placeholder((1,), dtype='int32', name='indices')
ret = topi.take(data, indices)
head = topi.full_like(ret, 2.0)
diff_ret = tvm.differentiate(ret, [data], head)

# error message:
# Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

sgrechanik-h · 2019-01-29T08:56:23Z

@reminisce,

How is race condition handled in BP in this PR? For example, topi.take uses a input indices tensor to get elements from the input data. When there are duplicate indices provided, there is a race condition at the time of back-propagating output gradient to the same location of the input gradient tensor.

This implementation of autodiff doesn't create race conditions in the first place, instead it generates huge naive loops directly. For example, in the case of take it will check every single index for every single element of the output:

  for (ax0, 0, 40) {
    for (ax1, 0, 50) {
      grad[((ax0*50) + ax1)] = 0.000000f
      for (k0, 0, 10) {
        if (((ax0 == ((indices[k0]/50) % 40)) && (ax1 == (indices[k0] % 50)))) {
          grad[((((indices[k0]/50) % 40)*50) + (indices[k0] % 50))] = grad[((((indices[k0]/50) % 40)*50) + (indices[k0] % 50))] + head[k0]
        }
      }
    }
  }

That is, take may be an example when this differentiation algorithm is not a good choice. (Upd: I think, it is possible to rearrange the loops so that k0 become the outer variable, and then eliminate the variables ax0 and ax0 by merging the corresponding loops with the condition. However it seems that this cannot be expressed on the tensor expression level and should be done during or after the scheduling phase.)

I have tried differentiate on several operators in topi. Most of them can return correct results, but there are a few errors. The following is one of them. Any idea where went wrong?

Thank you for reporting, I'll upload the fix soon.

sergei-mironov · 2019-01-29T11:03:47Z

src/pass/autodiff.h

@@ -0,0 +1,146 @@
+/*!


The current location of autodiff.h prevents it from being used by standalone C++ applications since they don't have an access to src folder. Could we move this file to tvm/include ?

reminisce · 2019-01-29T19:03:59Z

@sgrechanik-h

This implementation of autodiff doesn't create race conditions in the first place, instead it generates huge naive loops directly. For example, in the case of take it will check every single index for every single element of the output:

You are right. I misunderstood the problem. It should be handled at the schedule level. Thanks.

reminisce · 2019-01-29T22:17:13Z

@sgrechanik-h I'm reporting another crash. It is fine if data1 and data2 have the same shape, while crashes when broadcasting is involved. Could you take a look? Thanks.

import tvm
import topi


m = tvm.var('m')
n = tvm.var('n')
data1 = tvm.placeholder((m, n), name='data1')
data2 = tvm.placeholder((n,), name='data2')
out = topi.add(data1, data2)
head = topi.full_like(out, 2.0)
tvm.differentiate(out, [data1, data2], head)

Error message:

libc++abi.dylib: terminating with uncaught exception of type std::out_of_range: unordered_map::at: key not found

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

sgrechanik-h · 2019-01-30T15:55:38Z

@reminisce Thanks, the problem is due to free variables m and n. I've fixed this particular bug. However such free variables are not really supported, the generated gradients should be correct (although I expect more bugs since I haven't thoroughly tested this case), but I would expect their performance to be much worse.

kaitingwang · 2019-02-05T21:16:10Z

@sgrechanik-h I tried the following testcase and am puzzled about why you needed to 'shift' the identity matrix to avoid the out-of-bound situation (i.e. allocating 10x3x17x3 for identity) even when you had put in the if-guard to prevent this situation. (i.e. if ((((((k1 <= ax2) && ((ax2 - k1) <= 9)) && ((ax2 + -9) <= k1)) && (k1 <= ax2)) && (ax2 <= 11))))

  1 from __future__ import absolute_import
  2 import tvm
  3 import topi
  4
  5 A = tvm.placeholder((15,), name = "A")
  6 B = tvm.compute((10,3), lambda i,j: A[i+j], name = "B")
  7
  8 print("-----------FORWARD-----------------")
  9 print(tvm.PrintTensorRecursively(B))
 10 sout = tvm.create_schedule(B.op)
 11 ir = tvm.lower(sout, [B,A], simple_mode=True)
 12 print(ir)
 13
 14 print("-----------BACKWARD---------------------")
 15 [jacs] = list(tvm.differentiate(B, [A]))
 16 print(tvm.PrintTensorRecursively(jacs))
 17 sjac = tvm.create_schedule(jacs.op)
 18 ir_jacs = tvm.lower(sjac, [jacs], simple_mode=True)
 19 print(ir_jacs)

Lowered code:

// attr [identity] storage_scope = "global"
allocate identity[float32 * 10 * 3 * 17 * 3]
// attr [tensor] storage_scope = "global"
allocate tensor[float32 * 15 * 3 * 10]
produce identity {
  for (ax0, 0, 10) {
    for (ax1, 0, 3) {
      for (ax2, 0, 17) {
        for (ax3, 0, 3) {
          if (likely((2 <= ax2))) {
            if (likely((ax2 < 12))) {
              identity[((((((ax0*3) + ax1)*17) + ax2)*3) + ax3)] = float32((((ax0 - ax2) == -2) && (ax1 == ax3)))
            }
          }
        }
      }
    }
  }
}
produce tensor {
  for (ax2, 0, 15) {
    for (ax1, 0, 3) {
      for (ax0, 0, 10) {
        tensor[((((ax2*3) + ax1)*10) + ax0)] = 0.000000f
        for (k1, 0, 3) {
          if (likely((ax2 < 12))) {
            if ((((((k1 <= ax2) && ((ax2 - k1) <= 9)) && ((ax2 + -9) <= k1)) && (k1 <= ax2)) && (ax2 <= 11))) {
              tensor[((((ax2*3) + ax1)*10) + ax0)] = (tensor[((((ax2*3) + ax1)*10) + ax0)] + identity[(((((ax2 + (ax1*17)) + (ax0*51))*3) - (k1*2)) + 6)])
            }
          }
        }
      }
    }
  }
}

sgrechanik-h · 2019-02-06T09:57:00Z

@kaitingwang This is a problem of TVM. At some point during or immediately after scheduling, TVM analyzes bounds of tensor arguments and expands bounds of tensors if it thinks that out-of-bounds might happen. However, bound analysis is difficult, and this sometimes results in overapproximation. In this case TVM couldn't properly infer the bounds from the conditions (or it may have ignored the conditions completely, I'm not sure). I tried to fix this behavior of TVM once, but failed: #2104

(Note that there is another problem with this example: the identity tensor was not inlined, although this would have been very beneficial. The reason is that I don't perform inlining of adjoints since in the case of gradients the adjoints are rarely sparse and inlining them results in worse performance. However, in the case of arbitrary Jacobians, like in this case, inlining of adjoints is necessary, so I have to think what to do about it.)

sgrechanik-h · 2019-02-15T19:07:24Z

I've updated the PR with a Relay integration. Basically, calling relay._ir_pass.AutogeneratePrimalGradientForAll() will populate the FPrimalGradient attribute of every relay operation with automatically generated gradients. For some operations automatically generated gradients fail with compile or run-time errors, usually this is caused by multi-output operations (which don't seem to be fully supported by the relay ad yet) and by integer inputs and outputs.

sgrechanik-h · 2019-02-18T15:45:12Z

@tqchen zero-elimination is independent and it makes sense to create a separate PR, I will do it soon.

sgrechanik-h · 2019-02-18T15:50:18Z

A link to my comment containing things to discuss before merging.

include/tvm/autodiff.h

merrymercy · 2019-02-19T07:09:26Z

src/op/op_util.cc

@@ -206,6 +206,34 @@ Expr ReplaceTensor(Expr expr,
 }


+void ReplaceTensorRecursivelyImpl(Tensor tensor,
+                                  std::unordered_map<Tensor, Tensor>* replace) {


Should we use std::unordered_map<Tensor, Tensor, NodeHash, NodeEqual>*?

I don't think so. The Tensor class has its own coarser notion of equality and it has its own specialization of std::hash. I think using the ordinary reference equality may cause problems even for this particular function.

src/pass/autodiff.cc

tutorials/language/autodiff_basics.py

src/relay/op/autodiff_integration.cc

sgrechanik-h · 2019-02-20T15:59:01Z

Zero elimination PR #2634

altanh · 2019-05-14T22:05:30Z

@sgrechanik-h @tqchen Hi, what is the status of this and #2634, #2588? In trying to implement the missing primal gradients for #2562 (to enable training), it seems like there will be a high per-operator time cost, so we'd like to understand where this AD (along with the other features) is in terms of progress.

sgrechanik-h · 2019-05-15T16:32:17Z

@altanh I've recently implemented a solver for integer systems of equations which improved performance of the code generated by the autodiff, however this triggered some bugs which are probably connected to missimplification by the Halide simplifier (due to different division semantics), so I'm currently moving the autodiff to the new tvm simplifiers. When I'm done, I'll update the pull requests.

altanh · 2019-05-17T00:15:03Z

@sgrechanik-h that sounds great, thank you for the update! For now we'll continue working through the list of gradients and implementing them by hand, since the consensus is that they'll be invaluable for future things like optimizing after AD, higher order AD, etc.

yzhliu · 2019-07-21T18:48:11Z

@sgrechanik-h Does this PR realize the linear inequality solver (to eliminate Kronecker delta within summation) described in the paper you mentioned?

sgrechanik-h · 2019-07-22T05:58:31Z

@yzhliu Yes, it does. The more up-to-date autodiff-dev branch also implements the solver for systems of linear equations (and should work on top of the current TVM master).

kaitingwang

Perhaps this is not possible since softmax is not an element-wise operation. Therefore, one would need to fully complete the forward softmax computation, in order to use it in the backward computation.

kaitingwang · 2019-08-12T23:28:25Z

src/pass/autodiff.cc

+        return Mul::make(Mutate(op->args[0]), e);
+      } else if (op->name == "log") {
+        return Div::make(Mutate(op->args[0]), op->args[0]);
+      } else if (op->name == "sigmoid") {


@sgrechanik-h I like this approach to express the gradient of sigmoid in terms of forward sigmoid (i.e. held in "e") I'm wondering how to extend this for softmax which is a bit trickier. The reason is that the gradient for softmax can also be expressed as forward softmax but it depends if 'i == j'

How would you express this here?

PS. the softmax gradient is taken from : https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/

Yes, since softmax is not elementwise, we can't express it here, so I think it should be done through the gradient overriding mechanism (or on the Relay level).

I agree. Although softmax can be inside a large expression tree (e.g. focalloss) and as such, we'd want to autodiff that expression tree (using the Jacobian logic you have there), but when encountering the softmax operator, use the relay overridden softmax gradient instead. How would you invoke a relay registered gradient from the Jacobian function?

I think there are several problems:

How to override differentiation rules for some specific tensor: in my implementation there is a mechanism for this, e.g. the option override which can be used as described here

How to detect the operation for which we want to override gradients, in this case softmax. The overriding mechanism simply compares tensors by reference, but this equality seems to be too fine-grained in practice, because sortmax(A) and softmax(B) are two different tensors on the tensor expression level. I think the practical solution would be to perform gradient overriding by the tensor's tag or name, however currently it is not implemented.

How to reuse gradients implemented in Relay. I don't know, but I think it should be relatively straightforward to lower a Relay graph representing a gradient into a tensor expression (however I have no idea what to do with schedules).

Thanks for the reply! Took me a while to understand but I like your clever design that when override is specified, fdiff is essentially a wrapper. That is, depending on whether the out tensor is specified in the override or not, one would either use the default fdiff, or the one specified inside the override dictionary. However, I found your example below unintuitive:

# a generalization of my_fdiff which works for non-immediate dependencies # this is necessary because z1 is not an immediate dep of z2 because of padding def my_diff(out, inputs, head): return tvm.differentiate(out, inputs, head, fdiff=my_fdiff) # using a custom differentiation function only for z2 res = tvm.differentiate(y, [w1, w2], override={z2: ([z1, w2], my_diff)})

If I write my own custom fdiff for z2 w.r.t. z1, or w2, I'd prefer that my_diff returning a tensor (i.e. a tvm.compute that is the custom gradient) instead of DifferentiationResult. (not sure how to construct such object in python...)

Actually, I jut realize that I can indeed do so, by putting the custom gradient tensors into a list and returning that from my_diff. DifferentiationResult is simply a list of tensors!

MarisaKirisame · 2019-08-14T14:55:02Z

@sgrechanik-h what if you use a more traditional method of doing automatic differentiation - instead of working on sparse tensor, generate tvmir code with mutation and do the standard wengert list there. See Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, Second Edition for example.

sgrechanik-h · 2019-08-14T16:53:03Z

@MarisaKirisame I guess it will result in a lots of += operations which will require synchronization if we want to parallelize the resulting code.

merrymercy · 2020-02-09T00:23:22Z

@sgrechanik-h What's the current status of this project? Are there any updates?

sgrechanik-h · 2020-02-09T04:50:08Z

@merrymercy Unfortunately I don't have time to work on it now. The last time I rebased my branch on the latest TVM and checked that it still worked was 2 months ago (the branch is here: https://github.com/sgrechanik-h/tvm/tree/autodiff-dev).

yzhliu · 2020-02-10T18:46:54Z

@sgrechanik-h would you mind if someone take over and rebuild it on top of you work?

sergei-grechanik · 2020-02-11T02:53:28Z

@yzhliu Yeah, absolutely, I don't mind.

yzhliu · 2020-03-06T08:04:45Z

@sgrechanik-h I have difficulty understand this part, would be helpful if you can provide some guide,

say we have A x = b => U^T S V^T x = b the solution for x is V S^{-1} U b + K z, in which K is the right m-r columns of V and z is arbitrary vector. in my understanding,

old_to_new is V^T
matrix is S
rhs is U b
solution[] in the code is V^T V S^{-1} U b + V^T K z

Then the for-loops in the code calculates V^T * solution, how does it equal to original x ?

sergei-grechanik · 2020-03-06T16:47:28Z

@yzhliu solution in my code seems to be S^{-1} U b + V^{-1} K z, i.e. it is the solution for y of S y = U b (where y = V^{-1} x). Sorry for the confusing naming.

yzhliu · 2020-03-06T18:55:58Z

@sgrechanik-h thanks for reply. that's the same as what I thought. if solution = V^{-1} x, what does old_to_new * solution mean? In my understanding old_to_new = V^{-1}, then old_to_new * solution = V^{-2} x, did I miss something?

sergei-grechanik · 2020-03-06T19:17:31Z

@yzhliu Oh, I guess old_to_new must be V then, not V^{-1}

tqchen · 2020-03-30T20:54:01Z

Superseded by #5121

sgrechanik-h added 2 commits January 23, 2019 16:24

[TVM] Automatic differentiation for tensor expressions

a0822d3

Fix the failing tests

0de13b9

reminisce reviewed Jan 25, 2019

View reviewed changes

Fix topi.take

9f99cd7

sergei-mironov reviewed Jan 29, 2019

View reviewed changes

sgrechanik-h added 3 commits January 29, 2019 14:54

Fix a python2 syntax error

0775a4e

Move autodiff.h to include/tvm

d90323e

Reduce the probability of test failure because of integer overflow

204f2f7

Fix a problem with free vars

952629b

ehsanmok mentioned this pull request Feb 4, 2019

More Strongly-Typed Rust Bindings #2559

Closed

3 tasks

reminisce mentioned this pull request Feb 5, 2019

[RELAY] Add primal gradients for Relay operators. #2562

Closed

86 tasks

sgrechanik-h added 2 commits February 15, 2019 19:05

[AD] More intrinsics; Fixed treatment of ints

75418c7

[AD] Autodiff/relay integration

cf5083a

Simplified overriding of gradients; Tutorial and docs

5ae4ac1

merrymercy reviewed Feb 19, 2019

View reviewed changes

Several fixes

a342e5e

sgrechanik-h mentioned this pull request Feb 20, 2019

[TVM] Zero elimination #2634

Closed

yzhliu mentioned this pull request Mar 2, 2019

[DEV] TVM v0.6 Roadmap #2623

Closed

28 tasks

merrymercy mentioned this pull request Apr 2, 2019

[RFC][AUTOTVM] Auto-Schedule from Compute Declaration #2954

Closed

4 tasks

kaitingwang reviewed Aug 13, 2019

View reviewed changes

tqchen unassigned merrymercy Nov 4, 2019

tqchen mentioned this pull request Nov 8, 2019

[RELEASE][DRAFT] TVM v0.6 Release candidate #4259

Closed

tqchen closed this Mar 30, 2020

magic3007 mentioned this pull request May 16, 2020

Meeting Progress[04/25/2020] magic3007/AutoGrad#4

Open

[TVM] Automatic differentiation for tensor expressions #2498

[TVM] Automatic differentiation for tensor expressions #2498

Conversation

sgrechanik-h commented Jan 23, 2019

masahi commented Jan 24, 2019

sgrechanik-h commented Jan 24, 2019

kaitingwang commented Jan 24, 2019

Ravenwater commented Jan 24, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sgrechanik-h commented Jan 25, 2019

reminisce commented Jan 25, 2019

sgrechanik-h commented Jan 28, 2019

kaitingwang commented Jan 28, 2019

sgrechanik-h commented Jan 28, 2019

reminisce commented Jan 29, 2019

sgrechanik-h commented Jan 29, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

reminisce commented Jan 29, 2019

reminisce commented Jan 29, 2019

sgrechanik-h commented Jan 30, 2019

kaitingwang commented Feb 5, 2019

sgrechanik-h commented Feb 6, 2019

sgrechanik-h commented Feb 15, 2019

sgrechanik-h commented Feb 18, 2019

sgrechanik-h commented Feb 18, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sgrechanik-h commented Feb 20, 2019

altanh commented May 14, 2019

sgrechanik-h commented May 15, 2019

altanh commented May 17, 2019

yzhliu commented Jul 21, 2019

sgrechanik-h commented Jul 22, 2019

kaitingwang left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kaitingwang Aug 14, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kaitingwang Aug 21, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MarisaKirisame commented Aug 14, 2019

sgrechanik-h commented Aug 14, 2019

merrymercy commented Feb 9, 2020 • edited Loading

sgrechanik-h commented Feb 9, 2020

yzhliu commented Feb 10, 2020

sergei-grechanik commented Feb 11, 2020

yzhliu commented Mar 6, 2020

sergei-grechanik commented Mar 6, 2020

yzhliu commented Mar 6, 2020

sergei-grechanik commented Mar 6, 2020

tqchen commented Mar 30, 2020 • edited Loading

sgrechanik-h commented Jan 29, 2019 •

edited

Loading

kaitingwang Aug 14, 2019 •

edited

Loading

kaitingwang Aug 21, 2019 •

edited

Loading

merrymercy commented Feb 9, 2020 •

edited

Loading

tqchen commented Mar 30, 2020 •

edited

Loading