
[Feature] Add ParallelGraph executor mode in parallelexecutor to improve performance #14791

Merged: 41 commits from parallel_graph_mode into PaddlePaddle:develop on Jan 3, 2019

Conversation

@Yancey1989 (Contributor) commented Dec 7, 2018

Background

The default executor type in ParallelExecutor schedules op_handles in a thread pool, but the op_handle switching overhead hurts performance when individual operator execution times are very short (20 ~ 100 us on CPU in the ResNet50 model).

  • CPU kernel duration vs. GPU kernel duration

| GPUs | CPU kernel duration | GPU kernel duration | conv2d (CPU) | conv2d (GPU) |
|---|---|---|---|---|
| 2 GPUs, 1 CPU | 180 ms | 277 ms | 78.632 ns | 216.535 ns |
| 2 GPUs, 2 CPUs | 286 ms | 277 ms | 113.573 ns | 216.555 ns |

Scheduling op_handles through the thread pool takes longer than running them without the thread pool.

  • fake data vs. real data

| GPUs / batch time | vis-reader | vis-reader + test_mode | fake_data |
|---|---|---|---|
| 8 GPUs, 8 CPUs | 254 ms | 233 ms | 230 ms |

I/O is not the biggest bottleneck.

This PR implements another executor type, called ParallelGraph, in ParallelExecutor. The differences from the default executor in ParallelExecutor are as follows:

  1. Convert the main_program into N graphs, where N is the number of devices, and
  2. ParallelGraph runs each of those graphs on its own thread (a minimal sketch follows below).
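
A minimal sketch of the per-device execution idea (this is not the PR's actual ParallelSSAGraphExecutor code; `GraphExecutor` and `RunParallelGraphs` are hypothetical stand-ins):

```cpp
// Sketch only: one graph per device, each run to completion on its own thread,
// instead of scheduling every op_handle through a shared thread pool.
#include <future>
#include <memory>
#include <vector>

struct GraphExecutor {  // hypothetical stand-in for a per-device graph executor
  void Run() { /* execute this device's graph, op by op, on this thread */ }
};

void RunParallelGraphs(std::vector<std::unique_ptr<GraphExecutor>>* executors) {
  std::vector<std::future<void>> futures;
  futures.reserve(executors->size());
  for (auto& exec : *executors) {
    GraphExecutor* e = exec.get();
    // One thread per device graph avoids the op-level scheduling overhead.
    futures.emplace_back(std::async(std::launch::async, [e] { e->Run(); }));
  }
  for (auto& f : futures) f.get();  // wait for every device to finish the batch
}
```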

Experiment

Test env:

  • GPU: 8 * V100
  • Model: ResNet50
  • Dataset: ImageNet; batch_size is 32 for each GPU; fetch every 30 iterations.

Test cases:

  1. Throughput with fake_data on the qianmo VM

| GPUs / throughput | Default executor | ParallelGraph executor |
|---|---|---|
| 1 GPU, 1 CPU | 268 (1) | 268 (1) |
| 2 GPUs, 2 CPUs | 305 (1.14) | 507 (1.89) |
| 4 GPUs, 4 CPUs | 633 (2.36) | |
| 8 GPUs, 8 CPUs | 1150 (4.29) | 1874 (6.9) |

  2. Throughput with vis-reader on the qianmo VM

| GPUs / throughput | Default executor | ParallelGraph executor |
|---|---|---|
| 1 GPU, 1 CPU | 264 | 264 |
| 8 GPUs, 8 CPUs | 976 (3.69) | 1559 (5.9) |

  3. Throughput with vis-reader on PaddleCloud

| GPUs / throughput | Default executor | ParallelGraph executor | Multiple processes |
|---|---|---|---|
| 8 GPUs, 8 CPUs, bs=32 | 1293 | 1733 (+34%) | 1736 (+34%) |

  4. Throughput on the Transformer model

| GPUs / throughput | Default executor | ParallelGraph executor |
|---|---|---|
| 8 GPUs, 8 CPUs, bs=4096 | 80,478 | 84,869 (+5%) |

TODO:

  • Support GPU parallel training and the nccl2 distributed training mode.
  • Fix the NCCL allreduce hang when the training data is empty on some devices.
  • Support CPU training in ParallelGraph mode.
  • Support the PServer distributed mode.

@Yancey1989 Yancey1989 changed the title Add ParallelGraph executor mode in parallelexecutor to improve performance [WIP, Feature] Add ParallelGraph executor mode in parallelexecutor to improve performance Dec 7, 2018
@panyx0718 (Contributor) left a comment

I like the idea of having each thread execute its own ops, but the implementation is a little confusing. Perhaps we can find a better way to implement it.

@@ -14,6 +14,7 @@ limitations under the License. */

#pragma once

#include <pthread.h>
Contributor:

why change here?

Contributor Author:

Just a test; I tried to increase the priority of the threads (http://man7.org/linux/man-pages/man3/pthread_setschedparam.3.html). I will delete this header file.
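
For reference, a hedged sketch of what such a priority experiment could look like (this is not code from the PR; the header was removed in the end):

```cpp
#include <pthread.h>
#include <sched.h>

// Raise the calling thread's priority to the SCHED_FIFO maximum.
// Usually requires CAP_SYS_NICE/root; the return value should be checked.
void RaiseCurrentThreadPriority() {
  sched_param param{};
  param.sched_priority = sched_get_priority_max(SCHED_FIFO);
  pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
}
```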

if (g_state == ProfilerState::kDisabled) return;
std::lock_guard<std::mutex> l(profiler_mu);
Contributor:

Does this matter? It's wrong to put it here.

Contributor Author:

Maybe we don't need to take the mutex when the profiler is disabled; it decreases performance, and disabling/enabling the profiler only happens at the beginning or end of each batch of training.

if (g_state == ProfilerState::kDisabled || !is_enabled_) return;
VLOG(5) << "call ~RecordEvent";
std::lock_guard<std::mutex> l(profiler_mu);
Contributor:

same as above

[](ExecutionStrategy &self, ExecutionStrategy::ExecutorType type) {
self.type_ = type;
},
R"DOC()DOC");
Contributor:

Can you add more doc to describe kParallelGraph type?

Contributor Author:

Sure. This PR is WIP; I have updated the PR description and added a TODO list.

} else if (exec_strategy.type_ == ExecutionStrategy::kParallelGraph) {
nccl_id.reset(new ncclUniqueId());
PADDLE_ENFORCE(platform::dynload::ncclGetUniqueId(nccl_id.get()));
*member_->global_scope_->Var(NCCL_ID_VARNAME)
Contributor:

why can't nccl_id_varname be created like above?

// only used in executor_type == ParallalGraph, one thread one GPU
// TODO(Yancey1989): use allreduce operator to avoid this tricky.
PADDLE_ENFORCE(all_reduce_calls.size() == 1UL);
all_reduce_calls[0]();
Contributor:

why is this different?

Contributor Author:

It will hang when using the group call of the NCCL operators, and the NCCL example code doesn't use the group call: https://docs.nvidia.com/deeplearning/sdk/nccl-developer-guide/docs/examples.html#example-1-one-device-per-process-or-thread .
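
For context, a minimal sketch of the "one device per process or thread" pattern from the linked NCCL example (the communicator and stream are assumed to be created elsewhere; `AllReduceOnThisThread` is not the PR's code):

```cpp
#include <cuda_runtime.h>
#include <nccl.h>

// One device per thread: each thread issues exactly one collective call and
// does NOT wrap it in ncclGroupStart()/ncclGroupEnd().
void AllReduceOnThisThread(const float* sendbuff, float* recvbuff, size_t count,
                           ncclComm_t comm, cudaStream_t stream) {
  ncclAllReduce(sendbuff, recvbuff, count, ncclFloat, ncclSum, comm, stream);
}
```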

// only used in executor_type == ParallalGraph, one thread one GPU
// TODO(Yancey1989): use allreduce operator to avoid this tricky.
PADDLE_ENFORCE(all_reduce_calls.size() == 1UL);
all_reduce_calls[0]();
Collaborator:

It seems that it is easy to produce a deadlock, as described in https://arxiv.org/pdf/1706.02677.pdf

Contributor Author:

Sure, it's the same problem as distributed training in NCCL2 collective mode; we need to fix the order of the all-reduce operators. @gongweibao has a PR for that (#14586); I will do more testing with that feature after this PR is merged.

auto call = [this, i] {
// FIXME(Yancey1989): need to fix fetch data failed.
std::vector<std::string> empty;
executors_[i]->Run(empty);
Collaborator:

You need to make ParallelSSAGraphExecutor exception-safe, because some exceptions are acceptable, such as the EOFException raised by py_reader when a pass ends.
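
A minimal, hedged sketch of the exception-safety pattern being asked for (not the actual ParallelSSAGraphExecutor implementation): capture exceptions per thread and rethrow only after every future has been waited on, so an end-of-pass EOF cannot leave other threads unjoined.

```cpp
#include <exception>
#include <functional>
#include <future>
#include <vector>

void RunAllOrRethrow(const std::vector<std::function<void()>>& calls) {
  std::vector<std::future<void>> futures;
  std::exception_ptr first_error;
  for (const auto& call : calls) {
    futures.emplace_back(std::async(std::launch::async, call));
  }
  for (auto& f : futures) {
    try {
      f.get();  // always wait on every future, even after an earlier error
    } catch (...) {
      if (!first_error) first_error = std::current_exception();
    }
  }
  if (first_error) std::rethrow_exception(first_error);  // e.g. the EOF case
}
```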

Contributor Author:

done.

@@ -14,6 +14,7 @@ limitations under the License. */

#pragma once

#include <pthread.h>
Collaborator:

Do not use pthread.h, which is POSIX-only. Try to use the standard C++ header #include <thread>.

Contributor Author:

I will delete this header file, #14791 (comment)

@@ -106,31 +110,56 @@ ParallelExecutor::ParallelExecutor(
// Bcast Parameters to all GPUs
#if defined(PADDLE_WITH_CUDA) && !defined(_WIN32)
auto *nccl_id_var = scope->FindVar(NCCL_ID_VARNAME);
ncclUniqueId *nccl_id = nullptr;
std::unique_ptr<ncclUniqueId> nccl_id = nullptr;
Contributor:

Why do you use std::unique_ptr here?

Contributor Author:

Will try to avoid that.

Contributor Author:

done.

@Yancey1989 (Contributor Author) commented Dec 11, 2018

I like the idea of having each thread execute its own ops. But the implementation is a little confusing. Perhaps we can have a better way to implement it. -- FROM @panyx0718

This was just to reuse the code of multi_devices_pass; maybe implementing another Pass is a good idea.

graphs_(std::move(graphs)) {
PADDLE_ENFORCE_EQ(places_.size(), local_scopes_.size());
// do not use threadpool for each graph execution.
strategy_.num_threads_ = 1UL;
Contributor:

Did you compare the performance between different values of num_threads?

Contributor Author:

Not using the thread pool achieves better performance, and I have added it as a configurable argument.

This is the result on fake_data:

| num_threads | throughput |
|---|---|
| 1 | 1841.10691 |
| 2 | 1648.53907 |

}
}
}

Contributor:

Is there any issue here? When an exception is thrown, it will keep running until line 80.

Contributor Author:

thanks, done.

@Yancey1989 Yancey1989 changed the title [WIP, Feature] Add ParallelGraph executor mode in parallelexecutor to improve performance [Feature] Add ParallelGraph executor mode in parallelexecutor to improve performance Dec 13, 2018
for (auto &call : all_reduce_calls) {
call();
// TODO(Yancey1989): need allreduce operator to avoid this flag
if (nccl_ctxs_->need_group_call_) {
Contributor:

Why not just use all_reduce_calls.size() == 1UL?

Contributor Author:

good idea, done.

@@ -52,7 +52,6 @@ void OpHandleBase::Run(bool use_cuda) {
#else
PADDLE_ENFORCE(!use_cuda);
#endif

Contributor:

can remove these unnecessary changes

Contributor Author:

done.

@@ -386,7 +386,16 @@ std::unique_ptr<ir::Graph> MultiDevSSAGraphBuilder::ApplyImpl(
CreateComputationalOps(&result, node, places_.size());
}

// insert synchronous ops at the backpropagation; and
Contributor:

synchronous ops => collective ops

Contributor Author:

done.

};

if (pool_) {
run_futures.emplace_back(pool_->enqueue(std::move(call)));
Contributor:

Since the number of tasks to run is fixed, you could just use a set of threads to avoid the enqueues.

Contributor Author:

The ThreadPool avoids creating the threads at the beginning of each batch.

#endif

auto max_memory_size = GetEagerDeletionThreshold();
Contributor:

should add this back?

Contributor Author:

done.

});
})
.def_property(
"executor_type",
Contributor:

executor_type is too general a name; it does not provide enough information for the API.

Contributor Author:

We can discuss the name, and I have also added some description of this field.

Actually, the ParallelGraph executor is one of several executors (Default, FastThreaded, and more in the future).
@panyx0718 do you have any good ideas?

Contributor Author:

Using build_strategy.enable_parallel_graph to enable/disable the parallel graph.

@panyx0718 (Contributor) left a comment

@sneaxiy @chengduoZH please review

ReduceLoDTensor func(lod_tensors, &trg);
VisitDataType(lod_tensors[0]->type(), func);

for (size_t i = 1; i < local_scopes_.size(); ++i) {
Contributor:

still work for cpu?

Contributor Author:

No. The CPU allreduce is implemented with TensorCopy, and it needs a barrier to wait until the input variables from the different devices (graphs) are ready.

places_(std::move(places)),
graphs_(std::move(graphs)) {
PADDLE_ENFORCE_EQ(places_.size(), local_scopes_.size());
// do not use threadpool for each graph execution.
Contributor:

how do you enforce this?

Contributor Author:

Will delete this comment, it's an optional argument.

fetch_datas.emplace_back(std::move(f.get()));
}
}
}
Contributor:

no else?

Contributor Author:

It is synchronous execution if pool_ is nullptr.
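
To make the answer concrete, a hedged, self-contained illustration (`ThreadPoolLike` and `RunOrEnqueue` are hypothetical stand-ins for the pool and helper used in the PR):

```cpp
#include <functional>
#include <future>
#include <utility>
#include <vector>

// With a pool, the task runs asynchronously and its future is kept for later;
// without one, it runs synchronously on the calling thread, so there is
// nothing left to wait for afterwards.
template <typename ThreadPoolLike>
void RunOrEnqueue(ThreadPoolLike* pool, std::function<void()> call,
                  std::vector<std::future<void>>* run_futures) {
  if (pool) {
    run_futures->emplace_back(pool->enqueue(std::move(call)));
  } else {
    call();
  }
}
```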

lodtensor_ptrs.push_back(&fetch_datas.at(scope_idx).at(fetch_idx));
}
ret.emplace_back();
ret.back().MergeLoDTensor(lodtensor_ptrs, platform::CPUPlace());
Contributor:

Does ParallelExecutor's fetch merge lodtensor?

Contributor Author:

For ParallelGraph mode, the fetch_op_handle is only placed on one device, so we should merge the results here.

"build_strategy.reduce should be `AllReduce` if you want to enable"
"ParallelGraph.");
PADDLE_ENFORCE(
member_->use_cuda_,
Contributor:

why only support gpu?

if (nccl_id_var != nullptr) {
nccl_id = nccl_id_var->GetMutable<ncclUniqueId>();
}
if (build_strategy.enable_parallel_graph_ && places.size() > 1) {
Contributor:

why use places, not num_parallel_devices?

// Step 2. Convert main_program to SSA form and dependency graph. Also, insert
// ncclOp
std::vector<std::unique_ptr<ir::Graph>> graphs;
member_->num_parallel_devices_ = member_->places_.size() * num_trainers;
Contributor:

why compute num_parallel_devices again?

Contributor Author:

Sorry, it's duplicated code; I will delete it.

@@ -442,10 +442,10 @@ def _run_cluster_nccl2(self, model, envs, check_error_log):
tr_cmd = "%s %s --role trainer --endpoints %s --trainer_id %d --current_endpoint %s --update_method nccl2 --lr %f"
tr0_cmd = tr_cmd % \
(self._python_interp, model, self._ps_endpoints,
0, w0_ep, self._lr / 2)
0, w0_ep, self._lr)
Contributor:

why change this?

Contributor Author:

Remark 3: Normalize the per-worker loss by total minibatch size kn, not per-worker size n.

FROM https://arxiv.org/abs/1706.02677.

Scaling the loss instead of the LR can achieve better accuracy.
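
In symbols (k workers, per-worker minibatch size n, paraphrasing the remark quoted above; not text from the PR):

$$
\ell(w) \;=\; \frac{1}{kn}\sum_{j=1}^{k}\sum_{x\in\mathcal{B}_j} L(x, w)
\quad\text{rather than}\quad
\ell_j(w) \;=\; \frac{1}{n}\sum_{x\in\mathcal{B}_j} L(x, w)\ \text{per worker.}
$$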

@@ -198,6 +199,17 @@ ParallelExecutor::ParallelExecutor(
"the number of places must be greater than 1.");
}

if (build_strategy.enable_parallel_graph_) {
Contributor:

which exe_strategy does it check?

@@ -106,7 +106,7 @@ struct NCCLContextMap {
}
std::unique_ptr<ncclComm_t[]> comms(new ncclComm_t[order_.size()]);
// if num_trainers == 1, should create a new nccl id for local comms.
if (num_trainers == 1) {
if (num_trainers == 1 && nccl_id == nullptr) {
Contributor:

why change this? Is this a bug fix?

Contributor Author:

It's not a bug fix; ParallelGraph should initialize NCCL in ranks mode with the same nccl_id.
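
For context, a hedged sketch of what "initialize NCCL in ranks mode with the same nccl_id" refers to (`InitCommForRank` is illustrative, not the PR's code):

```cpp
#include <nccl.h>

// Every device/thread joins the same communicator clique by passing the
// identical ncclUniqueId together with its own rank; the call blocks until
// all nranks participants have arrived.
ncclComm_t InitCommForRank(ncclUniqueId id, int nranks, int rank) {
  ncclComm_t comm = nullptr;
  ncclCommInitRank(&comm, nranks, id, rank);
  return comm;
}
```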


def test_batchnorm_fc(self):
for use_cuda in (False, True):
for use_fast_executor in (False, True):
self.check_batchnorm_fc_convergence(use_cuda, use_fast_executor)

self.check_batchnorm_fc_convergence(
use_cuda=True, use_fast_executor=False, use_parallel_graph=True)

def test_batchnorm_fc_with_new_strategy(self):
# FIXME(zcd): close this test temporally.

if (exception_holder_.IsCaught()) {
f.wait();
} else {
fetch_datas.emplace_back(std::move(f.get()));
Contributor:

data is a plural form.

Contributor Author:

done

if (build_strategy.enable_sequential_execution_ ||
exec_strategy.type_ == ExecutionStrategy::ExecutorType::kExperimental)
enable_parallel_graph = false;
return enable_parallel_graph && FLAGS_enable_parallel_graph;
Contributor:

The code order of EnableParallelGraphExecution can be refined; e.g., if FLAGS_enable_parallel_graph is false, it can return directly.
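
A hedged sketch of the suggested reordering (assuming Paddle's BuildStrategy/ExecutionStrategy declarations and the flag are in scope; the real function checks more conditions than shown):

```cpp
bool EnableParallelGraphExecution(const BuildStrategy &build_strategy,
                                  const ExecutionStrategy &exec_strategy) {
  if (!FLAGS_enable_parallel_graph) return false;  // cheapest check first
  if (build_strategy.enable_sequential_execution_) return false;
  if (exec_strategy.type_ == ExecutionStrategy::ExecutorType::kExperimental)
    return false;
  return true;  // remaining conditions are omitted in this sketch
}
```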

Contributor Author:

done

@chengduoZH (Contributor) left a comment

This PR can be merged first, but there is a problem to be solved in the next PR: if some executor has no training data and the other executors are not notified, the program will hang during the NCCL AllReduce.
This PR doesn't affect the default behavior of ParallelExecutor.

@Yancey1989 (Contributor Author) commented Jan 3, 2019

Thanks @chengduoZH. I will fix the ParallelExecutor hang when some devices do not have enough training data; until then, we can enable the ParallelGraph mode by setting the environment variables:

FLAGS_enable_parallel_graph=1 FLAGS_sync_nccl_allreduce=1 ...

And this PR needs @panyx0718's approval because of the const_cast.

@panyx0718 (Contributor) left a comment

We want to have a more automatic way of handling the different build and execution strategies.

// asynchronous nccl allreduce or synchronous issue:
// https://github.com/PaddlePaddle/Paddle/issues/15049
DEFINE_bool(
sync_nccl_allreduce, false,
Contributor:

still not quite comfortable with this flag

@Yancey1989 (Contributor Author) commented Jan 3, 2019

Will enable ParallelGraph mode + async NCCL by default once the NCCL hang issue is fixed.

@Yancey1989 Yancey1989 merged commit a1e60ab into PaddlePaddle:develop Jan 3, 2019
@Yancey1989 Yancey1989 deleted the parallel_graph_mode branch January 3, 2019 10:42