Merge pull request #325 from belonHan/master
3 doc - 20190301
wizardforcel committed Mar 1, 2019
2 parents b225008 + 9767b21 commit c740778
Showing 3 changed files with 125 additions and 120 deletions.
29 changes: 15 additions & 14 deletions docs/1.0/bottleneck.md
@@ -1,35 +1,36 @@


# torch.utils.bottleneck
> Translator: [belonHan](https://github.com/belonHan)

`torch.utils.bottleneck` is a tool that can be used as an initial step for debugging bottlenecks in your program. It summarizes runs of your script with the Python profiler and PyTorch’s autograd profiler.

Run it on the command line with

```py
python -m torch.utils.bottleneck /path/to/source/script.py [args]

```

where [args] are any number of arguments to `script.py`, or run `python -m torch.utils.bottleneck -h` for more usage instructions.

Warning

Because your script will be profiled, please ensure that it exits in a finite amount of time.

Warning

Due to the asynchronous nature of CUDA kernels, when running against CUDA code, the cProfile output and CPU-mode autograd profilers may not show correct timings: the reported CPU time reports the amount of time used to launch the kernels but does not include the time the kernel spent executing on a GPU unless the operation does a synchronize. Ops that do synchronize appear to be extremely expensive under regular CPU-mode profilers. In these case where timings are incorrect, the CUDA-mode autograd profiler may be helpful.

Note

To decide which (CPU-only-mode or CUDA-mode) autograd profiler output to look at, you should first check if your script is CPU-bound (“CPU total time is much greater than CUDA total time”). If it is CPU-bound, looking at the results of the CPU-mode autograd profiler will help. If on the other hand your script spends most of its time executing on the GPU, then it makes sense to start looking for responsible CUDA operators in the output of the CUDA-mode autograd profiler.

Of course the reality is much more complicated and your script might not be in one of those two extremes depending on the part of the model you’re evaluating. If the profiler outputs don’t help, you could try looking at the result of [`torch.autograd.profiler.emit_nvtx()`](autograd.html#torch.autograd.profiler.emit_nvtx "torch.autograd.profiler.emit_nvtx") with `nvprof`. However, please take into account that the NVTX overhead is very high and often gives a heavily skewed timeline.
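
As one illustration of that workflow — a sketch assuming a CUDA-capable machine; `model` and `x` are hypothetical placeholders:

```py
import torch

model = torch.nn.Linear(128, 64).cuda()   # hypothetical model
x = torch.randn(32, 128, device="cuda")   # hypothetical input

with torch.cuda.profiler.profile():
    model(x)  # warm up the CUDA memory allocator and the profiler
    with torch.autograd.profiler.emit_nvtx():
        # every autograd op in this region emits an NVTX range
        y = model(x)
        y.sum().backward()
```

The script can then be run under, e.g., `nvprof --profile-from-start off python script.py` to collect the NVTX ranges.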

Warning

If you are profiling CUDA code, the first profiler that `bottleneck` runs (cProfile) will include the CUDA startup time (CUDA buffer allocation cost) in its time reporting. This should not matter if your bottlenecks result in code much slower than the CUDA startup time.

For more complicated uses of the profilers (like in a multi-GPU case), please see [https://docs.python.org/3/library/profile.html](https://docs.python.org/3/library/profile.html) or [`torch.autograd.profiler.profile()`](autograd.html#torch.autograd.profiler.profile "torch.autograd.profiler.profile") for more information.
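
For a quick sense of what the autograd profiler reports, a minimal session might look like this (`model` and `x` are hypothetical placeholders):

```py
import torch

model = torch.nn.Linear(128, 64)   # hypothetical model
x = torch.randn(32, 128)           # hypothetical input

# use_cuda=True would also record CUDA kernel times
with torch.autograd.profiler.profile() as prof:
    y = model(x)
    y.sum().backward()

# aggregate timings per operator name, slowest first
print(prof.key_averages().table(sort_by="cpu_time_total"))
```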

53 changes: 28 additions & 25 deletions docs/1.0/checkpoint.md
@@ -1,63 +1,66 @@


# torch.utils.checkpoint
> Translator: [belonHan](https://github.com/belonHan)

Note

Checkpointing is implemented by rerunning a forward-pass segment for each checkpointed segment during backward. This can cause persistent states like the RNG state to be advanced than they would without checkpointing. By default, checkpointing includes logic to juggle the RNG state such that checkpointed passes making use of RNG (through dropout for example) have deterministic output as compared to non-checkpointed passes. The logic to stash and restore RNG states can incur a moderate performance hit depending on the runtime of checkpointed operations. If deterministic output compared to non-checkpointed passes is not required, set the global flag `torch.utils.checkpoint.preserve_rng_state=False` to omit stashing and restoring the RNG state during each checkpoint.
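
As a concrete illustration of that flag — a sketch; the trade-off only matters if your checkpointed segments use RNG, e.g. through dropout:

```py
import torch.utils.checkpoint

# Trade determinism for speed: skip stashing/restoring the RNG state,
# so RNG-dependent ops may differ from a non-checkpointed run.
torch.utils.checkpoint.preserve_rng_state = False
```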

```py
torch.utils.checkpoint.checkpoint(function, *args)
```

Checkpoint a model or part of the model

Checkpointing works by trading compute for memory. Rather than storing all intermediate activations of the entire computation graph for computing backward, the checkpointed part does **not** save intermediate activations, and instead recomputes them in backward pass. It can be applied on any part of a model.

Specifically, in the forward pass, `function` will run in `torch.no_grad()` manner, i.e., not storing the intermediate activations. Instead, the forward pass saves the inputs tuple and the `function` parameter. In the backwards pass, the saved inputs and `function` are retrieved, and the forward pass is computed on `function` again, now tracking the intermediate activations, and then the gradients are calculated using these activation values.
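
A minimal usage sketch of `checkpoint()`, with hypothetical stages; only the first stage is checkpointed:

```py
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

stage1 = nn.Sequential(nn.Linear(100, 100), nn.ReLU())  # hypothetical stages
stage2 = nn.Linear(100, 10)

x = torch.randn(4, 100, requires_grad=True)

h = checkpoint(stage1, x)   # stage1's activations are recomputed in backward
out = stage2(h).sum()
out.backward()              # backward(), not torch.autograd.grad()
```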

Warning

Checkpointing doesn’t work with [`torch.autograd.grad()`](autograd.html#torch.autograd.grad "torch.autograd.grad"), but only with [`torch.autograd.backward()`](autograd.html#torch.autograd.backward "torch.autograd.backward").

Warning

If `function` invocation during backward does anything different than the one during forward, e.g., due to some global variable, the checkpointed version won’t be equivalent, and unfortunately it can’t be detected.

Parameters:

* **function** – describes what to run in the forward pass of the model or part of the model. It should also know how to handle the inputs passed as the tuple. For example, in LSTM, if user passes `(activation, hidden)`, `function` should correctly use the first input as `activation` and the second input as `hidden`
* **args** – tuple containing inputs to the `function`

| Returns: | Output of running `function` on `*args` |
| --- | --- |

```py
torch.utils.checkpoint.checkpoint_sequential(functions, segments, *inputs)
```

A helper function for checkpointing sequential models.

Sequential models execute a list of modules/functions in order (sequentially). Therefore, we can divide such a model in various segments and checkpoint each segment. All segments except the last will run in `torch.no_grad()` manner, i.e., not storing the intermediate activations. The inputs of each checkpointed segment will be saved for re-running the segment in the backward pass.

See [`checkpoint()`](#torch.utils.checkpoint.checkpoint "torch.utils.checkpoint.checkpoint") on how checkpointing works.

Warning

Checkpointing doesn’t work with [`torch.autograd.grad()`](autograd.html#torch.autograd.grad "torch.autograd.grad"), but only with [`torch.autograd.backward()`](autograd.html#torch.autograd.backward "torch.autograd.backward").

Parameters:

* **functions** – A [`torch.nn.Sequential`](nn.html#torch.nn.Sequential "torch.nn.Sequential") or the list of modules or functions (comprising the model) to run sequentially.
* **segments** – Number of chunks to create in the model
* **inputs** – tuple of Tensors that are inputs to `functions`


| Returns: | Output of running `functions` sequentially on `*inputs` |
| --- | --- |

Example

```py
>>> model = nn.Sequential(...)
```
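
In the same spirit, a self-contained sketch of `checkpoint_sequential` with a hypothetical model (layer sizes and the segment count are arbitrary):

```py
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# hypothetical deep sequential model, split into two checkpointed segments
model = nn.Sequential(
    nn.Linear(100, 100), nn.ReLU(),
    nn.Linear(100, 100), nn.ReLU(),
    nn.Linear(100, 10),
)
inputs = torch.randn(4, 100, requires_grad=True)

out = checkpoint_sequential(model, 2, inputs)
out.sum().backward()
```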
