diff --git a/beginner_source/Intro_to_TorchScript_tutorial.py b/beginner_source/Intro_to_TorchScript_tutorial.py
index d9b0023a6..e0c551b22 100644
--- a/beginner_source/Intro_to_TorchScript_tutorial.py
+++ b/beginner_source/Intro_to_TorchScript_tutorial.py
@@ -1,33 +1,31 @@
 """
-Introduction to TorchScript
+Introduction to TorchScript
 ===========================
 
 *James Reed (jamesreed@fb.com), Michael Suo (suo@fb.com)*, rev2
 
-This tutorial is an introduction to TorchScript, an intermediate
-representation of a PyTorch model (subclass of ``nn.Module``) that
-can then be run in a high-performance environment such as C++.
+This tutorial is an introduction to TorchScript, an intermediate
+representation of a PyTorch model (subclass of ``nn.Module``) that
+can then be run in a high-performance environment such as C++.
 
-In this tutorial we will cover:
+In this tutorial we will cover:
 
-1. The basics of model authoring in PyTorch, including:
+1. The basics of model authoring in PyTorch, including:
 
--  Modules
--  Defining ``forward`` functions
--  Composing modules into a hierarchy of modules
+-  Modules
+-  Defining ``forward`` functions
+-  Composing modules into a hierarchy of modules
 
-2. Specific methods for converting PyTorch modules to TorchScript, our
-   high-performance deployment runtime
+2. Specific methods for converting PyTorch modules to TorchScript, our high-performance deployment runtime
 
--  Tracing an existing module
--  Using scripting to directly compile a module
--  How to compose both approaches
--  Saving and loading TorchScript modules
+-  Tracing an existing module
+-  Using scripting to directly compile a module
+-  How to compose both approaches
+-  Saving and loading TorchScript modules
 
-We hope that after you complete this tutorial, you will proceed to go through
-`the follow-on tutorial <https://pytorch.org/tutorials/advanced/cpp_export.html>`_
-which will walk you through an example of actually calling a TorchScript
-model from C++.
+After completing this tutorial, we hope you will continue with
+`the follow-on tutorial <https://pytorch.org/tutorials/advanced/cpp_export.html>`_,
+which walks through an example of actually calling a TorchScript model from C++.
 
 """
 
@@ -36,19 +34,18 @@
 
 
 ######################################################################
-# Basics of PyTorch Model Authoring
+# Basics of PyTorch Model Authoring
 # ---------------------------------
 #
-# Let’s start out be defining a simple ``Module``. A ``Module`` is the
-# basic unit of composition in PyTorch. It contains:
+# Let's start out by defining a simple ``Module``. A ``Module`` is the
+# basic unit of composition in PyTorch. It contains:
 #
-# 1. A constructor, which prepares the module for invocation
-# 2. A set of ``Parameters`` and sub-\ ``Modules``. These are initialized
-#    by the constructor and can be used by the module during invocation.
-# 3. A ``forward`` function. This is the code that is run when the module
-#    is invoked.
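+# 1. A constructor, which prepares the module for invocation.
+# 2. A set of ``Parameters`` and sub-\ ``Modules``. These are initialized
+#    by the constructor and can be used by the module during invocation.
+# 3. A ``forward`` function. This is the code that runs when the module is invoked.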
 #
-# Let’s examine a small example:
+# Let's examine a small example:
 #
 
 class MyCell(torch.nn.Module):
@@ -66,22 +63,21 @@ def forward(self, x, h):
 
 
 ######################################################################
-# So we’ve:
+# So we've:
 #
-# 1. Created a class that subclasses ``torch.nn.Module``.
-# 2. Defined a constructor. The constructor doesn’t do much, just calls
-#    the constructor for ``super``.
-# 3. Defined a ``forward`` function, which takes two inputs and returns
-#    two outputs. The actual contents of the ``forward`` function are not
-#    really important, but it’s sort of a fake `RNN
-#    cell <https://colah.github.io/posts/2015-08-Understanding-LSTMs/>`__–that
-#    is–it’s a function that is applied on a loop.
+# 1. Created a class that subclasses ``torch.nn.Module``.
+# 2. Defined a constructor. The constructor doesn't do much; it just calls
+#    the constructor for ``super``.
+# 3. Defined a ``forward`` function, which takes two inputs and returns two outputs.
+#    The actual contents of the ``forward`` function are not really important, but
+#    it is a sort of fake `RNN cell <https://colah.github.io/posts/2015-08-Understanding-LSTMs/>`__,
+#    that is, a function that is applied in a loop.
 #
-# We instantiated the module, and made ``x`` and ``y``, which are just 3x4
-# matrices of random values. Then we invoked the cell with
-# ``my_cell(x, h)``. This in turn calls our ``forward`` function.
+# We instantiated the module and made ``x`` and ``h``, which are just 3x4 matrices
+# of random values. Then we invoked the cell with ``my_cell(x, h)``. This in turn
+# calls our ``forward`` function.
 #
-# Let’s do something a little more interesting:
+# Let's do something a little more interesting:
 #
 
 class MyCell(torch.nn.Module):
@@ -99,29 +95,27 @@ def forward(self, x, h):
 
 
 ######################################################################
-# We’ve redefined our module ``MyCell``, but this time we’ve added a
-# ``self.linear`` attribute, and we invoke ``self.linear`` in the forward
-# function.
+# We've redefined our module ``MyCell``, but this time we've added a
+# ``self.linear`` attribute, and we invoke ``self.linear`` in the ``forward`` function.
 #
-# What exactly is happening here? ``torch.nn.Linear`` is a ``Module`` from
-# the PyTorch standard library. Just like ``MyCell``, it can be invoked
-# using the call syntax. We are building a hierarchy of ``Module``\ s.
+# What exactly is happening here? ``torch.nn.Linear`` is a ``Module`` from
+# the PyTorch standard library. Just like ``MyCell``, it can be invoked
+# using the call syntax. We are building a hierarchy of ``Module``\ s.
 #
-# ``print`` on a ``Module`` will give a visual representation of the
-# ``Module``\ ’s subclass hierarchy. In our example, we can see our
-# ``Linear`` subclass and its parameters.
+# ``print`` on a ``Module`` gives a visual representation of the ``Module``'s
+# submodule hierarchy. In our example, we can see our ``Linear`` submodule
+# and its parameters.
 #
-# By composing ``Module``\ s in this way, we can succintly and readably
-# author models with reusable components.
+# By composing ``Module``\ s in this way, we can succinctly and readably
+# author models with reusable components.
 #
-# You may have noticed ``grad_fn`` on the outputs. This is a detail of
-# PyTorch’s method of automatic differentiation, called
-# `autograd <https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html>`__.
-# In short, this system allows us to compute derivatives through
-# potentially complex programs. The design allows for a massive amount of
-# flexibility in model authoring.
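+# You may have noticed ``grad_fn`` on the outputs. This is a detail of
+# PyTorch's method of automatic differentiation, called
+# `autograd <https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html>`__.
+# In short, this system allows us to compute derivatives through potentially
+# complex programs. The design allows for a great deal of flexibility in model authoring.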
 #
-# Now let’s examine said flexibility:
+# Now let's examine that flexibility:
 #
 
 class MyDecisionGate(torch.nn.Module):
@@ -147,35 +141,34 @@ def forward(self, x, h):
 
 
 ######################################################################
-# We’ve once again redefined our MyCell class, but here we’ve defined
-# ``MyDecisionGate``. This module utilizes **control flow**. Control flow
-# consists of things like loops and ``if``-statements.
+# We've once again redefined our ``MyCell`` class, but here we've also defined
+# ``MyDecisionGate``. This module utilizes **control flow**. Control flow
+# consists of things like loops and ``if``-statements.
 #
-# Many frameworks take the approach of computing symbolic derivatives
-# given a full program representation. However, in PyTorch, we use a
-# gradient tape. We record operations as they occur, and replay them
-# backwards in computing derivatives. In this way, the framework does not
-# have to explicitly define derivatives for all constructs in the
-# language.
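+# Many frameworks take the approach of computing symbolic derivatives
+# given a full program representation. However, in PyTorch, we use a
+# gradient tape. We record operations as they occur and replay them
+# backwards when computing derivatives. In this way, the framework does not
+# have to explicitly define derivatives for all constructs in the language.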
 #
 # .. figure:: https://github.com/pytorch/pytorch/raw/master/docs/source/_static/img/dynamic_graph.gif
-#    :alt: How autograd works
+#    :alt: How autograd works
 #
-#    How autograd works
+#    How autograd works
 #
 
 
 ######################################################################
-# Basics of TorchScript
+# Basics of TorchScript
 # ---------------------
 #
-# Now let’s take our running example and see how we can apply TorchScript.
+# Now let's take our running example and see how we can apply TorchScript.
 #
-# In short, TorchScript provides tools to capture the definition of your
-# model, even in light of the flexible and dynamic nature of PyTorch.
-# Let’s begin by examining what we call **tracing**.
+# In short, TorchScript provides tools to capture the definition of your
+# model, even in light of the flexible and dynamic nature of PyTorch.
+# Let's begin by examining what we call **tracing**.
 #
-# Tracing ``Modules``
+# Tracing ``Modules``
 # ~~~~~~~~~~~~~~~~~~~
 #
 
@@ -196,51 +189,44 @@ def forward(self, x, h):
 
 
 ######################################################################
-# We’ve rewinded a bit and taken the second version of our ``MyCell``
-# class. As before, we’ve instantiated it, but this time, we’ve called
-# ``torch.jit.trace``, passed in the ``Module``, and passed in *example
-# inputs* the network might see.
+# We've rewound a bit and taken the second version of our ``MyCell`` class.
+# As before, we've instantiated it, but this time we've called ``torch.jit.trace``,
+# passed in the ``Module``, and passed in *example inputs* the network might see.
 #
-# What exactly has this done? It has invoked the ``Module``, recorded the
-# operations that occured when the ``Module`` was run, and created an
-# instance of ``torch.jit.ScriptModule`` (of which ``TracedModule`` is an
-# instance)
+# What exactly has this done? It has invoked the ``Module``, recorded the operations
+# that occurred when the ``Module`` was run, and created an instance of
+# ``torch.jit.ScriptModule`` (``TracedModule`` is a subclass of ``ScriptModule``).
 #
-# TorchScript records its definitions in an Intermediate Representation
-# (or IR), commonly referred to in Deep learning as a *graph*. We can
-# examine the graph with the ``.graph`` property:
+# TorchScript records its definitions in an Intermediate Representation (or IR), commonly
+# referred to in deep learning as a *graph*. We can examine the graph with the ``.graph`` property:
 #
 
 print(traced_cell.graph)
 
 
 ######################################################################
-# However, this is a very low-level representation and most of the
-# information contained in the graph is not useful for end users. Instead,
-# we can use the ``.code`` property to give a Python-syntax interpretation
-# of the code:
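+# However, this is a very low-level representation, and most of the information
+# contained in the graph is not useful for end users. Instead, we can use the
+# ``.code`` property to give a Python-syntax interpretation of the code: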
 #
 
 print(traced_cell.code)
 
 
 ######################################################################
-# So **why** did we do all this? There are several reasons:
+# So **why** did we do all this? There are several reasons:
 #
-# 1. TorchScript code can be invoked in its own interpreter, which is
-#    basically a restricted Python interpreter. This interpreter does not
-#    acquire the Global Interpreter Lock, and so many requests can be
-#    processed on the same instance simultaneously.
-# 2. This format allows us to save the whole model to disk and load it
-#    into another environment, such as in a server written in a language
-#    other than Python
-# 3. TorchScript gives us a representation in which we can do compiler
-#    optimizations on the code to provide more efficient execution
-# 4. TorchScript allows us to interface with many backend/device runtimes
-#    that require a broader view of the program than individual operators.
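+# 1. TorchScript code can be invoked in its own interpreter, which is basically a
+#    restricted Python interpreter. This interpreter does not acquire the Global
+#    Interpreter Lock, so many requests can be processed on the same instance simultaneously.
+# 2. This format allows us to save the whole model to disk and load it into another
+#    environment, such as a server written in a language other than Python.
+# 3. TorchScript gives us a representation in which we can do compiler
+#    optimizations on the code to provide more efficient execution.
+# 4. TorchScript allows us to interface with many backend/device runtimes
+#    that require a broader view of the program than individual operators.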
 #
-# We can see that invoking ``traced_cell`` produces the same results as
-# the Python module:
+# We can see that invoking ``traced_cell`` produces the same results as the Python module:
 #
 
 print(my_cell(x, h))
@@ -248,11 +234,11 @@ def forward(self, x, h):
 
 
 ######################################################################
-# Using Scripting to Convert Modules
+# Using Scripting to Convert Modules
 # ----------------------------------
 #
-# There’s a reason we used version two of our module, and not the one with
-# the control-flow-laden submodule. Let’s examine that now:
+# There's a reason we used version two of our module, and not the one with
+# the control-flow-laden submodule. Let's examine that now:
 #
 
 class MyDecisionGate(torch.nn.Module):
@@ -278,16 +264,14 @@ def forward(self, x, h):
 
 
 ######################################################################
-# Looking at the ``.code`` output, we can see that the ``if-else`` branch
-# is nowhere to be found! Why? Tracing does exactly what we said it would:
-# run the code, record the operations *that happen* and construct a
-# ScriptModule that does exactly that. Unfortunately, things like control
-# flow are erased.
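+# Looking at the ``.code`` output, we can see that the ``if-else`` branch is nowhere
+# to be found! Why? Tracing does exactly what we said it would: run the code, record
+# the operations *that happen*, and construct a ``ScriptModule`` that does exactly that.
+# Unfortunately, things like control flow are erased.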
 #
-# How can we faithfully represent this module in TorchScript? We provide a
-# **script compiler**, which does direct analysis of your Python source
-# code to transform it into TorchScript. Let’s convert ``MyDecisionGate``
-# using the script compiler:
+# How can we faithfully represent this module in TorchScript? We provide a
+# **script compiler**, which directly analyzes your Python source code to transform it
+# into TorchScript. Let's convert ``MyDecisionGate`` using the script compiler:
 #
 
 scripted_gate = torch.jit.script(MyDecisionGate())
@@ -298,27 +282,26 @@ def forward(self, x, h):
 
 
 ######################################################################
-# Hooray! We’ve now faithfully captured the behavior of our program in
-# TorchScript. Let’s now try running the program:
+# Hooray! We've now faithfully captured the behavior of our program in
+# TorchScript. Let's now try running the program:
 #
 
-# New inputs
+# New inputs
 x, h = torch.rand(3, 4), torch.rand(3, 4)
 traced_cell(x, h)
 
 
 ######################################################################
-# Mixing Scripting and Tracing
+# Mixing Scripting and Tracing
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 #
-# Some situations call for using tracing rather than scripting (e.g. a
-# module has many architectural decisions that are made based on constant
-# Python values that we would like to not appear in TorchScript). In this
-# case, scripting can be composed with tracing: ``torch.jit.script`` will
-# inline the code for a traced module, and tracing will inline the code
-# for a scripted module.
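+# Some situations call for using tracing rather than scripting (e.g. a module has
+# many architectural decisions that are made based on constant Python values that we
+# would like to not appear in TorchScript). In this case, scripting can be composed
+# with tracing: ``torch.jit.script`` will inline the code for a traced module, and
+# tracing will inline the code for a scripted module.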
 #
-# An example of the first case:
+# An example of the first case:
 #
 
 class MyRNNLoop(torch.nn.Module):
@@ -338,7 +321,7 @@ def forward(self, xs):
 
 
 ######################################################################
-# And an example of the second case:
+# And an example of the second case:
 #
 
 class WrapRNN(torch.nn.Module):
@@ -355,17 +338,15 @@ def forward(self, xs):
 
 
 ######################################################################
-# This way, scripting and tracing can be used when the situation calls for
-# each of them and used together.
+# This way, scripting and tracing can each be used when the situation calls for it, and they can be used together.
 #
-# Saving and Loading models
+# Saving and Loading models
 # -------------------------
 #
-# We provide APIs to save and load TorchScript modules to/from disk in an
-# archive format. This format includes code, parameters, attributes, and
-# debug information, meaning that the archive is a freestanding
-# representation of the model that can be loaded in an entirely separate
-# process. Let’s save and load our wrapped RNN module:
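+# We provide APIs to save and load TorchScript modules to/from disk in an archive
+# format. This format includes code, parameters, attributes, and debug information,
+# meaning that the archive is a freestanding representation of the model that can be
+# loaded in an entirely separate process. Let's save and load our wrapped RNN module: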
 #
 
 traced.save('wrapped_rnn.zip')
@@ -377,17 +358,14 @@ def forward(self, xs):
 
 
 ######################################################################
-# As you can see, serialization preserves the module hierarchy and the
-# code we’ve been examining throughout. The model can also be loaded, for
-# example, `into
-# C++ <https://pytorch.org/tutorials/advanced/cpp_export.html>`__ for
-# python-free execution.
+# As you can see, serialization preserves the module hierarchy and the code
+# we've been examining throughout. The model can also be loaded, for example,
+# `into C++ <https://pytorch.org/tutorials/advanced/cpp_export.html>`__ for
+# Python-free execution.
 #
-# Further Reading
+# Further Reading
 # ~~~~~~~~~~~~~~~
-#
-# We’ve completed our tutorial! For a more involved demonstration, check
-# out the NeurIPS demo for converting machine translation models using
-# TorchScript:
+# We've completed our tutorial! For a more involved demonstration, check out the
+# NeurIPS demo for converting machine translation models using TorchScript:
 # https://colab.research.google.com/drive/1HiICg6jRkBnr5hvK2-VnMi88Vi9pUzEJ
 #
diff --git a/intermediate_source/dist_tuto.rst b/intermediate_source/dist_tuto.rst
index 3a76bb1dd..b27457b26 100644
--- a/intermediate_source/dist_tuto.rst
+++ b/intermediate_source/dist_tuto.rst
@@ -1,11 +1,10 @@
-Writing Distributed Applications with PyTorch
+Writing Distributed Applications with PyTorch
 =============================================
 **Author**: `Séb Arnold <https://seba1511.com>`_
+  **Translation**: `황성수 <https://github.com/adonisues>`_
 
-In this short tutorial, we will be going over the distributed package
-of PyTorch. We'll see how to set up the distributed setting, use the
-different communication strategies, and go over some the internals of
-the package.
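+In this short tutorial, we will go over the distributed package of PyTorch. We'll see how to set up
+the distributed setting, use the different communication strategies, and go over some of the internals of the package.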
 
 Setup
 -----
@@ -17,23 +16,21 @@ Setup
    * variables and init_process_group
    -->
 
-The distributed package included in PyTorch (i.e.,
-``torch.distributed``) enables researchers and practitioners to easily
-parallelize their computations across processes and clusters of
-machines. To do so, it leverages messaging passing semantics
-allowing each process to communicate data to any of the other processes.
-As opposed to the multiprocessing (``torch.multiprocessing``) package,
-processes can use different communication backends and are not
-restricted to being executed on the same machine.
-
-In order to get started we need the ability to run multiple processes
-simultaneously. If you have access to compute cluster you should check
-with your local sysadmin or use your favorite coordination tool. (e.g.,
-`pdsh <https://linux.die.net/man/1/pdsh>`__,
-`clustershell <https://cea-hpc.github.io/clustershell/>`__, or
-`others <https://slurm.schedmd.com/>`__) For the purpose of this
-tutorial, we will use a single machine and fork multiple processes using
-the following template.
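+The distributed package included in PyTorch (i.e., ``torch.distributed``) enables
+researchers and practitioners to easily parallelize their computations across
+processes and clusters of machines. To do so, it leverages message passing semantics,
+allowing each process to communicate data to any of the other processes. As opposed
+to the multiprocessing (``torch.multiprocessing``) package, processes can use different
+communication backends and are not restricted to being executed on the same machine.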
+
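+In order to get started, we need the ability to run multiple processes
+simultaneously. If you have access to a compute cluster, you should check with
+your local sysadmin or use your favorite coordination tool
+(e.g.,
+`pdsh <https://linux.die.net/man/1/pdsh>`__,
+`clustershell <https://cea-hpc.github.io/clustershell/>`__, or
+`others <https://slurm.schedmd.com/>`__). For the purpose of this tutorial, we will
+use a single machine and fork multiple processes using the following template.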
 
 .. code:: python
 
@@ -67,35 +64,30 @@ the following template.
         for p in processes:
             p.join()
 
-The above script spawns two processes who will each setup the
-distributed environment, initialize the process group
-(``dist.init_process_group``), and finally execute the given ``run``
-function.
+The above script spawns two processes, each of which sets up the distributed
+environment, initializes the process group (``dist.init_process_group``), and
+finally executes the given ``run`` function.
 
-Let's have a look at the ``init_process`` function. It ensures that
-every process will be able to coordinate through a master, using the
-same ip address and port. Note that we used the ``gloo`` backend but
-other backends are available. (c.f.
-`Section 5.1 <#communication-backends>`__) We will go over the magic
-happening in ``dist.init_process_group`` at the end of this tutorial,
-but it essentially allows processes to communicate with each other by
-sharing their locations.
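+Let's have a look at the ``init_process`` function. It ensures that every process will be
+able to coordinate through a master, using the same IP address and port. Note that we used
+the ``gloo`` backend, but other backends are available (c.f. `Section 5.1 <#communication-backends>`__).
+We will go over the magic happening in ``dist.init_process_group`` at the end of this tutorial,
+but it essentially allows processes to communicate with each other by sharing their locations.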
 
-Point-to-Point Communication
-----------------------------
+Point-to-Point Communication
+----------------------------
 
 .. figure:: /_static/img/distributed/send_recv.png
    :width: 100%
    :align: center
    :alt: Send and Recv
 
-   Send and Recv
+   Send and Recv
 
+A transfer of data from one process to another is called a point-to-point communication.
+These are achieved through the ``send`` and ``recv`` functions or their *immediate*
+counterparts, ``isend`` and ``irecv``.
 
-A transfer of data from one process to another is called a
-point-to-point communication. These are achieved through the ``send``
-and ``recv`` functions or their *immediate* counter-parts, ``isend`` and
-``irecv``.
 
 .. code:: python
 
@@ -112,16 +104,14 @@ and ``recv`` functions or their *immediate* counter-parts, ``isend`` and
             dist.recv(tensor=tensor, src=0)
         print('Rank ', rank, ' has data ', tensor[0])
 
-In the above example, both processes start with a zero tensor, then
-process 0 increments the tensor and sends it to process 1 so that they
-both end up with 1.0. Notice that process 1 needs to allocate memory in
-order to store the data it will receive.
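+In the above example, both processes start with a zero tensor, then process 0
+increments the tensor and sends it to process 1 so that they both end up with 1.0.
+Notice that process 1 needs to allocate memory in order to store the data it will receive.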
 
-Also notice that ``send``/``recv`` are **blocking**: both processes stop
-until the communication is completed. On the other hand immediates are
-**non-blocking**; the script continues its execution and the methods
-return a ``Work`` object upon which we can choose to
-``wait()``.
+Also notice that ``send``/``recv`` are **blocking**: both processes stop until the
+communication is completed. On the other hand, immediates (``isend`` and ``irecv``) are
+**non-blocking**; the script continues its execution and the methods return a ``Work``
+object upon which we can choose to ``wait()``.
 
 .. code:: python
 
@@ -142,28 +132,28 @@ return a ``Work`` object upon which we can choose to
         req.wait()
         print('Rank ', rank, ' has data ', tensor[0])
 
-When using immediates we have to be careful about with our usage of the sent and received tensors.
-Since we do not know when the data will be communicated to the other process,
-we should not modify the sent tensor nor access the received tensor before ``req.wait()`` has completed.
-In other words,
 
--  writing to ``tensor`` after ``dist.isend()`` will result in undefined behaviour.
--  reading from ``tensor`` after ``dist.irecv()`` will result in undefined behaviour.
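+When using immediates, we have to be careful with how we use the sent and received tensors.
+Since we do not know when the data will be communicated to the other process, we should not
+modify the sent tensor nor access the received tensor before ``req.wait()`` has completed.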
 
-However, after ``req.wait()``
-has been executed we are guaranteed that the communication took place,
-and that the value stored in ``tensor[0]`` is 1.0.
+In other words,
 
-Point-to-point communication is useful when we want a fine-grained
-control over the communication of our processes. They can be used to
-implement fancy algorithms, such as the one used in `Baidu's
-DeepSpeech <https://github.com/baidu-research/baidu-allreduce>`__ or
-`Facebook's large-scale
-experiments <https://research.fb.com/publications/imagenet1kin1h/>`__.(c.f.
-`Section 4.1 <#our-own-ring-allreduce>`__)
+- writing to ``tensor`` after ``dist.isend()`` will result in undefined behaviour.
+- reading from ``tensor`` after ``dist.irecv()`` will result in undefined behaviour.
 
-Collective Communication
-------------------------
+However, after ``req.wait()`` has been executed, we are guaranteed that the
+communication took place and that the value stored in ``tensor[0]`` is 1.0.
+
+Point-to-point communication is useful when we want fine-grained control over the
+communication of our processes. It can be used to implement fancy algorithms, such as
+the one used in `Baidu's DeepSpeech <https://github.com/baidu-research/baidu-allreduce>`__ or
+`Facebook's large-scale experiments <https://research.fb.com/publications/imagenet1kin1h/>`__
+(c.f. `Section 4.1 <#our-own-ring-allreduce>`__).
+
+
+Collective Communication
+------------------------
 
 +----------------------------------------------------+-----------------------------------------------------+
 | .. figure:: /_static/img/distributed/scatter.png   | .. figure:: /_static/img/distributed/gather.png     |
@@ -189,14 +179,13 @@ Collective Communication
 +----------------------------------------------------+-----------------------------------------------------+
 
 
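+As opposed to point-to-point communication, collectives allow for communication
+patterns across all processes in a **group**. A group is a subset of all our processes.
+To create a group, we can pass a list of ranks to ``dist.new_group(group)``.
+By default, collectives are executed on all processes, also known as the **world**.
+For example, in order to obtain the sum of all tensors on all processes, we can use
+the ``dist.all_reduce(tensor, op, group)`` collective.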
 
-As opposed to point-to-point communcation, collectives allow for
-communication patterns across all processes in a **group**. A group is a
-subset of all our processes. To create a group, we can pass a list of
-ranks to ``dist.new_group(group)``. By default, collectives are executed
-on the all processes, also known as the **world**. For example, in order
-to obtain the sum of all tensors at all processes, we can use the
-``dist.all_reduce(tensor, op, group)`` collective.
 
 .. code:: python
 
@@ -208,37 +197,34 @@ to obtain the sum of all tensors at all processes, we can use the
         dist.all_reduce(tensor, op=dist.reduce_op.SUM, group=group)
         print('Rank ', rank, ' has data ', tensor[0])
 
-Since we want the sum of all tensors in the group, we use
-``dist.reduce_op.SUM`` as the reduce operator. Generally speaking, any
-commutative mathematical operation can be used as an operator.
-Out-of-the-box, PyTorch comes with 4 such operators, all working at the
-element-wise level:
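+Since we want the sum of all tensors in the group, we use ``dist.reduce_op.SUM`` as the
+reduce operator. Generally speaking, any commutative mathematical operation can be used as
+an operator. Out of the box, PyTorch comes with 4 such operators, all working element-wise: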
 
 -  ``dist.reduce_op.SUM``,
 -  ``dist.reduce_op.PRODUCT``,
 -  ``dist.reduce_op.MAX``,
 -  ``dist.reduce_op.MIN``.
 
-In addition to ``dist.all_reduce(tensor, op, group)``, there are a total
-of 6 collectives currently implemented in PyTorch.
-
--  ``dist.broadcast(tensor, src, group)``: Copies ``tensor`` from
-   ``src`` to all other processes.
--  ``dist.reduce(tensor, dst, op, group)``: Applies ``op`` to all
-   ``tensor`` and stores the result in ``dst``.
--  ``dist.all_reduce(tensor, op, group)``: Same as reduce, but the
-   result is stored in all processes.
--  ``dist.scatter(tensor, src, scatter_list, group)``: Copies the
-   :math:`i^{\text{th}}` tensor ``scatter_list[i]`` to the
-   :math:`i^{\text{th}}` process.
--  ``dist.gather(tensor, dst, gather_list, group)``: Copies ``tensor``
-   from all processes in ``dst``.
--  ``dist.all_gather(tensor_list, tensor, group)``: Copies ``tensor``
-   from all processes to ``tensor_list``, on all processes.
--  ``dist.barrier(group)``: block all processes in `group` until each one has entered this function.
-
-Distributed Training
---------------------
+In addition to ``dist.all_reduce(tensor, op, group)``, several other collectives are
+currently implemented in PyTorch. The full set covered here is:
+
+-  ``dist.broadcast(tensor, src, group)``: Copies ``tensor`` from ``src`` to all
+   other processes.
+-  ``dist.reduce(tensor, dst, op, group)``: Applies ``op`` to all ``tensor`` and
+   stores the result in ``dst``.
+-  ``dist.all_reduce(tensor, op, group)``: Same as reduce, but the result is stored
+   in all processes.
+-  ``dist.scatter(tensor, src, scatter_list, group)``: Copies the :math:`i^{\text{th}}`
+   tensor ``scatter_list[i]`` to the :math:`i^{\text{th}}` process.
+-  ``dist.gather(tensor, dst, gather_list, group)``: Copies ``tensor`` from all
+   processes to ``dst``.
+-  ``dist.all_gather(tensor_list, tensor, group)``: Copies ``tensor`` from all
+   processes to ``tensor_list``, on all processes.
+-  ``dist.barrier(group)``: Blocks all processes in ``group`` until each one has entered this function.
+
+Distributed Training
+--------------------
 
 .. raw:: html
 
@@ -250,25 +236,22 @@ Distributed Training
    TODO: Custom ring-allreduce
    -->
 
-**Note:** You can find the example script of this section in `this
-GitHub repository <https://github.com/seba-1511/dist_tuto.pth/>`__.
-
-Now that we understand how the distributed module works, let us write
-something useful with it. Our goal will be to replicate the
-functionality of
-`DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__.
-Of course, this will be a didactic example and in a real-world
-situation you should use the official, well-tested and well-optimized
-version linked above.
-
-Quite simply we want to implement a distributed version of stochastic
-gradient descent. Our script will let all processes compute the
-gradients of their model on their batch of data and then average their
-gradients. In order to ensure similar convergence results when changing
-the number of processes, we will first have to partition our dataset.
-(You could also use
-`tnt.dataset.SplitDataset <https://github.com/pytorch/tnt/blob/master/torchnet/dataset/splitdataset.py#L4>`__,
-instead of the snippet below.)
+**Note:** You can find the example script of this section in
+`this GitHub repository
+<https://github.com/seba-1511/dist_tuto.pth/>`__.
+
+
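+Now that we understand how the distributed module works, let us write something useful with it.
+Our goal will be to replicate the functionality of `DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__.
+Of course, this will be a didactic example, and in a real-world situation you should
+use the official, well-tested, and well-optimized version linked above.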
+
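+Quite simply, we want to implement a distributed version of stochastic gradient descent.
+Our script will let all processes compute the gradients of their model on their batch of
+data and then average their gradients. In order to ensure similar convergence results when
+changing the number of processes, we will first have to partition our dataset. (You could also use
+`tnt.dataset.SplitDataset <https://github.com/pytorch/tnt/blob/master/torchnet/dataset/splitdataset.py#L4>`__
+instead of the snippet below.)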
 
 .. code:: python
 
@@ -306,8 +289,7 @@ instead of the snippet below.)
         def use(self, partition):
             return Partition(self.data, self.partitions[partition])
 
-With the above snippet, we can now simply partition any dataset using
-the following few lines:
+With the above snippet, we can now simply partition any dataset using the following few lines:
 
 .. code:: python
 
@@ -328,14 +310,14 @@ the following few lines:
                                              shuffle=True)
         return train_set, bsz
 
-Assuming we have 2 replicas, then each process will have a ``train_set``
-of 60000 / 2 = 30000 samples. We also divide the batch size by the
-number of replicas in order to maintain the *overall* batch size of 128.
+Assuming we have 2 replicas, then each process will have a ``train_set`` of
+60000 / 2 = 30000 samples. We also divide the batch size by the number of
+replicas in order to maintain the *overall* batch size of 128.
 
-We can now write our usual forward-backward-optimize training code, and
-add a function call to average the gradients of our models. (The
-following is largely inspired from the official `PyTorch MNIST
-example <https://github.com/pytorch/examples/blob/master/mnist/main.py>`__.)
+We can now write our usual forward-backward-optimize training code and add a
+function call to average the gradients of our models. (The following is largely
+inspired by the official
+`PyTorch MNIST example <https://github.com/pytorch/examples/blob/master/mnist/main.py>`__.)
 
 .. code:: python
 
@@ -361,9 +343,8 @@ example <https://github.com/pytorch/examples/blob/master/mnist/main.py>`__.)
             print('Rank ', dist.get_rank(), ', epoch ',
                   epoch, ': ', epoch_loss / num_batches)
 
-It remains to implement the ``average_gradients(model)`` function, which
-simply takes in a model and averages its gradients across the whole
-world.
+It remains to implement the ``average_gradients(model)`` function, which simply takes
+in a model and averages its gradients across the whole world.
 
 .. code:: python
 
@@ -374,21 +355,20 @@ world.
             dist.all_reduce(param.grad.data, op=dist.reduce_op.SUM)
             param.grad.data /= size
 
-*Et voilà*! We successfully implemented distributed synchronous SGD and
-could train any model on a large computer cluster.
+*Et voilà*! We successfully implemented distributed synchronous SGD and
+could train any model on a large computer cluster.
+
+**Note:** While the last sentence is *technically* true, there are
+`a lot more tricks <https://seba-1511.github.io/dist_blog>`__ required to build a
+production-level implementation of synchronous SGD. Again, use what
+`has been tested and optimized <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__.
 
-**Note:** While the last sentence is *technically* true, there are `a
-lot more tricks <https://seba-1511.github.io/dist_blog>`__ required to
-implement a production-level implementation of synchronous SGD. Again,
-use what `has been tested and
-optimized <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__.
 
 Our Own Ring-Allreduce
 ~~~~~~~~~~~~~~~~~~~~~~
 
-As an additional challenge, imagine that we wanted to implement
-DeepSpeech's efficient ring allreduce. This is fairly easily implemented
-using point-to-point collectives.
+추가 과제로서 DeepSpeech의 효율적인 ring allreduce를 구현하고 싶다고 상상해보십시오.
+이것은 점대점 집단 통신 (point-to-point collectives)을 사용하여 비교적 쉽게 구현할 수 있습니다.
 
 .. code:: python
 
@@ -418,161 +398,133 @@ using point-to-point collectives.
             send_req.wait()
         recv[:] = accum[:]
 
-In the above script, the ``allreduce(send, recv)`` function has a
-slightly different signature than the ones in PyTorch. It takes a
-``recv`` tensor and will store the sum of all ``send`` tensors in it. As
-an exercise left to the reader, there is still one difference between
-our version and the one in DeepSpeech: their implementation divide the
-gradient tensor into *chunks*, so as to optimally utilize the
-communication bandwidth. (Hint:
+위의 스크립트에서 ``allreduce(send, recv)`` 함수는 PyTorch에 있는 것과 약간 다른
+시그니처(signature)를 가지고 있습니다.
+이 함수는 ``recv`` tensor를 받아서 모든 ``send`` tensor의 합을 거기에 저장합니다.
+독자를 위한 연습 문제로 남겨 두자면, 우리의 버전과 DeepSpeech의 버전 사이에는 아직 한 가지
+차이점이 있습니다: DeepSpeech의 구현은 통신 대역폭을 최적으로 활용하기 위해 변화도 tensor를
+*청크(chunk)* 로 나눕니다. (힌트:
 `torch.chunk <https://pytorch.org/docs/stable/torch.html#torch.chunk>`__)
 
-Advanced Topics
+고급 주제
 ---------------
 
-We are now ready to discover some of the more advanced functionalities
-of ``torch.distributed``. Since there is a lot to cover, this section is
-divided into two subsections:
+이제 ``torch.distributed`` 의 보다 진보된 기능들을 살펴볼 준비가 되었습니다. 다룰
+내용이 많으므로 이 섹션은 두 개의 하위 섹션으로 구분됩니다:
 
-1. Communication Backends: where we learn how to use MPI and Gloo for
-   GPU-GPU communication.
-2. Initialization Methods: where we understand how to best setup the
-   initial coordination phase in ``dist.init_process_group()``.
+1. 통신 백엔드: GPU-GPU 통신을 위해 MPI 및 Gloo를 사용하는 방법을 배웁니다.
+2. 초기화 방법: ``dist.init_process_group()`` 에서 초기 조정(coordination) 단계를 가장 잘
+   설정하는 방법을 이해합니다.
 
-Communication Backends
+통신 백엔드
 ~~~~~~~~~~~~~~~~~~~~~~
 
-One of the most elegant aspects of ``torch.distributed`` is its ability
-to abstract and build on top of different backends. As mentioned before,
-there are currently three backends implemented in PyTorch: Gloo, NCCL, and
-MPI. They each have different specifications and tradeoffs, depending
-on the desired use case. A comparative table of supported functions can
-be found
-`here <https://pytorch.org/docs/stable/distributed.html#module-torch.distributed>`__.
-
-**Gloo Backend**
-
-So far we have made extensive usage of the `Gloo backend <https://github.com/facebookincubator/gloo>`__.
-It is quite handy as a development platform, as it is included in
-the pre-compiled PyTorch binaries and works on both Linux (since 0.2)
-and macOS (since 1.3). It supports all point-to-point and collective
-operations on CPU, and all collective operations on GPU. The
-implementation of the collective operations for CUDA tensors is not as
-optimized as the ones provided by the NCCL backend.
-
-As you have surely noticed, our
-distributed SGD example does not work if you put ``model`` on the GPU.
-In order to use multiple GPUs, let us also do the following
-modifications:
+``torch.distributed`` 의 가장 우아한 면 중 하나는 다른 백엔드 위에서 추상화하고
+빌드 할 수 있는 능력입니다. 앞서 언급했듯이 현재 PyTorch에는 Gloo, NCCL 및 MPI의
+세 가지 백엔드가 구현되어 있습니다. 그것들은 원하는 사용 사례에 따라 서로 다른
+특징과 trade-off 를 가지고 있습니다. 지원되는 기능의 비교표는
+`여기 <https://pytorch.org/docs/stable/distributed.html#module-torch.distributed>`__
+에서 찾을 수 있습니다.
+
+**Gloo 백엔드**
+
+지금까지 우리는 `Gloo 백엔드 <https://github.com/facebookincubator/gloo>`__ 를
+광범위하게 사용했습니다. 이것은 미리 컴파일된 PyTorch 바이너리에 포함되어 있고
+Linux (0.2 이후) 및 macOS (1.3 이후)에서 작동하므로 개발 플랫폼으로서 매우 편리합니다.
+이것은 CPU에서 모든 점대점(point-to-point) 및 집단 작업과 GPU에서의 모든 집단 작업을 지원합니다.
+CUDA tensor에 대한 집단 작업의 구현은 NCCL 백엔드에서 제공하는 것만큼 최적화되지 않았습니다.
+
+이미 눈치채셨겠지만, GPU에 ``model`` 을 올리면 분산 SGD 예제가 작동하지 않습니다. 여러
+GPU를 사용하려면 다음과 같이 수정하십시오:
 
 1.  Use ``device = torch.device("cuda:{}".format(rank))``
 2. ``model = Net()`` :math:`\rightarrow` ``model = Net().to(device)``
 3.  Use ``data, target = data.to(device), target.to(device)``
 
-With the above modifications, our model is now training on two GPUs and
-you can monitor their utilization with ``watch nvidia-smi``.
-
-**MPI Backend**
-
-The Message Passing Interface (MPI) is a standardized tool from the
-field of high-performance computing. It allows to do point-to-point and
-collective communications and was the main inspiration for the API of
-``torch.distributed``. Several implementations of MPI exist (e.g.
-`Open-MPI <https://www.open-mpi.org/>`__,
-`MVAPICH2 <http://mvapich.cse.ohio-state.edu/>`__, `Intel
-MPI <https://software.intel.com/en-us/intel-mpi-library>`__) each
-optimized for different purposes. The advantage of using the MPI backend
-lies in MPI's wide availability - and high-level of optimization - on
-large computer clusters. `Some <https://developer.nvidia.com/mvapich>`__
-`recent <https://developer.nvidia.com/ibm-spectrum-mpi>`__
-`implementations <https://www.open-mpi.org/>`__ are also able to take
-advantage of CUDA IPC and GPU Direct technologies in order to avoid
-memory copies through the CPU.
-
-Unfortunately, PyTorch's binaries can not include an MPI implementation
-and we'll have to recompile it by hand. Fortunately, this process is
-fairly simple given that upon compilation, PyTorch will look *by itself*
-for an available MPI implementation. The following steps install the MPI
-backend, by installing PyTorch `from
-source <https://github.com/pytorch/pytorch#from-source>`__.
-
-1. Create and activate your Anaconda environment, install all the
-   pre-requisites following `the
-   guide <https://github.com/pytorch/pytorch#from-source>`__, but do
-   **not** run ``python setup.py install`` yet.
-2. Choose and install your favorite MPI implementation. Note that
-   enabling CUDA-aware MPI might require some additional steps. In our
-   case, we'll stick to Open-MPI *without* GPU support:
+위의 수정으로 우리 모델은 이제 2개의 GPU에서 학습하고, ``watch nvidia-smi`` 로
+사용률을 모니터링 할 수 있습니다.
+
+**MPI 백엔드**
+
+MPI (Message Passing Interface)는 고성능 컴퓨팅 분야의 표준 도구입니다. 그것은
+지점 간과 집단 통신을 가능하게 하고 ``torch.distributed`` 의 API에 대한 주요
+영감이었습니다. 다양한 목적으로 최적화된 여러 가지 MPI 구현 (예 :
+`Open-MPI <https://www.open-mpi.org/>`__ , `MVAPICH2 <http://mvapich.cse.ohio-state.edu/>`__ ,
+`Intel MPI <https://software.intel.com/en-us/intel-mpi-library>`__)이 있습니다.
+MPI 백엔드의 장점은 대규모 컴퓨터 클러스터에서 MPI를 널리 사용할 수 있고 고도로
+최적화되어 있다는 점입니다. `일부 <https://developer.nvidia.com/mvapich>`__
+`최신 <https://developer.nvidia.com/ibm-spectrum-mpi>`__
+`구현 <https://www.open-mpi.org/>`__ 들은 CPU를 통한 메모리 복사를 피하기 위해
+CUDA IPC와 GPU Direct 기술을 활용할 수도 있습니다.
+
+안타깝게도 PyTorch의 바이너리는 MPI 구현을 포함할 수 없으므로 직접 다시
+컴파일해야 합니다. 다행히도 이 과정은 매우 간단합니다. 컴파일할 때 PyTorch가 사용
+가능한 MPI 구현을 *스스로* 찾기 때문입니다.
+다음 단계는 PyTorch를 `소스 <https://github.com/pytorch/pytorch#from-source>`__ 에서
+설치하여 MPI 백엔드를 설치합니다.
+
+1. 아나콘다 환경을 만들고 활성화하고,
+   `가이드 <https://github.com/pytorch/pytorch#from-source>`__ 에 따라 모든 필수
+   조건을 설치하십시오. 그러나 아직 ``python setup.py install`` 을 실행하지
+   마십시오.
+2. 원하는 MPI 구현을 선택하고 설치하십시오. CUDA를 인식하는(CUDA-aware) MPI를 활성화하려면
+   몇 가지 추가 단계가 필요할 수 있습니다. 여기서는 GPU 지원 *없이* Open-MPI를 사용할 것입니다:
    ``conda install -c conda-forge openmpi``
-3. Now, go to your cloned PyTorch repo and execute
-   ``python setup.py install``.
-
-In order to test our newly installed backend, a few modifications are
-required.
-
-1. Replace the content under ``if __name__ == '__main__':`` with
-   ``init_process(0, 0, run, backend='mpi')``.
-2. Run ``mpirun -n 4 python myscript.py``.
-
-The reason for these changes is that MPI needs to create its own
-environment before spawning the processes. MPI will also spawn its own
-processes and perform the handshake described in `Initialization
-Methods <#initialization-methods>`__, making the ``rank``\ and ``size``
-arguments of ``init_process_group`` superfluous. This is actually quite
-powerful as you can pass additional arguments to ``mpirun`` in order to
-tailor computational resources for each process. (Things like number of
-cores per process, hand-assigning machines to specific ranks, and `some
-more <https://www.open-mpi.org/faq/?category=running#mpirun-hostfile>`__)
-Doing so, you should obtain the same familiar output as with the other
-communication backends.
+3. 이제 복제된 PyTorch repo로 이동하여 ``python setup.py install`` 을 실행하십시오.
+
+새로 설치된 백엔드를 테스트하려면 몇 가지 수정이 필요합니다.
+
+1. ``if __name__ == '__main__':`` 아래 내용을 ``init_process(0, 0, run, backend='mpi')`` 로
+   변경하십시오.
+2. ``mpirun -n 4 python myscript.py`` 를 실행하십시오.
+
+이러한 변경의 이유는 MPI가 프로세스를 생성하기 전에 자체 환경을 만들어야 하기 때문입니다.
+MPI는 또한 자체적으로 프로세스를 생성하고 `초기화 방법 <#initialization-methods>`__ 에서
+설명하는 handshake를 수행하므로, ``init_process_group`` 의 ``rank`` 와 ``size`` 인자는
+불필요해집니다. 각 프로세스에 맞게 계산 리소스를 조정하기 위해 ``mpirun`` 에 추가 인자를
+전달할 수 있기 때문에 이것은 실제로 매우 강력합니다.
+(프로세스 당 코어 수, 특정 순위(rank)에 머신을 수동으로 할당하는 것,
+`기타 등등 <https://www.open-mpi.org/faq/?category=running#mpirun-hostfile>`__)
+이렇게 하면 다른 통신 백엔드에서와 같은 익숙한 출력을 얻을 수 있습니다.
 
 **NCCL Backend**
 
-The `NCCL backend <https://github.com/nvidia/nccl>`__ provides an
-optimized implementation of collective operations against CUDA
-tensors. If you only use CUDA tensors for your collective operations,
-consider using this backend for the best in class performance. The
-NCCL backend is included in the pre-built binaries with CUDA support.
+`NCCL 백엔드 <https://github.com/nvidia/nccl>`__ 는 CUDA tensor에 대한 집단 작업의
+최적화된 구현을 제공합니다. 집단 작업에 CUDA tensor만을 사용하는 경우, 동급 최고 성능을
+위해서 이 백엔드를 사용하는 것이 좋습니다. NCCL 백엔드는 CUDA를 지원하는 사전 빌드된
+바이너리에 포함되어 있습니다.
 
-Initialization Methods
+초기화 방법
 ~~~~~~~~~~~~~~~~~~~~~~
 
-To finish this tutorial, let's talk about the very first function we
-called: ``dist.init_process_group(backend, init_method)``. In
-particular, we will go over the different initialization methods which
-are responsible for the initial coordination step between each process.
-Those methods allow you to define how this coordination is done.
-Depending on your hardware setup, one of these methods should be
-naturally more suitable than the others. In addition to the following
-sections, you should also have a look at the `official
-documentation <https://pytorch.org/docs/stable/distributed.html#initialization>`__.
-
-**Environment Variable**
-
-We have been using the environment variable initialization method
-throughout this tutorial. By setting the following four environment
-variables on all machines, all processes will be able to properly
-connect to the master, obtain information about the other processes, and
-finally handshake with them.
-
--  ``MASTER_PORT``: A free port on the machine that will host the
-   process with rank 0.
--  ``MASTER_ADDR``: IP address of the machine that will host the process
-   with rank 0.
--  ``WORLD_SIZE``: The total number of processes, so that the master
-   knows how many workers to wait for.
--  ``RANK``: Rank of each process, so they will know whether it is the
-   master of a worker.
-
-**Shared File System**
-
-The shared filesystem requires all processes to have access to a shared
-file system, and will coordinate them through a shared file. This means
-that each process will open the file, write its information, and wait
-until everybody did so. After what all required information will be
-readily available to all processes. In order to avoid race conditions,
-the file system must support locking through
-`fcntl <http://man7.org/linux/man-pages/man2/fcntl.2.html>`__.
+이 튜토리얼을 마무리하기 위해, 가장 먼저 호출했던 함수인
+``dist.init_process_group(backend, init_method)`` 에 대해 이야기해 봅시다. 특히
+각 프로세스 간의 초기 조정(coordination) 단계를 담당하는 다양한 초기화 방법을 살펴보겠습니다.
+이러한 방법을 사용하면 이 조정이 수행되는 방식을 정의할 수 있습니다.
+하드웨어 설정에 따라, 이러한 방법 중 하나가 다른 것보다 자연스럽게 더 적합할
+것입니다. 다음 섹션들에 더하여
+`공식 문서 <https://pytorch.org/docs/stable/distributed.html#initialization>`__ 도
+살펴보시기 바랍니다.
+
+**환경 변수**
+
+이 튜토리얼에서는 줄곧 환경 변수 초기화 방법을 사용해 왔습니다. 모든 머신에서 다음
+네 가지 환경 변수를 설정하면 모든 프로세스가 마스터에 올바르게 연결하고,
+다른 프로세스의 정보를 얻은 뒤, 최종적으로 그들과 handshake 할 수 있습니다.
+
+-  ``MASTER_PORT``: 순위 0의 프로세스를 호스트 할 머신의 사용 가능한(free) 포트.
+-  ``MASTER_ADDR``: 순위 0의 프로세스를 호스트 할 머신의 IP 주소.
+-  ``WORLD_SIZE``: 총 프로세스 수. 마스터가 기다려야 하는 워커의 수를 알 수 있게 합니다.
+-  ``RANK``: 각 프로세스의 순위. 프로세스가 마스터인지 워커인지 알 수 있게 합니다.
+
+**공유 파일 시스템(Shared File System)**
+
+공유 파일 시스템 방식은 모든 프로세스가 공유 파일 시스템에 접근할 수 있어야 하며, 공유
+파일을 통해 프로세스들을 조정합니다. 즉, 각 프로세스가 파일을 열고, 자신의 정보를 쓰고,
+모두가 그렇게 할 때까지 기다립니다. 그 후에는 필요한 모든 정보를 모든 프로세스가
+바로 사용할 수 있습니다. 경쟁 조건을 피하기 위해 파일 시스템은
+`fcntl <http://man7.org/linux/man-pages/man2/fcntl.2.html>`__ 을 통한 잠금을
+지원해야 합니다.
 
 .. code:: python
 
@@ -583,9 +535,8 @@ the file system must support locking through
 
 **TCP**
 
-Initializing via TCP can be achieved by providing the IP address of the process with rank 0 and a reachable port number.
-Here, all workers will be able to connect to the process
-with rank 0 and exchange information on how to reach each other.
+순위 0인 프로세스의 IP 주소와 접근 가능한 포트 번호를 제공하여 TCP를 통해 초기화할 수 있습니다.
+이때 모든 워커는 순위 0인 프로세스에 연결하여 서로에게 도달하는 방법에 관한 정보를 교환할 수 있습니다.
 
 .. code:: python
 
@@ -598,32 +549,33 @@ with rank 0 and exchange information on how to reach each other.
 
    <!--
    ## Internals
-   * The magic behind init_process_group:
-
-   1. validate and parse the arguments
-   2. resolve the backend: name2channel.at()
-   3. Drop GIL & THDProcessGroupInit: instantiate the channel and add address of master from config
-   4. rank 0 inits master, others workers
-   5. master: create sockets for all workers -> wait for all workers to connect -> send them each the info about location of other processes
-   6. worker: create socket to master, send own info, receive info about each worker, and then handshake with each of them
-   7. By this time everyone has handshake with everyone.
+   * init_process_group 뒤에 있는 마법:
+
+   1. 인자의 유효성을 검사하고 구문을 분석합니다.
+   2. 백엔드를 결정합니다: name2channel.at()
+   3. Drop GIL & THDProcessGroupInit: 채널을 인스턴스화하고 config의 마스터 주소를
+      추가합니다.
+   4. 순위 0은 마스터를, 나머지는 워커를 초기화합니다.
+   5. 마스터: 모든 워커를 위한 소켓 생성 -> 모든 워커가 연결될 때까지 대기 -> 다른
+      프로세스의 위치에 대한 정보를 각 워커에게 보냅니다.
+   6. 워커: 마스터로 연결되는 소켓을 생성하고, 자신의 정보를 보내고, 각 워커에 대한 정보를
+      받은 다음, 각각과 handshake를 합니다.
+   7. 이 시점에는 모두가 모두와 handshake를 마친 상태입니다.
    -->
 
 .. raw:: html
 
    <center>
 
-**Acknowledgements**
+**감사의 글**
 
 .. raw:: html
 
    </center>
 
-I'd like to thank the PyTorch developers for doing such a good job on
-their implementation, documentation, and tests. When the code was
-unclear, I could always count on the
-`docs <https://pytorch.org/docs/stable/distributed.html>`__ or the
-`tests <https://github.com/pytorch/pytorch/blob/master/test/test_distributed.py>`__
-to find an answer. In particular, I'd like to thank Soumith Chintala,
-Adam Paszke, and Natalia Gimelshein for providing insightful comments
-and answering questions on early drafts.
+PyTorch 개발자들이 구현, 문서화 및 테스트를 훌륭하게 해 준 것에 대해 감사드리고
+싶습니다. 코드가 불분명할 때마다 저는 답을 찾기 위해 언제나
+`docs <https://pytorch.org/docs/stable/distributed.html>`__ 나
+`tests <https://github.com/pytorch/pytorch/blob/master/test/test_distributed.py>`__ 에
+의지할 수 있었습니다. 특히, 초기 초안에 대해 통찰력 있는 의견을 주시고 질문에 답변해 주신
+Soumith Chintala, Adam Paszke 및 Natalia Gimelshein에게 감사드립니다.
\ No newline at end of file