[DOCS] Fix Sphinx Warnings (RST indent, cross-ref, and image scale) #4920

Merged
Merged 2 commits on Feb 20, 2020
1 change: 1 addition & 0 deletions docs/api/python/index.rst
@@ -21,6 +21,7 @@ Python API
.. toctree::
:maxdepth: 2

tvm
runtime
ndarray
error
5 changes: 0 additions & 5 deletions docs/api/python/relay/op.rst
@@ -53,8 +53,3 @@ tvm.relay.op

.. automodule:: tvm.relay.op.nn
:members:

.. automodule:: tvm.relay.op.vision.multibox
:members:

.. autofunction:: tvm.relay.vision.nms
2 changes: 1 addition & 1 deletion docs/api/python/runtime.rst
@@ -27,7 +27,7 @@ tvm.runtime

.. autoclass:: tvm.runtime.PackedFunc
:members:
:inheritated-members:
:inherited-members:

.. autofunction:: tvm.register_func

7 changes: 3 additions & 4 deletions docs/contribute/pull_request.rst
@@ -29,12 +29,11 @@ This is a quick guide to submit a pull request, please also refer to the detaile
git rebase upstream/master

- Make sure the code style check passes by typing the following command, and that all the existing test-cases pass.
- ``docker/bash.sh tvmai/ci-lint ./tests/scripts/task_lint.sh``
(Note: You must install docker beforehand so you can run a docker image.)
- ``docker/bash.sh tvmai/ci-lint ./tests/scripts/task_lint.sh``. (Note: You must install docker beforehand so you can run a docker image.)
- Add test-cases to cover the new features or bugfix the patch introduces.
- Document the code you wrote, see more at :ref:`doc_guide`
- Send the pull request, fix the problems reported by automatic checks.
Request code reviews from other contributors and improves your patch according to feedbacks.
- Send the pull request and fix the problems reported by automatic checks.
- Request code reviews from other contributors and improve your patch according to the feedback.

- To get your code reviewed quickly, we encourage you to help review others' code so they can do the favor in return.
- Code review is a shepherding process that helps to improve contributor's code quality.
3 changes: 2 additions & 1 deletion docs/deploy/index.rst
@@ -56,7 +56,6 @@ embedded devices is through TVM's RPC API.
Here are the links to the related tutorials.

- :ref:`tutorial-cross-compilation-and-rpc`
- :ref:`tutorial-deploy-model-on-mali-gpu`
- :ref:`tutorial-deploy-model-on-rasp`

After you finished tuning and benchmarking, you might need to deploy the model on the
@@ -68,3 +67,5 @@ target device without relying on RPC. See the following resources on how to do s
cpp_deploy
android
integrate
aocl_fpga
aws_fpga
11 changes: 0 additions & 11 deletions docs/dev/inferbound.rst
@@ -118,13 +118,11 @@ In the Operation class declaration above, we can see that each operation also ha

.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/stage_graph.png
:align: center
:scale: 70%

InferBound makes one pass through the graph, visiting each stage exactly once. InferBound starts from the output stages (i.e., the solid blue nodes in the graph above), and moves upwards (in the opposite direction of the edges). This is achieved by performing a reverse topological sort on the nodes of the graph. Therefore, when InferBound visits a stage, each of its consumer stages has already been visited.
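The reverse topological sort described above can be sketched with Kahn's algorithm on reversed edges: a stage becomes ready once all of its consumers have been visited. This is a minimal illustration in Python, not TVM's actual implementation; the stage graph is assumed to be given as a mapping from each stage name to its consumer stages.

```python
from collections import deque

def inferbound_order(consumers):
    """Return a visit order in which every stage is visited only after all
    of its consumer stages, mirroring InferBound's traversal.

    `consumers` maps each stage to the stages that consume its output
    (edges point producer -> consumer).
    """
    stages = list(consumers)
    # For each stage, count how many of its consumers are still unvisited.
    pending = {s: len(consumers[s]) for s in stages}
    # Output stages (no consumers) are ready immediately.
    ready = deque(s for s in stages if pending[s] == 0)
    order = []
    while ready:
        s = ready.popleft()
        order.append(s)
        # Visiting s unblocks its producers: stages that list s as a consumer.
        for p in stages:
            if s in consumers[p]:
                pending[p] -= 1
                if pending[p] == 0:
                    ready.append(p)
    return order
```

For a chain `A -> B -> C`, the output stage `C` is visited first, then `B`, then `A`, matching the upward traversal in the figure below.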

.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/inferbound_traversal.png
:align: center
:scale: 70%

The InferBound pass is shown in the following pseudo-code:

@@ -162,7 +160,6 @@ Recall that all IterVars of the stage are related by IterVarRelations. The IterV

.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/relations.png
:align: center
:scale: 70%


The above diagram shows the IterVar hyper-graph for one stage. The stage has one root_iter_var, ``i``. It has been split, and the resulting inner axis ``i.inner``, has been split again. The leaf_iter_vars of the stage are shown in green: ``i.outer``, ``i.inner.outer``, and ``i.inner.inner``.
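The extent arithmetic behind the split relations above can be sketched as follows. This is an illustrative helper under the usual convention that a split by `factor` gives an outer axis of extent `ceil(extent / factor)` and an inner axis of extent `factor`; it is not TVM's API.

```python
def split(extent, factor):
    """Split an axis of the given extent by `factor`.

    Returns the extents of (outer, inner): outer iterates
    ceil(extent / factor) times, inner iterates `factor` times.
    """
    return ((extent + factor - 1) // factor, factor)

# i with extent 16, split by 8, then i.inner split again by 4:
outer, inner = split(16, 8)                  # extents of i.outer, i.inner
inner_outer, inner_inner = split(inner, 4)   # extents of i.inner.outer, i.inner.inner
```

When the factor does not divide the extent evenly (e.g., extent 10, factor 3), the outer extent rounds up, which is one source of the over-approximation discussed later.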
@@ -208,7 +205,6 @@ As mentioned above, a consumer may only require a small number of elements from

.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/inferbound_phases.png
:align: center
:scale: 70%

IntSets
~~~~~~~
@@ -323,14 +319,12 @@ A ComputeOp has only a single output Tensor, whose axes correspond to the axis v

.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/gatherbound.png
:align: center
:scale: 70%


The union of IntSets is computed by converting each IntSet to an Interval, then taking the minimum of all of these intervals' minimums and the maximum of all of their maximums.

.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/union.png
:align: center
:scale: 70%


This clearly results in some unnecessary computation, i.e., tensor elements will be computed that are never used.
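The min-of-mins/max-of-maxes union just described can be sketched in a few lines. This is an illustration of the over-approximation, assuming intervals are closed `(min, max)` pairs; it is not TVM's IntSet implementation.

```python
def union_intervals(intervals):
    """Cover a list of (min, max) intervals with a single interval:
    the minimum of all minimums and the maximum of all maximums.

    Any gap between the inputs is absorbed into the result, which is
    exactly the unnecessary computation described in the text.
    """
    lo = min(iv[0] for iv in intervals)
    hi = max(iv[1] for iv in intervals)
    return (lo, hi)
```

For example, the union of `(0, 3)` and `(5, 9)` is `(0, 9)`: element 4 is never used, but falls inside the computed bound.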
@@ -340,7 +334,6 @@ Unfortunately, even if we're lucky and the IntervalSet unions do not produce unn

.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/gatherbound_problem.png
:align: center
:scale: 70%

.. _InferBoundCA:

@@ -696,7 +689,6 @@ When InferRootBound is working on stage B, it visits B's consumer stage C to fin

.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/passupdomain_problem.png
:align: center
:scale: 70%



@@ -756,17 +748,14 @@ If the split factor is 4, or 8, in the above example, the region of B needed in

.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/passupdomain_div.png
:align: center
:scale: 70%

However, if the split factor is changed from 4 to 3 in the example above, it is easy to see that the region of B that C needs can no longer be described by an independent Range for each of its axes.


.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/passupdomain_nodiv.png
:align: center
:scale: 70%

The best that can be done with rectangular regions is shown in the following diagram. The orange regions are the minimum rectangular regions covering the region of B that needs to be computed, at each iteration of the outer loop.

.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/passupdomain_min.png
:align: center
:scale: 70%
4 changes: 2 additions & 2 deletions docs/dev/relay_bring_your_own_codegen.rst
@@ -535,7 +535,7 @@ To simplify, we define a graph representation named "ExampleJSON" in this guide.

Then the ExampleJSON of this subgraph looks like:

.. code-block:: json
.. code-block:: none

subgraph_0
input 0 10 10
@@ -544,7 +544,7 @@ Then the ExampleJSON of this subgraph looks like:
input 3 10 10
add 4 inputs: 0 1 shape: 10 10
sub 5 inputs: 4 2 shape: 10 10
add 6 inputs: 5 3 shape: 10 10
mul 6 inputs: 5 3 shape: 10 10

The ``input`` keyword declares an input tensor with its ID and shape, while the other statements describe computations in ``<op> <output ID> inputs: [input ID] shape: [shape]`` syntax.
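The statement syntax above is simple enough to parse with string splitting. The following is a hypothetical helper for illustration, not part of the TVM codebase; it assumes one statement per line and integer IDs and shape dimensions as in the example subgraph.

```python
def parse_examplejson(text):
    """Parse an ExampleJSON subgraph into (name, statements).

    Each statement is a dict: inputs are {op, id, shape}; computations
    are {op, id, inputs, shape}, following the
    `<op> <output ID> inputs: [input ID] shape: [shape]` syntax.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    name, stmts = lines[0], []
    for ln in lines[1:]:
        tokens = ln.split()
        if tokens[0] == "input":
            # input <ID> <shape...>
            stmts.append({"op": "input", "id": int(tokens[1]),
                          "shape": [int(t) for t in tokens[2:]]})
        else:
            # <op> <output ID> inputs: <input IDs...> shape: <shape...>
            sep = tokens.index("shape:")
            stmts.append({"op": tokens[0], "id": int(tokens[1]),
                          "inputs": [int(t) for t in tokens[3:sep]],
                          "shape": [int(t) for t in tokens[sep + 1:]]})
    return name, stmts
```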

3 changes: 0 additions & 3 deletions docs/dev/relay_intro.rst
@@ -39,7 +39,6 @@ compile for heterogeneous execution environments (e.g., executing parts of the g

.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/dataflow.png
:align: center
:scale: 70%


You can use Relay to build a computational (dataflow) graph. Specifically, the above code shows how to
@@ -130,7 +129,6 @@ The code example below shows one program with two forms side by side.

.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/dataflow_vs_func.png
:align: center
:scale: 70%


The nested let binding is called A-normal form, and it is commonly used as IRs in functional programming languages.
@@ -155,7 +153,6 @@ which does not use let bindings.

.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/let_scope.png
:align: center
:scale: 70%

The problem comes when we try to decide where we should evaluate node ``%1``. In particular, while the text format seems
to suggest that we should evaluate node ``%1`` outside the if scope, the AST (as shown in the picture) does not suggest so.
1 change: 1 addition & 0 deletions docs/dev/runtime.rst
@@ -258,6 +258,7 @@ It also allows us to get members of an object easily in front-end language.
For example, in the following code, we accessed the op field of the TensorNode.

.. code:: python

import tvm

x = tvm.placeholder((3,4), name="x")
21 changes: 20 additions & 1 deletion docs/dev/virtual_machine.rst
@@ -91,6 +91,7 @@ Ret
^^^
**Arguments**:
::

RegName dst
RegName result

@@ -100,6 +101,7 @@ InvokePacked
^^^^^^^^^^^^
**Arguments**:
::

Index packed_index
Index arity
Index output_size
@@ -114,6 +116,7 @@ AllocTensor
^^^^^^^^^^^
**Arguments**:
::

RegName dst
RegName storage
uint32_t ndim
@@ -127,6 +130,7 @@ AllocTensorReg
^^^^^^^^^^^^^^
**Arguments**:
::

RegName dst
RegName storage
RegName shape_register
@@ -139,6 +143,7 @@ AllocStorage
^^^^^^^^^^^^
**Arguments**:
::

RegName dst
RegName size
RegName alignment
@@ -151,6 +156,7 @@ AllocADT
^^^^^^^^
**Arguments**:
::

RegName dst
Index tag
Index num_fields
@@ -163,6 +169,7 @@ AllocClosure
^^^^^^^^^^^^
**Arguments**:
::

RegName dst
Index clo_index
Index num_freevar
@@ -176,6 +183,7 @@ GetField
^^^^^^^^
**Arguments**:
::

RegName dst
RegName object
Index field_index
@@ -186,6 +194,7 @@ If
^^
**Arguments**:
::

RegName test
RegName target
Index true_offset
@@ -199,6 +208,7 @@ GetTag
^^^^^^
**Arguments**:
::

RegName object
RegName dst

@@ -212,6 +222,7 @@ Goto
^^^^
**Arguments**:
::

Index pc_offset

Relative unconditional jump by ``pc_offset``.
@@ -220,6 +231,7 @@ Invoke
^^^^^^
**Arguments**:
::

Index func_index

Invoke function at ``func_index``, consumes the number of arguments contained in the VMFunction's
@@ -229,6 +241,7 @@ InvokeClosure
^^^^^^^^^^^^^
**Arguments**:
::

RegName closure
Index num_closure_args
RegName* closure_args
@@ -239,6 +252,7 @@ LoadConst
^^^^^^^^^
**Arguments**:
::

RegName dst
Index const_index

@@ -248,6 +262,7 @@ LoadConsti
^^^^^^^^^^
**Arguments**:
::

Index val
RegName dst

@@ -277,7 +292,7 @@ previous call. Registers are allocated in a continuous space (virtual register f

We keep track of a set of Relay functions we have called, a pointer into its bytecode, an offset into the byte code (known as the program counter).

::
.. code-block:: c

struct VirtualMachine {
...
@@ -331,6 +346,7 @@ Optimizations marked with `TODO` are not implemented yet.

Serialization
~~~~~~~~~~~~~

Serializing and deserializing the executable generated by the Relay VM compiler is a must as
we may want to save the model to the disk and perform inference later. Previously, Relay has produced
a serialized form in a json file for the graph runtime. However, the same format is not directly
@@ -372,14 +388,17 @@ Unresolved Questions

How do we handle dynamic shapes?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

TODO

How can we modify the VM to support JIT compilation of certain code paths?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the code generation space there are still many tradeoffs to be analyzed and the VM is designed
to be very flexible so we can modify it for future experiments.

How do we support heterogenous execution?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Heterogenous execution should work out of the box assuming we have annotated the appropriate device copies.
In order to do this properly we need to run the device annotation and copying passes.
1 change: 1 addition & 0 deletions docs/vta/dev/hardware.rst
@@ -215,6 +215,7 @@ This would result in a ``load-gemm-activate-store`` task pipeline which closely
Adding more stages has a cost however: it can add storage and extra logic overhead, which is why we opted for a default 3-stage pipeline.

.. _vta-uarch:

Microarchitectural Overview
---------------------------

19 changes: 0 additions & 19 deletions docs/vta/hardware.rst

This file was deleted.