
GitHub Release v1.2.0

poyenc committed Sep 6, 2019
1 parent 1bf44c6 commit 7d24ab6dbeda0e191f7565a6a0fa16ba1ae3fd43
Showing 369 changed files with 16,267 additions and 3,211 deletions.
@@ -17,7 +17,7 @@ AllowShortLoopsOnASingleLine: false
AlwaysBreakAfterDefinitionReturnType: None
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: false
AlwaysBreakTemplateDeclarations: Yes
AlwaysBreakTemplateDeclarations: true
BinPackArguments: true
BinPackParameters: true
BraceWrapping:
@@ -39,7 +39,7 @@ BraceWrapping:
BreakBeforeBinaryOperators: None
BreakBeforeBraces: Custom
BreakBeforeInheritanceComma: false
BreakInheritanceList: BeforeColon
# BreakInheritanceList: BeforeColon
BreakBeforeTernaryOperators: true
BreakConstructorInitializersBeforeComma: true
BreakConstructorInitializers: BeforeComma
@@ -90,16 +90,16 @@ MacroBlockBegin: ''
MacroBlockEnd: ''
MaxEmptyLinesToKeep: 1
NamespaceIndentation: None
ObjCBinPackProtocolList: Auto
ObjCBlockIndentWidth: 2
ObjCSpaceAfterProperty: false
ObjCSpaceBeforeProtocolList: true
# ObjCBinPackProtocolList: Auto
# ObjCBlockIndentWidth: 2
# ObjCSpaceAfterProperty: false
# ObjCSpaceBeforeProtocolList: true
PenaltyBreakAssignment: 2
PenaltyBreakBeforeFirstCallParameter: 19
PenaltyBreakComment: 300
PenaltyBreakFirstLessLess: 120
PenaltyBreakString: 1000
PenaltyBreakTemplateDeclaration: 10
# PenaltyBreakTemplateDeclaration: 10
PenaltyExcessCharacter: 1000000
PenaltyReturnTypeOnItsOwnLine: 60
PointerAlignment: Left
@@ -109,11 +109,11 @@ SortUsingDeclarations: true
SpaceAfterCStyleCast: false
SpaceAfterTemplateKeyword: true
SpaceBeforeAssignmentOperators: true
SpaceBeforeCpp11BracedList: false
SpaceBeforeCtorInitializerColon: true
SpaceBeforeInheritanceColon: true
# SpaceBeforeCpp11BracedList: false
# SpaceBeforeCtorInitializerColon: true
# SpaceBeforeInheritanceColon: true
SpaceBeforeParens: ControlStatements
SpaceBeforeRangeBasedForLoopColon: true
# SpaceBeforeRangeBasedForLoopColon: true
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 1
SpacesInAngles: false

This file was deleted.

@@ -7,3 +7,4 @@ autom4te.cache
build*
.vscode
*.swp
*.nvdla
@@ -0,0 +1,153 @@
# ONNC v1.2.0 Supported Operators

## Supported ONNX Operators

- *Add*
- *AveragePool*
- *BatchNormalization*
- *Concat*
- *Conv*
- *Gemm*
- *GlobalAveragePool*
- *LRN*
- *MaxPool*
- *Mul*
- *Relu*
- *Reshape*
- *Softmax*
- *Sum*
- *Transpose* *(used in ShuffleNet)*
- *Unsqueeze*

## Hardware Execution Unit Mapping

|Operator|Execution Unit|
|:-|:-:|
|*Add*|*SDP*|
|*AveragePool*|*PDP*|
|*BatchNormalization*|*SDP*|
|*Concat*|*RUBIK*|
|*Conv*|*CONV*, *SDP*|
|*Gemm*|*CONV*, *SDP*|
|*GlobalAveragePool*|*PDP*, *SDP*|
|*LRN*|*CDP*|
|*MaxPool*|*PDP*|
|*Mul*|*SDP*|
|*Relu*|*SDP*|
|*Reshape*|*Compiler Optimization*|
|*Softmax*|***EMU***|
|*Sum*|*SDP*|
|*Transpose*|*RUBIK*|
|*Unsqueeze*|*Compiler Optimization*|

## Limitations

### Add

- Support [*conditional broadcasting*](#Conditional-Broadcasting)
- Accept **at most** *1* input constant tensor

### AveragePool

- Attribute *auto_pad* should be "NOTSET"
- Attribute *kernel_shape* should contain values in the range of *[1, 8]*
- Attribute *pads* should contain values in the range of *[0, 7]*
- Attribute *strides* should contain values in the range of *[1, 8]*

### BatchNormalization

- Input *scale* should be a constant tensor
- Input *B* should be a constant tensor
- Input *mean* should be a constant tensor
- Input *var* should be a constant tensor

### Concat

- Attribute *axis* should be *1*
- Input cannot be a constant tensor
- Input tensor's height<sub>input</sub>, width<sub>input</sub>, and channel<sub>input</sub> should all be in the range of *[1, 8192]*
- Output tensor's height<sub>output</sub>, width<sub>output</sub>, and channel<sub>output</sub> should all be in the range of *[1, 8192]*

### Conv

- Input *W* should be a constant tensor
- Input *B* should be a constant tensor
- Attribute *auto_pad* should be "NOTSET"
- Attribute *group* should be in the range of *[1, 8192]*
- Attribute *dilations* should contain values in the range of *[1, 32]*
- Attribute *kernel_shape* should contain values in the range of *[1, 32]*
- Attribute *pads* should contain values in the range of *[0, 31]*
- Attribute *strides* should contain values in the range of *[1, 8]*

### Gemm

- Input *B* should be a constant tensor
- Input *C* should be a constant tensor

### GlobalAveragePool

*None*

### LRN

- Attribute *alpha* should be *0.0001*
- Attribute *beta* should be *0.75*
- Attribute *bias* should be *1.0*
- Attribute *size* should be one of [*3*, *5*, *7*, *9*]

### MaxPool

- Attribute *storage_order* should be *0*
- Attribute *auto_pad* should be "NOTSET"
- Attribute *kernel_shape* should contain values in the range of *[1, 8]*
- Attribute *pads* should contain values in the range of *[0, 7]*
- Attribute *strides* should contain values in the range of *[1, 8]*

### Mul

- Support [*conditional broadcasting*](#Conditional-Broadcasting)
- Accept **at most** *1* input constant tensor

### Relu

*None*

### Reshape

- Input *shape* should be a constant tensor
- Input *data*'s height<sub>data</sub> * width<sub>data</sub> should be equal to its output tensor's height<sub>reshaped</sub> * width<sub>reshaped</sub>

### Softmax

- Attribute *axis* should be *1*

### Sum

- Accept **at most** *2* input tensors
- Input tensors should **not** be constant tensors
- Input tensors' shape should be identical

### Transpose

- Only support the *Reshape*-*Transpose*-*Reshape* pattern used in *ShuffleNet*
- Dimension of input tensor should be *5*
- Attribute *perm* should be *[0, 2, 1, 3, 4]*

### Unsqueeze

- Input should be a constant tensor
- Dimension of output tensor should be less than or equal to *4*

## Conditional Broadcasting

Unlike ONNX's broadcasting [definition](https://github.com/onnx/onnx/blob/rel-1.3.0/docs/Broadcasting.md), *ONNC* can broadcast tensors only in certain cases, owing to limited NVDLA hardware support.

Limitations of the *NVDLA* hardware:
- Broadcasting sources in SDP operations must be constant tensors
- No support for bi-directional broadcasting

Broadcasting support in *ONNC*:
- Single-direction per-channel broadcasting
- Single-direction per-layer broadcasting (the source should be a constant tensor)

@@ -8,7 +8,7 @@ This application note demonstrates how to create an ONNC backend for a target ha

ONNC is a collection of open-source, modular, and reusable compiler algorithms and toolchains targeting deep learning accelerators (DLAs). ONNC has been built from the ground up for translating ONNX intermediate representations (IRs) to proprietary DLA code. Its software architecture design emphasizes portability and reusability, thus simplifying retargeting. Figure 1 depicts the top-level block diagram of the ONNC software stack, illustrating the functional blocks from importing an ONNX computation graph model to emitting the corresponding hardware binaries. In addition to leveraging the LLVM backend, ONNC paves another fast track for proprietary DLAs to execute ONNX models by defining ONNC IR, an intermediate representation that has a one-to-one mapping to the ONNX IR. Two other popular compilation frameworks in deep learning systems, TVM and Glow, build their software stacks on top of the LLVM backend. The intermediate representations of LLVM have a finer granularity than ONNC IRs when mapping to hardware operators, so for accelerators built with coarse-grained operators such as convolution, hacking the LLVM backend requires more porting effort. Many DLA designs, such as Nvidia's [NVDLA](http://nvdla.org/) and Bitmain's [Sophon BM168X series](https://sophon.ai/product/introduce/bm1682.html), favor coarse-grained operators over LLVM operators. In those cases, ONNC provides a more straightforward way to convert ONNX models to target binaries using its own Vanilla backend, thus speeding up porting a compiler to new hardware. For fast porting, users only need to copy the Vanilla backend as a template, override two software pipes at minimum, and add optional optimization passes; the framework handles the rest of the work.

![](https://github.com/ONNC/onnc/wiki/files/1.0.0/onnc-software-architecture-diagram.png)
![](https://github.com/ONNC/onnc/wiki/files/1.2.0/onnc-software-architecture-diagram.png)

**Figure 1. ONNC Software Architecture Diagram**

@@ -196,23 +196,24 @@ The `codeEmit` pass is added to handle code generation for the target backend. P
The generated backend includes some optimization algorithms by default. Each algorithm is implemented in ONNC as a “pass” (the same concept as an LLVM pass). The default optimization passes may not fit your needs, so you may need to develop your own passes and then edit `FooBackend.cpp` to add those passes into the compilation flow. Below is an example of adding a pass. Refer to the application note, [Pass Manager Getting Started Guide](ONNC-Pass-Manager-Getting-Started-Guide.md), for more details about how to add a pass.

```cpp
void FooBackend::addTensorSel(PassManager& pPM)
void FooBackend::addOnncIrOptimization(PassManager& pPM, OptimizationOptions& options)
{
...
TargetBackend::addOnncIrOptimization(pPM, options);
// One example of adding your optimization pass.
pPM.add(CreateYourProprietaryPass());
pPM.add<YourProprietaryPass>();
}
```
**Code Snippet 2. Example of adding an optimization pass into a backend.**

In the above example, the optimization pass is added in the method, `addTensorSel()`. There are four stages in the compilation flow for users to add passes. Each stage in the compilation flow is implemented in a corresponding method. The following table shows the meaning and input/output of each method.
In the above example, the optimization pass is added in the method, `addOnncIrOptimization()`. There are five stages in the compilation flow for users to add passes. Each stage in the compilation flow is implemented in a corresponding method. The following table shows the meaning and input/output of each method.

**Table 2. The four methods representing the four compilation phases.**
**Table 2. The five methods representing the five compilation phases.**

| Method | Input | Output | Description |
| ------ | ----- | ------ | ----------- |
| `addTensorSel` | **ONNX IR** | **ONNC IR** | This method contains passes for translating models in the ONNX format into ONNC IR. |
| `addTensorSched` | **ONNC IR** | **ONNC IR** in optimized order | This method contains passes for better scheduling the execution order of ONNC IR. |
| `addOnncIrOptimization` | **ONNC IR** | **ONNC IR** in optimized order | This method contains passes for optimizing ONNC IR in order to improve performance. |
| `addTensorSched` | **ONNC IR** in optimized order | **ONNC IR** in optimized order | This method contains passes for better scheduling the execution order of ONNC IR. |
| `addMemAlloc` | **ONNC IR** in optimized order | **ONNC IR** with addresses | This method contains passes for allocating memory space of input data, weights, and activation data. |
| `addCodeEmit` | **ONNC IR** with addresses | **Machine code** | This method contains passes for handling code generation and optimization. |

@@ -19,7 +19,7 @@ The `CustomPass<T>` abstract class defines several virtual functions. These memb
| Prototype |
| --------- |
| `virtual ReturnType doInitialization(Module&);` |
| `virtual ReturnType runOnModule(Module&) = 0;` |
| `virtual ReturnType runOnModule(Module&);` |
| `virtual ReturnType doFinalization(Module&);` |

| Method | Description |
@@ -43,10 +43,10 @@ The above three methods are invoked exactly once per run. Users can assemble mea
```cpp
class MyPass : public CustomPass<MyPass> {
public:
ReturnType runOnModule(Module& module) override {
// do something here
return kModuleChanged;
}
ReturnType runOnModule(Module& module) override {
// do something here
return kModuleChanged;
}
};
```
The type argument in `CustomPass<T>` has to be the same as the derived class name.
@@ -66,11 +66,11 @@ class Bar: public CustomPass<Bar> { /* implementation goes here */ };
class MyPass : public CustomPass<MyPass> {
public:
/* other code here */
void getAnalysisUsage(AnalysisUsage& usage) const override {
usage.addRequired<Foo>();
usage.addRequired<Bar>();
}
/* other code here */
void getAnalysisUsage(AnalysisUsage& usage) const override {
usage.addRequired<Foo>();
usage.addRequired<Bar>();
}
};
```
@@ -79,18 +79,18 @@ Pass manager is designed to manage pass instances and pass executions. ONNC prov
### Registering a Pass
Users have to register a pass object via the method `add()` in the pass manager before they can be executed. There is only one registered pass object running at same time. The `add()` prototype is shown as below.
Users have to register pass objects via the method `add()` in the pass manager before they can be executed. Only one registered pass object runs at a time. The `add()` method prototype is shown below.
| Prototype |
| --------- |
| `void add(Pass*);` |
| `template <typename Pass, typename... Args> add(Args&&...);` |
The following example shows how a pass object is registered to the pass manager.
```cpp
PassManager manager;
manager.add(new MyPass);
manager.add<MyPass>(); // provide constructor arguments and let PassManager create pass by its own
```
The pass manager gets pass dependencies via the `getAnalysisUsage()` method, and it creates and runs unregistered pass objects if users declare such dependencies in their customized pass types. Note that conditional dependency is not supported in the ONNC framework because the output of the `getAnalysisUsage()` method has to remain the same throughout the compilation process.
