[Doc] Add some readme in EN #529

Merged: 2 commits merged on Nov 8, 2022
282 changes: 280 additions & 2 deletions docs/en/faq/develop_a_new_model.md
@@ -1,3 +1,281 @@
# How to Integrate New Model on FastDeploy

This page describes how to add a new model to FastDeploy, covering both C++ and Python deployment. We take the ResNet50 model from torchvision v0.12.0 as an example to walk through external [Model Integration](#modelsupport) in FastDeploy. The whole process takes only three steps.

| Step | Description | Files to create or modify |
|:-----------:|:--------------------------------------------------------------------------------:|:-----------------------------------------:|
| [1](#step2) | Add the model implementation to the corresponding task module in FastDeploy/vision | resnet.h, resnet.cc, vision.h |
| [2](#step4) | Bind the Python interface via pybind | resnet_pybind.cc, classification_pybind.cc |
| [3](#step5) | Wrap and call the interface from Python | resnet.py, \_\_init\_\_.py |

After completing the above three steps, the external model is integrated.

If you would like to contribute your code to FastDeploy, we also encourage you to add test code, documentation (README), and code annotations for the new model; see the [Test](#test) section.

## Model Integration

### Prepare the models

Before integrating an external model, the trained model (.pt, .pdparams, etc.) must be converted to a format that FastDeploy supports for deployment (.onnx, .pdmodel). Most open-source repositories provide conversion scripts for developers; since torchvision does not, we write one manually. In this demo, we convert `torchvision.models.resnet50` to `resnet50.onnx` with the following code for your reference.

```python
import torch
import torchvision.models as models

model = models.resnet50(pretrained=True)
batch_size = 1               # Batch size
input_shape = (3, 224, 224)  # Input shape; change it to match your own model
model.eval()
x = torch.randn(batch_size, *input_shape)  # Generate an input tensor
export_onnx_file = "resnet50.onnx"          # ONNX file name
torch.onnx.export(model,
                  x,
                  export_onnx_file,
                  opset_version=12,
                  input_names=["input"],    # Input names
                  output_names=["output"],  # Output names
                  dynamic_axes={"input": {0: "batch_size"},     # Variable batch size
                                "output": {0: "batch_size"}})
```

Running the above script generates a `resnet50.onnx` file. A quick sanity check of the exported model is sketched below.
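
Before wiring the model into FastDeploy, it can help to verify the exported ONNX file. The following is a minimal sketch, assuming `onnx` and `onnxruntime` are installed (they are not required by FastDeploy itself):

```python
import numpy as np
import onnx
import onnxruntime

# Check that the exported graph is structurally valid
onnx_model = onnx.load("resnet50.onnx")
onnx.checker.check_model(onnx_model)

# Run one dummy inference; the input/output names match the export script above
session = onnxruntime.InferenceSession("resnet50.onnx")
dummy = np.random.randn(1, 3, 224, 224).astype(np.float32)
outputs = session.run(["output"], {"input": dummy})
print(outputs[0].shape)  # Expected: (1, 1000)
```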

### C++

* Create the `resnet.h` file
* Create a path
* FastDeploy/fastdeploy/vision/classification/contrib/resnet.h (FastDeploy/C++ code/vision/task name/external model name/model name.h)
* Create content
* First, create the ResNet class in `resnet.h`, inheriting from the FastDeployModel parent class; then declare `Predict`, `Initialize`, `Preprocess`, `Postprocess`, the constructor, and the necessary variables. Please refer to [resnet.h](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-69128489e918f305c208476ba793d8167e77de2aa7cadf5dcbac30da448bd28e) for details.

```C++
class FASTDEPLOY_DECL ResNet : public FastDeployModel {
public:
ResNet(...);
virtual bool Predict(...);
private:
bool Initialize();
bool Preprocess(...);
bool Postprocess(...);
};
```

* Create the `resnet.cc` file
* Create a path
* FastDeploy/fastdeploy/vision/classification/contrib/resnet.cc (FastDeploy/C++ code/vision/task name/external model name/model name.cc)
* Create content
* Implement in `resnet.cc` the specific logic of the functions declared in `resnet.h`. The `Preprocess` and `Postprocess` implementations need to reproduce the pre- and post-processing logic of the official source repository. The specific logic of each ResNet function is outlined below; for the complete code, please refer to [resnet.cc](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-d229d702de28345253a53f2a5839fd2c638f3d32fffa6a7d04d23db9da13a871).

```C++
ResNet::ResNet(...) {
  // Constructor logic
  // 1. Specify the backend  2. Set RuntimeOption  3. Call Initialize()
}
bool ResNet::Initialize() {
  // Initialization logic
  // 1. Assign values to global variables  2. Call InitRuntime()
  return true;
}
bool ResNet::Preprocess(Mat* mat, FDTensor* output) {
  // Preprocessing logic
  // 1. Resize  2. BGR2RGB  3. Normalize  4. HWC2CHW  5. Save the results to an FDTensor
  return true;
}
bool ResNet::Postprocess(FDTensor& infer_result, ClassifyResult* result, int topk) {
  // Postprocessing logic
  // 1. Softmax  2. Choose the top-k labels  3. Save the results to a ClassifyResult
  return true;
}
bool ResNet::Predict(cv::Mat* im, ClassifyResult* result, int topk) {
  Preprocess(...);
  Infer(...);
  Postprocess(...);
  return true;
}
```
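
To make the logic that `Preprocess` and `Postprocess` must reproduce more concrete, here is the torchvision ResNet50 reference pipeline expressed in plain NumPy/OpenCV. This is only an illustrative sketch of the algorithm, not FastDeploy code; the exact resize strategy and normalization constants should be taken from the source repository.

```python
import cv2
import numpy as np

def reference_preprocess(bgr_image):
    # 1. Resize to the network input size (torchvision uses resize-256 + center crop;
    #    a plain 224x224 resize is shown here for simplicity)
    img = cv2.resize(bgr_image, (224, 224))
    # 2. BGR -> RGB
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # 3. Normalize with the ImageNet mean/std used by torchvision
    img = img.astype(np.float32) / 255.0
    img = (img - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
    # 4. HWC -> CHW and add the batch dimension
    return np.expand_dims(img.transpose(2, 0, 1), 0).astype(np.float32)

def reference_postprocess(logits, topk=1):
    # 1. Softmax over the class dimension
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    # 2. Choose the top-k labels and their scores
    ids = probs.argsort()[::-1][:topk]
    return list(zip(ids.tolist(), probs[ids].tolist()))
```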

* Add the new model file to `vision.h`
* modify location
* FastDeploy/fastdeploy/vision.h
* modify content

```C++
#ifdef ENABLE_VISION
#include "fastdeploy/vision/classification/contrib/resnet.h"
#endif
```

### Pybind

* Create Pybind file

* Create path

* FastDeploy/fastdeploy/vision/classification/contrib/resnet_pybind.cc (FastDeploy/C++ code/vision model/task name/external model/model name_pybind.cc)

* Create content

* Use pybind to bind the C++ functions and variables to Python. Please refer to [resnet_pybind.cc](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-270af0d65720310e2cfbd5373c391b2110d65c0f4efa547f7b7eeffcb958bdec) for more details.

```C++
void BindResNet(pybind11::module& m) {
pybind11::class_<vision::classification::ResNet, FastDeployModel>(
m, "ResNet")
.def(pybind11::init<std::string, std::string, RuntimeOption, ModelFormat>())
.def("predict", ...)
.def_readwrite("size", &vision::classification::ResNet::size)
.def_readwrite("mean_vals", &vision::classification::ResNet::mean_vals)
.def_readwrite("std_vals", &vision::classification::ResNet::std_vals);
}
```

* Call Pybind function

* modify path

* FastDeploy/fastdeploy/vision/classification/classification_pybind.cc (FastDeploy/C++ code/vision model/task name/task name_pybind.cc)

* modify content

```C++
void BindResNet(pybind11::module& m);
void BindClassification(pybind11::module& m) {
auto classification_module =
m.def_submodule("classification", "Image classification models.");
BindResNet(classification_module);
}
```

### Python

* Create the `resnet.py` file
* Create path
* FastDeploy/python/fastdeploy/vision/classification/contrib/resnet.py (FastDeploy/Python code/fastdeploy/vision model/task name/external model/model name.py)
* Create content
* Create a ResNet class that inherits from FastDeployModel, and implement `__init__`, the pybind-bound functions (such as `predict()`), and the functions that assign and retrieve the global variables bound through pybind. Please refer to [resnet.py](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-a4dc5ec2d450e91f1c03819bf314c238b37ac678df56d7dea3aab7feac10a157) for details.

```python
class ResNet(FastDeployModel):
    def __init__(self, ...):
        self._model = C.vision.classification.ResNet(...)

    def predict(self, input_image, topk=1):
        return self._model.predict(input_image, topk)

    @property
    def size(self):
        return self._model.size

    @size.setter
    def size(self, wh):
        ...
```

* Import the ResNet class
* modify path
* FastDeploy/python/fastdeploy/vision/classification/\_\_init\_\_.py (FastDeploy/Python code/fastdeploy/vision model/task name/\_\_init\_\_.py)
* modify content

```Python
from .contrib.resnet import ResNet
```
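
With the class exported from `__init__.py`, the new model can be called like any other FastDeploy vision model. The following is a minimal usage sketch, assuming the Python wheel built in the [Test](#test) section below has been installed, that `resnet50.onnx` and a test image `test.jpg` are in the current directory, and that the constructor arguments follow the signature you defined in `resnet.py`:

```python
import cv2
import fastdeploy as fd

# Build the integrated model; backend and device can be customized via RuntimeOption
option = fd.RuntimeOption()
option.use_cpu()
model = fd.vision.classification.ResNet("resnet50.onnx", runtime_option=option)

# Run prediction on a single image and print the top-1 result
im = cv2.imread("test.jpg")
result = model.predict(im, topk=1)
print(result)
```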

## Test

### Compile

* C++
* Path: FastDeploy/

```
mkdir build && cd build
cmake .. -DENABLE_ORT_BACKEND=ON -DENABLE_VISION=ON -DCMAKE_INSTALL_PREFIX=${PWD}/fastdeploy-0.0.3 \
  -DENABLE_PADDLE_BACKEND=ON -DENABLE_TRT_BACKEND=ON -DWITH_GPU=ON -DTRT_DIRECTORY=/PATH/TO/TensorRT/
make -j8
make install
```

Compilation produces the SDK directory build/fastdeploy-0.0.3/.

* Python
* Path: FastDeploy/python/

```
export TRT_DIRECTORY=/PATH/TO/TensorRT/ #If TensorRT is used, developers need to fill in the location of TensorRT and enable ENABLE_TRT_BACKEND
export ENABLE_TRT_BACKEND=ON
export WITH_GPU=ON
export ENABLE_PADDLE_BACKEND=ON
export ENABLE_OPENVINO_BACKEND=ON
export ENABLE_VISION=ON
export ENABLE_ORT_BACKEND=ON
python setup.py build
python setup.py bdist_wheel
cd dist
pip install fastdeploy_gpu_python-<version>-cpxx-cpxxm-<system architecture>.whl
```

### Compile Test Code

* Create path: FastDeploy/examples/vision/classification/resnet/ (FastDeploy/examples/vision model/task name/model name/)
* Directory structure

```
.
├── cpp
│ ├── CMakeLists.txt
│ ├── infer.cc // C++ test code
│ └── README.md // C++ Readme
├── python
│ ├── infer.py // Python test code
│ └── README.md // Python Readme
└── README.md // ResNet model integration readme
```

* C++
* Write the CMakeLists.txt, C++ code, and README.md. Please refer to [cpp/](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-afcbe607b796509581f89e38b84190717f1eeda2df0419a2ac9034197ead5f96)
* Compile infer.cc
* Path: FastDeploy/examples/vision/classification/resnet/cpp/

```
mkdir build && cd build
cmake .. -DFASTDEPLOY_INSTALL_DIR=/PATH/TO/FastDeploy/build/fastdeploy-0.0.3/
make
```

* Python
* Please refer to [python/](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-5a0d6be8c603a8b81454ac14c17fb93555288d9adf92bbe40454449309700135) for the Python test code and README.md; a minimal sketch of such a script follows below.
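
As a reference for what the Python test code typically contains, here is a minimal `infer.py` sketch. The command-line flags and the image path are illustrative assumptions, not a copy of the linked example:

```python
import argparse

import cv2
import fastdeploy as fd

parser = argparse.ArgumentParser()
parser.add_argument("--model", required=True, help="Path to resnet50.onnx")
parser.add_argument("--image", required=True, help="Path to the test image")
parser.add_argument("--topk", type=int, default=1, help="Number of labels to return")
args = parser.parse_args()

# Build the model integrated in the previous sections and run prediction
model = fd.vision.classification.ResNet(args.model)
im = cv2.imread(args.image)
result = model.predict(im, topk=args.topk)
print(result)
```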

### Annotate the Code



To make the code clear for understanding, developers can annotate the newly-added code.

- C++ code
Developers need to add annotations for the functions and variables in the `resnet.h` file. Three annotation styles are shown below; please refer to [resnet.h](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-69128489e918f305c208476ba793d8167e77de2aa7cadf5dcbac30da448bd28e) for more details.

```C++
/** \brief Predict for the input "im", the result will be saved in "result".
*
* \param[in] im Input image for inference.
* \param[in] result Saving the inference result.
* \param[in] topk The length of the returned results, e.g., if topk==2, the result will include the 2 most likely class labels for the input image.
*/
virtual bool Predict(cv::Mat* im, ClassifyResult* result, int topk = 1);
/// Tuple of (width, height)
std::vector<int> size;
/*! @brief Initialize for ResNet model, assign values to the global variables and call InitRuntime()
*/
bool Initialize();
```

- Python
The following example demonstrates how to annotate the functions and variables in the `resnet.py` file. For more details, please refer to [resnet.py](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-a4dc5ec2d450e91f1c03819bf314c238b37ac678df56d7dea3aab7feac10a157).

```python
def predict(self, input_image, topk=1):
    """Classify an input image

    :param input_image: (numpy.ndarray) The input image data, 3-D array with layout HWC, BGR format
    :param topk: (int) The topk results ranked by the classification confidence score, default 1
    :return: ClassifyResult
    """
    return self._model.predict(input_image, topk)
```

Other files in the integration process can also be annotated to explain the details of the implementation.
29 changes: 29 additions & 0 deletions examples/vision/detection/yolov5/quantize/README_EN.md
@@ -0,0 +1,29 @@
# YOLOv5 Quantized Model Deployment

FastDeploy supports the deployment of quantized models and provides a one-click model quantization tool.
Users can either quantize and deploy models themselves with the one-click quantization tool, or directly download and deploy the quantized models provided by FastDeploy.

## FastDeploy One-Click Model Quantization Tool

FastDeploy provides a one-click quantization tool that allows users to quantize a model simply with a configuration file.
For a detailed tutorial, please refer to: [One-Click Model Quantization Tool](../../../../../tools/quantization/)

## Download Quantized YOLOv5s Model

Users can also directly download the quantized models in the table below for deployment.

| Model | Inference Backend | Hardware | FP32 Latency (ms) | INT8 Latency (ms) | Speedup | FP32 mAP | INT8 mAP | Method |
| ----------------------------------------------------------------------- | ----------------- | -------- | ------------------------- | -------------------------- | ------------------ | -------- | -------- | ------------------------------- |
| [YOLOv5s](https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s_quant.tar) | TensorRT | GPU | 8.79 | 5.17 | 1.70 | 37.6 | 36.6 | Quantized distillation training |
| [YOLOv5s](https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s_quant.tar) | Paddle Inference | CPU | 217.05 | 133.31 | 1.63 | 37.6 | 36.8 | Quantized distillation training |

The table above shows the end-to-end inference performance of FastDeploy deployments before and after model quantization.

- The test images are from COCO val2017.
- The reported latency is the inference latency on each runtime, in milliseconds.
- The speedup is the FP32 latency divided by the INT8 latency (e.g., 217.05 / 133.31 ≈ 1.63).
- The CPU is an Intel(R) Xeon(R) Gold 6271C and the GPU is a Tesla T4; the TensorRT version is 8.4.15; the number of CPU threads is fixed to 1 for all tests.

## More Detailed Tutorials

- [Python Deployment](python)
- [C++ Deployment](cpp)
57 changes: 57 additions & 0 deletions examples/vision/detection/yolov5/serving/README_EN.md
@@ -0,0 +1,57 @@
# YOLOv5 Serving Deployment Demo

## Launch Serving

```bash
#Download yolov5 model file
wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s.onnx

# Save the model under models/infer/1 and rename it as model.onnx
mv yolov5s.onnx models/infer/1/

# Pull fastdeploy image
docker pull paddlepaddle/fastdeploy:0.3.0-gpu-cuda11.4-trt8.4-21.10

# Start the docker container. The container is named fd_serving, and the current directory is mounted as the container's /yolov5_serving directory
nvidia-docker run -it --net=host --name fd_serving -v `pwd`/:/yolov5_serving paddlepaddle/fastdeploy:0.3.0-gpu-cuda11.4-trt8.4-21.10 bash

# Start the service (Without setting the CUDA_VISIBLE_DEVICES environment variable, it will have scheduling privileges for all GPU cards)
CUDA_VISIBLE_DEVICES=0 fastdeployserver --model-repository=models --backend-config=python,shm-default-byte-size=10485760
```

The following output indicates that the service has been launched successfully:

```
......
I0928 04:51:15.784517 206 grpc_server.cc:4117] Started GRPCInferenceService at 0.0.0.0:8001
I0928 04:51:15.785177 206 http_server.cc:2815] Started HTTPService at 0.0.0.0:8000
I0928 04:51:15.826578 206 http_server.cc:167] Started Metrics Service at 0.0.0.0:8002
```

## Client Requests

Execute the following commands on the host machine to send a gRPC request and print the result:

```
#Download test images
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg

#Installing client-side dependencies
python3 -m pip install tritonclient\[all\]

# Send requests
python3 yolov5_grpc_client.py
```

When the request succeeds, the result is returned in JSON format and printed:

```
output_name: detction_result
{'boxes': [[268.48028564453125, 81.05305480957031, 298.69476318359375, 169.43902587890625], [104.73116302490234, 45.66197204589844, 127.58382415771484, 93.44938659667969], [378.9093933105469, 39.75013732910156, 395.6086120605469, 84.24342346191406], [158.552978515625, 80.36149597167969, 199.18576049804688, 168.18191528320312], [414.37530517578125, 90.94805908203125, 506.3218994140625, 280.40521240234375], [364.00341796875, 56.608917236328125, 381.97857666015625, 115.96823120117188], [351.7251281738281, 42.635345458984375, 366.9103088378906, 98.04837036132812], [505.8882751464844, 114.36674499511719, 593.1248779296875, 275.99530029296875], [327.7086181640625, 38.36369323730469, 346.84991455078125, 80.89302062988281], [583.493408203125, 114.53289794921875, 612.3546142578125, 175.87353515625], [186.4706573486328, 44.941375732421875, 199.6645050048828, 61.037628173828125], [169.6158905029297, 48.01460266113281, 178.1415557861328, 60.88859558105469], [25.81019401550293, 117.19969177246094, 59.88878631591797, 152.85012817382812], [352.1452941894531, 46.71272277832031, 381.9460754394531, 106.75212097167969], [1.875, 150.734375, 37.96875, 173.78125], [464.65728759765625, 15.901412963867188, 472.512939453125, 34.11640930175781], [64.625, 135.171875, 84.5, 154.40625], [57.8125, 151.234375, 103.0, 174.15625], [165.890625, 88.609375, 527.90625, 339.953125], [101.40625, 152.5625, 118.890625, 169.140625]], 'scores': [0.8965693116188049, 0.8695310950279236, 0.8684297800064087, 0.8429877758026123, 0.8358422517776489, 0.8151364326477051, 0.8089362382888794, 0.801361083984375, 0.7947245836257935, 0.7606497406959534, 0.6325908303260803, 0.6139386892318726, 0.5906146764755249, 0.505328893661499, 0.40457233786582947, 0.3460320234298706, 0.33283042907714844, 0.3325657248497009, 0.2594234347343445, 0.25389009714126587], 'label_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 0, 24, 24, 33, 24], 'masks': [], 'contain_masks': False}
```
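
For reference, a gRPC client such as `yolov5_grpc_client.py` is typically built on the `tritonclient` package. The following is only a rough sketch of the request flow, not a copy of the linked script; the model name, tensor names, and data type here are assumptions and must match what the server actually reports (e.g., via its model metadata):

```python
import cv2
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Load the test image as a raw uint8 array (assumed input layout)
image = cv2.imread("000000014439.jpg")
data = np.expand_dims(image, axis=0).astype(np.uint8)

# "yolov5", "INPUT" and "detction_result" are illustrative names; query the
# server's model metadata for the real ones before sending requests.
inputs = [grpcclient.InferInput("INPUT", data.shape, "UINT8")]
inputs[0].set_data_from_numpy(data)
outputs = [grpcclient.InferRequestedOutput("detction_result")]

response = client.infer(model_name="yolov5", inputs=inputs, outputs=outputs)
print(response.as_numpy("detction_result"))
```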

## Modify Configs



By default, the service runs ONNX Runtime on GPU. If developers need to run it on CPU or with other inference engines, please refer to the [configuration document](../../../../../serving/docs/zh_CN/model_configuration.md) and modify the configuration in `models/runtime/config.pbtxt`.