Add Image Text Retrieval taskflow&pipelines API #4516

w5688414 · 2023-01-18T13:00:47Z

PR types

PR changes

Description

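Adds a Taskflow API for image-text vector retrieval, supporting the ERNIE-ViL 2.0, ChineseCLIP, and CLIP models. It will later be used for the cross-modal search application in Pipelines.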

from paddlenlp import Taskflow
from PIL import Image
import paddle.nn.functional as F

# Multimodal feature extraction; the default model is ernie_vil-2.0-base-zh
vision_language = Taskflow("feature_extraction")

# Image input: the result is a dict whose "features" entry holds the embeddings.
# Images and texts are passed in separate calls.
image_embeds = vision_language([Image.open("demo/000000039769.jpg")])
image_features = image_embeds["features"]
print(image_features)
'''
Tensor(shape=[1, 768], dtype=float32, place=Place(gpu:0), stop_gradient=True,
        [[-0.59475428, -0.69795364,  0.22144008,  0.88066685, -0.58184201,
            -0.73454666,  0.95557910, -0.61410815,  0.23474170,  0.13301648,
            0.86196446,  0.12281934,  0.69097638,  1.47614217,  0.07238606,
            ...

'''
# Text input: "a photo of a cat", "a photo of a dog"
text_embeds = vision_language(["猫的照片", "狗的照片"])
text_features = text_embeds["features"]
print(text_features)
'''
Tensor(shape=[2, 768], dtype=float32, place=Place(gpu:0), stop_gradient=True,
        [[ 0.04250504, -0.41429776,  0.26163983, ...,  0.26221892,
            0.34387422,  0.18779707],
            ...
'''
# L2-normalize both embeddings, then score the image against each caption
image_features /= image_features.norm(axis=-1, keepdim=True)
text_features /= text_features.norm(axis=-1, keepdim=True)
logits_per_image = 100 * image_features @ text_features.t()
probs = F.softmax(logits_per_image, axis=-1)
print(probs)
'''
Tensor(shape=[1, 2], dtype=float32, place=Place(gpu:0), stop_gradient=True,
    [[0.99833173, 0.00166824]])
'''
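The normalization and scoring lines at the end of the example turn raw embeddings into match probabilities: L2-normalize each vector, take dot products (cosine similarity), scale by 100, and apply a softmax over the candidate texts. The pure-Python sketch below reproduces that arithmetic without Paddle so the math can be checked in isolation. The helper names are hypothetical and the toy 3-d vectors are made up, not model outputs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (what dividing by features.norm(...) does)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def match_probs(image_vec, text_vecs, logit_scale=100.0):
    """Hypothetical helper: probability that one image matches each candidate text."""
    img = l2_normalize(image_vec)
    logits = []
    for t in text_vecs:
        txt = l2_normalize(t)
        cosine = sum(a * b for a, b in zip(img, txt))  # dot product of unit vectors
        logits.append(logit_scale * cosine)
    return softmax(logits)


# Toy 3-d vectors standing in for the 768-d embeddings.
probs = match_probs([1.0, 0.2, 0.0], [[0.9, 0.1, 0.0], [0.0, 0.1, 1.0]])
print(probs)
```

The factor of 100 sharpens the softmax: cosine similarities lie in [-1, 1], so without scaling the probabilities would stay close to uniform.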

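1. Scenario Overview

The goal of a text-to-image cross-modal retrieval system is to find the images that best match a text description. Traditional approaches match labels against image keywords, whereas cross-modal retrieval matches the semantics of the text directly against the semantic content of the image. This style of retrieval is closer to how people judge relevance and works end to end, without a hand-built tagging step. Text-image retrieval can be applied to e-commerce search, security video, image retrieval, short-video platforms such as Douyin, and search in travel apps, where it improves efficiency and the search experience. Other potential areas include online investigation and evidence collection for the judiciary, infringement detection, data augmentation, copy matching, retrieval on image websites (logos, portraits, landscapes, posters, and so on), and text-to-image search in specialized domains such as medicine.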

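2. Product Features

This project lets you build an end-to-end text-to-image cross-modal retrieval system at low cost. Users only need to prepare their own business data; they can then use the cross-modal retrieval models shipped with this project to quickly build a retrieval system over that data and expose it as a web service.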
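In a dual-tower setup like this one, image embeddings are computed ahead of time and a text query is embedded at search time; retrieval then ranks images by cosine similarity. The sketch below shows that ranking step with a brute-force in-memory index. It assumes the vectors were already produced by the feature_extraction Taskflow (one call for images, one for the query text). `BruteForceImageIndex` is hypothetical; a production system would typically use an approximate-nearest-neighbor (ANN) index instead of scanning every vector.

```python
import heapq
import math


def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


class BruteForceImageIndex:
    """Hypothetical in-memory index: stores unit-length image embeddings by id."""

    def __init__(self):
        self._vectors = {}

    def add(self, image_id, embedding):
        self._vectors[image_id] = normalize(embedding)

    def search(self, query_embedding, top_k=3):
        """Return [(image_id, cosine_score)] for the top_k most similar images."""
        q = normalize(query_embedding)
        scored = (
            (sum(a * b for a, b in zip(q, vec)), image_id)
            for image_id, vec in self._vectors.items()
        )
        best = heapq.nlargest(top_k, scored)
        return [(image_id, score) for score, image_id in best]


# Toy embeddings; real ones would come from the image tower.
index = BruteForceImageIndex()
index.add("cat.jpg", [0.9, 0.1, 0.0])
index.add("dog.jpg", [0.1, 0.9, 0.0])
index.add("car.jpg", [0.0, 0.1, 0.9])
print(index.search([1.0, 0.0, 0.0], top_k=2))
```

Normalizing once at insertion time means each query only needs dot products, which is the same reason the Taskflow example normalizes before multiplying.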

paddle-bot · 2023-01-18T13:00:51Z

Thanks for your contribution!

JunnYu · 2023-01-18T23:55:39Z

paddlenlp/taskflow/taskflow.py

+    "vision_language": {
+        "models": {
+            "PaddlePaddle/ernie_vil-2.0-base-zh": {
+                "task_class": VisionLanguageTask,
+                "task_flag": "vision_language_embeddings-2.0-base-zh",
+            },
+        },
+        "default": {"model": "PaddlePaddle/ernie_vil-2.0-base-zh"},
+    },


Besides ernie_vil, it seems CLIP and ChineseCLIP could also be hooked up here.

Added.
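The registry entry above maps each model name to a task class and a `task_flag`, with a `default` model used when the caller names none; supporting CLIP and ChineseCLIP means adding more entries under `"models"`. The sketch below shows how such a table could resolve a request, assuming lookup works roughly this way. `build_task` and the stub class are hypothetical and are not PaddleNLP's actual Taskflow code.

```python
class VisionLanguageTask:
    """Stand-in for the real task class; it only records its constructor arguments."""

    def __init__(self, task, model, **kwargs):
        self.task = task
        self.model = model
        self.kwargs = kwargs


TASKS = {
    "vision_language": {
        "models": {
            "PaddlePaddle/ernie_vil-2.0-base-zh": {
                "task_class": VisionLanguageTask,
                "task_flag": "vision_language_embeddings-2.0-base-zh",
            },
        },
        "default": {"model": "PaddlePaddle/ernie_vil-2.0-base-zh"},
    },
}


def build_task(task_name, model=None, **kwargs):
    """Hypothetical resolver: fall back to the default model, then build its task class."""
    entry = TASKS[task_name]
    model = model or entry["default"]["model"]
    if model not in entry["models"]:
        raise ValueError(f"Unsupported model {model!r} for task {task_name!r}")
    spec = entry["models"][model]
    return spec["task_class"](task=task_name, model=model, task_flag=spec["task_flag"], **kwargs)


task = build_task("vision_language")
print(task.model, task.kwargs["task_flag"])
```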

sijunhe · 2023-01-19T02:59:34Z

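This Taskflow needs some design work. A few points to consider:

The name vision_language is too generic; it should be more specific.
If this is for matching, the API should stay as consistent as possible with the existing matching Taskflow, so that unimodal and multimodal retrieval offer a consistent experience later on.
@linjieccc is upgrading the Taskflow design to clean up its logic and make it more general. Let's discuss after the Lunar New Year.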

w5688414 · 2023-01-19T07:26:36Z

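This Taskflow needs some design work. A few points to consider:

The name vision_language is too generic; it should be more specific.

If this is for matching, the API should stay as consistent as possible with the existing matching Taskflow, so that unimodal and multimodal retrieval offer a consistent experience later on.

@linjieccc is upgrading the Taskflow design to clean up its logic and make it more general. Let's discuss after the Lunar New Year.

Agreed. This API will later be used in Pipelines for the image-text search application, and the current design has many open points. Let's discuss after the New Year, make the design more general, and then continue refining it.
The semantic indexing models will also be integrated as Taskflows and then used in Pipelines, so it is worth looking at the overall design together.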


codecov · 2023-01-31T12:26:22Z

Codecov Report

Merging #4516 (1b2e89d) into develop (96b2493) will increase coverage by 0.17%.
The diff coverage is 73.92%.

@@             Coverage Diff             @@
##           develop    #4516      +/-   ##
===========================================
+ Coverage    43.98%   44.15%   +0.17%     
===========================================
  Files          440      443       +3     
  Lines        63145    63446     +301     
===========================================
+ Hits         27773    28014     +241     
- Misses       35372    35432      +60

Impacted Files	Coverage Δ
paddlenlp/taskflow/task.py	`51.98% <25.00%> (+5.27%)`	⬆️
paddlenlp/transformers/auto/processing.py	`53.16% <53.16%> (ø)`
paddlenlp/taskflow/feature_extraction.py	`84.02% <84.02%> (ø)`
paddlenlp/taskflow/taskflow.py	`80.45% <100.00%> (+3.71%)`	⬆️
paddlenlp/transformers/__init__.py	`100.00% <100.00%> (ø)`
paddlenlp/transformers/chineseclip/procesing.py	`40.00% <100.00%> (+1.22%)`	⬆️
paddlenlp/transformers/clip/procesing.py	`40.00% <100.00%> (+1.22%)`	⬆️
paddlenlp/transformers/ernie_vil/procesing.py	`82.00% <100.00%> (+0.36%)`	⬆️
paddlenlp/datasets/imdb.py	`41.66% <0.00%> (-4.49%)`	⬇️
paddlenlp/datasets/poetry.py	`45.16% <0.00%> (-3.33%)`	⬇️
... and 40 more


wawltor · 2023-02-01T13:46:33Z

paddlenlp/transformers/auto/processing.py

@@ -0,0 +1,175 @@
+# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
+#


It would be best to include the Hugging Face license here.

wawltor · 2023-02-01T13:48:38Z

paddlenlp/taskflow/image_text_retrieval.py

+        """
+        self._input_spec = [
+            paddle.static.InputSpec(shape=[None, None], dtype="int64", name="input_ids"),
+        ]


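The task design here doesn't quite fit Taskflow's positioning. A Taskflow is defined by its task type; here that task is deciding whether an image and a text match and outputting the match probability, and that must be supported. If image and text features also need to be returned, that can go through a separate interface, for example a flag that makes the task output text features and image features.

My understanding is that this is for recall, so it can only output embeddings from the dual towers; there will be a separate Taskflow for text-image matching.

The API has been redesigned; it is now positioned as feature extraction.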

wawltor · 2023-02-01T13:50:04Z

paddlenlp/taskflow/taskflow.py

@@ -486,6 +487,75 @@
        },
        "default": {"model": "utc-large"},
    },
+    "image_text_retrieval": {


For the task design, check whether Hugging Face pipelines have a similar task.

Followed the feature-extraction implementation.

sijunhe · 2023-02-02T02:19:32Z

paddlenlp/taskflow/image_text_retrieval.py

+            if "input_ids" in batch_inputs:
+                text_features = self._model.get_text_features(input_ids=batch_inputs["input_ids"])
+                all_feats.append(text_features)
+            if "pixel_values" in batch_inputs:
+                image_features = self._model.get_image_features(pixel_values=batch_inputs["pixel_values"])
+                all_feats.append(image_features)
+        inputs.update({"features": all_feats})


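Would it be better to store the features separately here? With image-only, text-only, and mixed image-and-text inputs, can the features still be told apart?

When both images and texts are present, we recommend that users split them and call the task twice, once for images and once for texts. Accepting both kinds of input in one call makes batch assembly awkward, so that case is not supported.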
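The concern is that a single `features` list loses track of which entries came from which tower. A minimal sketch of the reviewer's alternative, keeping each modality under its own key, is shown below; the merged PR instead asks callers to pass one modality per call, which sidesteps the problem. `extract_features` is hypothetical, and the two callables stand in for `get_text_features` and `get_image_features`.

```python
def extract_features(batches, encode_text, encode_image):
    """Hypothetical variant of the loop above that keeps each modality separate.

    batches: list of dicts that may contain "input_ids" and/or "pixel_values".
    encode_text / encode_image: callables standing in for the model's
    get_text_features / get_image_features.
    """
    outputs = {"text_features": [], "image_features": []}
    for batch_inputs in batches:
        if "input_ids" in batch_inputs:
            outputs["text_features"].append(encode_text(batch_inputs["input_ids"]))
        if "pixel_values" in batch_inputs:
            outputs["image_features"].append(encode_image(batch_inputs["pixel_values"]))
    return outputs


# Toy encoders that return (modality, input length) so the routing is easy to see.
result = extract_features(
    [{"input_ids": [1, 2, 3]}, {"pixel_values": [0.1, 0.2]}],
    encode_text=lambda ids: ("text", len(ids)),
    encode_image=lambda px: ("image", len(px)),
)
print(result)
```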

sijunhe · 2023-02-02T02:25:49Z

paddlenlp/taskflow/image_text_retrieval.py

+        """
+        self._input_spec = [
+            paddle.static.InputSpec(shape=[None, None], dtype="int64", name="input_ids"),
+        ]


My understanding is that this is for recall, so it can only output embeddings from the dual towers; there will be a separate Taskflow for text-image matching.

sijunhe · 2023-02-02T02:28:32Z

paddlenlp/taskflow/image_text_retrieval.py

+from .task import Task
+
+
+class ImageTextRetrievalTask(Task):


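I don't think ImageTextRetrievalTask is quite the right name. Since this is for retrieval, each call only runs one of the two towers and produces embeddings for ANN search. In that case it would be better to write a single-input FeatureExtractionTask, so the task can run CLIP-style cross-modal models as well as text-only recall models.

Renamed to MultimodalFeatureExtractionTask.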

wawltor · 2023-02-07T09:19:56Z

paddlenlp/taskflow/feature_extraction.py

+        """
+        inputs = self._check_input_text(inputs)
+        batches = self._batchify(inputs, self._batch_size)
+        outputs = {"batches": batches, "text": inputs}


The key name `batches` is a bit ambiguous; maybe something like `features`?

Discussed offline; keeping it unchanged.

wawltor · 2023-02-07T09:22:57Z

paddlenlp/taskflow/feature_extraction.py

+            self.output_handle_map["image"] = self.output_handle
+            self._config_map["image"] = self._config
+        else:
+            self._prepare_onnx_mode()


Since both the inputs and outputs have changed, ONNX mode apparently needs updating as well, and FP16 mode is also needed.

wawltor · 2023-02-08T06:48:03Z

pipelines/examples/image_text_retrieval/run_search_web.sh

@@ -0,0 +1,18 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.


wawltor · 2023-02-08T06:54:05Z

pipelines/pipelines/nodes/retriever/embedder.py

+            sizes = {model.embedding_dim for model in self.models.values()}
+            if None in sizes:
+                logger.warning(
+                    "Haystack could not find the output embedding dimensions for '%s'. "


wawltor · 2023-02-08T06:56:21Z

pipelines/pipelines/nodes/retriever/embedder.py

+
+        self.models = {}  # replace str with ContentTypes starting from Python3.8
+        for content_type, embedding_model in embedding_models.items():
+            self.models[content_type] = Taskflow("feature_extraction", model=embedding_model)


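A question here: the Taskflow output is already a dict containing the multimodal information, so why pass in another dict?

Made a small change. The reason is that the design is meant to support multiple modalities, such as text, image, video, audio, and table, and each type loads its own embedding encoder.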
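The reply explains why the embedder takes a dict: each content type gets its own encoder. The sketch below shows one way such an embedder could route a mixed list of documents, batching items per content type and restoring the original order. `MultiModalEmbedderSketch` is hypothetical; in the PR each value in `self.models` is a `Taskflow("feature_extraction", model=...)`, whereas here plain callables stand in for them.

```python
class MultiModalEmbedderSketch:
    """Hypothetical embedder holding one encoder per content type."""

    def __init__(self, embedding_models):
        # content_type -> callable mapping a list of items to a list of vectors
        self.models = dict(embedding_models)

    def embed(self, documents):
        """documents: list of (content_type, content). Returns vectors in input order."""
        grouped = {}
        for position, (content_type, content) in enumerate(documents):
            if content_type not in self.models:
                raise ValueError(f"No embedding model for content type {content_type!r}")
            grouped.setdefault(content_type, []).append((position, content))
        vectors = [None] * len(documents)
        # One batched call per content type, then scatter results back into place.
        for content_type, items in grouped.items():
            embedded = self.models[content_type]([content for _, content in items])
            for (position, _), vec in zip(items, embedded):
                vectors[position] = vec
        return vectors


embedder = MultiModalEmbedderSketch(
    {
        "text": lambda xs: [[float(len(x))] for x in xs],  # toy text encoder
        "image": lambda xs: [[-1.0] for _ in xs],          # toy image encoder
    }
)
print(embedder.embed([("text", "ab"), ("image", "img0"), ("text", "abc")]))
```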

wawltor · 2023-02-08T06:59:03Z

pipelines/pipelines/nodes/retriever/embedder.py

+        feature_extractors_params = {
+            content_type: {"max_length": 256, **(feature_extractors_params or {}).get(content_type, {})}
+            for content_type in ["text", "table", "image", "audio"]  # FIXME get_args(ContentTypes) from Python3.8 on
+        }


If audio isn't supported yet, wouldn't it be better to remove it?
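The dict comprehension above gives every content type a default `max_length` of 256 and lets per-type user settings override it, because the user's keys are unpacked after the default. The sketch below isolates that merge. It leaves `"audio"` out of the default content types, following the review suggestion; whether the merged code did the same is not shown here. `build_feature_extractor_params` is a hypothetical name.

```python
def build_feature_extractor_params(user_params=None, content_types=("text", "table", "image")):
    """Hypothetical helper: per-content-type kwargs with an overridable max_length default."""
    user_params = user_params or {}
    return {
        # Keys from user_params come last, so they override the default.
        content_type: {"max_length": 256, **user_params.get(content_type, {})}
        for content_type in content_types
    }


params = build_feature_extractor_params({"image": {"max_length": 77}})
print(params)
```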


linjieccc · 2023-02-13T04:41:12Z

paddlenlp/taskflow/taskflow.py

@@ -514,6 +515,75 @@
        },
        "default": {"model": "utc-large"},
    },
+    "feature_extraction": {
+        "models": {
+            "PaddlePaddle/ernie_vil-2.0-base-zh": {


Could the ERNIE-series model name here be simplified to ernie_vil-2.0-base-zh?

Discussed with Yu Jun: ernie_vil 2.0 was uploaded as a community model.

linjieccc · 2023-02-13T04:42:28Z

paddlenlp/taskflow/taskflow.py

+        "models": {
+            "PaddlePaddle/ernie_vil-2.0-base-zh": {
+                "task_class": MultimodalFeatureExtractionTask,
+                "task_flag": "image_text_retrieval-2.0-base-zh",


image_text_retrieval-2.0-base-zh -> feature_extraction-ernie_vil-2.0-base-zh

Suggest following the <task_name>-<model_name> form.
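A small helper makes the suggested convention concrete. `make_task_flag` is hypothetical, and it assumes the organization prefix (such as `PaddlePaddle/`) is dropped, as in the reviewer's example `feature_extraction-ernie_vil-2.0-base-zh`.

```python
def make_task_flag(task_name, model_name):
    """Hypothetical helper for the <task_name>-<model_name> convention.

    Assumes the organization prefix before "/" is dropped from the model name.
    """
    short_name = model_name.split("/")[-1]
    return f"{task_name}-{short_name}"


print(make_task_flag("feature_extraction", "PaddlePaddle/ernie_vil-2.0-base-zh"))
# prints feature_extraction-ernie_vil-2.0-base-zh
```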

linjieccc · 2023-02-13T04:49:42Z

docs/model_zoo/taskflow.md

+>>> from paddlenlp import Taskflow
+>>> from PIL import Image
+>>> import paddle.nn.functional as F
+# 单条输入


The task instantiation code seems to be missing here, e.g. task = Taskflow("feature_extraction")...

linjieccc · 2023-02-13T04:50:14Z

paddlenlp/taskflow/feature_extraction.py

+            from PIL import Image
+
+            # multi modal feature_extraction with ernie_vil-2.0-base-zh
+            senta = Taskflow("feature_extraction")


Suggest replacing `senta` with a different name.

That was a typo; fixed.

wawltor

LGTM

linjieccc · 2023-02-13T06:05:21Z

paddlenlp/taskflow/task.py

@@ -61,6 +61,8 @@ def __init__(self, model, task, priority_path=None, **kwargs):
        self._home_path = self.kwargs["home_path"] if "home_path" in self.kwargs else PPNLP_HOME
        self._task_flag = self.kwargs["task_flag"] if "task_flag" in self.kwargs else self.model
        self.from_hf_hub = kwargs.pop("from_hf_hub", False)
+        # Add mode flag for onnx output path redirection
+        self.mode = None


Suggest a different variable name here; `mode` is already used in the word segmentation and NER tasks.

Renamed to export_type.

linjieccc · 2023-02-14T06:59:49Z

paddlenlp/taskflow/feature_extraction.py

@@ -182,12 +182,13 @@ class MultimodalFeatureExtractionTask(Task):
        },
    }

-    def __init__(self, task, model, batch_size=1, _static_mode=True, **kwargs):
+    def __init__(self, task, model, batch_size=1, _static_mode=True, return_tensors=True, **kwargs):


Suggest renaming _static_mode to is_static_model, to be consistent with the other tasks.

linjieccc

LGTM for Taskflow

Add vision language taskflow API

e29e4a5

w5688414 self-assigned this Jan 18, 2023

JunnYu reviewed Jan 18, 2023


Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP into vl1

b3c7d0a

w5688414 changed the title ~~Add vision language taskflow API~~ Add Image Text Retrieval taskflow API Jan 31, 2023

Update image text retrieval taskflow api

88ea9fb

w5688414 requested review from sijunhe, wawltor and JunnYu January 31, 2023 12:07

wawltor reviewed Feb 1, 2023


Add multimodal_retriever of pipelines

2970900

w5688414 changed the title ~~Add Image Text Retrieval taskflow API~~ Add Image Text Retrieval taskflow&pipelines API Feb 1, 2023

sijunhe reviewed Feb 2, 2023


w5688414 added 2 commits February 3, 2023 11:06

Add image text retrieval pipelines application

6840d71

Change image text retrieval to feature extraction

3bcb212

wawltor reviewed Feb 8, 2023


w5688414 added 4 commits February 9, 2023 06:02

Add feature extraction docs

dd4c83b

Add onnx support

f1c08b9

Fix some bugs and remove unused comments

5840835

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP into vl1

90d0a66

linjieccc reviewed Feb 13, 2023


w5688414 added 2 commits February 13, 2023 05:26

fix some errors and adjust onnx ouput config

932cd02

Update docs

70cf0d8

wawltor previously approved these changes Feb 13, 2023


linjieccc reviewed Feb 13, 2023


Add taskflow loading finetune model

858a686

w5688414 dismissed wawltor’s stale review via 858a686 February 13, 2023 07:35

w5688414 added 6 commits February 13, 2023 08:10

Rename mode to export_type

2651dae

Remove clip english models

f8fcbae

Add unit test for feature extraction taskflow

d6c5638

set delta to 1e-5

0cd4588

change delta to 1e-5

75d72dc

change delta to 1e-5

d07f8c4

linjieccc reviewed Feb 14, 2023


Change to is_static_model

1b2e89d

linjieccc approved these changes Feb 14, 2023


w5688414 merged commit 2992a81 into PaddlePaddle:develop Feb 14, 2023

w5688414 mentioned this pull request Feb 17, 2023

PaddleNLP 2.5.1 Release Note Candidate #4852

Closed

		@@ -0,0 +1,175 @@
		# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
		#

		@@ -0,0 +1,18 @@
		# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.

Add Image Text Retrieval taskflow&pipelines API #4516

Add Image Text Retrieval taskflow&pipelines API #4516

Conversation

w5688414 commented Jan 18, 2023 • edited Loading

PR types

PR changes

Description

1. 场景概述

2. 产品功能介绍

paddle-bot bot commented Jan 18, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sijunhe commented Jan 19, 2023

w5688414 commented Jan 19, 2023 • edited Loading

codecov bot commented Jan 31, 2023 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wawltor Feb 1, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

w5688414 Feb 10, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

linjieccc Feb 13, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wawltor left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

linjieccc left a comment

Choose a reason for hiding this comment

w5688414 commented Jan 18, 2023 •

edited

Loading

w5688414 commented Jan 19, 2023 •

edited

Loading

codecov bot commented Jan 31, 2023 •

edited

Loading

wawltor Feb 1, 2023 •

edited

Loading

w5688414 Feb 10, 2023 •

edited

Loading

linjieccc Feb 13, 2023 •

edited

Loading