Add Image Text Retrieval taskflow&pipelines API #4516

Merged
merged 20 commits into PaddlePaddle:develop on Feb 14, 2023

w5688414
Contributor

@w5688414 w5688414 commented Jan 18, 2023

PR types

PR changes

Description

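  • Adds a Taskflow image-text embedding retrieval API that supports the ERNIE-ViL 2.0, ChineseCLIP, and CLIP models; it will later be used for cross-modal search applications in Pipelines.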
from paddlenlp import Taskflow
from PIL import Image
import paddle.nn.functional as F

# multimodal feature extraction with ernie_vil-2.0-base-zh
vision_language = Taskflow("feature_extraction")
image_embeds = vision_language([Image.open("demo/000000039769.jpg")])
image_features = image_embeds["features"]
print(image_features)
'''
Tensor(shape=[1, 768], dtype=float32, place=Place(gpu:0), stop_gradient=True,
        [[-0.59475428, -0.69795364,  0.22144008,  0.88066685, -0.58184201,
            -0.73454666,  0.95557910, -0.61410815,  0.23474170,  0.13301648,
            0.86196446,  0.12281934,  0.69097638,  1.47614217,  0.07238606,
            ...

'''
# Candidate texts: "photo of a cat", "photo of a dog"
text_embeds = vision_language(["猫的照片","狗的照片"])
text_features = text_embeds["features"]
print(text_features)
'''
Tensor(shape=[2, 768], dtype=float32, place=Place(gpu:0), stop_gradient=True,
        [[ 0.04250504, -0.41429776,  0.26163983, ...,  0.26221892,
            0.34387422,  0.18779707],
'''
# L2-normalize both sides so the matrix product yields cosine similarities
image_features /= image_features.norm(axis=-1, keepdim=True)
text_features /= text_features.norm(axis=-1, keepdim=True)
# Scale by 100, then softmax over the candidate texts to get match probabilities
logits_per_image = 100 * image_features @ text_features.t()
probs = F.softmax(logits_per_image, axis=-1)
print(probs)
'''
Tensor(shape=[1, 2], dtype=float32, place=Place(gpu:0), stop_gradient=True,
    [[0.99833173, 0.00166824]])
'''
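The last step of the example turns two sets of embeddings into probabilities: each vector is L2-normalized, so the matrix product gives cosine similarities, which are multiplied by the fixed scale of 100 used above and passed through a softmax over the candidate texts. A minimal pure-Python sketch of that arithmetic, using small made-up vectors instead of real model outputs:

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (what x / x.norm(axis=-1, keepdim=True) does per row)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def match_probs(image_vec, text_vecs, logit_scale=100.0):
    """Probability that one image matches each candidate text."""
    img = l2_normalize(image_vec)
    logits = []
    for text_vec in text_vecs:
        txt = l2_normalize(text_vec)
        cosine = sum(a * b for a, b in zip(img, txt))  # dot product of unit vectors
        logits.append(logit_scale * cosine)
    return softmax(logits)


# Toy 3-d vectors for illustration only; the real embeddings are 768-d.
image = [0.9, 0.1, 0.0]
texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # stand-ins for "cat photo", "dog photo"
probs = match_probs(image, texts)
print(probs)
```

Because of the scale factor, small cosine gaps become large logit gaps: the 0.998 / 0.002 split in the example output corresponds to a cosine difference of only about 0.064 between the two texts.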

1. Scenario overview

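A text-to-image cross-modal retrieval system aims to find the images that best match a text description. Traditional approaches match tags against image keywords, whereas cross-modal retrieval actually matches the semantics of the text against the semantic content of the image. This kind of search is closer to how people judge relevance and is genuinely end-to-end AI. Text-image retrieval can be applied widely in e-commerce search, security and surveillance video, image retrieval, short-video platforms such as Douyin, and search in travel apps, where it helps improve efficiency and the search experience. Other potential areas include online investigation and evidence collection for judicial purposes, infringement detection, data augmentation, copywriting matching, search on image websites (logos, portraits, landscapes, posters, and so on), and text-image search in specialized fields such as medicine.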

2. Product features

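This project makes it possible to build an end-to-end text-image cross-modal retrieval system at low cost. Users only need to prepare their own business data; they can then use the preset text-image cross-modal retrieval models in this project to quickly build a retrieval system over that data and serve it as a web-based product.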
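At query time, a system like this embeds the text query once and ranks precomputed image embeddings by similarity. The sketch below uses an exact brute-force scan over a small in-memory index as a stand-in for the approximate-nearest-neighbour (ANN) index a real deployment would typically use; the function name top_k_images and the toy vectors are hypothetical.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_images(query_vec, index, k=2):
    """Return the k (image_id, score) pairs most similar to the query embedding.

    index maps an image id to its precomputed embedding. This is an exact
    linear scan; production systems usually replace it with an ANN index.
    """
    scored = ((image_id, cosine(query_vec, vec)) for image_id, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical image embeddings computed offline.
index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]  # embedding of a text query such as "a photo of a cat"
results = top_k_images(query, index, k=2)
print(results)
```

Only the text tower runs per query; the image tower runs once per image when the index is built, which is what keeps dual-tower retrieval cheap at serving time.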

@paddle-bot

paddle-bot bot commented Jan 18, 2023

Thanks for your contribution!

@w5688414 w5688414 self-assigned this Jan 18, 2023
Comment on lines 490 to 498
"vision_language": {
"models": {
"PaddlePaddle/ernie_vil-2.0-base-zh": {
"task_class": VisionLanguageTask,
"task_flag": "vision_language_embeddings-2.0-base-zh",
},
},
"default": {"model": "PaddlePaddle/ernie_vil-2.0-base-zh"},
},
Member

Besides ernie_vil, it seems CLIP and ChineseCLIP could also be hooked up here.

Contributor Author

Added.
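The "vision_language" entry quoted above has the shape of a Taskflow task registry: a per-task "models" table keyed by model name plus a "default" model. The hypothetical helper resolve_task_config below sketches how such a table could be resolved; the task class is replaced by a string so the snippet runs on its own, and this is an illustration rather than PaddleNLP's actual lookup code.

```python
TASKS = {
    "vision_language": {
        "models": {
            "PaddlePaddle/ernie_vil-2.0-base-zh": {
                "task_class": "VisionLanguageTask",  # a class object in the real registry
                "task_flag": "vision_language_embeddings-2.0-base-zh",
            },
        },
        "default": {"model": "PaddlePaddle/ernie_vil-2.0-base-zh"},
    },
}


def resolve_task_config(task, model=None):
    """Hypothetical lookup: fall back to the task's default model when none is given."""
    if task not in TASKS:
        raise KeyError(f"Unknown task: {task!r}")
    entry = TASKS[task]
    model = model or entry["default"]["model"]
    if model not in entry["models"]:
        supported = ", ".join(sorted(entry["models"]))
        raise ValueError(f"Model {model!r} is not supported for {task!r}; choose from: {supported}")
    return model, entry["models"][model]


model_name, config = resolve_task_config("vision_language")
print(model_name, config["task_flag"])
```

Supporting CLIP and ChineseCLIP, as the reviewer asks, would then just mean adding more entries under "models".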

@sijunhe
Collaborator

sijunhe commented Jan 19, 2023

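This taskflow needs some design work. A few points to consider:

  1. The name vision_language is too generic; it should be more specific.
  2. If this is for matching, its API should stay as consistent as possible with the matching taskflow, so that single-modal and multimodal retrieval offer a consistent experience later on.
  3. @linjieccc is working on an upgrade of the Taskflow design, to streamline the Taskflow logic and make it more general. We can discuss it after the New Year.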

@w5688414
Copy link
Contributor Author

w5688414 commented Jan 19, 2023

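Agreed. This API will later be used in Pipelines for an image-text search application, and the current design has many points that still need discussion. We can talk after the New Year, make the design more general, and then keep refining it.
Semantic-indexing models will also be integrated as Taskflows later and then used in Pipelines, so it is worth reviewing the design as a whole.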

@w5688414 w5688414 changed the title Add vision language taskflow API Add Image Text Retrieval taskflow API Jan 31, 2023
@codecov

codecov bot commented Jan 31, 2023

Codecov Report

Merging #4516 (1b2e89d) into develop (96b2493) will increase coverage by 0.17%.
The diff coverage is 73.92%.

@@             Coverage Diff             @@
##           develop    #4516      +/-   ##
===========================================
+ Coverage    43.98%   44.15%   +0.17%     
===========================================
  Files          440      443       +3     
  Lines        63145    63446     +301     
===========================================
+ Hits         27773    28014     +241     
- Misses       35372    35432      +60     
Impacted Files Coverage Δ
paddlenlp/taskflow/task.py 51.98% <25.00%> (+5.27%) ⬆️
paddlenlp/transformers/auto/processing.py 53.16% <53.16%> (ø)
paddlenlp/taskflow/feature_extraction.py 84.02% <84.02%> (ø)
paddlenlp/taskflow/taskflow.py 80.45% <100.00%> (+3.71%) ⬆️
paddlenlp/transformers/__init__.py 100.00% <100.00%> (ø)
paddlenlp/transformers/chineseclip/procesing.py 40.00% <100.00%> (+1.22%) ⬆️
paddlenlp/transformers/clip/procesing.py 40.00% <100.00%> (+1.22%) ⬆️
paddlenlp/transformers/ernie_vil/procesing.py 82.00% <100.00%> (+0.36%) ⬆️
paddlenlp/datasets/imdb.py 41.66% <0.00%> (-4.49%) ⬇️
paddlenlp/datasets/poetry.py 45.16% <0.00%> (-3.33%) ⬇️
... and 40 more
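The coverage percentages in the report are simply hit lines divided by tracked lines, so they can be checked against the Hits and Lines rows of the Coverage Diff table:

```python
# Values copied from the Coverage Diff table above.
base = {"lines": 63145, "hits": 27773}  # develop
head = {"lines": 63446, "hits": 28014}  # #4516


def coverage_pct(stats):
    """Coverage as a percentage, rounded to two decimals like the report."""
    return round(100 * stats["hits"] / stats["lines"], 2)


base_cov = coverage_pct(base)
head_cov = coverage_pct(head)
delta = round(head_cov - base_cov, 2)
# Misses are the tracked lines that were not hit.
misses = (base["lines"] - base["hits"], head["lines"] - head["hits"])
print(base_cov, head_cov, delta, misses)
```

This reproduces the 43.98% and 44.15% totals, the +0.17% change, and the 35372 / 35432 miss counts.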

@@ -0,0 +1,175 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
#
Collaborator

It would be best to include the Hugging Face license here.

Contributor Author

Added.

"""
self._input_spec = [
paddle.static.InputSpec(shape=[None, None], dtype="int64", name="input_ids"),
]
Collaborator

@wawltor wawltor Feb 1, 2023

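The task design here doesn't quite fit Taskflow's positioning. A Taskflow is defined by its task type, which here is deciding whether an image and a text match and outputting the match probability; that needs to be supported. If the image and text features also need to be output, that can go through a separate interface, for example a flag that outputs the text features and the image features.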

Collaborator

As I understand it, this is for recall, so it can only be a dual-tower model that outputs embeddings; text-image matching would be handled by a separate taskflow.

Contributor Author

The API has been redesigned and is now positioned as feature extraction.

@@ -486,6 +487,75 @@
},
"default": {"model": "utc-large"},
},
"image_text_retrieval": {
Collaborator

For the task design, you could check whether Hugging Face's pipelines have a similar task.

Contributor Author

Modeled on the feature-extraction implementation.

@w5688414 w5688414 changed the title Add Image Text Retrieval taskflow API Add Image Text Retrieval taskflow&pipelines API Feb 1, 2023
Comment on lines 118 to 124
if "input_ids" in batch_inputs:
text_features = self._model.get_text_features(input_ids=batch_inputs["input_ids"])
all_feats.append(text_features)
if "pixel_values" in batch_inputs:
image_features = self._model.get_image_features(pixel_values=batch_inputs["pixel_values"])
all_feats.append(image_features)
inputs.update({"features": all_feats})
Collaborator

  1. Would it be better to store the features separately here? With image-only, text-only, and image-plus-text inputs, won't they be impossible to tell apart?

Contributor Author

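When both images and texts are present, we recommend calling it twice, once for images and once for texts. Passing both kinds of input at once makes batch assembly awkward, so that case isn't supported.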
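The reviewer's alternative, keeping each modality's features under its own key, could look like the sketch below. The get_text_features / get_image_features names mirror the snippet above, but StubModel, its dummy outputs, and the text_features / image_features keys are hypothetical; the merged PR instead asks callers to pass images and texts in separate calls.

```python
class StubModel:
    """Stand-in for a dual-encoder model; returns fixed-size dummy vectors."""

    def get_text_features(self, input_ids):
        return [[0.0] * 4 for _ in input_ids]

    def get_image_features(self, pixel_values):
        return [[1.0] * 4 for _ in pixel_values]


def run_batch(model, batch_inputs):
    """Keep each modality under its own key so mixed batches stay unambiguous."""
    outputs = {}
    if "input_ids" in batch_inputs:
        outputs["text_features"] = model.get_text_features(input_ids=batch_inputs["input_ids"])
    if "pixel_values" in batch_inputs:
        outputs["image_features"] = model.get_image_features(pixel_values=batch_inputs["pixel_values"])
    return outputs


model = StubModel()
text_only = run_batch(model, {"input_ids": [[1, 2, 3], [4, 5, 6]]})
mixed = run_batch(model, {"input_ids": [[1, 2]], "pixel_values": [[[0.5]]]})
print(sorted(text_only), sorted(mixed))
```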

"""
self._input_spec = [
paddle.static.InputSpec(shape=[None, None], dtype="int64", name="input_ids"),
]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我理解的话这个是做召回的,所以只能双塔输出embedding, 会有另外一个taskflow做text image matching

from .task import Task


class ImageTextRetrievalTask(Task):
Collaborator

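I don't think ImageTextRetrievalTask is a great name for this task. Since it is for retrieval, each call runs only one of the two towers and produces embeddings for ANN search. Given that, it would be better to write it as a single-input FeatureExtractionTask; then the task can run CLIP-style cross-modal models as well as text-only recall models.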

Contributor Author

Renamed to MultimodalFeatureExtractionTask.
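A single-input feature extractor of the kind the reviewer describes only has to route each call to one tower. The sketch below dispatches on the input type and rejects mixed lists, in line with the advice to embed images and texts in separate calls. The class name, the stub encoders, and the FakeImage placeholder (standing in for a PIL image so the snippet runs without Pillow) are all hypothetical.

```python
class FakeImage:
    """Placeholder for a PIL.Image.Image in this self-contained sketch."""


class SingleInputFeatureExtractor:
    def __init__(self, encode_text, encode_image=None):
        # A text-only recall model simply has no image encoder.
        self.encode_text = encode_text
        self.encode_image = encode_image

    def __call__(self, inputs):
        if not isinstance(inputs, list):
            inputs = [inputs]
        if all(isinstance(x, str) for x in inputs):
            return {"features": [self.encode_text(x) for x in inputs]}
        if all(isinstance(x, FakeImage) for x in inputs):
            if self.encode_image is None:
                raise TypeError("This model has no image encoder.")
            return {"features": [self.encode_image(x) for x in inputs]}
        raise TypeError("Pass either all texts or all images in one call.")


extractor = SingleInputFeatureExtractor(
    encode_text=lambda s: [float(len(s))],  # dummy "embedding": the text length
    encode_image=lambda img: [1.0],
)
print(extractor(["a photo of a cat", "a photo of a dog"]))
```

Because the output always has the same shape regardless of modality, the same task can serve both CLIP-style models and text-only recall models.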

"""
inputs = self._check_input_text(inputs)
batches = self._batchify(inputs, self._batch_size)
outputs = {"batches": batches, "text": inputs}
Collaborator

The key name batches is a bit ambiguous; maybe something like features?

Contributor Author

Discussed offline; keeping it as is.
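The _batchify(inputs, self._batch_size) call in the preprocessing snippet above splits the inputs into fixed-size chunks before inference. A minimal generic version of that step, written as a hypothetical standalone function rather than PaddleNLP's actual method, is:

```python
def batchify(inputs, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [inputs[i : i + batch_size] for i in range(0, len(inputs), batch_size)]


batches = batchify(["a", "b", "c", "d", "e"], 2)
print(batches)  # [['a', 'b'], ['c', 'd'], ['e']]
```

The last chunk may be smaller than batch_size, so downstream code should not assume full batches.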

self.output_handle_map["image"] = self.output_handle
self._config_map["image"] = self._config
else:
self._prepare_onnx_mode()
Collaborator

If the inputs and outputs have both changed, the ONNX mode seems to need changes too, and an FP16 mode is also needed.

Contributor Author

Updated.

@@ -0,0 +1,18 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
Collaborator

2022->2023

Contributor Author

Updated.

sizes = {model.embedding_dim for model in self.models.values()}
if None in sizes:
logger.warning(
"Haystack could not find the output embedding dimensions for '%s'. "
Collaborator

Haystack?

Contributor Author

Updated.
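The snippet above collects each per-modality model's embedding_dim into a set and warns when a dimension is unknown. Since a retriever that mixes modalities can only compare vectors of the same size, a stricter variant might also reject mismatched sizes. The sketch below assumes, as the snippet does, that each model exposes an embedding_dim attribute; the EmbeddingModel dataclass, the check_embedding_dims name, and the error-raising policy are illustrative additions.

```python
import logging
from dataclasses import dataclass
from typing import Dict, Optional

logger = logging.getLogger(__name__)


@dataclass
class EmbeddingModel:
    name: str
    embedding_dim: Optional[int] = None


def check_embedding_dims(models: Dict[str, EmbeddingModel]) -> Optional[int]:
    """Return the shared embedding size; warn if unknown, raise if inconsistent."""
    sizes = {model.embedding_dim for model in models.values()}
    if None in sizes:
        logger.warning(
            "Could not find the output embedding dimensions for '%s'.",
            ", ".join(m.name for m in models.values() if m.embedding_dim is None),
        )
        sizes.discard(None)
    if len(sizes) > 1:
        raise ValueError(f"Embedding models disagree on output size: {sorted(sizes)}")
    return sizes.pop() if sizes else None


dims = check_embedding_dims({
    "text": EmbeddingModel("ernie_vil-2.0-base-zh", 768),
    "image": EmbeddingModel("ernie_vil-2.0-base-zh", 768),
})
print(dims)
```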


self.models = {} # replace str with ContentTypes starting from Python3.8
for content_type, embedding_model in embedding_models.items():
self.models[content_type] = Taskflow("feature_extraction", model=embedding_model)
Collaborator

One question: the Taskflow output is already a dict that carries the multimodal information, so why pass a dict in here as well?

Contributor Author

@w5688414 w5688414 Feb 10, 2023

I made a small change. The reason is that the design aims to support many modalities, such as text, image, video, audio, and table, and each type needs its own embedding encoder loaded.
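Keeping one encoder per content type lets each document be embedded by the model that fits its modality. The sketch below shows that routing with stub encoders; in the PR each entry would instead be a Taskflow("feature_extraction", model=...) instance, and the Document dataclass and embed_documents helper here are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Document:
    content: Any
    content_type: str  # e.g. "text" or "image"


def embed_documents(documents: List[Document], encoders: Dict[str, Callable[[List[Any]], list]]):
    """Group documents by content type, embed each group with its own encoder, keep input order."""
    embeddings = [None] * len(documents)
    groups: Dict[str, List[int]] = {}
    for i, doc in enumerate(documents):
        if doc.content_type not in encoders:
            raise ValueError(f"No encoder loaded for content type {doc.content_type!r}")
        groups.setdefault(doc.content_type, []).append(i)
    for content_type, indices in groups.items():
        vectors = encoders[content_type]([documents[i].content for i in indices])
        for i, vec in zip(indices, vectors):
            embeddings[i] = vec
    return embeddings


encoders = {
    "text": lambda batch: [[float(len(t))] for t in batch],  # dummy text encoder
    "image": lambda batch: [[-1.0] for _ in batch],  # dummy image encoder
}
docs = [Document("cat", "text"), Document(object(), "image"), Document("a dog", "text")]
print(embed_documents(docs, encoders))
```

Grouping by type means each encoder sees one batched call per modality, which matches the "one modality per call" constraint of the feature-extraction task.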

feature_extractors_params = {
content_type: {"max_length": 256, **(feature_extractors_params or {}).get(content_type, {})}
for content_type in ["text", "table", "image", "audio"] # FIXME get_args(ContentTypes) from Python3.8 on
}
Collaborator

If audio isn't supported yet, wouldn't it be better to remove it?

Contributor Author

Updated.
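The feature_extractors_params comprehension above gives every content type a default max_length of 256 and lets per-type user settings override it. A runnable version with "audio" dropped, as the reviewer suggested, behaves as follows; the exact supported-type list and the function name are assumptions based on that exchange.

```python
from typing import Dict, Optional

SUPPORTED_CONTENT_TYPES = ["text", "table", "image"]  # assumed list after dropping "audio"


def build_feature_extractor_params(user_params: Optional[Dict[str, dict]] = None) -> Dict[str, dict]:
    """Apply the default max_length, then let per-content-type user settings override it."""
    return {
        content_type: {"max_length": 256, **(user_params or {}).get(content_type, {})}
        for content_type in SUPPORTED_CONTENT_TYPES
    }


params = build_feature_extractor_params({"text": {"max_length": 64, "batch_size": 8}})
print(params)
```

Settings passed for a type outside the supported list (for example "audio") are silently ignored by this comprehension, so a caller's typo in a content-type key would go unnoticed.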

@@ -514,6 +515,75 @@
},
"default": {"model": "utc-large"},
},
"feature_extraction": {
"models": {
"PaddlePaddle/ernie_vil-2.0-base-zh": {
Contributor

Could the ERNIE-series model name here be simplified to ernie_vil-2.0-base-zh?

Contributor Author

Checked with Yu Jun (余军): ERNIE-ViL 2.0 was uploaded as a community model.

"models": {
"PaddlePaddle/ernie_vil-2.0-base-zh": {
"task_class": MultimodalFeatureExtractionTask,
"task_flag": "image_text_retrieval-2.0-base-zh",
Contributor

image_text_retrieval-2.0-base-zh -> feature_extraction-ernie_vil-2.0-base-zh

Contributor

@linjieccc linjieccc Feb 13, 2023

I suggest following the <task_name>-<model_name> format.

Contributor Author

Fixed.
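The <task_name>-<model_name> convention can be derived mechanically from the registry key. The helper below is a hypothetical illustration that drops the organization prefix from a Hub-style model id; it reproduces the flag suggested in the review.

```python
def make_task_flag(task_name: str, model_id: str) -> str:
    """Build a '<task_name>-<model_name>' flag, dropping any 'Org/' prefix from the model id."""
    model_name = model_id.split("/")[-1]
    return f"{task_name}-{model_name}"


flag = make_task_flag("feature_extraction", "PaddlePaddle/ernie_vil-2.0-base-zh")
print(flag)  # feature_extraction-ernie_vil-2.0-base-zh
```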

>>> from paddlenlp import Taskflow
>>> from PIL import Image
>>> import paddle.nn.functional as F
# Single input
Contributor

The code that instantiates the task seems to be missing here, e.g. task = Taskflow("feature_extraction")...

Contributor Author

Added.

from PIL import Image

# multi modal feature_extraction with ernie_vil-2.0-base-zh
senta = Taskflow("feature_extraction")
Contributor

I suggest replacing the name senta with something else.

Contributor Author

That was a typo; fixed.

wawltor
wawltor previously approved these changes Feb 13, 2023
Collaborator

@wawltor wawltor left a comment

LGTM

@@ -61,6 +61,8 @@ def __init__(self, model, task, priority_path=None, **kwargs):
self._home_path = self.kwargs["home_path"] if "home_path" in self.kwargs else PPNLP_HOME
self._task_flag = self.kwargs["task_flag"] if "task_flag" in self.kwargs else self.model
self.from_hf_hub = kwargs.pop("from_hf_hub", False)
# Add mode flag for onnx output path redirection
self.mode = None
Contributor

I'd suggest a different variable name here; mode is already used by the word segmentation and NER tasks.

Contributor Author

Renamed to export_type.

@@ -182,12 +182,13 @@ class MultimodalFeatureExtractionTask(Task):
},
}

def __init__(self, task, model, batch_size=1, _static_mode=True, **kwargs):
def __init__(self, task, model, batch_size=1, _static_mode=True, return_tensors=True, **kwargs):
Contributor

I'd suggest renaming _static_mode to is_static_model, to be consistent with the other tasks.

Contributor

@linjieccc linjieccc left a comment

LGTM for Taskflow

@w5688414 w5688414 merged commit 2992a81 into PaddlePaddle:develop Feb 14, 2023
5 participants