support pp-translation pipeline #4133

changdazhou · 2025-06-04T19:18:49Z

No description provided.

paddle-bot · 2025-06-04T19:18:54Z

Thanks for your contribution!

TingquanGao · 2025-06-23T08:47:15Z

paddlex/inference/utils/io/readers.py

@@ -261,6 +298,15 @@ def read_file(self, in_path):
            yield img_cv


+class TXTReaderBackend(_BaseReaderBackend):


Would a name like NativeFileReadlinesReaderBackend or ReadlinesReaderBackend be better?

TingquanGao · 2025-06-23T09:21:43Z

paddlex/inference/common/result/mixin.py

@@ -607,6 +607,7 @@ def __init__(self, *args: list, **kwargs: dict):
        self._markdown_writer = MarkdownWriter(*args, **kwargs)
        self._img_writer = ImageWriter(*args, **kwargs)
        self._save_funcs.append(self.save_to_markdown)
+        self.save_keys = {"markdown_texts"}


Change this to a class attribute:
MARKDOWN_SAVE_KEYS = []
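
A minimal sketch of what the class-attribute version might look like. Only `MARKDOWN_SAVE_KEYS` and the `"markdown_texts"` key come from the diff and the comment above; the class names, the `keys_to_save` helper, and the extra `"language"` key are hypothetical and used purely for illustration.

```python
class MarkdownMixinSketch:
    # Hypothetical stand-in for the mixin in mixin.py.
    # Declared once on the class instead of being assigned in __init__.
    MARKDOWN_SAVE_KEYS = ["markdown_texts"]

    def keys_to_save(self):
        # Hypothetical helper: return a copy so callers cannot
        # mutate the list shared by all instances.
        return list(self.MARKDOWN_SAVE_KEYS)


class TranslatedMarkdownSketch(MarkdownMixinSketch):
    # A subclass can override the attribute without touching __init__.
    # The "language" key is an assumption made for this example.
    MARKDOWN_SAVE_KEYS = ["markdown_texts", "language"]
```

With a class attribute, the set of keys is visible without instantiating the class, and subclasses can override it declaratively.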

TingquanGao · 2025-06-23T09:22:10Z

paddlex/inference/pipelines/components/prompt_engineering/generate_translate_prompt.py

+from .base import BaseGeneratePrompt
+
+
+class GenerateTranslatePrompt(BaseGeneratePrompt):


TranslatePromptGenerator

TingquanGao · 2025-06-23T09:28:56Z

paddlex/inference/pipelines/pp_translation/pipeline.py

+
+@pipeline_requires_extra("ie")
+class PP_Translation_Pipeline(BasePipeline):
+    entities = ["PP-Translation"]


DocTranslation

TingquanGao · 2025-06-23T09:29:30Z

paddlex/inference/common/result/base_cv_result.py

-        else:
-            return fn
+            fn = f"{stem}_{page_idx}{suffix}"
+        if (language := self.get("language", None)) is not None:


TingquanGao · 2025-06-23T09:30:20Z

paddlex/inference/pipelines/pp_translation/pipeline.py

+
+        self.markdown_batch_sampler = MarkDownBatchSampler()
+
+        self.table_structure_len_max = 500


table_structure_len_max

TingquanGao · 2025-06-23T09:40:58Z

paddlex/inference/pipelines/pp_translation/pipeline.py

+            )
+
+        concatenate_result = {
+            "input_path": next(markdown_list)["input_path"],


Bobholamovic · 2025-06-26T11:29:15Z

paddlex/inference/pipelines/pp_doctranslation/pipeline.py

+    def visual_predict(
+        self,
+        input: Union[str, List[str], np.ndarray, List[np.ndarray]],
+        use_doc_orientation_classify: Optional[bool] = None,


The default value here differs from PP-StructureV3; is that intentional? Besides this parameter, there should also be some table recognition parameters.

Bobholamovic · 2025-06-26T11:29:31Z

.precommit/check_imports.py

@@ -83,6 +83,7 @@
    "ujson": "ujson",
    "uvicorn": "uvicorn",
    "yarl": "yarl",
+    "bs4": "beautifulsoup4",


Suggest sorting these in lexicographic order.
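
A small sketch of how the ordering could be checked, assuming plain string comparison is the intended order. The helper `first_out_of_order` is hypothetical and not part of `.precommit/check_imports.py`; the key list mirrors the four entries visible in the hunk above.

```python
def first_out_of_order(keys):
    """Return the first key that sorts before its predecessor, or None if sorted."""
    for prev, cur in zip(keys, keys[1:]):
        if cur < prev:
            return cur
    return None


# Keys in the order they appear in the hunk, with "bs4" appended last.
mapping_keys = ["ujson", "uvicorn", "yarl", "bs4"]

print(first_out_of_order(mapping_keys))          # the misplaced key
print(first_out_of_order(sorted(mapping_keys)))  # None once sorted
```

Under this ordering, `"bs4"` would sit before `"ujson"` rather than after `"yarl"`.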

Bobholamovic · 2025-06-26T11:29:43Z

setup.py

@@ -74,6 +74,7 @@
    "ujson": "",
    "uvicorn": ">= 0.16",
    "yarl": ">= 1.9",
+    "beautifulsoup4": "",


Suggest sorting these in lexicographic order.

Bobholamovic · 2025-06-26T11:29:52Z

setup.py

+            "scikit-learn",
+            "shapely",
+            "tokenizers",
+            "beautifulsoup4",


Suggest sorting these in lexicographic order.

changdazhou force-pushed the pptranslate branch 3 times, most recently from d1f7118 to f94160a Compare June 7, 2025 01:03

changdazhou force-pushed the pptranslate branch 5 times, most recently from 89b51bd to 2e1c064 Compare June 13, 2025 09:46

support pp-translation pipeline

795fe3e

changdazhou force-pushed the pptranslate branch from 2e1c064 to 66d5b1e Compare June 23, 2025 09:00

TingquanGao reviewed Jun 23, 2025

support load from md file in pp-translation pipeline

5c61960

changdazhou force-pushed the pptranslate branch from 66d5b1e to 5c61960 Compare June 23, 2025 10:33

rename PP-Translation -> PP-DocTranslation

44acb91

changdazhou force-pushed the pptranslate branch from e8b5016 to 44acb91 Compare June 23, 2025 11:12

add trans deps

179778b

TingquanGao approved these changes Jun 26, 2025

Bobholamovic reviewed Jun 26, 2025

TingquanGao merged commit e88afb4 into PaddlePaddle:develop Jun 26, 2025
5 of 6 checks passed
