support pp-translation pipeline #4133

Merged
TingquanGao merged 4 commits into PaddlePaddle:develop from the pptranslate branch on Jun 26, 2025

Conversation

changdazhou
Collaborator

No description provided.

paddle-bot bot commented Jun 4, 2025

Thanks for your contribution!

@changdazhou changdazhou force-pushed the pptranslate branch 3 times, most recently from d1f7118 to f94160a Compare June 7, 2025 01:03
@changdazhou changdazhou force-pushed the pptranslate branch 5 times, most recently from 89b51bd to 2e1c064 Compare June 13, 2025 09:46
@@ -261,6 +298,15 @@ def read_file(self, in_path):
yield img_cv


class TXTReaderBackend(_BaseReaderBackend):
Collaborator

Would a name like NativeFileReadlinesReaderBackend or ReadlinesReaderBackend be better?

@@ -607,6 +607,7 @@ def __init__(self, *args: list, **kwargs: dict):
self._markdown_writer = MarkdownWriter(*args, **kwargs)
self._img_writer = ImageWriter(*args, **kwargs)
self._save_funcs.append(self.save_to_markdown)
self.save_keys = {"markdown_texts"}
Collaborator

Change this to a class attribute:
MARKDOWN_SAVE_KEYS = []
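
A minimal sketch of what this suggestion could look like, assuming the goal is to declare the saved keys once on the class instead of assigning them in every `__init__` call. `MarkdownSaverSketch` is a hypothetical stand-in, not the mixin from the PR, and the tuple is a swapped-in choice: the reviewer wrote a list, but an immutable value avoids sharing mutable state across instances.

```python
class MarkdownSaverSketch:
    # Hypothetical class; only the attribute name comes from the review comment.
    # Class attribute: defined once, readable without creating an instance.
    MARKDOWN_SAVE_KEYS = ("markdown_texts",)

    def __init__(self):
        # Instance state stays in __init__; the key set no longer does.
        self._save_funcs = [self.save_to_markdown]

    def save_to_markdown(self, result):
        # Keep only the keys this saver is responsible for.
        return {k: v for k, v in result.items() if k in self.MARKDOWN_SAVE_KEYS}


print(MarkdownSaverSketch.MARKDOWN_SAVE_KEYS)
saver = MarkdownSaverSketch()
filtered = saver.save_to_markdown({"markdown_texts": "# Title", "input_path": "doc.pdf"})
```

Subclasses can override `MARKDOWN_SAVE_KEYS` without touching `__init__`, which is one reason a class attribute may be preferred here.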

from .base import BaseGeneratePrompt


class GenerateTranslatePrompt(BaseGeneratePrompt):
Collaborator

TranslatePromptGenerator


@pipeline_requires_extra("ie")
class PP_Translation_Pipeline(BasePipeline):
entities = ["PP-Translation"]
Collaborator

DocTranslation

else:
return fn
fn = f"{stem}_{page_idx}{suffix}"
if (language := self.get("language", None)) is not None:
Collaborator

Move this out.


self.markdown_batch_sampler = MarkDownBatchSampler()

self.table_structure_len_max = 500
Collaborator

table_structure_len_max

)

concatenate_result = {
"input_path": next(markdown_list)["input_path"],
Collaborator

next
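
The comment above only flags `next` and does not state the concern. One common pitfall with this pattern, assuming `markdown_list` is an iterator (an assumption, not confirmed by the diff), is that `next()` consumes the first item, so later iteration silently skips it. The sketch below uses a hypothetical `pages()` generator to show the effect and one way around it.

```python
from itertools import chain


def pages():
    # Hypothetical generator standing in for per-page markdown results.
    for i in range(3):
        yield {"input_path": "doc.pdf", "page_index": i}


# Pitfall: next() advances the iterator, so page 0 is gone afterwards.
markdown_list = pages()
first = next(markdown_list)
remaining = list(markdown_list)  # only pages 1 and 2 are left

# One option: read the first item, then chain it back in front of the rest.
markdown_list = pages()
first = next(markdown_list)
all_pages = list(chain([first], markdown_list))
```

If `markdown_list` were a plain list instead, `next(markdown_list)` would raise `TypeError`, and indexing with `markdown_list[0]` would be the direct alternative.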

def visual_predict(
self,
input: Union[str, List[str], np.ndarray, List[np.ndarray]],
use_doc_orientation_classify: Optional[bool] = None,
Member

The default value here differs from PP-StructureV3. Is that intentional? Besides this parameter, there should also be some table recognition parameters.

@@ -83,6 +83,7 @@
"ujson": "ujson",
"uvicorn": "uvicorn",
"yarl": "yarl",
"bs4": "beautifulsoup4",
Member

Suggest keeping these entries in lexicographic order.

@@ -74,6 +74,7 @@
"ujson": "",
"uvicorn": ">= 0.16",
"yarl": ">= 1.9",
"beautifulsoup4": "",
Member

Suggest keeping these entries in lexicographic order.

"scikit-learn",
"shapely",
"tokenizers",
"beautifulsoup4",
Member

Suggest keeping these entries in lexicographic order.
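
The ordering request in the three comments above can be checked mechanically. The following is a rough sketch, not code from the PR: `out_of_order` is a hypothetical helper, the sample dict reuses the four entries visible in the diff fragment, and case-insensitive comparison is an assumption about what "lexicographic" should mean for package names.

```python
def out_of_order(names):
    """Return adjacent pairs that break case-insensitive lexicographic order."""
    names = list(names)
    return [(a, b) for a, b in zip(names, names[1:]) if a.lower() > b.lower()]


# The four entries shown in the diff, in the order the PR added them.
dep_specs = {"ujson": "", "uvicorn": ">= 0.16", "yarl": ">= 1.9", "beautifulsoup4": ""}

violations = out_of_order(dep_specs)
print(violations)  # [('yarl', 'beautifulsoup4')]

# Rebuilding the dict in sorted key order removes the violation.
sorted_specs = dict(sorted(dep_specs.items(), key=lambda kv: kv[0].lower()))
```

A check like this could run in a unit test so that newly appended dependencies are caught before review.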

@TingquanGao TingquanGao merged commit e88afb4 into PaddlePaddle:develop Jun 26, 2025
5 of 6 checks passed

3 participants