support pp-translation pipeline #4133
Thanks for your contribution!
d1f7118 to f94160a
89b51bd to 2e1c064
@@ -261,6 +298,15 @@ def read_file(self, in_path):
yield img_cv


class TXTReaderBackend(_BaseReaderBackend):
Would this name be better: NativeFileReadlinesReaderBackend or ReadlinesReaderBackend?
@@ -607,6 +607,7 @@ def __init__(self, *args: list, **kwargs: dict):
self._markdown_writer = MarkdownWriter(*args, **kwargs)
self._img_writer = ImageWriter(*args, **kwargs)
self._save_funcs.append(self.save_to_markdown)
self.save_keys = {"markdown_texts"}
Change this to a class attribute:
MARKDOWN_SAVE_KEYS = []
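A minimal sketch of what the class-attribute suggestion could look like. `MarkdownMixinSketch` and `markdown_payload` are hypothetical stand-ins, not code from this PR; only the name `MARKDOWN_SAVE_KEYS` and the key `"markdown_texts"` come from the diff and comment above, and using a tuple rather than a list is an assumption made here to avoid shared mutable state.

```python
class MarkdownMixinSketch:
    # Class attribute: defined once on the class instead of being
    # re-assigned on every instance inside __init__.
    # A tuple is used (assumption) so instances cannot mutate shared state.
    MARKDOWN_SAVE_KEYS = ("markdown_texts",)

    def __init__(self, data):
        self._data = data

    def markdown_payload(self):
        # Keep only the keys that should be written to markdown.
        return {
            key: value
            for key, value in self._data.items()
            if key in self.MARKDOWN_SAVE_KEYS
        }


res = MarkdownMixinSketch({"markdown_texts": "# Title", "input_path": "a.pdf"})
payload = res.markdown_payload()
```

Subclasses can still override `MARKDOWN_SAVE_KEYS` if they need a different set of keys.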
from .base import BaseGeneratePrompt


class GenerateTranslatePrompt(BaseGeneratePrompt):
TranslatePromptGenerator

@pipeline_requires_extra("ie")
class PP_Translation_Pipeline(BasePipeline):
entities = ["PP-Translation"]
DocTranslation
else:
return fn
fn = f"{stem}_{page_idx}{suffix}"
if (language := self.get("language", None)) is not None:
Move this out.

self.markdown_batch_sampler = MarkDownBatchSampler()

self.table_structure_len_max = 500
table_structure_len_max
)

concatenate_result = {
"input_path": next(markdown_list)["input_path"],
next
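The comment is terse, so the following is only one possible reading: if `markdown_list` is a generator or another one-shot iterator, `next(...)` consumes its first item, and any later loop over the same object silently skips that item. The sketch below uses made-up data and a hypothetical helper `pages()` to show the effect and one way around it.

```python
def pages():
    # Hypothetical stand-in for a stream of per-page markdown results.
    yield {"input_path": "doc.pdf", "markdown_texts": "page 0"}
    yield {"input_path": "doc.pdf", "markdown_texts": "page 1"}


# Pitfall: next() advances the iterator, so the loop below misses page 0.
it = pages()
input_path = next(it)["input_path"]
remaining = [p["markdown_texts"] for p in it]

# One alternative: materialize the results once, then index into the list.
items = list(pages())
safe_path = items[0]["input_path"] if items else None
all_texts = [p["markdown_texts"] for p in items]
```

If `markdown_list` is already a list, `next()` raises `TypeError` instead, which is another reason to prefer explicit indexing.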
def visual_predict(
self,
input: Union[str, List[str], np.ndarray, List[np.ndarray]],
use_doc_orientation_classify: Optional[bool] = None,
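The default value here is inconsistent with PP-StructureV3. Is that intentional? Besides this parameter, there should also be some table recognition parameters.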
@@ -83,6 +83,7 @@
"ujson": "ujson",
"uvicorn": "uvicorn",
"yarl": "yarl",
"bs4": "beautifulsoup4",
Suggest keeping these in lexicographic order.
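A quick way to check this kind of ordering, sketched with a hypothetical excerpt of the mapping. Only the four entries visible in the hunk are used (the real dictionary is longer), and `first_out_of_order` is an illustrative helper, not part of the repository.

```python
def first_out_of_order(keys):
    """Return the first key that sorts before its predecessor, or None."""
    keys = list(keys)
    for prev, cur in zip(keys, keys[1:]):
        if cur < prev:
            return cur
    return None


# Excerpt in the order shown in the diff (dicts preserve insertion order).
excerpt = {
    "ujson": "ujson",
    "uvicorn": "uvicorn",
    "yarl": "yarl",
    "bs4": "beautifulsoup4",
}
misplaced = first_out_of_order(excerpt)
sorted_keys = sorted(excerpt)
```

A check like this could run in a unit test so that new entries appended at the end are caught before review.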
@@ -74,6 +74,7 @@
"ujson": "",
"uvicorn": ">= 0.16",
"yarl": ">= 1.9",
"beautifulsoup4": "",
Suggest keeping these in lexicographic order.
"scikit-learn",
"shapely",
"tokenizers",
"beautifulsoup4",
Suggest keeping these in lexicographic order.
No description provided.