@badGarnet badGarnet commented Oct 19, 2023

This PR addresses CORE-2307

  • add a new kwarg to UnstructuredTableTransformerModel.run_prediction: output_format
  • the default output_format is html, which preserves the current behavior: it outputs an HTML string representation of the table
  • another available option is dataframe, which returns a pandas DataFrame representation of the table
  • if output_format is not specified or is any other string value, it returns a list of dictionaries in table cell format, the original output format of the table transformer
  • unstructured.model.tables.recognize no longer accepts the out_html kwarg and now only returns the table cell format
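
To make the dispatch described above concrete, here is a minimal sketch. It is not the code from this PR: `format_prediction`, `cells_to_grid`, `cells_to_html`, and the `row`/`column`/`text` cell keys are hypothetical names chosen for illustration, and the real table cell format may carry different fields.

```python
from html import escape
from typing import Any, Dict, List

# Hypothetical cell record; the actual table cell format may differ.
Cell = Dict[str, Any]


def cells_to_grid(cells: List[Cell]) -> List[List[str]]:
    """Arrange cells into a row-major grid, leaving missing cells empty."""
    if not cells:
        return []
    n_rows = max(c["row"] for c in cells) + 1
    n_cols = max(c["column"] for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c["row"]][c["column"]] = c["text"]
    return grid


def cells_to_html(cells: List[Cell]) -> str:
    """Render the grid as a bare HTML table string."""
    rows = "".join(
        "<tr>" + "".join(f"<td>{escape(text)}</td>" for text in row) + "</tr>"
        for row in cells_to_grid(cells)
    )
    return f"<table>{rows}</table>"


def format_prediction(cells: List[Cell], output_format: Any = "html") -> Any:
    """Return the prediction in the requested format (assumed semantics)."""
    if output_format == "html":
        return cells_to_html(cells)
    if output_format == "dataframe":
        # Imported lazily so the other branches work without pandas installed.
        import pandas as pd

        return pd.DataFrame(cells_to_grid(cells))
    # Any other value falls back to the raw list of cell dictionaries.
    return cells
```

Under these assumptions, calling `format_prediction(cells)` yields an HTML string, `format_prediction(cells, output_format="dataframe")` yields a pandas DataFrame, and any other value returns the cell list unchanged.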

@badGarnet badGarnet marked this pull request as ready for review October 20, 2023 15:16
@badGarnet badGarnet requested a review from qued October 20, 2023 15:26
@qued qued left a comment

LGTM, just one typo and one suggestion/question.

Comment on lines +387 to +390
if output_format:
    result = table_transformer.run_prediction(example_image, result_format=output_format)
else:
    result = table_transformer.run_prediction(example_image)

Is hitting this model expensive? If so could we mock the call?
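
One way the mocking suggestion could look, as a sketch only: `FakeTableModel`, `predict`, and `check_dispatch` are hypothetical stand-ins rather than code from this PR, and the keyword name `result_format` is taken from the snippet under review. The idea is to replace `run_prediction` with a mock so the test checks only how the call is dispatched, without running inference.

```python
from unittest import mock


class FakeTableModel:
    """Hypothetical stand-in for the model class; not the library's code."""

    def run_prediction(self, image, result_format="html"):
        # Simulates an expensive call that unit tests should never reach.
        raise RuntimeError("expensive inference should not run in unit tests")


def predict(model, image, output_format=None):
    """Mirror the branch under review: pass the format only when it is set."""
    if output_format:
        return model.run_prediction(image, result_format=output_format)
    return model.run_prediction(image)


def check_dispatch():
    """Verify both branches against a mocked run_prediction."""
    model = FakeTableModel()
    with mock.patch.object(
        model, "run_prediction", return_value="<table></table>"
    ) as fake:
        # Without a format, the call relies on the method's default.
        assert predict(model, "img") == "<table></table>"
        fake.assert_called_once_with("img")

        # With a format, it is forwarded as the result_format keyword.
        fake.reset_mock()
        predict(model, "img", output_format="dataframe")
        fake.assert_called_once_with("img", result_format="dataframe")
    return True
```

The trade-off is that a mocked test no longer exercises the model's real output, so it complements rather than replaces a test that runs actual inference.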

badGarnet and others added 2 commits October 20, 2023 15:42
Co-authored-by: qued <64741807+qued@users.noreply.github.com>
@badGarnet badGarnet merged commit 326f180 into main Oct 20, 2023
@badGarnet badGarnet deleted the feat/add-more-output-format-for-table-inference branch October 20, 2023 22:03