chore/move client specific params to their own section #68

Merged
merged 3 commits into from
Jun 7, 2024
8 changes: 7 additions & 1 deletion api-reference/api-services/api-parameters.mdx
@@ -23,7 +23,6 @@ The only required parameter is `files` - the file you wish to process.
| `output_format` (_str_) | `outputFormat` (_string_) | The format of the response. Supported formats are `application/json` and `text/csv`. Default: `application/json`. |
| `pdf_infer_table_structure` (_bool_) | `pdfInferTableStructure` (_boolean_) | **Deprecated!** If True and strategy=hi_res, any Table Elements extracted from a PDF will include an additional metadata field, 'text_as_html', where the value (string) is just a transformation of the data into an HTML table. |
| `skip_infer_table_types` (_List[str]_) | `skipInferTableTypes` (_string[]_) | The document types that you want to skip table extraction with. Default: [] |
| `split_pdf_page` (_bool_) | `splitPdfPage` (_boolean_) | Should the pdf file be split at client. Ignored on backend. |
| `starting_page_number` (_int_) | `startingPageNumber` (_number_) | Indicates what page number should be assigned to the first page in the document. This information will be reflected in elements' metadata and can be especially useful when partitioning a document that is part of a larger document. |
| `strategy` (_str_) | `strategy` (_string_) | The strategy to use for partitioning PDF/image. Options are `fast`, `hi_res`, `auto`. Default: `auto` |
| `unique_element_ids` (_bool_) | `uniqueElementIds` (_boolean_) | When True, assign UUIDs to element IDs, which guarantees their uniqueness (useful when using them as primary keys in database). Otherwise a SHA-256 of element text is used. Default: False |
@@ -41,4 +40,11 @@ The following parameters only apply when a `chunking_strategy` is specified. Oth
| `overlap` (_int_) | `overlap` (_number_) | A prefix of this many trailing characters from the prior text-split chunk is applied to second and later chunks formed from oversized elements by text-splitting. Default: None |
| `overlap_all` (_bool_) | `overlapAll` (_boolean_) | When True, overlap is also applied to 'normal' chunks formed by combining whole elements. Use with caution as this can introduce noise into otherwise clean semantic units. Default: None |
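
The effect of `overlap` on text-split chunks can be illustrated with a small standalone sketch. This is not the service's chunker: the function name `split_with_overlap` and the `max_characters` argument are assumptions made for illustration, and the sketch cuts at fixed character offsets rather than at word or element boundaries.

```python
def split_with_overlap(text: str, max_characters: int, overlap: int) -> list[str]:
    """Split one oversized text into chunks, prefixing each later chunk
    with the last `overlap` characters of the chunk before it.

    Illustrative only: hypothetical helper, fixed-offset splitting.
    """
    if not 0 <= overlap < max_characters:
        raise ValueError("overlap must be >= 0 and smaller than max_characters")

    chunks = []
    step = max_characters - overlap  # how far the window advances each time
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_characters])
        if start + max_characters >= len(text):
            break  # the current chunk already reaches the end of the text
        start += step
    return chunks


chunks = split_with_overlap("abcdefghij", max_characters=4, overlap=1)
print(chunks)
```

With `overlap=1`, the second and third chunks each begin with the final character of the chunk before them; the first chunk has no prefix.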

The following parameters are specific to the Python and JavaScript clients and are not sent to the server.

| Python & direct call | JavaScript | Description |
|---------------------------------------|----------------------------|--------------------------------------------------------------------------------------------------------------------------------|
| `split_pdf_page` (_bool_) | `splitPdfPage` (_boolean_) | Whether the PDF file should be split on the client. See [page splitting](/api-reference/api-services/sdk#page-splitting) for more details. |
| `split_pdf_concurrency_level` (_int_) | _Not supported yet_ | Number of split files to be sent concurrently. Default: 5, maximum: 15 |
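
The split between client-only and server-bound parameters can be pictured with a short sketch. It does not reproduce the Python client's internals: the helper names `separate_params` and `effective_concurrency` are hypothetical, and clamping an out-of-range concurrency level to the documented maximum is an assumption, since the table states only the default (5) and the maximum (15).

```python
# Parameters the table lists as handled by the client and not sent to the server.
CLIENT_ONLY_PARAMS = {"split_pdf_page", "split_pdf_concurrency_level"}

DEFAULT_CONCURRENCY = 5   # documented default
MAX_CONCURRENCY = 15      # documented maximum


def separate_params(params: dict) -> tuple[dict, dict]:
    """Return (client_params, server_params). Hypothetical helper."""
    client = {k: v for k, v in params.items() if k in CLIENT_ONLY_PARAMS}
    server = {k: v for k, v in params.items() if k not in CLIENT_ONLY_PARAMS}
    return client, server


def effective_concurrency(client_params: dict) -> int:
    """Pick the concurrency level, falling back to the documented default.

    Assumption: values outside 1..15 are clamped rather than rejected.
    """
    level = client_params.get("split_pdf_concurrency_level", DEFAULT_CONCURRENCY)
    return max(1, min(level, MAX_CONCURRENCY))


client, server = separate_params(
    {"strategy": "hi_res", "split_pdf_page": True, "split_pdf_concurrency_level": 20}
)
print(server)                          # only `strategy` remains for the request
print(effective_concurrency(client))   # clamped under the assumption above
```

Keeping the two groups apart means the request body carries only parameters the server understands, while the splitting options stay in the client.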

Need help getting started? Check out the [Examples page](/api-reference/api-services/examples) for some inspiration.