Create separate document for each page in "markdown" mode of AzureAIDocumentIntelligenceLoader #40790

Hi all,
I would like to get a separate document for each processed page of the PDF file when using AzureAIDocumentIntelligenceLoader in "markdown" mode, because I need to store the page number of each chunk in the vector database. Currently the load() function returns only a single document. I also tried the "page" mode, but its output does not contain the tables and figures that the "markdown" mode produces.

```
from langchain_community.document_loaders import AzureAIDocumentIntelligenceLoader
loader = AzureAIDocumentIntelligenceLoader(bytes_source=html_bytes, api_key=doc_intelligence_key, api_endpoint=doc_intelligence_endpoint, api_model="prebuilt-layout", mode="markdown")
docs_azure = loader.load()
```

I also tried the following approach, but its output also lacks the tables and figures:

```
from langchain_core.documents import Document

separate_docs = []
separate_docs_join = ""
for page in docs_azure[0].metadata["pages"]:
    page_number = page["pageNumber"]
    # Joining the raw text lines drops the markdown tables and figures
    page_content = "\n".join(line["content"] for line in page["lines"])
    page_metadata = {
        **docs_azure[0].metadata,  # Include other metadata if needed
        "page_number": page_number,  # Listed last so it cannot be overwritten
    }
    separate_docs_join += page_content + "\n\n"  # Join the content of all pages
    separate_docs.append(Document(page_content=page_content, metadata=page_metadata))
```
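
One direction that might keep the tables and figures is to slice the single markdown string per page instead of rebuilding each page from its lines. The sketch below assumes that each entry in `metadata["pages"]` carries a `spans` list with `offset` and `length` fields pointing into the markdown content, and that those offsets line up with Python string indices. Neither assumption is confirmed here, so both should be checked against a real result first. The helper name `split_markdown_by_page` and the sample data are made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def split_markdown_by_page(
    content: str, pages: List[Dict[str, Any]]
) -> List[Tuple[int, str]]:
    """Return (page_number, markdown) pairs sliced from the full markdown string.

    Assumes every page dict has "pageNumber" and a "spans" list whose
    "offset"/"length" values index into `content` (unverified assumption).
    """
    result = []
    for page in pages:
        # A page may list several spans; join their slices in order.
        parts = [
            content[span["offset"] : span["offset"] + span["length"]]
            for span in page.get("spans", [])
        ]
        result.append((page["pageNumber"], "".join(parts).strip()))
    return result


# Tiny stand-in for docs_azure[0].page_content and docs_azure[0].metadata["pages"].
page1 = "# Title\n\n| a | b |\n"
page2 = "Second page text"
sample_content = page1 + page2
sample_pages = [
    {"pageNumber": 1, "spans": [{"offset": 0, "length": len(page1)}]},
    {"pageNumber": 2, "spans": [{"offset": len(page1), "length": len(page2)}]},
]
pages_md = split_markdown_by_page(sample_content, sample_pages)
```

With a real result, the call would be `split_markdown_by_page(docs_azure[0].page_content, docs_azure[0].metadata["pages"])`, and each pair could then be wrapped as `Document(page_content=text, metadata={"page_number": number})`.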

Thank you for your support 
Amir 
