# Data queries - Export figures

The Deep Search document conversion detects the bounding boxes of all document components
(text, tables, figures, etc.). This information can be used to crop the visual region of each component from the rendered page.

In this example, we display all the figures contained in a document together with their captions.
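To see what cropping with a bounding box involves, here is a small standalone sketch that is not part of the Deep Search toolkit. It assumes a box is given as `(x0, y0, x1, y1)` in PDF points with the origin at the bottom-left corner of the page (the usual PDF convention). It maps that box onto a rendered page image, whose origin is at the top-left. The function name `pdf_bbox_to_pixels` is hypothetical.

```python
from typing import Tuple


def pdf_bbox_to_pixels(
    bbox: Tuple[float, float, float, float],
    page_width: float,
    page_height: float,
    image_width: int,
    image_height: int,
) -> Tuple[int, int, int, int]:
    """Map a PDF-space box (x0, y0, x1, y1), origin bottom-left, to a
    pixel box (left, top, right, bottom), origin top-left.

    Hypothetical helper: the bbox layout is an assumption, not the
    documented Deep Search schema.
    """
    x0, y0, x1, y1 = bbox
    # Scale factors from PDF points to image pixels
    sx = image_width / page_width
    sy = image_height / page_height
    left = round(x0 * sx)
    right = round(x1 * sx)
    # Flip the vertical axis: PDF y grows upward, image y grows downward
    top = round((page_height - y1) * sy)
    bottom = round((page_height - y0) * sy)
    return left, top, right, bottom
```

For example, on a US Letter page (612 × 792 points) rendered at 1224 × 1584 pixels, the box `(72, 500, 300, 700)` maps to the pixel box `(144, 184, 600, 584)`.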

### Access required

The content of this notebook requires access to Deep Search capabilities which are not
available on the public access system.

[Contact us](https://ds4sd.github.io) if you are interested in exploring
these Deep Search capabilities.

### Set notebook parameters

```python
from dsnotebooks.settings import NotebookSettings

# Notebook settings, auto-loaded from .env / environment variables
notebook_settings = NotebookSettings()

PROFILE_NAME = notebook_settings.profile  # the Deep Search profile to use
```

### Import example dependencies

```python
# Import standard dependencies

# IPython utilities
from IPython.display import display, Markdown, HTML, display_html

# Import the deepsearch-toolkit
import deepsearch as ds
from deepsearch.documents.core.render import get_figure_svg
from deepsearch.cps.client.components.elastic import ElasticDataCollectionSource
from deepsearch.cps.queries import DataQuery
from deepsearch.cps.client.components.queries import RunQueryError
```

### Connect to Deep Search

```python
api = ds.CpsApi.from_env(profile_name=PROFILE_NAME)
```

---

## Extract figures

The `get_figure_svg()` function crops the page image to the bounding box of a figure found by the Deep Search PDF conversion and returns the result as SVG markup, which we can embed directly in the notebook output.
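The toolkit's internals are not shown in this notebook. As a rough sketch of one way such a crop can be done in SVG, the hypothetical helper `crop_page_svg` below draws the whole page image in page coordinates and sets the `viewBox` to the figure's box, so only that region is visible. It assumes the same `(x0, y0, x1, y1)` bottom-left-origin box convention as above and that the page image covers the full page; neither is taken from the Deep Search documentation.

```python
from html import escape


def crop_page_svg(image_url: str, page_width: float, page_height: float,
                  bbox: tuple) -> str:
    """Return SVG markup showing only the region `bbox` of a page image.

    Hypothetical helper: bbox is (x0, y0, x1, y1) in PDF points with the
    origin at the bottom-left corner; the image is assumed to span the
    full page (page_width x page_height points).
    """
    x0, y0, x1, y1 = bbox
    # SVG's y axis points down, so the box's top edge is page_height - y1
    top = page_height - y1
    width = x1 - x0
    height = y1 - y0
    # Escape the URL so it is safe inside a double-quoted attribute
    href = escape(image_url, quote=True)
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="{x0} {top} {width} {height}" width="100%">'
        # Draw the whole page; the viewBox selects the visible region
        f'<image href="{href}" x="0" y="0" '
        f'width="{page_width}" height="{page_height}"/>'
        "</svg>"
    )
```

Because the `viewBox` is expressed in page points, the browser handles the scaling to the displayed width, so the same markup works regardless of the resolution of the page image.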


```python
# Search and fetch a document

search_query = '"pNLP-Mixer"'  # quoted phrase search
data_collection = ElasticDataCollectionSource(elastic_id="default", index_key="acl")

query = DataQuery(
    search_query,  # the search query to be executed
    # Document fields to fetch: the figures, plus the page sizes and page images used for rendering
    source=["figures", "page-dimensions", "_s3_data.pdf-images"],
    coordinates=data_collection,  # the data collection to be queried
)

res = api.queries.run(query)
```
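The next cell takes the first hit with `res.outputs["data_outputs"][0]`, which fails with a bare `IndexError` if the search matches nothing. A small guard can make that case explicit. The helper name `first_source` is hypothetical, and it assumes `res.outputs` is a plain dict shaped the way the next cell indexes it.

```python
from typing import Any, Dict, List


def first_source(outputs: Dict[str, Any]) -> Dict[str, Any]:
    """Return the `_source` of the first hit, or raise a clear error.

    Hypothetical helper; assumes outputs["data_outputs"] is a list of
    hits, each carrying the fetched fields under "_source".
    """
    hits: List[Dict[str, Any]] = outputs.get("data_outputs", [])
    if not hits:
        raise LookupError("The query returned no documents")
    return hits[0]["_source"]
```

With this helper, the selection step becomes `document = first_source(res.outputs)`.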

```python
import html

# Select the first document from the response
document = res.outputs["data_outputs"][0]["_source"]

# Iterate through all figures: render each crop with its caption underneath
html_figures = ""
for figure in document["figures"]:
    page_svg = get_figure_svg(document, figure)
    # Escape the caption so characters like <, > and & do not break the markup
    caption = html.escape(figure.get("text", ""))
    html_figures += f"""
    <div class="matchfigure">
        {page_svg}
        <p>
            {caption}
        </p>
    </div>
    """

# Doubled braces {{ }} produce literal CSS braces inside the f-string
html_output = f"""
<style>
    .matchfigure {{
        width: 40%;
        border: 1px solid #000;
        margin-bottom: 1em;
    }}
</style>
<div>
{html_figures}
</div>
"""

display(HTML(html_output))
```
