# Document Conversion - Quick start

## Getting started

The [Deep Search Toolkit](https://ds4sd.github.io/deepsearch-toolkit/) converts documents with just the few lines of code shown below. It's that simple! For more information or a step-by-step guide:
- Visit https://ds4sd.github.io/deepsearch-toolkit/guide/convert_doc/
- Follow this example notebook

### Set notebook parameters

In [8]:
from dsnotebooks.settings import ProjectNotebookSettings

# notebook settings auto-loaded from .env / env vars
notebook_settings = ProjectNotebookSettings()

PROFILE_NAME = notebook_settings.profile  # the profile to use
PROJ_KEY = notebook_settings.proj_key  # the project to use

# default project_key = 1234567890abcdefghijklmnopqrstvwyz123456

### Import example dependencies

In [9]:
import deepsearch as ds

from pathlib import Path

from IPython.display import display, Markdown

### Connect to Deep Search

In [10]:
api = ds.CpsApi.from_env(profile_name=PROFILE_NAME)

In [None]:
output_dir = Path("./converted_docs")

documents = ds.convert_documents(
    api=api,
    proj_key=PROJ_KEY,
    source_path="../../data/samples/2206.01062.pdf",
    progress_bar=True,
    export_md=True,
)
documents.download_all(result_dir=output_dir)
info = documents.generate_report(result_dir=output_dir)
print(info)

In [12]:
# group output files and visualize the output
md_files = list(output_dir.glob("*.md"))
json_files = list(output_dir.glob("*.json"))

In [None]:
# display last document
# display(Markdown(md_files[-1]))
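
The display cell above is commented out, and `md_files[-1]` raises an `IndexError` when no Markdown file was produced. As a minimal sketch for checking the output without rendering it, the helper below returns the first few lines of a text file. The name `preview_text_file` and its `max_lines` parameter are hypothetical and not part of the toolkit.

```python
from itertools import islice
from pathlib import Path


def preview_text_file(path: Path, max_lines: int = 10) -> str:
    """Return the first `max_lines` lines of a text file as a single string."""
    with path.open(encoding="utf-8") as fh:
        # islice stops reading after max_lines, so large files stay cheap to preview
        return "".join(islice(fh, max_lines)).rstrip("\n")
```

With the variables from the cells above, it could be used as `print(preview_text_file(md_files[-1]))` inside an `if md_files:` guard.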

---

**There's more!**

The Deep Search Toolkit provides utility functions that convert documents from different types of input:

- From a single URL.
- From a list of URLs. In this case, the toolkit launches a batch process covering all tasks.
- From a local PDF file.
- From a local zip archive containing PDF files.
- From a local folder containing PDF files. In this case, the toolkit packages the files into batches and creates multiple zip archives.
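
The folder case can be pictured with a small standard-library sketch. This is not the toolkit's implementation: the function name `zip_pdfs_in_batches`, the `batch_size` default, and the archive naming scheme are all assumptions made for illustration.

```python
import zipfile
from pathlib import Path
from typing import List


def zip_pdfs_in_batches(folder: Path, out_dir: Path, batch_size: int = 20) -> List[Path]:
    """Pack the PDF files in `folder` into zip archives of at most `batch_size` files each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    # Only top-level *.pdf files are considered; subfolders are ignored in this sketch.
    pdfs = sorted(p for p in folder.glob("*.pdf") if p.is_file())
    out_dir.mkdir(parents=True, exist_ok=True)

    archives = []
    for start in range(0, len(pdfs), batch_size):
        archive = out_dir / f"batch_{start // batch_size:03d}.zip"
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for pdf in pdfs[start:start + batch_size]:
                # Store each file under its bare name so the archive has no nested paths
                zf.write(pdf, arcname=pdf.name)
        archives.append(archive)
    return archives
```

Since a local zip archive is one of the supported inputs listed above, each archive produced this way could presumably be submitted on its own as a `source_path`.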


---

## Let's explore document conversion

### Single URL

In [None]:
documents = ds.convert_documents(
    api=api,
    proj_key=PROJ_KEY,
    url="https://arxiv.org/pdf/2206.00785.pdf",
    progress_bar=True,
)

In [None]:
# let's check what happened.
# we generate a csv report about the conversion task and store it locally
result_dir = "./converted_docs/"
info = documents.generate_report(result_dir=result_dir)
print(info)

The saved report may help in debugging and analysing the conversion task.
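
To inspect the saved report programmatically, a sketch along the following lines may help. The notebook does not state the report's file name or its columns, so the helper simply reads every `*.csv` file in the result directory and assumes the first row of each is a header. `load_csv_reports` is a hypothetical name, not a toolkit function.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_csv_reports(result_dir: Path) -> Dict[str, List[dict]]:
    """Read every CSV file in `result_dir` into a list of row dictionaries, keyed by file name."""
    reports = {}
    for csv_path in sorted(result_dir.glob("*.csv")):
        # newline="" is the csv module's recommended way to open files
        with csv_path.open(newline="", encoding="utf-8") as fh:
            reports[csv_path.name] = list(csv.DictReader(fh))
    return reports
```

Because `result_dir` is a plain string in the cell above, it would need wrapping first: `load_csv_reports(Path(result_dir))`.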

In [None]:
# let's download all the converted documents:
documents.download_all(result_dir=result_dir, progress_bar=True)

In [None]:
# the documents object stores some additional info like:
documents.result, documents.task_id

### Multiple URLs

In [None]:
# please use the Deep Search UI to convert multiple URLs

### Process local file

In [None]:
documents = ds.convert_documents(
    api=api,
    proj_key=PROJ_KEY,
    source_path="../../data/samples/2206.01062.pdf",
    progress_bar=True,
)

### Process folder of files

In [None]:
# please use the Deep Search UI to convert multiple files

---

## What's next?

Explore other examples that demonstrate possible use cases of document conversion:

- Visualize the text bounding boxes
- Extract figures
- Convert document to epub for your e-reader
