# Example of Parsing a PDF Using Unstructured
Source: https://github.com/Unstructured-IO/unstructured

### Before running the code

You will need the `uniflow` conda environment to run this notebook. You can set up the environment by following the installation instructions: https://github.com/CambioML/uniflow/tree/main#installation.

### Set up your Unstructured API Key and URL

1. Create a `.env` file in your root folder.
2. Acquire an API key and URL from https://unstructured.io/api-key-hosted
3. Add the following two lines to your `.env` file:
    ```
    UNSTRUCTURED_API_KEY=************************
    UNSTRUCTURED_API_URL=************************
    ```
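
Before calling the API, it can help to confirm that both variables are actually visible to the Python process. The sketch below is a minimal check using only the standard library. It assumes the values have already been loaded into the process environment (for example by your shell or a dotenv loader); whether `uniflow` reads the `.env` file on its own is not shown in this notebook. The helper name `missing_env_vars` is hypothetical.

```python
import os

# Variable names taken from the .env template above.
REQUIRED_VARS = ("UNSTRUCTURED_API_KEY", "UNSTRUCTURED_API_URL")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names in `names` that are unset or empty in `env`."""
    # Hypothetical helper: defaults to the real process environment.
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

An empty list means both values are set; it does not verify that the key itself is valid.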

In [14]:
%reload_ext autoreload
%autoreload 2

import sys

sys.path.append(".")
sys.path.append("..")
sys.path.append("../..")

### Install dependencies if you have not installed them before

In [None]:
!{sys.executable} -m pip install -q boto3
!{sys.executable} -m pip install -q easyocr
!{sys.executable} -m pip install -q pdf2image
!{sys.executable} -m pip install -q onnxruntime
!{sys.executable} -m pip install -q opensearch-py
!{sys.executable} -m pip install -q onnxruntime-gpu
!{sys.executable} -m pip install -q requests-aws4auth

In [15]:
import os
from uniflow.flow.client import ExtractClient
from uniflow.flow.config import ExtractPDFConfig
from uniflow.op.model.model_config import UnstructuredModelConfig

### Load input data

In [16]:
dir_cur = os.getcwd()
pdf_file = "1408.5882_page-1.pdf"
input_file = os.path.join(dir_cur, "data", "raw_input", pdf_file)
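
The extraction step in this notebook takes a list of `{"filename": ...}` records, one per input file. If you have several PDFs in the same folder, a small helper can build that list for you. This is a sketch only: `pdf_inputs` is a hypothetical name, and it assumes each record needs nothing beyond the `filename` key used in this notebook.

```python
from pathlib import Path


def pdf_inputs(directory):
    """Build one {"filename": path} record per PDF found in `directory`."""
    # Sorting keeps the record order stable across runs.
    return [
        {"filename": str(path)}
        for path in sorted(Path(directory).glob("*.pdf"))
    ]
```

For example, `pdf_inputs(os.path.join(dir_cur, "data", "raw_input"))` would return a record for every `.pdf` file in the input folder, and an empty list if the folder has none.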

### Extract using Unstructured Model

In [17]:
data = [
    {"filename": input_file},
]

config = ExtractPDFConfig(
    model_config=UnstructuredModelConfig(
        model_name="unstructuredio/online"
    )
)
unstructured_client = ExtractClient(config)

output = unstructured_client.run(data)

100%|██████████| 1/1 [00:13<00:00, 13.67s/it]


In [18]:
output

[{'output': [{'text': ['Convolutional Neural Networks for Sentence Classiﬁcation',
     'Yoon Kim New York University yhk255@nyu.edu',
     '4 1 0 2',
     'p e S 3',
     '] L C . s c [',
     '2 v 2 8 8 5 . 8 0 4 1 : v i X r a',
     'Abstract',
     'We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vec- tors for sentence-level classiﬁcation tasks. We show that a simple CNN with lit- tle hyperparameter tuning and static vec- tors achieves excellent results on multi- ple benchmarks. Learning task-speciﬁc vectors through ﬁne-tuning offers further gains in performance. We additionally propose a simple modiﬁcation to the ar- chitecture to allow for the use of both task-speciﬁc and static vectors. The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classiﬁcation.',
     'Introduction',
     'Deep learning models have achieved remarkable results i
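
The raw output above contains two kinds of extraction noise: the rotated arXiv margin stamp, which comes out as spaced single characters (`'4 1 0 2'`, `'p e S 3'`), and words split at line breaks (`vec- tors`, `lit- tle`). The following is a minimal post-processing sketch, not part of `uniflow` or Unstructured; `clean_text_blocks` is a hypothetical helper, and both rules are heuristics that may need tuning for other documents.

```python
import re


def clean_text_blocks(blocks):
    """Drop spaced single-character fragments and rejoin words split as 'vec- tors'."""
    cleaned = []
    for block in blocks:
        tokens = block.split()
        # Rotated margin text shows up as several one-character tokens.
        if len(tokens) > 1 and all(len(tok) == 1 for tok in tokens):
            continue
        # Heuristic: "vec- tors" -> "vectors". This would also merge a genuine
        # hyphenated compound that happened to break at a line end.
        cleaned.append(re.sub(r"(\w)- (\w)", r"\1\2", block))
    return cleaned


sample = [
    "Yoon Kim New York University yhk255@nyu.edu",
    "4 1 0 2",
    "p e S 3",
    "] L C . s c [",
    "Abstract",
    "trained on top of pre-trained word vec- tors",
]
print(clean_text_blocks(sample))
```

Given the structure shown in the output above, the helper could be applied as `clean_text_blocks(output[0]["output"][0]["text"])`.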

## End of the notebook

Check out more Uniflow use cases in the [example folder](https://github.com/CambioML/uniflow/tree/main/example/model#examples)!

<a href="https://www.cambioml.com/" title="Title">
    <img src="../image/cambioml_logo_large.png" style="height: 100px; display: block; margin-left: auto; margin-right: auto;"/>
</a>