# Document AI > Online Processing
- Created: 2024-11-10 (Sat)
- Updated: 2024-11-11 (Sun)

See [How to use Document AI](https://youtu.be/9izcbNYmP8M?si=T_sPIKV1xZFs5mZu)

### Document AI API
- Unified Endpoint: documentai.googleapis.com
- Universal Document Structure: Document Object

### Online vs. Batch
- The online (synchronous) API analyzes a single small document and returns the results quickly.
- The batch (asynchronous) API analyzes multiple large documents in one request and saves the results to Cloud Storage.

Batch processing uses long-running operations (LROs) to manage requests asynchronously, so the request is made and the output is retrieved differently than in online processing. In both cases, the output uses the same Document object format.
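
For comparison with the online flow used in the rest of these notes, the asynchronous flow can be sketched roughly as below. This is a minimal sketch and not part of the original notebook: the names `gcs_uri` and `batch_process_sketch`, the input and output prefixes, and the timeout value are assumptions for illustration. The Document AI import sits inside the function so the block can be loaded before the library is installed; calling the function needs the library and valid credentials.

```python
def gcs_uri(bucket_name: str, prefix: str = "") -> str:
    """Build a gs:// URI from a bucket name and an optional object prefix."""
    return f"gs://{bucket_name}/{prefix}"


def batch_process_sketch(processor_name: str, input_prefix_uri: str,
                         output_uri: str, timeout: int = 600):
    """Submit a batch request and wait for the long-running operation."""
    # Imported lazily; requires `google-cloud-documentai` and credentials.
    from google.cloud import documentai as docai

    client = docai.DocumentProcessorServiceClient()
    request = docai.BatchProcessRequest(
        name=processor_name,
        # Input: the files stored under this Cloud Storage prefix.
        input_documents=docai.BatchDocumentsInputConfig(
            gcs_prefix=docai.GcsPrefix(gcs_uri_prefix=input_prefix_uri)
        ),
        # Output: results are written under this Cloud Storage URI.
        document_output_config=docai.DocumentOutputConfig(
            gcs_output_config=docai.DocumentOutputConfig.GcsOutputConfig(
                gcs_uri=output_uri
            )
        ),
    )
    # Returns immediately with an operation handle instead of a Document.
    operation = client.batch_process_documents(request=request)
    operation.result(timeout=timeout)  # Blocks until done, or raises.
    return operation
```

Unlike `process_document`, the call returns an operation handle rather than a result, and the Document objects have to be read back from the output location afterwards.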

### Overview of the steps
1. Create a Service Account
2. Install the Python client libraries for Document AI
3. Configure the processor client
4. Construct the request
5. Call the API
6. Analyze the output

# 1. Create a Service Account
- Create a service account for your Document AI application.
- Grant this service account the Document AI API User role (roles/documentai.apiUser).
- This role can be granted on the entire project or on specific processors.

# 2. Install the Python client libraries for Document AI

In [1]:
!pip install --upgrade -q google-cloud-documentai
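
Because the install cell runs quietly (`-q`), it can be useful to confirm that the package is actually present before continuing. The check below is a small sketch using only the standard library; the helper name `installed_version` is an assumption for illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the install succeeded, otherwise None.
print(installed_version("google-cloud-documentai"))
```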

# 3. Configure the processor client

In [2]:
import google.cloud.documentai as docai

client = docai.DocumentProcessorServiceClient()
#opts = {"api_endpoint": f"{location}-documentai.googleapis.com"}
#client = docai.DocumentProcessorServiceClient(client_options=opts)
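
The commented-out lines above point the client at a location-specific endpoint. A minimal sketch of how those options might be built is shown here; the helper names `api_endpoint` and `client_options_for` are assumptions, and the endpoint pattern is the one used in the commented line.

```python
def api_endpoint(location: str) -> str:
    """Return the location-specific Document AI endpoint, e.g. for "us" or "eu"."""
    return f"{location}-documentai.googleapis.com"


def client_options_for(location: str) -> dict:
    """Build the client_options dict expected by the processor client."""
    return {"api_endpoint": api_endpoint(location)}


print(client_options_for("eu"))
# → {'api_endpoint': 'eu-documentai.googleapis.com'}
```

The resulting dict would be passed as `docai.DocumentProcessorServiceClient(client_options=client_options_for(location))`, which matters mainly when the processor lives outside the default location.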

# 4. Construct the request
Both the project ID and the (Document AI) processor ID are required.

In [4]:
# TODO: Change these variables
#project_id   = "docai-sandbox-439006"
project_id   = "qwiklabs-gcp-00-be8e83390131"
location     = "us"
processor_id = "8b1918585fb64d7e"

In [5]:
name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"
#name = client.processor_path(project_id, location, processor_id)

#file_path = '/path/to/local/file.pdf'
file_name = "sample-online-ocr.pdf"  # TODO: Change this
file_path = file_name

# Load the document as bytes
with open( file_path, 'rb' ) as image:
    image_content = image.read()

mime_type = 'application/pdf'
request = docai.ProcessRequest( name=name, raw_document=docai.RawDocument(content=image_content, mime_type=mime_type),)

#raw_document = docai.RawDocument(content=image_content, mime_type=mime_type)
#request = docai.ProcessRequest(name=name, raw_document=raw_document)
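
The cell above hard-codes `mime_type`. When the input may be a PDF or an image, the type could instead be inferred from the file name. The following is a sketch using only the standard library; the helper name `load_raw_document_fields` is hypothetical, and inferring the type from the extension is an assumption that only holds for correctly named files.

```python
import mimetypes
from pathlib import Path


def load_raw_document_fields(file_path: str) -> dict:
    """Read a file and return the two fields a RawDocument needs."""
    # Guess the MIME type from the file extension (e.g. ".pdf").
    mime_type, _ = mimetypes.guess_type(file_path)
    if mime_type is None:
        raise ValueError(f"Cannot infer MIME type for {file_path!r}")
    return {"content": Path(file_path).read_bytes(), "mime_type": mime_type}
```

The returned dict is shaped so that it could be unpacked into the request, for example `docai.RawDocument(**load_raw_document_fields(file_path))`.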

# 5. Call the API

In [6]:
result = client.process_document(request=request)

# 6. Analyze the output

In [7]:
document = result.document
# print(document.text)

# 7. (Optional) Create a bucket
Grant the service account the roles/storage.admin role so that it can create buckets.

In [None]:
PROJECT_ID="docai-sandbox-439006"
SERVICE_ACCOUNT_EMAIL="589241295624-compute@developer.gserviceaccount.com"

!gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:$SERVICE_ACCOUNT_EMAIL" \
  --role="roles/storage.admin"

API [cloudresourcemanager.googleapis.com] not enabled on project [589241295624].
 Would you like to enable and retry (this will take a few minutes)? (y/N)?  

## Create a bucket

In [9]:
# TODO: Change this. Don't use the leading gs://
#bucket_name = "thekim-cepf-documentai"
bucket_name = "qwiklabs-gcp-00-be8e83390131-cepf-documentai"
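
Since the bucket name must be given without the leading `gs://`, a small guard can catch a pasted URI before the bucket is created. This is a sketch only: `normalize_bucket_name` is a hypothetical helper, and the pattern is a simplified version of the Cloud Storage naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit). It does not cover the extra rules for dotted names or reserved words.

```python
import re

# Simplified naming check (assumption: dotted-name and reserved-word rules are ignored).
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def normalize_bucket_name(name: str) -> str:
    """Strip a leading gs:// and trailing slashes, then validate the bare name."""
    if name.startswith("gs://"):
        name = name[len("gs://"):]
    name = name.rstrip("/")
    if not _BUCKET_RE.fullmatch(name):
        raise ValueError(f"Invalid bucket name: {name!r}")
    return name


print(normalize_bucket_name("gs://my-bucket/"))
# → my-bucket
```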

In [11]:
# Create a Cloud Storage client
from google.cloud import storage

In [8]:
file = bucket_name + "/" + file_name
print(file)

def create_bucket(bucket_name, location="us"):
    """Create a Cloud Storage bucket in `location` and print its storage class."""
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    bucket = storage_client.create_bucket(bucket, location=location)

    print(f"Created bucket {bucket.name} with storage class {bucket.storage_class}")

# Call the function to create the bucket
# `location` is set in section 4. If that cell has not run, set location = "us" first.
create_bucket(bucket_name, location)

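NameError: name 'bucket_name' is not defined

Note: this error is raised when the cell runs before `bucket_name` is defined (here, cell 8 ran before cell 9). Run the cell that sets `bucket_name` first.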

## Save the OCR text to the created bucket

In [12]:
output_file_name = "cepf_online_ocr.txt"

storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
print(f"bucket={bucket}")
blob = bucket.blob(output_file_name)
#blob = bucket.blob("cepf_online_ocr.txt")
print(f"blob={blob}")

bucket=<Bucket: qwiklabs-gcp-00-be8e83390131-cepf-documentai>
blob=<Blob: qwiklabs-gcp-00-be8e83390131-cepf-documentai, cepf_online_ocr.txt, None>


In [13]:
blob.upload_from_string(document.text)
print(f"OCR text is saved to gs://{bucket_name}/{output_file_name}")
#print(f"OCR text is saved to gs://{bucket_name}/cepf_online_ocr.txt")

OCR text is saved to gs://qwiklabs-gcp-00-be8e83390131-cepf-documentai/cepf_online_ocr.txt
