# OCI Document Understanding (Beginner Notebook)

### What this file does:
Analyze receipts, invoices, or other documents in your OCI bucket with the OCI Document Understanding service. This walkthrough notebook is adapted from the logic in `vision/oci_document_understanding.py`.

**Documentation to reference:**
- OCI Document Understanding: https://docs.oracle.com/en-us/iaas/Content/document-understanding/using/home.htm
- OCI Python SDK: https://github.com/oracle/oci-python-sdk/tree/master/src/oci/ai_document

**Relevant Slack channels:**
- #oci_ai_document_service_users: *for OCI Document Understanding API questions*
- #igiu-innovation-lab: *general discussions on your project*
- #igiu-ai-learning: *help with sandbox environment or help with running this code*

**Env setup:**
- sandbox.yaml: Contains OCI config, compartment, and bucket details.
- .env: Optional environment variables, loaded with `load_dotenv()`.
- Make the Jupyter working directory match the one your workspace Python code uses:
    -  VS Code menu -> Settings > Extensions > Jupyter > Notebook File Root
    -  change from `${fileDirname}` to `${workspaceFolder}`


**How to run in notebook:**
- Make sure your runtime environment has all dependencies and access to required config files.
- Run the notebook cells in order.

---

## Step 1: Setup and Configuration

**Key Concepts:**
- **Environment Setup:** Before interacting with OCI services, you need to configure your environment with credentials, compartment IDs, and bucket details. This is typically done via a YAML config file (sandbox.yaml) and environment variables.
- **Dependencies:** Install the libraries this notebook imports: the OCI SDK (`oci`), `python-dotenv`, and `envyaml`.
- **Configuration Loading:** Load settings securely to authenticate and specify resources.

In this step, we'll import the dependencies, load environment variables, and load the OCI configuration.
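
Before loading the real file, it can help to check that the configuration contains every key the later cells read. The sketch below is a minimal, standalone check on a plain nested dictionary; the helper name `missing_config_keys` and the placeholder values are illustrative assumptions, and the key names are the ones this notebook accesses (`oci.configFile`, `oci.profile`, `oci.compartment`, `bucket.namespace`, `bucket.bucketName`, `bucket.prefix`).

```python
# Keys this notebook reads from sandbox.yaml (taken from the cells below).
REQUIRED_KEYS = {
    "oci": ["configFile", "profile", "compartment"],
    "bucket": ["namespace", "bucketName", "prefix"],
}

def missing_config_keys(cfg):
    """Return dotted names of required keys absent from a nested mapping."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        section_values = cfg.get(section)
        if not isinstance(section_values, dict):
            # The whole section is missing or is not a mapping
            missing.append(section)
            continue
        missing.extend(f"{section}.{key}" for key in keys if key not in section_values)
    return missing

# Placeholder values for illustration only; substitute your own.
example_cfg = {
    "oci": {
        "configFile": "~/.oci/config",
        "profile": "DEFAULT",
        "compartment": "ocid1.compartment.oc1..example",
    },
    "bucket": {"namespace": "my-namespace", "bucketName": "my-bucket"},
}
print(missing_config_keys(example_cfg))  # ['bucket.prefix']
```

An empty list means all expected keys are present. The check only looks at key names, not at whether the values are valid.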

In [None]:
# Import necessary libraries
import os
from dotenv import load_dotenv
from envyaml import EnvYAML
from pathlib import Path
import oci
from oci.object_storage import ObjectStorageClient
import json

# Load environment variables
load_dotenv()

In [None]:
# Define paths and load configuration
# Make sure your sandbox.yaml file is set up for your environment. You might have to specify the full path depending on your `cwd`.
# You can also try making your cwd for jupyter match your workspace python code:
# vscode menu -> Settings > Extensions > Jupyter > Notebook File Root
# change from ${fileDirname} to ${workspaceFolder}

SANDBOX_CONFIG_FILE = "sandbox.yaml"
FILE_TO_ANALYZE = Path("./vision/receipt.png")  # Change to your input file if desired

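def load_config(config_path):
    """Load sandbox.yaml with EnvYAML; print a message and return None on failure."""
    # Check for the file up front instead of opening a handle that is never used
    if not Path(config_path).is_file():
        print(f"Error: Configuration file '{config_path}' not found.")
        return None
    try:
        return EnvYAML(config_path)
    except Exception as e:
        print(f"Error loading config: {e}")
        return None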

scfg = load_config(SANDBOX_CONFIG_FILE)
assert scfg is not None and 'oci' in scfg and 'bucket' in scfg, "Check your sandbox.yaml config!"
oci_cfg = oci.config.from_file(os.path.expanduser(scfg["oci"]["configFile"]), scfg["oci"]["profile"])
bucket_cfg = scfg["bucket"]
compartment_id = scfg["oci"]["compartment"]

print("Configuration loaded successfully.")

## Step 2: Upload Document to Object Storage

**Key Concepts:**
- **Object Storage:** OCI's Object Storage is a scalable service for storing files like images or PDFs. Before processing, documents must be uploaded here.
- **Bucket and Namespace:** Organize files in buckets within a namespace for secure access.
- **Prefix:** Use prefixes to organize objects, similar to folders.

In this step, we'll upload the document file to your OCI Object Storage bucket.
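
Because Object Storage has no real folders, the "folder" is just part of the object name. The sketch below shows one way to join a prefix and a file name so that an empty prefix or a trailing slash does not produce names like `/receipt.png` or `docs//receipt.png`. The helper `build_object_name` is a hypothetical name introduced for illustration; the upload cell in this notebook uses a plain f-string instead.

```python
def build_object_name(prefix, filename):
    """Join an optional prefix and a file name into an Object Storage object name."""
    cleaned = (prefix or "").strip("/")
    return f"{cleaned}/{filename}" if cleaned else filename

print(build_object_name("docs", "receipt.png"))   # docs/receipt.png
print(build_object_name("docs/", "receipt.png"))  # docs/receipt.png
print(build_object_name("", "receipt.png"))       # receipt.png
```

If you adopt a helper like this, use the same object name everywhere (upload, input location, and result retrieval), since later steps rebuild the name from the prefix and file name.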

In [None]:
# Function to upload file to Object Storage
def upload(oci_cfg, bucket_cfg, file_path):
    if not file_path.exists():
        print(f"Error: File '{file_path}' not found.")
        return False
    object_storage_client = ObjectStorageClient(oci_cfg)
    print(f"Uploading file {file_path} ...")
    # Open the file in a context manager so the handle is closed after the upload
    with open(file_path, 'rb') as file_obj:
        object_storage_client.put_object(
            bucket_cfg['namespace'],
            bucket_cfg['bucketName'],
            f"{bucket_cfg['prefix']}/{file_path.name}",
            file_obj
        )
    print("Upload completed!")
    return True

# Perform the upload
uploaded = upload(oci_cfg, bucket_cfg, FILE_TO_ANALYZE)
if not uploaded:
    raise Exception("Upload failed. Check file path and sandbox.yaml config.")
else:
    print("File uploaded successfully to Object Storage.")

## Step 3: Configure Document Understanding Request

**Key Concepts:**
- **Document Understanding Service:** This OCI AI service analyzes documents to extract text, classify types, detect languages, pull key-value pairs, and identify tables.
- **Features:** Specify which analyses to perform (e.g., text extraction, classification).
- **Processor Job:** A job that processes the document asynchronously.

In this step, we'll set up the client and define the processing features and locations.
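
If you want to switch features on and off without editing the list by hand, one option is to map short names to the feature class names and build the list from a selection. This is a sketch only: `FEATURE_CLASS_NAMES`, `build_features`, and the short names are assumptions made for illustration, and a stand-in namespace replaces `oci.ai_document.models` so the example runs without the SDK. The class names themselves are the ones used in this notebook.

```python
from types import SimpleNamespace

# Short names mapped to the feature class names used in this notebook.
FEATURE_CLASS_NAMES = {
    "classification": "DocumentClassificationFeature",
    "language": "DocumentLanguageClassificationFeature",
    "key_value": "DocumentKeyValueExtractionFeature",
    "tables": "DocumentTableExtractionFeature",
    "text": "DocumentTextExtractionFeature",
}

def build_features(models, selected):
    """Instantiate one feature object per selected short name."""
    unknown = [name for name in selected if name not in FEATURE_CLASS_NAMES]
    if unknown:
        raise ValueError(f"Unknown feature names: {unknown}")
    return [getattr(models, FEATURE_CLASS_NAMES[name])() for name in selected]

# Stand-in for oci.ai_document.models: empty classes with the same names.
fake_models = SimpleNamespace(
    **{cls_name: type(cls_name, (), {}) for cls_name in FEATURE_CLASS_NAMES.values()}
)

features_demo = build_features(fake_models, ["text", "tables"])
print([type(f).__name__ for f in features_demo])
# ['DocumentTextExtractionFeature', 'DocumentTableExtractionFeature']
```

In the notebook itself you would pass `oci.ai_document.models` as the first argument instead of `fake_models`.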

In [None]:
# Initialize the Document Understanding client
dus_client = oci.ai_document.AIServiceDocumentClientCompositeOperations(
    oci.ai_document.AIServiceDocumentClient(config=oci_cfg)
)

print("Document Understanding client initialized.")

In [None]:
# Define helper functions for input and output locations
def get_input_location(bucket_cfg):
    object_location = oci.ai_document.models.ObjectLocation()
    object_location.namespace_name = bucket_cfg["namespace"]
    object_location.bucket_name = bucket_cfg["bucketName"]
    object_location.object_name = f"{bucket_cfg['prefix']}/{os.path.basename(FILE_TO_ANALYZE)}"
    return object_location

def get_output_location(bucket_cfg):
    object_location = oci.ai_document.models.OutputLocation()
    object_location.namespace_name = bucket_cfg["namespace"]
    object_location.bucket_name = bucket_cfg["bucketName"]
    object_location.prefix = f"{bucket_cfg['prefix']}"
    return object_location

def create_processor(features, prefix, compartmentid, bucket_cfg):
    display_name = f"{prefix}-test"
    job_details = oci.ai_document.models.CreateProcessorJobDetails(
        display_name=display_name,
        compartment_id=compartmentid,
        input_location=oci.ai_document.models.ObjectStorageLocations(
            object_locations=[get_input_location(bucket_cfg)]),
        output_location=get_output_location(bucket_cfg),
        processor_config=oci.ai_document.models.GeneralProcessorConfig(features=features)
    )
    return job_details

# Set features: classification, language, key-value, tables, text
features = [
    oci.ai_document.models.DocumentClassificationFeature(),
    oci.ai_document.models.DocumentLanguageClassificationFeature(),
    oci.ai_document.models.DocumentKeyValueExtractionFeature(),
    oci.ai_document.models.DocumentTableExtractionFeature(),
    oci.ai_document.models.DocumentTextExtractionFeature()
]
prefix = bucket_cfg['prefix']

print("Processing features configured.")

## Step 4: Submit Processing Job

**Key Concepts:**
- **Asynchronous Processing:** Document analysis jobs run in the background to handle large files without blocking your code.
- **Lifecycle States:** Jobs go through states like 'in progress', 'succeeded', or 'failed'.
- **Waiting for Completion:** Use waiters to poll until the job finishes.

In this step, we'll submit the job and wait for it to complete.
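
To make the waiter idea concrete, here is a small standalone polling loop. It is not the SDK's implementation, only a sketch of the pattern: call a status function repeatedly until it reports a terminal state or a time budget runs out. The function `wait_for_terminal_state`, its parameters, and the simulated state sequence are all illustrative assumptions.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED"}

def wait_for_terminal_state(get_state, interval_seconds=1.0,
                            max_wait_seconds=60.0, sleep=time.sleep):
    """Call get_state() until it returns a terminal state or the time budget runs out."""
    waited = 0.0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= max_wait_seconds:
            raise TimeoutError(f"Still in state {state!r} after {waited:.0f}s")
        sleep(interval_seconds)
        waited += interval_seconds

# Simulated job that reports two intermediate states before finishing.
# The no-op sleep keeps the demo instant.
_states = iter(["ACCEPTED", "IN_PROGRESS", "SUCCEEDED"])
print(wait_for_terminal_state(lambda: next(_states), sleep=lambda s: None))  # SUCCEEDED
```

Treating failure states as terminal matters: a loop that only watches for success keeps polling a failed job until it times out.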

In [None]:
# Define callback for job status updates
def create_processor_job_callback(times_called, response):
    print("Waiting for processor lifecycle state to go into succeeded state:", getattr(response, 'data', response))

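# Submit the job and wait until it reaches a terminal state.
# Including FAILED lets the waiter return promptly instead of polling a failed job until it times out.
processor_res = dus_client.create_processor_job_and_wait_for_state(
    create_processor_job_details=create_processor(features, prefix, compartment_id, bucket_cfg),
    wait_for_states=[
        oci.ai_document.models.ProcessorJob.LIFECYCLE_STATE_SUCCEEDED,
        oci.ai_document.models.ProcessorJob.LIFECYCLE_STATE_FAILED,
    ],
    waiter_kwargs={"wait_callback": create_processor_job_callback}
)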

# Check job result
processor_job = None
if (processor_res and processor_res is not oci.util.Sentinel):
    data = getattr(processor_res, 'data', None)
    request_id = getattr(processor_res, 'request_id', None)
    if (data is not None and data is not oci.util.Sentinel and
        hasattr(data, 'lifecycle_state') and
        data.lifecycle_state == oci.ai_document.models.ProcessorJob.LIFECYCLE_STATE_SUCCEEDED):
        processor_job = data
        print(f"Processor job succeeded with lifecycle state: {processor_job.lifecycle_state} and request ID: {request_id}.")
    else:
        print("Processor job did not succeed.")
else:
    print("Processor job creation failed or timed out.")

if processor_job is None:
    raise Exception("Processor job failed to complete successfully.")
else:
    print("Job completed successfully.")

## Step 5: Retrieve and Display Results

**Key Concepts:**
- **Output Storage:** Results are stored back in Object Storage as JSON files.
- **Result Structure:** Includes extracted data like text, fields, and classifications.
- **Downloading Results:** Retrieve the JSON output for further processing.

In this step, we'll download the analysis results from Object Storage.
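
The result file's object name is assembled from several pieces, and a mismatch in any of them leads to a "not found" error. The sketch below isolates that string construction using the same layout as the retrieval cell in this notebook; the helper name `result_object_name` and all argument values are placeholders for illustration.

```python
def result_object_name(prefix, job_id, namespace, bucket_name, file_name):
    """Build the result JSON's object name using the layout this notebook assumes."""
    return f"{prefix}/{job_id}/{namespace}_{bucket_name}/results/{prefix}/{file_name}.json"

# All values below are placeholders for illustration.
print(result_object_name("docs", "ocid1.job.example", "my-namespace", "my-bucket", "receipt.png"))
# docs/ocid1.job.example/my-namespace_my-bucket/results/docs/receipt.png.json
```

Note that the prefix appears twice: once as the output location and once as part of the input object's name.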

In [None]:
# Initialize Object Storage client for retrieval
object_storage_client = oci.object_storage.ObjectStorageClient(config=oci_cfg)
namespace = bucket_cfg['namespace']
bucket_name = bucket_cfg['bucketName']

# Construct result object name
if processor_job is not None and hasattr(processor_job, 'id'):
    result_object_name = (
        f"{prefix}/{processor_job.id}/{namespace}_{bucket_name}/results/{prefix}/{FILE_TO_ANALYZE.name}.json"
    )
    response = object_storage_client.get_object(
        namespace_name=namespace,
        bucket_name=bucket_name,
        object_name=result_object_name
    )
    if response is not None and hasattr(response, 'data') and hasattr(response.data, 'content'):
        json_data = json.loads(response.data.content.decode('utf-8'))
        print("Analysis complete! Document Understanding Results:")
        print(json.dumps(json_data, indent=2))
    else:
        print('Failed to retrieve results or parse.')
        json_data = None
else:
    print('Error: Invalid processor job.')
    json_data = None

## Step 6: Parse and Summarize Key Results

**Key Concepts:**
- **Result Parsing:** Extract meaningful information from the JSON output, such as document type, language, and key fields.
- **Field Types:** Handle different extractions like key-value pairs and line items.
- **Summarization:** Present data in a readable format for quick insights.

In this step, we'll parse the JSON results and display a summary.
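
To see the structure the parser walks through, it can help to try it on a tiny hand-written sample first. The dictionary below is not real service output; it only mimics the shape the parsing cell expects (`pages` → `documentFields` → `fieldLabel` / `fieldValue`), and the labels `MerchantName` and `Total` are illustrative. The helper `key_values` is a hypothetical name that flattens key-value fields into a plain dictionary.

```python
# Hand-written sample in the shape the parser expects; not real service output.
sample = {
    "pages": [
        {
            "documentFields": [
                {
                    "fieldType": "KEY_VALUE",
                    "fieldLabel": {"name": "MerchantName"},
                    "fieldValue": {"valueType": "STRING", "text": "Example Cafe"},
                },
                {
                    "fieldType": "KEY_VALUE",
                    "fieldLabel": {"name": "Total"},
                    "fieldValue": {"valueType": "NUMBER", "text": "12.50", "value": 12.5},
                },
            ]
        }
    ]
}

def key_values(json_data):
    """Collect KEY_VALUE fields from every page into a flat {label: value} dict."""
    result = {}
    for page in json_data.get("pages", []):
        for field in page.get("documentFields", []):
            if field.get("fieldType") != "KEY_VALUE":
                continue
            label = field["fieldLabel"]["name"]
            value = field.get("fieldValue", {})
            # Prefer the typed value for numbers; otherwise keep the raw text.
            if value.get("valueType") == "NUMBER":
                result[label] = value.get("value")
            else:
                result[label] = value.get("text")
    return result

print(key_values(sample))  # {'MerchantName': 'Example Cafe', 'Total': 12.5}
```

A flat dictionary like this is often easier to pass on to a spreadsheet or database than the nested JSON.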

In [None]:
# Function to parse and summarize response
def parse_response(json_data):
    if json_data is None:
        print("No data to parse.")
        return
    # "or [{}]" also covers a key that is present but empty or null
    doc_type = (json_data.get('detectedDocumentTypes') or [{}])[0].get('documentType', 'N/A')
    lang = (json_data.get('detectedLanguages') or [{}])[0].get('language', 'N/A')
    print(f"Detected Document Type: {doc_type}")
    print(f"Detected Language: {lang}")
    pages = json_data.get('pages', [])
    if not pages:
        print("No pages found in response.")
        return
    for page_idx, page in enumerate(pages, start=1):
        print(f"\n--- Page {page_idx} ---")
        fields = page.get('documentFields', [])
        if not fields:
            print("No document fields extracted on this page.")
            continue
        for field in fields:
            if field['fieldType'] == 'KEY_VALUE':
                label = field['fieldLabel']['name']
                value = field['fieldValue']
                if value['valueType'] == 'STRING':
                    print(f"{label}: {value['text']}")
                elif value['valueType'] == 'NUMBER':
                    print(f"{label}: {value['value']}")
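                elif value['valueType'] == 'DATE':
                    print(f"{label}: {value['text']} ({value['value']})")
                else:
                    # Any other value type: fall back to the raw text, if present
                    print(f"{label}: {value.get('text', 'N/A')}")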
            elif field['fieldType'] == 'LINE_ITEM_GROUP':
                print(f"{field['fieldLabel']['name']}:")
                value = field['fieldValue']
                items = value.get('items', [])
                for item in items:
                    name = next((f['fieldValue']['text'] for f in item['fieldValue']['items'] if f['fieldLabel']['name']=='Name'), 'N/A')
                    quantity = next((f['fieldValue']['value'] for f in item['fieldValue']['items'] if f['fieldLabel']['name']=='Quantity'), 'N/A')
                    unit_price = next((f['fieldValue']['value'] for f in item['fieldValue']['items'] if f['fieldLabel']['name']=='UnitPrice'), 'N/A')
                    amount = next((f['fieldValue']['value'] for f in item['fieldValue']['items'] if f['fieldLabel']['name']=='Amount'), 'N/A')
                    print(f"  - {name}: Qty {quantity} @ ${unit_price} = ${amount}")
            else:
                print(f"Unsupported field type: {field['fieldType']}")

# Parse the results
parse_response(json_data)
print("\nParsing and summarization complete.")

## Play and Explore

- Swap in a different image or PDF by changing `FILE_TO_ANALYZE` (try invoices or forms).
- Use both printed and handwritten documents to see how the model handles different text styles.
- Try documents in different languages (ensure the language is supported).
- Modify the `features` list to focus on specific extractions (e.g., remove table extraction for text-only documents).
- Experiment with larger PDFs or multi-page documents.

## 🧑‍💻 Project Ideas for Practice

Below are some practical project prompts. Try one (or all) after you run a basic document through the models!

1. **Automated Receipt Processing**: Mimic the receipt entry that Oracle Expenses uses by extracting key fields like date, amount, vendor, and items. Save the structured data to a database or spreadsheet.
2. **Table Data Extraction**: Read tables from PDF documents and convert them to CSV or JSON format. Useful for financial reports or schedules.
3. **Document Classification System**: Build a tool that classifies uploaded documents (e.g., receipts, invoices, contracts) and routes them to appropriate processing workflows.
4. **Form Data Digitization**: Create a system to extract key-value pairs from forms and populate them into web forms or databases.
5. **Multi-Language Document Processing**: Develop a pipeline that detects language, extracts text, and translates it if needed.
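
As a starting point for the first two ideas, here is a minimal sketch that turns line items into CSV text with Python's standard `csv` module. The function `line_items_to_csv` and the sample rows are made up for illustration; the column names match the labels the parsing step looks for (`Name`, `Quantity`, `UnitPrice`, `Amount`).

```python
import csv
import io

def line_items_to_csv(rows):
    """Serialize a list of line-item dicts to CSV text with a fixed column order."""
    columns = ["Name", "Quantity", "UnitPrice", "Amount"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Missing columns become empty cells; extra keys are dropped
        writer.writerow({col: row.get(col, "") for col in columns})
    return buffer.getvalue()

# Made-up rows standing in for parsed receipt line items.
rows = [
    {"Name": "Coffee", "Quantity": 2, "UnitPrice": 3.5, "Amount": 7.0},
    {"Name": "Bagel", "Quantity": 1, "UnitPrice": 2.25, "Amount": 2.25},
]
print(line_items_to_csv(rows))
```

To write a file instead of a string, you could pass an open file handle (opened with `newline=""`) to `csv.DictWriter` in place of the `StringIO` buffer.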

If you see errors, double-check your OCI credentials and the values in `sandbox.yaml`. The code comments and the documentation linked at the top of this notebook are good places to look for help.

---
**Happy building!**