## Service Setup

This notebook sets up and verifies the services required by the Document Intelligence app:

1. Storage Service (Volumes)
2. Database Service (Lakebase)
3. Document Service (DBSQL)
4. Agent Service (Model Serving)

The application orchestrates these services at runtime; this notebook sets up and verifies each one individually.

The workspace client is used extensively throughout the app and repo, so the first step is to validate our connection.

In [1]:
from databricks.sdk import WorkspaceClient
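from doc_intel.utils import get_workspace_client

# Build the shared client, then make a lightweight authenticated call.
# Fetching the current user raises an error if the credentials or host are misconfigured.
client = get_workspace_client()
user_id = client.current_user.me().id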
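
The connection check can be wrapped in a small helper so that a failure produces a clear message. This is a minimal sketch, not part of `doc_intel`: `verify_connection` is a hypothetical name, and it assumes only that the client exposes `current_user.me()` returning an object with an `id` attribute, as the cell above uses.

```python
def verify_connection(client):
    """Return the current user's ID, or raise if none comes back.

    Hypothetical helper: relies only on client.current_user.me().id,
    the same call used in the cell above.
    """
    me = client.current_user.me()  # raises if authentication fails
    if not me.id:
        raise RuntimeError("Connected, but the workspace returned no user ID")
    return me.id
```

Calling `user_id = verify_connection(client)` would then replace the bare attribute lookup.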

In [2]:
from doc_intel.config import DocConfig
config = DocConfig("./config.yaml")

## Storage Service
Next, let's make sure the storage service is set up. This includes authenticating and creating the volumes if they don't already exist. We then run a quick test to confirm that upload and download work.

In [3]:
from doc_intel.storage import StorageService
storage_service = StorageService(client, config)
storage_service.create_volume(config.storage.bronze_vol_name)
storage_service.create_volume(config.storage.silver_vol_name)

(True, 'Volume processed_pdfs already exists in shm.doc_intel')
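
The "create if it doesn't exist" behavior could look roughly like the sketch below. This is an assumption about the approach, not the actual `StorageService.create_volume` implementation. `ensure_volume` is a hypothetical helper that uses the Databricks SDK's `client.volumes.list` and `client.volumes.create` calls. The "created" message is invented for illustration; only the "already exists" wording comes from the output above.

```python
def ensure_volume(client, catalog, schema, name, volume_type):
    """Create a Unity Catalog volume only if it is not already present.

    Hypothetical helper. Returns (ok, message), mirroring the tuple
    shape shown in the output above.
    """
    # Look for a volume with the same name in the target schema.
    existing = client.volumes.list(catalog_name=catalog, schema_name=schema)
    if any(v.name == name for v in existing):
        return True, f"Volume {name} already exists in {catalog}.{schema}"

    client.volumes.create(
        catalog_name=catalog,
        schema_name=schema,
        name=name,
        volume_type=volume_type,  # e.g. VolumeType.MANAGED from the SDK
    )
    # Assumed wording; the real service's success message is not shown above.
    return True, f"Volume {name} created in {catalog}.{schema}"
```

With a real client, `volume_type` would typically be `VolumeType.MANAGED`, imported from `databricks.sdk.service.catalog`. Because the check runs first, repeated calls are safe.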

In [11]:
with open('../fixtures/simple_financial_statement.pdf', "rb") as f:
    pdf_bytes = f.read()

storage_service.upload_file(pdf_bytes, "test.pdf", config.storage.bronze_vol_name)

(True,
 '/Volumes/shm/doc_intel/raw_pdfs/test.pdf',
 'File uploaded successfully: /Volumes/shm/doc_intel/raw_pdfs/test.pdf')

In [10]:
storage_service.download_file('/Volumes/shm/doc_intel/raw_pdfs/test.pdf')

(True,
 b'%PDF-1.6\r%\xe2\xe3\xcf\xd3\r\n269 0 obj\r<</Linearized 1/L 158776/O 271/E 96189/N 3/T 158395/H [ 477 228]>>\rendobj\r ... [binary output truncated]
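
Rather than eyeballing raw bytes, the upload and download test can compare the two payloads directly. This is a minimal sketch with some assumptions. `verify_round_trip` is a hypothetical helper. It assumes `upload_file` returns `(ok, path, message)` and that `download_file` returns a tuple starting with `(ok, content)`, as the outputs above suggest. It also assumes failures are reported through the first element rather than raised.

```python
def verify_round_trip(storage_service, data, file_name, volume_name):
    """Upload bytes, download them again, and confirm nothing changed.

    Hypothetical helper; the tuple shapes are inferred from the cell
    outputs above and may differ in the real service.
    """
    upload = storage_service.upload_file(data, file_name, volume_name)
    if not upload[0]:
        raise RuntimeError(f"Upload failed: {upload[-1]}")
    path = upload[1]

    download = storage_service.download_file(path)
    if not download[0]:
        raise RuntimeError(f"Download failed for {path}")
    content = download[1]

    # Byte-for-byte comparison against the original payload.
    if content != data:
        raise ValueError(
            f"Round trip mismatch: sent {len(data)} bytes, got {len(content)}"
        )
    return path
```

For the fixture above, the call would be `verify_round_trip(storage_service, pdf_bytes, "test.pdf", config.storage.bronze_vol_name)`. Checking `pdf_bytes.startswith(b"%PDF-")` beforehand also confirms the fixture is a PDF.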

## Document Service

Next, let's make sure we can run serverless DBSQL document processing jobs.

In [13]:
from doc_intel.document import DocumentService
document_service = DocumentService(client, config)