# Document Service

This notebook forms the core of our document service. It showcases how we are going to simplify our document intelligence application using Lakebase and Serverless jobs. This is tested on Serverless Version 3 - it takes a single file or a directory and parses all the files directly into an append operation on a postgres table. We can then get embeddings and use pgvector as the backend with a langgraph Agent.

We use our Databricks user IDs as the main entry point into the workflow and authentication

In [0]:
%pip install databricks-langchain
%restart_python

In [0]:
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
me = w.current_user.me()
print(me.id)  # This is your Databricks user ID
print(me.user_name) 
USER_ID = me.id

7873535765378608
scott.mckean@databricks.com


In [0]:
from pyspark.sql.functions import *
from pyspark.sql.types import *

# to be parameterized
volume_path = '/Volumes/shm/default/raw_pdfs/73eb2c4f_3c424fe06ddfe8ce_1755574069.pdf'
embedding_endpoint = 'databricks-gte-large-en'

We use ai_parse_document in a serverless job as our document processing service. This could be any isolated microservice and has lots of room for optimization, but ai_parse_document does a pretty good job and can handle lots of file types

In [0]:
parsed_df = (
    spark.read.format("binaryFile")
    .load(volume_path)
    .withColumn("user_id", lit(USER_ID))
    .select(
        col("path"),
        col("user_id"),
        expr("ai_parse_document(content)").alias("parsed")
    )
    .withColumn(
        "parsed_json",
        parse_json(col("parsed").cast("string"))
    )
    .select(
        col("path"),
        col("user_id"),
        expr("parsed_json:document:pages").alias("pages"),
        expr("parsed_json:document:elements").alias("elements"),
        expr("parsed_json:document:_corrupted_data").alias("_corrupted_data")
    )
)

To get something simple and working, I propose that we simply chunk each page for now. We can work on refining the chunking strategy in this job, but this gives a good starting point. We even wrap the embedding call here for better horizontal scalability.

In [0]:
from pyspark.sql.functions import from_json, explode, col, concat_ws, lit
from pyspark.sql.types import ArrayType, StructType, StructField, IntegerType, StringType

# Define schema for pages based on provided example
page_schema = StructType([
    StructField("content", StringType()),
    StructField("footer", StringType()),
    StructField("header", StringType()),
    StructField("id", IntegerType()),
    StructField("page_number", IntegerType())
])

chunked_pages = (
    parsed_df
    .withColumn(
        "pages_array",
        from_json(
            col("pages").cast("string"),
            ArrayType(page_schema)
        )
    )
    .withColumn(
        "page_chunk",
        explode(col("pages_array"))
    )
    .select(
        col("path"),
        col("user_id"),
        col("page_chunk.id").cast("string").alias("page_id"),
        concat_ws(
            "\n",
            concat_ws("", lit("Content: ["), col("page_chunk.content"), lit("]")),
            concat_ws("", lit("Footer: ["), col("page_chunk.footer"), lit("]")),
            concat_ws("", lit("Header: ["), col("page_chunk.header"), lit("]")),
            concat_ws("", lit("ID: ["), col("page_chunk.id").cast("string"), lit("]")),
            concat_ws("", lit("Page Number: ["), col("page_chunk.page_number").cast("string"), lit("]"))
        ).alias("text")
    )
    .withColumn("embedding", expr(f"ai_query('{embedding_endpoint}', 'text')"))
)

display(chunked_pages)

path,user_id,page_id,text,embedding
dbfs:/Volumes/shm/default/raw_pdfs/73eb2c4f_3c424fe06ddfe8ce_1755574069.pdf,7873535765378608,0,"Content: [# Sample Company ## Income Statement (Service) For the Year Ended September 30, 2021 | Category | Details | Amount | Total | |-------------------------|------------------|--------|-------| | Service revenue | | | $2,750| | Operating Expenses: | Depreciation expense| 100 | | | | Wages expenses | 1,200 | | | | Supplies expenses| 60 | | | | Total operating expenses | | 1,360 | | Operating Income | | | 1,390 | | Other Item: | Interest expense | 40 | | | | Pretax income | | 1,350 | | | Income tax expense | 405 | | | Net income | | | $945 | # Sample Company ## Statement of Retained Earnings For the Year Ended September 30, 2021 | Description | Retained Earnings | |-------------------------|--------------------| | Balance, October 1, 2020| $820 | | Net income | 945 | | Dividends declared | (500) | | Balance, September 30, 2021 | $1,265 |] Footer: [] Header: [] ID: [0] Page Number: []","List(-0.497802734375, -0.27978515625, -0.5244140625, -0.384033203125, 0.287353515625, 0.314697265625, 0.3779296875, -0.4404296875, 0.38330078125, 0.0308685302734375, 0.61279296875, -0.495849609375, -0.43017578125, -0.23291015625, -0.1878662109375, -0.30419921875, -0.26611328125, -0.6103515625, 0.393310546875, -0.51953125, -0.271240234375, 0.2457275390625, 0.339599609375, -0.80419921875, 1.1513671875, 0.343017578125, -1.1865234375, -0.448974609375, 0.373291015625, -0.0706787109375, -0.343994140625, 0.209716796875, -0.56689453125, -0.254638671875, -0.0633544921875, 0.76513671875, -0.09527587890625, -0.367431640625, -0.1612548828125, -0.04058837890625, -0.859375, -0.63671875, -0.270263671875, -0.4521484375, -0.7021484375, -0.12103271484375, 0.11053466796875, 0.018829345703125, -0.152099609375, 0.98828125, -0.9365234375, 0.12225341796875, -0.4453125, -0.1878662109375, 0.404052734375, 0.431884765625, 0.408203125, -0.8359375, 0.5751953125, 0.2254638671875, -0.669921875, 0.1590576171875, -0.272216796875, -0.32666015625, 0.2347412109375, 0.74609375, -0.28125, -0.45263671875, -1.287109375, 0.476318359375, -0.2086181640625, 0.29150390625, 0.1982421875, -1.0224609375, -0.193603515625, -0.127685546875, -0.9677734375, 0.2734375, -0.6533203125, 1.33984375, -0.3984375, -0.1903076171875, -0.28759765625, -1.1533203125, -0.7861328125, 1.205078125, 0.01201629638671875, 0.0931396484375, -0.1705322265625, 0.5478515625, 0.1138916015625, -0.39013671875, -0.03131103515625, 0.15234375, -0.0243072509765625, -0.0467529296875, -0.2076416015625, -0.61767578125, -0.499267578125, -0.0048065185546875, -0.024810791015625, -0.5087890625, -0.76416015625, 0.1258544921875, 0.466552734375, 0.287841796875, -0.5615234375, 0.423095703125, 0.1793212890625, 0.055511474609375, 0.52490234375, 0.35009765625, -0.0191802978515625, 0.052032470703125, 0.163818359375, 0.79052734375, 0.351318359375, -0.4140625, -0.69287109375, 0.12109375, -0.642578125, 0.947265625, 0.67138671875, -0.385986328125, 0.66064453125, 0.26171875, 0.0660400390625, -0.2568359375, -0.36181640625, 0.73779296875, 0.1829833984375, -0.353271484375, -0.0362548828125, -0.470703125, -0.040985107421875, 0.022247314453125, -0.826171875, 0.33984375, 0.86181640625, 0.85498046875, 0.297119140625, -0.6904296875, 0.371826171875, -0.609375, 0.779296875, 0.1756591796875, -0.370361328125, 0.580078125, -0.11126708984375, 0.2412109375, -0.76171875, 0.22314453125, -0.256103515625, 0.1400146484375, 0.00848388671875, 0.04290771484375, -0.10357666015625, -0.437744140625, -0.671875, -1.326171875, -0.85791015625, -0.7685546875, -0.2568359375, 0.37841796875, -0.26904296875, 1.09375, -0.9912109375, -0.5361328125, -0.02490234375, 0.56689453125, -0.6845703125, 0.36376953125, -0.057952880859375, -0.2578125, -0.0341796875, -0.32421875, -0.054290771484375, 0.1787109375, 0.393798828125, -0.66455078125, -0.10198974609375, -1.3447265625, -0.44482421875, -0.5146484375, 0.58935546875, 0.379150390625, 0.333984375, 0.0264739990234375, -0.30029296875, -0.4375, -0.6494140625, -0.2335205078125, -0.419189453125, -0.2099609375, -0.73583984375, 0.300537109375, -1.44140625, -0.146484375, 0.08428955078125, 0.5546875, 0.51025390625, 0.525390625, 1.521484375, -0.263427734375, 0.036956787109375, 0.467041015625, 0.157470703125, -0.0716552734375, 0.7236328125, -0.271240234375, 0.5537109375, 0.0338134765625, -0.169677734375, -0.01806640625, 0.744140625, 0.7021484375, 0.279541015625, -0.70703125, -0.0079803466796875, 0.020751953125, 0.175048828125, 0.26123046875, -0.14013671875, 0.13037109375, 0.544921875, -0.470458984375, 0.544921875, 1.03515625, 0.1319580078125, 0.319091796875, 0.6240234375, 0.52880859375, 0.09906005859375, -0.07843017578125, 0.47265625, -0.7138671875, -0.481201171875, 0.459228515625, -0.453857421875, -0.71923828125, -0.11773681640625, -0.3935546875, 0.246826171875, 0.31201171875, 0.5869140625, -0.0855712890625, 1.0458984375, 0.904296875, 0.413330078125, 0.86376953125, 0.0246124267578125, 0.7119140625, 0.480224609375, 0.262939453125, -0.177734375, -0.2486572265625, 0.2142333984375, 0.3876953125, 0.6533203125, -0.18798828125, -0.53515625, -0.321533203125, 1.9892578125, -0.51513671875, -0.3349609375, 0.053253173828125, -0.57861328125, -0.0810546875, 0.7607421875, 0.127685546875, 0.595703125, -0.25830078125, 0.0153656005859375, 0.07098388671875, 0.45947265625, 0.444091796875, -0.291259765625, -0.177978515625, 1.14453125, -0.87451171875, 0.24267578125, 0.2646484375, 0.171142578125, 0.385498046875, -0.037353515625, 0.31201171875, -1.2294921875, 1.26171875, -0.1580810546875, 0.7998046875, 0.6669921875, -0.29150390625, -0.454345703125, 0.267822265625, 0.466552734375, 0.6171875, 0.420654296875, 0.11083984375, 0.0032367706298828125, -0.27880859375, 0.85888671875, 1.083984375, -0.161865234375, -1.228515625, 0.7431640625, 0.623046875, 0.8173828125, -0.11993408203125, -1.2138671875, -0.222900390625, 0.317138671875, 0.61083984375, 0.07940673828125, 0.424560546875, -0.6025390625, 0.43505859375, -0.55712890625, -0.57421875, -0.75244140625, -0.751953125, 0.55126953125, 0.12213134765625, 0.88134765625, 0.84716796875, 0.0582275390625, 0.68798828125, -0.28173828125, 0.058807373046875, -0.29541015625, 0.5986328125, 0.1395263671875, -0.84326171875, -0.0355224609375, -0.292724609375, 0.043365478515625, -0.0567626953125, 0.06268310546875, -0.6806640625, 0.09539794921875, -0.346923828125, -0.312744140625, 0.154296875, 0.390625, -0.8798828125, 0.348388671875, -0.85302734375, -1.005859375, -0.45703125, -0.69189453125, -0.499267578125, 0.367919921875, -0.424560546875, -0.2325439453125, -0.06591796875, 0.135498046875, -0.54248046875, 0.276123046875, 0.488037109375, -0.28759765625, 0.197998046875, 0.59326171875, 0.27197265625, -1.064453125, -0.362060546875, -1.5830078125, 0.01395416259765625, -0.045135498046875, 0.86328125, 0.2064208984375, -0.6025390625, 0.0810546875, 0.1951904296875, -0.08526611328125, -0.246337890625, 0.58349609375, -0.0888671875, 0.0033111572265625, 0.67822265625, -0.16748046875, -0.8388671875, 0.337158203125, -0.44921875, -0.1522216796875, -0.2181396484375, -0.1649169921875, 0.3466796875, -0.728515625, -0.413818359375, 0.90576171875, 0.148193359375, -0.12005615234375, -0.326416015625, -0.1414794921875, -0.1636962890625, 0.53076171875, -0.48583984375, 0.40869140625, 0.345703125, -0.347412109375, -0.14208984375, 0.059417724609375, -0.521484375, -1.3212890625, 0.41650390625, -0.7412109375, -0.0163421630859375, 0.87255859375, -0.369140625, -0.76318359375, -0.44384765625, 0.452392578125, 0.113525390625, -0.380615234375, -0.445068359375, -0.95166015625, -0.53125, 0.0297088623046875, -0.323974609375, -0.07659912109375, 0.06341552734375, -0.484619140625, -0.277099609375, 0.1104736328125, 0.04376220703125, -0.2149658203125, 0.69287109375, 0.65283203125, 0.035186767578125, -0.79443359375, -0.6513671875, -0.5400390625, 0.403564453125, -0.6728515625, 0.2474365234375, -0.381591796875, -0.82470703125, -0.1375732421875, -0.0625, -1.328125, -0.167724609375, 0.2005615234375, 0.2384033203125, -0.82568359375, 1.3447265625, -0.25439453125, -0.09796142578125, 0.0440673828125, -0.60888671875, -0.1181640625, 0.1370849609375, 0.8779296875, 0.1572265625, 0.71044921875, 0.69482421875, -0.149169921875, -0.75048828125, -0.4296875, -0.5888671875, 0.0229034423828125, 0.3720703125, 0.34228515625, 0.837890625, 0.0282745361328125, -0.7802734375, 0.1444091796875, -1.060546875, -0.57763671875, 0.14697265625, -0.607421875, 0.072998046875, -0.0216064453125, 0.81689453125, -0.27099609375, 0.55810546875, -0.374267578125, -0.13232421875, 0.427490234375, -0.412109375, 0.974609375, -0.444091796875, 0.732421875, -0.07012939453125, 0.3876953125, -0.261962890625, 1.1220703125, -0.419189453125, -0.31689453125, -0.040618896484375, -0.255859375, -0.55029296875, -0.8701171875, -0.5068359375, -0.88037109375, -0.923828125, -0.0576171875, 0.0164947509765625, -0.35400390625, -0.188720703125, 0.401611328125, -0.5185546875, -0.77685546875, 0.461669921875, 0.08526611328125, -0.5595703125, 0.48193359375, 0.308349609375, 0.22509765625, 0.15966796875, 1.009765625, 0.99609375, -0.3828125, 0.292236328125, -0.135986328125, 0.0655517578125, -0.51025390625, -0.2958984375, 0.486328125, -0.455322265625, 0.0036144256591796875, -0.034423828125, -0.50341796875, -0.04815673828125, -0.07196044921875, 0.2156982421875, 0.1826171875, 0.7666015625, 0.15576171875, 0.1744384765625, -0.703125, 0.15234375, 0.8984375, -0.0377197265625, 0.391845703125, 0.035125732421875, -0.57275390625, -0.06549072265625, 0.035980224609375, 0.0694580078125, -0.354736328125, 0.177978515625, -0.452880859375, -0.0245819091796875, 0.051025390625, 0.294677734375, 0.3681640625, 1.2841796875, 0.37158203125, -0.06817626953125, 0.367431640625, 0.2137451171875, -0.39599609375, 0.484619140625, -0.61328125, 0.5322265625, -0.358642578125, -0.57177734375, 0.09576416015625, -0.90771484375, 0.338623046875, -0.044158935546875, 0.259033203125, 0.384765625, -0.263671875, 0.1943359375, -0.54541015625, 0.931640625, -0.222900390625, 0.0253753662109375, -0.57373046875, 0.0318603515625, 0.309326171875, -0.42041015625, 0.156982421875, -0.130126953125, -0.14208984375, -0.25146484375, 0.69189453125, -0.63134765625, 0.6669921875, -0.5126953125, 0.6005859375, 0.355224609375, 0.13916015625, 0.0290985107421875, -0.05426025390625, 0.56201171875, 0.63671875, -0.2166748046875, -0.0811767578125, -0.2076416015625, -0.253173828125, -0.4912109375, -0.438232421875, 0.426025390625, 0.3310546875, -0.2314453125, -0.5625, 1.171875, -0.3095703125, -0.39892578125, -0.439697265625, -0.0049896240234375, 0.1812744140625, 0.1854248046875, 0.265625, -0.69677734375, -1.16015625, 0.3095703125, -0.08941650390625, 0.63916015625, -0.59228515625, -0.2386474609375, 0.18359375, 0.50390625, 0.328369140625, 0.01561737060546875, -0.3837890625, 0.378662109375, -0.111328125, -0.81787109375, -0.615234375, -0.057708740234375, -0.9599609375, -0.072021484375, 0.292236328125, -0.1904296875, -0.159423828125, -0.1856689453125, -0.91748046875, 0.2281494140625, 0.64990234375, -0.0501708984375, -0.04107666015625, -0.1395263671875, -0.98486328125, 0.393310546875, 0.382568359375, 0.650390625, 1.0400390625, 0.57958984375, 0.2005615234375, -0.24560546875, -0.72509765625, -0.61376953125, -0.0706787109375, 0.3623046875, -0.304443359375, 0.1656494140625, -0.346435546875, 0.484619140625, -0.391845703125, -0.481689453125, 0.58203125, -0.1558837890625, 0.4658203125, -0.27001953125, -1.1865234375, 0.0810546875, 0.876953125, 0.0684814453125, -0.31787109375, 1.4931640625, 0.34521484375, 0.1236572265625, -0.17138671875, -0.445068359375, -0.2841796875, -0.9501953125, 0.0169219970703125, -0.08514404296875, 0.529296875, -0.450439453125, -0.370849609375, 0.79248046875, 0.32568359375, -0.77587890625, 0.324462890625, 9.055137634277344E-4, -0.58642578125, 0.02398681640625, 0.50439453125, -0.9091796875, -0.07281494140625, 0.0714111328125, -0.0033893585205078125, -0.70458984375, -0.2236328125, 0.49560546875, -0.806640625, 0.03125, 0.2103271484375, 0.1845703125, 0.327880859375, -0.61083984375, -1.228515625, -1.166015625, -0.576171875, 0.389892578125, 0.10791015625, 0.70361328125, 0.096923828125, -0.5625, -0.92724609375, -0.138671875, -0.1370849609375, 0.1734619140625, 0.791015625, -0.3271484375, 0.55517578125, -0.6279296875, 0.38720703125, -0.459228515625, -0.303955078125, -0.032806396484375, 0.52978515625, -0.63232421875, 0.576171875, 0.5576171875, 0.280517578125, -1.466796875, -0.73779296875, 0.26513671875, 0.2391357421875, -0.3798828125, -0.3759765625, -0.6611328125, -0.494873046875, 0.1356201171875, -0.0214691162109375, 0.2191162109375, -0.6767578125, -0.46826171875, 0.65234375, -0.54052734375, 0.93359375, 0.7138671875, -0.287109375, 0.052642822265625, -0.00322723388671875, 0.01078033447265625, -0.285888671875, 0.3837890625, 0.304443359375, 0.7060546875, -0.42138671875, 0.42431640625, 2.2109375, 1.2705078125, -1.146484375, -0.215576171875, -0.55859375, 0.06903076171875, -0.0102386474609375, -1.2646484375, -0.130859375, 0.0086517333984375, 0.006351470947265625, -0.66259765625, 0.0400390625, 0.276611328125, 5.469322204589844E-4, 0.08740234375, -0.38330078125, -0.1827392578125, 0.94921875, -0.5908203125, 0.62451171875, 1.09765625, -0.379638671875, -0.476318359375, -0.422607421875, 0.5146484375, -0.179931640625, -0.310302734375, 0.21337890625, 0.34814453125, 0.0039215087890625, 0.211181640625, 0.5322265625, 0.092529296875, 0.5791015625, -0.29931640625, -0.51123046875, 0.1314697265625, 0.154296875, -0.2342529296875, 0.12298583984375, 0.30322265625, -0.265380859375, -1.0146484375, -0.32861328125, 0.01398468017578125, 1.0029296875, 0.395751953125, 0.40673828125, -0.184814453125, 0.177490234375, -0.7314453125, -0.30078125, -0.2244873046875, 0.13623046875, 0.7978515625, 0.146240234375, -0.826171875, 0.25244140625, -0.52978515625, 0.65185546875, -0.25048828125, 0.0377197265625, 0.365478515625, 0.544921875, 0.409423828125, -0.141357421875, 0.59130859375, 0.57421875, -0.56005859375, 1.1865234375, 0.71240234375, -0.1468505859375, -0.2279052734375, 0.367431640625, 0.26416015625, 0.80908203125, 0.69287109375, 1.0654296875, -0.07086181640625, -1.1025390625, -0.3583984375, -0.5537109375, 0.591796875, -0.09991455078125, 1.1572265625, -0.342041015625, 0.296142578125, 0.230224609375, -1.125, 0.305419921875, -0.1448974609375, 0.169189453125, -0.177490234375, 0.361572265625, -0.1947021484375, -0.223876953125, -0.43505859375, 0.2305908203125, 0.047332763671875, 0.461669921875, -0.310791015625, 0.034912109375, -0.238037109375, 0.5546875, -0.08502197265625, 0.0226898193359375, -0.55078125, -0.370849609375, -0.45654296875, -0.02679443359375, -1.0810546875, -0.1943359375, -0.428466796875, 1.337890625, -0.68115234375, -0.642578125, -0.3291015625, -0.99658203125, -0.1595458984375, -0.356201171875, -0.58154296875, 0.25048828125, -0.11871337890625, -0.22509765625, -0.7099609375, 0.2235107421875, 0.216552734375, -0.50341796875, -0.293212890625, 0.23779296875, -0.546875, -0.07373046875, -0.568359375, 0.70654296875, 0.242431640625, 0.93505859375, -1.302734375, 1.2490234375, -0.1961669921875, 0.1910400390625, -0.2335205078125, -0.2763671875, -0.30517578125, -0.378662109375, -0.64794921875, 0.57763671875, -0.00982666015625, -0.66845703125, 0.681640625, 0.452880859375, 0.2236328125, -0.26123046875, -0.0072021484375, 0.253662109375, -0.75390625, 0.70751953125, 0.92138671875, -0.60302734375, 0.09857177734375, -0.0166473388671875, -0.666015625, 0.3388671875, -0.544921875, -0.80078125, -0.64306640625, -0.70068359375, -0.25390625, 0.55322265625, -0.61767578125, -0.08984375, -0.3681640625, -0.34521484375, -0.66748046875, 0.1346435546875, 0.9306640625, 0.088623046875, 0.2066650390625, -0.10009765625, 0.7333984375, -0.046234130859375, 0.07440185546875, 0.27197265625, 1.1904296875, 0.108642578125, 0.32373046875, 0.439697265625, 0.1568603515625, -0.2880859375, 0.4580078125, 0.416259765625, 0.1756591796875, 0.03997802734375, 0.364990234375, -0.552734375, -0.65869140625, -16.78125, -0.146240234375, 0.166748046875, 0.1339111328125, -0.59619140625, -0.03106689453125, 0.71923828125, -0.2401123046875, -0.5703125, -0.0728759765625, -0.1485595703125, 0.6962890625, -0.62353515625, -1.0380859375, -0.00472259521484375, -0.1964111328125, -0.11431884765625, -1.142578125, -0.1304931640625, -0.88134765625, 0.362548828125, -0.381103515625, -0.2822265625, -0.0223236083984375, 0.295654296875, -0.429931640625, 0.345703125, -0.0296630859375, 0.66357421875, 0.3671875, -0.058807373046875, 0.471923828125, 0.1439208984375, 0.0716552734375, -0.391357421875, -0.7509765625, 0.381591796875, 0.376220703125, 0.1446533203125, -0.232666015625, -0.81884765625, 0.7255859375, -0.01451873779296875, -0.6025390625, -0.206298828125, 0.62353515625, -0.11553955078125, -0.50390625, -0.2406005859375, 0.431640625, 0.521484375, 1.6806640625, 0.60302734375, 0.2474365234375, -0.3212890625, -0.282470703125, 0.11651611328125, -0.1168212890625, 0.56787109375, 0.027374267578125, 0.1566162109375, 0.130859375, 0.3916015625, -0.25634765625, 0.62841796875, -1.279296875, -0.12347412109375, -0.09747314453125, 0.2183837890625, -0.6640625, 0.06768798828125, -0.409423828125, 0.11834716796875, -0.0156707763671875, 0.31787109375, 0.458740234375, 0.7578125, -0.347412109375, 0.16162109375, 0.37841796875, -0.363037109375, 0.0311431884765625)"
dbfs:/Volumes/shm/default/raw_pdfs/73eb2c4f_3c424fe06ddfe8ce_1755574069.pdf,7873535765378608,1,"Content: [# Sample Company Income Statement (Product) For the Year Ended September 30, 2021 | Sales revenue | $6,875 | | Cost of Goods Sold | (4,125) | | Gross Profit | $2,750 | | Depreciation expense | 100 | | Wages expenses | 1,200 | | Supplies expenses | 60 | | Total operating expenses | 1,360 | | Operating Income | 1,390 | | Interest expense | 40 | | Pretax income | 1,350 | | Income tax expense | 405 | | Net income | $945 | # Sample Company Statement of Retained Earnings For the Year Ended September 30, 2021 | Balance, October 1, 2020 | $820 | | Net income | 945 | | Dividends declared | (500) | | Balance, September 30, 2021 | $1,265 |] Footer: [] Header: [] ID: [1] Page Number: []","List(-0.497802734375, -0.27978515625, -0.5244140625, -0.384033203125, 0.287353515625, 0.314697265625, 0.3779296875, -0.4404296875, 0.38330078125, 0.0308685302734375, 0.61279296875, -0.495849609375, -0.43017578125, -0.23291015625, -0.1878662109375, -0.30419921875, -0.26611328125, -0.6103515625, 0.393310546875, -0.51953125, -0.271240234375, 0.2457275390625, 0.339599609375, -0.80419921875, 1.1513671875, 0.343017578125, -1.1865234375, -0.448974609375, 0.373291015625, -0.0706787109375, -0.343994140625, 0.209716796875, -0.56689453125, -0.254638671875, -0.0633544921875, 0.76513671875, -0.09527587890625, -0.367431640625, -0.1612548828125, -0.04058837890625, -0.859375, -0.63671875, -0.270263671875, -0.4521484375, -0.7021484375, -0.12103271484375, 0.11053466796875, 0.018829345703125, -0.152099609375, 0.98828125, -0.9365234375, 0.12225341796875, -0.4453125, -0.1878662109375, 0.404052734375, 0.431884765625, 0.408203125, -0.8359375, 0.5751953125, 0.2254638671875, -0.669921875, 0.1590576171875, -0.272216796875, -0.32666015625, 0.2347412109375, 0.74609375, -0.28125, -0.45263671875, -1.287109375, 0.476318359375, -0.2086181640625, 0.29150390625, 0.1982421875, -1.0224609375, -0.193603515625, -0.127685546875, -0.9677734375, 0.2734375, -0.6533203125, 1.33984375, -0.3984375, -0.1903076171875, -0.28759765625, -1.1533203125, -0.7861328125, 1.205078125, 0.01201629638671875, 0.0931396484375, -0.1705322265625, 0.5478515625, 0.1138916015625, -0.39013671875, -0.03131103515625, 0.15234375, -0.0243072509765625, -0.0467529296875, -0.2076416015625, -0.61767578125, -0.499267578125, -0.0048065185546875, -0.024810791015625, -0.5087890625, -0.76416015625, 0.1258544921875, 0.466552734375, 0.287841796875, -0.5615234375, 0.423095703125, 0.1793212890625, 0.055511474609375, 0.52490234375, 0.35009765625, -0.0191802978515625, 0.052032470703125, 0.163818359375, 0.79052734375, 0.351318359375, -0.4140625, -0.69287109375, 0.12109375, -0.642578125, 0.947265625, 0.67138671875, -0.385986328125, 0.66064453125, 0.26171875, 0.0660400390625, -0.2568359375, -0.36181640625, 0.73779296875, 0.1829833984375, -0.353271484375, -0.0362548828125, -0.470703125, -0.040985107421875, 0.022247314453125, -0.826171875, 0.33984375, 0.86181640625, 0.85498046875, 0.297119140625, -0.6904296875, 0.371826171875, -0.609375, 0.779296875, 0.1756591796875, -0.370361328125, 0.580078125, -0.11126708984375, 0.2412109375, -0.76171875, 0.22314453125, -0.256103515625, 0.1400146484375, 0.00848388671875, 0.04290771484375, -0.10357666015625, -0.437744140625, -0.671875, -1.326171875, -0.85791015625, -0.7685546875, -0.2568359375, 0.37841796875, -0.26904296875, 1.09375, -0.9912109375, -0.5361328125, -0.02490234375, 0.56689453125, -0.6845703125, 0.36376953125, -0.057952880859375, -0.2578125, -0.0341796875, -0.32421875, -0.054290771484375, 0.1787109375, 0.393798828125, -0.66455078125, -0.10198974609375, -1.3447265625, -0.44482421875, -0.5146484375, 0.58935546875, 0.379150390625, 0.333984375, 0.0264739990234375, -0.30029296875, -0.4375, -0.6494140625, -0.2335205078125, -0.419189453125, -0.2099609375, -0.73583984375, 0.300537109375, -1.44140625, -0.146484375, 0.08428955078125, 0.5546875, 0.51025390625, 0.525390625, 1.521484375, -0.263427734375, 0.036956787109375, 0.467041015625, 0.157470703125, -0.0716552734375, 0.7236328125, -0.271240234375, 0.5537109375, 0.0338134765625, -0.169677734375, -0.01806640625, 0.744140625, 0.7021484375, 0.279541015625, -0.70703125, -0.0079803466796875, 0.020751953125, 0.175048828125, 0.26123046875, -0.14013671875, 0.13037109375, 0.544921875, -0.470458984375, 0.544921875, 1.03515625, 0.1319580078125, 0.319091796875, 0.6240234375, 0.52880859375, 0.09906005859375, -0.07843017578125, 0.47265625, -0.7138671875, -0.481201171875, 0.459228515625, -0.453857421875, -0.71923828125, -0.11773681640625, -0.3935546875, 0.246826171875, 0.31201171875, 0.5869140625, -0.0855712890625, 1.0458984375, 0.904296875, 0.413330078125, 0.86376953125, 0.0246124267578125, 0.7119140625, 0.480224609375, 0.262939453125, -0.177734375, -0.2486572265625, 0.2142333984375, 0.3876953125, 0.6533203125, -0.18798828125, -0.53515625, -0.321533203125, 1.9892578125, -0.51513671875, -0.3349609375, 0.053253173828125, -0.57861328125, -0.0810546875, 0.7607421875, 0.127685546875, 0.595703125, -0.25830078125, 0.0153656005859375, 0.07098388671875, 0.45947265625, 0.444091796875, -0.291259765625, -0.177978515625, 1.14453125, -0.87451171875, 0.24267578125, 0.2646484375, 0.171142578125, 0.385498046875, -0.037353515625, 0.31201171875, -1.2294921875, 1.26171875, -0.1580810546875, 0.7998046875, 0.6669921875, -0.29150390625, -0.454345703125, 0.267822265625, 0.466552734375, 0.6171875, 0.420654296875, 0.11083984375, 0.0032367706298828125, -0.27880859375, 0.85888671875, 1.083984375, -0.161865234375, -1.228515625, 0.7431640625, 0.623046875, 0.8173828125, -0.11993408203125, -1.2138671875, -0.222900390625, 0.317138671875, 0.61083984375, 0.07940673828125, 0.424560546875, -0.6025390625, 0.43505859375, -0.55712890625, -0.57421875, -0.75244140625, -0.751953125, 0.55126953125, 0.12213134765625, 0.88134765625, 0.84716796875, 0.0582275390625, 0.68798828125, -0.28173828125, 0.058807373046875, -0.29541015625, 0.5986328125, 0.1395263671875, -0.84326171875, -0.0355224609375, -0.292724609375, 0.043365478515625, -0.0567626953125, 0.06268310546875, -0.6806640625, 0.09539794921875, -0.346923828125, -0.312744140625, 0.154296875, 0.390625, -0.8798828125, 0.348388671875, -0.85302734375, -1.005859375, -0.45703125, -0.69189453125, -0.499267578125, 0.367919921875, -0.424560546875, -0.2325439453125, -0.06591796875, 0.135498046875, -0.54248046875, 0.276123046875, 0.488037109375, -0.28759765625, 0.197998046875, 0.59326171875, 0.27197265625, -1.064453125, -0.362060546875, -1.5830078125, 0.01395416259765625, -0.045135498046875, 0.86328125, 0.2064208984375, -0.6025390625, 0.0810546875, 0.1951904296875, -0.08526611328125, -0.246337890625, 0.58349609375, -0.0888671875, 0.0033111572265625, 0.67822265625, -0.16748046875, -0.8388671875, 0.337158203125, -0.44921875, -0.1522216796875, -0.2181396484375, -0.1649169921875, 0.3466796875, -0.728515625, -0.413818359375, 0.90576171875, 0.148193359375, -0.12005615234375, -0.326416015625, -0.1414794921875, -0.1636962890625, 0.53076171875, -0.48583984375, 0.40869140625, 0.345703125, -0.347412109375, -0.14208984375, 0.059417724609375, -0.521484375, -1.3212890625, 0.41650390625, -0.7412109375, -0.0163421630859375, 0.87255859375, -0.369140625, -0.76318359375, -0.44384765625, 0.452392578125, 0.113525390625, -0.380615234375, -0.445068359375, -0.95166015625, -0.53125, 0.0297088623046875, -0.323974609375, -0.07659912109375, 0.06341552734375, -0.484619140625, -0.277099609375, 0.1104736328125, 0.04376220703125, -0.2149658203125, 0.69287109375, 0.65283203125, 0.035186767578125, -0.79443359375, -0.6513671875, -0.5400390625, 0.403564453125, -0.6728515625, 0.2474365234375, -0.381591796875, -0.82470703125, -0.1375732421875, -0.0625, -1.328125, -0.167724609375, 0.2005615234375, 0.2384033203125, -0.82568359375, 1.3447265625, -0.25439453125, -0.09796142578125, 0.0440673828125, -0.60888671875, -0.1181640625, 0.1370849609375, 0.8779296875, 0.1572265625, 0.71044921875, 0.69482421875, -0.149169921875, -0.75048828125, -0.4296875, -0.5888671875, 0.0229034423828125, 0.3720703125, 0.34228515625, 0.837890625, 0.0282745361328125, -0.7802734375, 0.1444091796875, -1.060546875, -0.57763671875, 0.14697265625, -0.607421875, 0.072998046875, -0.0216064453125, 0.81689453125, -0.27099609375, 0.55810546875, -0.374267578125, -0.13232421875, 0.427490234375, -0.412109375, 0.974609375, -0.444091796875, 0.732421875, -0.07012939453125, 0.3876953125, -0.261962890625, 1.1220703125, -0.419189453125, -0.31689453125, -0.040618896484375, -0.255859375, -0.55029296875, -0.8701171875, -0.5068359375, -0.88037109375, -0.923828125, -0.0576171875, 0.0164947509765625, -0.35400390625, -0.188720703125, 0.401611328125, -0.5185546875, -0.77685546875, 0.461669921875, 0.08526611328125, -0.5595703125, 0.48193359375, 0.308349609375, 0.22509765625, 0.15966796875, 1.009765625, 0.99609375, -0.3828125, 0.292236328125, -0.135986328125, 0.0655517578125, -0.51025390625, -0.2958984375, 0.486328125, -0.455322265625, 0.0036144256591796875, -0.034423828125, -0.50341796875, -0.04815673828125, -0.07196044921875, 0.2156982421875, 0.1826171875, 0.7666015625, 0.15576171875, 0.1744384765625, -0.703125, 0.15234375, 0.8984375, -0.0377197265625, 0.391845703125, 0.035125732421875, -0.57275390625, -0.06549072265625, 0.035980224609375, 0.0694580078125, -0.354736328125, 0.177978515625, -0.452880859375, -0.0245819091796875, 0.051025390625, 0.294677734375, 0.3681640625, 1.2841796875, 0.37158203125, -0.06817626953125, 0.367431640625, 0.2137451171875, -0.39599609375, 0.484619140625, -0.61328125, 0.5322265625, -0.358642578125, -0.57177734375, 0.09576416015625, -0.90771484375, 0.338623046875, -0.044158935546875, 0.259033203125, 0.384765625, -0.263671875, 0.1943359375, -0.54541015625, 0.931640625, -0.222900390625, 0.0253753662109375, -0.57373046875, 0.0318603515625, 0.309326171875, -0.42041015625, 0.156982421875, -0.130126953125, -0.14208984375, -0.25146484375, 0.69189453125, -0.63134765625, 0.6669921875, -0.5126953125, 0.6005859375, 0.355224609375, 0.13916015625, 0.0290985107421875, -0.05426025390625, 0.56201171875, 0.63671875, -0.2166748046875, -0.0811767578125, -0.2076416015625, -0.253173828125, -0.4912109375, -0.438232421875, 0.426025390625, 0.3310546875, -0.2314453125, -0.5625, 1.171875, -0.3095703125, -0.39892578125, -0.439697265625, -0.0049896240234375, 0.1812744140625, 0.1854248046875, 0.265625, -0.69677734375, -1.16015625, 0.3095703125, -0.08941650390625, 0.63916015625, -0.59228515625, -0.2386474609375, 0.18359375, 0.50390625, 0.328369140625, 0.01561737060546875, -0.3837890625, 0.378662109375, -0.111328125, -0.81787109375, -0.615234375, -0.057708740234375, -0.9599609375, -0.072021484375, 0.292236328125, -0.1904296875, -0.159423828125, -0.1856689453125, -0.91748046875, 0.2281494140625, 0.64990234375, -0.0501708984375, -0.04107666015625, -0.1395263671875, -0.98486328125, 0.393310546875, 0.382568359375, 0.650390625, 1.0400390625, 0.57958984375, 0.2005615234375, -0.24560546875, -0.72509765625, -0.61376953125, -0.0706787109375, 0.3623046875, -0.304443359375, 0.1656494140625, -0.346435546875, 0.484619140625, -0.391845703125, -0.481689453125, 0.58203125, -0.1558837890625, 0.4658203125, -0.27001953125, -1.1865234375, 0.0810546875, 0.876953125, 0.0684814453125, -0.31787109375, 1.4931640625, 0.34521484375, 0.1236572265625, -0.17138671875, -0.445068359375, -0.2841796875, -0.9501953125, 0.0169219970703125, -0.08514404296875, 0.529296875, -0.450439453125, -0.370849609375, 0.79248046875, 0.32568359375, -0.77587890625, 0.324462890625, 9.055137634277344E-4, -0.58642578125, 0.02398681640625, 0.50439453125, -0.9091796875, -0.07281494140625, 0.0714111328125, -0.0033893585205078125, -0.70458984375, -0.2236328125, 0.49560546875, -0.806640625, 0.03125, 0.2103271484375, 0.1845703125, 0.327880859375, -0.61083984375, -1.228515625, -1.166015625, -0.576171875, 0.389892578125, 0.10791015625, 0.70361328125, 0.096923828125, -0.5625, -0.92724609375, -0.138671875, -0.1370849609375, 0.1734619140625, 0.791015625, -0.3271484375, 0.55517578125, -0.6279296875, 0.38720703125, -0.459228515625, -0.303955078125, -0.032806396484375, 0.52978515625, -0.63232421875, 0.576171875, 0.5576171875, 0.280517578125, -1.466796875, -0.73779296875, 0.26513671875, 0.2391357421875, -0.3798828125, -0.3759765625, -0.6611328125, -0.494873046875, 0.1356201171875, -0.0214691162109375, 0.2191162109375, -0.6767578125, -0.46826171875, 0.65234375, -0.54052734375, 0.93359375, 0.7138671875, -0.287109375, 0.052642822265625, -0.00322723388671875, 0.01078033447265625, -0.285888671875, 0.3837890625, 0.304443359375, 0.7060546875, -0.42138671875, 0.42431640625, 2.2109375, 1.2705078125, -1.146484375, -0.215576171875, -0.55859375, 0.06903076171875, -0.0102386474609375, -1.2646484375, -0.130859375, 0.0086517333984375, 0.006351470947265625, -0.66259765625, 0.0400390625, 0.276611328125, 5.469322204589844E-4, 0.08740234375, -0.38330078125, -0.1827392578125, 0.94921875, -0.5908203125, 0.62451171875, 1.09765625, -0.379638671875, -0.476318359375, -0.422607421875, 0.5146484375, -0.179931640625, -0.310302734375, 0.21337890625, 0.34814453125, 0.0039215087890625, 0.211181640625, 0.5322265625, 0.092529296875, 0.5791015625, -0.29931640625, -0.51123046875, 0.1314697265625, 0.154296875, -0.2342529296875, 0.12298583984375, 0.30322265625, -0.265380859375, -1.0146484375, -0.32861328125, 0.01398468017578125, 1.0029296875, 0.395751953125, 0.40673828125, -0.184814453125, 0.177490234375, -0.7314453125, -0.30078125, -0.2244873046875, 0.13623046875, 0.7978515625, 0.146240234375, -0.826171875, 0.25244140625, -0.52978515625, 0.65185546875, -0.25048828125, 0.0377197265625, 0.365478515625, 0.544921875, 0.409423828125, -0.141357421875, 0.59130859375, 0.57421875, -0.56005859375, 1.1865234375, 0.71240234375, -0.1468505859375, -0.2279052734375, 0.367431640625, 0.26416015625, 0.80908203125, 0.69287109375, 1.0654296875, -0.07086181640625, -1.1025390625, -0.3583984375, -0.5537109375, 0.591796875, -0.09991455078125, 1.1572265625, -0.342041015625, 0.296142578125, 0.230224609375, -1.125, 0.305419921875, -0.1448974609375, 0.169189453125, -0.177490234375, 0.361572265625, -0.1947021484375, -0.223876953125, -0.43505859375, 0.2305908203125, 0.047332763671875, 0.461669921875, -0.310791015625, 0.034912109375, -0.238037109375, 0.5546875, -0.08502197265625, 0.0226898193359375, -0.55078125, -0.370849609375, -0.45654296875, -0.02679443359375, -1.0810546875, -0.1943359375, -0.428466796875, 1.337890625, -0.68115234375, -0.642578125, -0.3291015625, -0.99658203125, -0.1595458984375, -0.356201171875, -0.58154296875, 0.25048828125, -0.11871337890625, -0.22509765625, -0.7099609375, 0.2235107421875, 0.216552734375, -0.50341796875, -0.293212890625, 0.23779296875, -0.546875, -0.07373046875, -0.568359375, 0.70654296875, 0.242431640625, 0.93505859375, -1.302734375, 1.2490234375, -0.1961669921875, 0.1910400390625, -0.2335205078125, -0.2763671875, -0.30517578125, -0.378662109375, -0.64794921875, 0.57763671875, -0.00982666015625, -0.66845703125, 0.681640625, 0.452880859375, 0.2236328125, -0.26123046875, -0.0072021484375, 0.253662109375, -0.75390625, 0.70751953125, 0.92138671875, -0.60302734375, 0.09857177734375, -0.0166473388671875, -0.666015625, 0.3388671875, -0.544921875, -0.80078125, -0.64306640625, -0.70068359375, -0.25390625, 0.55322265625, -0.61767578125, -0.08984375, -0.3681640625, -0.34521484375, -0.66748046875, 0.1346435546875, 0.9306640625, 0.088623046875, 0.2066650390625, -0.10009765625, 0.7333984375, -0.046234130859375, 0.07440185546875, 0.27197265625, 1.1904296875, 0.108642578125, 0.32373046875, 0.439697265625, 0.1568603515625, -0.2880859375, 0.4580078125, 0.416259765625, 0.1756591796875, 0.03997802734375, 0.364990234375, -0.552734375, -0.65869140625, -16.78125, -0.146240234375, 0.166748046875, 0.1339111328125, -0.59619140625, -0.03106689453125, 0.71923828125, -0.2401123046875, -0.5703125, -0.0728759765625, -0.1485595703125, 0.6962890625, -0.62353515625, -1.0380859375, -0.00472259521484375, -0.1964111328125, -0.11431884765625, -1.142578125, -0.1304931640625, -0.88134765625, 0.362548828125, -0.381103515625, -0.2822265625, -0.0223236083984375, 0.295654296875, -0.429931640625, 0.345703125, -0.0296630859375, 0.66357421875, 0.3671875, -0.058807373046875, 0.471923828125, 0.1439208984375, 0.0716552734375, -0.391357421875, -0.7509765625, 0.381591796875, 0.376220703125, 0.1446533203125, -0.232666015625, -0.81884765625, 0.7255859375, -0.01451873779296875, -0.6025390625, -0.206298828125, 0.62353515625, -0.11553955078125, -0.50390625, -0.2406005859375, 0.431640625, 0.521484375, 1.6806640625, 0.60302734375, 0.2474365234375, -0.3212890625, -0.282470703125, 0.11651611328125, -0.1168212890625, 0.56787109375, 0.027374267578125, 0.1566162109375, 0.130859375, 0.3916015625, -0.25634765625, 0.62841796875, -1.279296875, -0.12347412109375, -0.09747314453125, 0.2183837890625, -0.6640625, 0.06768798828125, -0.409423828125, 0.11834716796875, -0.0156707763671875, 0.31787109375, 0.458740234375, 0.7578125, -0.347412109375, 0.16162109375, 0.37841796875, -0.363037109375, 0.0311431884765625)"
dbfs:/Volumes/shm/default/raw_pdfs/73eb2c4f_3c424fe06ddfe8ce_1755574069.pdf,7873535765378608,2,"Content: [# Sample Company Balance Sheet September 30, 2021 | Assets | Liabilities and Stockholders' Equity | | --- | --- | | **Current Assets:** | **Current Liabilities:** | | Cash $ 1,550 | Accounts payable $60 | | Accounts receivable 770 | Interest payable 80 | | Supplies 40 | Wages payable 100 | | Total current assets 2,360 | Income taxes payable 405 | | | Utilities payable 250 | | | Total Current Liabilities 895 | | Equipment 12,000 | Long-term Notes Payable 8,000 | | Less: Accumulated deprec. (1,300) | **Owners' Equity:** | | 10,700 | Owners' capital 2,900 | | | Retained earnings 1,265 | | | Total equity 4,165 | | **Total assets** $13,060 | **Total liabilities and owners' equity** $13,060 |] Footer: [] Header: [] ID: [2] Page Number: []","List(-0.497802734375, -0.27978515625, -0.5244140625, -0.384033203125, 0.287353515625, 0.314697265625, 0.3779296875, -0.4404296875, 0.38330078125, 0.0308685302734375, 0.61279296875, -0.495849609375, -0.43017578125, -0.23291015625, -0.1878662109375, -0.30419921875, -0.26611328125, -0.6103515625, 0.393310546875, -0.51953125, -0.271240234375, 0.2457275390625, 0.339599609375, -0.80419921875, 1.1513671875, 0.343017578125, -1.1865234375, -0.448974609375, 0.373291015625, -0.0706787109375, -0.343994140625, 0.209716796875, -0.56689453125, -0.254638671875, -0.0633544921875, 0.76513671875, -0.09527587890625, -0.367431640625, -0.1612548828125, -0.04058837890625, -0.859375, -0.63671875, -0.270263671875, -0.4521484375, -0.7021484375, -0.12103271484375, 0.11053466796875, 0.018829345703125, -0.152099609375, 0.98828125, -0.9365234375, 0.12225341796875, -0.4453125, -0.1878662109375, 0.404052734375, 0.431884765625, 0.408203125, -0.8359375, 0.5751953125, 0.2254638671875, -0.669921875, 0.1590576171875, -0.272216796875, -0.32666015625, 0.2347412109375, 0.74609375, -0.28125, -0.45263671875, -1.287109375, 0.476318359375, -0.2086181640625, 0.29150390625, 0.1982421875, -1.0224609375, -0.193603515625, -0.127685546875, -0.9677734375, 0.2734375, -0.6533203125, 1.33984375, -0.3984375, -0.1903076171875, -0.28759765625, -1.1533203125, -0.7861328125, 1.205078125, 0.01201629638671875, 0.0931396484375, -0.1705322265625, 0.5478515625, 0.1138916015625, -0.39013671875, -0.03131103515625, 0.15234375, -0.0243072509765625, -0.0467529296875, -0.2076416015625, -0.61767578125, -0.499267578125, -0.0048065185546875, -0.024810791015625, -0.5087890625, -0.76416015625, 0.1258544921875, 0.466552734375, 0.287841796875, -0.5615234375, 0.423095703125, 0.1793212890625, 0.055511474609375, 0.52490234375, 0.35009765625, -0.0191802978515625, 0.052032470703125, 0.163818359375, 0.79052734375, 0.351318359375, -0.4140625, -0.69287109375, 0.12109375, -0.642578125, 0.947265625, 0.67138671875, -0.385986328125, 0.66064453125, 0.26171875, 0.0660400390625, -0.2568359375, -0.36181640625, 0.73779296875, 0.1829833984375, -0.353271484375, -0.0362548828125, -0.470703125, -0.040985107421875, 0.022247314453125, -0.826171875, 0.33984375, 0.86181640625, 0.85498046875, 0.297119140625, -0.6904296875, 0.371826171875, -0.609375, 0.779296875, 0.1756591796875, -0.370361328125, 0.580078125, -0.11126708984375, 0.2412109375, -0.76171875, 0.22314453125, -0.256103515625, 0.1400146484375, 0.00848388671875, 0.04290771484375, -0.10357666015625, -0.437744140625, -0.671875, -1.326171875, -0.85791015625, -0.7685546875, -0.2568359375, 0.37841796875, -0.26904296875, 1.09375, -0.9912109375, -0.5361328125, -0.02490234375, 0.56689453125, -0.6845703125, 0.36376953125, -0.057952880859375, -0.2578125, -0.0341796875, -0.32421875, -0.054290771484375, 0.1787109375, 0.393798828125, -0.66455078125, -0.10198974609375, -1.3447265625, -0.44482421875, -0.5146484375, 0.58935546875, 0.379150390625, 0.333984375, 0.0264739990234375, -0.30029296875, -0.4375, -0.6494140625, -0.2335205078125, -0.419189453125, -0.2099609375, -0.73583984375, 0.300537109375, -1.44140625, -0.146484375, 0.08428955078125, 0.5546875, 0.51025390625, 0.525390625, 1.521484375, -0.263427734375, 0.036956787109375, 0.467041015625, 0.157470703125, -0.0716552734375, 0.7236328125, -0.271240234375, 0.5537109375, 0.0338134765625, -0.169677734375, -0.01806640625, 0.744140625, 0.7021484375, 0.279541015625, -0.70703125, -0.0079803466796875, 0.020751953125, 0.175048828125, 0.26123046875, -0.14013671875, 0.13037109375, 0.544921875, -0.470458984375, 0.544921875, 1.03515625, 0.1319580078125, 0.319091796875, 0.6240234375, 0.52880859375, 0.09906005859375, -0.07843017578125, 0.47265625, -0.7138671875, -0.481201171875, 0.459228515625, -0.453857421875, -0.71923828125, -0.11773681640625, -0.3935546875, 0.246826171875, 0.31201171875, 0.5869140625, -0.0855712890625, 1.0458984375, 0.904296875, 0.413330078125, 0.86376953125, 0.0246124267578125, 0.7119140625, 0.480224609375, 0.262939453125, -0.177734375, -0.2486572265625, 0.2142333984375, 0.3876953125, 0.6533203125, -0.18798828125, -0.53515625, -0.321533203125, 1.9892578125, -0.51513671875, -0.3349609375, 0.053253173828125, -0.57861328125, -0.0810546875, 0.7607421875, 0.127685546875, 0.595703125, -0.25830078125, 0.0153656005859375, 0.07098388671875, 0.45947265625, 0.444091796875, -0.291259765625, -0.177978515625, 1.14453125, -0.87451171875, 0.24267578125, 0.2646484375, 0.171142578125, 0.385498046875, -0.037353515625, 0.31201171875, -1.2294921875, 1.26171875, -0.1580810546875, 0.7998046875, 0.6669921875, -0.29150390625, -0.454345703125, 0.267822265625, 0.466552734375, 0.6171875, 0.420654296875, 0.11083984375, 0.0032367706298828125, -0.27880859375, 0.85888671875, 1.083984375, -0.161865234375, -1.228515625, 0.7431640625, 0.623046875, 0.8173828125, -0.11993408203125, -1.2138671875, -0.222900390625, 0.317138671875, 0.61083984375, 0.07940673828125, 0.424560546875, -0.6025390625, 0.43505859375, -0.55712890625, -0.57421875, -0.75244140625, -0.751953125, 0.55126953125, 0.12213134765625, 0.88134765625, 0.84716796875, 0.0582275390625, 0.68798828125, -0.28173828125, 0.058807373046875, -0.29541015625, 0.5986328125, 0.1395263671875, -0.84326171875, -0.0355224609375, -0.292724609375, 0.043365478515625, -0.0567626953125, 0.06268310546875, -0.6806640625, 0.09539794921875, -0.346923828125, -0.312744140625, 0.154296875, 0.390625, -0.8798828125, 0.348388671875, -0.85302734375, -1.005859375, -0.45703125, -0.69189453125, -0.499267578125, 0.367919921875, -0.424560546875, -0.2325439453125, -0.06591796875, 0.135498046875, -0.54248046875, 0.276123046875, 0.488037109375, -0.28759765625, 0.197998046875, 0.59326171875, 0.27197265625, -1.064453125, -0.362060546875, -1.5830078125, 0.01395416259765625, -0.045135498046875, 0.86328125, 0.2064208984375, -0.6025390625, 0.0810546875, 0.1951904296875, -0.08526611328125, -0.246337890625, 0.58349609375, -0.0888671875, 0.0033111572265625, 0.67822265625, -0.16748046875, -0.8388671875, 0.337158203125, -0.44921875, -0.1522216796875, -0.2181396484375, -0.1649169921875, 0.3466796875, -0.728515625, -0.413818359375, 0.90576171875, 0.148193359375, -0.12005615234375, -0.326416015625, -0.1414794921875, -0.1636962890625, 0.53076171875, -0.48583984375, 0.40869140625, 0.345703125, -0.347412109375, -0.14208984375, 0.059417724609375, -0.521484375, -1.3212890625, 0.41650390625, -0.7412109375, -0.0163421630859375, 0.87255859375, -0.369140625, -0.76318359375, -0.44384765625, 0.452392578125, 0.113525390625, -0.380615234375, -0.445068359375, -0.95166015625, -0.53125, 0.0297088623046875, -0.323974609375, -0.07659912109375, 0.06341552734375, -0.484619140625, -0.277099609375, 0.1104736328125, 0.04376220703125, -0.2149658203125, 0.69287109375, 0.65283203125, 0.035186767578125, -0.79443359375, -0.6513671875, -0.5400390625, 0.403564453125, -0.6728515625, 0.2474365234375, -0.381591796875, -0.82470703125, -0.1375732421875, -0.0625, -1.328125, -0.167724609375, 0.2005615234375, 0.2384033203125, -0.82568359375, 1.3447265625, -0.25439453125, -0.09796142578125, 0.0440673828125, -0.60888671875, -0.1181640625, 0.1370849609375, 0.8779296875, 0.1572265625, 0.71044921875, 0.69482421875, -0.149169921875, -0.75048828125, -0.4296875, -0.5888671875, 0.0229034423828125, 0.3720703125, 0.34228515625, 0.837890625, 0.0282745361328125, -0.7802734375, 0.1444091796875, -1.060546875, -0.57763671875, 0.14697265625, -0.607421875, 0.072998046875, -0.0216064453125, 0.81689453125, -0.27099609375, 0.55810546875, -0.374267578125, -0.13232421875, 0.427490234375, -0.412109375, 0.974609375, -0.444091796875, 0.732421875, -0.07012939453125, 0.3876953125, -0.261962890625, 1.1220703125, -0.419189453125, -0.31689453125, -0.040618896484375, -0.255859375, -0.55029296875, -0.8701171875, -0.5068359375, -0.88037109375, -0.923828125, -0.0576171875, 0.0164947509765625, -0.35400390625, -0.188720703125, 0.401611328125, -0.5185546875, -0.77685546875, 0.461669921875, 0.08526611328125, -0.5595703125, 0.48193359375, 0.308349609375, 0.22509765625, 0.15966796875, 1.009765625, 0.99609375, -0.3828125, 0.292236328125, -0.135986328125, 0.0655517578125, -0.51025390625, -0.2958984375, 0.486328125, -0.455322265625, 0.0036144256591796875, -0.034423828125, -0.50341796875, -0.04815673828125, -0.07196044921875, 0.2156982421875, 0.1826171875, 0.7666015625, 0.15576171875, 0.1744384765625, -0.703125, 0.15234375, 0.8984375, -0.0377197265625, 0.391845703125, 0.035125732421875, -0.57275390625, -0.06549072265625, 0.035980224609375, 0.0694580078125, -0.354736328125, 0.177978515625, -0.452880859375, -0.0245819091796875, 0.051025390625, 0.294677734375, 0.3681640625, 1.2841796875, 0.37158203125, -0.06817626953125, 0.367431640625, 0.2137451171875, -0.39599609375, 0.484619140625, -0.61328125, 0.5322265625, -0.358642578125, -0.57177734375, 0.09576416015625, -0.90771484375, 0.338623046875, -0.044158935546875, 0.259033203125, 0.384765625, -0.263671875, 0.1943359375, -0.54541015625, 0.931640625, -0.222900390625, 0.0253753662109375, -0.57373046875, 0.0318603515625, 0.309326171875, -0.42041015625, 0.156982421875, -0.130126953125, -0.14208984375, -0.25146484375, 0.69189453125, -0.63134765625, 0.6669921875, -0.5126953125, 0.6005859375, 0.355224609375, 0.13916015625, 0.0290985107421875, -0.05426025390625, 0.56201171875, 0.63671875, -0.2166748046875, -0.0811767578125, -0.2076416015625, -0.253173828125, -0.4912109375, -0.438232421875, 0.426025390625, 0.3310546875, -0.2314453125, -0.5625, 1.171875, -0.3095703125, -0.39892578125, -0.439697265625, -0.0049896240234375, 0.1812744140625, 0.1854248046875, 0.265625, -0.69677734375, -1.16015625, 0.3095703125, -0.08941650390625, 0.63916015625, -0.59228515625, -0.2386474609375, 0.18359375, 0.50390625, 0.328369140625, 0.01561737060546875, -0.3837890625, 0.378662109375, -0.111328125, -0.81787109375, -0.615234375, -0.057708740234375, -0.9599609375, -0.072021484375, 0.292236328125, -0.1904296875, -0.159423828125, -0.1856689453125, -0.91748046875, 0.2281494140625, 0.64990234375, -0.0501708984375, -0.04107666015625, -0.1395263671875, -0.98486328125, 0.393310546875, 0.382568359375, 0.650390625, 1.0400390625, 0.57958984375, 0.2005615234375, -0.24560546875, -0.72509765625, -0.61376953125, -0.0706787109375, 0.3623046875, -0.304443359375, 0.1656494140625, -0.346435546875, 0.484619140625, -0.391845703125, -0.481689453125, 0.58203125, -0.1558837890625, 0.4658203125, -0.27001953125, -1.1865234375, 0.0810546875, 0.876953125, 0.0684814453125, -0.31787109375, 1.4931640625, 0.34521484375, 0.1236572265625, -0.17138671875, -0.445068359375, -0.2841796875, -0.9501953125, 0.0169219970703125, -0.08514404296875, 0.529296875, -0.450439453125, -0.370849609375, 0.79248046875, 0.32568359375, -0.77587890625, 0.324462890625, 9.055137634277344E-4, -0.58642578125, 0.02398681640625, 0.50439453125, -0.9091796875, -0.07281494140625, 0.0714111328125, -0.0033893585205078125, -0.70458984375, -0.2236328125, 0.49560546875, -0.806640625, 0.03125, 0.2103271484375, 0.1845703125, 0.327880859375, -0.61083984375, -1.228515625, -1.166015625, -0.576171875, 0.389892578125, 0.10791015625, 0.70361328125, 0.096923828125, -0.5625, -0.92724609375, -0.138671875, -0.1370849609375, 0.1734619140625, 0.791015625, -0.3271484375, 0.55517578125, -0.6279296875, 0.38720703125, -0.459228515625, -0.303955078125, -0.032806396484375, 0.52978515625, -0.63232421875, 0.576171875, 0.5576171875, 0.280517578125, -1.466796875, -0.73779296875, 0.26513671875, 0.2391357421875, -0.3798828125, -0.3759765625, -0.6611328125, -0.494873046875, 0.1356201171875, -0.0214691162109375, 0.2191162109375, -0.6767578125, -0.46826171875, 0.65234375, -0.54052734375, 0.93359375, 0.7138671875, -0.287109375, 0.052642822265625, -0.00322723388671875, 0.01078033447265625, -0.285888671875, 0.3837890625, 0.304443359375, 0.7060546875, -0.42138671875, 0.42431640625, 2.2109375, 1.2705078125, -1.146484375, -0.215576171875, -0.55859375, 0.06903076171875, -0.0102386474609375, -1.2646484375, -0.130859375, 0.0086517333984375, 0.006351470947265625, -0.66259765625, 0.0400390625, 0.276611328125, 5.469322204589844E-4, 0.08740234375, -0.38330078125, -0.1827392578125, 0.94921875, -0.5908203125, 0.62451171875, 1.09765625, -0.379638671875, -0.476318359375, -0.422607421875, 0.5146484375, -0.179931640625, -0.310302734375, 0.21337890625, 0.34814453125, 0.0039215087890625, 0.211181640625, 0.5322265625, 0.092529296875, 0.5791015625, -0.29931640625, -0.51123046875, 0.1314697265625, 0.154296875, -0.2342529296875, 0.12298583984375, 0.30322265625, -0.265380859375, -1.0146484375, -0.32861328125, 0.01398468017578125, 1.0029296875, 0.395751953125, 0.40673828125, -0.184814453125, 0.177490234375, -0.7314453125, -0.30078125, -0.2244873046875, 0.13623046875, 0.7978515625, 0.146240234375, -0.826171875, 0.25244140625, -0.52978515625, 0.65185546875, -0.25048828125, 0.0377197265625, 0.365478515625, 0.544921875, 0.409423828125, -0.141357421875, 0.59130859375, 0.57421875, -0.56005859375, 1.1865234375, 0.71240234375, -0.1468505859375, -0.2279052734375, 0.367431640625, 0.26416015625, 0.80908203125, 0.69287109375, 1.0654296875, -0.07086181640625, -1.1025390625, -0.3583984375, -0.5537109375, 0.591796875, -0.09991455078125, 1.1572265625, -0.342041015625, 0.296142578125, 0.230224609375, -1.125, 0.305419921875, -0.1448974609375, 0.169189453125, -0.177490234375, 0.361572265625, -0.1947021484375, -0.223876953125, -0.43505859375, 0.2305908203125, 0.047332763671875, 0.461669921875, -0.310791015625, 0.034912109375, -0.238037109375, 0.5546875, -0.08502197265625, 0.0226898193359375, -0.55078125, -0.370849609375, -0.45654296875, -0.02679443359375, -1.0810546875, -0.1943359375, -0.428466796875, 1.337890625, -0.68115234375, -0.642578125, -0.3291015625, -0.99658203125, -0.1595458984375, -0.356201171875, -0.58154296875, 0.25048828125, -0.11871337890625, -0.22509765625, -0.7099609375, 0.2235107421875, 0.216552734375, -0.50341796875, -0.293212890625, 0.23779296875, -0.546875, -0.07373046875, -0.568359375, 0.70654296875, 0.242431640625, 0.93505859375, -1.302734375, 1.2490234375, -0.1961669921875, 0.1910400390625, -0.2335205078125, -0.2763671875, -0.30517578125, -0.378662109375, -0.64794921875, 0.57763671875, -0.00982666015625, -0.66845703125, 0.681640625, 0.452880859375, 0.2236328125, -0.26123046875, -0.0072021484375, 0.253662109375, -0.75390625, 0.70751953125, 0.92138671875, -0.60302734375, 0.09857177734375, -0.0166473388671875, -0.666015625, 0.3388671875, -0.544921875, -0.80078125, -0.64306640625, -0.70068359375, -0.25390625, 0.55322265625, -0.61767578125, -0.08984375, -0.3681640625, -0.34521484375, -0.66748046875, 0.1346435546875, 0.9306640625, 0.088623046875, 0.2066650390625, -0.10009765625, 0.7333984375, -0.046234130859375, 0.07440185546875, 0.27197265625, 1.1904296875, 0.108642578125, 0.32373046875, 0.439697265625, 0.1568603515625, -0.2880859375, 0.4580078125, 0.416259765625, 0.1756591796875, 0.03997802734375, 0.364990234375, -0.552734375, -0.65869140625, -16.78125, -0.146240234375, 0.166748046875, 0.1339111328125, -0.59619140625, -0.03106689453125, 0.71923828125, -0.2401123046875, -0.5703125, -0.0728759765625, -0.1485595703125, 0.6962890625, -0.62353515625, -1.0380859375, -0.00472259521484375, -0.1964111328125, -0.11431884765625, -1.142578125, -0.1304931640625, -0.88134765625, 0.362548828125, -0.381103515625, -0.2822265625, -0.0223236083984375, 0.295654296875, -0.429931640625, 0.345703125, -0.0296630859375, 0.66357421875, 0.3671875, -0.058807373046875, 0.471923828125, 0.1439208984375, 0.0716552734375, -0.391357421875, -0.7509765625, 0.381591796875, 0.376220703125, 0.1446533203125, -0.232666015625, -0.81884765625, 0.7255859375, -0.01451873779296875, -0.6025390625, -0.206298828125, 0.62353515625, -0.11553955078125, -0.50390625, -0.2406005859375, 0.431640625, 0.521484375, 1.6806640625, 0.60302734375, 0.2474365234375, -0.3212890625, -0.282470703125, 0.11651611328125, -0.1168212890625, 0.56787109375, 0.027374267578125, 0.1566162109375, 0.130859375, 0.3916015625, -0.25634765625, 0.62841796875, -1.279296875, -0.12347412109375, -0.09747314453125, 0.2183837890625, -0.6640625, 0.06768798828125, -0.409423828125, 0.11834716796875, -0.0156707763671875, 0.31787109375, 0.458740234375, 0.7578125, -0.347412109375, 0.16162109375, 0.37841796875, -0.363037109375, 0.0311431884765625)"


In [0]:
chunked_pages_pd = chunked_pages.toPandas()
chunked_pages_pd['embedding'] = chunked_pages_pd['embedding'].apply(lambda x: list(x))

## Postgres Connection
Now we are going to move our chunks into postgres. First we create our chunks table, next we read into Pandas and write chunks into Postgres. This gets around shuttling larger tables quite nicely, and we can bolster the connection and database for horizontal scalability (e.g. https://learn.microsoft.com/en-us/azure/databricks/oltp/query/notebook)

In [0]:
import psycopg2

from databricks.sdk import WorkspaceClient
import uuid

w = WorkspaceClient()

instance_name = "shm"

instance = w.database.get_database_instance(name=instance_name)
cred = w.database.generate_database_credential(request_id=str(uuid.uuid4()), instance_names=[instance_name])

def connect_to_pg():
  conn = psycopg2.connect(
      host=instance.read_write_dns,
      dbname="databricks_postgres",
      user=me.user_name,
      password=cred.token,
      sslmode="require"
  )
  return conn

In [0]:
table_ddl = """
CREATE TABLE IF NOT EXISTS parsed_pages (
    path TEXT,
    user_id TEXT,
    page_id TEXT,
    text TEXT,
    embedding VECTOR(1024)
);
"""

conn = connect_to_pg()
with conn.cursor() as cur:
    cur.execute(table_ddl)
    conn.commit()
conn.close()

Insert records into our PG table. We keep appending the table (could have individual user tables but this is unnecessary in my opinion).

In [0]:
conn = connect_to_pg()
with conn.cursor() as cur:
    records = chunked_pages_pd.to_records(index=False)
    data_tuples = list(records)
    insert_query = "INSERT INTO parsed_pages (path, user_id, page_id, text, embedding) VALUES (%s, %s, %s, %s, %s)"
    cur.executemany(insert_query, data_tuples)
    conn.commit()
conn.close()

Test the table insert went through by pulling the first row. We now have a postgres table with embeddings ready to go

In [0]:
conn = connect_to_pg()
with conn.cursor() as cur:
    cur.execute("SELECT * FROM parsed_pages")
    print(cur.fetchone())
    conn.commit()
conn.close()

('dbfs:/Volumes/shm/default/raw_pdfs/73eb2c4f_3c424fe06ddfe8ce_1755574069.pdf', '7873535765378608', '0', 'Content: [# Sample Company\n## Income Statement (Service)\nFor the Year Ended September 30, 2021\n\n| Category | Amount |\n| --- | --- |\n| Service revenue | $2,750 |\n| Operating Expenses: |  |\n| Depreciation expense | 100 |\n| Wages expenses | 1,200 |\n| Supplies expenses | 60 |\n| Total operating expenses | 1,360 |\n| Operating Income | 1,390 |\n| Other Item: |  |\n| Interest expense | 40 |\n| Pretax income | 1,350 |\n| Income tax expense | 405 |\n| Net income | $945 |\n\n# Sample Company\n## Statement of Retained Earnings\nFor the Year Ended September 30, 2021\n\n| Retained Earnings | Amount |\n| --- | --- |\n| Balance, October 1, 2020 | $820 |\n| Net income | 945 |\n| Dividends declared | (500) |\n| Balance, September 30, 2021 | $1,265 |]\nFooter: []\nHeader: []\nID: [0]\nPage Number: []', '[-0.49780273,-0.27978516,-0.52441406,-0.3840332,0.28735352,0.31469727,0.3779297,-0.440

## Test Vector Search
Here is the pattern for using vector search in our app (and via LangGraph too)

In [0]:
from databricks_langchain import DatabricksEmbeddings
emb = DatabricksEmbeddings(endpoint="databricks-gte-large-en")
vect = emb.embed_documents(["hello world"])[0]

In [0]:
conn = connect_to_pg()
with conn.cursor() as cur:
    search_query = f"""
        SELECT path, user_id, page_id, text, embedding,
                (embedding <=> ARRAY{str(vect)}::vector) AS distance
        FROM parsed_pages
        ORDER BY distance ASC
        LIMIT 1
    """
    cur.execute(search_query)
    results = cur.fetchall()

In [0]:
results

[('dbfs:/Volumes/shm/default/raw_pdfs/73eb2c4f_3c424fe06ddfe8ce_1755574069.pdf',
  '7873535765378608',
  '0',
  'Content: [# Sample Company\n## Income Statement (Service)\nFor the Year Ended September 30, 2021\n\n| Category | Amount |\n| --- | --- |\n| Service revenue | $2,750 |\n| Operating Expenses: |  |\n| Depreciation expense | 100 |\n| Wages expenses | 1,200 |\n| Supplies expenses | 60 |\n| Total operating expenses | 1,360 |\n| Operating Income | 1,390 |\n| Other Item: |  |\n| Interest expense | 40 |\n| Pretax income | 1,350 |\n| Income tax expense | 405 |\n| Net income | $945 |\n\n# Sample Company\n## Statement of Retained Earnings\nFor the Year Ended September 30, 2021\n\n| Retained Earnings | Amount |\n| --- | --- |\n| Balance, October 1, 2020 | $820 |\n| Net income | 945 |\n| Dividends declared | (500) |\n| Balance, September 30, 2021 | $1,265 |]\nFooter: []\nHeader: []\nID: [0]\nPage Number: []',
  '[-0.49780273,-0.27978516,-0.52441406,-0.3840332,0.28735352,0.31469727,0.37792