<div id="singlestore-header" style="display: flex; background-color: rgba(209, 153, 255, 0.25); padding: 5px;">
    <div id="icon-image" style="width: 90px; height: 90px;">
        <img width="100%" height="100%" src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/header-icons/vector-circle.png" />
    </div>
    <div id="text" style="padding: 5px; margin-left: 10px;">
        <div id="badge" style="display: inline-block; background-color: rgba(0, 0, 0, 0.15); border-radius: 4px; padding: 4px 8px; align-items: center; margin-top: 6px; margin-bottom: -2px; font-size: 80%">SingleStore Notebooks</div>
        <h1 style="font-weight: 500; margin: 8px 0 0 4px;">Launch Open-Source Apps with LangChain</h1>
    </div>
</div>

In [1]:
%%writefile requirements.txt
langchain==0.0.339
openai==1.3.3
pdf2image==1.17.0
pdfminer==20191125
pdfminer.six==20221105
pillow_heif==0.13.1
tabulate==0.9.0
tiktoken==0.5.1
unstructured==0.11.0
opencv-contrib-python-headless==4.8.1.78
unstructured.pytesseract==0.3.12
unstructured.inference==0.7.15

Writing requirements.txt


In [2]:
%conda install -y --quiet poppler tesseract

In [3]:
%pip install -r requirements.txt --quiet

In [4]:
import nltk
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')

[nltk_data] Downloading package punkt_tab to /home/jovyan/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.


True

In [5]:
from langchain.document_loaders import OnlinePDFLoader

loader = OnlinePDFLoader("http://leavcom.com/pdf/DBpdf.pdf")

data = loader.load()

[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [6]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

print (f"You have {len(data)} document(s) in your data")
print (f"There are {len(data[0].page_content)} characters in your document")

You have 1 document(s) in your data
There are 13040 characters in your document


In [7]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
texts = text_splitter.split_documents(data)

print (f"You have {len(texts)} pages")

You have 7 pages


In [8]:
%%sql
DROP DATABASE IF EXISTS pdf_db;
CREATE DATABASE IF NOT EXISTS pdf_db;

<div class="alert alert-block alert-warning">
    <b class="fa fa-solid fa-exclamation-circle"></b>
    <div>
        <p><b>Action Required</b></p>
        <p>Make sure to select the <tt>pdf_db</tt> database from the drop-down menu at the top of this notebook. It updates the <tt>connection_url</tt> to connect to that database.</p>
    </div>
</div>

In [9]:
%%sql
DROP TABLE IF EXISTS pdf_docs1;
CREATE TABLE IF NOT EXISTS pdf_docs1 (
    id INT PRIMARY KEY,
    content TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
    vector BLOB
);

In [10]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

OpenAI API Key: ········


In [11]:
import json
import sqlalchemy as sa
from langchain.embeddings import OpenAIEmbeddings
from singlestoredb import create_engine

conn = create_engine().connect()

embedder = OpenAIEmbeddings()

# Fetch all embeddings in one call
embeddings = embedder.embed_documents([doc.page_content for doc in texts])

# Build query parameters
params = []
for i, (text_content, embedding) in enumerate(zip(texts, embeddings)):
    params.append(dict(id=i+1, content=text_content, vector=json.dumps(embedding)))

stmt = sa.text("""
    INSERT INTO pdf_docs1 (
        id,
        content,
        vector
    )
    VALUES (
        :id,
        :content,
        JSON_ARRAY_PACK_F32(:vector)
    )
""")

conn.execute(stmt, params)

<sqlalchemy.engine.cursor.CursorResult at 0x7ec535dc5e80>

In [12]:
%%sql
SELECT JSON_ARRAY_UNPACK_F32(vector) as vector
FROM pdf_docs1
LIMIT 1;

vector
"[-0.00173446606, -0.0330541134, 0.0240638237, -0.0308199991, -0.0242253263, 0.00614717649, -0.00999967661, -0.00549780345, -0.0261364356, -0.015719546, 0.0131691564, 0.0244944971, 0.0209683664, 0.000776808651, -0.00229972368, 0.0339154564, 0.0147639904, 0.00613708282, 0.0324888565, -0.00849905144, 0.00365398685, 0.00537331216, -0.012738484, -0.0125904409, -0.0170519371, 0.0122068729, 0.0115339467, -0.00658457819, 0.0208472386, -0.00500320271, 0.00118014344, 0.00140305015, -0.00593520515, 0.00775210466, -0.015894508, 0.0137478728, -0.0162578877, -0.00697823986, 0.0151812052, 0.0200801045, 0.0148985758, 0.00630194973, -0.00863363687, -0.00962956715, 0.00466674007, 0.0340769589, -0.00117341417, -0.0150735369, -0.00915178936, -0.0135796415, -0.00701861549, 0.0642778724, -0.0348844677, -0.00188587443, 0.000198933674, 0.0034958492, -0.00262441044, 0.0129067153, -0.00304498896, 0.00136940391, 0.00981798675, 0.0219239201, -0.027509205, -0.00956900418, -0.0427173264, 0.0157733802, 0.00932675041, -1.87420337e-05, 0.00924599916, -0.0183708724, 0.0143602351, 0.0448437706, 0.010638956, 0.0165001396, -0.0121463099, -0.0309815016, 0.00334612327, 0.003339394, 0.00653074449, -0.0181824528, 0.0102082836, -0.0438209251, -0.0240503661, 0.00463309372, 0.0189092122, 0.0201339368, -0.027778374, 0.0115474053, 0.00497628562, 0.0144948205, 0.0164059307, 0.023552401, -9.88359679e-05, 0.0103697851, -0.0172134396, 0.0339154564, 0.00387941697, 0.0175902788, -0.0211298671, -0.016338639, -0.0133912219, 0.00359005877, -0.0139026456, -0.0116416151, -0.00431008916, 0.0282090474, 0.0280206278, -0.00851923972, 0.0135863703, -0.0186938774, -0.0160290916, 0.037737675, 0.00740891229, 0.00571650406, 0.0093536675, -0.0210356582, 0.00443794532, -0.018855378, -0.0049998383, -0.0179805756, 0.0225699283, 0.0125971697, 0.021304829, -0.0313045084, -0.00860671978, 0.00727432733, -0.0137209557, 0.00938058458, 0.0210625753, 0.00550453225, 0.00367753929, 0.0262306444, 0.00432354771, 0.0133306589, 0.00310555217, 0.0248982515, -0.0266209431, 0.013148969, -0.00649373326, -0.0628243536, 0.0204973184, 0.0060193208, -0.00142996723, 0.00770499976, 0.00554490788, 0.0122068729, 0.0186938774, 0.0183439553, 0.00565594062, -0.0137747899, 0.0204165671, -0.000500488561, -0.011937703, 0.0080549214, 0.0216951258, -0.00920562353, -0.0169846453, 0.0318428464, -0.00312573998, -0.0166750997, 0.00754349772, 0.0194879286, -0.0102755763, 0.0317351818, 0.0219777543, 0.0158675909, 0.0171057712, 0.0217085835, 0.00210962212, -0.0155984191, 0.00423270278, 0.0274419114, -0.0181555357, 0.00899028778, 0.0140641481, 0.0137074972, 0.00387941697, 0.0231890213, -0.0158002973, -0.0272669513, 0.0228794757, 0.000235944593, 0.00909795612, 0.028666636, -0.00489216996, -0.00151744753, 0.0121866846, 0.00949498173, -0.00806837995, 0.00376165495, 0.0239696149, 0.00641971175, 0.0114330081, 0.00401400216, -0.628781796, 0.00195821398, -0.0245214142, -0.0403486267, -0.00505703688, 0.000611100695, 0.0156522542, 0.0114464667, -0.00484842993, 0.0151273711, -0.0175364446, 0.000364431355, 0.0178863648, -0.0206722785, -0.00861344952, -0.0109013971, 0.0113724452, -0.0117694708, -0.0030096604, -0.0057770675, 0.00374483177, 0.0043201833, -0.00198176643, -0.00864036661, -0.015544585, 0.00614717649, -0.00216009165, 0.00761078997, 0.00497628562, 0.0163251795, 0.00700515695, 0.0178998243, 0.00213317457, 0.00159735745, 0.0466202945, -0.00185391039, -0.016338639, 0.0174556933, 0.0242118686, 0.0311430041, -0.00342687429, -0.00862690806, 0.00625820924, -0.00345210917, -0.0135729127, 0.0091652479, 0.0279667936, -0.0222334657, -0.00386595842, -0.0133643048, 0.0158406738, 0.0182901211, -0.0244810376, -0.0170653965, 0.0127115669, -0.000511844177, 0.0128999865, -0.0230275188, 0.0161367599, 0.0106120389, 0.00754349772, 0.0166078061, -0.0126981083, -0.0216278322, -0.0397833697, 0.015571502, -0.0225295536, 0.00904412195, 0.0306854136, -0.0120251831, -0.0171865225, 0.0309545845, 0.00473739719, -0.0137546025, 0.00736853713, -0.00532620726, 0.0236869864, 0.00426971354, -0.0233774409, 0.0248444173, 0.0160021745, -0.025503885, -0.0147370743, 0.00975069404, 0.0334040336, 0.00427980768, -0.0217220429, -0.0227448903, -0.000965648447, -0.0165001396, 0.00756368553, 0.014777449, -0.00475422014, -0.0014568842, 0.0234581903, 0.0359880663, -0.0307661649, -0.00238383934, 0.00962956715, -0.0218431689, -0.00612025941, -0.0186804179, 0.0109821483, 0.0280475449, -0.00946133584, 0.0380337611, 0.00660813088, 0.00362034049, 0.0282628816, -0.0173345655, 0.0023804747, -0.00821642391, 0.000260758708, 0.00592847588, -0.00498637976, -0.0193937197, 0.02446758, 0.00288516912, -0.00570977479, -0.0360688195, 0.0103226807, -0.0133912219, 0.0228256416, -0.0433902517, -0.0188688375, -0.00394670991, 0.0128461523, 0.00940750167, 0.0137276854, -0.00757714408, 0.00184213428, -0.0234312732, 0.0113993622, -0.0189899635, -0.000705730869, -0.000511844177, 0.0101477196, -0.0136132874, -0.0101208026, -0.0248578768, -0.00556173129, -0.021425955, 0.0110965455, -0.0135527244, -0.00434373552, -0.0304700769, -0.0327041931, 0.0166616403, -0.0189361293, -0.00278086565, -0.0053396658, -0.00255207089, -0.023875406, 0.0168904345, -0.00115659111, 0.00240907399, -0.0228121821, -0.0201877709, -0.0325696096, 7.74916043e-05, -0.00611689501, 0.0112176724, -0.01974364, 0.00225261878, -0.0146428645, -0.0532418862, -0.0141448993, 0.0190841742, -0.0213586632, -0.0298779029, -0.00270516146, -0.0136334756, 0.030200908, -0.0152754141, -0.0102823051, 0.00302984822, -0.0203223564, -0.000356019766, 0.0457051173, -0.0152215799, 0.0118233049, -0.010315951, 0.0107735405, -0.00714647118, 0.00876149256, -0.0191783831, 0.0162713453, 0.0101611782, -0.00861344952, 0.00831063278, 0.0209683664, 0.0472663045, -0.0143198594, -0.00328555983, -0.0118367635, -0.00522526819, -0.0142794847, 0.000162763914, 0.00510750618, 0.0285589695, 0.0374685042, 0.00892972387, 0.00369772688, 0.0105043706, 0.0175902788, 0.00139632088, -0.0105851218, 0.00478786649, 0.00651055668, 0.0167962257, 0.0201339368, 0.00288685132, 0.00971704721, -0.0183035787, 0.00619428139, 0.0464049578, -0.00705899112, 0.0122472486, -0.0220450461, -0.00942096021, -0.00598567445, 0.0105851218, 0.0404024608, 0.00179334707, 0.0206722785, -0.0142391091, -0.00210962212, 0.0161502194, 0.0135123488, -0.0227179732, -0.0102957636, 0.00100686518, 0.0171057712, 0.00195821398, 0.0195283052, 0.0178459901, 0.0359611511, -0.0402409583, 0.0288819727, 0.00109855121, -0.0070859082, 0.033188697, 0.0135931, -0.0121261217, -0.00349248457, -9.5155905e-05, 0.0241849516, 0.00860671978, -0.0300663225, 0.0125971697, -0.00955554564, 0.00478113722, -0.0125567941, 0.0162040535, 0.0146159474, -0.00886243209, 0.014185275, 0.00561556546, 0.0353689753, 0.00462972885, 0.00127519423, -0.00332593545, 0.0135392658, -0.00904412195, -0.00201541279, 0.0158675909, 0.00057619263, 0.0155984191, 0.0105851218, 0.00569631625, -0.00820969418, -0.033807788, -0.00763770705, -0.00164530345, -0.0127317552, 0.0145621132, -0.0149120344, -0.0139564797, 0.0176575705, 0.0289088897, -0.00665187091, -0.0316005945, -0.0116079692, 0.00523199746, -0.00914506055, -0.0168500599, -0.0186804179, 0.0157330055, -0.0126308165, -0.0058241724, 0.00262945727, 0.0195552222, 0.00599913299, 0.0193668026, 0.0146024888, 0.00989873707, 0.0369570814, -0.0268228203, -0.0254769679, -0.00172605459, 0.0209279899, 1.67179987e-05, -0.0115474053, -0.0219373796, 0.0500925928, -0.0310084186, 0.00346893212, -0.020658819, -0.0130009251, -0.0124760428, -0.0240772832, -0.0143871522, -0.0191245489, 0.0274553709, -0.000371581176, -0.017199982, 0.00835100841, -0.00312742242, 0.0276168734, 0.0106254974, 0.00768481195, -0.0265401918, -0.0130884061, 0.00682683149, 0.0675617382, -0.0265805665, -0.0164328478, 0.0341846272, -0.00703880331, -0.0120992046, -0.0124491258, -0.0156657118, 0.00364725757, 0.0209010728, -0.013889187, -0.00926618744, 0.0336193703, -0.00136435695, 0.0121530388, 0.00485852361, 0.00560547132, -0.0357188955, 0.00491235778, -0.0189092122, -0.0316813476, -0.0144948205, 0.00205578818, 0.0318966806, 0.0385721028, 0.00138706819, 0.0357458144, 0.0453821123, 0.0154234581, -0.0185996667, -0.00600249739, 0.0135998288, -0.00726086879, 0.00855288561, 0.00436055847, 0.0098449029, 0.00989873707, -0.00987181999, -0.00618755212, 0.00826352835, 0.0201070216, 0.0181017015, 0.0173883997, -0.0126038995, -0.00765789486, -0.0125904409, -0.0112580471, 0.0253692996, -0.0174287762, -0.0228794757, -0.0116416151, -0.00138202123, -0.0172134396, 0.0121261217, -0.0219239201, 0.0132700959, -0.022018129, -0.0120184533, -0.011379174, -0.0381145142, -0.00518152816, -0.011318611, 0.0547761545, -0.00988527853, 0.0201743133, 0.000517311622, -0.0228794757, 0.00307022361, -0.031762097, -0.00556846056, 0.0188015439, -0.0242118686, -0.0209010728, -0.00589819392, 0.0248713344, 0.00380539498, 0.0112176724, -0.00894318242, 0.00866728276, -0.000517311622, -0.0118838688, -0.0144275278, -0.0140910652, -0.0388951078, -0.0262306444, 0.014333318, -0.0127182966, -0.00334612327, -0.0109821483, 0.00182699342, 0.00239393325, -0.0357727297, 0.0286128018, -0.00583426608, 0.0265536495, 0.00408465974, 0.0213990379, 0.0216009151, 0.00422933791, -0.0299317371, -0.0259749331, 0.0249655452, -0.0150466198, -0.0412368886, 0.01087448, 0.0207261126, -0.0220046714, 0.0137613313, -0.00399717921, 0.00400727289, -0.00366744539, -0.0280475449, 0.0166481826, -0.015571502, 0.0057770675, -0.00538340583, 0.00373473787, 0.0351536386, 0.00400390849, 0.00333771156, -0.0136805801, -0.0153023312, 0.0250059199, -0.0111840256, -0.0252212565, 0.0163251795, -0.00385922915, -0.0237677377, -0.0219912138, -0.016338639, 0.00876149256, 0.0199051425, 0.00273207854, -0.0143467765, 0.00448168535, -0.00225598342, 0.00718684681, 0.0024763667, -0.0216816664, -0.00992565416, 0.00201204815, 0.00789341982, 0.0175633617, -0.0206991956, 0.0137074972, -0.022785265, -0.00787323155, -0.00130968168, 0.0316275135, 0.0129269036, -0.0293933973, -0.0145351961, 0.00451196684, -0.000599745079, 0.0156926289, -0.0423674025, -0.00285320519, -0.0239696149, 0.0367417447, 0.00724068098, 0.0197167229, -0.0109686898, 0.0271323659, 0.00270516146, -0.00038062362, -0.0140103139, -0.0043706526, -0.023996532, -0.0191110913, -0.0106591433, -0.01702502, 0.0120453704, -0.0217355005, 0.0134315975, 0.00336462865, 0.00347902603, -0.0151273711, -0.00965648424, -0.0318428464, -0.0248175003, -0.00959592126, -0.00482487725, -0.00919889472, -0.00514788181, -0.0115339467, 0.00476767868, 0.0306584965, 0.00858653244, 0.00592847588, -0.0325965248, 0.0136267459, 0.0206857361, 0.0125971697, -0.0169711858, -0.000657364319, -0.00696478132, 0.0140641481, -0.0260153096, -0.0247771256, 0.0068436549, 0.00765116559, 0.0298509859, 0.0293395631, -0.00335621717, 0.00866728276, 0.0178998243, -0.0115474053, -0.0171730649, -0.0127721308, -0.0176306535, 0.0124087501, -0.0112782354, -0.0310622528, -0.023875406, -0.0149254929, 0.0328387767, 0.00482824212, 0.00772518758, -0.0186938774, -0.0292318948, 0.0488274917, -0.0305777453, 0.0384375155, -0.0102486592, 0.0285858847, 0.00845194701, 0.00269674975, -0.0236869864, -0.0126644624, 0.0141045237, -0.0166616403, 0.0394603647, 0.0274149943, -0.00827698596, -0.00221224339, 0.00999294687, -0.0212375354, -0.0212375354, -0.0376569219, 0.00238383934, 0.00633896049, -0.00486188848, -0.030227825, 0.00188419211, -0.00606979011, 0.0339423716, -0.0150735369, -0.00194643775, -0.00789341982, -0.0275899563, -0.0206722785, 0.0356112272, -0.0190976318, 0.0110763572, -0.0156522542, -0.0229871422, -0.0152350385, -0.0254635103, 0.0243464541, 0.00220214948, 0.0114195496, 0.00969685987, 0.00971704721, 0.00524882087, 0.0219239201, 0.00711282529, -0.00656102598, -0.00711955456, -0.0143871522, 0.0238081124, 0.00276404247, -0.01702502, 0.0278052911, 0.0125096897, -0.00849232264, -0.00765789486, -0.0123414584, -0.0190841742, -0.000753256259, 0.00681000855, 0.00207261113, 0.00293732085, 0.02310827, 0.0247502085, -0.0288550556, -0.0333501995, -0.0119040562, -0.000933684467, 0.00768481195, -0.00919216499, -0.0190572571, 0.00170334324, -0.010019864, 0.00707244966, -0.0249386281, -0.0234447327, 0.0120386416, 0.00138454465, -0.0220450461, -0.00460954104, 0.0243733693, 0.00127603544, -0.0326772779, 0.0146966986, -0.00347566139, -0.0103832437, 0.025948016, -0.0181689952, 0.0112782354, -0.0171730649, 0.00276572467, -0.0117963878, 0.0111167328, 0.00884224381, 0.0245214142, 0.0152215799, 0.0284513012, -0.0234581903, -0.0279937107, 0.0305508282, 0.000773023465, -0.0107668117, -0.0150062442, -0.00632550195, 0.0161098428, 0.0057265982, -0.00913833082, -0.00188250979, -0.0417483114, -0.00375829032, -0.0338616222, 0.0434171669, -0.0243868288, 0.00990546681, -0.0192187577, 0.0111369211, -0.0105111003, 0.0289627239, 0.000464739336, 0.0352613069, 0.0133306589, -0.000909290917, 0.0151139125, 0.00710609602, 0.000384829415, -0.0234581903, 0.000265805662, -0.0115743224, -0.0451936908, 0.00250496599, 0.00126425922, 0.0461088717, 0.0157868396, -0.0392988622, -0.0171057712, -0.00319303269, -0.0481007323, 0.00573332747, 0.0113253398, 0.0250462964, 0.0337808691, 0.00742237084, 0.0117290951, 0.0181420781, -0.000552640238, -0.00100770639, -0.0538609773, -0.00726759806, -0.0119107859, -0.00788669009, 0.0322735198, -0.0131893447, -0.0345076323, -0.00496282708, -0.00417886861, -0.0037852074, 0.014212192, 0.0210222006, 0.0161771365, -0.00550116785, -0.00730124442, -0.0130076548, -0.0112244012, 0.0300932396, 0.00995257124, -0.0445207655, 0.0135258073, 0.0254500508, 0.0210222006, 0.0210760329, -0.00199522497, -0.00640288834, 0.0138218943, 0.0122203315, 0.00612025941, 0.0078530442, -0.00072928326, 0.0227045137, 0.0117762005, 0.00836446695, 0.000479459588, 0.0496619195, -0.0120924758, -0.0201608539, 0.0190034229, -0.0104168905, -0.0245079547, 0.00122388371, -0.0236331522, -0.0150466198, 0.0169981029, -0.0396487825, -0.0104707247, -0.00151155947, 0.0101275323, 0.00185727503, -0.00833082013, -0.0207126532, 0.0010682696, 0.0119242445, -0.010551475, 0.0132768247, 0.000332887954, 0.00406783633, -0.0134921614, -0.00478786649, 0.00100265944, -0.00137781538, -0.0198782254, 0.00817604829, -0.00510077691, -0.0237946548, -0.00398708507, -0.0196090564, -0.010638956, 0.000531190715, 0.202093065, 0.0173749421, -0.0130884061, 0.0306046624, 0.0128326938, -0.00825679954, -1.21770645e-05, -0.0212106183, 0.0079539828, 0.0161367599, -0.0116483448, 0.0149793271, -0.0184112471, -0.00179502938, 0.036230322, -0.0106658731, -0.0298509859, -0.0178325307, -0.0139699383, 0.0274957456, -0.00466674007, -0.0221392568, -0.0247367509, -0.0188150033, 0.0376838408, -0.0156522542, -0.0159483403, 0.0106053092, 0.0218162518, -0.00343528599, 0.00741564156, -0.0209683664, 0.000463057018, -0.0102015538, 0.0104976417, -0.0126442742, 0.0170115624, -0.0217893347, 0.00339322817, 0.00386259379, 0.023229396, 0.00798762869, 0.0256250128, 0.00527237309, 0.0292588118, 0.022785265, -0.00373473787, 0.0142391091, 0.00560883619, -0.0166616403, -0.025234716, 0.0140776066, 0.0234985668, 0.016042551, 0.00380875962, 0.0251674224, 0.0166078061, 0.0134787029, -0.0174287762, -0.00446822681, -0.0145217376, 0.0156791713, -0.00493254559, -0.0128865279, -0.0100736981, 0.00420578569, -0.0165674314, 0.032946445, -0.00515124621, -0.0264459811, 0.0141718164, -0.0177786984, -0.0186938774, -0.0202416051, -0.0197167229, -0.0105447462, 0.0124020213, 0.0225026365, -0.0139833968, 0.00677299779, -0.00139800319, -0.0153830824, -0.0258941818, -0.00390633428, -0.0108946674, -0.0124827726, 0.00648700399, -0.0276437886, 0.00936712604, -0.00536994729, -0.00682683149, -0.0183170382, -0.0101208026, -0.0136402044, 0.0029995665, -0.00580061972, -0.00611689501, 0.0173076503, -0.0277245399, -0.0185323749, -0.0143198594, 0.0754215121, 0.0434710011, 0.00164530345, 0.0127923181, -0.00186568662, -0.00652065035, 0.0118704103, 0.0257730559, 0.00825679954, 0.0177652389, -0.031439092, -0.000967330765, -0.0118165761, 0.00289526302, 0.00729451515, 0.00344033283, -0.00455234246, 0.00816931948, -0.0193129685, -0.000862186134, -0.00755695626, 0.00301302504, 0.00454224879, -0.031439092, 0.0115003008, 0.00901047513, 0.0101140738, -0.0245214142, -0.0175229851, 0.011998266, -0.0452206098, -0.000312069315, -0.000691431167, 0.0189226717, -0.00107415766, 0.0302816592, -0.0032418198, -0.00140220905, 0.0075704148, 0.000611521245, 0.0234581903, 0.0166885573, 0.0231890213, 0.0146966986, 0.000815922453, -0.00641634688, 0.00443121605, 0.00187241589, -0.0152619556, -0.010490912, -0.00355304801, 0.00105481106, 0.0180209503, -0.00102368835, -0.0178998243, -0.0168231428, -0.0205646101, 0.00516806962, 0.0143602351, -0.0515461117, 0.014185275, 0.0430941619, -0.00728105661, -0.0084855929, -0.0146966986, -0.17011562, 0.0367148258, 0.0039298865, -0.0325426906, 0.0289358068, -0.0169577282, 0.00763770705, 0.00676290365, 0.00742910011, -0.00494263927, 0.00201541279, -0.010638956, -0.0461357869, -0.00852596853, 0.0281013791, 0.00950171147, 0.0133844931, 0.00117930234, 0.0357727297, 0.0112849642, 0.0162309706, -0.00328051299, 0.0150331613, -0.0235389415, 0.010194825, 0.0179805756, -0.00769154122, 0.0323004387, -0.0190572571, -0.0111772968, -0.00263282191, -0.00219878484, 0.0251808818, 0.00438074628, 0.00394334504, 0.0104370778, -0.0131355105, -8.39054264e-05, -0.00793379545, 0.0166750997, 0.000818445929, 0.0253154654, 4.83402509e-05, 0.0294203144, 0.0119915362, 0.0251539648, 0.0104505364, -0.0285320524, 0.0104572661, -0.0080549214, 0.00959592126, -0.0308469161, 0.0210894924, -0.00179334707, -0.00155193498, 0.0108004576, 0.00978433993, 0.0334040336, -0.00870092958, -0.0134921614, -0.000246459065, -0.0296894833, 0.00286834594, -0.00270179682, -0.0140910652, -0.0327580273, 0.0125433356, 0.0131085934, -0.0415598936, 0.0112176724, -0.00773864612, 0.00736853713, 0.0124020213, 0.00273376075, -0.0197705571, 0.000547172735, 0.0160829257, 0.00647691032, 0.0261633527, 0.00878168084, -0.00369436224, -0.00731470296, -0.000515208812, 0.0213855803, -0.00907103904, -0.000517732231, 0.0204704013, 0.0232563131, 0.0214124974, 0.00104051142, 0.0306315795, -0.0502540953, -0.00894318242, -0.00178830011, -0.00287843985, 0.00919889472, 0.00990546681, 0.00547088636, 0.017199982, -0.00253356528, 0.0224622618, 0.0124222087, 0.00263786875, -0.0075704148, 0.0271996576, -0.00212139823, -0.00859999098, -0.0224353448, 0.0488544069, 0.00693113497, -0.00672589289, 0.0371724181, 0.0162040535, 0.0113859037, -0.0163251795, 0.0219508372, 0.00685038418, -0.0122136017, 0.00263282191, -0.0132902833, 0.025059754, 0.0021769146, -0.0192725919, -0.0130816763, -0.00677972706, -0.0449245237, -0.111544169, -0.0226910561, 0.0292318948, 0.0240369067, -0.011554135, 0.019945519, -0.00520844525, 0.0405639634, -0.0395411141, 0.0221796315, -0.00949498173, -0.00740218302, -0.0045153317, -0.00744255865, 0.0383836851, 0.012381834, -0.000898355851, -0.0189765058, -0.00689748907, 0.0357727297, 0.00600586226, 0.00214999774, 0.00394670991, -0.0114599252, -0.0194340944, -0.0304431599, -0.0334040336, 0.0110494401, -0.00280105346, -0.00460281176, -0.00171427836, -0.0154369166, 0.0149658686, -0.018559292, -0.0202416051, -0.0112042138, -0.0375223383, 0.0159617998, 0.0186669603, -0.0352882259, 0.0232563131, 0.00995257124, 0.00585445389, -0.0128192352, -0.0177786984, -0.0210087411, -0.00357660023, -0.00798762869, 0.0318697654, -0.00878840964, -0.0230140593, -0.0343192108, -0.00593184028, -0.00372464396, 0.00744255865, 0.001621751, 0.0141718164, -0.00262945727, 0.000337934878, 0.0126038995, -0.0133441174, 0.00424279645, -0.00851923972, 0.0242118686, -0.0144006107, -0.00368763297, -0.00988527853, -0.0157599226, -0.0167423915, -0.0202550646, -0.0186804179, 0.00498974416, -0.0336462855, -0.0118098464, 0.00275563076, 0.0300124884, -0.0283167157, -0.0299317371, -0.00894318242, -0.0154099995, -0.0257595964, -0.026863195, -0.0224353448, -0.0276707057, 0.0380068459, 0.0280206278, -0.0040274607, -0.0027690893, 0.021870086, -0.0233236067, -0.0224891771, 0.00756368553, 0.0148178246, -0.015248497, 0.0151677467, 0.0157464631, -0.00683356076, -0.00242253253, 0.0164732225, -0.0139161041, -0.0256923046, 0.0191380084, -0.0420982353, 0.0212644525, -0.000912655552, 0.00455234246, -0.00173614838, -0.00767135341, -0.0135258073, -0.0371724181, 0.0184516236, 0.0153023312, -0.0332425311, -0.00806165114, -0.0176171958, -0.00197167252, -0.0432825834, -0.0237408206, 0.0213317461, 0.0132162618, -0.00201373035, 0.00975069404, 0.00376838422, 0.0220046714, 0.00901047513, 0.019622514, -0.0111974841, -0.000432775356, -0.0200128108, 0.00216513849, -0.0101611782, -0.00734162005, 0.0134383272, -0.0269304886, -0.0167020168, 0.00693786424, -0.0355304778, -0.0262979381, -0.00748293428, 0.0223142169, 0.0304700769, 0.0443323478, -0.034669131, -0.0514384434, 0.00779248029, -0.0101342611, -0.0189899635, -0.0197705571, -0.000243725299, -0.00912487321, -0.00675280998, -0.0381145142, 0.0171596054, 0.0236466099, -0.00616736431, -0.0153427068, -0.0150869954, -0.00468019862, 0.00611353014, 0.0122203315, 0.0138353528, -0.00128024118, 0.0391911939, 0.0229063928, -0.0108610215, -0.00489889923, -0.00614381209, -0.0169981029, -0.0166750997, -0.00200531888, 0.0141448993, 0.0118569518, -0.00815586094, 0.00281114713, 0.0118636806, 0.00469029229, 0.0172538161, 0.000708674896, 0.0112916939, -0.00161754526, -0.0124962311, 0.0182228293, -0.00419232715, -0.0122607071, -0.00917197764, 0.0240099896, 0.014656323, 0.00722722244, -0.00997948833, 0.0168366022, 0.0131018646, 0.00682346709, -0.00791360717, 0.0195013881, -0.0297702346, -0.0030971407, -0.0281821303, 0.00457925955, -0.0188150033, 0.00869419985, -0.0110965455, 0.0207395703, 0.00952189881, -0.00938731339, -0.00374819641, -0.0238888636, -0.0101409908, 0.0306584965, -0.0367955789, -0.00740218302, -0.02174896, 0.0182093699, 0.000608577218, 0.00955554564, 0.0137680611, 0.00830390304, -0.0101140738, -0.0193398856, -0.0207126532, -0.0432287492, -0.00511423545, 0.0176844876, 0.0111974841, 0.0310353357, 0.00504021347, -0.0317082629, 0.0296894833, 0.0132364491, 0.0168096852, -0.0201339368, 0.000840316003, -0.0402409583, 0.0129874665, -0.0139161041, 0.0127115669, -0.0188150033, -0.013653663, -0.00624138629, -0.0194475539, 0.0246290825, 0.0254500508, 0.0397564508, 0.0159617998, -0.0162175111, 0.0143871522, -0.0191110913, 0.00523199746, -0.00233841687, 0.000240781243, -0.00305508287, -0.0382221825, 0.032327354, -0.0152888726, 0.0113657154, -0.00199186034, 0.0175902788, 0.00280778273, -0.00181858183, 0.0181555357, 0.00880859792, 0.00932002161, 0.0153023312, -0.0172672737, 0.000460112991, -0.00311228144, 0.000794052379, -0.0162848048, 0.0197705571, 0.00794725399, -0.00374819641, -0.0422866531, 0.0199858937, -0.00169072591, -0.0111907553, -0.000301975408, 4.12955596e-05, -0.014333318, 0.0030584475, -0.0325965248, 0.00726759806, -0.00479123136, 0.00673262216, 0.00769154122, -0.015396541, -0.0229736846, 0.00661822455, -0.00881532673, -0.00778575102, -0.00820296537, -0.0204300247]"


In [13]:
query_text = "Will object-oriented databases be commercially successful?"

query_embedding = embedder.embed_documents([query_text])[0]

stmt = sa.text("""
    SELECT
        content,
        DOT_PRODUCT_F32(JSON_ARRAY_PACK_F32(:embedding), vector) AS score
    FROM pdf_docs1
    ORDER BY score DESC
    LIMIT 1
""")

results = conn.execute(stmt, dict(embedding=json.dumps(query_embedding)))

for row in results:
    print(row[0])

page_content='are gaining in popularity and are ex- pected to outsell even relational data- bases by 2003. And OO databases (see the “OO Database Orientation” sidebar) are still minor players with solid but strictly niche markets. Sales of relational databases have grown considerably faster than the sales of OO databases, and annual worldwide RDBMS revenues are now about 50 times larger.\n\nRick Cattell, distinguished engineer at Sun Microsystems, indicated, “Object- oriented databases are doing just ﬁne, and the news of their demise is highly exag- gerated. While their market [share] isn’t as big, they continue to be used in areas like CAD (computer-aided design) and telecommunications, where RDBMSs are not well suited.”\n\nHowever, said Michael Stonebraker, chief technology ofﬁcer at Informix and an ORDBMS proponent and pioneer, “ODBMSs occupy a small niche market that has no broad appeal. The technology is in semi-rigor mortis, and ORDBMSs will corner the market within ﬁve years.”\n

In [14]:
import openai

client = openai.OpenAI()

prompt = f"The user asked: {query_text}. The most similar text from the document is: {row[0]}"

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
)

print(response.choices[0].message.content)

Based on the information provided in the document, it seems that object-oriented databases are not expected to be commercially as successful as relational databases. While they have niche markets such as in CAD and telecommunications, they are still considered minor players in the overall database market. Sales of relational databases have been growing considerably faster than sales of object-oriented databases, with a significant difference in revenues.

The document also mentions differing opinions from industry experts, with some like Rick Cattell stating that object-oriented databases are still relevant and utilized in specific areas, while others like Michael Stonebraker believe that object-relational databases will dominate the market in the coming years.

In terms of market forecasts, IDC data suggests that relational databases will continue to outperform object-oriented databases in terms of sales revenue, with the latter having slower growth rates. Object-oriented databases ar

## Clean up

In [15]:
%%sql
DROP DATABASE IF EXISTS pdf_db

<div id="singlestore-footer" style="background-color: rgba(194, 193, 199, 0.25); height:2px; margin-bottom:10px"></div>
<div><img src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/singlestore-logo-grey.png" style="padding: 0px; margin: 0px; height: 24px"/></div>