<img src = "https://github.com/VeryFatBoy/notebooks/blob/main/common/images/img_github_singlestore-jupyter_featured_2.png?raw=true">

<div id="singlestore-header" style="display: flex; background-color: rgba(235, 249, 245, 0.25); padding: 5px;">
    <div id="icon-image" style="width: 90px; height: 90px;">
        <img width="100%" height="100%" src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/header-icons/browser.png" />
    </div>
    <div id="text" style="padding: 5px; margin-left: 10px;">
        <div id="badge" style="display: inline-block; background-color: rgba(0, 0, 0, 0.15); border-radius: 4px; padding: 4px 8px; align-items: center; margin-top: 6px; margin-bottom: -2px; font-size: 80%">SingleStore Notebooks</div>
        <h1 style="font-weight: 500; margin: 8px 0 0 4px;">SingleStoreDB integration with LangChain</h1>
    </div>
</div>

In [3]:
!pip cache purge --quiet


In [4]:
!pip install langchain --quiet
!pip install langchain-community --quiet
!pip install langchain-openai --quiet
!pip install nltk --quiet
!pip install openai --quiet
!pip install pdf2image --quiet
!pip install pdfminer.six --quiet
!pip install unstructured==0.10.14 --quiet

In [7]:
import getpass
import nltk
import os
import shutil

from langchain_community.document_loaders import OnlinePDFLoader
from langchain_community.vectorstores import SingleStoreDB
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from openai import OpenAI
from singlestoredb.management import get_secret

<div class="alert alert-block alert-warning">
    <b class="fa fa-solid fa-exclamation-circle"></b>
    <div>
        <p><b>Action Required</b></p>
        <p>Select the workspace from the drop-down menu at the top of this notebook.</p>
    </div>
</div>

In [9]:
%%sql
CREATE DATABASE IF NOT EXISTS pdf_db;

USE pdf_db;

DROP TABLE IF EXISTS pdf_docs;

<div class="alert alert-block alert-warning">
    <b class="fa fa-solid fa-exclamation-circle"></b>
    <div>
        <p><b>Action Required</b></p>
        <p>Select the <b>pdf_db</b> database from the drop-down menu at the top of this notebook.</p>
    </div>
</div>

In [11]:
nltk.download("punkt_tab")
nltk.download("averaged_perceptron_tagger_eng")

[nltk_data] Downloading package punkt_tab to /home/jovyan/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.


True

In [12]:
loader = OnlinePDFLoader("http://leavcom.com/pdf/DBpdf.pdf")

data = loader.load()

[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [13]:
print(f"You have {len(data)} document(s) in your data")
print(f"There are {len(data[0].page_content)} characters in your document")

You have 1 document(s) in your data
There are 13040 characters in your document


In [14]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 2000,
    chunk_overlap = 20
)
texts = text_splitter.split_documents(data)

print(f"You have {len(texts)} chunks")

You have 8 chunks
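
To make the `chunk_size` and `chunk_overlap` parameters more concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification, not the LangChain implementation: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and sentences, so its chunks are often shorter than `chunk_size` and the chunk count can differ. The helper name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    Hypothetical helper for illustration only.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Toy input: windows of 4 characters that share 1 character with their neighbour.
print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
```

With the toy input, each chunk begins with the last character of the previous chunk, which is the role the 20-character overlap plays in the cell above.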


In [15]:
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

OpenAI API Key: ········


In [17]:
embedding = OpenAIEmbeddings(model = "text-embedding-3-small")

docsearch = SingleStoreDB.from_documents(
    texts,
    embedding,
    table_name = "pdf_docs",
    distance_strategy = "DOT_PRODUCT",
)
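
The `DOT_PRODUCT` distance strategy ranks stored chunks by the dot product between each chunk's embedding and the query embedding. The sketch below shows that ranking in plain Python with made-up two-dimensional vectors. It assumes unit-length vectors, in which case the dot product equals cosine similarity. The function names and toy vectors are hypothetical, and the real search runs inside SingleStoreDB rather than in Python.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def rank_by_dot_product(query: list[float], docs: dict[str, list[float]], k: int = 4):
    """Return the k (score, name) pairs with the highest dot product."""
    scored = [(dot(query, vec), name) for name, vec in docs.items()]
    scored.sort(reverse=True)  # highest score first
    return scored[:k]


# Toy "embeddings"; real embeddings have many more dimensions.
toy_docs = {
    "chunk_a": normalize([1.0, 0.1]),
    "chunk_b": normalize([0.0, 1.0]),
    "chunk_c": normalize([1.0, 1.0]),
}
toy_query = normalize([1.0, 0.0])

for score, name in rank_by_dot_product(toy_query, toy_docs):
    print(f"{name}: {score:.3f}")
```

The chunk whose vector points in nearly the same direction as the query comes first, which is the behaviour a similarity search over the `pdf_docs` table relies on.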

In [18]:
query_text = "Will object-oriented databases be commercially successful?"

docs = docsearch.similarity_search(query_text)

print(docs[0].page_content)

I N D U S T R Y T R E N D S

Whatever Happened to Object-Oriented Databases?

Neal Leavitt

are gaining in popularity and are ex- pected to outsell even relational data- bases by 2003. And OO databases (see the “OO Database Orientation” sidebar) are still minor players with solid but strictly niche markets. Sales of relational databases have grown considerably faster than the sales of OO databases, and annual worldwide RDBMS revenues are now about 50 times larger.

Rick Cattell, distinguished engineer at Sun Microsystems, indicated, “Object- oriented databases are doing just ﬁne, and the news of their demise is highly exag- gerated. While their market [share] isn’t as big, they continue to be used in areas like CAD (computer-aided design) and telecommunications, where RDBMSs are not well suited.”

A couple of years ago, industry

observers touted object-oriented databases as a technology on the rise, well suited for the emerg- ing Internet age.

However, said Michael Stonebraker, chief

In [19]:
client = OpenAI()

prompt = f"The user asked: {query_text}. The most similar text from the document is: {docs[0].page_content}"

response = client.chat.completions.create(
    model = "gpt-4o-mini",
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
)

print(response.choices[0].message.content)

The text discusses the state of object-oriented databases (OODBMS) and their commercial success compared to relational databases (RDBMS). While there were predictions that OODBMS would gain popularity and potentially outsell RDBMS by 2003, the actual market trends showed otherwise. OODBMS have been recognized as having a solid, but niche market, especially in specific fields like CAD and telecommunications. Despite their perceived limitations in broader applicability, some industry experts argue that OODBMS are “doing just fine,” although their market share remains significantly smaller than that of RDBMS, which have continued to grow rapidly in revenue. Ultimately, the predictions of OODBMS becoming the primary database technology have not materialized. 

In conclusion, while object-oriented databases have their place and utility, especially in niche markets, they have not achieved widespread commercial success when compared to relational databases.
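
The prompt above passes only the single best match, `docs[0]`, to the model. One possible variation, sketched here on plain strings, is to join several retrieved chunks into the prompt while capping the total context length. The helper `build_prompt`, the separator, and the `max_chars` limit are assumptions for illustration and are not part of the original notebook.

```python
def build_prompt(query_text: str, passages: list[str], max_chars: int = 6000) -> str:
    """Join passages, in ranked order, until max_chars would be exceeded.

    Hypothetical helper; the separator and the character budget are
    arbitrary illustrative choices.
    """
    selected = []
    used = 0
    for passage in passages:
        if used + len(passage) > max_chars:
            break  # keep the context within the character budget
        selected.append(passage)
        used += len(passage)

    context = "\n\n---\n\n".join(selected)
    return (
        f"The user asked: {query_text}. "
        f"The most similar text from the document is: {context}"
    )


# Plain strings stand in for the page_content of retrieved documents.
example = build_prompt("q", ["aaa", "bbb", "cccc"], max_chars=6)
print(example)
```

In the notebook, the passages could be taken from the retrieved results, for example `[d.page_content for d in docs]`.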


## Cleanup

In [20]:
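# NLTK downloaded its data to ~/nltk_data (see the download output above),
# so remove that directory regardless of the current working directory.
shutil.rmtree(os.path.expanduser("~/nltk_data"), ignore_errors = True)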

In [21]:
%%sql
DROP TABLE IF EXISTS pdf_docs;

In [22]:
%%sql
DROP DATABASE IF EXISTS pdf_db;