# CrateDB

This notebook demonstrates how to load documents from a [CrateDB] database,
using the document loader `CrateDBLoader`, which is based on [SQLAlchemy].

The loader runs a database query and returns one document per result row.

[CrateDB]: https://github.com/crate/crate
[SQLAlchemy]: https://www.sqlalchemy.org/

## Prerequisites

Install required packages.

In [40]:
#!pip install -r requirements.txt

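Populate the database: download the example data set and import it into the `notebook` schema with the `crash` command-line shell.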

In [21]:
!rm -f mlb_teams_2012.sql
!wget --quiet https://github.com/crate-workbench/langchain/raw/cratedb/docs/docs/integrations/document_loaders/example_data/mlb_teams_2012.sql

!crash --schema=notebook < mlb_teams_2012.sql;
!crash --schema=notebook --command "REFRESH TABLE mlb_teams_2012;"

CONNECT OK
PSQL OK, 1 row affected (0.001 sec)
DELETE OK, 30 rows affected (0.010 sec)
INSERT OK, 30 rows affected (0.011 sec)
CONNECT OK
REFRESH OK, 1 row affected (0.026 sec)

## Usage

In [13]:
from langchain.document_loaders.cratedb import CrateDBLoader
from pprint import pprint

CONNECTION_STRING = "crate://crate@localhost/?schema=notebook"

loader = CrateDBLoader(
    'SELECT * FROM mlb_teams_2012 ORDER BY "Team" LIMIT 5;',
    url=CONNECTION_STRING,
)
documents = loader.load()
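
The connection string follows the usual SQLAlchemy URL shape of `dialect://user@host/?options`. The sketch below takes it apart with the standard library only, to make the individual pieces visible. SQLAlchemy ships its own parser (`sqlalchemy.engine.make_url`); `urllib.parse` is used here purely to keep the example dependency-free. That `schema=notebook` selects the schema populated earlier with `crash --schema=notebook` is an assumption based on the matching names.

```python
from urllib.parse import parse_qs, urlsplit

CONNECTION_STRING = "crate://crate@localhost/?schema=notebook"

# Split the URL into its generic components.
parts = urlsplit(CONNECTION_STRING)
# Query parameters are returned as a dict of lists.
options = parse_qs(parts.query)

print(parts.scheme)    # crate
print(parts.username)  # crate
print(parts.hostname)  # localhost
print(parts.port)      # None (no explicit port in the URL)
print(options)         # {'schema': ['notebook']}
```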

In [14]:
pprint(documents)

[Document(page_content='Team: Angels\nPayroll (millions): 154.49\nWins: 89', metadata={}),
 Document(page_content='Team: Astros\nPayroll (millions): 60.65\nWins: 55', metadata={}),
 Document(page_content='Team: Athletics\nPayroll (millions): 55.37\nWins: 94', metadata={}),
 Document(page_content='Team: Blue Jays\nPayroll (millions): 75.48\nWins: 73', metadata={}),
 Document(page_content='Team: Braves\nPayroll (millions): 83.31\nWins: 94', metadata={})]
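
To illustrate the "one document per row" shape of this output without a running CrateDB instance, here is a minimal sketch that uses the standard library's `sqlite3` module in place of CrateDB. The `Document` dataclass and the `load_rows_as_documents` helper are hypothetical stand-ins written for this example; they are not the loader's actual implementation.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for LangChain's Document class."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def load_rows_as_documents(connection, query):
    """Run `query` and build one Document per result row."""
    cursor = connection.execute(query)
    # Column names come from the cursor description, in SELECT order.
    columns = [description[0] for description in cursor.description]
    documents = []
    for row in cursor:
        # Render each row as "column: value" lines, one line per column.
        content = "\n".join(
            f"{name}: {value}" for name, value in zip(columns, row)
        )
        documents.append(Document(page_content=content))
    return documents


# In-memory SQLite database holding two of the rows shown above.
connection = sqlite3.connect(":memory:")
connection.execute(
    'CREATE TABLE mlb_teams_2012 '
    '("Team" TEXT, "Payroll (millions)" REAL, "Wins" INTEGER)'
)
connection.executemany(
    "INSERT INTO mlb_teams_2012 VALUES (?, ?, ?)",
    [("Astros", 60.65, 55), ("Angels", 154.49, 89)],
)

documents = load_rows_as_documents(
    connection, 'SELECT * FROM mlb_teams_2012 ORDER BY "Team" LIMIT 5'
)
for document in documents:
    print(repr(document.page_content))
```

Each row becomes a single string of `column: value` lines, and `metadata` stays empty because no metadata columns were requested.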


## Specifying Which Columns are Content vs Metadata

Use `page_content_columns` to select the columns written to `page_content`, and `metadata_columns` to select the columns stored in `metadata`.

In [15]:
loader = CrateDBLoader(
    'SELECT * FROM mlb_teams_2012 ORDER BY "Team" LIMIT 5;',
    url=CONNECTION_STRING,
    page_content_columns=["Team"],
    metadata_columns=["Payroll (millions)"],
)
documents = loader.load()

In [16]:
pprint(documents)

[Document(page_content='Team: Angels', metadata={'Payroll (millions)': 154.49}),
 Document(page_content='Team: Astros', metadata={'Payroll (millions)': 60.65}),
 Document(page_content='Team: Athletics', metadata={'Payroll (millions)': 55.37}),
 Document(page_content='Team: Blue Jays', metadata={'Payroll (millions)': 75.48}),
 Document(page_content='Team: Braves', metadata={'Payroll (millions)': 83.31})]
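
The split shown in this output can be sketched in plain Python. The `split_row` helper is hypothetical and only mirrors the behavior visible above; falling back to all columns when no selection is given is an assumption taken from the earlier default output, not from the loader's source.

```python
def split_row(row, page_content_columns=None, metadata_columns=None):
    """Split one result row (a dict) into page content and metadata."""
    # Assumption: with no explicit selection, every column becomes content.
    if page_content_columns is None:
        page_content_columns = list(row)
    if metadata_columns is None:
        metadata_columns = []
    page_content = "\n".join(
        f"{name}: {row[name]}" for name in page_content_columns
    )
    metadata = {name: row[name] for name in metadata_columns}
    return page_content, metadata


row = {"Team": "Angels", "Payroll (millions)": 154.49, "Wins": 89}
page_content, metadata = split_row(
    row,
    page_content_columns=["Team"],
    metadata_columns=["Payroll (millions)"],
)
print(page_content)  # Team: Angels
print(metadata)      # {'Payroll (millions)': 154.49}
```

Note that `Wins` is listed in neither argument, so it appears in neither the content nor the metadata, which matches the output above.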


## Adding Source to Metadata

Use `source_columns` to store the value of the given column under the `source` key of each document's metadata.

In [17]:
loader = CrateDBLoader(
    'SELECT * FROM mlb_teams_2012 ORDER BY "Team" LIMIT 5;',
    url=CONNECTION_STRING,
    source_columns=["Team"],
)
documents = loader.load()

In [18]:
pprint(documents)

[Document(page_content='Team: Angels\nPayroll (millions): 154.49\nWins: 89', metadata={'source': 'Angels'}),
 Document(page_content='Team: Astros\nPayroll (millions): 60.65\nWins: 55', metadata={'source': 'Astros'}),
 Document(page_content='Team: Athletics\nPayroll (millions): 55.37\nWins: 94', metadata={'source': 'Athletics'}),
 Document(page_content='Team: Blue Jays\nPayroll (millions): 75.48\nWins: 73', metadata={'source': 'Blue Jays'}),
 Document(page_content='Team: Braves\nPayroll (millions): 83.31\nWins: 94', metadata={'source': 'Braves'})]
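
The `source` entry in this output can be pictured as follows. The `build_source_metadata` helper is hypothetical. The notebook only shows a single source column, so joining several columns with `-` is purely an assumption made for illustration.

```python
def build_source_metadata(row, source_columns):
    """Return metadata with a 'source' entry built from the given columns."""
    # Assumption: several columns are joined with "-". The notebook only
    # demonstrates the single-column case, so this separator is illustrative.
    source = "-".join(str(row[name]) for name in source_columns)
    return {"source": source}


row = {"Team": "Angels", "Payroll (millions)": 154.49, "Wins": 89}
metadata = build_source_metadata(row, ["Team"])
print(metadata)  # {'source': 'Angels'}
```

A `source` value taken from an identifying column such as `Team` makes it possible to trace each document back to the row it came from.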
