In [1]:
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Google Spanner
> [Spanner](https://cloud.google.com/spanner) is a highly scalable database that combines unlimited scalability with relational semantics, such as secondary indexes, strong consistency, schemas, and SQL providing 99.999% availability in one easy solution.

This notebook goes over how to chunk documents when using `Spanner` for vector search. We'll use the `SpannerVectorStore` class from the LangChain integration for Spanner.

Learn more about Spanner's integration with LangChain by visiting the [GitHub repo](https://github.com/googleapis/langchain-google-spanner-python/).

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/GoogleCloudPlatform/spanner-vector-hybrid-search-samples/blob/main/chunking/chunking-basics.ipynb)

## Before You Begin

To run this notebook, you will need to do the following:

 * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)
 * [Enable the Cloud Spanner API](https://console.cloud.google.com/flows/enableapi?apiid=spanner.googleapis.com)
 * [Create a Spanner instance](https://cloud.google.com/spanner/docs/create-manage-instances)
 * [Create a Spanner database](https://cloud.google.com/spanner/docs/create-manage-databases)

### 🦜🔗 Install dependencies
Let's first install the LangChain and Vertex AI libraries.

In [7]:
%pip install --upgrade --quiet langchain-text-splitters langchain-community langchain-experimental langchain-google-spanner langchain-google-vertexai

Note: you may need to restart the kernel to use updated packages.


**Colab only:** Uncomment the following cell to restart the kernel, or use the restart button. For Vertex AI Workbench you can restart the terminal using the button on top.

In [None]:
# # Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

### 🔐 Authentication
Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.

* If you are using Colab to run this notebook, use the cell below and continue.
* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env).

In [None]:
from google.colab import auth

auth.authenticate_user()

### ☁ Set Your Google Cloud Project
Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.

If you don't know your project ID, try the following:

* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113).

In [None]:
# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.

PROJECT_ID = "<<your-gcp-project>>"  # @param {type:"string"}

# Set the project id
!gcloud config set project {PROJECT_ID}
%env GOOGLE_CLOUD_PROJECT={PROJECT_ID}

### 💡 API Enablement
The `langchain-google-spanner` package requires that you [enable the Spanner API](https://console.cloud.google.com/flows/enableapi?apiid=spanner.googleapis.com) in your Google Cloud Project.

In [None]:
# enable Spanner API
!gcloud services enable spanner.googleapis.com

### Set Spanner database values
Find your database values in the [Spanner Instances page](https://console.cloud.google.com/spanner).

In [None]:
# @title Set Your Values Here { display-mode: "form" }
INSTANCE = "<<your-spanner-instance>>"  # @param {type: "string"}
DATABASE = "<<your-spanner-database>>"  # @param {type: "string"}
TABLE_NAME = "<<your-spanner-table>>"  # @param {type: "string"}

### Set up some helper methods

In [6]:
from google.cloud import spanner
from google.cloud.spanner_admin_database_v1.types import spanner_database_admin

spanner_client = spanner.Client()
database_admin_api = spanner_client.database_admin_api

OPERATION_TIMEOUT_SECONDS = 240

def drop_table(table_name):
    request = spanner_database_admin.UpdateDatabaseDdlRequest(
        database=database_admin_api.database_path(
            PROJECT_ID, INSTANCE, DATABASE
        ),
        statements=[
            f"DROP TABLE IF EXISTS {table_name}"
        ],
    )

    operation = database_admin_api.update_database_ddl(request)

    print(f"Waiting for drop operation (on table '{table_name}') to complete...")
    operation.result(OPERATION_TIMEOUT_SECONDS)
    print(f"Dropped table {table_name}")

### Initialize a table
A `SpannerVectorStore` instance requires a database table with id, content, and embedding columns.

The helper method `init_vector_store_table()` creates a table with the proper schema for you.

In [7]:
import langchain_google_spanner

from langchain_google_spanner import SecondaryIndex, SpannerVectorStore, TableColumn

# Drop the table where vector embeddings will be stored (if it exists).
# This clears out existing entries; comment out the following line to keep them.
drop_table(TABLE_NAME)

print("Initializing vector store")
SpannerVectorStore.init_vector_store_table(
    instance_id=INSTANCE,
    database_id=DATABASE,
    table_name=TABLE_NAME,
    # Customize the table creation
    id_column="id",
    # content_column="content_column",
    # metadata_columns=[
    #     TableColumn(name="metadata", type="JSON", is_null=True),
    #     TableColumn(name="title", type="STRING(MAX)", is_null=False),
    # ],
    # secondary_indexes=[
    #     SecondaryIndex(index_name="row_id_and_title", columns=["row_id", "title"])
    # ],
)

Waiting for drop operation (on table 'vectors_search_data') to complete...
Dropped table vectors_search_data
Initializing vector store
Waiting for operation to complete...


True

### Create an embedding class instance

You can use any [LangChain embeddings model](https://python.langchain.com/docs/integrations/text_embedding/).
You may need to enable the Vertex AI API to use `VertexAIEmbeddings`. We recommend pinning the embedding model's version for production. Learn more about the [Text embeddings models](https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/text-embeddings) and [Model versions](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api#model_versions).

In [None]:
# enable Vertex AI API
!gcloud services enable aiplatform.googleapis.com

In [8]:
from langchain_google_vertexai import VertexAIEmbeddings

# Make sure you update the model version below to reflect the latest production version
embeddings = VertexAIEmbeddings(
    model_name="text-embedding-005", project=PROJECT_ID
)

### Chunking overview
With all of that setup out of the way, let's talk about chunking (aka text splitting). In order to index documents in a vector store like Spanner, it's necessary to first partition or chunk the document into smaller pieces and then send those pieces to the data store to be indexed.

Why is it "necessary" to split documents before indexing them? At a high level, it's because documents (even small ones) are made up of a collection of smaller "fragments", such as sentences, concepts, or words. An embedding of a whole document blends all of those fragments into one vector, whereas an embedding of a single chunk represents that chunk's meaning more precisely, which makes it easier to match against a query. There are a variety of approaches for splitting documents, and LangChain offers multiple options as described [here](https://python.langchain.com/docs/concepts/text_splitters/#text-structured-based).

As explained in the above LangChain article on text splitters, there are roughly four broad approaches for chunking:

- Length based
- Text-structure based
- Document-structure based
- Semantic meaning based
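
To make the first of these approaches concrete, here is a minimal pure-Python sketch of length-based chunking with overlap. The function name `chunk_by_length` and its parameters are illustrative assumptions and are not part of LangChain; real splitters also take separators or token counts into account.

```python
def chunk_by_length(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


sample = "Spanner is fault tolerant by design."
for chunk in chunk_by_length(sample, chunk_size=16, chunk_overlap=4):
    print(repr(chunk))
```

Overlap repeats the tail of one chunk at the head of the next, so text that straddles a boundary is less likely to be separated from its context.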

#### Chunking with CharacterTextSplitter followed by indexing on Spanner
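
For intuition, `CharacterTextSplitter` splits the text on a separator (a blank line by default) and then merges neighbouring pieces until adding the next piece would exceed `chunk_size`. The following is a rough pure-Python sketch of that idea, not the library's implementation; the name `split_and_merge` is hypothetical, and the choice to keep an oversized piece whole is an assumption of this sketch.

```python
def split_and_merge(text: str, chunk_size: int, separator: str = "\n\n") -> list[str]:
    """Split on separator, then greedily merge pieces up to chunk_size characters."""
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if not current or len(candidate) <= chunk_size:
            # Start a new chunk, or extend the current one while it still fits.
            current = candidate
        else:
            # The next piece does not fit: emit the current chunk and start over.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


text = "First paragraph.\n\nSecond paragraph.\n\nA third, somewhat longer paragraph."
for chunk in split_and_merge(text, chunk_size=40):
    print(repr(chunk))
```

Because boundaries only ever fall on the separator, chunk sizes vary and a single paragraph longer than `chunk_size` stays in one piece in this sketch.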

In [9]:
import uuid
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("vector_doc_input.txt")
file_contents = loader.load()

# CharacterTextSplitter is just one of many text splitters
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

# Generate chunks (list of LangChain Documents)
documents = text_splitter.split_documents(file_contents)

ids = [str(uuid.uuid4()) for _ in range(len(documents))]
# The following indexes the above chunks in Spanner
vectorstore = SpannerVectorStore.from_documents(documents,
                                                embeddings,
                                                INSTANCE,
                                                DATABASE,
                                                TABLE_NAME,
                                                id_column="id",
                                                ids=ids)

The above code chunks the document and indexes the chunks in the specified Spanner table. Let's query the underlying Spanner table (`TABLE_NAME`) directly:

In [10]:
import pandas as pd
from google.cloud import spanner

spanner_db = spanner.Client(project=PROJECT_ID).instance(INSTANCE).database(DATABASE)

result_df = pd.DataFrame()

with spanner_db.snapshot() as snapshot:
    results = snapshot.execute_sql(f"SELECT * FROM {TABLE_NAME} LIMIT 10;")

    rows = []
    for row in results:
        rows.append(row)
    
    # Get column names
    cols = [x.name for x in results.fields]

    # Convert to pandas dataframe
    result_df = pd.DataFrame(rows, columns = cols)

display(result_df)

Unnamed: 0,id,content,embedding
0,04b6d5ec-5f1a-45b5-96d3-bc8212bb60d1,Since both correctness and availability are cr...,"[-0.024999909102916718, -0.001801451202481985,..."
1,232c7770-9c71-4889-b5bc-03d6b5ea836b,Blackhole the request: Sometimes the file syst...,"[-0.0427422858774662, -0.0013925273669883609, ..."
2,50270e45-119b-479f-a4fa-6f57ec3ad8a5,Upping reliability with chaos testing\nWe run ...,"[-0.04426778107881546, 0.0026189256459474564, ..."
3,5179e395-d7a9-4181-8b68-e6601a936e99,5. Cloud faults\nAccess to Spanner from the Go...,"[-0.04665933921933174, 0.001961533911526203, 0..."
4,536e8982-4ed4-4ca7-a6a4-00ac9385d73b,A fault-tolerant design foundation\nSpanner is...,"[-0.04364815726876259, 0.007677283603698015, 0..."
5,576eacb1-882f-43e3-b4cb-ced2bb33a0fe,Declared checks and foreign key constraints ar...,"[-0.041425663977861404, -0.01261491421610117, ..."
6,7b2dd05c-4d0b-4d0f-876d-3289bcd0ac8c,A read or query on the database does not retur...,"[-0.018720634281635284, -0.020843015983700752,..."
7,8338f341-d542-4a4c-b817-5af2e7c74c20,Spanner earns its reputation for reliability\n...,"[-0.04425937682390213, -0.005921074189245701, ..."
8,8ec74f92-bfb8-4226-b0ee-64dbbce50c89,The restart logic is quite complex and we even...,"[-0.028140805661678314, -0.019556596875190735,..."
9,ab7ba4fc-fce5-4260-8ddb-047b666b7582,"For example, through chaos testing, we found a...","[-0.029230546206235886, -0.012972258031368256,..."


Let's now do a similarity search on the indexed data via LangChain and display the results:

In [11]:
results = vectorstore.similarity_search(query="resilience", k=3)
print('Num results: ' + str(len(results)))
print()

search_rows = [x.page_content for x in results]
cols = ['page_content']

# The following ensures that the full chunked text fragment is displayed without truncation
pd.set_option('display.max_colwidth', None)

search_df = pd.DataFrame(search_rows, columns = cols)
display(search_df)

Num results: 3



Unnamed: 0,page_content
0,Spanner earns its reputation for reliability\nSpanner is fault tolerant by design. We continuously validate Spanner’s reliability by running many large-scale randomized system tests that employ chaos testing.\n\nYou can learn more about what makes Spanner unique and how it’s being used today. Or try it yourself for free for 90-days or for as little as $65 USD/month for a production-ready instance that grows with your business without downtime or disruptive re-architecture.
1,"5. Cloud faults\nAccess to Spanner from the Google Cloud Platform is mediated by Spanner API Front End Servers, which proxy requests coming into Google Cloud through Google front ends to a Spanner database. External clients open sessions with the Spanner database and execute transactions on these sessions. For Spanner, we crash the Spanner API frontend servers, which forces sessions to migrate to other Spanner API frontend servers. This should not be visible to the client (besides some additional latency).\n\n6. Regional outages\nThe largest faults we simulate in system tests are outages of an entire region, forcing Spanner to serve data from a quorum of other regions. The majority of our system tests simulate several kinds of regional outages, triggered either by file system or network outages, and we verify Spanner continues to serve. This resilience is a property of the Paxos algorithm, which guarantees progress as long as a quorum (2 of 3, or 3 of 5) of replicas remain healthy."
2,"A fault-tolerant design foundation\nSpanner is built from “mostly reliable” components including machines, disks, and networking hardware that have a low rate of failure. Even so, bad things happen: bad memory and disks may lead to data corruption; file accesses may yield transient or permanent errors or corruption; or network connectivity within or between data centers may be throttled or lost altogether. Worst of all, software bugs sometimes produce correlated failures in all servers running the same version of the code."
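
Conceptually, a similarity search embeds the query and ranks stored chunks by how close their embeddings are to it. The sketch below illustrates this with cosine similarity over tiny hand-made vectors. The metric, the helper names, and the three-dimensional "embeddings" are all assumptions for illustration; the vector store performs the real computation inside Spanner.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], rows: list[tuple[str, list[float]]], k: int):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy rows: (chunk text, made-up embedding)
toy_rows = [
    ("chunk about chaos testing", [0.9, 0.1, 0.0]),
    ("chunk about pricing", [0.0, 0.2, 0.9]),
    ("chunk about fault tolerance", [0.8, 0.3, 0.1]),
]

for score, text in top_k([1.0, 0.0, 0.0], toy_rows, k=2):
    print(f"{score:.3f}  {text}")
```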


The above chunking approach "blindly" breaks up our input document into chunks of up to about 1000 characters, splitting only on blank lines (the default separator) and without regard for the semantic meaning contained in these chunks. Let's use a more sophisticated chunking approach provided by LangChain: `SemanticChunker` (more info [here](https://python.langchain.com/docs/how_to/semantic-chunker/)). First, let's initialize a separate Spanner table to index embeddings from this new chunking approach.

In [None]:
from langchain_experimental.text_splitter import SemanticChunker

# A separate table to index using second chunking approach
TABLE_NAME_SC = "<<your-spanner-table-sc>>" # sc for semantic chunking

drop_table(TABLE_NAME_SC)
SpannerVectorStore.init_vector_store_table(
    instance_id=INSTANCE,
    database_id=DATABASE,
    table_name=TABLE_NAME_SC,
    # Customize the table creation
    id_column="id",
    # content_column="content_column",
    # metadata_columns=[
    #     TableColumn(name="metadata", type="JSON", is_null=True),
    #     TableColumn(name="title", type="STRING(MAX)", is_null=False),
    # ],
    # secondary_indexes=[
    #     SecondaryIndex(index_name="row_id_and_title", columns=["row_id", "title"])
    # ],
)

Waiting for drop operation (on table 'vectors_search_data_sc') to complete...
Dropped table vectors_search_data_sc
Waiting for operation to complete...


True

Let's now index the chunks into the new table that we just initialized above.

In [13]:
# We'll use SemanticChunker this time
text_splitter = SemanticChunker(embeddings)

documents = []

with open("vector_doc_input.txt") as f:
    doc_contents = f.read()
    # Generate chunks (list of LangChain Documents)
    documents = text_splitter.create_documents([doc_contents])

ids = [str(uuid.uuid4()) for _ in range(len(documents))]

# The following indexes the above chunks in Spanner
vectorstore_sc = SpannerVectorStore.from_documents(documents,
                                                embeddings,
                                                INSTANCE,
                                                DATABASE,
                                                TABLE_NAME_SC,
                                                id_column="id",
                                                ids=ids)
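
For intuition about semantic chunking: the general idea is to split the text into sentences, embed each sentence, and start a new chunk wherever the distance between neighbouring sentence embeddings is unusually large. The sketch below illustrates that idea only. It is not `SemanticChunker`'s implementation, and the bag-of-words `toy_embed` function, the sentence regex, and the percentile threshold are assumptions chosen to keep the example self-contained.

```python
import math
import re
from collections import Counter


def toy_embed(sentence: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def percentile_value(values: list[float], pct: float) -> float:
    """Percentile with linear interpolation between the closest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    low, high = math.floor(rank), math.ceil(rank)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def semantic_chunks(text: str, percentile: float = 95.0) -> list[str]:
    """Group sentences, breaking where neighbouring sentences differ the most."""
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]
    if len(sentences) < 2:
        return sentences
    vectors = [toy_embed(s) for s in sentences]
    distances = [cosine_distance(vectors[i], vectors[i + 1])
                 for i in range(len(vectors) - 1)]
    threshold = percentile_value(distances, percentile)

    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            # Large jump in meaning: close the current chunk.
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


demo = ("Spanner replicates data across zones. Spanner replicates data with Paxos. "
        "Chaos testing injects faults. Chaos testing injects crashes.")
for chunk in semantic_chunks(demo):
    print(repr(chunk))
```

Joining sentences with a single space also suggests why chunks produced this way can lose the blank lines that separated paragraphs in the input.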

Let's run a search against the documents we just indexed with the `SemanticChunker`:

In [14]:
results_sc = vectorstore_sc.similarity_search(query="resilience", k=3)
print('Num results: ' + str(len(results_sc)))
print()

search_rows_sc = [x.page_content for x in results_sc]

# The following ensures that the full chunked text fragment is displayed without truncation
pd.set_option('display.max_colwidth', None)

cols = ['page_content']
search_sc_df = pd.DataFrame(search_rows_sc, columns = cols)
display(search_sc_df)

Num results: 3



Unnamed: 0,page_content
0,"5. Cloud faults\nAccess to Spanner from the Google Cloud Platform is mediated by Spanner API Front End Servers, which proxy requests coming into Google Cloud through Google front ends to a Spanner database. External clients open sessions with the Spanner database and execute transactions on these sessions. For Spanner, we crash the Spanner API frontend servers, which forces sessions to migrate to other Spanner API frontend servers. This should not be visible to the client (besides some additional latency). 6. Regional outages\nThe largest faults we simulate in system tests are outages of an entire region, forcing Spanner to serve data from a quorum of other regions. The majority of our system tests simulate several kinds of regional outages, triggered either by file system or network outages, and we verify Spanner continues to serve. This resilience is a property of the Paxos algorithm, which guarantees progress as long as a quorum (2 of 3, or 3 of 5) of replicas remain healthy. Spanner earns its reputation for reliability\nSpanner is fault tolerant by design. We continuously validate Spanner’s reliability by running many large-scale randomized system tests that employ chaos testing. You can learn more about what makes Spanner unique and how it’s being used today. Or try it yourself for free for 90-days or for as little as $65 USD/month for a production-ready instance that grows with your business without downtime or disruptive re-architecture."
1,"SOURCE: https://cloud.google.com/blog/products/databases/chaos-testing-spanner-improves-reiliability\n\nOne of the secrets behind Spanner’s reliability is the team’s extensive use of chaos testing, the process of deliberately injecting faults into production-like instances of the database. Although engineers focus on testing the “happy path,” most software bugs occur when things go wrong. Given Spanner’s complex architecture and constantly evolving codebase, it is inevitable that bugs will be introduced. Here, we give an overview of the types of chaos testing we employ and the kinds of bugs it finds. A fault-tolerant design foundation\nSpanner is built from “mostly reliable” components including machines, disks, and networking hardware that have a low rate of failure. Even so, bad things happen: bad memory and disks may lead to data corruption; file accesses may yield transient or permanent errors or corruption; or network connectivity within or between data centers may be throttled or lost altogether. Worst of all, software bugs sometimes produce correlated failures in all servers running the same version of the code. Since both correctness and availability are critical, Spanner uses principles of fault-tolerant design to mask failures of these components and achieve high reliability for the service. For example, checksums are used to detect data corruption at many levels. Spanner tablets, which store a fragment of the database, are replicated across three or (usually) more data centers and the reads and writes use Paxos to achieve consensus and consistency of the distributed state. Checksums are also used to detect corruption of a tablet replica."
2,"The data for these tablets is stored in files, and the file system keeps multiple copies of the data blocks within the data center, using checksums to detect corrupted blocks. Finally, we proceed cautiously when rolling out new software versions, alerting on any anomalies that may be caused by a new bug. Upping reliability with chaos testing\nWe run over a thousand system tests per week to validate that Spanner’s design and implementation actually mask faults and provide a highly reliable service. Each test creates a production-like instance of Spanner comprising hundreds of processes running on the same computing platform and using the same dependent systems (e.g., file system, lock service) as production Spanner. Most tests run for between one and 24 hours and execute tens or hundreds of thousands of transactions. Actual faults in production occur at a very low rate. To cover Spanner’s error-handling and fault-tolerance mechanisms, we inject faults (e.g., file and network errors) at a much higher rate in these system tests. If these faults uncover bugs, the test fails in one of several ways:\n\nA read or query on the database does not return the expected result. Being able to compute the expected result of a randomly generated read/query on a database populated with randomly generated data is a challenging problem. Spanner’s strong consistency model is the key to validate read/query results efficiently: each transaction records a log summarizing its effects, and subsequent transactions can replay these logs to compute the state they should observe. We describe this in further detail in an earlier article. A Spanner API call returns an unexpected error. A Spanner server crashes in an unexpected way. Some of the faults we inject will cause a server to crash, but we filter these and fail the test only if some new unexpected crash occurs. One of Spanner’s internal consistency checkers reports a problem. 
Checkers verify that:\n\nFiles are not leaked (like Unix fsck, but on the distributed file system)\n\nSecondary indexes are consistent with the tables they index\n\nDeclared checks and foreign key constraints are satisfied\n\nAll replicas of a tablet are equal\n\nLet’s take a look at the kinds of faults that we inject when chaos testing Spanner. 1. Server crashes\nOne of the most basic faults we inject is to force a server to crash abruptly (e.g., via a SIGABRT Unix signal). This simple fault causes lots of complex failure recovery logic to be executed:\n\nServers use a disk-based log to protect against the loss of their in-memory state, thus crashing exercises the logic that recovers the state of all the tablets that were on the crashed server from their logs. All distributed transactions being coordinated by the crashed server must abort and be restarted since the locks are kept in memory. Clients that were pulling data from the crashed server via reads and/or queries are forced to fail over to another replica. The client must resume the operation without starting again at the beginning, and without losing or duplicating any results. The restart logic is quite complex and we even trigger restarts without server crashes to exercise it at various points in the streaming of the results."


As you can see, these results are more self-consistent. To drive the point home further, let's see how these two chunking approaches fare when they're used in a RAG workflow.



## Setup for RAG

In [18]:
from langchain_google_vertexai import ChatVertexAI
from langchain_core.prompts import PromptTemplate

# this vector store uses the embeddings generated
# with CharacterTextSplitter
vector_store = SpannerVectorStore(
    embedding_service=embeddings,
    instance_id=INSTANCE,
    database_id=DATABASE,
    table_name=TABLE_NAME,
    id_column="id",
    content_column="content",
    embedding_column="embedding",
)

retriever = vector_store.as_retriever(
    search_type="mmr", search_kwargs={"k": 5, "lambda_mult": 0.8}
)

# this vector store uses the embeddings generated
# with SemanticChunker
vector_store_sc = SpannerVectorStore(
    embedding_service=embeddings,
    instance_id=INSTANCE,
    database_id=DATABASE,
    table_name=TABLE_NAME_SC,
    id_column="id",
    content_column="content",
    embedding_column="embedding",
)

retriever_sc = vector_store_sc.as_retriever(
    search_type="mmr", search_kwargs={"k": 5, "lambda_mult": 0.8}
)

# let's initialize our llm
llm = ChatVertexAI(
    model="gemini-2.0-flash",
    temperature=0,
    max_tokens=None,
    max_retries=6,
    stop=None,
    # other params...
)

prompt_str = """You are a knowledgeable and helpful bot who answers questions for a 
                technical audience at a 200 level of complexity. Please ensure that your
                responses are self-consistent. With that background, please answer this 
                question: {question} based on the following context: {context}"""
prompt_template = PromptTemplate.from_template(prompt_str)

# helper method to format fragments
# retrieved from vector store
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
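
Both retrievers above use `search_type="mmr"` (maximal marginal relevance), which trades relevance to the query against redundancy among the chunks already selected; `lambda_mult` closer to 1 favours relevance and closer to 0 favours diversity. The following is a minimal sketch of the selection rule over toy vectors. The function names and vectors are illustrative assumptions, not the vector store's implementation.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, candidates, k, lambda_mult=0.5):
    """Return indices of up to k candidates chosen by maximal marginal relevance."""
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = remaining[0], -math.inf
        for i in remaining:
            relevance = cosine(query_vec, candidates[i])
            # Similarity to the closest already-selected candidate (0 if none yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


query = [1.0, 0.0]
toy_candidates = [
    [1.0, 0.0],    # 0: most relevant
    [0.99, 0.14],  # 1: near-duplicate of 0
    [0.6, 0.8],    # 2: less relevant but different
]

print(mmr_select(query, toy_candidates, k=2, lambda_mult=1.0))
print(mmr_select(query, toy_candidates, k=2, lambda_mult=0.3))
```

With `lambda_mult=1.0` the near-duplicate is picked second; with a low value the more distinct candidate is preferred instead.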

First, let's run a RAG chain whose context comes from the table chunked with `CharacterTextSplitter`:

In [None]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
import textwrap

# notice that we're passing in the retriever
# that was instantiated on top of the table
# containing "chunks" generated via CharacterTextSplitter
rag_chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt_template
        | llm
        | StrOutputParser()
)

ai_response = rag_chain.invoke("How does Spanner increase availability with chaos testing?")
# ai_response = wrapped_text = textwrap.fill(ai_response, width=80)

print(ai_response)

Spanner uses chaos testing to increase availability by proactively injecting faults into production-like instances. This allows engineers to validate Spanner's fault-tolerance mechanisms and error-handling code. By injecting faults such as server crashes, network partitions, and errors in dependent systems, Spanner can identify and fix bugs that would otherwise occur in production. This helps to ensure that Spanner can continue to operate even when failures occur, thus increasing availability.

Here's a breakdown of how specific fault injections contribute to increased availability:

*   **Server Crashes:** Simulating server crashes validates the recovery mechanisms for in-memory state from disk-based logs, ensures distributed transactions abort and restart correctly, and verifies that clients can failover to other replicas without data loss or duplication. This ensures that the service remains available even if individual servers fail.

*   **Network Partitions:** Blocking RPCs to spe
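
The chain above is compact, so it can help to see its data flow written out as plain function calls: retrieve chunks, join them into one context string, fill the prompt, and call the model. In the sketch below, `fake_retrieve` and `fake_generate` are hypothetical stand-ins for the retriever and the LLM, and the shortened prompt text is also an assumption.

```python
PROMPT = "Please answer this question: {question} based on the following context: {context}"


def fake_retrieve(question: str) -> list[str]:
    """Hypothetical stand-in for the retriever: returns canned chunks."""
    return [
        "Spanner injects faults in system tests.",
        "Paxos makes progress while a quorum of replicas is healthy.",
    ]


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for the LLM: reports how much text it received."""
    return f"(model received a prompt of {len(prompt)} characters)"


def run_rag(question: str, retrieve, generate) -> str:
    docs = retrieve(question)                                    # retriever
    context = "\n\n".join(docs)                                  # format_docs
    prompt = PROMPT.format(question=question, context=context)   # prompt_template
    return generate(prompt)                                      # llm | StrOutputParser


print(run_rag("How does Spanner increase availability?", fake_retrieve, fake_generate))
```

Swapping the retriever is the only change needed to compare chunking strategies, which is what the next cell does.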

Now let's run a RAG using the table (vector store) containing chunks generated by the SemanticChunker:

In [24]:

# notice that we're passing in the retriever
# that was instantiated on top of the table
# containing "chunks" generated via SemanticChunker
rag_chain = (
        {"context": retriever_sc | format_docs, "question": RunnablePassthrough()}
        | prompt_template
        | llm
        | StrOutputParser()
)

ai_response = rag_chain.invoke("How does Spanner increase availability with chaos testing?")
#ai_response = wrapped_text = textwrap.fill(ai_response, width=80)

print(ai_response)

Spanner uses chaos testing to increase availability by proactively injecting faults into production-like instances of the database. This allows engineers to validate that Spanner's fault-tolerant design and implementation effectively mask failures and maintain a highly reliable service. Here's how:

*   **Fault Injection:** Spanner injects various faults, such as server crashes, file system errors (e.g., corruption, blackholes), RPC failures (delays, errors, network partitions), memory/quota exhaustion, cloud frontend server crashes and regional outages.
*   **Testing Recovery Mechanisms:** By injecting these faults, Spanner tests its recovery mechanisms, including:
    *   **Server crash recovery:** Validates the disk-based log recovery mechanism, distributed transaction abort/restart logic, and client failover to other replicas.
    *   **File system fault tolerance:** Ensures that Spanner can handle file system errors, data corruption (detected by checksums), and file system unavail

## Summary

You can see that there's a meaningful difference in the level of detail and self-consistency between the two approaches. We invite you to play with the various [chunking approaches](https://python.langchain.com/docs/concepts/text_splitters/) to determine the best fit for your use case.
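
When comparing chunking approaches, one simple starting point is to look at how chunk sizes are distributed. The helper below is a small sketch for that purpose; `chunk_length_stats` is a hypothetical name, and the sample strings stand in for the `page_content` of your own chunks.

```python
from statistics import mean


def chunk_length_stats(chunks: list[str]) -> dict[str, float]:
    """Summarize chunk lengths (in characters) for one chunking approach."""
    lengths = [len(chunk) for chunk in chunks]
    if not lengths:
        return {"count": 0, "min": 0, "mean": 0.0, "max": 0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "mean": mean(lengths),
        "max": max(lengths),
    }


# Stand-in data; in this notebook you could pass
# [doc.page_content for doc in documents] for each splitter instead.
print(chunk_length_stats(["short chunk", "a somewhat longer chunk of text"]))
```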

## Cleanup

If you created a Spanner instance just to run this demo, delete it so that you don't continue to be billed for the resources you provisioned: go to the [Cloud Spanner section](https://console.cloud.google.com/spanner/instances/) of the Cloud Console and delete the instance you created.