<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/vector_stores/FirestoreVectorStore.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Firestore Vector Store

# Google Firestore (Native Mode)

> [Firestore](https://cloud.google.com/firestore) is a serverless document-oriented database that scales to meet any demand. Extend your database application to build AI-powered experiences leveraging Firestore's LlamaIndex integration.

This notebook goes over how to use [Firestore](https://cloud.google.com/firestore) to store vectors and query them using the `FirestoreVectorStore` class.

## Before You Begin

To run this notebook, you will need to do the following:

* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)
* [Enable the Firestore API](https://console.cloud.google.com/flows/enableapi?apiid=firestore.googleapis.com)
* [Create a Firestore database](https://cloud.google.com/firestore/docs/manage-databases)

After confirming that the runtime environment of this notebook can access the database, fill in the values in the cells below and run them before running the example code.

## Library Installation

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙. This notebook also installs `llama-index-embeddings-huggingface` to compute embeddings with a local Hugging Face model.

In [None]:
%pip list

In [2]:
%pip install --quiet llama-index==0.11.19
%pip install --quiet llama-index-embeddings-huggingface==0.3.1

In [None]:
%pip install llama-index-vector-stores-firestore

### ☁ Set Your Google Cloud Project
Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.

If you don't know your project ID, try the following:

* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113).
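
Before setting the project, it can help to check that the value you are about to enter looks like a project ID rather than a project name or number. The sketch below is a minimal illustration, assuming the documented project ID format (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen). The helper names `looks_like_project_id` and `project_id_from_env` are hypothetical, and the `GOOGLE_CLOUD_PROJECT` environment variable is only set in some environments.

```python
import os
import re
from typing import Optional

# Documented project ID shape: 6-30 characters, lowercase letters, digits and
# hyphens, starting with a letter and not ending with a hyphen.
_PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def looks_like_project_id(value: str) -> bool:
    """Return True if `value` matches the project ID format."""
    return _PROJECT_ID_RE.fullmatch(value) is not None


def project_id_from_env() -> Optional[str]:
    """Assumption: GOOGLE_CLOUD_PROJECT may hold the project ID; return it if well formed."""
    candidate = os.environ.get("GOOGLE_CLOUD_PROJECT", "")
    return candidate if looks_like_project_id(candidate) else None


print(looks_like_project_id("gcp-genai-fingpt"))  # True
print(looks_like_project_id("My Project"))  # False: this is a display name, not an ID
```

A format check like this only catches typos and mixed-up identifiers; it does not confirm that the project exists or that you have access to it.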

In [None]:
# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.

PROJECT_ID = "gcp-genai-fingpt"  # @param {type:"string"}

# Set the project id
!gcloud config set project {PROJECT_ID}

Updated property [core/project].


### 🔐 Authentication

Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.

- If you are using Colab to run this notebook, use the cell below and continue.
- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env).

In [None]:
from google.colab import auth

auth.authenticate_user()
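
The cell above only works inside Colab, because `google.colab` is not available elsewhere. A hedged sketch of a guard that runs the Colab flow when possible and otherwise leaves authentication to Application Default Credentials (for example after `gcloud auth application-default login`). The `running_in_colab` helper is hypothetical, and checking `sys.modules` is a common heuristic rather than an official API.

```python
import sys


def running_in_colab() -> bool:
    """Heuristic: Colab runtimes preload the `google.colab` package."""
    return "google.colab" in sys.modules


if running_in_colab():
    # Same call as the notebook cell, executed only where it is available.
    from google.colab import auth

    auth.authenticate_user()
else:
    print("Not running in Colab; relying on Application Default Credentials.")
```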

# Basic Usage

### Initialize FirestoreVectorStore

`FirestoreVectorStore` allows you to load data into Firestore and query it.
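
Conceptually, querying a vector store means ranking stored embeddings by their similarity to the query embedding. The following is a small in-memory illustration of that idea, assuming cosine similarity as the metric. It is not how Firestore implements vector search, and the names `cosine_similarity`, `top_k`, and `toy_store` are made up for this sketch.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float], store: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k stored items most similar to the query, best match first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional "embeddings"; real models produce hundreds of dimensions.
toy_store = {
    "pricing": [0.9, 0.1, 0.0],
    "auth": [0.0, 1.0, 0.0],
    "storage": [0.7, 0.0, 0.7],
}

for doc_id, score in top_k([1.0, 0.0, 0.0], toy_store, k=2):
    print(doc_id, round(score, 3))  # "pricing" ranks first, then "storage"
```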

In [None]:
# @markdown Please specify the Firestore collection to use for this demo.
COLLECTION_NAME = "test-rag-data"

In [None]:
!pip install docx2txt

Collecting docx2txt
  Downloading docx2txt-0.8.tar.gz (2.8 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: docx2txt
  Building wheel for docx2txt (setup.py) ... done
  Created wheel for docx2txt: filename=docx2txt-0.8-py3-none-any.whl size=3960 sha256=3d33da0ee60c4847db2d14b44cb467339ba354eb0fabd7bb4da31a216f650d26
  Stored in directory: /root/.cache/pip/wheels/22/58/cf/093d0a6c3ecfdfc5f6ddd5524043b88e59a9a199cb02352966
Successfully built docx2txt
Installing collected packages: docx2txt
Successfully installed docx2txt-0.8


In [4]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings

# Set the embedding model, this is a local model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5")

In [None]:
# The Firestore client library is published as google-cloud-firestore.
!pip install google-cloud-firestore


In [None]:
from google.cloud import firestore

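# Assumption: "(default)" is the ID Firestore gives the default database.
# Use your own database ID here if you created a named database.
db = firestore.Client(project=PROJECT_ID, database="(default)")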
print(db)

<google.cloud.firestore_v1.client.Client object at 0x7f888e02f550>


In [None]:
collection = db.collection("test-rag-data_v1")

In [None]:
print(collection)

<google.cloud.firestore_v1.collection.CollectionReference object at 0x7f888fa68c40>


In [5]:
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SemanticSplitterNodeParser

# Load documents and build index
documents = SimpleDirectoryReader(
    "/content/sample_data/"
).load_data()
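
If the reader returns fewer documents than expected, it is worth checking what is actually in the input directory. A minimal stdlib sketch follows, assuming the reader only looks at the top level of the directory and skips hidden files (its default behavior as far as I know, so verify against your installed version). The `list_input_files` helper is hypothetical and the demo uses a temporary directory instead of `/content/sample_data/`.

```python
import tempfile
from pathlib import Path
from typing import List


def list_input_files(directory: str) -> List[Path]:
    """List non-hidden files at the top level of `directory`, sorted by name."""
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"Input directory not found: {root}")
    return sorted(
        p for p in root.iterdir() if p.is_file() and not p.name.startswith(".")
    )


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "prices.csv").write_text("sku,price\n")
    (Path(tmp) / ".hidden").write_text("")
    print([p.name for p in list_input_files(tmp)])  # ['prices.csv']
```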

In [6]:
from llama_index.core import VectorStoreIndex
from llama_index.core import StorageContext, Settings

from llama_index.vector_stores.firestore import FirestoreVectorStore

# Create a Firestore vector store
store = FirestoreVectorStore(client=db, collection_name=COLLECTION_NAME)

storage_context = StorageContext.from_defaults(vector_store=store)
Settings.embed_model = embed_model
Settings.llm = None

index1 = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

In [None]:
# Split documents at semantic breakpoints, then embed each resulting node.
splitter = SemanticSplitterNodeParser(
    include_metadata=True,
    buffer_size=90,
    breakpoint_percentile_threshold=95,
    embed_model=Settings.embed_model,
)
Settings.transformations = [splitter, Settings.embed_model]

In [None]:
from llama_index.core.ingestion import IngestionPipeline

pipeline = IngestionPipeline(
    name="RAG_TEXT_INGESTION",
    transformations=Settings.transformations,
    vector_store=store,
)
pipeline.disable_cache = True

# `arun` is a coroutine, so it must be awaited; notebook cells support top-level `await`.
# In a plain script, use the synchronous `pipeline.run(...)` instead.
nodes = await pipeline.arun(documents=documents, show_progress=True)
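
Because `arun` is the asynchronous entry point of the pipeline, its result has to be awaited before `len(nodes)` is meaningful. Calling a coroutine function without `await` only creates a coroutine object. The sketch below shows the difference with a hypothetical stand-in coroutine, `fake_arun`, which is not part of LlamaIndex.

```python
import asyncio
import inspect
from typing import List


async def fake_arun(documents: List[str]) -> List[str]:
    """Hypothetical stand-in for an async ingestion call: one 'node' per document."""
    await asyncio.sleep(0)  # yield control once, as real I/O would
    return [f"node:{doc}" for doc in documents]


async def main() -> List[str]:
    pending = fake_arun(["prices.csv"])
    # Without `await` we only hold a coroutine object; len(pending) would raise TypeError.
    assert inspect.iscoroutine(pending)
    return await pending  # awaiting runs the coroutine and yields the list


nodes = asyncio.run(main())
print(len(nodes))  # 1
```

Inside a notebook cell an event loop is already running, so `asyncio.run(main())` would be replaced by `await main()`.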


In [None]:
print(len(nodes))

1


In [None]:
index = VectorStoreIndex.from_vector_store(
    vector_store=store
)

In [None]:
query_engine = index.as_query_engine()
res = query_engine.query("price?")
print(str(res.source_nodes[0].text))

6809-9FDE-A3B6, 17.79375, nan
Vector store (Firestore), Cloud Firestore Entity Writes Iowa, 9125000.0, us-central1, F17B-412E-CB64, 9822-4C63-901B, 7.665, nan
Vector store (Firestore), Cloud Firestore Entity Deletes Iowa, 0.0, us-central1, F17B-412E-CB64, 3E61-C297-8795, 0.0, nan
Vector store (Firestore), Cloud Firestore Storage Iowa, 50.0, us-central1, F17B-412E-CB64, 081D-E9E6-8764, 7.35, nan
Application Service (App Engine), Frontend Instances, 2.0, us-central1, F17B-412E-CB64, E2EB-F679-D108, 30.41667, nan
Ingestion (Source) (Cloud Storage), Network Data Transfer GCP Inter Region within Northern America, 50.0, nan, 95FF-2EF5-5EA1, 8878-37D4-D2AC, 0.0, nan
Ingestion (Source) (Cloud Storage), Standard Storage US Regional, 50.0, us-central1, 95FF-2EF5-5EA1, E5F0-6A5D-7BAD, 0.9, nan
nan, nan, nan, nan, nan, nan, nan, nan
nan, nan, nan, nan, nan, Total Price:, 938.55686, nan
nan, nan, nan, nan, nan, nan, nan, nan
Prices are in US dollars, effective date is 2024-12-13T10:30:07.225Z, nan,
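
The query cell prints only the text of the first source node. To see everything that was retrieved, a small helper can list each node with its similarity score. This is a sketch under assumptions: `summarize_sources` is a hypothetical helper, it relies on each item exposing `text` (as used above) and an optional `score` attribute, and the demo uses stand-in objects rather than a real query response.

```python
from types import SimpleNamespace
from typing import Iterable, List


def summarize_sources(source_nodes: Iterable, max_chars: int = 60) -> List[str]:
    """Build one line per retrieved node: rank, score (if any), and a text snippet."""
    lines = []
    for rank, item in enumerate(source_nodes, start=1):
        score = getattr(item, "score", None)
        score_text = "n/a" if score is None else f"{score:.3f}"
        snippet = " ".join(item.text.split())[:max_chars]  # collapse whitespace
        lines.append(f"{rank}. score={score_text} {snippet}")
    return lines


# Stand-in objects for illustration only; with a real response you would pass
# `res.source_nodes` instead.
fake_sources = [
    SimpleNamespace(score=0.8123, text="Cloud Firestore Storage Iowa, 50.0"),
    SimpleNamespace(score=None, text="Frontend   Instances"),
]
for line in summarize_sources(fake_sources):
    print(line)
```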