A simple notebook that uses the Cloud Discovery Engine Python API to check a Data Store for indexed documents.

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/dfcx-scrapi/blob/main/examples/vertex_ai_conversation/check_documents_in_datastore.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo"><br> Run in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/dfcx-scrapi/blob/main/examples/vertex_ai_conversation/check_documents_in_datastore.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/dfcx-scrapi/main/examples/vertex_ai_conversation/check_documents_in_datastore.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
</table>
<br><br><br>



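## Instructions
Run the cells in order: install the dependencies and authenticate, set your project and pick a data store, then check the index status, list the documents and indexed URLs, and search by URL or by document ID.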

## Setup

In [None]:
# Dependencies
!pip install dfcx-scrapi --quiet

from google.colab import auth
from google.auth import default

# Authentication
auth.authenticate_user()
creds, _ = default()
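The cell above imports `google.colab`, so it only runs in Colab. As a minimal sketch for running the notebook elsewhere (for example, Vertex AI Workbench or a local Jupyter server), you could guard the Colab login and otherwise rely on Application Default Credentials created with `gcloud auth application-default login`. The helper names `running_in_colab` and `authenticate` are hypothetical and are not part of SCRAPI.

```python
import importlib.util


def running_in_colab() -> bool:
    """Return True if the google.colab package is importable."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except ModuleNotFoundError:
        # The parent "google" package is not installed at all.
        return False


def authenticate():
    """Hypothetical helper: log in if in Colab, then return ADC credentials."""
    # Imported lazily so this cell can be defined without google-auth installed.
    from google.auth import default

    if running_in_colab():
        from google.colab import auth

        # Interactive Colab login; this populates Application Default Credentials.
        auth.authenticate_user()
    # Outside Colab, this assumes you already ran:
    #   gcloud auth application-default login
    creds, _ = default()
    return creds
```

Calling `creds = authenticate()` then behaves like the original cell in Colab and falls back to your local credentials elsewhere.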

# USER INPUTS
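You can find your `datastore_id` with the `get_data_stores_map` method in SCRAPI. Passing `reverse=True` keys the map by each data store's display name, so you can look up the ID by name.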

In [None]:
from dfcx_scrapi.core.data_stores import DataStores
from dfcx_scrapi.core.search import Search

PROJECT_ID = "" #@param{type: 'string'}

s = Search()
ds = DataStores(project_id=PROJECT_ID)

ds_map = ds.get_data_stores_map(reverse=True, location="global")
ds_map

In [None]:
# Look up your datastore_id in ds_map by its human-readable display name.
# Replace "my-cool-datastore" with the display name of your own data store.
datastore_id = ds_map["my-cool-datastore"]
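If the display name is misspelled, `ds_map[...]` raises a bare `KeyError` that does not tell you which names exist. A small helper, sketched below under the assumption that `ds_map` is a plain `dict` keyed by display name (as the `reverse=True` call above suggests), reports the available names instead. `get_datastore_id` is a hypothetical name, not a SCRAPI method.

```python
def get_datastore_id(ds_map: dict, display_name: str) -> str:
    """Look up a data store ID by display name, listing valid names on failure."""
    try:
        return ds_map[display_name]
    except KeyError:
        available = ", ".join(sorted(ds_map)) or "<none>"
        raise KeyError(
            f"No data store named {display_name!r}. Available: {available}"
        ) from None
```

Usage: `datastore_id = get_datastore_id(ds_map, "my-cool-datastore")`.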

# Check Data Store Index Status
Use the `check_datastore_index_status` method to check whether the data store has finished indexing.

In [None]:
s.check_datastore_index_status(datastore_id)
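Indexing can take a while after documents are imported. The following is a minimal polling sketch, assuming you supply a zero-argument `is_indexed` callable that returns `True` once indexing is complete (for example, a small wrapper you write around `check_datastore_index_status`, whose exact return value is not shown in this notebook). `wait_until` is a hypothetical helper, not a SCRAPI method.

```python
import time
from typing import Callable


def wait_until(
    is_indexed: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 30.0,
) -> bool:
    """Poll is_indexed() until it returns True or timeout_s elapses.

    Returns True if indexing finished within the timeout, else False.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_indexed():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval_s, remaining))
```

Using `time.monotonic()` rather than wall-clock time keeps the timeout correct even if the system clock changes while waiting.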

# List Documents
List all the documents in a given Data Store.

In [None]:
docs = s.list_documents(datastore_id)
docs[0]
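This notebook relies on the Discovery Engine Python API. For reference, a rough sketch of listing documents directly with that client (`google-cloud-discoveryengine`) and `DocumentServiceClient.list_documents` follows. It assumes `datastore_id` is the short data store ID rather than a full resource name, and that documents live in `default_collection` under `default_branch`. The helper names are hypothetical, and SCRAPI's own implementation may differ.

```python
def branch_path(project_id: str, location: str, datastore_id: str) -> str:
    """Build the parent resource name that list_documents expects."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/collections/default_collection/dataStores/{datastore_id}"
        "/branches/default_branch"
    )


def list_documents_raw(project_id: str, datastore_id: str, location: str = "global"):
    """Return all Document messages in the data store's default branch."""
    # Imported lazily; requires `pip install google-cloud-discoveryengine`.
    from google.api_core.client_options import ClientOptions
    from google.cloud import discoveryengine_v1 as discoveryengine

    # Non-global data stores are served from a regional endpoint.
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )
    client = discoveryengine.DocumentServiceClient(client_options=client_options)
    parent = branch_path(project_id, location, datastore_id)
    # The pager fetches additional pages transparently while iterating.
    return list(client.list_documents(parent=parent))
```

Usage: `raw_docs = list_documents_raw(PROJECT_ID, datastore_id)`.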

# List Indexed URLs
Extract the indexed URLs from the `docs` returned in the previous step.

In [None]:
urls = s.list_indexed_urls(datastore_id, docs)
urls[0]

Display the full list of URLs.

In [None]:
urls

Write the URLs to a file in JSON format.

In [None]:
import json

with open('my_list.json', 'w') as file:
    json.dump(urls, file)
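To confirm the file was written correctly, you can read it back with `json.load` and compare it to the in-memory list. The sketch below uses stand-in data and a temporary directory so it runs on its own; in the notebook you would pass `urls` and `'my_list.json'` instead. It assumes `urls` is a list of strings (or other JSON-serializable values), and `save_urls` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def save_urls(urls: list, path: Path) -> list:
    """Write urls to path as pretty-printed JSON and return what was read back."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(urls, f, indent=2)
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Stand-in data; replace with the `urls` list from the cells above.
sample_urls = ["https://example.com/a", "https://example.com/b"]
with tempfile.TemporaryDirectory() as tmp:
    round_trip = save_urls(sample_urls, Path(tmp) / "my_list.json")
assert round_trip == sample_urls
```

`indent=2` only affects readability of the file; the loaded list is identical either way.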

# Search Indexed URLs
Search the list of indexed URLs for a given term (here, `'tundra-250'`).

In [None]:
s.search_url(urls, 'tundra-250')
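This notebook does not show how `search_url` matches its input, so the following is only a hypothetical stand-in that assumes a case-insensitive substring match. It can be handy for filtering the exported URL list yourself; `filter_urls` is not a SCRAPI function.

```python
from typing import List


def filter_urls(urls: List[str], term: str) -> List[str]:
    """Return the URLs that contain term, ignoring case."""
    needle = term.lower()
    return [url for url in urls if needle in url.lower()]
```

Usage: `filter_urls(urls, "tundra-250")`.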

# Search Data Store by Doc ID
Search through all documents in a given Data Store to find a specific document ID.

In [None]:
document_id = 'a71d802406f2f0e546b621245e1cbc6a'

s.search_doc_id(document_id=document_id, docs=docs)
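Conceptually, searching an already-fetched list by ID is a linear scan. Below is a minimal sketch, assuming each document exposes an `id` attribute (as Discovery Engine `Document` messages do). The `Doc` dataclass and `find_doc` are hypothetical stand-ins, not SCRAPI's implementation.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Discovery Engine Document."""

    id: str
    uri: str = ""


def find_doc(docs: Iterable[Any], document_id: str) -> Optional[Any]:
    """Return the first document whose id equals document_id, else None."""
    return next((d for d in docs if d.id == document_id), None)
```

If you need to look up many IDs, building `{d.id: d for d in docs}` once turns each later lookup into a single dictionary access instead of a full scan.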

Alternatively, list the documents and search for the document ID in a single call by passing `datastore_id` instead of `docs`.

In [None]:
document_id = 'a71d802406f2f0e546b621245e1cbc6a'

s.search_doc_id(document_id=document_id, datastore_id=datastore_id)