# Ingest Website to Graph DB

## **Part 5** - Ontology Refinement using Vector Search

1. Add Vectorised Indices to GraphDB (Neo4j).

2. Use Semantic Proximity Searches to identify Relationships and Nodes which are candidates for:

   1. Amalgamation - Collapsing/Simplification

   2. Elimination - Not Relevant to Desired KM Use Case
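
As a rough illustration of the two candidate types above, the sketch below scores hand-made vectors with cosine similarity. The labels, vectors, topic vector and both thresholds are hypothetical stand-ins chosen for the example; in the notebook the vectors would come from the embedding model and the thresholds would need tuning against the real graph.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (3 dimensions for readability).
label_vectors = {
    "Vertex AI": [0.9, 0.1, 0.0],
    "VertexAI": [0.88, 0.12, 0.01],
    "Cookie Policy": [0.0, 0.1, 0.95],
}
# Hypothetical stand-in for the embedding of the corpus topic.
topic_vector = [1.0, 0.0, 0.0]

MERGE_THRESHOLD = 0.95      # assumed: near-duplicates above this are merge candidates
RELEVANCE_THRESHOLD = 0.30  # assumed: labels below this are elimination candidates

# Amalgamation: pairs of labels that are almost identical in embedding space.
merge_candidates = [
    (a, b)
    for a, b in combinations(label_vectors, 2)
    if cosine_similarity(label_vectors[a], label_vectors[b]) >= MERGE_THRESHOLD
]

# Elimination: labels that are far from the topic of interest.
drop_candidates = [
    label
    for label, vec in label_vectors.items()
    if cosine_similarity(vec, topic_vector) < RELEVANCE_THRESHOLD
]

print(f"merge_candidates: {merge_candidates}")
print(f"drop_candidates: {drop_candidates}")
```

With these toy values the two "Vertex AI" spellings are flagged for amalgamation and "Cookie Policy" for elimination.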

This is a GCP reworking of the OpenAI example in:

https://python.langchain.com/v0.1/docs/integrations/vectorstores/neo4jvector/

... and a langchain-vertexai reworking of:

https://python.langchain.com/v0.1/docs/integrations/text_embedding/google_generative_ai/

This is because we currently can't get access to Gemini API keys at BJSS.

This notebook is **langchain-google-genai independent** for the avoidance of doubt.

There are no copied class files in this notebook. 

The Resultant Graph can be used within Node and Relationship Filters on the Full Corpus of Interest - in this case **Generative AI**. 

##### **Minimal install for Vertex AI**

This solved the instability problem by *NOT* installing OpenAI classes via the community install. 

In [None]:
pip install -U langchain langchain-google-vertexai neo4j

**Check Version Nos of what was installed**

In [None]:
!pip show langchain langchain-core langchain-google-vertexai langchain-experimental langchain-community neo4j google-cloud-aiplatform

**Check Jupyter Version No**

In [4]:
!jupyter --version

Selected Jupyter core packages...
IPython          : 8.21.0
ipykernel        : 6.29.4
ipywidgets       : 8.1.2
jupyter_client   : 7.4.9
jupyter_core     : 5.7.2
jupyter_server   : 1.24.0
jupyterlab       : 3.4.8
nbclient         : 0.10.0
nbconvert        : 7.16.4
nbformat         : 5.10.4
notebook         : 6.5.7
qtconsole        : not installed
traitlets        : 5.14.3


**Check Python Version/Path** - *Expect 3.10.14*

In [5]:
import sys
import platform
print(sys.version)
print(platform.python_version())
print(sys.path)

3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
3.10.14
['/opt/conda/lib/python310.zip', '/opt/conda/lib/python3.10', '/opt/conda/lib/python3.10/lib-dynload', '', '/opt/conda/lib/python3.10/site-packages']


**Now for the Imports**

This time we are isolating Vertex AI.

In [6]:
import os
from langchain.globals import set_debug
from langchain_google_vertexai import VertexAIEmbeddings
from langchain_community.graphs import Neo4jGraph
from langchain_community.vectorstores import Neo4jVector

##### **Connect to Google LLMs**

This API KEY approach works with **langchain-google-vertexai** but **_not_** with **langchain-google-genai**

*Least Privilege Security.*

The Notebook is "owned" by a bespoke Service Account created in Terraform for this purpose.

Minimal permissions are added (also via Terraform) via predefined roles (esp. Vertex) as required.

Adding a role is typically triggered by a PERMISSION DENIED error.

In [16]:
# Set It - will require regeneration
os.environ['GOOGLE_API_KEY'] = '3545f2eca2d32f3e27d031774fda2fee227c593f'
# Access the environment variable later in your code
env_api_key = os.environ['GOOGLE_API_KEY']
print(f"env_api_key: {env_api_key}")
PROJECT_ID = "nlp-dev-6aae"
test_embedding = "hello, world!"
search_string = "Vertex AI"

env_api_key: 3545f2eca2d32f3e27d031774fda2fee227c593f


**Enable LangChain Debugging**

See: https://python.langchain.com/v0.1/docs/guides/development/debugging/

In [17]:
# Debugging is enabled here; set to False to disable
set_debug(True)

##### **Create The Embeddings**

This proves basic connectivity & functionality of the GCP Embedding Model for GenAI.

Sourced from here: https://python.langchain.com/v0.1/docs/integrations/platforms/google/

In [18]:
embeddingGCP = VertexAIEmbeddings(
    model_name="textembedding-gecko@latest", project=PROJECT_ID
)

query_result = embeddingGCP.embed_query(test_embedding)

print(f"test_embedding: {test_embedding}")
print(f"query_result: {query_result}")

test_embedding: hello, world!
query_result: [0.052017416805028915, -0.030953068286180496, -0.030846256762742996, -0.028158482164144516, 0.01781940646469593, -0.0019130000146105886, 0.028597984462976456, -0.007565246894955635, 0.010808120481669903, -0.0057900105603039265, 0.03907504677772522, 0.05087621137499809, -0.00807026494294405, -0.06057383120059967, -0.006879169028252363, -0.02224457450211048, 0.013218574225902557, -0.008559225127100945, -0.000701079610735178, -0.0029124850407242775, -0.003639709437265992, 0.009413229301571846, -0.02782364934682846, -0.030522421002388, 0.021218476817011833, 0.011880539357662201, -0.0013187489239498973, -0.07345182448625565, 0.012441609054803848, 0.05887635052204132, -0.03551314026117325, 0.017118927091360092, -0.05440368875861168, 0.006286651361733675, 0.03878151252865791, -0.05733191594481468, 0.03970646485686302, 0.009752064943313599, -0.0015157802263274789, -0.0001953284372575581, 0.02433612570166588, -0.09208427369594574, -0.04463260993361473

#### **Neo4j Connectivity**

Requires signing up for the free version.

The DB will be stopped if it has not been used recently and must be resumed first, otherwise the connection will fail.

In [19]:
os.environ["NEO4J_URI"] = "neo4j+s://a657168d.databases.neo4j.io"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "VM3A9Mz6usNT99nLs_lqQssfVK8JxeD81DnEiXlDkZU"

graph = Neo4jGraph()
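
Because a stopped database only shows up as a failure later on, it can help to check connectivity explicitly first. The following is a minimal sketch, assuming the `neo4j` Python driver installed above and the three `NEO4J_*` environment variables set in the previous cell; the helper names `check_neo4j_connectivity` and `main` are made up for this example.

```python
import os


def check_neo4j_connectivity(driver):
    """Return (True, None) if the server answers, else (False, error text)."""
    try:
        driver.verify_connectivity()
        return True, None
    except Exception as exc:  # broad on purpose: any failure means "not reachable"
        return False, f"{type(exc).__name__}: {exc}"


def main():
    # Imported here so the helper above can be exercised without the driver.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(
        os.environ["NEO4J_URI"],
        auth=(os.environ["NEO4J_USERNAME"], os.environ["NEO4J_PASSWORD"]),
    )
    try:
        ok, error = check_neo4j_connectivity(driver)
        print("Neo4j reachable" if ok else f"Neo4j not reachable: {error}")
    finally:
        driver.close()
```

Calling `main()` in a cell before building the vector store should make a stopped instance obvious straight away.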

#### Working with vectorstore

Above, we created a vectorstore from scratch. However, often times we want to work with an existing vectorstore. In order to do that, we can initialize it directly.

Extract from: https://python.langchain.com/v0.1/docs/integrations/vectorstores/neo4jvector/

Apparently Vectorised GraphDB Indices and Vector Stores are synonymous?

In [26]:
existing_graph = Neo4jVector.from_existing_graph(
    embedding=embeddingGCP,
    url=os.environ["NEO4J_URI"],
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
    index_name="vertex_index",
    node_label="Vertex AI",
    text_node_properties=["id"],
    embedding_node_property="embedding",
)

**Verify DB supports Vectors**

In [27]:
existing_graph.verify_version()

**Verify Index Created**

In [28]:
existing_index = existing_graph.retrieve_existing_index() 
print(f"existing_index: {existing_index}")

existing_index: (768, 'NODE')
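
The tuple printed above appears to be the index's embedding dimension and entity type. If so, a cheap sanity check is that the embedding model produces vectors of the same length as the index expects. This is a sketch under that assumption; `embedding_matches_index` is a hypothetical helper and the zero vector is only a stand-in for a real `embed_query()` result.

```python
def embedding_matches_index(index_info, vector):
    """Compare a vector's length with the dimension reported for the index.

    index_info is assumed to be a (dimension, entity_type) tuple,
    in the shape printed above.
    """
    dimension, _entity_type = index_info
    return dimension is not None and len(vector) == dimension


# Stand-in values; in the notebook these would be existing_index and query_result.
index_info = (768, "NODE")
fake_vector = [0.0] * 768
print(f"dimensions match: {embedding_matches_index(index_info, fake_vector)}")
```

In the notebook itself the call would be `embedding_matches_index(existing_index, query_result)`.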


**Perform a Search** using the Vector Search Index.

In [29]:
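# The original search on search_string is commented out. The call below searches
# for the literal "Any", so the "query" printed afterwards is not the text searched.
##result = existing_graph.similarity_search(search_string)
result = existing_graph.similarity_search("Any", k=1)
print(f"query: {search_string}")
print(f"result: {result}")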

query: Vertex AI
result: []


In [30]:
# As above, the search text is the literal "Any", not search_string.
docs_with_score = existing_graph.similarity_search_with_score("Any")
print(f"query: {search_string}")
print(f"result: {docs_with_score}")

query: Vertex AI
result: []


#### LangChain Dox, Debug, Diagnostics

The Neo4j wrapper classes are stored in:
1. libs\community\langchain_community\vectorstores\neo4j_vector.py - Neo4jVector class
2. libs\core\langchain_core\vectorstores.py - VectorStore interface

In theory, therefore, it may be possible to:
1. exclude langchain community 
2. copy the contents of neo4j_vector.py into a jupyter cell
3. use the jupyter debugger on said code

There does not appear to be any chaining happening in this class.

#### Neo4j Dox, Debug, Diagnostics

Neo4j on Vector Indices: https://neo4j.com/docs/cypher-manual/current/indexes/semantic-indexes/vector-indexes/

Neo4j Bolt Driver Source Code and Dox Link: https://github.com/neo4j/neo4j-python-driver/blob/5.0/README.rst

Bolt Driver API: https://neo4j.com/docs/api/python-driver/current/index.html 
1. Logging Enabling
2. Direct Query Execution
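
For the logging item, one possible approach uses only the standard `logging` module, assuming the driver emits its messages under the `neo4j` logger name as its API docs describe. `enable_neo4j_debug_logging` is a name invented for this sketch.

```python
import logging
import sys


def enable_neo4j_debug_logging(stream=sys.stderr):
    """Send DEBUG-level messages from the 'neo4j' logger tree to a stream."""
    logger = logging.getLogger("neo4j")
    logger.setLevel(logging.DEBUG)

    handler = logging.StreamHandler(stream)
    handler.setLevel(logging.DEBUG)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    # Returned so the caller can remove it later with logger.removeHandler().
    return handler
```

Calling `enable_neo4j_debug_logging()` once, before the driver or `Neo4jVector` is created, should be enough; the output can be verbose.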

#### Putting it all together

* Enable Neo4j Driver Debug Logs
* Run Direct Queries using the Neo4j Driver, based on https://neo4j.com/docs/cypher-manual/current/indexes/semantic-indexes/vector-indexes/
     1. Check for the existence of Vector Indices
     2. Run Vector Queries & Check Results
* Then check what is happening within the Neo4jVector class when doing the same
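
The two direct-query steps could be sketched as below. This assumes a Neo4j 5.x server with vector index support and a driver version that provides `execute_query`. The function names are hypothetical, and returning `node.id` assumes nodes carry the `id` property used as `text_node_properties` earlier.

```python
SHOW_VECTOR_INDEXES = (
    "SHOW INDEXES YIELD name, type, labelsOrTypes, properties "
    "WHERE type = 'VECTOR'"
)

QUERY_VECTOR_INDEX = (
    "CALL db.index.vector.queryNodes($index_name, $k, $embedding) "
    "YIELD node, score "
    "RETURN node.id AS id, score"
)


def list_vector_indexes(driver):
    """Return (name, labels, properties) for every vector index in the DB."""
    records, _summary, _keys = driver.execute_query(SHOW_VECTOR_INDEXES)
    return [(r["name"], r["labelsOrTypes"], r["properties"]) for r in records]


def query_vector_index(driver, index_name, embedding, k=5):
    """Return (id, score) for the k nearest nodes to the given embedding."""
    records, _summary, _keys = driver.execute_query(
        QUERY_VECTOR_INDEX,
        parameters_={"index_name": index_name, "k": k, "embedding": embedding},
    )
    return [(r["id"], r["score"]) for r in records]
```

In the notebook, `list_vector_indexes(driver)` would be expected to include `vertex_index`, and `query_vector_index(driver, "vertex_index", embeddingGCP.embed_query(search_string))` would run the same kind of search as `similarity_search` without the LangChain wrapper in between. If the direct query also returns nothing, the problem is more likely in the index or the data than in the wrapper.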
