Commit c1df8b2

add basic RAG with static data
In this change I hard-code the commit message related to the query and use it to initialise a vector store with SQLite as the backend. The vector store is then used to build a retriever that augments the query context. Finally, the prompt and query logic are extracted into functions, built into a graph and executed.

With this we now get output that is more reasonable:

"""
The commit in question is I63208c7bd5f9f4c3d5e4a40bd0f6253d0f042a37 from the OpenStack Nova repository (<https://opendev.org/openstack/nova>).

**Commit Message:** The commit message is repeated multiple times in the provided context, suggesting that this change involves resolving an issue where each foreign key points to a table that may not be the source table. The solution is to check if the target table is indeed the source and skip if it's not. This resolution allows for enabling errors on SAWarning warnings without any filters.

**Author:** Stephen Finucane <sfinucan@redhat.com>

**Change-Id:** I63208c7bd5f9f4c3d5e4a40bd0f6253d0f042a37

**Conflicts:** The conflicts are noted in nova/objects/cell_mapping.py and nova/tests/fixtures/nova.py. According to the NOTE(melwitt), these conflicts arise because:
1. Change Ia5304c552ce552ae3c5223a2bfb3a9cd543ec57c (db: Post reshuffle cleanup) is not in Wallaby.
2. The WarningsFixture is in nova/tests/fixtures.py in Wallaby, and change I07be8e16381592fc177eeecd4f0d7f2db93eb09d (tests: Enable SADeprecationWarning warnings) is also not in Wallaby.

**Cherry-picked from:** This commit was cherry-picked from three other commits: 8142b9d, ce2cc54, and 5b1d487.

**Resolution & Impact:** The commit aims to resolve an issue related to foreign key relationships in the database schema. By ensuring that each foreign key points to its correct source table, this change should improve data integrity and consistency within Nova's database. Additionally, enabling errors on SAWarning warnings without filters could help in early detection of potential issues or misconfigurations during runtime.

**Additional Information:** The repeated commit message and the NOTE(melwitt) suggest that this commit might be part of a larger effort to address database schema-related issues or improve testing robustness within Nova. To get more specific details, you may want to review the actual code changes or related bug reports/discussions in the OpenStack Nova project.
"""

Not perfect, but it is actually relevant to the change.
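
The flow described in the commit message (index a hard-coded document, retrieve the closest chunks for the question, then build the prompt from them) can be sketched without any of the LangChain pieces. The following is a minimal, dependency-free illustration and not the code from this commit: the bag-of-words `embed` function, the `ToyStore` class and the `build_prompt` helper are all hypothetical stand-ins for the embedding model, the SQLite-backed vector store and the `generate` step.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lower-cased word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyStore:
    """Hypothetical in-memory stand-in for the vector store built in main()."""

    def __init__(self, texts: list[str]) -> None:
        # Embed every text once, at indexing time.
        self._rows = [(text, embed(text)) for text in texts]

    def similarity_search(self, query: str, k: int = 2) -> list[str]:
        """Return the k stored texts most similar to the query."""
        query_vec = embed(query)
        ranked = sorted(
            self._rows, key=lambda row: cosine(query_vec, row[1]), reverse=True
        )
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble an augmented prompt in the same shape as generate()."""
    joined = "\n\n".join(context)
    return f"Question: {question}\nContext: {joined}\nAnswer:"


store = ToyStore(
    [
        "db: Resolve additional SAWarning warnings in the archive logic",
        "docs: Fix a typo in the contributor guide",
        "tests: Enable SADeprecationWarning warnings",
    ]
)
question = "which commit resolved SAWarning warnings in the archive logic?"
top = store.similarity_search(question, k=1)
prompt = build_prompt(question, top)
```

In the real change the retrieved chunks come from the same single commit message, which is presumably why the model's answer remarks that the message is "repeated multiple times" in its context.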
1 parent c9a36d9 commit c1df8b2

6 files changed

Lines changed: 152 additions & 16 deletions

.evn.sample

Lines changed: 1 addition & 0 deletions
@@ -1,2 +1,3 @@
 OLLAMA_HOST = "http://localhost:11434"
 OLLAMA_MODEL = "granite3.2:8b"
+EMBEDDING_MODEL = "all-minilm:l6-v2"
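
The sample file now lists three settings. Note that the check in `main()` in this commit tests only `host` and `model`, although its error message names all three. A hedged sketch of validating every setting in one place follows; `REQUIRED_SETTINGS` and `missing_settings` are hypothetical names and not part of this commit.

```python
from collections.abc import Mapping

# Names taken from the sample env file; the tuple itself is an assumption.
REQUIRED_SETTINGS = ("OLLAMA_HOST", "OLLAMA_MODEL", "EMBEDDING_MODEL")


def missing_settings(env: Mapping[str, str]) -> list[str]:
    """Return the names of required settings that are unset or empty."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


# Example with a plain dict; in main() this would be os.environ.
example_env = {
    "OLLAMA_HOST": "http://localhost:11434",
    "OLLAMA_MODEL": "granite3.2:8b",
}
missing = missing_settings(example_env)
if missing:
    print(f"{', '.join(missing)} must be set in the environment")
```

Reporting only the missing names keeps the message short and avoids echoing values that are already set.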

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -53,3 +53,6 @@ tools/ooo-tcib/output/
 # Per-project virtualenvs
 .venv*/
 .conda*/
+
+# working state dir
+state

setup.cfg

Lines changed: 4 additions & 0 deletions
@@ -52,9 +52,13 @@ python_requires = >=3.12
 # For more information, check out https://semver.org/.
 install_requires =
     langchain # mit
+    langchain-community
     langchain-ollama
+    langchain_text_splitters
+    langgraph
     ollama
     python-dotenv
+    sqlite-vec
 
 [options.packages.find]
 where = src

src/ca_bhfuil/main.py

Lines changed: 144 additions & 16 deletions
@@ -1,22 +1,40 @@
 # -*- coding: utf-8 -*-
 import os
 import sys
+import typing as ty
 
 import dotenv
 import langchain_ollama as lo
+import langchain_text_splitters as lts
 
+from langchain_community import vectorstores
+from langchain_core import documents
+from langgraph import graph
 
-def main():
-    dotenv.load_dotenv()
-    host = os.environ.get("OLLAMA_HOST")
-    model = os.environ.get("OLLAMA_MODEL")
 
-    if not all((host, model)):
-        sys.exit(
-            f"OLLAMA_HOST: {host} and OLLAMA_MODEL: {model} must be set in the environment"
-        )
+# Define state for application
+class State(ty.TypedDict):
+    question: str
+    context: ty.List[documents.Document]
+    answer: str
+    db: vectorstores.VectorStore
+    llm: lo.ChatOllama
 
-    llm = lo.ChatOllama(model=model, base_url=host, temperature=0)
+
+# Define application steps
+def retrieve(state: State) -> dict:
+    retrieved_docs = state["db"].similarity_search(state["question"])
+    return {"context": retrieved_docs}
+
+
+def generate(state: State) -> dict:
+    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
+    # Define prompt for question-answering
+    prompt = f"""
+Question: {state["question"]}
+Context: {docs_content}
+Answer:
+    """
     messages = [
         (
             "system",
@@ -31,14 +49,124 @@ def main():
             "When you provide a summary you alway explain your reasoning and cite your sources"
             "providing links to supporting data when possible.",
         ),
-        (
-            "human",
-            "can you help me find a commit in https://opendev.org/openstack/nova and tell me about it?"
-            "change-id: I63208c7bd5f9f4c3d5e4a40bd0f6253d0f042a37",
-        ),
+        ("human", prompt),
     ]
-    ai_msg = llm.invoke(messages)
-    print(ai_msg.content)
+    response = state["llm"].invoke(messages)
+    return {"answer": response.content}
+
+
+def main():
+    dotenv.load_dotenv()
+    host = os.environ.get("OLLAMA_HOST")
+    model = os.environ.get("OLLAMA_MODEL")
+    embed_model = os.environ.get("EMBEDDING_MODEL")
+
+    if not all((host, model)):
+        sys.exit(
+            f"OLLAMA_HOST: {host}, OLLAMA_MODEL: {model} and "
+            f"EMBEDDING_MODEL: {embed_model} must be set in the environment"
+        )
+
+    llm = lo.ChatOllama(model=model, base_url=host, temperature=0)
+    question = (
+        "can you help me find a commit in https://opendev.org/openstack/nova and tell me about it?"
+        "change-id: I63208c7bd5f9f4c3d5e4a40bd0f6253d0f042a37"
+    )
+    commit_message = """
+db: Resolve additional SAWarning warnings
+Resolving the following SAWarning warnings:
+
+Coercing Subquery object into a select() for use in IN(); please pass
+a select() construct explicitly
+
+SELECT statement has a cartesian product between FROM element(s)
+"foo" and FROM element "bar". Apply join condition(s) between each
+element to resolve.
+
+While the first of these was a trivial fix, the second one is a little
+more involved. It was caused by attempting to build a query across
+tables that had no relationship as part of our archive logic. For
+example, consider the following queries, generated early in
+'_get_fk_stmts':
+
+SELECT instances.uuid
+FROM instances, security_group_instance_association
+WHERE security_group_instance_association.instance_uuid = instances.uuid
+AND instances.id IN (__[POSTCOMPILE_id_1])
+
+SELECT security_groups.id
+FROM security_groups, security_group_instance_association, instances
+WHERE security_group_instance_association.security_group_id = security_groups.id
+AND instances.id IN (__[POSTCOMPILE_id_1])
+
+While the first of these is fine, the second is clearly wrong: why are
+we filtering on a field that is of no relevance to our join? These were
+generated because we were attempting to archive one or more instances
+(in this case, the instance with id=1) and needed to find related tables
+to archive at the same time. A related table is any table that
+references our "source" table - 'instances' here - by way of a foreign
+key. For each of *these* tables, we then lookup each foreign key and
+join back to the source table, filtering by matching entries in the
+source table. The issue here is that we're looking up every foreign key.
+What we actually want to do is lookup only the foreign keys that point
+back to our source table. This flaw is why we were generating the second
+SELECT above: the 'security_group_instance_association' has two foreign
+keys, one pointing to our 'instances' table but also another pointing to
+the 'security_groups' table. We want the first but not the second.
+
+Resolve this by checking if the table that each foreign key points to is
+actually the source table and simply skip if not. With this issue
+resolved, we can enable errors on SAWarning warnings in general without
+any filters.
+
+Conflicts:
+nova/objects/cell_mapping.py
+nova/tests/fixtures/nova.py
+
+NOTE(melwitt): The conflicts are because
+
+* Change Ia5304c552ce552ae3c5223a2bfb3a9cd543ec57c (db: Post
+reshuffle cleanup) is not in Wallaby
+
+* The WarningsFixture is in nova/tests/fixtures.py in Wallaby and
+change I07be8e16381592fc177eeecd4f0d7f2db93eb09d (tests: Enable
+SADeprecationWarning warnings) is also not in Wallaby
+
+Change-Id: I63208c7bd5f9f4c3d5e4a40bd0f6253d0f042a37
+Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
+(cherry picked from commit 8142b9d)
+(cherry picked from commit ce2cc54)
+(cherry picked from commit 5b1d487)
+    """
+    metadata = {
+        "source": "https://github.com/openstack/nova/commit/5d0325a78b2a5f7a9109290fb23d46a841042da6"
+    }
+    cm_doc = documents.Document(page_content=commit_message, metadata=metadata)
+    text_splitter = lts.RecursiveCharacterTextSplitter(
+        chunk_size=1000, chunk_overlap=200
+    )
+    docs = text_splitter.split_documents([cm_doc])
+    texts = [doc.page_content for doc in docs]
+
+    embedding_function = lo.OllamaEmbeddings(model=embed_model)
+
+    # load it in sqlite-vss in a table named state_union.
+    # the db_file parameter is the name of the file you want
+    # as your sqlite database.
+    db = vectorstores.SQLiteVec.from_texts(
+        texts=texts,
+        embedding=embedding_function,
+        table="state_union",
+        db_file="./state/db/vec.db",
+    )
+
+    # Compile application and test
+    graph_builder = graph.StateGraph(State).add_sequence([retrieve, generate])
+    graph_builder.add_edge(graph.START, "retrieve")
+    app = graph_builder.compile()
+
+    response = app.invoke({"question": question, "llm": llm, "db": db})
+    print(response["answer"])
 
 
 if __name__ == "__main__":
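
The splitter in `main()` is configured with `chunk_size=1000` and `chunk_overlap=200`. To illustrate what those two numbers mean, here is a simplified fixed-window sketch. It is only a stand-in under stated assumptions: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line boundaries, so its chunk boundaries will generally differ from what this hypothetical `sliding_chunks` helper produces.

```python
def sliding_chunks(
    text: str, chunk_size: int = 1000, chunk_overlap: int = 200
) -> list[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# 2500 characters of sample data: the digits 0-9 repeated.
sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_chunks(sample)
print([len(chunk) for chunk in chunks])
```

The overlap means the tail of one chunk is repeated at the head of the next, so a sentence that straddles a boundary is still seen whole by at least one embedding.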

state/db/.keep

Whitespace-only changes.

state/repos/.keep

Whitespace-only changes.
