# Llama-Index Text-To-SQL Retrieval Agent
### Thoughts:
- Performance is too inconsistent.
- Easily makes up facts when the query returns no results.
- Doesn't really grasp the full structure and meaning of the data.
- The underlying functionality is difficult to modify, particularly the prompt template for the text-to-SQL step that runs before response synthesis.
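
One way to limit fabricated answers on empty results is to check the raw rows before trusting the synthesized text. This is a minimal sketch, assuming the response metadata carries the rows under a `result` key (the key the cell below reads). `guarded_answer` and `NO_RESULTS_MESSAGE` are hypothetical names for illustration and not part of LlamaIndex.

```python
from typing import Any, Mapping

# Hypothetical fixed reply used instead of the LLM text when no rows came back.
NO_RESULTS_MESSAGE = "The query returned no rows, so there is nothing to summarise."


def guarded_answer(metadata: Mapping[str, Any], synthesized: str) -> str:
    """Return the synthesized text only when the SQL result has at least one row."""
    rows = metadata.get("result")
    if not rows:
        # A missing key or an empty list both mean there is nothing to ground the answer on.
        return NO_RESULTS_MESSAGE
    return synthesized


# Plain dicts stand in for response.metadata so the sketch runs without an LLM.
print(guarded_answer({"sql_query": "SELECT ...", "result": []}, "Made-up answer"))
print(guarded_answer({"result": [("Email with Trimble Inc.",)]}, "One meeting found."))
```

In the notebook this would be called as `guarded_answer(response.metadata, str(response))`. It does not stop the model from embellishing non-empty results; it only covers the empty case.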

In [None]:
import os
from dotenv import load_dotenv
from IPython.display import Markdown, display
import pandas as pd

from llama_index.core import SQLDatabase
from llama_index.llms.openai import OpenAI
from llama_index.core.indices.struct_store.sql_query import (
    SQLTableRetrieverQueryEngine,
)
from llama_index.core.objects import (
    SQLTableNodeMapping,
    ObjectIndex,
    SQLTableSchema,
)
from llama_index.core import VectorStoreIndex, PromptTemplate

from src.db.database import engine
from src.db import models
from src.db.database import session_scope


load_dotenv()


OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")


with session_scope() as session:
    res = session.query(models.Firms.sector).distinct().all()
    res = [r[0] for r in res]

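# Pick one sector for the demo query. The index is arbitrary and assumes
# at least five distinct sectors exist; the query above has no ORDER BY.
sector = res[4]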

llm = OpenAI(temperature=0.1, model="gpt-4o-mini", api_key=OPENAI_API_KEY)

sql_database = SQLDatabase(engine)

table_node_mapping = SQLTableNodeMapping(sql_database)

table_schema_objs = [
    (SQLTableSchema(table_name=table.__tablename__, context_str=table.__context_str__))
    for table in models.__dict__.values()
    if hasattr(table, "__tablename__")
]

obj_index = ObjectIndex.from_objects(
    table_schema_objs,
    table_node_mapping,
    VectorStoreIndex,
)

response_synthesis_prompt_str = (
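    # Adjacent literals avoid the backslash continuation, which embedded the
    # next line's leading spaces in the prompt.
    "Given an input question, synthesize a response from the query results. "
    "You must ensure your response is completely factual.\n"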
    "<query>{query_str}</query>\n"
    "<sql>{sql_query}</sql>\n"
    "<sql response>SQL Response: {context_str}</sql response>\n"
    "Response: "
)
response_synthesis_prompt = PromptTemplate(
    response_synthesis_prompt_str,
)

query_engine = SQLTableRetrieverQueryEngine(
    sql_database,
    obj_index.as_retriever(similarity_top_k=1),
    response_synthesis_prompt=response_synthesis_prompt,
)

# query = "What are the fields in the meetings table and what do they represent contextually?"
# query = "Using just your provided system messaging and without using SQL, \
#     What are the fields in the meetings table and what do they represent contextually?"
# query = "What is the name of the firm that has the most meetings and how many meets do they have?"
# query = "Can you show me the first 5 rows of meetings?"
query = "Fetch the first 5 meetings and their content which have a firm attended that are in the {sector} sector.".format(
    sector=sector
)
response = query_engine.query(query)

print("SQL Query:")
print("```\n" + response.metadata["sql_query"] + "\n```")
print("Response:")
display(Markdown(f"<b>{response}</b>"))
if "result" in response.metadata:
    display(
        pd.DataFrame(response.metadata["result"], columns=response.metadata["col_keys"])
    )

SQL Query:
```
SELECT m.title, m.content
FROM meetings m
JOIN firms f ON m.firm_attended_id = f.firm_id
WHERE f.sector = 'Electronic Equipment & Instruments'
ORDER BY m.date
LIMIT 5;
```
Response:


<b>The first 5 meetings with content that have a firm attended in the Electronic Equipment & Instruments sector are as follows:
1. Email with Trimble Inc. regarding Follow-Up on Potential Opportunities
2. Email with Trimble Inc. regarding Follow-Up on Potential Collaborations
3. Call with Teledyne Technologies discussing Hologic's performance metrics and growth potential
4. Call with Teledyne Technologies discussing Cognizant's growth in digital services and potential synergies with Essex Property Trust
5. Email with Trimble Inc. discussing Potential Investment Opportunities.</b>

| | title | content |
|---|---|---|
| 0 | Email with Trimble Inc. | Subject: Follow-Up on Potential Opportunities\... |
| 1 | Email with Trimble Inc. | Subject: Follow-Up on Potential Collaborations... |
| 2 | Call with Teledyne Technologies | - Discussed Hologic's recent performance metri... |
| 3 | Call with Teledyne Technologies | - Discussed Cognizant's recent growth in digit... |
| 4 | Email with Trimble Inc. | Subject: Discussion on Potential Investment Op... |


# Custom Implementation
- Originally I tried chaining a full text-to-SQL agent into a responder agent; however, the text-to-SQL agent was highly unstable on queries that required association tables.
- The design has therefore changed from a pure text-to-SQL process to a more structured workflow:
    1. The user sends a prompt to the agent.
    2. The agent rewrites the prompt as a natural-language instruction for the text-to-SQL agent. The text-to-SQL agent only writes the SQL that finds the meeting_ids of the meetings relevant to the instruction. Programmatic querying then completes the full output by joining the relevant tables.
    3. The returned table is converted to markdown and passed back to the original agent, which generates the response to the user.
- This is more stable than before, but there are still issues when a query asks for a list of meetings and also asks which of them the user did not attend: the result contains only the meetings they attended or only the ones they did not, never both.
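
The three workflow steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `src/rag`: it uses an in-memory SQLite database, the LLM call is replaced by a hard-coded stub, and all function names plus the `firms.name` column are hypothetical. The table names `meetings`, `meeting_firms` and `firms` follow the SQL shown in the outputs of this notebook.

```python
import sqlite3


def write_meeting_id_sql(instruction: str) -> str:
    # Hypothetical stand-in for the text-to-SQL agent. In the real workflow an
    # LLM writes this statement; it is hard-coded here so the sketch runs offline.
    return "SELECT meeting_id FROM meetings WHERE date >= '2025-12-01'"


def fetch_meeting_ids(conn: sqlite3.Connection, sql: str) -> list:
    # Step 2a: run the generated SQL, which only ever returns meeting_ids.
    return [row[0] for row in conn.execute(sql)]


def fetch_full_rows(conn: sqlite3.Connection, meeting_ids: list) -> list:
    # Step 2b: the joins are written by hand, so the LLM never touches
    # the association table.
    if not meeting_ids:
        return []
    placeholders = ",".join("?" for _ in meeting_ids)
    sql = f"""
        SELECT m.meeting_id, m.date, m.title, f.name
        FROM meetings m
        JOIN meeting_firms mf ON mf.meeting_id = m.meeting_id
        JOIN firms f ON f.firm_id = mf.firm_id
        WHERE m.meeting_id IN ({placeholders})
        ORDER BY m.date DESC, m.meeting_id, f.name
    """
    return conn.execute(sql, meeting_ids).fetchall()


def to_markdown(rows: list, headers: list) -> str:
    # Step 3: render the joined rows as a markdown table for the responder agent.
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    lines += ["| " + " | ".join(str(value) for value in row) + " |" for row in rows]
    return "\n".join(lines)


def build_demo_db() -> sqlite3.Connection:
    # Toy data, invented purely for this sketch.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE meetings (meeting_id TEXT PRIMARY KEY, date TEXT, title TEXT);
        CREATE TABLE firms (firm_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE meeting_firms (meeting_id TEXT, firm_id INTEGER);
        INSERT INTO meetings VALUES ('m1', '2025-11-20', 'Call A'), ('m2', '2025-12-05', 'Email B');
        INSERT INTO firms VALUES (1, 'Firm X'), (2, 'Firm Y');
        INSERT INTO meeting_firms VALUES ('m1', 1), ('m2', 1), ('m2', 2);
        """
    )
    return conn


conn = build_demo_db()
ids = fetch_meeting_ids(conn, write_meeting_id_sql("Find meetings since 1 December 2025."))
rows = fetch_full_rows(conn, ids)
table_md = to_markdown(rows, ["meeting_id", "date", "title", "firm"])
print(table_md)
```

Keeping the generated SQL down to a single `meeting_id` column narrows what the model can get wrong, and the fixed join guarantees the responder always sees the same table shape.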

In [1]:
import os
from dotenv import load_dotenv
from textwrap import dedent
from datetime import datetime, timedelta
from IPython.display import Markdown, display

import pandas as pd
from llama_index.llms.openai import OpenAI
from llama_index.core import PromptTemplate
from sqlalchemy import func

from src.db.database import session_scope
from src.db import models
from src.rag.sql_responder import MeetingsSQLQnAAgent
from src.rag.sql_retriever import MeetingsSQLRetrieverAgent


load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

with session_scope() as session:
    res = (session.query(func.max(models.Meetings.date))).one_or_none()

current_date = res[0].strftime("%Y-%m-%d")

current_date_dt = datetime.strptime(current_date, "%Y-%m-%d")
minus_30_days = current_date_dt - timedelta(days=30)

with session_scope() as session:
    res = (
        session.query(models.Meetings)
        .filter(
            models.Meetings.date >= minus_30_days.strftime("%Y-%m-%d"),
            models.Meetings.date <= current_date,
        )
        .all()
    )
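    # Touch the relationship while the session is open so the employees
    # are loaded for use below.
    for meeting in res:
        meeting.employees


# Flatten to one employee_id per attendance across all meetings.
res = [str(employee.employee_id) for meeting in res for employee in meeting.employees]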
user_id = pd.Series(res).value_counts().sort_values(ascending=False).index[0]

with session_scope() as session:
    res = (
        session.query(models.Employees.name)
        .filter(models.Employees.employee_id == user_id)
        .first()
    )
user_name = res[0]

In [5]:
query_template = PromptTemplate(
    dedent(
        """
        **User Query:**\n
        {query}\n\n

        **Key Information:**\n
        - User's Employee ID: {employee_id}\n
        - User's Name: {user_name}\n
        - Current Date: {current_date}\n
        """
    )
)
query = query_template.format(
    # query="Give me a summary of meetings I have attended in the last month.",
    # query="Write a report on the last five meetings we have had with Marathon Petroleum.",
    # query="Write a report on the last five months of meetings we have had with Marathon Petroleum.",
    # query="Write a report on the last five months of meetings we have had with Marathon Petroleum or where they were discussed.",
    # query="Summarise all of our interactions with Marathon Petroleum.",
    # query="Summarise all of our interactions with Marathon Petroleum, including discussion with other firms where Marathon Petroleum were discussed.",
    # query="Create a summary of our relationship with Marathon Petroleum",
    # query="What were the follow-up actions from the last 10 meetings with Marathon Petroleum?",
    query="In the last 10 months, who are the companies we have mentioned merger and acquisition initiatives?",
    # query="What were all the meetings in the last 2 months and which ones did I not attend?",
    # query="Which meetings have been tagged as interesting?", # NOTE: BOGUS QUERY
    employee_id=user_id,
    user_name=user_name,
    current_date=current_date,
)

qna_agent = MeetingsSQLQnAAgent(
    llm=OpenAI(temperature=0.1, model="gpt-4o-mini", api_key=OPENAI_API_KEY),
    agent=MeetingsSQLRetrieverAgent(
        llm=OpenAI(temperature=0.1, model="gpt-4o-mini", api_key=OPENAI_API_KEY),
        schema_file_path="src/db/models.py",
        verbose=True,
    ),
    verbose=True,
)

response = qna_agent.complete(query)
print("RESPONSE:")
display(Markdown(f"<b>{response.response}</b>"))
print(response.beam_ids)
print(response.response_clipped)

AI QUERY:
Retrieve all companies mentioned in merger and acquisition initiatives in the last 10 months.
CHAIN OF THOUGHTS:
Thoughts: To retrieve all companies mentioned in merger and acquisition initiatives, I need to focus on the meetings that discussed firms related to mergers and acquisitions. I will filter the meetings based on the date to ensure they are within the last 10 months.
Outcome: I will need to join the meetings table with the meetings_firms_discussed_association and firms table to get the relevant firms discussed in those meetings. 

Thoughts: I need to filter the meetings based on the date, which should be within the last 10 months from the current date.
Outcome: I will add a date filter to the SQL query. 

SQL QUERY:
```
SELECT meetings.meeting_id 
FROM meetings 
JOIN meeting_firms AS meeting_firms_1 ON meetings.meeting_id = meeting_firms_1.meeting_id 
JOIN firms ON firms.firm_id = meeting_firms_1.firm_id 
WHERE meetings.date >= NOW() - INTERVAL '10 months';
```
RETUR

<b>### Summary of Recent Merger and Acquisition Initiatives

In the last 10 months, we have discussed several companies regarding merger and acquisition initiatives. Here are the most recent mentions:

1. **GE Vernova**
   - **Date:** December 26, 2025
   - **Meeting ID:** <ref>49bc2578-849a-439e-b534-f2d4878f9789</ref>
   - **Discussion:** Potential synergies with Ameriprise Financial and exploring strategic partnerships in financial services.

2. **Marathon Petroleum**
   - **Date:** December 22, 2025
   - **Meeting ID:** <ref>8928f960-c378-4c41-b3f7-86b11de6da57</ref>
   - **Discussion:** Follow-up on potential collaborations with Ameriprise Financial, NXP Semiconductors, and Cognizant.

3. **NVR, Inc.**
   - **Date:** December 21, 2025
   - **Meeting ID:** <ref>88bb960f-304e-4286-836a-6176b9075c03</ref>
   - **Discussion:** Interested in potential acquisition opportunities.

4. **PNC Financial Services**
   - **Date:** December 21, 2025
   - **Meeting ID:** <ref>8947f5f5-5b98-4a08-a083-1dbe08c7939b</ref>
   - **Discussion:** Interested in ServiceNow's potential acquisition targets.

5. **News Corp (Class B)**
   - **Date:** December 20, 2025
   - **Meeting ID:** <ref>0c8210ea-9407-4ddc-8623-2d18cb138827</ref>
   - **Discussion:** Follow-up on recent discussions regarding potential investment opportunities.

### Total Records Found
A total of **186 records** were found in the database, but the above are the most recent relevant meetings discussing merger and acquisition initiatives. If you need more specific details or further analysis, please let me know!</b>

['fbc2bf45-1815-4351-83d8-5640f18df92a', 'eb0365b2-616d-4607-a87d-767277373892', 'f3744b51-50ff-403c-a5cd-e673f3fd5af8', 'f2201495-22b4-4816-bd6f-8bc8a9b06512', '5e0752e0-649d-4ac2-88ea-b7d324ce233d', 'd9ada0fa-8dfb-4d84-a50e-594d7efc19d7', 'dd25b015-c42c-4da8-9b9e-5911620d8238', '213f4d7a-3c68-4a7e-822a-1ee75692bc05', 'ece800bd-f3bf-4898-9164-02e5c73f5292', '6d9cb6e7-3140-4b7f-bea2-1bd72d1901a0', '22704fc7-e0d6-4bf3-97eb-9f02e39b6bb2', 'd127ce85-75c1-4e71-84ed-c3354af7c1d6', 'f591740b-7f03-48f6-a2f8-6e2d65866cea', 'f951c69c-4c33-4a93-b955-cd1dbd5abb8a', '9c6d3b5c-f5aa-46e4-a55a-3859197f3690', 'b211b93e-2451-4580-be34-155c7b62a900', 'be301756-9ce8-4f95-9f78-916ed3e5ccd7', 'ec57b877-4cdd-4b0e-8278-83b44196d3e3', '98f87e68-d4f1-425b-9768-c3072bb516b0', '403e7ae0-6b9e-49dd-8c93-bbbc58cc8778', '474cda1b-0846-4623-978e-c3c650c47693', 'f8e43400-4909-4c4d-b87e-e4468d696591', '3a4b62b4-d53a-4806-9b5d-fe9e0094284f', '22a77e71-f26b-4d5c-8655-12bfbf376df1', '0125e739-767e-4d0e-a835-436923a6e01c',