# Summary of this Notebook
- The first section implements and tests Llama-Index's Text-To-SQL Retrieval Agent.
- The second section is a custom implementation of the same functionality.

# Llama-Index Text-To-SQL Retrieval Agent
### Thoughts:
- Performance is too inconsistent.
- It readily makes up facts when the query returns no results.
- It does not grasp the full structure and meaning of the data.
- The underlying functionality is difficult to modify, particularly the prompt template for the text-to-SQL step that runs before response synthesis.
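
One way to reduce the fabricated answers noted above is to check the SQL result before it reaches response synthesis. The following is a minimal sketch and not part of Llama-Index: `answer_from_rows`, `NO_RESULTS_MESSAGE`, and the `synthesize` callable are hypothetical names, with `synthesize` standing in for whatever LLM call produces the final response.

```python
from typing import Callable, Sequence

# Hypothetical fixed reply used whenever the SQL query returns nothing.
NO_RESULTS_MESSAGE = "No matching rows were found for this query."


def answer_from_rows(
    rows: Sequence[tuple],
    synthesize: Callable[[Sequence[tuple]], str],
) -> str:
    """Return a fixed message for empty results; otherwise defer to the LLM.

    `synthesize` is a hypothetical stand-in for the response-synthesis call.
    """
    if not rows:
        # Skip the LLM entirely so it has nothing to embellish.
        return NO_RESULTS_MESSAGE
    return synthesize(rows)


# Usage with a stub in place of a real LLM call.
def fake_synthesize(rows: Sequence[tuple]) -> str:
    return f"Found {len(rows)} row(s)."


print(answer_from_rows([], fake_synthesize))
print(answer_from_rows([("Email with Trimble Inc.",)], fake_synthesize))
```

The guard only covers the empty-result case; it does not stop the model from misreading rows that are returned.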

In [None]:
import os
from dotenv import load_dotenv
from IPython.display import Markdown, display
import pandas as pd

from llama_index.core import SQLDatabase
from llama_index.llms.openai import OpenAI
from llama_index.core.indices.struct_store.sql_query import (
    SQLTableRetrieverQueryEngine,
)
from llama_index.core.objects import (
    SQLTableNodeMapping,
    ObjectIndex,
    SQLTableSchema,
)
from llama_index.core import VectorStoreIndex, PromptTemplate

from src.db.database import engine
from src.db import models
from src.db.database import session_scope


load_dotenv()


OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")


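# Collect the distinct sectors so one can be used in the example query below.
with session_scope() as session:
    res = session.query(models.Firms.sector).distinct().all()
    sectors = [r[0] for r in res]

sector = sectors[4]  # arbitrary pick; assumes at least five distinct sectors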

llm = OpenAI(temperature=0.1, model="gpt-4o-mini", api_key=OPENAI_API_KEY)

sql_database = SQLDatabase(engine)

table_node_mapping = SQLTableNodeMapping(sql_database)

table_schema_objs = [
    (SQLTableSchema(table_name=table.__tablename__, context_str=table.__context_str__))
    for table in models.__dict__.values()
    if hasattr(table, "__tablename__")
]

obj_index = ObjectIndex.from_objects(
    table_schema_objs,
    table_node_mapping,
    VectorStoreIndex,
)

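response_synthesis_prompt_str = (
    # Adjacent string literals avoid the stray indentation that a backslash
    # continuation inside the string would embed in the prompt.
    "Given an input question, synthesize a response from the query results. "
    "You must ensure your response is completely factual.\n"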
    "<query>{query_str}</query>\n"
    "<sql>{sql_query}</sql>\n"
    "<sql response>SQL Response: {context_str}</sql response>\n"
    "Response: "
)
response_synthesis_prompt = PromptTemplate(
    response_synthesis_prompt_str,
)

query_engine = SQLTableRetrieverQueryEngine(
    sql_database,
    obj_index.as_retriever(similarity_top_k=1),
    llm=llm,  # use the model configured above rather than the default LLM
    response_synthesis_prompt=response_synthesis_prompt,
)

# query = "What are the fields in the meetings table and what do they represent contextually?"
# query = "Using just your provided system messaging and without using SQL, \
#     What are the fields in the meetings table and what do they represent contextually?"
# query = "What is the name of the firm that has the most meetings and how many meetings do they have?"
# query = "Can you show me the first 5 rows of meetings?"
query = "Fetch the first 5 meetings and their content which have a firm attended that are in the {sector} sector.".format(
    sector=sector
)
response = query_engine.query(query)

print("SQL Query:")
print("```\n" + response.metadata["sql_query"] + "\n```")
print("Response:")
display(Markdown(f"<b>{response}</b>"))
if "result" in response.metadata:
    display(
        pd.DataFrame(response.metadata["result"], columns=response.metadata["col_keys"])
    )

SQL Query:
```
SELECT m.title, m.content
FROM meetings m
JOIN firms f ON m.firm_attended_id = f.firm_id
WHERE f.sector = 'Electronic Equipment & Instruments'
ORDER BY m.date
LIMIT 5;
```
Response:


<b>The first 5 meetings with content that have a firm attended in the Electronic Equipment & Instruments sector are as follows:
1. Email with Trimble Inc. regarding Follow-Up on Potential Opportunities
2. Email with Trimble Inc. regarding Follow-Up on Potential Collaborations
3. Call with Teledyne Technologies discussing Hologic's performance metrics and growth potential
4. Call with Teledyne Technologies discussing Cognizant's growth in digital services and potential synergies with Essex Property Trust
5. Email with Trimble Inc. discussing Potential Investment Opportunities.</b>

|  | title | content |
|---|---|---|
| 0 | Email with Trimble Inc. | Subject: Follow-Up on Potential Opportunities\... |
| 1 | Email with Trimble Inc. | Subject: Follow-Up on Potential Collaborations... |
| 2 | Call with Teledyne Technologies | - Discussed Hologic's recent performance metri... |
| 3 | Call with Teledyne Technologies | - Discussed Cognizant's recent growth in digit... |
| 4 | Email with Trimble Inc. | Subject: Discussion on Potential Investment Op... |


# Custom Implementation
- Originally I tried chaining a full text-to-SQL agent into a responder agent. However, the text-to-SQL agent was highly unstable when it had to write queries that used association tables.
- The design has therefore changed from a pure text-to-SQL process to a more structured workflow:
    1. The user sends a prompt to the agent.
    2. The agent rewrites the prompt as a natural-language instruction for the text-to-SQL agent. The text-to-SQL agent only writes the SQL that finds the meeting_ids of the meetings relevant to the instruction. Programmatic querying then builds the full output by joining the relevant tables.
    3. The returned table is converted to markdown and passed back to the original agent, which generates the response to the user.
- This is more stable than before, but one issue remains: when a query asks for a list of meetings and also asks which of them the user did not attend, the result contains only the meetings the user attended or only those they did not, never both.
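
The workflow above can be sketched end to end with an in-memory SQLite database. This is an illustrative sketch only, not the code in `src/rag`: the schema is a simplified assumption (the `meeting_employees` table and all column names are hypothetical), and `fake_text_to_sql` stands in for the LLM-backed text-to-SQL agent. It also shows one possible way to handle the attendance issue, by computing an `attended` flag in the programmatic join instead of asking the generated SQL to do it.

```python
import sqlite3

# Hypothetical, simplified schema; the real one lives in src/db/models.py.
SCHEMA = """
CREATE TABLE meetings (meeting_id TEXT PRIMARY KEY, title TEXT, date TEXT);
CREATE TABLE employees (employee_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE meeting_employees (meeting_id TEXT, employee_id TEXT);
"""


def fake_text_to_sql(instruction: str) -> str:
    """Stand-in for the text-to-SQL agent: it may only select meeting_id."""
    return "SELECT meeting_id FROM meetings WHERE date >= '2025-11-01'"


def fetch_meetings(conn, meeting_ids, user_id):
    """Programmatic step: fetch the meetings and flag the user's attendance."""
    if not meeting_ids:
        return []
    placeholders = ",".join("?" for _ in meeting_ids)
    sql = f"""
        SELECT m.meeting_id, m.title, m.date,
               EXISTS (
                   SELECT 1 FROM meeting_employees me
                   WHERE me.meeting_id = m.meeting_id AND me.employee_id = ?
               ) AS attended
        FROM meetings m
        WHERE m.meeting_id IN ({placeholders})
        ORDER BY m.date
    """
    # Parameter order follows the order of the "?" markers in the SQL text.
    return conn.execute(sql, [user_id, *meeting_ids]).fetchall()


def to_markdown(rows, headers):
    """Render rows as a markdown table for the responder agent's prompt."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    lines += ["| " + " | ".join(str(v) for v in row) + " |" for row in rows]
    return "\n".join(lines)


# Toy data: three meetings, one of which the user attended.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO meetings VALUES (?, ?, ?)",
    [
        ("m1", "Call with A", "2025-10-15"),
        ("m2", "Call with B", "2025-11-20"),
        ("m3", "Email with C", "2025-12-01"),
    ],
)
conn.execute("INSERT INTO employees VALUES ('e1', 'Sam')")
conn.execute("INSERT INTO meeting_employees VALUES ('m2', 'e1')")

# Step 2: the "agent" returns SQL that yields meeting_ids only.
ids = [r[0] for r in conn.execute(fake_text_to_sql("meetings since November"))]
# Step 2 (continued): programmatic join, then step 3: markdown for the responder.
rows = fetch_meetings(conn, ids, user_id="e1")
table = to_markdown(rows, ["meeting_id", "title", "date", "attended"])
print(table)
```

Because the flag is computed for every retrieved meeting, attended and non-attended meetings appear in the same table, which is the case the generated SQL currently fails to return.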

In [1]:
import os
from dotenv import load_dotenv
from textwrap import dedent
from datetime import datetime, timedelta
from IPython.display import Markdown, display

import pandas as pd
from llama_index.llms.openai import OpenAI
from llama_index.core import PromptTemplate
from sqlalchemy import func

from src.db.database import session_scope
from src.db import models
from src.rag.sql_responder import MeetingsSQLQnAAgent
from src.rag.sql_retriever import MeetingsSQLRetrieverAgent


load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

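# Treat the most recent meeting date in the database as the current date.
with session_scope() as session:
    res = (session.query(func.max(models.Meetings.date))).one_or_none()

current_date = res[0].strftime("%Y-%m-%d")

# Lower bound of the 30-day window used to pick an example user.
current_date_dt = datetime.strptime(current_date, "%Y-%m-%d")
minus_30_days = current_date_dt - timedelta(days=30)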

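# Load the last 30 days of meetings and collect attendee IDs while the
# session is still open, so the lazy-loaded `employees` relationship resolves.
with session_scope() as session:
    recent_meetings = (
        session.query(models.Meetings)
        .filter(
            models.Meetings.date >= minus_30_days.strftime("%Y-%m-%d"),
            models.Meetings.date <= current_date,
        )
        .all()
    )
    attendee_ids = [
        str(employee.employee_id)
        for meeting in recent_meetings
        for employee in meeting.employees
    ]

# Use the most frequent attendee as the example user
# (value_counts already sorts by count, descending).
user_id = pd.Series(attendee_ids).value_counts().index[0]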

with session_scope() as session:
    res = (
        session.query(models.Employees.name)
        .filter(models.Employees.employee_id == user_id)
        .first()
    )
user_name = res[0]

In [8]:
query_template = PromptTemplate(
    dedent(
        """
        **User Query:**\n
        {query}\n\n

        **Key Information:**\n
        - User's Employee ID: {employee_id}\n
        - User's Name: {user_name}\n
        - Current Date: {current_date}\n
        """
    )
)
query = query_template.format(
    # query="Give me a summary of meetings I have attended in the last month.",
    # query="Write a report on the last five meetings we have had with Marathon Petroleum.",
    # query="Write a report on the last five months of meetings we have had with Marathon Petroleum.",
    # query="Write a report on the last five months of meetings we have had with Marathon Petroleum or where they were discussed.",
    # query="Summarise all of our interactions with Marathon Petroleum.",
    # query="Summarise all of our interactions with Marathon Petroleum, including discussion with other firms where Marathon Petroleum were discussed.",
    # query="Create a summary of our relationship with Marathon Petroleum",
    # query="What were the follow-up actions from the last 10 meetings with Marathon Petroleum?",
    query="In the last 10 months, who are the companies we have mentioned merger and acquisition initiatives?",
    # query="What were all the meetings in the last 2 months and which ones did I not attend?",
    # query="Which meetings have been tagged as interesting?", # NOTE: BOGUS QUERY
    employee_id=user_id,
    user_name=user_name,
    current_date=current_date,
)

qna_agent = MeetingsSQLQnAAgent(
    llm=OpenAI(temperature=0.1, model="gpt-4o-mini", api_key=OPENAI_API_KEY),
    agent=MeetingsSQLRetrieverAgent(
        llm=OpenAI(temperature=0.1, model="gpt-4o-mini", api_key=OPENAI_API_KEY),
        schema_file_path="src/db/models.py",
        verbose=True,
    ),
    verbose=True,
)

response = qna_agent.complete(query)
display(Markdown("# RESPONSE:"))
display(Markdown(response.response))
print(response.beam_ids)
print(response.response_clipped)

AI QUERY:
Retrieve all companies mentioned in merger and acquisition initiatives in the last 10 months.
CHAIN OF THOUGHTS:
Thoughts: To retrieve all meetings related to merger and acquisition initiatives, I need to focus on the firms discussed during those meetings. The relevant information is stored in the 'meetings' table and the 'meeting_firms' association table. I will filter the meetings based on the date to ensure they are within the last 10 months.
Outcome: I will join the 'meetings' table with the 'meeting_firms' association table and filter the results based on the date. 

Thoughts: I need to calculate the date range for the last 10 months from the current date. This will be used to filter the meetings based on their date.
Outcome: The date filter will be set to meetings.date >= (current_date - interval '10 months'). 

Thoughts: I will also need to ensure that I am only selecting the meeting_id from the meetings table as per the requirements.
Outcome: The final query will sele

# RESPONSE:

### Summary of Companies Mentioned in Merger and Acquisition Initiatives

In the last 10 months, the following companies have been discussed in relation to merger and acquisition initiatives:

1. **Ameriprise Financial**
   - Discussed in a meeting with GE Vernova on December 26, 2025. Potential synergies and M&A history were highlighted. <ref>49bc2578-849a-439e-b534-f2d4878f9789</ref>

2. **F5, Inc.**
   - Mentioned in a meeting with NVR, Inc. on December 21, 2025. NVR expressed interest in potential acquisition opportunities. <ref>88bb960f-304e-4286-836a-6176b9075c03</ref>

3. **ServiceNow**
   - Discussed in a meeting with PNC Financial Services on December 21, 2025. PNC was interested in potential acquisition targets. <ref>8947f5f5-5b98-4a08-a083-1dbe08c7939b</ref>

4. **News Corp (Class B)**
   - Mentioned in an email on December 20, 2025, regarding potential investment opportunities. <ref>0c8210ea-9407-4ddc-8623-2d18cb138827</ref>

5. **Kellanova**
   - Discussed in a meeting with Host Hotels & Resorts on December 16, 2025, regarding potential growth opportunities. <ref>2610d94b-0c5d-43c7-a158-5016353e638b</ref>

6. **Cognizant**
   - Mentioned in an email on December 13, 2025, regarding potential collaboration opportunities. <ref>5514476e-09e6-4153-8b8f-60569da5cb0f</ref>

7. **NXP Semiconductors**
   - Discussed in an email on December 1, 2025, regarding potential collaboration opportunities. <ref>f3744b51-50ff-403c-a5cd-e673f3fd5af8</ref>

8. **Ventas**
   - Mentioned in a meeting with Kellanova on November 28, 2025, regarding potential acquisition synergies. <ref>16f99d7b-714a-4f6d-9bbc-f05acafab5e8</ref>

### Conclusion

A total of **8 companies** have been mentioned in recent discussions related to merger and acquisition initiatives. If you need more detailed information about any specific company or meeting, please let me know!

['379e469b-b7c3-4203-9734-d6fbd2d4c0ec', 'eb0365b2-616d-4607-a87d-767277373892', 'f4489bd7-2090-4a1d-9a7f-bd323e4494bb', '88827faf-715f-4a04-8193-a0be3091a014', '5bd63145-0499-4130-afe1-15ae7f856388', '0b056295-ff25-4a8e-a281-003bd2859c6a', '231b0728-bc47-4695-9550-7b8476b66539', 'f64ef649-55ed-44ff-ad18-17b61ff2df7f', '03000108-aad6-4c87-90e4-2056464d705e', '63e1aa1b-b275-4c49-bd73-a99577021bf0', '306d515c-783e-4477-a59c-274df131c2e3', '21031f2c-c018-4356-975f-ed3151f65b3c', '5aa317cd-226c-4cca-9140-210f7906c943', '8d1b9cb0-041a-4de6-b5af-9d4d323e8ade', 'faceaa29-7c55-4c52-a2cc-ee78351769ea', '88192bc1-305e-4de3-859c-121d91eadb2f', '610b4c9f-8893-439e-9b53-5cbc3b3ee98a', 'ec57b877-4cdd-4b0e-8278-83b44196d3e3', '9258927a-9c4b-4210-942e-0fa88a50f975', '403e7ae0-6b9e-49dd-8c93-bbbc58cc8778', '3bf8fb45-5ebd-44b1-8944-64518811f203', '6b4dbe11-506d-4a1d-9655-0bdf47f0cdd6', '9f3042a5-8f36-4c27-9c54-b8c116f70534', '550e6e8f-7be7-4ede-be8c-93355c1d2f3a', 'f5bb0633-4fa8-4c5d-b843-9a39c825e7ef',