# Llama-Index Text-To-SQL Retrieval Agent
### Thoughts:
- Performance is too inconsistent.
- Readily makes up facts when the query returns no results.
- Doesn't really grasp the full structure and meaning of the data.
- The underlying functionality is difficult to modify, particularly the prompt template for the text-to-SQL step that runs before response synthesis.

In [None]:
import os
from dotenv import load_dotenv
from IPython.display import Markdown, display
import pandas as pd

from llama_index.core import SQLDatabase
from llama_index.llms.openai import OpenAI
from llama_index.core.indices.struct_store.sql_query import (
    SQLTableRetrieverQueryEngine,
)
from llama_index.core.objects import (
    SQLTableNodeMapping,
    ObjectIndex,
    SQLTableSchema,
)
from llama_index.core import VectorStoreIndex, PromptTemplate

from src.db import models
from src.db.database import engine, session_scope


load_dotenv()


OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")


with session_scope() as session:
    res = session.query(models.Firms.sector).distinct().all()
    res = [r[0] for r in res]

sector = res[4]

llm = OpenAI(temperature=0.1, model="gpt-4o-mini", api_key=OPENAI_API_KEY)

sql_database = SQLDatabase(engine)

table_node_mapping = SQLTableNodeMapping(sql_database)

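# Build one schema object per ORM model, using each model's __context_str__
# as the description the retriever indexes.
table_schema_objs = [
    SQLTableSchema(table_name=table.__tablename__, context_str=table.__context_str__)
    for table in models.__dict__.values()
    if hasattr(table, "__tablename__")
]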

obj_index = ObjectIndex.from_objects(
    table_schema_objs,
    table_node_mapping,
    VectorStoreIndex,
)

# Adjacent string literals are concatenated; a backslash continuation inside the
# literal would embed the next line's indentation in the prompt.
response_synthesis_prompt_str = (
    "Given an input question, synthesize a response from the query results. "
    "You must ensure your response is completely factual.\n"
    "<query>{query_str}</query>\n"
    "<sql>{sql_query}</sql>\n"
    "<sql response>SQL Response: {context_str}</sql response>\n"
    "Response: "
)
response_synthesis_prompt = PromptTemplate(
    response_synthesis_prompt_str,
)

query_engine = SQLTableRetrieverQueryEngine(
    sql_database,
    obj_index.as_retriever(similarity_top_k=1),
    response_synthesis_prompt=response_synthesis_prompt,
)

# query = "What are the fields in the meetings table and what do they represent contextually?"
# query = "Using just your provided system messaging and without using SQL, \
#     What are the fields in the meetings table and what do they represent contextually?"
# query = "What is the name of the firm that has the most meetings and how many meets do they have?"
# query = "Can you show me the first 5 rows of meetings?"
query = f"Fetch the first 5 meetings and their content which have a firm attended that are in the {sector} sector."
response = query_engine.query(query)

print("SQL Query:")
print("```\n" + response.metadata["sql_query"] + "\n```")
print("Response:")
display(Markdown(f"<b>{response}</b>"))
if "result" in response.metadata:
    display(
        pd.DataFrame(response.metadata["result"], columns=response.metadata["col_keys"])
    )

SQL Query:
```
SELECT m.title, m.content
FROM meetings m
JOIN firms f ON m.firm_attended_id = f.firm_id
WHERE f.sector = 'Electronic Equipment & Instruments'
ORDER BY m.date
LIMIT 5;
```
Response:


<b>The first 5 meetings with content that have a firm attended in the Electronic Equipment & Instruments sector are as follows:
1. Email with Trimble Inc. regarding Follow-Up on Potential Opportunities
2. Email with Trimble Inc. regarding Follow-Up on Potential Collaborations
3. Call with Teledyne Technologies discussing Hologic's performance metrics and growth potential
4. Call with Teledyne Technologies discussing Cognizant's growth in digital services and potential synergies with Essex Property Trust
5. Email with Trimble Inc. discussing Potential Investment Opportunities.</b>

|    | title | content |
| --- | --- | --- |
| 0 | Email with Trimble Inc. | Subject: Follow-Up on Potential Opportunities\... |
| 1 | Email with Trimble Inc. | Subject: Follow-Up on Potential Collaborations... |
| 2 | Call with Teledyne Technologies | - Discussed Hologic's recent performance metri... |
| 3 | Call with Teledyne Technologies | - Discussed Cognizant's recent growth in digit... |
| 4 | Email with Trimble Inc. | Subject: Discussion on Potential Investment Op... |


# Custom Implementation
- Originally I tried a full text-to-SQL agent feeding into a responder agent, but the text-to-SQL agent was highly unstable whenever a query needed association tables.
- The design has therefore changed from a fully text-to-SQL process to a more structured workflow:
    1. The user sends a prompt to the agent.
    2. The agent rewrites the prompt as a natural-language instruction for the text-to-SQL agent. The text-to-SQL agent only writes the SQL that finds the meeting_ids of the meetings relevant to the instruction. Programmatic querying then builds the full output by joining the relevant tables.
    3. The returned table is converted to markdown and passed back to the original agent, which generates the response to the user.
- This is more stable than before, but some issues remain: when a query asks for a list of meetings and also asks which of them the user did not attend, it returns only the meetings they attended or only those they did not, never both.
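
The workflow above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in `src/rag`: it uses an in-memory SQLite database, a hypothetical `meeting_employees` association table, a stubbed `fake_text_to_sql` function in place of the LLM call, and `LIKE` instead of `ILIKE` because SQLite has no `ILIKE`. It also shows one possible way around the attendance issue: the text-to-SQL step returns every relevant meeting_id, and the fixed join derives an `attended` flag for the user.

```python
import sqlite3

# Hypothetical miniature schema and data; the real models live in src/db/models.py.
SCHEMA = """
CREATE TABLE firms (firm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE meetings (
    meeting_id INTEGER PRIMARY KEY, title TEXT, date TEXT, firm_attended_id INTEGER
);
CREATE TABLE meeting_employees (meeting_id INTEGER, employee_id INTEGER);
INSERT INTO firms VALUES (1, 'Marathon Petroleum'), (2, 'Trimble Inc.');
INSERT INTO meetings VALUES
    (10, 'Call with Marathon Petroleum', '2025-06-08', 1),
    (11, 'Email with Marathon Petroleum', '2025-12-22', 1),
    (12, 'Email with Trimble Inc.', '2025-05-01', 2);
INSERT INTO meeting_employees VALUES (10, 7), (11, 8), (12, 7);
"""


def fake_text_to_sql(instruction: str) -> str:
    """Stand-in for the LLM step: it is only allowed to select meeting_ids."""
    return (
        "SELECT meetings.meeting_id FROM meetings "
        "JOIN firms ON meetings.firm_attended_id = firms.firm_id "
        "WHERE firms.name LIKE '%Marathon Petroleum%' "
        "ORDER BY meetings.date DESC LIMIT 10"
    )


def fetch_meetings(conn, meeting_ids, employee_id):
    """Fixed, hand-written join; `attended` is computed here, not by the LLM."""
    if not meeting_ids:
        return []
    placeholders = ", ".join("?" for _ in meeting_ids)
    sql = f"""
        SELECT m.meeting_id, m.date, m.title, f.name,
               EXISTS (
                   SELECT 1 FROM meeting_employees me
                   WHERE me.meeting_id = m.meeting_id AND me.employee_id = ?
               ) AS attended
        FROM meetings m
        JOIN firms f ON m.firm_attended_id = f.firm_id
        WHERE m.meeting_id IN ({placeholders})
        ORDER BY m.date DESC
    """
    return conn.execute(sql, [employee_id, *meeting_ids]).fetchall()


def to_markdown(headers, rows):
    """Render rows as a markdown table for the responder prompt."""
    lines = ["| " + " | ".join(headers) + " |", "|" + " --- |" * len(headers)]
    lines += ["| " + " | ".join(str(v) for v in row) + " |" for row in rows]
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Step 2: the (stubbed) text-to-SQL agent returns ids only.
ids = [row[0] for row in conn.execute(fake_text_to_sql("last 10 Marathon Petroleum meetings"))]
# Programmatic join completes the output, including attendance for employee 7.
rows = fetch_meetings(conn, ids, employee_id=7)
# Step 3: the table is handed to the responder as markdown.
table_md = to_markdown(["meeting_id", "date", "title", "firm", "attended"], rows)
print(table_md)
```

Keeping the attendance flag in the hand-written join means the LLM never has to choose between "attended" and "not attended" filters, so both kinds of meeting can appear in the same table.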

In [1]:
import os
from dotenv import load_dotenv
from textwrap import dedent
from datetime import datetime, timedelta
from IPython.display import Markdown, display

import pandas as pd
from llama_index.llms.openai import OpenAI
from llama_index.core import PromptTemplate
from sqlalchemy import func

from src.db.database import session_scope
from src.db import models
from src.rag.sql_responder import MeetingsSQLQnAAgent
from src.rag.sql_retriever import MeetingsSQLRetrieverAgent


load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

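# Treat the most recent meeting date in the database as the current date.
with session_scope() as session:
    res = session.query(func.max(models.Meetings.date)).one_or_none()

current_date = res[0].strftime("%Y-%m-%d")

# Start of the 30-day window used to pick a representative user below.
current_date_dt = datetime.strptime(current_date, "%Y-%m-%d")
minus_30_days = current_date_dt - timedelta(days=30)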

with session_scope() as session:
    res = (
        session.query(models.Meetings)
        .filter(
            models.Meetings.date >= minus_30_days.strftime("%Y-%m-%d"),
            models.Meetings.date <= current_date,
        )
        .all()
    )
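    # Access the relationship while the session is open so it is loaded for use below.
    for meeting in res:
        meeting.employees


# Flatten to one employee_id per attendance; the most frequent attendee becomes the user.
res = [str(employee.employee_id) for meeting in res for employee in meeting.employees]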
user_id = pd.Series(res).value_counts().sort_values(ascending=False).index[0]

with session_scope() as session:
    res = (
        session.query(models.Employees.name)
        .filter(models.Employees.employee_id == user_id)
        .first()
    )
user_name = res[0]

In [4]:
query_template = PromptTemplate(
    dedent(
        """
        **User Query:**\n
        {query}\n\n

        **Key Information:**\n
        - User's Employee ID: {employee_id}\n
        - User's Name: {user_name}\n
        - Current Date: {current_date}\n
        """
    )
)
query = query_template.format(
    # query="Give me a summary of meetings I have attended in the last month.",
    # query="Write a report on the last five meetings we have had with Marathon Petroleum.",
    # query="Write a report on the last five months of meetings we have had with Marathon Petroleum.",
    # query="Write a report on the last five months of meetings we have had with Marathon Petroleum or where they were discussed.",
    # query="Summarise all of our interactions with Marathon Petroleum.",
    # query="Summarise all of our interactions with Marathon Petroleum, including discussion with other firms where Marathon Petroleum were discussed.",
    # query="Create a summary of our relationship with Marathon Petroleum",
    query="What were the follow-up actions from the last 10 meetings with Marathon Petroleum?",
    # query="What were all the meetings in the last 2 months and which ones did I not attend?",
    # query="Which meetings have been tagged as interesting?", # NOTE: BOGUS QUERY
    employee_id=user_id,
    user_name=user_name,
    current_date=current_date,
)

qna_agent = MeetingsSQLQnAAgent(
    llm=OpenAI(temperature=0.1, model="gpt-4o-mini", api_key=OPENAI_API_KEY),
    agent=MeetingsSQLRetrieverAgent(
        llm=OpenAI(temperature=0.1, model="gpt-4o-mini", api_key=OPENAI_API_KEY),
        schema_file_path="src/db/models.py",
        verbose=True,
    ),
    verbose=True,
)

response = qna_agent.complete(query)
print("RESPONSE:")
display(Markdown(f"<b>{response.response}</b>"))
print(response.beam_ids)
print(response.response_clipped)

AI QUERY:
Retrieve the last 10 meetings with Marathon Petroleum and their follow-up actions.
CHAIN OF THOUGHTS:
Thoughts: I need to retrieve the last 10 meetings that involved Marathon Petroleum. This means I will need to join the meetings table with the firms table to filter by the firm's name. I will also need to order the results by the meeting date to get the most recent meetings.
Outcome: Join the meetings table with the firms table using the firm_attended_id and firm_id, filter by the firm's name using ILIKE, and order by the meeting date in descending order, limiting the results to 10. 

SQL QUERY:
```
SELECT meetings.meeting_id 
FROM meetings 
JOIN firms ON meetings.firm_attended_id = firms.firm_id 
WHERE firms.name ILIKE '%Marathon Petroleum%' 
ORDER BY meetings.date DESC 
LIMIT 10;
```
RETURNED DATA:
**The database returned 10 records.**

|    | meeting_id                           | date of interaction   | beam_id                              | title                         

<b>### Follow-Up Actions from Recent Meetings with Marathon Petroleum

I found **10 records** related to your query about follow-up actions from meetings with Marathon Petroleum. Here are the key follow-up actions discussed in the most recent meetings:

1. **Email with Marathon Petroleum (December 22, 2025)**  
   - Follow up on potential collaborations with Ameriprise Financial, NXP Semiconductors, and Cognizant.  
   - Request insights from Kathleen on Ameriprise's strategic initiatives.  
   - Seek Robert's perspective on potential synergies with NXP Semiconductors.  
   - Discuss initiating dialogue with Cognizant regarding digital transformation expertise.  
   <ref>8928f960-c378-4c41-b3f7-86b11de6da57</ref>

2. **Call with Marathon Petroleum (June 8, 2025)**  
   - Follow up with detailed financial analyses.  
   - Schedule a follow-up meeting to discuss potential acquisitions and strategic alignment.  
   <ref>6ed6a2a5-561d-44a6-8539-dc334bad542b</ref>

3. **Email with Marathon Petroleum (May 6, 2025)**  
   - Discuss potential partnerships with Valero Energy Corporation, Phillips 66, and HollyFrontier Corporation.  
   - Gather insights on these firms' current market positions.  
   <ref>61ca730c-5873-4a9b-85ff-557ba9db9c0f</ref>

4. **Email with Marathon Petroleum (April 10, 2025)**  
   - Schedule a call to explore collaboration opportunities with Union Pacific Corporation.  
   <ref>e3c46529-b96d-427e-839f-43a42ad5fd56</ref>

5. **Call with Marathon Petroleum (October 26, 2024)**  
   - Follow up on specific targets for potential acquisitions.  
   - Compile a list of potential firms for the next discussion.  
   <ref>173b391e-d579-451d-aaab-fbd8ea1ac3f6</ref>

If you need more details or have further questions, feel free to ask!</b>

['d73ecc0e-f6ba-4166-b7d6-39c1fd5d5bd5', '61ca730c-5873-4a9b-85ff-557ba9db9c0f', '6ed6a2a5-561d-44a6-8539-dc334bad542b', '629e2b39-ba8c-4e56-b0c8-e0b232c73548', 'b81098b7-0838-4d02-aa85-334a303a59de', '173b391e-d579-451d-aaab-fbd8ea1ac3f6', '8928f960-c378-4c41-b3f7-86b11de6da57', 'de8edb79-9fc0-4bb8-a66c-64a938d706ff', 'e3c46529-b96d-427e-839f-43a42ad5fd56', 'a17be9fa-504a-42e7-ab9a-06df2c640c1e']
False
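
Since the first approach readily made up facts, it may be worth checking the `<ref>` tags in a response against the ids that were actually retrieved. The sketch below assumes the tags are meant to carry values from `response.beam_ids`; `check_refs` and the sample strings are hypothetical helpers for illustration, not part of `MeetingsSQLQnAAgent`.

```python
import re

# Matches a UUID wrapped in the <ref>...</ref> tags seen in the response above.
REF_PATTERN = re.compile(r"<ref>\s*([0-9a-fA-F-]{36})\s*</ref>")


def check_refs(response_text, beam_ids):
    """Return (cited, unknown, uncited) ids for a response and its retrieved ids."""
    cited = REF_PATTERN.findall(response_text)
    cited_set, retrieved = set(cited), set(beam_ids)
    # Cited but never retrieved: a possible fabrication.
    unknown = [ref for ref in cited if ref not in retrieved]
    # Retrieved but never cited: records the response did not use.
    uncited = [bid for bid in beam_ids if bid not in cited_set]
    return cited, unknown, uncited


# Two ids taken from the output above, plus one made-up id to show a failure.
sample_beam_ids = [
    "61ca730c-5873-4a9b-85ff-557ba9db9c0f",
    "d73ecc0e-f6ba-4166-b7d6-39c1fd5d5bd5",
]
sample_response = (
    "Follow up A <ref>61ca730c-5873-4a9b-85ff-557ba9db9c0f</ref>\n"
    "Follow up B <ref>00000000-0000-0000-0000-000000000000</ref>"
)
cited, unknown, uncited = check_refs(sample_response, sample_beam_ids)
print("unknown refs:", unknown)
print("uncited ids:", uncited)
```

In the run above, all five cited refs appear in the printed id list, while the other five retrieved ids are not cited: the response covers only 5 of the 10 returned records.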
