**LLM Quant Tutor**

This agent acts as a tutor for the quantitative investment field, answering questions related to quant.
For each question it describes the problem, explains why it suggests a particular approach, and provides example code.
The agent follows a two-step, multi-agent process:

1) Query LLaMA 3.2 locally for the initial draft.
2) Refine the output with GPT-4o-mini.
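
The two steps above can be sketched independently of any particular model. This is a minimal illustration, not the notebook's implementation: `draft_then_refine`, `fake_local_model`, and `fake_refiner` are hypothetical names, and the stand-in models only echo text so the example runs without Ollama or an OpenAI key.

```python
from typing import Callable


def draft_then_refine(
    question: str,
    draft_model: Callable[[str], str],
    refine_model: Callable[[str], str],
) -> str:
    """Two-stage pipeline: one model drafts, a second model refines the draft."""
    draft = draft_model(question)
    refinement_prompt = (
        "Please improve the following draft answer for clarity, "
        "technical correctness, and completeness.\n\n"
        f"Here is the draft:\n{draft}\n"
    )
    return refine_model(refinement_prompt)


# Stand-in models (assumptions for illustration only)
def fake_local_model(prompt: str) -> str:
    return f"DRAFT for: {prompt}"


def fake_refiner(prompt: str) -> str:
    return prompt.upper()


result = draft_then_refine("What is PCA?", fake_local_model, fake_refiner)
print(result)
```

Passing the models in as callables keeps the orchestration logic separate from the API clients, so either stage could be swapped without touching the pipeline.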

The final answer is rendered as formatted Markdown in the terminal using the rich library and is also saved as an HTML file.

In [1]:
# imports (only the modules this notebook actually uses)

import os
from dotenv import load_dotenv
from openai import OpenAI

In [2]:
import html

from rich.console import Console
from rich.markdown import Markdown
from rich.panel import Panel

def stream_html_output(final_answer: str):
    """
    Pretty-prints the final answer as rendered Markdown in the terminal
    and saves a copy as an HTML file.
    """
    console = Console()

    # Render the Markdown-formatted response in the terminal
    console.rule("[bold green]Final Answer from Ensemble Model[/bold green]")
    console.print(Panel.fit(Markdown(final_answer), title="Quant Tutor Output", border_style="blue"))

    # Save to an HTML file; escape the answer so characters such as < and &
    # in the generated code are shown literally instead of parsed as markup
    html_output = f"""
    <html>
    <head><meta charset="utf-8"><title>Quant Answer</title></head>
    <body style="font-family: monospace; padding: 20px; background-color: #f9f9f9;">
        <h2 style="color: #333;">Final Answer from LLaMA + GPT-4o-mini</h2>
        <pre style="background: #eee; padding: 15px; border-radius: 10px;">{html.escape(final_answer)}</pre>
    </body>
    </html>
    """
    with open("quant_answer.html", "w", encoding="utf-8") as f:
        f.write(html_output)

    print("\nAnswer also saved to 'quant_answer.html'. Open it in your browser.\n")
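
Model answers often contain characters such as `<`, `>`, and `&` (for example in comparisons inside code), which a browser treats as markup when they are written raw into an HTML page. The following is a minimal sketch of escaping text before embedding it in a `<pre>` block, using only the standard library. The helper name `to_html_page` is hypothetical and not part of the notebook.

```python
import html


def to_html_page(answer: str, title: str = "Quant Answer") -> str:
    """Wrap plain text in a minimal HTML page, escaping markup characters."""
    safe_answer = html.escape(answer)
    safe_title = html.escape(title)
    return (
        "<html>\n"
        f'<head><meta charset="utf-8"><title>{safe_title}</title></head>\n'
        f"<body><pre>{safe_answer}</pre></body>\n"
        "</html>\n"
    )


page = to_html_page("if spread < 0 and bid & 1: pass")
print(page)
```

The escaped page displays the original text unchanged in a browser, while the file itself stays well-formed HTML.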


In [3]:
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [4]:
def format_quant_prompt_for_llm():
    """
    Prompts the user for a coding question related to quant trading and returns a structured prompt
    that can be fed into an LLM API for completion.
    """
    user_question = input("Enter your quant trading coding question: ")

    structured_prompt = f"""You are a technical tutor for the quantitative investment industry. 
Your job is to help users understand code as it relates to quantitative algorithmic trading and to generate code that will answer their technical question. 

Always answer questions with:
1. A concise explanation of **what the code does and why** it's relevant to quant trading and the user's question.
2. The actual code.

Use clear, technical language with accurate reasoning. Prioritize clarity and correctness over verbosity. Avoid fluff. Use real-world quant concepts (e.g., time series, order books, execution logic, alpha factors, etc.) when relevant.

#### Q: {user_question}
1) What the code does and why:  
2) The actual code:
```python"""

    return structured_prompt
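
Because `format_quant_prompt_for_llm` calls `input()` itself, it can only be used interactively. One possible variation, sketched below, separates prompt construction from input collection so the same template can be reused in scripts or tests. `build_quant_prompt` is a hypothetical helper, and the template here is abbreviated for illustration rather than copied in full.

```python
def build_quant_prompt(user_question: str) -> str:
    """Return a structured tutor prompt for a question (no user interaction)."""
    question = user_question.strip()
    if not question:
        raise ValueError("The question must not be empty.")
    return (
        "You are a technical tutor for the quantitative investment industry.\n\n"
        "Always answer questions with:\n"
        "1. A concise explanation of what the code does and why.\n"
        "2. The actual code.\n\n"
        f"#### Q: {question}\n"
        "1) What the code does and why:\n"
        "2) The actual code:\n"
    )


prompt = build_quant_prompt("  How do I compute rolling volatility?  ")
print(prompt)
```

An interactive wrapper could then simply call `build_quant_prompt(input(...))`.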

In [5]:
from openai import OpenAI
ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

def query_llama3_local(prompt: str) -> str:
    """
    Uses the OpenAI-compatible interface to query Llama 3.2 on a local Ollama server and return the response text.
    """
    response = ollama_via_openai.chat.completions.create(
        model="llama3.2",  # Match your actual local model name
        messages=[
            {"role": "system", "content": "You are a technical tutor for quantitative algorithmic trading."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.3
    )
    return response.choices[0].message.content.strip()

In [6]:
def query_gpt4o_mini(prompt: str) -> str:
    """
    Sends the prompt to GPT-4o-mini (via OpenAI SDK v1) and returns the improved response.
    """
    response = openai.chat.completions.create(
        model=MODEL,  # 'gpt-4o-mini', set in the constants cell
        messages=[
            {"role": "system", "content": "You are a technical tutor for the quantitative trading industry."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.3
    )
    return response.choices[0].message.content.strip()
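
The two query functions differ only in the client, model name, and system prompt, so they could share one helper. The sketch below assumes a client that exposes `chat.completions.create` the way both clients in this notebook do. `chat_once` and the fake client are hypothetical and exist only so the example runs offline.

```python
from types import SimpleNamespace


def chat_once(client, model: str, system_prompt: str, user_prompt: str,
              temperature: float = 0.3) -> str:
    """Send one system + user exchange and return the stripped reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        temperature=temperature,
    )
    # Guard against an empty reply before stripping whitespace
    return (response.choices[0].message.content or "").strip()


# Offline stand-in that mimics the response shape (assumption for illustration)
def _fake_create(model, messages, temperature):
    reply = SimpleNamespace(content=f"  [{model}] {messages[-1]['content']}  ")
    return SimpleNamespace(choices=[SimpleNamespace(message=reply)])


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)

answer = chat_once(fake_client, "llama3.2", "You are a tutor.", "What is a factor model?")
print(answer)
```

With a helper like this, the local and hosted stages would each reduce to a single call with a different client and model name.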

In [7]:
def generate_quant_answer():
    """
    Runs the full pipeline: formats user input, gets draft from LLaMA 3.2, refines with GPT-4o-mini.
    """
    formatted_prompt = format_quant_prompt_for_llm()

    print("\n[Stage 1] Querying LLaMA 3.2 locally for initial draft...\n")
    llama_response = query_llama3_local(formatted_prompt)

    refinement_prompt = f"""Please improve the following draft answer for clarity, technical correctness, and completeness.
Stick to the format: 
1) What the code does and why  
2) The actual code

Here is the draft:
{llama_response}
"""

    print("\n[Stage 2] Refining with GPT-4o-mini...\n")
    final_response = query_gpt4o_mini(refinement_prompt)

    print("\n✅ Final Answer:\n")
    stream_html_output(final_response)

In [8]:
generate_quant_answer()

Enter your quant trading coding question:  How do you build a risk model using PCA in python?



[Stage 1] Querying LLaMA 3.2 locally for initial draft...


[Stage 2] Refining with GPT-4o-mini...


✅ Final Answer:




Answer also saved to 'quant_answer.html'. Open it in your browser.

