# Calculus Code Agent

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Danselem/GenAI_Tutorial/blob/main/arize/notebook/agent.ipynb)

<div style="text-align: center;">
  <img src="https://github.com/Danselem/GenAI_Tutorial/blob/main/arize/assets/Mathcode.png?raw=true" width="800"/>
</div>

The Calculus Agent is an AI assistant powered by a large language model (LLM) that solves calculus problems interactively. It combines natural language understanding with a Code Interpreter tool that executes Python code for symbolic math, numerical analysis, and plotting. Users ask questions in plain English, and the agent writes and runs code to handle differentiation, integration, limits, series, and visualizations. To track and monitor performance, the project uses **Arize Phoenix** to trace all LLM calls, which keeps the agent's behavior observable. The agent itself is built using **LangGraph**, a flexible framework for managing tool calls, reasoning steps, and user interactions. This makes the Calculus Agent a useful tutor and problem-solving companion for learners and researchers alike.



## Importing modules
We will use several mathematical packages (NumPy, pandas, SciPy, and SymPy), which are required for the mini Code Interpreter sandbox. In addition, we will use LangChain and LangGraph to build the agent, and Arize Phoenix for observability and, later on, evaluation.

In [None]:
!pip install -q langchain langchain_community langchain-google-genai langgraph

!pip install -q arize-phoenix google-generativeai sympy scipy python-dotenv

!pip install -q openinference-instrumentation-langchain openinference-instrumentation

In [127]:
import os
import io
import base64
import traceback
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import sympy as sp
import scipy
from contextlib import redirect_stdout, redirect_stderr
from typing import Dict, List, Any, TypedDict, Annotated
from PIL import Image
from pathlib import Path
from dotenv import load_dotenv
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage
from langchain_core.prompts import ChatPromptTemplate


from langchain_google_genai import ChatGoogleGenerativeAI


### LangGraph
from langgraph.graph import StateGraph, START, END


### Tracing
from phoenix.otel import register
from phoenix.client import Client
from phoenix.client.types import PromptVersion


## Setting up Environment Variables

This project requires two APIs:

- **Google API**: The Google Gemini LLM is used for natural language reasoning and mathematical problem-solving. To get a key, create a project in [Google AI Studio](https://aistudio.google.com/) and generate an API key.


- **Arize Phoenix**: Used for logging, monitoring, and analyzing LLM calls. To get a key, sign up on [Arize Phoenix](https://phoenix.arize.com/) and create an API token.


Both keys, along with `PHOENIX_COLLECTOR_ENDPOINT`, must be set in your environment (e.g., a `.env` file) when running the project locally, or in Colab secrets when running from Google Colab.

In [128]:
## Load environment variables from local environment

load_dotenv(Path('../../.env'))
os.environ["PHOENIX_API_KEY"] = os.getenv("PHOENIX_API_KEY")
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = os.getenv("PHOENIX_COLLECTOR_ENDPOINT")
os.environ["GOOGLE_API_KEY"] = os.getenv("GOOGLE_API_KEY")
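Note that `os.environ[...] = os.getenv(...)` raises a `TypeError` when a variable is missing, because `os.getenv` returns `None`. The following is a minimal sketch of a fail-fast check that reports which variables are missing. The helper name `require_env` is hypothetical and not part of the original notebook.

```python
import os

def require_env(names):
    """Return a dict of the requested variables, or raise if any is unset or empty."""
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

It could be called right after `load_dotenv`, for example as `require_env(["PHOENIX_API_KEY", "PHOENIX_COLLECTOR_ENDPOINT", "GOOGLE_API_KEY"])`.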

If you are running this notebook from Google Colab, run the next cell instead.

In [None]:
from google.colab import userdata
os.environ["PHOENIX_API_KEY"] = userdata.get('PHOENIX_API_KEY')
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = userdata.get('PHOENIX_COLLECTOR_ENDPOINT')
os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY')

In [129]:
# configure the Phoenix tracer
tracer_provider = register(
  project_name="calculus-code-agent", 
  auto_instrument=True 
)

Overriding of current TracerProvider is not allowed
Attempting to instrument while already instrumented
Attempting to instrument while already instrumented


🔭 OpenTelemetry Tracing Details 🔭
|  Phoenix Project: calculus-code-agent
|  Span Processor: SimpleSpanProcessor
|  Collector Endpoint: https://app.phoenix.arize.com/s/comradedaniel/v1/traces
|  Transport: HTTP + protobuf
|  Transport Headers: {'authorization': '****'}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.
|  
|  
|  `register` has set this TracerProvider as the global OpenTelemetry default.
|  To disable this behavior, call `register` with `set_global_tracer_provider=False`.



## Code Interpreter

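The `CodeInterpreter` class is a mini sandbox that runs snippets of Python code with `exec` in its own namespace, then collects whatever the code prints, the tables it creates (Pandas DataFrames), and any plots it draws (Matplotlib figures). It keeps the executed code's variables separate from the notebook's, but it does not restrict what that code can do.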

Here’s the explanation step by step:

1. **Setup (`__init__`)**  
   Preloads libraries (`np`, `pd`, `plt`, `sp`, `scipy`) so they’re ready to use.

2. **Execute (`execute_code`)**  
   - Redirects `print()` and errors into memory (not the screen).  
   - Runs the given code inside the interpreter's namespace.  

3. **Collect DataFrames**  
   Finds any Pandas tables created, saves their shape, columns, and first few rows.

4. **Collect Plots**  
   Grabs Matplotlib plots, saves them as PNG (base64), then closes them.

5. **Capture Last Result**  
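   Reads `_` from the namespace after the run. Because `exec` does not set `_` automatically, this is `None` unless the executed code assigns `_` itself.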

6. **Error Handling**  
   If code fails, saves the error instead of crashing.

7. **Return Report**  
   Gives back a dictionary with:  
   - status (`success`/`error`)  
   - stdout (prints)  
   - stderr (errors)  
   - dataframes (tables summary)  
   - plots (as images)  
   - result (last value)  
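The capture mechanism in step 2 and the error handling in step 6 can be sketched in isolation using only the standard library. This is a simplified illustration rather than the class itself. The function name `run_and_capture` is hypothetical, and unlike the class it uses a fresh namespace on every call.

```python
import io
import traceback
from contextlib import redirect_stdout

def run_and_capture(code: str) -> dict:
    """Run `code` with exec and return its status, printed output, and any traceback."""
    buffer = io.StringIO()
    namespace = {}  # fresh namespace, so separate runs do not share variables
    try:
        with redirect_stdout(buffer):
            exec(code, namespace)
        return {"status": "success", "stdout": buffer.getvalue(), "stderr": ""}
    except Exception:
        # Keep whatever was printed before the failure, plus the traceback text
        return {
            "status": "error",
            "stdout": buffer.getvalue(),
            "stderr": traceback.format_exc(),
        }

print(run_and_capture("print(1 + 1)"))
print(run_and_capture("1 / 0")["status"])
```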


In [130]:
class CodeInterpreter:
    def __init__(self):
        # Allowed libraries
        self.globals = {
            "np": np,
            "pd": pd,
            "plt": plt,
            "sp": sp,
            "scipy": scipy,
        }

    def execute_code(self, code: str):
        """Execute Python code and capture stdout, stderr, DataFrames, plots."""
        stdout = io.StringIO()
        stderr = io.StringIO()
        dataframes = []
        plots = []
        result = None

        try:
            with redirect_stdout(stdout), redirect_stderr(stderr):
                # Execute code in self.globals namespace
                exec(code, self.globals)

                # Capture DataFrames
                for var_name, var_value in self.globals.items():
                    if isinstance(var_value, pd.DataFrame) and len(var_value) > 0:
                        dataframes.append({
                            "name": var_name,
                            "shape": var_value.shape,
                            "columns": list(var_value.columns),
                            "head": var_value.head().to_dict(orient="records")
                        })

                # Capture plots
                figs = [plt.figure(n) for n in plt.get_fignums()]
                for fig in figs:
                    buf = io.BytesIO()
                    fig.savefig(buf, format="png")
                    buf.seek(0)
                    plots.append(base64.b64encode(buf.read()).decode("utf-8"))
                    plt.close(fig)

                # `exec` does not set `_`; this stays None unless the executed code assigns `_`
                result = self.globals.get("_")

            status = "success"

        except Exception:
            status = "error"
            stderr.write(traceback.format_exc())

        return {
            "status": status,
            "stdout": stdout.getvalue(),
            "stderr": stderr.getvalue(),
            "dataframes": dataframes,
            "plots": plots,
            "result": result,
        }
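One caveat on the last-result step: `exec` does not assign `_` the way the interactive interpreter does, so `self.globals.get("_")` is `None` unless the executed code sets `_` itself. One possible way to capture the value of a trailing expression is to split it off with the `ast` module and evaluate it separately. This is a sketch and not part of the original class, and `exec_with_last_value` is a hypothetical helper name.

```python
import ast

def exec_with_last_value(code: str, namespace: dict):
    """Execute `code`; if its last statement is an expression, return that value."""
    tree = ast.parse(code, mode="exec")
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Detach the trailing expression so it can be evaluated on its own
        last_expr = ast.Expression(body=tree.body.pop().value)
        exec(compile(tree, "<cell>", "exec"), namespace)
        return eval(compile(last_expr, "<cell>", "eval"), namespace)
    exec(compile(tree, "<cell>", "exec"), namespace)
    return None

print(exec_with_last_value("x = 3\nx ** 2", {}))  # 9
```

Code that ends in a statement such as `print(...)` wrapped in an assignment, or any non-expression statement, simply yields `None` here.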

In [131]:
### =============== CODE INTERPRETER TOOLS =============== ###


@tool
def execute_python_code(code: str) -> str:
    """
    Execute Python code using CodeInterpreter and return a summarized response.

    Args:
        code (str): Python source code to execute.

    Returns:
        str: Summary of execution, including stdout, stderr, DataFrames, and plots if any.
    """
    interpreter = CodeInterpreter()
    result = interpreter.execute_code(code)
    response = []

    if result["status"] == "success":
        response.append("Code executed successfully")

        if result.get("stdout"):
            response.append(f"\n**Standard Output:**\n```\n{result['stdout'].strip()}\n```")

        if result.get("stderr"):
            response.append(f"\n**Standard Error (if any):**\n```\n{result['stderr'].strip()}\n```")

        if result.get("result") is not None:
            response.append(f"\n**Execution Result:**\n```\n{result['result']}\n```")

        if result.get("dataframes"):
            for df_info in result["dataframes"]:
                df_preview = pd.DataFrame(df_info["head"])
                response.append(
                    f"\n**DataFrame `{df_info['name']}` (Shape: {df_info['shape']})**\n"
                    f"First 5 rows:\n```\n{df_preview}\n```"
                )

        if result.get("plots"):
            response.append(f"\n**Generated {len(result['plots'])} plot(s)** (image data available separately)")

    else:
        response.append("Code execution failed")
        if result.get("stderr"):
            response.append(f"\n**Error Log:**\n```\n{result['stderr'].strip()}\n```")

    return "\n".join(response)
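The tool summary only reports how many plots were generated. The base64 strings that `CodeInterpreter.execute_code` returns under `"plots"` can be decoded back into PNG files. The sketch below assumes `plots` is a list of base64-encoded PNG strings, as produced by the class above. The helper name `save_plots` and the `plot_<n>.png` naming are hypothetical.

```python
import base64
from pathlib import Path

def save_plots(plots, out_dir="plots"):
    """Decode base64-encoded PNG strings and write them to numbered files."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, encoded in enumerate(plots, start=1):
        target = out_path / f"plot_{index}.png"
        target.write_bytes(base64.b64decode(encoded))
        saved.append(target)
    return saved
```

A call such as `save_plots(interpreter.execute_code(code)["plots"])` would then write one file per figure.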


## Defining Prompts

In [132]:
system_message = """
You are a Python code assistant specialized in solving calculus problems
(derivatives, integrals, limits, series, etc.). Given a user question,
your task is to generate syntactically correct Python code to solve the problem.

Constraints:
- Use only SymPy or SciPy for symbolic or numerical computation.
- Do NOT solve the problem manually; code must compute the solution.
- Display the equation or expression symbolically if possible.
- Code must be valid Python and ready to execute in the `execute_python_code` tool.
"""

user_prompt = "Calculus Question: {input}"

calculus_prompt_template = ChatPromptTemplate(
    [("system", system_message), ("user", user_prompt)]
)

# Optional: print messages to verify
for message in calculus_prompt_template.messages:
    message.pretty_print()




You are a Python code assistant specialized in solving calculus problems
(derivatives, integrals, limits, series, etc.). Given a user question,
your task is to generate syntactically correct Python code to solve the problem.

Constraints:
- Use only SymPy or SciPy for symbolic or numerical computation.
- Do NOT solve the problem manually; code must compute the solution.
- Display the equation or expression symbolically if possible.
- Code must be valid Python and ready to execute in the `execute_python_code` tool.


Calculus Question: {input}


## Logging Prompts
Arize Phoenix allows you to log your prompts and update them when necessary. This can be achieved with `phoenix.client.prompts`.

In [13]:
prompt_name = "calculus-code-prompt"
prompt = Client().prompts.create(
    name=prompt_name,
    prompt_description="Write a Python code to solve a calculus problem",
    version=PromptVersion(
        [{"role": "user", "content": system_message}],
        model_name="gemini-2.0-flash-lite-001",
        model_provider="GOOGLE"
    ),
)

## Application State
The application state controls the data that goes into the agent, is passed between steps, and comes out of the agent.

We will keep track of input question, generated code, code result, and generated answer:

In [133]:
class State(TypedDict):
    question: str
    code: str
    result: str
    answer: str

## Convert Query to Python Code

In [134]:
# Create Google Gemini LLM instance
llm = ChatGoogleGenerativeAI(
            model="gemini-2.0-flash-lite-001",
            temperature=0,
            max_tokens=1024,
        )

class CalcOutput(TypedDict):
    """Structured response from LLM with math expression/code to execute."""
    code: Annotated[str, ..., "Valid Python code using sympy or scipy for solving the calculus problem."]

def write_calculus_code(state: dict):
    """Use LLM to generate executable Python code for a calculus problem."""

    prompt = calculus_prompt_template.invoke({"input": state["question"]})
    
    # Enforce structured output
    structured_llm = llm.with_structured_output(CalcOutput)
    result = structured_llm.invoke(prompt)
    
    # Return only the Python code
    return {"code": result["code"]}

Let's test the `write_calculus_code` function.

In [12]:
code_result = write_calculus_code({"question": "integrate 2x+7"})

code_result

{'code': 'import sympy\nx = sympy.Symbol("x")\nresult = sympy.integrate(2*x + 7, x)\nprint(result)'}

In [14]:
print(code_result['code'])

import sympy
x = sympy.Symbol("x")
result = sympy.integrate(2*x + 7, x)
print(result)


The output above shows that the model returned valid SymPy code for the integral.
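A quick programmatic check of the generated code is to parse it with `ast.parse` before sending it to the sandbox. This is a sketch that verifies syntax only and says nothing about whether the code solves the problem. The helper name `is_valid_python` is hypothetical.

```python
import ast

def is_valid_python(code: str) -> bool:
    """Return True if `code` parses as Python; this checks syntax only."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

generated = (
    'import sympy\nx = sympy.Symbol("x")\n'
    "result = sympy.integrate(2*x + 7, x)\nprint(result)"
)
print(is_valid_python(generated))                    # True
print(is_valid_python("result = integrate(2x + 7"))  # False
```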

In [135]:
def run_calculus_code(state: State):
    """
    Execute Python code for a calculus problem using the execute_python_code tool.
    
    Args:
        state (State): Dictionary containing the generated Python code under "code".
        
    Returns:
        dict: Execution result returned from execute_python_code.
    """
    # Call the tool with the code from the agent
    result = execute_python_code.invoke(state["code"])
    return {"result": result}

## Testing the code in a mini sandbox

Let's check that `run_calculus_code` executes code in the mini sandbox by running the next cell.

In [16]:
run_calculus_code({
    "code": """
import sympy as sp

# Define symbol
x = sp.symbols('x')

# Define function
f = x**3 - 4*x

# Display function
sp.pprint(f)

# Compute derivative
df = sp.diff(f, x)
print(df)
"""
})

{'result': 'Code executed successfully\n\n**Standard Output:**\n```\n3      \nx  - 4⋅x\n3*x**2 - 4\n```'}

From the above, we can see that the code ran successfully through the `run_calculus_code` function.

In the next cell, we create a function that generates the final answer to the question.

In [136]:
def generate_answer(state: State):
    """
    Answer a calculus question using the generated Python code and its execution result.

    Args:
        state (State): Dictionary containing:
            - "question": the user’s calculus question
            - "code": the generated Python code
            - "result": the output from running the code

    Returns:
        dict: LLM-generated answer based on code execution.
    """
    prompt = (
        "Given the following user question, the Python code generated to solve it, "
        "and the result of executing that code, answer the user question.\n\n"
        f"Question: {state['question']}\n"
        f"Generated Code:\n{state['code']}\n"
        f"Execution Result:\n{state['result']}"
    )
    
    response = llm.invoke(prompt)
    return {"answer": response.content}


## Building the Agent
We will use LangGraph's `StateGraph` to build the agent, chaining the three functions defined above as sequential nodes.

In [137]:
calculus_graph_builder = StateGraph(State).add_sequence(
    [write_calculus_code, run_calculus_code, generate_answer]
)
calculus_graph_builder.add_edge(START, "write_calculus_code")
calculus_graph_builder.add_edge("generate_answer", END)
calculus_graph = calculus_graph_builder.compile()
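Conceptually, this linear graph calls the three nodes in order and merges the dictionary each node returns into the shared state. The plain-Python sketch below imitates that data flow with stub nodes. It only illustrates the idea and is not how LangGraph is implemented. The stub functions and `run_sequence` are hypothetical.

```python
def write_code_stub(state: dict) -> dict:
    # Stand-in for the LLM call that generates code
    return {"code": "print(2 + 2)"}

def run_code_stub(state: dict) -> dict:
    # Stand-in for the sandbox: pretend the code printed 4
    return {"result": "4"}

def answer_stub(state: dict) -> dict:
    # Stand-in for the final LLM call, which reads earlier state keys
    return {"answer": f"The result is {state['result']}."}

def run_sequence(nodes, initial_state: dict) -> dict:
    """Call each node in order, merging its partial update into the state."""
    state = dict(initial_state)
    for node in nodes:
        state.update(node(state))
    return state

final_state = run_sequence(
    [write_code_stub, run_code_stub, answer_stub],
    {"question": "What is 2 + 2?"},
)
print(final_state["answer"])
```

Each node returns only the keys it produces, which is why the real node functions return `{"code": ...}`, `{"result": ...}`, and `{"answer": ...}` rather than the full state.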

In [None]:
from IPython.display import Image, display

display(Image(calculus_graph.get_graph().draw_mermaid_png()))

In [None]:
question = "Integrate 4x raise to power 4 wrt x?"

for step in calculus_graph.stream(
    {"question": question}, stream_mode="updates"
):
    print(step)


{'write_calculus_code': {'code': 'import sympy\nx = sympy.Symbol("x")\nresult = sympy.integrate(4*x**4, x)\nprint(result)'}}
{'run_calculus_code': {'result': 'Code executed successfully\n\n**Standard Output:**\n```\n4*x**5/5\n```'}}
{'generate_answer': {'answer': 'The integral of 4x^4 with respect to x is 4x^5/5.'}}


That's it! Our calculus agent runs. It interprets the question, writes the Python code, passes it to the mini sandbox, and then returns the answer with a short explanation.

Head over to your Arize Phoenix account to see the traces produced by calling the agent. They appear in the `calculus-code-agent` project.

In the cell below, the code is formatted such that it prints nicely.

In [138]:
question = "Integrate 4x raise to power 4 wrt x?"

for step in calculus_graph.stream({"question": question}, stream_mode="updates"):
    print("-"*80)
    for key, value in step.items():
        print(f"{key.upper()}:")
        # Extract the single value from the dict and print nicely
        if isinstance(value, dict):
            for subkey, subval in value.items():
                print(f"{subkey.capitalize()}: {subval}")
        else:
            print(value)
        


--------------------------------------------------------------------------------
WRITE_CALCULUS_CODE:
Code: import sympy
x = sympy.symbols("x")
result = sympy.integrate(4*x**4, x)
print(result)
--------------------------------------------------------------------------------
RUN_CALCULUS_CODE:
Result: Code executed successfully

**Standard Output:**
```
4*x**5/5
```
--------------------------------------------------------------------------------
GENERATE_ANSWER:
Answer: The integral of 4x^4 with respect to x is 4x^5/5.


In [139]:
def stream_and_print(question: str):
    """
    Stream the question through the compiled `calculus_graph` and print each step.

    Args:
        question: The calculus question to process.
    """
    for step in calculus_graph.stream({"question": question}, stream_mode="updates"):
        print("-" * 80)
        for key, value in step.items():
            print(f"{key.upper()}:")
            if isinstance(value, dict):
                for subkey, subval in value.items():
                    print(f"{subkey.capitalize()}: {subval}")
            else:
                print(value)


## Additional Questions

Below is a list of calculus questions to pass to the agent.

In [140]:
calculus_questions = [
    "Differentiate f(x) = x^3 * sin(x) with respect to x.",
    "Find the second derivative of y = ln(x^2 + 1).",
    "Evaluate the definite integral from 0 to pi of x * cos(x) dx.",
    "Compute the Taylor series expansion of e^x about x = 0 up to order 5.",
    "Solve the differential equation dy/dx + y = e^x.",
    "Find the limit as x approaches 0 of sin(3x)/x.",
    "Determine the critical points of f(x) = x^4 - 4x^3 + 6x^2.",
    "Evaluate the improper integral from 1 to infinity of 1/x^2 dx.",
    "Find the Maclaurin series for cos(x) up to the x^6 term.",
    "Compute the Laplace transform of f(t) = t^2 * e^(-3t).",
    "Evaluate the double integral of (x + y) dA over the region 0 <= x <= 1, 0 <= y <= 1.",
    "Find the radius of convergence of the power series sum from n=1 to infinity of (x^n / n^2).",
    "Differentiate f(x) = e^(2x) / (1 + x^2).",
    "Solve the initial value problem dy/dx = 3y, y(0) = 2.",
    "Evaluate the definite integral from -1 to 1 of 1/(1 + x^2) dx.",
    "Differentiate f(x) = (x^2 + 1)^5.",
    "Find the third derivative of y = cos(2x).",
    "Evaluate the limit as x approaches infinity of (1 + 1/x)^x.",
    "Compute the indefinite integral of (2x + 3)/(x^2 + 3x + 2) dx.",
    "Solve the differential equation dy/dx = x^2 with y(0) = 1.",
    "Expand ln(1 + x) as a Maclaurin series up to the x^4 term.",
    "Find the area under the curve y = x^2 from x = 0 to x = 2.",
    "Evaluate the double integral of xy dA over the region bounded by x=0, y=0, and x+y=1.",
    "Determine the convergence of the series sum from n=1 to infinity of 1/n^p for p=2.",
    "Compute the derivative of f(x) = tan^(-1)(x).",
    "Solve the initial value problem dy/dx = y * cos(x), y(0) = 1.",
    "Evaluate the improper integral from 0 to infinity of e^(-x) dx.",
    "Find the Fourier series expansion of f(x) = x on the interval (-pi, pi).",
    "Differentiate f(x) = ln(sin(x)).",
    "Evaluate the definite integral from 0 to 1 of sqrt(1 - x^2) dx.",
    "Find the Maclaurin series of sinh(x) up to the x^5 term.",
    "Compute the Laplace transform of f(t) = cos(2t).",
    "Evaluate the triple integral of x dV over the region 0 <= x, y, z <= 1.",
    "Find the limit as x approaches 0 of (e^x - 1 - x)/x^2.",
    "Differentiate f(x) = x^x.",
    "Solve the differential equation d^2y/dx^2 - y = 0.",
    "Evaluate the definite integral from 0 to pi/2 of sin^2(x) dx.",
    "Find the volume of the solid generated by revolving y = x^2 around the x-axis from x=0 to x=1.",
    "Compute the gradient of f(x, y) = x^2 + y^2.",
    "Evaluate the Jacobian of the transformation u = x + y, v = x - y.",
    "Find the critical points of f(x, y) = x^2 + y^2 - 4x - 6y.",
    "Compute the divergence of the vector field F = (x, y, z).",
    "Evaluate the curl of the vector field F = (-y, x, 0).",
    "Solve the partial differential equation ∂u/∂t = k ∂^2u/∂x^2.",
    "Evaluate the line integral ∫C (x dx + y dy), where C is the unit circle x^2 + y^2 = 1.",
    "Use Green's theorem to evaluate ∮C (y dx - x dy), where C is the unit circle.",
    "Find the Taylor polynomial of order 3 for f(x) = cos(x) about x = 0.",
    "Evaluate the definite integral from -∞ to ∞ of e^(-x^2) dx.",
    "Compute the directional derivative of f(x, y) = x^2y at (1, 2) in the direction of (3, 4)."
]


If you are using the free tier of the Gemini API, you need to space out your requests to stay within its rate limits. That is why the loop pauses for 20 seconds between questions.
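A fixed pause is simple, but it waits even when no limit has been hit. An alternative is to retry a failing call with exponential backoff. This is a sketch and not what the notebook does. `call_with_backoff` is a hypothetical helper, and catching every `Exception` is an assumption made for brevity; real code would catch the specific rate-limit error raised by the client library.

```python
import time

def call_with_backoff(func, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call `func`, retrying on any exception with exponentially growing pauses."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries, surface the last error
            # Pause for base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2 ** attempt)
```

With the helper in place, a call could be wrapped as `call_with_backoff(lambda: stream_and_print(question), base_delay=5.0)`.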

In [141]:
import time

for question in calculus_questions:
    stream_and_print(question) 
    time.sleep(20) 


--------------------------------------------------------------------------------
WRITE_CALCULUS_CODE:
Code: import sympy
x = sympy.symbols("x")
f = x**3 * sympy.sin(x)
df = sympy.diff(f, x)
print(df)
--------------------------------------------------------------------------------
RUN_CALCULUS_CODE:
Result: Code executed successfully

**Standard Output:**
```
x**3*cos(x) + 3*x**2*sin(x)
```
--------------------------------------------------------------------------------
GENERATE_ANSWER:
Answer: The derivative of f(x) = x^3 * sin(x) with respect to x is x**3*cos(x) + 3*x**2*sin(x).
--------------------------------------------------------------------------------
WRITE_CALCULUS_CODE:
Code: import sympy
from sympy import symbols, diff, ln

x = symbols("x")
y = ln(x**2 + 1)

# First derivative
dy_dx = diff(y, x)

# Second derivative
d2y_dx2 = diff(dy_dx, x)

print(d2y_dx2)
--------------------------------------------------------------------------------
RUN_CALCULUS_CODE:
Result: Code execute

## Final Thoughts

You have learnt how to build a simple agent that uses `Google Gemini` to answer calculus questions. It processes the question, writes Python code to solve it, runs that code in the mini sandbox, and returns the answer.

## Next Steps
In the next tutorial, we will go through how to evaluate the Agent with Arize Phoenix Evaluate.