### Let GPT propose a symbolic law from sampled projectile data

In [45]:
import os
# cd llm-physics-discovery/
import numpy as np
from openai import OpenAI
from dotenv import load_dotenv
from src.simulate import projectile

In [46]:
load_dotenv()
print("Key loaded?", os.getenv("OPENAI_API_KEY") is not None)

Key loaded? True


In [47]:
# generate the data
t, y, meta = projectile(v0=20.0, theta_deg=45.0, g=9.81, noise=0.05, num_points=200, seed=123)
# take 12 evenly spaced points from the 200 simulated ones
idx = np.linspace(0, len(t)-1, 12, dtype=int)
t_sample, y_sample = t[idx], y[idx]
t_list, y_list = t_sample.round(4).tolist(), y_sample.round(4).tolist()

### Why I sample only 10–12 points

Although the simulation generates 200 data points, I don’t feed all of them to the LLM.  
Instead, I pass a small, evenly spaced sample (12 points in the cell above). There are several reasons:

1. **Token efficiency:** A prompt with hundreds of numbers costs more tokens without adding much information about the shape of the curve.
2. **Noise robustness:** With many noisy points, the LLM may latch onto point-to-point fluctuations; a small sample keeps the focus on the overall trend (the parabola).
3. **Scientific analogy:** A physicist in a lab can often guess the form of a law from a handful of measurements, and the LLM likewise doesn’t need every point.
4. **Readability:** For documentation and presentation (the Medium article), 10–12 points keep the prompt short enough to read at a glance.

With 10–12 points evenly spaced across the trajectory, GPT sees enough structure to propose the underlying law (`y = A*t - B*t**2`) without a long list of noisy values in the prompt.
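As a quick sanity check on that claim, here is a minimal sketch (not part of the original pipeline) that fits `y = A*t - B*t**2` to the 12 sampled points by ordinary least squares. The arrays are copied from the prompt printed further down in this notebook; the names `t_pts`, `y_pts`, `A_fit`, and `B_fit` are my own. For the simulation settings above (`v0 = 20`, `theta_deg = 45`, `g = 9.81`), the noise-free targets would be `A = v0*sin(theta) ≈ 14.14` and `B = g/2 ≈ 4.905`, assuming `projectile` returns the height of an ideal drag-free trajectory.

```python
import numpy as np

# The 12 sampled points, copied from the printed prompt
t_pts = np.array([0.0, 0.2608, 0.5216, 0.7824, 1.0432, 1.304,
                  1.5648, 1.8255, 2.0863, 2.3471, 2.6079, 2.8832])
y_pts = np.array([-0.1518, 3.3067, 6.1739, 7.9882, 9.5613, 10.2902,
                  10.2506, 9.4823, 8.2178, 6.0526, 3.4459, -0.2545])

# Design matrix for y = A*t - B*t**2 (no constant term)
X = np.column_stack([t_pts, -t_pts**2])

# Ordinary least squares; the first return value holds [A, B]
(A_fit, B_fit), *_ = np.linalg.lstsq(X, y_pts, rcond=None)

# Noise-free targets under the drag-free assumption stated above
A_true = 20.0 * np.sin(np.deg2rad(45.0))
B_true = 9.81 / 2

print(f"A: fitted {A_fit:.3f}, target {A_true:.3f}")
print(f"B: fitted {B_fit:.3f}, target {B_true:.3f}")
```

If the fitted values land close to the targets, the 12 points carry enough information to identify the parabola. That says nothing about whether the LLM will actually propose that form; it only shows the sample is not the limiting factor.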


In [48]:
# build the GPT prompt
prompt = f"""
You are a scientist.
Given these (t, y) measurements of a projectile’s motion, propose a compact analytic formula y(t).

Rules:
- Use SymPy/Python syntax (sin, cos, exp, *, **).
- Use symbols for constants (A, B, g, v0, etc) — do not hardcode numeric constants.
- Keep the expression short.
- Return only the formula.

t: {t_list}
y: {y_list}
""".strip()

print(prompt)

You are a scientist.
Given these (t, y) measurements of a projectile’s motion, propose a compact analytic formula y(t).

Rules:
- Use SymPy/Python syntax (sin, cos, exp, *, **).
- Use symbols for constants (A, B, g, v0, etc) — do not hardcode numeric constants.
- Keep the expression short.
- Return only the formula.

t: [0.0, 0.2608, 0.5216, 0.7824, 1.0432, 1.304, 1.5648, 1.8255, 2.0863, 2.3471, 2.6079, 2.8832]
y: [-0.1518, 3.3067, 6.1739, 7.9882, 9.5613, 10.2902, 10.2506, 9.4823, 8.2178, 6.0526, 3.4459, -0.2545]


In [49]:
# call the model
client = OpenAI() # initialize the client

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}]
)

formula = response.choices[0].message.content
print("GPT guess: \n", formula)

GPT guess: 
 ```python
from sympy import symbols, sin, cos

t, A, B, g, v0, h0 = symbols('t A B g v0 h0')
y = h0 + v0 * t - (1/2) * g * t**2 + A * sin(B * t)
```


In [50]:
def clean_formula(text):
    """Return the last non-empty line of the model's reply, without code fences."""
    # tolerate leading/trailing whitespace around a fenced reply
    text = text.strip()
    # remove triple-backtick code fences and language tags
    if text.startswith("```"):
        text = text.strip("`")
        # after stripping fences, drop the first line if it is a language tag
        # (e.g. 'python', as in the reply shown in the cell above)
        lines = text.splitlines()
        if lines and lines[0].strip().lower() in {"python", "py"}:
            text = "\n".join(lines[1:])
    # the model may add imports or symbol definitions before the expression,
    # so take the last non-empty line as the formula (common pattern)
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return lines[-1] if lines else text

formula = clean_formula(formula)
print("Parsed formula:\n", formula)

Parsed formula:
 y = h0 + v0 * t - (1/2) * g * t**2 + A * sin(B * t)
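The cleaned string still carries a `y =` on the left, so it is not yet a bare expression. Below is a minimal sketch of one way to turn it into a SymPy object, assuming SymPy is installed and the reply has the shape shown above: split off the left-hand side, parse the rest with `sympy.sympify`, and list the symbols the model introduced. The names `formula_text`, `rhs_text`, `symbols_found`, and `reduced` are illustrative and not part of the notebook.

```python
import sympy as sp

# Cleaned formula string, copied from the output above
formula_text = "y = h0 + v0 * t - (1/2) * g * t**2 + A * sin(B * t)"

# Drop an optional "y =" left-hand side so only the expression remains
rhs_text = formula_text.split("=", 1)[1].strip() if "=" in formula_text else formula_text

# sympify evaluates the string, so only use it on text you trust
expr = sp.sympify(rhs_text)

# Names the model introduced (time variable plus free constants)
symbols_found = sorted(str(s) for s in expr.free_symbols)
print("Expression:", expr)
print("Symbols:", symbols_found)

# Switching off the offset and the sine term leaves the drag-free parabola
reduced = expr.subs({sp.Symbol("A"): 0, sp.Symbol("h0"): 0})
print("With A = 0, h0 = 0:", reduced)
```

With `A = 0` and `h0 = 0` the model's guess reduces to `v0*t - g*t**2/2`, which has the same form as `y = A*t - B*t**2`. The offset `h0` and the sine term are the parts that go beyond the simple parabola.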


In [51]:
os.makedirs("responses", exist_ok=True)

with open("responses/projectile_formula.txt", "w") as f:
    f.write(formula + "\n")