# Large Language Models

You've undoubtedly heard of ChatGPT, Claude, and other AI assistants that seem to have exploded onto the scene. These are **large language models (LLMs)**—massive neural networks built on the Transformer architecture, trained on existing text, code, and data (*not all ethically sourced*). While they were initially designed for natural language processing, they've evolved into remarkably capable tools for code generation, debugging, mathematical reasoning, and scientific problem-solving.

For computational physicists, LLMs represent both an opportunity and a responsibility. Used properly, they can accelerate your work, help you learn unfamiliar libraries, debug tricky code, and even suggest novel approaches to problems. Used carelessly, they can lead you astray with confident-sounding nonsense, mathematically incorrect solutions, or code that *looks* right but fails subtly. The key is learning to use them as intelligent collaborators while maintaining your own critical judgment.

## The Major Players 

### GitHub Copilot
- **Developer**: GitHub (Microsoft/OpenAI partnership)
- **Best for**: Real-time code completion and inline suggestions
- **Integration**: Seamlessly integrated into VSCode, JetBrains IDEs, and other editors
- **Strengths**: 
  - Autocompletes entire functions based on docstrings or comments
  - Suggests idiomatic Python patterns
  - Picks up your coding style from the surrounding code and open files
  - Excellent for repetitive coding tasks and boilerplate
- **Limitations**: 
  - Works best with well-established patterns; can struggle with novel physics algorithms
  - Suggestions require verification—always test the generated code
  - Inline completions arrive without any explanation of the reasoning or discussion of alternatives

### ChatGPT (GPT-5, GPT-5.1, GPT-5.2)
- **Developer**: OpenAI
- **Best for**: Explanations, debugging, algorithmic discussions, and code generation with context
- **Available at**: [chat.openai.com](https://chat.openai.com)
- **Strengths**:
  - Excellent at explaining concepts and debugging logic
  - Can discuss trade-offs between numerical methods
  - Handles multi-file projects and longer conversations
  - GPT-4o ("o" for "omni") handles multimodal input (text, images, code)
  - Can write and execute Python code in its environment
  - Good mathematical reasoning when prompted carefully
- **Limitations**:
  - Can "hallucinate" functions, methods, or even entire libraries that don't exist
  - Sometimes confidently wrong about numerical stability or physics
  - May not understand subtle issues in computational physics (e.g., symplectic integrators, conservation laws)
  - Free tier has rate limits and uses older models


### Claude (Claude 4.5 Sonnet, Claude Opus 4.5)
- **Developer**: Anthropic
- **Best for**: Long-form analysis, extended context, and careful reasoning
- **Available at**: [claude.ai](https://claude.ai)
- **Strengths**:
  - Can handle very long context windows (200K+ tokens)—you can paste entire notebooks or papers
  - Strong at careful, step-by-step reasoning about physics and mathematics
  - Excellent for understanding and refactoring existing code
  - More likely to express uncertainty when unsure
  - Good at following specific instructions and constraints
  - Can analyze and write LaTeX
- **Limitations**:
  - The claude.ai chat interface does not offer Copilot-style inline completion inside your editor
  - May be more conservative in suggestions
  - Still capable of hallucination, especially about obscure libraries

### Google Gemini 3 Pro/3 Flash
- **Developer**: Google DeepMind
- **Best for**: Integration with Google services, multimodal reasoning
- **Available at**: [gemini.google.com](https://gemini.google.com)
- **Strengths**:
  - Native integration with Google Workspace (Docs, Sheets, Colab)
  - Strong multimodal capabilities (can analyze plots, diagrams, equations in images)
  - Can access and synthesize information from Google Search
  - Good at mathematical notation and LaTeX
- **Limitations**:
  - Code generation capabilities are developing but not yet at GPT-4/Claude level (*quickly approaching; this statement may be out of date*)
  - Less specialized for scientific computing compared to competitors


# Copilot: Trying to be ubiquitous


Microsoft is making big moves to integrate Copilot into all of its own software and its partners' software. While rarely rated the best, Copilot is often the most accessible. In this class, you may encounter Copilot when coding in Spyder or on the GitHub website.

Once it is installed and activated in your IDE or software of choice, you have access to a (possibly) useful code completion tool. We'll begin by typing `fibonacci = ` and seeing what it suggests.

In [None]:
fibonacci = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]

If it behaves as it did in Jan. 2025, it just gives a list of Fibonacci numbers. Maybe this was what you wanted, but maybe it wasn't! What I wanted when I wrote this was a function that generates Fibonacci numbers. So let's try that.

In [5]:
def fib(n):
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib(n-1) + fib(n-2)

In [2]:
def fib_iter(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

In [6]:
fib(20)

6765
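
Because Copilot's suggestions require verification, a quick check is to compare the two suggested functions against each other and against the list Copilot produced first. The following is a minimal sketch of such a check; the names `known`, `recurrence_ok`, and `agree` are introduced here for illustration only.

```python
def fib(n):
    """Recursive Fibonacci, as suggested above (exponential running time)."""
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)


def fib_iter(n):
    """Iterative Fibonacci, as suggested above (linear running time)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# The list Copilot produced first, used here as reference data.
known = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377,
         610, 987, 1597, 2584, 4181, 6765]

# Check the defining recurrence on the reference data itself.
recurrence_ok = all(known[i] == known[i - 1] + known[i - 2]
                    for i in range(2, len(known)))

# Check both implementations against the reference data.
agree = all(fib(n) == fib_iter(n) == known[n] for n in range(len(known)))

print(recurrence_ok, agree)
```

Agreement among three outputs from the same tool is weaker evidence than agreement with an independent source, so checking the recurrence directly is the more meaningful of the two tests.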

We can also use comments to prompt Copilot. For example, the comment `# compute pi as a series expansion` prompts it to write a function that sums a series for pi.

In [7]:
# compute pi as a series expansion
def compute_pi(n_terms):
    pi_over_4 = 0
    for k in range(n_terms):
        pi_over_4 += ((-1)**k) / (2*k + 1)
    return pi_over_4 * 4    

In [11]:
compute_pi(20)

3.09162380666784

In [9]:
# compute e as series expansion
def compute_e(n_terms):
    e_approx = 0
    factorial = 1
    for k in range(n_terms):
        if k > 0:
            factorial *= k
        e_approx += 1 / factorial
    return e_approx

In [10]:
compute_e(20)

2.7182818284590455
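
The two outputs above differ sharply in quality: with 20 terms the pi estimate is off by roughly 0.05, while the e estimate agrees with the true value to about machine precision. A minimal sketch of how one might quantify this, comparing against the constants in Python's `math` module (the functions are restated so the cell runs on its own; `pi_error` and `e_error` are names introduced here):

```python
import math


def compute_pi(n_terms):
    """Leibniz series: pi/4 = 1 - 1/3 + 1/5 - ..."""
    total = 0.0
    for k in range(n_terms):
        total += (-1) ** k / (2 * k + 1)
    return 4 * total


def compute_e(n_terms):
    """Taylor series: e = sum over k of 1/k!"""
    total = 0.0
    factorial = 1
    for k in range(n_terms):
        if k > 0:
            factorial *= k
        total += 1 / factorial
    return total


# Absolute error of each 20-term estimate against the library constants.
pi_error = abs(compute_pi(20) - math.pi)
e_error = abs(compute_e(20) - math.e)

print(f"pi error with 20 terms: {pi_error:.2e}")
print(f"e  error with 20 terms: {e_error:.2e}")
```

Both functions are "correct" in the sense that the series converge to the right limits, but the alternating series chosen for pi converges far too slowly to be useful. Copilot gave what was asked for, not what was needed.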

# Using LLMs as Assistants

To give you a sense of how I use LLMs these days, let's walk through a project that I recently began developing. I teach some form of introductory physics every semester and often wish certain simulations existed to illustrate concepts. To this end, I decided to write my own. The problem, of course, is that students wouldn't necessarily know how to run the Python, MATLAB, or C++ programs I could write. I needed a better interface, but didn't have the time to build and maintain it. Let's outsource some of the work!

I used Claude to construct HTML simulations that I can post on the internet through GitHub Pages. I'm using LLMs to help me write the HTML code, but relying on my own physics knowledge to judge whether it's accurate and realistic. You can see the work in progress here: [dr-kelley.github.io/kinematics/](https://dr-kelley.github.io/kinematics/).

## Snapshot of my work
Let's walk through just some of my prompts and the responses I received from my LLM. 

**Prompt to construct 1D motion:** can you construct a simulation that shows a simple animation of a point running through a 1-d landscape and on the right side of it show the disp vs time, velocity vs time, and acceleration vs time plots of the character being constructed. For each point that is plotted in the plots, have the animation put a point on the screen where the point was. I'm hoping this illustrates the connection among the kinematic graphs and the time graphs of the point

In [12]:
from IPython.display import IFrame

# IFrame embeds an HTML file in an iframe that can be viewed in a jupyter notebook
IFrame(src='html_files/kinematics_simulationv1.html', width=800, height=600)

What do you see? Any suggestions for improvement?

I had plenty of ideas. 

**Prompt 3:** this is looking good. Can you add an arrow for acceleration to show that velocity and acceleration are not always pointing in the same direction? Also, could you delete the text below the animation (the note) and just put a key for color of arrow = acceleration/velocity?

In [None]:
# Embed the HTML file in an iframe
IFrame(src='html_files/kinematics_simulationv3.html', width=800, height=600)

Yes, there are still a few problems.

**Prompt 4:** some of these profiles has the point leaving the axis and going across the screen. Can you fix that. Also, could you eliminate the profiles button and actually put buttons below the animation, one for each of the 8 profiles, you mentioned above?

In [None]:
# Embed the HTML file in an iframe
IFrame(src='html_files/kinematics_simulationv4.html', width=800, height=600)

Sometimes you need to tell an LLM more than once, I guess.
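
One way to check the graphs in a simulation like this independently of the generated HTML is to recompute velocity and acceleration from the position curve by finite differences. The sketch below assumes a hypothetical constant-acceleration profile (the values of `x0`, `v0`, `a0`, and `dt` are made up for illustration and are not taken from the simulation), chosen so that velocity and acceleration start out pointing in opposite directions.

```python
# Hypothetical motion profile (not taken from the simulation):
# constant acceleration a0, starting at x0 with velocity v0.
x0, v0, a0 = 0.0, 2.0, -1.0
dt = 0.01
times = [i * dt for i in range(501)]  # 0 s to 5 s
positions = [x0 + v0 * t + 0.5 * a0 * t**2 for t in times]


def central_difference(values, dt):
    """Derivative estimate at the interior points of a uniformly sampled list."""
    return [(values[i + 1] - values[i - 1]) / (2 * dt)
            for i in range(1, len(values) - 1)]


velocities = central_difference(positions, dt)      # aligned with times[1:-1]
accelerations = central_difference(velocities, dt)  # aligned with times[2:-2]

# Compare with the analytic answers v = v0 + a0*t and a = a0.
max_v_error = max(abs(v - (v0 + a0 * t))
                  for v, t in zip(velocities, times[1:-1]))
max_a_error = max(abs(a - a0) for a in accelerations)
print(max_v_error, max_a_error)
```

For a quadratic position curve the central difference is exact up to round-off, so both errors should be tiny. A profile with non-constant acceleration would show a truncation error that shrinks as `dt` is reduced.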

### Projectile Motion
The projectile motion process was smoother given all that I learned from the 1D/2D kinematic plotter. 

**Prompt to create projectile calculator:** can you create a projectile motion simulator that allows the user to adjust initial conditions of an object (even acceleration due to gravity) and graph the trajectory of the object. You should also display the results - range, time in air, impact velocity, angle of impact. Include a checkbox for 'include air resistance' that when checked opens another sub menu with options for the introductory version of v^2 air resistance, C =  drag coefficient, A=cross-sectional area of object, rho = air density, m= mass of object. Include 4 preset values for the air resistance parameters. Make this interaction simulation in the style of the other .html files in the knowledge base of this project

In [None]:
# Embed the HTML file in an iframe
IFrame(src='html_files/projectile-motion.html', width=800, height=600)

What needs improvement here? Be as specific as possible.

**Prompt 2:** Good start. I like the general set up. Can you adjust the panel spacing, the trajectory panel with the bleeds into the results panel, so the last 1/4 of the trajectory path is hidden.  I like the key concepts at the bottom, but let's get rid of the Theoretical values area and the Air Resistance Effects area (when the Include Air Resistance checkbox is checked). Let's eliminate the velocity vector, it confuses things as it is trying to project a velocity vector in position space. Let's add an option in Include Air Resistance to Compare to no Air resistance case, so that when checked, you see the trajectory of no air resistance plotted along side the air resistance case.  Finally, can you adjust the numbers of the graph in increments of 5 meters so that all the measurements on the axis end in either 0 or 5 at all times?

In [None]:
# Embed the HTML file in an iframe
IFrame(src='html_files/projectile-motionv2.html', width=800, height=600)
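
To judge whether a generated simulator like this gets the physics right, it helps to have a small independent calculation to compare against. The sketch below is not the code behind the HTML file: it is a stand-alone Python version that assumes the usual introductory drag model, F = ½ ρ C A v², directed opposite to the velocity, and integrates with the midpoint method. The function name `simulate_range`, its parameters, and the ball-like drag values in the usage lines are all assumptions made for illustration.

```python
import math


def simulate_range(v0, angle_deg, g=9.81, drag_coeff=0.0, area=0.0,
                   rho=1.225, mass=1.0, dt=1e-3, max_steps=1_000_000):
    """Horizontal range of a projectile launched from ground level.

    Assumed drag model: F = 0.5 * rho * drag_coeff * area * v**2,
    opposite to the velocity. Integrated with the midpoint method.
    """
    k = 0.5 * rho * drag_coeff * area / mass  # drag acceleration per v**2

    def accel(vx, vy):
        speed = math.hypot(vx, vy)
        return -k * speed * vx, -g - k * speed * vy

    theta = math.radians(angle_deg)
    x, y = 0.0, 0.0
    vx, vy = v0 * math.cos(theta), v0 * math.sin(theta)

    for _ in range(max_steps):
        # Midpoint (second-order Runge-Kutta) step.
        ax, ay = accel(vx, vy)
        vx_mid, vy_mid = vx + 0.5 * dt * ax, vy + 0.5 * dt * ay
        ax_mid, ay_mid = accel(vx_mid, vy_mid)
        x_new, y_new = x + dt * vx_mid, y + dt * vy_mid
        vx, vy = vx + dt * ax_mid, vy + dt * ay_mid

        if y_new < 0.0:
            # Linear interpolation to the ground crossing.
            frac = y / (y - y_new)
            return x + frac * (x_new - x)
        x, y = x_new, y_new

    raise RuntimeError("projectile did not land within max_steps")


v0, angle = 20.0, 45.0
analytic = v0**2 * math.sin(2 * math.radians(angle)) / 9.81  # no-drag range
vacuum = simulate_range(v0, angle)
# Hypothetical ball-like parameters, for illustration only.
with_drag = simulate_range(v0, angle, drag_coeff=0.47, area=0.0042, mass=0.145)
print(f"analytic {analytic:.3f} m, no drag {vacuum:.3f} m, drag {with_drag:.3f} m")
```

The no-drag run should reproduce the textbook range formula, which is the limiting-case check. The run with drag should land short of it, which is the same comparison the "Compare to no Air resistance" option displays graphically.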

## Final Thoughts: Using LLMs Wisely

After working through these examples, you've seen both the power and the pitfalls of LLMs for computational physics. Let's solidify some key lessons.

### Discussion Questions

Reflect on the kinematic simulations we just built with Claude:

- **What made this a good use case for an LLM?** (Well-established physics, standard numerical methods, verifiable results)
- **Where did we need to intervene?** (Checking physics reasoning, verifying conservation laws, testing edge cases)
- **When would you NOT want to use an LLM?** Consider:
  - Novel research algorithms with no established implementations
  - Problems where subtle physics matters (symplectic integrators, gauge invariance)
  - Situations where you're learning foundational concepts for the first time
  - Code that will be used in published research without thorough independent verification

### The Interpolation vs. Extrapolation Principle

Here's a useful mental model: **LLMs excel at interpolation within established knowledge domains, but fail at extrapolation beyond them.**

**Good at (interpolation)**:
- Combining known physics with established numerical methods
- Translating algorithms from pseudocode or papers into Python
- Connecting concepts you already understand ("Make my Euler solver symplectic")
- Standard problems with well-documented solutions

**Poor at (extrapolation)**:
- Cutting-edge research methods with limited documentation
- Novel combinations that require genuine physical insight
- Problems at the frontier of your own knowledge (dangerous territory!)
- Subtle correctness issues that require deep expertise

### Can Using LLMs Be Dangerous?

*Example:* I once asked ChatGPT for pandas code to wrangle messy data. It confidently returned code using function parameters that *don't exist*: plausible-sounding options that were completely fabricated. The code looked professional and used correct syntax, but failed on execution.
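
A fabricated keyword argument can often be caught before running anything by inspecting the real function's signature. The sketch below uses only the standard library; `unknown_keywords` is a hypothetical helper written for this illustration, and `interpolation` is a deliberately made-up option for `statistics.quantiles`, of the kind an LLM might invent.

```python
import inspect
import statistics


def unknown_keywords(func, **kwargs):
    """Return the keyword names in kwargs that func's signature does not list."""
    params = inspect.signature(func).parameters
    # If the function takes a **kwargs catch-all, any keyword is accepted
    # at call time, so the signature alone cannot expose a fabricated option.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return [name for name in kwargs if name not in params]


# `n` and `method` are real parameters of statistics.quantiles;
# `interpolation` is made up for this example.
print(unknown_keywords(statistics.quantiles, n=4, interpolation="linear"))
# → ['interpolation']
```

Functions that forward `**kwargs` to other functions cannot be checked this way. For those, the documentation (or `help(func)`) is the place to confirm that an option exists.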

In programming, hallucinations like this tend to fail fast: Python throws an error, you catch it, you fix it. **But what about physics?**

Venturing beyond your own knowledge or understanding is where the danger lies. Consider these scenarios:
- An LLM suggests a numerical integration method that's subtly wrong but gives plausible results
- It implements boundary conditions that violate conservation laws but only matter over long timescales
- It recommends a perturbation theory approach that's formally invalid for your regime, but you don't know enough to catch it

**The difference**: In programming, invalid code usually fails immediately. In physics, an invalid method can give plausible results at first and fail silently. Your plots might look reasonable even when the physics is deeply wrong.
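
A small, concrete instance of silent failure is the explicit Euler method applied to a harmonic oscillator. The code runs without error and the first few oscillations look fine, yet the energy grows by a factor of (1 + dt²) on every step. The sketch below compares it with the symplectic (semi-implicit) Euler variant for a unit-mass, unit-frequency oscillator; the step size and step count are arbitrary choices for illustration.

```python
def final_energy(step, n_steps=10_000, dt=0.01):
    """Integrate a unit-mass, unit-frequency oscillator; return its final energy."""
    x, v = 1.0, 0.0  # initial energy is 0.5
    for _ in range(n_steps):
        x, v = step(x, v, dt)
    return 0.5 * v**2 + 0.5 * x**2


def explicit_euler(x, v, dt):
    # Both updates use the old state.
    return x + dt * v, v - dt * x


def symplectic_euler(x, v, dt):
    # Update v first, then use the new v to update x.
    v_new = v - dt * x
    return x + dt * v_new, v_new


energy_explicit = final_energy(explicit_euler)
energy_symplectic = final_energy(symplectic_euler)
print("explicit Euler  :", energy_explicit)
print("symplectic Euler:", energy_symplectic)
```

The two step functions differ by a single line, and neither raises an error. Only the energy check reveals that one of them has more than doubled the energy of a system that should conserve it.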

### Your Responsibility

You are learning to be computational physicists. That means:

1. **Understand before you implement**: If you can't derive or explain the method, don't use LLM-generated code for it yet
2. **Verify against physics**: Conservation laws, symmetries, limiting cases, analytical solutions
3. **Test, test, test**: Convergence studies, stability analysis, comparison with established codes
4. **Stay within your knowledge boundary**: When learning new physics, use LLMs sparingly—build intuition first
5. **Be honest in citations**: If an LLM helped, say so and describe how
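
As one example of what "test, test, test" can look like in practice, a convergence study checks that the error shrinks at the rate the method promises. The sketch below uses explicit Euler on the test problem dy/dt = -y with y(0) = 1, chosen here only because its exact solution is known. A first-order method should roughly halve its error each time the step size is halved.

```python
import math


def euler_decay(n_steps, t_end=1.0):
    """Explicit Euler for dy/dt = -y with y(0) = 1; returns y(t_end)."""
    dt = t_end / n_steps
    y = 1.0
    for _ in range(n_steps):
        y += dt * (-y)
    return y


exact = math.exp(-1.0)
step_counts = [50, 100, 200, 400]  # each entry halves the step size
errors = [abs(euler_decay(n) - exact) for n in step_counts]

# Observed order: log2 of the error ratio each time the step size is halved.
orders = [math.log2(errors[i] / errors[i + 1]) for i in range(len(errors) - 1)]
print(orders)
```

Observed orders close to 1 are consistent with a correctly implemented first-order method. An LLM-generated integrator that claims higher order but fails this kind of test has a bug, however reasonable its plots look.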

### The Bottom Line

LLMs are powerful tools that can accelerate your work and help you learn, provided you use them with appropriate skepticism and verification. They're collaborators, not oracles. The judgment, physical insight, and responsibility for verification are always yours.

As you use these tools throughout the course, keep asking: *"How do I know this is correct?"* If your only answer is "the LLM said so," you're not done yet.