# Session 2: Working with LLM Coding Agents

## Overview

This assignment explores how large language models function as coding agents within integrated development environments. You'll work with climate emissions data to understand both the capabilities and limitations of LLM-assisted data analysis, with particular attention to how the execution environment shapes what's possible.

## Learning Objectives

Through structured exercises with the OWID CO2 dataset, you will:

- **Distinguish** between chat-based LLM interfaces and coding agents with local execution capabilities
- **Develop** fluency in specifying data operations through natural language while understanding the underlying computational patterns
- **Evaluate** the boundary between tasks coding agents handle autonomously versus those requiring human guidance
- **Assess** when programmatic approaches offer advantages over manual analysis methods

## The Execution Environment Question

The same foundation model—Claude Sonnet 4.5, GPT-4, or similar—exhibits markedly different capabilities depending on its execution context. A browser-based chat interface can generate syntactically correct code but lacks the infrastructure to execute, test, and refine it. The model generates a response and moves on.

In contrast, a coding agent operating within VS Code has access to a Python interpreter, your installed packages, the filesystem, and a terminal. This enables an iterative cycle: execute code, parse errors or unexpected output, modify the approach, and re-execute. The model maintains context across this loop, accumulating information about what works in your specific environment.

This architectural difference—not model sophistication—determines whether you can request "analyze emissions trends by region" and receive working results, or whether you must manually execute generated code, diagnose failures, and prompt for corrections.
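
The execute, parse, modify, re-execute cycle described above can be sketched in a few lines. This is an illustration only, not how any particular agent is implemented: the `attempts` list is a hypothetical stand-in for successive model outputs, and `run_attempt` is a helper invented for this sketch.

```python
import traceback


def run_attempt(source: str) -> tuple[bool, str]:
    """Execute a code string and return (succeeded, feedback text)."""
    namespace: dict = {}
    try:
        exec(source, namespace)
    except Exception:
        # The final traceback line is the kind of signal an agent reads before revising.
        return False, traceback.format_exc().strip().splitlines()[-1]
    return True, f"result = {namespace.get('result')!r}"


# Hypothetical stand-ins for successive model outputs: the first attempt
# uses a misspelled name, the second corrects it.
attempts = [
    "result = totl + 1",
    "total = 41\nresult = total + 1",
]

history = []
for source in attempts:
    ok, feedback = run_attempt(source)
    history.append((ok, feedback))
    if ok:
        break  # stop iterating once the code runs cleanly

for ok, feedback in history:
    print("ok" if ok else "failed", "|", feedback)
```

A chat interface stops after producing the first string in `attempts`; the loop around `run_attempt` is what the local execution environment adds.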

## Working with OWID Climate Data

Our World in Data maintains a comprehensive CO2 emissions dataset with country-level historical data, fuel source breakdowns, and economic indicators. The dataset's accessible format (direct CSV URL, documented schema, standard tabular structure) makes it appropriate for examining how coding agents handle common analytical workflows: filtering, aggregation, reshaping, and visualization.

Your role is to specify analyses in natural language and observe how the coding agent approaches each task. Note what it handles autonomously, where it requires clarification, what libraries it selects, and how it recovers from errors.

---
## Part 0: Setup

Import the necessary libraries for data manipulation and visualization. Request this from your coding agent and observe library selection.

In [47]:
# Setup - import libraries


---
## Part 1: Loading and Initial Exploration

### Task 1.1: Load and Explore

Load the OWID CO2 dataset: `https://github.com/owid/co2-data/raw/master/owid-co2-data.csv`

Request from your coding agent:
1. Load the data using an appropriate library
2. Report dimensions (rows × columns)
3. Identify temporal coverage
4. Display sample rows
5. List emissions-related columns
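
For orientation, the five requests above correspond roughly to the operations below. This is a minimal sketch assuming pandas; the inline CSV is a made-up stand-in so the example runs offline, and matching emissions columns by the substring `co2` is an assumption about the naming scheme, not something the task guarantees.

```python
import io

import pandas as pd

# Tiny made-up stand-in for the real file so the sketch runs offline.
# With network access, pass the dataset URL from the task to pd.read_csv instead.
csv_text = """country,year,iso_code,population,co2,coal_co2
Country A,1990,AAA,100,10.0,4.0
Country A,2022,AAA,120,12.5,3.0
Country B,2022,BBB,50,,1.0
"""
df = pd.read_csv(io.StringIO(csv_text))

n_rows, n_cols = df.shape  # dimensions
first_year, last_year = int(df["year"].min()), int(df["year"].max())  # temporal coverage
# Assumption: emissions-related columns can be recognised by name.
co2_columns = [c for c in df.columns if "co2" in c]

print(f"{n_rows} rows x {n_cols} columns, years {first_year}-{last_year}")
print(df.head(2))  # sample rows
print(co2_columns)
```

Your agent may pick a different library or phrase these steps differently; comparing its choices against a baseline like this is part of the exercise.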

In [42]:
# Task 1.1 - Your code here


### Task 1.2: Filtering and Ranking

Identify the top 5 CO2 emitting countries (not aggregate regions) in 2022.

**Note:** The dataset includes aggregate entities (World, continents, income groups). The coding agent needs to distinguish actual countries from these aggregates. Observe whether it infers this from ISO codes or requires explicit guidance.
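
One way an agent might separate countries from aggregates is sketched below. It assumes that real countries carry a three-letter `iso_code` while aggregate rows have a missing or differently shaped code; check that assumption against the actual data before relying on it. The rows and values are made up for illustration.

```python
import pandas as pd

# Made-up rows that mimic the shape of the data; values are not real emissions.
df = pd.DataFrame({
    "country": ["World", "Asia", "Country A", "Country B", "Country C"],
    "iso_code": [None, None, "AAA", "BBB", "CCC"],
    "year": [2022] * 5,
    "co2": [1000.0, 600.0, 300.0, 120.0, 80.0],
})

# Assumption: a valid country has exactly three uppercase letters as its ISO code.
is_country = df["iso_code"].str.fullmatch(r"[A-Z]{3}", na=False)

# Top 2 here to keep the toy example small; the task asks for the top 5.
top = df[is_country & (df["year"] == 2022)].nlargest(2, "co2")
print(top[["country", "co2"]])
```

Without the `is_country` mask, `World` and `Asia` would occupy the top of the ranking, which is the failure mode this task is designed to surface.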

In [43]:
# Task 1.2 - Your code here


### Task 1.3: Time Series Visualization

Create a line chart showing CO2 emissions trajectories for the top 5 emitters (1990-2022).

**Expected observation:** China overtaking the US around 2005-2006.

**What to notice:** Does the agent select an appropriate visualization library? Handle missing data? Create informative labels?

In [48]:
# Task 1.3 - Your code here


### Task 1.4: Compositional Analysis

For the top 5 countries in 2022, visualize emissions breakdown by source:
- Coal (`coal_co2`)
- Oil (`oil_co2`)
- Gas (`gas_co2`)
- Cement (`cement_co2`)

Use a stacked bar chart or equivalent compositional visualization.

**Data reshaping challenge:** Depending on the plotting library, this may require reshaping from wide format (separate columns per source) to long format (source as a variable), an operation often called melting or unpivoting. The coding agent should handle this transformation autonomously.
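
The wide-to-long transformation can be sketched with `DataFrame.melt`, assuming pandas. The numbers are made up, and the `share` column is an optional extra added here to show how a normalized composition could be derived.

```python
import pandas as pd

# Made-up values in the wide layout the task describes: one column per source.
wide = pd.DataFrame({
    "country": ["Country A", "Country B"],
    "coal_co2": [50.0, 10.0],
    "oil_co2": [30.0, 20.0],
    "gas_co2": [15.0, 25.0],
    "cement_co2": [5.0, 2.0],
})

sources = ["coal_co2", "oil_co2", "gas_co2", "cement_co2"]

# Long layout: one row per (country, source) pair.
long = wide.melt(
    id_vars="country",
    value_vars=sources,
    var_name="source",
    value_name="emissions",
)

# Optional: each source's share of the country's total across these four sources.
long["share"] = long["emissions"] / long.groupby("country")["emissions"].transform("sum")
print(long.sort_values(["country", "source"]))
```

Grammar-of-graphics libraries generally expect the long layout, whereas pandas' own plotting can stack the wide layout directly, so an agent that skips the reshape is not necessarily wrong.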

In [49]:
# Task 1.4 - Your code here


---
## Part 2: Advanced Operations

### Task 2.1: Aggregation by Grouping Variable

Calculate average per capita CO2 emissions by continent for 2022. Return results sorted descending.

**Key operation:** This requires `groupby()` with aggregation. The coding agent should:
1. Determine how continent membership is represented in the schema
2. Filter to 2022 and valid per capita values
3. Compute mean by group
4. Handle missing data appropriately
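
The steps above can be sketched as follows, assuming pandas. The `continent` column in this toy frame is hypothetical, as are the group labels and values; how the real dataset encodes continents is something to verify as part of step 1.

```python
import pandas as pd

# Made-up data; the `continent` column is a hypothetical grouping variable.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E"],
    "continent": ["North", "North", "South", "South", "South"],
    "year": [2022, 2022, 2022, 2022, 2021],
    "co2_per_capita": [10.0, 6.0, 2.0, None, 99.0],
})

# Filter to 2022 and drop rows without a per capita value.
subset = df[df["year"] == 2022].dropna(subset=["co2_per_capita"])

# Mean by group, sorted descending.
avg = (
    subset.groupby("continent")["co2_per_capita"]
    .mean()
    .sort_values(ascending=False)
)
print(avg)
```

Note that this is an unweighted mean across countries, so a small country counts as much as a large one. Whether that matches the intent of "average per capita emissions by continent" is worth raising with your agent.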

In [50]:
# Task 2.1 - Your code here



### Task 2.2: Percentage Change Calculation

Identify countries with:
1. Five largest emission increases (2010→2022, percentage)
2. Five largest emission decreases (2010→2022, percentage)

Visualize both groups for comparison.

**Complexity:** Requires joining data across years, computing percentage change, handling countries with missing baseline or endpoint data. Observe error handling strategies.
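
A minimal sketch of the calculation, assuming pandas and made-up values: the two years are pivoted into side-by-side columns, rows with a missing endpoint or a zero baseline are dropped, and the percentage change is computed on what remains. Dropping such rows is one possible policy, not the only defensible one.

```python
import pandas as pd

# Made-up data: C has no 2010 value, D has a zero baseline.
df = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "D", "D"],
    "year": [2010, 2022, 2010, 2022, 2022, 2010, 2022],
    "co2": [100.0, 150.0, 80.0, 60.0, 40.0, 0.0, 5.0],
})

# One row per country, one column per year.
by_year = df[df["year"].isin([2010, 2022])].pivot(
    index="country", columns="year", values="co2"
)

# Keep only countries with both endpoints and a positive baseline,
# since division by zero would give an infinite percentage change.
valid = by_year.dropna(subset=[2010, 2022])
valid = valid[valid[2010] > 0]

pct_change = (valid[2022] - valid[2010]) / valid[2010] * 100

# Top 1 each way for the toy data; the task asks for five.
largest_increases = pct_change.nlargest(1)
largest_decreases = pct_change.nsmallest(1)
print(largest_increases)
print(largest_decreases)
```

Countries with very small baselines can dominate a percentage ranking, so it is reasonable to ask the agent whether it applied a minimum-emissions threshold.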

In [44]:
# Task 2.2 - Your code here



### Task 2.3: Faceted Visualization

Create small multiples showing per capita CO2 trends (1990-2022) with one panel per continent.

**Visualization technique:** Faceting (small multiples) allows pattern comparison across categories. The coding agent should select an appropriate library (Altair, matplotlib with subplots, etc.) and configure the layout.

In [45]:
# Task 2.3 - Your code here



---
## Part 3: Independent Analysis

### Task 3.1: Design and Execute Analysis

Select ONE question and complete the analysis:

**Option A:** Emissions Intensity Improvements
- Identify countries with largest reductions in CO2 per unit GDP (2000-2022)
- Visualize as efficiency gains

**Option B:** Coal Transition Analysis
- Find countries that significantly reduced coal's share of total emissions (2010-2022)
- Show before/after fuel composition

**Option C:** Population-Emissions Relationship
- Analyze correlation between population growth and emissions growth
- Create annotated scatter plot highlighting outliers

**Deliverables:**
- Working code that executes without errors
- At least one publication-quality visualization
- Interpretation of findings and limitations

### My Analysis:

**Question I'm investigating:**

[Describe your chosen question here]

In [46]:
# Your analysis code here



### Analysis Summary

**Research question:**

**Methodology:**

**Key findings:**

**Limitations and caveats:**

---
## Part 4: Critical Reflection

### Evaluating Coding Agent Performance

Analyze your experience across the preceding exercises:

**1. Autonomous Completion Rate**  
What proportion of tasks executed correctly on the first attempt? Where did the agent require clarification, correction, or multiple iterations?

**2. Error Recovery Patterns**  
When code failed, document the agent's diagnostic approach. Did it parse error messages effectively? Make appropriate modifications? Or require explicit guidance to identify the problem?

**3. Technical Choices**  
Examine the agent's library selections (ibis, pandas, polars) and data manipulation strategies. Were choices appropriate for the data scale and operation complexity? Could you identify more efficient approaches?

**4. Edge Case Handling**  
How did the agent address missing data, type inconsistencies, or ambiguous specifications? What assumptions did it make, and were they reasonable?

**5. Conceptual Transfer**  
You've worked with groupby aggregation, wide-to-long reshaping, faceted visualization, and percentage change calculations. Can you now explain these patterns to a colleague and recognize when to apply them, independent of specific syntax?

**6. Execution Environment Dependency**  
Which tasks could a browser-based LLM complete versus those requiring local execution? Where precisely does the boundary lie?

**7. Workflow Integration**  
How does this approach to exploratory analysis compare to your current methods? Where do you see coding agents adding value in professional contexts? Where do they introduce friction or uncertainty?

**Your analysis:**

1. 

2. 

3. 

4. 

5. 

6. 

7. 

---
## Submission Guidelines

**Required Components:**
1. Executed notebook with all code cells producing correct output
2. Completed reflection responses in Part 4
3. Original analysis from Part 3 with supporting visualization

**Evaluation Framework:**

*Technical Execution (30%)*  
Code quality, error handling, appropriate method selection, efficient data operations

*Conceptual Understanding (30%)*  
Demonstrated grasp of data manipulation patterns and coding agent capabilities through reflection responses

*Visualization Design (20%)*  
Clear, accurate, publication-ready graphics with appropriate encodings

*Analytical Depth (20%)*  
Part 3 investigation shows thoughtful problem formulation, appropriate scope, and clear interpretation

The assessment focuses on your understanding of how coding agents operate within execution environments, and on your judgment about when to rely on their output and when to scrutinize it. It does not assess independent Python proficiency.