# Day 1 - Afternoon Session Exercises
## Lists, NumPy Arrays, and Pandas DataFrames

**Instructions:**
- Complete exercises appropriate to your skill level
- Experiment and modify the code
- Ask questions if you get stuck!

---

## Exercise 1.4: Lists and Dictionaries (30 min)

### Physics Context
Detectors record "hits" when particles pass through sensor layers. Each hit has a position in space.

### Beginner Version
Work with detector hit positions.

In [None]:
import math

# List of hit positions as (x, y) tuples in meters
hits = [
    (0.5, 0.3),
    (1.2, 0.8),
    (0.2, 1.5),
    (2.0, 0.5),
    (0.8, 1.1)
]

# TODO: Calculate distance from origin for each hit
# Distance formula: r = sqrt(x² + y²)

distances = []  # Store results here

# Method 1: Using a loop
for hit in hits:
    x, y = hit  # Unpack tuple
    # YOUR CODE HERE : compute r 
    # YOUR CODE HERE : store r in distances


print("Distances from origin:")
for i, d in enumerate(distances):
    print(f"Hit {i}: {d:.3f} m")

Distances from origin:


<details>
<summary>💡 Click to reveal solution</summary>

```python
distances = []

for hit in hits:
    x, y = hit
    r = math.sqrt(x**2 + y**2)
    distances.append(r)

print("Distances from origin:")
for i, d in enumerate(distances):
    print(f"Hit {i}: {d:.3f} m")
```

</details>

In [None]:
# Method 2: Using list comprehension (more Pythonic)
# YOUR CODE HERE

print("\nDistances (list comprehension):")
print(distances)

<details>
<summary>💡 Click to reveal solution</summary>

```python
# Using list comprehension
distances = [math.sqrt(x**2 + y**2) for x, y in hits]

print("\nDistances (list comprehension):")
print(distances)
```

</details>

In [None]:
# TODO: Find the hit closest to the origin
min_distance = # YOUR CODE HERE (use min())
min_index = # YOUR CODE HERE (find index of minimum)
closest_hit = hits[min_index]

print(f"\nClosest hit: {closest_hit} at distance {min_distance:.3f} m")

<details>
<summary>💡 Click to reveal solution</summary>

```python
min_distance = min(distances)
min_index = distances.index(min_distance)
closest_hit = hits[min_index]

print(f"\nClosest hit: {closest_hit} at distance {min_distance:.3f} m")
```

</details>
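
As a variation on the solution above, `math.hypot` computes sqrt(x² + y²) directly, and `min` with a `key` function finds the closest hit without a separate `index` lookup. This is only a sketch: the `hits` list repeats the values from the exercise, and the name `closest_index` is illustrative.

```python
import math

# Same hit positions as in the exercise, (x, y) in meters
hits = [(0.5, 0.3), (1.2, 0.8), (0.2, 1.5), (2.0, 0.5), (0.8, 1.1)]

# math.hypot(x, y) returns sqrt(x**2 + y**2)
distances = [math.hypot(x, y) for x, y in hits]

# Pick the index whose distance is smallest
closest_index = min(range(len(hits)), key=lambda i: distances[i])
closest_hit = hits[closest_index]

print(f"Closest hit: {closest_hit} at distance {distances[closest_index]:.3f} m")
```

Keeping the index, rather than only the minimum value, makes it easy to look up the matching entry in any parallel list.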

### Advanced Version
Work with detector geometry using nested dictionaries.

In [None]:
import math

# TODO: Create a nested dictionary for detector geometry
detector = {
    'barrel': {
        'layers': [
            {'name': 'Inner', 'radius': 0.5, 'z_max': 2.0},
            {'name': 'Middle', 'radius': 1.0, 'z_max': 2.5},
            {'name': 'Outer', 'radius': 1.5, 'z_max': 3.0}
        ]
    },
    'endcap': {
        'disks': [
            {'name': 'EC1', 'z': 2.5, 'r_min': 0.3, 'r_max': 1.5},
            {'name': 'EC2', 'z': 3.5, 'r_min': 0.3, 'r_max': 1.5}
        ]
    }
}

# Print detector info
print("Detector Configuration:")
print(f"Number of barrel layers: {len(detector['barrel']['layers'])}")
print(f"Number of endcap disks: {len(detector['endcap']['disks'])}")

In [None]:
# TODO: Implement coordinate transformations

def cartesian_to_cylindrical(x, y, z):
    """
    Convert Cartesian (x, y, z) to cylindrical (r, phi, z) coordinates.
    
    Parameters:
    -----------
    x, y, z : float
        Cartesian coordinates
    
    Returns:
    --------
    tuple : (r, phi, z) in cylindrical coordinates
        r: radial distance from z-axis
        phi: azimuthal angle in radians [-π, π]
        z: same as input z
    """
    r = # YOUR CODE HERE (sqrt(x² + y²))
    phi = # YOUR CODE HERE (use math.atan2(y, x))
    return r, phi, z

def cylindrical_to_cartesian(r, phi, z):
    """
    Convert cylindrical (r, phi, z) to Cartesian (x, y, z) coordinates.
    """
    x = # YOUR CODE HERE
    y = # YOUR CODE HERE
    return x, y, z

# Test transformations
x, y, z = 1.0, 1.0, 2.0
r, phi, z_cyl = cartesian_to_cylindrical(x, y, z)
print(f"Cartesian ({x}, {y}, {z}) → Cylindrical ({r:.3f}, {phi:.3f}, {z_cyl})")

x2, y2, z2 = cylindrical_to_cartesian(r, phi, z_cyl)
print(f"Back to Cartesian: ({x2:.3f}, {y2:.3f}, {z2:.3f})")

<details>
<summary>💡 Click to reveal solution</summary>

```python
def cartesian_to_cylindrical(x, y, z):
    """
    Convert Cartesian (x, y, z) to cylindrical (r, phi, z) coordinates.
    """
    r = math.sqrt(x**2 + y**2)
    phi = math.atan2(y, x)
    return r, phi, z

def cylindrical_to_cartesian(r, phi, z):
    """
    Convert cylindrical (r, phi, z) to Cartesian (x, y, z) coordinates.
    """
    x = r * math.cos(phi)
    y = r * math.sin(phi)
    return x, y, z

# Test transformations
x, y, z = 1.0, 1.0, 2.0
r, phi, z_cyl = cartesian_to_cylindrical(x, y, z)
print(f"Cartesian ({x}, {y}, {z}) → Cylindrical ({r:.3f}, {phi:.3f}, {z_cyl})")

x2, y2, z2 = cylindrical_to_cartesian(r, phi, z_cyl)
print(f"Back to Cartesian: ({x2:.3f}, {y2:.3f}, {z2:.3f})")
```

</details>
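
The single test point above lies in the first quadrant, where `math.atan(y / x)` would give the same answer as `math.atan2(y, x)`. The sketch below runs a round trip on one point per quadrant to show that `atan2` keeps the sign information. The compact function bodies and the `points` list are illustrative, not part of the exercise.

```python
import math

def cartesian_to_cylindrical(x, y, z):
    # atan2 uses the signs of both arguments, so phi covers [-pi, pi]
    return math.hypot(x, y), math.atan2(y, x), z

def cylindrical_to_cartesian(r, phi, z):
    return r * math.cos(phi), r * math.sin(phi), z

# One test point per quadrant of the x-y plane
points = [(1.0, 1.0, 2.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, -2.0), (1.0, -1.0, 0.5)]

for point in points:
    r, phi, z = cartesian_to_cylindrical(*point)
    back = cylindrical_to_cartesian(r, phi, z)
    ok = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(point, back))
    print(f"{point} -> phi = {phi:+.3f} rad, round trip ok: {ok}")
```

Comparing with `math.isclose` rather than `==` allows for floating-point rounding in the trigonometric functions.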

In [None]:
# TODO: Determine which detector layer each hit corresponds to

# Simulated hits in Cartesian coordinates
hits_3d = [
    (0.4, 0.3, 1.0),   # Should be in Inner layer (r = 0.50 m)
    (0.7, 0.7, 1.5),   # Should be in Middle layer (r = 0.99 m)
    (1.2, 0.9, 2.0),   # Should be in Outer layer (r = 1.50 m)
    (0.5, 0.3, 2.5),   # Should be in endcap disk EC1 (z = 2.5 m, r = 0.58 m)
]

def identify_layer(x, y, z, detector_config):
    """
    Identify which detector layer a hit belongs to.
    
    Returns: tuple (region, layer_name) or None if outside acceptance
    """
    r, phi, z_pos = cartesian_to_cylindrical(x, y, z)
    
    # Check barrel layers
    for layer in detector_config['barrel']['layers']:
        # YOUR CODE HERE
        # Check if r is close to layer radius and |z| < z_max
        # Use tolerance for radius matching (e.g., ±0.05 m)
        pass
    
    # Check endcap disks
    for disk in detector_config['endcap']['disks']:
        # YOUR CODE HERE
        # Check if |z| is close to disk z and r_min < r < r_max
        pass
    
    return None  # Outside detector acceptance

# Test the function
for i, (x, y, z) in enumerate(hits_3d):
    result = identify_layer(x, y, z, detector)
    print(f"Hit {i} at ({x}, {y}, {z}): {result}")

<details>
<summary>💡 Click to reveal solution</summary>

```python
def identify_layer(x, y, z, detector_config):
    """
    Identify which detector layer a hit belongs to.
    
    Returns: tuple (region, layer_name) or None if outside acceptance
    """
    r, phi, z_pos = cartesian_to_cylindrical(x, y, z)
    
    # Check barrel layers
    for layer in detector_config['barrel']['layers']:
        radius_tolerance = 0.05
        if abs(r - layer['radius']) < radius_tolerance and abs(z_pos) < layer['z_max']:
            return ('barrel', layer['name'])
    
    # Check endcap disks
    for disk in detector_config['endcap']['disks']:
        z_tolerance = 0.1
        if abs(abs(z_pos) - disk['z']) < z_tolerance:
            if disk['r_min'] < r < disk['r_max']:
                return ('endcap', disk['name'])
    
    return None

# Test the function
for i, (x, y, z) in enumerate(hits_3d):
    result = identify_layer(x, y, z, detector)
    print(f"Hit {i} at ({x}, {y}, {z}): {result}")
```

</details>
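
When `identify_layer` returns `None`, it helps to see how far a hit's radius is from the nearest barrel layer. The helper below is a hypothetical debugging aid, not part of the exercise: `nearest_barrel_layer` and `barrel_radii` are illustrative names, and the two test points are made up.

```python
import math

# Barrel radii copied from the detector dictionary above, in meters
barrel_radii = {'Inner': 0.5, 'Middle': 1.0, 'Outer': 1.5}
radius_tolerance = 0.05  # same tolerance as the solution

def nearest_barrel_layer(x, y):
    """Return (layer_name, gap) for the barrel layer closest in radius."""
    r = math.hypot(x, y)
    name = min(barrel_radii, key=lambda k: abs(r - barrel_radii[k]))
    return name, abs(r - barrel_radii[name])

# Illustrative points: the first sits on the Inner layer, the second falls between layers
for x, y in [(0.4, 0.3), (0.6, 0.5)]:
    name, gap = nearest_barrel_layer(x, y)
    status = "matches" if gap < radius_tolerance else "misses"
    print(f"({x}, {y}): nearest layer {name}, gap {gap:.3f} m, {status}")
```

A hit that misses every layer by more than the tolerance is reported as outside acceptance, so printing the gap shows quickly whether the tolerance or the hit position is the cause.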

---
## Exercise 1.5: Advanced NumPy Operations (50 min)

### Import Libraries

In [3]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

### Beginner Version
Load and analyze simulated detector data.

In [None]:
# Generate simulated detector data (normally you'd load from file)
np.random.seed(42)
n_events = 10000

# Simulate particle properties
energies = np.random.exponential(scale=30, size=n_events)  # Exponential spectrum
eta = np.random.uniform(-2.5, 2.5, n_events)  # Pseudorapidity
phi = np.random.uniform(-np.pi, np.pi, n_events)  # Azimuthal angle

# Add some detector noise
energies += np.random.normal(0, 2, n_events)
energies = np.maximum(energies, 0)  # Ensure positive energies

print(f"Generated {n_events} events")
print(f"Energy range: {energies.min():.2f} - {energies.max():.2f} GeV")

In [None]:
# TODO: Apply energy threshold cuts using boolean masks

# Define cut
energy_threshold = 20.0  # GeV

# Create boolean mask
passes_cut = # YOUR CODE HERE

# Apply cut
energies_selected = energies[passes_cut]
eta_selected = # YOUR CODE HERE
phi_selected = # YOUR CODE HERE

print(f"\nEvents passing E > {energy_threshold} GeV: {np.sum(passes_cut)}")
print(f"Efficiency: {np.sum(passes_cut) / len(energies) * 100:.1f}%")

<details>
<summary>💡 Click to reveal solution</summary>

```python
# Define cut
energy_threshold = 20.0  # GeV

# Create boolean mask
passes_cut = energies > energy_threshold

# Apply cut
energies_selected = energies[passes_cut]
eta_selected = eta[passes_cut]
phi_selected = phi[passes_cut]

print(f"\nEvents passing E > {energy_threshold} GeV: {np.sum(passes_cut)}")
print(f"Efficiency: {np.sum(passes_cut) / len(energies) * 100:.1f}%")
```

</details>
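
Boolean masks can also be combined. The sketch below applies an energy cut and a pseudorapidity cut together, using freshly generated illustrative arrays rather than the exercise data. The 1.0 bound on |eta| is an arbitrary choice for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed
energies = rng.exponential(scale=30, size=1000)
eta = rng.uniform(-2.5, 2.5, size=1000)

# Each comparison needs its own parentheses because & and | bind tighter than > and <
central_and_hard = (energies > 20.0) & (np.abs(eta) < 1.0)
everything_else = (energies <= 20.0) | (np.abs(eta) >= 1.0)

print(f"Central, high-energy events: {central_and_hard.sum()}")
print(f"Everything else:             {everything_else.sum()}")
```

Use `&`, `|`, and `~` for element-wise logic on arrays. The Python keywords `and`, `or`, and `not` raise an error on arrays with more than one element.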

In [None]:
# TODO: Calculate statistics before and after cuts

def print_statistics(data, label):
    """Print statistics for an array."""
    print(f"\n{label}:")
    print(f"  Mean:   {np.mean(data):.2f}")
    print(f"  Std:    {np.std(data):.2f}")
    print(f"  Median: {np.median(data):.2f}")
    print(f"  Min:    {np.min(data):.2f}")
    print(f"  Max:    {np.max(data):.2f}")

print_statistics(energies, "All Events")
print_statistics(energies_selected, f"E > {energy_threshold} GeV")

In [None]:
# TODO: Plot distributions before and after cuts

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Energy distribution
axes[0].hist(energies, bins=50, alpha=0.5, label='All events', range=(0, 150))
axes[0].hist(energies_selected, bins=50, alpha=0.5, label=f'E > {energy_threshold} GeV', range=(0, 150))
axes[0].axvline(energy_threshold, color='red', linestyle='--', label='Threshold')
axes[0].set_xlabel('Energy (GeV)')
axes[0].set_ylabel('Events')
axes[0].legend()
axes[0].set_yscale('log')

# Eta distribution
# YOUR CODE HERE (similar to above)

# Phi distribution
# YOUR CODE HERE

plt.tight_layout()
plt.show()

<details>
<summary>💡 Click to reveal solution</summary>

```python
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Energy distribution
axes[0].hist(energies, bins=50, alpha=0.5, label='All events', range=(0, 150))
axes[0].hist(energies_selected, bins=50, alpha=0.5, label=f'E > {energy_threshold} GeV', range=(0, 150))
axes[0].axvline(energy_threshold, color='red', linestyle='--', label='Threshold')
axes[0].set_xlabel('Energy (GeV)')
axes[0].set_ylabel('Events')
axes[0].legend()
axes[0].set_yscale('log')

# Eta distribution
axes[1].hist(eta, bins=50, alpha=0.5, label='All events')
axes[1].hist(eta_selected, bins=50, alpha=0.5, label=f'E > {energy_threshold} GeV')
axes[1].set_xlabel('η (pseudorapidity)')
axes[1].set_ylabel('Events')
axes[1].legend()

# Phi distribution
axes[2].hist(phi, bins=50, alpha=0.5, label='All events')
axes[2].hist(phi_selected, bins=50, alpha=0.5, label=f'E > {energy_threshold} GeV')
axes[2].set_xlabel('φ (azimuthal angle)')
axes[2].set_ylabel('Events')
axes[2].legend()

plt.tight_layout()
plt.show()
```

</details>

### Advanced Version
Vectorized calculations and efficiency corrections.

In [None]:
# TODO: Calculate invariant mass for all particle pairs in an event

# Simulate an event with multiple particles
# For simplicity, use (E, px, py, pz) representation
np.random.seed(123)
n_particles = 5

# Generate 4-vectors
E = np.random.uniform(20, 100, n_particles)
px = np.random.uniform(-50, 50, n_particles)
py = np.random.uniform(-50, 50, n_particles)
pz = np.random.uniform(-50, 50, n_particles)

# Ensure physical constraint: E² ≥ p²
p2 = px**2 + py**2 + pz**2
E = np.maximum(E, np.sqrt(p2) + 0.1)

print(f"Event with {n_particles} particles")
for i in range(n_particles):
    print(f"  Particle {i}: E={E[i]:.1f}, px={px[i]:.1f}, py={py[i]:.1f}, pz={pz[i]:.1f}")

In [None]:
# Method 1: Using loops (slow, for comparison)
import time

def invariant_mass_loops(E, px, py, pz):
    """Calculate all pair masses using loops."""
    n = len(E)
    masses = []
    pairs = []
    
    for i in range(n):
        for j in range(i + 1, n):  # Avoid double-counting
            E_sum = E[i] + E[j]
            px_sum = px[i] + px[j]
            py_sum = py[i] + py[j]
            pz_sum = pz[i] + pz[j]
            
            p_sum2 = px_sum**2 + py_sum**2 + pz_sum**2
            m2 = E_sum**2 - p_sum2
            
            if m2 >= 0:
                masses.append(np.sqrt(m2))
                pairs.append((i, j))
    
    return np.array(masses), pairs

start = time.perf_counter()  # high-resolution timer, better suited to short intervals
masses_loop, pairs = invariant_mass_loops(E, px, py, pz)
time_loop = time.perf_counter() - start

print(f"\nLoop method: {len(masses_loop)} pairs in {time_loop*1000:.3f} ms")
for (i, j), m in zip(pairs, masses_loop):
    print(f"  Pair ({i}, {j}): M = {m:.2f} GeV/c²")

In [None]:
# Method 2: Vectorized approach (fast!)

def invariant_mass_vectorized(E, px, py, pz):
    """
    Calculate all pair masses using vectorized operations.
    
    Uses broadcasting to compute all pairs at once.
    """
    # YOUR CODE HERE
    # Hint: Use broadcasting with E[:, None] + E[None, :]
    # This creates a matrix of all pair sums
    
    E_sum = # YOUR CODE HERE
    px_sum = # YOUR CODE HERE
    py_sum = # YOUR CODE HERE
    pz_sum = # YOUR CODE HERE
    
    # Calculate invariant mass squared
    p_sum2 = px_sum**2 + py_sum**2 + pz_sum**2
    m2 = E_sum**2 - p_sum2
    
    # Extract upper triangle (avoid double-counting and self-pairs)
    n = len(E)
    i_indices, j_indices = np.triu_indices(n, k=1)
    
    masses = np.sqrt(np.maximum(m2[i_indices, j_indices], 0))
    
    return masses

start = time.perf_counter()  # high-resolution timer, better suited to short intervals
masses_vec = invariant_mass_vectorized(E, px, py, pz)
time_vec = time.perf_counter() - start

print(f"\nVectorized method: {len(masses_vec)} pairs in {time_vec*1000:.3f} ms")
print(f"Speedup: {time_loop/time_vec:.1f}x")
print(f"Results match: {np.allclose(masses_loop, masses_vec)}")

<details>
<summary>💡 Click to reveal solution</summary>

```python
def invariant_mass_vectorized(E, px, py, pz):
    """
    Calculate all pair masses using vectorized operations.
    
    Uses broadcasting to compute all pairs at once.
    """
    # Broadcasting: [:, None] creates column, [None, :] creates row
    E_sum = E[:, None] + E[None, :]
    px_sum = px[:, None] + px[None, :]
    py_sum = py[:, None] + py[None, :]
    pz_sum = pz[:, None] + pz[None, :]
    
    # Calculate invariant mass squared
    p_sum2 = px_sum**2 + py_sum**2 + pz_sum**2
    m2 = E_sum**2 - p_sum2
    
    # Extract upper triangle (avoid double-counting and self-pairs)
    n = len(E)
    i_indices, j_indices = np.triu_indices(n, k=1)
    
    masses = np.sqrt(np.maximum(m2[i_indices, j_indices], 0))
    
    return masses

start = time.perf_counter()  # high-resolution timer, better suited to short intervals
masses_vec = invariant_mass_vectorized(E, px, py, pz)
time_vec = time.perf_counter() - start

print(f"\nVectorized method: {len(masses_vec)} pairs in {time_vec*1000:.3f} ms")
print(f"Speedup: {time_loop/time_vec:.1f}x")
print(f"Results match: {np.allclose(masses_loop, masses_vec)}")
```

</details>
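
To see what the broadcasting step produces, it can help to try it on a tiny array first. The sketch below uses three made-up energies and prints the pair-sum matrix together with the index pairs that `np.triu_indices` selects.

```python
import numpy as np

E = np.array([10.0, 20.0, 30.0])  # illustrative values

column = E[:, None]       # shape (3, 1)
row = E[None, :]          # shape (1, 3)
pair_sums = column + row  # shape (3, 3); entry [i, j] is E[i] + E[j]

# Indices of the strict upper triangle: each unordered pair once, no self-pairs
i_idx, j_idx = np.triu_indices(len(E), k=1)

print(pair_sums)
print(list(zip(i_idx.tolist(), j_idx.tolist())))
print(pair_sums[i_idx, j_idx])
```

The matrix is symmetric and its diagonal holds each particle paired with itself, which is why only the entries above the diagonal (`k=1`) are kept.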

In [None]:
# TODO: Implement detector efficiency corrections

# Create 2D efficiency map (eta vs phi)
eta_bins = np.linspace(-2.5, 2.5, 50)
phi_bins = np.linspace(-np.pi, np.pi, 50)

# Simulate realistic efficiency (lower at edges)
eta_centers = (eta_bins[:-1] + eta_bins[1:]) / 2
phi_centers = (phi_bins[:-1] + phi_bins[1:]) / 2

# Create 2D efficiency map
eta_2d, phi_2d = np.meshgrid(eta_centers, phi_centers, indexing='ij')
efficiency_map = 0.95 * np.exp(-0.1 * eta_2d**2)  # Gaussian in eta

# Visualize efficiency map
plt.figure(figsize=(10, 6))
plt.imshow(efficiency_map, extent=[-np.pi, np.pi, -2.5, 2.5], 
           origin='lower', aspect='auto', cmap='viridis')
plt.colorbar(label='Efficiency')
plt.xlabel('φ (rad)')
plt.ylabel('η')
plt.title('Detector Efficiency Map')
plt.show()

In [None]:
# TODO: Apply efficiency corrections to event weights

def get_efficiency(eta, phi, efficiency_map, eta_bins, phi_bins):
    """
    Look up efficiency for given eta, phi values.
    """
    # YOUR CODE HERE
    # Use np.digitize to find which bin each value falls into
    # Then look up efficiency from the map
    pass

# Apply to our data
event_efficiency = get_efficiency(eta_selected, phi_selected, 
                                   efficiency_map, eta_bins, phi_bins)

# Weight each event by 1/efficiency to correct for losses
weights = 1.0 / event_efficiency

print(f"\nEfficiency statistics:")
print(f"  Mean efficiency: {np.mean(event_efficiency):.3f}")
print(f"  Min efficiency:  {np.min(event_efficiency):.3f}")
print(f"  Max weight:      {np.max(weights):.3f}")

<details>
<summary>💡 Click to reveal solution</summary>

```python
def get_efficiency(eta, phi, efficiency_map, eta_bins, phi_bins):
    """
    Look up efficiency for given eta, phi values.
    """
    # Find bin indices for each value
    eta_idx = np.digitize(eta, eta_bins) - 1
    phi_idx = np.digitize(phi, phi_bins) - 1
    
    # Clip to valid range
    eta_idx = np.clip(eta_idx, 0, len(eta_bins) - 2)
    phi_idx = np.clip(phi_idx, 0, len(phi_bins) - 2)
    
    # Look up efficiency
    return efficiency_map[eta_idx, phi_idx]

# Apply to our data
event_efficiency = get_efficiency(eta_selected, phi_selected, 
                                   efficiency_map, eta_bins, phi_bins)

# Weight each event by 1/efficiency to correct for losses
weights = 1.0 / event_efficiency

print(f"\nEfficiency statistics:")
print(f"  Mean efficiency: {np.mean(event_efficiency):.3f}")
print(f"  Min efficiency:  {np.min(event_efficiency):.3f}")
print(f"  Max weight:      {np.max(weights):.3f}")
```

</details>
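
The `- 1` and `np.clip` steps in the lookup are easier to follow on a small example. The sketch below uses made-up bin edges and values, including some outside the binned range, to show what `np.digitize` returns before and after clipping.

```python
import numpy as np

edges = np.array([0.0, 1.0, 2.0, 3.0])   # 4 edges define 3 bins: indices 0, 1, 2
values = np.array([-0.5, 0.0, 0.5, 2.9, 3.0, 4.2])

# digitize returns 1-based bin numbers, so subtract 1 for array indexing
raw = np.digitize(values, edges) - 1
# Values below the first edge give -1; values at or above the last edge give 3
clipped = np.clip(raw, 0, len(edges) - 2)

print(raw)      # [-1  0  0  2  3  3]
print(clipped)  # [0 0 0 2 2 2]
```

Clipping silently assigns out-of-range values to the nearest edge bin. That is convenient here, but in a real analysis you may prefer to flag or drop such events instead.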

---
## Exercise 1.6: Pandas DataFrames (35 min)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

### Beginner Version
Load and filter collision data.

In [None]:
# Generate simulated collision data
np.random.seed(42)
n_events = 1000

# Create DataFrame
data = pd.DataFrame({
    'event': range(n_events),
    'run': np.random.choice([1, 2, 3], n_events),
    'trigger': np.random.choice([True, False], n_events, p=[0.3, 0.7]),
    'energy': np.random.exponential(40, n_events),
    'px': np.random.normal(0, 20, n_events),
    'py': np.random.normal(0, 20, n_events),
    'eta': np.random.uniform(-2.5, 2.5, n_events),
    'phi': np.random.uniform(-np.pi, np.pi, n_events)
})

print(f"Generated {len(data)} events")
data.head()

In [None]:
# TODO: Filter events by trigger condition

triggered_data = # YOUR CODE HERE

print(f"\nEvents passing trigger: {len(triggered_data)}")
print(f"Trigger efficiency: {len(triggered_data) / len(data) * 100:.1f}%")

<details>
<summary>💡 Click to reveal solution</summary>

```python
triggered_data = data[data['trigger']]  # a boolean column can be used directly as a mask

print(f"\nEvents passing trigger: {len(triggered_data)}")
print(f"Trigger efficiency: {len(triggered_data) / len(data) * 100:.1f}%")
```

</details>

In [None]:
# TODO: Calculate and add transverse momentum column
# pt = sqrt(px² + py²)

data['pt'] = # YOUR CODE HERE

# Also calculate transverse energy
# Et = E / cosh(eta)
data['Et'] = # YOUR CODE HERE

print("\nNew columns added:")
data[['energy', 'px', 'py', 'pt', 'eta', 'Et']].head()

<details>
<summary>💡 Click to reveal solution</summary>

```python
# Calculate transverse momentum
data['pt'] = np.sqrt(data['px']**2 + data['py']**2)

# Calculate transverse energy
data['Et'] = data['energy'] / np.cosh(data['eta'])

print("\nNew columns added:")
data[['energy', 'px', 'py', 'pt', 'eta', 'Et']].head()
```

</details>

In [None]:
# TODO: Create summary statistics

print("\nSummary Statistics:")
print(data[['energy', 'pt', 'Et']].describe())

# Statistics by run
print("\nStatistics by run:")
# YOUR CODE HERE (use groupby)

<details>
<summary>💡 Click to reveal solution</summary>

```python
print("\nSummary Statistics:")
print(data[['energy', 'pt', 'Et']].describe())

# Statistics by run
print("\nStatistics by run:")
print(data.groupby('run')[['energy', 'pt', 'Et']].mean())
```

</details>
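
The solution above reports only the mean per run. If you want several statistics with readable column names, pandas named aggregation is one option. This sketch uses a tiny made-up table rather than the exercise data, and the output column names are illustrative.

```python
import pandas as pd

# Small illustrative table, not the exercise data
df = pd.DataFrame({
    'run': [1, 1, 2, 2, 2],
    'energy': [10.0, 30.0, 20.0, 40.0, 60.0],
})

# Each keyword becomes an output column: name=(input column, aggregation)
summary = df.groupby('run').agg(
    n_events=('energy', 'size'),
    mean_energy=('energy', 'mean'),
    max_energy=('energy', 'max'),
)
print(summary)
```

The result has one row per run and flat column names, so no renaming step is needed afterwards.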

### Advanced Version
Complex multi-DataFrame analysis.

In [None]:
# TODO: Create event-level and particle-level DataFrames

# Event-level info
events = pd.DataFrame({
    'event': range(100),
    'run': np.random.choice([1, 2, 3], 100),
    'luminosity': np.random.uniform(1e33, 2e33, 100),
    'trigger_HLT': np.random.choice([True, False], 100, p=[0.3, 0.7])
})

# Particle-level info (multiple particles per event)
particle_list = []
for event_id in range(100):
    n_particles = np.random.poisson(3)  # Average 3 particles per event
    for _ in range(n_particles):
        particle_list.append({
            'event': event_id,
            'particle_type': np.random.choice(['e', 'mu', 'photon']),
            'energy': np.random.exponential(30),
            'pt': np.random.exponential(20),
            'eta': np.random.uniform(-2.5, 2.5),
            'phi': np.random.uniform(-np.pi, np.pi)
        })

particles = pd.DataFrame(particle_list)

print(f"Events: {len(events)}")
print(f"Particles: {len(particles)}")
print(f"Average particles per event: {len(particles) / len(events):.2f}")

In [None]:
# TODO: Merge DataFrames

# Merge particle data with event info
merged = # YOUR CODE HERE (use pd.merge)

# Apply event-level cuts
merged_triggered = # YOUR CODE HERE (filter by trigger_HLT)

print(f"\nParticles after merging and trigger: {len(merged_triggered)}")
merged_triggered.head(10)

<details>
<summary>💡 Click to reveal solution</summary>

```python
# Merge particle data with event info
merged = pd.merge(particles, events, on='event', how='left')

# Apply event-level cuts
merged_triggered = merged[merged['trigger_HLT'] == True]

print(f"\nParticles after merging and trigger: {len(merged_triggered)}")
merged_triggered.head(10)
```

</details>
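
In the exercise every particle has a matching event, so the left merge loses nothing. With real data that is worth checking. The sketch below uses two small made-up tables, one of which contains a particle with an unknown event ID, and relies on the `validate` and `indicator` options of `pd.merge`.

```python
import pandas as pd

# Illustrative tables; event 5 has no entry in `events`
events = pd.DataFrame({'event': [0, 1, 2], 'trigger_HLT': [True, False, True]})
particles = pd.DataFrame({'event': [0, 0, 2, 5], 'pt': [12.0, 7.5, 30.0, 4.0]})

merged = pd.merge(
    particles, events, on='event', how='left',
    validate='many_to_one',  # raise if an event ID is duplicated in `events`
    indicator=True,          # add a `_merge` column recording where each row matched
)

# Particles whose event ID has no entry in the events table
unmatched = merged[merged['_merge'] == 'left_only']
print(merged)
print(f"Unmatched particles: {len(unmatched)}")
```

Unmatched rows carry `NaN` in `trigger_HLT`, so the column is no longer purely boolean. Comparing with `== True`, as the solution does, still excludes those rows safely.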

In [None]:
# TODO: Calculate event-level quantities using groupby

event_summary = merged_triggered.groupby('event').agg({
    'energy': ['sum', 'mean', 'max'],  # Total, average, max energy
    'pt': 'sum',                        # Total pt
    'particle_type': 'count'            # Number of particles
})

event_summary.columns = ['_'.join(col).strip() for col in event_summary.columns]
event_summary = event_summary.rename(columns={'particle_type_count': 'n_particles'})

print("\nEvent-level summary:")
print(event_summary.head())
print(f"\nAverage particles per triggered event: {event_summary['n_particles'].mean():.2f}")

In [None]:
# TODO: Find leading (highest pT) particle per event

# Method 1: Using groupby and idxmax
leading_particles = merged_triggered.loc[
    merged_triggered.groupby('event')['pt'].idxmax()
]

print("\nLeading particles:")
print(leading_particles[['event', 'particle_type', 'pt', 'eta']].head())

# Plot leading particle pT distribution by type
fig, ax = plt.subplots(figsize=(10, 6))

for ptype in ['e', 'mu', 'photon']:
    subset = leading_particles[leading_particles['particle_type'] == ptype]
    ax.hist(subset['pt'], bins=20, alpha=0.5, label=ptype, range=(0, 100))

ax.set_xlabel('Leading particle pT (GeV/c)')
ax.set_ylabel('Events')
ax.set_title('Leading Particle pT Distribution')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()

In [None]:
# TODO: Handle missing data

# Introduce some missing values
test_data = particles.copy()
mask = np.random.random(len(test_data)) < 0.1
test_data.loc[mask, 'energy'] = np.nan

print(f"\nMissing values:")
print(test_data.isnull().sum())

# YOUR CODE HERE
# Options:
# 1. Drop rows with missing energy
# 2. Fill with median energy
# 3. Fill with the median energy of each particle type

# Method 3 example:
test_data['energy'] = test_data.groupby('particle_type')['energy'].transform(
    lambda x: x.fillna(x.median())
)

print(f"\nAfter filling:")
print(test_data.isnull().sum())
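
The cell above shows only the third option. For comparison, here is a sketch of the first two on a small made-up table, where the names `dropped` and `filled` are illustrative.

```python
import numpy as np
import pandas as pd

# Small illustrative table with two missing energies
df = pd.DataFrame({
    'particle_type': ['e', 'e', 'mu', 'mu', 'photon'],
    'energy': [10.0, np.nan, 30.0, 50.0, np.nan],
})

# Option 1: drop rows where energy is missing
dropped = df.dropna(subset=['energy'])

# Option 2: fill every gap with the overall median (median skips NaN by default)
filled = df.assign(energy=df['energy'].fillna(df['energy'].median()))

print(f"Rows after dropping: {len(dropped)}")
print(filled)
```

Note that the per-type fill leaves a gap unfilled whenever a particle type has no valid energies at all, as `photon` does in this table, because the median of that group is itself `NaN`.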

---
## Bonus: Integration Exercise

Combine everything into a complete analysis pipeline.

In [None]:
# TODO: Create a complete analysis pipeline that:
# 1. Loads data
# 2. Applies quality cuts
# 3. Calculates derived quantities
# 4. Makes summary plots
# 5. Exports results

def analyze_collision_data(data_df, energy_cut=20, eta_cut=2.5):
    """
    Complete analysis pipeline for collision data.
    
    Parameters:
    -----------
    data_df : pd.DataFrame
        Input data
    energy_cut : float
        Minimum energy in GeV
    eta_cut : float
        Maximum |eta|
    
    Returns:
    --------
    pd.DataFrame : Analyzed data with cuts applied
    """
    # YOUR CODE HERE
    pass

# Test the pipeline
# results = analyze_collision_data(data)
# results.to_csv('analysis_results.csv', index=False)

<details>
<summary>💡 Click to reveal solution</summary>

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Generate simulated collision events
np.random.seed(42)
n_events = 1000

# Create event data
events = pd.DataFrame({
    'event': range(n_events),
    'n_particles': np.random.poisson(5, n_events),
    'trigger': np.random.choice([True, False], n_events, p=[0.7, 0.3])
})

# Apply trigger cut
triggered_events = events[events['trigger']]

# Generate particle-level data for triggered events
particle_data = []
for idx, row in triggered_events.iterrows():
    for i in range(row['n_particles']):
        particle_data.append({
            'event': row['event'],
            'energy': np.random.exponential(30),
            'eta': np.random.uniform(-2.5, 2.5),
            'phi': np.random.uniform(-np.pi, np.pi)
        })

particles = pd.DataFrame(particle_data)

# Calculate transverse energy
particles['Et'] = particles['energy'] / np.cosh(particles['eta'])

# Event selection: at least one high-Et particle
good_events = particles[particles['Et'] > 50]['event'].unique()
final_data = particles[particles['event'].isin(good_events)]

print(f"Started with {n_events} events")
print(f"After trigger: {len(triggered_events)} events")
print(f"After Et cut: {len(good_events)} events")
print(f"Final particles: {len(final_data)}")

# Visualize
plt.figure(figsize=(12, 4))

plt.subplot(1, 3, 1)
plt.hist(particles['energy'], bins=50)
plt.xlabel('Energy (GeV)')
plt.ylabel('Particles')

plt.subplot(1, 3, 2)
plt.hist(particles['Et'], bins=50)
plt.xlabel('Et (GeV)')
plt.ylabel('Particles')

plt.subplot(1, 3, 3)
plt.hist2d(particles['eta'], particles['phi'], bins=30)
plt.xlabel('η')
plt.ylabel('φ')
plt.colorbar(label='Particles')

plt.tight_layout()
plt.show()

```

</details>
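
The solution above runs as a script rather than filling in `analyze_collision_data`. A minimal sketch of the function form follows, assuming the input has the `energy`, `px`, `py`, and `eta` columns of the Exercise 1.6 `data` frame. Plotting and CSV export are left out to keep it short, and the `demo` table is made up for illustration.

```python
import numpy as np
import pandas as pd

def analyze_collision_data(data_df, energy_cut=20, eta_cut=2.5):
    """Apply quality cuts and add derived columns (sketch; no plots or export)."""
    # Quality cuts: minimum energy and maximum |eta|
    mask = (data_df['energy'] > energy_cut) & (data_df['eta'].abs() < eta_cut)
    result = data_df[mask].copy()  # copy so new columns do not touch the input

    # Derived quantities
    result['pt'] = np.hypot(result['px'], result['py'])
    result['Et'] = result['energy'] / np.cosh(result['eta'])
    return result

# Tiny illustrative input, not real data
demo = pd.DataFrame({
    'energy': [5.0, 50.0, 80.0],
    'px': [3.0, 3.0, 6.0],
    'py': [4.0, 4.0, 8.0],
    'eta': [0.0, 0.0, 3.0],
})
results = analyze_collision_data(demo)
print(results)
```

Returning a new DataFrame instead of modifying the input lets you rerun the function with different cut values on the same data.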

---
## Summary

Today you learned:

✅ Python **lists** and **dictionaries** for flexible data structures  
✅ Advanced **NumPy** techniques: boolean masking, broadcasting, vectorization  
✅ **Pandas DataFrames** for tabular data analysis  
✅ Realistic particle physics workflows: loading, filtering, grouping, merging  
✅ Performance optimization: vectorized NumPy operations are often 10-100x faster than Python loops on large arrays!  

**Tomorrow:** Functions, classes, and structuring analysis code professionally!

---

**Great work today! 🎉**