# Week 1: Functions and the Language of Scientific Analysis

**SCIE1500 — Analytical Methods for Scientists**

*Act I: Understanding Systems*

---

## Welcome! 🎉

**Don't worry if you've never programmed before.** This lab is designed for beginners. You'll learn by doing, and it's completely okay to:
- Make mistakes (that's how we learn!)
- Ask for help (your demonstrators are here for you)
- Take your time (quality over speed)

---

## What You'll Learn

By the end of this lab, you will be able to:

1. ✅ Define and use Python functions to calculate values
2. ✅ Work with linear and quadratic mathematical functions
3. ✅ Identify the **domain** of a function (important for exams!)
4. ✅ Find the **vertex** of a quadratic function
5. ✅ Load and explore real scientific data with pandas
6. ✅ Create professional visualizations

---

## Exam Alignment

| Notebook Section | Exam Question | Topic |
|-----------------|---------------|-------|
| Part C (Domains) | **MCQ Q10** | Domain identification |
| Part E (Quadratics) | **MCQ Q22** | Quadratic functions |
| Part E.5 (Schaefer) | **Q13** | Fish population model |

---

## Estimated Time

- **In-lab:** Parts A-D (~90 minutes)
- **Take-home:** Parts E-G (~60 minutes)

---

## What to Submit

1. **During lab:** Show Exercise A to your demonstrator
2. **Upload:** Screenshots of completed Exercises A, B, C, and D

---

# Setup: Import Required Libraries

Run this cell first to load all the tools we'll need.

In [None]:
# Standard imports for Week 1
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import math

# Display settings
pd.set_option('display.max_rows', None)
plt.rcParams['figure.figsize'] = [10, 5]

print("✅ Libraries loaded successfully!")
print(f"   NumPy version: {np.__version__}")
print(f"   Pandas version: {pd.__version__}")

---

# PART A: Why Functions Matter — The Plastic Pollution Crisis

Before we dive into Python, let's understand **why** functions are so important in science.

---

## A.1 The Scale of the Problem

Our oceans provide extraordinary benefits:
- Over **50%** of the world's oxygen
- Climate regulation across **70%** of Earth's surface
- **\$282 billion** in economic activity (US alone)
- Food security for billions

Yet these vital systems face a growing threat: **plastic pollution**.

Consider these facts:
- We produce **200 times more plastic** than we did in 1950
- **4.8 to 12.7 million metric tonnes** of plastic entered the oceans in 2010 alone
- Some projections suggest there could be **more plastic than fish** (by weight) by 2050

## A.2 How Scientists Quantify the Problem

Scientists don't just observe—they **quantify**. They use **mathematical functions** to:

1. **Describe** what they measure (e.g., plastic concentration vs. depth)
2. **Model** how systems behave (e.g., population growth)
3. **Predict** future outcomes (e.g., plastic accumulation by 2050)

For example, the Reisser et al. (2013) study used this function to relate surface plastic measurements to total ocean plastic:

$$C_i = \frac{C_s}{1 - e^{-\alpha d}}$$

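where $C_i$ is the total (depth-integrated) concentration, $C_s$ is the surface concentration, $d$ is the sampling depth, and $\alpha$ is a parameter that depends on plastic buoyancy and turbulence.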

**This single equation captures a complex relationship—that's the power of functions!**

Today, you'll learn to work with functions like this one.

---

# PART B: Python Functions Fundamentals

A **Python function** is a reusable block of code that performs a specific task—just like a mathematical function converts inputs to outputs.

---

## B.1 Defining Functions

To define a function in Python:

```python
def function_name(argument1, argument2):
    # Do something
    result = ...
    return result
```

**Key parts:**
- `def`: the keyword that starts a function definition
- `function_name`: the name you use to call the function later
- `argument1, argument2`: the inputs to the function
- `return`: sends the result back to wherever the function was called
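
The same function can be called in several ways. The sketch below uses a hypothetical `rectangle_area` function (not part of the lab) to show positional arguments, keyword arguments, and a default value; later cells in this notebook rely on all three.

```python
# Hypothetical example function, used only to illustrate calling styles
def rectangle_area(width, height=1.0):
    """Return the area of a rectangle; height defaults to 1.0 if omitted."""
    return width * height

area_positional = rectangle_area(3, 4)             # arguments matched by position
area_keyword = rectangle_area(height=4, width=3)   # arguments matched by name, in any order
area_default = rectangle_area(3)                   # height falls back to its default value

print(area_positional, area_keyword, area_default)
```

Keyword arguments are a little longer to type, but they make it obvious which value goes with which input.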

## B.2 Example: Temperature Conversion

Let's create a function to convert Fahrenheit to Celsius:

$$C = \frac{5}{9}(F - 32)$$

In [None]:
# Define the temperature conversion function
def fahrenheit_to_celsius(F):
    """Convert temperature from Fahrenheit to Celsius."""
    C = (F - 32) * (5/9)
    return C

# Test with a single value
print(f"0°F = {fahrenheit_to_celsius(0):.1f}°C")
print(f"32°F = {fahrenheit_to_celsius(32):.1f}°C")
print(f"100°F = {fahrenheit_to_celsius(100):.1f}°C")

In [None]:
# Functions work with arrays too!
F_values = np.array([0, 32, 68, 100, 212])
C_values = fahrenheit_to_celsius(F_values)

print("Fahrenheit:", F_values)
print("Celsius:   ", np.round(C_values, 1))

## ✏️ EXERCISE A: Triangle Area Function

**Task:** Create a function to calculate the area of a triangle.

$$\text{Area} = \frac{1}{2} \times \text{base} \times \text{height}$$

Then use it to calculate:
1. Area when base = 4, height = 6
2. Area when base = 4.5, height = 8
3. Area when base = 6.5, height = 10

**Show this to your demonstrator when complete.**

In [None]:
# EXERCISE A: Define your triangle area function
def triangle_area(base, height):
    """Calculate the area of a triangle."""
    area = 0.5 * base * height
    return area

# Test your function with the three cases
print(f"1. base=4, height=6: Area = {triangle_area(4, 6)}")

# YOUR CODE: Calculate areas for cases 2 and 3
# print(f"2. base=4.5, height=8: Area = {triangle_area(?, ?)}")
# print(f"3. base=6.5, height=10: Area = {triangle_area(?, ?)}")

✅ **Checkpoint Complete!** You've successfully defined your own Python function. This is the same skill used by data scientists around the world.

---

# PART C: Linear Functions and Domains

Now let's connect Python functions to **mathematical functions**.

---

## C.1 Linear Functions: $y = mx + c$

A **linear function** has the form:

$$f(x) = mx + c$$

where:
- $m$ is the **slope** (rate of change)
- $c$ is the **y-intercept**

Linear functions describe **constant rates of change**.
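
A quick way to see the constant rate of change is to evaluate a line at evenly spaced inputs and look at the differences between successive outputs. The sketch below assumes an illustrative line $y = 2x + 5$; the names `x_demo` and `y_demo` are placeholders.

```python
import numpy as np

m, c = 2, 5                       # illustrative slope and intercept
x_demo = np.arange(0, 6)          # evenly spaced inputs: 0, 1, ..., 5
y_demo = m * x_demo + c           # y = 2x + 5

steps = np.diff(y_demo)           # change in y between neighbouring x values
print("y values:", y_demo)
print("changes: ", steps)         # every step equals the slope m
```

Because the inputs are one unit apart, each change in `y_demo` equals the slope.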

In [None]:
# Define a general linear function
def linear_fun(m, c, x):
    """Calculate y = mx + c for given slope m, intercept c, and input x."""
    y = m * x + c
    return y

# Example: y = 2x + 5
x_values = np.linspace(-4, 4, 9)
y_values = linear_fun(m=2, c=5, x=x_values)

print("x values:", x_values)
print("y values:", y_values)

## C.2 Domain and Range (⚠️ Important for Exams!)

### What is a Function?

A **function** $f$ is a rule that assigns to each input $x$ **exactly one** output $f(x)$.

### Domain and Range

- **Domain**: The set of all valid **inputs** (x-values)
- **Range**: The set of all possible **outputs** (y-values)

### Common Domain Restrictions

| Expression | Restriction | Example |
|------------|-------------|----------|
| $\frac{1}{g(x)}$ | $g(x) \neq 0$ | $\frac{1}{x-2}$: $x \neq 2$ |
| $\sqrt{g(x)}$ | $g(x) \geq 0$ | $\sqrt{x-3}$: $x \geq 3$ |
| $\ln(g(x))$ | $g(x) > 0$ | $\ln(x+1)$: $x > -1$ |
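
Each rule in the table can be turned into a true/false test on candidate inputs. The sketch below is one way to do that with NumPy comparisons, using the three example expressions from the table; the variable names are illustrative.

```python
import numpy as np

x_test = np.array([-2.0, -1.0, 0.0, 2.0, 3.0, 5.0])  # candidate inputs

# True where x lies inside the domain of each example expression
in_domain_fraction = (x_test - 2) != 0    # 1/(x - 2): denominator must not be zero
in_domain_sqrt = (x_test - 3) >= 0        # sqrt(x - 3): inside must be non-negative
in_domain_log = (x_test + 1) > 0          # ln(x + 1): inside must be strictly positive

print("x:              ", x_test)
print("1/(x-2) valid:  ", in_domain_fraction)
print("sqrt(x-3) valid:", in_domain_sqrt)
print("ln(x+1) valid:  ", in_domain_log)
```

Note the difference between `>=` for the square root and `>` for the logarithm: zero is allowed inside a square root but not inside a logarithm.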

In [None]:
# Example: f(x) = sqrt(x - 1)
# Domain: x - 1 >= 0, so x >= 1

def f(x):
    """f(x) = sqrt(x - 1). Domain: x >= 1"""
    return np.sqrt(x - 1)

# Valid inputs
print("Valid inputs (x >= 1):")
for x in [1, 2, 5, 10, 17]:
    print(f"  f({x}) = {f(x)}")

# What happens with invalid input?
print("\nInvalid input (x = 0):")
print(f"  f(0) = {f(0)}  ← 'nan' (not a number): sqrt(-1) has no real value, so NumPy also prints a RuntimeWarning")

In [None]:
# Visualize the domain restriction
x = np.linspace(-1, 10, 200)
y = f(x)  # Will give NaN for x < 1

plt.figure(figsize=(10, 5))
plt.plot(x, y, 'b-', linewidth=2, label=r'$f(x) = \sqrt{x-1}$')

# Mark the domain boundary
plt.axvline(x=1, color='red', linestyle='--', label='Domain starts at x = 1')
plt.scatter([1], [0], color='red', s=100, zorder=5)

# Shade invalid region
plt.axvspan(-1, 1, alpha=0.2, color='gray', label='Outside the domain (x < 1)')

plt.xlabel('x', fontsize=12)
plt.ylabel('f(x)', fontsize=12)
plt.title(r'Domain of $f(x) = \sqrt{x-1}$ is $x \geq 1$', fontsize=14)
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.xlim(-1, 10)
plt.ylim(-0.5, 4)

plt.tight_layout()
plt.show()

print("\n📝 EXAM TIP: For sqrt functions, set the inside >= 0 and solve for x.")

## C.3 Plotting Multiple Linear Functions

In [None]:
# Compare three linear functions
x = np.linspace(-4, 4, 50)

# Create DataFrame for organized plotting
df = pd.DataFrame()
df['x'] = x
df['y = 2x + 5'] = linear_fun(m=2, c=5, x=x)
df['y = 2x'] = linear_fun(m=2, c=0, x=x)
df['y = -3x + 4'] = linear_fun(m=-3, c=4, x=x)

# Plot using pandas
df.plot(x='x', 
        y=['y = 2x + 5', 'y = 2x', 'y = -3x + 4'],
        title='Comparison of Linear Functions',
        xlabel='x',
        ylabel='y',
        grid=True,
        figsize=(10, 6))

plt.axhline(y=0, color='black', linewidth=0.5)
plt.axvline(x=0, color='black', linewidth=0.5)
plt.tight_layout()
plt.show()

print("Notice: Same slope (m=2) means parallel lines!")

---

# PART D: Exploring Global Plastic Production Data

Let's apply our skills to real data on plastic production.

---

## D.1 Loading the Dataset

In [None]:
# Load the global plastics production data
# The data shows annual plastic production from 1950-2015

try:
    gpp = pd.read_csv("https://raw.githubusercontent.com/ahailu95/scie1500-content/main/SCIE1500Materials/Week_1/LabFiles/global-plastics-production.csv")
    print("✅ Data loaded successfully!")
except OSError as err:
    # A failed download raises urllib.error.URLError or HTTPError (both subclasses of OSError)
    print("⚠️ Could not load the data. Check your internet connection: the CSV is loaded from the course GitHub repository.")
    print(f"   Details: {err}")
    gpp = None

# Explore the data
if gpp is not None:
    print(f"\nDataset shape: {gpp.shape[0]} rows × {gpp.shape[1]} columns")
    print(f"\nColumn names: {list(gpp.columns)}")
    print("\nFirst 5 rows:")
    display(gpp.head())

## D.2 Creating Derived Variables

In [None]:
# Create useful derived variables

# Time trend (t = 1 for 1950, t = 2 for 1951, etc.)
gpp['t'] = gpp['Year'] - 1949

# Convert to million metric tonnes for easier reading
gpp['GPP (MMT)'] = gpp['GPP (MT)'] / 1_000_000

# Check our work
print("Updated DataFrame:")
display(gpp[['Year', 't', 'GPP (MT)', 'GPP (MMT)']].head())

print("\nLast 5 years:")
display(gpp[['Year', 't', 'GPP (MMT)']].tail())

## D.3 Visualization

In [None]:
# Plot global plastic production over time
gpp.plot(x='Year', 
         y='GPP (MMT)',
         title='Global Plastic Production (1950-2015)',
         xlabel='Year',
         ylabel='Production (Million Metric Tonnes)',
         kind='line',
         grid=True,
         color='red',
         legend=False,
         figsize=(12, 6))

plt.tight_layout()
plt.show()

## D.4 Is Plastic Growth Linear?

**Look at the plot above.** Does plastic production grow at a constant rate (linear), or does it accelerate over time?

**The answer:** The curve bends **upward**, meaning growth is **accelerating**. This is NOT linear—it's more like **exponential** or **quadratic** growth.

This motivates why we need to study **non-linear functions** like quadratics!
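
The same visual judgement can be made numerically: for linear growth the change from one step to the next is constant, while for accelerating growth it keeps increasing. The sketch below uses synthetic values generated from simple formulas (not the plastics dataset) and the pandas `diff` method.

```python
import pandas as pd

# Synthetic series for illustration only (not real production data)
demo = pd.DataFrame({'t': range(1, 7)})
demo['linear'] = 3 * demo['t'] + 2        # constant rate of change
demo['quadratic'] = demo['t'] ** 2        # accelerating growth

# diff() gives the change from one row to the next (the first row is NaN)
demo['linear change'] = demo['linear'].diff()
demo['quadratic change'] = demo['quadratic'].diff()

print(demo.to_string(index=False))
```

Applying `diff()` to the `GPP (MMT)` column should give the equivalent year-to-year changes for the plastics data.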

---

# PART E: Quadratic Functions (⚠️ Exam Critical!)

Quadratic functions model **accelerating or decelerating change**—exactly what we see in the plastic production data.

---

## E.1 The General Form

$$f(x) = ax^2 + bx + c$$

**Key features:**
- If $a > 0$: parabola opens **upward** (U-shape)
- If $a < 0$: parabola opens **downward** (∩-shape)
- The **vertex** is the turning point (maximum or minimum)

In [None]:
# Define a general quadratic function
def quad_fun(x, a, b, c):
    """Calculate y = ax² + bx + c"""
    y = a * (x**2) + b * x + c
    return y

# Example: y = x² - 4x + 3
x = np.linspace(-2, 6, 50)
y = quad_fun(x, a=1, b=-4, c=3)

plt.figure(figsize=(10, 6))
plt.plot(x, y, 'b-', linewidth=2, label=r'$y = x^2 - 4x + 3$')
plt.axhline(y=0, color='black', linewidth=0.5)
plt.axvline(x=0, color='black', linewidth=0.5)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Quadratic Function: Opens Upward (a > 0)')
plt.grid(True, alpha=0.3)
plt.legend()
plt.show()

## E.2 Finding the Vertex (⚠️ Exam Formula!)

The vertex of $y = ax^2 + bx + c$ is at:

$$x_{vertex} = -\frac{b}{2a}$$

Then substitute back to find $y_{vertex}$.

**This is critical for optimization problems on the exam!**
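
One way to build confidence in the formula is to compare it with a brute-force search over a fine grid of x values. The sketch below does this for the illustrative quadratic $y = x^2 - 4x + 3$; the grid range and spacing are arbitrary choices.

```python
import numpy as np

a, b, c = 1, -4, 3                        # y = x² - 4x + 3 (a > 0, so the vertex is a minimum)

# Vertex from the formula
x_formula = -b / (2 * a)
y_formula = a * x_formula**2 + b * x_formula + c

# Vertex from a brute-force search on a grid
x_grid = np.linspace(-2, 6, 801)          # spacing of 0.01
y_grid = a * x_grid**2 + b * x_grid + c
x_search = x_grid[np.argmin(y_grid)]      # x value with the smallest y on the grid

print(f"Formula:     x = {x_formula}, y = {y_formula}")
print(f"Grid search: x ≈ {x_search:.2f}, y ≈ {y_grid.min():.2f}")
```

The grid search only ever gets as close as its spacing allows, whereas the formula gives the exact location in one step.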

In [None]:
def find_vertex(a, b, c):
    """Find the vertex of quadratic y = ax² + bx + c
    
    Returns:
        (x_vertex, y_vertex)
    """
    x_v = -b / (2 * a)
    y_v = a * x_v**2 + b * x_v + c
    return (x_v, y_v)

# Example: y = x² - 4x + 3
vertex = find_vertex(a=1, b=-4, c=3)
print(f"Vertex of y = x² - 4x + 3:")
print(f"  x_vertex = -(-4)/(2×1) = {vertex[0]}")
print(f"  y_vertex = {vertex[1]}")
print(f"  Vertex point: ({vertex[0]}, {vertex[1]})")

In [None]:
# Visualize the vertex
x = np.linspace(-2, 6, 100)
y = quad_fun(x, a=1, b=-4, c=3)
vertex = find_vertex(1, -4, 3)

plt.figure(figsize=(10, 6))
plt.plot(x, y, 'b-', linewidth=2, label=r'$y = x^2 - 4x + 3$')
plt.scatter([vertex[0]], [vertex[1]], color='red', s=150, zorder=5, label=f'Vertex ({vertex[0]}, {vertex[1]})')

# Mark x-intercepts (roots)
plt.scatter([1, 3], [0, 0], color='green', s=100, zorder=5, label='X-intercepts (1, 0) and (3, 0)')

plt.axhline(y=0, color='black', linewidth=0.5)
plt.axvline(x=0, color='black', linewidth=0.5)
plt.xlabel('x', fontsize=12)
plt.ylabel('y', fontsize=12)
plt.title('Quadratic with Vertex and X-Intercepts Marked', fontsize=14)
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.show()

print("\n📝 EXAM TIP: The vertex formula x = -b/(2a) gives the x-value where the maximum or minimum occurs. Substitute it back in to find y.")

## E.3 Finding X-Intercepts (Roots)

The x-intercepts are where $y = 0$. Use the **quadratic formula**:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

In [None]:
def quadratic_formula(a, b, c):
    """Solve ax² + bx + c = 0 using the quadratic formula.
    
    Returns:
        Tuple of roots, or message if no real roots
    """
    discriminant = b**2 - 4*a*c
    
    if discriminant < 0:
        return "No real roots (discriminant < 0)"
    elif discriminant == 0:
        x = -b / (2*a)
        return (x, x)  # One repeated root
    else:
        x1 = (-b + math.sqrt(discriminant)) / (2*a)
        x2 = (-b - math.sqrt(discriminant)) / (2*a)
        return (x1, x2)

# Example: x² - 4x + 3 = 0
roots = quadratic_formula(a=1, b=-4, c=3)
print(f"Roots of x² - 4x + 3 = 0:")
print(f"  x = {roots[0]} and x = {roots[1]}")
print("  (The equation factors as (x-1)(x-3) = 0)")
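
A hand-written solver can be cross-checked against NumPy's built-in polynomial root finder, `np.roots`, which takes the coefficients from the highest power down. The sketch below does this for the same equation, $x^2 - 4x + 3 = 0$.

```python
import numpy as np

coefficients = [1, -4, 3]                 # a, b, c for x² - 4x + 3
numpy_roots = np.sort(np.roots(coefficients))

print("Roots from np.roots:", numpy_roots)

# Substituting each root back into the quadratic should give (approximately) zero
residuals = np.polyval(coefficients, numpy_roots)
print("Values at the roots:", residuals)
```

`np.roots` returns complex numbers when the discriminant is negative, so it does not need a separate "no real roots" case.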

## ✏️ EXERCISE B: Plot Three Quadratic Functions

**Task:** Plot these three quadratics on the same chart:

| Function | a | b | c |
|----------|---|---|---|
| y₁ | 1 | -1 | -2 |
| y₂ | -2 | 0 | 150 |
| y₃ | 3 | 2 | -50 |

Use x values from -10 to 10.

In [None]:
# EXERCISE B: Plot three quadratic functions

# Create x values
x = np.linspace(-10, 10, 50)

# Create DataFrame
df = pd.DataFrame()
df['x'] = x

# Calculate y values for each function
df['y1'] = quad_fun(x, a=1, b=-1, c=-2)

# YOUR CODE: Add y2 and y3
# df['y2'] = quad_fun(x, a=?, b=?, c=?)
# df['y3'] = quad_fun(x, a=?, b=?, c=?)

# Plot (uncomment when y2 and y3 are added)
# df.plot(x='x', y=['y1', 'y2', 'y3'],
#         title='Three Quadratic Functions',
#         xlabel='x', ylabel='y',
#         grid=True, figsize=(12, 6))
# plt.show()

## E.5 Preview: The Schaefer Fish Growth Model (⚠️ Exam Q13!)

In Week 3, you'll study the **Schaefer growth model** for fish populations:

$$G(S) = g \cdot S \cdot \left(1 - \frac{S}{K}\right)$$

where:
- $S$ = fish stock (population)
- $g$ = intrinsic growth rate
- $K$ = carrying capacity

**Key insight:** This is a **quadratic function in S**!

Expanding: $G(S) = gS - \frac{g}{K}S^2 = -\frac{g}{K}S^2 + gS$

This has the form $aS^2 + bS + c$ where $a = -g/K$, $b = g$, $c = 0$.
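
The expansion can be checked numerically: the product form and the expanded quadratic should agree at every stock level, and the vertex formula should return $K/2$. The sketch below assumes illustrative values $g = 0.5$ and $K = 1000$.

```python
import numpy as np

g, K = 0.5, 1000                          # illustrative parameter values
S = np.linspace(0, K, 11)                 # stock levels from 0 to K

product_form = g * S * (1 - S / K)        # G(S) = g·S·(1 - S/K)
a, b, c = -g / K, g, 0                    # coefficients of the expanded form
expanded_form = a * S**2 + b * S + c      # G(S) = aS² + bS + c

print("Forms agree:", np.allclose(product_form, expanded_form))

S_vertex = -b / (2 * a)                   # vertex formula
print("Vertex at S =", S_vertex, "and K/2 =", K / 2)
```

Because $a = -g/K$ is negative, the parabola opens downward and the vertex is a maximum.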

In [None]:
def schaefer(S, g=0.5, K=1000):
    """Schaefer fish growth model: G(S) = g * S * (1 - S/K)
    
    This is a QUADRATIC in S!
    """
    return g * S * (1 - S/K)

# Create stock values
S = np.linspace(0, 1000, 101)
G = schaefer(S)

# Find maximum growth using vertex formula
# For G(S) = -(g/K)S² + gS, vertex at S = -b/(2a) = -g / (2×(-g/K)) = K/2
S_max = 500  # K/2
G_max = schaefer(S_max)

# Plot
plt.figure(figsize=(12, 6))
plt.plot(S, G, 'b-', linewidth=2, label='Growth rate G(S)')
plt.scatter([S_max], [G_max], color='red', s=150, zorder=5, 
            label=f'Maximum Sustainable Yield at S = {S_max}')

plt.xlabel('Fish Stock (S)', fontsize=12)
plt.ylabel('Growth Rate G(S)', fontsize=12)
plt.title('Schaefer Fish Growth Model (g=0.5, K=1000)', fontsize=14)
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)

# Annotate
plt.annotate(f'Maximum at S = K/2 = {S_max}', 
             xy=(S_max, G_max), xytext=(600, G_max+10),
             fontsize=11, arrowprops=dict(arrowstyle='->', color='red'))

plt.tight_layout()
plt.show()

print("\n📝 EXAM CONNECTION:")
print("   Q13 asks you to find the stock level that maximizes growth.")
print(f"   Answer: S* = K/2 (using the vertex formula!)")
print(f"   Domain: 0 ≤ S ≤ K (fish population can't be negative or exceed carrying capacity)")

---

# PART F: Scientific Application — The Reisser Model

Now let's apply everything we've learned to a real scientific model.

---

## F.1 The Depth-Integration Model

Scientists researching ocean plastic use surface-towed nets to collect samples. But plastic is distributed throughout the water column, not just at the surface!

Reisser et al. (2013) developed a model to estimate **total (depth-integrated) concentration** $C_i$ from **surface measurements** $C_s$:

$$\frac{C_i}{C_s} = \frac{1}{1 - e^{-\alpha d}}$$

where:
- $d$ = net immersion depth (metres)
- $\alpha$ = parameter depending on plastic buoyancy and turbulence

For this exercise, we'll use $\alpha = 0.4$ m⁻¹ (a typical value from the literature).
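
Turning the ratio upside down gives the share of the total plastic that a net of depth $d$ actually samples: $C_s/C_i = 1 - e^{-\alpha d}$. The sketch below evaluates that share for a few illustrative depths, assuming $\alpha = 0.4$ m⁻¹ as above; the function name `fraction_sampled` is hypothetical.

```python
import numpy as np

def fraction_sampled(d, alpha=0.4):
    """Share of depth-integrated plastic captured at the surface: Cs/Ci = 1 - exp(-alpha * d)."""
    return 1 - np.exp(-alpha * d)

depths = np.array([0.25, 0.5, 1.0, 2.0])  # illustrative net depths in metres
shares = fraction_sampled(depths)

for d, share in zip(depths, shares):
    print(f"d = {d:.2f} m: net samples {share:.1%} of the total (Ci/Cs = {1 / share:.2f})")
```

The share grows with depth but never reaches 100%, which is why the correction ratio $C_i/C_s$ stays above 1.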

## ✏️ EXERCISE C: Implement the Reisser Model

**Tasks:**
1. Write a Python function that calculates $C_i/C_s$ given depth $d$
2. Calculate the ratio for depths: 0.1m, 0.2m, 0.35m, 0.5m, 0.75m, 1.0m
3. Plot the ratio against depth

**Include screenshots in your upload.**

In [None]:
# EXERCISE C: Reisser depth-integration model

# Task 1: Define the function
def CiCs_ratio(d, alpha=0.4):
    """Calculate Ci/Cs ratio for depth-integration correction.
    
    Args:
        d: immersion depth in metres (must be > 0)
        alpha: parameter (default 0.4 m⁻¹)
    
    Returns:
        Correction ratio Ci/Cs
    """
    ratio = 1 / (1 - np.exp(-alpha * d))
    return ratio

# Task 2: Calculate for specified depths
d_values = np.array([0.1, 0.2, 0.35, 0.5, 0.75, 1.0])

# Create DataFrame
cdf = pd.DataFrame()
cdf['Depth (m)'] = d_values
cdf['Ci/Cs Ratio'] = CiCs_ratio(d_values)

print("Depth-Integration Correction Ratios:")
print(cdf.to_string(index=False))

In [None]:
# Task 3: Plot the relationship

# Use more points for a smooth curve
d_fine = np.linspace(0.05, 2.0, 100)
ratio_fine = CiCs_ratio(d_fine)

plt.figure(figsize=(10, 6))
plt.plot(d_fine, ratio_fine, 'b-', linewidth=2, label='Model: α = 0.4')
plt.scatter(d_values, CiCs_ratio(d_values), color='red', s=100, zorder=5, label='Calculated points')

plt.xlabel('Net Depth (metres)', fontsize=12)
plt.ylabel('Ci/Cs Ratio', fontsize=12)
plt.title('Depth-Integration Correction: How Much Does Surface Sampling Underestimate?', fontsize=13)
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)

# Add interpretation
plt.annotate('Shallow nets\nunderestimate more', 
             xy=(0.15, 20), fontsize=10, ha='center')
plt.annotate('Deeper nets\nmore accurate', 
             xy=(1.5, 2), fontsize=10, ha='center')

plt.tight_layout()
plt.show()

print("\n💡 Interpretation:")
print("   A ratio of 2.0 means surface samples underestimate total plastic by 50%.")
print("   Deeper nets (larger d) give more accurate estimates (ratio closer to 1).")

---

# PART G: Self-Assessment

---

## G.1 Practice Questions

Test your understanding with these exam-style questions:

### Question 1: Domain Identification (Exam Q10 style)

What is the domain of $y = \frac{\sqrt{3x + 6}}{4} - 1$?

<details>
<summary>Click for solution</summary>

For the square root, we need $3x + 6 \geq 0$

$3x \geq -6$

$x \geq -2$

**Domain:** $D = \{x \in \mathbb{R} : x \geq -2\}$
</details>

### Question 2: Vertex Formula

Find the vertex of $E(t) = -0.5t^2 + 4t + 2$.

<details>
<summary>Click for solution</summary>

$t_{vertex} = -\frac{b}{2a} = -\frac{4}{2(-0.5)} = -\frac{4}{-1} = 4$

$E(4) = -0.5(16) + 4(4) + 2 = -8 + 16 + 2 = 10$

**Vertex:** $(4, 10)$
</details>

### Question 3: Schaefer Model (Exam Q13 preview)

If $G(S) = 0.5S(1 - S/1000)$, at what stock level is growth maximized?

<details>
<summary>Click for solution</summary>

$S^* = \frac{K}{2} = \frac{1000}{2} = 500$

This is the Maximum Sustainable Yield (MSY) stock level.
</details>

## G.2 Learning Outcomes Checklist

Before finishing, make sure you can:

- [ ] Define and call Python functions with multiple arguments
- [ ] Identify domain restrictions (square roots, fractions, logarithms)
- [ ] Write the vertex formula: $x = -b/(2a)$
- [ ] Recognize that the Schaefer model is a quadratic in $S$
- [ ] Load CSV data with pandas
- [ ] Create derived variables in a DataFrame
- [ ] Plot functions using pandas and matplotlib

---

## 🎉 Congratulations!

You've completed Week 1! You now have the foundational skills to:
- Work with mathematical functions in Python
- Analyze real scientific data
- Prepare for exam questions on domains and quadratics

**Next Week:** Logarithmic and Logistic Functions — Modeling Bounded Growth