<a href="https://colab.research.google.com/github/JackieVeatch/GCC_2025_PythonWorkshop/blob/main/GCC_CTDworkbook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Welcome to the CTD (1D data) breakout group!
Jacquelyn Veatch, November 8th 2025 <br> <br>
In this notebook, we will load in CTD data from an Argo float and plot temperature profiles (how variables change with depth). This code will build upon what we learned in "GCC_PythonReview.ipynb", and then explore more data visualization and statistical methods in common Python packages!

In [None]:
# it is considered best practice to include all imports at the top of your code
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

In [None]:
# load data using numpy

# This is a path to where this text file is stored on my GitHub
url = 'https://raw.githubusercontent.com/JackieVeatch/GCC_2025_PythonWorkshop/main/argo_temperature_2019-03-23.csv'
# Load the text file with the loadtxt() function from numpy
argoTemp = np.loadtxt(url, delimiter=',') # the delimiter is a comma in CSVs
np.shape(argoTemp) # The array is set up with dimensions (depth, time)

Check Point: can you draw (or describe) the shape of this array? What are the dimensions? What are the coordinates? How many discrete measurements of temperature does it have?
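
As a rough illustration of the questions above, the sketch below builds a small synthetic array laid out the same way, (depth, time), and inspects it. The name `toy` and all of its values are invented for illustration; they are not Argo measurements.

```python
import numpy as np

# Synthetic stand-in for a (depth, time) array: 4 depth levels, 3 profiles.
# These numbers are invented; they are not Argo measurements.
toy = np.array([[ 4.0,  4.1,  4.2],
                [ 8.0,  8.3,  8.1],
                [15.0, 15.5, 15.2],
                [20.0, 21.0, 20.5]])

n_depths, n_profiles = toy.shape   # rows are depth levels, columns are profiles
n_measurements = toy.size          # total number of values (rows * columns)

print("shape:", toy.shape)
print("number of dimensions:", toy.ndim)
print("discrete measurements:", n_measurements)
```

The same three attributes (`shape`, `ndim`, `size`) can be read from `argoTemp` once it is loaded.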

In [None]:
# Here is your x data for a time series
time_days = np.arange(0,570,10) # start, stop, step

# Here is your y data for your profile
depth = np.arange(-982,-4,2)

In [None]:
# Plot a time series of surface temperature
surface = argoTemp[-1, :] # The last row is the shallowest depth because depth runs from -982 m (first row) upward toward the surface (last row)
np.shape(surface)
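
The indexing pattern used here can be tried on a tiny example. This is a minimal sketch on invented data: `toy_depth` and `toy_temp` are hypothetical names, and the array is assumed to be ordered deepest first, as in this notebook.

```python
import numpy as np

# Hypothetical depth axis, deepest first (more negative = deeper).
toy_depth = np.array([-30, -20, -10])
# Invented temperatures with shape (depth, time): 3 depths, 2 profiles.
toy_temp = np.array([[ 5.0,  5.5],
                     [10.0, 10.5],
                     [18.0, 18.5]])

shallowest_row = toy_temp[-1, :]   # last row = shallowest depth, every profile
first_profile = toy_temp[:, 0]     # every depth, first column = first profile

print("depth of last row:", toy_depth[-1])
print("time series at that depth:", shallowest_row)
print("first profile:", first_profile)
```

A row slice gives a time series at one depth; a column slice gives one profile.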

In [None]:
plt.plot(time_days, surface)
plt.xlabel('days')
plt.ylabel('temp (deg C)')
plt.title('surface temperature from argo float')

Plot a Temperature vs. Depth profile

In [None]:
profile = argoTemp[:,0]
plt.plot(profile,depth)

### Try it on your own
(1) make the above plot look better by adding axes labels and a title <br>
(2) create a new code cell and plot the 10th profile (hint: the example above plots the first profile)

In [None]:
# plotting the first ten profiles using a loop
numProfiles = 10
for i in range(numProfiles):
    profile = argoTemp[:,i]
    plt.plot(profile,depth)

plt.xlabel('Temperature (deg C)')
plt.ylabel('Depth (m)')
plt.title('Ten Temperature Profiles')

A t-test is used to determine whether the difference between the means of two groups is statistically significant. We can run a t-test using the function `stats.ttest_ind(sample_1, sample_2)`. This syntax is slightly different from what we've seen thus far because the t-test function has 2 outputs: the t-statistic (the difference in means scaled by its standard error, not the raw difference in means) and the p-value. We assign these two outputs to the variables `t_statistic` and `pvalue`.

In [None]:
firstProfile = argoTemp[:,0]
lastProfile = argoTemp[:,-1]
t_statistic, pvalue = stats.ttest_ind(firstProfile, lastProfile)
print('The t-statistic comparing the mean first profile and mean last profile temperature is', t_statistic)
print('The p-value is', pvalue)
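
To see what the two outputs of `stats.ttest_ind` look like, here is a small sketch on synthetic samples (random numbers, not Argo data). The first returned value is the t-statistic, which is a different quantity from the raw difference in means, so the sketch prints both. The sample names, sizes, and means are assumptions chosen for illustration.

```python
import numpy as np
import scipy.stats as stats

# Two invented samples whose true means differ by 2 (not Argo data).
rng = np.random.default_rng(0)
sample_a = rng.normal(loc=10.0, scale=1.0, size=200)
sample_b = rng.normal(loc=12.0, scale=1.0, size=200)

result = stats.ttest_ind(sample_a, sample_b)
mean_difference = np.mean(sample_a) - np.mean(sample_b)

print("t-statistic:", result.statistic)        # difference scaled by its standard error
print("p-value:", result.pvalue)               # small value = means unlikely to be equal
print("difference in means:", mean_difference)
```

A common convention is to call the difference statistically significant when the p-value is below 0.05, although that threshold is a choice rather than a rule.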

The above analysis doesn't mean a whole lot scientifically... <br>
Instead, let's compute the Mixed Layer Depth (MLD) of each profile and see how this value changes over time.

In [None]:

# Flag profiles (columns) that contain at least one finite value; all-NaN columns are marked invalid
valid_cols = ~np.all(~np.isfinite(argoTemp), axis=0)

# Compute vertical gradient dT/dz for every profile
# Uses depth spacing; returns array same shape as argoTemp
dTdz = np.gradient(argoTemp, depth, axis=0, edge_order=2)

# MLD index: where |dT/dz| is maximum (strongest gradient)
grad_mag = np.abs(dTdz)
# Put NaNs where columns are invalid so argmax ignores them
grad_mag[:, ~valid_cols] = np.nan

# np.nanargmax fails if a column is all-NaN; handle safely:
imax = np.full(argoTemp.shape[1], fill_value=-1, dtype=int)
for j in range(argoTemp.shape[1]):
    col = grad_mag[:, j]
    if np.any(np.isfinite(col)):
        imax[j] = np.nanargmax(col)

# MLD depth per profile (NaN where no valid data)
MLD = np.full(argoTemp.shape[1], np.nan, dtype=float)
good = imax >= 0
MLD[good] = depth[imax[good]]

print("Computed MLD for", good.sum(), "profiles (", argoTemp.shape[1], "total ).")


Now, we've created a new array called `MLD` that contains one value for each profile.
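
The maximum-gradient idea used in the cell above can be checked on a single synthetic profile where the answer is known in advance. This is only a sketch: the tanh-shaped profile, the names `toy_depth`, `toy_temp`, and `toy_mld`, and the transition depth of -50 m are all invented for illustration.

```python
import numpy as np

# Invented profile: warm above -50 m, cold below, with a smooth transition.
toy_depth = np.arange(-100, 0, 2)                       # -100, -98, ..., -2
toy_temp = 10.0 + 5.0 * np.tanh((toy_depth + 50) / 5.0)

# Vertical gradient dT/dz, then the index where its magnitude is largest.
dTdz = np.gradient(toy_temp, toy_depth)
i_max = np.argmax(np.abs(dTdz))
toy_mld = toy_depth[i_max]

print("depth of strongest gradient:", toy_mld, "m")
```

Because the transition was placed at -50 m, recovering that depth is a quick sanity check that the method behaves as intended.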

In [None]:
MLD.shape

In [None]:
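days = time_days  # elapsed days for each profile (one profile every 10 days), defined earlier
plt.plot(days, MLD)
plt.xlabel('days')
plt.ylabel('Mixed Layer Depth (m)')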

In [None]:
# the syntax here is stats.linregress(x, y)
slope, intercept, r, p, se = stats.linregress(days, MLD)
print('The slope of the line is',slope)
print('The y-intercept of the line is',intercept)

Okay, so not much of a linear trend here... We can probably learn more about the `MLD` data by understanding the variability.
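
One way to back up a statement about the strength of a linear trend is to look at the r-squared and p-value that `stats.linregress` returns alongside the slope. The sketch below uses invented data with a trend deliberately built in, so it shows what a strong trend looks like for comparison; the slope of -0.05 and the noise level are assumptions.

```python
import numpy as np
import scipy.stats as stats

# Invented data: 57 time stamps 10 days apart, with a built-in linear trend plus noise.
rng = np.random.default_rng(1)
x = np.arange(0, 570, 10)
y = -0.05 * x + rng.normal(scale=2.0, size=x.size)

fit = stats.linregress(x, y)
r_squared = fit.rvalue ** 2    # fraction of variance explained by the line

print("slope:", fit.slope)
print("r squared:", r_squared)
print("p-value:", fit.pvalue)
```

An r-squared near 0 together with a large p-value suggests little linear trend, while an r-squared near 1 with a small p-value suggests a strong one.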

In [None]:
plt.figure(figsize=(6, 4))
plt.hist(MLD, bins=15, edgecolor='k', alpha=0.7)
plt.xlabel('Mixed Layer Depth (m)')
plt.ylabel('Count')
plt.title('Distribution of Mixed Layer Depths')
plt.grid(True, linestyle='--', alpha=0.5)
plt.show()

Cool! Now let's use some statistical tools to describe these data. Note that some statistical tools are built into `numpy`, while others need a more advanced statistical package, which we imported as `stats`.

In [None]:
# --- Basic descriptive statistics ---
n = len(MLD)
mean = np.mean(MLD)
median = np.median(MLD)
mode_result = stats.mode(MLD, keepdims=True)
std = np.std(MLD, ddof=1)    # sample standard deviation
var = np.var(MLD, ddof=1)
skew = stats.skew(MLD)
kurtosis = stats.kurtosis(MLD)
min_val = np.min(MLD)
max_val = np.max(MLD)
range_val = max_val - min_val
iqr = stats.iqr(MLD)

# --- Print nicely ---
print(f"Number of profiles: {n}")
print(f"Mean MLD: {mean:.2f} m")
print(f"Median MLD: {median:.2f} m")
print(f"Mode MLD: {mode_result.mode[0]:.2f} m (count = {mode_result.count[0]})")
print(f"Standard deviation: {std:.2f} m")
print(f"Variance: {var:.2f} m²")
print(f"Range: {range_val:.2f} m (min={min_val:.2f}, max={max_val:.2f})")
print(f"IQR (interquartile range): {iqr:.2f} m")
print(f"Skewness: {skew:.2f}")
print(f"Kurtosis: {kurtosis:.2f}")
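
The MLD calculation earlier leaves NaN wherever a profile has no valid data, and functions such as `np.mean` return NaN if any input is NaN. If that happens, one option is to keep only the finite values before computing statistics. This is a minimal sketch on an invented array (`toy_mld` is a hypothetical name), not a claim that the Argo data contain NaNs.

```python
import numpy as np
import scipy.stats as stats

# Invented MLD values with one missing profile.
toy_mld = np.array([-40.0, -52.0, np.nan, -46.0, -60.0])

plain_mean = np.mean(toy_mld)            # NaN propagates through np.mean
valid = toy_mld[np.isfinite(toy_mld)]    # keep only finite values
nan_aware_mean = np.nanmean(toy_mld)     # numpy's NaN-ignoring alternative

print("mean with NaN included:", plain_mean)
print("number of valid values:", len(valid))
print("mean of valid values:", np.mean(valid))
print("sample standard deviation:", np.std(valid, ddof=1))
print("IQR:", stats.iqr(valid))
```

Filtering once and reusing `valid` keeps every statistic based on the same set of profiles.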

### Try it on your own
(1) create a heat map of the 2D array `argoTemp`. What are the dimensions? What is the meaning of the coordinates? <br>
(2) Create a new histogram, changing the number of bins. What are some other things you can say about these data?