# What Happened to the Seashells?  (Ocean Acidification Week 3 Data Analysis) (Student Version) (Piloted Spring 2025)

#### This module is designed to support the published laboratory experiment "Ocean Acidification: Investigation and Presentation of the Effects of Elevated Carbon Dioxide Levels on Seawater Chemistry and Calcareous Organisms" by Jeffrey M. Buth, *J. Chem. Educ.* **2016**, *93*, 718–721.

In Week 1, we used CO$_2$ to acidify artificial seawater containing seashells.  We also processed some seashells in artificial seawater without added CO$_2$.  

In Week 2, we measured the pH and concentrations of carbonate and bicarbonate ions in these two solutions, and in a control solution.  What happened?  

The purpose of this Jupyter notebook is to complete the required statistical analysis for our Week 3 data, using class data in three CSV files.

This week, we measured final seashell masses, and titrated both calcium and magnesium, for three solutions:
*   CO$_2$-acidified seashell solution (with shells removed)
*   Non-CO$_2$-acidified seashell solution (with shells removed)
*   Artificial seawater control solution (no shells, ever)

One hypothesis might be that the seashells could have dissolved to yield bicarbonate ions:

$CaCO_3(s) + H^+(aq) \rightarrow Ca^{2+}(aq) + HCO_3^-(aq)$

$MgCO_3(s) + H^+(aq) \rightarrow Mg^{2+}(aq) + HCO_3^-(aq)$

Because measurements like these contain random error, statistical analysis is necessary to test this hypothesis.  Before you begin, please:  
1.   Record your individual data in a downloaded copy of the provided Excel template;
2.   Transfer the mass values and volume changes into appropriate tabs in the shared class data spreadsheet (along with your names); and
3.   After everyone else has entered their data, download *each* tab in the shared class data spreadsheet as a CSV file.  Give each CSV file an appropriate filename.

**Prior Knowledge Needed**
*   Statistical concepts including mean, standard deviation, Student t test, and confidence interval
*   Chemical concepts including solubility, molar concentration, and stoichiometry in an aqueous solution
*   Python skills practiced in Week 2

**Content Learning Objectives:**
*   Calculate percent differences relative to a control
*   Use statistical concepts to quantify uncertainty in class data
*   Apply the Student t test to determine whether acidification with CO$_2$ significantly affected the measured quantities

**Process Learning Objectives:**
*   Modify Python code to transform data using structures such as arrays and tables
*   Work in teams to manage and share data
*   Work in teams to evaluate outcomes of statistical tests

**Overview:**

This Jupyter notebook can be used to generate code in Python to perform a variety of data analysis tasks:
1.    Use stoichiometry to calculate molar concentrations for analytes in a titration experiment.
2.    Upload multiple CSV files using Pandas.
3.    Reorganize class data into NumPy arrays; calculate the mean and standard deviation over each NumPy array.
4.    Calculate percent differences relative to a control; propagate standard deviations through the calculation.
5.    Apply the Student t test to determine whether acidification with CO$_2$ significantly affected the measured quantities.



### Task 1: Using stoichiometry to calculate molar concentrations

One important step in problem solving is to identify known quantities that will be useful in solving the problem.  When writing code to solve a problem, it is a good idea to define variables right away and use them to store these known quantities so we can use them later in the code.

The balanced net ionic equations for the titration reactions of Ca$^{2+}$ and Mg$^{2+}$ with EDTA (H$_2$EDTA$^{2-}$) are as follows:

$Ca^{2+}(aq) + H_2EDTA^{2-}(aq) \rightarrow CaH_2EDTA(aq)$

$Mg^{2+}(aq) + H_2EDTA^{2-}(aq) \rightarrow MgH_2EDTA(aq)$

Note that the titration using Eriochrome Black T measures the total amount of Ca$^{2+}$ and Mg$^{2+}$, while the titration using strong base and hydroxynaphthol blue measures only the amount of Ca$^{2+}$.  The Ca$^{2+}$ amount should be subtracted from the total amount to calculate the amount of Mg$^{2+}$.
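
As a rough illustration of the subtraction described above, the sketch below uses made-up titration volumes (not class data) and hypothetical variable names.  It assumes that both titrations use the same EDTA titrant and the same sample volume, so that subtracting volumes is equivalent to subtracting amounts.

```python
# Made-up EDTA volumes (mL) for one sample; replace with your own data.
volume_total_mL = 6.20   # Eriochrome Black T endpoint: Ca2+ and Mg2+ together
volume_ca_mL = 1.10      # hydroxynaphthol blue endpoint: Ca2+ only

# Magnesium is found by difference, assuming equal sample volumes and the
# same EDTA titrant in both titrations.
volume_mg_mL = volume_total_mL - volume_ca_mL
print(f"EDTA volume attributable to Mg2+: {volume_mg_mL:.2f} mL")
```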

Calcium and magnesium ion analyte concentrations can be used to estimate masses of calcium carbonate and magnesium carbonate dissolved, using solution stoichiometry:

$CaCO_3(s) \rightarrow Ca^{2+}(aq) + CO_3^{2-}(aq)$

$MgCO_3(s) \rightarrow Mg^{2+}(aq) + CO_3^{2-}(aq)$
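
Written as a formula, the mass estimate for calcium carbonate might take the following form.  The symbols are introduced here for illustration only: $\Delta[Ca^{2+}]$ is the change in calcium ion concentration relative to the control, $V_{ASW}$ is the volume of artificial seawater in liters, $M_{CaCO_3}$ is the formula mass in g/mol, and each $n$ is a stoichiometric coefficient from the dissolution equation.

$$m_{CaCO_3} = \Delta[Ca^{2+}] \times V_{ASW} \times M_{CaCO_3} \times \frac{n_{CaCO_3}}{n_{Ca^{2+}}}$$

The same form applies to magnesium carbonate, with the magnesium quantities substituted.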

1a)  Using the information above, double-click here in this **text cell** and type into each of the comment lines to **explain the purpose** of each line of sample-code below in this text cell (they should look like the example shown here).

---
```
#### The code below is for:
```
---

1b)  Enter the stoichiometric coefficients from the net ionic equations above, formula masses of ionic compounds, and the EDTA titrant concentration recorded in your lab notebook into the sample-code below.

1c) Copy/paste all the code along with your explanations into the **code cell** just below this text cell, and run it.

---
```python
#### The code below is for:
Stoichiometric_coefficient_of_Ca_ion =
Stoichiometric_coefficient_of_Mg_ion =
Stoichiometric_coefficient_of_EDTA =
Stoichiometric_coefficient_of_CaCO3 =
Stoichiometric_coefficient_of_MgCO3 =
#### The code below is for:
Formula_mass_of_CaCO3 =
Formula_mass_of_MgCO3 =
#### The code below is for:
Concentration_of_titrant_in_M =
#### The code below is for:
Volume_of_sample_in_mL = 1.00
Volume_of_ASW_in_L = 0.150
```
---

### Task 2: Uploading CSV files using Pandas

#### **Important**: Make sure you have downloaded all three CSV files from the shared class data file, each named appropriately.

#### **Important**: Upload the CSV files to Colab:
* First click the file folder icon to expand the Files sidebar.
* Next click the file upload icon to upload THIS WEEK'S CSV files in class data.
* Enter the filenames into the code as instructed below.

To upload multiple CSV files containing data for different quantities, we will use several packages.  The IO package allows us to read files, and the Pandas package interprets the rows and columns of each CSV file as a table.  Together, these import the CSV data into a dataframe within this Jupyter notebook.
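
A minimal sketch of this idea is shown here.  It is not the Week 2 code, which may read files differently; the CSV text, column names, and the name `example_df` are made up for illustration.

```python
import io
import pandas as pd

# Made-up CSV text standing in for an uploaded file; the column names are
# hypothetical and will not match the class spreadsheet.
csv_text = "Control,Non-carbonated,Carbonated\n1.05,1.10,1.32\n1.02,,1.28\n"

# io.StringIO lets Pandas read a text string as if it were a file.
# With a real upload, pass the filename instead: pd.read_csv("your_file.csv")
example_df = pd.read_csv(io.StringIO(csv_text))

print(example_df.shape)           # (rows, columns)
print(example_df.isnull().sum())  # number of blank cells in each column
```

Checking the shape and the number of blank cells right after import is one quick way to confirm that the file you meant to load is the one that was loaded.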

2a) Using the information above, **go back** to the corresponding **text cell** in your Ocean Acidification Week 2 Jupyter notebook.  Copy / paste your annotated code into this **text cell** below this list.

2b) **Change** the word "pH" to "mass"; "bicarbonate" to "magnesium"; and "carbonate" to "calcium", everywhere you see them in the code.  Replace "bicarbonate" before "carbonate", because the longer word contains the shorter one.  **Also change** the dataframe titles from "big_dp" to "big_dm"; "big_dc" to "big_dca"; and "big_db" to "big_dmg".

2c) Enter **all** filenames of this week's uploaded CSV files into the sample-code below enclosed in quotes (like the example shown here):

---
```
filename = 'filename_1'
```
---

2d) Copy/paste all the **modified** code into the **code cell** just below this text cell, and run it.  

2e) Finally, **answer the key question below** in your laboratory notebook. Discuss as a group. Ask for help if needed.

#### **Thinking About the Data, Task 2**: Suppose you thought you uploaded all the data files, but when you run the code below, the output doesn't look right.  *What should you do if*:
*   Q2a. Running the code below yields an error message.
*   Q2b. Running the code below yields printed messages that seem wrong, such as "pH filename = 'Mass.csv'"

---
```
#### Copy/paste and modify code here:


```
---

### Task 3: Calculating the mean and standard deviation over each category of class data.

#### **Important**: Complete Task 2 first.

The goal in this task is to summarize data from the imported dataframes, covering all three types of solutions and all three measured quantities.  Because the class data spreadsheet may contain null (blank) values, the sample-code uses a *logical test* to remove those.  The symbol "~" means "not"; only the values that are not blank will be included for analysis.  The NumPy package is used to summarize the data and store it in arrays for further analysis.

The initial, final, and changes in mass of seashells should have been recorded for carbonated and non-carbonated solutions.  There were no seashells in the control.  Therefore, the mass dataframe should have six columns of data instead of three.
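
The logical test described above can be sketched as follows.  This is an illustration with made-up values, not the Week 2 code: the array name `volumes` is hypothetical, and the use of `np.isnan` and `ddof=1` (sample standard deviation) are assumptions that may differ from your annotated code.

```python
import numpy as np

# Made-up titration volumes (mL); np.nan stands for a blank spreadsheet cell.
volumes = np.array([1.05, 1.02, np.nan, 1.08, np.nan])

# np.isnan marks the blanks as True; "~" flips the test to keep non-blanks.
valid = volumes[~np.isnan(volumes)]

mean = np.mean(valid)
stdev = np.std(valid, ddof=1)   # ddof=1 gives the sample standard deviation
dof = valid.size - 1            # degrees of freedom
print(f"n = {valid.size}, mean = {mean:.3f}, stdev = {stdev:.3f}, dof = {dof}")
```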

3a) Using the information above, **go back** to the corresponding **text cell** in your Ocean Acidification Week 2 Jupyter notebook.  Copy / paste your annotated code into this **text cell** below this list.


3b) **Change** the word "pH" to "mass"; "bicarbonate" to "magnesium"; "Bicarbonate" to "Magnesium"; "carbonate" to "calcium"; and "Carbonate" to "Calcium", everywhere you see them in the code.  Replace the "bicarbonate" forms first, because they contain the word "carbonate".  
  **Also change** the dataframe titles from "big_dp" to "big_dm"; "big_dc" to "big_dca"; and "big_db" to "big_dmg".  
  **Finally change** the abbreviations "bicarb" to "mg" and then "carb" to "ca".  **Double check**: you should now have sample-code for seashell mass, calcium ions, and magnesium ions in this **text cell**.

3c) Copy/paste all the code along with your explanations into the **code cell** just below this text cell, and run it.  

3d) Take a look at the output and **decide as a group** if it looks reasonable.  Ask for help if needed.

3e) Finally, **answer the key question below** in your laboratory notebook.  Discuss as a group.  Ask for help if needed.

#### **Thinking About the Data, Task 3** This is an opportunity to make sure the data make sense before proceeding any further.
*   Q3a. Seashells can dissolve in acidic solution.  What was the average change in seashell mass when CO$_2$ was added?  How about when CO$_2$ was not added?  Do these average values make sense, and are they consistent with the values you measured in lab?
*   Q3b. Calcium carbonate is present in seawater, and also accounts for a large percentage of seashell mass.   What was the average calcium ion titration volume when CO$_2$ was added?  How about when CO$_2$ was not added?  How about in the control solution?  Do these average values make sense, and are they consistent with the values you measured in lab to the hydroxynaphthol blue endpoint?
*   Q3c. Magnesium carbonate is present in seawater, and also accounts for a small percentage of seashell mass.  What was the average magnesium ion titration volume in mL when CO$_2$ was added?  How about in the control solution?  Do these average values make sense, and are they consistent with the values you measured in lab at the Eriochrome Black T endpoint?

---
```
#### Copy/paste and modify code here:


```
---

### Task 4: Calculate percent differences relative to a control; propagate uncertainties.

#### **Important**: Complete Task 3 first.

The goal of this task is to calculate percent differences for the average of each measured metal ion concentration in the carbonated and non-carbonated solutions, relative to the control solution.  First, measured titration volumes are converted to analyte concentrations using solution stoichiometry.  Next, differences in concentration relative to the control are calculated.  Finally, these differences are used to predict changes in seashell mass using solution stoichiometry.  The standard deviations are propagated through all calculations using the rules from your textbook.

For this part, sample-code is given for the calcium ions.  You will need to copy/paste and modify it to produce code for the magnesium ions.
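
Before reading the sample-code, it may help to see the difference and propagation steps in isolation.  The sketch below uses made-up concentrations and hypothetical variable names; it assumes the usual rule that absolute uncertainties add in quadrature for a sum or difference.

```python
import math

# Made-up mean concentrations (M) and standard deviations; not class data.
control_mean, control_stdev = 0.0100, 0.0003
sample_mean, sample_stdev = 0.0105, 0.0004

# Difference relative to the control
difference = sample_mean - control_mean

# For a sum or difference, absolute uncertainties add in quadrature.
difference_stdev = math.sqrt(sample_stdev**2 + control_stdev**2)

# Percent difference relative to the control mean
percent_difference = difference / control_mean

print(f"Difference: {difference:.1e} ± {difference_stdev:.1e} M")
print(f"Percent difference: {percent_difference:.1%}")
```

Notice that the propagated standard deviation of the difference is larger than either of the two standard deviations that went into it.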

4a)  Using the information above, double-click here in this **text cell** and type into each of the comment lines to **explain the purpose** of each line of sample-code below in this text cell (they should look like the example shown here).

---
```
#### The code below is for:
```
---

4b)  Next, copy the existing sample-code for calcium ions in this **text cell**, and paste it below the existing sample-code in this **text cell** as indicated.

4c)  In the second copy of the sample-code (only), **change** the word "calcium" to "magnesium", "Calcium" to "Magnesium", "ca" to "mg", and "Ca" to "Mg" wherever they refer to calcium in the code.  Do **not** change the letters "ca" inside other words, such as "carbonated" or "carbonate".  **Double check**: you should now have sample-code for calcium ions **and** magnesium ions, both in this **text cell**.

4d) Copy/paste all the code along with your explanations into the **code cell** just below this text cell, and run it.  

4e) Take a look at the output and **decide as a group** if it looks reasonable. Ask for help if needed.

4f) Finally,  **answer the key question below** in your laboratory notebook, discuss as a group, and take any steps your group deems appropriate.  Ask for help if needed.

#### **Thinking About the Data, Task 4** Our main goal is to find out what happened to the seashells.  How could the output from Task 4 help you to answer these questions?  For each question, first *make a prediction* as to what you might see in the output below if CO$_2$ acidification has a measurable impact, and how that might differ if it does not.  Finally, take a look at the output and write down what you notice.
*   Q4a. Seashells contain calcium carbonate and magnesium carbonate.  How might pH affect the solubility of these ionic compounds?  Which solution had the lower pH?  Which solution had the greater change in seashell mass?  How could you explain the changes in seashell mass, as a chemist?
*   Q4b. Seashells contain calcium carbonate and magnesium carbonate.  What do you predict seashells could do to the calcium and magnesium ion concentrations in seawater?  Which solution had the lower pH?  Which solution had greater changes in ion concentrations?  Which ion concentration changed more, calcium ions or magnesium ions? How could you explain these results, as a chemist?
*   Q4c. Dissolution of calcium carbonate and magnesium carbonate in seashells is one possible explanation for the change in mass of seashells.  Do the results support this explanation?  Which solution had a greater change in seashell mass?  Which solution had a greater estimated change in mass, based on calcium and magnesium ion concentrations?  

---
```python
#### The code below is for:
import math
import scipy as sp
import scipy.special  # makes sp.special available in older SciPy versions

#### The code below is for:
mmol_ca_mean = ca_mean * Concentration_of_titrant_in_M * Stoichiometric_coefficient_of_Ca_ion/Stoichiometric_coefficient_of_EDTA
mmol_ca_stdev = ca_stdev * Concentration_of_titrant_in_M * Stoichiometric_coefficient_of_Ca_ion/Stoichiometric_coefficient_of_EDTA
#### The code below is for:
M_ca_mean = mmol_ca_mean/Volume_of_sample_in_mL
M_ca_stdev = mmol_ca_stdev/Volume_of_sample_in_mL
#### The code below is for:
print(f"Calcium ion concentrations were:\n {M_ca_mean[0]:.2e} ± {M_ca_stdev[0]:.2e} M for control;\n {M_ca_mean[1]:.2e} ± {M_ca_stdev[1]:.2e} M for non-carbonated;\n and {M_ca_mean[2]:.2e} ± {M_ca_stdev[2]:.2e} M for carbonated samples.\n")

#### The code below is for:
M_ca_mean_rel = M_ca_mean-M_ca_mean[0]
#### The code below is for: Propagating the uncertainty through the difference calculation
M_ca_stdev_rel = np.sqrt((np.square(M_ca_stdev)+M_ca_stdev[0]**2))
#### The code below is for:
g_caco3_mean_rel = M_ca_mean_rel*Volume_of_ASW_in_L*Formula_mass_of_CaCO3*Stoichiometric_coefficient_of_CaCO3/Stoichiometric_coefficient_of_Ca_ion
g_caco3_stdev_rel = M_ca_stdev_rel*Volume_of_ASW_in_L*Formula_mass_of_CaCO3*Stoichiometric_coefficient_of_CaCO3/Stoichiometric_coefficient_of_Ca_ion
#### The code below is for:
print(f"Relative to control, the calcium ion concentration in non-carbonated samples changed by {M_ca_mean_rel[1]:.2e} ± {M_ca_stdev_rel[1]:.2e} M...")
print(f" As a percentage, this is a change of {M_ca_mean_rel[1]/M_ca_mean[0]:.1%} ± {M_ca_stdev_rel[1]/M_ca_mean[0]:.1%};")
print(f" This would correspond to a dissolved mass of {g_caco3_mean_rel[1]:.3f} ± {g_caco3_stdev_rel[1]:.3f} g of calcium carbonate from seashells in ASW...")
print(f" ...compared to an observed change in mass of {mass_mean[4]:.2f} ± {mass_stdev[4]:.2f}%, or {mass_mean[4]*mass_mean[0]*0.01:.2f} ± {mass_stdev[4]*mass_mean[0]*0.01:.2f} g.\n")
print(f"Relative to control, the calcium ion concentration in carbonated samples changed by {M_ca_mean_rel[2]:.2e} ± {M_ca_stdev_rel[2]:.2e} M...")
print(f" As a percentage, this is a change of {M_ca_mean_rel[2]/M_ca_mean[0]:.1%} ± {M_ca_stdev_rel[2]/M_ca_mean[0]:.1%};")
print(f" This would correspond to a dissolved mass of {g_caco3_mean_rel[2]:.3f} ± {g_caco3_stdev_rel[2]:.3f} g of calcium carbonate from seashells in ASW...")
print(f" ...compared to an observed change in mass of {mass_mean[5]:.2f} ± {mass_stdev[5]:.2f}%, or {mass_mean[5]*mass_mean[0]*0.01:.2f} ± {mass_stdev[5]*mass_mean[0]*0.01:.2f} g.\n")

#### Copy/paste and modify code here:

```
---

### Task 5: Calculate 95% confidence intervals; apply the Student t test.

#### **Important**: Complete Task 4 first.

While it is always a good idea to propagate uncertainty, a difference between two measured values always has a higher absolute uncertainty than either of the values being compared.  So, you may have calculated some large propagated uncertainties in Task 4, and you might be wondering whether these differences are significant.

Fortunately, statistical testing can tell us whether the *average* measured values are statistically different from one another.  If the calculated Student t value is greater than the critical Student t value, then the difference in mean values is significant at the chosen confidence level.  
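
The comparison described above can be tried on made-up numbers first.  In this sketch, the function name `pooled_t` and all values are hypothetical, and the critical value is read from a standard two-tailed t table rather than calculated.

```python
import math

def pooled_t(mean1, stdev1, n1, mean2, stdev2, n2):
    """Return the calculated Student t value using a pooled standard deviation."""
    s_pooled = math.sqrt(((n1 - 1) * stdev1**2 + (n2 - 1) * stdev2**2) / (n1 + n2 - 2))
    return abs(mean1 - mean2) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))

# Made-up summary statistics for two groups of five trials each.
t_calc = pooled_t(10.0, 1.0, 5, 12.0, 1.0, 5)

# Critical t for 95% confidence with 5 + 5 - 2 = 8 degrees of freedom,
# taken from a standard two-tailed t table.
t_crit = 2.306

print(f"t-calc = {t_calc:.2f}, t-crit = {t_crit:.2f}")
print("Significant difference" if t_calc > t_crit else "No significant difference")
```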

The goal of this task is to compare Student t values at the 95% confidence level.  We will also calculate 95% confidence intervals for the mean values, which tend to narrow as more trials are taken together.  (*This is why we needed to pool together everyone's data as a class.*)
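
To see why pooling data narrows a confidence interval, consider the sketch below.  The function name `ci_half_width` and the standard deviation are made up for illustration, and the critical t values are taken from a standard two-tailed t table at the 95% confidence level.

```python
import math

def ci_half_width(stdev, n, t_crit):
    """Half-width of the confidence interval for a mean of n trials."""
    return t_crit * stdev / math.sqrt(n)

stdev = 1.0  # made-up standard deviation, kept the same for both cases

# Two-tailed 95% critical t values from a standard t table:
# 2.776 for 4 degrees of freedom (n = 5), 2.093 for 19 degrees of freedom (n = 20).
ci_small_group = ci_half_width(stdev, 5, 2.776)
ci_whole_class = ci_half_width(stdev, 20, 2.093)

print(f"n = 5:  ± {ci_small_group:.2f}")
print(f"n = 20: ± {ci_whole_class:.2f}")
```

With the same standard deviation, the interval for 20 trials is less than half as wide as the interval for 5 trials, because both the critical t value and the factor $1/\sqrt{n}$ decrease as trials are added.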

For this part, sample-code is given for the seashell masses and calcium ions.  You will need to copy/paste and modify it to produce code for the magnesium ions.

5a)  Using the information above, double-click here in this **text cell** and type into each of the comment lines to **explain the purpose** of each line of sample-code below in this text cell (they should look like the example shown here).

---
```
#### The code below is for:
```
---

5b)  Next, copy the existing sample-code for calcium ions (but not for seashell masses) in this **text cell**, and paste it below the existing sample-code in this **text cell** as indicated.

5c)  In the second copy of the sample-code (only), **change** the word "calcium" to "magnesium", "Calcium" to "Magnesium", "ca" to "mg", and "Ca" to "Mg" wherever they refer to calcium in the code.  Do **not** change the letters "ca" inside other names, such as "t_calc", "t_calc_pooled", or "carb".  **Double check**: you should now have sample-code for calcium ions **and** magnesium ions, both in this **text cell**.

5d) Copy/paste all the code along with your explanations into the **code cell** just below this text cell, and run it.  

5e) Take a look at the output and **decide as a group** if it looks reasonable. Ask for help if needed.

5f) Finally,  **answer the key question below** in your laboratory notebook, discuss as a group, and take any steps your group deems appropriate.  Ask for help if needed.

#### **Thinking About the Data, Task 5** Any good chemical analysis *must* be statistically significant in order to draw conclusions.  Do any of the changes you noted in Task 4 *seem* significant based on only the standard deviations in the percent differences?  How could the output from Task 5 help you to determine whether you can draw statistically significant conclusions from the data?
*   Q5a. Based on the 95% confidence intervals and the Student t test, are one or both of the changes in seashell mass that you noted in Task 4 statistically significant?  How does this affect your conclusions about seawater pH?
*   Q5b. Based on the 95% confidence intervals and the Student t test, are one or both of the changes in calcium ion concentration that you noted in Task 4 statistically significant?  How does this affect your conclusions about what may have happened when seashells were exposed to carbonic acid?
*   Q5c. Based on the 95% confidence intervals and the Student t test, are one or both of the changes in magnesium ion concentration that you noted in Task 4 statistically significant?  How does this affect your conclusions about what may have happened when seashells were exposed to carbonic acid?
*   Q5d. Are estimated changes in mass based on changes in ion concentrations consistent with observed changes in mass?  How does this affect your conclusions about what may have happened when seashells were exposed to carbonic acid?

---
```python
#### The code below is for:
def t_calc_pooled(s1,s2,x1,x2,n1,n2):
#### The code below is for: Using the formula for a pooled standard deviation
    s_pooled = math.sqrt((((s1**2)*(n1-1))+((s2**2)*(n2-1)))/(n1+n2-2))
#### The code below is for:
    t_calc = (abs(x1-x2))*math.sqrt((n1*n2)/(n1+n2))/s_pooled
#### The code below is for: Setting the function to output the Student t value
    return t_calc

#### The code below is for:
t_crit_mass = np.array([abs(sp.special.stdtrit(mass_dof[4],0.025)),abs(sp.special.stdtrit(mass_dof[5],0.025))])
#### The code below is for:
ci_mass = np.divide(np.multiply(t_crit_mass,mass_stdev[4:6]),np.sqrt(mass_dof[4:6]+1))
#### The code below is for:
print(f"95% confidence intervals for average seashell mass changes:\n {mass_mean[4]:.2f} ± {ci_mass[0]:.2f}% for non-carbonated solution; \n {mass_mean[5]:.2f} ± {ci_mass[1]:.2f}% for carbonated solution.")

#### The code below is for:
t_crit_mass_diff = abs(sp.special.stdtrit(mass_dof[4]+mass_dof[5],0.025))
#### The code below is for:
t_calc_mass_diff = t_calc_pooled(mass_stdev[4],mass_stdev[5],mass_mean[4],mass_mean[5],mass_dof[4]+1,mass_dof[5]+1)
#### The code below is for:
print(f"Student t values for comparing the average seashell mass changes in non-carbonated and carbonated solutions:\n t-calc = {t_calc_mass_diff:.2f}, t-crit = {t_crit_mass_diff:.2f}.\n")

#### The code below is for:
t_crit_ca = np.array([abs(sp.special.stdtrit(ca_dof[0],0.025)),abs(sp.special.stdtrit(ca_dof[1],0.025)),abs(sp.special.stdtrit(ca_dof[2],0.025))])
#### The code below is for:
ci_M_ca = np.divide(np.multiply(t_crit_ca, M_ca_stdev),np.sqrt(ca_dof+1))
#### The code below is for:
print(f"95% confidence intervals for average calcium concentration:\n {M_ca_mean[0]:.2e} ± {ci_M_ca[0]:.2e} M for control solution;\n {M_ca_mean[1]:.2e} ± {ci_M_ca[1]:.2e} M for non-carbonated solution; \n {M_ca_mean[2]:.2e} ± {ci_M_ca[2]:.2e} M for carbonated solution.")

#### The code below is for:
t_crit_ca_non_carb_diff = abs(sp.special.stdtrit(ca_dof[0]+ca_dof[1],0.025))
t_crit_ca_carb_diff = abs(sp.special.stdtrit(ca_dof[0]+ca_dof[2],0.025))
#### The code below is for:
t_calc_ca_non_carb_diff = t_calc_pooled(M_ca_stdev[0],M_ca_stdev[1],M_ca_mean[0],M_ca_mean[1],ca_dof[0]+1,ca_dof[1]+1)
t_calc_ca_carb_diff = t_calc_pooled(M_ca_stdev[0],M_ca_stdev[2],M_ca_mean[0],M_ca_mean[2],ca_dof[0]+1,ca_dof[2]+1)
#### The code below is for:
print(f"Student t values for comparing the average calcium concentration in non-carbonated solution to the control:\n t-calc = {t_calc_ca_non_carb_diff:.2f}, t-crit = {t_crit_ca_non_carb_diff:.2f}")
print(f"Student t values for comparing the average calcium concentration in carbonated solution to the control:\n t-calc = {t_calc_ca_carb_diff:.2f}, t-crit = {t_crit_ca_carb_diff:.2f}\n")

#### Copy/paste and modify code here:
```
---

**Congratulations, you did it!**

Please download your copy of this notebook to turn in.

Remember to write a summary in your laboratory notebook, by hand, and to turn in copies of your notebook pages.

Results also will be used in your formal report.