# Solution Preparation (Piloted Spring 2025) (Student Version)

**Prior Knowledge Needed**
*  Familiarity with *arrays*
*  Familiarity with data sets that can be fit to a *model*, such as a linear calibration curve

**Content Learning Objectives**
*  Explain *calibration* and *analysis* for spectrophotometric data
*  Use a calibration curve to find the concentration of an analyte in solution.

**Process Learning Objectives**
*  Use Python code to transform data using structures such as arrays
*  Use Python code to visualize data using different types of graphs

This Jupyter notebook can be used to generate code in Python to perform four data analysis tasks commonly used in spectrophotometry:
1. Input concentration and absorbance data into arrays
2. Find the best-fit parameters and standard uncertainties
3. Generate a calibration curve using data arrays and best-fit model arrays
4. Calculate unknown concentrations and propagated uncertainties from measured absorbance data

This notebook is based on a notebook that was originally authored by Jonathan Gutow, Melissa Reeves, and Tricia Shepherd, which has been extended with additional examples from the POGIL-PCL Intro to Jupyter Notebooks Workshop team. (Please provide attribution if you use this notebook in another setting, including if you use an altered version.)

The equation for propagated uncertainty in the $x$-intercept used in Task 4 is based on a Python notebook for Analytical Chemistry Laboratory that was developed and implemented by Dr. Eleanor Gillette, Dr. David De Haan and Dr. Julia Schafer as described in J. Chem. Educ. 2021, 98, 10, 3245–3250 https://pubs.acs.org/doi/10.1021/acs.jchemed.1c00456.  

### Task 1: Entering Concentration and Absorbance Data into NumPy Arrays

In our Basic Tasks notebook, we used sample-code that already contained data in NumPy arrays.  For this new data entry task, you will need to enter the data from your laboratory notebook into the empty arrays in the sample-code.

Using the information above:

1a) Double-click here in this **text cell** and type into each of the comment lines to **explain the purpose** of each line of sample-code below in this text cell (each comment line looks like the example shown here).

---
```
#### The code below is for:
```
---

1b) **Enter the data** from your lab notebook into the appropriate arrays in the sample-code.

1c)  Copy/paste all the code along with your explanations into the **code cell** just below this text cell, and run it.  

Compare results with team members, and discuss as a group.  Ask for help if needed.

---
```python
#### The code below is for:
import numpy as np
#### The code below is for:
Stock_Concentration_Array = np.array([ , , , , ])
#### The code below is for:
Measured_Absorbance_Array = np.array([ , , , , ])
#### The code below is for:
#### Important: If your team measured each stock concentration twice, *uncomment the two lines below* and fill in those numbers.
#Measured_Absorbance_Array_2 = np.array([ , , , , ])
#Measured_Absorbance_Array = np.mean([Measured_Absorbance_Array,Measured_Absorbance_Array_2], axis=0)
#### The code below is for:
N = len(Stock_Concentration_Array)
#### The code below is for:
for row in range(N):
    print(Stock_Concentration_Array[row],Measured_Absorbance_Array[row])

```
---
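
As an illustration of the replicate-averaging step, the sketch below uses made-up absorbance readings (not real data) for five standards that were each measured twice. Note that the second positional argument of `np.average` is `axis`, not a second array, so the two replicate arrays are first stacked and then averaged along `axis=0`. The explicit form `(a + b) / 2` gives the same result. The variable names here are illustrative only.

```python
import numpy as np

# Hypothetical absorbance readings for five standards, measured twice
# (illustrative numbers only, not real data).
absorbance_run_1 = np.array([0.105, 0.212, 0.318, 0.421, 0.530])
absorbance_run_2 = np.array([0.109, 0.208, 0.322, 0.425, 0.526])

# Stack the two runs as the rows of a 2 x 5 array, then average down each
# column (axis=0) so every standard gets the mean of its own two readings.
mean_absorbance = np.mean([absorbance_run_1, absorbance_run_2], axis=0)

# Equivalent, more explicit form: element-by-element sum divided by 2.
mean_absorbance_explicit = (absorbance_run_1 + absorbance_run_2) / 2

print(mean_absorbance)
```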

### Task 2: Fitting Calibration Data to a Model with Uncertainty

In the Basic Tasks notebook, we used NumPy to fit data to a model.  In this notebook, we will use SciPy for this purpose: specifically `linregress`, a function in the statistics module (`scipy.stats`) of the SciPy package.  Along with the best-fit parameters, `linregress` reports the standard uncertainties in the slope and intercept.
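
A minimal sketch of what `linregress` returns, using made-up calibration data (the names `concentration`, `absorbance`, and `fit` are illustrative, not part of the sample-code). The result behaves like a named tuple whose attributes can be read individually; the `intercept_stderr` attribute assumes SciPy 1.6 or newer.

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical calibration data (illustrative only): concentrations in M and
# absorbances that lie close to, but not exactly on, a straight line.
concentration = np.array([0.010, 0.020, 0.030, 0.040, 0.050])
absorbance = np.array([0.105, 0.212, 0.318, 0.421, 0.530])

fit = linregress(concentration, absorbance)

# Best-fit parameters and their standard uncertainties are attributes of the result.
print(f"slope     = {fit.slope:.3f} ± {fit.stderr:.3f}")
print(f"intercept = {fit.intercept:.4f} ± {fit.intercept_stderr:.4f}")
print(f"r^2       = {fit.rvalue**2:.5f}")
```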

Using the information above:

2a)  Double-click here in this **text cell** and type into each of the comment lines to **explain the purpose** of each line of sample-code below in this text cell (each comment line looks like the example shown here).

---
```
#### The code below is for:
```
---

2b)  Copy/paste all the code along with your explanations into the **code cell** just below this text cell, and run it.  

2c)  Answer the **key question below** in your copy of this notebook by typing your answer below the question in this text cell.

Compare results with team members, and discuss as a group.  Ask for help if needed.

#### **Thinking About The Data Question, Task 2**: This notebook and the Basic Tasks notebook use different Python libraries for the same purpose.  Look at a completed Basic Tasks notebook to compare.
* Q2a.  How are the two fitting functions different?  Discuss as a team and identify *at least* three key differences: *at least* one noted in the information above, *at least* one in the sample-code, and *at least* one in the output after running the code.

---
```python
#### The code below is for:
import scipy
from scipy.stats import linregress
#### The code below is for:
linear_best_fit = linregress(Stock_Concentration_Array,Measured_Absorbance_Array)
#### The code below is for:
print(linear_best_fit)
```
---

### Task 3: Plotting Calibration Data with a Best-Fit Model

In the Basic Tasks notebook, we used Matplotlib to plot data and a best-fit model.  Here we will use two additional capabilities of Matplotlib: displaying **error bars**, and constructing a **composite figure** that displays two plots side by side.

Calibration curves are plots of calibration data and a best-fit model, often linear.  Typically a calibration curve is plotted without error bars, but the sample-code here also adds error bars to illustrate the propagated uncertainty in the model $y=mx+b$.

The standard uncertainty in the $y$-coordinate is propagated through the linear model from the standard uncertainties in the slope and intercept.    

Based on Appendix B in the textbook, the propagated uncertainty in the model $y=mx+b$ is:

$u_y = \sqrt{x^2\cdot u_m^2 + u_b^2 + 2x\cdot u_{mb}}$

where the *covariance*, given by $u_{mb}=\frac{-s_y^2\cdot \sum_i x_i}{N\cdot\sum(x_i^2)-(\sum_i x_i)^2}$, is used to calculate a small correction not found in simple error propagation rules.  To calculate the covariance, we also must find the squared standard deviation in $y$, given by $s_y^2 = \frac{\sum(y-y_{model})^2}{N-2}$, which will be useful later.
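
The formulas above can be checked numerically. The sketch below uses made-up calibration data; it computes $s_y^2$, the covariance $u_{mb}$, and $u_y$ at each calibration concentration. It also uses the standard least-squares expressions $u_m^2 = s_y^2 N / D$ and $u_b^2 = s_y^2 \sum x_i^2 / D$, where $D = N\sum x_i^2 - (\sum x_i)^2$ is the same denominator used for the covariance, and confirms that they match the standard errors reported by `linregress` (assuming SciPy 1.6 or newer for `intercept_stderr`). All variable names are illustrative.

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical calibration data (illustrative only).
x = np.array([0.010, 0.020, 0.030, 0.040, 0.050])
y = np.array([0.105, 0.212, 0.318, 0.421, 0.530])
N = len(x)

fit = linregress(x, y)
y_model = fit.slope * x + fit.intercept

# s_y^2: sum of squared residuals divided by N - 2 degrees of freedom.
s_y_squared = np.sum((y - y_model) ** 2) / (N - 2)

# Shared denominator D = N*sum(x_i^2) - (sum x_i)^2.
D = N * np.sum(x ** 2) - np.sum(x) ** 2

# Variances of the slope and intercept, and their covariance.
u_m_squared = s_y_squared * N / D
u_b_squared = s_y_squared * np.sum(x ** 2) / D
u_mb = -s_y_squared * np.sum(x) / D

# Propagated uncertainty in the model y = m x + b at each calibration x value.
u_y = np.sqrt(x ** 2 * u_m_squared + u_b_squared + 2 * x * u_mb)

# The same quantity with the covariance term left out, for comparison.
u_y_no_cov = np.sqrt(x ** 2 * u_m_squared + u_b_squared)

print(u_y)
print(u_y_no_cov)
```

Because all of the concentrations are positive, $u_{mb}$ is negative, so including the covariance term makes each $u_y$ smaller than the estimate that ignores it.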

Using the information above:

3a)  Double-click here in this **text cell** and type into each of the comment lines to **explain the purpose** of each line of sample-code below in this text cell (each comment line looks like the example shown here).

---
```
#### The code below is for:
```
---

3b)  Copy/paste all the code along with your explanations into the **code cell** just below this text cell, and run it.  

3c)  Answer the **key question below** in your copy of this notebook by typing your answer below the question in this text cell.

Compare results with team members, and discuss as a group.  Ask for help if needed.

#### **Thinking About The Data Question, Task 3**: This task generates two types of plots.  Look at your output to compare.
* Q3a.  How well does the linear regression model fit your data?  Discuss as a team and decide whether all data points fit *exactly* to the model.  Also decide whether all data points fit to the model *within* standard error bars.  Explain briefly.
* Q3b. (Challenge) Standard error bars for the model are displayed in the y-coordinate, not in the x-coordinate.  Why?  Discuss first as a team, and consult with other teams if needed.

---
```python
#### The code below is for:
import matplotlib
from matplotlib import pyplot as plt
#### The code below is for:
dual_figure = plt.figure(figsize=(14,6))
#### The code below is for:
plot1, plot2 = dual_figure.subplots(1,2,sharex=True,sharey=True)
#### The code below is for:
plot1.set_title('Linear Best Fit')
plot2.set_title('Linear Best Fit With Error Bars')
#### The code below is for:
plot1.set_xlabel('Stock Concentration in M')
plot2.set_xlabel('Stock Concentration in M')
plot1.set_ylabel('Absorbance')
plot2.set_ylabel('Absorbance')
#### The code below is for:
label1 = "Linear best fit: y = {slope:.2e}x + {intercept:.2e}".format(slope=linear_best_fit.slope,intercept=linear_best_fit.intercept)
#### The code below is for:
model = Stock_Concentration_Array*linear_best_fit.slope + linear_best_fit.intercept
#### The code below is for:
plot1.plot(Stock_Concentration_Array,Measured_Absorbance_Array,'ob')
plot1.plot(Stock_Concentration_Array,model,'--k',label=label1)
plot1.legend()
#### The code below is for:
label2 = "Linear best fit with standard uncertainties:\n y = ({slope:.2e}±{m_stderr:.2e})x + ({intercept:.2e}±{b_stderr:.2e})".format(slope=linear_best_fit.slope,intercept=linear_best_fit.intercept,m_stderr=linear_best_fit.stderr,b_stderr=linear_best_fit.intercept_stderr)
#### The code below is for:
stdev_y_squared = np.sum(np.square(Measured_Absorbance_Array - model))/(N-2)
D = N*np.sum(np.square(Stock_Concentration_Array))-(np.sum(Stock_Concentration_Array))**2
covariance = -stdev_y_squared * np.sum(Stock_Concentration_Array) / D
#### The code below is for:
model_y_stderr = np.sqrt(np.square(linear_best_fit.stderr*Stock_Concentration_Array)+(linear_best_fit.intercept_stderr**2)*np.ones(N)+(2*covariance)*Stock_Concentration_Array)
#### The code below is for:
plot2.plot(Stock_Concentration_Array,Measured_Absorbance_Array,'ob')
plot2.errorbar(Stock_Concentration_Array,model,yerr=model_y_stderr,fmt='--k', capsize=4, label=label2)
plot2.legend()
#### The code below is for:
plt.show()
```
---

### Task 4: Using a Best-Fit Model to Calculate an Unknown Concentration from Measured Absorbance With Standard Uncertainty

When applying a best-fit model to calculations, the uncertainty in the model should be *propagated* to estimate the uncertainty in the calculated value.

When applying a linear regression model to calculate an unknown concentration $x$, the propagated standard uncertainty in the calculated value is:

$s_x = \frac{s_y}{|m|} \sqrt{\frac{1}{k} + \frac{1}{N} + \frac{(y-\bar{y})^2}{m^2 \sum(x_i-\bar{x})^2}}$

where $s_y = \sqrt{\frac{\sum(y-y_{model})^2}{N-2}}$ is the square root of `stdev_y_squared`, which we calculated in Task 3 to plot the model with $y$-error bars; $m$ is the slope (and $|m|$ is its absolute value); $k$ is the number of replicate measurements of each data point (if you measured each stock solution only once, $k = 1$); $N$ is the number of data points used to fit the model; $y$ is the measured absorbance of the unknown; and $\bar{x}$ and $\bar{y}$ are the average $x$-value and $y$-value of the data points used to fit the model.

The `linregress` function does not perform this uncertainty calculation, so we will define our own function for it.
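
To see how the terms in the formula behave, the sketch below evaluates $s_x$ with made-up calibration data at two hypothetical absorbances: one equal to $\bar{y}$ and one at the top of the calibration range, each for $k = 1$ and $k = 3$. The helper name `s_x` and all data are illustrative. The uncertainty is smallest when $y = \bar{y}$, where the last term under the square root vanishes, and a larger $k$ shrinks the $1/k$ term.

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical calibration data (illustrative only).
x_cal = np.array([0.010, 0.020, 0.030, 0.040, 0.050])
y_cal = np.array([0.105, 0.212, 0.318, 0.421, 0.530])
N = len(x_cal)

fit = linregress(x_cal, y_cal)
m, b = fit.slope, fit.intercept

# s_y from the residuals of the fit, and the sum of squared x deviations.
s_y = np.sqrt(np.sum((y_cal - (m * x_cal + b)) ** 2) / (N - 2))
s_xx = np.sum((x_cal - np.mean(x_cal)) ** 2)


def s_x(y, k):
    """Propagated standard uncertainty in x = (y - b)/m for measured absorbance y."""
    return (s_y / abs(m)) * np.sqrt(
        1 / k + 1 / N + (y - np.mean(y_cal)) ** 2 / (m ** 2 * s_xx)
    )


# Compare an absorbance at the centre of the calibration range with one at its edge.
for y in (np.mean(y_cal), 0.530):
    for k in (1, 3):
        x = (y - b) / m
        print(f"y = {y:.3f}, k = {k}: x = {x:.5f} ± {s_x(y, k):.5f} M")
```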

Using the information above:

4a)  Double-click here in this **text cell** and type into each of the comment lines to **explain the purpose** of each line of sample-code below in this text cell (each comment line looks like the example shown here).

---
```
#### The code below is for:
```
---

4b) **Enter the data** from your lab notebook into the appropriate place in the sample-code.

4c)  Copy/paste all the code along with your explanations into the **code cell** just below this text cell, and run it.  

4d)  Answer the **key question below** in your copy of this notebook by typing your answer below the question in this text cell. If you decide to complete the challenge part using code, please add your new code to the end of the code cell below.

Compare results with team members, and discuss as a group.  Ask for help if needed.

#### **Thinking About The Data Question, Task 4**: This task calculates the concentration of your unknown and its standard uncertainty.  Use the information above, along with your output, to answer the following:
* Q4a.  Where did the uncertainty come from?  List at least two sources.
* Q4b.  We reported, but *did not graph*, the calculated concentration with its propagated uncertainty.  Why not?
* Q4c. (Challenge) You were tasked to prepare a solution with a molar concentration *accurate* to within at least 1% of an assigned target molar concentration.  However, the *uncertainty* calculated in this task is a measure of *precision*, not *accuracy*.  Calculate the *percent error* in your solution preparation to assess *accuracy*, and report your calculated percent error here.  (You may *optionally* do this calculation by writing a few lines of Python code, adding it to the code cell below, and re-running the code cell.)  

---
```python
#### The code below is for:
#### Important: The value of k is the number of times your group measured each standard solution for the calibration graph.
Value_of_k =
#### The code below is for:
Unknown_Absorbance =
#### The code below is for:
Unknown_Concentration = (Unknown_Absorbance - linear_best_fit.intercept)/linear_best_fit.slope
#### The code below is for:
def standard_uncertainty(y,k,n,m,s_y,x_data,y_data):
    x_average = np.average(x_data)
    y_average = np.average(y_data)
    s_xx = np.sum((x_data-x_average)**2)
    s_x = (s_y/abs(m))*np.sqrt((1/k)+(1/n)+(((y-y_average)**2)/((m*m*s_xx))))
    return s_x
#### The code below is for:
Propagated_Uncertainty = standard_uncertainty(Unknown_Absorbance,Value_of_k,N,linear_best_fit.slope,np.sqrt(stdev_y_squared),Stock_Concentration_Array,Measured_Absorbance_Array)
#### The code below is for:
print(f"Calculated Molar Concentration = {Unknown_Concentration:.4f} ± {Propagated_Uncertainty:.4f} M")
```
---

***Congratulations, you did it!***  
Please download your copy of this notebook to turn in.

Remember to write a summary in your laboratory notebook, by hand, and to turn in copies of your notebook pages.