# 3  Assessing the quality of a fit

## 3.1  Root mean square deviation

One quantity that can be used to decide how good a fit is (and therefore which function best fits a set of data) is the **root mean square deviation**. This is true not just for linear fits but for any type of fit. For example, if the data you are fitting does not follow a linear relationship, you may test several different functions and use the root mean square deviation to decide which one provides the best fit.

Suppose we have a quantity $y$ whose value depends on a variable $x$; in other words, $y(x)$ is a function of $x$. If we use $(x_i, y_i)$ to indicate a pair of measured $(x,y)$ values, and we are able to calculate the value of the function for each $x_i$ (we’ll call these $y_i^{calc}$), then the deviation of the measured value $y_i$ from the calculated value $y_i^{calc}$ is:

$$y_i - y_i^{calc}$$

If we then take the square of this quantity:

$$(y_i - y_i^{calc})^2$$

the mean square deviation is the sum of the squares of the deviations for all points divided by $N$, the number of points:

$$\frac{\sum_i^N (y_i - y_i^{calc})^2}{N}$$

Finally, the root mean square deviation is given by:

$$\sqrt{\frac{\sum_i^N (y_i - y_i^{calc})^2}{N}} \qquad \text{(Eqn 4.3)}$$

The phrase ‘root mean square deviation’ is therefore a shorthand for saying ‘the square root of the mean of the squares of the deviations’. The smaller the root mean square deviation, the better the fit.
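
As a small worked example with made-up numbers, suppose a fit to $N = 3$ points gives deviations of $1$, $-2$ and $2$ (in whatever units $y$ is measured in). The root mean square deviation is then:

$$\sqrt{\frac{1^2 + (-2)^2 + 2^2}{3}} = \sqrt{\frac{9}{3}} = \sqrt{3} \approx 1.73$$

Note that the sign of each deviation does not matter, because the deviations are squared before they are averaged.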

As noted above, we would normally use a function of $x$ to obtain the different values of $y_i^{calc}$. If several different functions could be used to determine the values of $y_i^{calc}$, then calculating the root mean square deviation will tell us which function gives the best fit: it will be the one with the smallest value of the root mean square deviation. Similarly, if different sets of data are available, for example from different experiments, the root mean square deviation will tell us which set is best described by a given function. For a linear fit, in other words, it tells us which measured results lie closest to a straight line.

Below, we have coded a function that calculates the root mean square deviation.

In [None]:
# Function that calculates the root mean square deviation for
# a set of measured, y_data, and calculated, y_model, values.

import numpy as np

def rmsd(y_data,y_model):
    dev = y_data - y_model
    square_dev = dev**2 
    sum_square_dev = sum(square_dev)
    mean_square_dev = sum_square_dev/len(square_dev)
    root_mean_square_dev = np.sqrt(mean_square_dev)
    return root_mean_square_dev

> What is `square_dev` in line 8 above?

:::{hint} Answer
:class: dropdown

It is an array containing the squares of the values of the deviations, i.e. $(y_i -y_i^{calc})^2$.

:::

> What does line 9 in the cell above do?

:::{hint} Answer
:class: dropdown
    
It uses the built-in `sum` function to calculate the sum of all the elements in the array `square_dev`, i.e. to calculate the sum of all the $(y_i -y_i^{calc})^2$.

:::

> What is `len` in line 10 and why is it needed?

:::{hint} Answer
:class: dropdown

`len` is another built-in function that returns the number of elements in an array, in this case in the array `square_dev`. It is needed to determine the mean square deviation, which is obtained by dividing the sum of the square deviations, `sum_square_dev`, by the number of elements (that is, the number of points in the data).

:::

> What input would you require to use the function `rmsd` above in your own program?

:::{hint} Answer
:class: dropdown
    
Two arrays containing the $y_i$ and $y_i^{calc}$.

:::
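
The idea of using the root mean square deviation to choose between candidate functions can be sketched as follows. This is only an illustration under stated assumptions: the data are made up (exact values of $y = 2x^2 + 1$, not measurements), the `rmsd` function is a compact variant of the one above that uses `np.mean`, and the choice of a straight line and a quadratic as the two candidates is arbitrary.

```python
import numpy as np

def rmsd(y_data, y_model):
    """Root mean square deviation between measured and calculated values."""
    return np.sqrt(np.mean((y_data - y_model) ** 2))

# Made-up illustrative data (NOT experimental): exact values of y = 2x^2 + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x**2 + 1.0

# Candidate 1: straight line (polynomial of degree 1)
grad, intc = np.polyfit(x, y, deg=1)
y_line = intc + grad * x

# Candidate 2: quadratic (polynomial of degree 2)
a2, a1, a0 = np.polyfit(x, y, deg=2)
y_quad = a0 + a1 * x + a2 * x**2

# Compare the two fits: the smaller value indicates the better fit
rmsd_line = rmsd(y, y_line)
rmsd_quad = rmsd(y, y_quad)
print("RMSD for the straight line:", rmsd_line)
print("RMSD for the quadratic:    ", rmsd_quad)
```

Because these data were generated from a quadratic, the quadratic candidate should give a root mean square deviation very close to zero, while the straight line gives a clearly larger value.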

### Exercise 3.1

Use the function in the code cell above to calculate the root mean square deviation for the fit to the force-extension data used in Python 4 Notebook 1 and Notebook 2:

<code> x_values = array([0.053,0.042,0.029,0.025,0.017,0.010,0.008,0.002],float)</code>
    
<code> y_values = array([7.05,5.93,4.08,4.01,2.83,2.05,1.393,0.452],float)
</code>

Once you've answered the exercise, click on the <u>**+ 1 cell hidden** </u> button below to see a possible solution.

In [None]:
import numpy as np

# Create two arrays
x_values = np.array([0.053,0.042,0.029,0.025,0.017,0.010,0.008,0.002],float)
y_values = np.array([7.05,5.93,4.08,4.01,2.83,2.05,1.393,0.452],float)

# Function that calculates the root mean square deviation for
# a set of measured, y_data, and calculated, y_model, values.
def rmsd(y_data,y_model):
    dev = y_data - y_model
    square_dev = dev**2 
    sum_square_dev = sum(square_dev)
    mean_square_dev = sum_square_dev/len(square_dev)
    root_mean_square_dev = mean_square_dev**0.5
    return root_mean_square_dev

# Use polyfit to perform a linear fit
grad, intc =  np.polyfit(x_values, y_values, deg=1) # Call polyfit to fit a straight line (polynomial of degree 1)
                                                    # to the data provided. 

# Use the gradient and intercept to determine the y values given by the fit line
y_model=intc + grad*x_values

# Print the root mean square deviation returned by the rmsd function
print("The root mean square deviation is:", rmsd(y_values, y_model))

### &nbsp;

In [None]:
# Write your python code here.




## 3.2   Confirming the Rydberg constant 

You have seen in Topic 4, Section 2.8 that the spectrum of the hydrogen atom can be understood in terms of transitions between different quantum states with discrete energies. The energy of a photon in the  spectrum corresponds to the difference in energies between quantum states of the  atom. Some of these photons correspond to  visible light while others correspond to infrared or ultraviolet light. Figure 1 shows an example of several transitions from different initial states to a final state with principal quantum number 2.

![Some emission transitions in the hydrogen atom](image_transition.jpg)

:::{hint} Figure caption
:class: dropdown

Schematic depiction of the transitions from several initial states of hydrogen with quantum numbers $n_i > 2$ to the final state with $n_f = 2$.
:::

The properties of the hydrogen spectrum can therefore be understood and predicted from its energy levels. You saw in Topic 4 that energy levels of the hydrogen atom are given by:

$$E_n = -\frac{13.6 \; \text{eV}}{n^2}=-\frac{R}{n^2}$$

Here we have used $R$ to replace 13.6 eV, which corresponds to the **Rydberg** constant. So the energy difference between an initial state with principal quantum number $n_i$ and a final state with principal quantum number $n_f$ is:

$$ E_{\text{photon}}= |E_f -E_i| = \left|-\frac{R}{n_f^2} -\left( - \frac{R}{n_i^2} \right) \right|= R\left(\frac{1}{n_f^2} -\frac{1}{n_i^2} \right)$$

Note that we take the absolute value of the energy difference because the photon energy is always positive.

It is possible to confirm the value of the Rydberg constant by determining the energy of the photons emitted by a gas of hydrogen atoms. Experiments actually measure the wavelength of the light by means of a diffraction grating, but it is straightforward to determine the energy from the wavelength. We know from Topic 4 that the energy of a photon of light of a specific frequency is $E= hf$, where $f$ is the frequency and $h$ is the Planck constant. We also know that the frequency and wavelength of light are related by $f=\frac{c}{\lambda}$, where $c$ is the speed of light. We can use this knowledge to write the photon energy in terms of $\lambda$:

$$E_{\text{photon}}= h f= \frac{hc}{\lambda}$$
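
The two expressions for the photon energy can be combined numerically. The sketch below is illustrative only: the function names `photon_energy` and `wavelength_nm` are hypothetical, $R$ is taken as the rounded value 13.6 eV used in the text, and $hc \approx 1239.84$ eV nm is a rounded standard value that does not appear in this notebook. It estimates the photon energies and wavelengths for transitions to $n_f = 2$, as in Figure 1.

```python
# Assumed, rounded constants (see the paragraph above)
R_EV = 13.6          # Rydberg constant as an energy, in eV (value used in the text)
HC_EV_NM = 1239.84   # Planck constant times speed of light, in eV nm (rounded)

def photon_energy(n_i, n_f, R=R_EV):
    """Photon energy in eV for a transition from state n_i to state n_f (n_i > n_f)."""
    return R * (1.0 / n_f**2 - 1.0 / n_i**2)

def wavelength_nm(energy_eV):
    """Wavelength in nm of a photon with the given energy in eV, from E = hc/lambda."""
    return HC_EV_NM / energy_eV

n_f = 2
for n_i in [3, 4, 5, 6]:
    energy = photon_energy(n_i, n_f)
    print("n_i =", n_i, " E =", round(energy, 3), "eV",
          " wavelength =", round(wavelength_nm(energy), 1), "nm")
```

With these rounded constants, the $n_i = 3$ transition comes out at a wavelength of roughly 656 nm, which lies in the red part of the visible spectrum.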

If the measurements correspond to transitions to the *same* final state, $n_f$ is fixed. In this case, we can rewrite the expression for $E_{\text{photon}}$ in terms of $n_f$ and $n_i$ as the equation of a straight line if we set $y=E_{\text{photon}}$ and $x=\frac{1}{n_i^2}$:

$$ \underbrace{E_{\text{photon}}}_{y}=\underbrace{\frac{R}{n_f^2}}_{b} +\underbrace{(-R)}_{a} \underbrace{\frac{1}{n_i^2}}_{x} $$

The gradient of this line is $-R$, whereas the intercept is:

$$\text{intercept}=\frac{R}{n_f^2}$$

and therefore $n_f=\sqrt{\frac{R}{\text{intercept}}}$.

Therefore, by fitting this straight line and determining its gradient and intercept, we can determine $R$ and $n_f$.
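
As a quick worked example with purely hypothetical numbers (not taken from any of the data files), suppose a fit returned a gradient of $-13.6$ eV and an intercept of $3.4$ eV. Then:

$$R = -\text{gradient} = 13.6 \; \text{eV}, \qquad n_f=\sqrt{\frac{R}{\text{intercept}}} = \sqrt{\frac{13.6 \; \text{eV}}{3.4 \; \text{eV}}} = \sqrt{4} = 2$$

With real measurements, the value obtained for $n_f$ will generally not be exactly a whole number, so it has to be interpreted as the nearest integer.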


### Python activity 3.1 Confirm the value of the Rydberg constant

*Allow approximately 4 hours*

The files <code>Lseries.csv</code> and <code>Student-measurements.csv</code> contain data for hydrogen emission transitions. The transitions in each file correspond to a specific final state, and so to a specific $n_f$. Each file lists the principal quantum number $n_i$ of the initial state in the transition and the energy of the emitted photon. The file <code>Lseries.csv</code> contains the results of very accurate measurements, whereas the file <code>Student-measurements.csv</code> contains results from measurements carried out with less sophisticated equipment.

Write a program that performs a linear fit to the data and determines the values of the Rydberg constant $R$  and the principal quantum number of the final state $n_f$. Determine the root mean square deviation to assess the quality of the fit. The program should produce a plot of the initial data and the fit line and print the value of $R$ and $n_f$. Use the program to determine $R$ and $n_f$ for the data in the two files provided.

You should only use built-in and user-defined functions you have come across in SM123. Some hints to help you put together the program can be found by clicking on the "Hints" button below.

No model solution to this Activity is provided, as it forms part of an assessed task in TMA 04.

In order to make your program available to your tutor for marking as part of TMA 04, you will need to save it as a separate notebook. To do this, open a new notebook, for example by going to the Launcher or by clicking the plus sign at the top and then selecting Python 3. Then cut and paste the program from here into the new notebook. Download this new notebook from the OCL. You will be asked to submit the notebook together with TMA 04. You will also be asked to include some output values as well as graphs you have produced. Remember that you can save a graph using the `pyplot` command `savefig`, copy and paste it by right-clicking on it in the notebook, or use the Snipping Tool in Windows.

*The source for the data in the file Lseries.csv is*: Kramida, A., Ralchenko, Yu., Reader, J., and NIST ASD Team (2024). NIST Atomic Spectra Database (ver. 5.12), [Online]. Available: https://physics.nist.gov/asd [2025, November 25]. National Institute of Standards and Technology, Gaithersburg, MD. DOI: https://doi.org/10.18434/T4W30F

:::{hint} Hints
:class: dropdown

Here is some guidance on how you might create your program:

1. Your program should do the following:
   * Read the data in the input file and store it so that it can be used for fitting and plotting. Remember that you have to use the exact string at the top of the column to allocate the data in that column to a list.
   * Operate on the data, if required, to obtain the correct input for the fit function.
   * Determine the gradient and intercept by fitting a straight line to the data.
   * Produce the required outputs.
2. You do not need to write a program that will perform the fit to all the data files in one go. You can use the program to fit one set of data, then change it, if needed, to fit the other set of data.
3. Remember that, to fit a line to the data, the $x$ values correspond to $\frac{1}{n_i^2}$.
4. Notice that your gradient corresponds to $-R$, but you are asked for $R$.
5. Notice that the principal quantum number is an integer, not a real or floating point number, so you may need to convert a float to an integer.

:::

**In this notebook you have written a program that confirms the value of the Rydberg constant by fitting spectroscopic data. You should now return to the VLE to review what you have learnt this week.**