# Goal of this notebook:

Use a spectrum with known peaks to calibrate the channels of a detector,
and then use the calibration to plot an unknown spectrum.

To calibrate a spectrum we calculate:
  - dispersion [keV / channel], which is the width of each channel
  - offset [channels], which is the distance from first data entry to keV=0

When we have calibrated a spectrum, we could also find:
  - resolution,
  - quantities of each element, 
  - ... ?


### What we will do:

1. Read in the data of a known spectrum
2. Fit Gaussians to the peaks in the data
3. Calibrate the x-axis
4. Plot the calibrated data
5. Use the calibrated x-axis on an unknown spectrum from the same source



# We will be working with the data stored in a dictionary like this:

~~~python
spectrum = {
    "name": name_of_the_spectrum,
    "filepath": filepath_to_the_spectrum,
    "channel": uncalibrated_channels,
    "intensity": intensity_of_the_spectrum_normalized_to_one,
    "peaks_keV": theoretical_peaks_keV,
    "peaks_names": list_of_named_peaks,
    "peaks_channel": channel_values_of_the_peaks,
    "plot_title": title_to_add_to_the_plot,
    "dispersion": channel_width_in_keV,
    "offset": distance_from_first_channel_to_zero_keV,
    "kev_calibrated": calibrated_channels_in_keV,
    "fit_params": amplitude_mean_std_of_the_gaussian_fit,
    "fit_cov": covariance_matrix_of_the_gaussian_fit,
    "intensity_fit": intensity_of_the_gaussian_fit,
    "start_str": start_str,
    "stop_str": stop_str,
    "line_endings": line_endings,
    "delimiter": delimiter
}
~~~

In [1]:
# import all you need
import numpy as np
import plotly.graph_objects as go
from scipy.optimize import curve_fit

# functions imported from helper_files
from helper_files.plotting import plot_lines, plotly_plot
from helper_files.gaussian_fitting import gaussian, n_gaussians, fit_n_peaks_to_gaussian

# from helper_files.read_data import read_xy_data, read_only_y_data
from helper_files.error_calculation import rms_error
from helper_files.calibration import calibrate_channel_width_two_peaks
from helper_files.spectrum_dict import (
    init_known_spectrum,
    init_unknown_spectrum_with_known,
)

In [2]:
# this will load the helper modules each time you make changes to them, without having to restart the kernel
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


### Reading the data into arrays, plotting.py


When making the spectrum dictionary, we need the following variables:


- filepath
    - set the path to the file, or just use the example file
- name
    - just a name of the spectrum
- start_str
    - open the data file and find out where the data starts and stops.
        - in the .emsa file the data starts after "#SPECTRUM    : Spectral Data Starts Here",
- stop_str
    - the .emsa file ends the data with "#ENDOFDATA   : "
- line_endings
    - take note of the line endings,
        - in the .emsa file data the line endings are '\n'
- delimiter, but ONLY if the data file contains both x and y values, 
    - take note of what the data is separated by
        - in the example data it is separated by a comma and a space ", "
        - do not specify the delimiter if the data file only contains y values

You *can* also specify the following, if you already know them:

- peaks_keV
    - theoretical value of the two peaks to use in calibration
- peaks_names
    - name of the two peaks, optional
- peaks_channel
    - the peak positions in channels to do gaussian fitting
    - the two first peaks must correspond to the peaks in peaks_keV for the calibration

**If you do not specify the three values above, you must do it later down in the notebook.**
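
To make the role of start_str, stop_str, line_endings and delimiter concrete, here is a minimal sketch of how data between two marker strings could be parsed. It is not the reader in helper_files; the function name `read_between_markers_sketch` and the three-row example text are made up for illustration.

```python
import numpy as np


def read_between_markers_sketch(text, start_str, stop_str, line_endings="\n", delimiter=None):
    """Hypothetical parser: return (x, y) for the rows between start_str and stop_str."""
    start = text.index(start_str) + len(start_str)
    stop = text.index(stop_str, start)
    # split the data block into rows and drop empty rows
    rows = [row for row in text[start:stop].split(line_endings) if row.strip()]
    if delimiter is None:
        # only y values in the file: use the row index as the channel number
        y = np.array([float(row) for row in rows])
        x = np.arange(len(y), dtype=float)
    else:
        # both x and y values in the file
        xy = np.array([[float(v) for v in row.split(delimiter)] for row in rows])
        x, y = xy[:, 0], xy[:, 1]
    return x, y


# made-up miniature file in the style of the .emsa example
example_text = (
    "#FORMAT      : EMSA/MAS Spectral Data File\n"
    "#SPECTRUM    : Spectral Data Starts Here\n"
    "-0.2, 0.0\n"
    "-0.19, 1.0\n"
    "-0.18, 3.0\n"
    "#ENDOFDATA   : \n"
)
x, y = read_between_markers_sketch(
    example_text,
    start_str="#SPECTRUM    : Spectral Data Starts Here",
    stop_str="#ENDOFDATA   : ",
    line_endings="\n",
    delimiter=", ",
)
print(x, y)
```

Leaving delimiter as None corresponds to the y-only case described in the list above.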


In [3]:
# reading the file

# Ex1, GaAs in .emsa file
s_GaAs_emsa = init_known_spectrum(
    name="SEM: GaAs 30kV from .emsa",
    filepath="Ex1_EDS_GaAs_30kV.emsa",
    start_str="#SPECTRUM    : Spectral Data Starts Here",
    stop_str="#ENDOFDATA   : ",
    line_endings="\n",
    delimiter=", ",
    peaks_keV=[1.098, 9.2517],
    peaks_names=["Ga_La", "Ga_Ka"],
    # peaks_channel=[130, 944, 22, 46, 117,  149, 195,  1045, 1072, 1191],
    peaks_channel=[130, 944],
    plot_title="GaAs 30kV from .emsa",
)

# alternatively, the GaAs .emsa file can be calibrated with As_La and As_Ka instead:
# peaks_keV=[1.2819, 10.5436], peaks_names=["As_La", "As_Ka"], peaks_channel=[149, 1072, ...]

# Ex2, NiO in a .msa file
s_NiO_msa = init_known_spectrum(
    name="NiO ?kV from .msa",
    filepath="Ex2_NiO_on_Mo_not_calibrated.msa",
    start_str="#SPECTRUM    : Spectral Data Starts Here",
    stop_str="#ENDOFDATA   : End Of Data and File",
    line_endings=", \n",
    peaks_keV=[0.8511, 7.4781],
    peaks_names=["Ni_La", "Ni_Ka"],
    peaks_channel=[105, 766],
    plot_title="SEM?: NiO ?kV from .msa",
)

s_GaSb_msa = init_known_spectrum(
    name="SEM GaSb from .msa",
    filepath="example_data/EDS-SEM_GaSb.msa",
    start_str="#SPECTRUM    : Spectral Data Starts Here",
    stop_str="#ENDOFDATA   : End Of Data and File",
    line_endings=", \n",
    plot_title="SEM GaSb from .msa",
)

s_Cu_mca = init_known_spectrum(
    name="Cu ?kV from .mca",
    filepath="Ex3_Cu.mca",
    start_str="<<DATA>>",
    stop_str="<<END>>",
    line_endings="\n",
    peaks_keV=[0.8515, 7.478],
    peaks_names=["Ni_La?", " Ga_Ka?"],
    peaks_channel=[105, 766],
    plot_title="XRD?: Cu ?kV from .mca",
)

Reading Ex1_EDS_GaAs_30kV.emsa
The first line looks like this: '#FORMAT      : EMSA/MAS Spectral Data File\n'
Reading from line 42 to 2091.
2048 data points, first entry = [-0.2, 0.0], last entry = [20.27, 52.0]

Reading Ex2_NiO_on_Mo_not_calibrated.msa
The first line looks like this: '#FORMAT      : EMSA/MAS Spectral Data File\n'
Reading from line 26 to 2075.
2048 data points, first entry = [0. 0.], last entry = [2047.   10.]

Reading example_data/EDS-SEM_GaSb.msa
The first line looks like this: '#FORMAT      : EMSA/MAS Spectral Data File\n'
Reading from line 15 to 1040.
1024 data points, first entry = [0. 0.], last entry = [1023.   13.]

Reading Ex3_Cu.mca
The first line looks like this: '<<PMCA SPECTRUM>>\n'
Reading from line 16 to 1041.
1024 data points, first entry = [0. 0.], last entry = [1023.    0.]



In [4]:
# s is the spectrum that is used in the following code cells.
# s is not a copy: it is a second name for the same dict, so every change made
# through s also changes the dict it was assigned from.

# uncomment the one you want to run

# s = s_GaAs_emsa
s = s_NiO_msa
# s = s_GaSb_msa


# s = s_GaAs_emsa.copy()  # only needed if you want to work on a copy. Note that .copy() is shallow: the arrays inside are still shared.
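
The difference between a second name, a shallow copy and a deep copy can be checked on a small stand-in dictionary. This is a plain Python/NumPy illustration; the dict `original` is made up and is not one of the spectra above.

```python
import copy

import numpy as np

# made-up stand-in for a spectrum dictionary
original = {"name": "demo", "intensity": np.array([1.0, 2.0, 3.0])}

# plain assignment: alias and original are the same dict
alias = original
alias["name"] = "renamed"  # also changes original["name"]

# shallow copy: a new dict, but the arrays inside are still shared
shallow = original.copy()
shallow["name"] = "shallow copy"  # does not change original["name"]
shallow["intensity"][0] = 99.0  # changes original["intensity"] as well

# deep copy: a new dict with its own arrays
deep = copy.deepcopy(original)
deep["intensity"][1] = -1.0  # original["intensity"] is untouched

print(original)
```

If a fully independent spectrum is needed, copy.deepcopy is the safer choice.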

In [5]:
fig_initial = plotly_plot(
    x=s["channel"],
    y=s["intensity"],
    title=s["plot_title"],
    xaxis_title="channel",
    yaxis_title="intensity",
)
fig_initial.show()

print(
    f"Check in the plot that the peaks in s['peaks_channel'] are in the right place: {s['peaks_channel']}"
)
print(
    "The first two peaks are the ones we will use for calibration later, i.e. the ones we know the energy of."
)
print(
    f"You have specified s['peaks_keV'] as: {s['peaks_keV']} keV, and they are named {s['peaks_names']}"
)

Check in the plot that the peaks in s['peaks_channel'] are in the right place: [105, 766]
The first two peaks are the ones we will use for calibration later, i.e. the ones we know the energy of.
You have specified s['peaks_keV'] as: [0.8511, 7.4781] keV, and they are named ['Ni_La', 'Ni_Ka']


### Fitting the peaks to Gaussians

##### see the helper file gaussian_fitting.py

Now we pick two or more peaks in the data and fit a Gaussian to each of them.
We need to know the theoretical energy of at least two of the guessed peaks, so that we can calibrate the spectrum later.

In Ex1 we can use Ga_Ka=9.2517 keV and Ga_La=1.098 keV, or As_Ka=10.5436 keV and As_La=1.2819 keV, or both pairs.

E.g. peak_guesses = [9.2517, 10.5436]

Short overview of the functions used from gaussian_fitting.py:
- def gaussian(x, amp, mu, std):
    - defines a single Gaussian with amplitude amp, center mu and standard deviation std.

- def n_gaussians(x, *args):
    - since neighbouring Gaussians can partially overlap, we need a function that returns the sum of n Gaussians.
        - E.g. Ga_Kb=10.2642 keV and As_Ka=10.5436 keV in Ex1

- def fit_n_peaks_to_gaussian(x, y, guessed_peaks, guessed_std=1, guessed_amp=1):
    - fits a Gaussian to each of the guessed peaks.
    - it uses scipy.optimize.curve_fit for this. More info in the function.
    - with normalized counts it usually works well to guess 1 for every std and amplitude
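
The three functions can be sketched as follows. This is an illustration of the idea and not a copy of gaussian_fitting.py: the `_sketch` names are hypothetical, the Gaussian form with amp as peak height is an assumption, and the data is synthetic.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian_sketch(x, amp, mu, std):
    # assumed form: amp is the peak height, mu the center, std the standard deviation
    return amp * np.exp(-((x - mu) ** 2) / (2 * std**2))


def n_gaussians_sketch(x, *params):
    # params is a flat sequence: amp1, mu1, std1, amp2, mu2, std2, ...
    y = np.zeros_like(x, dtype=float)
    for i in range(0, len(params), 3):
        y += gaussian_sketch(x, *params[i : i + 3])
    return y


def fit_n_peaks_sketch(x, y, guessed_peaks, guessed_std=1, guessed_amp=1):
    # build the initial guess in the same flat layout as n_gaussians_sketch expects
    p0 = []
    for mu in guessed_peaks:
        p0 += [guessed_amp, mu, guessed_std]
    popt, pcov = curve_fit(n_gaussians_sketch, x, y, p0=p0, maxfev=5000)
    return popt, pcov


# synthetic, noise-free spectrum with two peaks at channels 50 and 120
x = np.arange(200, dtype=float)
y = n_gaussians_sketch(x, 1.0, 50.0, 3.0, 0.5, 120.0, 4.0)

popt, pcov = fit_n_peaks_sketch(x, y, guessed_peaks=[49, 121])
fitted_centers = popt[1::3]  # every third parameter, starting at index 1, is a center
print(fitted_centers)
```

The slice `[1::3]` is the same one the notebook uses to update s["peaks_channel"] after fitting. The fitted std can come out negative, since only its square enters the function, so take the absolute value before using it.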

Additional info:
I tried fitting to the raw keV values of the .emsa file, but that did not work.
It might be because the x values are not integers, but I suspect the real cause is that the default std and amp guesses are way off on a keV axis (guessed_std=1 there means 1 keV). I am not sure.

This is the code that did not work:
~~~
peak_guesses = [0.02, 0.27, 0.97, 1.1, 1.29, 1.75, 9.2517, 10.24, 10.5336, 11.75] 
fit_vals = fit_n_peaks_to_gaussian(data[0], data[1], peak_guesses)
~~~

**Be aware: sometimes the fitting sets a peak as the background. Always inspect the plot**


In [6]:
# Fitting the data to gaussians

#
#
# NB! if this cell crashes with something like:
# "RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 5000."
# then adjust your peak guesses, or try to fit fewer/more peaks at once.
#
#

# # Eg. for GaAs if we want to fit all the peaks:
# s['peaks_channel'] = [130, 944, 22, 46, 117, 149, 195, 1045, 1072, 1191]


# change these if necessary
# s["peaks_channel"] = [130, 1070, 1044]
# s["peaks_names"] = ["Ga_La", "As_Ka"]
# s["peaks_keV"] = [1.098, 10.543]


if s["peaks_channel"] is None:
    print(
        "ERROR: no peaks specified, please specify the peaks in the spectrum. No fitting was done."
    )
    print(
        "Uncomment the lines above to set s['peaks_channel'] = [peak1, peak2, ...] manually."
    )
else:
    fit_vals = fit_n_peaks_to_gaussian(
        x=s["channel"],
        y=s["intensity"],
        guessed_peaks=s["peaks_channel"],
    )
    s["fit_params"] = fit_vals[0]
    s["fit_cov"] = fit_vals[1]

    # update the channel values of the peaks
    s["peaks_channel"] = s["fit_params"][1::3]

    # adding the fitted Gaussians to the spectrum dictionary
    s["intensity_fit"] = n_gaussians(s["channel"], *fit_vals[0])


print(f'Fitted peaks at: {s["peaks_channel"]}')

Fitted peaks at: [105.53279574 766.52179357]


In [7]:
# plot using the following function, see helper_files/plotting.py

fig_fit = plotly_plot(
    s["channel"],
    s["intensity"],
    y_fit=s["intensity_fit"],
    vlines=s["peaks_channel"],
    vlines_name=s["peaks_names"],
    title=f"Gaussian fitting of {s['plot_title']}",
)

# # comment in the line below to add the fit of each peak as its own gaussian line
# fig_fit = plotly_plot(x=s['channel'], fig=fig_fit, fit_params=s['fit_params'])

fig_fit.show()

print("Check in the plot above that the fit of the specified peaks is good")

Check in the plot above that the fit of the specified peaks is good


In [8]:
# now we use the fitted peak centers and the theoretical peak energies to calibrate

# it is important that the first two peaks in s['peaks_channel'] correspond to the two peaks in s['peaks_keV']

# # GaAs
# s['peaks_keV'] = [1.098, 9.2517]

if s["peaks_keV"] is None:
    print(
        "ERROR: no peak energies specified, please specify s['peaks_keV']. No calibration was done."
    )
    print("Uncomment the line above to set s['peaks_keV'] = [peak1, peak2] manually.")
else:
    calib = calibrate_channel_width_two_peaks(s["peaks_channel"], s["peaks_keV"])
    s["dispersion"] = calib[0]
    s["offset"] = calib[1]
    s["kev_calibrated"] = (s["channel"] - s["offset"]) * s["dispersion"]

The calibration factor is: 0.0100259 keV/channel, with 20.643 channels zero offset
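
The numbers above can be reproduced by hand. The cell uses keV = (channel - offset) * dispersion, so two peaks with known energies give two equations for the two unknowns. The sketch below solves them; it is an assumption about what calibrate_channel_width_two_peaks does, not its actual code, and the function name is hypothetical.

```python
def calibrate_two_peaks_sketch(peaks_channel, peaks_keV):
    """Hypothetical two-point calibration: return (dispersion, offset)."""
    # dispersion [keV/channel]: energy difference divided by channel difference
    dispersion = (peaks_keV[1] - peaks_keV[0]) / (peaks_channel[1] - peaks_channel[0])
    # offset [channels]: the channel position that corresponds to 0 keV
    offset = peaks_channel[0] - peaks_keV[0] / dispersion
    return dispersion, offset


# fitted NiO peak centers and theoretical energies from the cells above
dispersion, offset = calibrate_two_peaks_sketch(
    peaks_channel=[105.53279574, 766.52179357],
    peaks_keV=[0.8511, 7.4781],
)
print(f"{dispersion:.7f} keV/channel, {offset:.3f} channels zero offset")

# sanity check: the first peak must map back to its theoretical energy
first_peak_keV = (105.53279574 - offset) * dispersion
```

With only two peaks the line passes exactly through both points, so this check says nothing about how well the calibration holds for other peaks.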


In [9]:
# plotting the calibrated data

fig_calib = plotly_plot(
    x=s["kev_calibrated"],
    y_named=[s["intensity"], f"x calibrated with {' and '.join(s['peaks_names'][:2])}"],
    vlines=s["peaks_keV"],
    vlines_name=s["peaks_names"],
    title=f"Calibrated {s['plot_title']}",
)

fig_calib.show()

# Unknown spectrum

## Using dispersion and offset from a calibrated spectrum to calibrate an unknown sample

Now we use the function init_unknown_spectrum_with_known(...)

In [10]:
# dispersion, offset, start_str, stop_str, line_endings and (if set) delimiter are copied from the s dictionary to the new dictionary

s_unknown_msa = init_unknown_spectrum_with_known(
    known_spectrum=s,
    name="SEM?: Unknown .msa ?kV from .mca",
    filepath="example_data/unknown_not_calibrated.msa",
    plot_title="SEM?: Unknown .msa ?kV from .mca",
)

# TODO:

Calibrating SEM?: Unknown .msa ?kV from .mca with NiO ?kV from .msa using:
	Dispersion = 0.010025885486372777
	Offset = 20.64253829529784
	And the calibrated keV x-axis from NiO ?kV from .msa
Reading example_data/unknown_not_calibrated.msa
The first line looks like this: '#FORMAT      : EMSA/MAS Spectral Data File\n'
Reading from line 15 to 2064.
2048 data points, first entry = [0. 0.], last entry = [2047.    5.]


Success! example_data/unknown_not_calibrated.msa was read into a dictionary



In [11]:
# Now we plot it

fig_unknown_plot = plotly_plot(
    x=s_unknown_msa["kev_calibrated"],
    y_named=[s_unknown_msa["intensity"], f"x calibrated with {' and '.join(s['peaks_names'][:2])}"],
    vlines=s_unknown_msa["peaks_keV"],
    vlines_name=s_unknown_msa["peaks_names"],
    title=f"Calibrated {s_unknown_msa['plot_title']}",
    xaxis_title="keV calibrated",
)
fig_unknown_plot

### NB! The code below is not general: it currently only works for the file "unknown_not_calibrated.msa", calibrated with the NiO .msa spectrum

In [14]:
# We could also do fitting on the new plot by copying the steps from earlier

#
# # NB! This is not general: the peak guesses below only fit the file "unknown_not_calibrated.msa"
#

# the fitting is done on the channel axis, so we first plot the spectrum against channels
fig_unknown_plot_channels = plotly_plot(
    x=s_unknown_msa["channel"],
    y_named=[s_unknown_msa["intensity"], "intensity on uncalibrated channels"],
    title=f"Plotted on channels: Calibrated {s_unknown_msa['plot_title']}",
)
fig_unknown_plot_channels.show()


# noted down the peaks in channel numbers
s_unknown_msa["peaks_channel"] = [
    24,
    130,
    340,
    380,
    405,
    430,
    455,
    480,
    824,
    908,
    942,
    1047,
]


# do the fitting as we did earlier
fit_vals = fit_n_peaks_to_gaussian(
    x=s_unknown_msa["channel"],
    y=s_unknown_msa["intensity"],
    guessed_peaks=s_unknown_msa["peaks_channel"],
)
s_unknown_msa["fit_params"] = fit_vals[0]
s_unknown_msa["fit_cov"] = fit_vals[1]
s_unknown_msa["peaks_channel"] = s_unknown_msa["fit_params"][1::3]
s_unknown_msa["intensity_fit"] = n_gaussians(s_unknown_msa["channel"], *fit_vals[0])

print(f'Fitted peaks at: {s_unknown_msa["peaks_channel"]}')

# now we convert the fitted peak centers from channels to keV with the dispersion and the offset
s_unknown_msa["peaks_keV_fitted"] = (
    s_unknown_msa["peaks_channel"] - s_unknown_msa["offset"]
) * s_unknown_msa["dispersion"]

print(f'Fitted peaks in keV: {s_unknown_msa["peaks_keV_fitted"]}')

#
# # NB! Because of the background, there will be some error on the fitting for some of the peaks!
#


# plotting the new spectrum with its fit and its lines in keV
fig_unknown_plot_fit = plotly_plot(
    x=s_unknown_msa["kev_calibrated"],
    y=s_unknown_msa["intensity"],
    y_fit=s_unknown_msa["intensity_fit"],
    vlines=s_unknown_msa["peaks_keV_fitted"],
    title=f"Calibrated {s_unknown_msa['plot_title']} on keV with fitted lines",
    xaxis_title="Calibrated keV",
)
fig_unknown_plot_fit.show()

Fitted peaks at: [  23.59831917  130.69927327  742.00275364  380.67449376  405.97241647
  429.43539045  454.63340443  743.40719337  823.37785178  909.55729498
  943.14252684 1045.07862325]
Fitted peaks in keV: [ 0.02963432  1.10341622  7.23227491  3.60963916  3.86327323  4.09851032
  4.35114273  7.24635567  8.04813233  8.91215756  9.24887925 10.27087888]
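
In the output above, the guesses at channels 340 and 480 ended up at about 742 and 743, far from where they started. Such jumps are easy to miss in a long list, so a small check like the one below can help. The function name and the 10-channel threshold are hypothetical choices, not part of helper_files.

```python
import numpy as np


def flag_drifted_peaks(guessed, fitted, max_shift=10.0):
    """Hypothetical check: list (index, guess, fit) for peaks that moved more than max_shift channels."""
    guessed = np.asarray(guessed, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    shift = np.abs(fitted - guessed)
    return [
        (int(i), float(guessed[i]), float(fitted[i]))
        for i in np.flatnonzero(shift > max_shift)
    ]


# guesses and fitted centers copied from the cell and its output above
guessed = [24, 130, 340, 380, 405, 430, 455, 480, 824, 908, 942, 1047]
fitted = [
    23.59831917, 130.69927327, 742.00275364, 380.67449376, 405.97241647,
    429.43539045, 454.63340443, 743.40719337, 823.37785178, 909.55729498,
    943.14252684, 1045.07862325,
]

drifted = flag_drifted_peaks(guessed, fitted)
for index, guess, fit in drifted:
    print(f"peak {index}: guessed {guess:.0f}, fitted {fit:.1f}")
```

A flagged peak is a prompt to adjust the guess or drop the peak and refit, and it does not replace inspecting the plot.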


In [13]:
# TODO: area under the peak

# Step 0: Figure out how it should be calculated

# Steps:
# 1. select the peak(s) you want the area of
# 2. use the amp, mu and std to calculate the area under the peak
# 3. compare the two integrated(?) sums
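
One possible way to do steps 2 and 3: a Gaussian of the form amp * exp(-(x - mu)^2 / (2 * std^2)) has the closed-form area amp * |std| * sqrt(2 * pi). The sketch below assumes that this is the form gaussian() in the helper file uses, with amp as the peak height, and compares the closed form with a simple numerical sum. All names and numbers are made up for illustration.

```python
import numpy as np


def gaussian_sketch(x, amp, mu, std):
    # assumed form of the fitted Gaussian: amp is the peak height
    return amp * np.exp(-((x - mu) ** 2) / (2 * std**2))


def gaussian_area(amp, std):
    # closed-form integral of the Gaussian above over the whole x axis
    return amp * abs(std) * np.sqrt(2 * np.pi)


# made-up fit parameters for one peak, in channel units
amp, mu, std = 0.8, 100.0, 3.0
channels = np.arange(0, 200, dtype=float)

analytic_area = gaussian_area(amp, std)
# numerical check: sum of the curve times the channel spacing (1 channel)
numeric_area = np.sum(gaussian_sketch(channels, amp, mu, std)) * 1.0

print(analytic_area, numeric_area)
```

The area here is in units of intensity times channels. To get it on the keV axis, multiply std by the dispersion before calling gaussian_area.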