## Artificial spectra, real problem

To evaluate the feasibility of using AI models to detect peaks in gamma spectra, I developed an algorithm that simulates, in a simplified way, the spectra acquired by a spectrometry system under several different scenarios. The advantage of simulated spectra is that they make it possible to study specific cases that would otherwise require preparing a large number of radioactive samples.

In [4]:
import sys
sys.path.append('../')
from helpers import spectrum_generator

In [None]:
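import gzip
import pickle

# Generate 1000 simulated spectra (X), their labels (y) and the spurious-peak data (dummy)
X, y, dummy = spectrum_generator.simulated_spectra(1000, bg_range=(40, 81), snr_db_range=(1, 4), include_dummy=True)
# Save the data set compressed; the with-block closes the file once it is written
with gzip.open('../data/artificial.pickle', 'wb') as f:
    pickle.dump((X, y, dummy), f)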

The counts of a simulated peak follow a Gaussian distribution around the channel chosen as the peak centroid. Although real gamma-spectrum peaks show slight tailing on the low-energy side, the Gaussian function is a reasonable approximation of the real shape [@helmer_analytical_1980].
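The Gaussian peak shape described above can be sketched as follows. This is a minimal illustration with NumPy, not the code in `spectrum_generator`: the function name `gaussian_peak`, the use of a FWHM parameter, and all numeric values are assumptions made for the example.

```python
import numpy as np

def gaussian_peak(n_channels, centroid, area, fwhm, rng):
    """Return per-channel counts for a single simulated peak.

    Each of the `area` counts is assigned to a channel drawn from a normal
    distribution centred on `centroid`. Counts that fall outside the
    spectrum are discarded.
    """
    # Convert the full width at half maximum to a standard deviation.
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    channels = np.rint(rng.normal(centroid, sigma, size=area)).astype(int)
    inside = (channels >= 0) & (channels < n_channels)
    return np.bincount(channels[inside], minlength=n_channels)

# Hypothetical values, chosen only for illustration.
rng = np.random.default_rng(0)
peak = gaussian_peak(n_channels=1024, centroid=512, area=500, fwhm=6.0, rng=rng)
```

Sampling a channel for each count, rather than evaluating the Gaussian function directly, keeps the peak made of integer counts and gives it statistical fluctuations similar to those of a measured peak.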

The spectrum continuum arises from the scattering of higher-energy gamma emissions, and its intensity is proportional to the rate of gamma emissions of different energies observed by the detector. Thus, samples containing different concentrations of radionuclides, or natural interferents that vary seasonally, will show variation in the height of the spectrum continuum.

In this scenario, the simulated spectra were brought closer to real situations, with a greater number of obstacles for the models. The algorithm generated a continuum background whose height varied from spectrum to spectrum: the average background count per channel was drawn uniformly between 40 and 80 counts. A total of 1000 spectra were generated, of which 50%, chosen at random, contained peaks with SNR between 1 and 3 dB. Rewriting @eq:snrlog to estimate the peak area $n$ as a function of SNR and $B$, we have

$$n = 10^{0.1 \times \text{SNR}} + \frac{\sqrt{8 \times B \times 10^{0.2 \times \text{SNR}}} + 10^{0.4 \times \text{SNR}}}{2}.$$

Spurious peaks were added to 5% of the spectra, simulating the presence of unexpected radionuclides in the samples. These had random values between 100 and 200 counts and were placed so that they did not overlap the peak of classification interest. An example is shown in @fig:artificialex.
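The generation procedure described in this section can be sketched roughly as follows. This is an illustrative approximation, not the implementation in `spectrum_generator`: the names `add_peak` and `simulate_spectrum`, the fixed main-peak area `main_area`, the peak width, the centroid position, and the 10-sigma separation rule are all assumptions. The sketch also treats the 100 to 200 counts of a spurious peak as its area, and decides the presence of each peak independently per spectrum.

```python
import numpy as np

def add_peak(spectrum, centroid, area, sigma, rng):
    """Add `area` counts, spread normally around `centroid`, in place."""
    channels = np.rint(rng.normal(centroid, sigma, size=area)).astype(int)
    channels = channels[(channels >= 0) & (channels < spectrum.size)]
    spectrum += np.bincount(channels, minlength=spectrum.size)

def simulate_spectrum(rng, n_channels=1024, main_area=150, sigma=3.0,
                      p_peak=0.5, p_spurious=0.05):
    """Return (spectrum, label); label is 1 if the main peak is present."""
    # One mean background level per spectrum, uniform over 40..80 counts.
    mean_bg = rng.integers(40, 81)
    spectrum = rng.poisson(mean_bg, size=n_channels)

    main_centroid = n_channels // 2  # assumed position of the main peak
    has_peak = rng.random() < p_peak
    if has_peak:
        # Assumption: fixed area; the text derives it from SNR and B instead.
        add_peak(spectrum, main_centroid, main_area, sigma, rng)

    if rng.random() < p_spurious:
        # Assumption: keep the spurious peak at least 10 sigma away from
        # the main peak so that the two never overlap.
        candidates = np.arange(n_channels)
        candidates = candidates[np.abs(candidates - main_centroid) > 10 * sigma]
        add_peak(spectrum, rng.choice(candidates), rng.integers(100, 201), sigma, rng)

    return spectrum, int(has_peak)

rng = np.random.default_rng(42)
X, y = zip(*(simulate_spectrum(rng) for _ in range(1000)))
X, y = np.array(X), np.array(y)
```

Here `X` holds one spectrum per row and `y` the corresponding labels, mirroring the first two values returned by `simulated_spectra` in the cell above.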

![Example of artificially generated spectrum {Source: own authorship} {Legend: The dashed blue line represents the main peak, which should be classified by the models. The dashed orange line represents spurious peaks that were intentionally added to a small portion of the generated spectra. The artificially generated spectra have a background continuum with counts sampled from a Poisson distribution. The mean of this distribution varied uniformly from spectrum to spectrum, as did the intensity of the peaks in the spectra that contained them.}](Figures/artificialex.pdf){#fig:artificialex}