# CMSE 491 Final Project


### Kyle Taft
#### December 4, 2023

## INSTRUCTIONS FOR RUNNING THE NOTEBOOK

* This notebook trains models that extract nuclear decay energy peaks from 2D total absorption spectroscopy (TAS) data. To produce enough data for machine learning, the data is simulated using the open-source physics toolkit [Geant4](https://geant4.web.cern.ch/). Because the full data set consists of tens of thousands of 512x512 matrices, it is far too large to be attached to this notebook or uploaded to D2L. Instead, a small sample set of 15 matrices (5 single cascades, 5 double cascades, and 5 with two independent single cascades) and their corresponding labels is provided in the file `sample_data.npz` in the same folder as this notebook. The data can be loaded using the following code:

```python
import numpy as np
data = np.load("sample_data.npz")
```

* The loaded data behaves like a dictionary whose keys are the names of the matrices. Each matrix is stored as a NumPy array and can be accessed by its key, where `"matrix_name"` stands in for one of the stored names:

```python
matrix = data["matrix_name"]
```
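To find out which names are actually stored in the archive, the keys can be listed before indexing. The following is a minimal sketch: the helper name `summarize_npz` is hypothetical and not part of the project code, and it assumes only that `sample_data.npz` is a standard NumPy `.npz` archive.

```python
import os

import numpy as np


def summarize_npz(path):
    """Return {key: (shape, dtype)} for every array stored in an .npz archive.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    summary = {}
    # np.load on an .npz file returns a lazy, dictionary-like NpzFile object.
    with np.load(path) as data:
        for key in data.files:  # data.files lists the stored array names
            array = data[key]
            summary[key] = (array.shape, str(array.dtype))
    return summary


# Only runs when the sample file is present next to the notebook.
if os.path.exists("sample_data.npz"):
    for key, (shape, dtype) in summarize_npz("sample_data.npz").items():
        print(f"{key}: shape={shape}, dtype={dtype}")
```

Any key printed this way can then be used in place of `"matrix_name"` above.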



# **Utilizing Machine Learning to Perform Total Absorption Spectroscopy**

## Background and Motivation
In experimental physics, one of the most powerful ways to better understand the underlying physics of an experiment is to collect data on the spectrum of light it produces. The analysis of this data, known as spectroscopy, is vital for drawing conclusions about the unknown physics that underpins the experiment. In my research area of nuclear physics, we use spectroscopy to analyze the structure of exotic, unstable isotopes. An unstable isotope produced by an accelerator is implanted into our detector, where it decays into a new isotope. This new isotope can be fed into an excited state, in which case it releases one or more quantized, characteristic $\gamma$-rays to reach the ground state. We collect this data in two ways in order to reconstruct the manner in which the isotope decays. First, we produce a spectrum in the straightforward way by recording the energy of each $\gamma$-ray we see in our detectors. Second, we produce a spectrum based on the total energy the isotope had when it was implanted into the detector, obtained by summing together all the $\gamma$-rays recorded in a period of time. This allows us to use a specialized technique known as Total Absorption Spectroscopy (TAS) to determine the energy levels of the isotope.

Spectroscopy, the study of the interaction of light with matter, is a fundamental method for deciphering the underlying physics of many experiments. In nuclear physics, our research uses spectroscopy to explore exotic, unstable isotopes. These isotopes, created by an accelerator, undergo decays that are not fully explained by theoretical models. They transition into new isotopes, often in excited states, which release quantized, characteristic $\gamma$-rays in order to reach the ground state. Our detector collects this data in two forms: individual $\gamma$-ray energies and total summed $\gamma$-ray energies, the latter enabling the application of Total Absorption Spectroscopy (TAS) to reveal energy levels. The overarching project goal is to extract precise information about the energies, the relative intensities, and the specific $\gamma$-rays involved in these transitions. The task is challenging because of unwanted data points caused by $\gamma$-ray scattering, particle interference, and background noise. Machine learning offers a way to address this complexity by learning patterns from the data that reflect the underlying physics. To do so, the problem is framed as a supervised machine learning task: simulated spectra serve as inputs, and the corresponding energy levels, individual $\gamma$-rays, and intensities serve as labels. With this approach, we aim to bridge the gap between experimental observations and a deeper understanding of nuclear physics, facilitating the analysis of real data.
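To make the two forms of data concrete, the sketch below bins toy decay events into a 2D matrix that pairs each individual $\gamma$-ray energy with the total summed energy of its event. This is only an illustration under stated assumptions: the function name `build_tas_matrix`, the 0 to 10,000 keV energy range, and the axis orientation (rows for summed energy, columns for individual energy) are hypothetical choices, not the binning used in the Geant4 simulations. Only the 512x512 matrix size comes from the notebook.

```python
import numpy as np

N_BINS = 512           # matrix size stated in this notebook
E_MAX_KEV = 10_000.0   # hypothetical upper energy limit (assumption)


def build_tas_matrix(events, n_bins=N_BINS, e_max=E_MAX_KEV):
    """Histogram decay events into an (n_bins, n_bins) matrix.

    events: iterable of sequences, each holding the gamma-ray energies (keV)
            recorded in one decay event.
    Rows index the total summed energy of the event; columns index the
    individual gamma-ray energy (orientation is an assumption).
    """
    totals, singles = [], []
    for gammas in events:
        total = float(sum(gammas))  # summed energy of the whole event
        for energy in gammas:
            totals.append(total)
            singles.append(float(energy))
    edges = np.linspace(0.0, e_max, n_bins + 1)
    matrix, _, _ = np.histogram2d(totals, singles, bins=[edges, edges])
    return matrix


# Toy events: a two-gamma cascade (1000 + 500 keV) and a single 1500 keV
# gamma ray. Both events sum to the same total energy of 1500 keV.
toy_matrix = build_tas_matrix([[1000.0, 500.0], [1500.0]])
print(toy_matrix.shape, toy_matrix.sum())
```

In this toy case all three entries fall in the same row, because both events share the same summed energy, while the columns separate the individual $\gamma$-rays.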

## Goal
_(Clearly state the question(s) you set out to answer.)_

## Exploratory Data Analysis
_(Describe your data and make meaningful plots here)_

## Methodology

_(How did you go about answering your question(s)? Most of your code will be contained in this section. Here is where you can subdivide with Hyperparameter tuning, cross-validation, feature engineering, baseline.)_


### ML Model.
Make a section for each ML model you used, for example:

### Linear Regression

### Support Vector Machines

### Convolutional Neural Network


## Results

_(How do your models compare? What did you find when you carried out your methods? Some of your code related to
presenting results/figures/data may be replicated from the methods section or may only be present in
this section. All of the plots that you plan on using for your presentation should be present in this
section)_

## Discussion and Conclusion

_(What did you learn from your results? What obstacles did you run into? What would you do differently next time? Clearly provide quantitative answers to your question(s). At least one of your questions should be answered with numbers. That is, it is not sufficient to answer "yes" or "no"; rather, say something quantitative, such as "variable 1 increased roughly 10% for every 1-year increase in variable 2.")_

### References

_(List the source(s) for any data and/or literature cited in your project. Ideally, this should be formatted using a formal citation format (MLA or APA or other, your choice!). Multiple free online citation generators are available, such as <a href="http://www.easybib.com/style">http://www.easybib.com/style</a>. **Important:** if you use **any** code that you find on the internet for your project, you **must** cite it or you risk losing most/all of the points for your project.)_