This project is working toward a quantized, more efficient version of the Particle Transformer (ParT) machine learning model described in this paper and its associated repo.
- Abijit Jayachandran ajayachandran@ucsd.edu
- Andrew Masek amasek@ucsd.edu
- Juan Yin j9yin@ucsd.edu
Running large machine learning models on the edge at the Large Hadron Collider (LHC) is challenging because of the limited compute available and the strict time constraints. Existing models are often too large, so they require substantial memory and time to process the data. One way around this problem is to quantize existing models and use FPGAs (as opposed to general-purpose GPUs) for faster and more specialized processing. Our project aims to quantize an existing Particle Transformer model by porting it from PyTorch to QKeras. The quantized model can then be implemented on an FPGA using the DeepSoCFlow library. We hope to maintain similar accuracy levels while achieving faster inference times.
All documentation is contained in the cse145-particle-transformer/Documentation directory. The directory contents are:
- presentations/: all presentations of the project
- reports/: all reports of the project
- CITATION.cff: citation file for ParT
- requirements.txt: a list of the libraries required for the project
All test files are contained in the cse145-particle-transformer/test
directory, which records our previous attempts at loading data from the dataset.
Our Keras implementation of nanoGPT is in the cse145-particle-transformer/nanogpt
directory.
Our Keras implementation of the ParT is contained in the cse145-particle-transformer/final_particle_transformer/
directory. You can use the train.py
file to train this model. Please note that most of the other files are from the original ParT repo, so you can also run the original ParT from here by following the instructions in their repo.
The quantized Particle Transformer needs a specific environment to run. We have included a requirements.txt file that lists the dependencies required to run the Keras ParT. You can install them using:
pip install -r ./Documentation/requirements.txt
After making sure the environment is correct, download the JetClass dataset using the command below. Please note that this is a very large dataset and will download well over 100 GB of files. If you would like to download only a subset of the data, please modify get_datasets.py
or download directly from this website.
./get_datasets.py JetClass -d [DATA_DIR]
Make sure you have CUDA installed on your computer. For our project we need CUDA version 10.1, since that is compatible with our TensorFlow version (2.10.0).
Our model is implemented in TensorFlow instead of PyTorch, so training is not based on the weaver framework for dataset loading and transformation. After CUDA is installed, make sure you have downloaded the entire dataset and changed the training and validation data paths in the final_particle_transformer/train.py file.
Change the train.py data path from:
# line 35
training_data_root = '/home/particle/particle_transformer/retrain_test_1/JetClass/Pythia/train_100M/'
to:
# line 35
training_data_root = '[USER_DIR]/[DATA_DIR]/JetClass/Pythia/train_100M/'
Finally, make sure you are in the cse145-particle-transformer/final_particle_transformer/
directory and run the following command.
python3 train.py
[figure] *image from this website*
- The original Particle Transformer operates at the default precision, 32-bit floating point (FP32). The high precision yields more accurate computations, but it requires more memory and computational resources.
- FP32 is suitable for training and for reaching high accuracy when there are sufficient resources to handle the computational load. However, it is more memory-intensive and slower due to the high precision of the calculations.
- The quantized ParT operates at reduced precision. Lower precision leads to faster computations and reduced memory usage, but with a potential trade-off in accuracy.
- The quantized model is faster to train and ideal for inference scenarios where speed and resource efficiency are prioritized, such as deployment on edge devices or in real-time applications (such as at the LHC). It uses less memory and computes faster because of the lower precision, and the model itself is smaller because weights and activations are stored at reduced precision.
- Precision and Accuracy: The original ParT uses high precision (FP32), leading to higher accuracy, whereas the quantized ParT uses lower precision (INT8 or smaller).
- Performance and Efficiency: The quantized ParT offers better performance and efficiency in both speed and memory usage.
- Use Cases: The original ParT is used where accuracy matters most, while the quantized ParT is used for efficient inference, especially in production conditions. Quantization is also required to make the model suitable for an FPGA implementation.
- Model Size: The quantized ParT model is smaller and requires fewer computational resources, making it faster to run, whereas the original ParT model is larger and takes more time to compute (see the rough estimate below).
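To make the model-size point concrete, the snippet below estimates the weight-storage footprint at FP32 versus INT8. The parameter count is a placeholder for illustration, not the exact size of our ParT implementation.

```python
# Rough estimate of weight storage for FP32 vs. INT8 quantization.
# NUM_PARAMS is a placeholder, not the exact size of our Keras ParT.
NUM_PARAMS = 2_000_000

fp32_bytes = NUM_PARAMS * 4   # 32 bits = 4 bytes per weight
int8_bytes = NUM_PARAMS * 1   # 8 bits = 1 byte per weight

print(f"FP32 weights: {fp32_bytes / 1e6:.1f} MB")   # 8.0 MB
print(f"INT8 weights: {int8_bytes / 1e6:.1f} MB")   # 2.0 MB
print(f"Reduction: {fp32_bytes // int8_bytes}x smaller")
```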
We use a large particle-jet dataset, downloaded from the JetClass website, for training and testing; it contains more than 100 GB of data.
JetClass is a new large-scale jet tagging dataset proposed in "Particle Transformer for Jet Tagging". It consists of 100M jets for training, 5M for validation and 20M for testing. The dataset contains 10 classes of jets, simulated with MadGraph + Pythia + Delphes.
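The JetClass samples are distributed as ROOT files. As a minimal sketch (not part of this repo), one way to inspect a downloaded file is with the uproot library; the file path below is a placeholder, and the `part_*` branch-name prefix is an assumption you should verify against the actual files.

```python
# Minimal sketch for inspecting one JetClass ROOT file with uproot.
# The file path is a placeholder; verify tree and branch names with keys().
import uproot

f = uproot.open("JetClass/Pythia/train_100M/HToBB_000.root")
print(f.keys())                 # name of the TTree inside the file

tree = f[f.keys()[0]]
print(tree.keys()[:10])         # first few per-particle / per-jet branches

# Read the per-particle branches (assumed to be prefixed with "part_")
# into awkward arrays, which handle the variable particle count per jet.
arrays = tree.arrays(filter_name="part_*", library="ak")
print(arrays.fields)
print(len(arrays))              # number of jets in this file
```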
In this section, we download the large particle jets dataset to train the original ParT model in PyTorch.
In this section, we get familiar with the transformer architecture by reimplementing a small transformer model, nanoGPT.
First, we followed the nanoGPT instructions to rewrite the transformer architecture from scratch, to get a sense of how to build a transformer and its components, such as multi-head attention blocks and embedding layers with different activation functions.
After building and training the nanoGPT model, we translated it from PyTorch to TensorFlow to verify that the translation would not affect the model's performance.
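To illustrate the kind of translation involved, here is a hedged sketch of a causal self-attention block written with Keras layers. It mirrors the structure of nanoGPT's attention but is not copied from our nanogpt directory, and the dimensions are placeholders.

```python
# Hedged sketch of a nanoGPT-style causal self-attention block in Keras.
# Not taken verbatim from our nanogpt directory; dimensions are placeholders.
import tensorflow as tf
from tensorflow import keras


class CausalSelfAttention(keras.layers.Layer):
    def __init__(self, embed_dim=128, num_heads=4, **kwargs):
        super().__init__(**kwargs)
        self.mha = keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim // num_heads
        )
        self.proj = keras.layers.Dense(embed_dim)

    def call(self, x):
        # use_causal_mask keeps each token from attending to future tokens,
        # matching the triangular mask in nanoGPT's PyTorch implementation.
        attn_out = self.mha(query=x, value=x, key=x, use_causal_mask=True)
        return self.proj(attn_out)


# Quick shape check: (batch, sequence length, embedding dim)
x = tf.random.normal((2, 16, 128))
print(CausalSelfAttention()(x).shape)  # (2, 16, 128)
```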
In this section we begin translating the Particle Transformer model from the weaver-core repo. We implemented the data loader, embedding layers, particle multi-head attention blocks, and class attention blocks. The attention portions of the model are similar to the nanoGPT that we completed in section 2, with the Particle Transformer features added on top. According to the Particle Transformer paper, the interaction feature matrix U is a pairwise matrix with three dimensions (N, N, C'), so instead of feeding one embedding into each particle multi-head attention block, we feed two: the particle embedding and the interaction (U) embedding.
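To make the role of U concrete, below is a hedged sketch of particle multi-head attention, where the pairwise interaction matrix (after being embedded to one channel per attention head) is added as a bias to the scaled attention scores before the softmax, following the description in the ParT paper. The layer is an illustration with placeholder dimensions, not our exact final_particle_transformer code.

```python
# Hedged sketch of particle multi-head attention (P-MHA): the pairwise
# interaction matrix U is added to the attention scores before softmax.
# Placeholder dimensions; not our exact final_particle_transformer code.
import tensorflow as tf
from tensorflow import keras


class ParticleMultiHeadAttention(keras.layers.Layer):
    def __init__(self, embed_dim=128, num_heads=8, **kwargs):
        super().__init__(**kwargs)
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.wq = keras.layers.Dense(embed_dim)
        self.wk = keras.layers.Dense(embed_dim)
        self.wv = keras.layers.Dense(embed_dim)
        self.out = keras.layers.Dense(embed_dim)

    def _split_heads(self, t):
        # (batch, N, embed_dim) -> (batch, heads, N, head_dim)
        batch, n = tf.shape(t)[0], tf.shape(t)[1]
        t = tf.reshape(t, (batch, n, self.num_heads, self.head_dim))
        return tf.transpose(t, (0, 2, 1, 3))

    def call(self, x, u_bias):
        # x: (batch, N, embed_dim) particle embeddings
        # u_bias: (batch, heads, N, N) embedded pairwise interaction matrix U
        q = self._split_heads(self.wq(x))
        k = self._split_heads(self.wk(x))
        v = self._split_heads(self.wv(x))

        scores = tf.matmul(q, k, transpose_b=True) / (self.head_dim ** 0.5)
        scores = scores + u_bias                  # inject pairwise interactions
        weights = tf.nn.softmax(scores, axis=-1)

        out = tf.matmul(weights, v)               # (batch, heads, N, head_dim)
        out = tf.transpose(out, (0, 2, 1, 3))
        return self.out(tf.reshape(out, (tf.shape(x)[0], tf.shape(x)[1], -1)))


# Shape check: 10 particles, one (N, N) interaction map per head
x = tf.random.normal((2, 10, 128))
u = tf.random.normal((2, 8, 10, 10))
print(ParticleMultiHeadAttention()(x, u).shape)  # (2, 10, 128)
```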
We first make sure the Keras model works, then continue translating it to QKeras while trying to maintain the training accuracy.
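In practice, the QKeras translation mostly amounts to swapping Keras layers for their quantized counterparts. The snippet below is a hedged sketch of that pattern; the bit widths and layer sizes are illustrative, not the settings used in our model.

```python
# Hedged sketch: a float Keras block next to its QKeras counterpart.
# Bit widths and layer sizes are illustrative, not our final settings.
from tensorflow import keras
from qkeras import QDense, QActivation, quantized_bits, quantized_relu

# Plain Keras (FP32) block
float_block = keras.Sequential([
    keras.layers.Dense(128, input_shape=(17,)),
    keras.layers.Activation("relu"),
])

# QKeras block: weights, biases, and activations quantized to 8 bits
quant_block = keras.Sequential([
    QDense(
        128,
        input_shape=(17,),
        kernel_quantizer=quantized_bits(8, 0, alpha=1),
        bias_quantizer=quantized_bits(8, 0, alpha=1),
    ),
    QActivation(quantized_relu(8)),
])

quant_block.summary()
```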
Finally, we got our model running. [Add images when finished training]