EliaFantini/Higgs-Boson-Classifier-using-LHC-CERN-data: An AICrowd Challenge: Logistic Regression classifier that predicts whether an event's decay signature is that of a Higgs boson

This project classifies the decay signatures of events measured by the Large Hadron Collider at CERN, using Logistic Regression to predict whether each signature is that of a Higgs boson.

The problem was part of a Machine Learning challenge on AICrowd. Our team, pasta-balalaika, ranked 50th out of 307 on the leaderboard, with an F1 score of 0.74 and an accuracy of 0.82. The project was also an assignment for the EPFL course CS-433 Machine Learning.
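For readers unfamiliar with the two leaderboard metrics, here is a minimal sketch of how accuracy and F1 score can be computed for binary labels. It assumes NumPy and labels encoded as 0 (background) and 1 (signal); the function names are illustrative and are not taken from the repository's metrics.py.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (signal) class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))  # true positives
    fp = np.sum((y_pred == positive) & (y_true != positive))  # false positives
    fn = np.sum((y_pred != positive) & (y_true == positive))  # false negatives
    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives at all
    return float(2 * tp / denom) if denom > 0 else 0.0


# Toy labels, made up for illustration only
y_true = np.array([1, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0])
print("accuracy:", accuracy(y_true, y_pred))
print("F1 score:", f1_score(y_true, y_pred))
```

Accuracy alone can look good when background events dominate, which is why the F1 score on the signal class is reported alongside it.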

Higgs Boson

The Higgs boson is an elementary particle in the Standard Model of physics which explains why other particles have mass. Its discovery at the Large Hadron Collider at CERN was announced in July 2012 and confirmed in March 2013.

In this project, we applied machine learning techniques to actual CERN particle accelerator data to recreate the process of “discovering” the Higgs particle. Physicists at CERN smash protons into one another at high speeds to generate even smaller particles as by-products of the collisions. Rarely, these collisions can produce a Higgs boson. Since the Higgs boson decays rapidly into other particles, scientists don’t observe it directly, but rather measure its “decay signature”, or the products that result from its decay process.

Since many decay signatures look similar, we estimated the likelihood that a given event’s signature was the result of a Higgs boson (signal) or some other process/particle (background). To do this, we implemented a pre-processing pipeline and different binary classification techniques, and compared their performance using hyperparameter tuning and cross-validation.
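To make the signal/background classification step concrete, the following is a minimal sketch of logistic regression trained with plain gradient descent. It is not the code from implementations.py: it assumes NumPy, labels in {0, 1}, and a small synthetic dataset standing in for the CERN features, and the function names, step size and iteration count are illustrative choices.

```python
import numpy as np


def sigmoid(t):
    """Logistic function, written via tanh so it stays stable for large |t|."""
    return 0.5 * (1.0 + np.tanh(0.5 * t))


def logistic_loss(y, tx, w):
    """Mean negative log-likelihood for labels y in {0, 1}."""
    z = tx @ w
    # log(1 + exp(z)) is computed stably with logaddexp(0, z)
    return float(np.mean(np.logaddexp(0.0, z) - y * z))


def logistic_regression_gd(y, tx, max_iters=500, gamma=0.1):
    """Gradient descent on the logistic loss, starting from w = 0."""
    w = np.zeros(tx.shape[1])
    for _ in range(max_iters):
        grad = tx.T @ (sigmoid(tx @ w) - y) / len(y)
        w -= gamma * grad
    return w


# Synthetic stand-in data: two Gaussian blobs (background = 0, signal = 1)
rng = np.random.default_rng(0)
n = 200
x = np.vstack([rng.normal(-1.0, 1.0, (n, 2)), rng.normal(1.0, 1.0, (n, 2))])
y = np.concatenate([np.zeros(n), np.ones(n)])
tx = np.c_[np.ones(len(x)), x]  # prepend a bias column

w = logistic_regression_gd(y, tx)
pred = (sigmoid(tx @ w) >= 0.5).astype(int)  # threshold probabilities at 0.5
print("loss:", logistic_loss(y, tx, w))
print("training accuracy:", np.mean(pred == y))
```

The predicted probability `sigmoid(tx @ w)` plays the role of the estimated likelihood that an event is signal; thresholding it at 0.5 gives the class label.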

Authors

How to install and reproduce results

Download this repository as a zip file and extract it into a folder. The easiest way to run the code is to install the Anaconda 3 distribution (available for Windows, macOS and Linux). To do so, follow the guidelines on the official website (select Python 3): https://www.anaconda.com/download/

Additional package versions are specified in the requirements.txt file. To install them, run the following commands in Anaconda Prompt (anaconda3):

cd *THE_FOLDER_PATH_WHERE_YOU_DOWNLOADED_AND_EXTRACTED_THIS_REPOSITORY*
conda install --file requirements.txt

Download the training and testing datasets here (logging into AICrowd might be required to download)

Then, just run run.py with the following command to train and test the model:

python run.py

Files description

experiments/experiments_models.ipynb : this Jupyter notebook contains our cross validation and hyperparameter experiments with different models
experiments/experiments_preprocesing.ipynb: this Jupyter notebook contains our experiments with different preprocessing techniques
experiments/generate_graphs.ipynb: notebook that generates the graphs for the paper
helper.py: contains helper functions which were used for setting up our experiments
implementations.py: contains the 6 required default functions, plus additional minimization algorithms and their corresponding loss functions
metrics.py: contains our implementations of different metrics
preprocessing.py: contains methods for the preprocessing of data
report.pdf: pdf with the report of the project
run.py: contains the code for reproducing our best submission file
utils.py: miscellaneous other functions, e.g. loading data, splitting it, etc.
requirements.txt: file which includes package requirements for running the code
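
The cross validation and data splitting mentioned in the file list above can be illustrated with a short k-fold sketch. This is not the repository's code: it assumes NumPy, labels in {-1, 1}, synthetic data, and a closed-form ridge regression used purely as a quick stand-in model, with `lambda_` as the hyperparameter being compared. All function names are hypothetical.

```python
import numpy as np


def build_k_indices(n_samples, k_fold, seed=1):
    """Shuffle sample indices and split them into k_fold equal-sized folds."""
    rng = np.random.default_rng(seed)
    fold_size = n_samples // k_fold
    perm = rng.permutation(n_samples)
    # Any leftover samples (n_samples % k_fold) are dropped
    return perm[: fold_size * k_fold].reshape(k_fold, fold_size)


def ridge_fit(y, tx, lambda_):
    """Closed-form ridge regression, used here only as a stand-in model."""
    n, d = tx.shape
    a = tx.T @ tx + 2 * n * lambda_ * np.eye(d)
    return np.linalg.solve(a, tx.T @ y)


def cross_validate(y, tx, k_fold, lambda_, seed=1):
    """Mean validation accuracy of the stand-in model over k_fold folds."""
    k_indices = build_k_indices(len(y), k_fold, seed)
    scores = []
    for k in range(k_fold):
        val_idx = k_indices[k]  # fold k is held out for validation
        train_idx = np.delete(k_indices, k, axis=0).ravel()  # the rest trains
        w = ridge_fit(y[train_idx], tx[train_idx], lambda_)
        pred = np.where(tx[val_idx] @ w >= 0.0, 1.0, -1.0)
        scores.append(np.mean(pred == y[val_idx]))
    return float(np.mean(scores))


# Synthetic stand-in data with labels in {-1, 1}
rng = np.random.default_rng(0)
x = rng.normal(size=(300, 3))
noise = rng.normal(scale=0.5, size=300)
y = np.where(x @ np.array([1.5, -2.0, 0.5]) + noise >= 0, 1.0, -1.0)
tx = np.c_[np.ones(len(x)), x]  # prepend a bias column

# Compare a few candidate values of the regularization strength
for lambda_ in (1e-4, 1e-2, 1.0):
    print(lambda_, cross_validate(y, tx, k_fold=5, lambda_=lambda_))
```

Averaging the score over held-out folds gives a less optimistic estimate than training accuracy, which is what makes it suitable for choosing between hyperparameter values.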

Others

For further details on the implementation choice and the experiments, please read the report.pdf file.

🛠 Skills

Python, PyTorch, Matplotlib, Jupyter Notebooks. Machine learning, Logistic Regression, analysis of the impact of different preprocessing techniques on training, shallow modelling, plotting the experiments, ensuring reproducibility.
