# Higgs Boson Discovery

2023-03-14. Because of this date, I include the following sub-header.

## rediscovery (?), re-analysis (?), one more time (?)

A nice visualization to take me back to grad school - where I worked at [BNL](https://www.bnl.gov) / [RHIC](https://www.bnl.gov/rhic/) / [PHENIX](https://www.phenix.bnl.gov/) ([archived](https://web.archive.org/web/20230314100142/https://www.bnl.gov/world/) / [archived](https://web.archive.org/web/20230312202521/https://www.bnl.gov/rhic/) / [archived](https://web.archive.org/web/20230314191656/https://www.phenix.bnl.gov/)), which basically means I worked at a government particle accelerator. I wasn't at [CERN](https://home.cern/) / [LHC]() / [CMS](https://home.cern/science/experiments/cms) ([archived](https://web.archive.org/web/20230314192144/https://home.cern/) / [archived]() / [archived](https://web.archive.org/web/20230314192110/https://home.cern/science/experiments/cms)), which is one of the experiments contributing to the discovery of the Higgs boson and the provider of the data I'll be using. I did participate in _some_ collaboration with LHC groups. I was even invited to the [4<sup>th</sup> of July Discovery Announcement](https://en.wikipedia.org/wiki/Higgs_boson#Search_before_4_July_2012) ([archived](https://web.archive.org/web/20230314192829/https://en.wikipedia.org/wiki/Higgs_boson)), but I got my keys locked in my car while hanging out with friends halfway to Columbia University. (I think reality makes for a better story - we got in my car using tree branches, but it was long past the midnight announcement time.)

The results (when I accomplish my goal) should look something like what follows. The image comes from https://arxiv.org/format/1207.7235, the discovery paper for the Higgs.

<br/>
<div>
  <img src="./publication_4lepton_spectrum.png"
       alt="Publication histogram - our goal for the analysis is to be similar to this"
       width="400px">
</div>
<br/>

## Setup

Using `conda` for environment setup. This time, because of computer availability, I did it on the Windows CMD prompt (`Anaconda Prompt (miniconda3)`, specifically). `Jupyter` makes for simpler visualizations, so here I am.

### When are we? (good for experimental notebooks)

```
>powershell -c (Get-Date -UFormat "%s_%Y%m%dT%H%M%S%Z00") -replace '[.][0-9]{5}_', '_'
1678797612_20230314T124012-0600

>::  Or, in case you don't have access to PowerShell,
>::+ you can extract that from the following mess.
>::
>(set LOCALE_INFO=) && (echo( > nul) && (for /f "usebackq tokens=*" %k in (`systeminfo ^| findstr ";" ^| cmd /q /v:on /c "set/p .=&echo(!.!"`) do @if not defined LOCALE_INFO set "LOCALE_INFO=%k") && (echo() && (for /f "tokens=*" %i in ('tzutil /g') do @echo "%date%" "%time%" "%i" "%LOCALE_INFO%") && (echo() && (w32tm /tz) && (echo()

"Tue 03/14/2023" "12:39:37.15" "Mountain Standard Time" "System Locale:             en-us;English (United States)"

Time zone: Current:TIME_ZONE_ID_DAYLIGHT Bias: 420min (UTC=LocalTime+Bias)
  [Standard Name:"Mountain Standard Time" Bias:0min Date:(M:11 D:1 DoW:0)]
  [Daylight Name:"Mountain Daylight Time" Bias:-60min Date:(M:3 D:2 DoW:0)]


>::  Gosh, that makes me miss bash
```
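
Once inside the notebook, a similar stamp can be sketched in Python. This is only a sketch under assumptions: the function name `experiment_timestamp` is hypothetical, and the leading number is the UTC-based epoch second count from `datetime.timestamp()`, so it may differ from the PowerShell output above by the UTC offset.

```python
from datetime import datetime, timedelta, timezone


def experiment_timestamp(now=None):
    """Return '<epoch seconds>_<YYYYmmddTHHMMSS><UTC offset>' (hypothetical helper)."""
    if now is None:
        # astimezone() with no argument attaches the local time zone
        now = datetime.now().astimezone()
    epoch_seconds = int(now.timestamp())  # seconds since 1970-01-01 00:00:00 UTC
    return f"{epoch_seconds}_{now.strftime('%Y%m%dT%H%M%S%z')}"


# Example with a fixed, time-zone-aware moment (UTC-06:00)
fixed = datetime(2023, 3, 14, 12, 40, 12, tzinfo=timezone(timedelta(hours=-6)))
print(experiment_timestamp(fixed))
```

Calling `experiment_timestamp()` with no argument stamps the current local time.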

### Environment stuff, `pip` installs, and on to Jupyter

```
>conda create --name higgs_boson python=3.10
>conda activate higgs_boson
(higgs_boson)>
(higgs_boson)>python -m pip install --upgrade pip
(higgs_boson)>pip install jupyter
(higgs_boson)>pip install numpy
(higgs_boson)>pip install pandas
(higgs_boson)>pip install matplotlib
## Adding this 2023-03-16
(higgs_boson)>pip install requests
```

Now, we can do our imports and image display setup.

### Imports and other setup

In [None]:
import numpy  as np
import pandas as pd
from matplotlib import pyplot as plt

%matplotlib inline

## Adding these 2023-03-16
import requests
import shutil
from pathlib import Path

## Some Info (Physics)

Let's talk a little about the physics of the data we'll be looking at. You may know that Uranium can decay into lighter elements, as long as the combined mass-energy of the daughters (lighter elements' atoms, energy) is the same as the combined mass-energy - usually just the original mass - of the original Uranium.

The same happens for particles. The Higgs boson can decay into lighter particles and energy as long as the combined mass-energy of the daughter particles and the released energy equals the original mass-energy of the Higgs. One way that physics allows this to happen is for the Higgs to decay into four $\ell$eptons. (I use the script '$\ell$' to avoid confusion between '1' and 'l'.)

Here is a way physicists represent such a decay, which has the problems noted on the image.

<br/>
<div>
  <img src="./four_leptons_description_01.png"
       alt="Born (basically Feynman) diagrams for the two possible Higgs to four electrons scenario."
       width=100%>
</div>
<br/>

(In case you're curious, I'll tell you that the $Z^{(*)}$ lines are for off-shell (virtual) Z-bosons. You can look up details if you'd like. Actually, to be technically correct, the Higgs boson should be off-shell (virtual), too, and we should see it labeled $H^{(*)}$.) Just for fun, I'll put in this image from the [Wikipedia article on the Higgs boson](https://en.wikipedia.org/wiki/Higgs_boson#Discovery_of_candidate_boson_at_CERN) ([archived](https://web.archive.org/web/20230314192829/https://en.wikipedia.org/wiki/Higgs_boson)). It should show up even if changes are made to the wiki article and/or the image, because I'm using an [archived image](https://web.archive.org/web/20230314233304/https://en.wikipedia.org/wiki/File:4-lepton_Higgs_decay.svg). This diagram doesn't mark off-shell/virtual particles with the $^{(*)}$, because one learns that anything in the middle of such a diagram is virtual/off-shell.

<br/>
<div>
  <img src="https://web.archive.org/web/20230314233304/https://upload.wikimedia.org/wikipedia/commons/thumb/b/b2/4-lepton_Higgs_decay.svg/320px-4-lepton_Higgs_decay.svg.png"
       alt="Both sides of the diagram for Higgs to four leptons"
       width="480px">
</div>
<br/>

That represents two protons coming in and four leptons coming out.

### Three decay possibilities

In the first image, I said that we could simplify things while including more leptons than the electron. Let's use these symbols,

<br/>
<div>
  <img src="./four_leptons_description_02.png"
       alt="Symbols for electron, positron (positive version of the electron), muon, and antimuon (positive version of the muon)"
       width=300px>
</div>
<br/>

and give the three possibilities. We'll see these possibilities in the names of the `CSV` files.

2023-03-15

**Possibility 1: ... $H^{(*)} \rightarrow \{ 2 Z^{(*)} \} \rightarrow \{ 4e \}$**<br/>
**Higgs to 4 electrons (via two Z bosons and after previous interactions)**

<br/>
<div>
  <img src="./decay_four_leptons_four_electrons.png"
       alt="Off-mass-shell (virtual) Higgs to two off-mass-shell (virtual) Z bosons to four electrons"
       width=450px>
</div>
<br/>

**Possibility 2: ... $H^{(*)} \rightarrow \{ 2 Z^{(*)} \} \rightarrow \{ 4\mu \}$**<br/>
**Higgs to four muons (via two Z bosons and after previous interactions)**

<br/>
<div>
  <img src="./decay_four_leptons_four_muons.png"
       alt="Off-mass-shell (virtual) Higgs to two off-mass-shell (virtual) Z bosons to four muons"
       width=450px>
</div>
<br/>

**Possibility 3: ... $H^{(*)} \rightarrow \{ 2 Z^{(*)} \} \rightarrow \{ 2e, 2\mu \}$**<br/>
**Higgs to 2 electrons and 2 muons (via two Z bosons and after previous interactions)**

<br/>
<div>
  <img src="./decay_four_leptons_two_electrons_two_muons.png"
       alt="Off-mass-shell (virtual) Higgs to two off-mass-shell (virtual) Z bosons to two electrons and two muons"
       width=450px>
</div>
<br/>

### Some code that didn't work ...

... is now in a [note near the end](#Pandas-File-Loading-Problems).

2023-03-16

Basically, using `pandas.read_csv(<internet-url>)` kept giving me a `HTTPError: HTTP Error 504: Gateway Time-out` error. I figured it would be better to download with `requests` and then read the local file with `pandas`.

A quick note: I added a `pip` install of `requests` to the setup near the beginning. I don't need to do the same for `shutil` or `pathlib`, because each of those is part of the standard library.
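
For completeness, one generic way to cope with an intermittent gateway time-out is to retry the call with a growing pause between attempts. The following is a minimal sketch of that idea and not what this notebook ends up doing; `call_with_retries` and its parameters are hypothetical names, and catching every `Exception` is a simplification that real code would narrow to the specific HTTP error.

```python
import time


def call_with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or `attempts` tries have failed.

    Hypothetical helper: waits base_delay, 2*base_delay, 4*base_delay, ...
    seconds between tries and re-raises the last error if every try fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as err:  # simplification: narrow this in real use
            last_error = err
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error


# Possible usage (not run here, needs network access and pandas):
# df = call_with_retries(lambda: pd.read_csv(some_csv_url))
```

The `sleep` parameter exists only so the pause can be swapped out when trying the helper without actually waiting.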


## Starting the direction of work using `requests`, then `pandas`

### URLs

In [None]:
data_csv_urls = []

# 4 electrons as daughter particles (two years' data)
url_4e_2011 = 'http://opendata.cern.ch/record/5200/files/4e_2011.csv'
data_csv_urls.append(url_4e_2011)
url_4e_2012 = 'http://opendata.cern.ch/record/5200/files/4e_2012.csv'
data_csv_urls.append(url_4e_2012)

# 4 muons as daughter particles (two years' data)
url_4mu_2011 = 'http://opendata.cern.ch/record/5200/files/4mu_2011.csv'
data_csv_urls.append(url_4mu_2011)
url_4mu_2012 = 'http://opendata.cern.ch/record/5200/files/4mu_2012.csv'
data_csv_urls.append(url_4mu_2012)

# 2 electrons and 2 muons as daughter particles (two years' data)
url_2e2mu_2011 = 'http://opendata.cern.ch/record/5200/files/2e2mu_2011.csv'
data_csv_urls.append(url_2e2mu_2011)
url_2e2mu_2012 = 'http://opendata.cern.ch/record/5200/files/2e2mu_2012.csv'
data_csv_urls.append(url_2e2mu_2012)
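
Since the six URLs differ only in the decay channel and the year, the same list can also be built in a couple of lines. This is just a sketch of an alternative; the names `BASE_URL`, `channels`, `years`, and `compact_csv_urls` are hypothetical and are not used elsewhere in the notebook.

```python
# Hypothetical names; the URL pattern is the one used in the cell above
BASE_URL = 'http://opendata.cern.ch/record/5200/files/'
channels = ['4e', '4mu', '2e2mu']   # daughter-particle combinations
years = [2011, 2012]                # two years' data

# Same order as the hand-written list: channel by channel, then year by year
compact_csv_urls = [
    f'{BASE_URL}{channel}_{year}.csv'
    for channel in channels
    for year in years
]

for url in compact_csv_urls:
    print(url)
```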

### Download using `requests` and `shutil`, reading-in by `pandas`

#### Looking at one file

In [None]:
#wait# for this_url in data_csv_urls:
#wait#  local_filename = this_url.split('/')[-1] #  not general solution 
#wait#                                           #+ (that would require HTTP
#wait#                                           #+ headers), but okay for
#wait#                                           #+ our known urls.

##  Wait, before I loop through all of them, I'll just 
##+ load in the first one, then we can look at the physics

##########################################
##  A peek at the data for 4e from 2011 ##
##+ Side scrolling will be necessary    ##
##########################################

my_url = 'http://opendata.cern.ch/record/5200/files/4e_2011.csv'
local_filename = my_url.split('/')[-1]

with requests.get(my_url, stream=True) as r:
    r.raise_for_status()
    with open(local_filename, 'wb') as f:
        shutil.copyfileobj(r.raw, f)
    ##endof:  with open ... as f
##endof:  with requests ... as r

our_file = Path(local_filename)
we_have_our_file = ( our_file.is_file() )

assert(we_have_our_file)

No AssertionError! We have our file!
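
The commented-out loop over all the URLs could later be filled in along these lines. It is a sketch under assumptions, not a tested part of this notebook: `local_filename_from_url` and `download_all` are hypothetical names, the 60-second timeout is an arbitrary choice, and the sketch writes with `iter_content` (which lets `requests` undo any gzip/deflate content encoding) instead of copying `r.raw` as the cell above does.

```python
import posixpath
from pathlib import Path
from urllib.parse import urlsplit


def local_filename_from_url(url):
    """Last path component of the URL, ignoring any query string."""
    return posixpath.basename(urlsplit(url).path)


def download_all(urls, dest_dir='.', timeout=60):
    """Download each URL into dest_dir, skipping files already present."""
    import requests  # imported here so the filename helper works without it

    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        target = dest_dir / local_filename_from_url(url)
        if not target.is_file():
            # Write to a temporary name first so that an interrupted
            # download is not mistaken for a finished file next time.
            partial = target.with_name(target.name + '.part')
            with requests.get(url, stream=True, timeout=timeout) as r:
                r.raise_for_status()
                with open(partial, 'wb') as f:
                    for chunk in r.iter_content(chunk_size=1024 * 1024):
                        f.write(chunk)
            partial.replace(target)
        saved.append(target)
    return saved


print(local_filename_from_url(
    'http://opendata.cern.ch/record/5200/files/4e_2011.csv'))
```

With the list from the URLs section, `download_all(data_csv_urls)` would return the local paths, ready for `pd.read_csv`.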

I want to see if the file size was causing the timeout, or if it was (the most likely explanation) something with my local gateway.

For a discussion of the fastest and most-memory-friendly way to count lines in a file (especially big files), see [this note](#Line-Counting-Discussion) near the end of the notebook.

##### Line-count functions

In [None]:
def _make_count_generator(reader):
    b = reader(1024 * 1024)
    while b:
        yield b
        b = reader(1024*1024)
    ##endof:  while b
##endof:  def _make_count_generator(reader)


#  DWB note: 20230316T164800-0600
#+ I don't know what will happen with an empty file
def raw_generator_line_count(filename):
    with open(filename, 'rb') as fp:
        my_generator = _make_count_generator(fp.raw.read)
        count = sum(buffer.count(b'\n') for buffer in my_generator)
    ##endof:  with open ... fp
    
    return count + 1
##endof:  def raw_generator_line_count(filename)
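
About the empty-file question in the comment above: because of the `count + 1`, `raw_generator_line_count` reports 1 for an empty file, and it reports one line too many for a file whose last line ends with a newline. A minimal sketch of a variant that handles both cases follows; `count_lines` is a hypothetical name, and the chunked reading is the same idea as above without the separate generator.

```python
def count_lines(filename, chunk_size=1024 * 1024):
    """Count lines by reading the file in binary chunks (hypothetical helper).

    An empty file has 0 lines; a final line without a trailing newline
    still counts as a line.
    """
    newline_count = 0
    last_byte = b''
    with open(filename, 'rb') as fp:
        while True:
            chunk = fp.read(chunk_size)
            if not chunk:
                break
            newline_count += chunk.count(b'\n')
            last_byte = chunk[-1:]  # remember how the file ends

    # Add one only when the last line lacks its terminating newline
    if last_byte and last_byte != b'\n':
        newline_count += 1
    return newline_count
```

Which convention is "right" depends on whether the trailing newline is treated as ending the last line or starting an empty one; the sketch takes the first view.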

##### Line count for file

In [None]:
n_lines = raw_generator_line_count(local_filename)
print("\nFile: " + str(local_filename) + "\nhas " + str(n_lines) + " lines.")

Well, that was anticlimactic. The output was

```
File: 4e_2011.csv
has 9 lines.
```

#### Let's have some physics fun with the first batch of data

After all, we can visualize it all. As I was saying, some side scrolling will be needed, so we need to change some `pandas` stuff.

In [None]:
pd.options.display.max_columns = 50

# Get our dataframe
my_4e2011_df = pd.read_csv(local_filename)

my_4e2011_df.describe(include="all")

That summary might take up more room than the head of the data itself.

In [None]:
my_4e2011_df.head(10)

In particle physics, we have something called the [invariant mass](https://en.wikipedia.org/wiki/Invariant_mass) ([archived](https://web.archive.org/web/20230313220738/https://en.wikipedia.org/wiki/Invariant_mass)). The invariant mass in such a collision - when talking about daughter particles - is the mass of the particle that decayed. It's the big 'M' in the last column. (If you don't understand that, don't worry - just forget I said anything.)

There are a lot of different masses in that last column. Consider that an electron has a mass of about $0.5\ \mathrm{MeV}/c^2$, which is the same as $0.0005\ \mathrm{GeV}/c^2$.

***&lt;part-you-can-skip&gt;***
Particle physicists, again with our own quirky little ways, use a system called [natural units](https://en.wikipedia.org/wiki/Natural_units#Natural_units_(particle_and_atomic_physics)) ([archived](https://web.archive.org/web/20230316232513/https://en.wikipedia.org/wiki/Natural_units)). Some will say that this allows us to write $c$ (the speed of light) as $1$. That's oversimplifying it (it's more like making the physical-constant coefficients equal to $1$, in this case by making the length unit the same as the time unit ... -ish). Let's just say it lets us get away with writing equations and measurements without including the defining constants, which include $c$. If you want to know more, you can also look up [nondimensionalization](https://en.wikipedia.org/wiki/Nondimensionalization) ([archived](https://web.archive.org/web/20230316040337/https://en.wikipedia.org/wiki/Nondimensionalization)); I'm guessing you probably _don't_ want to know more, or even as much as I've said.

***&lt;/part-you-can-skip&gt;***

All of that to say that we can approximate the masses of the electron and the Higgs as 

`{mass_electron, mass_higgs} = {0.5 MeV, 126000 MeV} = {0.0005 GeV, 126 GeV}`

Any way you look at it, that Higgs is a _lot_ more massive than four electrons. Where does the extra mass go? Well, some of the mass gets converted into energy (remember $E = m c^2$), a lot of it being energy of motion. We can check up on this by looking at each electron's invariant mass. For one particle, in any frame of reference (ignore that last part if it confuses you), the invariant mass is given by

$m_0 = \sqrt{E^2 - \lvert \textbf{p} \rvert^2}$
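
In code, that formula (in natural units, so with $c = 1$) might be sketched as follows. The function names and the numbers in the example are made up for illustration and are not taken from the CSV files. The second function applies the same formula to the summed energies and momenta of several particles, which is the standard way to get the invariant mass of the system they came from.

```python
import math


def invariant_mass(energy, px, py, pz):
    """m_0 = sqrt(E^2 - |p|^2), with everything in GeV (natural units)."""
    m_squared = energy**2 - (px**2 + py**2 + pz**2)
    # Rounding can push a tiny m^2 slightly below zero, so clamp at 0
    return math.sqrt(max(m_squared, 0.0))


def system_invariant_mass(four_momenta):
    """Invariant mass of a set of particles given as (E, px, py, pz) tuples."""
    total_e = sum(p[0] for p in four_momenta)
    total_px = sum(p[1] for p in four_momenta)
    total_py = sum(p[2] for p in four_momenta)
    total_pz = sum(p[3] for p in four_momenta)
    return invariant_mass(total_e, total_px, total_py, total_pz)


# Made-up example: an electron (mass about 0.000511 GeV) with |p| = 5 GeV
m_e = 0.000511
px, py, pz = 3.0, 4.0, 0.0
energy = math.sqrt(m_e**2 + px**2 + py**2 + pz**2)
print(invariant_mass(energy, px, py, pz))

# Made-up example: two massless particles flying back to back
pair = [(63.0, 0.0, 0.0, 63.0), (63.0, 0.0, 0.0, -63.0)]
print(system_invariant_mass(pair))
```

Because the electron's energy is so much larger than its mass, $E^2$ and $\lvert \mathbf{p} \rvert^2$ nearly cancel, so the single-particle result is sensitive to rounding in the inputs.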

I'll list the column headers in a nice format to show you how we're going to check on this, i.e. check that each of the electrons has the correct (invariant) mass.

In [None]:


for my_header in my_4e2011_df.columns.values.tolist():
    print(my_header)


## Pandas File-Loading Problems

### We could call it Appendix A

```
anchor_ref="https://stackoverflow.com/questions/16630969/" + \
"ipython-notebook-anchor-link-to-refer-a-cell-directly-from-outside"
archived_anchor_ref="https://web.archive.org/web/20230316174643/" + \
"https://stackoverflow.com/questions/16630969/" + \
"ipython-notebook-anchor-link-to-refer-a-cell-directly-from-outside"
```

From here

[This link/anchor is not used for the other, get-to-appendix task](#not-using-this-anchor-because-we-can-use-header), because the header name can be used instead; it is here only to show a concept with &lt;a&gt;. The concept with &lt;a&gt; doesn't seem to work.

### Start of important stuff in Appendix A - an attempt to read in the files

Now, let's read in the CSV files for each of those three possibilities. We'll take a quick peek at the data, but not too much (there's a bunch). We'll look more carefully at a few of the columns and figure out the physics.

( _DWB note_, 20230315T175999-0600 ) Using `pd.read_csv` continues to get me `HTTPError: HTTP Error 504: Gateway Time-out` errors. I'm going to switch to using the `requests` module. I will keep my reference on some options from [this SO reference](https://stackoverflow.com/questions/16694907/download-large-file-in-python-with-requests) ([archived](https://web.archive.org/web/20230316000241/https://stackoverflow.com/questions/16694907/download-large-file-in-python-with-requests)).

_I'm also including a possibly-relevant-but-I'm-not-taking-the-time-to-learn-something-that-new source_

_&lt;not-now-source&gt;_
<br/>
```
ref="https://stackoverflow.com/questions/15786421/" + \
"http-error-504-gateway-time-out-when-trying-to-read-a-reddit-comments-post"
archived_ref="https://web.archive.org/web/20230316001420/" + \
"https://stackoverflow.com/questions/15786421/" + \
"http-error-504-gateway-time-out-when-trying-to-read-a-reddit-comments-post"
```

_&lt;/not-now-source&gt;_

<br/>

After a git commit, I'm putting the non-working code in this markdown cell and trying to use the new code.

<br/>

_&lt;not-working-now-but-I-doubt-it's-broken-code&gt;_

<br/>

#### First attempt

```
#4 electrons as daughter particles (two years' data)
the_4e_lepton_participants_2011 = \
  pd.read_csv(
    'http://opendata.cern.ch/record/5200/files/4e_2011.csv')
the_4e_lepton_participants_2012 = \
  pd.read_csv(
    'http://opendata.cern.ch/record/5200/files/4e_2012.csv')

# 4 muons as daughter particles (two years' data)
the_4mu_lepton_participants_2011 = \
  pd.read_csv(
    'http://opendata.cern.ch/record/5200/files/4mu_2011.csv')
the_4mu_lepton_participants_2012 = \
  pd.read_csv(
    'http://opendata.cern.ch/record/5200/files/4mu_2012.csv')

# 2 electrons and 2 muons as daughter particles (two years' data)
the_2e2mu_lepton_participants_2011 = \
  pd.read_csv(
    'http://opendata.cern.ch/record/5200/files/2e2mu_2011.csv')
the_2e2mu_lepton_participants_2012 = \
  pd.read_csv(
    'http://opendata.cern.ch/record/5200/files/2e2mu_2012.csv')


##########################################
##  A peek at the data for 4e from 2011 ##
##+ Side scrolling will be necessary    ##
##########################################
the_4e_lepton_participants_2011.head()
```

##### Not getting anything usable due to the HTTPError

<br/><br/>

#### Trying the original reading-in from the guide

```
##  DWB, 20230315T180500-0600
##+ Trying the strategy used in the original guide.
##+ orig_ref="https://opendata-education.github.io/en_Workshops/" + \
##+ "exercises/Hunting-the-Higgs-4leptons.html"
##+ orig_ref_archived="https://web.archive.org/web/20230316000855/" + \
##+ "https://opendata-education.github.io/en_Workshops/" + \
##+ "exercises/Hunting-the-Higgs-4leptons.html"
##+
##+ It didn't work, either (same `HTTPError: HTTP Error 504: Gateway Time-out`)

# Data for later use. 
csvs = [pd.read_csv('http://opendata.cern.ch/record/5200/files/4mu_2011.csv'), pd.read_csv('http://opendata.cern.ch/record/5200/files/4e_2011.csv'), pd.read_csv('http://opendata.cern.ch/record/5200/files/2e2mu_2011.csv')]
csvs += [pd.read_csv('http://opendata.cern.ch/record/5200/files/4mu_2012.csv'), pd.read_csv('http://opendata.cern.ch/record/5200/files/4e_2012.csv'), pd.read_csv('http://opendata.cern.ch/record/5200/files/2e2mu_2012.csv')]

fourlep = pd.concat(csvs)
```

##### Not getting anything usable due to the HTTPError

<br/>

_&lt;/not-working-now-but-I-doubt-it's-broken-code&gt;_

Will try something else.

<br/>
<br/>


... to here.

<a id="#not-using-this-anchor-because-we-can-use-header"></a>
The line above this only contains '`<a id="#not-using-this-anchor-because-we-can-use-header"></a>`'.

And

we

should

be

looking

at

what

is

beneath

that

line.

**But, apparently, it doesn't work.**


## Line Counting Discussion

### We could call it Appendix B

I think the best discussion of the most optimized line-counting algorithm is at

```
lc_ref_1="https://stackoverflow.com/questions/845058/" + \
"how-to-get-line-count-of-a-large-file-cheaply-in-python/27518377#27518377"
archived_lc_ref_1="https://web.archive.org/web/20230316221423/" + \
"https://stackoverflow.com/questions/845058/" + \
"how-to-get-line-count-of-a-large-file-cheaply-in-python/27518377#27518377"
```

Look for the answer from @Michael_Bacon, then you can read the answer by @Ryan_Ginstrom to give some background. I would go straight to the archived version if you're searching for those answers - that's why I archived it.

The answer by @Michael_Bacon adds generator/buffer solutions, not discussed by @Ryan_Ginstrom, but one of which is also discussed in another good source,

```
lc_ref_2="https://pynative.com/python-count-number-of-lines-in-file/"
archived_lc_ref_2="https://web.archive.org/web/20230316215432/" + \
"https://pynative.com/python-count-number-of-lines-in-file/"
```

There's also a decent discussion from geeksforgeeks, which includes some Big-O analysis, but it's not as comprehensive as the other two and doesn't discuss the generator stuff.

```
lc_ref_3="https://www.geeksforgeeks.org/count-number-of-lines-" + \
"in-a-text-file-in-python/"
archived_lc_ref_3="https://web.archive.org/web/20230306171715/" + \
"https://www.geeksforgeeks.org/count-number-of-lines-" + \
"in-a-text-file-in-python/"
```