# Higgs Boson Discovery

2023-03-14. Because of this date, I include the following sub-header.

## rediscovery (?), re-analysis (?), one more time (?)

A nice visualization to take me back to grad school - where I worked at [BNL](https://www.bnl.gov) / [RHIC](https://www.bnl.gov/rhic/) / [PHENIX](https://www.phenix.bnl.gov/) ([archived](https://web.archive.org/web/20230314100142/https://www.bnl.gov/world/) / [archived](https://web.archive.org/web/20230312202521/https://www.bnl.gov/rhic/) / [archived](https://web.archive.org/web/20230314191656/https://www.phenix.bnl.gov/)) which basically means I worked at a government particle accelerator. I wasn't at [CERN](https://home.cern/) / [LHC]() / [CMS](https://home.cern/science/experiments/cms) ([archived](https://web.archive.org/web/20230314192144/https://home.cern/) / [archived]() / [archived](https://web.archive.org/web/20230314192110/https://home.cern/science/experiments/cms)), which is one of the experiments contributing to the discovery of the Higgs boson and the provider of the data I'll be using. I did participate in _some_ collaboration with LHC groups. I was even invited to the [4<sup>th</sup> of July Discovery Announcement](https://en.wikipedia.org/wiki/Higgs_boson#Search_before_4_July_2012) ([archived](https://web.archive.org/web/20230314192829/https://en.wikipedia.org/wiki/Higgs_boson)), but I got my keys locked in my car while hanging out with friends halfway to Columbia University. (I think reality makes for a better story - we got in my car using tree branches, but it was long past the midnight announcement time.)

The results (when I accomplish my goal) should look something like what follows. The image comes from https://arxiv.org/format/1207.7235, the discovery paper for the Higgs.

<br/>
<div>
  <img src="./publication_4lepton_spectrum.png"
       alt="Publication histogram - our goal for the analysis is to be similar to this"
       width="400px">
</div>
<br/>

If you're seeing this on GitHub and want to run it, try 

1) (I'm going to try getting it set up with [MyBinder](https://mybinder.org/), but I haven't gotten that to work, yet.) 

2) Set up a virtual environment. Make a new folder (directory) for the project and go there.

On a normal `Python` setup (this one with Windows)

```
>mkdir higgs_venv
>python -m venv .\higgs_venv
>.\higgs_venv\Scripts\activate
(higgs_venv)>
```

## Setup

Using `conda` for environment setup. This time, because of computer availability, I did it on the Windows CMD prompt (`Anaconda Prompt (miniconda3)`, specifically). `Jupyter` makes for simpler visualizations, so here I am.

### When are we? (good for experimental notebooks)

```
>powershell -c (Get-Date -UFormat "%s_%Y%m%dT%H%M%S%Z00") -replace '[.][0-9]{5}_', '_'
1678797612_20230314T124012-0600

>::  Or, in case you don't have access to PowerShell,
>::+ you can extract that from the following mess.
>::
>(set LOCALE_INFO=) && (echo( > nul) && (for /f "usebackq tokens=*" %k in (`systeminfo ^| findstr ";" ^| cmd /q /v:on /c "set/p .=&echo(!.!"`) do @if not defined LOCALE_INFO set "LOCALE_INFO=%k") && (echo() && (for /f "tokens=*" %i in ('tzutil /g') do @echo "%date%" "%time%" "%i" "%LOCALE_INFO%") && (echo() && (w32tm /tz) && (echo()

"Tue 03/14/2023" "12:39:37.15" "Mountain Standard Time" "System Locale:             en-us;English (United States)"

Time zone: Current:TIME_ZONE_ID_DAYLIGHT Bias: 420min (UTC=LocalTime+Bias)
  [Standard Name:"Mountain Standard Time" Bias:0min Date:(M:11 D:1 DoW:0)]
  [Daylight Name:"Mountain Daylight Time" Bias:-60min Date:(M:3 D:2 DoW:0)]


>::  Gosh, that makes me miss bash
```
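
If you'd rather stay inside Python for the timestamp, here is a minimal sketch that builds a stamp with the same `<epoch-seconds>_<date>T<time><offset>` layout. The helper name `experiment_timestamp` and the fixed example moment are placeholders for illustration; they are not part of the commands above.

```python
from datetime import datetime, timedelta, timezone


def experiment_timestamp(moment=None):
    """Return '<epoch-seconds>_<YYYYmmddTHHMMSS><utc-offset>' for a moment.

    `experiment_timestamp` is a hypothetical helper name, used only
    for this sketch.
    """
    if moment is None:
        # Local time, with the local UTC offset attached
        moment = datetime.now().astimezone()
    epoch_seconds = int(moment.timestamp())
    return f"{epoch_seconds}_{moment.strftime('%Y%m%dT%H%M%S%z')}"


# A fixed, made-up moment (UTC-06:00) so the layout is easy to see
utc_minus_six = timezone(timedelta(hours=-6))
example = experiment_timestamp(
    datetime(2023, 3, 14, 12, 40, 12, tzinfo=utc_minus_six))
print(example)

# The stamp for "right now"
print(experiment_timestamp())
```

The first field here is counted from UTC, so for the same wall-clock time it can differ from what the PowerShell one-liner prints.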


### Environment stuff, pip installs, and on to Jupyter 

```
>conda create --name higgs_boson python=3.10
>conda activate higgs_boson
(higgs_boson)>
(higgs_boson)>python -m pip install --upgrade pip
(higgs_boson)>pip install jupyter
(higgs_boson)>pip install numpy
(higgs_boson)>pip install pandas
(higgs_boson)>pip install matplotlib
## Adding this 2023-03-16
(higgs_boson)>pip install requests
```

Now, we can do our initial imports and image-display setup.



### Imports and other setup

For one particle, in any frame of reference (ignore that last part if it confuses you), the invariant mass is given by

$m_0 = \sqrt{E^2 - \lvert \textbf{p} \rvert^2}$

I'll list the column headers in a nice format to show you how we're going to check on this, i.e. check that each of the electrons has the correct (invariant) mass.

In [None]:
import numpy  as np
import pandas as pd
from matplotlib import pyplot as plt

%matplotlib inline

Run this next cell. It will clear any weird (HTML) table formatting (maybe from a `pandas` `dataframe.head()` call). Things just look better. [HTML-Jupyter Ref](#HTML-Jupyter-Refs)

In [None]:
%%HTML
<style>
table {
    margin-right:auto; 
    margin-left: 0px !important;
    display: block;
}
th,tr,td {
    font-size: 150%;
}
</style>

## Some Info (Physics)

***If you want to skip the physics, and I understand if you do, head to*** [the start of the analysis](#Starting-The-Analysis). **I suggest you at least look at the pictures in this first part, though.** 

Further down in the notebook, and here for that matter, you can skip past some physics stuff by mashing the colored button in a table that looks like this:

|Wanna|Skip|Physics?|
|-|-|-|
|Mash|`->`|<span style="color: red; background-color: #00ee00;"><a href="#Starting-The-Analysis">HERE</a></span>|

<br/>

<div style="font-size:200%">
    <p>At least skim this one, though,</p>
    <p>if only for the pictures.</p>
</div>

<br/>

Let's talk a little about the physics of the data we'll be looking at. You may know that uranium can decay into lighter elements, as long as the combined mass-energy of the daughters (the lighter elements' atoms, plus energy) is the same as the mass-energy - usually just the original mass - of the original uranium.

The same happens for particles. The Higgs boson can decay into lighter particles and energy, as long as the combined mass-energy of its daughter particles and that energy is equal to the original mass-energy of the Higgs. One way that physics allows this to happen is for the Higgs to decay into four $\ell$eptons. (I use the script '$\ell$' to avoid confusion between '1' and 'l'.)

Here is a way physicists represent such a decay, which has the problems noted on the image.

<br/>
<div>
  <img src="./four_leptons_description_01.png"
       alt="Born (basically Feynman) diagrams for the two possible Higgs to four electrons scenario."
       width=100%>
</div>
<br/>

(In case you're curious, I'll tell you that the $Z^{(*)}$ lines are for off-shell (virtual) Z-bosons. You can look up details if you'd like. Actually, to be technically correct, the Higgs boson should be off-shell (virtual), too, and we should see it labeled $H^{(*)}$.) Just for fun, I'll put in this image from the [Wikipedia article on the Higgs boson](https://en.wikipedia.org/wiki/Higgs_boson#Discovery_of_candidate_boson_at_CERN) ([archived](https://web.archive.org/web/20230314192829/https://en.wikipedia.org/wiki/Higgs_boson)). It should show up even if changes are made to the wiki article and/or the image, because I'm using an [archived image](https://web.archive.org/web/20230314233304/https://en.wikipedia.org/wiki/File:4-lepton_Higgs_decay.svg). This diagram doesn't show whether something is off-shell/virtual with the $^{(*)}$, because one learns that anything in the middle of such a diagram is virtual/off-shell.

<br/>
<div>
  <img src="https://web.archive.org/web/20230314233304/https://upload.wikimedia.org/wikipedia/commons/thumb/b/b2/4-lepton_Higgs_decay.svg/320px-4-lepton_Higgs_decay.svg.png"
       alt="Both sides of the diagram for Higgs to four leptons"
       width="480px">
</div>
<br/>

That represents two protons coming in and four leptons coming out.

### Three decay possibilities

In the first image, I said that we could simplify things while including more leptons than the electron. Let's use these symbols,

<br/>
<div>
  <img src="./four_leptons_description_02.png"
       alt="Symbols for electron, positron (positive version of the electron), muon, and antimuon (positive version of the muon)"
       width=300px>
</div>
<br/>

and give the three possibilities. We'll see these possibilities in the names of the `CSV` files.

2023-03-15

**Possibility 1: ... $H^{(*)} \rightarrow \{ 2 Z^{(*)} \} \rightarrow \{ 4e \}$**<br/>
**Higgs to 4 electrons (via two Z bosons and after previous interactions)**

<br/>
<div>
  <img src="./decay_four_leptons_four_electrons.png"
       alt="Off-mass-shell (virtual) Higgs to two off-mass-shell (virtual) Z bosons to four electrons"
       width=450px>
</div>
<br/>

**Possibility 2: ... $H^{(*)} \rightarrow \{ 2 Z^{(*)} \} \rightarrow \{ 4\mu \}$**<br/>
**Higgs to four muons (via two Z bosons and after previous interactions)**

<br/>
<div>
  <img src="./decay_four_leptons_four_muons.png"
       alt="Off-mass-shell (virtual) Higgs to two off-mass-shell (virtual) Z bosons to four muons"
       width=450px>
</div>
<br/>

**Possibility 3: ... $H^{(*)} \rightarrow \{ 2 Z^{(*)} \} \rightarrow \{ 2e, 2\mu \}$**<br/>
**Higgs to 2 electrons and 2 muons (via two Z bosons and after previous interactions)**

<br/>
<div>
  <img src="./decay_four_leptons_two_electrons_two_muons.png"
       alt="Off-mass-shell (virtual) Higgs to two off-mass-shell (virtual) Z bosons to two electrons and two muons"
       width=450px>
</div>
<br/>

<hr/>

### Some code that didn't work ...

... is now in a [note near the end](#Pandas-File-Loading-Problems).

2023-03-16

Basically, using `pandas.read_csv(<internet-url>)` kept giving me an `HTTPError: HTTP Error 504: Gateway Time-out` error. I figured it would be better to download the file with `requests` and then read the local copy with `pandas`.

A quick note: I added a `pip` install of `requests` to the setup commands at the beginning. I don't need to do the same for `shutil`, `pathlib`, `re`, or `pprint`, because each of those is part of the Python standard library.

<hr/>

## Starting The Analysis

### URLs

In [None]:
data_csv_urls = []

# 4 electrons as daughter particles (two years' data)
url_4e_2011 = 'http://opendata.cern.ch/record/5200/files/4e_2011.csv'
data_csv_urls.append(url_4e_2011)
url_4e_2012 = 'http://opendata.cern.ch/record/5200/files/4e_2012.csv'
data_csv_urls.append(url_4e_2012)

# 4 muons as daughter particles (two years' data)
url_4mu_2011 = 'http://opendata.cern.ch/record/5200/files/4mu_2011.csv'
data_csv_urls.append(url_4mu_2011)
url_4mu_2012 = 'http://opendata.cern.ch/record/5200/files/4mu_2012.csv'
data_csv_urls.append(url_4mu_2012)

# 2 electrons and 2 muons as daughter particles (two years' data)
url_2e2mu_2011 = 'http://opendata.cern.ch/record/5200/files/2e2mu_2011.csv'
data_csv_urls.append(url_2e2mu_2011)
url_2e2mu_2012 = 'http://opendata.cern.ch/record/5200/files/2e2mu_2012.csv'
data_csv_urls.append(url_2e2mu_2012)
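
Since the six URLs differ only in decay channel and year, the same list could also be built in a loop. This is only a sketch: `base_url`, `decay_channels`, `years`, and `data_csv_urls_sketch` are names made up here, and it assumes every file stays under the same record path used in the cell above.

```python
# Assumption: all six files live under the record path used above
base_url = 'http://opendata.cern.ch/record/5200/files/'

# The three decay possibilities and the two data-taking years
decay_channels = ['4e', '4mu', '2e2mu']
years = [2011, 2012]

# Same order as the hand-written list: channel first, then year
data_csv_urls_sketch = [
    f"{base_url}{channel}_{year}.csv"
    for channel in decay_channels
    for year in years
]

for this_url in data_csv_urls_sketch:
    print(this_url)
```

Writing the URLs out by hand, as above, keeps each one easy to copy; the loop is just less typing if more channels or years get added.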

### Download using requests and shutil then reading in with pandas

Running the next cell makes some of the next cells look better.

In [None]:
%%HTML
<style>
table {
    margin-right:auto; 
    margin-left: 0px !important;
    display: block;
}
th,tr,td {
    font-size: 150%;
}
</style>

<br/>

|Wanna|Skip|Physics?|
|-|-|-|
|Mash|`->`|<span style="color: red; background-color: #00ee00;"><a href="#Putting-All-Data-In">HERE</a></span>|

<br/>

#### Looking at one file

In [None]:
#wait# for this_url in data_csv_urls:
#wait#  local_filename = this_url.split('/')[-1] #  not general solution 
#wait#                                           #+ (that would require HTML
#wait#                                           #+ headers), but okay for
#wait#                                           #+ our known urls.

##  Wait, before I loop through all of them, I'll just
##+ load in the first one, then we can look at the physics

##########################################
##  A peek at the data for 4e from 2011 ##
##+ Side scrolling will be necessary    ##
##########################################

## Adding these 2023-03-16
import requests
import shutil
from pathlib import Path

my_url = 'http://opendata.cern.ch/record/5200/files/4e_2011.csv'
local_filename = my_url.split('/')[-1]

with requests.get(my_url, stream=True) as r:
    r.raise_for_status()
    with open(local_filename, 'wb') as f:
        shutil.copyfileobj(r.raw, f)
    ##endof:  with open ... as f
##endof:  with requests ... as r

our_file = Path(local_filename)
we_have_our_file = ( our_file.is_file() )

assert(we_have_our_file)

No AssertionError! We have our file!

I want to see if the file size was causing the timeout, or if it was (the most likely explanation) something with my local gateway.

For a discussion of the fastest and most-memory-friendly way to count lines in a file (especially big files), see [this note](#Line-Counting-Discussion) near the end of the notebook.

##### Line-count functions

In [None]:
def _make_count_generator(reader):
    b = reader(1024 * 1024)
    while b:
        yield b
        b = reader(1024*1024)
    ##endof:  while b
##endof:  def _make_count_generator(reader)


#  DWB note: 20230316T164800-0600
#+ I don't know what will happen with an empty file
def raw_generator_line_count(filename):
    with open(filename, 'rb') as fp:
        my_generator = _make_count_generator(fp.raw.read)
        count = sum(buffer.count(b'\n') for buffer in my_generator)
    ##endof:  with open ... fp
    
    return count + 1
##endof:  def raw_generator_line_count(filename)

##### Line count for file

In [None]:
n_lines = raw_generator_line_count(local_filename)
print("\nFile: " + str(local_filename) + "\nhas " + str(n_lines) + " lines.")

Well, that was anticlimactic. The output was

```
File: 4e_2011.csv
has 9 lines.
```
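
One thing worth knowing about a "count the newlines, then add one" rule is how it treats a final newline and an empty file. The sketch below applies the same rule to three tiny throwaway files; `count_newlines_plus_one` and `count_for_contents` are hypothetical helper names, and the file contents are made up.

```python
import os
import tempfile


def count_newlines_plus_one(filename):
    """Apply the same rule as above: count newline bytes, then add one."""
    with open(filename, 'rb') as fp:
        return fp.read().count(b'\n') + 1


def count_for_contents(contents):
    """Write bytes to a throwaway file, count its lines, then clean up."""
    handle, path = tempfile.mkstemp(suffix='.csv')
    try:
        with os.fdopen(handle, 'wb') as fp:
            fp.write(contents)
        return count_newlines_plus_one(path)
    finally:
        os.remove(path)


results = {
    'ends with newline': count_for_contents(b'header\nrow1\nrow2\n'),
    'no final newline':  count_for_contents(b'header\nrow1\nrow2'),
    'empty file':        count_for_contents(b''),
}

for label, n_counted in results.items():
    print(label + ': ' + str(n_counted))
```

With this rule, a three-line file that ends in a newline is reported as 4 lines, and an empty file as 1. If the downloaded CSV ends with a newline, that would be consistent with a header plus 7 rows being reported as 9 lines.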

#### Let's have some physics fun with the first batch of data

After all, we can visualize it all. As I was saying, some side scrolling will be needed, so we need to change some `pandas` stuff. It doesn't hurt to reset the HTML styles, either.

In [None]:
%%HTML
<style>
table {
    margin-right:auto; 
    margin-left: auto;
    display: auto;
}
th,tr,td {
    font-size: 100%;
}
</style>

In [None]:
pd.options.display.max_columns = 50

# Get our dataframe
my_4e2011_df = pd.read_csv(local_filename)

my_4e2011_df.describe(include="all")

That might be more than the head.

In [None]:
my_4e2011_df.head(10)

In particle physics, we have something called the [invariant mass](https://en.wikipedia.org/wiki/Invariant_mass) ([archived](https://web.archive.org/web/20230313220738/https://en.wikipedia.org/wiki/Invariant_mass)). The invariant mass in such a collision - when talking about daughter particles - is the mass of the particle that decayed. It's the big 'M' in the last column. (If you don't understand that, don't worry - just forget I said anything.)

There are a lot of different masses in that last column. For comparison, an electron has a mass of about $0.5 MeV/c^2$, which is the same as $0.0005 GeV/c^2$.

***&lt;part-you-can-skip&gt;***<br/>
Particle physicists, again with our own quirky little ways, use a system called [natural units](https://en.wikipedia.org/wiki/Natural_units#Natural_units_(particle_and_atomic_physics)) ([archived](https://web.archive.org/web/20230316232513/https://en.wikipedia.org/wiki/Natural_units)). Some will say that this allows us to write $c$ (the speed of light) as $1$. That's oversimplifying it (it's more like making the physical-constant coefficients equal to $1$, in this case by making the length unit the same as the time unit ... -ish). Let's just say it lets us get away with writing equations and measurements without including the defining constants, which include $c$. If you want to know more, you can also look up [nondimensionalization](https://en.wikipedia.org/wiki/Nondimensionalization) ([archived](https://web.archive.org/web/20230316040337/https://en.wikipedia.org/wiki/Nondimensionalization)); I'm guessing you probably _don't_ want to know more, or even as much as I've said.<br/>
***&lt;/part-you-can-skip&gt;***

All of that to say that we can approximate the masses of the electron and the Higgs as 

`{mass_electron, mass_higgs} = {0.5 MeV, 126000 MeV} = {0.0005 GeV, 126 GeV}`

Any way you look at it, that Higgs is a _lot_ more massive than four electrons. Where does the extra mass go? Well, some of the mass gets converted into energy (remember $E = m c^2$), a lot of it being energy of motion. 
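
To put rough numbers on "a _lot_ more massive", here is a small arithmetic sketch using the approximate masses quoted above. The variable names are made up for this example, and the values are the rounded ones from the text, not precise measurements.

```python
MEV_PER_GEV = 1000.0

# Approximate masses from the text, in MeV (natural units)
mass_electron_mev = 0.5
mass_higgs_mev = 126000.0

# Convert to GeV
mass_electron_gev = mass_electron_mev / MEV_PER_GEV
mass_higgs_gev = mass_higgs_mev / MEV_PER_GEV

# Rest mass of four electrons, and what is left over for energy of motion
four_electron_mass_gev = 4 * mass_electron_gev
leftover_gev = mass_higgs_gev - four_electron_mass_gev

# Fraction of the Higgs mass that ends up as electron rest mass
rest_mass_fraction = four_electron_mass_gev / mass_higgs_gev

print("Four electrons [GeV]:", four_electron_mass_gev)
print("Left over [GeV]:     ", leftover_gev)
print("Rest-mass fraction:  ", rest_mass_fraction)
```

The four electrons account for only about 0.002 GeV of the roughly 126 GeV, so nearly all of the parent's mass-energy shows up as the daughters' energy of motion.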

Let's look at this two ways, both using the invariant mass. First, we'll find the invariant mass of the particle that decayed into four leptons (four electrons in the case we're doing). We'll take a look at one event to check things out. (I've always called it a sanity check.) Then, we can look at each electron's invariant mass.

Here's some code to let you see the info we get from one event - basically the headers and the values for the first event.

In [None]:
## Adding 2023-03-16
import re

In [None]:
table_of_tables = []
current_table = []
is_a_new_table = True
is_the_first_element = True

for my_header in my_4e2011_df.columns.values.tolist():
    if ( re.match(r"PID\d+", my_header) or
         re.match(r"mZ1", my_header) or
         re.match(r"M", my_header)):
        is_a_new_table = True
    ##endof:  if <and all the re.match>
    
    if is_a_new_table:
        if not is_the_first_element:
            table_of_tables.append(current_table)
        else:
            is_the_first_element = False
        ##endof:  if/else is_not_the_first_element
        
        current_table = [my_header]
        is_a_new_table = False
        
    else:
        current_table.append(my_header)
    ##endof:  if/else is_a_new_table
    
##endof:  for my_header in my_4e2011_df.columns.values.tolist()

table_of_tables.append(current_table)

In [None]:
import pprint

pprint.pprint(table_of_tables)

The output was

```
[['Run', 'Event'],
 ['PID1', 'E1', 'px1', 'py1', 'pz1', 'pt1', 'eta1', 'phi1', 'Q1'],
 ['PID2', 'E2', 'px2', 'py2', 'pz2', 'pt2', 'eta2', 'phi2', 'Q2'],
 ['PID3', 'E3', 'px3', 'py3', 'pz3', 'pt3', 'eta3', 'phi3', 'Q3'],
 ['PID4', 'E4', 'px4', 'py4', 'pz4', 'pt4', 'eta4', 'phi4', 'Q4'],
 ['mZ1', 'mZ2'],
 ['M']]
```

Note that we can use `my_4e2011_df.info()` to get even more info about these (not about their physics, but about the data types, count, etc.). I've done that in a [note near the end](#More-Dataframe-Info).

### The parent particle

The general formula for invariant mass, $m_0$, for one particle in this case, is

$m_0 = \sqrt{E^2 - \lvert \textbf{p} \rvert^2}$

In a particle decay, the invariant mass of the system of particle(s) stays the same - the invariant mass calculated using the energy and momentum of the daughter particles is equal to the invariant mass of the parent particle. However, we can't just find the invariant mass for the first daughter particle, then the second daughter particle, etc., and add them up. We need to look at them all together. This is shown in the general formula for the invariant mass of the system of daughter particles, usually notated as $W$, and which is equal to the invariant mass of the parent particle,

$W = \sqrt{ \left( \sum{E} \right)^2 - \left( \lvert \sum{\textbf{p}} \rvert \right)^2}$

Note that this, just like the general formula for invariant mass, is in natural units.

The way vectors work makes this nice for us. Vectors add component by component, so we add up each momentum component over the four particles first, and only then square. We can write the equation using only the values in our list.

$W = \sqrt{\left( \sum_N{EN} \right)^2 - \left[ \left( \sum_N{pxN} \right)^2 + \left( \sum_N{pyN} \right)^2 + \left( \sum_N{pzN} \right)^2 \right]}$ with $N \in \{1, 2, 3, 4\}$

or, even more exhaustively,

$W = \sqrt{\left( E1 + E2 + E3 + E4 \right)^2 - \left[ \left( px1 + px2 + px3 + px4 \right)^2 + \left( py1 + py2 + py3 + py4 \right)^2 + \left( pz1 + pz2 + pz3 + pz4 \right)^2 \right]}$
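
As a quick worked example of the formula for $W$, here is a sketch with four made-up, massless daughter particles whose momenta cancel. The function name `system_invariant_mass` and all the numbers are hypothetical; they are chosen so the arithmetic is easy to follow, not taken from the CMS files.

```python
from math import sqrt


def system_invariant_mass(four_vectors):
    """W for a list of (E, px, py, pz) tuples, in natural units."""
    total_e = sum(v[0] for v in four_vectors)
    total_px = sum(v[1] for v in four_vectors)
    total_py = sum(v[2] for v in four_vectors)
    total_pz = sum(v[3] for v in four_vectors)

    # Sum the components first, then square (not the other way around)
    w_squared = total_e ** 2 - (total_px ** 2 + total_py ** 2 + total_pz ** 2)

    # Guard against a tiny negative value from rounding
    return sqrt(max(w_squared, 0.0))


# Made-up daughters: each has E = |p| (massless), flying off in
# opposite pairs so that the total momentum is zero
daughters = [
    (31.5,  31.5,   0.0, 0.0),
    (31.5, -31.5,   0.0, 0.0),
    (31.5,   0.0,  31.5, 0.0),
    (31.5,   0.0, -31.5, 0.0),
]

w = system_invariant_mass(daughters)

# Each daughter on its own has (here) zero invariant mass
individual_masses = [system_invariant_mass([d]) for d in daughters]

print("W of the system:           ", w)
print("Sum of individual masses:  ", sum(individual_masses))
```

The system comes out at 126 (GeV, if the inputs are in GeV) even though each daughter's own invariant mass is zero, which is why the daughters have to be combined before taking the square root.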


### One daughter particle

For our one-particle invariant mass, let's look back to the structure of the data we're inspecting.

```
[['Run', 'Event'],
 ['PID1', 'E1', 'px1', 'py1', 'pz1', 'pt1', 'eta1', 'phi1', 'Q1'],
 ['PID2', 'E2', 'px2', 'py2', 'pz2', 'pt2', 'eta2', 'phi2', 'Q2'],
 ['PID3', 'E3', 'px3', 'py3', 'pz3', 'pt3', 'eta3', 'phi3', 'Q3'],
 ['PID4', 'E4', 'px4', 'py4', 'pz4', 'pt4', 'eta4', 'phi4', 'Q4'],
 ['mZ1', 'mZ2'],
 ['M']]
```

It is now pretty easy to see how we can do $E^2$. For the $\lvert \textbf{p} \rvert^2$, we use a standard vector length formula (Pythagorean theorem). I'm not sure what to do with `ptN`, which I'm pretty sure is transverse momentum. That's basically the same as the momentum in $x$ and $y$ combined, $\sqrt{(pxN)^2 + (pyN)^2}$, so it is already covered by `pxN` and `pyN` and we can leave it out. We can use our Pythagorean Theorem for the norm of the momentum vector using $x,y,z$. So, for each particle for which we have the components of momentum (we'll designate each particle with $N \in \{1, 2, 3, 4\}$),

$\lvert \textbf{p} \rvert = \sqrt{(pxN)^2 + (pyN)^2 + (pzN)^2}$

We'll do this with our dataframe, and to be a bit simpler (not as busy on the page), we'll do it one particle at a time. For the first try, we'll do it one event at a time as well.
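
Assuming `ptN` really is the transverse momentum, i.e. $\sqrt{(pxN)^2 + (pyN)^2}$, it carries nothing beyond what `pxN` and `pyN` already give us. The sketch below checks that with made-up numbers (a 3-4-5 and 5-12-13 triangle); none of these values come from the data.

```python
from math import hypot, isclose, sqrt

# Made-up momentum components, chosen for round results
px, py, pz = 3.0, 4.0, 12.0

# Transverse momentum: length of the (px, py) part only
pt = hypot(px, py)

# Norm of the full momentum vector, two equivalent ways
p_norm = sqrt(px ** 2 + py ** 2 + pz ** 2)
p_norm_from_pt = hypot(pt, pz)

print("pt:              ", pt)
print("|p| from x, y, z:", p_norm)
print("|p| from pt, z:  ", p_norm_from_pt)
print("Same answer?     ", isclose(p_norm, p_norm_from_pt))
```

So `ptN` can serve as a cross-check on `pxN` and `pyN`, but it should not be added to or subtracted from the sum of squares.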

For one particle, in any frame of reference (ignore that last part if it confuses you), the invariant mass is given by

$m_0 = \sqrt{E^2 - \lvert \textbf{p} \rvert^2}$

We'll check that each of the electrons has the correct (invariant) mass.

#### Invariant mass for each particle

We'll use our equation, rewritten as

$m_0 = \sqrt{(EN)^2 - [(pxN)^2 + (pyN)^2 + (pzN)^2]}$

In [None]:
## added 2023-03-17
from math import sqrt

In [None]:
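def invariant_mass_from_values(energy, px, py, pz, do_avoid_imaginary = True):
    '''
    :brief: Calculates the invariant mass of a particle
    
    :param energy:  The energy of the particle
    :param px:      The x-component of the particle's momentum
    :param py:      The y-component of the particle's momentum
    :param pz:      The z-component of the particle's momentum
    :param do_avoid_imaginary:  If True, a negative E^2 - |p|^2
                                is set to zero before the square root
    
    :returns invariant_mass: The invariant mass (rest mass)
    '''
    
    e_squared_part = (energy ** 2)
    norm_p_squared_part = (px ** 2) + (py ** 2) + (pz ** 2)
    
    #  remember that the time-component of the vector-length
    #+ formula has a different sign
    e2_minus_p2 = float(e_squared_part - norm_p_squared_part)
    
    #  Rounding can leave E^2 - |p|^2 slightly negative for a light,
    #+ fast particle. Setting it to zero is an assumption about what
    #+ do_avoid_imaginary was meant to do.
    if do_avoid_imaginary and (e2_minus_p2 < 0):
        e2_minus_p2 = 0.0
    ##endof:  if do_avoid_imaginary and ...
    
    invariant_mass = sqrt(e2_minus_p2)
    invariant_mass = float(invariant_mass)
    
    return invariant_mass
##endof:  def invariant_mass_from_values(<params>)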

#############################################################
#### SAVE THIS FOR LATER. LET'S GET THE MAIN STUFF DONE! ####
##
##def invariant_mass_from_dataframe(dataframe, 
##                                  event=None, 
##                                  particle_number=None):
##    '''
##    :brief: Get invariant mass of all or certain particles from CMS data
##    
##    Note that we must get the information in the form it is found at
##    guide="https://opendata-education.github.io/en_Workshops/exercises/" + \
##          "Hunting-the-Higgs-4leptons.html"
##    archived_guide="https://web.archive.org/web/20230305034951/" + \
##    "https://opendata-education.github.io/en_Workshops/exercises/discussion.html"
##    i.e. the headers of the data must look like the headers in the CSVs
##    downloaded there. The CSVs come from
##    '''
##    #blah
####endof:  def invariant_mass_from_dataframe

Let's set the HTML stuff correctly for displaying `pandas` stuff.

In [None]:
%%HTML
<style>
table {
    margin-right:auto; 
    margin-left: auto;
    display: auto;
}
th,tr,td {
    font-size: 100%;
}
</style>

In [None]:
do_display_first = True # You can change this to False, if you want

if do_display_first:
    print(my_4e2011_df.iloc[0])
##endof:  if do_display_first

#  archived_ref="https://web.archive.org/web/" + \
#+ "20230318225210/https://stackoverflow.com/questions/" + \
#+ "12021754/how-to-slice-a-pandas-dataframe-by-position"
first_event_df = my_4e2011_df[:1]

event_1_particle_1_df = \
  first_event_df.assign(
            Invariant_Mass_1 = lambda x: \
               invariant_mass_from_values(x.E1,
                                          x.px1,
                                          x.py1,
                                          x.pz1)) 

event_1_particle_1_df.head()
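
Once the one-event check looks right, the same formula can be applied to a whole column at a time. The following is a sketch under stated assumptions: `toy_df` holds made-up numbers (not CMS data) with the same `E1`, `px1`, `py1`, `pz1` header pattern, and `invariant_mass_column` is a hypothetical helper that uses `numpy` instead of `math.sqrt` so it works on every row at once.

```python
import numpy as np
import pandas as pd

# Made-up events, only to have something to run; not CMS data
toy_df = pd.DataFrame({
    'E1':  [5.0, 13.0],
    'px1': [3.0,  3.0],
    'py1': [0.0,  4.0],
    'pz1': [0.0, 12.0],
})


def invariant_mass_column(dataframe, particle_number):
    """Invariant mass of particle N for every row of the dataframe."""
    n = str(particle_number)
    e_squared = dataframe['E' + n] ** 2
    p_squared = (dataframe['px' + n] ** 2
                 + dataframe['py' + n] ** 2
                 + dataframe['pz' + n] ** 2)

    # clip() turns tiny negative rounding errors into zero,
    # so the square root never produces NaN for them
    return np.sqrt((e_squared - p_squared).clip(lower=0.0))


toy_df = toy_df.assign(Invariant_Mass_1=invariant_mass_column(toy_df, 1))
print(toy_df)
```

The first toy row has $E = 5$ and $\lvert \textbf{p} \rvert = 3$, giving a mass of 4; the second has $E = \lvert \textbf{p} \rvert = 13$, giving a mass of 0.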

<hr/>

## Putting All Data In
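
This part is still to be filled in. As a placeholder, here is one possible shape for it, following the `pd.concat` idea from the original guide quoted in the notes: read each downloaded CSV into its own dataframe, tag it with its file name, and stack them. The CSV text and the names `fake_csv_files`, `source_file`, and `all_events_df` are made up so the sketch runs without any downloads.

```python
import io

import pandas as pd

# Stand-ins for the downloaded files; the rows are obviously fake
fake_csv_files = {
    '4e_2011.csv':  "Run,Event,M\n1,10,1.0\n1,11,2.0\n",
    '4mu_2012.csv': "Run,Event,M\n2,20,3.0\n",
}

frames = []
for file_name, csv_text in fake_csv_files.items():
    # With real files this would be pd.read_csv(file_name)
    frame = pd.read_csv(io.StringIO(csv_text))

    # Remember where each row came from (channel and year)
    frame['source_file'] = file_name
    frames.append(frame)

# One long table with a fresh 0..n-1 index
all_events_df = pd.concat(frames, ignore_index=True)
print(all_events_df)
```

Keeping a `source_file` column is a design choice, not a requirement; it makes it possible to split the combined table back out by channel or year later.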

<br/><br/>
<hr/><hr/>

## Notes/Appendixes

<hr/><hr/>

## HTML Jupyter Refs

### We could call it Appendix A-1

```
html_jupyter_ref="https://stackoverflow.com/a/39551936/6505499"
archived_html_jupyter_ref_earlier="https://web.archive.org/web/" + \
"20220525114434/" + \
"http://stackoverflow.com:80/questions/21892570/" + \
"ipython-notebook-align-table-to-the-left-of-cell"
archived_html_jupyter_ref_now="https://web.archive.org/web/" + \
"20230317043442/" + \
"https://stackoverflow.com/questions/21892570/" + \
"ipython-notebook-align-table-to-the-left-of-cell/"
```

That one is the best for what I needed. Here are others.

```
other_ref_1="https://stackoverflow.com/questions/36319252/" + \
"how-to-revert-to-normal-style-no-style-in-ipython-notebook-" + \
"after-calling-html/36336153#36336153"
archived_other_ref_1="https://web.archive.org/web/20230317044154/" + \
"https://stackoverflow.com/questions/18024769/" + \
"adding-custom-styled-paragraphs-in-markdown-cells"


other_ref_2="https://stackoverflow.com/questions/18024769/" + \
"adding-custom-styled-paragraphs-in-markdown-cells"
archived_other_ref_2="https://web.archive.org/web/20230317161215/" + \
"https://stackoverflow.com/questions/18024769/" + \
"adding-custom-styled-paragraphs-in-markdown-cells"

other_ref_3="https://stackoverflow.com/questions/52290219/" + \
"how-to-increase-the-font-size-of-the-markdown-" + \
"table-in-jupyter-notebook"
archived_other_ref_3="https://web.archive.org/web/20230317161621/" + \
"https://stackoverflow.com/questions/52290219/" + \
"how-to-increase-the-font-size-of-the-markdown-" + \
"table-in-jupyter-notebook"
```

#### Other info about magic functions. Just archived links for now.

(I can get the original link from them, anyway, as long as I'm not linking to a specific answer/anchor.)

```
archived_magic_ref_1="https://web.archive.org/web/20230316174653/" + \
"https://stackoverflow.com/questions/32565829/" + \
"simple-way-to-measure-cell-execution-time-in-ipython-notebook"

archived_magic_ref_2="https://web.archive.org/web/20230314233246/" + \
"https://ipython.org/ipython-doc/dev/" + \
"interactive/tutorial.html#magic-functions"

archived_magic_ref_3="https://web.archive.org/web/20221128083253/" + \
"https://nbviewer.org/github/ipython/ipython/" + \
"blob/1.x/examples/notebooks/Cell%20Magics.ipynb"
```


## Pandas File-Loading Problems

### We could call it Appendix A

```
anchor_ref="https://stackoverflow.com/questions/16630969/" + \
"ipython-notebook-anchor-link-to-refer-a-cell-directly-from-outside"
archived_anchor_ref="https://web.archive.org/web/20230316174643/" + \
"https://stackoverflow.com/questions/16630969/" + \
"ipython-notebook-anchor-link-to-refer-a-cell-directly-from-outside"
```

From here ... (The following link is broken - it was a noble attempt, though.)

[This link/anchor is not used for the other, get-to-appendix task](#not-this-anchor), because the header name can be used, but showing a concept with &lt;a&gt;. The concept with &lt;a&gt; doesn't seem to work.

### Start of important stuff in Appendix A - an attempt to read in the files

Now, let's read in the CSV files for each of those three possibilities. We'll take a quick peek at the data, but not too much (there's a bunch). We'll look more carefully at a few of the columns and figure out the physics.

( _DWB note_, 20230315T175999-0600 ) Using `pd.read_csv` continues to get me `HTTPError: HTTP Error 504: Gateway Time-out` errors. I'm going to switch to using the `requests` module. I will keep my reference on some options from [this SO reference](https://stackoverflow.com/questions/16694907/download-large-file-in-python-with-requests) ([archived](https://web.archive.org/web/20230316000241/https://stackoverflow.com/questions/16694907/download-large-file-in-python-with-requests)).

_I'm also including a possibly-relevant-but-I'm-not-taking-the-time-to-learn-something-that-new source_

_&lt;not-now-source&gt;_
<br/>
```
ref="https://stackoverflow.com/questions/15786421/" + \
"http-error-504-gateway-time-out-when-trying-to-read-a-reddit-comments-post"
archived_ref="https://web.archive.org/web/20230316001420/" + \
"https://stackoverflow.com/questions/15786421/" + \
"http-error-504-gateway-time-out-when-trying-to-read-a-reddit-comments-post"
```

_&lt;/not-now-source&gt;_

<br/>

After a git commit, I'm putting the non-working code in this markdown cell and trying to use the new code.

<br/>

_&lt;not-working-now-but-I-doubt-it's-broken-code&gt;_

<br/>

#### First attempt

```
#4 electrons as daughter particles (two years' data)
the_4e_lepton_participants_2011 = \
  pd.read_csv(
    'http://opendata.cern.ch/record/5200/files/4e_2011.csv')
the_4e_lepton_participants_2012 = \
  pd.read_csv(
    'http://opendata.cern.ch/record/5200/files/4e_2012.csv')

# 4 muons as daughter particles (two years' data)
the_4mu_lepton_participants_2011 = \
  pd.read_csv(
    'http://opendata.cern.ch/record/5200/files/4mu_2011.csv')
the_4mu_lepton_participants_2012 = \
  pd.read_csv(
    'http://opendata.cern.ch/record/5200/files/4mu_2012.csv')

# 2 electrons and 2 muons as daughter particles (two years' data)
the_2e2mu_lepton_participants_2011 = \
  pd.read_csv(
    'http://opendata.cern.ch/record/5200/files/2e2mu_2011.csv')
the_2e2mu_lepton_participants_2012 = \
  pd.read_csv(
    'http://opendata.cern.ch/record/5200/files/2e2mu_2012.csv')


##########################################
##  A peek at the data for 4e from 2011 ##
##+ Side scrolling will be necessary    ##
##########################################
the_4e_lepton_participants_2011.head()
```

##### Not getting anything usable due to the HTTPError

<br/><br/>

#### Trying the original reading-in from the guide

```
##  DWB, 20230315T180500-0600
##+ Trying the strategy used in the original guide.
##+ orig_ref="https://opendata-education.github.io/en_Workshops/" + \
##+ "exercises/Hunting-the-Higgs-4leptons.html"
##+ orig_ref_archived="https://web.archive.org/web/20230316000855/" + \
##+ "https://opendata-education.github.io/en_Workshops/" + \
##+ "exercises/Hunting-the-Higgs-4leptons.html"
##+
##+ It didn't work, either (same `HTTPError: HTTP Error 504: Gateway Time-out`)

# Data for later use. 
csvs = [pd.read_csv('http://opendata.cern.ch/record/5200/files/4mu_2011.csv'), pd.read_csv('http://opendata.cern.ch/record/5200/files/4e_2011.csv'), pd.read_csv('http://opendata.cern.ch/record/5200/files/2e2mu_2011.csv')]
csvs += [pd.read_csv('http://opendata.cern.ch/record/5200/files/4mu_2012.csv'), pd.read_csv('http://opendata.cern.ch/record/5200/files/4e_2012.csv'), pd.read_csv('http://opendata.cern.ch/record/5200/files/2e2mu_2012.csv')]

fourlep = pd.concat(csvs)
```

##### Not getting anything usable due to the HTTPError

<br/>

_&lt;/not-working-now-but-I-doubt-it's-broken-code&gt;_

Will try something else.

<br/>
<br/>

<hr/>

... to here.

<a id="#not-this-anchor"></a>
The line above this only contains '`<a id="#not-this-anchor"></a>`'

And we should be looking at what is beneath that line.

**But, apparently, it doesn't work. I'll need to use the names of the headers.**

<hr/>


## Line Counting Discussion

### We could call it Appendix B

I think the best discussion of the most optimized line-counting algorithm is at

```
lc_ref_1="https://stackoverflow.com/questions/845058/" + \
"how-to-get-line-count-of-a-large-file-cheaply-in-python/27518377#27518377"
archived_lc_ref_1="https://web.archive.org/web/20230316221423/" + \
"https://stackoverflow.com/questions/845058/" + \
"how-to-get-line-count-of-a-large-file-cheaply-in-python/27518377#27518377"
```

Look for the answer from @Michael_Bacon, then you can read the answer by @Ryan_Ginstrom to give some background. I would go straight to the archived version if you're searching for those answers - that's why I archived it.

The answer by @Michael_Bacon adds generator/buffer solutions, not discussed by @Ryan_Ginstrom, one of which is also discussed in another good source,

```
lc_ref_2="https://pynative.com/python-count-number-of-lines-in-file/"
archived_lc_ref_2="https://web.archive.org/web/20230316215432/" + \
"https://pynative.com/python-count-number-of-lines-in-file/"
```

There's also a decent discussion from geeksforgeeks, which includes some Big Oh discussion, but it's not as comprehensive as the other two and doesn't discuss the generator stuff.

```
lc_ref_3="https://www.geeksforgeeks.org/count-number-of-lines-" + \
"in-a-text-file-in-python/"
archived_lc_ref_3="https://web.archive.org/web/20230306171715/" + \
"https://www.geeksforgeeks.org/count-number-of-lines-" + \
"in-a-text-file-in-python/"
```

<hr/>

## More Dataframe Info

### We could call it Appendix C

In [None]:
my_4e2011_df.info()

The output was

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 41 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Run     7 non-null      int64  
 1   Event   7 non-null      int64  
 2   PID1    7 non-null      int64  
 3   E1      7 non-null      float64
 4   px1     7 non-null      float64
 5   py1     7 non-null      float64
 6   pz1     7 non-null      float64
 7   pt1     7 non-null      float64
 8   eta1    7 non-null      float64
 9   phi1    7 non-null      float64
 10  Q1      7 non-null      int64  
 11  PID2    7 non-null      int64  
 12  E2      7 non-null      float64
 13  px2     7 non-null      float64
 14  py2     7 non-null      float64
 15  pz2     7 non-null      float64
 16  pt2     7 non-null      float64
 17  eta2    7 non-null      float64
 18  phi2    7 non-null      float64
 19  Q2      7 non-null      int64  
 20  PID3    7 non-null      int64  
 21  E3      7 non-null      float64
 22  px3     7 non-null      float64
 23  py3     7 non-null      float64
 24  pz3     7 non-null      float64
 25  pt3     7 non-null      float64
 26  eta3    7 non-null      float64
 27  phi3    7 non-null      float64
 28  Q3      7 non-null      int64  
 29  PID4    7 non-null      int64  
 30  E4      7 non-null      float64
 31  px4     7 non-null      float64
 32  py4     7 non-null      float64
 33  pz4     7 non-null      float64
 34  pt4     7 non-null      float64
 35  eta4    7 non-null      float64
 36  phi4    7 non-null      float64
 37  Q4      7 non-null      int64  
 38  mZ1     7 non-null      float64
 39  mZ2     7 non-null      float64
 40  M       7 non-null      float64
dtypes: float64(31), int64(10)
memory usage: 2.4 KB
```

## Other Potentially-Useless Sources

### We could call it Appendix Z - I don't know how far I'll get with other appendices

```
archived_opus_ref_01="https://web.archive.org/web/20230318231017/" + \
"https://pandas.pydata.org/docs/getting_started/" + \
"intro_tutorials/03_subset_data.html"
```