# Wire Delay

In September of 2011, scientists at the OPERA experiment in Italy announced publicly a result that, if true, would upend nearly all of the past century of physics research.  Their detector had measured ghostly particles called neutrinos travelling faster than the speed of light, an observation that directly contradicts the special theory of relativity.

Most scientists, including those on the OPERA team itself, were certain that there must have been some mistake.  Indeed there was, but it was difficult to track down.  After five months spent going over their experiment's setup, GPS measurements, and data analysis techniques with a fine-toothed comb, the team finally identified two small equipment problems: a clock (one of many) was ticking slightly faster than it was supposed to, and the connector on a fiber-optic cable (also one of many) was loose, delaying the signal it carried.  Once the team fixed these errors and corrected their math, the anomaly disappeared: neutrinos travel slower than light after all, just as relativity requires.
    
This infamous episode in particle physics is a good reminder of an important rule: **scientific researchers must understand how their measuring devices affect the data they gather and be able to communicate this clearly when reporting their observations to others.**

### Wire delay in cosmic ray muon detectors

The cosmic ray muon detectors (CRMD) made by QuarkNet are susceptible to the same type of equipment errors that bedeviled the OPERA neutrino experiment.  Signals are delivered through cables and connectors from the detector panels to the data acquisition board (DAQ), and from the GPS unit to the DAQ.  Both the GPS unit and the DAQ itself contain clocks that, used together, allow the CRMD to measure the arrival of cosmic ray muons with timing as precise as 10 nanoseconds.  If these elements malfunction or if you don't know how to account for their effects, you may end up with nonsense data -- or, even worse, data that *looks* correct but is subtly flawed so that you end up with the wrong answer without knowing it.

The e-Lab cosmic ray studies use the `WireDelay.pl` data transformation script to ensure that the data used during an analysis correctly accounts for the time it takes for signals to travel between the counter panels and the DAQ and between the GPS unit and the DAQ.  In this notebook, we'll see how that's accomplished and how large an effect it has on the data you use in cosmic ray studies.

### Abbreviations used in this notebook:

* **CRMD**: **C**osmic **R**ay **M**uon **D**etector
* **DAQ**: **D**ata **A**c**Q**uisition board
* **GPS**: **G**lobal **P**ositioning **S**ystem

## Cosmic Ray Muon Detectors: Clocks and Wires

Fig 1 is a diagram of a CRMD in operation.  A cosmic ray muon has just passed through the four stacked panels of the detector, sending signals from each panel to the DAQ through the wires that connect them.

![The CRMD setup](img/DetectorWithGPS.png) *Fig 1: A sketch of the CRMD showing signals traveling from the detector panels to the DAQ as a muon flies through, and the signal traveling from the GPS unit to the DAQ for timekeeping*

If we want to measure the exact time that the muon strikes each panel, we must understand what happens to the electrical signals as they travel through the wires that connect the different elements of the CRMD.

### The speed of signal

You might already be familiar with a very important number:

$$c = 3 \times 10^8 \rm{m/s}\,\rm{.}$$

This is, of course, the speed of light.  Not just the speed of light, though, but the speed of light *in a vacuum*, which is a very important distinction in this case.  When light travels through a medium like water or glass, its speed decreases.  The same is true in the CRMD wires: the signals travel through them at the speed of light, but it's the speed of light in the material the cable is made out of. 

As a rule of thumb, it turns out that this speed is about 2/3 of the speed of light in a vacuum, or

$$v_{\rm wire} = (2/3) c\,\rm{.}$$


We'll do a quick, back-of-the-envelope calculation to see how this might affect our CRMD data.  Let's say we're connecting the GPS unit to the DAQ with a cable that's 10m long.  Using the formula

$${\rm distance} = {\rm rate}\times{\rm time}\,{\rm ,}$$

we'll substitute
$${\rm distance} = 10{\rm m}\,{\rm ,}$$
$${\rm rate} = (2/3)c\,{\rm ,}$$

and solve for ${\rm time}$.  No need to grab a calculator; we have Python available right here:

In [32]:
# Define distance and rate as variables:
distance = 10
rate = (2/3)*3*10**8

# Solve the formula for time:
time = distance/rate

# Print the result in seconds:
print(time)

5e-08


**Exercise 0.1**  By following the units through the calculation, and by using the definition of a nanosecond, show that the above result of `5e-08` is correctly interpreted as 50ns.

That's 50ns for a signal to travel from the GPS unit to the DAQ over a 10m cable.  If we want to have data that's precise to 10ns, then we clearly have to worry about cable lengths!  The `WireDelay` data transformation does just that: it corrects the muon event times recorded in the input data to account for the extra time introduced by the wires.  Before we look at the data transformation itself, we'll first work out what we expect to happen.
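
To get a feel for how the delay scales with cable length, here is a minimal sketch that repeats the calculation for a few lengths and reports the result in nanoseconds.  The function name `wire_delay_ns` and the sample lengths are illustrative assumptions, not part of the e-Lab code, and the 2/3 factor is only the rule of thumb quoted earlier.

```python
C_VACUUM = 3e8           # speed of light in a vacuum, m/s (rounded, as in the text)
VELOCITY_FACTOR = 2 / 3  # rule-of-thumb fraction of c for signals in a cable


def wire_delay_ns(length_m, velocity_factor=VELOCITY_FACTOR):
    """Return the signal travel time in nanoseconds for a cable length in meters."""
    seconds = length_m / (velocity_factor * C_VACUUM)
    return seconds * 1e9  # 1 s = 1e9 ns


# Sample cable lengths chosen only for illustration
for length_m in (1, 5, 10, 30):
    print(f"{length_m:>3} m cable: about {wire_delay_ns(length_m):.0f} ns")
```

Even a 5m cable gives a delay larger than the 10ns timing precision of the CRMD, so no realistic cable is short enough to ignore.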

## Wire Delays in CRMDs: What We Expect

We've seen how to calculate the time delay given by a wire using the $d = v_{\rm wire}\, t$ formula.  Now, we'll go through a couple of thought experiments to help us understand whether we should *add* or *subtract* the calculated delay from the times recorded in the input cosmic ray data.

### 1) Wire Delays in the channel cables

Let's say a cosmic ray muon strikes the panel on channel 1 of the CRMD at exactly 12:00:00 noon.  Imagine that the cable connecting that panel to the DAQ is so long that it takes a full second for the signal to travel to the DAQ, so that the DAQ receives the signal at 12:00:01 pm.

The time we want to record is the time of the muon hit, 12:00:00.  Thus, we conclude that we must *subtract* the wire delay from the DAQ's clock reading of 12:00:01 in order to get the correct time.

### 2) Wire Delays in the GPS cable

Again, let's imagine that the GPS unit is connected to the DAQ with a cable so long that it takes a full second for time data to transmit over it.  When the DAQ wants to know what time it is, it asks the GPS unit, and the GPS responds with a message "It's exactly 5:00:00 pm".  This message travels down the wire, and the DAQ receives it at 5:00:01 pm.  By this time, though, the message is out-of-date!  To find what the correct time is at the moment the DAQ receives the GPS's message, we conclude that we must *add* the wire delay to the time reported by the GPS unit in order to know what time it really is.
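
The two thought experiments can be sketched with Python's `datetime` module.  This is only an illustration of the sign conventions argued above, using the same exaggerated one-second delays; the calendar date and the variable names are arbitrary assumptions, and nothing here is taken from `WireDelay.pl` itself.

```python
from datetime import datetime, timedelta

# Exaggerated one-second delays, as in the thought experiments
channel_delay = timedelta(seconds=1)
gps_delay = timedelta(seconds=1)

# 1) Channel cable: the DAQ sees the signal late, so subtract the delay
daq_reading = datetime(2016, 1, 4, 12, 0, 1)   # arbitrary date, 12:00:01 pm
muon_hit_time = daq_reading - channel_delay

# 2) GPS cable: the time message is stale on arrival, so add the delay
gps_message = datetime(2016, 1, 4, 17, 0, 0)   # arbitrary date, 5:00:00 pm
time_at_daq = gps_message + gps_delay

print(muon_hit_time.time())  # 12:00:00
print(time_at_daq.time())    # 17:00:01
```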

## Using WireDelay.pl

We can execute the Perl script `WireDelay.pl` by calling the Perl interpreter from the command line along with the additional parameters that the script requires:

`$ perl ./perl/WireDelay.pl <threshIn> <outputs> <geoDir> <daqId> <fw>`

where the items in angle brackets `<>` are parameters we have to specify.  These are:

* `threshIn`:  The name of the Threshold data file we'll use as input
* `outputs`: What we want to name the output file that the script will write its results to
* `geoDir`: The directory that contains geometry files for the detectors whose data we're processing
* `daqId`: The 4-digit ID number of the CRMD DAQ that recorded this data, and
* `fw`: The version number of the firmware running on the DAQ

For example, we might want 

* `threshIn` = `files/6119.2016.0104.1.thresh`
* `outputs` = `6119.2016.0104.1.wd`
* `geoDir` = `./geo`
* `daqId` = `6119`
* `fw` = `1.12`

in which case the command is

`$ perl ./perl/WireDelay.pl files/6119.2016.0104.1.thresh 6119.2016.0104.1.wd ./geo 6119 1.12`
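
If you'd rather assemble the command in Python than type it by hand, one possible approach is sketched below.  It only builds and prints the command string from the five parameters listed above; the dictionary and variable names are assumptions made for illustration, and `shlex.join` requires Python 3.8 or later.

```python
import shlex

# The example parameter values from the list above
params = {
    "threshIn": "files/6119.2016.0104.1.thresh",
    "outputs": "6119.2016.0104.1.wd",
    "geoDir": "./geo",
    "daqId": "6119",
    "fw": "1.12",
}

# Order matters: the script takes its arguments positionally, in this sequence
arg_order = ["threshIn", "outputs", "geoDir", "daqId", "fw"]
command = ["perl", "./perl/WireDelay.pl"] + [params[name] for name in arg_order]

print(shlex.join(command))

# To actually run it (requires Perl and the script on disk):
# import subprocess
# subprocess.run(command, check=True)
```

Keeping the arguments in a list rather than one long string makes it harder to swap two of them by accident.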

## Understanding WireDelay.pl

The input Threshold data file is `6119.2016.0104.1.thresh` in the example here.  We can use the UNIX shell command `head` to get a quick look at what it looks like before the `WireDelay` data transformation acts on it:

In [1]:
!head -10 files/6119.2016.0104.1.thresh

#$md5
#md5_hex(0)
#ID.CHANNEL, Julian Day, RISING EDGE(sec), FALLING EDGE(sec), TIME OVER THRESHOLD (nanosec), RISING EDGE(INT), FALLING EDGE(INT)
6119.1	2457392	0.3721863017828993	0.3721863017831598	22.50	3215689647404250	3215689647406500
6119.3	2457392	0.3721863017829138	0.3721863017831598	21.25	3215689647404375	3215689647406500
6119.2	2457392	0.3721885846820747	0.3721885846822772	17.50	3215709371653125	3215709371654875
6119.4	2457392	0.3721885846820747	0.3721885846822917	18.75	3215709371653125	3215709371655000
6119.4	2457392	0.3721901866161603	0.3721901866163773	18.75	3215723212363625	3215723212365500
6119.1	2457392	0.3721901866161748	0.3721901866164496	23.75	3215723212363750	3215723212366125
6119.1	2457392	0.3721903650327546	0.3721903650329427	16.25	3215724753883000	3215724753884625
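
To work with these records in Python, each data line can be split into its seven columns.  The sketch below parses the first data line shown above; the dictionary keys are hypothetical names chosen to mirror the header line, and splitting on whitespace is an assumption about how the columns are separated.

```python
from decimal import Decimal

# First data line from the `head` output above
line = "6119.1 2457392 0.3721863017828993 0.3721863017831598 22.50 3215689647404250 3215689647406500"

fields = line.split()                  # split on any run of whitespace
daq_id, channel = fields[0].split(".")  # "6119.1" -> DAQ 6119, channel 1

record = {
    "daq_id": int(daq_id),
    "channel": int(channel),
    "julian_day": int(fields[1]),
    # Decimal keeps every printed digit of the edge times
    "rising_edge": Decimal(fields[2]),
    "falling_edge": Decimal(fields[3]),
    "time_over_threshold_ns": Decimal(fields[4]),
    "rising_edge_int": int(fields[5]),
    "falling_edge_int": int(fields[6]),
}

for key, value in record.items():
    print(f"{key}: {value}")
```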


### Applying the data transformation

Now we'll run `WireDelay` and see what changes it makes to the data.  The script `WireDelay.pl` is a Perl script designed to be executed from a UNIX shell, not from a Jupyter Notebook.  We can use the "!" trick that Jupyter gives us again, though, to call the program:

In [4]:
!perl ./perl/WireDelay.pl files/6119.2016.0104.1.thresh outputs/6119.2016.0104.1.wd ./geo 6119 1.12

The script has created the output file `outputs/6119.2016.0104.1.wd`, which we can examine the same way we did the input file earlier:

In [5]:
!head -10 outputs/6119.2016.0104.1.wd

#USING WIREDELAYS: ID.CHANNEL, Julian Day, RISING EDGE(sec), FALLING EDGE(sec), TIME OVER THRESHOLD (nanosec)
6119.1	2457392	0.3721978758576562	0.3721978758579167	22.50
6119.3	2457392	0.3721978758576707	0.3721978758579167	21.25
6119.2	2457392	0.3722001587568317	0.3722001587570342	17.50
6119.4	2457392	0.3722001587568317	0.3722001587570486	18.75
6119.4	2457392	0.3722017606909173	0.3722017606911343	18.75
6119.1	2457392	0.3722017606909317	0.3722017606912065	23.75
6119.1	2457392	0.3722019391075115	0.3722019391076997	16.25
6119.3	2457392	0.3722019391075405	0.3722019391077576	18.75
6119.2	2457392	0.3722025586529688	0.3722025586532147	21.25


So, what has the `WireDelay.pl` script done to our data?  Look for differences between this file, `6119.2016.0104.1.wd`, and the input file `6119.2016.0104.1.thresh` shown above.

#### A closer look
First, we can see that the data appears to be in the same order.  The counters are recorded in the same order `6119.1, 6119.3, 6119.2, ...` and with the same "time over threshold" values `22.50, 21.25, 17.50, ...` in both files.  If you're not convinced, increase the number of lines returned by the `head` command until the pattern is clear.

Next, we note that `WireDelay` keeps the Julian day value from the Threshold file, but it drops the raw-integer rising and falling edge values.  Evidently we won't be needing these values for further analysis.

Last, we notice the most interesting thing: the rising and falling edge time values have changed!

Investigating further, we can see that for each line of data, the output file `6119.2016.0104.1.wd` has values that are `0.0000115740747569` greater than the corresponding values in the input file `6119.2016.0104.1.thresh` (give or take one unit in the last printed digit, because of rounding).  This time represents the *wire delay*: After a counter registers a hit, it takes a small amount of time for that signal to travel through the connector cable to the DAQ board, where it's recorded as data. The `WireDelay.pl` script adjusts the data for this, helping us make time comparisons between detectors as precisely as possible.
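
You can check the size of the shift yourself by subtracting matching values from the two listings.  The sketch below does this for the first five rising-edge times; the values are copied from the `head` outputs shown earlier, and `Decimal` is used because ordinary floats would blur the last few of the 16 printed digits.

```python
from decimal import Decimal

# (input, output) rising-edge values copied from the two `head` listings
pairs = [
    ("0.3721863017828993", "0.3721978758576562"),
    ("0.3721863017829138", "0.3721978758576707"),
    ("0.3721885846820747", "0.3722001587568317"),
    ("0.3721885846820747", "0.3722001587568317"),
    ("0.3721901866161603", "0.3722017606909173"),
]

# Exact decimal subtraction of output minus input for each row
offsets = [Decimal(after) - Decimal(before) for before, after in pairs]

for offset in offsets:
    print(offset)
```

The offsets agree to 15 decimal places and differ only in the final printed digit, which is what you'd expect if one constant shift were applied and each result were then rounded for printing.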

## Exercise 1 - What do we expect?

We've seen that the `WireDelay` script adds about `0.0000115740747569` days to all time values in the input data.  Does this make sense?  In order to figure out if we're on the right track, we first have to work out roughly what we should expect.

The length of cable used for a CRMD is stored in its `.geo` geometry file.  Using my inside knowledge of the format of this file, I've written a function to show us the number:

In [30]:
def findCableLength(daqID):
    '''A function to extract the length of cables used in a CRMD from its geometry file'''
    
    # Use the input DAQ ID to construct the file path to the geometry file,
    # i.e. './geo/6119/6119.geo'
    geoFile = './geo/'+str(daqID)+'/'+str(daqID)+'.geo'
    
    # Here's how to open a file for Python to read.
    # The 'r' option is for "read-only", so we can't accidentally change the source data
    with open(geoFile, 'r') as file:
        # I happen to know that the cable length is stored at index 9 of the file
        # (the tenth line, since enumerate() counts from zero).
        # Scan through the file line-by-line, looking for that index:
        for i, line in enumerate(file):
            if i == 9:
                print(line)
            # past index 9, we don't care, so exit the line-by-line scan:
            elif i > 9:
                break        

Running this function for DAQ 6119 will give us the answer:

In [31]:
findCableLength(6119)

50000.0
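
Because `findCableLength()` only prints the line, the value can't be used in later calculations.  A possible variant that returns the text instead is sketched below; the name `read_geo_line` and its parameters are hypothetical, and it assumes nothing about the geometry file beyond what the function above already does.

```python
def read_geo_line(geo_path, index=9):
    """Return one line of a geometry file (zero-based index), stripped of
    surrounding whitespace, or None if the file has too few lines."""
    with open(geo_path, 'r') as geo_file:
        for i, line in enumerate(geo_file):
            if i == index:
                return line.strip()
    return None


# Hypothetical usage, assuming the same directory layout as findCableLength():
# cable_length = float(read_geo_line('./geo/6119/6119.geo'))
```

Returning the value rather than printing it also removes the trailing newline that causes the blank lines after the printed number.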



There you have it.  The length of the cable is "50000.0".

That is... 50000.0 *what*?

Here's a small but crucial rule when doing science:

**Values communicated without units are meaningless.**

Always include units.  Since the CRMD geometry files weren't really intended to be read by humans, though, we can give them a pass.  Still, we're going to have to use our brains a bit to figure out what this number means.

What are some common units of length that might be used?

**Question 1** Why are the corrected times *greater than* the original times?  Should the WireDelay correction *subtract* the extra time the signals spent travelling through the wires?

This isn't an exercise, I'm really asking.

#### Follow the data

The obvious next question is, "Why `0.0000115740747569`?"  That is, how does `WireDelay.pl` know how much to adjust the rising and falling edge times by?  The answer is in the other input file, `6119.geo`.  We could examine it to see what it contains, but it's more fun to try to guess first.

### Exercise 1

Remember that the Rising Edge and Falling Edge numbers are given in days, not in seconds or nanoseconds.  That is, a value of `0.5000000000000000` represents half a day, or 12 hours (or 720 minutes, etc.).  This isn't the most natural way to think of time when dealing with such fast signals, so it's useful to be able to convert these into something more intuitive, like seconds or nanoseconds.
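
As a warm-up, the half-day example can be checked in a couple of lines.  This sketch only goes as far as hours and minutes, with constant names chosen for illustration; extending the same chain of conversions to nanoseconds is left for the exercise.

```python
HOURS_PER_DAY = 24
MINUTES_PER_HOUR = 60

half_day = 0.5  # a time value in days

# Convert step by step: days -> hours -> minutes
hours = half_day * HOURS_PER_DAY
minutes = hours * MINUTES_PER_HOUR

print(hours, minutes)  # 12.0 720.0
```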

**A)** Write a Python function to convert a time value given in days into nanoseconds.  You may name it whatever you like, but I'll suggest `daysToNanosec()`.

In [None]:
# Write your function here
# Hint: If there are 24 hours in a day, what fraction of a day is one hour?
# If there are X nanoseconds in a day, what fraction of a day is one nanosecond?
# What is X?
def daysToNanosec(days):
    ### Instructor's copy:
    ### 1 day = (24*60*60) sec = (24*60*60) e9 ns
    pass  # replace this line with your conversion
    
    
    

**B)** Use the function you just wrote to convert `0.0000115740747569` days (what `WireDelay.pl` added to our data) into nanoseconds.

In [None]:
# Write your Python code here
### Instructor's copy: the answer is 1.00000005899616 seconds, or 1.00000005899616 e9 nanoseconds




**C)** Using the rule of thumb that signals travel about 2/3 ft/ns through cabling, estimate what length of cable the experimenter used when taking this cosmic ray data.  Do you think your answer is reasonable?  

In [None]:
### Instructor's copy: 0.66666670599744 e9 feet, or ~126,263 miles.
### This falls between half the equatorial circumference of Saturn (117,649 mi) and half that of Jupiter (139,559 mi)
### The moon's orbital radius is about twice this value, at 239,000 miles.
### This does not sound reasonable.



