# Combine

To use the data transformation script `Combine.pl`, we give it any number of input files (which must already exist), followed by the name we want for the output file it creates:

`$ perl ./perl/Combine.pl [input file 1] [input file 2] ... [input file N] [output file]`
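If you'd rather drive `Combine.pl` from Python than from the shell, a small wrapper can check the "inputs must already exist" requirement before calling the script. This is a sketch, not part of the e-Labs tools. `build_combine_command` and `run_combine` are hypothetical helper names, and requiring at least one input file is our own assumption. It also assumes `perl` is on your `PATH` and that you run it from the directory that contains `perl/`.

```python
import os
import subprocess


def build_combine_command(input_files, output_file, script="./perl/Combine.pl"):
    """Build the argument list for Combine.pl: inputs first, output last."""
    # Assumption (ours): at least one input file is needed for a meaningful combine.
    if not input_files:
        raise ValueError("at least one input file is required")
    # Combine.pl expects its input files to exist already, so check up front.
    missing = [path for path in input_files if not os.path.isfile(path)]
    if missing:
        raise FileNotFoundError(f"input file(s) not found: {missing}")
    return ["perl", script, *input_files, output_file]


def run_combine(input_files, output_file):
    """Run Combine.pl; raises CalledProcessError if perl exits non-zero."""
    subprocess.run(build_combine_command(input_files, output_file), check=True)
```

For the test data, the call would be `run_combine(["test_data/6119.2016.0104.1.test.thresh", "test_data/6203.2016.0104.1.test.thresh"], "test_data/combineOut")`.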

We'll try it out on the test data in the `test_data` directory.  Use the UNIX shell command `$ ls test_data` to see what's there:

```
!ls test_data
```

```
6119.2016.0104.1.test.thresh  6203.2016.0104.1.test.thresh
```


We'll run

`$ perl ./perl/Combine.pl test_data/6119.2016.0104.1.test.thresh test_data/6203.2016.0104.1.test.thresh test_data/combineOut`

to see what happens.  Before we do, though, let's get a closer look at the input files with the UNIX shell command `$ cat [filename]`.

**WARNING**

`$ cat` prints the entire contents of `[filename]`.  Some of the data files used in the e-Labs are hundreds of thousands of lines long.  Printing that much output in a Jupyter Notebook might freeze or crash it, and it will certainly hurt readability.

The test data files we're about to examine have been specially prepared to be small for readability: about 10 lines each.  If you're not sure how long a file is, though, check its line count with `$ wc -l [filename]` before `cat`-ting it.

```
!wc -l test_data/6119.2016.0104.1.test.thresh
```

```
10 test_data/6119.2016.0104.1.test.thresh
```


(`wc` stands for "word count", and the `-l` flag tells it to count lines instead of words.  The first number in the output, before the filename, is the number of lines: in this case, 10.)
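The "check the line count first" habit can be wrapped in a small Python helper for use inside the notebook. This is a sketch, not an e-Labs tool. `count_lines`, `cat_if_small`, and the 50-line default limit are our own hypothetical choices. Like `wc -l`, `count_lines` counts newline characters, so a final line with no trailing newline isn't counted.

```python
def count_lines(path):
    """Count newline characters in a file, the way `wc -l` does."""
    count = 0
    with open(path, "rb") as f:
        # Read in 64 KiB chunks so very large files don't have to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            count += chunk.count(b"\n")
    return count


def cat_if_small(path, max_lines=50):
    """Print the whole file only if it has at most max_lines lines.

    Returns True if the file was printed, False if it was skipped.
    """
    n = count_lines(path)
    if n > max_lines:
        print(f"{path}: {n} lines, more than {max_lines}; not printing it")
        return False
    with open(path) as f:
        print(f.read(), end="")
    return True
```

With the default limit, `cat_if_small("test_data/6119.2016.0104.1.test.thresh")` would print the 10-line test file, while a data file with hundreds of thousands of lines would only produce the one-line warning.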

That's a reasonable number of lines, and `test_data/6203.2016.0104.1.test.thresh` is pretty much the same (check for yourself if you don't believe me), so let's take a look at them:

```
!cat test_data/6119.2016.0104.1.test.thresh
```

```
#$md5
#md5_hex(0)
#ID.CHANNEL, Julian Day, RISING EDGE(sec), FALLING EDGE(sec), TIME OVER THRESHOLD (nanosec), RISING EDGE(INT), FALLING EDGE(INT)
6119.1	2457392	0.3721863017828993	0.3721863017831598	22.50	3215689647404250	3215689647406500
6119.3	2457392	0.3721863017829138	0.3721863017831598	21.25	3215689647404375	3215689647406500
6119.2	2457392	0.3721885846820747	0.3721885846822772	17.50	3215709371653125	3215709371654875
6119.4	2457392	0.3721885846820747	0.3721885846822917	18.75	3215709371653125	3215709371655000
6119.4	2457392	0.3721901866161603	0.3721901866163773	18.75	3215723212363625	3215723212365500
6119.1	2457392	0.3721901866161748	0.3721901866164496	23.75	3215723212363750	3215723212366125
6119.1	2457392	0.3721903650327546	0.3721903650329427	16.25	3215724753883000	3215724753884625
```


```
!cat test_data/6203.2016.0104.1.test.thresh
```

```
#$md5
#md5_hex(0)
#ID.CHANNEL, Julian Day, RISING EDGE(sec), FALLING EDGE(sec), TIME OVER THRESHOLD (nanosec), RISING EDGE(INT), FALLING EDGE(INT)
6203.1	2457392	0.2452114384916088	0.2452114384919415	28.75	2118626828567500	2118626828570375
6203.4	2457392	0.2452114384916232	0.2452114384919705	30.00	2118626828567625	2118626828570625
6203.2	2457392	0.2452114384916232	0.2452114384920283	35.00	2118626828567625	2118626828571125
6203.1	2457392	0.2452182596452402	0.2452182596455440	26.25	2118685763334875	2118685763337500
6203.4	2457392	0.2452182596452402	0.2452182596455874	30.00	2118685763334875	2118685763337875
6203.2	2457392	0.2452182596452402	0.2452182596456308	33.75	2118685763334875	2118685763338250
6203.4	2457392	0.2452190121639323	0.2452190121641204	16.25	2118692265096375	2118692265098000
```
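The third header line names the seven columns of each data line. Below is a minimal Python sketch of splitting one data line into those named fields. It assumes the columns are separated by whitespace, as they appear to be in the output above. `ThreshRow` and `parse_thresh_line` are hypothetical names. Splitting `ID.CHANNEL` into a detector ID and a channel number follows the column name, not a documented spec.

```python
from typing import NamedTuple


class ThreshRow(NamedTuple):
    detector: str        # ID part of ID.CHANNEL, e.g. "6119"
    channel: int         # CHANNEL part of ID.CHANNEL, e.g. 1
    julian_day: int
    rising_edge: float   # RISING EDGE(sec) column
    falling_edge: float  # FALLING EDGE(sec) column
    tot_ns: float        # TIME OVER THRESHOLD (nanosec)
    rising_int: int      # RISING EDGE(INT)
    falling_int: int     # FALLING EDGE(INT)


def parse_thresh_line(line):
    """Split one data line of a .thresh file into named fields.

    Returns None for header lines (starting with '#') and blank lines.
    Note: float() drops formatting such as the trailing zero in "22.50";
    keep the raw line if you need the original text unchanged.
    """
    if line.startswith("#") or not line.strip():
        return None
    fields = line.split()  # assumes whitespace-separated columns
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    detector, channel = fields[0].split(".")
    return ThreshRow(
        detector=detector,
        channel=int(channel),
        julian_day=int(fields[1]),
        rising_edge=float(fields[2]),
        falling_edge=float(fields[3]),
        tot_ns=float(fields[4]),
        rising_int=int(fields[5]),
        falling_int=int(fields[6]),
    )
```

On the first data row of `6119.2016.0104.1.test.thresh`, this gives detector `"6119"`, channel `1`, and `tot_ns` of `22.5`. In every row shown above, TIME OVER THRESHOLD also equals the difference between the two INT columns divided by 100. For that first row, (3215689647406500 - 3215689647404250) / 100 = 22.50. That is an observation from these sample rows, not documented behaviour.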


Now we'll see what happens when we run these through `Combine.pl` using

`$ perl ./perl/Combine.pl test_data/6119.2016.0104.1.test.thresh test_data/6203.2016.0104.1.test.thresh test_data/combineOut`

After that command runs, the file `combineOut` appears in the `test_data/` directory.  Before we try to `cat` it, let's see how many lines it has:

```
!wc -l test_data/combineOut
```

```
17 test_data/combineOut
```


Not so bad.  Let's take a look:

```
!cat test_data/combineOut
```

```
#344f56cc2ab825588ae1315357ab3096
#md5_hex(1528996872 1530043861 1530043909  test_data/6119.2016.0104.1.test.thresh test_data/6203.2016.0104.1.test.thresh)
#Combined data for files: test_data/6119.2016.0104.1.test.thresh test_data/6203.2016.0104.1.test.thresh 
6119.1	2457392	0.3721863017828993	0.3721863017831598	22.50	3215689647404250	3215689647406500
6119.3	2457392	0.3721863017829138	0.3721863017831598	21.25	3215689647404375	3215689647406500
6119.2	2457392	0.3721885846820747	0.3721885846822772	17.50	3215709371653125	3215709371654875
6119.4	2457392	0.3721885846820747	0.3721885846822917	18.75	3215709371653125	3215709371655000
6119.4	2457392	0.3721901866161603	0.3721901866163773	18.75	3215723212363625	3215723212365500
6119.1	2457392	0.3721901866161748	0.3721901866164496	23.75	3215723212363750	3215723212366125
6119.1	2457392	0.3721903650327546	0.3721903650329427	16.25	3215724753883000	3215724753884625
6203.1	2457392	0.2452114384916088	0.2452114384919415	28.75	2118626828567500	21
```

From the name `Combine.pl`, it's not hard to guess that the data transformation takes all of its input files and combines them into one file, and that's exactly what we see here.

Each of the input files was 10 lines long: 3 lines of header and 7 lines of data.  Here, we have a new header of 3 lines followed by 14 lines of data, for a total of 17 lines (matching what `wc -l` reported).  All of the data from the first input file, `test_data/6119.2016.0104.1.test.thresh`, comes first, followed by all of the data from the second input file, `test_data/6203.2016.0104.1.test.thresh`.  Within each file, the data lines keep their original order, and individual lines have not been altered: all of the columns, values, and decimal places from the original threshold files are the same.
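The behaviour described above can be approximated in a few lines of Python. This is our own minimal sketch, not a port of `Combine.pl`. `combine_data_lines` and `write_combined` are hypothetical names. This walkthrough doesn't show how the two md5 header lines are computed, so the sketch writes only a `#Combined data for files:` line as its header. It treats any line beginning with `#` as a header line, which matches the three header lines in the test files. It also keeps data lines as raw strings so no value or decimal place changes.

```python
def combine_data_lines(input_paths):
    """Collect the data lines of each input file, file by file, in order.

    Header lines (starting with '#') and blank lines are skipped. Data lines
    are kept as raw strings, so no value or decimal place is changed.
    """
    combined = []
    for path in input_paths:
        with open(path) as f:
            for line in f:
                if line.startswith("#") or not line.strip():
                    continue
                # Make sure every line ends with a newline before concatenating.
                combined.append(line if line.endswith("\n") else line + "\n")
    return combined


def write_combined(input_paths, output_path):
    """Write a simplified combined file and return the number of data lines.

    Only a '#Combined data for files:' header line is written; the two md5
    header lines that Combine.pl produces are left out.
    """
    lines = combine_data_lines(input_paths)
    with open(output_path, "w") as out:
        out.write("#Combined data for files: " + " ".join(input_paths) + "\n")
        out.writelines(lines)
    return len(lines)
```

On the two test files, this sketch should write 1 header line plus the same 14 data lines in the same order. That makes 15 lines in total, versus 17 for `combineOut`, because the two md5 lines are omitted.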