Include the usual suspects: pandas, numpy, matplotlib, and matplotlib's plotting interface, pyplot.

In [None]:
import numpy as np
import pandas as pd

import matplotlib
import matplotlib.pyplot as plt

I almost exclusively use pandas to read common data file formats. Wes McKinney, the pandas developer, worked in finance for many years and claims in his book "Python for Data Analysis"

                  https://goo.gl/vbYvU0

that wrangling data from a variety of input formats into a form suitable for analysis takes a significant fraction of analysis time. A great example is missing data, for instance a CSV file with some of its fields missing. Wes spent a good amount of time developing the IO for pandas to be as flexible and as fast as possible for the variety of input data you may come across.
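
As a small illustration of the missing-data point, here is a minimal sketch that reads a made-up CSV string (not one of the data files used below) in which one field is empty and one holds the marker "NA". The column names and values are assumptions chosen only for the demonstration.

```python
import io
import pandas as pd

# Made-up CSV text: the second data row has an empty amplitude field
# and the third uses the marker "NA".
csv_text = "time,amplitude\n0.0,1.2\n0.1,\n0.2,NA\n0.3,1.4\n"

df = pd.read_csv(io.StringIO(csv_text))

# Both the empty field and "NA" are parsed as NaN by default
print(df)
print(df["amplitude"].isnull().sum())  # number of missing amplitudes
```

The column still comes out as floating point, so the usual numerical operations keep working and the gaps can be dropped or filled later.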

Anyways, I really, really, really recommend this book; it will improve your analysis speed by a factor of <b>10</b> (ask me for it). It covers ...

0) IPython
1) numpy arrays and vectorized computation, including a whole chapter on advanced numpy
2) an extensive pandas tutorial and data wrangling
3) plotting and visualization
4) data aggregation and group operations (very powerful)
5) time series, plus finance and economic data applications (hello finance people)
6) data loading and storage: text files, databases, the web (even scraping!), HDF5, Excel files (you might think this is lame but it's a really handy feature...), and pickle


I will only cover IO with text files, and show you that you can write the data back out to any of these formats with almost no effort.
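
To make the "write to any of these formats" claim concrete, here is a minimal sketch that writes a tiny made-up data frame to CSV, JSON and pickle in a temporary directory and reads two of them back. The file names and values are assumptions for illustration only. HDF5 and Excel have matching `to_hdf` and `to_excel` methods, but those need extra packages installed, so they are left out here.

```python
import os
import tempfile
import pandas as pd

# Tiny made-up data frame standing in for real scope data
df = pd.DataFrame({"time": [0.0, 0.1, 0.2], "amplitude": [1.2, 1.3, 1.4]},
                  columns=["time", "amplitude"])

out_dir = tempfile.mkdtemp()                      # scratch directory
csv_path = os.path.join(out_dir, "scope.csv")     # hypothetical file names
json_path = os.path.join(out_dir, "scope.json")
pkl_path = os.path.join(out_dir, "scope.pkl")

df.to_csv(csv_path, index=False)   # plain text
df.to_json(json_path)              # JSON
df.to_pickle(pkl_path)             # Python pickle

# Read two of them back to check the round trip
csv_back = pd.read_csv(csv_path)
pkl_back = pd.read_pickle(pkl_path)
print(csv_back)
```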

Lots of data will come to you in a text file, which is really the lowest common denominator in terms of what raw data looks like to an analyzer. I guess binary data is even "lower", but someone will have to tell you what the data format actually is before you can read it.
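
A quick sketch of why binary data needs a format description: the same eight bytes give completely different numbers depending on the data type you assume. The values and the little-endian 32-bit layout are made up for the demonstration.

```python
import numpy as np

# Two made-up values stored as little-endian 32-bit floats
values = np.array([1.0, 2.5], dtype="<f4")
raw = values.tobytes()   # 8 raw bytes with no description attached

# Correct guess: 32-bit floats
as_float = np.frombuffer(raw, dtype="<f4")
# Wrong guess: 32-bit integers, same bytes, meaningless numbers
as_int = np.frombuffer(raw, dtype="<i4")

print(as_float)
print(as_int)
```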

Go ahead and grab these data files from vic's public folder

0) http://www.nevis.columbia.edu/~vgenty/public/C3silabs00001_DC.txt

1) http://www.nevis.columbia.edu/~vgenty/public/C3Trace00001_DC.txt

Take a look inside:

In [None]:
!head C3Trace00001_DC.txt

This text file is from a LeCroy oscilloscope and contains a waveform from a single trigger. It also has a timestamp at the top of the file and some other garbage we don't care about. On the 5th line I see what the columns mean: time and voltage amplitude. This text file is nice; it at least tells us what the columns mean.

Let's read this into pandas with read_csv

In [None]:
pd.read_csv("C3Trace00001_DC.txt")

Wow, looks terrible, the header gave us some trouble. No matter, we can tell pandas to ignore this top part!

In [None]:
pd.read_csv("C3Trace00001_DC.txt",    #the file
            sep=',',                  #how is the data separated, in this case it's a comma
            header=4)                 #use row 4, counting from 0 (the 5th line), as the column header

That's nice, but in general I like to make my own column names, so let's do that

In [None]:
pd.read_csv("C3Trace00001_DC.txt",      #the file
            sep=',',                    #how is the data separated, in this case it's a comma
            skiprows=5,                 #skip rows 0,1,2,3,4
            names=['time','amplitude']) #give the 'names' of the columns.
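
As a side check, the two read_csv calls above (header=4 versus skiprows=5 with names) should parse the same rows and differ only in the column labels. The sketch below tries this on a made-up stand-in for the scope file, with four placeholder preamble lines and a header on the 5th line; the real file's preamble will look different.

```python
import io
import pandas as pd

# Made-up stand-in for the oscilloscope file: four preamble lines,
# the column header on the 5th line, then three data rows.
lines = [
    "made-up scope model",
    "made-up segment info",
    "made-up trigger header",
    "made-up timestamp",
    "Time,Ampl",
    "0.0,1.20",
    "1.0e-10,1.25",
    "2.0e-10,1.30",
]
text = "\n".join(lines) + "\n"

# Option 1: take the labels from the file (row 4, counting from 0)
by_header = pd.read_csv(io.StringIO(text), sep=",", header=4)

# Option 2: skip rows 0-4 entirely and supply our own labels
by_names = pd.read_csv(io.StringIO(text), sep=",", skiprows=5,
                       names=["time", "amplitude"])

print(by_header.columns.tolist())
print(by_names.columns.tolist())
print((by_header.values == by_names.values).all())
```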

Great, seems to work. Let's read both data files into a list and plot them

In [None]:
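# Use the os module to list the files in the current directory
import os 

# Use a list comprehension to fill a list called files that contains
# every file name in the current directory that starts with "C3"
files = [file_ for file_ in os.listdir(".") if file_.startswith("C3")]

# os.listdir returns names in arbitrary order, so check the order printed here
print(files)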
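
If list comprehensions are new to you, here is a minimal sketch showing that the one-liner is shorthand for an ordinary loop with an if inside. The list of names is hard-coded instead of coming from os.listdir, and "notes.txt" is a made-up extra entry.

```python
# Hard-coded names for illustration; "notes.txt" is made up
names = ["C3Trace00001_DC.txt", "notes.txt", "C3silabs00001_DC.txt"]

# The long way: loop, test, append
files_loop = []
for name in names:
    if name.startswith("C3"):
        files_loop.append(name)

# The list comprehension: same loop and same test on one line
files_comp = [name for name in names if name.startswith("C3")]

print(files_loop == files_comp)
```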

In [None]:
# oscilloscope data frames (list of dataframes)
scope_dfs = []
for file_ in files:
    a = pd.read_csv(file_,   
                    sep=',', 
                    skiprows=5,
                    names=['time','amplitude'])
    scope_dfs.append(a)

# Both data frames now live in a python list

This is just for fun, but let's get all the data into one place. If I've done the "measurement" correctly then the data should have the same X data (these are the time divisions on the oscilloscope). Sure enough, this is easy in pandas with concatenation of data frames

In [None]:
scope_df = pd.concat(scope_dfs, #list of data frames
                     axis=1,     #axis to join on (0==row, 1==column)
                     keys=['silicone','pletronics']) #one key per data frame, in the same order as scope_dfs
scope_df
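
To see what concat does with axis=1 and keys, here is a toy sketch with two made-up three-row data frames. The keys "first" and "second" and all the values are assumptions for illustration. The result has two-level column labels, so each original frame can be pulled back out by its key, and the shared time axis can be checked directly.

```python
import pandas as pd

# Two made-up data frames sharing the same time axis
a = pd.DataFrame({"time": [0.0, 0.1, 0.2], "amplitude": [1.0, 1.1, 1.2]})
b = pd.DataFrame({"time": [0.0, 0.1, 0.2], "amplitude": [2.0, 2.1, 2.2]})

# Join side by side; each key labels the columns of one input frame
combined = pd.concat([a, b], axis=1, keys=["first", "second"])

# Columns are now (key, original column name) pairs
print(combined.columns.tolist())

# Selecting a key returns that frame's columns
print(combined["second"]["amplitude"].tolist())

# Check that both frames really have the same X data
same_time = (combined["first"]["time"] == combined["second"]["time"]).all()
print(same_time)
```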

Let's take a look. Pandas has a really nice built-in graphing function, although I hardly ever use it. It's worth checking out for time series analysis, but to be 100% sure what's going on I do it the hard way

In [None]:
%matplotlib inline
matplotlib.rcParams['font.size'] = 16
matplotlib.rcParams['font.family'] = 'serif'

In [None]:
fig,ax=plt.subplots(figsize=(10,6))


ax.plot(scope_df.pletronics.time,
        scope_df.pletronics.amplitude,
        '-o',
        lw=2,
        color='red',
        label='Pletronics')

ax.plot(scope_df.silicone.time,
        scope_df.silicone.amplitude,
        '-o',
        lw=2,
        color='blue',
        label='SiLabs')

ax.set_xlabel("Time [s]",fontweight='bold')
ax.set_ylabel("Voltage [V]",fontweight='bold')
ax.set_title("LVDS Oscillators",fontweight='bold')
ax.set_ylim(1.0,2)
ax.legend()
ax.grid()
plt.show()

Cool! You are looking at the 156.25 MHz low-voltage differential signaling (LVDS) waveforms from two different clocks used in the actual MicroBooNE readout electronics. This is the "read clock" for the optical links between the MicroBooNE electronics and the readout computer.

Can we verify that the period is what Chi says it is? Sure!

In [None]:
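# Let's find the two lowest samples and see how many "ticks" apart they are.
# argsort sorts ascending, so its first two entries index the two smallest amplitudes
# (the names say fmax, but these are minima)
pletronics_df = scope_df.pletronics
sorted_amps = pletronics_df.amplitude.values.argsort()

fmax1 = sorted_amps[0]
fmax2 = sorted_amps[1]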

In [None]:
# How many ticks apart are these two minima?
print(fmax2 - fmax1)

In [None]:
# Which troughs are these?
fig,ax=plt.subplots(figsize=(10,6))
ax.plot(pletronics_df.time,
        pletronics_df.amplitude,'-o',lw=2)
ax.axvline(x=pletronics_df.iloc[fmax1].time,color='red',lw=2)
ax.axvline(x=pletronics_df.iloc[fmax2].time,color='red',lw=2)
plt.show()

Looks like one period to me... let's get the frequency

In [None]:
times = pletronics_df.time.values # Get the values as a numpy array
period = abs(times[fmax2] - times[fmax1]) # abs() in case fmax2 comes before fmax1 in time
print("")
print("Period is {} seconds".format(period))
print("")
print("Frequency is {} MHz".format(1/period / 1.0e6))
print("")
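
For reference, a 156.25 MHz clock should have a period of 1/156.25e6 s = 6.4 ns. Taking the two lowest samples works when they happen to fall in neighbouring troughs, which is why the plot above is worth checking. A slightly more robust sketch, shown here on a synthetic cosine and not on the scope data, finds every local minimum and averages the spacing between them. The 0.1 ns sample spacing, the offset and the amplitude are all made-up assumptions.

```python
import numpy as np

f_true = 156.25e6              # Hz, nominal frequency quoted in the text
dt = 0.1e-9                    # assumed sample spacing (made up): 0.1 ns
t = np.arange(400) * dt        # 40 ns of synthetic time axis
v = 1.5 + 0.3 * np.cos(2 * np.pi * f_true * t)   # made-up offset and amplitude

# A local minimum is a sample strictly lower than both of its neighbours
interior = v[1:-1]
is_min = (interior < v[:-2]) & (interior < v[2:])
min_idx = np.where(is_min)[0] + 1   # +1 because interior starts at index 1

# Average the spacing between consecutive minima to get the period
period = np.diff(t[min_idx]).mean()
freq_mhz = 1.0 / period / 1.0e6

print("Period is {} seconds".format(period))
print("Frequency is {} MHz".format(freq_mhz))
```

On real, noisy scope data the strict neighbour comparison would pick up noise wiggles too, so some smoothing or a threshold would be needed first.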