## When should you start developing a pipeline?

A typical research plan that a grad student might present looks something like this:

Order supplies --> Build a device --> Collect some data --> Analyze said data --> ~~Profit~~ Publish

This process is a pipeline! But it's out of order. Let's see how with a real-world example.

### A rat behavior experiment
A post-doc wants to run a rat behavior experiment. For the experiment to work, it's critical to know whether or not the rat is moving at any given point in time. Each session runs for over an hour, but our post-doc reasons that she can set up a video camera synchronized to the rest of the experimental equipment, record the rat moving around its habitat, and work out later, perhaps with some image processing, when motion happened. A labmate suggests she attach a bright red light to the rat's head to make it easy to track by computer: just find the brightest spot in the red channel of each frame! Our post-doc does this, then collects more than 30 hours' worth of data over two weeks.

[Video clip here]

How would you begin processing this data to track the rat?

Now let's look at the data. It's a color video, so how many dimensions should we expect the array to have?

```python
vid.shape
```
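Each frame of a color video is a 2D image with three color channels, and there are many frames, so we should expect four dimensions. The loop later in this section indexes frames as `vid[:,:,:,frame_num]` and the red channel as `frame[:,:,0]`, which implies a (height, width, color channel, frame) layout. If you don't have the original recording at hand, here is a minimal synthetic stand-in in that assumed layout. `make_synthetic_video` and all of its sizes are hypothetical, and the moving square only loosely imitates the head-mounted light.

```python
import numpy as np

def make_synthetic_video(height=48, width=64, num_frames=20, seed=0):
    """Hypothetical stand-in for the recording: a uint8 array of shape
    (height, width, 3, num_frames) with a small bright red square drifting
    left to right over a dim, noisy background."""
    rng = np.random.default_rng(seed)
    # dim random background in all three channels (values 0..39)
    vid = rng.integers(0, 40, size=(height, width, 3, num_frames), dtype=np.uint8)
    for t in range(num_frames):
        row = height // 2
        col = 5 + t * (width - 10) // num_frames  # drift a little each frame
        vid[row - 1:row + 2, col - 1:col + 2, 0, t] = 255  # 3 x 3 square, red channel only
    return vid

vid = make_synthetic_video()
print(vid.shape)  # (48, 64, 3, 20)
```

Some video readers return frames along the first axis instead, i.e. (frames, height, width, 3); `np.moveaxis(vid, 0, -1)` converts such an array to the layout assumed here.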

As a first step, let's see if we can in fact find a red dot in a still frame.

```python
import matplotlib.pyplot as plt

frame = vid[...]  # fill in: index of one frame along the last axis
plt.imshow(frame)
```

And just the red channel?

```python
red = frame[:, :, 0]  # channel 0 is red, assuming RGB order (OpenCV, for one, loads BGR)
plt.imshow(red)
```

This looks tractable! How might we grab the brightest point?

```python
import numpy as np
np.argmax(red)  # note: an index into the flattened image, not a (row, col) pair
```
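`np.argmax` searches the flattened array, so on a 2D image it returns a single integer rather than a (row, column) pair. A toy example (values made up) shows how `np.unravel_index` recovers the coordinates, and that for a 2D array in NumPy's default row-major order this is the same arithmetic as `divmod(flat_idx, num_columns)`.

```python
import numpy as np

# A toy 3 x 4 "red channel" whose brightest pixel (9) sits at row 1, column 2
toy = np.array([[0, 1, 2, 3],
                [4, 5, 9, 6],
                [7, 8, 0, 1]])

flat_idx = int(np.argmax(toy))  # index into toy.ravel()
row, col = (int(i) for i in np.unravel_index(flat_idx, toy.shape))
print(flat_idx, row, col)       # 6 1 2

# for a 2D, row-major array the same answer comes from integer division
assert (row, col) == divmod(flat_idx, toy.shape[1])
```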

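```python
from scipy.ndimage import gaussian_filter
smooth = gaussian_filter(red.astype(float), sigma=3)  # smooth; sigma=3 px is a guess to tune
peak = np.unravel_index(np.argmax(smooth), smooth.shape)  # argmax -> (row, col)
peak
```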
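Smoothing matters because `argmax` picks the single brightest pixel, and one noisy pixel can outshine the light. In the toy example below (sizes and values made up), a Gaussian blur via `scipy.ndimage.gaussian_filter` is one possible smoother; a box filter or similar would also work. The blur spreads the lone hot pixel's brightness over its neighbors, while the wider blob keeps most of its own. `brightest` is a hypothetical helper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def brightest(img):
    """(row, col) of the single brightest pixel, as plain ints."""
    r, c = np.unravel_index(np.argmax(img), img.shape)
    return int(r), int(c)

# Toy red channel (sizes and values are made up for illustration)
toy = np.zeros((40, 40))
toy[10:15, 10:15] = 200   # the light: a 5 x 5 blob
toy[30, 30] = 255         # one noisy pixel, brighter than the light

smooth = gaussian_filter(toy, sigma=2)

print(brightest(toy))     # (30, 30): the raw argmax picks the noise
print(brightest(smooth))  # (12, 12): after smoothing, the blob's center wins
```

The trade-off is that a large `sigma` also smears the light itself, so it should probably stay roughly comparable to the size of the spot you are looking for.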

Now what?

We need to do this for every frame in the video.

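```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter

num_frames = vid.shape[-1]

# a time x dim table. dim = 2 since we have 2D frames
dot_position = np.zeros((num_frames, 2))

for frame_num in range(num_frames):
    frame = vid[:, :, :, frame_num]
    red = frame[:, :, 0].astype(float)
    smooth = gaussian_filter(red, sigma=3)  # smoothing (sigma is a guess to tune)
    peak = np.argmax(smooth)  # flat index into the 2D image
    position = np.unravel_index(peak, smooth.shape)  # flat idx --> (row, col)
    dot_position[frame_num, :] = position  # set the whole row of the table

plt.plot(dot_position)
plt.legend(["row", "column"])
```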

We have a teleporting rat! Let's look at a moment when the rat seems to teleport. Surely we are zeroing in on a major scientific discovery.
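One way to find those moments is to look for frames where the tracked position jumps farther than a rat could plausibly move between two frames. `find_jumps` and its `max_step` threshold are hypothetical; a sensible threshold depends on the frame rate and the image resolution.

```python
import numpy as np

def find_jumps(positions, max_step):
    """Return indices of frames whose (row, col) position is more than
    max_step pixels away from the previous frame's position.

    positions: (num_frames, 2) array, e.g. the dot_position table.
    """
    steps = np.linalg.norm(np.diff(positions, axis=0), axis=1)  # per-frame displacement
    return np.flatnonzero(steps > max_step) + 1  # steps[i] is the move from frame i to i+1

# Toy track: the "rat" jumps from around (10, 10) to around (40, 50) at frame 2
track = np.array([[10, 10], [11, 10], [40, 50], [41, 50]])
print(find_jumps(track, max_step=20))  # [2]
```

On the real data, `find_jumps(dot_position, max_step=20)` (20 pixels being an arbitrary starting guess) gives candidate frames, and `plt.imshow(vid[:, :, :, i])` for one of them shows what the tracker latched onto.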

The problem seems to be that the light reflects off the container, so now we need a way to tell a reflection apart from the real light. This looks like a much harder problem.

That's because the real mistake was made weeks ago, here: "Collect some data --> Analyze said data". Collecting and analyzing data are not separable steps. You should be building your data processing pipeline while you're building your experiment, and iterating on it as you collect data.