## Control System Data

This is an open-ended project to look at some data from an experiment's control system.

Briefly, the system provides cooling to two separate parts of the experiment.  To do that, it pumps water from an external supply chiller through a storage tank and into two cooling loop chillers (heat exchangers):

![drawing of equipment](PhysicalPlant.pdf "Physical Plant")

The water comes in cold from the supply. When cooling is needed, the loop pump for that side (one each for A and B, not shown) starts pumping warm coolant through the chiller, and the corresponding water flow valve opens.

We set up the usual imports:

In [None]:
from datascience import Table
import pandas as pd
import numpy as np

import matplotlib
matplotlib.use('Agg')
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
plt.rcParams['figure.figsize'] = (10.0, 5.0)

We have five channels of data:

In [None]:
data = Table.read_table("data.csv")
data

T5 and the four Spare columns don't have useful data. Let's remove them to reduce the size of our computations:

In [None]:
data = data.drop("T5", "Spare1", "Spare2", "Spare3", "Spare4").copy()
data

The data is in Imperial units:  Degrees Fahrenheit and PSI.  Convert the columns to metric units for convenience.

In [None]:
data["T1"] = (data["T1"]-32)*5/9
data["T2"] = (data["T2"]-32)*5/9
data["T3"] = (data["T3"]-32)*5/9
data["T4"] = (data["T4"]-32)*5/9
data["Pressure"] = (data["Pressure"]/14.696)*101325 # psi -> atm (14.696 psi per atm) -> pascals
data

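The same conversions can be wrapped in small helper functions so that each column is converted the same way. This is only a sketch: the function names and the sample readings are made up for illustration, and the factor of 6894.757 Pa per psi is the standard conversion, not something taken from the data file. On the real table it would be used as, for example, `data["T1"] = fahrenheit_to_celsius(data["T1"])`.

```python
import numpy as np

PA_PER_PSI = 6894.757  # pascals per psi (standard conversion factor)

def fahrenheit_to_celsius(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius (scalars or arrays)."""
    return (np.asarray(temp_f, dtype=float) - 32.0) * 5.0 / 9.0

def psi_to_pascal(pressure_psi):
    """Convert pressure in psi to pascals (scalars or arrays)."""
    return np.asarray(pressure_psi, dtype=float) * PA_PER_PSI

# Hypothetical readings, not taken from data.csv
temps_c = fahrenheit_to_celsius([32.0, 50.0, 212.0])
pressure_pa = psi_to_pascal([0.0, 14.696])
```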
Let's plot these by row (as a stand-in for by-time):

In [None]:
plt.plot(data['T1'],"b");

In [None]:
plt.plot(data['T2'],"k");

There's a lot of similarity there. Try plotting the difference:

In [None]:
plt.plot(data['T2']-data['T1'],"r");

In [None]:
plt.plot(data['T2'], data['T1'], '.k');

Is that going back and forth between two states? Check by histogramming.

In [None]:
plt.hist(data['T1']);

How about the difference?

In [None]:
plt.hist(data['T2']-data['T1']);

It's hard to see structure in the entire data set.  Select out a sub-sample:

In [None]:
# selecting rows is easiest in pandas, which has powerful indexing
pTable = data.to_df()[0:4000]
start = Table.from_df(pTable)
start

In [None]:
plt.plot(start['T2']-start['T1'],"r");

Which part corresponds to the chiller running?  Do both A and B run at the same time?

In [None]:
plt.plot(start['T2']-start['T1'],"r");
plt.plot(start['T4']-start['T3'],"b");

What fraction of time is chiller A running? Chiller B? What fraction are they both running together?

In [None]:
data = data.with_column("T2-T1", data["T2"]-data["T1"])  # add a column with temp difference
data.where(data["T2-T1"] > 5).num_rows / data.num_rows # select out running values and count

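One possible way to answer all three questions at once is to build a boolean "running" mask for each chiller and take means. This sketch assumes the same 5-degree threshold also works for chiller B on `T4-T3` (the notebook only uses it for `T2-T1`), and the function name and sample arrays are invented for illustration. On the real table it could be called as `running_fractions(data["T2"]-data["T1"], data["T4"]-data["T3"])`.

```python
import numpy as np

def running_fractions(dt_a, dt_b, threshold=5.0):
    """Fraction of samples where A, B, and both exceed the threshold."""
    a_on = np.asarray(dt_a) > threshold   # True where chiller A looks like it is running
    b_on = np.asarray(dt_b) > threshold   # same for chiller B (assumed threshold)
    return {
        "A": float(a_on.mean()),
        "B": float(b_on.mean()),
        "both": float((a_on & b_on).mean()),
    }

# Hypothetical temperature differences, not taken from data.csv
dt_a = np.array([0.5, 6.0, 7.0, 0.2, 6.5, 0.1, 0.3, 8.0])
dt_b = np.array([0.1, 0.2, 6.0, 6.5, 7.0, 0.4, 0.2, 0.3])
fractions = running_fractions(dt_a, dt_b)
```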
What does the water pressure do?

In [None]:
plt.plot(start['Pressure'],"y");

There seem to be two kinds of changes on two timescales. What causes them?  (Hint: Can you zoom in around sample 1000?  2400?)

In [None]:
# (this may take multiple lines)

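As a starting point for zooming, a small helper can cut out a window of samples around a chosen index. This is a sketch only: `zoom_window` and its `half_width` default are hypothetical, and the sample signal is synthetic. With the real data one might try `idx, p = zoom_window(start['Pressure'], 1000)` followed by `plt.plot(idx, p)`.

```python
import numpy as np

def zoom_window(values, center, half_width=100):
    """Return (indices, values) for samples within half_width of center."""
    values = np.asarray(values)
    lo = max(center - half_width, 0)            # clip at the start of the array
    hi = min(center + half_width, len(values))  # clip at the end of the array
    return np.arange(lo, hi), values[lo:hi]

# Synthetic signal standing in for a data column
signal = np.arange(3000) * 0.1
idx, window = zoom_window(signal, center=1000, half_width=50)
```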


Some more questions to investigate:
 - There seems to be a leak in one of the valves. (Why do I think that?) Use the temperature data to find which one.
 - Look at a few tank refills when the chiller is not running. Are they doing something unexpected?  What is it? Can you find a way to pull samples of data to investigate a number of these? (Data around sample 30,000 might be useful, as might `.diff()` from the Old Faithful example)
 - Why did the temperature change at about sample 2500? It might help to find more of these and see what happened around that time.
 - Was the DAQ system ever off? If so, when and for how long?  (Working directly with 'Time' can be slow because each row needs to be parsed each time; we provide a fast way to convert time to elapsed seconds and the interval between samples below)
 - If I tell you that 40 kg/min of water flows through each chiller when it's running, can you find how much heat is being removed? (What data do you need, and how do you get it?) How much water can the pump provide? How much water flows through the leak?

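For the heat question, the usual relation is power = (mass flow) x (specific heat) x (temperature rise). The sketch below assumes a specific heat for liquid water of about 4186 J/(kg K) and uses a made-up 5 degree C rise purely as an illustration; which pair of temperature sensors gives the water-side rise across each chiller still has to be worked out from the drawing and the data. The function name is hypothetical.

```python
WATER_SPECIFIC_HEAT = 4186.0  # J/(kg*K), approximate value for liquid water

def heat_removed_watts(flow_kg_per_min, delta_t_c):
    """Heat carried away by the water, in watts, for a given temperature rise."""
    mass_flow_kg_per_s = flow_kg_per_min / 60.0   # kg/min -> kg/s
    return mass_flow_kg_per_s * WATER_SPECIFIC_HEAT * delta_t_c

# 40 kg/min is from the question above; the 5 degree C rise is an assumed example
power_w = heat_removed_watts(40.0, 5.0)
```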
In [None]:
# converting the "Time" column to elapsed seconds involves several conversions, so we provide this example
df = data.to_df()  # pandas has time conversion routines
times = pd.to_datetime(df['Time']).astype('int64')/1e9  # datetimes are stored in nsec; convert to seconds

t0 = times[0]    # find first sample (named t0 so it doesn't overwrite the `start` Table above)
timeColumn = times - t0   # and subtract off to keep numbers small

data = data.with_column("Seconds", timeColumn) # add "Seconds" column to our Table

diffTimeColumn = timeColumn.diff()  # compute difference between times (first entry is NaN)
data = data.with_column("deltaSeconds", diffTimeColumn)  # add "deltaSeconds" to our table

data

In [None]:
plt.plot(data['deltaSeconds']);
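
One way to look for periods when the DAQ was off is to flag sample-to-sample intervals that are much longer than the typical one. The sketch below is an assumption-laden example: `find_gaps` is a hypothetical helper, the "5 times the median interval" rule is an arbitrary choice, and the timestamps are synthetic. On the real table it could be called as `find_gaps(data["Seconds"])`.

```python
import numpy as np

def find_gaps(seconds, factor=5.0):
    """Return (index, time_before_gap, gap_length) for unusually long intervals."""
    seconds = np.asarray(seconds, dtype=float)
    deltas = np.diff(seconds)                 # interval between consecutive samples
    typical = np.median(deltas)               # the usual sampling interval
    gap_idx = np.flatnonzero(deltas > factor * typical)
    return [(int(i), float(seconds[i]), float(deltas[i])) for i in gap_idx]

# Synthetic timestamps with one 60-second hole after t = 3 s
seconds = np.array([0, 1, 2, 3, 63, 64, 65, 66, 67])
gaps = find_gaps(seconds)
```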