# December 16, 2018
## Databases
I've been investigating how to output the data into databases. I found that relational databases aren't the best place to store this kind of data. These are time series data, and they should be treated as such. [Time Series Databases](https://en.wikipedia.org/wiki/Time_series_database) are a good starting point.

## Database Choice
I chose to use [Timescale](https://www.timescale.com/) as our time series database. It's based on [PostgreSQL](https://www.postgresql.org/docs/10/index.html), which provides us with some advantages over other databases. It allows us to store JSON documents (e.g. header/firmware information). We're able to set the time column to be something other than a date (e.g. an integer). It might not be as redundant as something like MapR, but it's a good start. 

## Setup Timescale DB
I followed the Ubuntu [install instructions](https://docs.timescale.com/v1.1/getting-started/installation/ubuntu/installation-apt-ubuntu), then restarted PostgreSQL and logged into the CLI with:
```bash
sudo service postgresql restart
psql -U postgres -W
```
Then I created the database:
```sql
-- create the database
CREATE DATABASE bagel;

-- connect to the database
\c bagel

-- convert to a timescaledb
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;

-- shut off telemetry tracking (database-level form of the documented ALTER ... SET command)
ALTER DATABASE bagel SET timescaledb.telemetry_level = off;

-- create the table
CREATE TABLE data (
  id        smallint          NOT NULL,
  energy    DOUBLE PRECISION  NOT NULL,
  time      bigint            NOT NULL,
  cfd_time  DOUBLE PRECISION  NOT NULL
);

-- create the hypertable for the data table using the time column. Chunk time interval is set to 1 day for unix epoch time in ms.
-- Will need to think more about how this works.
SELECT create_hypertable('data', 'time', chunk_time_interval => 86400000);

-- insert a data point
INSERT INTO data VALUES (14, 304.08124800000000398, 19606962062518.000777, 39213924125036.353274);
```

### Creating Users
```sql
create user vincent with password 'password';
grant all privileges on database bagel to vincent;
```

## Uploading data from sqlite
I uploaded the data from SQLite by exporting it as a CSV, truncating the decimal part of the time column, and importing the result into the `data` table.
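
A minimal sketch of that export step, assuming the SQLite table is also called `data` and has the same four columns (the function name `export_truncated` and the file name are made up for illustration):

```python
import csv
import sqlite3


def export_truncated(conn, csv_path, table="data"):
    """Write every row of `table` to a CSV file, truncating `time` to an integer."""
    rows = conn.execute(f"SELECT id, energy, time, cfd_time FROM {table}")
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        for id_, energy, time, cfd_time in rows:
            # int() drops the fractional part so the value fits the bigint column.
            writer.writerow([id_, energy, int(time), cfd_time])
```

The resulting file could then be loaded from `psql` with something like `\copy data FROM 'data.csv' WITH (FORMAT csv)`.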

# December 17, 2018
## Discussion w/ George
I had a discussion w/ George about the databases and how to set up the tables. I have a couple of options here. Having just a single big table might work for a while. His recommendation was that we have several tables. One table has {timestamps, crate, module, channel}. The other table has all the details. That allows us to do quick lookups for what we need from the main table and then cross reference them to the details table.
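
A rough sketch of that layout, using an in-memory SQLite database so it runs anywhere. The table names, the `event_id` key linking the two tables, and the choice of `energy`/`cfd_time` as the "details" are my assumptions, not something we settled on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Narrow lookup table: one row per event (event_id is a hypothetical key).
    CREATE TABLE events (
        event_id  INTEGER PRIMARY KEY,
        time      INTEGER NOT NULL,
        crate     INTEGER NOT NULL,
        module    INTEGER NOT NULL,
        channel   INTEGER NOT NULL
    );
    -- Wide table with everything else, keyed by the same event_id.
    CREATE TABLE details (
        event_id  INTEGER PRIMARY KEY REFERENCES events(event_id),
        energy    REAL NOT NULL,
        cfd_time  REAL NOT NULL
    );
""")
# Crate and module numbers here are placeholders.
conn.execute("INSERT INTO events VALUES (?, ?, ?, ?, ?)", (1, 19623818308976, 0, 0, 10))
conn.execute("INSERT INTO details VALUES (?, ?, ?)", (1, 3401.897553, 39247636617952.7))


def details_for_channel(conn, channel):
    """Look up events in the narrow table, then cross reference the details."""
    return conn.execute(
        "SELECT e.time, d.energy, d.cfd_time "
        "FROM events AS e JOIN details AS d USING (event_id) "
        "WHERE e.channel = ?",
        (channel,),
    ).fetchall()
```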

## Installing PostgreSQL for Windows
I'm going to install the database on Windows as well. That way I can connect to the Linux box without issues. I'm thinking that we might be able to connect w/o the database installed by using Python. Let's check.

## Testing the database connection

```python
import keyring
import psycopg2
import yaml

# Connection settings live in a YAML file; the password comes from the system keyring.
with open('../cfg.yaml') as f:
    cfg = yaml.safe_load(f)

conn = psycopg2.connect(user=cfg['username'], password=keyring.get_password(cfg['host'], cfg['username']),
                        host=cfg['host'], port=cfg['port'], database=cfg['db'])
cursor = conn.cursor()
cursor.execute('select * from data where id=14 limit 20')
print(cursor.fetchall())
```

[(14, 7576.572214, 19693238423165, 39386476846330.4), (14, 417.343303, 19693238484790, 39386476969580.0), (14, 305.573827, 19693238637156, 39386477274312.0), (14, 80.61771, 19693238698727, 39386477397454.5), (14, 342.357624, 19693238755370, 39386477510740.1), (14, 326.773797, 19693238906728, 39386477813456.2), (14, 1738.296982, 19693238914296, 39386477828592.8), (14, 232.901547, 19693238998831, 39386477997662.1), (14, 122.707598, 19693239018862, 39386478037724.8), (14, 135.301811, 19693239025359, 39386478050718.4), (14, 229.683297, 19693239082540, 39386478165080.2), (14, 561.489045, 19693239112649, 39386478225298.6), (14, 584.930979, 19693239202662, 39386478405324.2), (14, 1840.136679, 19693239228775, 39386478457550.2), (14, 6171.735934, 19693239298645, 39386478597290.2), (14, 160.691131, 19693239301822, 39386478603644.3), (14, 1532.055122, 19693239349315, 39386478698630.7), (14, 218.15213, 19693239404802, 39386478809604.4), (14, 869.480467, 19693239502743, 39386479005486.1), (14, 95.5

**HA!! It works!** Now we just need to figure out how to make this all more efficient. Right now it takes a long time to get the data out for a single clover channel. I'm going to try copying them out to another table and executing the same query.

## Copy values from one table to another
```sql
CREATE TABLE id04(id smallint NOT NULL, energy DOUBLE PRECISION NOT NULL, time bigint, cfd_time DOUBLE PRECISION NOT NULL);
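-- general form: INSERT INTO Table2 SELECT * FROM Table1 WHERE [Conditions]
-- here, assuming id04 is meant to hold the rows for channel id 4:
INSERT INTO id04 SELECT * FROM data WHERE id = 4;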
```
Whelp, that didn't fare much better than the previous attempt. 

# December 29, 2018
## Test tables
Our test trigger time will be 19623818309088. We can find the gated gammas with

```python
# Reuses the psycopg2 connection `conn` opened in the connection test.
cursor = conn.cursor()
cursor.execute('select * from data where time between 19623818309088-126 and 19623818309088-63;')
print(cursor.fetchall())
```

[(10, 3401.897553, 19623818308976, 39247636617952.7), (9, 1284.855562, 19623818308966, 39247636617932.0)]
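
To run the same gate for other trigger times, the window arithmetic can be pulled into a small helper. This is only a sketch: `gate_query` and its `early`/`late` parameter names are made up here, with defaults taken from the 126 and 63 offsets used above.

```python
def gate_query(trigger_time, early=126, late=63):
    """Return (sql, params) selecting hits between trigger-early and trigger-late."""
    sql = "SELECT * FROM data WHERE time BETWEEN %s AND %s"
    return sql, (trigger_time - early, trigger_time - late)


sql, params = gate_query(19623818309088)
# With an open psycopg2 cursor this would be: cursor.execute(sql, params)
```

Passing the bounds as parameters rather than formatting them into the string lets psycopg2 handle the quoting.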


We'll create a table to hold results:
```sql
create table gated (id int, ge_energy double precision, ge_time bigint);
```

# December 30, 2018
## Slow as dirt
Things still are not too performant. I'm looking at 1000 trigger times and it takes about 5 min to generate our list. According to [this](https://stackoverflow.com/questions/8134602/psycopg2-insert-multiple-rows-with-one-query) Stack Overflow thread, we could see an improvement by generating the query ourselves and executing it.
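
A minimal sketch of what building the query ourselves could look like: one multi-row `INSERT` per batch instead of one statement per row. The target is the `gated` table from the previous entry; the helper name `batched_inserts` and the batch size are assumptions for illustration.

```python
def batched_inserts(rows, batch_size=1000):
    """Yield (sql, params) pairs, one multi-row INSERT per batch of rows."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # One "(%s, %s, %s)" group per row in this batch.
        placeholders = ", ".join(["(%s, %s, %s)"] * len(batch))
        sql = f"INSERT INTO gated (id, ge_energy, ge_time) VALUES {placeholders}"
        # Flatten [(id, energy, time), ...] into one parameter tuple.
        params = tuple(value for row in batch for value in row)
        yield sql, params
```

With an open cursor, each pair would be run as `cursor.execute(sql, params)`, followed by a single `conn.commit()`. `psycopg2.extras.execute_values` does something similar and may be worth trying as well.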
## Generating our own query
That does seem to have sped things up a little bit. Let's try giving it the whole dataset. Generating the full statement list for 10 threads takes about 20 seconds each. Executing the queries ended in failure. I will need to determine a different way to execute this query. The error message says:
> psycopg2.OperationalError: no connection to the server



# December 31, 2018
## Simple Kafka Consumer
We're going to start working on our simple Kafka Consumer/Producer while it's fresh in my mind.

# January 1, 2019
## Setting up Kafka on Windows
Following these [instructions](https://kafka.apache.org/quickstart). Using [Java SE Runtime 1.8.192](https://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html)

```
# Starts the Zookeeper server 
D:\Programs\kafka_2.11-2.1.0\bin\windows\zookeeper-server-start.bat D:\Programs\kafka_2.11-2.1.0\config\zookeeper.properties

# Starts the actual Kafka server with the provided properties file.
D:\Programs\kafka_2.11-2.1.0\bin\windows\kafka-server-start.bat D:\Programs\kafka_2.11-2.1.0\config\server.properties

# Created a topic named test with replication 1, and a single partition. 
D:\Programs\kafka_2.11-2.1.0\bin\windows\kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

# Starts the console based producer
D:\Programs\kafka_2.11-2.1.0\bin\windows\kafka-console-producer.bat --broker-list localhost:9092 --topic test

# Starts the console based consumer
D:\Programs\kafka_2.11-2.1.0\bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic test --from-beginning
```
**NOTE: We need at least as many Kafka brokers (i.e. running instances of Kafka) as our desired replication factor.**

I created a topic called daq to start testing communication between the producer and consumer. 

# January 2, 2019
## SQL efficiency
We could find overlapping events more efficiently by doing a join on the main data table. The image below describes this process at a high level.
![ansi join](ansi-join.png)
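
A small sketch of the join idea, run against an in-memory SQLite database so it can be tried without the server. The `triggers` table is hypothetical (one trigger time per row), the gate offsets are the 126 and 63 from the December 29 entry, and the third `data` row is made up to show a hit that falls outside the gate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, energy REAL, time INTEGER, cfd_time REAL)")
# Hypothetical table holding one trigger time per row.
conn.execute("CREATE TABLE triggers (time INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", [
    (10, 3401.897553, 19623818308976, 39247636617952.7),
    (9, 1284.855562, 19623818308966, 39247636617932.0),
    (14, 100.0, 19623818309080, 0.0),  # made-up hit outside the gate
])
conn.execute("INSERT INTO triggers VALUES (19623818309088)")

# One join replaces one round trip per trigger time.
gated = conn.execute("""
    SELECT d.id, d.energy, d.time
    FROM triggers AS t
    JOIN data AS d
      ON d.time BETWEEN t.time - 126 AND t.time - 63
    ORDER BY d.time
""").fetchall()
print(gated)
```

The same `JOIN ... ON ... BETWEEN` should carry over to PostgreSQL, though how well it performs on the hypertable is something I'd still need to measure.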

# January 5, 2019
## Consumer Working
The consumer is now consuming. We needed to set the bootstrap server to `localhost` so that it knew where the topics were located. 
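
For reference, a minimal sketch of a consumer with the bootstrap server set explicitly. It assumes the third-party `kafka-python` package, the `daq` topic from the January 1 entry, and the default broker port 9092; the helper names are made up, and this is not necessarily how our consumer is written.

```python
def consumer_settings(host="localhost", port=9092):
    """Build the keyword arguments that tell the consumer where the broker is."""
    return {
        "bootstrap_servers": f"{host}:{port}",
        # Roughly what --from-beginning does for the console consumer.
        "auto_offset_reset": "earliest",
    }


def consume(topic="daq"):
    """Print every message on `topic`; blocks until interrupted."""
    # Imported here so the settings helper works without kafka-python installed.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(topic, **consumer_settings())
    for message in consumer:
        print(message.value)
```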