# Beer Cooling Modeling - Solution

This notebook uses the same dataset as the ADF Prediction Demo notebook, but models the cooling phase instead. As with the fermentation stages in the ADF Prediction notebook, we need to identify the cooling stages and compute elapsed times so that the data is aligned for regression and comparison.

![Beer Cooling](https://academicpi.blob.core.windows.net/software/beer-cooling-setting.png)

In [1]:
### For interaction with OCS
from ocs_datascience import OCSClient, timer

import configparser
import datetime as dt
from dateutil import parser
import time
from enum import Enum
from pathlib import Path

import plotly.graph_objs as go
import plotly.io as po
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

pd.set_option('display.expand_frame_repr', False)
pd.options.mode.chained_assignment = None

The main function is `compute_cooling_predictions` with the following specification: 

### Input parameters:

* Brand of beer
* Which set of temperature sensors to use: bottom, middle, or top
* Training days: how many days (starting at 2017-03-17) to consider for cooling curve regression

### Output: 

* Data used for regression
* Data for regression curve 
* Number of fermentations found (must be at least 1)

### Function steps (the numbers are referred to in the function body)

| Step # | Function called | Description |
|-------|-----------------|:-------------:|
| 0 | `get_all_brand_data` | get data for all 6 fermenters (this step happens before calling `compute_cooling_predictions`) |
| 1 | none | keep only the data of the brand given as input |
| 2 | `brand_df_cleanup` | clean data: remove bad values, keep only the right stages |
| 3 | `fermentation_starts` | identify all fermentation starts |
| 4 | `cooling_data_extraction` | build a dataframe with all cooling data |
 
All possible beer brands are:
* Realtime Hops
* 5450
* Alistair
* Kerberos
* Red Wonder 
* Grey Horse 

We'll start with the following input parameters: 

* Brand: Realtime Hops
* Temperature sensor: Middle
* Training days: 20 days starting at 2017-03-17T07:00
* Interval: 1 minute (00:01:00)

## Your task 

Function `compute_cooling_predictions` in the next cell contains `TODO` items in comments. Complete each of them to get a working notebook. If your code is correct, you should see the following graph appear at the bottom of this notebook:

![Beer Cooling Prediction](https://academicpi.blob.core.windows.net/software/beer-cooling-prediction.png)

## Function `compute_cooling_predictions`

In [2]:
# %%debug
# import pdb
# from pdb import set_trace as bp
@timer
def compute_cooling_predictions(all_brands_df, brand, temp_sensors, training_days, interval='00:01:00'):
    """
    Input parameters:
    * brand to consider
    * temperature sensor position to use for computation
    * number of days to compute prediction parameters
    """
    # All possible brands, start with Realtime Hops 
    # ['5450' 'Bad Input' 5450 nan 'Alistair' 'Kerberos' 'Realtime Hops'
    #  'Red Wonder' 'Grey Horse']
    use_temp_position = {
        Pos.bottom: temp_sensors['bottom'],
        Pos.middle: temp_sensors['middle'],
        Pos.top: temp_sensors['top']
    }
    # STEP 1: Keep only data for input brand
    # TODO: write filter expression for all_brands_df, return result in brand_df
    # 
    # =========== STUDENT BEGIN ==========
    brand_df = all_brands_df[all_brands_df['Brand'] == brand] 
    # =========== STUDENT END ==========
    # 
    # STEP 2: clean data: remove bad values, keep only right stages
    # TODO: complete code block within function brand_df_cleanup 
    # 
    brand_status_df = brand_df_cleanup(brand_df)
    # 
    # STEP 3: identify all fermentation starts
    # TODO: complete code of function fermentation_starts 
    #
    fermentation_df = fermentation_starts(brand_status_df)
    #
    if len(fermentation_df) == 0:  
        raise Exception('!!! No fermentation data for brand:', brand)
    else:
        print(f'  @@@ Number of fermentation for brand {brand}: {len(fermentation_df)}')
    # 
    # STEP 4: build a dataframe with all cooling data 
    # TODO: complete code of function cooling_data_extraction
    #
    cooling_data = cooling_data_extraction(fermentation_df, brand_status_df, use_temp_position)
    # print(cooling_data)
    # 
    # Verify that it was possible to extract the data for a complete cooling phase 
    # 
    if len(cooling_data) == 0:
        raise Exception('!!! Error, no cooling data for brand:', brand)
    else:       
        ############### CURVE FIT REGRESSION BEGIN - DO NOT CHANGE #############
        # Get all cooling data in a single dataframe
        cool_df = pd.concat(cooling_data)

        # sort the temperatures in a descending fashion
        cool_df = cool_df.sort_values(by=['temperature'], ascending=False)

        # get the y value for the x, this will be used in curve fitting
        cool_df['temp_y'] = cool_df['temperature'].shift(-1)
        cool_df = cool_df[:-1]  # drop the last row

        # Select first label which has cooling data
        cool_df_training = pd.DataFrame()
        lbl = 0
        while cool_df_training.empty:
            cool_df_training = cool_df[cool_df.label == lbl]
            lbl += 1
            
        x1_train = cool_df_training.temperature.values  # training temperature feature
        x2_train = cool_df_training.Volume.values.astype(float)  # training Volume feature
        x = [x1_train, x2_train]  # [temperature, volume]

        # Training of non-linear least squares model
        # Nonlinear curve-fitting pass a tuple in curve fitting
        popt, pcov = curve_fit(temperature_profile, x, cool_df_training.temp_y.values) 
        
        a = popt[0]  # get the coefficient a (alpha) in the model
        b = popt[1]  # get the coefficient b (beta) in the model
 
        # Get the initial point of all temperature curves
        # y_first = [x1_train[0] + i for i in range(-8, 9, 4)]  # plot on either side of the initial temperature
        y_first = [x1_train[0]]  # if you want to plot a single data field

        # Compute the prediction for each individual start temperature
        for y_predicted in y_first:
            y_pred = [y_predicted]
            cool_df_training = cool_df_training.sort_values(by=['tsc'])
            for i in range(1, len(x2_train)):
                y_predicted = y_predicted * (1 + (a / x2_train[i])) - (a * b / x2_train[i])
                y_pred.append(y_predicted)
                
        ############### CURVE FIT REGRESSION END - DO NOT CHANGE #############

    return cool_df, y_pred, cool_df_training, y_first[0], len(fermentation_df), len(cooling_data)

### Standard OCS initialization code

In [3]:
config = configparser.ConfigParser()
config.read('config.ini')

ocs_client = OCSClient(config.get('Access', 'ApiVersion'),config.get('Access', 'Tenant'), config.get('Access', 'Resource'), 
                     config.get('Credentials', 'ClientId'), config.get('Credentials', 'ClientSecret'))

namespace_id = config.get('Configurations', 'Namespace')
headers = ocs_client.authorization_headers(namespace_id)
headers

{'Authorization': 'bearer <access token redacted>',
 'Content-type': 'application/json',
 'Accept': 'text/plain',
 'Request-Timeout

### Auxiliary variables to make code more readable

In [4]:
# Sensor positions 
class Pos(Enum):
    bottom = 1
    middle = 2
    top = 3

# Legend: 
# TIC == Temperature Indicator Controller, PV == Process Value

# TIC PV column names 
TIC_PV_COLUMNS = ['Bottom TIC PV', 'Middle TIC PV', 'Top TIC PV']
# Dictionary of column names indexed by position 
process_value = {Pos.bottom: 'Bottom TIC PV', Pos.middle: 'Middle TIC PV', Pos.top: 'Top TIC PV'}

# TIC OUT column names 
TIC_OUT_COLUMNS = ['Bottom TIC OUT', 'Middle TIC OUT', 'Top TIC OUT'] 

# Digital states - present in Dataview results, indicates a problem
BAD_INPUT = 'Bad Input'
IO_TIMEOUT = BAD_INPUT
COMM_FAIL = BAD_INPUT
# IO_TIMEOUT = 'I/O Timeout'
# COMM_FAIL = 'Comm Fail'

# All stages associated to the full cooling phase 
POST_FERMENTATION_STAGES = ['Fermentation', 'Free Rise', 'Diacetyl Rest', 'Cooling']

### STEP 0 Cell: get fermenter vessels data 

Complete the function `get_all_brand_data` using what you've seen in the ADF Prediction notebook.
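
The time window for the request is the start timestamp plus the number of training days. A minimal sketch of that computation, using only the standard library instead of `dateutil` (the helper name `end_of_window` is hypothetical and not part of the notebook):

```python
import datetime as dt


def end_of_window(start_timestamp: str, num_days: int) -> str:
    """Return the ISO 8601 timestamp that lies num_days after start_timestamp."""
    start_time = dt.datetime.fromisoformat(start_timestamp)
    return (start_time + dt.timedelta(days=num_days)).isoformat()


# 20 training days starting at 2017-03-17T07:00
print(end_of_window('2017-03-17T07:00', 20))  # 2017-04-06T07:00:00
```

The result is the value that would be passed as the end of the requested range, together with the unchanged start timestamp and the interval.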

In [5]:
@timer
def get_all_brand_data(num_days, start_timestamp, interval):
    #
    # 
    # TODO: complete code to return a single dataframe with all the required data 
    #   
    # =========== STUDENT BEGIN ==========
    start_time = parser.parse(start_timestamp)
    delta_time = dt.timedelta(days=num_days)
    end_timestamp = (start_time + delta_time).isoformat()
    df = ocs_client.get_all_fermenters_dataviews(start_timestamp, end_timestamp, interval)
    # =========== STUDENT END ==========
    
    return df 

# Test code 
# all_brands_df = get_all_brand_data(20, '2017-03-17T07:00', '00:01:00')
# all_brands_df

### STEP 2 Cell: clean data 

Complete each `TODO` section in the function `brand_df_cleanup`
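
Both cleanup steps come down to boolean-mask filtering. The sketch below applies the same idea to a small made-up dataframe (the values are illustrative and not taken from the dataset): rows holding the `'Bad Input'` marker are dropped, only the wanted stages are kept, and the remaining temperature column is converted to `float`.

```python
import pandas as pd

BAD = 'Bad Input'
toy = pd.DataFrame({
    'Status': ['Fermentation', 'Cooling', BAD, 'Maturation', 'Cooling'],
    'Middle TIC PV': ['63.0', BAD, '61.5', '30.2', '58.1'],
})

# Keep rows where no column holds the bad-input marker
clean = toy[~toy.isin([BAD]).any(axis=1)]
# Keep only the stages of interest
clean = clean[clean['Status'].isin(['Fermentation', 'Cooling'])].copy()
# A column that contained the marker holds strings; convert what is left to float
clean['Middle TIC PV'] = clean['Middle TIC PV'].astype(float)

print(clean)
```

A mask is used here instead of `drop(...index)` as a design choice: `drop` removes every row that shares a dropped index label, so the two approaches only agree when the index labels are unique.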

In [6]:
@timer
def brand_df_cleanup(brand_df):
    # TODO: Remove all data points with bad input.
    # All the following columns can have value BAD_INPUT: 
    #   Brand, Status, Bottom TIC PV, Middle TIC PV, Top TIC PV  
    #     
    brand_df = brand_df.drop(brand_df[brand_df['Brand'] == BAD_INPUT].index)
    brand_df = brand_df.drop(brand_df[brand_df['Status'] == BAD_INPUT].index)
    brand_df = brand_df.drop(brand_df[brand_df['Top TIC PV'] == BAD_INPUT].index)
    # =========== STUDENT BEGIN ==========
    brand_df = brand_df.drop(brand_df[brand_df['Middle TIC PV'] == BAD_INPUT].index) 
    brand_df = brand_df.drop(brand_df[brand_df['Bottom TIC PV'] == BAD_INPUT].index)
    # =========== STUDENT END ==========

    # Keep only fermentation or post-fermentation stages
    brand_status_df = brand_df[brand_df['Status'].isin(POST_FERMENTATION_STAGES)]

    # Remove all data points from brand_status_df dataframe with communication issues
    # TODO: for columns in TIC_PV_COLUMNS, remove all rows with communication failures status (COMM_FAIL)
    #            and IO timeout (IO_TIMEOUT) 
    for tic_pv in TIC_PV_COLUMNS:
        # =========== STUDENT BEGIN ==========
        brand_status_df = brand_status_df.drop(brand_status_df[brand_status_df[tic_pv] == IO_TIMEOUT].index)
        brand_status_df = brand_status_df.drop(brand_status_df[brand_status_df[tic_pv] == COMM_FAIL].index)
        # =========== STUDENT END ==========
        brand_status_df[tic_pv] = brand_status_df[tic_pv].astype(float)

    return brand_status_df

### STEP 3 Cell: get the list of rows when fermentation starts 

You need to identify rows where the Status is 'Fermentation' and the previous row is not 'Fermentation'. The syntax to access the status of the previous row is:

    brand_df['Status'].shift(1)
    
Moreover it is possible to combine conditions to select dataframe rows with the syntax:

    (condition1) & (condition2)
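
As an illustration of these two pieces of syntax, here is a small sketch on a made-up `Status` column (the data is invented for the example):

```python
import pandas as pd

toy = pd.DataFrame({'Status': ['Maturation', 'Fermentation', 'Fermentation',
                               'Cooling', 'Fermentation', 'Fermentation']})

is_fermenting = toy['Status'] == 'Fermentation'
# shift(1) moves every value one row down, so each row sees the previous status
was_fermenting = toy['Status'].shift(1) == 'Fermentation'

starts = toy[(is_fermenting) & (~was_fermenting)]
print(starts.index.tolist())  # [1, 4]
```

The first row has no predecessor, so `shift(1)` yields a missing value there; a dataframe that begins in the middle of a fermentation therefore reports its first row as a start.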

In [7]:
# Return the list of rows where fermentation starts for a brand
@timer 
def fermentation_starts(brand_df):
    # =========== STUDENT BEGIN ==========
    df = brand_df[(brand_df['Status'] == 'Fermentation') & (brand_df['Status'].shift(1) != 'Fermentation')]
    # =========== STUDENT END ==========
    fermentation_starts = [row for _, row in df.iterrows()]
    return fermentation_starts

### STEP 4: Extract all rows related to cooling phase

In [8]:
@timer
def cooling_data_extraction(fermentation_df, brand_status_df, use_temp_position):
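    # Provides the corrected time offset post fermentation.
    # Note: `brand` is not a parameter of this function; it is read from the
    # notebook-level variable set in the main section, so that cell must run first.
    brand_status_df = fermentation_times(brand_status_df, fermentation_df, brand)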

    for tic_out in TIC_OUT_COLUMNS:
        brand_status_df[tic_out] = pd.to_numeric(brand_status_df[tic_out], errors='coerce')
        
    # condition for it to be in cooling phase
    # TODO: the condition is that 'Top TIC OUT', 'Middle TIC OUT' and 'Bottom TIC OUT' are above 99.99
    #          
    # =========== STUDENT BEGIN ==========
    cool_stage = brand_status_df[
        (brand_status_df['Top TIC OUT'] > 99.99) &
        (brand_status_df['Middle TIC OUT'] > 99.99) &
        (brand_status_df['Bottom TIC OUT'] > 99.99)
    ]
    # =========== STUDENT END ==========

    # get the first cooling step for each fermentation stage
    cooling_start_frame = cool_stage.groupby('label').first().reset_index()

    # Collect data only for the selected temperature position 
    cooling_data = []
    for position in use_temp_position:
        if use_temp_position[position]:
            cooling_data.append(get_cooling_frames(cool_stage, cooling_start_frame, position))

    return cooling_data

## Legacy code cell --- do not change unless you know what you're doing

In [9]:
def get_cooling_frames(cool_stage, cooling_start_frame, position): 
    start_time = 0
    end_time = 3.5  # in days, longest cooling period possible

    cooling_column = 'Time since cooling'
    cool_stage.loc[:, cooling_column] = -1
    cooling_stage = pd.DataFrame()
    if len(cooling_start_frame) > 0:
        for index, row in cooling_start_frame.iterrows():
            label = row['label2']  # get the unique label
            cool_start_time = row['tsf3']  # get the unique start of cooling to each label
            # Each unique label is associated with a fermentation stage for a brand
            mask = (cool_stage['label2'] == label)  # get those rows with that same label
            cool_stage_valid = cool_stage[mask]

            tic_pv = process_value[position]
            # get only frames whose process value at the selected position is greater than 50
            if float(row[tic_pv]) > 50:  # and keep [CF]
                # subtract the start of cooling from each individual cooling step
                cool_stage.loc[mask, cooling_column] = cool_stage_valid['tsf3'] - cool_start_time

                cool_stage_current = cool_stage[(cool_stage[cooling_column] >= start_time) &
                                                (cool_stage[cooling_column] < end_time)]
                # make sure the labels are all positive, make sure these are post fermentation stages
                cool_stage_current = cool_stage_current[cool_stage_current['label'] >= 0]
                # get only the max of the post fermentation stages
                cool_stage_current[tic_pv] = cool_stage_current.groupby([cooling_column])[tic_pv].transform(max)
                cool_stage_current = cool_stage_current.rename(columns={tic_pv: 'temperature', cooling_column: 'tsc'})
                cool_stage_current = cool_stage_current[['temperature', 'tsc', 'Brand', 'label', 'Volume']]
                cooling_stage = cool_stage_current
    else:
        print("!!! Sorry no cooling stage found!")

    return cooling_stage

# Get the time since fermentation
@timer
def fermentation_times(brand_frame, fermentation_frames, brand):
    brand_frame['tsf2'] = 100000  # initializing the temp variables
    brand_frame['tsf3'] = 100000  # init the temp variables
    brand_frame['label'] = -1  # label is to label all fermentation processes
    count = 0
    for index, fermentation_frame in enumerate(fermentation_frames):
        fermentation_time = fermentation_frame['Timestamp']
        brand_frame['label'] = brand_frame['Timestamp'].apply(
            lambda x: count if pd.Timestamp(x) >= pd.Timestamp(fermentation_time) else -1)
        brand_frame['tsf2'] = brand_frame['Timestamp'].apply(
            lambda x: ((pd.Timestamp(x)) - (pd.Timestamp(fermentation_time))).total_seconds() if pd.Timestamp(
                x) >= pd.Timestamp(fermentation_time) else 1000000000)
        brand_frame['tsf2'] = brand_frame['tsf2'].apply(lambda x: x / 86400)  # convert time to days
        if count > 0:
            # the min of the two is the actual time since fermentation start
            brand_frame['tsf2'] = brand_frame[['tsf2', 'tsf3']].min(axis=1)
            mask = (brand_frame['label'] == -1)
            brand_frame_valid = brand_frame[mask]
            brand_frame.loc[mask, 'label'] = brand_frame_valid['label2']

        brand_frame['tsf3'] = brand_frame['tsf2']
        brand_frame['label2'] = brand_frame['label']
        count += 1

    # drop rows that precede the first fermentation start (their label2 is still -1)
    brand_frame = brand_frame[(brand_frame['tsf3'] <= 100000) & (brand_frame['label2'] >= 0)]

    return brand_frame

## Temperature equation

The cell below implements this equation:

![Cooling equation](https://academicpi.blob.core.windows.net/software/cooling-equation.png)
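
Reading the formula off the implementation of `temperature_profile` in the next code cell, the equation can be written as follows. The symbols $T_k$ (temperature at step $k$) and $V$ (volume) are notation chosen here and may differ from the image:

$$
T_{k+1} = \left(1 + \frac{\alpha}{V}\right) T_k - \frac{\alpha\,\beta}{V}
        = T_k + \frac{\alpha}{V}\,\bigl(T_k - \beta\bigr)
$$

Subtracting $\beta$ from both sides gives $T_{k+1} - \beta = \left(1 + \frac{\alpha}{V}\right)(T_k - \beta)$, so for $-1 < \alpha / V < 0$ each step moves the temperature closer to $\beta$ without overshooting it.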

The curve fitting algorithm finds the values of `a` (alpha) and `b` (beta).
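
To see what the fit does, the sketch below generates a synthetic cooling run from made-up coefficients and checks that `scipy.optimize.curve_fit` recovers them. The coefficient values, the constant volume, and the starting guess `p0` are assumptions chosen for the illustration; the local `temperature_profile` is an equivalent restatement of the notebook's function.

```python
import numpy as np
from scipy.optimize import curve_fit


def temperature_profile(x, a, b):
    # x holds two rows: current temperature and volume
    temperature, volume = x
    return (1 + a / volume) * temperature - a * b / volume


# Synthetic cooling run (made-up values, not from the dataset)
a_true, b_true, volume = -0.5, 30.0, 700.0
temps = [65.0]
for _ in range(2000):
    temps.append(temperature_profile((temps[-1], volume), a_true, b_true))
temps = np.array(temps)

x = np.vstack([temps[:-1], np.full(len(temps) - 1, volume)])  # [temperature, volume]
y = temps[1:]                                                  # temperature one step later

(a_fit, b_fit), _ = curve_fit(temperature_profile, x, y, p0=[-1.0, 20.0])
print(f'a = {a_fit:.4f}, b = {b_fit:.4f}')
```

With noise-free data the fitted values should match the generating coefficients closely; real sensor data adds scatter, so the fit is then only a least-squares estimate.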

In [10]:
def temperature_profile(x, a, b):
    # Unpack x values
    temperature = x[0]
    volume = x[1]
    return np.multiply(1 + np.multiply(a, np.reciprocal(volume)), temperature) - a * b * np.reciprocal(volume)

---
---
# Main section 
---
---
Once all the functions above are fully implemented, use the cells below to:

1. Set the input parameters
2. Read the input dataframe
3. Call `compute_cooling_predictions`
4. Plot result data

Note that each time you touch the code of a function in a cell, you have to execute that cell for that code to become effective. You can come back here and then rerun the 1-2-3-4 sequence to check the new result. 

In [22]:
# Selected brand
brand = '3' # 'Realtime Hops'
# Temperature sensor position to consider
temp_sensors = {'bottom': False, 'middle': True, 'top': False}
training_days = 20
interval = '00:01:00'

### Development tip

You've seen that requesting a Dataview result takes some time. Developing a notebook involves running code over and over, so you'll want to avoid long-running steps when possible. This is why you can run the cell below once and keep the resulting dataframe in the variable `all_brands_df`. As long as you don't change any of its input parameters, `all_brands_df` remains valid and can be reused when you run the main function `compute_cooling_predictions` below.
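
The same idea can be carried across kernel restarts by caching the dataframe on disk. The following is a minimal sketch, assuming a CSV cache is acceptable; the helper `load_or_fetch` and the stand-in `slow_fetch` are hypothetical names introduced for this example:

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_or_fetch(cache_path, fetch):
    """Return the cached dataframe if the file exists, else call fetch() and cache it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        return pd.read_csv(cache_path)
    df = fetch()
    df.to_csv(cache_path, index=False)
    return df


# Stand-in for the slow Dataview request; records how often it is called
calls = []


def slow_fetch():
    calls.append(1)
    return pd.DataFrame({'Timestamp': ['2017-03-17T07:00:00Z'], 'Volume': [716.566]})


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / 'all_brands_df.csv'
    first = load_or_fetch(cache, slow_fetch)   # fetches and writes the cache
    second = load_or_fetch(cache, slow_fetch)  # reads the cache, no fetch

print(len(calls))  # 1
```

In the notebook, `fetch` could be a lambda wrapping the call to `get_all_brand_data`. Two caveats apply: a CSV round trip returns timestamps as strings, and the cache file has to be deleted whenever the input parameters change.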

In [13]:
all_brands_df = get_all_brand_data(training_days, '2017-03-17T07:00', interval)

Urls: ['https://dat-b.osisoft.com/api/v1-preview/Tenants/65292b6c-ec16-414a-b583-ce7ae04046d4/Namespaces/fermenter__vessels/Dataviews/DV_FV31/preview/interpolated?startIndex=2017-03-17T07:00&endIndex=2017-04-06T07:00:00&interval=00:01:00&form=csvh&maxcount=200000', 'https://dat-b.osisoft.com/api/v1-preview/Tenants/65292b6c-ec16-414a-b583-ce7ae04046d4/Namespaces/fermenter__vessels/Dataviews/DV_FV32/preview/interpolated?startIndex=2017-03-17T07:00&endIndex=2017-04-06T07:00:00&interval=00:01:00&form=csvh&maxcount=200000', 'https://dat-b.osisoft.com/api/v1-preview/Tenants/65292b6c-ec16-414a-b583-ce7ae04046d4/Namespaces/fermenter__vessels/Dataviews/DV_FV33/preview/interpolated?startIndex=2017-03-17T07:00&endIndex=2017-04-06T07:00:00&interval=00:01:00&form=csvh&maxcount=200000', 'https://dat-b.osisoft.com/api/v1-preview/Tenants/65292b6c-ec16-414a-b583-ce7ae04046d4/Namespaces/fermenter__vessels/Dataviews/DV_FV34/preview/interpolated?startIndex=2017-03-17T07:00&endIndex=2017-04-06T07:00:00&int

In [14]:
all_brands_df.to_csv('all_brands_df.csv', index=False)
all_brands_df

In [17]:
all_brands_df = all_brands_df.sort_values(['Timestamp'])
all_brands_df

Unnamed: 0,Timestamp,Volume,Top TIC PV,Top TIC OUT,Plato,Middle TIC PV,Middle TIC OUT,FV Full Plato,Fermentation ID,Brand,Bottom TIC PV,Bottom TIC OUT,ADF,Status
0,2017-03-17 07:00:00+00:00,716.566,29.6131516,0,Bad Input,29.35638,0,Bad Input,Fermentor 31201731179653,4,29.8845711,10.9353266,Bad Input,12.0
0,2017-03-17 07:00:00+00:00,0,102.005066,0,Bad Input,100.506538,0,Bad Input,Fermentor 33201731511870,0,99.50749,0,Bad Input,5.0
0,2017-03-17 07:00:00+00:00,715.3655,62.77387,0,9.330639,63.0009155,0,Bad Input,Fermentor 36201731679561,3,63.36019,71.0848846,0.0238323547,7.0
0,2017-03-17 07:00:00+00:00,702.2655,29.756073,0,Bad Input,30.270546,51.11878,Bad Input,FV35201612449149,5,29.78233,0,Bad Input,12.0
0,2017-03-17 07:00:00+00:00,721.235,30.0258846,15.2599144,Bad Input,28.4155426,0,Bad Input,FV342016112860676,21,30.0720634,9.168572,Bad Input,13.0
0,2017-03-17 07:00:00+00:00,711.9072,29.5649452,0,Bad Input,29.7733288,0,Bad Input,FV322016113055113,19,29.6059322,0,Bad Input,12.0
1,2017-03-17 07:01:00+00:00,702.2655,29.74762,0,Bad Input,30.26469,47.51,Bad Input,FV35201612449149,5,29.8157082,0,Bad Input,12.0
1,2017-03-17 07:01:00+00:00,721.235,30.0151138,9.275634,Bad Input,28.4295788,0,Bad Input,FV342016112860676,21,30.0653286,11.7036619,Bad Input,13.0
1,2017-03-17 07:01:00+00:00,711.9072,29.56427,0,Bad Input,29.76849,0,Bad Input,FV322016113055113,19,29.6315823,0,Bad Input,12.0
1,2017-03-17 07:01:00+00:00,0,101.966988,0,Bad Input,100.48127,0,Bad Input,Fermentor 33201731511870,0,99.45587,0,Bad Input,5.0


In [19]:
rh_df = all_brands_df[all_brands_df['Brand'] == 'Realtime Hops']
rh_df

Unnamed: 0,Timestamp,Volume,Top TIC PV,Top TIC OUT,Plato,Middle TIC PV,Middle TIC OUT,FV Full Plato,Fermentation ID,Brand,Bottom TIC PV,Bottom TIC OUT,ADF,Status


In [20]:
all_brands_df.Brand.unique()

array(['4', '0', '3', '5', '21', '19', 'Bad Input', '14'], dtype=object)

In [21]:
rh_df = all_brands_df[all_brands_df['Brand'] == '3']
rh_df

In [32]:
rh_df[rh_df.Status == 7.0]

Unnamed: 0,Timestamp,Volume,Top TIC PV,Top TIC OUT,Plato,Middle TIC PV,Middle TIC OUT,FV Full Plato,Fermentation ID,Brand,Bottom TIC PV,Bottom TIC OUT,ADF,Status
0,2017-03-17 07:00:00+00:00,715.3655,62.77387,0,9.330639,63.0009155,0,Bad Input,Fermentor 36201731679561,3,63.36019,71.0848846,0.0238323547,7.0
1,2017-03-17 07:01:00+00:00,715.3655,62.77585,0,9.340824,63.01376,0,Bad Input,Fermentor 36201731679561,3,63.2959747,65.7217255,0.0240245778,7.0
2,2017-03-17 07:02:00+00:00,715.3655,62.7777863,0,9.351008,63.02634,0,Bad Input,Fermentor 36201731679561,3,63.23536,60.3585625,0.0242167991,7.0
3,2017-03-17 07:03:00+00:00,715.3655,62.7797546,0,9.361194,63.03913,0,Bad Input,Fermentor 36201731679561,3,63.2,52.9479828,0.0244090222,7.0
4,2017-03-17 07:04:00+00:00,715.3655,62.7812958,0,9.371378,63.04912,0,Bad Input,Fermentor 36201731679561,3,63.2,43.0350075,0.0246012434,7.0
5,2017-03-17 07:05:00+00:00,715.3655,62.7828255,0,9.381563,63.059063,0,Bad Input,Fermentor 36201731679561,3,63.2,33.1220322,0.0247934666,7.0
6,2017-03-17 07:06:00+00:00,715.3655,62.78554,0,9.391747,63.07666,0,Bad Input,Fermentor 36201731679561,3,63.2,23.2090588,0.0249856878,7.0
7,2017-03-17 07:07:00+00:00,715.3655,62.78759,0,9.401933,63.08997,3.29620218,Bad Input,Fermentor 36201731679561,3,63.23079,30.90664,0.02517791,7.0
8,2017-03-17 07:08:00+00:00,715.3655,62.78951,0,9.412117,63.0705032,7.33236837,Bad Input,Fermentor 36201731679561,3,63.2870026,42.55761,0.0253701322,7.0
9,2017-03-17 07:09:00+00:00,715.3655,62.7910919,0,9.422302,63.0765457,11.3685341,Bad Input,Fermentor 36201731679561,3,63.3334579,54.20858,0.0255623553,7.0


In [28]:
all_brands_df = pd.read_csv('/home/jovyan/interpolation/acadhub-beer-20days.csv')
brand = 'Realtime Hops'
all_brands_df

Unnamed: 0,Element,Timestamp,Quality,Volume,Top TIC PV,Top TIC OUT,Status,Plato,Middle TIC PV,Middle TIC OUT,FV Full Plato,Fermentation ID,Brand,Bottom TIC PV,Bottom TIC OUT,Bottom Temperature,ADF
0,Fermentor 31,2017-03-17T07:00:00Z,No Data,716.565979003906,29.613151550293,0,Maturation,5,29.3563804626465,0,14.6940097808838,Fermentor 31201731179653,Kerberos,29.8845710754395,10.9353265762329,0,0.659725
1,Fermentor 32,2017-03-17T07:00:00Z,No Data,711.9072265625,29.5649452209473,0,Maturation,3.70000004768372,29.7733287811279,0,14.897970199585,FV322016113055113,5450,29.6059322357178,0,0,0.751644
2,Fermentor 33,2017-03-17T07:00:00Z,No Data,0,102.005065917969,0,Sanitized,0,100.506538391113,0,0,Fermentor 33201731511870,,99.5074920654297,0,0,0.000000
3,Fermentor 34,2017-03-17T07:00:00Z,No Data,721.234985351563,30.0258846282959,15.2599143981934,Ready to Transfer,2.79999995231628,28.4155426025391,0,15.4039897918701,FV342016112860676,Red Wonder,30.0720634460449,9.16857242584229,0,0.818229
4,Fermentor 35,2017-03-17T07:00:00Z,No Data,702.265502929688,29.7560729980469,0,Maturation,4.80000019073486,30.2705459594727,51.1187782287598,17.0012798309326,FV35201612449149,Grey Horse,29.7823295593262,0,0,0.717668
5,Fermentor 36,2017-03-17T07:00:00Z,No Data,715.365478515625,62.773868560791,0,Fermentation,0,63.0009155273438,0,13.5041198730469,Fermentor 36201731679561,Realtime Hops,63.3601913452148,71.0848846435547,0,0.000000
6,Fermentor 31,2017-03-17T07:01:00Z,No Data,716.565979003906,29.5528469085693,0,Maturation,5,29.3788681030273,0,14.6940097808838,Fermentor 31201731179653,Kerberos,29.9314918518066,12.4457864761353,0,0.659725
7,Fermentor 32,2017-03-17T07:01:00Z,No Data,711.9072265625,29.5642700195313,0,Maturation,3.70000004768372,29.7684898376465,0,14.897970199585,FV322016113055113,5450,29.6315822601318,0,0,0.751644
8,Fermentor 33,2017-03-17T07:01:00Z,No Data,0,101.966987609863,0,Sanitized,0,100.481269836426,0,0,Fermentor 33201731511870,,99.4558715820313,0,0,0.000000
9,Fermentor 34,2017-03-17T07:01:00Z,No Data,721.234985351563,30.0151138305664,9.27563381195068,Ready to Transfer,2.79999995231628,28.4295787811279,0,15.4039897918701,FV342016112860676,Red Wonder,30.0653285980225,11.7036619186401,0,0.818229


In [29]:
cool_df, predictions, cool_df_training, start_temp, num_fermentations, num_coolings = \
    compute_cooling_predictions(all_brands_df, brand, temp_sensors, training_days, interval)

  ==> Finished 'brand_df_cleanup' in               0.2394 secs
  ==> Finished 'fermentation_starts' in            0.4510 secs
  @@@ Number of fermentation for brand Realtime Hops: 5330
  ==> Finished 'fermentation_times' in             4233.8995 secs
  ==> Finished 'cooling_data_extraction' in        4244.7132 secs
  ==> Finished 'compute_cooling_predictions' in    4245.4702 secs


## Plot prediction curve along with actual data 

Note: you can zoom into the graph to see how the prediction and data actually differ 

In [31]:
# Plotly trace for prediction curve
label = f'Prediction curve, start temp: {start_temp:5.2f}F'
prediction_trace = go.Scatter(x = cool_df_training.tsc.values, 
                              y = predictions, 
                              mode='lines', 
                              name=label, 
                              marker=dict(color='blue'))


data_trace = go.Scatter(x = cool_df.tsc.values, 
                        y = cool_df.temperature.values, 
                        mode='markers', 
                        name='Actual Data', 
                        opacity=0.4,
                        marker=dict(color='orange'))


plot_title = f'Cooling of {brand} beer, {training_days} days, {num_fermentations} fermentation(s),<br>' \
             f'interval={interval} {temp_sensors}'
layout =  go.Layout(xaxis=dict(title='Cooling time (days)'), 
                    yaxis=dict(title='Temperature (F)'), 
                    title=plot_title)

fig = go.FigureWidget(data=[prediction_trace, data_trace], layout=layout)
fig

FigureWidget({
    'data': [{'marker': {'color': 'blue'},
              'mode': 'lines',
              'name':…

## ---------- Your graph will appear above this line if no error occurred ----------

-----
-----
-----
# Extra Credits

![Beer Cooling Outlier Extra](https://academicpi.blob.core.windows.net/software/beer-cooling-prediction-extra.png)