<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Learning-Objectives---Beer-Cooling-Prediction" data-toc-modified-id="Learning-Objectives---Beer-Cooling-Prediction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Learning Objectives - Beer Cooling Prediction</a></span><ul class="toc-item"><li><span><a href="#Exercise-Overview" data-toc-modified-id="Exercise-Overview-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Exercise Overview</a></span></li><li><span><a href="#In-More-Detail" data-toc-modified-id="In-More-Detail-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>In More Detail</a></span></li></ul></li></ul></div>

---
# Learning Objectives - Beer Cooling Prediction
---
At the end of this [exercise](./NB2_EXERCISE_Cooling_Prediction.ipynb) students will be able to:
1. Retrieve process data using a cloud-based platform (OSIsoft Cloud Services or OCS) which stores real industrial sensor data.
2. Perform basic operations using Pandas, a software library widely used in data science applications.
3. Identify problems with data quality.
4. Perform data cleansing operations to ensure the quality of the data fits the analysis goals.
5. Apply regression analysis on real industrial process data.
6. Formulate strategies for cleaning and modeling the data using domain knowledge of the process.

---
## Exercise Overview
Predictive models of the process can be valuable to industry (the ADF prediction model saved Deschutes Brewery from 750k in capital investments). In this [exercise](./NB2_EXERCISE_Cooling_Prediction.ipynb), students will have to use domain knowledge of the brewing process and the physics involved (e.g. heat transfer) to model the temperature profile during the cooling stages.

While non-linear regression may be trivial for more advanced engineering students, this exercise exposes them to the realities of modeling real-world industrial data, where most of the effort goes into wrangling the data: dealing with missing or mislabeled entries and transforming the data into a format to which regression analysis can be applied.

The exercise notebook also has blocks of missing code which students will have to complete in order for the notebook to run properly and produce results; this ensures that students will interact with the notebook and pay attention to the algorithm and how the code is written.

**The exercise notebook is divided into two parts.**

In [Part 1](./NB2_EXERCISE_Cooling_Prediction.ipynb#section_1) students will create a function `compute_cooling_prediction` which does the following:
* Clean the data by throwing out irrelevant ([1a](./NB2_EXERCISE_Cooling_Prediction.ipynb#section_1a)) and bad ([1c](./NB2_EXERCISE_Cooling_Prediction.ipynb#section_1c)) entries
* Identify when a fermentation event starts ([1d](./NB2_EXERCISE_Cooling_Prediction.ipynb#section_1d))  and then compute the time elapsed since the start of fermentation ([1e](./NB2_EXERCISE_Cooling_Prediction.ipynb#section_1e))
* Identify the cooling stages ([1e](./NB2_EXERCISE_Cooling_Prediction.ipynb#section_1e)) and identify the elapsed times since the start of cooling ([1e](./NB2_EXERCISE_Cooling_Prediction.ipynb#section_1e))
* Fit the cooling data ([1a](./NB2_EXERCISE_Cooling_Prediction.ipynb#section_1a)) to a heat transfer equation ([1f](./NB2_EXERCISE_Cooling_Prediction.ipynb#section_1f)) 
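
As a rough illustration of the elapsed-time step, the sketch below finds when a stage first appears and measures time from that point. The toy data, the `Timestamp` column, and the `"Cooling"` status label are assumptions made for this example; the `tsc` column name echoes the one used later in the regression excerpt, and reading it as "time since cooling" is also an assumption.

```python
import pandas as pd

# Toy data: one row per hour, with a hypothetical status label per row.
df = pd.DataFrame(
    {
        "Timestamp": pd.date_range(
            "2020-01-01", periods=6, freq=pd.Timedelta(hours=1)
        ),
        "Status": [
            "Idle",
            "Fermentation",
            "Fermentation",
            "Cooling",
            "Cooling",
            "Cooling",
        ],
    }
)

# First timestamp at which the (hypothetical) "Cooling" status appears.
cooling_start = df.loc[df["Status"] == "Cooling", "Timestamp"].min()

# Keep the cooling rows and express elapsed time in hours since cooling began.
cool_df = df[df["Status"] == "Cooling"].copy()
cool_df["tsc"] = (cool_df["Timestamp"] - cooling_start).dt.total_seconds() / 3600.0
```

The same pattern applies to the time elapsed since the start of fermentation: locate the first row of the stage, then subtract its timestamp from every later row.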

In [Part 2](./NB2_EXERCISE_Cooling_Prediction.ipynb#section_2) students will execute `compute_cooling_prediction` and plot the results:
* Use OSIsoft Cloud Services (OCS) to obtain process data from Deschutes Brewery ([2b](./NB2_EXERCISE_Cooling_Prediction.ipynb#section_2b) and [2c](./NB2_EXERCISE_Cooling_Prediction.ipynb#section_2c))
* Execute the `compute_cooling_prediction` function created in Part 1 ([2d](./NB2_EXERCISE_Cooling_Prediction.ipynb#section_2d))
* Plot the cooling prediction curve using Plotly ([2e](./NB2_EXERCISE_Cooling_Prediction.ipynb#section_2e))

**During the exercise, students will be asked to think about the following questions:**
 
1. If the cooling data had outliers, how would you remove them?
2. How would having an erratic cooling profile affect the prediction curve? 
3. Can you use the cooling prediction curve to determine whether a given batch is out-of-spec? How would you do this?
4. Do you think you can use the cooling rate to predict the beer brand?


---
## In More Detail
**Students will learn how to initialize the OSIsoft Cloud Services (OCS) client by authenticating with their credentials and directing calls to the right namespace:**

```python
import configparser

config = configparser.ConfigParser()
config.read("config.ini")

hub_client = HubClient(
    config.get("Access", "ApiVersion"),
    config.get("Access", "Tenant"),
    config.get("Access", "Resource"),
    config.get("Credentials", "ClientId"),
    config.get("Credentials", "ClientSecret"),
)

namespace_id = config.get("Configurations", "Namespace")
```
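
The section and key names read above suggest what `config.ini` might look like. The sketch below reproduces that layout from an in-memory string so it can be run without a real file. Every value is a placeholder, not a real credential or endpoint, and the actual file distributed with the exercise may contain additional entries.

```python
import configparser

# Placeholder values only; real ones come from your OCS tenant.
EXAMPLE_CONFIG = """
[Access]
ApiVersion = <api-version>
Tenant = <tenant-id>
Resource = <resource-url>

[Credentials]
ClientId = <client-id>
ClientSecret = <client-secret>

[Configurations]
Namespace = <namespace-id>
"""

config = configparser.ConfigParser()
# read_string parses the same INI syntax that config.read() loads from disk.
config.read_string(EXAMPLE_CONFIG)

namespace_id = config.get("Configurations", "Namespace")
client_id = config.get("Credentials", "ClientId")
```

Keeping the client secret in a separate file rather than in the notebook itself makes it easier to share the notebook without leaking credentials.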

**Students will then use OCS to retrieve process data from Deschutes Brewery:**
```python
all_brands_df = hub_client.dataview_interpolated_pd(
    namespace_id, DATAVIEW_ID, START_INDEX, END_INDEX, INTERVAL
)
```

**In several places in the exercise, students will have to fill in missing pieces of the code:**
```python
# TODO: Remove all data points with bad input.
# All the following columns can have value BAD_INPUT:
#   Brand, Status, Bottom TIC PV, Middle TIC PV, Top TIC PV, Volume
#
brand_df = brand_df[brand_df["Brand"] != BAD_INPUT]
brand_df = brand_df[brand_df["Status"] != BAD_INPUT]
brand_df = brand_df[brand_df["Top TIC PV"] != BAD_INPUT]
# =========== STUDENT BEGIN ==========
brand_df = brand_df[brand_df[@@@ Your code here @@@]
brand_df = brand_df[brand_df[@@@ Your code here @@@]
brand_df = brand_df[brand_df[@@@ Your code here @@@]
# =========== STUDENT END ==========

# Keep only fermentation or post-fermentation stages
brand_status_df = brand_df[brand_df["Status"].isin(POST_FERMENTATION_STAGES)]

# drop rows with NaNs
brand_status_df = brand_status_df.dropna(axis=0)

# Remove all data points from brand_status_df dataframe with communication issues
# TODO: for each column in TIC_PV_COLUMNS, remove all rows with a communication failure status (COMM_FAIL) or an IO timeout (IO_TIMEOUT)
for tic_pv in TIC_PV_COLUMNS:
    # =========== STUDENT BEGIN ==========
    brand_status_df = brand_status_df[@@@ Your code here @@@]
    brand_status_df = brand_status_df[@@@ Your code here @@@]
    # =========== STUDENT END ==========
    brand_status_df[tic_pv] = brand_status_df[tic_pv].astype(float)
```
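
The filtering pattern used throughout this block is ordinary boolean-mask indexing. The toy example below shows it on made-up data. The sentinel strings `"Bad Input"` and `"Comm Fail"` are hypothetical stand-ins, since the exercise defines its own `BAD_INPUT` and `COMM_FAIL` constants, and `isin` is shown as one possible way to test several sentinels at once rather than as the expected student answer.

```python
import pandas as pd

# Hypothetical sentinel strings; the exercise defines its own constants.
BAD_INPUT = "Bad Input"
COMM_FAIL = "Comm Fail"

df = pd.DataFrame(
    {
        "Brand": ["A", "A", BAD_INPUT, "A"],
        "Top TIC PV": ["65.2", COMM_FAIL, "64.8", "64.1"],
    }
)

# Keep only the rows whose Brand is not the sentinel value.
df = df[df["Brand"] != BAD_INPUT]

# Drop rows where the reading equals any sentinel, then convert to float.
# .copy() gives an independent frame so the column assignment below is safe.
df = df[~df["Top TIC PV"].isin([BAD_INPUT, COMM_FAIL])].copy()
df["Top TIC PV"] = df["Top TIC PV"].astype(float)
```

The conversion to `float` only succeeds once every non-numeric sentinel has been removed, which is why the cleaning steps come first.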

**Students will have to use domain knowledge to tackle situations where the data have been mislabeled but are still valid and necessary for modeling:**
```python
# condition for it to be in cooling phase
# TODO: the condition is that 'Top TIC OUT', 'Middle TIC OUT', and 'Bottom TIC OUT' are all above 99.99
# ============ STUDENT BEGIN ============
cool_stage = brand_status_df[
    (brand_status_df["Top TIC OUT"] > 99.99)
    & (@@@ Your code here @@@)
    & (@@@ Your code here @@@)
]
# ============ STUDENT END ============
```
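
Combining several conditions in Pandas relies on the element-wise `&` operator, with each comparison wrapped in parentheses because `&` binds more tightly than `>`. The sketch below uses invented values for the three controller-output columns and also shows an equivalent form built with `all(axis=1)`. It illustrates the mechanics only and makes no claim about what the real data look like.

```python
import pandas as pd

# Invented controller outputs; only rows 0 and 3 exceed 99.99 everywhere.
df = pd.DataFrame(
    {
        "Top TIC OUT": [100.0, 100.0, 45.0, 100.0],
        "Middle TIC OUT": [100.0, 80.0, 100.0, 100.0],
        "Bottom TIC OUT": [100.0, 100.0, 100.0, 100.0],
    }
)
out_cols = ["Top TIC OUT", "Middle TIC OUT", "Bottom TIC OUT"]

# Explicit form: one parenthesized comparison per column, joined with &.
mask_explicit = (
    (df["Top TIC OUT"] > 99.99)
    & (df["Middle TIC OUT"] > 99.99)
    & (df["Bottom TIC OUT"] > 99.99)
)

# Compact form: compare all columns at once, then require every one per row.
mask_all = (df[out_cols] > 99.99).all(axis=1)

cool_stage = df[mask_all]
```

Selecting rows by what the controllers were actually doing, rather than by the recorded status label, is one way to recover valid data whose label is wrong.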

**Part of the code deals with how regression analysis can be accomplished using Python:**
```python
from scipy.optimize import curve_fit

# training temperature feature
x1_train = cool_df_training["Temperature"].values

# training Volume feature
x2_train = cool_df_training["Volume"].values.astype(float)

# [temperature, volume]
x = [x1_train, x2_train]

# Train the non-linear least-squares model.
# curve_fit takes the independent variables as a single sequence: [temperature, volume]
popt, pcov = curve_fit(temperature_profile, x, cool_df_training["temp_y"].values)

# get the coefficients alpha and beta in the model
a = popt[0]
b = popt[1]

# Get the initial point of all temperature curves
y_first = [x1_train[0]]

# Compute the prediction for each individual start temperature
for y_predicted in y_first:
    y_pred = [y_predicted]
    cool_df_training = cool_df_training.sort_values(by=["tsc"])

    for i in range(1, len(x2_train)):
        y_predicted = y_predicted * (1 + a / x2_train[i]) - (a * b / x2_train[i])
        y_pred.append(y_predicted)
```
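
The excerpt does not show `temperature_profile` itself. Judging from the roll-out recursion, one plausible form is the one-step update `T_next = T * (1 + a / V) - a * b / V`, which can be rewritten as `T_next = T + (a / V) * (T - b)`: the temperature moves toward `b` at a rate scaled by `a / V`. The sketch below assumes that form, fits it to synthetic noise-free data with `scipy.optimize.curve_fit`, and rolls the fitted update forward. The function body, the parameter values, the starting guess `p0`, and the data are all assumptions made for illustration, not the exercise's actual model or data.

```python
import numpy as np
from scipy.optimize import curve_fit


def temperature_profile(x, a, b):
    """Assumed one-step update: next temperature from current T and volume."""
    temperature, volume = x
    return temperature * (1 + a / volume) - a * b / volume


# Synthetic, noise-free cooling curve (illustrative values only).
true_a, true_b = -5.0, 30.0
n_steps = 60
volume = np.full(n_steps, 100.0)
temperature = np.empty(n_steps)
temperature[0] = 68.0
for i in range(1, n_steps):
    temperature[i] = temperature_profile(
        (temperature[i - 1], volume[i]), true_a, true_b
    )

# Training pairs: (T[i-1], V[i]) -> T[i].
x_train = [temperature[:-1], volume[1:]]
y_train = temperature[1:]

# p0 is an assumed starting guess with a < 0, i.e. a cooling process.
popt, pcov = curve_fit(temperature_profile, x_train, y_train, p0=[-1.0, 50.0])
a_fit, b_fit = popt

# Roll the fitted update forward from the first observed temperature.
y_pred = [temperature[0]]
for i in range(1, n_steps):
    y_pred.append(temperature_profile((y_pred[-1], volume[i]), a_fit, b_fit))
y_pred = np.array(y_pred)
```

Because each prediction feeds the next step, errors in the fitted coefficients accumulate along the curve, which is one reason an erratic cooling profile can distort the prediction.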