<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Learning-Objectives---Introduction-to-Time-Series-Analysis:-Predicting-the-Apparent-Degree-of-Fermentation-(ADF)" data-toc-modified-id="Learning-Objectives---Introduction-to-Time-Series-Analysis:-Predicting-the-Apparent-Degree-of-Fermentation-(ADF)-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Learning Objectives - Introduction to Time-Series Analysis: Predicting the Apparent Degree of Fermentation (ADF)</a></span><ul class="toc-item"><li><span><a href="#Exercise-Overview" data-toc-modified-id="Exercise-Overview-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Exercise Overview</a></span></li><li><span><a href="#In-More-Detail" data-toc-modified-id="In-More-Detail-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>In More Detail</a></span></li></ul></li></ul></div>

---
# Learning Objectives - Introduction to Time-Series Analysis: Predicting the Apparent Degree of Fermentation (ADF)
---
At the end of this [exercise](./NB1_EXERCISE_ADF_Prediction.ipynb), students will be able to:
1. Retrieve process data using a cloud-based platform (OSIsoft Cloud Services, or OCS), which stores real industrial sensor data.
2. Perform basic operations using Pandas, a software library widely used in data science applications.
3. Perform basic data-cleansing operations to ensure that the quality of the data fits the analysis goals.
4. Perform linear and piecewise linear regression using Python.
5. Evaluate the quality of a model fit.
6. Formulate strategies for creating predictive models of real (i.e., dirty, intermittent) process data.


---
## Exercise Overview

Students will be given an overview of why predictive models can be valuable to industry (in this case, the ADF prediction model saved Deschutes Brewery \$750k in capital investments). Because real process data is "dirty", students will come to appreciate the importance of domain knowledge in organizing and cleaning the data to fit the analysis goals. Furthermore, Part 4 of the exercise notebook has missing pieces of code that students must fill in for the notebook to run properly and produce results; this ensures that students interact with the notebook and pay attention to how the code is written.

**The ADF Prediction exercise notebook is divided into five parts:**

[Part 1](./NB1_EXERCISE_ADF_Prediction.ipynb#section_1): Specify time period, time granularity, fermentor vessels, brand of interest, and other parameters.

[Part 2](./NB1_EXERCISE_ADF_Prediction.ipynb#section_2): Use OSIsoft Cloud Services (OCS) to obtain real process data from Deschutes Brewery ([2a](./NB1_EXERCISE_ADF_Prediction.ipynb#section_2a)) and store it into a dataframe ([2b](./NB1_EXERCISE_ADF_Prediction.ipynb#section_2b)).

[Part 3](./NB1_EXERCISE_ADF_Prediction.ipynb#section_3): Create functions which utilize Plotly to preview the ADF profile ([3a](./NB1_EXERCISE_ADF_Prediction.ipynb#section_3a)) and create plots which will help the user to assess the quality of the model fit ([3b](./NB1_EXERCISE_ADF_Prediction.ipynb#section_3b)).

[Part 4](./NB1_EXERCISE_ADF_Prediction.ipynb#section_4): Pre-process the brewery data before fitting and analysis. Remove obviously bad data points and unnecessary attributes ([4a](./NB1_EXERCISE_ADF_Prediction.ipynb#section_4a)); identify fermentation batches and compute the duration of fermentation ([4c](./NB1_EXERCISE_ADF_Prediction.ipynb#section_4c)); manually remove emergent outliers ([4e](./NB1_EXERCISE_ADF_Prediction.ipynb#section_4e)).


[Part 5](./NB1_EXERCISE_ADF_Prediction.ipynb#section_5): Fit a linear model to the cleaned-up dataframe ([5a](./NB1_EXERCISE_ADF_Prediction.ipynb#section_5a)) and then a piecewise linear model ([5b](./NB1_EXERCISE_ADF_Prediction.ipynb#section_5b)). Evaluate and compare the model fits.

**At the end of the exercise, students will be asked to think about the following questions:**
1. Which of the models used in this notebook performed the best and why?
2. Can you think of a different model that would perform better than either of the models used in this notebook?
3. Are there any trade-offs to using more complex models? If so, what are they?
4. Enumerate the reasons why we can't immediately use the raw process data for model fitting.


---
## In More Detail
**In the notebook, students will learn how to initialize the OCS client by authenticating with their credentials and directing calls to the right namespace:**
```python
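import configparser

# HubClient is assumed to be imported earlier in the notebook from the OCS client library
config = configparser.ConfigParser()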
config.read("config.ini")

hub_client = HubClient(
    config.get("Access", "ApiVersion"),
    config.get("Access", "Tenant"),
    config.get("Access", "Resource"),
    config.get("Credentials", "ClientId"),
    config.get("Credentials", "ClientSecret"),
)

namespace_id = config.get("Configurations", "Namespace")
print(f"namespace_id: '{namespace_id}'")
```
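For orientation, the `config.get(...)` calls above imply a `config.ini` with three sections. The following is a minimal sketch of that layout: the section and key names are taken from the snippet above, while every value is a placeholder (an assumption, not a real credential or endpoint). It also checks that all expected keys are present before a client is built.

```python
import configparser

# Placeholder values only; section and key names mirror the config.get(...) calls above.
SAMPLE_CONFIG = """
[Access]
ApiVersion = <api-version>
Tenant = <tenant-id>
Resource = <resource-url>

[Credentials]
ClientId = <client-id>
ClientSecret = <client-secret>

[Configurations]
Namespace = <namespace-id>
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE_CONFIG)

# Check that every key the notebook reads is present before building a client.
required = {
    "Access": ["ApiVersion", "Tenant", "Resource"],
    "Credentials": ["ClientId", "ClientSecret"],
    "Configurations": ["Namespace"],
}
missing = [
    f"{section}.{key}"
    for section, keys in required.items()
    for key in keys
    if not config.has_option(section, key)
]
print("missing keys:", missing)
```

In the notebook itself the file would be read with `config.read("config.ini")`; `read_string` is used here only so the sketch runs without a file on disk.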

**The students will learn to efficiently identify fermentation batches and their duration by leveraging Pandas features:**
```python
import pandas as pd

# find the start of fermentation (ADF ~= 0.0)
fermentation_starts = all_brands_df[abs(all_brands_df["ADF"]) <= 0.000001].reset_index()

# create new dataframe column for recording the time evolution of fermentation
# (float sentinel: -1.0 marks rows that precede the first detected start)
all_brands_df["Elapsed"] = -1.0

# compute the time elapsed since the start of each fermentation
for _, ferm_begin in fermentation_starts.iterrows():

    # get the timestamp for each start of fermentation
    fermentation_time = pd.Timestamp(ferm_begin["Timestamp"])

    # get booleans for the new fermentation batch
    batch_boolean = (
        all_brands_df["Timestamp"].apply(lambda t: pd.Timestamp(t)) >= fermentation_time
    )

    # find time elapsed since the start of fermentation in units of hours
    all_brands_df.loc[batch_boolean, "Elapsed"] = all_brands_df.loc[
        batch_boolean, "Timestamp"
    ].apply(lambda t: (pd.Timestamp(t) - fermentation_time).total_seconds() / 3600.0)
```
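The same idea can also be expressed without an explicit loop. The sketch below is an alternative, not the notebook's own code: it runs on a small made-up dataframe (`toy_df` and its values are assumptions), numbers the batches with a cumulative sum over the "start" flags, and subtracts each batch's first timestamp.

```python
import pandas as pd

# Toy data: two batches, each starting where ADF == 0.0 (values are made up).
toy_df = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(
            [
                "2020-01-01 00:00", "2020-01-01 06:00", "2020-01-01 12:00",
                "2020-01-03 00:00", "2020-01-03 03:00",
            ]
        ),
        "ADF": [0.0, 0.2, 0.5, 0.0, 0.1],
    }
)

# Flag batch starts, then number the batches with a cumulative sum.
is_start = toy_df["ADF"].abs() <= 1e-6
toy_df["Batch"] = is_start.cumsum()

# Broadcast each batch's first timestamp to all rows of that batch.
batch_start = toy_df.groupby("Batch")["Timestamp"].transform("min")

# Elapsed time in hours since the start of the current batch.
toy_df["Elapsed"] = (toy_df["Timestamp"] - batch_start).dt.total_seconds() / 3600.0
print(toy_df)
```

This sketch assumes the rows are sorted by timestamp. Rows that precede the first detected start would land in batch 0 and would need separate handling, for example by resetting their `Elapsed` value to a sentinel.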

**Students will be given the opportunity to use their judgment to fill in missing pieces of code and complete filter expressions like the following:**
```python
# filter out outliers
# TODO: Complete filter expressions. 
# HINT: We want data *not* in some outlier region!
# TIP: May use as many filter expressions as necessary
# =========== STUDENT BEGIN ==========
all_brands_df = all_brands_df[
    ~((all_brands_df["Elapsed"] < 20) & (all_brands_df["ADF"] > 0.5))
    & @@@ Your code here @@@
    & @@@ Your code here ??? @@@
]
# =========== STUDENT END ==========
```
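As a reminder of how negated boolean masks combine in Pandas, here is a small sketch on made-up data. The dataframe and all thresholds are arbitrary assumptions chosen for illustration; they are not the answers to the exercise.

```python
import pandas as pd

toy_df = pd.DataFrame(
    {
        "Elapsed": [5.0, 10.0, 50.0, 100.0, 150.0],
        "ADF": [0.05, 0.70, 0.40, 0.10, 0.75],
    }
)

# Each mask marks one region to exclude; thresholds are arbitrary, for illustration only.
# The inner parentheses are required because & binds tighter than < and >.
early_and_high = (toy_df["Elapsed"] < 20) & (toy_df["ADF"] > 0.5)
late_and_low = (toy_df["Elapsed"] > 80) & (toy_df["ADF"] < 0.2)

# Keep only the rows that fall in neither excluded region.
filtered_df = toy_df[~early_and_high & ~late_and_low]
print(filtered_df)
```

Naming each mask before combining them tends to make it easier to check which region a given expression excludes.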
**Ultimately, the goal is for the students to fit the data to a linear model and a piecewise linear model and then assess the model fits:**
![Plots for assessing the model fit](https://academicpi.blob.core.windows.net/images/NB1_Analyze_Fit.png)
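To make the comparison concrete, the following is a minimal sketch of fitting both model types with NumPy and comparing them by R². It is not necessarily the method used in the notebook: the data is synthetic and noise-free, the breakpoint (`breakpoint_h`) is assumed to be known in advance, and the helper `r_squared` is defined here purely for illustration.

```python
import numpy as np

# Synthetic, noise-free curve: ADF rises linearly, then plateaus after 60 hours (made-up numbers).
elapsed = np.linspace(0.0, 120.0, 25)
adf = np.where(elapsed < 60.0, 0.0125 * elapsed, 0.75)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


# Linear model: adf ~ slope * elapsed + intercept.
slope, intercept = np.polyfit(elapsed, adf, deg=1)
linear_pred = slope * elapsed + intercept

# Piecewise linear model with one breakpoint, written as a hinge basis:
# adf ~ b0 + b1 * elapsed + b2 * max(elapsed - breakpoint_h, 0).
breakpoint_h = 60.0  # assumed known here; in practice it would be chosen or fitted
design = np.column_stack(
    [np.ones_like(elapsed), elapsed, np.maximum(elapsed - breakpoint_h, 0.0)]
)
coeffs, *_ = np.linalg.lstsq(design, adf, rcond=None)
piecewise_pred = design @ coeffs

print(f"linear R^2:    {r_squared(adf, linear_pred):.3f}")
print(f"piecewise R^2: {r_squared(adf, piecewise_pred):.3f}")
```

On real, noisy process data, a higher R² on the fitted data does not by itself show that the more complex model predicts better; comparing the models on held-out batches is one way to check.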