# Exploratory Data Analysis

## Case Description
* The Ponta Grossa (PG) brewery produces many famous beer brands such as Heineken and Amstel.
* The brewing process has two major phases: a hot phase and a cold phase (Figure 1). During the hot phase,
two types of malt are milled: a base malt, which is light-colored and used in larger amounts, and a
roast malt, which is darker and is therefore used to give the beer its desired color.
* These malts are sent to the malt cooker and cooked with water in the mashing step. The grains
are then filtered out in the lautering step, after which the wort (the liquid extracted during mashing)
is boiled with hops. The spent hops are then removed in the whirlpool, and the hot wort is cooled so that
it can later be fermented. Each production run that goes through these phases is called a “batch”.
* Several KPIs are monitored throughout this process. An important one, which characterizes beer
brands, is color. Because the Ponta Grossa (PG) brewery has trouble consistently hitting the target color
during brewing, it has decided to use data science tools to correct the beer color index. Since empirical
brewing knowledge holds that the hot phase has the greatest effect on final beer color, PG has decided to
implement an advanced analytics tool that predicts beer color right after the cooling process.
* Based on the data available, fit a model to predict the color of the cold wort for the AMSTEL
brand only.

![Figure 1: The brewing process](pics/brewing-process.jpg)

## Column Descriptions
* Job ID: Unique identifier of each batch
* Date/Time: Datetime at which the batch process started
* Roast amount (kg): Amount of roast malt used
* 1st (base) malt amount (kg): Amount of 1st base malt used
* 2nd (base) malt amount (kg): Amount of 2nd base malt used (* 1st and 2nd malts are
mixed together during milling but may come from different lots)
* MT – Temperature (°C): Malt cooker’s aggregated temperature
* MT – Time (s): Period of time that the batch spent in the malt cooker
* WK – Temperature (°C): Wort cooker’s aggregated temperature
* WK – Steam: Wort cooker’s aggregated steam amount
* WK – Time (s): Period of time that the batch spent in the wort cooker
* Total cold wort (HL): Total batch volume of cold wort after cooling
* pH: A batch’s aggregated pH measured during cooling
* Extract (°P): A batch’s aggregated extract measured during cooling (measures the
concentration of sugars in the wort)
* Color (EBC) (Model Target): Color value generated by a sensor (measured in European Brewery
Convention units)
* WOC – Time (s): Period of time that the batch spent in the wort cooler
* WHP Transfer – Time (s): Whirlpool Transfer Time
* WHP Rest – Time (s): Whirlpool Rest Time
* Roast color (EBC): Color of roast malt
* 1st malt color (EBC): Color of 1st malt
* 2nd malt color (EBC): Color of 2nd malt
* Product: Specified product of one batch (Heineken (HNK), Amstel (AMST)…)
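
Since the task is restricted to the Amstel brand, a natural first step is to filter the batches by `Product` and separate the target from the features. The snippet below is only a minimal sketch of that step: the toy rows are placeholder values (not real brewery data), the column names `Product`, `Color`, `Job ID` and `Roast amount (kg)` are assumed to match the descriptions above, and `select_brand` is a hypothetical helper introduced for illustration.

```python
import pandas as pd

# Hypothetical toy rows: placeholder values, not real brewery data.
# Column names are assumed to follow the descriptions above.
toy = pd.DataFrame(
    {
        "Job ID": [1, 2, 3, 4],
        "Product": ["AMST", "HNK", "AMST", "HNK"],
        "Roast amount (kg)": [30.0, 0.0, 32.0, 0.0],
        "Color": [11.5, 7.0, None, 7.2],
    }
)


def select_brand(df, product="AMST", target="Color"):
    """Keep one product's batches and drop rows whose target is missing."""
    subset = df[df["Product"] == product]
    return subset.dropna(subset=[target]).reset_index(drop=True)


amstel = select_brand(toy)

# Identifier and brand columns carry no process information, so they are
# left out of the feature matrix; the sensor color is the target.
X = amstel.drop(columns=["Job ID", "Product", "Color"])
y = amstel["Color"]
```

With the real dataset, the same helper would be applied to the loaded DataFrame instead of `toy`, after checking that the actual column names and product codes match the assumptions made here.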

In [1]:
import numpy as np