## Polymer Melt Flow Rate

Polymer properties such as density, melt index, and melt flow rate must be kept within tight specifications for each grade. This case study analyzes polymer production data to predict melt flow rate (MFR). See the full [problem statement](http://apmonitor.com/pds/index.php/Main/PolymerMeltFlowRate).

### Import Polymer MFR Data

In [6]:
import numpy as np
import pandas as pd
url = 'http://apmonitor.com/pds/uploads/Main/polymer_reactor.txt'
data = pd.read_csv(url)
data.columns = ['Time','C3=','H2R','Pressure','Level','C2=','Cat','Temp','MFR']
data['lnMFR'] = np.log(data['MFR'].values)
data.describe()

| | C3= | H2R | Pressure | Level | C2= | Cat | Temp | MFR | lnMFR |
|---|---|---|---|---|---|---|---|---|---|
| count | 2486.0 | 2560.0 | 2484.0 | 2484.0 | 2484.0 | 2486.0 | 2484.0 | 2564.0 | 2564.0 |
| mean | 25306.285729 | 0.178427 | 30.663706 | 77.651055 | 42525.14 | 0.13853 | 80.144365 | 8.185218 | 1.901381 |
| std | 1706.481672 | 0.077473 | 0.423345 | 0.9196 | 11331.86896 | 0.041869 | 0.823554 | 5.088696 | 0.638107 |
| min | 16106.025 | 0.0 | 26.946344 | 74.575958 | 9610.4648 | 0.022162 | 77.760117 | 1.5 | 0.405465 |
| 25% | 24361.632 | 0.136 | 30.446129 | 76.992151 | 34795.535 | 0.113764 | 79.677458 | 3.7 | 1.308333 |
| 50% | 25365.7545 | 0.1735 | 30.622631 | 77.494477 | 41550.5625 | 0.132986 | 80.044308 | 4.3 | 1.458615 |
| 75% | 26398.45225 | 0.2 | 30.925738 | 78.210867 | 50010.295 | 0.15699 | 80.496296 | 12.9 | 2.557227 |
| max | 30083.688 | 0.98 | 32.674332 | 83.841675 | 106073.61 | 0.677979 | 91.566544 | 38.0 | 3.637586 |
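
The `count` row differs between columns (for example, 2484 for `Temp` versus 2564 for `MFR`), which suggests that some rows have missing values. The sketch below shows one possible way to count and drop them before further analysis. It uses a small made-up frame named `demo` (hypothetical, not the reactor data) so that it runs without a download. Whether dropping rows is appropriate for this data set is a judgment call.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the reactor data, with gaps marked as NaN
demo = pd.DataFrame({
    'H2R':  [0.17, np.nan, 0.20, 0.14],
    'Temp': [80.1, 80.0, np.nan, 79.7],
    'MFR':  [4.3, 3.7, 12.9, 1.5],
})
demo['lnMFR'] = np.log(demo['MFR'])

# Number of missing entries in each column
missing = demo.isna().sum()
print(missing)

# Keep only the rows where every column has a value
clean = demo.dropna()
print(len(demo), 'rows ->', len(clean), 'rows')
```

With the real data, the same two calls (`data.isna().sum()` and `data.dropna()`) would apply to the `data` frame loaded above.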


![idea](https://apmonitor.com/che263/uploads/Begin_Python/idea.png)

### Data Analysis with `pandas-profiling`

Pandas Profiling is a data analysis tool that gives a more in-depth summary of the data than the `describe()` function. [Install the package](https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/installation.html) with:

```bash
pip install --user pandas-profiling[notebook]
jupyter nbextension enable --py widgetsnbextension
```

You need to restart the Kernel before proceeding. The install only needs to run once.

In [7]:
try:
    import pandas as pd
    from pandas_profiling import ProfileReport
except ImportError:
    # Package not found: install it, then restart the kernel
    !pip install --user pandas-profiling
    !jupyter nbextension enable --py widgetsnbextension
    print('Restart the Kernel before proceeding')

After you install `pandas-profiling` and enable the widget extension, you can import and analyze the data by creating a profile report:

```python
profile = ProfileReport(data, explorative=True, minimal=False)
```

Some of the functions take a long time with a large data set. Two methods for dealing with large data sets are to:

1. Sub-sample the data set, such as with `data = data[::10]` to take every 10th row.
2. Use the `minimal=True` option to skip the correlations and other analyses that are slow with large data sets.
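
The two methods can also be combined. The following is a minimal sketch, not the original notebook's code: `subsample` and `quick_profile` are hypothetical helper names, and `demo` is made-up data standing in for the reactor data. `quick_profile` assumes that `pandas-profiling` is installed as described above.

```python
import numpy as np
import pandas as pd

def subsample(df, step=10):
    """Keep every `step`-th row (equivalent to df[::step])."""
    return df.iloc[::step]

def quick_profile(df, step=10):
    """Profile a sub-sampled frame with the slow analyses turned off."""
    # Imported here so the rest of the sketch runs without the package
    from pandas_profiling import ProfileReport
    return ProfileReport(subsample(df, step), minimal=True)

# Hypothetical data: 1000 rows of random values
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'Temp': rng.normal(80.0, 0.8, 1000),
    'MFR': rng.uniform(1.5, 38.0, 1000),
})

small = subsample(demo)
print(len(demo), 'rows ->', len(small), 'rows')
```

Calling `quick_profile(data)` would then build a reduced report from the reactor data. Sub-sampling discards rows, so rare events may no longer appear in the summary.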

In [8]:
profile = ProfileReport(data, explorative=True, minimal=False)

View the profile report in the Jupyter Notebook with `profile.to_widgets()`.

In [9]:
profile.to_widgets()
