## Polymer Melt Flow Rate

Polymer properties such as density, melt index, and melt flow rate must be kept within tight specifications for each grade. This case study analyzes polymer production data to predict melt flow rate (MFR). See the full [problem statement](http://apmonitor.com/pds/index.php/Main/PolymerMeltFlowRate).

### Import Polymer MFR Data

In [None]:
import pandas as pd
url = 'http://apmonitor.com/pds/uploads/Main/polymer_reactor.txt'
data = pd.read_csv(url)
data.head()

| | Unnamed: 0 | 513FC31103.pv | 513HC31114-5.mv | 513PC31201.pv | 513LC31202.pv | 513FC31409.pv | 513FC31114-5.pv | 513TC31220.pv | MFR |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 04-05-18 19:45 | 24856.584 | 0.153 | 30.414835 | 79.657906 | 50850.578 | 0.163994 | 80.399605 | 3.4 |
| 1 | 04-05-18 21:45 | 25537.25 | 0.153 | 30.527121 | 78.532608 | 42229.813 | 0.129739 | 78.861328 | 3.2 |
| 2 | 04-05-18 23:45 | 25689.266 | 0.153 | 30.35618 | 78.842636 | 45335.852 | 0.150003 | 78.818115 | 3.2 |
| 3 | 04-06-18 1:45 | 25098.754 | 0.153 | 30.894308 | 79.1735 | 43077.016 | 0.151543 | 79.02272 | 3.1 |
| 4 | 04-06-18 3:45 | 24853.941 | 0.15 | 30.680647 | 78.677299 | 40404.387 | 0.122582 | 79.038483 | 3.3 |


Rename the columns to `['Time','C3=','H2R','Pressure','Level','C2=','Cat','Temp','MFR']`.

In [None]:
data.columns = ['Time','C3=','H2R','Pressure','Level','C2=','Cat','Temp','MFR']
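
Assigning to `data.columns` relies on the column order in the file staying fixed. As a minimal sketch of an alternative, the raw tag names can be mapped to the short names with `DataFrame.rename`. The mapping below is inferred from the order of the raw headers and the new names listed above, and the one-row `raw` frame is a hypothetical stand-in for the downloaded data.

```python
import pandas as pd

# Mapping inferred from the order of the raw headers and the new names
names = {
    'Unnamed: 0': 'Time',
    '513FC31103.pv': 'C3=',
    '513HC31114-5.mv': 'H2R',
    '513PC31201.pv': 'Pressure',
    '513LC31202.pv': 'Level',
    '513FC31409.pv': 'C2=',
    '513FC31114-5.pv': 'Cat',
    '513TC31220.pv': 'Temp',
}

# Hypothetical one-row stand-in built from the first row of the raw data
raw = pd.DataFrame([['04-05-18 19:45', 24856.584, 0.153, 30.414835, 79.657906,
                     50850.578, 0.163994, 80.399605, 3.4]],
                   columns=list(names) + ['MFR'])

renamed = raw.rename(columns=names)  # MFR is not in the mapping, so it is left as-is
print(list(renamed.columns))
```

With a dictionary, a reordered or extra column in the file would not silently receive the wrong name.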

Create a new column, `lnMFR`, for the natural log of MFR.

In [None]:
import numpy as np
data['lnMFR'] = np.log(data['MFR'])
data.sample(5)

| | Time | C3= | H2R | Pressure | Level | C2= | Cat | Temp | MFR | lnMFR |
|---|---|---|---|---|---|---|---|---|---|---|
| 2191 | 11-08-18 19:45 | 26146.855 | 0.13 | 30.502262 | 77.200287 | 45550.746 | 0.107879 | 80.623734 | 3.4 | 1.223775 |
| 1416 | 8/25/2018 11:45 | 22500.535 | 0.167 | 31.151321 | 78.426834 | 37604.477 | 0.163655 | 80.60672 | 3.8 | 1.335001 |
| 161 | 4/18/2018 15:45 | 24397.914 | 0.127 | 30.478027 | 78.712746 | 37211.875 | 0.129995 | 79.970398 | 3.2 | 1.163151 |
| 55 | 04-10-18 15:45 | 24191.648 | 0.195 | 31.125795 | 78.683777 | 38632.813 | 0.105829 | 78.581245 | 12.6 | 2.533697 |
| 1070 | 7/29/2018 23:45 | 25932.926 | 0.258 | 30.743958 | 76.143822 | 58481.039 | 0.155027 | 79.997337 | 13.6 | 2.61007 |
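
If a model is later fit to `lnMFR` rather than `MFR`, its predictions have to be converted back with the exponential. The sketch below checks that round trip on four MFR values taken from the sample rows. The frame `demo` and the column `MFR_back` are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd

# MFR values copied from the sample rows; demo is a stand-in for data
demo = pd.DataFrame({'MFR': [3.4, 3.2, 12.6, 13.6]})

demo['lnMFR'] = np.log(demo['MFR'])       # natural log transform
demo['MFR_back'] = np.exp(demo['lnMFR'])  # inverse transform recovers MFR

print(demo)
```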


Use the `.describe()` method to get a summary of the data.

In [None]:
data.describe()

| | C3= | H2R | Pressure | Level | C2= | Cat | Temp | MFR | lnMFR |
|---|---|---|---|---|---|---|---|---|---|
| count | 2486.0 | 2560.0 | 2484.0 | 2484.0 | 2484.0 | 2486.0 | 2484.0 | 2564.0 | 2564.0 |
| mean | 25306.285729 | 0.178427 | 30.663706 | 77.651055 | 42525.14 | 0.13853 | 80.144365 | 8.185218 | 1.901381 |
| std | 1706.481672 | 0.077473 | 0.423345 | 0.9196 | 11331.86896 | 0.041869 | 0.823554 | 5.088696 | 0.638107 |
| min | 16106.025 | 0.0 | 26.946344 | 74.575958 | 9610.4648 | 0.022162 | 77.760117 | 1.5 | 0.405465 |
| 25% | 24361.632 | 0.136 | 30.446129 | 76.992151 | 34795.535 | 0.113764 | 79.677458 | 3.7 | 1.308333 |
| 50% | 25365.7545 | 0.1735 | 30.622631 | 77.494477 | 41550.5625 | 0.132986 | 80.044308 | 4.3 | 1.458615 |
| 75% | 26398.45225 | 0.2 | 30.925738 | 78.210867 | 50010.295 | 0.15699 | 80.496296 | 12.9 | 2.557227 |
| max | 30083.688 | 0.98 | 32.674332 | 83.841675 | 106073.61 | 0.677979 | 91.566544 | 38.0 | 3.637586 |
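
The `count` row reports only non-missing entries, so the differing counts (2484 for `Pressure` versus 2564 for `MFR`, for example) indicate that some columns contain gaps. One way to quantify and drop those gaps is sketched below on a small hypothetical frame `demo`. Whether dropping rows is the right treatment for this data set is an assumption, not something the summary settles.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with gaps, standing in for the reactor data
demo = pd.DataFrame({
    'C3=': [24856.584, np.nan, 25689.266, 25098.754],
    'H2R': [0.153, 0.153, np.nan, 0.153],
    'MFR': [3.4, 3.2, 3.2, 3.1],
})

missing = demo.isna().sum()  # number of missing entries in each column
complete = demo.dropna()     # keep only rows with no missing values

print(missing)
print(len(demo), '->', len(complete))
```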


![idea](https://apmonitor.com/che263/uploads/Begin_Python/idea.png)

### Data Analysis with `ydata-profiling`

`ydata-profiling` (formerly Pandas Profiling) is a data analysis tool that gives a more in-depth summary of the data than the `describe()` function. Install the package with `pip install ipywidgets ydata-profiling`. The install only needs to run once, and you need to restart the kernel before proceeding.

In [None]:
pip install ipywidgets ydata-profiling




In [None]:
pip install ipywidgets

Note: you may need to restart the kernel to use updated packages.




In [None]:
try:
    from ydata_profiling import ProfileReport
except ImportError:
    # raised when the package is not installed or the kernel has not been restarted
    print('Restart the Kernel before proceeding')

After you install `ydata-profiling`, you can import it and analyze the data. Some of the functions take a long time with a large data set. Two methods for dealing with large data sets are to:

1. Sub-sample the data set, such as with `data = data[::10]` to take every 10th row.
2. Use the `minimal=True` option to skip the correlation and other analyses that are slow with large data sets.
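
A minimal sketch of the first option follows, using a synthetic frame `demo` with made-up values in place of the reactor data. The slice keeps the original index labels, so rows 0, 10, 20, and so on remain identifiable.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the reactor data (values are made up)
rng = np.random.default_rng(0)
demo = pd.DataFrame({'Temp': rng.normal(80, 1, 100),
                     'MFR': rng.uniform(2, 14, 100)})

sample = demo[::10]  # keep rows 0, 10, 20, ...
print(len(demo), '->', len(sample))
```

The reduced frame could then be profiled with `ProfileReport(sample, minimal=True)`, which combines both options.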

In [None]:
profile = ProfileReport(data, explorative=True, minimal=False)

View the profile report in the Jupyter Notebook with `profile.to_widgets()` or export it to an HTML file with `profile.to_file("1-Analyze_MFR.html")`.

In [None]:
profile.to_file("1-Analyze_MFR.html")


