## Polymer Melt Flow Rate

Polymer properties such as density, melt index, and melt flow rate must be kept within tight specifications for each grade. This case study analyzes polymer production data to predict melt flow rate (MFR). See the full [problem statement](http://apmonitor.com/pds/index.php/Main/PolymerMeltFlowRate).

### Import Polymer MFR Data

In [None]:
url = 'http://apmonitor.com/pds/uploads/Main/polymer_reactor.txt'

Rename the columns `['Time','C3=','H2R','Pressure','Level','C2=','Cat','Temp','MFR']`

Create a new column for the natural log of (MFR) as `lnMFR`

Use the `.describe()` function to get a summary of the data.
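The three steps above can be sketched as follows. This is a minimal sketch, not the reference solution: the helper name `prepare` and the placeholder rows are assumptions introduced for illustration so that the block runs offline, and the final comment assumes the file at `url` is comma-delimited with a header row.

```python
import numpy as np
import pandas as pd

COLUMNS = ['Time', 'C3=', 'H2R', 'Pressure', 'Level', 'C2=', 'Cat', 'Temp', 'MFR']

def prepare(raw):
    """Rename the nine columns and add the natural log of MFR as lnMFR.

    `prepare` is a hypothetical helper, not part of the case study files.
    """
    data = raw.copy()
    data.columns = COLUMNS
    data['lnMFR'] = np.log(data['MFR'])
    return data

# Placeholder rows (NOT real reactor data) so the sketch runs without a download.
raw = pd.DataFrame(np.ones((3, 9)))
raw[8] = [1.0, np.e, np.e**2]   # last column plays the role of MFR

data = prepare(raw)
print(data.describe())          # count, mean, std, min, quartiles, max per column

# With the real file (assumed comma-delimited): data = prepare(pd.read_csv(url))
```

Working with `lnMFR` rather than `MFR` is the choice the case study asks for; the summary from `.describe()` shows both columns side by side so their ranges can be compared.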

![idea](https://apmonitor.com/che263/uploads/Begin_Python/idea.png)

### Data Analysis with `pandas-profiling`

Pandas Profiling is a data analysis tool that gives a more in-depth summary of the data than the `describe()` function. [Install the package](https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/installation.html) with:

```bash
pip install --user "pandas-profiling[notebook]"
jupyter nbextension enable --py widgetsnbextension
```

You need to restart the Kernel before proceeding. The install only needs to run once.

In [None]:
try:
    import pandas as pd
    from pandas_profiling import ProfileReport
except ImportError:
    !pip install --user pandas-profiling
    !jupyter nbextension enable --py widgetsnbextension
    print('Restart the Kernel before proceeding')

After you install `pandas-profiling` and enable the widget extension, you can import and analyze the data.

```python
profile = ProfileReport(data, explorative=True, minimal=False)
```

Some of the functions take a long time with a large data set. Two methods for dealing with large data sets are to:

1. Sub-sample the data set, such as with `data = data[::10]` to take every 10th row.
2. Use the `minimal` option (`minimal=True`) to skip the correlations and other analyses that are slow with large data sets.

View the profile report in the Jupyter Notebook with `profile.to_widgets()`.
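The two methods for large data sets can be combined in one step. The sketch below is illustrative only: `subsample` and `make_profile` are hypothetical helper names, the 100-row frame is a placeholder rather than reactor data, and `pandas_profiling` is imported inside the function so that the sub-sampling part runs even where the package is not installed.

```python
import numpy as np
import pandas as pd

def subsample(df, step=10):
    """Method 1: keep every `step`-th row of the DataFrame."""
    return df[::step]

def make_profile(df, step=10, minimal=True):
    """Method 2 on top of method 1: profile the reduced data with minimal=True.

    Hypothetical helper; requires pandas-profiling to be installed.
    """
    from pandas_profiling import ProfileReport  # imported lazily on purpose
    return ProfileReport(subsample(df, step), minimal=minimal)

# Placeholder frame (arbitrary values) standing in for the polymer data.
data = pd.DataFrame({'MFR': np.arange(1.0, 101.0)})
sub = subsample(data)
print(len(data), 'rows reduced to', len(sub))
```

With the package installed, `profile = make_profile(data)` followed by `profile.to_widgets()` would display the reduced report in the notebook. Sub-sampling cuts the row count, while `minimal=True` cuts the number of analyses, so the two savings are independent of each other.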