# Introduction


The goal of this notebook is to provide guidance for the Python package `rheodata`, a collection of tools that clean, organize, and convert raw data from rheometers.


## Installation 
Currently, `rheodata` is available on [PyPI](https://pypi.org/project/rheodata/). While this is not ideal for Anaconda, one can still install the package and use it in something like Spyder or a Jupyter notebook.

To install with pip, run:

`pip install rheodata`

in your command prompt.

For use in Anaconda with tools such as Spyder or Jupyter Notebook, first make sure pip is installed and then enter:

`pip install rheodata`

in your Anaconda prompt.

If you have never used the Anaconda prompt, you can enter the above line in the terminal of Spyder, as in the image below:

![Spyder Terminal](images/terminal.png)
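
To confirm that the install landed in the environment your notebook or Spyder session actually uses, a quick check such as the following sketch can help. It relies only on the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of `rheodata`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


found = installed_version("rheodata")
if found is None:
    print("rheodata is not installed in this environment")
else:
    print(f"rheodata {found} is installed")
```

If the message reports that the package is missing, the `pip install` command was probably run in a different environment from the one executing the notebook.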

## Load and Use in Analysis Scripts

Once it's installed, you can load it in like any other package.

In [1]:
import rheodata

# Suppress warnings (this silences all warnings, not only the pickle ones)
import warnings
warnings.filterwarnings("ignore")

`rheodata` has different extractors for specific rheometers; the supported instrument models are listed in the README. Here, we use the extractor for data from the Anton Paar MCR302 used at Northwestern as an example. First, we need to instantiate (basically load in) the extractor.

In [2]:
from rheodata.extractors import antonpaar

rheometer = antonpaar.AntonPaarExtractor()

We will then use some test data from an experiment looking at linseed oil. This data can be found in the `tests/test_data/Anton_Paar/excel_test_data/` folder. Note that for the Anton Paar, the data must be saved as an `.xlsx` file. This avoids a UTF-8 encoding issue that has not yet been resolved.

We can pass the location of that data to the extractor and get back a dictionary of cleaned dataframes, one for each test run on the sample.

In [3]:
data = '../tests/test_data/Anton_Paar/excel_test_data/Steady State Viscosity Curve-LO50C_excel.xlsx'

cleaned_dataframes = rheometer.make_analyze_dataframes(data)

`.make_analyze_dataframes` uses the test names as keys in the dictionary. We can query those keys and then get whatever data we want.

In [4]:
print(cleaned_dataframes.keys())

data_to_use = cleaned_dataframes['Steady State Viscosity Curve-LO80C']
print(data_to_use)

dict_keys(['Steady State Viscosity Curve-LO80C', 'Steady State Viscosity Curve-75C', 'Steady State Viscosity Curve-LO70C', 'Steady State Viscosity Curve-65C', 'Steady State Viscosity Curve-60C', 'Steady State Viscosity Curve-55C', 'Steady State Viscosity Curve-LO50C', 'Steady State Viscosity Curve-LO45C', 'Steady State Viscosity Curve-LO40C', 'Steady State Viscosity Curve-35C', 'Steady State Viscosity Curve-LO30C', 'Steady State Viscosity Curve'])
   Point No. Time Shear Rate Shear Stress Viscosity     Torque   Status
0          1   10       0.01  -0.00022404     -22.4 -0.0073426  Dy_auto
1          2   20     0.0147  -0.00024442     -16.7 -0.0080105  Dy_auto
2          3   30     0.0215  -8.3392e-05      -3.9  -0.002733  Dy_auto
3          4   40     0.0316   0.00015729         5  0.0051547  Dy_auto
4          5   50     0.0464    0.0001165       2.5  0.0038182  Dy_auto
5          6   60     0.0681   0.00042954       6.3   0.014077  Dy_auto
6          7   70        0.1   0.00073792   
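
As a rough sketch of a first analysis step, the snippet below rebuilds the six complete rows printed above by hand (it does not call `rheodata`), coerces the columns to numbers, and drops the low-shear-rate points where the reported viscosity is negative, which are likely noise. Whether the real dataframe stores these columns as strings or as floats is an assumption here; `pd.to_numeric` handles either case.

```python
import pandas as pd

# Hand-copied from the printed output above; stored as strings to mimic
# a dataframe whose columns have not been converted to numeric types yet.
sample = pd.DataFrame(
    {
        "Shear Rate": ["0.01", "0.0147", "0.0215", "0.0316", "0.0464", "0.0681"],
        "Shear Stress": ["-0.00022404", "-0.00024442", "-8.3392e-05",
                         "0.00015729", "0.0001165", "0.00042954"],
        "Viscosity": ["-22.4", "-16.7", "-3.9", "5", "2.5", "6.3"],
    }
)

# Convert every column to a numeric dtype; unparseable entries become NaN.
numeric = sample.apply(pd.to_numeric, errors="coerce")

# Keep only the points with a positive viscosity.
valid = numeric[numeric["Viscosity"] > 0].reset_index(drop=True)
print(valid)
```

The same two steps should carry over to a dataframe taken from `cleaned_dataframes`, provided its column names match the ones shown in the printout.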

You can now do whatever data analysis you want without having to parse the raw data from the machine.  One can also automatically save this cleaned data using:

In [5]:
rheometer.save_analyze_dataframes(data, output_folder_path="saved_data/")

Here, the method takes the data path and an output folder path. The files are saved as CSVs with the test names as the filenames.
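
To read the saved files back later, something like the following sketch may be enough. It assumes only what is stated above (one CSV per test, named after the test); the helper `load_saved_csvs` is hypothetical, and the demonstration writes a throwaway CSV to a temporary folder instead of relying on `saved_data/`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_saved_csvs(folder):
    """Return a dict mapping each CSV's file stem (the test name) to a dataframe."""
    return {path.stem: pd.read_csv(path) for path in sorted(Path(folder).glob("*.csv"))}


# Demonstration with a stand-in file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo = pd.DataFrame({"Shear Rate": [0.01, 0.0147], "Viscosity": [282.9, 198.0]})
    demo.to_csv(Path(tmp) / "Steady State Viscosity Curve-75C.csv", index=False)

    reloaded = load_saved_csvs(tmp)

print(list(reloaded))
```

If the files written by the package turn out to include an index column, passing `index_col=0` to `pd.read_csv` would likely be needed.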

## Converting to HDF5

We can also convert the data to an HDF5 file. Here, each test is saved as a subfolder within the file structure. The raw test data and the respective cleaned data are then saved in that subfolder, as shown in the diagram below:

![HDF5 File Structure](images/HDF5_file_structure.png)



To do so, first convert the raw data into the dictionaries that will be written to the HDF5 file:

In [6]:
data = '../tests/test_data/Anton_Paar/excel_test_data/Steady State Viscosity Curve-LO50C_excel.xlsx'

modified_dict, test_raw, cols_info, units_info = rheometer.import_rheo_data(data)

Then load the converter from the rheodata package.

In [7]:
from rheodata import data_converter

Now convert the dictionaries to an HDF5 file.

In [8]:
# Instantiate the converter
converter = data_converter.rheo_data_transformer(modified_dict, test_raw, cols_info, units_info)

# Write the data to an HDF5 file (the file read back below is saved_data/Demo.hdf5)
save_folder_path = "saved_data/Demo"
converter.load_to_hdf(save_folder_path)

Now the data is in an HDF5 file format that can be parsed using the h5py package.

In [9]:
import h5py
import pandas as pd

In [10]:
f = h5py.File("saved_data/Demo.hdf5", "r")

print(f["Project"].keys())
print(f["Project"]['Steady State Viscosity Curve-75C'].keys())

# Read the cleaned data into a dataframe
clean_data = pd.read_hdf("saved_data/Demo.hdf5", 'Project/Steady State Viscosity Curve-75C/clean_data')
print(clean_data.head(10))
f.close()


<KeysViewHDF5 ['Steady State Viscosity Curve', 'Steady State Viscosity Curve-35C', 'Steady State Viscosity Curve-55C', 'Steady State Viscosity Curve-60C', 'Steady State Viscosity Curve-65C', 'Steady State Viscosity Curve-75C', 'Steady State Viscosity Curve-LO30C', 'Steady State Viscosity Curve-LO40C', 'Steady State Viscosity Curve-LO45C', 'Steady State Viscosity Curve-LO50C', 'Steady State Viscosity Curve-LO70C', 'Steady State Viscosity Curve-LO80C']>
<KeysViewHDF5 ['clean_data', 'raw_data']>
    1    2       3          4      5         6        7
0   1   10    0.01  0.0028294  282.9  0.092729  Dy_auto
1   2   20  0.0147  0.0029057    198   0.09523  Dy_auto
2   3   30  0.0215  0.0031513  146.3   0.10328  Dy_auto
3   4   40  0.0316  0.0033997  107.5   0.11142  Dy_auto
4   5   50  0.0464   0.003447   74.3   0.11297  Dy_auto
5   6   60  0.0681  0.0035772   52.5   0.11724  Dy_auto
6   7   70     0.1   0.003675   36.8   0.12044  Dy_auto
7   8   80   0.147  0.0042936   29.3   0.14072  Dy_aut

Note: make sure to close your HDF5 file after looking at it.
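
One way to guarantee the file is closed, even if an error occurs while reading, is to open it in a `with` block, which `h5py.File` supports. The sketch below builds a small stand-in file with the same `Project/<test name>` layout so that it runs on its own; the dataset contents are placeholders, not real measurements.

```python
import os
import tempfile

import h5py

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stand_in.hdf5")

    # Build a tiny stand-in file mirroring the Project/<test name> layout.
    with h5py.File(path, "w") as f:
        test_group = f.create_group("Project/Steady State Viscosity Curve-75C")
        test_group.create_dataset("placeholder", data=[1, 2, 3])

    # The file is closed automatically when the with block exits.
    with h5py.File(path, "r") as f:
        test_names = list(f["Project"].keys())

    print(test_names)
```

With the real file, the read step would become `with h5py.File("saved_data/Demo.hdf5", "r") as f:`, and no explicit `f.close()` is needed.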

## Pre-existing Metadata and Adding Metadata


Finally, once the raw data is converted into the HDF5 format, we can add metadata to the different folders. Before doing that, note that the package already adds the column and unit metadata to each test subfolder. It's saved under the attribute 'columns'. Let's open up the HDF5 file and check it out.

In [11]:
f = h5py.File("saved_data/Demo.hdf5", "r")
print(f["Project"]['Steady State Viscosity Curve-55C'].attrs["columns"])
f.close()

{"names": ["Point No.", "Time", "Shear Rate", "Shear Stress", "Viscosity", "Torque", "Status"], "units": ["NaN", "[s]", "[1/s]", "[Pa]", "[mPa\u00b7s]", "[\u00b5N\u00b7m]", "NaN"]}


Because HDF5 doesn't accept dictionaries as attributes, the column and unit metadata are stored as a JSON string, hence the escaped Unicode such as "mPa\u00b7s". One can convert the JSON to a Python dictionary using the code below.

In [12]:
import json 

f = h5py.File("saved_data/Demo.hdf5", "r")
cols_metadata_json = f["Project"]['Steady State Viscosity Curve-55C'].attrs["columns"]
cols_metadata = json.loads(cols_metadata_json)
f.close()
print(type(cols_metadata))
print(cols_metadata.keys())
print(cols_metadata['units'])

<class 'dict'>
dict_keys(['names', 'units'])
['NaN', '[s]', '[1/s]', '[Pa]', '[mPa·s]', '[µN·m]', 'NaN']
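
The `\u00b7`-style escapes seen in the raw attribute are what Python's `json.dumps` produces for non-ASCII characters by default, and `json.loads` reverses them. The standalone sketch below reproduces that round trip with the names and units shown above, then pairs each column with its unit; `units_by_column` is an illustrative name, not something the package provides.

```python
import json

columns = {
    "names": ["Point No.", "Time", "Shear Rate", "Shear Stress",
              "Viscosity", "Torque", "Status"],
    "units": ["NaN", "[s]", "[1/s]", "[Pa]", "[mPa·s]", "[µN·m]", "NaN"],
}

# By default json.dumps escapes non-ASCII characters (ensure_ascii=True).
encoded = json.dumps(columns)
print(encoded)

# json.loads turns the escapes back into the original characters.
decoded = json.loads(encoded)

# Pair each column name with its unit for convenient lookup.
units_by_column = dict(zip(decoded["names"], decoded["units"]))
print(units_by_column["Viscosity"])
```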


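We can now add project-level metadata by passing the path to the HDF5 file and a dictionary to `add_project_metadata`:

In [13]: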
project_metadata = {
    "Project_Name": "Demo",
    'Author': 'John Doe',
    "Doi": "https//8675309",
    'Test_Type': "Strain Sweep",
    'Polymer': ['polystyrene sulfonate', 'poly (4-vinylpyridine)'],
    "Instrument": "Anton Paar MCR32"
}

converter.add_project_metadata("saved_data/Demo.hdf5", project_metadata)

We can then check that this metadata made it into the file.

In [14]:
f = h5py.File("saved_data/Demo.hdf5", "r")
print(f.attrs["Project_Name"])
print(f.attrs["Author"])
print(f.attrs["Doi"])
print(f.attrs["Test_Type"])
print(f.attrs["Polymer"])
print(f.attrs["Instrument"])
f.close()

Demo
John Doe
https//8675309
Strain Sweep
['polystyrene sulfonate' 'poly (4-vinylpyridine)']
Anton Paar MCR32


We can also add metadata to the test subfolders using:

In [15]:
test_metadata = {
    'Steady State Viscosity Curve-LO80C':
    {
    "Temperature":80,
    "Test Type": "Strain Sweep",
    "Polyanion_MW":100000,
    "Polycation_MW": 100000,
    "Polyanion_Charge_Fraction": 100,
    "Polycation_Charge_Fraction": 100,
    "Salt_Type": "potassium bromide",
    "Salt_Concentration": 10,
    "Solvent": "water",
    "Solvent_concentration":25,
    "columns":[]
    },

    'Steady State Viscosity Curve-75C':
    {
    "Temperature":75,
    "Test Type": "Freq Sweep",
    "Polyanion_MW":500,
    "Polycation_MW": 5000,
    "Polyanion_Charge_Fraction": 100,
    "Polycation_Charge_Fraction": 100,
    "Salt_Type": "potassium bromide",
    "Salt_Concentration": 20,
    "Solvent": "water",
    "Solvent_concentration":10,
    "columns":[]
    }
}

converter.add_test_metadata(test_metadata)

`add_test_metadata` takes a dictionary of dictionaries: the test names are the top-level keys, and each one maps to a dictionary of metadata fields and values. We can then check that this information was added to the file:

In [16]:
f = h5py.File("saved_data/Demo.hdf5", "r")
print(f["Project/Steady State Viscosity Curve-75C"].attrs["Temperature"])
print(f["Project/Steady State Viscosity Curve-LO80C"].attrs["Temperature"])

print(f["Project/Steady State Viscosity Curve-75C"].attrs["Polyanion_MW"])
print(f["Project/Steady State Viscosity Curve-LO80C"].attrs["Polyanion_MW"])

f.close()

75
80
500
100000
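
When many tests share the same fields, the nested dictionary can be assembled programmatically instead of typed by hand. The sketch below is one possible approach: it assumes the temperature is encoded at the end of the test name as digits followed by `C` (as in the keys printed earlier), and the helper `build_test_metadata` is hypothetical. Names without such a suffix are skipped.

```python
import re

test_names = [
    "Steady State Viscosity Curve-LO80C",
    "Steady State Viscosity Curve-75C",
    "Steady State Viscosity Curve",
]


def build_test_metadata(names, shared_fields):
    """Build {test name: {field: value}} with Temperature parsed from each name."""
    metadata = {}
    for name in names:
        match = re.search(r"(\d+)C$", name)
        if match is None:
            continue  # no temperature suffix, so leave this test out
        metadata[name] = {"Temperature": int(match.group(1)), **shared_fields}
    return metadata


shared = {"Salt_Type": "potassium bromide", "Solvent": "water"}
generated_metadata = build_test_metadata(test_names, shared)
print(generated_metadata)
```

The result has the same nested shape as the hand-written `test_metadata`, so it could presumably be passed to `converter.add_test_metadata` in the same way. Whether every field from the hand-written example (such as `columns`) is required is not established here.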
