# NMR Processing Overview

---

1. Split files into different categories.
    1. How many individual fids?
    2. How many array experiments?
    3. How are temperature sets stored?
    4. How are materials stored?
2. Develop / confirm metadata for those categories.
    + Cross reference with documentation provided by Trent.
    + Compare processing demo results to Trent's data. 
    + Meet with Trent to confirm assignments.
3. Prioritize subsets.
4. **Design Bokeh application**
5. Process subsets.

#### Set Local Data Path

---

In [13]:
data_folder = '/home/tylerbiggs/data/Sep-2016-23Na'

#### Import Packages

---

In [14]:
import nmrglue as ng
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import glob
import re
import os
%matplotlib inline

In [15]:
from trentnmr import *

## Splitting Files into Categories

Use `glob` to collect every `.fid` folder under the data directory, then separate the array experiments from the individual FIDs.

---

In [16]:
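# Array experiments are the .fid folders with "arrays" in their name.
array_glob = os.path.join(data_folder, '**', '*arrays*.fid')
all_fid_glob = os.path.join(data_folder, '**', '*.fid')

# recursive=True lets '**' match any depth of subfolders.
array_folders = set(glob.iglob(array_glob, recursive=True))
all_fid_folders = set(glob.iglob(all_fid_glob, recursive=True))

# Whatever is not an array experiment is treated as an individual FID.
fid_folders = all_fid_folders - array_folders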

#### How many individual fids are there?

#### How many array files are there?

In [17]:
print('All fid folders: ', len(all_fid_folders))
print('Array folders: ', len(array_folders))
print('Fid folders: ', len(fid_folders))

All fid folders:  301
Array folders:  86
Fid folders:  215
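
The counts above are consistent with a clean split (86 + 215 = 301). The sketch below repeats the same set-difference idea on a throwaway folder tree, so it can be run without the real data. All folder names except `VT/130C-down.fid` are hypothetical and only mimic the layout.

```python
import glob
import os
import tempfile

# Build a temporary folder tree; "T1-arrays-50C.fid" and "notes.txt" are hypothetical names.
with tempfile.TemporaryDirectory() as root:
    for name in ("VT/130C-down.fid", "VT/T1-arrays-50C.fid", "other/notes.txt"):
        os.makedirs(os.path.join(root, name))

    all_fids = set(glob.glob(os.path.join(root, "**", "*.fid"), recursive=True))
    arrays = set(glob.glob(os.path.join(root, "**", "*arrays*.fid"), recursive=True))
    singles = all_fids - arrays

    # Array folders are a subset of all .fid folders, and the two groups cover the full set.
    assert arrays <= all_fids
    assert len(arrays) + len(singles) == len(all_fids)
    counts = (len(all_fids), len(arrays), len(singles))

print(counts)
```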


#### How are temperatures stored?

Build some useful regular expressions.

In [151]:
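# Raw strings keep escapes such as \d intact for the regex engine.
deg_C_regx = r"([0-9]*)C"                            # e.g. "130C"
all_hz_glob = r"([0-9]+)(Hz)"                        # capture the whole number, not only its last digit
molarity_glob = r"(\d*)p*(\d*)M-(AlOH3|LiOH|NaOH)*"  # "p" stands in for the decimal point, e.g. "0p15M-AlOH3"
nmr_element_glob = r"(27Al|23Na)"
gibbsite_glob = r"(\w*-)(Gibbsite)"                  # case-sensitive: lowercase "gibbsite" does not match
milli_gram_glob = r"(\d*)p(\d*)mg"                   # e.g. "23p9mg"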

In [152]:
# Search each path once; keep the full match (e.g. "130C") or None.
temperatures = [m.group() if m else None
                for m in (re.search(deg_C_regx, x) for x in all_fid_folders)]
# print(temperatures)
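
As a quick check of the temperature pattern, the sketch below applies it to one path from this data set and returns the number alone rather than the full match. The helper name `temperature_c` and the second path are hypothetical, introduced only for illustration.

```python
import re

# Same pattern as deg_C_regx, written as a raw string.
deg_C_pattern = r"([0-9]*)C"

def temperature_c(path):
    """Return the temperature in deg C encoded in a path, or None if absent."""
    match = re.search(deg_C_pattern, path)
    # group(1) holds the digits only; group() would also include the trailing "C".
    if match and match.group(1):
        return int(match.group(1))
    return None

print(temperature_c("VT/130C-down.fid"))          # path that appears in this data set
print(temperature_c("hypothetical/no-temp.fid"))  # hypothetical name without a temperature
```

Because `[0-9]*` also matches zero digits, a bare capital "C" would produce an empty capture; the `match.group(1)` check guards against that case.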

#### How are materials stored?

**Gibbsite**

In [153]:
gibbsite_arrays = [m.group() if m else None
                   for m in (re.search(gibbsite_glob, x) for x in all_fid_folders)]
# print(gibbsite_arrays)

**Molarity**

In [154]:
molarities = [m.group() if m else None
              for m in (re.search(molarity_glob, x) for x in all_fid_folders)]
# print(molarities)

**Milligram Weights**

In [155]:
milligrams = [m.group() if m else None
              for m in (re.search(milli_gram_glob, x) for x in all_fid_folders)]
# print(milligrams)

### Dataframe

#### Folder Structure

In [156]:
# Express each path relative to data_folder instead of slicing off a fixed number of path components.
all_fids = [os.path.relpath(path, data_folder) for path in all_fid_folders]
# all_fids

In [157]:
data = {'filenames': all_fids}

df = pd.DataFrame(data)
df.head()

|   | filenames |
|---|-----------|
| 0 | 23Na/27Al/0808G1-0p15M-AlOH3-3M-NaOH-D2O/27Al-... |
| 1 | 23Na/27Al/0808G1-0p15M-AlOH3-3M-NaOH-D2O/23Na-... |
| 2 | VT/130C-down.fid |
| 3 | 23Na/27Al/0808G1-0p15M-AlOH3-3M-NaOH-D2O/23Na-... |
| 4 | 23Na/27Al/standard/27Al-0808G1-gibbsite-23p9mg... |


In [161]:
# expand=False returns a Series when the pattern has a single capture group.
df['deg_C'] = df['filenames'].str.extract(deg_C_regx, expand=False)
df.head()

|   | filenames | deg_C |
|---|-----------|-------|
| 0 | 23Na/27Al/0808G1-0p15M-AlOH3-3M-NaOH-D2O/27Al-... | 130 |
| 1 | 23Na/27Al/0808G1-0p15M-AlOH3-3M-NaOH-D2O/23Na-... | 50 |
| 2 | VT/130C-down.fid | 130 |
| 3 | 23Na/27Al/0808G1-0p15M-AlOH3-3M-NaOH-D2O/23Na-... | 60 |
| 4 | 23Na/27Al/standard/27Al-0808G1-gibbsite-23p9mg... | 25 |
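
The extracted temperatures are strings. One possible refinement, sketched below, uses a named capture group so that `str.extract` labels the column itself, then converts the values to numbers. The second file name is hypothetical, and requiring at least one digit (`[0-9]+`) is a change from the pattern used above.

```python
import pandas as pd

# One name from the listing above plus one hypothetical name with no temperature.
names = pd.Series(["VT/130C-down.fid", "hypothetical/no-temp.fid"])

# A named group becomes the column name; expand=True always returns a DataFrame.
extracted = names.str.extract(r"(?P<deg_C>[0-9]+)C", expand=True)

# Convert the captured strings to numbers; rows without a match stay NaN.
extracted["deg_C"] = pd.to_numeric(extracted["deg_C"])
print(extracted)
```

A numeric column makes later filtering and sorting by temperature behave as expected, which string values such as "130" and "50" would not.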


In [166]:
al_molarity_regex = r"(\d*)p*(\d*)M-AlOH3"

In [176]:
raw_al_molarity = df['filenames'].str.extract(al_molarity_regex, expand=True)

In [185]:
df['al_molarity'] = raw_al_molarity.loc[:, 0] + '.' + raw_al_molarity.loc[:, 1]
df.head()

|   | filenames | deg_C | al_molarity |
|---|-----------|-------|-------------|
| 0 | 23Na/27Al/0808G1-0p15M-AlOH3-3M-NaOH-D2O/27Al-... | 130 | 0.15 |
| 1 | 23Na/27Al/0808G1-0p15M-AlOH3-3M-NaOH-D2O/23Na-... | 50 | 0.15 |
| 2 | VT/130C-down.fid | 130 | NaN |
| 3 | 23Na/27Al/0808G1-0p15M-AlOH3-3M-NaOH-D2O/23Na-... | 60 | 0.15 |
| 4 | 23Na/27Al/standard/27Al-0808G1-gibbsite-23p9mg... | 25 | NaN |
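
Joining the two captured groups with `'.'` yields strings such as `'0.15'`, and a whole-number token such as the `3` in `3M-NaOH` would come out as `'3.'`. A minimal sketch of a converter for this "p as decimal point" notation follows; the helper name `p_to_float` is hypothetical and not part of the notebook or of `trentnmr`.

```python
import re

def p_to_float(token):
    """Convert a 'p'-as-decimal-point token such as '0p15' or '3' to a float."""
    match = re.fullmatch(r"(\d+)(?:p(\d+))?", token)
    if match is None:
        return None
    whole, frac = match.group(1), match.group(2)
    # Only add a decimal point when a fractional part was captured.
    return float(whole + "." + frac) if frac else float(whole)

print(p_to_float("0p15"))  # Al(OH)3 molarity token from the folder names
print(p_to_float("23p9"))  # gibbsite mass token from the standard's file name
print(p_to_float("3"))     # whole-number molarity, as in "3M-NaOH"
```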


In [187]:
raw_mg = df['filenames'].str.extract(milli_gram_glob, expand=True)
df['mg'] = raw_mg.loc[:, 0] + '.' + raw_mg.loc[:, 1]
df.head()

|   | filenames | deg_C | al_molarity | mg |
|---|-----------|-------|-------------|----|
| 0 | 23Na/27Al/0808G1-0p15M-AlOH3-3M-NaOH-D2O/27Al-... | 130 | 0.15 | NaN |
| 1 | 23Na/27Al/0808G1-0p15M-AlOH3-3M-NaOH-D2O/23Na-... | 50 | 0.15 | NaN |
| 2 | VT/130C-down.fid | 130 | NaN | NaN |
| 3 | 23Na/27Al/0808G1-0p15M-AlOH3-3M-NaOH-D2O/23Na-... | 60 | 0.15 | NaN |
| 4 | 23Na/27Al/standard/27Al-0808G1-gibbsite-23p9mg... | 25 | NaN | 23.9 |


In [188]:
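df['filenames'].str.extract(gibbsite_glob, expand=True)

|   | 0 | 1 |
|---|---|---|
| 0 | NaN | NaN |
| 1 | NaN | NaN |
| 2 | NaN | NaN |
| 3 | NaN | NaN |
| 4 | NaN | NaN |
| 5 | NaN | NaN |
| 6 | NaN | NaN |
| 7 | NaN | NaN |
| 8 | NaN | NaN |
| 9 | NaN | NaN |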
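
Every row shown above is empty, including row 4, whose name contains lowercase `gibbsite`. A likely cause is that `gibbsite_glob` spells the word with a capital "G". The sketch below compares the pattern with and without `re.IGNORECASE`, assuming case is the only obstacle; the first file name is hypothetical and only modelled on the truncated entry in the listing.

```python
import re
import pandas as pd

# First name is hypothetical; the second appears in the listing above.
names = pd.Series(["27Al-sampleA-gibbsite-23p9mg.fid", "VT/130C-down.fid"])

pattern = r"(\w*-)(Gibbsite)"
strict = names.str.extract(pattern, expand=True)
relaxed = names.str.extract(pattern, flags=re.IGNORECASE, expand=True)

print(strict)   # case-sensitive search
print(relaxed)  # case-insensitive search
```

Passing `flags` keeps the pattern itself unchanged, so the same string still works with `re.search` elsewhere in the notebook.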
