# NMR Processing Overview

---

1. Split files into different categories.
    1. How many individual fids?
    2. How many array experiments?
    3. How are temperature sets stored?
    4. How are materials stored?
2. Develop / confirm metadata for those categories.
    + Cross reference with documentation provided by Trent.
    + Compare processing demo results to Trent's data. 
    + Meet with Trent to confirm assignments.
3. Prioritize subsets.
4. **Design Bokeh application**
5. Process subsets.

#### Set Local Data Path

The full data set is around 2 GB, so it may be stored in different locations on different machines. Define a single base path here so that every later path is built relative to it.

---

In [1]:
# data_folder = '/home/tylerbiggs/data/Sep-2016-23Na'
data_folder = '/home/tyler/data/Sep-2016-23Na'
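
One way to avoid commenting paths in and out per machine is to try a list of candidate locations and keep the first one that exists. The following is a minimal sketch, not part of the original workflow: the helper `first_existing_dir` and the environment variable `NMR_DATA_DIR` are hypothetical names introduced for illustration, and the two literal paths are the ones used in the cell above.

```python
import os


def first_existing_dir(candidates):
    """Return the first candidate that is an existing directory, else None."""
    for path in candidates:
        if path and os.path.isdir(path):
            return path
    return None


# NMR_DATA_DIR is a hypothetical override; the other two paths come from the cell above.
candidate_folders = [
    os.environ.get('NMR_DATA_DIR'),
    '/home/tylerbiggs/data/Sep-2016-23Na',
    '/home/tyler/data/Sep-2016-23Na',
]
resolved_folder = first_existing_dir(candidate_folders)
print('Data folder:', resolved_folder)
```

If none of the candidates exists the helper returns `None`, which makes a missing data directory visible before any globbing starts.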

#### Import Packages

---

In [55]:
import nmrglue as ng
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import glob
import re
import os
%matplotlib inline

In [56]:
from trentnmr import *

## Splitting Files into Categories

Use `glob` to collect every `.fid` folder in the data directory, then separate the array experiments from the individual fids.

---

In [57]:
array_glob = data_folder + '/**/*arrays*.fid'
all_fid_glob = data_folder + '/**/*.fid'

array_folders = set(glob.iglob(array_glob, recursive=True))
all_fid_folders = set(glob.iglob(all_fid_glob, recursive=True))

fid_folders = all_fid_folders - array_folders
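
The split above relies on two behaviours: with `recursive=True` the `**` component matches zero or more directory levels, and a set difference removes the array folders from the full listing. The sketch below reproduces this on a throwaway directory tree so it can be checked without the real data. All folder names in it are invented for illustration.

```python
import glob
import os
import tempfile

# Build a temporary tree of .fid folders (invented names), then glob it.
with tempfile.TemporaryDirectory() as demo_root:
    for rel in ('a/sample-arrays-1.fid', 'a/b/sample-2.fid', 'sample-3.fid'):
        os.makedirs(os.path.join(demo_root, rel))

    demo_arrays = set(glob.glob(demo_root + '/**/*arrays*.fid', recursive=True))
    demo_all = set(glob.glob(demo_root + '/**/*.fid', recursive=True))
    # Everything that is not an array experiment counts as an individual fid.
    demo_single = demo_all - demo_arrays

    demo_counts = (len(demo_all), len(demo_arrays), len(demo_single))
    print(demo_counts)
```

Because every array folder also matches the broader `*.fid` pattern, the three counts should always satisfy `all = arrays + single`, which is a quick sanity check on the real data as well.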

#### How many individual fids are there?

#### How many array files are there?

In [58]:
print('All fid folders: ', len(all_fid_folders))
print('Array folders: ', len(array_folders))
print('Fid folders: ', len(fid_folders))

All fid folders:  301
Array folders:  86
Fid folders:  215


#### How are temperatures stored?

Build some useful regex strings.

In [59]:
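# Raw strings keep regex escapes such as \d from being read as string escapes.
deg_C_regx = r"([0-9]*)C"
all_hz_glob = r"([0-9]+)(Hz)"  # capture the whole number, not only its last digit
molarity_glob = r"(\d*)p*(\d*)M-(AlOH3|LiOH|NaOH)*"
nmr_element_glob = r"(27Al|23Na)"
gibbsite_glob = r"(\w*-)(Gibbsite)"
milli_gram_glob = r"(\d*)p(\d*)mg"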
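
To see what such patterns capture, they can be applied to a few names that appear in listings elsewhere in this notebook. This is an illustrative sketch only: the `demo_*` and `sample_*` names are introduced here, and the temperature pattern uses `+` rather than `*` so that a bare `C` with no digits in front of it does not count as a match, which is a choice made for this example.

```python
import re

# Folder and file names copied from listings elsewhere in this notebook.
sample_folder = '23Na/27Al/0819G1-0p5M-AlOH3-3M-KOH-D2O'
sample_rotor = '23Na-empty-rotor-2845Hz-512ct.fid'
sample_vt = 'VT/80C.fid'

# Temperature in degrees C, spinning rate in Hz, and Al(OH)3 molarity.
demo_deg_c = re.search(r"([0-9]+)C", sample_vt)
demo_hz = re.search(r"([0-9]+)(Hz)", sample_rotor)
demo_molarity = re.search(r"(\d*)p*(\d*)M-AlOH3", sample_folder)

print(demo_deg_c.group(1))
print(demo_hz.groups())
print(demo_molarity.groups())
```

In the molarity pattern the letter `p` stands in for the decimal point, so the two captured groups are the digits before and after it.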

In [60]:
temperatures = [re.search(deg_C_regx, x).group() if re.search(deg_C_regx, x) else None for x in all_fid_folders]
# print(temperatures)

#### How are Materials stored?

**Gibbsite**

In [61]:
gibbsite_arrays = [re.search(gibbsite_glob, x).group() if re.search(gibbsite_glob, x) else None for x in all_fid_folders]
# print(gibbsite_arrays)

**Molarity**

In [62]:
molarities = [re.search(molarity_glob, x).group() if re.search(molarity_glob, x) else None for x in all_fid_folders]
# print(molarities)

**Milligram Weights**

In [63]:
milligrams = [re.search(milli_gram_glob, x).group() if re.search(milli_gram_glob, x) else None for x in all_fid_folders]
# print(milligrams)
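
The list comprehensions above all follow the same shape and call `re.search` twice per folder. A small helper can express the same idea with a single search. This is a sketch under assumptions: `first_match` is a hypothetical helper name, and the folder names in the example are invented.

```python
import re


def first_match(pattern, text):
    """Return the full matched substring, or None when the pattern is absent."""
    match = re.search(pattern, text)
    return match.group() if match else None


# Invented folder names, used only to exercise the helper.
demo_names = ['sample-12p5mg.fid', 'VT/80C.fid']
demo_mg = [first_match(r"(\d*)p(\d*)mg", name) for name in demo_names]
print(demo_mg)
```

With such a helper, each of the lists above could be built as `[first_match(pattern, x) for x in all_fid_folders]`.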

### Dataframe

#### Folder Structure

In [64]:
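# Paths relative to data_folder, so the result does not depend on how deep the data sits on this machine.
all_fids = [os.path.relpath(path, data_folder).replace(os.sep, '/') for path in all_fid_folders]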
# all_fids

In [65]:
data = {'filenames': all_fids}

df = pd.DataFrame(data)
df.head()

| | filenames |
|---|---|
| 0 | 23Na/27Al/0819G1-0p5M-AlOH3-3M-KOH-D2O/27Al-5t... |
| 1 | 23Na/27Al/27Al-1M-AlNO3-reference-09-02-2016-3... |
| 2 | 23Na/27Al/0819G1-0p1M-AlOH3-3M-LiOH-D2O/27Al-5... |
| 3 | VT/80C.fid |
| 4 | 23Na/27Al/0808G1-0p15M-AlOH3-3M-NaOH-D2O/23Na-... |


In [66]:
# expand=False returns a Series for the single capture group.
df['deg_C'] = df['filenames'].str.extract(deg_C_regx, expand=False)
df.head()


| | filenames | deg_C |
|---|---|---|
| 0 | 23Na/27Al/0819G1-0p5M-AlOH3-3M-KOH-D2O/27Al-5t... | 132.0 |
| 1 | 23Na/27Al/27Al-1M-AlNO3-reference-09-02-2016-3... | |
| 2 | 23Na/27Al/0819G1-0p1M-AlOH3-3M-LiOH-D2O/27Al-5... | 25.0 |
| 3 | VT/80C.fid | 80.0 |
| 4 | 23Na/27Al/0808G1-0p15M-AlOH3-3M-NaOH-D2O/23Na-... | 100.0 |


In [67]:
al_molarity_regex = "(\d*)p*(\d*)M-AlOH3"

In [68]:
# expand=True returns a DataFrame with one column per capture group.
raw_al_molarity = df['filenames'].str.extract(al_molarity_regex, expand=True)


In [69]:
df['al_molarity'] = raw_al_molarity.loc[:, 0] + '.' + raw_al_molarity.loc[:, 1]
df.head()

| | filenames | deg_C | al_molarity |
|---|---|---|---|
| 0 | 23Na/27Al/0819G1-0p5M-AlOH3-3M-KOH-D2O/27Al-5t... | 132.0 | 0.5 |
| 1 | 23Na/27Al/27Al-1M-AlNO3-reference-09-02-2016-3... | | |
| 2 | 23Na/27Al/0819G1-0p1M-AlOH3-3M-LiOH-D2O/27Al-5... | 25.0 | 0.1 |
| 3 | VT/80C.fid | 80.0 | |
| 4 | 23Na/27Al/0808G1-0p15M-AlOH3-3M-NaOH-D2O/23Na-... | 100.0 | 0.15 |


In [70]:
# expand=True returns a DataFrame with one column per capture group.
raw_mg = df['filenames'].str.extract(milli_gram_glob, expand=True)
df['mg'] = raw_mg.loc[:, 0] + '.' + raw_mg.loc[:, 1]
df.head()


| | filenames | deg_C | al_molarity | mg |
|---|---|---|---|---|
| 0 | 23Na/27Al/0819G1-0p5M-AlOH3-3M-KOH-D2O/27Al-5t... | 132.0 | 0.5 | |
| 1 | 23Na/27Al/27Al-1M-AlNO3-reference-09-02-2016-3... | | | |
| 2 | 23Na/27Al/0819G1-0p1M-AlOH3-3M-LiOH-D2O/27Al-5... | 25.0 | 0.1 | |
| 3 | VT/80C.fid | 80.0 | | |
| 4 | 23Na/27Al/0808G1-0p15M-AlOH3-3M-NaOH-D2O/23Na-... | 100.0 | 0.15 | |
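
The molarity and milligram columns are both built by joining two captured digit groups with a `.`, which leaves them as strings. A possible refinement, sketched here under assumptions, wraps that step in one function and converts the result to floats. `extract_decimal` is a hypothetical helper name, and the folder names are taken from the table above with the truncated file parts left off.

```python
import pandas as pd


def extract_decimal(names, pattern):
    """Join two captured digit groups with '.' and convert to float (NaN if no match)."""
    parts = names.str.extract(pattern, expand=True)
    return pd.to_numeric(parts[0] + '.' + parts[1], errors='coerce')


# Folder names taken from the table above.
demo_df = pd.DataFrame({'filenames': [
    '23Na/27Al/0819G1-0p5M-AlOH3-3M-KOH-D2O',
    'VT/80C.fid',
    '23Na/27Al/0808G1-0p15M-AlOH3-3M-NaOH-D2O',
]})
demo_df['al_molarity'] = extract_decimal(demo_df['filenames'], r"(\d*)p*(\d*)M-AlOH3")
print(demo_df['al_molarity'].tolist())
```

Numeric columns make later filtering and sorting by concentration or mass straightforward, whereas string columns would sort lexically.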


In [71]:
# df['filenames'].str.extract(gibbsite_glob)

## New Folder Globs

Separate the files by parent folder so that each group can be loaded into its own data frame.

---

In [72]:
# Did you know that ls can even do tab-complete?
%ls /home/tyler/data/Sep-2016-23Na/23Na/27Al/background/

23Na-empty-rotor-2845Hz-512ctF.fid/  27Al-empty-rotor-2845Hz-512ctF.fid/
23Na-empty-rotor-2845Hz-512ct.fid/   27Al-empty-rotor-2845Hz-512ct.fid/


In [73]:
vt_fid_base = os.path.join(data_folder, 'VT')
na_base = os.path.join(data_folder, '23Na')
al_base = os.path.join(na_base, '27Al')
al_base_background = os.path.join(al_base, 'background')
al_base_standard = os.path.join(al_base, 'standard')

In [78]:
array_glob = '/*arrays*.fid'
fid_glob = '/*.fid'

In [84]:
# Non-recursive globs: only the .fid folders directly inside each base folder.
sodium_temp_fids = set(glob.glob(vt_fid_base + fid_glob))
na_base_fids = set(glob.glob(na_base + fid_glob))
al_base_fids = set(glob.glob(al_base + fid_glob))
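
An alternative to one glob per folder is to keep a single table of relative paths and split it by its top-level folder with `groupby`. The sketch below shows that approach on three paths taken from the tables earlier in this notebook. It is an illustration rather than the method used here, and the names `demo_paths` and `frames_by_parent` are introduced only for this example.

```python
import pandas as pd

# Relative paths taken from the tables earlier in this notebook.
demo_paths = pd.DataFrame({'filenames': [
    '23Na/27Al/0819G1-0p5M-AlOH3-3M-KOH-D2O',
    '23Na/27Al/0808G1-0p15M-AlOH3-3M-NaOH-D2O',
    'VT/80C.fid',
]})

# The first path component serves as the grouping key.
demo_paths['parent'] = demo_paths['filenames'].str.split('/').str[0]

# One DataFrame per parent folder, each with a fresh 0-based index.
frames_by_parent = {key: group.reset_index(drop=True)
                    for key, group in demo_paths.groupby('parent')}
print({key: len(frame) for key, frame in frames_by_parent.items()})
```

Grouping on a deeper path component, for example the first two levels, would separate the `27Al` subfolder from the rest of `23Na` in the same way.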