# NMR Processing Overview

---

1. Split files into different categories.
    1. How many individual fids?
    2. How many array experiments?
    3. How are temperature sets stored?
    4. How are materials stored?
2. Develop / confirm metadata for those categories.
    + Cross reference with documentation provided by Trent.
    + Compare processing demo results to Trent's data. 
    + Meet with Trent to confirm assignments.
3. Prioritize subsets.
4. **Design Bokeh application**
5. Process subsets.

#### Set Local Data Path

---

In [2]:
data_folder = '/home/tyler/data/Sep-2016-23Na'

#### Import Packages

---

In [18]:
import nmrglue as ng
import numpy as np
import matplotlib.pyplot as plt
import glob
import re
import os
%matplotlib inline

In [19]:
from trentnmr import *

## Splitting Files into Categories

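Use `glob` to collect every `.fid` folder under the data directory, then separate array experiments (folders whose names contain `arrays`) from individual fids.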

---

In [21]:
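# Array experiments live in folders whose names contain "arrays".
array_glob = data_folder + '/**/*arrays*.fid'
all_fid_glob = data_folder + '/**/*.fid'

# Sets make it easy to separate the two groups with a difference.
array_folders = set(glob.iglob(array_glob, recursive=True))
all_fid_folders = set(glob.iglob(all_fid_glob, recursive=True))

# Individual fids are every .fid folder that is not an array experiment.
fid_folders = all_fid_folders - array_folders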

#### How many individual fids are there?

#### How many array files are there?

In [22]:
print('All fid folders: ', len(all_fid_folders))
print('Array folders: ', len(array_folders))
print('Fid folders: ', len(fid_folders))

All fid folders:  301
Array folders:  86
Fid folders:  215
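
The split above can be checked on a throwaway directory tree before trusting the counts on the real data. This is a minimal sketch: the helper name `split_fid_folders` and the sample folder names are hypothetical, and only the two glob patterns come from the cell above.

```python
import glob
import os
import tempfile


def split_fid_folders(root):
    """Return (all, array, individual) sets of .fid folders under root."""
    all_fids = set(glob.glob(os.path.join(root, '**', '*.fid'), recursive=True))
    arrays = set(glob.glob(os.path.join(root, '**', '*arrays*.fid'), recursive=True))
    # Individual fids are whatever is left after removing the array folders.
    return all_fids, arrays, all_fids - arrays


# Hypothetical folder names, used only to exercise the patterns.
sample_folders = (
    'sampleA/T1_25C.fid',
    'sampleA/T1_arrays_25C.fid',
    'sampleB/nested/T2_70C.fid',
)

with tempfile.TemporaryDirectory() as root:
    for name in sample_folders:
        os.makedirs(os.path.join(root, name))
    all_fids, arrays, singles = split_fid_folders(root)
    counts = (len(all_fids), len(arrays), len(singles))

print(counts)
```

The three counts should satisfy `all == arrays + individual`, which is also true of the real output (301 = 86 + 215).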


#### Folder Structure

In [27]:
# [os.path.normpath(path).split(os.sep)[5:] for path in all_fid_folders]
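
The commented-out line drops the first five path components with a hard-coded `[5:]`, which only works for this particular `data_folder`. A sketch of the same idea using `os.path.relpath` avoids the magic number. The helper name `relative_parts` and the example subfolders are hypothetical.

```python
import os


def relative_parts(path, root):
    """Split a path into its components relative to root."""
    return os.path.relpath(path, root).split(os.sep)


root = os.path.join(os.sep, 'home', 'tyler', 'data', 'Sep-2016-23Na')
# Hypothetical folder below the data directory.
example = os.path.join(root, 'sampleA', 'run1', 'T1_25C.fid')

parts = relative_parts(example, root)
print(parts)
```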

#### How are temperatures stored?

Build some useful regex strings for pulling metadata out of the folder names.

In [7]:
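# Note: these are regular expressions (used with re.search below),
# not glob patterns, despite the *_glob names.
deg_C_regx = r"([0-9]*)C"  # temperature, e.g. '25C'; `*` also lets a bare 'C' match
all_hz_glob = r"([0-9])+(Hz)"
molarity_glob = r"([0-9]*(p)*([0-9])+(M)-(AlOH3|LiOH|NaOH))+"
nmr_element_glob = r"(27Al|23Na)"
gibbsite_glob = r"(\w*-)(Gibbsite)"  # raw string so \w is not read as a string escape
milli_gram_glob = r"(\d*)p(\d*)mg"  # sample mass, e.g. '23p9mg'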
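
To see what each pattern captures, it can help to run all of them against a single name and collect the results in a dict. The sketch below copies the patterns from the cell above. The dictionary keys, the helper `extract_metadata`, and the example folder name are hypothetical and are not taken from the data set.

```python
import re

# Patterns copied from the cell above; the keys are illustrative labels.
PATTERNS = {
    'temperature': r"([0-9]*)C",
    'frequency': r"([0-9])+(Hz)",
    'molarity': r"([0-9]*(p)*([0-9])+(M)-(AlOH3|LiOH|NaOH))+",
    'nucleus': r"(27Al|23Na)",
    'gibbsite': r"(\w*-)(Gibbsite)",
    'mass': r"(\d*)p(\d*)mg",
}


def extract_metadata(name):
    """Return the first match of every pattern in name, or None if absent."""
    meta = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, name)
        meta[key] = match.group() if match else None
    return meta


# Hypothetical folder name built from fragments seen in the outputs.
example_name = '23Na-3p0M-NaOH-concentrated-Gibbsite-23p9mg-25C.fid'
meta = extract_metadata(example_name)
for key, value in meta.items():
    print(key, value)
```

Keeping the patterns in one dict means a new field only needs one new entry, at the cost of returning the raw matched text rather than parsed numbers.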

In [17]:
temperatures = [m.group() if m else None for m in (re.search(deg_C_regx, x) for x in all_fid_folders)]
print(temperatures)

# for file in all_fid_folders:
#     x = re.findall(deg_C_regx, file)
#     if x: print(x, file)
#     else: print("NO TEMPERATURE FOUND",file)

['70C', '60C', '25C', '100C', '170C', '70C', '50C', '25C', '132C', '170C', '80C', '25C', '90C', '120C', '25C', '25C', '25C', '140C', '25C', '120C', '60C', '170C', '132C', '120C', '25C', '25C', '25C', '40C', 'C', '35C', '25C', '80C', '25C', None, '132C', '50C', '25C', '25C', '130C', '130C', '132C', '132C', '25C', '70C', '120C', '132C', None, '25C', '25C', '90C', '80C', 'C', '25C', '132C', '25C', '25C', '132C', '80C', '25C', '25C', '132C', '25C', '170C', '25C', '25C', '110C', '25C', '25C', '132C', '132C', None, '25C', '25C', '25C', '25C', '140C', '25C', '132C', '50C', '70C', '140C', '25C', '25C', '25C', None, '25C', '25C', '90C', '25C', '132C', '130C', '25C', '25C', '40C', '120C', '25C', '110C', '70C', '135C', '25C', '25C', '25C', '25C', '70C', '25C', '25C', '25C', '132C', '25C', '25C', '120C', '132C', '132C', '110C', '132C', '135C', '25C', '25C', '25C', '25C', '60C', '25C', '132C', '100C', '35C', '25C', '132C', '25C', '25C', '132C', None, '132C', '110C', '25C', '105C', '100C', '90C', No
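
The bare `'C'` entries in the output come from the `*` quantifier in `([0-9]*)C`: it accepts zero digits, and `re.search` returns the leftmost match, so any capital `C` earlier in the path wins over a real temperature later on. A possible stricter variant requires at least one digit. This is a sketch, not the pattern used in the notebook, and the example path is hypothetical.

```python
import re

LOOSE = re.compile(r"([0-9]*)C")   # pattern from the notebook
STRICT = re.compile(r"([0-9]+)C")  # assumed fix: at least one digit


def find_temperature(pattern, name):
    """Return the matched temperature string, or None if nothing matches."""
    match = pattern.search(name)
    return match.group() if match else None


# Hypothetical path with a capital C before the temperature.
name = 'Conc-sample/T1-25C.fid'

loose = find_temperature(LOOSE, name)
strict = find_temperature(STRICT, name)
degrees = int(STRICT.search(name).group(1))
print(loose, strict, degrees)
```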

#### How are materials stored?

**Gibbsite**

In [14]:
gibbsite_arrays = [m.group() if m else None for m in (re.search(gibbsite_glob, x) for x in all_fid_folders)]
print(gibbsite_arrays)

[None, None, None, None, None, None, None, None, None, None, None, None, 'concentrated-Gibbsite', None, None, None, None, 'concentrated-Gibbsite', None, None, None, 'concentrated-Gibbsite', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'concentrated-Gibbsite', 'concentrated-Gibbsite', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'concentrated-Gibbsite', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'concentrated-Gibbsite', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'concentrated-Gibbsite', None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'concentrated-Gibbsite', None, None, None, 'concentrate
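
Since most entries are `None`, a tally is easier to read than the full list. A minimal sketch with `collections.Counter` follows. The helper `tally_matches` and the example names are hypothetical, and only the pattern comes from the notebook.

```python
import re
from collections import Counter


def tally_matches(pattern, names):
    """Count how often each matched string (or None) occurs across names."""
    counts = Counter()
    for name in names:
        match = re.search(pattern, name)
        counts[match.group() if match else None] += 1
    return counts


# Hypothetical folder names.
names = [
    'a/concentrated-Gibbsite-25C.fid',
    'b/NaOH-25C.fid',
    'c/concentrated-Gibbsite-70C.fid',
]

tally = tally_matches(r"(\w*-)(Gibbsite)", names)
print(tally)
```

The same helper works for the temperature, molarity, and milligram patterns.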

**Molarity**

In [10]:
# for file in all_fid_folders:
#     x = re.findall(molarity_glob, file)
#     if x: print(x, file)
#     else: print("NO MOLARITY FOUND",file)

**Milligram Weights**

In [16]:
milligrams = [m.group() if m else None for m in (re.search(milli_gram_glob, x) for x in all_fid_folders)]
print(milligrams)

# for file in all_fid_folders:
#     x = re.findall(milli_gram_glob, file)
#     if x: print(x, file)
#     else: print("NO MILLIGRAM WEIGHT FOUND",file)

[None, None, None, None, None, None, None, '23p9mg', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, '18p7mg', None, None, None, None, None, '23p9mg', None, None, None, None, None, '54p8mg', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, '23p9mg', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, '54p8mg', '18p7mg', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, '18p7mg', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, '18p7mg', None, None, None, None, None, None, None, Non
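
Strings such as `'23p9mg'` still need converting to numbers before they are useful as metadata. The sketch below assumes the `p` stands in for a decimal point (so `'23p9mg'` means 23.9 mg), which should be confirmed against Trent's documentation. The helper `parse_milligrams` and the example folder name are hypothetical.

```python
import re

# Assumption: 'p' encodes the decimal point, e.g. '23p9mg' -> 23.9 mg.
MASS_PATTERN = re.compile(r"(\d+)p(\d+)mg")


def parse_milligrams(name):
    """Return the sample mass in mg as a float, or None if not present."""
    match = MASS_PATTERN.search(name)
    if match is None:
        return None
    whole, fraction = match.groups()
    return float(f"{whole}.{fraction}")


masses = [parse_milligrams(s) for s in ('23p9mg', '18p7mg', '54p8mg')]
missing = parse_milligrams('sampleA/T1-25C.fid')  # hypothetical name with no mass
print(masses, missing)
```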