# NMR Processing Overview

---

1. Split files into different categories.
    1. How many individual fids?
    2. How many array experiments?
    3. How are temperature sets stored?
    4. How are materials stored?
2. Develop / confirm metadata for those categories.
    + Cross reference with documentation provided by Trent.
    + Compare processing demo results to Trent's data. 
    + Meet with Trent to confirm assignments.
3. Prioritize subsets.
4. **Design Bokeh application**
5. Process subsets.

#### Set Local Data Path

---

The full data set is roughly 2 GB, so it may be stored in a different location on each machine. Define a single base path to the data here so that every other path can be built from it.

In [1]:
# data_folder = '/home/tylerbiggs/data/Sep-2016-23Na'
data_folder = '/home/tyler/data/Sep-2016-23Na'
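
Rather than commenting paths in and out per machine, the base path could be picked automatically. The following is only a sketch: `resolve_data_folder` and the `NMR_DATA` environment variable are hypothetical names introduced for illustration, not part of this notebook.

```python
import os


def resolve_data_folder(candidates, env_var='NMR_DATA'):
    """Return the first existing directory among env_var and candidates.

    env_var is a hypothetical override; candidates are tried in order.
    """
    override = os.environ.get(env_var)
    paths = ([override] if override else []) + list(candidates)
    for path in paths:
        if os.path.isdir(path):
            return path
    raise FileNotFoundError('No data folder found in: {}'.format(paths))


# Possible use with the two locations listed above:
# data_folder = resolve_data_folder([
#     '/home/tylerbiggs/data/Sep-2016-23Na',
#     '/home/tyler/data/Sep-2016-23Na',
# ])
```

Raising when nothing matches makes a wrong path fail here instead of producing empty glob results later.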

In [12]:
import nmrglue as ng
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import itertools
import multiprocessing as mp
import glob
import re
import os
%matplotlib inline

In [3]:
from trentnmr import *

# File Structure

---

Running `tree -I '*.fid'` lists every directory that is not a `.fid` folder:

```bash
└── Sep-2016-23Na
    ├── 23Na
    │   └── 27Al
    │       ├── 0808G1-0p15M-AlOH3-3M-NaOH-D2O
    │       ├── 0808G1-0p5M-AlOH3-3M-NaOH-D2O
    │       ├── 0808G1-1M-AlOH3-3M-NaOH-D2O
    │       ├── 0819G1-0p1M-AlOH3-3M-LiOH-D2O
    │       ├── 0819G1-0p5M-AlOH3-3M-KOH-D2O
    │       ├── 0819G1-0p5M-AlOH3-3M-LiOH-D2O
    │       ├── 0819G1-1M-AlOH3-3M-NaOH-D2O
    │       ├── background
    │       └── standard
    └── VT

```

Having `27Al` nested inside `23Na` looks like an error. The folders were re-ordered to:

```bash
└── Sep-2016-23Na
    ├── 23Na
    ├── 27Al
    │   ├── 0808G1-0p15M-AlOH3-3M-NaOH-D2O
    │   ├── 0808G1-0p5M-AlOH3-3M-NaOH-D2O
    │   ├── 0808G1-1M-AlOH3-3M-NaOH-D2O
    │   ├── 0819G1-0p1M-AlOH3-3M-LiOH-D2O
    │   ├── 0819G1-0p5M-AlOH3-3M-KOH-D2O
    │   ├── 0819G1-0p5M-AlOH3-3M-LiOH-D2O
    │   ├── 0819G1-1M-AlOH3-3M-NaOH-D2O
    │   ├── background
    │   └── standard
    └── VT

```
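
The re-ordering shown in the two trees could be scripted so that it is repeatable on another copy of the data. This is a minimal sketch, assuming the only change needed is to move `27Al` up one level; `promote_folder` is a hypothetical helper, not something the notebook defines.

```python
import os
import shutil


def promote_folder(parent, wrong_child, name):
    """Move parent/wrong_child/name up to parent/name and return the new path."""
    src = os.path.join(parent, wrong_child, name)
    dst = os.path.join(parent, name)
    if os.path.exists(dst):
        # Refuse to merge into or overwrite an existing folder.
        raise FileExistsError(dst)
    shutil.move(src, dst)
    return dst


# Possible use for the layout above:
# promote_folder(data_folder, '23Na', '27Al')
```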

## Glob Parent Folders

---

In [4]:
# Sodium folders.
VT   = os.path.join(data_folder, 'VT')
Na23 = os.path.join(data_folder, '23Na')

# Aluminum folders.
Al27 = os.path.join(data_folder, '27Al')
# Aluminum sub-paths.
sub_paths_strings = [
    "0808G1-0p15M-AlOH3-3M-NaOH-D2O",
    "0808G1-0p5M-AlOH3-3M-NaOH-D2O",
    "0808G1-1M-AlOH3-3M-NaOH-D2O",
    "0819G1-0p1M-AlOH3-3M-LiOH-D2O",
    "0819G1-0p5M-AlOH3-3M-KOH-D2O",
    "0819G1-0p5M-AlOH3-3M-LiOH-D2O",
    "0819G1-1M-AlOH3-3M-NaOH-D2O",
    "background",
    "standard"
]

Al_sub_paths = [os.path.join(Al27, p) for p in sub_paths_strings]

In [5]:
Al_sub_paths.append(Al27)
sodium_paths = [VT, Na23]
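
The aluminum sub-paths above are typed in by hand from the `tree` output. If the folder contents change, the list could instead be rebuilt from disk. A sketch under that assumption follows; `list_sample_folders` is a hypothetical name.

```python
import os


def list_sample_folders(parent):
    """Return sorted paths of sub-directories of parent that are not .fid folders."""
    return sorted(
        entry.path
        for entry in os.scandir(parent)
        if entry.is_dir() and not entry.name.endswith('.fid')
    )


# Possible use in place of the hand-written list:
# Al_sub_paths = list_sample_folders(Al27)
```

The hand-written list has the advantage of failing visibly when an expected folder is missing, so this is a trade-off rather than a strict improvement.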

## Glob Helper Functions

---

In [6]:
array_glob = '/*arrays*.fid'
fid_glob = '/*.fid'
special_files = ['reference', 'REF', 'calibration', 'pwX90', 'static',
                 'spin-up', 'without-liquid']

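def nmr_glob(path):
    """Return [arrays, fids, other_fids] for the .fid folders directly under path."""
    arrays = set(glob.iglob(path + array_glob))
    # fid_glob also matches the array folders, so they appear in this set too.
    fids = set(glob.iglob(path + fid_glob))

    # Reference, calibration, and similar runs are set aside as "other".
    # The match is made against the full path, not just the folder name.
    other_fids = {f for f in fids if any(sf in f for sf in special_files)}
    fids = fids - other_fids

    return [list(x) for x in [arrays, fids, other_fids]]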


def trim_folder(folders):
    # Express each path relative to data_folder so it starts at '27Al', '23Na', or 'VT'.
    return [os.path.relpath(path, data_folder) for path in folders]


def process_group(path_list):
    array, fid, other = list(), list(), list()
    for path in path_list:
        a, f, o = nmr_glob(path)
        if a: array.append(a)
        if f: fid.append(f)        
        if o: other.append(o)
        
    return array, fid, other
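
Because `*.fid` also matches any name that matches `*arrays*.fid`, the two patterns overlap. The sketch below shows one way to sort names into three disjoint categories. It is an illustration, not the notebook's method: `categorize` is a hypothetical helper, it tests the folder name only (not the full path), and the example paths in the usage note are made up.

```python
import fnmatch
import os

SPECIAL = ('reference', 'REF', 'calibration', 'pwX90', 'static',
           'spin-up', 'without-liquid')


def categorize(paths):
    """Split paths into disjoint (arrays, fids, other) lists by folder name."""
    arrays, fids, other = [], [], []
    for path in sorted(paths):
        name = os.path.basename(path)
        if not fnmatch.fnmatch(name, '*.fid'):
            continue  # not an experiment folder
        if any(tag in name for tag in SPECIAL):
            other.append(path)   # special runs take priority
        elif fnmatch.fnmatch(name, '*arrays*.fid'):
            arrays.append(path)
        else:
            fids.append(path)
    return arrays, fids, other
```

For example, `categorize(['x/27Al-arrays-1.fid', 'x/27Al-single.fid', 'x/pwX90-arrays.fid'])` places one path in each of the three lists.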

## Running the Globs

---

In [13]:
al_array, al_fid, al_other = process_group(Al_sub_paths)
na_array, na_fid, na_other = process_group(sodium_paths)
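
The first questions in the overview ask how many individual fids and array experiments there are. Since `process_group` returns one list per folder, the nested lists need to be flattened before counting. A small sketch follows; `count_groups` is a hypothetical helper.

```python
from itertools import chain


def count_groups(**groups):
    """Map each group name to the number of paths in its nested list."""
    return {name: sum(1 for _ in chain.from_iterable(nested))
            for name, nested in groups.items()}


# Possible use with the results above:
# count_groups(al_array=al_array, al_fid=al_fid, al_other=al_other,
#              na_array=na_array, na_fid=na_fid, na_other=na_other)
```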

# Processing the .fid Files

---

### Convert to NMRPipe files

In [14]:
al_data_dict = {f: read_varian_as_nmrpipe(f) for f in 
                itertools.chain.from_iterable(al_array)}
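
`read_varian_as_nmrpipe` comes from `trentnmr`, and its implementation is not shown here. For orientation only, a conversion of this kind can be sketched with nmrglue's `converter` class. That the real function works this way is an assumption, and `varian_to_pipe_sketch` is a hypothetical name.

```python
def varian_to_pipe_sketch(fid_dir):
    """Read a Varian .fid folder and return (dic, data) in NMRPipe format."""
    # Imported here so the sketch can be defined without nmrglue installed.
    import nmrglue as ng

    # Read the Varian fid and procpar files found in fid_dir.
    vdic, vdata = ng.varian.read(fid_dir)

    # Convert through nmrglue's universal dictionary to NMRPipe.
    converter = ng.convert.converter()
    converter.from_varian(vdic, vdata)
    return converter.to_pipe()
```

Returning a `(dic, data)` pair matches how the values of `al_data_dict` are indexed later, as `value[0]` and `value[1]`.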

### Auto-Process Data

In [19]:
def mp_proc_fid(dic, data):
    return process_fid(dic, data)

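def pool_nmr_proc(processes=mp.cpu_count()):
    # The with-block shuts the worker pool down once all results are collected.
    with mp.Pool(processes=processes) as pool:
        pending = [pool.apply_async(mp_proc_fid, args=(value[0], value[1]))
                   for value in al_data_dict.values()]
        # One result per file, in the iteration order of al_data_dict.
        return [p.get() for p in pending]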
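
The pool returns results as a plain list, so the file names are lost unless they are re-attached. One option is to have the pooled function return a dict keyed by file name. The sketch below uses `Pool.starmap`, which preserves input order. `pool_map_dict` and the stand-in worker `square_sum` are hypothetical; in this notebook the worker would be `process_fid`.

```python
import multiprocessing as mp


def square_sum(scale, values):
    """Stand-in worker (hypothetical); replaces process_fid for the demo."""
    return scale * sum(v * v for v in values)


def pool_map_dict(worker, data_dict, processes=None):
    """Run worker(*value) for every item and return {key: result}."""
    keys = list(data_dict)
    with mp.Pool(processes=processes) as pool:
        # starmap preserves input order, so results line up with keys.
        results = pool.starmap(worker, [data_dict[k] for k in keys])
    return dict(zip(keys, results))


if __name__ == '__main__':
    demo = {'a.fid': (2, [1, 2]), 'b.fid': (1, [3])}
    print(pool_map_dict(square_sum, demo, processes=2))
```

The worker must be defined at module level so that it can be pickled and sent to the worker processes.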

In [20]:
processed_dict = pool_nmr_proc()

Optimization terminated successfully.
Optimization terminated successfully.
         Current function value: 34604478661226659840000.000000
         Iterations: 12
         Function evaluations: 34
         Current function value: 10856146804102135808000.000000
         Iterations: 5
         Function evaluations: 15
Optimization terminated successfully.
Optimization terminated successfully.
         Current function value: 34432858090270359552000.000000
         Current function value: 10856146804102135808000.000000
         Iterations: 12
         Iterations: 5
         Function evaluations: 15
         Function evaluations: 34
Optimization terminated successfully.
         Current function value: 31090042042580992000.000000
         Iterations: 13
         Function evaluations: 40
Optimization terminated successfully.
         Current function value: 9643056278011904000.000000
Optimization terminated successfully.
         Iterations: 62
         Current function value: 258403759298

In [37]:
peak_dict = {f: find_nmr_peaks(value) for f, value, uc in processed_dict}

TypeError: unhashable type: 'dict'

In [35]:
processed_dict[0][2]

<nmrglue.fileio.fileiobase.unit_conversion at 0x7f5821de5940>

In [21]:
peak_dict = {f: find_nmr_peaks(value[1]) for f, value in processed_dict.items()}

AttributeError: 'list' object has no attribute 'items'
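
Both attempts fail for the same reason: `pool_nmr_proc` returns a list with one result per file, not a dict. The `processed_dict[0][2]` output shows that the third element of each result is the unit-conversion object. The `TypeError` suggests the first element is a dict. Assuming each result is a `(dic, data, uc)` tuple, and that results come back in the iteration order of `al_data_dict`, the file names can be re-attached as sketched below. `build_peak_dict` and `find_peaks_stub` are hypothetical; the stub stands in for `find_nmr_peaks`, whose signature is not shown here.

```python
def build_peak_dict(file_names, results, peak_finder):
    """Return {file_name: peak_finder(data)} for (dic, data, uc) results."""
    file_names = list(file_names)
    if len(file_names) != len(results):
        raise ValueError('one result is expected per file name')
    return {name: peak_finder(data)
            for name, (dic, data, uc) in zip(file_names, results)}


def find_peaks_stub(data):
    """Placeholder peak finder: report the largest value."""
    return max(data)


example_results = [({'k': 1}, [1, 5, 2], None), ({'k': 2}, [7, 3], None)]
peaks = build_peak_dict(['a.fid', 'b.fid'], example_results, find_peaks_stub)
```

In the notebook the equivalent call would be `build_peak_dict(al_data_dict, processed_dict, find_nmr_peaks)`, provided `find_nmr_peaks` accepts the data array alone.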