## Create NDDataset objects

In [1]:
from spectrochempy import *


        SpectroChemPy's API
        Version   : 0.1a2.7
        Copyright : 2014-2017, LCS - Laboratory for Catalysis and Spectrochempy
            


Multidimensional arrays are defined in SpectroChemPy using the ``NDDataset`` object.

``NDDataset`` objects mostly behave like numpy's `numpy.ndarray`.

However, unlike a raw numpy ndarray, an ``NDDataset`` carries optional properties such
as `uncertainty`, `mask`, `units`, `axes`, and axes `labels`, which make it
(hopefully) more appropriate for handling spectroscopic information, one of
the major objectives of the SpectroChemPy package.

Additional metadata can also be added to the instances of this class through the
`meta` property.

### Create an NDDataset from scratch

In the following example, a minimal 1D dataset is created from a simple list, to which we can add some metadata:

In [2]:
da = NDDataset([1,2,3])
da.title = 'intensity'   
da.description = 'Some experimental measurements'
da.units = 'dimensionless'
da

0,1
Id/Name,a8a1b694
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-07-22 12:58:13.962081
,
Last Modified,2017-07-22 12:58:13.962637
,
Description,Some experimental measurements
,

0,1
Title,intensity
,
Size,3
,
Units,dimensionless
,
Values,[ 1 2 3]
,


Apart from a few additional metadata such as `author` and `created`, there is not much
difference with respect to a conventional `numpy.ndarray`. For example, one
can apply numpy ufuncs directly to an NDDataset or perform basic arithmetic
operations with these objects:

In [3]:
da2 = np.sqrt(da**3)
da2

0,1
Id/Name,a8a5536c
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-07-22 12:58:13.988073
,
Last Modified,2017-07-22 12:58:13.988194
,
Description,Some experimental measurements
,

0,1
Title,intensity
,
Size,3
,
Units,dimensionless
,
Values,[ 1 2.83 5.2]
,


In [4]:
da3 = da + da/2.
da3

0,1
Id/Name,a8a7f680
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-07-22 12:58:14.005359
,
Last Modified,2017-07-22 12:58:14.005477
,
Description,Some experimental measurements
,

0,1
Title,intensity
,
Size,3
,
Units,dimensionless
,
Values,[ 1.5 3 4.5]
,


### Create an NDDataset: full example

There are many ways to create ``NDDataset`` objects.

Above we have created an NDDataset from a simple list, but it is generally more
convenient to create one from a `numpy.ndarray`.

Below is an example of a 3D dataset created from a ``numpy.ndarray``, to which axes are then added.

Let's first create the three one-dimensional axes, for which we can define labels, units, and masks.

In [5]:
axe0 = Axis(coords = np.linspace(200., 300., 3),
            labels = ['cold', 'normal', 'hot'],
            mask = None,
            units = "K",
            title = 'temperature')

axe1 = Axis(coords = np.linspace(0., 60., 100),
            labels = None,
            mask = None,
            units = "minutes",
            title = 'time-on-stream')

axe2 = Axis(coords = np.linspace(4000., 1000., 100),
            labels = None,
            mask = None,
            units = "cm^-1",
            title = 'wavenumber')

Here, for instance, is the information displayed for `axe1`:

In [6]:
axe1

0,1
Title,Time-on-stream
,
Coordinates,"[ 0 0.606 ..., 59.4 60]"
,
Units,min
,


Now we create some 3D data (a ``numpy.ndarray``):

In [7]:
nd_data=np.array([np.array([np.sin(axe2.data*2.*np.pi/4000.)*np.exp(-y/60.) for y in axe1.data])*float(t) 
         for t in axe0.data])**2
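The nested list comprehension above can also be written with numpy broadcasting. The sketch below uses plain numpy arrays with the same coordinates as the three axes (the names `temperature`, `time` and `wavenumber` are introduced here for illustration only) and checks that both forms give the same array.

```python
import numpy as np

# Same coordinates as axe0, axe1 and axe2 above, as plain numpy arrays
temperature = np.linspace(200., 300., 3)
time = np.linspace(0., 60., 100)
wavenumber = np.linspace(4000., 1000., 100)

# Nested-comprehension form, as in the cell above
nd_loop = np.array([np.array([np.sin(wavenumber * 2. * np.pi / 4000.) * np.exp(-y / 60.)
                              for y in time]) * float(t)
                    for t in temperature]) ** 2

# Broadcasting form: give each coordinate its own dimension, then multiply
signal = np.sin(wavenumber * 2. * np.pi / 4000.)[np.newaxis, np.newaxis, :]
decay = np.exp(-time / 60.)[np.newaxis, :, np.newaxis]
scale = temperature[:, np.newaxis, np.newaxis]
nd_broadcast = (signal * decay * scale) ** 2

print(nd_broadcast.shape)                  # (3, 100, 100)
print(np.allclose(nd_loop, nd_broadcast))  # True
```

The broadcasting form avoids the Python-level loops and makes the role of each coordinate explicit.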

The dataset is now created from these data and axes. All the required information is passed as parameters to the
NDDataset constructor.

In [8]:
mydataset = NDDataset(nd_data,
               axes = [axe0, axe1, axe2],
               title='Absorbance',
               units='absorbance'
              )

mydataset.description = """Dataset example created for this tutorial. 
It's a 3-D dataset (with dimensionless intensity)"""

mydataset.author = 'Tintin and Milou'

We can get some information about this object:

In [9]:
mydataset

0,1
Id/Name,a8afd24c
,
Author,Tintin and Milou
,
Created,2017-07-22 12:58:14.050973
,
Last Modified,2017-07-22 12:58:14.051814
,
Description,Dataset example created for this tutorial. It's a 3-D dataset (with  dimensionless intensity)
,

0,1
Title,Absorbance
,
Size,3 x 100 x 100
,
Units,dimensionless
,
Values,"[[[ 2.4e-27 90.6 ..., 3.99e+04 4e+04]  [2.35e-27 88.8 ..., 3.91e+04 3.92e+04]  ..., [3.31e-28 12.5 ..., 5.51e+03 5.52e+03]  [3.25e-28 12.3 ..., 5.4e+03 5.41e+03]]  [[3.75e-27 142 ..., 6.24e+04 6.25e+04]  [3.67e-27 139 ..., 6.11e+04 6.13e+04]  ..., [5.18e-28 19.5 ..., 8.61e+03 8.63e+03]  [5.07e-28 19.2 ..., 8.44e+03 8.46e+03]]  [[ 5.4e-27 204 ..., 8.98e+04 9e+04]  [5.29e-27 200 ..., 8.8e+04 8.82e+04]  ..., [7.46e-28 28.1 ..., 1.24e+04 1.24e+04]  [7.31e-28 27.6 ..., 1.22e+04 1.22e+04]]]"
,

0,1
Title,Temperature
,
Coordinates,[ 200 250 300]
,
Units,K
,
Labels,['cold' 'normal' 'hot']
,

0,1
Title,Time-on-stream
,
Coordinates,"[ 0 0.606 ..., 59.4 60]"
,
Units,min
,

0,1
Title,Wavenumber
,
Coordinates,"[ 4e+03 3.97e+03 ..., 1.03e+03 1e+03]"
,
Units,cm-1
,


### Copying existing NDDataset

Copying an existing dataset is as simple as:

In [10]:
da_copy = da.copy()

or alternatively:

In [11]:
da_copy = da[:]
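For comparison, here is how the same two idioms behave on a plain `numpy.ndarray`. In numpy, `a[:]` returns a view that shares memory with `a`, whereas `a.copy()` returns independent data. Whether `da[:]` on an NDDataset shares memory with `da` is not verified here, so `copy()` is the safer choice when the copy must be independent.

```python
import numpy as np

a = np.array([1., 2., 3.])
a_copy = a.copy()   # independent data
a_view = a[:]       # numpy: a view on the same memory

a[0] = 10.          # modify the original array
print(a_copy[0])    # 1.0
print(a_view[0])    # 10.0
print(np.shares_memory(a, a_view))  # True
```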

Finally, it is also possible to initialize a dataset using an existing one:

In [12]:
dc = NDDataset(da3, title='Absorbance')
dc

0,1
Id/Name,a8a7f680
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-07-22 12:58:14.084925
,
Last Modified,2017-07-22 12:58:14.085245
,
Description,
,

0,1
Title,Absorbance
,
Size,3
,
Units,dimensionless
,
Values,[ 1.5 3 4.5]
,


#### See also

Any numpy creation function can be used to set up the initial dataset array:
       [numpy array creation routines](https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html#routines-array-creation)



### Importing from external dataset

An NDDataset can be created by importing external data:

In [13]:
source = NDDataset.read_omnic(os.path.join(data_dir, 'irdata', 'NH4Y-activation.SPG'))
source

0,1
Id/Name,NH4Y-activation.SPG
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-07-22 12:58:14.245630
,
Last Modified,2017-07-22 12:58:14.245630
,
Description,"Dataset from spg file : NH4Y-activation.SPG History of the 1st  spectrum: vz0521.spa, Thu Jul 07 06:10:41 2016  (GMT+02:00)"
,

0,1
Title,Absorbance
,
Size,55 x 5549
,
Units,dimensionless
,
Values,"[[ 2.06 2.06 ..., 2.01 2.01]  [ 2.03 2.04 ..., 1.91 1.91]  ..., [ 1.79 1.79 ..., 1.2 1.2]  [ 1.82 1.82 ..., 1.24 1.24]]"
,

0,1
Title,Acquisition timestamp (gmt)
,
Coordinates,"[1.47e+09 1.47e+09 ..., 1.47e+09 1.47e+09]"
,
Units,s
,
Labels,"[[datetime.datetime(2016, 7, 6, 19, 3, 14, tzinfo=datetime.timezone.utc)  datetime.datetime(2016, 7, 6, 19, 13, 14, tzinfo=datetime.timezone.utc)  ...,  datetime.datetime(2016, 7, 7, 4, 3, 17, tzinfo=datetime.timezone.utc)  datetime.datetime(2016, 7, 7, 4, 13, 17, tzinfo=datetime.timezone.utc)]  ['vz0466.spa, Wed Jul 06 21:00:38 2016 (GMT+02:00)'  'vz0467.spa, Wed Jul 06 21:10:38 2016 (GMT+02:00)' ...,  'vz0520.spa, Thu Jul 07 06:00:41 2016 (GMT+02:00)'  'vz0521.spa, Thu Jul 07 06:10:41 2016 (GMT+02:00)']]"
,

0,1
Title,Wavenumbers
,
Coordinates,"[ 6e+03 6e+03 ..., 651 650]"
,
Units,cm-1
,


## Slicing a NDDataset

An NDDataset can be sliced like a conventional numpy array, and also by coordinate values or labels, *e.g.*:

1. by index, using a slice such as [3], [0:10], [:, 3:4], [..., 5:10], ...

2. by value, using a slice such as [3000.0:3500.0], [..., 300.0], ...

3. by label, using a slice such as ['monday':'friday'], ...

In [14]:
new = mydataset[..., 0]
new

0,1
Id/Name,*a8afd24c
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-07-22 12:58:14.262061
,
Last Modified,2017-07-22 12:58:14.263098
,
Description,Dataset example created for this tutorial. It's a 3-D dataset (with  dimensionless intensity)
,

0,1
Title,Absorbance
,
Size,3 x 100 x 1
,
Units,dimensionless
,
Values,"[[[ 2.4e-27]  [2.35e-27]  ..., [3.31e-28]  [3.25e-28]]  [[3.75e-27]  [3.67e-27]  ..., [5.18e-28]  [5.07e-28]]  [[ 5.4e-27]  [5.29e-27]  ..., [7.46e-28]  [7.31e-28]]]"
,

0,1
Title,Temperature
,
Coordinates,[ 200 250 300]
,
Units,K
,
Labels,['cold' 'normal' 'hot']
,

0,1
Title,Time-on-stream
,
Coordinates,"[ 0 0.606 ..., 59.4 60]"
,
Units,min
,

0,1
Title,Wavenumber
,
Coordinates,[ 4e+03]
,
Units,cm-1
,


or using the axes labels:

In [15]:
new = mydataset['hot']

Single-element dimensions are kept but can also be squeezed easily:

In [16]:
new = new.squeeze()
new

0,1
Id/Name,a8d308f4
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-07-22 12:58:14.287630
,
Last Modified,2017-07-22 12:58:14.288180
,
Description,Dataset example created for this tutorial. It's a 3-D dataset (with  dimensionless intensity)
,

0,1
Title,Absorbance
,
Size,100 x 100
,
Units,dimensionless
,
Values,"[[ 5.4e-27 204 ..., 8.98e+04 9e+04]  [5.29e-27 200 ..., 8.8e+04 8.82e+04]  ..., [7.46e-28 28.1 ..., 1.24e+04 1.24e+04]  [7.31e-28 27.6 ..., 1.22e+04 1.22e+04]]"
,

0,1
Title,Time-on-stream
,
Coordinates,"[ 0 0.606 ..., 59.4 60]"
,
Units,min
,

0,1
Title,Wavenumber
,
Coordinates,"[ 4e+03 3.97e+03 ..., 1.03e+03 1e+03]"
,
Units,cm-1
,


Be sure to use the correct type for slicing.

Floats are used for slicing by value:

In [17]:
correct = mydataset[...,2000.]

In [18]:
outside_limits = mydataset[...,10000.]

When the value lies outside the axis limits, the index of the closest limit is returned.


<div class='alert alert-info'>**NOTE:**

If one uses an integer value (2000), then the slicing is made **by index, not by value**. In the following particular case, an `IndexError` is raised because index 2000 does not exist (the size along axis -1 is only 100, so the index varies between 0 and 99).

</div>

When slicing by index, an error is generated if the index is out of limits:

In [19]:
try:
    fail = mydataset[...,2000]
except IndexError as e:
    log.error(e)

 ERROR | Empty array of shape (3, 100, 0) resulted from slicing.
Check the indexes and make sure to use floats for location slicing


One can mix slicing methods for different dimensions:

In [20]:
new = mydataset['normal':'hot', 0, 4000.0:2000.]
new

0,1
Id/Name,*a8afd24c
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-07-22 12:58:14.349137
,
Last Modified,2017-07-22 12:58:14.349827
,
Description,Dataset example created for this tutorial. It's a 3-D dataset (with  dimensionless intensity)
,

0,1
Title,Absorbance
,
Size,2 x 1 x 67
,
Units,dimensionless
,
Values,"[[[3.75e-27 142 ..., 142 9.37e-28]]  [[ 5.4e-27 204 ..., 204 1.35e-27]]]"
,

0,1
Title,Temperature
,
Coordinates,[ 250 300]
,
Units,K
,
Labels,['normal' 'hot']
,

0,1
Title,Time-on-stream
,
Coordinates,[ 0]
,
Units,min
,

0,1
Title,Wavenumber
,
Coordinates,"[ 4e+03 3.97e+03 ..., 2.03e+03 2e+03]"
,
Units,cm-1
,
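The way a mixed key can be resolved into plain integer indexes is sketched below with numpy arrays. This is an illustration of the idea, not SpectroChemPy's implementation: `to_index` and `to_slice` are hypothetical helpers, only non-negative indexes are handled, and the rule that label and value ranges include their stop is an assumption inferred from the size `2 x 1 x 67` shown above.

```python
import numpy as np

def to_index(key, coords, labels=None):
    """Map one key to an integer index (hypothetical helper)."""
    if isinstance(key, str):      # label: position of the label
        return list(labels).index(key)
    if isinstance(key, float):    # value: index of the closest coordinate
        return int(np.abs(np.asarray(coords) - key).argmin())
    return int(key)               # integer: already an index

def to_slice(key, coords, labels=None):
    """Turn a key or a start:stop range into a slice of integer indexes."""
    if not isinstance(key, slice):
        i = to_index(key, coords, labels)
        return slice(i, i + 1)    # keeps the single-element dimension
    start = 0 if key.start is None else to_index(key.start, coords, labels)
    if key.stop is None:
        return slice(start, len(coords))
    stop = to_index(key.stop, coords, labels)
    # assumption: label and value ranges include their stop, index ranges do not
    inclusive = isinstance(key.stop, (str, float))
    return slice(start, stop + 1 if inclusive else stop)

# Same coordinates as the three axes of mydataset
temperature = np.linspace(200., 300., 3)
labels = ['cold', 'normal', 'hot']
time = np.linspace(0., 60., 100)
wavenumber = np.linspace(4000., 1000., 100)
data = np.zeros((3, 100, 100))

index = (to_slice(slice('normal', 'hot'), temperature, labels),
         to_slice(0, time),
         to_slice(slice(4000.0, 2000.), wavenumber))
print(data[index].shape)              # (2, 1, 67)
print(to_index(10000., wavenumber))   # 0, the closest limit
```

With these helpers, the integer key `2000` on the last axis becomes `slice(2000, 2001)`, which selects nothing from an axis of size 100. That is consistent with the empty `(3, 100, 0)` shape reported in the error message above.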



## Loading of experimental data


### NMR Data

Now, let's load an NMR dataset (in the Bruker format).

The built-in **data_dir** variable contains the path to our *test* data:

In [21]:
# Check that this directory exists and display its content:
import os
if os.path.exists(data_dir):
    print(list_data_dir)

testdata
|__irdata
   |__NH4Y-activation.SPG
|__mydataset.scp
|__nmrdata
   |__bruker
      |__tests
         |__nmr
            |__bruker_1d
               |__1
            |__bruker_2d
               |__1
               |__2
            |__bruker_3d
               |__1
               |__2
   |__simpson
      |__simpson_1d
         |__rr.in
      |__simpson_2d
         |__2d.in



In [22]:
path = os.path.join(data_dir, 'nmrdata','bruker', 'tests', 'nmr','bruker_1d')

# load the data in a new dataset
ndd = NDDataset()
ndd.read_bruker_nmr(path, expno=1, remove_digital_filter=True)
ndd

0,1
Id/Name,a8e208de
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-07-22 12:58:14.369993
,
Last Modified,2017-07-22 12:58:14.385250
,
Description,
,

0,1
Title,
,
Size,2048 (complex)
,
Units,unitless
,
Values,"[ -0.419 -0.216 ..., 0 -0]"
,

0,1
Title,Acquisition time
,
Coordinates,"[ 0 100 ..., 2.05e+05 2.05e+05]"
,
Units,us
,


In [23]:
# view it...
ndd.plot() 


In [24]:
path = os.path.join(data_dir, 'nmrdata','bruker', 'tests', 'nmr','bruker_2d')

# load the data directly (no need to create the dataset first)
ndd2 = NDDataset.read_bruker_nmr(path, expno=1, remove_digital_filter=True)

# view it...
ndd2.x.to('s')
ndd2.y.to('ms')
fig2 = ndd2.plot() 
fig2


### IR data

In [25]:
source = NDDataset.read_omnic(os.path.join(data_dir, 'irdata', 'NH4Y-activation.SPG'))
source

0,1
Id/Name,NH4Y-activation.SPG
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-07-22 12:58:15.522995
,
Last Modified,2017-07-22 12:58:15.522995
,
Description,"Dataset from spg file : NH4Y-activation.SPG History of the 1st  spectrum: vz0521.spa, Thu Jul 07 06:10:41 2016  (GMT+02:00)"
,

0,1
Title,Absorbance
,
Size,55 x 5549
,
Units,dimensionless
,
Values,"[[ 2.06 2.06 ..., 2.01 2.01]  [ 2.03 2.04 ..., 1.91 1.91]  ..., [ 1.79 1.79 ..., 1.2 1.2]  [ 1.82 1.82 ..., 1.24 1.24]]"
,

0,1
Title,Acquisition timestamp (gmt)
,
Coordinates,"[1.47e+09 1.47e+09 ..., 1.47e+09 1.47e+09]"
,
Units,s
,
Labels,"[[datetime.datetime(2016, 7, 6, 19, 3, 14, tzinfo=datetime.timezone.utc)  datetime.datetime(2016, 7, 6, 19, 13, 14, tzinfo=datetime.timezone.utc)  ...,  datetime.datetime(2016, 7, 7, 4, 3, 17, tzinfo=datetime.timezone.utc)  datetime.datetime(2016, 7, 7, 4, 13, 17, tzinfo=datetime.timezone.utc)]  ['vz0466.spa, Wed Jul 06 21:00:38 2016 (GMT+02:00)'  'vz0467.spa, Wed Jul 06 21:10:38 2016 (GMT+02:00)' ...,  'vz0520.spa, Thu Jul 07 06:00:41 2016 (GMT+02:00)'  'vz0521.spa, Thu Jul 07 06:10:41 2016 (GMT+02:00)']]"
,

0,1
Title,Wavenumbers
,
Coordinates,"[ 6e+03 6e+03 ..., 651 650]"
,
Units,cm-1
,


In [26]:
source = read_omnic(NDDataset(), os.path.join(data_dir, 'irdata', 'NH4Y-activation.SPG'))

In [27]:
source.plot(kind='stack')



## Transposition

A dataset can be transposed:

In [28]:
newT = new.T
newT

0,1
Id/Name,a9e3676e
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-07-22 12:58:16.072629
,
Last Modified,2017-07-22 12:58:16.072683
,
Description,Dataset example created for this tutorial. It's a 3-D dataset (with  dimensionless intensity)
,

0,1
Title,Absorbance
,
Size,67 x 1 x 2
,
Units,dimensionless
,
Values,"[[[3.75e-27 5.4e-27]]  [[ 142 204]]  ..., [[ 142 204]]  [[9.37e-28 1.35e-27]]]"
,

0,1
Title,Wavenumber
,
Coordinates,"[ 4e+03 3.97e+03 ..., 2.03e+03 2e+03]"
,
Units,cm-1
,

0,1
Title,Time-on-stream
,
Coordinates,[ 0]
,
Units,min
,

0,1
Title,Temperature
,
Coordinates,[ 250 300]
,
Units,K
,
Labels,['normal' 'hot']
,
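Transposition reverses the order of the dimensions, and the axes follow the data, as the output above shows (wavenumber first, temperature last). A minimal sketch with a plain numpy array, where a list of axis titles stands in for the axes:

```python
import numpy as np

data = np.zeros((2, 1, 67))
axis_titles = ['temperature', 'time-on-stream', 'wavenumber']

data_t = data.T                     # reverses the order of the dimensions
axis_titles_t = axis_titles[::-1]   # the axes must be reversed accordingly

print(data_t.shape)     # (67, 1, 2)
print(axis_titles_t)    # ['wavenumber', 'time-on-stream', 'temperature']
```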


## Units


SpectroChemPy can do calculations with units. It uses [pint](https://pint.readthedocs.io) to define and perform operations on data with units.

### Create quantities

* To create a quantity, use, for instance, one of the following expressions:

In [29]:
Quantity('10.0 cm^-1')

In [30]:
Quantity(1.0, 'cm^-1/hour')

In [31]:
Quantity(10.0, ur.cm/ur.km) 

or, perhaps more simply,

In [32]:
10.0 * ur.meter/ur.gram/ur.volt

`ur` stands for **unit registry**, which handles many types of units
(and the conversions between them).

### Do arithmetic with units

In [33]:
a = 900 * ur.km
b = 4.5 * ur.hours
a/b

Such calculations can also be done from a string expression:

In [34]:
Quantity("900 km / (8 hours)")

### Convert between units

In [35]:
c = a/b
c.to('cm/s')

We can make the conversion *in place* using *ito* instead of *to*:

In [36]:
c.ito('m/s')
c
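As a cross-check, the same conversions can be done by hand with plain Python floats. This sketch does not use pint; the conversion factors are written out explicitly.

```python
# 900 km travelled in 4.5 hours, using explicit conversion factors
distance_km = 900.0
duration_h = 4.5

speed_km_per_h = distance_km / duration_h            # km/h
speed_m_per_s = speed_km_per_h * 1000.0 / 3600.0     # km -> m, h -> s
speed_cm_per_s = speed_m_per_s * 100.0               # m -> cm

print(speed_km_per_h)             # 200.0
print(round(speed_m_per_s, 2))    # 55.56
print(round(speed_cm_per_s, 1))   # 5555.6
```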

### Do math operations with consistent units

In [37]:
x = 10 * ur.radians
np.sin(x)

The consistency of the units is checked! For instance, this works:

In [38]:
x = 10 * ur.meters
np.sqrt(x)

but this is wrong, since `cos` cannot take a length:

In [39]:
x = 10 * ur.meters
try:
    np.cos(x)
except DimensionalityError as e:
    log.error(e)

 ERROR | Cannot convert from 'meter' to 'radian'
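The kind of check behind this error can be illustrated with a toy model in which a quantity carries the exponents of its base dimensions. This is only a sketch of the idea, not how pint or SpectroChemPy implement it; `ToyQuantity`, `toy_sqrt` and `toy_cos` are hypothetical names.

```python
import math

class ToyQuantity:
    """A value plus the exponents of its base dimensions, e.g. {'length': 1}."""
    def __init__(self, value, dims=None):
        self.value = value
        # drop zero exponents so that a dimensionless quantity has empty dims
        self.dims = {k: v for k, v in (dims or {}).items() if v != 0}

def toy_sqrt(q):
    # sqrt is allowed for any quantity: every exponent is halved
    return ToyQuantity(math.sqrt(q.value), {k: v / 2 for k, v in q.dims.items()})

def toy_cos(q):
    # cos only makes sense for a dimensionless argument (an angle in radians)
    if q.dims:
        raise TypeError(f"cannot take cos of a quantity with dimensions {q.dims}")
    return ToyQuantity(math.cos(q.value))

x = ToyQuantity(10.0, {'length': 1})
print(toy_sqrt(x).dims)       # {'length': 0.5}

try:
    toy_cos(x)
except TypeError as e:
    print('ERROR |', e)
```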


Units can be set for NDDataset data and/or axes:

In [40]:
ds = NDDataset([1., 2., 3.], units='g/cm^3', title='concentration')
ds

0,1
Id/Name,a9f36d6c
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-07-22 12:58:16.176275
,
Last Modified,2017-07-22 12:58:16.176887
,
Description,
,

0,1
Title,concentration
,
Size,3
,
Units,g.cm-3
,
Values,[ 1 2 3]
,


In [41]:
ds.to('kg/m^3')

0,1
Id/Name,a9f36d6c
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-07-22 12:58:16.176275
,
Last Modified,2017-07-22 12:58:16.186854
,
Description,
,

0,1
Title,concentration
,
Size,3
,
Units,kg.m-3
,
Values,[ 1e+03 2e+03 3e+03]
,


## Uncertainties

SpectroChemPy can do calculations with uncertainties (and units).

A quantity with an `uncertainty` is called a **Measurement**.

Use one of the following expressions to create such a `Measurement`:

In [42]:
#Measurement(10.0, .2, 'cm')    TO FINISH (format doesn't work)

In [43]:
# Quantity(10.0, 'cm').plus_minus(.2)   TO FINISH
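Since the constructors above are not finalized, the idea of a measurement can be illustrated independently of the SpectroChemPy API. The sketch below assumes uncorrelated uncertainties and first-order propagation; `ToyMeasurement` is a hypothetical class used for illustration only.

```python
import math
from dataclasses import dataclass

@dataclass
class ToyMeasurement:
    value: float
    error: float   # absolute standard uncertainty

    def __add__(self, other):
        # sum: absolute uncertainties add in quadrature
        return ToyMeasurement(self.value + other.value,
                              math.hypot(self.error, other.error))

    def __mul__(self, other):
        # product: relative uncertainties add in quadrature
        value = self.value * other.value
        rel = math.hypot(self.error / self.value, other.error / other.value)
        return ToyMeasurement(value, abs(value) * rel)

length = ToyMeasurement(10.0, 0.2)   # 10.0 +/- 0.2 cm
width = ToyMeasurement(5.0, 0.1)     # 5.0 +/- 0.1 cm

total = length + width
area = length * width
print(round(total.value, 3), round(total.error, 3))   # 15.0 0.224
print(round(area.value, 3), round(area.error, 3))     # 50.0 1.414
```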

## Numpy universal functions (ufunc's)

A numpy universal function (or `numpy.ufunc` for short) is a function that
operates on a `numpy.ndarray` in an element-by-element fashion. It is
vectorized and therefore rather fast.

As SpectroChemPy's NDDataset imitates the behaviour of numpy objects, many numpy
ufuncs can be applied directly.

For example, if you need all the elements of an NDDataset to be replaced by
their square roots, you can use the `numpy.sqrt` function:

In [44]:
da = NDDataset([1., 2., 3.])
da_sqrt = np.sqrt(da)
da_sqrt

0,1
Id/Name,a9f8b4ac
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-07-22 12:58:16.212233
,
Last Modified,2017-07-22 12:58:16.212346
,
Description,
,

0,1
Title,
,
Size,3
,
Units,unitless
,
Values,[ 1 1.41 1.73]
,


### Ufuncs on an NDDataset with units

When an NDDataset has units, some restrictions apply to the use of ufuncs.

Some functions accept only dimensionless quantities. This is the
case, for example, for the exponential and logarithmic functions `exp` and `log`.

In [45]:
np.log10(da)

0,1
Id/Name,a9fac242
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-07-22 12:58:16.225720
,
Last Modified,2017-07-22 12:58:16.225901
,
Description,
,

0,1
Title,
,
Size,3
,
Units,unitless
,
Values,[ 0 0.301 0.477]
,


In [46]:
da.units = ur.cm

try:
    np.log10(da)
except DimensionalityError as e:
    log.error(e)

 ERROR | Cannot convert from 'centimeter' ([length]) to 'dimensionless' (dimensionless)


## Complex or hypercomplex NDDatasets


NDDataset objects with complex data are handled differently than in
`numpy.ndarray`.

Instead, complex data are stored by interlacing the real and imaginary parts.
This allows the definition of data that can be complex along several axes and, *e.g.*,
allows 2D-hypercomplex arrays that can be transposed (useful for NMR data).

In [47]:
da = NDDataset([[1.+2.j, 2.+0j], [1.3+2.j, 2.+0.5j],
                [1.+4.2j, 2.+3j], [5.+4.2j, 2.+3j]])
da

0,1
Id/Name,a9fe52fe
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-07-22 12:58:16.247628
,
Last Modified,2017-07-22 12:58:16.247671
,
Description,
,

0,1
Title,
,
Size,4 x 2(complex)
,
Units,unitless
,
Values,[[ 1 2 2 0]  [ 1.3 2 2 0.5]  [ 1 4.2 2 3]  [ 5 4.2 2 3]]
,
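The interlaced layout shown in the output above can be reproduced with plain numpy. This is a sketch of the storage idea only; how NDDataset does it internally is not shown in this tutorial, and the variable names are chosen for illustration.

```python
import numpy as np

z = np.array([[1.+2.j, 2.+0j], [1.3+2.j, 2.+0.5j],
              [1.+4.2j, 2.+3j], [5.+4.2j, 2.+3j]])

# Interlace along the last axis: real parts in even columns, imaginary in odd ones
interlaced = np.empty((z.shape[0], 2 * z.shape[1]))
interlaced[:, 0::2] = z.real
interlaced[:, 1::2] = z.imag

print(interlaced.shape)          # (4, 4)
print(interlaced[1].tolist())    # [1.3, 2.0, 2.0, 0.5]

# The complex values can be recovered from the interlaced storage
recovered = interlaced[:, 0::2] + 1j * interlaced[:, 1::2]
print(np.array_equal(recovered, z))   # True
```

Because the stored array is real, the same interlacing can in principle be applied along any other axis as well, which a single complex dtype cannot express.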


If the dataset is also complex in the first dimension (columns), then we
should have (note the shape description!):