## Create NDDataset objects

In [1]:
from spectrochempy.api import *


        SpectroChemPy's API
        Version   : 0.1a2.post81
        Copyright : 2014-2017 - LCS (Laboratory for Catalysis and Spectrochempy)
            


Multidimensional arrays are defined in SpectroChemPy using the ``NDDataset`` object.

``NDDataset`` objects mostly behave like NumPy's `numpy.ndarray`.

However, unlike a raw `numpy.ndarray`, optional properties such
as `uncertainty`, `mask`, `units`, `axes`, and axes `labels` make them
(hopefully) more appropriate for handling spectroscopic information, one of
the major objectives of the SpectroChemPy package.

Additional metadata can also be added to instances of this class through the
`meta` property.

### Create a NDDataset from scratch

In the following example, a minimal 1D dataset is created from a simple list, to which we can add some metadata:

In [3]:
da = NDDataset([1,2,3])
da.title = 'intensity'   
da.description = 'Some experimental measurements'
da.units = 'dimensionless'
da

0,1
Id/Name,f5ce8968
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-11-06 21:35:00.285597
,
Last Modified,2017-11-06 21:35:00.286208
,
Description,Some experimental measurements
,

0,1
Title,intensity
,
Size,3
,
Units,dimensionless
,
Values,[ 1 2 3]
,


Apart from a few additional metadata such as `author`, `created`, ..., there is
not much difference with respect to a conventional `numpy.ndarray`. For example,
one can apply NumPy ufuncs directly to a NDDataset or perform basic arithmetic
operations with these objects:

In [4]:
da2 = np.sqrt(da**3)
da2

0,1
Id/Name,f8670a1c
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-11-06 21:35:04.640629
,
Last Modified,2017-11-06 21:35:04.640764
,
Description,Some experimental measurements
,

0,1
Title,intensity
,
Size,3
,
Units,dimensionless
,
Values,[ 1.000 2.828 5.196]
,


In [5]:
da3 = da + da/2.
da3

0,1
Id/Name,fc680c1a
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-11-06 21:35:11.358124
,
Last Modified,2017-11-06 21:35:11.358481
,
Description,Some experimental measurements
,

0,1
Title,intensity
,
Size,3
,
Units,dimensionless
,
Values,[ 1.500 3.000 4.500]
,
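
The two results above can be cross-checked with plain NumPy. This is only a sketch on a bare `numpy.ndarray` (no SpectroChemPy metadata or units are involved), and the variable names are chosen for illustration:

```python
import numpy as np

values = np.array([1, 2, 3])

# Same arithmetic as np.sqrt(da**3), applied to the raw values
sqrt_cubed = np.sqrt(values**3)

# Same arithmetic as da + da/2.
one_and_half = values + values / 2.

print(np.round(sqrt_cubed, 3))
print(one_and_half)
```

The printed values should agree with the `Values` rows displayed for `da2` and `da3`.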


### Create a NDDataset : full example

There are many ways to create `NDDataset` objects.

Above we created a NDDataset from a simple list, but it is generally more
convenient to create one from a `numpy.ndarray`.

Below is an example of a 3D dataset created from a ``numpy.ndarray`` to which axes can be added.

Let's first create the three one-dimensional coordinates, for which we can define labels, units, and masks.

In [6]:
coord0 = Coord(data = np.linspace(200., 300., 3),
            labels = ['cold', 'normal', 'hot'],
            mask = None,
            units = "K",
            title = 'temperature')

coord1 = Coord(data = np.linspace(0., 60., 100),
            labels = None,
            mask = None,
            units = "minutes",
            title = 'time-on-stream')

coord2 = Coord(data = np.linspace(4000., 1000., 100),
            labels = None,
            mask = None,
            units = "cm^-1",
            title = 'wavenumber')

Here, for instance, is the information displayed for `coord1`:

In [7]:
coord1

0,1
Title,Time-on-stream
,
Data,"[ 0.000 0.606 ..., 59.394 60.000]"
,
Units,min
,
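
The coordinate values shown above follow directly from `np.linspace`, which includes both end points. As a small NumPy-only sketch (the variable names are illustrative, not part of SpectroChemPy), the spacing of the time and wavenumber axes can be checked like this:

```python
import numpy as np

# 100 points from 0 to 60, both ends included
time_axis = np.linspace(0., 60., 100)
# A descending axis, as is usual for wavenumbers
wavenumber_axis = np.linspace(4000., 1000., 100)

# With 100 points there are 99 intervals
time_step = time_axis[1] - time_axis[0]                    # 60 / 99
wavenumber_step = wavenumber_axis[1] - wavenumber_axis[0]  # -3000 / 99

print(round(time_step, 3), round(wavenumber_step, 3))
```

This explains why the second time value is displayed as 0.606 rather than 0.6.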


Now we create some 3D data (a ``numpy.ndarray``):

In [8]:
nd_data=np.array([np.array([np.sin(coord2.data*2.*np.pi/4000.)*np.exp(-y/60.) for y in coord1.data])*float(t) 
         for t in coord0.data])**2
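
The nested list comprehensions above can also be written with NumPy broadcasting. The following sketch rebuilds the three axes with plain `np.linspace` calls (an assumption made so that it runs without SpectroChemPy) and checks that both forms give the same array:

```python
import numpy as np

temperature = np.linspace(200., 300., 3)      # same values as coord0.data
time_on_stream = np.linspace(0., 60., 100)    # same values as coord1.data
wavenumber = np.linspace(4000., 1000., 100)   # same values as coord2.data

# Nested list-comprehension form, as in the cell above
looped = np.array([
    np.array([np.sin(wavenumber * 2. * np.pi / 4000.) * np.exp(-y / 60.)
              for y in time_on_stream]) * float(t)
    for t in temperature])**2

# Broadcasting form: each axis gets its own dimension
scale = temperature[:, np.newaxis, np.newaxis]
decay = np.exp(-time_on_stream / 60.)[np.newaxis, :, np.newaxis]
spectrum = np.sin(wavenumber * 2. * np.pi / 4000.)[np.newaxis, np.newaxis, :]
broadcast = (scale * decay * spectrum)**2

print(broadcast.shape)
```

Either form yields an array of shape 3 x 100 x 100, ordered as temperature, time, wavenumber.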

The dataset is now created with these data and axes. All the needed information is passed as parameters to the
NDDataset constructor.

In [9]:
mydataset = NDDataset(nd_data,
               coordset = [coord0, coord1, coord2],
               title='Absorbance',
               units='absorbance'
              )

mydataset.description = """Dataset example created for this tutorial. 
It's a 3-D dataset (with dimensionless intensity)"""

mydataset.author = 'Tintin and Milou'

We can get some information about this object:

In [10]:
mydataset

0,1
Id/Name,02584680
,
Author,Tintin and Milou
,
Created,2017-11-06 21:35:21.320881
,
Last Modified,2017-11-06 21:35:21.322294
,
Description,Dataset example created for this tutorial. It's a 3-D dataset (with dimensionless intensity)
,

0,1
Title,Absorbance
,
Size,3 x 100 x 100
,
Units,dimensionless
,
Values,"[[[ 0.000 90.562 ..., 39909.438 40000.000]  [ 0.000 88.750 ..., 39111.277 39200.027]  ..., [ 0.000 12.506 ..., 5511.379 5523.885]  [ 0.000 12.256 ..., 5401.155 5413.411]]  [[ 0.000 141.502 ..., 62358.498 62500.000]  [ 0.000 138.672 ..., 61111.370 61250.042]  ..., [ 0.000 19.541 ..., 8611.530 8631.071]  [ 0.000 19.150 ..., 8439.305 8458.455]]  [[ 0.000 203.763 ..., 89796.237 90000.000]  [ 0.000 199.688 ..., 88000.372 88200.061]  ..., [ 0.000 28.139 ..., 12400.603 12428.742]  [ 0.000 27.576 ..., 12152.599 12180.175]]]"
,

0,1
Title,Temperature
,
Data,[ 200.000 250.000 300.000]
,
Units,K
,
Labels,[cold normal hot]
,

0,1
Title,Time-on-stream
,
Data,"[ 0.000 0.606 ..., 59.394 60.000]"
,
Units,min
,

0,1
Title,Wavenumber
,
Data,"[4000.000 3969.697 ..., 1030.303 1000.000]"
,
Units,cm-1
,


### Copying existing NDDataset

Copying an existing dataset is as simple as:

In [11]:
da_copy = da.copy()

or alternatively:

In [12]:
da_copy = da[:]

Finally, it is also possible to initialize a dataset using an existing one:

In [13]:
dc = NDDataset(da3, title='Absorbance')
dc

0,1
Id/Name,0542df74
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-11-06 21:35:11.358124
,
Last Modified,2017-11-06 21:35:26.214087
,
Description,
,

0,1
Title,Absorbance
,
Size,3
,
Units,dimensionless
,
Values,[ 1.500 3.000 4.500]
,


#### See also

Any numpy creation function can be used to set up the initial dataset array:
       [numpy array creation routines](https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html#routines-array-creation)



### Importing from external dataset

A NDDataset can also be created by importing external data:

In [14]:
import os
source = NDDataset.read_omnic(os.path.join(data, 'irdata', 'NH4Y-activation.SPG'))
source

0,1
Id/Name,NH4Y-activation.SPG
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-11-06 21:35:28.887558
,
Last Modified,2017-11-06 21:35:28.887558
,
Description,"Dataset from spg file : NH4Y-activation.SPG History of the 1st spectrum: vz0521.spa, Thu Jul 07 06:10:41 2016 (GMT+02:00)"
,

0,1
Title,Absorbance
,
Size,55 x 5549
,
Units,dimensionless
,
Values,"[[ 2.057 2.061 ..., 2.013 2.012]  [ 2.033 2.037 ..., 1.913 1.911]  ..., [ 1.794 1.791 ..., 1.198 1.198]  [ 1.816 1.815 ..., 1.240 1.238]]"
,

0,1
Title,Acquisition timestamp (gmt)
,
Data,"[1467831794.000 1467832394.000 ..., 1467864197.000 1467864797.000]"
,
Units,s
,
Labels,"[[2016-07-06 19:03:14+00:00 2016-07-06 19:13:14+00:00 ..., 2016-07-07 04:03:17+00:00 2016-07-07 04:13:17+00:00]  [vz0466.spa, Wed Jul 06 21:00:38 2016 (GMT+02:00) vz0467.spa, Wed Jul 06 21:10:38 2016 (GMT+02:00) ...,  vz0520.spa, Thu Jul 07 06:00:41 2016 (GMT+02:00) vz0521.spa, Thu Jul 07 06:10:41 2016 (GMT+02:00)]]"
,

0,1
Title,Wavenumbers
,
Data,"[5999.556 5998.591 ..., 650.868 649.904]"
,
Units,cm-1
,


## Slicing a NDDataset

A NDDataset can be sliced like a conventional NumPy array, and also by coordinate values or labels, *e.g.*:

1. by index, using a slice such as [3], [0:10], [:, 3:4], [..., 5:10], ...

2. by values, using a slice such as [3000.0:3500.0], [..., 300.0], ...

3. by labels, using a slice such as ['monday':'friday'], ...

In [15]:
new = mydataset[..., 0]
new

0,1
Id/Name,07d3abcc
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-11-06 21:35:30.518402
,
Last Modified,2017-11-06 21:35:30.519038
,
Description,Dataset example created for this tutorial. It's a 3-D dataset (with dimensionless intensity)
,

0,1
Title,Absorbance
,
Size,3 x 100
,
Units,dimensionless
,
Values,"[[ 0.000 0.000 ..., 0.000 0.000]  [ 0.000 0.000 ..., 0.000 0.000]  [ 0.000 0.000 ..., 0.000 0.000]]"
,

0,1
Title,Temperature
,
Data,[ 200.000 250.000 300.000]
,
Units,K
,
Labels,[cold normal hot]
,

0,1
Title,Time-on-stream
,
Data,"[ 0.000 0.606 ..., 59.394 60.000]"
,
Units,min
,


or using the axes labels:

In [16]:
new = mydataset['hot']

Single-element dimensions are kept but can also be squeezed easily:

In [17]:
new = new.squeeze()
new

0,1
Id/Name,0905b602
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-11-06 21:35:32.524062
,
Last Modified,2017-11-06 21:35:32.524035
,
Description,Dataset example created for this tutorial. It's a 3-D dataset (with dimensionless intensity)
,

0,1
Title,Absorbance
,
Size,100 x 100
,
Units,dimensionless
,
Values,"[[ 0.000 203.763 ..., 89796.237 90000.000]  [ 0.000 199.688 ..., 88000.372 88200.061]  ..., [ 0.000 28.139 ..., 12400.603 12428.742]  [ 0.000 27.576 ..., 12152.599 12180.175]]"
,

0,1
Title,Time-on-stream
,
Data,"[ 0.000 0.606 ..., 59.394 60.000]"
,
Units,min
,

0,1
Title,Wavenumber
,
Data,"[4000.000 3969.697 ..., 1030.303 1000.000]"
,
Units,cm-1
,
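
The same behaviour can be reproduced on a bare NumPy array, where a length-1 slice plays the role of the label selection. This is only an analogy on a plain `numpy.ndarray` filled with zeros; it does not use the NDDataset API:

```python
import numpy as np

cube = np.zeros((3, 100, 100))

# A length-1 slice keeps the first dimension
kept = cube[2:3]
# squeeze() drops every dimension of length 1
squeezed = kept.squeeze()

print(kept.shape, squeezed.shape)
```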


Be sure to use the correct type for slicing.

Floats are used for slicing by values:

In [18]:
correct = mydataset[...,2000.]

In [19]:
outside_limits = mydataset[...,10000.]

If the value lies outside the coordinate limits, the index of the closest limit is returned.
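
One way to picture slicing by value is a nearest-coordinate lookup. The helper `nearest_index` below is hypothetical and is not claimed to be how SpectroChemPy resolves float indices; it only shows, with plain NumPy, how a float can be mapped to an index, including a value beyond the axis limits:

```python
import numpy as np

def nearest_index(coord_values, value):
    """Return the index of the coordinate closest to `value` (hypothetical helper)."""
    return int(np.argmin(np.abs(np.asarray(coord_values) - value)))

wavenumber = np.linspace(4000., 1000., 100)

inside = nearest_index(wavenumber, 2000.)     # a value within the axis range
outside = nearest_index(wavenumber, 10000.)   # beyond the 4000 cm^-1 limit

print(inside, outside)
```

With this axis, 2000 cm^-1 falls on index 66, while 10000 cm^-1 is mapped to index 0, the closest limit.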


<div class='alert alert-info'>**NOTE:**

If one uses an integer value (2000), then the slicing is made **by index, not by value**. In the following particular case, an `IndexError` is raised because index 2000 does not exist (the size along axis -1 is only 100, so the index varies between 0 and 99).

</div>

When slicing by index, an error is generated if the index is out of limits:

In [20]:
try:
    fail = mydataset[...,2000]
except IndexError as e:
    log.error(e)

 ERROR | Empty array of shape (3, 100, 0) resulted from slicing.
Check the indexes and make sure to use floats for location slicing


One can mix slicing methods for different dimensions:

In [21]:
new = mydataset['normal':'hot', 0, 4000.0:2000.]
new

0,1
Id/Name,2a998f00
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-11-06 21:36:28.857893
,
Last Modified,2017-11-06 21:36:28.858647
,
Description,Dataset example created for this tutorial. It's a 3-D dataset (with dimensionless intensity)
,

0,1
Title,Absorbance
,
Size,2 x 67
,
Units,dimensionless
,
Values,"[[ 0.000 141.502 ..., 141.502 0.000]  [ 0.000 203.763 ..., 203.763 0.000]]"
,

0,1
Title,Temperature
,
Data,[ 250.000 300.000]
,
Units,K
,
Labels,[normal hot]
,

0,1
Title,Wavenumber
,
Data,"[4000.000 3969.697 ..., 2030.303 2000.000]"
,
Units,cm-1
,
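
The mixed slice above can be translated into plain integer indices. The sketch below is an illustration with NumPy only: the `nearest` helper is hypothetical, and treating label and value stops as inclusive is an assumption inferred from the 2 x 67 size displayed above.

```python
import numpy as np

labels = ['cold', 'normal', 'hot']
temperature = np.linspace(200., 300., 3)
time_on_stream = np.linspace(0., 60., 100)
wavenumber = np.linspace(4000., 1000., 100)

# Same synthetic data as in the full example
data = (temperature[:, None, None]
        * np.exp(-time_on_stream / 60.)[None, :, None]
        * np.sin(wavenumber * 2. * np.pi / 4000.)[None, None, :])**2

def nearest(values, target):
    """Index of the value closest to `target` (hypothetical helper)."""
    return int(np.argmin(np.abs(values - target)))

# Label and value ranges include their stop here, hence the + 1
rows = slice(labels.index('normal'), labels.index('hot') + 1)
cols = slice(nearest(wavenumber, 4000.), nearest(wavenumber, 2000.) + 1)

subset = data[rows, 0, cols]
print(subset.shape)
```

The integer `0` in the middle position selects the first time point by index, so that dimension disappears from the result.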



## Loading of experimental data


### NMR Data

Now, let's load an NMR dataset (in the Bruker format).

The built-in **data** variable contains the path to our *test* data:

In [22]:
# let's check that this directory exists and display its actual content:
import os
if os.path.exists(data):
    print(list_data)

testdata
|__irdata
   |__nh4y-activation.scp
   |__nh4y-activation.spg
|__mydataset.scp
|__nmrdata
   |__bruker
      |__tests
         |__nmr
            |__bruker_1d
               |__1
                  |__pdata
                     |__1
                        |__proc
                        |__procs
                        |__title
            |__bruker_2d
               |__1
                  |__audita.txt
                  |__pdata
                     |__1
                        |__auditp.txt
                        |__proc
                        |__proc2
                        |__proc2s
                        |__procs
                        |__title
               |__2
                  |__pdata
                     |__1
                        |__proc
                        |__proc2
                        |__proc2s
                        |__procs
                        |__title
   |__simpson
      |__simpson_1d
         |__rr.in
      |__simpson_2d
         |__2d.in


In [23]:
path = os.path.join(data, 'nmrdata','bruker', 'tests', 'nmr','bruker_1d')

# load the data in a new dataset
ndd = NDDataset()
ndd.read_bruker_nmr(path, expno=1, remove_digital_filter=True)
ndd

0,1
Id/Name,301b83ca
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-11-06 21:36:38.097909
,
Last Modified,2017-11-06 21:36:38.128025
,
Description,
,

0,1
Title,intensity
,
Size,12411 (complex)
,
Units,unitless
,
Values,"[-1037.267 -1077.841 ..., -0.053 0.101]"
,

0,1
Title,Acquisition time
,
Data,"[ 0.000 4.000 ..., 49636.000 49640.000]"
,
Units,us
,


In [24]:
# view it...
figure()
ndd.plot()
show()  # not required in notebooks, as figures are shown automatically


In [25]:
path = os.path.join(data, 'nmrdata','bruker', 'tests', 'nmr','bruker_2d')

# load the data directly (no need to create the dataset first)
ndd2 = NDDataset.read_bruker_nmr(path, expno=1, remove_digital_filter=True)

# view it...
ndd2.x.to('s')
ndd2.y.to('ms')

figure()
fig2 = ndd2.plot() 


### IR data

In [26]:
source = NDDataset.read_omnic(os.path.join(data, 'irdata', 'NH4Y-activation.SPG'))
source

0,1
Id/Name,NH4Y-activation.SPG
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-11-06 21:36:42.620507
,
Last Modified,2017-11-06 21:36:42.620507
,
Description,"Dataset from spg file : NH4Y-activation.SPG History of the 1st spectrum: vz0521.spa, Thu Jul 07 06:10:41 2016 (GMT+02:00)"
,

0,1
Title,Absorbance
,
Size,55 x 5549
,
Units,dimensionless
,
Values,"[[ 2.057 2.061 ..., 2.013 2.012]  [ 2.033 2.037 ..., 1.913 1.911]  ..., [ 1.794 1.791 ..., 1.198 1.198]  [ 1.816 1.815 ..., 1.240 1.238]]"
,

0,1
Title,Acquisition timestamp (gmt)
,
Data,"[1467831794.000 1467832394.000 ..., 1467864197.000 1467864797.000]"
,
Units,s
,
Labels,"[[2016-07-06 19:03:14+00:00 2016-07-06 19:13:14+00:00 ..., 2016-07-07 04:03:17+00:00 2016-07-07 04:13:17+00:00]  [vz0466.spa, Wed Jul 06 21:00:38 2016 (GMT+02:00) vz0467.spa, Wed Jul 06 21:10:38 2016 (GMT+02:00) ...,  vz0520.spa, Thu Jul 07 06:00:41 2016 (GMT+02:00) vz0521.spa, Thu Jul 07 06:10:41 2016 (GMT+02:00)]]"
,

0,1
Title,Wavenumbers
,
Data,"[5999.556 5998.591 ..., 650.868 649.904]"
,
Units,cm-1
,


In [27]:
source = read_omnic(NDDataset(), os.path.join(data, 'irdata', 'NH4Y-activation.SPG'))

In [28]:
figure()
fig = source.plot(kind='stack')



## Transposition

Datasets can be transposed:

In [29]:
newT = new.T
newT

0,1
Id/Name,3566fe40
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-11-06 21:36:46.981346
,
Last Modified,2017-11-06 21:36:46.981511
,
Description,Dataset example created for this tutorial. It's a 3-D dataset (with dimensionless intensity)
,

0,1
Title,Absorbance
,
Size,67 x 2
,
Units,dimensionless
,
Values,"[[ 0.000 0.000]  [ 141.502 203.763]  ..., [ 141.502 203.763]  [ 0.000 0.000]]"
,

0,1
Title,Wavenumber
,
Data,"[4000.000 3969.697 ..., 2030.303 2000.000]"
,
Units,cm-1
,

0,1
Title,Temperature
,
Data,[ 250.000 300.000]
,
Units,K
,
Labels,[normal hot]
,


## Units


SpectroChemPy can do calculations with units: it uses [pint](https://pint.readthedocs.io) to define and perform operations on data with units.

### Create quantities

* To create a quantity, use, for instance, one of the following expressions:

In [30]:
Quantity('10.0 cm^-1')

In [31]:
Quantity(1.0, 'cm^-1/hour')

In [32]:
Quantity(10.0, ur.cm/ur.km)

or, perhaps more simply,

In [33]:
10.0 * ur.meter/ur.gram/ur.volt

`ur` stands for **unit registry**, which handles many types of units
(and the conversions between them).

### Do arithmetics with units

In [34]:
a = 900 * ur.km
b = 4.5 * ur.hours
a/b

Such calculations can also be done using a string expression:

In [35]:
Quantity("900 km / (8 hours)")

### Convert between units

In [36]:
c = a/b
c.to('cm/s')

We can make the conversion *in place* using *ito* instead of *to*:

In [37]:
c.ito('m/s')
c
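
The conversions above can be checked by hand. The following sketch uses plain Python arithmetic with explicit conversion factors instead of a unit registry; the constant and variable names are illustrative:

```python
# Conversion factors (exact by definition)
CM_PER_KM = 100_000
M_PER_KM = 1_000
S_PER_HOUR = 3_600

distance_km = 900
duration_h = 4.5

speed_km_h = distance_km / duration_h               # a / b
speed_cm_s = speed_km_h * CM_PER_KM / S_PER_HOUR    # c.to('cm/s')
speed_m_s = speed_km_h * M_PER_KM / S_PER_HOUR      # c.ito('m/s')

print(speed_km_h, round(speed_cm_s, 2), round(speed_m_s, 2))
```

A unit registry performs the same multiplications, but derives the factors from the unit definitions, which removes the risk of applying one the wrong way round.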

### Do math operations with consistent units

In [38]:
x = 10 * ur.radians
np.sin(x)

The consistency of the units is checked!

In [39]:
x = 10 * ur.meters
np.sqrt(x)

but this is wrong, because the cosine of a length is not defined:

In [40]:
x = 10 * ur.meters
try:
    np.cos(x)
except DimensionalityError as e:
    log.error(e)

 ERROR | Cannot convert from 'meter' to 'radian'
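
One way to picture this consistency check is a toy model in which every quantity carries a dictionary of unit exponents. The `ToyQuantity` class and the `toy_sqrt` and `toy_cos` helpers below are hypothetical and do not reflect how pint is implemented; they only mirror the two cases shown above.

```python
import math

class ToyQuantity:
    """Magnitude plus unit exponents, e.g. {'meter': 1} (illustrative only)."""

    def __init__(self, magnitude, dims=None):
        self.magnitude = magnitude
        self.dims = dict(dims or {})

def toy_sqrt(q):
    # A square root is always allowed: every unit exponent is halved
    halved = {unit: power / 2 for unit, power in q.dims.items()}
    return ToyQuantity(math.sqrt(q.magnitude), halved)

def toy_cos(q):
    # A cosine needs an angle: no units at all, or radians only
    if q.dims not in ({}, {'radian': 1}):
        raise TypeError(f"cannot take the cosine of units {q.dims}")
    return ToyQuantity(math.cos(q.magnitude))

length = ToyQuantity(10, {'meter': 1})
root = toy_sqrt(length)   # units become meter**0.5

try:
    toy_cos(length)
    cos_failed = False
except TypeError:
    cos_failed = True

print(root.dims, cos_failed)
```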


Units can be set for NDDataset data and/or axes:

In [41]:
ds = NDDataset([1., 2., 3.], units='g/cm^3', title='concentration')
ds

0,1
Id/Name,3d063242
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-11-06 21:36:59.768563
,
Last Modified,2017-11-06 21:36:59.769892
,
Description,
,

0,1
Title,concentration
,
Size,3
,
Units,g.cm-3
,
Values,[ 1.000 2.000 3.000]
,


In [42]:
ds.to('kg/m^3')

0,1
Id/Name,3d063242
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-11-06 21:36:59.768563
,
Last Modified,2017-11-06 21:37:00.230404
,
Description,
,

0,1
Title,concentration
,
Size,3
,
Units,kg.m-3
,
Values,[1000.000 2000.000 3000.000]
,
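
The factor of 1000 in this conversion follows from the definitions of the units. A short NumPy sketch of the arithmetic, with illustrative constant names and no unit handling:

```python
import numpy as np

KG_PER_G = 1e-3      # 1 g = 0.001 kg
M3_PER_CM3 = 1e-6    # 1 cm^3 = 0.000001 m^3

density_g_cm3 = np.array([1., 2., 3.])

# g/cm^3 -> kg/m^3: convert the numerator, then the denominator
density_kg_m3 = density_g_cm3 * KG_PER_G / M3_PER_CM3

print(density_kg_m3)
```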


## Uncertainties

SpectroChemPy can do calculations with uncertainties (and units).

A quantity with an `uncertainty` is called a **Measurement**.

Use one of the following expressions to create such a `Measurement`:

In [43]:
#Measurement(10.0, .2, 'cm')    TO FINISH (format doesn't work)

In [44]:
# Quantity(10.0, 'cm').plus_minus(.2)   TO FINISH

## Numpy universal functions (ufunc's)

A NumPy universal function (or `numpy.ufunc` for short) is a function that
operates on a `numpy.ndarray` in an element-by-element fashion. It is
vectorized and so rather fast.

As SpectroChemPy NDDataset objects imitate the behaviour of NumPy objects, many
NumPy ufuncs can be applied directly.

For example, if you need all the elements of a NDDataset to be replaced by
their square roots, you can use the `numpy.sqrt` function:

In [45]:
da = NDDataset([1., 2., 3.])
da_sqrt = np.sqrt(da)
da_sqrt

0,1
Id/Name,3ee8bc42
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-11-06 21:37:02.931113
,
Last Modified,2017-11-06 21:37:02.931410
,
Description,
,

0,1
Title,3ee8bc42
,
Size,3
,
Units,unitless
,
Values,[ 1.000 1.414 1.732]
,
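
The element-by-element behaviour can be seen on a bare `numpy.ndarray`. In this sketch (plain NumPy, illustrative names), the ufunc call is compared with an explicit Python loop over the same values:

```python
import numpy as np

values = np.array([1., 2., 3.])

# np.sqrt is an instance of numpy.ufunc
print(isinstance(np.sqrt, np.ufunc))

# Vectorized call: no explicit Python loop
roots = np.sqrt(values)

# Same result computed one element at a time
looped = np.array([v ** 0.5 for v in values])

print(np.round(roots, 3))
```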


### Ufuncs on NDDatasets with units

When a NDDataset has units, some restrictions apply to the use of ufuncs:

Some functions accept only dimensionless quantities. This is the case, for
example, for the exponential and logarithmic functions `exp` and `log`.

In [46]:
np.log10(da)

0,1
Id/Name,3f9ee648
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-11-06 21:37:04.124964
,
Last Modified,2017-11-06 21:37:04.125214
,
Description,
,

0,1
Title,3f9ee648
,
Size,3
,
Units,unitless
,
Values,[ 0.000 0.301 0.477]
,


In [47]:
da.units = ur.cm

try:
    np.log10(da)
except DimensionalityError as e:
    log.error(e)

 ERROR | Cannot convert from 'centimeter' ([length]) to 'dimensionless' (dimensionless)


## Complex or hypercomplex NDDatasets


NDDataset objects with complex data are handled differently than in
`numpy.ndarray`.

Instead, complex data are stored by interlacing the real and imaginary parts.
This allows the definition of data that can be complex along several axes and,
*e.g.*, allows 2D-hypercomplex arrays that can be transposed (useful for NMR data).

In [48]:
da = NDDataset([[1.+2.j, 2.+0j], [1.3+2.j, 2.+0.5j],
                [1.+4.2j, 2.+3j], [5.+4.2j, 2.+3j]])
da

0,1
Id/Name,40797bdc
,
Author,christian@MacBook-Pro-de-Christian.local
,
Created,2017-11-06 21:37:05.557282
,
Last Modified,2017-11-06 21:37:05.557628
,
Description,
,

0,1
Title,untitled
,
Size,4 x 2(complex)
,
Units,unitless
,
Values,[[ 1.000 2.000 2.000 0.000]  [ 1.300 2.000 2.000 0.500]  [ 1.000 4.200 2.000 3.000]  [ 5.000 4.200 2.000 3.000]]
,
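
The interlaced layout displayed above can be reproduced with plain NumPy. This is a sketch of the storage idea only, not of SpectroChemPy's internal code, and the variable names are illustrative:

```python
import numpy as np

z = np.array([[1.+2.j, 2.+0j], [1.3+2.j, 2.+0.5j],
              [1.+4.2j, 2.+3j], [5.+4.2j, 2.+3j]])

# Put real and imaginary parts side by side along the last axis
interlaced = np.stack([z.real, z.imag], axis=-1).reshape(z.shape[0], -1)

# Recover the complex array: even columns are real, odd columns imaginary
recovered = interlaced[:, ::2] + 1j * interlaced[:, 1::2]

print(interlaced)
```

The 4 x 2 complex array becomes a 4 x 4 real array whose rows match the `Values` row shown above.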


If the dataset is also complex in the first dimension (columns), then we
should have (note the shape description!):