# Data Storage in OpenPNM

OpenPNM uses two very common Python data structures, so it's best to get comfortable with them right away. In this example we'll cover:

| Topics Covered |
| :- |
| Python's ``list`` |
| Numpy's ``ndarray`` |
| Python's `dict` or *dictionary* |
| OpenPNM's naming convention |
| Data vs Labels in OpenPNM |

## Python Lists: Flexible but Slow
First, let's quickly look at Python's version of an *array*, which is called a ``list``. It is indicated by square brackets:

In [1]:
L = [0, 2, 4, 6, 8]

You can read and write values as follows:

In [2]:
L[0] = L[2]*L[4]
print(L)

[32, 2, 4, 6, 8]


Note that Python uses 0-indexing, and also that square brackets are used to index into any sequence.

You can make the ``list`` longer, and remove items:

In [3]:
L.append(100)
print(L)

[32, 2, 4, 6, 8, 100]


In [4]:
L.pop(2)
print(L)

[32, 2, 6, 8, 100]


However, this list is not very good at math:

In [5]:
try:
    print(L + 2)
except TypeError:
    print('Adding to a list assumes you are joining 2 lists')
print(L + [2, 3])

Adding to a list assumes you are joining 2 lists
[32, 2, 6, 8, 100, 2, 3]


And multiplication assumes you want to duplicate the list N times:

In [6]:
print(L*2)

[32, 2, 6, 8, 100, 32, 2, 6, 8, 100]


The reason the list is not ideal for numerical operations is that *anything* can be stored in each element:

In [7]:
L[0] = 'str'
print(L)

['str', 2, 6, 8, 100]


This is why element-wise addition or multiplication is not defined for a list: Python cannot know in advance what it means to add a number to a string (e.g. ``'one' + 1.0``).

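To make the limitation concrete, here is a minimal sketch of what element-wise math on a plain list involves. The names `values` and `mixed` are hypothetical and used only for illustration:

```python
# A hypothetical list of numbers, used only for illustration
values = [0, 2, 4, 6, 8]

# Element-wise addition needs an explicit loop or a list comprehension
shifted = [v + 2 for v in values]
print(shifted)

# With mixed types the same operation fails as soon as it reaches the string
mixed = ['str', 2, 6]
try:
    [v + 2 for v in mixed]
except TypeError as err:
    print("Cannot add 2 to every element:", err)
```

The comprehension works, but every element is type-checked one at a time by the interpreter, which is part of why lists are slow for numerical work.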
## Numpy ``ndarray``: Optimized for Numerics
Now let's take a look at the Numpy ``ndarray``. Numpy has been around almost as long as Python, and it is used almost exclusively in scientific Python because, as discussed above, the native `list` is poorly suited to numerical work and is not very fast. Numpy arrays, on the other hand, are actually "C" arrays behind the scenes, so they are very fast. The downside is that you must learn a 'mini-language' to use them. The following few code blocks illustrate this.

In [8]:
import numpy as np
a = np.arange(0, 100, 15)
print(a)

[ 0 15 30 45 60 75 90]


This is an example of the 'mini-language' that you need to learn, since ``arange`` is the Numpy version of ``range``. The Numpy package has hundreds of functions available, and to be proficient with Numpy you need to at least be aware of most of them.

Like the ``list``, you can index into Numpy arrays for reading and writing:

In [9]:
print(a[2])

30


In [10]:
a[0] = 999
print(a)

[999  15  30  45  60  75  90]


You can also use what is called 'fancy indexing', which allows you to index into an ``ndarray`` with another array:

In [11]:
print(a[[0, 2, 4]])

[999  30  60]


You can set multiple locations with a single value:

In [12]:
a[[0, 2, 4]] = -100
print(a)

[-100   15 -100   45 -100   75   90]


Or an array of values:

In [13]:
a[[1, 3, 5]] = [111, 222, 333]
print(a)

[-100  111 -100  222 -100  333   90]


You can also use "masks" of boolean values, which select every index where the mask is ``True``:

In [17]:
mask = a < 0
a[mask] = 0
print(a)

[  0 111   0 222   0 333  90]

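Masks can also be combined, and converted back into integer indices when needed. The following is a small sketch using a hypothetical array `b`; ``np.where`` is a standard Numpy function:

```python
import numpy as np

# A hypothetical array, used only for illustration
b = np.array([5, -3, 12, -7, 20])

# Combine two conditions with & (and) or | (or); the parentheses are
# required because & binds more tightly than the comparison operators
mask = (b > 0) & (b < 15)
print(mask)

# Use the mask to pull out the matching values
print(b[mask])

# Convert the boolean mask into the integer indices where it is True
indices = np.where(mask)[0]
print(indices)
```

Boolean masks and integer indices are interchangeable for indexing purposes, so you can use whichever is more convenient.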

And of course, math makes sense to an ``ndarray`` since *all* the elements are assured to be numbers:

In [18]:
print(a*2)

[  0 222   0 444   0 666 180]


In [19]:
print(a + 100)

[100 211 100 322 100 433 190]


In [20]:
print(a*a)

[     0  12321      0  49284      0 110889   8100]

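The guarantee that all elements share one type comes from the array's ``dtype``. Here is a short sketch contrasting the same operation on a list and on an array; the names `nums`, `arr`, and `mixed` are hypothetical:

```python
import numpy as np

nums = [1, 2, 3]          # a plain Python list
arr = np.array(nums)      # the same values as an ndarray

print(nums * 2)           # the list is repeated
print(arr * 2)            # each element is doubled

# Every ndarray has a single dtype shared by all of its elements
# (the exact integer type printed depends on the platform)
print(arr.dtype)

# Mixing strings and numbers does not fail, but Numpy converts
# everything to a common string type, so math is no longer possible
mixed = np.array(['str', 2, 6])
print(mixed.dtype.kind)   # 'U' means unicode string
```

Checking ``dtype`` is a quick way to confirm that an array actually holds the numerical type you expect.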

There are many resources for learning and understanding Numpy arrays. Below is a list of our favorites:

| Reference | Description |
| :- | :- |
| [1]() | (The one with the pictures) |

## Dictionaries: Holding Things Together
The last piece of the puzzle is Python's built-in ``dict`` which is much like a list, in the sense that it can act as a container for any datatype, but items are addressed by name instead of index. 

In [21]:
d = dict()
d['arr'] = a
d['list'] = L
print(d)

{'arr': array([  0, 111,   0, 222,   0, 333,  90]), 'list': ['str', 2, 6, 8, 100]}


You can retrieve any element by name:

In [22]:
print(d['arr'])

[  0 111   0 222   0 333  90]


And adding new items is easy:

In [23]:
d['test'] = 1.0
print(d)

{'arr': array([  0, 111,   0, 222,   0, 333,  90]), 'list': ['str', 2, 6, 8, 100], 'test': 1.0}

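A few other ``dict`` operations come up often. This sketch uses a hypothetical dictionary `d_demo` so that it does not depend on the variables defined above:

```python
# A hypothetical dictionary, used only for illustration
d_demo = {'arr': [0, 111, 0], 'test': 1.0}

# Check whether a name is present before reading it
print('arr' in d_demo)

# Loop over the names and the stored items together
for key, value in d_demo.items():
    print(key, '->', type(value).__name__)

# Remove an item by name; pop returns the removed value
removed = d_demo.pop('test')
print(removed)

# Asking for a name that does not exist raises a KeyError
try:
    d_demo['missing']
except KeyError:
    print("No item called 'missing'")

remaining = list(d_demo.keys())
print(remaining)
```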

### Subclassing ``dict`` 

This may seem like an intimidating concept at first, but it's actually beautifully simple once you see how it works. It's also relevant to learning OpenPNM since it uses ``dict``s extensively, and these are often subclassed to augment their functionality.  Subclassing means "taking the basic functionality of the ``dict``, then adding to and/or changing it".

To illustrate the idea of subclasses, as it pertains to OpenPNM, let's change how the reading and writing of items works. Whenever you use the square brackets ``[ ]`` to index into a ``dict``, this *actually* calls the ``__getitem__`` and ``__setitem__`` methods.  The double underscores indicate that these are intrinsic Python methods which the user should not call directly, but they do the bulk of the work.  So let's try it out by creating a class that tells us what is going on each time we read and write something:

In [28]:
# Create a new class which is a dict PLUS the extra functionality we will add
class NewDict(dict): 
    
    def __setitem__(self, key, value):
        # This is our customization
        print("The key being written is:", key)
        print("The value being written is:", value)
        # Now we call the setitem on the actual dict class
        super().__setitem__(key, value)
        
    def __getitem__(self, key):
        print("The key being retrieved is:", key)
        return super().__getitem__(key)


In [29]:
dnew = NewDict()

In [30]:
dnew['test'] = 1.0

The key being written is: test
The value being written is: 1.0

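The same mechanism can do more than print messages. As a sketch of the kind of customization that is possible, the hypothetical ``ArrayDict`` below converts everything written to it into an ``ndarray``. It is an illustration only, and is not taken from OpenPNM's source code:

```python
import numpy as np


class ArrayDict(dict):
    """Hypothetical dict subclass that stores every value as an ndarray."""

    def __setitem__(self, key, value):
        # Convert lists and scalars to an array with at least 1 dimension,
        # then let the normal dict machinery store the result
        super().__setitem__(key, np.array(value, ndmin=1))


store = ArrayDict()
store['diameter'] = [1.0, 2.0, 3.0]   # a list goes in...
store['scalar'] = 5                   # ...and so does a single number

print(type(store['diameter']))        # ...but an ndarray comes out
print(store['scalar'])
print(store['diameter'] * 2)          # so element-wise math works
```

Note that only writes made with square brackets pass through ``__setitem__``; in CPython, ``dict.update`` and the ``dict`` constructor bypass it, so a more complete version would need to override those as well.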