# Python Class
**This notebook: [see on github](https://github.com/drinkingkazu/2019-06-17-NeuralNets/blob/master/Python%20Class%20and%20Iterable%20Dataset.ipynb) or [run it on google colab](https://colab.research.google.com/github/drinkingkazu/2019-06-17-Notebooks/blob/master/Python%20Class%20and%20Iteratable%20Dataset.ipynb)**.

In case you haven't worked with Python classes before, we start with a simple example class, _foo_.

```python
class foo:

    def __init__(self):
        self.A = 10

    def speak(self):
        print('Hello world! A =', self.A)

    def __str__(self):
        return 'I am foo type instance object!'
```

The example class `foo` above implements three methods. The two whose names begin and end with a double underscore (`__init__` and `__str__`) are special methods that Python calls implicitly, and here we override their default definitions: `__init__` runs when an instance is created, and `__str__` returns the text representation of a `foo` instance, which is what the `print` function displays.

You may also notice that every method takes `self` as its first parameter; it plays the role of the `this` pointer in C++. In a method definition, the first parameter refers to the instance the method is called on, but the caller does not pass it explicitly: `obj.speak()` is shorthand for `foo.speak(obj)`. Below are examples of how a `foo` instance may be used:

```python
kazu = foo()
kazu.speak()
print(kazu)
```

```
Hello world! A = 10
I am foo type instance object!
```
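To make the role of `self` concrete, the sketch below calls the same method through the class while passing the instance explicitly, and compares what `print` shows for a class with and without a `__str__` override. `foo` is redefined so the snippet runs on its own; the class `plain` is a hypothetical stand-in used only for comparison.

```python
class foo:
    def __init__(self):
        self.A = 10

    def speak(self):
        print('Hello world! A =', self.A)

    def __str__(self):
        return 'I am foo type instance object!'


class plain:
    """Hypothetical class with no __str__ override, for comparison."""
    pass


kazu = foo()

# These two calls are equivalent: Python passes `kazu` as `self` automatically.
kazu.speak()
foo.speak(kazu)

# print() calls str(), which uses __str__ when the class defines it.
print(str(kazu))

# Without an override, the default text shows the class name and a memory
# address, something like "<__main__.plain object at 0x...>".
print(plain())
```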


## _à la C++ struct_: Attribute Holder
In C++, we often use a _struct_ for quick encapsulation. A hacky but handy way to achieve something similar in Python is to define an empty class and attach attributes to its instances dynamically.

```python
class BLOB:
    pass

blob = BLOB()
blob.data   = kazu
blob.number = 2
```
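The standard library also offers a ready-made version of this pattern: `types.SimpleNamespace` accepts attributes as keyword arguments at construction time and still lets you attach new ones later. A minimal sketch; the attribute values here are placeholders rather than the `kazu` object above.

```python
from types import SimpleNamespace

# Attributes can be set when the namespace is created...
blob = SimpleNamespace(data='some payload', number=2)

# ...or attached later, just like with the empty BLOB class.
blob.label = 'train'

print(blob.number)  # 2
print(blob)         # a readable listing of all attributes
```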

Now you can pass a _blob_ object to and from functions and keep argument lists and return values compact! However, since the attributes of a blob are attached dynamically, you may want to check whether a given blob has a certain attribute. Python provides the handy built-in functions `hasattr` and `getattr` ("attr" stands for "attribute") for this.

```python
print(hasattr(blob, "data"))
print(hasattr(blob, "tracy"))
print(getattr(blob, "data"))
getattr(blob, "data").speak()
```

```
True
False
I am foo type instance object!
Hello world! A = 10
```
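`getattr` also accepts an optional third argument that is returned instead of raising `AttributeError` when the attribute is missing, and its counterpart `setattr` attaches an attribute whose name is held in a string. A small sketch using a stand-in `BLOB`; the attribute names `batch_size` and `shuffle` are illustrative only.

```python
class BLOB:
    pass

blob = BLOB()
blob.number = 2

# With a default, a missing attribute does not raise AttributeError.
tracy = getattr(blob, 'tracy', None)
print(tracy)  # None

# Without a default, a missing attribute raises AttributeError.
try:
    getattr(blob, 'tracy')
except AttributeError as err:
    print('AttributeError:', err)

# setattr is the dynamic counterpart of `blob.name = value`.
for name, value in [('batch_size', 10), ('shuffle', True)]:
    setattr(blob, name, value)

print(blob.batch_size, blob.shuffle)  # 10 True
```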


# Iterable Dataset

To train a network with Stochastic Gradient Descent (SGD), we need a way to stream randomly selected subsets of the training data. The most common approach is to build an _iterable_ dataset that supports random access. Let's practice designing such a Python class.

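First, let's define an iterable container. We implement two special methods: `__len__`, which returns the length of the array-like data, and `__getitem__`, which enables random access through the indexing operator (`[ ]`). Strictly speaking, `iter()` only needs `__getitem__`, but `len()` relies on `__len__`, and so does the PyTorch `DataLoader` used below when it decides which indices to sample.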

```python
class dataset:

    def __init__(self):
        self._data = range(100)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]
```

The example class `dataset` above holds simple data: a sequence of length 100 containing the integers 0 to 99. Accordingly, its length is 100, and the indexing operator returns the entry at the given index.

```python
data = dataset()
print('Length:', len(data))
print('Element at index 10:', data[10])
```

```
Length: 100
Element at index 10: 10
```


Here's how to create an iterator for an iterable object using the built-in `iter` function.

```python
iter(data)
```

```
<iterator at 0x7f0aa82e9ac8>
```

The built-in `next` function advances the iterator and returns the next element.

```python
it = iter(data)
print(next(it), next(it), next(it))
```

```
0 1 2
```
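When the iterator runs out of elements, `next` raises `StopIteration`; a `for` loop calls `next` for us and stops on that exception. Because a class like `dataset` defines `__getitem__` but no `__iter__`, `iter()` falls back to calling `__getitem__` with 0, 1, 2, ... until an `IndexError` is raised. The sketch below uses a shortened, hypothetical `small_dataset` of length 3 so the end is easy to reach.

```python
class small_dataset:
    """Hypothetical 3-element variant of `dataset`, for illustration only."""

    def __init__(self):
        self._data = range(3)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        # Indexing a range past its end raises IndexError, which ends iteration.
        return self._data[index]


it = iter(small_dataset())
print(next(it), next(it), next(it))  # 0 1 2

try:
    next(it)
except StopIteration:
    print('iterator exhausted')

# A for loop performs the same iter()/next() calls and handles StopIteration itself.
values = [v for v in small_dataset()]
print(values)  # [0, 1, 2]
```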


## Create Stochastic Batch-Data Using DataLoader
Once we have an iterable data representation, the next step is to draw randomly selected subsets (batches) of the data. Other desirable features include a choice between random and ordered sampling, parallel workers that prepare several batches simultaneously, and so on.

Because this capability is useful for _any_ data representation, PyTorch provides a generic API called **DataLoader**. Here is how to instantiate one:

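```python
from torch.utils.data import DataLoader

# Batches of 10, reshuffled every epoch, prepared by one worker process;
# pin_memory=True places batches in pinned memory for faster copies to a GPU.
loader = DataLoader(data, batch_size=10, shuffle=True, num_workers=1, pin_memory=True)
```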

The DataLoader is itself an iterable object. We created it with a batch size of 10 for a dataset of length 100, so iterating over the loader yields 100 / 10 = 10 batches.
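Conceptually, with `shuffle=True` and the default `drop_last=False`, one pass over the loader amounts to shuffling the indices `0 ... len(data)-1` once and cutting them into consecutive chunks of `batch_size`. That gives ceil(100 / 10) = 10 batches, with each element appearing exactly once. The pure-Python sketch below mimics only that behavior; the helper `shuffled_batches` is hypothetical and is not how PyTorch implements `DataLoader`, which also handles worker processes, collation into tensors, memory pinning, and more.

```python
import math
import random


def shuffled_batches(dataset, batch_size, seed=None):
    """Hypothetical helper: one shuffled pass over `dataset`, in lists of `batch_size`."""
    rng = random.Random(seed)
    indices = list(range(len(dataset)))
    rng.shuffle(indices)  # a random permutation: no index is repeated or skipped
    for start in range(0, len(indices), batch_size):
        # The last batch is shorter if len(dataset) is not a multiple of batch_size.
        yield [dataset[i] for i in indices[start:start + batch_size]]


data = range(100)  # stands in for the `dataset` instance above
batches = list(shuffled_batches(data, batch_size=10, seed=0))

print(len(batches), math.ceil(len(data) / 10))  # 10 10
print(sorted(v for batch in batches for v in batch) == list(data))  # True
```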

```python
for index, batch_data in enumerate(loader):
    print('Batch entry', index, '... batch data', batch_data)
```

```
Batch entry 0 ... batch data tensor([ 7, 19, 95, 63, 17, 34, 68, 24, 55, 52])
Batch entry 1 ... batch data tensor([ 3, 80, 60, 28, 23,  9, 29, 62, 94, 43])
Batch entry 2 ... batch data tensor([53, 11, 61, 67, 42, 92, 37, 33, 88, 15])
Batch entry 3 ... batch data tensor([22, 76, 20, 81, 36, 59, 41, 14, 72,  2])
Batch entry 4 ... batch data tensor([89, 10,  4, 87, 79, 70, 75, 13, 21, 83])
Batch entry 5 ... batch data tensor([35, 99, 84, 25, 32, 56, 49, 91, 30, 85])
Batch entry 6 ... batch data tensor([96, 57, 47, 45, 65, 97, 27, 77, 69, 82])
Batch entry 7 ... batch data tensor([ 8, 46,  6, 40, 39, 18, 86, 54, 73, 64])
Batch entry 8 ... batch data tensor([90,  1, 44, 48, 38, 31,  5, 78, 16, 74])
Batch entry 9 ... batch data tensor([58, 50, 98, 26, 66, 12, 51, 93, 71,  0])
```


The elements are drawn in random order because we set `shuffle=True`. Does one pass over the loader cover every element in the dataset? Let's check by collecting all of the iterated values.

```python
import numpy as np

data_collection = []
for batch_data in loader:
    # Convert each element of the batch tensor to a plain Python int.
    data_collection += [int(v) for v in batch_data]

np.unique(data_collection)
```

```
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
```