# Kodo-python Getting Started

Welcome to the getting started IPython notebook for kodo-python.

This guide is intended for newcomers to the kodo library. It walks you, in small steps, through the creation and use of both encoders and decoders.
Although this guide focuses on the Python language bindings of kodo, similar APIs exist for other languages, including C, C++ and Java.

## Importing kodo

Before working with Kodo-python, you obviously need to have it installed and available. To ensure that's the case, try importing it:

In [45]:
# try importing the kodo module
try:
    import kodo
    print("Kodo imported successfully")
except ImportError:
    print("Unable to import kodo!")

Kodo imported successfully


If the import worked, you are ready to move on to the next step. Otherwise please (re)visit the README.rst for installation instructions.

## Creating an Encoder

In kodo, both encoders and decoders are created using factories. Doing so allows efficient memory management and reuse of various components and computations. 

Therefore, before creating an encoder, let's look at the encoder factories provided by the ``kodo`` module:

In [46]:
# print all members containing "Factory" and "Encoder"
print("\n".join([item for item in dir(kodo) if all([keyword in item for keyword in ["Factory", "Encoder"]])]))

FullVectorEncoderFactoryBinary
FullVectorEncoderFactoryBinary16
FullVectorEncoderFactoryBinary4
FullVectorEncoderFactoryBinary8
NoCodeEncoderFactory
OnTheFlyEncoderFactoryBinary
OnTheFlyEncoderFactoryBinary16
OnTheFlyEncoderFactoryBinary4
OnTheFlyEncoderFactoryBinary8
PerpetualEncoderFactoryBinary
PerpetualEncoderFactoryBinary16
PerpetualEncoderFactoryBinary4
PerpetualEncoderFactoryBinary8
SlidingWindowEncoderFactoryBinary
SlidingWindowEncoderFactoryBinary16
SlidingWindowEncoderFactoryBinary4
SlidingWindowEncoderFactoryBinary8
SparseFullVectorEncoderFactoryBinary
SparseFullVectorEncoderFactoryBinary16
SparseFullVectorEncoderFactoryBinary4
SparseFullVectorEncoderFactoryBinary8


As the output shows, many different encoder factories exist. Most of these have decoder factory counterparts.
The attentive reader may have spotted a pattern in the factory names: with some exceptions, each name combines the encoding algorithm and the underlying finite field.
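
The naming pattern can be made concrete with a small, kodo-independent sketch. The regular expression and the helper `split_factory_name` are hypothetical and only reflect the names in the listing above; they are not part of kodo.

```python
import re

# Pattern assumed from the names listed above: <Algorithm>EncoderFactory<Field>,
# where the field suffix is optional (NoCodeEncoderFactory has none).
FACTORY_NAME = re.compile(
    r"^(?P<algorithm>\w+?)EncoderFactory(?P<field>Binary\d*)?$"
)


def split_factory_name(name):
    """Return (algorithm, field) for an encoder factory name, or None if it does not match."""
    match = FACTORY_NAME.match(name)
    if match is None:
        return None
    return match.group("algorithm"), match.group("field")


names = [
    "FullVectorEncoderFactoryBinary",
    "OnTheFlyEncoderFactoryBinary8",
    "SparseFullVectorEncoderFactoryBinary16",
    "NoCodeEncoderFactory",
]
for name in names:
    print(name, "->", split_factory_name(name))
```

For `NoCodeEncoderFactory` the field part is `None`, which is one of the exceptions mentioned above.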

For this walkthrough we pick the full vector factory using the binary field, i.e. the *``FullVector``*``EncoderFactory``*``Binary``* factory.

Note: *For this guide, the choice of encoder factory should be interchangeable. For this reason I'll define the factory class as ``EncoderFactory``.*

In [47]:
# Store the full vector binary encoder factory as EncoderFactory
EncoderFactory = kodo.FullVectorEncoderFactoryBinary

Using Python's ``help`` function, it's easy to inspect which arguments the ``EncoderFactory`` constructor needs:

In [48]:
# Get information about the encoder factory's __init__ function
help(EncoderFactory.__init__)

Help on method __init__:

__init__(...) unbound kodo.FullVectorEncoderFactoryBinary method
    Factory constructor.
    
            :param max_symbols: The maximum symbols the coders can expect.
            :param max_symbol_size: The maximum size of a symbol in bytes.



So, to create a factory, we need to pick the ``max_symbols`` and ``max_symbol_size``.
These parameters determine upper bounds for the encoders created by the factory.

The proper values depend on the use case. We'll pick 4 and 32 for ``max_symbols`` and ``max_symbol_size``, respectively.
These numbers are very low, but they should serve us well for this educational example.

Let's create an encoder_factory object:

In [49]:
max_symbols = 4
max_symbol_size = 32

encoder_factory = EncoderFactory(
    max_symbols=max_symbols,
    max_symbol_size=max_symbol_size)

We can now use the object's ``build`` method to create encoders, but other methods are also available:

In [50]:
# Print all public members
print("\n".join([item for item in dir(encoder_factory) if not item.startswith("__")]))

build
max_block_size
max_payload_size
max_symbol_size
max_symbols
set_symbol_size
set_symbols
symbol_size
symbols


These can be used to either get information about the created factory or set values used for the encoders to be created using the ``build`` method.

Let's print out the maximum block size, i.e. the maximum amount of data that can be encoded during each generation.

In [51]:
max_block_size = encoder_factory.max_block_size()
print("Max block size: {}".format(max_block_size))

Max block size: 128


Note that the maximum block size is simply the product of the previously set ``max_symbols`` and ``max_symbol_size``:

In [52]:
calculated_max_block_size = max_symbols * max_symbol_size
print("Calculated max block size: {}".format(calculated_max_block_size))

Calculated max block size: 128


Enough talk - let's create an encoder!

In [53]:
encoder = encoder_factory.build()

Fantastic, we've built our first encoder! Let's see what we can use it for:

In [54]:
# Print all public members
print("\n".join([item for item in dir(encoder) if not item.startswith("__")]))

block_size
in_systematic_phase
is_systematic_on
payload_size
rank
set_symbol
set_symbols
set_systematic_off
set_systematic_on
symbol_size
symbols
trace
write_payload


Let's inspect the state of our newly created encoder.

In [55]:
def print_encoder_state(encoder):
    print(
        "block_size: {}\n"
        "is_systematic_on: {}\n"
        "in_systematic_phase: {}\n"
        "payload_size: {}\n"
        "rank: {}\n"
        "symbol_size: {}\n"
        "symbols: {}".format(
            encoder.block_size(),
            encoder.is_systematic_on(),
            encoder.in_systematic_phase(),
            encoder.payload_size(),
            encoder.rank(),
            encoder.symbol_size(),
            encoder.symbols())
    )
print_encoder_state(encoder)

block_size: 128
is_systematic_on: True
in_systematic_phase: False
payload_size: 38
rank: 0
symbol_size: 32
symbols: 4


## Using the Encoder

We use the ``write_payload`` method to produce encoded packets, but since we have yet to tell the encoder what data to encode, we can't use it yet.
This can be seen from the encoder's rank, which is 0.

Let's create some data to encode:

In [56]:
data_in = (
    "The size of this data is exactly 128 bytes "
    "which means it will fit perfectly in a single generation. "
    "That is very lucky, indeed!"
)
print("Length of data string: {}".format(len(data_in)))

Length of data string: 128


Kodo uses python strings as data objects, which means each character represents a byte. Let's set the data to encode on the encoder.

In [57]:
encoder.set_symbols(data_in)
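
Conceptually, the 128-byte block is treated as 4 consecutive symbols of 32 bytes each. The sketch below reproduces that split in plain Python without using kodo. Encoding the string to ASCII bytes is an assumption made here so that one element is exactly one byte, and `chunks` is a name introduced only for illustration.

```python
# Same values as used in this guide.
symbols = 4
symbol_size = 32

data_in = (
    "The size of this data is exactly 128 bytes "
    "which means it will fit perfectly in a single generation. "
    "That is very lucky, indeed!"
)

# Encode to bytes so that one element is exactly one byte
# (the notebook itself passes a plain string).
block = data_in.encode("ascii")
assert len(block) == symbols * symbol_size

# Cut the block into consecutive, equally sized symbols.
chunks = [block[i * symbol_size:(i + 1) * symbol_size] for i in range(symbols)]
for index, chunk in enumerate(chunks):
    print(index, chunk)
```

Each chunk corresponds to one symbol of the generation.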

We should now be able to see how the state of the encoder has changed.

In [58]:
print_encoder_state(encoder)

block_size: 128
is_systematic_on: True
in_systematic_phase: True
payload_size: 38
rank: 4
symbol_size: 32
symbols: 4


Notice how the rank is now equal to the number of symbols:

In [59]:
encoder.rank() == max_symbols

True

We can only encode if the rank is > 0.

Let's encode some packets:

In [60]:
packet1 = encoder.write_payload()
packet2 = encoder.write_payload()
packet3 = encoder.write_payload()
packet4 = encoder.write_payload()

print(
    "packet1: {}\n"
    "packet2: {}\n"
    "packet3: {}\n"
    "packet4: {}\n".format(
        packet1,
        packet2,
        packet3,
        packet4,
    )
)

packet1: �    The size of this data is exactly
packet2: �    128 bytes which means it will f
packet3: �   it perfectly in a single generat
packet4: �   ion. That is very lucky, indeed!



Notice how all the packets are prefixed with ``�``. This is Python trying to print the packet header, which contains the symbol id, as a character.
The contents of the packets are readable because the encoder is in the systematic phase. Systematic means that the encoder starts by sending each symbol uncoded once.
Because we've set the generation size to 4 symbols and have created 4 packets, the encoder is no longer in the systematic phase:

In [61]:
encoder.in_systematic_phase()

False

This means that any subsequent packet we generate will be encoded.

In [81]:
packet5 = encoder.write_payload()
print("packet5: {}".format(packet5))

packet5:  +696s_kp1</$<woiy)h`l806t,alj}(>


Clearly, depending on how the data has been encoded, the payload will now most likely be unreadable.
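
To illustrate the difference between the systematic and the coded phase, here is a toy encoder over the binary field written in plain Python. It is only a sketch of the idea and not kodo's implementation: the names `xor_bytes` and `toy_systematic_encoder` are hypothetical, and drawing every coefficient uniformly at random is an assumption.

```python
import random


def xor_bytes(a, b):
    """XOR two equally long byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_systematic_encoder(symbols, rng):
    """Yield (coefficients, payload) pairs: first every symbol uncoded, then random XOR mixes."""
    count = len(symbols)
    # Systematic phase: packet i carries symbol i unchanged.
    for index in range(count):
        coefficients = [1 if i == index else 0 for i in range(count)]
        yield coefficients, symbols[index]
    # Coded phase: each packet is the XOR of a random subset of the symbols.
    while True:
        coefficients = [rng.randint(0, 1) for _ in range(count)]
        payload = bytes(len(symbols[0]))  # all-zero starting value
        for bit, symbol in zip(coefficients, symbols):
            if bit:
                payload = xor_bytes(payload, symbol)
        yield coefficients, payload


toy_symbols = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
toy_encoder = toy_systematic_encoder(toy_symbols, random.Random(1))
for _ in range(6):
    print(next(toy_encoder))
```

The first four payloads are the symbols themselves. From the fifth packet on, the payload is a mix of several symbols, which is why it no longer reads as text.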

## Creating a Decoder

Let's create a decoder factory and a decoder so that we can decode our newly generated packets:

In [83]:
decoder_factory = kodo.FullVectorDecoderFactoryBinary(max_symbols, max_symbol_size)
decoder = decoder_factory.build()

Let's investigate the methods that are available for the decoder:

In [27]:
# Print all public members
print("\n".join([item for item in dir(decoder) if not item.startswith("__")]))

block_size
copy_symbols
is_complete
payload_size
rank
read_payload
symbol_size
symbols
symbols_uncoded
trace
write_payload


As seen from the output, the encoder and decoder share a few methods. Most of these have the same meaning.
Let's inspect the state of our newly created decoder.

In [28]:
def print_decoder_state(decoder):
    print(
        "block_size: {}\n"
        "is_complete: {}\n"
        "payload_size: {}\n"
        "rank: {}\n"
        "symbol_size: {}\n"
        "symbols: {}\n"
        "symbols_uncoded: {}\n".format(
            decoder.block_size(),
            decoder.is_complete(),
            decoder.payload_size(),
            decoder.rank(),
            decoder.symbol_size(),
            decoder.symbols(),
            decoder.symbols_uncoded())
    )
print_decoder_state(decoder)

block_size: 128
is_complete: False
payload_size: 38
rank: 0
symbol_size: 32
symbols: 4
symbols_uncoded: 0



Probably the most interesting value here is the rank. The rank describes the number of innovative packets received.
Hence if we read one of our previously generated packets, we should see the rank increase:

In [29]:
decoder.read_payload(packet1)
decoder.rank()

1

And it does. Also, since the read packet was uncoded we will see that the number of uncoded symbols
in the decoder has increased from 0 to 1.

In [30]:
decoder.symbols_uncoded()

1

In [31]:
decoder.read_payload(packet5)
print(
    "rank: {}\n"
    "symbols_uncoded: {}".format(
        decoder.rank(),
        decoder.symbols_uncoded()
    )
)

rank: 1
symbols_uncoded: 1


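If ``packet5`` had been innovative, the rank would now be 2. In the run shown here the rank is still 1, which means that this particular coded packet added no new information to the decoder.
The number of uncoded symbols is 1 since the first symbol we read was uncoded.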
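
What "innovative" means can be sketched with linear algebra over the binary field: a packet is innovative when its coefficient vector is linearly independent of the vectors already received. The helper `add_to_basis` below is hypothetical and not part of kodo. It stores coefficient vectors as integer bitmasks, one bit per symbol.

```python
def add_to_basis(basis, vector):
    """Try to add a GF(2) coefficient vector (an int bitmask) to the basis.

    Returns True if the vector was innovative, i.e. linearly independent
    of the vectors already in the basis. The basis is kept sorted in
    descending order, and all its vectors have distinct leading bits.
    """
    for pivot in basis:
        # Clear the pivot's leading bit from the vector if it is set.
        vector = min(vector, vector ^ pivot)
    if vector == 0:
        return False
    basis.append(vector)
    basis.sort(reverse=True)
    return True


basis = []
received = [0b1000, 0b1000, 0b0110, 0b1110, 0b0001]
for coefficients in received:
    innovative = add_to_basis(basis, coefficients)
    print(f"{coefficients:04b} innovative={innovative} rank={len(basis)}")
```

The repeated vector `1000` and the vector `1110` (which is `1000` XOR `0110`) leave the rank unchanged, so the final rank is 3.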

If the 5th packet were a combination of the first symbol and one other symbol, we would actually have 2 uncoded symbols.
But for performance reasons the ``symbols_uncoded`` method would still return 1, at least for the full vector algorithm. This can be seen from the documentation:

In [32]:
help(decoder.symbols_uncoded)

Help on method symbols_uncoded:

symbols_uncoded(...) method of kodo.FullVectorDecoderBinary instance
    Returns the number of uncoded symbols currently known.
    
    Depending on the algorithm used the true number of uncoded
    symbols may be higher.
    The reason for this uncertainty is the some algorithms, for
    performance reasons, choose to not keep track of the exact
    status of the decoding matrix.
    It is however guaranteed that at least this amount of uncoded
    symbols exist.
            :returns: The number of symbols which have been uncoded.



If we extract the current data in the decoder we get the following output:

In [33]:
decoder.copy_symbols().replace('\x00', '_')

'The size of this data is exactly________________________________________________________________________________________________'

Notice that the first part of the string is readable. Depending on the encoding of the 5th packet, other parts of the string may or may not be readable.
If we feed the same packet(s) to the decoder multiple times, we will not increase its rank, no matter how many times we do so.

In [34]:
decoder.read_payload(packet1)
print(decoder.rank())
decoder.read_payload(packet5)
print(decoder.rank())
decoder.read_payload(packet1)
print(decoder.rank())
decoder.read_payload(packet5)
print(decoder.rank())

1
1
1
1


This is because the data we feed the decoder isn't innovative.
If we start feeding the decoder new packets, it will at some point become complete:

In [35]:
while not decoder.is_complete():
    decoder.read_payload(encoder.write_payload())
    print(decoder.rank())

2
3
3
3
3
4
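
The repeated 3s in the output are expected with the binary field: the closer the decoder is to full rank, the more likely a random coded packet is a combination of what it already knows. The sketch below computes these probabilities under the assumption that every coefficient vector is drawn uniformly at random, which may not match kodo's generator exactly. The function names are hypothetical.

```python
from fractions import Fraction


def innovation_probability(rank, symbols):
    """Probability that a uniformly random GF(2) coefficient vector is innovative."""
    # 2**rank of the 2**symbols possible vectors lie in the span already known.
    return 1 - Fraction(2 ** rank, 2 ** symbols)


def expected_packets(rank, symbols):
    """Expected number of random coded packets needed to go from `rank` to full rank."""
    return sum(1 / innovation_probability(r, symbols) for r in range(rank, symbols))


generation_size = 4
for rank in range(generation_size):
    print(rank, innovation_probability(rank, generation_size))
print("expected packets from rank 1:", float(expected_packets(1, generation_size)))
```

With 4 symbols and a rank of 3, only every second random packet is innovative on average under this assumption. Larger fields such as ``Binary8`` reduce this overhead.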


When the decoder is complete we can extract the whole string:

In [36]:
decoder.copy_symbols()

'The size of this data is exactly 128 bytes which means it will fit perfectly in a single generation. That is very lucky, indeed!'
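
For the curious, the decoding step itself can be sketched as Gaussian elimination over the binary field. The toy decoder below is not kodo's implementation: `toy_decode` and `xor_bytes` are hypothetical helpers, and each packet is assumed to arrive as a pair of coefficient bitmask (bit `i` set means symbol `i` is part of the mix) and payload.

```python
def xor_bytes(a, b):
    """XOR two equally long byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_decode(packets, symbols):
    """Recover the original symbols from (coefficient bitmask, payload) pairs over GF(2).

    Returns the list of symbols, or None if the packets do not have full rank.
    """
    rows = {}  # pivot bit position -> (coefficient bitmask, payload)
    for coefficients, payload in packets:
        # Forward elimination: reduce the incoming packet with the stored rows.
        for bit in reversed(range(symbols)):
            if ((coefficients >> bit) & 1) and bit in rows:
                row_coefficients, row_payload = rows[bit]
                coefficients ^= row_coefficients
                payload = xor_bytes(payload, row_payload)
        if coefficients:
            # The packet was innovative: store it under its leading bit.
            rows[coefficients.bit_length() - 1] = (coefficients, payload)
    if len(rows) < symbols:
        return None
    # Back substitution: solve from the lowest pivot upwards.
    solved = []
    for bit in range(symbols):
        coefficients, payload = rows[bit]
        for lower in range(bit):
            if (coefficients >> lower) & 1:
                payload = xor_bytes(payload, solved[lower])
        solved.append(payload)
    return solved


a, b, c = b"kodo", b"is a", b"lib!"
coded_packets = [
    (0b011, xor_bytes(a, b)),
    (0b110, xor_bytes(b, c)),
    (0b111, xor_bytes(xor_bytes(a, b), c)),
]
print(toy_decode(coded_packets, symbols=3))
```

None of the three packets carries an uncoded symbol, yet together they have full rank, so all three symbols can be recovered.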

And with this, I will end the getting started guide. I hope you learned something and are eager to dive into the world of error-correcting codes, in particular network coding.

For more information and inspiration please look through some of the many examples available.