# Talos

A library for building ML pipelines in robotics.

> *To install the package together with all of its dependencies, run this at the root of the repo:*

```bash
pip install -e .
```
> *If you use a Python environment, make sure it is activated first.*

The main module can be imported as shown:

In [1]:
import talos

There are four main submodules:
- `models` -> Modules and architectures.
- `datapipe` -> Data pipelining: creating, saving and loading datasets.
- `training` -> Model training.
- `utils` -> General-purpose helpers, GPU utilities etc.

## Utils

Everything in `utils` is also directly accessible from the main `talos` module, for ease of use.

In [2]:
talos.utils.gpu_exists() # or talos.gpu_exists()

GPU(s) exist!

True

In [3]:
talos.utils.gpu_info() # or talos.gpu_info()

Found 1 cuda devices...
0	NVIDIA GeForce RTX 3060 Laptop GPU	5937.94MB	5.80GB

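How `gpu_exists()` and `gpu_info()` are implemented is not shown here. As a rough sketch, similar information can be gathered with PyTorch's CUDA API. `list_cuda_devices` is a hypothetical helper, not part of `talos`, and it returns an empty list when PyTorch or CUDA is unavailable.

```python
def list_cuda_devices():
    """Return (index, name, memory_MB, memory_GB) for each visible CUDA device."""
    try:
        import torch
    except ImportError:
        return []  # PyTorch is not installed, so no devices can be reported

    if not torch.cuda.is_available():
        return []

    devices = []
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        total_bytes = props.total_memory
        devices.append((
            index,
            props.name,
            total_bytes / 1024**2,  # mebibytes
            total_bytes / 1024**3,  # gibibytes
        ))
    return devices


for index, name, mem_mb, mem_gb in list_cuda_devices():
    print(f"{index}\t{name}\t{mem_mb:.2f}MB\t{mem_gb:.2f}GB")
```
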

## Datapipe

In [4]:
import talos.datapipe as pipe

The main class of interest here is `Dataset`. It handles the full dataset lifecycle: creating, saving and loading.

Create a `Dataset` object first:

In [5]:
# Names can be path-like, which helps organise datasets into directories.
data = pipe.Dataset(
    name = 'testing/d1',

    # Metadata for the dataset can be added right here as keyword arguments:
    description = 'An example dataset',
    forty_two = 'The meaning of life, the universe and everything.'
)

Then create the dataset contents from its root directory:

In [6]:
data.create(
    # path to the root directory of the dataset
    path = '/home/stealthypanda/collegestuff/robot_data/datasets/dataset4',
    
    # Image size to resize all images to
    image_size = (256, 256),
)

Config file:{
	rate : 32.0,
	type : color/image_raw,
	topic : /camera/color/image_raw,
	node : /camera,
	name : dataset4,
	path : /home/stealthypanda/collegestuff/robot_data/dataset4,
	linear_vel : 0.1,
	angular_vel : 5
}
Reading image 1076/1076...
Created dataset testing/d1!


<talos.datapipe.Dataset at 0x7f097bf57e90>

Saving datasets (files are written with the `safetensors` library and use the `.rdata` extension):

In [7]:
# By default, datasets are saved under the .talos/datasets directory, using the dataset name as the file name.
data.save()

# Alternatively, pass a path or file name to save to a specific location.
data.save('example/dataset')

Saved dataset testing/d1 @ .talos/datasets/testing/d1.rdata
Saved dataset testing/d1 @ example/dataset.rdata
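The two log lines above suggest a simple naming rule: without an argument, the file goes to `.talos/datasets/<name>.rdata`; with an argument, `.rdata` is appended to the given path. The sketch below mirrors that rule with a hypothetical `resolve_save_path` helper. It is inferred from the output only and is not the library's actual code.

```python
from pathlib import PurePosixPath

DEFAULT_ROOT = PurePosixPath(".talos/datasets")  # taken from the log output above
EXTENSION = ".rdata"


def resolve_save_path(name, path=None):
    """Return the file a dataset would be written to (hypothetical helper)."""
    base = PurePosixPath(path) if path is not None else DEFAULT_ROOT / name
    # Append the extension instead of replacing a suffix, so names with dots survive
    return base.parent / (base.name + EXTENSION)


print(resolve_save_path("testing/d1"))
print(resolve_save_path("testing/d1", "example/dataset"))
```
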


Loading an existing `.rdata` dataset:

In [8]:
loaded = pipe.Dataset().load(name='testing/d1')

Loaded dataset testing/d1: {
	type : color/image_raw,
	description : An example dataset,
	angular_vel : 5,
	rate : 32.0,
	samples : 1076,
	forty_two : The meaning of life, the universe and everything.,
	size : (256, 256),
	name : testing/d1,
	linear_vel : 0.1
}


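`safetensors` stores file metadata as a flat string-to-string mapping, so values such as `32.0` or `(256, 256)` have to be converted to text before saving. How `talos` handles this internally is not shown. The sketch below illustrates one possible round trip using the hypothetical helpers `encode_metadata` and `decode_metadata`.

```python
import ast


def encode_metadata(metadata):
    """Convert every value to a string, as required for safetensors metadata."""
    return {str(key): str(value) for key, value in metadata.items()}


def decode_metadata(metadata):
    """Best-effort parse of stringified values back into Python literals."""
    decoded = {}
    for key, value in metadata.items():
        try:
            decoded[key] = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            decoded[key] = value  # plain text such as a description stays a string
    return decoded


original = {"rate": 32.0, "size": (256, 256), "description": "An example dataset"}
stored = encode_metadata(original)
restored = decode_metadata(stored)
print(stored["size"], type(stored["size"]).__name__)
print(restored == original)
```
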
### Getting batches

To start getting batches for training, first call `._split()` on the dataset:

In [9]:
data._split()
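The split logic itself is not shown in this README. As an illustration only, a chronological split of sample indices into `'train'`, `'valid'` and `'test'` could look like the sketch below. The 80/10/10 fractions and the `split_indices` helper are assumptions, not `talos` defaults.

```python
def split_indices(n_samples, train_frac=0.8, valid_frac=0.1):
    """Split range(n_samples) into contiguous train/valid/test index ranges."""
    n_train = int(n_samples * train_frac)
    n_valid = int(n_samples * valid_frac)
    return {
        "train": range(0, n_train),
        "valid": range(n_train, n_train + n_valid),
        "test": range(n_train + n_valid, n_samples),
    }


splits = split_indices(1076)
print({name: len(indices) for name, indices in splits.items()})
```

Contiguous ranges keep consecutive frames together, which matters when batches are built from windows of time steps.
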

Then, you can keep calling `.get_batch()` to get new batches of (X, y):

In [10]:
batch_x, batch_y = data.get_batch(
    batch_size = 16,
    time_steps = 32,
    include_only_last_y = True, # If True, only the y of the last time step is kept for each sample in the batch.
    split = 'train' # One of 'train', 'test', 'valid'
)
batch_x.shape, batch_y.shape

(torch.Size([16, 3, 32, 256, 256]), torch.Size([16, 7]))
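The returned shapes suggest that each batch element is a window of `time_steps` consecutive frames, with axes ordered as (batch, channels, time, height, width), and that `include_only_last_y = True` keeps a single 7-value target per window. Below is a minimal sketch of that kind of window sampling over plain indices. Random window starts and the `sample_windows` helper are assumptions; the actual sampling strategy in `talos` is not shown.

```python
import random


def sample_windows(n_samples, batch_size, time_steps, seed=None):
    """Pick `batch_size` random windows of `time_steps` consecutive indices."""
    if time_steps > n_samples:
        raise ValueError("time_steps cannot exceed the number of samples")
    rng = random.Random(seed)
    windows = []
    for _ in range(batch_size):
        start = rng.randint(0, n_samples - time_steps)  # both bounds inclusive
        windows.append(list(range(start, start + time_steps)))
    return windows


windows = sample_windows(n_samples=1076, batch_size=16, time_steps=32, seed=0)
x_indices = windows                                   # one list of frame indices per window
last_y_indices = [window[-1] for window in windows]   # one target index per window
print(len(x_indices), len(x_indices[0]), len(last_y_indices))
```
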

## Models

This module contains everything model-related and simplifies tasks such as saving and loading. It also defines `TalosModule`, which derives from `torch.nn.Module` and is the base class for all modules defined in the architectures.

In [11]:
import torch

import talos.models as tm
from talos.utils import Tensor

A simple example module:

In [12]:
class SimpleLinear(tm.TalosModule):
    
    def __init__(self, inputs : int, outputs : int, name: str = None):
        super().__init__(name)

        self.layer = torch.nn.Linear(inputs, outputs)
        self.relu = torch.nn.ReLU()
    
    def forward(self, x: Tensor) -> Tensor:
        x = self.layer(x)
        x = self.relu(x)
        
        return x

`TalosModule` provides helpers such as `.nparams()`, `.disk_size()`, `.save()` and `.load()` that make working with models easier:

In [13]:
model = SimpleLinear(28 * 28, 10)
print(
f'The model has {model.nparams() / 1e3 : .3f}K parameters, \
and uses {model.disk_size() / 1e3 : .3f}KB on disk.'
)

The model has  7.850K parameters, and uses  31.400KB on disk.
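These numbers can be checked by hand: a `torch.nn.Linear(784, 10)` layer has 784 × 10 weights plus 10 biases, and the reported size is consistent with 4 bytes per parameter (32-bit floats, PyTorch's default dtype). A quick check, independent of `talos`:

```python
inputs, outputs = 28 * 28, 10

n_params = inputs * outputs + outputs  # weights + biases
bytes_per_param = 4                    # float32
size_bytes = n_params * bytes_per_param

print(f"{n_params / 1e3:.3f}K parameters, {size_bytes / 1e3:.3f}KB")
```
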


Saving a model: (uses `safetensors`)

In [14]:
model.save('test/m1')

Saved model @ `test/m1.model`


Loading a model:

In [15]:
newmodel = SimpleLinear(28 * 28, 10).load('test/m1')

Common architectures are also predefined:

In [16]:
import torch.nn.functional as F

ffn = tm.FFN(
    layers=[16, 16, 10],
    activation=[F.relu, F.relu, lambda x:x]
)

## Training

This module contains utilities for training models.

In [21]:
example_dataset = pipe.Dataset().create(
    '/home/stealthypanda/collegestuff/robot_data/datasets/dataset1'
)

Config file:{
	rate : 32.0,
	type : color/image_raw,
	topic : /camera/color/image_raw,
	node : /camera,
	name : dataset1,
	path : /home/stealthypanda/collegestuff/robot_data/dataset1,
	linear_vel : 0.1,
	angular_vel : 5
}
Reading image 1152/1152...
Created dataset dataset_2!


In [22]:
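# Keep every second velocity label and update the sample count to match
example_dataset.y_vel = example_dataset.y_vel[::2]
example_dataset.samples = len(example_dataset.y_vel)
# Trim the inputs so that x and y have the same number of samples
example_dataset.x_data = example_dataset.x_data[:example_dataset.samples]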

In [25]:
example_dataset._split()

In [36]:
from talos.utils import Tensor


class ExampleModel(tm.TalosModule):
    
    def __init__(self, name: str = None, *args, **kwargs) -> None:
        super().__init__(name, *args, **kwargs)
        
        self.ublock = torch.nn.ModuleList([
            tm.UNetBlock(c_in =  3, c_out = 16),
            tm.UNetBlock(c_in = 16, c_out = 32),
            tm.UNetBlock(c_in = 32, c_out = 64),
            tm.UNetBlock(c_in = 64, c_out = 64),
        ])
        
        self.flatten = torch.nn.Flatten()
        
        self.ffn = tm.FFN([128, 32, 7], activation=[F.relu, F.relu, lambda x:x])
    
    def forward(self, x: Tensor) -> Tensor:
        for block in self.ublock:
            x = block(x)
        x = self.flatten(x)
        x = self.ffn(x)
        return x

example_model = ExampleModel()

y = example_model(example_dataset.get_batch(1)[0])
print(
    f'Model params: {example_model.nparams()/1e6:.3f}M \tModel size: {example_model.disk_size()/1e6:.3f}MB'
)


Model params: 1.234M 	Model size: 4.937MB


Actual training starts here:

In [37]:
timeline = talos.train(
    example_model, example_dataset,
    epochs = 5, steps = 50
)

Epoch 1/5:
	Step 50/50 : 0.00456
Epoch 2/5:
	Step 50/50 : 0.00017
Epoch 3/5:
	Step 50/50 : 0.00179
Epoch 4/5:
	Step 50/50 : 0.00032
Epoch 5/5:
	Step 50/50 : 0.00470
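The internals of `talos.train` are not shown here. The log above has the shape of a conventional nested loop: `epochs` outer iterations, each running `steps` optimisation steps and reporting a loss. The sketch below reproduces that structure on a toy one-parameter regression in plain Python. The model, the learning rate and the `train_toy` helper are illustrative assumptions, and the returned list only mimics what a `timeline` could contain.

```python
import random


def train_toy(epochs=5, steps=50, lr=0.1, seed=0):
    """Fit y = w * x to data generated with w = 2, using per-sample SGD."""
    rng = random.Random(seed)
    w = 0.0
    timeline = []  # one entry per epoch: the loss of that epoch's last step
    for epoch in range(1, epochs + 1):
        loss = 0.0
        for _ in range(steps):
            x = rng.uniform(-1.0, 1.0)
            y = 2.0 * x
            error = w * x - y
            loss = error ** 2
            w -= lr * 2.0 * error * x  # gradient of the squared error w.r.t. w
        print(f"Epoch {epoch}/{epochs}:\n\tStep {steps}/{steps} : {loss:.5f}")
        timeline.append(loss)
    return w, timeline


w, timeline = train_toy()
```
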


That's it for now. More features will be added.