# fasttransform

> Transform is the main building block of data pipelines in fastai. And elsewhere if you like.

## Installation

Install the latest version from the GitHub [repository](https://github.com/AnswerDotAI/fasttransform):

```sh
$ pip install git+https://github.com/AnswerDotAI/fasttransform.git
```

or install the released version from [PyPI](https://pypi.org/project/fasttransform/):

```sh
$ pip install fasttransform
```

## Quick start

### Transform

Transform is a class that lets you create reusable data transformations with optional setup and decode methods. It behaves like a function but provides additional functionality for data preprocessing pipelines.

The simplest way to create a Transform is by decorating a function:

```python
from fasttransform import Transform, Pipeline
```

```python
@Transform
def add_one(x: int):
    return x + 1

# Usage
add_one(2)
```

```
3
```

Transforms are **flexible**. You can pass several functions with different type annotations, and the Transform automatically dispatches to the one that matches the input type.

```python
def inc1(x:int): return x+1
def inc2(x:str): return x+"a"

t = Transform(enc=(inc1,inc2))

t(5), t('b')
```

```
(6, 'ba')
```

If an input type does not match any of the type annotations then the original input is returned.

```python
add_one(2.0)
```

```
2.0
```
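
For intuition only, the dispatch-with-passthrough behaviour above can be imitated with the standard library's `functools.singledispatch`. This is an analogy, not how fasttransform is implemented, and the name `inc` is made up for the sketch.

```python
from functools import singledispatch

# Stdlib analogy only: fasttransform has its own dispatch mechanism.
@singledispatch
def inc(x):
    # Fallback for types without a registered implementation:
    # return the input unchanged, like an unmatched Transform call.
    return x

@inc.register
def _(x: int):
    return x + 1

@inc.register
def _(x: str):
    return x + "a"

results = (inc(5), inc("b"), inc(2.0))
print(results)  # (6, 'ba', 2.0)
```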

Transforms are **reversible** if you provide a decode function.

```python
def enc(x): return x*2
def dec(x): return x//2

t = Transform(enc,dec)

t(2), t.decode(2), t.decode(t(2))
```

```
(4, 1, 2)
```
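
Note that `decode` only undoes `encode` if the function you supply is a true inverse. A quick plain-Python check of the `enc`/`dec` pair above (no fasttransform needed) suggests the round trip holds in one direction only, because `//` discards the remainder:

```python
def enc(x): return x * 2
def dec(x): return x // 2

# Decoding an encoded integer always recovers the original value.
round_trip_ok = all(dec(enc(x)) == x for x in range(-5, 6))

# Encoding a decoded value does not: odd inputs lose their remainder.
not_recovered = [y for y in range(6) if enc(dec(y)) != y]

print(round_trip_ok)   # True
print(not_recovered)   # [1, 3, 5]
```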

Transforms can be **set up** before use. This is useful when, for example, you want to compute scaling parameters from the training split of your machine learning pipeline.

```python
class NormalizeMean(Transform):
    def setups(self, items):
        self.mean = sum(items) / len(items)

    def encodes(self, x):
        return x - self.mean

    def decodes(self, x):
        return x + self.mean

normalize = NormalizeMean()
normalize.setup([1, 2, 3, 4, 5])
normalize.mean
```

```
3.0
```

```python
normalize(3.0)
```

```
0.0
```
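
To make the training-split use case concrete, here is a plain-Python sketch of the same setup/encode/decode pattern. `MeanCenterer` and the `train`/`valid` lists are hypothetical stand-ins rather than part of fasttransform; the point is that the statistic is computed once on the training data and then reused unchanged on other data.

```python
from statistics import fmean

class MeanCenterer:
    """Hypothetical stand-in that mirrors the setups/encodes/decodes split."""

    def setup(self, items):
        # Learn the statistic from the training data only.
        self.mean = fmean(items)

    def encode(self, x):
        return x - self.mean

    def decode(self, x):
        return x + self.mean

train, valid = [1, 2, 3, 4, 5], [6, 7]

centerer = MeanCenterer()
centerer.setup(train)  # mean of the training split is 3.0

encoded_valid = [centerer.encode(x) for x in valid]
restored_valid = [centerer.decode(x) for x in encoded_valid]

print(encoded_valid)   # [3.0, 4.0]
print(restored_valid)  # [6.0, 7.0]
```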

Transforms are **extensible**: you can register additional type-specific `encodes` and `decodes` methods on an existing Transform class. This is useful when you want one Transform to handle several data types.

```python
@NormalizeMean
def encodes(self, x:float): return x + self.mean + 5

# The decoder must invert the encoder above
@NormalizeMean
def decodes(self, x:float): return x - self.mean - 5

normalize(2.0)
```

```
10.0
```

Transforms try to be **type preserving**. The output type is determined as follows:

1. if your function has a return type annotation, the result is converted to that type
2. otherwise, the result is converted to the type of the actual input
3. if the return type annotation is `None`, no conversion is done

```python
class FS(float):
    def __repr__(self): return f'FS({float(self)})'
```

Illustration of case 1:

```python
def enc(x)->FS: return x*2
t = Transform(enc)
t(1)
```

```
FS(2.0)
```

Illustration of case 2:

```python
def enc(x): return x*2
t = Transform(enc)
t(FS(1))
```

```
FS(2.0)
```

Illustration of case 3:

```python
def enc(x)->None: return x*2
t = Transform(enc)
t(FS(1))
```

```
2.0
```

In the last case we get a plain `float`: multiplying an `FS` by a number returns a `float`, and because the return annotation is `None`, no further type conversion is done.
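
The three rules can be summarised in a few lines of plain Python. `preserve_type` is a hypothetical helper, not the library's code. In particular, the guard that only casts back to the input type when the input is an instance of the result's type is an assumption, chosen so that the sketch reproduces the outputs shown above.

```python
_UNSET = object()  # sentinel meaning "no return annotation was given"

class FS(float):
    def __repr__(self): return f"FS({float(self)})"

def preserve_type(result, original, ret=_UNSET):
    """Hypothetical sketch of the three type-preservation rules."""
    if ret is None:
        # Rule 3: an explicit None annotation disables conversion.
        return result
    if ret is _UNSET:
        # Rule 2: fall back to the type of the input.
        # Assumed guard: skip the cast if the input is not an instance
        # of the result's type (e.g. int in, float out stays float).
        if not isinstance(original, type(result)):
            return result
        ret = type(original)
    # Rule 1, or the fallback type from rule 2: cast unless already that type.
    return result if isinstance(result, ret) else ret(result)

case1 = preserve_type(1 * 2, 1, FS)             # return annotation FS
case2 = preserve_type(FS(1) * 2, FS(1))         # no annotation, FS input
case3 = preserve_type(FS(1) * 2, FS(1), None)   # annotation None

print(case1, case2, case3)  # FS(2.0) FS(2.0) 2.0
```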

### Pipelines

Transforms can be combined into larger **Pipelines**:

```python
p = Pipeline((t, normalize))

p(5)  # 5 * 2 - 3
```

```
7.0
```

```python
p.decode(7)  # 7 + 3; t has no decode function, so the doubling is not undone
```

```
10.0
```
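
As a mental model, a pipeline applies each step's encoder in order and each step's decoder in reverse order, leaving the value untouched at steps that have no decoder. The sketch below mimics the example above with plain functions. `run_forward`, `run_backward`, and the `(encode, decode)` tuples are hypothetical and are not fasttransform's API.

```python
def run_forward(steps, x):
    # Apply every encoder, first step first.
    for encode, _ in steps:
        x = encode(x)
    return x

def run_backward(steps, x):
    # Apply decoders last step first; a missing decoder is a no-op.
    for _, decode in reversed(steps):
        if decode is not None:
            x = decode(x)
    return x

double = (lambda x: x * 2, None)                   # no decoder, like `t`
center = (lambda x: x - 3.0, lambda x: x + 3.0)    # like `normalize` with mean 3.0
steps = [double, center]

encoded = run_forward(steps, 5)    # 5 * 2 - 3.0
decoded = run_backward(steps, 7)   # 7 + 3.0, doubling not undone

print(encoded, decoded)  # 7.0 10.0
```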

## Documentation

This was just a quick start. Learn more by reading the [documentation](https://answerdotai.github.io/fasttransform).