# A quick intro to `adaptive_latents`

This document introduces the basic structure of `adaptive_latents` estimators, first by comparing AL transformers to sklearn estimators, and then by comparing AL pipelines with sklearn Pipelines.

By the end of the document, readers should understand:
* what an AL (adaptive latents) transformer is
* the difference between AL `partial_fit` functions and sklearn `fit` functions
* the purpose of an AL pipeline
* the semantics of `input_streams` and `output_streams` in AL transformers


In [None]:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from adaptive_latents import VanillaOnlineRegressor, CenteringTransformer, Pipeline
import adaptive_latents
import sklearn.pipeline
import time


rng = np.random.default_rng(0)

## Estimators

One of the central ideas of scikit-learn is that of an estimator object.
This is how the library organizes its algorithms: each algorithm has its own class that you instantiate.
You use a `PCA` estimator to do dimensionality reduction, and a `LinearRegression` estimator to fit a regression.
Transformers in `adaptive_latents` are similar.

Let's take a look at the correspondence for a basic regression problem.

First, we create a simple regression problem:

In [None]:
x = np.arange(100).reshape(-1, 1)
true_y = x * 2 
noise = rng.normal(size=x.shape, scale=0.01)
y = true_y + noise

A sklearn user might estimate the regression between `y` and `x` like this:

In [None]:
reg = LinearRegression(fit_intercept=True)

reg.fit(x, y)

new_x = np.array([[3]])
reg.predict(new_x)

The `adaptive_latents` code looks similar. In general, `adaptive_latents` functions are inspired by the sklearn API.
(This is also why they use `partial_fit` instead of `fit`, because that's what sklearn uses for online algorithms.)

In [None]:
reg = VanillaOnlineRegressor()

reg.partial_fit(x, stream=0)
reg.partial_fit(y, stream=1)

new_x = np.array([[3]])
reg.predict(new_x)


The main difference is that there are two calls to `partial_fit`.
Why make this choice? It lets us regress between $x$ and $y$ variables that aren't sampled at the same time.
This is what the `stream` arguments are for; they tell the regressor whether the data it was passed is an $x$ or a $y$ observation.
By default, the transformer assumes stream `0` should be treated as $x$ observations and stream `1` should be treated as $y$ observations, but it's configurable:

In [None]:
reg = VanillaOnlineRegressor(input_streams={5:'X', 700:'Y'})
# data in stream 5 should be treated as 'X' observations
# data in stream 700 should be treated as 'Y' observations

reg.partial_fit(x, stream=5)
reg.partial_fit(y, stream=700)

new_x = np.array([[3]])
reg.predict(new_x)

You can use any hashable object as a stream, but the convention is to use small non-negative integers.
Unless they are configured otherwise, most estimators will expect $x$ inputs from the `0` stream.

Breaking up the fit function also allows us to fit "out-of-order":

In [None]:
reg = VanillaOnlineRegressor()

reg.partial_fit(x, stream=0)
reg.partial_fit(x, stream=0)
time.sleep(.25)
reg.partial_fit(y, stream=1)

new_x = np.array([[3]])
reg.predict(new_x)
reg.transform(new_x, stream=0)


Note that I called `transform` as well as `predict` above.
While some `adaptive_latents` transformers have `predict` functions, the main API uses `transform` or `partial_fit_transform`.
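
To make the difference between an online `partial_fit` and a batch `fit` concrete, here is a minimal sketch of an online centering step. `ToyCenterer` is a hypothetical class written for this example, not the library's `CenteringTransformer`; it drops the `stream` argument for brevity, and it assumes that `partial_fit_transform` updates the estimate before transforming.

```python
class ToyCenterer:
    """Hypothetical online centering step (a sketch, not adaptive_latents code)."""

    def __init__(self):
        self.n = 0       # number of samples seen so far
        self.mean = 0.0  # running estimate of the mean

    def partial_fit(self, value):
        # Incremental mean update: no past samples need to be stored.
        self.n += 1
        self.mean += (value - self.mean) / self.n
        return self

    def transform(self, value):
        # Apply the current estimate without updating it.
        return value - self.mean

    def partial_fit_transform(self, value):
        # Assumption: update first, then transform with the refreshed estimate.
        return self.partial_fit(value).transform(value)


samples = [1.0, 2.0, 3.0, 4.0]
centerer = ToyCenterer()
outputs = [centerer.partial_fit_transform(v) for v in samples]
batch_mean = sum(samples) / len(samples)
```

After the last sample, `centerer.mean` agrees with `batch_mean`, even though the centerer only ever saw one sample at a time.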

The final point I'll introduce here is that estimators ignore inputs that aren't in a stream they care about.

In [None]:
reg = VanillaOnlineRegressor()

for data, stream in [
    (y, 1),
    (np.zeros([5,5,5,5]), 5),
    (None, 6),
    ('data', 'stream'),
    (lambda x: x, 100),
    (type(type(type)), 1211),
    (adaptive_latents, -3),
    (x, 0),
]:
    reg.partial_fit(data=data, stream=stream)


new_x = np.array([[3]])
reg.transform(new_x)
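
The stream behavior described in this section can be sketched in plain Python. `ToyStreamRegressor` below is a hypothetical one-dimensional stand-in, not how `VanillaOnlineRegressor` is implemented; in particular, pairing each $y$ with the most recently seen $x$ is an assumption made to keep the example short.

```python
class ToyStreamRegressor:
    """Hypothetical 1-D online least-squares regressor (a sketch, not library code)."""

    def __init__(self, input_streams=None):
        # Map stream label -> role; the default mirrors the convention in the text.
        self.input_streams = input_streams or {0: "X", 1: "Y"}
        self.last_x = None  # most recent X observation
        self.n = 0          # number of (x, y) pairs seen
        self.sx = self.sy = self.sxx = self.sxy = 0.0  # running sums

    def partial_fit(self, data, stream=0):
        role = self.input_streams.get(stream)
        if role == "X":
            self.last_x = float(data)
        elif role == "Y" and self.last_x is not None:
            # Assumption: pair each Y with the most recent X.
            x, y = self.last_x, float(data)
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y
        # Data on any other stream falls through and is ignored.
        return self

    def predict(self, x):
        # Closed-form simple linear regression from the running sums;
        # needs at least two pairs with distinct x values.
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope * x + intercept


reg = ToyStreamRegressor(input_streams={5: "X", 700: "Y"})
for x in range(10):
    reg.partial_fit(x, stream=5)          # X observation
    reg.partial_fit("junk", stream=42)    # unknown stream: ignored
    reg.partial_fit(2 * x, stream=700)    # matching Y observation
prediction = reg.predict(3)
```

Because the role lookup uses `dict.get`, anything arriving on an unconfigured stream never reaches the fitting code, whatever its type.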

## Chaining Estimators
Individual estimators can be useful, but they're more useful in groups.
Suppose we want to center our $x$ variable before the regression.
In sklearn, we might do something like this (by default `StandardScaler` also scales to unit variance; `with_std=False` would center only):

In [None]:
centerer = StandardScaler()
reg = LinearRegression()

centered_x = centerer.fit_transform(x)
reg.fit(centered_x, y)

This is how you might do something similar in `adaptive_latents`:

In [None]:
centerer = CenteringTransformer()
reg = VanillaOnlineRegressor()

centered_x = centerer.partial_fit_transform(x, stream=0)
reg.partial_fit(centered_x, stream=0)
reg.partial_fit(y, stream=1)

For bigger groups of estimators, sklearn uses `Pipeline` objects:

In [None]:
sk_pipeline = sklearn.pipeline.Pipeline([
    ('scaler', StandardScaler()),
    ('reg', LinearRegression())
])
sk_pipeline.fit(x,y)

And `adaptive_latents` does too (although with its own implementation):

In [None]:
al_pipeline = adaptive_latents.Pipeline([
    CenteringTransformer(),
    VanillaOnlineRegressor()
])

al_pipeline.partial_fit(x, stream=0)
al_pipeline.partial_fit(y, stream=1)

At its core, an `adaptive_latents` pipeline is mostly a wrapper for calling its member steps in order:

In [None]:
def partial_fit_transform(al_pipeline, data, stream=0, return_output_stream=False):
    for step in al_pipeline.steps:
        data, stream = step.partial_fit_transform(data, stream, return_output_stream=True)
    return (data, stream) if return_output_stream else data


This works because each estimator ignores streams it doesn't care about.
In the regression case, the centering estimator ignores the `1` stream.
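
As a rough illustration of that pass-through behavior, the sketch below runs the same loop as the wrapper above over two toy steps. `ToyStep` and `run_steps` are hypothetical names invented for this example; real AL transformers do more than apply a function.

```python
class ToyStep:
    """Hypothetical step: applies `func` on one stream, passes other streams through."""

    def __init__(self, func, stream):
        self.func = func
        self.stream = stream

    def partial_fit_transform(self, data, stream=0, return_output_stream=False):
        if stream == self.stream:
            data = self.func(data)
        # Data on any other stream is returned untouched.
        return (data, stream) if return_output_stream else data


def run_steps(steps, data, stream=0):
    # Feed each step's output (data and stream) into the next step.
    for step in steps:
        data, stream = step.partial_fit_transform(data, stream, return_output_stream=True)
    return data, stream


steps = [
    ToyStep(lambda v: v - 10, stream=0),  # stands in for a centering step on stream 0
    ToyStep(lambda v: v * 2, stream=1),   # only touches stream 1
]
result_x = run_steps(steps, 12, stream=0)  # modified by the first step only
result_y = run_steps(steps, 12, stream=1)  # modified by the second step only
```

Each input is changed only by the step that listens on its stream, so the two streams can share one pipeline without interfering.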

If you've piped data between commands on the command line, this may look familiar; the design is partially inspired by POSIX-style pipes.
Also like command-line piping, we can redirect streams.
This is accomplished with the `output_streams` argument to transformers.
The exact semantics of output streams are up to the transformer, but usually they choose the output stream based on the input stream the data arrived in.


In [None]:
p = Pipeline([], output_streams={0:1})  # an empty pipeline is basically a no-op transformer

data, stream = None, 0
print(stream)

data, stream = p.partial_fit_transform(data, stream, return_output_stream=True)
print(stream)

In [None]:
p = Pipeline([], output_streams={0:1, 1:2, 2:0})
# inputs from stream 0 get redirected to stream 1, 
# inputs from stream 1 get redirected to stream 2,
# inputs from stream 2 get redirected back to stream 0

data, stream = None, 0
print(stream)
for _ in range(3):
    data, stream = p.partial_fit_transform(data, stream, return_output_stream=True)
    print(stream)
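
One plausible way to implement this kind of redirection is a dictionary lookup on the way out of the step. The sketch below is only that; `ToyRedirect` is a hypothetical class, and leaving unmapped streams unchanged is an assumption rather than documented library behavior.

```python
class ToyRedirect:
    """Hypothetical no-op step that only relabels streams (not the library's Pipeline)."""

    def __init__(self, output_streams=None):
        self.output_streams = output_streams or {}

    def partial_fit_transform(self, data, stream=0, return_output_stream=False):
        # Assumption: streams missing from the mapping keep their label.
        out_stream = self.output_streams.get(stream, stream)
        return (data, out_stream) if return_output_stream else data


p = ToyRedirect(output_streams={0: 1, 1: 2, 2: 0})
stream, visited = 0, [0]
for _ in range(3):
    _, stream = p.partial_fit_transform(None, stream, return_output_stream=True)
    visited.append(stream)

# A stream that is not in the mapping passes through with its label intact.
unmapped = ToyRedirect({0: 1}).partial_fit_transform("x", 7, return_output_stream=True)
```

Here `visited` records the same cycle through the streams as the printed output of the cell above.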



Using pipelines of transformers, you can construct complex analyses without special classes or custom control flow.
Any computation that can be represented by a directed acyclic graph can be computed in a Pipeline, and for cases beyond that, you can still use individual estimators.

Here is a pipeline I've used to create real figures:


In [None]:
import functools
from adaptive_latents import Bubblewrap, Pipeline, sjPCA, KernelSmoother, Concatenator, proSVD

bw = functools.partial(
    Bubblewrap,
    num=100,
    M=500,
    lam=1e-3,
    nu=1e-3,
    eps=1e-4,
    step=1e-2,
    num_grad_q=1,
    sigma_orig_adjustment=100,
    check_consistent_dt=False,
)

pipeline = Pipeline([
    CenteringTransformer(init_size=100),
    KernelSmoother(tau=.5),
    Concatenator(input_streams={0: 0, 1: 1}, output_streams={0: 0, 1: 0, 'skip': -1}),
    proSVD(k=6),
    sjPCA(),
    bw(input_streams={0: 'X', 3: 'dt'}),  # 'dt' means to predict ahead by the input amount
    VanillaOnlineRegressor(input_streams={0: 'X', 2: 'Y', 3: 'qX'})
])
