# tspace

> Time sequence data pipeline framework <br>for deep reinforcement learning

In [None]:
#| hide
from tspace.core import *

# Overview


tspace is a data pipeline framework for deep reinforcement learning with IO interfaces, processing and configuration. The current code base is an automotive implementation. The main features are:

- working in both training and inference mode, supporting
  - coordinated [ETL](https://en.wikipedia.org/wiki/Extract,_transform,_load) and ML pipelines,
  - online and offline training,
  - local and distributed training;
- multiple model types:
  - reinforcement learning models with DDPG and
  - time sequence processing with recurrent models;
- data pipelines compatible with both ETL and ML dataflow, with
  - support for multiple data sources (local CAN or remote cloud object storage),
  - stateful time sequence processing with sequential models and
  - support for NoSQL databases as well as local and cloud data storage.

In [None]:
#| hide

draft = '''
Markdown

tspace is a data pipeline framework for deep reinforcement learning with IO interfaces, processing and configuration. The main features are:

- Working in training and inference mode
  - logging and monitoring with cutelog or TUI interface
  - cascaded threading pool for well-structured Scheduling of [ETL](https://en.wikipedia.org/wiki/Extract,_transform,_load) and ML pipelines
  - Customized Exception handling
  - Graceful shutdown
  - online and offline training
  - local and distributed training
- Support for multiple models
  - reinforcement learning models with DDPG 
  - time sequence models with LSTM and Transformer
- Data pipeline compatible to both ETL and ML dataflow 
  - Support for multiple data sources (local CAN or remote cloud object storage)
  - Support both NoSQL database and local or cloud data storage through Dask with Parquet and Avro interface
  - Full Pandas DataFrame support with raw json codecs
  - Configuration system for vehicles, drivers, data sites, neural network hyperparameters, database, HMI types, etc
  - Timezone aware time sequence data processing
  - Data object meta-info processing and storage linked to configuration system
  - Stateful time sequence processing with sequential model
  - Type hint for data processing and configuration
  - Pydantic integration
'''


<img src="res/tspace_overview.svg" alt="Overview of tspace architecture" width="80%">

The diagram shows the basic architecture of tspace. The main components are:

- **`Avatar`**: orchestrates the whole ETL and ML workflow.
  - It configures KvaserCAN, RemoteCAN, Cruncher, Agent, Model, Database and Pipeline.
  - It also manages the scheduling of the workflow threads.
  - It selects either **KvaserCAN** or **RemoteCAN** as the vehicle interface for reading the observation and applying the action.
- **KvaserCAN** is implemented with `Kvaser`, which provides a local interface to the vehicle.
  - It reads the observation (CAN messages of vehicle states) via Kvaser, using `udp_context` to get the CAN messages as JSON data from a local UDP server. It then encodes the raw JSON data into a [pandas.DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) and forwards it through the data pipeline to `Cruncher`.
  - It applies the action (flashing parameters) to the vehicle ECU (VCU). Before sending the action, it decodes it from the [pandas.DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) into a packed string buffer and then sends it to the ECU by calling `send_float_array` from `VehicleInterface.consume`.
  - The control messages for the training HMI go through the same UDP port. They modify the threading events that control the episodic training process with `VehicleInterface.hmi_control`.
- **RemoteCAN** provides a remote interface to the vehicle via the cloud object storage system that the onboard TBox sends data to. It's implemented with `Cloud`:
  - It reads the observation (CAN messages of vehicle states) from the cloud object storage system through `RemoteCanClient.get_signals`. It then encodes the raw JSON data into a [pandas.DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) and forwards it to `Cruncher` through the data pipeline.
  - It sends the action (flashing parameters) to the vehicle ECU (VCU) in the shared `VehicleInterface.consume` by calling `RemoteCanClient.send_torque_map`, which decodes the action from the [pandas.DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) into a raw JSON string.
  - It selects the training HMI that supplies the vehicle and driver information as configuration: `Cloud.hmi_capture_from_udp` for a local UDP server, `Cloud.hmi_capture_from_rmq` for a remote RocketMQ server, or `Cloud.hmi_capture_from_dummy` for pure inference mode without training or updating models. It shares the same control logic, `VehicleInterface.hmi_control`, with **KvaserCAN**.
- **Cruncher**:
  - `Cruncher.filter` receives the observation through the data pipeline from **KvaserCAN** or **RemoteCAN**. It pre-processes the input data into a timestamped quadruple $(timestamp, state, action, reward, state')$ and gives it to the reinforcement learning agent `DPG`, which infers an optimal action according to its current policy. After getting the agent's prediction, it encodes the result into an action object and forwards it to `VehicleInterface.consume` to be flashed onto the VCU.
  - It collects the critic and actor losses, the total reward for each episode, the running reward and the action at the end of the episode. It also saves the model checkpoint and the training log to the database.
- **Agent** provides a wrapper for the reinforcement learning model with `DPG`:
  - It initializes the actor and critic networks, together with their target networks, with the `Actor` and `Critic` classes.
  - It trains the actor and critic networks with the `train` method and updates the target actor and target critic networks with the `update_target` method.
  - It predicts the action with the `predict` method and the target action with the `predict_target` method.
- **Model**:
- **Database**:
- **Pipeline**:
- **Config**:
- **Sched** 
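
The KvaserCAN action path described above decodes an action into a packed buffer before handing it to the ECU. A minimal sketch of that idea, using only the standard library, might look as follows. The helper names `pack_action` and `unpack_action`, the little-endian float32 layout and the sample values are assumptions for illustration; the actual buffer layout expected by `send_float_array` is not specified here.

```python
import struct
from typing import List, Sequence


def pack_action(values: Sequence[float]) -> bytes:
    """Pack action values into a little-endian float32 buffer (assumed layout)."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_action(buffer: bytes) -> List[float]:
    """Inverse of pack_action: recover the float32 values from the buffer."""
    count = len(buffer) // struct.calcsize("<f")
    return list(struct.unpack(f"<{count}f", buffer))


# Hypothetical flattened action table; values chosen to be exact in float32.
action = [0.0, 0.25, 0.5, 1.0]
payload = pack_action(action)
restored = unpack_action(payload)
```

Packing to a fixed-width binary layout keeps the payload size predictable (four bytes per value here), which is one plausible reason to decode the DataFrame before transmission.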

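The timestamped quadruple that **Cruncher** builds can be pictured as a small record type. The sketch below is not the tspace implementation: the `Transition` class, the `build_transitions` helper and the sample data are hypothetical, and the rewards are assumed to be already computed.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple, Sequence


class Transition(NamedTuple):
    """One timestamped transition (timestamp, state, action, reward, next_state)."""
    timestamp: datetime
    state: Sequence[float]
    action: Sequence[float]
    reward: float
    next_state: Sequence[float]


def build_transitions(timestamps, states, actions, rewards) -> List[Transition]:
    """Pair each state with its successor; the last state has no successor and is dropped."""
    return [
        Transition(timestamps[i], states[i], actions[i], rewards[i], states[i + 1])
        for i in range(len(states) - 1)
    ]


# Hypothetical three-step episode fragment with timezone-aware timestamps.
ts = [datetime(2024, 1, 1, 0, 0, s, tzinfo=timezone.utc) for s in range(3)]
states = [[0.0, 1.0], [0.5, 1.5], [1.0, 2.0]]
actions = [[0.1], [0.2], [0.3]]
rewards = [-1.0, -0.5, 0.0]
transitions = build_transitions(ts, states, actions, rewards)
```

Three observations yield two transitions, because each record needs the following state as its `next_state`.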

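The **Agent** section mentions updating the target networks with `update_target` but does not say how. In the standard DDPG formulation, target weights track the online weights through a soft (Polyak) update, $\theta' \leftarrow \tau\theta + (1-\tau)\theta'$. The sketch below illustrates that rule on plain lists; whether tspace uses this exact rule, the `soft_update` name and the value of `tau` are all assumptions.

```python
from typing import List


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Return tau * online + (1 - tau) * target, element by element."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


# Hypothetical weights; tau is unusually large here to keep the arithmetic exact.
target_weights = [0.0, 0.0]
online_weights = [1.0, 2.0]
target_weights = soft_update(target_weights, online_weights, tau=0.5)
```

In practice `tau` is usually much smaller than one, so the target networks change slowly and stabilize the critic's learning targets.
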
## TODO

1. Add time sequence embedding database support with LanceDB for TimeGPT
2. Batch mode for large scale inference and training with Unit of Work pattern

# How to use

## Install

```sh
pip install tspace
```

In [None]:
#| hide
import nbdev 
nbdev.nbdev_export()